Network Socket Programming
David Nguyen, Duy-Ky Nguyen, PhD
1. Introduction
The principle is that 2 machines can communicate only if they know each other's physical network address. Given a web site, say our ISP www.cwo.com, whose name can be seen as a logical address, our machine must somehow obtain the physical address needed to reach it. This physical address is usually the Ethernet address (EA) of a network interface card (NIC).
We'll use 2 utilities implemented in Windows: ping and tracert (traceroute in UNIX). The ping utility checks whether the target destination is accessible. It has an option to record the route taken to get there, but it records at most 9 hops. If the route has more than 9 hops, we can use the trace utility instead.
In the simplest case, we'll use ping to check that the NIC of our machine is working by typing the command ping nguyen into the command console (MS-DOS window), where nguyen is our machine ID.
c:\>ping nguyen
Pinging nguyen [192.168.47.41] with 32 bytes of data:
Reply from 192.168.47.41: bytes=32 time<10ms TTL=128
Reply from 192.168.47.41: bytes=32 time<10ms TTL=128
Reply from 192.168.47.41: bytes=32 time<10ms TTL=128
Reply from 192.168.47.41: bytes=32 time<10ms TTL=128
Ping statistics for 192.168.47.41:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
In the next simplest case, one machine, say Host 41 with full address 192.168.47.41, known as its IP (Internet Protocol) address (IA), wants to access another one, say Host 31 (192.168.47.31, where 192.168.47 is the net_id and 31 is the host_id). First it sends out a request (a broadcast, using the Address Resolution Protocol, ARP) to all machines on the same local physical network to see if any has the address 192.168.47.31. Host 31 responds with its physical network address and a connection is established.
c:\>ping tramhung
Pinging tramhung [192.168.47.31] with 32 bytes of data:
Reply from 192.168.47.31: bytes=32 time<10ms TTL=128
Reply from 192.168.47.31: bytes=32 time<10ms TTL=128
Reply from 192.168.47.31: bytes=32 time<10ms TTL=128
Reply from 192.168.47.31: bytes=32 time<10ms TTL=128
Ping statistics for 192.168.47.31:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
Although the IA is used for all network jobs, humans work best with its equivalent name, known as a domain name. The Domain Name System (DNS) is a distributed database that provides the mapping between IAs and names.
Now suppose Host 41 wants to reach a remote machine, say our ISP, by its domain name www.cwo.com, a literal address instead of a numerical one. The name is first resolved to an IA through DNS. Because the net_id of that IA differs from its own, Host 41 knows there is no such machine on its local network, so it sends the request to its default router. Every machine knows its default router for this very case. A regular machine attaches to only 1 physical network, while a router attaches to at least two. On receiving the request, the router does much the same: if the target is on one of its own physical networks, it broadcasts there to find the machine with that address; if not, it passes the request on to the next router. This process keeps going until the request reaches the network of the target machine, which responds with its physical address, and a connection is established. An advantage of this method is that there is usually more than 1 router for a physical network, so traffic can take another route when one is congested.
Every NIC has a different EA. If you happen to be an engineer testing a network product, be sure to assign it a new EA different from all those on the test network. Otherwise two NICs will share one EA, and at different times we will reach one or the other, whichever responds first.
c:\>ping cwo.com
Pinging cwo.com [209.210.78.4] with 32 bytes of data:
Reply from 209.210.78.4: bytes=32 time=291ms TTL=60
Reply from 209.210.78.4: bytes=32 time=260ms TTL=61
Reply from 209.210.78.4: bytes=32 time=270ms TTL=61
Reply from 209.210.78.4: bytes=32 time=261ms TTL=61
Ping statistics for 209.210.78.4:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 260ms, Maximum = 291ms, Average = 270ms
Now we find the route to www.cwo.com:
c:\>tracert www.cwo.com
Tracing route to cwo.com [209.210.78.4]
over a maximum of 30 hops:
1 270 ms 280 ms 261 ms pm3-3ok.cwo.com [207.173.10.6]
2 261 ms 270 ms 260 ms oakland-sac.cwo.com [207.173.10.1]
3 271 ms 270 ms 271 ms sac-main.cwo.com [209.210.78.20]
4 260 ms 280 ms 271 ms cwo.com [209.210.78.4]
Trace complete.
We have the corresponding network using this trace information
Below is further network information for our machine, obtained with the command ipconfig /all:
c:\>ipconfig /all
Windows 2000 IP Configuration
Host Name . . . . . . . . . . . . : nguyen
Primary DNS Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Broadcast
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
Ethernet adapter Local Area Connection:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : 3Com EtherLink 10/100 PCI TX NIC (3C905B-TX)
Physical Address. . . . . . . . . : 00-10-4B-9F-04-49
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 192.168.47.41
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.47.1
DNS Servers . . . . . . . . . . . :
PPP adapter CWO:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : WAN (PPP/SLIP) Interface
Physical Address. . . . . . . . . : 00-53-45-00-00-00
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 207.173.10.107
Subnet Mask . . . . . . . . . . . : 255.255.255.255
Default Gateway . . . . . . . . . : 207.173.10.107
DNS Servers . . . . . . . . . . . : 64.57.100.3
209.210.78.3
NetBIOS over Tcpip. . . . . . . . : Disabled
We see 1 IA for the Ethernet link (192.168.47.41) and 1 IA for the modem PPP/SLIP link (Point-to-Point Protocol / Serial Line Internet Protocol). We also notice that the PPP adapter has the same IP Address and Default Gateway (another name for router), as it acts as its own router. The all-ones subnet mask (255.255.255.255) suggests that a modem connection is a special network with only 1 host: ours.
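To make the role of the subnet mask concrete, here is a minimal sketch in C of how a mask separates the net_id from the host_id. The function names are ours, chosen for illustration; they are not part of winsock or of the programs in this note.

```c
#include <stdint.h>

/* Pack four dotted-decimal fields into one 32-bit address (host byte order). */
uint32_t make_ip(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Two hosts are on the same network when their masked addresses match. */
int same_network(uint32_t ip_a, uint32_t ip_b, uint32_t mask)
{
    return (ip_a & mask) == (ip_b & mask);
}

/* Number of distinct addresses covered by a mask: every zero bit
   of the mask is a host bit and doubles the count. */
uint64_t addresses_in_network(uint32_t mask)
{
    return (uint64_t)(~mask) + 1u;
}
```

With the Ethernet mask 255.255.255.0, Host 41 and Host 31 compare as being on the same network while 209.210.78.4 does not, and the PPP mask 255.255.255.255 covers exactly 1 address.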
Network Class
The original network class is below
The advantage of this scheme is that a router needs to look only at the net id to do its routing job, since each physical network has its own net id. We can easily see that classes A and B allow too many hosts per network, over 16 million and over 65,000 respectively, while the scheme allows only about 2 million networks. This is a real problem nowadays. There have been several approaches to conserving net ids, among them subnet addressing and proxy servers. The subnet approach divides the host id into a subnet id and a host id; for example, a class B address can have a 14-bit net id, an 8-bit subnet id and an 8-bit host id. A proxy router connects 2 networks with the same net id, as seen above (net 209.210.78).
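As a rough illustration of the class scheme, the sketch below decodes the class from the leading bits of the first octet, using the standard classful layout (class A: prefix 0, 7-bit net id, 24-bit host id; class B: prefix 10, 14-bit net id, 16-bit host id; class C: prefix 110, 21-bit net id, 8-bit host id). The struct and function names are our own, for illustration only.

```c
#include <stdint.h>

/* Result of decoding a classful IPv4 address. */
struct net_class {
    char     name;       /* 'A', 'B', 'C', or '?' for the other classes */
    unsigned net_bits;   /* width of the net id, excluding the class prefix */
    unsigned host_bits;  /* width of the host id */
};

/* Classify by the leading bits of the first octet:
   0xxxxxxx = A, 10xxxxxx = B, 110xxxxx = C. */
struct net_class classify(uint8_t first_octet)
{
    struct net_class c = { '?', 0, 0 };
    if ((first_octet & 0x80) == 0x00) {
        c.name = 'A'; c.net_bits = 7;  c.host_bits = 24;
    } else if ((first_octet & 0xC0) == 0x80) {
        c.name = 'B'; c.net_bits = 14; c.host_bits = 16;
    } else if ((first_octet & 0xE0) == 0xC0) {
        c.name = 'C'; c.net_bits = 21; c.host_bits = 8;
    }
    return c;
}

/* 2^bits: how many ids fit in a field of the given width (bits < 32). */
uint32_t count_for_bits(unsigned bits)
{
    return (uint32_t)1 << bits;
}
```

Both 192.168.47.41 and 209.210.78.4 decode as class C. The counts quoted above follow from the field widths: 2^24 is over 16 million, 2^16 is over 65,000 and 2^21 is about 2 million.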
2. Network Basic
Network access is based on the TCP/IP suite, named after 2 of its protocols: the Transmission Control Protocol (TCP) and the Internet Protocol (IP). Sending datagrams over IP without TCP is fast, but some data may get lost; it suits audio/video transfer, where speed matters more. Losing a little audio/video data makes no difference, but slow reception is a real problem. A data transfer, on the other hand, needs TCP on top of IP to ensure no such loss, and the trade-off is lower speed.
TCP/IP is just a carrier for applications like e-mail or web pages. The Simple Mail Transfer Protocol (SMTP) is used to send email, and to receive it and put it into a mail account (mail box). POP (Post Office Protocol) is used to open and read or delete email. It requires a user name and password, like the key that opens a mail box to get regular mail. HTTP (Hyper Text Transfer Protocol) is for web pages written in HTML (Hyper Text Markup Language).
TCP is reliable but slow, as its transfer control takes extra time to check for and re-transmit lost data. It is a daunting task to deal efficiently with issues like time-outs while waiting for an acknowledgement, duplicate transmissions or acknowledgements, etc. Low-level network programming (TCP/IP programming) deals with that sort of thing, and it has been built into operating systems (OS) like UNIX and Windows, or real-time OSs (RTOS) like VxWorks. These OSs provide a simple way to do high-level network programming, called socket programming, for applications like SMTP, HTTP, etc. Socket programming for UNIX, Windows and VxWorks differs only in minor ways, in how the socket library built into each OS is called. Sockets were first created for UNIX as BSD sockets (Berkeley Software Distribution) at the University of California, Berkeley; Windows and VxWorks sockets followed. OS developers are interested in TCP/IP programming, while we focus on socket programming. We are going to present some basic code for Windows sockets (winsock), as we believe even UNIX users also have a Windows PC.
Error detection is implemented within IP. It is used either to report the occurrence of an error so that appropriate action can be taken when TCP is not used (connectionless service), or to ask for re-transmission of lost data when TCP is used (connection-oriented service).
3. Network Socket Basic
A file (in the general sense; an email is also a file) is partitioned into packets to be sent over a network. Each packet has a header with a destination address, a sequence number and a type. Packets of the same file may travel different routes to the destination address and arrive out of order. The receiver uses the sequence number to reassemble the packets, and uses the type to respond appropriately, say, by acknowledging only packets of TCP type.
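The reassembly step can be sketched as follows. The struct packet layout is hypothetical and much simpler than a real IP or TCP header; it only carries the fields mentioned above, with a tiny payload so that the example stays short.

```c
#include <stdlib.h>
#include <string.h>

#define PAYLOAD_MAX 8   /* tiny payload, just for illustration */

/* Hypothetical packet: a small header followed by the payload. */
struct packet {
    unsigned seq;                /* position of this piece within the file */
    unsigned type;               /* what kind of packet this is */
    size_t   len;                /* number of valid bytes in data[] */
    char     data[PAYLOAD_MAX];
};

/* qsort() comparison: order packets by ascending sequence number. */
static int by_seq(const void *a, const void *b)
{
    const struct packet *pa = a, *pb = b;
    return (pa->seq > pb->seq) - (pa->seq < pb->seq);
}

/* Sort the packets by sequence number and concatenate their payloads
   into out. Returns the number of bytes written, or 0 if out is too small. */
size_t reassemble(struct packet *pkts, size_t n, char *out, size_t out_size)
{
    size_t used = 0;
    qsort(pkts, n, sizeof pkts[0], by_seq);
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].len > out_size - used)
            return 0;
        memcpy(out + used, pkts[i].data, pkts[i].len);
        used += pkts[i].len;
    }
    return used;
}
```

Three packets arriving in the order 2, 0, 1 come out in file order after the call. A real receiver would also have to cope with missing and duplicate packets, which this sketch ignores.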
We, as network socket programmers, provide the information the OS needs to create a socket and its headers. We are going to use 3 socket types: raw (SOCK_RAW), TCP (SOCK_STREAM) and datagram (SOCK_DGRAM), which carries UDP (User Datagram Protocol) datagrams directly over IP.
The Ethernet header includes the physical Ethernet address (EA); the IP header has the logical Internet address (IA) in numerical format, not the literal one.
Endianness
Endianness is the byte ordering of multi-byte data: a big-endian machine stores the most significant byte at the lowest address, while a little-endian machine stores the least significant byte there.
By inspection, we may draw the logical conclusion that single-byte data will appear on the highest part of the data bus for a big-endian machine and on the lowest part for a little-endian machine. A hardware (HW) engineer may use that part of the data bus to connect a single-byte device. However, some microprocessor (uP) makers do not follow this rule!
UNIX machines used big endian and the PC uses little endian. The network was first created using UNIX machines, so big endian is the byte order used on the network. We therefore have to convert from little endian to big endian for all network information, like the EA, IA, ..., but not for the transmitted data.
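Socket libraries provide htons() and htonl() for this conversion. The sketch below shows, under our own function names, what such a conversion amounts to: the bytes are written out most significant first, whatever the byte order of the machine.

```c
#include <stdint.h>

/* Return 1 on a little-endian machine, where the least significant
   byte of a multi-byte value sits at the lowest address. */
int is_little_endian(void)
{
    uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Store a 16-bit value (e.g. a port number) in network (big-endian) order. */
void put_be16(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)(v & 0xFF);
}

/* Store a 32-bit value (e.g. a numerical IA) in network order. */
void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Read a 32-bit value back from network order. */
uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

For example, the address 192.168.47.41 held as the number 0xC0A82F29 is stored as the bytes 192, 168, 47, 41 in that order, on either kind of machine.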
4. Ping Utility with SOCK_RAW
We will start with the simplest program, the ping utility, using SOCK_RAW. It is usually used to check whether the destination is accessible.
Ping pseudo code is
Load winsock library: WSAStartup()
Initialize protocol: protocol, destination, . . .
Create raw socket: socket()
Create header for raw socket
Loop few times (usually 4)
    Send socket: sendto()
    Get socket: recvfrom()
Cleanup before exit: WSACleanup()
Ping C code is ping.c
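The "create header" step for a ping packet means filling in an ICMP echo request header, which includes the Internet checksum. The sketch below is not taken from ping.c; it is a minimal version under our own function names, with the header kept as a plain byte array so that byte order is explicit.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum: one's-complement sum of the data taken as
   big-endian 16-bit words, then complemented. Meant for packet-sized
   buffers; very large inputs could overflow the 32-bit accumulator. */
uint16_t inet_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                        /* odd length: pad with a zero byte */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)                   /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill an 8-byte ICMP echo request header (type 8, code 0). */
void build_echo_header(unsigned char hdr[8], uint16_t id, uint16_t seq)
{
    uint16_t sum;
    hdr[0] = 8;                         /* type: echo request */
    hdr[1] = 0;                         /* code */
    hdr[2] = 0;                         /* checksum field is zero */
    hdr[3] = 0;                         /*   while the sum is computed */
    hdr[4] = (unsigned char)(id >> 8);
    hdr[5] = (unsigned char)(id & 0xFF);
    hdr[6] = (unsigned char)(seq >> 8);
    hdr[7] = (unsigned char)(seq & 0xFF);
    sum = inet_checksum(hdr, 8);
    hdr[2] = (unsigned char)(sum >> 8);
    hdr[3] = (unsigned char)(sum & 0xFF);
}
```

A useful property for checking received packets: running inet_checksum() over a header that already contains its correct checksum gives 0.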
The ping utility has a feature to record the route, with the option -r 9. However, we found it unreliable: it just hangs, probably because some routers were too busy to do that job. The one time it completed successfully, it gave a different route from the trace utility.
Note
All winsock programs require the winsock library, so we must add wsock32.lib under Project - Settings - Link.
5. Trace Route Utility with SOCK_RAW
This trace route utility also uses SOCK_RAW.
Trace Route pseudo code is
Load winsock library: WSAStartup()
Initialize protocol: protocol, destination, . . .
Create socket: socket(SOCK_RAW)
Create header for raw socket
Set TTL = 0
Loop until target reached
    Send socket: sendto()
    Get socket: recvfrom()
    Stop if receiving from target address
    Inc TTL
Cleanup before exit: WSACleanup()
Trace Route C code is tracert.c
There's a TTL (Time To Live) field in every packet sent over a network. The TTL starts at some number, and each router the packet goes through decrements it; a router that finds the TTL used up discards the packet and sends an error message back to the sender, which tells us that router's IA. So the trace route program starts with the smallest TTL (0 in the pseudo code above; standard implementations start at 1), so that the first router responds with an error message and we learn its name. We next increment the TTL, so the first router passes the packet on, and the 2nd router finds the TTL used up and sends back an error message, and we learn the name of the 2nd router. That process keeps going until the responding name matches the name of the target.
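The TTL mechanism can be tried out without a network by simulating it. In this sketch a path is just a count of nodes (the routers followed by the target), nodes are identified by their index along the path, and probing starts at TTL 1 as standard implementations do. The function names are ours; nothing here is taken from tracert.c.

```c
#include <stddef.h>

/* Simulate one probe along a path of n_nodes nodes (n_nodes >= 1):
   routers first, the target last. Each router either reports back,
   if the TTL is used up, or decrements the TTL and forwards.
   Returns the index of the node that answers. */
size_t probe(size_t n_nodes, unsigned ttl)
{
    for (size_t i = 0; i + 1 < n_nodes; i++) {  /* routers only */
        if (ttl <= 1)
            return i;           /* TTL used up at router i */
        ttl--;
    }
    return n_nodes - 1;         /* the probe reached the target */
}

/* Raise the TTL one step at a time until the target itself answers.
   Stores the answering node of each probe in hops[] and returns the
   number of probes sent (at most max_hops). */
size_t trace(size_t n_nodes, size_t *hops, size_t max_hops)
{
    size_t count = 0;
    for (unsigned ttl = 1; count < max_hops; ttl++) {
        hops[count] = probe(n_nodes, ttl);
        if (hops[count++] == n_nodes - 1)
            break;
    }
    return count;
}
```

For a path of 4 nodes, like the 3 routers plus cwo.com in the tracert output earlier, trace() sends 4 probes and the answers come from nodes 0, 1, 2 and 3 in turn.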
6. Network Topology
The next 2 programs have a Server and a Client working with each other. The Client asks the Server to send it a file, using SOCK_DGRAM and then SOCK_STREAM. If we have only 1 machine, we can run both Client and Server on that machine. If we have 2 machines with NICs, we also need a hub to connect them; we cannot simply plug a cable into the 2 NICs.
7. Connectionless File Transfer
A pseudo-code for a file transfer program using SOCK_DGRAM
Server:
Load winsock library: WSAStartup()
Initialize protocol: protocol, destination, . . .
Create socket: socket(SOCK_DGRAM)
Bind socket: bind()
Loop until end of file (EOF)
    Get socket: recvfrom() // wait for sendto() from client
    Send socket: sendto()
    Stop if EOF
Cleanup before exit: WSACleanup()
Client:
Load winsock library: WSAStartup()
Initialize protocol: protocol, destination, . . .
Create socket: socket(SOCK_DGRAM)
Loop until end of file (EOF)
    Send socket: sendto()
    Get socket: recvfrom() // sendto() from server
    Stop if EOF
Cleanup before exit: WSACleanup()
The C code is in UDPserver.c and UDPclient.c.
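The sendto()/recvfrom() exchange in the pseudo code can be sketched in a single function. Note the assumptions: this sketch uses the BSD/POSIX socket headers rather than winsock (so there is no WSAStartup() or WSACleanup()), it runs both ends in one process over the loopback address 127.0.0.1, and it lets the OS pick the port. It is not the code of UDPserver.c or UDPclient.c.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg from a client socket to a server socket on 127.0.0.1 and
   have the server send it straight back. Returns the number of bytes
   the client received, or -1 on error. Messages over 512 bytes are
   cut short by the server's buffer. */
long udp_round_trip(const char *msg, char *reply, size_t reply_size)
{
    struct sockaddr_in srv_addr, peer;
    socklen_t len = sizeof srv_addr, peer_len = sizeof peer;
    char buf[512];
    long result = -1;
    ssize_t n;

    int srv = socket(AF_INET, SOCK_DGRAM, 0);   /* server: create socket */
    int cli = socket(AF_INET, SOCK_DGRAM, 0);   /* client: create socket */
    if (srv < 0 || cli < 0)
        goto done;

    /* server: bind() to the loopback address; port 0 = OS picks one */
    memset(&srv_addr, 0, sizeof srv_addr);
    srv_addr.sin_family = AF_INET;
    srv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv_addr.sin_port = htons(0);
    if (bind(srv, (struct sockaddr *)&srv_addr, sizeof srv_addr) < 0)
        goto done;
    if (getsockname(srv, (struct sockaddr *)&srv_addr, &len) < 0)
        goto done;                              /* learn the chosen port */

    /* client: sendto() the server's address */
    if (sendto(cli, msg, strlen(msg), 0,
               (struct sockaddr *)&srv_addr, sizeof srv_addr) < 0)
        goto done;

    /* server: recvfrom() learns the client's address, sendto() replies */
    n = recvfrom(srv, buf, sizeof buf, 0, (struct sockaddr *)&peer, &peer_len);
    if (n < 0)
        goto done;
    if (sendto(srv, buf, (size_t)n, 0, (struct sockaddr *)&peer, peer_len) < 0)
        goto done;

    /* client: recvfrom() the reply */
    n = recvfrom(cli, reply, reply_size, 0, NULL, NULL);
    if (n >= 0)
        result = (long)n;

done:
    if (srv >= 0) close(srv);
    if (cli >= 0) close(cli);
    return result;
}
```

Only the server calls bind(): the client's address reaches the server through recvfrom(), which is how the server knows where to send its reply.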
We need to run UDPserver first. We need to put any big text file named tst_r.txt (a printable ASCII file) in the same directory as UDPserver.c. We used a 600 KB file, so the transfer takes quite some time even over a very short route. We'll then compare the 2 files tst_r.txt and tst_w.txt to check that they are identical.
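The final comparison can also be done in C. This is a minimal sketch of a byte-by-byte comparison; the function name is ours and is not part of the programs in this note.

```c
#include <stdio.h>

/* Compare two files byte by byte.
   Returns 1 if they are identical, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int same = -1;

    if (fa && fb) {
        int ca, cb;
        do {
            ca = getc(fa);
            cb = getc(fb);
        } while (ca == cb && ca != EOF);
        same = (ca == cb);      /* true only if both hit EOF together */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}
```

A call like files_identical("tst_r.txt", "tst_w.txt") then tells us whether the transfer lost or altered any data.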
8. Connection-Oriented File Transfer
A pseudo-code for a file transfer program using SOCK_STREAM
Server:
Load winsock library: WSAStartup()
Initialize protocol: protocol, destination, . . .
Create socket: socket(SOCK_STREAM)
Bind socket: bind() // wait for 1st from Client
Listen socket: listen()
Accept socket: accept() // wait for connect() from client
Loop until end of file (EOF)
    Get socket: recvfrom()
    Send socket: sendto()
    Stop if EOF
Cleanup before exit: WSACleanup()
Client:
Load winsock library: WSAStartup()
Initialize protocol: protocol, destination, . . .
Create socket: socket(SOCK_STREAM)
Connect socket: connect() // for accept()
Loop until end of file (EOF)
    Send socket: sendto()
    Get socket: recvfrom()
    Stop if EOF
Cleanup before exit: WSACleanup()
The C code is in TCPserver.c and TCPclient.c.
We need to run TCPserver first. We'll put the test file tst_r.txt in the same directory as TCPserver.c and proceed as described above for UDPserver.c.
9. HTTP Server
We'll modify the TCPserver above to obtain an HTTPserver. There are some key points for our surprisingly simple HTTPserver:
- the port number must be 80, which is reserved for HTTP;
- by convention, we must have a file index.htm, which is the default file on the server for a browser;
- the full path must be provided in the HTTPserver C code.
C code for HTTPserver is HTTPserver.c.
When we run the HTTPserver, it waits for us to open Internet Explorer and type http://nguyen, as our machine ID is nguyen. We'll then see our web page displayed in the browser.
Our web server is so simple because the complicated things are in the browser. Typically, a server is about 6 K lines of code and a browser about 60 K, 10 times more complicated.
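To illustrate the index.htm convention, here is a minimal sketch of how a server could pick the file to send from the first line of a browser's request, which has the form "GET /name HTTP/1.0". The function name is ours and this is not the code of HTTPserver.c; it also does no checking of the name (for example against ".." in the path), which a real server would need.

```c
#include <stdio.h>
#include <string.h>

/* Extract the requested file name from an HTTP request line such as
   "GET /page.htm HTTP/1.0". A bare "/" maps to the default file
   index.htm. Returns 0 on success, -1 if the line is not a GET
   request or the name does not fit in out. */
int requested_file(const char *request_line, char *out, size_t out_size)
{
    char path[256];
    const char *name;

    if (sscanf(request_line, "GET %255s", path) != 1 || path[0] != '/')
        return -1;
    name = (strcmp(path, "/") == 0) ? "index.htm" : path + 1;
    if (strlen(name) >= out_size)
        return -1;
    strcpy(out, name);
    return 0;
}
```

Typing http://nguyen in the browser produces a request for "/", so the server ends up sending index.htm.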
10. HTML (Hyper Text Markup Language)
HTML files are displayed by a browser like Internet Explorer or Netscape. HTML files are based on 3 basic rules:
- Tags are all text within angle brackets <...>;
- The browser uses tags to format the displayed document; it ignores undefined tags;
- The browser displays all "non-tag" text, one piece after another, separated by a single space, regardless of the number of blank lines, spaces and tabs; i.e. these format controls have no meaning in HTML files.
When we create an HTML file with a regular text editor, say Notepad, and start a new line, that new line will not be displayed in the browser! If we need one, we use <p> . . . </p>. Some other tags are <b> . . . </b> for bold text and <i> . . . </i> for italic. Current HTML is case insensitive, so <B> and <b> are the same; some tags, like <hr>, have no closing tag, and others, like <p>, do not require one. However, the new HTML, called XHTML (Extensible HTML), is case sensitive, requires lower case and requires all tags to be closed. We use <hr /> for <hr>, which has no closing tag.
The minimum HTML file has <html> and <body>, so we have our minimum index.htm as below
<html>
<body>
Hello! This is my first HTML file.
</body>
</html>
which the browser displays as
Hello! This is my first HTML file.
Browser and HTML offer us a low-cost and nice way to get a graphic display for remotely monitoring the status of equipment, say its temperature. We can define a special tag to mark some variable, say temperature. When the server receives a request from a client, it sees this special tag and responds accordingly, i.e. it reads the temperature and sends it back to the client.
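One way to realize the special tag idea is for the server to substitute the current reading for the tag before sending the page. The sketch below assumes a hypothetical tag <temp> and a function name of our own choosing; how the temperature is actually read is left out.

```c
#include <stdio.h>
#include <string.h>

#define TEMP_TAG "<temp>"   /* hypothetical marker; any unused tag would do */

/* Copy page into out, replacing the first TEMP_TAG with the reading,
   printed with one decimal. A page without the tag is copied unchanged.
   Returns 0 on success, -1 if the result does not fit in out. */
int fill_temperature(const char *page, double celsius,
                     char *out, size_t out_size)
{
    const char *tag = strstr(page, TEMP_TAG);
    int n;

    if (tag == NULL)
        n = snprintf(out, out_size, "%s", page);
    else
        n = snprintf(out, out_size, "%.*s%.1f%s",
                     (int)(tag - page), page,     /* text before the tag */
                     celsius,                     /* the reading itself */
                     tag + strlen(TEMP_TAG));     /* text after the tag */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Since a browser ignores undefined tags, a page containing <temp> still displays if it is ever sent without the substitution.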
11. Conclusions
If we want to modify the ping program to test a file transfer using only 1 program instead of 2 (server and client), we should set DATASIZE to a value reasonable for 1 packet; we used 1400.
We can use the same DATASIZE for the other file transfer programs, but then we won't see the difference between connectionless and connection-oriented service, as we are not taking advantage of the TCP/IP built into the OS. We need to increase DATASIZE as much as we can; we used 60000, and found that the connectionless service is faster.
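To get a feel for what DATASIZE changes, we can count how many sendto() calls a file needs. This is plain arithmetic with a function name of our own; it assumes every call but the last carries a full DATASIZE bytes.

```c
#include <stddef.h>

/* Number of chunks needed to send file_size bytes when each chunk
   carries at most datasize bytes (the last chunk may be partial). */
size_t chunks_needed(size_t file_size, size_t datasize)
{
    return (file_size + datasize - 1) / datasize;
}
```

Taking the 600 KB test file as 600 x 1024 = 614400 bytes, DATASIZE = 1400 needs 439 chunks while DATASIZE = 60000 needs only 11, so far fewer request and reply exchanges take place between client and server.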
The whole code mentioned in this note is here for your convenient download.