2
Networking Basics
Protocols
To explain what a protocol is, we're going to step out of the world of networking and technology altogether. I can't help but think of the Goldie Hawn movie Protocol when thinking about this topic, and though that may be dating me somewhat, it's relevant. In the movie, Goldie Hawn plays a waitress who saves the life of an Arab dignitary and ends up with a job in the State Department working in Middle East affairs. You may be wondering why the movie is called Protocol and why this has anything at all to do with networking. A protocol is a standard of communication. In order to have productive conversations between two parties, we need protocols. This is especially true when you are talking about entirely different cultures, as with the Arab countries and the United States. For the conversation and any negotiations to go smoothly, both parties rely on protocols: standards of behavior and communication that they adhere to so nothing is misunderstood.
When you think about it, the same is true in the networking world. For two systems to communicate, especially ones that speak entirely different languages, as might be the case with a Linux system trying to communicate with a Windows system, there must be standards of behavior and communication. In the early days of the Internet, back when it was still called the Arpanet in the late '60s and early '70s, many more operating systems were around than might seem to be the case today. Although there still are many once you start factoring in larger systems, the day-to-day experience of the vast majority of people is with three operating systems: Windows, macOS, and Linux. Two of those come from the same root operating system, Unix. However, they have just enough differences even today that protocols are important to make sure every conversation takes place smoothly.
Most of the time, when there is a conversation about protocols, you will hear someone refer to layers. This is because protocols are generally placed into stacks to explain how they relate to one another. Every type of communication on a network will involve multiple protocols across multiple layers, though each protocol is generally only aware of its own layer. There is one exception to that, but we'll get to it later in this chapter. Network protocols are mapped into two stacks. One is a generic model, and the other, the TCP/IP suite, is a description of a set of protocols specifically designed to work together. Even the TCP/IP protocols can be mapped into the generic model, however.
Regardless of which way you think about the protocols, one important factor to keep in mind is that every layer only ever talks to its own layer on the other side. If you think about writing someone a letter, you can conceive of how this operates. You write a letter, you put it in an envelope, seal the envelope, address it, put a stamp on it, and then put it in the mailbox. For every action you put into pulling the letter together, there is a corresponding action on the receiving end. Your post office on the sending end determines how the envelope should get to the recipient by looking at the ZIP code. The sending post office has no interest in anything inside the envelope and really doesn't have any interest in the street address or the name of the recipient.
Let's say that the letter you are sending is to someone at a business. The address you have placed on the envelope is for the business. Once the envelope reaches the destination post office (the one that owns the ZIP code), the postal workers there have to look at the street address in order to determine which truck to put it on for delivery. The person driving the truck and out delivering the mail doesn't look at the ZIP code because it's irrelevant – the truck only delivers to a single ZIP code. Likewise, the name on the envelope is also irrelevant; the only important part is the street address. Once it gets to the business and lands in the mail room or with the receptionist, or whoever gets the mail when it arrives, that person will look at the name on the envelope and deliver it. The recipient then gets the letter, opens it, and reads the contents.
The same is true when we talk about protocol stacks. At every point during the process of sending and receiving, there is a specific piece of information that is intended for and handled by a specific person or target. The ZIP code tells the sending post office how to get to the destination. The street address tells the receiving post office how to get to the destination. The name on the envelope tells the receiving party who the letter is actually destined for, and in the end, the letter is probably only meaningful in any way to the recipient. None of these parties has much interest in looking at the other information because it doesn't help them to do their job. Certainly, each party can see the rest of the information (except, perhaps, the contents of the letter), but they only focus on the information they actually need. You will see this repeated over and over as we start talking about the different protocol stacks and then the specific protocols from the TCP/IP suite of protocols.
An essential concept that you should understand before we get started is encapsulation. Regardless of which communications stack you are referring to, data passes from one layer to another. Each layer distinguishes itself by adding some data associated with that layer before passing the result on to the next layer down. This process is called encapsulation. Going back to our mail example, the letter is encapsulated inside the envelope and then the person's name is added to the envelope. After that, the street address and then finally the ZIP code are added (the city/town and state are just the long form of the ZIP code, so they are redundant). This addressing information encapsulates the information that comes before, though in a less obvious way than you will get from the IP addresses and other forms of address discussed below.
On the receiving end, the communication goes through de-encapsulation by removing the headers that were added on the sending end before the data is sent to the next layer up the stack. You will see this process of encapsulation as we start talking about the two different models and then, more concretely, when we start looking at the different protocols in operation.
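The round trip described above can be sketched in a few lines of Python. This is only a toy illustration: the bracketed header strings, addresses, and port numbers are invented for readability, and real headers are binary structures rather than text.

```python
# Toy illustration of encapsulation: each layer prepends its own header.
# The header contents are made up; real protocol headers are binary.

def encapsulate(payload: bytes) -> bytes:
    segment = b"[TCP sport=49152 dport=80]" + payload       # transport layer
    packet = b"[IP src=10.0.0.1 dst=10.0.0.2]" + segment    # network layer
    frame = b"[ETH src=aa:aa dst=bb:bb]" + packet           # data link layer
    return frame

def strip_header(data: bytes):
    """Split one bracketed header off the front of the data."""
    end = data.index(b"]") + 1
    return data[:end], data[end:]

def decapsulate(frame: bytes) -> bytes:
    data = frame
    for _ in range(3):  # data link, then network, then transport
        _header, data = strip_header(data)
    return data

frame = encapsulate(b"GET / HTTP/1.1")
print(frame)
print(decapsulate(frame))
```

Note that the receiving side removes the headers in the reverse of the order in which the sender added them, which mirrors the trip up the stack.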
Open Systems Interconnection (OSI) Model
In the 1970s, a number of communication protocols were in use, including the nascent TCP on the Arpanet, Systems Network Architecture (SNA) from IBM, DECnet from Digital Equipment Corporation, and many others. The International Organization for Standardization (ISO) decided a single model was needed to fit all communication protocols. In 1977, the ISO made use of work done by the Honeywell Corporation to create an abstract model describing different functions used in communications systems. By 1983, it had merged its standard with a similar standard by the International Telegraph and Telephone Consultative Committee (CCITT) to create the current Open Systems Interconnection (OSI) model.
NOTE
The short name “ISO” is not a true acronym. It is a compromise that recognizes the different abbreviations the name would have across the three languages used within ISO, and it is based on the Greek isos, meaning equal.
The OSI model consists of seven separate and distinct layers, each describing a particular set of functions and behaviors. Although every protocol used for communication will fit into one of these seven layers, not all communication streams will make use of all seven layers. Some types of communication are far simpler than others and may not need some of the higher layers of the protocol stack, depending on the intention of the communication. You can see a representation of the OSI model, drawn as a stack of boxes, in Figure 2.1.
Figure 2.1: The Open Systems Interconnection seven-layer model.
We will go through the model from the bottom to the top, as though we were reading a message off the wire. At the very bottom of the stack, at layer 1, is the physical layer. The physical layer includes all of the tangible components that you can touch – cabling, network interfaces, and the actual signaling medium, whether it's light or electrical. Since the name is pretty straightforward and descriptive, this one will be the easiest to remember and keep straight.
The next one up is the data link layer, layer 2. The data link layer is how systems on the same physical network communicate. For every layer in the stack, there is generally a way to differentiate communication streams, which is to say a way of addressing. At layer 2, this is the Media Access Control (MAC) address. The MAC address is attached directly to the network interface, which is why it is sometimes called the physical address. The data link layer makes sure that devices on the same physical network can communicate reliably with one another. If you are using a switch on your network, the switch is operating at layer 2 because it makes use of the MAC address to determine where to send network messages.
NOTE
The MAC address is six bytes and it is expected to be globally unique, meaning no other network interface in the world will have the same MAC address as the network interface on your system. Those six bytes are broken into two separate sections, three bytes per section. The first half, 24 bits, is the organizationally unique identifier (OUI) that identifies the vendor of the network interface. The second half is the identifier for the interface itself. The OUI is something that can be looked up in one of several online databases so if you have the OUI, you can know the vendor of the interface.
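As a small sketch of the split described in the note, the following Python function separates a MAC address into its OUI and interface-specific halves. The function name and the sample address are made up for illustration, and looking the OUI up in a vendor database is left out.

```python
def split_mac(mac: str):
    """Return (oui, interface_id) for a colon- or hyphen-separated MAC address."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a six-byte MAC address: {mac!r}")
    int("".join(octets), 16)  # raises ValueError if any octet is not hexadecimal
    # First three bytes (24 bits) are the OUI; the last three identify the interface.
    return ":".join(octets[:3]), ":".join(octets[3:])

oui, interface_id = split_mac("00:1A:2B:3C:4D:5E")  # hypothetical address
print(oui)           # 00:1a:2b
print(interface_id)  # 3c:4d:5e
```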
The third layer is the network layer. Layer 3 makes sure that devices that are not on the same physical network can communicate. Layer 3 messages typically require a router to pass messages from one network to another. This layer also requires an address. The Internet Protocol (IP) and the Internetwork Packet Exchange (IPX) protocol from Novell both operate at layer 3, providing network addresses, as well as addresses for the hosts on those networks.
Layer 4 is the transport layer. Where previous layers were about getting messages to the host, this is the first layer where the message has fully arrived at the host. Layer 4 allows for multiplexing of network communications on a single host. It does this by using ports. Each network address may have a large number of ports to communicate to. Systems that use the TCP/IP protocols will have 65,536 ports to communicate to on the different transport protocols. The User Datagram Protocol, the Transmission Control Protocol, and the Sequenced Packet Exchange Protocol (SPX) are all at this layer.
Layer 5 is the session layer. While the transport layer can support a connected form of communication between two systems, that is strictly system to system. Layer 5 is where the communication stream between those two hosts is managed. Depending on the implementation and the protocols being used, you may only have one-way traffic or you may have bi-directional traffic. The session layer determines how that communication will happen. The protocols at this layer handle the negotiation of the communication flow. Telnet, Secure Shell (SSH), and the File Transfer Protocol (FTP) are at this layer, though they also are commonly said to live at the application layer as well. Many session layer protocols straddle multiple layers.
Layer 6 is the presentation layer. This layer handles the conversion between the network communication and the application. Any data encoding and decoding as well as data formatting would be done at this layer. JPEG and GIF files are at this layer. The Hypertext Transfer Protocol (HTTP) is also at this layer. Anything that does encryption/decryption or compression would be at the presentation layer.
Finally, layer 7 is the application layer. Any application programming interfaces (APIs) would exist at this layer. This is where the interface to the user is.
TCP/IP Protocol Suite
The TCP/IP protocol suite was developed over a number of years and evolved into what we have today. While it is sometimes referred to as a model, the TCP/IP protocol suite is a description of an as-built set of protocols designed to work together. The communication protocols on the Arpanet were developed as they were determined to be necessary rather than planned well ahead of time. For instance, initially there was no separate Internet Protocol (IP). The Internet Protocol was part of the Transmission Control Program and offered connectionless service between two systems. If the two systems wanted the communication to be connection-oriented and have the connection managed by the Transmission Control Program rather than a higher-layer application, they would use the Transmission Control Protocol (TCP). Eventually, IP was separated out to handle network addressing and other network functions. On top of that, other protocols were developed. So, the TCP/IP architecture or model is documentation of what is in place.
NOTE
The TCP/IP protocol suite is sometimes referred to as the Department of Defense (DoD) model, because the DoD provided funding for the Arpanet, where TCP and IP were developed.
Whereas the OSI model is seven layers, TCP/IP, or the Internet Protocol suite, is only four layers. While it is much simplified over the OSI model, you will see that all of the same functions are described within the four layers. Even though the Internet uses the Internet Protocol suite to operate, it's more common in my experience at Internet service providers and network equipment vendors for networking professionals to refer to the layers of the OSI model, partly because of the granularity it offers, which helps to differentiate the functionality being referred to.
The first layer of TCP/IP is the Link layer. This encompasses functionality from the first two layers of the OSI model. Both the physical and the data link layer of the OSI model are represented in this layer, so the same functionality and examples from those layers apply here. This is where the MAC address lives and this layer makes sure that systems on the same physical network can communicate with one another.
The second layer is the Internet layer. This is the same as the network layer in the OSI model. This is where IP lives. IP provides network addressing and helps to ensure that messages can get from one network to another. IP is a routable protocol, though not all network layer protocols are. Of course, every host on a network gets its own address, so talking about network addressing is incomplete. The important distinction, though, is that the bulk of any IP address is the network address. The smallest portion is the actual host component. This reflects the large number of networks that are connected together across the Internet where the number of hosts on any given network is comparatively much smaller.
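To make the network/host split concrete, here is a minimal sketch using Python's standard ipaddress module. The address and the /24 prefix length are assumptions chosen for illustration; the prefix length on a real network depends on how that network is configured.

```python
import ipaddress

def split_address(cidr: str):
    """Return (network, host_bits, host_part) for an IPv4 address with a prefix."""
    iface = ipaddress.ip_interface(cidr)
    network = iface.network                              # the network portion
    host_bits = iface.max_prefixlen - network.prefixlen  # bits left for hosts
    host_part = int(iface.ip) & int(network.hostmask)    # host portion as a number
    return str(network), host_bits, host_part

# Example address and prefix are hypothetical.
print(split_address("192.168.1.10/24"))  # ('192.168.1.0/24', 8, 10)
```

With a /24 prefix, 24 of the 32 bits identify the network and only 8 identify the host, which matches the point that the bulk of the address is the network address.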
The third layer is the Transport layer, corresponding to layer 4 in the OSI model. It shares the same name between the OSI model and the TCP/IP model. This is where multiplexing on each system happens, through the use of ports. Ports provide a way for multiple applications to listen simultaneously on the same IP address as well as for multiple applications to originate traffic using separate source ports, allowing return traffic to get back to the correct application.
Finally, the fourth and last layer in the TCP/IP model is the Application layer. While it shares the same name as layer 7 in the OSI model, it encompasses all of the functions of layers 5–7 of the OSI model. Applications reside here. If they need presentation functions or session management, the applications take care of all of that and those functions aren't broken out and described separately from the application itself.
As you can see, the TCP/IP model is quite a bit simpler to think about than the OSI model. If you want to get fine-grained about functionality, though, the OSI model is better as a reference point. Ultimately, they are both just for conceptualizing and referring to the functions without specific reference to the protocols in use.
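The correspondence between the two models, as described in the preceding paragraphs, can be written down as a simple lookup table. The dictionary and function names below are only illustrative.

```python
# OSI layer number -> (OSI layer name, corresponding TCP/IP layer name)
OSI_TO_TCPIP = {
    7: ("Application", "Application"),
    6: ("Presentation", "Application"),
    5: ("Session", "Application"),
    4: ("Transport", "Transport"),
    3: ("Network", "Internet"),
    2: ("Data Link", "Link"),
    1: ("Physical", "Link"),
}

def tcpip_layer(osi_layer: int) -> str:
    """Return the TCP/IP layer that covers the given OSI layer number."""
    return OSI_TO_TCPIP[osi_layer][1]

for number in sorted(OSI_TO_TCPIP, reverse=True):
    osi_name, tcpip_name = OSI_TO_TCPIP[number]
    print(f"OSI {number} {osi_name:<12} -> TCP/IP {tcpip_name}")
```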
Protocol Data Units
We've talked about the various layers of the two communication models. Ultimately, the purpose of those models is to describe the means by which multiple systems communicate with one another. The protocols don't exist for their own sake. They exist to be able to effectively and efficiently send data from one system to another. The data is wrapped up with the different headers from each layer that allow the receiving system to identify where the data is headed, including what application.
As different protocols add their headers, encapsulating the data that is already there, the result is a different chunk of data than what was there before the protocol got its say. The resulting chunk of data, just as the chunk of data that started out, is called a protocol data unit (PDU). Each layer of the communications stack has a different protocol data unit associated with it. This means that at most layers, we use a different word to describe the chunk of data, or protocol data unit, we are looking at.
In order to talk about the different words, we are going to start at the very top of the stack. This is because when a message is being prepared for sending, it starts at the application. The application creates data. The protocol data unit at the application layer is just “data.” As we move down through the presentation and session layers, we are still talking about just data. You may not actually be working with protocols in layers 5–7, so there isn't really a PDU associated with them. It's just the data until we get to layer 4 of the OSI model.
Once we get to the transport layer, whether we are talking about the OSI model or the TCP/IP model, we are talking about the data that has the transport headers stacked on top. After those headers, which include the source and destination port numbers, are in place, you have a segment if you are using TCP and a datagram if you are using the User Datagram Protocol (UDP). The segment or datagram is then handed to IP to add some additional headers.
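As a concrete sketch of transport headers being placed in front of application data, the following Python builds a UDP datagram by hand. The UDP header is four 16-bit fields (source port, destination port, length, checksum); the port numbers and payload are invented, and the checksum is left at zero rather than computed, which a real stack would normally not do.

```python
import struct

UDP_HEADER = "!HHHH"  # network byte order: source port, dest port, length, checksum

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)  # the 8-byte header plus the data
    checksum = 0               # simplification: checksum not computed in this sketch
    header = struct.pack(UDP_HEADER, src_port, dst_port, length, checksum)
    return header + payload

def parse_udp_ports(datagram: bytes):
    src_port, dst_port, length, _checksum = struct.unpack(UDP_HEADER, datagram[:8])
    return src_port, dst_port, length

datagram = build_udp_datagram(49152, 53, b"query")  # hypothetical ports and data
print(parse_udp_ports(datagram))  # (49152, 53, 13)
```

Because each port field is 16 bits wide, it can hold 2^16 = 65,536 distinct values, which is where the port count mentioned earlier comes from.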
The IP headers include the source and destination IP address as well as some additional information, including indications as to whether what we have is just a fragment of a larger communication stream or an individual message. Once the IP headers are on and sitting atop the TCP or UDP headers, you have a packet. A few protocols may be in use at layer 2, including Ethernet, Asynchronous Transfer Mode (ATM), Point-to-Point Protocol (PPP), or 802.11 (Wi-Fi). No matter what the protocol is, there will be a set of headers that includes the source and destination MAC addresses. Once the layer 2 headers are on, you have a frame. The frame is what is placed onto the network.
Once the frame is converted to the right signaling mechanism, either an optical signal or an electrical signal, we are looking at bits. In the end, no matter what data you are sending, it is sent a bit at a time. If you are looking at the data as it is passing across the network, you are looking at a stream of bits. Later on, we'll look at more details of the different protocols you will see as we start pulling these messages – frames, packets, segments, and datagrams – apart.
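The PDU names introduced in this section can be summarized in a small table, along with a helper that shows what "a stream of bits" looks like for a couple of bytes. This is a simplified sketch: the names are illustrative, and it ignores how a real network interface orders and encodes bits on the wire.

```python
# Layer -> name of the protocol data unit at that layer
PDU_NAMES = {
    "application": "data",
    "transport (TCP)": "segment",
    "transport (UDP)": "datagram",
    "network": "packet",
    "data link": "frame",
    "physical": "bits",
}

def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, most significant bit first."""
    return " ".join(f"{byte:08b}" for byte in data)

for layer, pdu in PDU_NAMES.items():
    print(f"{layer:<16} {pdu}")

print(to_bits(b"Hi"))  # 01001000 01101001
```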