Help:IP Networks


On the Kogence container cloud high performance computing (HPC) platform, your requested compute infrastructure is preconfigured with various types of data networks depending upon the requirements of the model you are running. For example, all software containers requested by your workflow are automatically composed so that they can talk to each other. The Kogence cloud HPC platform allows you to run multi-software workflows on an autoscaling multi-node cluster of HPC servers orchestrated on the cloud infrastructure as and when needed by your model. See the container network document to understand the architecture of networking among the individual containers in your workflow. Refer to the microprocessor architecture document to understand the OmniPath (a type of Remote Direct Memory Access network) network among the sockets of a cloud HPC server and the network among the cores within a socket. Your cloud HPC clusters will also be preconfigured with an OS bypass network (e.g. OpenFabrics InfiniBand) for your multi-node MPI workloads, if and as needed by your model. Please refer to the OS bypass network document for the details.

An Internet Protocol (IP) network is configured among all the nodes of your HPC cloud cluster. The resource managers, job schedulers and MPI libraries that we automatically preconfigure on your clusters depend on an existing IP network among the nodes. This document describes the IP network among the different components of the Kogence platform architecture.

Kogence Cloud HPC Platform - IP Network Architecture

Geographical Regions, Data Center Zones, Proximity Placement Groups and Resource Groups

The Kogence cloud HPC platform is, by design, a cloud-agnostic, multi-region, multi-data-center app. Your cloud HPC infrastructure is orchestrated on hardware from many different data centers provided by many different cloud infrastructure providers across the globe. The scope and choices are configured on a per-customer basis at the time of deployment, depending on the geographical location and needs of each enterprise customer. We take data security very seriously. We only use data centers whose operations are regularly audited by independent firms against the ISAE 3000/AT 101 Type 2 Examination standard. Please refer to the Kogence security policy document for the detailed criteria of our data center selection.

Our data centers are distributed across 20+ geographical regions and organized into 50+ data center zones. Each geographical region has 2 or 3 data center zones, and each zone is mapped to 2 or 3 physical data centers. Data center zones can be treated as fail-safe logical data centers: even if a physical data center has a power failure, for example, the logical data center (i.e. the zone) is still up and running. Your HPC resources are always orchestrated in a single data center zone. The exact physical data center used for your HPC resources can change from one request to the next.

Geographical level isolation could be important for workloads with compliance and data sovereignty requirements where guarantees must be made that user data does not leave a particular geographic region. Geographical level orchestration configuration tuning is also important for workloads that are latency-sensitive and need to be located near users in a particular geographic area.

All nodes within a Kogence autoscaling cloud HPC cluster are orchestrated within a proximity placement group that provides a low-latency, high-throughput IP network among the nodes, in addition to the OS bypass network for multi-node MPI simulations.

Enterprise clients with redundancy needs or those with users distributed across the globe can choose to configure deployment of Kogence Grand Central App to span across data center zones and geographical regions. In each data center zone, participating data centers are connected to each other over redundant low-latency private network links. Likewise, all data center zones in a region communicate with each other over redundant private network links. These intra- and inter-data-center zone links are needed for fail safe redundant data replication and automatic backups for storage and databases.

Each enterprise client's data, backups and all other infrastructure resources reside in a separate resource group. On request from the customer's authorized representative (such as at the termination of the Kogence license/contract), the entire resource group can be deleted with no copies of customer data left anywhere.

Virtual Private IP Network - Host and Network Based Intrusion Prevention

Please refer to the Fundamentals of IP Network if you need to familiarize yourself with the IP network terminology used below.

Your organization's Kogence autoscaling cloud HPC clusters are orchestrated within a private IP network with a /16 IPv4 CIDR block. Each enterprise customer's simulations are launched in a separate private network. Simultaneously running simulations, from either the same user or different users of the same organization, are launched in separate subnets of the private network, each with a /24 IPv4 CIDR block. Network and subnet sizes, as well as geographical regions and logical data center zones, are configurable for each enterprise customer's deployment.
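To make the /16 network and /24 per-simulation subnet layout concrete, here is a minimal sketch using Python's standard `ipaddress` module. The block `10.0.0.0/16` and the helper `carve_subnets` are hypothetical illustrations; the actual CIDR blocks are configured per customer deployment and are not given in this document.

```python
import ipaddress
import itertools

# Hypothetical value: the real CIDR block is configured per customer
# deployment and is not specified in this document.
PRIVATE_NETWORK = "10.0.0.0/16"


def carve_subnets(network_cidr: str, count: int, new_prefix: int = 24) -> list:
    """Return the first `count` subnets of size /new_prefix inside network_cidr."""
    network = ipaddress.ip_network(network_cidr)
    # subnets() yields the subnets lazily, in ascending address order.
    return list(itertools.islice(network.subnets(new_prefix=new_prefix), count))


if __name__ == "__main__":
    network = ipaddress.ip_network(PRIVATE_NETWORK)
    print(network.num_addresses)  # 65536 addresses in a /16
    # One /24 subnet (256 addresses) per concurrently running simulation.
    for simulation_id, subnet in enumerate(carve_subnets(PRIVATE_NETWORK, 3)):
        print(simulation_id, subnet, subnet.num_addresses)
```

A /16 holds exactly 256 non-overlapping /24 subnets, so in this hypothetical layout that is the upper bound on simultaneously running simulations within one private network.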

Figure: Kogence Platform Architecture (v3)

Each subnet is connected to a router. All nodes in the cluster can communicate with each other within the private network using their private IP addresses and subnet masks. Subnet routers do not interact with each other. The network is connected to an Internet gateway. The master node of the cluster defines a route in the subnet router from the master node to the Internet gateway. The Internet gateway performs one-to-one NAT on behalf of the master node of the Kogence autoscaling cloud HPC cluster, so that when traffic leaves the subnet and goes to the Internet, the reply (source) address field is set to the public IPv4 address of the master node rather than its private IP address. Conversely, traffic destined for the public IPv4 address of the master node has its destination address translated into the master node's private IPv4 address before the traffic is delivered to the private network.

The private network is also connected to a DNS server that resolves the public DNS hostname of the master node to its IP address. The worker nodes of the cluster can only be accessed from within the private subnet. Worker nodes don't need to accept incoming traffic from the Internet and therefore do not have public IP addresses; however, they can send requests to the Internet through a NAT gateway. The master node communicates with the Internet only over secure, encrypted channels created by industry-standard TLS/SSL or SSH protocols.

The router also provides a DHCP service that automatically assigns private IP addresses to all nodes in your subnet. Subnet routers provide subnet-level, network-based intrusion prevention: only well-defined Internet and subnet traffic is routed among the nodes, while all other inbound and outbound traffic is blocked.
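The one-to-one NAT performed by the Internet gateway can be modeled as two lookup tables, one per direction. This is a toy sketch, not the gateway's actual implementation; the class `OneToOneNat` and the addresses `10.0.1.10` (master node, private) and `203.0.113.25` (a documentation-only range standing in for its public IP) are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Packet:
    src: str  # source IPv4 address
    dst: str  # destination IPv4 address


class OneToOneNat:
    """Toy model of static one-to-one NAT as described for the Internet gateway."""

    def __init__(self, private_to_public: Dict[str, str]):
        self.private_to_public = dict(private_to_public)
        self.public_to_private = {pub: priv for priv, pub in private_to_public.items()}

    def outbound(self, packet: Packet) -> Optional[Packet]:
        """Rewrite the source address of traffic leaving the private network."""
        public = self.private_to_public.get(packet.src)
        # Nodes without a public mapping (the worker nodes) are not handled here.
        return None if public is None else replace(packet, src=public)

    def inbound(self, packet: Packet) -> Optional[Packet]:
        """Translate a public destination back to the node's private address."""
        private = self.public_to_private.get(packet.dst)
        # Traffic for an unknown public address never reaches the private network.
        return None if private is None else replace(packet, dst=private)


nat = OneToOneNat({"10.0.1.10": "203.0.113.25"})
translated = nat.outbound(Packet(src="10.0.1.10", dst="198.51.100.7"))
print(translated)  # Packet(src='203.0.113.25', dst='198.51.100.7')
```

Returning `None` for unmapped addresses mirrors the text: only the master node has a public address, so unsolicited inbound traffic aimed at anything else has nowhere to go.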

Each node also provides host-based intrusion prevention. The network interface controllers on all hosts define the well-specified inbound and outbound traffic that is required, and all other traffic is blocked. Each node is also firewalled at the OS level with corresponding iptables rules. At the application level, all Kogence cloud-based Docker containers run in secure, non-privileged mode. To understand the network architecture among the containers, please see the separate Container Networking document.
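The "allow only well-specified traffic, block everything else" policy is a default-deny allowlist. The sketch below models that decision logic only; the `AllowRule` type and the two example rules are hypothetical and are not Kogence's actual firewall configuration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllowRule:
    protocol: str        # "tcp" or "udp"
    port: Optional[int]  # None means "any port"
    source: str          # CIDR block the traffic may come from


# Hypothetical allowlist for illustration only.
INBOUND_ALLOWLIST = [
    AllowRule("tcp", 22, "0.0.0.0/0"),      # SSH to the node from anywhere
    AllowRule("tcp", None, "10.0.1.0/24"),  # any TCP port from the cluster's own subnet
]


def is_allowed(protocol: str, port: int, source_ip: str, rules=INBOUND_ALLOWLIST) -> bool:
    """Default-deny check: traffic passes only if some rule explicitly allows it."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and (rule.port is None or rule.port == port)
                and address in ipaddress.ip_network(rule.source)):
            return True
    return False  # everything not explicitly allowed is blocked
```

Real firewalls such as iptables also track connection state, so replies to outbound connections are accepted without an explicit rule; this sketch leaves that out.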

In a typical non-HPC-optimized cloud service, the network interface controllers on all hosts would be virtual network interfaces. Software-defined virtual network interfaces seriously reduce network bandwidth and increase latency to the point where they become the performance-limiting factor in, for example, distributed memory MPI applications. The Kogence cloud HPC platform offers nodes specifically designed for Network Limited Work Loads. These nodes use hardware-based network virtualization: the network card on the host hardware is specifically designed to separate the network traffic of all hosted virtual machines at the hardware level without reducing network bandwidth or increasing latency. Network Limited Work Loads nodes on the Kogence cloud HPC platform provide up to 100 Gbps of network bandwidth. Please also see the separate OS Bypass network document for multi-node distributed memory MPI simulations.

The Fundamentals of IP Network

The Internet Protocol suite (IP suite, sometimes also known as the TCP/IP suite) is the set of communications protocols widely used for the Internet (as well as other networks). The IP suite should not be confused with IPv4 or IPv6, which are just one set of protocols of the Internet Layer of the entire IP suite. The suite was developed for the ARPANET, became widely available through its implementation in BSD UNIX, was later integrated into all common operating systems, and is now maintained by the Internet Engineering Task Force.

The suite contains many different protocols; two of the most important ones are TCP (Transmission Control Protocol) and IP (Internet Protocol). The IP suite, like many other protocol suites, may be viewed as a set of 4 main layers of logic and services. From lowest to highest, these are the Link Layer, the Internet Layer, the Transport Layer and the Application Layer. This 4-layer IP model should not be confused with the popular OSI reference model, which describes networking in 7 more granular layers. Each layer can use any of multiple possible protocols. The OS kernel comes with many utilities that coordinate data packet exchange between the layers, whichever of the acceptable protocols each layer uses.

Each layer solves a set of problems involving the transmission of data and provides a well-defined service to the upper-layer protocols, using services from the lower layers. Upper layers are logically closer to the user and deal with more abstract data, relying on lower-layer protocols to translate data into forms that can eventually be physically transmitted. A very helpful abstraction resulting from the layered protocol mental model is being able to think that the application client (e.g. an ssh client) is directly communicating with the application server (e.g. the ssh daemon) on the remote host, and that the communication is entirely focused on ssh functionality such as authorization, authentication and encryption. In reality, the ssh data packets on the sender host go through a stack of software/logic (part of the OS, device drivers, firmware or hardware) and are wrapped in TCP headers, IP headers and then Ethernet headers. On the recipient host, these data packets go through the same stack of software/logic in reverse order and the headers are removed one by one. Before a data packet is successfully handed over to an upper layer, the lower layers on the sender and recipient hosts may exchange many data packets back and forth to make sure that the data is correct and reliable. Similarly, before the data packets reach the desired destination, they may hop through multiple network devices that the upper layers are completely agnostic of. This way the upper layers can focus on higher-level logic (e.g. ssh authorization, authentication and encryption) and higher-level source and destination addresses, and need not worry about lower-level details such as data integrity, data reliability, data routing etc.
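The wrapping and unwrapping of headers can be pictured as nested containers. This toy sketch uses plain dictionaries; real TCP, IP and Ethernet headers carry many more fields and are binary, and the ports, IPs and MAC addresses below are hypothetical.

```python
from typing import Any, Dict, Tuple


def encapsulate(payload: bytes, ports: Tuple[int, int], ips: Tuple[str, str],
                macs: Tuple[str, str]) -> Dict[str, Any]:
    """Wrap application data in TCP, IP and Ethernet 'headers' (sender side)."""
    segment = {"tcp": {"src_port": ports[0], "dst_port": ports[1]}, "data": payload}
    packet = {"ip": {"src": ips[0], "dst": ips[1]}, "data": segment}
    frame = {"eth": {"src": macs[0], "dst": macs[1]}, "data": packet}
    return frame


def decapsulate(frame: Dict[str, Any]) -> bytes:
    """Strip the headers in reverse order (receiver side)."""
    packet = frame["data"]    # Link Layer removes the Ethernet header
    segment = packet["data"]  # Internet Layer removes the IP header
    return segment["data"]    # Transport Layer hands the payload to the application


# Hypothetical endpoints: an ssh client on 10.0.1.10 talking to sshd on 10.0.1.11.
frame = encapsulate(b"SSH-2.0-example", (50522, 22),
                    ("10.0.1.10", "10.0.1.11"),
                    ("02:00:00:00:00:0a", "02:00:00:00:00:0b"))
print(list(frame))          # ['eth', 'data']: only the outermost header is visible
print(decapsulate(frame))   # b'SSH-2.0-example'
```

The application sees only the payload it sent, which is exactly the "ssh client talks directly to sshd" abstraction described above.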

Link Layer Standards: IEEE 802 LAN

The lowest layer is the Link Layer, which consists of the software and hardware needed to transmit data over the actual physical media. Here we will review some of the most important Link Layer standards: the LAN standards.

A local area network (LAN) is a collection of devices that are physically connected to the same hub, switch or group of interconnected switches, while all being configured to use the same low-level (i.e. Link Layer) communication protocol (i.e. the IEEE 802 protocols). IEEE 802 is a family of IEEE standards dealing with local area networks (LAN) and metropolitan area networks (MAN). It is maintained by the IEEE 802 LAN/MAN Standards Committee (LMSC), which consists of several working groups, each focusing on a particular LAN/MAN standard or technology. The groups are numbered from 802.1 to 802.22. Among the 802 standards, the most widely used are those for Ethernet (802.3), Wireless LAN (Wi-Fi, 802.11) and Bridging and Virtual Bridged LANs (802.1). Other historical LAN protocols include ARCNET and AppleTalk. Today, LAN is synonymous with networks built to communicate using the IEEE 802 standards, while wired LAN is synonymous with Ethernet.

The services and protocols specified in IEEE 802 map to the Link Layer of the IP protocol suite. In the seven-layer Open Systems Interconnection (OSI) networking reference model, this Link Layer is subdivided into two layers (Data Link and Physical). IEEE 802 further splits the OSI Data Link Layer into two sub-layers named logical link control (LLC) and media access control (MAC). For applications running on one host to be able to communicate with other hosts within the network, one also needs to define other communication standards and protocols. For example, Link Layer protocols such as the LAN protocols do not ensure that data packets arrive in order, do not ensure reliability and do not perform any error correction. Moreover, higher-level concepts of ports and sockets need to be defined. For example, if you open two browser tabs as clients requesting different pages from a web server, then your client host needs to be able to figure out how to deliver the content to the correct tab. This is done using the concepts of ports and sockets, as we will see below.
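The two-browser-tab example can be modeled as a table keyed by the connection 4-tuple (local IP, local port, remote IP, remote port). This is a toy sketch of transport-layer demultiplexing, not an OS implementation; `ClientHost` and the addresses are hypothetical. Starting ephemeral ports at 49152 follows the IANA dynamic port range, but the range actually used is OS-dependent.

```python
import itertools
from typing import Dict, Optional, Tuple

Connection = Tuple[str, int, str, int]  # (local_ip, local_port, remote_ip, remote_port)


class ClientHost:
    """Toy transport-layer demultiplexer for a client host."""

    def __init__(self, ip: str, first_ephemeral_port: int = 49152):
        self.ip = ip
        self._ports = itertools.count(first_ephemeral_port)
        self.sockets: Dict[Connection, str] = {}  # connection -> owner (e.g. a browser tab)

    def connect(self, remote_ip: str, remote_port: int, owner: str) -> Connection:
        # Each new connection gets its own ephemeral local port.
        conn = (self.ip, next(self._ports), remote_ip, remote_port)
        self.sockets[conn] = owner
        return conn

    def deliver(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
        # An incoming segment's destination is our local endpoint and its
        # source is the remote endpoint, so swap them to look up the socket.
        return self.sockets.get((dst_ip, dst_port, src_ip, src_port))


host = ClientHost("192.168.1.20")
tab1 = host.connect("203.0.113.5", 443, "tab 1")
tab2 = host.connect("203.0.113.5", 443, "tab 2")
# Both tabs talk to the same server IP and port; only the local port differs.
print(host.deliver("203.0.113.5", 443, "192.168.1.20", tab2[1]))  # tab 2
```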

There can be non-LAN IP networks: the IP protocol suite is designed to work with any Link Layer technology, including ones different from the LAN technologies defined by the IEEE 802 standards. Conversely, you can also potentially access your LAN devices from within the LAN by building a non-IP protocol stack on top of the low-level Link Layer LAN protocols defined by the IEEE 802 standards. But if you need to access these devices over the Internet, then almost certainly you will need to connect your LAN (or your non-LAN network) using network hardware that supports the IP protocol stack on top of LAN or other Link Layer standards (and, of course, the host devices on your network will themselves need IP software stack support).

Within the IP protocol suite terminology, a LAN is a network (or a subnet, as we will see below) as seen by the IP Layer. Therefore, LANs that use the IP protocol suite are sometimes also referred to more specifically as IP networks. From the Link Layer perspective, this network may be divided into network segments that are connected together by switches/bridges and hubs/repeaters to form a LAN. The IP Layer does not know anything about the concept of network segments. Network segments are single collision domains, meaning that only one sender can send a message on the segment at a time, which can lead to congestion if a segment consists of a large number of hosts. Switches and bridges help reduce congestion by breaking LANs into several smaller collision domains (i.e. smaller network segments). To function properly, LANs are configured so that any device can send a broadcast message that is seen by all devices on the LAN. For this reason, LANs are often referred to as broadcast domains.

Link Layer Network Hardware: Network Interface Controller, Repeater, Hub, Bridge and Switch

Network Interface Controller (NIC)

An NIC goes by several common names, such as network interface card, network adapter, LAN adapter or simply network interface. Any device that needs to be connected to a network needs at least one NIC. If a device is attached to multiple networks, then it can have more than one NIC. The NIC is a Link Layer device that provides physical access to a networking medium and, for LANs and other similar networks, provides a low-level addressing system through the use of MAC addresses that are uniquely assigned to each NIC in a network.

A device must have one NIC for each network (not network type) to which it connects. For instance, if a host attaches to two LAN networks, it must have two network cards. When you install a new NIC, you are creating a new network interface. However, some network interfaces are logical and are not associated with an NIC. For instance, the loopback interface, which the host uses to send messages back to itself, has no NIC associated with it. Similarly, virtual network interfaces are not associated with any specific NIC. One can create virtual network interfaces using kernel utilities, as described in a separate section below.

IP protocol suite supports many types of network interfaces including IEEE 802.3 LAN (i.e. the Ethernet LAN), Token-ring (tr) (another common LAN protocol), Serial Line Internet Protocol (SLIP, which is used for serial connections), Loopback (lo, the loopback interface is used by a host to send messages back to itself), FDDI, Serial Optical (so, the Serial Optical interface is for use with optical point-to-point networks using the Serial Optical Link device handler), Point-to-Point Protocol (PPP, the Point to Point protocol is most often used when connecting to another computer or network via a modem) and Virtual IP Address (VIPA).

One should carefully notice the difference between network, network interface and network driver. Defining a network interface can be seen as defining a specific device driver that needs to be used for traffic going through that network interface and defining some other basic properties so that the IP layer knows which traffic should be routed to that network interface. Even virtual network interfaces have network drivers defined for them. We use the term NIC for both the actual network interface hardware as well as the virtual network interfaces that are defined in software and don't have hardware associated with them.

Each NIC is assigned an IP address that belongs to the subnet it connects to. Most OSs are configured by default to obtain IP settings (IP address, subnet mask and default gateway) automatically from a Dynamic Host Configuration Protocol (DHCP) server. The network devices act as DHCP clients and broadcast a DHCP request, and any DHCP server on the network can respond. Most modern network routers have a built-in DHCP server, and administrators of larger networks can also deploy dedicated DHCP servers on the network. If no DHCP server responds, it is common practice for OSs to assign themselves an IP address in the 169.254.x.x range, which is referred to as link-local IP address autoconfiguration (Microsoft calls it APIPA). Network administrators may also assign IP addresses manually, as described later in a separate section. Network devices can then use the ARP protocol (see below) to make sure that no two devices have the same IP address on the network.
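The address-selection order described above (DHCP first, link-local fallback, conflict check) can be sketched as follows. This is a simplified model, assuming the 169.254.1.0 to 169.254.254.255 candidate range from the IPv4 link-local specification (RFC 3927); the function `choose_address` is hypothetical, and a set lookup stands in for the real ARP probe.

```python
import ipaddress
import random
from typing import Optional, Set


def choose_address(dhcp_offer: Optional[str], addresses_in_use: Set[str],
                   rng: random.Random) -> str:
    """Toy model of how a host picks its IPv4 address at boot."""
    if dhcp_offer is not None:
        return dhcp_offer  # normal case: a DHCP server answered the broadcast
    # Fallback: link-local autoconfiguration picks a random candidate in
    # 169.254.1.0 - 169.254.254.255 (loops forever if every one is taken;
    # acceptable for a sketch).
    while True:
        candidate = f"169.254.{rng.randint(1, 254)}.{rng.randint(0, 255)}"
        # Checking a set stands in for the ARP probe that asks whether any
        # other device on the link already uses this address.
        if candidate not in addresses_in_use:
            return candidate


address = choose_address(None, set(), random.Random(0))
print(address, ipaddress.ip_address(address).is_link_local)
```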

Each NIC typically knows its own IP address, its subnet mask and the IP address of the default gateway. The subnet mask defines the set of IP addresses that belong to the subnet the NIC is connected to, while the default gateway is the default route for all IP addresses that do not belong to the subnet. One can also define additional static routes for the NIC. A static route is simply a way of specifying a route for traffic that does not belong to the network the NIC is connected to but that must also not go through the default gateway; for example, one can add a static route to a different network that cannot be reached through the default gateway. As discussed below, the routes for all NICs are maintained by the kernel in a routing table.
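A routing-table lookup combining the three kinds of routes above (the NIC's own subnet, a static route and the default gateway) can be sketched with longest-prefix matching, which is how kernels typically choose among overlapping routes. The table entries and gateway addresses below are hypothetical.

```python
import ipaddress

# Hypothetical routing table for a host whose NIC sits in 10.0.1.0/24.
# Each entry is (destination network, next-hop gateway); a gateway of None
# means the destination is on-link and is reached directly.
ROUTES = [
    (ipaddress.ip_network("10.0.1.0/24"), None),            # the NIC's own subnet
    (ipaddress.ip_network("172.16.0.0/16"), "10.0.1.254"),  # static route
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.1.1"),        # default gateway
]


def next_hop(destination: str) -> str:
    """Return the address the packet is handed to next (longest prefix wins)."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in ROUTES if dst in net]
    net, gateway = max(matches, key=lambda route: route[0].prefixlen)
    return str(dst) if gateway is None else gateway


print(next_hop("10.0.1.42"))   # 10.0.1.42  (same subnet: delivered directly)
print(next_hop("172.16.5.9"))  # 10.0.1.254 (static route)
print(next_hop("8.8.8.8"))     # 10.0.1.1   (default gateway)
```

Because `0.0.0.0/0` matches every IPv4 address with the shortest possible prefix, the default gateway is chosen only when no more specific route exists.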

MAC Addresses and Private IP Addresses

An NIC connects the device to a network (or a subnet). Each NIC has a unique private Link Layer address (which is the MAC address in case of a LAN Link Layer) as well as a unique private IP Layer address (which is the private IP address in case of IPv4 or IPv6 IP Layer). Both of these addresses are unique within the network (or the subnet) that the NIC is connecting to.

As we will see below in the direct and indirect routing examples, both of these private addresses are needed. Depending upon the protocols each layer is using, these two unique private addresses may be called something different, but both are required. The IP address defines the address of the final destination. As a packet hops through multiple devices on the network (or across networks), at each hop the Link Layer puts the Link Layer address (i.e. the MAC address in the case of a LAN) of the next hop in the packet. The next hop looks at the IP address and decides whether the packet is meant for that device or needs to be transmitted to the next hop. If it needs to go to the next hop, then that device's Link Layer address is substituted in the Link Layer header before transmitting. Therefore we need both headers: the inner header containing the unique private address of the final destination and the outer header containing the unique private address of the next hop. In principle, it is possible to use private IP addresses in both headers (of course, the addresses in the two headers will differ at each hop), but in LAN networks the MAC address is used as the Link Layer private address. The MAC address is a concept defined in the LAN standards (which can be used as the Link Layer protocol in the TCP/IP stack). TCP/IP and the LAN protocols were developed independently, and TCP/IP can use non-LAN protocols for the Link Layer. TCP/IP, on the other hand, has its own local/private addressing system through private IP addresses.
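The difference between the outer (next hop) and inner (final destination) addresses can be shown by following one frame along a path. In this toy sketch only the destination fields are modeled (real routers also rewrite the source MAC), and the MAC addresses and IP below are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Iterator, List


@dataclass(frozen=True)
class Frame:
    dst_mac: str  # outer (Link Layer) header: address of the NEXT hop
    dst_ip: str   # inner (Internet Layer) header: address of the FINAL destination


def forward_along(frame: Frame, next_hop_macs: List[str]) -> Iterator[Frame]:
    """Yield the frame as it looks on each link of the path.

    At every hop only the Link Layer destination is rewritten; the IP
    destination stays the same end to end.
    """
    for mac in next_hop_macs:
        frame = replace(frame, dst_mac=mac)
        yield frame


path = ["02:00:00:00:01:01", "02:00:00:00:02:01", "02:00:00:00:03:07"]  # router, router, host
for hop in forward_along(Frame(dst_mac="", dst_ip="10.0.3.7"), path):
    print(hop.dst_mac, hop.dst_ip)
```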

There is a prevalent misconception that MAC addresses are unique across all network devices on a network as well as across all networks on the entire planet. This is not correct. First of all, MAC addresses are associated with network interfaces (e.g. network cards), i.e. the hardware that allows hosts to connect to a network, and this hardware can be swapped. Network interfaces can also be implemented in software, meaning they can be virtual. MAC addresses can also easily be spoofed, for example by software that reports a different MAC address in response to any request for identification. From the protocol perspective, MAC addresses are only required to be unique within a network (e.g. within a LAN). As we will see below, if you connect multiple routers together, then within that network of routers it is the MAC addresses of the routers that need to be unique. The MAC addresses of the devices behind each router should be unique as well, but two devices behind two different routers are allowed to have the same MAC address. The same is true for private IP addresses.

IEEE allocates MAC address blocks to hardware manufacturers, so the MAC addresses of the hardware they sell tend to be globally unique. This is done because sellers do not know how you will connect the device to the network: the hardware they sell might itself act as a router, in which case its MAC address needs to be unique, or it might be used behind a router, in which case it does not need to be unique. With network hardware being virtualized, things are becoming much more complex.
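Two bits of the first MAC octet record how an address was assigned: bit 0 (the I/G bit) marks a group (multicast) address, and bit 1 (the U/L bit) marks a locally administered address rather than one from a manufacturer's IEEE-assigned block. Software-generated addresses for virtual interfaces typically set the U/L bit. The helper `mac_flags` and the example addresses are hypothetical.

```python
def mac_flags(mac: str) -> dict:
    """Decode the assignment bits of a 48-bit MAC address."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    first = int(octets[0], 16)
    return {
        "multicast": bool(first & 0x01),             # I/G bit: group vs individual address
        "locally_administered": bool(first & 0x02),  # U/L bit: set for locally assigned addresses
        # First 3 octets; they identify a manufacturer only when the U/L bit is 0.
        "oui": ":".join(octets[:3]).upper(),
    }


print(mac_flags("00:1b:21:3a:4f:10"))  # arbitrary example of a universally administered address
print(mac_flags("02:00:00:00:00:0a"))  # locally administered, as typical for virtual interfaces
```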

IP Aliasing, Virtual Network Interfaces and Sub-Interface

Most OS kernels allow two or more private IP addresses to be assigned to a single network interface (this is known as IP aliasing). Either way, the concepts of direct and indirect routing discussed below in a separate section work the same way.

Assigning single and multiple IP addresses manually to a device is a potential security risk as it may expose traffic from a different subnet if a different subnet IP address is added by mistake. Basically, the NIC on which 2 different subnet IP addresses are added would become a gateway for those 2 subnets and can route traffic between the two subnets. Kogence cloud HPC platform uses DHCP server for automatic assignment of IP addresses to NIC and we do not assign multiple IP addresses to same NIC.

One can also define virtual interfaces or sub-interfaces of an existing hardware network interface using, for example, the ip utility, as discussed later. Sub-interfaces are virtual interfaces configured such that the traffic from each sub-interface goes through the primary hardware NIC. Virtual interfaces, on the other hand, do not have to be connected to an actual hardware NIC. As an example, the bridge networks defined by the Docker daemon by default only route traffic between the containers on the same host and are not connected to a hardware NIC. In that case, the host machine's routing table has a route to the bridge network, so even though the host can communicate with the containers, other devices on the host's network cannot reach the containers. For more details please see the Container Networking document.

Processes that use the IP address of a virtual interface as their source address can send packets through any real hardware network interface (if connected) that provides the best route to the destination. Incoming packets destined for a virtual IP address are delivered to the process regardless of the interface through which they arrive. The routing algorithm is agnostic of the fact that the source IP address belongs to a virtual interface and works exactly as discussed below in a separate section.

Repeaters and Hubs

A repeater is one of the simplest 2-port network devices, and a hub is simply a multi-port repeater. Hubs and repeaters are basically signal amplifiers that help extend the range and physical separation between hosts. There is always a limit on the physical separation between hosts, defined in the IEEE 802 standards. That said, basic repeaters are not actually used in networks wired with 10/100BASE-T cables; they used to be found in thinnet LANs.

A hub rebroadcasts a packet received on any one port to all of its other ports (except the port it was received on). Every host then checks the MAC address to see whether the packet was meant for it. Hubs create collision domains, meaning that collisions are possible if two devices try to send data simultaneously. Each host must therefore use CSMA/CD to check that nobody else is transmitting before attempting to send a packet. In hubs, the bandwidth is shared among all ports, meaning that if one port is sending a lot of data, the others may not be able to send anything. Hubs can be daisy-chained. A hub creates a logical bus (not star) network topology, and this remains true (logically) even if hubs are daisy-chained. The entire network connected by one or more hubs is considered a single network segment, and the entire network segment is a single collision domain.

A network segment is a Layer 2 concept (the Data Link sub-layer of the Link Layer). As far as the IP Layer is concerned, all network segments belong to the same "network" or the same "sub network". Sub networks and super networks, on the other hand, are IP Layer concepts. Sub-networks have different IP addresses (or different IP address ranges), and the IP range of a sub-network is a subset of the IP range of its super-network.

Hubs used to be a simple way of making local area networks (LAN) but nowadays hubs are almost completely replaced by switches.

Switches and Bridges

A bridge is a 2-port network device and a switch is simply a multi-port bridge.

Since a network segment is a single collision domain, having a large number of hosts on a single network segment can cause heavy congestion and choke traffic. To reduce congestion, we may organize hosts into smaller network segments and then connect all the network segments to a switch. As discussed above, each segment may have several nodes hooked to an independent hub, and all the hubs can then be connected to a switch.

A bridge/switch is a slightly smarter version of a hub/repeater. A switch/bridge does look inside the destination MAC address field of data packets. It keeps a table mapping the MAC addresses of connected devices to the switch/bridge port to which each device's network segment is connected; it learns these entries from the source MAC addresses of the packets it receives. Once the MAC address table is populated, a bridge/switch knows which devices (MAC addresses) are in which segment (switch/bridge port). If the switch recognizes the destination MAC address of a data packet, it passes the data only to the desired port, and hence only to the desired network segment, rather than to all of its ports, network segments and devices. If it does not recognize the address, it forwards the packet to all of its ports (except the one it arrived on). This way a bridge/switch is able to reduce congestion.
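The learn-then-forward behavior can be captured in a few lines. This is a toy model of a learning switch (no table aging, VLANs or spanning tree); the class `LearningSwitch` and the short placeholder MAC strings are hypothetical.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of a learning switch (a multi-port bridge)."""

    def __init__(self, num_ports: int):
        self.ports = list(range(num_ports))
        self.mac_table: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the ports a frame arriving on in_port is sent out of."""
        self.mac_table[src_mac] = in_port  # learn the sender's location
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown (or broadcast) destination: flood like a hub, except
            # back out of the port the frame arrived on.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the sender's own segment: filter
        return [out_port]


switch = LearningSwitch(4)
print(switch.receive(0, "aa", "bb"))  # [1, 2, 3]  "bb" not learned yet: flood
print(switch.receive(1, "bb", "aa"))  # [0]        "aa" was learned on port 0
print(switch.receive(0, "aa", "bb"))  # [1]        now only the right segment gets it
```

A broadcast address never appears as a source, so it is never learned and is always flooded, which is what keeps the LAN a single broadcast domain.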

Switches are Link Layer devices and do not know anything about IP addresses; switches do not look at IP addresses. As we will see later, the mapping from IP addresses to MAC addresses is maintained by network interface software that is part of the IP Layer software suite and is typically part of the OS.

Bridge/switch forms a star topology of network. A bunch of network segments connected together through bridges/switches represents a homogeneous single network as far as IP Layer is concerned. They are different network segments only at the Data Link layer level. Network-segments and MAC addresses are also Link Layer concepts.

A switch generally provides dedicated bandwidth to each of its ports, meaning that one port cannot hog all the bandwidth and stop other devices from communicating with each other.

Internet Layer Standards: IPv4

The Internet Layer is responsible for routing data packets between internet gateways. The Link Layer takes care of transporting data from the host to the internet gateway, and the Transport Layer takes care of transporting data from the host to the application.

IPv4, IPv6, ICMP and IGMP are some common examples of Internet Layer protocols. Internet Layer protocol defines the concepts of IP addresses and routing algorithms. Two versions of the Internet Protocol (IP) are very common: IP Version 4 (IPv4) and IP Version 6 (IPv6). Each version defines an IP address differently. Because of its prevalence, the generic term IP address typically still refers to the addresses defined by IPv4.

Subnet, IPv4 CIDR, Private Address Space, Network Address Translation and Port Forwarding

Early network design, when global end-to-end connectivity was envisioned for communications with all Internet hosts, intended that IP addresses be uniquely assigned to every computer or device connected to the Internet (directly or indirectly). IPv4 addresses are 32 bits long, divided into 4 segments of eight bits (known as "octets") and typically written as A.B.C.D, where A, B, C and D are each numbers between 0 and 255. In the earliest scheme, A was treated as the address of the network (for example, ARPANET was network number 10), and B, C and D identified a unique host on that network. The total available address space is 2^32, about 4.3 billion addresses.
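Since an IPv4 address is just a 32-bit number, the dotted-quad notation can be converted by shifting each octet into place. A minimal sketch; the function names are hypothetical, and `early_network_number` only illustrates the earliest "first octet is the network" scheme described above.

```python
def to_int(dotted: str) -> int:
    """Pack a dotted-quad string A.B.C.D into one 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each octet contributes 8 bits
    return value


def to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into A.B.C.D notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def early_network_number(dotted: str) -> int:
    """In the earliest scheme, the first octet (A) identified the network."""
    return to_int(dotted) >> 24


TOTAL_IPV4_ADDRESSES = 2 ** 32  # 4,294,967,296, i.e. about 4.3 billion

print(to_int("10.0.0.1"))               # 167772161
print(early_network_number("10.2.3.4")) # 10 (the ARPANET network number)
```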

Of course this could not work with the current scales of the internet. A combination of private address space, subnetting and network address translation (NAT) is now common in all modern networks to allow the scale of current internet.


IP networks are hierarchical, meaning that a network can be broken down into subnets (a subnet is itself a proper IP network). Everything behind a router is a subnet. A group of interconnected routers can form a network of subnets. A host that wants to connect to two or more subnets needs one NIC per subnet, with each NIC connecting to one subnet. Each NIC receives a separate IP address from its subnet.

Note that a subnet is itself a proper IP network. Therefore the terms subnet and network are often used interchangeably, and the meaning should be clear from the context.

Subnet Masks

A subnet has a range of IP addresses. For example, instead of writing the range 1.2.0.0 - 1.2.255.255, we usually shorten it to 1.2.0.0/16. Hosts in the subnet can be assigned one of these addresses as their private IP address (private within that subnet). Each number between the dots in an IP address is actually 8 binary digits (00000000 to 11111111), which we write in decimal form (between 0 and 255) to make it more readable. The /16 means that the first 16 binary digits are the network address of the subnet; in other words, the 1.2.*.* part is the network address and the last 16 bits can vary among the devices on the subnet. This means that any IP address beginning with 1.2 is part of the subnet, while an address beginning with, say, 1.3 is not. In this example, the subnet mask of the subnet is 255.255.0.0.
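The subnet mask test described above is a bitwise AND: an address belongs to the subnet when masking it leaves the same network bits as masking the subnet address. A minimal Python sketch using the 1.2.0.0/16 example (the helper names are illustrative); the standard ipaddress module is used only to print the mask in dotted form and to cross-check the result.

```python
import ipaddress


def ipv4_to_int(dotted: str) -> int:
    """Pack A.B.C.D into a 32-bit integer."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def prefix_to_mask(prefix_len: int) -> int:
    """A /N prefix is N one-bits followed by (32 - N) zero-bits."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def in_subnet(address: str, network: str, prefix_len: int) -> bool:
    """True when the address has the same network bits as the subnet."""
    mask = prefix_to_mask(prefix_len)
    return (ipv4_to_int(address) & mask) == (ipv4_to_int(network) & mask)


print(ipaddress.IPv4Address(prefix_to_mask(16)))  # 255.255.0.0
print(in_subnet("1.2.3.4", "1.2.0.0", 16))         # True
print(in_subnet("1.3.0.1", "1.2.0.0", 16))         # False

# Cross-check with the standard library.
print(ipaddress.ip_address("1.2.3.4") in ipaddress.ip_network("1.2.0.0/16"))  # True
```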

We usually use subnets ending in /8, /16 and /24 because their boundaries fall on the dots between octets, which makes them easier to read, even though any prefix length is allowed. For example, a /8 subnet such as a.0.0.0/8 is big, containing every address from a.0.0.0 to a.255.255.255 (over 16 million addresses). A /16 subnet is smaller, containing 65,536 addresses, and a /24 subnet is smaller still, containing 256 addresses. A /32 subnet contains only one IP address.
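Each extra prefix bit halves the subnet, so a /N subnet holds 2^(32-N) addresses. The sketch below checks that formula against Python's ipaddress module; the base network 10.0.0.0 is an arbitrary example.

```python
import ipaddress


def subnet_size(prefix_len: int) -> int:
    """Number of IPv4 addresses in a /prefix_len subnet."""
    return 2 ** (32 - prefix_len)


for prefix_len in (8, 16, 24, 32):
    net = ipaddress.ip_network(f"10.0.0.0/{prefix_len}")
    assert net.num_addresses == subnet_size(prefix_len)
    # net[0] is the first address in the block, net[-1] the last one.
    print(f"{net}: {net[0]} to {net[-1]}, {net.num_addresses:,} addresses")
```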

Private Address Space

The Internet Engineering Task Force (IETF) has directed the Internet Assigned Numbers Authority (IANA) to reserve the following IPv4 address ranges for private networks: 10.0.0.0 - 10.255.255.255 (10/8), 172.16.0.0 - 172.31.255.255 (172.16/12) and 192.168.0.0 - 192.168.255.255 (192.168/16). Hosts in different private subnets can use these IPv4 addresses simultaneously; we call them private IP addresses. These address spaces are not routed by global internet routers. They are only routed by the private routers attached to these private subnets.
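A quick way to check whether an address falls in one of the three private ranges is to test membership in each block. A minimal Python sketch (the name is_rfc1918 is illustrative; RFC 1918 is the IETF document that defines these ranges). Note that the standard library's IPv4Address.is_private is broader: it also returns True for loopback, link-local and several other reserved ranges.

```python
import ipaddress

# The three IPv4 blocks reserved for private networks.
PRIVATE_BLOCKS = (
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 - 192.168.255.255
)


def is_rfc1918(address: str) -> bool:
    """True if the address lies in one of the private IPv4 blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in PRIVATE_BLOCKS)


for candidate in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.0.10", "8.8.8.8"):
    print(candidate, is_rfc1918(candidate))
```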

If a network is designed to consist of multiple subnets, it is recommended to use a small, separate (non-overlapping) private address range for each subnet. That way routers and gateways can route traffic between subnets as needed without any Network Address Translation (NAT). To route traffic between subnets that use the same private address space, one would need to configure routers and gateways to use NAT and PAT, as discussed below. Traffic going out to the internet and coming back always needs NAT (and potentially PAT).
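Before assigning address spaces to several subnets, it is worth checking that none of them overlap, since overlapping ranges are exactly what forces NAT between subnets. A Python sketch using the overlaps() method of the standard ipaddress module; the address plans are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of subnets in the plan whose address ranges overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


# Hypothetical address plans.
separate_plan = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]  # routable without NAT
clashing_plan = ["10.0.0.0/16", "10.0.5.0/24"]                 # second block sits inside the first

print(find_overlaps(separate_plan))  # []
print(find_overlaps(clashing_plan))  # [('10.0.0.0/16', '10.0.5.0/24')]
```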

The address of a destination host is specified simply by an IP address (not by a mask). As discussed later in a separate section on routing, a network device's kernel routing table contains the subnet masks of all subnets that the device can reach through its NICs. If a network of multiple subnets is properly configured with non-overlapping private address spaces, then the destination IP address falls in at most one of those subnets. If it falls in one of them, the packet is sent to the NIC that connects to that subnet, and the NIC uses Layer 2 (LAN) protocols to wrap it in a LAN frame and send it directly to the proper MAC address. If it falls in none of them, the packet is passed to the MAC address of the router (i.e. the default gateway). The router then either sends it to another subnet (if it knows that the address is in another subnet) or to its own default gateway. However, the router forwards a packet to the internet gateway only if its destination is not one of the private addresses defined by IANA.

Network Address Translation and Port Forwarding

With private subnets, hosts need Network Address Translation (NAT), and in more advanced use cases port forwarding (or Port Address Translation, PAT), to access hosts on other networks or on the internet, and vice versa. Private IP addresses are private to the subnet, meaning that hosts on different subnets can have the same private IP address, whereas public IP addresses are unique across the entire internet. A subnet can be referred to as public if its router has a route to the internet gateway. This does not necessarily mean that hosts inside the subnet are accessible from the internet; it means that hosts in the subnet can access the internet because a route to the internet gateway is defined. This is enabled by NAT, which most routers can do. Ports and port forwarding are actually Transport Layer concepts, so routers that can do both NAT and PAT have both Internet Layer and Transport Layer capabilities. For outgoing traffic, the router replaces the private source IP address and the application's source port with the router's public IP address and a specific source port, and records that mapping in a translation table. When the reply comes back, the router uses the table to restore the original destination IP address and port so that the packet can be routed inside the private subnet to the proper destination.
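The outbound and return translation described above can be modelled as a pair of lookup tables. This is a simplified, hypothetical Python model: real NAT devices key their table on the full connection tuple, track protocol state and expire idle entries. The public address 203.0.113.7 and the other addresses come from documentation and private ranges and are purely illustrative.

```python
import itertools


class NatTable:
    """Toy source NAT with port translation (PAT) for one public IP address."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self._out = {}   # (private_ip, private_port) -> public_port
        self._back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of an outgoing packet and remember the mapping."""
        key = (src_ip, src_port)
        if key not in self._out:
            public_port = next(self._ports)
            self._out[key] = public_port
            self._back[public_port] = key
        return (self.public_ip, self._out[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the destination of a reply; None means no mapping (drop)."""
        if dst_ip != self.public_ip or dst_port not in self._back:
            return None
        private_ip, private_port = self._back[dst_port]
        return (src_ip, src_port, private_ip, private_port)


# Packets are written as (source IP, source port, destination IP, destination port).
nat = NatTable("203.0.113.7")
print(nat.outbound("192.168.1.10", 51000, "198.51.100.20", 443))
# ('203.0.113.7', 40000, '198.51.100.20', 443)
print(nat.inbound("198.51.100.20", 443, "203.0.113.7", 40000))
# ('198.51.100.20', 443, '192.168.1.10', 51000)
print(nat.inbound("198.51.100.99", 5555, "203.0.113.7", 22))
# None: unsolicited traffic has no entry in the table
```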

Note that hosts inside the subnet cannot be reached from the internet even with NAT in place. For example, from any host outside the subnet you cannot ssh either to the public IP address of the router or to the private IP address of a host in the subnet. Port forwarding in the router handles such use cases: one can define a rule in the router specifying that all traffic arriving on port 22 of the router's public IP address should be forwarded to a specific port (which could be port 22 itself) of a specific private IP address in the subnet whose ssh daemon is listening on that port.
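A port forwarding rule is essentially a static lookup from a public port to a private address and port. A minimal, hypothetical Python sketch: the public address, the private hosts and the second rule on port 2222 are made up for illustration.

```python
PUBLIC_IP = "203.0.113.7"  # hypothetical public address of the router

# Hypothetical forwarding rules: (protocol, public port) -> (private IP, private port)
FORWARD_RULES = {
    ("tcp", 22): ("192.168.1.10", 22),    # ssh to the router reaches host .10
    ("tcp", 2222): ("192.168.1.11", 22),  # a second host exposed on another port
}


def forward(protocol, dst_ip, dst_port):
    """Return the private (IP, port) for an incoming packet, or None to drop it."""
    if dst_ip != PUBLIC_IP:
        return None
    return FORWARD_RULES.get((protocol, dst_port))


print(forward("tcp", PUBLIC_IP, 22))    # ('192.168.1.10', 22)
print(forward("tcp", PUBLIC_IP, 2222))  # ('192.168.1.11', 22)
print(forward("tcp", PUBLIC_IP, 80))    # None
```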

In addition, hosts in the subnet can also be connected directly to the internet (meaning they have a public IP address in addition to a private subnet IP address). Each such host has at least two NICs, one connecting it to the subnet and the other connecting it to the internet gateway.

In all these network architecture examples, the internet gateway can be replaced by a virtual private gateway if the subnet has to sit behind a corporate VPN.

Classless Inter-Domain Routing

Together with private address spaces and private subnets as discussed above, Classless Inter-Domain Routing (CIDR) replaced classful routing in IPv4 starting in about 1993. In CIDR, address spaces are described by a combination of an IP address and a prefix length (equivalently, a subnet mask), such as a.b.c.d/16. Every network interface in a network should have an IP address from the subnet it connects to. DHCP servers make sure that all hosts in a subnet get proper private IP addresses within that subnet. If network admins assign IP addresses manually to each NIC, they need to ensure consistency; otherwise packets will not route properly.

With CIDR, subnets can be of any arbitrary size as determined by the subnet mask (i.e. they do not have a class). In the past, IP addresses were classful: the first few bits of an IP address determined its class. Some addresses belonged to very large networks while others belonged to smaller ones, and depending on the class of the IP address, global routers would route the packets to a subordinate router. This scheme is no longer used with CIDR.
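For contrast, the old classful scheme read the class, and therefore the size of the network, from the leading bits of the first octet. The Python sketch below reproduces that rule (the function name is illustrative). Under CIDR the prefix length is stated explicitly instead, so a block such as 10.1.2.0/24 is perfectly valid even though 10.x.x.x was historically a class A network.

```python
import ipaddress

# Historical default network-prefix length for the unicast classes.
CLASSFUL_PREFIX = {"A": 8, "B": 16, "C": 24}


def classful_class(address: str) -> str:
    """Return the pre-CIDR class implied by the leading bits of the first octet."""
    first = int(address.split(".")[0])
    if first < 128:
        return "A"  # leading bit 0
    if first < 192:
        return "B"  # leading bits 10
    if first < 224:
        return "C"  # leading bits 110
    if first < 240:
        return "D"  # leading bits 1110 (multicast)
    return "E"      # leading bits 1111 (reserved)


for addr in ("10.1.2.3", "172.16.0.1", "192.168.1.1", "224.0.0.1"):
    cls = classful_class(addr)
    print(addr, cls, CLASSFUL_PREFIX.get(cls))

# With CIDR the prefix is explicit and need not match the old class.
print(ipaddress.ip_network("10.1.2.0/24").num_addresses)  # 256
```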

Internet Layer Network Hardware: Network Interface Controller, Router and Gateway


Routers are known as Layer 3 devices because they operate at the third layer (the network layer) of the OSI model. The most popular protocol used at Layer 3 is IP, so routers are sometimes also called IP routers. A router connects two (or more) networks/subnets. Each network can contain multiple devices and independently assigns its own private IP addresses to them. The router gets an IP address from each of the networks it connects. Routers typically maintain routing tables so that they can quickly decide where to send each packet.

Routers have a significant amount of logic built in, and modern routers serve several functions. Even the simplest routers act as a switch, a network gateway, a DHCP server and a NAT gateway at the same time, and many offer further features.

As noted above, IP networks are hierarchical: everything behind a router is a subnet, and a group of interconnected routers forms a network of subnets.

In its basic function, a router connects two networks. It has two NICs, one for each network, and each NIC receives an IP address from the network it is connected to. If the router is connected to the internet, one of these IP addresses is known as its public IP address and the other as its private IP address. To route a packet, the router looks at the packet's destination IP address: if the address is in one of the two networks, it forwards the packet to that network; otherwise it forwards it to the default gateway.

When the router is switched on, its DHCP server functionality assigns IP addresses to devices on the subnet.

As a switch for the private subnet, the router keeps a table of the MAC address, connected port and assigned private IP address of each connected device. It also keeps information about the next closest router it is connected to, typically called the "default gateway"; that information belongs to the router's routing table.

Routers also act as NAT gateways, allowing, for example, one internet connection (one public IP address) to be shared by multiple devices in the subnet. A switch or hub, by contrast, creates a network but does not connect two networks, so it cannot share an internet connection. A switch also does not have DHCP capability; DHCP is usually performed by a router.


A gateway has two main functions. First, as the name suggests, it is the entry door to a network/subnet. Second, it may be used for protocol translation: if two networks being connected run on two different protocols, a gateway may be needed to translate between them. Since the vast majority of networks nowadays are TCP/IP networks, "gateway" typically just means the entry door to a network/subnet.

In this sense, every router is a gateway for the subnets connected to it; it is the default gateway for those subnets, meaning that any packet destined for an IP address outside the subnet is sent to the router. The router's own kernel routing table defines another default gateway: any address that is not in one of the subnets the router connects is forwarded to that default gateway (which may also be called the internet gateway, depending on the context). This next closest router is the gateway of the other network.

In this sense, a router combines the functions of a gateway and a switch. Every router is a gateway, but not every gateway is a router (it may just be a switch, or just have a route to the switch of the target subnet).

Transport Layer Standards: TCP and UDP

TCP (for point-to-point communication) and UDP (also used for multicasting and broadcasting) are the two most common examples of Transport Layer protocols. The Transport Layer encapsulates application data blocks and passes them to the lower Internet Layer. In an abstract sense, just as a browser appears to communicate directly with a webserver at the Application Layer, the transport layer of one host talks directly to the transport layer of the other. Connection-oriented transport protocols such as TCP make sure that they deliver error-free, reliable data to the application layer.

TCP deals with data integrity and correct sequencing of data, while the Internet Layer below it deals with routing the packets. Both Transport Layer and Internet Layer protocols are typically implemented within the OS kernel, although they are not essential components of an OS: for example, you can compile Linux without networking, even though that is very rare. Application Layer programs, on the other hand, run in user space and are not part of the OS kernel. Link Layer protocols are typically implemented in hardware (special chipsets, firmware) or as device drivers. The OS still includes Link Layer components (e.g. network interface software) that allow the IP suite stack to work with many different Link Layer technologies such as Ethernet or Wi-Fi.

TCP packets are called segments, while UDP packets are called datagrams. TCP is a fairly elaborate protocol; such protocols are known as connection-oriented protocols (point-to-point, as opposed to multicasting; read below). TCP first exchanges a handshake to establish a connection. It then sends data as numbered segments, which the receiver acknowledges and reorders. It also detects errors using checksums and recovers lost packets through automatic repeat requests (retransmission), and it provides flow control and congestion control.

UDP, on the other hand, is a very simple protocol: it provides neither virtual circuits nor reliable communication, delegating these functions to application programs built on UDP transport instead of TCP transport. TCP is used for many protocols, including HTTP web browsing and email transfer. UDP may be used for multicasting and broadcasting, since retransmission to a large number of hosts is not practical. TFTP (trivial file transfer protocol, a little brother of FTP), dhcpcd (a DHCP client), multiplayer games, streaming audio, video conferencing, etc. typically use UDP transport. For loss-tolerant applications like games, audio or video, you simply ignore the dropped packets, or perhaps try to compensate for them cleverly in the Application Layer. Why would you use an unreliable underlying protocol? For speed. It is much faster to fire and forget than to keep track of what has arrived safely and make sure it is in order. If you are sending chat messages, TCP is great; if you are sending 40 positional updates per second for the players in a game world, it may not matter much if one or two get dropped, and UDP is a good choice.
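The fire-and-forget nature of UDP is visible in the socket calls: there is no connection setup, and each sendto() is an independent datagram. A minimal Python sketch over the loopback interface (the payload is made up; binding to port 0 lets the kernel choose a free port). On loopback the datagram practically always arrives, while on a real network it may be lost without any error being reported.

```python
import socket

# Receiver: bind a datagram socket to a kernel-chosen port on loopback.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # do not block forever if the datagram is lost
addr = receiver.getsockname()

# Sender: no connect() or handshake; each sendto() is a standalone datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"position x=1 y=2", addr)

data, peer = receiver.recvfrom(1024)
print(data)  # b'position x=1 y=2'

sender.close()
receiver.close()
```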

Just as the Application Layer protocol HTTP introduced the concept of a URL, the Transport Layer introduces the concept of ports and sockets, the Internet Layer introduces the concept of IP addresses, and the Link Layer introduces the concept of MAC addresses.

The Transport Layer can receive data from several different applications, such as FTP or HTTP clients. Hence the TCP header contains the port numbers of the communicating applications together with sequence numbers, a checksum and other control fields. Port numbers are used to decide which data belongs to which application; different applications talk and listen on different ports. Note that the IP address is NOT part of the TCP header: IP addresses are part of the Internet Layer headers, not the Transport Layer headers.

When applications communicate via IP, they must specify not only the target's IP address but also the port of the application. On a given host, a port number (together with the protocol, TCP or UDP) identifies the application listening on it. Standard network applications on the server side use standard port numbers. For example, HTTP (the web) uses port 80, HTTPS port 443, SSH port 22, Telnet port 23, SMTP port 25, and so on. Ports below 1024 are considered special and usually require special OS privileges to use. For example, a non-privileged user cannot start a webserver listening on port 80, but can start it on a non-privileged port (such as 8080). These registered port numbers can be seen in /etc/services.
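The /etc/services file is plain text with one service per line: a name, a port/protocol pair, and optional aliases and comments. The Python sketch below parses a short excerpt written in that format (the excerpt is embedded so the example runs anywhere; on a Linux host you could read /etc/services itself) and flags privileged ports. The helper names are illustrative.

```python
# Illustrative excerpt in /etc/services format: name  port/protocol  [aliases]  [# comment]
SAMPLE_SERVICES = """\
ssh             22/tcp                  # SSH Remote Login Protocol
telnet          23/tcp
smtp            25/tcp          mail
http            80/tcp          www     # WorldWideWeb HTTP
https           443/tcp
"""


def parse_services(text: str) -> dict:
    """Map (service name or alias, protocol) -> port number."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, proto = port_proto.split("/")
        for service in (name, *aliases):
            table[(service, proto)] = int(port)
    return table


def is_privileged(port: int) -> bool:
    """Ports below 1024 normally need root (or CAP_NET_BIND_SERVICE on Linux)."""
    return port < 1024


services = parse_services(SAMPLE_SERVICES)
print(services[("https", "tcp")], is_privileged(services[("https", "tcp")]))  # 443 True
print(services[("www", "tcp")])                                               # 80
print(is_privileged(8080))                                                    # False
```

On a real host, Python's socket.getservbyname("ssh", "tcp") performs the same lookup against the system's services database.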

Just like IPv4 and IPv6, the TCP and UDP protocols are part of the OS kernel. Applications use them through an interface known as sockets. When Unix programs do any sort of I/O, they do it by reading from or writing to a file descriptor. A file descriptor is simply an integer associated with an open file, and a "file" can be a network connection, a FIFO, a pipe, a terminal, a real on-disk file, or just about anything else. Everything in Unix is a file! So when you want to communicate with another program over the internet, you do it through a file descriptor, and that descriptor is a socket. You call the socket() system call to open a socket; it returns the socket descriptor, and you communicate through it using the specialized send() and recv() socket calls. Since it is a file descriptor, you could also use the normal read() and write() calls to communicate through the socket, but send() and recv() offer much greater control over data transmission (for example, through their flags argument). TCP uses stream sockets while UDP uses datagram sockets.

Sockets are not limited to Unix systems. All modern operating systems implement a version of the POSIX socket interface; for example, even the Winsock implementation for MS Windows, created by unaffiliated developers, closely follows the POSIX standard. The most popular POSIX socket API is written in the C programming language, and most other programming languages provide similar interfaces, typically as wrapper libraries around the C API. The concept of sockets is actually wider than the IP protocol suite: IP communication channels are simply sockets in particular socket domains, AF_INET (for IPv4) and AF_INET6 (for IPv6). Other socket domains are implemented on the same POSIX socket API. For example, a Unix domain socket (AF_UNIX) is a data communication endpoint for exchanging data between processes running on the same host operating system (also known as inter-process communication, or IPC), and it is used through the same sockets API as the IP protocol suite and all other supported network protocols. In addition to the socket domain, one also needs to specify the socket type, for example stream sockets (SOCK_STREAM, for TCP's reliable stream-oriented service) or datagram sockets (SOCK_DGRAM, for UDP's datagram service).
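Because IPC sockets and IP sockets share one API, the same send and receive calls work in both domains; only the domain passed at creation differs. A minimal Python sketch (Linux/Unix only) using socket.socketpair(), which returns two already-connected AF_UNIX sockets; Python's socket module is a thin wrapper over the C socket API.

```python
import socket

# Two connected Unix domain stream sockets: no IP address or port involved.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

left.sendall(b"ping")  # the same call you would use on a TCP socket
received = right.recv(1024)
right.sendall(received.replace(b"i", b"o"))
reply = left.recv(1024)

print(left.family == socket.AF_UNIX)  # True
print(received, reply)                # b'ping' b'pong'
print(left.fileno(), right.fileno())  # plain integer file descriptors

left.close()
right.close()
```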

The relationship between a network endpoint address (i.e. the IP address + port combination) and a socket is analogous to the relationship between a file and a file handle/descriptor. When an application opens a file for reading or writing, it receives a new file descriptor; similarly, when a new connection is made to the same network address, a new socket is created. For example, consider an HTTP webserver running on a server host (say at IP a.b.c.d, port 80) and several client browsers running on different client hosts (say using ports 30050 and 60070) establishing separate connections to the server. On the server host, when the HTTP daemon is deployed, socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) is called to create a communication endpoint, specifying the domain, type and protocol. On success, it returns a file descriptor, sockfd, for the socket. The daemon then calls bind(sockfd, addr, addrlen), with addr holding a.b.c.d:80, to bind the socket to the HTTP server address. After the socket has been bound to an address, listen(sockfd, backlog) prepares it for incoming connections (backlog specifies the maximum number of pending client connections). Normally, only one socket among all sockets associated with a given address can be in the "listen" state, ensuring that only one server/daemon services the requests. When an application is listening for stream-oriented connections from other hosts, it is notified of connection requests and must accept each one by calling accept(sockfd, addr, addrlen), which fills addr with the address of the connecting client (for example, a client using port 30050 or 60070). On the server host, accept() creates a new socket for each accepted connection and returns the new socket descriptor, or -1 if an error occurs. Once a connection is accepted, it is dequeued (it no longer counts against the backlog). All further communication with that client occurs via this new socket descriptor.

The client itself first uses socket() to create a socket on its side and obtain clientsockfd. It then calls connect(clientsockfd, addr, addrlen), with addr holding a.b.c.d:80, to connect to the remote server that is already listening for incoming connections. Once the connection succeeds, send() and recv(), or write() and read(), or sendto() and recvfrom(), are used for sending and receiving data to/from the remote socket. To use many socket features, you need the send()/recv() family of system calls rather than write()/read(). close() causes the system to release the resources allocated to a socket. gethostbyname() and gethostbyaddr() are used to resolve host names and addresses (newer code generally uses getaddrinfo() and getnameinfo() instead). select() blocks until one or more of a provided list of sockets is ready to read, ready to write, or has an error. poll() is used to check the state of a socket in a set of sockets: the set can be tested to see whether any socket can be written to or read from, or whether an error occurred. getsockopt() retrieves the current value of a particular socket option for the specified socket, and setsockopt() sets a particular socket option for the specified socket.
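The server and client call sequences described above map one-to-one onto Python's socket module. The sketch below runs both sides in one process on the loopback interface: a server thread calls accept() once and echoes the request back in upper case. It binds to port 0 (a kernel-chosen, unprivileged port) rather than port 80, which would require special privileges; the backlog of 5 is an arbitrary choice.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read until the client stops sending, echo it upper-cased."""
    conn, client_addr = listener.accept()  # a new socket for this connection
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the client shut down its sending side
                break
            chunks.append(chunk)
        conn.sendall(b"".join(chunks).upper())


# Server side: socket() -> bind() -> listen() -> accept()
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
listener.listen(5)               # backlog of pending connections
server_addr = listener.getsockname()
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client side: socket() -> connect() -> send()/recv() -> close()
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(server_addr)
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)  # signal the end of the request
    reply = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        reply += chunk

worker.join()
listener.close()
print(reply)  # b'HELLO'
```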

Application Layer Standards

FTP, HTTP, POP3, IMAP, SMTP, SSH, Telnet, rlogin, DNS (Domain Name System) etc. are some common Application Layer protocols. Application Layer programs are usually structured as a client and a server, each running on one of the two hosts that want to communicate to exchange data or information.

Let's consider a simple example. HTTP (Hypertext Transfer Protocol) is one of the simplest examples of an application-to-application protocol. In this scenario, a web browser running on one internet host is the HTTP client, and a webserver program (such as Apache, Microsoft IIS or Nginx) running on another internet host, which hosts the website and listens for HTTP requests from client browsers, is the server. The client sends the HTTP address (i.e. the URL) of the document it wants in a predefined format, and the server sends that document back in a predefined format. This exchange protocol is the HTTP protocol.

Neither application (the client or the server) worries about loss of data or errors in data; both assume that the Transport Layer has taken care of data reliability. So the HTTP protocol only deals with the format of the request message that the client sends and the format of the response message that the server sends back.
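Since HTTP itself only fixes message formats, a request is just text with a defined layout. The Python sketch below builds a minimal HTTP/1.1 GET request and parses the status line of a response; it does no networking itself (in practice the bytes would be written to a TCP connection to the server's port 80). The host name, path and response are examples.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.1 GET request (CRLF line endings, blank line at the end)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",      # the Host header is required in HTTP/1.1
        "Connection: close",  # ask the server to close after responding
        "",
        "",                   # produces the empty line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes):
    """Split 'HTTP/1.1 200 OK' into (version, status code, reason phrase)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("example.com", "/index.html"))
print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"))
# ('HTTP/1.1', 200, 'OK')
```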

Network Routing Details

ARP Table

When an application on a network device wants to send a message to a destination device, the IP Layer passes the data to the Link Layer. But before doing so, the IP Layer has to translate the IP address into the appropriate Link Layer address (a MAC address in the case of a LAN). For this, the OS kernel implements the Address Resolution Protocol (ARP) for IPv4 and the Neighbor Discovery Protocol (NDP) for IPv6. These kernel protocols allow TCP/IP to use a variety of different Link Layer systems, including LANs. For example, once the kernel knows that the network uses a LAN at the Link Layer, it creates an ARP request, encapsulates it in a LAN frame and sends it out on the local network. This is a broadcast message asking the device that has a given IP address on the network to respond with its MAC address. The ability of each device to send broadcast messages to all devices on the LAN is essential, and it follows from the fact that a LAN is a broadcast domain. In this way, the TCP/IP stack on each device maintains a table (the ARP table) mapping IP addresses to Link Layer addresses. Once the table is populated, the IP Layer passes the datagram to the LAN and asks it to send it to a specific MAC address. LAN protocols then take care of the rest (as discussed below, Link Layer network switches map MAC addresses to hardware ports).
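The lookup-then-broadcast behaviour can be summarized as a cache in front of a broadcast query. This is a toy Python model, not the kernel's implementation: the broadcast is replaced by a hypothetical callable, the LAN hosts and MAC addresses are made up, and real ARP caches also expire entries after a timeout.

```python
class ArpCache:
    """Toy IPv4-to-MAC cache; broadcast_who_has stands in for the real ARP broadcast."""

    def __init__(self, broadcast_who_has):
        self._table = {}
        self._broadcast = broadcast_who_has
        self.broadcasts_sent = 0

    def resolve(self, ip: str) -> str:
        if ip not in self._table:
            self.broadcasts_sent += 1
            mac = self._broadcast(ip)  # "who has <ip>? tell me your MAC"
            if mac is None:
                raise LookupError(f"no host answered ARP for {ip}")
            self._table[ip] = mac      # remember the answer for later packets
        return self._table[ip]


# Hypothetical LAN: the hosts that would answer an ARP broadcast.
LAN_HOSTS = {"192.168.1.1": "02:00:00:00:00:01", "192.168.1.20": "02:00:00:00:00:14"}
arp = ArpCache(LAN_HOSTS.get)

print(arp.resolve("192.168.1.20"))  # 02:00:00:00:00:14 (learned via broadcast)
print(arp.resolve("192.168.1.20"))  # 02:00:00:00:00:14 (served from the table)
print(arp.broadcasts_sent)          # 1
```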

Kernel Routing Table

When an application on a network device wants to send a message to a destination device, and the device has multiple NICs, the IP Layer must be able to find out which NIC to use. The OS kernel maintains a routing table for this purpose (known as the kernel forwarding table, or sometimes simply as the kernel routing table). NICs are Link Layer devices and use the Link Layer addressing system. Therefore, once the NIC has been identified, the OS kernel also needs to find the Link Layer address of the next hop. As discussed above, the ARP table is used for that purpose.

The kernel routing table consists of a series of entries. Each entry consists of multiple fields. For example, the table may look like this:

destination   netmask   gateway   interface
…             …         GW        le0
…             …         local     le0
default       N/A       GW        le0

In this example, there is only one network interface, le0, and three routes are defined for it in the kernel routing table. As described later in a separate section, the kernel routing table can be printed using kernel utilities such as ip r list. The kernel prepares this list by combining the routes defined for each of the NICs on the host. Each NIC specifies at least the subnet it connects to; alternatively, the NIC might receive this information from a DHCP server when it connects to a network, as described above. In addition, at least one of the NICs may define a default route for IP addresses that are not in the subnet it connects to; this is typically called the default gateway. A NIC may also define additional static routes, typically gateways to other networks or subnets.

Direct Routing Using ARP and Routing Table

Let's consider an example of direct routing to understand how packets are routed to a destination device that is connected directly to the same network as the source device.

  • The IP Layer of machine SO receives a packet addressed to machine DEST, which is on the same local network. Notice that the IP Layer headers of the packet always contain the true, ultimate destination address of DEST. What we are discussing here is the procedure the OS kernel uses to prepare the Link Layer headers.
  • To prepare the Link Layer headers, SO consults its kernel routing table. Let's take the routing table described above, with only one network interface, le0, and three routes defined for it.
  • SO applies each entry's netmask to the destination IP address and compares the result with that entry's destination until it finds a match. In this example, applying the netmask of the second entry to DEST's IP address yields the destination of the second entry, so the second entry matches. If no other entry matches, SO uses the third, default entry (known as the default gateway).
  • Having found a match, SO uses the gateway and interface fields of the second entry to prepare the Link Layer frame (i.e. the Ethernet headers in this case): SO addresses the frame to the gateway of that entry and transmits it through the network interface of that entry. Notice that the IP Layer headers still carry the IP address of the true destination DEST.
  • In this case, the gateway is local, meaning the local network. Therefore, in this direct routing example, SO addresses the Link Layer frame to the true, ultimate destination DEST. But Link Layer delivery needs the Link Layer address of DEST. Because the second entry tells the kernel that the interface is an Ethernet interface, SO looks up the IP address of DEST in the ARP table to find the MAC address of DEST.
  • When SO transmits the frame on the Ethernet with the MAC address of DEST in the header, it is received directly by DEST.
  • The IP Layer of DEST looks at the destination IP address in the IP Layer headers, sees that the packet has already reached its destination, and performs no further routing.
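The lookup steps above can be sketched in Python. The table mirrors the three entries shown earlier, but because the actual addresses are not given, the networks and gateway address used here (10.20.0.0/16 via GW at 192.168.1.1, and a local 192.168.1.0/24) are hypothetical. Entries are tried in order with the default route last, as in the walkthrough; real kernels instead pick the most specific (longest-prefix) match.

```python
import ipaddress

# Hypothetical version of SO's table: (destination, netmask, gateway, interface).
ROUTES = [
    ("10.20.0.0", "255.255.0.0", "192.168.1.1", "le0"),  # remote network via GW
    ("192.168.1.0", "255.255.255.0", "local", "le0"),    # directly attached LAN
    ("default", None, "192.168.1.1", "le0"),             # default gateway GW
]


def as_int(dotted: str) -> int:
    return int(ipaddress.IPv4Address(dotted))


def lookup(dest_ip: str):
    """Return (next_hop_ip, interface); the IP header keeps dest_ip either way."""
    for destination, netmask, gateway, interface in ROUTES:
        if destination == "default":
            return gateway, interface
        if as_int(dest_ip) & as_int(netmask) == as_int(destination):
            # "local" means direct delivery: the next hop is the destination itself.
            next_hop = dest_ip if gateway == "local" else gateway
            return next_hop, interface
    raise LookupError(f"no route to {dest_ip}")


print(lookup("192.168.1.50"))  # ('192.168.1.50', 'le0')  direct routing
print(lookup("10.20.3.4"))     # ('192.168.1.1', 'le0')   first entry, via GW
print(lookup("8.8.8.8"))       # ('192.168.1.1', 'le0')   default entry
```

The returned next hop is the address SO would then translate to a MAC address through its ARP table.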

Indirect Routing Using ARP and Routing Table

Next, let's consider an example of indirect routing to understand how packets are routed to a device that is not directly connected to the same local network as the source device.

  • The IP Layer of machine SO receives a packet addressed to machine DEST2, which is not on the local network. Again, notice that the IP Layer headers of the packet always contain the true, ultimate destination address of DEST2. What we are discussing here is the procedure the OS kernel uses to prepare the Link Layer headers.
  • SO consults its kernel routing table, which is the same as the one shown above.
  • As in the previous example, SO applies each netmask to the destination IP address until it finds a match. In this example, it matches the first entry.
  • Having found a match, SO uses the gateway and interface fields of the first entry to prepare the Link Layer frame: it addresses the frame to the gateway of that entry and transmits it through the network interface of that entry. The routing table tells the kernel that the gateway for that entry is GW and the interface is le0 (which the kernel knows is an Ethernet interface). Notice that the IP Layer headers still carry the IP address of the true destination DEST2.
  • Therefore, in this indirect routing example, SO addresses the Link Layer frame not to the Link Layer address of the true destination DEST2 but to the Link Layer address of GW, the gateway of the first entry. Link Layer delivery needs the Link Layer address of GW, and because the entry tells the kernel that the interface is an Ethernet interface, SO looks up GW's IP address in the ARP table to find GW's MAC address. The frame is then received by the Link Layer of GW.
  • When the IP Layer of GW receives the packet, it reads the ultimate destination IP address, DEST2. Finding that the address is not its own, and because GW is configured as a router, it consults its own kernel routing table just as SO did above. GW finds that the ultimate destination can be reached via a local entry and sends the packet out of the local Ethernet interface, addressed to the MAC address of DEST2.
  • DEST2 receives the packet.
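The key point of indirect routing, that the IP destination stays fixed while the Link Layer destination changes at every hop, can be traced with a small Python simulation. Everything in it is hypothetical: the topology (SO on 192.168.1.0/24, GW attached to both 192.168.1.0/24 and 10.20.0.0/16, DEST2 at 10.20.3.4), the MAC addresses, and a single dictionary standing in for the ARP tables on both LANs.

```python
import ipaddress

# Per-host routing tables: (prefix, gateway); "local" means directly attached.
TABLES = {
    "SO": [("192.168.1.0/24", "local"), ("0.0.0.0/0", "192.168.1.1")],
    "GW": [("192.168.1.0/24", "local"), ("10.20.0.0/16", "local")],
}
# IP -> (host name, MAC); stands in for the ARP tables on both LANs.
ARP = {
    "192.168.1.1": ("GW", "02:00:00:00:00:01"),
    "10.20.3.4": ("DEST2", "02:00:00:00:00:42"),
}


def next_hop(host: str, dest_ip: str) -> str:
    """First matching entry wins; the default route 0.0.0.0/0 is listed last."""
    dest = ipaddress.ip_address(dest_ip)
    for prefix, gateway in TABLES[host]:
        if dest in ipaddress.ip_network(prefix):
            return dest_ip if gateway == "local" else gateway
    raise LookupError(f"{host} has no route to {dest_ip}")


def trace(src_host: str, dest_ip: str):
    """Follow a packet hop by hop; the IP destination never changes."""
    hops, host = [], src_host
    while True:
        hop_ip = next_hop(host, dest_ip)
        receiver, mac = ARP[hop_ip]  # Link Layer address of the next hop
        hops.append((host, receiver, mac))
        if hop_ip == dest_ip:        # delivered to the final host
            return hops
        host = receiver              # the router repeats the lookup


for sender, receiver, mac in trace("SO", "10.20.3.4"):
    print(f"{sender} -> {receiver}: IP dst 10.20.3.4, frame dst {mac}")
```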

Kernel Network Management Utilities ip and arp

On the Kogence cloud HPC platform, users do not have privileges to manage or change the network settings, but they can use these utilities to inspect the network configuration.

Two basic utilities are popular for exploring and managing networks: ifconfig and ip. ifconfig is a legacy utility that is part of the net-tools package and is no longer recommended. ip is the newer, recommended utility and is part of the iproute2 package.

The basic syntax of ip utility is:

ip [ OPTIONS ] OBJECT { COMMAND | help }

where OBJECT is the object you want to manage/modify which can be

  • link (l) - for displaying and modifying network interfaces
  • address (a or addr) - for displaying and modifying IP addresses of network interfaces
  • route (r) - for displaying and modifying the routing table.
  • neigh (n) - for displaying and modifying neighbor objects (ARP table).

For example, when operating with the link object, the commands take the following form:

ip [ OPTIONS ] link { COMMAND | help }

The most commonly used COMMANDS when working with the link object are show, set, add and del. For example, ip link show will show all available network interfaces, while ip link show dev eth0 will show the details of the eth0 interface. The command ip link set eth0 down will disable the eth0 network interface, while ip link set eth0 up will enable it.

When operating with the address object, the commands take the following form:

ip [ OPTIONS ] address { COMMAND | help }

The most frequently used COMMANDS of the address object are show, add and del. For example, ip a show will show the IP addresses of all network interfaces, while ip addr show dev eth0 will show the IP address(es) of the eth0 network interface. The command ip address add ADDRESS/24 dev eth0, where ADDRESS stands for the IP address to assign, will assign that address in a /24 subnet to the interface eth0. Similarly, ip address del ADDRESS/24 dev eth0 will remove the assigned IP address from the eth0 interface.

Similarly, when working with the route object, ip r list will list all the routes in the kernel's routing table, while ip r list SUBNET, where SUBNET is a subnet in CIDR form such as a.b.c.d/24, will display the route for that subnet.

To add a route that specifies the IP address GATEWAY_IP as the gateway for the subnet SUBNET (both placeholders for actual values), we use the command ip r add SUBNET via GATEWAY_IP, while ip r add SUBNET dev eth0 adds a route to SUBNET that can be reached directly on device eth0. The command ip r add default via GATEWAY_IP dev eth0 adds a default route via the local gateway GATEWAY_IP for the device eth0.

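To add a route to the network interface eth0 permanently on Red Hat-style distributions (which manage networking through network.service), one would edit /etc/sysconfig/network-scripts/route-eth0 to append a line of the form SUBNET via GATEWAY_IP dev eth0 (with the actual subnet and gateway address filled in), and then restart the network service using systemctl restart network.service.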

The arp utility is useful for exploring the ARP table. Its syntax takes the form:

arp [-v] [-i if] [-t type] -a [hostname]

By default, arp prints the full ARP table. -v enables verbose output. -i, -t and -a can be used to restrict the results to a specific interface, a specific type of hardware or a specific host. The possible hardware types (those that support ARP) are ash (Ash), ether (Ethernet), ax25 (AMPR AX.25), netrom (AMPR NET/ROM), rose (AMPR ROSE), arcnet (ARCnet), dlci (Frame Relay DLCI), fddi (Fiber Distributed Data Interface), hippi (HIPPI), irda (IrLAP), x25 (generic X.25) and eui64 (Generic EUI-64).

By the way, ip utility allows one to enable or disable ARP on an NIC through ip link set dev eth0 arp on or ip link set dev eth0 arp off.