Docker Networking Fundamentals
One of the reasons Kogence uses Docker as its HPC containerization technology is that Docker provides rich network isolation features that are not available in other HPC containerization technologies such as Singularity. Docker containers running on the same host or on different hosts of a cluster can be connected to each other as well as to the hosts on which they are running. This allows Kogence to orchestrate HPC workloads in containers such that the workloads are not even aware that they are running inside Docker. Similarly, distributed-memory multi-node workloads, such as MPI workloads, are not aware whether their connected peer workloads are also Docker workloads or not. Whether your Docker hosts run Linux, Windows, or a mix of the two, Docker can manage them in a platform-agnostic way.
Docker’s networking subsystem is pluggable, using drivers. Several drivers, such as bridge, host, and overlay, exist by default and provide core networking functionality. On the Kogence Container Cloud HPC platform we use each of these depending upon the workload you launch. The user remains agnostic to the details, and her workload automatically runs on a container cloud cluster orchestrated with the most optimal choice for her specific workload.
Docker Bridge Network
If Docker containers are started with the default bridge network, the containers use a separate network namespace, and the host's network interfaces, routing tables, ARP tables, etc. are not visible inside the containers.
The bridge network works with a single host only. It provides network connectivity between the host and the containers running on that single host.
The Docker daemon creates a virtual switch on the host. This virtual switch typically shows up as the docker0 network interface in the list of interfaces on the host. The docker0 interface is connected to a private subnet and receives the first IP address of that private address space. As with any other LAN hardware, this switch (i.e. the docker0 interface) also gets a MAC address. The host is able to receive and send packets on the connected subnet using this network interface, either through the Ethernet broadcast mechanism (e.g. when the destination MAC address is not known) or point to point through the destination MAC address of any other network device connected to the same subnet. As the host communicates with other network devices over this interface, the host kernel populates its ARP table with the MAC addresses of the other network devices connected to this subnet. The host's kernel routing table is also populated with the IP address range of this subnet (i.e. the subnet mask), so when the host receives a packet destined for an IP address in this range, on any of its network interfaces and not just docker0, it uses the docker0 interface to route the packet onward. Note that the host acts as the gateway for this subnet; therefore, the host's kernel routing table does not get a default gateway entry for this interface.
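The addressing and routing decision described above can be sketched with Python's standard ipaddress module. This is a minimal illustration, assuming Docker's common default bridge subnet of 172.17.0.0/16; the actual subnet on a given installation may differ.

```python
import ipaddress

# Docker's common default bridge subnet (may differ on your installation).
subnet = ipaddress.ip_network("172.17.0.0/16")

# The docker0 interface receives the first usable address of the subnet
# and acts as the gateway for the containers.
docker0_ip = next(subnet.hosts())
print(docker0_ip)  # 172.17.0.1

# The kernel routing decision sketched above: a packet destined for an
# address inside the subnet is sent out through the docker0 interface,
# regardless of which interface it arrived on.
def egress_interface(dst: str) -> str:
    return "docker0" if ipaddress.ip_address(dst) in subnet else "other"

print(egress_interface("172.17.0.5"))  # docker0
print(egress_interface("8.8.8.8"))     # other
```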
When a container, say cont#1, gets created on this host (using either docker run or docker create), the Docker daemon adds a new virtual port (technically, a veth-pair type network interface) on this virtual switch. This virtual port shows up as the vethXYZ@ifAB interface on the host machine. Let's say this interface has an interface ID of CD. The Docker daemon also creates a new eth0@ifCD network interface inside the container cont#1. This interface will have an interface ID of AB. The @ifAB at the end of vethXYZ@ifAB indicates that this port on the switch (identified by the interface ID CD) is connected to the interface ID AB (which is the eth0@ifCD interface on the container cont#1). Similarly, the @ifCD at the end of eth0@ifCD indicates that this Ethernet interface is connected to the interface ID CD (which is the vethXYZ@ifAB port on the virtual switch on the host). Please note that technically all interfaces are created on the host machine -- they are just in different network namespaces. The kernel assigns numerical interface IDs to all network interfaces sequentially across all namespaces.
Please note that these ports (i.e. the veth-pair type network interfaces) on the virtual switch do not receive an IP address, even though they show up as proper network interfaces with MAC addresses in the list of network interfaces. These MAC addresses never get populated in an ARP table, and these interfaces are never used directly. The host uses the docker0 interface to send packets to the containers, and the containers use their own interfaces, such as eth0@ifCD, to send packets to the host as well as to the other containers. Interfaces like vethXYZ@ifAB should be mentally modelled as ports on the switch with an Ethernet wire attached. The interface on the other end of that wire (such as the eth0@ifCD interface) is the one that receives an IP address, not the port itself. Technically, vethXYZ@ifAB is a slave interface while docker0 is the master interface: traffic from the container traverses the vethXYZ@ifAB slave interface and is switched by the docker0 master, and the veth port itself never appears as a source or destination IP address.
The eth0@ifCD interface on the container cont#1 is also connected to the same private subnet as docker0 and receives a unique private IP address of that private address space. As with any other LAN hardware, this interface also gets a MAC address. The processes on this container are able to receive and send packets on the connected subnet using this network interface, either through the Ethernet broadcast mechanism (e.g. when the destination MAC address is not known) or point to point through the destination MAC address of any other network device connected to the same subnet. As the container communicates with other network devices over this interface, the container populates its own ARP table with the MAC addresses of the other network devices connected to this subnet. The container's kernel routing table is also populated with the IP address range of this subnet (i.e. the subnet mask), so when the container receives a packet destined for an IP address in this range, on any of its network interfaces and not just eth0@ifCD, it uses the eth0@ifCD interface to route the packet onward. Note that the host acts as the gateway for this subnet; therefore, the container's routing table also lists the docker0 IP address as the default gateway entry for this interface. So any packet that is not destined for this subnet is forwarded to the host machine on the docker0 interface. The host machine can then apply NAT/PAT and use its other network interfaces (if configured) to forward the packet to the correct destination outside of the host and the containers' private network. Please note that, by default, outbound source NAT (masquerading) is configured but inbound NAT/PAT is not. So, by default, a container can access the internet, provided the host has other network interfaces that are connected to the internet, but the reverse is not possible unless inbound NAT/PAT is configured on the host by publishing ports from the container; i.e., the internet cannot access the containers even though the host itself may be made accessible on the internet.
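The asymmetry between outbound and inbound traffic can be sketched as two lookups: outbound connections from the bridge subnet are source-NATed to the host's address automatically, while an inbound connection reaches a container only through an explicitly published port. All addresses and ports below are hypothetical illustrations, not values from a real installation.

```python
import ipaddress

bridge_subnet = ipaddress.ip_network("172.17.0.0/16")  # common default
host_public_ip = "203.0.113.10"                        # hypothetical

# Hypothetical DNAT table built by publishing ports,
# e.g. `docker run -p 8080:80 ...`: host port -> (container IP, port).
published = {8080: ("172.17.0.2", 80)}

def inbound(dst_port):
    """An inbound packet reaches a container only via a published port."""
    return published.get(dst_port)  # None -> no route to any container

def outbound_src(src_ip):
    """Outbound container traffic is masqueraded to the host's address."""
    if ipaddress.ip_address(src_ip) in bridge_subnet:
        return host_public_ip
    return src_ip

assert inbound(8080) == ("172.17.0.2", 80)  # published: forwarded
assert inbound(2222) is None                # unpublished: unreachable
assert outbound_src("172.17.0.2") == "203.0.113.10"
```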
On the Kogence container cloud HPC platform, by default, we do not publish any of the containers' ports. If your workload needs to expose some ports, then those specific ports are published specifically for your workload only. Corresponding changes to the host's network router and the host's iptables are also made automatically as needed.
Docker Host Network
If Docker's host network driver is used, then Docker containers do not run in a separate network namespace. There is no network isolation, and the workloads running in the container use the host network stack. This means that the host's network interfaces, routing tables, ARP tables, etc. are visible inside the container.
By default, the Kogence container cloud HPC platform uses the host network; therefore, all network-level and host-level security measures apply to the workloads running in the containers as well. Other network choices are made only for specific types of workloads for which the host network does not suffice. Please contact us for further details.