Dealing with Fragmented Traffic

Fragmented IPv4 traffic may cause you a lot of problems in real life. Not only it increases the load on router CPUs, but also impacts applications performance (e.g. TCP needs to re-send the whole packet on a single fragment loss). In addition to that, traffic fragmentation is used in numerous network attacks, allowing an attacker to bypass firewalls or IDSes in some situations. Due to all these reasons, you may want to avoid fragmentation at all and/or ensure your network is insulated from fragmented packets. Unfortunately, there are cases when using IPv4 fragmentation is unavoidable.

MTU Mismatch Issues

Fragmentation occurs when you have MTU mismatch on the path between two communicating endpoints. Endpoints may only know about their local MTU settings, but not about the minimum MTU along the path (although an MTU discovery procedure exists). Nowadays, most of end-user connections are Ethernet-based. This means you commonly expect to see endpoint MTU values of at least 1500 bytes. Even while modern equipment enjoys jumbo GigabitEthernet frames of more than 9Kbyte in size, by default you commonly see MTU set to 1500 bytes everywhere. The default value is usually OK when transported across Internet, since most (read: good) ISPs support user-side MTU of 1500.

Your Ethernet network may function perfectly, until the day you decide to “virtualize” networking resources. Commonly, people face MTU issues when they run tunneling technologies on top of Ethernet with its default MTU value of 1500. These could be GRE, IPsec tunnel or MPLS – you name it. Any tunneling solution that does not terminate at the endpoint PC, may cause MTU issues and lead to packet fragmentation. In most cases, if you originate tunnel from user’s PC, the software automatically adjusts MTU.

The problem lies in the fact that by default many switches use system MTU value of 1500. Many lower-end fastEthernet switches don’t even support jumbo frames. If a VPN tunnel traverses a whole Ethernet infrastructure you face the problems of upgrading all switches to support jumbo frames. In large installations this could be a serious issue, so some workarounds would be nice to have.

Path MTU Discovery

Defines in RFC1191 and called PMTUD for short, this simple procedure allows endpoints to dynamically discover the minimum MTU across the communication path. The procedure uses special Don’t Fragment (DF) flag in the IP packet header. Packets with this flag are never fragmented, but rather dropped when a router sees that the packet does not fit outgoing link’s MTU. When dropping the packet, the router should signal back to the sending host with a special ICMP unreachable message, telling that the packet has been dropped due to the large size and suggesting the new MTU value.

Based on these, an endpoint may first “probe” a new path with MTU-sized, DF-bit marked packets. By listening to the ICMP responses the host may find the proper path MTU value. PMTUD is commonly started with a first TCP session between the two hosts.

The problem with PMTUD is that in modern Internet the router-based signaling does not work well. Routers either rate-limit the ICMP unreachable message or firewalls filter them. This effectively prevents PMTUD from working properly and often makes it rather another problem rather than a solution.

In order to resolve PMTUD “issues”, people commonly use either of two hacks in Cisco IOS routers. The first one is removing DF bits from all packets (commonly TCP packets) using a route-map like-this:

route-map CLEAR_DF
match ip address TCP
set ip df 0
!
interface FastEthernet 0/0
ip policy route-map CLEAR_DF

However, this solutions results in the ingress router fragmenting packets (the effect you ultimately want to avoid). Therefore, use the above configuration as last resort, when your customers start bashing you :)

Preventing Fragmentation

The second procedure applies to TCP traffic only (it’s the most often use type of communications anyways). Using the special TCP option called MSS (Maximum Segment Size) a TCP endpoint may signal the other end about locally supported MTU (basically, MSS = MTU-IP_header_size-TCP_header_size). Endpoints select the minimum MSS supported by both parties. Cisco IOS support special feature called TCP MTU adjustment, which allows router to rewrite the option with the value provided by system administration. By setting this option to a value matching new MTU, you may trick both endpoints into thinking that the actual MSS is lower then they suppose. The interface level command is: ip tcp adjust-mss [value]. With this command configured, every incoming TCP SYN packet is inspected for TCP MSS option and the value is changed per the configuration. You can see this feature often implemented with VPN solutions, such as PPPoE, DMVPN, GRE, etc.

Of course, the best way to prevent fragmentation and PMTUD issues is setting the underlying MTU to a value large enough to accommodate the original packet with tunnel overhead (GRE, MPLS, IPsec). If you’ve done that before actually implementing VPN solutions, you’re a lucky person ;)

Filtering Fragments

The final part is getting your network rid of fragmented traffic. With Cisco IOS you have two ways to accomplish this. The first solution is matching fragments with a special “fragment” keyword like this: permit ip host 1.1.1.1 host 2.2.2.2 fragments. This ACL entry matches any non-initial packet fragment. Non-initial fragment is IP packet with non-zero fragment offset (FO) field. When router fragments a packet, the packet splits as follows:

a) Initial fragment: the first part of the packet. This fragment has the “M” (more) flag set, meaning more fragment will follow. Has the FO value of zero, signaling the position of the fragment inside the original datagram. Usually, this initial fragment contains the upper-level protocol header (such as TCP/UDP) and thus bears the port numbers information. Note that normal packets have FO=0, M=0.

b) Non-Initial fragments: packets with non-zero FO field, meaning these fragments follow the initial one. These fragments have M-bit set, with exception to the final fragment. The final fragment has M bit set to zero, signaling the end of the sequence.

Based on the above, the “fragments” keyword could only be used with ACL entries that do not reference any port number or TCP flags. This is because non-initial fragments don’t carry the upper level protocol information. (Well, there are some special cases [tiny fragments] when crafted fragments are used to split TCP port information across the fragments, but discussing those in depth is beyond the scope of this document).

Note the special treatment that IOS provides to non-initial fragments when matching against an access-list entry that contains protocol/port numbers, e.g. permit tcp any host 1.1.1.1 eq 80. If a non-initial fragment is matched against this entry, then IOS ignores any upper-level protocol information in the ACL entry (the protocol and the port number) and compares the source/destination IP in the fragment with the ACL source/destination values. Effectively, this procedure permits any non-initial fragments matching layer 3 information in the ACL entry. The same goes to “deny” entries matching against the non-initial fragments.

Therefore, if you want to prevent fragmented IP packets from reaching you application ports, put a “deny” statement with “fragments” keyword before the “permit” statement allowing traffic to the application port, like this:

ip access-list ONLY_NON_FRAGMENTS
deny ip any host 1.1.1.1 fragments
permit tcp any host 1.1.1.1 eq www

IP Virtual Reassembly

Virtual Reassembly is special IOS feature that allows the router to obtain full picture of a fragmented packet on the fly. When you activate virtual-reassembly on interface, using the command ip virtual-reassembly, IOS starts tracking all incoming fragmented packets. The code delays fragmented packets until it receives all of them, or until the maximum reassembly timeout expires (there are some other thresholds, discussed below). After this, the router performs “virtual” datagram reassembly. Here “virtual” means the packet is not getting actually assembled into a single entity, but rather IOS views it as a whole for subsequent processing. If the router does not receive all fragments during the reassembly timeout, the incomplete packet is dropped.

Reassembly is needed by some applications such as NAT or IPS in order to perform true packet inspection. For example, NAT ALGs (Application Level Gateways) may need to rewrite the packet contents or get some additional information no present in a single fragment. (This is why you can see virtual-reassembly enabled automatically when you activate NAT on interface). After the virtual reassembly procedure and other internal processing have been performed, router switches all datagrams in their original, fragmented form. The router does not send out the assembled datagram to avoid any MTU issues.

Here is the full syntax for the virtual-reassembly command:

ip virtual-reassemblymax-reassemblies number] [max-fragments number] [timeout seconds] [drop-fragments]

In this syntax, max-reassemblies tell the router the maximum number of simultaneous packets to track on the interface. If a fragment for a new packet arrives on the interface, and there is already maximum number of states present, the router drops the new fragment. This is a security precaution against DoS attacks to drain out router resources. Next, max-fragments is the maximum number of fragments allowed in a packet. The any single reassembly currently in progress the router will reject any further fragments exceeding this count. The timeout value is the maximum amount of time the router will wait for assembly to complete before discarding all accumulated fragments. We mentioned this value above. Finally, if you specify drop-fragments keyword, the router will drop any fragments received on the interface. A quick and simple way to block any fragmented traffic.

Finally, virtual reassembly automatically detects common fragmented packets attacks, such as tiny fragments (hiding TCP/UDP port numbers in non-initial fragments) or overlapping fragments (crafting fragments so that they overlap in the actual packet). This provides some minimal level of protection and makes packet fragmentation less risky from security perspective (though still highly undesirable from applications perspective).