Using NBAR for HTTP URL filtering

NBAR protocol classification feature has long supported enhanced HTTP URL matching features. However, Cisco documentation site never provided a detailed description of the pattern language used for URL matching; neither has it explained how the engine matches client/server data streams. In this post we will give an overview of how NBAR works with URL filtering.

We will arrange this post in a FAQ manner as follows.

Q1: What is the syntax for matching the URLs?
A1: The syntax resembles regular expressions but it is actually not. Rather, it is more similar to using globs or wildcard special characters. The pattern you configured is matched against the string found after the “GET”, “POST” or “PUT” method in HTTP request packet. Note that NBAR is smart enough to remove the leading “/” in the file path. However, the HTTP request must end up with “\r\n” or the NBAR StILE (stateful inspection language engine) will not recognize it (an easy way to fool the inspection engine)!

Here is the list of the available wildcards:

“*” - match any pattern e.g. “aaa”, “abcd1234”. To match the substring “xyz” in the beginning of a string use the pattern “xyz*”. To match “xyz” anywhere in the string use the pattern “*xyz*”. Use the pattern “*xyz” to match the substring in the end. Note that pattern “xyz” matches only the exact string “xyz”. You may also use complicated patterns “ab*cd*ef” at the expense of some CPU penalty probably.
“?” - match any single character. For example pattern “???” matches “xyz”, “abc”,”efg”,”123”. You can mix “*” and “?” like in pattern “tes*.????”
”[]” - match range of characters. E.g. “[abc]” will match any single character “a”, “b” or “c”.
”|” - alternative. Separate patterns with “|” in order to specify “OR” matching logic. For example “xyz|abc” matches either full “xyz” or “abc” strings. You may mix “|” with other globs like this “*xyz*|*abc*|*pqr*”. Note sure about the overall limit of using the “|” symbol but you’d better keep it to minimum, in order to make matching faster.
“()” - grouping. Denotes the boundaries of a sub-pattern. For example instead of “*.txt|*.bin” you can write “*.(txt|bin)”.

All matching is case insensitive. So the pattern “text” matches “TEXT” as well. The engine matches your URL pattern against the directory path and the file name in the URL. E.g. If the URL is “http://www.cisco.com/pub/uploads/image.jpeg” the URL matching will only use the “pub/uploads/image.jpeg” part of the URL. As a matter of fact, when you submit request like the above URL, it translates into the following headers (there are actually more, but this is the bare minimum):

GET /pub/uploads/image.jpeg HTTP/1.1
Host: www.cisco.com

Q2: What if I want to match the host name used in the URL?
A2: You need to match the “Host” header field then. Use the match protocol http host [pattern]. You can use the same glob patterns for matching that you use to matching the filename in the URL.

Here is a good example: Using NBAR for application filtering

Q3: How NBAR actually classifies the traffic flows?
A3: When you apply a policy-map that contains a class with “match protocol” statement, the system starts NBAR classification engine on the interface. Any packet, be it ingress or ingress, passes the NBAR inspection engine provided that it passes the basic filters like matching the port number assigned to the protocol. You can change the port map using the global command ip nbar port-map [protocol].

When the engine sees a TCP SYN packet for the matching session, it starts the internal state machine, trying to parse the packet flow. Every new packet in the flow (in any direction) is inspected. Note that the NBAR does not classify a flow instantly. It may take some packet exchange until the engine determines that the flow matches specified criteria. As soon as there is enough information about the flow to classify it, the engine “tags” the bi-directional flow with the corresponding class value and reports this decision to the policy-map. Note that this classification applies in both directions – that is to the packets belonging to the flow and heading either ingress or egress.

For example, if a user starts a web sessions ands opens an URL matching any of your NBAR criteria, the engine will classify the flow as soon as it sees the packet with the URL string. After this, both packet flows from the client to server or from the server to the client would the respective class – that is, the policy map could be applied in either direction of the interface.

The engine will remove the “tag” when it faces the flow “end” criteria: e.g. it catches a TCP FIN or TCP RST flag. After this, flow is no longer monitored by NBAR and reported as matching the respective class.

Consider the example below. In this scenario, Fa0/0 is the outside interface, facing the internet. User’s traffic flows out of this interface.

class-map match-all TEST

 match protocol http url "*.(t?xt|ocx|ex[ea])"

!

policy-map TEST

 class TEST

   drop

!

interface FastEthernet0/0

 ip address 155.1.37.3 255.255.255.0

 service-policy input TEST

However, as soon as user opens the URL matching the class-map specification, the engine will classify the flow as matching the class “TEST”. After this, all returning packets (server to client) for this flow will be dropped by the policy map.

Note that even though this works, it is not the recommended way of applying the URL filtering. This is because the actual GET requests will still get to the destination server and generate response traffic. In some situations, the client may send a FIN packet when server generates the response. At this point, NBAR will report the flow as destroyed and the policy-map will not match the packets. If the server response is slow enough, it will actually bypass the ingress filter! Thus, it is always recommended to apply the NBAR classification and policy action in the direction matching the direction of the protocol commands (e.g. apply the URL filtering in the direction of GET commands).

Q4: Are there any other matching criteria besides matching the URL or Host name?
A4: Yes, you can match a number of client/server HTTP headers. Better then that, you can match content-type in server replies, e.g.

match protocol http mime text/plain.

However this type of matching relies on the server to return the proper MIME encoding type in response using the “Content-Type” header. The benefit is the ability to block a whole class of files (e.g. image/jpeg) with a single match criterion.

Q5: Is NBAR good for production deployment?
A5: No. In fact, NBAR is OK if you need some quick and dirty hack, but it is not to be used as a content filtering engine. There is a bunch of reasons:

1) NBAR inspection consumes too much router CPU resources.
2) NBAR inspection is easy to trick. For example by simply avoiding “\r\n” in the end of the string you would make the engine believe it is not an URL request.
3) NBAR engine is buggy. Sometimes you may found your class matches much more traffic than you wanted and drops really important packets. Ooops! This happens all the times.
4) NBAR does not perform true TCP stream reassembly, removing duplicates and assembling fragments. This makes it totally vulnerable to any kind of tricky TCP/IP attacks.

So if you want some real content security, use the specialized solutions :)