Move over Variance: BGP Proportional Load Balancing is here!
Having a blast in Chicago with the RS bootcamp students. Thanks for all the hard work you are doing this week!
A student from a past Reno class, named Michal, asked if I would create a blog post regarding BGP proportional load balancing based on the bandwidth of the links to EBGP peers. It has been on my list of things to do, and here it is. Thanks for the request Michal.
The secret to this trick is to pay attention to the links between directly connected external BGP neighbors (in this case R6-R5 and R2-R3) and to send the link bandwidth extended community attribute to IBGP peer R1. The feature is enabled with the bgp dmzlink-bw command and uses extended communities to share the information. To summarize: routes learned from a directly connected external neighbor are advertised to IBGP peers along with the bandwidth of the external link on which they were learned, and the IBGP router (R1) can then load balance proportionally between the two paths.
Here is the diagram we will use.
We’ll use loopbacks for our IBGP connections, so let’s verify that we have connectivity between loopbacks in AS 126.
R1#ping 6.6.6.6 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 6.6.6.6, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/43/76 ms
R1#
R1#ping 2.2.2.2 source loopback 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/40/72 ms
Ok, that looks good, so let’s configure R1 to be an IBGP peer with R6 and R2. The dmzlink-bw feature is implemented as part of the IPv4 address family configuration.
R1(config)#router bgp 126
R1(config-router)#neighbor 6.6.6.6 remote-as 126
R1(config-router)#neighbor 2.2.2.2 remote-as 126
R1(config-router)#neighbor 6.6.6.6 update-source lo0
R1(config-router)#neighbor 2.2.2.2 update-source lo0
R1(config-router)#address-family ipv4
R1(config-router-af)#bgp dmzlink-bw
R1(config-router-af)#neighbor 6.6.6.6 activate
R1(config-router-af)#neighbor 2.2.2.2 activate
R1(config-router-af)#neighbor 6.6.6.6 send-community both
R1(config-router-af)#neighbor 2.2.2.2 send-community both
R1(config-router-af)#maximum-paths ibgp 2
R1(config-router-af)#end
Next, we will configure R6 and R2 to be IBGP neighbors with R1, and EBGP neighbors with R5 and R3 respectively. We are going to set the bandwidth on the external interfaces of R6 and R2 to 6000 kbps and 5000 kbps respectively, using the bandwidth command. BGP can originate the link bandwidth community only for directly connected links to EBGP neighbors. In our example, it will be originated by R6 and R2.
R6(config)#router bgp 126
R6(config-router)#neighbor 1.1.1.1 remote-as 126
R6(config-router)#neighbor 1.1.1.1 update-source lo0
R6(config-router)#neighbor 10.56.0.5 remote-as 345
R6(config-router)#address-family ipv4
R6(config-router-af)#bgp dmzlink-bw
R6(config-router-af)#neighbor 1.1.1.1 activate
R6(config-router-af)#neighbor 1.1.1.1 next-hop-self
R6(config-router-af)#neighbor 1.1.1.1 send-community both
R6(config-router-af)#neighbor 10.56.0.5 activate
R6(config-router-af)#neighbor 10.56.0.5 dmzlink-bw
R6(config-router-af)#int fa 0/0
R6(config-if)#bandwidth 6000
Now, on to R2, with virtually the same configuration.
R2(config)#router bgp 126
R2(config-router)#neighbor 1.1.1.1 remote-as 126
R2(config-router)#neighbor 1.1.1.1 update-source lo0
R2(config-router)#neighbor 10.23.0.3 remote-as 345
R2(config-router)#address-family ipv4
R2(config-router-af)#bgp dmzlink-bw
R2(config-router-af)#neighbor 1.1.1.1 activate
R2(config-router-af)#neighbor 1.1.1.1 next-hop-self
R2(config-router-af)#neighbor 1.1.1.1 send-community both
R2(config-router-af)#neighbor 10.23.0.3 activate
R2(config-router-af)#neighbor 10.23.0.3 dmzlink-bw
R2(config-router-af)#int ser 0/1.23
R2(config-subif)#bandwidth 5000
Now we will configure R5 and R3 as the EBGP neighbors of R6 and R2 respectively. These EBGP peers don't need any special configuration, other than standard BGP.
R5(config)#router bgp 345
R5(config-router)#neighbor 10.56.0.6 remote-as 126
R5(config-router)#neighbor 4.4.4.4 remote-as 345
R5(config-router)#neighbor 4.4.4.4 update-source lo0
R5(config-router)#neighbor 4.4.4.4 next-hop-self
R3(config)#router bgp 345
R3(config-router)#neighbor 10.23.0.2 remote-as 126
R3(config-router)#neighbor 4.4.4.4 remote-as 345
R3(config-router)#neighbor 4.4.4.4 update-source lo0
R3(config-router)#neighbor 4.4.4.4 next-hop-self
Last but not least, we configure R4 as an IBGP peer to R5 and R3. In addition, we will create a loopback and add it into BGP. We will use the loopback as a target destination from R1 to verify the load balancing in a later step, so watch for that coming up.
R4(config)#int loop 44
R4(config-if)#ip add 44.44.44.44 255.255.255.0
R4(config-if)#router bgp 345
R4(config-router)#neighbor 5.5.5.5 remote-as 345
R4(config-router)#neighbor 3.3.3.3 remote-as 345
R4(config-router)#network 44.44.44.0 mask 255.255.255.0
Now let’s verify. Because we are on R4, let’s verify the BGP neighborships it has.
R4#show ip bgp summary
BGP router identifier 44.44.44.44, local AS number 345
BGP table version is 2, main routing table version 2
1 network entries using 120 bytes of memory
1 path entries using 52 bytes of memory
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 1) using 32 bytes of memory
BGP using 452 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
3.3.3.3         4   345       4       5        2    0    0 00:00:41        0
5.5.5.5         4   345       4       5        2    0    0 00:00:35        0
! Note: we can easily verify what routes are being advertised out from R4.
R4#show ip bgp neighbors 5.5.5.5 advertised-routes
BGP table version is 2, local router ID is 44.44.44.44
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 44.44.44.0/24    0.0.0.0                  0         32768 i

Total number of prefixes 1
R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 2, local router ID is 44.44.44.44
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 44.44.44.0/24    0.0.0.0                  0         32768 i

Total number of prefixes 1
R4#
Looks like AS 345 is fine. Let’s jump to R1, in AS 126, and verify from there.
R1#show ip bgp summary
BGP router identifier 1.1.1.1, local AS number 126
BGP table version is 3, main routing table version 3
1 network entries using 120 bytes of memory
2 path entries using 104 bytes of memory
1 multipath network entries and 2 multipath paths
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 496 total bytes of memory
BGP activity 1/0 prefixes, 2/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4   126      10       9        3    0    0 00:06:39        1
6.6.6.6         4   126      11      10        3    0    0 00:07:14        1
R1#show ip bgp
BGP table version is 3, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i44.44.44.0/24    6.6.6.6                  0    100      0 345 i
*>i                 2.2.2.2                  0    100      0 345 i
! Note: Looks like we have the neighbors, and the 44.44.44.0/24 prefix.
! To see more detail on the 44.44.44.0 network, we can use a couple of additional commands.
R1#show ip bgp 44.44.44.0
BGP routing table entry for 44.44.44.0/24, version 3
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Multipath: iBGP
Flag: 0x820
Not advertised to any peer
345
6.6.6.6 (metric 1) from 6.6.6.6 (6.6.6.6)
Origin IGP, metric 0, localpref 100, valid, internal, multipath
DMZ-Link Bw 750 kbytes
345
2.2.2.2 (metric 1) from 2.2.2.2 (2.2.2.2)
Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
DMZ-Link Bw 625 kbytes
! Note: Let's see what the routing table has to say about this network.
R1#show ip route 44.44.44.0
Routing entry for 44.44.44.0/24
Known via "bgp 126", distance 200, metric 0
Tag 345, type internal
Last update from 2.2.2.2 00:02:56 ago
Routing Descriptor Blocks:
* 6.6.6.6, from 6.6.6.6, 00:02:56 ago
Route metric is 0, traffic share count is 6
AS Hops 1
Route tag 345
2.2.2.2, from 2.2.2.2, 00:02:56 ago
Route metric is 0, traffic share count is 5
AS Hops 1
Route tag 345
! Note: We can also get the information from the CEF table.
R1#show ip cef 44.44.44.0
44.44.44.0/24, version 47, epoch 0, per-destination sharing
0 packets, 0 bytes
via 6.6.6.6, 0 dependencies, recursive
traffic share 6
next hop 10.16.0.6, FastEthernet0/1 via 6.6.6.0/24
valid adjacency
via 2.2.2.2, 0 dependencies, recursive
traffic share 5
next hop 10.12.0.2, FastEthernet0/0 via 2.2.2.0/24
valid adjacency
0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
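The numbers in these outputs can be checked with a little arithmetic. The sketch below is only an illustration in Python: the function names and the dictionary are made up for this example, and it does not claim to show how IOS derives its traffic share counts. It converts the configured interface bandwidths (in kilobits per second) to kilobytes, reduces the ratio to its lowest terms, and works out the percentage of traffic each path would carry under an ideal proportional split.

```python
from math import gcd


def kbps_to_kbytes(kbps):
    """Convert a bandwidth in kilobits per second to kilobytes per second."""
    return kbps / 8


def reduced_ratio(a, b):
    """Reduce the ratio a:b to lowest terms."""
    divisor = gcd(a, b)
    return a // divisor, b // divisor


# Bandwidth values configured on the external interfaces of R6 and R2.
link_kbps = {"R6": 6000, "R2": 5000}

link_kbytes = {router: kbps_to_kbytes(bw) for router, bw in link_kbps.items()}
ratio = reduced_ratio(link_kbps["R6"], link_kbps["R2"])

total = sum(link_kbps.values())
percent = {router: round(100 * bw / total, 1) for router, bw in link_kbps.items()}

print(link_kbytes)  # {'R6': 750.0, 'R2': 625.0}
print(ratio)        # (6, 5)
print(percent)      # {'R6': 54.5, 'R2': 45.5}
```

The 750 and 625 kbytes line up with the DMZ-Link Bw values in the show ip bgp output, and the 6 to 5 ratio lines up with the traffic share counts in the routing and CEF tables.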
So now that the route is there, how do we test the load balancing? One option is to do an extended ping, and record the path. We are expecting a 6 to 5 ratio for outbound traffic favoring the R6 path more than the R2 path. Let's send 30 ping requests, and show the full response for the benefit of verification.
R1#ping
Protocol [ip]:
Target IP address: 44.44.44.44
Repeat count [5]: 30
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: loopback0
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: r
Number of hops [ 9 ]: 4
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 30, 100-byte ICMP Echos to 44.44.44.44, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet has IP options: Total option bytes= 19, padded length=20
Record route: <*>
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
Reply to request 0 (204 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 1 (156 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
! Note: the path changes on the next ping request, and begins to use R6 as the next hop.
Reply to request 2 (160 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 3 (128 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 4 (156 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 5 (172 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 6 (108 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 7 (136 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 8 (180 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 9 (152 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 10 (80 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 11 (308 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 12 (204 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 13 (108 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 14 (160 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 15 (140 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 16 (140 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 17 (104 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 18 (84 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 19 (192 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 20 (232 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 21 (220 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 22 (168 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 23 (140 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.12.0.1)
(10.23.0.2)
(10.34.0.3)
(44.44.44.44)
<*>
End of list
Reply to request 24 (88 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 25 (224 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 26 (484 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 27 (128 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 28 (108 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Reply to request 29 (136 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(10.16.0.1)
(10.56.0.6)
(10.45.0.5)
(44.44.44.44)
<*>
End of list
Success rate is 100 percent (30/30), round-trip min/avg/max = 80/166/484 ms
R1#
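The first 2 requests, numbered 0-1, used the R2-R3-R4 path. The next 6, numbered 2-7, used the R6-R5-R4 path. The next 5, numbered 8-12, used the R2-R3-R4 path again, and the 6 after that, numbered 13-18, went back to the R6-R5-R4 path. The same 5-and-6 pattern repeats for requests 19-23 and 24-29, which matches the 6 to 5 traffic share we saw in the routing table.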
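If you would rather not count replies by eye, here is a small Python sketch that tallies the result. The observed list is transcribed by hand from the Record route output above (one entry per request, named after the first-hop router), and the variable names are just illustrative. It groups the replies into consecutive runs, counts the total per path, and compares that with an ideal 6 to 5 split of 30 packets.

```python
from itertools import groupby

# First-hop router for requests 0 through 29, transcribed from the ping output.
observed = (
    ["R2"] * 2 + ["R6"] * 6    # requests 0-1, 2-7
    + ["R2"] * 5 + ["R6"] * 6  # requests 8-12, 13-18
    + ["R2"] * 5 + ["R6"] * 6  # requests 19-23, 24-29
)

# Collapse the sequence into (router, run length) pairs.
runs = [(hop, len(list(group))) for hop, group in groupby(observed)]

# Total number of replies seen via each path.
counts = {hop: observed.count(hop) for hop in ("R6", "R2")}

# Ideal split of the same number of packets under a 6:5 traffic share.
shares = {"R6": 6, "R2": 5}
ideal = {
    hop: round(len(observed) * share / sum(shares.values()), 1)
    for hop, share in shares.items()
}

print(runs)    # [('R2', 2), ('R6', 6), ('R2', 5), ('R6', 6), ('R2', 5), ('R6', 6)]
print(counts)  # {'R6': 18, 'R2': 12}
print(ideal)   # {'R6': 16.4, 'R2': 13.6}
```

The full runs are 6 and 5 as expected. The totals lean a little further toward R6 than the ideal split, most likely because this 30-packet sample opens with a short run of only 2 packets through R2.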
Happy studies.