CCIE Troubleshooting: Part 2 (Dude, Where's My Routing?)
Thanks for tuning in again! We're back for more of the excitement known as Troubleshooting! Today we're going to look a little more at some of the more nefarious (my word for the day) things that may come your way, and at how simple little commands can change the way your lab is going!
In case you haven't noticed by now, the CCIE lab is a largely psychological event. Technical knowledge is a very good thing, but if you can't handle the pressure then it doesn't help much! I still remember my first lab exam. Or, more importantly, the weeks leading up to it, when I couldn't make simple things work correctly! Stuff I'd been doing for years. And it was all in my head.
So what kinds of things can be on your lab which may have an impact on this stuff? Some are very simple, some are not.
So in the last post, I mentioned a little about process. The process by which we troubleshoot things (or how we even start our lab) may make a tremendous difference in our outcome, or at least in our psychological state. Remember, you are there for the proctor's entertainment. And sometimes they are very entertained!
Let's take an obvious one. What if "no ip routing" was in one of your routers? "What?" you say... "Something THAT obvious, any CCNA could figure out, that's just plain (insert appropriate word of shock or exasperation here)."
Well, yes and no. A lot depends on WHEN you discover it. So let's assume the obvious. You check the routing table:
TestRouter(config)#do sh ip ro
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
TestRouter(config)#no ip routing
TestRouter(config)#
TestRouter(config)#
TestRouter(config)#do sh ip ro
Default gateway is not set
Host Gateway Last Use Total Uses Interface
ICMP redirect cache is empty
TestRouter(config)#
That's an obvious one to figure out. Ok.... What if you don't check that, but go to implement a routing protocol???
TestRouter(config)#router rip
IP routing not enabled
TestRouter(config)#
Ok, another obvious one. Mostly because the router TELLS you what's wrong! Let's hop to another router and look at some not-so-obvious problems with this command! Again, some of my routers are already fully configured from Mock Lab 4, so there's a functional network going on that we're going to mess with!
Rack1R1(config-if)#do sh frame map
Serial0/0/0 (up): ip 145.1.125.2 dlci 105(0x69,0x1890), static,
CISCO, status defined, active
Serial0/0/0 (up): ip 145.1.125.5 dlci 105(0x69,0x1890), dynamic,
broadcast,
CISCO, status defined, active
Serial0/1/0.12 (up): point-to-point dlci, dlci 112(0x70,0x1C00), broadcast
status defined, active
Serial0/1/0.13 (up): point-to-point dlci, dlci 131(0x83,0x2030), broadcast
status defined, active
Rack1R1(config-if)#do sh ip int br | ex un
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 145.1.17.1 YES manual up up
Serial0/0/0 145.1.125.1 YES manual up up
Serial0/1/0.12 145.1.12.1 YES manual up up
Serial0/1/0.13 145.1.13.1 YES manual up up
Loopback0 150.1.1.1 YES manual up up
Loopback1 145.1.111.111 YES manual up up
Rack1R1(config-if)#
Rack1R1(config-if)#do ping 145.1.125.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.125.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 56/59/64 ms
Rack1R1(config-if)#do ping 145.1.125.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.125.5, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 28/29/32 ms
Rack1R1(config-if)#do ping 145.1.12.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.12.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 56/56/60 ms
Rack1R1(config-if)#do ping 145.1.13.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.13.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 28/29/32 ms
Rack1R1(config-if)#
Looks like a great setup, and things are working!
R1
no ip routing
In my case, I still have "debug ip routing" turned on, so there's LOTS of stuff going on at the moment. In the beginning of your lab, you wouldn't see a thing. :)
Rack1R1(config)#do ping 145.1.125.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.125.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Rack1R1(config)#do ping 145.1.125.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.125.5, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Rack1R1(config)#do ping 145.1.12.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.12.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Rack1R1(config)#do ping 145.1.13.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.13.3, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Rack1R1(config)#
Now how's that for an interesting turn of events???
Rack1R1(config)#do sh frame map
Serial0/0/0 (up): ip 145.1.125.2 dlci 105(0x69,0x1890), static,
CISCO, status defined, active
Serial0/0/0 (up): ip 145.1.125.5 dlci 105(0x69,0x1890), dynamic,
broadcast,
CISCO, status defined, active
Serial0/1/0.12 (up): point-to-point dlci, dlci 112(0x70,0x1C00), broadcast
status defined, active
Serial0/1/0.13 (up): point-to-point dlci, dlci 131(0x83,0x2030), broadcast
status defined, active
Rack1R1(config)#
Nothing has changed in my mapping or other configuration! In other words, my frame-relay configuration is perfectly fine!
Rack1R1(config)#do ping 145.1.17.7
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.17.7, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/202/1000 ms
Rack1R1(config)#
I can ping other interfaces perfectly fine. Just not frame-relay.
So if you did your IP address checks, and immediately dove into configuring your lab, you'd get to frame-relay and insist that you were going insane!
Rack1R1(config)#do debug ip packet detail
IP packet debugging is on (detailed)
Rack1R1(config)#do ping 145.1.125.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.125.2, timeout is 2 seconds:
*Feb 16 16:24:57.185: IP: tableid=0, s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), routed via RIB
*Feb 16 16:24:57.185: IP: s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), len 100, sending
*Feb 16 16:24:57.185: ICMP type=8, code=0.
*Feb 16 16:24:59.185: IP: tableid=0, s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), routed via RIB
*Feb 16 16:24:59.185: IP: s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), len 100, sending
*Feb 16 16:24:59.185: ICMP type=8, code=0
*Feb 16 16:24:59.745: IP: tableid=0, s=150.1.1.1 (local), d=150.1.6.6 (FastEthernet0/0), routed via RIB
*Feb 16 16:24:59.745: IP: s=150.1.1.1 (local), d=150.1.6.6 (FastEthernet0/0), len 145, sending
*Feb 16 16:24:59.745: TCP src=65067, dst=179, seq=2727314646, ack=3832900916, win=16289 ACK PSH FIN.
*Feb 16 16:25:00.613: IP: s=150.1.4.4 (Serial0/0/0), d=145.1.125.1, len 64, rcvd 1
*Feb 16 16:25:00.613: TCP src=45959, dst=179, seq=2516853527, ack=0, win=16384 SYN
*Feb 16 16:25:00.613: IP: tableid=0, s=145.1.125.1 (local), d=150.1.4.4 (FastEthernet0/0), routed via RIB
*Feb 16 16:25:00.613: IP: s=145.1.125.1 (local), d=150.1.4.4 (FastEthernet0/0), len 40, sending
*Feb 16 16:25:00.613: TCP src=179, dst=45959, seq=0, ack=2516853528, win=0 ACK RST
*Feb 16 16:25:01.185: IP: tableid=0, s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), routed via RIB
*Feb 16 16:25:01.185: IP: s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), len 100, sending
*Feb 16 16:25:01.185: ICMP type=8, code=0.
*Feb 16 16:25:02.613: IP: s=150.1.4.4 (Serial0/0/0), d=145.1.125.1, len 64, rcvd 1
*Feb 16 16:25:02.613: TCP src=45959, dst=179, seq=2516853527, ack=0, win=16384 SYN
*Feb 16 16:25:02.613: IP: tableid=0, s=145.1.125.1 (local), d=150.1.4.4 (FastEthernet0/0), routed via RIB
*Feb 16 16:25:02.613: IP: s=145.1.125.1 (local), d=150.1.4.4 (FastEthernet0/0), len 40, sending
*Feb 16 16:25:02.613: TCP src=179, dst=45959, seq=0, ack=2516853528, win=0 ACK RST
*Feb 16 16:25:03.185: IP: tableid=0, s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), routed via RIB
*Feb 16 16:25:03.185: IP: s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), len 100, sending
*Feb 16 16:25:03.185: ICMP type=8, code=0.
*Feb 16 16:25:05.185: IP: tableid=0, s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), routed via RIB
*Feb 16 16:25:05.185: IP: s=145.1.17.1 (local), d=145.1.125.2 (FastEthernet0/0), len 100, sending
*Feb 16 16:25:05.185: ICMP type=8, code=0.
Success rate is 0 percent (0/5)
Rack1R1(config)#
Rack1R1(config)#do un all
All possible debugging has been turned off
Rack1R1(config)#
You'll notice a couple of incoming BGP messages there as well trying to re-establish that connection. Bottom line is that things LOOK like they are being sent, and yet nothing actually goes out. How long would you spend pulling your hair out? Would you ever think to look at "no ip routing" as your culprit? Perhaps if you have been through this before then "yes" but otherwise, it's not part of what you would expect for a local interface connection! Process, process, process. Otherwise, definitely a time killer!
If you look really closely, you'll see something more interesting in that debug. The packets are going out your FastEthernet0/0 interface. Any idea why? With IP routing disabled, the router acts like a plain IP host and no longer treats the local Frame Relay interface as the valid path, so it tries to ARP for the address. In this functional, pre-configured network, SW1 (the other end of Fa0/0) has a route to the IP and therefore replies with Proxy ARP. If the other routers did not have a route, you would see "encapsulation failed" messages in the debug instead.
Rack1R1(config)#ip routing
Rack1R1(config)#
Rack1R1(config)#do ping 145.1.125.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 145.1.125.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 56/58/60 ms
Rack1R1(config)#
*Feb 16 16:29:24.525: %PIM-5-NBRCHG: neighbor 145.1.125.5 UP on interface Serial0/0/0
*Feb 16 16:29:24.529: %PIM-5-DRCHG: DR change from neighbor 145.1.125.1 to 145.1.125.5 on interface Serial0/0/0
Rack1R1(config)#
As soon as we re-enable it, things start working again! And OSPF, and BGP, and PIM.
What other things can we do? How about this.... Can you assign 140.100.0.1/24 to Fa0/0 for me? Sounds simple...
TestRouter(config)#do sh run int f0/0
Building configuration...
Current configuration : 92 bytes
!
interface FastEthernet0/0
no ip address
no ip route-cache
duplex auto
speed auto
end
TestRouter(config)#int f0/0
TestRouter(config-if)#ip addr 140.100.0.1 255.255.255.0
Bad mask /24 for address 140.100.0.1
TestRouter(config-if)#
What happened there? I swear I've typed IP addresses like that for years. Really, my children can probably type that IP address correctly. What's wrong? How about "no ip subnet-zero" in your configuration? Would you discover it? Perhaps, perhaps not. But it can be very frustrating if you haven't seen it before! On a side note, if you configure addresses and THEN use "no ip subnet-zero" your existing interfaces will work just fine, but any new ones within the first available subnet (subnet-zero) cannot be used!
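To see why the router balks, it helps to spell out the classful math. Here is a minimal Python sketch (nothing IOS itself runs; the function names classful_prefix and is_subnet_zero are made up for illustration) that checks whether an address lands in subnet zero of its classful network, assuming plain class A/B/C rules:

```python
import ipaddress


def classful_prefix(addr):
    """Return the classful (major network) prefix length for an IPv4 address."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8    # class A
    if first_octet < 192:
        return 16   # class B
    if first_octet < 224:
        return 24   # class C
    raise ValueError("class D/E addresses have no classful network")


def is_subnet_zero(ip, mask):
    """True if ip/mask sits in the all-zeros subnet of its classful network."""
    iface = ipaddress.ip_interface(f"{ip}/{mask}")
    major = classful_prefix(iface.ip)
    plen = iface.network.prefixlen
    if plen <= major:
        return False  # not subnetted below the classful boundary
    net_int = int(iface.network.network_address)
    # Isolate the subnet bits that lie between the classful mask and the real mask.
    subnet_bits = (net_int >> (32 - plen)) & ((1 << (plen - major)) - 1)
    return subnet_bits == 0


print(is_subnet_zero("140.100.0.1", "255.255.255.0"))
print(is_subnet_zero("140.100.1.1", "255.255.255.0"))
```

140.100.0.1 belongs to the class B network 140.100.0.0, and with a /24 mask its subnet bits (the third octet) are all zero, so the first call reports True. Move to 140.100.1.1 and the same mask is accepted.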
Fun, huh?
Ever see an OSPF interface not work? Sure... but here's one. What can cause this?
Before:
Rack1R1(config-if)#do sh ip o i b
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 150.1.1.1/24 1 LOOP 0/0
VL7 1 0 145.1.125.1/24 64 P2P 1/1
VL6 1 0 145.1.13.1/24 64 P2P 1/1
VL5 1 0 145.1.12.1/24 64 P2P 1/1
VL4 1 0 145.1.17.1/24 1 P2P 1/1
Fa0/0 1 17 145.1.17.1/24 1 BDR 1/1
Se0/1/0.13 1 123 145.1.13.1/24 64 P2P 1/1
Se0/1/0.12 1 123 145.1.12.1/24 64 P2P 1/1
Se0/0/0 1 125 145.1.125.1/24 64 P2MP 1/1
Rack1R1(config-if)#
and After:
Rack1R1(config-if)#do sh ip o i b
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 150.1.1.1/24 1 LOOP 0/0
VL7 1 0 145.1.125.1/24 64 P2P 1/1
VL6 1 0 145.1.13.1/24 64 P2P 1/1
VL5 1 0 145.1.12.1/24 64 P2P 1/1
VL4 1 0 0.0.0.0/0 65535 DOWN 0/0
Fa0/0 1 17 145.1.17.1/24 1 DR 0/0
Se0/1/0.13 1 123 145.1.13.1/24 64 P2P 1/1
Se0/1/0.12 1 123 145.1.12.1/24 64 P2P 1/1
Se0/0/0 1 125 145.1.125.1/24 64 P2MP 1/1
Rack1R1(config-if)#
The virtual-link VL4 is down (could be many things), but I've also lost a neighbor on Fa0/0. You probably wouldn't see a change like this on a live, running network (because that would usually imply something you did!), but let's pretend I'm setting OSPF up for the first time and my neighbor on a Fast Ethernet link is not coming up (see above: it does work, and there's nothing on the other side causing a problem, I promise). So you check the local configuration....
Rack1R1(config-if)#do sh run | s ospf
ip ospf network point-to-multipoint
ip ospf authentication-key CISCO12
ip ospf authentication null
router ospf 1
log-adjacency-changes
area 0 authentication message-digest
area 17 virtual-link 150.1.7.7 message-digest-key 1 md5 CISCO
area 123 authentication
area 123 virtual-link 150.1.3.3 message-digest-key 1 md5 CISCO
area 123 virtual-link 150.1.2.2 message-digest-key 1 md5 CISCO
area 125 virtual-link 150.1.5.5 message-digest-key 1 md5 CISCO
network 145.1.12.1 0.0.0.0 area 123
network 145.1.13.1 0.0.0.0 area 123
network 145.1.17.1 0.0.0.0 area 17
network 145.1.125.1 0.0.0.0 area 125
network 150.1.1.1 0.0.0.0 area 0
Rack1R1(config-if)#
Rack1R1(config-if)#do sh ip o i f0/0
FastEthernet0/0 is up, line protocol is up
Internet Address 145.1.17.1/24, Area 17
Process ID 1, Router ID 150.1.1.1, Network Type BROADCAST, Cost: 1
Transmit Delay is 1 sec, State DR, Priority 1
Designated Router (ID) 150.1.1.1, Interface address 145.1.17.1
No backup designated router on this network
Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
oob-resync timeout 40
No Hellos (Passive interface)
Supports Link-local Signaling (LLS)
Index 1/7, flood queue length 0
Next 0x0(0)/0x0(0)
Last flood scan length is 2, maximum is 15
Last flood scan time is 0 msec, maximum is 0 msec
Neighbor Count is 0, Adjacent neighbor count is 0
Suppress hello for 0 neighbor(s)
Rack1R1(config-if)#
Ok, there's a hint.... Passive interface. But look in the OSPF section. There is nothing about passive-interface up there!
Rack1R1(config-if)#do sh run | in passive
Rack1R1(config-if)#
In fact, there's nothing on the entire router about passive-interface! There's a new one! We know that loopbacks are automatically treated as stub hosts unless otherwise specified, but there's nothing to automatically treat a FastEthernet as passive is there? Particularly not in the "router ospf" section!?!?
Rack1R1(config-if)#do sh run int f0/0
Building configuration...
Current configuration : 135 bytes
!
interface FastEthernet0/0
ip address 145.1.17.1 255.255.255.0
ip pim sparse-mode
duplex auto
speed auto
no routing dynamic
end
Rack1R1(config-if)#
While it may seem like an innocuous command, and one you may have never seen before... the "no routing dynamic" will cause this passive behavior without typing "passive-interface fa0/0" anyplace! And if you aren't looking for it, we can see all sorts of issues with it.
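If you capture your show output to text files, the mismatch we just chased can be checked mechanically. This is only a rough sketch in Python, assuming you have the "show ip ospf interface" output and the running config as strings; the function name unexplained_passive is hypothetical:

```python
def unexplained_passive(ospf_if_output, running_config):
    """True when OSPF reports a passive interface but no passive-interface
    statement appears anywhere in the running config."""
    reports_passive = "No Hellos (Passive interface)" in ospf_if_output
    configured = any(
        line.strip().startswith("passive-interface")
        for line in running_config.splitlines()
    )
    return reports_passive and not configured


# Sample text lifted from the output shown earlier in this post.
ospf_out = """FastEthernet0/0 is up, line protocol is up
  No Hellos (Passive interface)
"""
config = """interface FastEthernet0/0
 ip address 145.1.17.1 255.255.255.0
 no routing dynamic
router ospf 1
 log-adjacency-changes
"""
print(unexplained_passive(ospf_out, config))
```

When this comes back True, the passive behavior is coming from somewhere other than "passive-interface", and the interface configuration is the next place to look.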
So there's a couple more things to be paranoid about for your lab exam. But how do we check for them? Honestly, as part of your initial discovery, I'd do a very simple command!
Rack1R1(config)#do sh run | in no\
no service password-encryption
no ip subnet-zero
no ip domain lookup
no dspfarm
no routing dynamic
no ip address
no frame-relay inverse-arp IP 102
no frame-relay inverse-arp IP 103
no frame-relay inverse-arp IP 104
no frame-relay inverse-arp IP 113
no ip address
no synchronization
no auto-summary
no ip http secure-server
Rack1R1(config)#
There's a space after the "\" character up there (the backslash tells the regular expression that the space to follow is a literal character to match). That way you avoid anything that merely contains "no" and only get "no " as a match.
There may be many things shown (as above) that we really don't care about. But some things like the "subnet-zero" and the "routing dynamic" should definitely leap out at us! A quick scan on each device like this can save HOURS of troubleshooting later.
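The same scan can be done offline against a saved config. Below is a small Python sketch that mirrors the "include" filter above by keeping every line containing "no " and flagging the three commands from this post. The watch list and the name scan_config are my own assumptions for illustration, so extend the list with whatever has bitten you before:

```python
# Commands from this post that should make you stop and look (an assumed, partial list).
SUSPECT = ("no ip routing", "no ip subnet-zero", "no routing dynamic")


def scan_config(text):
    """Return (line, flagged) pairs for every config line containing 'no '."""
    hits = []
    for line in text.splitlines():
        if "no " not in line:
            continue
        stripped = line.strip()
        hits.append((stripped, stripped.startswith(SUSPECT)))
    return hits


sample = """hostname Rack1R1
no ip subnet-zero
interface FastEthernet0/0
 ip address 145.1.17.1 255.255.255.0
 no routing dynamic
router bgp 100
 no synchronization
"""
for text, flagged in scan_config(sample):
    print(("!! " if flagged else "   ") + text)
```

Flagged lines get a "!!" prefix; everything else is printed for context, just like the raw router output.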
The fixes are simple!
Rack1R1(config)#ip sub
Rack1R1(config)#int f0/0
Rack1R1(config-if)#rou dyn
Rack1R1(config-if)#
*Feb 16 17:06:49.197: %OSPF-5-ADJCHG: Process 1, Nbr 150.1.7.7 on FastEthernet0/0 from LOADING to FULL, Loading Done
Rack1R1(config-if)#
*Feb 16 17:07:04.225: %OSPF-5-ADJCHG: Process 1, Nbr 150.1.7.7 on OSPF_VL4 from LOADING to FULL, Loading Done
Rack1R1(config-if)#
Within about 10 seconds, things are back to working order again.
Never fear.... There's more coming.... Will our router be boiled in hot lava? Will our switch get hijacked by the proctor? Have we been punk'd by the lab? Stay tuned next time for the exciting conclusion to CCIE Troubleshooting!