The Network Engineer’s Question: Why Learn Ansible?

In today's blog, INE instructor Keith Bogart discusses the rise of automation and scripting technologies in Cisco exams, and why it's worth it for Networking certification candidates to learn Ansible.

It used to be that when someone set out to obtain a computer networking certification (such as Cisco’s CCNA or CCIE), everybody knew what that meant. “Networking” primarily meant Routing and Switching, with maybe a few other irritating topics thrown in just to help sprout some more gray hairs. For the most part, candidates could be assured that if they knew the command-line interface (CLI) of the platform they were studying (such as Cisco or Juniper), then whatever they encountered on the exam could be configured within the CLI.

However, when Cisco came out with a major revamping of all its certifications in early 2020, a new and unknown topic was suddenly thrust upon students...AUTOMATION! Thrown into that terrifying category were terms that, it had previously been assumed, only the “server people” needed to know, such as Chef, Puppet, Ansible and (gasp) Python. Network Engineers who had for years relied on their deep understanding of Cisco’s IOS software CLI to pass any Cisco exam placed in front of them were now faced with the reality that they would need to learn something new and very different from what they were used to.

Some embraced this new challenge while others were (and still are) quite resistant. I have even personally known people who became angry at the idea that automation and scripting technologies are now a significant part of the current Cisco exams. A common refrain goes something like this: “Why do I have to learn Python or Ansible? Learning how to make a feature or protocol work using those technologies is even more complex and time-consuming than if I just continued to log in to my device and use the CLI!”

From the standpoint of the initial configuration of protocols and features on a single box (e.g., a router, switch, or firewall), the argument above is definitely valid. The true value of an application like Ansible lies in the concept of “scale”.

There are two kinds of networks in the world. The first kind I would label as “set and forget”. This is the type of network in which devices, once they have been configured with their appropriate protocols and features, are quickly forgotten and never touched. You’ll often find this type of network in a SOHO (Small Office/Home Office) environment. So what is the role of Ansible in this type of network? Even a network that rarely experiences protocol or feature changes could still benefit from the scalability Ansible provides when it comes to troubleshooting, security monitoring and software upgrades.

Let’s imagine that someday you are suddenly confronted with what could potentially be a network issue (Internet access is down, an internal server resource can’t be reached, the network is “slow”, etc). If you don’t already own some kind of Network Monitoring software, you may be used to running a series of tried-and-true CLI commands on several of your routers and switches in an attempt to isolate which device is at the root of the problem. Perhaps your method is something like this:

Start with the router or switch closest to the employee with the complaint.
Issue a series of commands on that device such as:

ping <destination IP>
traceroute <destination IP>
show interfaces | <regular expression matching>
show processes cpu
show version
Etc, etc

If the local router or switch is discovered NOT to be the culprit, you might move to the next networking device and repeat the same steps until the offending device is found.

You can imagine that if a procedure such as the one described above required you to SSH into dozens (or hundreds) of devices one at a time, run all of your troubleshooting commands, gather output into a text file, scan the file for something suspicious and then move on to the next device...you could be looking at a LONG time before the offending device was identified. This is a perfect scenario where Ansible’s built-in scalability could save you a ton of time.

Your first Ansible-related step would be to list all of your network devices in an Ansible Hosts file (called an “inventory” in Ansible terminology), which gives Ansible the information it needs, such as IP addresses and SSH usernames and passwords, to connect to each device. For the “set and forget” network this would only need to be created once. Then you would create a YAML file (in Ansible terminology called a “Playbook”) that targets the devices in your Ansible Hosts file and includes each of the commands you would normally issue manually on those devices. That same Playbook could also instruct Ansible to take the output your devices return from each command and save it in individual files...or even one combined file. After running the Ansible Playbook, you could then use grep or some other search tool to quickly parse through those files for text that would narrow down your search. And the best part of this entire process is that it’s repeatable! Unless you delete it, that Ansible Playbook will always be available the next time network-related trouble finds its way to you.
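To make the "search the saved output" step a little more concrete, here is a minimal Python sketch of the grep-like pass you might run after the Playbook finishes. It assumes the Playbook saved one text file per device into a directory; the directory layout, the function names and the list of "suspicious" patterns are all my own illustrative assumptions, not something Ansible provides.

```python
import re
from pathlib import Path

# Hypothetical patterns that often hint at trouble in IOS-style "show" output.
# Tune these for your own network; they are illustrative assumptions only.
SUSPICIOUS = [
    re.compile(r"line protocol is down", re.IGNORECASE),
    re.compile(r"\b[1-9]\d* (input|output) errors"),
    re.compile(r"CPU utilization for five seconds: (9\d|100)%"),
]


def scan_text(device, text):
    """Return (device, line_number, line) for every suspicious line in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            hits.append((device, number, line.strip()))
    return hits


def scan_outputs(directory):
    """Scan every <device>.txt file in directory (an assumed layout)."""
    hits = []
    for path in sorted(Path(directory).glob("*.txt")):
        hits.extend(scan_text(path.stem, path.read_text()))
    return hits
```

Calling scan_outputs on the directory your Playbook wrote to would give you a short list of devices and lines worth a closer look, instead of hundreds of screens of raw output.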

The same benefit of Ansible holds true when it comes to mass upgrades of your networking software. By running a single Ansible Playbook, one could perform software updates across routers and switches in a matter of minutes rather than hours or days.

Our second type of network belongs to a company (or government entity or ISP) that is large and experiences changes frequently. Perhaps it is a network shared by different customers that come and go, necessitating the frequent editing of Access Control Lists. Or it might be an ISP’s network that requires frequent tuning of routing filters and policies. If these types of configuration changes need to be implemented on several devices at once, Ansible can be leveraged so that:

Network downtime is minimized as Ansible is able to SSH into several devices simultaneously and push configuration policy changes en masse.
Configuration changes are consistent and less error-prone because they don’t rely on a human logging into routers and switches (one-by-one) and typing the same commands over and over.
Ansible Playbooks which have been stored in logical file systems provide a historical record of the changes that were made and are easily repeatable should those same changes need to be implemented on different devices in the future.
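The idea behind the first two points, sending one identical set of lines to many devices at the same time, can be sketched in a few lines of Python. This is not how Ansible is implemented; push_config is a hypothetical stand-in that only sleeps instead of opening an SSH session, and the device names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def push_config(device, lines):
    """Stand-in for an SSH session; a real version would send lines to device."""
    time.sleep(0.1)  # pretend the device takes a moment to accept the change
    return device, len(lines)


def push_to_all(devices, lines, workers=10):
    """Push the same lines to every device, several devices at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda device: push_config(device, lines), devices)
        return dict(results)  # maps each device to the number of lines sent
```

Because every device receives exactly the same list of lines, there is no opportunity for a typo on device number 37, and the elapsed time is governed by the slowest device rather than by the sum of all of them.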

Then there is always the scenario in which an intruder gains access to one of your routers, switches or firewalls and implements some unauthorized configuration. Perhaps this person was a former employee who removed a line or two of your Access Control List configuration, leaving a backdoor open for themselves. Wouldn’t it be nice to scan all of your networking devices on a periodic basis, comparing their current configurations against some saved “Golden Configs” to detect these types of changes? An Ansible Playbook can easily do this and, if you find a device with an unauthorized configuration change, Ansible can quickly be leveraged to restore your “Golden Config” onto the compromised device.
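The comparison at the heart of that "Golden Config" check is just a text diff. Here is a minimal Python sketch using the standard difflib module, assuming you already have the golden and running configurations as strings; the function name and the sample ACL lines are hypothetical.

```python
import difflib


def config_drift(golden, running):
    """Return lines removed from ("- ") or added to ("+ ") the golden config."""
    diff = difflib.ndiff(golden.splitlines(), running.splitlines())
    # Keep only real additions and removals; skip unchanged and "? " hint lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical example: one deny statement has quietly disappeared.
golden = (
    "access-list 101 deny ip any host 10.0.0.5\n"
    "access-list 101 permit ip any any\n"
)
running = "access-list 101 permit ip any any\n"

for change in config_drift(golden, running):
    print(change)
```

An empty result means the device still matches its golden copy; anything else is a change somebody should be able to explain.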

These are but a few examples of the power of Ansible. Hopefully, if you were one of those who were resistant to its charms, you have begun to see how Ansible could truly be beneficial to you.