Posted on Jun 17, 2016
We know that many corporate installations nowadays use Nagios as their main monitoring system for networks, systems and applications. Also, as we mentioned in the article on the best network monitoring tools, Zabbix has been taking slices of Nagios’ cake for a long time. Many doubts arise when it comes to choosing the ideal monitoring tool for an installation, and this is precisely why we’ve gotten down to work today to analyze both systems in depth. As expected, we also brought Pandora FMS into the comparison, for perspective.
Nagios is considered by some (mainly those who have spent some time in the IT world) to be the “industry standard” when it comes to Open Source monitoring. And it’s somewhat true, because it was the first to actually get it right. Before Nagios there were tools, but they were so amateur, or so focused on specific tasks, that they were not even close to the innovation Nagios brought in.
The first version of Nagios was introduced last century, in 1999. It’s been 17 years and technology has come a long way. Nagios has kept up by creating an ecosystem of “addons”, or third-party complements, that try to compensate for the lack of features.
Zabbix came along in 2001. It’s a full-blown development, not a simple Nagios fork, and its main characteristic is that it takes a very holistic view of monitoring. It covers performance, not only statuses, which is one of the most significant gaps in Nagios. It also has a web management system that allows central management, without the pesky configuration files that Nagios needs.
Pandora FMS was born in 2004. Just like Zabbix, it’s entirely developed from the ground up. Its main feature is that it’s more than just an IT monitoring system: it’s a monitoring framework that allows anything from infrastructure monitoring (networks and servers) to performance and application monitoring (APM), and even transactional business monitoring (BAM). Just like other modern systems, it has a central management system and it’s based on a relational SQL database. Just like Nagios, it has an “Enterprise” edition, but its Open Source version is more than enough to cover any monitoring need. Neither Nagios nor Pandora FMS offers a “limited” free edition, as other manufacturers do; rather, their free editions are just missing a few features aimed at larger environments.
Management and Setup
Here is where we see the most significant differences among these systems. Zabbix has a web-based management interface that is centralized through its database, just like Pandora FMS. Nagios, however, is still stuck in the 90s and is still managed in thousands of places through a complex tangle of interlaced text files, scripts and manual procedures, which also makes third-party tools like Chef or Puppet necessary for its deployment.
This has (or had) the advantage that Nagios, since it doesn’t use a database to store information, needs fewer resources. But nowadays the bottleneck is no longer the hardware; it’s the ability to manage configuration efficiently, and Nagios is the opposite of easy in this respect. Management is difficult enough that, rather than just having Nagios installed, you end up with an entire team dedicated to managing Nagios: the software on its own, without a team of people, cannot be exploited properly.
Nagios (and some of its newer forks, like Naemon) still uses CGIs written in C. CGI dates back to the early 90s, and it’s not that it’s bad technology (it’s actually fast and very solid), but it is complicated to expand or improve on. To make a simple change, it’s necessary to patch the code of a monolithic architecture and compile it manually. Let’s remember that the Nagios ecosystem is based on hundreds of different patches for different versions of each fork. It’s literally a bazaar. Let’s also bear in mind that Nagios’ configuration is based on text files, so each time a change is made, a restart is necessary.
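To make the text-file workflow a bit more concrete, here is a minimal Python sketch that renders Nagios host object definitions from a small inventory. The `define host { ... }` block follows the usual Nagios object syntax, but the inventory, the `generic-host` template name and the helper functions `render_host` and `render_hosts` are assumptions made for illustration, not part of any of the tools compared here.

```python
from typing import Dict, List


def render_host(host: Dict[str, str], template: str = "generic-host") -> str:
    """Render one Nagios host object definition as plain text.

    The template name is an assumption; use whatever host template
    your own configuration defines.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {host['name']}",
        # Fall back to the host name when no alias is given.
        f"    alias      {host.get('alias', host['name'])}",
        f"    address    {host['address']}",
        "}",
    ]
    return "\n".join(lines)


def render_hosts(hosts: List[Dict[str, str]]) -> str:
    """Render all hosts, separated by blank lines, ending with a newline."""
    return "\n\n".join(render_host(h) for h in hosts) + "\n"


# Hypothetical inventory, using documentation-only IP addresses.
INVENTORY = [
    {"name": "web01", "address": "192.0.2.10"},
    {"name": "db01", "alias": "Primary database", "address": "192.0.2.20"},
]

if __name__ == "__main__":
    print(render_hosts(INVENTORY), end="")
```

Even with a generator like this, the resulting file still has to be copied into place and Nagios restarted before the change takes effect, which is the management overhead described above.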
If Nagios is the bazaar paradigm, Zabbix and Pandora FMS are the exact opposite: the cathedral. They’re solid projects with a complex, modular architecture that has grown over time under a design directed by their own teams of architects. Neither Zabbix nor Pandora FMS has forks. Both Nagios and Pandora FMS have “Enterprise” versions; Zabbix doesn’t. The Zabbix model seems to be based on support and implementation services, along with technical training.
We compared Zabbix vs. Nagios vs. Pandora FMS regarding plugins and “out of the box” monitoring
Zabbix and Nagios both need a lot of plugins installed in order to be efficient and offer a complete set of features. Zabbix, however, doesn’t have an “official” plugin library for the community, although it does have a list of OIDs for SNMP queries. Furthermore, its core doesn’t offer the possibility of working with enterprise tools such as Oracle, Exchange, Active Directory and others.
Nagios has a huge library, but it’s poorly maintained, since all the plugins are 100% open source and there’s no company to back them or take care of them.
Pandora FMS has a smaller library than Nagios (it doesn’t even reach 500 plugins), but it’s maintained by a company and, leaving aside the fact that some of those plugins are “Enterprise” (under paid licensing), it’s very focused on “real” everyday products, not exclusively on open technology. Pandora FMS, also in its Open Source version, has a default collection of “plug and play, ready to use” plugins and modules meant for simpler tasks, both with agents and with remote checks. It also includes an SNMP explorer and a set of SNMP and WMI wizards to remotely monitor network devices and servers.
Zabbix has a template and trigger definition system based on regular expressions. It’s quite powerful, yet complex to use: it’s only meant for people who understand regular expressions. Nagios has nothing of the sort (in its Open version at least), and in Pandora FMS it has been replaced by screens and wizards in its web interface, which are much friendlier to use.
In order to monitor with Nagios, you have to get used to dealing with hundreds of custom scripts that, when written by someone else, become something close to black magic. It’s very complicated for several people to manage it. In the end, Nagios becomes a strange mix of software and custom development.
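For readers who have never seen one of these custom scripts, here is a minimal sketch of what a Nagios-style check might look like in Python. It relies on the widely used plugin convention (exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, plus one line of output with optional performance data after a `|`). The disk check itself, the thresholds and the function names `evaluate` and `main` are assumptions chosen for illustration.

```python
import shutil
from typing import Tuple

# Conventional Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Map a disk usage percentage to a status code and an output line.

    The thresholds are illustrative defaults, not recommendations.
    """
    if used_pct >= crit:
        status = CRITICAL
    elif used_pct >= warn:
        status = WARNING
    else:
        status = OK
    # Human-readable text first, then performance data after the pipe.
    message = (
        f"DISK {STATUS_NAMES[status]} - {used_pct:.1f}% used"
        f" | used_pct={used_pct:.1f}%;{warn:.0f};{crit:.0f}"
    )
    return status, message


def main(path: str = "/") -> int:
    """Check one filesystem, print the result line and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    status, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return status

# A real plugin script would finish by handing the code back to Nagios,
# for example with: sys.exit(main())
```

Multiply this by a few hundred scripts, each with its own author, thresholds and output quirks, and the “black magic” feeling becomes easy to understand.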
In order to use Nagios correctly, you need not only Nagios but also several community “addons” (check_mk, pnp4nagios, OMD, NRPE, NSCA, ndoutils, thruk, nagvis), plus other complete, complex projects (such as Puppet) to manage configurations and, of course, thousands of lines of home-made scripts. Zabbix and Pandora FMS are self-sufficient in this sense.
We also compared these three based on their respective communities
The biggest community belongs to Nagios, simply because it got there first. As a matter of fact, Nagios has an almost endless list of forks: OpsView, OP5, Centreon, Icinga, Naemon, Shinken, and more. This makes for a chaotic ecosystem when it comes to applying plugins or tools from one fork to another. Each branch has a different philosophy, and over time this makes it totally incompatible with the other branches and with the parent project (Nagios).
We compared Zabbix, Nagios and Pandora regarding their reports
Zabbix, Nagios and Pandora FMS all have the concept of a “customizable user screen”. On Nagios a separate add-on (nagvis) is needed, but on Zabbix and Pandora FMS this is built in. That said, the best visual results are definitely obtained with Pandora FMS.
The reports that Nagios can generate are quite poor. Zabbix improves on this a little, but the concept of a report understood as something to “turn in to a customer or boss” is only available on Pandora FMS. Even in its “free” version, it has a very powerful report generator that allows much more customization than those of Zabbix or Nagios.
We compared the visual graphs for all three as well
Nagios has historically needed third-party plugins to perform this task. Recent forks include it by default, but the graphs are still oriented to communications, with little margin for custom features. Nagios and graphs have always had a “complicated” relationship, considering that Nagios was originally meant for event management, not data management.
Zabbix has its own graphs, but the graphs on Pandora FMS are generated in real time from the database. This allows the data to be used for combined graphs, scale changes, and custom colors, sizes and graph keys, so that the graphs become an active part of the information: not only a technical graph, but also part of a complete report.
Although some people consider agent-based monitoring technology “demodé” or outdated, the truth is that very large manufacturers (CA, HP, IBM) sometimes mask their remote technology to make it seem 100% agentless, when what they’re actually doing is copying an agent, running it, and then deleting it. For many monitoring tasks an agent is still a necessary element on the device. Nagios has many (NRPE, NCPA, NRDP, and others) that, like most other things on Nagios, are meant to be quite DIY. On many occasions this leads to a lack of maintenance, or to some of them being outdated. The fact that there are different agents for a single platform is very consistent with the Nagios mindset. Zabbix also has many more complex features “built in” to the agent itself, such as native event gathering (using an API that comes from Windows NT4 and ensures compatibility and speed, nothing like WMI methods), inventory gathering, service and process watchdog, real time log gathering for process and service downtimes, native user interface for WMI, registry for parameters from the performance counter, integrated network checks on the agent, and many other features that cannot be implemented through “scripts” or commands, since they require the agent to work at a low level instead of at user level.
Last but not least: Scalability
It’s not easy to know “who’s got the bigger one” in this case, but if we go by the public success stories published on each respective website, the most complex project for which a customer has published a case with numbers and measurements is that of Rakuten, from Japan. They use Pandora FMS to monitor almost 10,000 nodes. There are unpublicized Pandora FMS installations that use the Open Source edition to monitor over 30,000 nodes, and theoretically, with the distributed architecture included in version 6.0 (on the Enterprise edition), you can reach a million nodes. The official Pandora FMS documentation recommends 3,000 agents per server.
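As a rough back-of-the-envelope illustration of what the 3,000 agents per server recommendation implies, here is a small Python sketch. It assumes one agent per node and that the recommendation scales linearly across servers; both are simplifying assumptions of ours, and the function name `servers_needed` is hypothetical.

```python
# Figure quoted above from the Pandora FMS documentation.
RECOMMENDED_AGENTS_PER_SERVER = 3000


def servers_needed(agents: int, per_server: int = RECOMMENDED_AGENTS_PER_SERVER) -> int:
    """Return the number of servers needed, rounding up to whole servers.

    Assumes one agent per node and perfectly linear scaling, which a
    real deployment will not match exactly.
    """
    if agents < 0 or per_server <= 0:
        raise ValueError("agents must be >= 0 and per_server must be > 0")
    # Integer ceiling division, avoiding floating point rounding.
    return (agents + per_server - 1) // per_server


if __name__ == "__main__":
    for agents in (10_000, 30_000, 1_000_000):
        print(f"{agents:>9,} agents -> {servers_needed(agents)} server(s)")
```

Under those assumptions, an installation of about 10,000 nodes works out to 4 servers and one of 30,000 nodes to 10. Actual sizing depends on check intervals, hardware and architecture, so treat these numbers as an order of magnitude only.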
Nagios has a wide array of ways, each more artisanal than the next, to offer distributed monitoring. Zabbix and Pandora adopt a similar model, although Pandora has a specific product (its Metaconsole) for distributed, complex and large environments.
With this we hope you can get an idea of the advantages and disadvantages of these three monitoring systems. If you have any questions, information you feel is missing, or general comments, we invite you to leave them in the comments section.