There are a few religions in the world of monitoring: pushing versus pulling metrics, agents versus agentless, and files versus databases for storing config are some common examples.
On the files side of the fence, the clear choice for many people is still Nagios. In the new world of DevOps, cloud and automation, there is a desire to have everything written in text files and stored in a version control system.
To make life easier, configuration management modules abstract away the file creation and provide a model for which checks go where and which alerts need to be configured. For all the power you get from doing things this way, you lose some of the setup friendliness.
A few years after Nagios was released, everyone started building tools as web apps backed by databases. You can create nice, slick interfaces, store UI state, and display metrics on pretty graphs saved as dashboards. Zabbix is probably the best example of what can be done when you store everything in one big MySQL database. It has an API, and you can still configure the agents using config management, but the actual monitoring config is locked away and can’t easily be viewed or diffed the way Nagios config changes can in source control.
When you build a monitoring system from scratch, you get to try new things. One thing we wanted to try was storing everything in the database in a human-readable text format. This means referencing configuration objects by name and designing a data model that maps to directories and files behind the scenes when exporting and importing. This isn’t something you can easily change later, so we’ve stuck closely to this philosophy in everything we build.
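To make the idea concrete, here is a minimal Python sketch of a name-based data model that round-trips through files and folders. It assumes a hypothetical `<kind>/<name>.json` layout, JSON serialisation, and filesystem-safe object names; it illustrates the approach rather than Outlyer’s actual export format. Note that the rule refers to its dashboard by name, not by an opaque database ID, which is what keeps the exported files readable and diffable.

```python
import json
import tempfile
from pathlib import Path


def export_config(config: dict, root: Path) -> None:
    """Write each named object to <root>/<kind>/<name>.json (hypothetical layout)."""
    for kind, objects in config.items():
        kind_dir = root / kind
        kind_dir.mkdir(parents=True, exist_ok=True)
        for name, body in objects.items():
            # Sorted keys and fixed indentation keep diffs stable in version control.
            text = json.dumps(body, indent=2, sort_keys=True) + "\n"
            (kind_dir / f"{name}.json").write_text(text, encoding="utf-8")


def import_config(root: Path) -> dict:
    """Rebuild the config dict from the tree written by export_config."""
    config = {}
    for kind_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # The file name (minus .json) is the object's name.
        config[kind_dir.name] = {
            path.stem: json.loads(path.read_text(encoding="utf-8"))
            for path in sorted(kind_dir.glob("*.json"))
        }
    return config


if __name__ == "__main__":
    # Hypothetical objects: the rule references the dashboard by its name.
    example = {
        "dashboards": {"web-servers": {"widgets": ["cpu", "memory"]}},
        "rules": {"high-cpu": {"dashboard": "web-servers", "threshold": 90}},
    }
    with tempfile.TemporaryDirectory() as tmp:
        export_config(example, Path(tmp))
        assert import_config(Path(tmp)) == example
        for path in sorted(Path(tmp).rglob("*.json")):
            print(path.relative_to(tmp))
```

Because the on-disk form is derived purely from names, exporting the same objects twice produces identical files, so a version control diff shows only real changes.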
You get a few cool benefits from this design. The most noticeable is a flashy web interface that automatically generates config behind the scenes. This drives up adoption outside of operations and, in a lot of cases, speeds up making quick changes. It also provides all the benefits you get from a tool like Zabbix, including the pretty graphs and host information pages.
It also means we can easily export a group of plugins, dashboards and alert rules and share them in a common format via our Packs Library. You create them in the UI, then save them as files and folders, which makes them portable.
However, that still doesn’t help the ops people who love to edit text files and push them into config management. In the past we added a few features, such as solo mode, to help people who want to drive plugin deployment from config management. But this only solves the problem of storing plugins and doesn’t help with storing everything else.
Ideally both use cases should be able to co-exist, with the GUI people and the command line people all working the way they want to work. We’re a step closer to that being a reality today with the release of new Outlyer Command Line Tool features for backing up the configuration of entire organisations and accounts to disk.
The ‘dlcli backup org’ command exports all of your configuration to human-readable files and folders on disk, so they can be committed automatically to your source control system. You can then restore organisations, accounts or individual objects from any point in time.
Details on how to set this up on a Linux backup server are here:
https://support.outlyer.com/hc/en-gb/articles/208281033-Backup-and-Restore
So now, whenever anyone uses the web interface to change any configuration in Outlyer, we get a commit of the difference in human-readable format in GitHub and a link to the diff posted in Slack via a webhook. It’s pretty cool seeing the list of changes, and it has already helped us restore one accidentally deleted dashboard.
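The Slack notification step can be sketched in a few lines of Python. This assumes the backup repository is hosted on GitHub and that notifications go to a Slack incoming webhook, which accepts a JSON body with a `text` field; the helper names, repository and file names are hypothetical, and this is not necessarily how the setup described above was built.

```python
import json
import urllib.request


def commit_url(owner: str, repo: str, sha: str) -> str:
    """GitHub web URL showing a single commit's diff."""
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


def slack_payload(diff_url: str, changed_files: list[str]) -> dict:
    """Build a Slack message that links to the diff and lists changed config files."""
    # <url|label> is Slack's link syntax in message text.
    lines = [f"Monitoring config changed: <{diff_url}|view diff>"]
    lines += [f"- {name}" for name in changed_files]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

After the backup job commits, the list of changed files could come from `git diff --name-only HEAD~1 HEAD` and the SHA from `git rev-parse HEAD`, so only commits that actually change something produce a message.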
That solves the problem of mirroring configuration in Git so it can be reviewed or restored. But we can’t yet drive all config in a bi-directional way. To do that properly, we need to add a Git remote endpoint to Outlyer, so that plugins, dashboards, rules and links can all be created locally in a text editor, tested remotely using ‘dlcli run’ and then pushed up directly via ‘git push Outlyer master’.
GitHub has managed to keep both technical and less technical people happy with clever interface design, which inspires us to try to do the same.
We’re not far off the dream now. Hopefully this is an interesting approach for anyone who enjoys thinking about monitoring system design.