Monitoring

cAdvisor

This was the perfect service for capturing metrics about my Docker containers. One thing that was kind of crazy to deal with was cAdvisor's own resource usage, so I had to find some arguments to add to the container to limit the amount of stats it captures.
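A minimal sketch of the kind of arguments that cut down cAdvisor's collection work, written as a small Python helper that returns the flags to pass to the container. `--housekeeping_interval`, `--docker_only`, `--disable_metrics`, and `--store_container_labels` are real cAdvisor flags, but the values chosen here are assumptions for illustration, not the exact arguments used in this setup.

```python
# Sketch: cAdvisor flags that reduce how much it collects.
# Flag names are real cAdvisor options; the values are illustrative assumptions.

# Metric groups to switch off. Passing --disable_metrics replaces cAdvisor's
# built-in default list rather than adding to it, so the groups that are
# normally off by default are repeated here alongside the extra ones.
DISABLED_METRICS = [
    # off by default in cAdvisor
    "advtcp", "cpu_topology", "cpuset", "hugetlb", "memory_numa",
    "process", "referenced_memory", "resctrl", "sched", "tcp", "udp",
    # additionally disabled in this sketch
    "percpu",  # per-core series multiply with the number of CPUs
    "disk",
    "diskIO",
]


def cadvisor_args(housekeeping_interval: str = "30s") -> list[str]:
    """Return cAdvisor command-line flags that lower its collection overhead."""
    return [
        # Collect less often than the 1s default.
        f"--housekeeping_interval={housekeeping_interval}",
        # Only report Docker containers instead of every cgroup on the host.
        "--docker_only=true",
        # Skip metric groups that the dashboards do not need.
        "--disable_metrics=" + ",".join(DISABLED_METRICS),
        # Do not turn every container label into a metric label.
        "--store_container_labels=false",
    ]


if __name__ == "__main__":
    print(" ".join(cadvisor_args()))
```

The printed string can be dropped into the container's command. A longer housekeeping interval trades resolution for lower CPU usage, which is usually fine for dashboards that look at trends over minutes or hours.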

Gatus

Getting a bit frustrated with Uptime Kuma's slow performance when dealing with many endpoints, I decided to look into alternatives that would be better optimized for my needs. I found Gatus, which seemed promising: it has a simple configuration setup and actually uses a database, so I was pretty excited to get it working. Since I wanted a more declarative way of setting up monitors, Gatus was a great solution. Instead of going through the Authentik passthrough to check whether applications were running, I used direct checks against their Docker containers. This allowed for a simple setup without the complicated configuration I had with Uptime Kuma.
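To illustrate the declarative, direct-to-container approach, here is a minimal sketch that generates Gatus endpoint entries from a service list. The `endpoints`, `name`, `group`, `url`, `interval`, and `conditions` keys and the `[STATUS]` / `[RESPONSE_TIME]` placeholders are part of Gatus's configuration format; the service list, container hostnames, ports, and the assumption that Gatus shares a Docker network with the containers are hypothetical. Only Grafana's `/api/health` path is a real health endpoint.

```python
# Sketch: build Gatus endpoints declaratively, probing each container
# directly on a shared Docker network instead of through the reverse proxy
# and Authentik. The service list below is hypothetical.
import json

# (display name, group, container hostname, port, health path)
SERVICES = [
    ("Grafana", "monitoring", "grafana", 3000, "/api/health"),
    ("Example App", "apps", "example-app", 8080, "/"),  # hypothetical service
]


def gatus_endpoint(name, group, host, port, path, interval="1m"):
    """Return one Gatus endpoint entry that expects a fast HTTP 200."""
    return {
        "name": name,
        "group": group,
        "url": f"http://{host}:{port}{path}",
        "interval": interval,
        "conditions": ["[STATUS] == 200", "[RESPONSE_TIME] < 1000"],
    }


def gatus_config(services):
    """Wrap the generated entries in Gatus's top-level endpoints list."""
    return {"endpoints": [gatus_endpoint(*svc) for svc in services]}


if __name__ == "__main__":
    # JSON is also valid YAML, so this output should be usable as a Gatus config file.
    print(json.dumps(gatus_config(SERVICES), indent=2))
```

Adding a monitor then becomes a one-line change to the service list, which is the property that made Gatus easier to keep in sync than clicking through a UI.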

Grafana

Grafana is a service I have kept in my setup through every transition between virtualization technologies, and it has been the absolute best way to centralize logging and metrics for my cluster and services. The difference between my Docker and Kubernetes setups was the aggregation tools used to power the visualizations. With Docker, I used Telegraf and InfluxDB to handle log gathering and time series data, which powered a custom resource monitor and log viewer. The log viewing capabilities were terrible due to the format in which InfluxDB stored logs, but it was amazing for the resource monitor. With Kubernetes, I moved to tools better suited for log gathering, since resource monitoring was not much of a concern. Using Promtail as the log collection agent and Loki for labeling and centralized log storage made it simple to build a dashboard showing all logs generated by my containers. This gave me better oversight of every deployment, pod, and daemonset.

I got rid of this since Komodo could give me the same functionality at a high level.

Updated (4/17/26)

I actually added this back since Komodo only gave a real-time snapshot of metrics for my containers. Combined with cAdvisor, I could get a better understanding of my containers' resource usage, especially since I was looking to limit resource consumption for all of my services. I had way too many instances of something using far more resources than I expected.

Healthchecks

Another monitoring service, but specifically for cron jobs. Healthchecks makes it easier to track that scheduled tasks keep running, and it notifies me if a task has not checked in. All a task needs to do is report to Healthchecks when it starts, when it ends, and if it fails. I currently use it for all container appdata backup jobs and for alerting on the subtitle sync service that I created. What has been really great is that with a certain API key I can allow a task to auto-create a new monitor (e.g. when I add a new service to my cluster that needs to be backed up), so I no longer need to create them manually. The only manual task I do now is prettifying the name in Healthchecks. In addition, I can send logs related to the completion or failure of a job, which allows quick debugging of what may have happened.
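The start/success/fail pattern and auto-created monitors can be sketched as a small Python wrapper. It assumes Healthchecks' slug-based ping URLs (`<ping endpoint>/<ping-key>/<slug>`, with `/start` and `/fail` suffixes) and the `?create=1` auto-provisioning parameter; the author's own mechanism may differ (the text mentions an API key, which could also mean the Management API). `PING_BASE`, `PING_KEY`, `LOG_LIMIT`, and the example slug and script are hypothetical placeholders.

```python
# Sketch: wrap a scheduled job with Healthchecks start/success/fail pings.
# Slug-based URLs plus ?create=1 let a brand-new job create its own check.
# PING_BASE, PING_KEY, and LOG_LIMIT are hypothetical placeholders.
import subprocess
import urllib.parse
import urllib.request

PING_BASE = "https://healthchecks.example.com/ping"  # hypothetical self-hosted ping endpoint
PING_KEY = "your-project-ping-key"                    # placeholder project ping key
LOG_LIMIT = 10_000                                    # assumed cap on the log body, in bytes


def ping_url(slug: str, signal: str = "") -> str:
    """Slug-based ping URL; signal is '' (success), 'start', or 'fail'."""
    url = f"{PING_BASE}/{PING_KEY}/{urllib.parse.quote(slug)}"
    if signal:
        url += f"/{signal}"
    # create=1 asks Healthchecks to auto-provision the check if the slug is new.
    return url + "?create=1"


def ping(slug: str, signal: str = "", log: bytes = b"") -> None:
    """Send one ping; a POST body is stored with the ping as its log."""
    req = urllib.request.Request(ping_url(slug, signal), data=log, method="POST")
    with urllib.request.urlopen(req, timeout=10):
        pass


def run_job(slug: str, cmd: list[str]) -> int:
    """Run cmd, reporting start, then success or failure with the output tail."""
    ping(slug, "start")
    result = subprocess.run(cmd, capture_output=True)
    output = (result.stdout + result.stderr)[-LOG_LIMIT:]  # keep the most recent output
    ping(slug, "" if result.returncode == 0 else "fail", output)
    return result.returncode
```

A backup for a newly added service could then be wrapped as `run_job("appdata-backup-grafana", ["/usr/local/bin/backup.sh", "grafana"])` (both names hypothetical). The first ping creates the check with the slug as its name, which is why only the display name needs tidying afterwards.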

Loki

A service built by the Grafana team that is responsible for post-processing and storing log information from agents. When I first configured it, I stored all logs locally, which gave me insanely poor performance when trying to view them in Grafana; it also crashed VS Code a couple of times when I attempted to open the folder in the editor. Eventually I moved it to Vultr and stored the logs in object storage, and viewing log information has been fast ever since. It took some time to properly configure the tagging system, but I used pre-configured settings for Kubernetes logs that made them really easy to ingest.

I got rid of this since Komodo could give me the same functionality at a high level.

Updated (4/17/26)

I added this back to keep some historical logging, especially since Docker now auto-rotates my logs. I use it mostly to monitor my SnapRAID and SMART script logs.
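Outside of Grafana, the same historical logs can be pulled straight from Loki's HTTP API. This minimal sketch uses the real `/loki/api/v1/query_range` endpoint with its `query`, `start`, `end`, and `limit` parameters; `LOKI_URL` and the `{job="snapraid"}` label are hypothetical and depend on how the log agent labels the script logs.

```python
# Sketch: fetch recent SnapRAID/SMART script logs from Loki's HTTP API.
# LOKI_URL and the job label are hypothetical; labels depend on the agent config.
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

LOKI_URL = "http://loki:3100"  # hypothetical address of the Loki instance


def query_range_url(logql: str, hours: int = 24, limit: int = 500,
                    now_ns: Optional[int] = None) -> str:
    """Build a query_range URL covering the last `hours` hours."""
    end = now_ns if now_ns is not None else time.time_ns()
    start = end - hours * 3600 * 10**9  # Loki accepts nanosecond Unix timestamps
    params = urllib.parse.urlencode(
        {"query": logql, "start": start, "end": end, "limit": limit}
    )
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"


def fetch_lines(logql: str, hours: int = 24) -> list[str]:
    """Return the log lines matched by a LogQL log query."""
    with urllib.request.urlopen(query_range_url(logql, hours), timeout=30) as resp:
        body = json.load(resp)
    # For log queries, each result stream carries [timestamp_ns, line] pairs.
    return [line for stream in body["data"]["result"] for _, line in stream["values"]]
```

For example, `fetch_lines('{job="snapraid"} |= "error"', hours=48)` would return only lines containing "error" from the last two days, using LogQL's `|=` line filter.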

Promtail

Another service built by the Grafana team that reads logs from local nodes or servers and pushes them to a service like Loki. This is the only service that I currently run as a daemonset, since it needs to be deployed on every node to scrape Kubernetes logs. It was really easy to configure, since I copied an existing configuration for Kubernetes scraping.
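To show what "pushing logs to Loki" means at the protocol level, here is a minimal sketch that reads a file and sends its lines to Loki's real `/loki/api/v1/push` endpoint as one labeled stream. It is not Promtail: there is no positions file, batching, retrying, or file tailing. `LOKI_URL`, the file path, and the labels are hypothetical.

```python
# Sketch of what a log agent does at its simplest: read lines and push
# them to Loki as one labeled stream. Not Promtail itself (no positions
# file, batching, or retries). LOKI_URL and labels are hypothetical.
import json
import time
import urllib.request
from typing import Optional

LOKI_URL = "http://loki:3100"  # hypothetical address of the Loki instance


def build_push_payload(labels: dict, lines: list[str],
                       ts_ns: Optional[int] = None) -> dict:
    """Build a push body; Loki expects nanosecond timestamps as strings."""
    ts = ts_ns if ts_ns is not None else time.time_ns()
    # Offset each line by 1ns so the original line order is preserved.
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def push_file(path: str, labels: dict) -> None:
    """Send every non-empty line of a file to Loki under the given labels."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    data = json.dumps(build_push_payload(labels, lines)).encode()
    req = urllib.request.Request(
        f"{LOKI_URL}/loki/api/v1/push",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass
```

Something like `push_file("/var/log/snapraid.log", {"job": "snapraid"})` (path and label hypothetical) would make the lines queryable by that label. Everything Promtail adds on top, such as remembering file offsets and attaching Kubernetes pod labels, is what makes a copied configuration so convenient.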

I got rid of this since Komodo could give me the same functionality at a high level.

Updated (4/17/26)

I added this back to keep some historical logging, especially since Docker now auto-rotates my logs. I use it mostly to monitor my SnapRAID and SMART script logs.

SpeedTest Tracker

I forget exactly when, but at some point the Wi-Fi seemed to be acting up and speeds were terrible, so I decided to try to get a better understanding of when it happens. To do this, I wanted to run speed tests periodically on the network, both through the VPN and through the router, to see how well the network is performing. Setting this up was really straightforward and almost set-it-and-forget-it. I just set up thresholds (which took some time to figure out) to get a notice when internet speeds were really bad. Getting it to work through the VPN was tricky due to the network dependency on the qBittorrent instance, but I was able to figure it out with recreator.
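Threshold alerting boils down to comparing each result against a floor for download and upload speeds and a ceiling for ping. The sketch below shows that logic in plain Python; it is not Speedtest Tracker's code, and the field names and threshold values are hypothetical.

```python
# Sketch of threshold-based alerting on speed test results. Not Speedtest
# Tracker's implementation; fields and threshold values are hypothetical.
from dataclasses import dataclass


@dataclass
class SpeedResult:
    download_mbps: float
    upload_mbps: float
    ping_ms: float


@dataclass
class Thresholds:
    # Hypothetical limits; real values depend on the plan and the path (VPN vs router).
    min_download_mbps: float = 100.0
    min_upload_mbps: float = 10.0
    max_ping_ms: float = 50.0


def breaches(result: SpeedResult, limits: Thresholds) -> list[str]:
    """Return a human-readable reason for each threshold the result violates."""
    problems = []
    if result.download_mbps < limits.min_download_mbps:
        problems.append(f"download {result.download_mbps:.0f} < {limits.min_download_mbps:.0f} Mbps")
    if result.upload_mbps < limits.min_upload_mbps:
        problems.append(f"upload {result.upload_mbps:.0f} < {limits.min_upload_mbps:.0f} Mbps")
    if result.ping_ms > limits.max_ping_ms:
        problems.append(f"ping {result.ping_ms:.0f} > {limits.max_ping_ms:.0f} ms")
    return problems
```

An empty list means no notification is needed. Keeping a separate `Thresholds` instance for the VPN path is one way to avoid false alarms, since traffic through a VPN tunnel is usually expected to be slower than through the router directly.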

Uptime Kuma

Warning: Deprecated 5/29/25

Another very cool monitoring tool that I use, Uptime Kuma continuously pings exposed services to ensure they are up and running. If anything goes down, I get a Discord notification naming the application that is not running. I also use it to verify that proper authentication middleware is in place for some of the services I run, in case something like Authentik or Authelia goes down. I can also create status pages to share with users if there is an issue with any service. The only downside to this service is the need to manually add a new check for every new public service, which made it not the best solution for me since I easily forget to do it when setting up new services.
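Checking that authentication middleware is still in front of a service amounts to requesting it without credentials and confirming that the response is a rejection or a redirect to the login portal rather than the app itself. This minimal sketch assumes the middleware either answers 401 or redirects to a separate auth hostname; `AUTH_HOST` and any URLs passed in are hypothetical, and setups that redirect to a path on the same host would need a different check.

```python
# Sketch: confirm a protected service is still behind its auth middleware
# by requesting it without credentials. Assumes the middleware returns 401
# or redirects to a separate auth hostname; AUTH_HOST is hypothetical.
import http.client
import urllib.parse
from typing import Optional

AUTH_HOST = "auth.example.com"  # hypothetical Authentik/Authelia portal hostname
REDIRECTS = {301, 302, 303, 307, 308}


def is_protected(status: int, location: Optional[str], auth_host: str = AUTH_HOST) -> bool:
    """True if an unauthenticated response shows the middleware intercepted it."""
    if status == 401:
        return True  # middleware rejected the request outright
    if status in REDIRECTS and location:
        return urllib.parse.urlsplit(location).hostname == auth_host
    return False  # e.g. a 200 means the app answered without any login


def check(url: str) -> bool:
    """Request url over HTTPS; http.client never follows redirects on its own."""
    parts = urllib.parse.urlsplit(url)
    conn = http.client.HTTPSConnection(parts.hostname, parts.port or 443, timeout=10)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return is_protected(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Not following redirects is the key design choice: a client that follows them would land on the login page and report a healthy 200 even though it never reached the app, which hides the very condition being monitored.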

Uptime Robot

This isn't really a self-hosted service, but it was a free way to monitor my public services in case they ever went down. This was extremely useful for knowing when the server itself went down, because all of my other monitoring is hosted on that server. It automatically sends me a Discord notification if any service goes down, and it keeps track of uptime for my services as well.