How to Use the Grafana Operator: Managing a Grafana Cloud Stack in Kubernetes

https://grafana.com/blog/2024/04/24/how-to-use-the-grafana-operator-managing-a-grafana-cloud-stack-in-kubernetes/

By meysamazad

kodama-lens | 5 comments | 3 weeks ago
Up front: I like the Grafana, Prometheus, Mimir, Loki stack, BUT:

I have difficulties understanding their product lineup and roadmap. Everything is named Grafana something: Grafana the company, Grafana the monitoring frontend, Grafana Agent, Grafana Agent Flow, Grafana Agent Operator, Grafana Operator. It is easy to mix things up, especially if you talk to someone who is not in the Grafana world.

They also seem to change their mind quite fast on how users are supposed to send metrics/logs. First there were the Loki and Prometheus clients, then Grafana Agent Operator, then Grafana Agent Flow mode, and now they announced Alloy at KubeCon, immediately deprecating the other solutions. All of which do more or less the same thing, as far as I know.

These iterations are too fast for companies to develop trust in their solution. Even in private test clusters I have trouble catching up. And I have to confess that the new Alloy solution did not make a good first impression. I don't know why it replaces the Agent Flow mode and, even worse, instead of sticking to k8s standards it uses its own custom configuration language that does not seem to have any advantages.

I hope they figure something out that works for base tasks and improve that solution over time instead of doing something new every year.

stupendousyappi | 0 comments | 3 weeks ago
I don't blame them for using a different config language; k8s-style YAML sucks, which is why half of the k8s ecosystem is tools to generate it for you. And I believe Agent and Alloy are meant to be installable outside k8s as well. I do think it would have been less confusing to just call this product Agent 3.0 instead of Alloy, but I'm not too concerned about naming decisions.
lucianbr | 1 comment | 3 weeks ago
I don't understand the objection to the names. You think someone who is not in the "Grafana world" would more easily understand an arbitrary name like "Thor" than "Grafana Operator"? I have no clue what Mimir is, or Loki, but "Grafana Operator" is clearly a Kubernetes operator for deploying Grafana. Like a "postgres operator" I read about some time ago. How are names from Norse mythology supposed to make it easier to understand software components and the relations between them?
kodama-lens | 0 comments | 3 weeks ago
Naming things is hard, and I don't mind if you can't tell what a software/service does by its name, but I don't like that there is so little separation between the names:

> I just deployed the Grafana Operator, oops, I meant the Grafana Agent Operator

Hopefully it gets better with Alloy

nijave | 0 comments | 2 weeks ago
Imo at this point you're better off just using OSS not packaged by Grafana. Grafana Agent pulled in a ton of OSS as dependencies and basically implemented a service manager and config management system. However, such a big dependency tree seemed to complicate updates and also abstracted away useful config.

Fluent*/Promtail for logs and Prometheus for metrics haven't had many (if any) changes in the last couple of years, as far as I know.

baby_souffle | 0 comments | 3 weeks ago
Agreed on all counts. They change product strategy so fast it seems like half of all links in their docs are 404s or go to something that is now deprecated.

I use Grafana and Loki because those are stable. Use Vector for log collection and processing and Prometheus for metrics, and it's much simpler!

__turbobrew__ | 1 comment | 2 weeks ago
Grafana is really nice when you can go 100% all in on it.

One thing I have found is that certain parts of Grafana aren't really designed for very large k8s clusters. For example, if I want to use kubernetes_sd_config for scraping metrics from pods, the agent needs to watch ALL pods in the cluster, which for clusters with hundreds of thousands of pods is a no-go.

I recently ended up writing my own DaemonSet agent that queries the kubelet API for the pods on the local node and uses that information to set up the scrape routines. There are other solutions, such as scrape managers that do the watching and then tell the agents which endpoints to scrape, but that seems overly complex, and those features are in beta and not on the happy path.
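(Purely as an illustration of that approach, not the commenter's actual code: a minimal node-local discovery loop in Go might look roughly like the sketch below. The kubelet's authenticated /pods endpoint on port 10250, the NODE_IP downward-API env var, and the prometheus.io/port opt-in annotation are assumptions made for the sketch.)

    // Sketch of a node-local discovery loop: ask the kubelet (not the
    // apiserver) which pods run on this node, then derive scrape targets.
    // Assumes the agent runs as a DaemonSet with NODE_IP injected via the
    // downward API and a service account allowed to read the kubelet API.
    package main

    import (
        "crypto/tls"
        "encoding/json"
        "fmt"
        "net/http"
        "os"
        "time"
    )

    // Only the fields we need from the kubelet's PodList response.
    type podList struct {
        Items []struct {
            Metadata struct {
                Name        string            `json:"name"`
                Namespace   string            `json:"namespace"`
                Annotations map[string]string `json:"annotations"`
            } `json:"metadata"`
            Status struct {
                PodIP string `json:"podIP"`
            } `json:"status"`
        } `json:"items"`
    }

    func localTargets() ([]string, error) {
        token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
        if err != nil {
            return nil, err
        }
        // The kubelet serves HTTPS on 10250; proper cert verification is
        // skipped here to keep the sketch short.
        client := &http.Client{Transport: &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
        }}
        req, err := http.NewRequest("GET", "https://"+os.Getenv("NODE_IP")+":10250/pods", nil)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Authorization", "Bearer "+string(token))
        resp, err := client.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        var pods podList
        if err := json.NewDecoder(resp.Body).Decode(&pods); err != nil {
            return nil, err
        }
        var targets []string
        for _, p := range pods.Items {
            // Only scrape pods that opt in via the conventional annotation.
            port, ok := p.Metadata.Annotations["prometheus.io/port"]
            if !ok || p.Status.PodIP == "" {
                continue
            }
            targets = append(targets, p.Status.PodIP+":"+port)
        }
        return targets, nil
    }

    func main() {
        for {
            targets, err := localTargets()
            if err != nil {
                fmt.Fprintln(os.Stderr, "discovery failed:", err)
            } else {
                fmt.Println("scrape targets:", targets) // hand these to the scrape loop
            }
            time.Sleep(30 * time.Second)
        }
    }

The point of the design is that each agent only ever sees the pods on its own node, so no component has to hold a watch on every pod in the cluster.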

Another thing I would like to do more easily is have different relabel configs for different pods/targets. If you are running a multitenant k8s cluster, it is useful to let tenants define their own scrape configs because they may want to relabel/rename/drop their metrics. Again, I found the Grafana solutions either too complex or not solving my problem, so I wrote my own solution where tenants define their scrape configs in k8s ConfigMaps and the agent consumes those ConfigMaps to set up scraping.
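(Again an illustrative sketch rather than the commenter's implementation: the ConfigMap-consuming side can be as small as a labelled list via client-go. The label selector scrape.example.com/config=true and the scrape.yaml key are invented for this example.)

    // Sketch: collect per-tenant scrape configs published as ConfigMaps.
    // The label and key names are placeholders, not a real convention.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            panic(err)
        }
        clientset, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // Empty namespace ("") lists across all namespaces the service
        // account is allowed to read.
        cms, err := clientset.CoreV1().ConfigMaps("").List(context.TODO(), metav1.ListOptions{
            LabelSelector: "scrape.example.com/config=true",
        })
        if err != nil {
            panic(err)
        }

        for _, cm := range cms.Items {
            raw, ok := cm.Data["scrape.yaml"]
            if !ok {
                continue
            }
            // Each tenant namespace owns its own relabel/drop rules; parse
            // and merge them into the agent's scrape configuration here.
            fmt.Printf("tenant %s: %d bytes of scrape config\n", cm.Namespace, len(raw))
        }
    }

A production version would presumably watch ConfigMaps with the same label selector via an informer rather than polling with a one-shot list.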

nijave | 0 comments | 2 weeks ago
Iirc this is a limitation with Prometheus, too, but they give you a switch to change the behavior

It's possible Grafana Agent inherited this behavior (potentially without exposing the corresponding configuration switch to change it).

Edit: it might be a limitation of k8s itself, too. I think watches don't support filtering on annotations.

> Again, I found the Grafana solutions either too complex or not solving my problem, so I wrote my own solution where tenants define their scrape configs in k8s ConfigMaps and the agent consumes those ConfigMaps to set up scraping.

I think the Prometheus Operator supports this. I also set up something similar with the Weaveworks tf-controller (or whatever they renamed it to). It can use ConfigMaps as data sources and shove their contents into other places.

darkwater | 4 comments | 3 weeks ago
So, yet another example of people piggybacking on Kubernetes manifests & the reconciliation loop to define and apply resources against other APIs. I wonder when some places will start to provision dedicated k8s masters just to run their IaC loop instead of running actual workloads...
MrDarcy | 1 comment | 3 weeks ago
Already happening. A control-plane-only cluster is an excellent replacement for HashiCorp Vault, for example.

See also the Cluster API to manage other clusters and Crossplane to manage anything Terraform would have managed.

darkwater | 1 comment | 2 weeks ago
> Already happening. A control-plane-only cluster is an excellent replacement for HashiCorp Vault, for example.

Vault? Don't you mean Terraform (Enterprise), or platforms based on TF that do that kind of thing (converging the platform to its code definition, automatically)?

I don't really understand why (and how) one would use K8s API server as a replacement for HC Vault.

nijave | 0 comments | 2 weeks ago
It has RBAC, service accounts, and Secrets all with APIs. I guess you could automate provisioning storage accounts on nodes or have a system that creates tokens with the cluster key. Not sure that's really any better than not using it, though
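(For illustration only: the "Secrets all with APIs" part is a single client-go call, gated purely by whatever RBAC the caller's service account has. The namespace and secret name below are placeholders.)

    // Sketch of reading a Secret through the apiserver, i.e. the "Vault-ish"
    // usage being discussed. "team-a" and "db-credentials" are placeholders.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            panic(err)
        }
        clientset, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        sec, err := clientset.CoreV1().Secrets("team-a").Get(context.TODO(), "db-credentials", metav1.GetOptions{})
        if err != nil {
            panic(err)
        }
        // Secret values in .Data are already base64-decoded byte slices.
        fmt.Printf("got %d keys from %s/%s\n", len(sec.Data), sec.Namespace, sec.Name)
    }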
dilyevsky | 0 comments | 2 weeks ago
We do that: we chucked all the Kubernetes APIs and run an apiserver with our own APIs + controllers to reconcile state. It's a bit of work because the tooling is obviously not designed for this, but you can make it happen.
kodama-lens | 0 comments | 3 weeks ago
You mean everyone that runs a Crossplane cluster? Already happening
dboreham | 1 comment | 3 weeks ago
Could call that project "tortuga".
spdustin | 0 comments | 2 weeks ago
Tortugas all the way down, in fact.