GitHub — vmware/kube-fluentd-operator: Auto-configuration of Fluentd daemon-set based on Kubernetes metadata
Originally published at https://github.com.
Kubernetes Fluentd Operator (KFO) is a Fluentd config manager with batteries included: config validation, no need to restart Fluentd, and sensible defaults and best practices built in. Use Kubernetes labels to filter/route logs per namespace!
kube-fluentd-operator configures Fluentd in a Kubernetes environment. It compiles a Fluentd configuration from configmaps (one per namespace), similar to how an Ingress controller compiles an nginx configuration from several Ingress resources. This way a single Fluentd instance can handle all log shipping for an entire cluster, and the cluster admin does NOT need to coordinate with namespace admins.
Cluster administrators set up Fluentd only once and namespace owners can configure log routing as they wish. KFO will re-configure Fluentd accordingly and make sure logs originating from a namespace will not be accessible by other tenants/namespaces.
KFO also extends the Fluentd configuration language, making it possible to refer to pods based on their labels and on a container name pattern. This enables very fine-grained targeting of log streams for pre-processing before shipping. Writing a custom processor, adding a new Fluentd plugin, or writing a custom Fluentd plugin allows KFO to be extended for any use case and any external log ingestion system.
Finally, it is possible to ingest logs from a file on the container filesystem. While this is not recommended, there are still legacy or misconfigured apps that insist on logging to the local filesystem.
The easiest way to get started is using the Helm chart. Official images are not published yet, so you need to pass the image.repository and image.tag manually:
git clone git@github.com:vmware/kube-fluentd-operator.git

helm install kfo ./kube-fluentd-operator/charts/log-router \
  --set rbac.create=true \
  --set image.tag=v1.16.2 \
  --set image.repository=vmware/kube-fluentd-operator
Alternatively, deploy the Helm chart from a GitHub release:
Then create a namespace demo and a configmap describing where all logs from demo should go. The configmap must contain an entry called "fluent.conf". Finally, point kube-fluentd-operator to this configmap using annotations.
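A minimal sketch of those three steps with kubectl (the loggly output and TOKEN are placeholders; fluentd-config and the annotation name are the defaults understood by config-reloader):

kubectl create namespace demo

# fluent.conf is the required entry name inside the configmap
cat > fluent.conf <<'EOF'
<match **>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/fluentd
</match>
EOF
kubectl create configmap fluentd-config --namespace demo --from-file=fluent.conf

kubectl annotate namespace demo logging.csp.vmware.com/fluentd-configmap=fluentd-config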
In a minute, this configuration would be translated to something like this:
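A rough sketch of the expanded result, assuming the loggly example above (the exact output KFO generates differs between versions):

<match demo.**>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/fluentd
</match>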
Even though the tag ** was used in the <match> directive, the kube-fluentd-operator correctly expands this to demo.**. Indeed, if another tag which does not start with demo. were used, it would fail validation. Namespace admins can safely assume that they have a dedicated Fluentd for themselves.
All configuration errors are stored in the annotation logging.csp.vmware.com/fluentd-status. Try replacing ** with an invalid tag like 'hello-world'. After a minute, verify that the error message looks like this:

# extract just the value of logging.csp.vmware.com/fluentd-status
kubectl get ns demo -o jsonpath='{.metadata.annotations.logging\.csp\.vmware\.com/fluentd-status}'

bad tag for <match>: hello-world. Tag must start with **, $thisns or demo
When the configuration is made valid again, the fluentd-status is set to "".
To see kube-fluentd-operator in action you need a cloud log collector like logz.io, loggly, papertrail or ELK accessible from the K8S cluster. A simple loggly configuration looks like this (replace TOKEN with your customer token):
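A sketch of such a configuration, assuming the bundled fluent-plugin-loggly output (adjust the URL for your account):

<match **>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/fluentd
</match>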
Get the code using go get or git clone this repo:
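For example (cloning the repo is the simplest route; the go get path is an assumption):

git clone https://github.com/vmware/kube-fluentd-operator.git
# or, using the Go tooling
go get github.com/vmware/kube-fluentd-operator/config-reloader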
- charts/log-router: Builds the Helm chart
- base-image: Builds a Fluentd 1.2.x image with a curated list of plugins
- config-reloader: Builds the daemon that generates fluentd configuration files
This is where interesting work happens. The dependency graph shows the high-level package interaction and general dataflow.
- config: handles startup configuration, reading and validation
- datasource: fetches Pods, Namespaces, ConfigMaps from Kubernetes
- fluentd: parses Fluentd config files into an object graph
- processors: walks this object graph doing validations and modifications. All features are implemented as chained Processor subtypes
- generator: serializes the processed object graph to the filesystem for Fluentd to read
- controller: orchestrates the high-level datasource -> processors -> generator pipeline.
It works by rewriting the user-provided configuration. This is possible because kube-fluentd-operator knows about the Kubernetes cluster and the current namespace, and also has some sensible defaults built in. To get a quick idea of what happens behind the scenes, consider this configuration deployed in a namespace called monitoring:
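An illustrative user configuration (the loggly sink is arbitrary):

<match **>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/fluentd
</match>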
It gets processed into the following configuration which is then fed to Fluentd:
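A rough sketch only; the exact tags and structure KFO generates differ between versions:

<match monitoring.**>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/fluentd
</match>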
To give the illusion that every namespace runs a dedicated Fluentd, the user-provided configuration is post-processed. In general, expressions starting with $ are macros that are expanded. These two directives are equivalent: <match **> and <match $thisns>. Almost always, using ** is the preferred way to match logs: this way you can reuse the same configuration for multiple namespaces.
Kube-fluentd-operator defines one namespace to be the admin namespace. By default this is set to kube-system. The admin namespace is treated differently: its configuration is not processed further, as it is assumed only the cluster admin can manipulate resources in this namespace. If you don't plan to use any of the advanced features described below, you can just route all logs from all namespaces using this snippet in the admin namespace:
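A sketch of such a catch-all snippet (the loggly output is just an example destination):

<match **>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/fluentd
</match>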
** in this context is not processed and it means literally everything.
Fluentd assumes it is running in a distro with systemd and generates logs with these Fluentd tags:

- systemd.{unit}: the journal of a systemd unit, for example systemd.docker.service
- docker: all docker logs, not containers. If systemd is used, the docker logs are in systemd.docker.service
- k8s.{component}: logs from a K8S component, for example k8s.kube-apiserver
- kube.{namespace}.{pod_name}.{container_name}: a log originating from (namespace, pod, container)
As the admin namespace is processed first, a match-all directive would consume all logs and any other namespace configuration would become irrelevant (unless <copy> is used). A recommended configuration for the admin namespace is this one (assuming it is set to kube-system) - it captures all but the user namespaces' logs:
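A sketch; the destination is omitted and only the tag selection matters here:

<match systemd.** docker kube.kube-system.** k8s.**>
  # replace @type null with your real destination
  @type null
</match>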
Note the <match systemd.**> syntax. A single * would not work, as the tag is the full name - including the unit type, for example systemd.nginx.service.
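As an illustration of label-based targeting, a namespace configuration might look like this (a sketch; the label values, the container name, the loggly output and the availability of a logfmt parser plugin are all assumptions):

<filter $labels(app=log-router, _container=reloader)>
  @type parser
  key_name log
  reserve_data true
  <parse>
    # assumes a logfmt parser plugin is present in the image
    @type logfmt
  </parse>
</filter>

<match **>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/fluentd
</match>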
The above config will pipe all logs from the pods labelled with app=log-router through a logfmt parser before sending them to loggly. Again, this configuration is valid in any namespace. If the namespace doesn't contain any log-router components, then the <filter> directive is never activated. The _container is a sort of "meta" label and it allows for targeting the log stream of a specific container in a multi-container pod.
If you use Kubernetes recommended labels for the pods and deployments, then KFO will rewrite . characters into _.
For example, let’s assume the following labels exist in the fluentd-config in the testing namespace:
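An illustrative sketch of the three label selectors discussed below:

- $labels(_container=nginx-ingress-controller)
- $labels(app.kubernetes.io/name=nginx-ingress, _container=nginx-ingress-controller)
- $labels(app.kubernetes.io/name=nginx-ingress)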
- The selector $labels(_container=nginx-ingress-controller) filters by container name pattern. It converts to the tag pattern kube.testing.*.nginx-ingress-controller._labels.*.*
- The selector $labels(app.kubernetes.io/name=nginx-ingress, _container=nginx-ingress-controller) converts to kube.testing.*.nginx-ingress-controller._labels.*.nginx_ingress
- The selector $labels(app.kubernetes.io/name=nginx-ingress) converts to kube.testing.*.*._labels.*.nginx_ingress
This fluentd configmap in the testing namespace:
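An illustrative sketch; the parser settings are arbitrary:

<filter $labels(app.kubernetes.io/name=nginx-ingress, _container=nginx-ingress-controller)>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>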
will be rewritten inside of KFO pods as this:
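Again a rough sketch; the generated file may also contain helper filters that compute the label values:

<filter kube.testing.*.nginx-ingress-controller._labels.*.nginx_ingress>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>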
All plugins that change the fluentd tag are disabled for security reasons. Otherwise a rogue configuration could divert another namespace's logs to itself by prepending its name to the tag.
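Logs can also be ingested from a file on the container filesystem using the mounted-file source type. A sketch (path, labels and parse format are illustrative):

<source>
  @type mounted-file
  path /var/log/welcome.log
  labels app=grape, _container=test-container
  <parse>
    @type none
  </parse>
</source>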
The labels parameter is similar to the $labels macro and helps the daemon locate all pods that might log to the given file path. The <parse> directive is optional and if omitted the default @type none will be used. If you know the format of the log file you can explicitly specify it, for example @type apache2 or @type json.
The above configuration would translate at runtime to something similar to this:
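A very rough sketch; the real path under the kubelet root, the pos_file location and the tag are generated by config-reloader:

<source>
  @type tail
  path /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>/welcome.log
  pos_file /var/log/kfotail-<hash>.pos
  read_from_head true
  tag kube.<namespace>.<pod_name>.test-container
  <parse>
    @type none
  </parse>
</source>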
Most log streams are line-oriented. However, stacktraces always span multiple lines. kube-fluentd-operator integrates stacktrace processing using the fluent-plugin-detect-exceptions. If a Java-based pod produces stacktraces in the logs, then the stacktraces can be collapsed in a single log event like this:
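A sketch of such a filter (the label selector is illustrative; the bundled fork of fluent-plugin-detect-exceptions is assumed):

<filter $labels(app=jpetstore)>
  @type detect_exceptions
  languages java
</filter>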
Notice how filter is used instead of match, unlike what the fluent-plugin-detect-exceptions documentation describes. Internally, this filter is translated into several match directives so that the end user doesn't need to bother with rewriting the Fluentd tag.
Also, users don’t need to bother with setting the correct stream parameter. kube-fluentd-operator generates one internally based on the container id and the stream.
Sometimes you only have a few valid options for log sinks: a dedicated S3 bucket, the ELK stack you manage, etc. The only flexibility you’re after is letting namespace owners filter and parse their logs. In such cases you can abstract over an output plugin configuration, basically reducing it to a simple name which can be referenced from any namespace. For example, let’s assume you have an S3 bucket for a “test” environment and you use loggly for a “staging” environment. The first thing you do is define these two outputs in the admin namespace:
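A sketch of the admin-namespace configuration (the S3 and loggly parameters are placeholders):

<match systemd.** docker kube.kube-system.** k8s.**>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/ADMIN_TOKEN/tag/fluentd
</match>

<plugin test>
  @type s3
  aws_key_id YOUR_AWS_KEY
  aws_sec_key YOUR_AWS_SECRET
  s3_bucket test-logs
  s3_region us-east-1
</plugin>

<plugin staging>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/STAGING_TOKEN/tag/fluentd
</plugin>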
In the above example for the admin configuration, the match directive comes first and decides where to send the logs of systemd, docker, kube-system, and the Kubernetes control-plane components. Below the match directive we have defined the plugin directives, which define the log sinks that can be reused by namespace configurations.
A namespace can refer to the staging and test plugins, oblivious to where exactly the logs end up:
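For instance (a sketch; the label selector is arbitrary):

<match $labels(env=staging)>
  @type staging
</match>

<match **>
  @type test
</match>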
kube-fluentd-operator will insert the content of the plugin directive into the match directive. From then on, regular validation and post-processing take place.
Sometimes you might need to split a single log stream to perform different processing based on the contents of one of the fields. To achieve this you can use the retag plugin, which lets you specify a set of rules that match regular expressions against the specified fields. If one of the rules matches, the log is re-emitted with a new namespace-unique tag based on the specified tag.
Logs that are emitted by this plugin can subsequently be filtered and processed by using the $tag macro when specifying the tag:
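A short sketch; the tag names follow the retag example shown later in this document and the loggly output is arbitrary:

<match $tag(notifications.error)>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/errors
</match>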
kube-fluentd-operator ensures that tags specified using the $tag macro never conflict with tags from other namespaces, even if the tag itself is equivalent.
By default, you can consume logs only from your own namespaces. Often it is useful for multiple namespaces (tenants) to get access to the log streams of a shared resource (pod, namespace). kube-fluentd-operator makes this possible using two constructs: the source namespace expresses its intent to share logs with a destination namespace, and the destination namespace expresses its desire to consume logs from a source. As a result, logs are streamed only when both sides agree.
A source namespace can share with another namespace using the @type share macro:
producer namespace configuration:
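A sketch only; the namespace names are placeholders and the parameter naming the destination namespace is an assumption that may differ between KFO versions:

<match $labels(app=nginx)>
  @type copy
  <store>
    @type share
    # assumption: the destination namespace is named here
    with_namespace consumer
  </store>
</match>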
consumer namespace configuration:
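A sketch; "producer" stands for the name of the sharing namespace and the loggly output is arbitrary:

<label @$from(producer)>
  <match **>
    @type loggly
    loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/shared
  </match>
</label>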
The consuming namespace can use the usual syntax inside the <label @$from(...)> directive. The fluentd tag is rewritten as if the logs originated from the same namespace.
Often you run multiple Kubernetes clusters but need to aggregate all logs to a single destination. To distinguish between the different sources, kube-fluentd-operator can attach arbitrary metadata to every log event. The metadata is nested under a key chosen with --meta-key. Using the helm chart, metadata can be enabled like this:
helm install ... \
  --set meta.key=metadata \
  --set meta.values.region=us-east-1 \
  --set meta.values.env=staging \
  --set meta.values.cluster=legacy
Every log event, be it from a pod, mounted-file or a systemd unit, will now carry this metadata:
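A sketch of such an event, assuming the values from the helm command above (the other field names are illustrative):

{
  "log": "hello world",
  "stream": "stdout",
  "metadata": {
    "region": "us-east-1",
    "env": "staging",
    "cluster": "legacy"
  }
}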
All logs originating from a file look exactly like all other Kubernetes logs. However, their stream field is not set to stdout but to the path of the source file:
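For example (a sketch; the field set is illustrative):

{
  "log": "hello world",
  "stream": "/var/log/hello.log"
}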
Custom resources were introduced in the v1.13.0 release. They provide a dedicated resource for Fluentd configurations, which makes it possible to manage them more consistently and to move away from generic ConfigMaps. Configs for a new application can be created simply by attaching a FluentdConfig resource to the application manifests, rather than using a generic ConfigMap with specific names and/or labels.
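For illustration, a FluentdConfig resource might look roughly like this; the group/version and field names below are assumptions, so check the CRD that config-reloader installs:

apiVersion: logs.vdp.vmware.com/v1beta1
kind: FluentdConfig
metadata:
  name: fluentd-config
  namespace: demo
spec:
  fluentconf: |
    <match **>
      @type null
    </match>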
The "crd" has been introduced as a new datasource, configurable through the helm chart values, so that users who are currently set up with ConfigMaps and do not want to switch over to FluentdConfigs can keep using them. The config-reloader can install the CRD at startup if requested, so no manual action is needed to enable it on the cluster. Existing configurations in ConfigMaps can be migrated to CRDs through the following migration flow:
- A new user, who is installing kube-fluentd-operator for the first time, should set the datasource: crd option in the chart. This enables the crd support.
- A user who is already using kube-fluentd-operator with either datasource: default or datasource: multimap will have to update to the new chart and set the crdMigrationMode property to true. This makes the config-reloader launch with the crd datasource alongside the legacy datasource (either default or multimap, depending on what was configured in the datasource property). The user can then migrate configmap resources one by one to the corresponding fluentdconfig resources. When the migration is complete, the Helm release can be upgraded by changing the crdMigrationMode property to false and switching the datasource property to crd. This effectively disables the legacy datasource and sets the config-reloader to watch only fluentdconfig resources.
This project tries to keep up with major releases of the Fluentd Docker image.
kube-fluentd-operator aims to be easy to use and flexible. It also favors sending logs to multiple destinations using <copy> and as such comes with many plugins pre-installed:
- fluentd (1.14.4)
- fluent-config-regexp-type (1.0.0)
- fluent-mixin-config-placeholders (0.4.0)
- fluent-plugin-amqp (0.14.0)
- fluent-plugin-azure-loganalytics (0.7.0)
- fluent-plugin-cloudwatch-logs (0.14.2)
- fluent-plugin-concat (2.5.0)
- fluent-plugin-datadog (0.14.0)
- fluent-plugin-detect-exceptions (0.0.14) — forked to allow fluentd v1 plugin api
- fluent-plugin-elasticsearch (5.1.0)
- fluent-plugin-gelf-hs (1.0.8)
- fluent-plugin-google-cloud (0.13.0) — forked to allow fluentd v1.14.x
- fluent-plugin-grafana-loki (1.2.16)
- fluent-plugin-grok-parser (2.6.2)
- fluent-plugin-json-in-json-2 (1.0.2)
- fluent-plugin-kafka (0.17.2)
- fluent-plugin-kinesis (3.4.1)
- fluent-plugin-kubernetes (0.3.1)
- fluent-plugin-kubernetes_metadata_filter (2.9.1)
- fluent-plugin-kubernetes_sumologic (2.4.2)
- fluent-plugin-logentries (0.2.10)
- fluent-plugin-loggly (1.0.0) — forked to fix for new fluentd api
- fluent-plugin-logzio (0.0.21)
- fluent-plugin-mail (0.3.0)
- fluent-plugin-mongo (1.5.0)
- fluent-plugin-multi-format-parser (1.0.0)
- fluent-plugin-mysqlslowquery (0.0.9)
- fluent-plugin-out-http (1.3.3)
- fluent-plugin-papertrail (0.2.8)
- fluent-plugin-prometheus (2.0.2)
- fluent-plugin-record-modifier (2.1.0)
- fluent-plugin-record-reformer (0.9.1)
- fluent-plugin-redis (0.3.5)
- fluent-plugin-remote_syslog (1.0.0)
- fluent-plugin-rewrite-tag-filter (2.4.0)
- fluent-plugin-route (1.0.0)
- fluent-plugin-s3 (1.6.1)
- fluent-plugin-secure-forward (0.4.5)
- fluent-plugin-splunkhec (2.1)
- fluent-plugin-sumologic_output (1.7.2)
- fluent-plugin-systemd (1.0.5)
- fluent-plugin-uri-parser (0.3.0)
- fluent-plugin-verticajson (0.0.6)
- fluent-plugin-vmware-log-intelligence (2.0.6)
- fluent-plugin-vmware-loginsight (1.0.0)
DEPRECATIONS (these are deprecated until fixed — #266):
When customizing the image be careful not to uninstall plugins that are used internally to implement the macros.
If you need other destination plugins you are welcome to contribute a patch or just create an issue.
The config-reloader binary is the one that listens to changes in K8S and generates Fluentd files. It runs as a daemonset and is not intended to be interacted with directly. The synopsis is useful when trying to understand the Helm chart or just hacking.
usage: config-reloader [<flags>]

Regenerates Fluentd configs based on Kubernetes namespace annotations against templates, reloading Fluentd if necessary

Flags:
  --help                     Show context-sensitive help (also try --help-long and --help-man).
  --version                  Show application version.
  --master=""                The Kubernetes API server to connect to (default: auto-detect)
  --kubeconfig=""            Retrieve target cluster configuration from a Kubernetes configuration file (default: auto-detect)
  --datasource=default       Datasource to use (default|fake|fs|multimap|crd)
  --crd-migration-mode       Enable the crd datasource together with the current datasource to facilitate the migration (used only with --datasource=default|multimap)
  --fs-dir=FS-DIR            If datasource=fs is used, configure the dir hosting the files
  --interval=60              Run every x seconds
  --allow-file               Allow @type file for namespace configuration
  --id="default"             The id of this deployment. It is used internally so that two deployments don't overwrite each other's data
  --fluentd-rpc-port=24444   RPC port of Fluentd
  --log-level="info"         Control verbosity of config-reloader logs
  --fluentd-loglevel="info"  Control verbosity of fluentd logs
  --buffer-mount-folder=""   Folder in /var/log/{} where to create all fluentd buffers
  --annotation="logging.csp.vmware.com/fluentd-configmap"
                             Which annotation on the namespace stores the configmap name?
  --default-configmap="fluentd-config"
                             Read the configmap by this name if namespace is not annotated. Use empty string to suppress the default.
  --status-annotation="logging.csp.vmware.com/fluentd-status"
                             Store configuration errors in this annotation, leave empty to turn off
  --kubelet-root="/var/lib/kubelet/"
                             Kubelet root dir, configured using --root-dir on the kubelet service
  --namespaces=NAMESPACES ...
                             List of namespaces to process. If empty, processes all namespaces
  --templates-dir="/templates"
                             Where to find templates
  --output-dir="/fluentd/etc"
                             Where to output config files
  --meta-key=META-KEY        Attach metadata under this key
  --meta-values=META-VALUES  Metadata in the k=v,k2=v2 format
  --fluentd-binary=FLUENTD-BINARY
                             Path to fluentd binary used to validate configuration
  --prometheus-enabled       Prometheus metrics enabled (default: false)
  --admin-namespace="kube-system"
                             The namespace to be treated as admin namespace
- kubeletRoot: The home dir of the kubelet, usually set using --root-dir on the kubelet (default: /var/lib/kubelet)
- fluentd.extraVolumeMounts: Mount extra volumes for the fluentd container, required to mount ssl certificates when elasticsearch has tls enabled
- reloader.extraVolumeMounts: Mount extra volumes for the reloader container
- podAnnotations: Pod annotations for the daemonset
Simple, define configuration only for the admin namespace (by default kube-system):
Simple, exclude them at the admin namespace level (by default kube-system):
It is not possible to handle this globally. Instead, provide this config for the noisy namespace and configure other namespaces at the cost of some code duplication:
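A sketch of the config for the noisy namespace (route everything to the null output; nothing in it is namespace-specific):

<match **>
  @type null
</match>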
On the bright side, the configuration of noisy-namespace contains nothing specific to noisy-namespace, and the same content can be used for all namespaces whose logs we need collected.
Your cluster is running under RBAC. You need to enable a serviceaccount for the log-router pods. It’s easy when using the Helm chart:
helm install ./charts/log-router --set rbac.create=true ...
First you need version 1.1.0 or later. At the namespace level you need to add a source directive of type mounted-file:
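A sketch (path, labels and parse format are illustrative):

<source>
  @type mounted-file
  path /var/log/httpd/access.log
  labels app=apache
  <parse>
    @type apache2
  </parse>
</source>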
The type mounted-file is again a macro that is expanded to a tail plugin. The <parse> directive is optional and if not set a @type none will be used instead.
In order for this to work the pod must define a mount of type emptyDir at /var/log/httpd or any of its parent folders. For example, this pod definition is part of the test suite (it logs to /var/log/hello.log):
To get the hello.log ingested by Fluentd you need at least this in the configuration for the kfo-test namespace:
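A sketch that lines up with the pod above (the loggly destination is arbitrary):

<source>
  @type mounted-file
  path /var/log/hello.log
  labels app=hello-logger
</source>

<match **>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/TOKEN/tag/kfo-test
</match>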
I want to push logs from namespace demo to logz.io:

demo.conf:

<match **>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=TOKEN&type=log-router
  output_include_time true
  output_include_tags true
  <buffer>
    @type memory
    flush_thread_count 4
    flush_interval 3s
    queue_limit_length 4096
  </buffer>
</match>
For details you should consult the plugin documentation.
To get a general idea of how truncation works, consider this table:
Humio speaks the elasticsearch protocol so its configuration is pretty similar to Elasticsearch. The example below is based on https://github.com/humio/kubernetes2humio/blob/master/fluentd/docker-image/fluent.conf.
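A rough sketch using the bundled elasticsearch output (host and credentials are placeholders; consult the linked example for the exact settings):

<match **>
  @type elasticsearch
  host YOUR-HUMIO-HOST
  port 443
  scheme https
  user YOUR-INGEST-TOKEN
  password unused
  logstash_format true
</match>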
For details you should consult the plugin documentation.
The container comes with a file validation command. To use it, put all your *.conf files in a directory, using the namespace name for each filename. Then use this one-liner, bind-mounting the folder and feeding it as a DATASOURCE_DIR env var:
docker run --entrypoint=/bin/validate-from-dir.sh \
  --net=host --rm \
  -v /path/to/config-folder:/workspace \
  -e DATASOURCE_DIR=/workspace \
  vmware/kube-fluentd-operator:latest
It will run fluentd in dry-run mode and even catch incorrect plug-in usage. This is so common that it's already captured as a script, validate-logging-config.sh. The preferred way to use it is to copy it to your project and invoke it like this:
validate-logging-config.sh path/to/folder
All path/to/folder/*.conf files will be validated. Check stderr and the exit code for errors.
Use <label> as usual; the daemon ensures that label names are unique cluster-wide. For example, to route several pods' logs to destination X and ignore a few others, you can use this:
<match $labels(app=foo)>
  @type relabel
  @label @blackhole
</match>

<match $labels(app=bar)>
  @type relabel
  @label @blackhole
</match>

<label @blackhole>
  <match **>
    @type null
  </match>
</label>

# at this point, foo and bar's logs are being handled in the @blackhole chain,
# the rest are still available for processing
<match **>
  @type ..
</match>
The ingress controller uses a log format different from plain Nginx. You can use this fragment to configure the namespace hosting the ingress-nginx controller:
The retag plugin lets you split a log stream based on whether the contents of certain fields match the given regular expressions.
<match $labels(app=apache)>
  @type retag
  <rule>
    key message
    pattern ^ERR
    tag notifications.error
  </rule>
  <rule>
    key message
    pattern ^ERR
    invert true
    tag notifications.other
  </rule>
</match>

<match $tag(notifications.error)>
  # manage log stream with error severity
</match>

<match $tag(notifications.**)>
  # manage log stream with non-error severity
</match>
You need to run make like this:
This will build the code, then config-reloader will connect to the K8S cluster, fetch the data and generate *.conf files in the ./tmp directory. If there are errors, the namespaces will be annotated.
Use the vmware/kube-fluentd-operator:TAG as a base and do any modification as usual. If this plugin is not top-secret consider sending us a patch :)
When deploying the daemonset using Helm, make sure to pass some metadata:
For the cluster in USA:
helm install ... \
  --set=meta.key=cluster_info \
  --set=meta.values.region=us-east-2
For the cluster in Europe:
helm install ... \
  --set=meta.key=cluster_info \
  --set=meta.values.region=eu-west-2
If you are using ELK you can easily get only the logs from Europe using cluster_info.region: +eu-west-2. In this example the metadata key is cluster_info but you can use any key you like.
It is possible to reduce the configuration burden by relying on a default configmap name. The default value is fluentd-config - kube-fluentd-operator will read the configmap by that name if the namespace is not annotated. If you don't like this default name, or happen to use this configmap for other purposes, then override the default with --default-configmap=my-default.
.pos files store the progress of the upload process and .buf files are used for local buffering. Colliding .pos/.buf paths can lead to races in Fluentd. As such, kube-fluentd-operator tries hard to rewrite such path-based parameters in a predictable way. You only need to make sure they are unique for your namespace and config-reloader will take care to make them unique cluster-wide.
Use --annotation=acme.com/fancy-config to use acme.com/fancy-config as the annotation name. However, you'd also need to customize the Helm chart. Patches are welcome!
The kube-fluentd-operator project team welcomes contributions from the community. If you wish to contribute code and you have not signed our contributor license agreement (CLA), our bot will update the issue when you open a Pull Request. For any questions about the CLA process, please refer to our FAQ. For more detailed information, refer to CONTRIBUTING.md.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.