
Rancher Kubernetes log forwarding

Have you ever tried to forward pod/container logs from a Rancher cluster using Rancher Logging (banzaicloud/fluent), only to have everything arrive nested and jumbled together on the receiving end? Let's fix that!

First, install the rancher-logging chart from Rancher's official repository. This is actually the Banzai Cloud logging operator under the hood, with some Rancher-specific wrappings around it. That's an important note if you ever need to look up how to do xyz within the logging interface: chances are you actually want to search for Banzai Cloud logging with fluentd, not Rancher specifically.

Let's get started.

Install the logging operator from Rancher's Cluster Tools

I like to set some extra filterKubernetes rules right in the logging operator's Helm config YAML, though I'm not sure yet whether this has any impact on the actual log flow (hint: I don't think it does). Be sure to toss in any other pool tolerations you need under the fluentbit section as well.

fluentbit:
  # options passed through to fluent-bit's kubernetes filter
  filterKubernetes:
    Merge_Log: 'On'              # parse JSON log bodies and merge their fields into the record
    Merge_Log_Key: 'kubernetes'  # nest the merged fields under this key
    Merge_Log_Trim: ''
    Merge_Parser: ''
    Keep_Log: 'Off'              # drop the original raw log line once merged
  # let the fluent-bit daemonset schedule onto tainted control plane, etcd, and infra pool nodes
  tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/controlplane
      value: 'true'
    - effect: NoExecute
      key: node-role.kubernetes.io/etcd
      value: 'true'
    - effect: NoSchedule
      key: pool
      operator: Equal
      value: infra

For this example I'm sending logs via syslog to a Splunk receiving endpoint, but the important thing here is to have a ClusterOutput set up and configured.

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: splunk-output
  namespace: cattle-logging-system
spec:
  syslog:
    buffer:
      chunk_limit_size: 10M
      flush_interval: 5s
      flush_mode: interval
      overflow_action: block
      retry_forever: true
      retry_type: exponential_backoff
      timekey: 30s
      timekey_use_utc: true
      timekey_wait: 10s
      total_limit_size: 500M
    host: mysplunkhost.k3s.live
    insecure: true
    port: 514
    transport: udp
    format:
      app_name_field: kubernetes.pod_name
      hostname_field: kubernetes.host
      log_field: message
      rfc6587_message_size: false
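Side note: if syslog doesn't fit your setup, the logging operator also ships a splunk_hec output type that talks to Splunk's HTTP Event Collector directly. Here's a minimal sketch, assuming a HEC token stored in a Secret named splunk-hec-token; the host, index, and secret names are placeholders, and it's worth double-checking the field names against your operator version:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: splunk-hec-output
  namespace: cattle-logging-system
spec:
  splunkHec:
    hec_host: mysplunkhost.k3s.live  # placeholder host
    hec_port: 8088
    protocol: https
    insecure_ssl: true               # skip cert validation, same spirit as the syslog example
    index: kubernetes                # placeholder index
    hec_token:
      valueFrom:
        secretKeyRef:
          name: splunk-hec-token     # assumed pre-created Secret
          key: token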

Finally, the main star of the show: the ClusterFlow with a record_transformer filter. This enables full Ruby control over the incoming log records so you can manipulate them as needed. Ruby's dig method is used here to extract the nested kubernetes metadata (hostname, container name, etc.) and bring it forward to the top level of the logging output. This allows our log collector (Splunk) to properly categorize the incoming log stream data.

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: kubernetes-filter-v3
  namespace: cattle-logging-system
spec:
  filters:
    - record_transformer:
        enable_ruby: true
        records:
        - "container": "${record.dig('kubernetes','container_name')}"
        - "namespace": "${record.dig('kubernetes','namespace_name')}"
        - "pod": "${record.dig('kubernetes','pod_name')}"
        - "host": "${record.dig('kubernetes','host')}"
        - "cluster": "staging"
        - "location": "chicago"
  globalOutputRefs:
    - splunk-output
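Note that a ClusterFlow with globalOutputRefs and no match rules will pick up logs from every namespace in the cluster. If you need to narrow that down, the ClusterFlow spec also supports match rules with select/exclude blocks. A quick sketch of what that could look like alongside filters in the spec above (the kube-system exclusion is just an example):

  match:
    - exclude:
        namespaces:
          - kube-system  # example: skip control plane chatter
    - select: {}         # forward everything else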

I also added two static records, 'cluster: staging' and 'location: chicago', to the record_transformer to help track which Rancher cluster these specific log streams are coming from, making them easier to search in Splunk.

The record_transformer YAML above then gets rendered by the operator into fluentd configuration along these lines:

      <filter **>
        @type record_transformer
        enable_ruby true
        <record>
          container ${record.dig('kubernetes','container_name')}
          namespace ${record.dig('kubernetes','namespace_name')}
          pod ${record.dig('kubernetes','pod_name')}
          host ${record.dig('kubernetes','host')}
          cluster staging
          location chicago
        </record>
      </filter>

Looks familiar, right? Once you get the ClusterFlow syntax into the correct YAML shape, the Banzai Cloud operator converts it to proper fluentd config with the matching types and records. From here you can repeat the same process for any kind of record manipulation you'd like, just by following the fluentd docs and matching your YAML up to the examples written in <record> format!
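For example, dropping records from a particularly chatty container could look like this in the ClusterFlow (the container name here is hypothetical, and since the record_transformer above already lifted container to the top level, list the grep filter after it):

  filters:
    - grep:
        exclude:
          - key: container
            pattern: /noisy-sidecar/

which the operator would render into fluentd config roughly as:

      <filter **>
        @type grep
        <exclude>
          key container
          pattern /noisy-sidecar/
        </exclude>
      </filter>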

Enjoy your new logging world!
