Kubernetes Observability with Elasticsearch, Kibana, Filebeat, and Metricbeat

Kubernetes has become the leading solution for container orchestration. It really is a powerful system with endless possibilities for microservices-based system designs. But as we know, with great power comes great responsibility 🙂

Getting good observability on a system running numerous microservices across many nodes can be challenging. Containers are constantly on the move – starting up, terminating, moving between hosts, and whatnot. That agility is great and can save costs, but tracing all of our services and processing all of their output is not so easy. Luckily, we have the Elastic Stack.

In this post, I’ll go over the entire stack deployment using this open git repository. All of the components will be deployed to a namespace called “logging” – but you can change it to whatever you like. Just make sure to update all the YAMLs of the stack.

TL;DR

You can open the kubernetes-elastic-visibility repo on GitHub and run the command in the README.md. You’ll have the full observability stack deployed and ready to use on your cluster.

Namespace

Let’s lay the groundwork and create our namespace. Deploy the following YAML to create the “logging” namespace:

kind: Namespace
apiVersion: v1
metadata:
  name: logging
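
If you save the manifest above as, say, namespace.yaml (the filename is up to you), you can apply it and confirm the namespace exists with:

kubectl apply -f namespace.yaml
kubectl get namespace logging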

Elasticsearch Cluster

We’ll deploy the Elasticsearch cluster as a StatefulSet. A Kubernetes StatefulSet is the preferred way of deploying applications that require state. The best example is databases, which need a persistent disk mounted and, in some cases, a specific ordering of the cluster nodes’ start times. Elasticsearch is no different: we want to be able to tell each node how to reach the other nodes in the cluster, and we want to mount the correct disk on each node. A StatefulSet gives your pods the stable naming convention necessary for node discovery and works with PersistentVolumeClaims to guarantee we mount the correct disks.

Our Elasticsearch cluster will consist of three master-eligible nodes managed by a StatefulSet, plus a single Service that allows other workloads inside the cluster to communicate with it.

Note: there are many ways to run an Elasticsearch cluster, and a small cluster of identical master-eligible nodes is just one of them. You can split the master, data, and ingest roles across dedicated nodes depending on your requirements, as sketched below.
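
For example, if you wanted a set of dedicated data nodes, you could add a second StatefulSet whose pods disable the master role through the same environment-variable mechanism used in the manifest below. This is just a sketch of the relevant env entries (not part of the repo’s manifests); on 7.9 these legacy role settings still work, while newer versions use node.roles instead:

          - name: node.master
            value: "false"
          - name: node.data
            value: "true"
          - name: node.ingest
            value: "true"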

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3
        resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: k8s-logs
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.seed_hosts
            value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-cluster-0,es-cluster-1,es-cluster-2"
          - name: ES_JAVA_OPTS
            value: "-Xms512m -Xmx512m"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp2
      resources:
        requests:
          storage: 100Gi

Pay attention to the env section, where we set the cluster hosts. Because we use a StatefulSet named “es-cluster”, each of our pods is assigned a stable ordinal index starting at 0. Thanks to that guarantee, we can set the seed hosts environment variable in advance to:

es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch

Pattern: <SET-NAME>-<INDEX>.<SERVICE-NAME>
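
Within the same namespace, the short names above are enough. Their fully qualified forms (assuming the “logging” namespace and the default cluster.local cluster domain) would be:

es-cluster-0.elasticsearch.logging.svc.cluster.local
es-cluster-1.elasticsearch.logging.svc.cluster.local
es-cluster-2.elasticsearch.logging.svc.cluster.local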

Under the volumeClaimTemplates section, you can see we are assigning 100Gi of persistent storage to each pod using the gp2 storage class. The size can be changed to whatever you forecast your cluster will need, and the gp2 storage class should be adjusted for your cloud provider. I’m using gp2 because my cluster is hosted on AWS and I want a General Purpose SSD.

In order for our cluster to be reachable by other services, we need to create a Service. Because all of our nodes share the same role, we can read from and write to any of them.

kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node

This Service definition exposes our nodes under the DNS name elasticsearch, making the cluster easy for other deployments to discover and reach.
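
Once the StatefulSet and the Service are up, a quick way to verify that the cluster has formed is to run a throwaway pod (here using the public curlimages/curl image, purely as a convenience) and query the cluster health endpoint through that DNS name:

kubectl -n logging run curl-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s "http://elasticsearch:9200/_cluster/health?pretty"

You should see a status of green and number_of_nodes equal to 3.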

Kibana

Kibana will be our UI tool to visualize and explore the data in Elasticsearch. We’ll deploy a simple Deployment component to the cluster with a single pod to run the app.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.9.3
        resources:
          limits:
            cpu: 1000m
            memory: 4Gi
          requests:
            cpu: 1000m
            memory: 4Gi
        env:
          - name: ELASTICSEARCH_HOSTS
            value: http://elasticsearch:9200
        ports:
        - containerPort: 5601

Because we created a Service for the Elasticsearch cluster, we can give Kibana the cluster’s address through an environment variable and have it connect on boot. The DNS name of the Service is just its name: http://elasticsearch:9200. You can read more about how DNS is resolved in Kubernetes in the official documentation.
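
If Kibana ran in a different namespace than Elasticsearch (not the case here, but a common variation), you would use the fully qualified service name instead, for example:

http://elasticsearch.logging.svc.cluster.local:9200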

Just like with the Elasticsearch cluster, we want a simple way to reach our Kibana pod. To do that, we’ll create another Service.

apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
  selector:
    app: kibana

This Service is purely for convenience and has no operational effect on the cluster. We will use the kubectl CLI to create a tunnel to our Kibana pod (so we won’t have to expose it to the public internet). One option is to look up the pod name and port-forward to it directly, but the static Service name gives us a stable shortcut to proxy traffic through.

To create a tunnel using our Kibana service, run this:

kubectl -n logging port-forward service/kibana 5601

Now you can point your browser to http://localhost:5601 and you should have the Kibana UI ready for use.

Filebeat

Now that we have an Elasticsearch cluster ready to ingest our data and Kibana ready to visualize it, we can start shipping our logs and metrics. We’ll start with the logs.

In Kubernetes, each pod consists of one or more containers, and each container’s stdout/stderr streams are automatically written to a file on the hosting node. So, if we have access to the host’s storage, we can read these files, which is essentially the same as looking at the containers’ stdout/stderr streams. In most cases, these files live on the host machine under /var/log/containers/*.log (the file name encodes the pod name, namespace, container name, and container ID).
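
For illustration, this is roughly what a single line in one of those files looks like when the node uses Docker’s json-file logging driver (the exact format depends on your container runtime; containerd, for example, uses a slightly different CRI format). The log content itself is made up:

{"log":"GET /healthz 200\n","stream":"stdout","time":"2020-11-05T14:30:01.123456789Z"}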

So the plan is to run a pod on each node that mounts the host’s log directory, tails all the files, and writes them to Elasticsearch. To do all of that we’ll use Filebeat, the Elastic Stack’s log shipper. It can tail our files, enrich our logs with Kubernetes metadata using the right processor, and then write everything to our Elasticsearch cluster.

First, let’s create the configuration file for Filebeat using a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: logging
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      paths:
        - /var/log/containers/*.log
      processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
    
    processors:
      - add_cloud_metadata:
      - add_host_metadata:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      

This config will be mounted into the pod as the filebeat.yml file. It tells Filebeat which files to tail, which processors to run, and where to write its data.
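
Thanks to the add_kubernetes_metadata processor, every event shipped to Elasticsearch carries the pod’s metadata alongside the raw log line, which is what lets us filter logs by namespace, pod, or container in Kibana. A trimmed, illustrative document (the values are made up) looks something like this:

{
  "message": "GET /healthz 200",
  "stream": "stdout",
  "kubernetes": {
    "namespace": "default",
    "pod": { "name": "my-app-5d8f7c9b4-xk2lp" },
    "container": { "name": "my-app" },
    "node": { "name": "ip-10-0-1-23.ec2.internal" }
  }
}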

To run a single pod on each node, we’ll use a DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: logging
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.9.3
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: filebeat-config
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: varlog
        hostPath:
          path: /var/log      
      - name: data
        hostPath:          
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate

Each pod in this set will mount the configuration file from the ConfigMap we created a step earlier, mount its host node’s /var/log and /var/lib/docker/containers directories as its own (to tail), and run Filebeat with our Elasticsearch cluster configured as the output (the address comes from the environment variables).
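
Note that the spec references serviceAccountName: filebeat. The add_kubernetes_metadata processor talks to the Kubernetes API, so that ServiceAccount must exist and be bound to a role that can read pods, namespaces, and nodes. The repo’s manifests should take care of this; a minimal sketch of the required objects looks like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
- apiGroups: [""]
  resources: ["namespaces", "pods", "nodes"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: logging
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io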

If all goes well, we’ll have a single Filebeat container running on each node of our Kubernetes cluster, and logs from all containers will be flowing into our Elasticsearch cluster.
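
A quick sanity check is to port-forward to one of the Elasticsearch pods and confirm that Filebeat indices are being created and filling up with documents:

kubectl -n logging port-forward pod/es-cluster-0 9200 &
curl -s "http://localhost:9200/_cat/indices/filebeat-*?v"

From there, create a filebeat-* index pattern in Kibana and start exploring your logs in the Discover view.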

Metricbeat

We have Filebeat running as our log shipper; now we need a metric shipper. Here comes Metricbeat – Elastic’s metric shipper.

Metricbeat will help us ship metrics from our host nodes and running pods to Elasticsearch. It does so in two ways: 1) running in the cluster alongside kube-state-metrics (explained below), reading its metrics and publishing them; and 2) running on each node, reading the node’s (host’s) stats and publishing them. For the first we’ll have a single pod, and for the second a DaemonSet (just like Filebeat).

First, let’s create the DaemonSet configuration files:

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-config
  namespace: logging
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false

    processors:
      - add_cloud_metadata:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']          
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-modules
  namespace: logging
  labels:
    k8s-app: metricbeat
data:
  system.yml: |-
    - module: system
      period: 10s
      metricsets:
        - cpu
        - load
        - memory
        - network
        - process
        - process_summary
        - core
        - diskio
        - socket
      processes: ['.*']
      process.include_top_n:
        by_cpu: 5      # include top 5 processes by CPU
        by_memory: 5   # include top 5 processes by memory

    - module: system
      period: 1m
      metricsets:
        - filesystem
        - fsstat
      processors:
      - drop_event.when.regexp:
          system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)'
  kubernetes.yml: |-
    - module: kubernetes
      metricsets:
        - node
        - system
        - pod
        - container
        - volume
      period: 10s
      host: ${NODE_NAME}
      hosts: ["https://${NODE_NAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"
      # If there is a CA bundle that contains the issuer of the certificate used in the Kubelet API,
      # remove ssl.verification_mode entry and use the CA, for instance:
      #ssl.certificate_authorities:
        #- /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    # Currently `proxy` metricset is not supported on Openshift, comment out section
    - module: kubernetes
      metricsets:
        - proxy
      period: 10s
      host: ${NODE_NAME}
      hosts: ["localhost:10249"]

Now, create the DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
  namespace: logging
  labels:
    k8s-app: metricbeat
spec:
  selector:
    matchLabels:
      k8s-app: metricbeat
  template:
    metadata:
      labels:
        k8s-app: metricbeat
    spec:
      serviceAccountName: metricbeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:7.9.3
        args: [
          "-c", "/etc/metricbeat.yml",
          "-e",
          "-system.hostfs=/hostfs",
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
        - name: ELASTICSEARCH_PORT
          value: "9200"        
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0          
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/metricbeat.yml
          readOnly: true
          subPath: metricbeat.yml
        - name: data
          mountPath: /usr/share/metricbeat/data
        - name: modules
          mountPath: /usr/share/metricbeat/modules.d
          readOnly: true
        - name: proc
          mountPath: /hostfs/proc
          readOnly: true
        - name: cgroup
          mountPath: /hostfs/sys/fs/cgroup
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: config
        configMap:
          defaultMode: 0640
          name: metricbeat-daemonset-config
      - name: modules
        configMap:
          defaultMode: 0640
          name: metricbeat-daemonset-modules
      - name: data
        hostPath:
          # When metricbeat runs as non-root user, this directory needs to be writable by group (g+w)
          path: /var/lib/metricbeat-data
          type: DirectoryOrCreate

These pods run on every node and read the host’s stats: CPU, memory, network, disk, and more.

On top of that, we need another dedicated pod that will read and publish metrics about our workloads’ CPU, memory, disk, etc… For that, we’ll get help from kube-state-metrics. This official Kubernetes add-on runs inside the cluster, listens to the Kubernetes API server, and exposes metrics about the state of the cluster’s objects in a format Metricbeat can understand. To deploy kube-state-metrics, just run the following command from the repo root:

kubectl apply -f ./kube-state-metrics
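
Before moving on, you can confirm it’s running (the resource names below match the upstream kube-state-metrics manifests; adjust them if your copy differs):

kubectl -n kube-system get deployment kube-state-metrics
kubectl -n kube-system get service kube-state-metrics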

Now let’s deploy the ConfigMaps for the Metricbeat pod:

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-config
  namespace: logging
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Mounted `metricbeat-deployment-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false

    processors:
      - add_cloud_metadata:
    
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']            
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-modules
  namespace: logging
  labels:
    k8s-app: metricbeat
data:
  # This module requires `kube-state-metrics` up and running under `kube-system` namespace
  kubernetes.yml: |-
    - module: kubernetes
      metricsets:
        - state_node
        - state_deployment
        - state_replicaset
        - state_pod
        - state_container
        - state_cronjob
        - state_resourcequota
      period: 10s
      host: ${NODE_NAME}
      hosts: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]    

And the Deployment:

# Deploy singleton instance in the whole cluster for some unique data sources, like kube-state-metrics
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricbeat
  namespace: logging
  labels:
    k8s-app: metricbeat
spec:
  selector:
    matchLabels:
      k8s-app: metricbeat
  template:
    metadata:
      labels:
        k8s-app: metricbeat
    spec:
      serviceAccountName: metricbeat
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:7.9.3
        args: [
          "-c", "/etc/metricbeat.yml",
          "-e",
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
        - name: ELASTICSEARCH_PORT
          value: "9200"                
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/metricbeat.yml
          readOnly: true
          subPath: metricbeat.yml
        - name: modules
          mountPath: /usr/share/metricbeat/modules.d
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: metricbeat-deployment-config
      - name: modules
        configMap:
          defaultMode: 0640
          name: metricbeat-deployment-modules

If all is set up correctly, you’ll have a pod on each node that reads and publishes that node’s metrics, and another pod that runs against the kube-state-metrics server and publishes the workload (pod) stats.
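
As with Filebeat, you can verify data is flowing by checking for Metricbeat indices through a port-forward to one of the Elasticsearch pods:

kubectl -n logging port-forward pod/es-cluster-0 9200 &
curl -s "http://localhost:9200/_cat/indices/metricbeat-*?v"

Then create a metricbeat-* index pattern in Kibana to build visualizations on top of the node and pod metrics.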

Conclusion

Elastic offers a strong suite of services that provide an end-to-end solution for metrics and logs. Deploying this stack is quick and easy and gives good coverage of our cluster’s observability. Although the Elastic Stack is great, there are plenty of other solutions out there that can give you the same end result. It’s worth checking them out and choosing the one that best fits your requirements.

And one last semi-warning I like to add when talking about any drop-in solution for Kubernetes. Kubernetes is a fairly complicated system: services are constantly moving and changing, there are many internal dependencies, microservices, and APIs, and there are several possible setups (self-managed in the cloud, fully managed, on-premises). Having a drop-in solution of any kind is nice and can save us a lot of time, but we should not forget that a lot of work has gone into these few YAMLs. When deploying a solution to a production cluster, we should get to know its internals, how it works, and its bits and bytes, so that we can support and maintain it.
