```
CreationTimestamp: Mon, 28 Feb 2022 10:10:50 +0200
Reference: Deployment/myapp-test
Metrics: ( current / target )
  resource cpu on pods (as a percentage of request): 240% (48m) / 50%
Min replicas: 1
Max replicas: 3
Deployment pods: 3 current / 3 desired
...
Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---  ----                       -------
  Normal  SuccessfulRescale  37s  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  17s  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
```
In a real-world situation, it's better to use a specific YAML file to create each HPA. This makes it easier to keep track of changes by saving the file in a Git repository. It also allows you to revisit the file later if you need to make any adjustments.
Before we move on to the next step, let's remove the "myapp-test" deployment and the HPA that we created earlier. This will clean up our environment and get it ready for the next stage.
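A sketch of the cleanup commands, assuming the resource names used above (adjust if yours differ):

```shell
kubectl delete hpa myapp-test
kubectl delete deployment myapp-test
```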
Next, we'll see how HPAs behave in two situations: an application that generates steady, CPU-intensive load, and a simulated surge of external traffic from many users.
Letting Apps Grow Automatically with Metrics Server
During this step, we'll see how HPAs work in two scenarios:
- Active workload: an application that constantly performs CPU-intensive tasks.
- High-volume traffic: a script that simulates many users accessing our web app by sending it a high volume of rapid requests.
Scenario 1 - Keeping Busy with CPU-Intensive Tasks
In this scenario, we'll use a basic Python program that keeps the CPU busy by doing heavy computations in an endless loop. Below is the Python code (a minimal sketch; the exact program in the Starter Kit may differ):
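```python
import math

# Minimal sketch: keep the CPU busy indefinitely with floating-point work.
while True:
    math.sqrt(64 * 64 * 64 * 64 * 64)
```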
You can deploy the code using a file called "constant-load-deployment-test" from the Starter Kit repository. This file sets up everything needed for your program to run.
To get started, first, you need to copy the Starter Kit repository to your computer. Then, go to the folder where you copied it.
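A sketch of the commands (assuming the repository lives in DigitalOcean's GitHub organization):

```shell
git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
cd Kubernetes-Starter-Kit-Developers
```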
Next, let's create the deployment for our program using a command called "kubectl". We're also creating a separate area, called a "namespace", to make it easier to see what's happening.
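A sketch, assuming the manifest is named after the deployment; the "hpa-constant-load" namespace matches the one referenced later in this step:

```shell
# Create a dedicated namespace for this test.
kubectl create ns hpa-constant-load

# Deploy the sample application (hypothetical manifest path).
kubectl apply -f constant-load-deployment-test.yaml -n hpa-constant-load
```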
Note: The deployment manifest included in this repository sets resource requests and limits for the sample application Pods. This is important because the HPA computes CPU utilization as a percentage of a Pod's resource requests, so it can't work without them. It's a good idea to set resource requests and limits for all your application Pods so they don't consume more than their fair share of the cluster's resources.
Check to make sure that the deployment was created without any issues and that it's now running as expected.
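For example:

```shell
kubectl get deployments -n hpa-constant-load
```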
Here's what you might see in the output: You'll notice that there's only one copy of the application running at the moment.
After that, let's set up the "constant-load-hpa-test" resource in your cluster using the "kubectl" command.
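A sketch, assuming the HPA manifest is named after the resource:

```shell
kubectl apply -f constant-load-hpa-test.yaml -n hpa-constant-load
```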
This command will make a HPA resource, which will keep an eye on the sample deployment we made earlier. You can check how the "constant-load-test" HPA is doing by using:
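```shell
kubectl get hpa -n hpa-constant-load
```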
You'll see some details on the screen. Look for the part that says "REFERENCE". It shows that the HPA is keeping an eye on our "constant-load-deployment-test" deployment. Also, check out the "TARGETS" section. It tells us how much CPU our app is using.
You may also notice that the number of replicas of our sample app, shown in the REPLICAS column, went up from 1 to 3, matching what we set in the HPA configuration. Since the sample app generates load immediately, the whole scaling process happens pretty fast. If you want to see more details about what the HPA did, check its events with: "kubectl describe hpa -n hpa-constant-load".
Scenario 2 - Testing External Traffic
In this scenario, we'll create a more realistic test where we simulate external users accessing our application. To do this, we'll use a different area, called a namespace, along with a set of files to observe how our application behaves separately from the previous test.
You're going to test a sample server called "quote of the moment". When you send it an HTTP request, it sends back a different quote each time. To put pressure on the server, you'll send a lot of requests really quickly, about one every millisecond.
To get started, you need to set up the "quote" deployment and service using kubectl. Before you do that, make sure you're in the right folder on your computer, "Kubernetes-Starter-Kit-Developers".
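A sketch of the commands (the namespace and manifest names here are illustrative; adjust them to the Starter Kit layout):

```shell
kubectl create ns hpa-external-load                           # hypothetical namespace name
kubectl apply -f quote_deployment.yaml -n hpa-external-load   # hypothetical manifest name
```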
Now, let's make sure that the quote application deployment and services are working correctly.
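For example (using the hypothetical namespace from above):

```shell
kubectl get all -n hpa-external-load
```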
Here's how the output might look:
```
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/quote   ClusterIP   10.245.x.x   <none>        80/TCP    3m5s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/quote   1/1     1            1           3m5s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/quote-6c8f564ff   1         1         1       3m5s
```
After that, let's set up the HPA for the quote deployment using the "kubectl" command:
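A sketch (the manifest name is assumed):

```shell
kubectl apply -f external-load-hpa-test.yaml -n hpa-external-load
```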
Now, let's see if the HPA resource is set up and working correctly:
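```shell
# The namespace name is illustrative; use the one from your setup.
kubectl get hpa external-load-test -n hpa-external-load
```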
Here's how the output might look:
In this case, it's important to note that we've set a different value for the CPU usage threshold, and we're also using a different approach to scale down. Here's how the configuration for the "external-load-test" HPA looks:
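(A sketch of such a manifest; the CPU threshold and replica bounds shown here are illustrative, while the 60-second scale-down window matches the explanation below.)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: external-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quote
  minReplicas: 1
  maxReplicas: 3                     # illustrative bounds
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20     # illustrative CPU threshold
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60 # down from the default of 300 seconds
```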
In this setup, we've changed how quickly the autoscaler reacts when scaling down, setting it to 60 seconds instead of the default 300 seconds. This isn't typically necessary, but it helps speed up the process for this specific test. Usually, the autoscaler waits for 5 minutes after scaling before making more changes. This helps prevent rapid changes and keeps things stable.
In the last step, you'll use a script from this repository to put pressure on the target application, which is the quote server. This script quickly makes a bunch of HTTP requests, pretending to be users accessing the server. This helps simulate a lot of external activity, which is useful for demonstration purposes.
Make sure to open two separate windows on your screen so you can follow the results more easily; you can use a tool like "tmux" for this. In the first window, run the "quote service load test" script. You can stop the script at any time by pressing Ctrl+C.
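A sketch, assuming the script sits in the Starter Kit and is executable (the exact path and name may differ):

```shell
./quote_service_load_test.sh
```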
In another window, run kubectl get with the -w (watch) flag to keep an eye on the Horizontal Pod Autoscaler (HPA) resource. This will let you see changes to the HPA in real time.
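```shell
kubectl get hpa external-load-test -n hpa-external-load -w
```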
Check out the animation below to see the results of the experiment:
Next, you'll discover how to scale your applications based on specific metrics from Prometheus. For instance, you can grow or shrink your deployments depending on how many HTTP requests your application receives, instead of just its CPU or memory usage.
Automatically Scaling Applications with Prometheus: Beyond CPU Metrics
In the previous steps, you learned how to scale your applications based on resource usage, such as CPU. But you can scale on other signals too. For example, you can use Prometheus to track how many HTTP requests your application receives, and have Kubernetes adjust the number of replicas based on that traffic. If lots of visitors hit your site, the deployment automatically grows to handle them.
To do this, you'll need to set up the "prometheus-adapter". It acts as a translator between Prometheus and Kubernetes, exposing the metrics Prometheus collects through APIs that Kubernetes understands.
If you want something quick and simple that uses fewer resources, go for metrics-server: it gives you the basic CPU and memory metrics. If you need finer-grained control and want to scale your apps on signals beyond CPU and memory, choose Prometheus with the prometheus-adapter.
Before you start, make sure you have the Prometheus Operator set up in your cluster and that you know the basics of "ServiceMonitors". If not, follow the "Prometheus Monitoring Stack" chapter from the Starter Kit repository. Once that's ready, you'll create a "ServiceMonitor" so that Prometheus scrapes your application's metrics. The prometheus-adapter then exposes this information to Kubernetes, where the horizontal pod autoscaler can act on it to scale your application.
Simple Steps for Scaling Applications with Custom Metrics Using Prometheus
- Install the prometheus-adapter: first, set up the prometheus-adapter in your cluster.
- Tell Prometheus about your metrics: let Prometheus know which application metrics to collect, by creating "ServiceMonitors".
- Expose your metrics to Kubernetes: tell the prometheus-adapter to publish your application's custom metrics through the Kubernetes custom metrics API, by defining "discovery rules".
- Tell the system how to adjust: finally, create an HPA (horizontal pod autoscaler) targeting your application, configured to scale it based on the custom metrics you've set up.
Easy Install: Prometheus Adapter with Helm
You can install the Prometheus adapter using Helm, which is a tool that helps with managing software on your system. Here's how:
- First, clone the Starter Kit repository to your computer, then change into the folder where you cloned it.
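A sketch (assuming the repository lives in DigitalOcean's GitHub organization):

```shell
git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
cd Kubernetes-Starter-Kit-Developers
```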
- Now, add the prometheus-community Helm repo to your system, and list the available charts.
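```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community
```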
The result you see will be something like this:
- The chart we're interested in is "prometheus-adapter", which will install the prometheus-adapter on your cluster. You can find more details on the prometheus-adapter chart page.
- Next, open the prometheus-adapter Helm values file from the Starter Kit repository using a text editor, preferably one that supports YAML linting (for example, VS Code).
- Be sure to change the Prometheus endpoint setting to match your setup; instructions are included in the Helm values file.
- Once you've made the necessary changes, save the file and install the prometheus-adapter using Helm. This also creates a dedicated namespace for the prometheus-adapter.
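A sketch of the install command (the values file name is assumed; the pinned chart version matches the note below):

```shell
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --version 3.3.1 \
  --namespace prometheus-adapter \
  --create-namespace \
  -f prometheus-adapter-values.yaml   # hypothetical values file name
```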
We're using a specific version of the prometheus-adapter Helm chart, 3.3.1, which maps to the 0.9.1 release of the prometheus-adapter application (you can find this information in the output from Step 2). Pinning a specific version is a good idea because it gives you control and predictability: you get the expected results, and it's easier to track versions with tools like Git.
Evaluating Setup: What to Look For
Check to see if prometheus-adapter has been successfully set up by:
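```shell
helm ls -n prometheus-adapter
```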
The result you want to see will look something like this (pay attention to the word "deployed" in the STATUS column):
Now, take a look at the status of the resources in the "prometheus-adapter" namespace we created:
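```shell
kubectl get all -n prometheus-adapter
```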
The result should resemble the following (pay attention to the deployment and replicaset resources; they should be healthy, with a count of 2):
```
NAME                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/prometheus-adapter   ClusterIP   10.245.x.x   <none>        443/TCP   2m55s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-adapter   2/2     2            2           2m55s

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-adapter-7f475b958b   2         2         2       2m55s
```
Wait a little, then query the custom.metrics.k8s.io API and save the response to a file:

```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" > custom_metrics_k8s_io.json
```
Open the file named "custom_metrics_k8s_io.json," and you'll see information about special metrics that the Kubernetes system is sharing.
If what you see matches the example above, you've set up the prometheus-adapter correctly. Next, you'll create a simple application that exposes custom metrics, and you'll make sure Prometheus scrapes them by setting up a "ServiceMonitor".
Setting Up a Practice Application with Prometheus Example Metrics
Let's set up a practice application to check that everything works end to end. We'll use an example application called "prometheus-example", which exposes the following metrics about its HTTP traffic:
Total incoming requests:
a. Metric: http_requests_total
b. What it tells us: the total number of HTTP requests the application has received.
Duration of requests:
a. Metric: http_request_duration_seconds
b. What it tells us: how long the application takes to serve each request.
Total count of measured requests:
a. Metric: http_request_duration_seconds_count
b. What it tells us: the total number of requests whose duration was measured.
Total duration of all requests:
a. Metric: http_request_duration_seconds_sum
b. What it tells us: the combined time, in seconds, spent serving all requests.
Histogram of request durations:
a. Metric: http_request_duration_seconds_bucket
b. What it tells us: how request durations are distributed across latency buckets.
These metrics will help us test whether our setup is configured correctly.
Once you've deployed the prometheus-example application, you'll want to test out how the automatic scaling works with custom metrics. To do this, you'll need to send a bunch of requests to the prometheus-example service and see how it responds by adjusting its size based on the number of requests.
Before you do that, make sure you're in the right folder on your computer where you copied the Starter Kit.
Next, let's put the prometheus-example application into action. You can do this by using a special set of instructions written in a file called a "YAML manifest" from the Starter Kit. This file will tell your system to create the prometheus-example application along with some other necessary things, like a service. It also sets up a special area, known as a "namespace," called prometheus-custom-metrics-test, just to make sure everything is working smoothly.
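A sketch of the command (the manifest name is assumed; per the description above, it also creates the prometheus-custom-metrics-test namespace):

```shell
kubectl apply -f prometheus-example-deployment.yaml   # hypothetical manifest name
```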
After you've set up the prometheus-example application, double-check that everything was created correctly in the "prometheus-custom-metrics-test" namespace:
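```shell
kubectl get all -n prometheus-custom-metrics-test
```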
You'll see something like this in the results (make sure the "prometheus-example-app" is listed and marked as "up and running," along with the associated service).
```
NAME                             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/prometheus-example-app   ClusterIP   10.245.x.x   <none>        80/TCP    10s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-example-app   1/1     1            1           10s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-example-app-7455d5c48f   1         1         1       10s
```
Preparing Prometheus for Application Monitoring
Before we set up something called a "ServiceMonitor" for our application, let's make sure our Prometheus system is ready for it. Follow these steps:
- Start by finding out which Prometheus instances are in your system. The Starter Kit uses a place called the "monitoring namespace," but you might need to adjust this based on how your system is set up.
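```shell
kubectl get prometheus -n monitoring
```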
The result you'll see will look something like this:
- Now, choose the Prometheus instance you found in the last step (if there's more than one, select the one that matches your setup). Look for something called "serviceMonitorSelector.matchLabels" and note down its value.
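A sketch (the instance name is hypothetical; use the one returned by the previous command):

```shell
kubectl get prometheus kube-prom-stack-kube-prome-prometheus -n monitoring \
  -o jsonpath='{.spec.serviceMonitorSelector}'
```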
You'll see something like this in the results (keep an eye out for a label called "release").
By default, each Prometheus instance is set up to find only service monitors that have a certain label. To make sure our prometheus-example-app ServiceMonitor gets noticed by Prometheus, we need to give it a label called "release" with the value "kube-prom-stack."
Before you do anything else, make sure you're in the right folder on your computer where you copied the Starter Kit.
Next, take a look at a file called "prometheus-example-service-monitor" from the Starter Kit on your computer. You can use a program like VS Code, which is a good choice because it helps with checking if everything is written correctly in this special file called YAML.
In the part of the file that talks about "metadata.labels," make sure to include the label we found earlier (it's called "release" and should have the value "kube-prom-stack"). The ServiceMonitor file will look something like this:
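(A sketch of such a ServiceMonitor; the selector labels and port name are assumptions, while the release label matches the value found earlier.)

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-app
  namespace: prometheus-custom-metrics-test
  labels:
    release: kube-prom-stack          # label required for Prometheus to discover this monitor
spec:
  selector:
    matchLabels:
      app: prometheus-example-app     # assumed service label
  endpoints:
    - port: web                       # assumed port name on the service
      path: /metrics
```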
Last step! Let's make the ServiceMonitor we need. This ServiceMonitor will tell Prometheus to keep an eye on the /metrics information from our prometheus-example-app.
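A sketch (the manifest name follows the file referenced above):

```shell
kubectl apply -f prometheus-example-service-monitor.yaml
```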
Once you've finished the steps above, you'll notice a new item in the Targets panel on the Prometheus dashboard. To view this dashboard, you need to use a command to make it accessible. Here's an example command using the Starter Kit naming conventions. Please adjust it based on how your system is set up:
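A sketch, assuming the kube-prom-stack release name from the Starter Kit (the exact service name may differ in your cluster):

```shell
kubectl port-forward svc/kube-prom-stack-kube-prome-prometheus 9090:9090 -n monitoring
```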
The result you'll see looks something like this (look for "prometheus-example-app" in the list of discovered targets):
Open a web browser and go to "localhost:9090". Then, click on "Status" and choose "Targets". You'll see that the target we added has been included.
Enabling Kubernetes Access to Custom Metrics with Prometheus Adapter
Even though Prometheus can find and gather data about your application's special metrics, the prometheus-adapter needs some help to share these metrics with Kubernetes. To make this happen, you have to set up a few rules called "discovery rules." These rules will guide the prometheus-adapter in exposing your application's custom metrics.
Quoting from the official guide:
The adapter decides which metrics to show and how to show them using a set of "discovery" rules. Each rule works on its own (so they shouldn't overlap), and it outlines the steps for the adapter to expose a metric in the API.
Each rule has roughly four parts:
- Discovery:
- How the adapter should find all Prometheus metrics for this rule.
- Association:
- How the adapter should figure out which Kubernetes resources a specific metric is related to.
- Naming:
- How the adapter should display the metric in the custom metrics API.
- Querying:
- How a request for a certain metric on one or more Kubernetes objects should be translated into a question to Prometheus.
A regular definition for a discovery rule looks something like this (the seriesQuery and resources lines are reconstructed from the explanation below):

```yaml
rules:
  - seriesQuery: 'http_requests_total{namespace!="", pod!=""}'
    resources:
      template: "<<.Resource>>"
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
```
Let's break down the configuration into simpler parts:
- seriesQuery:
- This part represents the metric you want to track, like the total number of HTTP requests shown by the application's /metrics endpoint. It tells the prometheus-adapter to choose the http_requests_total metric for all your application Pods that are not null (pod!=""). This is like telling it what to look for.
- resources.template:
- Think of this as a template that Prometheus uses to understand where the metrics are coming from (like from a Pod). This part helps the prometheus-adapter figure out the resource that exposes the metrics (e.g., Pod). It's like connecting the metric to the thing it's related to.
- name:
- Here, you're giving a new name to the rule. In simple terms, you're telling the prometheus-adapter to rename http_requests_total to http_requests_per_second. Essentially, you're saying you're interested in the number of HTTP requests per second, not just a simple count. This is about making the metric name more meaningful.
- metricsQuery:
- This is a fancy term for a parameterized query in Prometheus language (PromQL). It's like a way of asking Prometheus for information. In this case, it calculates the rate of HTTP requests on average over a set period (like 2 minutes). It's the way of telling the prometheus-adapter how to interpret the metric.
Now that you've learned how to create discovery rules for prometheus-adapter, let's apply this knowledge. Follow these simple steps to tell prometheus-adapter to use the rules you've just set up:
- Start by making sure you're in the right folder on your computer where you copied the Starter Kit.
- After that, open a file called "prometheus-adapter Helm values" from the Starter Kit on your computer. You can use a program like VS Code, which is a good choice because it helps check if everything is written correctly in this special file called YAML.
- Find the "rules" section in the file and uncomment it by removing the leading "#" characters. When you're done, it should look something like the snippet below.
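(A sketch of the uncommented section, mirroring the discovery rule shown earlier; the exact contents of the Starter Kit values file may differ.)

```yaml
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="", pod!=""}'
      resources:
        template: "<<.Resource>>"
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
```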
- Finally, save the values file and upgrade the prometheus-adapter release with Helm so the new rules take effect. Keeping the values file in Git makes these changes easy to track.
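A sketch of the upgrade command (the values file name is assumed):

```shell
helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
  --version 3.3.1 \
  --namespace prometheus-adapter \
  -f prometheus-adapter-values.yaml   # hypothetical values file name
```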
Note: When no requests are coming in, only the version metric is exposed. To start seeing more metrics, such as the request count, you need to generate some HTTP requests against the application.
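To reach the application locally, you can port-forward its service first (a sketch, assuming the service name and port from the earlier output):

```shell
kubectl port-forward svc/prometheus-example-app 8080:80 -n prometheus-custom-metrics-test
```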
Open a web browser and go to http://localhost:8080. Then, just refresh the homepage of the application a few times.
If everything's working correctly, you'll be able to check a new metric by using the custom metrics API. To make it easier to read the results, you can install a tool called jq.
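A sketch of the query, using the metric and namespace names from this tutorial:

```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/prometheus-custom-metrics-test/pods/*/http_requests_per_second" | jq .
```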
The result you see might look something like this:
When you look at the output above, you'll see a metric called http_requests_per_second with a value of 0. This is expected because we haven't put any pressure on the application yet.
Now, for the last step, we'll set up something called HPA for the deployment of our application. Then, we'll create some activity on the application using a script called custom_metrics_service_load_test, which you can find in the Starter Kit repository.
Setting Up and Testing HPAs with Custom Metrics
Creating and testing HPAs (Horizontal Pod Autoscalers) with custom metrics is pretty much like what we did with the metrics server examples. The only thing that changes is how we measure the application's performance, which in this case uses custom metrics like http_requests_per_second.
A typical HPA definition based on custom metrics looks something like this (important fields are explained as we go along):
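(A sketch of such a manifest; the replica bounds and the 500m target are inferred from the test phases described later and may differ from the Starter Kit file.)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-custom-metrics-hpa
  namespace: prometheus-custom-metrics-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prometheus-example-app
  minReplicas: 1
  maxReplicas: 12               # assumed upper bound; the test below reaches 9 replicas
  metrics:
    - type: Pods                # scale on a custom metric exposed by the application Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 500m    # inferred target: ~500 milli-requests per second per Pod
```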
First, go to the folder where you saved the Starter Kit on your computer:
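```shell
cd Kubernetes-Starter-Kit-Developers
```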
Next, set up the prometheus-custom-metrics-hpa resource in your cluster by using kubectl:
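A sketch (the manifest name is assumed from the resource name):

```shell
# Hypothetical manifest name; it targets the prometheus-custom-metrics-test namespace.
kubectl apply -f prometheus-custom-metrics-hpa.yaml
```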
The command above creates something called an HPA, which stands for Horizontal Pod Autoscaler. It's set up to watch over the sample deployment we made earlier. You can see how the HPA is doing by using:
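```shell
kubectl get hpa -n prometheus-custom-metrics-test
```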
The result you'll see looks something like this (pay attention to the REFERENCE column, which points to the prometheus-example-app deployment, and the TARGETS column, showing the current number of http_requests_per_second):
In the last step, you'll use a script provided in this repository to put some pressure on the target (which is the Prometheus-example-app). This script will make a bunch of quick HTTP requests, acting like multiple users accessing the app simultaneously (it's enough for showing how things work).
To make it easier to see what's happening, it's best to split your screen into two separate windows. You can do this with a tool like tmux. Then, in the first window, you'll run a script called custom_metrics_service_load_test. You can stop it anytime by pressing Ctrl+C.
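A sketch, assuming the script is executable (the exact name may differ):

```shell
./custom_metrics_service_load_test.sh
```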
Next, in the second window, keep an eye on the HPA resource by running kubectl get with the -w (watch) flag. This will continuously show you updates in real time.
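```shell
kubectl get hpa -n prometheus-custom-metrics-test -w
```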
You can watch as the autoscaler starts working when the load increases (while the load generator script is running). It'll increase the number of replicas for the prometheus-example-app deployment. Once you stop the load generator script, there's a waiting period, and after about 5 minutes, the number of replicas goes back to the original value of 1.
Highlighted Phases:
Phase 1: This is the ramp-up. You'll see the HPA gradually increase the number of replicas from 1 to 8; as Pods are added, the average load per Pod drops from around 2140 milli-requests per second to a more manageable 620 milli-requests per second.
Phase 2: Things start to stabilize here. The current load has small ups and downs, staying between 520-540 milli-requests per second.
Phase 3: In this phase, a sudden spike pushes the load more than 10% above the threshold, to 562 milli-requests per second. Since we're outside the hysteresis window, the HPA adds another replica (9 in total) to stabilize the system, which quickly brings the load back down to around 480 milli-requests per second.
Phase 4: Here, we stop the load generator script. You'll see the application's load decrease rapidly. After about 5 minutes (the default cooldown time), the number of replicas goes back to the minimum value of 1.
Let's say we want to keep the number of HTTP requests per Pod close to a certain limit, our threshold. The HPA won't keep changing the number of replicas while the average is already close to the threshold (say, within ±10%), even if the exact limit hasn't been hit. This behavior is called hysteresis, and it acts as a stabilizing factor: it prevents the replica count from bouncing back and forth, keeping the system stable.
Conclusion
In this guide, you discovered how to make your application adjust its size based on demand, using tools such as metrics-server and Prometheus to measure that demand. You also saw how this works in practice: HPAs automatically resized your application to handle more visitors or traffic, keeping everything running smoothly.
Excited to try it out? Give it a go with your own applications and see how it works!