```
CreationTimestamp: Mon, 28 Feb 2022 10:10:50 +0200
Reference: Deployment/myapp-test
Metrics: ( current / target )
  resource cpu on pods (as a percentage of request): 240% (48m) / 50%
Min replicas: 1
Max replicas: 3
Deployment pods: 3 current / 3 desired
...
Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---  ----                       -------
  Normal  SuccessfulRescale  37s  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  17s  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
```
In a real-world situation, it's better to use a specific YAML file to create each HPA. This makes it easier to keep track of changes by saving the file in a Git repository. It also allows you to revisit the file later if you need to make any adjustments.
Before we move on to the next step, let's remove the "myapp-test" deployment and the HPA that we created earlier. This will clean up our environment and get it ready for the next stage.
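A sketch of the cleanup commands, assuming the resource names used above (adjust if yours differ):

```shell
kubectl delete hpa myapp-test
kubectl delete deployment myapp-test
```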
Next, we'll see how HPAs behave in two situations: an application that generates steady, CPU-intensive load, and a simulated surge of external traffic from many users.
Letting Apps Grow Automatically with Metrics Server
During this step, we'll see how HPAs work in two scenarios:
- Active workload: an application that constantly performs CPU-intensive tasks.
- High-volume traffic: a script that simulates many users accessing our web app by sending it a high volume of rapid requests.
Scenario 1 - Keeping Busy with CPU-Intensive Tasks
In this scenario, we'll use a basic Python program that keeps the CPU busy by doing heavy computations in an endless loop. Below is the Python code (a minimal sketch; the exact program in the Starter Kit may differ):
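```python
import math

# Minimal sketch: keep the CPU busy indefinitely with floating-point work.
while True:
    math.sqrt(64 * 64 * 64 * 64 * 64)
```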
You can deploy the code using a file called "constant-load-deployment-test" from the Starter Kit repository. This file sets up everything needed for your program to run.
To get started, first, you need to copy the Starter Kit repository to your computer. Then, go to the folder where you copied it.
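A sketch of the commands (assuming the repository lives in DigitalOcean's GitHub organization):

```shell
git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
cd Kubernetes-Starter-Kit-Developers
```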
Next, let's create the deployment for our program using a command called "kubectl". We're also creating a separate area, called a "namespace", to make it easier to see what's happening.
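A sketch, assuming the manifest is named after the deployment; the "hpa-constant-load" namespace matches the one referenced later in this step:

```shell
# Create a dedicated namespace for this test.
kubectl create ns hpa-constant-load

# Deploy the sample application (hypothetical manifest path).
kubectl apply -f constant-load-deployment-test.yaml -n hpa-constant-load
```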
Note: The deployment manifest included in this repository sets resource requests and limits for the sample application Pods. This is important because the HPA computes CPU utilization as a percentage of a Pod's resource requests, so it can't work without them. It's a good idea to set resource requests and limits for all your application Pods so they don't consume more than their fair share of the cluster's resources.
Check to make sure that the deployment was created without any issues and that it's now running as expected.
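For example:

```shell
kubectl get deployments -n hpa-constant-load
```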
Here's what you might see in the output: You'll notice that there's only one copy of the application running at the moment.
After that, let's set up the "constant-load-hpa-test" resource in your cluster using the "kubectl" command.
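A sketch, assuming the HPA manifest is named after the resource:

```shell
kubectl apply -f constant-load-hpa-test.yaml -n hpa-constant-load
```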
This command will make a HPA resource, which will keep an eye on the sample deployment we made earlier. You can check how the "constant-load-test" HPA is doing by using:
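```shell
kubectl get hpa -n hpa-constant-load
```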
You'll see some details on the screen. Look for the part that says "REFERENCE". It shows that the HPA is keeping an eye on our "constant-load-deployment-test" deployment. Also, check out the "TARGETS" section. It tells us how much CPU our app is using.
You may also notice that the number of replicas of our sample app, shown in the REPLICAS column, went up from 1 to 3, matching what we set in the HPA configuration. Since the sample app generates load immediately, the whole scaling process happens pretty fast. If you want to see more details about what the HPA did, check its events with: "kubectl describe hpa -n hpa-constant-load".
Scenario 2 - Testing External Traffic
In this scenario, we'll create a more realistic test where we simulate external users accessing our application. To do this, we'll use a different area, called a namespace, along with a set of files to observe how our application behaves separately from the previous test.
You're going to test a sample server called "quote of the moment". When you send it an HTTP request, it sends back a different quote each time. To put pressure on the server, you'll send a lot of requests really quickly, about one every millisecond.
To get started, you need to set up the "quote" deployment and service using kubectl. Before you do that, make sure you're in the right folder on your computer, "Kubernetes-Starter-Kit-Developers".
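A sketch of the commands (the namespace and manifest names here are illustrative; adjust them to the Starter Kit layout):

```shell
kubectl create ns hpa-external-load                           # hypothetical namespace name
kubectl apply -f quote_deployment.yaml -n hpa-external-load   # hypothetical manifest name
```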
Now, let's make sure that the quote application deployment and services are working correctly.
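For example (using the hypothetical namespace from above):

```shell
kubectl get all -n hpa-external-load
```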
Here's how the output might look:
```
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/quote   ClusterIP   10.245.x.x   <none>        80/TCP    3m5s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/quote   1/1     1            1           3m5s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/quote-6c8f564ff   1         1         1       3m5s
```
After that, let's set up the HPA for the quote deployment using the "kubectl" command:
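A sketch (the manifest name is assumed):

```shell
kubectl apply -f external-load-hpa-test.yaml -n hpa-external-load
```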
Now, let's see if the HPA resource is set up and working correctly:
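```shell
# The namespace name is illustrative; use the one from your setup.
kubectl get hpa external-load-test -n hpa-external-load
```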
Here's how the output might look:
In this case, it's important to note that we've set a different value for the CPU usage threshold, and we're also using a different approach to scale down. Here's how the configuration for the "external-load-test" HPA looks:
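(A sketch of such a manifest; the CPU threshold and replica bounds shown here are illustrative, while the 60-second scale-down window matches the explanation below.)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: external-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quote
  minReplicas: 1
  maxReplicas: 3                     # illustrative bounds
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20     # illustrative CPU threshold
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60 # down from the default of 300 seconds
```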
In this setup, we've changed how quickly the autoscaler reacts when scaling down, setting it to 60 seconds instead of the default 300 seconds. This isn't typically necessary, but it helps speed up the process for this specific test. Usually, the autoscaler waits for 5 minutes after scaling before making more changes. This helps prevent rapid changes and keeps things stable.
In the last step, you'll use a script from this repository to put pressure on the target application, which is the quote server. This script quickly makes a bunch of HTTP requests, pretending to be users accessing the server. This helps simulate a lot of external activity, which is useful for demonstration purposes.
Make sure to open two separate windows on your screen so you can follow the results more easily; you can use a tool like "tmux" for this. In the first window, run the "quote service load test" script. You can stop the script at any time by pressing Ctrl+C.
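A sketch, assuming the script sits in the Starter Kit and is executable (the exact path and name may differ):

```shell
./quote_service_load_test.sh
```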
In another window, run kubectl get with the -w (watch) flag to keep an eye on the Horizontal Pod Autoscaler (HPA) resource. This will let you see changes to the HPA in real time.
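```shell
kubectl get hpa external-load-test -n hpa-external-load -w
```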
Check out the animation below to see the results of the experiment:
Next, you'll discover how to scale your applications based on specific metrics from Prometheus. For instance, you can grow or shrink your deployments depending on how many HTTP requests your application receives, instead of just its CPU or memory usage.
Automatically Scaling Applications with Prometheus: Beyond CPU Metrics
In the previous steps, you learned how to scale your applications based on resource usage, such as CPU. But you can scale on other signals too. For example, you can use Prometheus to track how many HTTP requests your application receives, and have Kubernetes adjust the number of replicas based on that traffic. If lots of visitors hit your site, the deployment automatically grows to handle them.
To do this, you'll need to set up the "prometheus-adapter". It acts as a translator between Prometheus and Kubernetes, exposing the metrics Prometheus collects through APIs that Kubernetes understands.
If you want something quick and simple that uses fewer resources, go for metrics-server: it gives you the basic CPU and memory metrics. If you need finer-grained control and want to scale your apps on signals beyond CPU and memory, choose Prometheus with the prometheus-adapter.
Before you start, make sure you have the Prometheus Operator set up in your cluster and that you know the basics of "ServiceMonitors". If not, follow the "Prometheus Monitoring Stack" chapter from the Starter Kit repository. Once that's ready, you'll create a "ServiceMonitor" so that Prometheus scrapes your application's metrics. The prometheus-adapter then exposes this information to Kubernetes, where the horizontal pod autoscaler can act on it to scale your application.
Simple Steps for Scaling Applications with Custom Metrics Using Prometheus
- Install the prometheus-adapter: first, set up the prometheus-adapter in your cluster.
- Tell Prometheus about your metrics: let Prometheus know which application metrics to collect, by creating "ServiceMonitors".
- Expose your metrics to Kubernetes: tell the prometheus-adapter to publish your application's custom metrics through the Kubernetes custom metrics API, by defining "discovery rules".
- Tell the system how to adjust: finally, create an HPA (horizontal pod autoscaler) targeting your application, configured to scale it based on the custom metrics you've set up.
Easy Install: Prometheus Adapter with Helm
You can install the Prometheus adapter using Helm, which is a tool that helps with managing software on your system. Here's how:
- First, clone the Starter Kit repository to your computer, then change into the folder where you cloned it.
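A sketch (assuming the repository lives in DigitalOcean's GitHub organization):

```shell
git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
cd Kubernetes-Starter-Kit-Developers
```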
- Now, add the prometheus-community Helm repo to your system, and list the available charts.
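```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community
```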
The result you see will be something like this:
- The chart we're interested in is "prometheus-adapter", which will install the prometheus-adapter on your cluster. You can find more details on the prometheus-adapter chart page.
- Next, open the prometheus-adapter Helm values file from the Starter Kit repository using a text editor, preferably one that supports YAML linting (for example, VS Code).
- Be sure to change the Prometheus endpoint setting to match your setup; instructions are included in the Helm values file.
- Once you've made the necessary changes, save the file and install the prometheus-adapter using Helm. This also creates a dedicated namespace for the prometheus-adapter.
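A sketch of the install command (the values file name is assumed; the pinned chart version matches the note below):

```shell
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --version 3.3.1 \
  --namespace prometheus-adapter \
  --create-namespace \
  -f prometheus-adapter-values.yaml   # hypothetical values file name
```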
We're using a specific version of the prometheus-adapter Helm chart, 3.3.1, which maps to the 0.9.1 release of the prometheus-adapter application (you can find this information in the output from Step 2). Pinning a specific version is a good idea because it gives you control and predictability: you get the expected results, and it's easier to track versions with tools like Git.
Evaluating Setup: What to Look For
Check to see if prometheus-adapter has been successfully set up by:
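```shell
helm ls -n prometheus-adapter
```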
The result you want to see will look something like this (pay attention to the word "deployed" in the STATUS column):
Now, take a look at the status of the resources in the "prometheus-adapter" namespace we created:
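```shell
kubectl get all -n prometheus-adapter
```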
The result should resemble the following (pay attention to the deployment and replicaset resources; they should be healthy, with a count of 2):
```
NAME                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/prometheus-adapter   ClusterIP   10.245.x.x   <none>        443/TCP   2m55s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-adapter   2/2     2            2           2m55s

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-adapter-7f475b958b   2         2         2       2m55s
```
Wait a little, then query the custom.metrics.k8s.io API and save the response to a file:

```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" > custom_metrics_k8s_io.json
```
Open the file named "custom_metrics_k8s_io.json," and you'll see information about special metrics that the Kubernetes system is sharing.
If what you see matches the example above, you've set up the prometheus-adapter correctly. Next, you'll create a simple application that exposes custom metrics, and you'll make sure Prometheus scrapes them by setting up a "ServiceMonitor".
Setting Up a Practice Application with Prometheus Example Metrics
Let's set up a practice application to check that everything works end to end. We'll use an example application called "prometheus-example", which exposes the following metrics about its HTTP traffic:
Total incoming requests:
a. Metric: http_requests_total
b. What it tells us: the total number of HTTP requests the application has received.
Duration of requests:
a. Metric: http_request_duration_seconds
b. What it tells us: how long the application takes to serve each request.
Total count of measured requests:
a. Metric: http_request_duration_seconds_count
b. What it tells us: the total number of requests whose duration was measured.
Total duration of all requests:
a. Metric: http_request_duration_seconds_sum
b. What it tells us: the combined time, in seconds, spent serving all requests.
Histogram of request durations:
a. Metric: http_request_duration_seconds_bucket
b. What it tells us: how request durations are distributed across latency buckets.
These metrics will help us test whether our setup is configured correctly.
Once you've deployed the prometheus-example application, you'll want to test out how the automatic scaling works with custom metrics. To do this, you'll need to send a bunch of requests to the prometheus-example service and see how it responds by adjusting its size based on the number of requests.
Before you do that, make sure you're in the right folder on your computer where you copied the Starter Kit.
Next, let's put the prometheus-example application into action. You can do this by using a special set of instructions written in a file called a "YAML manifest" from the Starter Kit. This file will tell your system to create the prometheus-example application along with some other necessary things, like a service. It also sets up a special area, known as a "namespace," called prometheus-custom-metrics-test, just to make sure everything is working smoothly.
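A sketch of the command (the manifest name is assumed; per the description above, it also creates the prometheus-custom-metrics-test namespace):

```shell
kubectl apply -f prometheus-example-deployment.yaml   # hypothetical manifest name
```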
After you've set up the prometheus-example application, double-check that everything was created correctly in the "prometheus-custom-metrics-test" namespace:
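```shell
kubectl get all -n prometheus-custom-metrics-test
```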
You'll see something like this in the results (make sure the "prometheus-example-app" is listed and marked as "up and running," along with the associated service).
```
NAME                             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/prometheus-example-app   ClusterIP   10.245.x.x   <none>        80/TCP    10s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-example-app   1/1     1            1           10s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-example-app-7455d5c48f   1         1         1       10s
```
Preparing Prometheus for Application Monitoring
Before we set up something called a "ServiceMonitor" for our application, let's make sure our Prometheus system is ready for it. Follow these steps:
- Start by finding out which Prometheus instances are in your system. The Starter Kit uses a place called the "monitoring namespace," but you might need to adjust this based on how your system is set up.
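```shell
kubectl get prometheus -n monitoring
```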
The result you'll see will look something like this:
- Now, choose the Prometheus instance you found in the last step (if there's more than one, select the one that matches your setup). Look for something called "serviceMonitorSelector.matchLabels" and note down its value.
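A sketch (the instance name is hypothetical; use the one returned by the previous command):

```shell
kubectl get prometheus kube-prom-stack-kube-prome-prometheus -n monitoring \
  -o jsonpath='{.spec.serviceMonitorSelector}'
```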
You'll see something like this in the results (keep an eye out for a label called "release").
By default, each Prometheus instance is set up to find only service monitors that have a certain label. To make sure our prometheus-example-app ServiceMonitor gets noticed by Prometheus, we need to give it a label called "release" with the value "kube-prom-stack."
Before you do anything else, make sure you're in the right folder on your computer where you copied the Starter Kit.
Next, take a look at a file called "prometheus-example-service-monitor" from the Starter Kit on your computer. You can use a program like VS Code, which is a good choice because it helps with checking if everything is written correctly in this special file called YAML.
In the part of the file that talks about "metadata.labels," make sure to include the label we found earlier (it's called "release" and should have the value "kube-prom-stack"). The ServiceMonitor file will look something like this:
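(A sketch of such a ServiceMonitor; the selector labels and port name are assumptions, while the release label matches the value found earlier.)

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-app
  namespace: prometheus-custom-metrics-test
  labels:
    release: kube-prom-stack          # label required for Prometheus to discover this monitor
spec:
  selector:
    matchLabels:
      app: prometheus-example-app     # assumed service label
  endpoints:
    - port: web                       # assumed port name on the service
      path: /metrics
```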
Last step! Let's make the ServiceMonitor we need. This ServiceMonitor will tell Prometheus to keep an eye on the /metrics information from our prometheus-example-app.
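A sketch (the manifest name follows the file referenced above):

```shell
kubectl apply -f prometheus-example-service-monitor.yaml
```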
Once you've finished the steps above, you'll notice a new item in the Targets panel on the Prometheus dashboard. To view this dashboard, you need to use a command to make it accessible. Here's an example command using the Starter Kit naming conventions. Please adjust it based on how your system is set up:
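A sketch, assuming the kube-prom-stack release name from the Starter Kit (the exact service name may differ in your cluster):

```shell
kubectl port-forward svc/kube-prom-stack-kube-prome-prometheus 9090:9090 -n monitoring
```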
The result you'll see looks something like this (look for "prometheus-example-app" in the list of discovered targets):
Open a web browser and go to "localhost:9090". Then, click on "Status" and choose "Targets". You'll see that the target we added has been included.
Enabling Kubernetes Access to Custom Metrics with Prometheus Adapter
Even though Prometheus can find and gather data about your application's special metrics, the prometheus-adapter needs some help to share these metrics with Kubernetes. To make this happen, you have to set up a few rules called "discovery rules." These rules will guide the prometheus-adapter in exposing your application's custom metrics.
Quoting from the official guide:
The adapter decides which metrics to show and how to show them using a set of "discovery" rules. Each rule works on its own (so they shouldn't overlap), and it outlines the steps for the adapter to expose a metric in the API.
Each rule has roughly four parts:
- Discovery:
- How the adapter should find all Prometheus metrics for this rule.
- Association:
- How the adapter should figure out which Kubernetes resources a specific metric is related to.
- Naming:
- How the adapter should display the metric in the custom metrics API.
- Querying:
- How a request for a certain metric on one or more Kubernetes objects should be translated into a question to Prometheus.
A regular definition for a discovery rule looks something like this (the seriesQuery and resources lines are reconstructed from the explanation below):

```yaml
rules:
  - seriesQuery: 'http_requests_total{namespace!="", pod!=""}'
    resources:
      template: "<<.Resource>>"
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
```
Let's break down the configuration into simpler parts:
- seriesQuery:
- This part represents the metric you want to track, like the total number of HTTP requests shown by the application's /metrics endpoint. It tells the prometheus-adapter to choose the http_requests_total metric for all your application Pods that are not null (pod!=""). This is like telling it what to look for.
- resources.template:
- Think of this as a template that Prometheus uses to understand where the metrics are coming from (like from a Pod). This part helps the prometheus-adapter figure out the resource that exposes the metrics (e.g., Pod). It's like connecting the metric to the thing it's related to.
- name:
- Here, you're giving a new name to the rule. In simple terms, you're telling the prometheus-adapter to rename http_requests_total to http_requests_per_second. Essentially, you're saying you're interested in the number of HTTP requests per second, not just a simple count. This is about making the metric name more meaningful.
- metricsQuery:
- This is a fancy term for a parameterized query in Prometheus language (PromQL). It's like a way of asking Prometheus for information. In this case, it calculates the rate of HTTP requests on average over a set period (like 2 minutes). It's the way of telling the prometheus-adapter how to interpret the metric.
Now that you've learned how to create discovery rules for prometheus-adapter, let's apply this knowledge. Follow these simple steps to tell prometheus-adapter to use the rules you've just set up:
- Start by making sure you're in the right folder on your computer where you copied the Starter Kit.
- After that, open a file called "prometheus-adapter Helm values" from the Starter Kit on your computer. You can use a program like VS Code, which is a good choice because it helps check if everything is written correctly in this special file called YAML.
- Find the "rules" section in the file and uncomment it by removing the leading "#" characters. When you're done, it should look something like the snippet below.
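(A sketch of the uncommented section, mirroring the discovery rule shown earlier; the exact contents of the Starter Kit values file may differ.)

```yaml
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="", pod!=""}'
      resources:
        template: "<<.Resource>>"
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
```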
- Finally, save the values file and upgrade the prometheus-adapter release with Helm so the new rules take effect. Keeping the values file in Git makes these changes easy to track.
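A sketch of the upgrade command (the values file name is assumed):

```shell
helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
  --version 3.3.1 \
  --namespace prometheus-adapter \
  -f prometheus-adapter-values.yaml   # hypothetical values file name
```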
Note: When no requests are coming in, only the version metric is exposed. To start seeing more metrics, such as the request count, you need to generate some HTTP requests against the application.
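To reach the application locally, you can port-forward its service first (a sketch, assuming the service name and port from the earlier output):

```shell
kubectl port-forward svc/prometheus-example-app 8080:80 -n prometheus-custom-metrics-test
```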
Open a web browser and go to http://localhost:8080. Then, just refresh the homepage of the application a few times.
If everything's working correctly, you'll be able to check a new metric by using the custom metrics API. To make it easier to read the results, you can install a tool called jq.
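A sketch of the query, using the metric and namespace names from this tutorial:

```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/prometheus-custom-metrics-test/pods/*/http_requests_per_second" | jq .
```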
The result you see might look something like this:
When you look at the output above, you'll see a metric called http_requests_per_second with a value of 0. This is expected because we haven't put any pressure on the application yet.
Now, for the last step, we'll set up something called HPA for the deployment of our application. Then, we'll create some activity on the application using a script called custom_metrics_service_load_test, which you can find in the Starter Kit repository.
Setting Up and Testing HPAs with Custom Metrics
Creating and testing HPAs (Horizontal Pod Autoscalers) with custom metrics is pretty much like what we did with the metrics server examples. The only thing that changes is how we measure the application's performance, which in this case uses custom metrics like http_requests_per_second.
A typical HPA definition based on custom metrics looks something like this (important fields are explained as we go along):
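(A sketch of such a manifest; the replica bounds and the 500m target are inferred from the test phases described later and may differ from the Starter Kit file.)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-custom-metrics-hpa
  namespace: prometheus-custom-metrics-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prometheus-example-app
  minReplicas: 1
  maxReplicas: 12               # assumed upper bound; the test below reaches 9 replicas
  metrics:
    - type: Pods                # scale on a custom metric exposed by the application Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 500m    # inferred target: ~500 milli-requests per second per Pod
```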
First, go to the folder where you saved the Starter Kit on your computer:
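```shell
cd Kubernetes-Starter-Kit-Developers
```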
Next, set up the prometheus-custom-metrics-hpa resource in your cluster by using kubectl:
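A sketch (the manifest name is assumed from the resource name):

```shell
# Hypothetical manifest name; it targets the prometheus-custom-metrics-test namespace.
kubectl apply -f prometheus-custom-metrics-hpa.yaml
```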
The command above creates something called an HPA, which stands for Horizontal Pod Autoscaler. It's set up to watch over the sample deployment we made earlier. You can see how the HPA is doing by using:
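```shell
kubectl get hpa -n prometheus-custom-metrics-test
```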
The result you'll see looks something like this (pay attention to the REFERENCE column, which points to the prometheus-example-app deployment, and the TARGETS column, showing the current number of http_requests_per_second):
In the last step, you'll use a script provided in this repository to put some pressure on the target (which is the Prometheus-example-app). This script will make a bunch of quick HTTP requests, acting like multiple users accessing the app simultaneously (it's enough for showing how things work).
To make it easier to see what's happening, it's best to split your screen into two separate windows. You can do this with a tool like tmux. Then, in the first window, you'll run a script called custom_metrics_service_load_test. You can stop it anytime by pressing Ctrl+C.
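A sketch, assuming the script is executable (the exact name may differ):

```shell
./custom_metrics_service_load_test.sh
```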
Next, in the second window, keep an eye on the HPA resource by running kubectl get with the -w (watch) flag. This will continuously show you updates in real time.
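```shell
kubectl get hpa -n prometheus-custom-metrics-test -w
```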
You can watch as the autoscaler starts working when the load increases (while the load generator script is running). It'll increase the number of replicas for the prometheus-example-app deployment. Once you stop the load generator script, there's a waiting period, and after about 5 minutes, the number of replicas goes back to the original value of 1.
Highlighted Phases:
Phase 1: This is the ramp-up. You'll see the HPA gradually increase the number of replicas from 1 to 8; as Pods are added, the average load per Pod drops from around 2140 milli-requests per second to a more manageable 620 milli-requests per second.
Phase 2: Things start to stabilize here. The current load has small ups and downs, staying between 520-540 milli-requests per second.
Phase 3: In this phase, a sudden spike pushes the load more than 10% above the threshold, to 562 milli-requests per second. Since we're outside the hysteresis window, the HPA adds another replica (9 in total) to stabilize the system, which quickly brings the load back down to around 480 milli-requests per second.
Phase 4: Here, we stop the load generator script. You'll see the application's load decrease rapidly. After about 5 minutes (the default cooldown time), the number of replicas goes back to the minimum value of 1.
Let's say we want to keep the number of HTTP requests per Pod close to a certain limit, our threshold. The HPA won't keep changing the number of replicas while the average is already close to the threshold (say, within ±10%), even if the exact limit hasn't been hit. This behavior is called hysteresis, and it acts as a stabilizing factor: it prevents the replica count from bouncing back and forth, keeping the system stable.
Conclusion
In this guide, you discovered how to make your application adjust its size based on demand, using tools such as metrics-server and Prometheus to measure that demand. You also saw how this works in practice: HPAs automatically resized your application to handle more visitors or traffic, keeping everything running smoothly.
Excited to try it out? Give it a go with your own applications and see how it works!