Auto Scaling Applications on Pivotal Cloud Foundry
Bitwise blog, published Fri, 06 Mar 2020
https://www.bitwiseglobal.com/en-us/blog/auto-scaling-applications-on-pivotal-cloud-foundry/

The post Auto Scaling Applications on Pivotal Cloud Foundry appeared first on Bitwise.


Setup App Auto Scaler

To enable auto-scaling, the application must be bound to the App Auto Scaler service. This service can be instantiated from Apps Manager or via the CF CLI (which requires the App Auto Scaler CLI plugin to be installed).

The application can be bound to the App Auto Scaler service using:

  • Apps Manager user interface
  • CF command-line interface

Configuring Auto Scaler

Once the application has been bound to the App Auto Scaler service, it can be configured using various parameters (i.e. auto-scaling rules), which we will cover briefly below.

Configuring the scaling rules can also be achieved through Apps manager user interface or CF CLI.

The following are some useful CLI commands for configuring auto-scaling which are self-explanatory:

(Figure: Cloud Foundry CLI commands for auto-scaling)
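The figure listed commands from the App Auto Scaler CLI plugin. The most commonly used ones are sketched below; check `cf plugins` and the plugin documentation for the exact set available in your version:

```shell
cf enable-autoscaling DemoMyApp               # turn on autoscaling for an app
cf disable-autoscaling DemoMyApp              # turn it off again
cf update-autoscaling-limits DemoMyApp 2 4    # set min and max instance counts
cf create-autoscaling-rule DemoMyApp cpu 40 70   # add a rule (metric, low, high)
cf autoscaling-rules DemoMyApp                # list configured rules
cf autoscaling-events DemoMyApp               # see recent scaling decisions
cf configure-autoscaling DemoMyApp demoApp-AutoScalar.yml   # apply a rules file
```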

Similar to the `manifest.yml` file, auto-scaling rules can be maintained in an App Auto Scaler YML file, as seen below. This file can be given any name.

(Figure: App Auto Scaler YML file)
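The rules file in the figure would look roughly like the sketch below. Field names follow the App Auto Scaler manifest format as an assumption; the values mirror the configuration described next (min 2 / max 4 instances, CPU thresholds of 40% and 70%):

```yaml
---
instance_limits:
  min: 2
  max: 4
rules:
- rule_type: cpu
  threshold:
    min: 40    # scale down when average CPU falls below 40%
    max: 70    # scale up when average CPU rises above 70%
scheduled_limit_changes: []
```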

Here, auto-scaling has been configured on the “CPU utilization” metric. If CPU utilization drops below 40%, CF will scale the application down to a MIN of 2 instances, whereas if CPU utilization rises above 70%, CF will scale the application up to a MAX of 4 instances.

Let’s say we have created a YML file with the above scaling rule, named “demoApp-AutoScalar.yml”, at the same level as our build file. We can then use the command below to configure auto-scaling for an app named “DemoMyApp”.

cf configure-autoscaling DemoMyApp demoApp-AutoScalar.yml

I would highly recommend the YML file configuration, as it can be maintained alongside the code base and offers advantages with modern deployment approaches such as Blue-Green deployment.

How App Auto Scaler Determines When to Scale

The App Auto Scaler service determines whether to ramp application instances up or down, or maintain the current number of instances, by averaging the values of the configured metric over the last 120 seconds.

Every 35 seconds, the App Auto Scaler service evaluates whether or not to auto-scale the application, following the approach described above.

App Auto Scaler scales the apps as follows:

  • Increment by one instance when any metric exceeds the High threshold specified
  • Decrement by one instance only when all metrics fall below the Low threshold specified
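The decision rule above can be sketched in a few lines of Python. This is a simplified model for intuition only, not Pivotal's implementation; the metric tuples and thresholds are illustrative:

```python
def scaling_decision(metrics, instances, min_inst, max_inst):
    """Simplified App Auto Scaler decision rule.

    metrics: list of (avg_value, low_threshold, high_threshold) tuples,
    where avg_value is the 120-second average of that metric.
    Scale up by one if ANY metric exceeds its high threshold;
    scale down by one only if ALL metrics fall below their lows.
    """
    if any(avg > high for avg, _, high in metrics):
        return min(instances + 1, max_inst)
    if all(avg < low for avg, low, _ in metrics):
        return max(instances - 1, min_inst)
    return instances

# CPU at 85% with a high threshold of 70% -> scale up from 3 to 4
print(scaling_decision([(85, 40, 70)], 3, 2, 4))  # 4
# CPU at 55%: between thresholds -> hold at 3 instances
print(scaling_decision([(55, 40, 70)], 3, 2, 4))  # 3
# CPU below its low, but throughput still above its low -> hold (not ALL low)
print(scaling_decision([(30, 40, 70), (15, 10, 50)], 3, 2, 4))  # 3
```

Note how the asymmetry (ANY for up, ALL for down) biases the autoscaler toward availability rather than cost savings.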

Understanding Auto Scale Health Rule

The table below lists the metrics on which you can base App Auto Scaler rules:

Metric                Description                                                      rule_type
CPU Utilization       Average CPU utilization across all instances of the app         cpu
Memory Utilization    Average memory percentage across all instances of the app       memory
HTTP Throughput       Total app requests per second divided by number of instances    http_throughput
HTTP Latency          Average latency of application responses to HTTP requests       http_latency
RabbitMQ Depth        Queue length of the specified queue                             rabbitmq

It is very important to understand application performance while applying scaling rules on HTTP Throughput and HTTP Latency. The following points should be considered while applying scale rules on throughput or latency of HTTP requests:

  • Initial number of application instances.
  • Performance benchmarking results of the application (to understand at what load application performance starts to deteriorate) and how many instances are needed to avoid going beyond that load.
  • While calculating HTTP latency, any backend service or database communication should also be taken into consideration; if backing services deteriorate proportionally under load, factor this in so that scaling up does not escalate an already deteriorated situation.
  • While setting up the rule on HTTP request, we should consider peak time traffic coming to application which helps to configure auto-scaling in an efficient manner. Your max instances for autoscaling should also be able to accommodate traffic considering the unavailability of other datacenters your app may be hosted on.

While setting up a RabbitMQ-based scale rule, --subtype is a required field that holds the name of the queue. As seen below, we can also configure more than one RabbitMQ queue.

(Figure: RabbitMQ queue scaling rules)
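A rules file along the lines of the figure might look as follows. The queue names are illustrative, and `rule_sub_type` as the manifest counterpart of the `--subtype` CLI flag is an assumption based on the App Auto Scaler manifest format:

```yaml
rules:
- rule_type: rabbitmq
  rule_sub_type: orders-queue      # queue to watch (illustrative name)
  threshold:
    min: 10     # scale down when queue depth drops below 10
    max: 100    # scale up when queue depth exceeds 100
- rule_type: rabbitmq
  rule_sub_type: payments-queue    # a second queue can be watched as well
  threshold:
    min: 10
    max: 100
```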

Newer versions of CF also allow autoscaling to be set on a combination of multiple metrics, such as those identified below:

(Figure: combining multiple scaling metrics in CF)
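With multiple metrics, the rules list simply carries more than one entry. A sketch, with all thresholds illustrative:

```yaml
rules:
- rule_type: cpu
  threshold:
    min: 40
    max: 70
- rule_type: http_throughput
  threshold:
    min: 10       # requests per second, per instance
    max: 200
- rule_type: http_latency
  threshold:
    min: 100      # milliseconds
    max: 800
```

Remember the decision rule from earlier: the app scales up when any one of these crosses its high threshold, but scales down only when all of them are below their lows.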

With recent releases of CF, we can also create custom metrics on which to configure auto-scaling for our application.

Schedule Application Auto Scaler

It is best to set up auto-scaling with multiple rules to handle rare scenarios, such as an overnight increase in traffic due to holiday seasons like Thanksgiving. These kinds of occurrences can be scheduled ahead of time.

PCF Auto Scaler provides functionality to schedule auto-scaling to handle rare ‘known’ events which may impact application availability or performance.

This can be achieved from Apps Manager. Go to your deployed application that is bound to the App Auto Scaler service, select ‘Manage Scaling’ and then ‘Schedule Limit Change’. Below is a sample rule setup:

(Figure: sample scheduled limit change)

The above configuration will scale up the application on Nov 14, 2019, at 8 PM and will scale down the application on Nov 15, 2019, at 8 PM.
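Expressed in the autoscaler rules file rather than the Apps Manager UI, such a schedule might look like the sketch below. The field names follow the App Auto Scaler manifest format and the instance counts are assumptions for illustration:

```yaml
scheduled_limit_changes:
- executes_at: "2019-11-14T20:00:00Z"    # scale up for the event
  instance_limits:
    min: 4
    max: 8
  recurrence: 0                          # one-off change, no weekly recurrence
- executes_at: "2019-11-15T20:00:00Z"    # return to normal limits
  instance_limits:
    min: 2
    max: 4
  recurrence: 0
```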

Challenges While Configuring Auto Scaling

As mentioned in the official PCF Auto-Scaler known-issues documentation, some of the commands to enable or disable autoscaling from the CLI may not be supported in future versions of the CLI, so it is best to stick with Apps Manager or the Autoscaler API for now.

While configuring application auto-scaling, it is very important to select the correct metrics. Improper metrics can lead to unexpected scaling behavior.

Consider the following scenario: it may seem like a good idea to scale on HTTP latency, since latency or response time seems like a good indicator of when the application is under load and may need to scale. Say your app typically takes 500 ms to respond. Under considerable load, you would expect the response time to go up. But that may not always be true. Suppose your app is under a DDoS attack. Most of the input coming to the app is now invalid, and your app rejects such requests in under 20 ms. Thousands of such requests will actually bring down the average response time of your app, and your app may scale down instead of up. In such scenarios, it is better to combine multiple metrics such as CPU, HTTP throughput and HTTP latency, or use a custom metric for scaling.
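A quick back-of-the-envelope calculation makes the trap concrete (the request counts and timings here are made up for illustration):

```python
# 100 legitimate requests at ~500 ms each
legit = [500] * 100
# 1000 junk requests rejected in ~20 ms each during the attack
junk = [20] * 1000

avg_before = sum(legit) / len(legit)
avg_during = sum(legit + junk) / len(legit + junk)

print(avg_before)             # 500.0
print(round(avg_during, 1))   # 63.6 -- average latency *drops* under attack
```

An http_latency rule with a low threshold of, say, 400 ms would read this as "the app is idle" and scale down at exactly the wrong moment.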

Thus, we have seen that, if used properly, application autoscaling can be an important tool to ensure the reliability and availability of your application. For related information, check out our webinar.

Blue Green Deployment Strategy for Cloud based Services
Bitwise blog, published Mon, 10 Feb 2020
https://www.bitwiseglobal.com/en-us/blog/blue-green-deployment-strategy-for-cloud-based-services/

The post Blue Green Deployment Strategy for Cloud based Services appeared first on Bitwise.


Understanding Blue Green Deployment Strategy

This strategy requires two versions in the production environment. One is the version that is currently LIVE in production (we can call this the Blue version). The other is the new version that we plan to promote and make LIVE (we can call this the Green version).

After deploying the new version (Green), we run health checks and sanity tests to ensure it is safe to promote to LIVE traffic. Once the Green version has been validated, we can switch traffic to it, and the Green version then receives all the live traffic.

We can choose to keep the Blue version or discard it. At any point during the Blue Green deployment, if the Green version validation fails we can choose to roll back to the previous (Blue) version.

Challenges with Blue Green Deployment Strategy

One of the challenges with this strategy is keeping the application backward compatible, as both the Blue and Green versions run in parallel. Usually, if there is only an application code change, this is not a big deal.

The real challenge comes when the new version of the application requires a database structure change like rename of column or dropping a column. One way to work around this is to design your database changes in a phased manner where the initial change will not modify existing object attributes but will add new ones.

Once everything has been tested a migration can be done. However, this ties the development strategy to the deployment and is one of the challenges that come with Blue Green deployment.

Implementations

Router Based

In the router-based approach, traffic flow to the live version of the service and to the new version is controlled and switched via the Cloud Foundry (CF) router. Let’s walk through it in a sequence of steps.

  1. Say we have a simple service that gives us the weather for a location. The current version of this service in production is v1. This is the Blue version. Now we want to promote a new version v1.1. This will be the Green version.
(Figure: weather API v1 live behind the CF router)

  2. As you can see, the weather API is accessible via the URL weather.demo.com, so any request for the weather API is routed via the CF router to the current live production version (v1). The new version v1.1, though deployed, is not yet accessible via any URL. Let us now make the new version accessible via a temporary URL. This can be done with a Command Line Interface (CLI) command as below:

    $ cf push green -n weather-green

    Now any request for weather API via the production URL weather.demo.com continues to be routed to the current production version while the new version will be accessible via the new temporary URL weather-green.demo.com

(Figure: v1 on the production route, v1.1 on the temporary route)

  3. Now the developers and testers can validate the new version via the temporary URL. If validation of the new version is successful, we can also bind the original URL (route) to the new version.

    $ cf map-route green demo.com -n weather

    (Figure: both v1 and v1.1 mapped to the production route)

    The router now load balances the requests for URL weather.demo.com between version v1 and v1.1 of the Weather API.

  4. After some time, if we are able to verify that the new version is running without any problems, we can unmap the production URL from the Blue version (v1). We can also unmap and then optionally remove the temporary route mapped to the new version.

    $ cf unmap-route blue demo.com -n weather

    $ cf unmap-route green demo.com -n weather-green

    (Figure: v1.1 serving all production traffic)

    This way we have actually promoted a new version of weather API into production without any downtime.
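Putting the four steps together, the whole router-based cutover comes down to a handful of CLI calls (the app names blue/green and the demo.com domain follow the example above):

```shell
# 1. Push the new version under a temporary host
cf push green -n weather-green

# 2. Validate v1.1 at weather-green.demo.com, then map the production route
cf map-route green demo.com -n weather

# 3. Once v1.1 looks healthy, take the old version out of rotation
cf unmap-route blue demo.com -n weather

# 4. Optionally retire the temporary route
cf unmap-route green demo.com -n weather-green
```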

Service Discovery Based

In the service discovery-based approach we use a service registry where services are registered. Take, for example, the Netflix Eureka service registry. Consumers of the service do not directly invoke specified endpoint URLs; instead, they look up the URLs of the services they want to invoke from the registry and then invoke those URLs.

We first need to make the service instances discoverable. We do this by enabling the Discovery Client with the annotation @EnableDiscoveryClient on the Spring Boot app's main class. Before that, we need to add the dependency below to our Spring Boot project.

compile('org.springframework.cloud:spring-cloud-starter-netflix-eureka-client')

(Figure: Spring Cloud version configuration)

When we need to switch traffic between Blue and Green instances, it is done by registering the new version of the service under the same name and unregistering the old (live) version. Consumers continue to invoke the service in the same way, relying on the service registry to provide the service URLs. This can be done in stages as below.
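For illustration, both the Blue and Green builds might carry the same logical name in their `application.yml`; the service name and registry URL below are assumptions, not from the original post:

```yaml
spring:
  application:
    name: weather          # both Blue and Green register under this name
eureka:
  client:
    serviceUrl:
      defaultZone: http://eureka.demo.com/eureka/   # illustrative registry URL
```

Because consumers resolve "weather" through Eureka rather than a fixed URL, whichever instances are currently registered under that name receive the traffic.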

  1. Deploy the new version of the service without registering it in the service registry. This is the Green version. The Live version we will call Blue version. We perform validation tests on the Green version independently.
(Figure: Blue version receiving live traffic)

  2. If the tests pass, we register the Green version of the service with the same app name. Live traffic now goes to both the Blue and Green instances.
(Figure: live traffic split across Blue and Green instances)

  3. If everything seems normal, we unregister the Blue version, and live traffic now goes only to the Green instance.
(Figure: Green instance receiving all live traffic)

Canary Deployment

A variant of Blue Green deployment is the canary deployment (coarse-grained canary). The main goal of this strategy is to minimize the impact on users of rolling out an erroneous version of the application into production. It can be explained in the steps below.

  1. Install the application to a server instance where Live production traffic cannot reach it.
  2. After internal validation of the application, we can start to route a small subset of the LIVE traffic to the new version. This can be done at the router. Say we want to allow only internal company users to use it first, then slowly expand to users in a city, state or country, and so on.
  3. At any time during this process, if a critical issue is identified, we can roll back the new version.
  4. If all looks good, we can route all the traffic to the new version and decommission the old version, or hold it for some time as a backup.

This is one way to achieve coarse grained canary deployments without any special setup.
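The traffic split in step 2 boils down to routing a fixed fraction of requests to the canary. A toy sketch of that decision, for intuition only; in PCF the split happens at the router, not in application code:

```python
import random

def pick_version(canary_fraction, rng=random):
    """Route a request to 'canary' with the given probability, else 'stable'."""
    return "canary" if rng.random() < canary_fraction else "stable"

rng = random.Random(42)  # seeded so the run is reproducible
routed = [pick_version(0.05, rng) for _ in range(10_000)]
print(routed.count("canary"))  # roughly 5% of 10,000 requests
```

Growing the rollout then just means raising `canary_fraction` in stages (internal users, one city, one region) until it reaches 1.0.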

Future Outlook

PCF Native Rolling App Update (Beta)

PCF 2.4 natively supports a zero-downtime rolling deployment feature. This is, however, in beta, and you need CLI v6.40 or later to use it. It is not a full-featured Blue-Green deployment process; rather, it allows you to perform a rolling app deployment. Below are some of the commands that support this:

Deployment (Zero downtime): cf v3-zdt-push APP-NAME

Cancel deployment (No Zero downtime guarantee): cf v3-cancel-zdt-push APP-NAME

Restart (Zero downtime): cf v3-zdt-restart APP-NAME

Before using these commands, note that they are in the beta phase and have some usage limitations. For more information, refer to the PCF documentation.

Native Fine Grained Canary (beta)

PCF is in the process of replacing its Gorouter implementation with a service mesh (Istio) based solution. This will enable many exciting new capabilities, including weighted routing, which natively lets you send a percentage of traffic to the canary app.

We will look at these upcoming capabilities in a future article. For related information, check out our webinar.
