My team has successfully finished the first stage of the “Train Planet” ticket reservation system and deployed it to production. You can see the result here. The rollout day finally came and, surprisingly, I was pretty calm and relaxed, because we had spent some time implementing non-functional features that allow us to fully diagnose the application in the production environment.

In this article, I would like to highlight seven important steps to reduce your stress level during the rollout process. Feel free to treat this blog post as a readiness checklist before going live.

1. Automatic end-to-end tests and performance tests before the rollout

A good IT specialist knows how important the testing phase is. Before a production rollout, it is highly advisable to freeze the code and perform end-to-end tests. To stay in line with business needs, the recommended way is to define test scenarios covering the main, or even all, business cases and to automate them with dedicated tools such as Cucumber, Selenium, Appium, Ranorex, Cypress, and many others. This step should also cover performance tests to check how fast the application is and to discover its bottlenecks. A helpful tool for that is JMeter.
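To make the idea of a performance check concrete, here is a minimal sketch in Python. It is not how JMeter works internally and not what we ran on the project; it only shows the kind of numbers (median, 95th percentile, maximum) a performance test reports. The function names `measure_latencies` and `summarize` are hypothetical, and `operation` stands for whatever call you want to time.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(operation: Callable[[], object], runs: int = 50) -> List[float]:
    """Call `operation` `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return the median, 95th percentile, and maximum of the durations."""
    if not durations:
        raise ValueError("at least one measurement is required")
    ordered = sorted(durations)
    # Nearest-rank percentile, computed with integers to avoid rounding surprises:
    # rank = ceil(0.95 * n), so at least 95% of samples are at or below the result.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }
```

Looking at the 95th percentile rather than only the average helps to spot the slow requests that an average would hide.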

Our action points:
X Cypress for E2E tests
X JMeter for Performance tests

2. Infrastructure monitoring

When the application is installed in production, it is strongly recommended to collect infrastructure metrics such as memory consumption, free disk space, CPU usage, network traffic, and any other statistics needed to deploy and run the application smoothly. Furthermore, an alerting process should be defined to notify DevOps or other people about potential issues that could crash the application. Nowadays there is a whole bunch of tools that can monitor infrastructure, for example Zabbix, Nagios, Cacti, or Dynatrace. If the application runs in the cloud, it is possible to use the Software as a Service (SaaS) tools provided by the cloud operator, such as AWS CloudWatch or Azure Monitor.
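The tools above do the heavy lifting, but the core of an alert rule is simple. The sketch below, in Python, reads disk usage with the standard library and maps it to a status. The function names and the 80% / 90% thresholds are assumptions chosen for illustration, not values from our setup.

```python
import shutil


def disk_usage_percent(path: str = ".") -> float:
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(value: float, warning: float = 80.0, critical: float = 90.0) -> str:
    """Map a metric value to an alert level using two thresholds (assumed values)."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    percent = disk_usage_percent(".")
    print(f"disk usage: {percent:.1f}% -> {classify(percent)}")
```

In a real setup, the status would be sent to the alerting channel instead of being printed.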

Our action points:
X AWS CloudWatch
X Dynatrace

3. Application monitoring

Many companies have a dedicated team that builds the application and a separate maintenance team that takes care of it and monitors it in production. The two teams have different goals, and there is usually a big surprise when the application changes hands. The development team is focused on functional requirements and deadlines. The maintenance team is focused on application stability and smooth functioning. One of the first questions from the maintenance team is: where can we find the application metrics? It is possible to create metrics during the development phase, or to use modern tools that auto-detect the infrastructure, frameworks, and engines in use, discover application dependencies, track transactions across all tiers, and point to potential culprits. An impressive tool is Dynatrace, which allows you to track the whole flow inside the application, measure every step such as a database query or an interaction with an external system, and find bottlenecks inside the code. It also allows you to monitor the Garbage Collector, Docker images, etc.
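As an illustration of "creating metrics during the development phase", here is a minimal Python sketch of a timing decorator. It is a hand-rolled example, not the Dynatrace API; the class `MetricsRegistry` and its methods are hypothetical names.

```python
import functools
import time
from collections import defaultdict
from typing import Callable, Dict, List


class MetricsRegistry:
    """Collects call durations per metric name (illustrative, in-memory only)."""

    def __init__(self) -> None:
        self._durations: Dict[str, List[float]] = defaultdict(list)

    def timed(self, name: str) -> Callable:
        """Decorator that records how long each call of the wrapped function takes."""
        def decorator(func: Callable) -> Callable:
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    # Recorded even when the call raises, so failures are measured too.
                    self._durations[name].append(time.perf_counter() - start)
            return wrapper
        return decorator

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        """Return count, total, and maximum duration for every metric."""
        return {
            name: {"count": len(values), "total_seconds": sum(values), "max_seconds": max(values)}
            for name, values in self._durations.items()
        }
```

A step such as a database query could then be wrapped with `@registry.timed("db.query")`, and `registry.snapshot()` exposed to whoever maintains the application.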

Our action points:
X AWS CloudWatch
X AWS Route 53 Health Checks
X Dynatrace

4. Log rotation policy and log storage

What a big surprise it is when the monitoring system alerts you that the disk is full. One of the potential culprits is the application log. Developers love to put all kinds of information in the log file to be able to trace bugs or catch unexpected situations. Before the production rollout, it is good practice to review log levels, reduce the amount of information written to the log file, and create a log rotation policy. In cloud environments and with a microservices approach, it is also good practice to have central log storage that collects data from many microservices. An excellent choice is the ELK (Elasticsearch, Logstash, Kibana) stack or the Dynatrace Log Storage functionality. It is also worth adopting the B3 propagation standard for distributed tracing in a microservices environment by using Zipkin or Spring Cloud Sleuth.
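A log rotation policy boils down to two numbers: how large a file may grow and how many old files to keep. The sketch below shows this with Python's standard `RotatingFileHandler`. The helper name `build_rotating_logger` and the default limits (10 MB, 5 backups, INFO level) are assumptions for illustration; in the Java world the logging framework's own rolling appenders play the same role.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(
    name: str,
    log_path: str,
    max_bytes: int = 10 * 1024 * 1024,  # assumed limit: rotate at about 10 MB
    backup_count: int = 5,              # assumed retention: keep 5 old files
    level: int = logging.INFO,          # DEBUG messages are dropped in production
) -> logging.Logger:
    """Return a logger whose disk usage is bounded by max_bytes * (backup_count + 1)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With these settings the log can never take more than roughly 60 MB, which turns "the disk is full" into a number you can plan for.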

Our action points:
X AWS EBS Storage and EBS Snapshots for backup purposes
X Dynatrace

5. Confidential data policy

GDPR is everywhere. You are required to define a confidential data policy and to respect it. It is a huge subject that deserves a separate blog post, so only a small part is mentioned here: confidential data stored in log files. The application usually interacts with other systems by using credentials (for example, user/password tags in a SOAP header). Additionally, the application contains functionality for logging in, sending emails, or otherwise processing users' data. You need to review the logs and check whether sensitive data is protected, anonymized, or even removed from the log file. In the Java world, the logging frameworks provide functions that allow you to replace or remove a sensitive part of a log message.
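The replace-before-write idea can be sketched in a few lines of Python. This is not the Log4j2 replace function and not our custom library; it is an illustration with the standard `logging` module, and the two patterns (key/value pairs and password tags) as well as the names `mask_sensitive` and `MaskingFilter` are assumptions. A real policy would need a reviewed list of fields.

```python
import logging
import re

# Assumed patterns: key=value / key: value pairs, and XML password tags
# such as those found in a SOAP header.
_KEY_VALUE = re.compile(r"(?i)\b(password|passwd|token|secret)\b(\s*[=:]\s*)(\S+)")
_XML_TAG = re.compile(r"(?i)(<(?:\w+:)?password>)(.*?)(</(?:\w+:)?password>)")


def mask_sensitive(text: str) -> str:
    """Replace the values of known sensitive fields with '***'."""
    text = _XML_TAG.sub(r"\g<1>***\g<3>", text)
    return _KEY_VALUE.sub(r"\g<1>\g<2>***", text)


class MaskingFilter(logging.Filter):
    """Logging filter that masks sensitive values before a record is written."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, so values passed as arguments are masked as well.
        record.msg = mask_sensitive(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter with `handler.addFilter(MaskingFilter())` applies it to every message that handler writes. Pattern-based masking only catches what the patterns describe, so it complements a manual audit rather than replacing it.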

Our action points:
X Log4j2 replace function
X Manual audit and custom library for anonymization

6. Upgrade automation and rollback procedure definition

Automated upgrades are very common, and your company most probably uses Jenkins, Bamboo, or a similar tool for this purpose. There is a flip side to the coin, though. To act as a professional IT company, you also need to define additional action points:

  • rollback procedure in case of upgrade failure
  • maintenance window agreed with the business
  • a defined unavailability time (for example, 15 s for an application upgrade, or a blue-green deployment without any stop-the-world pause)
  • maintenance page in case of application unavailability
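The rollback procedure from the list above can be sketched as a small piece of control flow in Python. The callables `deploy` and `is_healthy` are hypothetical placeholders for whatever your pipeline really does (for example, starting a tagged Docker image and calling a health endpoint); nothing here is specific to Jenkins or Docker Compose.

```python
import time
from typing import Callable


def _becomes_healthy(is_healthy: Callable[[], bool], attempts: int, wait_seconds: float) -> bool:
    """Poll the health check up to `attempts` times, waiting between tries."""
    for attempt in range(attempts):
        if is_healthy():
            return True
        if attempt < attempts - 1:
            time.sleep(wait_seconds)
    return False


def upgrade_with_rollback(
    current_tag: str,
    new_tag: str,
    deploy: Callable[[str], None],    # hypothetical: deploys the given version tag
    is_healthy: Callable[[], bool],   # hypothetical: True when the application responds
    attempts: int = 3,
    wait_seconds: float = 0.0,
) -> str:
    """Deploy `new_tag`; redeploy `current_tag` if the upgrade fails. Return the running tag."""
    try:
        deploy(new_tag)
        upgraded = _becomes_healthy(is_healthy, attempts, wait_seconds)
    except Exception:
        upgraded = False  # a failed deployment is treated like a failed health check
    if upgraded:
        return new_tag
    deploy(current_tag)  # rollback; an error here is deliberately not swallowed
    return current_tag
```

The sketch relies on every release being tagged, so that "the previous version" is always something you can deploy again.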

Our action points:
X Jenkins build
X Docker + Docker Compose
X Repository and Docker images tagging
X Amazon Elastic Container Registry
X NGINX configuration to display maintenance page during the upgrade

7. User and conversion rate tracking

Last but not least is user tracking. Each application, even one for internal use only, has active sessions and user traffic. It is worth knowing where users spend their time and how long each step takes them. Additionally, for a transactional system (e.g. an e-shop), a good implementation should also measure the conversion rate (how many products were sold and what the income was).
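To show what these numbers look like, here is a minimal Python sketch. It assumes the common definition of conversion rate as the share of sessions that end with at least one order; the `Session` fields and the function name are hypothetical and are not taken from Google Analytics.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Session:
    session_id: str
    orders: int      # number of completed purchases in this session
    revenue: float   # income from this session


def conversion_report(sessions: List[Session]) -> Dict[str, float]:
    """Summarize traffic: sessions, converted sessions, conversion rate, and revenue."""
    total = len(sessions)
    converted = sum(1 for s in sessions if s.orders > 0)
    return {
        "sessions": total,
        "converted_sessions": converted,
        # Assumed definition: converted sessions divided by all sessions.
        "conversion_rate": converted / total if total else 0.0,
        "revenue": sum(s.revenue for s in sessions),
    }
```

For example, four sessions of which one ends with a purchase give a conversion rate of 0.25.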

Our action points:
X Google Analytics
X Google Enhanced Ecommerce
X Google Tag Manager
X Dynatrace

Are you ready for Go-live?

This article presented our readiness checklist. Now it is your turn: define your own checklist and choose the tools that help you monitor the application. If you know other awesome tools, let us know in a comment.

Let’s improve our IT world together!

If you would like to know more about our methodology, this is how we do it.
