How to solve the test environment routing problem under Spring Cloud

Foreword

Since the Spring Cloud Tencent microservice development framework was officially announced at the end of June, it has attracted a great deal of attention from developers. In less than a month, the project has passed 2,000 GitHub stars, more than 1,000 developers have joined our community, and more than 20 developers have contributed code. This popularity far exceeded our expectations, and it confirms the view we expressed in the original announcement: Spring Boot + Spring Cloud is still a very widely used development framework.

Over the past month, the question Spring Cloud Tencent followers have asked most often is: what is the follow-up plan for Spring Cloud Tencent?

So far, our main focus has been on the most basic atomic capabilities of service governance in the microservice field, such as service discovery, dynamic configuration, rate limiting and circuit breaking, and routing. Other Spring Cloud suites are largely limited to these basic capabilities as well. However, when putting Spring Cloud into practice, enterprises find that these atomic capabilities do not directly solve their specific business scenarios and often require secondary development and customization, for example customizing Spring Cloud Gateway filters, enhancing Feign, or supporting various complex service routing scenarios. Out-of-the-box, general-purpose business solutions are therefore more valuable to enterprises.

In summary, one of Spring Cloud Tencent's important follow-up plans is to move from tools to solutions: while continuing to consolidate the atomic capabilities of service governance, it will provide out-of-the-box general business solutions.

To this end, Spring Cloud Tencent has added a spring-cloud-tencent-plugin-starters module, under which solutions for different business scenarios can be implemented. At this stage, we focus mainly on scenario-based solutions for fine-grained traffic governance, divided into three stages according to the development process:

Multiple test environment scenarios in the development and testing phases
Canary release, blue-green release, full-link grayscale, etc. in the release stage
Unitization, AB testing, etc. in the production operation stage

In this issue, we focus on multiple test environments in the development and testing phase, and describe in detail how Spring Cloud Tencent implements this scenario.

1. Basic knowledge

1.1 What is test environment routing

In actual development, the microservices in a microservice architecture may be developed and maintained by multiple teams. Each team only needs to focus on the one or more microservices it owns, but the microservices maintained by different teams may call one another. When a team develops its own microservice, it needs to verify the complete call chain during debugging, which means relying on other teams' microservices. Deploying the development and joint debugging environment then runs into the following problems:

If all teams share one development and joint debugging environment, then whenever one team's test microservice instance fails to run normally, the other applications that depend on it fail as well.
If each team has its own development and joint debugging environment, each team must maintain not only its own microservices but also its own copies of other teams' microservices, which greatly reduces efficiency. Each team also has to deploy the complete set of microservice applications, so cost rises sharply as the number of teams grows.

In this situation, a test environment routing architecture helps deploy a development and joint debugging environment that is simple to operate and low in cost. Test environment routing is an environment governance strategy based on service routing. Its core idea is to maintain a stable baseline environment as the foundation, so that each test environment only needs to deploy the microservices that have changed. A multi-test-environment setup has two basic concepts:

Baseline environment: A complete and stable basic environment that serves as the fallback for traffic from other environments of the same type. Users should do their best to keep the baseline environment complete and stable.
Test environment (feature environment): A temporary environment, which can only be of the development/test type. A test environment does not need to deploy the complete full-link set of services. It deploys only the services changed in this iteration and reuses the baseline environment's services for everything else through service routing.

After multiple test environments are deployed, developers can route test requests to different test environments through routing rules. If a test environment has no microservice to handle a request at some point on the link, the request falls back to the baseline environment. Developers therefore deploy only the microservices under development to the corresponding test environment. Microservices that have not changed, or that the developer does not own, are served by the baseline environment.

Although test environment routing is a relatively mature approach for development and test environments, few production-grade frameworks offer it out of the box, and developers often have to build the capability themselves through secondary development. A reasonably complete solution is therefore needed to implement test environment routing, reduce development difficulty, and improve development efficiency.

1.2 Service routing

Service Routing Model

Service routing can be abstracted into the simplified model shown in the figure below, which answers the question "which requests are forwarded to which instances". In detail, there are three questions: 1. How to accurately identify the request? 2. How to accurately identify the instance? 3. How to forward?

(Figure: Schematic diagram of service routing model)

In the world of traffic, every entity is identified uniformly by tags (attributes). For example, a request carries tags such as the calling service and the target environment, and a service instance carries tags such as version number, instance group, and environment group. Service routing forwards requests that meet the label matching conditions to service instances that meet the matching conditions. The service routing model can therefore be broken down into the following terms:

Service instance coloring (set label information for service instances)
Traffic coloring (set label information for requests)
Service routing (forwarding the request to the target instance according to the routing policy)

How the service instance label is passed to the caller

When a service instance is registered with the registry, it will bring the label information. When the service caller obtains the service instance information from the registry, it includes the label information of the instance.

Full-link transparent transmission of labels

Some request labels must be passed along the entire service call chain, such as the TraceId in full-link tracing and the FeatureEnv label of test environment routing.

The difference between service routing and load balancing

Service routing and load balancing both solve the problem of selecting service instances. The difference is that service routing selects, from the full set of service instances, the subset that satisfies the routing rules, while load balancing selects, from the list left after route matching, the single instance that will process the request.
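
To make the distinction concrete, here is a minimal Java sketch of the two steps. It is only an illustration: the Instance class, the route and pick methods, and the round-robin choice are hypothetical names and assumptions made for this example, not Polaris or Spring Cloud Tencent APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RoutingVsLoadBalancing {

    /** Hypothetical instance model: an address plus the labels it registered with. */
    static final class Instance {
        final String address;
        final Map<String, String> labels;

        Instance(String address, Map<String, String> labels) {
            this.address = address;
            this.labels = labels;
        }
    }

    /** Routing step: keep every instance whose labels contain all request labels. */
    static List<Instance> route(List<Instance> all, Map<String, String> requestLabels) {
        List<Instance> matched = new ArrayList<>();
        for (Instance instance : all) {
            if (instance.labels.entrySet().containsAll(requestLabels.entrySet())) {
                matched.add(instance);
            }
        }
        return matched;
    }

    /** Load-balancing step: pick exactly one instance (round robin is assumed here). */
    static Instance pick(List<Instance> candidates, int requestCounter) {
        // candidates must be non-empty; a real client would handle the empty case.
        return candidates.get(Math.floorMod(requestCounter, candidates.size()));
    }

    public static void main(String[] args) {
        List<Instance> all = List.of(
                new Instance("10.0.0.1:8080", Map.of("env", "f1")),
                new Instance("10.0.0.2:8080", Map.of("env", "f1")),
                new Instance("10.0.0.3:8080", Map.of("env", "f2")));
        List<Instance> routed = route(all, Map.of("env", "f1"));
        for (int i = 0; i < 3; i++) {
            System.out.println(pick(routed, i).address);
        }
    }
}
```

Routing narrows three instances down to the two tagged env=f1, and load balancing then alternates between those two.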

2. Test environment routing implementation principle

2.1 Overview of the scheme

A sample implementation of test environment routing is shown in the following figure, with two test environments and one baseline environment. End to end, traffic passes through the following components: App -> Gateway -> User Center -> Points Center -> Activity Center.

Figure: Schematic diagram of test environment routing

Following the service routing model described in the previous section, implementing test environment routing requires three things:

Service instance coloring (identifies which test environment the instance belongs to)
Traffic coloring (identify which test environment the request should be forwarded to)
Service routing

a. The gateway forwards the request to the user center of the corresponding target test environment according to the requested target test environment label.

b. When the service is called, it will be forwarded to the target service instance in the same test environment first. If there is no service instance in the same test environment, it will be forwarded to the baseline environment.

The following three subsections will introduce the principles of these three parts in detail.

2.2 Service instance coloring

In a multi-test-environment scenario, the instances deployed in each test environment must be distinguishable, so each instance needs the tag <featureenv=test environment name>. Spring Cloud Tencent supports three ways of coloring service instances.

Method 1: Configuration file

Coloring can be achieved by configuring the following in Spring Boot's application.yml configuration file:

spring:
  cloud:
    tencent:
      metadata:
        content:
          idc: shanghai
          env: f1

When the Spring Cloud Tencent application starts, it reads the configuration file and parses the idc=shanghai and env=f1 tag information.

If the configuration file above lives in the project source code, a separate package would have to be built for every set of label values. To give instances different tag values while using the same runtime package, you can use either of the following two ways:

Override the parameter with -D, for example: -Dspring.cloud.tencent.metadata.content.idc=guangzhou
Use Spring Boot's standard externalized configuration and place an external application.yml on the local disk

Method 2: Environment Variables

Environment variables are very convenient in container scenarios. Spring Cloud Tencent stipulates that the environment variables prefixed with SCT_METADATA_CONTENT_ are the label information of the instance, for example:

SCT_METADATA_CONTENT_IDC=shanghai
SCT_METADATA_CONTENT_ENV=f1

When the Spring Cloud Tencent application starts, it will automatically read the environment variables and parse out the IDC=shanghai and ENV=f1 tag information.
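
The prefix convention can be illustrated with a small Java sketch. This is not the framework's actual parsing code: the EnvLabelSketch class and its parse method are hypothetical, and skipping SCT_METADATA_CONTENT_TRANSITIVE (the variable section 2.3 uses for the transitive key list) is an assumption of this example.

```java
import java.util.Map;
import java.util.TreeMap;

final class EnvLabelSketch {

    static final String PREFIX = "SCT_METADATA_CONTENT_";

    /** Turns prefixed environment variables into label key-value pairs. */
    static Map<String, String> parse(Map<String, String> env) {
        Map<String, String> labels = new TreeMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            String name = entry.getKey();
            // Assumption: this name carries the transitive key list, not a label.
            if (name.equals(PREFIX + "TRANSITIVE")) {
                continue;
            }
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                // The label key is whatever follows the prefix, case preserved.
                labels.put(name.substring(PREFIX.length()), entry.getValue());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Prints the labels found in the real process environment, if any.
        System.out.println(parse(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the parsing logic easy to test without touching the real process environment.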

Method 3: Custom implementation of SPI

The first two methods are built-in methods of Spring Cloud Tencent, but they do not necessarily conform to the specifications of each production project. Therefore, Spring Cloud Tencent also provides a method that allows developers to customize the label provider. For example, the following two practical scenarios:

Put the instance tag in a configuration file on the machine, such as /etc/metadata.
When the application starts, it calls the company's CMDB interface to obtain meta information.

In such scenarios, simply implement the InstanceMetadataProvider SPI extension.
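
As a rough illustration of the first scenario, the sketch below loads labels from a properties-style source such as /etc/metadata. To stay self-contained it declares a local stand-in interface, InstanceMetadataProviderSketch, instead of importing the real SPI. The two method names are the ones this article mentions, but their return types, the file format, and the FileMetadataProvider class are assumptions, so check the actual InstanceMetadataProvider interface before adapting it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

/** Local stand-in for the SPI; return types are assumed, not taken from the library. */
interface InstanceMetadataProviderSketch {
    Map<String, String> getMetadata();

    Set<String> getTransitiveMetadataKeys();
}

final class FileMetadataProvider implements InstanceMetadataProviderSketch {

    private final Map<String, String> metadata = new TreeMap<>();

    private FileMetadataProvider() {
    }

    /** Reads key=value lines, for example from a reader opened on /etc/metadata. */
    static FileMetadataProvider fromReader(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        FileMetadataProvider provider = new FileMetadataProvider();
        for (String key : props.stringPropertyNames()) {
            provider.metadata.put(key, props.getProperty(key));
        }
        return provider;
    }

    @Override
    public Map<String, String> getMetadata() {
        return metadata;
    }

    @Override
    public Set<String> getTransitiveMetadataKeys() {
        // Assumption: only the test environment label travels along the call chain.
        return metadata.containsKey("featureenv") ? Set.of("featureenv") : Set.of();
    }

    public static void main(String[] args) {
        FileMetadataProvider provider =
                fromReader(new StringReader("idc=shanghai\nfeatureenv=f1\n"));
        System.out.println(provider.getMetadata());
        System.out.println(provider.getTransitiveMetadataKeys());
    }
}
```

The CMDB scenario would follow the same shape, with the reader replaced by a call to the company's own interface.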

2.3 Traffic coloring

Traffic coloring marks each request with the label of its target test environment, so that routing and forwarding can match the target service instance according to the request label. Traffic can be colored in the following ways:

Method 1: Static coloring

Section 2.2 introduces a series of label information that can be set for service instances, such as idc=shanghai, env=f1, etc. In some scenarios, it is expected that all requests passing through the current instance will carry the label information of the current instance. For example, requests through the instance of env=f1 carry the label information of env=f1.

Just as there are three ways to color service instances, there are three ways to define which instance tags are transparently transmitted along the link as request tags. The core idea is to define the list of tag keys whose key-value pairs must be transmitted across the link.

Specify through the spring.cloud.tencent.metadata.content.transitive=["idc", "env"] configuration item of the configuration file
Specified by the SCT_METADATA_CONTENT_TRANSITIVE=IDC,ENV environment variable
Specify by implementing the InstanceMetadataProvider#getTransitiveMetadataKeys() method

Method 2: Dynamic coloring

Static coloring uses some of the service instance's labels as request labels. Service instance labels are relatively static and do not change once the application has initialized. In practice, however, different requests often need different label information, which requires dynamic coloring.

Dynamically coloring a request is also as simple as adding an HTTP request header prefixed with X-Polaris-Metadata-Transitive-, for example: X-Polaris-Metadata-Transitive-featureenv=f1. In this way, featureenv=f1 can be transparently transmitted on the link as a request label.
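
The header convention can be sketched in Java as follows. The sketch only shows how a receiver could turn prefixed headers back into request labels. The TransitiveHeaderSketch class and its extract method are hypothetical, and matching the prefix case-insensitively is an assumption based on HTTP header names being case-insensitive, not a description of the framework's own code.

```java
import java.util.Map;
import java.util.TreeMap;

final class TransitiveHeaderSketch {

    static final String PREFIX = "X-Polaris-Metadata-Transitive-";

    /** Collects request labels from headers that start with the transitive prefix. */
    static Map<String, String> extract(Map<String, String> headers) {
        Map<String, String> labels = new TreeMap<>();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String name = header.getKey();
            boolean hasPrefix = name.regionMatches(true, 0, PREFIX, 0, PREFIX.length());
            if (hasPrefix && name.length() > PREFIX.length()) {
                // The label key is the part of the header name after the prefix.
                labels.put(name.substring(PREFIX.length()), header.getValue());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of(
                "X-Polaris-Metadata-Transitive-featureenv", "f1",
                "Accept", "application/json");
        System.out.println(extract(headers));
    }
}
```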

Method 3: Gateway traffic coloring

Gateways often serve as the ingress or transit point for traffic. When a request passes through the gateway, label information can be added to it according to coloring rules. For example, requests whose uid parameter equals 1000 are marked with featureenv=f1.

Gateway traffic coloring is a very practical capability. Spring Cloud Tencent implements a flexible, rule-based coloring plug-in for Spring Cloud Gateway. For example, the following coloring rules tag requests with uid=1000 as featureenv=f1 and requests with uid=1001 as featureenv=f2. For more detailed coloring rules, please refer to the documentation.

{
    "rules":[
        {
            "conditions":[
                {
                    "key":"${http.query.uid}",
                    "values":["1000"],
                    "operation":"EQUALS"
                }
            ],
            "labels":[
                {
                    "key":"featureenv",
                    "value":"f1"
                }
            ]
        },
        {
            "conditions":[
                {
                    "key":"${http.query.uid}",
                    "values":["1001"],
                    "operation":"EQUALS"
                }
            ],
            "labels":[
                {
                    "key":"featureenv",
                    "value":"f2"
                }
            ]
        }
    ]
}

At the same time, Spring Cloud Tencent also reserves the TrafficStainer SPI, and users can implement custom traffic coloring plug-ins.
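
To show how rules of this shape could be evaluated, here is a simplified Java sketch. It is not the plug-in's implementation and does not use the TrafficStainer SPI. The StainingRuleSketch and Rule types are hypothetical, only query-parameter conditions with the EQUALS operation are modeled, and merging the labels of every matching rule is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StainingRuleSketch {

    /** One rule: if every query condition holds, its labels are applied. */
    static final class Rule {
        final Map<String, List<String>> queryEquals; // parameter name -> accepted values
        final Map<String, String> labels;

        Rule(Map<String, List<String>> queryEquals, Map<String, String> labels) {
            this.queryEquals = queryEquals;
            this.labels = labels;
        }

        boolean matches(Map<String, String> query) {
            for (Map.Entry<String, List<String>> condition : queryEquals.entrySet()) {
                String actual = query.get(condition.getKey());
                if (actual == null || !condition.getValue().contains(actual)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns the labels to attach to a request with the given query parameters. */
    static Map<String, String> stain(List<Rule> rules, Map<String, String> query) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (Rule rule : rules) {
            if (rule.matches(query)) {
                labels.putAll(rule.labels);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // The two rules mirror the JSON example: uid=1000 -> f1, uid=1001 -> f2.
        List<Rule> rules = List.of(
                new Rule(Map.of("uid", List.of("1000")), Map.of("featureenv", "f1")),
                new Rule(Map.of("uid", List.of("1001")), Map.of("featureenv", "f2")));
        System.out.println(stain(rules, Map.of("uid", "1000")));
        System.out.println(stain(rules, Map.of("uid", "42")));
    }
}
```

A request that matches no rule receives no labels, so it carries no test environment label at all.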

2.4 Principle of Spring Cloud Tencent Routing Function

Polaris provides a very complete service governance capability, and the upper-level service framework can quickly realize powerful service governance capabilities based on the Polaris native SDK. Spring Cloud Tencent implements service routing capabilities on the basis of Polaris.

Polaris Service Routing Principle

The implementation principle of Polaris service routing is not complicated. As shown in the figure below, all instance information is obtained from the registry, and then a chain of RouterFilter plug-ins narrows it down to the set of instances that meet the conditions.

Figure: Polaris Service Routing Execution Chain

In the multi-test-environment scenario, the MetadataRouter (metadata routing) plug-in is the main one used. Its core capability is to match the request's labels exactly against the labels of service instances.

For example, if a request has two tags, key1=value1 and key2=value2, MetadataRouter selects every service instance that has both key1=value1 and key2=value2. In the multi-test-environment scenario, Spring Cloud Tencent uses the featureenv tag by default to select the service instances that belong to the same test environment.

Spring Cloud Tencent Service Routing Principle

Spring Cloud Tencent implements the routing core in two parts:

Extend RestTemplate, Feign, and SCG to obtain the request's label information and store it in the RouterContext (routing information context).
Extend the Spring Cloud load balancing component (Ribbon before the Hoxton version, and Spring Cloud LoadBalancer after the 2020 version), and call the Polaris service routing API in the extended implementation to filter service instances.

The logic of the extension part is more complicated. Interested readers can refer to the source code of the spring-cloud-starter-tencent-polaris-router module.

3. Test environment routing user operation guide

In the previous section, the implementation principle of the test environment routing was introduced in detail. This section describes in detail what needs to be operated from the user's point of view.

Routing traffic to test environments with Spring Cloud Tencent is very simple. The core consists of three steps:

Add the test environment routing plug-in dependency to the service
Mark the deployed instances with an environment tag
Label the request traffic with an environment tag

Once these three steps are complete, test environment routing is in effect.

3.1 Add test environment routing plugin dependencies

The spring-cloud-tencent-featureenv-plugin module in Spring Cloud Tencent encapsulates the entire test environment routing capability. A service only needs to add this dependency to gain test environment routing.

3.2 Label the service instance with the environment

The spring-cloud-tencent-featureenv-plugin uses the featureenv label as the matching label by default. Users can also specify a different label key for test environment routing through the built-in system-feature-env-router-label=custom_feature_env_key label. The three methods below use the default featureenv key as an example.

Method 1: Configuration file

Add the configuration to the configuration file of the service instance, such as adding the following in bootstrap.yml:

spring:
  cloud:
    tencent:
      metadata:
        content:
          featureenv: f1  # replace f1 with the test environment name

Method 2: Environment Variables

An instance can also be marked by setting an environment variable in the operating system where it runs, for example: SCT_METADATA_CONTENT_featureenv=f1

Method 3: SPI

Include featureenv in the value returned by your custom InstanceMetadataProvider#getMetadata() implementation.

Baseline Environment Label Value

Note that service instances deployed in the baseline environment do not need the featureenv tag. Its absence indicates that the instance belongs to no test environment, so the instance can serve as the baseline match when a request finds no instance in its target test environment.
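
The selection behavior described here can be sketched in a few lines of Java. This is a simplified model of the outcome, not the Polaris MetadataRouter code: the FeatureEnvFallbackSketch class is hypothetical, and each instance is reduced to an address mapped to its labels.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FeatureEnvFallbackSketch {

    static final String LABEL_KEY = "featureenv";

    /**
     * Returns the addresses a request may be forwarded to.
     * Instances of the requested test environment are preferred; if there are
     * none, instances without the label (the baseline environment) are used.
     */
    static List<String> select(Map<String, Map<String, String>> instances, String requestedEnv) {
        List<String> sameEnv = instances.entrySet().stream()
                .filter(e -> requestedEnv != null
                        && requestedEnv.equals(e.getValue().get(LABEL_KEY)))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
        if (!sameEnv.isEmpty()) {
            return sameEnv;
        }
        return instances.entrySet().stream()
                .filter(e -> !e.getValue().containsKey(LABEL_KEY))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> pointsCenter = Map.of(
                "points-f1:8080", Map.of(LABEL_KEY, "f1"),
                "points-base:8080", Map.<String, String>of());
        System.out.println(select(pointsCenter, "f1")); // f1 has its own instance
        System.out.println(select(pointsCenter, "f2")); // f2 falls back to the baseline
    }
}
```

Because the baseline is identified by the absence of the label, no extra routing rule is needed to describe the fallback.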

3.3 Traffic coloring

Method 1: Client-side coloring (recommended)

As shown in the figure below, in the HTTP request sent by the client, adding the X-Polaris-Metadata-Transitive-featureenv=f1 request header can achieve coloring. This method allows developers to colorize traffic according to business logic when a request is created.

Figure: Schematic diagram of client-side coloring
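
For instance, a client written against the JDK's built-in java.net.http API could attach the header when it builds the request. The sketch below is only one way to do it: the ClientStainingSketch class, the stainedRequest helper, and the gateway URL are made up for this example.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ClientStainingSketch {

    static final String HEADER = "X-Polaris-Metadata-Transitive-featureenv";

    /** Builds a GET request that is colored with the given test environment. */
    static HttpRequest stainedRequest(String url, String featureEnv) {
        return HttpRequest.newBuilder(URI.create(url))
                .header(HEADER, featureEnv)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical gateway address; replace it with a real endpoint before sending.
        HttpRequest request = stainedRequest("http://gateway.example.com/users/1000", "f1");
        System.out.println(request.headers().map());
    }
}
```

The request can then be sent with java.net.http.HttpClient as usual. Any other HTTP client works the same way as long as it sets the same header.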

Method 2: Gateway dynamic coloring (recommended)

With dynamic coloring, developers configure coloring rules so that traffic is colored automatically as it passes through the gateway, which is very convenient. For example, requests from users with uid=1 are forwarded to the f1 environment and requests from users with uid=0 are forwarded to the f2 environment. Configuring the coloring rules is all that is needed.

Figure: Schematic diagram of gateway dynamic coloring

Spring Cloud Tencent implements the traffic coloring plug-in as a Spring Cloud Gateway GlobalFilter. Developers only need to add the spring-cloud-tencent-gateway-plugin dependency and turn on the coloring plug-in switch in the configuration file (spring.cloud.tencent.plugin.scg.staining.enabled=true) to enable traffic coloring.

Method 3: Gateway static coloring

Adding a fixed header to requests is one of the most common gateway plug-ins, as shown in the following figure. A gateway is deployed in each environment and adds the X-Polaris-Metadata-Transitive-featureenv=f1 request header to every request that passes through it. Because this method requires one gateway per environment, the cost is high, so it is used relatively rarely.

Figure: Schematic diagram of gateway static coloring

After completing the above steps, the test environment routing can be realized, and readers can run the polaris-router-featureenv-example under Spring Cloud Tencent for a complete experience.

4. Summary

Test environment routing is a very practical capability during the development stage of a microservice architecture. It greatly reduces the maintenance and resource costs of test environments and improves R&D efficiency. As the operation guide shows, implementing test environment routing with Spring Cloud Tencent is simple: add the corresponding environment label to the deployed service instances and add a label to the request header.

Common test environment routing solutions in the industry often need to deliver routing rules to every service on the link. With the metadata routing capability of Polaris, however, the whole solution requires no routing rules at all. Only the corresponding label information needs to be set on the instances, so the operating cost is very low.

If the project happens to use Spring Cloud Gateway as its gateway, integrating the gateway coloring plug-in from Spring Cloud Tencent reduces the cost of traffic coloring even further. The client does not need to do anything, because configuring the gateway coloring rules is enough to color the traffic.

At present, Spring Cloud Tencent mainly covers test environment routing for call traffic between microservices. It does not yet cover test environment routing for message queues or task scheduling.

5. Contributions welcome

If your project uses the Spring Cloud framework, and you

Have accumulated practical, general-purpose plug-in capabilities or scenario-based solutions
Are currently running into problems putting it into practice
Are interested in the Spring Cloud Tencent project

You are very welcome to work with us to refine more practical, general-purpose capabilities and jointly build a microservice development framework that fits real production scenarios. A suggestion, an Issue, a Pull Request, or even just a Star is great support for the Spring Cloud Tencent community.

GitHub address: https://github.com/Tencent/spring-cloud-tencent
