About API Gateways (4): Rate Limiting

Broadly speaking, traffic control is a set of strategies for managing user requests. It mainly covers three areas: permissions, rate limiting, and traffic scheduling.

Permissions were covered in the previous article. This article is about rate limiting, and the next will cover traffic scheduling.

Rate limiting means restricting the frequency (QPS/QPM) or the total number of calls a user can make.

From the user's or operator's perspective, the most visible function of rate limiting is billing.

The public APIs of major open platforms generally come with a free quota, enough for personal testing. Once you make calls at scale, you have to pay, billed by call count or call rate. And once you exceed your purchased quota, further calls are rejected.

This is actually the biggest use of rate limiting, but users and operations staff rarely notice it, so it is not widely understood.

Behind the gateway sit various services, whose interfaces are exposed to users through the gateway. In theory, user traffic is unpredictable and a surge can arrive at any moment. Once peak traffic exceeds a service's carrying capacity, the service goes down: think of Weibo during breaking news, or 12306 in years past.

Therefore, the gateway must ensure that the traffic it forwards to each back-end service does not exceed the upper limit that service can carry. This upper limit is negotiated between the gateway team and each service.

From simple to complex, rate limiting can be divided into single-machine rate limiting, single-cluster rate limiting, and cross-cluster (global) rate limiting.

We won't discuss specific rate-limiting algorithms such as leaky bucket and token bucket here; we'll only discuss concepts and approaches.

The idea of single-machine rate limiting is very simple: per-machine limit × number of machines = total limit.

For example, if user A's QPS limit is 100 and the gateway has 10 machines, each machine limits user A to 10 QPS.

The benefit first: this approach is very simple to implement. Each machine counts QPS in local memory and rejects requests once the threshold is exceeded.
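A minimal sketch of such a local counter, assuming a simple fixed one-second window (real gateways typically use sliding windows or token buckets instead; the class and method names here are illustrative, not from any particular gateway):

```python
import time
from collections import defaultdict

class LocalQpsLimiter:
    """Fixed-window per-second counter kept entirely in local memory.

    Each gateway machine runs its own instance, so the effective total
    limit is (per-machine limit x number of machines).
    """

    def __init__(self, limit_per_second):
        self.limit = limit_per_second
        self.window = 0                  # epoch second of the current window
        self.counts = defaultdict(int)   # user_id -> requests in this window

    def allow(self, user_id, now=None):
        now = int(now if now is not None else time.time())
        if now != self.window:           # a new second: reset all counters
            self.window = now
            self.counts.clear()
        self.counts[user_id] += 1
        return self.counts[user_id] <= self.limit
```

Because the state lives in one process's memory, there is no network hop on the hot path, which is exactly why this scheme is cheap and also why it cannot see traffic on other machines.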

However, the drawbacks of single-machine rate limiting are also obvious, mainly in two respects:

- When the number of deployed gateway machines changes, the per-machine limit must be readjusted accordingly. In practice, machine-count changes are common due to scaling out, scaling in, machine failures, and so on.

- Single-machine rate limiting assumes each gateway machine carries an equal share of the user's traffic, but in reality, at any given moment, traffic is not distributed perfectly evenly across machines.

For example:

Ten machines, each limited to 10 QPS. Three of them actually receive 15 QPS each, so the traffic above 10 QPS on those machines is rejected as over-limit. The other seven receive 7 QPS each. The user's total QPS = 15 × 3 + 7 × 7 = 94. The user never exceeded the 100 QPS quota, yet some traffic was still rejected. That is a real problem.

In practice, the single-machine threshold is set slightly higher to offset this unevenness.

Because of these problems, single-machine rate limiting is usually kept as a fallback, and cluster rate limiting is used most of the time.


Compared with single-machine rate limiting, cluster rate limiting moves the counting into a Redis cluster, which fixes the defects described above.
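The usual mechanism is an atomic INCR on a per-user, per-second key that all gateway machines share. A sketch, using a hypothetical in-memory `FakeRedis` stand-in for the client (a real deployment would use redis-py's `redis.Redis`, whose `incr` and `expire` methods have the same names):

```python
import time

class FakeRedis:
    """In-memory stand-in for a Redis client, for demonstration only."""
    def __init__(self):
        self.store = {}
    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]
    def expire(self, key, seconds):
        pass  # TTL handling omitted in this sketch

class ClusterQpsLimiter:
    """Fixed-window cluster limiter: every gateway machine INCRs the
    same per-user per-second key, so the count is global."""
    def __init__(self, client, limit_per_second):
        self.client = client
        self.limit = limit_per_second
    def allow(self, user_id, now=None):
        now = int(now if now is not None else time.time())
        key = f"rl:{user_id}:{now}"      # one key per user per second
        count = self.client.incr(key)    # atomic across all machines
        if count == 1:
            self.client.expire(key, 2)   # let old one-second windows expire
        return count <= self.limit
```

Since INCR is atomic, no coordination between gateway machines is needed beyond sharing the Redis cluster; the trade-off is a network round trip on every request.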

However, cluster rate limiting is not perfect. Because it introduces Redis, it fails whenever the network between the gateway and Redis jitters or Redis itself goes down. In those cases we still need single-machine rate limiting as a safety net.

In other words, combining cluster rate limiting with single-machine rate limiting is the safer scheme.

Next, consider this problem: large gateways are generally deployed across multiple data centers and regions, and so are the back-end services. For protecting the services, per-region cluster rate limiting is enough. But from the user's perspective, problems remain:

For example, suppose a user purchases a QPS limit of 30, and our gateway is deployed in the northern, central, and southern regions of China. How should the 30 QPS be allocated?

Splitting it evenly is clearly wrong: the user's traffic may be heavily skewed. If the user's business is concentrated in northern China, most of their traffic enters the northern gateway. Capping that region at 10 QPS would certainly draw complaints.

What about limiting each region to 30? Also wrong: if the user's traffic happens to be evenly distributed across regions, they could actually use up to 90 QPS after paying for 30, which is unacceptable.

Following the idea that fixed the single-machine imbalance, can we set up one shared Redis cluster to do the counting globally?

No. Limited by the speed of signal propagation and the size of China, counting every request in one place is unrealistic: the extra RT would defeat the purpose, bandwidth costs would become prohibitive, and the hardware requirements on Redis would be very high. In short, the cost of solving the problem this way is too high.

There is an ingenious solution: stepped local counting per region, plus a global check.

Again, an example:

Suppose the limit is 90, and the three regions count separately. When a region's local count reaches 30, it fetches the current counts from the other two regions and sums all three. If the sum exceeds the limit, it notifies the other two regions, and all of them start rejecting traffic. If not, it repeats the check every time its local count rises by another 10.

This effectively reduces the number of cross-region Redis interactions while achieving genuine cluster rate limiting across all regions.
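The step-check logic above can be sketched as follows. This is a simplified single-region view under stated assumptions: `peer_counts` is a hypothetical callback that fetches the other regions' current counts, and notifying peers to start rejecting is reduced to a local flag with a comment:

```python
class RegionalLimiter:
    """Stepped local counting with periodic global checks (a sketch).

    Each region counts locally. Once the local count passes its share
    of the global limit, the region sums all regions' counts every
    `step` further requests, and flips to rejecting once the global
    limit is exceeded.
    """
    def __init__(self, global_limit, regions, step):
        self.global_limit = global_limit
        self.local_trigger = global_limit // regions  # e.g. 90 / 3 = 30
        self.step = step
        self.count = 0
        self.rejecting = False

    def allow(self, peer_counts):
        # peer_counts: callable returning the other regions' counts
        if self.rejecting:
            return False
        self.count += 1
        past = self.count - self.local_trigger
        if past >= 0 and past % self.step == 0:   # check at 30, 40, 50, ...
            total = self.count + sum(peer_counts())
            if total > self.global_limit:
                self.rejecting = True             # and notify the peer regions
                return False
        return True
```

With a limit of 90, a trigger of 30, and a step of 10, a region makes at most one cross-region check per 10 local requests once it is near the limit, instead of one per request.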

Of course, this kind of cross-region rate limiting is certainly not precise, because of the RT and the step-checking interval, but it is still much better than single-cluster rate limiting.

When one user's traffic is particularly large, the Redis counter runs into the classic hot-key problem, putting excessive pressure on a single node of the Redis cluster. There are two ways to mitigate this: key splitting and sampling.

Key splitting means appending suffixes to a hot key, turning it into multiple keys that hash to different Redis nodes and share the load.

For example, if the hot key is abcd, after splitting the keys become abcd1, abcd2, abcd3, and abcd4, with the suffixes 1 through 4 appended in round-robin order.
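A sketch of the round-robin suffixing (the class and method names are illustrative, not from any particular gateway):

```python
import itertools

class KeySplitter:
    """Round-robin key splitting for hot keys (a sketch).

    One hot key becomes `shards` keys (abcd -> abcd1..abcd4), which
    hash to different Redis nodes and share the write load. Reading
    the real total means summing over all shard keys.
    """
    def __init__(self, shards=4):
        self.shards = shards
        self._cycle = itertools.cycle(range(1, shards + 1))

    def next_key(self, key):
        # Pick the next suffix in round-robin order for a write.
        return f"{key}{next(self._cycle)}"

    def all_keys(self, key):
        # Every shard key, needed when summing the global count.
        return [f"{key}{i}" for i in range(1, self.shards + 1)]
```

The cost of splitting is on the read side: the true count is now spread across all shard keys, so a global check must fetch and sum all of them.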

Sampling means that for hot keys, the gateway does not report to Redis on every request; instead it reports once every N requests (say 10), so the pressure on Redis drops to one tenth.
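One way to read this, sketched below under that interpretation: batch the increments locally and flush them to Redis in one INCRBY per N requests. `FakeClient` is a hypothetical in-memory stand-in; `incrby` is the redis-py method name for Redis's INCRBY:

```python
class FakeClient:
    """In-memory stand-in for a Redis client, for demonstration only."""
    def __init__(self):
        self.counts = {}
    def incrby(self, key, amount):  # same name as redis-py's INCRBY wrapper
        self.counts[key] = self.counts.get(key, 0) + amount
        return self.counts[key]

class SampledCounter:
    """Batched reporting for hot keys (a sketch): go to Redis once
    every `batch` requests and INCRBY by the pending amount, trading
    a little accuracy for 1/batch of the Redis traffic."""
    def __init__(self, client, batch=10):
        self.client = client
        self.batch = batch
        self.pending = 0

    def incr(self, key):
        self.pending += 1
        if self.pending >= self.batch:
            count = self.client.incrby(key, self.pending)  # flush the batch
            self.pending = 0
            return count                                   # fresh global count
        return None  # no Redis round trip on this request
```

The global count can lag by up to `batch - 1` requests per gateway machine, which is the accuracy traded for the reduced load.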

That wraps it up for rate limiting. Haha, the next article will be about monitoring. By the way, I'm using the domestic gateway Wukong, from Eolinker; I think it's better than Kong. Interested readers can look it up themselves.

www.eolinker.com