Joke Collection Website - Public benefit messages - Derived Function of SMS Template Based on Dynamic Programming Algorithm

Derived Function of SMS Template Based on Dynamic Programming Algorithm

Our company is responsible for a micro-service, which provides a complete short message interface for all business lines of the company.

Limited by the increasingly tight policies of telecom operators, it is increasingly difficult to send text messages.

All SMS service providers have put forward similar requirements for filing SMS templates, otherwise it will affect the sending rate of SMS.

Generally speaking, when each system sends a new SMS, it will make an incremental report to the SMS service provider.

But once you change the service provider, you need to sort out all the SMS templates.

Manually sorting out the past SMS templates requires the participation of many career line personnel, which consumes a lot of manpower and is easy to leak.

Maintaining SMS template documents can be used as a future plan, but for the first time, you still need to manually sort out the past SMS templates, which can't avoid the workload.

Therefore, it is considered to automatically analyze the short message template by collecting the short messages in the last 1~3 months.

After research, php provides similar_text () to calculate the text similarity and time complexity o (n 3) of a single message.

So we can set a reasonable threshold, traverse the examples of short message recording and generation, and generate short message examples.

The time complexity of the whole script is O(m.n) m: the number of SMS records n: the number of SMS templates.

There are 200 templates on average, and we have changed SMS service providers six times for various reasons.

Using the above scheme, it is inefficient to manually deduct non-template dynamic content for each centralized filing.

Therefore, template analysis tools are needed.

Among them, because the dynamic content in the short message template is often in a common format such as name, it cannot be analyzed by traditional methods such as regularization, and no related class library, tools or methods have been found.

It is considered that by comparing similar strings with high similarity, the common * * * strings are continuously extracted to approach the short message template.

For two strings of length n, the longest male * * * string is counted by the traditional double pointer traversal method. According to the environmental test under warp, when the length of the short message exceeds 30 words, the calculation of two short messages cannot be completed in normal time, that is, it cannot be used for the short message analysis of any long marketing short message.

Because in fact, the time complexity required for two short messages to calculate the public * * * string is O (2 n) n, when n is greater than 20, it can not be ignored as a small constant coefficient.

That is, the time complexity of the whole script suddenly increases to O (M.N.2 L) m: the number of SMS records n: the number of SMS templates l: the average SMS length.

For convenience, the definition is as follows

The short message A is the string A: its elements are A 1, A2 ... one; A ... Ann.

The short message B is the string B: its elements are B 1, B2 ... Bachelor of Medicine ... BM.

A 1~an and b 1~bm correspond to the male string called c(m, n).

O (2 n+m), in which a lot of time is used to calculate the common * * * character string of each subset of the overlapping subproblem A 1, A2 ...; A ... Ann, B 1, B2 ... Bachelor of Medicine ... BM, so as long as repeated low-order terms can be separated, the time complexity can be reduced by using the idea of dp.

Take a substring a 1~an of short message A and a substring b 1~bm of short message B.

It is obvious that:

That is to say, when the values of an and bm are actually known, we only need to know the information of three intermediate values of C (n- 1, m- 1), C (n- 1, m) and C (n, m- 1) in advance.

For this reason, we save a matrix of length N*M to save the length of all the intermediate values of c(n, m).

In addition, in order to find the growth direction of the atomic string, it is necessary to record which of C (n- 1, m- 1), C (n- 1, m) and C (n, m- 1) is calculated.

matrix

From the above matrix, we can know that two public * * * strings are * good old cows, and the time of this algorithm is optimized from O (2 (m+n)) to O(M*N) M: the length of string A and the length of string B.

Macroscopically, the time complexity of the whole script returns to O(m.n) m: the number of SMS records n: the number of SMS templates.

There is no problem in the online gray scale execution and full scale execution of the script, and the goal is achieved.