Awareness of business and content security
In my work I often deal with several departments on the client side (Party A): the security department, the operation department, the audit department, the development department and so on. Each department has different concerns. The security department is basically responsible for network security, the operation department for making sure marketing strategies are effective, the audit department for content quality and content violations, and the development department takes part in the unified development and construction of security platforms. The work of every department is directly tied to the company's business, and whichever department runs into problems, the enterprise as a whole is affected.
For an intuitive example, take a game company. DDoS attacks can disrupt the stable operation of the business, data leakage can damage the company's reputation, and illegal content can get the whole game taken off the shelves for rectification. The most common problem is cheat plug-ins. The direct consequence is the loss of users and revenue.
Take illegal content first: there is pornographic information of every kind. In June 2019, the Internet Information Office conducted a thorough investigation of voice apps and removed a large number of applications. The main solution in the industry is to connect business-related text, pictures, video and audio to a machine audit platform. At present this is mainly either the SaaS detection platform of a third-party service provider or a detection platform the enterprise builds itself. Machine audit is mainly used to improve efficiency and cut audit time, and it is combined with manual audit to guarantee the result and reduce the rates of missed judgments and misjudgments.
Game apps in particular suffer from cracking. If you are interested, search Taobao for keywords such as "cracked games": there are many shops and many games to choose from. Besides removing the normal in-game charges, cracked versions also add abnormal functions, such as double attack, to attract players. Some shops run a membership system and charge 150 yuan per month, which already exceeds the revenue per user of many genuine games. This is fatal for genuine games. Taking mobile games as an example, the cracking problem can be addressed with app hardening to prevent reverse engineering. For the plug-in problem, the game's anti-plug-in technology can check for emulators, multi-instance launching, cloud phones and simulated clicks, and it can be combined with operational measures to strengthen the deterrent effect.
At the end of 2018, Starbucks ran a promotion that gave coffee coupons to newly registered users. User verification at the time was relatively simple, and a coupon could be obtained by filling in very little information. Within a day and a half of the launch, the wool party (organized coupon abusers) had grabbed almost 4 million coupons, worth about 10 million yuan at the price of a medium cup. Within wool party circles, hauls in the hundreds of thousands are still possible. To defend against the wool party, rely on threat intelligence databases, such as blacklists of mobile phone numbers, IP addresses and e-mail addresses, and then run data analysis and behavior analysis on the user information collected during the campaign. In this black and gray industry, the profit motive is strong and the confrontation is fierce.
The interesting thing about data leakage is that more than 60% of leaks are the work of insiders. Recently, a recruitment website leaked 160,000 resumes in a typical case of collusion between insiders and outsiders. Resumes priced at 50 yuan were illegally sold to vendors and resold on Taobao for 1-2 yuan each. Data leakage prevention therefore takes more than data leakage prevention products: it also requires sound policies, careful division of privileges, stronger auditing, and training for internal staff in security awareness and legal awareness.
DDoS is the oldest but still the most effective network attack. With the development of network communication and Internet technology, DDoS attacks keep getting more severe. For example, many IoT devices can be used to launch them. It is hard for the victim to deal with the attack source, so protection can only be passive. In China, attacks of tens of gigabits per second are now very common. They usually mix volumetric floods with CC attacks, which makes them hard to handle with protection equipment deployed on premises, so most are handled by cloud scrubbing. Many domestic security vendors are shifting from hardware to cloud services, which reflects the broader trend toward cloud security services.
In this talk, I focus mainly on how to solve the content security problems enterprises face against the background of explosive growth in UGC and tightening national supervision.
The status quo of content governance can be viewed from three angles. The first is the characteristics of supervision: there are many regulators, many regulations and requirements, and many special rectification campaigns.
The regulatory authorities include the Internet Information Office; the former State Administration of Radio, Film and Television, now split into the State Administration of Radio and Television, the State Press and Publication Administration and the State Film Bureau; the Ministry of Culture; the Ministry of Public Security; and the Ministry of Industry and Information Technology.
Each regulator has its own emphasis, but there are also overlaps. For example, the Press and Publication Administration mainly supervises news content, while the State Administration of Radio and Television reviews radio and television content, such as online dramas and TV dramas.
An enterprise, as the object of supervision, may be supervised by the public security authorities and the Internet Information Office at the same time. Supervision is generally carried out through user reports and special inspection campaigns. User reporting in particular is a very important channel. For example, the Internet Information Office runs a reporting center for illegal and harmful information, and in June of this year alone it accepted millions of reports. Regulators have not only built their own reporting platforms but also require major content platforms to provide reporting channels, which is why major video websites, for example, have report and feedback entry points.
In our work and daily life, any of us can report the harmful websites or content we come across to the Internet Information Office through these channels.
The second feature of supervision is that there are many regulatory requirements. Anyone interested can look them up on the official websites of the various regulators, where they are now set out in great detail.
Here I want to emphasize who bears responsibility: one party is the user, the other is the platform.
Take one scenario: a user posts pornographic advertising on a content platform. The user's behavior is illegal, and it is also illegal for the platform to carry such content. Objectively both should be punished, but in practice holding individual users accountable is very costly, so in most content violation cases what we see is the platform being penalized.
Moreover, the Cybersecurity Law has been in force since June 1, 2017, which gives regulators another legal basis. Take another scenario:
A malicious user compromises a website through a cyber attack and tampers with it to publish pornographic information. The operating platform has then not only violated the content publishing requirements but also failed to protect its information systems as the Cybersecurity Law requires, so the operator will be punished under that law.
The third feature of supervision is that there are many governance campaigns.
According to the Internet Information Office, four content governance campaigns were carried out between February 2018 and June 2019.
In February 2018, a special inspection of apps was carried out, mainly targeting pornography, drugs, illegal games, harmful learning apps and similar applications, and 330,000 apps were removed.
Next, educational apps were subject to special rectification. More than 20 apps, including "working dogs" and "pocket teachers", were verified to have illegally spread obscene and pornographic content and were taken off the shelves.
A six-month "whole network rectification action" followed, ending in June.
In June 2019, a special rectification campaign for voice content was launched.
This shows the country's determination and the effort it is putting into building a clean cyberspace environment.
Even under such strong supervision, illegal content keeps emerging.
Illegal content has three characteristics: it covers many scenarios, it has many data variants, and it is strongly adversarial.
(1) It covers many scenarios and is pervasive. News content, user comments, user avatars, nicknames, the bullet comments on online dramas: no scenario that carries content escapes illegal content.
(2) Within each scenario, illegal data comes in many types and variants. Text has evolved from plain sensitive words at first to today's variant characters, obfuscation with special symbols, and illegal content embedded in pictures. In audio, ASMR has emerged as a content type over the last year or two, and it is often mixed with a great deal of pornographic content.
(3) It is strongly adversarial: illegal content is published in an organized, adversarial way, and publishers counter detection and operation strategies by changing the form of the content and switching accounts. This point is taken up again later, when explaining why a defense-in-depth system is necessary.
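The special-symbol obfuscation described in point (2) can be illustrated with a minimal sketch of text normalization before keyword matching. The blocklist contents and the function names are hypothetical, and real systems combine this kind of preprocessing with trained models rather than relying on it alone.

```python
import unicodedata

# Hypothetical blocklist for illustration; a real list comes from the operation team.
BLOCKED_TERMS = {"freecoins", "hotchat"}

def normalize(text: str) -> str:
    """Fold full-width/compatibility forms, lowercase, and drop non-alphanumerics."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())

def contains_blocked_term(text: str) -> bool:
    """Return True if any blocked term survives normalization as a substring."""
    cleaned = normalize(text)
    return any(term in cleaned for term in BLOCKED_TERMS)

print(contains_blocked_term("F*r-e.e  C_o_i_n_s"))   # symbols inserted between letters
print(contains_blocked_term("ｆｒｅｅｃｏｉｎｓ"))      # full-width letters
print(contains_blocked_term("see you tomorrow"))
```

Plain substring matching after normalization can also flag innocent text that happens to contain a blocked term, which is one reason machine results are usually paired with manual audit.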
Against this background of strong national supervision, doing content security well is in fact a hard problem.
Managers ultimately care about two indicators: the detection effect and the impact on the business. The detection effect is generally judged by precision and recall. The impact on the business depends mainly on detection time, which should affect the user experience as little as possible. For example, in IM chat, if detecting one text message takes more than 1 s, the user experience suffers badly.
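As a minimal sketch of how these two indicators might be computed, the snippet below evaluates precision and recall on a labeled sample and the share of requests answered within a latency budget. Reading "correct rate" as precision is an assumption, and the function names and the sample data are made up for illustration.

```python
def precision_recall(predicted, actual):
    """predicted/actual are parallel lists of booleans (True = violation)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def share_within_budget(latencies_s, budget_s=1.0):
    """Fraction of detection calls that finished within the latency budget."""
    if not latencies_s:
        return 0.0
    return sum(t <= budget_s for t in latencies_s) / len(latencies_s)

# Made-up evaluation sample: machine verdicts vs. human ground truth.
machine = [True, True, False, False, True]
human = [True, False, True, False, True]
print(precision_recall(machine, human))
print(share_within_budget([0.2, 0.5, 1.5, 0.9]))
```

Precision falls when normal content is wrongly flagged (misjudgment), and recall falls when violations slip through (missed judgment), so the two numbers map directly onto the two error types mentioned earlier.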
To achieve these goals, building a detection system in-house from 0 to 1 faces many difficulties.
The first is cost, which has two main parts: labor and equipment. On the labor side, hiring in the Internet industry is still very expensive. One experienced algorithm expert alone typically commands an annual salary of around 500,000 yuan. And the whole system needs not only algorithm engineers but also operation and audit staff, so manpower alone costs on the order of millions of yuan. On the equipment side, the GPU nodes needed for image processing are relatively expensive. For example, NVIDIA's P40 card, released in 2016, now costs about 50,000 yuan, and one P40 can detect pictures at roughly 30 QPS. Model training also needs GPU nodes, which is another relatively high expense.
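The equipment figures above lend themselves to a back-of-the-envelope capacity estimate. The sketch below uses only the two numbers quoted in the text (about 30 QPS and about 50,000 yuan per P40); the peak load and the 30% headroom are assumptions chosen for illustration, and training nodes are not included.

```python
import math

P40_QPS = 30             # images per second per card, figure quoted in the text
P40_PRICE_YUAN = 50_000  # approximate price per card, figure quoted in the text

def cards_needed(peak_qps: int, headroom_pct: int = 30) -> int:
    """Cards required to serve peak_qps with headroom_pct percent spare capacity."""
    return math.ceil(peak_qps * (100 + headroom_pct) / (100 * P40_QPS))

def hardware_cost_yuan(peak_qps: int, headroom_pct: int = 30) -> int:
    """Purchase cost of the inference cards only."""
    return cards_needed(peak_qps, headroom_pct) * P40_PRICE_YUAN

# Assumed peak of 300 images per second.
print(cards_needed(300), hardware_cost_yuan(300))
```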
Beyond cost, data accumulation and audit experience are also barriers. Taking image training as an example, one detection model needs tens of thousands or even hundreds of thousands of samples. Sample data on that scale cannot be accumulated without a certain amount of time and the right channels.
In addition, the auditors' experience and the audit process and policies are important guarantees of quality. An auditor's experience determines the subjective quality and efficiency of the audit, while sound processes and policies are the objective guarantee. Experience comes from continuous learning and training, and processes and policies need time to be drawn up and refined. None of this happens overnight.
Next, let me introduce how the detection team and the technical system are built.
The first is team building. Here I take my company's team as an example.
The whole team is divided into several smaller teams: an algorithm team, a system development team, an operation team and a manual audit team.
The core technology is implemented by the algorithm team, which is further divided into groups, such as a group doing machine learning for text and a group doing machine learning for pictures.
The system development team is responsible for building the business platform.
The operation team works directly with the business departments, clarifies the requirements for detection standards, adjusts detection strategies in real time and optimizes the results.
The audit team has the most people and currently works in rotating shifts to provide round-the-clock auditing.
Two principles should be considered when drawing up detection standards: comprehensiveness and practicability.
On comprehensiveness, two sets of needs must be considered: those of the state and those of the operating platform. For the state, pornography, terrorism and contraband are all prohibited content, and relevant laws and regulations explicitly prohibit them. These standards apply to essentially every content platform.
For the operating platform, content such as abuse, spam flooding and competitors' advertising is unwelcome.
Timeliness also matters here: the path from a requirement being raised to the standard being implemented needs to be completed as quickly as possible to shorten the window in which detection is missing.
On practicability, data must be collected and models trained. Data can be collected by people. Standards can be descriptive, but data collection and labeling must be detailed. For example, under the pornography category, the detection requirement for "sexual behavior" only describes the category and concept in words, and much more detail is needed to label data. Pictures showing exposed buttocks, for instance, need explicit guidance and are classified according to shooting angle, whether private parts are exposed, whether the subject is a child, and other factors. Each picture is finally labeled as pornographic, vulgar, sexy or normal.
Once the standards are drawn up, different standards are applied according to the detection needs of each scenario. A sexy picture in news content is not a problem, but the same picture in a children's education IM app is not acceptable.
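One way to picture per-scenario standards is a small policy table that maps each scenario to the most severe label it tolerates. This is a sketch only: the scenario names, the severity ordering of the four labels and the strict default for unknown scenarios are assumptions made for illustration.

```python
# The four labels from the labeling step, ordered here from least to most severe
# (the ordering itself is an assumption).
LABELS = ("normal", "sexy", "vulgar", "pornographic")

# Hypothetical scenarios and the most severe label each one tolerates.
SCENE_MAX_ALLOWED = {
    "news": "sexy",
    "children_education_im": "normal",
}

def is_allowed(scene: str, label: str) -> bool:
    """Unknown scenarios fall back to the strictest policy."""
    max_allowed = SCENE_MAX_ALLOWED.get(scene, "normal")
    return LABELS.index(label) <= LABELS.index(max_allowed)

print(is_allowed("news", "sexy"))                    # True
print(is_allowed("children_education_im", "sexy"))   # False
```

Keeping the policy in data rather than in model code lets the operation team adjust a scenario's threshold without retraining anything.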
The three most important platforms are:
The detection platform (the core of the service), which comes preloaded with the various trained models.
The manual audit platform (which supplements quality and capability and improves efficiency), whose functions include data sampling and fast operation.
The model training platform (which guarantees quality), which consists mainly of GPU clusters.
The business system connects to the detection platform, which returns detection results for text and pictures in real time. Data that needs manual audit is passed from the detection platform to the audit platform, and the audit platform finally returns the result to the business system.
The model training platform trains and optimizes models mainly on the bad cases from each channel and then delivers the trained models to the detection platform.
In this way the platforms form a closed loop that delivers fast service onboarding and continuous improvement of results.
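The closed loop described above can be sketched in a few lines. The class name, the two score thresholds and the idea of a single model score in [0, 1] are assumptions made for illustration, not the structure of any particular product.

```python
from dataclasses import dataclass, field

@dataclass
class ModerationLoop:
    block_at: float = 0.9    # hypothetical threshold: block automatically
    review_at: float = 0.5   # hypothetical threshold: send to manual audit
    review_queue: list = field(default_factory=list)
    training_samples: list = field(default_factory=list)

    def detect(self, item_id: str, score: float) -> str:
        """Detection platform: decide from a model score in [0, 1]."""
        if score >= self.block_at:
            return "block"
        if score >= self.review_at:
            self.review_queue.append(item_id)
            return "review"
        return "pass"

    def audit(self, item_id: str, is_violation: bool) -> str:
        """Manual audit platform: the human verdict closes the case and is
        kept as a labeled sample for the training platform."""
        self.review_queue.remove(item_id)
        self.training_samples.append((item_id, is_violation))
        return "block" if is_violation else "pass"

loop = ModerationLoop()
print(loop.detect("img-1", 0.95), loop.detect("img-2", 0.6), loop.detect("img-3", 0.1))
print(loop.audit("img-2", True), loop.training_samples)
```

Only the uncertain middle band reaches human auditors, which is how machine audit reduces audit time while manual audit keeps missed judgments and misjudgments down.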
These three parts, team, standards and platforms, make up a fairly complete detection system that can meet the needs of conventional content detection.
In reality, however, content governance is not only about handling content. It also needs a defense-in-depth system.
The facts show that most illegal content is published by abnormal users, and content governance is a direct contest between enterprises and the black and gray industry. Relying on content detection alone is too simplistic and leaves the enterprise struggling to keep up.
Why is content governance a direct contest between enterprises and the black and gray industry? Let's first look at how the black and gray industry does business.
In terms of roles, there are publishers, business subcontractors and content platforms. Publishers come in several kinds. Pornographic websites, for example, need to publish information about their sites to attract traffic, and some parties publish illegal content on a competitor's platform out of malicious competition. Publishers turn to business subcontractors to get illegal content published. Subcontracting involves many roles, including people who specialize in writing automation tools, people who resell accounts, and platforms that carry out content distribution, such as the various group control platforms. In the end, a poster floods the various platforms with the content.
The black and gray industry is now very mature, with a distinct division of labor at every link. As the slide shows, there are specialized SIM card merchants, account merchants, CAPTCHA-solving platforms, various cloud control platforms and so on.
As we all know, all SIM cards are now subject to real-name registration. SIM card merchants therefore have their own way to obtain cards in bulk: by registering a company, they can apply for a large number of IoT cards in the company's name. These IoT cards have no voice function but can send and receive SMS, so they can be used to register and log in to accounts. So if you call back the phone number behind a registered account and hear a prompt that the number you dialed does not have voice service enabled, it is probably an IoT card.
The profit motive here is very strong. For example, a newly registered account is worth a few yuan, but after being nurtured by posting normal content from time to time, it can be worth tens or even hundreds of yuan.
On the major content platforms, the confrontation over publishing is now especially fierce. Take Weibo as an example. In the past, pornographic accounts would post pornographic remarks directly under trending topics, such as links to pornographic websites or contact details. Content like that is easy to detect and ban, so now the accounts use a merely sexy picture instead. Most of what they post is normal comments, while the pornographic information sits on the account's profile page. All of this is done to make detection harder.
In the context of such strong confrontation, relying solely on content detection is too simplistic, and defense in depth is the key.
Content governance is not only about detecting published content. It also means fixing the problem at the source. A comprehensive defense system is needed, running from account registration to account login, to user behavior, and finally to published content, in order to achieve better results. In other words, it extends from content detection to user behavior detection and, with the help of user profiling, can better withstand attacks from the black and gray industry.
In the registration stage, the problems are batch registration and fake registration, which can be addressed with verification codes, phone number verification and real-person authentication. In the login stage, the problems are batch login and brute-force cracking, which can be addressed with verification codes and anti-cheating technology. After that, publishing behavior and content are checked, for example by handling an account that publishes a large amount of similar content in a short time.
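The last check mentioned, one account publishing a lot of similar content in a short time, can be sketched as a sliding-window counter. The window length, the limit and the crude fingerprint used to decide that two posts are "similar" are all assumptions for illustration; production systems typically use stronger similarity measures.

```python
from collections import defaultdict, deque

class BurstDetector:
    """Flags an account that posts near-identical content too often in a window."""

    def __init__(self, window_s: float = 60.0, max_similar: int = 3):
        self.window_s = window_s          # hypothetical window length in seconds
        self.max_similar = max_similar    # hypothetical number of tolerated repeats
        self._recent = defaultdict(deque)  # (account, fingerprint) -> timestamps

    @staticmethod
    def fingerprint(text: str) -> str:
        # Crude similarity key: ignore case, spacing and punctuation.
        return "".join(ch for ch in text.casefold() if ch.isalnum())

    def is_burst(self, account: str, text: str, now: float) -> bool:
        stamps = self._recent[(account, self.fingerprint(text))]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window_s:
            stamps.popleft()
        stamps.append(now)
        return len(stamps) > self.max_similar

detector = BurstDetector()
for t in (0, 10, 20, 30):
    print(t, detector.is_burst("user-1", "Buy now!!", now=t))
```

With the default settings the fourth near-identical post inside one minute is flagged, while the same text posted much later, or by a different account, is not.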
Of the technical means mentioned here, verification codes and anti-cheating are briefly explained below.
First, verification codes. They are mainly used to tell humans from machines, with the aim of raising the attacker's cost. Early types, such as character verification codes, are very easy to crack, mainly with OCR, which readily recognizes the characters in the picture. The most widely used type at present is the smart verification code, which makes its judgment by analyzing the user's behavior and device information. Puzzle-slider and click-the-text verification codes are also mainstream now and are harder to defeat.
Anti-cheating technology here includes IP profiling, which checks the user's IP geolocation and whether the IP is a proxy; device environment detection, which checks whether the device is an emulator and whether it is rooted or jailbroken; and user behavior analysis, which combines information across these dimensions and uses rules to set a baseline of normal behavior. These checks are usually applied at the entry points of registration, login and key business operations (such as posting).
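A rule-based combination of these signals might look like the following sketch. The signal names, weights, baseline and thresholds are all hypothetical values picked for illustration; in practice they would be tuned against real traffic.

```python
from dataclasses import dataclass

@dataclass
class EventSignals:
    event: str                 # "register", "login" or "post"
    ip_is_proxy: bool
    is_emulator: bool
    is_rooted: bool            # rooted or jailbroken device
    actions_last_minute: int

# Hypothetical rule weights and behavior baseline.
BASELINE_ACTIONS_PER_MINUTE = 10

def risk_score(s: EventSignals) -> int:
    score = 0
    if s.ip_is_proxy:
        score += 30
    if s.is_emulator:
        score += 40
    if s.is_rooted:
        score += 15
    if s.actions_last_minute > BASELINE_ACTIONS_PER_MINUTE:
        score += 30
    return min(score, 100)

def decide(s: EventSignals) -> str:
    score = risk_score(s)
    if score >= 60:
        return "reject"
    if score >= 30:
        return "challenge"   # for example, show a verification code
    return "allow"

print(decide(EventSignals("login", False, False, False, 2)))
print(decide(EventSignals("register", True, False, True, 3)))
print(decide(EventSignals("post", True, True, False, 50)))
```

Returning a middle "challenge" outcome ties the two techniques together: a verification code is shown only to traffic that already looks suspicious, which limits the impact on normal users.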
The above are typical security issues, and this talk has focused on how to build content security.
Kaka orange juice, content and business security practitioner.