What are the data labeling methods?

Data annotation falls into three main categories: image, speech, and text.

First, the image category

1. Rectangular bounding box

Drawing a 2D bounding box means pulling a tightly fitting rectangle around each detected object (such as a person, car, plant, or animal). After the box is drawn, it usually needs to be labeled with the corresponding attributes (such as gender, age, color, or size).
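A box annotation of this kind can be sketched as a small record. This is only an illustration: the `BoxAnnotation` class, its field names, and the pixel-coordinate convention are assumptions, not a format prescribed by any particular labeling tool.

```python
from dataclasses import dataclass, field


@dataclass
class BoxAnnotation:
    """One rectangular box in pixel coordinates (hypothetical schema)."""
    label: str      # the detected object, e.g. "person" or "car"
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    attributes: dict = field(default_factory=dict)  # e.g. {"color": "red"}

    def is_valid(self, image_width: int, image_height: int) -> bool:
        """True if the box has positive area and lies inside the image."""
        return (
            0 <= self.x_min < self.x_max <= image_width
            and 0 <= self.y_min < self.y_max <= image_height
        )

    def area(self) -> float:
        """Box area in square pixels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


box = BoxAnnotation("car", 40, 60, 200, 180, {"color": "red", "size": "large"})
print(box.is_valid(640, 480), box.area())
```

A check like `is_valid` is one way to catch boxes that were dragged in the wrong direction or that extend past the image edge.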

2. Polygon annotation

Polygon annotation is a little more difficult than drawing a rectangular box: the annotator traces an outline around the target element by placing multiple points. As with rectangular boxes, each polygon must be given labels that describe its attributes.
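As a rough sketch, a polygon annotation can be stored as an ordered list of (x, y) points. The function below uses the standard shoelace formula to compute the enclosed area, which can help flag degenerate outlines. The function name and the example points are assumptions made for illustration.

```python
def polygon_area(points):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.

    `points` is an ordered list of (x, y) vertices, as an annotator would
    click them around the object outline.
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        # Pair each vertex with the next one, wrapping around at the end.
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


outline = [(0, 0), (4, 0), (4, 3), (0, 3)]  # hypothetical outline points
print(polygon_area(outline))
```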

3. OCR annotation

OCR annotation has two labeling methods: one is to frame the text region with a multi-point box, and the other is to transcribe the framed content exactly and accurately. This kind of annotation is mainly used for training text recognition.

4. Semantic segmentation

This kind of annotation is relatively rare compared with drawing boxes. The annotator must distinguish the elements in the picture, then outline and fill each region separately. Selected elements are cut out of the picture by matting, and each cut-out region is then given the corresponding attribute tags.
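One common way to represent the result of such region-by-region filling is a per-pixel mask, where every pixel holds a class id. The sketch below assumes a tiny hand-written mask and a hypothetical `CLASS_NAMES` table; real masks are usually stored as image files or arrays.

```python
from collections import Counter

# Hypothetical class table: class id -> class name.
CLASS_NAMES = {0: "background", 1: "road", 2: "car"}

# A tiny 3x4 mask; each number is the class id of one pixel.
mask = [
    [0, 0, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 1, 1],
]


def pixel_counts(mask, class_names):
    """Count how many pixels were filled with each class."""
    counts = Counter(pixel for row in mask for pixel in row)
    unknown = set(counts) - set(class_names)
    if unknown:
        raise ValueError(f"mask contains unknown class ids: {sorted(unknown)}")
    return {class_names[class_id]: counts[class_id] for class_id in sorted(counts)}


print(pixel_counts(mask, CLASS_NAMES))
```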

5. Keypoint annotation (dotting)

Keypoint annotation is generally used to mark faces or other key parts. The position of each point is strictly constrained, which is what makes high-precision detection and recognition possible.
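The position constraints on points can be checked automatically. The following is a minimal sketch that assumes a hypothetical set of five required face points and only verifies that each one is present and inside the image; real projects typically define many more points and stricter rules.

```python
# Hypothetical list of points every face annotation must contain.
REQUIRED_FACE_POINTS = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


def check_keypoints(points, width, height, required=REQUIRED_FACE_POINTS):
    """Return a list of problems; an empty list means the annotation passes.

    `points` maps a point name to its (x, y) pixel position.
    """
    problems = []
    for name in required:
        if name not in points:
            problems.append(f"missing point: {name}")
            continue
        x, y = points[name]
        if not (0 <= x < width and 0 <= y < height):
            problems.append(f"point outside image: {name}")
    return problems


face = {
    "left_eye": (120, 95),
    "right_eye": (180, 96),
    "nose": (150, 130),
    "mouth_left": (128, 165),
    # "mouth_right" was forgotten by the annotator
}
print(check_keypoints(face, width=320, height=240))
```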

6. Image review and classification

There are generally two ways to judge a picture: one is to assign the picture to a category, and the other is to judge whether the picture is valid.

Second, the speech category

1. Speech transcription

Speech transcription is one of the most common kinds of speech annotation. Annotators listen to a recording and then write down what they hear. Common languages include Chinese, foreign languages, and dialects. By duration, recordings can be divided into long speech and short speech. Generally, a recording under one minute (usually about three seconds) counts as short speech. The length of the recording, the sound quality, the pre-scoring results, the way the audio was cut, and other factors all greatly affect the difficulty of transcription.
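The long/short split described above comes down to a single threshold. A minimal sketch, assuming the one-minute boundary from the text and a hypothetical function name:

```python
SHORT_SPEECH_MAX_SECONDS = 60.0  # boundary taken from the text: under one minute is short


def speech_length_class(duration_seconds):
    """Classify a recording as "short" or "long" by its duration in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return "short" if duration_seconds < SHORT_SPEECH_MAX_SECONDS else "long"


for seconds in (3.0, 59.9, 60.0, 600.0):
    print(seconds, speech_length_class(seconds))
```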

2. Other kinds of speech annotation

Other kinds of speech annotation make up a relatively small share. They include judging whether a passage of text matches a recording, and judging whether a recording contains illegal or sensitive content.

Third, the text category

1. Sentiment annotation

The annotator judges the emotion expressed in a sentence. There are generally three levels (positive, neutral, and negative). If the requirements are stricter, the scheme may be divided into six or even twelve emotion labels.

2. Entity annotation

The annotator extracts the entities mentioned in a sentence, such as TV, refrigerator, or basketball. Sometimes the sentence itself must also be assigned to a category, such as encyclopedia, music, news, or action instructions.
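Entity annotations are often stored as character spans plus a label. The sketch below assumes a hypothetical span format (`start`, `end`, `label`, `text`) and example labels; it locates each entity with `str.find` so the offsets always match the sentence.

```python
def make_entity(text, surface, label):
    """Build a span annotation for the first occurrence of `surface` in `text`.

    The dict layout is an assumption for illustration; `end` is exclusive.
    """
    start = text.find(surface)
    if start == -1:
        raise ValueError(f"{surface!r} does not occur in the text")
    return {"start": start, "end": start + len(surface), "label": label, "text": surface}


sentence = "I bought a TV and a refrigerator."
annotation = {
    "text": sentence,
    "category": "action instructions",  # hypothetical sentence-level category
    "entities": [
        make_entity(sentence, "TV", "appliance"),
        make_entity(sentence, "refrigerator", "appliance"),
    ],
}

for entity in annotation["entities"]:
    # Slicing the sentence with the stored offsets recovers the entity text.
    print(entity["label"], sentence[entity["start"]:entity["end"]])
```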

3. Similarity judgment

The annotator judges whether two sentences express the same meaning. Consistent pairs are flagged 1, inconsistent pairs are flagged -1, and pairs that cannot be determined are flagged 0.
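The three flags map naturally onto constants. As an illustration only, the sketch below also shows one possible way to merge the flags of several annotators by majority vote; the voting step and the rule that ties fall back to 0 are assumptions, not something the scheme above requires.

```python
from collections import Counter

CONSISTENT = 1      # the two sentences mean the same thing
UNDETERMINED = 0    # the annotator cannot decide
INCONSISTENT = -1   # the two sentences mean different things

VALID_FLAGS = {CONSISTENT, UNDETERMINED, INCONSISTENT}


def merge_flags(flags):
    """Merge several annotators' flags for one sentence pair by majority vote.

    Assumption for this sketch: a tie for first place yields UNDETERMINED.
    """
    if not flags:
        raise ValueError("at least one flag is required")
    for flag in flags:
        if flag not in VALID_FLAGS:
            raise ValueError(f"invalid flag: {flag}")
    ranked = Counter(flags).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return UNDETERMINED
    return ranked[0][0]


print(merge_flags([1, 1, -1]))
print(merge_flags([1, -1]))
```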

4. Other types of text annotation

Other types of text annotation include public opinion annotation, which judges whether an article has a positive or negative impact on the company it mentions, and article sensitivity detection, which determines whether the text contains illegal or sensitive information.

The role of data annotation

1. Machine learning training: Data labeling is a necessary step in training a supervised machine learning model. From labeled or annotated data, the model learns the relationship between input data and output labels, so that it can perform tasks such as classification, regression, and prediction. High-quality annotated data helps improve model performance.

2. Data analysis and insight: Labeled data can be used for data analysis to help researchers and decision makers find patterns, trends, and correlations in data. This is very important for business strategy, market research, and decision support.

3. Natural language processing: Text data labeling is used for natural language processing tasks such as sentiment analysis, named entity recognition, and machine translation. Labeled text helps train text understanding models and improves the accuracy of text processing.

4. Sound and speech processing: Labeled speech and audio data are used for speech recognition, music classification, voice analysis, and other applications. Labeled speech helps train automatic speech recognition systems and audio processing tools.

5. Medical diagnosis: Medical image data annotation is very important for medical diagnosis and treatment planning. Annotating X-ray, MRI, and CT scan images helps doctors diagnose diseases more accurately.