Full-text retrieval of mobile phone short messages

Full-text retrieval is a retrieval technology that takes data such as text, sound and images as its primary content. It searches the content of a document rather than its external characteristics.

The main systems are the TRS system and the Tianyu system.

Compared with other search engines, the distinctive feature of a full-text search engine is that any meaningful word in the text can serve as a search term, and the search result is the original document itself rather than a pointer to it.

As the computer industry has developed, more and more information is held electronically on computer storage devices. This information falls roughly into two categories: structured data and unstructured data. Structured data includes, for example, the financial accounts and production data of enterprises and the score records of students. Unstructured data covers text and multimedia data such as images and sounds. According to statistics, unstructured data accounts for more than 80% of all information. For structured data, RDBMS (Relational Database Management System) technology is currently the best means of management. However, because of its underlying structure, an RDBMS has shortcomings when managing large volumes of unstructured data, most notably slow queries. Full-text retrieval technology allows this unstructured data to be managed efficiently.

Over several years of development, full-text retrieval has grown from a simple string-matching program into large-scale software that can comprehensively manage unstructured data such as very large texts, voice, images and video. Because both its meaning and its scope have changed so profoundly, "full-text retrieval system" has become synonymous with a new generation of management information system, and the basic indicators for evaluating such systems have gradually become standardized.

The first indicator is recall: for a given query, the ratio of the number of relevant items the system retrieves to the total number of relevant items in its database. Precision is the key to finding the most useful material: it is the ratio of the number of relevant items retrieved to the total number of items retrieved. Retrieval speed, or response time, underpins working efficiency; it is the time from submitting a query to receiving the results. The baseline expectation is "ten million Chinese characters, response within seconds". Other indicators also bear on the quality of a full-text retrieval system: coverage (the scope of what can be searched), user burden (the total effort users expend during retrieval) and output form (how the results are presented).
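The two ratios above can be made concrete with a small calculation. The following is a minimal sketch in Python; the function names and the document identifiers are hypothetical and chosen only for illustration.

```python
def recall(retrieved, relevant):
    """Share of all relevant documents that the search returned."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Share of the returned documents that are actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical example: the database holds 4 relevant documents and
# the search returns 5 documents, 3 of which are relevant.
relevant_docs = {"d1", "d2", "d3", "d4"}
retrieved_docs = ["d1", "d2", "d3", "d8", "d9"]

recall_value = recall(retrieved_docs, relevant_docs)        # 3 / 4
precision_value = precision(retrieved_docs, relevant_docs)  # 3 / 5
```

A system can score well on one measure and poorly on the other: returning every document in the database gives perfect recall but very low precision.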

Search engines are arguably the most important application of full-text retrieval technology. At present, using a search engine has become the second most common Internet activity after sending and receiving email. Search engines grew out of traditional full-text retrieval theory: a program scans every word of every article and builds an index entry for each word; the retrieval program then ranks the articles containing the search terms by how frequently and how probably each term appears in each article, and outputs the ranked results. Full-text retrieval is thus the core supporting technology of a search engine.
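The scan-then-rank idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not how any particular search engine works: it ranks by raw word counts only, splits words on whitespace (real systems, especially for Chinese, need a proper word segmenter), and all names are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def build_word_table(articles):
    """Map each word to {article_id: number of occurrences}."""
    table = defaultdict(dict)
    for article_id, text in articles.items():
        for word, count in Counter(tokenize(text)).items():
            table[word][article_id] = count
    return table


def rank(table, query):
    """Order articles by the total number of query-word occurrences."""
    scores = Counter()
    for word in tokenize(query):
        for article_id, count in table.get(word, {}).items():
            scores[article_id] += count
    return [article_id for article_id, _ in scores.most_common()]


articles = {
    "a1": "full text retrieval scans every word",
    "a2": "retrieval speed matters and retrieval quality matters",
    "a3": "a note about cooking",
}
table = build_word_table(articles)
ranking = rank(table, "retrieval")  # a2 mentions the word twice, a1 once
```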

A good search engine is key to an ideal website, and many visitors rely on site search. Site search should combine classified directory navigation with full-text retrieval, and should cover the following aspects:

The key to classified directory navigation is the search scope: limiting the scope keeps the result set from becoming too large and too indiscriminate.

Full-text retrieval is essential to site search; in most cases it helps people find the pages they need quickly.

Sometimes classified directory navigation and full-text retrieval together still cannot locate the required information, so supplementary search aids are needed.

Relevance ranking is a must. When there are too many results, users cannot browse them one by one, and most look only at the first few. Without relevance ranking, the accurate results may sit far down the list where users never see them, while the top results are almost irrelevant, which leaves users with the false impression that nothing useful was found.
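How scope limiting and relevance ranking fit together can be illustrated with a short Python sketch. The page records, their keys ("id", "category", "text") and the scoring rule (a plain count of query-word occurrences) are assumptions made for this example only.

```python
def search_site(pages, query, category=None):
    """Return ids of pages matching the query, most relevant first.

    pages: list of dicts with the hypothetical keys "id", "category", "text".
    category: optional directory category that narrows the search scope.
    """
    terms = query.lower().split()
    scored = []
    for page in pages:
        if category is not None and page["category"] != category:
            continue  # outside the chosen directory scope
        words = page["text"].lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, page["id"]))
    # Highest score first; ties are broken by id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for _, page_id in scored]


pages = [
    {"id": "p1", "category": "greetings", "text": "new year greeting message"},
    {"id": "p2", "category": "greetings", "text": "birthday message and another message"},
    {"id": "p3", "category": "jokes", "text": "a short joke message"},
]
everything = search_site(pages, "message")
greetings_only = search_site(pages, "message", category="greetings")
```

Restricting the category shrinks the result list, and sorting by score puts the page that mentions the query word most often at the top.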

In addition, a site search must take into account the particularities of HTML/XML, support for sudden bursts of concurrent users, the dynamic nature of the website, and the efficiency of index maintenance.

Current full-text retrieval tools include Lucene, Solr and Elasticsearch, among others. Full-text retrieval is divided into two stages: indexing and searching.

Indexing

Collect the source data (the target information to be searched). It can come from a wide range of sources, such as relational databases, the Internet and file systems.

Gather the source data in one place, such as a storage system. To create the index, extract the key information from the source data and break it into words, then write the result to an index repository (on the file system). Each word is associated with the source data it came from, and this association is recorded in the index repository. Finding a word therefore means finding the source data (web pages, e-books, news, and so on) that contains it.
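The word-to-source association described above is commonly called an inverted index. The following Python sketch builds one in memory under simplifying assumptions: words are runs of ASCII letters and digits, the index is a plain dictionary rather than a repository on disk, and the document identifiers are hypothetical.

```python
import re
from collections import defaultdict


def extract_words(text):
    """Pull lowercase alphanumeric words out of a piece of text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(source_documents):
    """Return {word: sorted list of ids of the documents containing it}."""
    postings = defaultdict(set)
    for doc_id, text in source_documents.items():
        for word in extract_words(text):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


# Hypothetical source data gathered from a database, the web and a file system.
source_documents = {
    "db:42": "Quarterly production report",
    "web:news-1": "Production of e-books is rising",
    "file:notes.txt": "Meeting notes",
}
index = build_index(source_documents)
# index["production"] lists both documents that contain the word.
```

Libraries such as Lucene store a much richer structure (positions, frequencies, compressed postings), but the core mapping from word to documents is the same idea.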

Searching

The user starts a search (full-text retrieval) by entering query keywords.

The system splits the query into words and looks up each word in the index repository.

The search results are displayed.
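The three search steps can be sketched in Python as follows. The index here is written out by hand for three hypothetical documents, and the sketch assumes that a document matches only if it contains every query word; other systems may match on any word instead.

```python
def search(index, documents, query):
    """Return (doc_id, text) pairs for documents containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    # Look up each word in the index and intersect the matches.
    matches = set(index.get(words[0], ()))
    for word in words[1:]:
        matches &= set(index.get(word, ()))
    return [(doc_id, documents[doc_id]) for doc_id in sorted(matches)]


documents = {
    1: "happy new year",
    2: "happy birthday",
    3: "new phone",
}
# A hand-written index for the three documents above.
index = {
    "happy": [1, 2],
    "new": [1, 3],
    "year": [1],
    "birthday": [2],
    "phone": [3],
}

results = search(index, documents, "happy new")
for doc_id, text in results:
    print(doc_id, text)  # display step: show each matching document
```

Because the lookup touches only the index entries for the query words, the original documents are read only for the final display step.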