How did the pictures of search engines come out? I want to put my picture on it. What should I do?
If you are posting in Baidu Space and uploading pictures, check the size limit for pictures; they must not be too large.
How does the search engine realize the search? With the rapid development of the Internet and the growth of Web information, users trying to find what they need in this ocean of information are looking for a needle in a haystack. Search engine technology solves exactly this problem by providing an information retrieval service. A search engine is a kind of website that provides search services on the Internet. Its servers gather the page information of a large number of websites, either through network search software (such as network search robots) or through website registration, then process it to build information databases and index databases, so that they can answer users' queries and return the information, or pointers to the information, that users need. Users' retrieval methods mainly include free-word full-text retrieval, keyword retrieval, classified retrieval and other special-purpose retrieval (such as company, personal name and telephone yellow pages). The following takes the network search robot as an example to explain search engine technology.
1. Network robot technology
Robots are also called spiders, worms or wanderers, and their core purpose is to obtain information on the Internet. A robot is generally defined as "software that retrieves files on the network, automatically follows the hypertext structure of those files and recursively retrieves all referenced files". Robots use the hypertext links in a home page to traverse the Web, crawling from one HTML document to another through URL references. The information collected by robots can be used for many purposes, such as building indexes, validating HTML files, checking and confirming URL links, monitoring and obtaining updated information, site mirroring and so on.
When a robot crawls the Internet, it needs to maintain a URL list to record its access trajectory. Because documents are hypertext, the URLs pointing to other documents are embedded in the documents and must be parsed and extracted. Robots are usually used to generate index databases. All search programs follow these working steps:
(1) Take a URL from the initial URL list and read the content it points to from the Internet;
(2) Extract some information (such as keywords) from each document and put it into the index database;
(3) Extract the URLs pointing to other documents from the document and add them to the URL list;
(4) Repeat the above three steps until no new URL appears or some limit (time or disk space) is exceeded;
(5) Add a retrieval interface to the index database and publish it for online users to search.
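The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (FAKE_WEB) rather than real HTTP fetching, links are extracted with a simple regular expression, and every whitespace-separated word is treated as a keyword.

```python
from collections import deque
import re

# Hypothetical in-memory "web": URL -> page text. A real robot would fetch over HTTP.
FAKE_WEB = {
    "http://a.example/": 'search engines <a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://b.example/": 'robots crawl <a href="http://a.example/">a</a>',
    "http://c.example/": 'index database <a href="http://d.example/">d</a>',
}

LINK_RE = re.compile(r'href="([^"]+)"')   # simplistic link extraction (assumption)
TAG_RE = re.compile(r"<[^>]+>")           # strips tags before keyword extraction

def crawl(seed_urls, fetch, max_pages=100):
    url_list = deque(seed_urls)   # URLs waiting to be visited
    seen = set(seed_urls)         # access trajectory: every URL ever queued
    index = {}                    # keyword -> set of URLs
    visited = 0
    while url_list and visited < max_pages:      # step 4: stop on no new URL or limit
        url = url_list.popleft()                 # step 1: take a URL and read it
        page = fetch(url)
        if page is None:                         # dead link, nothing to index
            continue
        visited += 1
        for word in TAG_RE.sub(" ", page).lower().split():   # step 2: extract keywords
            index.setdefault(word, set()).add(url)
        for link in LINK_RE.findall(page):       # step 3: extract and queue new URLs
            if link not in seen:
                seen.add(link)
                url_list.append(link)
    return index

def search(index, keyword):       # step 5: a minimal retrieval interface
    return sorted(index.get(keyword.lower(), ()))

index = crawl(["http://a.example/"], FAKE_WEB.get)
print(search(index, "robots"))
```

The `seen` set is what keeps the robot from looping when pages link back to each other, as the first two example pages do.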
Search algorithms generally have two basic strategies: depth first and breadth first. The robot determines its search strategy by the way it takes URLs from the URL list. First in, first out produces breadth-first search; when the initial list contains a large number of server addresses, breadth-first search gives good initial results but has difficulty going deep into any one server. Last in, first out produces depth-first search, which can give a better document distribution and makes it easier to discover the structure of documents, that is, to find the maximum number of cross-references. A robot can also use exhaustive traversal, that is, directly vary the 32-bit IP address and search the whole Internet address by address.
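The effect of the URL list discipline can be shown with a small sketch. The link graph LINKS and the function name visit_order are hypothetical; the only point is that taking the oldest URL gives breadth-first order and taking the newest gives depth-first order.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}

def visit_order(start, links, strategy):
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        # Oldest URL first (queue) is breadth-first; newest URL first (stack) is depth-first.
        page = frontier.popleft() if strategy == "breadth" else frontier.pop()
        order.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order

print(visit_order("A", LINKS, "breadth"))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(visit_order("A", LINKS, "depth"))    # ['A', 'C', 'F', 'B', 'E', 'D']
```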
A search engine is a high-tech network application system. It combines network technology, database technology, dynamic indexing technology, retrieval technology, automatic classification technology, machine learning and other artificial intelligence technologies.
2. Indexing technology
Indexing is one of the core technologies of a search engine. A search engine must sort, classify and index the collected information to produce an index database, and the core of a Chinese search engine is word segmentation. Word segmentation splits a sentence into words using certain rules and a vocabulary, in preparation for automatic indexing. At present, most indexing uses the non-clustering method, which depends heavily on linguistic knowledge, as follows:
(1) Store a grammar library and use it together with the vocabulary library to segment the words in a sentence;
(2) Store a vocabulary library, which should also record the usage frequency and common collocations of each word;
(3) Cover a wide vocabulary, which can be divided into specialized libraries to make it easier to process specialized documents;
(4) For sentences that cannot be segmented, treat each character as a word.
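As a rough illustration of vocabulary-based segmentation, the sketch below uses forward maximum matching, a common dictionary method that is assumed here and not named in the text above. The vocabulary VOCAB and the max_word_len parameter are hypothetical. Any character that matches no vocabulary word is kept as a one-character word, which mirrors rule (4).

```python
def segment(sentence, vocabulary, max_word_len=4):
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a vocabulary word matches.
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                # size == 1 is the fallback: an unmatched character stands alone.
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical vocabulary library.
VOCAB = {"搜索", "引擎", "搜索引擎", "技术"}
print(segment("搜索引擎技术好", VOCAB))  # ['搜索引擎', '技术', '好']
```

This sketch ignores the usage frequencies and collocations mentioned in rule (2), which a fuller segmenter would use to resolve ambiguous splits.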
The indexer generates a relational index table from keywords to URLs. The index table generally uses some form of inverted list (inversion list), in which the corresponding URLs are looked up by index item. The index table should also record the positions of index items within each document, so that the retriever can compute adjacency or proximity between index items. The table is stored on disk in a specific data structure.
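A minimal sketch of such an inverted list with positions might look like the following. The sample documents and the function names build_inverted_index and adjacent are made up for illustration, and the structure is kept in memory instead of being stored on disk.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """documents: dict url -> text. Returns term -> {url: [positions]}."""
    index = defaultdict(dict)
    for url, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(url, []).append(position)
    return index

def adjacent(index, first, second):
    """URLs in which `second` appears immediately after `first`."""
    hits = []
    for url, first_positions in index.get(first, {}).items():
        second_positions = set(index.get(second, {}).get(url, []))
        if any(p + 1 in second_positions for p in first_positions):
            hits.append(url)
    return sorted(hits)

# Hypothetical documents.
DOCS = {
    "http://a.example/": "search engine technology",
    "http://b.example/": "engine search results",
}
index = build_inverted_index(DOCS)
print(adjacent(index, "search", "engine"))  # ['http://a.example/']
```

Both documents contain both words, but only the recorded positions reveal that the phrase "search engine" occurs in the first one.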
Different search engine systems may adopt different indexing methods. For example, WebCrawler uses full-text retrieval and indexes every word in a web page; Lycos indexes only selected words such as the page name, the title and the 100 most important annotated words; Infoseek provides concept retrieval and phrase retrieval and supports Boolean operations such as AND, OR, NEAR and NOT. The indexing methods of search engines fall roughly into three categories: automatic indexing, manual indexing and user registration.
3. Retriever and result processing technology
The main function of the retriever is to search the inverted list built by the indexer for the keywords entered by the user, and at the same time to evaluate the relevance between pages and the query, sort the results to be output, and implement some form of user relevance feedback.
A search engine often returns hundreds of results. To surface the useful ones, the common approach is to rank the web pages by importance or relevance. Relevance here refers to the number of times the search keywords appear in the document; the higher the count, the more relevant the document is considered to be. Visibility is another commonly used metric. The visibility of a web page is the number of hyperlinks that point to it from other pages. The visibility method is based on the view that the more a web page is cited by other web pages, the more valuable it is; in particular, the more it is cited by important web pages, the more important it is. Result processing techniques can be summarized as follows:
(1) Sort by frequency. Generally speaking, the more of the keywords a page contains, the more relevant it is to the search target, which makes this a reasonable solution.
(2) Sort by how often pages are visited. In this method the search engine records how frequently each page is visited. Pages that people visit often usually contain more information or have other attractive qualities. This solution suits general search users, and because most search engine users are not professionals, it also suits general-purpose search engines.
(3) Secondary search. The results are further refined: the search results are optimized according to certain conditions, and users can select categories and related words for a second search.
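Two of the ranking ideas discussed here, keyword frequency and visibility (inbound links), can be sketched as follows. The page names, texts and link graph are hypothetical, and the scoring is deliberately naive: a raw occurrence count and a raw inbound-link count, with ties broken alphabetically.

```python
def rank_by_frequency(query, documents):
    """Order pages by how many times the query keywords occur in each one."""
    terms = query.lower().split()
    scores = {}
    for url, text in documents.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:                      # pages with no keyword are not returned
            scores[url] = score
    return sorted(scores, key=lambda u: (-scores[u], u))

def rank_by_visibility(urls, links):
    """Order pages by inbound link count; links maps page -> pages it cites."""
    inbound = {u: 0 for u in urls}
    for source, targets in links.items():
        for t in set(targets):             # count each citing page once
            if t in inbound and t != source:
                inbound[t] += 1
    return sorted(urls, key=lambda u: (-inbound[u], u))

# Hypothetical pages and link graph.
DOCS = {"p1": "search engine search", "p2": "search index", "p3": "robots"}
LINKS = {"p1": ["p2"], "p2": [], "p3": ["p2", "p1"]}

print(rank_by_frequency("search engine", DOCS))        # ['p1', 'p2']
print(rank_by_visibility(["p1", "p2", "p3"], LINKS))   # ['p2', 'p1', 'p3']
```

The sketch does not weight a citation by the importance of the citing page, which the visibility discussion mentions as a refinement.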
Because current search engines are not intelligent, the first result is not necessarily the "best" result unless you know the title of the document you are looking for. So even documents that rank as highly relevant are not necessarily the ones the user needs most.
Industrial applications of search engine technology
Industry applications of search engines generally refer to the various industry-specific search engine products and application modes provided by KW Communication, which take the following forms:
1. * * * industry application
● Track and collect information sources related to business work in real time.
● Fully meet internal staff's need for a global view of Internet information.
● Solve the information source problem of the government extranet and government intranet in time, and realize dynamic publishing.
● Quickly meet the * * * main website's need to obtain information from sub-websites at all levels.
● Comprehensively integrate information to realize cross-regional and cross-departmental sharing and effective communication of the internal information resources of * * *.
● Save the manpower, material resources and time spent on information collection and improve office efficiency.
2. Enterprise industry application
● Monitor and track competitors' activity accurately and in real time; this is a sharp weapon for enterprises to obtain competitive intelligence.
● Obtain competitors' public information in time, and study the development and market demand of the same industry.
● Provide enterprise decision-making departments and management with convenient, multi-channel strategic decision-making tools.
● Greatly improve the efficiency of obtaining and using information and save the costs of collecting, storing and mining information; this is key to improving the core competitiveness of enterprises.
● Improve overall analysis and research ability and rapid market response, and build a competitive intelligence data warehouse with knowledge management at its core; this is the nerve center of the enterprise.
3. News media industry application
● Track and collect information from thousands of online media quickly and accurately, expand news leads and speed up gathering.
● Support effective capture of tens of thousands of news items every day. The depth and breadth of the monitoring range can be configured by the user.
● Support intelligent extraction and review of the required content.
● Integrate the collection, browsing, editing, management and publishing of Internet content.
4. Industry website application
● Track and collect information sources related to the website in real time.
● Track the websites of industry information sources in time and update the website's information automatically, quickly and dynamically.
● Integrate the collection, browsing, editing, management and publishing of Internet content.
● Propose a business management mode for commercial websites, which greatly improves support for the business applications of industry websites.
● For generating the classification catalogue of an information website, propose a user-generated website classification structure. The classification structure can be added to and updated in real time, with no limit on the number of levels, which greatly improves its applicability across industries.
● Provide professional search engine optimization (SEO) services to quickly boost the promotion of industry websites.
● Provide advertising cooperation with the CCDC call search engine, and establish an industry website alliance to raise the profile of industry websites.
5. Monitoring of network information
● Network public opinion systems, such as "KW Communication-network public opinion radar monitoring system".
● Website information and content monitoring systems, such as "KW Communication-website information and content monitoring system (in-station detective)".
With the rapid development of the Internet and the growth of Web information, users looking for information in this ocean of information are searching for a needle in a haystack. Search engine technology solves exactly this problem by providing an information retrieval service for users. Search engine technology is currently an object of research and development in both the computer industry and academia.
Search engines are a technology that has developed gradually since 1995 as Web information increased rapidly. According to the article "Accessibility of Information on the Web", published in Nature in July 1999, there were then an estimated 800 million or more web pages worldwide, with more than 9 TB of effective data, and the volume was still doubling every four months. Looking for information in such a vast ocean is bound to be a futile search for a needle in a haystack. Search engines are the technology that emerged to solve this "getting lost" problem. A search engine uses certain strategies to collect and discover information on the Internet, understands, extracts, organizes and processes that information, and provides retrieval services to users, thereby acting as a guide to information. The navigation service that search engines provide has become a very important network service on the Internet, and search engine sites are also called "web portals". Search engine technology has therefore become an object of research and development in the computer industry and academia. This article briefly introduces the key technologies of search engines in the hope of attracting more attention to them.
Classification
According to how they collect information and provide services, search engine systems can be divided into three categories:
1. Directory search engine: information is collected manually or semi-automatically. After viewing the information, editors write a summary by hand and place it in a predetermined classification framework. Most of the information is website-oriented, and the service offers directory browsing and direct retrieval. Because human intelligence is involved, the information is accurate and the navigation quality is high. The disadvantages are that manual intervention is required, the maintenance effort is large, the amount of information is small and updates are not timely. Representatives of this kind of search engine are Yahoo, LookSmart, Open Directory, Go Guide, etc.
2. Robot search engine: a robot program called a spider automatically collects and discovers information on the Internet according to a certain strategy. An indexer builds an index over the collected information, and a retriever searches the index database according to the user's query and returns the results to the user. The service mode is full-text retrieval of web pages. The advantage of this kind of search engine is that it holds a large amount of information, is updated in time and needs no manual intervention. The disadvantage is that it returns too much irrelevant information, which users must filter out of the results. Representatives of this kind of search engine are AltaVista, Northern Light, Excite, Infoseek, Inktomi, FAST, Lycos and Google; domestic representatives are Skynet, Youyou, OpenFind, etc.
3. Meta-search engine: this kind of search engine has no data of its own. It submits the user's query to several search engines at the same time, removes duplicates from the results they return, reorders them, and returns them to the user as its own results. The service mode is web-oriented full-text retrieval. The advantage of this kind of search engine is that it returns a large amount of information. The disadvantage is that it cannot make full use of the features of the search engines it relies on, and users need to do more screening. Representatives of this kind of search engine are WebCrawler, InfoMarket, etc.
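The meta-search flow (send one query to several engines, remove duplicates, reorder) can be sketched as below. The two engine functions are hypothetical stand-ins that return fixed lists, and reciprocal-rank scoring is an assumed reordering rule chosen for brevity; the text does not say how real meta-search engines reorder results.

```python
def meta_search(query, engines):
    """engines: list of callables, each returning a ranked list of URLs for the query."""
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            # A URL returned by several engines collapses into one entry (deduplication)
            # and accumulates score, so agreement between engines moves it up.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    return sorted(scores, key=lambda u: (-scores[u], u))

# Hypothetical stand-ins for real search engines.
def engine_a(query):
    return ["http://x.example/", "http://y.example/"]

def engine_b(query):
    return ["http://y.example/", "http://z.example/"]

print(meta_search("search engine", [engine_a, engine_b]))
# ['http://y.example/', 'http://x.example/', 'http://z.example/']
```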
Performance indicators
We can regard searching Web information as an information retrieval problem: retrieving, from a document collection made up of web pages, the documents relevant to the user's query. We can therefore use the performance measures of traditional information retrieval systems, recall and precision (accuracy), to measure the performance of search engines.
Recall is the ratio of the number of relevant documents retrieved to the number of all relevant documents in the collection; it measures how complete the retrieval system (search engine) is. Precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved; it measures how exact the retrieval system (search engine) is. A retrieval system cannot have the best of both: when recall is high, precision is low, and when precision is high, recall is low. So the average of the precision values at 11 recall levels (11-point average precision) is often used to measure the precision of a retrieval system. For search engine systems, because no search engine can collect all web pages, recall is hard to calculate, and current search engine systems care mostly about precision.
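The two ratios, and the 11-point average, can be computed as in the sketch below. The document names are hypothetical, and the interpolation rule (take the best precision at any recall at or above each of the levels 0.0, 0.1, ..., 1.0) is the usual convention, assumed here because the text gives no formula. Precision is the measure the text also calls accuracy.

```python
def precision_recall(retrieved, relevant):
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def eleven_point_average_precision(ranked, relevant):
    relevant = set(relevant)
    points = []                      # (recall, precision) after each relevant hit
    hits = 0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / i))
    total = 0.0
    for level in range(11):          # recall levels 0.0, 0.1, ..., 1.0
        r = level / 10
        # Interpolated precision: best precision at any recall >= r, else 0.
        total += max((p for rec, p in points if rec >= r), default=0.0)
    return total / 11

# Hypothetical ranked result list and relevance judgments.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_recall(ranked, relevant))                # (0.5, 1.0)
print(eleven_point_average_precision(ranked, relevant))
```

In this example, retrieving all four documents reaches full recall but only half of what is returned is relevant, which is the trade-off the paragraph above describes.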
Many factors affect the performance of a search engine system. The most important is the information retrieval model, which includes the representation of documents and queries, the matching strategy for evaluating the relevance between documents and user queries, the ranking method for query results, and the mechanism for user relevance feedback.
Main technologies
A search engine consists of four parts: searcher, indexer, retriever and user interface.
1. Searcher
The function of the searcher is to roam the Internet, discovering and collecting information. It is usually a computer program that runs day and night. It should collect new information of all kinds as much and as quickly as possible, and because information on the Internet changes quickly, it should also regularly refresh the old information it has collected to avoid dead links and invalid links. There are currently two strategies for collecting information:
● Start from a set of initial URLs and follow the hyperlinks in them, searching the Internet iteratively in breadth-first, depth-first or heuristic order. The start URLs can be arbitrary, but they are usually very popular sites that contain many links (such as Yahoo!).
● Divide the Web space by domain name, IP address or country domain name, and make each searcher responsible for exhaustively searching one subspace.
Searchers collect many types of information, including HTML, XML, newsgroup articles, FTP files, word processing documents and multimedia information. Searchers are usually implemented with distributed and parallel computing technologies to speed up the discovery and updating of information. Commercial search engines can discover millions of web pages every day.
2. Indexer
The function of the indexer is to understand the information collected by the searcher and to extract index items from it, which are used to represent documents and to generate the index table of the document collection.
There are two kinds of index items: objective index items and content index items. Objective index items have nothing to do with the semantic content of the document, such as the author's name, URL, update time, encoding, length and link popularity. Content index items reflect the content of the document, such as keywords and their weights, phrases and single words. Content index items can be divided into single index items and multiple index items (or phrase index items). In English, a single index item is an English word, which is relatively easy to extract because words are separated by natural delimiters (spaces); for languages written without breaks between words, such as Chinese, word segmentation is necessary. In a search engine, a single index item usually needs to be assigned a weight that indicates how well the item distinguishes the document and that is used to calculate the relevance of query results. The methods used generally come from statistics, information theory and probability theory. The methods for extracting phrase index items come from statistics, probability theory and linguistics.
Index tables generally use some form of inverted list, that is, the corresponding documents are looked up by index item. The index table may also record the positions of index items within the document, so that the retriever can calculate the adjacency or proximity between index items.
The indexer can use a centralized indexing algorithm or a distributed indexing algorithm. When the amount of data is large, indexing must be done in real time (instant indexing), otherwise it cannot keep up with the rapid growth of information. The indexing algorithm has a great impact on the performance of the indexer (such as the response speed under large-scale peak queries). The effectiveness of a search engine depends largely on the quality of its index.
3. Retriever
The function of the retriever is to quickly look up documents in the index database according to the user's query, evaluate the relevance of the documents to the query, sort the results to be output, and implement some form of user relevance feedback.
Four kinds of information retrieval models are commonly used by retrievers: the set theory model, the algebraic model, the probability model and the hybrid model.
4. User interface
The function of the user interface is to accept user queries, display query results and provide a user relevance feedback mechanism. Its main purpose is to make search engines convenient to use and to let users obtain effective, timely information from them efficiently and in multiple ways. The design and implementation of the user interface draw on the theory and methods of human-computer interaction so that it fits human thinking habits.
The user input interface can be divided into simple and complex interfaces.
A simple interface provides only a text box for the user to enter the query string. A complex interface lets users restrict the query, for example with logical operations (AND, OR, NOT; +, -), proximity (adjacent, NEAR), domain name range (such as .edu), position (such as title, content), information time, length and so on. At present, some companies and institutions are considering setting standards for query options.
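The logical operations that a complex interface exposes map directly onto set operations over an inverted list. The sketch below assumes a tiny in-memory index; the parameter names must, any_of and must_not are illustrative and do not correspond to any real engine's query syntax.

```python
def build_index(documents):
    """documents: dict name -> text. Returns term -> set of document names."""
    index = {}
    for name, text in documents.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(name)
    return index

def boolean_search(index, all_docs, must=(), any_of=(), must_not=()):
    """AND over `must`, OR over `any_of`, NOT over `must_not`."""
    result = set(all_docs)
    for term in must:                       # AND: intersect posting sets
        result &= index.get(term, set())
    if any_of:                              # OR: union, then intersect with the rest
        either = set()
        for term in any_of:
            either |= index.get(term, set())
        result &= either
    for term in must_not:                   # NOT: subtract posting sets
        result -= index.get(term, set())
    return sorted(result)

# Hypothetical documents.
DOCS = {"d1": "web search engine", "d2": "web directory", "d3": "search robot engine"}
index = build_index(DOCS)
print(boolean_search(index, DOCS, must=["search"], must_not=["robot"]))  # ['d1']
print(boolean_search(index, DOCS, any_of=["directory", "robot"]))        # ['d2', 'd3']
```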
Future trends
Search engines have become a new field of research and development. Because the field draws on theories and techniques from information retrieval, artificial intelligence, computer networks, distributed processing, databases, data mining, digital libraries and natural language processing, it is comprehensive and challenging. And because search engines have a large number of users and good economic value, they have attracted great attention from computer science and the information industry all over the world. Research and development are currently very active, and there are many noteworthy trends.
1. Great emphasis on improving the precision of query results and the effectiveness of retrieval.
When users query for information, they do not care much about the number of results returned but about whether the results meet their needs. For a single query, a traditional search engine often returns hundreds of thousands or millions of documents, and users must filter the results. There are currently several approaches to the problem of too many query results. The first is to use various methods to capture the real intent that the user did not express in the query, including using intelligent agents to track users' retrieval behavior and analyze user models, and using a relevance feedback mechanism that lets users tell the search engine which documents are relevant to their needs (and how relevant) and which are not, so that results are refined gradually over several interactions. The second is to classify the results with text classification technology and display the classification structure with visualization technology, so that users can browse only the categories they are interested in. The third is to perform site clustering or content clustering to reduce the total amount of information.
2. Information filtering and personalized service based on intelligent agents.
Information intelligent agents are another mechanism for using Internet information. They use automatically acquired domain models (such as Web knowledge, information processing, information resources related to users' interests, and domain organization structure) and user models (such as user background, interests, behavior and style) to collect, index and filter information (including interest filtering and filtering of harmful information), and automatically deliver to users the information that is interesting and useful to them. Intelligent agents can learn continuously and adapt to dynamic changes in information and in user interests, and so provide personalized services. An intelligent agent can run on the client side or on the server side.
3. Use a distributed architecture to improve system scale and performance.
A search engine can be implemented with a centralized architecture or a distributed architecture, and each has its advantages. But when the system reaches a certain scale (for example, when the number of web pages reaches the order of 100 million), some distributed method must be adopted to improve system performance. All components of a search engine except the user interface can be distributed: searchers can cooperate and divide the work across multiple machines to speed up the discovery and updating of information; indexers can distribute the index across different machines to reduce the demands that the index places on any one machine; retrievers can run on different machines.
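One simple way to spread an index over several machines is to assign each document to a shard by hashing its URL and to merge the partial answers at query time. The sketch below simulates the machines with in-process dictionaries; the class name ShardedIndex and the choice of CRC32 as the hash are assumptions made for illustration.

```python
import zlib

class ShardedIndex:
    def __init__(self, shard_count):
        # One dict per "machine"; a real system would place these on separate hosts.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, url):
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        return zlib.crc32(url.encode("utf-8")) % len(self.shards)

    def add(self, url, text):
        shard = self.shards[self._shard_for(url)]
        for term in set(text.lower().split()):
            shard.setdefault(term, set()).add(url)

    def search(self, term):
        # Ask every shard and merge the partial results.
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term.lower(), set())
        return sorted(hits)

idx = ShardedIndex(3)
idx.add("http://a.example/", "search engine")
idx.add("http://b.example/", "engine index")
print(idx.search("engine"))  # ['http://a.example/', 'http://b.example/']
```

Partitioning by document keeps each shard small, at the cost of having to query every shard; the loop in `search` is sequential here, whereas the shards of a real system could be queried in parallel.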
How do I change the search engine used by select-to-search? You can't change it. Soso belongs to Tencent, so it cannot be switched to Baidu.
How can the pictures on a website be found by keyword in search engines? Search engines do not support searching by image content for the time being. You just need to add ALT attributes to your pictures that match your keywords.
How does the search engine realize the search? The calculation is very complicated, and the technologies used by different search engines differ. Generally speaking, every website includes certain feature codes (usually called tags) when its pages are written. The search engine indexes these tags and registers every website in its database, so that relevant results come back when someone searches. If you are interested in the details, just search for them; it is hard to explain clearly here.
What are the websites of some foreign search engines (preferably picture search engines)?
Blog: blogdex.media.mit.edu/
Blogdigger: blogdigger。 /
Blog headlines: Blog-News. Information
BlogStreet: blogstreet。 /
Crayons (create your own newspaper): crayons. /
Fagan Finder (blogs, journals and simple information aggregation): faganfinder. /blog/
Feedster: feedster。 /
NewsIsFree: newsisfree。
Syndic8: syndic8。 /
Technorati: technorati。 /
A search engine is a system that collects information on the Internet according to certain strategies using specific computer programs, organizes and processes the information, and displays the processed information to users in order to provide a retrieval service.
A search engine consists of four parts: searcher, indexer, retriever and user interface. Types of search engine include full-text index, directory index, meta search engine, vertical search engine, aggregate search engine, portal search engine and free link list. Baidu and Google are representative search engines.
Is there a search engine that can search with pictures? At present, Baidu, Google, Soso and Sogou all use keyword search. Searching by image is too cumbersome and the search topic is unclear, so even if it were developed, the number of users would be very small and the returns low. The major developers have presumably considered this already.
How to test the search function of a web search engine? Some suggestions: 1. To test the security of the website's search module, record an example search and then let the attack tool test it automatically. Possible security issues include XSS and blind SQL injection (especially search-type SQL injection). 2. Check whether the non-domain-authenticated login has a verification code. If the tool cannot check this automatically, work out what marks the presence of a verification code, such as the word "verification", and configure a scan rule for it. 3. Check whether login verification uses an SSL-encrypted channel. Testing this depends on whether parameters such as the password and user name are encrypted and whether an encrypted channel is used for transmission after submission.
What is the difference between PC search engines and mobile search engines? They are two different concepts, but many people confuse them.
First of all, Baidu says that the crawler used on mobile is the same as the one used on PC, yet the keyword rankings of some websites display differently on the two. The key is whether the site has a mobile interface suited to phones. Some sites rely on adaptive layout only. Don't worry: as mobile rankings are rationalized, websites without a mobile version will fall back.
In terms of user experience, the mobile version needs to be simpler than the PC version because it displays less information.