WO2012025040A1

WO2012025040A1 - Visualized search engine system and implementation method and application thereof

Info

Publication number: WO2012025040A1
Application number: PCT/CN2011/078725
Authority: WO
Inventors: 黄斌
Original assignee: Huang Bin
Priority date: 2010-08-27
Filing date: 2011-08-22
Publication date: 2012-03-01

Abstract

Disclosed are a visualized search engine system and an implementation method and application thereof. In the visualized search engine system, a web crawler apparatus implements a preview function for search result pages cost-effectively and efficiently; a display control apparatus then concurrently displays the text abstract and the thumbnail of each Internet search result on the display page, helping users accurately identify their desired content; lastly, semantic analysis is used to enable precise search, allowing the search engine to accurately provide the user with the most desirable information from the database. Also disclosed is a method for network shopping navigation based on the visualized search engine system.

Description

Visual search engine system and its implementation method and application

The invention relates to a visual search engine system for displaying Internet search results in an illustrated manner, and relates to a method for realizing display control of Internet search results by the visual search engine system and its application in network shopping navigation, and belongs to the field of Internet search technology.

Background technique

The Internet has become one of the main sources of information for people. Search engines play an irreplaceable role in helping users quickly get the information they need from the inexhaustible data on the Internet. A search engine is a system that collects information on the Internet according to a certain strategy using specific computer programs, organizes and processes that information, and then displays it to the user, thereby providing the user with a retrieval service. Instead of actually searching the web pages of the Internet, it searches a pre-organized web index database.

A vertical search engine provides valuable information and related services for a specific field, a specific group of people or a specific need, and is a subdivision and extension of the general search engine. It integrates a certain type of specialized information in the web index database, extracts the required data field by field, processes it, and then returns it to the user in some form. The vertical search engine is characterized by being "specialized, precise and deep" and has an industry flavor. Compared with the massive, disordered information of the general search engine, the vertical search engine is more focused, specific and in-depth.

Vertical search engines generally include the following technologies: 1. Search engine crawler: used to crawl related web pages on the Internet; 2. Web page structured information extraction technology or metadata collection technology: used to extract structured information from web pages; 3. Word segmentation and indexing: used to store and index data; 4. Data presentation: since the stored data is not simple web page data, its display needs to be designed according to industry needs.

In terms of data display, current mainstream search engines such as Google, Baidu and Bing use different layouts for their search result pages, but they have more similarities than differences. For example, they all display search results in a text-only manner: the page title is displayed for each search result, followed by a summary describing the page. This layout can present more search results on one page, but because only the text summary of the web page is displayed, a user who clicks on a search result based on its text summary often finds that the page is far from the one he wants. The user can then only go back and click on another search result, which makes for a poor user experience. To address this, Google launched a visual preview of search results in 2010, allowing users to preview each page directly as a thumbnail in the search results list. The user sees a magnifying glass icon to the right of each search result and can click on it to see a thumbnail preview of the page. Users can also scroll down to see previews of all search results. However, the hardware and software costs of achieving this effect are enormous.

There are other technical means of implementing the page preview function, such as using a CGI program to capture the image area of the browser, or using the drawing function of the browser to generate the image. However, the prior art offers no solution that uses a web crawler device to implement the page preview function. In operation, an existing web crawler device generally only analyzes the web page file and extracts its content. Some web crawlers go a step further and perform simple processing on this content, such as semantic annotation, which makes it easier for search engines to classify and sort. However, these web crawlers generally have no page rendering function, so the search result page preview function cannot be conveniently implemented.

In terms of word segmentation and indexing, an existing search engine searches its own database system based on the keywords input by the user and feeds the retrieval results back to the user. The biggest problem in this process is that the user does not know what keywords to enter in order to accurately express the information being sought. The search service provider has to analyze and judge the information input by the user and provide search results according to that judgment. As a result, the judgment of the search service provider often fails to match the needs of the user.

With the continuous development of web search technology, the concept of intelligent search has emerged. So-called intelligent search uses a word segmentation dictionary, a synonym dictionary and a homonym dictionary to improve the retrieval effect. Further, it can assist the query at the knowledge level or the concept level: through a topic dictionary, a hypernym and hyponym dictionary, and a related peer-term dictionary, it forms a knowledge system or concept network and gives the user intelligent knowledge prompts, which ultimately helps the user get the best retrieval results. For example, when querying "computer", information related to synonyms of "computer" can also be retrieved; the scope of the query can be further narrowed to "microcomputer" or "server", expanded to "information technology", or shifted to related categories such as "electronic technology", "software" and "computer application". In addition, some existing search engines provide so-called "association" functions, which perform statistical analysis on previous user selections and, based on that analysis, offer the most likely results for users to choose from. But this does not actually solve the problem of web search accuracy: statistical regularities hold across a large population, but they mean little for a particular search by a particular user.

Vertical search engines have many applications, such as enterprise search, supply and demand information search, shopping search, property search, talent search, map search, mp3 search, image search, and so on. Taking the shopping search engine as an example, the overall workflow is as follows: after the web pages are crawled, the product information in them (product name, price, introduction, etc.) is extracted; the information is then cleaned, deduplicated, classified, analyzed, compared and mined; finally, user search is provided through a word segmentation index, and market reports are provided through analysis and mining.

However, existing search-engine-based online shopping navigation technology has a common shortcoming: the web page jumps break up the overall experience from searching to viewing to purchasing, and the user often cannot find the original purchase path at the end and can only search again with the search engine, which wastes time and effort.

Summary of the invention

A first technical problem to be solved by the present invention is to provide a visual search engine system. The visual search engine system can display Internet search results in an illustrated manner.

A second technical problem to be solved by the present invention is to provide a method for the visual search engine system to implement display control of Internet search results.

A third technical problem to be solved by the present invention is to provide an online shopping navigation method implemented on the basis of the above-described visual search engine system.

In order to achieve the above object of the invention, the present invention adopts the following technical solutions:

A visual search engine system comprising a web crawler device, a display control device and a semantic analysis device, characterized in that:

The web crawler device further includes a plurality of information collectors, a page analyzer, a URL filter, a page filter, a URL manager, a picture generator, a URL library, and a page library; wherein

The information collector is located at the bottom layer of the web crawler device and directly interacts with the Internet to obtain web pages; the page analyzer is connected with the information collector and, on the one hand, parses the URLs carrying link marks out of the page content and forwards them to the URL filter for processing; on the other hand, it parses the page content into a text format and submits it to the page filter for processing;

After the URL filter filters the URLs and restricts the site scope and the theme, the URLs are stored in the URL library; after the page filter performs redundancy detection on the page content, the detected page is stored in the page library;

The picture generator is connected to the URL library, and generates a picture corresponding to the page for each URL stored in the URL library;

The display control device further includes:

a text search result display unit for displaying text search results in a list manner;

a graphic search result display unit, configured to display a webpage thumbnail corresponding to the text search result;

a focus tracking unit for capturing text focus and/or graphic focus of the user's attention;

a focus webpage thumbnail display unit, configured to display a graphic focus corresponding to a text focus selected by the user;

a synchronous display control unit, configured to synchronously display the text focus and the graphic focus in the display page, and to realize their synchronous and coordinated change by means of a bidirectional circular linked list with a head pointer;

The text search result display unit is located at a left middle position of the entire display page, the focus webpage thumbnail display unit is located at a central area of the entire display page, and the graphic search result display units are respectively located at the upper right corner and the lower right corner of the focus webpage thumbnail display unit;

The semantic analysis device further includes:

an input word segmentation unit, configured to accept a target information descriptor input by the user, and to perform a word segmentation operation on the target information descriptor;

a semantic determining unit, configured to determine whether the target information descriptor has complete semantics;

a reference vocabulary unit, configured to provide a vocabulary associated with the target information descriptor to the user if the target information descriptor does not have complete semantics;

a secondary input unit, configured to receive a secondary input from the user, thereby determining the semantics of the target information descriptor, and to perform subsequent retrieval according to those semantics.

A method for implementing display control of Internet search results by the above visual search engine system, comprising a page rendering step, a display control step and a semantic analysis step, wherein:

The page rendering step includes the following sub-steps:

(1) Generate a start tag of the web page;

(2) rendering the content in the page template, wherein each time a tag is entered, the life cycle stages of the tag are sequentially invoked;

(3) rendering the body in the web page;

(4) Generate an end tag of the web page;

(5) Clearing the data;

The display control step includes the following sub-steps:

(6) In the display page, the text search result is vertically arranged in parallel with the corresponding webpage thumbnail, and the central part of the display page is a focus display area for displaying the graphic focus corresponding to the text focus selected by the user;

(7) the text focus and the graphic focus are synchronously displayed in the display page, and their synchronous and coordinated change is implemented by a bidirectional circular linked list with a head pointer, wherein the head pointer is used to determine the position of the text focus;

The semantic analysis step includes the following sub-steps:

(8) accepting a target information descriptor input by the user, and performing a word segmentation operation on the target information descriptor;

(9) determining whether the target information descriptor has complete semantics;

(10) If yes, perform subsequent retrieval directly; if not, provide the user with a vocabulary associated with the target information descriptor;

(11) The user performs a secondary input to determine the semantics of the target information descriptor, and performs subsequent retrieval based on the semantics.

A web crawler with page rendering function, characterized in that:

The web crawler includes a plurality of information collectors, a page analyzer, a URL filter, a page filter, a URL manager, a picture generator, a URL library, and a page library; wherein

The picture generator is connected to the URL library, and generates a picture corresponding to the page for each URL stored in the URL library.

The information collector starts from an information source, issues a request through the HTTP protocol, and downloads a web page; the page analyzer analyzes the page and extracts the links, and the information collector then accesses the network in an iterative manner.

The information collector searches the web page by using a graph traversal algorithm.

The URL filter uses the semantic information of the extended metadata to perform topic correlation prediction on the URL extracted from the Web page, and performs pruning processing according to the principle of collecting related links and discarding irrelevant links.

The URL manager on the one hand obtains a URL list from the URL library, arranges the tasks, and assigns them to a plurality of information collectors; on the other hand, it obtains new URL lists from the plurality of information collectors and saves these lists to the URL library.

A method for implementing a page rendering function by a web crawler device, comprising the steps of:

(1) Generate a start tag of the web page;

(2) rendering the content in the page template, wherein each time a tag is entered, the life cycle stages of the tag are sequentially invoked;

(3) rendering the body in the web page;

(4) Generate an end tag of the web page;

(5) Clear the data.

A method for implementing a page rendering function by a web crawler device, comprising the steps of: when an image tag is found to reference a picture, a request is sent to the server; meanwhile, rendering of the subsequent code continues, and once the server returns the picture file, this part of the code is re-rendered.

When a <script> tag containing JavaScript code is found, the statements are executed and the affected part of the code is re-rendered, and the picture is then generated from the rendering result.

A display control device for displaying search results in an illustrated manner, comprising:

a text search result display unit, configured to display text search results in a list manner;

a graphic search result display unit, configured to display a webpage thumbnail corresponding to the text search result;

a focus tracking unit, configured to capture a text focus and/or a graphic focus of the user's attention;

The text search result display unit is located at a left middle position of the entire display page, the focus webpage thumbnail display unit is located at a central area of the entire display page, and the graphic search result display units are respectively located at the upper right corner and the lower right corner of the focus webpage thumbnail display unit.

In the display page of the display control device, the text search result is vertically arranged in parallel with the corresponding web page thumbnail.

The position of the head pointer in the bidirectional circular linked list corresponds to the position of the text focus in the text search result.

A display control method for displaying search results in an illustrated manner, characterized in that:

In the display page, the text search results are arranged vertically in parallel with the corresponding webpage thumbnails, and the central part of the display page is a focus display area for displaying the graphic focus corresponding to the text focus selected by the user; the text focus and the graphic focus are synchronously displayed in the display page, and their synchronous and coordinated change is realized by a bidirectional circular linked list with a head pointer, wherein the head pointer is used to determine the position of the text focus.
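As an illustrative sketch only, the bidirectional circular linked list with a head pointer described above could be modelled in Python as follows. The class names Node and FocusRing, the move and focus methods, and the sample titles and thumbnail file names are assumptions introduced here for illustration; they are not taken from the disclosure. Each node carries both the text result and its thumbnail, so moving the head pointer changes the text focus and the graphic focus together.

```python
class Node:
    """One search result: a text summary and its web page thumbnail."""

    def __init__(self, title, thumbnail):
        self.title = title
        self.thumbnail = thumbnail
        self.prev = self  # a single node is linked to itself in both directions
        self.next = self


class FocusRing:
    """Circular doubly linked list; `head` marks the current text focus."""

    def __init__(self, results):
        self.head = None
        for title, thumbnail in results:
            node = Node(title, thumbnail)
            if self.head is None:
                self.head = node
            else:
                # Append the new node just before head, i.e. at the tail.
                tail = self.head.prev
                tail.next = node
                node.prev = tail
                node.next = self.head
                self.head.prev = node

    def move(self, steps):
        """Move the focus forward (positive) or backward (negative), wrapping around."""
        for _ in range(abs(steps)):
            self.head = self.head.next if steps > 0 else self.head.prev
        return self.focus()

    def focus(self):
        """Text focus and graphic focus always come from the same node."""
        return self.head.title, self.head.thumbnail


ring = FocusRing([("A", "a.png"), ("B", "b.png"), ("C", "c.png")])
```

Because the list is circular, moving backward from the first result lands on the last one without any boundary check, which is one plausible reason for choosing this structure.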

A method for realizing accurate search by using semantic analysis, which is characterized by the following steps:

(1) accepting a target information description word input by the user, and performing a word segmentation operation on the target information description word;

(2) determining whether the target information descriptor has complete semantics;

(3) If yes, perform subsequent retrieval directly; if not, provide the user with a vocabulary associated with the target information descriptor;

(4) The user performs a secondary input to determine the semantics of the target information descriptor, and performs subsequent retrieval based on the semantics.
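Steps (1) to (4) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the AMBIGUOUS dictionary, the whitespace-based segment function, the rule used to decide "complete semantics" (a single word listed in the dictionary is treated as incomplete), and the choose callback standing in for the user's secondary input are all hypothetical and not part of the disclosure.

```python
# Hypothetical vocabulary: an ambiguous word maps to candidate refinements.
AMBIGUOUS = {
    "apple": ["apple fruit", "apple phone", "apple computer"],
}


def segment(query):
    """Step (1): whitespace splitting stands in for real word segmentation."""
    return query.lower().split()


def has_complete_semantics(words):
    """Step (2): assumed rule, a lone ambiguous word is semantically incomplete."""
    return not (len(words) == 1 and words[0] in AMBIGUOUS)


def related_vocabulary(words):
    """Step (3), 'no' branch: vocabulary associated with the descriptor."""
    return AMBIGUOUS.get(words[0], [])


def resolve_query(query, choose):
    """Return the query string that subsequent retrieval should use."""
    words = segment(query)
    if has_complete_semantics(words):
        return query                      # step (3), 'yes' branch: retrieve directly
    options = related_vocabulary(words)
    return choose(options)                # step (4): secondary input by the user
```

Here choose is any function that picks one entry from the offered vocabulary; in an interactive system it would be replaced by the user's selection.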

A network shopping navigation method, implemented based on a visual search engine system including a web crawler device and a display control device, wherein the web crawler device is configured to capture and generate webpage thumbnails, characterized in that: when the visual search engine system is used for network shopping navigation, first, according to the shopping object keyword input by the user, the display control device displays the text search results for the shopping object on the left side of the search result page, displays the webpage thumbnails corresponding to the text search results in the upper right corner and the lower right corner of the search result page, and displays a thumbnail of the focus webpage of the shopping object currently selected by the user in the central area of the search result page;

A selection column is set in the search result page, and the user puts the selected search result into the selection column for comparison, and then enters the webpage where the shopping object is located from the selection column to purchase.

Preferably, in the selection column, a webpage ID is set for each target webpage for the shopping object, and the webpage ID is transit managed.

Preferably, the selection column temporarily saves the webpage ID, and the item to be purchased is added or discarded by the operation of the webpage ID.

Preferably, a favorites folder is further set in the search result page; the favorites folder is open to registered users and stores the webpage IDs selected by a registered user for a long time.

Preferably, when comparing, the thumbnails of the shopping objects captured and generated by the web crawler are grouped together for selection by the user.

Preferably, after the user determines the shopping object to be purchased, the user enters the online shop page where the shopping object is located to make a purchase, and the online shop page is displayed in a virtual floating manner.
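The selection column described above can be pictured as a small temporary store keyed by webpage ID. The following Python sketch is only one possible reading: the SelectionColumn class, its method names and the example IDs and URLs are assumptions made for illustration and do not come from the disclosure.

```python
class SelectionColumn:
    """Temporary holding area for candidate shopping objects, keyed by webpage ID."""

    def __init__(self):
        self._pages = {}  # webpage ID -> URL of the page where the object is sold

    def add(self, page_id, url):
        """Put a selected search result into the column."""
        self._pages[page_id] = url

    def discard(self, page_id):
        """Drop a candidate; discarding an unknown ID is silently ignored."""
        self._pages.pop(page_id, None)

    def candidates(self):
        """IDs currently held, in sorted order, for side-by-side comparison."""
        return sorted(self._pages)

    def open(self, page_id):
        """Return the URL to enter when the user decides to purchase."""
        return self._pages[page_id]


column = SelectionColumn()
column.add("p2", "http://shop-b.example/item/2")
column.add("p1", "http://shop-a.example/item/1")
column.discard("p2")
```

Keeping only IDs and URLs in the column means that adding or discarding an item never touches the search results themselves, so the user can return to the comparison at any time.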

The visual search engine system and the implementation method thereof provided by the invention use the web crawler device to render the web page directly and save the rendering result directly in an image format, thereby laying a technical foundation for realizing the page preview function at low cost and with high efficiency; display the text summary and web page thumbnail of each Internet search result together in the display page, so that users can accurately identify the content they need; and use semantic analysis to achieve accurate search, so that the search engine can accurately provide the user with the most desired information from the database. When the above-mentioned visual search engine system is used as a vertical search engine serving online shopping, the "search", "view" and "comparison" stages of the online shopping process are integrated into the visual search engine system, thereby forming a complete online shopping navigation process and effectively improving the user's online shopping experience.

DRAWINGS

The present invention will be further described in detail below in conjunction with the drawings and specific embodiments.

FIG. 1 is a schematic diagram of the overall architecture of the visual search engine system provided by the present invention;

FIG. 2 is a schematic diagram of the overall composition of the web crawler device in the visual search engine system;

FIG. 3 is a schematic flowchart of the web crawler device implementing the basic functions of a web crawler;

FIG. 4 is a schematic flowchart of the web crawler device implementing the page rendering function;

FIG. 5 is a schematic diagram of a display page of the display control device in the visual search engine system;

FIG. 6 is a schematic diagram of the bidirectional circular linked list with a head pointer used to implement the synchronous display control unit;

FIG. 7 is a schematic diagram showing the correspondence between the initial state of a page and the bidirectional circular linked list;

FIG. 8 is a schematic diagram showing the correspondence between an intermediate state of a page and the bidirectional circular linked list;

FIG. 9 is a flow chart of the method for realizing accurate search by using semantic analysis in the present invention;

FIG. 10 is a diagram showing an example of a homepage of the visual search engine system used as a network shopping search engine;

FIG. 11 is a diagram showing an example of a search result page when a "search" is performed according to information input by the user;

FIG. 12 is a diagram showing an example of a display page in which the user puts preliminarily selected shopping objects together to "view" and "compare" them;

FIG. 13 is a diagram showing an example of the user entering the online shop where the shopping object is located after determining, by comparison, the shopping object to be purchased.

Detailed description

The visual search engine system provided by the present invention mainly solves technical problems in three aspects. The first is to realize the preview function of the search result page at low cost and with high efficiency. The second is to display the text summary and web page thumbnail of each Internet search result together in the display page, so that the user can accurately identify the content that he needs. The third is to use semantic analysis to achieve accurate search, so that the search engine can accurately provide the user with the most desired information from the database. Detailed explanations are provided below.

As shown in FIG. 1 , the visual search engine system can implement multiple service functions such as web page collection, web page sorting and indexing, page rendering, and query service. These service functions are mainly realized through the cooperation of the web crawler device, the display control device and the semantic analysis device, and the specific description is as follows:

As shown in Figure 2, the web crawler in the visual search engine system is mainly composed of the following parts:

1. Information collector

Each information collector is a web spider located at the bottom layer of the web crawler. It is the interface through which the web crawler interacts directly with massive Internet information (such as forums, blogs, WAP sites, documents, audio and video materials, etc.). The role of the information collector is to obtain web pages. It usually starts from an information source (such as a user query, a URL list, or a certain page), issues a request through the HTTP protocol, and downloads a web page; the page is analyzed and its links are extracted, and the information collector then accesses the Internet iteratively. In a specific embodiment of the invention, the information collector preferably searches web pages using a graph traversal algorithm, such as a breadth-first or depth-first strategy.

In order to ensure high-speed acquisition of information from Web pages, the web crawler applies multi-threading to each information collector on top of the parallel mechanism. In general, each information collector can start hundreds of threads simultaneously for page information collection. The URL manager manages the queue of URLs to be collected by means of interleaved access and allocates collection tasks to each information collector. This ensures that at most one thread of the same information collector is connected to the same web server at a time, thereby effectively preventing the web server from becoming congested or even going down because of a sudden increase in accesses.
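The breadth-first strategy mentioned above can be sketched in Python as follows. This is an illustration under assumptions, not the embodiment itself: the in-memory LINK_GRAPH, the fetch_links stand-in and the example URLs are hypothetical, and a real information collector would download each page over HTTP and extract its links instead of looking them up in a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
LINK_GRAPH = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": [],
}


def fetch_links(url):
    """Stand-in for 'download the page and extract its links'."""
    return LINK_GRAPH.get(url, [])


def crawl_breadth_first(seeds, max_pages=100):
    """Visit pages level by level, never visiting the same URL twice."""
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()          # oldest discovered URL first
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:       # skip URLs that are already queued or visited
                seen.add(link)
                queue.append(link)
    return order


visited = crawl_breadth_first(["http://a.example/"])
```

Replacing popleft with pop would turn the same loop into a depth-first traversal, the other strategy named above.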

2. Link (URL) filter

The URL library stores all URLs extracted from the collected pages. In order to avoid the "topic drift" problem in page collection, these URLs must undergo topic relevance prediction before entering the URL library. We use the semantic information of extended metadata (i.e., HTML tags such as Anchor) to perform topic relevance prediction on the URLs extracted from the collected pages, and then prune them according to the principle of collecting related links and discarding irrelevant links. This reduces the number of unrelated pages collected by the system, thereby saving considerable system operating cost and effectively improving the speed and efficiency of topic information search. The link filter stores the links (URLs) predicted to point to topic-related pages in the URL library; the URL manager then distributes them as to-be-collected URLs to the information collectors, which collect the web pages the URLs point to.
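A minimal Python sketch of such pruning is given below. It assumes, purely for illustration, that relevance is predicted as the fraction of anchor-text words found in a topic vocabulary and that links scoring below a threshold are discarded; the function names, the vocabulary, the threshold value and the example links are hypothetical, and the disclosure does not specify the prediction formula.

```python
import re


def anchor_relevance(anchor_text, topic_terms):
    """Fraction of anchor-text words that belong to the topic vocabulary."""
    words = re.findall(r"[a-z0-9]+", anchor_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in topic_terms)
    return hits / len(words)


def prune_links(links, topic_terms, threshold=0.3):
    """Keep URLs whose anchor text looks on-topic; discard the rest."""
    return [
        url
        for url, anchor in links
        if anchor_relevance(anchor, topic_terms) >= threshold
    ]


TOPIC = {"camera", "lens", "digital"}
LINKS = [
    ("http://shop.example/cam", "Digital camera deals"),
    ("http://news.example/", "Weather forecast today"),
]
kept = prune_links(LINKS, TOPIC)
```

Only the URLs returned by prune_links would be written to the URL library, so off-topic branches are never crawled.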

3. Page filter

In order to further improve the precision of the system, it is necessary to make a topic relevance judgment on the collected pages, that is, page filtering. This is essentially a process of text topic classification: the precision of the system is improved by removing pages of low relevance (below a set threshold). According to full information theory, the "state of motion of a thing and the way it changes" as perceived by a cognitive subject includes its form, its meaning and its utility to the cognitive subject, which are called the syntactic information, semantic information and pragmatic information of the thing, respectively; the three together are called "full information". Natural language text exhibits synonymy (several words with one meaning) and polysemy (one word with several meanings), and Web text is a special carrier of natural language. Therefore, when judging whether a text is related to the collection theme of the system, we should care not only about the syntactic information of the text but also about its semantic accuracy. The page filter of this web crawler is based on this idea: it absorbs the traditional vector space model and uses a concept-based vector space method to filter page content, mapping vocabulary to the concept level so that the relevance of the text is analyzed at the level of the concepts that the words express, that is, at the semantic level.
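The idea of a concept-based vector space can be illustrated with the following Python sketch. It is an assumption-laden toy: the CONCEPTS dictionary, the cosine similarity measure and the 0.5 threshold are chosen here for illustration, whereas the disclosure only states that vocabulary is mapped to the concept level and that pages below a set threshold are removed.

```python
import math
import re
from collections import Counter

# Hypothetical word-to-concept dictionary: synonyms share one concept.
CONCEPTS = {
    "laptop": "computer", "notebook": "computer", "pc": "computer",
    "price": "cost", "cost": "cost", "cheap": "cost",
}


def concept_vector(text):
    """Count concepts rather than surface words; unknown words are ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_relevant(page_text, topic_text, threshold=0.5):
    """Keep a page only if its concept vector is close enough to the topic's."""
    return cosine(concept_vector(page_text), concept_vector(topic_text)) >= threshold
```

In this toy, "Cheap notebook on sale" matches the topic "laptop price" even though the two texts share no surface word, because both map to the concepts "computer" and "cost".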

4. Page Analyzer

The main function of the page analyzer is to parse the content of the captured page. Its work can be divided into two parts: one part parses out the URLs carrying link marks and hands them to the URL filter to extract the links; the other part parses the page content into a text format and hands it to the page filter for processing.
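The two outputs of the page analyzer can be illustrated with Python's standard html.parser module. The sketch below is only an assumption about how such a split might look: the PageAnalyzer class, the analyze helper and the sample page are hypothetical, and only <a href> links are collected.

```python
from html.parser import HTMLParser


class PageAnalyzer(HTMLParser):
    """Splits a page into (a) link URLs and (b) plain text."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self.text_parts = []
        self._skip_depth = 0  # text inside <script>/<style> is not page content

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)       # goes to the URL filter
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())  # goes to the page filter


def analyze(html):
    """Return (list of link URLs, page text) for one HTML document."""
    analyzer = PageAnalyzer()
    analyzer.feed(html)
    analyzer.close()
    return analyzer.urls, " ".join(analyzer.text_parts)


PAGE = '<html><body><p>Hello</p><a href="/next">Next page</a><script>var x=1;</script></body></html>'
urls, text = analyze(PAGE)
```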

5. URL Manager

The main function of the URL manager is to manage URL tasks. On the one hand, the URL manager obtains a list of URLs from the URL library, arranges them, and assigns them to multiple information collectors. On the other hand, the URL manager obtains new lists of URLs from the multiple information collectors and saves these lists to the URL library according to a certain strategy.
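One way to picture the arrangement and assignment of URL tasks is the Python sketch below. It is a simplified assumption: the queue is reordered so that consecutive URLs target different hosts where possible, and the reordered queue is then dealt out to the collectors in round-robin order. The function names and example URLs are hypothetical, and the actual scheduling strategy of the URL manager is not specified in this form by the disclosure.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def interleave_by_host(urls):
    """Reorder URLs so that consecutive entries target different hosts where possible."""
    per_host = OrderedDict()
    for url in urls:
        host = urlsplit(url).netloc
        per_host.setdefault(host, deque()).append(url)
    ordered = []
    while per_host:
        # Take one URL from each host in turn; drop hosts that run empty.
        for host in list(per_host):
            ordered.append(per_host[host].popleft())
            if not per_host[host]:
                del per_host[host]
    return ordered


def assign_to_collectors(urls, collector_count):
    """Deal the interleaved queue out to the collectors in round-robin order."""
    batches = [[] for _ in range(collector_count)]
    for index, url in enumerate(interleave_by_host(urls)):
        batches[index % collector_count].append(url)
    return batches


PENDING = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://a.example/3",
]
```

Spreading requests to the same host apart in the queue is a common way to avoid bursts against a single web server.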

As shown in FIG. 3, when the above-mentioned web crawler implements the basic functions of a web crawler, the URL manager starts the information collector to begin collecting web pages and stores the collected web pages. Each page is then analyzed by the page analyzer to obtain two parts: the link marks and the page content. The link marks are passed to the URL filter for parsing, and the page content is sent to the page filter; after the page filter performs redundancy detection on the content, the page is stored in the page library. The URLs are sent to the URL library after the URL filter has filtered them and restricted the site scope and theme. Thereafter, the picture generator connected to the URL library starts working and generates a picture corresponding to the page for each URL stored in the URL library. A detailed description is given below.

First, the user enters a URL to make a request to the server, and the server returns a web page in HTML format. The page parser starts to load the HTML source code. If it finds a <link> tag inside the <head> tag that references an external CSS file, a request for the CSS file is issued and the server returns the CSS file. The page parser then continues to load the code in the <body> section of the HTML and starts rendering the page.

As shown in Figure 4, the specific steps of the web crawler to implement the page rendering function are as follows:

1. Rendering preparation phase

Used for pre-rendering operations, such as initializing some data;

2. Generate a start tag

Used to generate the start tag of an HTML file;

3. Render the template

This step is mainly used to render the content in the template. At this stage there are usually multiple tags to be rendered. Each time a tag is entered, the life cycle stages of that tag are called in turn. That is, rendering proceeds recursively from the upper tag into the lower tag, and only after the lower tag has been rendered does the calling component continue with the subsequent phases.

4. Rendering body

Similar to rendering a template, this step also renders the content of a template. For example, for the tag <a href="page link">this is body</a>, the body is the text "this is body".

5. Generate end tag

This step is generally used to generate an end tag or to control the execution flow of the inline tag.

6. Clear data

The remaining phases are not often used; they exist mainly to ensure the integrity of the life cycle.
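The six phases above can be pictured as one recursive render call per tag. The following Python sketch is only an illustration of that life cycle: the Tag class, its fields and the render_page helper are assumptions introduced here, and the output is plain markup text rather than the picture that the web crawler ultimately produces.

```python
class Tag:
    """Hypothetical component whose render method walks the six phases in order."""

    def __init__(self, name, body="", children=()):
        self.name = name
        self.body = body
        self.children = list(children)

    def render(self, out):
        state = {"name": self.name}              # 1. preparation: initialise data
        out.append("<" + state["name"] + ">")    # 2. generate the start tag
        for child in self.children:              # 3. render the template: each lower
            child.render(out)                    #    tag finishes before we continue
        if self.body:                            # 4. render the body text
            out.append(self.body)
        out.append("</" + state["name"] + ">")   # 5. generate the end tag
        state.clear()                            # 6. clear data


def render_page(root):
    """Render a tag tree to a single string."""
    out = []
    root.render(out)
    return "".join(out)


page = Tag("html", children=[Tag("body", children=[Tag("a", body="this is body")])])
```

Because each child is rendered inside its parent's phase 3, the parent's end tag is emitted only after every lower tag has completed its own life cycle.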

It should be noted that when an <img> tag is found to reference an image, a request is made to the server. The renderer does not wait until the image is downloaded but continues to render the subsequent code. When the server returns the image file, the image occupies a certain area and therefore affects the layout of the following paragraphs, so this part of the code has to be re-rendered. When a <script> tag containing JavaScript code is found, the statements are executed and the part of the page code affected by the JavaScript is re-rendered. The picture generator then generates the picture from the rendering result.

The present invention has been described above by taking a Web page in the html format as an example. However, the web crawler having the page rendering function provided by the present invention is not limited to processing pages in the html format, and web pages in other formats can be directly processed.

By using the web crawler device in the visual search engine system, a search by web page address lets the user not only understand the basic content of the page but, more importantly, see its basic display effect and thereby learn more about the content of the page as a whole.

In the present invention, the Internet search results to be displayed comprise two types of data, namely text search result data and the corresponding web page thumbnail data, rather than a single type of text or graphic data. To display as many search results as possible on one display page while controlling both types of data effectively and reflecting the relationship between them, the display control device in the visual search engine system adopts the display position setting scheme shown in the figure: the text display area and the graphic display area are arranged vertically in parallel, and a focus display area is set in the central part of the display page. The selected text focus is paired with the corresponding graphic focus and placed on the same horizontal line. This scheme takes into account that text is read from left to right: to match people's reading habits, related content (i.e., the corresponding text focus and graphic focus) must be shown on the same horizontal line from left to right.

As shown in FIG. 5, the display control device in the visual search engine system simultaneously displays a text summary of each search result (i.e., a text search result) and the corresponding web page thumbnail in the display page. To achieve a better display effect, the display control device includes at least three display function units: a text search result display unit, a focus web page thumbnail display unit, and a graphic search result display unit. The text search result display unit is located at the middle left of the entire display page, and the focus web page thumbnail display unit is located in the central area of the entire display page. There may be several graphic search result display units, located at the upper right and lower right corners of the focus web page thumbnail display unit (other locations are also possible).

In the text search result display unit, the text search results of the web search can be displayed as a list. For example, in the embodiment shown in Fig. 5, text search result 1 to text search result 5 are displayed as a list. When the display control device is used with a computer display, the user can click on a text search result with the mouse, for example selecting text search result 3 as the result of interest, and then click the corresponding link. To prevent the user from clicking a text search result only to find that the page that opens is far from the desired one, the display control device associates the display content of the text search result display unit with that of the focus web page thumbnail display unit and the graphic search result display units. The web page corresponding to the text search result selected by the user in the text search result display unit (i.e., the text focus) is associated with the focus web page thumbnail display unit, and the web pages corresponding to the other text search results are associated with the graphic search result display units. In other words, the thumbnail of the web page corresponding to the selected text search result is always displayed in the focus web page thumbnail display unit, while the graphic search result display units display the thumbnails of the web pages corresponding to the other, unselected text search results. The focus web page thumbnail display unit and the graphic search result display units can implement web page thumbnail display using techniques such as a web crawler.

In the present display control device, the display area occupied by the focus web page thumbnail display unit is large and is always located in the central area of the display page. In this way, the thumbnail of the web page corresponding to the text search result selected by the user is displayed clearly and comprehensively, which makes it easy for the user to decide whether to click further. By contrast, in the visual preview function for search results provided by Google, the web page thumbnail is displayed only on the right side of the search result and its display area is small, so it is difficult for the user to make out the specific content of the thumbnail and inconvenient to decide whether to click further.

In the following, a specific method by which the display control device displays Internet search results is described further. Once the text focus and/or graphic focus have been positioned effectively, the more important task is to control the text stream, the image stream, and the display focus synchronously, so that a single control adjustment applied to any one of them keeps the related data streams effectively synchronized.

In the display control device, the display content of the focus web page thumbnail display unit can change. When the user's focus of attention (indicated by the position of the mouse operated by the user) changes, the text focus and the corresponding graphic focus change accordingly, which achieves effective focus switching.

To this end, the display control device is provided with a focus tracking unit for capturing the text focus and/or graphic focus of the user's attention, and a synchronous display control unit that lets the user freely adjust the display content of the focus web page thumbnail display unit and makes the text focus and the graphic focus appear synchronously in the display page, so that they change in a synchronized and coordinated way.

The synchronous display control unit can control the three kinds of data (text stream, picture stream, and display focus) synchronously by using a bidirectional circular linked list with a head pointer L, as shown in the figure. The list is circular, so it can be traversed repeatedly. When the user clicks on any of the text stream, the picture stream, or the display focus, the control linked list is adjusted accordingly; this adjustment drives the adjustment of the other data streams, so that the whole stays synchronized. The list must be bidirectional: the display focus can be moved through the text stream from top to bottom or from bottom to top, and likewise through the picture stream. Finally, the list has a head pointer L, which determines the focus position. When the position of the head pointer L changes, the entire focus display content changes, and with it the displayed text focus and graphic focus.

Figure 7 is a schematic diagram of the correspondence between the initial state of the page and the bidirectional circular linked list. In the initial state, both the text focus and the graphic focus are at the top of the display page, and the head pointer L of the corresponding list is at the leftmost position. Fig. 8 is a schematic diagram of the correspondence between an intermediate state of the page and the list. Here both the text focus and the graphic focus are in the middle of the display page, and the head pointer L is likewise in the middle position. The bidirectional circular linked list acts as a controller that drives both the text stream and the graphics stream: it supplies the parameters that tell each stream which item should be in the focus display state or needs to change. For example, the number of nodes in the list equals the number of pairs of search results to be displayed, and each node corresponds to one pair of search results (a text search result and the corresponding web page thumbnail). Whichever node the pointer L points to, the focus is on that pair of search results: the web page thumbnail of the pair is displayed by the focus web page thumbnail display unit, and the corresponding text search result occupies the focus position in the text search result display unit. The display positions of the other search results are adjusted accordingly. As the text focus and the graphic focus change, the position of the head pointer L changes as well, but the relationship between the text stream and the picture stream is fixed by the list, which achieves synchronous display.
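A minimal sketch of such a controller is given below, assuming that each node simply stores one text result and one thumbnail reference. The names `Node`, `FocusRing`, `move`, `focus_on`, and `view` are illustrative choices, not terms from the original text.

```python
class Node:
    """One pair of search results: a text result and its web page thumbnail."""

    def __init__(self, text, thumbnail):
        self.text = text
        self.thumbnail = thumbnail
        self.prev = self   # a single node is its own neighbour in a circular list
        self.next = self


class FocusRing:
    """Bidirectional circular linked list whose head pointer marks the focus."""

    def __init__(self, pairs):
        nodes = [Node(text, thumb) for text, thumb in pairs]
        if not nodes:
            raise ValueError("at least one pair of search results is required")
        for i, node in enumerate(nodes):
            node.next = nodes[(i + 1) % len(nodes)]
            node.prev = nodes[(i - 1) % len(nodes)]
        self.head = nodes[0]   # initial state: focus on the first pair

    def move(self, steps):
        """Move the head pointer forward (steps > 0) or backward (steps < 0)."""
        for _ in range(abs(steps)):
            self.head = self.head.next if steps > 0 else self.head.prev

    def focus_on(self, text):
        """Point the head at the node whose text result the user selected."""
        node = self.head
        while node.text != text:
            node = node.next
            if node is self.head:
                raise KeyError(text)
        self.head = node

    def view(self):
        """Return the text focus and graphic focus, which always come from one node."""
        return self.head.text, self.head.thumbnail


ring = FocusRing([("result 1", "thumb 1"), ("result 2", "thumb 2"), ("result 3", "thumb 3")])
ring.move(-1)              # moving backward from the first pair wraps to the last one
print(ring.view())         # ('result 3', 'thumb 3')
```

Because both the text focus and the graphic focus are read from the node under the head pointer, moving that one pointer is enough to keep the two streams in step.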

In addition, the present invention replaces the "input, word segmentation, retrieval" processing flow used by existing search engines with an "input, word segmentation, semantic judgment, retrieval" flow. That is, after the word segmentation operation, a semantic judgment is made as to whether the input words carry definite semantics. If so, the subsequent retrieval is performed directly; if not, the user is provided with vocabulary associated with the input words. The user then performs a second input (a selection from the associated vocabulary), which determines the true semantics of the user's input and yields accurate web search results based on those semantics.

As shown in Figure 9, the network search method in most cases requires two input operations. After the user inputs the most important target information descriptor (the first input operation), the related vocabulary is retrieved and presented to the user, who selects from it (the second input operation). This clarifies the specific search target and enables the search engine to provide the user with the most desired information from the database. Specifically, the web search method adds a "meta vocabulary association database" to the search engine and encourages the user to input the word that best represents the search target, that is, the most important description of the information he needs. After the search engine accepts the words input by the user and performs maximum word segmentation, it judges whether the input has complete semantics. If so, it directly performs the subsequent search. If not, the search engine performs association analysis on the input words in the "meta vocabulary association database" and, based on the results, offers the user a multiple-choice list, so that the user can describe the target information more precisely through a further selection. This list covers the information related to the first input word, so the user's real purpose becomes clear to the search engine, which can then provide the required search results quickly and cost-effectively.
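The two-input flow can be sketched as follows. Everything concrete here is an assumption made for illustration: the tiny `ONTOLOGIES`, `BEHAVIOURS`, and `ASSOCIATED` tables stand in for the meta vocabulary association database, whitespace splitting stands in for maximum word segmentation, and "complete semantics" is approximated as "contains at least one ontology word and one behavior word".

```python
# Hypothetical stand-ins for the meta vocabulary association database.
ONTOLOGIES = {"car", "ICBC"}
BEHAVIOURS = {"repair", "address"}
ASSOCIATED = {
    "car": ["model", "brand", "repair"],
    "ICBC": ["address", "business hours"],
}


def has_complete_semantics(words):
    """Simplified judgment: an ontology word and a behavior word are both present."""
    has_ontology = any(word in ONTOLOGIES for word in words)
    has_behaviour = any(word in BEHAVIOURS for word in words)
    return has_ontology and has_behaviour


def first_input(query):
    """Segment the query, then either search directly or offer associated words."""
    words = query.split()  # placeholder for maximum word segmentation
    if has_complete_semantics(words):
        return "search", words
    choices = [term for word in words for term in ASSOCIATED.get(word, [])]
    return "choose", choices


def second_input(query, choice):
    """Combine the first input with the word the user selected from the choices."""
    return "search", query.split() + [choice]


print(first_input("car repair"))        # ('search', ['car', 'repair'])
print(first_input("ICBC"))              # ('choose', ['address', 'business hours'])
print(second_input("ICBC", "address"))  # ('search', ['ICBC', 'address'])
```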

It should be noted that the network search method does not simply split the existing one-step search into an optional two-step search. Rather, it abandons both the usual one-step search and multi-step or indefinite-step query methods. This is based on the following two findings:

1. For the "meta vocabulary", a relatively well-accepted basic set is on the order of one million words. In a one-step search, a million-word "meta vocabulary" is not enough to express the latest vocabulary developments. With a two-step search, however, it is theoretically possible to express a lexical space on the order of one million squared, that is, on the order of a trillion. This should be sufficient to express all possible metadata in the existing information space. Moreover, the number of association libraries can be limited or refined according to practical requirements, which reduces the computational overhead of the search engine and thus lowers costs.

2. Implementing this web search method requires semantic analysis, because forming semantics requires two parts, the so-called "ontology" and "behavior". Only when both parts are present and associated with each other (including the case where both parts are "ontologies") can a meaningful semantics be formed. Therefore, the two-step search method first determines whether the user's input forms complete semantics. If it does not, determining the "ontology" is taken as the first step, and the target of the second step is to determine the "behavior". In this way, a two-step search fully constitutes a valid "semantic search" and provides users with information that suits their needs.

In the network search method, the "meta vocabulary association database" is not represented by the one-dimensional data of a simple relational database but by a multi-dimensional associated vocabulary matrix. Specifically, each meta word is uniformly encoded, and the code is used as its representation. The coding scheme can be implemented with various existing technologies, such as XML, N3, or triples, and is not described in detail here. Once the codes of the vocabulary are determined, association analysis between words generates the associated information of each word: for a meta word S, S {ci, dj} stores the associated vocabulary, where ci is the first-layer classification and dj is the second-layer classification. Here ci and dj can be chosen according to the designer's needs, for example classifying ci by subject of knowledge and dj by people's needs. Through this two-layer classification, each word yields a linked vocabulary. For example, for the word "car", the first-layer classification of its associated vocabulary can be expressed as (model, brand, manufacturer, seller, repair, performance, picture), and the second-layer classification under "model" includes (small car, truck, truck, etc.). Its complete representation is:

S {
model: car, truck, truck
…
}

When a user uses a search engine that adopts this web search method and enters the word "car" as the first input, the search engine looks up S = "car", finds the associated vocabulary matrix of "car", and presents the text information expressed by ci, that is, the associated vocabulary "model, brand, manufacturer, seller, repair, performance, picture". When the user makes the second input and selects, for example, "model", the search engine retrieves all texts containing "car, truck, truck" and "model", and thus retrieves the content the user actually needs.
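One possible in-memory form of S {ci, dj} is a nested mapping from a meta word to its first-layer classes and from each class to its second-layer words. The sketch below is an assumption about representation only; `ASSOCIATION_MATRIX`, `first_layer`, and `search_terms` are hypothetical names, and second-layer entries that the text does not give are left empty.

```python
# Meta word -> first-layer class ci -> second-layer words dj.
# Only the "model" row is given in the text; the other rows are left empty here.
ASSOCIATION_MATRIX = {
    "car": {
        "model": ["small car", "truck"],
        "brand": [],
        "manufacturer": [],
        "seller": [],
        "repair": [],
        "performance": [],
        "picture": [],
    },
}


def first_layer(word):
    """Return the first-layer classes offered to the user after the first input."""
    return list(ASSOCIATION_MATRIX.get(word, {}))


def search_terms(word, chosen_class):
    """Return the chosen class plus its second-layer words for the final retrieval."""
    second_layer = ASSOCIATION_MATRIX.get(word, {}).get(chosen_class, [])
    return [chosen_class] + second_layer


print(first_layer("car"))
# ['model', 'brand', 'manufacturer', 'seller', 'repair', 'performance', 'picture']
print(search_terms("car", "model"))  # ['model', 'small car', 'truck']
```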

In another embodiment of the present invention, a user at a computer in his home (for example on College Road, Haidian District, Beijing) accesses the search engine over the Internet in order to find the ICBC branch nearest to his home. He therefore enters "ICBC" in the search box. A conventional search engine immediately displays the text information on the Internet that contains the term "ICBC", ordered by its own ranking method. The user must pick, from at least dozens of result pages, a web page that may contain the information he needs, click the link to open it, and then find the information within that page. With a search engine using this web search method, when the user inputs "ICBC", the engine first applies a maximum word segmentation algorithm (such as forward maximum matching, reverse maximum matching, or probability-maximizing segmentation) and determines that the input is a single word. Based on the principle of complete semantics, it judges that the input is not a complete semantic expression but a single word, and understands that the information the user needs is related to ICBC. (This understanding assumes that the user's input is valid rather than unintended, so it can readily be derived from a single word.) Having understood the purpose of the query, the engine can list fairly completely all the vocabulary related to "ICBC" in its meta vocabulary association database, ordered by relevance to the word "ICBC", for example "ICBC outlets", "ICBC business hours", and "ICBC online banking". Alternatively, only some of the most relevant words can be listed, such as "stock", "institutional profile", "latest news", "address", "service process", and "corporate culture".

Faced with these related words, the user can choose "address". Then, based on the user's IP address, the search engine can determine that the information the user is looking for is in fact "the address of the ICBC branch closest to the location of the IP address". The search engine can thus search its own database precisely and display the web page containing "College Road ICBC address" directly to the user, who quickly gets the information he really needs.
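Of the segmentation algorithms named above, forward maximum matching is the simplest to illustrate. The sketch below is a generic version of that well-known algorithm, not code from the patent; the function name, the toy dictionary, and the unspaced English input are assumptions chosen so the example is easy to follow.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text left to right, always taking the longest dictionary word.

    Characters that start no dictionary word are emitted one at a time.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


toy_dictionary = {"icbc", "add", "address"}
print(forward_max_match("icbcaddress", toy_dictionary))  # ['icbc', 'address']
```

Because the longest candidate is tried first, "address" wins over the shorter dictionary entry "add". Reverse maximum matching applies the same idea starting from the end of the text.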

When the above visual search engine system is used as a vertical search engine serving online shopping (hereinafter referred to as the map purchase search), the home page of the online shopping navigation website it provides is as shown in the figure. On the home page of the map purchase search, the search box is located at the top of the page; to its left is the eye-catching "Picture" logo, and below it is a series of commonly used search shortcuts, including "Latest", "Recommended", "Cosmetics", "Group purchase", "Comprehensive shopping", "Shopping discount", "Digital home appliances", "Female fashion", "Mother and baby children", "Clothing apparel", and so on. Below the search box and search shortcuts is a selection of themes consisting of a series of web page thumbnails. These thumbnails are all crawled and generated by the web crawler in the visual search engine system.

The usage flow of the map purchase search provided by the present invention fully respects the habits of ordinary users and includes the steps of searching, viewing, comparing and screening (this step can be omitted), and entering the online shop page where the shopping object is located. These steps are very similar to those of other shopping search engines. With existing shopping search engines, however, the "view" and "compare" operations often require leaving the website of the shopping search engine, which is inconvenient, and users often cannot find the original search entry again after a complicated series of page jumps. To solve this problem, the map purchase search integrates the "search", "view", and "compare" steps of online shopping into the visual search engine system through the cooperation of the web crawler device and the display control device. This forms a complete online shopping navigation process that greatly improves the user's shopping experience.

FIG. 11 shows an example of the search result page when a "search" is performed based on user input. In the example, the user enters the shopping object keyword "car" in the search box. The text search results related to the shopping object "car" are then displayed on the left side of the display page, and the web page thumbnails corresponding to the text search results are displayed in the upper right and lower right corners of the page. The focus web page thumbnail of the shopping object currently selected by the user is located in the central area of the display page. The basic frame of the display page shown in Fig. 11 is determined by the display control device in the visual search engine system and is therefore very similar to the display page shown in Fig. 5.

Since the display page shown in FIG. 11 clearly displays the thumbnail of the web page where a shopping object is located, the user can complete the "view" operation without clicking through to that web page. Because the "view" operation is performed entirely inside the map purchase search, the user's operation is greatly simplified. In addition, the user can see the information about the object to be purchased and its price within the map purchase search and can select shopping objects by selecting web pages. Shopping objects are thus selected at the search engine level, which makes the online shopping navigation role of the map purchase search more prominent.

To facilitate selection and comparison when using the map purchase search, a selection bar and favorites are provided in the search result page shown in the figure. In the selection bar, a web page ID is set for each target web page of a shopping object, and the web page ID is transit-managed during the processing of the target shopping object. The selection bar can use a cookie and a back-end temporary selection store to hold the temporarily saved web page IDs, and items to be purchased are added or discarded by operating on the web page IDs.

When a user shops online with the map purchase search, the search results are numerous and must first be placed in the selection bar for "comparison"; the user then goes from the selection bar to the web page where the target shopping object is located. The selection bar is open to any user, whereas the favorites are open only to registered users. The web page IDs stored in the selection bar are saved only temporarily: when the user has not used the map purchase search for a certain period of time, the corresponding selection bar is cleared automatically. The favorites available to registered users can save the selected web page IDs for a long time, so that they can be recalled at any time in the future.
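The temporary nature of the selection bar can be sketched as a small store of web page IDs that empties itself after a period of inactivity. This is an illustration only: the class name `SelectionBar`, the 30-minute default, and the injectable `clock` parameter are assumptions, and the cookie and back-end storage mentioned in the text are replaced by an in-memory list.

```python
import time


class SelectionBar:
    """Temporary store of web page IDs that is cleared after a period of inactivity."""

    def __init__(self, ttl_seconds=1800.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds   # hypothetical inactivity limit
        self.clock = clock               # injectable so the expiry can be demonstrated
        self.page_ids = []
        self.last_used = clock()

    def _touch(self):
        """Clear the bar if it has been idle too long, then record this use."""
        now = self.clock()
        if now - self.last_used > self.ttl_seconds:
            self.page_ids.clear()
        self.last_used = now

    def add(self, page_id):
        """Add an item to be compared, identified by its web page ID."""
        self._touch()
        if page_id not in self.page_ids:
            self.page_ids.append(page_id)

    def discard(self, page_id):
        """Drop an item the user no longer wants to compare."""
        self._touch()
        if page_id in self.page_ids:
            self.page_ids.remove(page_id)

    def items(self):
        """Return the web page IDs currently held for comparison."""
        self._touch()
        return list(self.page_ids)


fake_now = [0.0]
bar = SelectionBar(ttl_seconds=10.0, clock=lambda: fake_now[0])
bar.add("page-1")
bar.add("page-2")
bar.discard("page-1")
print(bar.items())   # ['page-2']
fake_now[0] = 100.0  # simulate a long period without use
print(bar.items())   # []
```

Favorites for registered users would differ mainly in being keyed by account and persisted without the inactivity check.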

A notable feature of the map purchase search is that the user's search results must first be placed in the selection bar for "comparison" before the user goes from the selection bar to the web page where the target shopping object is located. Figure 12 is an example of a display page in which the user has put the initially selected shopping objects together for "comparison". The thumbnails of the shopping objects displayed during the "comparison" are again crawled and generated by the web crawler. Because the web crawler in the map purchase search has a strong web page thumbnail crawling capability, thumbnails of shopping objects can be displayed freely inside the map purchase search, so that users can gather them together for "comparison". During the "comparison", the user never leaves the platform provided by the map purchase search, which avoids the repeated page jumps that existing shopping search engines require for "comparison" and greatly simplifies the user's operation.

Since the map purchase search serves as an online shopping portal, it only provides online shopping navigation and does not sell any merchandise itself. Therefore, after comparing and deciding on the object to be purchased, the user needs to follow the link provided by the map purchase search to the online shop page where the shopping object is located in order to make the purchase. Figure 13 shows an example of a user entering the shop page where a shopping object is located. In this operation, the online shop page is displayed in a virtual floating manner and the target direction is controlled, so that the transition takes place directly within the search engine results. The user does not feel that he has left the map purchase search, which further improves the shopping experience.

The visual search engine system and its implementation method and application provided by the present invention have been described in detail above. Any obvious change made to the present invention without departing from the essential spirit of the invention will constitute an infringement of the patent right of the present invention and will incur the corresponding legal liability.

Claims

The information collector is located at the bottom layer of the web crawler device and interacts directly with the Internet to obtain web pages; the page analyzer is connected to the information collector and, on the one hand, parses the URLs marked by links out of the page content and forwards them to the URL filter, and, on the other hand, parses the page content into text format and submits it to the page filter for processing;

After the URL filter filters the URLs by site scope and topic, the URLs are stored in the URL library; after the page filter performs redundancy detection on the page content, the checked pages are stored in the page library;

The picture generator is connected to the URL library, and generates a picture corresponding to the page for the URL stored in the URL library;

The display control device further includes:

The semantic analysis device further includes:

2. A method for implementing display control of Internet search results by a visual search engine system according to claim 1, comprising a page rendering step, a display control step, and a semantic analysis step, wherein:

The page rendering step includes the following sub-steps: (1) generate a start tag of the web page;

(3) rendering the body in the web page;

(4) Generate an end tag of the web page;

(5) Clear the data;

The display control step includes the following sub-steps:

3. A web crawler with page rendering function, characterized by:

The picture generator is connected to the URL library, and generates a picture corresponding to the page for the URL stored in the URL library.

4. The web crawler apparatus according to claim 3, wherein:

The information collector, starting from the information source, requests web pages for download via the HTTP protocol; the page analyzer analyzes the pages and extracts links, and the information collector then accesses the network in an iterative manner.

5. The web crawler apparatus according to claim 3 or 4, wherein:

6. The web crawler apparatus according to claim 3, wherein: The URL filter uses the semantic information of the extended metadata to perform topic correlation prediction on the URL extracted from the web page, and performs pruning processing according to the principle of collecting related links and discarding irrelevant links.

7. The web crawler apparatus according to claim 3, wherein:

The URL manager, on the one hand, obtains a list of URLs from the URL library, arranges them into tasks, and assigns the tasks to a plurality of information collectors; on the other hand, it obtains new URL lists from the plurality of information collectors and saves these lists in the URL library.

8. A method for implementing a page rendering function by a web crawler device according to claim 3, comprising the steps of:

(1) Generate a start tag of the web page;

(3) rendering the body in the web page;

(4) Generate an end tag of the web page;

(5) Clear the data.

9. The method for implementing a page rendering function by a web crawler device according to claim 8, wherein: in the step (2), calling each life-cycle stage of a tag means recursing from the upper-level tag into the lower-level tags; only after the lower-level tags have been rendered does the calling component continue with the subsequent phases.

10. The method for implementing a page rendering function by a web crawler device according to claim 8, wherein: in the step (4), the operation of generating an end tag is replaced by an operation of controlling an inline tag execution flow.

11. A method of implementing a page rendering function by a web crawler device according to claim 8, further comprising the steps of:

When a picture tag is found to reference a picture, a request is made to the server; at this point, the subsequent code is rendered, the server returns the file of the picture, and the code is re-rendered.

12. The method for implementing a page rendering function by a web crawler device according to claim 11, wherein: when a <script> tag containing JavaScript code is found, the statements are executed, the affected part of the code is re-rendered, and a picture is then generated from the rendering result.

13. A display control device for displaying search results in an image and text manner, comprising: a text search result display unit, configured to display a text search result in a list manner;

The text search result display unit is located at the middle left of the entire display page, the focus web page thumbnail display unit is located in the central area of the entire display page, and the graphic search result display units are located at the upper right corner and the lower right corner of the focus web page thumbnail display unit, respectively.

14. The display control device for displaying search results in a graphic form according to claim 13, wherein:

15. The display control device for displaying search results in a graphic form according to claim 13, wherein:

16. A display control method for displaying search results in an illustrated manner, wherein: in a display page, text search results and the corresponding web page thumbnails are arranged vertically in parallel, and the central portion of the display page is a focus display area used to display the graphic focus corresponding to the text focus selected by the user; the text focus and the graphic focus are displayed synchronously in the display page, and their synchronized change is implemented by a bidirectional circular linked list with a head pointer, wherein the head pointer is used to determine the position of the text focus.

17. The display control method for displaying search results in an illustrated manner according to claim 16, wherein:

18. A method for implementing an accurate search using semantic analysis, comprising the steps of:

(3) If yes, perform subsequent retrieval directly; if not, provide the user with a vocabulary associated with the target information descriptor;

19. The method for implementing an accurate search by using semantic analysis according to claim 18, wherein: in the step (1), the word segmentation operation adopts a maximization word segmentation algorithm.

20. The method of claim 18, wherein: in the step (2), when the target information descriptor contains both an "ontology" and a "behavior", and the "ontology" is associated with the "behavior", the target information descriptor is considered to have complete semantics.

21. The method for implementing an accurate search by using semantic analysis according to claim 18, wherein: in the step (2), if the target information descriptor does not have complete semantics, the "ontology" in the target information descriptor is determined first.

22. The method for implementing an accurate search by using semantic analysis according to claim 21, wherein: in the step (4), the "behavior" corresponding to the target information descriptor is determined by the user's second input.

23. The method for implementing an accurate search using semantic analysis according to claim 18, wherein: in the step (3), the vocabulary associated with the target information descriptor is stored in a meta-vocabulary association database.

24. The method for realizing accurate search by using semantic analysis according to claim 23, wherein: in the meta-vocabulary association database, for a meta-vocabulary S, S{ci, dj} is used to store the associated vocabulary, where ci is classified as the first layer and dj is classified as the second layer.

25. The method for implementing an accurate search by using semantic analysis according to claim 18, wherein: in the step (4), the user performs the secondary input by selecting from the vocabulary associated with the target information descriptor.
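
The completeness check and the associated-vocabulary lookup described in claims 18 to 25 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `MetaEntry`, `ASSOCIATIONS`, `BEHAVIORS`, `analyze` and `next_action` are hypothetical, the sample vocabulary is invented, and the sketch assumes that the second layer dj of a meta-vocabulary holds the behaviors associated with it, which the claims do not state.

```python
from dataclasses import dataclass, field


@dataclass
class MetaEntry:
    """S{ci, dj}: associated vocabulary of a meta-vocabulary S, in two layers."""
    first_layer: list = field(default_factory=list)   # c_i
    second_layer: list = field(default_factory=list)  # d_j (assumed: behaviors)


# Hypothetical meta-vocabulary association database with invented sample data.
ASSOCIATIONS = {
    "camera": MetaEntry(
        first_layer=["digital camera", "film camera"],
        second_layer=["buy", "repair", "review"],
    ),
}
BEHAVIORS = {"buy", "repair", "review"}


def analyze(tokens):
    """Find the "ontology" and "behavior" and test whether they are associated."""
    ontology = next((t for t in tokens if t in ASSOCIATIONS), None)
    behavior = next((t for t in tokens if t in BEHAVIORS), None)
    complete = (
        ontology is not None
        and behavior is not None
        and behavior in ASSOCIATIONS[ontology].second_layer
    )
    return ontology, behavior, complete


def next_action(tokens):
    """Search directly on complete semantics, otherwise offer associated vocabulary."""
    ontology, behavior, complete = analyze(tokens)
    if complete:
        return ("search", f"{ontology} {behavior}")
    if ontology is None:
        return ("suggest", [])  # nothing to associate with
    entry = ASSOCIATIONS[ontology]
    return ("suggest", entry.first_layer + entry.second_layer)
```

For example, `next_action(["camera", "repair"])` goes straight to retrieval, while `next_action(["camera"])` returns the associated vocabulary from which the user would make the secondary input.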

26. A method of network shopping navigation, implemented based on a visual search engine system comprising a web crawler device and a display control device, wherein the web crawler device is configured to crawl webpages and generate webpage thumbnails, wherein: when the visual search engine system is used for network shopping navigation, first, according to the shopping object keyword input by the user, the display control device displays the text search results for the shopping object on the left side of the search result page, displays thumbnails of the webpages corresponding to the text search results in the upper right corner and the lower right corner of the search result page, and displays, in a central area of the search result page, a thumbnail of the focus webpage of the shopping object currently selected by the user;

27. The method of network shopping navigation according to claim 26, wherein:

In the selection column, a webpage ID is set for each target webpage of the shopping object, and transit management is performed on the webpage IDs.

28. The method of network shopping navigation according to claim 27, wherein:

The selection column temporarily saves the webpage IDs, and items to be purchased are added or discarded by operating on the webpage IDs.

29. The method of network shopping navigation according to claim 26, wherein:

A favorites folder is also provided in the search result page; the favorites folder is open to registered users and stores the webpage IDs selected by a registered user on a long-term basis.
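
The webpage ID handling of claims 27 to 29 can be illustrated with a short sketch. The class and method names (`SelectionColumn`, `Favorites`, `add`, `join`, `discard`, `save`) and the example URLs are hypothetical, and the sketch keeps everything in memory, whereas long-term storage in a real system would presumably use a persistent store.

```python
import itertools


class SelectionColumn:
    """Temporary holding area that manages webpage IDs in transit (illustrative)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pages = {}        # webpage ID -> target webpage URL
        self.pending = set()   # IDs saved temporarily, not yet decided
        self.to_purchase = []  # IDs the user chose to buy from

    def add(self, url):
        """Assign a webpage ID to a target webpage and hold it temporarily."""
        page_id = next(self._ids)
        self.pages[page_id] = url
        self.pending.add(page_id)
        return page_id

    def join(self, page_id):
        """Move a pending webpage ID into the items to be purchased."""
        self.pending.remove(page_id)
        self.to_purchase.append(page_id)

    def discard(self, page_id):
        """Drop a pending webpage ID altogether."""
        self.pending.remove(page_id)
        del self.pages[page_id]


class Favorites:
    """Longer-lived store of webpage IDs, open to registered users only."""

    def __init__(self, registered_users):
        self._saved = {user: set() for user in registered_users}

    def save(self, user, page_id):
        if user not in self._saved:
            raise PermissionError("favorites are open to registered users only")
        self._saved[user].add(page_id)

    def saved(self, user):
        return sorted(self._saved.get(user, ()))
```

Operating on small integer IDs rather than on full webpage records keeps the add and discard operations cheap, which is one plausible reason for managing IDs instead of pages.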

30. The method of network shopping navigation according to claim 26, wherein:

During comparison, the thumbnails of the shopping objects captured and generated by the web crawler device are grouped together for selection by the user.

31. The method of network shopping navigation according to claim 26, wherein:

After determining the shopping object to be purchased, the user enters the online shop page where the shopping object is located through the link, and the online shop page is displayed in a virtual floating manner.