JP4962967B2 - Web page search server and query recommendation method - Google Patents


Publication number
JP4962967B2
Authority
JP
Japan
Prior art keywords
query
search
word
web page
correlation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2008004844A
Other languages
Japanese (ja)
Other versions
JP2009169541A (en)
Inventor
竜己 小林
Original Assignee
ヤフー株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ヤフー株式会社 filed Critical ヤフー株式会社
Priority to JP2008004844A priority Critical patent/JP4962967B2/en
Publication of JP2009169541A publication Critical patent/JP2009169541A/en
Application granted granted Critical
Publication of JP4962967B2 publication Critical patent/JP4962967B2/en
Application status is Active legal-status Critical
Anticipated expiration legal-status Critical

Description

  The present invention relates to a Web page search server and a query recommendation method.

  With recent advances in computer technology, such as database technology and the network technology embodied in the Internet, users can access a wide variety of document information (hereinafter simply referred to as “documents”) stored in databases distributed across a network without being aware of where those databases are located. However, a user rarely needs all of these documents, and must select the documents that contain useful information from among a very large number of them.

  Currently, many search engines with a query search function (keyword search function) are provided as systems that support the work of selecting useful documents from the large volume of documents distributed on the Internet. In a query search, documents in which the keyword given by the user as a query appears are extracted from the set of available documents and presented. The user therefore needs to give the system a query suited to retrieving documents that contain the desired information. However, coming up with an appropriate query for obtaining the target document is a burden on the user, so a system that recommends queries to the user is useful.

As a conventional technique that supports a user's query search by recommending queries, the following has been introduced (for example, Patent Document 1). When a document or document group has been found by a past search using a query, the words appearing in that document or document group are assumed to be related to the query. Related queries are extracted using an evaluation method based on the appearance frequency of the many words appearing in the document or document group, and the related queries extracted in this way are recommended when a user asks for queries related to that query.
Patent Document 1: JP 2000-112968 A

  However, because the above method analyzes the entire document, the processing load is high, and depending on the document, noise is to be expected; that is, words unrelated to the query are extracted.

  Therefore, an object of the present invention is to provide a technique that reduces these problems by extracting related queries from the title and summary sentence of a document and recommending them.

  Noting the strong relationship between the query used in a search and the display information (title and summary text) of the Web page that the user selects in that search, the present inventor devised a mechanism that presents recommended queries by obtaining the degree of correlation between the two, and thereby completed the present invention. Specifically, the present invention provides the following.

  (1) A Web page search server in a Web page search system that searches for Web pages in response to an input query, the server comprising: word extraction means for extracting the words contained in that part of the information on the Web page selected by a user, from among a plurality of Web page candidates presented after the search is executed, which was displayed for the purpose of the selection on the screen of the terminal used by the user; correlation degree recording means for recording the degree of correlation between the query input to perform the search and the words extracted by the word extraction means; recommended query extraction means for extracting, when a query is input, a recommended query based on the degree of correlation recorded by the correlation degree recording means; and recommended query transmission means for transmitting the recommended query extracted by the recommended query extraction means to the terminal used by the user.

  With this configuration, the degree of correlation with the query is determined from the information the user actually referred to when selecting the Web page, so a practically meaningful degree of correlation is obtained. It also alleviates the noise and the increased computational load that would be expected if the degree of correlation were computed against the words contained in the content of the Web page itself.

  (2) The Web page search server according to (1), wherein the degree of correlation recorded by the correlation degree recording means is the number of selections, made in searches performed by inputting the query, that led to the word being extracted by the word extraction means.

  With this configuration, when the same word appears more than once in the information displayed on the user's terminal for the selected Web page, the degree of correlation is obtained without being affected by the repetition, and is therefore not influenced by the habits of whoever created the display information.

  (3) The Web page search server wherein the degree of correlation recorded by the correlation degree recording means is the number of times the word was extracted by the word extraction means in searches performed by inputting the query.

  With this configuration, when the same word appears more than once in the information displayed on the user's terminal for the selected Web page, the degree of correlation reflects the number of occurrences. The correlation with important words that are displayed repeatedly therefore increases, which better matches how the user actually decides what to select.

  (4) The Web page search server according to any one of (1) to (3), further comprising: advertisement related word recording means for recording a code specifying a banner advertisement in association with one or more words; determination means for determining whether the input query matches a word recorded by the advertisement related word recording means in association with a code specifying a banner advertisement; and banner advertisement transmission means for transmitting, according to the determination result of the determination means, the banner advertisement specified by the code to the terminal used by the user so that it is displayed on the Web page relating to the search performed by inputting the query.

  With this configuration, a banner advertisement related to the query input for the search can be displayed together with the Web page relating to the search, that is, on the screen that displays the search results, so the effectiveness of the advertisement can be raised.

  (5) The Web page search server according to any one of (1) to (3), further comprising: advertisement related word recording means for recording a code specifying a banner advertisement in association with one or more words; query related word recording means for recording the query in association with one or more words; determination means for determining whether a word recorded by the query related word recording means in association with the input query matches a word recorded by the advertisement related word recording means in association with the code specifying the banner advertisement; and banner advertisement transmission means for transmitting, according to the determination result of the determination means, the banner advertisement specified by the code to the terminal used by the user so that it is displayed on the Web page relating to the search performed by inputting the query.

  With this configuration, the banner advertisement to be displayed can be determined from words related to the query input for the search, so the banner advertisement can be chosen flexibly on the basis of conceptually related words. This makes it possible to respond flexibly even when the usage of words changes over time.

  (6) A query recommendation method in a Web page search system that searches for Web pages in response to an input query, the method comprising: a word extraction step in which a computer extracts the words contained in that part of the information on the Web page selected by a user, from among a plurality of Web page candidates presented after the search is executed, which was displayed for the purpose of the selection on the screen of the terminal used by the user; a correlation degree recording step of recording the degree of correlation between the query input to perform the search and the extracted words; a recommended query extraction step of extracting, when a query is input, a recommended query based on the recorded degree of correlation; and a recommended query transmission step of transmitting the extracted recommended query to the terminal used by the user.

  According to such a configuration of the present invention, since the invention described in (1) is realized using a computer, the same effects as in (1) can be achieved.

  According to the present invention, recommended queries are extracted from the information displayed on the screen of the user terminal, such as the title and the summary sentence, rather than from the content itself, which contains a large amount of information and a lot of noise. The computational load on the system can therefore be reduced and the accuracy of the recommended queries improved. As a result, the utilization rate of the search site can be improved and an increase in advertising can be expected.

  Hereinafter, the best mode for carrying out the present invention will be described with reference to the drawings. This is merely an example, and the technical scope of the present invention is not limited to this.

(First embodiment)
[Display recommendation query]
In the current Web page search system, when a query is input and a search button is pressed, a list of Web pages related to the query (hereinafter referred to as a search result list) is displayed on the terminal screen. In the search result list, links to a plurality of Web pages are displayed as titles and summary sentences of the Web pages.

  FIG. 1 is a diagram illustrating an example of a search result list displayed on the screen of the user terminal. In this example, a search is executed based on a query “patent”.

  Here, the title 4 and the summary sentence 5 of each candidate Web page are displayed. Since each of these is a phrase or sentence consisting of a plurality of words, the words that constitute them can be extracted by morphological analysis. This is performed by the word extraction means described later.

  Based on the title 4 and the summary sentence 5 displayed in the search result list 1, the user selects and clicks the desired Web page.

  At this point, a kind of correlation can be recognized between the query input by the user and the words constituting the title 4 and the summary sentence 5 of the Web page the user selected. Usually, a plurality of words is extracted from the title 4 and the summary sentence 5 by morphological analysis. If there are n distinct words (that is, n types after removing duplicates), a word correlation degree table can be built between those n words and the input query. This word correlation degree table is stored and updated by the correlation degree recording means described later.
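  The extraction of distinct words from a title and a summary sentence can be sketched as follows in Python. This is only an illustrative sketch under stated assumptions: the function name `extract_words` is hypothetical, and a simple regular-expression tokenizer stands in for morphological analysis, which would be needed for Japanese text.

```python
import re


def extract_words(title: str, summary: str) -> set[str]:
    """Return the distinct words displayed in a result's title and summary."""
    # Assumption: lowercase alphanumeric runs stand in for the morphemes
    # that a real morphological analyzer would produce.
    text = f"{title} {summary}".lower()
    return set(re.findall(r"[a-z0-9]+", text))


# Returning a set removes duplicates, giving the n distinct words.
words = extract_words("XX Patent Office", "Patent application guide")
```

  Because the result is a set, a word that appears in both the title and the summary sentence is reported once.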

  FIG. 2A is a diagram illustrating an example of a word correlation degree table. This example shows a word extracted from a title and a summary sentence based on an example in which a query “patent” is inputted and a Web page whose title is “XX patent office” is selected.

  Here, the numerical value part 6 holds the number of selections that triggered the extraction. Therefore, even if the same word is extracted more than once from the title and summary sentence during the extraction triggered by a single selection, it is counted as 1. This is because, even for the same content, the number of times a word appears in the summary differs depending on how the summary was written.

  Even when the same query is input, the Web page selected from the search result list 1 is not necessarily the same; it depends on the user and on the situation. Therefore, as the process described above (query input, search execution, display of the search result list, selection, word extraction) is repeated, the word correlation degree table is gradually expanded and updated.

  That is, when the process is performed for the first time for a given query and n words are extracted from the title and summary sentence of the selected Web page, a word correlation degree table is created in which the frequency of each of these n words is 1. This frequency is recorded in the numerical value part 6 of the word correlation degree table.

  Next, when the process is performed again with the same query, by the same user or another user, suppose m words are extracted from the title and summary sentence of the selected Web page. For each of those words already registered in the word correlation degree table, the frequency is incremented by 1; each of the other words is newly registered in the table with a frequency of 1.

  Thereafter, each time the process is performed for the same query, new words are registered in the word correlation degree table or the frequency of already registered words is incremented by 1 in the same manner.

  When the process is performed with a different query, a new row is created in the word correlation degree table and the same operations are performed. In other words, one row of the word correlation degree table is created for each query.
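  The update procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the names `correlation_table` and `record_selection` are hypothetical, and an in-memory dictionary stands in for whatever storage the correlation degree recording means uses.

```python
from collections import Counter, defaultdict

# One row per query; each row maps a word to the number of selections
# in which that word was extracted (the numerical value part).
correlation_table: dict[str, Counter] = defaultdict(Counter)


def record_selection(table, query, extracted_words):
    """Update the row for `query` after one selection."""
    # set() collapses repeats, so one selection adds at most 1 per word.
    # New words start at 1; already registered words are incremented by 1.
    table[query].update(set(extracted_words))


record_selection(correlation_table, "patent", ["patent", "office", "application", "patent"])
record_selection(correlation_table, "patent", ["application", "right"])
record_selection(correlation_table, "trademark", ["registration"])
```

  After these three calls, the row for "patent" holds a frequency of 2 for "application" and 1 for each of "patent", "office", and "right", and a separate row exists for "trademark".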

  FIG. 2B is an example of a word correlation table corresponding to different queries. A word correlation table is created for the query “patent” and the query “trademark”.

  In this way, the degree of correlation between a query and the words (constituent words) contained in the titles and summary sentences of the Web pages selected in connection with that query can be determined.

  Accordingly, when a query is input, a plurality of words strongly correlated with the query are selected from the word correlation degree table in descending order of correlation and displayed as recommended queries. Words identical to the query are excluded. The selection and display are performed by a recommended query extraction unit and a recommended query transmission unit described later.

  For example, if “application”, “right”, and “invention” have the highest degrees of correlation with the query “patent”, in that order, then “patent & application” is displayed as the first recommendation query, “patent & right” as the second, and “patent & invention” as the third. This reflects the track record that the words “application”, “right”, and “invention” were often included in the titles and summary sentences of Web pages selected after searching with the query “patent”, and it gives the user information for narrowing down the query. In many search engines a space means “&”; in that case, a space may be inserted instead of “&”.
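  The generation of recommendation queries from one row of the table can be sketched as follows. The function name `recommend_queries`, the parameter `k`, and the frequencies in the example row are assumptions made for illustration only; ties are broken alphabetically, which the description does not specify.

```python
from collections import Counter


def recommend_queries(row: Counter, query: str, k: int = 3) -> list[str]:
    """Build up to k recommendation queries from the row recorded for `query`."""
    # Words identical to a word of the query itself are excluded.
    query_words = {part.strip() for part in query.split("&")}
    candidates = [(w, c) for w, c in row.items() if w not in query_words]
    # Descending order of correlation; alphabetical order breaks ties.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [f"{query} & {word}" for word, _ in candidates[:k]]


# Illustrative frequencies only; they are not taken from the figures.
row = Counter({"patent": 9, "application": 7, "right": 5, "invention": 3, "office": 1})
print(recommend_queries(row, "patent"))
# ['patent & application', 'patent & right', 'patent & invention']
```

  For a search engine in which a space means “&”, the separator in the returned strings would simply be replaced by a space.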

  FIG. 3 is a diagram illustrating an example in which a recommendation query is displayed. When the query “patent” is input, “patent & application”, “patent & right”, and “patent & invention” are displayed as the recommendation query 7.

  Any mechanism currently in use may be employed to display the recommended queries. For example, the input of a query into the query input field 2 can be used as a trigger for the browser to notify the server of the input query by a technique such as Ajax, and the server can return the recommended queries.

  In this way, a user who at first could think only of “patent” can easily find a narrower query that leads to the desired Web page.

  Next, when the user follows a recommendation query, for example by inputting “patent & application” and performing a search, a search result list is displayed in the same way.

  FIG. 4 is a diagram illustrating an example of a search result list displayed on the screen of the user terminal. In this example, a search is executed based on the query “patent & application”.

  Here, when a selection is made again, a word correlation degree table for a two-word query can be created between the words extracted from the title 4 and the summary sentence 5 of the selected Web page and “patent & application”.

  The difference from the word correlation degree table described above is that a row is identified by two query words, here “patent” and “application”. Because this is an AND condition, their order does not matter.
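  One way to make the row key independent of word order is to use an unordered set of the query words, as in the following sketch. The helper name `query_key` and the use of “&” as the separator in the query string are assumptions for illustration.

```python
from collections import Counter, defaultdict


def query_key(query: str) -> frozenset[str]:
    """Turn an AND query into an order-independent row key."""
    return frozenset(part.strip() for part in query.split("&") if part.strip())


table: dict[frozenset[str], Counter] = defaultdict(Counter)

# Both spellings of the AND query update the same row.
table[query_key("patent & application")].update({"procedure", "document"})
table[query_key("application & patent")].update({"procedure"})
```

  A single-word query yields a one-element key, so the same table can hold rows for queries of any length.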

  FIG. 5 is a diagram illustrating an example of a word correlation table for two queries.

  It shows the degree of correlation between the pair “patent” and “application”, input as a query under an AND condition, and the words contained in the title and summary sentence of the Web page selected from the search result list.

  Recommendation queries can then be created in the same manner as described above, in descending order of frequency. For example, if the most frequent words are “procedure”, “document”, and “method”, in that order, then “patent & application & procedure” is displayed as the first recommendation query, “patent & application & document” as the second, and “patent & application & method” as the third. By repeating this procedure, the query first input by the user can be narrowed down step by step until the user reaches the desired Web page.

  FIG. 6 is a diagram illustrating an example in which a recommendation query is displayed. When the query “patent & application” is input, “patent & application & procedure”, “patent & application & document”, and “patent & application & method” are displayed as the recommendation query 7.

  According to this method, the degree of correlation is obtained from queries input in the past and the words contained in the titles and summary sentences of the Web pages selected at that time. A degree of correlation with less noise can therefore be obtained than when it is based on the words contained in the content. This is because content usually contains a wide variety of words, and those words do not necessarily correlate strongly with the query input by the user.

  By contrast, the title and summary sentence contain far fewer characters than the content and are the information the user directly uses as the basis for selection, so they are considered to contain little noise.

  A further advantage is that, because fewer words are involved, the computational load is reduced both in morphological analysis and in creating and searching the word correlation degree table.

  In the above example, the word correlation degree table is created from the queries input by all users, but a word correlation degree table may instead be created for each user. The former allows query recommendation that reflects the search history of users in general, while the latter allows query recommendation personalized for each user. When a word correlation degree table is created for each user, the user may be identified by a user ID.

  Further, in addition to displaying the recommended queries, the search may be executed automatically. In that case, a single search operation by the user results in two or more searches being performed. As a result, the user can reach Web documents that could not be reached by the first search alone.

[Display advertisement]
Next, during a query search, advertisements related to the query can be displayed. The advertisement displayed is a banner advertisement, and when the user clicks on it, predetermined advertisement content is displayed.

  FIG. 7 is a schematic diagram relating to the transmission of advertisements. It shows the flow from the transmission of a query from the browser to the server, through the server's selection of an advertisement, to the transmission of the Web page into which the advertisement has been inserted. When displaying the Web page, the server selects the banner advertisement to be displayed from the advertisement pool recorded in the advertisement DB according to a predetermined rule. The selection can be made in the following two stages.

  Before describing the selection procedure, first, a table used in the selection procedure will be described.

  FIG. 8 is an advertisement related word table showing the relationship between advertisements and the related words associated with them. It is recorded by the advertisement related word recording means described later. A related word is a word that causes the advertisement to be displayed when the input query matches it. Each advertisement is identified by an advertisement ID, and a priority index for determining display priority is held in association with the advertisement ID. Here the contract amount of the advertisement is used as the priority index, but the index is not limited to this. The related word columns hold the related words for each advertisement; 1 indicates that the word is a related word and 0 that it is not. Related words may be extracted from the advertisement content by morphological analysis, or may be specified by the advertiser at the time of contract.

  Under this mechanism, when a query is input and a search result list is displayed, the advertisement IDs that have the same word as the query as a related word are extracted. If the query includes a plurality of words, such as “patent & recruitment”, the advertisement IDs that have all of those words as related words are extracted. This is the first-stage procedure. In the example of FIG. 8, the advertisement IDs that have both “patent” and “recruitment” as related words are A001 and B001, so these are extracted.
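  The first-stage procedure can be sketched as follows. The table contents are hypothetical: only the fact that A001 and B001 both have “patent” and “recruitment” as related words comes from the description, while the advertisement C001, the extra related words, and the names `ad_related_words` and `match_ads` are assumptions for illustration.

```python
# Hypothetical advertisement related word table: ad ID -> related words.
# A set of words is equivalent to the 1/0 flags of the table in FIG. 8.
ad_related_words = {
    "A001": {"patent", "recruitment", "application"},
    "B001": {"patent", "recruitment"},
    "C001": {"trademark"},
}


def match_ads(query: str, table: dict[str, set[str]]) -> list[str]:
    """Return the ad IDs whose related words include every word of the query."""
    words = {part.strip() for part in query.split("&") if part.strip()}
    if not words:
        return []  # an empty query matches no advertisement
    # `words <= related` is the subset test: all query words are related words.
    return sorted(ad_id for ad_id, related in table.items() if words <= related)


print(match_ads("patent & recruitment", ad_related_words))  # ['A001', 'B001']
```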

  In the above, the advertisement related word table is looked up treating the query as a single word, but the query may instead be decomposed into a plurality of words, and advertisement IDs having each of those words as a related word may be extracted. This decomposition can be performed by morphological analysis. For example, a query such as “car sales association” can be broken down into the words “car” and “sales association”, so that advertisement IDs having “car” as a related word can be extracted.

  When a plurality of advertisement IDs are extracted in the first-stage procedure, the process proceeds to the second-stage procedure, in which the advertisement is determined based on the priority index of each extracted advertisement ID. For example, when the two advertisement IDs A001 and B001 are extracted, the contract amount of the former is 4 million yen and that of the latter is 1 million yen, so display opportunities are distributed between them at a ratio of 4:1.

  In this case, a random number between 0 and 1 may be generated, with the advertisement ID A001 selected if the random number is 0.8 or less and the advertisement ID B001 selected if it is greater than 0.8. Even when three or more advertisement IDs are extracted, selection at an arbitrary ratio is possible using random numbers in the same way.
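  The second-stage procedure, generalized to any number of advertisement IDs, can be sketched as a cumulative-threshold selection. The function name `choose_ad` is hypothetical, and the sketch uses a strict comparison against a random number in the half-open interval [0, 1), which differs from the “0.8 or less” rule only at the boundary value itself.

```python
import random


def choose_ad(contract_amounts: dict[str, int], rng: random.Random) -> str:
    """Pick one ad ID with probability proportional to its contract amount."""
    if not contract_amounts:
        raise ValueError("no advertisement IDs to choose from")
    total = sum(contract_amounts.values())
    threshold = rng.random() * total  # uniform in [0, total)
    cumulative = 0
    for ad_id, amount in contract_amounts.items():
        cumulative += amount
        if threshold < cumulative:
            return ad_id
    return ad_id  # not reached for positive amounts; kept as a safety net


# Contract amounts from the example: 4 million yen and 1 million yen (4:1).
amounts = {"A001": 4_000_000, "B001": 1_000_000}
rng = random.Random(0)
picks = [choose_ad(amounts, rng) for _ in range(10_000)]
share_a001 = picks.count("A001") / len(picks)  # close to 0.8
```

  With three or more advertisement IDs the same loop applies unchanged; each ID simply occupies a segment of the interval proportional to its priority index.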

  If only one advertisement ID is extracted in the first-stage procedure, that advertisement ID is simply selected and the second-stage procedure is unnecessary. The banner advertisement specified by the advertisement ID selected in this way is inserted into the page and displayed when the search result list is output. The advertisement related word table holds the position at which the banner advertisement is displayed, the link associated with the banner advertisement, and the display content of the banner advertisement, so the banner advertisement can be inserted into the Web page on the basis of these. The selection of the banner advertisement and its display on the Web page are performed by a determination unit and a banner advertisement transmission unit described later.

  FIG. 9 is a diagram illustrating an example in which the banner advertisement 8 is displayed together with the search result list.

[Overall configuration of Web page search system]
FIG. 10 is a diagram showing the overall configuration of the Web page search system. The processing described above will be described in association with each means in the overall configuration diagram.

  The web page search server 10 and the user terminal 30 can communicate with each other via the Internet (not shown).

  The user inputs a query to the query input means 31 of the user terminal 30. The query input to the query input unit 31 is transmitted to the Web page search server 10, and the query reception unit 11 of the Web page search server 10 receives the transmitted query.

  Next, the user inputs a search execution instruction to the search execution instruction unit 32 of the user terminal 30, and the user terminal 30 transmits the instruction to the search execution unit 12 of the Web page search server 10. On receiving the search execution instruction, the search execution unit 12 searches for Web pages containing the query, based on the previously received query. The search result transmission unit 13 then transmits the search result to the user terminal 30, and the search result list display means 33 of the user terminal 30 displays the received search results as a list on the screen.

  Next, the user inputs selection of a desired Web page to the selection input unit 34 of the user terminal 30. The user terminal 30 transmits the input selection result to the selection receiving unit 14 of the Web page search server 10. Two processes are performed based on the selection result received by the selection receiving means 14. First, information on the selected web page is read out and transmitted to the user terminal 30 by the web page transmission means 15. The contents are displayed on the screen by the web page display means 35 of the user terminal 30.

  The processing up to this point is what a conventional Web page search system generally performs. In the present embodiment, a second process is also started.

  The word extraction means 16 extracts words from the selected title and summary sentence by morphological analysis. The correlation degree recording means 17 records the degree of correlation between the query accepted by the query acceptance means 11 and the extracted word.

  When the query receiving unit 11 receives a query, the recommended query extracting unit 18 refers to the degrees of correlation between queries and words already recorded by the correlation degree recording unit 17, extracts a predetermined number of words in descending order of their degree of correlation with the input query, and generates recommended queries from them. The recommended query transmission unit 19 then transmits the generated recommended queries to the user terminal 30. The recommended query display means 36 of the user terminal 30 receives the recommended queries and displays them on the screen.

  Meanwhile, when a query is input and execution of a search is instructed, the determination unit 22 determines which banner advertisement should be displayed so that a banner advertisement related to the query can be shown. This determination is based on the related words used for deciding banner advertisement display, which are recorded in advance in the advertisement related word recording means 20. When the banner advertisement to be displayed has been determined, information on the banner advertisement is transmitted to the user terminal 30 by the banner advertisement transmitting unit 23. Normally, this transmission is performed together with the search result, and the banner advertisement display means 37 of the user terminal 30 displays the banner advertisement on the display screen of the search result list.

[Hardware configuration of Web page search server]
FIG. 11 is a diagram illustrating a hardware configuration of the Web page search server 10 according to the present embodiment. The Web page search server 10 includes a CPU (Central Processing Unit) 51 (in a multiprocessor configuration, additional CPUs such as a CPU 52 may be added), a bus line 40, a communication I/F (interface) 53, a main memory 54, a BIOS (Basic Input Output System) 55, a display device 56, an I/O controller 57, and an input device 58 such as a keyboard and a mouse.

  The control unit 50 is the part that controls the Web page search server 10 in an integrated manner. By reading and executing, as appropriate, the various programs stored in the hard disk 60 (described later), the control unit 50 cooperates with the hardware described above to realize the various functions according to the present invention.

  The communication I/F 53 is a network adapter used when the Web page search server 10 receives various input information from the user terminal 30 via the Internet (not shown) or transmits the content to be displayed on its screen. The communication I/F 53 may include a modem, a cable modem, and an Ethernet (registered trademark) adapter.

  The BIOS 55 records a boot program executed by the CPU 51 when the Web page search server 10 is started up, a program depending on the hardware of the Web page search server 10, and the like.

  The display device 56 includes a display device such as a cathode ray tube display device (CRT) or a liquid crystal display device (LCD).

  A storage device 62, such as a hard disk 60 or a semiconductor memory 61, can be connected to the I/O controller 57.

  The input device 58 accepts input by the administrator of the Web page search server 10.

  The hard disk 60 stores various programs for causing the hardware to function as the Web page search server 10, a program for executing the functions of the present invention, and the table described above. The web page search server 10 can also use a hard disk (not shown) provided externally as an external storage device.

  The above description has focused on the hardware configuration of the Web page search server 10, but the functions described above can also be realized by installing a program on a computer and causing the computer to operate as the Web page search server 10. Accordingly, the functions realized by the Web page search server 10 described as an embodiment of the present invention can also be realized by having a computer execute the method described above, or by installing the program described above on a computer.

  The computer in the present invention refers to an information processing apparatus including a storage device, a control unit, and the like. The Web page search server 10 is an information processing apparatus including the storage device 62, the control unit 50, and the like, and this information processing apparatus is included in the concept of the computer of the present invention.

  The word extraction unit 16, the recommended query extraction unit 18, the recommended query transmission unit 19, the determination unit 22, and the banner advertisement transmission unit 23 are realized mainly by the control unit 50, and the correlation degree recording unit 17 and the advertisement related word recording unit 20 are realized mainly by the storage device 62.

  The hardware configuration of the Web page search server 10 has been described above. In the present invention, the user terminal 30 can also be realized with the same hardware configuration.

(Second Embodiment)
The second embodiment is different from the first embodiment in how to hold the word correlation degree table (FIG. 2).

  Note that, in the following description and drawings, the same reference numerals are given to portions that perform the same functions as those in the first embodiment described above, and redundant descriptions are omitted as appropriate.

  In the word correlation degree table, which shows the degree of correlation between the query and the words extracted from the selected title and summary sentence, the first embodiment used the number of selections that triggered the extraction operation as the number indicating the degree of correlation. Therefore, even if the same word is extracted several times from a title or the like, that count is ignored and only the number of extraction operations is reflected.

  In the present embodiment, however, the numerical value part 6 of the word correlation degree table (FIG. 2) holds the number of times the word was extracted as the degree of correlation.

  This is because, if the same word appears multiple times in the title or summary sentence, the Web page is considered to be closely related to that word, and the word is therefore considered to have influenced the user's selection.
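The difference between the two ways of counting can be illustrated with a minimal Python sketch. The function name `update_correlation`, the nested-counter layout of the table, and the sample words are assumptions made for illustration only; the patent does not specify a data structure.

```python
from collections import Counter, defaultdict

def update_correlation(table, query, extracted_words, count_occurrences):
    """Update table[query][word] after the user selects one search result.

    extracted_words holds the words extracted from the selected title and
    summary sentence, with repeats preserved.
    count_occurrences=False: first-embodiment style, +1 per selection.
    count_occurrences=True: second-embodiment style, + number of extractions.
    """
    occurrences = Counter(extracted_words)
    for word, n in occurrences.items():
        table[query][word] += n if count_occurrences else 1

# Hypothetical extraction result for one selected search result.
words = ["hybrid", "fuel", "hybrid", "price"]

first = defaultdict(Counter)   # first-embodiment table
second = defaultdict(Counter)  # second-embodiment table
update_correlation(first, "car", words, count_occurrences=False)
update_correlation(second, "car", words, count_occurrences=True)

print(first["car"]["hybrid"], second["car"]["hybrid"])  # 1 2
```

With this sketch, a word repeated in the title or summary sentence raises the degree of correlation faster under the second-embodiment rule, which matches the reasoning given above.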

  The present embodiment also differs in the method of selecting the advertisement to be displayed based on the input query.

  In the first embodiment, an advertisement is selected by checking whether or not the query matches a related word of an advertisement held in the advertisement related word table (FIG. 8). In the present embodiment, even if no related word matches the query itself, the advertisement is selected if a word related to the query (a query related word) matches a related word of the advertisement.

  FIG. 12 is a diagram illustrating a query related word table. This table is recorded by the query related word recording means described later. In this example, the query related words “car” and “car” (alternative expressions for the same object) are held for the query “car”. As a result, when the query “car” is entered, an advertisement that does not have “car” as a related word is still selected if it has either of these query related words as a related word.

  Further, in addition to different expressions for the same object, a product whose consumer base overlaps, such as “car accessories”, may also be registered.

[Overall configuration of Web page search system]
FIG. 13 is a diagram showing the overall configuration of the Web page search system in the present embodiment. The difference from the overall configuration diagram of the Web page search system in the first embodiment is that there is a query related word recording means 21. The query related word recording means 21 records a query related word table (FIG. 12).

  When the query is input and the search is executed, the determination unit 22 obtains a related word of the query from the query related word table recorded by the query related word recording unit 21. Then, the advertisement related word table (FIG. 8) is checked for the presence or absence of an advertisement in which either of the query and the obtained related word is recorded as a related word. Subsequent processing is the same as that in the first embodiment, and is therefore omitted.
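The lookup performed by the determination unit 22 can be sketched as follows. This is only an illustration under stated assumptions: the two dictionaries stand in for the advertisement related word table (FIG. 8) and the query related word table (FIG. 12), and the advertisement codes, words, and the function name `select_ads` are hypothetical.

```python
# Hypothetical stand-in for the advertisement related word table (FIG. 8):
# advertisement code -> related words.
AD_RELATED_WORDS = {
    "AD001": {"automobile", "sedan"},
    "AD002": {"travel", "hotel"},
}

# Hypothetical stand-in for the query related word table (FIG. 12):
# query -> query related words.
QUERY_RELATED_WORDS = {
    "car": {"automobile", "car accessories"},
}

def select_ads(query, ad_words=AD_RELATED_WORDS, query_words=QUERY_RELATED_WORDS):
    """Return the codes of advertisements whose related words contain
    the query itself or any of its query related words."""
    candidates = {query} | query_words.get(query, set())
    return sorted(code for code, words in ad_words.items() if candidates & words)

print(select_ads("car"))     # ['AD001']  (matched through "automobile")
print(select_ads("hotel"))   # ['AD002']  (matched by the query itself)
print(select_ads("guitar"))  # []
```

Checking the query and its related words as one candidate set keeps the first-embodiment behavior (direct match) as a special case of the second-embodiment lookup.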

  The correlation degree recording means 17 differs from that of the first embodiment in how the degree of correlation is calculated; since this has already been described, the description is omitted here.

  According to the present embodiment, the degree of correlation between the query used as a reference when extracting the recommended query and the recommended query can differ from that of the first embodiment, so the opportunities to extract a more effective recommended query can be increased.

  Moreover, according to the present embodiment, a related advertisement can be selected using both the query and the words associated with the query, so advertising opportunities that lead to business opportunities can be increased.

(Third embodiment)
In the third embodiment, a mechanism for extracting related terms of an input query using a mathematical method called LSI (Latent Semantic Indexing) will be described.

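When the queries and the words (noun phrases) extracted from the titles and summary sentences of the Web documents selected in searches by those queries are collected, a query set Q and a noun phrase set Y are obtained. If the extraction frequency of the noun phrase w1 extracted in a search by the query Q1 is F(w1, Q1), the co-occurrence matrix M of the noun phrase set Y for the query set Q can be defined as follows, where p is the number of noun phrases and m is the number of queries.

$$M = \begin{pmatrix} F(w_1, Q_1) & \cdots & F(w_1, Q_m) \\ \vdots & \ddots & \vdots \\ F(w_p, Q_1) & \cdots & F(w_p, Q_m) \end{pmatrix}$$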
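As a rough sketch of how such a co-occurrence matrix might be assembled, the following Python code counts extraction frequencies from a selection log. It assumes that rows correspond to noun phrases and columns to queries, in line with the later references to query column vectors and noun phrase rows. The log format and the function name `build_cooccurrence` are hypothetical.

```python
def build_cooccurrence(log):
    """Build the co-occurrence matrix from a list of
    (query, [noun phrases extracted for one selection]) pairs.

    Returns (phrases, queries, M) with M[i][j] = F(phrases[i], queries[j]).
    """
    queries = sorted({q for q, _ in log})
    phrases = sorted({w for _, ws in log for w in ws})
    row = {w: i for i, w in enumerate(phrases)}
    col = {q: j for j, q in enumerate(queries)}
    M = [[0] * len(queries) for _ in phrases]
    for q, ws in log:
        for w in ws:
            M[row[w]][col[q]] += 1  # one more extraction of w under query q
    return phrases, queries, M

# Hypothetical selection log.
log = [
    ("car", ["hybrid", "price"]),
    ("car", ["hybrid"]),
    ("automobile", ["price", "dealer"]),
]
phrases, queries, M = build_cooccurrence(log)
for phrase, frequencies in zip(phrases, M):
    print(phrase, frequencies)
```

Each column of `M` is then the frequency vector of one query over all noun phrases.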

The matrix M can be decomposed by singular value decomposition as follows.

$$M = U \Sigma V^{T}$$

  Here, U is an orthogonal matrix of p rows and p columns; Σ is a matrix of p rows and m columns whose off-diagonal components are zero and whose diagonal components are nonnegative; V is an orthogonal matrix of m rows and m columns; and the superscript T denotes the transpose. The columns of U and V are called the left singular vectors and the right singular vectors, respectively. The diagonal components of Σ (the singular values) are arranged in descending order.

Since both p and m are normally large, a matrix Mk that approximates M is obtained by choosing an appropriate k, where k is a positive integer not greater than either p or m. Let Σk be the matrix obtained from Σ by keeping the k largest singular values and setting the other singular values to zero. Then Mk, obtained as follows, is an approximation of M, and the degree of approximation can be adjusted by the choice of k. Uk and the transpose of Vk are each composed of the first k vectors.

$$M_k = U_k \Sigma_k V_k^{T}$$

The matrix Mk is a reduced-dimension approximation of the matrix M. Through LSI, the query column vectors are reconstructed so that noun phrase elements (rows) that are semantically close to each other take high frequency values.
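The rank-k approximation can be tried out with a short sketch. The use of NumPy (a third-party library) and the sample 4 x 3 frequency matrix are assumptions for illustration; the patent does not name any library. `numpy.linalg.svd` returns the singular values in descending order, so keeping the first k components corresponds to keeping the k largest singular values.

```python
import numpy as np

def rank_k_approximation(M, k):
    """Return M_k = U_k Σ_k V_k^T, keeping only the k largest singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s is in descending order
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Hypothetical frequency matrix: 4 noun phrases (rows) x 3 queries (columns).
M = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 0.0],
])

k = 2
Mk = rank_k_approximation(M, k)
singular_values = np.linalg.svd(M, compute_uv=False)

# The Frobenius-norm error of the approximation is determined by the
# singular values that were set to zero.
error = np.linalg.norm(M - Mk)
discarded = np.sqrt(np.sum(singular_values[k:] ** 2))
print(np.round(Mk, 3))
print(np.isclose(error, discarded))
```

Increasing `k` lowers the approximation error, while a smaller `k` merges more noun phrase dimensions together.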

  Next, clustering the query column vectors included in the query set yields subsets of conceptually similar queries (partial query sets). A general method such as the k-means method can be used for the clustering.

  Then, a noun phrase having a high frequency included in the partial query set as a clustering result can be extracted as a characteristic topic in the partial query set.

  When the query input by the user matches a noun phrase included in the partial query set, particularly a topic, it is possible to grasp other queries included in the partial query set as related terms.

  In this way, even if the query A and the noun phrase B have never co-occurred, A and B can be associated through a calculation that combines other vectors including A or B, and the result can be presented as a recommended query.
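The clustering and related-term lookup described above can be sketched in plain Python. This is a simplified illustration, not the method of the patent: the k-means routine seeds its centroids with the first k vectors, the sample vectors are hypothetical raw frequencies (in practice they would be columns of Mk), and the names `kmeans` and `related_queries` and the `top_n` cutoff for topics are assumptions.

```python
def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def related_queries(user_query, queries, phrases, columns, labels, top_n=2):
    """Return the queries of every cluster whose topics include user_query."""
    related = []
    for c in sorted(set(labels)):
        members = [j for j, label in enumerate(labels) if label == c]
        # Total frequency of each noun phrase over the cluster's query vectors.
        totals = [sum(columns[j][i] for j in members) for i in range(len(phrases))]
        ranked = sorted(range(len(phrases)), key=lambda i: -totals[i])
        topics = [phrases[i] for i in ranked[:top_n] if totals[i] > 0]
        if user_query in topics:
            related += [queries[j] for j in members if queries[j] != user_query]
    return related

# Hypothetical data: one frequency vector per query over four noun phrases.
phrases = ["hybrid", "dealer", "booking", "travel"]
queries = ["car", "automobile", "hotel", "inn"]
columns = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 1], [0, 0, 2, 2]]

labels = kmeans(columns, k=2)
print(related_queries("hybrid", queries, phrases, columns, labels))
```

In this sample, "car" and "automobile" fall into one partial query set whose topics include "hybrid", so both are returned as related terms for the input "hybrid".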

  Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments. The effects described in the embodiments are merely a list of the most preferable effects resulting from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

FIG. 1 is a diagram showing an example of a search result list displayed on the screen of the user terminal according to the first embodiment.
FIG. 2 is a diagram showing an example of the word correlation degree table according to the first embodiment.
FIG. 3 is a diagram showing an example in which recommended queries are displayed according to the first embodiment.
FIG. 4 is a diagram showing an example of a search result list displayed on the screen of the user terminal according to the first embodiment.
FIG. 5 is a diagram showing an example of the word correlation degree table for two queries according to the first embodiment.
FIG. 6 is a diagram showing an example in which recommended queries are displayed according to the first embodiment.
FIG. 7 is a schematic diagram relating to transmission of an advertisement according to the first embodiment.
FIG. 8 is a diagram showing the advertisement related word table according to the first embodiment.
FIG. 9 is a diagram showing an example in which a banner advertisement is displayed together with the search result list according to the first embodiment.
FIG. 10 is a diagram showing the overall configuration of the Web page search system according to the first embodiment.
FIG. 11 is a diagram showing the hardware configuration of the Web page search server according to the first embodiment.
FIG. 12 is a diagram showing the query related word table according to the second embodiment.
FIG. 13 is a diagram showing the overall configuration of the Web page search system according to the second embodiment.

Explanation of symbols

1 Search result list
2 Query input field
3 Search execution button
4 Title
5 Summary sentence
6 Numerical value part (degree of correlation)
7 Recommended query
8 Banner advertisement
10 Web page search server
30 User terminal
40 Bus line
50 Control unit
62 Storage device

Claims (6)

  1. A Web page search server in a Web page search system that searches Web pages in response to a search query entered in an input field, the Web page search server comprising:
    word extraction means for extracting words contained in information that relates to a Web page selected by the user from among a plurality of Web page candidates presented after execution of the search, and that is displayed as a search result on the screen of the terminal used by the user for the selection;
    correlation degree recording means for recording the degree of correlation between the search query input for performing the search and the words extracted by the word extraction means;
    recommended query extraction means for extracting, when a search query is input in the input field, a word having a strong correlation with the search query based on the degree of correlation recorded by the correlation degree recording means; and
    recommended query transmission means for transmitting, to the terminal used by the user, a recommended query that specifies the search query and the word extracted by the recommended query extraction means under an AND condition, so that the user can select it in the input field.
  2. The Web page search server according to claim 1, wherein the degree of correlation recorded by the correlation degree recording means is the number of selections that led to extraction of the word by the word extraction means in searches performed by inputting the search query.
  3. The Web page search server according to claim 1, wherein the degree of correlation recorded by the correlation degree recording means is the number of times the word was extracted by the word extraction means in searches performed by inputting the search query.
  4. The Web page search server according to any one of claims 1 to 3, further comprising:
    advertisement related word recording means for recording a code for identifying a banner advertisement in association with one or more words;
    determination means for determining whether or not the input search query matches a word recorded by the advertisement related word recording means in association with a code for identifying a banner advertisement; and
    banner advertisement transmission means for transmitting, in accordance with the determination result of the determination means, the banner advertisement identified by the code to the terminal used by the user so that the banner advertisement is displayed on the Web page relating to the search performed by inputting the search query.
  5. The Web page search server according to any one of claims 1 to 3, further comprising:
    advertisement related word recording means for recording a code for identifying a banner advertisement in association with one or more words;
    query related word recording means for recording the search query in association with one or more words;
    determination means for determining whether or not a word recorded by the query related word recording means in association with the input search query matches a word recorded by the advertisement related word recording means in association with a code for identifying a banner advertisement; and
    banner advertisement transmission means for transmitting, in accordance with the determination result of the determination means, the banner advertisement identified by the code to the terminal used by the user so that the banner advertisement is displayed on the Web page relating to the search performed by inputting the search query.
  6. A query recommendation method in a Web page search system that searches Web pages in response to a search query entered in an input field, the method including:
    a word extraction step of extracting words contained in information that relates to a Web page selected by the user from among a plurality of Web page candidates presented after execution of the search, and that is displayed as a search result on the screen of the terminal used by the user for the selection;
    a correlation degree recording step in which a computer records the degree of correlation between the search query input for performing the search and the extracted words;
    a recommended query extraction step in which, when a search query is input in the input field, the computer extracts a word having a strong correlation with the search query based on the recorded degree of correlation; and
    a recommended query transmission step of transmitting, to the terminal used by the user, a recommended query that specifies the search query and the extracted word under an AND condition, so that the user can select it in the input field.
Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008004844A JP4962967B2 (en) 2008-01-11 2008-01-11 Web page search server and query recommendation method

Publications (2)

Publication Number Publication Date
JP2009169541A JP2009169541A (en) 2009-07-30
JP4962967B2 JP4962967B2 (en) 2012-06-27

Family

ID=40970660

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2008004844A Active JP4962967B2 (en) 2008-01-11 2008-01-11 Web page search server and query recommendation method

Country Status (1)

Country Link
JP (1) JP4962967B2 (en)

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20090611

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20110607

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20110614

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20110804

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20120306

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20120312

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20120319

R150 Certificate of patent or registration of utility model

Ref document number: 4962967

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20150406

Year of fee payment: 3

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

S533 Written request for registration of change of name

Free format text: JAPANESE INTERMEDIATE CODE: R313533