WO2021234844A1

WO2021234844A1 - Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program

Info

Publication number: WO2021234844A1
Application number: PCT/JP2020/019902
Authority: WO
Inventors: 知史三枝; 裕一郎関口
Original assignee: 日本電信電話株式会社
Priority date: 2020-05-20
Filing date: 2020-05-20
Publication date: 2021-11-25

Abstract

A selection device (12) according to the present disclosure comprises: a calculation unit (121) which calculates the degree of dispersion of positions at which a phrase appears in a retrieval target document; and a phrase selection unit (126) which selects, on the basis of the calculated degree of dispersion, whether or not the phrase in the retrieval target document is to be used for a retrieval query for retrieving the retrieval target document.

Description

Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program

This disclosure relates to a selection device, a priority determination device, an answer candidate acquisition device, a selection method, a priority determination method, an answer candidate acquisition method, and a program.

In recent years, departments in which operators respond to customer inquiries by telephone or chat have increasingly prepared anticipated questions and answers to those questions (so-called FAQs (Frequently Asked Questions)) in advance, registered them in a database, and introduced a search system that allows the FAQs to be browsed and searched. When responding to a customer inquiry, operators increasingly search the system according to the content of the inquiry and respond based on the answer of the FAQ that is found.

One method of preparing FAQ content is to decide what to add to or delete from the FAQ based on the experience or know-how of operators. Another method is for a call center administrator or the like to review response logs, which record operators' responses to customers (recorded audio data, text data obtained by applying voice recognition to the recorded data, and so on), and decide what should be added to or deleted from the FAQ. The content that should be maintained as FAQs changes as the products and services provided change, so maintaining the FAQ is costly.

Patent Document 1 describes a method of maintaining FAQ questions based on text obtained by voice recognition of recorded data of an operator's response to a customer, the existing FAQ, and information on whether or not an FAQ was useful when the operator searched the FAQ. In this method, the search results for a query are presented to the operator, and the operator indicates whether or not the search results were useful. Then, similar questions are aggregated from among the queries (questions) whose search results were judged not useful, the questions (FAQ candidates) extracted from the recorded data, and the questions that have already been prepared. Among the aggregated sets, those that do not include any already-prepared question are extracted as questions that should be added to the FAQ. The method described in Patent Document 1 makes it possible to obtain questions to be added to the FAQ, but it does not extract questions for each product or service to be maintained, and it does not obtain answers to the questions.

One way for an operator to obtain an answer to a customer's question is to search the business manual, which serves as the reference for answers, using a search query corresponding to the question. In this method, if the search query contains words and phrases that appear frequently throughout the business manual, items that are not relevant as answers to the question may also be returned as search results. Therefore, in order to remove words and phrases that are scattered throughout a document, a method that uses the appearance information of the words and phrases is often used (see Non-Patent Document 1). However, in this method, even if a phrase appears frequently in a particular passage of the business manual, its priority is lowered if it also appears frequently throughout the business manual, so search results that are appropriate as an answer to the question may not be obtained. Patent Document 2 describes a method of obtaining the section in which a customer states a request from a voice log recording a conversation between an operator and a customer, using word appearance density, but this method is not intended to make searches more appropriate.

Patent Document 1: International Publication No. 2019/156103
Patent Document 2: Japanese Unexamined Patent Publication No. 2012-47775

In order to obtain answer candidates for FAQ maintenance from response logs, it is necessary to create a search query for searching the business manual. When the method described in Non-Patent Document 1 is used to create the search query, words and phrases that appear concentrated in a specific range (such as a chapter or section) of the business manual are excluded from the search query, and appropriate search results cannot be obtained.

An object of the present disclosure, made in view of the above problems, is to provide a selection device, a priority determination device, an answer candidate acquisition device, a selection method, a priority determination method, an answer candidate acquisition method, and a program that can select words and phrases more suitable for a search query for searching a document to be searched.

In order to solve the above problems, the selection device according to the present disclosure includes a calculation unit that calculates the degree of dispersion of the appearance positions of a phrase in a document to be searched, and a phrase selection unit that selects, based on the degree of dispersion calculated by the calculation unit, whether or not the phrase in the document to be searched can be used in a search query for searching the document to be searched.

Further, in order to solve the above problems, the priority determination device according to the present disclosure determines the priority of a plurality of question sentences related to the document to be searched, or of words and phrases in the plurality of question sentences, based on the frequency with which words and phrases selected as usable in the search query by the above-described selection device appear in each of the plurality of question sentences.

Further, in order to solve the above problems, the answer candidate acquisition device according to the present disclosure includes: a question acquisition unit that acquires a question cluster consisting of a plurality of question sentences related to the document to be searched; a search query generation unit that inputs the plurality of question sentences constituting the question cluster to the above-described priority determination device, calculates a query score of the plurality of question sentences or of words and phrases in the plurality of question sentences based on the priority determined by the priority determination device, and generates the search query based on the calculated query score; an answer candidate acquisition unit that searches the document to be searched using the generated search query and acquires search results; and an output unit that calculates, for each of the plurality of acquired search results, a search score based on the frequency of appearance of words and phrases in the plurality of search results, and determines the output order of the search results for the search query based on the calculated search score.

Further, in order to solve the above problems, the selection method according to the present disclosure includes a calculation step of calculating the degree of dispersion of the appearance positions of a phrase in a document to be searched, and a selection step of selecting, based on the calculated degree of dispersion, whether or not the phrase in the document to be searched can be used in a search query for searching the document to be searched.

Further, in order to solve the above problems, the priority determination method according to the present disclosure includes a priority determination step of determining the priority of a plurality of question sentences related to the document to be searched, or of words and phrases in the plurality of question sentences, based on the frequency with which words and phrases selected as usable in the search query by the above-described selection method appear in each of the plurality of question sentences.

Further, in order to solve the above problems, the answer candidate acquisition method according to the present disclosure includes: a question acquisition step of acquiring a question cluster consisting of a plurality of question sentences related to the document to be searched; a search query generation step of calculating a query score of the plurality of question sentences constituting the question cluster, or of words and phrases in the plurality of question sentences, based on the priority determined by the above-described priority determination method, and generating the search query based on the calculated query score; a search result acquisition step of searching the document to be searched using the generated search query and acquiring search results; and an output step of calculating, for each of the plurality of acquired search results, a search score based on the frequency of appearance of words and phrases in the plurality of search results, and determining the output order of the search results for the search query based on the calculated search score.

Further, in order to solve the above problems, the program according to the present disclosure causes the computer to function as the above-mentioned selection device, priority determination device, or answer candidate acquisition device.

According to the selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method, and program according to the present disclosure, words and phrases more suitable for a search query for searching a document to be searched can be selected.

FIG. 1 is a diagram showing a configuration example of a search system according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing a configuration example of the selection unit.
FIG. 3 is a flowchart showing an example of the operation of the selection unit.
FIG. 4 is a flowchart showing an example of the operation of the priority determination unit.
FIG. 5 is a flowchart showing an example of the operation of the answer candidate acquisition device.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.

FIG. 1 is a diagram showing a configuration example of the search system 10 according to an embodiment of the present disclosure. The search system 10 according to the present embodiment generates a search query for searching a document to be searched, searches the document using the generated search query, and outputs the search results. The following description uses an example in which the document to be searched is the business manual that a call center operator refers to when answering customer questions: a search query for searching the business manual is generated, the business manual is searched using the generated search query, and anticipated questions and their answers (FAQs) are prepared.

As shown in FIG. 1, the search system 10 according to the present embodiment includes a question determination unit 11, a selection unit 12 as a selection device, a priority determination unit 13 as a priority determination device, a question candidate acquisition unit 14, a search query generation unit 15, an answer candidate acquisition unit 16, and a pair output unit 17. The question candidate acquisition unit 14 is an example of the question acquisition unit. The pair output unit 17 is an example of the output unit. The question candidate acquisition unit 14, the search query generation unit 15, the answer candidate acquisition unit 16, and the pair output unit 17 constitute an answer candidate acquisition device 18.

The question determination unit 11 receives as input a response log recording an operator's response to a customer. The response log is, for example, recorded data of a dialogue between an operator and a customer, or text data obtained by converting the recorded data into text by voice recognition. From the input response log, the question determination unit 11 acquires question utterances (question sentences) in which the customer asks a question. The question determination unit 11 acquires a voice recognition result from the response log and determines from it, using, for example, a trained classifier model, whether or not an utterance is a question utterance. Any known determination device can be used for this determination. The question determination unit 11 outputs the acquired plurality of question utterances (question sentences) to the question candidate acquisition unit 14.

The selection unit 12 selects whether or not a word (word or phrase) in the business manual, which is the document to be searched, can be used in the search query, and outputs the selection result to the priority determination unit 13. FIG. 2 is a diagram showing a configuration example of the selection unit 12.

As shown in FIG. 2, the selection unit 12 as a selection device includes a calculation unit 121 and a phrase selection unit 126.

The calculation unit 121 calculates the degree of dispersion of the appearance positions of words and phrases in the business manual that is the document to be searched. As shown in FIG. 2, the calculation unit 121 includes a business manual division unit 122, a phrase division unit 123, an appearance frequency calculation unit 124, and an appearance dispersion calculation unit 125.

The business manual division unit 122 divides the business manual into line units.

The phrase division unit 123 divides the business manual, which has been divided into lines by the business manual division unit 122, into phrases.

The appearance frequency calculation unit 124 calculates, for each phrase obtained by the phrase division unit 123, its appearance frequency in the business manual.

The appearance dispersion calculation unit 125 calculates the degree of dispersion of the appearance position in the business manual for each word based on the appearance frequency and the appearance position of each word calculated by the appearance frequency calculation unit 124.

The phrase selection unit 126 selects, based on the degree of dispersion of the appearance positions of each phrase calculated by the calculation unit 121, whether or not the phrase in the business manual can be used in the search query, and outputs the selection result to the priority determination unit 13. For example, the phrase selection unit 126 selects a phrase whose appearance position variance is larger than a predetermined threshold as a phrase that cannot be used in the search query, and selects a phrase whose appearance position variance is less than or equal to the threshold as a phrase that can be used in the search query. Phrases with a large variance of appearance positions are phrases that appear frequently throughout the entire business manual. Including such phrases in the search query makes it less likely that an appropriate answer to the question will be obtained. Therefore, by selecting phrases with a large variance of appearance positions as unusable in the search query, phrases that appear frequently throughout the business manual can be excluded from the search query. In addition, by selecting phrases with a small variance of appearance positions as usable in the search query, phrases that appear frequently in a specific range can be included in the search query even if their overall frequency in the business manual is high. Phrases more suitable for the search query can thus be selected, and as a result it becomes easier to obtain appropriate answers to the questions.

Referring again to FIG. 1, the priority determination unit 13 as the priority determination device determines the priority, for use in the search query, of a plurality of question sentences related to the business manual or of phrases in the question sentences, based on the frequency with which phrases selected by the selection unit 12 as usable in the search query appear in each of the question sentences, and outputs the result to the search query generation unit 15.

For example, the priority determination unit 13 raises the priority of a question sentence that contains phrases that are selected as usable in the search query and that appear frequently across the plurality of question sentences. Further, for example, among the phrases selected as usable in the search query, the priority determination unit 13 gives higher priority to phrases that appear more frequently across the plurality of question sentences.
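The two priority rules can be sketched as follows. This is only an illustration under stated assumptions: whitespace tokenization stands in for phrase extraction, a question sentence's priority is taken to be the sum of the priorities of the usable phrases it contains, and the function names `phrase_priorities` and `question_priorities` are hypothetical.

```python
from collections import Counter

def phrase_priorities(questions, usable):
    """Count, for each usable phrase, how many question sentences contain it."""
    counts = Counter()
    for question in questions:
        counts.update(set(question.lower().split()) & usable)
    return counts

def question_priorities(questions, usable):
    """Assumed scoring: sum the priorities of the usable phrases in each question."""
    priorities = phrase_priorities(questions, usable)
    return [
        sum(priorities[w] for w in set(q.lower().split()) & usable)
        for q in questions
    ]

# Illustrative data only.
questions = [
    "how do i set the ringtone",
    "can i change the ringtone file",
    "how do i reject a call",
]
usable = {"ringtone", "reject", "file"}
print(phrase_priorities(questions, usable))
print(question_priorities(questions, usable))  # [2, 3, 1]
```

Other ways of combining phrase priorities into a sentence priority (for example, an average) would fit the description equally well.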

The question candidate acquisition unit 14 aggregates the plurality of question utterances (question sentences) output from the question determination unit 11 into groups of similar question sentences, and acquires question clusters each composed of a plurality of similar question sentences. The question sentences that make up a question cluster are candidates for question sentences to be added to the FAQ (additional candidate question sentences). One method of aggregating question sentences is, for example, to extract the words in each question sentence (question utterance), add up the word vectors of the extracted words obtained with word2vec to obtain an utterance vector, and use the cosine similarity between the obtained utterance vectors. The question candidate acquisition unit 14 outputs the acquired question clusters to the search query generation unit 15.
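The aggregation by summed word vectors and cosine similarity might look like the sketch below. The toy two-dimensional `WORD_VECTORS` table stands in for a trained word2vec model, and the greedy grouping with a similarity threshold of 0.8 is an assumption; the text does not specify a clustering procedure or threshold.

```python
import math

# Toy word vectors for illustration; a trained word2vec model would supply these.
WORD_VECTORS = {
    "ringtone": (1.0, 0.0), "sound": (0.9, 0.1),
    "reject": (0.0, 1.0), "block": (0.1, 0.9),
}

def utterance_vector(sentence):
    """Sum the vectors of the known words in a sentence."""
    vecs = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    return tuple(sum(c) for c in zip(*vecs)) if vecs else (0.0, 0.0)

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0

def cluster(sentences, threshold=0.8):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters = []
    for s in sentences:
        for c in clusters:
            if cosine(utterance_vector(s), utterance_vector(c[0])) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

sentences = ["change ringtone", "ringtone sound", "reject call", "block number"]
clusters = cluster(sentences)
print(clusters)
```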

The search query generation unit 15 generates a search query for searching the business manual from the question sentences constituting the question cluster output from the question candidate acquisition unit 14. The search query generation unit 15 inputs the plurality of question sentences constituting the question cluster to the priority determination unit 13. Based on the priority of the question sentences, or of the phrases included in the question sentences, determined by the priority determination unit 13, the search query generation unit 15 calculates a query score indicating the importance of each question sentence or phrase to the search query, and generates a search query based on the calculated query score.

For example, among the plurality of question sentences constituting the question cluster, the search query generation unit 15 calculates a higher query score for a question sentence with a higher priority determined by the priority determination unit 13, and determines a representative question sentence in the question cluster. As described above, the priority determination unit 13 raises the priority of, for example, a question sentence that contains phrases that are selected as usable in the search query and that appear frequently across the plurality of question sentences. Therefore, the search query generation unit 15 calculates a high query score for a question sentence that contains phrases usable in the search query that also appear frequently in the other question sentences of the cluster (that is, a question sentence close to the vector average of the plurality of question sentences), and determines that question sentence as the representative question sentence. The search query generation unit 15 may determine two or more question sentences as representative question sentences from among the plurality of question sentences constituting the question cluster.
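One possible reading of "close to the vector average" is sketched below: the query score of each question sentence is taken to be the cosine similarity between its vector and the mean vector of the cluster. The scoring rule, the function name `representative_question`, and the vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def representative_question(vectors):
    """Pick the question whose vector is closest to the cluster's mean vector."""
    dims = len(next(iter(vectors.values())))
    mean = [sum(v[i] for v in vectors.values()) / len(vectors) for i in range(dims)]
    scores = {question: cosine(vec, mean) for question, vec in vectors.items()}
    return max(scores, key=scores.get), scores

# Hypothetical utterance vectors for one question cluster.
vectors = {
    "how do i set the ringtone": (1.0, 0.2),
    "ringtone file location": (0.8, 0.6),
    "ringtone volume too low": (0.2, 1.0),
}
best, scores = representative_question(vectors)
print(best)  # ringtone file location
```

Returning the top two or more entries of `scores` instead of only the maximum would correspond to choosing several representative question sentences.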

The search query generation unit 15 generates a search query by excluding words and phrases selected as unusable for the search query by the selection unit 12 from the determined representative question text.

Further, the search query generation unit 15 may generate a search query based on the priority of phrases determined by the priority determination unit 13. For example, the search query generation unit 15 calculates a higher query score for a phrase with a higher priority and generates a search query using phrases with high query scores. As described above, among the phrases selected as usable in the search query, the priority determination unit 13 gives higher priority to, for example, phrases that appear more frequently across the plurality of question sentences. Therefore, for example, where M is the number of question sentences constituting the question cluster, the search query generation unit 15 generates a search query using phrases that are included in M/2 or more of the question sentences.
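The M/2 rule can be sketched as follows, assuming whitespace tokenization in place of phrase extraction and a hypothetical helper name `query_phrases`.

```python
from collections import Counter

def query_phrases(questions, usable):
    """Keep usable phrases that occur in at least half of the cluster's questions."""
    m = len(questions)
    counts = Counter()
    for question in questions:
        counts.update(set(question.lower().split()) & usable)
    return sorted(p for p, c in counts.items() if c >= m / 2)

# Illustrative cluster of M = 4 question sentences.
questions = [
    "how do i set the ringtone",
    "where is the ringtone file",
    "ringtone file not found",
    "reject incoming call",
]
usable = {"ringtone", "file", "reject"}
print(query_phrases(questions, usable))  # ['file', 'ringtone']
```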

The search query generation unit 15 outputs the generated search query to the answer candidate acquisition unit 16. The search query generation unit 15 may output the query score of the search query (the query score of the question sentence or phrase that is the basis of the search query generation) to the answer candidate acquisition unit 16.

The answer candidate acquisition unit 16 searches the business manual using the search query generated by the search query generation unit 15 and acquires the search results. When a plurality of search queries are generated, the answer candidate acquisition unit 16 searches the business manual using each of the plurality of search queries and acquires the search results.

The search of the business manual using the search query can be performed by, for example, a search in sentence units or a search for partial documents in which sentences are combined in a certain unit. The answer candidate acquisition unit 16 may search the business manual by combining a plurality of methods as described above. The answer candidate acquisition unit 16 may acquire a plurality of (top N) search results.

Further, the answer candidate acquisition unit 16 may use a phrase with a small variance of appearance positions, or a phrase that appears in the table of contents of the business manual, to narrow down the search range. By narrowing down the search range using these phrases and then searching with a search query from which the phrases used for narrowing have been removed, the answer candidate acquisition unit 16 can search a range that is useful for FAQ maintenance.

The answer candidate acquisition unit 16 outputs the search query and the search result acquired by the search using the search query to the pair output unit 17. The answer candidate acquisition unit 16 usually acquires a plurality of search results, and outputs the acquired plurality of search results to the pair output unit 17. The answer candidate acquisition unit 16 may output the query score of the search query to the pair output unit 17.

The pair output unit 17 calculates a search score for each of the plurality of search results acquired by the answer candidate acquisition unit 16 for the search query, based on the frequency of appearance of phrases in the plurality of search results. The pair output unit 17 determines the output order of the search results for the search query based on the calculated search scores, and outputs pairs of the search query and a search result in the determined order. For example, the pair output unit 17 calculates a higher search score for a search result that overlaps more with the other search results. Further, the pair output unit 17 may, for example, calculate the search score of each search result according to how frequently the phrases included in that search result appear across the plurality of search results.
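A search score based on overlap with the other results might be computed as in the sketch below. The exact scoring formula is not given in the text, so the average number of other results sharing each phrase is used here as an assumed stand-in, with whitespace tokenization and a hypothetical function name `search_scores`.

```python
from collections import Counter

def search_scores(results):
    """Score each result by how widely its phrases are shared with other results."""
    token_sets = [set(r.lower().split()) for r in results]
    doc_freq = Counter()
    for tokens in token_sets:
        doc_freq.update(tokens)
    scores = []
    for tokens in token_sets:
        # For each phrase, count the other results that also contain it.
        shared = sum(doc_freq[t] - 1 for t in tokens)
        scores.append(shared / len(tokens) if tokens else 0.0)
    return scores

# Illustrative search results for one query.
results = ["select the ringtone file", "select the ringtone", "enter the number"]
scores = search_scores(results)
ranked = [r for _, r in sorted(zip(scores, results), reverse=True)]
print(ranked)
```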

When a plurality of search queries are generated and search results are obtained for each of them, the pair output unit 17 may control the output order of the search results based on the query score of each search query and the search score of each search result obtained by searching with that query. For example, the pair output unit 17 may control the output order of the pairs of a search query and a search result based on the product of the query score and the search score.
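Ordering by the product of the two scores can be sketched as follows. The tuple layout, the function name `order_pairs`, and the score values are hypothetical.

```python
def order_pairs(pairs):
    """Sort (query, result, query_score, search_score) tuples by score product."""
    return sorted(pairs, key=lambda p: p[2] * p[3], reverse=True)

# Illustrative pairs with made-up scores.
pairs = [
    ("ringtone file", "To set the ringtone, select the ringtone file.", 0.9, 0.5),
    ("reject number", "To reject incoming calls, enter the number.", 0.8, 0.7),
    ("menu settings", "Select the menu and press the settings button.", 0.4, 0.9),
]
ordered = order_pairs(pairs)
for query, result, query_score, search_score in ordered:
    print(f"{query_score * search_score:.2f}  {query} -> {result}")
```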

Next, the operation of the search system 10 according to the present embodiment will be described.

FIG. 3 is a flowchart showing an example of the operation of the selection unit 12, and is a diagram for explaining a selection method by the selection unit 12 as a selection device.

The calculation unit 121 calculates the degree of dispersion of the appearance positions of words and phrases in the business manual that is the document to be searched (step S11). The calculation of the variance of the appearance position will be described in more detail, focusing on the operation of the appearance variance calculation unit 125.

The business manual consists of a single HTML (HyperText Markup Language) file or text file. The phrase division unit 123 extracts phrases (nouns and the like) by morphological analysis using a tool such as MeCab, and acquires each extracted phrase together with the line number on which it appears. The appearance frequency calculation unit 124 calculates the appearance frequency of the extracted phrases. The appearance variance calculation unit 125 calculates the variance of the appearance positions of each phrase from the line numbers on which the extracted phrase appears.
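The extraction of phrases and their line numbers can be sketched as follows. This is a minimal illustration, not the disclosed implementation: a simple regular-expression tokenizer stands in for morphological analysis, each phrase is recorded at most once per line, and the function names `extract_positions` and `appearance_frequency` are hypothetical.

```python
import re
from collections import defaultdict

def extract_positions(manual_text):
    """Map each phrase to the 1-based line numbers on which it appears.

    Assumption: a lowercase regex tokenizer replaces a morphological
    analyzer, so every alphabetic token is treated as a phrase.
    """
    positions = defaultdict(list)
    for line_no, line in enumerate(manual_text.splitlines(), start=1):
        # Record each phrase once per line.
        for phrase in sorted(set(re.findall(r"[a-z]+", line.lower()))):
            positions[phrase].append(line_no)
    return dict(positions)

def appearance_frequency(positions):
    """Number of lines on which each phrase appears."""
    return {phrase: len(lines) for phrase, lines in positions.items()}

manual = "Set up the telephone.\nSelect the menu.\nSelect telephone, then call."
positions = extract_positions(manual)
print(positions["telephone"])                     # [1, 3]
print(appearance_frequency(positions)["select"])  # 2
```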

For each phrase X that appears N times in the business manual, the appearance dispersion calculation unit 125 takes the line number of the manual on which the phrase appears as its appearance position, and calculates the variance of the appearance positions of phrase X by dividing the sum of the differences between the appearance position of each occurrence Xn (n = 1 to N) and the average of the appearance positions of phrase X by the number of appearances N.

The calculation of the variance of the appearance position of a word by the appearance variance calculation unit 125 will be described using the following 10-line business manual as an example.

<Business manual>
1. How to set up the telephone.
2. The setting method when receiving a telephone call is as follows.
3. Select the menu, press the settings button, and select Telephone → Incoming call.
4. To set the ringtone, select the ringtone and then the ringtone file.
5. To reject incoming calls, select Reject incoming calls and enter the number to reject.
6. The setting method when making a telephone call is as follows.
7. Select the menu, press the settings button, and select Telephone → Call.
8. To assign a number when making a call, select the numbering setting.
9. Operation method
10. Others

In the business manual above, the phrase "telephone" appears on lines 1, 2, 3, 6, and 7. The appearance variance calculation unit 125 obtains the normalized average appearance position by dividing the average of the appearance positions of the phrase "telephone" by the total number of lines. Then, the appearance variance calculation unit 125 calculates the variance of the appearance positions of the phrase "telephone" by taking the squared average of the values obtained by subtracting the normalized average appearance position from each appearance position of the phrase "telephone" divided by the total number of lines.

In the above example, the normalized average appearance position is 0.38 (= (1 + 2 + 3 + 6 + 7) / 5 / 10). In addition, the variance of the appearance position of the phrase "telephone" is 0.21 (= ((1 / 10 - 0.38) + (2 / 10 - 0.38) + (3 / 10 - 0.38) + (6 / 10 - 0.38) + (7 / 10 - 0.38)) * ((1 / 10 - 0.38) + (2 / 10 - 0.38) + (3 / 10 - 0.38) + (6 / 10 - 0.38) + (7 / 10 - 0.38)) / 5). Similarly, for example, the variance of the appearance position of the phrase "menu" is calculated to be 0.2.
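As a rough sketch, the dispersion of normalized appearance positions can be computed as below. The sketch assumes the conventional definition of variance (the mean of the squared deviations from the normalized average position); the function name `position_variance` is hypothetical. Under this reading the resulting values are on a smaller scale than the figures quoted in the text, so the exact formula behind those figures should be treated as unconfirmed.

```python
def position_variance(line_numbers, total_lines):
    """Mean of squared deviations of normalized appearance positions.

    Assumption: the conventional population variance is used; the text's
    own formula may differ.
    """
    normalized = [n / total_lines for n in line_numbers]
    mean = sum(normalized) / len(normalized)
    return sum((p - mean) ** 2 for p in normalized) / len(normalized)

# "telephone" appears on lines 1, 2, 3, 6, 7 of the 10-line manual;
# "menu" appears on lines 3 and 7.
telephone = position_variance([1, 2, 3, 6, 7], total_lines=10)
menu = position_variance([3, 7], total_lines=10)
print(round(telephone, 4), round(menu, 4))  # 0.0536 0.04
```

Whatever the exact formula, the ordering is what matters for selection: a phrase spread over the whole manual yields a larger value than one confined to a few neighboring lines.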

When there are a plurality of business manuals, the appearance variance calculation unit 125 may, for example, calculate the variance of the appearance positions of a phrase for each business manual. Further, although the present embodiment has been described using an example in which the calculation unit 121 divides the business manual into lines and calculates the variance based on line numbers, the present disclosure is not limited to this. The calculation unit 121 may calculate the variance of the appearance positions of a phrase using, instead of line numbers, sentence numbers assigned serially from the beginning of the business manual.

Referring to FIG. 3 again, the phrase selection unit 126 selects whether or not the phrase in the business manual can be used in the search query for searching the business manual based on the calculated degree of dispersion (step S12).

The phrase selection unit 126 obtains, for example, the average of the appearance position variances of all phrases. Then, the phrase selection unit 126 selects, for example, a phrase whose appearance position variance is larger than that average as a phrase that cannot be used in the search query, and selects a phrase whose appearance position variance is less than or equal to that average as a phrase that can be used in the search query. In this way, the phrase selection unit 126 selects whether or not a phrase in the business manual, which is the document to be searched, can be used in the search query, based on the degree of dispersion of the phrase. When a plurality of business manuals exist, the phrase selection unit 126 averages the appearance position variances calculated for a phrase in each business manual, and selects whether or not the phrase can be used in the search query based on that average.
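Selection against the average variance can be sketched as follows. The function name `select_usable` and the variance values are hypothetical and chosen only to show the thresholding.

```python
def select_usable(variances):
    """Split phrases into usable / unusable using the mean variance as threshold."""
    threshold = sum(variances.values()) / len(variances)
    usable = {p for p, v in variances.items() if v <= threshold}
    unusable = set(variances) - usable
    return usable, unusable

# Hypothetical appearance position variances for illustration only.
variances = {"telephone": 0.21, "menu": 0.20, "ringtone": 0.0, "setting": 0.30}
usable, unusable = select_usable(variances)
print(sorted(usable))    # ['ringtone']
print(sorted(unusable))  # ['menu', 'setting', 'telephone']
```

For several business manuals, the per-manual variances of a phrase would first be averaged and the averaged value passed in as that phrase's entry.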

As described above, the selection method according to the present embodiment includes a step (calculation step) of calculating the degree of dispersion of the appearance positions of a phrase in the document to be searched (business manual), and a step (selection step) of selecting, based on the calculated degree of dispersion, whether or not the phrase in the document to be searched can be used in a search query for searching the document to be searched.

FIG. 4 is a flowchart showing an example of the operation of the priority determination unit 13, and is a diagram for explaining a priority determination method by the priority determination unit 13 as a priority determination device.

The priority determination unit 13 determines the priority of each question sentence, or of the phrases in the question sentences, based on the frequency of appearance, in each of a plurality of question sentences related to the business manual, of the phrases selected as usable in the search query by the selection method described with reference to FIG. 3 (step S21).

As described above, the priority determination method according to the present embodiment includes a step (priority determination step) of determining the priority of a question sentence, or of a phrase in the question sentence, based on the frequency of appearance, in each of a plurality of question sentences related to the document to be searched, of the phrases selected as usable in the search query by the selection method according to the present embodiment.
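The description does not give a concrete formula for the priority, so the following is only one possible sketch. It assumes that the priority of a phrase is the number of question sentences containing it, that the priority of a question sentence is the sum of the priorities of the usable phrases it contains, and that whitespace splitting stands in for word division. The function names are hypothetical.

```python
from collections import Counter

def phrase_priorities(questions, usable):
    """Count, for each usable phrase, how many question sentences contain it."""
    counts = Counter()
    for q in questions:
        counts.update(set(q.lower().split()) & usable)
    return counts

def question_priorities(questions, usable):
    """Score each question by summing the priorities of its usable phrases."""
    prio = phrase_priorities(questions, usable)
    return [sum(prio[w] for w in set(q.lower().split()) & usable) for q in questions]

# Hypothetical question sentences and usable phrases.
questions = [
    "how do i reset the password",
    "password reset fails after firmware update",
    "where is the router",
]
usable = {"password", "reset", "firmware"}
print(phrase_priorities(questions, usable))
print(question_priorities(questions, usable))
```

Phrases that were selected as unusable never contribute, so a question consisting only of such phrases receives priority zero.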

FIG. 5 is a flowchart showing an example of the operation of the answer candidate acquisition device 18, and is a diagram for explaining a method of acquiring answer candidates by the answer candidate acquisition device 18.

The question candidate acquisition unit 14 aggregates a plurality of question sentences output from the question determination unit 11 and acquires a question cluster (step S31).

In call center operations such as troubleshooting or product inquiries by telephone, the operator responds to customers by referring to FAQs that have been prepared. When preparing FAQs, the categories or genres of FAQs to be maintained are determined, questions are extracted for each category or genre, and the FAQs are maintained. Accordingly, the question determination unit 11 first narrows down the group of response log files by using, for example, a keyword such as "security". Next, using an utterance script for product sales or a product name as a keyword, the question determination unit 11 inputs the portion of the narrowed-down response log that follows the place where the keyword appears into a question determiner, which determines whether or not each utterance is a question utterance (question sentence), and thereby acquires question sentences (question candidate sentences).

The question candidate acquisition unit 14 extracts an arbitrary number of important words and phrases from the question candidate sentences. The extracted important words and phrases are words and phrases that characterize the question candidate sentences. The question candidate acquisition unit 14 uses the frequency of appearance of the words appearing in all question candidate sentences together with the business manual referred to when creating the FAQ answers, weights the extracted words by the inverse of their frequency of appearance in the business manual, and obtains a vector representation of each question candidate sentence. The question candidate acquisition unit 14 acquires a question cluster from the vectorized question candidate sentences. The question candidate acquisition unit 14 acquires the question cluster after excluding the question sentences included in the existing (prepared) FAQs.
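The weighting described above resembles a TF-IDF style scheme and can be sketched as follows. The exact weighting and the clustering method are not specified in the description, so this example assumes a weight of (count in the question) / (1 + count in the manual), where the added 1 is a smoothing term to avoid division by zero, and uses cosine similarity as a comparison measure that a clustering step could build on. All names are hypothetical.

```python
import math
from collections import Counter

def weighted_vector(question, manual_counts):
    """Sparse vector: term count in the question, scaled by inverse manual frequency."""
    tf = Counter(question.lower().split())
    return {w: c / (1 + manual_counts.get(w, 0)) for w, c in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical word counts taken from a business manual.
manual_counts = Counter("the router the password the firmware the router".split())
v1 = weighted_vector("reset the password", manual_counts)
v2 = weighted_vector("password reset fails", manual_counts)
print(v1)
print(cosine(v1, v2))
```

Words that are frequent in the manual, such as "the" here, receive small weights, so the similarity between two questions is driven mainly by their rarer words.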

For infrequent question candidate sentences, such as those in small clusters, the question candidate acquisition unit 14 may also include, as question candidate sentences, portions of the response logs near the point where the question sentence appears in which time is spent repeatedly confirming and answering the question. Specifically, the question candidate acquisition unit 14 may extract question candidate sentences by using as a score the number of other question sentences that appear, in the response log from which the question sentence was extracted, after the utterance in which the question sentence appears. By doing this, important questions that appear infrequently can be extracted.

Next, the search query generation unit 15 generates a search query based on the priority, determined by the priority determination method according to the present embodiment, of the plurality of question sentences constituting the question cluster or of the phrases in those question sentences (step S32). For example, the search query generation unit 15 calculates a higher query score for a question sentence determined to have a higher priority, removes from a question sentence with a high query score the phrases selected by the selection unit 12 as unusable in the search query, and generates a search query. Further, the search query generation unit 15 calculates a higher query score for phrases that are selected as usable in the search query, appear more frequently in the question sentences, and have a higher priority, and generates a search query using phrases with high query scores. In this way, the search query generation unit 15 generates a search query without using the phrases selected as unusable in the search query.
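The first variant described above, in which unusable phrases are removed from a high-scoring question sentence, can be sketched as follows. This sketch assumes that the query score of a question sentence equals its priority and that whitespace splitting stands in for word division. The function name `generate_query` is hypothetical.

```python
def generate_query(questions, priorities, unusable):
    """Build a query from the highest-priority question, dropping unusable phrases.

    Assumption: the query score of a question equals its priority.
    Returns (query_string, query_score).
    """
    best = max(range(len(questions)), key=lambda i: priorities[i])
    words = [w for w in questions[best].lower().split() if w not in unusable]
    return " ".join(words), priorities[best]

# Hypothetical inputs: two question sentences with already determined priorities.
questions = ["how to reset the router password", "router password reset fails"]
priorities = [2, 3]
unusable = {"router"}  # e.g. a phrase that appears throughout the manual
query, score = generate_query(questions, priorities, unusable)
print(query, score)
```

Calling the function on the next-best question sentences in turn would yield several queries from one question cluster.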

By deciding whether or not a phrase can be used in the search query based on the degree of dispersion of its appearance positions in the business manual, phrases that appear frequently throughout the document can be excluded from the search query without excluding phrases that are concentrated in a specific range of the document to be searched. Therefore, phrases that are more suitable for a search query for the document to be searched can be selected.

The answer candidate acquisition unit 16 searches the business manual using the generated search query and acquires the search result (step S33).

The pair output unit 17 calculates a search score for each of the acquired search results based on the frequency of appearance of phrases in the plurality of search results, and determines the output order of the search results for the search query based on the calculated search scores (step S34).
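The description does not state how the search score is derived from the frequency of appearance, so the following sketch assumes one simple possibility: each word is weighted by the number of search results that contain it, and the score of a result is the sum of the weights of its distinct words. The function name `rank_results` is hypothetical.

```python
from collections import Counter

def rank_results(results):
    """Return (search_score, result) pairs sorted by descending score.

    Assumption: a word's weight is the number of results that contain it.
    """
    df = Counter()
    for r in results:
        df.update(set(r.lower().split()))
    scored = [(sum(df[w] for w in set(r.lower().split())), r) for r in results]
    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical search results used only for illustration.
results = [
    "reset password from settings",
    "reset password by email",
    "check cable connection",
]
for score, text in rank_results(results):
    print(score, text)
```

Under this assumption, results that share vocabulary with many other results are output first, and an outlier such as the third result is output last.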

As described above, the answer candidate acquisition method according to the present embodiment includes a step of acquiring question candidates (question acquisition step), a step of generating a search query (search query generation step), a step of acquiring search results (search result acquisition step), and a step of controlling the output order of the search results (output step). In the question acquisition step, a question cluster consisting of a plurality of question sentences related to the document to be searched is acquired. In the search query generation step, query scores of the plurality of question sentences constituting the acquired question cluster, or of the phrases in those question sentences, are calculated based on the priority of the question sentences or phrases determined by the priority determination method according to the present embodiment, and a search query is generated based on the calculated query scores. In the search result acquisition step, the document to be searched is searched using the generated search query, and search results are acquired. In the output step, a search score is calculated for each of the acquired search results based on the frequency of appearance of phrases in the plurality of search results, and the output order of the search results for the search query is determined based on the calculated search scores.

In the search query generation step, a plurality of search queries may be generated from one question cluster. In this case, in the search result acquisition step, the business manual may be searched using each of the generated search queries to acquire search results. In the output step, a search score may be calculated for each search result obtained with the plurality of search queries, and the output order of the search results may be determined based on the query score of each search query and the search score of the search results obtained by the search using that search query.
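How the query score and the search score are combined is not specified, so the following sketch assumes the product of the two scores and, when the same result is returned by several queries, keeps its best combined score. The function name `merge_rankings` and the input layout are hypothetical.

```python
def merge_rankings(per_query):
    """Merge the results of several queries into one output order.

    `per_query` is a list of (query_score, [(search_score, result), ...]).
    Assumption: combined score = query_score * search_score, and a result
    returned by several queries keeps its highest combined score.
    """
    best = {}
    for q_score, hits in per_query:
        for s_score, result in hits:
            combined = q_score * s_score
            if combined > best.get(result, float("-inf")):
                best[result] = combined
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores: two queries, with result "B" returned by both.
per_query = [
    (5, [(6, "A"), (3, "B")]),
    (2, [(6, "B"), (4, "C")]),
]
print(merge_rankings(per_query))
```

A weighted sum would be an equally plausible combination; the product is used here only because it lets a low query score damp all results of that query.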

As described above, in the present embodiment, whether or not a phrase in the document to be searched can be used in the search query is selected based on the degree of dispersion of the appearance positions of the phrase in the document to be searched. By doing so, phrases that appear frequently throughout the document can be excluded from the search query without excluding phrases that are concentrated in a specific range of the document to be searched. Therefore, phrases that are more suitable for a search query for the document to be searched can be selected.

Each part of the above-described search system 10 can preferably be implemented by a computer. Such a computer can be realized by storing, in a storage unit of the computer, a program describing the processing contents that realize the functions of each part of the search system 10, and having the CPU (Central Processing Unit) of the computer read and execute this program. That is, the program can cause the computer to function as the selection device 12 described above. The program can also cause the computer to function as the priority determination device 13 described above. Alternatively, the program can cause the computer to function as the answer candidate acquisition device 18 described above.

Further, this program may be recorded on a computer-readable medium. The program can be installed on a computer by using the computer-readable medium. Here, the computer-readable medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited, but may be, for example, a recording medium such as a CD-ROM or a DVD-ROM. This program can also be provided via a network.

The present disclosure is not limited to the configurations specified in the above-described embodiments, and various modifications can be made without departing from the gist of the invention described in the claims. For example, the functions included in each component can be rearranged so long as no logical inconsistency arises, and a plurality of components can be combined into one or a single component can be divided.

10 Search system
11 Question determination unit
12 Selection unit (selection device)
13 Priority determination unit (priority determination device)
14 Question candidate acquisition unit (question acquisition unit)
15 Search query generation unit
16 Answer candidate acquisition unit
17 Pair output unit (output unit)
18 Answer candidate acquisition device
121 Calculation unit
122 Business manual division unit
123 Word division unit
124 Appearance frequency calculation unit
125 Appearance variance calculation unit
126 Phrase selection unit

Claims

1. A selection device comprising:
a calculation unit that calculates a degree of dispersion of appearance positions of a phrase in a document to be searched; and
a phrase selection unit that selects, based on the degree of dispersion calculated by the calculation unit, whether or not the phrase in the document to be searched can be used in a search query for searching the document to be searched.
2. A priority determination device that determines a priority of a plurality of question sentences related to the document to be searched, or of phrases in the plurality of question sentences, based on a frequency of appearance of phrases selected as usable in the search query by the selection device according to claim 1.
3. An answer candidate acquisition device comprising:
a question acquisition unit that acquires a question cluster consisting of a plurality of question sentences related to the document to be searched;
a search query generation unit that inputs the plurality of question sentences constituting the question cluster to the priority determination device according to claim 2, calculates query scores of the plurality of question sentences or of the phrases in the plurality of question sentences based on the priority of the plurality of question sentences or of the phrases in the plurality of question sentences determined by the priority determination device, and generates the search query based on the calculated query scores;
an answer candidate acquisition unit that searches the document to be searched using the generated search query and acquires search results; and
an output unit that calculates, for each of the plurality of acquired search results, a search score based on a frequency of appearance of phrases in the plurality of search results, and determines an output order of the search results for the search query based on the calculated search scores.
4. The answer candidate acquisition device according to claim 3, wherein
the search query generation unit generates a plurality of the search queries,
the answer candidate acquisition unit searches the document to be searched using each of the plurality of generated search queries and acquires the search results, and
the output unit calculates the search score for each search result obtained using the plurality of search queries, and determines the output order of the search results based on the query score of each search query and the search score of the search results obtained by the search using that search query.
5. A selection method comprising:
a calculation step of calculating a degree of dispersion of appearance positions of a phrase in a document to be searched; and
a selection step of selecting, based on the calculated degree of dispersion, whether or not the phrase in the document to be searched can be used in a search query for searching the document to be searched.
6. A priority determination method comprising a priority determination step of determining a priority of a plurality of question sentences related to the document to be searched, or of phrases in the plurality of question sentences, based on a frequency of appearance of phrases selected as usable in the search query by the selection method according to claim 5.
7. An answer candidate acquisition method comprising:
a question acquisition step of acquiring a question cluster consisting of a plurality of question sentences related to the document to be searched;
a search query generation step of calculating query scores of the plurality of question sentences constituting the question cluster, or of the phrases in the plurality of question sentences, based on the priority of the plurality of question sentences or of the phrases in the plurality of question sentences determined by the priority determination method according to claim 6, and generating the search query based on the calculated query scores;
a search result acquisition step of searching the document to be searched using the generated search query and acquiring search results; and
an output step of calculating, for each of the plurality of acquired search results, a search score based on a frequency of appearance of phrases in the plurality of search results, and determining an output order of the search results for the search query based on the calculated search scores.
8. The answer candidate acquisition method according to claim 7, wherein
in the search query generation step, a plurality of the search queries are generated,
in the search result acquisition step, the document to be searched is searched using each of the plurality of generated search queries and the search results are acquired, and
in the output step, the search score is calculated for each search result obtained using the plurality of search queries, and the output order of the search results is determined based on the query score of each search query and the search score of the search results obtained by the search using that search query.
9. A program for causing a computer to function as the selection device according to claim 1, the priority determination device according to claim 2, or the answer candidate acquisition device according to claim 3 or 4.