WO2021234844A1 - Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program - Google Patents

Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program

Info

Publication number
WO2021234844A1
WO2021234844A1 PCT/JP2020/019902 JP2020019902W WO2021234844A1 WO 2021234844 A1 WO2021234844 A1 WO 2021234844A1 JP 2020019902 W JP2020019902 W JP 2020019902W WO 2021234844 A1 WO2021234844 A1 WO 2021234844A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
question
query
search query
words
Prior art date
Application number
PCT/JP2020/019902
Other languages
French (fr)
Japanese (ja)
Inventor
知史 三枝
裕一郎 関口
Original Assignee
日本電信電話株式会社
Priority date
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2020/019902
Publication of WO2021234844A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation

Definitions

  • This disclosure relates to a selection device, a priority determination device, an answer candidate acquisition device, a selection method, a priority determination method, an answer candidate acquisition method, and a program.
  • A call center administrator or similar person reviews the response log (recorded data, text data obtained by voice recognition of the recorded data, etc.) that records the operator's exchanges with customers, and adds items from it to the FAQ.
  • However, the contents to be maintained as FAQ may change, and maintaining the FAQ is costly.
  • Patent Document 1 describes a method of maintaining FAQ questions based on text obtained by voice recognition of recorded data of an operator's responses to customers, the existing FAQ, and information on whether the FAQ was useful when the operator searched it.
  • In this method, the search result for a certain query is presented to the operator, and the operator gives information as to whether or not the search result is useful.
  • Questions (FAQ candidates) extracted from the recorded data and the questions that have already been prepared are aggregated into sets of similar questions, and among the aggregated sets, those that do not include any already prepared question are extracted as questions that should be added to the FAQ.
  • As a method for the operator to obtain the answer to a question from a customer, the operator can search the business manual used as a reference for answers with a search query corresponding to the question.
  • If a search query contains words and phrases that appear frequently throughout the business manual, items that are not relevant as answers to the question may be returned as search results. Therefore, to remove words and phrases scattered across a document, a method that uses the appearance information of the words and phrases is often used (see Non-Patent Document 1).
  • In this method, for example, even if a phrase appears frequently in a certain passage of the business manual, the priority of the phrase is lowered if it also appears frequently throughout the business manual.
  • Patent Document 2 describes a method of obtaining the section in which a customer states a request from a voice log recording a conversation between an operator and a customer, using the word appearance density, but this method is not aimed at making searches more appropriate.
  • To obtain answer candidates for FAQ maintenance from the response log, a search query for searching the business manual must be created.
  • When the method described in Non-Patent Document 1 is used to create the search query, words and phrases that appear concentrated in a specific range (such as a chapter or section) of the business manual are excluded from the search query, and appropriate search results cannot be obtained.
  • The purpose of the present disclosure, made in view of the above problems, is to provide a selection device, a priority determination device, an answer candidate acquisition device, a selection method, a priority determination method, an answer candidate acquisition method, and a program that can select phrases more suitable for a search query for searching a document to be searched.
  • The selection device according to the present disclosure includes a calculation unit that calculates the degree of dispersion of the appearance positions of words and phrases in the document to be searched, and a phrase selection unit that selects, based on the degree of dispersion calculated by the calculation unit, whether a phrase in the document to be searched can be used in a search query for searching that document.
  • The priority determination device according to the present disclosure determines the priority of a plurality of question sentences related to the document to be searched, or of words and phrases in those question sentences, based on the frequency with which the phrases selected as usable in the search query by the above selection device appear in each of the question sentences.
  • The answer candidate acquisition device according to the present disclosure includes: a question acquisition unit that acquires a question cluster consisting of a plurality of question sentences related to the document to be searched; a search query generation unit that inputs the plurality of question sentences constituting the question cluster to the above priority determination device, calculates a query score for the question sentences or for the words and phrases in them based on the priority determined by the priority determination device, and generates the search query based on the calculated query score; a search result acquisition unit that searches the document to be searched using the generated search query and acquires search results; and an output unit that calculates a search score for each of the acquired search results based on the frequency of appearance of words and phrases in the plurality of search results, and determines the output order of the search results for the search query based on the calculated search score.
  • The selection method according to the present disclosure includes a calculation step of calculating the degree of dispersion of the appearance positions of words and phrases in the document to be searched, and a selection step of selecting, based on the calculated degree of dispersion, whether a phrase in the document can be used in a search query for searching the document to be searched.
  • The priority determination method according to the present disclosure includes a priority determination step of determining the priority of a plurality of question sentences related to the document to be searched, or of words and phrases in those question sentences, based on the frequency with which the phrases selected as usable in the search query by the above selection method appear in each of the question sentences.
  • The answer candidate acquisition method according to the present disclosure includes: a question acquisition step of acquiring a question cluster consisting of a plurality of question sentences related to the document to be searched; a search query generation step of calculating, based on the priority of the plurality of question sentences or of the words and phrases in them determined by the above priority determination method, a query score for the question sentences or for the words and phrases in them, and generating the search query based on the calculated query score; a search result acquisition step of searching the document to be searched using the generated search query and acquiring search results; and an output step of calculating a search score for each of the acquired search results based on the frequency of appearance of words and phrases in the plurality of search results, and determining the output order of the search results for the search query based on the calculated search score.
  • The program according to the present disclosure causes a computer to function as the above-mentioned selection device, priority determination device, or answer candidate acquisition device.
  • According to the selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method, and program according to the present disclosure, words and phrases more suitable for a search query for searching a document to be searched can be selected.
  • FIG. 1 is a diagram showing a configuration example of the search system 10 according to the embodiment of the present disclosure.
  • the search system 10 according to the present embodiment generates a search query for searching a document to be searched, searches the document to be searched using the generated search query, and outputs a search result.
  • In the following, an example is described in which the business manual that the operator refers to when answering questions from customers is the document to be searched, a search query for searching the business manual is generated, and the business manual is searched using the generated search query. An FAQ here is a pair of an anticipated question and its answer.
  • The search system 10 has a question determination unit 11, a selection unit 12 as a selection device, a priority determination unit 13 as a priority determination device, a question candidate acquisition unit 14, a search query generation unit 15, an answer candidate acquisition unit 16, and a pair output unit 17.
  • the question candidate acquisition unit 14 is an example of the question acquisition unit.
  • the pair output unit 17 is an example of an output unit.
  • the question candidate acquisition unit 14, the search query generation unit 15, the answer candidate acquisition unit 16, and the pair output unit 17 constitute an answer candidate acquisition device 18.
  • the question determination unit 11 inputs a response log recording the response of the operator to the customer.
  • the response log is, for example, recorded data obtained by recording a dialogue between an operator and a customer, text data obtained by converting the recorded data into text by voice recognition, and the like.
  • the question determination unit 11 acquires the question utterance (question sentence) in which the customer utters the question from the input response log.
  • The question determination unit 11 obtains voice recognition results from the response log and determines, for example with a trained classifier model, whether each voice recognition result is a question utterance. Any known determination device can be used for the determination.
  • the question determination unit 11 outputs the acquired plurality of question utterances (question sentences) to the question candidate acquisition unit 14.
  • the selection unit 12 selects whether or not a word (word or phrase) in the business manual, which is the document to be searched, can be used in the search query, and outputs the selection result to the priority determination unit 13.
  • FIG. 2 is a diagram showing a configuration example of the selection unit 12.
  • the selection unit 12 as a selection device includes a calculation unit 121 and a phrase selection unit 126.
  • the calculation unit 121 calculates the degree of dispersion of the appearance positions of words and phrases in the business manual that is the document to be searched. As shown in FIG. 2, the calculation unit 121 includes a business manual division unit 122, a phrase division unit 123, an appearance frequency calculation unit 124, and an appearance dispersion calculation unit 125.
  • the business manual division unit 122 divides the business manual into line units.
  • the word / phrase division unit 123 divides the business manual divided into lines by the business manual division unit 122 into word / phrase units.
  • the appearance frequency calculation unit 124 calculates the appearance frequency in the business manual for each word / phrase divided by the word / phrase division unit 123.
  • the appearance dispersion calculation unit 125 calculates the degree of dispersion of the appearance position in the business manual for each word based on the appearance frequency and the appearance position of each word calculated by the appearance frequency calculation unit 124.
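  • The division and counting stages of the calculation unit 121 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `index_phrases` is hypothetical, and a lowercase word regex stands in for real phrase division (an assumption made so the sketch runs without a morphological analyzer).

```python
import re
from collections import defaultdict


def index_phrases(manual_text):
    """Map each phrase to the 1-based line numbers on which it appears."""
    positions = defaultdict(list)
    lines = manual_text.splitlines()  # division of the manual into line units
    for line_no, line in enumerate(lines, start=1):
        # Phrase division: a simple word regex is an assumed stand-in here.
        for phrase in re.findall(r"[a-z]+", line.lower()):
            positions[phrase].append(line_no)
    return dict(positions), len(lines)


manual = (
    "How to set up the phone\n"
    "Select the menu and press settings\n"
    "Select the ringtone file"
)
positions, total_lines = index_phrases(manual)
# Appearance frequency is the number of recorded appearance positions.
frequency = {phrase: len(lines) for phrase, lines in positions.items()}
```

  • The resulting appearance positions and frequencies are the inputs from which a dispersion of appearance positions can then be computed per phrase.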
  • The phrase selection unit 126 selects whether each phrase in the business manual can be used in the search query, based on the degree of dispersion of the appearance position of the phrase calculated by the calculation unit 121, and outputs the selection result to the priority determination unit 13.
  • The phrase selection unit 126 selects a phrase whose appearance position variance is larger than a predetermined threshold value as a phrase that cannot be used in a search query, and selects a phrase whose appearance position variance is less than or equal to the predetermined threshold value as a phrase that can be used in a search query.
  • Words and phrases with a large dispersion of appearance positions are words and phrases that appear frequently throughout the business manual. Including such words in a search query is likely to prevent an appropriate answer to the question from being obtained.
  • By selecting words and phrases with a large dispersion of appearance positions as unusable in the search query, words and phrases that appear frequently throughout the business manual can be excluded from the search query.
  • By selecting words and phrases with a small dispersion of appearance positions as usable in the search query, words and phrases that appear frequently in a specific range can be included in the search query even if they appear throughout the business manual. It is therefore possible to select phrases that are more suitable for the search query, which makes it easier to obtain appropriate answers to the questions.
  • The priority determination unit 13, as the priority determination device, determines the priority with which question sentences, or the words and phrases in them, are used for the search query, based on the frequency with which the phrases selected by the selection unit 12 as usable in the search query appear in each of a plurality of question sentences related to the business manual, and outputs the result to the search query generation unit 15.
  • The priority determination unit 13 raises, for example, the priority of a question sentence that includes a phrase that is selected as usable in a search query and appears frequently in the plurality of question sentences. Further, among the phrases selected as usable in the search query, the priority determination unit 13 raises, for example, the priority of a phrase the more frequently it appears in the plurality of question sentences.
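  • One way to picture this priority determination is sketched below. The text does not give a formula, so the choices here are assumptions: a phrase's priority is the number of question sentences that contain it, and a sentence's priority is the sum of the priorities of its usable phrases. The function names are hypothetical.

```python
from collections import Counter


def phrase_priorities(question_phrases, usable):
    """Count, for each usable phrase, how many question sentences contain it."""
    counts = Counter()
    for phrases in question_phrases:
        counts.update(set(phrases) & usable)  # ignore unusable phrases
    return counts


def sentence_priority(phrases, priorities):
    """Sum the priorities of the distinct phrases in one question sentence."""
    # Counter returns 0 for phrases that were not selected as usable.
    return sum(priorities[p] for p in set(phrases))


questions = [
    ["ringtone", "setting", "phone"],
    ["ringtone", "file"],
    ["reject", "call", "phone"],
]
usable = {"ringtone", "file", "reject"}
priorities = phrase_priorities(questions, usable)
ranked = sorted(questions, key=lambda q: sentence_priority(q, priorities), reverse=True)
```

  • In this toy input, the sentence containing both "ringtone" and "file" ranks first because both phrases are usable and "ringtone" recurs across sentences.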
  • the question candidate acquisition unit 14 aggregates a plurality of question utterances (question sentences) output from the question determination unit 11 for each similar question sentence, and acquires a question cluster composed of a plurality of similar question sentences.
  • the question texts that make up the question cluster are candidate question texts (additional candidate question texts) that are added to the FAQ.
  • One method for finding similar question sentences is to extract the words in each question sentence (question utterance), add the word2vec word vectors of the extracted words to obtain an utterance vector, and compare the obtained utterance vectors using cosine similarity.
  • the question candidate acquisition unit 14 outputs the acquired question cluster to the search query generation unit 15.
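  • The aggregation of similar question sentences can be sketched as follows. Several parts are assumptions for illustration only: the two-dimensional `WORD_VECTORS` table is a hypothetical stand-in for a trained word2vec model, the greedy grouping rule is not taken from the text, and the 0.8 threshold is arbitrary.

```python
import math

# Hypothetical word vectors standing in for a trained word2vec model.
WORD_VECTORS = {
    "ringtone": (1.0, 0.0),
    "sound": (0.9, 0.1),
    "reject": (0.0, 1.0),
    "block": (0.1, 0.9),
}


def utterance_vector(words):
    """Add the word vectors of the known words in one question sentence."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    return tuple(sum(axis) for axis in zip(*known)) if known else (0.0, 0.0)


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(questions, threshold=0.8):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed vector, member sentences)
    for question in questions:
        vec = utterance_vector(question)
        for seed_vec, members in clusters:
            if cosine(vec, seed_vec) >= threshold:
                members.append(question)
                break
        else:
            clusters.append((vec, [question]))
    return [members for _, members in clusters]


groups = cluster([["ringtone"], ["sound", "ringtone"], ["reject"], ["block"]])
```

  • Each returned group plays the role of one question cluster; any other clustering method over the same utterance vectors could be substituted.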
  • the search query generation unit 15 generates a search query for searching a business manual from the question sentences constituting the question cluster output from the question candidate acquisition unit 14.
  • the search query generation unit 15 inputs a plurality of question sentences constituting the question cluster into the priority determination unit 13.
  • Based on the priority of the question sentences, or of the phrases included in them, determined by the priority determination unit 13, the search query generation unit 15 calculates a query score indicating the importance of each question sentence or phrase to the search query, and generates a search query based on the calculated query score.
  • Among the plurality of question sentences constituting the question cluster, the search query generation unit 15 calculates a higher query score for a question sentence with a higher priority determined by the priority determination unit 13, and determines the representative question sentence of the question cluster.
  • As described above, the priority determination unit 13 raises, for example, the priority of a question sentence that includes a phrase that is selected as usable in a search query and appears frequently in the plurality of question sentences. The search query generation unit 15 therefore calculates a high query score for a question sentence that, among the question sentences constituting the question cluster, includes phrases usable in the search query that also appear frequently in the other question sentences (that is, a question sentence close to the vector average of the plurality of question sentences), and determines that question sentence as the representative question sentence.
  • the search query generation unit 15 may determine two or more question sentences as representative question sentences from a plurality of question sentences constituting the question cluster.
  • the search query generation unit 15 generates a search query by excluding words and phrases selected as unusable for the search query by the selection unit 12 from the determined representative question text.
  • the search query generation unit 15 may generate a search query based on the priority of words and phrases determined by the priority determination unit 13.
  • The search query generation unit 15 calculates, for example, a higher query score for a phrase with a higher priority, and generates a search query using the phrases with high query scores.
  • As described above, among the phrases selected as usable in the search query, the priority determination unit 13 raises, for example, the priority of a phrase the more frequently it appears in the plurality of question sentences. Therefore, assuming for example that the number of question sentences constituting the question cluster is M, the search query generation unit 15 generates a search query using the words and phrases included in M/2 or more of the question sentences.
  • the search query generation unit 15 outputs the generated search query to the answer candidate acquisition unit 16.
  • the search query generation unit 15 may output the query score of the search query (the query score of the question sentence or phrase that is the basis of the search query generation) to the answer candidate acquisition unit 16.
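  • The M/2 rule for building a query from phrases can be sketched as below. The function name `build_query` and the sample cluster are hypothetical, and returning the phrases in sorted order is an arbitrary choice made only to keep the output deterministic.

```python
from collections import Counter


def build_query(question_phrases, unusable):
    """Keep usable phrases that occur in at least M/2 of the M question sentences."""
    m = len(question_phrases)
    counts = Counter()
    for phrases in question_phrases:
        counts.update(set(phrases))  # count each phrase once per sentence
    return sorted(
        phrase
        for phrase, count in counts.items()
        if count >= m / 2 and phrase not in unusable
    )


question_cluster = [
    ["ringtone", "change", "phone"],
    ["ringtone", "file", "phone"],
    ["ringtone", "change"],
    ["phone", "volume"],
]
query = build_query(question_cluster, unusable={"phone"})
```

  • Here "phone" meets the M/2 condition but is dropped because it was marked unusable, which mirrors the exclusion of phrases rejected by the selection unit.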
  • the answer candidate acquisition unit 16 searches the business manual using the search query generated by the search query generation unit 15 and acquires the search results.
  • the answer candidate acquisition unit 16 searches the business manual using each of the plurality of search queries and acquires the search results.
  • the search of the business manual using the search query can be performed by, for example, a search in sentence units or a search for partial documents in which sentences are combined in a certain unit.
  • the answer candidate acquisition unit 16 may search the business manual by combining a plurality of methods as described above.
  • the answer candidate acquisition unit 16 may acquire a plurality of (top N) search results.
  • The answer candidate acquisition unit 16 may use a phrase with a small variance of appearance position, or a phrase appearing in the table of contents of the business manual, to narrow down the search range.
  • By narrowing down the search range with these phrases and then searching with a search query from which the phrases used for narrowing have been removed, the answer candidate acquisition unit 16 can search a range that is effective for FAQ maintenance.
  • the answer candidate acquisition unit 16 outputs the search query and the search result acquired by the search using the search query to the pair output unit 17.
  • the answer candidate acquisition unit 16 usually acquires a plurality of search results, and outputs the acquired plurality of search results to the pair output unit 17.
  • the answer candidate acquisition unit 16 may output the query score of the search query to the pair output unit 17.
  • the pair output unit 17 calculates a search score for each of the plurality of search results for the search query acquired by the answer candidate acquisition unit 16 based on the frequency of appearance of words and phrases in the plurality of search results.
  • the pair output unit 17 determines the output order of the search results for the search query based on the calculated search score, and outputs the pair of the search query and the search result in the determined order.
  • The pair output unit 17 calculates, for example, a higher search score for a search result that overlaps more with the other search results. The pair output unit 17 may also calculate the search score of each search result according to how frequently the words and phrases it contains appear across the plurality of search results.
  • The pair output unit 17 may control the output order of the search results based on the query score of the search query and the search score of the search result obtained with that search query.
  • For example, the pair output unit 17 may control the output order of the pairs of search query and search result based on the product of the query score and the search score.
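  • A minimal sketch of this scoring and ordering follows. The exact scoring formula is not given in the text, so summing cross-result phrase counts is an assumption, as are the function names and the tuple layout `(query, result, query_score, search_score)`.

```python
from collections import Counter


def search_scores(results):
    """Score each result by how often its phrases occur across all results."""
    totals = Counter(phrase for phrases in results for phrase in phrases)
    return [sum(totals[p] for p in set(phrases)) for phrases in results]


def order_pairs(pairs):
    """Sort (query, result, query_score, search_score) by score product, highest first."""
    return sorted(pairs, key=lambda pair: pair[2] * pair[3], reverse=True)


results = [["ringtone", "file"], ["ringtone", "menu"], ["call"]]
scores = search_scores(results)

pairs = [
    ("q1", "r1", 0.5, 3),
    ("q2", "r2", 1.0, 2),
    ("q1", "r3", 0.5, 1),
]
ordered = order_pairs(pairs)
```

  • Results that share phrases with other results score higher, and multiplying by the query score lets a strong query lift its results above those of a weaker one.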
  • FIG. 3 is a flowchart showing an example of the operation of the selection unit 12, and is a diagram for explaining a selection method by the selection unit 12 as a selection device.
  • the calculation unit 121 calculates the degree of dispersion of the appearance positions of words and phrases in the business manual that is the document to be searched (step S11). The calculation of the variance of the appearance position will be described in more detail, focusing on the operation of the appearance variance calculation unit 125.
  • The business manual consists of a single HTML (HyperText Markup Language) file or text file.
  • The phrase division unit 123 extracts phrases (nouns and the like) by morphological analysis with a tool such as MeCab, and acquires each extracted phrase together with the line number on which it appears.
  • the appearance frequency calculation unit 124 calculates the appearance frequency of the extracted words and phrases.
  • the appearance variance calculation unit 125 calculates the variance of the appearance position of the phrase from the line number where the extracted phrase appears.
  • the variance of the appearance position of the phrase X is calculated by dividing the sum of the differences from the average value of the appearance positions of X by the number of appearances N of the phrase X.
  • the calculation of the variance of the appearance position of a word by the appearance variance calculation unit 125 will be described using the following 10-line business manual as an example.
  • <Business manual>
    1. How to set up the phone.
    2. The setting method when receiving a call is as follows.
    3. Select the menu, press the settings button, and select Phone → Incoming call.
    4. To set the ringtone, select the ringtone and then the ringtone file.
    5. To reject incoming calls, select Reject incoming calls and enter the number to reject.
    6. The setting method when making a call is as follows.
    7. Select the menu, press the settings button, and select Phone → Call.
    8. To assign a number when making a call, select the numbering setting.
    9. Operation method
    10. Others
  • The appearance variance calculation unit 125 obtains the normalized average appearance position by dividing the average of the appearance positions of the phrase "phone" by the total number of lines. It then divides each appearance position of "phone" by the total number of lines, subtracts the normalized average appearance position, and takes the squared average of the resulting values, which gives the variance of the appearance position of the phrase "phone".
  • The variance of the appearance position of the phrase "menu" is calculated to be 0.2.
  • The appearance variance calculation unit 125 may calculate, for example, the variance of the appearance position of a phrase for each business manual. In the present embodiment, the calculation unit 121 has been described using an example in which the business manual is divided into line units and the variance is calculated based on the line number, but the present invention is not limited to this. The calculation unit 121 may calculate the variance of the appearance position of a phrase using sentence numbers assigned serially from the beginning of the business manual instead of line numbers.
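  • The dispersion calculation can be checked numerically for the phrase "menu", which appears on lines 3 and 7 of the 10-line example manual. The sketch below is one reading of the description, not a confirmed formula: it normalizes line numbers by the total number of lines and returns both the mean squared deviation and its square root. Under these assumptions the stated value of 0.2 coincides with the square root (the mean squared deviation itself is 0.04); which quantity the disclosure intends is not clear from this text. The function name is hypothetical.

```python
import math


def position_dispersion(line_numbers, total_lines):
    """Return (mean squared deviation, its square root) of normalized positions."""
    normalized = [n / total_lines for n in line_numbers]
    mean = sum(normalized) / len(normalized)  # normalized average position
    mean_square = sum((x - mean) ** 2 for x in normalized) / len(normalized)
    return mean_square, math.sqrt(mean_square)


# "menu" appears on lines 3 and 7 of the 10-line example manual.
mean_square, root = position_dispersion([3, 7], total_lines=10)
```

  • A phrase confined to neighboring lines yields a value near zero, while a phrase spread from the first to the last line yields a larger value, which is the property the selection step relies on.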
  • the phrase selection unit 126 selects whether or not the phrase in the business manual can be used in the search query for searching the business manual based on the calculated degree of dispersion (step S12).
  • The phrase selection unit 126 obtains, for example, the average of the appearance position variances of all phrases. It then selects a phrase whose appearance position variance is larger than that average as a phrase that cannot be used in the search query, and selects a phrase whose appearance position variance is less than or equal to that average as a phrase that can be used in the search query. In this way, the phrase selection unit 126 selects whether each phrase in the business manual, which is the document to be searched, can be used in the search query, based on the degree of dispersion of the phrase. When a plurality of business manuals exist, the phrase selection unit 126 averages the appearance position variances calculated for one phrase in each business manual, and selects whether the phrase can be used in the search query based on that average.
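  • The average-based selection, including averaging over several manuals, can be sketched as follows. The function name `select_usable` and the dispersion values are made up for illustration; only the rule (usable when at or below the average over all phrases) follows the description.

```python
def select_usable(dispersion_by_manual):
    """Split phrases into usable and unusable sets by the average dispersion.

    dispersion_by_manual maps each phrase to a list of dispersion values,
    one per business manual.
    """
    # Average each phrase's dispersion over the manuals.
    averaged = {p: sum(v) / len(v) for p, v in dispersion_by_manual.items()}
    # The threshold is the average over all phrases.
    threshold = sum(averaged.values()) / len(averaged)
    usable = {p for p, d in averaged.items() if d <= threshold}
    return usable, set(averaged) - usable


# Illustrative values only; they are not taken from the example manual.
dispersions = {
    "phone": [0.5, 0.25],
    "menu": [0.125, 0.25],
    "ringtone": [0.0, 0.125],
}
usable, unusable = select_usable(dispersions)
```

  • Because the threshold is derived from the manuals themselves, no fixed cutoff has to be tuned per document set in this variant.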
  • the selection method according to the present embodiment is based on the step (calculation step) of calculating the degree of dispersion of the appearance position of the word in the document (business manual) to be searched and the calculated degree of dispersion. Includes a step (selection step) of selecting whether or not the phrase in the document can be used in a search query for searching the document to be searched.
  • FIG. 4 is a flowchart showing an example of the operation of the priority determination unit 13, and is a diagram for explaining a priority determination method by the priority determination unit 13 as a priority determination device.
  • The priority determination unit 13 determines the priority of each question sentence, or of the words and phrases in it, based on the frequency with which the phrases selected as usable in the search query by the selection method described with reference to FIG. 3 appear in each of the plurality of question sentences related to the business manual (step S21).
  • the priority determination method determines the frequency of appearance of words and phrases selected as usable in the search query by the selection method according to the present embodiment in each of the plurality of question sentences related to the document to be searched. Based on this, it includes a step of determining the priority of a question sentence or a phrase in the question sentence (priority determination step).
  • FIG. 5 is a flowchart showing an example of the operation of the answer candidate acquisition device 18, and is a diagram for explaining a method of acquiring answer candidates by the answer candidate acquisition device 18.
  • the question candidate acquisition unit 14 aggregates a plurality of question sentences output from the question determination unit 11 and acquires a question cluster (step S31).
  • the operator responds to customers by referring to the FAQ that has been prepared.
  • the categories or genres of FAQs to be maintained are determined, questions are extracted for each category or genre, and FAQs are maintained. Therefore, the question determination unit 11 narrows down the file group of the response log, for example, by using a keyword such as "security”.
  • The question determination unit 11 uses, for example, an utterance script for product sales or a product name as a keyword, inputs the part of the response log after the place where the keyword appears to the question determination device to determine whether it is a question utterance (question sentence), and acquires the question sentences (question candidate sentences).
  • the question candidate acquisition unit 14 extracts an arbitrary number of important words and phrases from the question candidate sentences.
  • the extracted important words and phrases are words and phrases indicating question candidate sentences.
  • The question candidate acquisition unit 14 uses the frequency of appearance of the words in all question candidate sentences together with the business manual referred to when creating FAQ answers: it weights the extracted words by the inverse of their frequency of appearance in the business manual and obtains a vector expression of each question candidate sentence.
  • the question candidate acquisition unit 14 acquires a question cluster from the vector-expressed question candidate sentence.
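  • The inverse-frequency weighting behind these vector expressions might look like the sketch below. The precise weighting is not specified, so dividing a word's count in the sentence by its frequency in the business manual is an assumption, as is treating words absent from the manual as having frequency 1. The function name and sample frequencies are hypothetical.

```python
from collections import Counter


def weighted_vectors(question_phrases, manual_frequency):
    """Build one vector per question sentence over a shared, sorted vocabulary."""
    vocabulary = sorted({p for phrases in question_phrases for p in phrases})
    vectors = []
    for phrases in question_phrases:
        counts = Counter(phrases)
        # Weight: count in the sentence divided by frequency in the manual
        # (a word missing from the manual is assumed to have frequency 1).
        vectors.append([counts[p] / manual_frequency.get(p, 1) for p in vocabulary])
    return vocabulary, vectors


candidates = [["phone", "ringtone"], ["phone", "reject"]]
manual_frequency = {"phone": 4, "ringtone": 2, "reject": 1}
vocabulary, vectors = weighted_vectors(candidates, manual_frequency)
```

  • Words that are common in the manual, such as "phone" here, contribute less to each vector, so clustering is driven by the more distinctive words.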
  • the question candidate acquisition unit 14 acquires the question cluster by excluding the question text included in the existing (prepared) FAQ.
  • The question candidate acquisition unit 14 may also include, in the question candidate sentences, parts of the response log near where a question sentence appears in which time is spent repeatedly confirming and answering the question. Specifically, the question candidate acquisition unit 14 may extract question candidate sentences using as a score the number of other question sentences that appear, in the response log from which the question sentence was extracted, after the utterance in which the question sentence appears. In this way, important questions that appear infrequently can be retrieved.
  • the search query generation unit 15 generates a search query based on the priority of the plurality of question sentences constituting the question cluster or the words and phrases in the plurality of question sentences determined by the priority determination method according to the present embodiment. (Step S32). For example, the search query generation unit 15 calculates a higher query score for a determined question sentence with a higher priority, and removes words and phrases selected by the selection unit 12 as unusable for a search query from the question sentence having a higher query score. And generate a search query. Further, the search query generation unit 15 calculates the query score higher for words that are selected to be usable for the search query, have a higher frequency of appearance in the question sentence, and have a higher priority, and use words with a higher query score. And generate a search query. In this way, the search query generation unit 15 generates a search query without using the phrase selected as unavailable for the search query.
  • the answer candidate acquisition unit 16 searches the business manual using the generated search query and acquires the search result (step S33).
  • the pair output unit 17 calculates a search score for each of the acquired search results based on the frequency of appearance of words and phrases in the plurality of search results, and determines the output order of the search results for the search query based on the calculated search score (step S34).
  • the answer candidate acquisition method includes a step of acquiring question candidates (question acquisition step), a step of generating a search query (search query generation step), a step of acquiring search results (search result acquisition step), and a step of controlling the output order of the search results (output step).
  • in the question acquisition step, a question cluster consisting of a plurality of question sentences related to the document to be searched is acquired.
  • in the search query generation step, a search query is generated based on the priority, determined by the priority determination method according to the present embodiment, of the plurality of question sentences constituting the acquired question cluster or of the words and phrases in the plurality of question sentences.
  • in the search result acquisition step, the document to be searched is searched using the generated search query, and the search result is acquired.
  • in the output step, a search score is calculated for each of the acquired search results based on the frequency of appearance of words and phrases in the plurality of search results, and the output order of the search results for the search query is determined based on the calculated search score.
  • in the search query generation step, multiple search queries may be generated from one question cluster.
  • in the search result acquisition step, the business manual may be searched using each of the generated plurality of search queries, and the search results may be acquired.
  • the search score is calculated for each search result obtained with the multiple search queries, and the output order of the search results may be determined based on the query score of each search query and the search score of the search results obtained by the search using that search query.
  • it is determined whether or not a word or phrase in the search target document can be used in the search query, based on the degree of dispersion of the appearance positions of the word or phrase in the search target document.
  • a computer to function as each part of the above-mentioned search system 10.
  • this can be realized by storing, in the storage unit of a computer, a program describing the processing that implements the functions of each part of the search system 10, and having the CPU (Central Processing Unit) of the computer read and execute the program. That is, the program can cause the computer to function as the selection device 12 described above. The program can also cause the computer to function as the priority determination device 13 described above. Alternatively, the program may cause the computer to function as the answer candidate acquisition device 18 described above.
  • this program may be recorded on a computer-readable medium. It can be installed on a computer using a computer-readable medium.
  • the computer-readable medium on which the program is recorded may be a non-transient recording medium.
  • the non-transient recording medium is not particularly limited, but may be, for example, a recording medium such as a CD-ROM or a DVD-ROM. This program can also be provided via a network.
  • each component can be rearranged so as not to be logically inconsistent, and a plurality of components can be combined into one or a single component can be divided.

Abstract

A selection device (12) according to the present disclosure comprises: a calculation unit (121) which calculates the degree of dispersion of positions at which a phrase appears in a retrieval target document; and a phrase selection unit (126) which selects, on the basis of the calculated degree of dispersion, whether or not the phrase in the retrieval target document is to be used for a retrieval query for retrieving the retrieval target document.

Description

Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program
The present disclosure relates to a selection device, a priority determination device, an answer candidate acquisition device, a selection method, a priority determination method, an answer candidate acquisition method, and a program.
In recent years, departments in which operators respond to customer inquiries by telephone, chat, or the like have been equipped with search systems in which anticipated questions and their answers (so-called FAQs (Frequently Asked Questions)) are prepared and registered in a database so that the FAQs can be browsed and searched. When answering a customer inquiry, operators increasingly search the search system according to the content of the inquiry and answer based on the answer of the FAQ that is found.
One method of maintaining the contents of the FAQ is, for example, to determine the contents to be added to or deleted from the FAQ based on the operators' experience or know-how. Another method is for a call center administrator or the like to review response logs that record the operators' interactions with customers (recorded audio data, text data obtained by speech recognition of the recorded data, and so on) and determine the contents to be added to or deleted from the FAQ. Because the contents that should be maintained as FAQs can change as the products and services offered change, maintaining the FAQ has been very costly.
Patent Document 1 describes a method of maintaining FAQ questions based on text obtained by speech recognition of recorded data of operators' interactions with customers, the existing FAQ, and information on whether or not the FAQ was useful when the operator searched it. In this method, the search results for a given query are presented to the operator, and the operator attaches information on whether or not the search results were useful. Then, based on the queries (questions) whose search results were judged not useful, the questions extracted from the recorded data (FAQ candidates), and the questions that have already been prepared, similar questions are aggregated, and among the aggregated sets, those that do not contain an already-prepared question are extracted as questions to be added to the FAQ. The method described in Patent Document 1 can obtain questions to be added to the FAQ, but it neither extracts questions for each product or service to be maintained nor obtains answers to the questions.
One way for an operator to obtain an answer to a customer's question is to search the business manual that the operator consults when answering, using a search query corresponding to the question. With this method, if the search query contains words and phrases that appear frequently throughout the business manual, even items with little relevance as an answer to the question may be returned as search results. A method that uses the appearance information of words and phrases is therefore often used to remove words and phrases scattered throughout a document (see Non-Patent Document 1). With this method, however, even a word or phrase that appears with high frequency in a particular passage of the business manual is given a lower priority if it also appears frequently throughout the manual, so appropriate search results may not be obtained as an answer to the question. Patent Document 2 describes a method that uses word appearance density to find the utterance section containing the customer's request in a voice log recording a conversation between an operator and a customer, but that method is not aimed at more appropriate searching.
International Publication No. 2019/156103
Japanese Unexamined Patent Publication No. 2012-47875
In order to obtain answer candidates for FAQ maintenance from the response logs, it is necessary to create a search query for searching the business manual. If the method described in Non-Patent Document 1 is used to create the search query, words and phrases that appear clustered in a specific range of the business manual (such as a particular chapter or section) are also excluded from the search query, and appropriate search results cannot be obtained.
An object of the present disclosure, made in view of the above problems, is to provide a selection device, a priority determination device, an answer candidate acquisition device, a selection method, a priority determination method, an answer candidate acquisition method, and a program that can select words and phrases better suited to a search query for searching a document to be searched.
In order to solve the above problems, the selection device according to the present disclosure includes a calculation unit that calculates the degree of dispersion of the positions at which a word or phrase appears in a document to be searched, and a phrase selection unit that selects, based on the degree of dispersion calculated by the calculation unit, whether or not the word or phrase in the document to be searched can be used in a search query for searching that document.
Further, in order to solve the above problems, the priority determination device according to the present disclosure determines the priority of a plurality of question sentences related to the document to be searched, or of words and phrases in the plurality of question sentences, based on the appearance frequency of the words and phrases selected by the above-described selection device as usable in the search query.
Further, in order to solve the above problems, the answer candidate acquisition device according to the present disclosure includes: a question acquisition unit that acquires a question cluster consisting of a plurality of question sentences related to the document to be searched; a search query generation unit that inputs the plurality of question sentences constituting the question cluster to the above-described priority determination device, calculates query scores for the plurality of question sentences or for the words and phrases in the plurality of question sentences based on the priority determined by the priority determination device, and generates the search query based on the calculated query scores; an answer candidate acquisition unit that searches the document to be searched using the generated search query and acquires search results; and an output unit that calculates, for each of the plurality of acquired search results, a search score based on the appearance frequency of words and phrases in the plurality of search results, and determines the output order of the search results for the search query based on the calculated search scores.
Further, in order to solve the above problems, the selection method according to the present disclosure includes a calculation step of calculating the degree of dispersion of the positions at which a word or phrase appears in a document to be searched, and a selection step of selecting, based on the calculated degree of dispersion, whether or not the word or phrase in the document to be searched can be used in a search query for searching that document.
Further, in order to solve the above problems, the priority determination method according to the present disclosure includes a priority determination step of determining the priority of a plurality of question sentences related to the document to be searched, or of words and phrases in the plurality of question sentences, based on the appearance frequency of the words and phrases selected by the above-described selection method as usable in the search query.
Further, in order to solve the above problems, the answer candidate acquisition method according to the present disclosure includes: a question acquisition step of acquiring a question cluster consisting of a plurality of question sentences related to the document to be searched; a search query generation step of calculating query scores for the plurality of question sentences constituting the question cluster, or for the words and phrases in the plurality of question sentences, based on the priority determined by the above-described priority determination method, and generating the search query based on the calculated query scores; a search result acquisition step of searching the document to be searched using the generated search query and acquiring search results; and an output step of calculating, for each of the plurality of acquired search results, a search score based on the appearance frequency of words and phrases in the plurality of search results, and determining the output order of the search results for the search query based on the calculated search scores.
Further, in order to solve the above problems, the program according to the present disclosure causes a computer to function as the above-described selection device, priority determination device, or answer candidate acquisition device.
According to the selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method, and program of the present disclosure, words and phrases better suited to a search query for searching a document to be searched can be selected.
FIG. 1 is a diagram showing a configuration example of a search system according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing a configuration example of the selection unit shown in FIG. 1.
FIG. 3 is a flowchart showing an example of the operation of the selection unit shown in FIG. 2.
FIG. 4 is a flowchart showing an example of the operation of the priority determination unit shown in FIG. 1.
FIG. 5 is a flowchart showing an example of the operation of the answer candidate acquisition device shown in FIG. 1.
Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.
FIG. 1 is a diagram showing a configuration example of a search system 10 according to an embodiment of the present disclosure. The search system 10 according to the present embodiment generates a search query for searching a document to be searched, searches the document using the generated search query, and outputs the search results. The following description uses an example in which the document to be searched is the business manual that call center operators consult when answering customer questions: a search query for searching the business manual is generated, and anticipated questions and their answers (FAQs) are prepared by searching the business manual with the generated search query.
As shown in FIG. 1, the search system 10 according to the present embodiment includes a question determination unit 11, a selection unit 12 serving as a selection device, a priority determination unit 13 serving as a priority determination device, a question candidate acquisition unit 14, a search query generation unit 15, an answer candidate acquisition unit 16, and a pair output unit 17. The question candidate acquisition unit 14 is an example of a question acquisition unit. The pair output unit 17 is an example of an output unit. The question candidate acquisition unit 14, the search query generation unit 15, the answer candidate acquisition unit 16, and the pair output unit 17 constitute an answer candidate acquisition device 18.
The question determination unit 11 receives as input a response log that records an operator's interaction with a customer. The response log is, for example, recorded audio data of the dialogue between the operator and the customer, or text data obtained by converting the recorded data into text by speech recognition. From the input response log, the question determination unit 11 acquires question utterances (question sentences) in which the customer asks a question. For example, the question determination unit 11 obtains speech recognition results from the response log and uses a model trained as a classifier to determine whether each result is a question utterance. Any known determiner can be used for this determination. The question determination unit 11 outputs the acquired question utterances (question sentences) to the question candidate acquisition unit 14.
The selection unit 12 selects whether or not each word or phrase in the business manual, which is the document to be searched, can be used in a search query, and outputs the selection result to the priority determination unit 13. FIG. 2 is a diagram showing a configuration example of the selection unit 12.
As shown in FIG. 2, the selection unit 12 serving as a selection device includes a calculation unit 121 and a phrase selection unit 126.
The calculation unit 121 calculates the degree of dispersion of the appearance positions of words and phrases in the business manual, which is the document to be searched. As shown in FIG. 2, the calculation unit 121 includes a business manual division unit 122, a phrase division unit 123, an appearance frequency calculation unit 124, and an appearance dispersion calculation unit 125.
The business manual division unit 122 divides the business manual into lines.
The phrase division unit 123 divides the business manual, already divided into lines by the business manual division unit 122, into words and phrases.
The appearance frequency calculation unit 124 calculates the appearance frequency in the business manual of each word or phrase produced by the phrase division unit 123.
The appearance dispersion calculation unit 125 calculates, for each word or phrase, the degree of dispersion of its appearance positions in the business manual, based on the appearance frequency calculated by the appearance frequency calculation unit 124 and on the appearance positions.
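As a rough illustration of the calculation unit 121, the following Python sketch splits a manual into lines, splits each line into words, and computes each word's appearance frequency together with the variance of the line numbers on which it appears. The whitespace tokenizer, the use of line indices as appearance positions, the population variance as the dispersion measure, and the name `position_dispersion` are all assumptions made for this example; the disclosure does not fix any of them.

```python
from collections import defaultdict
from statistics import pvariance


def position_dispersion(manual_text):
    """Return {word: (appearance frequency, variance of line indices)}."""
    positions = defaultdict(list)
    # Divide the manual into lines, then each line into words.
    for line_no, line in enumerate(manual_text.splitlines()):
        for word in line.lower().split():  # assumption: whitespace tokenizer
            positions[word].append(line_no)
    # Frequency is the number of appearances; dispersion is the population
    # variance of the line indices (an assumed measure for this sketch).
    return {w: (len(idx), pvariance(idx)) for w, idx in positions.items()}


manual = "\n".join([
    "contract overview",
    "router setup",
    "router reset",
    "contract billing",
    "contract cancellation",
])
stats = position_dispersion(manual)
```

In this toy manual, "router" appears on two adjacent lines and has a small variance, while "contract" is spread from the first line to the last and has a larger one.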
The phrase selection unit 126 selects, based on the degree of dispersion of appearance positions calculated by the calculation unit 121, whether or not each word or phrase in the business manual can be used in a search query, and outputs the selection result to the priority determination unit 13. For example, the phrase selection unit 126 selects words and phrases whose appearance-position variance is larger than a predetermined threshold as unusable in a search query, and words and phrases whose appearance-position variance is less than or equal to the threshold as usable in a search query. Words and phrases with a large variance of appearance positions are those that appear frequently throughout the business manual. If such words and phrases are included in a search query, an appropriate answer to the question is unlikely to be obtained. Selecting words and phrases with a large variance as unusable therefore excludes from the search query the words and phrases that appear frequently throughout the business manual. Conversely, selecting words and phrases with a small variance as usable means that a word or phrase that appears with high frequency in a specific range can be included in the search query even if it also appears throughout the business manual. Words and phrases better suited to the search query can thus be selected, and as a result it becomes easier to obtain appropriate answers to questions.
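The threshold rule described for the phrase selection unit 126 can be sketched in a few lines of Python. The function name `select_phrases`, the input dictionary of precomputed variances, and the threshold value of 1.0 are hypothetical and chosen only for illustration.

```python
def select_phrases(dispersion, threshold):
    """Split words into usable and unusable sets by position variance.

    dispersion maps each word to the variance of its appearance positions.
    """
    # Variance at or below the threshold: usable in a search query.
    usable = {w for w, v in dispersion.items() if v <= threshold}
    # Variance above the threshold: unusable in a search query.
    unusable = set(dispersion) - usable
    return usable, unusable


# Hypothetical variances, for example from a per-line position analysis.
dispersion = {"contract": 2.9, "router": 0.25, "reset": 0.0}
usable, unusable = select_phrases(dispersion, threshold=1.0)
```

With these values, "router" and "reset" are kept for query generation and "contract" is excluded.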
Referring again to FIG. 1, the priority determination unit 13 serving as a priority determination device determines, based on the appearance frequency of the words and phrases selected by the selection unit 12 as usable in a search query, the priority with which a plurality of question sentences related to the business manual, or the words and phrases in those question sentences, are used for the search query, and outputs the result to the search query generation unit 15.
For example, the priority determination unit 13 gives a higher priority to a question sentence that contains words and phrases selected as usable in a search query and appearing more frequently across the plurality of question sentences. In addition, among the words and phrases selected as usable in a search query, the priority determination unit 13 gives, for example, a higher priority to those that appear more frequently across the plurality of question sentences.
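A minimal Python sketch of this prioritization follows. It assumes whitespace tokenization, counts in how many question sentences each usable word appears, and scores a sentence by summing the counts of its usable words; the summation and the name `prioritize` are assumptions of this example rather than details given in the disclosure.

```python
from collections import Counter


def prioritize(questions, usable):
    """Rank usable words and question sentences by appearance frequency."""
    # Keep only the words selected as usable, once per sentence.
    tokenized = [set(q.lower().split()) & usable for q in questions]
    # Number of question sentences in which each usable word appears.
    freq = Counter(w for words in tokenized for w in words)
    # Words: most frequent first, ties broken alphabetically.
    word_rank = sorted(freq, key=lambda w: (-freq[w], w))
    # Sentences: assumed score is the sum of the frequencies of their words.
    scores = [sum(freq[w] for w in words) for words in tokenized]
    sentence_rank = sorted(range(len(questions)), key=lambda i: -scores[i])
    return word_rank, sentence_rank


questions = [
    "how do i reset the router",
    "router reset does not work",
    "can i change the router password",
]
word_rank, sentence_rank = prioritize(questions, {"router", "reset", "password"})
```

Here `sentence_rank` holds indices into `questions`, so the first index identifies the sentence with the highest priority.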
The question candidate acquisition unit 14 aggregates the plurality of question utterances (question sentences) output from the question determination unit 11 into groups of similar question sentences, and acquires question clusters each consisting of a plurality of similar question sentences. The question sentences constituting a question cluster are candidate question sentences to be added to the FAQ (additional candidate question sentences). One way to aggregate the question sentences is, for example, to use word2vec: the words in each question sentence (question utterance) are extracted, the word vectors of the extracted words are added to obtain a vector for the utterance, and the cosine similarity of the resulting vectors is used. The question candidate acquisition unit 14 outputs the acquired question clusters to the search query generation unit 15.
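The aggregation step can be sketched as follows. To keep the example self-contained, a small hand-written table of two-dimensional vectors stands in for a trained word2vec model, and sentences are grouped greedily by comparing each one with the first member of an existing cluster. The table `EMBEDDINGS`, the greedy strategy, and the 0.9 similarity threshold are assumptions of this sketch.

```python
import math

# Hypothetical 2-D word vectors standing in for a trained word2vec model.
EMBEDDINGS = {
    "reset": (1.0, 0.1), "router": (0.9, 0.2), "restart": (0.95, 0.15),
    "bill": (0.1, 1.0), "invoice": (0.15, 0.9), "late": (0.2, 0.8),
}


def sentence_vector(sentence):
    """Add the vectors of the known words in the sentence."""
    words = [w for w in sentence.lower().split() if w in EMBEDDINGS]
    return tuple(sum(EMBEDDINGS[w][d] for w in words) for d in range(2))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def cluster(questions, threshold=0.9):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []
    for q in questions:
        for c in clusters:
            if cosine(sentence_vector(q), sentence_vector(c[0])) >= threshold:
                c.append(q)
                break
        else:
            clusters.append([q])  # no similar cluster found: start a new one
    return clusters


clusters = cluster(["reset router", "restart router", "late bill", "invoice late"])
```

In practice the vectors would come from a model trained on a suitable corpus, and a standard clustering algorithm could replace the greedy loop.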
 検索クエリ生成部15は、質問候補取得部14から出力された質問クラスタを構成する質問文から業務マニュアルを検索するための検索クエリを生成する。検索クエリ生成部15は、質問クラスタを構成する複数の質問文を優先順位決定部13に入力する。検索クエリ生成部15は、優先順位決定部13により決定された質問文または質問文に含まれる語句の優先順位に基づき、質問文または質問文に含まれる語句の、検索クエリに対する重要度を示すクエリスコアを算出し、算出したクエリスコアに基づき検索クエリを生成する。 The search query generation unit 15 generates a search query for searching a business manual from the question sentences constituting the question cluster output from the question candidate acquisition unit 14. The search query generation unit 15 inputs a plurality of question sentences constituting the question cluster into the priority determination unit 13. The search query generation unit 15 is a query indicating the importance of the question sentence or the phrase included in the question sentence to the search query based on the priority of the question sentence or the phrase included in the question sentence determined by the priority determination unit 13. Calculate the score and generate a search query based on the calculated query score.
 検索クエリ生成部15は、例えば、質問クラスタを構成する複数の質問文のうち、優先順位決定部13により決定された優先順位が高い質問文ほど、クエリスコアを高く算出し、その質問クラスタにおける代表質問文と決定する。上述したように、優先順位決定部13は、例えば、検索クエリに使用可能と選択された語句であり、かつ、複数の質問文における出現頻度が高い語句を含む質問文ほど、優先順位を高くする。したがって、検索クエリ生成部15は、質問クラスタを構成する複数の質問文のうち、検索クエリに使用可能な語句であり、かつ、他の質問文にも頻繁に出現する語句を含む質問文(複数の質問文のベクトル平均に近い質問文)のクエリスコアを高く算出し、その質問を代表質問と決定する。検索クエリ生成部15は、質問クラスタを構成する複数の質問文から、2以上の質問文を代表質問文と決定してもよい。 For example, the search query generation unit 15 calculates the query score higher for the question sentence having the higher priority determined by the priority determination unit 13 among the plurality of question sentences constituting the question cluster, and the representative in the question cluster. Decide on a question. As described above, the priority determination unit 13 raises the priority of a question sentence that is selected to be usable in a search query and includes a phrase that frequently appears in a plurality of question sentences, for example. .. Therefore, the search query generation unit 15 is a question sentence (plurality) that is a phrase that can be used in the search query and includes a phrase that frequently appears in other question sentences among the plurality of question sentences that constitute the question cluster. The query score of the question sentence) that is close to the vector average of the question sentence) is calculated high, and the question is determined as the representative question. The search query generation unit 15 may determine two or more question sentences as representative question sentences from a plurality of question sentences constituting the question cluster.
The search query generation unit 15 generates the search query by removing, from the determined representative question sentence, the phrases that the selection unit 12 has selected as unusable in a search query.
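The representative-question path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each question sentence is assumed to be already split into phrases, the set `usable` stands in for the output of the selection unit 12, and the additive score, the function name `representative_query`, and the sample data are all hypothetical.

```python
from collections import Counter

def representative_query(questions, usable):
    """Pick the question whose usable phrases recur most across the cluster,
    then drop phrases marked unusable to form the search query."""
    # Document frequency: number of questions in which each usable phrase appears.
    df = Counter(p for q in questions for p in set(q) if p in usable)
    # Hypothetical query score: sum of the document frequencies of a
    # question's usable phrases.
    scores = [sum(df[p] for p in set(q) if p in usable) for q in questions]
    best = max(range(len(questions)), key=scores.__getitem__)
    # The query keeps only phrases that were selected as usable.
    query = [p for p in questions[best] if p in usable]
    return best, scores[best], query

cluster = [
    ["phone", "ringtone", "change"],
    ["phone", "ringtone", "setting", "method"],
    ["ringtone", "file", "select"],
]
usable = {"ringtone", "file", "change", "select"}
print(representative_query(cluster, usable))
```

With this sample data the third question is chosen, and a phrase such as "phone", which is not in `usable`, never reaches the query.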
The search query generation unit 15 may also generate the search query based on the phrase priorities determined by the priority determination unit 13. For example, the search query generation unit 15 calculates a higher query score for a phrase with a higher priority and generates the search query from phrases with high query scores. As described above, among the phrases selected as usable in a search query, the priority determination unit 13 assigns a higher priority to, for example, a phrase that appears more frequently across the plurality of question sentences. Thus, where M is the number of question sentences constituting the question cluster, the search query generation unit 15 generates the search query from, for example, phrases contained in at least M/2 of the question sentences.
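The M/2 criterion above might be sketched as below. The function name `majority_phrases`, the tie-breaking order, and the sample cluster are assumptions made for illustration; only the threshold of M/2 question sentences comes from the description.

```python
from collections import Counter

def majority_phrases(questions, usable):
    """Return usable phrases contained in at least M/2 of the M questions."""
    m = len(questions)
    # Count each phrase once per question sentence.
    df = Counter(p for q in questions for p in set(q) if p in usable)
    # Higher frequency first; alphabetical order breaks ties deterministically.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, n in ranked if n >= m / 2]

cluster = [
    ["ringtone", "change"],
    ["ringtone", "file"],
    ["ringtone", "change", "volume"],
    ["reject", "number"],
]
usable = {"ringtone", "change", "file", "volume", "reject", "number"}
print(majority_phrases(cluster, usable))
```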
The search query generation unit 15 outputs the generated search query to the answer candidate acquisition unit 16. The search query generation unit 15 may also output the query score of the search query (the query score of the question sentence or phrase from which the search query was generated) to the answer candidate acquisition unit 16.
The answer candidate acquisition unit 16 searches the business manual using the search query generated by the search query generation unit 15 and acquires the search results. When a plurality of search queries have been generated, the answer candidate acquisition unit 16 searches the business manual with each of the search queries and acquires the search results.
The business manual can be searched with the search query by, for example, searching sentence by sentence or searching partial documents formed by joining sentences in fixed units. The answer candidate acquisition unit 16 may search the business manual by combining several such methods. The answer candidate acquisition unit 16 may acquire a plurality of search results (the top N).
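As a rough sketch of the two search granularities mentioned above, the following assumes that a "partial document" is a sliding window of `unit` consecutive lines and that relevance is simply the number of query phrases found by substring match. Both choices, the function name `search`, and the sample lines are illustrative assumptions; the description does not fix a particular retrieval method.

```python
def search(lines, query, unit=1, top_n=3):
    """Score each run of `unit` consecutive lines by how many query phrases it contains."""
    hits = []
    for start in range(0, len(lines) - unit + 1):
        text = " ".join(lines[start:start + unit]).lower()
        score = sum(1 for p in query if p.lower() in text)
        if score > 0:
            hits.append((score, start + 1, text))  # 1-based starting line number
    # Highest score first; the earlier position breaks ties.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top_n]

manual = [
    "How to configure the phone.",
    "Select the menu and select Phone, then Incoming.",
    "To set a ringtone, select Ringtone and then a ringtone file.",
    "Other",
]
query = ["ringtone", "file"]
print(search(manual, query, unit=1))  # line-by-line search
print(search(manual, query, unit=2))  # search over two-line partial documents
```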
The answer candidate acquisition unit 16 may also use phrases whose appearance positions have a small variance, or phrases that appear in the table of contents of the business manual, to narrow the search range. By first narrowing the search range with these phrases and then searching with a search query from which those phrases have been removed, the answer candidate acquisition unit 16 can search the range that is useful for FAQ preparation.
The answer candidate acquisition unit 16 outputs each search query, together with the search results acquired by searching with that query, to the pair output unit 17. The answer candidate acquisition unit 16 normally acquires a plurality of search results and outputs them to the pair output unit 17. The answer candidate acquisition unit 16 may also output the query score of the search query to the pair output unit 17.
For each of the plurality of search results acquired by the answer candidate acquisition unit 16 for a search query, the pair output unit 17 calculates a search score based on the appearance frequency of phrases in the plurality of search results. Based on the calculated search scores, the pair output unit 17 determines the output order of the search results for the search query and outputs the pairs of search query and search result in that order. For example, the pair output unit 17 calculates a higher search score for a search result that overlaps more with the other search results. The pair output unit 17 may also calculate the search score of each search result according to how frequently the phrases contained in that result appear across the plurality of search results.
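One way to realize a search score that rewards overlap with the other search results is sketched below. The mean document frequency used here is a hypothetical scoring formula, and `search_scores` and the sample results are illustrative names and data; the description only requires that the score depend on phrase appearance frequency across the results.

```python
from collections import Counter

def search_scores(results):
    """Score each result by how often its phrases occur across all results."""
    # For every phrase, count the number of results that contain it.
    df = Counter(p for r in results for p in set(r))
    # Hypothetical score: mean document frequency of the result's phrases,
    # so results sharing many phrases with the others score higher.
    return [sum(df[p] for p in set(r)) / len(set(r)) if r else 0.0
            for r in results]

results = [
    ["ringtone", "select", "file"],
    ["ringtone", "select", "menu"],
    ["reject", "number"],
]
scores = search_scores(results)
# Indices of the results in descending order of search score.
order = sorted(range(len(results)), key=lambda i: -scores[i])
print(scores, order)
```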
When a plurality of search queries have been generated and search results have been acquired for each of them, the pair output unit 17 may control the output order of the search results based on the query score of each search query and the search score of each search result acquired with that query. For example, the pair output unit 17 may control the output order of the pairs of search query and search result based on the product of the query score and the search score.
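The product-based ordering can be illustrated with a few lines. The tuple layout `(query, query_score, result, search_score)`, the function name `order_pairs`, and the sample values are assumptions for this sketch.

```python
def order_pairs(pairs):
    """Sort (query, query_score, result, search_score) tuples by score product."""
    # Larger query_score * search_score is output first.
    return sorted(pairs, key=lambda p: p[1] * p[3], reverse=True)

pairs = [
    ("ringtone file", 5, "manual line 3", 0.5),
    ("ringtone change", 4, "manual line 4", 1.0),
    ("ringtone file", 5, "manual line 4", 0.9),
]
for query, q_score, result, s_score in order_pairs(pairs):
    print(query, "|", result, "|", q_score * s_score)
```

A result with a modest search score can therefore still be output early when the query that retrieved it has a high query score.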
Next, the operation of the search system 10 according to the present embodiment will be described.
FIG. 3 is a flowchart showing an example of the operation of the selection unit 12, and illustrates the selection method performed by the selection unit 12 serving as a selection device.
The calculation unit 121 calculates the degree of dispersion of the appearance positions of phrases in the business manual, which is the document to be searched (step S11). The calculation of this dispersion is described in more detail below, focusing on the operation of the appearance variance calculation unit 125.
The business manual consists of, for example, a single HTML (HyperText Markup Language) file or text file. The phrase division unit 123 extracts phrases (such as nouns) by morphological analysis using, for example, MeCab, and acquires each extracted phrase together with the line number on which it appears. The appearance frequency calculation unit 124 calculates the appearance frequency of each extracted phrase. The appearance variance calculation unit 125 calculates the variance of the appearance positions of each phrase from the line numbers on which the phrase appears.
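The extraction of phrases and their line numbers can be sketched as follows. To keep the example self-contained, a simple regular-expression tokenizer stands in for morphological analysis; this substitution, the once-per-line counting, the function name `phrase_positions`, and the sample lines are assumptions of the sketch and not the analyzer named in the description.

```python
import re
from collections import defaultdict

def phrase_positions(lines):
    """Map each token to the 1-based line numbers on which it appears."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Stand-in for morphological analysis: lowercase alphabetic tokens,
        # each counted once per line.
        for token in sorted(set(re.findall(r"[a-z]+", line.lower()))):
            positions[token].append(number)
    return dict(positions)

manual = [
    "How to configure the phone.",
    "Select the menu, then select Phone.",
]
positions = phrase_positions(manual)
# Appearance frequency here is the number of lines containing the token.
frequency = {token: len(nums) for token, nums in positions.items()}
print(positions["phone"], frequency["menu"])
```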
For each phrase X that appears N times in the business manual, the appearance variance calculation unit 125 takes the line numbers of the manual on which phrase X appears as its appearance positions, and calculates the variance of the appearance positions of phrase X by dividing the sum of the differences between the appearance position of each occurrence Xn (n = 1 to N) and the mean appearance position of phrase X by the number of occurrences N.
The calculation of the variance of the appearance positions of a phrase by the appearance variance calculation unit 125 is described below using the following 10-line business manual as an example.
<Business manual>
1. How to configure the phone.
2. The settings for incoming phone calls are configured as follows.
3. Select the menu, press the settings button, and select Phone → Incoming.
4. To set a ringtone, select Ringtone and then select a ringtone file.
5. To set call rejection, select Call rejection and enter the number to reject.
6. The settings for outgoing phone calls are configured as follows.
7. Select the menu, press the settings button, and select Phone → Outgoing.
8. To attach a number when making a call, select the number assignment setting.
9. Operation method
10. Other
In the business manual above, the phrase "phone" appears on lines 1, 2, 3, 6, and 7. The appearance variance calculation unit 125 obtains the normalized mean appearance position by dividing the mean of the appearance positions of the phrase "phone" by the total number of lines. The appearance variance calculation unit 125 then calculates the variance of the appearance positions of the phrase "phone" by taking the mean square of the values obtained by subtracting the normalized mean appearance position from each appearance position divided by the total number of lines.
In the above example, the normalized mean appearance position is 0.38 (= (1+2+3+6+7)/5/10). The variance of the appearance positions of the phrase "phone" is 0.21 (= ((1/10-0.38)+(2/10-0.38)+(3/10-0.38)+(6/10-0.38)+(7/10-0.38))*((1/10-0.38)+(2/10-0.38)+(3/10-0.38)+(6/10-0.38)+(7/10-0.38))/5). Calculating the variance of the appearance positions of the phrase "menu" in the same way gives 0.2.
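The worked example can be reproduced numerically with the sketch below. Two readings of the dispersion measure are computed, and which one is intended is an assumption left open here: the mean square of the normalized deviations described in the text evaluates to about 0.054 for "phone", whereas the stated figures of 0.21 and 0.2 are close to the mean absolute deviation (0.216 for "phone" and 0.2 for "menu"). The function names are illustrative.

```python
def normalized_positions(line_numbers, total_lines):
    """Divide each appearance line number by the total number of lines."""
    return [n / total_lines for n in line_numbers]

def mean_square_deviation(xs):
    """Mean of squared deviations from the mean (population variance)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def mean_absolute_deviation(xs):
    """Mean of absolute deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

phone = normalized_positions([1, 2, 3, 6, 7], 10)  # "phone" on lines 1, 2, 3, 6, 7
menu = normalized_positions([3, 7], 10)            # "menu" on lines 3 and 7

print(round(sum(phone) / len(phone), 2))         # 0.38
print(round(mean_square_deviation(phone), 4))    # 0.0536
print(round(mean_absolute_deviation(phone), 3))  # 0.216
print(round(mean_square_deviation(menu), 4))     # 0.04
print(round(mean_absolute_deviation(menu), 3))   # 0.2
```

Under either reading, a phrase confined to a narrow part of the manual yields a small value and a phrase spread over the whole manual yields a large one, which is the property the selection step relies on.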
When there are a plurality of business manuals, the appearance variance calculation unit 125 may, for example, calculate the variance of the appearance positions of each phrase for each business manual. In the present embodiment, the calculation unit 121 has been described as dividing the business manual into lines and calculating the variance from line numbers, but the calculation is not limited to this. Instead of line numbers, the calculation unit 121 may calculate the variance of the appearance positions of a phrase using sentence numbers assigned sequentially from the beginning of the business manual.
Referring again to FIG. 3, the phrase selection unit 126 selects, based on the calculated degree of dispersion, whether each phrase in the business manual can be used in a search query for searching the business manual (step S12).
For example, the phrase selection unit 126 obtains the mean of the variances of the appearance positions of all phrases. The phrase selection unit 126 then selects, for example, a phrase whose variance is larger than this mean as unusable in a search query, and a phrase whose variance is less than or equal to this mean as usable in a search query. In this way, the phrase selection unit 126 selects, based on the degree of dispersion of each phrase, whether the phrase in the business manual, which is the document to be searched, can be used in a search query. When there are a plurality of business manuals, the phrase selection unit 126, for example, averages the variances of the appearance positions calculated for a phrase in the respective business manuals and selects whether that phrase can be used in a search query.
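The threshold rule of step S12 might be sketched as follows. The function name `select_usable` and the dispersion values in the sample dictionary are illustrative assumptions; only the comparison against the mean over all phrases is taken from the description.

```python
def select_usable(dispersion):
    """Split phrases into usable / unusable by comparing to the mean dispersion."""
    mean = sum(dispersion.values()) / len(dispersion)
    # At or below the mean: usable in a search query; above the mean: unusable.
    usable = {p for p, d in dispersion.items() if d <= mean}
    unusable = set(dispersion) - usable
    return usable, unusable

# Hypothetical dispersion values per phrase (phrases seen on one line get 0.0).
dispersion = {"phone": 0.216, "menu": 0.2, "ringtone": 0.0, "reject": 0.0}
usable, unusable = select_usable(dispersion)
print(sorted(usable), sorted(unusable))
```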
As described above, the selection method according to the present embodiment includes a step of calculating the degree of dispersion of the appearance positions of phrases in the document to be searched (the business manual) (calculation step), and a step of selecting, based on the calculated degree of dispersion, whether each phrase in the document to be searched can be used in a search query for searching that document (selection step).
FIG. 4 is a flowchart showing an example of the operation of the priority determination unit 13, and illustrates the priority determination method performed by the priority determination unit 13 serving as a priority determination device.
The priority determination unit 13 determines the priorities of the question sentences, or of the phrases in the question sentences, based on the appearance frequency, in each of the plurality of question sentences related to the business manual, of the phrases selected as usable in a search query by the selection method described with reference to FIG. 3 (step S21).
As described above, the priority determination method according to the present embodiment includes a step of determining the priorities of the question sentences, or of the phrases in the question sentences, based on the appearance frequency, in each of the plurality of question sentences related to the document to be searched, of the phrases selected as usable in a search query by the selection method according to the present embodiment (priority determination step).
FIG. 5 is a flowchart showing an example of the operation of the answer candidate acquisition device 18, and illustrates the answer candidate acquisition method performed by the answer candidate acquisition device 18.
The question candidate acquisition unit 14 aggregates the plurality of question sentences output from the question determination unit 11 and acquires question clusters (step S31).
In call center operations conducted by telephone, such as handling faults or answering product inquiries, operators respond to customers by referring to prepared FAQs. To prepare FAQs, the categories or genres of the FAQs to be prepared are determined, and questions are extracted for each category or genre. The question determination unit 11 therefore first narrows down the set of response log files using a keyword such as "security". Next, using a sales talk script, a product name, or the like as a keyword, the question determination unit 11 inputs the portion of each narrowed-down response log that follows the point where the keyword appears into a question determiner, which determines whether an utterance is a question utterance (question sentence), and thereby acquires question sentences (question candidate sentences).
The question candidate acquisition unit 14 extracts an arbitrary number of important phrases from the question candidate sentences. The extracted important phrases are phrases that characterize the question candidate sentences. Using the appearance frequencies of the words appearing in all question candidate sentences together with the business manual consulted when preparing FAQ answers, the question candidate acquisition unit 14 weights the extracted phrases by the reciprocal of the appearance frequency of each word in the business manual, and obtains a vector representation of each question candidate sentence. The question candidate acquisition unit 14 acquires question clusters from the vectorized question candidate sentences. When acquiring question clusters, the question candidate acquisition unit 14 excludes question sentences already contained in existing (already prepared) FAQs.
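The weighting by the reciprocal of manual frequency can be sketched as below. The vector layout, the treatment of phrases absent from the manual, the use of cosine similarity as the input to clustering, and all names and sample counts are assumptions of this illustration; the description does not specify a clustering method.

```python
from collections import Counter

def question_vector(question, manual_counts, vocabulary):
    """Weight each phrase count in the question by 1 / (its frequency in the manual)."""
    counts = Counter(question)
    # Phrases absent from the manual get weight 0 here; that choice is an assumption.
    return [counts[p] / manual_counts[p] if manual_counts.get(p) else 0.0
            for p in vocabulary]

def cosine(u, v):
    """Cosine similarity, usable as a closeness measure for any clustering method."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

manual_counts = {"phone": 5, "ringtone": 4, "reject": 2}  # hypothetical manual frequencies
vocabulary = ["phone", "ringtone", "reject"]
v1 = question_vector(["phone", "ringtone"], manual_counts, vocabulary)
v2 = question_vector(["ringtone", "phone", "phone"], manual_counts, vocabulary)
v3 = question_vector(["phone", "reject"], manual_counts, vocabulary)
print(v1, cosine(v1, v2) > cosine(v1, v3))
```

Because a phrase that is frequent in the manual receives a small weight, two questions that share only such a phrase end up less similar than two questions that share a rarer one.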
For question candidate sentences that occur infrequently, such as those in small clusters, the question candidate acquisition unit 14 may also include, as question candidate sentences, portions of the response log surrounding the point where the question sentence appeared in which answering took time, for example because of repeated confirmations or questions. Specifically, the question candidate acquisition unit 14 may use, as a score, the number of other question sentences that appear after the utterance containing the question sentence in the response log from which the question sentence was extracted, and extract question candidate sentences accordingly. In this way, questions that appear infrequently but are important can be extracted.
Next, the search query generation unit 15 generates a search query based on the priorities, determined by the priority determination method according to the present embodiment, of the plurality of question sentences constituting the question cluster or of the phrases in those question sentences (step S32). For example, the search query generation unit 15 calculates a higher query score for a question sentence with a higher determined priority, and generates the search query by removing, from a question sentence with a high query score, the phrases that the selection unit 12 has selected as unusable in a search query. Alternatively, the search query generation unit 15 calculates, for example, a higher query score for a phrase that has been selected as usable in a search query, appears frequently in the question sentences, and therefore has a higher priority, and generates the search query from phrases with high query scores. In this way, the search query generation unit 15 generates the search query without using phrases selected as unusable in a search query.
By determining whether a phrase can be used in a search query based on the degree of dispersion of its appearance positions in the business manual, phrases that appear frequently throughout the whole document can be excluded from the search query, while phrases concentrated in a specific range of the document to be searched are not excluded. More suitable phrases can therefore be selected for a search query on the document to be searched.
The answer candidate acquisition unit 16 searches the business manual using the generated search query and acquires the search results (step S33).
For each of the acquired search results, the pair output unit 17 calculates a search score based on the appearance frequency of phrases in the plurality of search results, and determines the output order of the search results for the search query based on the calculated search scores (step S34).
As described above, the answer candidate acquisition method according to the present embodiment includes a step of acquiring question candidates (question acquisition step), a step of generating a search query (search query generation step), a step of acquiring search results (search result acquisition step), and a step of controlling the output order of the search results (output step). In the question acquisition step, a question cluster consisting of a plurality of question sentences related to the document to be searched is acquired. In the search query generation step, query scores of the plurality of question sentences constituting the acquired question cluster, or of the phrases in those question sentences, are calculated based on the priorities determined by the priority determination method according to the present embodiment, and a search query is generated based on the calculated query scores. In the search result acquisition step, the document to be searched is searched using the generated search query, and search results are acquired. In the output step, a search score is calculated for each of the acquired search results based on the appearance frequency of phrases in the plurality of search results, and the output order of the search results for the search query is determined based on the calculated search scores.
In the search query generation step, a plurality of search queries may be generated from one question cluster. In this case, in the search result acquisition step, the business manual may be searched with each of the generated search queries to acquire search results. In the output step, a search score may be calculated for each search result obtained with the plurality of search queries, and the output order of the search results may be determined based on the query score of each search query and the search score of each search result obtained by searching with that query.
As described above, in the present embodiment, whether a phrase in the document to be searched can be used in a search query is determined based on the degree of dispersion of the appearance positions of the phrase in that document. In this way, phrases that appear frequently throughout the whole document can be excluded from the search query, while phrases concentrated in a specific range of the document to be searched are not excluded. Phrases better suited to a search query on the document to be searched can therefore be selected.
A computer can suitably be used to function as each unit of the search system 10 described above. Such a computer can be realized by storing, in a storage unit of the computer, a program describing the processing that implements the functions of each unit of the search system 10, and having the CPU (Central Processing Unit) of the computer read and execute the program. That is, the program can cause the computer to function as the selection device 12 described above. The program can also cause the computer to function as the priority determination device 13 described above, or as the answer candidate acquisition device 18 described above.
The program may be recorded on a computer-readable medium, from which it can be installed on a computer. The computer-readable medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited and may be, for example, a CD-ROM or a DVD-ROM. The program can also be provided via a network.
The present disclosure is not limited to the configurations specified in the embodiments described above, and various modifications are possible without departing from the gist of the invention set forth in the claims. For example, the functions included in each component can be rearranged so long as no logical inconsistency results, and a plurality of components can be combined into one or a single component can be divided.
10  Search system
11  Question determination unit
12  Selection unit (selection device)
13  Priority determination unit (priority determination device)
14  Question candidate acquisition unit (question acquisition unit)
15  Search query generation unit
16  Answer candidate acquisition unit
17  Pair output unit (output unit)
18  Answer candidate acquisition device
121  Calculation unit
122  Business manual division unit
123  Phrase division unit
124  Appearance frequency calculation unit
125  Appearance variance calculation unit
126  Phrase selection unit

Claims (9)

  1.  A selection device comprising:
     a calculation unit that calculates a degree of dispersion of appearance positions of phrases in a document to be searched; and
     a phrase selection unit that selects, based on the degree of dispersion calculated by the calculation unit, whether a phrase in the document to be searched can be used in a search query for searching the document to be searched.
  2.  A priority determination device that determines priorities of a plurality of question sentences related to the document to be searched, or of phrases in the plurality of question sentences, based on the appearance frequency of phrases selected as usable in the search query by the selection device according to claim 1.
  3.  An answer candidate acquisition device comprising:
     a question acquisition unit that acquires a question cluster consisting of a plurality of question sentences related to a document to be searched;
     a search query generation unit that inputs the plurality of question sentences constituting the question cluster to the priority determination device according to claim 2, calculates query scores of the plurality of question sentences or of phrases in the plurality of question sentences based on the priorities of the plurality of question sentences or of the phrases in the plurality of question sentences determined by the priority determination device according to claim 2, and generates the search query based on the calculated query scores;
     an answer candidate acquisition unit that searches the document to be searched using the generated search query and acquires search results; and
     an output unit that calculates, for each of the acquired plurality of search results, a search score based on the appearance frequency of phrases in the plurality of search results, and determines an output order of the search results for the search query based on the calculated search scores.
  4.  The answer candidate acquisition device according to claim 3, wherein
     the search query generation unit generates a plurality of the search queries,
     the answer candidate acquisition unit searches the document to be searched using each of the generated plurality of search queries and acquires search results, and
     the output unit calculates the search score for each search result obtained with the plurality of search queries, and determines the output order of the search results based on the query score of each search query and the search score of each search result obtained by searching with that search query.
  5.  A selection method comprising:
    a calculation step of calculating a degree of dispersion of occurrence positions of words and phrases in a search target document; and
    a selection step of selecting, based on the calculated degree of dispersion, whether or not the words and phrases in the search target document can be used in a search query for searching the search target document.
  6.  A priority determination method comprising a priority determination step of determining priorities of a plurality of question sentences related to the search target document, or of words and phrases in the plurality of question sentences, based on the occurrence frequency of the words and phrases selected by the selection method according to claim 5 as usable in the search query.
  7.  An answer candidate acquisition method comprising:
    a question acquisition step of acquiring a question cluster consisting of a plurality of question sentences related to a search target document;
    a search query generation step of calculating query scores of the plurality of question sentences constituting the question cluster, or of the words and phrases in the plurality of question sentences, based on the priorities of the plurality of question sentences or of the words and phrases in the plurality of question sentences determined by the priority determination method according to claim 6, and generating the search query based on the calculated query scores;
    a search result acquisition step of searching the search target document using the generated search query and acquiring search results; and
    an output step of calculating, for each of the acquired plurality of search results, a search score based on the occurrence frequency of words and phrases in the plurality of search results, and determining an output order of the search results for the search query based on the calculated search scores.
  8.  The answer candidate acquisition method according to claim 7, wherein
    in the search query generation step, a plurality of the search queries are generated,
    in the search result acquisition step, the search target document is searched using each of the generated plurality of search queries and search results are acquired, and
    in the output step, the search score is calculated for each search result obtained using the plurality of search queries, and the output order of the search results is determined based on the query score of each search query and the search score of the search result obtained by the search using that search query.
  9.  A program for causing a computer to function as the selection device according to claim 1, the priority determination device according to claim 2, or the answer candidate acquisition device according to claim 3 or 4.
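
As an illustration only, the selection, priority determination, and output ordering recited in claims 5, 6, and 8 might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the claims do not fix the dispersion measure, the direction of the threshold, the priority rule, or the way the two scores are combined. Here the population variance of normalized token positions is used as the degree of dispersion, terms with low variance are treated as usable, more frequent usable terms receive higher priority, and the query score and search score are multiplied. All function names, parameter names, and dictionary keys are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def select_usable_terms(doc_tokens, max_variance):
    """Claim 5 sketch: keep terms whose occurrence positions are not too dispersed."""
    last = max(len(doc_tokens) - 1, 1)
    positions = {}
    for index, term in enumerate(doc_tokens):
        # Normalize each position to [0, 1] so the threshold is length-independent.
        positions.setdefault(term, []).append(index / last)
    # ASSUMPTION: low positional variance means "usable"; the claims fix no direction.
    return {t for t, p in positions.items() if pvariance(p) <= max_variance}


def prioritize_terms(questions, doc_tokens, usable):
    """Claim 6 sketch: order usable question terms by their document frequency."""
    freq = Counter(t for t in doc_tokens if t in usable)
    terms = {t for q in questions for t in q if t in usable}
    # ASSUMPTION: more frequent means higher priority; ties break alphabetically.
    return sorted(terms, key=lambda t: (-freq[t], t))


def order_results(scored_results):
    """Claim 8 sketch: order results by query score combined with search score."""
    # ASSUMPTION: the two scores are combined by multiplication.
    return sorted(
        scored_results,
        key=lambda r: r["query_score"] * r["search_score"],
        reverse=True,
    )


# Toy data, invented for illustration.
doc = ["the", "router", "reset", "the", "router",
       "the", "billing", "the", "plan", "the"]
questions = [["how", "reset", "router"], ["router", "billing"]]

usable = select_usable_terms(doc, max_variance=0.05)
ranked_terms = prioritize_terms(questions, doc, usable)
ranked_results = order_results([
    {"id": "a", "query_score": 0.5, "search_score": 0.9},
    {"id": "b", "query_score": 1.0, "search_score": 0.6},
])
```

In this toy input, "the" occurs throughout the document, so its positional variance exceeds the threshold and it is excluded from query terms, while terms confined to a narrow region remain usable. Other dispersion measures or combination rules would fit the claim language equally well.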

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/019902 WO2021234844A1 (en) 2020-05-20 2020-05-20 Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program

Publications (1)

Publication Number Publication Date
WO2021234844A1 true WO2021234844A1 (en) 2021-11-25

Family

ID=78708302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/019902 WO2021234844A1 (en) 2020-05-20 2020-05-20 Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program

Country Status (1)

Country Link
WO (1) WO2021234844A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060080315A1 (en) * 2004-10-08 2006-04-13 The Greentree Group Statistical natural language processing algorithm for use with massively parallel relational database management system
US20080243820A1 (en) * 2007-03-27 2008-10-02 Walter Chang Semantic analysis documents to rank terms
US20120197905A1 (en) * 2011-02-02 2012-08-02 Microsoft Corporation Information retrieval using subject-aware document ranker
US20140244384A1 (en) * 2007-03-23 2014-08-28 Walter Chang Method and apparatus for performing targeted advertising in documents
US20190095802A1 (en) * 2017-09-25 2019-03-28 International Business Machines Corporation Heuristic and non-semantic prediction of the cost to find and review data relevant to a task

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANAKO ONISHI; ICHIRO KOBAYASHI: "An Information Enhancement Technique by using Linked Data", DEIM FORUM 2011, 4 August 2011 (2011-08-04), pages 1 - 8, XP009532380 *
TAKAAKI HASEGAWA: "Automatic Knowledge Assistance System Supporting Operator Responses", NTT TECHNICAL REVIEW, vol. 17, no. 9, 1 September 2019 (2019-09-01), pages 15 - 18, XP055874245 *

Similar Documents

Publication Publication Date Title
CN107832286B (en) Intelligent interaction method, equipment and storage medium
JP4924950B2 (en) Question answering data editing device, question answering data editing method, question answering data editing program
WO2018034118A1 (en) Dialog system and computer program therefor
JP6998680B2 (en) Interactive business support system and interactive business support program
US11494434B2 (en) Systems and methods for managing voice queries using pronunciation information
US20170365258A1 (en) Utterance presentation device, utterance presentation method, and computer program product
CA2823835C (en) Voice search and response based on relevancy
JP7060027B2 (en) FAQ maintenance support device, FAQ maintenance support method, and program
CN106407393B (en) information processing method and device for intelligent equipment
KR102348084B1 (en) Image Displaying Device, Driving Method of Image Displaying Device, and Computer Readable Recording Medium
JP2019207648A (en) Interactive business assistance system
US11842721B2 (en) Systems and methods for generating synthesized speech responses to voice inputs by training a neural network model based on the voice input prosodic metrics and training voice inputs
CN110852095B (en) Statement hot spot extraction method and system
JP6873805B2 (en) Dialogue support system, dialogue support method, and dialogue support program
WO2019159986A1 (en) Information provision device, information provision method, and program
CA3143970A1 (en) Systems and methods for identifying dynamic types in voice queries
US20210034662A1 (en) Systems and methods for managing voice queries using pronunciation information
WO2021234844A1 (en) Selection device, priority determination device, answer candidate acquisition device, selection method, priority determination method, answer candidate acquisition method and program
CN111460114A (en) Retrieval method, device, equipment and computer readable storage medium
JP7126865B2 (en) Interactive business support system
JP7028746B2 (en) Question generator and question generation method
US20230205794A1 (en) Generating search insight data
CN111354350A (en) Voice processing method and device, voice processing equipment and electronic equipment
US20220207066A1 (en) System and method for self-generated entity-specific bot
JP2019101619A (en) Dialogue scenario generation apparatus, program and method capable of determining context from dialogue log groups

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20937107

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20937107

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP