WO2021250837A1 - Search device, search method, and recording medium - Google Patents


Info

Publication number
WO2021250837A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
voice data
text data
character string
data
Prior art date
Application number
PCT/JP2020/022971
Other languages
French (fr)
Japanese (ja)
Inventor
秀治 古明地
靖夫 飯村
Original Assignee
NEC Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation
Priority to PCT/JP2020/022971 priority Critical patent/WO2021250837A1/en
Priority to JP2022530448A priority patent/JP7485030B2/en
Publication of WO2021250837A1 publication Critical patent/WO2021250837A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval of audio data
    • G06F 16/63: Querying

Definitions

  • the present invention relates to a search device or the like that generates a search query using text data converted from voice data.
  • in such situations, the identity of the person to be inquired about is confirmed by contacting the main office by voice call via wireless communication.
  • if the database in which the inquiry information is stored can be searched using text data converted from voice data based on the speaker's voice, immediate inquiries can be automated.
  • however, since pronunciation and accent vary depending on the speaker, erroneous conversion may occur when the voice data is converted into text data, and the inquiry may then be made based on the erroneous text data.
  • Patent Document 1 discloses a sequence signal search device for efficiently processing a plurality of search term candidates from sequence signals, such as voice data, that include errors.
  • the apparatus of Patent Document 1 plots the syllable sequence of the speech recognition result of the voice data and the syllable sequence of the search term on a plane based on the distance (similarity) between syllables.
  • the apparatus of Patent Document 1 realizes a search process of voice data by a search term by detecting a straight line on that plane.
  • Patent Document 2 discloses a navigation device that searches for place names and road names based on voice recognition.
  • the device of Patent Document 2 accepts the first character string included in the search target character string and narrows down the candidates for the search target character string. Then, the device of Patent Document 2 extracts the search target character string from the narrowed-down candidates based on voice data input thereafter.
  • with the method of Patent Document 1, it is possible to extract a plurality of search term candidates based on the voice recognition result.
  • however, the method of Patent Document 1 is difficult to apply when the voice data is not known before the search, and is not suitable for an immediate inquiry in which a search term must be extracted from text data based on arbitrary voice data.
  • with the method of Patent Document 2, the certainty of extracting the search target character string can be improved by narrowing down the search target character string candidates in advance.
  • An object of the present invention is to provide a search device or the like capable of generating a plurality of search queries composed of search terms corresponding to search items based on arbitrary voice data.
  • a search device of one aspect of the present invention includes a conversion unit that converts input voice data into text data by voice recognition, and a search unit that extracts a character string corresponding to a search item from the text data, generates, for each search item, search terms related to the character string based on the distance from the extracted character string, and combines the search terms generated for each search item to generate a plurality of search queries.
  • in a search method of one aspect of the present invention, a computer converts input voice data into text data by voice recognition, extracts a character string corresponding to a search item from the text data, generates, for each search item, search terms related to the character string based on the distance from the extracted character string, and combines the search terms generated for each search item to generate a plurality of search queries.
  • the program of one aspect of the present invention causes a computer to execute a process of converting input voice data into text data by voice recognition, a process of extracting a character string corresponding to a search item from the text data, a process of generating, for each search item, search terms related to the character string based on the distance from the extracted character string, and a process of generating a plurality of search queries by combining the search terms generated for each search item.
  • according to the present invention, it is possible to provide a search device or the like that can generate a plurality of search queries composed of search terms corresponding to search items based on arbitrary voice data.
  • This is an example of a table included in a dictionary used by the search device according to the third embodiment to generate search terms.
  • This is another example of a table included in a dictionary used by the search device according to the third embodiment to generate search terms.
  • This is a conceptual diagram showing an example in which the search device according to the third embodiment converts text data for confirming the correctness of a search term into voice data and outputs it.
  • This is a conceptual diagram showing an example in which reply voice data is input to the search device in response to the voice data transmitted from the search device according to the third embodiment.
  • the search device of the present embodiment converts voice data into text data by using voice recognition technology, and recognizes at least one character string from the converted text data.
  • the search device of the present embodiment generates a plurality of search terms related to the at least one recognized character string based on the distance between pronunciations, and generates a plurality of search queries (also referred to as search patterns) including those search terms.
  • characters and symbols such as hiragana, katakana, kanji, and the alphabet may be used instead of the phonemes used by speech recognition.
  • FIG. 1 is a block diagram showing an example of the configuration of the search device 10 of the present embodiment.
  • the search device 10 includes an acquisition unit 11, a first conversion unit 12, a search unit 13, a second conversion unit 18, and an output unit 19.
  • the acquisition unit 11 and the output unit 19 constitute an input / output unit 110.
  • the first conversion unit 12 and the second conversion unit 18 constitute a conversion unit 120.
  • FIG. 1 also shows a database (DB100) connected to the search device 10.
  • the DB 100 is connected to the search unit 13 via a network such as the Internet or an intranet. A plurality of collation data are stored in the DB 100.
  • the search device 10 transmits / receives voice data to / from a radio device (not shown).
  • the radio has a microphone and a speaker.
  • the radio converts the voice input by voice via the microphone into an electric signal (voice data).
  • a radio device transmits a radio signal including voice data by wireless communication in a specific frequency band.
  • a radio signal transmitted from a radio is converted into an electric signal via an antenna, an amplifier, a demodulator, or the like (not shown), and is input to the search device 10 via a network such as the Internet or an intranet.
  • the search device 10 outputs voice data to the radio.
  • although the search device 10 and the radio device are not directly connected, the search device and the radio device will be described below as exchanging voice data.
  • the acquisition unit 11 acquires voice data from the radio.
  • the acquisition unit 11 outputs the acquired voice data to the first conversion unit 12.
  • the first conversion unit 12 converts voice data into text data by voice recognition.
  • the first conversion unit 12 converts voice data into text data by using an algorithm of an acoustic model or a language model.
  • the first conversion unit 12 converts voice data into text data by using a method such as a statistical method or dynamic time warping.
  • the first conversion unit 12 converts voice data into text data by using a technique such as deep learning or a hidden Markov model.
  • the first conversion unit 12 converts voice data into text data using a voice recognition dictionary including an acoustic model, a language model, a pronunciation dictionary, and the like.
  • the first conversion unit 12 calculates a speech recognition score (also referred to as a score) by text analysis for a character string (word) included in the text data.
  • the first conversion unit 12 converts voice data into text data based on the score in voice recognition.
  • the voice recognition method described here is an example, and does not limit the conversion method from voice data to text data by the first conversion unit 12.
  • the search unit 13 recognizes at least one character string for each search item from the text data converted by the first conversion unit 12. For example, when the search items are a name, a date of birth, and a registered domicile, the search unit 13 detects character strings that can correspond to those search items. For example, based on a character string that corresponds to a certain search item recognized from the text data, the search unit 13 treats character strings that are highly likely to appear before and after that character string as candidates for the character strings corresponding to the other search items. For example, the search unit 13 selects, as a search term, at least one character string from the search term candidates for each search item based on the speech recognition scores (scores) given to the character strings (words) extracted from the text data. For example, the search unit 13 selects, as the search term, the character string having the highest score among the search term candidates for each search item. The character string having the highest score among the search term candidates for each search item corresponds to the character string of the recognition result.
  • the search unit 13 generates a search term from the recognized character string based on the distance between pronunciations.
  • the distance between pronunciations is the distance between two character strings based on pronunciation.
  • the search unit 13 compares the phoneme strings constituting two character strings, and sets the number of differing phonemes as the distance between pronunciations.
  • the distance between pronunciations is defined in consideration of the order in which phonemes appear in the character string.
  • the search unit 13 generates, as search terms, character strings whose distance between pronunciations from the recognized character string is small.
  • the recognized character string itself is also generated as a search term, because its distance between pronunciations from itself is 0.
  • the distance between pronunciations will be explained below with some examples.
  • the example given below is an example, and does not limit the distance between pronunciations used by the search unit 13 of the present embodiment when generating a search term.
  • the phonemes of Sato are “s”, “a”, “t”, and “o”.
  • the phonemes of Saito are “s”, “a”, “i”, “t”, and “o”. Sato and Saito differ in that Saito has one extra phoneme. That is, for Sato and Saito, the distance between pronunciations is 1 because there is only one excess or missing phoneme.
  • the phonemes of Sato are “s”, “a”, “t”, and “o”.
  • the phonemes of Suzuki are “s”, “u”, “z”, “u”, “k”, and “i”. Sato and Suzuki have three differing phonemes and two excess or missing phonemes, so the distance between pronunciations is 5.
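As a sketch (the patent does not prescribe a specific algorithm), the distance between pronunciations described above behaves like an edit distance over phoneme sequences: substitutions count the differing phonemes, and insertions or deletions count the excess or missing phonemes. A Levenshtein distance over phoneme lists reproduces both examples:

```python
def pronunciation_distance(p1, p2):
    """Levenshtein distance over two phoneme sequences: substitutions
    count differing phonemes, insertions/deletions count excess or
    missing phonemes (an assumed formalization of the text's rule)."""
    m, n = len(p1), len(p2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of p1's first i phonemes
    for j in range(n + 1):
        d[0][j] = j  # insert all of p2's first j phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p1[i - 1] == p2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

sato = ["s", "a", "t", "o"]
saito = ["s", "a", "i", "t", "o"]
suzuki = ["s", "u", "z", "u", "k", "i"]
print(pronunciation_distance(sato, saito))   # 1 (one extra phoneme)
print(pronunciation_distance(sato, suzuki))  # 5 (3 substitutions + 2 insertions)
```

A refinement mentioned later in the text, a predefined inter-phoneme distance between phoneme pairs, could replace the fixed substitution cost of 1.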
  • the search unit 13 generates a plurality of search queries using the generated search terms. For example, when the search items are a name, a date of birth, and a registered domicile, the search unit 13 generates a search query that combines the search terms for each of those search items.
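The combination step can be sketched as a Cartesian product over per-item candidate lists; the item names and candidate values below are illustrative assumptions, not data from the patent:

```python
from itertools import product

# Hypothetical search term candidates generated for each search item
candidates = {
    "name": ["Sato", "Kato"],
    "date_of_birth": ["January 1, 1990", "July 1, 1990"],
    "registered_domicile": ["A-ku, Tokyo"],
}

# Each combination of one search term per search item is one search query
search_queries = [dict(zip(candidates, combo))
                  for combo in product(*candidates.values())]
print(len(search_queries))  # 2 * 2 * 1 = 4 queries
```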
  • the search item may include a search item other than the name, date of birth, and address.
  • the search unit 13 searches the DB 100 using at least one of the generated plurality of search queries. For example, the search unit 13 searches the DB 100 using some of the generated search queries that satisfy predetermined criteria. For example, the search unit 13 may search the DB 100 using all of the generated search queries.
  • the DB 100 is constructed in association with the type of collation target (also referred to as the inquiry type).
  • the DB 100 stores a plurality of data (also referred to as collation data) including search items for searching the collation target.
  • the collation data stored in the DB 100 is searched using one of several search items as a key. If at least one piece of the searched collation data matches, the search is a hit. In the present embodiment, a piece of collation data is said to match when all of the search items match.
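A minimal sketch of this matching rule, where a collation record counts as a hit only when every search item in the query matches (the record fields and values are illustrative assumptions):

```python
# Hypothetical collation data stored in the database
collation_db = [
    {"name": "Shibataro", "date_of_birth": "January 1, 1990",
     "registered_domicile": "A-ku, Tokyo"},
]

def search(db, query):
    """Return the records that hit: all search items in the query match."""
    return [rec for rec in db
            if all(rec.get(item) == term for item, term in query.items())]

hits = search(collation_db, {"name": "Shibataro",
                             "date_of_birth": "January 1, 1990",
                             "registered_domicile": "A-ku, Tokyo"})
print(len(hits))  # 1
```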
  • the DB 100 is constructed for each inquiry type.
  • the search unit 13 selects at least one search query according to the score of the key search term included in the search query. For example, the search unit 13 inputs a plurality of search terms generated from the character string that is the recognition result of voice recognition into a voice recognition engine that outputs a score according to the recognition result, and ranks the search terms according to the output scores.
  • the search unit 13 searches the DB 100 using the selected search query. For example, the search unit 13 selects the search query having the highest score of the key search term included in the search query, and searches the DB 100 using the selected search query.
  • FIG. 2 is an example (table 131) of a table in which the search terms for the surname included in the name, which is a search item, are ranked according to the score. For example, the search unit 13 selects a search query according to the score of the surname search term included in the search query.
  • the search unit 13 generates search terms with scores based on the recognition results of voice recognition for a plurality of search terms. For example, in a certain voice recognition, it is assumed that the recognition results of "Sato (0.41)" and "Kato (0.65)" are obtained for the surname (scores in parentheses). Regarding the date of birth, it is assumed that the recognition results of "January 1, 1990 (0.56)" and "July 1, 1990 (0.92)" are obtained (scores in parentheses). Further, it is assumed that the recognition result of "Tokyo A-ku (0.43)" is obtained for the registered domicile (score in parentheses). For example, the search unit 13 generates the following text data.
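Ranking scored candidates per search item and taking the top one as the recognition result can be sketched as follows, using the scores assumed in the example above:

```python
# Recognition candidates with speech recognition scores (example values)
candidates = {
    "surname": [("Sato", 0.41), ("Kato", 0.65)],
    "date_of_birth": [("January 1, 1990", 0.56), ("July 1, 1990", 0.92)],
    "registered_domicile": [("Tokyo A-ku", 0.43)],
}

# For each search item, rank the candidates by score and pick the best
best = {item: max(cands, key=lambda c: c[1])[0]
        for item, cands in candidates.items()}
print(best["surname"])        # Kato
print(best["date_of_birth"])  # July 1, 1990
```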
  • the above text data is converted into voice data by the second conversion unit 18, and is output from the output unit 19 to a radio (not shown).
  • the above text data converted into voice data may be output in an order according to the score of any one search item, or may be output in an order according to the total value of the scores of all the search items.
  • the second conversion unit 18 acquires text data from the search unit 13.
  • the second conversion unit 18 converts the acquired text data into voice data.
  • the second conversion unit 18 converts the text data into speech data by using a rule-based synthesis method such as formant speech synthesis or articulatory speech synthesis.
  • the second conversion unit 18 converts text data into speech data by using a waveform-concatenation speech synthesis method such as unit selection speech synthesis, diphone speech synthesis, or limited-domain speech synthesis.
  • the second conversion unit 18 converts text data into speech data by using a statistical parametric speech synthesis method such as neural network speech synthesis or hidden Markov model speech synthesis.
  • the speech synthesis method described here is an example, and does not limit the speech synthesis method used by the second conversion unit 18. Further, if the first conversion unit 12 can also convert text data into voice data, the second conversion unit 18 may be omitted.
  • the output unit 19 outputs the voice data converted by the second conversion unit 18.
  • the voice data output from the output unit 19 is transmitted to the radio, and is output as voice by the radio.
  • the output unit 19 outputs voice data based on the collation data that hits the search in an order according to the accuracy of the search query.
  • FIG. 3 is a conceptual diagram showing an example in which the search device 10 generates a plurality of search queries using voice data acquired from a radio device.
  • the search device 10 acquires the voice data "Mr. Shibataro, born on January 1, 1990, registered domicile address A-ku, Tokyo".
  • the search device 10 recognizes the character strings for each search item such as "Shibataro", "January 1, 1990", and "A-ku, Tokyo” from the acquired voice data by voice recognition.
  • the search device 10 generates a plurality of search terms related to the recognized character string based on the distance between pronunciations.
  • the search device 10 generates a search query that combines a plurality of generated search terms.
  • the search device 10 generates a plurality of search queries such as (Shibataro, January 1, 1990, A-ku, Tokyo), (Shibataro, July 1, 1990, A-ku, Tokyo), and so on.
  • FIG. 4 is an example of collation data (collation table 101) stored in the DB 100.
  • the collation table 101 stores collation data including search items (name, date of birth, registered domicile, etc.).
  • the collation table 101 stores the collation data of a person named "Shibataro" whose date of birth is January 1, 1990 and whose registered domicile is A-ku, Tokyo.
  • in this case, the person named "Shibataro" is hit by the search.
  • FIG. 5 is an example in which the search device 10 converts text data corresponding to the search result of the DB 100 into voice data and outputs the data.
  • the search device 10 acquires the search result "Shibataro, January 1, 1990, A-ku, Tokyo, XX" from the DB 100.
  • "XX" is the inquiry type of the person who is hit by the search.
  • the search device 10 generates text data such as "Registered domicile address A-ku, Tokyo, Mr. Shibataro, born on January 1, 1990, corresponds to XX" from the acquired search results.
  • the search device 10 converts the generated text data into voice data.
  • the search device 10 outputs the converted voice data to the radio.
  • the voice data output from the search device 10 is output as voice in a radio (not shown).
  • the search device 10 may generate text data for inquiring whether the search term is correct or not by using at least a part of the search terms related to the character string recognized in the text data.
  • FIG. 6 is an example of converting text data for confirming the correctness of a search term into voice data and outputting it.
  • the search device 10 outputs voice data based on text data that contains the search term having the highest score and asks again about the search term that could not be voice-recognized.
  • the search device 10 outputs voice data having the content "Search by Kibataro, registered address A-ku, Tokyo. Please give me your date of birth again.”
  • audio data is output to a radio.
  • the voice data output from the search device 10 is output as voice for confirming the correctness of the search term in a radio (not shown).
  • FIG. 7 is an example in which, in the example of FIG. 6, response voice data is returned from a radio (not shown) or the like with respect to the voice data transmitted from the search device 10.
  • the search device 10 acquires the voice data of the reply "The name is Shibataro and the date of birth is January 1, 1990".
  • the search device 10 obtains the recognition results of "Shibataro" and "January 1, 1990" from the acquired voice data by voice recognition.
  • the search device 10 generates a search term according to the recognition result, generates a search query including the generated search term, and generates the text data of a response. If the correctness of a search term can be confirmed with the sender of the voice data before or at the time of generating the search query, the generation of erroneous search queries can be reduced.
  • FIG. 8 is a flowchart for explaining an example of generating a search query by the search device 10.
  • in the description according to the flowchart of FIG. 8, the search device 10 is the main operating body.
  • the search device 10 acquires voice data (step S111).
  • the search device 10 acquires voice data output from a radio (not shown).
  • the search device 10 converts the acquired voice data into text data by voice recognition (step S112).
  • the search device 10 extracts the character string corresponding to the search item from the text data (step S113).
  • the search device 10 generates a search term related to the extracted character string for each search item based on the distance between pronunciations (step S114).
  • the search device 10 generates a plurality of search queries that combine search terms for each search item (step S115).
  • FIG. 9 is a flowchart for explaining an example in which the search device 10 confirms the correctness of the search term.
  • the flowchart of FIG. 9 shows a process following step S114 of the flowchart of FIG. 8. In the example of FIG. 9, it is assumed that the search terms are given scores or rankings. In the description according to the flowchart of FIG. 9, the search device 10 is the main operating body.
  • the search device 10 generates text data for confirming the correctness of the search term having the maximum score (step S121).
  • the search device 10 converts the generated text data into voice data and outputs the converted voice data (step S122).
  • the voice data output from the search device 10 is output as voice in a radio (not shown).
  • the search device 10 acquires the voice data of the response and converts it into text data by voice recognition (step S123). For example, the search device 10 acquires voice data from a radio (not shown). For example, if the search device 10 does not obtain a response for a predetermined period, the voice data may be retransmitted or the process may proceed to step S125.
  • the search device 10 generates a plurality of search queries that combine the search terms for each search item (step S126).
  • when the search term is incorrect (No in step S124), the search device 10 generates text data for confirming the correctness of the search term having the next highest score (step S125). After step S125, the process returns to step S122. The series of processes from step S122 to step S125 is continued until it is confirmed that the search term is correct. If it cannot be confirmed that a search term is correct even after repeating the series of processes of steps S122 to S125 a predetermined number of times or for a predetermined time, the process may proceed to step S126 or return to step S113 of FIG. 8.
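The confirmation loop of steps S121 to S125 can be sketched as follows; the `ask` callback is a hypothetical placeholder for the voice round trip (the real device synthesizes a question, sends it to the radio, and voice-recognizes the reply):

```python
def confirm_search_term(ranked_terms, ask, max_attempts=3):
    """Ask about candidates in descending score order until one is
    confirmed correct (steps S121-S125). `ask` stands in for the voice
    round trip and returns True if the speaker confirms the term."""
    for attempt, term in enumerate(ranked_terms):
        if attempt >= max_attempts:
            break  # give up after a predetermined number of tries
        if ask(term):
            return term  # confirmed: proceed to query generation (S126)
    return None  # unconfirmed: fall back (e.g., re-extract at S113)

# Example: the speaker rejects the top candidate and confirms the second
terms = ["Kibataro", "Shibataro"]
result = confirm_search_term(terms, ask=lambda t: t == "Shibataro")
print(result)  # Shibataro
```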
  • FIG. 10 is a flowchart for explaining an example in which the search device 10 asks again about an unrecognized search item.
  • in the description according to the flowchart of FIG. 10, the search device 10 is the main operating body.
  • the search device 10 acquires voice data (step S131).
  • the search device 10 acquires voice data transmitted from a radio (not shown).
  • the search device 10 converts the acquired voice data into text data by voice recognition (step S132).
  • the search device 10 extracts the character string corresponding to the search item from the text data (step S133).
  • when a search item is insufficient (Yes in step S134), the search device 10 outputs voice data for asking again about the missing search item (step S135). After step S135, the process returns to step S131.
  • the voice data output from the search device 10 is output as voice in a radio (not shown).
  • when no search item is insufficient (No in step S134), the search device 10 generates search terms related to the extracted character strings for each search item based on the distance between pronunciations (step S136).
  • the search device 10 generates a plurality of search queries that combine search terms for each search item (step S137).
  • FIG. 11 is a flowchart for explaining an example in which the search device 10 searches the DB 100 using the generated search query.
  • in the description according to the flowchart of FIG. 11, the search device 10 is the main operating body.
  • the search device 10 searches the DB 100 using the generated search query (step S151).
  • when the search is a hit (Yes in step S152), the search device 10 generates text data including the hit search result (step S153). On the other hand, when the search is not a hit (No in step S152), the search device 10 generates text data indicating that no search result was obtained (step S154).
  • after step S153 or step S154, the search device 10 converts the generated text data into voice data (step S155).
  • the search device 10 outputs voice data (step S156).
  • voice data output from the search device 10 is output as voice in a radio (not shown).
  • the search device of the present embodiment includes an acquisition unit, a first conversion unit, a search unit, a second conversion unit, and an output unit.
  • the acquisition unit inputs voice data.
  • the first conversion unit converts the voice data acquired by the acquisition unit into text data by voice recognition.
  • the search unit extracts the character string corresponding to the search item from the text data.
  • the search unit generates search terms related to the character string for each search item based on the distance from the extracted character string.
  • the search unit generates a plurality of search queries by combining the search terms generated for each search item.
  • the search unit searches a database in which collation data including search items are accumulated.
  • the search unit generates text data according to the search results.
  • the second conversion unit converts the generated text data into voice data.
  • the output unit outputs the voice data converted from the text data.
  • the search unit generates, as a search term, a character string whose distance between pronunciations from the character string extracted from the text data is small, where the distance between pronunciations is based on differences in the phonemes constituting the character strings.
  • the search unit calculates the distance between pronunciations using a predefined inter-phoneme distance between two phonemes.
  • the search unit assigns a score based on voice recognition to the search term, and generates text data including the search term to which the score is assigned.
  • the conversion unit converts the generated text data into voice data, and outputs the voice data converted from the text data.
  • since police officers using radios or the like can input accident information and the like only by voice, it is desirable to reduce the frequency of repeated input and confirmation. Further, in such input, it is desirable to make inquiries about a person or the like quickly and accurately based on the information input by voice.
  • (Second embodiment) Next, the search device according to the second embodiment will be described with reference to the drawings.
  • the present embodiment is different from the first embodiment in that the generated search query is ranked according to the accuracy.
  • FIG. 12 is a block diagram showing an example of the configuration of the search device 20 of the present embodiment.
  • the search device 20 includes an acquisition unit 21, a first conversion unit 22, a search unit 23, a second conversion unit 28, and an output unit 29.
  • the acquisition unit 21 and the output unit 29 constitute an input / output unit 210.
  • the first conversion unit 22 and the second conversion unit 28 constitute a conversion unit 220.
  • FIG. 12 also shows a database (DB200) connected to the search device 20.
  • the DB 200 is connected to the search unit 23 via a network such as the Internet or an intranet.
  • a plurality of collation data are stored in the DB 200. Since the configurations other than the search unit 23 included in the search device 20 are the same as the configurations included in the search device 10 of the first embodiment, detailed description thereof will be omitted. In the following, the description will be focused on the search unit 23.
  • the search unit 23 selects a search query according to the accuracy of the search term.
  • the search unit 23 searches the DB 200 using the selected search query.
  • for example, the accuracy is the sum of the distances between pronunciations of the search terms for each search item, such as the name, date of birth, and registered domicile.
  • the accuracy of the character string (search term) that is itself the recognition result of the voice recognition by the search unit 23 is 0.
  • for the other words ranked based on the score of the recognition result, the distance between pronunciations is used as the accuracy.
  • the accuracy may be weighted for each search item. For example, there are far more variations in names than variations in dates of birth. Therefore, if the weight of the name is increased compared to the date of birth, the search accuracy will be improved.
  • rare surnames are unlikely to be encountered, so they may be excluded from the search terms regardless of their accuracy.
  • the search items are the name (also referred to as the name query), the date of birth (also referred to as the date of birth query), and the registered domicile (also referred to as the registered domicile query).
  • it is assumed that the recognition results of the name query, the date of birth query, and the registered domicile query were "Sato", "January 1, 1990", and "A-ku, Tokyo (Tokyotoake)", respectively.
  • Each recognition result is used as a search term.
  • the accuracy of "Sato", which is the recognition result, is 0.
  • the accuracy of "January 1, 1990 (Heisei ninen ichigatsu tsuitachi)", which is the recognition result, is 0.
  • the accuracy of "A-ku, Tokyo", which is the recognition result, is 0.
  • Search Query 1 “Sato, January 1, 1990, A-ku, Tokyo”
  • Search Query 2 “Kato, January 1, 1990, A-ku, Tokyo”
  • Search Query 3 “Sato, July 1, 1990, A-ku, Tokyo”
  • Search Query 4 “Kato, July 1, 1990, A-ku, Tokyo”
  • Search Query 5 “Sato, January 1, 1990, D-ku, Tokyo”
  • Search Query 6 “Kato, January 1, 1990, D-ku, Tokyo”
  • Search Query 7 “Sato, July 1, 1990, D-ku, Tokyo”
  • Search Query 8 “Kato, July 1, 1990, D-ku, Tokyo”
  • the notation of phonemes and the like of each search term is omitted.
  • Search Query 5: α1 × 0 + α2 × 0 + α3 × 1 = α3
  • Search Query 6: α1 × 1 + α2 × 0 + α3 × 1 = α1 + α3
  • Search Query 7: α1 × 0 + α2 × 1 + α3 × 1 = α2 + α3
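The weighted-sum accuracy above can be sketched as follows. This is a minimal illustration, not the patented implementation; the weights (α1 for name, α2 for date of birth, α3 for registered domicile) and the per-term pronunciation distances are illustrative values with all weights set to 1.

```python
# Hypothetical sketch: the accuracy of a search query is the weighted sum of
# the per-item pronunciation distances. All names and distances are
# illustrative, taken from the example queries in the text.

def query_accuracy(distances, weights):
    """Weighted sum of pronunciation distances over the search items."""
    return sum(w * d for w, d in zip(weights, distances))

weights = (1, 1, 1)  # alpha1 (name), alpha2 (date of birth), alpha3 (domicile)

# (name distance, date-of-birth distance, domicile distance) per query
query5 = (0, 0, 1)  # "Sato, January 1, 1990, D-ku, Tokyo"
query6 = (1, 0, 1)  # "Kato, January 1, 1990, D-ku, Tokyo"
query7 = (0, 1, 1)  # "Sato, July 1, 1990, D-ku, Tokyo"

print(query_accuracy(query5, weights))  # alpha3 -> 1 with unit weights
print(query_accuracy(query6, weights))  # alpha1 + alpha3 -> 2
print(query_accuracy(query7, weights))  # alpha2 + alpha3 -> 2
```

A smaller accuracy means the query is closer to the raw recognition result, so queries are ranked in ascending order of this value.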
  • FIG. 13 is a conceptual diagram for explaining the accuracy of the search query calculated by the search device 20.
  • the voice data "Mr. Shibataro, born on January 1, 1990, registered domicile is A-ku, Tokyo" is input to the search device 20.
  • the weight of each verification item is 1.
  • the search device 20 extracts the character strings "Shibataro”, “Heiseininenichigatsutsuitachi”, and "Tokyotoake” from the text data based on the input voice data for each verification item.
  • the search device 20 generates a search term for each verification item based on the distance between pronunciations. For example, the search device 20 generates search terms such as "Shibataro" having a distance between pronunciations of 0, "Kibataro" having a distance between pronunciations of 1, and so on for a name query.
  • the search device 20 generates search terms such as "Heiseininenichigatsutsuitachi" with a pronunciation distance of 0, "Heiseininenshichigatsutsuitachi" with a pronunciation distance of 1, and so on, for the date of birth query.
  • the search device 20 generates search terms such as "Tokyotoake" having a pronunciation distance of 0, "Tokyotodake" having a pronunciation distance of 1, and so on, for the registered domicile query.
  • the search device 20 generates search queries by combining the plurality of search terms generated for each search item. For example, the search device 20 generates the search query "Shibataro, Heiseininenichigatsutsuitachi, Tokyotoake". The accuracy of this search query is 0, the sum of the pronunciation distances of its search terms. For example, the search device 20 also generates the search query "Kibataro, Heiseininenshichigatsutsuitachi, Tokyotodake". The accuracy of this search query is 3, the sum of the pronunciation distances of its search terms.
  • the search unit 23 searches the DB 200 based on the accuracy of the plurality of generated search queries, and generates the text data that is the source of the voice data to be output to a radio (not shown).
  • FIG. 14 is a flowchart for explaining an example of generating a search query by the search device 20.
  • the search device 20 is the main operating subject in the following description.
  • the search device 20 acquires voice data (step S211).
  • the search device 20 acquires voice data transmitted from a radio (not shown).
  • the search device 20 converts the acquired voice data into text data by voice recognition (step S212).
  • the search device 20 extracts the character string corresponding to the search item from the text data (step S213).
  • the search device 20 generates a search term related to the extracted character string for each search item based on the distance between pronunciations (step S214).
  • the search device 20 generates a plurality of search queries that combine search terms for each search item (step S215).
  • the search device 20 calculates the accuracy of the generated search query based on the distance between pronunciations for each search term (step S216).
  • the search device 20 ranks a plurality of search queries according to the accuracy (step S217).
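Steps S214 through S217 above can be sketched as a small pipeline. This is a hedged illustration only: the per-item candidate terms and their pronunciation distances are hard-coded stand-ins for the recognition output, and unit weights are assumed.

```python
# Hypothetical sketch of steps S214-S217: combine per-item search terms into
# queries (S215), score each query by the sum of pronunciation distances
# (S216, unit weights), and rank the queries by accuracy (S217).
from itertools import product

# per search item: list of (term, pronunciation distance) candidates (S214)
terms = {
    "name": [("Sato", 0), ("Kato", 1)],
    "dob": [("January 1, 1990", 0), ("July 1, 1990", 1)],
    "domicile": [("A-ku, Tokyo", 0), ("D-ku, Tokyo", 1)],
}

queries = []
for combo in product(*terms.values()):           # step S215
    accuracy = sum(d for _, d in combo)          # step S216
    queries.append(([t for t, _ in combo], accuracy))

queries.sort(key=lambda q: q[1])                 # step S217: smallest first
for q, acc in queries[:3]:
    print(acc, q)
```

With two candidates per item this yields the eight queries of the earlier example, with the all-distance-0 combination ranked first.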
  • the search device 20 searches the DB 200 using the generated search query (step S251).
  • When a plurality of search results are hit (Yes in step S252), the search device 20 generates a plurality of text data including the hit search results (step S253). Next, the search device 20 converts each of the generated text data into voice data (step S254). Then, the search device 20 outputs the plurality of voice data in order of the accuracy of the search queries (step S255). For example, the plurality of voice data output from the search device 20 are output as voice by a radio (not shown) in order of the accuracy of the corresponding search queries.
  • When a plurality of search results are not hit (No in step S252), the search device 20 generates text data according to the search result (step S256).
  • the case where a plurality of search results are not hit includes the case where the search result is not hit and the case where only one search result is hit. If the search result is not hit, the search device 20 generates text data indicating that the search result was not obtained. On the other hand, when only one search result is hit, the search device 20 generates text data including the hit search result.
  • the search device 20 converts the generated text data into voice data (step S257). Then, the search device 20 outputs voice data (step S258). For example, the voice data output from the search device 20 is output as voice in a radio (not shown).
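The branching of steps S251 through S258 can be sketched as follows. This is an illustrative sketch, not the claimed implementation: DB access is simulated, hits are given as (accuracy, record) pairs, and the response strings are invented placeholders.

```python
# Hypothetical sketch of steps S251-S258: branch on the number of hits and
# build the text data that will be converted to voice. Multiple hits are
# ordered by the accuracy of the search query that produced them.

def build_responses(hits):
    """hits: list of (query accuracy, record). Returns text data lines."""
    if not hits:                       # no hit (part of step S256)
        return ["No matching record was found."]
    if len(hits) == 1:                 # single hit (part of step S256)
        return [f"1 record found: {hits[0][1]}"]
    # multiple hits (step S253): one text per hit, best accuracy first
    return [f"Candidate ({acc}): {rec}" for acc, rec in sorted(hits)]

print(build_responses([]))
print(build_responses([(0, "Shibataro")]))
print(build_responses([(2, "Kato"), (0, "Sato")]))
```

Each returned line would then pass through the second conversion unit to become voice data, preserving the accuracy ordering.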
  • the search unit of the present embodiment calculates, for each search query, an accuracy that is the sum of the pronunciation distances of its search terms, weighted for each search item.
  • the search unit ranks search queries according to their accuracy.
  • the search unit generates text data including search terms constituting a search query ranked according to accuracy.
  • the conversion unit converts the generated text data into voice data, and outputs the voice data in order of the accuracy of the search queries from which the text data were generated.
  • the search device of this embodiment can output voice data according to the accuracy of the search query. For example, a user who hears the voice data output by the search device can recognize the search result according to the order of the accuracy of the search query.
  • the search device of the present embodiment differs from the first and second embodiments in that search queries are ranked based on a dictionary of distances between pronunciations (also referred to as an inter-pronunciation distance dictionary) for at least one search item.
  • FIG. 16 is a block diagram showing an example of the configuration of the search device 30 of the present embodiment.
  • the search device 30 includes an acquisition unit 31, a first conversion unit 32, a search unit 33, a dictionary 34, a second conversion unit 38, and an output unit 39.
  • the acquisition unit 31 and the output unit 39 constitute an input / output unit 310.
  • the first conversion unit 32 and the second conversion unit 38 constitute a conversion unit 320.
  • FIG. 16 also shows a database (DB300) connected to the search device 30.
  • the DB 300 is connected to the search unit 33 via a network such as the Internet or an intranet. A plurality of collation data are stored in the DB 300.
  • the dictionary 34 is a dictionary (also referred to as an inter-pronunciation distance dictionary) that summarizes the inter-pronunciation distances of character strings corresponding to search items.
  • the dictionary 34 is prepared in advance for each search item. For example, when the search item is the surname of a name, surnames recorded in a national biographical dictionary, a family register, or the like, together with rankings according to the pronunciation distance between surnames, are registered in the dictionary 34. For example, when N surnames are stored (N is a natural number), a ranking according to the distance between pronunciations with respect to the other surnames is registered in the dictionary 34 for each of the N surnames. However, even if the kanji used for a surname are the same, the reading may differ; here, one reading shall be associated with one set of kanji.
  • the dictionary 34 includes a data sequence in which other character strings are arranged in order of order according to the distance between pronunciations for each of the stored character strings.
  • for example, for the three surnames Yamada, Sato, and Kato, the pronunciation distance between Yamada and Sato, the pronunciation distance between Sato and Kato, and the pronunciation distance between Kato and Yamada are defined.
  • the distance between pronunciations between Sato (SATOU) and Kato (KATOU) is 1 because one phoneme is replaced.
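One common way to realize such a "distance between pronunciations" is an edit (Levenshtein) distance over romanized phoneme strings. The patent does not specify the metric, so the following is only a plausible sketch; it reproduces the SATOU/KATOU example, where one substituted phoneme gives a distance of 1.

```python
# Minimal Levenshtein distance over romanized phoneme strings; one possible
# (assumed) realization of the inter-pronunciation distance in the text.

def pronunciation_distance(a, b):
    """Edit distance: substitutions, insertions, and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(pronunciation_distance("SATOU", "KATOU"))  # 1
```

Any metric with this shape would let the dictionary 34 pre-rank neighboring surnames by distance.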
  • FIG. 17 is an example of a table (inter-pronunciation distance dictionary 340) included in the dictionary 34 whose search item is the surname. For example, in the inter-pronunciation distance dictionary 340, Kato is ranked first, Saito is ranked second, and so on. Note that FIG. 17 is a conceptual illustration of the dictionary 34 and does not accurately reflect the ranking of actual surnames.
  • the search unit 33 extracts character strings ranked high in the field of "Sato" as search terms. For example, the search unit 33 extracts the character strings ranked up to the Mth place in the field of "Sato" of the inter-pronunciation distance dictionary 340 as search terms (M is a natural number). For example, the search unit 33 may extract the character strings whose rank according to the inter-pronunciation distance is within the Xth place in the field of "Sato" of the inter-pronunciation distance dictionary 340 as search terms (X is a natural number).
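The top-M selection from the pre-ranked dictionary can be sketched as a simple lookup. The rankings below are invented for illustration, not real surname data; the point is that no distance computation is needed at query time.

```python
# Hypothetical sketch of the inter-pronunciation distance dictionary 340:
# for each stored surname, the other surnames are pre-ranked by pronunciation
# distance, and the search unit takes the top-M entries as search terms.

DICTIONARY = {
    "Sato": ["Kato", "Saito", "Naito"],   # rank 1, 2, 3 (illustrative)
    "Kato": ["Sato", "Saito", "Goto"],
}

def select_search_terms(extracted, m):
    """The extracted string itself plus its top-M ranked neighbors."""
    return [extracted] + DICTIONARY.get(extracted, [])[:m]

print(select_search_terms("Sato", 2))  # ['Sato', 'Kato', 'Saito']
```

Because the ranking is precomputed, this replaces the per-query distance calculation of the earlier embodiments with a constant-time lookup.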
  • the dictionary 34 may include a spelling alphabet.
  • the spelling alphabet is a table that summarizes rules established to prevent mishearing of voice in wireless communication and the like.
  • FIG. 18 is a spelling alphabet 360 including the contents of the spelling alphabet described in Appendix 5 of the Radio Station Operation Regulations. For example, to prevent mishearing of "a", "Asahinoa" is uttered. For example, to prevent mishearing of "shi", "shinbunnoshi" is uttered.
  • the spelling alphabet 360 may include not only characters but also data related to numbers and symbols. Further, the spelling alphabet 360 may include not only Japanese characters but also characters, numbers, and symbols related to European languages such as alphabets.
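Resolving a spelling-alphabet utterance back to its character can be sketched as a table lookup. The entries below are illustrative romanizations of the "Asahinoa" / "shinbunnoshi" readings mentioned above, not the actual contents of Appendix 5.

```python
# Hypothetical sketch of the spelling alphabet 360 as a lookup table mapping
# a spelling-alphabet utterance to the character it disambiguates.

SPELLING_ALPHABET = {
    "asahinoa": "a",        # "a" as in Asahi (illustrative entry)
    "shinbunnoshi": "shi",  # "shi" as in shinbun (illustrative entry)
}

def resolve_spelling(utterance):
    """Return the character the utterance spells, or None if unknown."""
    return SPELLING_ALPHABET.get(utterance.lower())

print(resolve_spelling("Shinbunnoshi"))  # shi
```

In the FIG. 20 scenario, such a lookup would let the search device map "Shinbunnoshi" to "shi" and correct "Kibataro" to "Shibataro".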
  • FIG. 19 is a conceptual diagram showing an example in which the search device 30 listens back to the name recognized by voice recognition.
  • the search device 30 outputs back-listening voice data according to the score and accuracy of the search term.
  • FIG. 20 is an example in which, since the name in the voice data read back in FIG. 19 is incorrect, the correct voice data is returned via a radio (not shown).
  • the search device 30 can refer to the voice data of "Shinbunnoshi" registered in the spelling alphabet 360 and recognize the exact name "Shibataro".
  • FIG. 21 is a flowchart for explaining an example of generating a search query by the search device 30.
  • the search device 30 is the main operating subject in the following description.
  • the search device 30 acquires voice data (step S311).
  • the search device 30 acquires voice data output from a radio (not shown).
  • the search device 30 converts the acquired voice data into text data by voice recognition (step S312).
  • the search device 30 extracts the character string corresponding to the search item from the text data (step S313).
  • the search device 30 refers to the pronunciation distance dictionary for each search item, and selects a search term based on the order according to the pronunciation distance with the extracted character string (step S314).
  • the search device 30 generates a plurality of search queries that combine search terms selected for each search item (step S315).
  • in the inter-pronunciation distance dictionary of the present embodiment, for each of a plurality of character strings corresponding to a search item, a plurality of other character strings corresponding to the search item are ranked according to the distance between pronunciations.
  • the search unit refers to the inter-pronunciation distance dictionary and selects, as a search term, a character string ranked high in the inter-pronunciation distance dictionary with respect to the character string extracted from the text data.
  • since the search terms are selected by referring to the inter-pronunciation distance dictionary, processing such as calculation of the inter-pronunciation distance can be omitted. Therefore, according to the present embodiment, the generation of search terms and search queries can be sped up.
  • FIG. 22 is a block diagram showing an example of the configuration of the search device 40 of the present embodiment.
  • the search device 40 includes an input / output unit 41, a first conversion unit 42, a search unit 43, a registration information recording unit 44, and a second conversion unit 48.
  • the first conversion unit 42 and the second conversion unit 48 constitute a conversion unit 420.
  • FIG. 22 also shows a radio 450 that exchanges voice data with the search device 40, and a database group (DB group 400) connected to the search device 40.
  • Each of the plurality of DBs constituting the DB group 400 is connected to the search unit 43 via a network such as the Internet or an intranet.
  • the DB group 400 includes a plurality of DBs for each inquiry type.
  • a plurality of collation data for each inquiry type is stored in each of the plurality of DBs included in the DB group.
  • the radio 450 exchanges voice data with the search device 40. Although only one radio 450 is shown in FIG. 22, the search device 40 can exchange voice data with a plurality of radios 450. Further, the radio 450 may include a part or all of the configuration of the search device 40. Since the main configuration of the search device 40 is the same as the configuration included in the search device 10 of the first embodiment, detailed description thereof will be omitted. In the following, the explanation will focus on the exchange of voice data between the radio 450 and the search device 40.
  • the input / output unit 41 acquires voice data based on a radio signal transmitted from the radio 450.
  • the input / output unit 41 outputs voice data to the first conversion unit 42. Further, the input / output unit 41 outputs the voice data acquired from the second conversion unit 48.
  • the radio 450 transmits a radio signal including voice data by wireless communication in a specific frequency band.
  • the radio signal transmitted from the radio 450 is converted into an electric signal via an antenna, an amplifier, a demodulator, and the like (not shown), and is input to the input / output unit 41 of the search device 40 via a network such as the Internet or an intranet.
  • the input / output unit 41 outputs the voice data acquired from the second conversion unit 48 toward the radio 450.
  • the registration information of the radio 450 is registered in the registration information recording unit 44.
  • the registration information is a user identifier of a user who uses the radio 450 or a device identifier of the radio 450.
  • the search device 40 exchanges voice data with the radio 450 of the transmission source of the identification information matching the registration information recorded in the registration information recording unit 44.
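The matching against the registration information recording unit 44 can be sketched as a membership check. The identifier values below are invented placeholders; the text only specifies that a user identifier or device identifier must match a recorded entry.

```python
# Hypothetical sketch of the check against the registration information
# recording unit 44: voice data is exchanged only when the transmitted
# identification information matches a recorded entry.

REGISTERED = {"P-1234", "P-5678"}  # illustrative user/device identifiers

def is_authorized(identification):
    """True if the identifier is recorded in the registration information."""
    return identification in REGISTERED

print(is_authorized("P-1234"))  # True
print(is_authorized("P-9999"))  # False
```

An unmatched identifier would trigger the notification and retransmission-request text data described below.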
  • the search unit 43 executes processing according to the content of the text data acquired from the first conversion unit 42.
  • an example will be described in which the text data acquired by the search unit 43 includes identification information, an inquiry type, inquiry information, and the like.
  • according to the content of the acquired text data, the search unit 43 executes processing such as generating text data including response content for the source of the voice data from which the text data was converted, and searching the DBs included in the DB group 400.
  • the text data generated by the search unit 43 is converted into voice data by the second conversion unit 48, and is output from the input / output unit 41 toward the radio 450.
  • the search unit 43 refers to the registration information recording unit 44 and determines whether the identification information is recorded in the registration information recording unit 44.
  • when the identification information is recorded in the registration information recording unit 44, the search unit 43 generates text data for inquiring about the inquiry type. For example, when the identification information is not recorded in the registration information recording unit 44, the search unit 43 generates text data notifying that the identification information does not match and text data instructing retransmission of the identification information.
  • the text data generated by the search unit 43 is converted into voice data by the second conversion unit 48 and output to the radio 450.
  • when the inquiry type is included in the voice data from the radio 450, the search unit 43 generates text data for inquiring about the inquiry content to the sender of the inquiry type. For example, the search unit 43 may generate text data for inquiring about the inquiry content that also includes content confirming the inquiry type.
  • the text data generated by the search unit 43 is converted into voice data by the second conversion unit 48 and output to the radio 450.
  • the search unit 43 extracts the character string corresponding to the search item from the text data.
  • the search unit 43 generates a search term related to the extracted character string based on the distance between pronunciations.
  • according to the extraction status of the character strings corresponding to the search items, the search unit 43 generates text data including content for confirming the search term whose pronunciation distance to the extracted character string is smallest, and text data including content for asking again about a search item that could not be extracted.
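The two kinds of response text (confirmation vs. asking again) can be sketched as a branch on the extraction status. The message wordings are invented placeholders; the text only specifies the distinction between a confirmable closest term and a missing item.

```python
# Hypothetical sketch: per search item, either confirm the closest search
# term or ask again for an item whose character string was not extracted.

def response_texts(extracted):
    """extracted: item -> closest search term, or None if not extracted."""
    texts = []
    for item, term in extracted.items():
        if term is None:
            texts.append(f"Please repeat the {item}.")
        else:
            texts.append(f"Searching with {item} '{term}'. Correct?")
    return texts

print(response_texts({"name": "Shibataro", "date of birth": None}))
```

Each returned line would be converted to voice data by the second conversion unit 48 and sent back toward the radio 450.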
  • FIG. 23 is a sequence diagram showing an example of the flow from the connection to the search device 40 by the radio device 450 to the generation of the search term by the search device 40.
  • the radio 450 connects to the search device 40 (step S411).
  • the connection method of the radio 450 to the search device 40 is not particularly limited.
  • when the search device 40 detects the connection of the radio 450, the search device 40 outputs voice data including a request for identification information to the radio 450 (step S412). For example, when the search device 40 detects the connection of the radio 450, it outputs voice data including a request for identification information, such as "Automatic response. Please give me your P number, affiliation, and name.", to the radio 450.
  • the P number is an identifier for uniquely identifying a police officer.
  • when the radio 450 receives the voice data including the request for identification information, the radio 450 outputs voice data including the identification information input by voice in response to the voice data to the search device 40 (step S413). For example, in response to the request for identification information, information including identification information such as "P number XX, XX station, area XX, XX" is input to the radio 450 by voice. For example, the radio 450 outputs voice data including identification information such as "P number XX, XX station, area XX, XX" to the search device 40.
  • when the search device 40 receives the voice data including the identification information, it confirms whether the identification information included in the voice data is registered in the registration information recording unit 44 (step S414). When the identification information included in the voice data received from the radio 450 is registered in the registration information recording unit 44, the search device 40 outputs voice data requesting the inquiry type to the radio 450 (step S415). For example, when the identification information included in the voice data received from the radio 450 is registered, the search device 40 outputs voice data including a request for the inquiry type, such as "P number OOOO, OOOO, please give me the inquiry type.", to the radio 450.
  • when the radio 450 receives the voice data including the request for the inquiry type, the radio 450 outputs voice data including the inquiry type input by voice in response to the voice data to the search device 40 (step S416). For example, the radio 450 outputs voice data including an inquiry type such as "It is a comprehensive inquiry due to exemption from liability" to the search device 40.
  • when the search device 40 receives the voice data including the inquiry type, the search device 40 confirms the inquiry type (step S417).
  • the search device 40 outputs voice data requesting inquiry information to the radio 450 (step S418).
  • for example, the search device 40 outputs voice data requesting inquiry information, such as "Comprehensive inquiry, isn't it? If there is an error in the inquiry type, please correct it. If it is correct, please give us the name, date of birth, and so on of the other party.", to the radio 450.
  • when the radio 450 receives the voice data requesting the inquiry information, it outputs voice data including the inquiry information input by voice in response to the voice data to the search device 40 (step S419). For example, the radio 450 outputs voice data including inquiry information such as "Mr. Shibataro, January 1, 1990, registered domicile A-ku, Tokyo" to the search device 40.
  • when the search device 40 receives the voice data including the inquiry information, the search device 40 extracts the character strings corresponding to the search items of the inquiry information from the text data based on the acquired voice data. The search device 40 generates search terms from the extracted character strings based on the distance between pronunciations (step S420).
  • FIG. 24 is a sequence diagram showing an example of the flow from the generation of the search term by the search device 40 to the output of the collation result.
  • the sequence diagram of FIG. 24 relates to a process following the generation of the search term in step S420 of FIG.
  • when the search device 40 generates the search terms (step S420), the search device 40 outputs voice data including confirmation content for the search terms to the radio 450 (step S421). For example, the search device 40 outputs voice data including confirmation content such as "Searching for Kibataro, A-ku, Tokyo. If there is an error, please correct the erroneous part." to the radio 450.
  • the search device 40 combines the generated search terms for each search item to generate a plurality of search queries (step S422).
  • the search device 40 uses at least one of the generated plurality of search queries to search, among the DBs included in the DB group 400, the DB in which the collation data of the inquiry type being inquired is stored (step S423).
  • when the radio 450 receives the voice data including the confirmation content of the search terms, it outputs voice data including a response input by voice in response to the voice data to the search device 40 (step S424). For example, the radio 450 outputs voice data including a response such as "There is no doubt" to the search device 40. If there is no error in the confirmation content of the search terms, step S424 may be omitted.
  • the search device 40 acquires the search result from the DB (step S425).
  • the search device 40 outputs the collation result according to the search result to the radio 450 (step S426). For example, if the search hits, the search device 40 outputs voice data including a collation result such as "Mr. Shibataro, born on January 1, 1990, whose registered domicile is A-ku, Tokyo, corresponds to XX." to the radio 450.
  • for example, if the search does not hit, the search device 40 outputs voice data including a collation result such as "Mr. Shibataro, born on January 1, 1990, whose registered domicile is A-ku, Tokyo, does not correspond to XX." to the radio 450.
  • FIG. 25 is a sequence diagram showing another example of the flow from the generation of the search term by the search device 40 to the output of the collation result.
  • FIG. 25 is an example in which the search term confirmation and the DB search using a plurality of search queries are performed in parallel.
  • the search term confirmation and the DB search using a plurality of search queries may be performed at the same timing, or may be performed at slightly different timings.
  • the sequence diagram of FIG. 25 relates to a process following the generation of the search term in step S420 of FIG.
  • when the search device 40 generates the search terms (step S420), the search device 40 combines the generated search terms for each search item to generate a plurality of search queries (step S431).
  • the search device 40 outputs voice data including the confirmation content of the search term to the radio 450 (step S432).
  • for example, the search device 40 outputs voice data including confirmation content of the search terms, such as "Searching for Kibataro, A-ku, Tokyo. If there is an error, please correct the erroneous part.", to the radio 450.
  • Step S433 may be performed in parallel with step S432, or may be started before step S432.
  • the search device 40 searches, among the DBs included in the DB group 400, the DB in which the collation data of the inquiry type being inquired is stored, using the plurality of generated search queries (step S433). At this time, the search device 40 searches the DB using all of the generated search queries. When the collation data of the inquiry type being inquired is stored in the DB, the search device 40 acquires the search results from the DB (step S434).
  • when the radio 450 receives the voice data including the confirmation content of the search terms from the search device 40, the radio 450 outputs voice data including a response (confirmation result) input by voice in response to the voice data to the search device 40 (step S435). For example, if there is an error in a search term, voice information including a correction of the search term, such as "The name is Shibataro. Shinbun no shi.", is input to the radio 450. The radio 450 outputs voice data corresponding to the input voice information to the search device 40. If there is no error in the confirmation content of the search terms, step S435 may be omitted.
  • the search device 40 receives voice data including correction of the search term from the radio 450.
  • the search device 40 outputs the collation result corresponding to the search result hit by the search using the search query composed of the correct search terms to the radio 450 (step S436). For example, if the search hits, the search device 40 outputs voice data including a collation result such as "Mr. Shibataro, born on January 1, 1990, whose registered domicile is A-ku, Tokyo, corresponds to XX." to the radio 450. For example, if the search does not hit, the search device 40 outputs voice data including a collation result such as "Mr. Shibataro, born on January 1, 1990, whose registered domicile is A-ku, Tokyo, does not correspond to XX." to the radio 450.
  • according to the example of FIG. 25, the search result obtained with the search query composed of the correct search terms can be selected according to the confirmation result of the search terms, so the search efficiency may be improved.
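The parallel flow of FIG. 25 can be sketched with a thread pool: the DB search over all generated queries runs while the confirmation result is awaited, and the confirmed term selects which finished result to return. This is a hedged illustration; the DB lookup and the confirmation are simulated stand-ins.

```python
# Hypothetical sketch of FIG. 25: search all queries in parallel with the
# search-term confirmation, then pick the result for the confirmed term.
from concurrent.futures import ThreadPoolExecutor

def search_db(query):
    """Stand-in for a DB lookup; returns (query, hit record or None)."""
    return query, {"Sato": "record A", "Kato": "record B"}.get(query[0])

queries = [("Sato", "A-ku"), ("Kato", "A-ku")]
with ThreadPoolExecutor() as pool:
    futures = {q: pool.submit(search_db, q) for q in queries}
    confirmed_name = "Sato"  # confirmation result arriving in parallel
    # select the result of the query built from the confirmed search term
    for q, fut in futures.items():
        if q[0] == confirmed_name:
            print(fut.result()[1])
```

By the time the confirmation arrives, the searches for all candidate queries have typically completed, so only the selection step remains.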
  • FIG. 26 is a sequence diagram showing still another example of the flow from the generation of the search term by the search device 40 to the output of the collation result.
  • the sequence diagram of FIG. 26 relates to a process following the generation of the search term in step S420 of FIG.
  • when the search device 40 generates the search terms (step S420), the search device 40 outputs voice data including confirmation content for the search terms to the radio 450 (step S441). For example, the search device 40 outputs voice data including confirmation content such as "Searching for Kibataro, A-ku, Tokyo. If there is an error, please correct the erroneous part." to the radio 450.
  • the search device 40 combines the generated search terms for each search item to generate a plurality of search queries (step S442).
  • the search device 40 uses at least one of the generated plurality of search queries to search, among the DBs included in the DB group 400, the DB in which the collation data of the inquiry type being inquired is stored (step S443).
  • when the radio 450 receives the voice data including the confirmation content of the search terms from the search device 40, the radio 450 outputs voice data including a response (confirmation result) input by voice in response to the voice data to the search device 40 (step S444). For example, if there is an error in a search term, voice information including a correction such as "The name is Shibataro. Shinbun no shi." is input to the radio 450. The radio 450 outputs voice data corresponding to the input voice information to the search device 40.
  • when there is an error in a search term, the search device 40 receives voice data including the correction from the radio 450.
  • the search device 40 selects another search term based on the correction, and outputs voice data including the confirmation content of the search term to the radio 450 (step S445).
  • for example, the search device 40 outputs voice data including confirmation content of the search term, such as "Searching for Shibataro. If there is an error, please correct it.", to the radio 450.
  • the search device 40 searches, among the DBs included in the DB group 400, the DB in which the collation data of the inquiry type being inquired is stored, using the search query composed of the search terms selected based on the correction (step S446).
  • when the radio 450 receives the voice data including the reconfirmation content of the search term from the search device 40, the radio 450 outputs voice data including a response (confirmation result) input by voice in response to the voice data to the search device 40 (step S447). For example, if the search term is correct, voice information such as "There is no doubt" is input to the radio 450. The radio 450 outputs voice data corresponding to the input voice information to the search device 40.
  • the search device 40 receives voice data indicating that the search term is correct from the radio 450.
  • the search device 40 acquires the search results hit by the search using the search query composed of the correct search terms from the DB (step S448).
  • the search device 40 outputs the collation result corresponding to the search result hit by the search using the search query composed of the correct search terms to the radio 450 (step S449). For example, if the search hits, the search device 40 outputs voice data including a collation result such as "Mr. Shibataro, born on January 1, 1990, whose registered domicile is A-ku, Tokyo, corresponds to XX." to the radio 450.
  • for example, if the search does not hit, the search device 40 outputs voice data including a collation result such as "Mr. Shibataro, born on January 1, 1990, whose registered domicile is A-ku, Tokyo, does not correspond to XX." to the radio 450.
  • according to the example of FIG. 26, the DB can be searched using a search query composed of search terms confirmed to be correct, so the search accuracy is improved. Further, according to the example of FIG. 26, when a search term is wrong, the search using the search query including the wrong search term can be stopped, so the search efficiency is improved.
  • FIG. 27 is a block diagram showing an example of the configuration of the search device 50 of the present embodiment.
  • the search device 50 includes a conversion unit 52 and a search unit 53.
  • the conversion unit 52 converts the input voice data into text data by voice recognition.
  • the search unit 53 extracts the character string corresponding to the search item from the text data.
  • the search unit 53 generates a search term related to the character string for each search item based on the distance from the extracted character string.
  • the search unit 53 generates a plurality of search queries by combining the search terms generated for each search item.
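The pipeline formed by the conversion unit 52 and the search unit 53 can be sketched structurally as follows. The class and the recognizer/term-generator callbacks are hypothetical stand-ins introduced for illustration; only the combination step (one term per search item, combined into each query) mirrors the text above.

```python
# Structural sketch of search device 50, assuming a pluggable speech
# recognizer and candidate generator; every body here is illustrative.
from itertools import product

class SearchDevice50:
    def __init__(self, recognizer, term_generator):
        self.recognizer = recognizer          # stands in for conversion unit 52
        self.term_generator = term_generator  # per-item candidate expansion

    def convert(self, voice_data):
        # conversion unit 52: voice data -> text data by voice recognition
        return self.recognizer(voice_data)

    def build_queries(self, extracted):
        # search unit 53: expand each extracted character string into related
        # search terms, then combine one term per item into each search query
        per_item = {item: self.term_generator(item, s)
                    for item, s in extracted.items()}
        keys = list(per_item)
        return [dict(zip(keys, combo))
                for combo in product(*(per_item[k] for k in keys))]

dev = SearchDevice50(
    recognizer=lambda audio: "surname Sato, born January 1, 1990",
    term_generator=lambda item, s: [s, "Kato"] if item == "surname" else [s],
)
queries = dev.build_queries({"surname": "Sato", "birth": "January 1, 1990"})
# two queries: one with the recognized surname, one with the nearby term
```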
  • the information processing device 90 of FIG. 28 is a configuration example for executing the processing of the search device of each embodiment, and does not limit the scope of the present invention.
  • the information processing device 90 includes a processor 91, a main storage device 92, an auxiliary storage device 93, an input / output interface 95, and a communication interface 96.
  • the interface is abbreviated as I / F (Interface).
  • the processor 91, the main storage device 92, the auxiliary storage device 93, the input / output interface 95, and the communication interface 96 are connected to each other via the bus 98 so as to be capable of data communication. Further, the processor 91, the main storage device 92, the auxiliary storage device 93, and the input / output interface 95 are connected to a network such as the Internet or an intranet via the communication interface 96.
  • the processor 91 expands the program stored in the auxiliary storage device 93 or the like to the main storage device 92, and executes the expanded program.
  • the program executed by the processor 91 may be a software program installed in the information processing apparatus 90.
  • the processor 91 executes the process by the search device according to the present embodiment.
  • the main storage device 92 has an area in which the program is expanded.
  • the main storage device 92 may be a volatile memory such as a DRAM (Dynamic Random Access Memory). Further, a non-volatile memory such as an MRAM (Magnetoresistive Random Access Memory) may be configured or added as the main storage device 92.
  • the auxiliary storage device 93 stores various data.
  • the auxiliary storage device 93 is composed of a local disk such as a hard disk or a flash memory. It is also possible to store various data in the main storage device 92 and omit the auxiliary storage device 93.
  • the input / output interface 95 is an interface for connecting the information processing device 90 and peripheral devices.
  • the communication interface 96 is an interface for connecting to an external system or device through a network such as the Internet or an intranet based on a standard or a specification.
  • the input / output interface 95 and the communication interface 96 may be shared as an interface for connecting to an external device.
  • the information processing device 90 may be configured to connect an input device such as a keyboard, a mouse, or a touch panel, if necessary. These input devices are used to input information and settings. When the touch panel is used as an input device, the display screen of the display device may also serve as the interface of the input device. Data communication between the processor 91 and the input device may be mediated by the input / output interface 95.
  • the information processing apparatus 90 may be equipped with a display device for displaying information.
  • when a display device is used, it is preferable that the information processing device 90 be provided with a display control device (not shown) for controlling the display of the display device.
  • the display device may be connected to the information processing device 90 via the input / output interface 95.
  • the above is an example of the hardware configuration for enabling the search device according to each embodiment of the present invention.
  • the hardware configuration of FIG. 28 is an example of the hardware configuration for executing the arithmetic processing of the search device according to each embodiment, and does not limit the scope of the present invention.
  • the scope of the present invention also includes a program for causing a computer to execute a process related to the search device according to each embodiment.
  • a recording medium on which a program according to each embodiment is recorded is also included in the scope of the present invention.
  • the recording medium can be realized by an optical recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc).
  • the recording medium may be realized by a semiconductor recording medium such as a USB (Universal Serial Bus) memory or an SD (Secure Digital) card, a magnetic recording medium such as a flexible disk, or another recording medium.
  • the components of the search device of each embodiment can be arbitrarily combined. Further, the components of the search device of each embodiment may be realized by software or by a circuit.

Abstract

A search device provided with: a conversion unit that converts input voice data into text data by means of voice recognition in order to generate, on the basis of given voice data, a plurality of search queries configured from search terms corresponding to search items; and a search unit that extracts a character string corresponding to a search item from the text data, generates, for each search item, a search term related to the character string, on the basis of a distance from the extracted character string and, for each search item, combines the generated search terms, thereby generating a plurality of search queries.

Description

Search device, search method, and recording medium
The present invention relates to a search device or the like that generates a search query using text data converted from voice data.
Normally, in an identity inquiry such as during questioning conducted by a police officer, the officer contacts the main office by voice call via wireless communication to confirm the identity of the person being questioned. In such inquiries, applying text analysis technology that converts voice data into text data may make it possible to automate immediate inquiries. For example, if a database in which inquiry information is stored can be searched using text data converted from voice data based on a speaker's voice, immediate inquiries can be automated. In reality, however, pronunciation and accent vary from speaker to speaker, so erroneous conversion can occur when the voice data is converted into text data, and an inquiry may then be made based on the erroneous text data.
Patent Document 1 discloses a sequence signal search device for efficiently processing a plurality of search term candidates from a sequence signal, such as voice data, that contains errors. The device of Patent Document 1 plots the syllable sequence of the voice recognition result of the voice data and the syllable sequence of a search term on a plane based on the distance (similarity) between syllables. By detecting straight lines on that plane, the device realizes a search of the voice data by the search term.
Patent Document 2 discloses a navigation device that searches for place names and road names based on voice recognition. The device of Patent Document 2 accepts the leading character string included in a search target character string and narrows down the candidates for the search target character string. The device then extracts the search target character string from the narrowed-down candidates based on voice data input thereafter.
Japanese Unexamined Patent Publication No. 2011-128903; Japanese Unexamined Patent Publication No. 2010-038751
According to the method of Patent Document 1, it is possible to extract a plurality of search term candidates based on a voice recognition result. However, the method of Patent Document 1 is difficult to apply when the voice data is not known before the search, and is therefore unsuitable for immediate inquiries in which search terms must be extracted from text data based on arbitrary voice data.
According to the method of Patent Document 2, the certainty of extracting the search target character string can be improved by narrowing down the candidates in advance. However, the method of Patent Document 2 requires the leading character string included in the search target character string to be input in advance. It therefore cannot extract a plurality of search terms from text data composed of a plurality of search terms.
An object of the present invention is to provide a search device or the like capable of generating, based on arbitrary voice data, a plurality of search queries composed of search terms corresponding to search items.
A search device according to one aspect of the present invention includes: a conversion unit that converts input voice data into text data by voice recognition; and a search unit that extracts a character string corresponding to a search item from the text data, generates, for each search item, search terms related to the character string based on the distance from the extracted character string, and combines the search terms generated for each search item to generate a plurality of search queries.
In a search method according to one aspect of the present invention, a computer converts input voice data into text data by voice recognition, extracts a character string corresponding to a search item from the text data, generates, for each search item, search terms related to the character string based on the distance from the extracted character string, and combines the search terms generated for each search item to generate a plurality of search queries.
A program according to one aspect of the present invention causes a computer to execute: a process of converting input voice data into text data by voice recognition; a process of extracting a character string corresponding to a search item from the text data; a process of generating, for each search item, search terms related to the character string based on the distance from the extracted character string; and a process of combining the search terms generated for each search item to generate a plurality of search queries.
According to the present invention, it is possible to provide a search device or the like capable of generating, based on arbitrary voice data, a plurality of search queries composed of search terms corresponding to search items.
FIG. 1 is a block diagram showing an example of the configuration of the search device according to the first embodiment.
FIG. 2 is an example of a table in which search terms generated by the search device according to the first embodiment are ranked according to their scores.
FIG. 3 is a conceptual diagram showing an example in which the search device according to the first embodiment generates a plurality of search queries based on input voice data.
FIG. 4 is a conceptual diagram showing an example of collation data stored in a database searched by the search device according to the first embodiment.
FIG. 5 is a conceptual diagram showing an example in which the search device according to the first embodiment converts text data corresponding to a search result into voice data and outputs it.
FIG. 6 is a conceptual diagram showing an example in which the search device according to the first embodiment converts text data for confirming the correctness of a search term into voice data and outputs it.
FIG. 7 is a conceptual diagram showing an example in which voice data of a reply to voice data transmitted from the search device according to the first embodiment is input to the search device.
FIG. 8 is a flowchart showing an example of search query generation by the search device according to the first embodiment.
FIG. 9 is a flowchart showing an example of confirmation of the correctness of search terms by the search device according to the first embodiment.
FIG. 10 is a flowchart showing an example of asking back about a search term by the search device according to the first embodiment.
FIG. 11 is a flowchart showing an example of a database search by the search device according to the first embodiment.
FIG. 12 is a block diagram showing an example of the configuration of the search device according to the second embodiment.
FIG. 13 is a conceptual diagram for explaining the accuracy of a search query calculated by the search device according to the second embodiment.
FIG. 14 is a flowchart showing an example of search query generation by the search device according to the second embodiment.
FIG. 15 is a flowchart showing an example of a database search by the search device according to the second embodiment.
FIG. 16 is a block diagram showing an example of the configuration of the search device according to the third embodiment.
FIG. 17 is an example of a table included in a dictionary used by the search device according to the third embodiment to generate search terms.
FIG. 18 is another example of a table included in a dictionary used by the search device according to the third embodiment to generate search terms.
FIG. 19 is a conceptual diagram showing an example in which the search device according to the third embodiment converts text data for confirming the correctness of a search term into voice data and outputs it.
FIG. 20 is a conceptual diagram showing an example in which voice data of a reply to voice data transmitted from the search device according to the third embodiment is input to the search device.
FIG. 21 is a flowchart showing an example of search query generation by the search device according to the third embodiment.
FIG. 22 is a block diagram showing an example of the configuration of the search device according to the fourth embodiment.
FIG. 23 is a sequence diagram showing an example of the flow from connection of a radio to the search device according to the fourth embodiment to generation of search terms by the search device.
FIG. 24 is a sequence diagram showing an example of the flow from generation of search terms by the search device according to the fourth embodiment to output of a collation result.
FIG. 25 is a sequence diagram showing another example of the flow from generation of search terms by the search device according to the fourth embodiment to output of a collation result.
FIG. 26 is a sequence diagram showing yet another example of the flow from generation of search terms by the search device according to the fourth embodiment to output of a collation result.
FIG. 27 is a block diagram showing an example of the configuration of the search device according to the fifth embodiment.
FIG. 28 is a block diagram showing an example of a hardware configuration that realizes the search device according to each embodiment.
Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings. Although the embodiments described below include technically preferable limitations for carrying out the present invention, they do not limit the scope of the invention. In all drawings used in the description of the following embodiments, the same reference numerals are given to similar parts unless there is a particular reason otherwise. In the following embodiments, repeated descriptions of similar configurations and operations may be omitted. The directions of the arrows in the drawings show an example and do not limit the directions of signals between blocks.
(First Embodiment)
 First, the search device according to the first embodiment will be described with reference to the drawings. The search device of the present embodiment converts voice data into text data using voice recognition technology and recognizes at least one character string from the converted text data. The search device of the present embodiment generates, based on the distance between pronunciations, a plurality of search terms related to the at least one recognized character string, and generates a plurality of search queries (also referred to as search patterns) including those search terms. In the following description, characters and symbols such as hiragana, katakana, kanji, and the alphabet may be used in place of the phonemes used in voice recognition.
(Configuration)
 FIG. 1 is a block diagram showing an example of the configuration of the search device 10 of the present embodiment. The search device 10 includes an acquisition unit 11, a first conversion unit 12, a search unit 13, a second conversion unit 18, and an output unit 19. The acquisition unit 11 and the output unit 19 constitute an input/output unit 110. The first conversion unit 12 and the second conversion unit 18 constitute a conversion unit 120. FIG. 1 also shows the database (DB 100) connected to the search device 10. The DB 100 is connected to the search unit 13 via a network such as the Internet or an intranet. A plurality of pieces of collation data are stored in the DB 100.
The search device 10 transmits and receives voice data to and from a radio (not shown). For example, the radio has a microphone and a speaker, and converts voice input via the microphone into an electric signal (voice data). For example, the radio transmits a radio signal including the voice data by wireless communication in a specific frequency band. The radio signal transmitted from the radio is converted into an electric signal via an antenna, an amplifier, a demodulator, and the like (not shown), and is input to the search device 10 via a network such as the Internet or an intranet. For example, the search device 10 outputs voice data toward the radio. In the following description, although the search device 10 and the radio are not directly connected, they are described as exchanging voice data with each other.
The acquisition unit 11 acquires voice data from the radio and outputs the acquired voice data to the first conversion unit 12.
The first conversion unit 12 converts voice data into text data by voice recognition. For example, the first conversion unit 12 converts voice data into text data using acoustic-model and language-model algorithms, using methods such as statistical methods or dynamic time warping, or using methods such as deep learning or hidden Markov models. For example, the first conversion unit 12 converts voice data into text data using a voice recognition dictionary including an acoustic model, a language model, a pronunciation dictionary, and the like. For example, the first conversion unit 12 calculates a voice recognition score (also simply called a score) by text analysis for each character string (word) included in the text data, and converts the voice data into text data based on the scores. The voice recognition methods mentioned here are examples and do not limit the conversion method from voice data to text data by the first conversion unit 12.
The search unit 13 recognizes at least one character string for each search item from the text data converted by the first conversion unit 12. For example, when the search items are name, date of birth, and registered domicile, the search unit 13 detects character strings that can correspond to those search items. For example, based on a character string recognized from the text data as corresponding to a certain search item, the search unit 13 treats character strings likely to appear before and after it as candidates for the other search items. For example, based on the voice recognition score given to each character string (word) extracted from the text data, the search unit 13 selects at least one character string as a search term from among the search term candidates for each search item. For example, the search unit 13 selects the character string with the highest score among the candidates for each search item; this highest-scoring candidate corresponds to the character string of the recognition result.
The search unit 13 generates search terms from the recognized character string based on the distance between pronunciations. The distance between pronunciations is the distance between two character strings based on their pronunciation. For example, the search unit 13 compares the phoneme sequences constituting two character strings and takes the number of differing phonemes as the distance between pronunciations. In the present embodiment, the distance between pronunciations is defined taking into account the order in which phonemes appear in the character string. For example, the search unit 13 generates, as search terms, character strings whose distance between pronunciations from the recognized character string is small. Since the distance between a recognized character string and itself is 0, the recognized character string itself is also generated as a search term.
Some examples of the distance between pronunciations are described below. These examples do not limit the distance between pronunciations used by the search unit 13 of the present embodiment when generating search terms.
First, the distance between the pronunciations of Sato and Kato will be described. The phonemes of Sato are "s", "a", "t", and "o". The phonemes of Kato are "k", "a", "t", and "o". Sato and Kato differ only in the first phoneme. That is, since there is one differing phoneme between Sato and Kato, the distance between pronunciations is 1.
Next, the distance between the pronunciations of Sato and Saito will be described. The phonemes of Sato are "s", "a", "t", and "o". The phonemes of Saito are "s", "a", "i", "t", and "o". Sato and Saito differ in that Saito has one more phoneme. That is, since there is one excess phoneme, the distance between pronunciations is 1.
Next, the distance between the pronunciations of Sato and Suzuki will be described. The phonemes of Sato are "s", "a", "t", and "o". The phonemes of Suzuki are "s", "u", "z", "u", "k", and "i". Between Sato and Suzuki there are three differing phonemes and two excess phonemes, so the distance between pronunciations is 5.
The search unit 13 may also generate search terms from the recognized character string using, as the distance between pronunciations, a distance defined by the acoustic distance between two phonemes (also called the inter-phoneme distance). That is, the distance between pronunciations may be an inter-phoneme distance defined in advance between arbitrary phonemes. For example, the inter-phoneme distance pd(p1, p2) between two phonemes (p1, p2) is defined in advance, such as pd(s, s) = 0.0, pd(s, k) = 1.2, pd(o, o) = 0.1, and pd('', k) = 4.0. For example, since Sato (sato) and Kato (kato) differ in the first phoneme (s, k), the distance between pronunciations D corresponds to the inter-phoneme distance pd(s, k) = 1.2.
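The pronunciation distance described above behaves like an edit distance over phoneme sequences, optionally weighted by a predefined inter-phoneme distance table pd(p1, p2). The sketch below is one illustrative formulation under that assumption, not the patented algorithm; '' stands for an insertion or deletion, and unit costs reproduce the worked examples in the text (Sato/Kato = 1, Sato/Saito = 1, Sato/Suzuki = 5).

```python
# Sketch: pronunciation distance as an edit distance over phoneme lists,
# with an optional inter-phoneme cost table pd (keys are phoneme pairs;
# '' marks insertion/deletion). Unweighted, every edit costs 1.
def pronunciation_distance(a, b, pd=None):
    def cost(x, y):
        if pd is None:
            return 0 if x == y else 1
        # fall back to unit/zero cost for pairs missing from the table
        return pd.get((x, y), pd.get((y, x), 0 if x == y else 1))

    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + cost(a[i - 1], '')   # deletions
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + cost('', b[j - 1])   # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + cost(a[i - 1], ''),
                          d[i][j - 1] + cost('', b[j - 1]),
                          d[i - 1][j - 1] + cost(a[i - 1], b[j - 1]))
    return d[m][n]

d1 = pronunciation_distance(list("sato"), list("kato"))    # Sato vs Kato
d2 = pronunciation_distance(list("sato"), list("saito"))   # Sato vs Saito
d3 = pronunciation_distance(list("sato"), list("suzuki"))  # Sato vs Suzuki
```

Because the dynamic program aligns the sequences position by position, the order in which phonemes appear is taken into account, as the embodiment requires.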
The search unit 13 generates a plurality of search queries using the generated search terms. For example, when the search items are name, date of birth, and registered domicile, the search unit 13 generates search queries that combine the search terms for each of those search items. The search items may include search items other than name, date of birth, and address.
The search unit 13 searches the DB 100 using at least one of the generated search queries. For example, the search unit 13 searches the DB 100 using those generated search queries that satisfy a predetermined criterion. The search unit 13 may also search the DB 100 using all of the generated search queries.
The DB 100 is constructed in association with the type of collation target (also called the inquiry type). The DB 100 stores a plurality of pieces of data (also called collation data) including the search items for searching for the collation target. For example, the collation data stored in the DB 100 is searched using one of several search items as a key. If at least one search item of the retrieved collation data matches, the search has hit; in the present embodiment, a search is said to have hit when all the search items match. For example, a DB 100 is constructed for each inquiry type.
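The hit criterion used in the present embodiment (a record hits only when all search items of the query match) can be sketched as follows; the record fields and the data are illustrative, not taken from any actual collation database.

```python
# Toy sketch of the hit criterion: a collation record hits only when
# every search item contained in the query matches the record.
def is_hit(record, query):
    """True when all search items of the query match the record."""
    return all(record.get(item) == term for item, term in query.items())

db = [
    {"name": "Sato Taro", "birth": "1990-01-01", "domicile": "A-ku, Tokyo"},
    {"name": "Kato Taro", "birth": "1990-07-01", "domicile": "B-ku, Tokyo"},
]
query = {"name": "Kato Taro", "birth": "1990-07-01"}
hits = [r for r in db if is_hit(r, query)]  # only the second record hits
```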
For example, the search unit 13 selects at least one search query according to the score of the key search term included in the search query. For example, the search unit 13 inputs the plurality of search terms generated from the character string of the voice recognition result into a voice recognition engine that outputs a score according to the recognition result, and ranks the search terms according to the output scores. The search unit 13 then searches the DB 100 using the selected search query. For example, the search unit 13 selects the search query whose key search term has the highest score and searches the DB 100 using that query. FIG. 2 is an example of a table (table 131) in which the search terms for the surname included in the name, which is a search item, are ranked according to their scores. For example, the search unit 13 selects a search query according to the score of the surname included in the name in the search query.
 For example, the search unit 13 generates, for a plurality of search terms, search terms to which scores based on the voice recognition result are assigned. For example, suppose that a certain voice recognition run yields the recognition results "Sato (0.41)" and "Kato (0.65)" for the surname (the scores are given in parentheses), the recognition results "January 1, 1990 (0.56)" and "July 1, 1990 (0.92)" for the date of birth, and the recognition result "A-ku, Tokyo (0.43)" for the registered domicile. In that case, the search unit 13 generates, for example, the following text data.
"The surname is Sato (0.41), the date of birth is January 1, 1990 (0.56), and the registered domicile is A-ku, Tokyo (0.43)."
"The surname is Kato (0.65), the date of birth is January 1, 1990 (0.56), and the registered domicile is A-ku, Tokyo (0.43)."
"The surname is Sato (0.41), the date of birth is July 1, 1990 (0.92), and the registered domicile is A-ku, Tokyo (0.43)."
"The surname is Kato (0.65), the date of birth is July 1, 1990 (0.92), and the registered domicile is A-ku, Tokyo (0.43)."
For example, the above text data is converted into voice data by the second conversion unit 18 and output from the output unit 19 to a radio (not shown). The text data converted into voice data may be output in an order according to the score of any one of the search items, or in an order according to the sum of the scores of all the search items.
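The combination of scored candidates into response sentences, ordered by the sum of the item scores, can be sketched as follows. This is an illustrative Python sketch; the candidate values and scores mirror the example above, while the sentence template is an assumption.

```python
# Illustrative sketch: combine scored recognition candidates per search item
# into response sentences and order them by the sum of the item scores.
# Candidate values and scores follow the example above; the sentence
# template is hypothetical.
from itertools import product

SURNAMES = [("Sato", 0.41), ("Kato", 0.65)]
BIRTHS = [("January 1, 1990", 0.56), ("July 1, 1990", 0.92)]
DOMICILES = [("A-ku, Tokyo", 0.43)]

def scored_sentences():
    out = []
    for (s, ss), (b, bs), (d, ds) in product(SURNAMES, BIRTHS, DOMICILES):
        text = (f"The surname is {s} ({ss}), the date of birth is {b} ({bs}), "
                f"and the registered domicile is {d} ({ds}).")
        out.append((ss + bs + ds, text))
    # Highest combined score first.
    return sorted(out, reverse=True)
```

Under these scores, the sentence combining "Kato (0.65)" and "July 1, 1990 (0.92)" has the largest total and would be output first.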
 The second conversion unit 18 acquires text data from the search unit 13 and converts the acquired text data into voice data. For example, the second conversion unit 18 converts the text data into voice data using a rule-based synthesis method such as formant speech synthesis or articulatory speech synthesis. For example, the second conversion unit 18 converts the text data into voice data using a concatenative (waveform-connection) speech synthesis method such as unit selection synthesis, diphone synthesis, or domain-limited synthesis. For example, the second conversion unit 18 converts the text data into voice data using a statistical parametric speech synthesis method such as neural network speech synthesis or hidden Markov model speech synthesis. The speech synthesis methods listed here are examples and do not limit the method used by the second conversion unit 18. Further, if the first conversion unit 12 can convert text data into voice data, the second conversion unit 18 may be omitted.
 The output unit 19 outputs the voice data converted by the second conversion unit 18. The voice data output from the output unit 19 is sent to the radio and reproduced as voice by the radio. For example, the output unit 19 outputs voice data based on the collation data hit by the search, in an order according to the accuracy of the corresponding search queries.
 FIG. 3 is a conceptual diagram showing an example in which the search device 10 generates a plurality of search queries using voice data acquired from a radio. In the example of FIG. 3, the search device 10 acquires the voice data "Mr. Shibataro, born on January 1, 1990, registered domicile A-ku, Tokyo." For example, the search device 10 recognizes, by voice recognition, the character strings "Shibataro," "January 1, 1990," and "A-ku, Tokyo" for the respective search items from the acquired voice data. Based on the inter-pronunciation distance, the search device 10 generates a plurality of search terms related to each recognized character string. The search device 10 then generates search queries by combining the generated search terms, for example (Shibataro, January 1, 1990, A-ku, Tokyo), (Shibataro, July 1, 1990, A-ku, Tokyo), and so on.
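The combination of per-item candidate terms into queries, as in FIG. 3, amounts to a Cartesian product and can be sketched as follows. This is an illustrative Python sketch; the candidate lists are assumptions drawn from the example.

```python
# Illustrative sketch: build every search query as one combination of the
# per-item candidate search terms, as in FIG. 3. Candidate lists are
# hypothetical examples.
from itertools import product

def build_queries(candidates_per_item):
    """candidates_per_item: dict mapping a search item to its candidates."""
    items = list(candidates_per_item)
    return [dict(zip(items, combo))
            for combo in product(*candidates_per_item.values())]
```

With two name candidates, two date-of-birth candidates, and one domicile candidate, this yields 2 x 2 x 1 = 4 queries.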
 FIG. 4 shows an example of the collation data (collation table 101) stored in the DB 100. The collation table 101 stores collation data including search items (name, date of birth, registered domicile, etc.). For example, the collation table 101 stores the collation data of a person named "Shibataro" whose date of birth is January 1, 1990 and whose registered domicile is A-ku, Tokyo. For example, when the DB 100 is searched using the search queries generated in the example of FIG. 3, this person "Shibataro" is hit.
 FIG. 5 shows an example in which the search device 10 converts text data corresponding to the search result from the DB 100 into voice data and outputs it. In the example of FIG. 5, the search device 10 acquires the search result "Shibataro, January 1, 1990, A-ku, Tokyo, XX" from the DB 100, where "XX" is the inquiry type of the person hit by the search. For example, the search device 10 generates from the acquired search result text data such as "Mr. Shibataro, registered domicile A-ku, Tokyo, born on January 1, 1990, corresponds to XX." The search device 10 converts the generated text data into voice data and outputs the converted voice data to the radio. The voice data output from the search device 10 is reproduced as voice by the radio (not shown).
 The search device 10 may also generate text data for inquiring whether a search term is correct, using at least some of the search terms related to the character strings recognized in the text data.
 FIG. 6 shows an example of converting text data for confirming the correctness of search terms into voice data and outputting it. In the example of FIG. 6, the search device 10 outputs voice data based on text data that presents the search term with the highest score and asks again for the search term that could not be recognized. In the example of FIG. 6, the search device 10 outputs voice data saying "Searching for Kibataro, registered domicile A-ku, Tokyo. Please state the date of birth again." For example, the voice data is output to the radio, where it is reproduced as voice for confirming the correctness of the search terms.
 FIG. 7 shows an example in which, following the example of FIG. 6, reply voice data is returned from the radio (not shown) in response to the voice data transmitted by the search device 10. In the example of FIG. 7, the search device 10 acquires the reply voice data "The name is Shibataro, and the date of birth is January 1, 1990." For example, by voice recognition, the search device 10 obtains the recognition results "Shibataro" and "January 1, 1990" from the acquired voice data. The search device 10 generates search terms according to the recognition results, generates a search query including the generated search terms, and generates reply text data. If the correctness of the search terms can be confirmed with the sender of the voice data before or while a search query is generated, the generation of erroneous search queries can be reduced.
 (Operation)
 Next, the operation of the search device 10 will be described with reference to the drawings. In the following, the generation of search queries, the confirmation of the correctness of search terms, the re-asking of missing search items, the search using the generated search queries, and so on are described individually. The following operations are examples and do not limit the operation of the search device 10.
 [Search query generation]
 FIG. 8 is a flowchart for explaining an example of search query generation by the search device 10. In the description along the flowchart of FIG. 8, the search device 10 is treated as the acting subject.
 In FIG. 8, the search device 10 first acquires voice data (step S111). For example, the search device 10 acquires voice data output from a radio (not shown).
 Next, the search device 10 converts the acquired voice data into text data by voice recognition (step S112).
 Next, the search device 10 extracts the character strings corresponding to the search items from the text data (step S113).
 Next, the search device 10 generates, for each search item, search terms related to the extracted character string based on the inter-pronunciation distance (step S114).
 Next, the search device 10 generates a plurality of search queries by combining the search terms of the respective search items (step S115).
 [Correctness confirmation]
 FIG. 9 is a flowchart for explaining an example in which the search device 10 confirms the correctness of a search term. The flowchart of FIG. 9 follows step S114 of the flowchart of FIG. 8. In the example of FIG. 9, it is assumed that scores or ranks have been assigned to the search terms. In the description along the flowchart of FIG. 9, the search device 10 is treated as the acting subject.
 In FIG. 9, the search device 10 first generates text data for confirming the correctness of the search term with the highest score (step S121).
 Next, the search device 10 converts the generated text data into voice data and outputs the converted voice data (step S122). For example, the voice data output from the search device 10 is reproduced as voice by a radio (not shown).
 Next, the search device 10 acquires the reply voice data and converts it into text data by voice recognition (step S123). For example, the search device 10 acquires the voice data from the radio (not shown). If no reply is obtained within a predetermined period, the search device 10 may retransmit the voice data or may proceed to step S125.
 If the search term is correct (Yes in step S124), the search device 10 generates a plurality of search queries by combining the search terms of the respective search items (step S126).
 If the search term is not correct (No in step S124), the search device 10 generates text data for confirming the correctness of the search term with the next highest score (step S125), and then returns to step S122. The series of processes of steps S122 to S125 is continued until a search term is confirmed to be correct. If no search term can be confirmed to be correct even after the series of processes of steps S122 to S125 has been repeated a predetermined number of times or for a predetermined time, the search device 10 may proceed to step S126 or return to step S113 of FIG. 8.
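The confirmation loop of FIG. 9 can be sketched as follows. This is an illustrative Python sketch; the ask() callback standing in for the radio round trip, and the retry cap, are assumptions.

```python
# Illustrative sketch of the confirmation loop of FIG. 9: propose search
# terms in descending-score order until the far end confirms one, with a
# bounded number of attempts. ask() is a hypothetical stand-in for the
# radio round trip (steps S122-S124).

def confirm_term(ranked_terms, ask, max_tries=3):
    """ranked_terms: search terms sorted by descending score.
    ask(term) -> True if the speaker confirms the term is correct."""
    for term in ranked_terms[:max_tries]:
        if ask(term):
            return term
    return None  # e.g. proceed to query generation or re-extraction anyway
```

Returning None here corresponds to the case where the loop is abandoned after the predetermined number of attempts.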
 [Re-asking]
 FIG. 10 is a flowchart for explaining an example in which the search device 10 asks again for a search item that could not be recognized. In the description along the flowchart of FIG. 10, the search device 10 is treated as the acting subject.
 In FIG. 10, the search device 10 first acquires voice data (step S131). For example, the search device 10 acquires voice data transmitted from a radio (not shown).
 Next, the search device 10 converts the acquired voice data into text data by voice recognition (step S132).
 Next, the search device 10 extracts the character strings corresponding to the search items from the text data (step S133).
 If any search item is missing (Yes in step S134), the search device 10 outputs voice data for asking again for the missing search item (step S135), and then returns to step S131. For example, the voice data output from the search device 10 is reproduced as voice by the radio (not shown).
 If no search item is missing (No in step S134), the search device 10 generates, for each search item, search terms related to the extracted character string based on the inter-pronunciation distance (step S136).
 Next, the search device 10 generates a plurality of search queries by combining the search terms of the respective search items (step S137).
 [Search]
 FIG. 11 is a flowchart for explaining an example in which the search device 10 searches the DB 100 using the generated search queries. In the description along the flowchart of FIG. 11, the search device 10 is treated as the acting subject.
 In FIG. 11, the search device 10 first searches the DB 100 using the generated search queries (step S151).
 If the search hits (Yes in step S152), the search device 10 generates text data including the hit search result (step S153). If the search does not hit (No in step S152), the search device 10 generates text data indicating that no search result was obtained (step S154).
 After step S153 or step S154, the search device 10 converts the generated text data into voice data (step S155).
 Next, the search device 10 outputs the voice data (step S156). For example, the voice data output from the search device 10 is reproduced as voice by a radio (not shown).
 As described above, the search device of the present embodiment includes an acquisition unit, a first conversion unit, a search unit, a second conversion unit, and an output unit. The acquisition unit receives voice data as input. The first conversion unit converts the voice data acquired by the acquisition unit into text data by voice recognition. The search unit extracts the character strings corresponding to the search items from the text data, generates, for each search item, search terms related to the extracted character string based on the distance from that character string, and generates a plurality of search queries by combining the search terms generated for each search item. The search unit searches a database in which collation data including the search items are accumulated, and generates text data according to the search result. The second conversion unit converts the generated text data into voice data. The output unit outputs the voice data converted from the text data.
 In one aspect of the present embodiment, the search unit generates, as a search term, a character string whose inter-pronunciation distance from the extracted character string is small, the inter-pronunciation distance being based on the differences between the phonemes constituting the character string extracted from the text data. For example, the search unit calculates the inter-pronunciation distance using inter-phoneme distances defined in advance between pairs of phonemes.
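The patent does not give a concrete formula for the inter-pronunciation distance. One common realization consistent with the description, assumed here, is a weighted edit distance over phoneme sequences using a predefined inter-phoneme substitution cost table; the toy cost table below is an assumption.

```python
# Illustrative sketch: inter-pronunciation distance as a weighted edit
# distance over phoneme sequences. The cost table is a toy assumption;
# the embodiment only states that inter-phoneme distances are predefined
# between pairs of phonemes.

PHONEME_COST = {("sh", "k"): 1.0, ("b", "d"): 1.0}  # hypothetical costs

def phoneme_cost(a, b):
    if a == b:
        return 0.0
    return PHONEME_COST.get((a, b), PHONEME_COST.get((b, a), 1.0))

def pronunciation_distance(p, q):
    """Dynamic-programming edit distance between phoneme lists p and q."""
    m, n = len(p), len(q)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,          # deletion
                          d[i][j - 1] + 1.0,          # insertion
                          d[i - 1][j - 1] + phoneme_cost(p[i - 1], q[j - 1]))
    return d[m][n]
```

Under this sketch, "Shibataro" and "Kibataro" differ by one substitution (sh -> k), giving a distance of 1, matching the examples in this disclosure.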
 According to the present embodiment, a plurality of search queries composed of search terms corresponding to the search items can be generated based on arbitrary voice data.
 In one aspect of the present embodiment, the search unit assigns scores based on the voice recognition to the search terms and generates text data including the scored search terms. The conversion unit converts the generated text data into voice data, and the voice data converted from the text data is output.
 For example, when a police officer enters accident information or the like using a radio, the only input means is voice, and it is therefore desirable to reduce the frequency of input and of confirmation. In such input, it is also desirable that inquiries about persons and the like be made quickly and accurately based on the information input by voice. According to the present embodiment, a large number of search queries composed of search terms for the search items can be generated based on voice data. Since those search queries are generated according to the distance from the character strings extracted from the voice data, there is a high possibility that they include a search query containing the correct search terms. Furthermore, in the present embodiment, the correctness of the generated search terms can be confirmed, so that search queries based on accurate search terms can be generated.
 (Second embodiment)
 Next, a search device according to the second embodiment will be described with reference to the drawings. The present embodiment differs from the first embodiment in that the generated search queries are ranked according to their accuracy.
 (Configuration)
 FIG. 12 is a block diagram showing an example of the configuration of the search device 20 of the present embodiment. The search device 20 includes an acquisition unit 21, a first conversion unit 22, a search unit 23, a second conversion unit 28, and an output unit 29. The acquisition unit 21 and the output unit 29 constitute an input/output unit 210. The first conversion unit 22 and the second conversion unit 28 constitute a conversion unit 220. FIG. 12 also shows the database (DB 200) connected to the search device 20. For example, the DB 200 is connected to the search unit 23 via a network such as the Internet or an intranet. The DB 200 stores a plurality of pieces of collation data. Since the components of the search device 20 other than the search unit 23 are the same as those of the search device 10 of the first embodiment, detailed description thereof is omitted. The following description focuses on the search unit 23.
 The search unit 23 selects a search query according to the accuracy of its search terms, and searches the DB 200 using the selected search query. The accuracy is the sum of the inter-pronunciation distances of the individual search items such as the name, date of birth, and address. For example, the accuracy of the character string (search term) that is itself the voice recognition result obtained by the search unit 23 is 0, and each of the words ranked based on the scores of the recognition result takes its inter-pronunciation distance as its accuracy. The accuracy may be weighted for each search item. For example, there are far more variations of names than of dates of birth, so giving the name a larger weight than the date of birth improves the search precision. In addition, since a rare surname is unlikely to be encountered, it may be excluded from the search terms regardless of its accuracy.
 Here, the accuracy of the search queries generated by the search unit 23 will be described with an example. In the following example, the search items are the name (also referred to as the name query), the date of birth (also referred to as the date-of-birth query), and the registered domicile (also referred to as the registered-domicile query). To simplify the explanation, only the surname is used for the name query.
 For example, suppose that the recognition results of the name query, the date-of-birth query, and the registered-domicile query are "Sato," "January 1, 1990," and "A-ku, Tokyo," respectively. Each recognition result is used as a search term. For the name query, the accuracy of the recognition result "Sato" is 0. For the date-of-birth query, the accuracy of the recognition result "January 1, 1990" is 0. For the registered-domicile query, the accuracy of the recognition result "A-ku, Tokyo" is 0.
 For example, suppose that "Kato," "July 1, 1990," and "D-ku, Tokyo" are generated as further search term candidates for the name query, the date-of-birth query, and the registered-domicile query, respectively. Based on the inter-pronunciation distance, the accuracy of "Kato" is 1, the accuracy of "July 1, 1990" is 1, and the accuracy of "D-ku, Tokyo" is 1.
 Since each of the above search items has two candidates (two for the name query, two for the date-of-birth query, and two for the registered-domicile query), the following eight search queries are generated.
Search query 1: "Sato, January 1, 1990, A-ku, Tokyo"
Search query 2: "Kato, January 1, 1990, A-ku, Tokyo"
Search query 3: "Sato, July 1, 1990, A-ku, Tokyo"
Search query 4: "Kato, July 1, 1990, A-ku, Tokyo"
Search query 5: "Sato, January 1, 1990, D-ku, Tokyo"
Search query 6: "Kato, January 1, 1990, D-ku, Tokyo"
Search query 7: "Sato, July 1, 1990, D-ku, Tokyo"
Search query 8: "Kato, July 1, 1990, D-ku, Tokyo"
In the above, the notation of the phonemes and the like of each search term is omitted.
 Here, let λ1 be the weight of the name query, λ2 the weight of the date-of-birth query, and λ3 the weight of the registered-domicile query. The accuracies of the above search queries are then calculated as follows.
Search query 1: λ1 × 0 + λ2 × 0 + λ3 × 0 = 0
Search query 2: λ1 × 1 + λ2 × 0 + λ3 × 0 = λ1
Search query 3: λ1 × 0 + λ2 × 1 + λ3 × 0 = λ2
Search query 4: λ1 × 1 + λ2 × 1 + λ3 × 0 = λ1 + λ2
Search query 5: λ1 × 0 + λ2 × 0 + λ3 × 1 = λ3
Search query 6: λ1 × 1 + λ2 × 0 + λ3 × 1 = λ1 + λ3
Search query 7: λ1 × 0 + λ2 × 1 + λ3 × 1 = λ2 + λ3
Search query 8: λ1 × 1 + λ2 × 1 + λ3 × 1 = λ1 + λ2 + λ3
For example, it is preferable to set each of the weights λ1, λ2, and λ3 to a large value for a search item that is prone to recognition errors. The weights λ1, λ2, and λ3 may all be 1.
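The weighted accuracy defined by the eight formulas above can be sketched as follows. This is an illustrative Python sketch; the weight values used in the usage example are assumptions.

```python
# Illustrative sketch: the accuracy of a search query is the weighted sum
# of the per-item inter-pronunciation distances, matching the formulas
# above. The weight values in the test are hypothetical.

def query_accuracy(distances, weights):
    """distances, weights: dicts keyed by search item."""
    return sum(weights[item] * dist for item, dist in distances.items())
```

For example, with all distances 0 (search query 1) the accuracy is 0 regardless of the weights, and with all distances 1 (search query 8) the accuracy is λ1 + λ2 + λ3.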
 FIG. 13 is a conceptual diagram for explaining the accuracy of the search queries calculated by the search device 20. In the example of FIG. 13, the voice data "Mr. Shibataro, born on January 1, 1990, registered domicile A-ku, Tokyo" is input to the search device 20. In the example of FIG. 13, the weight of each search item is 1.
 The search device 20 extracts the character strings "Shibataro," "Heiseininen'ichigatsutsuitachi" (January 1, 1990), and "Tokyoto-eku" (A-ku, Tokyo) for the respective search items from the text data based on the input voice data. The search device 20 generates search terms for each search item based on the inter-pronunciation distance. For example, for the name query, the search device 20 generates the search terms "Shibataro" with an inter-pronunciation distance of 0, "Kibataro" with an inter-pronunciation distance of 1, and so on. For the date-of-birth query, it generates "Heiseininen'ichigatsutsuitachi" with a distance of 0, "Heiseininen'shichigatsutsuitachi" (July 1, 1990) with a distance of 1, and so on. For the registered-domicile query, it generates "Tokyoto-eku" with a distance of 0, "Tokyoto-deku" (D-ku, Tokyo) with a distance of 1, and so on.
 The search device 20 generates search queries by combining the search terms generated for the respective search items. For example, the search device 20 generates the search query "Shibataro, Heiseininen'ichigatsutsuitachi, Tokyoto-eku," whose accuracy is 0, the sum of the inter-pronunciation distances of its search terms. For example, the search device 20 also generates the search query "Kibataro, Heiseininen'shichigatsutsuitachi, Tokyoto-deku," whose accuracy is 3, the sum of the inter-pronunciation distances of its search terms.
 For example, based on the accuracies of the generated search queries, the search unit 23 searches the DB 200 and generates the text data that serves as the source of the voice data output to a radio (not shown).
 (Operation)
 Next, the operation of the search device 20 will be described with reference to the drawings. The generation of a search query and a search using the generated query are described separately below. The following operations are examples and do not limit the operation of the search device 20.
 [Search query generation]
 FIG. 14 is a flowchart for explaining an example of search query generation by the search device 20. In the description along the flowchart of FIG. 14, the search device 20 is treated as the acting subject.
 In FIG. 14, the search device 20 first acquires voice data (step S211). For example, the search device 20 acquires voice data transmitted from a radio (not shown).
 Next, the search device 20 converts the acquired voice data into text data by voice recognition (step S212).
 Next, the search device 20 extracts the character strings corresponding to the search items from the text data (step S213).
 Next, based on the inter-pronunciation distance, the search device 20 generates search terms related to each extracted character string, for each search item (step S214).
 Next, the search device 20 generates a plurality of search queries by combining the search terms of the individual search items (step S215).
 Next, the search device 20 calculates the certainty of each generated search query from the inter-pronunciation distances of its search terms (step S216).
 Next, the search device 20 ranks the plurality of search queries according to their certainties (step S217).
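Step S213, extracting the character string for each search item from the recognized text, might look as follows in a minimal sketch. The transcript layout and the regular expressions are assumptions made for illustration; the patent does not specify an extraction method.

```python
import re

# Hypothetical transcript layout: "name <...>, born <...>, domicile <...>".
transcript = "name SHIBATAROU, born HEISEI2-01-01, domicile TOKYO-TO A-KU"

# One illustrative pattern per search item.
patterns = {
    "name": r"name\s+([A-Z0-9-]+)",
    "date_of_birth": r"born\s+([A-Z0-9-]+)",
    "domicile": r"domicile\s+([A-Z0-9- ]+)",
}

# Keep only the items whose pattern matched the transcript.
extracted = {item: m.group(1).strip()
             for item, pat in patterns.items()
             if (m := re.search(pat, transcript))}
print(extracted)
```

Each extracted string then seeds the search-term generation of step S214.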
 [Search]
 FIG. 15 is a flowchart for explaining an example in which the search device 20 searches the DB 200 using a generated search query. In the description along the flowchart of FIG. 15, the search device 20 is treated as the acting subject.
 In FIG. 15, the search device 20 first searches the DB 200 using a generated search query (step S251).
 When the search returns multiple hits (Yes in step S252), the search device 20 generates a plurality of text data items containing the hit search results (step S253). Next, the search device 20 converts each of the generated text data items into voice data (step S254). Then, the search device 20 outputs the plurality of voice data items in the certainty order of the search queries (step S255). For example, the plurality of voice data items output from the search device 20 are played back as voice by a radio (not shown) in the order corresponding to the certainty ranking of the search queries.
 On the other hand, when the search does not return multiple hits (No in step S252), the search device 20 generates text data according to the search result (step S256). "Not multiple hits" covers both the case of no hit and the case of exactly one hit. When there is no hit, the search device 20 generates text data indicating that no search result was obtained; when there is exactly one hit, it generates text data containing that search result. Next, the search device 20 converts the generated text data into voice data (step S257) and outputs the voice data (step S258). For example, the voice data output from the search device 20 is played back as voice by a radio (not shown).
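The branching of FIG. 15 (steps S252 to S258) can be sketched as below. The function name and the response strings are hypothetical; only the branching logic follows the text.

```python
def build_responses(hits, certainty_rank=None):
    """Return the texts to be converted to voice, in output order.

    hits           -- list of matching records from the DB search
    certainty_rank -- optional index order derived from query certainties
    """
    if len(hits) > 1:
        # Multiple hits: one text per hit, read back in certainty order.
        order = certainty_rank if certainty_rank is not None else range(len(hits))
        return [f"Candidate: {hits[i]}" for i in order]
    if len(hits) == 1:
        # Exactly one hit: a single text containing that result.
        return [f"Match found: {hits[0]}"]
    # No hit: a single text saying no result was obtained.
    return ["No matching record was found."]

print(build_responses([]))
print(build_responses(["record A"]))
print(build_responses(["record A", "record B"], certainty_rank=[1, 0]))
```

Each returned text would then be converted to voice data (steps S254/S257) before output.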
 As described above, the search unit of the present embodiment calculates, for each search query, a certainty that is the sum of the inter-pronunciation distances of its search terms, weighted per search item. The search unit ranks the search queries according to their certainties.
 According to the present embodiment, depending on the certainty of a search query, the device can either ask the speaker to confirm the search terms constituting the query or search the database directly. The present embodiment can therefore improve both search efficiency and search accuracy.
 In one aspect of the present embodiment, the search unit generates text data containing the search terms of the queries ranked by certainty. The conversion unit converts the generated text data into voice data and outputs the converted voice data in the certainty order of the originating search queries.
 The search device of the present embodiment can thus output voice data according to the certainty of the search queries. A user who hears the voice data output by the search device can, for example, recognize the search results in certainty order.
 (Third embodiment)
 Next, the search device of the third embodiment will be described with reference to the drawings. The search device of the present embodiment differs from the first and second embodiments in that it ranks search queries based on a dictionary of inter-pronunciation distances (also called an inter-pronunciation distance dictionary) for at least one search item.
 (Configuration)
 FIG. 16 is a block diagram showing an example of the configuration of the search device 30 of the present embodiment. The search device 30 includes an acquisition unit 31, a first conversion unit 32, a search unit 33, a dictionary 34, a second conversion unit 38, and an output unit 39. The acquisition unit 31 and the output unit 39 constitute an input/output unit 310. The first conversion unit 32 and the second conversion unit 38 constitute a conversion unit 320. FIG. 16 also shows the database (DB 300) connected to the search device 30. The DB 300 is connected to the search unit 33 via a network such as the Internet or an intranet, and stores a plurality of collation data items. The components of the search device 30 other than the search unit 33 and the dictionary 34 are the same as those of the search device 10 of the first embodiment, so their detailed description is omitted; the following description focuses on the search unit 33 and the dictionary 34.
 The dictionary 34 compiles the inter-pronunciation distances of the character strings corresponding to a search item (and is also called an inter-pronunciation distance dictionary). It is prepared in advance for each search item. For example, when the search item is a surname, the dictionary 34 registers the surnames recorded in a national name dictionary, family registers, and the like, together with a ranking of those surnames according to their mutual inter-pronunciation distances. For example, when N surnames are stored, a ranking of the other surnames by inter-pronunciation distance is registered in the dictionary 34 for each of the N surnames (N is a natural number). Although a kanji used in surnames can have more than one reading, one reading is assumed to be associated with each kanji here. In other words, the dictionary 34 contains, for each stored character string, a data series in which the other character strings are arranged in rank order according to their inter-pronunciation distance from it.
 For example, for each of Yamada (YAMADA), Satou (SATOU), and Katou (KATOU), the Yamada-Satou, Satou-Katou, and Katou-Yamada inter-pronunciation distances are defined. The distance between Satou (SATOU) and Katou (KATOU), for example, is 1 because one phoneme is substituted.
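The inter-pronunciation distance described here behaves like a phoneme-level edit (Levenshtein) distance over romanized readings. A minimal sketch, treating each letter of the romanized reading as one phoneme-like unit (an assumption for illustration):

```python
def pronunciation_distance(a: str, b: str) -> int:
    """Minimum number of unit insertions, deletions, or substitutions
    needed to turn reading a into reading b (Wagner-Fischer DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a's prefix
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

print(pronunciation_distance("SATOU", "KATOU"))  # 1 (one substitution)
```

This reproduces the SATOU/KATOU example: the two readings differ by exactly one substituted unit.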
 FIG. 17 shows an example of a table contained in the dictionary 34 for the surname search item (inter-pronunciation distance dictionary 340). In the inter-pronunciation distance dictionary 340, the ranking associated with Satou is, for example, Katou in first place, Saitou in second place, and so on. Note that FIG. 17 is conceptual and does not accurately reflect the ranking of actual surnames.
 For example, when the character string recognized from the voice data is "Satou", the search unit 33 extracts character strings highly ranked for "Satou" as search terms. For example, the search unit 33 extracts, from the "Satou" field of the inter-pronunciation distance dictionary 340, the character strings ranked up to the M-th place as search terms (M is a natural number). Alternatively, the search unit 33 may extract, from the same field, the character strings whose inter-pronunciation distance is within X as search terms (X is a natural number).
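The lookup described here can be sketched as a table of pre-ranked neighbours. The entries below are illustrative only, as the text notes for FIG. 17, and the function name is hypothetical:

```python
# Pre-ranked neighbours per surname reading, sorted by inter-pronunciation
# distance at dictionary-build time (ranks 1, 2, 3, ... left to right).
distance_dictionary = {
    "SATOU": ["KATOU", "SAITOU", "GOTOU"],
    "KATOU": ["SATOU", "SAITOU", "KITOU"],
}

def select_search_terms(recognized: str, top_m: int):
    """Return the recognized string plus its top-M ranked neighbours.
    No distance is computed at query time, which is the point of
    this embodiment."""
    return [recognized] + distance_dictionary.get(recognized, [])[:top_m]

print(select_search_terms("SATOU", 2))  # ['SATOU', 'KATOU', 'SAITOU']
```

A string absent from the dictionary simply yields itself, with no neighbours.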
 The dictionary 34 may also include a spelling alphabet (tsuuwahyou), a table of rules established to prevent mishearing in wireless communication and the like. FIG. 18 shows a spelling alphabet 360 containing the contents of the table in Appendix 5 of the Radio Station Operation Regulations. For example, to avoid mishearing "a", the speaker says "Asahi no a"; to avoid mishearing "shi", the speaker says "shinbun no shi". The spelling alphabet 360 may include data on numerals and symbols as well as characters, and may cover not only Japanese but also letters, numerals, and symbols of Western scripts such as the Latin alphabet.
 FIG. 19 is a conceptual diagram showing an example in which the search device 30 asks the speaker to confirm a name recognized by voice recognition. For example, the search device 30 outputs the confirmation voice data according to the score or certainty of the search term. FIG. 20 shows an example in which, because the name in the confirmation voice data of FIG. 19 was wrong, the correct voice data was returned via a radio (not shown). In the example of FIG. 20, the search device 30 refers to the entry "shinbun no shi" registered in the spelling alphabet 360 and recognizes the correct name "Shibatarou". If it is agreed in advance that replies to such confirmations follow the patterns registered in the spelling alphabet 360, the correctness of a character string recognized by voice recognition can easily be checked against those patterns.
 (Operation)
 Next, the operation of the search device 30 will be described with reference to the drawings. The generation of a search query is described below. The following operations are examples and do not limit the operation of the search device 30.
 [Search query generation]
 FIG. 21 is a flowchart for explaining an example of search query generation by the search device 30. In the description along the flowchart of FIG. 21, the search device 30 is treated as the acting subject.
 In FIG. 21, the search device 30 first acquires voice data (step S311). For example, the search device 30 acquires voice data output from a radio (not shown).
 Next, the search device 30 converts the acquired voice data into text data by voice recognition (step S312).
 Next, the search device 30 extracts the character strings corresponding to the search items from the text data (step S313).
 Next, the search device 30 refers to the inter-pronunciation distance dictionary of each search item and selects search terms based on their ranked inter-pronunciation distances from the extracted character string (step S314).
 Next, the search device 30 generates a plurality of search queries by combining the search terms selected for the individual search items (step S315).
 As described above, for each of the character strings relevant to a search item, the search unit of the present embodiment refers to an inter-pronunciation distance dictionary in which the other character strings relevant to that item are ranked according to inter-pronunciation distance. Referring to the dictionary, the search unit selects as search terms the character strings highly ranked with respect to the character string extracted from the text data.
 According to the present embodiment, since search terms are selected by referring to the inter-pronunciation distance dictionary, processing such as computing inter-pronunciation distances can be omitted. The present embodiment can therefore speed up the generation of search terms and search queries.
 (Fourth embodiment)
 Next, the search device of the fourth embodiment will be described with reference to the drawings. The present embodiment gives concrete form to the exchange between the search device and a radio, using an identity inquiry by a police officer as an example.
 (Configuration)
 FIG. 22 is a block diagram showing an example of the configuration of the search device 40 of the present embodiment. The search device 40 includes an input/output unit 41, a first conversion unit 42, a search unit 43, a registration information recording unit 44, and a second conversion unit 48. The first conversion unit 42 and the second conversion unit 48 constitute a conversion unit 420. FIG. 22 also shows a radio 450 that exchanges voice data with the search device 40, and a database group (DB group 400) connected to the search device 40. Each of the DBs constituting the DB group 400 is connected to the search unit 43 via a network such as the Internet or an intranet. The DB group 400 includes a plurality of DBs, one per inquiry type, each storing a plurality of collation data items for its inquiry type. The radio 450 exchanges voice data with the search device 40; although only one radio 450 is shown in FIG. 22, the search device 40 can exchange voice data with a plurality of radios 450, and the radio 450 may itself include part or all of the configuration of the search device 40. The main configuration of the search device 40 is the same as that of the search device 10 of the first embodiment, so its detailed description is omitted; the following description focuses on the exchange of voice data between the radio 450 and the search device 40.
 The input/output unit 41 acquires voice data based on a radio signal transmitted from the radio 450 and outputs the voice data to the first conversion unit 42. The input/output unit 41 also outputs the voice data acquired from the second conversion unit 48.
 For example, the radio 450 transmits a radio signal containing voice data by wireless communication in a specific frequency band. The radio signal transmitted from the radio 450 is converted into an electric signal via an antenna, an amplifier, a demodulator, and the like (not shown), and is input to the input/output unit 41 of the search device 40 via a network such as the Internet or an intranet. The input/output unit 41, in turn, outputs the voice data acquired from the second conversion unit 48 toward the radio 450.
 The registration information of the radio 450 is recorded in the registration information recording unit 44. For example, the registration information is the user identifier of the user of the radio 450 or the device identifier of the radio 450. The search device 40 exchanges voice data with a radio 450 whose transmitted identification information matches the registration information recorded in the registration information recording unit 44.
 The search unit 43 executes processing according to the content of the text data acquired from the first conversion unit 42. In the present embodiment, the text data acquired by the search unit 43 contains identification information, an inquiry type, inquiry information, and the like. According to the content of the acquired text data, the search unit 43 performs processing such as generating text data containing a response to the sender of the original voice data, or searching a DB in the DB group 400. The text data generated by the search unit 43 is converted into voice data by the second conversion unit 48 and output from the input/output unit 41 toward the radio 450.
 When the voice data from the radio 450 contains identification information, the search unit 43 refers to the registration information recording unit 44 and determines whether that identification information is recorded there. If it is, the search unit 43 generates text data asking for the inquiry type. If it is not, the search unit 43 generates, for example, text data notifying the sender that the identification information did not match, or text data instructing the sender to resend the identification information. The text data generated by the search unit 43 is converted into voice data by the second conversion unit 48 and output toward the radio 450.
 When the voice data from the radio 450 contains an inquiry type, the search unit 43 generates text data asking the sender of that inquiry type for the inquiry content. For example, the search unit 43 may include a read-back of the inquiry type in the text data asking for the inquiry content. The generated text data is converted into voice data by the second conversion unit 48 and output toward the radio 450.
 When the voice data from the radio 450 contains inquiry information, the search unit 43 extracts the character strings corresponding to the search items from the text data and generates search terms related to the extracted character strings based on the inter-pronunciation distance. Depending on how well the strings for the search items could be extracted, the search unit 43 also generates text data asking the sender to confirm the search term closest in inter-pronunciation distance to an extracted string, or text data asking the sender to restate a search item that could not be extracted.
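The content-dependent handling by the search unit 43 can be sketched as a simple dispatch. The message classification, identifiers, and reply strings below are hypothetical simplifications of the exchange described here and in FIGS. 23 and 24, not the patent's protocol.

```python
# Stand-in for the registration information recording unit 44.
REGISTERED_IDS = {"P-0001", "P-0002"}

def handle(kind: str, payload: str) -> str:
    """Return the response text to be converted to voice, depending on
    what the incoming (already recognized) message contained."""
    if kind == "identification":
        if payload in REGISTERED_IDS:
            return "Please state the inquiry type."
        return "Identification not recognized; please resend."
    if kind == "inquiry_type":
        # Read back the type and ask for the inquiry content.
        return f"{payload}, correct? If so, state the subject's details."
    if kind == "inquiry_info":
        return "Generating search terms from the stated details."
    return "Unrecognized message."

print(handle("identification", "P-0001"))
print(handle("identification", "P-9999"))
print(handle("inquiry_type", "comprehensive inquiry"))
```

Each returned string would pass through the second conversion unit 48 before being output toward the radio 450.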
 (Operation)
 Next, the interplay among the radio 450, the search device 40, and the DBs in the DB group 400 will be described using sequence diagrams. The flow up to the generation of search terms and the processing executed afterwards are described below with examples. In the following, it is assumed that the user handling the radio 450 speaks information into it in response to the voice data from the search device 40. The interplay described below is an example and does not limit the operations or their interrelations.
 FIG. 23 is a sequence diagram showing an example of the flow from the connection of the radio 450 to the search device 40 up to the generation of search terms by the search device 40.
 First, the radio 450 connects to the search device 40 (step S411). The method by which the radio 450 connects to the search device 40 is not particularly limited.
 When the search device 40 detects the connection of the radio 450, it outputs voice data containing a request for identification information to that radio 450 (step S412), for example, "This is an automated response. Please state your P number, affiliation, and name." The P number is an identifier that uniquely identifies a police officer.
 When the radio 450 receives the voice data requesting identification information, it outputs voice data containing the identification information spoken in response to the search device 40 (step S413). For example, identification information such as "P number XX, XX station, regional section X, XX" is spoken into the radio 450, which then outputs voice data containing that identification information to the search device 40.
 When the search device 40 receives the voice data containing the identification information, it checks whether that identification information is registered in the registration information recording unit 44 (step S414). If it is, the search device 40 outputs voice data requesting the inquiry type to the radio 450 (step S415), for example, "P number XXXX, Mr. XX, please state the inquiry type."
 When the radio 450 receives the voice data requesting the inquiry type, it outputs voice data containing the inquiry type spoken in response to the search device 40 (step S416), for example, "This is a comprehensive inquiry, based on exemption."
 When the search device 40 receives the voice data containing the inquiry type, it confirms the inquiry type (step S417) and outputs voice data requesting the inquiry information to the radio 450 (step S418), for example, "Comprehensive inquiry, correct? If the inquiry type is wrong, please correct it. Otherwise, please state the subject's name, date of birth, and so on."
 When the radio 450 receives the voice data requesting the inquiry information, it outputs voice data containing the inquiry information spoken in response to the search device 40 (step S419), for example, "Shibatarou, born January 1, Heisei 2, registered domicile A-ku, Tokyo."
 When the search device 40 receives the voice data containing the inquiry information, it extracts the character strings corresponding to the search items of the inquiry information from the text data based on the acquired voice data, and generates search terms from the extracted character strings based on the inter-pronunciation distance (step S420).
 図24は、検索装置40による検索語の生成から、照合結果の出力までの流れの一例を示すシーケンス図である。図24のシーケンス図は、図23のステップS420の検索語生成に後続する処理に関する。 FIG. 24 is a sequence diagram showing an example of the flow from the generation of the search term by the search device 40 to the output of the collation result. The sequence diagram of FIG. 24 relates to a process following the generation of the search term in step S420 of FIG.
 図24において、検索装置40は、検索語を生成すると(ステップS420)、検索語の確認内容を含む音声データを無線機450に出力する(ステップS421)。例えば、検索装置40は、「キバタロウ、東京都A区で検索します。誤りがあれば、誤り部分の訂正お願いします。」といった検索語の確認内容を含む音声データを無線機450に出力する。 In FIG. 24, when the search device 40 generates a search term (step S420), the search device 40 outputs voice data including the confirmation content of the search term to the radio device 450 (step S421). For example, the search device 40 outputs voice data including confirmation contents of a search term such as "Search in Kibataro, A-ku, Tokyo. If there is an error, please correct the error part." To the radio 450. ..
 The search device 40 also combines the generated search terms for each search item to generate a plurality of search queries (step S422). Using at least one of the generated search queries, the search device 40 searches, among the DBs included in the DB group 400, the DB storing the collation data of the inquiry type being processed (step S423).
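The query-combination step above amounts to taking the Cartesian product of the candidate terms generated for each search item. A minimal sketch, assuming example item names and candidate terms (the actual items and values depend on the inquiry):

```python
# Sketch of step S422: combine the search terms generated for each search
# item into a plurality of search queries (Cartesian product).
from itertools import product

# Candidate search terms per search item, e.g. produced in step S420.
terms_per_item = {
    "name": ["Kibataro", "Shibataro"],
    "address": ["A-ku, Tokyo"],
    "birth_date": ["1990-01-01"],
}

def build_queries(terms_per_item: dict) -> list:
    """Return one query dict per combination of per-item candidates."""
    items = sorted(terms_per_item)
    queries = []
    for combo in product(*(terms_per_item[i] for i in items)):
        queries.append(dict(zip(items, combo)))
    return queries

queries = build_queries(terms_per_item)
# Two candidate names x one address x one birth date -> two queries.
```

Any one of the resulting queries (or all of them, as in FIG. 25) can then be issued against the DB storing the collation data.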
 When the radio 450 receives the voice data containing the confirmation content of the search terms, it outputs voice data containing a response entered by voice in reply to that content to the search device 40 (step S424). For example, the radio 450 outputs voice data containing a response such as "That is correct." Note that step S424 may be omitted if there is no error in the confirmation content of the search terms.
 If collation data of the inquiry type being processed is stored in the DB, the search device 40 acquires the search result from the DB (step S425). The search device 40 outputs a collation result corresponding to the search result to the radio 450 (step S426). For example, if the search hits, the search device 40 outputs voice data containing a collation result such as "Mr. Shibataro, born January 1, 1990, registered domicile A-ku, Tokyo, corresponds to XX." If the search does not hit, the search device 40 outputs voice data containing a collation result such as "Mr. Shibataro, born January 1, 1990, registered domicile A-ku, Tokyo, does not correspond to XX."
 According to the example of FIG. 24, a search result can be obtained using a search query composed of search terms confirmed to be correct, so search accuracy can be improved.
 FIG. 25 is a sequence diagram showing another example of the flow from the generation of search terms by the search device 40 to the output of a collation result. FIG. 25 shows an example in which the confirmation of the search terms and the DB search using a plurality of search queries are performed in parallel. The confirmation and the search may be performed at the same timing or at slightly different timings. The sequence diagram of FIG. 25 relates to the processing that follows the search term generation in step S420 of FIG. 23.
 In FIG. 25, after generating the search terms (step S420), the search device 40 combines the generated search terms for each search item to generate a plurality of search queries (step S431).
 The search device 40 outputs voice data containing confirmation content for the search terms to the radio 450 (step S432). For example, the search device 40 outputs voice data containing confirmation content such as "Searching for Kibataro, A-ku, Tokyo. If there is an error, please correct the erroneous part." Step S432 may be performed in parallel with step S433, or may precede it.
 In parallel with step S432, the search device 40 uses the plurality of generated search queries to search, among the DBs included in the DB group 400, the DB storing the collation data of the inquiry type being processed (step S433). At this time, the search device 40 searches the DB using all of the generated search queries. If collation data of the inquiry type being processed is stored in the DB, the search device 40 acquires the search results from the DB (step S434).
 When the radio 450 receives the voice data containing the confirmation content of the search terms from the search device 40, it outputs voice data containing a response (confirmation result) entered by voice in reply to that content to the search device 40 (step S435). For example, if a search term is erroneous, voice information containing a correction of the search term, such as "The name is Shibataro, with shi as in shinbun (newspaper)," is input to the radio 450. The radio 450 outputs voice data corresponding to the input voice information to the search device 40. Note that step S435 may be omitted if there is no error in the confirmation content of the search terms.
 If a search term was erroneous, the search device 40 receives voice data containing the correction of the search term from the radio 450. The search device 40 outputs, to the radio 450, the collation result corresponding to the search result hit by the search using the search query composed of the correct search terms (step S436). For example, if the search hits, the search device 40 outputs voice data containing a collation result such as "Mr. Shibataro, born January 1, 1990, registered domicile A-ku, Tokyo, corresponds to XX." If the search does not hit, the search device 40 outputs voice data containing a collation result such as "Mr. Shibataro, born January 1, 1990, registered domicile A-ku, Tokyo, does not correspond to XX."
 According to the example of FIG. 25, the DB is searched using a plurality of search queries in parallel with the confirmation of the search terms; although searches using some of the queries may yield erroneous results, the correct search result is also obtained. According to the example of FIG. 25, it suffices to select, in accordance with the confirmation result of the search terms, the search result obtained with the search query composed of the correct search terms, so search efficiency is improved.
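The FIG. 25 pattern, searching with every candidate query while the spoken confirmation is still pending and then keeping only the confirmed query's result, can be sketched with a thread pool. This is an illustrative sketch only: the in-memory dictionary stands in for the collation-data DB, and the tuple-shaped queries are examples.

```python
# Sketch of the FIG. 25 flow (steps S433-S436): run the DB search for every
# candidate query without waiting for the spoken confirmation, then select
# the result of the query the speaker confirmed as correct.
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the DB storing collation data of the inquiry type.
FAKE_DB = {("Shibataro", "A-ku, Tokyo"): "corresponds to XX"}

def search_db(query):
    """One DB lookup for one candidate query."""
    return query, FAKE_DB.get(query, "no hit")

def parallel_search(queries, confirmed_query):
    # All candidate queries are searched concurrently (step S433).
    with ThreadPoolExecutor() as pool:
        results = dict(pool.map(search_db, queries))
    # Only the result for the confirmed query is reported (step S436).
    return results[confirmed_query]

queries = [("Kibataro", "A-ku, Tokyo"), ("Shibataro", "A-ku, Tokyo")]
result = parallel_search(queries, confirmed_query=("Shibataro", "A-ku, Tokyo"))
```

Because every candidate's result is already available by the time the confirmation arrives, selecting the correct one adds no further DB round trip, which is the efficiency gain the paragraph above describes.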
 FIG. 26 is a sequence diagram showing still another example of the flow from the generation of search terms by the search device 40 to the output of a collation result. The sequence diagram of FIG. 26 relates to the processing that follows the search term generation in step S420 of FIG. 23.
 In FIG. 26, after generating the search terms (step S420), the search device 40 outputs voice data containing confirmation content for the search terms to the radio 450 (step S441). For example, the search device 40 outputs voice data containing confirmation content such as "Searching for Kibataro, A-ku, Tokyo. If there is an error, please correct the erroneous part."
 The search device 40 also combines the generated search terms for each search item to generate a plurality of search queries (step S442). Using at least one of the generated search queries, the search device 40 searches, among the DBs included in the DB group 400, the DB storing the collation data of the inquiry type being processed (step S443).
 When the radio 450 receives the voice data containing the confirmation content of the search terms from the search device 40, it outputs voice data containing a response (confirmation result) entered by voice in reply to that content to the search device 40 (step S444). For example, if a search term is erroneous, voice information containing a correction such as "The name is Shibataro, with shi as in shinbun (newspaper)." is input to the radio 450. The radio 450 outputs voice data corresponding to the input voice information to the search device 40.
 If a search term was erroneous, the search device 40 receives voice data containing the correction from the radio 450. The search device 40 selects another search term based on the correction and outputs voice data containing confirmation content for the new search term to the radio 450 (step S445). For example, the search device 40 outputs voice data containing confirmation content such as "Searching for Shibataro. If there is an error, please correct it." The search device 40 also searches, using the search query composed of the search terms selected based on the correction, the DB storing the collation data of the inquiry type being processed among the DBs included in the DB group 400 (step S446).
 When the radio 450 receives the voice data containing the reconfirmation content of the search terms from the search device 40, it outputs voice data containing a response (confirmation result) entered by voice in reply to that content to the search device 40 (step S447). For example, if the search terms are correct, voice information such as "That is correct." is input to the radio 450. The radio 450 outputs voice data corresponding to the input voice information to the search device 40.
 If the search terms were correct, the search device 40 receives voice data indicating that the search terms were correct from the radio 450. The search device 40 acquires from the DB the search results hit by the search using the search query composed of the correct search terms (step S448), and outputs the collation result corresponding to those search results to the radio 450 (step S449). For example, if the search hits, the search device 40 outputs voice data containing a collation result such as "Mr. Shibataro, born January 1, 1990, registered domicile A-ku, Tokyo, corresponds to XX." If the search does not hit, the search device 40 outputs voice data containing a collation result such as "Mr. Shibataro, born January 1, 1990, registered domicile A-ku, Tokyo, does not correspond to XX."
 According to the example of FIG. 26, the DB can be searched using a search query composed of search terms confirmed to be correct, so search accuracy is improved. Furthermore, according to the example of FIG. 26, when a search term is wrong, the search using the search query containing the wrong search term can be aborted, so search efficiency is improved.
 (Fifth Embodiment)
 Next, the search device of the fifth embodiment will be described with reference to the drawings. The search device of the present embodiment is a simplified configuration of the search devices of the first to fourth embodiments. FIG. 27 is a block diagram showing an example of the configuration of the search device 50 of the present embodiment. The search device 50 includes a conversion unit 52 and a search unit 53.
 The conversion unit 52 converts input voice data into text data by voice recognition. The search unit 53 extracts character strings corresponding to search items from the text data. Based on the distance from each extracted character string, the search unit 53 generates search terms related to the character string for each search item. The search unit 53 then combines the search terms generated for each search item to generate a plurality of search queries.
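The data flow between the two units of the search device 50 can be sketched as follows. This is purely structural: the speech recognizer, the item-extraction rule, and the "key=value" text format are stubbed placeholders, and the distance-based generation of term variants (step S420) is omitted.

```python
# Structural sketch of the search device 50 of FIG. 27: a conversion unit
# (52) feeding a search unit (53). Recognition and extraction are stubbed.

class ConversionUnit:
    """Corresponds to the conversion unit 52: voice data -> text data."""
    def convert(self, voice_data: bytes) -> str:
        # Stand-in for a real speech recognizer.
        return voice_data.decode("utf-8")

class SearchUnit:
    """Corresponds to the search unit 53."""
    def __init__(self, search_items):
        self.search_items = search_items  # e.g. ["name", "address"]

    def extract(self, text: str) -> dict:
        # Stub extraction over "name=Shibataro;address=A-ku"-style text.
        pairs = (field.split("=") for field in text.split(";"))
        return {k: v for k, v in pairs if k in self.search_items}

    def build_queries(self, extracted: dict) -> list:
        # Single-candidate stub; the real unit would combine distance-based
        # term variants per item into a plurality of queries.
        return [extracted]

conv = ConversionUnit()
search = SearchUnit(["name", "address"])
text = conv.convert(b"name=Shibataro;address=A-ku")
queries = search.build_queries(search.extract(text))
```

The sketch shows only the interface between the units; any of the first to fourth embodiments' components (acquisition unit, second conversion unit, output unit) can be layered around the same two-stage core.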
 According to the present embodiment, a plurality of search queries composed of search terms corresponding to search items can be generated based on arbitrary voice data.
 (Hardware)
 Here, a hardware configuration for executing the processing of the search device according to each embodiment of the present invention will be described, taking the information processing device 90 of FIG. 28 as an example. The information processing device 90 of FIG. 28 is a configuration example for executing the processing of the search device of each embodiment, and does not limit the scope of the present invention.
 As shown in FIG. 28, the information processing device 90 includes a processor 91, a main storage device 92, an auxiliary storage device 93, an input/output interface 95, and a communication interface 96. In FIG. 28, interface is abbreviated as I/F (Interface). The processor 91, the main storage device 92, the auxiliary storage device 93, the input/output interface 95, and the communication interface 96 are connected to one another via a bus 98 so as to be capable of data communication. The processor 91, the main storage device 92, the auxiliary storage device 93, and the input/output interface 95 are also connected to a network such as the Internet or an intranet via the communication interface 96.
 The processor 91 loads a program stored in the auxiliary storage device 93 or the like into the main storage device 92 and executes the loaded program. In the present embodiment, a software program installed in the information processing device 90 may be used. The processor 91 executes the processing of the search device according to the present embodiment.
 The main storage device 92 has an area into which programs are loaded. The main storage device 92 may be a volatile memory such as a DRAM (Dynamic Random Access Memory). A non-volatile memory such as an MRAM (Magnetoresistive Random Access Memory) may also be configured or added as the main storage device 92.
 The auxiliary storage device 93 stores various data. The auxiliary storage device 93 is composed of a local disk such as a hard disk or a flash memory. It is also possible to store the various data in the main storage device 92 and omit the auxiliary storage device 93.
 The input/output interface 95 is an interface for connecting the information processing device 90 to peripheral devices. The communication interface 96 is an interface for connecting to external systems and devices through a network such as the Internet or an intranet, based on standards and specifications. The input/output interface 95 and the communication interface 96 may be unified as a common interface for connecting to external devices.
 Input devices such as a keyboard, a mouse, or a touch panel may be connected to the information processing device 90 as necessary. These input devices are used to input information and settings. When a touch panel is used as an input device, the display screen of the display device may also serve as the interface of the input device. Data communication between the processor 91 and the input devices may be mediated by the input/output interface 95.
 The information processing device 90 may also be equipped with a display device for displaying information. When a display device is provided, the information processing device 90 preferably includes a display control device (not shown) for controlling the display of the display device. The display device may be connected to the information processing device 90 via the input/output interface 95.
 The above is an example of a hardware configuration for enabling the search device according to each embodiment of the present invention. The hardware configuration of FIG. 28 is an example of a hardware configuration for executing the arithmetic processing of the search device according to each embodiment, and does not limit the scope of the present invention. A program that causes a computer to execute the processing relating to the search device according to each embodiment is also included in the scope of the present invention. Furthermore, a recording medium on which the program according to each embodiment is recorded is also included in the scope of the present invention. The recording medium can be realized, for example, by an optical recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc). The recording medium may also be realized by a semiconductor recording medium such as a USB (Universal Serial Bus) memory or an SD (Secure Digital) card, a magnetic recording medium such as a flexible disk, or another recording medium.
 The components of the search devices of the embodiments can be combined arbitrarily. The components of the search devices of the embodiments may be realized by software or by circuits.
 Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above embodiments. Various modifications that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
 10, 20, 30, 40  Search device
 11, 21, 31  Acquisition unit
 12, 22, 32, 42  First conversion unit
 13, 23, 33, 43  Search unit
 18, 28, 38, 48  Second conversion unit
 19, 29, 39  Output unit
 34  Dictionary
 41  Input/output unit
 44  Registration information recording unit
 52  Conversion unit
 53  Search unit
 90  Information processing device
 91  Processor
 92  Main storage device
 93  Auxiliary storage device
 95  Input/output interface
 96  Communication interface
 98  Bus
 100, 200, 300  DB
 400  DB group

Claims (10)

  1.  A search device comprising:
     a conversion means that converts input voice data into text data by voice recognition; and
     a search means that extracts a character string corresponding to a search item from the text data, generates, for each search item, a search term related to the character string based on a distance from the extracted character string, and combines the search terms generated for each search item to generate a plurality of search queries.
  2.  The search device according to claim 1, wherein the search means generates, as the search term, a character string whose inter-pronunciation distance from the extracted character string is small, based on an inter-pronunciation distance derived from differences between the phonemes constituting the character string extracted from the text data.
  3.  The search device according to claim 2, wherein the search means calculates the inter-pronunciation distance using inter-phoneme distances predefined between two phonemes.
  4.  The search device according to claim 2 or 3, wherein the search means refers to an inter-pronunciation distance dictionary in which, for each of a plurality of character strings corresponding to the search item, a plurality of other character strings corresponding to the search item are ranked according to the inter-pronunciation distance, and selects, as the search term, a character string having a high rank in the inter-pronunciation distance dictionary with respect to the character string extracted from the text data.
  5.  The search device according to any one of claims 2 to 4, wherein the search means calculates, for each search query, a confidence that is the sum of the inter-pronunciation distances of the search terms weighted for each search item, and ranks the search queries according to the confidence.
  6.  The search device according to claim 5, wherein the search means generates text data containing the search terms constituting the search queries ranked according to the confidence, and the conversion means converts the generated text data into the voice data and outputs, in order, the voice data converted from the text data according to the confidence ranking of the search query from which the text data was generated.
  7.  The search device according to any one of claims 1 to 6, wherein the search means assigns a score based on the voice recognition to the search term and generates text data containing the search term to which the score has been assigned, and the conversion means converts the generated text data into the voice data and outputs the voice data converted from the text data.
  8.  The search device according to any one of claims 1 to 7, wherein the search means searches, using at least one of the search queries, a database in which collation data including the search item is accumulated, and generates text data corresponding to the search result, and the conversion means converts the generated text data into the voice data and outputs the voice data converted from the text data.
  9.  A search method in which a computer:
     converts input voice data into text data by voice recognition;
     extracts a character string corresponding to a search item from the text data;
     generates, for each search item, a search term related to the character string based on a distance from the extracted character string; and
     combines the search terms generated for each search item to generate a plurality of search queries.
  10.  A non-transitory recording medium on which a program is recorded, the program causing a computer to execute:
     a process of converting input voice data into text data by voice recognition;
     a process of extracting a character string corresponding to a search item from the text data;
     a process of generating, for each search item, a search term related to the character string based on a distance from the extracted character string; and
     a process of combining the search terms generated for each search item to generate a plurality of search queries.
PCT/JP2020/022971 2020-06-11 2020-06-11 Search device, search method, and recording medium WO2021250837A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/022971 WO2021250837A1 (en) 2020-06-11 2020-06-11 Search device, search method, and recording medium
JP2022530448A JP7485030B2 (en) 2020-06-11 Search device, search method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/022971 WO2021250837A1 (en) 2020-06-11 2020-06-11 Search device, search method, and recording medium

Publications (1)

Publication Number Publication Date
WO2021250837A1 true WO2021250837A1 (en) 2021-12-16

Family

ID=78847087

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/022971 WO2021250837A1 (en) 2020-06-11 2020-06-11 Search device, search method, and recording medium

Country Status (1)

Country Link
WO (1) WO2021250837A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012063536A (en) * 2010-09-15 2012-03-29 Ntt Docomo Inc Terminal device, speech recognition method and speech recognition program
WO2015037098A1 (en) * 2013-09-12 2015-03-19 株式会社 東芝 Electronic device, method and program
WO2015040793A1 (en) * 2013-09-20 2015-03-26 三菱電機株式会社 Character string retrieval device
JP2017167270A (en) * 2016-03-15 2017-09-21 本田技研工業株式会社 Sound processing device and sound processing method

Also Published As

Publication number Publication date
JPWO2021250837A1 (en) 2021-12-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20940078

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022530448

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20940078

Country of ref document: EP

Kind code of ref document: A1