JP6702119B2 - Speech recognition result creating device, method and program

Info

Publication number: JP6702119B2
Application number: JP2016187778A
Authority: JP
Inventors: 伊東 秀夫 (Hideo Ito)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-09-27
Filing date: 2016-09-27
Publication date: 2020-05-27
Anticipated expiration: 2036-09-27
Also published as: JP2018054717A

Description

The present invention relates to a speech recognition result creating device, method, and program.

Speech recognition technology for recognizing speech is conventionally known. In speech recognition, speech is converted into text, the resulting text is divided into a plurality of time segments, a phrase corresponding to each segmented text is selected, and the selected phrases are concatenated to create a speech recognition result (a sentence corresponding to the speech). In general, speech recognition technology uses dictionary data prepared for each phrase to convert each segmented text into the corresponding phrase. This improves the accuracy of the speech recognition result.

Meanwhile, speech recognition services have been offered over the Internet in recent years. Some of these services provide the user with speech recognition data that includes a speech recognition result and candidate phrases corresponding to each segmented text. By inputting speech data to the service, the user can obtain the speech recognition data corresponding to that speech data.

However, with such conventional speech recognition services, users could not maintain their own dictionary data. As a result, the speech recognition results obtained from these services had low accuracy.

The present invention has been made in view of the above problem, and its object is to make it possible to create a highly accurate speech recognition result based on speech recognition data.

A speech recognition result creating device according to one embodiment includes: an input unit that receives speech recognition data including candidate phrases corresponding to each segment text and an evaluation value of each candidate; a dictionary data storage unit that stores dictionary data; a search unit that searches for portions where the dictionary data and the speech recognition data match; and a creation unit that creates a speech recognition result based on the search result and the speech recognition data.

According to each embodiment of the present invention, a highly accurate speech recognition result can be created based on speech recognition data.

FIG. 1 is a diagram showing an example of the functional configuration of the speech recognition result creating device.
FIG. 2 is a diagram showing an example of speech recognition data.
FIG. 3 is a diagram showing an example of sentence data (dictionary data).
FIG. 4 is a diagram showing an example of the hardware configuration of the speech recognition result creating device.
FIG. 5 is a flowchart showing an outline of the operation of the speech recognition result creating device.
FIG. 6 is a flowchart showing an example of the speech recognition result creation process in the first embodiment.
FIG. 7 is a diagram showing an example of speech recognition data after the evaluation values have been updated.
FIG. 8 is a diagram showing an example of homophone data (dictionary data).
FIG. 9 is a flowchart showing an example of the speech recognition result creation process in the second embodiment.
FIG. 10 is a diagram showing an example of replacement data (dictionary data).
FIG. 11 is a flowchart showing an example of the speech recognition result creation process in the third embodiment.
FIG. 12 is a flowchart showing an example of the speech recognition result creation process in the fourth embodiment.

Embodiments of the present invention are described below with reference to the accompanying drawings. In the specification and drawings of each embodiment, components having substantially the same functional configuration are given the same reference numerals, and redundant description is omitted.

(First embodiment)
A speech recognition result creating device (hereinafter referred to as the "creation device") 1 according to the first embodiment will be described with reference to FIGS. 1 to 7. First, the functional configuration of the creation device 1 is described. FIG. 1 is a diagram showing an example of the functional configuration of the creation device 1. The creation device 1 of FIG. 1 includes an input unit 11, a speech recognition data storage unit 12, a dictionary data storage unit 13, a search unit 14, and a creation unit 15.

The input unit 11 receives speech recognition data output by an external speech recognition service and stores it in the speech recognition data storage unit 12.

The speech recognition data storage unit 12 stores the speech recognition data. The speech recognition data is produced as follows. When the speech to be recognized is input, the speech recognition service converts it into text and then divides the text into a plurality of time segments. Each piece of segmented text is called a segment text. The service then outputs, as speech recognition data, the candidate phrases corresponding to each segment text together with an evaluation value for each candidate. In other words, the speech recognition data includes the candidate phrases for each segment text and the evaluation value of each candidate. The evaluation value indicates the likelihood of the candidate, that is, how probable it is that the candidate is correct. A candidate is correct when it matches the phrase the speaker intended. In the following, a higher evaluation value means a higher probability of being correct. In that case, the speech recognition result obtained by the speech recognition service is the concatenation of the candidates with the highest evaluation value for each segment text.

FIG. 2 is a diagram showing an example of the speech recognition data stored in the speech recognition data storage unit 12. The speech recognition data of FIG. 2 was output for the utterance "リコーノキカクダ" (rikō no kikaku da). The speaker is assumed to have intended to say "リコーの企画だ" ("It is Ricoh's plan").

In the example of FIG. 2, the speech recognition service converts the utterance "リコーノキカクダ" into the text "イコウノキカクダ" and divides it into four segment texts: "イコウ", "ノ", "キカク", and "ダ". For "イコウ", the candidates "行こう" ("let's go") and "移行" ("transition") are output. The evaluation value of "行こう" is 0.3 and that of "移行" is 0.1. With the speech recognition data of FIG. 2, the speech recognition result obtained by the speech recognition service is "行こうの規格だ", which contains "規格" ("standard") instead of the intended "企画" ("plan").
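As a rough illustration of the data described above, the speech recognition data can be held as a list of segment texts, each with its scored candidates. The sketch below is only one possible representation. The scores for "行こう" and "移行" come from the text, and 0.4 for "企画" is the value quoted later in the description; every score marked "assumed" is a placeholder chosen so that the baseline result matches the one stated for FIG. 2. The names `recognition_data` and `best_result` are hypothetical.

```python
# Hypothetical in-memory form of the speech recognition data of FIG. 2.
# Each entry is (segment text, {candidate phrase: evaluation value}).
recognition_data = [
    ("イコウ", {"行こう": 0.3, "移行": 0.1}),   # values given in the text
    ("ノ", {"の": 0.9}),                        # assumed value
    ("キカク", {"規格": 0.5, "企画": 0.4}),     # 0.5 is assumed; 0.4 is from the text
    ("ダ", {"だ": 0.9}),                        # assumed value
]


def best_result(data):
    """Concatenate the highest-scoring candidate of each segment text."""
    return "".join(max(cands, key=cands.get) for _, cands in data)


print(best_result(recognition_data))  # prints 行こうの規格だ
```

This reproduces the behaviour attributed to the speech recognition service: without any dictionary data, the highest-scoring candidates are simply joined in order.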

The dictionary data storage unit 13 stores dictionary data prepared by the user. The dictionary data is data for improving the accuracy of the speech recognition result. In this embodiment, sentence data is stored as the dictionary data. The sentence data is preferably related to what the speaker says.

FIG. 3 is a diagram showing an example of the sentence data (dictionary data) stored in the dictionary data storage unit 13. In the example of FIG. 3, three sentences are stored as dictionary data: "リコーのサービスを開発する。" ("Develop Ricoh's services."), "その企画はすでに検討済み。" ("That plan has already been considered."), and "今後の計画を早急に策定する必要がある。" ("A future plan needs to be formulated promptly.").

The search unit 14 matches the speech recognition data against the dictionary data. That is, the search unit 14 searches for portions where the speech recognition data and the dictionary data match. The search method used by the search unit 14 is described in detail later.

The creation unit 15 creates a speech recognition result based on the speech recognition data and the search result from the search unit 14. The method by which the creation unit 15 creates the speech recognition result is described in detail later.

Next, the hardware configuration of the creation device 1 is described. FIG. 4 is a diagram showing an example of the hardware configuration of the creation device 1. The creation device 1 of FIG. 4 includes a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, and an HDD (Hard Disk Drive) 104. The creation device 1 also includes an input device 105, a display device 106, a communication interface 107, and a bus 108.

The CPU 101 executes a program to control the entire creation device 1 and to realize each of the functional components described above. The ROM 102 stores various data, including programs executed by the CPU 101. The RAM 103 provides a work area for the CPU 101. The HDD 104 stores various data, including programs executed by the CPU 101. The input device 105 inputs information corresponding to user operations into the creation device 1, and is, for example, a keyboard, a mouse, or a touch panel. The display device 106 displays video and images, and is, for example, a liquid crystal display or an organic EL (Electro Luminescence) display. The communication interface 107 connects the creation device 1 to an external network. The bus 108 connects the CPU 101, the ROM 102, the RAM 103, the HDD 104, the input device 105, the display device 106, and the communication interface 107.

Next, the operation of the creation device 1 according to this embodiment is described. FIG. 5 is a flowchart showing an outline of the operation of the creation device 1. The creation device 1 starts the operation of FIG. 5 when speech recognition data is input to the input unit 11.

First, the input unit 11 stores the input speech recognition data in the speech recognition data storage unit 12 (step S101). As a result, speech recognition data such as that shown in FIG. 2 is stored in the speech recognition data storage unit 12.

Next, the search unit 14 matches the speech recognition data against the dictionary data. The creation unit 15 then creates a speech recognition result based on the search result from the search unit 14 and the speech recognition data (step S102).

FIG. 6 is a flowchart showing an example of the speech recognition result creation process in this embodiment. The flowchart of FIG. 6 corresponds to the internal processing of step S102 in FIG. 5. In the following, it is assumed that the speech recognition data of FIG. 2 is stored in the speech recognition data storage unit 12 and the sentence data of FIG. 3 is stored in the dictionary data storage unit 13.

First, the search unit 14 selects one segment text from the segment texts included in the speech recognition data (step S201). Here, "イコウ" is assumed to be selected.

Next, the search unit 14 selects one candidate from the candidate phrases of the selected segment text (step S202). Here, "行こう" is assumed to be selected.

The search unit 14 then searches the sentence data (dictionary data) using the selected candidate as the search key (step S203). The search unit 14 outputs, as the search result, the number of matches between the search key (the selected candidate) and the sentence data. Because no part of the sentence data of FIG. 3 matches "行こう", the search result is 0 matches.

When the search unit 14 outputs the search result, the creation unit 15 updates the evaluation value of the currently selected candidate based on that result (step S204). In this embodiment, the creation unit 15 updates the evaluation value so that it becomes higher as the number of matches with the sentence data increases. Any update method may be used. In the following, the evaluation value is updated by adding the number of matches to the original evaluation value. In this case, "行こう" has an original evaluation value of 0.3 and 0 matches, so its updated evaluation value is 0.3 (= 0.3 + 0).

When the search is finished, the search unit 14 checks whether all candidates of the segment text selected in step S201 have been selected, that is, whether any candidate remains unselected (step S205). If an unselected candidate remains (NO in step S205), the process returns to step S202, and the search unit 14 selects the next candidate from the unselected candidates (step S202). As a result, "移行" is selected.

If all candidates have been selected (YES in step S205), the search unit 14 checks whether all segment texts included in the speech recognition data have been selected, that is, whether any segment text remains unselected (step S206).

If an unselected segment text remains (NO in step S206), the process returns to step S201, and the search unit 14 selects the next segment text from the unselected segment texts (step S201). As a result, "ノ" is selected.

If all segment texts have been selected (YES in step S206), the creation unit 15 selects, for each segment text, the candidate with the highest evaluation value and concatenates the selected candidates (step S207). The speech recognition result is thereby created.

FIG. 7 is a diagram showing an example of the speech recognition data stored in the speech recognition data storage unit 12. The speech recognition data of FIG. 7 corresponds to the speech recognition data of FIG. 2 with its evaluation values updated. With the speech recognition data of FIG. 7, "行こう" is selected as the candidate for "イコウ", "の" for "ノ", "企画" for "キカク", and "だ" for "ダ". As a result, the speech recognition result "行こうの企画だ" is created.

Compared with the result "行こうの規格だ" obtained by the speech recognition service, the result "行こうの企画だ" created by the creation unit 15 is closer to the intended utterance "リコーの企画だ". This is because updating the evaluation values based on the sentence data made the evaluation value of "企画" ("plan") higher than that of "規格" ("standard"), so "企画" was selected as the phrase corresponding to the segment text "キカク".
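The flow of FIG. 6 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, the scores marked "assumed" are not given in the text, and "number of matches" is read here as the count of non-overlapping substring occurrences across all sentences.

```python
# Sentence data (dictionary data) of FIG. 3.
sentences = [
    "リコーのサービスを開発する。",
    "その企画はすでに検討済み。",
    "今後の計画を早急に策定する必要がある。",
]

# Speech recognition data of FIG. 2 (scores marked "assumed" are placeholders).
recognition_data = [
    ("イコウ", {"行こう": 0.3, "移行": 0.1}),
    ("ノ", {"の": 0.9}),                        # assumed value
    ("キカク", {"規格": 0.5, "企画": 0.4}),     # 0.5 is assumed
    ("ダ", {"だ": 0.9}),                        # assumed value
]


def update_scores(data, sentences):
    """Steps S201 to S206: add each candidate's match count to its score."""
    updated = []
    for segment, candidates in data:
        rescored = {}
        for candidate, score in candidates.items():
            # Assumed reading of "number of matches": substring occurrences.
            matches = sum(s.count(candidate) for s in sentences)
            rescored[candidate] = score + matches
        updated.append((segment, rescored))
    return updated


def create_result(data):
    """Step S207: join the highest-scoring candidate of each segment text."""
    return "".join(max(cands, key=cands.get) for _, cands in data)


updated = update_scores(recognition_data, sentences)
print(create_result(updated))  # prints 行こうの企画だ
```

Adding the raw match count lets a single dictionary hit outweigh any original score below 1, which is why "企画" overtakes "規格" here. The description notes that other update methods may be used.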

Thus, according to this embodiment, the accuracy of the speech recognition result can be improved based on the speech recognition data output by the speech recognition service and the sentence data (dictionary data) prepared by the user. In other words, this embodiment can create a speech recognition result that is more accurate than the one produced by the speech recognition service.

In addition, because the dictionary data in this embodiment is sentence data, it is easy to prepare. That is, the dictionary data can be prepared without analyzing or processing the data.

In this embodiment, a segment text that has only one candidate phrase may be skipped during selection. This is because, when there is only one candidate, that candidate is selected regardless of whether its evaluation value has been updated.

(Second embodiment)
The creation device 1 according to the second embodiment will be described with reference to FIGS. 8 and 9. The functional configuration and hardware configuration of the creation device 1 according to this embodiment are the same as in the first embodiment. In this embodiment, however, both sentence data and homophone data are stored as dictionary data, and the homophone data is used in creating the speech recognition result. The sentence data is as described above.

The homophone data is data indicating sets of homophones (hereinafter referred to as "homophone sets"). The homophone data includes one or more homophone sets, and each homophone set includes a plurality of homophones.

FIG. 8 is a diagram showing an example of the homophone data (dictionary data) stored in the dictionary data storage unit 13. In the example of FIG. 8, the homophone data includes three homophone sets, corresponding to "キカク", "ハシ", and "ジショ". For example, the homophone set corresponding to "キカク" includes three homophones: "企画" ("plan"), "規格" ("standard"), and "其角" (Kikaku, a proper name).

The operation of the creation device 1 according to this embodiment is now described. The outline of the operation is the same as in the first embodiment. FIG. 9 is a flowchart showing an example of the speech recognition result creation process in this embodiment. In the following, it is assumed that the speech recognition data of FIG. 2 is stored in the speech recognition data storage unit 12, and that the sentence data of FIG. 3 and the homophone data of FIG. 8 are stored in the dictionary data storage unit 13.

First, the search unit 14 selects one segment text from the segment texts included in the speech recognition data (step S301). Here, "キカク" is assumed to be selected.

Next, the search unit 14 selects one candidate from the candidate phrases of the selected segment text (step S302). Here, "企画" is assumed to be selected.

The search unit 14 then searches the homophone data (dictionary data) using the selected candidate as the search key (step S303). The search unit 14 outputs, as the search result, the homophone set that includes the search key (the selected candidate).

If no homophone set includes the search key (NO in step S304), the process proceeds to step S307.

If a homophone set includes the search key (YES in step S304), the search unit 14 searches the sentence data (dictionary data) using that homophone (the selected candidate) as the search key (step S305). The search unit 14 outputs, as the search result, the number of matches between the search key and the sentence data. Because one part of the sentence data of FIG. 3 matches "企画", the search result is 1 match.

When the search unit 14 outputs the search result, the creation unit 15 updates the evaluation value of the currently selected candidate based on that result (step S306). The evaluation value is updated in the same way as in the first embodiment. In this case, "企画" has an original evaluation value of 0.4 and 1 match, so its updated evaluation value is 1.4 (= 0.4 + 1).

When the search is finished, the search unit 14 checks whether all candidates of the segment text selected in step S301 have been selected, that is, whether any candidate remains unselected (step S307). If an unselected candidate remains (NO in step S307), the process returns to step S302, and the search unit 14 selects the next candidate from the unselected candidates (step S302). As a result, "規格" is selected.

If all candidates have been selected (YES in step S307), the search unit 14 checks whether all segment texts included in the speech recognition data have been selected, that is, whether any segment text remains unselected (step S308).

If an unselected segment text remains (NO in step S308), the process returns to step S301, and the search unit 14 selects the next segment text from the unselected segment texts (step S301). As a result, "ダ" is selected.

If all segment texts have been selected (YES in step S308), the creation unit 15 selects, for each segment text, the candidate with the highest evaluation value and concatenates the selected candidates (step S309). The speech recognition result is thereby created. When the sentence data of FIG. 3 and the homophone data of FIG. 8 are used, the speech recognition data after the evaluation values are updated is the same as the speech recognition data of FIG. 7. As a result, the speech recognition result "行こうの企画だ" is created.

As described above, according to this embodiment, the evaluation value of a segment text candidate is updated when that candidate is included in a homophone set. Conversely, when the candidate is not included in any homophone set, its evaluation value is not updated. This suppresses excessive updating of evaluation values and makes it possible to create a highly accurate speech recognition result.
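The homophone gate of FIG. 9 can be sketched as a small change to the score update: the sentence data is searched only for candidates that belong to some homophone set. The sketch below is illustrative only. The function names are hypothetical, the scores marked "assumed" are not given in the text, and only the "キカク" homophone set is included because the contents of the "ハシ" and "ジショ" sets are not listed in the description.

```python
sentences = [
    "リコーのサービスを開発する。",
    "その企画はすでに検討済み。",
    "今後の計画を早急に策定する必要がある。",
]

recognition_data = [
    ("イコウ", {"行こう": 0.3, "移行": 0.1}),
    ("ノ", {"の": 0.9}),                        # assumed value
    ("キカク", {"規格": 0.5, "企画": 0.4}),     # 0.5 is assumed
    ("ダ", {"だ": 0.9}),                        # assumed value
]

# Homophone set for "キカク" from FIG. 8; the other two sets are omitted.
homophone_sets = [{"企画", "規格", "其角"}]


def update_scores_gated(data, sentences, homophone_sets):
    """Steps S301 to S308: rescore only candidates found in a homophone set."""
    updated = []
    for segment, candidates in data:
        rescored = {}
        for candidate, score in candidates.items():
            if any(candidate in hs for hs in homophone_sets):      # S303, S304
                matches = sum(s.count(candidate) for s in sentences)  # S305
                score = score + matches                            # S306
            rescored[candidate] = score
        updated.append((segment, rescored))
    return updated


def create_result(data):
    """Step S309: join the highest-scoring candidate of each segment text."""
    return "".join(max(cands, key=cands.get) for _, cands in data)


gated = update_scores_gated(recognition_data, sentences, homophone_sets)
print(create_result(gated))  # prints 行こうの企画だ
```

In this sketch the score of "の" stays at its original value even though "の" occurs in the sentence data, which illustrates how the gate limits updates to candidates that are known to be confusable.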

(Third embodiment)
The creation device 1 according to the third embodiment will be described with reference to FIGS. 10 and 11. The functional configuration and hardware configuration of the creation device 1 according to this embodiment are the same as in the first embodiment. In this embodiment, however, replacement data is stored as dictionary data, and the speech recognition result is created using this replacement data.

The replacement data is data indicating the correspondence between a replacing phrase (hereinafter referred to as the "first phrase") and a phrase to be replaced (hereinafter referred to as the "second phrase"). The replacement data is prepared based on, for example, misconversions found in past speech recognition results obtained from the speech recognition service.

FIG. 10 is a diagram showing an example of the replacement data (dictionary data) stored in the dictionary data storage unit 13. In the example of FIG. 10, the first phrase "リコー" ("Ricoh") is associated with the second phrases "行こう" ("let's go") and "移行" ("transition"). The first phrase "トピックモデル" ("topic model") is associated with the second phrase "いつも出る" ("always comes out").

The operation of the creation device 1 according to this embodiment is now described. The outline of the operation is the same as in the first embodiment. FIG. 11 is a flowchart showing an example of the speech recognition result creation process in this embodiment. The flowchart of FIG. 11 corresponds to the internal processing of step S102 in FIG. 5. In the following, it is assumed that the speech recognition data of FIG. 2 is stored in the speech recognition data storage unit 12 and the replacement data of FIG. 10 is stored in the dictionary data storage unit 13.

First, the creation unit 15 refers to the speech recognition data, selects the candidate with the highest evaluation value for each segment text, and concatenates the selected candidates to create a speech recognition result (step S401). This result corresponds to the speech recognition result obtained by the speech recognition service. That is, the speech recognition result "行こうの規格だ" is created.

Next, the search unit 14 selects one first phrase from the first phrases included in the replacement data (step S402). Here, "リコー" is assumed to be selected.

The search unit 14 then selects one second phrase from the second phrases corresponding to the selected first phrase (step S403). Here, "行こう" is assumed to be selected.

The search unit 14 searches the speech recognition result created by the creation unit 15 using the selected second phrase as the search key (step S404). The search unit 14 outputs, as the search result, the portion of the speech recognition result that matches the search key (the selected second phrase).

検索部１４が検索結果を出力すると、作成部１５は、検索結果に基づいて、音声認識結果を更新する。具体的には、音声認識結果に第２語句と一致する部分がある場合（ステップＳ４０５のＹＥＳ）、すなわち、音声認識結果に第２語句が含まれる場合、作成部１５は、第２語句を対応する第１語句に置換する（ステップＳ４０６）。これにより、音声認識結果に含まれる「行こう」が「リコー」に置換される。その後、処理はステップＳ４０７に進む。 When the search unit 14 outputs the search result, the creation unit 15 updates the voice recognition result based on it. Specifically, if the voice recognition result has a portion that matches the second phrase (YES in step S405), that is, if the voice recognition result contains the second phrase, the creation unit 15 replaces the second phrase with the corresponding first phrase (step S406). As a result, 「行こう」 ("let's go") in the voice recognition result is replaced with 「リコー」 ("Ricoh"). The process then proceeds to step S407.

一方、音声認識結果に第２語句と一致する部分がない場合（ステップＳ４０５のＮＯ）、すなわち、音声認識結果に第２語句が含まれない場合、処理はステップＳ４０７に進む。 On the other hand, if the voice recognition result does not have a portion that matches the second word (NO in step S405), that is, if the voice recognition result does not include the second word, the process proceeds to step S407.

検索部１４は、検索が終了すると、ステップＳ４０２で選択した第１語句に対応する全第２語句が選択されたか（未選択の第２語句があるか）を確認する（ステップＳ４０７）。未選択の第２語句がある場合（ステップＳ４０７のＮＯ）、処理はステップＳ４０３に戻る。そして、検索部１４は、未選択の第２語句の中から次の第２語句を選択する（ステップＳ４０３）。これにより、「移行」が選択される。 Upon completion of the search, the search unit 14 confirms whether all the second phrases corresponding to the first phrase selected in step S402 have been selected (whether there is any unselected second phrase) (step S407). If there is an unselected second word/phrase (NO in step S407), the process returns to step S403. Then, the search unit 14 selects the next second phrase from the unselected second phrases (step S403). As a result, "migration" is selected.

一方、全第２語句が選択された場合（ステップＳ４０７のＹＥＳ）、検索部１４は、置換用データに含まれる全第１語句が選択されたか（未選択の第１語句があるか）を確認する（ステップＳ４０８）。 On the other hand, if all the second phrases have been selected (YES in step S407), the search unit 14 checks whether all the first phrases included in the replacement data have been selected, that is, whether any first phrase remains unselected (step S408).

未選択の第１語句がある場合（ステップＳ４０８のＮＯ）、処理はステップＳ４０２に戻る。そして、検索部１４は、未選択の第１語句の中から、次の第１語句を選択する（ステップＳ４０２）。これにより、「トピックモデル」が選択される。 If there is an unselected first word/phrase (NO in step S408), the process returns to step S402. Then, the search unit 14 selects the next first phrase from the unselected first phrases (step S402). As a result, the "topic model" is selected.

一方、全第１語句が選択された場合（ステップＳ４０８のＹＥＳ）、音声認識結果の作成処理が終了する。この時点で作成部１５が保持している音声認識結果が、作成装置１により得られた音声認識結果となる。結果として、「リコーの規格だ」という音声認識結果が作成される。 On the other hand, if all the first words have been selected (YES in step S408), the voice recognition result creation process ends. The voice recognition result held by the creating unit 15 at this point becomes the voice recognition result obtained by the creating device 1. As a result, a voice recognition result "Ricoh's standard" is created.

作成部１５が作成した「リコーの規格だ」という音声認識結果は、音声認識サービスにより得られた「行こうの規格だ」という音声認識結果に比べて、発言者が意図した「リコーの企画だ」という発言に近くなっていることがわかる。これは、置換用データに基づいて語句を置換したことにより、「行こう」が「リコー」に置換されたためである。 The voice recognition result 「リコーの規格だ」 ("It is Ricoh's standard") created by the creation unit 15 is closer to the speaker's intended utterance 「リコーの企画だ」 ("It is Ricoh's plan") than the result 「行こうの規格だ」 obtained by the voice recognition service. This is because replacing phrases based on the replacement data changed 「行こう」 ("let's go") to 「リコー」 ("Ricoh").
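The loop of steps S402 to S408 can be sketched as follows. This is a minimal illustration and not the implementation of the creation device 1. The text names only 「行こう」 and 「移行」 as second phrases for 「リコー」, so the second phrase listed for 「トピックモデル」 is invented, and the names `replacement_data` and `apply_replacements` are hypothetical.

```python
# Hypothetical replacement data: first phrase (the replacement) mapped to its
# second phrases (the phrases to be replaced). The entry for "トピックモデル"
# is an invented placeholder; FIG. 10 is not reproduced in this section.
replacement_data = {
    "リコー": ["行こう", "移行"],
    "トピックモデル": ["トピック戻る"],
}


def apply_replacements(result, data):
    """Replace every second phrase found in the result with its first phrase."""
    for first, seconds in data.items():      # S402, repeated until S408 is YES
        for second in seconds:               # S403, repeated until S407 is YES
            if second in result:             # S404 and S405: search for the second phrase
                result = result.replace(second, first)  # S406
    return result


print(apply_replacements("行こうの規格だ", replacement_data))  # リコーの規格だ
```

Note that `str.replace` substitutes every occurrence of the second phrase, which is one possible reading of "the matching portion" in step S406.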

このように、本実施形態によれば、音声認識サービスが出力した音声認識データと、ユーザが用意した置換用データ（辞書データ）と、に基づいて、音声認識結果の精度を向上させることができる。言い換えると、本実施形態によれば、音声認識サービスに比べて、精度の高い音声認識結果を作成することができる。 As described above, according to the present embodiment, the accuracy of the voice recognition result can be improved based on the voice recognition data output by the voice recognition service and the replacement data (dictionary data) prepared by the user. In other words, the present embodiment can create a more accurate voice recognition result than the voice recognition service does.

また、本実施形態によれば、分節テキストを、音声認識データに語句の候補として含まれない語句に、変換することができる。 Further, according to this embodiment, the segment text can be converted into a phrase that is not included in the speech recognition data as a phrase candidate.

（第４実施形態） (Fourth Embodiment)
第４実施形態に係る作成装置１について、図１２を参照して説明する。本実施形態に係る作成装置１の機能構成及びハードウェア構成は、第１実施形態と同様である。ただし、本実施形態では、辞書データとして、文章データ及び置換用データが記憶され、この文章データ及び置換用データを利用して、音声認識結果が作成される。なお、文章データ及び置換用データについては、上述の通りである。 The creation device 1 according to the fourth embodiment will be described with reference to FIG. 12. The functional configuration and hardware configuration of the creation device 1 according to the present embodiment are the same as in the first embodiment. In the present embodiment, however, both text data and replacement data are stored as the dictionary data, and the voice recognition result is created using both. The text data and the replacement data are as described above.

ここで、本実施形態に係る作成装置１の動作について説明する。本実施形態に係る作成装置１の動作の概要は、第１実施形態と同様である。図１２は、本実施形態における音声認識結果の作成処理の一例を示すフローチャートである。図１２のフローチャートは、図１１のフローチャートに、ステップＳ４０９，Ｓ４１０を追加したものである。以下、ステップＳ４０９，Ｓ４１０について説明する。なお、音声認識データ記憶部１２に図２の音声認識データが記憶され、辞書データ記憶部１３に図３の文章データ及び図１０の置換用データが記憶されているものとする。 The operation of the creation device 1 according to the present embodiment will now be described. The outline of the operation is the same as in the first embodiment. FIG. 12 is a flowchart showing an example of the voice recognition result creation process in the present embodiment. The flowchart of FIG. 12 is the flowchart of FIG. 11 with steps S409 and S410 added. Steps S409 and S410 are described below. It is assumed that the voice recognition data storage unit 12 stores the voice recognition data of FIG. 2 and that the dictionary data storage unit 13 stores the text data of FIG. 3 and the replacement data of FIG. 10.

本実施形態では、音声認識結果に第２語句と一致する部分がある場合（ステップＳ４０５のＹＥＳ）、検索部１４は、第１近傍文字列及び第２近傍文字列を検索キーとして、文章データを検索する（ステップＳ４０９）。 In the present embodiment, if the voice recognition result has a portion that matches the second phrase (YES in step S405), the search unit 14 searches the text data using the first neighborhood character string and the second neighborhood character string as search keys (step S409).

第２近傍文字列は、音声認識結果における、第２語句及びその近傍文字列からなる文字列である。第２語句の近傍文字列とは、例えば、第２語句の直前又は直後の数文字の文字列のことである。第１近傍文字列は、第２近傍文字列に含まれる第２語句を、対応する第１語句に置換した文字列である。ここでは、第２近傍文字列は、第２語句及びその直後の１文字からなる文字列であるものとする。 The second neighboring character string is a character string including the second word and its neighboring character string in the voice recognition result. The neighborhood character string of the second word is, for example, a character string of several characters immediately before or after the second word. The first neighbor character string is a character string in which the second word/phrase included in the second neighbor character string is replaced with the corresponding first word/phrase. Here, the second neighborhood character string is assumed to be a character string consisting of the second word and one character immediately after it.

例えば、第１語句が「リコー」であり、第２語句が「行こう」であり、音声認識結果が「行こうの規格だ」である場合、第２近傍文字列は「行こうの」となり、第１近傍文字列は「リコーの」となる。 For example, if the first phrase is 「リコー」, the second phrase is 「行こう」, and the voice recognition result is 「行こうの規格だ」, then the second neighborhood character string is 「行こうの」 and the first neighborhood character string is 「リコーの」.
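The construction of the two neighborhood character strings can be sketched in Python as follows. The function name `neighborhood_strings` and the parameter `after` (the number of characters taken immediately after the second phrase, 1 in the example above) are assumptions introduced for illustration.

```python
def neighborhood_strings(result, first, second, after=1):
    """Return (first neighborhood, second neighborhood), or None if `second` is absent.

    The second neighborhood is the second phrase plus the `after` characters that
    follow it in the voice recognition result; the first neighborhood is the same
    string with the second phrase replaced by the first phrase.
    """
    pos = result.find(second)
    if pos < 0:
        return None
    end = pos + len(second)
    tail = result[end:end + after]
    return first + tail, second + tail


print(neighborhood_strings("行こうの規格だ", "リコー", "行こう"))
# ('リコーの', '行こうの')
```

Only the first occurrence of the second phrase is handled here; how multiple occurrences are treated is not stated in the text.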

検索部１４は、検索キー（第１近傍文字列）と文章データとの一致件数Ｒ１を、検索結果として出力する。図３の文章データには「リコーの」と一致する部分が１つあるため、検索結果として１件が出力される。 The search unit 14 outputs, as a search result, the number of matches R1 between the search key (the first neighborhood character string) and the text data. Because the text data of FIG. 3 contains one portion that matches 「リコーの」, a count of 1 is output as the search result.

また、検索部１４は、検索キー（第２近傍文字列）と文章データとの一致件数Ｒ２を、検索結果として出力する。図３の文章データには「行こうの」と一致する部分がないため、検索結果として０件が出力される。 The search unit 14 also outputs, as a search result, the number of matches R2 between the search key (the second neighborhood character string) and the text data. Because the text data of FIG. 3 contains no portion that matches 「行こうの」, a count of 0 is output as the search result.

作成部１５は、一致件数Ｒ１が一致件数Ｒ２より大きい場合（ステップＳ４１０のＹＥＳ）、第２語句を対応する第１語句に置換する（ステップＳ４０６）。その後、処理はステップＳ４０７に進む。一方、一致件数Ｒ１が一致件数Ｒ２以下である場合（ステップＳ４１０のＮＯ）、処理はステップＳ４０７に進む。本実施形態では、一致件数Ｒ１が一致件数Ｒ２より大きいため、「行こう」が「リコー」に置換され、第３実施形態と同様に、「リコーの規格だ」という音声認識結果が作成される。 If the number of matches R1 is greater than the number of matches R2 (YES in step S410), the creation unit 15 replaces the second phrase with the corresponding first phrase (step S406). The process then proceeds to step S407. If, on the other hand, R1 is less than or equal to R2 (NO in step S410), the process proceeds to step S407 without replacement. In this example, R1 (1) is greater than R2 (0), so 「行こう」 is replaced with 「リコー」 and, as in the third embodiment, the voice recognition result 「リコーの規格だ」 is created.

以上説明した通り、本実施形態によれば、文章データに対する、第１近傍文字列の一致件数Ｒ１が、第２近傍文字列の一致件数Ｒ２より大きい場合、第２語句が第１語句に置換される。言い換えると、文章データに対する、第１近傍文字列の一致件数Ｒ１が、第２近傍文字列の一致件数Ｒ２以下である場合、第２語句が第１語句に置換されない。これにより、作成装置１は、第２語句の過剰な置換を抑制し、精度が高い音声認識結果を作成することができる。 As described above, according to the present embodiment, the second phrase is replaced with the first phrase when the number of matches R1 of the first neighborhood character string in the text data is greater than the number of matches R2 of the second neighborhood character string. In other words, when R1 is less than or equal to R2, the second phrase is not replaced with the first phrase. The creation device 1 can thereby suppress excessive replacement of the second phrase and create a highly accurate voice recognition result.
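The decision of steps S409 and S410 can be sketched as follows. The text data of FIG. 3 is not reproduced in this section, so the single sentence below is a hypothetical stand-in that contains 「リコーの」 once. Representing the text data as a list of strings and the name `should_replace` are likewise assumptions.

```python
def should_replace(text_data, first_neighborhood, second_neighborhood):
    """Steps S409 and S410: replace only if R1 is strictly greater than R2."""
    r1 = sum(s.count(first_neighborhood) for s in text_data)   # matches R1
    r2 = sum(s.count(second_neighborhood) for s in text_data)  # matches R2
    return r1 > r2


# Hypothetical text data standing in for FIG. 3.
text_data = ["リコーの新製品が発表された。"]

print(should_replace(text_data, "リコーの", "行こうの"))  # True
```

Because the comparison is strict, a tie (including the case where neither string appears in the text data) leaves the second phrase unchanged, which is how excessive replacement is suppressed.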

なお、上記実施形態に挙げた構成等に、その他の要素との組み合わせなど、ここで示した構成に本発明が限定されるものではない。これらの点に関しては、本発明の趣旨を逸脱しない範囲で変更することが可能であり、その応用形態に応じて適切に定めることができる。 Note that the present invention is not limited to the configurations shown here, including combinations of the configurations described in the above embodiments with other elements. These points can be changed without departing from the spirit of the present invention and can be determined appropriately according to the form of application.

１：音声認識結果作成装置 1: Voice recognition result creation device
１１：入力部 11: Input unit
１２：音声認識データ記憶部 12: Voice recognition data storage unit
１３：辞書データ記憶部 13: Dictionary data storage unit
１４：検索部 14: Search unit
１５：作成部 15: Creation unit

特開２００４−３３３７０３号公報 JP 2004-333703 A

Claims

A voice recognition result creation device comprising:
an input unit that receives voice recognition data including candidate phrases corresponding to each segment text and an evaluation value of each candidate;
a dictionary data storage unit that stores dictionary data prepared in advance by a user;
a search unit that searches for matching portions between the dictionary data and the voice recognition data; and
a creation unit that creates a voice recognition result based on the search result and the voice recognition data.

The voice recognition result creation device according to claim 1, wherein the creation unit creates the voice recognition result by selecting, for each segment text, a candidate based on the evaluation values and concatenating the selected candidates.

The voice recognition result creation device according to claim 1, wherein
the dictionary data includes text data,
the search unit searches the text data using the candidate as a search key, and the creation unit updates the evaluation value of the candidate based on the number of matches between the candidate and the text data.

The voice recognition result creation device according to claim 3, wherein the dictionary data includes homophone data including at least one homophone set made up of a plurality of homophones, and
the search unit searches the homophone data using the candidate as a search key and, if a homophone set including the candidate exists, searches the text data using the candidate as a search key.

The voice recognition result creation device according to claim 1, wherein
the dictionary data includes replacement data indicating a correspondence between a first phrase that serves as the replacement and a second phrase that is to be replaced,
the creation unit creates the voice recognition result by selecting, for each segment text, a candidate based on the evaluation values and concatenating the selected candidates,
the search unit searches the voice recognition result using the second phrase as a search key, and, when the voice recognition result contains the second phrase, the creation unit replaces the second phrase with the corresponding first phrase.

The voice recognition result creation device according to claim 5, wherein
the dictionary data includes text data,
the search unit searches the text data using, as search keys, a first neighborhood character string including the first phrase and a second neighborhood character string including the second phrase, and, when the number of matches between the first neighborhood character string and the text data is greater than the number of matches between the second neighborhood character string and the text data, the creation unit replaces the second phrase with the corresponding first phrase.

A voice recognition result creation method comprising:
an input step of receiving voice recognition data including candidate phrases corresponding to each segment text and an evaluation value of each candidate;
a search step of searching for matching portions between dictionary data prepared in advance by a user and the voice recognition data; and
a creation step of creating a voice recognition result based on the search result and the voice recognition data.

A program that causes a computer to execute:
an input step of receiving voice recognition data including candidate phrases corresponding to each segment text and an evaluation value of each candidate;
a search step of searching for matching portions between dictionary data prepared in advance by a user and the voice recognition data; and
a creation step of creating a voice recognition result based on the search result and the voice recognition data.