JP2017134162A

JP2017134162A - Voice recognition device, voice recognition method, and voice recognition program

Info

Publication number: JP2017134162A
Application number: JP2016012466A
Authority: JP
Inventors: Hideki Takahashi (高橋 英樹)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-01-26
Filing date: 2016-01-26
Publication date: 2017-08-03

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device capable of improving the accuracy of voice recognition without increasing the processing load on an external device that executes the voice recognition.

SOLUTION: A voice recognition device 10 includes: a recording unit 13 that records voice data received from a terminal that inputs voice, in association with identification information relating to the terminal; an acquisition unit 14 that acquires, from an external device, text data that is the result of voice recognition of the voice data; and a correction unit 15 that corrects a character string included in the text data using a character-string correction dictionary corresponding to the identification information.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a voice recognition device, a voice recognition method, and a voice recognition program.

Conventionally, a technique is known that converts call voice and the like into text using a voice recognition engine installed on a voice recognition server, so that a call made at a terminal can be quickly reviewed afterwards.

In a voice recognition engine, recognition accuracy drops for a speaker whose utterances deviate from the feature points. For example, when a person with poor articulation says 「聞き逃し発生」 (kikinogashi hassei, "a missed remark occurred"), the voice may be misrecognized as 「機能足発生」 (kinou ashi hassei, literally "function foot occurred"). For this reason, a technique is known that improves recognition accuracy by having the voice recognition engine learn the characteristics of the speaker's voice (see, for example, Patent Document 1).

Patent Document 1: JP 2010-175967 A

However, for a voice recognition engine to learn the voice characteristics of a specific speaker, it may be necessary to input several tens of hours of that speaker's voice. In addition, performing voice recognition based on the voice characteristics of a specific speaker increases the processing load on the voice recognition server.

Therefore, an object of one aspect is to improve the accuracy of voice recognition without increasing the processing load on an external device that executes the voice recognition.

In one proposal, a voice recognition device includes: a recording unit that records voice data received from a terminal that inputs voice, in association with identification information relating to the terminal; an acquisition unit that acquires, from an external device, text data that is the result of voice recognition of the voice data; and a correction unit that corrects a character string included in the text data using a character-string correction dictionary corresponding to the identification information.

According to one aspect, the accuracy of voice recognition can be improved without increasing the processing load on an external device that executes the voice recognition.

FIG. 1 is a diagram showing a configuration example of the voice recognition system in the embodiment.
FIG. 2 is a diagram showing a hardware configuration example of the voice recognition device in the embodiment.
FIG. 3 is a functional block diagram of the voice recognition device in the embodiment.
FIG. 4 is a diagram showing an example of the data stored in the recorded data storage unit.
FIG. 5 is a diagram showing an example of the data stored in the voice recognition result storage unit.
FIG. 6 is a diagram showing an example of the data stored in the correction dictionary storage unit.
FIG. 7 is a sequence diagram of the process of recognizing recorded call voice.
FIG. 8 is a flowchart showing an example of the correction process for a voice recognition result.
FIG. 9 is a diagram explaining a specific example of the correction process for a voice recognition result.
FIG. 10 is a flowchart showing an example of the registration process for the correction dictionary.
FIG. 11 is a diagram showing a specific example of the registration process for the correction dictionary.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a diagram showing a configuration example of a voice recognition system in an embodiment of the present invention. In FIG. 1, the voice recognition system 1 includes a voice recognition device 10, a telephone 20, and a voice recognition server 30 (an example of an "external device").

The voice recognition device 10 and the telephone 20 are communicably connected via a communication line such as a public telephone network, a mobile telephone network, or the Internet.

The voice recognition device 10 and the voice recognition server 30 are communicably connected via a communication line such as the Internet.

The voice recognition device 10 is, for example, a smartphone, a tablet terminal, a mobile phone, or a PC (Personal Computer).

The voice recognition device 10 records voice such as a call received from the telephone 20 and has the voice recognition server 30 perform voice recognition on the recorded voice. The voice recognition device 10 then corrects the text data of the voice recognition result based on a dictionary corresponding to the other party of the call. Note that the voice recognition device 10 may be the same kind of terminal as the telephone 20.

The telephone 20 is, for example, a smartphone, a mobile phone, a fixed-line phone, an IP phone, or a PC (Personal Computer). The telephone 20 transmits (inputs) voice to the voice recognition device 10 in the form of call voice, answering-machine recordings, voice messages, and the like.

The voice recognition server 30 performs voice recognition on the voice data received from the voice recognition device 10 and transmits the text data of the voice recognition result to the voice recognition device 10. Note that the voice recognition processing in the voice recognition server 30 may be performed using a known technique.

FIG. 2 is a diagram showing a hardware configuration example of the voice recognition device 10 in the embodiment. The voice recognition device 10 in FIG. 2 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU 104, an interface device 105, a display device 106, an input device 107, and the like, which are connected to one another via a bus B.
A voice recognition program that realizes the processing in the voice recognition device 10 is provided on a recording medium 101 such as an SD memory card. When the recording medium 101 on which the voice recognition program is recorded is set in the drive device 100, the voice recognition program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. However, the voice recognition program does not have to be installed from the recording medium 101 and may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed voice recognition program as well as necessary files, data, and the like.
The memory device 103 reads the program from the auxiliary storage device 102 and stores it when an instruction to start the program is given. The CPU 104 realizes the functions of the voice recognition device 10 in accordance with the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network. The display device 106 displays a GUI (Graphical User Interface) or the like produced by the program. The input device 107 consists of a touch panel and buttons, or a keyboard and a mouse, and is used to input various operation instructions.

Examples of the recording medium 101 include portable recording media such as an SD memory card, a CD-ROM, a DVD disc, or a USB memory. Examples of the auxiliary storage device 102 include an HDD (Hard Disk Drive) or a flash memory. Both the recording medium 101 and the auxiliary storage device 102 correspond to computer-readable recording media.

The hardware configuration of the voice recognition server 30 is that of a server computer and may be the same as the hardware configuration example of the voice recognition device 10 shown in FIG. 2.

Next, the functional configuration of the voice recognition device 10 will be described with reference to FIG. 3. FIG. 3 is a functional block diagram of the voice recognition device 10. The voice recognition device 10 includes a communication unit 11, a recording unit 13, an acquisition unit 14, a correction unit 15, and a registration unit 16. Each of these units is realized by processing that one or more programs installed in the voice recognition device 10 cause the CPU 104 of the voice recognition device 10 to execute.

The voice recognition device 10 also includes a recorded data storage unit 12A, a voice recognition result storage unit 12B, a correction dictionary storage unit 12C, and the like. Each of these storage units is realized using, for example, the auxiliary storage device 102.

FIG. 4 is a diagram showing an example of the data stored in the recorded data storage unit 12A. The recorded data storage unit 12A stores the voice data received from a transmission source, such as call voice, in association with the identification information of that transmission source. The identification information of the transmission source is information that identifies the telephone 20 or information that identifies the user of the telephone 20, for example the telephone number of the telephone 20 that transmitted the voice data or the account ID of the user of the telephone 20.

FIG. 5 is a diagram showing an example of the data stored in the voice recognition result storage unit 12B. The voice recognition result storage unit 12B stores the text data that is the voice recognition server 30's recognition result for the voice data, in association with the identification information of the transmission source of the voice data. When a character string included in the text data needs correction, the text data corrected by the correction unit 15 is stored.

FIG. 6 is a diagram showing an example of the data stored in the correction dictionary storage unit 12C. The correction dictionary storage unit 12C stores a correction dictionary for each piece of identification information of a voice transmission source. The correction dictionary includes the data items "character string before correction", "character string after correction", and "in-phrase noun".

The "character string before correction" is a character string that was marked for correction in text data returned as a voice recognition result by the voice recognition server 30. The "character string after correction" is the character string that replaces the "character string before correction". The "in-phrase noun" is a character string that is likely to be used together with the "character string before correction".
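
The layout of the correction dictionary described above can be sketched as a small in-memory structure. This is only an illustrative sketch: the class names `CorrectionEntry` and `CorrectionDictionaryStore`, the method names, and the telephone number are assumptions made for the example and do not appear in the specification, which does not prescribe a storage format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionEntry:
    """One row of a correction dictionary (hypothetical field names)."""
    before: str       # "character string before correction"
    after: str        # "character string after correction"
    phrase_noun: str  # "in-phrase noun" likely to appear with `before`


class CorrectionDictionaryStore:
    """Keeps one correction dictionary per transmission-source identifier."""

    def __init__(self):
        self._by_sender = defaultdict(list)

    def register(self, sender_id, entry):
        # Avoid storing the same row twice for the same sender.
        if entry not in self._by_sender[sender_id]:
            self._by_sender[sender_id].append(entry)

    def lookup(self, sender_id):
        # Unknown senders simply have an empty dictionary.
        return list(self._by_sender.get(sender_id, []))


store = CorrectionDictionaryStore()
# "090-0000-0000" is a made-up telephone number used as identification information.
store.register("090-0000-0000", CorrectionEntry("機能足", "聞き逃し", "発生"))
store.register("090-0000-0000", CorrectionEntry("機能足", "聞き逃し", "発生"))
```

Keying the store by the sender identifier mirrors the statement that a separate dictionary is held for each piece of identification information, so one speaker's habitual misrecognitions do not affect text from other speakers.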

Returning to FIG. 3, the communication unit 11 communicates with the telephone 20 and the voice recognition server 30.

The recording unit 13 records, in the recorded data storage unit 12A, voice data received from the telephone 20 (call voice, answering-machine voice, voice messages, and the like) in association with identification information relating to the telephone 20, such as the telephone number or account ID obtained when the call is placed or received.

The acquisition unit 14 transmits the voice data stored in the recorded data storage unit 12A to the voice recognition server 30, receives from the voice recognition server 30 the text data that is the result of voice recognition of that voice data, and stores it in the voice recognition result storage unit 12B. Note that the voice recognition device 10 may itself be configured to perform voice recognition on the voice data stored in the recorded data storage unit 12A.

The correction unit 15 corrects a character string included in the text data that was acquired by the acquisition unit 14 and stored in the voice recognition result storage unit 12B, using the correction dictionary corresponding to the identification information of the telephone 20 that transmitted the voice. The correction unit 15 corrects the character string by replacing a first character string included in the text data with the second character string registered in the correction dictionary in association with the first character string.

When the text data includes both a "character string before correction" and an "in-phrase noun" registered in the correction dictionary storage unit 12C, the correction unit 15 corrects that "character string before correction".

The correction unit 15 stores the corrected text data in the voice recognition result storage unit 12B, overwriting the text data before correction.
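
The replacement rule of the correction unit 15 can be illustrated with a short sketch. It assumes a simplification that is not in the specification: the presence of the "in-phrase noun" is tested by plain substring matching rather than by morphological analysis. The function name `correct_text` and the tuple layout are hypothetical.

```python
def correct_text(text, entries):
    """Apply dictionary rows of the form (before, after, phrase_noun).

    A row is applied only when both the string before correction and its
    in-phrase noun occur in the text (substring check: a simplification).
    """
    for before, after, phrase_noun in entries:
        if before in text and phrase_noun in text:
            text = text.replace(before, after)
    return text


entries = [("機能足", "聞き逃し", "発生")]

# The in-phrase noun 「発生」 is present, so 「機能足」 is replaced.
corrected = correct_text("先日の打ち合わせで機能足が発生したでしょ。", entries)

# No in-phrase noun here, so the text is left untouched.
untouched = correct_text("機能足を追加する。", entries)
```

Requiring the co-occurring noun is what keeps the rule from rewriting every occurrence of the string before correction, which is the point of storing the "in-phrase noun" item at all.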

When the registration unit 16 accepts from the user an editing operation on any character string included in the text data acquired by the acquisition unit 14, it registers the edited character string, in association with the character string before editing, in the correction dictionary corresponding to the identification information for that text data.

More specifically, upon accepting the editing operation, the registration unit 16 accesses, in the correction dictionary storage unit 12C, the correction dictionary associated with the identification information of the voice transmission source, which is obtained from the call history or the phone book. The registration unit 16 then extracts, from the phrase of the text data, a noun or the like that is a character string different from the character string before editing, and registers the edited character string in the correction dictionary in association with the character string before editing and that other character string. Here, the registration unit 16 registers the character string before editing, the other character string, and the character string after editing in the "character string before correction", "in-phrase noun", and "character string after correction" items of the correction dictionary, respectively.

Next, the processing for recognizing recorded call voice will be described with reference to FIG. 7. FIG. 7 is a sequence diagram of the process of recognizing recorded call voice.

In step S101, the voice recognition device 10 starts a call with the telephone 20.

Next, the telephone 20 transmits call voice to the voice recognition device 10 (step S102).

Next, the voice recognition device 10 records the call voice from the telephone 20 (step S103).

Next, the voice recognition device 10 ends the call with the telephone 20 (step S104).

Next, the voice recognition device 10 transmits the voice data in which the call voice from the telephone 20 was recorded to the voice recognition server 30 (step S105).

Next, the voice recognition server 30 performs voice recognition on the received voice data and generates text data as the voice recognition result (step S106).

Next, the voice recognition server 30 transmits the text data of the voice recognition result to the voice recognition device 10 (step S107).

Next, the voice recognition device 10 performs correction processing on the received text data, based on the correction dictionary that is stored in the correction dictionary storage unit 12C for the identification information of the voice transmission source (step S108).

Next, the voice recognition device 10 displays the corrected text data on the display device 106 (step S109).

Next, the voice recognition device 10 accepts an editing (modification) operation on the text data from the user (step S110).

Next, the voice recognition device 10 registers the "character string before correction", the "character string after correction", and the like in the correction dictionary storage unit 12C in association with the identification information of the transmission source of the call voice (step S111).
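
Steps S103 to S109 of the sequence can be sketched as a single function on the device side. Everything named here is an assumption for illustration: `handle_call`, the `recognize` callable standing in for the voice recognition server 30, the stub `fake_server`, the sample bytes, and the telephone number. The substring check again simplifies the dictionary lookup.

```python
def handle_call(sender_id, audio, recognize, dictionaries, recordings):
    """Device-side flow: record, recognize remotely, correct locally."""
    recordings[sender_id] = audio        # S103: record, keyed by sender ID
    text = recognize(audio)              # S105-S107: only the audio is sent
    for before, after, noun in dictionaries.get(sender_id, []):  # S108
        if before in text and noun in text:
            text = text.replace(before, after)
    return text                          # S109: text to be displayed


sent_to_server = []


def fake_server(audio):
    """Stand-in for the server; returns a fixed misrecognized result."""
    sent_to_server.append(audio)
    return "機能足が発生したでしょ。"


dictionaries = {"090-0000-0000": [("機能足", "聞き逃し", "発生")]}
recordings = {}
result = handle_call("090-0000-0000", b"\x00\x01", fake_server,
                     dictionaries, recordings)
```

In this sketch the sender identifier is used only for local lookups and never reaches `recognize`, so the per-sender correction adds no work on the server side.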

Next, a detailed example of the correction process for a voice recognition result will be described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of the correction process for a voice recognition result.

In step S201, the acquisition unit 14 acquires the text data of the voice recognition result.

Next, the correction unit 15 obtains the identification information of the voice transmission source from the call history or the phone book (step S202).

Next, the correction unit 15 refers to the correction dictionary storage unit 12C and obtains the correction dictionary corresponding to the identification information of the transmission source of the call voice (step S203).

Next, the correction unit 15 performs morphological analysis on the text data of the voice recognition result and decomposes each phrase included in the text data into character strings of a predetermined unit (step S204).

Next, the correction unit 15 extracts the character strings that are nouns from the decomposed character strings (step S205). The set of extracted character strings is hereinafter referred to as the "character string set L".

Next, the correction unit 15 determines whether each extracted noun character string (each character string included in the character string set L) is registered as a "character string before correction" in the correction dictionary (step S206).

If an extracted noun character string is registered (YES in step S206), the correction unit 15 excludes that character string from the character string set L (step S207).

Next, the correction unit 15 determines whether each noun character string that remains without being excluded is a pre-stored character string such as a "temporal noun" (step S208).

If a character string under examination is a pre-stored character string (YES in step S208), the correction unit 15 excludes that character string from the character string set L (step S209).

Next, the correction unit 15 extracts the noun character strings that remain without being excluded as "in-phrase nouns" (step S210).

Next, the correction unit 15 determines whether the pair consisting of a noun character string excluded in step S207 (one registered in the correction dictionary) and an "in-phrase noun" extracted in step S210 is registered in the "character string before correction" and "in-phrase noun" items of the correction dictionary, respectively (step S211).

If the pair is registered (YES in step S211), the correction unit 15 corrects (replaces) the "character string before correction" excluded in step S207, among the character strings included in the text data of the voice recognition result, with the "character string after correction" in the correction dictionary (step S212).

Next, the correction unit 15 stores the corrected text data in the voice recognition result storage unit 12B (step S213).
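
Steps S204 to S212 of the flowchart can be sketched as follows. Several parts are assumptions for illustration only: morphological analysis is replaced by a hand-tagged token list, the `TEMPORAL_NOUNS` set is an invented stand-in for the pre-stored "temporal noun" list, and the function name `correct_phrase` is hypothetical. The token tags mirror the nouns that the FIG. 9 example lists as extracted.

```python
TEMPORAL_NOUNS = {"先日", "昨日", "今日", "明日"}  # assumed pre-stored list


def correct_phrase(tokens, entries):
    """tokens: (surface, part_of_speech) pairs from an analyzer (S204).
    entries: (before, after, phrase_noun) rows of one sender's dictionary."""
    text = "".join(surface for surface, _ in tokens)
    noun_idx = [i for i, (_, pos) in enumerate(tokens) if pos == "noun"]  # S205

    # S206: find runs of consecutive nouns that spell a registered `before`.
    matched = {}
    for before, _, _ in entries:
        for start in noun_idx:
            joined, end = "", start
            while (end < len(tokens) and tokens[end][1] == "noun"
                   and len(joined) < len(before)):
                joined += tokens[end][0]
                end += 1
            if joined == before:
                matched.setdefault(before, set()).update(range(start, end))

    excluded = set().union(*matched.values())                 # S207
    remaining = [tokens[i][0] for i in noun_idx if i not in excluded]
    phrase_nouns = {n for n in remaining if n not in TEMPORAL_NOUNS}  # S208-S210

    for before, after, noun in entries:                        # S211
        if before in matched and noun in phrase_nouns:
            text = text.replace(before, after)                 # S212
    return text


# Hand-tagged tokens; only the nouns named in the FIG. 9 example are tagged "noun".
tokens = [("先日", "noun"), ("の", "other"), ("打ち合わせ", "other"),
          ("で", "other"), ("機能", "noun"), ("足", "noun"), ("が", "other"),
          ("発生", "noun"), ("した", "other"), ("でしょ", "other"), ("。", "other")]
corrected = correct_phrase(tokens, [("機能足", "聞き逃し", "発生")])
```

Joining consecutive nouns before the lookup is what lets the two tokens 「機能」 and 「足」 match the single dictionary key 「機能足」. A real implementation would obtain the tokens from a morphological analyzer instead of a hand-written list.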

Next, a specific example of the correction process for a voice recognition result will be described with reference to FIG. 9. FIG. 9 is a diagram explaining a specific example of the correction process for a voice recognition result.

FIG. 9 shows an example in which, in step S201 of FIG. 8, the text data acquired includes the phrase 「先日の打ち合わせで機能足が発生したでしょ。」 (literally "A function foot occurred at the meeting the other day, didn't it?").

In this case, in step S204 of FIG. 8, the correction unit 15 performs morphological analysis on the phrase and decomposes it into character strings of a predetermined unit. In step S205 of FIG. 8, the correction unit 15 extracts the nouns from the decomposed character strings. In FIG. 9, for example, 「先日」 ("the other day"), 「機能」 ("function"), 「足」 ("foot"), and 「発生」 ("occurrence") are extracted as nouns. Note that the process of morphologically analyzing a phrase and extracting nouns may be performed using a known technique.

In step S207 of FIG. 8, because 「機能」 and 「足」 are consecutive among the extracted nouns and 「機能足」 is a registered word before correction, the correction unit 15 excludes 「機能」 and 「足」. This leaves 「先日」 and 「発生」. In step S209 of FIG. 8, the correction unit 15 excludes 「先日」 because it is a pre-stored character string such as a "temporal noun" and is unlikely to be used together with the character string to be corrected. This leaves 「発生」. As a result, in step S210 of FIG. 8, 「発生」 is extracted as the "in-phrase noun".

In step S212 of FIG. 8, the correction unit 15 corrects the character string 「機能足」 to the character string 「聞き逃し」. In step S213 of FIG. 8, the correction unit 15 stores the text data 「先日の打ち合わせで聞き逃しが発生したでしょ。」 ("A remark was missed at the meeting the other day, wasn't it?") in the voice recognition result storage unit 12B.

Next, a detailed example of the registration process for the correction dictionary storage unit 12C will be described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the registration process for the correction dictionary.

The registration unit 16 displays the text data stored in the voice recognition result storage unit 12B on the screen (step S301).

Next, the registration unit 16 accepts an editing operation on the text data from the user (step S302).

Next, the registration unit 16 extracts the "in-phrase noun" from the phrase of the text data that contains the edited character string (step S303). The "in-phrase noun" is extracted by the same processing as steps S202 to S210 of FIG. 8.

Next, the registration unit 16 registers the character string before editing, the character string after editing, and the "in-phrase noun" extracted in step S303 in the "character string before correction", "character string after correction", and "in-phrase noun" items of the correction dictionary storage unit 12C, respectively, in association with the identification information of the voice transmission source obtained from the call history or the phone book (step S304).
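
Steps S303 and S304 can be sketched as follows. The sketch makes several assumptions that are not in the specification: tokens are hand-tagged instead of produced by morphological analysis, nouns belonging to the edited string are detected by a substring check, `TEMPORAL_NOUNS` is an invented list, and the function name `register_correction`, the account ID `user-001`, and the sample phrase are hypothetical.

```python
TEMPORAL_NOUNS = {"先日", "昨日", "今日", "明日"}  # assumed pre-stored list


def register_correction(dictionaries, sender_id, tokens, before, after):
    """Register (before, after, in-phrase noun) rows for one sender.

    tokens: (surface, part_of_speech) pairs for the phrase that contains
    the edited string.
    """
    nouns = [surface for surface, pos in tokens if pos == "noun"]
    # S303: drop nouns that are part of the edited string and temporal nouns.
    phrase_nouns = [n for n in nouns
                    if n not in before and n not in TEMPORAL_NOUNS]
    rows = dictionaries.setdefault(sender_id, [])
    for noun in phrase_nouns:                     # S304
        row = (before, after, noun)
        if row not in rows:
            rows.append(row)
    return rows


dictionaries = {}
tokens = [("機能", "noun"), ("足", "noun"), ("が", "other"),
          ("発生", "noun"), ("した", "other"), ("でしょ", "other"), ("。", "other")]
# The user selected 「機能足」 and typed 「聞き逃し」 as its replacement.
register_correction(dictionaries, "user-001", tokens, "機能足", "聞き逃し")
```

Because the rows are stored under the sender identifier, a later recognition result from the same sender that contains both 「機能足」 and 「発生」 would qualify for correction.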

次に、図１１を参照して、補正辞書の登録処理の具体例について説明する。図１１は、補正辞書の登録処理の具体例を示す図である。 Next, a specific example of the correction dictionary registration process will be described with reference to FIG. FIG. 11 shows a specific example of correction dictionary registration processing.

図１１（Ａ）は、図１０のステップＳ３０１で、音声認識結果記憶部１２Ｂに格納されているテキストデータを画面に表示している際の表示画面例である。図１１（Ａ）の例では、「聞き逃しが発生したでしょ。」というテキストデータ５０１が表示されている。図１１（Ｂ）〜図１１（Ｅ）は、図１０のステップＳ３０２で、テキストデータに対する編集操作を受け付ける際の操作と表示画面例を示す図である。 FIG. 11A is a display screen example when the text data stored in the speech recognition result storage unit 12B is displayed on the screen in step S301 of FIG. In the example of FIG. 11 (A), text data 501 “Hit was missed” is displayed. FIG. 11B to FIG. 11E are diagrams showing an operation and a display screen example when accepting an editing operation on text data in step S302 of FIG.

例えば、図１１（Ｂ）のように、登録部１６は、誤認識された文字列５０２の選択操作を受け付ける。続いて、図１１（Ｃ）のように登録部１６は、選択された文字列５０２の長押し操作を受け付ける。続いて、登録部１６は、図１１（Ｄ）のように、「補正辞書に登録」する旨のポップアップメニュー５０３を表示し、当該ポップアップメニュー５０３の押下操作を受け付ける。続いて、登録部１６は、図１１（Ｄ）のように、補正前の文字列５０２と、補正後の文字列の入力欄５０５を表示し、入力欄５０５への補正後の文字列の入力操作を受け付ける。 For example, as illustrated in FIG. 11B, the registration unit 16 receives an operation of selecting a character string 502 that has been erroneously recognized. Subsequently, as illustrated in FIG. 11C, the registration unit 16 accepts a long press operation of the selected character string 502. Subsequently, as illustrated in FIG. 11D, the registration unit 16 displays a pop-up menu 503 for “registering in the correction dictionary” and accepts a pressing operation of the pop-up menu 503. Subsequently, as shown in FIG. 11D, the registration unit 16 displays a character string 502 before correction and a character string input field 505 after correction, and inputs the corrected character string into the input field 505. Accept the operation.

After that, the registration unit 16 registers the "character string before correction" and the "character string after correction" input by the user, together with the extracted "noun in phrase", in the correction dictionary storage unit 12C in association with the identification information of the voice transmission source acquired from the call history or the phone book.

<Summary>
In the present embodiment, the voice recognition device 10 corrects a character string included in the text data of the voice recognition result acquired from the voice recognition server 30, based on a correction dictionary corresponding to the voice transmission source.

As a result, the accuracy of voice recognition can be improved without increasing the processing load on the voice recognition server 30. Furthermore, because the identification information of the voice transmission source, such as a telephone number, does not need to be sent to an external device such as the voice recognition server 30, information security is also ensured.
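
The division of work described above can be sketched as follows. This is only an illustration under assumptions: `fake_recognition_server` is a hypothetical stand-in for the voice recognition server 30 that returns a canned result, `recognize` and `dictionaries` are invented names, and the per-sender dictionary is simplified to a plain before/after mapping. The point of the sketch is that only the audio is handed to the external recognizer, while the sender's identification information is used locally to select the dictionary.

```python
def fake_recognition_server(audio: bytes) -> str:
    """Hypothetical stand-in for the voice recognition server 30; it receives audio only."""
    return "機能足発生"  # canned misrecognition, reusing the example from the background section


def recognize(audio: bytes, sender_id: str, dictionaries: dict,
              recognizer=fake_recognition_server) -> str:
    # Only the audio is passed to the external recognizer;
    # sender_id is never sent outside this function.
    text = recognizer(audio)
    # Local post-correction with the dictionary selected by the sender's identification information.
    for before, after in dictionaries.get(sender_id, {}).items():
        text = text.replace(before, after)
    return text


# Illustrative data: one sender with a registered correction, one without.
dictionaries = {"090-0000-0000": {"機能足": "聞き逃し"}}
corrected = recognize(b"\x00\x01", "090-0000-0000", dictionaries)
unknown = recognize(b"\x00\x01", "080-1111-1111", dictionaries)
print(corrected)
print(unknown)
```

Because the correction is a local string operation applied after recognition, the server needs no per-speaker training and carries no additional load.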

In addition, the correction dictionary includes a "noun in phrase", a character string that is likely to be used together with the "character string before correction". When the text data of the voice recognition result contains both the "character string before correction" and the "noun in phrase", the "character string before correction" is corrected to the "character string after correction". This suppresses erroneous corrections.
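
The conditional correction described above can be sketched as follows. The function name `correct_text` and the tuple layout of the entries are hypothetical. The sketch also makes two assumptions that the text does not settle: co-occurrence is checked over the whole text rather than within a single phrase, and when several nouns are registered, any one of them is enough to trigger the correction.

```python
def correct_text(text, entries):
    """Replace `before` with `after` only when a registered noun in phrase also appears."""
    for before, after, phrase_nouns in entries:
        # Assumption: a whole-text co-occurrence check, satisfied by any one noun.
        if before in text and any(noun in text for noun in phrase_nouns):
            text = text.replace(before, after)
    return text


# Illustrative entry: (before correction, after correction, nouns in phrase).
entries = [("機能足", "聞き逃し", ("発生",))]

with_noun = correct_text("機能足発生", entries)       # the noun "発生" co-occurs
without_noun = correct_text("機能足の研究", entries)  # the noun is absent
print(with_noun)
print(without_noun)
```

In the second call the text is left unchanged, which is how the additional condition keeps a legitimate occurrence of the "character string before correction" from being rewritten.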

Although examples of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

Regarding the above description, the following items are further disclosed.
(Appendix 1)
A voice recognition device comprising:
a recording unit that records voice data received from a terminal that inputs voice and identification information about the terminal in association with each other;
an acquisition unit that acquires, from an external device, text data that is a result of voice recognition of the voice data; and
a correction unit that corrects a character string included in the text data using a character string correction dictionary corresponding to the identification information.
(Appendix 2)
The voice recognition device according to Appendix 1, wherein the correction unit corrects a first character string included in the text data to a second character string registered in the correction dictionary in association with the first character string.
(Appendix 3)
The voice recognition device according to Appendix 1 or 2, further comprising a registration unit that, upon accepting an editing operation on any character string included in the text data acquired by the acquisition unit, registers the character string after editing, in association with the character string before editing, in the correction dictionary corresponding to the identification information associated with the text data.
(Appendix 4)
The voice recognition device according to Appendix 3, wherein, upon accepting an editing operation on any character string included in the text data acquired by the acquisition unit, the registration unit extracts from the text data another character string different from the character string before editing, and registers the character string after editing, in association with the character string before editing and the other character string, in the correction dictionary corresponding to the identification information associated with the text data, and
the correction unit corrects the character string before editing when the text data acquired from the external device includes both the character string before editing and the other character string.
(Appendix 5)
A voice recognition method in which a voice recognition device executes:
a process of recording voice data received from a terminal that inputs voice and identification information about the terminal in association with each other;
a process of acquiring, from an external device, text data that is a result of voice recognition of the voice data; and
a process of correcting a character string included in the text data using a character string correction dictionary corresponding to the identification information.
(Appendix 6)
The voice recognition method according to Appendix 5, wherein the correcting process corrects a first character string included in the text data to a second character string registered in the correction dictionary in association with the first character string.
(Appendix 7)
The voice recognition method according to Appendix 5 or 6, further executing a process of registering, upon accepting an editing operation on any character string included in the text data acquired by the acquiring process, the character string after editing, in association with the character string before editing, in the correction dictionary corresponding to the identification information associated with the text data.
(Appendix 8)
The voice recognition method according to Appendix 7, wherein, upon accepting an editing operation on any character string included in the text data acquired by the acquiring process, the registering process extracts from the text data another character string different from the character string before editing, and registers the character string after editing, in association with the character string before editing and the other character string, in the correction dictionary corresponding to the identification information associated with the text data, and
the correcting process corrects the character string before editing when the text data acquired from the external device includes both the character string before editing and the other character string.
(Appendix 9)
A voice recognition program that causes a voice recognition device to execute:
a process of recording voice data received from a terminal that inputs voice and identification information about the terminal in association with each other;
a process of acquiring, from an external device, text data that is a result of voice recognition of the voice data; and
a process of correcting a character string included in the text data using a character string correction dictionary corresponding to the identification information.
(Appendix 10)
The voice recognition program according to Appendix 9, wherein the correcting process corrects a first character string included in the text data to a second character string registered in the correction dictionary in association with the first character string.
(Appendix 11)
The voice recognition program according to Appendix 9 or 10, further causing execution of a process of registering, upon accepting an editing operation on any character string included in the text data acquired by the acquiring process, the character string after editing, in association with the character string before editing, in the correction dictionary corresponding to the identification information associated with the text data.
(Appendix 12)
The voice recognition program according to Appendix 11, wherein, upon accepting an editing operation on any character string included in the text data acquired by the acquiring process, the registering process extracts from the text data another character string different from the character string before editing, and registers the character string after editing, in association with the character string before editing and the other character string, in the correction dictionary corresponding to the identification information associated with the text data, and
the correcting process corrects the character string before editing when the text data acquired from the external device includes both the character string before editing and the other character string.

DESCRIPTION OF SYMBOLS
10 Voice recognition device
11 Communication unit
12 Storage unit
12A Recording data
12B Voice recognition result data
12C Correction dictionary data
13 Recording unit
14 Acquisition unit
15 Correction unit
16 Registration unit

Claims

A voice recognition device comprising:
a recording unit that records voice data received from a terminal that inputs voice and identification information about the terminal in association with each other;
an acquisition unit that acquires, from an external device, text data that is a result of voice recognition of the voice data; and
a correction unit that corrects a character string included in the text data using a character string correction dictionary corresponding to the identification information.

The voice recognition device according to claim 1, wherein the correction unit corrects a first character string included in the text data to a second character string registered in the correction dictionary in association with the first character string.

The voice recognition device according to claim 1, further comprising a registration unit that, upon accepting an editing operation on any character string included in the text data acquired by the acquisition unit, registers the character string after editing, in association with the character string before editing, in the correction dictionary corresponding to the identification information associated with the text data.

The voice recognition device according to claim 3, wherein, upon accepting an editing operation on any character string included in the text data acquired by the acquisition unit, the registration unit extracts from the text data another character string different from the character string before editing, and registers the character string after editing, in association with the character string before editing and the other character string, in the correction dictionary corresponding to the identification information associated with the text data, and
the correction unit corrects the character string before editing when the text data acquired from the external device includes both the character string before editing and the other character string.

A voice recognition method in which a voice recognition device executes:
a process of recording voice data received from a terminal that inputs voice and identification information about the terminal in association with each other;
a process of acquiring, from an external device, text data that is a result of voice recognition of the voice data; and
a process of correcting a character string included in the text data using a character string correction dictionary corresponding to the identification information.

A voice recognition program that causes a voice recognition device to execute:
a process of recording voice data received from a terminal that inputs voice and identification information about the terminal in association with each other;
a process of acquiring, from an external device, text data that is a result of voice recognition of the voice data; and
a process of correcting a character string included in the text data using a character string correction dictionary corresponding to the identification information.