JP6244731B2

JP6244731B2 - Information processing apparatus and information processing program

Info

Publication number: JP6244731B2
Application number: JP2013165985A
Authority: JP
Inventors: 洋平山根; 基行鷹合; 昌嗣外池; 俊一木村; 拓也桜井; 瑛一田中
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2013-08-09
Filing date: 2013-08-09
Publication date: 2017-12-13
Anticipated expiration: 2033-08-09
Also published as: JP2015034902A

Description

The present invention relates to an information processing apparatus and an information processing program.

Patent Document 1 aims to enable more accurate language analysis. It discloses that, when a given context is analyzed, the word nearest to the word being processed and the next nearest word are treated as local context information, and a plurality of words positioned before the word being processed are treated as global context information. For the global context information, probability distributions over contexts composed of a plurality of words are generated in advance, and a cluster ID is assigned to each. It is determined which probability distribution the global context information being processed corresponds to, and the cluster ID of that probability distribution is used together with the local context information to analyze what kind of sentence has been input. It is also disclosed that the technique can be applied to, for example, an acoustic recognition device that performs language analysis.

Patent Document 2 aims to recognize spoken sentences with high accuracy and efficiency. It discloses incorporating into a recognition system a sentence analysis means that, in the course of searching for a phrase sequence within a lattice, performs semantic analysis of the phrase sequences obtained from intermediate analysis results, predicts the meaning of the sentence to be recognized from this semantic information and the transition of the situation, and uses this information as a constraint. As a result, phrase sequences whose meaning is not expected in a given situation are eliminated during analysis, and sentence analysis is prevented from becoming unable to continue partway because a phrase has been dropped.

Patent Document 3 aims to provide a speech recognition device and a speech recognition program in which candidates in speech recognition are made appropriate and the speech recognition rate is improved, together with a language model generation method and a language model generation device for use with the speech recognition device. The disclosed speech recognition device includes a language model storage unit that stores a language model generated from fixed utterance patterns, an acoustic model storage unit that stores an acoustic model including the acoustic characteristics of speech, and a speech processing unit that refers to the language model and the acoustic model, acoustically analyzes a speech signal, and converts it into character information. The language model is generated from fixed utterance patterns that include character information acquired from a site specified by the URL of the organization to which the speaker belongs, and the acoustic model is a model trained on telephone speech. It is disclosed that, because such a language model and acoustic model are adopted, the speech can be converted into character information from appropriate candidates and the speech recognition rate improves.

Patent Document 1: JP 2008-181537 A
Patent Document 2: JP H01-260494 A
Patent Document 3: JP 2005-208483 A

An object of the present invention is to provide an information processing apparatus and an information processing program that correct the recognition result of voice information by using a document related to the speaker who uttered the voice of that voice information.

The gist of the present invention for achieving this object lies in the inventions set out in the following items.
The invention according to claim 1 is an information processing apparatus comprising: first receiving means for receiving voice information; second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information; recognition means for recognizing the voice information received by the first receiving means; acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and correction means for correcting the recognition result of the recognition means based on the document acquired by the acquisition means, wherein the correction means calculates the probability that a plurality of recognition results of the recognition means appear in the document acquired by the acquisition means, and corrects the recognition result of the recognition means based on the probability.

The invention according to claim 2 is an information processing apparatus comprising: first receiving means for receiving voice information; second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information; recognition means for recognizing the voice information received by the first receiving means; acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and correction means for correcting the recognition result of the recognition means based on the document acquired by the acquisition means, wherein the recognition means outputs, for one phrase, a plurality of recognition results and a certainty factor for each recognition result, and wherein, when one phrase has a plurality of recognition results whose certainty factor is higher than, or equal to or higher than, a predetermined value, the correction means calculates the probability that the combination of each of those recognition results with the recognition result of the phrase before or after that phrase appears in the document acquired by the acquisition means, and corrects the recognition result of the recognition means based on the probability.

The invention according to claim 3 is the information processing apparatus according to claim 1 or 2, wherein the acquisition means acquires a document that was created by the speaker identified by the speaker identification information and that was created within a predetermined period from the time the voice of the voice information was uttered.

The invention according to claim 4 is an information processing program that causes a computer to function as: first receiving means for receiving voice information; second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information; recognition means for recognizing the voice information received by the first receiving means; acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and correction means for correcting the recognition result of the recognition means based on the document acquired by the acquisition means, wherein the correction means calculates the probability that a plurality of recognition results of the recognition means appear in the document acquired by the acquisition means, and corrects the recognition result of the recognition means based on the probability.
The invention according to claim 5 is an information processing program that causes a computer to function as: first receiving means for receiving voice information; second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information; recognition means for recognizing the voice information received by the first receiving means; acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and correction means for correcting the recognition result of the recognition means based on the document acquired by the acquisition means, wherein the recognition means outputs, for one phrase, a plurality of recognition results and a certainty factor for each recognition result, and wherein, when one phrase has a plurality of recognition results whose certainty factor is higher than, or equal to or higher than, a predetermined value, the correction means calculates the probability that the combination of each of those recognition results with the recognition result of the phrase before or after that phrase appears in the document acquired by the acquisition means, and corrects the recognition result of the recognition means based on the probability.

According to the information processing apparatus of claim 1, the recognition result of voice information can be corrected by using a document related to the speaker who uttered the voice of that voice information. In addition, the recognition result can be corrected by using the probability that a plurality of recognition results appear in the document.

According to the information processing apparatus of claim 2, the recognition result of voice information can be corrected by using a document related to the speaker who uttered the voice of that voice information. In addition, the recognition result can be corrected more efficiently than when this configuration is not provided.

According to the information processing apparatus of claim 3, the recognition result can be corrected by using a document that the speaker created recently.

According to the information processing program of claim 4, the recognition result of voice information can be corrected by using a document related to the speaker who uttered the voice of that voice information. In addition, the recognition result can be corrected by using the probability that a plurality of recognition results appear in the document.
According to the information processing program of claim 5, the recognition result of voice information can be corrected by using a document related to the speaker who uttered the voice of that voice information. In addition, the recognition result can be corrected more efficiently than when this configuration is not provided.

FIG. 1 is a conceptual module configuration diagram of a configuration example of the present embodiment.
FIG. 2 is a flowchart showing a processing example according to the present embodiment.
FIG. 3 is an explanatory diagram showing an example of the data structure of a document table.
FIG. 4 is an explanatory diagram showing an example of the data structure of a conference table.
FIG. 5 is a flowchart showing a processing example according to the present embodiment.
FIG. 6 is a flowchart showing a processing example according to the present embodiment.
FIG. 7 is a block diagram showing an example of the hardware configuration of a computer that implements the present embodiment.

Hereinafter, an example of a preferred embodiment for realizing the present invention will be described with reference to the drawings.
FIG. 1 shows a conceptual module configuration diagram of a configuration example of the present embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, a module in the present embodiment refers not only to a module in a computer program but also to a module in a hardware configuration. The present embodiment therefore also serves as a description of a computer program for causing a computer to function as those modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to realize each function), and of a system and a method. For convenience of explanation, "store", "cause to store", and equivalent wording are used; when the embodiment is a computer program, these expressions mean storing in a storage device or performing control so as to store in a storage device. Modules may correspond one-to-one to functions. In implementation, however, one module may consist of one program, a plurality of modules may consist of one program, and conversely one module may consist of a plurality of programs. A plurality of modules may be executed by one computer, and one module may be executed by a plurality of computers in a distributed or parallel environment. One module may include another module. Hereinafter, "connection" is used not only for a physical connection but also for a logical connection (exchange of data, instructions, reference relationships between data, and the like). "Predetermined" means determined before the processing in question. It covers determination before the processing according to the present embodiment starts and also, provided it precedes the processing in question, determination after the processing according to the present embodiment has started, according to the situation or state at that time or up to that time.
When there are a plurality of "predetermined values", they may differ from one another, or two or more of them (including, of course, all of them) may be the same. A statement meaning "if A, do B" is used in the sense of "determine whether A holds, and do B if it is determined that A holds", except where determining whether A holds is unnecessary.
A system or apparatus includes a configuration in which a plurality of computers, pieces of hardware, devices, and the like are connected by communication means such as a network (including one-to-one communication connections), as well as a configuration realized by a single computer, piece of hardware, device, or the like. "Apparatus" and "system" are used as synonymous terms. Of course, "system" does not include something that is merely a social "arrangement" (a social system), which is a man-made agreement.
For each process performed by a module, or for each process when a plurality of processes are performed within a module, the target information is read from a storage device, the process is performed, and the processing result is then written to the storage device. Descriptions of reading from the storage device before processing and writing to the storage device after processing may therefore be omitted. The storage device here may include a hard disk, a RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and the like.

The information processing apparatus 100 according to the present embodiment recognizes speech and, as shown in the example of FIG. 1, includes a voice reception module 110, a voice recognition module 120, a participant acquisition module 130, a document storage module 140, a document acquisition module 150, a voice recognition result correction module 160, and a voice recognition result output module 170. The information processing apparatus 100 is used at gatherings such as conferences attended by a plurality of people, and recognizes the utterances (voice information) of the attendees at the gathering.

The voice reception module 110 is connected to the voice recognition module 120. The voice reception module 110 receives voice information, for example the voice information of a speaker from a microphone. Specifically, the microphone may be one built into a mobile terminal (for example, a mobile phone, including a smartphone). The voice information may also have been recorded in advance. That is, receiving voice information includes reading out voice information stored on a hard disk or the like (including one built into the information processing apparatus 100 and one connected via a network).

The voice recognition module 120 is connected to the voice reception module 110 and the voice recognition result correction module 160. The voice recognition module 120 recognizes the voice information received by the voice reception module 110. Conventional speech recognition technology may be used. For example, the recognition result may be output in a phrase lattice format, or in another format such as a confusion network. Both the phrase lattice format and the confusion network contain, for one phrase, a plurality of recognition results and a certainty factor for each recognition result. That is, the voice recognition module 120 may output a plurality of recognition results for one phrase together with their certainty factors. The certainty factor expresses the degree of correctness of a recognition result as, for example, a numerical value in the range from 0 to 1. For example, when the voice information received by the voice reception module 110 is compared with the patterns serving as a dictionary and there is no difference (they coincide in the feature space), the certainty factor is high.
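As a non-limiting illustration, the per-phrase output of the voice recognition module 120 could be represented as sketched below. The names `Candidate`, `RecognitionResult`, and `ambiguous_phrases` are hypothetical and do not appear in the specification, and the flat list-of-lists layout is a simplification of a phrase lattice or confusion network, not a definition of either format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One recognition result for a phrase (hypothetical structure)."""
    text: str
    confidence: float  # certainty factor, assumed to lie in [0, 1]


# One list of candidates per phrase, in utterance order (assumed layout).
RecognitionResult = list[list[Candidate]]


def ambiguous_phrases(result: RecognitionResult, threshold: float) -> list[int]:
    """Return the indices of phrases that have two or more candidates
    whose certainty factor is at or above the threshold."""
    return [
        index
        for index, candidates in enumerate(result)
        if sum(c.confidence >= threshold for c in candidates) >= 2
    ]
```

Phrases returned by `ambiguous_phrases` are the ones for which the recognizer alone cannot settle on a single result, so they are natural targets for document-based correction.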

The participant acquisition module 130 is connected to the document acquisition module 150. The participant acquisition module 130 receives speaker identification information (a user ID (IDentification)) capable of uniquely identifying, within the present embodiment, the speaker who uttered the voice of the voice information received by the voice reception module 110. For example, immediately before, during, or immediately after an utterance, an attendee may enter his or her own user ID using a keyboard or the like, or the user ID may be read from an IC card, mobile terminal, or the like owned by the attendee. When a microphone built into a mobile terminal is used, the user ID stored in that mobile terminal may be extracted with the utterance as the trigger. Alternatively, the voiceprints of the attendees may be acquired in advance, and the user ID whose voiceprint matches that of the voice information may be extracted. The face of the attendee who is speaking may also be recognized from an image captured by a camera installed in the conference room, and the corresponding user ID acquired. Furthermore, the attendees of the gathering may be identified from an electronic conference room reservation system or the like, and the face may be recognized from among the identified attendees to acquire the user ID.

The document storage module 140 is connected to the document acquisition module 150. The document storage module 140 stores documents, together with information for managing them. For example, it stores a document table 300. FIG. 3 is an explanatory diagram showing an example of the data structure of the document table 300. The document table 300 has a document ID column 310, a document name column 320, a creator ID column 330, a creation date/time column 340, an editor ID column 350, an edit date/time column 360, a viewer ID column 370, and a viewing date/time column 380. The document ID column 310 stores information (a document ID) for uniquely identifying a document within the present embodiment. The document name column 320 stores the name of the document. The creator ID column 330 stores information (a creator ID, which is a user ID) for uniquely identifying the creator (user) of the document within the present embodiment. The creation date/time column 340 stores the date and time at which the document was created (which may be the year, month, day, hour, minute, second, fraction of a second, or a combination of these). The editor ID column 350 stores information (an editor ID, which is a user ID) for uniquely identifying an editor (user) of the document within the present embodiment. The edit date/time column 360 stores the date and time at which that edit was made to the document. The viewer ID column 370 stores information (a viewer ID, which is a user ID) for uniquely identifying a viewer (user) of the document within the present embodiment. The viewing date/time column 380 stores the date and time at which that viewing of the document took place.
When a document has been edited a plurality of times, there are as many pairs of the editor ID column 350 and the edit date/time column 360 as there were edits. Similarly, when a document has been viewed a plurality of times, there are as many pairs of the viewer ID column 370 and the viewing date/time column 380 as there were viewings.
It suffices that the documents stored in the document storage module 140 include documents with which the speaker or the conference participants have been involved (specifically, by creating, editing, viewing, and so on). Examples include presentation materials and minutes.

The document acquisition module 150 is connected to the participant acquisition module 130, the document storage module 140, and the voice recognition result correction module 160. The document acquisition module 150 acquires, from the document storage module 140, documents related to the speaker identification information received by the participant acquisition module 130. Here, a document related to the speaker identification information is a document that the speaker identified by that information has created, edited, or viewed, or any combination of these. Priorities may also be assigned. For example, the weighting coefficients that the voice recognition result correction module 160 uses when correcting the recognition result may be made heavier in the order of documents the speaker viewed, documents the speaker edited, and documents the speaker created, so that created documents carry the greatest weight.
The document acquisition module 150 may also acquire a document that was created by the speaker identified by the speaker identification information and that was created within a predetermined period from the time the voice of the voice information was uttered. The "time of utterance" may be the time at which the voice reception module 110 received the voice information or, when the date and time of the utterance is attached to the voice information, that date and time may be extracted.
The document acquisition module 150 may also acquire, from the document storage module 140, documents related to the participant identification information (participant IDs) of the conference. The speaker identification information is, of course, included in the participant identification information. For example, the participant identification information is acquired from a conference table 400. FIG. 4 is an explanatory diagram showing an example of the data structure of the conference table 400. The conference table 400 has a conference ID column 410, a conference name column 420, a date/time column 430, and a participant ID column 440. The conference ID column 410 stores information (a conference ID) for uniquely identifying a conference within the present embodiment. The conference name column 420 stores the name of the conference. The date/time column 430 stores the date and time at which the conference was held. The participant ID column 440 stores the participant IDs (user IDs) of those who took part in the conference. The document acquisition module 150 acquires from the conference table 400 the participant IDs for the conference in which the voice information received by the information processing apparatus 100 was uttered. It then determines whether each participant ID matches a creator ID, editor ID, or viewer ID stored in the creator ID column 330, editor ID column 350, or viewer ID column 370 of the document table 300, and acquires the documents identified by the matching document IDs. The conference table 400 may be stored in the information processing apparatus 100, or may be read via a communication line from a conference management apparatus or the like that manages conference room reservations. The weighting described above may also be applied to documents related to the conference participant IDs.
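As a non-limiting illustration, the selection and weighting of related documents could be sketched as follows. The names `Doc`, `WEIGHTS`, `relation`, `related_documents`, and `recent_created_documents` are hypothetical, and the numeric weights are arbitrary values chosen only to respect the order created > edited > viewed. The sketch also assumes that the "predetermined period" is a window ending at the time of utterance, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Assumed weights: heaviest for created, then edited, then viewed documents.
WEIGHTS = {"created": 3.0, "edited": 2.0, "viewed": 1.0}


@dataclass
class Doc:
    """Minimal stand-in for a document table row (illustrative only)."""
    doc_id: str
    creator_id: str
    created_at: datetime
    editor_ids: set = field(default_factory=set)
    viewer_ids: set = field(default_factory=set)


def relation(doc: Doc, user_ids: set) -> Optional[str]:
    """Strongest relation between the document and any of the given users."""
    if doc.creator_id in user_ids:
        return "created"
    if doc.editor_ids & user_ids:
        return "edited"
    if doc.viewer_ids & user_ids:
        return "viewed"
    return None


def related_documents(docs: list, user_ids: set) -> dict:
    """Map each related document ID to its weight.

    user_ids may hold only the speaker ID, or all participant IDs
    taken from the conference table.
    """
    result = {}
    for doc in docs:
        rel = relation(doc, user_ids)
        if rel is not None:
            result[doc.doc_id] = WEIGHTS[rel]
    return result


def recent_created_documents(docs: list, speaker_id: str,
                             spoken_at: datetime, period: timedelta) -> list:
    """IDs of documents the speaker created within `period` before `spoken_at`."""
    return [
        doc.doc_id
        for doc in docs
        if doc.creator_id == speaker_id
        and timedelta(0) <= spoken_at - doc.created_at <= period
    ]
```

Passing the full set of participant IDs to `related_documents` corresponds to the conference-table variant, while passing a single speaker ID corresponds to the speaker-only variant.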

The speech recognition result correction module 160 is connected to the speech recognition module 120, the document acquisition module 150, and the speech recognition result output module 170. The speech recognition result correction module 160 corrects the recognition result produced by the speech recognition module 120 on the basis of the documents acquired by the document acquisition module 150.
The speech recognition result correction module 160 may also calculate the probability that each of a plurality of recognition results from the speech recognition module 120 appears in the documents acquired by the document acquisition module 150, and correct the recognition result on the basis of that probability.
Further, when one phrase has a plurality of recognition results whose certainty factor is higher than, or equal to or higher than, a predetermined value, the speech recognition result correction module 160 may calculate, for each of those recognition results, the probability that its combination with the recognition result of the preceding or following phrase appears in the documents acquired by the document acquisition module 150, and correct the recognition result on the basis of that probability.
For example, suppose there are phrases A, B, and C. Phrases A and C each have exactly one recognition result whose certainty factor is higher than the predetermined value, namely A1 and C1. Phrase B has a plurality of recognition results, B1 and B2, both of which have a certainty factor higher than the predetermined value. In that case, the following combinations are created:
(1) A1 B1 C1
(2) A1 B2 C1
The probability that combination (1) occurs in the documents acquired by the document acquisition module 150 and the probability that combination (2) occurs in those documents are then obtained.
Further, as described above, the speech recognition result correction module 160 may calculate the probability using the weight of each document. For example, the probability may be multiplied by the weight to obtain the final probability.
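One possible reading of the weighted calculation is sketched below for combinations (1) and (2). The embodiment only says that the probability may be multiplied by a weight, so the way the weight is applied here (each matching document contributes its weight instead of 1, and the sum is divided by the number of documents) is an assumption, as are the function name and the use of substring matching on document text.

```python
def weighted_score(words, weighted_documents):
    """Score a combination of recognition results against documents.

    weighted_documents is a list of (text, weight) pairs. A document
    counts only if every word occurs in its text; it then contributes
    its weight. With all weights equal to 1 this reduces to the plain
    ratio of matching documents to all documents.
    """
    if not weighted_documents:
        return 0.0
    total = sum(weight for text, weight in weighted_documents
                if all(word in text for word in words))
    return total / len(weighted_documents)

def pick_combination(combinations, weighted_documents):
    """Return the combination with the highest weighted score."""
    return max(combinations,
               key=lambda words: weighted_score(words, weighted_documents))
```

Note that with weights greater than 1 the result is a score rather than a probability in the strict sense; only the ranking of the combinations matters for the correction.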

The speech recognition result output module 170 is connected to the speech recognition result correction module 160. The speech recognition result output module 170 outputs the speech recognition result corrected by the speech recognition result correction module 160. Outputting the speech recognition result includes, for example, displaying it on a display device such as a display, writing it to a storage device such as a conference minutes database, storing it on a storage medium such as a memory card, and passing it to another information processing apparatus such as a minutes creation apparatus.

FIG. 2 is a flowchart showing an example of processing according to the present embodiment.
In step S202, the voice reception module 110 receives voice data with a speaker ID attached.
In step S204, the speech recognition module 120 performs speech recognition on the voice data received in step S202.
In step S206, the participant acquisition module 130 acquires the participant IDs of the conference.
In step S208, the document acquisition module 150 acquires, from the document storage module 140, the documents in which the participants identified by the participant IDs were involved. The participant IDs include the speaker ID, so at least the documents in which the speaker identified by the speaker ID was involved are acquired.
In step S210, the speech recognition result correction module 160 corrects the speech recognition result using the documents acquired in step S208. Details are described later with reference to the flowcharts shown in the examples of FIG. 5 and FIG. 6.
In step S212, the speech recognition result output module 170 outputs the speech recognition result corrected in step S210.
Note that either the processing of steps S202 and S204 or the processing of steps S206 and S208 may be performed first, or the two may be performed in parallel.
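The ordering freedom noted for steps S202 to S208 can be illustrated with a small sketch in which the two branches run in parallel and are joined before step S210. The four callables are hypothetical stand-ins for the modules; the embodiment does not prescribe threads or any particular interface.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(receive_and_recognize, acquire_documents, correct, output):
    """Run the FIG. 2 flow with the two independent branches in parallel.

    receive_and_recognize: stands in for steps S202 and S204
    acquire_documents:     stands in for steps S206 and S208
    correct:               stands in for step S210
    output:                stands in for step S212
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        lattice_future = pool.submit(receive_and_recognize)
        documents_future = pool.submit(acquire_documents)
        lattice = lattice_future.result()
        documents = documents_future.result()
    corrected = correct(lattice, documents)  # needs both branches
    return output(corrected)
```

Step S210 is the first point at which both the recognition result and the acquired documents are required, which is why the join is placed immediately before it.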

FIG. 5 is a flowchart showing an example of processing according to the present embodiment. It illustrates detailed processing example (1) of step S210.
In step S502, one recognition result is extracted from the plurality of recognition results for each phrase. For example, suppose the following is received as the speech recognition result. Each row shows the plurality of recognition results for one phrase (the voice information of one section). Here, “−” represents a blank.
・(確認 “confirmation”) (各 “each”) (悪人 “villain”)
・(−) (二 “two”)
・(の) (−) (な) (が)
・(作業 “work”) (産業 “industry”)
・(に)
・(係る “relating to”) (かかる “taking, required for”)
・(コース “course”) (工数 “man-hours”) (ホース “hose”)
・(が) (か) (家 “house”) (科 “department”)
One recognition result is extracted from each row. For example, (確認) is extracted from “(確認) (各) (悪人)”. One recognition result is likewise extracted from each of the other rows (each row being the plurality of recognition results for the same phrase).

In step S504, a set of the recognition results extracted in step S502 is generated. In the example above, the set “(確認) (−) (の) (作業) (に) (係る) (コース) (が)” is generated. Naturally, when processing returns from step S508 for the second and subsequent passes, a different combination is generated, for example “(各) (−) (の) (作業) (に) (係る) (コース) (が)”.
In step S506, the probability that the generated set of character strings appears in the documents is calculated. The documents considered here are those acquired in step S208.
The appearance probability is calculated as follows.
Appearance probability = (number of documents in which all of the words appear) / (number of documents)
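The formula above translates almost directly into code. In this minimal sketch each document is represented by its text, and a word is taken to appear in a document when it occurs as a substring; that matching rule, the function name, and the decision to return 0 for an empty document collection are assumptions made for illustration.

```python
def appearance_probability(words, documents):
    """Fraction of documents in which every word of the set appears.

    words:     the recognition results forming one set
    documents: list of document texts
    """
    if not documents:
        return 0.0
    matching = sum(1 for text in documents
                   if all(word in text for word in words))
    return matching / len(documents)
```

For example, if one of three documents contains all of 確認, 作業, and かかる, the appearance probability of that set is 1/3.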

In step S508, it is determined whether all combinations have been generated. If so, the process proceeds to step S510; otherwise, it returns to step S502. In other words, the processing from step S502 to step S506 is performed for every combination obtained by extracting one recognition result from the plurality of recognition results for each section.
In step S510, the recognition result is finalized. Specifically, the combination of recognition results with the highest appearance probability is adopted as the recognition result.
In step S512, the finalized recognition result is passed to the speech recognition result output module 170.
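The loop over steps S502 to S510 amounts to an exhaustive search over one choice per phrase. The sketch below uses `itertools.product` to enumerate the combinations. The helper names, the substring matching, and the handling of the blank marker (it is simply skipped when matching) are assumptions, not details stated in the embodiment.

```python
from itertools import product

BLANK = "−"  # marker used for a blank recognition result

def set_probability(words, documents):
    """Fraction of document texts that contain every non-blank word."""
    words = [w for w in words if w != BLANK]
    if not documents:
        return 0.0
    return sum(all(w in text for w in words) for text in documents) / len(documents)

def best_combination(lattice, documents):
    """lattice: list of phrases, each a list of candidate strings.

    Enumerates every combination (S502, S504), scores it (S506), and
    returns the highest-scoring one with its probability (S510).
    The first combination found wins a tie.
    """
    best, best_p = None, -1.0
    for combo in product(*lattice):
        p = set_probability(combo, documents)
        if p > best_p:
            best, best_p = combo, p
    return best, best_p
```

The number of combinations is the product of the candidate counts per phrase, so this approach grows quickly with lattice size; the procedure of FIG. 6 reduces the search by first discarding low-certainty candidates.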

FIG. 6 is a flowchart showing an example of processing according to the present embodiment. It illustrates detailed processing example (2) of step S210. The processing of the flowchart shown in the example of FIG. 5 does not use the certainty factors of the recognition results, whereas FIG. 6 illustrates a processing example that does.
In step S602, the recognition results whose certainty factor is equal to or higher than a threshold (a predetermined value) are extracted.
In the example phrase lattice shown below, “確認”, “各”, and “悪人” are output as the speech recognition results for the first phrase, with certainty factors of 0.922, 0.042, and 0.037, respectively.
Similarly, “−” and “二” are output for the second phrase. Here, “−” represents a blank.
・(確認 “confirmation”: 0.922) (各 “each”: 0.042) (悪人 “villain”: 0.037)
・(−: 0.958) (二 “two”: 0.042)
・(の: 0.823) (−: 0.094) (な: 0.040) (が: 0.034)
・(作業 “work”: 0.952) (産業 “industry”: 0.048)
・(に: 1.000)
・(係る “relating to”: 0.592) (かかる “taking, required for”: 0.409)
・(コース “course”: 0.825) (工数 “man-hours”: 0.120) (ホース “hose”: 0.055)
・(が: 0.899) (か: 0.051) (家 “house”: 0.026) (科 “department”: 0.024)
From the phrase lattice produced by speech recognition, the recognition results whose certainty factor is equal to or higher than the threshold are selected. The selection for a threshold of 0.1 is shown below.
・(確認: 0.922)
・(−: 0.958)
・(の: 0.823)
・(作業: 0.952)
・(に: 1.000)
・(係る: 0.592) (かかる: 0.409)
・(コース: 0.825) (工数: 0.120)
・(が: 0.899)
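The selection in step S602 can be reproduced with a short filter over the example lattice. The data below are the certainty factors listed above; the variable and function names are illustrative only.

```python
# Example phrase lattice: one list of (candidate, certainty) per phrase.
LATTICE = [
    [("確認", 0.922), ("各", 0.042), ("悪人", 0.037)],
    [("−", 0.958), ("二", 0.042)],
    [("の", 0.823), ("−", 0.094), ("な", 0.040), ("が", 0.034)],
    [("作業", 0.952), ("産業", 0.048)],
    [("に", 1.000)],
    [("係る", 0.592), ("かかる", 0.409)],
    [("コース", 0.825), ("工数", 0.120), ("ホース", 0.055)],
    [("が", 0.899), ("か", 0.051), ("家", 0.026), ("科", 0.024)],
]

def filter_by_certainty(lattice, threshold=0.1):
    """Keep only candidates whose certainty is at or above the threshold."""
    return [[(word, c) for word, c in phrase if c >= threshold]
            for phrase in lattice]

filtered = filter_by_certainty(LATTICE, threshold=0.1)
# Phrases that still have more than one candidate remain ambiguous.
ambiguous = [phrase for phrase in filtered if len(phrase) > 1]
```

With a threshold of 0.1, only the sixth and seventh phrases keep two candidates each, which matches the selection listed above.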

In step S604, it is determined whether a plurality of recognition results were extracted for one phrase. If so, the process proceeds to step S606; otherwise, it proceeds to step S612. In other words, when a phrase has only one recognition result whose certainty factor is equal to or higher than the threshold, that result is taken as the final analysis result for the phrase. In the example above, (確認: 0.922), (−: 0.958), (の: 0.823), (作業: 0.952), (に: 1.000), and (が: 0.899) are such results. When the same phrase has a plurality of recognition results whose certainty factor is equal to or higher than the threshold, the processing from step S606 onward is performed.

In step S606, sets are generated, each consisting of the confirmed recognition results (those of phrases with exactly one result whose certainty factor is equal to or higher than the threshold) and one of the unconfirmed recognition results (those of phrases with a plurality of such results). That is, for each candidate word, the probability that the words whose analysis results are confirmed and the candidate word appear together in the documents acquired in step S208 is calculated. All of the confirmed words may be used for this purpose. Alternatively, part-of-speech information may be used to restrict the words to independent words, or to nouns and unknown words; that is, particles such as “に”, auxiliary verbs, and the like may be excluded.
To generate a set, an unconfirmed recognition result is combined with the recognition results that precede or follow it.
In the example above, the sets “確認, の, 作業, に, が, 係る” and “確認, の, 作業, に, が, かかる” are generated for “係る” and “かかる”, respectively. Here, at the position where an unconfirmed recognition result appears, a set is generated with the recognition results confirmed up to that point. After the unconfirmed recognition result has been finalized in step S610, the process may advance to the next unconfirmed recognition result, generate sets for it, and repeat this up to the last recognition result.
Alternatively, all combinations may be generated here: “確認, の, 作業, に, が, 係る, コース, が”, “確認, の, 作業, に, が, かかる, コース, が”, “確認, の, 作業, に, が, 係る, 工数, が”, and “確認, の, 作業, に, が, かかる, 工数, が”.

In step S608, the probability that each generated set of character strings appears in the documents is calculated.
In the example above, the probabilities that “確認, の, 作業, に, が, 係る” and “確認, の, 作業, に, が, かかる” appear are calculated. The appearance probability is calculated with the formula given above.
Suppose, for example, that the appearance probability of “確認, の, 作業, に, 係る” is 0.004 and that of “確認, の, 作業, に, かかる” is 0.012.
In step S610, the recognition result is finalized.
In the example above, the appearance probability of “かかる” (“確認, の, 作業, に, かかる”) is higher than that of “係る” (“確認, の, 作業, に, 係る”), so “かかる” is adopted as the final analysis result. The candidate with the highest appearance probability may be adopted, or a candidate may be adopted only when its appearance probability exceeds those of the others by a threshold or more. In the latter case, the ambiguity of the other phrases is resolved first and the appearance probability is then calculated again. The final output is determined in the same way for the remaining phrases. Processing ends when the ambiguity has been resolved for all phrases.
In step S612, the recognition result finalized in step S610 is passed to the speech recognition result output module 170.
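Steps S604 to S610 can be sketched as a pass over the filtered lattice that resolves each ambiguous phrase against the words already confirmed. This sketch follows the variant in which each resolved phrase is added to the confirmed words before the next ambiguous phrase is handled, and it always adopts the candidate with the highest probability. The function names, substring matching, and skipping of the blank marker are assumptions; the optional part-of-speech restriction and the margin-based adoption rule are not implemented.

```python
BLANK = "−"  # marker used for a blank recognition result

def set_probability(words, documents):
    """Fraction of document texts that contain every non-blank word."""
    words = [w for w in words if w != BLANK]
    if not documents:
        return 0.0
    return sum(all(w in text for w in words) for text in documents) / len(documents)

def resolve(filtered, documents):
    """filtered: list of phrases, each a list of candidates that passed
    the certainty threshold. Returns one candidate per phrase."""
    # S604: a phrase with a single candidate is already final.
    confirmed = [phrase[0] for phrase in filtered if len(phrase) == 1]
    result = []
    for phrase in filtered:
        if len(phrase) == 1:
            result.append(phrase[0])
            continue
        # S606 to S610: score confirmed words plus each candidate,
        # then adopt the candidate with the highest probability.
        best = max(phrase,
                   key=lambda cand: set_probability(confirmed + [cand], documents))
        result.append(best)
        confirmed.append(best)  # later phrases see this decision
    return result
```

Compared with enumerating every combination, this evaluates only as many sets as there are candidates in the ambiguous phrases.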

The hardware configuration of the computer on which the program of the present embodiment is executed is, as illustrated in FIG. 7, that of a general-purpose computer, specifically a personal computer, a computer that can serve as a server, or the like. As a specific example, a CPU 701 is used as the processing unit (arithmetic unit), and a RAM 702, a ROM 703, and an HD 704 are used as storage devices. A hard disk, for example, may be used as the HD 704. The computer consists of: the CPU 701, which executes programs such as the voice reception module 110, the speech recognition module 120, the participant acquisition module 130, the document acquisition module 150, the speech recognition result correction module 160, and the speech recognition result output module 170; the RAM 702, which stores those programs and data; the ROM 703, which stores a program for starting the computer and the like; the HD 704, which is an auxiliary storage device (it may be a flash memory or the like) serving as the document storage module 140; a receiving device 706, which accepts data based on user operations on a keyboard, mouse, touch panel, or the like; an output device 705 such as a CRT or liquid crystal display; a communication line interface 707, such as a network interface card, for connecting to a communication network; and a bus 708 that connects these components and carries data among them. A plurality of such computers may be connected to one another via a network.

Of the embodiments described above, those realized by a computer program are implemented by loading the computer program, which is software, into a system having this hardware configuration, so that the software and the hardware resources operate in cooperation.
The hardware configuration shown in FIG. 7 is one example. The present embodiment is not limited to the configuration shown in FIG. 7 and may have any configuration capable of executing the modules described in the present embodiment. For example, some modules may be implemented in dedicated hardware (for example, an ASIC), some modules may reside in an external system connected via a communication line, and a plurality of the systems shown in FIG. 7 may be connected to one another via communication lines so as to operate cooperatively. Besides a personal computer, the embodiment may also be incorporated in an information appliance, a copier, a fax machine, a scanner, a printer, a multifunction machine (an image processing apparatus having two or more of the functions of a scanner, printer, copier, fax machine, and the like), or the like.

Further, in the description of the embodiment above, where a comparison with a predetermined value is expressed as “equal to or greater than”, “equal to or less than”, “greater than”, or “less than”, it may instead be read as “greater than”, “less than”, “equal to or greater than”, or “equal to or less than”, respectively, provided that the combination does not give rise to a contradiction.

The program described above may be provided stored on a recording medium, or may be provided by communication means. In that case, for example, the program described above may be regarded as an invention of a “computer-readable recording medium on which a program is recorded”.
A “computer-readable recording medium on which a program is recorded” is a computer-readable recording medium on which a program is recorded and which is used for installing, executing, or distributing the program.
Examples of the recording medium include digital versatile discs (DVD), such as “DVD-R, DVD-RW, DVD-RAM, and the like”, which are standards established by the DVD Forum, and “DVD+R, DVD+RW, and the like”, which are standards established by DVD+RW; compact discs (CD), such as read-only memory (CD-ROM), CD recordable (CD-R), and CD rewritable (CD-RW); Blu-ray (registered trademark) Disc; magneto-optical disk (MO); flexible disk (FD); magnetic tape; hard disk; read-only memory (ROM); electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); flash memory; random access memory (RAM); and SD (Secure Digital) memory card.
The program or a part of it may be recorded on the recording medium for storage, distribution, and the like. It may also be transmitted by communication over a transmission medium such as a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, an extranet, or the like, a wireless communication network, or a combination of these, and it may be carried on a carrier wave.
Furthermore, the program may be a part of another program, or may be recorded on a recording medium together with a separate program. It may be divided and recorded on a plurality of recording media. It may be recorded in any form, such as compressed or encrypted, as long as it can be restored.

DESCRIPTION OF SYMBOLS
100 ... Information processing apparatus
110 ... Voice reception module
120 ... Speech recognition module
130 ... Participant acquisition module
140 ... Document storage module
150 ... Document acquisition module
160 ... Speech recognition result correction module
170 ... Speech recognition result output module

Claims

An information processing apparatus comprising:
first receiving means for receiving voice information;
second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information;
recognition means for recognizing the voice information received by the first receiving means;
acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and
correction means for correcting the recognition result of the recognition means on the basis of the document acquired by the acquisition means,
wherein the correction means calculates the probability that each of a plurality of recognition results of the recognition means appears in the document acquired by the acquisition means, and corrects the recognition result of the recognition means on the basis of that probability.

An information processing apparatus comprising:
first receiving means for receiving voice information;
second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information;
recognition means for recognizing the voice information received by the first receiving means;
acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and
correction means for correcting the recognition result of the recognition means on the basis of the document acquired by the acquisition means,
wherein the recognition means outputs a plurality of recognition results for one phrase together with a certainty factor for each recognition result, and
wherein, when one phrase has a plurality of recognition results whose certainty factor is higher than, or equal to or higher than, a predetermined value, the correction means calculates, for each of those recognition results, the probability that its combination with the recognition result of the phrase before or after that phrase appears in the document acquired by the acquisition means, and corrects the recognition result of the recognition means on the basis of that probability.

The information processing apparatus according to claim 1 or 2, wherein the acquisition means acquires a document that was created by the speaker identified by the speaker identification information and that was created within a predetermined period from when the voice of the voice information was uttered.

An information processing program causing a computer to function as:
first receiving means for receiving voice information;
second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information;
recognition means for recognizing the voice information received by the first receiving means;
acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and
correction means for correcting the recognition result of the recognition means on the basis of the document acquired by the acquisition means,
wherein the correction means calculates the probability that each of a plurality of recognition results of the recognition means appears in the document acquired by the acquisition means, and corrects the recognition result of the recognition means on the basis of that probability.

An information processing program causing a computer to function as:
first receiving means for receiving voice information;
second receiving means for receiving speaker identification information capable of uniquely identifying the speaker who uttered the voice of the voice information;
recognition means for recognizing the voice information received by the first receiving means;
acquisition means for acquiring a document related to the speaker identification information received by the second receiving means; and
correction means for correcting the recognition result of the recognition means on the basis of the document acquired by the acquisition means,
wherein the recognition means outputs a plurality of recognition results for one phrase together with a certainty factor for each recognition result, and
wherein, when one phrase has a plurality of recognition results whose certainty factor is higher than, or equal to or higher than, a predetermined value, the correction means calculates, for each of those recognition results, the probability that its combination with the recognition result of the phrase before or after that phrase appears in the document acquired by the acquisition means, and corrects the recognition result of the recognition means on the basis of that probability.