JP2020160118A

JP2020160118A - Information processing device, information processing method and program

Info

Publication number: JP2020160118A
Application number: JP2019056140A
Authority: JP
Inventors: Daiki Ishiura (石浦 大樹); Kohei Takeda (武田 光平)
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2019-03-25
Filing date: 2019-03-25
Publication date: 2020-10-01
Anticipated expiration: 2039-03-25
Also published as: JP7406921B2

Abstract

To suitably generate a dictionary for speech recognition. SOLUTION: A first speech recognition result based on a first dictionary and a second speech recognition result based on a second dictionary are received, together with a first confidence for the first result and a second confidence for the second result, both calculated by the same predetermined calculation. The first confidence and the second confidence are compared, and if a predetermined condition is satisfied, word information contained in the second speech recognition result is registered. The second dictionary includes the word information registered in the first dictionary plus word information designated by a user. SELECTED DRAWING: Figure 5

Description

The present invention relates to an information processing device, an information processing method, and a program.

A speech recognition device intended for unspecified speakers comes with a pre-registered speech recognition dictionary built around general-purpose, common vocabulary, and it recognizes speech on the basis of that dictionary. When the vocabulary to be recognized can be defined at design time, a speech recognition dictionary prepared in advance is used. When the vocabulary cannot be defined in advance, or must change dynamically, vocabulary is generally either entered manually or generated automatically from character string information and registered in the dictionary.

Recent speech recognition devices also register paraphrases such as abbreviations in the speech recognition dictionary, so that they can handle not only formal pronunciations of words but also arbitrary abbreviated utterances by users.

For example, Patent Document 1 discloses a speech recognition device that recognizes abbreviated paraphrases of words with a high recognition rate.

Patent Document 1: Japanese Patent No. 3724649

However, with the speech recognition device disclosed in Patent Document 1, registering new words that are not general-purpose (special terms) in the speech recognition dictionary, such as company-specific internal terms or special terms that come up in a particular meeting or lecture, requires manual input. Selecting and entering the words to be registered places a heavy burden on human workers, so the device still falls short of suitably generating a speech recognition dictionary.

The present invention has been made in view of these circumstances, and its object is to provide an information processing device, an information processing method, and a program capable of suitably generating a dictionary for speech recognition.

To achieve the above object, an information processing device according to a first aspect of the present invention comprises:
a speech recognition result receiving means for receiving a first speech recognition result based on a first dictionary and a second speech recognition result based on a second dictionary different from the first dictionary;
a confidence receiving means for receiving a first confidence of the first speech recognition result, calculated by a predetermined calculation, and a second confidence of the second speech recognition result, calculated by the same calculation; and
a word information registration means for comparing the first confidence with the second confidence and, when a predetermined condition is satisfied, registering word information contained in the second speech recognition result,
wherein the second dictionary includes, in addition to the word information registered in the first dictionary, word information specified by a user.
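The first aspect can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: word information is modeled as a hypothetical mapping from a word's text to a reference for its voice data, and the "predetermined condition" is passed in as a function because the claim leaves it open. The names `RecognitionResult`, `build_second_dictionary`, and `register_if_condition` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model of word information: text form -> reference to its voice data.
Dictionary = Dict[str, str]


@dataclass
class RecognitionResult:
    words: List[str]    # words contained in the recognition result
    confidence: float   # computed by the same predetermined calculation for both results


def build_second_dictionary(first: Dictionary, user_words: Dictionary) -> Dictionary:
    """Second dictionary = every entry of the first dictionary plus user-specified entries."""
    second = dict(first)       # copy so the first dictionary is left unchanged
    second.update(user_words)  # add the word information specified by the user
    return second


def register_if_condition(
    first: RecognitionResult,
    second: RecognitionResult,
    condition: Callable[[float, float], bool],
    registry: List[str],
) -> List[str]:
    """Compare the two confidences; if the condition holds, register the words of
    the second result that are not registered yet (registry is updated in place)."""
    if condition(first.confidence, second.confidence):
        for word in second.words:
            if word not in registry:
                registry.append(word)
    return registry
```

In the embodiment described below, the condition is a threshold on the second confidence minus the first, for example `lambda a, b: b - a >= 0.2` (the value 0.2 is an arbitrary illustration).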

The device may further comprise an extraction means for extracting, when the predetermined condition is satisfied, word information to be registered from the second speech recognition result according to a predetermined criterion,
wherein the word information registration means registers the word information extracted by the extraction means.

The device may further comprise a classification means for classifying the word information extracted by the extraction means, according to its frequency of appearance, into one of a plurality of predetermined classes,
wherein the word information registration means registers the word information classified by the classification means class by class.

The word information may include voice information and character information, and the device may further comprise a first dictionary update means for updating the first dictionary by adding the word information registered by the word information registration means to the first dictionary,
wherein a new second dictionary is stored by a user operation each time the first dictionary is updated.

To achieve the above object, an information processing method according to a second aspect of the present invention comprises:
a speech recognition result receiving step of receiving a first speech recognition result based on a first dictionary and a second speech recognition result based on a second dictionary different from the first dictionary;
a confidence receiving step of receiving a first confidence of the first speech recognition result, calculated by a predetermined calculation, and a second confidence of the second speech recognition result, calculated by the same calculation; and
a word information registration step of comparing the first confidence with the second confidence and, when a predetermined condition is satisfied, registering word information contained in the second speech recognition result,
wherein the second dictionary includes, in addition to the word information registered in the first dictionary, word information specified by a user.

To achieve the above object, a program according to a third aspect of the present invention causes a computer to function as:
a speech recognition result receiving means for receiving a first speech recognition result based on a first dictionary and a second speech recognition result based on a second dictionary different from the first dictionary;
a confidence receiving means for receiving a first confidence of the first speech recognition result, calculated by a predetermined calculation, and a second confidence of the second speech recognition result, calculated by the same calculation; and
a word information registration means for comparing the first confidence with the second confidence and, when a predetermined condition is satisfied, registering word information contained in the second speech recognition result,
wherein the second dictionary includes, in addition to the word information registered in the first dictionary, word information specified by a user.

According to the present invention, a dictionary for speech recognition can be suitably generated.

FIG. 1 is a block diagram showing an example of an information processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of an information processing device according to the embodiment.
FIG. 3 is a block diagram showing an example of a speech recognition server according to the embodiment.
FIG. 4 is an explanatory diagram of the overall processing of the information processing system.
FIG. 5 is a flowchart showing an example of the term registration process.
FIG. 6 is a diagram showing an example of speech recognition results.
FIG. 7 is a diagram showing examples of the morphemes and parts of speech of speech recognition results.

The information processing device 100 of the present invention is described below using an example in which it is applied to the information processing system 1 shown in FIG. 1. In the information processing system 1, as shown in FIG. 1, information processing devices 100A and 100B and a speech recognition server 200 are communicably connected via a network 510. For ease of understanding, this embodiment describes the case where the user of information processing device 100A and the user of information processing device 100B converse with each other. Information processing devices 100A and 100B are also referred to simply as the information processing device 100.

The information processing device 100 is an information terminal (a so-called computer) such as a mobile phone, smartphone, tablet, or PC (Personal Computer), and the devices together form a distributed network 510 such as a P2P (Peer to Peer) network. The information processing system 1 is not limited to a P2P system and may, for example, be a cloud computing system.

The information processing device 100 outputs the voice data and text data (the speech recognition result) of the conversation of the user of another information processing device 100, as received from the speech recognition server 200. It also extracts word information to be registered from the speech recognition result on the basis of the confidence received from the speech recognition server 200, and registers that word information in the speech recognition dictionary.

The speech recognition server 200 is any computer device, such as a mainframe, a workstation, or a PC. It recognizes the speech (conversation content) sent from an information processing device 100 using a pre-stored speech recognition dictionary, and transmits the recognized voice data together with the text data (as the speech recognition result) to the other information processing device 100. It also calculates a confidence indicating the probability that the vocabulary obtained as the speech recognition result matches the vocabulary actually spoken, and transmits that confidence to the other information processing device 100.

Next, the configuration of the information processing device 100 (information processing devices 100A and 100B in FIG. 1) in this embodiment is described with reference to FIG. 2. Although not shown, the device is assumed to include a functional unit that converts the user's speech from analog form into digital voice data for transmission (and vice versa).

As shown in FIG. 2, the information processing device 100 includes a storage unit 110, a control unit 120, an input/output unit 130, a communication unit 140, and a system bus (not shown) connecting them to each other.

The storage unit 110 includes ROM (Read Only Memory), RAM (Random Access Memory), and the like. The ROM stores the programs executed by the CPU (Central Processing Unit) of the control unit 120 and the data required in advance to execute them (not shown).

Specifically, in this embodiment the storage unit 110 stores, as a registered term list 111, the voice data and text data of words to be registered in the speech recognition dictionary. A piece of voice data together with its corresponding text data is also referred to as word information. The registered term list 111 lists the word information to be registered and contains multiple items of word information, which the term registration process described later stores in the storage unit 110 class by class. The storage unit 110 also stores, as registration classes 112, a list of registration classes defined by the user together with their classification criteria. Examples of registration classes registered by the user include "universally used in-house terms" and "in-organization terms used within a specific organization." The classification criteria can be set freely by the user. For example, the number of times each candidate word appears in conversation may be recorded, and a word appearing five or more times classified as a "universally used in-house term," while a word appearing fewer than five times is classified as an "in-organization term used within a specific organization."
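The example classification criterion can be expressed as a small frequency-bucketing function. This is a hedged sketch: the class labels and the default threshold of five come from the example above, while the function name `classify_by_frequency` and the use of a plain list of observed words as the frequency source are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNIVERSAL = "universally used in-house term"
ORGANIZATIONAL = "in-organization term used within a specific organization"


def classify_by_frequency(occurrences: Iterable[str], min_universal: int = 5) -> Dict[str, List[str]]:
    """Bucket candidate words by how often they appeared in conversation.

    Words seen at least min_universal times go to UNIVERSAL, the rest to
    ORGANIZATIONAL; the threshold is user-configurable.
    """
    counts = Counter(occurrences)  # word -> number of appearances
    classes: Dict[str, List[str]] = {UNIVERSAL: [], ORGANIZATIONAL: []}
    for word, n in counts.items():
        key = UNIVERSAL if n >= min_universal else ORGANIZATIONAL
        classes[key].append(word)
    return classes
```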

The control unit 120 is composed of a CPU, an ASIC (Application Specific Integrated Circuit), and the like. It operates according to the programs stored in the storage unit 110 and executes processing in accordance with them. As its main functional units, provided by those programs, the control unit 120 includes a confidence comparison unit 121, a morpheme extraction unit 122, a part-of-speech estimation unit 123, a term classification unit 124, and a term registration unit 125.

The confidence comparison unit 121 compares the confidences transmitted from the speech recognition server 200. As described in detail later, the speech recognition server 200 transmits (i) the speech recognition result obtained with the first registered term list 211 as the speech recognition dictionary (first dictionary), namely text data based on the first registered term list 211 and its voice data, together with its confidence A (first confidence), and (ii) the speech recognition result obtained with the second registered term list 212 as the speech recognition dictionary (second dictionary), namely text data based on the second registered term list 212 and its voice data, together with its confidence B (second confidence). The confidence comparison unit 121 compares confidence A with confidence B by determining whether confidence B minus confidence A is equal to or greater than a predetermined threshold. The user may set different threshold values depending on, for example, the content of the meeting or the language used.
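The comparison performed by the confidence comparison unit 121 reduces to a single inequality, B − A ≥ threshold. In the sketch below, the function name `confidence_improved` and the per-language threshold table are hypothetical; the text only states that the user may choose different thresholds, for example by meeting content or language.

```python
from typing import Dict

# Hypothetical user-set thresholds keyed by language; the values are illustrative only.
THRESHOLDS: Dict[str, float] = {"ja": 0.2, "en": 0.15}


def confidence_improved(confidence_a: float, confidence_b: float, threshold: float) -> bool:
    """True when confidence B (second dictionary) minus confidence A (first
    dictionary) is equal to or greater than the threshold."""
    return confidence_b - confidence_a >= threshold
```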

The morpheme extraction unit 122 splits the speech recognition result obtained with the first registered term list 211 (first speech recognition result) and the one obtained with the second registered term list 212 (second speech recognition result) into morphemes, for example by morphological analysis, and extracts the morphemes that differ. Specifically, it removes from the morphemes of the second result those it shares with the morphemes of the first result, and the remaining morphemes are the differing ones.
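The subtraction step can be sketched as an order-preserving set difference. This sketch assumes both results have already been segmented into morphemes (for Japanese text, a morphological analyzer such as MeCab would typically do this; no analyzer is called here), and the function name `differing_morphemes` is hypothetical.

```python
from typing import List


def differing_morphemes(first_morphemes: List[str], second_morphemes: List[str]) -> List[str]:
    """Morphemes of the second result minus those shared with the first result,
    kept in their order of appearance in the second result, without duplicates."""
    common = set(first_morphemes)
    seen = set()
    diff: List[str] = []
    for m in second_morphemes:
        if m not in common and m not in seen:
            seen.add(m)
            diff.append(m)
    return diff
```

For instance, if the first dictionary splits an unknown product name into two generic words while the second dictionary recognizes it as one term, only that term survives the subtraction.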

The part-of-speech estimation unit 123 extracts morphemes whose part of speech differs by comparing the parts of speech of the morphemes in the first and second speech recognition results. Specifically, it compares the morphemes of the two results and extracts those that are nouns in the second speech recognition result but not nouns in the first. In other words, the morpheme extraction unit 122 extracts from the second result morphemes whose wording (character string) differs from the first result, whereas the part-of-speech estimation unit 123 extracts from the second result morphemes whose part of speech differs from the first result; the former works from the viewpoint of character strings, the latter from the viewpoint of parts of speech.
Special terms such as "universally used in-house terms" and "in-organization terms used within a specific organization" are usually nouns. For this reason, the part-of-speech estimation unit 123 in this embodiment extracts morphemes that are nouns in the second speech recognition result but not nouns in the first. Alternatively, morphemes with differing parts of speech may simply be output to the input/output unit 130, and the user may choose whether to extract them.
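The text does not say how morphemes of the two results are paired before their parts of speech are compared. The sketch below assumes one simple reading: a noun in the second result is extracted when the same surface form does not appear as a noun anywhere in the first result (either it is absent or it was tagged differently). The tag string `"noun"` and the function name `nouns_only_in_second` are hypothetical.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (surface form, part-of-speech tag)


def nouns_only_in_second(first: List[Tagged], second: List[Tagged], noun_tag: str = "noun") -> List[str]:
    """Surface forms tagged as nouns in the second result but never tagged as
    nouns in the first result, in order of appearance, without duplicates."""
    first_nouns = {surface for surface, pos in first if pos == noun_tag}
    result: List[str] = []
    for surface, pos in second:
        if pos == noun_tag and surface not in first_nouns and surface not in result:
            result.append(surface)
    return result
```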

The term classification unit 124 determines whether a morpheme extracted by the morpheme extraction unit 122 matches a morpheme extracted by the part-of-speech estimation unit 123. If they match, it designates the morpheme as a registration target and classifies the word information of that morpheme according to the registration classes 112. Specifically, for each matching morpheme, the term classification unit 124 assigns the word information to one of the registered classes based on its frequency of appearance, following the classification criteria set in the registration classes 112.
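Combining the two extraction paths and the frequency criterion might look like the sketch below. It assumes frequencies are counted over a list of words observed in the conversation and reuses the example threshold of five; the function name, parameter names, and the short class keys are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def select_and_classify(
    string_diff: List[str],
    pos_diff: List[str],
    conversation_words: List[str],
    min_universal: int = 5,
) -> Dict[str, List[str]]:
    """Keep morphemes found by both extraction paths, then bucket them by frequency."""
    pos_set = set(pos_diff)
    targets = [m for m in string_diff if m in pos_set]  # must be extracted by both units
    counts = Counter(conversation_words)                 # assumed frequency source
    classes: Dict[str, List[str]] = {"universal": [], "organizational": []}
    for m in targets:
        key = "universal" if counts[m] >= min_universal else "organizational"
        classes[key].append(m)
    return classes
```

Requiring agreement between the string-based and the part-of-speech-based extraction is what filters out morphemes that differ only because of segmentation noise.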

The term registration unit 125 registers the word information classified by the term classification unit 124 in the registered term list 111, class by class. It also transmits to the speech recognition server 200 an update instruction that causes the first registered term list 211 to be updated based on the word information registered in the registered term list 111. The term registration unit 125 serves as the word information registration means.
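Class-by-class registration followed by the update of the first registered term list can be sketched as two small functions. The network update instruction to the speech recognition server 200 is omitted; `update_first_dictionary` simply returns the updated list. The function names and the representation of voice data as reference strings are hypothetical.

```python
from typing import Dict, List


def register_terms(registered: Dict[str, List[str]], classified: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Add classified words to the registered term list class by class, skipping
    duplicates; `registered` is updated in place and also returned."""
    for cls, words in classified.items():
        bucket = registered.setdefault(cls, [])
        for w in words:
            if w not in bucket:
                bucket.append(w)
    return registered


def update_first_dictionary(
    first: Dict[str, str], registered: Dict[str, List[str]], voice_refs: Dict[str, str]
) -> Dict[str, str]:
    """Return a copy of the first dictionary extended with every registered word
    (text -> reference to its voice data)."""
    updated = dict(first)
    for words in registered.values():
        for w in words:
            updated[w] = voice_refs[w]
    return updated
```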

Working together, these functional units give the information processing device 100 the function of registering the word information to be registered in the speech recognition dictionary.

The input/output unit 130 consists of a keyboard, mouse, camera, microphone, liquid crystal display, organic EL (Electro-Luminescence) display, and the like, and is used to input and output data.

The communication unit 140 is a device for communicating with other information processing devices 100 and with the speech recognition server 200 via the network 510.

This concludes the configuration of the information processing device 100. Next, the configuration of the speech recognition server 200 in this embodiment is described with reference to FIG. 3. As shown in FIG. 3, the speech recognition server 200 includes a storage unit 210, a control unit 220, an input/output unit 230, a communication unit 240, and a system bus (not shown) connecting them to each other.

The storage unit 210 includes ROM, RAM, and the like. The ROM stores the programs executed by the CPU of the control unit 220 and the data required in advance to execute them (not shown).

Specifically, in this embodiment the storage unit 210 stores a first registered term list 211 and a second registered term list 212 as speech recognition dictionaries. The first registered term list 211 is a list of word information whose entries are updated each time the term registration process described later is executed. The initial first registered term list 211 only needs to be a list of word information centered on general-purpose, common vocabulary; it may, for example, be created by the user or obtained by downloading a list published on the network.

The second registered term list 212, on the other hand, is a list of word information created by the user so as to contain more word information for special terms, such as "universally used in-house terms" and "in-organization terms used within a specific organization," than the first registered term list 211. The user may create the second registered term list 212 for each meeting or lecture, for example based on the materials for the scheduled meeting or lecture. In the information processing device 100 of this embodiment, a new second registered term list 212 is stored, for example, for each meeting (in other words, each time the first registered term list 211 is updated), and the term registration process described later is then performed. In that process, the target word information is registered by comparing the second registered term list 212 with the first registered term list 211. Special terms such as "universally used in-house terms" can therefore be registered in the speech recognition dictionary appropriately, and repeating the process keeps the speech recognition dictionary up to date.
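The per-meeting cycle can be sketched as a loop in which a fresh second list is built from the current first list plus that meeting's user-specified terms, and whatever the registration step accepts is folded back into the first list. Here the whole term registration process is abstracted as a caller-supplied function; `dictionary_cycle` and `register` are hypothetical names, and term lists are reduced to sets of words.

```python
from typing import Callable, Iterable, Set


def dictionary_cycle(
    first: Set[str],
    meetings: Iterable[Set[str]],
    register: Callable[[Set[str], Set[str]], Set[str]],
) -> Set[str]:
    """Run one registration round per meeting and return the grown first list."""
    current = set(first)  # do not mutate the caller's set
    for meeting_terms in meetings:
        second = current | meeting_terms        # fresh second list for this meeting
        registered = register(current, second)  # stand-in for the term registration process
        current |= registered                   # update the first list
    return current
```

Because each round starts from the already updated first list, terms accepted in an earlier meeting no longer count as differences in later ones.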

The control unit 220 is composed of a CPU, an ASIC, and the like. It operates according to the programs stored in the storage unit 210 and executes processing in accordance with them. As its main functional units, provided by those programs, the control unit 220 includes a speech recognition processing unit 221 and a confidence calculation unit 222.

The speech recognition processing unit 221 converts voice data received from an information processing device 100 into both text data based on the first registered term list 211 and text data based on the second registered term list 212. This conversion may use any conventional speech recognition technique, applied with the first registered term list 211 and the second registered term list 212 respectively. The speech recognition processing unit 221 also transmits each piece of converted text data, together with the voice data, to the other information processing device 100.

The confidence calculation unit 222 calculates the confidence of the text data produced by the speech recognition processing unit 221. Specifically, it calculates confidence A for the text data based on the first registered term list 211 and confidence B for the text data based on the second registered term list 212. The confidence may be calculated (this being the predetermined calculation) from, for example, the similarity between the acoustic features (waveform, period, etc.) of the word information registered in the first registered term list 211 or the second registered term list 212 and the acoustic features of the received voice data. The confidence calculation unit 222 also transmits each calculated confidence to the other information processing device 100.
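The text leaves the similarity measure open. One plausible instance, assumed here and not stated in the source, treats each word's registered voice data as a feature vector template, compares it with the feature vector of the corresponding observed speech segment by cosine similarity, and averages over the recognized words. The same function would be applied once with the templates of the first list (confidence A) and once with those of the second list (confidence B). All names and the vector representation are hypothetical; real systems derive confidence from the recognizer's own scores.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def result_confidence(
    recognized_words: List[str],
    observed_features: List[Sequence[float]],
    templates: Dict[str, Sequence[float]],
) -> float:
    """Mean similarity between each recognized word's dictionary template and the
    features observed for the corresponding speech segment."""
    if len(recognized_words) != len(observed_features):
        raise ValueError("one feature vector is needed per recognized word")
    if not recognized_words:
        return 0.0
    scores = [cosine_similarity(templates[w], f) for w, f in zip(recognized_words, observed_features)]
    return sum(scores) / len(scores)
```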

Working together, these functional units give the speech recognition server 200 the function of converting (recognizing) voice data received from an information processing device 100 into each kind of text data and transmitting it, together with the voice data, to the other information processing device 100 as the speech recognition result. They also provide the function of transmitting the confidences to the other information processing device 100.

The input/output unit 230 is a device for inputting and outputting data. It is composed of a keyboard, mouse, camera, microphone, liquid crystal display, organic EL display, and the like.

The communication unit 240 is a device for communicating with the information processing devices 100 via the network 510.

This concludes the description of the configuration of the voice recognition server 200. Next, the operation of the information processing devices 100 and related components is described with reference to FIGS. 4 to 7. First, the overall processing flow of the information processing system 1 is described with reference to FIG. 4. The description below uses, as an example, the case where the user of information processing device 100B speaks the content of example sentence 1 to the user of information processing device 100A.

As shown in FIG. 4, when the user of information processing device 100B speaks example sentence 1 into the input/output unit 130, the control unit 120 converts the speech into voice data and transmits the voice data to the voice recognition server 200 ((1) in FIG. 4). For ease of understanding, the illustrated example shows the voice data being sent from information processing device 100B directly to the voice recognition server 200. Alternatively, information processing device 100B may send the voice data to information processing device 100A, which then extracts specific voice data and sends it to the voice recognition server 200.

When the voice recognition server 200 receives the voice data from information processing device 100B, it first uses the voice recognition processing unit 221 to perform speech recognition (conversion to text data) based on the first registered term list 211. It then transmits the voice data and the text data to information processing device 100A as the first voice recognition result ((2) in FIG. 4). The voice recognition server 200 also uses the certainty calculation unit 222 to calculate certainty A of the recognition based on the first registered term list 211, and transmits it to information processing device 100A ((3) in FIG. 4).

Likewise, the voice recognition server 200 uses the voice recognition processing unit 221 to perform speech recognition (conversion to text data) based on the second registered term list 212. It transmits the voice data and the text data to information processing device 100A as the second voice recognition result ((4) in FIG. 4). The server also uses the certainty calculation unit 222 to calculate certainty B of the recognition based on the second registered term list 212, and transmits it to information processing device 100A ((5) in FIG. 4). Steps (2) to (5) in FIG. 4 may be performed collectively.
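Steps (2) to (5) amount to running one recognizer twice, once per term list, and returning each transcript with its certainty. The sketch below bundles them into a single call, as the description allows. The following are hypothetical placeholders for whatever conventional speech recognition engine is used:

- the `Recognizer` callable,
- the `RecognitionResult` record,
- the representation of a term list as a collection of strings.

```python
from dataclasses import dataclass
from typing import Callable, Collection, Tuple

# Hypothetical engine interface: returns (text, certainty) for the given
# voice data recognized against the given term list.
Recognizer = Callable[[bytes, Collection[str]], Tuple[str, float]]


@dataclass(frozen=True)
class RecognitionResult:
    voice: bytes      # the original voice data, sent back with the text
    text: str         # transcript produced with one term list
    certainty: float  # certainty A or B for that transcript


def recognize_with_both(voice: bytes,
                        first_list: Collection[str],
                        second_list: Collection[str],
                        recognize: Recognizer) -> Tuple[RecognitionResult, RecognitionResult]:
    """Run the same recognizer against both term lists and package each
    transcript with its certainty (steps (2) to (5) performed together)."""
    text_a, certainty_a = recognize(voice, first_list)
    text_b, certainty_b = recognize(voice, second_list)
    return (RecognitionResult(voice, text_a, certainty_a),
            RecognitionResult(voice, text_b, certainty_b))
```

Because the second term list contains the first list plus user-specified words, the two runs differ only in those words. That is what makes comparing their certainties meaningful.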

On the information processing device 100A side, the input/output unit 130 outputs the voice data and text data based on the second registered term list 212 received from the voice recognition server 200; that is, it outputs the content shown in FIG. 6(B). Information processing device 100A then receives the first and second voice recognition results, including certainties A and B, from the voice recognition server 200; this corresponds to the voice recognition result receiving means and the certainty receiving means. On receiving them, it performs a term registration process that registers special terms in the speech recognition dictionary. In other words, information processing device 100A registers, in the speech recognition dictionary, special terms contained in what the user of information processing device 100B says. The following describes a specific case:

- The voice data and text data shown in FIG. 6(A) are received as the first voice recognition result.
- The voice data and text data shown in FIG. 6(B) are received as the second voice recognition result.
- "NTT" in the second voice recognition result is registered as a special term.

The certainties are assumed to take the values shown in the figure.

FIG. 5 is a flowchart showing an example of the term registration process. In this process, information processing device 100A uses the certainty comparison unit 121 to determine whether certainty B minus certainty A is equal to or greater than a predetermined threshold, that is, whether the predetermined condition is satisfied (step S101). If the difference is below the threshold, information processing device 100A concludes that there is nothing to register and ends the term registration process. Specifically, in step S101, the certainty 0.16 shown in FIG. 6(A) is subtracted from the certainty 0.89 shown in FIG. 6(B), and the result is compared with the threshold. In this example, the user is assumed to have set the threshold to 0.5 in advance, so the difference of 0.73 satisfies the condition.
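Step S101 reduces to a single comparison. The following is a minimal sketch using the 0.5 threshold and the FIG. 6 certainties from the example. The function name is hypothetical.

```python
def should_register(certainty_a: float, certainty_b: float,
                    threshold: float = 0.5) -> bool:
    """Step S101: proceed only if certainty B exceeds certainty A by at
    least `threshold` (0.5 is the user-set value used in the example)."""
    return certainty_b - certainty_a >= threshold


# Values from FIG. 6: 0.89 - 0.16 = 0.73, which is at least 0.5.
print(should_register(certainty_a=0.16, certainty_b=0.89))  # True
```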

If the difference is equal to or greater than the threshold (step S101: Yes), information processing device 100A uses the morpheme extraction unit 122 to divide each of the first and second voice recognition results received from the voice recognition server 200 into morphemes. It then extracts, from the second voice recognition result, the morphemes that differ from those of the first (step S102). Step S102 may be carried out in either of two ways:

- The text data of the first voice recognition result is divided into morphemes and the differing morphemes are extracted. The portion of the voice data corresponding to those morphemes is then extracted.
- Both the text data and the voice data of the first voice recognition result are divided into morphemes, and the differing morphemes are extracted from each.

Specifically, in step S102, the morpheme "Venditti" shown in FIGS. 6(A) and 7(A) differs from the morpheme "NTT" shown in FIGS. 6(B) and 7(B). The morpheme "NTT" shown in FIGS. 6(B) and 7(B) is therefore extracted. "Venditti" in FIGS. 6(A) and 7(A) is a coined word used in this embodiment for ease of understanding, and its part of speech is assumed to be an adjective. In the following, it is also assumed that "NTT" had appeared 5 times before example sentence 1 and has now appeared 6 times.
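This sketch does not cover morphological analysis itself. It assumes both transcripts have already been split into morphemes (for Japanese text, typically by a morphological analyzer). Given the two token lists, step S102 can be approximated as follows:

1. Align the lists with Python's `difflib.SequenceMatcher`.
2. Keep the tokens of the second result that fall in replaced or inserted regions.

The alignment method is an assumption. The example tokens are made up and are not the content of FIG. 6.

```python
from difflib import SequenceMatcher
from typing import List


def differing_morphemes(first: List[str], second: List[str]) -> List[str]:
    """Step S102 sketch: morphemes of the second result that have no
    counterpart in the first result after alignment."""
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    diff: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):  # tokens in `second` not matched in `first`
            diff.extend(second[j1:j2])
    return diff


# Made-up token lists, not the contents of FIG. 6.
print(differing_morphemes(["member", "of", "Venditti"], ["member", "of", "NTT"]))  # ['NTT']
```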

After step S102, information processing device 100A uses the part-of-speech estimation unit 123 to compare the morphemes of the first voice recognition result with those of the second. It then extracts morphemes that are nouns in the second voice recognition result but not nouns in the first (step S103). As described above, step S103 may instead simply output morphemes with differing parts of speech to the input/output unit 130 and let the user choose whether to extract them. Specifically, in step S103, the part of speech of "Venditti" shown in FIG. 7(A) is "adjective" and that of "NTT" shown in FIG. 7(B) is "noun", so the morpheme "NTT" shown in FIG. 7(B) is extracted. In this embodiment, prepositions such as "of" shown in FIG. 7 are excluded from the comparison. Such parts of speech are unnecessary from the standpoint of registration in a speech recognition dictionary.
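Step S103 can be sketched as a position-by-position comparison of part-of-speech tags. The sketch makes several assumptions. The two results have already been aligned morpheme by morpheme (for example, by the alignment done in step S102). Tags are plain strings such as "noun". The tag names, the `IGNORED_POS` set and the example tokens are hypothetical.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech)

# Parts of speech excluded from the comparison, e.g. the preposition "of".
IGNORED_POS = {"preposition"}


def noun_candidates(first: List[Morpheme], second: List[Morpheme]) -> List[str]:
    """Step S103 sketch: surfaces tagged as nouns in the second result whose
    aligned counterpart in the first result is not a noun."""
    if len(first) != len(second):
        raise ValueError("results must be aligned morpheme by morpheme")
    candidates: List[str] = []
    for (_, pos_a), (surface_b, pos_b) in zip(first, second):
        if pos_a in IGNORED_POS or pos_b in IGNORED_POS:
            continue
        if pos_b == "noun" and pos_a != "noun":
            candidates.append(surface_b)
    return candidates


first = [("member", "noun"), ("of", "preposition"), ("Venditti", "adjective")]
second = [("member", "noun"), ("of", "preposition"), ("NTT", "noun")]
print(noun_candidates(first, second))  # ['NTT']
```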

After step S103, information processing device 100A uses the term classification unit 124 to determine whether the morphemes extracted in step S102 match those extracted in step S103 (step S104). If they do not match (step S104: No), the term registration process ends. Alternatively, when they do not match, the user may be asked whether to register in the registered term list 111 the word information for the morphemes extracted in step S102 and in step S103. If neither is to be registered, the term registration process ends. If at least one is to be registered, processing proceeds to step S105.

This embodiment checks whether the morphemes extracted in steps S102 and S103 match. Other variations are also possible:

- Only step S102, or only step S103, may be performed before proceeding to step S105.
- Steps S102 to S104 may be skipped entirely, so that processing goes straight to step S105 when step S101 yields Yes. In that case, the voice recognition server 200 may, for example, transmit a certainty for each morpheme.
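Step S104 keeps only the morphemes that both checks flagged. The following minimal sketch computes a set intersection that preserves the order of the step S102 output. An empty result corresponds to the "No" branch that ends the process. The function name is hypothetical.

```python
from typing import Iterable, List


def registration_targets(s102: Iterable[str], s103: Iterable[str]) -> List[str]:
    """Step S104 sketch: morphemes flagged by both step S102 and step S103,
    in step S102 order and without duplicates. An empty list means 'No'."""
    flagged_by_s103 = set(s103)
    seen = set()
    targets: List[str] = []
    for morpheme in s102:
        if morpheme in flagged_by_s103 and morpheme not in seen:
            seen.add(morpheme)
            targets.append(morpheme)
    return targets
```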

When the morphemes are determined to match (step S104: Yes), information processing device 100A uses the term classification unit 124 to designate the word information for the extracted morpheme as a registration target. It then classifies that word information according to the registration classification 112 (step S105). Specifically, in step S105, the word information "NTT" has now appeared 6 times, so "NTT" is classified as a "universally used in-house term". A "universally used in-house term" includes, for example, a term used in common across multiple projects.
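The description gives only one data point for step S105: a term seen 6 times is classified as a "universally used in-house term". The sketch below assumes a table of minimum occurrence counts. The cut-offs are hypothetical, as is every class name other than "universally used in-house term".

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical table of (minimum occurrences, classification), checked in order.
# Only "universally used in-house term" appears in the description; the
# cut-offs and the other names are illustrative assumptions.
CLASSIFICATIONS: List[Tuple[int, str]] = [
    (6, "universally used in-house term"),
    (2, "project-specific term"),
    (1, "provisional term"),
]


def classify(term: str, counts: Counter) -> str:
    """Step S105 sketch: record one more occurrence of `term`, then return
    the first classification whose minimum count it reaches."""
    counts[term] += 1
    for minimum, name in CLASSIFICATIONS:
        if counts[term] >= minimum:
            return name
    raise AssertionError("unreachable: a counted term has at least 1 occurrence")


# "NTT" had appeared 5 times; example sentence 1 makes it 6.
history = Counter({"NTT": 5})
print(classify("NTT", history))  # universally used in-house term
```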

After step S105, information processing device 100A uses the term registration unit 125 to register the voice data and text data of the word information classified in step S105 in the registered term list 111, according to that classification (step S106). Specifically, in step S106, "NTT" has been classified as a "universally used in-house term". Its voice data and text data are associated with each other and registered under that classification in the registered term list 111.
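Step S106 stores the voice data and text data of a term together under its classification. Below is a minimal in-memory sketch of registered term list 111. The `WordInfo` and `RegisteredTermList` names and the duplicate check are assumptions. A real system would presumably persist the list in storage unit 110.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WordInfo:
    text: str     # character information, e.g. "NTT"
    voice: bytes  # voice data for the same term


class RegisteredTermList:
    """Hypothetical in-memory stand-in for registered term list 111."""

    def __init__(self) -> None:
        self._by_class: Dict[str, List[WordInfo]] = defaultdict(list)

    def register(self, classification: str, word: WordInfo) -> None:
        """Step S106: store text and voice together under their classification."""
        entries = self._by_class[classification]
        if word not in entries:  # skip exact duplicates
            entries.append(word)

    def entries(self, classification: str) -> List[WordInfo]:
        """Return a copy of the words registered under `classification`."""
        return list(self._by_class.get(classification, []))
```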

After step S106, information processing device 100A uses the term registration unit 125 to send the voice recognition server 200 an update instruction (step S107). The instruction asks the server to update the first registered term list 211 based on the word information registered in the registered term list 111. The device then ends the term registration process. Specifically, in step S107, the voice data and text data of "NTT" registered under the "universally used in-house term" classification in the registered term list 111 are sent to the voice recognition server 200 with the update instruction. The server then adds that voice data and text data to the first registered term list 211 stored on it. The content of the first registered term list 211 is thereby updated.
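Step S107 sends the newly registered words to the voice recognition server 200, which adds them to the first registered term list 211. The description does not specify a message format. This sketch therefore assumes a JSON message with base64-encoded voice data. The message type string and both function names are hypothetical.

```python
import base64
import json
from typing import Dict, List, Set, Tuple

MESSAGE_TYPE = "update_first_term_list"  # hypothetical message type


def build_update_instruction(words: List[Tuple[str, bytes]]) -> str:
    """Client side (step S107): serialize (text, voice data) pairs."""
    return json.dumps({
        "type": MESSAGE_TYPE,
        "words": [
            {"text": text, "voice": base64.b64encode(voice).decode("ascii")}
            for text, voice in words
        ],
    })


def apply_update_instruction(first_list: Dict[str, Set[bytes]], message: str) -> None:
    """Server side: add every word in the instruction to the first term list."""
    payload = json.loads(message)
    if payload.get("type") != MESSAGE_TYPE:
        raise ValueError("not an update instruction")
    for word in payload["words"]:
        voice = base64.b64decode(word["voice"])
        first_list.setdefault(word["text"], set()).add(voice)
```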

Returning to FIG. 4, the voice recognition server 200 updates the first registered term list 211 in response to the update instruction received from information processing device 100A. Although not illustrated, the user of information processing device 100A may then speak to the user of information processing device 100B. In that case, the control unit 120 of information processing device 100A converts the speech into voice data and sends it to the voice recognition server 200. The term registration process is then performed on the information processing device 100B side, and the first registered term list 211 on the voice recognition server 200 is updated. This processing repeats until the conversation, such as a meeting or lecture, ends. The term registration process runs for each turn of the conversation and updates the first registered term list 211 each time. As a result, the speech recognition dictionary is updated in real time and can be generated suitably. Although this embodiment uses a conversation between two parties as an example, the same applies to three or more parties. The dictionary generated in this way can also be used as a dictionary for known Japanese input software.

(Modifications)
The present invention is not limited to the above embodiment; various modifications and applications are possible. For example, the information processing device 100 need not include every technical feature described in the above embodiment. It may include only part of the configuration described there, provided that it solves at least one problem of the prior art. The modifications below may also be combined, at least in part.

In the above embodiment, step S107 of FIG. 5 is executed within the term registration process. Alternatively, it may be executed once, when a series of conversations ends, such as at the end of a meeting or lecture. Several triggers are possible:

- The user operates the input/output unit 130 when the conversation ends, and step S107 of FIG. 5 is executed in response.
- A specific predetermined utterance such as "end" (several may be set) is received, the conversation is judged to have ended, and step S107 of FIG. 5 is executed.
- Step S107 of FIG. 5 is executed each time a user-set number of pieces of word information have been registered in the registered term list 111.

Each of these approaches reduces the load of updating the first registered term list 211.
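The deferred-update modification can be sketched as a small buffer placed in front of step S107. The description presents the three triggers (a user operation, a predetermined end keyword, and a user-set number of registered words) as alternatives. This sketch simply supports all three at once. The class name, the default batch size of 10 and the `send` callback are assumptions.

```python
from typing import Callable, Iterable, List


class DeferredUpdater:
    """Buffers registered words and sends them in one update instruction
    when a trigger fires, instead of after every registration."""

    def __init__(self, send: Callable[[List[str]], None],
                 batch_size: int = 10,
                 end_keywords: Iterable[str] = ("end",)) -> None:
        self._send = send                # hypothetical: performs step S107
        self._batch_size = batch_size    # user-set number of words
        self._end_keywords = {k.lower() for k in end_keywords}
        self._pending: List[str] = []

    def on_registered(self, word: str) -> None:
        """Call after each registration in list 111; flushes at batch_size."""
        self._pending.append(word)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def on_utterance(self, text: str) -> None:
        """Flush when a predetermined end keyword is recognized."""
        if text.strip().lower() in self._end_keywords:
            self.flush()

    def flush(self) -> None:
        """Send all pending words at once; also usable for a user operation."""
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()
```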

A single word may also be read differently by different users. For example, some users read "PoC" as "pī-ō-shī" (P-O-C) and others as "pokku" ("pock"). For such words, multiple pieces of voice data corresponding to a single text entry may be registered in advance in the second registered term list 212. In step S106 of FIG. 5, those multiple pieces of voice data corresponding to the single text entry are then registered in the registered term list 111. Step S107 then instructs the server to update the first registered term list 211 with that content. This allows a suitable speech recognition dictionary to be generated even when users read the same word differently.
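Registering several readings for one text entry amounts to a one-to-many mapping. In this sketch, romanized reading strings stand in for the voice data. That substitution is an assumption made only to keep the example self-contained. The function names are hypothetical.

```python
from typing import Dict, Set

# text entry -> set of readings (romanized strings standing in for voice data)
TermList = Dict[str, Set[str]]


def register_readings(term_list: TermList, text: str, readings: Set[str]) -> None:
    """Associate one or more readings with a single text entry."""
    term_list.setdefault(text, set()).update(readings)


def texts_for(term_list: TermList, reading: str) -> Set[str]:
    """Return every text entry registered with the given reading."""
    return {text for text, known in term_list.items() if reading in known}


terms: TermList = {}
register_readings(terms, "PoC", {"pi-o-shi", "pokku"})
print(texts_for(terms, "pokku"))  # {'PoC'}
```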

The information processing device 100 may also include the configuration of the voice recognition server 200 of the above embodiment. In that case, in step S107 of FIG. 5, the device updates the first registered term list 211 stored in its own storage unit 110. It then synchronizes that list with the first registered term lists 211 stored in the other information processing devices 100.
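The description does not say how the copies of the first registered term list 211 on different devices are synchronized. One simple option, assumed here, brings every copy to the union of all copies. This handles additions only and would need extending if entries could be deleted. The names are hypothetical.

```python
from typing import Dict, List, Set

# text entry -> set of voice data registered for it
TermList = Dict[str, Set[bytes]]


def synchronize(lists: List[TermList]) -> None:
    """Make every device's copy equal to the union of all copies, in place.
    Each copy receives its own sets, so later local edits do not leak."""
    merged: TermList = {}
    for term_list in lists:
        for text, voices in term_list.items():
            merged.setdefault(text, set()).update(voices)
    for term_list in lists:
        term_list.clear()
        for text, voices in merged.items():
            term_list[text] = set(voices)
```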

When the above functions are realized by dividing the work between an OS (Operating System) and an application, or through cooperation between them, only the portion other than the OS may be stored in the medium.

The program can also be superimposed on a carrier wave and distributed via a communication network. For example, the program may be posted on a bulletin board system (BBS) on a communication network and distributed via the network. The above processing may then be performed by launching the program and executing it under the control of the operating system, like any other application program.

1 information processing system; 100, 100A, 100B information processing device; 110, 210 storage unit; 111 registered term list; 112 registration classification; 120, 220 control unit; 121 certainty comparison unit; 122 morpheme extraction unit; 123 part-of-speech estimation unit; 124 term classification unit; 125 term registration unit; 130, 230 input/output unit; 140, 240 communication unit; 200 voice recognition server; 211 first registered term list; 212 second registered term list; 221 voice recognition processing unit; 222 certainty calculation unit; 510 network

Claims

An information processing device comprising:
a voice recognition result receiving means for receiving a first voice recognition result based on a first dictionary and a second voice recognition result based on a second dictionary different from the first dictionary;
a certainty receiving means for receiving a first certainty of the first voice recognition result, calculated by a predetermined calculation, and a second certainty of the second voice recognition result, calculated by the same calculation; and
a word information registration means for registering word information included in the second voice recognition result when a comparison between the first certainty and the second certainty satisfies a predetermined condition,
wherein the second dictionary includes, in addition to the word information registered in the first dictionary, word information specified by a user.

The information processing device according to claim 1, further comprising an extraction means for extracting, when the predetermined condition is satisfied, word information to be registered from the second voice recognition result according to a predetermined criterion,
wherein the word information registration means registers the word information extracted by the extraction means.

The information processing device according to claim 2, further comprising a classification means for classifying the word information extracted by the extraction means into one of a plurality of predetermined classifications according to its frequency of occurrence,
wherein the word information registration means registers the word information classified by the classification means for each classification.

The information processing device according to any one of claims 1 to 3, wherein the word information includes voice information and character information,
the device further comprising a first dictionary update means for updating the first dictionary by adding the word information registered by the word information registration means to the first dictionary,
wherein the second dictionary is newly stored by a user operation each time the first dictionary is updated.

An information processing method comprising:
a voice recognition result receiving step of receiving a first voice recognition result based on a first dictionary and a second voice recognition result based on a second dictionary different from the first dictionary;
a certainty receiving step of receiving a first certainty of the first voice recognition result, calculated by a predetermined calculation, and a second certainty of the second voice recognition result, calculated by the same calculation; and
a word information registration step of comparing the first certainty and the second certainty and registering word information included in the second voice recognition result when a predetermined condition is satisfied,
wherein the second dictionary includes, in addition to the word information registered in the first dictionary, word information specified by a user.

A program causing a computer to function as:
a voice recognition result receiving means for receiving a first voice recognition result based on a first dictionary and a second voice recognition result based on a second dictionary different from the first dictionary;
a certainty receiving means for receiving a first certainty of the first voice recognition result, calculated by a predetermined calculation, and a second certainty of the second voice recognition result, calculated by the same calculation; and
a word information registration means for registering word information included in the second voice recognition result when a comparison between the first certainty and the second certainty satisfies a predetermined condition,
wherein the second dictionary includes, in addition to the word information registered in the first dictionary, word information specified by a user.