JP2006301102A - Voice recognition device and program

Info

Publication number: JP2006301102A
Application number: JP2005119881A
Authority: JP
Inventors: Takashi Sudo; 貴志須藤
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2005-04-18
Filing date: 2005-04-18
Publication date: 2006-11-02
Anticipated expiration: 2025-04-18
Also published as: JP4749756B2

Abstract

PROBLEM TO BE SOLVED: To obtain a voice recognition device that can adapt system elements such as an acoustic model without holding information about the user in advance.

SOLUTION: A screen control description language analysis section 3 analyzes specific information included in a description language for screen control. An adaptation information determination section 9 holds adaptation information corresponding to the specific information and determines the adaptation information to use based on the analysis result of the screen control description language analysis section 3. A voice recognition section 8 recognizes, according to the adaptation information determined by the adaptation information determination section 9, voice input to a screen displayed based on the description language for screen control.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a speech recognition apparatus, and a program therefor, with which input to the input items on a screen is performed by voice.

In recent years, the number of people carrying portable terminals has increased rapidly, and demand has grown for ways of entering information into a terminal other than by keyboard. One such method is voice input. In speech recognition, various techniques have been proposed for adapting a system to the user and to the environment in which it is used.

For example, in one known technique the user is identified from a customer number or telephone number, and the acoustic model is switched, according to that user, among a plurality of acoustic models prepared in advance by gender, age, and region (see, for example, Patent Document 1).

Patent Document 1: JP 2000-347684 A

However, the conventional speech recognition apparatus needs to hold information about the user in advance. As a result, the acoustic model cannot be switched in a system used by an unspecified large number of people.

The present invention has been made to solve the above problem, and its object is to obtain a speech recognition apparatus that can adapt various system elements, such as an acoustic model, without holding information about the user in advance.

The speech recognition apparatus according to the present invention comprises: a screen control description language analysis unit that analyzes specific information included in a screen control description language; an adaptation information determination unit that determines adaptation information corresponding to the specific information based on the analysis result of the screen control description language analysis unit; and a speech recognition unit that, based on the adaptation information determined by the adaptation information determination unit, recognizes speech input for a screen displayed according to the screen control description language.

Because the speech recognition apparatus of the present invention performs speech recognition based on adaptation information corresponding to specific information included in the screen control description language, it can adapt various system elements, such as an acoustic model, without holding information about the user in advance.

Embodiment 1.
FIG. 1 is a block diagram showing a speech recognition apparatus according to Embodiment 1 of the present invention.
In the figure, the speech recognition apparatus comprises a screen control description language acquisition unit 1, a screen display unit 2, a screen control description language analysis unit 3, a system adaptation information holding unit 4, a system data holding unit 5, a system adaptation unit 6, a voice acquisition unit 7, and a speech recognition unit 8.

The screen control description language acquisition unit 1 is a functional unit that acquires a screen control description language for displaying a plurality of input items on the screen. The screen display unit 2 is a functional unit that displays the screen based on the screen control description language acquired by the screen control description language acquisition unit 1, and consists of a control unit for screen control and a display unit such as a display. The screen control description language analysis unit 3 is a functional unit that analyzes specific information included in the screen data, based on the screen control description language acquired by the screen control description language acquisition unit 1.

The system adaptation information holding unit 4 has an adaptation correspondence table that associates the specific information analyzed by the screen control description language analysis unit 3 with noise models provided in advance as adaptation information. It selects from this table the noise model information corresponding to the specific information and outputs it to the system data holding unit 5 as adaptation correspondence information. The system data holding unit 5 holds the actual noise models as adaptation models and outputs the noise model selected according to the adaptation correspondence information from the system adaptation information holding unit 4. The system adaptation unit 6 is a functional unit that determines the noise model corresponding to the screen and outputs it to the speech recognition unit 8 as adaptation information. Together, the system adaptation information holding unit 4 through the system adaptation unit 6 constitute an adaptation information determination unit 9, which determines the adaptation information corresponding to the specific information analyzed by the screen control description language analysis unit 3.

The voice acquisition unit 7 is a functional unit that acquires the user's voice, and consists of a voice input circuit and the like. The speech recognition unit 8 is a functional unit that performs speech recognition based on the noise model supplied as adaptation information by the system adaptation unit 6 and the voice information from the voice acquisition unit 7, and outputs the recognition result to the screen display unit 2.

In this and the following embodiments, the speech recognition apparatus of the present invention is realized using a computer. The screen control description language acquisition unit 1 through the speech recognition unit 8 (for the screen display unit 2, its control portion) are each realized by a program corresponding to the respective function and by hardware, such as a CPU and memory, for executing those programs.

Next, the operation of Embodiment 1 will be described.
In this embodiment, the screen control description language is HTML (HyperText Markup Language), a markup language supported by typical Web browsers. The example used is voice input on a page that has a plurality of input items for facility inspection work. The embodiment further describes a method of selecting and setting a noise model in which the URL (Uniform Resource Locator) of the HTML document is used as the specific information included in the screen control description language.

First, the example site will be described.
FIG. 2 shows the HTML document of the inspection page for facility 1.
Here, the URL of the inspection page for facility 1 is "http://www.tenken.ne.jp/place1.html".
FIG. 3 shows what the inspection page for facility 1 displays on screen. On this page, the temperature at the center of facility 1 is entered in input item 101, and the temperature at the exit of facility 1 is entered in input item 102. After input items 101 and 102 have been filled in, pressing the input determination button 103 submits the inspection data to the system.

FIG. 4 shows the HTML document of the inspection page for facility 2.
Here, the URL of the inspection page for facility 2 is "http://www.tenken.ne.jp/place2.html". FIG. 5 shows what the inspection page for facility 2 displays on screen. On this page, the temperature at the center of facility 2 is entered in input item 201, and the temperature at the exit of facility 2 is entered in input item 202. After input items 201 and 202 have been filled in, pressing the input determination button 203 submits the inspection data to the system.

FIG. 6 shows information about each facility.
In inspection work of this kind, the noise environment may differ greatly from one inspection location to another, as FIG. 6 shows. If speech recognition uses the same noise model across such different noise environments, the model no longer matches the data, and high recognition performance may not be obtained. To address this, in this embodiment the adaptation information determination unit 9 switches the system data (adaptation information) to be used according to the displayed screen. That is, the system adaptation information holding unit 4 selects the noise model corresponding to the information analyzed by the screen control description language analysis unit 3, that noise model is retrieved from the system data holding unit 5, and the system adaptation unit 6 outputs it as adaptation information.

Here, a noise model is obtained by collecting noise from the environment in which speech recognition is used and encoding its characteristics as information. As part of the acoustic model, the noise model is used in the probability calculation for the silent portions of the input data. In this embodiment, the acoustic model itself is assumed to be a default acoustic model.
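The patent does not specify the form of the noise model. As a minimal sketch, assuming a single diagonal-covariance Gaussian estimated from noise feature frames (a common textbook choice, not something stated in the source), the following shows why a noise model trained in the matching environment scores a silent frame more highly. The feature values and the names `fit_noise_model` and `log_likelihood` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed model form: (per-dimension means, per-dimension variances).
NoiseModel = Tuple[List[float], List[float]]


def fit_noise_model(frames: Sequence[Sequence[float]]) -> NoiseModel:
    """Estimate a diagonal Gaussian from noise-only feature frames."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        # The floor keeps the variance strictly positive for constant features.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(frame: Sequence[float], model: NoiseModel) -> float:
    """Log probability density of one frame under the diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, means, variances)
    )


# Hypothetical two-dimensional noise features from two environments.
quiet_office = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.1, 0.2]]
machine_room = [[2.0, 1.8], [2.2, 2.1], [1.9, 2.0], [2.1, 1.9]]

office_model = fit_noise_model(quiet_office)
machine_model = fit_noise_model(machine_room)

# A silent (speech-free) frame observed in the machine room.
silent_frame = [2.05, 1.95]
matched = log_likelihood(silent_frame, machine_model)
mismatched = log_likelihood(silent_frame, office_model)
```

With these made-up values, `matched` is far larger than `mismatched`, which is the mismatch effect the text describes for silent portions of the input.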

Next, the operation of this embodiment will be described using a specific example.
The system adaptation operation performed when the inspection page of FIG. 2 is displayed is described below with reference to the flowchart of FIG. 7.

First, in step ST101, the screen control description language acquisition unit 1 acquires display screen information from the network access destination whose URL is "http://www.tenken.ne.jp/place1.html". Here, it acquires the HTML document shown in FIG. 2, which describes the layout of the display screen.

In step ST102, the screen display unit 2 displays the page shown in FIG. 3 based on the HTML document of FIG. 2 acquired by the screen control description language acquisition unit 1.
In step ST103, the screen control description language analysis unit 3 analyzes the page information, such as the HTML document, acquired by the screen control description language acquisition unit 1. The analysis yields the URL of the HTML document, "http://www.tenken.ne.jp/place1.html".

The information associating URLs with the noise models held in the system data holding unit 5 is held in the system adaptation information holding unit 4 as the adaptation correspondence table, stored, for example, in the format shown in FIG. 8. As FIG. 8 shows, when the URL contains the character string "place1", facility 1 can be identified as the place where the system is being used.
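The lookup against the adaptation correspondence table can be sketched as a substring match. The contents of FIG. 8 are not reproduced in this text, so the rows below (in particular the "place3" row) are assumptions inferred from the facility names, and `lookup_model_name` is a hypothetical helper. The sketch matches only against the path portion of the URL, which is a narrower choice than the source's "the URL contains the character string".

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Assumed rows of the adaptation correspondence table:
# (string to look for in the URL, name of the noise model to use).
ADAPTATION_TABLE: List[Tuple[str, str]] = [
    ("place1", "noise model for facility 1"),
    ("place2", "noise model for facility 2"),
    ("place3", "noise model for facility 3"),
]


def lookup_model_name(url: str) -> Optional[str]:
    """Return the model name for the first table key found in the URL path."""
    path = urlparse(url).path  # e.g. "/place1.html"
    for key, model_name in ADAPTATION_TABLE:
        if key in path:
            return model_name
    return None  # no association; the caller falls back to a default model


facility1 = lookup_model_name("http://www.tenken.ne.jp/place1.html")
unknown = lookup_model_name("http://www.tenken.ne.jp/index.html")
```

Plain substring matching would also match a page such as "place10.html" against the "place1" row, so a real table would probably need stricter keys.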

The system data holding unit 5 is assumed to hold, for example, the noise models shown in FIG. 9 as adaptation models. The noise model for facility 1 is obtained by learning the environmental noise of facility 1, that is, noise such as telephone ringing and conversation. Likewise, the noise models for facility 2 and facility 3 are obtained by learning the environmental noise of facility 2 and facility 3, respectively. The default noise model is the noise model that is set when the analysis result of the screen control description language analysis unit 3 cannot be associated with any information held in the system adaptation information holding unit 4.

In step ST104, the URL obtained by the screen control description language analysis unit 3 is checked against the adaptation correspondence table of FIG. 8 held in the system adaptation information holding unit 4. Because the URL contains the character string "place1", "noise model for facility 1" is output to the system data holding unit 5 as adaptation correspondence information. The system data holding unit 5 accordingly selects "noise model for facility 1" as the noise model to use on the page shown in FIG. 3. In step ST105, the system adaptation unit 6 switches to the selected noise model.

As shown in FIG. 10, the noise model that has been set is used in the probability calculation process of the speech recognition unit 8, as part of the default acoustic model, for the calculation on the silent portions of the input data. The speech segment detection, acoustic feature extraction, probability calculation, and recognition candidate search using a recognition dictionary performed in the speech recognition unit 8 are all known processes, so their description is omitted here.

Through this operation, an appropriate noise model is set even in a noisy environment, so recognition performance improves. If the URL of the display screen obtained by the screen control description language analysis unit 3 is not found in the adaptation correspondence table held in the system adaptation information holding unit 4, information indicating this is output to the system data holding unit 5 as the adaptation correspondence information, the system data holding unit 5 outputs the default noise model to the system adaptation unit 6, and in step ST106 the system adaptation unit 6 sets this default noise model as the adaptation information.
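Steps ST104 to ST106 can be sketched as three cooperating objects mirroring units 4, 5, and 6. This is an illustration only: the class names, method names, and table rows are hypothetical, and the `NoiseModel` class merely stands in for real model parameters.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class NoiseModel:
    name: str  # placeholder for the actual model parameters


class AdaptationInfoHolder:
    """Unit 4: maps the analyzed URL to a model key, or None if no row matches."""

    def __init__(self, table: List[Tuple[str, str]]) -> None:
        self._table = table

    def correspondence_for(self, url: str) -> Optional[str]:
        for url_key, model_key in self._table:
            if url_key in url:
                return model_key
        return None


class SystemDataHolder:
    """Unit 5: holds the actual models and hands out the selected one."""

    def __init__(self, models: Dict[str, NoiseModel], default: NoiseModel) -> None:
        self._models = models
        self._default = default

    def select(self, model_key: Optional[str]) -> NoiseModel:
        if model_key is None:
            return self._default  # no association was found
        return self._models.get(model_key, self._default)


class SystemAdapter:
    """Unit 6: records the model currently set for the recognizer."""

    def __init__(self) -> None:
        self.current: Optional[NoiseModel] = None

    def apply(self, model: NoiseModel) -> NoiseModel:
        self.current = model
        return model


def adapt_for_page(url: str, info: AdaptationInfoHolder,
                   data: SystemDataHolder, adapter: SystemAdapter) -> NoiseModel:
    model_key = info.correspondence_for(url)  # ST104: check the table
    model = data.select(model_key)            # pick the model or the default
    return adapter.apply(model)               # ST105 (match) or ST106 (default)


info = AdaptationInfoHolder([("place1", "facility1"), ("place2", "facility2")])
data = SystemDataHolder(
    {"facility1": NoiseModel("noise model for facility 1"),
     "facility2": NoiseModel("noise model for facility 2")},
    default=NoiseModel("default noise model"),
)
adapter = SystemAdapter()
```

Calling `adapt_for_page("http://www.tenken.ne.jp/place2.html", info, data, adapter)` returns the facility 2 model, while a URL with no table entry returns the default model.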

The same processing is performed when the screen control description language acquisition unit 1 acquires display screen information from the network access destination whose URL is "http://www.tenken.ne.jp/place2.html". Because the URL of the display screen contains the character string "place2", "noise model for facility 2" is selected from the adaptation correspondence table held in the system adaptation information holding unit 4 as the noise model to use on the page shown in FIG. 5, and is output as adaptation correspondence information. Based on this information, the system data holding unit 5 selects "noise model for facility 2" from the noise models it holds, and the system adaptation unit 6 switches to the selected noise model.

As described above, identifying the place of use from the URL of the display screen makes it possible to set an appropriate noise model even when the noise environment differs greatly from one place of use to another.

In the description of this embodiment, the only information obtained by the screen control description language analysis unit 3 is the URL of the HTML document, but any information obtained by analyzing the display screen can be used. For example, a character string shown on the display screen can be used. Likewise, although HTML is used here as the screen control description language, the invention is not limited to it, and any description language can be used.
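As one illustration of using a character string from the page instead of the URL, the sketch below extracts the `<title>` text with Python's standard `html.parser` module and matches it against a table. The HTML of FIG. 2 is not reproduced in this text, so `SAMPLE_PAGE`, the table keys, and the choice of the title element are all assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class TitleExtractor(HTMLParser):
    """Collects the text that appears inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Assumed table keyed by text expected in the page title.
TEXT_TABLE: Dict[str, str] = {
    "Facility 1": "noise model for facility 1",
    "Facility 2": "noise model for facility 2",
}


def model_name_from_text(html: str) -> Optional[str]:
    title = page_title(html)
    for key, model_name in TEXT_TABLE.items():
        if key in title:
            return model_name
    return None


# Hypothetical inspection page; not the actual document of FIG. 2.
SAMPLE_PAGE = (
    "<html><head><title>Facility 1 inspection</title></head>"
    "<body><form><input name='center'><input name='exit'></form></body></html>"
)
```

Here `model_name_from_text(SAMPLE_PAGE)` yields the facility 1 entry; any other on-screen string, such as a heading, could be extracted the same way.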

As described above, the speech recognition apparatus of Embodiment 1 comprises: a screen control description language analysis unit that analyzes specific information included in the screen control description language; an adaptation information determination unit that holds adaptation information corresponding to the specific information and determines the adaptation information based on the analysis result of the screen control description language analysis unit; a voice acquisition unit that acquires speech input for the screen displayed according to the screen control description language; and a speech recognition unit that recognizes the speech acquired by the voice acquisition unit based on the adaptation information determined by the adaptation information determination unit. It is therefore possible to obtain a speech recognition apparatus that can adapt various system elements, such as an acoustic model, without holding information about the user in advance.

Further, in the speech recognition apparatus of Embodiment 1, the adaptation information determination unit determines a noise model corresponding to the display screen as the adaptation information, which has the additional effect that an appropriate noise model can be set even when the noise environment differs greatly from one place of use to another.

Further, the control program for the speech recognition apparatus of Embodiment 1 causes a computer to function as a speech recognition apparatus comprising: a screen control description language analysis unit that analyzes specific information included in the screen control description language; an adaptation information determination unit that determines adaptation information corresponding to the specific information based on the analysis result of the screen control description language analysis unit; and a speech recognition unit that, based on the adaptation information determined by the adaptation information determination unit, recognizes speech input for the screen displayed according to the screen control description language. This has the effect that a speech recognition apparatus capable of adapting various system elements, such as an acoustic model, without holding information about the user in advance can be realized on a computer.

Embodiment 2.
In Embodiment 2, a noise-superimposed acoustic model is prepared for each usage environment as the adaptation model, and the noise-superimposed acoustic model is switched according to the display screen.

The configuration of the speech recognition apparatus of Embodiment 2 is the same in the drawings as that of Embodiment 1, so it is described with reference to FIG. 1. In Embodiment 2, the adaptation correspondence table held by the system adaptation information holding unit 4 associates the specific information obtained by the screen control description language analysis unit 3 with noise-superimposed acoustic models. When specific information is supplied, the system adaptation information holding unit 4 outputs information on the corresponding noise-superimposed acoustic model as adaptation correspondence information, based on this table. The system data holding unit 5 holds noise-superimposed acoustic models as adaptation models; when it receives adaptation correspondence information from the system adaptation information holding unit 4, it selects and outputs the corresponding noise-superimposed acoustic model.

FIG. 11 is an explanatory diagram of the adaptation information table held by the system adaptation information holding unit 4.
FIG. 12 is an explanatory diagram of the noise-superimposed acoustic models held by the system data holding unit 5.
The noise-superimposed acoustic models shown in these figures describe information on the basic sound units used for speech recognition (consonants, vowels, and so on), learned from speech data on which noise has been superimposed in advance.
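The patent does not say how noise is superimposed on the training speech. One common approach, shown here purely as an assumption, is to scale a noise recording so that the mixture reaches a target signal-to-noise ratio and then add it sample by sample. The function names and the toy sample values are hypothetical.

```python
import math
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)


def superimpose_noise(speech: Sequence[float], noise: Sequence[float],
                      snr_db: float) -> List[float]:
    """Add noise to speech, scaled so the result has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be all zeros")
    # SNR(dB) = 10 * log10(speech_power / noise_power)
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals standing in for clean speech and facility noise.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = superimpose_noise(speech, noise, snr_db=10.0)

# Check the achieved SNR from the part that was added.
added = [m - s for m, s in zip(noisy, speech)]
achieved_snr_db = 10.0 * math.log10(mean_power(speech) / mean_power(added))
```

Speech mixed this way with each facility's recorded noise could then serve as training data for that facility's acoustic model; the training step itself is outside this sketch.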

The system adaptation unit 6 outputs the noise-superimposed acoustic model supplied by the system data holding unit 5 as adaptation information. The rest of the configuration is the same as in Embodiment 1 shown in FIG. 1.

Next, the operation of Embodiment 2 will be described. The pages and usage environments used in this description are the same as in Embodiment 1.

FIG. 13 is a flowchart showing the operation of determining the adaptation model in Embodiment 2.
As in Embodiment 1, the system adaptation operation performed when the inspection page of FIG. 2 is displayed is described.

First, in step ST201, the screen control description language acquisition unit 1 acquires display screen information from the network access destination whose URL is "http://www.tenken.ne.jp/place1.html". Here, it acquires the HTML document shown in FIG. 2, which describes the layout of the display screen.

In step ST202, the screen display unit 2 displays the page shown in FIG. 3 based on the HTML document of FIG. 2 acquired by the screen control description language acquisition unit 1. In step ST203, the screen control description language analysis unit 3 analyzes the page information, such as the HTML document, acquired by the screen control description language acquisition unit 1. The analysis yields the URL of the HTML document, "http://www.tenken.ne.jp/place1.html". The information associating URLs with the noise-superimposed acoustic models held in the system data holding unit 5 is held in the system adaptation information holding unit 4 as adaptation correspondence information, as shown in FIG. 11. As FIG. 11 shows, when the URL contains the character string "place1", facility 1 can be identified as the place where the system is being used.

The system data holding unit 5 holds, for example, the noise-superimposed acoustic models shown in FIG. 12. The noise-superimposed acoustic model for facility 1 is an acoustic model obtained by learning from speech data on which the environmental noise of facility 1 has been superimposed. Likewise, the noise-superimposed acoustic models for facility 2 and facility 3 are obtained by learning from speech data on which the environmental noise of facility 2 and facility 3, respectively, has been superimposed. The default noise-superimposed acoustic model is the model that is set when the analysis result of the screen control description language analysis unit 3 cannot be associated with any information held in the system adaptation information holding unit 4.

In step ST204, the URL obtained by the screen control description language analysis unit 3 is checked against the adaptation correspondence table of FIG. 11 held in the system adaptation information holding unit 4. Because the URL contains the character string "place1", "noise-superimposed acoustic model for facility 1" is selected as the model to use on the page shown in FIG. 3, and this information is output as adaptation correspondence information. Based on the adaptation correspondence information, the system data holding unit 5 selects the corresponding model from the noise-superimposed acoustic models it holds and outputs it. In step ST205, the system adaptation unit 6 switches to the supplied noise-superimposed acoustic model. As shown in FIG. 14, the noise-superimposed acoustic model that has been set is used in the probability calculation process of the speech recognition unit. An appropriate noise-superimposed acoustic model is thus set even in a noisy environment, so recognition performance improves.

If the URL of the display screen obtained from the analysis by the screen control description language analysis unit 3 is not found in the adaptation correspondence table shown in FIG. 11, which is held in the system adaptation information holding unit 4, the system adaptation unit 6 sets the default noise-superimposed acoustic model in step ST206.

As described above, identifying the place of use from the URL of the display screen makes it possible to set an appropriate noise-superimposed acoustic model even when the noise environment differs greatly from one place of use to another.
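The lookup in steps ST204 to ST206 can be sketched as follows. This is a minimal illustration, not the patented implementation: the contents of FIG. 11 are not reproduced in the text, so only the “place1” row comes from the description, while the “place2” and “place3” keys, the model identifiers, and the function name `select_noise_superimposed_model` are assumptions made for the example.

```python
# Adaptation correspondence table (cf. FIG. 11): URL substring -> model name.
# Only the "place1" row is stated in the text; the others are assumed.
ADAPTATION_TABLE = {
    "place1": "facility1_noise_superimposed_am",
    "place2": "facility2_noise_superimposed_am",  # assumed row
    "place3": "facility3_noise_superimposed_am",  # assumed row
}

# Model used when no table entry matches the URL (step ST206).
DEFAULT_MODEL = "default_noise_superimposed_am"


def select_noise_superimposed_model(url: str) -> str:
    """Return the model whose key appears in the URL, else the default."""
    for key, model in ADAPTATION_TABLE.items():
        if key in url:
            return model
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(select_noise_superimposed_model("http://www.tenken.ne.jp/place1.html"))
```

Substring matching mirrors the description's statement that the URL "contains" the character string; a real table might instead match on a parsed path component.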

In the description of this embodiment, only the URL of the HTML document is used as the information acquired by the screen control description language analysis unit 3, but any information obtained by analyzing the display screen can be used. For example, a character string appearing on the display screen can be used. Furthermore, although an HTML document is used as the screen control description language in this embodiment, any description language can be used.

As described above, in the speech recognition apparatus of the second embodiment, the adaptation information determination unit determines a noise-superimposed acoustic model corresponding to the display screen as the adaptation information, so an appropriate noise-superimposed acoustic model can be set even when the noise environment differs greatly depending on the place of use. That is, speech recognition performance can be improved even in cases where setting a noise model alone does not adapt the system sufficiently to the usage environment.

Embodiment 3
In the third embodiment, whether noise removal processing is performed is set based on the specific information analyzed by the screen control description language analysis unit 3. When the environmental noise can be anticipated, performing noise removal processing may be more effective for improving recognition performance. In this embodiment, therefore, the information acquired by the screen control description language analysis unit 3 is used to set whether noise removal processing is performed.

The configuration of the speech recognition apparatus according to the third embodiment is the same in the drawings as that of FIG. 1, except that the system data holding unit 5 is not provided, so the description refers to FIG. 1.
The system adaptation information holding unit 4 of the third embodiment includes an adaptation correspondence table that indicates whether noise removal processing is performed for the specific information analyzed by the screen control description language analysis unit 3.

FIG. 15 is an explanatory diagram of the adaptation correspondence table held by the system adaptation information holding unit 4.
As illustrated, the place of use and the presence or absence of noise removal processing are associated with the URL serving as the specific information.
The system adaptation unit 6 of the third embodiment is configured to output the presence or absence of noise removal processing as adaptation information, based on the information indicating the presence or absence of noise removal processing determined by the system adaptation information holding unit 4. The rest of the configuration is the same as in the first embodiment, so its description is omitted.

Next, the operation of the third embodiment will be described.
FIG. 16 is a flowchart showing the operation of setting the noise removal processing in the third embodiment.
As in the first embodiment, this embodiment describes the system adaptation operation performed when the inspection page of FIG. 2 is displayed on the screen.

First, in step ST301, the screen control description language acquisition unit 1 acquires display screen information from the network access destination whose URL is “http://www.tenken.ne.jp/place1.html”. Here, it acquires the HTML document shown in FIG. 2, which describes the display screen configuration.

Next, in step ST302, the screen display unit 2 displays the page shown in FIG. 3 based on the HTML document shown in FIG. 2 acquired by the screen control description language acquisition unit 1. In step ST303, the screen control description language analysis unit 3 analyzes the page information, such as the HTML document, acquired by the screen control description language acquisition unit 1. The analysis yields the URL of the HTML document, “http://www.tenken.ne.jp/place1.html”. As shown in FIG. 15, the information associating URLs with noise removal processing is held in the system adaptation information holding unit 4 as adaptation correspondence information. As can be seen from FIG. 15, when the URL contains the character string “place1”, facility 1 can be identified as the place where the system is being used. Also, as is clear from FIG. 6 and FIG. 15, in this embodiment noise removal processing is performed when the environmental noise is stationary and is not performed when it is non-stationary.

In step ST304, the URL obtained from the analysis by the screen control description language analysis unit 3 is collated with the adaptation correspondence table shown in FIG. 15, which is held in the system adaptation information holding unit 4. Because the URL contains the character string “place1”, “none” is selected for noise removal processing on the page shown in FIG. 3, and this is output to the system adaptation unit 6 as adaptation correspondence information. Accordingly, in step ST305, the system adaptation unit 6 makes the setting so that noise removal processing is not performed. As shown in FIG. 17, this setting is applied before acoustic feature extraction in the speech recognition unit 8. That is, when noise removal processing is “none”, acoustic feature extraction is performed directly; when it is “present”, the predetermined noise removal processing is performed first, followed by acoustic feature extraction.
With this operation, whether noise removal processing is performed can be set appropriately even in a noisy environment, so recognition performance improves.

If the URL of the display screen obtained from the analysis by the screen control description language analysis unit 3 is not found in the adaptation correspondence table shown in FIG. 15, which is held in the system adaptation information holding unit 4, the system adaptation unit 6 applies the default noise removal setting in step ST306.

As described above, identifying the place of use from the URL of the display screen makes it possible to set appropriately whether noise removal processing is performed, even when the noise environment differs greatly from one place of use to another.
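As a rough sketch of steps ST304 to ST306 and of the front end described with reference to FIG. 17, the noise removal flag can be looked up from the URL and applied before feature extraction. Only the “place1” row (noise removal “none”) comes from the text; the “place2” row, the default value, and the stand-in functions `halve` and `identity` are assumptions that exist only to make the example runnable.

```python
from typing import Callable, List

Samples = List[float]

# Adaptation correspondence table (cf. FIG. 15): URL substring -> noise removal on/off.
NOISE_REMOVAL_TABLE = {
    "place1": False,  # stated in the text: noise removal "none"
    "place2": True,   # assumed row, for illustration only
}
DEFAULT_NOISE_REMOVAL = False  # assumed default setting (step ST306)


def noise_removal_enabled(url: str) -> bool:
    """Return the noise removal setting for the first key found in the URL."""
    for key, enabled in NOISE_REMOVAL_TABLE.items():
        if key in url:
            return enabled
    return DEFAULT_NOISE_REMOVAL


def front_end(
    samples: Samples,
    url: str,
    denoise: Callable[[Samples], Samples],
    extract_features: Callable[[Samples], Samples],
) -> Samples:
    """Optionally denoise, then extract acoustic features."""
    if noise_removal_enabled(url):
        samples = denoise(samples)
    return extract_features(samples)


def halve(samples: Samples) -> Samples:
    """Stand-in for a real noise removal algorithm."""
    return [0.5 * s for s in samples]


def identity(samples: Samples) -> Samples:
    """Stand-in for a real acoustic feature extractor."""
    return list(samples)
```

Passing the denoiser and feature extractor as callables keeps the sketch independent of any particular signal-processing method, which the description does not specify.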

As described above, in the speech recognition apparatus of the third embodiment, the adaptation information determination unit determines, as the adaptation information, whether noise removal processing is performed for the display screen, so the setting can be made appropriately even when the noise environment differs greatly depending on the place of use.

Embodiment 4
In the fourth embodiment, the acoustic model is switched to an age- and gender-specific acoustic model according to the display screen. Because the acoustic characteristics of speech differ greatly with age and gender, adapting to the usage environment alone may not yield a large improvement in recognition performance. To address this, in the fourth embodiment, when the display screen is a page whose users are limited to a particular age group or gender, the acoustic model is switched to an age- and gender-specific model according to the display screen.

The configuration of the speech recognition apparatus according to the fourth embodiment is the same in the drawings as that of FIG. 1, so the description refers to FIG. 1.
The system adaptation information holding unit 4 of the fourth embodiment includes an adaptation correspondence table that indicates the age- and gender-specific acoustic model corresponding to the specific information analyzed by the screen control description language analysis unit 3.
FIG. 18 is an explanatory diagram of the adaptation correspondence table held by the system adaptation information holding unit 4.
As illustrated, the place of use and the age- and gender-specific acoustic model information are associated with the URL serving as the specific information. When given a URL as the specific information, the system adaptation information holding unit 4 having this table selects the acoustic model corresponding to that URL and outputs the selection as adaptation correspondence information.

The system data holding unit 5 holds age- and gender-specific acoustic models as adaptation models.
FIG. 19 is an explanatory diagram of the age- and gender-specific acoustic models.
Here, the acoustic model for adult men is an acoustic model obtained by training on speech data of adult men. Likewise, the acoustic models for adult women, elderly men, and elderly women are obtained by training on speech data of adult women, elderly men, and elderly women, respectively. The default age- and gender-specific acoustic model is the model that is set when the analysis result of the screen control description language analysis unit 3 cannot be associated with the information in the adaptation correspondence table held in the system adaptation information holding unit 4.

When given adaptation correspondence information from the system adaptation information holding unit 4, the system data holding unit 5 having these adaptation models selects the acoustic model corresponding to that information and outputs it to the system adaptation unit 6 as the adaptation model.

The system adaptation unit 6 of the fourth embodiment is configured to output adaptation information based on the acoustic model information output from the system data holding unit 5. The rest of the configuration is the same as in the first embodiment, so its description is omitted here.

Next, the operation of the fourth embodiment will be described.
FIG. 20 is a flowchart showing the operation of setting the acoustic model in the fourth embodiment.
As in the first embodiment, this embodiment describes the system adaptation operation performed when the inspection page of FIG. 2 is displayed on the screen.

First, in step ST401, the screen control description language acquisition unit 1 acquires display screen information from the network access destination whose URL is “http://www.tenken.ne.jp/place1.html”. Next, in step ST402, the screen display unit 2 displays the page shown in FIG. 3 based on the HTML document shown in FIG. 2 acquired by the screen control description language acquisition unit 1.

In step ST403, the screen control description language analysis unit 3 analyzes the page information, such as the HTML document, acquired by the screen control description language acquisition unit 1. The analysis yields the URL of the HTML document, “http://www.tenken.ne.jp/place1.html”. As shown in FIG. 18, the information associating URLs with the information held in the system data holding unit 5 is held in the system adaptation information holding unit 4 as information for setting the age- and gender-specific acoustic model appropriate for the display screen. As can be seen from FIG. 18, when the URL contains the character string “place1”, facility 1 can be identified as the place where the system is being used.

In step ST404, the URL obtained from the analysis by the screen control description language analysis unit 3 is collated with the adaptation correspondence table held in the system adaptation information holding unit 4. Because the URL contains the character string “place1”, in step ST405 the “acoustic model for adult men” is selected from the acoustic models held in the system data holding unit 5 as the age- and gender-specific acoustic model to be used on the page shown in FIG. 3, and the system adaptation unit 6 outputs it to the speech recognition unit 8 as adaptation information.

As shown in FIG. 21, the age- and gender-specific acoustic model thus set is used for the probability calculation processing in the speech recognition unit 8. Because an appropriate age- and gender-specific acoustic model is set even when the age and gender of the intended speakers differ from one display screen to another, recognition performance improves.

If the URL of the display screen obtained from the analysis by the screen control description language analysis unit 3 is not found in the adaptation correspondence table held in the system adaptation information holding unit 4, the system adaptation unit 6 sets the default age- and gender-specific acoustic model in step ST406.

As described above, setting the age- and gender-specific acoustic model from the URL of the display screen makes it possible to select an appropriate acoustic model when the inspection location limits the age and gender of the users.

In the description of this embodiment, only the URL of the HTML document is used as the information acquired by the screen control description language analysis unit 3, but any information obtained by analyzing the display screen can be used. For example, a character string appearing on the display screen can be used. Furthermore, although an HTML document is used as the screen control description language in this embodiment, any description language can be used.

As described above, in the speech recognition apparatus of the fourth embodiment, the adaptation information determination unit determines an acoustic model for each age group and gender corresponding to the display screen as the adaptation information. An appropriate age- and gender-specific acoustic model is therefore set even when the age and gender of the intended speakers differ from one display screen to another, so speech recognition performance can be improved.

The adaptation models held are noise models in the first embodiment, noise-superimposed acoustic models in the second embodiment, and age- and gender-specific acoustic models in the fourth embodiment. Alternatively, several or all of these adaptation models may be held and selected from as appropriate. This allows a more suitable adaptation model to be set according to the conditions, further improving speech recognition performance.

Embodiment 5
The first to third embodiments described methods of setting the noise model, the noise-superimposed acoustic model, and the presence or absence of noise removal processing for a display screen. If the adaptation model is set according to the detailed position within each facility, even higher speech recognition performance can be achieved. In the fifth embodiment, therefore, the noise model and the noise-superimposed acoustic model are set according to the position within the facility.

FIG. 22 is an explanatory diagram showing the relationship between positions within a facility and the noise environment.
As illustrated, even within the same facility the inspection location differs from one input item to another, and the noise environment may differ greatly. If speech recognition is performed with the same system data (noise model, noise-superimposed acoustic model, and presence or absence of noise removal processing) under such different noise environments, a data mismatch arises and high recognition performance may not be obtained.

To address this problem, the fifth embodiment switches the system data to be used for each input item on the display screen. The configuration of the speech recognition apparatus according to the fifth embodiment is the same in the drawings as that of FIG. 1, so the description refers to FIG. 1.

The screen control description language analysis unit 3 of the fifth embodiment is configured to analyze, as the specific information, the URL of the screen and also the characters contained in the name attribute, which indicates the position associated with each item. The system adaptation information holding unit 4 includes an adaptation correspondence table that indicates the noise model, the noise-superimposed acoustic model, and the presence or absence of noise removal processing corresponding to the characters contained in the name attribute, and is configured to select adaptation settings such as the noise model, the noise-superimposed acoustic model, and noise removal processing based on this table.

FIG. 23 is an explanatory diagram of the adaptation correspondence table held by the system adaptation information holding unit 4.
As illustrated, the noise model, the noise-superimposed acoustic model, and the information indicating the presence or absence of noise removal processing are associated with the characters contained in the URL serving as the specific information and with the characters indicating the position.

The system data holding unit 5 holds noise models and noise-superimposed acoustic models, and is configured to output the corresponding models to the system adaptation unit 6 based on the adaptation correspondence information given by the system adaptation information holding unit 4.
FIG. 24 is an explanatory diagram of the adaptation models held by the system data holding unit 5.
As illustrated, the system data holding unit 5 has noise model and noise-superimposed acoustic model data corresponding to each position in each facility.

The system adaptation unit 6 of the fifth embodiment is configured to determine, as the adaptation information, the adaptation model information output from the system data holding unit 5, such as the noise model and the noise-superimposed acoustic model. The rest of the configuration is the same as in the first embodiment, so its description is omitted here.

Next, the operation of the fifth embodiment will be described.
FIG. 25 is a flowchart showing the system setting operation in the fifth embodiment.
As in the first embodiment, this embodiment describes the system adaptation operation performed when the inspection page of FIG. 2 is displayed on the screen.

First, in step ST501, the screen control description language acquisition unit 1 acquires display screen information from the network access destination whose URL is “http://www.tenken.ne.jp/place1.html”. Next, in step ST502, the screen display unit 2 displays the page shown in FIG. 3 based on the HTML document shown in FIG. 2 acquired by the screen control description language acquisition unit 1.

In step ST503, the screen control description language analysis unit 3 analyzes the page information, such as the HTML document, acquired by the screen control description language acquisition unit 1. The analysis yields the URL of the HTML document and the name attribute of each input item.
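The analysis in step ST503 could look roughly like the following, using Python's standard `html.parser` module to collect the name attribute of each input element. The HTML of FIG. 2 is not reproduced in the text, so `SAMPLE_HTML` is a hypothetical stand-in that merely uses the two name values mentioned in the description (“centertemp” and “exittemp”); the class and function names are likewise assumptions.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class InputNameCollector(HTMLParser):
    """Collects the name attribute of every <input> element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def analyze_page(url: str, html: str) -> Tuple[str, List[str]]:
    """Return the page URL together with the name attributes of its input items."""
    collector = InputNameCollector()
    collector.feed(html)
    collector.close()
    return url, collector.names


# Hypothetical page content; the real document is the one shown in FIG. 2.
SAMPLE_HTML = """
<form>
  <input type="text" name="centertemp">
  <input type="text" name="exittemp">
</form>
"""
```

The returned pair corresponds to the two pieces of specific information that the fifth embodiment collates against the adaptation correspondence table.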

In step ST504, the system adaptation information holding unit 4 collates the URL and the name attribute characters of each input item, obtained from the analysis by the screen control description language analysis unit 3, with the information in the adaptation correspondence table it holds. The URL contains the character string “place1”, and the name attribute of input item 101 contains the character string “centertemp”. Therefore, for voice input to input item 101 on the page shown in FIG. 3, the “noise model for facility 1, facility center” is selected as the noise model and the “noise-superimposed acoustic model for facility 1, facility center” is selected as the noise-superimposed acoustic model, and this adaptation correspondence information is sent to the system data holding unit 5.

Based on this information, the system data holding unit 5 selects the corresponding models from the adaptation models it holds and outputs them to the system adaptation unit 6. In step ST505, the system adaptation unit 6 switches to the selected noise model and noise-superimposed acoustic model. In addition, “none” is selected for noise removal processing and is set by the system adaptation unit 6.

Similarly, the name attribute of input item 102 contains the character string “exittemp”. In step ST504, therefore, the system adaptation information holding unit 4 selects and outputs the “noise model for facility 1, facility exit” as the noise model and the “noise-superimposed acoustic model for facility 1, facility exit” as the noise-superimposed acoustic model to be used for voice input to input item 102 on the page shown in FIG. 3, and the system data holding unit 5 selects and outputs the corresponding models from the data it holds. In step ST505, the system adaptation unit 6 switches to the selected noise model and noise-superimposed acoustic model. In addition, “present” is selected for noise removal processing and is set by the system adaptation unit 6.

If the URL or the name attribute of the display screen obtained from the analysis by the screen control description language analysis unit 3 is not found in the adaptation correspondence table held in the system adaptation information holding unit 4, the system adaptation unit 6 sets the default system data in step ST506.

As described above, setting the system data from the URL of the display screen and the name attribute of each input item makes it possible to select and set appropriate system data when the usage environment, and hence the noise environment, differs greatly from one input item on the display screen to another.
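The per-item selection of steps ST504 to ST506 can be sketched as a table keyed on both the URL substring and the name substring. The two rows below follow the “centertemp” and “exittemp” examples in the description; the model identifiers, the `SystemData` record, and the contents of the default entry are assumptions, since FIG. 23 and FIG. 24 are not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SystemData:
    """System data applied while one input item receives voice input."""
    noise_model: str
    noise_superimposed_model: str
    noise_removal: bool


# Adaptation correspondence table (cf. FIG. 23):
# (URL substring, name-attribute substring) -> system data.
ITEM_TABLE: Dict[Tuple[str, str], SystemData] = {
    ("place1", "centertemp"): SystemData(
        "facility1_center_noise", "facility1_center_noise_superimposed_am", False
    ),
    ("place1", "exittemp"): SystemData(
        "facility1_exit_noise", "facility1_exit_noise_superimposed_am", True
    ),
}

# Assumed contents of the default system data (step ST506).
DEFAULT_SYSTEM_DATA = SystemData(
    "default_noise", "default_noise_superimposed_am", False
)


def select_system_data(url: str, item_name: str) -> SystemData:
    """Return the entry whose keys appear in the URL and the name attribute."""
    for (url_key, name_key), data in ITEM_TABLE.items():
        if url_key in url and name_key in item_name:
            return data
    return DEFAULT_SYSTEM_DATA
```

In use, a caller would invoke `select_system_data` each time focus moves to a different input item, so that the recognizer is reconfigured before the next utterance.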

In the description of this embodiment, the URL of the HTML document and the name attribute of each input item are used as the information acquired by the screen control description language analysis unit 3, but any information obtained by analyzing the display screen can be used. For example, a character string appearing on the display screen can be used. Furthermore, although an HTML document is used as the screen control description language in this embodiment, any description language can be used.

As described above, in the speech recognition apparatus of the fifth embodiment, the adaptation information determination unit determines a noise model for each input item on the display screen as the adaptation information, so an appropriate noise model can be selected and set even when the usage environment, and hence the noise environment, differs greatly from one input item to another.

In addition, in the speech recognition apparatus of the fifth embodiment, the adaptation information determination unit determines a noise-superimposed acoustic model for each input item on the display screen as the adaptation information, so an appropriate noise-superimposed acoustic model can be selected and set even when the usage environment, and hence the noise environment, differs greatly from one input item to another.

Furthermore, in the speech recognition apparatus of the fifth embodiment, the adaptation information determination unit determines, as the adaptation information, whether noise removal processing is performed for each input item on the display screen, so the setting can be made appropriately even when the usage environment, and hence the noise environment, differs greatly from one input item to another.

Embodiment 6.
In Embodiment 6, the acoustic model is switched to an age- and gender-specific acoustic model according to the input item on the display screen. That is, when the age group and gender of the users are restricted for each input item on the display screen, switching to the matching age- and gender-specific acoustic model for each input item yields higher recognition performance for those users. This embodiment describes how the age- and gender-specific acoustic model is changed using the information acquired by the screen control description language analysis unit 3.

The configuration of the speech recognition apparatus of Embodiment 6 is, as drawn, the same as that of FIG. 1, so the description refers to FIG. 1. The pages and usage environment used to explain the operation of this embodiment are assumed to be the same as in Embodiment 5.

The system adaptation information holding unit 4 of Embodiment 6 has an adaptation correspondence table that specifies an age- and gender-specific acoustic model for the characters contained in the URL and the characters contained in the name attribute, which are the specific information analyzed by the screen control description language analysis unit 3.
FIG. 26 is an explanatory diagram of the adaptation correspondence table held by the system adaptation information holding unit 4.
As shown in the figure, each combination of characters contained in the URL and characters contained in the name attribute is associated with information identifying an age- and gender-specific acoustic model.
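As a minimal sketch of how such a table could be represented, the following Python fragment stores each row as a pair of substrings and a model label and returns the first row whose substrings both match. The class name, field names, and function name are illustrative assumptions; the two rows are the ones named in the operation description of this embodiment, and the actual table of FIG. 26 may contain more.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AdaptationRule:
    """One row of a hypothetical adaptation correspondence table."""
    url_substring: str   # characters that must appear in the page URL
    name_substring: str  # characters that must appear in the name attribute
    acoustic_model: str  # label of the age- and gender-specific acoustic model


# Rows taken from the operation description; the layout is an assumption.
ADAPTATION_TABLE: List[AdaptationRule] = [
    AdaptationRule("place1", "centertemp", "adult male acoustic model"),
    AdaptationRule("place1", "exittemp", "acoustic model for elderly men"),
]


def lookup_acoustic_model(url: str, name_attr: str,
                          table: List[AdaptationRule] = ADAPTATION_TABLE) -> Optional[str]:
    """Return the label of the first rule whose two substrings both match, else None."""
    for rule in table:
        if rule.url_substring in url and rule.name_substring in name_attr:
            return rule.acoustic_model
    return None
```

Returning None for an unmatched item is a design choice of this sketch; the embodiment does not state what happens when no row matches.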

The system data holding unit 5 holds age- and gender-specific acoustic models as adaptation models. These are the same as the acoustic models of Embodiment 4 shown in FIG. 19, so their description is omitted here.

The system adaptation unit 6 of Embodiment 6 is configured to determine the adaptation information based on the acoustic model information output from the system data holding unit 5. The rest of the configuration is the same as in Embodiment 5 and is not described again.

Next, the operation of Embodiment 6 is described.
FIG. 27 is a flowchart showing the acoustic model setting operation in Embodiment 6.
As in Embodiment 1, this embodiment describes the system adaptation operation when the inspection page of FIG. 2 is displayed on the screen.

First, in step ST601, the screen control description language acquisition unit 1 acquires display screen information from the network access destination whose URL is "http://www.tenken.ne.jp/place1.html". Next, in step ST602, the screen display unit 2 displays the page shown in FIG. 3 based on the HTML document shown in FIG. 2, which was acquired by the screen control description language acquisition unit 1.

In step ST603, the screen control description language analysis unit 3 analyzes the page information, such as the HTML document, acquired by the screen control description language acquisition unit 1. The analysis yields the URL of the HTML document and the name attribute of each input item.
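The analysis of step ST603 could be sketched as follows with Python's standard html.parser module. This is only an illustration under stated assumptions: the class and function names are hypothetical, treating input, select, and textarea elements as input items is an assumption, and the HTML document of FIG. 2 is not reproduced here, so any markup passed to the function is a stand-in.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class InputNameCollector(HTMLParser):
    """Collects the name attribute of each form input element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:  # skip elements without a name attribute
                self.names.append(name)


def analyze_page(url: str, html_text: str) -> Tuple[str, List[str]]:
    """Return the page URL together with the name attributes of its input items."""
    collector = InputNameCollector()
    collector.feed(html_text)
    collector.close()
    return url, collector.names
```

For a form holding two text inputs named centertemp and exittemp, analyze_page returns the URL unchanged together with the list of those two names.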

In step ST604, the URL and the name attribute of each input item obtained by the analysis in the screen control description language analysis unit 3 are collated with the adaptation correspondence table held in the system adaptation information holding unit 4. The URL contains the character string "place1" and the name attribute of input item 101 contains the character string "centertemp", so the "adult male acoustic model" is selected as the acoustic model to be used for voice input to input item 101 of the page shown in FIG. 3, and this adaptation correspondence information is sent to the system data holding unit 5.

Based on this information, the system data holding unit 5 selects the corresponding "adult male acoustic model" from among the acoustic models it holds and outputs it to the system adaptation unit 6. In step ST605, the system adaptation unit 6 switches to the selected adult male acoustic model.

Likewise, the name attribute of input item 102 contains the character string "exittemp", so in step ST604 the system adaptation information holding unit 4 selects and outputs the "acoustic model for elderly men" as the acoustic model to be used for voice input to input item 102 of the page shown in FIG. 3, and the system data holding unit 5 selects and outputs the corresponding model from the data it holds. In step ST605, the system adaptation unit 6 switches to the selected acoustic model.
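The handoff between units 4, 5, and 6 in steps ST604 and ST605 can be illustrated with the following Python sketch. All class, method, and function names are hypothetical, and the acoustic models are stood in for by placeholder dictionaries, since the embodiment does not specify how a model is stored or loaded.

```python
class SystemAdaptationInfoHolder:
    """Stand-in for unit 4: holds the adaptation correspondence table."""

    def __init__(self, rows):
        # Each row: (URL substring, name-attribute substring, model label).
        self.rows = rows

    def select_label(self, url, name_attr):
        for url_part, name_part, label in self.rows:
            if url_part in url and name_part in name_attr:
                return label
        return None


class SystemDataHolder:
    """Stand-in for unit 5: holds acoustic models keyed by label."""

    def __init__(self, models):
        self.models = models

    def fetch(self, label):
        return self.models[label]


class SystemAdapter:
    """Stand-in for unit 6: keeps track of the acoustic model currently in use."""

    def __init__(self):
        self.active_model = None

    def switch_to(self, model):
        self.active_model = model


def adapt_for_item(url, name_attr, info_holder, data_holder, adapter):
    """ST604: pick a label from the table; ST605: switch to the matching model."""
    label = info_holder.select_label(url, name_attr)
    if label is not None:  # assumption: keep the current model when no row matches
        adapter.switch_to(data_holder.fetch(label))
    return adapter.active_model
```

Calling adapt_for_item once for input item 101 and once for input item 102 changes the active model between the two calls, which corresponds to the per-item switching described above.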

As described above, setting the age- and gender-specific acoustic model using the URL of the display screen and the name attribute of the input item makes it possible to select an appropriate acoustic model when the inspection location restricts the age group and gender of the users.

Also, in the description of this embodiment, the URL of the HTML document and the name attribute of each input item are used as the information acquired by the screen control description language analysis unit 3, but any display screen analysis information can be used. For example, a character string displayed on the screen can also be used. Furthermore, although an HTML document is used as the screen control description language in this description, any description language can be used.

As described above, according to the speech recognition apparatus of Embodiment 6, the adaptation information determination unit determines an age- and gender-specific acoustic model for each input item in the display screen as the adaptation information. An acoustic model can therefore be selected and set appropriately even when the age group or gender of the user differs from one input item to another.

Each of the above embodiments can be implemented by having a computer read and execute a recording medium on which a program realizing the functions of that embodiment is recorded. Various recording media can be used to supply the program, such as a ROM, CD-ROM, DVD-ROM, floppy (registered trademark) disk, hard disk, or memory card.

FIG. 1 is a block diagram showing the speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing the HTML document of the inspection page for facility 1.
FIG. 3 is an explanatory diagram showing the display content of the inspection page for facility 1.
FIG. 4 is an explanatory diagram showing the HTML document of the inspection page for facility 2.
FIG. 5 is an explanatory diagram showing the display content of the inspection page for facility 2.
FIG. 6 is an explanatory diagram showing information on the facilities to be inspected in Embodiment 1.
FIG. 7 is a flowchart of the noise model setting operation in Embodiment 1.
FIG. 8 is an explanatory diagram showing the adaptation comparison table in Embodiment 1.
FIG. 9 is an explanatory diagram showing the adaptation model in Embodiment 1.
FIG. 10 is an explanatory diagram showing the adaptation information setting process in Embodiment 1.
FIG. 11 is an explanatory diagram showing the adaptation comparison table in Embodiment 2.
FIG. 12 is an explanatory diagram showing the noise superimposed acoustic model in Embodiment 2.
FIG. 13 is a flowchart of the noise superimposed acoustic model setting operation in Embodiment 2.
FIG. 14 is an explanatory diagram showing the adaptation information setting process in Embodiment 2.
FIG. 15 is an explanatory diagram showing the adaptation correspondence table in Embodiment 3.
FIG. 16 is a flowchart showing the operation for setting the presence or absence of noise removal processing in Embodiment 3.
FIG. 17 is an explanatory diagram showing the adaptation information setting process in Embodiment 3.
FIG. 18 is an explanatory diagram showing the adaptation comparison table in Embodiment 4.
FIG. 19 is an explanatory diagram showing the adaptation model in Embodiment 4.
FIG. 20 is a flowchart of the age- and gender-specific acoustic model setting operation in Embodiment 4.
FIG. 21 is an explanatory diagram showing the adaptation information setting process in Embodiment 4.
FIG. 22 is an explanatory diagram showing the relationship between position within the facility and noise environment in Embodiment 5.
FIG. 23 is an explanatory diagram showing the adaptation correspondence table in Embodiment 5.
FIG. 24 is an explanatory diagram showing the adaptation model in Embodiment 5.
FIG. 25 is a flowchart showing the system setting operation in Embodiment 5.
FIG. 26 is an explanatory diagram showing the adaptation correspondence table in Embodiment 6.
FIG. 27 is a flowchart of the age- and gender-specific acoustic model setting process in Embodiment 6.

Explanation of symbols

2: screen display unit; 3: screen control description language analysis unit; 7: voice acquisition unit; 8: speech recognition unit; 9: adaptation information determination unit.

Claims

A speech recognition apparatus comprising:
a screen control description language analysis unit that analyzes specific information included in a screen control description language;
an adaptation information determination unit that determines adaptation information corresponding to the specific information analyzed by the screen control description language analysis unit;
a voice acquisition unit that acquires voice input to a screen displayed based on the screen control description language; and
a speech recognition unit that performs speech recognition of the voice acquired by the voice acquisition unit based on the adaptation information determined by the adaptation information determination unit.

The speech recognition apparatus according to claim 1, wherein the adaptation information includes at least one of a noise model, an acoustic model, and a noise superimposed acoustic model.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines a noise model corresponding to the display screen as the adaptation information.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines a noise superimposed acoustic model corresponding to the display screen as the adaptation information.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines the presence or absence of noise removal processing according to the display screen as the adaptation information.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines an acoustic model for each age and sex according to the display screen as the adaptation information.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines a noise model for each input item in the display screen as adaptation information.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines a noise superimposed acoustic model for each input item in the display screen as adaptation information.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines whether or not noise removal processing is performed for each input item in the display screen as the adaptation information.

The speech recognition apparatus according to claim 1, wherein the adaptation information determination unit determines an acoustic model for each age and each gender for each input item in the display screen as the adaptation information.

A program for causing a computer to function as a speech recognition apparatus comprising:
a screen control description language analysis unit that analyzes specific information included in a screen control description language;
an adaptation information determination unit that determines adaptation information corresponding to the specific information based on an analysis result of the screen control description language analysis unit; and
a speech recognition unit that performs speech recognition of voice input to a screen displayed based on the screen control description language, based on the adaptation information determined by the adaptation information determination unit.