JP2003223184A

JP2003223184A - Speech recognition device and method

Info

Publication number: JP2003223184A
Application number: JP2002021662A
Authority: JP
Inventors: Hirohisa Sakai; 浩久酒井
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2002-01-30
Filing date: 2002-01-30
Publication date: 2003-08-08

Abstract

PROBLEM TO BE SOLVED: To recognize a series of input speech data.

SOLUTION: A worker on a plant production line inputs facility management information (fault information) by voice from a portable telephone 10. The input is one continuous utterance, for example 'robot MMR5-1 encoder count error motor replacement'. A speech recognition server 14 first extracts the model-number portion from the input speech and places breaks before and after it, thereby separating 'robot', 'MMR5-1' and 'encoder count error'. It then extracts the phrase containing a verb and places breaks before and after it, thereby separating 'encoder count error' and 'motor replacement'. Once the input speech has been divided into individual words, each word is collated with a database and recognized.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a voice recognition device and method, and more particularly to a technique for recognizing a continuous series of input speech.

[0002]

[Description of the Related Art] In recent years, with advances in computer technology including IT, and in man-machine interface technology in particular, techniques have been developed for recognizing an operator's speech and operating equipment accordingly. The operator no longer needs a keyboard and can efficiently enter data into a computer or give it the necessary instructions. As an example, consider a worker on a factory production line who repairs a piece of equipment on the line after it has failed. If such failure information (management information) for each line in the factory is stored in a database, there are various advantages: when a similar failure occurs elsewhere on the line, the countermeasure can be found quickly simply by accessing the database, and the failure information can easily be processed statistically. On the other hand, equipping every worker with a portable terminal or notebook computer in order to build such a database is cumbersome and increases cost. It would therefore be convenient if each worker could enter failure information by voice using a mobile phone, such as a PHS (registered trademark) handset, that the worker already carries for emergency contact and similar purposes, and if that speech could be recognized and stored in the database.

[0003]

[Problems to Be Solved by the Invention] However, in conventional voice recognition, the worker has to follow voice guidance, separating the management information item by item and confirming each item as it is entered, which makes the input procedure cumbersome.

[0004] For example, suppose an abnormality occurs in an assembly robot on a production line. Suppose the robot's model is MMR5-1, an encoder count error has occurred, and the worker has therefore replaced the motor. The worker must enter four items in total: the robot, which is the target equipment; the model that identifies the robot, which is the target process; the encoder count error, which is the defect (failure); and the motor replacement, which is the countermeasure. Following the voice guidance, the worker first says "robot" and has the computer recognize it. The computer reads back the recognition result, and the worker confirms or rejects it by answering "yes" or "no". If the answer is "no", the worker says "robot" again for the computer to recognize.

[0005] If the answer is "yes", the worker next says "emu emu aru go no ichi" (MMR5-1). The computer again recognizes this speech and reads back the result. In this way the cycle of input, recognition, read-back, and confirmation is repeated for each item, so entering all of the management information took time and effort.

[0006] This is because, if the worker speaks everything in one go, for example "robotto emu emu aru go no ichi enkoda kaunto misu mota kokan", there is no guarantee that the computer can correctly recognize the boundaries between the words. For example, where "enkoda kaunto misu" (encoder count error) should be recognized, the computer may fail to detect the boundary with the preceding "ichi" and misrecognize the phrase as "ichi enkoda kaunto misu", that is, a count error of a position encoder ("ichi" is also the reading of the Japanese word for "position").

[0007] The present invention has been made in view of the above problems of the prior art, and its object is to provide a device and method that can improve the recognition rate even when a series of speech is input in one go.

[0008]

[Means for Solving the Problems] To achieve the above object, the present invention is characterized by comprising: voice input means; processing means for extracting a predetermined characteristic portion from a series of speech input from the voice input means and dividing the series of speech before and after the characteristic portion; and recognition means for recognizing each piece of speech divided by the processing means by collating it with a voice database.

[0009] Here, the characteristic portion is preferably a portion consisting of letters and numerals.

[0010] Preferably, a series of speech including a machine model is input from the voice input means, and the characteristic portion is the machine model.

[0011] Preferably, a series of speech including the target equipment, process, defect, and countermeasure is input from the voice input means; the processing means extracts the process from the series of speech as a characteristic portion and divides the series of speech before and after the process, then extracts the countermeasure from the series of speech as a characteristic portion and divides the series of speech before and after the countermeasure, thereby dividing the series of speech into the target equipment, process, defect, and countermeasure; and the recognition means collates each of the target equipment, process, defect, and countermeasure divided by the processing means with the voice database. In this device, the voice input means is preferably a mobile phone.

[0012] The present invention also provides a method for recognizing a series of input speech. The method is characterized by comprising the steps of: extracting a predetermined characteristic portion from the series of speech; dividing the series of speech into a plurality of words on the basis of the characteristic portion; and recognizing each of the divided words by collating it with a voice database.

[0013] Here, the characteristic portion may be a portion consisting of letters and numerals, and is preferably a machine model.

[0014] Preferably, the series of speech includes the target equipment, process, defect, and countermeasure; in the extracting step, the process and the countermeasure are extracted; and in the dividing step, the series of speech is divided before and after the process and the countermeasure, thereby separating the target equipment, process, defect, and countermeasure.

[0015] In this method as well, the series of speech is preferably input from a mobile phone.

[0016]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings, taking as an example a case in which a worker on a factory production line enters management information (failure information) by voice.

[0017] FIG. 1 shows the configuration of a voice recognition system that includes the voice recognition device according to this embodiment. Each worker on the production line carries a mobile phone 10, such as a PHS (registered trademark) handset, and uses it for voice input when a failure occurs on the production line. Speech input from the mobile phone 10 is supplied to the voice recognition server 14 via the public network 12.

[0018] The voice recognition server 14 recognizes the speech input from the mobile phone 10 and registers the recognition result in the database 16. The voice recognition server 14 and the database 16 are connected by, for example, a LAN.

[0019] FIG. 2 is a block diagram of the voice recognition server 14 in FIG. 1. The voice recognition server 14 includes an interface (I/F) that receives speech (in this embodiment, a continuous series of speech) from the public network 12, a CPU, and memory such as ROM and RAM, and further includes a voice database and a voiceprint database.

[0020] The voice database consists of one voice database per management item: specifically, a target equipment voice database, a process voice database, a defect voice database, and a countermeasure voice database. Each voice database stores model data made up of sequences of consecutive phonemes. Input speech is recognized by collating it with the model data in the database and identifying it as one of the words in the database. Techniques for collating input speech data with model data, and waveform pattern matching techniques, are well known.
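By way of illustration only, the per-item collation described in paragraph [0020] might be sketched as follows. The sketch assumes that each segment has already been reduced to a romanized reading, and it uses string similarity from Python's difflib as a stand-in for acoustic matching against phoneme models. The table VOICE_DB, the function recognize, the cutoff value, and the "konbeya" entry are hypothetical and do not appear in the specification.

```python
from difflib import get_close_matches

# Hypothetical per-item databases: romanized reading -> recognized word.
# Real entries would be acoustic phoneme-sequence models, not strings.
VOICE_DB = {
    "target_equipment": {"robotto": "robot", "konbeya": "conveyor"},  # "konbeya" is made up
    "process": {"emu emu aru go no ichi": "MMR5-1"},
    "defect": {"enkoda kaunto misu": "encoder count error"},
    "countermeasure": {"mota kokan": "motor replacement"},
}


def recognize(item, segment, cutoff=0.6):
    """Return the word in the item's database closest to the segment, or None."""
    models = VOICE_DB[item]
    # String similarity stands in for collation with phoneme model data.
    hits = get_close_matches(segment, list(models), n=1, cutoff=cutoff)
    return models[hits[0]] if hits else None
```

Because each segment is collated only with the database for its own item, a slightly distorted reading such as "enkoda kaunto mis" can still resolve to the single nearest defect entry.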

[0021] The voiceprint database stores voiceprint data unique to each worker. In addition to each worker's voiceprint, it holds data on each worker's habits during voice input. These habits include not only tendencies in pronunciation but also the order in which the worker enters the items of management information. For example, worker A enters the items in the order target equipment, process, defect, countermeasure, whereas worker B enters them in the order defect, process, target equipment, countermeasure. The input-order data can be acquired by learning. As is well known, the spectrum obtained by frequency analysis of speech exhibits characteristics unique to the speaker, much like a fingerprint, so the speaker can be identified by collating speech data with voiceprint data. In this embodiment, the voiceprint database is used to identify the worker from the input speech, and the input speech is corrected on the basis of that worker's habits before being collated with the standard model data. In addition, when each item of management information is recognized, the input order of the worker identified from the voiceprint is taken into account.

[0022] The CPU divides the series of speech input through the interface I/F into individual items, that is, into words that are meaningful as management information, and recognizes the speech data of each divided word by collating it with the voice database. To divide the series of speech into meaningful words, the CPU extracts a characteristic portion from the input speech and divides the speech before and after the extracted portion. A characteristic portion is a portion that can be unambiguously identified as one of the four items; specifically, it is the process portion. The process is designated by the machine model, and a model consists of a combination of letters and numerals, for example "MMR5-1". No item other than the process consists of such a combination of letters and numerals. Therefore, by extracting the letter-and-numeral portion from the series of speech, the extracted portion can be identified as the process, and by dividing the speech data before and after that characteristic portion, the boundaries between the process and the other items can be recognized correctly.

[0023] FIG. 3 shows a flowchart of the processing performed by the voice recognition server 14 in this embodiment on the basis of the above principle. It is assumed that the worker has used the mobile phone 10 to say, in one go, "robotto emu emu aru go no ichi (meaning MMR5-1) enkoda kaunto misu mota kokan (meaning motor replacement)".

[0024] First, the voice recognition server 14 extracts, from the speech data entered in one go, the portion containing letters and numerals and identifies it as the model word (S101).

[0025] Specifically, this is the portion "emu emu aru go no ichi". Needless to say, recognizing that a portion consists of letters and numerals involves collating it with the speech data for letters and the speech data for numerals in the voice database. Letters and numerals are comparatively easy to recognize.

[0026] After extracting the model portion, the voice recognition server 14 divides the series of speech data before and after the model portion (S102). Specifically, in "robotto emu emu aru go no ichi", dividing before "emu emu aru go no ichi" yields "robotto" and "emu emu aru go no ichi"; and in "emu emu aru go no ichi enkoda kaunto misu", dividing after "emu emu aru go no ichi" yields "emu emu aru go no ichi" and "enkoda kaunto misu".
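Purely as an illustration, steps S101 and S102 might be sketched as follows. The sketch operates on a romanized token sequence rather than on audio, and it assumes a small table of letter and numeral readings in which "no" is treated as the hyphen inside a model number. The names ALNUM_READINGS, split_at_model, and decode_model are hypothetical and are not taken from the specification.

```python
# Hypothetical readings of letters, numerals, and the hyphen-like connector.
ALNUM_READINGS = {"emu": "M", "aru": "R", "go": "5", "no": "-", "ichi": "1"}


def split_at_model(tokens):
    """Split tokens into (before, model, after) around the first run of
    letter/numeral readings (S101: extract the run, S102: cut around it)."""
    start = next((i for i, t in enumerate(tokens) if t in ALNUM_READINGS), None)
    if start is None:
        return tokens, [], []  # no model portion found
    end = start
    while end < len(tokens) and tokens[end] in ALNUM_READINGS:
        end += 1
    return tokens[:start], tokens[start:end], tokens[end:]


def decode_model(model_tokens):
    """Turn the extracted readings into the written model number."""
    return "".join(ALNUM_READINGS[t] for t in model_tokens)


utterance = "robotto emu emu aru go no ichi enkoda kaunto misu mota kokan"
before, model, after = split_at_model(utterance.split())
```

Because "ichi" is absorbed into the run of letter and numeral readings, it cannot attach itself to the following "enkoda", which is the misrecognition the division is meant to prevent.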

[0027] Next, the voice recognition server 14 extracts from the series of speech data a phrase containing a verb and recognizes it as a phrase (the countermeasure) (S103). Specifically, it extracts "mota kokan", which contains the verb "kokan" (replace), from the input speech. The verb is extracted by collating the input speech data with verb model data stored in advance in the countermeasure voice database within the voice database. The verb model data consists of the group of verbs that can occur in a countermeasure, for example "replace" (kokan), "swap out" (torikae), and "replenish" (hoju). Among the items, only the countermeasure is a phrase containing a verb, so extracting the phrase containing a verb unambiguously extracts the countermeasure item from the series of speech.

[0028] After the phrase (the countermeasure) has been recognized, the series of speech is divided before and after the countermeasure (S104). Specifically, "enkoda kaunto misu mota kokan" is divided into "enkoda kaunto misu" and "mota kokan". Because "mota kokan" comes at the end of the series of speech, no division is made after it.
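As a rough sketch only, steps S103 and S104 might look like the following on romanized tokens. The specification does not say how the start of the verb phrase is located, so the sketch assumes a hypothetical list of object words (OBJECTS) that may precede the verb. VERBS, OBJECTS, and split_at_countermeasure are illustrative names, not part of the disclosure.

```python
# Verb readings named in the description: replace, swap out, replenish.
VERBS = {"kokan", "torikae", "hoju"}
# Assumption: object words that may directly precede a verb in a countermeasure.
OBJECTS = {"mota"}


def split_at_countermeasure(tokens):
    """Split tokens into (before, countermeasure, after) around the first
    verb phrase (S103: find the verb phrase, S104: cut around it)."""
    for i, tok in enumerate(tokens):
        if tok in VERBS:
            start = i
            # Extend the phrase backwards over known object words (assumption).
            while start > 0 and tokens[start - 1] in OBJECTS:
                start -= 1
            return tokens[:start], tokens[start:i + 1], tokens[i + 1:]
    return tokens, [], []  # no verb found


rest = ["enkoda", "kaunto", "misu", "mota", "kokan"]
defect, countermeasure, tail = split_at_countermeasure(rest)
```

Here the tail is empty because the countermeasure comes last in the utterance, matching the remark that no division is made after it.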

[0029] After the series of speech has been divided before and after the model and the countermeasure in this way, recognition is performed on each segment (S105). That is, the processing of S101 to S104 has divided the series of speech into four parts (words): "robotto", "emu emu aru go no ichi", "enkoda kaunto misu", and "mota kokan". The voiceprint of the input speech identifies which worker is speaking and hence that worker's item order, so the item to which each word corresponds is known (the model and the countermeasure are already known, and it becomes clear which of the remaining two is the target equipment and which is the defect). The speech data of each item can then be recognized by collating the input speech with the model data for that item in the voice database. For example, suppose the voiceprint identifies the speaker as worker A, who enters items in the order target equipment, process, defect, countermeasure. Then "robotto" and "enkoda kaunto misu" can be identified as the target equipment and the defect, respectively. "robotto" is collated with the model data in the target equipment voice database and recognized as "robot"; "emu emu aru go no ichi" is collated with the model data in the process voice database and recognized as MMR5-1; "enkoda kaunto misu" is collated with the model data in the defect voice database and recognized as "encoder count error"; and "mota kokan" is collated with the model data in the countermeasure voice database and recognized as "motor replacement". Alternatively, the process and the countermeasure may already be recognized in S101 and S103, respectively.
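The item assignment in S105 can be illustrated with a short sketch. It assumes that the speaker has already been identified from the voiceprint and that the process and countermeasure segments were tagged in the earlier steps; the remaining segments are then labeled using the worker's habitual order. INPUT_ORDER, assign_items, and the worker keys are hypothetical names introduced for this example.

```python
# Hypothetical learned input orders, following the worker A / worker B example.
INPUT_ORDER = {
    "worker_A": ("target_equipment", "process", "defect", "countermeasure"),
    "worker_B": ("defect", "process", "target_equipment", "countermeasure"),
}


def assign_items(segments, known, worker):
    """Map each segment to a management item.

    segments: segments in spoken order.
    known: {segment index: item} for segments already identified
           (the process and the countermeasure).
    worker: speaker identified from the voiceprint (assumed done elsewhere).
    """
    # Items still unassigned, in the order this worker habitually speaks them.
    remaining = [it for it in INPUT_ORDER[worker] if it not in known.values()]
    result = {}
    for idx, seg in enumerate(segments):
        item = known[idx] if idx in known else remaining.pop(0)
        result[item] = seg
    return result


segments = ["robotto", "emu emu aru go no ichi", "enkoda kaunto misu", "mota kokan"]
labeled = assign_items(segments, {1: "process", 3: "countermeasure"}, "worker_A")
```

Each labeled segment would then be collated only with the voice database for its own item.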

[0030] After all four items have been recognized, the voice recognition server 14 stores the recognition results in the database (management information database) 16 (S106). The database 16 thus accumulates management information classified into the four items, and workers can access the database 16 as needed to obtain the information they require. The information provided to workers from the database 16 may be processed as appropriate. For example, it can be output as a list of the target equipment on which a motor has been replaced, or as a list of the defects that have occurred in the past on a given piece of target equipment. It would also be suitable to output statistics such as the rate at which defects occur in a specific process on specific target equipment.
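As one possible illustration of S106 and the outputs mentioned above, the four items could be held in a single table and queried with SQL. The sketch below uses Python's sqlite3 module with an in-memory database. The table name management_info, the helper functions, and the sample rows (including the made-up "MMR5-2" record) are assumptions for the example, and the "rate" is taken here to mean the share of an equipment's records that fall in a given process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE management_info ("
    "target_equipment TEXT, process TEXT, defect TEXT, countermeasure TEXT)"
)
records = [
    ("robot", "MMR5-1", "encoder count error", "motor replacement"),
    ("robot", "MMR5-1", "encoder count error", "motor replacement"),
    ("robot", "MMR5-2", "cable break", "cable replacement"),  # made-up record
]
conn.executemany("INSERT INTO management_info VALUES (?, ?, ?, ?)", records)


def equipment_with_countermeasure(countermeasure):
    """List the equipment and process pairs on which a countermeasure was applied."""
    rows = conn.execute(
        "SELECT DISTINCT target_equipment, process FROM management_info "
        "WHERE countermeasure = ? ORDER BY process",
        (countermeasure,),
    )
    return rows.fetchall()


def process_share(target_equipment, process):
    """Share of an equipment's defect records that fall in the given process."""
    total = conn.execute(
        "SELECT COUNT(*) FROM management_info WHERE target_equipment = ?",
        (target_equipment,),
    ).fetchone()[0]
    hits = conn.execute(
        "SELECT COUNT(*) FROM management_info "
        "WHERE target_equipment = ? AND process = ?",
        (target_equipment, process),
    ).fetchone()[0]
    return hits / total if total else 0.0
```

Storing each item in its own column is what makes such lists and statistics simple to produce from the recognized data.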

[0031] Thus, in this embodiment, attention is paid to the model item, which consists of letters and numerals, and to the countermeasure item, which contains a verb, as characteristic portions, and the series of speech is correctly divided into words by placing breaks before and after them. It is no longer necessary, as it was conventionally, to enter each word and confirm its recognition result before entering the next, so management information can be entered by voice efficiently.

[0032] In addition, in this embodiment the series of speech is input using the mobile phone 10 that each worker already carries for emergency contact, so there is no need to issue workers with new portable terminals or personal computers, and the cost of providing infrastructure can be kept down. A further advantage is that workers can enter speech from a mobile phone 10 they are already used to.

[0033] An embodiment of the present invention has been described above, but the present invention is not limited to it, and various modifications are possible. For example, although in this embodiment a model consisting of letters and numerals is extracted as the characteristic portion, the characteristic portion can be chosen according to the nature or category of the speech being input. The attribute that a characteristic portion must have is that it consists of phoneme data that is clearly distinguishable from the other items and can be determined unambiguously.

[0034]

[Effects of the Invention] As described above, the present invention improves the recognition rate when a series of speech is input and increases the efficiency of voice input.

[Brief description of drawings]

FIG. 1 is a configuration diagram of the voice recognition system according to the embodiment.

FIG. 2 is a block diagram of the voice recognition server in FIG. 1.

FIG. 3 is a processing flowchart of the embodiment.

[Explanation of symbols]

10 mobile phone, 12 public network, 14 voice recognition server, 16 database.

Claims

[Claims]

1. A voice recognition device comprising: voice input means; processing means for extracting a predetermined characteristic portion from a series of speech input from the voice input means and dividing the series of speech before and after the characteristic portion; and recognition means for recognizing each piece of speech divided by the processing means by collating it with a voice database.

2. The voice recognition device according to claim 1, wherein the characteristic portion is a portion consisting of letters and numerals.

3. The voice recognition device according to claim 1, wherein a series of voices including a model of a machine is input from the voice input unit, and the characteristic portion is a model of the machine.

4. The device according to claim 1, wherein a series of speech including target equipment, a process, a defect, and a countermeasure is input from the voice input means; the processing means extracts the process from the series of speech as the characteristic portion and divides the series of speech before and after the process, and further extracts the countermeasure from the series of speech as the characteristic portion and divides the series of speech before and after the countermeasure, thereby dividing the series of speech into the target equipment, process, defect, and countermeasure; and the recognition means collates each of the target equipment, process, defect, and countermeasure divided by the processing means with the voice database.

5. The apparatus according to claim 1, wherein the voice input unit is a mobile phone.

6. A voice recognition method for recognizing a series of input speech, comprising the steps of: extracting a predetermined characteristic portion from the series of speech; dividing the series of speech into a plurality of words on the basis of the characteristic portion; and recognizing each of the divided words by collating it with a voice database.

7. The voice recognition method according to claim 6, wherein the characteristic portion is a portion consisting of letters and numerals.

8. The voice recognition method according to claim 6, wherein the characteristic portion is a machine model.

9. The method according to claim 6, wherein the series of speech includes target equipment, a process, a defect, and a countermeasure; in the extracting step, the process and the countermeasure are extracted; and in the dividing step, the series of speech is divided before and after the process and the countermeasure, thereby separating the target equipment, process, defect, and countermeasure.

10. The voice recognition method according to claim 6, wherein the series of voices is input from a mobile phone.