JP6712940B2

JP6712940B2 - Voice input device, voice input method

Info

Publication number: JP6712940B2
Application number: JP2016210651A
Authority: JP
Inventors: 尚志奥村; 隆史右田; 直紀竹内
Original assignee: Toppan Forms Co Ltd
Current assignee: Toppan Forms Co Ltd
Priority date: 2016-10-27
Filing date: 2016-10-27
Publication date: 2020-06-24
Anticipated expiration: 2036-10-27
Also published as: JP2018072508A

Description

The present invention relates to a voice input device and a voice input method.

Application data is created by entering character strings and the like into the input items of application screens for various services, using an input device such as a smartphone touch panel or a keyboard connected to a PC (personal computer). Some application screens have multiple input items, such as name, address, and telephone number.
When entering data for such input items, the user selects which item to fill in by touching its input field on the screen, or by moving the pointer to the field with a mouse and clicking. The user then enters the data from the touch panel or keyboard.
Besides entering character strings from an input device, some methods allow data for such input items to be entered by voice (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2014-085954 (JP 2014-085954 A)

However, when there are multiple input items, the user must repeat the following sequence: designate the target input item by touch input or the like, speak, designate the next input item by touch input, and speak again. As the number of input items grows, so does the number of touch operations needed to designate the target items, and the effort required for input increases.

The present invention has been made in view of these circumstances. Its object is to provide a voice input device and a voice input method that keep the effort required for operation input from increasing when voice input is performed for multiple input items.

To solve the above problem, the present invention comprises: a voice recognition unit that generates text data from a voice signal corresponding to collected voice; an input data extraction unit that detects a plurality of keywords included in the text data, extracts the text data between a first keyword and a second keyword among the detected keywords as divided text data for the first keyword, and extracts at least part of the text data following the second keyword as divided text data for the second keyword; a dictionary data storage unit that stores words that are candidates for the divided text data for an input item in association with at least one of a plurality of sub-items constituting the input item; and a data input unit that, for the input fields on a display screen showing input target data that includes a plurality of input fields into which text data can be entered by input operations, acquires the text data to be entered into each input field from the divided text data obtained by the input data extraction unit and enters it into the corresponding input field. When a keyword detected from the divided keywords includes a word stored in the dictionary data storage unit, the input data extraction unit identifies the character string immediately following that keyword as a character string to be divided into sub-items, even if the text data does not contain the names of the sub-items. The data input unit writes a word obtained from the character string so identified into the input field that corresponds to the sub-item name that, among the sub-item names displayed on the display screen, corresponds to the word.

In the above voice input device, the present invention further comprises: a conversion data storage unit that stores conversion data corresponding to the input items; and a data conversion unit that, when at least part of the divided text data extracted by the input data extraction unit includes a conversion target character stored in the conversion data storage unit, converts that character into a different character based on the conversion data. The data input unit writes, into the input item, the divided text data as converted by the data conversion unit.

The present invention is also a voice input method in a computer, in which: a voice recognition unit generates text data from a voice signal corresponding to collected voice; an input data extraction unit detects a plurality of keywords included in the text data, extracts the text data between a first keyword and a second keyword among the detected keywords as divided text data for the first keyword, and extracts at least part of the text data following the second keyword as divided text data for the second keyword; when a keyword detected from the divided keywords includes a word stored in a dictionary data storage unit, which stores words that are candidates for the divided text data for an input item in association with at least one of a plurality of sub-items constituting the input item, the input data extraction unit identifies the character string immediately following that keyword as a character string to be divided into sub-items, even if the text data does not contain the names of the sub-items; a data input unit, for the input fields on a display screen showing input target data that includes a plurality of input fields into which text data can be entered by input operations, acquires the text data to be entered into each input field from the divided text data obtained by the input data extraction unit and enters it into the corresponding input field; and the data input unit writes a word obtained from the character string so identified into the input field that corresponds to the sub-item name that, among the sub-item names displayed on the display screen, corresponds to the word.

As described above, the present invention makes it possible to perform voice input for multiple input items without increasing the effort required for operation input.

FIG. 1 is a schematic block diagram showing the configuration of a mobile terminal to which a voice input device according to an embodiment of the present invention is applied.
FIG. 2 is a diagram showing an example of an input screen to be filled in on the mobile terminal 1.
FIG. 3 is a flowchart illustrating the operation of the mobile terminal 1.
FIG. 4 is a conceptual diagram illustrating the process of obtaining input data for the input items.
FIG. 5 is a diagram showing an example of the screen after character strings have been entered into the input items.

A voice input device according to an embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a schematic block diagram showing the configuration of a mobile terminal to which a voice input device according to an embodiment of the present invention is applied. The mobile terminal 1 includes a voice signal generation unit 11, a voice recognition unit 12, a dictionary data storage unit 13, a conversion data storage unit 14, an input data extraction unit 15, a data conversion unit 16, a data input unit 17, a display unit 18, and an operation input unit 19.

The voice signal generation unit 11 collects voice and generates a voice signal. For example, a microphone is used as the voice signal generation unit 11. The voice recognition unit 12 generates text data from the voice signal generated by the voice signal generation unit 11.

The dictionary data storage unit 13 stores words that are candidates for the input data of an input item in association with at least one of the sub-items that make up that input item. Here, input data is at least part of the text data, and is data that can be entered into the input field of an input item. An input item may contain multiple sub-items. For example, the input item "address" (住所) has sub-items such as "prefecture" (都道府県), "municipality" (市区町村), "street number" (番地), and "building name" (建物名) (or "street number, etc." (番地など)), and these sub-items together make up one input item. As candidate words for the sub-item "prefecture", for example, the names that can exist as prefectures are stored, such as 東京都 (Tokyo), 大阪府 (Osaka), 北海道 (Hokkaido), and 埼玉県 (Saitama).
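One way to picture the dictionary data is as a lookup table from candidate words to the input item and sub-item they belong to. The sketch below is only an illustration under that assumption: the names `DICTIONARY` and `lookup_sub_item` are hypothetical, and the patent does not specify a storage layout.

```python
# Hypothetical layout for the dictionary data storage unit 13:
# each candidate word maps to (input item, sub-item).
DICTIONARY = {
    "東京都": ("住所", "都道府県"),
    "大阪府": ("住所", "都道府県"),
    "北海道": ("住所", "都道府県"),
    "埼玉県": ("住所", "都道府県"),
}


def lookup_sub_item(word):
    """Return the (input item, sub-item) pair for a word, or None if unregistered."""
    return DICTIONARY.get(word)
```

A word that is not registered simply yields `None`, so the caller can leave the string undivided.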

The conversion data storage unit 14 stores conversion data corresponding to the input items. The characters in the text data can be used directly as input data, but for some input items the way a value is usually spoken differs from the way it is usually written. For example, the separators within a telephone number (between the area code and the local exchange code, and between the exchange code and the subscriber number) are spoken as "の" (no), as in 「03の0000の0000」, whereas in writing a hyphen is used, as in "03-0000-0000". With the conversion data, even when the spoken form differs from the written form, the user can speak without being conscious of the written form and the text is still converted into a character string that follows the writing convention.

The input data extraction unit 15 detects a plurality of keywords included in the text data. Among the detected keywords, it extracts the text data between a first keyword and a second keyword as the input data (divided text data) for the first keyword, and extracts at least part of the text data following the second keyword as the input data (divided text data) for the second keyword. Using keywords in this way makes it possible to delimit the text and identify which of the multiple input items each character string belongs to. Details are described later.
When at least part of the input data extracted by the input data extraction unit includes a conversion target character stored in the conversion data storage unit, the data conversion unit 16 converts that character into a different character based on the conversion data.

For the input fields on a display screen showing input target data that includes a plurality of input fields into which text data can be entered by input operations, the data input unit 17 acquires the text data to be entered into each field from the divided text data obtained by dividing the text data generated by the voice recognition unit, and writes it by entering it into the corresponding input field. When the input data extracted by the input data extraction unit 15 contains a word stored in the dictionary data storage unit 13, the data input unit 17 writes that word into the sub-item corresponding to the word. The data input unit 17 also writes, into the input item, the text data obtained after the data conversion unit 16 has converted the input data extracted by the input data extraction unit 15.

The display unit 18 is, for example, a liquid crystal panel, and displays various information. The operation input unit 19 is a touch sensor that detects which position within its detection area has been touched. A touch panel can serve as both the display unit 18 and the operation input unit 19. The control unit 20 controls each unit in the mobile terminal 1.

FIG. 2 is a diagram showing an example of an input screen to be filled in on the mobile terminal 1.
The input screen may be any screen that contains multiple pairs of an input item and its corresponding input field. Examples include application data for various services (opening a bank account, loans, travel, credit card applications, and so on), request data for asking that a quotation or similar document be prepared, and destination data for registering a delivery address in online shopping. On this input screen, the input items are "name" (氏名, reference numeral 100), "address" (住所, reference numeral 110), "telephone number" (電話番号, reference numeral 120), and "company name" (会社名, reference numeral 130). The input item "address" further contains multiple input items as sub-items: here, "prefecture" (都道府県, reference numeral 111), "municipality" (市町村, reference numeral 112), and "street number, etc." (番地など, reference numeral 113). An input field (reference numeral 140) is placed alongside each of these input items or sub-items. In addition, part of the input screen (here, the upper right) has a voice input button (reference numeral 200) that switches the voice input function on and off.

Next, the operation of the mobile terminal 1 is described with reference to FIGS. 3, 4, and 5. FIG. 3 is a flowchart illustrating the operation of the mobile terminal 1, FIG. 4 is a conceptual diagram illustrating the process of obtaining input data for the input items, and FIG. 5 is a diagram showing an example of the screen after character strings have been entered into the input items.
First, the control unit 20 of the mobile terminal 1 displays the application data to be filled in (for example, FIG. 2) on the screen of the display unit 18 in response to an operation input from the user. The control unit 20 detects whether a touch input has been made on the screen displayed on the display unit 18 (step S101). If no touch input has been made (step S101: NO), it waits for a fixed time (step S102) and returns to step S101.

If a touch input has been made, the control unit 20 determines whether it is an instruction to turn on voice input (step S103). This can be determined from whether the operation input is a touch on the voice input button (reference numeral 200 in FIG. 2). If it is not an instruction to turn on voice input (step S103: NO), for example when the user touches a target input item and then enters characters by touching the operation buttons on the touch panel, the terminal performs character string input by touch (step S104).

If the operation is one that turns on voice input, the control unit 20 starts the voice input process. Once the voice input process has started, the voice signal generation unit 11 captures the voice uttered by the user and generates a voice signal corresponding to it (step S105). When the voice signal has been generated, the voice recognition unit 12 performs voice recognition on it to generate the corresponding text data (step S106). For example, the text data obtained here is 「氏名は山田太郎住所は東京都港区1の1の1電話番号は03の0000の0000会社名は特許株式会社」 (roughly, "name is Yamada Taro, address is Tokyo, Minato-ku, 1 no 1 no 1, telephone number is 03 no 0000 no 0000, company name is Patent Corporation") (FIG. 4, reference numeral 300).

When the text data has been generated, the input data extraction unit 15 detects the keywords it contains (step S107). The keywords detected here are character strings corresponding to the names of the input items, for example 「氏名」 (name; reference numeral 400 in FIG. 4), 「住所」 (address; reference numeral 402), 「電話番号」 (telephone number; reference numeral 404), and 「会社名」 (company name; reference numeral 406). Based on the detected keywords, the input data extraction unit 15 then identifies the character strings that lie between one keyword and the next, as well as the character string that follows the final keyword (step S108). The final keyword is the keyword closest to the end of the obtained text data, here 「会社名」. In step S108, the character strings identified as lying between keywords are, for example, 「は山田太郎」, 「は東京都港区1の1の1」, and 「は03の0000の0000」, and the character string identified as following the final keyword is 「は特許株式会社」. If an identified character string begins with a predetermined character (for example, the particle 「は」), that character is removed, yielding 「山田太郎」 (reference numeral 401 in FIG. 4), 「東京都港区1の1の1」 (reference numeral 403), 「03の0000の0000」 (reference numeral 405), and 「特許株式会社」 (reference numeral 407) as the identified character strings.
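Steps S107 and S108 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_keywords`, the use of a regular expression, and the ASCII digits in the sample text are all assumptions made for the example.

```python
import re

# Input item names used as keywords (FIG. 4, reference numerals 400-406).
KEYWORDS = ["氏名", "住所", "電話番号", "会社名"]
# Characters removed when they appear at the head of an extracted string.
LEADING_PARTICLES = ("は",)


def split_by_keywords(text, keywords=KEYWORDS, particles=LEADING_PARTICLES):
    """Map each detected keyword to the text between it and the next keyword."""
    # Try longer keywords first so a long item name is not shadowed by a short one.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in ordered)
    matches = list(re.finditer(pattern, text))
    result = {}
    for i, m in enumerate(matches):
        # The segment ends at the next keyword, or at the end of the text
        # when this is the final keyword.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        segment = text[m.end():end]
        for p in particles:
            if segment.startswith(p):
                segment = segment[len(p):]
                break
        result[m.group()] = segment
    return result
```

Note that this sketch would mis-split an utterance whose value itself contains an item name; the source text does not describe how such a case is handled.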

Next, for the character string of any input item that has sub-items, the input data extraction unit 15 divides the string so that it corresponds to those sub-items (step S109). To decide whether an input item has sub-items, the names of such input items are registered in advance in a predetermined memory area of the mobile terminal 1. The input data extraction unit 15 checks whether any keyword detected in step S107 corresponds to a registered input item name, and if so, identifies the character string immediately following that keyword as the string to be divided into sub-items. The input data extraction unit 15 then refers to the dictionary data storage unit 13 and determines whether the identified string contains any character string registered there. For example, the dictionary data storage unit 13 stores, in association with the input item "address" and as address candidates, character strings for prefectures (the names of the 47 prefectures, such as 東京都 (Tokyo), 神奈川県 (Kanagawa), and 大阪府 (Osaka)) and for municipalities (names of municipalities that can exist in the 47 prefectures, such as 港区 (Minato-ku), 千代田区 (Chiyoda-ku), 青葉区 (Aoba-ku), and 中央区 (Chuo-ku)). Within 「東京都港区1の1の1」, the character string following 「住所」, the input data extraction unit 15 identifies 「東京都」 and 「港区」, which are stored in the dictionary data storage unit 13, as the strings for their respective sub-items, and identifies the remaining 「1の1の1」 as the string for a sub-item as well. It thus divides the original string 「東京都港区1の1の1」 into three strings: 「東京都」, 「港区」, and 「1の1の1」.
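The sub-item division in step S109 might look like the sketch below. It assumes, as a simplification not stated in the source, that the prefecture comes first and the municipality directly after it, so prefix matching is enough; the lists hold only the example names from the text, and `split_address` is a hypothetical name.

```python
# Example entries only; a real dictionary would hold all prefectures
# and municipalities.
PREFECTURES = ["東京都", "神奈川県", "大阪府"]
MUNICIPALITIES = ["港区", "千代田区", "青葉区", "中央区"]


def split_address(address):
    """Split an address string into prefecture, municipality, and the remainder."""
    parts = {"都道府県": "", "市区町村": "", "番地など": ""}
    rest = address
    for name in PREFECTURES:
        if rest.startswith(name):
            parts["都道府県"] = name
            rest = rest[len(name):]
            break
    for name in MUNICIPALITIES:
        if rest.startswith(name):
            parts["市区町村"] = name
            rest = rest[len(name):]
            break
    # Whatever is left over is treated as the "street number, etc." sub-item.
    parts["番地など"] = rest
    return parts
```

Because the dictionary words themselves identify the sub-items, the speaker does not need to say the sub-item names aloud.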

Next, the input data extraction unit 15 refers to the conversion data storage unit 14 and, for each identified character string, converts any conversion target character into another character (step S110). The conversion data storage unit 14 stores two rules here: for character strings corresponding to the sub-item "street number, etc." of the input item "address", the character 「の」 is converted to "-" (hyphen); and for character strings corresponding to the input item "telephone number", the character 「の」 is likewise converted to "-" (hyphen). The input data extraction unit 15 refers to the conversion data storage unit 14 and performs the conversion whenever a string contains a conversion target character. For example, the string 「1の1の1」 is converted to "1-1-1", and the string 「03の0000の0000」 is converted to "03-0000-0000".
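The per-item conversion of step S110 can be illustrated with a small rule table keyed by input item and sub-item. The table layout and the names `CONVERSIONS` and `convert` are assumptions for this sketch; only the two 「の」 to hyphen rules come from the text.

```python
# Hypothetical conversion data: (input item, sub-item) -> {spoken: written}.
# None as the sub-item means the input item has no sub-items.
CONVERSIONS = {
    ("住所", "番地など"): {"の": "-"},
    ("電話番号", None): {"の": "-"},
}


def convert(item, sub_item, text):
    """Apply the conversion rules registered for this item, if any."""
    rules = CONVERSIONS.get((item, sub_item), {})
    for spoken, written in rules.items():
        text = text.replace(spoken, written)
    return text
```

Scoping the rules to specific items matters: a 「の」 inside a name or company name is left untouched because no rule is registered for those items.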

Next, the data input unit 17 writes each character string obtained by the input data extraction unit 15 into the input field of the corresponding input item, based on the keywords (step S111).
Here, for example, the data input unit 17 identifies the string 「山田太郎」, which lies between the keywords 「氏名」 and 「住所」, as the input data for the input item "name" and writes it into the input field for "name". Similarly, the string between the keywords 「住所」 and 「電話番号」 has already been divided in step S109, and the data input unit 17 writes the divided strings into the input fields of the corresponding sub-items: 「東京都」 into the sub-item "prefecture", 「港区」 into the sub-item "municipality", and the string "1-1-1", as converted in step S110, into "street number, etc.". The data input unit 17 also identifies the string "03-0000-0000", which lies between the keywords 「電話番号」 and 「会社名」 and was converted in step S110, as the input data for the input item "telephone number" and writes it into the corresponding input field. Finally, the data input unit 17 identifies the string 「特許株式会社」, which follows the final keyword 「会社名」, as the input data for the input item "company name" and writes it into the corresponding input field. Once the strings have been written in this way, each input field contains a character string based on the spoken information, as shown in FIG. 5. If the send button (reference numeral 500) is touched after the fields have been filled in, the mobile terminal 1 transmits the input data for the input items to the server device that is the destination of the application data.
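The final write-back of step S111 amounts to matching each extracted string to the field with the same label. The sketch below models the form as a plain dictionary, which is an assumption for illustration; `fill_form` is a hypothetical name and a real screen would go through its UI toolkit.

```python
def fill_form(form, values):
    """Write each value into the field with the same label.

    Returns the values that had no matching field, so the caller can
    decide what to do with them.
    """
    unmatched = {}
    for label, value in values.items():
        if label in form:
            form[label] = value
        else:
            unmatched[label] = value
    return unmatched


# Fields of the example screen (FIG. 2), initially empty.
form = {"氏名": "", "都道府県": "", "市区町村": "", "番地など": "",
        "電話番号": "", "会社名": ""}
# Strings after division (S109) and conversion (S110).
values = {"氏名": "山田太郎", "都道府県": "東京都", "市区町村": "港区",
          "番地など": "1-1-1", "電話番号": "03-0000-0000",
          "会社名": "特許株式会社"}
leftover = fill_form(form, values)
```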

In the embodiment described above, if any of the character strings written on the screen after voice input (FIG. 5) needs correcting, the user can touch the input field in question and then correct the string in that field either by voice input or by entering characters through touch operations.

In the embodiment described above, the voice input process starts in step S103 when the voice input button is touched. The voice input process may end at any of the following times: when the voice input button is touched again, when a predetermined time has elapsed since the voice input button was touched, or when the touching finger leaves the voice input button.

The embodiment described above provides a single voice input button on the application data input screen, but multiple voice input buttons may be provided as long as their number is smaller than the number of input items. For example, one voice input button may be provided for entering the name and address by voice, and another for entering the telephone number and company name by voice.

In the embodiment described above, the string identification in step S108 removes a predetermined character (for example, 「は」) when it appears at the beginning of an identified string. The particle 「が」 may also be registered in advance as a character to be removed, so that when the string 「が山田太郎」 is obtained from text data such as 「氏名が山田太郎・・・」, the leading 「が」 is removed as the predetermined character.
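Registering additional particles only requires extending the set of characters checked at the head of each string. A minimal sketch, with `strip_leading_particle` as a hypothetical name:

```python
def strip_leading_particle(segment, particles=("は", "が")):
    """Remove one registered particle from the start of a segment, if present."""
    for p in particles:
        if segment.startswith(p):
            return segment[len(p):]
    return segment
```

Only one leading character is removed, so a value that legitimately starts with one of these characters would lose it; the source text does not address that case.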

In the above embodiment, the voice input process according to the present invention is applied to the mobile terminal 1, but it may also be applied to a smartphone or a PC (personal computer). Alternatively, voice input may be performed on a terminal device, the resulting voice data or text data may be transmitted to a server device, the voice input process according to the present invention may be performed on that server device, and the result after entry into the input fields may be transmitted back to the terminal device. As a further example, among the functions of the mobile terminal 1, those of the voice recognition unit 12, the dictionary data storage unit 13, and the conversion data storage unit 14 may be performed on the server device.

The functions of the voice signal generation unit 11, the voice recognition unit 12, the dictionary data storage unit 13, the conversion data storage unit 14, the input data extraction unit 15, the data conversion unit 16, and the data input unit 17 in the above embodiment may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may also include a medium that holds a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, as well as a medium that holds a program for a certain period of time, such as volatile memory inside the computer system that serves as the server or client in that case. The program may realize only a part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment and also includes designs and the like that do not depart from the gist of the present invention.

Description of Symbols

1…Mobile terminal
11…Voice signal generation unit
12…Voice recognition unit
13…Dictionary data storage unit
14…Conversion data storage unit
15…Input data extraction unit
16…Data conversion unit
17…Data input unit
18…Display unit
19…Operation input unit

Claims

A voice input device comprising:
a voice recognition unit that generates text data from a voice signal corresponding to collected voice;
an input data extraction unit that detects a plurality of keywords included in the text data and, among the detected keywords, extracts the text data between a first keyword and a second keyword as divided text data for the first keyword, and extracts divided text data for the second keyword from at least a part of the text data following the second keyword;
a dictionary data storage unit that stores a word that is a candidate for the divided text data for an input item in association with at least one of a plurality of sub-items that constitute the input item; and
a data input unit that, for the input fields on a display screen displaying input target data that includes a plurality of input fields into which the text data can be entered in response to an input operation, acquires the text data to be entered into an input field from the divided text data obtained by the input data extraction unit and enters it into the corresponding input field,
wherein, when a keyword detected from the divided keywords has a word stored in the dictionary data storage unit, the input data extraction unit identifies the character string immediately after that keyword as a character string to be divided into sub-items, even if the text data does not include the name of a sub-item, and
the data input unit writes, into the input field that corresponds to the name of the sub-item corresponding to the word, among the names of the sub-items displayed on the display screen, a word obtained based on the character string identified as the character string to be divided into sub-items.

The voice input device according to claim 1, further comprising:
a conversion data storage unit that stores conversion data corresponding to the input item; and
a data conversion unit that, when at least a part of the divided text data extracted by the input data extraction unit includes a conversion target character stored in the conversion data storage unit, converts the conversion target character into a different character based on the conversion data,
wherein the data input unit writes, for the input item, the divided text data that was extracted by the input data extraction unit and then converted by the data conversion unit.

A voice input method performed by a computer, wherein:
a voice recognition unit generates text data from a voice signal corresponding to collected voice;
an input data extraction unit detects a plurality of keywords included in the text data and, among the detected keywords, extracts the text data between a first keyword and a second keyword as divided text data for the first keyword, and extracts divided text data for the second keyword from at least a part of the text data following the second keyword;
the input data extraction unit, when a keyword detected from the divided keywords has a word stored in a dictionary data storage unit that stores a word that is a candidate for the divided text data for an input item in association with at least one of a plurality of sub-items that constitute the input item, identifies the character string immediately after that keyword as a character string to be divided into sub-items, even if the text data does not include the name of a sub-item;
a data input unit, for the input fields on a display screen displaying input target data that includes a plurality of input fields into which the text data can be entered in response to an input operation, acquires the text data to be entered into an input field from the divided text data obtained by the input data extraction unit and enters it into the corresponding input field; and
the data input unit writes, into the input field that corresponds to the name of the sub-item corresponding to the word, among the names of the sub-items displayed on the display screen, a word obtained based on the character string identified as the character string to be divided into sub-items.