JP2023029982A

JP2023029982A - Operation input method, operation input system, and operation terminal

Info

Publication number: JP2023029982A
Application number: JP2022195657A
Authority: JP
Inventors: 亮太藤井; Ryota Fujii
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2018-10-24
Filing date: 2022-12-07
Publication date: 2023-03-07
Also published as: JP7203334B2; JP2020067788A

Abstract

PROBLEM TO BE SOLVED: To provide an operation input method that improves convenience in inputting of a user operation even in a situation where it is difficult for the user to perform an operation with a hand.

SOLUTION: An operation input method comprises steps of: collecting a voice uttered by a user while a first screen is displayed on an operation terminal; recognizing the collected voice; determining whether or not the recognition result matches input content for a confirmation item shown on the first screen; instructing, when the recognition result matches the input content for the confirmation item shown on the first screen, the operation terminal to switch the display from the first screen to a second screen; determining, while an m-th screen (m: an integer satisfying 2≤m≤N; N: an integer greater than or equal to 4) is displayed on the operation terminal, whether or not the recognition result of a voice uttered by the user matches input content for a confirmation item shown on the m-th screen; and instructing, when the recognition result matches the input content for the confirmation item shown on the m-th screen, the operation terminal to switch the display from the m-th screen to an (m+1)-th screen.

SELECTED DRAWING: Figure 2

Description

The present disclosure relates to an operation input method, an operation input system, and an operation terminal that are operated by voice.

Patent Document 1 discloses a speech recognition device that performs speech recognition processing on input speech corresponding to an address, thereby determines a first candidate for each of the plurality of words that make up the address, and presents the user with the plurality of first candidates corresponding to the determined words. When the speech recognition device receives a recognition-result correction instruction from the user after presenting the recognition result, it performs speech recognition processing again on the input speech corresponding to the misrecognized word and determines a second candidate for that word, excluding the first candidate. This reduces the number of utterances needed to input an address by voice and simplifies the user's operation.

JP 2017-102320 A

However, in the configuration of Patent Document 1, when a word has been misrecognized, for example, the user must issue the recognition-result correction instruction through an action such as pressing a back switch, which requires an operation by hand. Consequently, when the user is in a situation where he or she cannot use his or her hands, for example, the recognition result cannot be corrected and processing cannot proceed smoothly, which reduces convenience for the user.

The present disclosure has been devised in view of the conventional situation described above, and its object is to provide an operation input method, an operation input system, and an operation terminal that improve convenience when inputting a user operation even in situations where it is difficult for the user to perform an operation by hand.

The present disclosure provides an operation input method including: a step of collecting a voice uttered by a user while a first screen is displayed on an operation terminal; a step of recognizing the collected voice; a step of determining whether or not the recognition result of the voice matches input content for a confirmation item shown on the first screen; a step of instructing the operation terminal to switch the display from the first screen to a second screen when the recognition result of the voice matches the input content for the confirmation item shown on the first screen; a step of determining, while an m-th screen (m: an integer satisfying 2≤m≤N; N: an integer of 4 or more) is displayed on the operation terminal, whether or not the recognition result of a voice uttered by the user matches input content for a confirmation item shown on the m-th screen; and a step of instructing the operation terminal to switch the display from the m-th screen to an (m+1)-th screen when the recognition result of the voice matches the input content for the confirmation item shown on the m-th screen.
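The screen-switching steps of this method can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the names `Screen`, `ScreenFlow`, and `is_ok_ng`, the rule that an employee ID is six digits, and the choice of one inspection item per screen are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Screen:
    name: str
    # Hypothetical predicate: does the recognized text qualify as input
    # content for the confirmation item shown on this screen?
    accepts: Callable[[str], bool]


class ScreenFlow:
    """Tracks the displayed screen and advances on a matching utterance."""

    def __init__(self, screens: List[Screen]):
        self.screens = screens
        self.index = 0  # 0-based; index 0 is the "first screen"

    @property
    def current(self) -> Screen:
        return self.screens[self.index]

    def on_recognition(self, text: str) -> bool:
        """Return True if the display was switched to the next screen."""
        if self.index >= len(self.screens) - 1:
            return False  # last screen: there is nothing to switch to
        if not self.current.accepts(text.strip()):
            return False  # no match: keep showing the same screen
        self.index += 1   # m-th screen -> (m+1)-th screen
        return True


def is_ok_ng(text: str) -> bool:
    # Assumption: inspection results are spoken as "OK" or "NG".
    return text.upper() in {"OK", "NG"}


# N = 4 here, so there are N + 1 = 5 screens in total.
flow = ScreenFlow([
    Screen("employee ID", lambda t: t.isdigit() and len(t) == 6),
    Screen("check 1", is_ok_ng),
    Screen("check 2", is_ok_ng),
    Screen("check 3", is_ok_ng),
    Screen("result", lambda t: False),
])
```

In this sketch an utterance that does not match leaves the display unchanged, so the user can simply speak again without touching the terminal.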

Further, the present disclosure provides an operation input system in which an operation terminal having a voice input device and a display device is communicably connected to a voice processing device, wherein: the operation terminal collects, with the voice input device, a voice uttered by a user while a first screen is displayed on the display device; the operation terminal or the voice processing device recognizes the collected voice; the voice processing device determines whether or not the recognition result of the voice matches input content for a confirmation item shown on the first screen and, when the recognition result matches that input content, instructs the operation terminal to switch the display from the first screen to a second screen; and the voice processing device determines, while an m-th screen (m: an integer satisfying 2≤m≤N; N: an integer of 4 or more) is displayed on the operation terminal, whether or not the recognition result of a voice uttered by the user matches input content for a confirmation item shown on the m-th screen and, when the recognition result matches that input content, instructs the operation terminal to switch the display from the m-th screen to an (m+1)-th screen.

Further, the present disclosure provides an operation terminal including: a voice input device that collects a voice uttered by a user while a first screen is displayed on a display device; a recognition unit that recognizes the collected voice; and a control unit that determines whether or not the recognition result of the voice matches input content for a confirmation item shown on the first screen, wherein: the control unit instructs the display device to switch the display from the first screen to a second screen when the recognition result of the voice matches the input content for the confirmation item shown on the first screen; and the control unit determines, while an m-th screen (m: an integer satisfying 2≤m≤N; N: an integer of 4 or more) is displayed on the display device, whether or not the recognition result of a voice uttered by the user matches input content for a confirmation item shown on the m-th screen and, when the recognition result matches that input content, instructs the display device to switch the display from the m-th screen to an (m+1)-th screen.

According to the present disclosure, convenience when inputting a user operation can be improved even in situations where it is difficult for the user to perform an operation by hand.

FIG. 1 is a diagram showing a schematic configuration of an operation input system according to Embodiment 1.
FIG. 2 is a block diagram showing the hardware configuration of the operation terminal and the voice processing device.
FIG. 3 is a table showing the contents registered in the keyword database.
FIG. 4 is a diagram showing the transition of voice input screens on the display device.
FIG. 5 is a sequence diagram showing an example of the speech recognition operation procedure according to Embodiment 1.
FIG. 6 is a sequence diagram showing an example of the speech recognition operation procedure following FIG. 5.
FIG. 7 is a diagram showing an example of an inspection result screen displayed on the display device.
FIG. 8 is a flowchart showing an example of the speech recognition operation procedure according to Modification 1 of Embodiment 1.
FIG. 9 is a flowchart showing an example of the speech recognition operation procedure following FIG. 8.
FIG. 10 is a diagram showing the transition of voice input screens on the display device according to Embodiment 2.
FIG. 11 is a sequence diagram showing the speech recognition procedure in the operation input system.
FIG. 12 is a sequence diagram showing the speech recognition procedure in the operation input system following FIG. 11.
FIG. 13 is a flowchart showing the speech recognition procedure according to Modification 1 of Embodiment 2.
FIG. 14 is a flowchart showing the speech recognition procedure following FIG. 13.

Hereinafter, embodiments that specifically disclose the configurations and operations of an operation input method, an operation input system, and an operation terminal according to the present disclosure will be described in detail with reference to the drawings as appropriate. However, unnecessarily detailed description may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming unnecessarily verbose and to facilitate understanding by those skilled in the art. The accompanying drawings and the following description are provided so that those skilled in the art can fully understand the present disclosure, and are not intended to limit the subject matter described in the claims.

(Embodiment 1)
FIG. 1 is a diagram showing a schematic configuration of an operation input system 5 according to Embodiment 1. The operation input system 5 recognizes a voice uttered by a user carrying an operation terminal 10 and displays information corresponding to the speech recognition result. The operation input system 5 includes the operation terminal 10, which is connected to a network NW via an access point 40, and a voice processing device 50 connected to the network NW.

The access point 40 is a device that wirelessly connects the operation terminal 10 to the network NW. The network NW is a wired LAN (Local Area Network), a wireless LAN, or a wide area network such as the Internet.

The operation terminal 10 is a terminal shared by a plurality of users and is configured as a tablet terminal capable of voice input and display. The tablet terminal has three applications installed: an application for transmitting input voice data as-is to the voice processing device 50, an application for recognizing input voice and converting it into text data, and an application for transmitting this text data to the voice processing device 50. Besides a tablet terminal, the operation terminal 10 may be a computer terminal such as a smartphone, a notebook PC (Personal Computer), or a PDA (Personal Digital Assistant), which are electronic devices having similar information processing capability and communication functions.

The voice processing device 50 is configured as a general-purpose computer. Based on the voice data transmitted from the operation terminal 10, the voice processing device 50 recognizes the voice and converts it into text data, and then acquires display information corresponding to this text data, or generates screen data, and transmits it to the operation terminal 10. Note that the voice processing device 50 may acquire display information corresponding to the voice data, or generate screen data, and transmit it to the operation terminal 10 without converting the voice data into text data.

FIG. 2 is a block diagram showing the hardware configuration of the operation terminal 10 and the voice processing device 50. The operation terminal 10 includes a processor 11, a memory 12, a communication circuit 13, a voice input device 14, and a display device 15.

The processor 11 is configured using, for example, a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array), and controls the operation of each unit of the operation terminal 10. The processor 11 functions as the control unit of the operation terminal 10 and performs control processing for overall supervision of the operation of each unit of the operation terminal 10, data input/output processing with each unit of the operation terminal 10, data arithmetic (calculation) processing, and data storage processing. The processor 11 operates by executing programs stored in a ROM (Read Only Memory) in the memory 12.

The processor 11 also has a speech recognition unit 25 as a functional component. The speech recognition unit 25 is a software component realized by executing a program stored in the ROM in the memory 12; it recognizes the voice input through the voice input device 14 and converts it into text data. The processor 11 displays various information on the display device 15 based on screen data. The processor 11 also displays, on the display device 15, candidate utterances that prompt the user's voice input. The processor 11 generates screen data to be displayed on the display device 15.

The memory 12 includes, for example, a RAM (Random Access Memory) and a ROM (Read Only Memory), and temporarily stores programs and data necessary for executing the operations of the operation terminal 10 as well as information or data generated during operation. The RAM is, for example, a work memory used when the processor 11 operates. The ROM stores in advance, for example, programs and data for controlling the processor 11. The memory 12 also stores a keyword table 12z in which employee IDs (identification numbers) and a plurality of inspection items corresponding to each employee ID are registered (see FIG. 3).

The communication circuit 13 is configured as a communication circuit capable of communicating with the voice processing device 50 via the network NW, or as a NIC (Network Interface Card) on which such a communication circuit is mounted.

The voice input device 14 is configured as a microphone that collects voice. The microphone may be either directional or omnidirectional.

The display device 15 displays the screen data transmitted from the voice processing device 50 in accordance with instructions from the processor 11. When the operation terminal 10 generates the screen data, the display device 15 displays the screen data generated by the processor 11. The display device 15 may be configured as a touch panel that accepts touch input operations.

The voice processing device 50 includes a processor 51, a memory 52, a communication circuit 53, and a storage 54.

The memory 52 includes, for example, a RAM (Random Access Memory) and a ROM (Read Only Memory), and temporarily stores programs and data necessary for executing the operations of the voice processing device 50 as well as information or data generated during operation. The RAM is, for example, a work memory used when the processor 51 operates. The ROM stores in advance, for example, programs and data for controlling the processor 51.

The communication circuit 53 is configured as a communication circuit capable of communicating with the operation terminal 10 via the network NW, or as a NIC (Network Interface Card) on which such a communication circuit is mounted.

The storage 54 is a large-capacity storage medium and stores, for example, voice data, speech recognition results, and a keyword database (DB) 541.

All employee IDs and a plurality of inspection items corresponding to each employee ID are registered in the keyword DB 541.

The processor 51 is configured using, for example, a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array), and controls the operation of each unit of the voice processing device 50. The processor 51 functions as the control unit of the voice processing device 50 and performs control processing for overall supervision of the operation of each unit of the voice processing device 50, data input/output processing with each unit of the voice processing device 50, data arithmetic (calculation) processing, and data storage processing. The processor 51 operates by executing programs stored in a ROM (Read Only Memory) in the memory 52.

The processor 51 has a control unit 61, a speech recognition unit 62, and a keyword matching unit 63 as functional components. Each of these units is a function realized by the processor 51 executing a control program stored in its built-in memory or in the memory 52.

The speech recognition unit 62 is a software component realized by executing a program stored in the ROM in the memory 52; it recognizes the voice in the voice data transmitted from the operation terminal 10 and converts it into text data. When the operation terminal 10 transmits text data that is already the result of speech recognition instead of voice data, the speech recognition processing by the speech recognition unit 62 is omitted.
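The branch described above, where recognition runs only when raw voice data arrives, might be sketched as follows. The function names `to_text` and `fake_recognizer` are hypothetical, and the stand-in recognizer simply decodes bytes; a real system would call an actual speech recognizer at that point.

```python
from typing import Callable, Union


def fake_recognizer(audio: bytes) -> str:
    """Stand-in for a real speech recognizer (assumption for illustration)."""
    return audio.decode("utf-8")


def to_text(payload: Union[bytes, str],
            recognize: Callable[[bytes], str] = fake_recognizer) -> str:
    """Return text for a payload received from the operation terminal."""
    if isinstance(payload, str):
        # The terminal already performed recognition: skip it here.
        return payload
    # Raw voice data: run speech recognition on the server side.
    return recognize(payload)
```

Representing voice data as `bytes` and recognized text as `str` is a design choice of this sketch only; the source does not specify a wire format.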

The keyword matching unit 63 is a software component realized by executing a program stored in the ROM in the memory 52; it collates the text obtained as the speech recognition result with the keywords registered in the keyword DB 541. For example, when the recognized text is a number or a sequence of numbers, the keyword matching unit 63 collates this number with the employee IDs registered in the keyword DB 541 and, if there is a match, authenticates the user as the employee holding that employee ID. When the employee ID is successfully authenticated, the keyword matching unit 63 picks up (extracts) the one or more inspection items corresponding to this employee ID. The keyword matching unit 63 then associates the voices the user utters next with the order of the inspection items (check numbers). For example, when the employee ID is "123456", an engine check, a brake disc check, an accelerator pedal check, a battery check, and so on are defined in advance as the inspection items to be performed by the employee with that ID, and these are picked up. The keyword matching unit 63 treats the order of these inspection items as the order in which the user utters the voices.

The control unit 61 is a software component realized by executing a program stored in the ROM in the memory 52; it generates screen data including the speech recognition result and the inspection items with their check numbers, and transmits the screen data to the operation terminal 10. The control unit 61 also stores the speech recognition result and the inspection items with their check numbers in the storage 54.

FIG. 3 is a table showing the contents registered in the keyword database (DB) 541. In the keyword database (DB) 541, the following are defined and registered in advance as the inspection items corresponding to, for example, the employee ID "123456": Check 1: engine check; Check 2: brake disc check; Check 3: accelerator pedal check; Check 4: battery check; and so on.

Likewise, the following are defined and registered in advance as the inspection items corresponding to the employee ID "789123": Check 1: engine oil check; Check 2: coolant check; Check 3: brake oil check; Check 4: fuel level check; and so on.
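The registered contents above, together with the lookup performed by the keyword matching unit 63, can be sketched as a small in-memory table. The dictionary layout, the name `pick_up_inspection_items`, and the step that discards non-digit characters from the recognized text are assumptions for illustration; the source does not describe how the database is stored or how the text is normalized.

```python
from typing import Dict, List, Optional

# Illustrative stand-in for the keyword DB 541: employee ID -> inspection
# items in check-number order (Check 1, Check 2, ...).
KEYWORD_DB: Dict[str, List[str]] = {
    "123456": ["Engine", "Brake disc", "Accelerator pedal", "Battery"],
    "789123": ["Engine oil", "Coolant", "Brake oil", "Fuel level"],
}


def pick_up_inspection_items(
    recognized_text: str,
    db: Dict[str, List[str]] = KEYWORD_DB,
) -> Optional[List[str]]:
    """Authenticate a spoken employee ID and return its inspection items.

    Returns None when no registered employee ID matches.
    """
    # Assumption: keep only the digits, so "789 123" is read as "789123".
    employee_id = "".join(ch for ch in recognized_text if ch.isdigit())
    items = db.get(employee_id)
    return list(items) if items is not None else None
```

Returning `None` for an unknown ID leaves the caller free to either notify the terminal or stay silent, which are the two options the description allows when authentication fails.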

FIG. 4 is a diagram showing the transition of voice input screens on the display device 15. After starting up, the operation terminal 10 displays an employee ID input screen GA1 on the display device 15. The employee ID input screen GA1 shows a message ms1 reading "Enter employee ID" and, below it, an input box bx1. The operation terminal 10 collects the voice uttered by the user with the voice input device 14 and accepts the digits contained in that voice as the employee ID. In the example of FIG. 4, the user has uttered "123456", and the recognition result "123456" has been entered in the input box bx1.

On accepting the recognition result of the spoken employee ID, the operation terminal 10 performs authentication or the like to determine whether the employee corresponding to that employee ID is a pre-registered employee and, if the authentication succeeds, displays an inspection item input screen GA2 corresponding to that employee ID. The upper part of the input screen GA2 shows the employee ID that was entered by recognizing the user's voice collected while the preceding input screen GA1 was displayed. For each inspection item, the input screen GA2 shows check numbers 1, 2, 3, N (N = 4 in the example of FIG. 4), inspection contents ct1, ct2, ct3, ct4, and input boxes by1, by2, by3, by4, associated with one another in order. Note that N is a positive integer of 4 or more.

The check numbers 1 to N correspond, for example, to the order of the utterances the user makes each time an inspection is finished. As an example, the inspection content of Check 1 is the engine check. The input box by1 receives the recognition result (text data) of the voice "OK" uttered by the user as the inspection result for the engine mounted in the vehicle (not shown).

Similarly, the inspection content of Check 2 is the brake disc check. The input box by2 receives the recognition result (text data) of the voice "OK" uttered by the user as the inspection result for the brake disc mounted in the vehicle.

The inspection content of Check 3 is the accelerator pedal check. The input box by3 receives the recognition result (text data) of the voice "OK" uttered by the user as the inspection result for the accelerator pedal mounted in the vehicle.

The inspection content of Check 4 is the battery check. The input box by4 receives, for example, the recognition result (text data) of the voice "NG" uttered by the user when the user has judged, as a result of checking the battery mounted in the vehicle, that the battery voltage is low.
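How successive utterances fill the input boxes in check-number order can be sketched as follows. The function name `fill_check_results` is hypothetical, and ignoring any utterance other than "OK" or "NG" is an assumption of this sketch; the source only gives "OK" and "NG" as example inputs.

```python
from typing import Dict, Iterable, List


def fill_check_results(items: List[str],
                       utterances: Iterable[str]) -> Dict[int, str]:
    """Assign recognized utterances to check numbers 1..N in spoken order."""
    results: Dict[int, str] = {}
    next_check = 1  # check numbers start at 1
    for spoken in utterances:
        if next_check > len(items):
            break  # every input box is already filled
        value = spoken.strip().upper()
        if value not in ("OK", "NG"):
            continue  # assumption: ignore anything that is not OK/NG
        results[next_check] = value
        next_check += 1
    return results


items = ["Engine", "Brake disc", "Accelerator pedal", "Battery"]
results = fill_check_results(items, ["OK", "OK", "OK", "NG"])
```

With the four utterances of the FIG. 4 example, `results` maps checks 1 to 3 to "OK" and check 4 to "NG".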

Next, the operation procedure of the operation input system 5 according to Embodiment 1 will be described.

To make the following description easier to follow, it uses as an example a use case in which a worker who inspects an object (for example, a vehicle) records the inspection results of the vehicle while his or her hands are occupied. Situations in which the hands are occupied include, for example, the worker putting away the tools that were used or washing his or her hands after the inspection is finished. Another assumed situation is one in which the worker is wearing gloves while the operation terminal has a touch panel that accepts input by bare-hand touch operation.

FIGS. 5 and 6 are sequence diagrams showing an example of the speech recognition operation procedure.

In FIG. 5, when the operation terminal 10 is started by the user's power-on operation, the processor 11 of the operation terminal 10 starts the speech recognition operation. The processor 11 displays the employee ID input screen GA1 on the display device 15 (T1). With the employee ID input screen GA1 displayed, the user utters a voice (for example, the number "123456"). The processor 11 collects the user's utterance with the voice input device 14 (T2). The processor 11 transmits this voice data to the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW (T3).

The control unit 61 of the processor 51 of the voice processing device 50 receives the voice data transmitted from the operation terminal 10 via the communication circuit 53 and stores this voice data in the memory 52. The speech recognition unit 62 of the processor 51 performs speech recognition on the voice data stored in the memory 52 (T4).

The keyword matching unit 63 of the processor 51 determines whether the recognized text data is an input for the relevant item (the employee ID input item) (T5). In this determination, the keyword matching unit 63 refers to the keyword DB 541 stored in the storage 54, determines whether an employee ID corresponding to the recognized text data exists, and authenticates the user's employee ID. Further, when the authentication result of the employee ID is OK, the keyword matching unit 63 picks up the plurality of inspection items registered in the keyword DB 541 for that employee ID.

制御部６１は、社員ＩＤの認証ＯＫ、および社員ＩＤに対応する複数の点検項目を基に、点検項目の入力画面ＧＡ２の画面データを生成する（Ｔ６）。制御部６１は、通信回路５３およびネットワークＮＷを介して、点検項目の入力画面ＧＡ２の画面データを操作端末１０に送信する（Ｔ７）。 The control unit 61 generates screen data for the inspection item input screen GA2 based on the authentication OK of the employee ID and a plurality of inspection items corresponding to the employee ID (T6). The control unit 61 transmits the screen data of the inspection item input screen GA2 to the operation terminal 10 via the communication circuit 53 and the network NW (T7).

Note that if the authentication result of the employee ID is NG, the control unit 61 of the voice processing device 50 waits to receive voice data from the operation terminal 10 again. At this time, the control unit 61 may reply to the operation terminal 10 that the user's employee ID could not be authenticated, or may send no reply. Likewise, if no inspection items corresponding to the employee ID are registered in the keyword DB 541, the control unit 61 may reply to the operation terminal 10 that no inspection items are registered for the user's employee ID, or may send no reply.

The processor 11 of the operation terminal 10 receives the screen data of the inspection item input screen GA2 from the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW, and displays the inspection item input screen GA2 on the display device 15 (T8). If the processor 11 has not received the screen data of the inspection item input screen GA2 from the voice processing device 50 some time after the voice input device 14 picked up the user's utterance, the processor 11 may return to step T2 and repeat the same processing. This lets the user utter the employee ID again, which covers cases where the sound was not picked up properly because of sudden noise or the like.

With the inspection item input screen GA2 displayed on the display device 15, the processor 11 picks up the user's utterance with the voice input device 14 (T9), as in step T2. At this time, the user utters, for example, "OK" or "NG" for the inspection item of check 1 (here, the engine check). The processor 11 transmits this voice data to the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW (T10). The control unit 61 of the voice processing device 50 receives the voice data transmitted from the operation terminal 10 via the communication circuit 53 and stores it in the memory 52. The speech recognition unit 62 performs speech recognition on the voice data stored in the memory 52 (T11).

The control unit 61 determines whether or not the voice-recognized text data is an input for the previous input item (the employee ID input item) (T12). If the voice-recognized text data is an input for the employee ID input item, that is, if it is a six-digit number, the control unit 61 transmits the employee ID text data to the operation terminal 10 via the communication circuit 53 and the network NW (T13). When the processor 11 of the operation terminal 10 receives the employee ID text data from the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW, it determines that this reception is a correction of the employee ID and updates the inspection item input screen GA2 to reflect the corrected employee ID text data (T14).

On the other hand, if the text data voice-recognized in step T12 is not an input for the employee ID input item, the control unit 61 determines whether or not the voice-recognized text data is an input (for example, "OK") for the corresponding item (the inspection item of check 1) (T15). If the voice-recognized text is not an input for the inspection item of check 1, the control unit 61 returns to step T11 and waits until voice data is received again.

On the other hand, if the text data voice-recognized in step T15 is an input for the inspection item of check 1, the control unit 61 transmits the text data for the inspection item of check 1 to the operation terminal 10 via the communication circuit 53 and the network NW (T16). The processor 11 of the operation terminal 10 receives the text data for the inspection item of check 1 from the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW, and updates the inspection item input screen GA2 to reflect this text data (T17). On the updated inspection item input screen GA2, the characters "OK" are displayed in the input box by1 for the engine check of check 1 (see FIG. 4). The processing of steps T9 to T17 is repeated N times, once for each inspection item. That is, if the inspection item number (check number) is denoted as the k-th inspection item, the processing of steps T9 to T17 is performed for k = 1 to N.
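
The branching of steps T12 and T15 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the class `InspectionForm`, the function `handle_utterance`, the returned status strings, and the result vocabulary `ALLOWED_RESULTS` are all hypothetical names that do not appear in this disclosure, and network transmission (T13, T16) is reduced to a return value.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

ALLOWED_RESULTS = {"OK", "NG"}  # assumed result vocabulary


@dataclass
class InspectionForm:
    employee_id: str
    items: List[str]
    results: Dict[int, str] = field(default_factory=dict)  # check number -> result

    @property
    def current(self) -> int:
        """1-based number of the check awaiting input (checks are filled in order)."""
        return len(self.results) + 1


def handle_utterance(form: InspectionForm, text: str) -> str:
    """Classify recognized text and apply it to the form."""
    text = text.strip().upper()
    # T12: a six-digit number is treated as a correction of the employee ID.
    if re.fullmatch(r"\d{6}", text):
        form.employee_id = text
        return "employee_id_corrected"  # corresponds to T13/T14
    # T15: otherwise check whether it is a valid input for the current check.
    if text in ALLOWED_RESULTS and form.current <= len(form.items):
        form.results[form.current] = text
        return "item_recorded"  # corresponds to T16/T17
    return "ignored"  # wait for the next voice data
```

Because the check number is derived from how many results have been recorded, the user never has to utter it, which mirrors the utterance-order behaviour described above.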

After that, with the inspection item of check (N-1) already entered, that is, with the inspection item input screen GA2 awaiting the input of check N displayed on the display device 15, the processor 11 picks up the user's utterance with the voice input device 14 (T18), as in step T9. The processor 11 transmits this voice data to the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW (T19). The control unit 61 of the voice processing device 50 receives the voice data transmitted from the operation terminal 10 via the communication circuit 53 and stores it in the memory 52. The speech recognition unit 62 performs speech recognition on the voice data stored in the memory 52 (T20).

制御部６１は、音声認識されたテキストデータが前の点検項目（チェック（Ｎ－１）の点検項目）に対する入力であるか否かを判別する（Ｔ２１）。例えば、利用者がキーワードとして「前の項目ＮＧ」と発話した場合、前の入力項目に対する入力であると判断される。 The control unit 61 determines whether or not the voice-recognized text data is an input for the previous check item (check (N-1) check item) (T21). For example, when the user utters "previous item NG" as a keyword, it is determined that the input is for the previous input item.

If the voice-recognized text data is an input for the inspection item of check (N-1), the control unit 61 transmits the corrected text data of check (N-1) to the operation terminal 10 via the communication circuit 53 and the network NW (T22). When the processor 11 of the operation terminal 10 receives the corrected text data of check (N-1) from the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW, it updates the inspection item input screen GA2 to reflect this text data (T23).

On the other hand, if the text data voice-recognized in step T21 is not an input for the inspection item of check (N-1), the control unit 61 determines whether the voice-recognized text data is an input (for example, "OK") for the corresponding item (here, the inspection item of check N) (T24). If the voice-recognized text data is not an input for the inspection item of check N, the control unit 61 waits until voice data is received in step T19.

On the other hand, if the text voice-recognized in step T24 is an input for the inspection item of check N, the control unit 61 generates the inspection result screen GA3 (see FIG. 7) (T25). The control unit 61 transmits the screen data of the inspection result screen GA3 to the operation terminal 10 via the communication circuit 53 and the network NW (T26). The processor 11 of the operation terminal 10 receives the screen data of the inspection result screen GA3 from the voice processing device 50 via the communication circuit 13, the access point 40, and the network NW, and displays the inspection result screen GA3 on the display device 15 (T27).

FIG. 7 is a diagram showing an example of the inspection result screen GA3 displayed on the display device 15. The inspection result screen GA3 lists the employee ID, the inspection date and time, and the inspection results of checks 1 to N. In FIG. 7, the following are displayed: employee ID: 123456; inspection date and time: 2018, month ○, day ○; check 1: engine check OK; check 2: brake disc check OK; check 3: accelerator pedal check OK; check 4: battery check NG; and so on.

As described above, in the operation input system of Embodiment 1, the operations from entering the employee ID through confirming the inspection items to displaying the inspection result screen can be completed with simple utterances alone, without the use of the user's hands. In particular, because the order of the checks follows the order of the utterances, the user does not need to utter check numbers, so fewer utterances are needed. This simplifies operation by speech. In addition, independently of the utterance order, when the format of a word to be entered (for example, a four-digit number) differs from the format of other words (for example, two alphabetic characters), the item name that identifies the input target may be omitted.

Note that, although the case where the user utters "previous item NG" to correct the confirmation of the previous inspection item has been described as an example, the correction is not limited to this, and the user may instead utter "check number ○○ NG". Specifying the check number also makes it possible to correct the input for a check two or more items back. In addition, the simple words uttered by the user for input operations are not limited to numbers, OK, and NG, and may be YES for affirmation, NO for negation, A, B, C for rank, and so on. This reduces input errors.
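
The two correction forms can be distinguished with simple pattern matching, as in the following Python sketch. The English phrases "previous item ..." and "check [number] k ..." are assumed renderings of the uttered keywords, and the function name `parse_correction` and its return shape are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed English renderings of the correction keywords.
_PREVIOUS = re.compile(r"^previous item (\w+)$", re.IGNORECASE)
_NUMBERED = re.compile(r"^check (?:number )?(\d+) (\w+)$", re.IGNORECASE)


def parse_correction(text: str, current: int) -> Optional[Tuple[int, str]]:
    """Return (check_number, new_value) if text is a correction, else None.

    current is the 1-based number of the check now awaiting input, so only
    checks with a smaller number have already been entered.
    """
    text = " ".join(text.split())  # normalize whitespace
    match = _PREVIOUS.match(text)
    if match and current > 1:
        return current - 1, match.group(1).upper()
    match = _NUMBERED.match(text)
    if match:
        number = int(match.group(1))
        if 1 <= number < current:  # can only correct an already-entered check
            return number, match.group(2).upper()
    return None
```

For example, while check 4 is awaiting input, "previous item NG" targets check 3, whereas "check 1 NO" targets a check three items back.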

As described above, the operation input method of Embodiment 1 includes: a step of picking up the voice uttered by the user while the employee ID input screen GA1 (first screen) is displayed on the operation terminal 10; a step of recognizing the picked-up voice; a step of determining whether or not the voice recognition result matches the input content for the inspection item (confirmation item) shown on the employee ID input screen GA1; and a step of instructing the operation terminal 10, when the voice recognition result matches the input content for the employee ID input item shown on the employee ID input screen GA1, to switch the display from the employee ID input screen GA1 to the inspection item input screen GA2 (second screen) and to display, on the inspection item input screen GA2, the voice recognition result corresponding to the employee ID input item shown on the employee ID input screen GA1.

Thus, according to the operation input method, the operation input system 5, or the operation terminal 10, even in a situation where it is difficult for the user to operate by hand (for example, when the worker's hands are occupied), the worker can carry out the inspection efficiently through the simple task of speaking to the operation terminal 10 to record the vehicle inspection results, which improves convenience when entering user operations.

The operation input method also includes: a step of determining, while the inspection item input screen GA2 with N inspection items (N: an integer of 4 or more) is displayed on the operation terminal 10, whether or not the recognition result of the voice uttered by the user matches the input content for the k-th inspection item (k: an integer satisfying 1 ≤ k ≤ (N-1)) shown on the inspection item input screen GA2; and a step of instructing the operation terminal 10, when the voice recognition result matches the input content for the k-th inspection item shown on the inspection item input screen GA2, to display the voice recognition result corresponding to the k-th inspection item on the inspection item input screen GA2. This enables inspection items to be entered in succession, improving operability.

In addition, the step of recognizing the voice includes a step of accepting, when the voice recognition result does not match the input content for the k-th inspection item shown on the inspection item input screen GA2, recognition processing of a keyword (predetermined keyword) uttered by the user, such as "previous item NG" or "check 3 NO", and of the voice that the user utters again for the k-th inspection item. This makes it easy to correct the input content of an earlier inspection item even after its input has been completed.

また、音声の認識結果が点検項目の入力画面ＧＡ２に示される第Ｎ番目の点検項目に対する入力内容と合致する場合、Ｎ個の点検項目とそれぞれの点検項目に対するユーザの発する音声の認識結果とを対応付けた点検結果画面ＧＡ３（認識結果）の表示を操作端末１０に指示するステップ、を更に有する。これにより、ユーザは、全ての点検項目の入力内容を一覧で視覚的に確認できる。したがって、ユーザは、誤入力を見つけ易くなり、入力ミスの低減を図ることができる。 Further, when the speech recognition result matches the input content for the N-th inspection item shown on the inspection item input screen GA2, the N inspection items and the recognition result of the voice uttered by the user for each inspection item are displayed. It further includes a step of instructing the operation terminal 10 to display the associated inspection result screen GA3 (recognition result). Thereby, the user can visually confirm the input contents of all inspection items in a list. Therefore, it becomes easier for the user to find an erroneous input, and input errors can be reduced.

また、点検項目の入力画面ＧＡ２に示されるチェック１（第１番目）からチェックＮ（第Ｎ番目）までの点検項目に対する入力内容は、社員ＩＤの入力画面ＧＡ１に示される社員ＩＤの入力項目に対する入力内容と対応付けられる。これにより、社員ＩＤごとに点検項目を管理できる。また、ユーザは、点検項目の入力画面ＧＡ２に示された点検項目の入力内容と自身が想定している点検項目の内容とを比較し、その正誤を容易に確認できる。 In addition, the input contents for the inspection items from check 1 (first) to check N (Nth) shown on the inspection item input screen GA2 correspond to the employee ID input items shown on the employee ID input screen GA1. Associated with input content. Thereby, inspection items can be managed for each employee ID. In addition, the user can compare the input contents of the inspection items shown on the inspection item input screen GA2 with the contents of the inspection items assumed by the user, and can easily confirm whether the contents are correct or not.

また、利用者が発話した「前の項目ＮＧ」、「チェック３ＮＯ」等のキーワード（所定のキーワード）は、１つ前の点検項目（第ｋ－１番目の確認項目）に対する入力内容の訂正（修正）を表すテキストデータ（情報）である。これにより、１つ前の点検項目を簡単に訂正できる。 In addition, keywords (predetermined keywords) such as "previous item NG" and "check 3 NO" uttered by the user are used to correct the input content for the previous check item (k-1th check item). It is text data (information) representing (correction). This makes it possible to easily correct the previous check item.

(Modification 1 of Embodiment 1)
Embodiment 1 described the case where the voice processing device 50 receives voice data from the operation terminal 10 and performs speech recognition. Modification 1 of Embodiment 1 describes an example in which the operation terminal 10 itself performs speech recognition.

FIGS. 8 and 9 are flowcharts showing an example of the speech recognition operation procedure according to Modification 1 of Embodiment 1.

図８において、利用者による電源オンの操作等によって操作端末１０が起動すると、操作端末１０のプロセッサ１１は、音声認識の動作を開始する。プロセッサ１１は、社員ＩＤの入力画面ＧＡ１を表示装置１５に表示する（Ｓ１）。社員ＩＤの入力画面ＧＡ１が表示された状態で、利用者が音声（例えば、番号「１２３４５６」）を発する。プロセッサ１１は、音声入力装置１４で利用者の発話を収音する（Ｓ２）。 In FIG. 8, when the operating terminal 10 is activated by a user's power-on operation or the like, the processor 11 of the operating terminal 10 starts speech recognition operation. The processor 11 displays an employee ID input screen GA1 on the display device 15 (S1). With the employee ID input screen GA1 displayed, the user speaks (for example, the number "123456"). The processor 11 picks up the user's speech with the voice input device 14 (S2).

プロセッサ１１は、収音された音声の音声データをメモリ１２に記憶する。プロセッサ１１の音声認識部２５は、メモリ１２に記憶された音声データに対し音声認識を行い、音声を認識できたか否かを判別する（Ｓ３）。音声を認識できなかった場合、プロセッサ１１は、ステップＳ２に戻り、再度、収音動作を行う。これにより、利用者は、再度、社員ＩＤの発話でき、突発的な騒音等によりうまく収音できなかった場合に対処できる。 The processor 11 stores audio data of the collected audio in the memory 12 . The speech recognition unit 25 of the processor 11 performs speech recognition on the speech data stored in the memory 12, and determines whether or not the speech has been recognized (S3). If the speech cannot be recognized, the processor 11 returns to step S2 and performs the sound pickup operation again. As a result, the user can speak the employee ID again, and can deal with the case where the sound cannot be picked up properly due to sudden noise or the like.

ステップＳ３で音声を認識できた場合、プロセッサ１１は、音声認識されたテキストデータが該当する項目（社員ＩＤの入力項目）に対する入力であるか否かを判別する（Ｓ４）。社員ＩＤの入力項目に対する入力である場合、プロセッサ１１は、メモリ１２に記憶されたキーワードテーブル１２ｚを参照し、音声認識されたテキストに対応する社員ＩＤの有無を判定し、利用者の社員ＩＤを認証する。さらに、プロセッサ１１は、社員ＩＤの認証結果がＯＫである場合、キーワードテーブル１２ｚに登録されている、社員ＩＤに対応する複数の点検項目をピックアップする。 If the voice can be recognized in step S3, the processor 11 determines whether or not the voice-recognized text data is an input for the corresponding item (employee ID input item) (S4). If the input is for an employee ID input item, the processor 11 refers to the keyword table 12z stored in the memory 12, determines whether or not there is an employee ID corresponding to the text recognized by voice, and identifies the user's employee ID. Authenticate. Furthermore, when the authentication result of the employee ID is OK, the processor 11 picks up a plurality of inspection items corresponding to the employee ID registered in the keyword table 12z.

If the input in step S4 is not an input for the employee ID input item, the processor 11 returns to step S2 and performs the sound pickup operation again. Cases where the input is not an input for the employee ID input item include, for example, the authentication result of the employee ID being NG, or no inspection items corresponding to the user's employee ID being registered in the keyword table 12z. In these cases, the processor 11 may display on the display device 15 that the authentication result of the employee ID is NG. The processor 11 may also display on the display device 15 that no inspection items corresponding to the user's employee ID are registered.

The processor 11 generates the inspection item input screen GA2 based on the employee ID authentication OK and the plurality of inspection items corresponding to the employee ID (S5). The processor 11 displays the inspection item input screen GA2 on the display device 15 (S6).

表示装置１５に点検項目の入力画面ＧＡ２が表示された状態で、プロセッサ１１は、ステップＳ２と同様、音声入力装置１４で利用者の発話を収音する（Ｓ７）。このとき、利用者は、チェック１のエンジンの確認の点検項目に対し、例えば「ＯＫ」、「ＮＧ」等を発音する。 With the inspection item input screen GA2 displayed on the display device 15, the processor 11 picks up the user's speech with the voice input device 14 (S7), as in step S2. At this time, the user pronounces, for example, "OK" or "NG" for the inspection item of check 1 engine confirmation.

プロセッサ１１は、収音された音声の音声データをメモリ１２に記憶する。プロセッサ１１の音声認識部２５は、メモリ１２に記憶された音声データに対し音声認識を行い、音声を認識できたか否かを判別する（Ｓ８）。音声を認識できなかった場合、プロセッサ１１は、ステップＳ７に戻り、再度、収音動作を行う。 The processor 11 stores audio data of the collected audio in the memory 12 . The speech recognition unit 25 of the processor 11 performs speech recognition on the speech data stored in the memory 12, and determines whether or not the speech has been recognized (S8). If the speech cannot be recognized, the processor 11 returns to step S7 and performs the sound pickup operation again.

If the voice is recognized in step S8, the processor 11 determines whether or not the voice-recognized text data is an input for the corresponding item (the inspection item of check 1) (S9). If the voice-recognized text data is not an input for the inspection item of check 1, the processor 11 determines whether or not the voice-recognized text data is an input for the previous input item (the employee ID input item) (S10). If it is not an input for the employee ID input item, the processor 11 returns to the processing of step S8. In that case, the processor 11 may display nothing, or may display a prompt for re-input on the display device 15.

ステップＳ１０で音声認識されたテキストデータが社員ＩＤの入力項目に対する入力である場合、つまり６桁の数字である場合、プロセッサ１１は、この入力が社員ＩＤの訂正であると判断し、訂正された社員ＩＤを反映するように、点検項目の入力画面ＧＡ２を更新する（Ｓ１１）。この後、プロセッサ１１は、ステップＳ７の処理に戻る。 If the text data voice-recognized in step S10 is an input for an employee ID input item, that is, if it is a 6-digit number, the processor 11 determines that this input is a correction of the employee ID and corrects the input. The check item input screen GA2 is updated so as to reflect the employee ID (S11). After that, the processor 11 returns to the process of step S7.

ステップＳ９で音声認識されたテキストデータがチェック１の点検項目に対する入力である場合、プロセッサ１１は、チェック１の点検項目に対するテキストデータを反映するように（チェック１の入力の音声認識結果を含むように）、点検項目の入力画面ＧＡ２を更新する（Ｓ１２）。更新された点検項目の入力画面ＧＡ２では、チェック１のエンジンの確認に対する入力ボックスｂｙ１に「ＯＫ」の文字が表示される（図４参照）。 If the text data speech-recognized in step S9 is the input for the check item of check 1, the processor 11 is configured to reflect the text data for the check item of check 1 (including the speech recognition result of the input of check 1). ), the inspection item input screen GA2 is updated (S12). In the updated inspection item input screen GA2, the input box by1 for the confirmation of the engine in Check 1 displays the characters "OK" (see FIG. 4).

ステップＳ７～Ｓ１２までの同様の処理は、点検項目の数に相当するＮ回分繰り返される。つまり、点検項目の番号（チェック番号）を第ｋ番目の点検項目で表すと、ステップＳ７～Ｓ１２までの同様の処理は、ｋ＝１～Ｎで行われる。 Similar processing from steps S7 to S12 is repeated N times corresponding to the number of inspection items. That is, if the inspection item number (check number) is represented by the k-th inspection item, the same processing from steps S7 to S12 is performed with k=1 to N.

プロセッサ１１は、表示装置１５にチェック（Ｎ－１）の項目が入力済みである点検項目の入力画面ＧＡ２を表示する（Ｓ１３）。プロセッサ１１は、この表示状態で、ステップＳ７と同様、音声入力装置１４で利用者の発話を収音する（Ｓ１４）。プロセッサ１１は、収音された音声の音声データをメモリ１２に記憶する。プロセッサ１１の音声認識部２５は、メモリ１２に記憶された音声データに対し音声認識を行い、音声認識できたか否かを判別する（Ｓ１５）。音声認識できなかった場合、プロセッサ１１は、ステップＳ１４に戻り、再度、利用者の発話を取得する。 The processor 11 displays an inspection item input screen GA2 in which the check (N-1) item has been input on the display device 15 (S13). In this display state, the processor 11 picks up the user's speech with the voice input device 14 (S14), as in step S7. The processor 11 stores audio data of the collected audio in the memory 12 . The speech recognition unit 25 of the processor 11 performs speech recognition on the speech data stored in the memory 12, and determines whether or not the speech has been recognized (S15). If the speech cannot be recognized, the processor 11 returns to step S14 and acquires the user's speech again.

The processor 11 determines whether the voice-recognized text data is an input (for example, "OK") for the corresponding item (the inspection item of check N) (S16). If the voice-recognized text data is not an input for the inspection item of check N, for example, if the user utters "previous item NG", the processor 11 determines whether or not the voice-recognized text data is an input for the previous inspection item (the inspection item of check (N-1)) (S17).

音声認識されたテキストデータがチェックＮ－１の入力項目に対する入力である場合、プロセッサ１１は、訂正されたチェック（Ｎ－１）のテキストデータを反映するように、点検項目の入力画面ＧＡ２を更新する（Ｓ１８）。この後、プロセッサ１１は、ステップＳ１４の処理に戻る。なお、ステップＳ１７で音声認識されたテキストデータがチェックＮ－１の点検項目に対する入力でない場合、プロセッサ１１は、ステップＳ１４の処理に戻る。このとき、プロセッサ１１は、何も表示しなくてよいし、再入力を促すように、表示装置１５に表示してもよい。 If the voice-recognized text data is the input for the check N-1 input item, the processor 11 updates the check item input screen GA2 so as to reflect the corrected check (N-1) text data. (S18). After that, the processor 11 returns to the process of step S14. Note that if the text data voice-recognized in step S17 is not an input for the inspection item of check N-1, the processor 11 returns to the process of step S14. At this time, the processor 11 does not have to display anything, or may display on the display device 15 to prompt for re-input.

ステップＳ１６で音声認識されたテキストデータがチェックＮの点検項目に対する入力である場合、プロセッサ１１は、点検結果画面ＧＡ３（図７参照）を生成する（Ｓ１９）。プロセッサ１１は、表示装置１５に点検結果画面ＧＡ３を表示する（Ｓ２０）。この後、プロセッサ１１は音声認識の動作を終了する。 If the text data voice-recognized in step S16 is an input for the inspection item of check N, the processor 11 generates an inspection result screen GA3 (see FIG. 7) (S19). The processor 11 displays the inspection result screen GA3 on the display device 15 (S20). After this, the processor 11 terminates the voice recognition operation.
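
The overall on-terminal flow of steps S1 to S20 can be condensed into a short Python sketch. It makes several simplifying assumptions that are not part of this disclosure: recognized utterances arrive as an iterable of strings, with `None` standing for a recognition failure; `KEYWORD_TABLE` is a hypothetical stand-in for the keyword table 12z; only the "previous item ..." form of correction is handled; and screen display is replaced by a returned dictionary.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for keyword table 12z: employee ID -> inspection items.
KEYWORD_TABLE: Dict[str, List[str]] = {"123456": ["Engine", "Brake disc"]}
RESULT_WORDS = ("OK", "NG")  # assumed result vocabulary


def run_inspection(utterances: Iterable[Optional[str]]) -> Optional[Dict[str, object]]:
    """Consume recognized utterances and return the inspection result, or None."""
    stream = iter(utterances)
    employee_id: Optional[str] = None
    items: List[str] = []
    for text in stream:  # S2 to S4: repeat until a registered employee ID is heard
        if text is not None and text in KEYWORD_TABLE:
            employee_id, items = text, KEYWORD_TABLE[text]
            break
    if employee_id is None:
        return None  # the input ended before authentication succeeded
    results: List[str] = []
    for text in stream:  # S7 to S18: fill in the checks in utterance order
        if text is None:
            continue  # recognition failed, pick up sound again
        if text in RESULT_WORDS:
            results.append(text)
            if len(results) == len(items):
                break  # check N entered, go on to the result screen (S19)
        elif text.startswith("previous item ") and results:
            value = text[len("previous item "):]
            if value in RESULT_WORDS:
                results[-1] = value  # correct the previous check
        elif len(text) == 6 and text.isdigit():
            employee_id = text  # treated as an employee ID correction
    if len(results) < len(items):
        return None  # the input ended before all checks were entered
    return {"employee_id": employee_id, "results": dict(zip(items, results))}
```

Since every step runs in one process, the sketch needs no network access, which matches the motivation for this modification.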

以上により、実施の形態１の変形例１に係る操作端末１０は、利用者が発話すると、自機で音声認識を行い、音声認識結果を表示する。したがって、ネットワーク環境が無い場所で操作端末を使用できる。ネットワーク環境を使用しない、また、音声処理装置を必要としないことで、低コスト化を図ることができる。 As described above, when the user speaks, the operation terminal 10 according to the first modification of the first embodiment performs speech recognition on its own and displays the speech recognition result. Therefore, the operation terminal can be used in places where there is no network environment. Cost reduction can be achieved by not using a network environment and not requiring an audio processing device.

(Embodiment 2)
In Embodiment 1, the input operations for the inspection items were performed in order within a single screen. Embodiment 2 describes the case where the screen transitions for each inspection item as the input operations are performed. The configuration of the operation input system 5 of Embodiment 2 is substantially the same as that of the operation input system 5 according to Embodiment 1. Therefore, the same reference numerals are used for components identical to those of Embodiment 1, their description is simplified or omitted, and only the differences are described.

図１０は、実施の形態２に係る表示装置１５の音声入力画面の遷移を示す図である。操作端末１０は、起動後、実施の形態１と同様、表示装置１５に社員ＩＤの入力画面ＧＡ１を表示する。社員ＩＤの入力画面ＧＡ１には、「社員ＩＤの入力」のメッセージｍｓ１、およびその下方に入力ボックスｂｘ１が表示される。操作端末１０は、音声入力装置１４で利用者が発する音声を収音し、音声に含まれる数字を社員ＩＤとして受け付ける。図１０の例では、利用者が「１２３４５６」という音声を発したことで、その音声の認識結果である「１２３４５６」が入力ボックスｂｘ１に入力されている。 FIG. 10 is a diagram showing transition of the voice input screen of the display device 15 according to the second embodiment. After being activated, the operation terminal 10 displays an employee ID input screen GA1 on the display device 15, as in the first embodiment. On the employee ID input screen GA1, a message ms1 of "enter employee ID" and an input box bx1 below it are displayed. The operation terminal 10 picks up the voice uttered by the user with the voice input device 14 and accepts the number contained in the voice as the employee ID. In the example of FIG. 10, the user uttered the voice "123456", and the voice recognition result "123456" is input to the input box bx1.

When the operation terminal 10 accepts the recognition result of the employee ID voice input, it performs authentication, for example checking whether or not the employee corresponding to that employee ID is a pre-registered employee. If the authentication succeeds, the operation terminal 10 receives screen data from the voice processing device 50 and displays the input screen GA12 for the inspection item of check 1. The upper part of the input screen GA12 shows the employee ID entered by recognizing the user's voice picked up while the immediately preceding input screen GA1 was displayed. The lower part of the input screen GA12 shows check 1, the inspection content ct1, and the inspection result ef1. The inspection content of check 1 is the engine check. The inspection result is "OK" or "NG".

Once the inspection result of check 1 is confirmed, the operation terminal 10 receives screen data from the voice processing device 50 and displays the input screen GA13 for the inspection item of check 2. The upper part of the input screen GA13 shows the result of check 1 entered on the preceding input screen GA12. The lower part of the input screen GA13 shows check 2, the inspection content ct2, and the inspection result. The inspection content of check 2 is confirmation of the brake disc. The inspection result is "OK" or "NG". Thereafter, the screens transition in the same manner up to the input screen for check N.
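
The screen layout described above (previous input in the upper part, current check in the lower part) can be sketched as a small data structure. This is only an illustrative sketch: the function name `build_check_screen`, the dictionary keys, and the label strings are assumptions and do not appear in the disclosure.

```python
def build_check_screen(check_no: int, contents: list[str], employee_id: str,
                       results: dict[int, str]) -> dict:
    """Build illustrative screen data for the input screen of check `check_no` (1-based)."""
    if check_no == 1:
        # Upper part of GA12: the employee ID entered on the preceding screen GA1.
        upper = {"label": "Employee ID", "value": employee_id}
    else:
        # Upper part of later screens: the result entered for the preceding check.
        upper = {"label": f"Check {check_no - 1}", "value": results[check_no - 1]}
    lower = {
        "label": f"Check {check_no}",
        "content": contents[check_no - 1],
        "result": results.get(check_no),  # None until the user has spoken a result
    }
    return {"upper": upper, "lower": lower}
```

For example, `build_check_screen(2, ["Engine", "Brake disc"], "123456", {1: "OK"})` yields a screen whose upper part carries the result of check 1 and whose lower part presents the brake disc check with an empty result.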

Next, the operation procedure of the operation input system 5 according to the second embodiment is described.

As in the first embodiment, the second embodiment is described using, as a use case, an example in which a worker inspecting an inspection target (for example, a vehicle) records the vehicle inspection results while the worker's hands are occupied. FIGS. 11 and 12 are sequence diagrams showing the voice recognition procedure in the operation input system 5. Procedures identical to those of the first embodiment are given the same procedure numbers, and their description is omitted.

In FIG. 11, if the text data recognized in procedure T5 is an input for the employee ID input item, the control unit 61 generates the screen GA12 for the inspection item of check 1 (T6A). The control unit 61 transmits the screen data of check 1 to the operation terminal 10 (T7A). Based on the received screen data, the processor 11 of the operation terminal 10 displays the input screen GA12 for the inspection item of check 1 (T8A).

If the text data recognized in procedure T12 is an input for the employee ID input item, that is, a six-digit number, the control unit 61 of the voice processing device 50 generates the input screen GA12 for the inspection item of check 1 containing the corrected employee ID (T12A). The control unit 61 transmits the screen data of the corrected input screen GA12 for the inspection item of check 1 to the operation terminal 10 via the communication circuit 53 and the network NW (T13A). Based on the received screen data, the processor 11 of the operation terminal 10 updates the input screen GA12 for the inspection item of check 1 (T14A).
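
The test "is an input for the employee ID input item, that is, a six-digit number" can be illustrated with a short check on the recognized text. This is a sketch under assumptions: the helper name `is_employee_id_input` is hypothetical, and stripping whitespace before matching is an added convenience that the disclosure does not mention.

```python
import re

# The description only states that an employee ID is a six-digit number.
SIX_DIGITS = re.compile(r"[0-9]{6}")

def is_employee_id_input(recognized_text: str) -> bool:
    """Return True if the recognized text consists of exactly six ASCII digits."""
    # Assumption: a recognizer may insert spaces between digit groups, so drop them.
    compact = "".join(recognized_text.split())
    return SIX_DIGITS.fullmatch(compact) is not None
```

With such a check, an utterance recognized as "123456" on the check 1 screen would be treated as a correction of the employee ID, while "OK" would not.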

If the text data recognized in procedure T15 is an input for the inspection item of check 1, the control unit 61 of the voice processing device 50 generates the input screen GA13 for the inspection item of check 2 containing the confirmation result of check 1 (T15A). The control unit 61 transmits the screen data of the input screen GA13 for the inspection item of check 2 to the operation terminal 10 via the communication circuit 53 and the network NW (T16A). Based on the received screen data, the processor 11 of the operation terminal 10 updates the input screen GA13 for the inspection item of check 2 (T17A).

The same processing as procedures T9 through T17A is repeated once for each inspection item input screen. That is, if the inspection item input screens are denoted as the m-th screen (m = 2 to N), the same processing as procedures T9 through T17A is performed for m = 2 to N.

Thereafter, if the text data recognized in procedure T21 is an input for the inspection item of check (N-1), the control unit 61 of the voice processing device 50 generates the input screen for the inspection item of check N containing the corrected input of check (N-1) (T21A). The control unit 61 transmits the screen data of the input screen for the inspection item of check N, containing the corrected input of check (N-1), to the operation terminal 10 via the communication circuit 53 and the network NW (T22). Based on the received screen data, the processor 11 of the operation terminal 10 updates, in procedure T24, the input screen for the inspection item of check N (T23).

If the text data recognized in procedure T21 is an input for the inspection item of check N, the control unit 61 of the voice processing device 50 generates, in procedure T25, the screen data of the inspection result screen GA3 (see FIG. 7) containing the confirmation result of check N. Subsequent operations are the same as in the first embodiment, so their description is omitted.

Thus, in the operation input system 5 of the second embodiment, the input operation can be completed without requiring the user's hands. Because the inspection item input screen shown on the display device changes each time the inspection item changes, the user can perform input operations that are visually easy to follow. Even when there are many inspection items, the user can immediately grasp what to inspect next.

As described above, the operation input method of the second embodiment includes: a step of determining, while, for example, the input screen GA12 for the inspection item of check 1 (the m-th screen, where m is an integer satisfying 2≤m≤N and N is an integer of 4 or more) is displayed on the operation terminal 10, whether or not the recognition result of the voice uttered by the user matches the input content for the inspection item of check 1 shown on the input screen GA12; and a step of instructing the operation terminal 10, when the voice recognition result matches the input content for the inspection item of check 1 shown on the input screen GA12, to switch the display from the input screen GA12 (the m-th screen) to the input screen GA13 for the inspection item of check 2 (the (m+1)-th screen) and to display, on the input screen GA13, the voice recognition result corresponding to the inspection item of check 1 shown on the input screen GA12.

This enables continuous input of inspection items and improves operability. In addition, because the screen switches, the user can more easily notice that the input operation has moved on to the next inspection item.

The step of recognizing the voice includes a step of accepting, when the voice recognition result does not match the input content for the inspection item of check 1 shown on the input screen GA12 (the confirmation item shown on the m-th screen), recognition processing of a keyword uttered by the user, such as "previous item NG" or "check 3 NO" (a predetermined keyword), and of the voice that the user utters again for the inspection item of check 1. As a result, the input content of a previous item can be easily corrected even after its input has been completed.

The method further includes a step of instructing the operation terminal 10, when the voice recognition result matches the input content for the inspection item shown on the inspection result screen GA3 (the N-th screen), to display the inspection result screen GA3 (recognition results), in which the inspection items from the input screen GA12 (the second screen) through the N-th screen are each associated with the recognition result of the voice uttered by the user for that inspection item. This allows the user to visually confirm the input contents of all inspection items in a list. Erroneous inputs become easier to find, and input mistakes can be reduced.

The input contents for the inspection items shown on the second through N-th input screens, the second being the input screen GA12, are associated with the input content for the inspection item shown on the employee ID input screen GA1. This allows inspection items to be managed per employee ID. In addition, the user can compare the input content of the inspection item shown on each of the inspection item input screens GA12 and GA13 with the content the user intended, and can easily check whether it is correct.
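
The association between the employee ID and the per-item recognition results, as listed on the inspection result screen GA3, might be represented as follows. This is a minimal sketch: `ResultRow`, `build_result_record`, and the record layout are hypothetical names chosen for illustration, not structures defined in the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResultRow:
    check_no: int   # 1-based check number
    content: str    # inspection content, e.g. "Engine"
    result: str     # recognized input for this item, e.g. "OK"

def build_result_record(employee_id: str, contents: list[str],
                        results: dict[int, str]) -> dict:
    """Pair every inspection item with its recognized result under one employee ID."""
    missing = [k for k in range(1, len(contents) + 1) if k not in results]
    if missing:
        # A result list is only meaningful once every item has an input.
        raise ValueError(f"no result recorded for checks {missing}")
    rows = [ResultRow(k, contents[k - 1], results[k])
            for k in range(1, len(contents) + 1)]
    return {"employee_id": employee_id, "rows": rows}
```

Keying the record by employee ID is what would allow inspection items to be managed per employee, and the ordered rows correspond to the list the user reviews for erroneous inputs.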

The keywords uttered by the user, such as "previous item NG" and "check 3 NO" (predetermined keywords), are text data (information) representing a correction of the input content for the inspection item shown on the immediately preceding screen (the (m-1)-th screen). This allows the immediately preceding inspection item to be easily corrected.
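
One way to interpret correction keywords such as "previous item NG" and "check 3 NO" is sketched below. The grammar is an assumption made for illustration: the disclosure gives only these two example keywords, and the English phrasing, the regular expressions, and the name `parse_correction` are hypothetical. In particular, treating "check 3 NO" as addressing check 3 by number is a guess at the intended meaning.

```python
import re
from typing import Optional, Tuple

# Assumed keyword forms, modeled on the two examples in the description.
PREVIOUS_ITEM = re.compile(r"previous item\s+([A-Za-z]+)", re.IGNORECASE)
CHECK_BY_NUMBER = re.compile(r"check\s*([0-9]+)\s*([A-Za-z]+)", re.IGNORECASE)

def parse_correction(text: str, current_check: int) -> Optional[Tuple[int, str]]:
    """Return (check number to correct, new value), or None if text is not a correction."""
    text = text.strip()
    m = PREVIOUS_ITEM.fullmatch(text)
    if m:
        if current_check <= 1:
            # On the check 1 screen the previous item is the employee ID,
            # which is corrected by speaking a new number instead.
            return None
        return current_check - 1, m.group(1).upper()
    m = CHECK_BY_NUMBER.fullmatch(text)
    if m:
        return int(m.group(1)), m.group(2).upper()
    return None
```

A plain result such as "OK" returns `None`, so the caller can first test the utterance as an input for the current item and fall back to this parser only when that test fails.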

(Modification 1 of Embodiment 2)
The second embodiment described the case where the voice processing device 50 receives voice data from the operation terminal 10 and performs voice recognition. Modification 1 of Embodiment 2 describes an example in which the operation terminal 10 performs voice recognition. The operation terminal 10 according to Modification 1 of Embodiment 2 has the same configuration as in the first embodiment. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

FIGS. 13 and 14 are flowcharts showing an example of the voice recognition operation procedure according to Modification 1 of Embodiment 2. Steps identical to those of Modification 1 of Embodiment 1 are given the same step numbers.

In FIG. 13, when the operation terminal 10 is started, for example by the user turning on the power, the processor 11 of the operation terminal 10 starts the voice recognition operation. The processor 11 displays the employee ID input screen GA1 on the display device 15 (S1). With the employee ID input screen GA1 displayed, the user utters a voice (for example, the number "123456"). The processor 11 collects the user's utterance with the voice input device 14 (S2).

Based on the successful authentication of the employee ID and the plurality of inspection items corresponding to the employee ID, the processor 11 generates the input screen GA12 for the inspection item of check 1 (S5A). The processor 11 displays the input screen GA12 for the inspection item of check 1 on the display device 15 (S6A). The input screen GA12 shows the employee ID entered on the employee ID input screen GA1.

If the voice was recognized in step S8, the processor 11 determines whether the recognized text data is an input for the relevant item (the input item of check 1) (S9). If the recognized text data is not an input for the input item of check 1, the processor 11 determines whether it is an input for the previous input item (the employee ID input item) (S10). If it is not an input for the employee ID input item, the processor 11 returns to step S7. In that case, the processor 11 may display nothing, or may display a prompt for re-input on the display device 15.

If the text data recognized in step S10 is an input for the employee ID input item, that is, a six-digit number, the processor 11 determines that this input is a correction of the employee ID and updates the input screen GA12 for the inspection item of check 1 to reflect the corrected employee ID (S11A). The processor 11 then returns to step S7.

If the text data recognized in step S9 is an input for the inspection item of check 1, the processor 11 generates the input screen GA13 for the inspection item of check 2 (S9A). The processor 11 displays the input screen GA13 for the inspection item of check 2 on the display device 15 (S12A). The input screen GA13 shows the confirmation result (for example, "OK") entered on the input screen GA12 for the inspection item of check 1 (see FIG. 10).

The same processing as steps S7 through S12A is repeated N times, corresponding to the number of inspection items. That is, if the inspection item input screens are denoted as the m-th screen (m = 2 to N), the same processing as steps S7 through S12A is performed for m = 2 to N.

Thereafter, the processor 11 displays the input screen GA2 for the inspection item of check N on the display device 15 (S13A). The input screen GAN for the inspection item of check N shows the confirmation result (for example, "OK") entered on the input screen for the inspection item of check N-1.

In this display state, the processor 11 collects the user's utterance with the voice input device 14, as in step S7 (S14). The processor 11 stores the voice data of the collected voice in the memory 12. The voice recognition unit 25 of the processor 11 performs voice recognition on the voice data stored in the memory 12 and determines whether the voice was recognized (S15). If it was not, the processor 11 returns to step S14 and acquires the user's utterance again.

The processor 11 determines whether the recognized text data is an input (for example, "OK") for the relevant item (the input item of check N) (S16). If the recognized text data is not an input for the input item of check N, for example when the user has uttered "previous item NG", the processor 11 determines whether it is an input for the previous input item (the input item of check N-1) (S17).

If the recognized text data is an input for the input item of check N-1, the processor 11 updates the input screen for the inspection item of check N to reflect the corrected text data of check N-1 (S18A). The processor 11 then returns to step S14. If the text data recognized in step S17 is not an input for the input item of check N-1, the processor 11 likewise returns to step S14. In that case, the processor 11 may display nothing, or may display a prompt for re-input on the display device 15.
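
The flow of the flowcharts (collect voice, retry on recognition failure, advance on an input for the current item, update in place on a correction of the previous item) can be sketched as a simple loop. This is a sketch under stated assumptions: the result vocabulary {"OK", "NG"}, the English keyword form "previous item ...", and the function `run_checks` are illustrative, and the employee ID entry and its correction on the check 1 screen are left out for brevity.

```python
import re
from typing import Iterable, Optional

RESULT_WORDS = {"OK", "NG"}  # assumed vocabulary for an inspection result
PREVIOUS = re.compile(r"previous item\s+(OK|NG)", re.IGNORECASE)  # assumed keyword form

def run_checks(utterances: Iterable[Optional[str]], n_checks: int) -> dict[int, str]:
    """Walk through checks 1..n_checks; `utterances` yields recognized text or None."""
    results: dict[int, str] = {}
    current = 1
    for text in utterances:
        if text is None:
            continue  # recognition failed: collect the voice again
        text = text.strip()
        if text.upper() in RESULT_WORDS:
            # Input for the item on the current screen.
            results[current] = text.upper()
            if current == n_checks:
                return results  # all items entered: move on to the result screen
            current += 1  # switch to the next input screen
            continue
        m = PREVIOUS.fullmatch(text)
        if m and current > 1:
            # Correction of the previous item: update it and stay on this screen.
            results[current - 1] = m.group(1).upper()
        # Anything else is not an input for either item: stay on this screen.
    return results
```

Driving the loop from an iterable keeps the sketch independent of any particular recognizer; on the terminal, the iterable would be fed by the voice input device 14 and the voice recognition unit 25.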

Thus, when the user speaks, the operation terminal 10 in Modification 1 of Embodiment 2 performs voice recognition on its own and displays the voice recognition result. The operation terminal can therefore be used in places without a network environment. Because neither a network environment nor a voice processing device is required, costs can be reduced. In addition, because the inspection item input screen shown on the display device changes each time the inspection item changes, the user can perform input operations that are visually easy to follow. Even when there are many inspection items, the user can immediately grasp what to inspect next.

(Modification 2 of Embodiments 1 and 2)
In the first and second embodiments, the voice processing device 50 performed voice recognition based on the voice data transmitted from the operation terminal 10, generated screen data based on the voice recognition result, and transmitted the screen data to the operation terminal 10. The operation terminal 10 received the screen data transmitted from the voice processing device 50 and displayed the various screens on the display device 15.

In Modification 2 of Embodiments 1 and 2, the operation terminal 10 transmits voice data to the voice processing device 50 and receives the recognized text data from the voice processing device 50. Based on the received text data, the operation terminal 10 itself generates the screen data of the various screens (the employee ID input screen GA1, the input screen GA2 for each inspection item, and the inspection result screen GA3) and displays them on the display device 15.

This eliminates the voice recognition processing on the operation terminal and the screen data generation processing on the voice processing device. In addition, screen data, which has a large data volume, need not be communicated over the network, and the reduced communication volume lowers network traffic.

Various embodiments have been described above with reference to the drawings, but it goes without saying that the present disclosure is not limited to these examples. It is obvious that a person skilled in the art could conceive of various changes, modifications, substitutions, additions, deletions, and equivalents within the scope of the claims, and these are naturally understood to belong to the technical scope of the present disclosure. The constituent elements of the embodiments described above may also be combined arbitrarily without departing from the spirit of the invention.

For example, in the second embodiment described above, one inspection item corresponded to one screen on a one-to-one basis. That is, one screen displayed the content and input of one inspection item. Alternatively, two or more inspection items may be associated with one screen, and the contents and inputs of two or more inspection items may be displayed on one screen. This reduces the screen generation processing performed by the voice processing device and the operation terminal.
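
Associating two or more inspection items with one screen amounts to grouping the item list into fixed-size pages. A minimal sketch, assuming a hypothetical helper `group_items` and a third inspection content ("Tires") invented purely for the example:

```python
def group_items(contents: list[str], per_screen: int) -> list[list[tuple[int, str]]]:
    """Split numbered inspection items into screens of at most `per_screen` items."""
    if per_screen < 1:
        raise ValueError("per_screen must be at least 1")
    numbered = list(enumerate(contents, start=1))  # (check number, inspection content)
    return [numbered[i:i + per_screen] for i in range(0, len(numbered), per_screen)]
```

For instance, `group_items(["Engine", "Brake disc", "Tires"], 2)` produces two screens instead of three, which is where the reduction in screen generation processing would come from. With `per_screen=1` the grouping reduces to the one-item-per-screen arrangement of the second embodiment.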

In each of the embodiments described above, the display of inspection items for a worker inspecting a vehicle was given as an example. The present disclosure is not limited to vehicle inspection, however, and is equally applicable to displaying the steps by which a worker produces goods in a factory or the steps of work performed by a worker at a height, such as on a utility pole. It is likewise applicable when a smart speaker that operates in response to the user's speech, or a device linked to it, has a display function and displays the recognition results of the voice uttered by the user in order of operation.

The present disclosure is useful as an operation input method, an operation input system, and an operation terminal that improve convenience when inputting user operations, even in situations where it is difficult for the user to perform operations by hand.

5 Operation input system
10 Operation terminal
11 Processor
12, 52 Memory
13, 53 Communication circuit
14 Voice input device
15 Display device
25 Voice recognition unit
50 Voice processing device
51 Processor
54 Storage
61 Control unit
62 Voice recognition unit
63 Keyword matching unit
541 Keyword database

Claims

An operation input method comprising:
a step of collecting a voice uttered by a user while a first screen is displayed on an operation terminal;
a step of recognizing the collected voice;
a step of determining whether or not the voice recognition result matches input content for a confirmation item shown on the first screen;
a step of instructing the operation terminal to switch the display from the first screen to a second screen when the voice recognition result matches the input content for the confirmation item shown on the first screen;
a step of determining, while an m-th screen (m: an integer satisfying 2≤m≤N, N: an integer of 4 or more) is displayed on the operation terminal, whether or not the recognition result of the voice uttered by the user matches input content for a confirmation item shown on the m-th screen; and
a step of instructing the operation terminal to switch the display from the m-th screen to an (m+1)-th screen when the voice recognition result matches the input content for the confirmation item shown on the m-th screen.

The operation input method according to claim 1, further comprising a step of displaying the voice recognition result corresponding to the confirmation item shown on the first screen.

The operation input method according to claim 1 or 2, further comprising a step of displaying the voice recognition result corresponding to the confirmation item shown on the m-th screen.

An operation input system in which an operation terminal having a voice input device and a display device is communicably connected to a voice processing device, wherein:
the operation terminal collects, with the voice input device, a voice uttered by a user while a first screen is displayed on the display device;
the operation terminal or the voice processing device recognizes the collected voice;
the voice processing device determines whether or not the voice recognition result matches input content for a confirmation item shown on the first screen, and instructs the operation terminal to switch the display from the first screen to a second screen when the voice recognition result matches the input content for the confirmation item shown on the first screen; and
the voice processing device determines, while an m-th screen (m: an integer satisfying 2≤m≤N, N: an integer of 4 or more) is displayed on the operation terminal, whether or not the recognition result of the voice uttered by the user matches input content for a confirmation item shown on the m-th screen, and instructs the operation terminal to switch the display from the m-th screen to an (m+1)-th screen when the voice recognition result matches the input content for the confirmation item shown on the m-th screen.

An operation terminal comprising:
a voice input device that collects a voice uttered by a user while a first screen is displayed on a display device;
a recognition unit that recognizes the collected voice; and
a control unit that determines whether or not the voice recognition result matches input content for a confirmation item shown on the first screen, wherein
the control unit instructs the display device to switch the display from the first screen to a second screen when the voice recognition result matches the input content for the confirmation item shown on the first screen, and
the control unit determines, while an m-th screen (m: an integer satisfying 2≤m≤N, N: an integer of 4 or more) is displayed on the display device, whether or not the recognition result of the voice uttered by the user matches input content for a confirmation item shown on the m-th screen, and instructs the display device to switch the display from the m-th screen to an (m+1)-th screen when the voice recognition result matches the input content for the confirmation item shown on the m-th screen.