JP6090027B2

JP6090027B2 - Voice command compatible information terminal with specific sound

Info

Publication number: JP6090027B2
Application number: JP2013151448A
Authority: JP
Inventors: 勝彦井川
Original assignee: 株式会社ナカヨ
Priority date: 2013-07-22
Filing date: 2013-07-22
Publication date: 2017-03-08
Anticipated expiration: 2033-07-22
Also published as: JP2015023485A

Description

The present invention relates to an information terminal, and more particularly to a technique for automatically controlling an information terminal by voice command.

Conventionally, incoming calls have been handled by manual operation, which leads to the problem of erroneous operation. As a technique for automatically handling calls on the user's behalf (proxy response processing), Patent Document 1, for example, discloses a telephone that is automatically put on hold when the user's hand is released from the handset during a call.

Patent Document 1: JP-A-5-327838

However, the technique of Patent Document 1 can erroneously detect that the user's hand has left the handset, so the call may be put on hold even when the user does not want it to be. The present invention has been made in view of this problem, and its object is to provide a technique that, during a call with a conversation partner, automatically performs call-handling operations on the user's behalf based on the content of the speech surrounding a voice command.

To solve the above problem, the present invention provides an information terminal that performs a predetermined operation in response to a voice command, comprising: specific voice registration means for registering a specific voice that the user utters at the beginning of a voice command; specific voice detection means for detecting the specific voice in voice input to the terminal; specific voice signal extraction means for extracting the series of specific voice signals that follows the detected specific voice; specific voice signal transmission means for transmitting the extracted specific voice signal to a predetermined server; processing result reception means for receiving processing result data corresponding to the transmitted specific voice signal; command determination means for analyzing the received processing result data and determining a command relating to the operation of the terminal; and terminal control means for controlling the operation of the terminal according to the determined command.

According to the present invention, the terminal is operated by voice commands spoken by the user, which prevents erroneous operations caused by manual input or manual operation.

FIG. 1 is a schematic configuration diagram of a voice command compatible information terminal system according to an embodiment of the present invention.
FIG. 2 is a schematic functional configuration diagram of the information terminal 1.
FIG. 3 schematically shows an example of the contents registered in the specific voice storage unit 104.
FIG. 4 schematically shows an example of the contents registered in the processing content storage unit 111.
FIG. 5 is a sequence diagram illustrating an example of the operation of the information terminal according to the present embodiment.
FIG. 6 is a flowchart illustrating the specific voice registration operation of the information terminal 1.
FIG. 7 is a flowchart illustrating the voice-driven terminal control operation of the information terminal 1.

An embodiment of the present invention is described below.

FIG. 1 is a schematic configuration diagram of the voice command compatible information terminal system according to the present embodiment. As shown in the figure, the system consists of an information terminal 1, a voice recognition server 3, and an information terminal 4, which are connected to a network 2.

The information terminal 1 has a telephone function and is connected to the voice recognition server 3 via the network 2.
The information terminal 4 also has a telephone function and is connected to the information terminal 1 via the network 2.

The voice recognition server 3 converts voice signals into text.

FIG. 2 is a schematic functional configuration diagram of the information terminal 1.

The information terminal 1 comprises a communication control unit 101, a man-machine interface unit 102, a terminal control unit 103, a specific voice storage unit 104, a specific voice signal extraction unit 105, a specific voice search unit 106, a voice recognition server transmission unit 107, a text reception unit 108, an operation keyword extraction unit 109, a processing content determination unit 110, a processing content storage unit 111, and a call control unit 112.

The communication control unit 101 provides the interface for connecting to the network 2 and performs overall control of communication between the network 2 and the information terminal 1. It transmits the specific voice signal handed over by the voice recognition server transmission unit 107 to the voice recognition server 3, and passes the text information received from the voice recognition server 3 to the text reception unit 108.

Here, the specific voice signal means the voice signal that follows the specific voice (described below) and continues until silence is detected.

The man-machine interface unit 102 is the user interface: a handset with a microphone and speaker through which the user talks, dial keys that accept input operations from the user, and LCDs/LEDs that display information to the user. When it receives the user's voice, the man-machine interface unit 102 passes it to the terminal control unit 103. When it receives a button input operation for placing a call, it passes the button input information to the terminal control unit 103. In response to instructions from the terminal control unit 103, it performs display processing related to call connection, such as ringing, and outputs call audio. When the user performs a specific voice registration start operation, the man-machine interface unit 102 starts the specific voice registration process, and when it receives the specific voice from the user, it passes the specific voice to the terminal control unit 103.

Here, the specific voice means a voice command that instructs the terminal to start acquiring the specific voice signal.

When the terminal control unit 103 receives the user's voice from the man-machine interface unit 102, it passes the voice to the specific voice signal extraction unit 105 and the call control unit 112. When it receives button input information from the man-machine interface unit 102, it determines call connection instruction information from the button input and passes that information to the call control unit 112. It receives from the call control unit 112 the processing instructions for the hold, transfer, and outgoing-call cases, determines from them the required display processing related to call connection (such as ringing) and call audio output, and passes the resulting instructions to the man-machine interface unit 102. It also receives the specific voice that the man-machine interface unit 102 accepted after starting the specific voice registration process, and passes it to the specific voice storage unit 104.

The specific voice storage unit 104 registers the specific voice received from the terminal control unit 103 in the specific voice storage table 1040.

FIG. 3 schematically shows an example of the contents registered in the specific voice storage unit 104.

As shown in the figure, the specific voice storage unit 104 stores a specific voice storage table 1040. In the table 1040, one record 1043 is registered for each target specific voice, describing the specific voice signal to be sent to the voice recognition server. Each record 1043 has a field 1041 storing the registered specific voice and a field 1042 indicating the acquisition range of the specific voice signal to be sent to the voice recognition server.
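The layout of table 1040 can be pictured as a small record type. The sketch below is an illustration only: the patent does not say how the specific voice or its acquisition range is encoded, so storing raw audio bytes and modeling field 1042 as a silence limit in seconds are assumptions, as are the class and method names. It also folds in the S202-S203 rule of FIG. 6 that nothing is registered when no voice was input.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecificVoiceRecord:
    """One record 1043 of the specific voice storage table 1040 (illustrative)."""
    specific_voice: bytes          # field 1041: registered specific voice (raw audio is an assumption)
    silence_limit_s: float = 3.0   # field 1042: acquisition range, assumed to mean "until N s of silence"


@dataclass
class SpecificVoiceTable:
    """Specific voice storage table 1040 (hypothetical class name)."""
    records: List[SpecificVoiceRecord] = field(default_factory=list)

    def register(self, voice: bytes, silence_limit_s: float = 3.0) -> bool:
        """S202-S203: add a record only if a specific voice was actually input."""
        if not voice:
            return False  # NO in S202: end without registering
        self.records.append(SpecificVoiceRecord(voice, silence_limit_s))
        return True
```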

The specific voice signal extraction unit 105 requests the specific voice search unit 106 to search the voice sent from the terminal control unit 103 for the specific voice. When it receives the search result and the result indicates the specific voice, it sends the voice that follows the specific voice to the voice recognition server transmission unit 107 as the specific voice signal.

When the specific voice signal extraction unit 105 requests a specific voice search, the specific voice search unit 106 refers to the specific voice storage table 1040 of the specific voice storage unit 104, determines whether the voice matches a registered specific voice, and passes the search result to the specific voice signal extraction unit 105.
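The patent does not say how the specific voice search unit 106 decides that incoming voice matches a registered specific voice. One common approach, shown here purely as an assumed stand-in, is to reduce each utterance to a fixed-length feature vector and compare it with each registered vector by cosine similarity. The feature extraction itself, the 0.9 threshold, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero (silent) vector matches nothing
    return dot / (norm_a * norm_b)


def search_specific_voice(features: Sequence[float],
                          registered: Sequence[Sequence[float]],
                          threshold: float = 0.9) -> Optional[int]:
    """Return the index of the best-matching registered voice, or None if none reaches the threshold."""
    best_index: Optional[int] = None
    best_score = threshold
    for index, reference in enumerate(registered):
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index
```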

The voice recognition server transmission unit 107 stores the address information of the voice recognition server 3 in advance. When it receives the specific voice signal from the specific voice signal extraction unit 105, it transmits the signal to the voice recognition server 3 via the communication control unit 101.
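The transport between unit 107 and the voice recognition server 3 is not specified. As a sketch only, the request below assumes an HTTP POST of raw 16-bit linear PCM to a hypothetical URL (standing in for the stored address information), built with Python's standard `urllib.request`. The URL, the 8 kHz telephony sample rate, and the `audio/L16` content type are all assumptions.

```python
import urllib.request

# Hypothetical address standing in for the server address stored in advance.
RECOGNITION_SERVER_URL = "http://speech-recognition.example/recognize"


def build_recognition_request(pcm: bytes,
                              sample_rate: int = 8000,
                              url: str = RECOGNITION_SERVER_URL) -> urllib.request.Request:
    """Wrap the specific voice signal (16-bit linear PCM, assumed) in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=pcm,
        headers={"Content-Type": f"audio/L16; rate={sample_rate}"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req, timeout=30)`, with the response body carrying the recognized text; the response format is likewise left open by the patent.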

The text reception unit 108 likewise stores the address information of the voice recognition server 3 in advance. When it receives text information from the voice recognition server 3 via the communication control unit 101, it passes the text information to the operation keyword extraction unit 109.

The operation keyword extraction unit 109 extracts an operation keyword from the text information received from the text reception unit 108 and passes the extracted keyword to the processing content determination unit 110.
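A minimal sketch of the operation keyword extraction unit 109, assuming the recognized text is English and using "hold", "call", and "transfer" as stand-ins for the keywords 保留, 発信, and 転送 that the FIG. 7 flow checks. Returning the earliest keyword in the text and matching only whole words are assumptions made for illustration.

```python
import re
from typing import Iterable, Optional

# English stand-ins for the keywords 保留 (hold), 発信 (call), 転送 (transfer).
OPERATION_KEYWORDS = ("hold", "call", "transfer")


def extract_operation_keyword(text: str,
                              keywords: Iterable[str] = OPERATION_KEYWORDS) -> Optional[str]:
    """Return the operation keyword that appears earliest in the text as a whole word, or None."""
    best = None  # (position, keyword)
    for keyword in keywords:
        match = re.search(r"\b" + re.escape(keyword) + r"\b", text, flags=re.IGNORECASE)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), keyword)
    return best[1] if best else None
```

Japanese recognition output has no spaces between words, so `\b` boundaries would not isolate 保留 inside a sentence; a plain substring search would be the simpler choice there.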

When the processing content determination unit 110 receives an operation keyword from the operation keyword extraction unit 109, it refers to the processing content determination table 1110 of the processing content storage unit 111, searches for the processing content corresponding to the keyword, and instructs the call control unit 112 to perform the detected processing (hold, transfer, or outgoing call).

The processing content storage unit 111 stores the processing content determination table 1110. FIG. 4 schematically shows an example of the contents registered in the processing content storage unit 111.

As shown in the figure, the processing content storage unit 111 stores the processing content determination table 1110. In the table 1110, one record 1113 is registered for each target operation keyword, describing the processing that the information terminal 1 performs automatically. Each record 1113 has a field 1111 storing the operation keyword to be extracted and a field 1112 storing the processing content that the information terminal 1 performs automatically.
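Table 1110 maps each operation keyword (field 1111) to the processing the terminal performs (field 1112). The sketch below models it as a dictionary. Registering both the Japanese keywords from the FIG. 7 flow and English stand-ins as separate records is an illustrative assumption, showing that several records can share one processing content; the names `Processing` and `determine_processing` are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class Processing(Enum):
    HOLD = "hold"
    CALL = "call"          # place an outgoing call
    TRANSFER = "transfer"


# Field 1111 (operation keyword) -> field 1112 (processing content).
# Mixing Japanese keywords and English stand-ins is an illustrative assumption.
PROCESSING_TABLE: Dict[str, Processing] = {
    "保留": Processing.HOLD,
    "発信": Processing.CALL,
    "転送": Processing.TRANSFER,
    "hold": Processing.HOLD,
    "call": Processing.CALL,
    "transfer": Processing.TRANSFER,
}


def determine_processing(keyword: str) -> Optional[Processing]:
    """Look up the processing for a keyword; None means no matching record."""
    return PROCESSING_TABLE.get(keyword)
```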

The call control unit 112 performs general call connection processing for outgoing calls, incoming calls, and call termination, as well as overall voice processing during calls. It performs processing according to instructions from the processing content determination unit 110, and passes to the terminal control unit 103 the processing instructions for the hold, transfer, and outgoing-call cases.

FIG. 5 is a sequence diagram illustrating an example of the operation of the information terminal according to the present embodiment.

This sequence starts from a state in which a call between the information terminal 1 and the information terminal 4 has been established and is in progress.

During the call, the information terminal 1 monitors the user's voice, and when it detects the specific voice in the monitored voice (S101), it starts acquiring the specific voice signal (S102). When the information terminal 1 then detects silence lasting 3 seconds or more (S103), it ends the acquisition (S104). It then extracts the specific voice signal from the acquired audio (S105) and transmits the specific voice signal information to the voice recognition server 3 (S106).

On receiving the specific voice signal information, the voice recognition server 3 converts the specific voice signal into text (S107) and transmits the resulting text information to the information terminal 1 (S108).

On receiving the text information, the information terminal 1 searches it for an operation keyword (S109). When it detects an operation keyword in the text, it executes the corresponding processing. In this example, an operation keyword for hold processing is assumed to have been detected, so the information terminal 1 automatically performs hold processing (S110).

FIG. 6 is a flowchart illustrating the specific voice registration operation of the information terminal 1. The flow starts when the user performs the specific voice registration start operation.

The man-machine interface unit 102 starts the specific voice registration process (S201) and determines whether a specific voice has been input (S202). If one has been input (YES in S202), it passes the input voice to the specific voice storage unit 104; if not (NO in S202), the specific voice registration operation ends.

When a specific voice has been input, the specific voice storage unit 104 receives it from the man-machine interface unit 102 and registers it (S203). After registration, the specific voice registration operation ends.

FIG. 7 is a flowchart illustrating the voice-driven terminal control operation of the information terminal 1.

The flow starts while the information terminal 1 and the information terminal 4 are in a call.

The specific voice signal extraction unit 105 requests the specific voice search unit 106 to perform a specific voice search. The specific voice search unit 106 determines whether the specific voice has been detected in the call audio (S301) and passes the result to the specific voice signal extraction unit 105. If the result confirms that the specific voice was detected (YES in S301), the specific voice signal extraction unit 105 starts acquiring the specific voice signal (S302); otherwise (NO in S301), detection continues.

After starting the acquisition (S302), the specific voice signal extraction unit 105 determines whether silence lasting 3 seconds or more has been detected (S303). If so (YES in S303), it ends the acquisition (S304); if not (NO in S303), detection continues.
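The 3-second silence rule of S303-S304 can be sketched as a frame-by-frame loop. Only the 3-second limit comes from the description; the 20 ms frame length, the RMS threshold of 500 for 16-bit samples, the function names, and the choice to trim the trailing silence from the returned signal are assumptions.

```python
import math
from typing import Iterable, List, Sequence, Tuple

FRAME_SECONDS = 0.02      # assumed 20 ms analysis frames
SILENCE_RMS = 500.0       # assumed RMS level below which a 16-bit PCM frame counts as silent
SILENCE_LIMIT_S = 3.0     # from the description: 3 s or more of silence ends acquisition


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def capture_until_silence(frames: Iterable[Sequence[int]],
                          frame_seconds: float = FRAME_SECONDS,
                          silence_rms: float = SILENCE_RMS,
                          silence_limit_s: float = SILENCE_LIMIT_S
                          ) -> Tuple[List[Sequence[int]], bool]:
    """Collect frames after the specific voice until enough consecutive silent frames arrive.

    Returns (captured frames without the trailing silence, True) when the silence
    limit was reached, or (all frames, False) if the input ran out first.
    """
    needed = max(1, round(silence_limit_s / frame_seconds))  # silent frames that add up to the limit
    captured: List[Sequence[int]] = []
    silent_run = 0
    for frame in frames:                     # S302: acquisition in progress
        captured.append(frame)
        if frame_rms(frame) < silence_rms:
            silent_run += 1
            if silent_run >= needed:         # YES in S303 -> end acquisition (S304)
                return captured[:-silent_run], True
        else:
            silent_run = 0                   # speech resumed, restart the silence count
    return captured, False
```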

After ending the acquisition (S304), the specific voice signal extraction unit 105 passes the acquired specific voice signal to the voice recognition server transmission unit 107.

The voice recognition server transmission unit 107 transmits the specific voice signal that followed the detected specific voice to the voice recognition server 3 via the communication control unit 101 (S305).

The text reception unit 108 determines whether text information has been received within 30 seconds of the transmission to the voice recognition server (S306). If it has (YES in S306), the text reception unit 108 passes the text information to the operation keyword extraction unit 109, which extracts an operation keyword from it and passes the keyword to the processing content determination unit 110. The processing content determination unit 110 then determines whether the keyword is "hold" (S307). If no text information is received within 30 seconds (NO in S306), the process returns to step S301.
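The 30-second wait at S306 amounts to a blocking read with a timeout. In the sketch below, a standard-library `queue.Queue` stands in (as an assumption) for whatever channel delivers the server's text to unit 108, and `wait_for_text` is a hypothetical name; a `None` result corresponds to NO in S306.

```python
import queue
from typing import Optional

TEXT_TIMEOUT_S = 30.0  # from the description (S306)


def wait_for_text(inbox: "queue.Queue[str]", timeout_s: float = TEXT_TIMEOUT_S) -> Optional[str]:
    """Return the recognized text, or None if nothing arrives within the timeout (NO in S306)."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None  # caller goes back to monitoring for the specific voice (S301)
```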

If the extracted operation keyword is "hold" (YES in S307), the processing content determination unit 110 notifies the call control unit 112 that the processing content is "hold".

Based on the "hold" processing content received from the processing content determination unit 110, the call control unit 112 automatically performs hold processing (S308) and, when finished, returns to step S301.

If the extracted operation keyword is not "hold" (NO in S307), the processing content determination unit 110 determines whether the operation keyword in the received text information is "call", that is, place an outgoing call (S309). If it is (YES in S309), the unit searches the phone book to determine whether the telephone number of the relevant person is registered (S310).

When the extracted keyword is "call", the processing content determination unit 110 reads the name of the called party from the text information and searches the phone book for that person's telephone number. If the number is registered (YES in S310), it passes the processing content "call" and the called party's telephone number to the call control unit 112; if not (NO in S310), it ends the processing and returns to step S301.

When the call control unit 112 receives the processing content "call" and the called party's telephone number from the processing content determination unit 110, it automatically dials the number (S311), places a call to that party (S312), and, when finished, returns to the start of the flow.

If the processing content "call" cannot be determined from the extracted keyword (NO in S309), the processing content determination unit 110 determines whether the keyword extracted from the received text information is "transfer" (S313). If it is (YES in S313), the unit searches the phone book to determine whether the telephone number of the relevant person is registered (S314); if no such keyword was extracted (NO in S313), it ends the processing and returns to step S301.

The processing content determination unit 110 searches the phone book for the relevant person's telephone number. If it is registered (YES in S314), the unit passes the "transfer" processing and the transfer destination's telephone number to the call control unit 112; if it is not registered (NO in S314), the unit ends the processing and returns to step S301.

When the call control unit 112 receives the "transfer" processing and the transfer destination's telephone number from the processing content determination unit 110, it automatically dials the number (S315), transfers the call to that party (S316), and, when finished, returns to step S301.
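The decision chain of S307 through S316 can be condensed into one function that returns the action the call control unit 112 would carry out. This is a sketch, not the patent's implementation: the English keywords, the dictionary phone book, locating the party by searching the text for a registered name, preferring the longest matching name (so that "Sato" does not shadow "Satoshi"), and the name `decide_action` are all assumptions.

```python
from typing import Dict, Optional, Tuple

Action = Tuple[str, Optional[str]]  # (processing, telephone number or None)


def decide_action(keyword: Optional[str], text: str,
                  phone_book: Dict[str, str]) -> Optional[Action]:
    """Follow S307-S316: hold, or call/transfer a party found in the phone book."""
    if keyword == "hold":                         # YES in S307 -> hold (S308)
        return ("hold", None)
    if keyword in ("call", "transfer"):           # YES in S309 or S313
        names = [name for name in phone_book if name and name in text]
        if not names:                             # NO in S310 / S314: not registered
            return None
        number = phone_book[max(names, key=len)]  # prefer the longest matching name
        return (keyword, number)                  # dial and call (S311-S312) or transfer (S315-S316)
    return None                                   # no usable keyword: back to S301
```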

This concludes the description of an embodiment of the present invention.

In the present embodiment, when the information terminal 1 detects the specific voice in the user's call audio, it transmits the specific voice signal that follows to the voice recognition server 3. The voice recognition server 3 converts the received specific voice signal into text and transmits the text information to the information terminal 1. After receiving the text, the information terminal 1 extracts an operation keyword from it and, based on that keyword, automatically performs the "hold", "call", or "transfer" operation.

Therefore, according to the present embodiment, the information terminal 1 automatically controls telephone operations by voice, which prevents erroneous operations caused by the user's manual operation.

Also, in the present embodiment, the information terminal 1 starts acquiring the specific voice signal when it detects the specific voice in the call audio, and ends the acquisition when it detects silence lasting 3 seconds or more.

Therefore, according to the present embodiment, the information terminal 1 automatically extracts the specific voice signal from the voice, which realizes automatic voice control of telephone operations and prevents erroneous operations caused by the user's manual operation.

Also, in the present embodiment, the information terminal 1 performs display processing related to call connection, such as ringing, and call audio output according to the processing content determination result, and when it detects a key input, controls its own operation according to that determination result.
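This paragraph and claim 3 describe a confirmation step: the determined command is shown on the display, and it is executed only after a key press. Below is a hedged sketch with injected callbacks (all names hypothetical), where `confirm_keys=None` means any key confirms and a `None` key means no key press was detected.

```python
from typing import Callable, Optional, Set


def confirm_and_execute(command: str,
                        show: Callable[[str], None],
                        read_key: Callable[[], Optional[str]],
                        execute: Callable[[str], None],
                        confirm_keys: Optional[Set[str]] = None) -> bool:
    """Display the determined command and run it only after a qualifying key press."""
    show(f"Execute '{command}'?")   # put the determined command on the display unit
    key = read_key()                # None: no key input detected
    if key is None:
        return False
    if confirm_keys is not None and key not in confirm_keys:
        return False                # a specific key is required and this was not it
    execute(command)                # key input detected: control the terminal
    return True
```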

Therefore, according to the present embodiment, the information terminal 1 controls itself according to the processing content determination result, which prevents erroneous operations caused by the user's manual operation.

1, 4: information terminal; 2: network; 3: voice recognition server; 101: communication control unit; 102: man-machine interface unit; 103: terminal control unit; 104: specific voice storage unit; 105: specific voice signal extraction unit; 106: specific voice search unit; 107: voice recognition server transmission unit; 108: text reception unit; 109: operation keyword extraction unit; 110: processing content determination unit; 111: processing content storage unit; 112: call control unit

Claims

A voice command compatible information terminal using a specific voice, being an information terminal that performs a predetermined operation in response to a voice command and comprising:
specific voice registration means for registering a specific voice that the user utters at the beginning of the voice command;
specific voice detection means for detecting the specific voice in voice input to the terminal;
specific voice signal extraction means for extracting a series of specific voice signals following the detected specific voice;
specific voice signal transmission means for transmitting the extracted specific voice signal to a predetermined server;
processing result reception means for receiving processing result data corresponding to the transmitted specific voice signal;
command determination means for analyzing the received processing result data and determining a command relating to the operation of the terminal; and
terminal control means for controlling the operation of the terminal according to the determined command.

The information terminal according to claim 1,
wherein the series of specific voice signals following the detected specific voice, which the specific voice signal extraction means extracts, is the voice signal from the detected specific voice until silence lasting a certain time or longer is detected.

The information terminal according to claim 1 or 2,
wherein the command determined by the command determination means is displayed on a display unit of the terminal, input of a specific key or of any key is monitored, and when the key input is detected, the operation of the terminal is controlled according to the determined command.