TW201919041A

TW201919041A - Speech processing device, speech recognition input system and speech recognition input method

Info

Publication number: TW201919041A
Application number: TW106139071A
Authority: TW
Inventors: 陳定宇
Original assignee: 群光電子股份有限公司
Priority date: 2017-11-10
Filing date: 2017-11-10
Publication date: 2019-05-16
Also published as: TWI650749B

Abstract

A speech recognition input system includes a speech processing device and a computer device. The speech processing device includes a microphone module, an audio transceiver module, a speech recognition module, a processing module and a human interface device (HID) interface. The microphone module outputs an audio signal. The audio transceiver module transmits the audio signal to the speech recognition module. After recognizing the audio signal, the speech recognition module returns a recognized document to the audio transceiver module. The processing module generates information to be transmitted according to the recognized document and controls the HID interface to output it. The information to be transmitted is a control instruction, text to be transmitted, or a combination thereof. The computer device includes a processor, which receives the information to be transmitted and correspondingly executes the control instruction, displays the text to be transmitted, or both.

Description

Speech processing device, speech recognition input system and speech recognition input method

The invention relates to a recognition system, and in particular to a speech processing device, a speech recognition input system and a speech recognition input method.

In the past, users operating a computer generally entered commands or text with a keyboard, a mouse or another peripheral device. With the development of speech recognition technology, however, many electronic devices (such as desktop computers, notebook computers, mobile phones and personal digital assistants) now also widely use speech as a means of text input.

In most speech recognition applications, a microphone is connected to the computer. The user inputs audio through the microphone, and a speech recognition system recognizes the audio and converts it into text for the computer to enter. However, current microphones are generally connected to the computer through the UAC (USB Audio Class) device protocol, and are therefore prone to conflict with the computer system's existing audio devices.

For example, assume the computer system has a built-in microphone. When an external microphone for speech input is connected to the computer through the UAC device protocol, the computer system shows two or more connected microphones. The user must then go to the system's device options and select the microphone to be used. Otherwise, during speech text input, the system may fail to open the correct microphone or application, so that the user's speech may intrude on music playback or a real-time conversation, or the external microphone may be occupied by another application, making speech text input impossible.

Moreover, after speech is input through a microphone, the speech recognition system recognizes it as text, and the recognized text is then entered into the computer. However, most computers have no preloaded speech input method, so a speech input method or application must be installed on the computer before text can be entered by speech, which is inconvenient.

In view of this, the present invention provides a speech recognition input system that includes a speech processing device and a computer device. The speech processing device includes a microphone module, an audio transceiver module, a speech recognition module, a processing module and a human interface device (HID) interface. The microphone module outputs an audio signal. The audio transceiver module is electrically connected to the microphone module and communicatively connected to the speech recognition module; it receives the audio signal and transmits it to the speech recognition module. After recognizing the audio signal, the speech recognition module returns a corresponding recognized document to the audio transceiver module. The processing module is electrically connected to the audio transceiver module and the HID interface. The processing module receives the recognized document and generates information to be transmitted according to it, the information to be transmitted being a control instruction, text to be transmitted, or a combination thereof, and the processing module controls the HID interface to output the information to be transmitted. The computer device includes a processor, which is electrically connected to the HID interface to receive the information to be transmitted and correspondingly executes the control instruction, displays the text to be transmitted, or both.

In one embodiment, the present invention provides a speech processing device that includes a microphone module, an audio transceiver module, a processing module and a human interface device (HID) interface. The microphone module outputs an audio signal. The audio transceiver module is electrically connected to the microphone module; it receives the audio signal, transmits it onward, and receives a recognized document corresponding to the audio signal. The processing module is electrically connected to the audio transceiver module and receives the recognized document. The processing module generates information to be transmitted according to the recognized document, the information to be transmitted being a control instruction, text to be transmitted, or a combination thereof. The HID interface is electrically connected to the processing module, and the processing module controls the HID interface to output the information to be transmitted.

In one embodiment, the present invention provides a speech recognition input method that includes: an audio output step of outputting an audio signal; a speech recognition step in which a speech recognition module recognizes the audio signal and outputs a corresponding recognized document; a processing step in which a processing module generates information to be transmitted according to the recognized document, the information to be transmitted being a control instruction, text to be transmitted, or a combination thereof; and an output step in which the processing module controls a human interface device (HID) interface to output the information to be transmitted.

In summary, the speech processing device of the embodiments of the present invention is connected to the computer device through a human interface device (HID) interface, so that data is transmitted between the speech processing device and the computer device via the HID communication protocol. Compared with the conventional approach of connecting a microphone to the computer device through the UAC (USB Audio Class) device protocol, the speech processing device does not conflict with the audio devices built into the computer device, and the user does not need to select an audio device in the system's device options. The existing audio settings of the computer device therefore remain unchanged, which makes the device more convenient to use.

FIG. 1 is a system block diagram of a first embodiment of the speech recognition input system of the present invention. As shown in FIG. 1, the speech recognition input system 1 includes a speech processing device 10 and a computer device 20.

In some embodiments, the computer device 20 may be a personal computer, a notebook computer, a tablet computer, a smartphone, a navigation device, an in-vehicle computer, a personal digital assistant, a digital signage display, or another electronic device that accepts text or instruction input.

The speech processing device 10 can be connected to the computer device 20 to recognize audio (such as a user's speech or other sounds) and transmit commands to the computer device 20 so that it enters text or performs actions. In some embodiments, the speech processing device 10 may be a computer peripheral, for example a keyboard, a mouse, a stylus or a speaker, so that the speech processing device 10 can also be used to operate the computer device 20 manually to enter text or perform actions.

As shown in FIG. 1, in this embodiment the speech processing device 10 includes a microphone module 11, an audio transceiver module 12, a processing module 14 and a human interface device (HID) interface 15. The microphone module 11 receives external speech, converts it, and outputs a corresponding audio signal S. For example, the microphone module 11 can capture the user's speech and convert it into the audio signal S; the microphone module 11 may be connected to, or may incorporate, a signal converter that converts the user's speech into the audio signal S (for example by a Fourier transform or another algorithm). In some embodiments, the microphone module 11 may be a dynamic microphone, a condenser microphone, an electret microphone, a MEMS microphone or a directional microphone.

As shown in FIG. 1, the audio transceiver module 12 of the speech processing device 10 is electrically connected to the microphone module 11 to receive the audio signal S transmitted by the microphone module 11. In this embodiment, the audio transceiver module 12 is connected to the microphone module 11 indirectly through the processing module 14, but this is not a limitation; the audio transceiver module 12 may also be connected to the microphone module 11 directly. The audio transceiver module 12 can transmit the audio signal S to a speech recognition module 13 for speech recognition.

As shown in FIG. 1, in one embodiment the speech processing device 10 may be connected to the speech recognition module 13 through the audio transceiver module 12. For example, the speech recognition module 13 may be a local (near-end) speech recognition module and the audio transceiver module 12 a wired audio transceiver module connected to the speech recognition module 13 through an electrical cable. Alternatively, in the embodiment of FIG. 1, the speech recognition module 13 is a cloud speech recognition module communicatively connected to the Internet; for example, the speech recognition module 13 is deployed in a cloud system, and the audio transceiver module 12 connects to the Internet wirelessly or by wire to communicate with the speech recognition module 13. For example, the audio transceiver module 12 may be a wireless audio transceiver module, which may be a long-range wireless module (such as a 3G/4G/5G module, a radio module or a wireless LAN module) or a short-range wireless module (such as a WiFi module, a Bluetooth module or a ZigBee module), so that it connects to the Internet wirelessly and communicates wirelessly with the speech recognition module 13. Alternatively, in another embodiment, the audio transceiver module 12 may be connected by an electrical cable to a wireless transmission device (such as a wireless router or a wireless base station) and connect to the Internet through that device to communicate with the speech recognition module 13.

In addition, in the embodiment of FIG. 1, because the speech recognition module 13 is a cloud speech recognition module, neither the speech processing device 10 nor the computer device 20 needs additional speech recognition software installed, and speech training need not be repeated when the speech processing device 10 is connected to a different computer device 20. The speech processing device 10 thus achieves plug-and-play operation, which greatly improves convenience.

In another embodiment, the speech recognition module may instead be built into the speech processing device 10. FIG. 2 is a system block diagram of a second embodiment of the speech recognition input system of the present invention. In the speech recognition input system 2 of this embodiment, the speech processing device 10 has a built-in speech recognition module 13', which is, for example, hardware or firmware with computing capability (such as a digital signal processor or a programmable logic device). The audio transceiver module 12 is a wired audio transceiver module connected to the speech recognition module 13' through an electrical cable.

Referring again to FIG. 1, in this embodiment the speech recognition module 13 recognizes the audio signal S and returns a corresponding recognized document T to the audio transceiver module 12. For example, the speech recognition module 13 may perform speech recognition using statistical pattern recognition. The speech recognition module 13 may incorporate a signal processing unit, an acoustic model, a pronunciation dictionary, a language model and a decoder (not shown in the figures). The signal processing unit extracts features from the audio signal S for use by the acoustic model. The acoustic model may, for example, be built as a hidden Markov model (HMM). The language model models the target language. The pronunciation dictionary contains sets of vocabulary entries and their pronunciations, and provides the mapping between the acoustic model and the language model. The decoder uses the acoustic model, the language model and the pronunciation dictionary to find the text corresponding to the audio signal S, thereby converting the audio signal S into the recognized document T (text). The recognized document T may contain characters, symbols, punctuation marks, numbers, words, strings, or phrases and sentences composed of multiple strings.
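
The statistical approach described above is conventionally summarized by the Bayes decision rule shown below. The symbols are not from the source and are introduced here only for illustration: O is the feature sequence extracted from the audio signal S, W is a candidate word sequence, and \hat{W} is the text placed in the recognized document T.

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \; p(O \mid W)\, P(W)
```

In this reading, p(O | W) is scored by the acoustic model together with the pronunciation dictionary, P(W) is scored by the language model, and carrying out the arg max search is the role of the decoder.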

Referring again to FIG. 1, the processing module 14 is electrically connected to the audio transceiver module 12 and the HID interface 15. The processing module 14 receives the recognized document T and generates information to be transmitted I according to it, where the information to be transmitted I is a control instruction, text to be transmitted, or a combination thereof. Specifically, the text to be transmitted may be plain text that the user wants to enter, including characters, symbols, punctuation marks, numbers, words, strings, or phrases and sentences composed of multiple strings. The control instruction may take the form of a signal that causes the computer device 20 to perform a specific action. When the computer device 20 is a personal computer, a notebook computer or a smartphone, such actions include file search, opening or closing a program, disk defragmentation, volume control, shutdown and restart. When the computer device 20 is a navigation device, such actions include route planning, saving a location and searching for a parking space. In other words, the processing module 14 can determine from the content of the recognized document T whether the information to be transmitted I is a control instruction, text to be transmitted, or a combination of the two, as described in detail below.
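
One way to picture the information to be transmitted I is as a small record with two optional fields. The sketch below is illustrative only: the class name `InfoToTransmit`, the function `build_info`, the command tables and the instruction codes such as `"RESTART"` are all assumptions, since the source names the actions but does not specify how they are encoded or matched.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical command tables. The source lists the actions but no encoding.
PC_COMMANDS: Dict[str, str] = {
    "restart the computer": "RESTART",
    "shut down": "SHUTDOWN",
    "volume up": "VOLUME_UP",
}
NAVIGATION_COMMANDS: Dict[str, str] = {
    "plan a route": "ROUTE_PLAN",
    "save this location": "SAVE_LOCATION",
    "find a parking space": "FIND_PARKING",
}


@dataclass
class InfoToTransmit:
    """Information to be transmitted I: a control instruction, text, or both."""
    control_instruction: Optional[str] = None
    text: Optional[str] = None


def build_info(recognized_document: str, commands: Dict[str, str]) -> InfoToTransmit:
    """Map recognized document T to a control instruction if it matches a
    known command for this kind of computer device, else treat it as text."""
    key = recognized_document.strip().lower()
    if key in commands:
        return InfoToTransmit(control_instruction=commands[key])
    return InfoToTransmit(text=recognized_document)
```

Passing a different table per device type reflects the point made above that the available actions depend on whether the computer device 20 is, for example, a personal computer or a navigation device.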

In one embodiment, the processing module 14 of the speech processing device 10 may be hardware with computing capability, such as a central processing unit (CPU), another programmable microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a similar device.

Referring again to FIG. 1, the computer device 20 includes a processor 21, which is electrically connected to the HID interface 15 to receive the information to be transmitted I and correspondingly executes the control instruction, displays the text to be transmitted, or both. The speech processing device 10 of the embodiments of the present invention is thus connected to the computer device 20 through the human interface device (HID) interface 15, so that data is transmitted between the speech processing device 10 and the computer device 20 via the HID communication protocol. Compared with the conventional approach of connecting a microphone to the computer device 20 through the UAC (USB Audio Class) device protocol, the speech processing device 10 does not conflict with the audio devices (such as a microphone or speakers) built into the computer device 20. Specifically, if the computer device 20 is a personal computer or a notebook computer, no new audio device appears among the system's audio devices, so no conflict with the built-in audio devices of the computer device 20, and no resulting malfunction, can occur. The user also does not need to select an audio device in the device options. The existing audio settings of the computer device 20 therefore remain unchanged, which makes the device more convenient to use.
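
To make the HID transfer concrete, the sketch below encodes text as USB HID boot-protocol keyboard input reports. This is an assumption for illustration: the source states only that data travels over the HID protocol and does not say that the device enumerates as a keyboard or which report format it uses. The function names are hypothetical; the usage IDs follow the Keyboard/Keypad page of the USB HID Usage Tables. Only a small ASCII subset on a US layout is covered, so text in other scripts would need a mechanism this sketch does not address.

```python
from typing import List, Tuple

LEFT_SHIFT = 0x02  # modifier bit for Left Shift in byte 0 of the report
KEY_ENTER = 0x28
KEY_SPACE = 0x2C


def char_to_usage(ch: str) -> Tuple[int, int]:
    """Return (modifier, usage_id) for a small ASCII subset."""
    if "a" <= ch <= "z":
        return 0, 0x04 + ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return LEFT_SHIFT, 0x04 + ord(ch) - ord("A")
    if "1" <= ch <= "9":
        return 0, 0x1E + ord(ch) - ord("1")
    if ch == "0":
        return 0, 0x27
    if ch == " ":
        return 0, KEY_SPACE
    if ch == "\n":
        return 0, KEY_ENTER
    raise ValueError(f"unsupported character: {ch!r}")


def text_to_reports(text: str) -> List[bytes]:
    """Encode text as 8-byte boot keyboard reports.

    Layout: byte 0 = modifiers, byte 1 = reserved, bytes 2..7 = key usage IDs.
    Each character yields a key-down report followed by an all-zero key-up report.
    """
    reports: List[bytes] = []
    for ch in text:
        modifier, usage = char_to_usage(ch)
        reports.append(bytes([modifier, 0, usage, 0, 0, 0, 0, 0]))
        reports.append(bytes(8))
    return reports
```

Because the host sees only keyboard-style input reports in this sketch, no audio endpoint is exposed, which is consistent with the point above that no new audio device appears on the computer device 20.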

To explain the operation of the above embodiments more clearly, refer to FIG. 5, which is a flowchart of a first embodiment of the speech recognition input method of the present invention. The hardware referred to below corresponds to that disclosed for the speech recognition input system 1 above. In the speech recognition input process of this embodiment, an audio output step S1 is performed first: an audio signal S is output. Specifically, referring to FIG. 1, the user connects the speech processing device 10 to the computer device 20 through the HID interface 15. When the user wants the computer device 20 to perform an action or enter text, the user makes a sound or speaks toward the microphone module 11, and the microphone module 11 captures the sound, converts it into the audio signal S, and outputs it.

After the audio output step S1, a speech recognition step S2 is performed: the speech recognition module 13 recognizes the audio signal S and outputs a corresponding recognized document T. For example, in the embodiments of FIG. 1 and FIG. 2, the microphone module 11 transmits the audio signal S to the audio transceiver module 12, which forwards it to the cloud speech recognition module 13 (see FIG. 1) or to the speech recognition module 13' built into the speech processing device 10 (see FIG. 2). That module recognizes the audio signal S, converts it into text, and outputs the recognized document T.

After the speech recognition step S2, a processing step S3 is performed: the processing module 14 generates information to be transmitted I according to the recognized document T. For example, the processing module 14 can determine from the content of the recognized document T whether the information to be transmitted I is a control instruction, text to be transmitted, or a combination of the two.

After the processing step S3, an output step S4 is performed: the processing module 14 controls the human interface device (HID) interface 15 to output the information to be transmitted I. Data is thus transmitted between the speech processing device 10 and the computer device 20 via the HID communication protocol, which avoids conflicts with the audio devices (such as a microphone or speakers) built into the computer device 20. The user also does not need to select an audio device in the device options of the computer device 20, which makes the device more convenient to use.

Referring again to FIG. 5, after the output step S4 this embodiment further performs an execution step S5: the computer device 20 receives the information to be transmitted I and correspondingly executes the control instruction, displays the text to be transmitted, or both. Specifically, when the processing module 14 outputs a control instruction, the computer device 20 performs the corresponding action, such as a file search or volume control. When the processing module 14 outputs text to be transmitted, the computer device 20 enters that text and displays it on the screen. In one embodiment, the processing module 14 may also output a control instruction and text to be transmitted together, driving the computer device 20 to both perform an action and enter text. For example, the computer device 20 may open a document and enter the text in it, or open a search engine and enter the text to run a search.
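
The execution step S5 can be sketched as a small dispatcher on the computer side. Everything here is a stand-in chosen for illustration: the function name `execute_step_s5`, the `actions` table, and the `screen` list that models displayed text are assumptions, not part of the source.

```python
from typing import Callable, Dict, List, Optional


def execute_step_s5(control_instruction: Optional[str],
                    text: Optional[str],
                    actions: Dict[str, Callable[[], None]],
                    screen: List[str]) -> None:
    """Handle information to be transmitted I on the computer device.

    The control instruction, if present, runs first (for example, opening a
    search engine); the text, if present, is then entered and displayed.
    """
    if control_instruction is not None:
        actions[control_instruction]()
    if text is not None:
        screen.append(text)


# Usage: the combined case, where an action is performed and text is entered.
log: List[str] = []
screen: List[str] = []
actions = {"OPEN_SEARCH": lambda: log.append("search engine opened")}
execute_step_s5("OPEN_SEARCH", "weather in Taipei", actions, screen)
```

Running the action before entering the text matches the combined example in the paragraph above, where the search engine must be open before the query is typed into it.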

In some embodiments, the processing module 14 can determine from the content of the recognized document T whether the information to be transmitted I is a control instruction, text to be transmitted, or a combination of the two in any of the following ways.

The processing module 14 can control the HID interface 15 to output either text to be transmitted or a control instruction depending on whether the recognized document T includes a trigger instruction. FIG. 6 is a flowchart of a second embodiment of the speech recognition input method of the present invention. In this embodiment, the output step S4 may include three sub-steps. In sub-step S41, the processing module 14 first determines whether the recognized document T includes a trigger instruction. If it does, sub-step S42 is performed: the HID interface 15 is controlled to output the text to be transmitted. If it does not, sub-step S43 is performed: the HID interface 15 is controlled to output a control instruction. For example, referring to FIG. 1, assume the phrase "voice input" is the condition that triggers text entry on the computer. When the user says "voice input, restart the computer" to the microphone module 11, the speech recognition module 13 recognizes the speech and converts it into a recognized document T in text form, and the processing module 14 generates the text to be transmitted on the basis of the phrase "voice input", the text to be transmitted being the string "restart the computer". Because "voice input" is the trigger instruction for text entry, the processing module 14 controls the HID interface 15 to output only the words "restart the computer" for the computer device 20 to enter and display on the screen; the full phrase "voice input, restart the computer" is not displayed. In another embodiment, if the recognized document T does not include the phrase "voice input", the processing module 14 controls the HID interface 15 to output a control instruction corresponding to the content "restart the computer", causing the computer device 20 to restart. In other embodiments, the recognized document T may of course contain both the trigger instruction and other control instructions (such as file search or volume control), in which case the processing module 14 can control the HID interface 15 to output a control instruction and text to be transmitted together; this is not a limitation. In still other embodiments, the mapping may be reversed: the processing module 14 controls the HID interface 15 to output a control instruction when the recognized document T includes the trigger instruction, and to output text to be transmitted when it does not. Because the processing module 14 can determine from the content of the recognized document T whether the information to be transmitted I is a control instruction, text to be transmitted, or a combination of the two, malfunctions of the computer device 20 can be avoided (for example, the computer device 20 entering the spoken content as text when the user intended to make it perform a specific action by voice), so the embodiments of the present invention achieve more accurate speech recognition input.
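
The trigger-instruction branch of sub-steps S41 to S43 can be sketched as follows. This is a minimal illustration under stated assumptions: the English trigger phrase `"voice input"`, the `COMMANDS` table and the function name `output_step_s4` are hypothetical, and the trigger is only matched at the start of the recognized document, whereas the source says merely that the document "includes" it.

```python
from typing import Dict, Optional, Tuple

TRIGGER = "voice input"  # assumed English rendering of the trigger instruction

# Hypothetical mapping from spoken content to control instruction codes.
COMMANDS: Dict[str, str] = {"restart the computer": "RESTART"}


def output_step_s4(recognized_document: str) -> Tuple[str, Optional[str]]:
    """Return ("text", text_to_transmit) or ("control", instruction_code)."""
    doc = recognized_document.strip()
    # S41: does recognized document T include the trigger instruction?
    if doc.lower().startswith(TRIGGER):
        # S42: output the remainder as text, without the trigger itself.
        remainder = doc[len(TRIGGER):].lstrip(" ,")
        return "text", remainder
    # S43: output the control instruction matching the content (None if unknown).
    return "control", COMMANDS.get(doc.lower())
```

With these assumptions, `output_step_s4("Voice input, restart the computer")` yields the text `"restart the computer"`, while the same words without the trigger yield the `"RESTART"` instruction, mirroring the example in the paragraph above.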

Alternatively, the speech processing device 10 of the speech recognition input system 3 may include a switch module 16. The switch module 16 may be a push-button switch, a micro switch, a toggle switch, a membrane switch, a magnetic switch, or the like, and the user can operate the switch module 16 so that it outputs a switch signal. The processing module 14 controls the human-machine transmission interface 15 to output either the text to be transmitted or a control instruction, depending on whether the switch signal is received. FIG. 7 is a flowchart of the third embodiment of the speech recognition input method of the present invention. In this embodiment, the output step S4 may include three sub-steps. In sub-step S44, it is determined whether the processing module 14 has received the switch signal. If so, sub-step S45 is performed: the human-machine transmission interface 15 is controlled to output the text to be transmitted. If not, sub-step S46 is performed: the human-machine transmission interface 15 is controlled to output the control instruction. Specifically, refer to FIG. 3, which is a system block diagram of the third embodiment of the speech recognition input system of the present invention. The user can operate the switch module 16 and speak a passage to the microphone module 11; the speech recognition module 13 recognizes the speech and converts it into the recognized document T in text form, and the processing module 14, according to the switch signal, controls the human-machine transmission interface 15 to output the text to be transmitted (that is, the passage). Conversely, when the user speaks the passage to the microphone module 11 without operating the switch module 16, the processing module 14 outputs a control instruction according to the content of the passage. In some embodiments, the assignment may be reversed: the human-machine transmission interface 15 is controlled to output the control instruction when the processing module 14 receives the switch signal, and to output the text to be transmitted when the processing module 14 does not receive the switch signal. The invention is not limited in this respect.
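Sub-steps S44 to S46 amount to a two-way selection on the switch signal. The following minimal sketch illustrates that selection; the names `OutputKind` and `select_output` and the `invert` parameter (which models the reversed assignment mentioned for some embodiments) are assumptions made for illustration and do not appear in the disclosure.

```python
from enum import Enum


class OutputKind(Enum):
    TEXT = "text to be transmitted"
    CONTROL = "control instruction"


def select_output(switch_signal_received: bool, invert: bool = False) -> OutputKind:
    """Sub-step S44: decide the output kind from the switch signal.

    With invert=False: signal received -> S45 (text), otherwise S46 (control).
    With invert=True the assignment is reversed.
    """
    if switch_signal_received != invert:
        return OutputKind.TEXT      # sub-step S45
    return OutputKind.CONTROL       # sub-step S46
```

For example, `select_output(True)` selects the text to be transmitted, while `select_output(True, invert=True)` selects the control instruction.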

FIG. 8 is a flowchart of the fourth embodiment of the speech recognition input method of the present invention. Compared with the embodiment of FIG. 5, the speech recognition step S2’ of this embodiment further includes: the semantic analysis unit 131 analyzes the recognized document T and correspondingly outputs a command text, a non-command text, or a combination thereof. The processing step S3’ of this embodiment includes: the processing module 14 generates a control instruction according to the command text in the recognized document T, generates text to be transmitted according to the non-command text, or a combination thereof. For example, refer to FIG. 4, which is a system block diagram of the fourth embodiment of the speech recognition input system of the present invention. In this embodiment, the speech recognition module 13 of the speech processing device 10 of the speech recognition input system 4 may include the semantic analysis unit 131 to analyze the recognized document T. In some embodiments, the semantic analysis unit 131 may be hardware with computing capability, such as a programmable microprocessor or a digital signal processor (DSP).

Following on from the above, semantic analysis refers to analyzing the content of a sentence, string, or paragraph to extract its summary and gist. Semantic analysis does not restrict the user's choice of words; any utterance that satisfies ordinary grammar can be analyzed and classified. In some embodiments, the semantic analysis unit 131 may perform semantic analysis using singular value decomposition (SVD), non-negative matrix factorization (NMF), a neural network (NN), or other algorithms. Specifically, the user's intention can be determined by analyzing the semantics of the recognized document T. For example, when a string in the recognized document T contains a verb such as "search", "open", "close", "control", or "adjust", semantic analysis can determine that the user intends to make the computer device 20 perform an action; the string is then output and marked as a command text, and the processing module 14 generates a control instruction corresponding to the content of the command text. When a string in the recognized document T contains the word "input", semantic analysis can determine that the user intends to enter text; the string is then output and marked as a non-command text, and the processing module 14 generates text to be transmitted corresponding to the content of the non-command text.
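The classification outcome described above can be illustrated with a deliberately simple keyword check. This sketch is a stand-in and does not implement SVD, NMF, or a neural network; it only reproduces the example behavior using the verbs and the "input" marker quoted in the text. The function name `classify_string`, the English word lists, the precedence of "input" over the command verbs, and the default of treating unmatched strings as non-command text are all assumptions.

```python
# Command verbs and text marker taken from the examples in the description
# (rendered in English here for illustration).
COMMAND_VERBS = ("search", "open", "close", "control", "adjust")
TEXT_MARKER = "input"


def classify_string(string: str) -> str:
    """Label one string of recognized document T as 'command' or 'non-command'.

    Assumptions: the 'input' marker takes precedence over command verbs, and
    strings matching neither list default to 'non-command'.
    """
    words = string.lower().split()
    if TEXT_MARKER in words:
        return "non-command"   # user intends to enter text
    if any(verb in words for verb in COMMAND_VERBS):
        return "command"       # user intends to operate the computer device
    return "non-command"
```

Under these assumptions, `classify_string("open the browser")` is labeled as command text and `classify_string("input hello world")` as non-command text.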

FIG. 9 is a flowchart of the fifth embodiment of the speech recognition input method of the present invention. Compared with the embodiment of FIG. 5, the processing step S3’’ of this embodiment further includes: encoding the text to be transmitted in Unicode format. For example, in the embodiment of FIG. 3, the processing module 14 includes an encoding unit 141 (for example, a text editor) to encode the text to be transmitted in Unicode format. Because text in Unicode format is usable across the systems of various computer devices 20, the computer device 20 can directly read the text to be transmitted and perform the input operation without installing an additional voice input method, which makes voice input faster and more convenient.
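As an illustration of what the encoding unit 141 produces, the sketch below converts text into Unicode code points and into a byte sequence. The source says only "Unicode format" and does not name an encoding form, so the choice of UTF-16 little-endian here, like the function names, is an assumption made for the example.

```python
from typing import List


def to_code_points(text: str) -> List[int]:
    """Return the Unicode code point of each character in the text."""
    return [ord(ch) for ch in text]


def to_utf16le(text: str) -> bytes:
    """Encode the text as UTF-16 little-endian bytes (assumed encoding form)."""
    return text.encode("utf-16-le")


def from_utf16le(data: bytes) -> str:
    """Decode on the receiving side, mirroring to_utf16le."""
    return data.decode("utf-16-le")
```

Because both sides agree on a standard encoding, the receiving computer device can reconstruct the same characters, including non-Latin text, without any language-specific input method.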

FIG. 10 is a flowchart of the sixth embodiment of the speech recognition input method of the present invention. Compared with the embodiment of FIG. 5, this embodiment further includes a translation step S6 after the speech recognition step S2: the translation unit 132 analyzes the audio signal S and outputs a translated text, and the recognized document T includes the translated text. Specifically, in the embodiment of FIG. 3, the speech recognition module 13 may include a translation unit 132 that interprets the audio signal S and translates it into text in another language according to the user's needs.

Although the technical content of the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the present invention. Any minor changes and refinements made by a person skilled in the art without departing from the spirit of the present invention shall fall within the scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

1~4‧‧‧Speech recognition input system

10‧‧‧Speech processing device

11‧‧‧Microphone module

12‧‧‧Audio transceiver module

13, 13’‧‧‧Speech recognition module

131‧‧‧Semantic analysis unit

132‧‧‧Translation unit

14‧‧‧Processing module

141‧‧‧Encoding unit

15‧‧‧Human-machine transmission interface

16‧‧‧Switch module

20‧‧‧Computer device

21‧‧‧Processor

S‧‧‧Audio signal

T‧‧‧Recognized document

I‧‧‧Information to be transmitted

S1‧‧‧Audio output step

S2, S2’‧‧‧Speech recognition step

S3, S3’, S3’’‧‧‧Processing step

S4‧‧‧Output step

S41~S46‧‧‧Sub-steps

S5‧‧‧Execution step

S6‧‧‧Translation step

[FIG. 1] is a system block diagram of the first embodiment of the speech recognition input system of the present invention.

[FIG. 2] is a system block diagram of the second embodiment of the speech recognition input system of the present invention.

[FIG. 3] is a system block diagram of the third embodiment of the speech recognition input system of the present invention.

[FIG. 4] is a system block diagram of the fourth embodiment of the speech recognition input system of the present invention.

[FIG. 5] is a flowchart of the first embodiment of the speech recognition input method of the present invention.

[FIG. 6] is a flowchart of the second embodiment of the speech recognition input method of the present invention.

[FIG. 7] is a flowchart of the third embodiment of the speech recognition input method of the present invention.

[FIG. 8] is a flowchart of the fourth embodiment of the speech recognition input method of the present invention.

[FIG. 9] is a flowchart of the fifth embodiment of the speech recognition input method of the present invention.

[FIG. 10] is a flowchart of the sixth embodiment of the speech recognition input method of the present invention.

Claims

A speech recognition input system, comprising: a speech processing device including a microphone module, an audio transceiver module, a speech recognition module, a processing module, and a human-machine transmission interface (HID interface), wherein the microphone module outputs an audio signal; the audio transceiver module is electrically connected to the microphone module and communicatively connected to the speech recognition module; the audio transceiver module receives the audio signal and sends it to the speech recognition module; the speech recognition module recognizes the audio signal and correspondingly returns a recognized document to the audio transceiver module; the processing module is electrically connected to the audio transceiver module and the human-machine transmission interface; the processing module receives the recognized document and generates information to be transmitted according to the recognized document, the information to be transmitted being a control instruction, text to be transmitted, or a combination thereof; and the processing module controls the human-machine transmission interface to output the information to be transmitted; and a computer device including a processor, wherein the processor is electrically connected to the human-machine transmission interface to receive the information to be transmitted, and correspondingly executes the control instruction, displays the text to be transmitted, or a combination thereof.

The speech recognition input system according to claim 1, wherein the audio transceiver module is a wireless audio transceiver module or a wired audio transceiver module.

The speech recognition input system according to claim 1, wherein the speech recognition module is a cloud speech recognition module communicatively connected to the Internet, and the audio transceiver module is connected to the Internet in a wireless or wired manner to communicate with the speech recognition module.

The speech recognition input system according to claim 1, wherein the speech recognition module includes a translation unit that analyzes the audio signal and outputs a translated text, and the recognized document includes the translated text.

The speech recognition input system according to claim 1, wherein the speech recognition module includes a semantic analysis unit, the semantic analysis unit analyzes the recognized document and correspondingly outputs a command text, a non-command text, or a combination thereof, and the processing module generates the control instruction according to the command text, generates the text to be transmitted according to the non-command text, or a combination thereof.

The speech recognition input system according to claim 1, wherein the processing module controls the output of the text to be transmitted or the control instruction according to whether the recognized document includes a trigger instruction.

The speech recognition input system according to claim 1, wherein the speech processing device includes a switch module, the switch module is connected to the processing module and selectively outputs a switch signal, and the processing module controls output of the text to be transmitted or output of the control instruction according to the switch signal.

The speech recognition input system according to claim 1, wherein the processing module includes an encoding unit, and the encoding unit encodes the text to be transmitted in Unicode format.

The speech recognition input system according to claim 1, wherein the speech processing device is a computer peripheral device, and the computer peripheral device is a keyboard, a mouse, a stylus, or a speaker.

A speech processing device, comprising: a microphone module that outputs an audio signal; an audio transceiver module electrically connected to the microphone module, wherein the audio transceiver module receives the audio signal, sends it out, and receives a recognized document corresponding to the audio signal; a processing module electrically connected to the audio transceiver module, wherein the processing module receives the recognized document and generates information to be transmitted according to the recognized document, the information to be transmitted being a control instruction, text to be transmitted, or a combination thereof; and a human-machine transmission interface (HID interface) electrically connected to the processing module, wherein the processing module controls the human-machine transmission interface to output the information to be transmitted.

The voice processing device according to claim 10, wherein the audio transceiver module is a wireless audio transceiver module or a wired audio transceiver module.

The speech processing device according to claim 10, further comprising a speech recognition module communicatively connected to the audio transceiver module, wherein the speech recognition module receives and recognizes the audio signal and correspondingly outputs the recognized document.

The speech processing device according to claim 12, wherein the speech recognition module is a cloud speech recognition module communicatively connected to the Internet, and the audio transceiver module is connected to the Internet in a wireless or wired manner to communicate with the speech recognition module.

The speech processing device according to claim 12, wherein the speech recognition module includes a translation unit, the translation unit analyzes the audio signal and outputs a translated text, and the recognized document includes the translated text.

The speech processing device according to claim 12, wherein the speech recognition module includes a semantic analysis unit that analyzes the recognized document and correspondingly outputs a command text, a non-command text, or a combination thereof, and the processing module generates the control instruction according to the command text, generates the text to be transmitted according to the non-command text, or a combination thereof.

The speech processing device according to claim 10, wherein the processing module includes an encoding unit, and the encoding unit encodes the text to be transmitted in Unicode format.

The speech processing device according to claim 10, wherein the processing module controls output of the text to be transmitted when it determines that the recognized document includes a trigger instruction, and controls output of the control instruction when it determines that the recognized document does not include the trigger instruction.

The speech processing device according to claim 10, further comprising a switch module, wherein the switch module is electrically connected to the processing module and selectively outputs a switch signal; the processing module controls output of the text to be transmitted when it receives the switch signal, and controls output of the control instruction when it does not receive the switch signal.

The voice processing device according to claim 10, wherein the voice processing device is a computer peripheral device, and the computer peripheral device is a keyboard, a mouse, a stylus, or a speaker.

A speech recognition input method, comprising the following steps: an audio output step: outputting an audio signal; a speech recognition step: a speech recognition module recognizes the audio signal and correspondingly outputs a recognized document; a processing step: a processing module generates information to be transmitted according to the recognized document, wherein the information to be transmitted is a control instruction, text to be transmitted, or a combination thereof; and an output step: the processing module controls a human-machine transmission interface (HID interface) to output the information to be transmitted.

The speech recognition input method according to claim 20, wherein the speech recognition module in the speech recognition step is a cloud speech recognition module communicatively connected to the Internet.

The speech recognition input method according to claim 20, wherein the speech recognition step includes: a semantic analysis unit analyzes the recognized document and correspondingly outputs a command text, a non-command text, or a combination thereof; and the processing step includes: the processing module generates the control instruction according to the command text, generates the text to be transmitted according to the non-command text, or a combination thereof.

The speech recognition input method according to claim 20, wherein the speech recognition step further includes a translation step: a translation unit analyzes the audio signal and outputs a translated text, and the recognized document includes the translated text.

The speech recognition input method according to claim 20, wherein the processing step includes: encoding the text to be transmitted in Unicode format.

The speech recognition input method according to claim 20, wherein the output step includes: the processing module controls outputting the text to be transmitted or outputting the control instruction according to whether the recognized document includes a trigger instruction.

The speech recognition input method according to claim 20, wherein the output step includes: the processing module controls output of the text to be transmitted or output of the control instruction according to whether a switch signal is received from a switch module.

The speech recognition input method according to claim 20, further comprising an execution step: a computer device receives the information to be transmitted, and executes the control instruction correspondingly, or displays the text to be transmitted or a combination thereof.