WO2022204937A1 - Text input system based on speech recognition device and method thereof - Google Patents

Text input system based on speech recognition device and method thereof Download PDF

Info

Publication number
WO2022204937A1
WO2022204937A1 PCT/CN2021/083946 CN2021083946W
Authority
WO
WIPO (PCT)
Prior art keywords
input
output
recognition device
speech recognition
text
Prior art date
Application number
PCT/CN2021/083946
Other languages
English (en)
French (fr)
Inventor
董学章
蒋剡洋
Original Assignee
江苏树实科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江苏树实科技有限公司 filed Critical 江苏树实科技有限公司
Priority to PCT/CN2021/083946 priority Critical patent/WO2022204937A1/zh
Publication of WO2022204937A1 publication Critical patent/WO2022204937A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present application relates to the technical field of speech recognition systems, and in particular, to a text input system based on a speech recognition device and a method thereof.
  • limited platform support: at present, most of the voice-to-text input devices on the market can only run on PCs such as Windows and macOS computers, and cannot be used on Android devices or tablets.
  • the purpose of the present application is to provide a text input system based on a speech recognition device and a method thereof.
  • the present application discloses a text input system based on a speech recognition device, including a speech recognition device, a text input core module and an application terminal;
  • the voice recognition device is in communication connection with the text input core module; the text input core module is in communication connection with the application terminal;
  • the text input core module includes an input module and an output module; the input module and the output module are communicatively connected;
  • the input module includes an input control unit and an input unit; the input control unit and the input unit are connected in communication;
  • the output module includes an output control unit and an output unit; the output control unit and the output unit are connected in communication;
  • the input module and the output module are communicatively connected through the input control unit and the output control unit.
  • the text input core module further includes a power management unit; the power management unit is electrically connected to the input module and the output module, respectively;
  • the input module further includes an input antenna unit; the input antenna unit is connected in communication with the input control unit;
  • the output module further includes an output antenna unit; the output antenna unit is connected in communication with the output control unit;
  • the input control unit includes an input MCU, an input crystal oscillator circuit and an input reset circuit; the input MCU, the input crystal oscillator circuit and the input reset circuit are electrically connected; and
  • the input control unit further includes an input radio frequency transceiver; or
  • the input MCU is integrated with an input radio frequency transceiver
  • the output control unit includes an output MCU, an output crystal oscillator circuit and an output reset circuit; the output MCU, the output crystal oscillator circuit and the output reset circuit are electrically connected; and
  • the output control unit further includes an output radio frequency transceiver; or
  • the output MCU is integrated with an output radio frequency transceiver.
  • the input unit includes any one or any combination of a key input unit, an encoder signal input unit and an analog signal input unit;
  • the output unit includes a light effect output unit
  • the power management unit includes a lithium battery, a voltage regulator unit and a charging unit;
  • the charging unit includes any one or any combination of an overcharge protection subunit, an overdischarge protection subunit and a short circuit protection subunit;
  • the input antenna unit includes an input power amplifier circuit, an input filter circuit and an input PCB antenna;
  • the output antenna unit includes an output power amplifier circuit, an output filter circuit and an output PCB antenna;
  • the speech recognition device includes any one or any combination of a microphone, a speaker, a processor, a wireless module and a key;
  • the speech recognition device supports any one or any combination of the following:
  • the text input system based on the speech recognition device further includes a cloud server; the cloud server is connected in communication with the speech recognition device;
  • the voice recognition device communicates with the text input core module through the BLE protocol, and communicates with the cloud server through the WIFI protocol;
  • the cloud service supports automatic speech recognition algorithms
  • the application terminal includes any one or any combination of a Windows computer, a MacOS computer, a Linux computer, an IOS mobile phone, an IOS tablet, an Android mobile phone, and an Android tablet.
  • the input end of the speech recognition device is connected to the network; the output end of the speech recognition device is connected to the input unit of the text input core module; the output unit of the text input core module is connected to the input end of the application terminal;
  • the speech recognition device collects an ambient audio signal or a physical switch signal; upon the preset ambient audio signal or physical switch signal, the speech recognition device enters text input mode;
  • after entering the text input mode, the speech recognition device starts to record the user's speech content and converts the user's speech content into text information;
  • Input step: the speech recognition device converts the text information into text data and sends the text data to the application terminal via the text input core module.
  • the speech recognition device in text input mode stops collecting the ambient audio signal when the collection time is greater than the preset time and the ambient audio signal is lower than the preset audio signal.
  • the processor of the speech recognition device processes the collected user speech content and generates an audio file; the audio file is uploaded over the Internet to a preset port of the cloud server, and the cloud server recognizes and converts the audio file into text information; the cloud server transmits the text information to the speech recognition device.
  • the text input core module includes an output unit and an input unit; in the input step, the speech recognition device sends the text information to the input unit;
  • the input unit encodes the text information, and then sends the processed text information to the output unit.
  • the output unit is provided with a Bluetooth receiver.
  • the input unit converts the processed text information into text data; if the application terminal is configured with a Bluetooth device, the input unit directly converts the processed text information into text data and sends the text data to the application terminal.
  • the method for turning a voice assistant into a text input device has the following technical effects:
  • the text input system and method based on the speech recognition device provided by the present application can use the method of speech-to-text input on application terminals such as computers, mobile phones, and tablets without installing any software.
  • for example, if features A+B+C are disclosed in one example and features A+B+D+E in another, where features C and D are equivalent technical means that serve the same function, of which only one can be used and which cannot be adopted at the same time, while feature E can technically be combined with feature C, then the solution A+B+C+D should not be regarded as having been recorded because it is technically infeasible, whereas the solution A+B+C+E should be regarded as having been recorded.
  • FIG. 1 is a schematic structural diagram of a text input system based on a speech recognition device provided by the present application.
  • FIG. 2 is a schematic diagram of a hardware structure of a core module provided by the present application.
  • FIG. 3 is a flowchart of a text input method based on a speech recognition device provided by the present application.
  • FIG. 4 is a schematic diagram of a text input method based on a speech recognition device provided by the present application.
  • the present application provides a text input system based on a speech recognition device, including a speech recognition device, a text input core module and an application terminal; the speech recognition device is communicatively connected to the text input core module; the text input core module and the application terminal are communicatively connected; the text input core module includes an input module and an output module; the input module and the output module are communicatively connected; the input module includes an input control unit and an input unit; the input control unit and the input unit are communicatively connected; the output module includes an output control unit and an output unit; the output control unit and the output unit are communicatively connected; the input module and the output module are communicatively connected through the input control unit and the output control unit.
  • the text input core module further includes a power management unit; the power management unit is electrically connected with the input module and the output module respectively; the input module further includes an input antenna unit; the input antenna unit and the input control unit Communication connection; the output module also includes an output antenna unit; the output antenna unit is connected in communication with the output control unit; the input control unit includes an input microcontroller unit (Microcontroller Unit, MCU), an input crystal oscillator circuit and an input reset circuit ; the input MCU, the input crystal oscillator circuit and the input reset circuit are electrically connected; and the input control unit further includes an input radio frequency transceiver; or the input MCU integrates an input radio frequency transceiver; the output control unit includes an output MCU, an output crystal oscillator circuit and an output reset circuit; the output MCU, the output crystal oscillator circuit and the output reset circuit are electrically connected; and the output control unit further includes an output radio frequency transceiver; or the output MCU integrates an output radio frequency transceiver.
  • MCU Microcontroller Unit
  • the input control unit is the control unit A shown in FIG. 2
  • the output control unit is the control unit B shown in FIG. 2 .
  • the input MCU and the output MCU are internally integrated with a 32-bit microprocessor core
  • the maximum clock frequency of the input MCU and the output MCU is 48Mhz
  • the Flash is greater than 512k
  • the RAM is greater than 32k
  • the input MCU and the output MCU integrate BLE/802.15.4/2.4GHz radio frequency transceivers and work in the 2.4GHz frequency band.
  • the clock frequencies of the input crystal oscillator circuit and the output crystal oscillator circuit are both 24MHz.
  • the input unit includes any one or any combination of a key input unit, an encoder signal input unit and an analog signal input unit; the output unit includes a light effect output unit; the power management unit includes a lithium battery, a voltage regulator unit and a charging unit; the charging unit includes any one or any combination of an overcharge protection subunit, an overdischarge protection subunit and a short circuit protection subunit; the input antenna unit includes an input power amplifier circuit, an input filter circuit and an input PCB antenna; the output antenna unit includes an output power amplifier circuit, an output filter circuit and an output PCB antenna; the voice recognition device includes any one or any combination of a microphone, a speaker, a processor, a wireless module and a key; the voice recognition device supports any one or any combination of the following: BLE protocol and/or WIFI protocol; voice wake-up; Chinese and English bilingual operation.
  • the lithium battery is preferably a polymer lithium battery, the rated voltage is preferably 3.7V, and the capacity is greater than 300mAh.
  • the input voltage of the voltage regulator unit is 3.3V.
  • the charging unit has functions such as overcharge protection, overdischarge protection and short circuit protection.
  • the text input system based on the speech recognition device further includes a cloud server; the cloud server is connected in communication with the speech recognition device; the speech recognition device is in communication and connection with the text input core module through the BLE protocol, and communicates with the cloud server through the WIFI protocol connection; the cloud service supports automatic speech recognition algorithm; the application terminal includes any one or any combination of Windows computer, MacOS computer, Linux computer, IOS mobile phone, IOS tablet, Android mobile phone and Android tablet.
  • the present application also provides a text input method based on a speech recognition device, including:
  • the input end of the speech recognition device is connected to the network; the output end of the speech recognition device is connected to the input unit of the text input core module; the output unit of the text input core module is connected to the input end of the application terminal;
  • Acquisition step: the speech recognition device collects the ambient audio signal or the physical switch signal, and under the preset ambient audio signal or physical switch signal, the speech recognition device enters the text input mode;
  • Conversion step: after entering the text input mode, the speech recognition device starts to record the user's speech content, and converts the user's speech content into text information;
  • Input step: the speech recognition device converts the text information into text data, and sends the text data to the application terminal via the text input core module.
  • the speech recognition device in the text input mode stops collecting the environmental audio signal when the collection time is greater than the preset time and the ambient audio signal is lower than the preset audio signal.
  • the processor of the speech recognition device processes the collected user speech content and generates an audio file; the audio file is uploaded over the network to the preset port of the cloud server; the cloud server recognizes and converts the audio file into text information; the cloud server transmits the text information to the speech recognition device.
  • the text input core module includes an output unit and an input unit; in the input step, the speech recognition device sends the text information to the input unit; the input unit encodes the text information, and then sends the processed text information to the output unit.
  • the output unit is provided with a Bluetooth receiver.
  • the input unit converts the processed text information into text data; if the application terminal is configured with a Bluetooth device, the input unit directly converts the processed text information into text data and sends the text data to the application terminal.
  • the speech recognition device is preferably a voice assistant;
  • the voice assistant is configured to be networked (i.e., the connection step).
  • Configure the text input core module, hereinafter referred to as the core module, to connect it with the voice assistant.
  • configure the application terminal to connect it with the core module.
  • the core module integrates an input module, an output module, and a power management unit, so that it can be expanded into various human-computer interaction devices, such as adding buttons, wheel encoders, and batteries to make a mouse. Adding buttons and lighting effects can make a keyboard and so on.
  • When the voice assistant is woken up, it starts to collect the ambient audio signal, and when the ambient sound stays below a certain value for a period of time, the audio collection ends. After the processor performs simple processing on the signal, it generates an audio file, sends the audio file to a specific port of the cloud server through the network, and calls the function interface of the ASR algorithm in the cloud server; the cloud server recognizes and converts the audio file, and the converted text information is returned to the voice assistant.
  • the voice assistant sends the text information to module A (ie, the input module) in the core module.
  • Module A encodes the text and sends the text data to module B (ie, the output module) through the serial port.
  • module B is connected to a specific USB Bluetooth receiver by default.
  • the USB receiver is used to solve the problem that some computers do not have Bluetooth.
  • Computers, mobile phones, and tablets with Bluetooth can search for the Bluetooth signal of module B in the core module and pair with it.
  • Module B virtualizes itself into a HID (human-computer interaction device) device, such as a keyboard, mouse and other devices, so that the application terminal can be recognized normally.
  • HID human-computer interaction device
  • module B sends the text data to the application terminal through the HID protocol to complete the input.
  • the present application also provides a text input method based on a speech recognition device.
  • the text input method based on a speech recognition device provided by the present application has two implementations:
  • the user asks the voice recognition device a question; the voice recognition device obtains the answer to the question through the cloud server, broadcasts the answer to the user by voice, and displays or outputs the text data of the corresponding answer on the application terminal.
  • the voice recognition device collects environmental audio signals, preferably user audio signals, or physical switch signals (that is, the user presses a corresponding specific button or key); under the preset environmental audio signals or physical switch signals, the voice recognition device enters text input mode.
  • the AI voice server in the cloud server receives the voice content (voice data) collected by the voice recognition device and forwards the voice content to the event processing server; the event processing server extracts the supported instructions and processes them according to its own state machine (for example, reading the corresponding data from the central database, which can be understood as the answer to the above question); the event processing server may also feed the processing result back to the AI voice server in the form of text information, and the AI voice server sends the text information (text) to the voice recognition device through the HTTPS protocol.
  • the central database is used for storing public information and user private information, and provides information reading and storage services to the event processing server.
  • the public information refers to, for example, calculation formulas and user voice input texts (ie, ambient audio signals) and the like.
  • the private information refers to private keywords such as contact information saved by the user.
  • the voice recognition device will prompt the user what operations can be performed. The user can say "1+1 equals how many" to input a calculation formula and get the answer.
  • the voice data is sent to the AI voice server in HTTPS data packets; the AI voice server extracts the voice data from the data packets, converts it into a text information packet, and sends it to the event processing server;
  • the event processing server analyzes the received text information data packet, identifies the "answer" command in it, and calls the corresponding mathematical calculation command processing function to complete the processing: it calculates the addition according to "1+1". The recognized calculation formula and the text information of the obtained calculation result are then fed back to the AI voice server.
  • the AI voice server translates and synthesizes the text information into voice data packets, and uses the HTTPS protocol to send the voice data packets to the voice recognition device.
  • the voice recognition device broadcasts the voice to the user; at the same time, the voice recognition device sends the text information to the text input core module, and the text input core module uses the HID protocol to output it to the text input box of the application terminal.
  • when the text input core module completes the text output, it sends a response command to the speech recognition device, indicating that the current command has been completed; the speech recognition device forwards the completion instruction to the event processing server.
  • the user asks a question to the speech recognition device, or describes a paragraph.
  • the speech recognition device collects the audio data (the content of the question or the described paragraph), sends the collected audio data to the cloud server, and the cloud server returns the result to the speech recognition device after processing is completed;
  • the speech recognition device then transmits it to the text input core module, the text input core module transmits the text data to the application terminal, and the application terminal displays or outputs the content of the question (rather than the answer to the question) or the described paragraph.
  • Embodiment 2 is basically the same as that of Embodiment 1, and the difference between the two lies in steps 3) and 5);
  • step 3) in Embodiment 2 is that the user may speak a sentence or a paragraph; the speech recognition device starts to acquire audio data and judges whether the speech has ended according to the pauses between utterances.
  • step 5): the event processing server processes the received text information data packets and returns the processed text information to the AI voice server; the AI voice server pushes it to the voice recognition device, and the voice recognition device sends the text information to the text input core module; the text input core module uses the HID protocol to output it to the text input box of the computer application terminal.
  • the microphone on the speech recognition device collects the audio signal
  • the controller on the speech recognition device detects the time interval between two audio segments; when the time interval is greater than a set value, it judges that this segment of audio marks the end of a sentence, and sends that segment of the audio file to the cloud server for parsing.
  • step 5), that is, the input step: after the speech recognition device obtains the text information returned by the cloud server, the text information is sent via BLE to module A in the core module, and module A then sends the text data to module B through the serial port.
  • the software in module B virtualizes itself into a HID keyboard, connects with the application terminal through BLE, and finally enters the text data into the text input box of the application terminal to complete the input.
  • if an action is said to be performed according to a certain element, it means at least that the action is performed according to that element, which covers two situations: the action is performed only according to that element, and the action is performed according to that element together with other elements.
  • Expressions such as "a plurality of", "multiple times" and "multiple kinds" include two, two times and two kinds, as well as more than two, more than two times and more than two kinds.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A text input system based on a speech recognition device comprises a speech recognition device, a text input core module and an application terminal. The speech recognition device is communicatively connected to the text input core module, and the text input core module is communicatively connected to the application terminal. The text input core module comprises an input module and an output module, which are communicatively connected. The input module comprises an input control unit and an input unit, which are communicatively connected; the output module comprises an output control unit and an output unit, which are communicatively connected. The input module and the output module are communicatively connected through the input control unit and the output control unit. The text input system and method based on the speech recognition device allow speech-to-text input to be used on an application terminal without installing any software.

Description

Text input system based on speech recognition device and method thereof
Technical Field
The present application relates to the technical field of speech recognition systems, and in particular to a text input system based on a speech recognition device and a method thereof.
Background Art
Related products currently on the market have the following problems:
1. Cumbersome operation: most of the speech-to-text input devices currently on the market, such as smart voice mice, require client software to be installed and a series of computer parameters to be configured before they can be used.
2. Limited platform support: most of the speech-to-text input devices currently on the market can only run on PCs such as Windows and macOS computers, and cannot be used on Android devices or tablets.
Summary of the Invention
The purpose of the present application is to provide a text input system based on a speech recognition device and a method thereof.
The present application discloses a text input system based on a speech recognition device, comprising a speech recognition device, a text input core module and an application terminal;
the speech recognition device is communicatively connected to the text input core module; the text input core module is communicatively connected to the application terminal;
the text input core module comprises an input module and an output module; the input module and the output module are communicatively connected;
the input module comprises an input control unit and an input unit; the input control unit and the input unit are communicatively connected;
the output module comprises an output control unit and an output unit; the output control unit and the output unit are communicatively connected;
the input module and the output module are communicatively connected through the input control unit and the output control unit.
In a preferred example, the text input core module further comprises a power management unit; the power management unit is electrically connected to the input module and the output module respectively;
the input module further comprises an input antenna unit; the input antenna unit is communicatively connected to the input control unit;
the output module further comprises an output antenna unit; the output antenna unit is communicatively connected to the output control unit;
the input control unit comprises an input MCU, an input crystal oscillator circuit and an input reset circuit; the input MCU, the input crystal oscillator circuit and the input reset circuit are electrically connected; and
the input control unit further comprises an input radio frequency transceiver; or
the input MCU integrates an input radio frequency transceiver;
the output control unit comprises an output MCU, an output crystal oscillator circuit and an output reset circuit; the output MCU, the output crystal oscillator circuit and the output reset circuit are electrically connected; and
the output control unit further comprises an output radio frequency transceiver; or
the output MCU integrates an output radio frequency transceiver.
In a preferred example, the input unit comprises any one or any combination of a key input unit, an encoder signal input unit and an analog signal input unit;
the output unit comprises a lighting effect output unit;
the power management unit comprises a lithium battery, a voltage regulator unit and a charging unit; the charging unit comprises any one or any combination of an overcharge protection subunit, an overdischarge protection subunit and a short circuit protection subunit;
the input antenna unit comprises an input power amplifier circuit, an input filter circuit and an input PCB antenna;
the output antenna unit comprises an output power amplifier circuit, an output filter circuit and an output PCB antenna;
the speech recognition device comprises any one or any combination of a microphone, a speaker, a processor, a wireless module and a key;
the speech recognition device supports any one or any combination of the following:
the BLE protocol and/or the WIFI protocol;
voice wake-up;
Chinese-English bilingual operation.
In a preferred example, the text input system based on the speech recognition device further comprises a cloud server; the cloud server is communicatively connected to the speech recognition device;
the speech recognition device is communicatively connected to the text input core module via the BLE protocol and to the cloud server via the WIFI protocol;
the cloud server supports automatic speech recognition algorithms;
the application terminal comprises any one or any combination of a Windows computer, a MacOS computer, a Linux computer, an IOS mobile phone, an IOS tablet, an Android mobile phone and an Android tablet.
In a preferred example, the method comprises:
Connection step: the input end of the speech recognition device is connected to the network; the output end of the speech recognition device is connected to the input unit of the text input core module; the output unit of the text input core module is connected to the input end of the application terminal;
Acquisition step: the speech recognition device collects an ambient audio signal or a physical switch signal, and upon the preset ambient audio signal or physical switch signal, the speech recognition device enters text input mode;
Conversion step: after entering the text input mode, the speech recognition device starts to record the user's speech content and converts the user's speech content into text information;
Input step: the speech recognition device converts the text information into text data and sends the text data to the application terminal via the text input core module.
In a preferred example, in the acquisition step, the speech recognition device in text input mode stops collecting the ambient audio signal when the collection time exceeds a preset time and the ambient audio signal falls below a preset audio signal.
In a preferred example, in the conversion step, the processor of the speech recognition device processes the collected user speech content and generates an audio file; the audio file is uploaded over the Internet to a preset port of the cloud server, and the cloud server recognizes the audio file and converts it into text information; the cloud server then transmits the text information to the speech recognition device.
In a preferred example, the text input core module comprises an output unit and an input unit; in the input step, the speech recognition device sends the text information to the input unit;
the input unit encodes the text information and then sends the processed text information to the output unit.
In a preferred example, the output unit is provided with a Bluetooth receiver.
In a preferred example, the input unit converts the processed text information into text data; if the application terminal is configured with a Bluetooth device, the input unit directly converts the processed text information into text data and sends the text data to the application terminal.
Compared with the prior art, the method provided by the present application for turning a voice assistant into a text input device has the following technical effect:
The text input system and method based on a speech recognition device provided by the present application allow speech-to-text input to be used on application terminals such as computers, mobile phones and tablets without installing any software.
A large number of technical features are described in the specification of the present application and are distributed among the various technical solutions; listing every possible combination of the technical features of the present application (i.e., every technical solution) would make the specification excessively long. To avoid this problem, the technical features disclosed in the above summary of the present application, the technical features disclosed in the following embodiments and examples, and the technical features disclosed in the drawings may be freely combined with one another to form various new technical solutions (all of which shall be regarded as having been recorded in this specification), unless such a combination of technical features is technically infeasible. For example, if features A+B+C are disclosed in one example and features A+B+D+E in another, where features C and D are equivalent technical means serving the same function, of which only one can be used and which cannot be adopted at the same time, while feature E can technically be combined with feature C, then the solution A+B+C+D shall not be regarded as having been recorded because it is technically infeasible, whereas the solution A+B+C+E shall be regarded as having been recorded.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the text input system based on a speech recognition device provided by the present application.
FIG. 2 is a schematic diagram of the hardware structure of the core module provided by the present application.
FIG. 3 is a flowchart of the text input method based on a speech recognition device provided by the present application.
FIG. 4 is a schematic diagram of the text input method based on a speech recognition device provided by the present application.
Detailed Description of the Embodiments
In the following description, many technical details are provided so that readers can better understand the present application. However, those of ordinary skill in the art will understand that the technical solutions claimed in the present application can be implemented even without these technical details and with various changes and modifications based on the following embodiments.
As shown in FIG. 1 and FIG. 2, the present application provides a text input system based on a speech recognition device, comprising a speech recognition device, a text input core module and an application terminal; the speech recognition device is communicatively connected to the text input core module; the text input core module is communicatively connected to the application terminal; the text input core module comprises an input module and an output module; the input module and the output module are communicatively connected; the input module comprises an input control unit and an input unit; the input control unit and the input unit are communicatively connected; the output module comprises an output control unit and an output unit; the output control unit and the output unit are communicatively connected; the input module and the output module are communicatively connected through the input control unit and the output control unit.
The text input core module further comprises a power management unit; the power management unit is electrically connected to the input module and the output module respectively; the input module further comprises an input antenna unit; the input antenna unit is communicatively connected to the input control unit; the output module further comprises an output antenna unit; the output antenna unit is communicatively connected to the output control unit; the input control unit comprises an input microcontroller unit (MCU), an input crystal oscillator circuit and an input reset circuit; the input MCU, the input crystal oscillator circuit and the input reset circuit are electrically connected; in addition, the input control unit further comprises an input radio frequency transceiver, or the input MCU integrates an input radio frequency transceiver; the output control unit comprises an output MCU, an output crystal oscillator circuit and an output reset circuit; the output MCU, the output crystal oscillator circuit and the output reset circuit are electrically connected; in addition, the output control unit further comprises an output radio frequency transceiver, or the output MCU integrates an output radio frequency transceiver.
The input control unit is the control unit A shown in FIG. 2, and the output control unit is the control unit B shown in FIG. 2.
Preferably, the input MCU and the output MCU each integrate a 32-bit microprocessor core; the maximum clock frequency of both the input MCU and the output MCU is 48 MHz, the Flash is larger than 512 KB, the RAM is larger than 32 KB, and hardware interfaces such as UART, ADC, IIC, USB and I/O are provided. The input MCU and the output MCU integrate BLE/802.15.4/2.4 GHz radio frequency transceivers and operate in the 2.4 GHz band. The clock frequencies of the input crystal oscillator circuit and the output crystal oscillator circuit are both 24 MHz.
The input unit comprises any one or any combination of a key input unit, an encoder signal input unit and an analog signal input unit; the output unit comprises a lighting effect output unit; the power management unit comprises a lithium battery, a voltage regulator unit and a charging unit; the charging unit comprises any one or any combination of an overcharge protection subunit, an overdischarge protection subunit and a short circuit protection subunit; the input antenna unit comprises an input power amplifier circuit, an input filter circuit and an input PCB antenna; the output antenna unit comprises an output power amplifier circuit, an output filter circuit and an output PCB antenna; the speech recognition device comprises any one or any combination of a microphone, a speaker, a processor, a wireless module and a key; the speech recognition device supports any one or any combination of the following: the BLE protocol and/or the WIFI protocol; voice wake-up; Chinese-English bilingual operation. The lithium battery is preferably a polymer lithium battery, its rated voltage is preferably 3.7 V, and its capacity is greater than 300 mAh. The input voltage of the voltage regulator unit is 3.3 V. Preferably, the charging unit provides functions such as overcharge protection, overdischarge protection and short circuit protection.
The text input system based on the speech recognition device further comprises a cloud server; the cloud server is communicatively connected to the speech recognition device; the speech recognition device is communicatively connected to the text input core module via the BLE protocol and to the cloud server via the WIFI protocol; the cloud server supports automatic speech recognition (ASR) algorithms; the application terminal comprises any one or any combination of a Windows computer, a MacOS computer, a Linux computer, an IOS mobile phone, an IOS tablet, an Android mobile phone and an Android tablet.
As shown in FIG. 3, the present application further provides a text input method based on a speech recognition device, comprising:
Connection step: the input end of the speech recognition device is connected to the network; the output end of the speech recognition device is connected to the input unit of the text input core module; the output unit of the text input core module is connected to the input end of the application terminal;
Acquisition step: the speech recognition device collects an ambient audio signal or a physical switch signal, and upon the preset ambient audio signal or physical switch signal, the speech recognition device enters text input mode;
Conversion step: after entering the text input mode, the speech recognition device starts to record the user's speech content and converts the user's speech content into text information;
Input step: the speech recognition device converts the text information into text data and sends the text data to the application terminal via the text input core module.
In the acquisition step, the speech recognition device in text input mode stops collecting the ambient audio signal when the collection time exceeds the preset time and the ambient audio signal falls below the preset audio signal.
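This stop condition can be pictured with the minimal sketch below. It only illustrates the rule just described; the frame source, the amplitude measure and the two threshold values (PRESET_TIME, PRESET_LEVEL) are assumptions and are not taken from the application.

```python
import time

# Minimal sketch of the acquisition-step stop condition: keep collecting until
# the collection time exceeds PRESET_TIME and the ambient level has dropped
# below PRESET_LEVEL. Thresholds and the frame/amplitude callables are assumed.
PRESET_TIME = 1.5      # seconds of collection before the check applies (assumed)
PRESET_LEVEL = 500     # ambient amplitude treated as "below the preset audio signal" (assumed)

def collect_until_quiet(read_frame, frame_amplitude):
    frames = []
    start = time.monotonic()
    while True:
        frame = read_frame()                 # e.g. one 20 ms PCM frame from the microphone
        frames.append(frame)
        elapsed = time.monotonic() - start
        if elapsed > PRESET_TIME and frame_amplitude(frame) < PRESET_LEVEL:
            break                            # both conditions met: stop collecting
    return b"".join(frames)
```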
In the conversion step, the processor of the speech recognition device processes the collected user speech content and generates an audio file; the audio file is uploaded over the network to a preset port of the cloud server, and the cloud server recognizes the audio file and converts it into text information; the cloud server then transmits the text information to the speech recognition device.
The text input core module comprises an output unit and an input unit; in the input step, the speech recognition device sends the text information to the input unit; the input unit encodes the text information and then sends the processed text information to the output unit.
The output unit is provided with a Bluetooth receiver.
The input unit converts the processed text information into text data; if the application terminal is configured with a Bluetooth device, the input unit directly converts the processed text information into text data and sends the text data to the application terminal.
The text input system and method based on a speech recognition device provided by the present application are further described below.
Text input system based on a speech recognition device
First, the speech recognition device, preferably a voice assistant, is configured and connected to the network (i.e., the connection step). Next, the text input core module, hereinafter referred to as the core module, is configured and connected to the voice assistant. Finally, the application terminal is configured and connected to the core module. When the user needs to input text, the user only needs to say a wake word, for example "开始输入" ("start input"), or press a specific key on the voice assistant (i.e., the input control unit), to enter text input mode; in this mode, whatever the user says is converted into text and input into the application terminal.
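The mode switch described above can be sketched as a small state holder: a wake word or a dedicated key puts the device into text input mode, after which every recognised utterance is forwarded as text. The class, the key code and the handler names below are hypothetical; only the behaviour they illustrate comes from the paragraph above.

```python
# Hedged sketch of entering text input mode on a wake word or a key press.
class VoiceAssistant:
    WAKE_WORD = "开始输入"          # example wake word ("start input")
    INPUT_KEY = 0x01                # hypothetical code for the dedicated key

    def __init__(self):
        self.text_input_mode = False

    def on_wake_word(self, phrase: str) -> None:
        if phrase == self.WAKE_WORD:
            self.text_input_mode = True

    def on_key_press(self, key_code: int) -> None:
        if key_code == self.INPUT_KEY:
            self.text_input_mode = True

    def on_speech_converted(self, text: str, send_to_core_module) -> None:
        # In text input mode every recognised utterance is forwarded as text.
        if self.text_input_mode:
            send_to_core_module(text)
```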
The core module integrates an input module, an output module and a power management unit, so that it can be expanded into various human-computer interaction devices: for example, adding keys, a wheel encoder and a battery turns it into a mouse, while adding keys and lighting effects turns it into a keyboard, and so on.
When the voice assistant is woken up, it starts to collect the ambient audio signal; when the ambient sound remains below a certain value for a period of time, audio collection ends. After the processor performs simple processing on the signal, it generates an audio file, sends the audio file over the network to a specific port of the cloud server, and calls the function interface of the ASR algorithm on the cloud server; the cloud server recognizes and converts the audio file, and finally returns the converted text information to the voice assistant.
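The round trip just described (record, upload the audio file to a port of the cloud server, receive the recognised text) might look roughly as follows. The endpoint URL, the form field and the response layout are assumptions, since the application does not specify the ASR interface of the cloud server.

```python
import requests

# Illustrative sketch of the ASR round trip: post the recorded audio file to a
# port on the cloud server and read back the recognised text. URL, field names
# and the response field "text" are hypothetical.
ASR_ENDPOINT = "https://cloud.example.com:8443/asr"   # hypothetical preset port

def recognize(audio_path: str) -> str:
    with open(audio_path, "rb") as f:
        resp = requests.post(ASR_ENDPOINT, files={"audio": f}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("text", "")    # assumed response layout

# text = recognize("utterance.wav")  # the returned text is then sent on to module A
```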
The voice assistant then sends the text information to module A (i.e., the input module) in the core module; module A encodes the text and sends the text data to module B (i.e., the output module) through the serial port.
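One possible framing for the serial hand-off from module A to module B is sketched below. In the application module A is an MCU; Python with pyserial is used here purely to illustrate the idea, and the port name, baud rate and frame layout (1-byte type, 2-byte length, UTF-8 payload) are assumptions.

```python
import serial  # pyserial

# Sketch of how module A could encode the text and hand it to module B over the
# serial port. Frame layout and serial settings are illustrative assumptions.
def send_text_to_module_b(text: str, port: str = "/dev/ttyUSB0") -> None:
    payload = text.encode("utf-8")
    frame = bytes([0x01]) + len(payload).to_bytes(2, "big") + payload
    with serial.Serial(port, baudrate=115200, timeout=1) as link:
        link.write(frame)       # module B reads and decodes the frame on the other end
```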
Preferably, module B is paired by default at the factory with a specific USB Bluetooth receiver; the USB receiver solves the problem that some computers do not have Bluetooth. Computers, mobile phones and tablets that do have Bluetooth can search for the Bluetooth signal of module B in the core module and pair with it. Module B virtualizes itself as an HID (human-computer interaction device) device, such as a keyboard or a mouse, so that the application terminal can recognize it normally. Finally, module B sends the text data to the application terminal through the HID protocol to complete the input.
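Module B's role as a virtual HID keyboard can be illustrated with standard 8-byte boot-keyboard input reports (modifier byte, reserved byte, six key slots). The sketch below maps only a handful of ASCII characters; the actual firmware and key map of module B are not described in the application.

```python
# Sketch of turning text into standard HID boot-keyboard input reports, as a
# virtual HID keyboard would send them. Only a small ASCII subset is mapped here.
def ascii_to_usage(ch: str) -> tuple[int, int]:
    """Return (modifier, usage_id) for a small subset of ASCII."""
    if "a" <= ch <= "z":
        return 0x00, 0x04 + ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return 0x02, 0x04 + ord(ch) - ord("A")      # 0x02 = left-shift modifier
    if "1" <= ch <= "9":
        return 0x00, 0x1E + ord(ch) - ord("1")
    if ch == " ":
        return 0x00, 0x2C
    raise ValueError(f"unmapped character: {ch!r}")

def text_to_reports(text: str) -> list[bytes]:
    reports = []
    for ch in text:
        mod, usage = ascii_to_usage(ch)
        reports.append(bytes([mod, 0x00, usage, 0, 0, 0, 0, 0]))  # key down
        reports.append(bytes(8))                                  # key release
    return reports
```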
Text input method based on a speech recognition device
The present application further provides a text input method based on a speech recognition device; preferably, the text input method provided by the present application has two embodiments:
Embodiment 1
The user asks the speech recognition device a question; after obtaining the answer to the question through the cloud server, the speech recognition device broadcasts the answer to the user by voice and, at the same time, displays or outputs the text data of the corresponding answer on the application terminal.
Specifically, the speech recognition device collects an ambient audio signal, preferably a user audio signal, or a physical switch signal (i.e., the user presses a corresponding specific button or key); upon the preset ambient audio signal or physical switch signal, the speech recognition device enters text input mode. After entering text input mode, the AI voice server in the cloud server (the cloud server comprises an AI voice server, an event processing server and a central database) receives the speech content (speech data) collected by the speech recognition device and forwards it to the event processing server; the event processing server extracts the instructions it supports and processes them according to its own state machine (for example, by reading the corresponding data from the central database, which can be understood as the answer to the above question); the event processing server may also feed the processing result back to the AI voice server in the form of text information, and the AI voice server sends the text information (text) to the speech recognition device through the HTTPS protocol.
The central database is used to store public information and user private information, and provides information reading and storage services to the event processing server. The public information refers to, for example, calculation formulas and user speech input texts (i.e., ambient audio signals); the private information refers to private keywords such as contact information saved by the user.
More specifically:
1) First, the user starts the speech recognition device and says "匹配我的配件" ("match my accessory"); the text input core module and the speech recognition device are then automatically matched and associated.
2) The user issues an instruction to the speech recognition device to wake up the input function of the text input core module (i.e., the conversion step).
3) The speech recognition device prompts the user as to which operations can be performed; the user can, for example, say "1+1等于几" ("what does 1+1 equal") to input a calculation formula and obtain the answer.
4) The speech data is sent to the AI voice server in HTTPS data packets; the AI voice server extracts the speech data from the data packets, converts it into a text information packet, and sends the packet to the event processing server.
5) The event processing server analyzes the received text information packet, identifies the "answer" command in it, and calls the corresponding mathematical calculation command processing function to complete the processing: it calculates the addition according to "1+1" (a sketch of this command handling is given after this walkthrough). The recognized calculation formula and the text information of the obtained calculation result are then fed back to the AI voice server; the AI voice server translates and synthesizes the text information into a speech data packet and sends the speech data packet to the speech recognition device using the HTTPS protocol. The speech recognition device broadcasts the speech to the user; at the same time, the speech recognition device sends the text information to the text input core module, and the text input core module outputs it to the text input box of the application terminal through the HID protocol.
6) When the text input core module completes the text output, it sends a response instruction to the speech recognition device, indicating that the current command has been completed; the speech recognition device forwards the completion instruction to the event processing server.
7) When the event processing server confirms that the current instruction has been completed, it prompts the user that the next sentence of text can be input or that other operations can be performed.
The numbers above indicate the order of the steps but do not limit the scope of protection of the present application; they merely represent one embodiment of the present application.
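As a rough illustration of the command handling in step 5) above, the sketch below spots a simple addition such as "1+1" in the incoming text packet, computes it, and returns the formula together with the result; the packet format and the single supported command are assumptions made for the example.

```python
import re

# Hedged sketch of an event-processing handler for a "calculate" command.
ADDITION = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*$")

def handle_text_packet(text: str) -> str:
    match = ADDITION.match(text)
    if match is None:
        return f"unsupported command: {text}"
    a, b = int(match.group(1)), int(match.group(2))
    return f"{a}+{b}={a + b}"      # e.g. "1+1=2", fed back to the AI voice server

# handle_text_packet("1+1")  ->  "1+1=2"
```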
Embodiment 2
The user asks the speech recognition device a question or describes a passage; the speech recognition device collects the audio data (the content of the question or the described passage) and sends it to the cloud server; after the cloud server completes processing, the result is returned to the speech recognition device, which then transmits it to the text input core module; the text input core module transmits the text data to the application terminal, and the application terminal displays or outputs the content of the question (rather than the answer to the question) or the described passage.
Specifically, Embodiment 2 is basically the same as Embodiment 1; the difference between the two lies in steps 3) and 5).
In Embodiment 2, step 3) is that the user may speak a sentence or a passage; the speech recognition device starts to acquire audio data and judges whether the speech has ended according to the pauses between utterances.
In step 5), the event processing server processes the received text information packets according to the configured business logic and then returns the processed text information to the AI voice server; the AI voice server pushes it to the speech recognition device, the speech recognition device sends the text information to the text input core module, and the text input core module outputs it to the text input box of the computer application terminal through the HID protocol.
More specifically, in step 3), i.e., the acquisition step, the microphone on the speech recognition device collects the audio signal, and the controller on the speech recognition device detects the time interval between two audio segments; when the time interval is greater than a set value, that segment of audio is judged to mark the end of a sentence, and the audio file of that segment is sent to the cloud server for parsing.
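The sentence-boundary rule in step 3) (a gap between two audio segments longer than a set value ends the sentence) can be sketched as follows; the 0.8-second threshold and the callback are illustrative assumptions.

```python
# Sketch of gap-based end-of-sentence detection: when the silence between two
# audio segments exceeds GAP_THRESHOLD, the buffered sentence is handed off
# (e.g. uploaded to the cloud server for parsing). Timestamps are in seconds.
GAP_THRESHOLD = 0.8  # assumed value

class SentenceSegmenter:
    def __init__(self, on_sentence):
        self.on_sentence = on_sentence      # callback, e.g. upload to the cloud server
        self.buffer = []
        self.last_end = None

    def feed(self, segment: bytes, start: float, end: float) -> None:
        if self.last_end is not None and start - self.last_end > GAP_THRESHOLD:
            self.on_sentence(b"".join(self.buffer))   # previous sentence ended
            self.buffer = []
        self.buffer.append(segment)
        self.last_end = end
```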
In step 5), i.e., the input step, after the speech recognition device obtains the text information returned by the cloud server, it sends the text information via BLE to module A in the core module; module A then sends the text data to module B through the serial port. The software in module B virtualizes the module as an HID keyboard, connects to the application terminal via BLE, and finally enters the text data into the text input box of the application terminal to complete the input.
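For the BLE leg of step 5), the sketch below shows a central (standing in for the speech recognition device) writing the recognised text to a GATT characteristic on module A, using the cross-platform bleak library for illustration. The device address and characteristic UUID are placeholders; the application does not specify module A's GATT layout.

```python
import asyncio
from bleak import BleakClient   # cross-platform BLE client library, used here for illustration

MODULE_A_ADDRESS = "AA:BB:CC:DD:EE:FF"                         # placeholder address
TEXT_CHAR_UUID = "0000ffe1-0000-1000-8000-00805f9b34fb"        # placeholder characteristic

async def push_text(text: str) -> None:
    # Connect to module A and write the UTF-8 encoded text to its characteristic.
    async with BleakClient(MODULE_A_ADDRESS) as client:
        await client.write_gatt_char(TEXT_CHAR_UUID, text.encode("utf-8"))

# asyncio.run(push_text("你好，世界"))  # module A then relays the text over the serial port
```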
It should be noted that, in the application documents of this patent, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article or device that includes that element. In the application documents of this patent, if an action is said to be performed according to a certain element, it means that the action is performed at least according to that element, which covers two cases: performing the action only according to that element, and performing the action according to that element together with other elements. Expressions such as "a plurality of", "multiple times" and "multiple kinds" include two, two times and two kinds, as well as more than two, more than two times and more than two kinds.
This specification includes combinations of the various embodiments described herein. Separate references to "one embodiment" or to a particular embodiment do not necessarily refer to the same embodiment; however, these embodiments are not mutually exclusive unless indicated as such or clearly apparent to those skilled in the art. It should be noted that, unless the context clearly indicates or requires otherwise, the word "or" is used in this specification in a non-exclusive sense.
All documents mentioned in the present application are considered to be included in the disclosure of the present application in their entirety, so that they can serve as a basis for amendment if necessary. In addition, it should be understood that, after reading the above disclosure of the present application, those skilled in the art may make various changes or modifications to the present application, and such equivalent forms likewise fall within the scope claimed by the present application.

Claims (10)

  1. A text input system based on a speech recognition device, characterized by comprising a speech recognition device, a text input core module and an application terminal;
    the speech recognition device is communicatively connected to the text input core module; the text input core module is communicatively connected to the application terminal;
    the text input core module comprises an input module and an output module; the input module and the output module are communicatively connected;
    the input module comprises an input control unit and an input unit; the input control unit and the input unit are communicatively connected;
    the output module comprises an output control unit and an output unit; the output control unit and the output unit are communicatively connected;
    the input module and the output module are communicatively connected through the input control unit and the output control unit.
  2. The text input system based on a speech recognition device according to claim 1, characterized in that the text input core module further comprises a power management unit; the power management unit is electrically connected to the input module and the output module respectively;
    the input module further comprises an input antenna unit; the input antenna unit is communicatively connected to the input control unit;
    the output module further comprises an output antenna unit; the output antenna unit is communicatively connected to the output control unit;
    the input control unit comprises an input MCU, an input crystal oscillator circuit and an input reset circuit; the input MCU, the input crystal oscillator circuit and the input reset circuit are electrically connected; and
    the input control unit further comprises an input radio frequency transceiver; or
    the input MCU integrates an input radio frequency transceiver;
    the output control unit comprises an output MCU, an output crystal oscillator circuit and an output reset circuit; the output MCU, the output crystal oscillator circuit and the output reset circuit are electrically connected; and
    the output control unit further comprises an output radio frequency transceiver; or
    the output MCU integrates an output radio frequency transceiver.
  3. The text input system based on a speech recognition device according to claim 2, characterized in that the input unit comprises any one or any combination of a key input unit, an encoder signal input unit and an analog signal input unit;
    the output unit comprises a lighting effect output unit;
    the power management unit comprises a lithium battery, a voltage regulator unit and a charging unit; the charging unit comprises any one or any combination of an overcharge protection subunit, an overdischarge protection subunit and a short circuit protection subunit;
    the input antenna unit comprises an input power amplifier circuit, an input filter circuit and an input PCB antenna;
    the output antenna unit comprises an output power amplifier circuit, an output filter circuit and an output PCB antenna;
    the speech recognition device comprises any one or any combination of a microphone, a speaker, a processor, a wireless module and a key;
    the speech recognition device supports any one or any combination of the following:
    the BLE protocol and/or the WIFI protocol;
    voice wake-up;
    Chinese-English bilingual operation.
  4. The text input system based on a speech recognition device according to any one of claims 1 to 3, characterized in that the text input system based on the speech recognition device further comprises a cloud server; the cloud server is communicatively connected to the speech recognition device;
    the speech recognition device is communicatively connected to the text input core module via the BLE protocol and to the cloud server via the WIFI protocol;
    the cloud server supports automatic speech recognition algorithms;
    the application terminal comprises any one or any combination of a Windows computer, a MacOS computer, a Linux computer, an IOS mobile phone, an IOS tablet, an Android mobile phone and an Android tablet.
  5. A text input method based on a speech recognition device, based on the text input system based on a speech recognition device according to any one of claims 1 to 4, characterized by comprising:
    a connection step: the input end of the speech recognition device is connected to the network; the output end of the speech recognition device is connected to the input unit of the text input core module; the output unit of the text input core module is connected to the input end of the application terminal;
    an acquisition step: the speech recognition device collects an ambient audio signal or a physical switch signal, and upon the preset ambient audio signal or physical switch signal, the speech recognition device enters text input mode;
    a conversion step: after entering the text input mode, the speech recognition device starts to record the user's speech content and converts the user's speech content into text information;
    an input step: the speech recognition device converts the text information into text data and sends the text data to the application terminal via the text input core module.
  6. The text input method based on a speech recognition device according to claim 1, characterized in that, in the acquisition step, the speech recognition device in text input mode stops collecting the ambient audio signal when the collection time exceeds a preset time and the ambient audio signal falls below a preset audio signal.
  7. The text input method based on a speech recognition device according to claim 5, characterized in that, in the conversion step, the processor of the speech recognition device processes the collected user speech content and generates an audio file; the audio file is uploaded over the Internet to a preset port of the cloud server, and the cloud server recognizes the audio file and converts it into text information; the cloud server transmits the text information to the speech recognition device.
  8. The text input method based on a speech recognition device according to claim 1, characterized in that the text input core module comprises an output unit and an input unit; in the input step, the speech recognition device sends the text information to the input unit;
    the input unit encodes the text information and then sends the processed text information to the output unit.
  9. The text input method based on a speech recognition device according to claim 8, characterized in that the output unit is provided with a Bluetooth receiver.
  10. The text input method based on a speech recognition device according to claim 8 or 9, characterized in that the input unit converts the processed text information into text data; if the application terminal is configured with a Bluetooth device, the input unit directly converts the processed text information into text data and sends the text data to the application terminal.
PCT/CN2021/083946 2021-03-30 2021-03-30 基于语音识别设备的文本输入系统及其方法 WO2022204937A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083946 WO2022204937A1 (zh) 2021-03-30 2021-03-30 基于语音识别设备的文本输入系统及其方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083946 WO2022204937A1 (zh) 2021-03-30 2021-03-30 基于语音识别设备的文本输入系统及其方法

Publications (1)

Publication Number Publication Date
WO2022204937A1 true WO2022204937A1 (zh) 2022-10-06

Family

ID=83455381

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083946 WO2022204937A1 (zh) 2021-03-30 2021-03-30 基于语音识别设备的文本输入系统及其方法

Country Status (1)

Country Link
WO (1) WO2022204937A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789840A (zh) * 2010-01-19 2010-07-28 华为技术有限公司 一种全t交叉装置和方法
CN106409296A (zh) * 2016-09-14 2017-02-15 安徽声讯信息技术有限公司 基于分核处理技术的语音快速转写校正系统
CN110047484A (zh) * 2019-04-28 2019-07-23 合肥马道信息科技有限公司 一种语音识别交互方法、系统、设备和存储介质
CN111009244A (zh) * 2019-12-06 2020-04-14 贵州电网有限责任公司 语音识别方法及系统
CN111597308A (zh) * 2020-05-19 2020-08-28 中国电子科技集团公司第二十八研究所 一种基于知识图谱的语音问答系统及其应用方法
US20200357393A1 (en) * 2019-05-07 2020-11-12 Getac Technology Corporation Voice control method and voice control system for in-vehicle device


Similar Documents

Publication Publication Date Title
CN106920548B (zh) 语音控制装置、语音控制系统和语音控制方法
CN107277754B (zh) 一种蓝牙连接的方法及蓝牙外围设备
US10893365B2 (en) Method for processing voice in electronic device and electronic device
CN103021411A (zh) 语音控制装置和语音控制方法
CN108877805A (zh) 语音处理模组和具有语音功能的终端
CN110070863A (zh) 一种语音控制方法及装置
CN109672775B (zh) 调节唤醒灵敏度的方法、装置及终端
CN109101517A (zh) 信息处理方法、信息处理设备以及介质
CN103152480A (zh) 利用移动终端进行到站提示的方法和装置
CN111429897A (zh) 智能家居系统控制实现方式
TW201732497A (zh) 麥克風裝置
CN111862965A (zh) 唤醒处理方法、装置、智能音箱及电子设备
CN206259172U (zh) 多功能语音控制系统
CN112363851A (zh) 智能终端的语音唤醒方法、系统、智能手表及存储介质
CN108877799A (zh) 一种语音控制装置及方法
WO2022204937A1 (zh) 基于语音识别设备的文本输入系统及其方法
CN210327999U (zh) 基于双蓝牙芯片控制的语音传输设备
CN206164799U (zh) 可见光通信的语音播放系统、耳机及音乐播放器
CN210112219U (zh) 基于usb控制的多路蓝牙麦克风系统
CN112259076A (zh) 语音交互方法、装置、电子设备及计算机可读存储介质
CN213400541U (zh) 基于无线通信语音交互系统的智能眼镜
CN110168511B (zh) 一种电子设备和降低功耗的方法及装置
CN109446297A (zh) 信息处理方法、信息处理设备以及设备可读介质
CN114999496A (zh) 音频传输方法、控制设备及终端设备
CN207021746U (zh) 移动电源

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21933621

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21933621

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.02.2024)