CN210325192U

CN210325192U - Off-line voice terminal

Info

Publication number: CN210325192U
Application number: CN201920757746.5U
Authority: CN
Inventors: 陶永耀
Original assignee: Actions (zhuhai) Technology Co Ltd
Current assignee: Actions Technology Co Ltd
Priority date: 2019-05-23
Filing date: 2019-05-23
Publication date: 2020-04-14
Anticipated expiration: 2029-05-23

Abstract

An embodiment of the utility model provides an off-line voice terminal comprising a microphone acquisition device, a voice recognition device, a control device and a first communication device. The microphone acquisition device is connected with the first communication device, which connects wirelessly or by wire to an external smart device and transmits the acquired voice data to it. The control device is connected with the voice recognition device, and the voice recognition device is connected with the microphone acquisition device. The first communication device also receives voice recognition model parameters sent by the external smart device for use by the voice recognition device. In this way an off-line voice product gains a voice training function.

Description

Off-line voice terminal

Technical Field

The utility model relates to multimedia technology, and in particular to an off-line voice terminal and a system for training it by its end user.

Background

With the advent of voice-based human-computer interfaces, more and more products are expected to support intelligent voice interaction. Many online voice products already exist, but they suffer from response latency, confidentiality (privacy) concerns and high system cost. Off-line voice products are also on the market; to reach adequate recognition coverage, their designers must collect and train on a large voice corpus. Even so, such a corpus cannot cover every speaker, and it does not solve the problem of training for dialect speakers, so off-line products have difficulty recognizing many voices.

SUMMARY OF THE UTILITY MODEL

In view of the above problems, an embodiment of the utility model provides an off-line voice terminal that gives an off-line voice product a voice training function.

The embodiment of the utility model is implemented as follows. An off-line voice terminal comprises a microphone acquisition device, a voice recognition device, a control device and a first communication device. The microphone acquisition device is connected with the first communication device; the first communication device connects wirelessly or by wire to an external smart device and transmits the voice data acquired by the microphone acquisition device to it. The control device is connected with the voice recognition device, and the voice recognition device is connected with the microphone acquisition device. The first communication device also receives voice recognition model parameters sent by the external smart device for use by the voice recognition device.

Further, the off-line voice terminal also comprises an encoding device and/or a storage device. The encoding device is connected between the microphone acquisition device and the first communication device, encodes the acquired voice data, and transmits the encoded voice data to the external smart device through the first communication device. The storage device is connected with the first communication device and the voice recognition device and stores the voice recognition model parameters.

Further, the external smart device comprises a voice training device, which trains on voice data and generates voice recognition model parameters.

Further, the external smart device comprises a network device connected to an external cloud server; the cloud server comprises a voice training device, which trains on voice data and generates voice recognition model parameters.

Further, the first communication device comprises a Wi-Fi device or a Bluetooth device.

According to another aspect, an embodiment of the utility model further provides a system for off-line voice terminal user training, which likewise gives an off-line voice product a voice training function. The system comprises an off-line voice terminal and a smart device. The off-line voice terminal comprises a microphone acquisition device, a voice recognition device, a control device and a first communication device; the smart device comprises a second communication device and a voice training device, which are connected with each other. The microphone acquisition device is connected with the first communication device, which connects wirelessly or by wire to the second communication device and transmits the voice data acquired by the microphone acquisition device to it. The control device is connected with the voice recognition device, and the voice recognition device is connected with the microphone acquisition device. The voice training device trains on the voice data and generates voice recognition model parameters, and the first communication device receives the parameters trained by the smart device and supplies them to the voice recognition device.

Further, the off-line voice terminal also comprises an encoding device and/or a storage device. The encoding device is connected between the microphone acquisition device and the first communication device, encodes the acquired voice data, and transmits the encoded voice data to the smart device through the first communication device. The storage device is connected with the voice recognition device and the first communication device and stores the voice recognition model parameters.

According to a further aspect, an embodiment of the utility model also provides a system for off-line voice terminal user training that comprises an off-line voice terminal, a smart device and a cloud server. The off-line voice terminal comprises a microphone acquisition device, a voice recognition device, a control device and a first communication device; the smart device comprises a second communication device and a network device; the cloud server comprises a voice training device. The microphone acquisition device is connected with the first communication device, which connects wirelessly or by wire to the second communication device and transmits the voice data acquired by the microphone acquisition device to it. The network device sends the voice data to the cloud server over a network; the voice training device trains on the voice data and generates voice recognition model parameters; the network device then receives these parameters over the network, and the second communication device sends them to the first communication device for use by the voice recognition device. The control device is connected with the voice recognition device, and the voice recognition device is connected with the microphone acquisition device.

Further, the first communication device and the second communication device are Wi-Fi devices or Bluetooth devices.

The technical solution above has the following beneficial effects. It meets the requirement of off-line use while allowing training targeted at the individual user, which solves the low recognition rate that some users experience with a single, uniform voice training library. The processing and transmission capabilities of a smart device such as a mobile phone, and/or the training capability of a cloud server, are used to upgrade the off-line voice recognition of the terminal, enabling a workflow of online training and upgrading followed by off-line use. This adapts better to the user's scenario and environment, and it also relieves the heavy workload of factory training and the difficulty of training for dialects.

Drawings

Fig. 1 is a block diagram of a circuit configuration according to an embodiment of the present invention;

Fig. 2 is a block diagram of a circuit configuration according to another embodiment of the present invention;

Fig. 3 is a block diagram of a circuit configuration according to a further embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative work fall within the protection scope of the present invention. The embodiments of the present application, and their technical features, may be combined with each other provided they do not conflict.

Referring to Fig. 1, an embodiment of the present invention provides an off-line voice terminal 1 comprising a microphone acquisition device 101, a voice recognition device 102, a control device 103 and a first communication device 104. The microphone acquisition device 101 is connected with the first communication device 104, which connects wirelessly or by wire to the external smart device 2 and transmits the voice data acquired by the microphone acquisition device 101 to it. The control device 103 is connected with the voice recognition device 102 and controls the off-line voice terminal 1 according to the voice commands recognized by the voice recognition device 102. The voice recognition device 102 is connected with the microphone acquisition device 101. The first communication device 104 also receives the voice recognition model parameters sent by the external smart device for use by the voice recognition device 102. In this embodiment the first communication device may connect to the external smart device by wire, for example over USB, or wirelessly; for a wireless connection it may be a Wi-Fi device, a Bluetooth Low Energy (BLE) device, or another short-range wireless device.
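
To make the two data paths of Fig. 1 concrete, the following Python sketch routes each microphone frame either to the local recognizer (off-line use) or to the first communication device (training). The explicit mode switch, the class and all callable names are hypothetical illustrations; the description does not define a software interface.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional


class Mode(Enum):
    RECOGNITION = auto()  # off-line use: audio -> voice recognition device -> control device
    TRAINING = auto()     # training: audio -> first communication device -> external smart device


@dataclass
class OfflineVoiceTerminal:
    """Hypothetical model of the Fig. 1 data paths; all names are illustrative."""
    recognize: Callable[[bytes], Optional[str]]    # voice recognition device 102
    control: Callable[[str], None]                 # control device 103
    send_to_smart_device: Callable[[bytes], None]  # first communication device 104
    mode: Mode = Mode.RECOGNITION

    def on_audio_frame(self, frame: bytes) -> None:
        """Called by microphone acquisition device 101 for each captured frame."""
        if self.mode is Mode.TRAINING:
            # Training mode: forward the corpus to the external smart device 2.
            self.send_to_smart_device(frame)
        else:
            # Off-line mode: recognize locally and act on any recognized command.
            command = self.recognize(frame)
            if command is not None:
                self.control(command)


# Usage with stand-in callables (a real recognizer would run a trained model).
actions: List[str] = []
uplink: List[bytes] = []
terminal = OfflineVoiceTerminal(
    recognize=lambda frame: "light_on" if frame == b"on" else None,
    control=actions.append,
    send_to_smart_device=uplink.append,
)
terminal.on_audio_frame(b"on")      # recognized locally, so actions == ["light_on"]
terminal.mode = Mode.TRAINING
terminal.on_audio_frame(b"sample")  # forwarded, so uplink == [b"sample"]
```
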
The statement that the first communication device 104 receives the parameters "for use by" the voice recognition device 102 means that the received parameters are not necessarily passed straight to the voice recognition device. They are generally kept in a storage device and loaded by the voice recognition device when needed. The storage device may also hold several sets of voice recognition model parameters so that the voice commands of different people can be recognized.
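
A minimal sketch of such a storage device, assuming parameters arrive as opaque byte strings keyed by a speaker name (both assumptions, not taken from the description), could look like this:

```python
from typing import Dict, List, Optional


class ModelParameterStore:
    """Hypothetical storage device 106 holding one parameter set per speaker."""

    def __init__(self) -> None:
        self._sets: Dict[str, bytes] = {}
        self._active: Optional[str] = None

    def save(self, speaker: str, params: bytes) -> None:
        """Store (or replace) parameters received via the first communication device."""
        self._sets[speaker] = params
        if self._active is None:
            self._active = speaker  # the first set received becomes the default

    def select(self, speaker: str) -> None:
        """Choose which stored set the voice recognition device should use."""
        if speaker not in self._sets:
            raise KeyError(f"no parameters stored for {speaker!r}")
        self._active = speaker

    def active_params(self) -> Optional[bytes]:
        """Parameters the voice recognition device loads when it starts recognizing."""
        return None if self._active is None else self._sets[self._active]

    def speakers(self) -> List[str]:
        return sorted(self._sets)


store = ModelParameterStore()
store.save("alice", b"\x01\x02")
store.save("bob", b"\x03\x04")
store.select("bob")
print(store.speakers(), store.active_params())  # prints: ['alice', 'bob'] b'\x03\x04'
```

Keeping the sets separate, rather than merging them, lets the terminal switch speakers without retraining; how the active speaker is chosen (a button, a wake word, an app setting) is left open here.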

The present invention further provides a preferred embodiment. On the basis of the above embodiment, as shown in Fig. 1, the off-line voice terminal further comprises an encoding device 105 and/or a storage device 106. The encoding device 105 is connected between the microphone acquisition device 101 and the first communication device 104, encodes the acquired voice data, and transmits the encoded voice data to the external smart device 2 through the first communication device 104. The storage device 106 is connected to the first communication device 104 and the voice recognition device 102 and stores the voice recognition model parameters. The encoding device 105 encodes the voice command corpus collected by the microphone acquisition device 101 into an audio format suited to BLE transmission, for example Opus, and the encoded data is then sent to the external smart device over BLE. Encoding makes the corpus data smaller, which saves bandwidth and speeds up transmission.
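
The bandwidth saving can be estimated with simple arithmetic. The sketch below assumes 16 kHz, 16-bit mono capture and an Opus stream at 16 kbit/s (illustrative figures not given in the description), and splits data into BLE notifications whose payload is the ATT MTU minus the 3-byte ATT header (20 bytes at the default MTU of 23). Real Opus output is a sequence of variable-size packets that would also need framing; only the size calculation is shown.

```python
import math
from typing import List

ATT_HEADER_BYTES = 3  # ATT notification header: 1-byte opcode + 2-byte attribute handle


def pcm_bytes(seconds: float, sample_rate_hz: int = 16000,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio."""
    return int(seconds * sample_rate_hz * bits_per_sample * channels / 8)


def encoded_bytes(seconds: float, bitrate_bps: int) -> int:
    """Size in bytes of audio compressed at a constant bitrate."""
    return int(seconds * bitrate_bps / 8)


def packetize(data: bytes, att_mtu: int = 23) -> List[bytes]:
    """Split data into chunks that each fit in one ATT notification."""
    size = att_mtu - ATT_HEADER_BYTES
    return [data[i:i + size] for i in range(0, len(data), size)]


def packets_needed(num_bytes: int, att_mtu: int = 23) -> int:
    """Number of notifications needed for num_bytes at the given ATT MTU."""
    return math.ceil(num_bytes / (att_mtu - ATT_HEADER_BYTES))


# A 2 s command word: raw PCM versus an assumed 16 kbit/s Opus stream, default MTU.
raw = pcm_bytes(2.0)
opus = encoded_bytes(2.0, 16000)
print(raw, opus, packets_needed(raw), packets_needed(opus))  # prints: 64000 4000 3200 200
```

Under these assumptions encoding cuts the number of BLE notifications per command word by a factor of 16; negotiating a larger ATT MTU reduces the count further.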

The utility model also provides a preferred embodiment in which the external smart device 2 comprises a voice training device that trains on voice data and generates voice recognition model parameters. Specifically, when an off-line voice terminal with a voice function enters voice training mode, it proceeds as follows. First, it establishes a wireless or wired connection with an external smart device. The smart device is generally one with strong computing and processing capabilities and a network function, such as a smartphone, a tablet computer or a smart set-top box; the following embodiments take a mobile phone as an example. For instance, a Bluetooth data path between the off-line voice terminal and the mobile phone is established over BLE. Second, the off-line terminal collects corpus with the microphone acquisition device 101 and transmits it to the mobile phone over the Bluetooth connection. Third, the mobile phone enters a training stage: it calls a local training algorithm library, trains locally on the command-word corpus just collected, generates voice recognition model parameters for that user, and sends the generated parameters back to the off-line voice terminal.
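
The second and third steps can be illustrated with a deliberately simple stand-in for the training algorithm library: a nearest-centroid classifier over per-utterance feature vectors. This is an assumption for illustration only; the description does not name the training algorithm, and feature extraction from audio (for example MFCCs) is omitted, so the feature values below are made up.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(corpus: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Smart-device side (step 3): average the feature vectors of each command word.

    The returned centroids play the role of the user-specific
    voice recognition model parameters sent back to the terminal.
    """
    params = {}
    for word, samples in corpus.items():
        dim = len(samples[0])
        params[word] = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    return params


def recognize(features: Vector, params: Dict[str, List[float]]) -> str:
    """Terminal side: pick the command word whose centroid is nearest (Euclidean)."""
    return min(params, key=lambda word: math.dist(features, params[word]))


# Step 2 (hypothetical data): feature vectors collected for two command words.
corpus = {
    "light_on": [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2]],
    "light_off": [[0.0, 1.0], [0.1, 0.9]],
}
params = train_centroids(corpus)          # step 3, run on the phone
print(recognize([0.95, 0.05], params))    # prints: light_on
```

The split mirrors the embodiment: the compute-heavy training runs on the phone, while the terminal only needs the small parameter set and a cheap matching step.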

The utility model also provides a preferred embodiment in which the external smart device 2 comprises a network device that is further connected to an external cloud server; the cloud server comprises a voice training device that trains on voice data and generates voice recognition model parameters. The cloud server enters a training state, calls the richer training algorithm library available in the cloud, trains on the collected command-word corpus, and generates voice recognition model parameters for the user in the cloud. The generated parameters are sent back to the mobile phone, which forwards them to the off-line voice terminal over BLE.
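
In this embodiment the phone acts as a relay rather than a trainer. The sketch below captures that role with hypothetical callables for the cloud upload and the BLE downlink; the function names, signatures and the blocking call are assumptions, not an interface from the description.

```python
from typing import Callable, List

# Hypothetical interfaces: the description does not specify APIs or protocols.
UploadFn = Callable[[List[bytes]], bytes]  # network device -> cloud server; returns trained parameters
SendFn = Callable[[bytes], None]           # phone -> off-line voice terminal (e.g. over BLE)


def relay_training(corpus: List[bytes], upload_to_cloud: UploadFn,
                   send_to_terminal: SendFn) -> bytes:
    """Smart-device side of the cloud embodiment.

    The phone does no training itself: it forwards the collected corpus to the cloud,
    waits for the generated parameters, and passes them on to the terminal.
    """
    if not corpus:
        raise ValueError("no corpus collected; nothing to train on")
    params = upload_to_cloud(corpus)  # blocking call, for simplicity
    send_to_terminal(params)
    return params


def fake_cloud(utterances: List[bytes]) -> bytes:
    """Stand-in for the cloud voice training device."""
    return b"params-for-%d-utterances" % len(utterances)


delivered: List[bytes] = []
relay_training([b"u1", b"u2", b"u3"], fake_cloud, delivered.append)
# delivered == [b"params-for-3-utterances"]
```

A production app would make the upload asynchronous and handle network failures, but the data flow (terminal to phone to cloud and back) is the same.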

The utility model also provides a preferred embodiment in which the first communication device comprises a Wi-Fi device or a Bluetooth device. Specifically, the Bluetooth device includes, but is not limited to, a classic Bluetooth device or a BLE device; when the first communication device is a Bluetooth device, the second communication device is also a Bluetooth device, so that a Bluetooth wireless connection can be established between the off-line terminal and the smart device.

According to another aspect of the embodiments of the present invention, as shown in Fig. 2, a system for off-line voice terminal user training is further provided, comprising an off-line voice terminal 1 and a smart device 2. The off-line voice terminal 1 comprises a microphone acquisition device 101, a voice recognition device 102, a control device 103 and a first communication device 104; the smart device 2 comprises a second communication device 201 and a voice training device 202, which are connected with each other. The microphone acquisition device 101 is connected with the first communication device 104, which connects wirelessly or by wire to the second communication device 201 and transmits the voice data acquired by the microphone acquisition device 101 to it. The control device 103 is connected with the voice recognition device 102, and the voice recognition device 102 is connected with the microphone acquisition device 101. The voice training device 202 trains on the voice data and generates voice recognition model parameters, and the first communication device 104 receives the parameters trained by the smart device 2 for use by the voice recognition device 102.
Specifically, when the off-line voice terminal with a voice function enters voice training mode, it first establishes a wireless or wired connection with the external smart device, for example a Bluetooth data path between the off-line voice terminal 1 and the smart device 2 over BLE. Second, the off-line voice terminal 1 collects corpus with the microphone acquisition device 101 and transmits it to the smart device 2 over the Bluetooth connection; in practice, the corpus may equally be collected first and the data path established afterwards. Third, the smart device 2 enters a training stage: it calls a local training algorithm library, trains locally on the command-word corpus just collected, generates voice recognition model parameters for the user, and sends them back to the off-line voice terminal 1.
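
Because the corpus may be collected before the data path exists, the terminal needs somewhere to hold recordings until the link comes up. One way to do this, sketched under the assumption of a bounded in-memory queue (a design choice made here, not in the description), is:

```python
from collections import deque
from typing import Callable, Deque, Optional


class CorpusUplink:
    """Hypothetical buffer: queue utterances until a link to the smart device exists."""

    def __init__(self, max_items: int = 32) -> None:
        # Bounded so a low-cost terminal cannot run out of RAM; the oldest item is dropped when full.
        self._queue: Deque[bytes] = deque(maxlen=max_items)
        self._send: Optional[Callable[[bytes], None]] = None

    def submit(self, utterance: bytes) -> None:
        if self._send is None:
            self._queue.append(utterance)  # no link yet: keep it for later
        else:
            self._send(utterance)          # link up: stream immediately

    def on_connected(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        while self._queue:
            send(self._queue.popleft())    # flush in recording order

    def on_disconnected(self) -> None:
        self._send = None

    def pending(self) -> int:
        return len(self._queue)


uplink = CorpusUplink()
uplink.submit(b"utt-1")     # recorded before the BLE path exists
uplink.submit(b"utt-2")
sent = []
uplink.on_connected(sent.append)
uplink.submit(b"utt-3")     # sent == [b"utt-1", b"utt-2", b"utt-3"]
```
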

The present invention further provides a preferred embodiment in which, on the basis of the above embodiment, the off-line voice terminal further comprises an encoding device and/or a storage device. The encoding device is connected between the microphone acquisition device and the first communication device, encodes the acquired voice data, and transmits the encoded voice data to the smart device through the first communication device. The storage device is connected with the first communication device and the voice recognition device and stores the voice recognition model parameters. The encoding device encodes the voice command corpus collected by the microphone acquisition device into an audio format suited to BLE transmission, such as Opus, and the encoded data is then sent to the mobile phone over BLE; after encoding, the corpus data is smaller and transmits faster.

According to the present invention, a further embodiment, shown in Fig. 3, provides a system for off-line voice terminal user training comprising an off-line voice terminal 1, a smart device 2 and a cloud server 3. The off-line voice terminal 1 comprises a microphone acquisition device 101, a voice recognition device 102, a control device 103 and a first communication device 104; the smart device 2 comprises a second communication device 201 and a network device 203; the cloud server 3 comprises a voice training device 301. The microphone acquisition device 101 is connected to the first communication device 104, which connects wirelessly or by wire to the second communication device 201 and transmits the voice data acquired by the microphone acquisition device 101 to it. The network device 203 sends the voice data to the cloud server 3 over a network; the voice training device 301 trains on the voice data and generates voice recognition model parameters; the network device 203 receives these parameters over the network; and the second communication device 201 sends them to the first communication device 104 for use by the voice recognition device 102. The control device 103 is connected to the voice recognition device 102, and the voice recognition device 102 is connected to the microphone acquisition device 101. The cloud server is connected with the smart device through a network, trains on the corpus sent by the smart device, and generates the voice recognition model parameters.
After receiving the collected corpus, the cloud server enters a training stage: it calls the richer training algorithm library available in the cloud, trains on the collected command-word corpus, generates voice recognition model parameters for the user in the cloud, and sends them back to the smart device, which then sends them to the off-line voice terminal over a wired or wireless connection.
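
The description does not specify how parameters are serialized for transfer. As one possible approach, the sketch below frames a JSON-encoded parameter set with its length and a CRC-32 so the terminal can reject a corrupted transfer before storing it; JSON and the header layout are assumptions, and a constrained terminal would more likely use a compact binary format.

```python
import json
import struct
import zlib
from typing import Dict, List

HEADER = struct.Struct("<II")  # little-endian: payload length, CRC-32 of payload


def pack_params(params: Dict[str, List[float]]) -> bytes:
    """Cloud/phone side: serialize parameters and prepend length + CRC-32."""
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def unpack_params(frame: bytes) -> Dict[str, List[float]]:
    """Terminal side: verify the frame before handing parameters to storage."""
    if len(frame) < HEADER.size:
        raise ValueError("frame too short")
    length, crc = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch: parameters corrupted in transit")
    return json.loads(payload.decode("utf-8"))


frame = pack_params({"light_on": [1.0, 0.1], "light_off": [0.05, 0.95]})
restored = unpack_params(frame)  # equal to the dict passed to pack_params
```

Checking integrity on the terminal matters because the parameters cross two links (network, then BLE or USB) and a bad set would silently degrade recognition.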

The utility model also provides a preferred embodiment in which the first communication device comprises a Wi-Fi device or a Bluetooth device. Specifically, the Bluetooth device includes, but is not limited to, a classic Bluetooth device or a BLE device; when the first communication device is a Bluetooth device, the second communication device is also a Bluetooth device, so that a Bluetooth wireless connection can be established between the off-line terminal and the smart device.
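
Classic Bluetooth (BR/EDR) and BLE use different link-layer protocols, so a BLE-only terminal can only connect to a smart device that also supports BLE; smartphones commonly support both. The sketch below picks a transport supported by both the first and second communication devices; the enum and the preference order are assumptions, not from the description.

```python
from enum import Enum
from typing import FrozenSet, Optional


class Transport(Enum):
    BLE = "Bluetooth Low Energy"
    BT_CLASSIC = "Bluetooth BR/EDR (classic)"
    WIFI = "Wi-Fi"


# Assumed preference: BLE first, since the description favours it for low-cost terminals.
PREFERENCE = (Transport.BLE, Transport.BT_CLASSIC, Transport.WIFI)


def choose_transport(terminal: FrozenSet[Transport],
                     smart_device: FrozenSet[Transport]) -> Optional[Transport]:
    """Return a transport supported by both communication devices, or None if there is none."""
    common = terminal & smart_device
    for transport in PREFERENCE:
        if transport in common:
            return transport
    return None


phone = frozenset({Transport.BLE, Transport.BT_CLASSIC, Transport.WIFI})  # typical dual-mode phone
print(choose_transport(frozenset({Transport.BLE}), phone))  # prints: Transport.BLE
print(choose_transport(frozenset({Transport.BT_CLASSIC}), frozenset({Transport.BLE})))  # prints: None
```
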

It should be noted that all wireless connections between the off-line terminal and the smart device in the utility model are suited to Bluetooth, and in particular to BLE. Over Bluetooth, the off-line terminal can rely on the processing capability or network transmission capability of the smart terminal to generate voice recognition model parameters. This makes it convenient to update the off-line terminal's voice recognition capability and delivers the performance of a high-end device at low cost.

Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention in its corresponding aspects.

Claims

1. An offline voice terminal, comprising a microphone acquisition device, a voice recognition device, a control device and a first communication device; wherein the microphone acquisition device is connected with the first communication device, and the first communication device is configured to connect wirelessly or by wire to an external smart device and to transmit acquired voice data to the external smart device; the control device is connected with the voice recognition device; the voice recognition device is connected with the microphone acquisition device; and the first communication device is further configured to receive voice recognition model parameters sent by the external smart device for use by the voice recognition device.

2. The offline voice terminal according to claim 1, characterized in that the offline voice terminal further comprises an encoding device and/or a storage device; the encoding device is connected between the microphone acquisition device and the first communication device and is configured to encode the acquired voice data and transmit the encoded voice data to the external smart device through the first communication device; the storage device is connected with the first communication device and the voice recognition device and is configured to store the voice recognition model parameters.

3. The offline voice terminal of claim 1, wherein the external smart device comprises a voice training device, and the voice training device is configured to perform training according to voice data and generate voice recognition model parameters.

4. The offline voice terminal of any one of claims 1 to 3, wherein the external smart device comprises a network device, the network device is further connected to an external cloud server, the cloud server comprises a voice training device, and the voice training device is configured to train according to voice data and generate voice recognition model parameters.

5. The offline voice terminal of any one of claims 1 to 3, wherein the first communication device comprises a Wi-Fi device or a Bluetooth device.