CN111261168A - Speech recognition engine and method supporting multi-task and multi-model - Google Patents
Speech recognition engine and method supporting multi-task and multi-model
- Publication number
- CN111261168A
- Authority
- CN
- China
- Prior art keywords
- model
- recognition
- voice
- user
- speech recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/22—Interactive procedures; Man-machine interfaces
Abstract
The invention relates to a speech recognition engine and method supporting multiple tasks and multiple models. The engine comprises a central device; the central device comprises a container, a voice acquisition module, a voiceprint recognition module and a result output module, and the container is loaded with a speech recognition model. The voice acquisition module acquires a user's voice information; the voiceprint recognition module performs voiceprint recognition on the voice information and determines the corresponding speech recognition model according to the voiceprint recognition result; after the speech recognition model recognizes the voice information, the recognition result is output through the result output module. Because each user uploads his or her own speech recognition model and voiceprint model, the central device can perform speech recognition using each participating user's models, and the central device deletes the loaded user models when the service ends, thereby protecting the users' private data and models.
Description
Technical Field
The invention belongs to the field of speech recognition, and relates to a speech recognition engine and method supporting multiple tasks and multiple models.
Background
As speech recognition technology has matured, mainstream speech recognition products have achieved high recognition accuracy. However, mainstream systems collect users' voice data and transmit it to the cloud for analysis, processing and model training, which compromises user privacy to some extent. With the rapid development of science and technology, people's privacy awareness keeps increasing. How to guarantee the privacy of user data while performing speech recognition is therefore a research problem worth considering.
Disclosure of Invention
The invention provides a speech recognition engine and method supporting multiple tasks and multiple models. The engine neither infringes users' privacy nor covertly collects their voice data: all training is completed on each user's private device. When a recognition service is needed, each user uploads his or her speech recognition model and voiceprint model, so that the central device (for example, a conference recorder) can perform speech recognition with each participating user's models; when the service ends, the central device deletes the loaded user models, thereby protecting the users' private data and models.
The technical scheme for solving the problems is as follows: a speech recognition engine that supports multitasking and multiple models, characterized by:
comprises a central device;
the central equipment comprises a container, a voice acquisition module, a voiceprint recognition module and a result output module; the container is loaded with a speech recognition model;
the voice acquisition module is used for acquiring voice information of a user; the voiceprint recognition module is used for performing voiceprint recognition on the voice information and determining the corresponding speech recognition model according to the voiceprint recognition result; after the speech recognition model recognizes the voice information, the recognition result is output through the result output module.
Preferably, the number of containers is at least two.
Preferably, each container is loaded with a different speech recognition model.
Preferably, the container and the voiceprint recognition module may be in a central device (i.e., local) or in a cloud.
Preferably, the bottom layer system of the central device is an android system.
A speech recognition method supporting multitask and multiple models is characterized by comprising the following steps:
1) acquiring user voice information;
2) performing voiceprint recognition on the voice information;
3) after the corresponding user is identified, the voice information is transmitted to a container corresponding to the user, and voice identification is carried out by using a voice identification model of the corresponding user;
4) and recording and outputting the recognition result.
Preferably, the method further comprises a step 5) of deleting the voice recognition model of the corresponding user after the recognition result is output.
Preferably, the speech recognition model corresponding to the user refers to a speech recognition model uploaded by the user.
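The method of steps 1) through 5) above can be sketched as a small dispatch loop. This is a minimal illustration only; the class and function names are assumptions for the sketch, not part of the patent:

```python
# Illustrative sketch of the claimed recognition method; all names invented.

class RecognitionEngine:
    def __init__(self):
        # One "container" slot per user, holding that user's uploaded model.
        self.containers = {}

    def load_model(self, user_id, model):
        # Before the service: the user uploads a personal recognition model.
        self.containers[user_id] = model

    def recognize(self, audio, identify_speaker):
        # Steps 1-2: acquire audio and run voiceprint recognition on it.
        user_id = identify_speaker(audio)
        # Step 3: route the audio to that user's container/model.
        text = self.containers[user_id](audio)
        # Step 4: record and output the result as "<user>: <text>".
        return f"{user_id}: {text}"

    def end_service(self, user_id):
        # Step 5: delete the user's model once the service ends.
        self.containers.pop(user_id, None)
```

A toy model (a plain function standing in for a trained recognizer) is enough to exercise the flow end to end.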
The invention has the advantages that:
the invention provides a voice recognition engine and a method supporting multitask and multiple models, which have the advantages that enterprises or other users cannot obtain voice data and models of the users, so that private data of the users are protected; the user can decide whether to authorize the model obtained by the voice training of the user or not; the distributed voice recognition container also enables users to upload different voice models which are most suitable for the users; the whole engine achieves privacy and individuation.
Drawings
FIG. 1 is a flow chart of a multitasking and multimodal speech recognition service proposed by the present invention;
FIG. 2 is a diagram of a central facility architecture according to the present invention;
FIG. 3 is a flow chart of the operation of the center device in the present invention;
FIG. 4 is a diagram of a multi-tasking multi-model cloud identification architecture according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
A speech recognition method supporting multitask and multiple models comprises the following steps:
1) acquiring user voice information;
2) performing voiceprint recognition on the voice information;
3) after the corresponding user is identified, the voice information is transmitted to a container corresponding to the user, and voice recognition is carried out by using a voice recognition model of the corresponding user;
4) and recording and outputting the recognition result.
Preferably, the method further comprises a step 5): after the recognition result is output, the user is disconnected and the speech recognition model of the corresponding user is deleted.
Preferably, the speech recognition model corresponding to the user refers to a speech recognition model uploaded by the user.
Based on the above method, the present invention provides a speech recognition engine supporting multitask and multiple models, as shown in fig. 1, comprising a central device. The central device comprises a container, a voice acquisition module, a voiceprint recognition module and a result output module, and the container is loaded with a speech recognition model. The voice acquisition module is used for acquiring voice information of a user; the voiceprint recognition module is used for performing voiceprint recognition on the voice information and determining the corresponding speech recognition model according to the voiceprint recognition result; after the speech recognition model recognizes the voice information, the recognition result is output through the result output module.
Preferably, there may be a plurality of containers.
Preferably, each container is loaded with a different speech recognition model, or each container is loaded with the same speech recognition model but with different parameters.
Preferably, the user's speech recognition model and voiceprint model are both trained on the user's private device. When the central device/system needs to use a user's models, the user's speech and voiceprint models are uploaded to the central device after the user grants authorization manually. The containers used to load the models in the device are isolated from one another and do not affect one another, and they can load models of different types. After the required voice service ends, each container in the central device deletes the user model it has loaded.
The central equipment architecture in the invention is as follows:
The underlying system of the central device is an Android system. Referring to fig. 2, a plurality of Docker containers run on the Android system; the specific number of containers is determined by the number of connected users, and each container is allocated an equal share of GPU and CPU resources. In this way, isolation between the containers is guaranteed: no container can obtain the model that another container is running. Each container has its own TensorFlow Lite runtime, and each TensorFlow Lite runtime is responsible for loading the model of one user. Taking the architecture diagram as an example, with three connected users, three containers are established in the device's main system (Android); each container runs a TensorFlow Lite instance, and each instance loads the speech recognition model of the corresponding user.
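The per-user allocation described above (one isolated container per connected user, resources split evenly) can be sketched as follows. The `Container` record stands in for a Docker container running its own TensorFlow Lite interpreter; the names and fields are assumptions for illustration:

```python
# Hypothetical sketch of per-user container allocation with equal
# CPU/GPU shares; not an actual Docker or TensorFlow Lite API.

from dataclasses import dataclass

@dataclass
class Container:
    user_id: str
    cpu_share: float  # fraction of total CPU allocated to this container
    gpu_share: float  # fraction of total GPU allocated to this container

def allocate_containers(user_ids, total_cpu=1.0, total_gpu=1.0):
    """One isolated container per connected user; resources split evenly."""
    n = len(user_ids)
    return {
        uid: Container(uid, total_cpu / n, total_gpu / n)
        for uid in user_ids
    }
```

With three connected users, as in the architecture diagram, each container receives one third of the CPU and GPU budget.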
The user mobile phone and the center device in the invention are interacted as follows:
when the mobile phone of the user and the central device are in the same network environment, the user can transmit the own pb model file to the central device by using the app on the mobile phone. Because the central facility opens a separate container space for each user, the user can use different speech recognition models, or the same model with parameters custom-trained from the user's own data. Achieving the effect of individuation or multiple models. The whole steps are as follows:
connect to a wireless network (the same one as the central device);
open the app, then confirm and authorize transmission of the .pb model file;
after all users have finished transmission, send an instruction to the central device to start the recognition record.
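The three client-side steps above can be sketched as a short session function. The message names (`UPLOAD_MODEL`, `START_RECOGNITION`) are invented for this sketch; the patent does not specify a wire protocol:

```python
# Hedged sketch of the phone-to-central-device flow; message names assumed.

def client_session(phone_ssid, central_ssid, pb_model_bytes, send):
    # 1) The phone must be on the same wireless network as the central device.
    if phone_ssid != central_ssid:
        raise ConnectionError("phone and central device are on different networks")
    # 2) The user confirms and authorizes transmission of the .pb model file.
    send({"type": "UPLOAD_MODEL", "payload": pb_model_bytes})
    # 3) After all users have uploaded, instruct the device to start recognition.
    send({"type": "START_RECOGNITION"})
```

In a real system the upload in step 2 would happen once per user, and the start instruction would be sent only after every participant has finished.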
Identification and recording of the central device in the invention:
when the recognition service is started, the voice signal is acquired by the central equipment through the microphone, the central equipment recognizes the user corresponding to the voice section by comparing the voiceprint models uploaded by all the participating users, and the container corresponding to the user is searched through the hash table.
The central device pushes the voice to the corresponding user's container, where the model in the container performs speech recognition; the container pushes the recognized text back to the central device, and the central device records and saves the result in the form "xxx (user): xxxx (content)". The specific flow is shown in fig. 3.
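The lookup-and-record step can be sketched with a plain dict as the hash table, mapping the voiceprint result to a (user, model) pair. The models here are toy stand-ins, not real recognizers:

```python
# Minimal sketch of hash-table dispatch and "user: text" recording.

def dispatch_and_record(audio, voiceprint_id, container_table, transcript):
    # Hash-table lookup: voiceprint result -> (user, model in container).
    user, model = container_table[voiceprint_id(audio)]
    # Speech recognition happens inside that user's container.
    text = model(audio)
    # Record and save in the "xxx (user): xxxx (content)" form.
    line = f"{user}: {text}"
    transcript.append(line)
    return line
```

Each recognized segment appends one line to the running transcript, which is what the central device ultimately saves.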
Multitasking and multiple models
A) Multitasking:
because the container system created for each user has relative independence, each container is respectively responsible for processing the voice data of a single user corresponding to the container;
B) multiple models are as follows:
the user is free to upload the model. The model uploaded by the user A and the model uploaded by the user B have different model structures, or have the same model structure but different parameters, namely the model is a multi-model. The model can be a model which is trained by a user by using personal data and optimized aiming at individuals, so that the recognition effect is optimal as much as possible.
Preferably, the container and the voiceprint recognition module of the central device are arranged in the cloud.
The architecture in the central device may be implemented in the cloud (as in fig. 4):
1) the user is still connected with the central equipment in a mobile phone authorization mode and authorizes the use of the user voiceprint and voice recognition model;
2) the central device connects to the cloud over a network; the same architecture and containers as in the invention are established in the cloud, and the user models are transmitted to the cloud;
3) after the service is started, the central equipment transmits voice input to the cloud end, the cloud end carries out voiceprint recognition firstly, and the voiceprint module transmits the voice to a container corresponding to a voiceprint recognition result to carry out voice recognition;
4) the cloud combines the results of voiceprint recognition and speech recognition into the form "xxx (user): xxxxx (speech content)" and sends it back to the central device;
5) the central equipment stores the received identification result.
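Steps 3) through 5) of the cloud variant split the work between the cloud (voiceprint, then speech recognition) and the central device (storage). A minimal sketch, with all function names assumed:

```python
# Sketch of the cloud variant of the pipeline; names are illustrative.

def cloud_recognize(audio, voiceprint_module, cloud_containers):
    # Voiceprint recognition runs first in the cloud ...
    user = voiceprint_module(audio)
    # ... then the voice is routed to that user's cloud container.
    text = cloud_containers[user](audio)
    # Combine into the "xxx (user): xxxxx (content)" form to send back.
    return f"{user}: {text}"

def central_device_store(result, storage):
    # The central device stores each received recognition result.
    storage.append(result)
```

The only difference from the local architecture is where the containers live; the dispatch logic and the result format are unchanged.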
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, or applied directly or indirectly to other related systems, are included in the scope of the present invention.
Claims (8)
1. A speech recognition method supporting multitask and multiple models is characterized by comprising the following steps:
1) acquiring user voice information;
2) performing voiceprint recognition on the voice information;
3) after the corresponding user is identified, the voice information is transmitted to a container corresponding to the user, and voice identification is carried out by using a voice identification model of the corresponding user;
4) and recording and outputting the recognition result.
2. The speech recognition method supporting multitask and multiple models according to claim 1, characterized by further comprising:
a step 5): after the recognition result is output, deleting the speech recognition model of the corresponding user.
3. A speech recognition method supporting multitasking and multiple models according to claim 1 or 2, characterized in that:
the voice recognition model corresponding to the user refers to a voice recognition model uploaded by the user.
4. A speech recognition engine that supports multitasking and multiple models, characterized by:
comprises a central device;
the central equipment comprises a container, a voice acquisition module, a voiceprint recognition module and a result output module; the container is loaded with a speech recognition model;
the voice acquisition module is used for acquiring voice information of a user; the voiceprint recognition module is used for performing voiceprint recognition on the voice information and determining the corresponding speech recognition model according to the voiceprint recognition result; after the speech recognition model recognizes the voice information, the recognition result is output through the result output module.
5. A speech recognition engine supporting multitasking and multiple models according to claim 4 and wherein:
the number of the containers is at least two.
6. A speech recognition engine supporting multitasking and multiple models according to claim 5 and wherein:
each container is loaded with a different speech recognition model.
7. A speech recognition engine supporting multitasking and multiple models according to any one of claims 4-6 and wherein:
the container and the voiceprint recognition module are located in the central device or the cloud.
8. A speech recognition engine supporting multitasking and multiple models according to claim 7 and wherein:
the bottom layer system of the central equipment is an android system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010069206.5A CN111261168A (en) | 2020-01-21 | 2020-01-21 | Speech recognition engine and method supporting multi-task and multi-model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010069206.5A CN111261168A (en) | 2020-01-21 | 2020-01-21 | Speech recognition engine and method supporting multi-task and multi-model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111261168A true CN111261168A (en) | 2020-06-09 |
Family
ID=70954670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010069206.5A Pending CN111261168A (en) | 2020-01-21 | 2020-01-21 | Speech recognition engine and method supporting multi-task and multi-model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111261168A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096941A (en) * | 2015-09-02 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Voice recognition method and device |
CN105931643A (en) * | 2016-06-30 | 2016-09-07 | 北京海尔广科数字技术有限公司 | Speech recognition method and apparatus |
CN110675872A (en) * | 2019-09-27 | 2020-01-10 | 青岛海信电器股份有限公司 | Voice interaction method based on multi-system display equipment and multi-system display equipment |
Non-Patent Citations (1)
Title |
---|
DONG Ze: "Research on Container-Based Task Offloading Technology", China Masters' Theses Full-text Database *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113823263A (en) * | 2020-06-19 | 2021-12-21 | 深圳Tcl新技术有限公司 | Voice recognition method and system |
WO2021253779A1 (en) * | 2020-06-19 | 2021-12-23 | 深圳Tcl新技术有限公司 | Speech recognition method and system |
CN111785275A (en) * | 2020-06-30 | 2020-10-16 | 北京捷通华声科技股份有限公司 | Voice recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6687671B2 (en) | Method and apparatus for automatic collection and summarization of meeting information | |
US9424836B2 (en) | Privacy-sensitive speech model creation via aggregation of multiple user models | |
CN103081004B (en) | For the method and apparatus providing input to voice-enabled application program | |
CN104679631B (en) | Method of testing and system for the equipment based on android system | |
CN109388701A (en) | Minutes generation method, device, equipment and computer storage medium | |
US9047506B2 (en) | Computer-readable recording medium storing authentication program, authentication device, and authentication method | |
WO2015024413A1 (en) | Conference summary extraction method and device | |
CN111261168A (en) | Speech recognition engine and method supporting multi-task and multi-model | |
US20110316671A1 (en) | Content transfer system and communication terminal | |
CN104038354A (en) | Intelligent mobile phone-based conference interaction method | |
US20130243186A1 (en) | Audio encryption systems and methods | |
CN105897686A (en) | Smart television user account speech management method and smart television | |
CN107862071A (en) | The method and apparatus for generating minutes | |
CN110060656A (en) | Model management and phoneme synthesizing method, device and system and storage medium | |
CN112350834B (en) | AI voice conference system with screen and method | |
CN109493866A (en) | Intelligent sound box and its operating method | |
CN109637534A (en) | Voice remote control method, system, controlled device and computer readable storage medium | |
JP4469867B2 (en) | Apparatus, method and program for managing communication status | |
CN105427857B (en) | Generate the method and system of writing record | |
CN104113604A (en) | Implementation method of voice rapid acquisition in cloud environment | |
CN113241070A (en) | Hot word recall and updating method, device, storage medium and hot word system | |
CN107122291A (en) | Mobile terminal software stability test method and apparatus | |
KR101351264B1 (en) | System and method for message translation based on voice recognition | |
CN108419108A (en) | Sound control method, device, remote controler and computer storage media | |
JP2018120203A (en) | Information processing method and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200609 |