CN111599350B - Command word customization identification method and system

Info

Publication number: CN111599350B
Application number: CN202010266075.XA
Authority: CN
Inventors: 许东星; 曹昊; 周雷
Original assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date: 2020-04-07
Filing date: 2020-04-07
Publication date: 2023-02-28
Anticipated expiration: 2040-04-07
Also published as: CN111599350A

Abstract

The invention provides a command word customized recognition method and system. The method comprises: step 1: receiving an input project requirement, analyzing a project command word list based on the project requirement, and generating a project data acquisition task; step 2: issuing a training data acquisition task through an online task platform; step 3: acquiring test data in a preset scene through a recording device based on the test data acquisition task; step 4: generating a command word voice recognition model according to the project command word list, the training data and the test data based on an automatic model training platform; and step 5: adding the command word voice recognition model into a version management tool, and constructing an engine through Jenkins. By collecting training data (speech) through the online task platform and performing data simulation, the method greatly reduces the cost and cycle of data collection while ensuring command word recognition performance.

Description

Command word customization identification method and system

Technical Field

The invention relates to the technical field of voice recognition, in particular to a command word customization recognition method and system.

Background

At present, offline command word detection systems are generally used to recognize a fixed and limited set of spoken command words. A generic command word detection model often struggles to perform well because of differences in user age, accent and similar factors. To address these differences, a method for customizing the model for each project is needed.

Traditional model customization requires collecting and labeling a large amount of speech from real scenes, and it also requires extensive manual work to tune and optimize parameters during model customization and release. As a result, the project cycle is long and the labor and material costs increase greatly.

Disclosure of Invention

The invention provides a command word customized recognition method that collects training data (speech) through an online task platform and performs data simulation, thereby greatly reducing the cost and cycle of data collection while ensuring command word recognition performance.

The embodiment of the invention provides a command word customization identification method, which comprises the following steps:

step 1: receiving an input project requirement, analyzing a project command word list based on the project requirement, and generating a project data acquisition task; the project data acquisition task comprises a training data acquisition task and a test data acquisition task;

step 2: issuing a training data acquisition task through an online task platform, and receiving training data uploaded based on the training data acquisition task through the online task platform;

step 3: acquiring test data in a preset scene through a recording device based on the test data acquisition task;

step 4: generating a command word voice recognition model according to the project command word list, the training data and the test data based on an automatic model training platform;

step 5: adding the command word voice recognition model into a version management tool, and constructing an engine through Jenkins.

Preferably, generating the command word voice recognition model according to the project command word list, the training data and the test data specifically includes:

configuring training data into a plurality of training groups according to a preset first rule;

configuring the test data into test groups;

performing data simulation and expansion on the training groups by adopting a data enhancement method;

sequentially adopting one of a plurality of training groups after data enhancement, training the deep neural network model by adjusting parameter configuration, and obtaining a plurality of initial models; the training groups correspond to the initial models one by one;

performing model evaluation on each initial model by adopting a test group and generating an evaluation report, wherein the evaluation report comprises the reference recognition rate of the initial model;

selecting the model with the highest reference recognition rate from the plurality of initial models as the command word voice recognition model;

and outputting the command word voice recognition model and issuing the evaluation report.
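The train, evaluate and select loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_best_model`, the callables `train_fn` and `evaluate_fn`, and the toy stand-ins used in the usage lines are all assumptions introduced here. A real pipeline would train a deep neural network where the toy trainer merely memorizes command words.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def select_best_model(
    training_groups: Dict[str, Sequence[Any]],
    test_group: Sequence[Any],
    train_fn: Callable[[Sequence[Any]], Any],
    evaluate_fn: Callable[[Any, Sequence[Any]], float],
) -> Tuple[Any, List[dict]]:
    """Train one initial model per training group, evaluate each on the
    shared test group, and return the model with the highest rate."""
    reports = []
    for name, group in training_groups.items():
        model = train_fn(group)                 # one initial model per group
        rate = evaluate_fn(model, test_group)   # reference recognition rate
        reports.append(
            {"training_group": name, "recognition_rate": rate, "model": model}
        )
    best = max(reports, key=lambda r: r["recognition_rate"])
    return best["model"], reports


# Toy stand-ins (hypothetical): the "model" is just the set of words seen.
def toy_train(group):
    return set(group)


def toy_evaluate(model, test_group):
    return sum(word in model for word in test_group) / len(test_group)


groups = {
    "noise": ["turn on", "turn off"],
    "noise+reverb": ["turn on", "turn off", "volume up"],
}
test_words = ["turn on", "volume up", "turn off", "volume down"]
best_model, reports = select_best_model(groups, test_words, toy_train, toy_evaluate)
```

Passing the trainer and evaluator as callables keeps the selection logic independent of any particular training framework.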

Preferably, the data enhancement method includes one or more of loading noise, increasing reverberation, and increasing or decreasing speech rate.
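As a rough illustration of the three enhancement methods named above, the following sketch operates on a mono signal stored as a list of floats. The patent does not specify how noise, reverberation or speech-rate changes are produced, so everything here is an assumption: Gaussian noise at a target signal-to-noise ratio, a synthetic echo train in place of a measured room impulse response, and plain linear-interpolation resampling (which also shifts pitch, unlike a dedicated time-stretching algorithm). The input is assumed to be non-empty.

```python
import math
import random
from typing import List


def add_noise(signal: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise so the result has roughly the given SNR."""
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, sigma) for x in signal]


def add_reverb(signal: List[float], decay: float = 0.5,
               delay: int = 3, taps: int = 4) -> List[float]:
    """Mix in `taps` echoes, each `delay` samples later and `decay` times weaker."""
    out = list(signal) + [0.0] * (delay * taps)
    for k in range(1, taps + 1):
        gain = decay ** k
        for i, x in enumerate(signal):
            out[i + k * delay] += gain * x
    return out


def change_speed(signal: List[float], factor: float) -> List[float]:
    """Resample by linear interpolation; factor > 1 shortens the signal."""
    n_out = max(1, int(round(len(signal) / factor)))
    last = len(signal) - 1
    out = []
    for j in range(n_out):
        pos = j * factor
        i = min(int(pos), last)
        frac = pos - int(pos)
        out.append(signal[i] * (1 - frac) + signal[min(i + 1, last)] * frac)
    return out


clean = [math.sin(2 * math.pi * n / 20) for n in range(100)]
noisy = add_noise(clean, snr_db=10.0, rng=random.Random(0))
echoed = add_reverb([1.0, 0.0, 0.0, 0.0], decay=0.5, delay=2, taps=2)
faster = change_speed(clean, 2.0)
slower = change_speed(clean, 0.5)
```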

Preferably, the preset scenario includes: one or more of a mall, a movie theater, a parking lot, a school and a vegetable farm.

Preferably, the training data comprises close-talking quiet speech without background noise.

The invention also provides a command word customization recognition system, which comprises:

the task generation module is used for receiving the input project requirement, analyzing a project command word list based on the project requirement and generating a project data acquisition task; the project data acquisition task comprises a training data acquisition task and a test data acquisition task;

the training data acquisition module is used for issuing a training data acquisition task through the online task platform and receiving training data uploaded based on the training data acquisition task through the online task platform;

the test data acquisition module is used for acquiring test data in a preset scene through the recording equipment based on the test data acquisition task;

the model generation module is used for generating a command word voice recognition model according to the project command word list, the training data and the test data based on the automatic model training platform;

and the engine generation module is used for adding the command word voice recognition model into the version management tool and constructing an engine through Jenkins.

Preferably, the model generation module specifically operates to:

configuring the test data into test groups;

sequentially carrying out model evaluation on each initial model by adopting the test group and generating an evaluation report, wherein the evaluation report comprises the reference recognition rate of the initial model;

and outputting a command word voice recognition model and issuing an evaluation report.
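A minimal sketch of what an evaluation report might contain is given below. The text does not define how the reference recognition rate is computed, so this example assumes it is simply the fraction of test utterances whose recognized command matches the reference command. The function name and the report fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_evaluation_report(
    results: List[Tuple[str, Optional[str]]]
) -> Dict[str, object]:
    """results holds (reference_command, recognized_command_or_None) pairs."""
    counts = defaultdict(lambda: [0, 0])  # command -> [correct, total]
    for reference, recognized in results:
        counts[reference][1] += 1
        if recognized == reference:
            counts[reference][0] += 1
    total = sum(t for _, t in counts.values())
    correct = sum(c for c, _ in counts.values())
    return {
        # Overall rate over the whole test group (assumed definition).
        "recognition_rate": correct / total if total else 0.0,
        # Per-command breakdown, useful for spotting weak command words.
        "per_command": {word: c / t for word, (c, t) in counts.items()},
    }


report = build_evaluation_report([
    ("turn on", "turn on"),
    ("turn on", None),          # missed detection
    ("turn off", "turn off"),
    ("turn off", "turn off"),
])
```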

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a schematic diagram of a command word customization identification method according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that they are presented herein only to illustrate and explain the present invention and not to limit the present invention.

The embodiment of the invention provides a command word customization identification method, as shown in fig. 1, comprising the following steps:

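step 1: receiving an input project requirement, analyzing a project command word list based on the project requirement, and generating a project data acquisition task; the project data acquisition task comprises a training data acquisition task and a test data acquisition task;

step 2: issuing a training data acquisition task through an online task platform, and receiving training data uploaded based on the training data acquisition task through the online task platform;

step 3: acquiring test data in a preset scene through recording equipment based on the test data acquisition task;

step 4: generating a command word voice recognition model according to the project command word list, the training data and the test data based on an automatic model training platform;

step 5: adding the command word voice recognition model into a version management tool, and constructing an engine through Jenkins.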

The working principle and the beneficial effects of the technical scheme are as follows:

An input project requirement is received, and a project command word list is analyzed based on it. The project requirement may be the command words to be customized, and the project command word list stores a plurality of command words that can be customized. Analyzing the project command word list yields the project data acquisition task, that is, how much training data and how much test data must be collected for the command words in the project requirement. The training data is collected through a task issued on the online task platform: those who accept the task record the speech required as training data. The test data is collected by a dedicated project team using a recording device in a preset scene. After data acquisition is finished, a model is trained with the training data and tested for convergence with the test data, yielding the command word voice recognition model. To make the model usable, it is added to a version management tool and an engine is built through Jenkins, which completes the command word customized recognition. The command word voice recognition model is a deep learning convolutional neural network model that recognizes speech in order to determine whether a command word is present in it. The version management tool manages a plurality of voice recognition models. The engine is an application program built around the deep learning convolutional neural network model for recognizing speech, and it also includes processing routines such as voice recording and voice noise reduction.
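The step of turning a project command word list into data acquisition tasks can be pictured with the small sketch below. It is only an illustration under stated assumptions: the `AcquisitionTask` record, the channel labels and the per-word utterance counts (200 and 50) are hypothetical values chosen here, since the text does not say how the required amounts are determined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AcquisitionTask:
    kind: str               # "training" or "test"
    command_word: str
    utterances_needed: int
    channel: str            # where the recordings come from


def generate_tasks(command_words: List[str],
                   train_per_word: int = 200,   # hypothetical default
                   test_per_word: int = 50      # hypothetical default
                   ) -> List[AcquisitionTask]:
    """Create one training task and one test task per command word."""
    tasks = []
    for word in command_words:
        # Training speech is crowdsourced through the online task platform.
        tasks.append(AcquisitionTask("training", word, train_per_word,
                                     "online_task_platform"))
        # Test speech is recorded on site in a preset scene.
        tasks.append(AcquisitionTask("test", word, test_per_word,
                                     "on_site_recording"))
    return tasks


tasks = generate_tasks(["turn on the light", "turn off the light"])
training_tasks = [t for t in tasks if t.kind == "training"]
```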

The command word customization recognition method of the invention collects training data (voice) through the on-line task platform and carries out data simulation, thereby greatly reducing the cost and the period of data collection and ensuring the performance of command word recognition.

In addition, during model training and release, a standardized, process-driven and tool-driven training workflow (the automatic model training platform) replaces manual participation, which can greatly improve project efficiency.

In one embodiment, generating the command word voice recognition model according to the project command word list, the training data and the test data specifically comprises:

configuring the test data into test groups;

outputting a command word voice recognition model and issuing an evaluation report;

the data enhancement method comprises one or more of loading noise, increasing reverberation, and increasing or decreasing speech speed.

The preset first rule does not simply distribute the training data evenly into a plurality of groups. The training groups used to train the model consist of enhanced data: the underlying material is clean speech, to which different data enhancement methods are applied to generate different enhanced training data, and the different enhanced training data are then merged in different combinations.
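One possible reading of this rule is sketched below: each enhancement method yields its own enhanced copy of the clean speech, and every non-empty combination of methods forms one training group. The text does not spell out the rule, so this enumeration, the method names and the `(utterance, method)` placeholder for an enhanced recording are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

METHODS = ("noise", "reverb", "speed")  # hypothetical method labels


def build_training_groups(
    clean_utterances: Sequence[str],
    methods: Sequence[str] = METHODS,
) -> Dict[str, List[Tuple[str, str]]]:
    """Return one training group per non-empty combination of methods.

    Each group is the union of the enhanced data sets of its methods;
    an enhanced recording is represented as (utterance_id, method).
    """
    groups = {}
    for size in range(1, len(methods) + 1):
        for combo in combinations(methods, size):
            groups["+".join(combo)] = [
                (utt, method) for method in combo for utt in clean_utterances
            ]
    return groups


groups = build_training_groups(["utt_001", "utt_002"])
```

With three methods this produces seven groups, from `noise` alone up to `noise+reverb+speed`.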

Model training is thereby automated and manual participation is replaced, which can greatly improve project efficiency. A plurality of initial models are generated from a plurality of groups of training data, and the model with the highest recognition rate is selected from them, ensuring that the final engine has a higher recognition rate. An initial model is essentially a deep learning convolutional neural network model in its initial state: it has been trained on the training data but has not yet been tested and verified with the test data.

In one embodiment, the preset scenario includes: one or more of a mall, a movie theater, a parking lot, a school and a vegetable farm.

The preset scenes are the scenes in which the engine is actually applied. Malls, movie theaters, parking lots, schools, vegetable farms and the like each have their own specific interference, so testing with data collected in these scenes significantly improves the recognition rate of the engine.

In one embodiment, the training data includes close-talking quiet speech without background noise.

Close-talking quiet speech is speech recorded in a quiet environment within a preset distance. The training data must be clean speech, that is, speech free of interference, so close-talking quiet speech is the most suitable training data.

the task generation module is used for receiving the input project requirements, analyzing a project command word list based on the project requirements and generating project data acquisition tasks; the project data acquisition task comprises a training data acquisition task and a test data acquisition task;

The task generation module receives an input project requirement and analyzes a project command word list based on it. The project requirement may be the command words to be customized, and the project command word list stores a plurality of command words that can be customized. Analyzing the project command word list yields the project data acquisition task, that is, how much training data and how much test data must be collected for the command words in the project requirement. The training data acquisition module issues a task on the online task platform, and those who accept the task record the speech required as training data. The test data acquisition module obtains test data collected by a dedicated project team using a recording device in a preset scene. After data acquisition is completed, the model generation module trains a model with the training data and tests it for convergence with the test data, yielding the command word voice recognition model. To make the model usable, the engine generation module adds the command word voice recognition model to a version management tool and builds an engine through Jenkins, which completes the command word customized recognition. Jenkins is an open-source continuous integration tool written in Java. It monitors continuous, repetitive work and aims to provide an open, easy-to-use software platform that makes continuous integration of software possible.
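How the engine build might be started once a model has been added to version control is sketched below. The example only constructs an HTTP request for the `buildWithParameters` endpoint of Jenkins' remote access API and does not send it. The server URL, the job name `command-word-engine` and the build parameter `MODEL_TAG` are hypothetical, and authentication is omitted for brevity; the patent itself does not describe how the Jenkins job is configured or triggered.

```python
from typing import Dict
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_trigger_request(jenkins_url: str, job_name: str,
                          params: Dict[str, str]) -> Request:
    """Prepare (but do not send) a POST that starts a parameterized build."""
    url = f"{jenkins_url.rstrip('/')}/job/{quote(job_name)}/buildWithParameters"
    body = urlencode(params).encode("utf-8")   # form-encoded build parameters
    return Request(url, data=body, method="POST")


request = build_trigger_request(
    "https://jenkins.example.com/",            # placeholder server
    "command-word-engine",                     # hypothetical job name
    {"MODEL_TAG": "v1.0.0"},                   # hypothetical parameter
)
# Sending would be: urllib.request.urlopen(request), with credentials added.
```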

The command word customization recognition system of the invention collects training data (voice) through the on-line task platform and carries out data simulation, thereby greatly reducing the cost and the period of data collection and ensuring the performance of command word recognition.

In one embodiment, the model generation module is specifically operable to:

configuring the test data into test groups;

sequentially adopting one of a plurality of training groups after data enhancement, and training the deep neural network model by adjusting parameter configuration to obtain a plurality of initial models; the training groups correspond to the initial models one by one;

Model training is thereby automated and manual participation is replaced, which can greatly improve project efficiency. A plurality of initial models are generated from a plurality of groups of training data, and the model with the highest recognition rate is selected from them, ensuring that the final engine has a higher recognition rate.

In one embodiment, the training data includes close-talking quiet speech without background noise.

The training data must be pure speech, i.e. non-interfering speech, so that close-talking quiet speech is relatively optimal as training data.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A command word customized recognition method is characterized by comprising the following steps:

step 2: the training data acquisition task is issued through an online task platform, and the training data uploaded based on the training data acquisition task is received through the online task platform;

step 3: collecting test data in a preset scene through recording equipment based on the test data collection task;

step 5: adding the command word voice recognition model into a version management tool, and constructing an engine through Jenkins;

wherein generating the command word voice recognition model according to the project command word list, the training data and the test data based on the automatic model training platform specifically comprises the following steps:

configuring the training data into a plurality of training groups according to a preset first rule;

configuring the test data into test groups;

sequentially adopting one of the plurality of training groups after data enhancement, and training the deep neural network model by adjusting parameter configuration to obtain a plurality of initial models; the training groups correspond to the initial models one by one;

selecting a model with the highest reference recognition rate from a plurality of initial models as the command word voice recognition model;

outputting the command word voice recognition model and issuing the evaluation report.

2. The method of claim 1, wherein the data enhancement method comprises one or more of loading noise, adding reverberation, increasing or decreasing speech rate.

3. The method for customized recognition of command words according to claim 1, wherein the preset scenario comprises: one or more of a mall, a movie theater, a parking lot, a school and a vegetable farm.

4. The method of claim 1, wherein the training data comprises quiet speech free of background noise.

5. A command word custom recognition system, comprising:

the task generation module is used for receiving an input project requirement, analyzing a project command word list based on the project requirement and generating a project data acquisition task; the project data acquisition task comprises a training data acquisition task and a test data acquisition task;

the training data acquisition module is used for releasing the training data acquisition task through an online task platform and receiving training data uploaded based on the training data acquisition task through the online task platform;

the model generation module is used for generating a command word voice recognition model according to the project command word list, the training data and the test data based on an automatic model training platform;

the engine generation module is used for adding the command word voice recognition model into a version management tool and constructing an engine through Jenkins;

the model generation module specifically operates as follows:

configuring the test data into test groups;

sequentially adopting one of the plurality of training groups after data enhancement, and training the deep neural network model by adjusting parameter configuration to obtain a plurality of initial models; the training groups correspond to the initial models one by one;

outputting the command word voice recognition model and issuing the evaluation report.

6. The system of claim 5, wherein the data enhancement method comprises one or more of loading noise, increasing reverberation, increasing speech rate, or decreasing speech rate.

7. The command word custom recognition system of claim 5, wherein the preset scenario comprises: one or more of a mall, a movie theater, a parking lot, a school and a vegetable farm.

8. The command word custom recognition system of claim 5, wherein the training data comprises close-talking quiet speech free of background noise.