CN110400560A

CN110400560A - Data processing method and device, storage medium, electronic device

Info

Publication number: CN110400560A
Application number: CN201910673507.6A
Authority: CN
Inventors: 郭欣; 唐大闰
Original assignee: Beijing Mininglamp Software System Co ltd
Current assignee: Beijing Mininglamp Software System Co ltd
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2019-11-01
Anticipated expiration: 2039-07-24
Also published as: CN110400560B

Abstract

The present invention provides a data processing method and device, a storage medium, and an electronic device. The method includes: inputting acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data; obtaining the second speech test data output by the first model; and inputting the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.

Description

Data processing method and device, storage medium, electronic device

Technical field

The present invention relates to the field of computers, and in particular to a data processing method and device, a storage medium, and an electronic device.

Background technique

In the related art, labeling training data for speech recognition is costly, whereas collecting the data is relatively easy. Suppose a batch of labeled Mandarin speech data already exists, but a speech recognition system trained on that data achieves a low recognition rate on accented speech data. Collecting accented data, labeling it, and training a speech recognition system specifically for that accent is relatively expensive.

In the related art, no effective solution has yet been proposed for problems such as a model being unable to effectively recognize two kinds of speech test data when a speech model is trained.

Summary of the invention

Embodiments of the present invention provide a data processing method and device, a storage medium, and an electronic device, to solve problems such as the prohibitive cost of training a speech recognition system for accented speech.

According to one embodiment of the present invention, a data processing method is provided, comprising: inputting acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data; obtaining the second speech test data output by the first model; and inputting the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.

In an embodiment of the present invention, third speech test data output by the first model for the first speech test data is obtained; the third speech test data is used as input to the first model, so that the proportion of specified content in the second speech test data that the first model outputs to the second model exceeds a preset threshold.

In an embodiment of the present invention, after the second speech test data is input into the second model to instruct the second model to train its parameters according to the second speech test data, the method further includes: determining the second model corresponding to the trained parameters; recognizing voice information with the second model corresponding to the trained parameters to obtain a recognition result; and displaying the recognition result.

In an embodiment of the present invention, the first model includes a feature conversion network, and the second model includes a binary classification neural network.

In an embodiment of the present invention, the first speech test data includes test data corresponding to standard Mandarin speech, and the second speech test data includes test data corresponding to non-standard Mandarin speech.

According to another embodiment of the present invention, a data processing device is further provided, comprising: a first input module, configured to input acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data; a first obtaining module, configured to obtain the second speech test data output by the first model; and a second input module, configured to input the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.

In an embodiment of the present invention, the device further includes: a second obtaining module, configured to obtain third speech test data output by the first model for the first speech test data; and a processing module, configured to use the third speech test data as input to the first model, so that the proportion of specified content in the second speech test data that the first model outputs to the second model exceeds a preset threshold.

In an embodiment of the present invention, the device further includes: a determining module, configured to determine the second model corresponding to the trained parameters; a recognition module, configured to recognize voice information with the second model corresponding to the trained parameters to obtain a recognition result; and a display module, configured to display the recognition result.

According to still another embodiment of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.

According to yet another embodiment of the present invention, an electronic device is further provided, including a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the data processing method described in any of the above.

According to the present invention, acquired first speech test data is input into a first model, wherein the first model is used to convert the first speech test data into second speech test data; the second speech test data output by the first model is obtained; and the second speech test data is input into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data. This technical solution addresses problems in the related art such as a model being unable to effectively recognize two kinds of speech test data when a speech model is trained. The first model converts the first speech test data into the second speech test data, so that the two are similar, and training is then performed on the converted second speech test data. The solution avoids labeling the second speech test data, which not only reduces the labeling cost but also enables effective recognition of both the first speech test data and the second speech test data.

Brief description of the drawings

The drawings described herein are provided for further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of an optional data processing method according to an embodiment of the present invention;

Fig. 2 is a flowchart of an optional speech recognition system training method according to an embodiment of the present invention;

Fig. 3 is a structural block diagram of an optional data processing device according to an embodiment of the present invention;

Fig. 4 is another structural block diagram of an optional data processing device according to an embodiment of the present invention.

Specific embodiment

Hereinafter, the present invention is described in detail with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with each other.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects, and are not used to describe a particular order or sequence.

Fig. 1 is a flowchart of an optional data processing method according to an embodiment of the present invention. As shown in Fig. 1, the flow includes the following steps:

Step S102: input acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data;

Step S104: obtain the second speech test data output by the first model;

Step S106: input the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.
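The flow of steps S102 to S106 can be sketched as follows. This is a minimal illustration, not the patented implementation: `run_pipeline`, `toy_first_model`, and `toy_trainer` are hypothetical names, and pairing each feature vector with a transcript that is carried through the conversion is an assumption the text does not state explicitly.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]
Labeled = Tuple[Features, str]  # (features, transcript); this pairing is an assumption


def run_pipeline(
    first_data: List[Labeled],
    first_model: Callable[[Features], Features],
    train_second_model: Callable[[List[Labeled]], object],
) -> object:
    """Hypothetical driver for steps S102 to S106."""
    # S102: feed the first data into the first model.
    # S104: collect the converted (second) data that the first model outputs.
    second_data = [(first_model(feats), text) for feats, text in first_data]
    # S106: hand the converted data to the second model's training routine.
    return train_second_model(second_data)


# Toy stand-ins so that the sketch runs end to end.
def toy_first_model(feats: Features) -> Features:
    return [v + 1.0 for v in feats]  # pretend the "accent" is a constant offset


def toy_trainer(data: List[Labeled]) -> Dict[str, float]:
    # "Training" here only records the mean feature value per transcript.
    groups: Dict[str, List[float]] = {}
    for feats, text in data:
        groups.setdefault(text, []).append(sum(feats) / len(feats))
    return {text: sum(v) / len(v) for text, v in groups.items()}


trained = run_pipeline(
    [([0.0, 0.0], "a"), ([1.0, 1.0], "b")], toy_first_model, toy_trainer
)
print(trained)
```

Passing the two models in as callables keeps the driver independent of how either network is actually built.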

In an embodiment of the present invention, before the second speech test data is input into the second model, the method further includes: obtaining third speech test data output by the first model for the first speech test data; and using the third speech test data as input to the first model, so that the proportion of specified content in the second speech test data that the first model outputs to the second model exceeds a preset threshold.

Here, the first speech test data (for example, standard-accent data) is input into the first model to obtain the above third speech test data (for example, accented data). This continues until the proportion of accented data in the obtained third speech test data exceeds the preset threshold (for example, 95% or more), which indicates that the third speech test data can now be input into the second model as the second speech test data.
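One possible reading of this threshold check is sketched below: the share of converted samples that a classifier judges to be accented is compared with the preset threshold. The function names, the 0.5 decision boundary, and the use of `>=` for "95% or more" are assumptions made for illustration.

```python
from typing import Callable, Sequence


def accented_share(
    converted: Sequence[float],
    prob_accented: Callable[[float], float],
    decision: float = 0.5,  # assumed per-sample decision boundary
) -> float:
    """Fraction of converted samples that the classifier judges to be accented."""
    if not converted:
        raise ValueError("need at least one converted sample")
    hits = sum(1 for x in converted if prob_accented(x) >= decision)
    return hits / len(converted)


def ready_for_second_model(
    converted: Sequence[float],
    prob_accented: Callable[[float], float],
    threshold: float = 0.95,  # the example threshold from the text
) -> bool:
    """True once the accented share reaches the preset threshold."""
    return accented_share(converted, prob_accented) >= threshold


# Toy classifier: anything at or above 1.0 counts as accented.
def toy_prob(x: float) -> float:
    return 1.0 if x >= 1.0 else 0.0


all_accented = [1.2] * 20
mostly_accented = [1.2] * 18 + [0.3] * 2  # share is 0.9
```

With these toy inputs, `ready_for_second_model(all_accented, toy_prob)` holds while `ready_for_second_model(mostly_accented, toy_prob)` does not, so another conversion round would be needed in the second case.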

Optionally, inputting the acquired first speech test data into the first model includes: obtaining speech test data of a preset region as the first speech test data; and inputting the acquired first speech test data into the first model.

The preset region may be a region where Mandarin is relatively standard, such as Beijing.
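Selecting data by preset region can be sketched as a simple filter over recordings that carry region metadata. The `Utterance` record and its `region` field are hypothetical; the text does not say how the region of a recording is stored.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Utterance:
    region: str            # hypothetical metadata: where the speaker was recorded
    features: List[float]
    transcript: str


def select_first_data(
    corpus: List[Utterance],
    preset_regions: FrozenSet[str] = frozenset({"Beijing"}),
) -> List[Utterance]:
    """Keep only utterances recorded in a preset region."""
    return [u for u in corpus if u.region in preset_regions]


corpus = [
    Utterance("Beijing", [0.1, 0.2], "ni hao"),
    Utterance("Other", [0.9, 1.1], "ni hao"),
]
first_data = select_first_data(corpus)
```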

The above data processing procedure is explained below with an example, which is not intended to limit the technical solutions of the embodiments of the present invention. The example technical solution is as follows:

Fig. 2 is a flowchart of an optional speech recognition system training method according to an embodiment of the present invention. As shown in Fig. 2, the flow includes:

Step 1: train a binary classification neural network using standard-accent data and accented data. The binary classification neural network may be a deep neural network (DNN). The standard-accent data corresponds to the above first speech test data, and the accented data corresponds to the above second speech test data.
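Step 1 can be illustrated with a deliberately small stand-in. The sketch below fits a one-feature logistic regression instead of a DNN, purely so that it runs with the standard library; the feature values, learning rate, and epoch count are made up.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_binary_classifier(
    standard: List[float],
    accented: List[float],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[float, float]:
    """Fit p(accented | x) = sigmoid(w * x + b) by batch gradient descent.

    A logistic regression is used here as a stand-in for the DNN in the text.
    """
    # Label 0.0 marks standard-accent samples, label 1.0 marks accented samples.
    data = [(x, 0.0) for x in standard] + [(x, 1.0) for x in accented]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Gradients of the mean cross-entropy loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up 1-D features: standard speech near 0, accented speech near 2.
w, b = train_binary_classifier([-0.5, 0.0, 0.5], [1.5, 2.0, 2.5])
p_standard = sigmoid(w * 0.0 + b)
p_accented = sigmoid(w * 2.0 + b)
```

After training, `p_accented` lies above 0.5 and `p_standard` below it, which is all the later steps need from this classifier: a score that says how accented a sample looks.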

Step 2: train a feature conversion network using the standard-accent data, and use the output of this network as the input of the binary classification neural network (for example, the DNN). Then iteratively train the parameters of the feature conversion network: input the standard-accent data into the feature conversion network after each iteration to obtain accented data, and stop iterating once the probability that the obtained data is accented reaches 95% (that is, the above preset threshold) or more. Note that this process trains only the parameters of the feature conversion network, not those of the binary classification neural network. The feature conversion network can be understood as a neural network that converts the first speech test data into the second speech test data, that is, converts standard-accent data into accented data.
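The iteration in Step 2 can be sketched as below, under strong simplifying assumptions: the frozen classifier is a fixed logistic function with made-up weights, the "feature conversion network" is reduced to a single learnable shift, and the stopping rule is read as the mean classifier probability reaching the threshold. None of these choices comes from the text.

```python
import math
from typing import List, Tuple

# Frozen binary classifier from Step 1: p(accented | x) = sigmoid(W * x + B).
# The values are made up for this example and are never updated below.
W, B = 4.0, -4.0


def prob_accented(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-(W * x + B)))


def train_converter(
    standard: List[float],
    threshold: float = 0.95,
    lr: float = 0.1,
    max_steps: int = 1000,
) -> Tuple[float, float, int]:
    """Learn a shift so that converted data looks accented to the frozen classifier."""
    shift = 0.0  # the converter's only trainable parameter: convert(x) = x + shift
    for step in range(max_steps):
        probs = [prob_accented(x + shift) for x in standard]
        mean_prob = sum(probs) / len(probs)
        if mean_prob >= threshold:
            return shift, mean_prob, step  # stop iterating once the threshold is met
        # Gradient of -mean(log p) with respect to shift is -mean((1 - p) * W).
        grad = -sum((1.0 - p) * W for p in probs) / len(probs)
        shift -= lr * grad  # only the converter is updated; W and B stay fixed
    probs = [prob_accented(x + shift) for x in standard]
    return shift, sum(probs) / len(probs), max_steps


shift, mean_prob, steps = train_converter([-0.5, 0.0, 0.5])
```

The design point carried over from the text is that gradients flow through the classifier but only the converter's parameter changes, so the classifier keeps acting as a fixed judge of how accented the converted data looks.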

Step 3: input the standard-accent data into the trained feature conversion network, and use its output as the features for training the speech recognition system. The trained speech recognition system is then applied to recognition in accented scenarios to obtain recognition results.
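Step 3 can be illustrated with a toy recognizer. In this sketch a nearest-centroid classifier stands in for the speech recognition system, the trained feature conversion network is replaced by a fixed shift, and reusing the labels of the standard-accent data for the converted features is an assumption; all values are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labeled = Tuple[float, str]  # (1-D feature, transcript label)


def train_recognizer(data: List[Labeled]) -> Dict[str, float]:
    """Stand-in recognizer: store one feature centroid per label."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for feat, label in data:
        groups[label].append(feat)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def recognize(model: Dict[str, float], feat: float) -> str:
    """Return the label whose centroid is closest to the input feature."""
    return min(model, key=lambda label: abs(model[label] - feat))


def convert(feat: float) -> float:
    return feat + 2.0  # hypothetical trained feature conversion network


# Labeled standard-accent data; the labels are reused for the converted features.
standard = [(0.0, "ni"), (0.2, "ni"), (1.0, "hao"), (1.2, "hao")]
baseline = train_recognizer(standard)
adapted = train_recognizer([(convert(f), label) for f, label in standard])

accented_input = 2.0  # an accented utterance of "ni" in this toy setup
baseline_result = recognize(baseline, accented_input)
adapted_result = recognize(adapted, accented_input)
```

In this toy setup the recognizer trained on converted features labels the accented input correctly, while the baseline trained on unconverted features does not, and no accented data had to be labeled.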

The above technical solution avoids labeling the accented data. The features of the standard-accent data are strengthened so that they become highly similar to those of the accented data, and training is performed on the strengthened data. This not only reduces the high cost of labeling accented data, but also improves the robustness of the speech recognition system to accented data.

From the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

This embodiment further provides a data processing device, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 3 is a structural block diagram of an optional data processing device according to an embodiment of the present invention. As shown in Fig. 3, the device includes:

a first input module 30, configured to input acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data;

a first obtaining module 32, configured to obtain the second speech test data output by the first model;

a second input module 34, configured to input the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.

In an embodiment of the present invention, Fig. 4 is another structural block diagram of an optional data processing device according to an embodiment of the present invention. As shown in Fig. 4, the device further includes:

a second obtaining module 36, configured to obtain third speech test data output by the first model for the first speech test data;

a processing module 38, configured to use the third speech test data as input to the first model, so that the proportion of specified content in the second speech test data that the first model outputs to the second model exceeds a preset threshold.

In an embodiment of the present invention, as shown in Fig. 4, the device further includes:

a determining module 40, configured to determine the second model corresponding to the trained parameters;

a recognition module 42, configured to recognize voice information with the second model corresponding to the trained parameters to obtain a recognition result;

a display module 44, configured to display the recognition result.

It should be noted that each of the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.

An embodiment of the present invention further provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.

Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:

S1: input acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data;

S2: obtain the second speech test data output by the first model;

S3: input the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

An embodiment of the present invention further provides an electronic device, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.

Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

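Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:

S1: input acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data;

S2: obtain the second speech test data output by the first model;

S3: input the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.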

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be executed in an order different from that given here, or the modules or steps may each be made into an individual integrated circuit module, or several of them may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the principle of the present invention shall fall within its scope of protection.

Claims

1. A data processing method, characterized by comprising:

inputting acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data;

obtaining the second speech test data output by the first model;

inputting the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.

2. The method according to claim 1, characterized in that, before the second speech test data is input into the second model, the method further comprises:

obtaining third speech test data output by the first model for the first speech test data;

using the third speech test data as input to the first model, so that the proportion of specified content in the second speech test data that the first model outputs to the second model exceeds a preset threshold.

3. The method according to claim 1, characterized in that, after the second speech test data is input into the second model to instruct the second model to train its parameters according to the second speech test data, the method further comprises:

determining the second model corresponding to the trained parameters;

recognizing voice information with the second model corresponding to the trained parameters to obtain a recognition result;

displaying the recognition result.

4. The method according to any one of claims 1 to 3, characterized in that the first model includes a feature conversion network, and the second model includes a binary classification neural network.

5. The method according to any one of claims 1 to 3, characterized in that the first speech test data includes test data corresponding to standard Mandarin speech, and the second speech test data includes test data corresponding to non-standard Mandarin speech.

6. A data processing device, characterized by comprising:

a first input module, configured to input acquired first speech test data into a first model, wherein the first model is used to convert the first speech test data into second speech test data;

a first obtaining module, configured to obtain the second speech test data output by the first model;

a second input module, configured to input the second speech test data into a second model, so as to instruct the second model to train its parameters according to the second speech test data, wherein the second model is used to recognize voice information, and the voice information includes the first speech test data and the second speech test data.

7. The device according to claim 6, the device further comprising:

a second obtaining module, configured to obtain third speech test data output by the first model for the first speech test data;

a processing module, configured to use the third speech test data as input to the first model, so that the proportion of specified content in the second speech test data that the first model outputs to the second model exceeds a preset threshold.

8. The device according to claim 6, the device further comprising:

a determining module, configured to determine the second model corresponding to the trained parameters;

a recognition module, configured to recognize voice information with the second model corresponding to the trained parameters to obtain a recognition result;

a display module, configured to display the recognition result.

9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 5.

10. An electronic device, including a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 5.