CN108986792A - Training scheduling method and system for a speech recognition model for a voice dialogue platform - Google Patents

Training scheduling method and system for a speech recognition model for a voice dialogue platform

Info

Publication number
CN108986792A
CN108986792A (application CN201811056567.5A; granted as CN108986792B)
Authority
CN
China
Prior art keywords
training
server
speech recognition
data
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811056567.5A
Other languages
Chinese (zh)
Other versions
CN108986792B (en)
Inventor
刘振鲁
张顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Priority to CN201811056567.5A
Publication of CN108986792A
Application granted
Publication of CN108986792B
Legal status: Active
Anticipated expiration


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/06 — Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 — Training
    • G10L2015/0631 — Creating reference templates; Clustering

Abstract

An embodiment of the present invention provides a training scheduling method for a speech recognition model for a voice dialogue platform. The method comprises: receiving training data deposited by an external-facing server, wherein the voice dialogue platform comprises the external-facing server and a training server; importing the training data into a message queue; distributing the training data in the message queue to the training server for speech recognition model training; receiving and caching the speech recognition model, corresponding to the training data, returned by the training server; and feeding the speech recognition model back to the external-facing server. An embodiment of the present invention also provides a training scheduling system for a speech recognition model for a voice dialogue platform. By splitting the external-facing service from the training service and using the message queue as the store through which the two services exchange information, the embodiments make the data persistent; when the message queue reaches a certain length, the number of training services is increased dynamically, achieving dynamic scaling and improving training efficiency under a large volume of training requests.

Description

Training scheduling method and system for a speech recognition model for a voice dialogue platform
Technical field
The present invention relates to the field of voice dialogue platforms, and in particular to a training scheduling method and system for a speech recognition model for a voice dialogue platform.
Background technique
With the development of intelligent speech technology, voice dialogue platforms provide users with increasingly convenient speech-technology services. For example, a user can create a speech product that meets his or her own needs: develop specific voice skills, set user-specific utterances ("sayings"), and write the corresponding sentence lexicons. After these development steps, the user obtains the corresponding speech product. When the speech product is in use, the user speaks a sentence to it; the product recognizes the speech with the automatic speech recognition model configured in the voice dialogue platform, determines the corresponding text, matches the corresponding utterance, and invokes the specific voice skill.
In the course of making the present invention, the inventors found at least the following problems in the related art:
When a user speaks a sentence to a speech product, the product always uses the built-in automatic speech recognition model provided by the voice dialogue platform; the user cannot train a specific automatic speech recognition model on his or her own.
Moreover, training an automatic speech recognition model takes a long time; if the server restarts or an accident occurs during training, the partially trained model data are lost. Once the platform opens model training to users, it may receive a large number of training requests for automatic speech recognition models within a short period. The voice dialogue platform then has to train them one after another, so training becomes slow when many training requests arrive.
Summary of the invention
Embodiments of the present invention aim at least to solve the problems that, in the prior art, a voice dialogue platform cannot let users train their own speech recognition models; that data are lost if an accident occurs while a speech recognition model is being trained; and that, at request peaks, sequential training makes speech recognition model training slow.
In a first aspect, an embodiment of the present invention provides a training scheduling method for a speech recognition model for a voice dialogue platform, comprising:
receiving training data deposited by an external-facing server, wherein the voice dialogue platform comprises the external-facing server and a training server;
importing the training data into a message queue;
distributing the training data in the message queue to the training server for speech recognition model training;
receiving and caching the speech recognition model, corresponding to the training data, returned by the training server;
feeding the speech recognition model back to the external-facing server.
In a second aspect, an embodiment of the present invention provides a training scheduling system for a speech recognition model for a voice dialogue platform, comprising:
a training data storage program module for receiving training data deposited by an external-facing server, wherein the voice dialogue platform comprises the external-facing server and a training server;
a training data import program module for importing the training data into a message queue;
a training data distribution program module for distributing the training data in the message queue to the training server for speech recognition model training;
a speech recognition model storage program module for receiving and caching the speech recognition model, corresponding to the training data, returned by the training server;
a model feedback program module for feeding the speech recognition model back to the external-facing server.
In a third aspect, an embodiment of the present invention provides an electronic device comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can perform the steps of the training scheduling method for a speech recognition model for a voice dialogue platform of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the training scheduling method for a speech recognition model for a voice dialogue platform of any embodiment of the present invention.
A beneficial effect of the embodiments of the present invention is that the information passing between the external-facing service and the training service is stored in a message queue, making the data persistent; when training is interrupted or fails during speech recognition model training, the training data can be reacquired from the message queue, preventing their loss.
By splitting the external-facing service from the training service and having the two services exchange data through the message queue, the number of training services can be increased dynamically when the message queue reaches a certain length, achieving dynamic scaling and improving training efficiency under a large volume of training requests.
By opening up speech recognition model training as an independent service, capable users can conveniently develop on their own, or use automatic speech recognition as an independent technology to build their own products.
Detailed description of the invention
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a training scheduling method for a speech recognition model for a voice dialogue platform provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of a training scheduling system for a speech recognition model for a voice dialogue platform provided by an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 shows the flowchart of a training scheduling method for a speech recognition model for a voice dialogue platform provided by an embodiment of the present invention, comprising the following steps:
S11: receiving training data deposited by an external-facing server, wherein the voice dialogue platform includes the external-facing server and a training server;
S12: importing the training data into a message queue;
S13: distributing the training data in the message queue to the training server for speech recognition model training;
S14: receiving and caching the speech recognition model, corresponding to the training data, returned by the training server;
S15: feeding the speech recognition model back to the external-facing server.
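A minimal sketch of how steps S11–S15 fit together, with an in-memory deque standing in for the message queue and a callable standing in for the training server; all class and method names are illustrative assumptions, not from the patent:

```python
from collections import deque

class TrainingScheduler:
    """Illustrative sketch of the storage/scheduling service (steps S11-S15)."""

    def __init__(self, train_fn):
        self.queue = deque()       # stands in for the message queue (S12)
        self.model_cache = {}      # cached trained models, keyed by request id
        self.train_fn = train_fn   # stands in for a training server

    def receive(self, request_id, training_data):          # S11
        self.queue.append((request_id, training_data))

    def dispatch(self):                                    # S13 + S14
        request_id, data = self.queue.popleft()
        model = self.train_fn(data)                        # training server call
        self.model_cache[request_id] = model               # cache the result
        return request_id

    def feedback(self, request_id):                        # S15
        return self.model_cache[request_id]
```

A usage round-trip: `receive` deposits a request, `dispatch` trains and caches it, and `feedback` returns the cached model to the external-facing side.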
In this embodiment, the services of the voice dialogue platform are divided into three parts: the external-facing service, the training service, and the storage/scheduling service. The external-facing service interacts with users, with the storage/scheduling service, and with the resource server that publishes speech recognition models. The training service trains speech recognition models from the training utterances and training lexicons in the training data. The storage/scheduling service mediates the interaction between the external-facing service and the training service and stores the data exchanged in the process.
In step S11, the user submits training utterances and training lexicon data for speech recognition model training to the voice dialogue platform; the external-facing service deposits these with the storage/scheduling service, which stores them.
In step S12, the training data are imported into a message queue. The storage/scheduling service may include RabbitMQ, an application-to-application messaging system: the application at either end communicates by reading and writing messages that enter and leave the message queue in the storage/scheduling service. Message passing means that programs communicate by sending data into the message queue rather than by calling each other directly.
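As an illustration of step S12, the helper below builds a message body for the queue; the patent does not specify a wire format, so the JSON schema is an assumption. The trailing comment shows how such a message would be published durably with pika, the standard Python RabbitMQ client:

```python
import json

def make_training_message(utterances, lexicon):
    """Serialize the training data deposited into the queue.
    Illustrative schema; the patent does not define the message format."""
    return json.dumps({"utterances": utterances, "lexicon": lexicon})

# With a real RabbitMQ broker the message would be published durably, e.g.:
#   import pika
#   conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   ch = conn.channel()
#   ch.queue_declare(queue="train", durable=True)   # queue survives broker restarts
#   ch.basic_publish(
#       exchange="", routing_key="train",
#       body=make_training_message(["play music"], ["music"]),
#       properties=pika.BasicProperties(delivery_mode=2))  # persist the message
```

The durable queue plus persistent delivery mode is what gives the data persistence the embodiment relies on.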
In step S13, the storage/scheduling service distributes the training data in the message queue (the training utterances and training lexicons) to training servers for speech recognition model training. Because the voice dialogue platform must serve many users, it pre-configures a certain number of training services. When distributing training data, it can query which training services are idle, or, when none is idle, which training service has the fewest pending training tasks queued, and distribute accordingly.
The training of a speech recognition model can be divided into four steps:
1. Expand the training utterances with regular expressions, then segment them into words; the training lexicon data are segmented directly.
2. Merge the segmented training utterances and training lexicons.
3. Generate the corresponding training resource from the merged data.
4. After generating the training resource, also generate the training information, including the version of the training script, the training directory, and the training time, for use in subsequent updates.
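The four steps above can be sketched as follows. The regular-expression expansion handles only a simple alternation pattern, and the "training resource" is a stand-in dictionary, since the patent does not define these formats:

```python
import re

def expand_and_segment(utterances, lexicon):
    """Step 1 (illustrative): expand alternations like "(play|stop) music",
    then segment on whitespace; lexicon entries are segmented directly."""
    expanded = []
    for u in utterances:
        m = re.search(r"\((.+?)\)", u)
        if m:
            for alt in m.group(1).split("|"):
                expanded.append(u[:m.start()] + alt + u[m.end():])
        else:
            expanded.append(u)
    return [u.split() for u in expanded], [w.split() for w in lexicon]

def build_training_resource(utterances, lexicon, script_version):
    seg_utts, seg_lex = expand_and_segment(utterances, lexicon)  # step 1
    merged = seg_utts + seg_lex                                  # step 2
    resource = {"ngrams": merged}                                # step 3 (stand-in)
    info = {"script_version": script_version}                    # step 4
    return resource, info
```

The `script_version` recorded in step 4 is what the incremental-training logic later compares against.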
During training, the subtasks can be run concurrently. Unlike a thread pool or a process pool, a coroutine pool is used here to run the subtasks concurrently.
Ordinary thread pools and process pools work best on identical or similar tasks. If the tasks' inputs and outputs differ widely, the process or thread pool has to use callback functions to handle each task's return value before continuing with the subsequent tasks, which increases development difficulty and easily introduces bugs.
Because there are many kinds of tasks, giving each kind its own thread pool would yield very poor parallelism, while putting all kinds of tasks into one thread pool increases development difficulty. Instead, a coroutine pool is implemented using reference counting; adding a training task to the coroutine pool then takes only a single decorator. This keeps the code small while preserving parallelism.
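The single-decorator usage can be sketched with Python coroutines. Note that the patent describes a pool built on reference counting; this sketch instead bounds concurrency with an `asyncio.Semaphore` to get the same one-decorator ergonomics, and the pool size and all names are assumptions:

```python
import asyncio
import functools

_MAX_CONCURRENT = 4                       # illustrative pool size
_sem = asyncio.Semaphore(_MAX_CONCURRENT)

def pooled(coro_fn):
    """Decorator that admits a subtask into the coroutine pool."""
    @functools.wraps(coro_fn)
    async def wrapper(*args, **kwargs):
        async with _sem:                  # limits concurrently running subtasks
            return await coro_fn(*args, **kwargs)
    return wrapper

@pooled
async def train_subtask(name):
    await asyncio.sleep(0)                # stands in for real training work
    return f"{name}: done"

async def main():
    # subtasks of different shapes can all use the same decorator
    return await asyncio.gather(*(train_subtask(f"sub{i}") for i in range(3)))
```

Adding a new subtask type is just another `@pooled` coroutine, which is the "one modifier" convenience the embodiment describes.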
In speech recognition model training, the same user often modifies his or her speech recognition model repeatedly, frequently with only a small change to an utterance or a lexicon. Retraining on the full data every time would occupy a large share of the voice dialogue platform's training resources, and, because full training takes longer, the user experience would also be poor.
By caching the previous training result of the speech recognition model and doing incremental training according to what has changed in the resources, training speed is greatly increased.
Step 1: track changes in the training script.
Compare the script version recorded in the training information of the previous speech recognition model resource with the version of the current training script. If they differ, the previously trained model resource is invalid and all resources are retrained; if they match, the previous resources are still valid and step 2 is executed.
Step 2: track changes in the resources.
In the word segmentation stage, judge whether the raw utterance and lexicon data are identical to the previous version; if so, use the previous version's result directly; if not, segment again.
The utterance-and-lexicon merging stage requires no caching.
In the training resource generation stage, judge whether the merged utterances and lexicons are identical to the previous version; if so, use the previous version's result directly; if not, regenerate the language model resource.
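The per-stage caching described in steps 1 and 2 can be sketched by fingerprinting each stage's input and reusing the previous result when the fingerprint is unchanged. This is an illustrative mechanism; the patent does not specify how versions are compared:

```python
import hashlib

def fingerprint(data: str) -> str:
    """Stable digest of a stage's raw input."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

class IncrementalStage:
    """Caches one pipeline stage's output keyed by its input fingerprint,
    so an unchanged input reuses the previous run's result."""

    def __init__(self, compute):
        self.compute = compute
        self.last_key = None
        self.last_result = None
        self.runs = 0          # how many times the stage actually recomputed

    def __call__(self, data):
        key = fingerprint(data)
        if key != self.last_key:          # input changed -> recompute
            self.last_result = self.compute(data)
            self.last_key = key
            self.runs += 1
        return self.last_result           # otherwise reuse cached result
```

Wrapping the segmentation and resource-generation stages this way means a small lexicon edit only reruns the stages whose inputs actually changed.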
During training, to improve the user experience, the voice dialogue platform may add an interface for querying training status, with a field for the estimated training time. After the training service receives the training data, it estimates the training time from the size of the training resource and sends the estimate, as a field of the training status, to the message queue in the storage/scheduling service.
After fetching the estimated training time from the message queue, the external-facing service continually updates the remaining training time shown in the voice dialogue platform.
In step S14, the storage/scheduling service receives and caches the speech recognition model, corresponding to the training data, returned by the training service. The storage/scheduling service can store the training data sent by the external-facing service and the trained speech recognition model returned by the training service persistently, ensuring that the original data of the user's request are not lost.
The training service uses an ACK (acknowledgement character) mechanism while training speech recognition models: in data communication, the receiving station sends the transmitting station a transmission-control character indicating that the data have been received without error. In TCP/IP, if the receiver successfully receives the data, it replies with an ACK. An ACK signal usually has its own fixed format and length and is returned by the receiver to the sender. Training can therefore end in one of two situations:
training succeeds or fails: the training result is returned and an ACK is sent;
the training process exits abnormally, the connection to the message queue of the storage/scheduling service is broken, and no ACK is sent. After the training service restarts, it can still fetch the training data and retrain.
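A toy model of this ACK behaviour (not RabbitMQ itself): a delivered message stays in an unacknowledged set until confirmed, so a worker that dies mid-training leaves its training data available for redelivery:

```python
class AckQueue:
    """Illustrative model of acknowledged delivery: messages are removed
    only after the consumer confirms them."""

    def __init__(self):
        self.pending = []     # published, not yet delivered
        self.unacked = {}     # delivered but unconfirmed, keyed by delivery tag
        self._tag = 0

    def publish(self, body):
        self.pending.append(body)

    def deliver(self):
        body = self.pending.pop(0)
        self._tag += 1
        self.unacked[self._tag] = body
        return self._tag, body

    def ack(self, tag):
        """Training finished (success or failure): drop the message."""
        del self.unacked[tag]

    def requeue_unacked(self):
        """Connection lost (worker crashed mid-training): redeliver later."""
        self.pending = list(self.unacked.values()) + self.pending
        self.unacked.clear()
```

The two cases above map to `ack` (result returned) and `requeue_unacked` (abnormal exit, no ACK, data retrained after restart).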
In step S15, the storage/scheduling service feeds the stored speech recognition model back to the external-facing service through the message queue.
After obtaining the trained speech recognition model, the external-facing service generates a model ID for it and sends the speech recognition model together with the model ID to the resource server for publication.
To guarantee the accuracy and availability of the trained speech recognition model, after the model is published successfully, several standard test audio clips are packaged into speech recognition requests and parsed. Only after the recognition results are verified correct is the model ID fed back to the user.
When the user needs to use the trained speech recognition model, he or she only has to send the audio data and the model ID to the resource server; the resource server invokes the corresponding speech recognition model to parse the audio data and convert it into the corresponding text.
It can be seen that, with this method, the information passing between the external-facing service and the training service is stored in a message queue, making the data persistent; when training is interrupted or fails during speech recognition model training, the data can be reacquired from the message queue, preventing loss of the training data.
As an implementation, in this embodiment, distributing the training data in the message queue to the training server for speech recognition model training comprises:
querying the queue length of the message queue;
when the queue length does not exceed a preset threshold, allocating a training server for the training data;
when the queue length exceeds the preset threshold, adding a temporary training server and allocating the temporary training server for the training data.
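One way to turn the threshold rule into a concrete scaling decision; the base pool size, threshold, and per-server backlog batch are illustrative constants, not values from the patent:

```python
def required_workers(queue_length, base_workers=4, threshold=100, batch=50):
    """Number of training servers to run for a given message-queue length.
    Below the threshold the fixed pool suffices; above it, one temporary
    server is added per `batch` messages of excess backlog."""
    if queue_length <= threshold:
        return base_workers
    excess = queue_length - threshold
    extra = -(-excess // batch)   # ceiling division: temp servers needed
    return base_workers + extra
```

A scheduler polling the queue length can compare this number with the currently connected training services and start or stop temporary ones accordingly.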
In this embodiment, the external-facing service and the training service do not communicate directly; data are passed through the message queue in the storage/scheduling service. The message queue is connected to multiple training services. Training data can be distributed dynamically according to the current number of connections, but when the data in the message queue reach a certain length, this indicates a peak period of users training speech recognition models, and the existing training service connections may no longer suffice. A certain number of temporary training services must then be added dynamically, achieving dynamic scaling.
It can be seen that, in this implementation, by splitting the external-facing service from the training service, the two services pass data through the message queue; when the message queue reaches a certain length, the number of training services is increased dynamically, achieving dynamic scaling.
As an implementation, in this embodiment, after distributing the training data in the message queue to a training server for speech recognition model training, the method further comprises:
receiving a training-failure message fed back by the training server;
redistributing the training data in the message queue to another training server for speech recognition model training.
In this embodiment, in response to a training-failure message fed back by a training service, the training data in the message queue are redistributed to another training service for speech recognition model training. Because the training data are stored persistently in the message queue, they are not lost whether the failure is a power outage or a machine crash and restart.
As an implementation, in this embodiment, the method further comprises:
receiving a training-completed acknowledgement for the training data returned by the training server, and deleting the training data whose training is completed from the message queue.
When the training-completed acknowledgement returned by the training service has been received, the training data are no longer needed since training has finished, so the completed training data are deleted from the message queue.
It can be seen that, in this implementation, different operations are carried out in response to the different feedback from the training server: when training fails, the data are guaranteed not to be lost and training is rerun; when training succeeds, the training data that are no longer needed are deleted, ensuring reasonable use of storage space.
As an implementation, in this embodiment, after distributing the training data in the message queue to a training server for speech recognition model training, the method further comprises:
switching an initial training status to a first training status; when the speech recognition model, corresponding to the training data, returned by the training server is received and cached, switching the first training status to a second training status;
when a training-failure message fed back by the training server is received, switching the first training status to a third training status;
in response to a switch of training status, feeding the switched training status back to the user, wherein the training statuses include: the initial training status, the first training status, the second training status, and the third training status.
In this embodiment, the changes of the training status are recorded. After the training data in the message queue are distributed to a training service for speech recognition model training, the training status is switched to "queued".
When the model, corresponding to the training data, returned by the training server is received and cached, the "queued" training status is switched to "training complete".
When a training-failure message fed back by the training server is received, the "queued" training status is switched to "training error".
After training is completed, the speech recognition model is sent to the external-facing service, which generates the corresponding model ID. When the speech recognition model and its model ID are sent to the resource server, the "training complete" status is switched to "publishing".
After the external-facing service receives the success message fed back by the resource server, the "publishing" status is switched to "published".
It can be seen that, in this implementation, by recording the scheduling process of speech recognition model training, the training status can be fed back to the user in real time; from the training status, the user can query the publication status and publication result of the speech recognition model.
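The status transitions described above can be encoded as a small state machine; the state names and the transition table are a reading of this embodiment, not an API defined by the patent:

```python
from enum import Enum

class Status(Enum):
    QUEUED = "queued"                 # dispatched to the message queue
    TRAINED = "training complete"     # model received and cached
    FAILED = "training error"         # failure message fed back
    PUBLISHING = "publishing"         # model + ID sent to the resource server
    PUBLISHED = "published"           # resource server confirmed success

# Allowed transitions, following the embodiment's description
TRANSITIONS = {
    Status.QUEUED: {Status.TRAINED, Status.FAILED},
    Status.TRAINED: {Status.PUBLISHING},
    Status.PUBLISHING: {Status.PUBLISHED},
}

def advance(current, nxt):
    """Move to the next status, rejecting transitions the flow never makes."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```

Feeding each `advance` result back to the user is what lets the platform report training and publication progress in real time.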
Fig. 2 shows the structural schematic diagram of a training scheduling system for a speech recognition model for a voice dialogue platform provided by an embodiment of the present invention. The technical solution of this embodiment applies to devices that perform the training scheduling method for a speech recognition model for a voice dialogue platform; the system can perform the training scheduling method described in any embodiment above and is configured in a terminal.
The training scheduling system for a speech recognition model for a voice dialogue platform provided by this embodiment includes: a training data storage program module 11, a training data import program module 12, a training data distribution program module 13, a speech recognition model storage program module 14, and a model feedback program module 15.
The training data storage program module 11 receives training data deposited by the external-facing server, wherein the voice dialogue platform includes the external-facing server and a training server; the training data import program module 12 imports the training data into a message queue; the training data distribution program module 13 distributes the training data in the message queue to the training server for speech recognition model training; the speech recognition model storage program module 14 receives and caches the speech recognition model, corresponding to the training data, returned by the training server; and the model feedback program module 15 feeds the speech recognition model back to the external-facing server.
Further, the training data distribution program module is used to:
query the queue length of the message queue;
when the queue length does not exceed a preset threshold, allocate a training server for the training data;
when the queue length exceeds the preset threshold, add a temporary training server and allocate the temporary training server for the training data.
Further, the system also includes an exception handling module for:
receiving a training-failure message fed back by the training server;
redistributing the training data in the message queue to another training server for speech recognition model training.
Further, the system is also used to:
receive a training-completed acknowledgement for the training data returned by the training server, and delete the training data whose training is completed from the message queue.
Further, the system also includes a training status feedback module for:
switching an initial training status to a first training status; when the speech recognition model, corresponding to the training data, returned by the training server is received and cached, switching the first training status to a second training status;
when a training-failure message fed back by the training server is received, switching the first training status to a third training status;
in response to a switch of training status, feeding the switched training status back to the user, wherein the training statuses include: the initial training status, the first training status, the second training status, and the third training status.
An embodiment of the present invention also provides a non-volatile computer storage medium storing computer-executable instructions which can perform the training scheduling method for a speech recognition model for a voice dialogue platform of any of the above method embodiments;
As an implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions set to:
receive training data deposited by an external-facing server, wherein the voice dialogue platform includes the external-facing server and a training server;
import the training data into a message queue;
distribute the training data in the message queue to the training server for speech recognition model training;
receive and cache the speech recognition model, corresponding to the training data, returned by the training server;
feed the speech recognition model back to the external-facing server.
As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs and non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the training scheduling method for a speech recognition model for a voice dialogue platform in any of the above method embodiments.
Non-volatile computer readable storage medium storing program for executing may include storing program area and storage data area, wherein storage journey It sequence area can application program required for storage program area, at least one function;Storage data area can be stored according to test software Device use created data etc..In addition, non-volatile computer readable storage medium storing program for executing may include that high speed is deposited at random Access to memory, can also include nonvolatile memory, a for example, at least disk memory, flush memory device or other are non- Volatile solid-state part.In some embodiments, it includes relative to place that non-volatile computer readable storage medium storing program for executing is optional The remotely located memory of device is managed, these remote memories can be by being connected to the network to the device of test software.Above-mentioned network Example include but is not limited to internet, intranet, local area network, mobile radio communication and combinations thereof.
An embodiment of the present invention also provides an electronic device, comprising: at least one processor, and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the training scheduling method for the speech recognition model for the voice dialogue platform of any embodiment of the present invention.
The client of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication functions, with voice and data communication as the main goal. This type of terminal includes: smart phones (such as the iPhone), multimedia phones, feature phones, low-end phones, etc.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. This type of terminal includes: PDA, MID, and UMPC devices, such as the iPad.
(3) In-vehicle voice devices: such devices can display and play multimedia content. They include: audio and video players (such as the iPod) and portable car navigation devices.
(4) Other electronic devices with data processing functions.
Herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative labor.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, or of course by hardware. Based on this understanding, the above technical solution, or the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, or optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in each embodiment or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention, rather than limiting them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A training scheduling method for a speech recognition model for a voice dialogue platform, comprising:
receiving training data deposited by an external server, wherein the voice dialogue platform comprises the external server and a training server;
importing the training data into a message queue;
distributing the training data in the message queue to the training server for speech recognition model training;
receiving and caching the speech recognition model returned by the training server corresponding to the training data;
feeding the speech recognition model back to the external server.
2. The method according to claim 1, wherein the distributing the training data in the message queue to the training server for speech recognition model training comprises:
querying a queue length of the message queue;
when the queue length does not exceed a preset threshold, allocating a training server for the training data;
when the queue length exceeds the preset threshold, adding a temporary training server and allocating the temporary training server for the training data.
3. The method according to claim 1, wherein after the distributing the training data in the message queue to the training server for speech recognition model training, the method further comprises:
receiving a training-exception message fed back by the training server;
re-distributing the training data in the message queue to another training server for speech recognition model training.
4. The method according to claim 3, wherein the method further comprises:
receiving a training-completion acknowledgement of the training data returned by the training server, and deleting the training data for which training is completed from the message queue.
5. The method according to any one of claims 1-4, wherein after the distributing the training data in the message queue to the training server for speech recognition model training, the method further comprises:
switching an initial training state to a first training state; when receiving and caching the speech recognition model returned by the training server corresponding to the training data, switching the first training state to a second training state;
when receiving the training-exception message fed back by the training server, switching the first training state to a third training state;
in response to a switch of the training state, feeding the training state after the switch back to the user, wherein the training states comprise: the initial training state, the first training state, the second training state, and the third training state.
6. A training scheduling system for a speech recognition model for a voice dialogue platform, comprising:
a training data storage program module, configured to receive training data deposited by an external server, wherein the voice dialogue platform comprises the external server and a training server;
a training data import program module, configured to import the training data into a message queue;
a training data distribution program module, configured to distribute the training data in the message queue to the training server for speech recognition model training;
a speech recognition model storage program module, configured to receive and cache the speech recognition model returned by the training server corresponding to the training data;
a model feedback program module, configured to feed the speech recognition model back to the external server.
7. The system according to claim 6, wherein the training data distribution program module is configured to:
query a queue length of the message queue;
when the queue length does not exceed a preset threshold, allocate a training server for the training data;
when the queue length exceeds the preset threshold, add a temporary training server and allocate the temporary training server for the training data.
8. The system according to claim 6, wherein the system further comprises an exception processing module, configured to:
receive a training-exception message fed back by the training server;
re-distribute the training data in the message queue to another training server for speech recognition model training.
9. The system according to claim 8, wherein the system is further configured to:
receive a training-completion acknowledgement of the training data returned by the training server, and delete the training data for which training is completed from the message queue.
10. The system according to any one of claims 6-9, wherein the system further comprises a training state feedback processing module, configured to:
switch an initial training state to a first training state; when receiving and caching the speech recognition model returned by the training server corresponding to the training data, switch the first training state to a second training state;
when receiving the training-exception message fed back by the training server, switch the first training state to a third training state;
in response to a switch of the training state, feed the training state after the switch back to the user, wherein the training states comprise: the initial training state, the first training state, the second training state, and the third training state.
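As a rough illustration of the queue-length check in claim 2, the sketch below chooses between an existing training server and a newly added temporary one. Everything here is an assumption for illustration: the `spawn_temp_server` factory, the round-robin reuse policy, and the data shapes are not part of the claims.

```python
import queue


def dispatch(message_queue, servers, threshold, spawn_temp_server):
    """Query the message queue's length; within the preset threshold, allocate
    an existing training server, otherwise add a temporary training server."""
    if message_queue.qsize() <= threshold:
        # Backlog is acceptable: reuse an existing server (simple rotation).
        server = servers[message_queue.qsize() % len(servers)]
    else:
        # Backlog exceeds the preset threshold: add a temporary training server.
        server = spawn_temp_server()
        servers.append(server)
    task = message_queue.get()
    return server, task
```

The effect is elastic capacity: training servers are added only when the queue indicates the fixed pool has fallen behind.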
CN201811056567.5A 2018-09-11 2018-09-11 Training and scheduling method and system for voice recognition model of voice conversation platform Active CN108986792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811056567.5A CN108986792B (en) 2018-09-11 2018-09-11 Training and scheduling method and system for voice recognition model of voice conversation platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811056567.5A CN108986792B (en) 2018-09-11 2018-09-11 Training and scheduling method and system for voice recognition model of voice conversation platform

Publications (2)

Publication Number Publication Date
CN108986792A true CN108986792A (en) 2018-12-11
CN108986792B CN108986792B (en) 2021-02-12

Family

ID=64546105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811056567.5A Active CN108986792B (en) 2018-09-11 2018-09-11 Training and scheduling method and system for voice recognition model of voice conversation platform

Country Status (1)

Country Link
CN (1) CN108986792B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112398744A (en) * 2019-08-16 2021-02-23 阿里巴巴集团控股有限公司 Network communication method and device and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000214882A (en) * 1999-01-22 2000-08-04 Matsushita Electric Ind Co Ltd Voice recognition and voice study device capable of speedily checking respect to voice of children or foreign speaker that are hard to cope with
CN1917918A (en) * 2004-02-10 2007-02-21 皇家飞利浦电子股份有限公司 External defibrillator with training mode and method of use
CN101068222A (en) * 2007-01-15 2007-11-07 腾讯科技(深圳)有限公司 Method and device for processing information
CN101281534A (en) * 2008-05-28 2008-10-08 叶睿智 Method for searching multimedia resource based on audio content retrieval
CN101408538A (en) * 2008-11-04 2009-04-15 陕西科技大学 Method for evaluating leather hand feeling quality based on neural network
US20100161332A1 (en) * 2005-02-08 2010-06-24 Microsoft Corporation Training wideband acoustic models in the cepstral domain using mixed-bandwidth training data for speech recognition
CN102014150A (en) * 2010-09-29 2011-04-13 厦门市美亚柏科信息股份有限公司 Distributed small file storage system based on UDP (User Datagram Protocol) and data processing method thereof
CN102568478A (en) * 2012-02-07 2012-07-11 合一网络技术(北京)有限公司 Video play control method and system based on voice recognition
CN104539661A (en) * 2014-12-11 2015-04-22 曙光信息产业(北京)有限公司 Message queue processing method and device
CN105989841A (en) * 2015-02-17 2016-10-05 上海汽车集团股份有限公司 Vehicle-mounted speech control method and device
CN106537493A (en) * 2015-09-29 2017-03-22 深圳市全圣时代科技有限公司 Speech recognition system and method, client device and cloud server
CN106847292A (en) * 2017-02-16 2017-06-13 平安科技(深圳)有限公司 Method for recognizing sound-groove and device


Also Published As

Publication number Publication date
CN108986792B (en) 2021-02-12

Similar Documents

Publication Publication Date Title
US10742814B1 (en) Workflow based communications routing
US11956187B2 (en) Natural language processing for information extraction
CN107430517B (en) Online marketplace for plug-ins to enhance dialog systems
CN108961033B (en) Multi-service system interaction method and device, storage medium and electronic terminal
US10547747B1 (en) Configurable natural language contact flow
JP2023504777A (en) Systems and methods for managing interactions between contact center systems and their users
US10382370B1 (en) Automated service agents
CN107112016A (en) Multi-modal cycle of states
JP2021504813A (en) Blockchain network transaction processing methods, devices, equipment and storage media
CN110489101A (en) Interface analogy method, system, medium and electronic equipment
CN110489440B (en) Data query method and device
CN106991112B (en) Information query method and device
CN110136713A (en) Dialogue method and system of the user in multi-modal interaction
US10395658B2 (en) Pre-processing partial inputs for accelerating automatic dialog response
CN108702368A (en) Additional information is integrated into telecommunication call
CN112185362A (en) Voice processing method and device for user personalized service
CN108831444A (en) Semantic resources training method and system for voice dialogue platform
CN109597739A (en) Voice log services method and system in human-computer dialogue
CN108986792A (en) The training dispatching method and system of speech recognition modeling for voice dialogue platform
CN111026532B (en) Message queue management method for voice data
CN110381150B (en) Data processing method and device on block chain, electronic equipment and storage medium
CN110442698A (en) Conversation content generation method and system
CN109154899B (en) Interactive framework for executing user instructions with online services
US20230290348A1 (en) Coordination and execution of actions on a plurality of heterogenous ai systems during a conference call
US11252233B1 (en) Achieving strong consistency in an eventually consistent distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Training scheduling method and system for speech recognition models used in speech dialogue platforms

Effective date of registration: 20230726

Granted publication date: 20210212

Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch

Pledgor: Sipic Technology Co.,Ltd.

Registration number: Y2023980049433