CN110428816A - Method and device for training and sharing a voice cell bank - Google Patents

Method and device for training and sharing a voice cell bank

Info

Publication number
CN110428816A
CN110428816A (application CN201910706841.7A)
Authority
CN
China
Prior art keywords
voice
cell bank
user
label
tag set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910706841.7A
Other languages
Chinese (zh)
Other versions
CN110428816B (en)
Inventor
龚思颖
赵晓朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Suddenly Cognitive Technology Co ltd
Original Assignee
Beijing Suddenly Cognitive Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Suddenly Cognitive Technology Co Ltd filed Critical Beijing Suddenly Cognitive Technology Co Ltd
Publication of CN110428816A publication Critical patent/CN110428816A/en
Application granted granted Critical
Publication of CN110428816B publication Critical patent/CN110428816B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An embodiment of the invention provides a method for training and sharing a personal voice bank, comprising: step 1, a voice assistant obtains a voice command input by a user; step 2, the voice command input by the user is recognized and parsed, and one or more labels are marked for the voice command; step 3, statistical analysis is performed on the user's voice commands; step 4, the voice assistant creates and/or updates a voice cell bank. The method makes interaction between the voice assistant and the user more intelligent and humanized, improves interaction efficiency, and improves the user experience.

Description

Method and device for training and sharing a voice cell bank
Technical field
Embodiments of the present invention relate to the technical field of information processing, and in particular to a method, device, equipment, and computer-readable storage medium for training and sharing a voice bank.
Background technique
With the development of technology, artificial intelligence has gradually permeated people's lives, and voice assistants, as a bridge for human-computer interaction, play a very important role. By interacting with a voice assistant through speech, a user can on the one hand free both hands and on the other hand communicate at will. Existing voice assistants, however, have some problems. Because users are highly diverse, with different languages, accents, dialects, and interests, a voice assistant needs a relatively wide range of voice banks to communicate with users. Existing voice banks are usually generated by collecting and training on large corpora, most of which come from official corpora or large-scale sampling. Such voice banks are not targeted at an individual user, which limits good interaction between the voice assistant and the user and leads to a poor user experience.
How to provide a voice assistant with a voice bank that better meets a user's individual requirements has become an urgent problem to be solved.
Summary of the invention
In view of the above problems in the prior art, the present invention proposes a method and device for training and sharing a voice cell bank, to overcome those problems.
An embodiment of the invention provides a method for training and sharing a voice cell bank, comprising:
Step 1: a voice assistant obtains a voice command input by a user;
Step 2: the voice command input by the user is recognized and parsed, and one or more labels are marked for the voice command;
Step 3: statistical analysis is performed on the user's voice commands;
Step 4: the voice assistant creates and/or updates a voice cell bank.
Preferably, the method further comprises:
Step 5: sharing the created and/or updated voice cell bank with an intelligent interaction platform and/or the user's friends.
Preferably, the method further comprises:
Step 6: judging the frequency of use of each voice cell bank the voice assistant holds, and deleting a voice cell bank when its frequency of use is below a threshold.
Preferably, step 2 specifically comprises:
When the voice assistant has the user's personal voice cell bank, recognition is first performed based on that cell bank and it is judged whether the voice command can be recognized; if so, the voice command is parsed and one or more labels are marked for it.
When the voice assistant does not have the user's personal voice cell bank, speech recognition is performed based on a general-purpose library; if the command can be recognized, it is parsed and one or more labels are marked. If it cannot be recognized based on the general-purpose library, the voice assistant searches for the voice command or sends it over the network to a cloud server, which performs recognition and parsing. If recognition and parsing succeed, one or more labels are marked for the voice command; if they fail, the user marks one or more labels manually for the voice command.
Preferably, in step 3, the number of occurrences of each label and/or tag set of the voice commands is counted, where a tag set is the set formed by multiple labels marked together on one voice command.
Preferably, step 3 is executed when a predetermined condition is satisfied.
The predetermined condition is that a predetermined time interval has elapsed or the number of voice commands has reached a predetermined quantity; when either condition is met, step 3 is executed.
Preferably, creating a voice cell bank comprises the following steps:
Step 41: judge whether a friend of the user has a voice cell bank with the label and/or tag set feature; if so, send a request to the friend for the voice cell bank; if not, execute step 42;
Step 42: send a query request to the intelligent interaction platform and judge whether it has a voice cell bank with the label and/or tag set feature; if so, obtain the voice cell bank from the platform; if not, execute step 43;
Step 43: group the user's voice commands according to each label and/or tag set, train on the grouped voice commands separately, and create a voice cell bank corresponding to each label and/or tag set feature.
Updating a voice cell bank means updating, based on the labeled user commands, the voice cell bank corresponding to the respective label and/or tag set feature.
Preferably, when a label and/or tag set satisfies any of the following conditions, a voice cell bank with that label and/or tag set feature is created and/or updated:
Condition (1): the number of occurrences of the i-th tag set Ci satisfies the formula

N_Ci ≥ n    (1)

Condition (2): when a label Li belongs to a tag set satisfying formula (1), the count of label Li minus the number of times it occurs within tag sets satisfying that condition is greater than or equal to m;
Condition (3): when a label Li does not belong to any tag set satisfying formula (1), the count of label Li is greater than or equal to k;
where N_Ci denotes the number of occurrences of the i-th tag set Ci, and n, m, k are non-negative integers.
This embodiment further provides a device for training and sharing a voice cell bank, the device comprising:
an obtaining module, for obtaining the voice command input by the user to the voice assistant;
a marking module, for recognizing and parsing the voice command input by the user and marking one or more labels for the voice command;
a statistics module, for performing statistical analysis on the user's voice commands;
a creation/update module, for creating and/or updating a voice cell bank.
Preferably, which further includes
Sharing module is good to intelligent interaction platform and/or user for sharing creation and/or the voice cell bank updated Friend.
Preferably, which further includes
Removing module, for judging the frequency of use of voice cell bank that voice assistant includes, when frequency of use is lower than threshold When value, which is deleted.
Preferably, the marking module is specifically configured to:
when the voice assistant has the user's personal voice cell bank, first perform recognition based on that cell bank and judge whether the voice command can be recognized; if so, parse the voice command and mark one or more labels for it;
when the voice assistant does not have the user's personal voice cell bank, perform speech recognition based on a general-purpose library; if the command can be recognized, parse it and mark one or more labels. If it cannot be recognized based on the general-purpose library, the voice assistant searches for the voice command or sends it over the network to a cloud server, which performs recognition and parsing. If recognition and parsing succeed, one or more labels are marked for the voice command; if they fail, the user marks one or more labels manually for the voice command.
Preferably, the statistics module is specifically configured to count the number of occurrences of each label and/or tag set of the voice commands, where a tag set is the set formed by multiple labels marked together on one voice command.
Preferably, the statistics module is triggered when a predetermined condition is satisfied.
The predetermined condition is that a predetermined time interval has elapsed or the number of voice commands has reached a predetermined quantity; when either condition is met, the statistics module is triggered.
Preferably, creating a voice cell bank comprises the following steps:
Step 41: judge whether a friend of the user has a voice cell bank with the label and/or tag set feature; if so, send a request to the friend for the voice cell bank; if not, execute step 42;
Step 42: send a query request to the intelligent interaction platform and judge whether it has a voice cell bank with the label and/or tag set feature; if so, obtain the voice cell bank from the platform; if not, execute step 43;
Step 43: group the user's voice commands according to each label and/or tag set, train on the grouped voice commands separately, and create a voice cell bank corresponding to each label and/or tag set feature.
Updating a voice cell bank means updating, based on the labeled user commands, the voice cell bank corresponding to the respective label and/or tag set feature.
Preferably, when a label and/or tag set satisfies any of the following conditions, a voice cell bank with that label and/or tag set feature is created and/or updated:
Condition (1): the number of occurrences of the i-th tag set Ci satisfies the formula

N_Ci ≥ n    (1)

Condition (2): when a label Li belongs to a tag set satisfying formula (1), the count of label Li minus the number of times it occurs within tag sets satisfying that condition is greater than or equal to m;
Condition (3): when a label Li does not belong to any tag set satisfying formula (1), the count of label Li is greater than or equal to k;
where N_Ci denotes the number of occurrences of the i-th tag set Ci, and n, m, k are non-negative integers.
The present invention further provides a voice assistant comprising the above device.
The present invention further provides a terminal comprising the above voice assistant.
The present invention further provides a computer equipment comprising a processor and a memory, the memory storing computer instructions executable by the processor; when the processor executes the computer instructions, the method described above is implemented.
The present invention further provides a computer-readable storage medium storing computer instructions for implementing the method described above.
Detailed description of the invention
Fig. 1 is a schematic diagram of an intelligent interaction platform provided by an embodiment of the present invention.
Fig. 2 shows a method for training and sharing a personal voice bank in an embodiment of the present invention.
Fig. 3 shows a device for training and sharing a personal voice bank in an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The specific features of the embodiments are a detailed description of the technical solutions of the embodiments, not a restriction of the technical solutions of the invention; in the absence of conflict, the technical features of the embodiments may be combined with each other.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the intelligent interaction platform of the present invention, which mainly comprises a human-computer interaction interface 101, a processing module 102, a database 103, and the like. The processing module comprises a plurality of interaction engines 112, and an interaction engine 112 may comprise a semantic understanding module 201, a dialogue management and control module 202, a dialogue generation module 203, and a command execution module 204. The processing module 102 and the human-computer interaction interface 101 are interconnected: data input by the user can be received through the human-computer interaction interface 101, and interaction data can be output to the user through it. That is, the human-computer interaction interface 101 can on the one hand receive the dialogue data the processing module 102 feeds back to the user, and on the other hand receive the command execution process and result data fed back by the processing module 102. For an intelligent voice interaction platform, the processing module 102 may further comprise a speech recognition module 210 and a voice output module 211; the speech recognition module 210 and the voice output module 211 may also be configured within an interaction engine 112. In addition, an interaction engine 112 may be a single interaction engine or may be composed of one or more interaction sub-engines.
One of the main points in optimizing the interaction engines of an intelligent interaction platform is to improve their processing capability: enhancing the engines' semantic understanding, improving the efficiency of dialogue interaction, and raising task execution accuracy. All of these require an accurate understanding of the user's intent, achieved by enriching the slots and slot parsing within the interaction engines and improving the engines' control and management of the interaction.
Referring to Fig. 2, Fig. 2 shows a method for training and sharing a voice cell bank provided by embodiment one of the present invention. The method includes, but is not limited to:
Step 1: the voice assistant obtains the voice command input by the user.
Specifically, in step 1, when the user converses with the voice assistant, the voice assistant obtains the user's voice commands.
For example, when the user talks with the voice assistant in the Sichuan dialect about the topic of "Honor of Kings", the voice assistant obtains the user's voice commands.
Step 2: the voice command input by the user is recognized and parsed, and one or more labels are marked for the voice command.
In step 2, the voice assistant recognizes the user's voice command. When the voice assistant has the user's personal voice cell bank, recognition is first performed based on that cell bank and it is judged whether the voice command can be recognized; if so, the voice command is parsed and one or more labels are marked for it. The labels relate to the user's voice command, such as the topic of the command ("Honor of Kings" in this example) or the user's vocal features (the Sichuan dialect in this example). If the voice assistant does not have the user's personal voice cell bank, speech recognition is performed based on a general-purpose library; if the command can be recognized, it is parsed and labels are marked. If it cannot be recognized based on the general-purpose library, the voice assistant searches for the voice command or sends it over the network to a cloud server, which performs recognition and parsing. If recognition and parsing succeed, one or more labels are marked for the voice command; if they fail, the user marks one or more labels manually for the voice command.
Further, labels are marked in set form, e.g. {label 1, label 2}.
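As a sketch, the tiered recognition and labeling flow of step 2 (personal voice cell bank, then general-purpose library, then cloud server, then manual labeling by the user) might look as follows; every function name here is a hypothetical placeholder, not an API defined by this patent:

```python
def recognize_and_label(audio, personal_bank, recognizers, ask_user_for_labels):
    """Tiered recognition per step 2.

    recognizers: dict of hypothetical callables 'bank', 'general', 'cloud',
    each returning (parsed_text, labels) on success or None on failure.
    ask_user_for_labels: fallback when every recognizer fails (manual labeling).
    """
    # First try the user's personal voice cell bank, if one exists.
    if personal_bank is not None:
        result = recognizers["bank"](audio, personal_bank)
        if result is not None:
            return result
    # Fall back to the general-purpose library.
    result = recognizers["general"](audio)
    if result is not None:
        return result
    # Search for the command / send it over the network to a cloud server.
    result = recognizers["cloud"](audio)
    if result is not None:
        return result
    # Recognition and parsing failed everywhere: the user labels manually.
    return None, ask_user_for_labels(audio)
```

Labels are carried as a set, matching the {label 1, label 2} set form described above.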
Step 3: statistical analysis is performed on the user's voice commands.
In this step, performing statistical analysis on the user's voice commands includes performing statistical analysis on the labels of the user's voice commands.
The statistical analysis of the labels includes counting the number of occurrences of each label and/or tag set, where a tag set is the set formed by multiple labels marked together on one voice command.
For example, the user's label statistics are as follows:

Label Li    Count
Label 1     12
Label 2     13
Label 3     5
Label 4     9

The tag set statistics are counted in the same way.
Preferably, step 3 is executed when a predetermined condition is satisfied. The predetermined condition is that a predetermined time interval has elapsed or the number of voice commands has reached a predetermined quantity; when either condition is met, the statistical analysis step is executed. The predetermined time interval and predetermined quantity are set by the user, set adaptively by the voice assistant, or configured by the intelligent interaction platform. By statistically analyzing the labels and/or tag sets of the voice commands obtained within the predetermined time interval, or analyzing once enough user voice commands have been obtained, the voice assistant can determine the characteristics of the user's voice input.
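A minimal sketch of the counting in step 3, using Python's standard `Counter`; each voice command is represented only by its label set, and the trigger check mirrors the "time interval or command count" condition (all names are illustrative, not defined by the patent):

```python
from collections import Counter

def count_labels_and_tag_sets(commands):
    """commands: one label set per voice command,
    e.g. {"Honor of Kings", "Sichuan dialect"}."""
    label_counts = Counter()
    tag_set_counts = Counter()
    for labels in commands:
        label_counts.update(labels)
        # A tag set is the set of labels marked together on one command;
        # a frozenset makes it usable as a counter key.
        tag_set_counts[frozenset(labels)] += 1
    return label_counts, tag_set_counts

def should_analyze(elapsed_seconds, interval_seconds, n_commands, min_commands):
    """Step 3 runs when either predetermined condition is met."""
    return elapsed_seconds >= interval_seconds or n_commands >= min_commands
```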
Step 4: the voice assistant creates and/or updates a voice cell bank.
Through the statistical analysis of the user's voice commands, the voice assistant can determine the features of the user's voice during interaction, the user's interests, and the like. Based on the above labels and/or tag sets, the voice assistant creates and/or updates a voice cell bank with the label and/or tag set feature: when the voice assistant does not have a voice cell bank with the label and/or tag set feature, it creates one; otherwise it updates the corresponding voice cell bank.
Creating a voice cell bank comprises the following steps:
Step 41: judge whether a friend of the user has a voice cell bank with the label and/or tag set feature; if so, send a request to the friend for the voice cell bank; if not, execute step 42.
The user's friends include the people in the user's contact lists across communication channels, and people who have established a friend relationship with the user's voice assistant. By sending query requests to the user's friends, a voice cell bank meeting the condition can be obtained relatively easily.
Step 42: send a query request to the intelligent interaction platform and judge whether it has a voice cell bank with the label and/or tag set feature; if so, obtain the voice cell bank from the platform; if not, execute step 43.
The intelligent interaction platform holds a general voice bank, voice cell banks uploaded by developers, voice cell banks trained by other people, and so on. When the voice assistant cannot obtain the needed voice cell bank from a friend, sending a request to the intelligent interaction platform improves the probability of obtaining a voice bank meeting the condition.
Step 43: group the user's voice commands according to each label and/or tag set, train on the grouped voice commands separately, and create a voice cell bank corresponding to each label and/or tag set feature.
If the voice assistant cannot obtain a voice cell bank meeting the condition, it trains on the user's own voice commands to form a voice cell bank meeting its own needs.
The voice commands used for training may come from the user's dialogues with the voice assistant, or from the user's calls with other people.
Updating a voice cell bank means updating, based on the labeled user commands, the voice cell bank corresponding to the respective label and/or tag set feature.
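The acquisition cascade of steps 41-43 (ask friends first, then the platform, then train locally on the user's own grouped commands) can be sketched as below; `query_friends`, `query_platform`, and `train` are hypothetical callables standing in for the network requests and training procedure the patent leaves unspecified:

```python
def obtain_cell_bank(feature, query_friends, query_platform, user_commands, train):
    """Steps 41-43: try friends, then the intelligent interaction platform,
    then train a new voice cell bank from the user's own commands.

    feature: a label or tag set feature, given as a set of labels.
    user_commands: list of dicts with a "labels" set (and, in practice, audio).
    """
    bank = query_friends(feature)          # step 41
    if bank is not None:
        return bank
    bank = query_platform(feature)         # step 42
    if bank is not None:
        return bank
    # Step 43: group the commands whose labels contain the feature, train on them.
    group = [cmd for cmd in user_commands if feature <= cmd["labels"]]
    return train(group)
```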
Preferably, when a label and/or tag set satisfies any of the following conditions, a voice cell bank with that label and/or tag set feature is created and/or updated:
Condition (1): the number of occurrences of the i-th tag set Ci satisfies the formula

N_Ci ≥ n    (1)

Condition (2): when a label Li belongs to a tag set satisfying formula (1), the count of label Li minus the number of times it occurs within tag sets satisfying that condition is greater than or equal to m;
Condition (3): when a label Li does not belong to any tag set satisfying formula (1), the count of label Li is greater than or equal to k;
where N_Ci denotes the number of occurrences of the i-th tag set Ci; n, m, k are non-negative integers whose values are set by the user, set adaptively by the voice assistant, or configured by the intelligent interaction platform.
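One possible reading of conditions (1)-(3) in code, operating on the counters produced in step 3. The interpretation of condition (2), subtracting a label's occurrences inside qualifying tag sets and comparing the remainder with m, is an assumption where the patent text is ambiguous:

```python
def qualifying_features(label_counts, tag_set_counts, n, m, k):
    """Return the tag sets and labels whose voice cell banks should be
    created or updated. n, m, k are non-negative integer thresholds."""
    # Condition (1): tag set Ci occurs at least n times.
    sets_ok = {ts for ts, count in tag_set_counts.items() if count >= n}
    labels_ok = set()
    for label, count in label_counts.items():
        # Occurrences of this label inside qualifying tag sets.
        in_sets = sum(c for ts, c in tag_set_counts.items()
                      if ts in sets_ok and label in ts)
        if in_sets > 0:
            # Condition (2): remaining count after subtraction is >= m.
            if count - in_sets >= m:
                labels_ok.add(label)
        elif count >= k:
            # Condition (3): label outside any qualifying tag set.
            labels_ok.add(label)
    return sets_ok, labels_ok
```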
Further, the method also comprises step 5: sharing the created and/or updated voice cell bank with the intelligent interaction platform and/or the user's friends.
Sharing the created and/or updated voice cell bank with the intelligent interaction platform can improve the variety of voice cell banks on the platform. Furthermore, a voice cell bank can also be shared with friends whose voice commands have the labels and/or tag set features of that voice cell bank, so that a friend obtains the voice cell bank or uses it, together with the friend's own voice cell banks, for training and updating. The sharing step may be performed by a user operation or by the voice assistant.
Further, the method also comprises step 6: judging the frequency of use of each voice cell bank the voice assistant holds and, when the frequency of use is below a threshold, deleting that voice cell bank, which saves space on the user terminal.
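The pruning in step 6 is a straightforward filter; a sketch follows, where the bank names and the usage-count bookkeeping are illustrative, since the patent does not define how usage is tracked:

```python
def prune_cell_banks(banks, usage_counts, threshold):
    """Step 6: drop voice cell banks whose frequency of use is below the
    threshold, freeing space on the user terminal."""
    return {name: bank for name, bank in banks.items()
            if usage_counts.get(name, 0) >= threshold}
```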
With the above method, the voice assistant can have voice cell banks that fit the user's characteristics, meeting personalized demands, further improving the intelligence of the voice assistant, and improving the user experience.
The present invention further proposes a device for training and sharing a voice cell bank, as shown in Fig. 3, for executing the above method. The device comprises:
an obtaining module, for obtaining the voice command input by the user to the voice assistant;
a marking module, for recognizing and parsing the voice command input by the user and marking one or more labels for the voice command.
Specifically, the marking module recognizes the user's voice command. When the voice assistant has the user's personal voice cell bank, recognition is first performed based on that cell bank and it is judged whether the voice command can be recognized; if so, the voice command is parsed and one or more labels are marked for it. If the voice assistant does not have the user's personal voice cell bank, speech recognition is performed based on a general-purpose library; if the command can be recognized, it is parsed and one or more labels are marked. If it cannot be recognized based on the general-purpose library, the voice assistant searches for the voice command or sends it over the network to a cloud server, which performs recognition and parsing. If recognition and parsing succeed, one or more labels are marked for the voice command; if they fail, the user marks one or more labels manually for the voice command.
Further, labels are marked in set form, e.g. {label 1, label 2}.
The statistics module performs statistical analysis on the user's voice commands.
Performing statistical analysis on the user's voice commands includes performing statistical analysis on the labels of the user's voice commands.
The statistical analysis of the labels includes counting the number of occurrences of each label and/or tag set, where a tag set is the set formed by multiple labels marked together on one voice command.
Preferably, the statistics module is triggered when a predetermined condition is satisfied. The predetermined condition is that a predetermined time interval has elapsed or the number of voice commands has reached a predetermined quantity; when either condition is met, the statistics module is triggered. The predetermined time interval and predetermined quantity are set by the user, set adaptively by the voice assistant, or configured by the intelligent interaction platform.
The creation/update module creates and/or updates voice cell banks.
Based on the above labels and/or tag sets, the creation/update module creates and/or updates for the voice assistant a voice cell bank with the label and/or tag set feature: when the voice assistant does not have a voice cell bank with the label and/or tag set feature, it creates one; otherwise it updates the corresponding voice cell bank.
Creating a voice cell bank comprises the following steps:
Step 41: judge whether a friend of the user has a voice cell bank with the label and/or tag set feature; if so, send a request to the friend for the voice cell bank; if not, execute step 42;
Step 42: send a query request to the intelligent interaction platform and judge whether it has a voice cell bank with the label and/or tag set feature; if so, obtain the voice cell bank from the platform; if not, execute step 43;
Step 43: group the user's voice commands according to each label and/or tag set, train on the grouped voice commands separately, and create a voice cell bank corresponding to each label and/or tag set feature.
Updating a voice cell bank means updating, based on the labeled user commands, the voice cell bank corresponding to the respective label and/or tag set feature.
Preferably, when a label and/or tag set satisfies any of the following conditions, a voice cell bank with that label and/or tag set feature is created and/or updated:
Condition (1): the number of occurrences of the i-th tag set Ci satisfies the formula

N_Ci ≥ n    (1)

Condition (2): when a label Li belongs to a tag set satisfying formula (1), the count of label Li minus the number of times it occurs within tag sets satisfying that condition is greater than or equal to m;
Condition (3): when a label Li does not belong to any tag set satisfying formula (1), the count of label Li is greater than or equal to k;
where N_Ci denotes the number of occurrences of the i-th tag set Ci; n, m, k are non-negative integers whose values are set by the user, set adaptively by the voice assistant, or configured by the intelligent interaction platform.
Further, the device also comprises a sharing module, for sharing the created and/or updated voice cell bank with the intelligent interaction platform and/or the user's friends.
The sharing module shares the created and/or updated voice cell bank with the intelligent interaction platform, which can improve the variety of voice cell banks on the platform. Furthermore, a voice cell bank can also be shared with friends whose voice commands have the labels and/or tag set features of that voice cell bank, so that a friend obtains the voice cell bank or uses it, together with the friend's own voice cell banks, for training and updating. The sharing may be performed by a user operation or by the voice assistant.
Further, the device also comprises a deletion module, for judging the frequency of use of each voice cell bank the voice assistant holds and deleting a voice cell bank when its frequency of use is below a threshold.
The present invention further provides a voice assistant comprising the above device.
The present invention further provides a terminal comprising the above voice assistant.
Specifically, the terminal device may be a computer, a tablet computer, a mobile phone, an intelligent assistant, a vehicle-mounted terminal, or the like.
The present invention also provides a kind of computer equipment, the computer equipment includes processor and memory, the storage The computer instruction that device storage can be executed by processor is realized as described above when processor executes above-mentioned computer instruction Method.
The present invention also provides a kind of computer readable storage mediums, store computer instruction, and the computer instruction is used for Realize method as described above.
Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), flash memory, erasable programmable read-only memory (EPROM), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code for carrying out operations of the present invention may be written in one or more programming languages, or a combination thereof.
The description above provides examples intended only to facilitate understanding of the present invention and is not intended to limit its scope. In a specific implementation, those skilled in the art may change, add, or remove components of the apparatus according to the actual situation, and may change, add, remove, or reorder steps of the method, provided that the functions achieved by the method are not affected.
Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principle and spirit of the invention. The scope of the invention is defined by the claims and their equivalents; improvements introduced without creative work shall fall within the protection scope of the invention.

Claims (7)

1. A method for training and sharing a voice cell bank, characterized in that the method comprises the following steps:
Step 1: the voice assistant obtains a voice command input by the user;
Step 2: the voice command input by the user is recognized and parsed, and one or more labels are attached to the voice command;
Step 3: the user's voice commands are statistically analyzed;
Step 4: the voice assistant creates and/or updates a voice cell bank.
2. The method according to claim 1, characterized in that the method further comprises:
Step 5: sharing the created and/or updated voice cell bank with an intelligent interaction platform and/or the user's friends.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
Step 6: monitoring the usage frequency of each voice cell bank held by the voice assistant, and deleting a voice cell bank when its usage frequency falls below a threshold.
4. The method according to any one of claims 1 to 3, characterized in that step 2 specifically comprises:
when the voice assistant has the user's personal voice cell bank, first performing recognition on the basis of that cell bank to judge whether the voice command can be recognized; if it can be recognized, parsing the voice command and attaching one or more labels to it;
when the voice assistant does not have the user's personal voice cell bank, performing speech recognition on the basis of a general-purpose library; if the voice command can be recognized, parsing it and attaching one or more labels; if the voice command cannot be recognized on the basis of the general-purpose library, the voice assistant searches over the network or sends the voice command to a cloud server for recognition and parsing; if recognition and parsing succeed, one or more labels are attached to the voice command; if recognition and parsing fail, the user labels the voice command manually with one or more labels.
5. The method according to any one of claims 1 to 4, characterized in that in step 3, the numbers of occurrences of the labels and/or tag sets of the voice commands are counted, where a tag set is the set formed by multiple labels attached simultaneously to one voice command.
6. The method according to any one of claims 1 to 5, characterized in that step 3 is executed when a predetermined condition is satisfied, the predetermined condition being that a predetermined time interval has elapsed or that the number of voice commands has reached a predetermined quantity; step 3 is executed when either condition is satisfied.
7. The method according to any one of claims 1 to 6, characterized in that creating a voice cell bank comprises the following steps:
Step 41: judging whether a friend of the user has a voice cell bank with the label and/or tag-set feature; if so, sending a request to the friend to obtain the voice cell bank; if not, executing step 42;
Step 42: sending a query to the intelligent interaction platform to judge whether it has a voice cell bank with the label and/or tag-set feature; if so, obtaining the voice cell bank from the intelligent interaction platform; if not, executing step 43;
Step 43: grouping the user's voice commands by label and/or tag set, training the grouped voice commands separately, and creating a voice cell bank corresponding to each label and/or tag-set feature;
and updating a voice cell bank comprises updating, on the basis of labeled user commands, the voice cell bank corresponding to the respective label and/or tag-set feature.
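The fallback chain of steps 41 through 43 can be sketched as follows. All interfaces here (`request_bank`, `query_bank`, the `train` callable) are hypothetical stand-ins introduced for illustration, not APIs disclosed by the invention.

```python
def obtain_cell_bank(feature, friends, platform, user_commands, train):
    """Sketch of steps 41-43: try the user's friends, then the
    intelligent interaction platform, then train locally from the
    user's own labeled commands.

    feature: set of labels (a label or tag-set feature).
    user_commands: list of (command_text, labels) pairs.
    """
    # Step 41: ask friends who hold a bank with this feature.
    for friend in friends:
        bank = friend.request_bank(feature)
        if bank is not None:
            return bank
    # Step 42: query the intelligent interaction platform.
    bank = platform.query_bank(feature)
    if bank is not None:
        return bank
    # Step 43: group the user's commands matching the feature and
    # train a new bank locally.
    group = [cmd for cmd, labels in user_commands if feature <= set(labels)]
    return train(feature, group)
```

When neither a friend nor the platform can supply the bank, only the commands whose labels cover the requested feature are passed to local training.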
CN201910706841.7A 2019-02-26 2019-08-01 Method and device for training and sharing voice cell bank Active CN110428816B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910141638 2019-02-26
CN201910141638X 2019-02-26

Publications (2)

Publication Number Publication Date
CN110428816A true CN110428816A (en) 2019-11-08
CN110428816B CN110428816B (en) 2022-06-03

Family

ID=68412081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910706841.7A Active CN110428816B (en) 2019-02-26 2019-08-01 Method and device for training and sharing voice cell bank

Country Status (1)

Country Link
CN (1) CN110428816B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853676A (en) * 2019-11-18 2020-02-28 广州国音智能科技有限公司 Audio comparison method, device and equipment
CN110930986A (en) * 2019-12-06 2020-03-27 北京明略软件系统有限公司 Voice processing method and device, electronic equipment and storage medium
CN111048088A (en) * 2019-12-26 2020-04-21 北京蓦然认知科技有限公司 Voice interaction method and device for multiple application programs

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096940A (en) * 2015-06-30 2015-11-25 百度在线网络技术(北京)有限公司 Method and device for voice recognition
CN105810200A (en) * 2016-02-04 2016-07-27 深圳前海勇艺达机器人有限公司 Man-machine dialogue apparatus and method based on voiceprint identification
CN105957525A (en) * 2016-04-26 2016-09-21 珠海市魅族科技有限公司 Interactive method of a voice assistant and user equipment
CN106328139A (en) * 2016-09-14 2017-01-11 努比亚技术有限公司 Voice interaction method and voice interaction system
CN106462608A (en) * 2014-05-16 2017-02-22 微软技术许可有限责任公司 Knowledge source personalization to improve language models
US20180166067A1 (en) * 2016-12-14 2018-06-14 International Business Machines Corporation Using recurrent neural network for partitioning of audio data into segments that each correspond to a speech feature cluster identifier
CN109272995A (en) * 2018-09-26 2019-01-25 出门问问信息科技有限公司 Audio recognition method, device and electronic equipment





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220112
Address after: 310024 floor 5, zone 2, building 3, Hangzhou cloud computing Industrial Park, Zhuantang street, Xihu District, Hangzhou City, Zhejiang Province
Applicant after: Hangzhou suddenly Cognitive Technology Co.,Ltd.
Address before: Room 401, gate 2, block a, Zhongguancun 768 Creative Industry Park, 5 Xueyuan Road, Haidian District, Beijing 100083
Applicant before: BEIJING MORAN COGNITIVE TECHNOLOGY Co.,Ltd.
GR01 Patent grant