CN110503943A - Voice interaction method and voice interaction system - Google Patents
Voice interaction method and voice interaction system
- Publication number
- CN110503943A (application number CN201810473045.9A; granted as CN110503943B)
- Authority
- CN
- China
- Prior art keywords: voice, gender, information, input, segments
- Prior art date: 2018-05-17
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- All classifications fall under G10L (G—Physics; G10—Musical instruments; acoustics; G10L—Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding):
- G10L15/04—Speech recognition: segmentation; word boundary detection
- G10L15/063—Speech recognition: creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/142—Speech classification or search using statistical models: Hidden Markov Models [HMMs]
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/1815—Speech classification or search using natural language modelling: semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
- G10L15/1822—Speech classification or search using natural language modelling: parsing for meaning understanding
- G10L17/00—Speaker identification or verification techniques
Abstract
The present invention relates to a voice interaction method and a voice interaction system. The method comprises: a preprocessing step of preprocessing input voice information and outputting speech segments; a semantic recognition step of performing semantic recognition on the speech segments output by the preprocessing step and outputting semantic information; a gender classification step of identifying the user's gender from the speech segments output by the preprocessing step and outputting gender information; and a fusion step of fusing the gender information and the semantic information to obtain a personalized reply to the voice information. The voice interaction method and voice interaction system according to the invention can differentiate replies according to the user's gender, improving the user experience and the intelligence of voice interaction.
Description
Technical field
The present invention relates to speech recognition technology, and more particularly to a voice interaction method and a voice interaction system capable of identifying a user's gender.
Background art
In an in-vehicle dialogue system, existing speech recognition technology can recognize a user's speech to a certain extent. However, some topics relate to the user's gender, and it is often difficult for current speech recognition technology to provide an answer that matches the user's gender from the recognized text alone.
The information disclosed in this Background section is only intended to increase understanding of the general background of the invention, and should not be taken as an acknowledgement or any form of suggestion that this information constitutes prior art already known to a person of ordinary skill in the art.
Summary of the invention
In view of the above problems, the present invention aims to provide a voice interaction method and a voice interaction system capable of identifying a user's gender.
The voice interaction method of the invention is characterized by comprising:
a preprocessing step of preprocessing input voice information and outputting speech segments;
a semantic recognition step of performing semantic recognition on the speech segments output by the preprocessing step and outputting semantic information;
a gender classification step of identifying the user's gender from the speech segments output by the preprocessing step and outputting gender information; and
a fusion step of fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
Optionally, the gender classification step includes:
a model training sub-step of training a long short-term memory (LSTM) model based on filter bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification sub-step of inputting the speech segment into the trained LSTM model and outputting a gender classification.
Optionally, in the preprocessing step, speech segments are detected from the input voice information using an endpoint detection algorithm.
Optionally, in the preprocessing step, the endpoint detection algorithm detects and outputs first speech segments supplied to the semantic recognition step and second speech segments supplied to the gender classification step, wherein the endpoint detection boundary of the second speech segments is stricter than that of the first speech segments.
Optionally, the model training sub-step includes:
preparing a training set with gender labels;
extracting the filter bank output acoustic features of the training set;
constructing label files corresponding to the filter bank output acoustic features; and
inputting the filter bank output acoustic features and the label files into the LSTM model and training until the model converges.
Optionally, the gender classification sub-step includes:
inputting the speech segment into the trained LSTM model;
performing a forward computation to obtain posterior probabilities of the gender classes; and
accumulating the posterior probabilities over a prescribed duration to obtain the gender classification result.
The voice interaction system of the invention is characterized by comprising:
a preprocessing module for preprocessing input voice information and outputting speech segments;
a semantic recognition module for performing semantic recognition on the speech segments output by the preprocessing module and outputting semantic information;
a gender classification module for performing gender classification on the speech segments output by the preprocessing module, identifying the user's gender, and outputting gender information; and
a fusion module for fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
Optionally, the gender classification module includes:
a model training submodule for training an LSTM model based on filter bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification submodule for inputting the speech segment into the trained LSTM model and outputting a gender classification.
Optionally, in the preprocessing module, speech segments are detected from the input voice information using an endpoint detection algorithm.
Optionally, the preprocessing module uses the endpoint detection algorithm to detect and output first speech segments supplied to the semantic recognition module and second speech segments supplied to the gender classification module, wherein the endpoint detection boundary of the second speech segments is stricter than that of the first speech segments.
Optionally, the model training submodule, based on a gender-labeled training set, extracts the filter bank output acoustic features of the training set, constructs label files corresponding to the filter bank output acoustic features, and inputs the filter bank output acoustic features and the label files into the LSTM model for training until the model converges.
Optionally, the gender classification submodule inputs the speech segment into the trained LSTM model, obtains the posterior probabilities of the gender classes by forward computation, and accumulates the posterior probabilities over a prescribed duration to obtain the gender classification result.
The above voice interaction method of the invention may be applied to a vehicle, or the above voice interaction system of the invention may be applied to a vehicle.
The present invention also provides a voice interaction device capable of executing the above voice interaction method or comprising the above voice interaction system.
Optionally, the above voice interaction device is provided in a vehicle.
The present invention further provides a controller comprising a storage component, a processing component, and instructions stored on the storage component and executable by the processing component, characterized in that, when the instructions are executed, the processing component implements the above voice interaction method. The voice interaction method and voice interaction system according to the invention, combining semantic analysis with gender classification, can differentiate replies according to the user's gender, improving the user experience and the intelligence of voice interaction.
Other features and advantages of the method and apparatus of the present invention will become more apparent from, or be elucidated by, the accompanying drawings incorporated herein and the following detailed description, which together serve to explain certain principles of the invention.
Brief description of the drawings
Fig. 1 is a flowchart of the voice interaction method of one embodiment of the invention.
Fig. 2 is a detailed flow diagram of the gender classification step.
Fig. 3 is a block diagram of the voice interaction system of one embodiment of the invention.
Detailed description of embodiments
Described below are some of the many possible embodiments of the invention, intended to provide a basic understanding of the invention. They are not intended to identify key or critical elements of the invention or to delimit the claimed scope.
First, some terms that appear below are explained.
NLU: natural language understanding;
ASR: automatic speech recognition;
LSTM (long short-term memory model): a deep learning model that can learn long-term dependencies;
feats: the filter bank feature parameters of an audio file;
cmvn: statistics of the feature files (cepstral mean and variance normalization);
GMM-HMM: a conventional acoustic model, namely a hidden Markov model based on a Gaussian mixture model.
Fig. 1 is a flowchart of a voice interaction method according to one embodiment of the invention.
As shown in Fig. 1, the voice interaction method of one embodiment of the invention includes the following steps:
Input step S100: voice information is input.
Preprocessing step S200: the voice information input in step S100 is preprocessed and speech segments are output.
Semantic recognition step S300: semantic recognition is performed on the speech segments output by the preprocessing step S200 and semantic information is output.
Gender classification step S400: gender classification is performed on the speech segments output by the preprocessing step S200, the user's gender is identified, and gender information is output.
Fusion step S500: the gender information and the semantic information are fused to obtain a personalized reply to the input voice information.
Output step S600: the personalized reply is output, for example by voice or as text.
Next, the preprocessing step S200, the gender classification step S400, and the fusion step S500 are described by way of example. The semantic recognition performed on the speech segments and the output of semantic information in the semantic recognition step S300 can use the same technical means as conventional technology and is not described here.
As an example, in the preprocessing step S200, an endpoint detection algorithm (VAD) is applied to the input voice information to obtain speech segments. For example, the user's voice information is input into a VAD model, which obtains speech segments through endpoint detection, feature extraction, and the like. The obtained speech segments are supplied to the subsequent semantic recognition step S300 and gender classification step S400, respectively. The speech recognition task requires retaining complete text information as far as possible, so its VAD boundary should be more tolerant; the gender classification task requires rejecting silence as far as possible, so its VAD boundary should be stricter. Therefore, in the preprocessing step S200 it is optional to provide two different speech segments to the subsequent semantic recognition step S300 and gender classification step S400, respectively, as in the sketch below.
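The following minimal sketch illustrates this dual-boundary segmentation in Python. It assumes the open-source `webrtcvad` package as the endpoint detector and 16 kHz 16-bit mono PCM input; the patent does not prescribe a particular VAD implementation, so the package, frame size, and aggressiveness values are illustrative.

```python
# Sketch of the dual-boundary VAD pass of preprocessing step S200.
# Assumptions: 16 kHz, 16-bit mono PCM audio; the `webrtcvad` package.
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30                                     # webrtcvad accepts 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per 16-bit sample

def speech_frames(pcm: bytes, aggressiveness: int) -> list[bytes]:
    """Return the frames the VAD labels as speech (0 = tolerant ... 3 = strict)."""
    vad = webrtcvad.Vad(aggressiveness)
    frames = [pcm[i:i + FRAME_BYTES]
              for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES)]
    return [f for f in frames if vad.is_speech(f, SAMPLE_RATE)]

def preprocess(pcm: bytes) -> tuple[bytes, bytes]:
    # Tolerant boundary: keep complete text information for semantic recognition.
    first_segment = b"".join(speech_frames(pcm, aggressiveness=1))
    # Strict boundary: reject as much silence as possible for gender classification.
    second_segment = b"".join(speech_frames(pcm, aggressiveness=3))
    return first_segment, second_segment
```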
Next, the gender classification step S400 is described.
Fig. 2 is a detailed flow diagram of the gender classification step S400.
As shown in Fig. 2, the gender classification step S400 can be roughly divided into a training stage and a recognition stage.
First, the training stage is described.
A batch of training data with gender labels needs to be prepared as training samples, including wav.scp, utt2spk, text, and the gender information corresponding to each utterance. The feats of the training set (that is, the filter bank feature parameters of the audio files; the "filter output acoustic features" in Fig. 2) and the cmvn statistics are extracted in preparation for training the long short-term memory model. The feats here need to be force-aligned using a tri-phone GMM-HMM model to find the silence boundaries corresponding to the features; the silent portions of the feats are removed, and only the speech segments capable of discriminating gender are retained.
Since the gender model is a classification model, label files corresponding to the features (FA in Fig. 2) need to be constructed. The label files FA likewise target only the speech portions of the feats; a batch of label files FA reflecting the gender is constructed according to the frame count of the feats, as sketched below.
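The sketch below shows one way such frame-level labels could be built. It assumes the per-frame alignment has already been produced by the tri-phone GMM-HMM forced alignment; the function name, the silence id, and the 0/1 gender encoding are illustrative rather than the patent's actual file format.

```python
import numpy as np

SIL = 0  # illustrative phone id that the forced alignment assigns to silence

def build_fa(feats: np.ndarray, alignment: np.ndarray, gender: int):
    """Drop silence frames and emit one gender label per retained frame.

    feats:     (num_frames, num_mel_bins) filter bank features of one utterance
    alignment: (num_frames,) per-frame phone ids from GMM-HMM forced alignment
    gender:    utterance-level label, e.g. 0 = male, 1 = female
    """
    keep = alignment != SIL           # retain only gender-discriminative speech
    speech_feats = feats[keep]
    fa_labels = np.full(len(speech_feats), gender, dtype=np.int64)
    return speech_feats, fa_labels
```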
The feats and label files FA prepared above are input into the long short-term memory model and trained until convergence. Here, LSTM (Long Short-Term Memory) is a kind of recurrent neural network (RNN: Recurrent Neural Network). An RNN is a special neural network that calls itself along a time sequence or character sequence; after being unfolded in sequence it becomes an ordinary three-layer neural network, and it is commonly applied to speech recognition.
Here, the basic parameters used by the long short-term memory model are as follows:
num-lstm-layers: 1;
cell-dim: 1024;
lstm-delay: -1。
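These parameter names follow the Kaldi nnet3 LSTM configuration style (`lstm-delay: -1` meaning the recurrent connection looks one frame into the past). As a rough equivalent under that reading (the patent does not specify a toolkit, so the PyTorch translation below is an assumption), a single-layer LSTM with a 1024-dimensional cell and a two-class output could be defined as:

```python
import torch
import torch.nn as nn

class GenderLSTM(nn.Module):
    """Single-layer LSTM with cell dimension 1024 and a 2-class output,
    mirroring num-lstm-layers: 1 / cell-dim: 1024 above (sketch only)."""
    def __init__(self, num_mel_bins: int = 40, cell_dim: int = 1024):
        super().__init__()
        self.lstm = nn.LSTM(num_mel_bins, cell_dim, num_layers=1,
                            batch_first=True)
        self.out = nn.Linear(cell_dim, 2)  # male / female

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_frames, num_mel_bins)
        hidden, _ = self.lstm(feats)
        return self.out(hidden).log_softmax(dim=-1)  # frame-level log-posteriors

# Training until convergence (sketch): frame-level cross-entropy against FA labels.
model = GenderLSTM()
loss_fn = nn.NLLLoss()  # pairs with the log_softmax outputs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```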
Next, the recognition stage is described.
First, feature extraction is performed. When the user speaks, the voice information is first detected using the endpoint detection algorithm (VAD), and feature extraction is performed on the non-silence speech frames detected by the VAD. Since the long short-term memory model depends on past time steps, a buffer can be set up to accumulate the features, as sketched below.
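A small sketch of such a buffer, assuming torchaudio's Kaldi-compatible filter bank extractor (the extractor choice and the 40-bin setting are assumptions, not the patent's specification):

```python
import torch
import torchaudio

class FeatureBuffer:
    """Accumulate filter bank features across streaming VAD speech chunks,
    since the LSTM's output depends on past frames."""
    def __init__(self, num_mel_bins: int = 40):
        self.num_mel_bins = num_mel_bins
        self.frames: list[torch.Tensor] = []

    def push(self, waveform: torch.Tensor, sample_rate: int = 16000) -> None:
        # waveform: (1, num_samples) non-silence chunk delivered by the VAD
        feats = torchaudio.compliance.kaldi.fbank(
            waveform, num_mel_bins=self.num_mel_bins,
            sample_frequency=sample_rate)
        self.frames.append(feats)

    def matrix(self) -> torch.Tensor:
        # (total_frames, num_mel_bins) feature matrix for the forward computation
        return torch.cat(self.frames, dim=0)
```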
Then, forward computation is performed. A feature matrix of a certain length is fed into the long short-term memory model, and the forward computation yields the posterior probabilities of the gender classes. The so-called posterior probability is the probability revised after information about the "result" is obtained; it is the "effect" in the problem of "inferring the cause from the effect". If an event has not yet occurred and the likelihood of its occurrence is sought, that is a prior probability; if the event has already occurred and the likelihood that it was caused by some factor is sought, that is a posterior probability.
Finally, posterior processing is performed. A time threshold T is set through repeated experiments; the posterior probabilities accumulated over the duration T are compared, and the class with the greater probability value is taken as the gender classification result of the input audio. The time threshold T may, for example, take a value of 0.5 s or 1 s. T cannot be set too long, because that would require more data and recognition would no longer be real-time; nor can it be set too short, or the accuracy might not be high enough.
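Putting the forward computation and posterior processing together, a minimal sketch follows, reusing the GenderLSTM and FeatureBuffer sketches above; the 10 ms frame shift and the default T = 1 s are illustrative values.

```python
import torch

FRAME_SHIFT_S = 0.01  # 10 ms frame shift (assumption)

@torch.no_grad()
def classify_gender(model, feats: torch.Tensor, t_threshold_s: float = 1.0):
    """Forward computation plus accumulation of posteriors over T seconds.

    feats: (num_frames, num_mel_bins) buffered features of non-silence frames.
    Returns "male"/"female" once T seconds of speech are available, else None.
    """
    n_needed = int(t_threshold_s / FRAME_SHIFT_S)
    if feats.size(0) < n_needed:
        return None                                  # keep buffering
    log_post = model(feats[:n_needed].unsqueeze(0))  # (1, n_needed, 2)
    acc = log_post.exp().sum(dim=1).squeeze(0)       # accumulated posteriors per class
    return "male" if acc[0] > acc[1] else "female"
```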
In this way, semantic recognition is performed on the speech segments in the semantic recognition step S300 and semantic information is output; on the other hand, gender classification is performed on the speech segments in the gender classification step S400, the user's gender is identified, and gender information is output. Then, in the fusion step S500, the recognized gender information and semantic information are fused to obtain a personalized reply to the input voice information. In some examples of this application, the "fusion" mentioned in step S500 can be understood as taking the gender information obtained in step S400 into account when generating the voice interaction reply, so that the reply is, for example, more targeted or more appropriate, as in the several examples given below. This does not exclude other uses of the gender information from step S400.
For example, when the user's input is "Good morning!": if the gender classification step S400 identifies a male, the output is "Good morning, sir!"; if a female, "Good morning, madam!". When the user's input is "Do you think I look good?": if a male is identified, the output is "Of course, you are a handsome guy!"; if a female, "You bet, you are a real beauty!". When the user's input is "What time is it now?": if a male is identified, the output is "Sir, it is 3 p.m. now"; if a female, "Madam, it is 3 p.m. now".
The embodiments of the voice interaction method of the invention have been described above. Next, the voice interaction system of the invention is described.
Fig. 3 is a block diagram of the voice interaction system of one embodiment of the invention.
As shown in Fig. 3, the voice interaction system of one embodiment of the invention comprises:
an input module 100 for inputting voice information;
a preprocessing module 200 for receiving the voice information, preprocessing it, and outputting speech segments;
a gender classification module 300 for performing gender classification on the speech segments output by the preprocessing module, identifying the user's gender, and outputting gender information;
a semantic recognition module 400 for performing semantic recognition on the speech segments output by the preprocessing module and outputting semantic information;
a fusion module 500 for fusing the gender information and the semantic information to obtain a personalized reply to the voice information; and
an output module 600 for outputting the personalized reply, for example by voice.
In the preprocessing module 200, an endpoint detection algorithm (VAD) is used to detect speech segments in the input voice information. More specifically, the preprocessing module 200 uses the endpoint detection algorithm to detect and output first speech segments supplied to the semantic recognition module 400 and second speech segments supplied to the gender classification module 300. Since the semantic recognition module 400 requires retaining complete text information as far as possible, its VAD boundary should be more tolerant, whereas the gender classification module requires rejecting as much silence as possible, so its VAD boundary should be stricter; therefore, the endpoint detection boundary of the second speech segments is stricter than that of the first speech segments.
The gender classification module 300 includes:
a model training submodule 310 for training a long short-term memory model based on filter bank output acoustic features and pre-labeled gender information to obtain a trained long short-term memory model; and
a gender classification submodule 320 for inputting the speech segment into the trained long short-term memory model and outputting a gender classification.
The model training submodule 310, based on the gender-labeled training set, extracts the filter bank output acoustic features of the training set, constructs the corresponding label files FA, and inputs the filter bank output acoustic features and the label files into the long short-term memory model for training until the model converges. The gender classification submodule 320 inputs the speech segment into the trained long short-term memory model, obtains the posterior probabilities of the gender classes by forward computation, and accumulates the posterior probabilities over a prescribed duration to obtain the gender classification result.
The voice interaction method described in any of the above examples can be applied to a vehicle, or the voice interaction system described in any of the above examples can be applied to a vehicle, for example as part of a vehicle control method or vehicle control system.
The present invention also provides a voice interaction device capable of executing the voice interaction method described in any of the above examples, or comprising the voice interaction system described in any of the above examples. The voice interaction device can be implemented as a separate component and can be provided in a vehicle, for example so that occupants can interact with it by voice. The voice interaction device may be a device fixed in the vehicle, or a device that can be taken out of and put back into the vehicle. Furthermore, in some examples, the voice interaction device can communicate with the electronic control system in the vehicle. In some cases, the voice interaction device may be implemented in an existing electronic component of the vehicle, such as the vehicle's infotainment system.
The present invention also provides a controller comprising a storage component, a processing component, and instructions stored on the storage component and executable by the processing component, characterized in that, when the instructions are executed, the processing component implements the above voice interaction method.
According to each of the exemplary voice interaction methods and voice interaction systems of the invention, combining semantic analysis with gender classification makes it possible to differentiate replies according to the user's gender, improving the user experience and the intelligence of voice interaction.
The examples above primarily illustrate the voice interaction method and voice interaction system of the invention. Although only some specific embodiments of the invention have been described, those of ordinary skill in the art will appreciate that the invention can be implemented in many other forms without departing from its spirit and scope. Therefore, the examples and embodiments shown are to be considered illustrative and not restrictive, and the invention may cover various modifications and substitutions without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (16)
1. A voice interaction method, characterized by comprising:
a preprocessing step of preprocessing input voice information and outputting speech segments;
a semantic recognition step of performing semantic recognition on the speech segments output by the preprocessing step and outputting semantic information;
a gender classification step of identifying the user's gender from the speech segments output by the preprocessing step and outputting gender information; and
a fusion step of fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
2. The voice interaction method according to claim 1, characterized in that the gender classification step comprises:
a model training sub-step of training a long short-term memory (LSTM) model based on filter bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification sub-step of inputting the speech segment into the trained LSTM model and outputting a gender classification.
3. The voice interaction method according to claim 1, characterized in that, in the preprocessing step, speech segments are detected from the input voice information using an endpoint detection algorithm.
4. The voice interaction method according to claim 3, characterized in that, in the preprocessing step, the endpoint detection algorithm detects and outputs first speech segments supplied to the semantic recognition step and second speech segments supplied to the gender classification step, wherein the endpoint detection boundary of the second speech segments is stricter than that of the first speech segments.
5. The voice interaction method according to claim 2, characterized in that the model training sub-step comprises:
preparing a training set with gender labels;
extracting the filter bank output acoustic features of the training set;
constructing label files corresponding to the filter bank output acoustic features; and
inputting the filter bank output acoustic features and the label files into the LSTM model and training until the model converges.
6. The voice interaction method according to claim 2, characterized in that the gender classification sub-step comprises:
inputting the speech segment into the trained LSTM model;
performing a forward computation to obtain posterior probabilities of the gender classes; and
accumulating the posterior probabilities over a prescribed duration to obtain the gender classification result.
7. A voice interaction system, characterized by comprising:
a preprocessing module for preprocessing input voice information and outputting speech segments;
a semantic recognition module for performing semantic recognition on the speech segments output by the preprocessing module and outputting semantic information;
a gender classification module for performing gender classification on the speech segments output by the preprocessing module, identifying the user's gender, and outputting gender information; and
a fusion module for fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
8. The voice interaction system according to claim 7, characterized in that the gender classification module comprises:
a model training submodule for training an LSTM model based on filter bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification submodule for inputting the speech segment into the trained LSTM model and outputting a gender classification.
9. The voice interaction system according to claim 7, characterized in that, in the preprocessing module, speech segments are detected from the input voice information using an endpoint detection algorithm.
10. The voice interaction system according to claim 9, characterized in that the preprocessing module uses the endpoint detection algorithm to detect and output first speech segments supplied to the semantic recognition module and second speech segments supplied to the gender classification module, wherein the endpoint detection boundary of the second speech segments is stricter than that of the first speech segments.
11. The voice interaction system according to claim 8, characterized in that the model training submodule, based on a gender-labeled training set, extracts the filter bank output acoustic features of the training set, constructs label files corresponding to the filter bank output acoustic features, and inputs the filter bank output acoustic features and the label files into the LSTM model for training until the model converges.
12. The voice interaction system according to claim 8, characterized in that the gender classification submodule inputs the speech segment into the trained LSTM model, obtains posterior probabilities of the gender classes by forward computation, and accumulates the posterior probabilities over a prescribed duration to obtain the gender classification result.
13. The voice interaction method according to any one of claims 1 to 6, or the voice interaction system according to any one of claims 7 to 12, applied to a vehicle.
14. A voice interaction device capable of executing the voice interaction method according to any one of claims 1 to 6, or comprising the voice interaction system according to any one of claims 7 to 12.
15. The voice interaction device according to claim 14, provided in a vehicle.
16. A controller comprising a storage component, a processing component, and instructions stored on the storage component and executable by the processing component, characterized in that, when the instructions are executed, the processing component implements the voice interaction method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201810473045.9A (granted as CN110503943B) | 2018-05-17 | 2018-05-17 | Voice interaction method and voice interaction system
Publications (2)
Publication Number | Publication Date
---|---
CN110503943A | 2019-11-26
CN110503943B | 2023-09-19
Family
ID=68583957
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201810473045.9A (Active) | 2018-05-17 | 2018-05-17 | Voice interaction method and voice interaction system
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110503943B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140163984A1 (en) * | 2012-12-10 | 2014-06-12 | Lenovo (Beijing) Co., Ltd. | Method Of Voice Recognition And Electronic Apparatus |
CN105700682A (en) * | 2016-01-08 | 2016-06-22 | 北京乐驾科技有限公司 | Intelligent gender and emotion recognition detection system and method based on vision and voice |
CN107305541A (en) * | 2016-04-20 | 2017-10-31 | 科大讯飞股份有限公司 | Speech recognition text segmentation method and device |
CN107146615A (en) * | 2017-05-16 | 2017-09-08 | 南京理工大学 | Audio recognition method and system based on the secondary identification of Matching Model |
CN107799126A (en) * | 2017-10-16 | 2018-03-13 | 深圳狗尾草智能科技有限公司 | Sound end detecting method and device based on Supervised machine learning |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111883133A (en) * | 2020-07-20 | 2020-11-03 | 深圳乐信软件技术有限公司 | Customer service voice recognition method, customer service voice recognition device, customer service voice recognition server and storage medium |
CN111883133B (en) * | 2020-07-20 | 2023-08-29 | 深圳乐信软件技术有限公司 | Customer service voice recognition method, customer service voice recognition device, server and storage medium |
CN112397067A (en) * | 2020-11-13 | 2021-02-23 | 重庆长安工业(集团)有限责任公司 | Voice control terminal of weapon equipment |
CN113870861A (en) * | 2021-09-10 | 2021-12-31 | Oppo广东移动通信有限公司 | Voice interaction method and device, storage medium and terminal |
CN116092056A (en) * | 2023-03-06 | 2023-05-09 | 安徽蔚来智驾科技有限公司 | Target recognition method, vehicle control method, device, medium and vehicle |
Also Published As
Publication number | Publication date |
---|---|
CN110503943B (en) | 2023-09-19 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
2020-08-06 | TA01 | Transfer of patent application right | Applicant changed from NIO NEXTEV Ltd., 30/F, Yihe Building, No. 1 Kangle Plaza, Central, Hong Kong, to Weilai (Anhui) Holding Co., Ltd., Susong Road West and Shenzhen Road North, Hefei Economic and Technological Development Zone, Anhui Province
| GR01 | Patent grant |