CN110503943A - Voice interaction method and voice interaction system - Google Patents

Voice interaction method and voice interaction system

Info

Publication number
CN110503943A
CN110503943A
Authority
CN
China
Prior art keywords
voice
gender
information
input
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810473045.9A
Other languages
Chinese (zh)
Other versions
CN110503943B (en)
Inventor
孙珏
徐曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NIO Holding Co Ltd
Original Assignee
NIO Nextev Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NIO Nextev Ltd filed Critical NIO Nextev Ltd
Priority to CN201810473045.9A priority Critical patent/CN110503943B/en
Publication of CN110503943A publication Critical patent/CN110503943A/en
Application granted granted Critical
Publication of CN110503943B publication Critical patent/CN110503943B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Computation (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a voice interaction method and a voice interaction system. The method comprises: a pre-processing step of pre-processing input voice information and outputting voice segments; a semantic recognition step of performing semantic recognition on the voice segments output by the pre-processing step and outputting semantic information; a gender classification step of identifying the user's gender from the voice segments output by the pre-processing step and outputting gender information; and a fusion processing step of fusing the gender information and the semantic information to obtain a personalized reply to the voice information. The voice interaction method and voice interaction system according to the present invention can tailor replies to the user's gender, improving the user experience and making voice interaction more intelligent.

Description

Voice interaction method and voice interaction system
Technical field
The present invention relates to speech recognition technology, and more particularly to a voice interaction method and a voice interaction system capable of identifying a user's gender.
Background technique
In in-vehicle dialogue systems, existing speech recognition technology can recognize a user's speech to a certain extent. However, some topics involve the user's gender, and current speech recognition technology often has difficulty producing a reply appropriate to the user's gender from the recognized text alone.
The information disclosed in this Background section is only intended to increase understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information constitutes prior art already known to a person of ordinary skill in the art.
Summary of the invention
In view of the above problems, the present invention aims to provide a voice interaction method and a voice interaction system capable of identifying the user's gender.
The voice interaction method of the invention is characterized by comprising:
a pre-processing step of pre-processing input voice information and outputting voice segments;
a semantic recognition step of performing semantic recognition on the voice segments output by the pre-processing step and outputting semantic information;
a gender classification step of identifying the user's gender from the voice segments output by the pre-processing step and outputting gender information; and
a fusion processing step of fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
Optionally, the gender classification step includes:
a model training sub-step of training a long short-term memory (LSTM) model based on filter-bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification sub-step of inputting the voice segments into the trained LSTM model and outputting a gender classification.
Optionally, in the pre-processing step, voice segments are detected from the input voice information using an endpoint detection algorithm.
Optionally, in the pre-processing step, an endpoint detection algorithm is used to detect voice segments in the input voice information and to output a first voice segment supplied to the semantic recognition step and a second voice segment supplied to the gender classification step, wherein the endpoint detection boundary of the second voice segment is stricter than that of the first voice segment.
Optionally, the model training sub-step includes:
preparing a training set with gender labels;
extracting the filter-bank output acoustic features of the training set;
constructing a label file corresponding to the filter-bank output acoustic features; and
inputting the filter-bank output acoustic features and the label file into the LSTM model and training until the model converges.
Optionally, the gender classification sub-step includes:
inputting the voice segments into the trained LSTM model;
performing forward computation to obtain the posterior probabilities of the gender classes; and
accumulating the posterior probabilities over a prescribed duration to obtain the gender classification result.
The voice interaction system of the invention is characterized by comprising:
a pre-processing module for pre-processing input voice information and outputting voice segments;
a semantic recognition module for performing semantic recognition on the voice segments output by the pre-processing module and outputting semantic information;
a gender classification module for performing gender classification on the voice segments output by the pre-processing module, identifying the user's gender and outputting gender information; and
a fusion processing module for fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
Optionally, the gender classification module includes:
a model training submodule for training an LSTM model based on filter-bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification submodule for inputting the voice segments into the trained LSTM model and outputting a gender classification.
Optionally, in the pre-processing module, voice segments are detected from the input voice information using an endpoint detection algorithm.
Optionally, the pre-processing module uses an endpoint detection algorithm to detect voice segments in the input voice information and outputs a first voice segment supplied to the semantic recognition module and a second voice segment supplied to the gender classification module, wherein the endpoint detection boundary of the second voice segment is stricter than that of the first voice segment.
Optionally, based on a training set with gender labels, the model training submodule extracts the filter-bank output acoustic features of the training set, constructs a label file corresponding to the filter-bank output acoustic features, and inputs the filter-bank output acoustic features and the label file into the LSTM model for training until the model converges.
Optionally, the gender classification submodule inputs the voice segments into the trained LSTM model, obtains the posterior probabilities of the gender classes through forward computation, and accumulates the posterior probabilities over a prescribed duration to obtain the gender classification result.
The above voice interaction method of the invention may be applied to a vehicle, or the above voice interaction system of the invention may be applied to a vehicle.
The present invention also provides a voice interaction device capable of performing the above voice interaction method or comprising the above voice interaction system.
Optionally, the above voice interaction device is provided in a vehicle.
The present invention further provides a controller comprising a storage component, a processing component, and instructions stored on the storage component and executable by the processing component, wherein when the instructions are executed, the processing component implements the above voice interaction method. By combining semantic analysis and gender classification, the voice interaction method and voice interaction system according to the present invention can tailor replies to the user's gender, improving the user experience and making voice interaction more intelligent.
Other features and advantages of the method and apparatus of the present invention will become more apparent from the accompanying drawings incorporated herein and the following detailed description of specific embodiments, which together serve to explain certain principles of the invention.
Detailed description of the invention
Fig. 1 is a flowchart of a voice interaction method according to an embodiment of the present invention.
Fig. 2 is a schematic flowchart of the gender classification step.
Fig. 3 is a block diagram of a voice interaction system according to an embodiment of the present invention.
Specific embodiment
The following describes some of the various embodiments of the invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delimit the scope of the claims.
First, some terms appearing below are explained.
NLU: natural language understanding;
ASR: automatic speech recognition;
Long short-term memory model (LSTM): a type of deep learning model capable of learning long-term dependencies;
feats: the filter-bank feature parameters of an audio file;
cmvn: statistics of the feature files used for cepstral mean and variance normalization;
gmm-hmm: a conventional acoustic model, namely a hidden Markov model based on Gaussian mixture models.
Fig. 1 is a flowchart of a voice interaction method according to an embodiment of the present invention.
As shown in Fig. 1, the voice interaction method of an embodiment of the present invention includes the following steps:
Input step S100: inputting voice information;
Pre-processing step S200: pre-processing the voice information input in input step S100 and outputting voice segments;
Semantic recognition step S300: performing semantic recognition on the voice segments output by pre-processing step S200 and outputting semantic information;
Gender classification step S400: performing gender classification on the voice segments output by pre-processing step S200, identifying the user's gender and outputting gender information;
Fusion processing step S500: fusing the gender information and the semantic information to obtain a personalized reply to the input voice information; and
Output step S600: outputting the personalized reply. For example, the reply can be output as voice or as text.
Next, pre-processing step S200, gender classification step S400 and fusion processing step S500 are described by way of example. Semantic recognition of the voice segments and output of semantic information in semantic recognition step S300 can use the same technical means as conventional technology, so its description is omitted here.
As an example, in pre-processing step S200 a voice activity detection (VAD, i.e. endpoint detection) algorithm is applied to the input voice information to obtain voice segments. For example, the user's voice information is input into a VAD model, which obtains voice segments through endpoint detection, feature extraction and the like. The obtained voice segments are respectively supplied to the subsequent semantic recognition step S300 and gender classification step S400. The speech recognition task requires retaining the complete text information as far as possible, so its VAD boundary should be more tolerant; the gender classification task requires rejecting all silence as far as possible, so its VAD boundary should be stricter. Therefore, in pre-processing step S200, two different voice segments can optionally be provided to the subsequent semantic recognition step S300 and gender classification step S400, respectively, as sketched below.
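The patent does not disclose VAD code; as an illustration only, the following is a minimal energy-based endpoint detection sketch in Python, assuming normalized float samples of 16 kHz mono audio. The frame length, energy threshold and hangover values are hypothetical parameters, chosen only to show how one tolerant and one strict boundary setting can produce the two kinds of voice segments described above.

```python
import numpy as np

def detect_segments(samples, sr=16000, frame_ms=25,
                    energy_thresh=0.01, hangover_frames=10):
    """Energy-based endpoint detection (illustrative, not the patented VAD).

    A larger hangover keeps more trailing frames (tolerant boundary,
    suited to semantic recognition); a smaller one trims silence
    aggressively (strict boundary, suited to gender classification).
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames.astype(np.float64) ** 2).mean(axis=1)

    segments, start, quiet = [], None, 0
    for i, e in enumerate(energy):
        if e >= energy_thresh:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:  # segment ended hangover frames ago
                segments.append((start * frame_len,
                                 (i - quiet + 1) * frame_len))
                start, quiet = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments

# Tolerant boundary for semantic recognition, strict boundary for gender:
# asr_segments = detect_segments(pcm, hangover_frames=20)
# gender_segments = detect_segments(pcm, hangover_frames=3)
```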
Next, gender classification step S400 is described.
Fig. 2 is a schematic flowchart of gender classification step S400.
As shown in Fig. 2, gender classification step S400 can roughly be divided into a training stage and a recognition stage.
First, the training stage is described.
A training set with gender labels needs to be prepared as training samples, including wav.scp, utt2spk, text, and the gender information corresponding to each utterance. The feats of the training set (that is, the filter-bank feature parameters of the audio files, i.e. the filter-bank output acoustic features in Fig. 2) and the cmvn statistics are extracted in preparation for training the LSTM model. The feats here need to be forced-aligned using a triphone (tri-phone) gmm-hmm model to find the silence boundaries corresponding to the features; the silent portions of the feats are removed, retaining only the voice segments that can discriminate gender (see the feature-extraction sketch below).
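The patent names a Kaldi-style pipeline but gives no extraction code; as a rough illustrative equivalent, torchaudio's Kaldi-compliance frontend can compute comparable filter-bank features. The 40 mel bins and the per-utterance mean normalization below are assumptions standing in for the cmvn step, and the gmm-hmm forced alignment is not reproduced here.

```python
import torchaudio
import torchaudio.compliance.kaldi as kaldi

def extract_feats(wav_path, num_mel_bins=40):
    """Compute Kaldi-style log filter-bank features (illustrative feats).

    Per-utterance mean subtraction stands in for the cmvn statistics
    mentioned above; a real pipeline would also strip silence frames
    using the gmm-hmm forced alignment described in the text.
    """
    waveform, sr = torchaudio.load(wav_path)   # (channels, samples)
    feats = kaldi.fbank(waveform, num_mel_bins=num_mel_bins,
                        sample_frequency=sr)   # (frames, num_mel_bins)
    return feats - feats.mean(dim=0)           # simple mean normalization

# feats = extract_feats('utt001.wav')  # ready as LSTM training input
```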
Since the gender model is a classification model, a label file corresponding to the features (FA in Fig. 2) needs to be constructed. The label file FA likewise covers only the voice segments of the feats: a batch of label files FA reflecting the gender is constructed according to the number of frames of the feats, as sketched below.
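As an illustration of this step, the following sketch builds frame-level gender labels from per-utterance gender tags and feature frame counts. The exact format of the patented FA file is not disclosed, so the dict-based representation and the two-class label scheme (0 = male, 1 = female) are assumptions.

```python
def build_frame_labels(utt2gender, utt2num_frames):
    """Construct frame-level gender labels (illustrative FA construction).

    utt2gender: dict mapping utterance id -> 'm' or 'f' (per-utterance tag)
    utt2num_frames: dict mapping utterance id -> number of feature frames
                    remaining after silence was stripped by forced alignment.
    Returns a dict mapping utterance id -> list of per-frame class ids,
    with 0 = male and 1 = female (assumed label scheme).
    """
    frame_labels = {}
    for utt, gender in utt2gender.items():
        n = utt2num_frames[utt]
        label = 0 if gender == 'm' else 1
        frame_labels[utt] = [label] * n   # one label per retained frame
    return frame_labels

# Example:
# fa = build_frame_labels({'utt001': 'f'}, {'utt001': 312})
# fa['utt001'] -> [1, 1, ..., 1] (312 frames)
```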
The prepared feature file feats and label file FA are input into the LSTM model and trained until convergence. Here, LSTM (Long Short-Term Memory) is a kind of recurrent neural network (RNN). An RNN is a special neural network that calls itself along a time or character sequence; once unrolled by sequence, it becomes an ordinary layered neural network, and it is commonly applied to speech recognition.
Here, the basic parameters used by the LSTM model are as follows:
num-lstm-layers: 1;
cell-dim: 1024;
lstm-delay: -1.
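For illustration, a model with the parameters listed above could be defined as follows in PyTorch. The patent does not name a framework (its terminology suggests a Kaldi-based pipeline), so this is only a sketch under that assumption; the 40-dimensional filter-bank input and the two output classes are assumptions, and lstm-delay: -1 is interpreted here as the one-frame recurrence delay a standard unidirectional LSTM already provides.

```python
import torch
import torch.nn as nn

class GenderLSTM(nn.Module):
    """Illustrative gender classifier with the parameters given above:
    num-lstm-layers = 1, cell-dim = 1024."""

    def __init__(self, feat_dim=40, cell_dim=1024, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=cell_dim,
                            num_layers=1, batch_first=True)
        self.output = nn.Linear(cell_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, frames, feat_dim) filter-bank features
        hidden, _ = self.lstm(feats)
        logits = self.output(hidden)           # per-frame class scores
        return torch.log_softmax(logits, dim=-1)

# Training loop sketch: per-frame cross-entropy against the FA labels.
# model = GenderLSTM()
# loss = nn.NLLLoss()(model(feats).flatten(0, 1), labels.flatten())
```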
Next, the recognition stage is described.
First, feature extraction needs to be performed. When a user speaks, the voice information is first detected using the endpoint detection algorithm (VAD), and feature extraction is performed on the non-silence speech frames detected by VAD. Since the LSTM model depends on past time steps, a buffer can be set up to accumulate features.
Then, forward computation is performed. A feature matrix of a certain length is fed into the LSTM model, and the forward computation yields the posterior probabilities of the gender classes. Here, the posterior probability is the probability revised after the "result" information is obtained; it corresponds to reasoning from effect back to cause. Before an event occurs, the probability of its occurrence is the prior probability; after it has occurred, the probability that it was caused by some factor is the posterior probability.
Finally, posterior processing is performed. A time threshold T is set by trial and error; the posterior probabilities accumulated over a duration T are compared, and the class with the larger probability is taken as the gender classification result for the input audio. The time threshold T can take values such as 0.5 s or 1 s. T should not be set too long, because that requires more data and lowers the real-time responsiveness of recognition; it also should not be set too short, or the accuracy may not be high enough. A sketch of this decision rule follows.
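A minimal sketch of this posterior-accumulation decision, assuming the GenderLSTM model above, a 10 ms frame hop (so T = 1 s corresponds to 100 frames), and the assumed 0 = male / 1 = female label scheme; the frame hop and helper names are hypothetical.

```python
import torch

def classify_gender(model, feats, threshold_sec=1.0, frame_hop_ms=10):
    """Accumulate per-frame posteriors over a time threshold T and return
    the class with the larger accumulated probability (illustrative)."""
    n_frames = int(threshold_sec * 1000 / frame_hop_ms)
    with torch.no_grad():
        log_post = model(feats.unsqueeze(0))[0]   # (frames, 2)
        post = log_post.exp()[:n_frames]          # first T seconds only
        accumulated = post.sum(dim=0)             # per-class total
    return 'male' if accumulated[0] > accumulated[1] else 'female'

# gender = classify_gender(model, fbank_feats)  # fbank_feats: (frames, 40)
```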
In this way, semantic recognition step S300 performs semantic recognition on the voice segments and outputs semantic information, while gender classification step S400 performs gender classification on the voice segments, identifies the user's gender and outputs gender information. Fusion processing step S500 then fuses the recognized gender information and semantic information to obtain a personalized reply to the input voice information. In some examples of the application, the "fusion" mentioned in step S500 can be understood as taking the gender information obtained in step S400 into account when producing the voice interaction reply, so that the reply is more targeted or more appropriate, as in the examples given below. Other uses of the gender information from step S400 are not excluded.
For example, when the voice input by the user is "Good morning!", if gender classification step S400 identifies a male, the system outputs "Good morning, sir!"; if it identifies a female, the system outputs "Good morning, madam!". When the voice input by the user is "Do you think I am good-looking?", if step S400 identifies a male, the system outputs "Of course, you are a handsome guy!"; if it identifies a female, the system outputs "You bet, you are a great beauty!". When the voice input by the user is "What time is it now?", if step S400 identifies a male, the system outputs "Sir, it is 3 p.m. now"; if it identifies a female, the system outputs "Madam, it is 3 p.m. now". A sketch of such a fusion rule follows.
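For illustration, the fusion of step S500 could be as simple as a lookup keyed by recognized intent and gender. The intent names and reply table below are hypothetical, reconstructed from the examples in the previous paragraph; the patent does not specify the fusion mechanism.

```python
# Hypothetical reply table keyed by (intent, gender), following the
# examples above; a real system would fill the time slot dynamically.
REPLIES = {
    ('greet_morning', 'male'):   'Good morning, sir!',
    ('greet_morning', 'female'): 'Good morning, madam!',
    ('ask_looks',     'male'):   'Of course, you are a handsome guy!',
    ('ask_looks',     'female'): 'You bet, you are a great beauty!',
    ('ask_time',      'male'):   'Sir, it is {time} now',
    ('ask_time',      'female'): 'Madam, it is {time} now',
}

def fuse(intent, gender, slots=None):
    """Fusion processing step S500 (illustrative): combine the semantic
    information (intent) with the gender information to pick a reply."""
    template = REPLIES.get((intent, gender), 'Sorry, I did not catch that.')
    return template.format(**(slots or {}))

# fuse('ask_time', 'female', {'time': '3 p.m.'})
# -> 'Madam, it is 3 p.m. now'
```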
The embodiments of the voice interaction method of the invention have been described above. Next, the voice interaction system of the invention is described.
Fig. 3 is a block diagram of a voice interaction system according to an embodiment of the present invention.
As shown in Fig. 3, the voice interaction system of an embodiment of the present invention includes:
an input module 100 for inputting voice information;
a pre-processing module 200 for receiving the voice information, pre-processing it and outputting voice segments;
a gender classification module 300 for performing gender classification on the voice segments output by the pre-processing module, identifying the user's gender and outputting gender information;
a semantic recognition module 400 for performing semantic recognition on the voice segments output by the pre-processing module and outputting semantic information;
a fusion processing module 500 for fusing the gender information and the semantic information to obtain a personalized reply to the voice information; and
an output module 600 for outputting the personalized reply by voice.
In the pre-processing module 200, voice segments are detected from the input voice information using the endpoint detection algorithm (VAD). Specifically, the pre-processing module 200 uses the endpoint detection algorithm to detect voice segments in the input voice information and outputs a first voice segment supplied to the gender classification module 300 and a second voice segment supplied to the semantic recognition module 400. Since the gender classification module requires rejecting all silent portions as far as possible, its VAD boundary should be stricter, while the semantic recognition module 400 requires retaining the complete text information as far as possible, so its VAD boundary should be more tolerant; therefore, the endpoint detection boundary of the first voice segment is stricter than that of the second voice segment.
The gender classification module 300 includes:
a model training submodule 310 for training an LSTM model based on filter-bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification submodule 320 for inputting the voice segments into the trained LSTM model and outputting a gender classification.
Based on a training set with gender labels, the model training submodule 310 extracts the filter-bank output acoustic features of the training set, constructs the corresponding label file FA, and inputs the filter-bank output acoustic features and the label file into the LSTM model for training until the model converges. The gender classification submodule 320 inputs the voice segments into the trained LSTM model, obtains the posterior probabilities of the gender classes through forward computation, and accumulates the posterior probabilities over a prescribed duration to obtain the gender classification result.
The voice interaction method described in any of the above examples can be applied to a vehicle, and the voice interaction system described in any of the above examples can be applied to a vehicle, for example presented as part of a vehicle control method or vehicle control system.
The present invention also provides a voice interaction device capable of performing the voice interaction method described in any example above, or comprising the voice interaction system described in any example above. The voice interaction device can be implemented as a separate component and can be provided in a vehicle, for example so that occupants can interact with it by voice. Here, the voice interaction device can be a device fixed in the vehicle, or a device that can be taken out of and put back into the vehicle. Further, in some examples, the voice interaction device can communicate with the electronic control system in the vehicle. In some cases, the voice interaction device can be realized in an existing electronic component of the vehicle, such as the vehicle's infotainment system.
The present invention also provides a controller comprising a storage component, a processing component, and instructions stored on the storage component and executable by the processing component, wherein when the instructions are executed, the processing component implements the above voice interaction method.
By combining semantic analysis and gender classification, the voice interaction method and voice interaction system of each example according to the present invention can tailor replies to the user's gender, improving the user experience and making voice interaction more intelligent.
The examples above mainly describe the voice interaction method and voice interaction system of the invention. Although only some specific embodiments of the invention have been described, those of ordinary skill in the art will appreciate that the invention can be implemented in many other forms without departing from its spirit and scope. Therefore, the examples and embodiments shown are to be regarded as illustrative rather than restrictive, and the invention may cover various modifications and replacements without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (16)

1. A voice interaction method, characterized by comprising:
a pre-processing step of pre-processing input voice information and outputting voice segments;
a semantic recognition step of performing semantic recognition on the voice segments output by the pre-processing step and outputting semantic information;
a gender classification step of identifying the user's gender from the voice segments output by the pre-processing step and outputting gender information; and
a fusion processing step of fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
2. The voice interaction method according to claim 1, characterized in that the gender classification step includes:
a model training sub-step of training a long short-term memory (LSTM) model based on filter-bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification sub-step of inputting the voice segments into the trained LSTM model and outputting a gender classification.
3. The voice interaction method according to claim 1, characterized in that
in the pre-processing step, voice segments are detected from the input voice information using an endpoint detection algorithm.
4. The voice interaction method according to claim 3, characterized in that
in the pre-processing step, an endpoint detection algorithm is used to detect voice segments in the input voice information and to output a first voice segment supplied to the semantic recognition step and a second voice segment supplied to the gender classification step,
wherein the endpoint detection boundary of the second voice segment is stricter than that of the first voice segment.
5. The voice interaction method according to claim 2, characterized in that the model training sub-step includes:
preparing a training set with gender labels;
extracting the filter-bank output acoustic features of the training set;
constructing a label file corresponding to the filter-bank output acoustic features; and
inputting the filter-bank output acoustic features and the label file into the LSTM model and training until the model converges.
6. The voice interaction method according to claim 2, characterized in that the gender classification sub-step includes:
inputting the voice segments into the trained LSTM model;
performing forward computation to obtain the posterior probabilities of the gender classes; and
accumulating the posterior probabilities over a prescribed duration to obtain the gender classification result.
7. A voice interaction system, characterized by comprising:
a pre-processing module for pre-processing input voice information and outputting voice segments;
a semantic recognition module for performing semantic recognition on the voice segments output by the pre-processing module and outputting semantic information;
a gender classification module for performing gender classification on the voice segments output by the pre-processing module, identifying the user's gender and outputting gender information; and
a fusion processing module for fusing the gender information and the semantic information to obtain a personalized reply to the voice information.
8. The voice interaction system according to claim 7, characterized in that the gender classification module includes:
a model training submodule for training an LSTM model based on filter-bank output acoustic features and pre-labeled gender information to obtain a trained LSTM model; and
a gender classification submodule for inputting the voice segments into the trained LSTM model and outputting a gender classification.
9. The voice interaction system according to claim 7, characterized in that
in the pre-processing module, voice segments are detected from the input voice information using an endpoint detection algorithm.
10. The voice interaction system according to claim 9, characterized in that
the pre-processing module uses an endpoint detection algorithm to detect voice segments in the input voice information and outputs a first voice segment supplied to the semantic recognition module and a second voice segment supplied to the gender classification module,
wherein the endpoint detection boundary of the second voice segment is stricter than that of the first voice segment.
11. The voice interaction system according to claim 8, characterized in that
based on a training set with gender labels, the model training submodule extracts the filter-bank output acoustic features of the training set, constructs a label file corresponding to the filter-bank output acoustic features, and inputs the filter-bank output acoustic features and the label file into the LSTM model for training until the model converges.
12. The voice interaction system according to claim 8, characterized in that the gender classification submodule inputs the voice segments into the trained LSTM model, obtains the posterior probabilities of the gender classes through forward computation, and accumulates the posterior probabilities over a prescribed duration to obtain the gender classification result.
13. voice interactive method or any one of such as claim 7 to 12 as described in claims 1 to 6 kind any one The voice interactive system is applied to vehicle.
14. A voice interaction device capable of performing the voice interaction method according to any one of claims 1 to 6, or comprising the voice interaction system according to any one of claims 7 to 12.
15. The voice interaction device according to claim 14, provided in a vehicle.
16. A controller comprising a storage component, a processing component, and instructions stored on the storage component and executable by the processing component, characterized in that when the instructions are executed, the processing component implements the voice interaction method according to any one of claims 1 to 6.
CN201810473045.9A 2018-05-17 2018-05-17 Voice interaction method and voice interaction system Active CN110503943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810473045.9A CN110503943B (en) 2018-05-17 2018-05-17 Voice interaction method and voice interaction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810473045.9A CN110503943B (en) 2018-05-17 2018-05-17 Voice interaction method and voice interaction system

Publications (2)

Publication Number Publication Date
CN110503943A true CN110503943A (en) 2019-11-26
CN110503943B CN110503943B (en) 2023-09-19

Family

ID=68583957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810473045.9A Active CN110503943B (en) 2018-05-17 2018-05-17 Voice interaction method and voice interaction system

Country Status (1)

Country Link
CN (1) CN110503943B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140163984A1 (en) * 2012-12-10 2014-06-12 Lenovo (Beijing) Co., Ltd. Method Of Voice Recognition And Electronic Apparatus
CN105700682A (en) * 2016-01-08 2016-06-22 北京乐驾科技有限公司 Intelligent gender and emotion recognition detection system and method based on vision and voice
CN107305541A (en) * 2016-04-20 2017-10-31 科大讯飞股份有限公司 Speech recognition text segmentation method and device
CN107146615A (en) * 2017-05-16 2017-09-08 南京理工大学 Audio recognition method and system based on the secondary identification of Matching Model
CN107799126A (en) * 2017-10-16 2018-03-13 深圳狗尾草智能科技有限公司 Sound end detecting method and device based on Supervised machine learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883133A (en) * 2020-07-20 2020-11-03 深圳乐信软件技术有限公司 Customer service voice recognition method, customer service voice recognition device, customer service voice recognition server and storage medium
CN111883133B (en) * 2020-07-20 2023-08-29 深圳乐信软件技术有限公司 Customer service voice recognition method, customer service voice recognition device, server and storage medium
CN112397067A (en) * 2020-11-13 2021-02-23 重庆长安工业(集团)有限责任公司 Voice control terminal of weapon equipment
CN113870861A (en) * 2021-09-10 2021-12-31 Oppo广东移动通信有限公司 Voice interaction method and device, storage medium and terminal
CN116092056A (en) * 2023-03-06 2023-05-09 安徽蔚来智驾科技有限公司 Target recognition method, vehicle control method, device, medium and vehicle

Also Published As

Publication number Publication date
CN110503943B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
CN110838289B (en) Wake-up word detection method, device, equipment and medium based on artificial intelligence
EP3424044B1 (en) Modular deep learning model
CN110503943A (en) A kind of voice interactive method and voice interactive system
CN110570873B (en) Voiceprint wake-up method and device, computer equipment and storage medium
CN106504768B (en) Phone testing audio frequency classification method and device based on artificial intelligence
CN108564940A (en) Audio recognition method, server and computer readable storage medium
CN108447471A (en) Audio recognition method and speech recognition equipment
CN108885870A (en) For by combining speech to TEXT system with speech to intention system the system and method to realize voice user interface
US20210174805A1 (en) Voice user interface
CN107731233A (en) A kind of method for recognizing sound-groove based on RNN
CN109313892A (en) Steady language identification method and system
CN110570853A (en) Intention recognition method and device based on voice data
CN109036393A (en) Wake-up word training method, device and the household appliance of household appliance
CN110648659A (en) Voice recognition and keyword detection device and method based on multitask model
CN113255362B (en) Method and device for filtering and identifying human voice, electronic device and storage medium
Maheswari et al. A hybrid model of neural network approach for speaker independent word recognition
Gupta et al. Speech emotion recognition using SVM with thresholding fusion
CN113129867A (en) Training method of voice recognition model, voice recognition method, device and equipment
CN111199149A (en) Intelligent statement clarifying method and system for dialog system
CN110931018A (en) Intelligent voice interaction method and device and computer readable storage medium
CN109104534A (en) A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
CN109545202A (en) Method and system for adjusting corpus with semantic logic confusion
CN113362815A (en) Voice interaction method, system, electronic equipment and storage medium
CN105869622B (en) Chinese hot word detection method and device
CN109074809B (en) Information processing apparatus, information processing method, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200806

Address after: Susong Road West and Shenzhen Road North, Hefei Economic and Technological Development Zone, Anhui Province

Applicant after: Weilai (Anhui) Holding Co.,Ltd.

Address before: 30 Floor of Yihe Building, No. 1 Kangle Plaza, Central, Hong Kong, China

Applicant before: NIO NEXTEV Ltd.

GR01 Patent grant