CN110364156A - Voice interaction method, system, terminal and readable storage medium - Google Patents
- Publication number: CN110364156A
- Application number: CN201910737612.1A
- Authority
- CN
- China
- Prior art keywords
- voice
- voice signal
- signal
- target speaker
- keywords
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Abstract
The invention discloses a voice interaction method, system, terminal, and readable storage medium. The method includes: acquiring a first voice signal from the terminal's surrounding environment; judging whether the first voice signal contains a preset keyword; if it does, determining the position of the target speaker according to the preset keyword and extracting the target speaker's first voiceprint feature; continuing to acquire a second voice signal from the surrounding environment according to the target speaker's position; extracting a second voiceprint feature from the second voice signal; judging whether the second voiceprint feature matches the first voiceprint feature; and, if it matches, extracting a control command from the second voice signal and executing the corresponding operation according to the control command. In this way, interference from other users' voices or from noise can be excluded, the recognition accuracy of intelligent voice interaction products is effectively improved, and human-computer interaction becomes more intelligent.
Description
Technical field
The present invention relates to the field of speech signal processing, and in particular to a voice interaction method, system, terminal, and readable storage medium.
Background technique
With the development of information technology, more and more intelligent voice products have appeared on the market, and they have become everyday tools for interactive communication. When interacting with an intelligent voice product, the user first wakes it up by saying a preset keyword and then speaks a voice control command; the product carries out the corresponding operation according to that command.
In practice, however, the acoustic environment is usually complex: background noise sources include fan noise and the voices of other users. These interfering signals not only degrade ordinary calls, but also lower the recognition rate when an intelligent voice product performs speech recognition, making the product difficult and uncomfortable to use. If several users speak at the same time, speech recognition is severely disturbed, causing recognition errors or no response at all, which degrades the user experience.
Summary of the invention
The main purpose of the present invention is to provide a voice interaction method, system, terminal, and readable storage medium, aiming to solve the technical problem of the low recognition rate of intelligent voice products during interaction.
To achieve the above object, an embodiment of the present invention provides a voice interaction method, which includes:
acquiring a first voice signal from the terminal's surrounding environment;
judging whether the first voice signal contains a preset keyword;
if the first voice signal contains the preset keyword, determining the position of the target speaker according to the preset keyword, and extracting a first voiceprint feature of the target speaker;
continuing to acquire a second voice signal from the terminal's surrounding environment according to the position of the target speaker, and extracting a second voiceprint feature from the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if it matches, extracting a control command from the second voice signal, and executing the corresponding operation according to the control command.
Further, the step of acquiring the first voice signal from the terminal's surrounding environment includes:
the terminal is provided with multiple microphones, which are arranged at different positions and are each used to acquire the first voice signal from the terminal's surrounding environment.
Further, if the first voice signal contains the preset keyword, the step of determining the position of the target speaker according to the preset keyword includes:
obtaining multiple first voice signals;
calculating the voice intensity of the preset keyword in each of the multiple first voice signals;
according to these voice intensities, marking the first voice signal that contains the preset keyword with the greatest voice intensity.
Further, the step of continuing to acquire the second voice signal from the terminal's surrounding environment according to the position of the target speaker includes:
determining, according to the position of the target speaker, the microphone that meets a preset condition, and acquiring the second voice signal with that microphone.
Further, the step of determining the microphone that meets the preset condition according to the position of the target speaker and acquiring the second voice signal includes:
acquiring the second voice signal by the single microphone closest to the position of the target speaker.
Further, after the step of acquiring the second voice signal by the single microphone closest to the position of the target speaker, the method includes:
denoising the acquired second voice signal.
Further, the step of denoising the acquired second voice signal includes:
converting the second voice signal into a frequency-domain signal;
calculating optimization parameters of the frequency-domain signal, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter is the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain is the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio;
optimizing the frequency-domain signal according to the optimization parameters to obtain the denoised voice signal.
The present invention also provides a voice interaction system, which includes:
an acquisition module, for acquiring a first voice signal from the terminal's surrounding environment;
a first judgment module, for judging whether the first voice signal contains a preset keyword;
a determining module, for determining the position of the target speaker according to the preset keyword if the first voice signal contains the preset keyword, and extracting a first voiceprint feature of the target speaker;
a continued-acquisition module, for continuing to acquire a second voice signal from the terminal's surrounding environment according to the position of the target speaker;
a second judgment module, for judging whether a second voiceprint feature of the second voice signal matches the first voiceprint feature;
an extraction module, for extracting, if they match, a control command from the second voice signal and executing the corresponding operation according to the control command.
The present invention also provides a terminal, which includes a memory, a processor, and a voice interaction program stored on the memory and runnable on the processor; when the voice interaction program is executed by the processor, the steps of the voice interaction method described above are realized.
The present invention also provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the voice interaction method described above are realized.
The present invention acquires a first voice signal from the terminal's surrounding environment through multiple microphones; judges whether the first voice signal contains a preset keyword; if it does, determines the position of the target speaker according to the preset keyword and extracts a first voiceprint feature of the target speaker; continues to acquire a second voice signal from the surrounding environment according to the position of the target speaker; extracts a second voiceprint feature from the second voice signal; judges whether the second voiceprint feature matches the first voiceprint feature; if it matches, extracts a control command from the second voice signal and executes the corresponding operation; if it does not match, the terminal makes no response. By further judging the voiceprint feature from the direction of the target speaker, interference from other users' voices or from noise can be excluded, the recognition accuracy of intelligent voice interaction products is effectively improved, and human-computer interaction becomes more intelligent.
Detailed description of the invention
The drawings herein are incorporated into and form part of this specification; they illustrate embodiments consistent with the present invention and, together with the specification, serve to explain its principles.
In order to explain the embodiments of the present invention or the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without any creative labor.
Fig. 1 is a hardware structure diagram of an embodiment of a terminal provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of an embodiment of the voice interaction method of the present invention;
Fig. 3 is a circuit schematic diagram of an embodiment of the voice interaction system of the present invention.
The realization of the objects, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are only meant to facilitate the explanation of the present invention and have no specific meaning in themselves; therefore, "module", "component", and "unit" can be used interchangeably.
As shown in Figure 1, Fig. 1 is a structural schematic diagram of the terminal hardware on which embodiments of the present invention run.
The terminal of the embodiments of the present invention may be a PC, or a terminal device with a voice interaction function such as a smartphone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, or portable computer.
As shown in Figure 1, the terminal may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The terminal may be provided with devices such as multiple microphones that remain in a voice-signal acquisition state and acquire the user's voice signal in real time. The communication bus 1002 realizes the connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and optionally may also include standard wired and wireless interfaces. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a magnetic disk storage, and may optionally be a storage system independent of the aforementioned processor 1001.
Optionally, the terminal may also include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors include, for example, an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display according to the ambient light, and the proximity sensor can turn off the display and/or backlight when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally on three axes) and, when stationary, the magnitude and direction of gravity; it can be used for applications that identify the posture of the mobile terminal (such as horizontal/vertical screen switching, related games, and magnetometer posture calibration) and for vibration-recognition functions (such as pedometers and tapping). The mobile terminal can of course also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described here.
Those skilled in the art will understand that the terminal structure shown in Fig. 1 does not limit the terminal; the terminal may include more or fewer components than illustrated, combine certain components, or adopt a different component layout.
As shown in Figure 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a voice interaction program.
In the terminal shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with the client; and the processor 1001 can be used to call the voice interaction program stored in the memory 1005 and execute the following operations:
acquiring a first voice signal from the terminal's surrounding environment;
judging whether the first voice signal contains a preset keyword;
if the first voice signal contains the preset keyword, determining the position of the target speaker according to the preset keyword, and extracting a first voiceprint feature of the target speaker;
continuing to acquire a second voice signal from the terminal's surrounding environment according to the position of the target speaker, and extracting a second voiceprint feature from the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if it matches, extracting a control command from the second voice signal, and executing the corresponding operation according to the control command.
Further, the terminal is provided with multiple microphones, which are arranged at different positions and are each used to acquire the first voice signal from the terminal's surrounding environment.
Further, multiple first voice signals are obtained; the voice intensity of the preset keyword in each of the multiple first voice signals is calculated; and, according to these voice intensities, the first voice signal containing the preset keyword with the greatest voice intensity is marked.
Further, according to the position of the target speaker, the microphone that meets a preset condition is determined and used to acquire the second voice signal.
Further, the second voice signal is acquired by the single microphone closest to the position of the target speaker.
Further, the processor 1001 can call the voice interaction program stored in the memory 1005 and also execute the following operation: denoising the acquired second voice signal.
Further, the second voice signal is converted into a frequency-domain signal; optimization parameters of the frequency-domain signal are calculated, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter is the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain is the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio; and the frequency-domain signal is optimized according to the optimization parameters to obtain the denoised voice signal.
Based on the above terminal hardware structure, the method embodiments of the present invention are proposed.
The present invention provides a voice interaction method. In one embodiment of the voice interaction method, referring to Fig. 2, the method includes:
Step S10, acquiring a first voice signal from the terminal's surrounding environment;
The terminal acquires a first voice signal from its surrounding environment. The terminal may be provided with devices such as multiple microphones that remain in a voice-signal acquisition state and acquire the user's voice signal in real time.
Step S20, judging whether the first voice signal contains a preset keyword;
The terminal judges whether the first voice signal contains a preset keyword. The terminal collects the user's first voice signal, performs speech recognition on it, and obtains the corresponding text content; it then searches the text content for the preset keyword, judging whether the text contains a first word or first sentence identical or similar to the preset keyword. If such a first word or first sentence exists in the text content, it is determined that the first voice signal contains the preset keyword. Specifically, the preset keyword may be set by the vendor, or configured by the user in the terminal's application interface. For example, the preset keyword may be "Little Bear, Little Bear".
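The keyword check just described (recognize the speech to text, then look for a word or phrase identical or similar to the preset keyword) can be sketched as follows. This is only an illustrative sketch, not the patent's algorithm: the sliding-window comparison, the `difflib` similarity ratio, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher

def contains_keyword(text: str, keyword: str, threshold: float = 0.8) -> bool:
    """Return True if any window of the transcribed text is identical or
    similar enough to the preset wake-up keyword."""
    words = text.lower().split()
    k = len(keyword.split())
    for i in range(len(words) - k + 1):
        window = " ".join(words[i:i + k])
        if SequenceMatcher(None, window, keyword.lower()).ratio() >= threshold:
            return True
    return False
```

With a wake-up phrase such as "little bear", `contains_keyword("hey little bear play music", "little bear")` would report a match, while unrelated speech would not.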
Step S30, if the first voice signal contains the preset keyword, determining the position of the target speaker according to the preset keyword, and extracting a first voiceprint feature of the target speaker;
If the first voice signal contains the preset keyword, the terminal determines the position of the target speaker according to the preset keyword and extracts the target speaker's first voiceprint feature. Specifically, speech recognition is performed on the first voice signal; if it contains the preset keyword, the position of the target speaker is determined and the terminal switches from the dormant state to the awake state so that subsequent voice interaction with the user can take place, and voiceprint recognition is performed on the user's first voice signal to obtain the user's first voiceprint feature.
Step S40, continuing to acquire a second voice signal from the terminal's surrounding environment according to the position of the target speaker, and extracting a second voiceprint feature from the second voice signal;
The terminal continues to acquire a second voice signal from its surrounding environment according to the position of the target speaker, and extracts the second voiceprint feature from the second voice signal. When the terminal is in the awake state, it continues to acquire the second voice signal from the surrounding environment and extracts the voiceprint feature from the direction of the target speaker. The second voice signal may contain a control command.
Step S50, judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
The terminal judges whether the second voiceprint feature of the second voice signal matches the first voiceprint feature. That is, the terminal continues to acquire the second voice signal from the direction of the target speaker and further judges whether it matches the first voiceprint feature of the target speaker who woke the terminal.
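The patent does not specify how the two voiceprint features are compared. A common approach, shown here purely as an assumed illustration, represents each voiceprint as a fixed-length embedding vector and declares a match when the cosine similarity exceeds a threshold; the embedding representation and the 0.75 threshold are both assumptions, not the patent's method.

```python
import numpy as np

def voiceprints_match(emb1: np.ndarray, emb2: np.ndarray,
                      threshold: float = 0.75) -> bool:
    """Compare two voiceprint embedding vectors by cosine similarity."""
    sim = float(np.dot(emb1, emb2) /
                (np.linalg.norm(emb1) * np.linalg.norm(emb2)))
    return sim >= threshold
```

Under such a scheme, the first voiceprint feature extracted at wake-up would be cached, and each later utterance would be accepted only if its embedding is sufficiently close.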
Step S60, if they match, extracting a control command from the second voice signal and executing the corresponding operation according to the control command.
If the second voiceprint feature of the second voice signal matches the first voiceprint feature, the terminal extracts the control command from the second voice signal and executes the corresponding operation according to the control command; if they do not match, the terminal makes no response. Specifically, once the match is confirmed, the terminal continues to collect the target speaker's second voice signal, obtains its corresponding text content, extracts the control command from the text, and, according to the control command, executes the corresponding intelligent voice interaction with the target user, for example answering a question or performing the corresponding action.
The present invention acquires a first voice signal from the terminal's surrounding environment; judges whether the first voice signal contains a preset keyword; if it does, determines the position of the target speaker according to the preset keyword and extracts a first voiceprint feature of the target speaker; continues to acquire a second voice signal from the surrounding environment according to the position of the target speaker; extracts a second voiceprint feature from the second voice signal; judges whether the second voiceprint feature matches the first voiceprint feature; if it matches, extracts a control command from the second voice signal and executes the corresponding operation according to the control command; if it does not match, the terminal makes no response. By further judging the voiceprint feature from the direction of the target speaker, interference from other users' voices or from noise can be excluded, the recognition accuracy of intelligent voice interaction products is effectively improved, and human-computer interaction becomes more intelligent.
Optionally, step S10 may specifically include the following step:
Step S11, the terminal is provided with multiple microphones, which are arranged at different positions and are each used to acquire the first voice signal from the terminal's surrounding environment.
The terminal may be provided with multiple microphones arranged at different positions, each used to acquire the first voice signal from the surrounding environment. The microphones may be laid out as a circular array, a rectangular or square array, and so on; they may be distributed uniformly, or non-uniformly according to the actual situation. For example, if it is known in advance that target speakers are more likely to appear in certain places, more microphones can be placed at the corresponding positions and fewer elsewhere, so as to enhance the reception of voice signals from specific directions.
In some embodiments, the terminal can determine from daily usage data which microphones are closer to the user, turn off the more distant ones, and upload the distribution data of the closed microphones so as to improve the microphone layout of subsequent terminals. Acquiring the user's voice signal with the closer microphones and turning off the more distant ones saves resources and therefore electricity.
Optionally, step S20 may specifically include the following steps:
Step S21, obtaining multiple first voice signals;
Step S22, calculating the voice intensity of the preset keyword in each of the multiple first voice signals;
Step S23, according to these voice intensities, marking the first voice signal containing the preset keyword with the greatest voice intensity.
The terminal obtains multiple first voice signals, calculates the voice intensity of the preset keyword in each of them, and, according to these intensities, marks the first voice signal containing the preset keyword with the greatest voice intensity; this marks the first voice signal whose microphone position is closest to the position of the target speaker. Specifically, the signals whose voice intensity exceeds a preset intensity threshold are screened out, the screened-out first voice signal containing the preset keyword with the greatest voice intensity is marked, and the position of the target speaker is then determined from the preset keyword.
Optionally, step S40 may specifically include the following step:
Step S41, according to the position of the target speaker, determining the microphone that meets a preset condition and acquiring the second voice signal with it.
According to the position of the target speaker, the terminal determines the microphone that meets the preset condition and uses it to acquire the second voice signal. That is, the terminal sets a preset condition to screen out the qualifying microphone and acquires the user's second voice signal with it.
Further, in another embodiment of the voice interaction method of the present invention, after step S41, the method includes:
Step S411, acquiring the second voice signal by the single microphone closest to the position of the target speaker.
The terminal acquires the second voice signal by the single microphone closest to the position of the target speaker and turns the other microphones off. Specifically, according to the position of the target speaker, the terminal estimates the acoustic energy that each of the multiple microphones receives from the target speaker, obtains the arrival-time differences and the wave amplitude of the target speaker's sound at each microphone, and, from these amplitudes and arrival-time differences, identifies the single microphone closest to the position of the target speaker; this microphone is then used to acquire the second voice signal.
Further, the terminal can record the data of the single microphone closest to the position of the target speaker, turn the other microphones off, and upload the distribution data of the closed microphones so as to improve the microphone layout of subsequent terminals. Acquiring the user's voice signal with the closest microphone and turning off the more distant ones saves resources and therefore electricity.
In this embodiment, acquiring the second voice signal by the single microphone closest to the position of the target speaker avoids wasting multiple microphones on the acquisition, and interference from the other microphones is excluded: speech recognition is performed only on the one microphone closest to the target speaker, which improves the speech recognition result and makes human-computer interaction more intelligent.
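The selection in step S411 (pick the single microphone from wave amplitudes and arrival-time differences) might be sketched as follows. The patent gives no formulas, so the peak-amplitude criterion with a cross-correlation delay tie-breaker is an assumption made here for illustration only.

```python
import numpy as np

def closest_microphone(channels: list[np.ndarray]) -> int:
    """Pick the microphone presumed closest to the speaker: the channel
    with the largest peak amplitude, breaking ties by earliest arrival
    relative to channel 0 (delay estimated by cross-correlation)."""
    amplitudes = [float(np.max(np.abs(c))) for c in channels]
    ref = channels[0]
    delays = []
    for c in channels:
        corr = np.correlate(c, ref, mode="full")
        delays.append(int(np.argmax(corr)) - (len(ref) - 1))
    # Larger amplitude first; smaller (earlier) delay breaks ties.
    order = sorted(range(len(channels)),
                   key=lambda i: (-amplitudes[i], delays[i]))
    return order[0]
```

The chosen channel would then be kept open for acquiring the second voice signal while the remaining microphones are turned off.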
Further, in another embodiment of the voice interaction method of the present invention, after step S411 the method comprises:
Step A: performing denoising processing on the collected second voice signal.
The terminal denoises the collected second voice signal to obtain a denoised second voice signal, extracts the second voiceprint feature of the second voice signal, determines that the second voiceprint feature of the second voice signal matches the first voiceprint feature, and then performs speech recognition on the denoised second voice signal. Denoising a voice signal is a technique well within the ability of those skilled in the art and is not described further here.
In this embodiment, the denoised second voice signal is obtained, the second voiceprint feature of the second voice signal is extracted and determined to match the first voiceprint feature, and speech recognition is then performed on the denoised second voice signal. This improves the terminal's recognition rate and, in turn, the accuracy of the terminal's recognition.
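The patent does not specify how the second voiceprint feature is compared with the first. A common approach represents each voiceprint as a fixed-length feature vector and compares the two by cosine similarity; the sketch below follows that approach, and the 0.8 threshold is an illustrative assumption, not a value from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def voiceprints_match(first_print, second_print, threshold=0.8):
    """True if the second voiceprint matches the first closely enough
    to attribute both voice signals to the same (target) speaker."""
    return cosine_similarity(first_print, second_print) >= threshold
```

Identical vectors score 1.0 and trivially match, while orthogonal vectors score 0.0 and are rejected, so the threshold controls how strictly the second speaker must resemble the first.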
Optionally, the step of performing denoising processing on the collected second voice signal specifically includes:
converting the second voice signal into a frequency-domain signal;
calculating optimization parameters of the frequency-domain signal, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter refers to the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain refers to the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio; and
optimizing the frequency-domain signal according to the optimization parameters to obtain the denoised voice signal.
In practical applications, since a voice signal is a broadband signal and different frequency components need to be processed separately, the time-domain signal collected by the microphone must first be converted into a frequency-domain signal. The terminal then calculates the optimization parameters of the frequency-domain signal, namely the directivity parameter and the white noise gain defined above, and optimizes the frequency-domain signal according to these parameters to obtain the denoised voice signal.
In one embodiment, as shown in Fig. 3, Fig. 3 is a schematic structural diagram of an embodiment of a voice interaction system 60 of the present invention, comprising: an acquisition module 61, a first judgment module 62, a determining module 63, a continued-acquisition module 64, a second judgment module 65 and an extraction module 66, wherein:
the acquisition module 61 is configured to collect a first voice signal of the terminal's surrounding environment;
the first judgment module 62 is configured to judge whether the first voice signal includes default key words;
the determining module 63 is configured to, if the first voice signal includes the default key words, determine the position of the target speaker according to the default key words and extract the first voiceprint feature of the target speaker;
the continued-acquisition module 64 is configured to continue collecting a second voice signal of the terminal's surrounding environment according to the position of the target speaker;
the second judgment module 65 is configured to judge whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
the extraction module 66 is configured to, if they match, extract a control command from the second voice signal and execute a corresponding operation according to the control command.
For specific limitations of the voice interaction system, reference may be made to the limitations of the voice interaction method above, which are not repeated here. Each module in the above voice interaction system may be implemented wholly or partly by software, by hardware, or by a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them to perform the operations corresponding to each module.
In addition, an embodiment of the present invention further proposes a readable storage medium (i.e., a computer-readable memory). A voice interaction program is stored on the readable storage medium, and the following operations are implemented when the voice interaction program is executed by a processor:
collecting a first voice signal of the terminal's surrounding environment;
judging whether the first voice signal includes default key words;
if the first voice signal includes the default key words, determining the position of the target speaker according to the default key words, and extracting the first voiceprint feature of the target speaker;
continuing to collect a second voice signal of the terminal's surrounding environment according to the position of the target speaker, and extracting the second voiceprint feature of the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if they match, extracting a control command from the second voice signal, and executing a corresponding operation according to the control command.
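The processor-executed operations listed above can be sketched as a single pass of an interaction loop. Every callable passed in (keyword detection, speaker localisation, voiceprint extraction and matching, command parsing, execution) stands in for a component the patent leaves unspecified, so the names and signatures here are assumptions for illustration only.

```python
def voice_interaction_step(first_signal, capture_second, contains_keyword,
                           locate_speaker, extract_voiceprint,
                           voiceprints_match, parse_command, execute):
    """One pass of the claimed flow; returns None when no key words are
    heard or when the second speaker's voiceprint does not match."""
    if not contains_keyword(first_signal):          # default key words check
        return None
    position = locate_speaker(first_signal)         # position of target speaker
    first_print = extract_voiceprint(first_signal)  # first voiceprint feature
    second_signal = capture_second(position)        # second voice signal
    second_print = extract_voiceprint(second_signal)
    if not voiceprints_match(first_print, second_print):
        return None
    return execute(parse_command(second_signal))    # control command
```

Wiring in stub callables shows the control path: a signal without the key words short-circuits to None, while a matching voiceprint lets the parsed command through to the executor.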
Further, the terminal is provided with multiple microphones, the multiple microphones being respectively arranged at different positions and each being used to collect the first voice signal of the terminal's surrounding environment.
Further: obtaining multiple first voice signals; calculating, according to the multiple first voice signals, the voice intensity of the default key words in each; and marking, according to those voice intensities, the first voice signal that includes the default key words at the maximum voice intensity.
Further, according to the position of the target speaker, a microphone meeting a preset condition is determined to collect the second voice signal.
Further, the second voice signal is collected by the single microphone closest to the position of the target speaker.
Further, the following operation is also implemented when the voice interaction program is executed by the processor:
performing denoising processing on the collected second voice signal.
Further: converting the second voice signal into a frequency-domain signal; calculating the optimization parameters of the frequency-domain signal, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter refers to the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain refers to the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio; and optimizing the frequency-domain signal according to the optimization parameters to obtain the denoised voice signal.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The software product is stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, computer, server, air conditioner, network device or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention are described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described above. The above embodiments are merely illustrative rather than restrictive; under the inspiration of the present invention, those skilled in the art can make many further forms without departing from the scope protected by the purpose of the present invention and the claims, all of which fall within the protection of the present invention.
Claims (10)
1. A voice interaction method, characterized in that the voice interaction method comprises:
collecting a first voice signal of a terminal's surrounding environment;
judging whether the first voice signal includes default key words;
if the first voice signal includes the default key words, determining a position of a target speaker according to the default key words, and extracting a first voiceprint feature of the target speaker;
continuing to collect a second voice signal of the terminal's surrounding environment according to the position of the target speaker, and extracting a second voiceprint feature of the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if they match, extracting a control command from the second voice signal, and executing a corresponding operation according to the control command.
2. The voice interaction method according to claim 1, characterized in that the step of collecting the first voice signal of the terminal's surrounding environment comprises:
the terminal being provided with multiple microphones, the multiple microphones being respectively arranged at different positions and each being used to collect the first voice signal of the terminal's surrounding environment.
3. The voice interaction method according to claim 2, characterized in that the step in which the terminal is provided with multiple microphones respectively arranged at different positions and each used to collect the first voice signal of the terminal's surrounding environment comprises:
obtaining multiple first voice signals;
calculating, according to the multiple first voice signals, the voice intensity of the default key words in each;
marking, according to those voice intensities, the first voice signal that includes the default key words at the maximum voice intensity.
4. The voice interaction method according to claim 1, characterized in that the step of continuing to collect the second voice signal of the terminal's surrounding environment according to the position of the target speaker comprises:
determining, according to the position of the target speaker, a microphone meeting a preset condition to collect the second voice signal.
5. The voice interaction method according to claim 4, characterized in that the step of determining, according to the position of the target speaker, a microphone meeting a preset condition to collect the second voice signal comprises:
collecting the second voice signal by the single microphone closest to the position of the target speaker.
6. The voice interaction method according to claim 5, characterized in that, after the step of collecting the second voice signal by the single microphone closest to the position of the target speaker, the method comprises:
performing denoising processing on the collected second voice signal.
7. The voice interaction method according to claim 6, characterized in that the step of performing denoising processing on the collected second voice signal comprises:
converting the second voice signal into a frequency-domain signal;
calculating optimization parameters of the frequency-domain signal, the optimization parameters including a directivity parameter and a white noise gain, the directivity parameter referring to the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain referring to the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio;
optimizing the frequency-domain signal according to the optimization parameters to obtain the denoised voice signal.
8. A voice interaction system, characterized in that the system comprises:
an acquisition module, configured to collect a first voice signal of a terminal's surrounding environment;
a first judgment module, configured to judge whether the first voice signal includes default key words;
a determining module, configured to, if the first voice signal includes the default key words, determine a position of a target speaker according to the default key words and extract a first voiceprint feature of the target speaker;
a continued-acquisition module, configured to continue collecting a second voice signal of the terminal's surrounding environment according to the position of the target speaker;
a second judgment module, configured to judge whether a second voiceprint feature of the second voice signal matches the first voiceprint feature;
an extraction module, configured to, if they match, extract a control command from the second voice signal and execute a corresponding operation according to the control command.
9. A terminal, characterized in that the terminal comprises: a memory, a processor, and a voice interaction program stored on the memory and runnable on the processor, the voice interaction program, when executed by the processor, implementing the steps of the voice interaction method according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and the computer program, when executed by a processor, implements the steps of the voice interaction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910737612.1A CN110364156A (en) | 2019-08-09 | 2019-08-09 | Voice interactive method, system, terminal and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110364156A (en) | 2019-10-22
Family
ID=68223773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910737612.1A Pending CN110364156A (en) | 2019-08-09 | 2019-08-09 | Voice interactive method, system, terminal and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110364156A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104936091A (en) * | 2015-05-14 | 2015-09-23 | 科大讯飞股份有限公司 | Intelligent interaction method and system based on circle microphone array |
CN107579883A (en) * | 2017-08-25 | 2018-01-12 | 上海肖克利信息科技股份有限公司 | Distributed pickup intelligent home furnishing control method |
CN108259280A (en) * | 2018-02-06 | 2018-07-06 | 北京语智科技有限公司 | A kind of implementation method, the system of Inteldectualization Indoors control |
CN108806681A (en) * | 2018-05-28 | 2018-11-13 | 江西午诺科技有限公司 | Sound control method, device, readable storage medium storing program for executing and projection device |
CN109473095A (en) * | 2017-09-08 | 2019-03-15 | 北京君林科技股份有限公司 | A kind of intelligent home control system and control method |
2019-08-09: CN application CN201910737612.1A filed; published as CN110364156A (en); status: Pending
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104936091A (en) * | 2015-05-14 | 2015-09-23 | 科大讯飞股份有限公司 | Intelligent interaction method and system based on circle microphone array |
CN107579883A (en) * | 2017-08-25 | 2018-01-12 | 上海肖克利信息科技股份有限公司 | Distributed pickup intelligent home furnishing control method |
CN109473095A (en) * | 2017-09-08 | 2019-03-15 | 北京君林科技股份有限公司 | A kind of intelligent home control system and control method |
CN108259280A (en) * | 2018-02-06 | 2018-07-06 | 北京语智科技有限公司 | A kind of implementation method, the system of Inteldectualization Indoors control |
CN108806681A (en) * | 2018-05-28 | 2018-11-13 | 江西午诺科技有限公司 | Sound control method, device, readable storage medium storing program for executing and projection device |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110944056A (en) * | 2019-11-29 | 2020-03-31 | 深圳传音控股股份有限公司 | Interaction method, mobile terminal and readable storage medium |
CN111276141A (en) * | 2020-01-19 | 2020-06-12 | 珠海格力电器股份有限公司 | Voice interaction method and device, storage medium, processor and electronic equipment |
CN111640417A (en) * | 2020-05-13 | 2020-09-08 | 广州国音智能科技有限公司 | Information input method, device, equipment and computer readable storage medium |
CN111681654A (en) * | 2020-05-21 | 2020-09-18 | 北京声智科技有限公司 | Voice control method and device, electronic equipment and storage medium |
CN111988426A (en) * | 2020-08-31 | 2020-11-24 | 深圳康佳电子科技有限公司 | Communication method and device based on voiceprint recognition, intelligent terminal and storage medium |
CN111988426B (en) * | 2020-08-31 | 2023-07-18 | 深圳康佳电子科技有限公司 | Communication method and device based on voiceprint recognition, intelligent terminal and storage medium |
CN114281182A (en) * | 2020-09-17 | 2022-04-05 | 华为技术有限公司 | Man-machine interaction method, device and system |
CN113066513A (en) * | 2021-03-24 | 2021-07-02 | Oppo广东移动通信有限公司 | Voice data processing method and device, electronic equipment and storage medium |
CN113066513B (en) * | 2021-03-24 | 2024-03-19 | Oppo广东移动通信有限公司 | Voice data processing method and device, electronic equipment and storage medium |
CN113921016A (en) * | 2021-10-15 | 2022-01-11 | 阿波罗智联(北京)科技有限公司 | Voice processing method, device, electronic equipment and storage medium |
EP4099320A3 (en) * | 2021-10-15 | 2023-07-19 | Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. | Method and apparatus of processing speech, electronic device, storage medium, and program product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110364156A (en) | Voice interactive method, system, terminal and readable storage medium storing program for executing | |
US11450337B2 (en) | Multi-person speech separation method and apparatus using a generative adversarial network model | |
CN108615526B (en) | Method, device, terminal and storage medium for detecting keywords in voice signal | |
US9685161B2 (en) | Method for updating voiceprint feature model and terminal | |
CN111554321B (en) | Noise reduction model training method and device, electronic equipment and storage medium | |
CN108735209A (en) | Wake up word binding method, smart machine and storage medium | |
US20240038238A1 (en) | Electronic device, speech recognition method therefor, and medium | |
CN110931000B (en) | Method and device for speech recognition | |
CN111477243B (en) | Audio signal processing method and electronic equipment | |
WO2021103449A1 (en) | Interaction method, mobile terminal and readable storage medium | |
CN111863020A (en) | Voice signal processing method, device, equipment and storage medium | |
CN113033245A (en) | Function adjusting method and device, storage medium and electronic equipment | |
CN109754823A (en) | A kind of voice activity detection method, mobile terminal | |
CN111081275B (en) | Terminal processing method and device based on sound analysis, storage medium and terminal | |
CN108600559B (en) | Control method and device of mute mode, storage medium and electronic equipment | |
CN110728993A (en) | Voice change identification method and electronic equipment | |
CN112735388B (en) | Network model training method, voice recognition processing method and related equipment | |
CN110517702A (en) | The method of signal generation, audio recognition method and device based on artificial intelligence | |
CN114333774A (en) | Speech recognition method, speech recognition device, computer equipment and storage medium | |
CN111522592A (en) | Intelligent terminal awakening method and device based on artificial intelligence | |
US20210110838A1 (en) | Acoustic aware voice user interface | |
CN114765026A (en) | Voice control method, device and system | |
CN107645604B (en) | Call processing method and mobile terminal | |
WO2020102943A1 (en) | Method and apparatus for generating gesture recognition model, storage medium, and electronic device | |
CN110931047A (en) | Voice data acquisition method and device, acquisition terminal and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191022 |