CN110364156A - Voice interaction method, system, terminal and readable storage medium - Google Patents

Publication number
CN110364156A
Authority
CN
China
Prior art keywords
voice
voice signal
signal
target speaker
key words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910737612.1A
Other languages
Chinese (zh)
Inventor
陈昊亮
罗伟航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou National Acoustic Intelligent Technology Co Ltd
Original Assignee
Guangzhou National Acoustic Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou National Acoustic Intelligent Technology Co Ltd
Priority to CN201910737612.1A
Publication of CN110364156A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/06: Decision making techniques; Pattern matching strategies
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech

Abstract

The invention discloses a voice interaction method, system, terminal and readable storage medium. The method includes: collecting a first voice signal from the environment around the terminal; judging whether the first voice signal contains a preset keyword; if it does, determining the position of the target speaker according to the preset keyword and extracting a first voiceprint feature of the target speaker; continuing to collect a second voice signal from the environment around the terminal according to the position of the target speaker; extracting a second voiceprint feature of the second voice signal; judging whether the second voiceprint feature matches the first voiceprint feature; and, if they match, extracting the control command from the second voice signal and executing the corresponding operation. In this way, interference from other users' voices or from noise can be excluded, which improves the recognition accuracy of smart voice interaction products and makes human-computer interaction more intelligent.

Description

Voice interaction method, system, terminal and readable storage medium
Technical field
The present invention relates to speech signal processing technology, and in particular to a voice interaction method, system, terminal and readable storage medium.
Background
With the development of information technology, more and more smart voice products have appeared on the market and have become everyday tools for interaction and communication. To interact with such a product, the user first wakes it by speaking a preset keyword and then speaks a voice control command; the product performs the corresponding operation according to that command.
At present, real environments are usually complex. Sources of ambient noise include cooling fans and the speech of other users. These sources degrade the quality of ordinary calls and, more seriously, lower the recognition rate of smart voice products, making the products difficult and uncomfortable to use. If several users speak at the same time, speech recognition is severely disturbed and may produce errors or no response at all, which harms the user experience.
Summary of the invention
The main purpose of the present invention is to provide a voice interaction method, system, terminal and readable storage medium, intended to solve the technical problem that smart voice products have a low recognition rate during interaction.
To achieve the above object, an embodiment of the present invention provides a voice interaction method, which includes:
collecting a first voice signal from the environment around the terminal;
judging whether the first voice signal contains a preset keyword;
if the first voice signal contains the preset keyword, determining the position of the target speaker according to the preset keyword, and extracting a first voiceprint feature of the target speaker;
continuing to collect a second voice signal from the environment around the terminal according to the position of the target speaker, and extracting a second voiceprint feature of the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if they match, extracting the control command from the second voice signal and executing the corresponding operation according to the control command.
Further, the step of collecting the first voice signal from the environment around the terminal includes:
the terminal is provided with multiple microphones, which are arranged at different positions and are each used to collect the first voice signal from the environment around the terminal.
Further, the step of determining the position of the target speaker according to the preset keyword if the first voice signal contains the preset keyword includes:
obtaining multiple first voice signals;
calculating, from the multiple first voice signals, the voice intensity of the preset keyword in each signal;
marking, according to these voice intensities, the first voice signal in which the preset keyword has the greatest voice intensity.
Further, the step of continuing to collect the second voice signal from the environment around the terminal according to the position of the target speaker includes:
determining, according to the position of the target speaker, a microphone that meets a preset condition and using it to collect the second voice signal.
Further, the step of determining, according to the position of the target speaker, a microphone that meets a preset condition and using it to collect the second voice signal includes:
collecting the second voice signal with the single microphone closest to the position of the target speaker.
Further, after the step of collecting the second voice signal with the single microphone closest to the position of the target speaker, the method includes:
denoising the collected second voice signal.
Further, the step of denoising the collected second voice signal includes:
converting the second voice signal into a frequency-domain signal;
calculating optimization parameters of the frequency-domain signal, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter is the ratio between the input signal-to-noise ratio of the desired signal relative to omnidirectional noise and the input signal-to-noise ratio of the microphone, and the white noise gain is the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio;
optimizing the frequency-domain signal according to the optimization parameters to obtain the denoised voice signal.
The present invention also provides a voice interaction system, which includes:
an acquisition module, for collecting a first voice signal from the environment around the terminal;
a first judgment module, for judging whether the first voice signal contains a preset keyword;
a determining module, for determining the position of the target speaker according to the preset keyword if the first voice signal contains the preset keyword, and for extracting a first voiceprint feature of the target speaker;
a continued acquisition module, for continuing to collect a second voice signal from the environment around the terminal according to the position of the target speaker;
a second judgment module, for judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
an extraction module, for extracting, if they match, the control command from the second voice signal and executing the corresponding operation according to the control command.
The present invention also provides a terminal, which includes a memory, a processor, and a voice interaction program that is stored in the memory and can run on the processor. When the voice interaction program is executed by the processor, the steps of the voice interaction method described above are implemented.
The present invention also provides a readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the voice interaction method described above are implemented.
The present invention collects a first voice signal from the environment around the terminal with multiple microphones; judges whether the first voice signal contains a preset keyword; if it does, determines the position of the target speaker according to the preset keyword and extracts a first voiceprint feature of the target speaker; continues to collect a second voice signal from the environment around the terminal according to the position of the target speaker; extracts a second voiceprint feature of the second voice signal; and judges whether the second voiceprint feature matches the first voiceprint feature. If they match, the control command is extracted from the second voice signal and the corresponding operation is executed; if they do not match, the terminal makes no response. By additionally checking the voiceprint feature of the signal coming from the target speaker's position, the invention can exclude interference from other users' voices or from noise, which improves the recognition accuracy of smart voice interaction products and makes human-computer interaction more intelligent.
Brief description of the drawings
The accompanying drawings are incorporated in and form part of this specification. They show embodiments consistent with the invention and, together with the specification, serve to explain its principles.
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of the hardware structure of an embodiment of the terminal provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of an embodiment of the voice interaction method of the present invention;
Fig. 3 is a schematic circuit diagram of an embodiment of the voice interaction system of the present invention.
The realization of the object, the functional characteristics and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
In the following description, suffixes such as "module", "component" or "unit" are used to denote elements only to make the invention easier to explain; they have no specific meaning in themselves. Therefore "module", "component" and "unit" may be used interchangeably.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the terminal in the hardware operating environment involved in the embodiments of the present invention.
The terminal in the embodiments of the present invention may be a PC, or a terminal device with a voice interaction function such as a smartphone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player or portable computer.
As shown in Fig. 1, the terminal may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The terminal may be provided with devices such as multiple microphones, which remain in a voice signal collection state in order to collect the user's voice signal in real time. The communication bus 1002 implements the connections and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory such as a disk memory. The memory 1005 may optionally also be a storage device separate from the aforementioned processor 1001.
Optionally, the terminal may also include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a Wi-Fi module and so on. The sensors include, for example, optical sensors, motion sensors and other sensors. Specifically, the optical sensors may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display according to the ambient light, and the proximity sensor can turn off the display and/or the backlight when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the posture of a mobile terminal (such as switching between landscape and portrait, related games, and magnetometer pose calibration) and in vibration-recognition functions (such as a pedometer or tap detection). The mobile terminal may of course also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer or infrared sensor, which are not described further here.
Those skilled in the art will understand that the terminal structure shown in Fig. 1 does not limit the terminal, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may contain an operating system, a network communication module, a user interface module and a voice interaction program.
In the terminal shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 can be used to call the voice interaction program stored in the memory 1005 and perform the following operations:
collecting a first voice signal from the environment around the terminal;
judging whether the first voice signal contains a preset keyword;
if the first voice signal contains the preset keyword, determining the position of the target speaker according to the preset keyword, and extracting a first voiceprint feature of the target speaker;
continuing to collect a second voice signal from the environment around the terminal according to the position of the target speaker, and extracting a second voiceprint feature of the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if they match, extracting the control command from the second voice signal and executing the corresponding operation according to the control command.
Further, the terminal is provided with multiple microphones, which are arranged at different positions and are each used to collect the first voice signal from the environment around the terminal.
Further, the processor obtains multiple first voice signals;
calculates, from the multiple first voice signals, the voice intensity of the preset keyword in each signal;
and marks, according to these voice intensities, the first voice signal in which the preset keyword has the greatest voice intensity.
Further, according to the position of the target speaker, a microphone that meets a preset condition is determined and used to collect the second voice signal.
Further, the second voice signal is collected with the single microphone closest to the position of the target speaker.
Further, the processor 1001 can call the voice interaction program stored in the memory 1005 and also perform the following operation: denoising the collected second voice signal.
Further, the second voice signal is converted into a frequency-domain signal;
optimization parameters of the frequency-domain signal are calculated, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter is the ratio between the input signal-to-noise ratio of the desired signal relative to omnidirectional noise and the input signal-to-noise ratio of the microphone, and the white noise gain is the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio;
the frequency-domain signal is optimized according to the optimization parameters to obtain the denoised voice signal.
Based on the terminal hardware structure described above, the embodiments of the method of the present invention are proposed.
The present invention provides a voice interaction method. In one embodiment of the voice interaction method, referring to Fig. 2, the method includes:
Step S10: collect a first voice signal from the environment around the terminal.
The terminal collects a first voice signal from its surrounding environment. The terminal may be provided with devices such as multiple microphones, which remain in a voice signal collection state in order to collect the user's voice signal in real time.
Step S20: judge whether the first voice signal contains a preset keyword.
The terminal judges whether the first voice signal contains the preset keyword. Specifically, the terminal collects the user's first voice signal, performs speech recognition on it to obtain the corresponding text content, and compares the text content against the preset keyword to judge whether the text contains a first word or first sentence that is identical or similar to the preset keyword. If such a first word or first sentence exists in the text content, the first voice signal is determined to contain the preset keyword. The preset keyword may be set by the vendor or configured by the user in the terminal's application interface. For example, the preset keyword may be "Little Bear, Little Bear".
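The text comparison in step S20 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how similarity is measured, so the function name `contains_keyword`, the use of Python's `difflib.SequenceMatcher` and the 0.8 similarity threshold are all hypothetical choices, and the transcript is assumed to have been produced already by a speech recognizer.

```python
from difflib import SequenceMatcher


def contains_keyword(transcript: str, keyword: str, min_ratio: float = 0.8) -> bool:
    """Return True if the transcript contains the keyword or a close variant.

    min_ratio is an assumed similarity threshold, not a value from the patent.
    """
    text = transcript.lower()
    key = keyword.lower()
    if key in text:
        return True  # exact match of the preset keyword
    words = text.split()
    width = len(key.split())
    # Slide a window with the keyword's word count over the transcript
    # and accept the first window that is similar enough.
    for start in range(len(words) - width + 1):
        candidate = " ".join(words[start:start + width])
        if SequenceMatcher(None, candidate, key).ratio() >= min_ratio:
            return True
    return False
```

A slightly misrecognized wake phrase such as "litle bear" would still be accepted by this sketch, while unrelated speech would not.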
Step S30: if the first voice signal contains the preset keyword, determine the position of the target speaker according to the preset keyword and extract a first voiceprint feature of the target speaker.
If the first voice signal contains the preset keyword, the position of the target speaker is determined according to the preset keyword, and the first voiceprint feature of the target speaker is extracted. Specifically, speech recognition is performed on the first voice signal; if it contains the preset keyword, the position of the target speaker is determined and the terminal switches from the dormant state to the awake state so that it can subsequently carry out voice interaction with the user. Voiceprint recognition is then performed on the user's first voice signal to obtain the user's first voiceprint feature.
Step S40: continue to collect a second voice signal from the environment around the terminal according to the position of the target speaker, and extract a second voiceprint feature of the second voice signal.
The terminal continues to collect a second voice signal from its surrounding environment according to the position of the target speaker and extracts the second voiceprint feature of the second voice signal. Specifically, while the terminal is in the awake state it continues to collect the second voice signal from its surroundings and extracts the voiceprint feature of the signal coming from the direction of the target speaker. The second voice signal may contain a control command.
Step S50: judge whether the second voiceprint feature of the second voice signal matches the first voiceprint feature.
The terminal judges whether the second voiceprint feature of the second voice signal matches the first voiceprint feature. Specifically, the terminal continues to collect the second voice signal from the position of the target speaker and then judges whether that signal matches the first voiceprint feature of the target speaker who woke the terminal.
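One common way to realize the comparison in step S50 is to represent each voiceprint as a fixed-length feature vector and compare the two vectors by cosine similarity. The sketch below assumes exactly that; the patent does not specify the feature representation or the matching rule, so the vector form, the function names and the 0.75 threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprints_match(first: Sequence[float], second: Sequence[float],
                      threshold: float = 0.75) -> bool:
    """Decide whether the second voiceprint belongs to the speaker of the first.

    The threshold is an assumed tuning value, not one given in the patent.
    """
    return cosine_similarity(first, second) >= threshold
```

Cosine similarity ignores overall loudness, which suits the case where the same speaker is louder in the wake phrase than in the command.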
Step S60: if they match, extract the control command from the second voice signal and execute the corresponding operation according to the control command.
If the second voiceprint feature of the second voice signal matches the first voiceprint feature, the terminal extracts the control command from the second voice signal and executes the corresponding operation; if the second voiceprint feature does not match the first voiceprint feature, the terminal makes no response. Specifically, once the second voiceprint feature is determined to match the first voiceprint feature, the terminal continues to collect the target speaker's second voice signal, obtains the corresponding text content, extracts the control command from that text, and performs the corresponding operation of the intelligent voice interaction with the target user, for example answering a question or carrying out the commanded action.
The present invention collects a first voice signal from the environment around the terminal; judges whether the first voice signal contains a preset keyword; if it does, determines the position of the target speaker according to the preset keyword and extracts a first voiceprint feature of the target speaker; continues to collect a second voice signal from the environment around the terminal according to the position of the target speaker; extracts a second voiceprint feature of the second voice signal; and judges whether the second voiceprint feature matches the first voiceprint feature. If they match, the control command is extracted from the second voice signal and the corresponding operation is executed; if they do not match, the terminal makes no response. By additionally checking the voiceprint feature of the signal coming from the target speaker's position, the invention can exclude interference from other users' voices or from noise, which improves the recognition accuracy of smart voice interaction products and makes human-computer interaction more intelligent.
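The control flow of steps S20 to S60 can be summarized in a short sketch. It is a minimal illustration, not the patented implementation: the function `handle_interaction` and its four callable parameters are hypothetical stand-ins for the speech recognition, voiceprint extraction, matching and command extraction components, and the localization of the speaker in step S30 is left out.

```python
from typing import Any, Callable, Optional


def handle_interaction(
    first_signal: Any,
    second_signal: Any,
    *,
    has_keyword: Callable[[Any], bool],
    extract_voiceprint: Callable[[Any], Any],
    is_match: Callable[[Any, Any], bool],
    extract_command: Callable[[Any], str],
) -> Optional[str]:
    """Return the control command to execute, or None if the terminal should not respond."""
    if not has_keyword(first_signal):            # S20: no wake keyword, stay dormant
        return None
    first_print = extract_voiceprint(first_signal)    # S30: voiceprint of the waker
    second_print = extract_voiceprint(second_signal)  # S40: voiceprint of the follow-up
    if not is_match(first_print, second_print):  # S50: different speaker, no response
        return None
    return extract_command(second_signal)        # S60: command from the same speaker


# Toy stand-ins: a "signal" is a dict holding a transcript and a speaker label.
wake = {"text": "little bear", "speaker": "alice"}
request = {"text": "turn on the light", "speaker": "alice"}
intruder = {"text": "turn off the light", "speaker": "bob"}
stubs = dict(
    has_keyword=lambda s: "little bear" in s["text"],
    extract_voiceprint=lambda s: s["speaker"],
    is_match=lambda a, b: a == b,
    extract_command=lambda s: s["text"],
)
accepted = handle_interaction(wake, request, **stubs)
rejected = handle_interaction(wake, intruder, **stubs)
```

In the toy run, the command from the speaker who woke the terminal is returned, while the command spoken by a different person is dropped.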
Optionally, step S10 may specifically include the following step:
Step S11: the terminal is provided with multiple microphones, which are arranged at different positions and are each used to collect the first voice signal from the environment around the terminal.
The terminal may be provided with multiple microphones, which are arranged at different positions and are each used to collect the first voice signal from the environment around the terminal. The microphones may be laid out as a circular, rectangular or square array, among others, and may be distributed uniformly or, depending on the application, non-uniformly. For example, if it is known in advance that the target speaker is more likely to appear in certain places, more microphones can be placed at the corresponding positions and fewer elsewhere, so as to strengthen reception of voice signals from specific directions.
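As a small illustration of the uniform circular layout mentioned above, the following sketch computes microphone coordinates on a circle. The function name, the six-microphone count and the 4 cm radius are assumptions made for the example; the patent gives no dimensions.

```python
import math
from typing import List, Tuple


def circular_array(num_mics: int, radius_m: float) -> List[Tuple[float, float]]:
    """Return (x, y) positions in metres of microphones spaced evenly on a circle."""
    step = 2.0 * math.pi / num_mics
    return [
        (radius_m * math.cos(k * step), radius_m * math.sin(k * step))
        for k in range(num_mics)
    ]


# Example: six microphones on a circle of radius 4 cm (assumed values).
layout = circular_array(6, 0.04)
```

A non-uniform layout could be described the same way by listing the angles explicitly instead of using an even step.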
In some embodiments, the terminal can determine from daily data which microphones are closer to the user, turn off the microphones that are farther away, and upload the distribution data of the microphones that were turned off so that the microphone layout of later terminals can be improved. Collecting the user's voice signal with the closer microphones and turning off the farther ones saves resources and therefore power.
Optionally, step S20 may specifically include the following steps:
Step S21: obtain multiple first voice signals.
Step S22: calculate, from the multiple first voice signals, the voice intensity of the preset keyword in each signal.
Step S23: mark, according to these voice intensities, the first voice signal in which the preset keyword has the greatest voice intensity.
The terminal obtains multiple first voice signals, calculates the voice intensity of the preset keyword in each of them, and marks the first voice signal in which the preset keyword has the greatest voice intensity. The marked signal is the one whose microphone is closest to the position of the target speaker. Specifically, the signals whose voice intensity exceeds a preset voice intensity threshold are selected first; from these, the first voice signal in which the preset keyword has the greatest voice intensity is chosen, and the position of the target speaker is then determined according to the preset keyword.
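Steps S21 to S23 can be sketched as a selection over per-microphone keyword segments. The sketch assumes that voice intensity is measured as root-mean-square (RMS) amplitude and that each channel has already been cut down to the samples containing the keyword; the patent specifies neither, and the function names and threshold value are hypothetical.

```python
import math
from typing import Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as an assumed measure of voice intensity."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def strongest_channel(keyword_segments: Sequence[Sequence[float]],
                      min_intensity: float = 0.0) -> Optional[int]:
    """Return the index of the channel with the loudest keyword segment.

    Channels whose intensity does not exceed min_intensity are ignored;
    None is returned if no channel passes the threshold.
    """
    best_index, best_level = None, min_intensity
    for index, segment in enumerate(keyword_segments):
        level = rms(segment)
        if level > best_level:
            best_index, best_level = index, level
    return best_index
```

The returned index identifies the microphone that is taken to be closest to the target speaker.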
Optionally, step S40 may specifically include the following step:
Step S41: according to the position of the target speaker, determine a microphone that meets a preset condition and use it to collect the second voice signal.
According to the position of the target speaker, the terminal determines a microphone that meets a preset condition and uses it to collect the second voice signal. That is, the terminal sets a preset condition, selects the microphones that satisfy it, and collects the user's second voice signal with them.
Further, in another embodiment of the voice interaction method of the present invention, after step S41, the method includes:
Step S411: collect the second voice signal with the single microphone closest to the position of the target speaker.
The terminal collects the second voice signal with the single microphone closest to the position of the target speaker and turns off the other microphones. Specifically, based on the position of the target speaker, the terminal estimates the sound-wave energy that each microphone would receive from the target speaker, which yields the differences in arrival time between the microphones and the amplitude of the target speaker's sound wave at each microphone (the amplitude follows from the strength of the sound wave). From the amplitudes and the arrival-time differences, the terminal identifies the single microphone closest to the position of the target speaker and collects the second voice signal through that microphone.
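The selection by arrival time and amplitude described above can be illustrated with a brute-force cross-correlation. This is a sketch under assumptions: the patent does not say how the time differences are computed or how they are combined with amplitude, so the sketch estimates each channel's delay relative to channel 0, picks the earliest arrival, and uses the larger peak amplitude only as a tie-breaker. All names are hypothetical.

```python
from typing import Sequence


def estimate_delay(reference: Sequence[float], signal: Sequence[float], max_lag: int) -> int:
    """Lag in samples at which signal best aligns with reference.

    A positive result means the sound reached this channel later than the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            reference[i] * signal[i + lag]
            for i in range(len(reference))
            if 0 <= i + lag < len(signal)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def nearest_microphone(channels: Sequence[Sequence[float]], max_lag: int) -> int:
    """Index of the channel with the earliest arrival; ties go to the larger peak amplitude."""
    reference = channels[0]

    def sort_key(index: int):
        channel = channels[index]
        peak = max(abs(s) for s in channel)
        return (estimate_delay(reference, channel, max_lag), -peak)

    return min(range(len(channels)), key=sort_key)


# Toy example: the same pulse arrives at three microphones at different times and levels.
channels = [
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0],
]
closest = nearest_microphone(channels, max_lag=3)
```

In the toy data the pulse reaches the second microphone first and with the largest amplitude, so that microphone is selected.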
Further, terminal can be with current entry near the data of the single microphone of the position of close-target speaker, will Other microphones are closed, then the distributed data of this microphone closed is uploaded, to improve subsequent terminal microphone Setting layout.The voice signal of user is acquired by closer microphone, is closed other farther away microphones, can be saved money Source, and then save electricity.
In the present embodiment, the second voice letter is acquired by the single microphone near the position of close-target speaker Number, it avoids wasting multiple microphones and is acquired, waste of resource, and the interference of other microphones can also be excluded, only to close to mesh A microphone for marking the position of speaker carries out speech recognition, and then improves speech recognition effect, and human-computer interaction is made to have more intelligence It can property.
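As a non-limiting illustration of step S411, the following Python sketch selects the nearest microphone from a known speaker position. It assumes free-field propagation (amplitude proportional to 1/distance and a speed of sound of 343 m/s) and Cartesian microphone coordinates. The function name `pick_nearest_microphone` and these modelling choices are illustrative assumptions and are not specified by this embodiment.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the patent


def pick_nearest_microphone(speaker_pos, mic_positions):
    """Return (index, arrival_delays, relative_amplitudes) for the closest microphone.

    Hypothetical model: arrival time grows linearly with distance and the
    received amplitude falls off as 1/distance.
    """
    distances = [math.dist(speaker_pos, m) for m in mic_positions]
    first_arrival = min(distances) / SPEED_OF_SOUND
    # Arrival-time difference of each microphone relative to the earliest one.
    delays = [d / SPEED_OF_SOUND - first_arrival for d in distances]
    # Expected relative amplitude at each microphone (guard against zero distance).
    amplitudes = [1.0 / max(d, 1e-6) for d in distances]
    # Nearest microphone: smallest delay, ties broken by largest amplitude.
    nearest = min(range(len(mic_positions)),
                  key=lambda i: (delays[i], -amplitudes[i]))
    return nearest, delays, amplitudes


if __name__ == "__main__":
    mics = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    index, delays, amplitudes = pick_nearest_microphone((1.8, 0.5), mics)
    print("use microphone", index, "and turn off the others")
```

The microphones other than the returned index would then be turned off, as described above.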
Further, in another embodiment of the voice interaction method of the present invention, after step S411, the method includes:
Step A: denoise the collected second voice signal.
The terminal denoises the collected second voice signal to obtain a denoised second voice signal, extracts the second voiceprint feature of the second voice signal, determines that the second voiceprint feature matches the first voiceprint feature, and then performs speech recognition on the denoised second voice signal. Speech denoising is a technique that those skilled in the art can implement and is not described further here.
In this embodiment, the denoised second voice signal is obtained, the second voiceprint feature of the second voice signal is extracted and determined to match the first voiceprint feature, and speech recognition is then performed on the denoised second voice signal. This can improve the terminal's recognition rate and thus the accuracy of its recognition.
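The embodiment does not state how the second voiceprint feature is determined to match the first voiceprint feature. One possible sketch, assuming the voiceprint features are fixed-length numeric vectors compared by cosine similarity against a hypothetical threshold of 0.8, is:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_speaker(first_print, second_print, threshold=0.8):
    """Hypothetical match rule: accept when similarity reaches the threshold.

    The threshold value is an assumption for illustration only.
    """
    return cosine_similarity(first_print, second_print) >= threshold
```

Speech recognition on the denoised second voice signal would proceed only when `same_speaker` returns `True`.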
Optionally, the step of denoising the collected second voice signal specifically includes:
converting the second voice signal into a frequency-domain signal;
calculating optimization parameters of the frequency-domain signal, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter refers to the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain refers to the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio; and optimizing the frequency-domain signal according to the optimization parameters to obtain the denoised voice signal.
In practical applications, a voice signal is a wideband signal whose frequency bins must be processed separately, so the time-domain signal collected by the microphone needs to be converted into a frequency-domain signal. The terminal then calculates the optimization parameters of the frequency-domain signal, namely the directivity parameter and the white noise gain as defined above, and optimizes the frequency-domain signal according to these parameters to obtain the denoised voice signal.
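The embodiment gives no formulas for the two optimization parameters. The sketch below evaluates them for one frequency bin using the definitions commonly found in the microphone-array literature, which are assumed here rather than taken from this document: white noise gain |w^H d|^2 / (w^H w) and directivity factor |w^H d|^2 / (w^H Γ w), where Γ is the coherence matrix of spherically isotropic (omnidirectional) noise. The linear array geometry, the delay-and-sum weights, and all function names are likewise illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def steering_vector(mic_x, freq_hz, angle_rad):
    """Far-field steering vector d for a linear array along the x axis."""
    return [cmath.exp(-2j * math.pi * freq_hz * x * math.cos(angle_rad) / SPEED_OF_SOUND)
            for x in mic_x]


def diffuse_coherence(mic_x, freq_hz):
    """Coherence matrix of spherically isotropic noise: sin(k*r) / (k*r)."""
    def sinc(v):
        return 1.0 if v == 0.0 else math.sin(v) / v
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [[sinc(k * abs(a - b)) for b in mic_x] for a in mic_x]


def delay_and_sum_weights(d):
    """Simplest beamformer weights: w = d / M."""
    return [di / len(d) for di in d]


def _signal_gain(w, d):
    # |w^H d|^2, the gain applied to the desired signal.
    return abs(sum(wi.conjugate() * di for wi, di in zip(w, d))) ** 2


def white_noise_gain(w, d):
    """Output-to-input SNR ratio for spatially white noise."""
    return _signal_gain(w, d) / sum(abs(wi) ** 2 for wi in w)


def directivity_factor(w, d, gamma):
    """Output-to-input SNR ratio for omnidirectional (diffuse) noise."""
    n = len(w)
    noise = sum((w[i].conjugate() * gamma[i][j] * w[j]).real
                for i in range(n) for j in range(n))
    return _signal_gain(w, d) / noise


if __name__ == "__main__":
    mic_x = [0.0, 0.05, 0.10, 0.15]   # hypothetical 4-microphone array, 5 cm spacing
    freq = 1000.0                      # one frequency bin, in Hz
    d = steering_vector(mic_x, freq, math.pi / 2)
    w = delay_and_sum_weights(d)
    print("white noise gain:", white_noise_gain(w, d))
    print("directivity factor:", directivity_factor(w, d, diffuse_coherence(mic_x, freq)))
```

With delay-and-sum weights the white noise gain equals the number of microphones, which offers a quick sanity check for an implementation.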
In one embodiment, as shown in Fig. 3, which is a schematic structural diagram of an embodiment of the voice interaction system 60 of the present invention, the system includes an acquisition module 61, a first judgment module 62, a determining module 63, a continued-acquisition module 64, a second judgment module 65, and an extraction module 66, wherein:
the acquisition module 61 is configured to acquire a first voice signal from the environment surrounding the terminal;
the first judgment module 62 is configured to judge whether the first voice signal contains a preset keyword;
the determining module 63 is configured to, if the first voice signal contains the preset keyword, determine the position of the target speaker according to the preset keyword and extract a first voiceprint feature of the target speaker;
the continued-acquisition module 64 is configured to continue acquiring a second voice signal from the environment surrounding the terminal according to the position of the target speaker;
the second judgment module 65 is configured to judge whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
the extraction module 66 is configured to, if they match, extract the control command from the second voice signal and execute the corresponding operation according to the control command.
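By way of example only, the cooperation of modules 61 to 66 can be sketched in Python as follows. Every callable passed into `VoiceInteraction` is a hypothetical stand-in for the corresponding module; the signal and voiceprint types are placeholders, and none of these names or signatures comes from this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Signal = Sequence[float]
Voiceprint = Sequence[float]
Position = Tuple[float, float]


@dataclass
class VoiceInteraction:
    # Hypothetical stand-ins for modules 62, 63, 65 and 66.
    contains_keyword: Callable[[Signal], bool]
    locate_speaker: Callable[[Signal], Position]
    extract_voiceprint: Callable[[Signal], Voiceprint]
    voiceprints_match: Callable[[Voiceprint, Voiceprint], bool]
    extract_command: Callable[[Signal], str]

    def run(self, first_signal: Signal,
            capture_second: Callable[[Position], Signal]) -> Optional[str]:
        """Return the control command, or None if any check fails."""
        # Module 62: does the first voice signal contain the preset keyword?
        if not self.contains_keyword(first_signal):
            return None
        # Module 63: speaker position and first voiceprint feature.
        position = self.locate_speaker(first_signal)
        first_print = self.extract_voiceprint(first_signal)
        # Module 64: continue acquisition, guided by the speaker position.
        second_signal = capture_second(position)
        # Module 65: compare the second voiceprint feature with the first.
        second_print = self.extract_voiceprint(second_signal)
        if not self.voiceprints_match(first_print, second_print):
            return None
        # Module 66: extract the control command for execution.
        return self.extract_command(second_signal)
```

Passing the modules in as callables keeps the control flow separate from any particular keyword detector, localizer, or voiceprint model.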
For specific limitations of the voice interaction system, refer to the limitations of the voice interaction method above, which are not repeated here. Each module in the above voice interaction system may be implemented wholly or partly by software, hardware, or a combination thereof. Each module may be embedded in hardware form in, or be independent of, the processor of a computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
In addition, an embodiment of the present invention further provides a readable storage medium (i.e., a computer-readable memory) on which a voice interaction program is stored; when executed by a processor, the voice interaction program implements the following operations:
acquiring a first voice signal from the environment surrounding the terminal;
judging whether the first voice signal contains a preset keyword;
if the first voice signal contains the preset keyword, determining the position of the target speaker according to the preset keyword and extracting a first voiceprint feature of the target speaker;
continuing to acquire a second voice signal from the environment surrounding the terminal according to the position of the target speaker, and extracting a second voiceprint feature of the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if they match, extracting the control command from the second voice signal and executing the corresponding operation according to the control command.
Further, the terminal is provided with multiple microphones, which are arranged at different positions and are each used to acquire the first voice signal from the environment surrounding the terminal.
Further, multiple first voice signals are obtained;
according to the multiple first voice signals, the voice intensities of the multiple preset keywords are calculated;
according to the voice intensities of the multiple preset keywords, the first voice signal containing the preset keyword with the greatest voice intensity is marked.
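As an illustration of marking the first voice signal whose preset keyword is loudest, the sketch below assumes that voice intensity is measured as the root-mean-square amplitude over the keyword's sample range in each microphone channel. The intensity measure, the `(start, end)` span format, and the function names are assumptions made for this example and are not defined in this document.

```python
import math


def rms(samples):
    """Root-mean-square amplitude; 0.0 for an empty segment."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def strongest_keyword_channel(channels, keyword_spans):
    """Return (channel_index, intensity) of the loudest detected keyword.

    channels: one list of samples per microphone.
    keyword_spans: per channel, a (start, end) sample range where the preset
    keyword was detected, or None if it was not detected (hypothetical format).
    """
    best_index, best_intensity = None, -1.0
    for i, (samples, span) in enumerate(zip(channels, keyword_spans)):
        if span is None:
            continue  # this microphone did not pick up the keyword
        start, end = span
        intensity = rms(samples[start:end])
        if intensity > best_intensity:
            best_index, best_intensity = i, intensity
    return best_index, best_intensity
```

The returned channel index identifies the first voice signal to mark; a result of `None` means no channel contained the keyword.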
Further, according to the position of the target speaker, a microphone that meets a preset condition is determined to acquire the second voice signal.
Further, the second voice signal is acquired through the single microphone nearest to the position of the target speaker.
Further, when executed by the processor, the voice interaction program also implements the following operation:
denoising the collected second voice signal.
Further, the second voice signal is converted into a frequency-domain signal;
the optimization parameters of the frequency-domain signal are calculated, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter refers to the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain refers to the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio;
the frequency-domain signal is optimized according to the optimization parameters to obtain the denoised voice signal.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. A voice interaction method, characterized in that the voice interaction method comprises:
acquiring a first voice signal from the environment surrounding a terminal;
judging whether the first voice signal contains a preset keyword;
if the first voice signal contains the preset keyword, determining the position of a target speaker according to the preset keyword, and extracting a first voiceprint feature of the target speaker;
continuing to acquire a second voice signal from the environment surrounding the terminal according to the position of the target speaker, and extracting a second voiceprint feature of the second voice signal;
judging whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
if they match, extracting the control command from the second voice signal, and executing the corresponding operation according to the control command.
2. The voice interaction method according to claim 1, characterized in that the step of acquiring the first voice signal from the environment surrounding the terminal comprises:
the terminal is provided with multiple microphones, which are arranged at different positions and are each used to acquire the first voice signal from the environment surrounding the terminal.
3. The voice interaction method according to claim 2, characterized in that the step in which the terminal is provided with multiple microphones, which are arranged at different positions and are each used to acquire the first voice signal from the environment surrounding the terminal, comprises:
obtaining multiple first voice signals;
calculating, according to the multiple first voice signals, the voice intensities of the multiple preset keywords;
marking, according to the voice intensities of the multiple preset keywords, the first voice signal containing the preset keyword with the greatest voice intensity.
4. The voice interaction method according to claim 1, characterized in that the step of continuing to acquire the second voice signal from the environment surrounding the terminal according to the position of the target speaker comprises:
determining, according to the position of the target speaker, a microphone that meets a preset condition to acquire the second voice signal.
5. The voice interaction method according to claim 4, characterized in that the step of determining, according to the position of the target speaker, a microphone that meets a preset condition to acquire the second voice signal comprises:
acquiring the second voice signal through the single microphone nearest to the position of the target speaker.
6. The voice interaction method according to claim 5, characterized in that, after the step of acquiring the second voice signal through the single microphone nearest to the position of the target speaker, the method comprises:
denoising the collected second voice signal.
7. The voice interaction method according to claim 6, characterized in that the step of denoising the collected second voice signal comprises:
converting the second voice signal into a frequency-domain signal;
calculating optimization parameters of the frequency-domain signal, the optimization parameters including a directivity parameter and a white noise gain, where the directivity parameter refers to the ratio of the input signal-to-noise ratio of the desired signal relative to omnidirectional noise to the input signal-to-noise ratio of the microphone, and the white noise gain refers to the ratio of the output signal-to-noise ratio of the multiple microphones to the input signal-to-noise ratio;
optimizing the frequency-domain signal according to the optimization parameters to obtain the denoised voice signal.
8. A voice interaction system, characterized in that the system comprises:
an acquisition module, configured to acquire a first voice signal from the environment surrounding a terminal;
a first judgment module, configured to judge whether the first voice signal contains a preset keyword;
a determining module, configured to, if the first voice signal contains the preset keyword, determine the position of a target speaker according to the preset keyword and extract a first voiceprint feature of the target speaker;
a continued-acquisition module, configured to continue acquiring a second voice signal from the environment surrounding the terminal according to the position of the target speaker;
a second judgment module, configured to judge whether the second voiceprint feature of the second voice signal matches the first voiceprint feature;
an extraction module, configured to, if they match, extract the control command from the second voice signal and execute the corresponding operation according to the control command.
9. A terminal, characterized in that the terminal comprises a memory, a processor, and a voice interaction program stored in the memory and executable on the processor, wherein the voice interaction program, when executed by the processor, implements the steps of the voice interaction method according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and the computer program, when executed by a processor, implements the steps of the voice interaction method according to any one of claims 1 to 7.
CN201910737612.1A 2019-08-09 2019-08-09 Voice interactive method, system, terminal and readable storage medium storing program for executing Pending CN110364156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910737612.1A CN110364156A (en) 2019-08-09 2019-08-09 Voice interactive method, system, terminal and readable storage medium storing program for executing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910737612.1A CN110364156A (en) 2019-08-09 2019-08-09 Voice interactive method, system, terminal and readable storage medium storing program for executing

Publications (1)

Publication Number Publication Date
CN110364156A true CN110364156A (en) 2019-10-22

Family

ID=68223773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910737612.1A Pending CN110364156A (en) 2019-08-09 2019-08-09 Voice interactive method, system, terminal and readable storage medium storing program for executing

Country Status (1)

Country Link
CN (1) CN110364156A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104936091A (en) * 2015-05-14 2015-09-23 科大讯飞股份有限公司 Intelligent interaction method and system based on circle microphone array
CN107579883A (en) * 2017-08-25 2018-01-12 上海肖克利信息科技股份有限公司 Distributed pickup intelligent home furnishing control method
CN108259280A (en) * 2018-02-06 2018-07-06 北京语智科技有限公司 A kind of implementation method, the system of Inteldectualization Indoors control
CN108806681A (en) * 2018-05-28 2018-11-13 江西午诺科技有限公司 Sound control method, device, readable storage medium storing program for executing and projection device
CN109473095A (en) * 2017-09-08 2019-03-15 北京君林科技股份有限公司 A kind of intelligent home control system and control method

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110944056A (en) * 2019-11-29 2020-03-31 深圳传音控股股份有限公司 Interaction method, mobile terminal and readable storage medium
CN111276141A (en) * 2020-01-19 2020-06-12 珠海格力电器股份有限公司 Voice interaction method and device, storage medium, processor and electronic equipment
CN111640417A (en) * 2020-05-13 2020-09-08 广州国音智能科技有限公司 Information input method, device, equipment and computer readable storage medium
CN111681654A (en) * 2020-05-21 2020-09-18 北京声智科技有限公司 Voice control method and device, electronic equipment and storage medium
CN111988426A (en) * 2020-08-31 2020-11-24 深圳康佳电子科技有限公司 Communication method and device based on voiceprint recognition, intelligent terminal and storage medium
CN111988426B (en) * 2020-08-31 2023-07-18 深圳康佳电子科技有限公司 Communication method and device based on voiceprint recognition, intelligent terminal and storage medium
CN114281182A (en) * 2020-09-17 2022-04-05 华为技术有限公司 Man-machine interaction method, device and system
CN113066513A (en) * 2021-03-24 2021-07-02 Oppo广东移动通信有限公司 Voice data processing method and device, electronic equipment and storage medium
CN113066513B (en) * 2021-03-24 2024-03-19 Oppo广东移动通信有限公司 Voice data processing method and device, electronic equipment and storage medium
CN113921016A (en) * 2021-10-15 2022-01-11 阿波罗智联(北京)科技有限公司 Voice processing method, device, electronic equipment and storage medium
EP4099320A3 (en) * 2021-10-15 2023-07-19 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method and apparatus of processing speech, electronic device, storage medium, and program product

Similar Documents

Publication Publication Date Title
CN110364156A (en) Voice interactive method, system, terminal and readable storage medium storing program for executing
US11450337B2 (en) Multi-person speech separation method and apparatus using a generative adversarial network model
CN108615526B (en) Method, device, terminal and storage medium for detecting keywords in voice signal
US9685161B2 (en) Method for updating voiceprint feature model and terminal
CN111554321B (en) Noise reduction model training method and device, electronic equipment and storage medium
CN108735209A (en) Wake up word binding method, smart machine and storage medium
US20240038238A1 (en) Electronic device, speech recognition method therefor, and medium
CN110931000B (en) Method and device for speech recognition
CN111477243B (en) Audio signal processing method and electronic equipment
WO2021103449A1 (en) Interaction method, mobile terminal and readable storage medium
CN111863020A (en) Voice signal processing method, device, equipment and storage medium
CN113033245A (en) Function adjusting method and device, storage medium and electronic equipment
CN109754823A (en) A kind of voice activity detection method, mobile terminal
CN111081275B (en) Terminal processing method and device based on sound analysis, storage medium and terminal
CN108600559B (en) Control method and device of mute mode, storage medium and electronic equipment
CN110728993A (en) Voice change identification method and electronic equipment
CN112735388B (en) Network model training method, voice recognition processing method and related equipment
CN110517702A (en) The method of signal generation, audio recognition method and device based on artificial intelligence
CN114333774A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN111522592A (en) Intelligent terminal awakening method and device based on artificial intelligence
US20210110838A1 (en) Acoustic aware voice user interface
CN114765026A (en) Voice control method, device and system
CN107645604B (en) Call processing method and mobile terminal
WO2020102943A1 (en) Method and apparatus for generating gesture recognition model, storage medium, and electronic device
CN110931047A (en) Voice data acquisition method and device, acquisition terminal and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191022