CN110097875A - Electronic device, method and medium for voice-interaction wake-up based on microphone signals - Google Patents
Electronic device, method and medium for voice-interaction wake-up based on microphone signals
- Publication number
- CN110097875A CN110097875A CN201910475949.XA CN201910475949A CN110097875A CN 110097875 A CN110097875 A CN 110097875A CN 201910475949 A CN201910475949 A CN 201910475949A CN 110097875 A CN110097875 A CN 110097875A
- Authority
- CN
- China
- Prior art keywords
- voice signal
- user
- voice
- microphone
- electronic equipment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephone Function (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
An intelligent electronic device with a built-in microphone is provided. The portable smart device interacts with the user through voice input as follows: it processes the sound signal captured by the microphone and determines whether the signal contains speech; in response to confirming that speech is present, it further determines, based on the microphone signal, whether the distance between the device and the user's mouth is below a predetermined threshold; and, in response to determining that the distance is below the threshold, it processes the microphone signal as voice input. The interaction method is well suited to voice input while carrying the device: the gesture is natural and simple, the steps needed to start voice input are reduced, and the interaction burden and difficulty are lowered, making interaction more natural.
Description
Technical field
The present invention relates generally to the field of voice input and, more specifically, to an intelligent electronic device and a voice-input triggering method.
Background technique
With the development of computer technology, speech recognition algorithms have matured, and voice input is becoming increasingly important because of its naturalness and effectiveness as an interaction mode. Users can interact with mobile devices (phones, watches, etc.) by voice to complete tasks such as command input, information queries, and voice chat.
Existing solutions for triggering voice input have several drawbacks:
1. Physical-button triggering
Voice input is activated after pressing (or holding) one or more physical buttons on the mobile device.
Drawbacks: a physical button is required; accidental triggering is easy; the user must press a key.
2. Interface-element triggering
Voice input is activated by tapping (or holding) an interface element (such as an icon) on the device screen.
Drawbacks: the device must have a screen; the trigger element occupies screen space; software UI constraints can make the trigger cumbersome; accidental triggering is easy.
3. Wake-word (voice) detection
A particular word (such as a product nickname) serves as the wake word; voice input is activated after the device detects it.
Drawbacks: poor privacy and social acceptability; lower interaction efficiency.
Summary of the invention
In view of the above, the present invention proposes the following.
According to one aspect of the invention, an intelligent electronic device with a built-in microphone is provided. The portable smart device interacts with the user through voice input as follows: it processes the sound signal captured by the microphone to determine whether it contains speech; in response to confirming that speech is present, it further determines, based on the microphone signal, whether the distance between the device and the user's mouth is below a predetermined threshold; and, in response to determining that it is, it processes the microphone signal as voice input.
Preferably, the predetermined threshold is 3 centimetres.
Preferably, the predetermined threshold is 1 centimetre.
Preferably, a proximity light sensor is also placed at the microphone of the device, and this sensor judges whether an object is approaching the device.
Preferably, a distance sensor is also placed at the microphone of the device and directly measures the distance between the device and the user's mouth.
Preferably, whether the distance between the device and the user's mouth is below the predetermined threshold is judged from the properties of the sound signal collected by the microphone.
Preferably, the speech signal includes one or a combination of the following: the user speaking at normal volume; the user speaking at low volume; and the sound produced when the user speaks without vocal-cord vibration.
Preferably, the electronic device is further operable to: in response to determining that the user is speaking close to the device, judge which of the following the user is producing: speech at normal volume; speech at low volume; or unvoiced speech without vocal-cord vibration; and, depending on the result, process the sound signal differently.
Preferably, the different processing consists of activating different applications to handle the voice input.
Preferably, the features used in the judgment include volume, spectral features, and energy distribution.
Preferably, the features used to judge whether the distance between the device and the user's mouth is below the predetermined threshold include time-domain and frequency-domain features of the sound signal, including volume and spectral energy.
Preferably, judging whether the distance between the device and the user's mouth is below the predetermined threshold includes: extracting the speech signal from the sound signal collected by the microphone with a filter; judging whether the energy of the speech signal exceeds a threshold; and, in response to the speech intensity exceeding the threshold, judging that the distance between the device and the user's mouth is below the predetermined threshold.
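The filter-then-energy-threshold test above can be sketched as follows. This is a minimal illustration under assumed values: the frame contents, function names, and the 0.01 energy threshold are not taken from the patent, and a real device would apply a band-pass filter before this step.

```python
def speech_energy(frame):
    """Mean squared amplitude of one (already filtered) audio frame."""
    return sum(s * s for s in frame) / len(frame)

def mouth_is_near(frames, energy_threshold=0.01):
    """Judge 'device is near the mouth' when the speech energy of any
    frame exceeds the threshold: close talk yields a loud signal."""
    return any(speech_energy(f) > energy_threshold for f in frames)

# Close-talking speech is much louder than far-field speech:
near = [[0.4, -0.5, 0.45, -0.35]]    # high-amplitude frame
far = [[0.01, -0.02, 0.015, -0.01]]  # low-amplitude frame
print(mouth_is_near(near), mouth_is_near(far))
```

The energy test is deliberately crude; the later "Preferably" clauses replace it with a neural network or a baseline comparison when a single fixed threshold is too brittle.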
Preferably, judging whether the distance between the device and the user's mouth is below the predetermined threshold includes processing the data acquired by the microphone with a deep neural network model.
Preferably, judging whether the distance between the device and the user's mouth is below the predetermined threshold includes: recording the user's sound signal when the user is not performing voice input; comparing the sound signal currently acquired by the microphone with that recording; and, if the current volume exceeds the recorded volume by a certain threshold, judging that the distance is below the predetermined threshold.
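The baseline-comparison variant can be sketched as a volume ratio against the earlier recording. The 12 dB margin, sample values, and function names here are illustrative assumptions, not values given in the patent.

```python
import math

def rms(samples):
    """Root-mean-square amplitude, a simple volume measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_close_talk(current, baseline, margin_db=12.0):
    """True when the current signal is markedly louder (by margin_db)
    than the baseline recorded while the user was not doing voice input."""
    gain_db = 20 * math.log10(rms(current) / rms(baseline))
    return gain_db > margin_db

baseline = [0.01, -0.01, 0.012, -0.008]   # ambient / far speech
close_talk = [0.5, -0.4, 0.45, -0.5]      # device raised to the mouth
print(is_close_talk(close_talk, baseline))
```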
Preferably, processing the sound signal as the user's voice input includes one or more of the following: storing the sound signal on a storage medium of the electronic device; sending the sound signal over the internet; recognizing the speech in the sound signal as text and storing the text on a storage medium of the device; recognizing the speech as text and sending the text over the internet; and recognizing the speech as text, interpreting the user's voice command, and executing the corresponding operation.
Preferably, the electronic device also identifies a specific user through voiceprint analysis and processes only sound signals containing that user's voice.
Preferably, the electronic device is a smartphone, smartwatch, smart ring, or the like.
Mobile devices herein include, but are not limited to, phones, head-mounted displays, watches, and small smart wearables such as rings and watches.
According to another aspect of the invention, a voice-input triggering method executed by an intelligent electronic device equipped with a microphone is provided, in which the device interacts with the user through voice input as follows: processing the sound signal captured by the microphone to determine whether it contains speech; in response to confirming that speech is present, further determining, based on the microphone signal, whether the distance between the device and the user's mouth is below a predetermined threshold; and, in response to determining that it is, processing the microphone signal as voice input.
According to another aspect of the invention, a computer-readable medium is provided, storing computer-executable instructions which, when executed by a computer, carry out a voice-interaction wake-up method comprising: processing the sound signal captured by the microphone to determine whether it contains speech; in response to confirming that speech is present, further determining, based on the microphone signal, whether the distance between the device and the user's mouth is below a predetermined threshold; and, in response to determining that it is, processing the microphone signal as voice input.
According to one aspect of the invention, an electronic device equipped with a microphone is provided. The device has a memory and a central processing unit; the memory stores computer-executable instructions which, when executed by the CPU, perform the following operations: analyzing the sound signal acquired by the microphone to identify whether it contains the voice of a person speaking and whether it contains the wind-noise sound produced when the airflow of speaking hits the microphone; and, in response to determining that the signal contains both the user's speech and such wind noise, processing the sound signal as the user's voice input.
Preferably, the user's speech includes: speaking at normal volume, speaking at low volume, and the sound produced when the user speaks without vocal-cord vibration.
Preferably, the electronic device is further operable to: in response to determining that the user is speaking close to the device, judge which of the following the user is producing: speech at normal volume, speech at low volume, or unvoiced speech without vocal-cord vibration; and process the sound signal differently depending on the result.
Preferably, the different processing consists of activating different applications to handle the voice input.
Preferably, the features used in the judgment include volume, spectral features, and energy distribution.
Preferably, processing the sound signal as the user's voice input includes one or more of the following: storing the sound signal on a storage medium of the electronic device; sending the sound signal over the internet; recognizing the speech in the sound signal as text and storing the text on a storage medium of the device; recognizing the speech as text and sending the text over the internet; and recognizing the speech as text, interpreting the user's voice command, and executing the corresponding operation.
Preferably, the electronic device is further operable to identify a specific user through voiceprint analysis and to process only sound signals containing that user's voice.
Preferably, the electronic device is one of a smartphone, a smartwatch, and a smart ring.
Preferably, the electronic device is further operable to use a neural network model to judge whether the sound signal contains the user's speech and the wind-noise sound produced when the airflow of speaking hits the microphone.
Preferably, identifying whether the sound signal contains human speech and whether it contains the wind noise produced by the airflow of speaking hitting the microphone includes: identifying whether the signal contains the user's speech; in response to determining that it does, recognizing the phonemes in the speech and representing the signal as a phoneme sequence; for each phoneme in the sequence, determining whether it is an aspirated phoneme, that is, one during which airflow leaves the mouth; cutting the sound signal into a sequence of segments of fixed window length; using frequency features to identify whether each segment contains wind noise; comparing the aspirated phonemes in the phoneme sequence with the segments identified as wind noise, and likewise comparing the non-aspirated phonemes with the wind-noise segments; and, when the overlap between aspirated phonemes and wind-noise segments is above a threshold while the overlap between non-aspirated phonemes and wind-noise segments is below a threshold, judging that the signal contains the wind noise produced when the airflow of the user's speech hits the microphone.
Preferably, identifying whether the sound signal contains human speech and whether it contains the wind noise produced by the airflow of speaking hitting the microphone includes: identifying sound features of wind noise in the signal; in response to determining that wind noise is present, identifying whether the signal contains speech; in response to determining that it does, recognizing the phoneme sequence corresponding to the speech; computing, from the wind-noise features, the wind-noise intensity at each moment; for each phoneme in the sequence, obtaining its aspiration intensity from a predefined data model; and analyzing the consistency between the wind-noise features and the phoneme sequence with a Gaussian-mixture Bayesian model, judging, when the agreement is above a threshold, that the signal contains the wind noise produced when the airflow of the user's speech hits the microphone.
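The core of both wind-noise variants is checking whether aspirated phonemes and wind-noise segments line up in time. A much-simplified sketch follows, with both label sequences assumed to be precomputed (a real system would run a phoneme recognizer and a spectral wind-noise detector); the 0.8 agreement threshold is an assumed illustrative value, not one from the patent.

```python
def agreement(flags_a, flags_b):
    """Fraction of time positions where two boolean label sequences agree."""
    assert len(flags_a) == len(flags_b)
    matches = sum(a == b for a, b in zip(flags_a, flags_b))
    return matches / len(flags_a)

def close_talk_detected(aspirated, wind_noise, threshold=0.8):
    """Close talk is inferred when aspirated phonemes coincide with
    wind-noise segments: airflow from the mouth is hitting the mic."""
    return agreement(aspirated, wind_noise) >= threshold

# Per-window labels: True = aspirated phoneme / wind noise present.
aspirated  = [True, False, True, True, False]
wind_noise = [True, False, True, True, False]  # well aligned -> close talk
print(close_talk_detected(aspirated, wind_noise))
```

A distant speaker can produce the same phoneme sequence, but the mouth airflow never reaches the microphone, so the wind-noise labels stay False and the agreement score collapses.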
According to another aspect of the invention, an electronic device with multiple built-in microphones is provided. The device has a memory and a central processing unit; the memory stores computer-executable instructions which, when executed by the CPU, perform the following operations: analyzing the sound signals acquired by the multiple microphones; judging whether the user is speaking close to the device; and, in response to determining that the user is, processing the acquired sound signal as the user's voice input.
Preferably, the multiple microphones form a microphone array.
Preferably, judging whether the user is speaking close to the device includes: using the time differences between the arrival of the speech signal at the individual microphones of the array to compute the position of the user's mouth relative to the array; and determining that the user is speaking close to the device when the distance between the mouth and the device is below a threshold.
Preferably, the distance threshold is 10 centimetres.
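A full array localizer solves for the mouth position from several such delays; the sketch below only shows the basic conversion underlying the time-difference-of-arrival idea, turning an inter-microphone sample delay into a path-length difference. The 48 kHz sample rate is an assumed value, not one specified in the patent.

```python
SPEED_OF_SOUND = 343.0   # m/s in air at ~20 C
SAMPLE_RATE = 48_000     # Hz, assumed

def path_difference_m(delay_samples):
    """Extra distance the sound travels to the farther microphone,
    given the measured delay (in samples) between the two signals."""
    return delay_samples * SPEED_OF_SOUND / SAMPLE_RATE

# A 14-sample delay at 48 kHz corresponds to roughly 0.1 m of path
# difference, on the order of the 10 cm threshold used above: large
# delays across a small device imply a source very close to one mic.
print(path_difference_m(14))
```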
Preferably, processing the sound signal as the user's voice input includes processing it differently depending on the distance between the speaker's mouth and the device.
Preferably, judging whether the user is speaking close to the device includes: judging whether the sound signal acquired by at least one microphone contains the user's speech; in response to determining that it does, extracting the speech signal from each microphone's sound signal; judging whether the amplitude difference between the speech signals extracted from different microphones exceeds a predetermined threshold; and, in response to determining that it does, confirming that the user is speaking close to the device.
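The amplitude-difference test exploits near-field acoustics: a mouth a centimetre from one microphone makes that channel much louder than the others, while a distant source reaches all microphones at nearly equal level. A minimal sketch, with the 2x ratio threshold and sample values as assumptions for illustration:

```python
import math

def rms(samples):
    """Root-mean-square amplitude of one channel's extracted speech."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def close_to_device(mic_signals, ratio_threshold=2.0):
    """True when the loudest channel exceeds the quietest by more than
    the ratio threshold, indicating a near-field (close-talking) source."""
    levels = [rms(sig) for sig in mic_signals]
    return max(levels) / max(min(levels), 1e-12) > ratio_threshold

near = [[0.5, -0.4, 0.45], [0.05, -0.04, 0.05]]   # bottom mic much louder
far = [[0.05, -0.04, 0.05], [0.05, -0.05, 0.04]]  # roughly equal levels
print(close_to_device(near), close_to_device(far))
```

Identifying which channel is loudest also supports the "responding microphone" clause below it: routing the input differently depending on which microphone the user spoke into.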
Preferably, the electronic device is further operable to: designate, among the multiple microphones, the one with the largest speech amplitude as the responding microphone; and process the user's voice input differently depending on which microphone is the responding one.
Preferably, judging whether the user is speaking close to the device includes processing the sound signals of the multiple microphones with a pre-trained machine-learning model.
Preferably, the user's speech includes: speaking at normal volume; speaking at low volume; and the sound produced when the user speaks without vocal-cord vibration.
Preferably, the electronic device is further operable to: in response to determining that the user is speaking close to the device, judge which of the following the user is producing: speech at normal volume; speech at low volume; or unvoiced speech without vocal-cord vibration; and, depending on the result, process the sound signal differently.
Preferably, the different processing consists of activating different applications to handle the voice input.
Preferably, the features used in the judgment include volume, spectral features, and energy distribution.
Preferably, processing the sound signal as the user's voice input includes one or more of the following: storing the sound signal on a storage medium of the electronic device; sending the sound signal over the internet; recognizing the speech in the sound signal as text and storing the text on a storage medium of the device; recognizing the speech as text and sending the text over the internet; and recognizing the speech as text, interpreting the user's voice command, and executing the corresponding operation.
Preferably, the electronic device is further operable to identify a specific user through voiceprint analysis and to process only sound signals containing that user's voice.
Preferably, the electronic device is one of a smartphone, a smartwatch, a smart ring, and a tablet computer.
According to another aspect of the invention, an electronic device with a built-in microphone is provided. The device has a memory and a central processing unit; the memory stores computer-executable instructions which, when executed by the CPU, perform the following operations: judging whether the sound signal acquired by the microphone contains speech; in response to confirming that it does, judging whether the user is whispering, that is, speaking at lower than normal volume; and, in response to determining that the user is whispering, processing the sound signal as voice input without any wake operation.
Preferably, the whispering includes both of two modes: whispering without vocal-cord vibration and whispering with vocal-cord vibration.
Preferably, the electronic device is further operable to: in response to determining that the user is whispering, judge whether the user is whispering without vocal-cord vibration or whispering with vocal-cord vibration; and process the sound signal differently depending on the result.
Preferably, the different processing consists of activating different applications to respond to the voice input.
Preferably, the signal features used to judge whether the user is whispering include volume, spectral features, and energy distribution.
Preferably, the signal features used to judge whether the user is whispering without vocal-cord vibration or with vocal-cord vibration include volume, spectral features, and energy distribution.
Preferably, judging whether the user is whispering includes processing the sound signal acquired by the microphone with a machine-learning model.
Preferably, the machine-learning model is a convolutional neural network model or a recurrent neural network model.
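The patent leaves the model unspecified beyond "CNN or RNN". As a stand-in, this toy sketch scores the two cues the text names most often, volume and a crude spectral feature (zero-crossing rate, since whispers are noise-like); all cutoffs, signals, and function names are assumptions for illustration, and a real implementation would train a CNN/RNN on labeled whispered versus voiced audio.

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Whispers are noise-like, so they cross zero far more often than
    voiced speech of the same duration."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def is_whisper(samples, volume_cutoff=0.1, zcr_cutoff=0.5):
    """Classify as whispering when the signal is both quiet and noise-like."""
    return rms(samples) < volume_cutoff and zero_crossing_rate(samples) > zcr_cutoff

# Toy signals: a loud 110 Hz voiced tone vs. a quiet noise-like signal.
voiced = [0.4 * math.sin(2 * math.pi * 110 * t / 8000) for t in range(800)]
whisper = [0.02 * (-1) ** t for t in range(800)]
print(is_whisper(voiced), is_whisper(whisper))
```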
Preferably, judging whether the user is whispering without vocal-cord vibration or with vocal-cord vibration includes processing the sound signal acquired by the microphone with a machine-learning model.
Preferably, the machine-learning model is a convolutional neural network model or a recurrent neural network model.
Preferably, processing the sound signal as the user's voice input includes one or more of the following: storing the sound signal on a storage medium of the electronic device; sending the sound signal over the internet; recognizing the speech in the sound signal as text and storing the text on a storage medium of the device; recognizing the speech as text and sending the text over the internet; and recognizing the speech as text, interpreting the user's voice command, and executing the corresponding operation.
Preferably, a specific user is identified by voiceprint analysis, and only sound signals containing that specific user's speech are processed.
Preferably, the electronic device is a smart phone, a smart watch, a smart ring, or the like.
Advantages of this scheme:
1. More natural interaction. Placing the device in front of the mouth triggers voice input, which matches users' habits and intuition.
2. Higher efficiency of use. The device can be operated with one hand; there is no need to switch between user interfaces or applications, nor to hold down a key — simply raising the device to the mouth makes it ready for use.
3. High recording quality. The device's microphone is close to the user's mouth, so the captured voice input signal is clear and less affected by ambient sound.
4. Good privacy and social acceptability. With the device in front of the mouth, the user only needs to produce a relatively quiet sound to complete high-quality voice input, causing less disturbance to others; meanwhile, the user's posture may include covering the mouth, which offers good privacy protection.
Description of the drawings
The above and other objects, features, and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the invention with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of a voice input interaction method according to an embodiment of the present invention.
Fig. 2 is an overview flow chart of a voice input triggering method based on differences between the sound signals received by multiple microphones, for an electronic device configured with multiple microphones, according to another embodiment of the present invention.
Fig. 3 is an overview flow chart of a voice input triggering method based on whisper-mode recognition, for an electronic device with a built-in microphone, according to an embodiment of the present invention.
Fig. 4 is an overview flow chart of a voice input triggering method based on distance judgment from the microphone's sound signal.
Fig. 5 is a schematic front view of a triggering gesture in which the microphone at the top of a mobile phone is brought close to the mouth, according to an embodiment of the present invention.
Fig. 6 is a schematic side view of a triggering gesture in which the microphone at the top of a mobile phone is brought close to the mouth, according to an embodiment of the present invention.
Fig. 7 is a schematic view of a triggering gesture in which the microphone at the bottom of a mobile phone is brought close to the mouth, according to an embodiment of the present invention.
Fig. 8 is a schematic view of a triggering gesture in which a smart watch's microphone is brought close to the mouth, according to an embodiment of the present invention.
Detailed description of embodiments
To help those skilled in the art better understand the present invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The present disclosure triggers voice input for an intelligent electronic device based on features of the sound captured by its microphone, determining whether to launch the voice input application without the traditional physical-button trigger, interface-element trigger, or wake-word detection. Interaction is thus more natural: placing the device in front of the mouth triggers voice input, which matches users' habits and intuition.
The disclosure is presented below from the following aspects: 1. voice input triggering based on the wind-noise features of human speech — specifically, directly starting voice input by recognizing both the speech and the wind-noise sound produced when a person speaks, and taking the received sound signal as voice input; 2. voice input triggering based on differences between the sound signals received by multiple microphones; 3. voice input triggering based on whisper-mode recognition; 4. voice input triggering based on distance judgment from the microphone's sound signal.
One. Voice input triggering based on the wind-noise features of human speech
When a user speaks toward the microphone at close range, even if the sound is very quiet or the vocal cords are not engaged, the sound signal captured by the microphone contains two acoustic components: first, the sound produced by the vibration of the vocal cords and by the oral cavity; second, the wind-noise sound produced when the airflow of speech strikes the microphone. The electronic device can trigger its voice input application based on this characteristic.
Fig. 1 shows a schematic flow chart of a voice input interaction method 100 according to an embodiment of the present invention.
In step S101, the sound signal captured by the microphone is analyzed to identify whether it contains human speech and whether it contains the wind-noise sound produced by the airflow of speech striking the microphone.
In step S102, in response to determining that the sound signal contains the user's speech and contains the wind-noise sound produced by the airflow of the user's speech striking the microphone, the sound signal is processed as the user's voice input.
The voice input interaction method of this embodiment of the invention is particularly suitable for performing voice input without vocal-cord phonation in situations where privacy requirements are relatively high.
Here, the user's speech may include: the user speaking at normal volume, the user speaking at low volume, and the user mouthing words without vocal-cord phonation.
In one example, the different speaking modes above can be recognized and different feedback generated according to the recognition result; for instance, speaking normally controls the phone's voice assistant, whispering controls WeChat, and mouthing words without vocal-cord phonation produces a voice-transcription note.
As an example, processing the sound signal as the user's voice input includes one or more of:
storing the sound signal on a storage medium of the electronic device;
sending the sound signal over the internet;
recognizing the speech in the sound signal as text and storing the text on a storage medium of the electronic device;
recognizing the speech in the sound signal as text and sending the text over the internet;
recognizing the speech in the sound signal as text, understanding the user's voice command, and performing the corresponding operation.
In one example, the method further includes identifying a specific user by voiceprint analysis and processing only sound signals that contain that specific user's speech.
In one example, the electronic device is one of a smart phone, a smart watch, and a smart ring.
In one example, a neural network model is used to judge whether the sound signal contains the user's speech and the wind-noise sound produced by the airflow of speech striking the microphone. This is merely illustrative; other machine learning algorithms can be used.
In one example, identifying whether the sound signal contains human speech and whether it contains the wind-noise sound produced by the airflow of speech striking the microphone includes:
identifying whether the sound signal contains the user's speech;
in response to determining that the sound signal contains the user's speech, identifying the phonemes in the speech and representing the speech signal as a phoneme sequence;
for each phoneme in the phoneme sequence, determining whether it is an exhaled phoneme, i.e., one for which airflow leaves the mouth when the user pronounces it;
cutting the sound signal into a sequence of sound segments of fixed window length;
using frequency features to identify whether each sound segment contains wind noise;
comparing the exhaled phonemes in the phoneme sequence with the segments identified as wind noise in the segment sequence, and likewise comparing the non-exhaled phonemes with the wind-noise segments; when the overlap between exhaled phonemes and wind-noise segments is above a certain threshold and the overlap between non-exhaled phonemes and wind-noise segments is below a certain threshold, judging that the sound signal contains the wind-noise sound produced by the airflow of the user's speech striking the microphone.
In one example, identifying whether the sound signal contains human speech and whether it contains the wind-noise sound produced by the airflow of speech striking the microphone includes:
identifying acoustic features of wind noise in the sound signal;
in response to determining that the sound signal contains wind noise, identifying whether it contains a speech signal;
in response to determining that it contains a speech signal, recognizing the phoneme sequence corresponding to the speech signal;
computing the wind-noise feature strength at each moment from the wind-noise features of the sound signal;
for each phoneme in the phoneme sequence, obtaining its exhalation strength from a predetermined data model;
analyzing the consistency between the wind-noise features and the phoneme sequence with a Gaussian-mixture Bayesian model, and, when the agreement exceeds a certain threshold, judging that the sound signal contains the wind-noise sound produced by the airflow of the user's speech striking the microphone.
Two. Voice input triggering based on differences between the sound signals received by multiple microphones
Fig. 2 shows an overview flow chart of a voice input triggering method, based on differences between the sound signals received by multiple microphones, for an electronic device configured with multiple microphones according to another embodiment of the present invention.
The electronic device, such as a mobile phone, is an electronic device with multiple built-in microphones; it has a memory and a central processing unit, and computer-executable instructions are stored in the memory which, when executed by the central processing unit, can perform the voice input triggering method of this embodiment.
As shown in Fig. 2, in step S201, the sound signals captured by the multiple microphones are analyzed.
In one example, the multiple microphones include at least three microphones forming a microphone array; the spatial position of the sound source relative to the smart device can be estimated from the time differences with which the sound signal reaches each microphone.
The analyzed properties of the sound signal here include, for example, its amplitude and frequency.
In step S202, based on the sound signals captured by the multiple microphones, it is judged whether the user is speaking toward the electronic device at close range.
In one example, judging whether the user is speaking toward the electronic device at close range includes:
calculating the position of the user's mouth relative to the microphone array using the time differences between the sound signals arriving at each microphone of the array; and
when the distance of the user's mouth from the electronic device is below a certain threshold, determining that the user is speaking toward the electronic device at close range.
In one example, the distance threshold is 10 centimetres.
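As an illustration of the array-based judgment above, the following sketch estimates the mouth position from arrival-time differences by brute-force search. The microphone coordinates, grid resolution, and search extent are illustrative assumptions; a real implementation would use a closed-form or iterative multilateration solver rather than a grid.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s

# Hypothetical microphone layout on the device face (metres); the disclosure
# does not specify a geometry, so these coordinates are illustrative.
MICS = [(0.0, 0.0), (0.07, 0.0), (0.0, 0.14)]

def tdoas_for(source):
    """Arrival-time differences of each microphone relative to microphone 0."""
    d = [math.dist(source, m) for m in MICS]
    return [(di - d[0]) / SPEED_OF_SOUND for di in d[1:]]

def locate(measured_tdoas, grid_step=0.005, extent=0.3):
    """Brute-force search (over the quadrant in front of the device, an
    assumption of this sketch) for the source position whose predicted
    TDOAs best match the measured ones."""
    best, best_err = None, float("inf")
    steps = int(extent / grid_step)
    for i, j in itertools.product(range(steps + 1), repeat=2):
        cand = (i * grid_step, j * grid_step)
        err = sum((a - b) ** 2 for a, b in zip(tdoas_for(cand), measured_tdoas))
        if err < best_err:
            best, best_err = cand, err
    return best

mouth = (0.05, 0.05)            # true mouth position for the toy example
est = locate(tdoas_for(mouth))  # estimated position from the TDOAs alone
print(est)
```

The estimated position's distance to the device can then be compared against the 10 cm threshold to decide whether to trigger voice input.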
In step S203, in response to determining that the user is speaking toward the electronic device at close range, the sound signal captured by the microphones is processed as the user's voice input.
In one example, processing the sound signal as the user's voice input includes: processing the user's voice input differently according to the distance between the speaker's mouth and the electronic device. For example, when the distance is 0-3 cm, the voice assistant is activated to respond to the user's voice input; when the distance is 3-10 cm, the WeChat application is activated to respond to the user's voice input and send a voice message to a friend.
In one example, judging whether the user is speaking toward the electronic device at close range includes:
judging whether the sound signal captured by at least one microphone contains the user's speech;
in response to determining that the sound signal captured by at least one microphone contains the user's speech, extracting the speech signal from the sound signal captured by each microphone;
judging whether the amplitude difference between the speech signals extracted from the sound signals captured by different microphones exceeds a predetermined threshold; and
in response to determining that the amplitude difference exceeds the predetermined threshold, confirming that the user is speaking toward the electronic device at close range.
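A minimal sketch of this amplitude-difference test follows: when speech reaches one microphone much more strongly than another, the mouth is assumed to be close to the device. The 2x RMS ratio threshold is an illustrative assumption, not a value from the disclosure.

```python
def rms(samples):
    """Root-mean-square level of a sample list."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def close_range_by_amplitude(mic_signals, ratio_thresh=2.0):
    """mic_signals: list of per-microphone sample lists of equal length.
    True when the loudest microphone exceeds the quietest by ratio_thresh."""
    levels = [rms(sig) for sig in mic_signals]
    return max(levels) > ratio_thresh * min(levels)

near = [0.8, -0.7, 0.9, -0.8]        # microphone next to the mouth
far = [0.05, -0.04, 0.06, -0.05]     # microphone at the other end of the device
print(close_range_by_amplitude([near, far]))   # → True
print(close_range_by_amplitude([near, near]))  # → False
```

In the far-field case (the user speaking from across the room) all microphones receive roughly equal levels, so the ratio stays near 1 and the trigger does not fire.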
The above example can further include: defining the microphone with the largest speech signal amplitude among the multiple microphones as the responding microphone, and processing the user's voice input differently according to which microphone responds. For example, when the responding microphone is the one at the bottom of the smart phone, the voice assistant on the phone is activated; when the responding microphone is the one at the top of the smart phone, the recorder function is activated to record the user's speech to storage.
In one example, judging whether the user is speaking toward the electronic device at close range includes: processing the sound signals of the multiple microphones with a pre-trained machine learning model to judge whether the user is speaking toward the electronic device at close range. In general, training sample data is prepared and the chosen machine learning model is trained with it; in actual use (sometimes also called testing), the sound signals captured by the multiple microphones are input to the machine learning model as a test sample, and its output indicates whether the user is speaking toward the electronic device at close range. As examples, the machine learning model may be a deep learning neural network, a support vector machine, a decision tree, and so on.
In one example, the user's speech includes: the user speaking at normal volume, the user speaking at low volume, and the user mouthing words without vocal-cord phonation.
In one example, processing the sound signal as the user's voice input includes one or more of: storing the sound signal on a storage medium of the electronic device; sending the sound signal over the internet; recognizing the speech in the sound signal as text and storing the text on a storage medium of the electronic device; recognizing the speech in the sound signal as text and sending the text over the internet; recognizing the speech in the sound signal as text, understanding the user's voice command, and performing the corresponding operation.
In one example, the method further includes identifying a specific user by voiceprint analysis and processing only sound signals that contain that specific user's speech.
As an example, the electronic device is a smart phone, a smart watch, a smart ring, a tablet computer, or the like.
This embodiment uses the differences between the sound signals of different built-in microphones to recognize whether the user is speaking toward the electronic device at close range, and then decides whether to start voice input; it has advantages such as reliable recognition and a simple calculation method.
Three. Voice input triggering based on whisper-mode recognition
Whispering refers to speaking at a volume lower than normal speech (for example, a normal conversation with another person). Whispering includes two modes. One is whispering without vocal-cord vibration (commonly called a stage whisper); the other is whispering with vocal-cord vibration. In the mode without vocal-cord vibration, the sound produced mainly consists of the sound of air passing through the throat and mouth and the sound made by the tongue and teeth in the mouth. In the mode with vocal-cord vibration, the sound produced additionally includes the sound generated by the vibrating vocal cords. However, compared with speech at normal volume, the vocal cords vibrate less strongly during voiced whispering, and the vocal-cord sound produced is quieter. The frequency ranges of the sound produced by whispering without vocal-cord vibration and of the sound generated by vocal-cord vibration are different, so the two can be distinguished. Voiced whispering and normal-volume speech with vocal-cord vibration can be distinguished by a volume threshold; the specific threshold can be set in advance or set by the user.
An exemplary method: the sound signal captured by the microphone is filtered to extract two signal parts, the acoustic component V1 produced by vocal-cord vibration and the sound V2 of air passing through the throat and mouth and made by the tongue and teeth in the mouth. When the energy ratio of V1 to V2 is below a certain threshold, the user is determined to be whispering.
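The V1/V2 energy-ratio test can be sketched as follows. The band limits are assumptions of this sketch (vocal-cord energy taken below 300 Hz, breath/articulation energy above 1000 Hz), and a naive DFT stands in for the proper filter bank a real system would use.

```python
import math

def band_energy(samples, rate, f_lo, f_hi):
    """Sum of DFT power over bins whose frequency lies in [f_lo, f_hi)."""
    n = len(samples)
    total = 0.0
    for k in range(n // 2):
        f = k * rate / n
        if f_lo <= f < f_hi:
            re = sum(s * math.cos(2 * math.pi * k * t / n)
                     for t, s in enumerate(samples))
            im = -sum(s * math.sin(2 * math.pi * k * t / n)
                      for t, s in enumerate(samples))
            total += re * re + im * im
    return total

def is_whisper(samples, rate, ratio_thresh=0.5):
    """True when vocal-cord band energy V1 is weak relative to breath band V2."""
    v1 = band_energy(samples, rate, 50, 300)     # vocal-cord component
    v2 = band_energy(samples, rate, 1000, 4000)  # breath/articulation component
    return v1 < ratio_thresh * v2

RATE, N = 8000, 400
# Synthetic stand-ins: a 160 Hz "voiced" tone and a quiet 2000 Hz "breath" tone.
voiced = [math.sin(2 * math.pi * 160 * t / RATE) for t in range(N)]
breathy = [0.3 * math.sin(2 * math.pi * 2000 * t / RATE) for t in range(N)]
print(is_whisper(breathy, RATE))                                   # → True
print(is_whisper([v + b for v, b in zip(voiced, breathy)], RATE))  # → False
```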
Normally, whispering can only be detected when the user is relatively close to the microphone, for example at a distance of less than 30 centimetres. Defining close-range whispering as voice input is therefore an interaction mode that is easy for users to learn and understand and convenient to operate; it removes the need for an explicit wake-up operation such as pressing a dedicated wake-up key or speaking a wake word. Moreover, in most actual use, this mode will not be falsely triggered.
Fig. 3 shows an overview flow chart of a voice input triggering method, based on whisper-mode recognition, for an electronic device equipped with a microphone according to an embodiment of the present invention. The electronic device equipped with a microphone has a memory and a central processing unit; computer-executable instructions are stored in the memory which, when executed by the central processing unit, can perform the voice input triggering method according to this embodiment of the present invention.
As shown in Fig. 3, in step S301, it is judged whether the sound signal captured by the microphone contains a speech signal.
In step S302, in response to confirming that the sound signal captured by the microphone contains a speech signal, it is judged whether the user is whispering, i.e., speaking at a volume lower than normal.
In step S303, in response to determining that the user is whispering, the sound signal is processed as voice input without any wake-up operation.
Whispering may include two modes: whispering without vocal-cord phonation and whispering with vocal-cord phonation.
In one example, the voice input triggering method can further include: in response to determining that the user is whispering, judging whether the user is whispering without vocal-cord phonation or whispering with vocal-cord phonation, and processing the sound signal differently according to the result of the judgment.
As an example, the different processing is handing the voice input to different application programs. For instance, speaking normally controls the phone's voice assistant, whispering controls WeChat, and mouthing words without vocal-cord phonation produces a voice-transcription note.
As an example, the signal features used to judge whether the user is whispering may include volume, spectral features, energy distribution, and so on.
As an example, the signal features used to judge whether the user is whispering without vocal-cord phonation or whispering with vocal-cord phonation include volume, spectral features, energy distribution, and so on.
As an example, judging whether the user is whispering may include: processing the sound signal captured by the microphone with a machine learning model to judge whether the user is whispering.
As an example, the machine learning model can be a convolutional neural network model or a recurrent neural network model.
As an example, judging whether the user is whispering without vocal-cord phonation or whispering with vocal-cord phonation includes: processing the sound signal captured by the microphone with a machine learning model to judge which of the two whispering modes the user is using.
As an example, processing the sound signal as the user's voice input includes one or more of:
storing the sound signal on a storage medium of the electronic device;
sending the sound signal over the internet;
recognizing the speech in the sound signal as text and storing the text on a storage medium of the electronic device;
recognizing the speech in the sound signal as text and sending the text over the internet;
recognizing the speech in the sound signal as text, understanding the user's voice command, and performing the corresponding operation.
As an example, the voice input triggering method can further include identifying a specific user by voiceprint analysis and processing only sound signals that contain that specific user's speech.
As an example, the electronic device can be a smart phone, a smart watch, a smart ring, or the like.
For whisper modes and their detection methods, the following references can be consulted as examples:
Zhang, Chi, and John H. L. Hansen. "Analysis and classification of speech mode: whispered through shouted." Eighth Annual Conference of the International Speech Communication Association, 2007.
Meenakshi, G. Nisha, and Prasanta Kumar Ghosh. "Robust whisper activity detection using long-term log energy variation of sub-band signal." IEEE Signal Processing Letters 22.11 (2015): 1859-1863.
Four. Voice input triggering based on distance judgment from the microphone's sound signal
The overview flow chart of the voice input triggering method based on distance judgment from the microphone's sound signal is described below with reference to Fig. 4.
As shown in Fig. 4, in step 401, the sound signal captured by the microphone is processed to judge whether it contains a speech signal.
In step 402, in response to confirming that the sound signal contains a speech signal, it is further judged, based on the sound signal captured by the microphone, whether the distance between the intelligent electronic device and the user's mouth is below a predetermined threshold.
In step 403, in response to determining that the distance between the electronic device and the user's mouth is below the predetermined threshold, the sound signal captured by the microphone is processed as voice input.
In one example, the predetermined threshold is 10 centimetres.
The speech signal may include one or a combination of the following: sound produced by the user speaking at normal volume; sound produced by the user whispering; sound produced by the user mouthing words without vocal-cord phonation.
In one example, the features used to judge whether the distance between the intelligent electronic device and the user's mouth is below the predetermined threshold include time-domain and frequency-domain features of the sound signal, including volume and spectral energy.
In one example, judging whether the distance between the intelligent electronic device and the user's mouth is below the predetermined threshold includes: processing the data captured by the microphone with a deep neural network model to judge whether the distance between the intelligent electronic device and the user's mouth is below the predetermined threshold.
In one example, judging whether the distance between the intelligent electronic device and the user's mouth is below the predetermined threshold includes: recording the user's sound signal when no voice input is being performed, and comparing the sound signal currently captured by the microphone with the sound signal recorded when no voice input was being performed; if the volume of the currently captured sound signal exceeds the volume of the no-voice-input sound signal by a certain threshold, judging that the distance between the intelligent electronic device and the user's mouth is below the predetermined threshold.
In one example, processing the sound signal as the user's voice input includes one or more of: storing the sound signal on a storage medium of the electronic device; sending the sound signal over the internet; recognizing the speech in the sound signal as text and storing the text on a storage medium of the electronic device; recognizing the speech in the sound signal as text and sending the text over the internet; recognizing the speech in the sound signal as text, understanding the user's voice command, and performing the corresponding operation.
In one example, the voice input triggering further includes identifying a specific user by voiceprint analysis and processing only sound signals that contain that specific user's speech.
In one example, the electronic device is a smart phone, a smart watch, a smart ring, or the like.
Figs. 5 to 8 show several positions in which a user brings the microphone of a smart portable electronic device close to the mouth; the sound the user produces at this time is taken as voice input. Figs. 5 and 6 show the case in which the microphone is at the top of the mobile phone; in this case, when the user intends to interact by voice, the phone's microphone can be moved to within 0-10 centimetres of the mouth, and speaking directly serves as voice input. Fig. 7 shows the case in which the microphone is at the bottom of the mobile phone, which is similar to the top-microphone case; the two postures are not mutually exclusive, and if the phone has microphones at both top and bottom, either posture can implement the interaction scheme. Fig. 8 shows the case in which the device is a smart watch, which is similar to the case in which the device is a mobile phone. The above descriptions of triggering gestures are exemplary rather than exhaustive, and are not limited to the disclosed devices and microphone arrangements.
In one specific embodiment, voice input is received and triggered with a single microphone: the input received by the single microphone is first analyzed to judge whether it is speech; then, by analyzing features specific to close-range speech, such as microphone plosives, near-field wind noise, blowing sound, energy, spectral features, and time-domain features, it is judged whether the distance between the electronic device and the user's mouth is below a given threshold; voiceprint recognition judges whether the source of the voice input is a serviceable user; and these factors together determine whether the microphone signal is taken as voice input.
In another specific embodiment, voice input is received and triggered with dual microphones: by analyzing the feature differences between the two microphones' input signals, such as energy features and spectral features, it is judged whether the sound source is close to one of the microphones; the signal difference between the two microphones is then used to suppress environmental noise and separate the speech into a single channel; the feature analysis of the single-microphone embodiment above then judges whether the distance between the electronic device and the user's mouth is below a given threshold; voiceprint recognition judges whether the source of the voice input is a serviceable user; and these factors together determine whether the signal is taken as voice input.
In another specific embodiment, voice input is received and triggered with a multi-microphone array: by comparing the signals of the voice input received by the different microphones, near-field speech is separated from the environment to detect whether the sound signal contains speech; the sound-source localization technique of the multi-microphone array judges whether the distance between the position of the user's mouth and the device is below a predetermined threshold; voiceprint recognition judges whether the source of the voice input is a serviceable user; and these factors together determine whether the signal is taken as voice input.
In one example, when the smart portable electronic device detects by analyzing the sound signal that the articulation position is near the device itself, that is, the mobile device is close to the user's mouth, it takes the sound signal as voice input, and, combined with natural language processing techniques, understands the user's voice input and completes the corresponding task according to the task and its context.
The microphone is not limited to the foregoing examples, and may include one or a combination of the following: a single built-in microphone; dual built-in microphones; a built-in multi-microphone array; an external wireless microphone; and an external wired microphone.
As mentioned above, the smart portable electronic device can be a mobile phone, used with a binaural Bluetooth headset, a wired headset with a microphone, or another microphone sensor.
The smart portable electronic device can be a smart wearable device such as a wrist-watch or a smart ring.
The smart portable electronic device can be a head-mounted smart display device equipped with a microphone or a microphone array.
In one example, after the electronic device activates the voice input application, it can produce feedback output, the feedback output including one or a combination of vibration, voice, and image.
The schemes of the embodiments of the present invention can provide one or more of the following advantages:
1. More natural interaction. Placing the device in front of the mouth triggers voice input, which matches users' habits and intuition.
2. Higher efficiency of use. The device can be operated with one hand; there is no need to switch between user interfaces or applications, nor to hold down a key — simply raising the device to the mouth makes it ready for use.
3. High recording quality. The device's microphone is close to the user's mouth, so the captured voice input signal is clear and less affected by ambient sound.
4. Good privacy and social acceptability. With the device in front of the mouth, the user only needs to produce a relatively quiet sound to complete high-quality voice input, causing less disturbance to others; meanwhile, the user's posture may include covering the mouth, which offers good privacy protection.
The embodiments of the present invention have been described above; the above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be obvious to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. Therefore, the protection scope of the present invention shall be subject to the scope of the appended claims.
Claims (10)
1. An intelligent electronic device configured with a microphone, the smart portable electronic device operating as follows to conduct voice-input-based interaction with a user:
processing the sound signal captured by the microphone to judge whether the sound signal contains a speech signal;
in response to confirming that the sound signal contains a speech signal, further judging, based on the sound signal captured by the microphone, whether the distance between the intelligent electronic device and the user's mouth is below a predetermined threshold; and
in response to determining that the distance between the electronic device and the user's mouth is below the predetermined threshold, processing the sound signal captured by the microphone as voice input.
2. The intelligent electronic device according to claim 1, wherein the predetermined threshold is 3 centimeters.
3. The intelligent electronic device according to claim 1, wherein the predetermined threshold is 1 centimeter.
4. The intelligent electronic device according to claim 1, further comprising a proximity light sensor at the microphone of the electronic device, the proximity light sensor being used to judge whether an object is approaching the electronic device.
5. The intelligent electronic device according to claim 1, further comprising a distance sensor at the microphone of the electronic device, the distance sensor directly measuring the distance between the electronic device and the user's mouth.
6. The intelligent electronic device according to claim 1, wherein whether the distance between the intelligent electronic device and the user's mouth is less than the predetermined threshold is judged from the characteristics of the sound signal collected by the microphone.
7. The intelligent electronic device according to claim 1, wherein the voice signal includes one or a combination of the following:
the sound produced when the user speaks at normal volume;
the sound produced when the user speaks in a low voice;
the sound produced when the user speaks without vocal-cord vibration.
8. The intelligent electronic device according to claim 1, further comprising:
in response to determining that the user is speaking close to the electronic device,
judging in which of the following ways the user is vocalizing, including:
the user speaking at normal volume,
the user speaking at low volume,
the user speaking without vocal-cord vibration; and
applying different processing to the voice signal according to the result of the judgment.
9. A voice-interaction wake-up method executed by an intelligent electronic device configured with a microphone, including the following operations for the intelligent electronic device to conduct voice-input-based interaction with a user:
processing the sound signal captured by the microphone to judge whether a voice signal is present in the sound signal;
in response to confirming that a voice signal is present in the sound signal, further judging, based on the sound signal collected by the microphone, whether the distance between the intelligent electronic device and the user's mouth is less than a predetermined threshold; and
in response to determining that the distance between the electronic device and the user's mouth is less than the predetermined threshold, processing the sound signal collected by the microphone as voice input.
10. A computer-readable medium having computer-executable instructions stored thereon, the computer-executable instructions, when executed by a computer, causing the computer to perform a voice-interaction wake-up method, the voice-interaction wake-up method including:
processing the sound signal captured by the microphone to judge whether a voice signal is present in the sound signal;
in response to confirming that a voice signal is present in the sound signal, further judging, based on the sound signal collected by the microphone, whether the distance between the intelligent electronic device and the user's mouth is less than a predetermined threshold; and
in response to determining that the distance between the electronic device and the user's mouth is less than the predetermined threshold, processing the sound signal collected by the microphone as voice input.
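The three-step flow recited in claims 1, 9, and 10 (detect speech in the captured sound, judge the mouth-to-device distance, then accept the signal as voice input) can be sketched as follows. This is a minimal illustration, not the patented implementation: the energy-based `detect_voice` check and the `near_mouth_probability` value standing in for the device's distance judgment are assumptions.

```python
# Minimal sketch of the claimed wake-up flow: (1) check whether the
# captured sound contains speech, (2) gate on an estimated
# mouth-to-device proximity, (3) only then treat the frame as voice
# input. All thresholds and helpers are illustrative assumptions.

def detect_voice(frame, energy_threshold=0.01):
    """Crude voice-activity check: mean squared amplitude of the frame."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > energy_threshold

def accept_as_voice_input(frame, near_mouth_probability, proximity_threshold=0.5):
    """Return True when the frame should be forwarded as voice input.

    near_mouth_probability stands in for whatever proximity estimate the
    device derives (proximity sensor, distance sensor, or signal
    features, per claims 4-6).
    """
    if not detect_voice(frame):
        return False  # no voice signal present in the sound signal
    if near_mouth_probability < proximity_threshold:
        return False  # device judged farther than the distance threshold
    return True       # speech present and device close to the mouth

# Illustrative frames: a loud speech-like burst and near-silence.
loud_frame = [0.3, -0.4, 0.5, -0.2] * 100
quiet_frame = [0.001, -0.001] * 200
```

Note that both gates must pass: loud speech far from the mouth, or a close-held but silent device, should not trigger voice input.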
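Claim 6 judges distance from characteristics of the collected sound signal itself. One plausible cue (an assumption here, not something the claim specifies) is that very-close-range speech is dominated by low-frequency breath and pop noise at the microphone; the sketch below tests the low-frequency share of a naive DFT spectrum.

```python
import cmath
import math

def low_freq_ratio(frame, cutoff_bin=4):
    """Share of spectral energy below cutoff_bin, via a naive DFT."""
    n = len(frame)
    spectrum = [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2)
    ]
    total = sum(s * s for s in spectrum) or 1.0
    low = sum(s * s for s in spectrum[:cutoff_bin])
    return low / total

def near_mouth(frame, ratio_threshold=0.6):
    """Guess 'microphone within the distance threshold' when
    low-frequency energy dominates (close-range breath noise)."""
    return low_freq_ratio(frame) > ratio_threshold

# Example frames: a slowly varying signal (breath-like) versus a
# rapidly oscillating one (no close-range low-frequency dominance).
breathy_frame = [math.sin(2 * math.pi * t / 32) for t in range(32)]
distant_frame = [math.cos(2 * math.pi * 8 * t / 32) for t in range(32)]
```

A production system would use an FFT and a trained classifier rather than a single hand-set ratio threshold; the point is only that the decision can come from the signal alone, without the extra sensors of claims 4 and 5.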
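Claim 8 branches the processing on how the user is vocalizing. A sketch of such a three-way split is below; the zero-crossing-rate cue for unvoiced (no vocal-cord vibration) speech, the energy thresholds, and the handler names are illustrative assumptions.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign flips; noise-like
    unvoiced speech (no vocal-cord vibration) tends to score high."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def classify_vocalization(frame, loud_energy=0.05, unvoiced_zcr=0.4):
    """Illustrative three-way split per claim 8:
    'normal', 'soft', or 'unvoiced'."""
    if zero_crossing_rate(frame) > unvoiced_zcr:
        return "unvoiced"
    energy = sum(x * x for x in frame) / len(frame)
    return "normal" if energy > loud_energy else "soft"

def route_voice_signal(frame):
    """Apply different processing depending on the vocalization mode."""
    handlers = {
        "normal": "standard speech recognition",
        "soft": "gain-boosted recognition",
        "unvoiced": "whisper/unvoiced model",
    }
    return handlers[classify_vocalization(frame)]

# Example frames for the three modes.
normal_frame = [0.5, 0.4, 0.3, 0.4] * 50
soft_frame = [0.01, 0.02] * 100
unvoiced_frame = [0.1, -0.1] * 100
```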
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910475949.XA CN110097875B (en) | 2019-06-03 | 2019-06-03 | Microphone signal based voice interaction wake-up electronic device, method, and medium |
PCT/CN2020/089551 WO2020244355A1 (en) | 2019-06-03 | 2020-05-11 | Microphone signal-based voice interaction wake-up electronic device, method, and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910475949.XA CN110097875B (en) | 2019-06-03 | 2019-06-03 | Microphone signal based voice interaction wake-up electronic device, method, and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097875A true CN110097875A (en) | 2019-08-06 |
CN110097875B CN110097875B (en) | 2022-09-02 |
Family
ID=67450117
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910475949.XA Active CN110097875B (en) | 2019-06-03 | 2019-06-03 | Microphone signal based voice interaction wake-up electronic device, method, and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110097875B (en) |
WO (1) | WO2020244355A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN111276155A (en) * | 2019-12-20 | 2020-06-12 | 上海明略人工智能(集团)有限公司 | Voice separation method, device and storage medium |
CN111343410A (en) * | 2020-02-14 | 2020-06-26 | 北京字节跳动网络技术有限公司 | Mute prompt method and device, electronic equipment and storage medium |
CN111681654A (en) * | 2020-05-21 | 2020-09-18 | 北京声智科技有限公司 | Voice control method and device, electronic equipment and storage medium |
CN111933140A (en) * | 2020-08-27 | 2020-11-13 | 恒玄科技(上海)股份有限公司 | Method, device and storage medium for detecting voice of earphone wearer |
CN111933140B (en) * | 2020-08-27 | 2023-11-03 | 恒玄科技(上海)股份有限公司 | Method, device and storage medium for detecting voice of earphone wearer |
WO2020244355A1 (en) * | 2019-06-03 | 2020-12-10 | 清华大学 | Microphone signal-based voice interaction wake-up electronic device, method, and medium |
CN114260919A (en) * | 2022-01-18 | 2022-04-01 | 华中科技大学同济医学院附属协和医院 | Intelligent robot |
CN114260919B (en) * | 2022-01-18 | 2023-08-29 | 华中科技大学同济医学院附属协和医院 | Intelligent robot |
WO2024055831A1 (en) * | 2022-09-14 | 2024-03-21 | 荣耀终端有限公司 | Voice interaction method and apparatus, and terminal |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040141418A1 (en) * | 2003-01-22 | 2004-07-22 | Fujitsu Limited | Speaker distance detection apparatus using microphone array and speech input/output apparatus |
US20130085757A1 (en) * | 2011-09-30 | 2013-04-04 | Kabushiki Kaisha Toshiba | Apparatus and method for speech recognition |
WO2013091677A1 (en) * | 2011-12-20 | 2013-06-27 | Squarehead Technology As | Speech recognition method and system |
CN104657105A (en) * | 2015-01-30 | 2015-05-27 | 腾讯科技(深圳)有限公司 | Method and device for starting voice input function of terminal |
CN105096946A (en) * | 2014-05-08 | 2015-11-25 | 钰太芯微电子科技(上海)有限公司 | Voice activation detection based awakening device and method |
CN106254612A (en) * | 2015-06-15 | 2016-12-21 | 中兴通讯股份有限公司 | A kind of sound control method and device |
CN106412259A (en) * | 2016-09-14 | 2017-02-15 | 广东欧珀移动通信有限公司 | Mobile terminal call control method and apparatus, and mobile terminal |
CN106448672A (en) * | 2016-10-27 | 2017-02-22 | Tcl通力电子(惠州)有限公司 | Sound system and control method |
CN107889031A (en) * | 2017-11-30 | 2018-04-06 | 广东小天才科技有限公司 | A kind of audio control method, audio control apparatus and electronic equipment |
CN109448759A (en) * | 2018-12-28 | 2019-03-08 | 武汉大学 | A kind of anti-voice authentication spoofing attack detection method based on gas explosion sound |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105120059B (en) * | 2015-07-07 | 2019-03-26 | 惠州Tcl移动通信有限公司 | Mobile terminal and its method that earphone call noise reduction is controlled according to breathing power |
CN105847584B (en) * | 2016-05-12 | 2019-03-05 | 歌尔股份有限公司 | A kind of method of smart machine identification secret words |
US10192552B2 (en) * | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
EP3613206A4 (en) * | 2017-06-09 | 2020-10-21 | Microsoft Technology Licensing, LLC | Silent voice input |
CN109686378B (en) * | 2017-10-13 | 2021-06-08 | 华为技术有限公司 | Voice processing method and terminal |
CN110097875B (en) * | 2019-06-03 | 2022-09-02 | 清华大学 | Microphone signal based voice interaction wake-up electronic device, method, and medium |
CN110428806B (en) * | 2019-06-03 | 2023-02-24 | 交互未来(北京)科技有限公司 | Microphone signal based voice interaction wake-up electronic device, method, and medium |
CN110111776A (en) * | 2019-06-03 | 2019-08-09 | 清华大学 | Interactive voice based on microphone signal wakes up electronic equipment, method and medium |
CN110223711B (en) * | 2019-06-03 | 2021-06-01 | 清华大学 | Microphone signal based voice interaction wake-up electronic device, method, and medium |
- 2019-06-03 CN CN201910475949.XA patent/CN110097875B/en active Active
- 2020-05-11 WO PCT/CN2020/089551 patent/WO2020244355A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110097875B (en) | 2022-09-02 |
WO2020244355A1 (en) | 2020-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097875A (en) | Interactive voice based on microphone signal wakes up electronic equipment, method and medium | |
CN110223711A (en) | Interactive voice based on microphone signal wakes up electronic equipment, method and medium | |
CN110428806A (en) | Interactive voice based on microphone signal wakes up electronic equipment, method and medium | |
CN110111776A (en) | Interactive voice based on microphone signal wakes up electronic equipment, method and medium | |
US10276164B2 (en) | Multi-speaker speech recognition correction system | |
CN107481718B (en) | Audio recognition method, device, storage medium and electronic equipment | |
CN103095911B (en) | Method and system for finding mobile phone through voice awakening | |
CN104168353B (en) | Bluetooth headset and its interactive voice control method | |
CN111432303B (en) | Monaural headset, intelligent electronic device, method, and computer-readable medium | |
CN109074806A (en) | Distributed audio output is controlled to realize voice output | |
CN108735209A (en) | Wake up word binding method, smart machine and storage medium | |
CN110164440A (en) | Electronic equipment, method and medium are waken up based on the interactive voice for sealing mouth action recognition | |
CN107978316A (en) | The method and device of control terminal | |
CN108346425A (en) | A kind of method and apparatus of voice activity detection, the method and apparatus of speech recognition | |
CN110364156A (en) | Voice interactive method, system, terminal and readable storage medium storing program for executing | |
CN111105796A (en) | Wireless earphone control device and control method, and voice control setting method and system | |
CN111798850B (en) | Method and system for operating equipment by voice and server | |
EP4002363A1 (en) | Method and apparatus for detecting an audio signal, and storage medium | |
US11626104B2 (en) | User speech profile management | |
CN112102850A (en) | Processing method, device and medium for emotion recognition and electronic equipment | |
CN109036410A (en) | Audio recognition method, device, storage medium and terminal | |
CN110728993A (en) | Voice change identification method and electronic equipment | |
CN111835522A (en) | Audio processing method and device | |
CN107403623A (en) | Store method, terminal, Cloud Server and the readable storage medium storing program for executing of recording substance | |
KR102037789B1 (en) | Sign language translation system using robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||