CN110164443A - Speech processing method and apparatus for an electronic device, and electronic device - Google Patents
- Publication number
- CN110164443A CN110164443A CN201910584198.5A CN201910584198A CN110164443A CN 110164443 A CN110164443 A CN 110164443A CN 201910584198 A CN201910584198 A CN 201910584198A CN 110164443 A CN110164443 A CN 110164443A
- Authority
- CN
- China
- Prior art keywords
- electronic equipment
- user
- speech data
- relative position
- position information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
Abstract
The present disclosure provides a speech processing method for an electronic device, a speech processing apparatus, and an electronic device. The speech processing method includes: receiving first voice data from a user at a first time; waking up the electronic device in response to the first voice data satisfying a wake-up condition; receiving, through a voice receiving device at a second time, second voice data from the user, the second voice data instructing the electronic device to perform a related operation; in response to the time span between the second time and the first time satisfying a first specific time length, determining relative position information between the user's face and the electronic device based on the second voice data; and, in response to the relative position information satisfying a specific condition, controlling the electronic device to perform the related operation based on the second voice data.
Description
Technical field
The present disclosure relates to a speech processing method for an electronic device, a speech processing apparatus, and an electronic device.
Background
With the rapid development of electronic technology, all kinds of electronic devices have gradually become part of our work and daily life, and a user can control an electronic device by voice. In the related art, however, after the user first wakes up the device with a wake-up word, any voice interaction occurring a while later again requires the wake-up word before the interaction can continue. Requiring the wake-up word before every voice interaction makes the user experience poor.
Summary of the invention
One aspect of the present disclosure provides a speech processing method for an electronic device, including: receiving first voice data from a user at a first time; waking up the electronic device in response to the first voice data satisfying a wake-up condition; receiving, through a voice receiving device at a second time, second voice data from the user, the second voice data instructing the electronic device to perform a related operation; determining, in response to the time span between the second time and the first time satisfying a first specific time length, relative position information between the user's face and the electronic device based on the second voice data; and controlling, in response to the relative position information satisfying a specific condition, the electronic device to perform the related operation based on the second voice data.
Optionally, the method further includes: controlling the electronic device to perform the related operation based on the second voice data in response to the time span between the second time and the first time satisfying a second specific time length, where the second specific time length is shorter than the first specific time length.
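The interplay of the two specific time lengths can be sketched as a small decision routine. The threshold values, function name, and the `face_toward_device` flag below are illustrative assumptions for exposition, not details given in the disclosure:

```python
def should_execute(t_wake, t_command, face_toward_device,
                   t_short=20.0, t_long=30.0):
    """Decide whether a command received at t_command should run without a
    fresh wake word, given the wake-up time t_wake (both in seconds).

    t_short and t_long stand in for the second and first specific time
    lengths of the disclosure (t_short < t_long).
    """
    elapsed = t_command - t_wake
    if elapsed <= t_short:
        # Shortly after wake-up: execute directly, no orientation check.
        return True
    if elapsed <= t_long:
        # Longer gap: execute only if the user's face is toward the device.
        return face_toward_device
    # Beyond the long window: a new wake word would be required.
    return False
```

Here `face_toward_device` stands in for the relative-position check described above.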
Optionally, the voice receiving device includes multiple voice receivers, and determining the relative position information between the user's face and the electronic device based on the second voice data includes: processing the second voice data to obtain a speech waveform and an audio time delay of the second voice data, where the audio time delay characterizes the time differences with which the multiple voice receivers receive the second voice data; and determining the relative position information between the user's face and the electronic device based on the speech waveform and the audio time delay.
Optionally, determining the relative position information based on the speech waveform and the audio time delay includes: determining whether the type of the speech waveform matches a specific type, and, in response to the type matching the specific type, determining the relative position information between the user's face and the electronic device based on the audio time delay.
Optionally, the multiple voice receivers include a first voice receiver and a second voice receiver separated by a specific distance, and determining the relative position information based on the audio time delay includes: determining a third time at which the first voice receiver receives the second voice data; determining a fourth time at which the second voice receiver receives the second voice data; determining a first delay difference of the audio time delay based on the third time and the fourth time; and determining the relative position information between the user's face and the electronic device based on the first delay difference and the specific distance.
Optionally, the method further includes processing the second voice data to obtain the audio energy of the second voice data. Determining the relative position information based on the audio time delay then includes: in response to the audio energy being greater than a specific energy threshold, determining a target position of the user relative to the electronic device based on the audio energy, and determining the relative position information between the user's face and the electronic device based on the target position and the audio time delay.
Optionally, determining the target position of the user relative to the electronic device based on the audio energy includes: determining a first audio energy and a second audio energy, where the first audio energy characterizes the user being located in the frontal region of the electronic device and the second audio energy characterizes the user being located in a side region of the electronic device; processing the first audio energy and the second audio energy to obtain a processing result; and determining the target position of the user relative to the electronic device based on the processing result.
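A minimal illustration of comparing the two audio energies to classify the user's region. The names, threshold value, and comparison rule are assumptions, since the disclosure only says the two energies are processed to obtain a result:

```python
def estimate_region(front_energy, side_energy, energy_threshold=0.01):
    """Classify whether the user is in the frontal or side region of the
    device by comparing two energy measurements.

    front_energy characterizes a source in the frontal region and
    side_energy a source in the side region; both names and the threshold
    are illustrative.
    """
    total = front_energy + side_energy
    if total <= energy_threshold:
        # Too quiet: fall back to delay-only estimation (see below).
        return "unknown"
    return "front" if front_energy >= side_energy else "side"
```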
Optionally, the multiple voice receivers include multiple groups of voice receivers, and the method further includes: in response to the audio energy being less than or equal to the specific energy threshold, determining a second delay difference of the audio time delay, and determining the relative position information between the user's face and the electronic device based on the second delay difference and the position information of the multiple groups of voice receivers.
Another aspect of the present disclosure provides a speech processing method for an electronic device, including: collecting voice data from a user through multiple voice collection devices, the voice data instructing the electronic device to perform a related operation; processing the voice data to obtain a speech waveform and an audio time delay of the voice data, where the audio time delay characterizes the time differences with which the multiple voice collection devices receive the voice data; determining relative position information between the user's face and the electronic device based on the speech waveform and the audio time delay; and controlling, in response to the relative position information satisfying a specific condition, the electronic device to perform the related operation based on the voice data.
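The pipeline of this aspect (waveform check, delay estimation, bearing, condition test) can be sketched end to end. The brute-force cross-correlation and the 45-degree condition below are illustrative stand-ins for the unspecified delay-estimation method and "specific condition":

```python
import math

def process_voice(samples_per_mic, sample_rate, mic_distance,
                  is_speech_waveform, angle_limit_deg=45.0):
    """Sketch of the second-aspect method: check the waveform type,
    estimate the inter-microphone delay by cross-correlation, convert it
    to a bearing, and decide whether to execute the command.

    samples_per_mic is a pair of equal-length sample lists; the waveform
    predicate and the angle limit are assumptions.
    """
    a, b = samples_per_mic
    if not is_speech_waveform(a):
        return False  # waveform type does not match the specific type
    # Delay via brute-force cross-correlation: lag of maximum correlation.
    n = len(a)
    best_lag, best_corr = 0, float("-inf")
    max_lag = int(mic_distance / 343.0 * sample_rate) + 1
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(a[i] * b[i - lag]
                   for i in range(max(0, lag), min(n, n + lag)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    delay = best_lag / sample_rate
    ratio = max(-1.0, min(1.0, 343.0 * delay / mic_distance))
    angle = math.degrees(math.asin(ratio))
    # Execute only when the estimated bearing satisfies the condition.
    return abs(angle) <= angle_limit_deg
```

For example, an impulse arriving two samples earlier at the second microphone yields a small bearing that passes the 45-degree check, while a five-sample lead yields a near-90-degree bearing that fails it.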
Optionally, determining the relative position information between the user's face and the electronic device based on the voice data includes: processing the voice data to obtain the speech waveform and the audio time delay of the voice data, where the audio time delay characterizes the time differences with which the multiple voice collection devices receive the voice data; and determining the relative position information based on the speech waveform and the audio time delay.
Optionally, determining the relative position information based on the speech waveform and the audio time delay includes: determining whether the type of the speech waveform matches a specific type, and, in response to the type matching the specific type, determining the relative position information between the user's face and the electronic device based on the audio time delay.
Optionally, the multiple voice collection devices include a first voice receiver and a second voice receiver separated by a specific distance, and determining the relative position information based on the audio time delay includes: determining a third time at which the first voice receiver receives the voice data; determining a fourth time at which the second voice receiver receives the voice data; determining a first delay difference of the audio time delay based on the third time and the fourth time; and determining the relative position information between the user's face and the electronic device based on the first delay difference and the specific distance.
Optionally, the method further includes processing the voice data to obtain the audio energy of the voice data. Determining the relative position information based on the audio time delay then includes: in response to the audio energy being greater than a specific energy threshold, determining a target position of the user relative to the electronic device based on the audio energy, and determining the relative position information based on the target position and the audio time delay.
Optionally, determining the target position of the user relative to the electronic device based on the audio energy includes: determining a first audio energy and a second audio energy, where the first audio energy characterizes the user being located in the frontal region of the electronic device and the second audio energy characterizes the user being located in a side region of the electronic device; processing the first audio energy and the second audio energy to obtain a processing result; and determining the target position based on the processing result.
Optionally, the multiple voice collection devices include multiple groups of voice receivers, and the method further includes: in response to the audio energy being less than or equal to the specific energy threshold, determining a second delay difference of the audio time delay, and determining the relative position information between the user's face and the electronic device based on the second delay difference and the position information of the multiple groups of voice receivers.
Another aspect of the present disclosure provides a speech processing apparatus, including a first receiving module, a wake-up module, a second receiving module, a first determining module, and a first control module. The first receiving module receives first voice data from a user at a first time; the wake-up module wakes up the electronic device in response to the first voice data satisfying a wake-up condition; the second receiving module receives, through a voice receiving device at a second time, second voice data from the user, the second voice data instructing the electronic device to perform a related operation; the first determining module determines relative position information between the user's face and the electronic device based on the second voice data in response to the time span between the second time and the first time satisfying a first specific time length; and the first control module controls the electronic device to perform the related operation based on the second voice data in response to the relative position information satisfying a specific condition.
Another aspect of the present disclosure provides an electronic device including a processor and a memory, the memory storing executable instructions that, when executed by the processor, cause the processor to carry out the method described above.
Another aspect of the present disclosure provides a non-volatile readable storage medium storing computer-executable instructions that, when executed, implement the method described above.
Another aspect of the present disclosure provides a computer program including computer-executable instructions that, when executed, implement the method described above.
Brief description of the drawings
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 schematically shows an application scenario of the speech processing method and apparatus for an electronic device according to an embodiment of the present disclosure;
Fig. 2 schematically shows a flowchart of a speech processing method for an electronic device according to a first embodiment of the present disclosure;
Fig. 3 schematically shows a flowchart of a speech processing method for an electronic device according to a second embodiment of the present disclosure;
Fig. 4 schematically shows a diagram of the voice receivers included in an electronic device according to an embodiment of the present disclosure;
Figs. 5-6 schematically show diagrams of speech waveforms received by an electronic device according to an embodiment of the present disclosure;
Fig. 7 schematically shows a diagram of determining relative position information based on an audio time delay according to an embodiment of the present disclosure;
Fig. 8 schematically shows a flowchart of a speech processing method for an electronic device according to a third embodiment of the present disclosure;
Fig. 9 schematically shows a diagram of determining the target position of a user relative to an electronic device according to an embodiment of the present disclosure;
Fig. 10 schematically shows a flowchart of a speech processing method for an electronic device according to a fourth embodiment of the present disclosure;
Figs. 11-12 schematically show diagrams of determining a relative position through multiple groups of voice receivers according to an embodiment of the present disclosure;
Fig. 13 schematically shows a block diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 14 schematically shows a block diagram of a speech processing apparatus according to the first embodiment of the present disclosure;
Fig. 15 schematically shows a block diagram of a speech processing apparatus according to the second embodiment of the present disclosure;
Fig. 16 schematically shows a block diagram of a speech processing apparatus according to the third embodiment of the present disclosure;
Fig. 17 schematically shows a block diagram of a speech processing apparatus according to the fourth embodiment of the present disclosure; and
Fig. 18 schematically shows a block diagram of a computer system for implementing speech processing according to an embodiment of the present disclosure.
Detailed description of embodiments
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, numerous specific details are set forth to facilitate a thorough understanding of the embodiments. It will be apparent, however, that one or more embodiments may be practiced without these specific details. Moreover, descriptions of well-known structures and techniques are omitted in order to avoid unnecessarily obscuring the concepts of the present disclosure.
The terms used herein are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The terms "include", "comprise", and the like indicate the presence of the stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art. Terms used herein should be interpreted as having meanings consistent with the context of this specification, rather than in an idealized or overly rigid manner.
Where an expression such as "at least one of A, B, and C" is used, it should generally be interpreted in the sense commonly understood by those skilled in the art (for example, "a system having at least one of A, B, and C" includes, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C). The same applies to expressions such as "at least one of A, B, or C".
Some block diagrams and/or flowcharts are shown in the drawings. It should be understood that some of the blocks, or combinations thereof, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable control device, so that, when executed by the processor, they create means for implementing the functions/operations illustrated in the block diagrams and/or flowcharts.
Accordingly, the techniques of the present disclosure may be implemented in hardware and/or in software (including firmware, microcode, and the like). In addition, the techniques of the present disclosure may take the form of a computer program product on a computer-readable medium storing instructions, for use by or in connection with an instruction execution system. In the context of the present disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a propagation medium. Specific examples of the computer-readable medium include: magnetic storage devices, such as magnetic tape or hard disks (HDD); optical storage devices, such as optical discs (CD-ROM); memories, such as random access memory (RAM) or flash memory; and/or wired/wireless communication links.
An embodiment of the present disclosure provides a speech processing method for an electronic device, including: receiving first voice data from a user at a first time; waking up the electronic device in response to the first voice data satisfying a wake-up condition; receiving, through a voice receiving device at a second time, second voice data from the user, the second voice data instructing the electronic device to perform a related operation; determining relative position information between the user's face and the electronic device based on the second voice data in response to the time span between the second time and the first time satisfying a first specific time length; and controlling the electronic device to perform the related operation based on the second voice data in response to the relative position information satisfying a specific condition.
Fig. 1 schematically shows an application scenario of the speech processing method and apparatus for an electronic device according to an embodiment of the present disclosure. It should be noted that Fig. 1 is only an example of a scenario to which the embodiments of the present disclosure may be applied, provided to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments cannot be used in other devices, systems, environments, or scenarios.
As shown in Fig. 1, the application scenario 100 may include, for example, a user 110 and an electronic device 120.
According to an embodiment of the present disclosure, the electronic device 120 may be a smart device with the capability to receive and process speech, such as a computer, a smartphone, or a smart speaker.
For example, the user 110 can interact with the electronic device 120 by voice in order to control it to perform related operations. The user 110 can wake up the electronic device 120 with a wake-up word; after the device is awakened, the user 110 can continue to control it to perform related operations with voice commands. For instance, the user 110 may utter the wake-up word "Hi, XX". After receiving the user's speech, the electronic device 120 judges whether it is the wake-up word and, if so, responds to the wake-up word and wakes up. After the electronic device 120 is awake, the user 110 may, for example, issue the voice command "please open the XXX application", and upon receiving this command the electronic device 120 can respond to it by opening the related application.
The speech processing method for an electronic device according to exemplary embodiments of the present disclosure is described below with reference to Figs. 2-12 in conjunction with the application scenario of Fig. 1. It should be noted that the above application scenario is shown merely to facilitate understanding of the spirit and principles of the present disclosure, and the embodiments are not limited in this respect; rather, they can be applied to any applicable scenario.
Fig. 2 schematically shows a flowchart of a speech processing method for an electronic device according to a first embodiment of the present disclosure.
As shown in Fig. 2, the method includes operations S210-S250.
In operation S210, first voice data is received from a user at a first time.
According to an embodiment of the present disclosure, the user can control the electronic device by voice. For example, when the electronic device is in a sleep or power-off state, the user can wake it up with the corresponding wake-up word. After the electronic device receives the user's first voice data at the first time, it can further judge whether the first voice data is the wake-up word.
In operation S220, the electronic device is awakened in response to the first voice data satisfying a wake-up condition.
For example, the first voice data satisfying the wake-up condition includes the first voice data being the wake-up word. After judging that the first voice data is the wake-up word, the electronic device can respond to it and wake up, ready to subsequently perform the related operations the user indicates.
In operation S230, second voice data is received from the user through a voice receiving device at a second time, the second voice data instructing the electronic device to perform a related operation.
According to an embodiment of the present disclosure, the voice receiving device may be, for example, a microphone or a microphone array in the electronic device. After the electronic device is awakened, when the user needs to further control it to perform a related operation, the user can utter the second voice data, so that the electronic device receives it and performs the related operation in response. The electronic device receives the second voice data at the second time, which is after the first time.
In operation S240, in response to the time span between the second time and the first time satisfying a first specific time length, relative position information between the user's face and the electronic device is determined based on the second voice data.
For example, after receiving the user's second voice data, the electronic device can further evaluate the time span between the second time and the first time. When that time span is less than or equal to the first specific time length, the electronic device determines the relative position information between the user's face and the electronic device based on the second voice data. The first specific time length may be, for example, 30 seconds or 1 minute; it is understood that the first specific time length can be set according to actual application requirements.
The relative position information determined from the second voice data can indicate, for example, whether the user's face is oriented toward the electronic device or facing away from it. Further, the relative angle between the orientation of the user's face and the electronic device can also be computed from the second voice data. This relative angle indicates whether the user was facing the electronic device when uttering the second voice data, which helps establish whether the user intended to control the electronic device with the second voice data.
In operation S250, in response to the relative position information satisfying a specific condition, the electronic device is controlled to perform the related operation based on the second voice data.
According to an embodiment of the present disclosure, the relative position information satisfying the specific condition includes, for example, the user's face being oriented toward the electronic device. Alternatively, it may include the relative angle between the orientation of the user's face and the electronic device satisfying a specific angle; for example, when the electronic device includes a display unit, the relative angle between the orientation of the user's face and the display unit satisfies the specific angle.
When the relative position information satisfies the specific condition, the electronic device can perform the related operation directly in response to the second voice data, without being awakened again by the wake-up word. In other words, by determining the relative position information between the orientation of the user's face and the electronic device, the embodiment of the present disclosure allows the user, within a period after waking up the electronic device with the wake-up word (the period from the first time to the second time, during which, for example, the user has had no other voice interaction with the device), to control the electronic device by voice directly, without uttering the wake-up word again. This avoids a cumbersome interaction flow and improves the interactive experience between the user and the electronic device.
Fig. 3 schematically shows a flowchart of a speech processing method for an electronic device according to a second embodiment of the present disclosure.
As shown in Fig. 3, the method includes operations S210-S250 and S310, where operations S210-S250 are the same as or similar to those described above with reference to Fig. 2 and are not repeated here.
In operation S310, in response to the time span between the second time and the first time satisfying a second specific time length, the electronic device is controlled to perform the related operation based on the second voice data, where the second specific time length is shorter than the first specific time length.
According to an embodiment of the present disclosure, if the time span between the second time and the first time is less than or equal to the second specific time length, the electronic device can be directly controlled to respond to the second voice data and perform the related operation, without going on to determine the relative position information between the user's face and the electronic device. The second specific time length is shorter than the first specific time length: for example, when the first specific time length is 30 seconds, the second specific time length may be 20 seconds; when the first specific time length is 1 minute, the second specific time length may be 40 seconds; and so on.
In the embodiment of the present disclosure, the size of the time span between the second moment and the first moment can, for example, characterize the probability that the second speech data issued by the user at the second moment is intended to control the electronic equipment to execute the relevant operation. For example, the smaller the time span between the second moment and the first moment, the sooner the user issues the second speech data after waking up the electronic equipment, indicating a larger probability (a first probability) that the user wants to control the electronic equipment through the second speech data. In this case, the electronic equipment can execute the relevant operation directly in response to the second speech data, without being woken up again by the wake-up word. Conversely, the larger the time span between the second moment and the first moment, the later the user issues the second speech data after waking up the electronic equipment, indicating a smaller probability (a second probability) that the user wants to control the electronic equipment through the second speech data; that is, the second probability is less than the first probability. In this case, the electronic equipment can further judge whether the user faces the electronic equipment. When the user faces the electronic equipment, it indicates that the user wants to control the electronic equipment through the second speech data, and the electronic equipment can respond to the second speech data and execute the relevant operation, without being woken up again by the wake-up word.
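The time-gating logic described above can be sketched as follows. The window lengths are the example values given in the text, and the face-orientation check (operations S240~S250) is abstracted as a callable; the function and variable names are illustrative, not part of the disclosure.

```python
# Illustrative values; the text gives 30 s / 20 s (or 1 min / 40 s) as examples.
FIRST_WINDOW_S = 30.0   # first specific time length
SECOND_WINDOW_S = 20.0  # second specific time length (< FIRST_WINDOW_S)

def should_respond(first_moment, second_moment, user_faces_device):
    """Decide whether the equipment executes the command without a new wake-up word.

    first_moment: when the wake-up word was received (seconds).
    second_moment: when the second speech data was received (seconds).
    user_faces_device: callable performing the face-orientation check
    (operations S240~S250); only invoked when the elapsed time requires it.
    """
    elapsed = second_moment - first_moment
    if elapsed <= SECOND_WINDOW_S:
        # Operation S310: short gap, respond directly, skip the orientation check.
        return True
    if elapsed <= FIRST_WINDOW_S:
        # Longer gap: respond only if the user's face is towards the equipment.
        return bool(user_faces_device())
    # Outside both windows: a new wake-up word is required.
    return False
```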
With reference to the following Fig. 4 to Fig. 12: the embodiments described in Fig. 4 to Fig. 9 are suitable for a scene in which the distance between the user and the electronic equipment is relatively close, and the embodiments described in Fig. 10 to Fig. 12 are suitable for a scene in which the distance between the user and the electronic equipment is relatively far. For example, a relatively close distance between the user and the electronic equipment includes a distance within 1 meter, and a relatively far distance includes a distance of more than 1 meter.
Firstly, with reference to Fig. 4 to Fig. 9, it is described, for example, how to determine the relative position information of the user's face and the electronic equipment in the scene in which the distance between the user and the electronic equipment is relatively close.
Fig. 4 diagrammatically illustrates a schematic diagram of the pronunciation receivers included in the electronic equipment according to the embodiment of the present disclosure.
As shown in Fig. 4, the electronic equipment includes, for example, multiple pronunciation receivers. Fig. 4 diagrammatically illustrates two pronunciation receivers, for example, microphone M1 and microphone M2.
According to the embodiment of the present disclosure, in operation S240 described in Fig. 2, determining the relative position information of the user's face and the electronic equipment based on the second speech data may, for example, include the following steps (1)~(2).
(1) Process the second speech data to obtain a speech waveform and an audio time delay of the second speech data, wherein the audio time delay characterizes the time difference at which the multiple pronunciation receivers receive the second speech data.
For example, after the electronic equipment receives the second speech data, it can process the second speech data to obtain the speech waveform and the audio time delay. The speech waveform may include, for example, a plane waveform, a curved-surface waveform, or another irregular waveform. Since microphone M1 and microphone M2 receive the second speech data at different moments, the time difference at which microphone M1 and microphone M2 receive the second speech data is the audio time delay.
(2) Based on the speech waveform and the audio time delay, determine the relative position information of the user's face and the electronic equipment.
For example, the speech waveform can characterize whether the user's face is towards the electronic equipment or back to the electronic equipment when the user issues the second speech data. Also, the audio time delay with which microphone M1 and microphone M2 receive the second speech data can indicate the relative position information between the user and microphone M1 and microphone M2. Therefore, the relative position information of the user's face and the electronic equipment can be determined according to the speech waveform and the audio time delay; the detailed process is described in the following Fig. 5 to Fig. 6.
Fig. 5 and Fig. 6 diagrammatically illustrate schematic diagrams of speech waveforms received by the electronic equipment according to the embodiment of the present disclosure.
According to the embodiment of the present disclosure, after the second speech data is processed to obtain the speech waveform, it is first determined whether the type of the speech waveform meets a specific type; then, in response to the type of the speech waveform meeting the specific type, the relative position information of the user's face and the electronic equipment is determined based on the audio time delay. Determining whether the type of the speech waveform meets the specific type includes, for example, determining whether the speech waveform is a plane waveform.
As shown in Fig. 5, after the electronic equipment receives the second speech data, if it is judged that the type of the speech waveform of the second speech data is a plane waveform, it can be preliminarily determined that the user is at least not back to the electronic equipment when issuing the second speech data, and the relative position information of the user's face and the electronic equipment is further judged through the audio time delay, so that the electronic equipment can determine, based on the relative position information, whether to respond to the second speech data and execute the relevant operation.
As shown in Fig. 6, after the electronic equipment receives the second speech data, if it is judged that the type of the speech waveform of the second speech data is not a plane waveform (for example, a curved-surface waveform or another irregular waveform), it can be preliminarily determined that the user is back to the electronic equipment when issuing the second speech data; this is because the user's face obstructs the transmission of the second speech data, so that the speech waveform is not a plane waveform. Therefore, when the type of the speech waveform of the second speech data is not a plane waveform, it is preliminarily judged that the user is back to the electronic equipment when issuing the second speech data, and the electronic equipment can then refrain from responding to the second speech data and does not carry out the subsequent judgment of the relative position information.
Fig. 7 diagrammatically illustrates a schematic diagram of determining the relative position information based on the audio time delay according to the embodiment of the present disclosure.
As shown in Fig. 7, the multiple pronunciation receivers include a first pronunciation receiver (microphone M1) and a second pronunciation receiver (microphone M2), and the distance between the first pronunciation receiver and the second pronunciation receiver is a specific distance D1. In the embodiment of the present disclosure, since each of the multiple pronunciation receivers is at a different distance from the user, the different pronunciation receivers may receive the second speech data at different moments. Thereby, the relative position information between the user's face and the electronic equipment can be determined according to the audio time delay with which the multiple pronunciation receivers receive the second speech data.
According to the embodiment of the present disclosure, determining the relative position information of the user's face and the electronic equipment based on the audio time delay includes the following steps (1)~(4).
(1) Determine a third moment at which the first pronunciation receiver receives the second speech data.
(2) Determine a fourth moment at which the second pronunciation receiver receives the second speech data.
For example, when the electronic equipment receives the second speech data, the third moment at which microphone M1 receives the second speech data and the fourth moment at which microphone M2 receives the second speech data are determined.
(3) Based on the third moment and the fourth moment, determine a first delay difference of the audio time delay.
For example, when the third moment is greater than the fourth moment, it indicates that the second speech data arrives first at microphone M2; when the third moment is less than the fourth moment, it indicates that the second speech data arrives first at microphone M1 (the case in which the second speech data arrives first at microphone M1 is shown in Fig. 7). The difference between the third moment and the fourth moment is the first delay difference.
(4) Based on the first delay difference and the specific distance D1, determine the relative position information of the user's face and the electronic equipment.
Taking the situation shown in Fig. 7 as an example, the second speech data arrives first at microphone M1, and the first delay difference is therefore negative. According to the absolute value of the first delay difference and the propagation speed of the voice (for example, the speed of sound), the distance D2 in the figure can be obtained. The distance D2 can characterize the difference (in absolute value) between a first distance and a second distance, wherein the first distance can represent the distance between the plane where the user's face is located and microphone M1, and the second distance can represent the distance between the plane where the user's face is located and microphone M2.
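As a sketch of how the first delay difference might be estimated in practice, the two microphone signals can be cross-correlated and the lag of the correlation peak taken as the arrival-time difference. This is an assumed estimation method; the disclosure itself does not prescribe a particular algorithm, and the function name is illustrative.

```python
import numpy as np

def first_delay_difference(sig_m1, sig_m2, sample_rate):
    """Estimate (third moment - fourth moment) in seconds.

    sig_m1, sig_m2: equal-length sample sequences from microphones M1 and M2.
    The result is negative when the second speech data reaches microphone M1
    first, matching the sign convention of the text.
    """
    # Full cross-correlation; the lag at the peak is the sample delay.
    corr = np.correlate(np.asarray(sig_m1, dtype=float),
                        np.asarray(sig_m2, dtype=float), mode="full")
    lag = int(np.argmax(corr)) - (len(sig_m2) - 1)
    return lag / sample_rate
```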
In the embodiment of the present disclosure, an angle R can be obtained based on the distance D2 and the specific distance D1, for example, from D2 = D1 × cosR; since D1 and D2 are known, the angle R can be calculated. The angle R can be used, for example, to indicate the relative position information of the user's face and the electronic equipment; for example, the angle R is the angle between the orientation N of the user's face and the plane where the display unit of the electronic equipment is located, wherein the embodiment of the present disclosure assumes that the display unit of the electronic equipment is perpendicular to the ground.
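The relation D2 = D1 × cosR above can be turned into a small helper, with D2 obtained from the first delay difference and the propagation speed of the voice. The speed-of-sound constant and the clamping of D2 to at most D1 are assumptions added for numerical robustness, not stated in the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed propagation speed of the voice

def face_angle(first_delay_s, mic_distance):
    """Angle R (radians) from D2 = D1 * cos(R), as in Fig. 7.

    first_delay_s: first delay difference in seconds (sign is irrelevant here).
    mic_distance: specific distance D1 between the two microphones, in metres.
    """
    d2 = min(abs(first_delay_s) * SPEED_OF_SOUND, mic_distance)  # clamp D2 <= D1
    return math.acos(d2 / mic_distance)
```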
As shown in Fig. 7, since the distance between the user and the electronic equipment is relatively close, after the relative position information (the angle R) of the user's face and the electronic equipment is determined based on the first delay difference (the distance D2) and the specific distance D1, the user may be at the position A or the position B shown in Fig. 7, etc. Therefore, it is necessary to further judge the target position of the user, for example, to judge whether the user is at the position A or the position B. The detailed process is described below with reference to Fig. 8 to Fig. 9.
Fig. 8 diagrammatically illustrates a flowchart of a speech processing method for electronic equipment according to the third embodiment of the present disclosure.
As shown in Fig. 8, the method includes operations S210~S250 and S810. Operations S210~S250 are the same as or similar to the operations described above with reference to Fig. 2, and are not repeated here.
In operation S810, the second speech data is processed to obtain the audio energy of the second speech data.
According to the embodiment of the present disclosure, the audio energy of the second speech data can, for example, indicate the distance between the user and the electronic equipment: the larger the audio energy of the second speech data, the smaller the distance between the user and the electronic equipment; the smaller the audio energy, the larger the distance between the user and the electronic equipment.
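As an illustrative sketch of operation S810 (the disclosure fixes neither a particular energy measure nor a threshold value), the audio energy could be computed as the mean-square value of the samples and compared against the particular energy threshold; both the measure and the threshold below are assumptions.

```python
import numpy as np

ENERGY_THRESHOLD = 1e-3  # particular energy threshold (illustrative value)

def audio_energy(samples):
    """Mean-square energy of the second speech data (one possible measure)."""
    x = np.asarray(samples, dtype=float)
    return float(np.mean(x ** 2))

def is_near_field(samples):
    """True when the energy exceeds the threshold, i.e. the user is close
    and the near-field procedure of Fig. 8 to Fig. 9 applies."""
    return audio_energy(samples) > ENERGY_THRESHOLD
```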
According to the embodiment of the present disclosure, determining the relative position information of the user's face and the electronic equipment based on the audio time delay includes the following steps (1)~(2).
(1) In response to the audio energy being greater than a particular energy threshold, determine the target position of the user relative to the electronic equipment based on the audio energy.
For example, when the audio energy is greater than the particular energy threshold, it indicates that the distance between the user and the electronic equipment is relatively small, and the target position of the user relative to the electronic equipment can be further determined (for example, as shown in Fig. 7, determining whether the target position is the position A or the position B). The process of determining the target position according to the audio energy is described below with reference to Fig. 9.
If the audio energy is less than or equal to the particular energy threshold, it indicates that the distance between the user and the electronic equipment is relatively large, and the relative position information between the user and the electronic equipment is then determined in the manner described in the following Fig. 10 to Fig. 12.
(2) Based on the target position and the audio time delay, determine the relative position information of the user's face and the electronic equipment.
After the target position of the user relative to the electronic equipment is determined, the relative position information between the user's face and the electronic equipment can be determined more adequately according to the target position and the audio time delay (the relative position information including, for example, the position A and the angle R in Fig. 7).
Fig. 9 diagrammatically illustrates a schematic diagram of determining the target position of the user relative to the electronic equipment according to the embodiment of the present disclosure.
As shown in Fig. 9, determining the target position of the user relative to the electronic equipment based on the audio energy includes the following steps (1)~(3).
(1) Determine a first audio energy and a second audio energy, wherein the first audio energy is used to characterize that the user is located in the front region of the electronic equipment, and the second audio energy is used to characterize that the user is located in a side region of the electronic equipment.
According to the embodiment of the present disclosure, the target position of the user can be located, for example, through microphone array technology. Specifically, the first audio energy and the second audio energy of the second speech data can be determined, for example, through two different modes of beamforming: the first audio energy is obtained through a cardioid mode and can be used to indicate that the user is located in the front region of the electronic equipment, and the second audio energy is obtained through a dipole mode and can be used to indicate that the user is located in a side region of the electronic equipment. As shown in Fig. 9, the front region is, for example, the region E, and the side regions are, for example, the regions F (the left and right sides in Fig. 9 are the regions F).
(2) Process the first audio energy and the second audio energy to obtain a processing result.
(3) Based on the processing result, determine the target position of the user relative to the electronic equipment.
For example, the first audio energy and the second audio energy are superimposed to obtain the processing result, and the processing result can indicate the target position of the user.
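A rough two-microphone sketch of the cardioid/dipole comparison follows. It assumes a first-order differential array: the dipole output (difference of the two signals) nulls sound arriving from the front broadside, while the cardioid output (one signal minus a delayed copy of the other) nulls sound arriving from one endfire direction. Comparing the two beam energies then gives a crude front/side decision; a real implementation would steer beams to both sides and calibrate the delay, so this is only an assumed illustration.

```python
import numpy as np

def dipole_energy(x1, x2):
    """Dipole (figure-eight) beam energy: the difference of the two
    microphone signals, which nulls a source directly in front (where
    both microphones receive the sound simultaneously)."""
    y = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.mean(y ** 2))

def cardioid_energy(x1, x2, delay_samples):
    """Cardioid beam energy: x1 minus a delayed copy of x2, which nulls
    sound arriving from the endfire direction that reaches M2 first."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = x1[delay_samples:] - x2[:len(x2) - delay_samples]
    return float(np.mean(y ** 2))

def front_or_side(x1, x2, delay_samples):
    """Superimpose (compare) the two beam energies to classify the region."""
    if cardioid_energy(x1, x2, delay_samples) >= dipole_energy(x1, x2):
        return "front"
    return "side"
```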
As shown in Fig. 9, when the obtained target position of the user is A, it indicates that the user is in the front region; according to the relative position information of the user (including the target position A and the angle R), it can then be determined that the relative position information between the orientation of the user's face and the electronic equipment meets the specified conditions, and the electronic equipment is controlled to respond to the second speech data. If the target position of the user is B, it also indicates that the user is in the front region, but according to the relative position information of the user (including the target position B and the angle R), it can be determined that the relative position information between the orientation of the user's face and the electronic equipment does not meet the specified conditions (the user is not towards the electronic equipment), and the electronic equipment does not respond to the second speech data.
Similarly, if the target position of the user is C, it indicates that the user is in a side region; according to the relative position information of the user (including the target position C and the angle R), it can be determined that the relative position information between the orientation of the user's face and the electronic equipment meets the specified conditions, and the electronic equipment is controlled to respond to the second speech data. If the target position of the user is D, it also indicates that the user is in a side region, but according to the relative position information of the user (including the target position D and the angle R), it can be determined that the relative position information between the orientation of the user's face and the electronic equipment does not meet the specified conditions (the user is not towards the electronic equipment), and the electronic equipment does not respond to the second speech data.
According to the embodiment of the present disclosure, in the scene in which the distance between the user and the electronic equipment is relatively close, the target position of the user relative to the electronic equipment is determined, and the relative position information of the user's face and the electronic equipment is determined based on the target position and the audio time delay, so that the electronic equipment can be controlled, depending on the relative position information, to respond to the user's second speech data and directly execute the relevant operation, without being woken up again by the wake-up word. This avoids a cumbersome interactive process between the user and the electronic equipment and improves the interactive experience.
In addition, with reference to Fig. 10 to Fig. 12, it is described, for example, how to determine the relative position information of the user's face and the electronic equipment in the scene in which the distance between the user and the electronic equipment is relatively far.
Figure 10 diagrammatically illustrates a flowchart of a speech processing method for electronic equipment according to the fourth embodiment of the present disclosure.
As shown in Fig. 10, the method includes operations S210~S250 and S1010~S1020. Operations S210~S250 are the same as or similar to the operations described above with reference to Fig. 2, and are not repeated here.
Figure 11 and Figure 12 diagrammatically illustrate schematic diagrams of determining the relative position through multiple groups of pronunciation receivers according to the embodiment of the present disclosure.
As shown in Fig. 11 and Fig. 12, the multiple pronunciation receivers include multiple groups of pronunciation receivers, for example three groups, and each group may include, for example, two microphones. Since the distance between the user and the electronic equipment is relatively far, the relative position information between the user and the electronic equipment can be determined relatively accurately through the multiple groups of pronunciation receivers.
With reference to Fig. 10, Fig. 11 and Fig. 12, in operation S1010, in response to the audio energy being less than or equal to the particular energy threshold, a second delay difference of the audio time delay is determined.
For example, the second speech data is first received through the multiple groups of pronunciation receivers, and it is judged whether the type of the speech waveform of the second speech data is the plane waveform type. If so, it can be further judged whether the audio energy of the second speech data is less than the particular energy threshold. If so, it indicates that the distance between the user and the electronic equipment is relatively far, and the second delay difference with which the multiple groups of pronunciation receivers receive the second speech data can be further determined.
In operation S1020, based on the second delay difference and the location information of the multiple groups of pronunciation receivers, the relative position information of the user's face and the electronic equipment is determined.
As shown in Fig. 12, since the distance between the user and the electronic equipment is relatively far, a change in the user's position has little influence on whether the user's face is towards the electronic equipment. For example, after the user moves from the position A to the position B, the user's face can be considered to remain towards the electronic equipment (unlike the short-distance case of Fig. 7, in which the user faces the electronic equipment at the position A but not at the position B). Therefore, when the distance between the user and the electronic equipment is relatively far, since the change of the user's target position has little influence on whether the user's face is towards the electronic equipment, the relative position information of the user's face and the electronic equipment can be determined through the second delay difference and the location information of the multiple groups of pronunciation receivers.
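Under a plane-wave (far-field) assumption, the delay differences of operation S1010 and the location information of the receivers could, for example, be combined by least squares to recover the arrival direction of the second speech data. This sketch is an assumed implementation, not the algorithm stated in the disclosure; the coordinate layout and function names are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed propagation speed of the voice

def far_field_direction(mic_positions, delays_to_ref):
    """Least-squares plane-wave direction from multiple receivers.

    mic_positions: (n, 2) coordinates in metres; the first entry is the
    reference microphone. delays_to_ref: arrival time of each microphone
    minus that of the reference, in seconds. Under the far-field model,
    (r_i - r_0) . u = -c * delay_i, which is solved for the direction u.
    Returns the unit vector pointing from the array towards the user.
    """
    p = np.asarray(mic_positions, dtype=float)
    d = np.asarray(delays_to_ref, dtype=float)
    a = p[1:] - p[0]              # baselines relative to the reference mic
    b = -SPEED_OF_SOUND * d[1:]   # path-length differences
    u, *_ = np.linalg.lstsq(a, b, rcond=None)
    n = np.linalg.norm(u)
    return u / n if n > 0 else u
```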
According to the embodiment of the present disclosure, in the scene in which the distance between the user and the electronic equipment is relatively far, the relative position information of the user's face and the electronic equipment can be determined based on the audio time delay, so that the electronic equipment can be controlled, depending on the relative position information, to respond to the user's second speech data and directly execute the relevant operation, without being woken up again by the wake-up word. This avoids a cumbersome interactive process between the user and the electronic equipment and improves the interactive experience.
According to the embodiment of the present disclosure, in addition to being determined according to the second speech data, the relative position information between the user's face and the electronic equipment can also be determined through other sensors. For example, data about the user's face can be obtained by means of a radar, a TOF (Time Of Flight) ranging sensor, or an infrared scanner, so as to determine the relative position information between the user's face and the electronic equipment.
Figure 13 diagrammatically illustrates a block diagram of electronic equipment according to the embodiment of the present disclosure.
As shown in Fig. 13, the electronic equipment 1300 of the embodiment of the present disclosure includes a processor 1310 and a memory 1320. The memory 1320 is used for storing executable instructions which, when executed by the processor 1310, cause the processor 1310 to execute the speech processing methods shown in Fig. 2 to Fig. 12, which are not repeated here.
Figure 14 diagrammatically illustrates a block diagram of a voice processing apparatus according to the first embodiment of the present disclosure.
As shown in Fig. 14, the voice processing apparatus 1400 includes a first receiving module 1410, a wake-up module 1420, a second receiving module 1430, a first determining module 1440 and a first control module 1450.
The first receiving module 1410 can be used to receive, at the first moment, the first voice data of the user. According to the embodiment of the present disclosure, the first receiving module 1410 can, for example, execute the operation S210 described above with reference to Fig. 2, which is not repeated here.
The wake-up module 1420 can be used to wake up the electronic equipment in response to the first voice data meeting the wake-up condition. According to the embodiment of the present disclosure, the wake-up module 1420 can, for example, execute the operation S220 described above with reference to Fig. 2, which is not repeated here.
The second receiving module 1430 can be used to receive, at the second moment, the second voice data of the user through the pronunciation receiver, the second speech data being used to instruct the electronic equipment to execute the relevant operation. According to the embodiment of the present disclosure, the second receiving module 1430 can, for example, execute the operation S230 described above with reference to Fig. 2, which is not repeated here.
The first determining module 1440 can be used to determine the relative position information of the user's face and the electronic equipment based on the second speech data, in response to the time span between the second moment and the first moment meeting the first specific time length.
According to the embodiment of the present disclosure, the pronunciation receiver includes multiple pronunciation receivers, and determining the relative position information of the user's face and the electronic equipment based on the second speech data comprises: processing the second speech data to obtain the speech waveform and the audio time delay of the second speech data, wherein the audio time delay characterizes the time difference at which the multiple pronunciation receivers receive the second speech data; and determining the relative position information of the user's face and the electronic equipment based on the speech waveform and the audio time delay.
According to the embodiment of the present disclosure, determining the relative position information of the user's face and the electronic equipment based on the speech waveform and the audio time delay comprises: determining whether the type of the speech waveform meets the specific type; and, in response to the type of the speech waveform meeting the specific type, determining the relative position information of the user's face and the electronic equipment based on the audio time delay.
According to the embodiment of the present disclosure, the multiple pronunciation receivers include the first pronunciation receiver and the second pronunciation receiver, and the distance between the first pronunciation receiver and the second pronunciation receiver is the specific distance. Determining the relative position information of the user's face and the electronic equipment based on the audio time delay comprises: determining the third moment at which the first pronunciation receiver receives the second speech data; determining the fourth moment at which the second pronunciation receiver receives the second speech data; determining the first delay difference of the audio time delay based on the third moment and the fourth moment; and determining the relative position information of the user's face and the electronic equipment based on the first delay difference and the specific distance.
According to the embodiment of the present disclosure, the first determining module 1440 can, for example, execute the operation S240 described above with reference to Fig. 2, which is not repeated here.
The first control module 1450 can be used to control the electronic equipment to execute the relevant operation based on the second speech data, in response to the relative position information meeting the specified conditions. According to the embodiment of the present disclosure, the first control module 1450 can, for example, execute the operation S250 described above with reference to Fig. 2, which is not repeated here.
Figure 15 diagrammatically illustrates a block diagram of a voice processing apparatus according to the second embodiment of the present disclosure.
As shown in Fig. 15, the voice processing apparatus 1500 includes a first receiving module 1410, a wake-up module 1420, a second receiving module 1430, a first determining module 1440, a first control module 1450 and a second control module 1510. The first receiving module 1410, the wake-up module 1420, the second receiving module 1430, the first determining module 1440 and the first control module 1450 are the same as or similar to the modules described above with reference to Fig. 14, and are not repeated here.
The second control module 1510 can be used to control the electronic equipment to execute the relevant operation based on the second speech data, in response to the time span between the second moment and the first moment meeting the second specific time length, wherein the second specific time length is less than the first specific time length. According to the embodiment of the present disclosure, the second control module 1510 can, for example, execute the operation S310 described above with reference to Fig. 3, which is not repeated here.
Figure 16 diagrammatically illustrates a block diagram of a voice processing apparatus according to the third embodiment of the present disclosure.
As shown in Fig. 16, the voice processing apparatus 1600 includes a first receiving module 1410, a wake-up module 1420, a second receiving module 1430, a first determining module 1440, a first control module 1450 and a processing module 1610. The first receiving module 1410, the wake-up module 1420, the second receiving module 1430, the first determining module 1440 and the first control module 1450 are the same as or similar to the modules described above with reference to Fig. 14, and are not repeated here.
The processing module 1610 can be used to process the second speech data to obtain the audio energy of the second speech data.
According to the embodiment of the present disclosure, determining the relative position information of the user's face and the electronic equipment based on the audio time delay comprises: in response to the audio energy being greater than the particular energy threshold, determining the target position of the user relative to the electronic equipment based on the audio energy; and determining the relative position information of the user's face and the electronic equipment based on the target position and the audio time delay.
According to the embodiment of the present disclosure, determining the target position of the user relative to the electronic equipment based on the audio energy comprises: determining the first audio energy and the second audio energy, wherein the first audio energy is used to characterize that the user is located in the front region of the electronic equipment and the second audio energy is used to characterize that the user is located in a side region of the electronic equipment; processing the first audio energy and the second audio energy to obtain the processing result; and determining the target position of the user relative to the electronic equipment based on the processing result.
According to the embodiment of the present disclosure, the processing module 1610 can, for example, execute the operation S810 described above with reference to Fig. 8, which is not repeated here.
Figure 17 diagrammatically illustrates a block diagram of a voice processing apparatus according to the fourth embodiment of the present disclosure.
As shown in Fig. 17, the voice processing apparatus 1700 includes a first receiving module 1410, a wake-up module 1420, a second receiving module 1430, a first determining module 1440, a first control module 1450, a second determining module 1710 and a third determining module 1720. The first receiving module 1410, the wake-up module 1420, the second receiving module 1430, the first determining module 1440 and the first control module 1450 are the same as or similar to the modules described above with reference to Fig. 14, and are not repeated here.
Second determining module 1710 can be used for being less than or equal to particular energy threshold value in response to audio power, determine audio
Second delay inequality of time delay.According to the embodiment of the present disclosure, the second determining module 1710 can for example be executed retouches above with reference to Figure 10
The operation S1010 stated, details are not described herein.
The third determining module 1720 may be used to determine the relative position information of the user's face and the electronic device based on the second delay difference and the position information of multiple groups of sound receivers. According to an embodiment of the present disclosure, the third determining module 1720 may, for example, perform the operation S1020 described above with reference to Figure 10, which is not repeated here.
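For a pair of sound receivers with known spacing, a delay difference such as the one determined by these modules can be converted into a direction-of-arrival estimate. The following sketch assumes a far-field plane-wave model and a single two-receiver group; the constant, function, and parameter names are illustrative and not taken from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees C

def arrival_angle_from_delay(delay_s, receiver_spacing_m):
    """Estimate the direction of arrival, in degrees relative to the
    broadside of a two-receiver pair, from the delay difference (in
    seconds) between the receivers. Far-field plane-wave assumption."""
    # Path-length difference implied by the delay.
    path_diff = SPEED_OF_SOUND * delay_s
    # Clamp to the physically valid range before taking the arcsine,
    # since noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, path_diff / receiver_spacing_m))
    return math.degrees(math.asin(ratio))
```

A delay of zero maps to broadside (the speaker directly in front of the pair), while a delay of plus or minus spacing divided by the speed of sound maps to plus or minus 90 degrees, i.e. the speaker lying along the receiver axis.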
Any number of the modules, submodules, units, and subunits according to embodiments of the present disclosure, or at least part of the functions of any number of them, may be implemented in one module. Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable way of integrating or packaging a circuit, or implemented in any one of the three implementation manners of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be implemented at least partially as a computer program module which, when run, may perform the corresponding function.
For example, any number of the first receiving module 1410, wake-up module 1420, second receiving module 1430, first determining module 1440, first control module 1450, second control module 1510, processing module 1610, second determining module 1710, and third determining module 1720 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the obtaining module 610, the first control module 620, the storage module 710, the second control module 720, and the third control module 810 may be implemented at least partially as a hardware circuit, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable way of integrating or packaging a circuit, or implemented in any one of the three implementation manners of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, at least one of the first receiving module 1410, wake-up module 1420, second receiving module 1430, first determining module 1440, first control module 1450, second control module 1510, processing module 1610, second determining module 1710, and third determining module 1720 may be implemented at least partially as a computer program module which, when run, may perform the corresponding function.
Figure 18 schematically illustrates a block diagram of a computer system for implementing speech processing according to an embodiment of the present disclosure. The computer system shown in Figure 18 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Figure 18, the computer system 1800 for implementing speech processing includes a processor 1801 and a computer-readable storage medium 1802. The system 1800 may perform the method according to the embodiments of the present disclosure.
Specifically, the processor 1801 may include, for example, a general-purpose microprocessor, an instruction-set processor and/or a related chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)), and so on. The processor 1801 may also include onboard memory for caching purposes. The processor 1801 may be a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiments of the present disclosure.
The computer-readable storage medium 1802 may be, for example, any medium that can contain, store, communicate, propagate, or transport instructions. For example, a readable storage medium may include, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of readable storage media include: magnetic storage devices, such as magnetic tape or hard disks (HDD); optical storage devices, such as compact discs (CD-ROM); memories, such as random access memory (RAM) or flash memory; and/or wired/wireless communication links.
The computer-readable storage medium 1802 may include a computer program 1803, which may include code/computer-executable instructions that, when executed by the processor 1801, cause the processor 1801 to perform the method according to the embodiments of the present disclosure or any variation thereof.
The computer program 1803 may be configured to have, for example, computer program code including computer program modules. For example, in an exemplary embodiment, the code in the computer program 1803 may include one or more program modules, for example including module 1803A, module 1803B, and so on. It should be noted that the division and number of the modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, and when these program module combinations are executed by the processor 1801, the processor 1801 performs the method according to the embodiments of the present disclosure or any variation thereof.
According to embodiments of the present disclosure, at least one of the first receiving module 1410, wake-up module 1420, second receiving module 1430, first determining module 1440, first control module 1450, second control module 1510, processing module 1610, second determining module 1710, and third determining module 1720 may be implemented as a computer program module described with reference to Figure 18, which, when executed by the processor 1801, may implement the corresponding operations described above.
The present disclosure also provides a computer-readable medium, which may be included in the device/apparatus/system described in the above embodiments, or may exist alone without being assembled into that device/apparatus/system. The above computer-readable medium carries one or more programs which, when executed, implement the speech processing method described above.
According to embodiments of the present disclosure, the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wired, optical cable, radio-frequency signals, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or part of code, and the above module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and combinations of boxes in a block diagram or flowchart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Those skilled in the art will understand that the features described in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly described in the present disclosure. In particular, without departing from the spirit or teaching of the present disclosure, the features described in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways. All such combinations and/or integrations fall within the scope of the present disclosure.
Although the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, those skilled in the art should understand that various changes in form and detail may be made to the present disclosure without departing from the spirit and scope of the present disclosure as defined by the following claims and their equivalents. Therefore, the scope of the present disclosure should not be limited to the above embodiments, but should be determined not only by the appended claims but also by the equivalents of the appended claims.
Claims (10)
1. A speech processing method for an electronic device, comprising:
receiving first voice data of a user at a first moment;
waking up the electronic device in response to the first voice data meeting a wake-up condition;
receiving, at a second moment and through a sound receiver, second voice data of the user, the second voice data being used to instruct the electronic device to perform a relevant operation;
determining relative position information of the user's face and the electronic device based on the second voice data, in response to the time span between the second moment and the first moment meeting a first specific time length; and
controlling the electronic device to perform the relevant operation based on the second voice data, in response to the relative position information meeting a specific condition.
2. The method according to claim 1, further comprising:
controlling the electronic device to perform the relevant operation based on the second voice data, in response to the time span between the second moment and the first moment meeting a second specific time length, wherein the second specific time length is less than the first specific time length.
3. The method according to claim 1, wherein the sound receiver comprises multiple sound receivers, and determining the relative position information of the user's face and the electronic device based on the second voice data comprises:
processing the second voice data to obtain a speech waveform and an audio time delay of the second voice data, wherein the audio time delay characterizes the time difference with which the multiple sound receivers receive the second voice data; and
determining the relative position information of the user's face and the electronic device based on the speech waveform and the audio time delay.
4. The method according to claim 3, wherein determining the relative position information of the user's face and the electronic device based on the speech waveform and the audio time delay comprises:
determining whether the type of the speech waveform meets a specific type; and
determining the relative position information of the user's face and the electronic device based on the audio time delay, in response to the type of the speech waveform meeting the specific type.
5. The method according to claim 4, wherein the multiple sound receivers comprise a first sound receiver and a second sound receiver, the distance between the first sound receiver and the second sound receiver is a specific distance, and determining the relative position information of the user's face and the electronic device based on the audio time delay comprises:
determining a third moment at which the first sound receiver receives the second voice data;
determining a fourth moment at which the second sound receiver receives the second voice data;
determining a first delay difference of the audio time delay based on the third moment and the fourth moment; and
determining the relative position information of the user's face and the electronic device based on the first delay difference and the specific distance.
6. The method according to claim 4, further comprising: processing the second voice data to obtain an audio energy of the second voice data; wherein determining the relative position information of the user's face and the electronic device based on the audio time delay comprises:
determining a target position of the user relative to the electronic device based on the audio energy, in response to the audio energy being greater than a specific energy threshold; and
determining the relative position information of the user's face and the electronic device based on the target position and the audio time delay.
7. The method according to claim 6, wherein determining the target position of the user relative to the electronic device based on the audio energy comprises:
determining a first audio energy and a second audio energy, wherein the first audio energy is used to characterize the user being located in a front region of the electronic device, and the second audio energy is used to characterize the user being located in a side region of the electronic device;
processing the first audio energy and the second audio energy to obtain a processing result; and
determining the target position of the user relative to the electronic device based on the processing result.
8. The method according to claim 6, wherein the multiple sound receivers comprise multiple groups of sound receivers, and the method further comprises:
determining a second delay difference of the audio time delay, in response to the audio energy being less than or equal to the specific energy threshold; and
determining the relative position information of the user's face and the electronic device based on the second delay difference and the position information of the multiple groups of sound receivers.
9. A voice processing apparatus, comprising:
a first receiving module for receiving first voice data of a user at a first moment;
a wake-up module for waking up the electronic device in response to the first voice data meeting a wake-up condition;
a second receiving module for receiving, at a second moment and through a sound receiver, second voice data of the user, the second voice data being used to instruct the electronic device to perform a relevant operation;
a first determining module for determining relative position information of the user's face and the electronic device based on the second voice data, in response to the time span between the second moment and the first moment meeting a first specific time length; and
a first control module for controlling the electronic device to perform the relevant operation based on the second voice data, in response to the relative position information meeting a specific condition.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions, wherein, when the instructions are executed by the processor, the processor is caused to perform the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910584198.5A CN110164443B (en) | 2019-06-28 | 2019-06-28 | Voice processing method and device for electronic equipment and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110164443A true CN110164443A (en) | 2019-08-23 |
CN110164443B CN110164443B (en) | 2021-09-14 |
Family
ID=67637146
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910584198.5A Active CN110164443B (en) | 2019-06-28 | 2019-06-28 | Voice processing method and device for electronic equipment and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110164443B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110575040A (en) * | 2019-09-09 | 2019-12-17 | 珠海格力电器股份有限公司 | Control method and control terminal of intelligent curtain and intelligent curtain control system |
CN110730115A (en) * | 2019-09-11 | 2020-01-24 | 北京小米移动软件有限公司 | Voice control method and device, terminal and storage medium |
CN113626778A (en) * | 2020-05-08 | 2021-11-09 | 百度在线网络技术(北京)有限公司 | Method, apparatus, electronic device, and computer storage medium for waking up device |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060143017A1 (en) * | 2004-12-24 | 2006-06-29 | Kabushiki Kaisha Toshiba | Interactive robot, speech recognition method and computer program product |
JP5326934B2 (en) * | 2009-01-23 | 2013-10-30 | 株式会社Jvcケンウッド | Electronics |
US9378733B1 (en) * | 2012-12-19 | 2016-06-28 | Google Inc. | Keyword detection without decoding |
CN106531190A (en) * | 2016-10-12 | 2017-03-22 | 科大讯飞股份有限公司 | Speech quality evaluation method and device |
CN106653021A (en) * | 2016-12-27 | 2017-05-10 | 上海智臻智能网络科技股份有限公司 | Voice wake-up control method and device and terminal |
US20170206900A1 (en) * | 2016-01-20 | 2017-07-20 | Samsung Electronics Co., Ltd. | Electronic device and voice command processing method thereof |
CN107113498A (en) * | 2014-12-26 | 2017-08-29 | 爱信精机株式会社 | Sound processing apparatus |
CN107123421A (en) * | 2017-04-11 | 2017-09-01 | 广东美的制冷设备有限公司 | Sound control method, device and home appliance |
US20180081352A1 (en) * | 2016-09-22 | 2018-03-22 | International Business Machines Corporation | Real-time analysis of events for microphone delivery |
CN108369476A (en) * | 2015-12-11 | 2018-08-03 | 索尼公司 | Information processing equipment, information processing method and program |
CN108538298A (en) * | 2018-04-04 | 2018-09-14 | 科大讯飞股份有限公司 | voice awakening method and device |
EP3379844A1 (en) * | 2015-11-17 | 2018-09-26 | Sony Corporation | Information processing device, information processing method, and program |
CN108831474A (en) * | 2018-05-04 | 2018-11-16 | 广东美的制冷设备有限公司 | Speech recognition apparatus and its voice signal catching method, device and storage medium |
CN108962250A (en) * | 2018-09-26 | 2018-12-07 | 出门问问信息科技有限公司 | Audio recognition method, device and electronic equipment |
CN109710080A (en) * | 2019-01-25 | 2019-05-03 | 华为技术有限公司 | A kind of screen control and sound control method and electronic equipment |
CN109814718A (en) * | 2019-01-30 | 2019-05-28 | 天津大学 | A kind of multi-modal information acquisition system based on Kinect V2 |
Non-Patent Citations (2)
Title |
---|
XINGWEI SUN: "Effect of Steering Vector Estimation on MVDR Beamformer for Noisy Speech Recognition", 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP) * |
LIU LONGMEI: "Research on Sound Source Localization for Robot Rescue at Disaster Scenes", China Master's Theses Full-text Database, Information Science and Technology * |
Also Published As
Publication number | Publication date |
---|---|
CN110164443B (en) | 2021-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6455686B2 (en) | Distributed wireless speaker system | |
CN108877770B (en) | Method, device and system for testing intelligent voice equipment | |
CN110992974B (en) | Speech recognition method, apparatus, device and computer readable storage medium | |
US9854362B1 (en) | Networked speaker system with LED-based wireless communication and object detection | |
US10075791B2 (en) | Networked speaker system with LED-based wireless communication and room mapping | |
US10339913B2 (en) | Context-based cancellation and amplification of acoustical signals in acoustical environments | |
US20160284350A1 (en) | Controlling electronic device based on direction of speech | |
CN110164443A (en) | Method of speech processing, device and electronic equipment for electronic equipment | |
US11435429B2 (en) | Method and system of acoustic angle of arrival detection | |
CN109286875A (en) | For orienting method, apparatus, electronic equipment and the storage medium of pickup | |
US11574626B2 (en) | Method of controlling intelligent security device | |
US9826332B2 (en) | Centralized wireless speaker system | |
EP2945156A1 (en) | Audio signal recognition method and electronic device supporting the same | |
KR20220117282A (en) | Audio device auto-location | |
US9924286B1 (en) | Networked speaker system with LED-based wireless communication and personal identifier | |
CN104244055B (en) | Real-time interaction method within the scope of the multimedia equipment useful space | |
CN113053368A (en) | Speech enhancement method, electronic device, and storage medium | |
JP2018096961A (en) | Gesture recognition system, and gesture recognition method using the same | |
US11656837B2 (en) | Electronic device for controlling sound and operation method therefor | |
US20210225374A1 (en) | Method and system of environment-sensitive wake-on-voice initiation using ultrasound | |
US20230236318A1 (en) | PERFORMANCE OF A TIME OF FLIGHT (ToF) LASER RANGE FINDING SYSTEM USING ACOUSTIC-BASED DIRECTION OF ARRIVAL (DoA) | |
CN114566171A (en) | Voice awakening method and electronic equipment | |
US11398070B1 (en) | Boundary approximation utilizing radar | |
US10506192B2 (en) | Gesture-activated remote control | |
CN113411649B (en) | TV state detecting device and system using infrasound signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||