CN107146614A

CN107146614A - Voice signal processing method, apparatus, and electronic device

Info

Publication number: CN107146614A
Application number: CN201710231244.4A
Authority: CN
Inventors: 李福祥; 李峥
Original assignee: Beijing Orion Star Technology Co Ltd
Current assignee: Beijing Orion Star Technology Co Ltd
Priority date: 2017-04-10
Filing date: 2017-04-10
Publication date: 2017-09-08
Anticipated expiration: 2037-04-10
Also published as: CN107146614B

Abstract

An embodiment of the invention discloses a voice signal processing method, apparatus, and electronic device. The method includes: while the electronic device is in a sleep state, receiving a voice signal and judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction; if so, switching from the sleep state to a working state and locating the sound-source direction of the received voice signal as the user sound-source direction; continuing to receive voice signals and performing noise suppression on those of the received voice signals that originate from outside the user sound-source direction, to obtain a user voice signal; and responding to the interactive instruction corresponding to the user voice signal. Because the electronic device suppresses the voice signals that originate from outside the user sound-source direction, the user voice signal it obtains is the voice signal uttered by the user located in that direction, so the device can respond correctly and the user experience is improved.

Description

Voice signal processing method, apparatus, and electronic device

Technical field

The present invention relates to the field of voice processing technology, and in particular to a voice signal processing method, apparatus, and electronic device.

Background

At present, more and more products on the market offer voice interaction, for example electronic devices such as smart speakers and robots. After receiving a wake-up instruction, such a device switches from a standby state to a working state and receives voice signals through a microphone array, that is, it performs sound pickup. It can then recognize and parse the voice signal and respond to the corresponding interactive instruction.

After receiving the wake-up instruction, such an electronic device receives, through the microphone array, the voice signals emitted by each sound source in the surrounding environment, and identifies the sound-source direction of the loudest of these voice signals as the user sound-source direction. In other words, the loudest voice signal is treated as the voice signal uttered by the user, and the device responds to the interactive instruction corresponding to that signal.

Under normal circumstances this approach handles voice signals reasonably well. However, if one or more sounding objects around the user are louder than the user, the electronic device will identify the direction of the loudest received voice signal as the user sound-source direction, recognize and parse that loudest signal to obtain an interactive instruction, and consequently respond incorrectly, which degrades the user experience.

Summary of the invention

Embodiments of the invention disclose a voice signal processing method, apparatus, and electronic device, so as to avoid incorrect responses and improve the user experience. The technical solution is as follows:

In a first aspect, an embodiment of the invention provides a voice signal processing method applied to an electronic device with a voice interaction function, the method including:

while the electronic device is in a sleep state, receiving a voice signal and judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction;

if so, switching from the sleep state to a working state, and locating the sound-source direction of the received voice signal as the user sound-source direction;

continuing to receive voice signals, and performing noise suppression on those of the received voice signals that originate from outside the user sound-source direction, to obtain a user voice signal;

responding to the interactive instruction corresponding to the user voice signal.

Optionally, the step of performing noise suppression on those of the received voice signals that originate from outside the user sound-source direction, to obtain a user voice signal, includes:

performing noise suppression on those of the received voice signals that originate from outside the user sound-source direction, and performing beam enhancement on those of the received voice signals that originate from the user sound-source direction, to obtain the user voice signal.

Optionally, the method further includes:

indicating the user's direction according to the user sound-source direction.

Optionally, the method further includes:

judging whether the interactive instruction corresponding to a voice signal received from the user sound-source direction is a sound-source localization mode conversion instruction;

if so, continuing to receive voice signals, determining the sound-source direction of the loudest of the received voice signals as the user sound-source direction, determining the loudest of the received voice signals as the user voice signal, and responding to the interactive instruction corresponding to the user voice signal.

Optionally, the step of judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction includes:

judging, for each received voice signal, whether its corresponding interactive instruction is a wake-up instruction in the following manner:

filtering a target voice signal to remove the components of the target voice signal whose frequency falls within a preset frequency band, where the target voice signal is the received voice signal;

judging whether the interactive instruction corresponding to the filtered target voice signal is a wake-up instruction.
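The filtering step can be illustrated with a minimal sketch. It assumes a frequency-domain approach (a naive discrete Fourier transform whose bins inside the preset band are set to zero); the source does not say how the filter is realized, and the name `remove_band`, the sample rate, and the band limits are hypothetical.

```python
import cmath
import math


def remove_band(samples, sample_rate, band_lo_hz, band_hi_hz):
    """Zero every DFT bin whose frequency lies in [band_lo_hz, band_hi_hz]."""
    n = len(samples)
    # Forward DFT (naive O(n^2); acceptable for a short illustrative frame).
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same physical frequency for a real signal.
        freq_hz = min(k, n - k) * sample_rate / n
        if band_lo_hz <= freq_hz <= band_hi_hz:
            spectrum[k] = 0
    # Inverse DFT; the result is real because mirrored bins were zeroed together.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
```

The filtered samples would then be passed to speech recognition to check for the wake-up instruction. A production system would more likely use an FFT or a time-domain band-stop filter; the naive transform is used here only to keep the sketch dependency-free.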

Optionally, the step of locating the sound-source direction of the received voice signal as the user sound-source direction includes:

locating and recording the sound-source direction of the received voice signal as a second-type sound-source direction;

locating the user sound-source direction according to first-type sound-source directions and the second-type sound-source direction, where a first-type sound-source direction is the sound-source direction, located and recorded while the electronic device was in the sleep state, of a received voice signal whose corresponding interactive instruction is not a wake-up instruction.

Optionally, the step of locating the user sound-source direction according to the first-type sound-source directions and the second-type sound-source directions includes:

judging whether the second-type sound-source directions include a sound-source direction that does not belong to the first-type sound-source directions;

if so, locating the sound-source direction among the second-type sound-source directions that does not belong to the first-type sound-source directions as the user sound-source direction.

Optionally, the step of locating the sound-source direction among the second-type sound-source directions that does not belong to the first-type sound-source directions as the user sound-source direction includes:

determining the number of sound-source directions among the second-type sound-source directions that do not belong to the first-type sound-source directions;

when the determined number is greater than 1, determining the sound-source direction corresponding to a voice signal that does not belong to the preset frequency band as the user sound-source direction.

Optionally, the step of determining the sound-source direction corresponding to a voice signal that does not belong to the preset frequency band as the user sound-source direction includes:

determining the number of sound-source directions corresponding to voice signals that do not belong to the preset frequency band;

when the determined number is greater than 1, determining, among the voice signals that do not belong to the preset frequency band, the sound-source direction corresponding to a voice signal whose waveform has a similarity to a predetermined waveform greater than a first preset value as the user sound-source direction.
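The three optional refinements above form a cascade, which the following sketch tries to make concrete. It is an illustration under stated assumptions, not the patented implementation: the `LocatedSignal` record, its `dominant_hz` and `similarity` fields, and the exact-match comparison of directions are all hypothetical, since the source does not say how frequency-band membership or waveform similarity are measured.

```python
from dataclasses import dataclass


@dataclass
class LocatedSignal:
    direction_deg: float  # located sound-source direction
    dominant_hz: float    # hypothetical feature: dominant frequency of the signal
    similarity: float     # hypothetical feature: similarity to the predetermined waveform


def candidate_user_signals(first_type_dirs, second_type, preset_band, first_preset):
    """Narrow the second-type signals down to likely user directions."""
    # Step 1: keep second-type directions never recorded as first-type.
    new = [s for s in second_type if s.direction_deg not in first_type_dirs]
    if len(new) <= 1:
        return new
    # Step 2: several candidates remain, so keep signals outside the preset band.
    lo, hi = preset_band
    out_of_band = [s for s in new if not lo <= s.dominant_hz <= hi]
    if len(out_of_band) <= 1:
        return out_of_band
    # Step 3: still several, so keep those similar enough to the predetermined waveform.
    return [s for s in out_of_band if s.similarity > first_preset]
```

A practical implementation would probably compare directions within an angular tolerance rather than by exact equality, because two localizations of the same source rarely return identical angles.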

Optionally, in the case where the second-type sound-source direction belongs to the first-type sound-source directions, the method further includes:

judging whether the energy difference between a first voice signal and a second voice signal from the same sound-source direction is greater than a second preset value, where the first voice signal is a voice signal received while the electronic device was in the sleep state, and the second voice signal is a voice signal received while the electronic device is in the working state;

if so, determining the second-type sound-source direction corresponding to the second voice signal as the user sound-source direction.
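The energy comparison can be sketched as follows. The source does not define how energy is computed or the sign of the difference; this sketch assumes the sum of squared samples and assumes the difference is taken as working-state energy minus sleep-state energy. Both function names are hypothetical.

```python
def signal_energy(samples):
    """Energy of a frame, taken here as the sum of squared samples (an assumption)."""
    return sum(x * x for x in samples)


def is_user_direction(sleep_samples, working_samples, second_preset):
    """True when the working-state signal exceeds the sleep-state one by more than second_preset."""
    difference = signal_energy(working_samples) - signal_energy(sleep_samples)
    return difference > second_preset
```

For example, a direction that only carried faint background sound during sleep but carries a much stronger signal after wake-up would pass this check and be treated as the user sound-source direction.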

Optionally, the method further includes:

determining, among the second-type sound-source directions, the sound-source direction corresponding to a voice signal whose waveform has a similarity to the predetermined waveform greater than the first preset value as the user sound-source direction.

Optionally, the step of locating the user sound-source direction according to the first-type sound-source directions and the second-type sound-source directions includes:

determining the sound-source direction among the second-type sound-source directions that does not belong to the first-type sound-source directions as a target sound-source direction;

determining a target range [A, B] according to the target sound-source direction, and determining the sound-source directions within the target range as the user sound-source direction, where A is the target sound-source direction minus a first preset direction difference, and B is the target sound-source direction plus a second preset direction difference.
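As a small worked example of the target range, the sketch below computes [A, B] from a target direction and the two preset direction differences. Angles are assumed to be in degrees, wrap-around at 0/360 degrees is deliberately ignored, and the function names are hypothetical.

```python
def target_range(target_deg, first_preset_diff, second_preset_diff):
    """Return (A, B): A = target - first preset difference, B = target + second preset difference."""
    return (target_deg - first_preset_diff, target_deg + second_preset_diff)


def in_target_range(direction_deg, bounds):
    """True when a sound-source direction lies inside the closed range [A, B]."""
    a, b = bounds
    return a <= direction_deg <= b
```

With a target direction of 45 degrees and preset differences of 10 and 15 degrees, the range is [35, 60], so a voice signal located at 50 degrees would still be attributed to the user while one at 70 degrees would not.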

In a second aspect, an embodiment of the invention further provides a voice signal processing apparatus applied to an electronic device with a voice interaction function, the apparatus including:

a wake-up instruction judging module, configured to receive a voice signal while the electronic device is in a sleep state and to judge whether the interactive instruction corresponding to the received voice signal is a wake-up instruction;

a sound-source localization module, configured to, when the interactive instruction corresponding to the received voice signal is a wake-up instruction, switch from the sleep state to a working state and locate the sound-source direction of the received voice signal as the user sound-source direction;

a user voice signal obtaining module, configured to continue receiving voice signals and to perform noise suppression on those of the received voice signals that originate from outside the user sound-source direction, to obtain a user voice signal;

a first interactive instruction response module, configured to respond to the interactive instruction corresponding to the user voice signal.

Optionally, the user voice signal obtaining module includes:

a user voice signal obtaining submodule, configured to perform noise suppression on those of the received voice signals that originate from outside the user sound-source direction, and to perform beam enhancement on those of the received voice signals that originate from the user sound-source direction, to obtain the user voice signal.

Optionally, the apparatus further includes:

a user direction indicating module, configured to indicate the user's direction according to the user sound-source direction.

Optionally, the apparatus further includes:

a conversion instruction judging module, configured to judge whether the interactive instruction corresponding to a voice signal received from the user sound-source direction is a sound-source localization mode conversion instruction;

a second interactive instruction response module, configured to, when the interactive instruction corresponding to the voice signal received from the user sound-source direction is a sound-source localization mode conversion instruction, continue receiving voice signals, determine the sound-source direction of the loudest of the received voice signals as the user sound-source direction, determine the loudest of the received voice signals as the user voice signal, and respond to the interactive instruction corresponding to the user voice signal.

Optionally, the wake-up instruction judging module includes:

a signal filtering submodule and an instruction judging submodule;

the wake-up instruction judging module is specifically configured to judge, through the signal filtering submodule and the instruction judging submodule, whether the interactive instruction corresponding to each received voice signal is a wake-up instruction;

the signal filtering submodule is configured to filter a target voice signal to remove the components of the target voice signal whose frequency falls within a preset frequency band, where the target voice signal is the received voice signal;

the instruction judging submodule is configured to judge whether the interactive instruction corresponding to the filtered target voice signal is a wake-up instruction.

Optionally, the sound-source localization module includes:

a sound-source localization submodule, configured to locate and record the sound-source direction of the received voice signal as a second-type sound-source direction;

a user sound-source direction determining submodule, configured to locate the user sound-source direction according to first-type sound-source directions and the second-type sound-source direction, where a first-type sound-source direction is the sound-source direction, located and recorded while the electronic device was in the sleep state, of a received voice signal whose corresponding interactive instruction is not a wake-up instruction.

Optionally, the user sound-source direction determining submodule includes:

a judging unit, configured to judge whether the second-type sound-source directions include a sound-source direction that does not belong to the first-type sound-source directions;

a user sound-source direction determining unit, configured to, when the second-type sound-source directions include a sound-source direction that does not belong to the first-type sound-source directions, locate that sound-source direction as the user sound-source direction.

Optionally, the user sound-source direction determining unit includes:

a quantity determining subunit, configured to determine the number of sound-source directions among the second-type sound-source directions that do not belong to the first-type sound-source directions;

a first direction determining subunit, configured to, when the determined number is greater than 1, determine the sound-source direction corresponding to a voice signal that does not belong to the preset frequency band as the user sound-source direction.

Optionally, the first direction determining subunit is specifically configured to determine the number of sound-source directions corresponding to voice signals that do not belong to the preset frequency band and, when the determined number is greater than 1, to determine, among the voice signals that do not belong to the preset frequency band, the sound-source direction corresponding to a voice signal whose waveform has a similarity to a predetermined waveform greater than a first preset value as the user sound-source direction.

Optionally, the apparatus further includes:

an energy difference judging module, configured to, in the case where the second-type sound-source direction belongs to the first-type sound-source directions, judge whether the energy difference between a first voice signal and a second voice signal from the same sound-source direction is greater than a second preset value, where the first voice signal is a voice signal received while the electronic device was in the sleep state and the second voice signal is a voice signal received while the electronic device is in the working state; and, if so, to determine the second-type sound-source direction corresponding to the second voice signal as the user sound-source direction.

Optionally, the apparatus further includes:

a waveform comparison module, configured to determine, among the second-type sound-source directions, the sound-source direction corresponding to a voice signal whose waveform has a similarity to the predetermined waveform greater than the first preset value as the user sound-source direction.

Optionally, the user sound-source direction determining submodule includes:

a target sound-source direction determining unit, configured to determine the sound-source direction among the second-type sound-source directions that does not belong to the first-type sound-source directions as a target sound-source direction;

a second direction determining unit, configured to determine a target range [A, B] according to the target sound-source direction and to determine the sound-source directions within the target range as the user sound-source direction, where A is the target sound-source direction minus a first preset direction difference, and B is the target sound-source direction plus a second preset direction difference.

In a third aspect, an embodiment of the invention further provides an electronic device including a housing, a processor, a memory, a circuit board, and a power supply circuit, where the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit supplies power to each circuit or component of the electronic device; the memory stores executable program code; and the processor, by reading the executable program code stored in the memory, runs a program corresponding to that code so as to perform the above voice signal processing method.

In the solution provided by the embodiments of the invention, an electronic device with a voice interaction function receives a voice signal while in a sleep state and judges whether the interactive instruction corresponding to the received voice signal is a wake-up instruction. If so, it switches from the sleep state to a working state and locates the sound-source direction of the received voice signal as the user sound-source direction. It then continues to receive voice signals, performs noise suppression on those that originate from outside the user sound-source direction to obtain a user voice signal, and responds to the interactive instruction corresponding to the user voice signal. Because the electronic device determines the sound-source direction of the wake-up instruction as the user sound-source direction and suppresses voice signals from outside that direction, the user voice signal it obtains is the voice signal uttered by the user located in the user sound-source direction, so the device can respond correctly and the user experience is improved.

Brief description of the drawings

To describe the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a voice signal processing method provided by an embodiment of the invention;

Fig. 2 is a schematic structural diagram of a voice signal processing apparatus provided by an embodiment of the invention;

Fig. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.

To avoid incorrect responses and improve the user experience, embodiments of the invention provide a voice signal processing method, apparatus, and electronic device.

The voice signal processing method provided by the embodiments of the invention is introduced first.

It should first be noted that the voice signal processing method provided by the embodiments of the invention can be applied to an electronic device with a voice interaction function (hereinafter, the electronic device), for example a smart speaker or a robot. The electronic device typically has a microphone array, or has a communication connection to one; the connection may be wired or wireless, and a wireless connection may be, for example, a Wi-Fi or Bluetooth connection. The microphone array is used to receive voice signals.

As shown in Fig. 1, a voice signal processing method is applied to an electronic device with a voice interaction function, and the method includes:

S101: while the electronic device is in a sleep state, receiving a voice signal and judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction; if so, performing step S102;

From one point of view, the states of the electronic device can be divided into a sleep state and a working state. When the electronic device is in the sleep state, it must receive a wake-up instruction in order to be woken and switched to the working state. While in the sleep state, the device can still receive the voice signals emitted by sound sources in the surrounding environment; that is, the microphone array keeps working during sleep. The device can therefore receive voice signals in order to determine whether a wake-up instruction has been received.

After receiving a segment of voice signal, the electronic device performs speech recognition on it and judges whether the corresponding interactive instruction is a wake-up instruction. Specifically, if the speech recognition result of the segment includes a preset wake-up word, the interactive instruction corresponding to the segment is a wake-up instruction. In other words, after receiving a voice signal the device performs speech recognition on it, obtains the recognition result, and then judges whether that result includes the preset wake-up word.

It should be noted that, after receiving a voice signal, the electronic device may perform speech recognition locally to obtain the recognition result, or may send the voice signal to a server. In the latter case the server performs speech recognition on the received voice signal and sends the recognition result back to the electronic device, which can then judge whether the result includes the preset wake-up word.

For example, suppose the preset wake-up word is "Xiaoya". If the speech recognition result of a voice signal received by the electronic device includes the word "Xiaoya", the interactive instruction corresponding to that voice signal is a wake-up instruction. If the recognition result is some other utterance that does not include "Xiaoya", or the signal carries no semantic content at all, such as the sound emitted by an air conditioner, the corresponding interactive instruction is not a wake-up instruction.
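The wake-up check described above reduces to a substring test on the speech recognition result. The sketch below is only an illustration: the function name is hypothetical, the recognition result is assumed to be available as plain text, and the case-insensitive comparison is an added assumption that the source does not mention.

```python
def is_wake_up_instruction(recognized_text, wake_word):
    """True when the speech recognition result contains the preset wake-up word."""
    if not recognized_text:
        # Sounds without semantic content (e.g. appliance noise) yield no text.
        return False
    return wake_word.lower() in recognized_text.lower()
```

Whether recognition ran locally or on a server makes no difference to this check, since in both cases the device ends up holding the recognition result.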

S102: switching from the sleep state to the working state, and locating the sound-source direction of the received voice signal as the user sound-source direction;

When the electronic device judges that the interactive instruction corresponding to the received voice signal is a wake-up instruction, this indicates that the user has uttered a voice signal to wake the device so that it can carry out voice interaction with the user and perform its functions. The device therefore needs to switch from the sleep state to the working state.

At the same time, the electronic device can locate the sound-source direction of the received voice signal and determine that direction as the user sound-source direction. The sound-source direction may be located using a sound-source localization technique such as time-delay estimation, that is, from the times at which the voice signal arrives at the individual microphones of the microphone array; this is not specifically limited or described further here.
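To make the time-delay idea concrete, here is a minimal sketch for a single pair of microphones. It is not taken from the source: it assumes a far-field source, estimates the delay by brute-force cross-correlation (one common way to do time-delay estimation), and converts the delay to an angle measured from the broadside of the pair. The function names, sample rate, and microphone spacing are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) of sig_b relative to sig_a that maximizes their cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag_samples, sample_rate, mic_spacing_m):
    """Far-field angle of arrival, in degrees from broadside, for one microphone pair."""
    tau = lag_samples / sample_rate
    # Clamp so that measurement noise cannot push the ratio outside asin's domain.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing_m))
    return math.degrees(math.asin(ratio))
```

A positive lag means the sound reached microphone A before microphone B. A real array would combine several pairs to resolve the direction unambiguously over the full circle.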

It can be understood that if the electronic device judges that the interactive instruction corresponding to the received voice signal is not a wake-up instruction, it does not switch to the working state. Instead, it continues to receive voice signals in the sleep state and continues to judge whether the interactive instruction corresponding to each received voice signal is a wake-up instruction.

S103: continuing to receive voice signals, and performing noise suppression on those of the received voice signals that originate from outside the user sound-source direction, to obtain a user voice signal;

After the user sound-source direction has been determined, the user will generally go on uttering voice signals, so the electronic device can continue to receive voice signals and perform noise suppression on those that originate from outside the user sound-source direction, to obtain the user voice signal.

It can be understood that sound sources, that is, noise sources, may also exist in directions other than the user sound-source direction. These noise sources may also emit voice signals, which the electronic device will receive as well. To better receive the voice signal uttered by the user, that is, the voice signal from the user sound-source direction, the electronic device can perform noise suppression on the voice signals from outside the user sound-source direction so as to weaken their energy, and thereby obtain the user voice signal.

It should be noted that the noise suppression may use any existing noise suppression technique, for example endpoint detection, noise separation, or spectral filtering, as long as it weakens the energy of the voice signals concerned; this is not specifically limited here.
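The effect of direction-dependent suppression can be illustrated with a deliberately simplified sketch. It assumes that the received sound has already been separated into per-direction components, which a real microphone array does not provide directly, and it simply scales down every component that lies outside a window around the user sound-source direction. The function names, the window width, and the attenuation factor are hypothetical; none of the techniques named in the text is implemented here.

```python
def angular_gap(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def suppress_off_axis(sources, user_dir_deg, window_deg=30.0, attenuation=0.1):
    """Mix (direction_deg, samples) pairs, weakening those outside the user direction."""
    if not sources:
        return []
    length = max(len(samples) for _, samples in sources)
    mixed = [0.0] * length
    for direction, samples in sources:
        # Keep full gain inside the window; weaken everything else.
        on_axis = angular_gap(direction, user_dir_deg) <= window_deg / 2
        gain = 1.0 if on_axis else attenuation
        for i, x in enumerate(samples):
            mixed[i] += gain * x
    return mixed
```

The output keeps the user's component at full level while the energy of the other components is reduced, which is the stated purpose of the noise suppression step.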

S104: responding to the interactive instruction corresponding to the user voice signal.

After obtaining the user voice signal, the electronic device can respond to the corresponding interactive instruction. The device can respond in various forms, such as voice playback; if the device has a display screen, it can also respond through the display screen. Either is reasonable.

For example, if the interactive instruction corresponding to the user voice signal is to play a certain piece of music, the electronic device can obtain the music from local storage or request it from a server, and then play it. If the interactive instruction is to ask about today's weather, the electronic device can request weather data from a server and then inform the user of the weather, for example by voice playback, thereby completing the response to the interactive instruction.

It can be seen that, in the solution provided by the embodiments of the invention, an electronic device with a voice interaction function receives a voice signal while in a sleep state and judges whether the interactive instruction corresponding to the received voice signal is a wake-up instruction. If so, it switches from the sleep state to a working state and locates the sound-source direction of the received voice signal as the user sound-source direction. It then continues to receive voice signals, performs noise suppression on those that originate from outside the user sound-source direction to obtain a user voice signal, and responds to the interactive instruction corresponding to the user voice signal. Because the electronic device determines the sound-source direction of the wake-up instruction as the user sound-source direction and suppresses voice signals from outside that direction, the user voice signal it obtains is the voice signal uttered by the user located in the user sound-source direction, so the device can respond correctly and the user experience is improved.
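The overall flow of steps S101 to S104 can be summarized as a small state machine. The sketch below is a simplification under stated assumptions: each utterance is assumed to arrive already recognized as text together with its located direction, noise suppression is reduced to discarding utterances from other directions, and the class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    SLEEP = "sleep"
    WORKING = "working"


class VoiceDevice:
    def __init__(self, wake_word):
        self.wake_word = wake_word
        self.state = State.SLEEP
        self.user_direction = None

    def on_voice(self, recognized_text, direction_deg):
        """Process one utterance; return the instruction to respond to, or None."""
        if self.state is State.SLEEP:
            # S101: stay asleep unless the utterance contains the wake-up word.
            if self.wake_word in recognized_text:
                # S102: wake up and remember where the wake-up word came from.
                self.state = State.WORKING
                self.user_direction = direction_deg
            return None
        # S103 (simplified): ignore utterances from outside the user direction.
        if direction_deg != self.user_direction:
            return None
        # S104: hand the user's instruction over to be responded to.
        return recognized_text
```

A louder source in another direction therefore cannot hijack the interaction once the wake-up word has fixed the user direction, which is the point of the scheme.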

As one implementation of the embodiment of the invention, the step of performing noise suppression on those of the received voice signals that originate from outside the user sound-source direction, to obtain a user voice signal, may include:

performing noise suppression on those of the received voice signals that originate from outside the user sound-source direction, and performing beam enhancement on those of the received voice signals that originate from the user sound-source direction, to obtain the user voice signal.

To make the obtained user voice signal stronger, so that the electronic device responds more accurately to the corresponding interactive instruction, the device can, while suppressing the voice signals from outside the user sound-source direction, also perform beam enhancement on the voice signals from the user sound-source direction to increase their energy. The device can then take the beam-enhanced voice signal as the user voice signal, recognize and parse it more accurately, obtain the correct interactive instruction, and respond to that instruction correctly.

It should be noted that the beam enhancement may use any existing beam enhancement technique, for example voice extraction and separation, a diagonal loading algorithm, or adaptive beamforming, as long as it enhances the energy of the voice signal concerned; this is not specifically limited here.
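As an illustration of beam enhancement, the sketch below uses delay-and-sum beamforming. This is a basic fixed beamformer substituted here for simplicity; it is not one of the techniques named in the text. It assumes integer, non-negative per-microphone delays (in samples) for the user sound-source direction, and the function name is hypothetical.

```python
def delay_and_sum(mic_signals, steering_delays):
    """Align each microphone signal by its steering delay, then average across microphones."""
    n_mics = len(mic_signals)
    # Only keep samples for which every aligned microphone has data.
    length = min(len(sig) - d for sig, d in zip(mic_signals, steering_delays))
    return [
        sum(sig[t + d] for sig, d in zip(mic_signals, steering_delays)) / n_mics
        for t in range(length)
    ]
```

When the delays match the user direction, the user's signal adds up coherently while sound from other directions is averaged out of phase, so the user's component ends up relatively stronger.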

As an implementation of the embodiment of the present invention, the above method may further include:

Indicating the user's orientation according to the user sound source bearing.

To let the user check the current user sound source bearing conveniently, the electronic device may indicate the user's orientation according to the user sound source bearing. In one implementation, the electronic device uses indicator lamps: for example, if the user sound source bearing is 45 degrees, the device lights the indicator lamp at the 45-degree position. In another implementation, if the electronic device has a display screen, it may show the user sound source bearing on the screen, or show an indicator lamp on the screen; both are reasonable. In yet another implementation, if the electronic device has movable parts, as a robot does, it may indicate the user's orientation by turning its head, waving an arm, or a similar movement.
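The indicator-lamp variant can be sketched as a mapping from a bearing to the nearest lamp. The ring of evenly spaced lamps, the lamp count, and the name `led_index` are assumptions made for illustration; the text above does not specify the lamp layout.

```python
def led_index(bearing_deg, num_leds=8):
    """Return the index of the lamp closest to the given bearing.

    Assumes num_leds lamps evenly spaced on a ring, with lamp 0 at 0 degrees.
    """
    step = 360 / num_leds
    return round((bearing_deg % 360) / step) % num_leds


lamp = led_index(45, num_leds=8)
```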

Judging whether the interactive instruction corresponding to the voice signal received from the user sound source bearing is a sound source localization mode conversion instruction; if so, continuing to receive voice signals, determining the sound source bearing of the loudest of the received voice signals as the user sound source bearing, determining the loudest of the received voice signals as the user voice signal, and responding to the interactive instruction corresponding to the user voice signal.

The application scenario of the electronic device may change. When the device is used to respond to interactive instructions issued by multiple users, and in order to respond to those instructions more accurately, the electronic device may, on receiving a voice signal from the user sound source bearing, judge whether the corresponding interactive instruction is a sound source localization mode conversion instruction. If it is, the user has issued this instruction to indicate that the application scenario has changed, and the electronic device needs to respond to it by changing the way it localizes the sound source.

Specifically, if the interactive instruction corresponding to the voice signal is a sound source localization mode conversion instruction, the electronic device continues to receive voice signals and changes its localization mode. After the conversion, the localization mode is: the sound source bearing of the loudest of the received voice signals is determined as the user sound source bearing. Accordingly, the electronic device can take the loudest received voice signal as the user voice signal and respond to its interactive instruction. In this way, when several users at different bearings issue interactive instructions, the electronic device can receive the instruction from each of them rather than treating one fixed bearing as the user sound source bearing.
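The converted localization mode can be sketched as picking the bearing whose signal is loudest. Volume is approximated here by root-mean-square amplitude, which is an assumption; the names `rms` and `loudest_bearing` and the bearing-to-samples dictionary layout are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude, used here as a stand-in for volume."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def loudest_bearing(signals_by_bearing):
    """Return the bearing (degrees) whose signal has the largest volume."""
    return max(signals_by_bearing, key=lambda b: rms(signals_by_bearing[b]))


received = {
    30: [0.1, -0.1, 0.1, -0.1],   # quieter speaker
    120: [0.5, -0.5, 0.5, -0.5],  # louder speaker
}
user_bearing = loudest_bearing(received)
```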

Of course, after locating the user sound source bearing with the converted localization mode, the electronic device may perform noise suppression on the received voice signals that originate from outside the user sound source bearing, and may also perform beam enhancement on the received voice signals that originate from the user sound source bearing, to obtain the user voice signal. Both are reasonable.

The process by which the electronic device judges whether the interactive instruction corresponding to a received voice signal is a wake-up instruction is the same for every received voice signal. Therefore, as an implementation of the embodiment of the present invention, the step of judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction includes:

It can be understood that the environment of the electronic device may contain multiple sound sources, and the device will receive the voice signal emitted by each of them. For example, an electronic device placed in a home may receive voice signals from several sources, such as household appliances like a television or a refrigerator, or sound coming in through a window. Some of these signals may have frequencies outside the frequency range of the human voice. To filter out such signals and locate the user sound source bearing more accurately, the electronic device may apply filtering to each received voice signal.

Specifically, the frequency range of sound produced by a person is generally 100-20000 Hz. A voice signal outside this range is not produced by a person and so cannot be a voice signal issued by the user. To remove the adverse effect that such signals have on locating the user sound source bearing, the electronic device may, before judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction, filter the target voice signal to remove the components whose frequency belongs to a preset frequency band, and then judge whether the interactive instruction corresponding to the filtered target voice signal is a wake-up instruction. Here, the target voice signal is the voice signal that the electronic device receives in the sleep state.

The preset frequency band may be one or more frequency bands outside the frequency range of the human voice. It may be a lower frequency band, for example 0-100 Hz; it may be a higher frequency band, for example 20000-40000 Hz; and it may of course include both. All of these are reasonable.

The environment in which an electronic device is used often contains voice signals whose frequency belongs to the preset frequency band. Some bass audio equipment, for example, emits signals at a few tens of hertz, clearly outside the frequency range of the human voice. The filtering described above removes such signals, which reduces the work needed later to locate the user sound source bearing and makes that localization more accurate.
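The band test behind this filtering can be sketched as follows, using the two example bands given above. This sketch only classifies components by a known dominant frequency; it is not a digital filter, and how the dominant frequency is measured is left out. The names `PRESET_BANDS_HZ`, `in_preset_band`, and `filter_components` are hypothetical, and treating the band edges as outside the preset band (so that 100 Hz and 20000 Hz stay in the human range) is an assumption.

```python
# Example preset bands from the description: low band and high band, in Hz.
PRESET_BANDS_HZ = [(0.0, 100.0), (20000.0, 40000.0)]


def in_preset_band(freq_hz, bands=PRESET_BANDS_HZ):
    """True if the frequency lies strictly inside any preset band."""
    return any(low < freq_hz < high for low, high in bands)


def filter_components(components, bands=PRESET_BANDS_HZ):
    """Drop (dominant_frequency_hz, label) pairs that fall in a preset band."""
    return [c for c in components if not in_preset_band(c[0], bands)]


received = [(50.0, "bass speaker"), (220.0, "speech"), (25000.0, "ultrasonic")]
kept = filter_components(received)
```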

For the case where the environment of the electronic device contains multiple sound sources, as an implementation of the embodiment of the present invention, the step of locating the sound source bearing of the received voice signal as the user sound source bearing includes:

While the electronic device is in the sleep state, if it judges that a received voice signal does not trigger the wake-up instruction, it may locate and record the sound source bearing of that voice signal. For convenience in describing the solution provided by the embodiment of the present invention, the sound source bearing of such a voice signal is called a first-class sound source bearing.

Because the electronic device is in the sleep state at this point and the interactive instruction corresponding to the received voice signal is not a wake-up instruction, the voice signal can be understood to come from a noise source rather than from the user, and it does not trigger the device to process it. The electronic device can therefore record the sound source bearings of these voice signals as first-class sound source bearings, that is, as the bearings of noise sources, and continue to receive voice signals.

When the electronic device judges that the interactive instruction corresponding to the received voice signal is a wake-up instruction, it may locate the sound source bearings of the voice signals currently being received and record them as second-class sound source bearings.

Once the electronic device has recorded the first-class and second-class sound source bearings, it can locate the user sound source bearing from them. The voice signals that the electronic device receives in the sleep state may change: as time passes, some sound sources may stop emitting voice signals, and some sources that were previously silent may start emitting them.

For example, while the electronic device is in the sleep state, a television and an air conditioner may be emitting voice signals. After a while the television may be turned off, so the first-class sound source bearing corresponding to the television no longer exists. Later still, a computer may be turned on and start playing music, so the sound source bearing corresponding to the computer appears among the first-class sound source bearings. As another example, while the electronic device is in the sleep state, a person somewhere may at some moment emit a voice signal whose corresponding interactive instruction is not a wake-up instruction. The device does not switch from the sleep state to the working state, and it records that person's bearing among the first-class sound source bearings. After a while, the person no longer emits any voice signal. The first-class sound source bearings may therefore change over time.

The first-class sound source bearings recorded long before the moment at which the electronic device switches from the sleep state to the working state may differ considerably from the second-class sound source bearings. To locate the user sound source bearing more simply and accurately, the device may therefore use the second-class sound source bearings together with only those first-class sound source bearings recorded within a preset time period before the moment of switching from the sleep state to the working state. The preset time period may be determined by those skilled in the art according to practical factors such as the usage scenario of the electronic device, for example 2 seconds, 3 seconds, or 5 seconds, and is not specifically limited here.
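Restricting the first-class bearings to a preset time period can be sketched as a timestamp filter. The record layout `(timestamp_s, bearing_deg)`, the function name `recent_first_class`, and the 3-second default are assumptions chosen from the example values above.

```python
def recent_first_class(records, wake_time_s, window_s=3.0):
    """Return first-class bearings recorded within window_s before wake-up.

    records: list of (timestamp_s, bearing_deg) pairs logged in the sleep state.
    """
    start = wake_time_s - window_s
    return sorted({b for t, b in records if start <= t < wake_time_s})


sleep_log = [(0.0, 120), (8.0, 0), (9.0, 30), (9.5, 90)]
first_class = recent_first_class(sleep_log, wake_time_s=10.0, window_s=3.0)
```

The entry at 120 degrees is dropped because it was recorded ten seconds before wake-up, matching the case of a television that has since been turned off.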

In one implementation, the user sound source bearing may be located from the first-class and second-class sound source bearings as follows: judge whether the second-class sound source bearings include a bearing that does not belong to the first-class sound source bearings; if so, locate the bearing among the second-class sound source bearings that does not belong to the first-class sound source bearings as the user sound source bearing.

It can be understood that if the second-class sound source bearings include a bearing that does not belong to the first-class sound source bearings, that bearing was located when the electronic device switched from the sleep state to the working state and was not present beforehand. It can therefore be determined to be the bearing of the voice signal, issued by the user, whose corresponding interactive instruction is the wake-up instruction. That bearing is the user sound source bearing.

For example, suppose there are three first-class sound source bearings within the preset time period before the electronic device switches from the sleep state to the working state: 0 degrees, 30 degrees, and 90 degrees. When the device switches from the sleep state to the working state, four second-class sound source bearings are recorded: 0 degrees, 30 degrees, 60 degrees, and 90 degrees. The 60-degree bearing is clearly a new bearing that appeared when the device switched states, at the very moment the device received the voice signal whose corresponding interactive instruction is the wake-up instruction. The 60-degree bearing can therefore be determined to be the bearing of the wake-up voice signal issued by the user, that is, the user sound source bearing.
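The comparison in this example amounts to a set difference between the two classes of bearings. The sketch below reproduces it; the optional `tolerance_deg` parameter is an addition not found in the description, included because measured bearings rarely match exactly, and the function names are hypothetical.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def new_bearings(first_class, second_class, tolerance_deg=0.0):
    """Second-class bearings that match no first-class bearing."""
    return [
        b for b in second_class
        if all(angular_distance(b, f) > tolerance_deg for f in first_class)
    ]


candidates = new_bearings([0, 30, 90], [0, 30, 60, 90])
```

With the values from the example, only the 60-degree bearing remains as a candidate for the user sound source bearing.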

As an implementation of the embodiment of the present invention, the step of locating the bearing among the second-class sound source bearings that does not belong to the first-class sound source bearings as the user sound source bearing may include:

Determining the number of second-class sound source bearings that do not belong to the first-class sound source bearings; when the determined number is greater than 1, determining the sound source bearing corresponding to the voice signal that does not belong to the preset frequency band as the user sound source bearing.

In some cases, at the same time as the electronic device receives the voice signal whose corresponding interactive instruction is the wake-up instruction, one or more other sound sources whose bearings do not belong to the first-class sound source bearings may also emit voice signals, and the device will receive these too. For example, if bass audio equipment is switched on and starts emitting sound while the user is issuing the wake-up voice signal, the electronic device receives both the user's voice signal and the signal from the bass audio equipment. Neither bearing belongs to the first-class sound source bearings, so the number of second-class sound source bearings that do not belong to the first-class sound source bearings is more than one.

In this case, to locate the user sound source bearing accurately, the electronic device may first determine the number of second-class sound source bearings that do not belong to the first-class sound source bearings. If that number is greater than 1, there are several such bearings, and the electronic device can determine the sound source bearing corresponding to the voice signal that does not belong to the preset frequency band as the user sound source bearing.

For example, if bass audio equipment is switched on and starts emitting sound while the user is issuing the wake-up voice signal, the electronic device receives both signals and determines that two second-class sound source bearings do not belong to the first-class sound source bearings, which is greater than 1. The device can then determine the sound source bearing corresponding to the voice signal that does not belong to the preset frequency band as the user sound source bearing. Because the signal emitted by the bass audio equipment lies in a fixed low-frequency range, setting the preset frequency band to that low-frequency range excludes the bearing of the bass audio equipment, and the electronic device can then determine the user sound source bearing accurately.
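The bass-equipment example can be sketched as a second pass over the candidate bearings that discards those whose signal lies in the preset band. The mapping from bearing to dominant frequency, the single low band of 0-100 Hz, and the names used are assumptions for illustration.

```python
def exclude_preset_band(freq_by_bearing, bands=((0.0, 100.0),)):
    """Keep candidate bearings whose dominant frequency is outside every band.

    freq_by_bearing: {bearing_deg: dominant_frequency_hz} for the new bearings.
    """
    return [
        bearing for bearing, freq in freq_by_bearing.items()
        if not any(low < freq < high for low, high in bands)
    ]


# Two new bearings at wake-up: the user at 60 degrees, a bass speaker at 150.
new_sources = {60: 220.0, 150: 45.0}
if len(new_sources) > 1:
    user_candidates = exclude_preset_band(new_sources)
else:
    user_candidates = list(new_sources)
```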

As an implementation of the embodiment of the present invention, the step of determining the sound source bearing corresponding to the voice signal that does not belong to the preset frequency band as the user sound source bearing may include:

Determining the number of sound source bearings corresponding to voice signals that do not belong to the preset frequency band; when the determined number is greater than 1, determining, among the voice signals that do not belong to the preset frequency band, the sound source bearing of the voice signal whose waveform has a similarity to a predetermined waveform greater than a first preset value as the user sound source bearing.

In some cases, the number of sound source bearings corresponding to voice signals that do not belong to the preset frequency band may itself be greater than 1. To determine the user sound source bearing accurately in that case, the electronic device may further compare the waveforms of the voice signals.

It can be understood that the user sound source bearing is the bearing from which the user issued the wake-up instruction, so the predetermined waveform may be the waveform of the voice signal corresponding to the wake-up word. A waveform whose similarity to the predetermined waveform is greater than the first preset value is highly similar to the wake-up word waveform, which indicates that the interactive instruction corresponding to that voice signal is probably the wake-up instruction, and the sound source bearing of that voice signal is therefore the user sound source bearing. The first preset value may be set by those skilled in the art according to factors such as the waveform characteristics of the voice signals emitted by the sound sources present in the usage scenario of the electronic device, and is not specifically limited here.

For example, suppose other people are also speaking while the user is issuing the wake-up voice signal. The electronic device receives both the user's voice signal and the voice signals from the other people, and the frequency of the other people's signals does not belong to the preset frequency band either. The device determines that several sound source bearings correspond to voice signals outside the preset frequency band, which is greater than 1. It can then compare the waveform of each of these voice signals with the waveform of the preset wake-up word; the sound source bearing of the voice signal whose similarity is higher than the first preset value is the user sound source bearing. This waveform comparison allows the user sound source bearing to be determined more accurately.
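The description does not specify how waveform similarity is measured. The sketch below assumes cosine similarity between time-aligned sample sequences, which is only one possible choice; the function names, the dictionary layout, and the threshold of 0.8 standing in for the first preset value are all hypothetical.

```python
import math


def waveform_similarity(a, b):
    """Cosine similarity of two sample sequences (assumed similarity measure)."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def bearings_matching_wake_word(waveform_by_bearing, template, first_preset_value=0.8):
    """Bearings whose waveform similarity to the template exceeds the threshold."""
    return [
        bearing for bearing, wave in waveform_by_bearing.items()
        if waveform_similarity(wave, template) > first_preset_value
    ]


wake_word_template = [0, 1, 0, -1]
speakers = {60: [0, 2, 0, -2], 150: [1, 0, -1, 0]}
matches = bearings_matching_wake_word(speakers, wake_word_template)
```

Cosine similarity ignores overall loudness, so the louder copy of the template at 60 degrees still matches, while the differently shaped waveform at 150 degrees does not.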

It should be noted that when the number of second-class sound source bearings that do not belong to the first-class sound source bearings is determined to be greater than 1, the order may also be reversed. The waveform comparison may be applied first to determine the sound source bearings of the voice signals whose waveforms are more similar to the predetermined waveform; if more than one bearing remains, the sound source bearing corresponding to the voice signal that does not belong to the preset frequency band is then determined as the user sound source bearing. This is also reasonable.

As an implementation of the embodiment of the present invention, in the case where the second-class sound source bearings belong to the first-class sound source bearings, the above method may further include:

Judging whether the energy difference between a first voice signal and a second voice signal at the same sound source bearing is greater than a second preset value; if so, determining the second-class sound source bearing corresponding to the second voice signal as the user sound source bearing, where the first voice signal is a voice signal received while the electronic device is in the sleep state and the second voice signal is a voice signal received while the electronic device is in the working state.

When the user issues the wake-up voice signal, the user may be at the same bearing as one of the first-class sound source bearings. The second-class sound source bearing located by the electronic device then belongs to the first-class sound source bearings. In this case, to determine the user sound source bearing accurately, the electronic device may judge whether the energy difference between the first voice signal and the second voice signal at the same sound source bearing is greater than the second preset value. The energy of a voice signal may be characterized by volume, frequency, waveform features, and so on, and is not specifically limited here.

It should be noted that, for convenience of description, the first voice signal refers to a voice signal received while the electronic device is in the sleep state, so its sound source bearing is a first-class sound source bearing, and the second voice signal refers to a voice signal received while the electronic device is in the working state, so its sound source bearing is a second-class sound source bearing. It should further be noted that the second preset value may be set by those skilled in the art according to factors such as the energy of the voice signals emitted by the sound sources present in the usage scenario of the electronic device, and is not specifically limited here.

If the energy difference between the first voice signal and the second voice signal at the same sound source bearing is greater than the second preset value, the two signals were most likely not emitted by the same sound source. For example, if both are emitted by a refrigerator, their energy difference is very small and does not exceed the second preset value. If the first voice signal is emitted by a refrigerator and the second is issued by the user, their energy difference is usually large and exceeds the second preset value. Therefore, when the energy difference between the first and second voice signals at the same sound source bearing is greater than the second preset value, the electronic device can determine the second-class sound source bearing corresponding to the second voice signal as the user sound source bearing.
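The refrigerator example can be sketched as a threshold test on the energy difference. Energy is taken here as mean squared amplitude, which is only one of the characterizations the description allows; the function names and the threshold of 0.1 standing in for the second preset value are hypothetical.

```python
def energy(samples):
    """Mean squared amplitude, used here as the energy measure (an assumption)."""
    return sum(x * x for x in samples) / len(samples)


def is_new_source(first_signal, second_signal, second_preset_value):
    """True if the energy difference at one bearing exceeds the threshold."""
    return abs(energy(second_signal) - energy(first_signal)) > second_preset_value


fridge_asleep = [0.1, -0.1, 0.1, -0.1]     # first voice signal (sleep state)
fridge_awake = [0.11, -0.11, 0.11, -0.11]  # second voice signal, same source
user_awake = [0.6, -0.6, 0.6, -0.6]        # second voice signal, user speaking

same_source = is_new_source(fridge_asleep, fridge_awake, second_preset_value=0.1)
user_present = is_new_source(fridge_asleep, user_awake, second_preset_value=0.1)
```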

In the case where the second-class sound source bearings belong to the first-class sound source bearings, the electronic device may also determine the user sound source bearing by comparing voice signal waveforms. The implementation is similar to the waveform comparison described above; for details, refer to that description, which is not repeated here.

It should be noted that if several second voice signals have an energy difference from the first voice signal at the same sound source bearing that is greater than the second preset value, the user sound source bearing may be determined further by comparing the similarity between the waveform of each of these second voice signals and the predetermined waveform. For the implementation, refer to the description of the waveform comparison above, which is not repeated here.

It can be understood that while issuing voice signals the user may move within a small range, and the sound source bearing of the user's voice signal changes accordingly. To receive the voice signal accurately in this case as well, the electronic device may determine the bearing among the second-class sound source bearings that does not belong to the first-class sound source bearings as the target sound source bearing, then determine a target range [A, B] from the target sound source bearing, and determine the sound source bearings within the target range as the user sound source bearing.

Here, A may be the target sound source bearing minus a first preset bearing difference, and B may be the target sound source bearing plus a second preset bearing difference. The first and second preset bearing differences may be equal or unequal. Their values may be set by those skilled in the art according to the usage scenario of the electronic device and how the user moves, for example 10 degrees, 15 degrees, or 30 degrees, and are not specifically limited here.

In one implementation, the first preset bearing difference is equal to the second. For example, if the target sound source bearing is 60 degrees and both preset bearing differences are 30 degrees, the electronic device determines the sound source bearings in the range from 30 degrees (60 - 30) to 90 degrees (60 + 30) as the final user sound source bearing. In another implementation, the two differences are unequal. For example, if the target sound source bearing is 60 degrees, the first preset bearing difference is 10 degrees, and the second is 15 degrees, the electronic device determines the sound source bearings in the range from 50 degrees (60 - 10) to 75 degrees (60 + 15) as the final user sound source bearing. Both are reasonable.
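The two worked examples can be reproduced with a small helper that computes [A, B] and tests membership. The function names are hypothetical, and wrap-around at 0/360 degrees is deliberately not handled, which is a simplifying assumption.

```python
def target_range(target_deg, first_diff_deg, second_diff_deg):
    """Return (A, B) = (target - first difference, target + second difference)."""
    return (target_deg - first_diff_deg, target_deg + second_diff_deg)


def in_target_range(bearing_deg, bounds):
    """True if the bearing lies inside the closed interval [A, B]."""
    a, b = bounds
    return a <= bearing_deg <= b


equal_case = target_range(60, 30, 30)
unequal_case = target_range(60, 10, 15)
```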

Corresponding to the above method embodiment, the embodiment of the present invention further provides a voice signal processing apparatus, which is described below.

As shown in Fig. 2, a voice signal processing apparatus is applied to an electronic device with a voice interaction function, and the apparatus includes:

A wake-up instruction judging module 210, configured to receive a voice signal while the electronic device is in the sleep state and to judge whether the interactive instruction corresponding to the received voice signal is a wake-up instruction;

A sound source localization module 220, configured to switch from the sleep state to the working state when the interactive instruction corresponding to the received voice signal is a wake-up instruction, and to locate the sound source bearing of the received voice signal as the user sound source bearing;

A user voice signal obtaining module 230, configured to continue to receive voice signals and to perform noise suppression on those received voice signals that originate from outside the user sound source bearing, to obtain the user voice signal;

A first interactive instruction response module 240, configured to respond to the interactive instruction corresponding to the user voice signal.
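How the four modules cooperate can be sketched as a small state machine. This is a toy illustration, not the apparatus itself: speech recognition is stubbed out as plain text, noise suppression is reduced to ignoring signals from other bearings, and the class name `VoiceDevice` and its method are hypothetical.

```python
class VoiceDevice:
    """Toy state machine mirroring modules 210-240 (recognition is stubbed)."""

    def __init__(self, wake_word):
        self.wake_word = wake_word
        self.state = "sleep"
        self.user_bearing = None

    def on_signal(self, text, bearing_deg):
        if self.state == "sleep":
            # Module 210: judge whether this is the wake-up instruction.
            if text == self.wake_word:
                # Module 220: switch state and fix the user bearing.
                self.state = "working"
                self.user_bearing = bearing_deg
            return None
        # Module 230: signals from other bearings are treated as noise.
        if bearing_deg != self.user_bearing:
            return None
        # Module 240: respond to the user's interactive instruction.
        return f"responding to: {text}"


device = VoiceDevice("hello")
device.on_signal("tv chatter", 0)    # ignored while asleep
device.on_signal("hello", 60)        # wakes the device, user is at 60 degrees
reply = device.on_signal("play music", 60)
```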

As an implementation of the embodiment of the present invention, the user voice signal obtaining module 230 may include:

A user voice signal obtaining submodule (not shown in Fig. 2), configured to perform noise suppression on the received voice signals that originate from outside the user sound source bearing, and to perform beam enhancement on the received voice signals that originate from the user sound source bearing, to obtain the user voice signal.

The electronic device performs beam enhancement on the received voice signals that originate from the user sound source bearing, increasing the energy of those signals. It can then take the beam-enhanced voice signal as the user voice signal, parse and recognize it more accurately, obtain the correct interactive instruction, and respond to that instruction correctly.

As an implementation of the embodiment of the present invention, the apparatus may further include:

A user orientation indicating module (not shown in Fig. 2), configured to indicate the user's orientation according to the user sound source bearing.

By indicating the user's orientation according to the user sound source bearing, the electronic device lets the user check the current user sound source bearing conveniently.

A conversion instruction judging module (not shown in Fig. 2), configured to judge whether the interactive instruction corresponding to the voice signal received from the user sound source bearing is a sound source localization mode conversion instruction;

A second interactive instruction response module (not shown in Fig. 2), configured to, when the interactive instruction corresponding to the voice signal received from the user sound source bearing is a sound source localization mode conversion instruction, continue to receive voice signals, determine the sound source bearing of the loudest of the received voice signals as the user sound source bearing, determine the loudest of the received voice signals as the user voice signal, and respond to the interactive instruction corresponding to the user voice signal.

On receiving a voice signal from the user sound source bearing, the electronic device can judge whether the corresponding interactive instruction is a sound source localization mode conversion instruction. If it is, the user has issued this instruction to indicate that the application scenario of the electronic device has changed. The device can then respond to the instruction so that it can respond to interactive instructions issued by multiple users, and can respond to them more accurately.

As an implementation of the embodiment of the present invention, the wake-up instruction judging module 210 may include:

A signal filtering submodule (not shown in Fig. 2) and an instruction judging submodule (not shown in Fig. 2);

The wake-up instruction judging module 210 is specifically configured to judge, through the signal filtering submodule and the instruction judging submodule, whether the interactive instruction corresponding to each received voice signal is a wake-up instruction;

As an implementation of the embodiment of the present invention, the sound source localization module 220 may include:

A sound source localization submodule (not shown in Fig. 2), configured to locate and record the sound source bearing of the received voice signal as a second-class sound source bearing;

A user sound source bearing determination submodule (not shown in Fig. 2), configured to locate the user sound source bearing according to the first-class sound source bearings and the second-class sound source bearings, where a first-class sound source bearing is the sound source bearing, located and recorded while the electronic device is in the sleep state, of a received voice signal whose corresponding interactive instruction is not a wake-up instruction.

When electronic equipment is in and existed in the environment of multi-acoustical, pass through first kind sound bearing and Equations of The Second Kind sound source side Position can be accurately positioned user sound bearing.

As an implementation of the embodiment of the present invention, the user sound bearing determination submodule may include:

A judging unit (not shown in Fig. 2), configured to judge whether the second-kind sound bearings include a sound bearing that does not belong to the first-kind sound bearings;

A user sound bearing determining unit (not shown in Fig. 2), configured to, when the second-kind sound bearings include a sound bearing that does not belong to the first-kind sound bearings, locate that sound bearing as the user sound bearing.

A sound bearing among the second-kind sound bearings that does not belong to the first-kind sound bearings was located while the electronic device was switching from the sleep state to the working state, and is not one of the bearings recorded during the sleep state. It can therefore be determined to be the sound bearing of the voice signal, issued by the user, whose corresponding interactive instruction is the wake-up instruction, and the user sound bearing can be located accurately.
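The comparison described above amounts to a set difference between the bearings recorded at wake-up and those recorded during sleep. The sketch below illustrates one way to compute it. Because two measurements of the same source rarely yield identical angles, it treats bearings within a tolerance as equal; that tolerance, its default value and the function names are assumptions that do not appear in the text.

```python
from typing import List, Sequence


def angular_gap(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two bearings, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def new_bearings(first_kind: Sequence[float],
                 second_kind: Sequence[float],
                 tolerance_deg: float = 10.0) -> List[float]:
    """Second-kind bearings that match no first-kind bearing within the tolerance."""
    return [
        b for b in second_kind
        if all(angular_gap(b, f) > tolerance_deg for f in first_kind)
    ]
```

If the returned list holds exactly one bearing, that bearing is the candidate user sound bearing; the cases of several bearings or none are the ones the further refinements in the text address.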

As an implementation of the embodiment of the present invention, the user sound bearing determining unit may include:

A quantity determination subunit (not shown in Fig. 2), configured to determine the number of sound bearings among the second-kind sound bearings that do not belong to the first-kind sound bearings;

A first orientation determination subunit (not shown in Fig. 2), configured to, when the determined number is greater than 1, determine the sound bearing corresponding to a voice signal that does not belong to a preset frequency band as the user sound bearing.

The voice signals emitted by noise-making devices such as a bass speaker generally fall within a fixed frequency range. If the preset frequency band is set to that fixed range, the electronic device can determine the sound bearing of a voice signal that does not belong to the preset frequency band as the user sound bearing. The sound bearings of voice signals that belong to the preset frequency band are thereby excluded, and the electronic device can accurately determine the user sound bearing.
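A rough sketch of this frequency-band exclusion follows. The text does not state how the frequency of a signal is determined, so the sketch assumes that a signal "belongs" to the band when its dominant frequency (the strongest bin of a plain discrete Fourier transform) lies inside it. The function names, the sample rate and the band limits in the usage note are illustrative only.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple


def dominant_frequency(samples: Sequence[float], sample_rate: float) -> float:
    """Frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def bearings_outside_band(signals_by_bearing: Dict[float, Sequence[float]],
                          sample_rate: float,
                          band_hz: Tuple[float, float]) -> List[float]:
    """Bearings whose dominant frequency lies outside the preset band."""
    low, high = band_hz
    return [
        b for b, s in signals_by_bearing.items()
        if not (low <= dominant_frequency(s, sample_rate) <= high)
    ]


def tone(freq_hz: float, sample_rate: float, n: int) -> List[float]:
    """Synthetic sine tone, used only to exercise the sketch."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

For example, with an assumed band of 20 to 200 Hz, a 100 Hz tone at one bearing would be excluded and a 1000 Hz tone at another bearing would be kept as a candidate user sound bearing.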

As an implementation of the embodiment of the present invention, the first orientation determination subunit may specifically be configured to determine the number of sound bearings corresponding to voice signals that do not belong to the preset frequency band and, when the determined number is greater than 1, to determine as the user sound bearing the sound bearing of the voice signal which, among the voice signals not belonging to the preset frequency band, has a waveform whose similarity to a predetermined waveform is greater than a first preset value.

When several voice signals do not belong to the preset frequency band, comparing the waveform of each of them with the predetermined waveform allows the user sound bearing to be located accurately.
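The waveform comparison can be illustrated as below. The text does not define the similarity measure, so this sketch assumes a normalized correlation (cosine similarity) between each candidate signal and the predetermined waveform. The function names and the threshold argument standing in for the first preset value are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def waveform_similarity(signal: Sequence[float], template: Sequence[float]) -> float:
    """Cosine similarity over the overlapping samples; 0.0 if either is silent."""
    n = min(len(signal), len(template))
    a, b = signal[:n], template[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0.0 else dot / norm


def bearings_matching_template(signals_by_bearing: Dict[float, Sequence[float]],
                               template: Sequence[float],
                               first_preset_value: float) -> List[float]:
    """Bearings whose signal is more similar to the template than the threshold."""
    return [
        b for b, s in signals_by_bearing.items()
        if waveform_similarity(s, template) > first_preset_value
    ]
```

A practical system would more likely compare time-aligned features than raw samples; the raw-sample comparison is used here only to keep the example short.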

An energy difference judging module (not shown in Fig. 2), configured to, when the second-kind sound bearing belongs to the first-kind sound bearings, judge whether the energy difference between a first voice signal and a second voice signal in the same sound bearing is greater than a second preset value, wherein the first voice signal is a voice signal received while the electronic device is in the sleep state and the second voice signal is a voice signal received while the electronic device is in the working state; and if so, determine the second-kind sound bearing corresponding to the second voice signal as the user sound bearing.

When the user issues the voice signal whose corresponding interactive instruction is the wake-up instruction, the user may be in the same bearing as one of the first-kind sound bearings, in which case the second-kind sound bearing located by the electronic device belongs to the first-kind sound bearings. In this case, if the energy difference between the first voice signal and the second voice signal in the same sound bearing is greater than the second preset value, the two signals were very probably not emitted by the same sound source. The electronic device can therefore determine the second-kind sound bearing corresponding to the second voice signal as the user sound bearing.
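The energy comparison can be sketched as follows, assuming that energy means the mean squared amplitude of a signal and that the difference is taken as an absolute value. Neither detail is specified in the text, and the function names are illustrative.

```python
from typing import Sequence


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a signal (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_different_source(first_signal: Sequence[float],
                        second_signal: Sequence[float],
                        second_preset_value: float) -> bool:
    """True if the two signals from the same bearing differ in energy by more
    than the threshold, suggesting they come from different sound sources."""
    diff = abs(mean_energy(second_signal) - mean_energy(first_signal))
    return diff > second_preset_value
```

When the function returns True for a bearing shared by both kinds, that bearing would be taken as the user sound bearing.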

A waveform comparison module (not shown in Fig. 2), configured to determine as the user sound bearing the sound bearing, among the second-kind sound bearings, of a voice signal whose waveform has a similarity to the predetermined waveform greater than the first preset value.

By judging the similarity between the waveform of the voice signal corresponding to a second-kind sound bearing and the predetermined waveform, the user sound bearing can be located accurately even when the second-kind sound bearing belongs to the first-kind sound bearings.

A target sound source orientation determining unit (not shown in Fig. 2), configured to determine the sound bearing among the second-kind sound bearings that does not belong to the first-kind sound bearings as the target sound source orientation;

A second orientation determining unit (not shown in Fig. 2), configured to determine a target range [A, B] according to the target sound source orientation, and to determine the sound bearings within the target range as the user sound bearing, wherein A is the target sound source orientation minus a first preset orientation difference, and B is the target sound source orientation plus a second preset orientation difference.

While issuing voice signals, the user may move within a small range, and the sound bearing of the voice signals changes accordingly. With the above way of determining the user sound bearing, the electronic device can still receive the voice signals accurately in this situation and then respond accurately.
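The target range [A, B] follows directly from the definitions above: A is the target sound source orientation minus the first preset orientation difference, and B is the target plus the second. The sketch below computes the range and tests whether a bearing falls inside it. Handling wrap-around at 0/360 degrees is an added assumption, since the text does not discuss it, and the function names are illustrative.

```python
from typing import Tuple


def target_range(target_deg: float,
                 first_diff_deg: float,
                 second_diff_deg: float) -> Tuple[float, float]:
    """Return (A, B) with A = target - first difference, B = target + second."""
    return target_deg - first_diff_deg, target_deg + second_diff_deg


def in_target_range(bearing_deg: float,
                    target_deg: float,
                    first_diff_deg: float,
                    second_diff_deg: float) -> bool:
    """True if the bearing lies in [A, B], measured around the circle."""
    # Signed offset from the target, normalized to the interval [-180, 180).
    offset = (bearing_deg - target_deg + 180.0) % 360.0 - 180.0
    return -first_diff_deg <= offset <= second_diff_deg
```

With a target of 90 degrees and preset differences of 15 and 20 degrees, the range is [75, 110], so a user who drifts to 100 degrees is still treated as being in the user sound bearing.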

The embodiment of the present invention further provides an electronic device, which is described below.

As shown in Fig. 3, the electronic device includes:

A housing 301, a processor 302, a memory 303, a circuit board 304 and a power supply circuit 305, wherein the circuit board 304 is placed inside the space enclosed by the housing 301, and the processor 302 and the memory 303 are arranged on the circuit board 304; the power supply circuit 305 is configured to supply power to each circuit or component of the electronic device; the memory 303 is configured to store executable program code; and the processor 302 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 303, so as to perform the audio signal processing method described in the above method embodiment.

In one implementation, the above audio signal processing method may include:

Responding to the interactive instruction corresponding to the user voice signal.

For other implementations of the above audio signal processing method, refer to the description in the foregoing method embodiment; they are not repeated here.

For the specific process by which the processor 302 performs the above steps and the other implementations of the above audio signal processing method, and for the process that the processor 302 further performs by running the executable program code, refer to the description of the embodiments shown in Fig. 1 and Fig. 2; they are not repeated here.

It should be noted that the electronic device exists in a variety of forms, including but not limited to:

(1) Mobile communication devices: devices of this kind have mobile communication functions and are mainly intended to provide voice and data communication. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, low-end phones and the like.

(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID and UMPC devices, such as the iPad.

(3) Portable entertainment devices: devices of this kind can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server comprises a processor, a hard disk, memory, a system bus and the like. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capability, stability, reliability, security, scalability and manageability are higher.

(5) Other electronic devices with data interaction functions.

It can be seen that, in the scheme provided by the embodiment of the present invention, the processor of the electronic device runs a program corresponding to the executable program code by reading the executable program code stored in the memory. In the sleep state, the device receives voice signals and judges whether the interactive instruction corresponding to a received voice signal is a wake-up instruction. If so, it switches from the sleep state to the working state and locates the sound bearing of the received voice signal as the user sound bearing. It then continues to receive voice signals, performs noise suppression on those of the further received voice signals that originate from outside the user sound bearing to obtain the user voice signal, and responds to the interactive instruction corresponding to the user voice signal. Because the electronic device determines the sound bearing of the wake-up instruction as the user sound bearing and suppresses voice signals from outside that bearing, the user voice signal obtained is the voice signal issued by the user in the user sound bearing. The device can therefore respond correctly, which improves the user experience.
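The overall flow summarized above (sleep, wake-up, bearing lock, suppression of other bearings, response) can be modeled as a small state machine. The sketch is deliberately simplified and rests on several assumptions: each received signal is represented as an already recognized text plus a bearing, noise suppression is modeled as simply discarding signals from outside the user sound bearing, and the wake-up phrase and angular tolerance are hypothetical values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    SLEEP = "sleep"
    WORKING = "working"


@dataclass
class Frame:
    bearing_deg: float  # located sound bearing of the received signal
    text: str           # stand-in for the recognized interactive instruction


@dataclass
class Device:
    wake_word: str = "hello device"   # hypothetical wake-up instruction
    tolerance_deg: float = 15.0       # hypothetical width of the user bearing
    state: State = State.SLEEP
    user_bearing: Optional[float] = None
    responses: List[str] = field(default_factory=list)

    def on_frame(self, frame: Frame) -> None:
        if self.state is State.SLEEP:
            # Only a wake-up instruction changes state; its bearing is locked.
            if frame.text == self.wake_word:
                self.state = State.WORKING
                self.user_bearing = frame.bearing_deg
            return
        gap = abs(frame.bearing_deg - self.user_bearing) % 360.0
        gap = min(gap, 360.0 - gap)
        if gap <= self.tolerance_deg:
            self.responses.append(frame.text)  # respond to the user
        # Frames from other bearings are treated as noise and dropped.
```

A real device would attenuate off-bearing sound at the signal level, for instance with a microphone-array beamformer, rather than dropping whole utterances.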

The electronic device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment.

It should be noted that herein, such as first and second or the like relational terms are used merely to a reality Body or operation make a distinction with another entity or operation, and not necessarily require or imply these entities or deposited between operating In any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant are intended to Nonexcludability is included, so that process, method, article or equipment including a series of key elements not only will including those Element, but also other key elements including being not expressly set out, or also include being this process, method, article or equipment Intrinsic key element.In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that Also there is other identical element in process, method, article or equipment including the key element.

The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment.

The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims

1. An audio signal processing method, characterized in that it is applied to an electronic device with a voice interaction function, the method comprising:

when the electronic device is in a sleep state, receiving a voice signal, and judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction;

if so, switching from the sleep state to a working state, and locating the sound bearing of the received voice signal as a user sound bearing;

continuing to receive voice signals, and performing noise suppression on those of the further received voice signals that originate from outside the user sound bearing, to obtain a user voice signal;

responding to the interactive instruction corresponding to the user voice signal.

2. The method of claim 1, characterized in that the step of performing noise suppression on those of the further received voice signals that originate from outside the user sound bearing, to obtain a user voice signal, comprises:

performing noise suppression on those of the further received voice signals that originate from outside the user sound bearing, and performing beam enhancement on those of the further received voice signals that originate from the user sound bearing, to obtain the user voice signal.

3. The method of claim 1 or 2, characterized in that the method further comprises:

indicating the bearing of the user according to the user sound bearing.

4. The method of claim 1 or 2, characterized in that the method further comprises:

judging whether the interactive instruction corresponding to a voice signal received from the user sound bearing is a sound source localization mode switching instruction;

if so, continuing to receive voice signals, determining the sound bearing of the loudest of the received voice signals as the user sound bearing, determining the loudest of the received voice signals as the user voice signal, and responding to the interactive instruction corresponding to the user voice signal.

5. The method of claim 1 or 2, characterized in that the step of judging whether the interactive instruction corresponding to the received voice signal is a wake-up instruction comprises:

filtering a target voice signal to filter out the voice signals in the target voice signal whose frequency belongs to a preset frequency band, wherein the target voice signal is the received voice signal;

6. The method of claim 1 or 2, characterized in that the step of locating the sound bearing of the received voice signal as a user sound bearing comprises:

locating the user sound bearing according to first-kind sound bearings and the second-kind sound bearing, wherein a first-kind sound bearing is the sound bearing, located and recorded while the electronic device is in the sleep state, of a received voice signal whose corresponding interactive instruction is not a wake-up instruction.

7. The method of claim 6, characterized in that the step of locating the user sound bearing according to first-kind sound bearings and the second-kind sound bearing comprises:

if so, locating the sound bearing among the second-kind sound bearings that does not belong to the first-kind sound bearings as the user sound bearing.

8. The method of claim 7, characterized in that the step of locating the sound bearing among the second-kind sound bearings that does not belong to the first-kind sound bearings as the user sound bearing comprises:

when the determined number is greater than 1, determining the sound bearing corresponding to a voice signal that does not belong to a preset frequency band as the user sound bearing.

9. A speech signal processing device, characterized in that it is applied to an electronic device with a voice interaction function, the device comprising:

a wake-up instruction judging module, configured to, when the electronic device is in a sleep state, receive a voice signal and judge whether the interactive instruction corresponding to the received voice signal is a wake-up instruction;

a sound source localization module, configured to, when the interactive instruction corresponding to the received voice signal is a wake-up instruction, switch from the sleep state to a working state and locate the sound bearing of the received voice signal as a user sound bearing;

a user voice signal obtaining module, configured to continue to receive voice signals and perform noise suppression on those of the further received voice signals that originate from outside the user sound bearing, to obtain a user voice signal;

10. An electronic device, characterized in that the electronic device comprises: a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is placed inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the audio signal processing method of any one of claims 1-8.