CN107577449B - Wake-up voice pickup method, device, equipment and storage medium

Info

Publication number: CN107577449B
Application number: CN201710786855.5A
Authority: CN
Inventors: 耿雷
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2017-09-04
Filing date: 2017-09-04
Publication date: 2023-06-23
Anticipated expiration: 2037-09-04
Also published as: CN107577449A

Abstract

The embodiment of the invention discloses a wake-up voice pickup method, device, equipment and storage medium. The method comprises the following steps: controlling one microphone in a microphone array to detect whether a sound signal is picked up, and judging whether the sound signal is a voice activation signal; when the sound signal is judged to be a voice activation signal, controlling all microphones in the microphone array to pick up the voice activation signal; calculating a sound source direction according to the voice activation signal; performing weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction to generate a directionally picked-up wake-up voice signal; and identifying whether the directionally picked-up wake-up voice signal is a wake-up word. This reduces interference from uncorrelated noise and improves the recognition accuracy of far-field voice wake-up in a noisy environment.

Description

Wake-up voice pickup method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of voice processing, in particular to a wake-up voice pickup method, device, equipment and storage medium.

Background

Currently, consumer intelligent devices generally have voice interaction functions, which can recognize and understand voice instructions of users and perform voice conversations. However, the traditional mode of acquiring voice by using a single microphone has high requirements on the distance between a sound source and the microphone, and the voice can be acquired only under the condition of being close enough. The above scenario is often referred to as near field speech.

To overcome this limitation on the distance to the sound source, microphone array technology is now used to pick up voice. A microphone array arranges a plurality of microphones according to a certain rule, and the scenario in which it is used is called far-field speech. Although a microphone array can pick up the voice of a distant sound source, various kinds of noise interference are inevitably introduced while the voice signal is acquired. Noise interference not only contaminates the received speech, but also significantly degrades the performance of many speech processing systems.

Disclosure of Invention

The embodiment of the invention provides a method, a device, equipment and a storage medium for picking up wake-up voice, so as to achieve the purpose of reducing noise components in the picked-up voice.

In a first aspect, an embodiment of the present invention provides a method for picking up wake-up speech, including:

controlling one microphone in the microphone array to detect whether a sound signal is picked up or not, and judging whether the sound signal is a voice activation signal or not;

when the sound signal is judged to be a voice activation signal, controlling all microphones in the microphone array to pick up the voice activation signal;

calculating a sound source direction according to the voice activation signal;

weighting and synthesizing the wake-up voice signals picked up by the microphone array according to the sound source direction to generate a directionally picked-up wake-up voice signal;

and identifying whether the directionally picked-up wake-up voice signal is a wake-up word.

In a second aspect, an embodiment of the present invention further provides a wake-up voice pickup device, including:

the judging module is used for controlling one microphone in the microphone array to detect whether a sound signal is picked up and for judging whether the sound signal is a voice activation signal;

the control module is used for controlling all microphones in the microphone array to pick up the voice activation signal when the sound signal is judged to be a voice activation signal;

the calculating module is used for calculating the sound source direction according to the voice activation signal;

the generating module is used for carrying out weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction to generate a directionally picked-up wake-up voice signal;

and the recognition module is used for recognizing whether the directionally picked-up wake-up voice signal is a wake-up word.

In a third aspect, an embodiment of the present invention further provides an apparatus, including:

one or more processors;

a storage means for storing one or more programs;

the microphone array is used for picking up external sounds;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the wake-up speech pick-up method as provided in the above embodiments.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a wake-up speech pickup method as provided in the above embodiment.

According to the wake-up voice pickup method, device, equipment and storage medium provided by the embodiment of the invention, the angle of a sound source is determined from the voice signals picked up by all microphones in the microphone array, and the voice is picked up directionally according to that angle. This reduces interference from uncorrelated noise and improves the recognition accuracy of far-field voice wake-up in a noisy environment.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:

Fig. 1 is a flowchart of a wake-up voice pickup method according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a wake-up voice pickup method according to a second embodiment of the present invention;

Fig. 3 is a flowchart of a wake-up voice pickup method according to a third embodiment of the present invention;

Fig. 4 is a flowchart of a wake-up voice pickup method according to a fourth embodiment of the present invention;

Fig. 5 is a block diagram of a wake-up voice pickup apparatus according to a fifth embodiment of the present invention;

Fig. 6 is a structural diagram of an apparatus provided in a sixth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

Fig. 1 is a flowchart of a wake-up voice pickup method according to a first embodiment of the present invention. This embodiment is applicable to picking up wake-up voice by using a microphone array. The method may be performed by a wake-up voice pickup device, and specifically includes the following steps:

Step 110, controlling one microphone in the microphone array to detect whether a sound signal is picked up, and judging whether the sound signal is a voice activation signal.

A microphone array consists of a number of acoustic sensors (typically microphones) and is used to sample and process the spatial characteristics of a sound field. Since speech is not produced at every moment during far-field pickup, sound does not need to be picked up at every moment. In a near-field voice environment this can be handled by manually pressing a record key, but that is not possible in a far-field voice environment. Therefore, in this embodiment, one microphone in the microphone array is controlled to detect whether a sound signal is picked up. Only one microphone is used for detection because keeping the entire microphone array on at the same time is relatively power-consuming for a mobile terminal or similar device. Thus, one of the microphones in the array may be kept on for a long period to detect whether a sound signal is present. The microphone that is kept on may be selected randomly or designated in advance.

Whether a sound signal is present can be detected by the one microphone that is turned on in the microphone array. When a sound signal is detected, it must be judged to determine whether it is a voice activation signal. The voice activation signal may be a sound signal produced by a single person and picked up by the microphone, i.e. any speech uttered by the user. By way of example, whether a voice activation signal has been picked up may be determined from the frequency, energy and/or intensity of the sound signal received by one microphone in the microphone array.

Because the far-field environment is relatively complex, various sounds may occur, so it is necessary to determine whether the picked-up sound signal is a voice activation signal. Since the sound produced by a person is generally concentrated in the range of 300-3000 Hz, it can be determined whether the frequency of the picked-up sound signal lies between 300 and 3000 Hz; if so, the sound signal can be judged to be a voice signal produced by a person.

Likewise, whether the picked-up sound signal is a voice activation signal can be judged from its energy. Speech is a mechanical wave, so its energy lies within a certain range, and that energy decays with distance. Because the distance between the user and the microphone array in far-field communication also stays within a certain range, the energy of the received sound signal can be used to judge whether it is a voice activation signal, which prevents sound emitted by other devices such as televisions and radios from being mistakenly recognized as a voice activation signal.

In addition, whether the picked-up sound signal is a voice activation signal can be judged from its sound intensity. Sound intensity is related to both frequency and amplitude, so the sound intensity of speech uttered by a single person also lies within a certain range, typically around 10^-5 W. The sound intensity can therefore be used to judge whether the sound was uttered by a single person, and hence whether it is a voice activation signal.
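The frequency and energy checks described above can be sketched as follows. This is a minimal illustration, not the patented detector: the function names, the naive DFT used to find the dominant frequency, and the energy thresholds `min_energy` and `max_energy` are all assumptions made for the example; only the 300-3000 Hz band comes from the text.

```python
import math

SPEECH_BAND_HZ = (300.0, 3000.0)  # speech band quoted in the description


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest DFT bin (DC excluded), via a naive DFT."""
    n = len(frame)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


def is_voice_activation(frame, sample_rate, min_energy=1e-4, max_energy=1.0):
    """True if the frame's energy and dominant frequency both look like speech.

    min_energy and max_energy are hypothetical thresholds; a real device
    would calibrate them for its microphone and expected user distance.
    """
    energy = frame_energy(frame)
    if not (min_energy <= energy <= max_energy):
        return False
    low, high = SPEECH_BAND_HZ
    return low <= dominant_frequency(frame, sample_rate) <= high
```

A single always-on microphone would run a check of this kind on each incoming frame and wake the rest of the array only when it returns true.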

Step 120, when the sound signal is judged to be a voice activation signal, controlling all microphones in the microphone array to pick up the voice activation signal.

The method above makes it possible to judge whether the picked-up sound signal is a voice activation signal. When one microphone in the microphone array detects that the picked-up sound signal is a voice activation signal, all microphones in the microphone array are turned on to pick up the voice activation signal. In this embodiment, the judgment may be implemented by a voice detection module, which may use a digital signal processing (DSP) chip. The judgment completes in a very short time, which is negligible compared with the duration of the voice activation signal itself. Thus, all microphones in the microphone array can pick up a nearly complete voice activation signal.

Step 130, calculating the sound source direction according to the voice activation signal.

By way of example, a time difference of arrival (TDOA) method may be employed. The first step is to obtain the time delay estimation (TDE) of the microphone array, i.e. to calculate the differences in the time at which the sound from the source reaches each microphone; TDE refers to an estimate of the time difference with which the same signal source reaches different sensors in a sensor array. The second step is to obtain a position estimate from the TDE and the positions of the microphones. The TDOA technique can be applied to different array structures and requires relatively little computation; compared with subspace algorithms, it is not limited by the sampling interval and is suitable for broadband signals such as speech. The position of the sound source can be calculated using TDOA and, correspondingly, the direction of the sound source relative to each microphone in the microphone array can be calculated.
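The two TDOA steps can be illustrated for the simplest case of two microphones and a distant (far-field) source. The sketch below estimates the delay from the peak of a brute-force cross-correlation and converts it to an angle; the function names, the speed-of-sound constant, and the far-field, two-microphone simplification are assumptions for the example, and the description does not specify how the delay is actually estimated.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(ref, other, max_lag):
    """Lag in samples by which `other` trails `ref`, at the cross-correlation peak."""
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_from_delay(delay_samples, sample_rate, mic_spacing_m):
    """Far-field arrival angle in degrees, measured from the axis that points
    from the `other` microphone toward the `ref` microphone."""
    cos_theta = delay_samples / sample_rate * SPEED_OF_SOUND / mic_spacing_m
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding past +/-1
    return math.degrees(math.acos(cos_theta))
```

A zero delay corresponds to a source broadside to the microphone pair (90 degrees). With more than two microphones, pairwise delays of this kind would be combined with the known microphone positions to estimate the source position.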

Step 140, performing weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction to generate a directionally picked-up wake-up voice signal.

In this embodiment, the wake-up voice signal is a sound signal that is to be checked against a preset wake-up word. For example, the common phrase "hi, siri" is a wake-up word that may be used to wake up a smart device equipped with a microphone array, so that the smart device performs a corresponding operation according to a subsequent voice command.

The sound source direction obtained by the method above is used to derive weighting coefficients for beamforming the wake-up voice signals picked up by the microphone array. For example, the wake-up voice signals picked up by each microphone in the microphone array may be weighted and summed. First, the time delay of the signal picked up by each microphone is estimated and compensated so that the voice signals picked up by the microphones are synchronized; then the signal picked up by each microphone is multiplied by a weight, and the products are summed and averaged. The weight can be determined from the angle between the microphone and the sound source: with the sound source direction as the target direction, signals whose phase is consistent with the target direction can be given a higher weight, and signals from other directions a correspondingly lower weight. For example, the weighting coefficients may be set as a function of the sound source position, a weighting matrix is established from that function, and the weighting matrix is multiplied by a matrix formed from the voice signals picked up by the respective microphones. The directionally picked-up wake-up voice signal is generated from the resulting product. A signal generated in this way strengthens the voice signal from the sound source direction and effectively reduces noise arriving from other directions. The noise reduction effect is related to the number of microphones in the microphone array, and is particularly evident for arrays consisting of many microphones.
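The delay compensation and weighted summation described above correspond to what is commonly called weighted delay-and-sum beamforming. The following is a minimal sketch under simplifying assumptions: delays are whole numbers of samples, and the per-microphone weights are passed in directly rather than derived from the source angle. The function name and parameters are hypothetical.

```python
def delay_and_sum(channels, delays, weights):
    """Align each channel by its sample delay, weight it, and average.

    channels: equal-length lists of samples, one list per microphone
    delays:   samples by which each channel trails the earliest channel
    weights:  per-microphone weights (assumed here to be supplied by the
              caller, e.g. computed from the estimated source direction)
    """
    n = len(channels[0])
    total_weight = sum(weights)
    output = [0.0] * n
    for samples, delay, weight in zip(channels, delays, weights):
        for i in range(n):
            j = i + delay  # read ahead to undo this microphone's arrival delay
            if 0 <= j < n:
                output[i] += weight * samples[j]
    return [value / total_weight for value in output]
```

When the delays match the true source direction, the speech adds coherently across channels, while sound from other directions is misaligned and partially averages out.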

Step 150, identifying whether the directionally picked-up wake-up voice signal is a wake-up word.

For example, a language recognition model for recognizing voice information may be established from the wake-up word speech or the wake-up word entered by the user. The directionally picked-up wake-up voice signal obtained in the preceding steps is matched against the language recognition model; if the match succeeds, the wake-up word corresponding to the directionally picked-up wake-up voice signal is recognized. If the match fails, the process may return to step 110 and resume monitoring.

In this embodiment, the angle of the sound source is determined from the voice signals picked up by all the microphones in the microphone array, and the voice is picked up directionally according to that angle. This reduces interference from uncorrelated noise and improves the recognition accuracy of far-field voice wake-up in a noisy environment.
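Steps 110 to 150 can be summarized as one control flow. The sketch below is illustrative only: `array` is a hypothetical driver object with `read_single()` and `read_all()` methods, and the four callables are placeholders for the detection, localization, synthesis and recognition stages, none of which are defined by the description at this level of detail.

```python
def wake_up_cycle(array, is_voice_activation, locate_source, beamform, is_wake_word):
    """Run steps 110 to 150 once and report whether the device should wake up.

    All parameters are hypothetical stand-ins for the stages in the text.
    """
    frame = array.read_single()              # step 110: one microphone listens
    if not is_voice_activation(frame):
        return False                         # stay in low-power monitoring
    channels = array.read_all()              # step 120: turn on every microphone
    direction = locate_source(channels)      # step 130: sound source direction
    focused = beamform(channels, direction)  # step 140: weighted synthesis
    return is_wake_word(focused)             # step 150: wake-up word check
```

A caller would invoke this in a loop, which mirrors the return to step 110 when matching fails; the full array is read only after the single-microphone check succeeds.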

In a preferred implementation of this embodiment, the following step may be added after the wake-up voice signals received by the microphone array are weighted and synthesized according to the sound source direction: if the directionally picked-up wake-up voice signal is recognized as a wake-up word, controlling the microphone array to pick up an interactive voice signal. After the directionally picked-up wake-up voice signal is recognized as the wake-up word, the microphone array is controlled to pick up the interactive voice signal, which is then recognized by a corresponding voice recognition engine to obtain the user's voice command.

Example 2

Fig. 2 is a flowchart of a wake-up voice pickup method according to a second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment. In this embodiment, the following step is added before the wake-up voice signals picked up by the microphone array are weighted and synthesized according to the sound source direction: acquiring an echo cancellation reference line, and cancelling the echo signal in the wake-up voice signal according to the echo cancellation reference line. Correspondingly, the weighted synthesis of the wake-up voice signals picked up by the microphone array according to the sound source direction is specifically optimized as: performing weighted synthesis, according to the sound source direction, on the wake-up voice signal from which the echo signal has been cancelled.

Correspondingly, the method for picking up wake-up voice provided by the embodiment specifically includes:

Step 210, controlling one microphone in the microphone array to detect whether a sound signal is picked up, and judging whether the sound signal is a voice activation signal.

Step 220, when the sound signal is judged to be a voice activation signal, controlling all microphones in the microphone array to pick up the voice activation signal.

Step 230, calculating the sound source direction according to the voice activation signal.

Step 240, acquiring an echo cancellation reference line, and canceling the echo signal in the wake-up speech signal according to the echo cancellation reference line.

In this embodiment, the voice uttered by the user may be reflected back to the microphone array multiple times in a closed space, forming echo interference, so the echo needs to be cancelled. By way of example and not limitation, the echo cancellation reference line may be generated using a terminal that picks up echoes. For example, an original echo signal approximating the echo is obtained at one or more corners of the enclosed space. A sound wave is a propagating wave characterized by two parameters: its phase and its amplitude. When two waves of equal amplitude and opposite phase are superposed, their sum is zero. The obtained original echo signal can therefore be shifted and inverted, and its amplitude scaled to the average amplitude range of the secondary sound source according to the conditions of use, yielding an artificially generated sound wave; this sound wave is the echo cancellation reference line. The echo signal in the wake-up voice signal can then be cancelled by adding the echo cancellation reference line to the wake-up voice signal.
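The shift, inversion and amplitude scaling described above can be sketched as follows. The example assumes that the echo delay (`shift`, in samples) and the echo attenuation (`gain`) are already known; how they would be measured is not covered by the description, and the function names are hypothetical.

```python
def build_cancellation_reference(echo_raw, shift, gain):
    """Shift, scale and invert a raw echo estimate to form the reference.

    shift and gain are assumed known here (hypothetical parameters).
    """
    n = len(echo_raw)
    shifted = [0.0] * n
    for i in range(n):
        j = i - shift
        if 0 <= j < n:
            shifted[i] = echo_raw[j]
    return [-gain * x for x in shifted]  # opposite phase, matched amplitude


def cancel_echo(mic_signal, reference):
    """Superpose the anti-phase reference on the microphone signal."""
    return [m + r for m, r in zip(mic_signal, reference)]
```

Because the reference has the same amplitude as the echo and opposite phase, the two sum to zero and only the direct speech remains. In practice the match is never exact, so some residual echo would be expected.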

Step 250, performing weighted synthesis on the wake-up voice signal after echo signal cancellation according to the sound source direction, and generating a directional pick-up wake-up voice signal.

The wake-up voice signal from which the echo signal has been cancelled in the preceding step is weighted and synthesized according to the sound source direction to generate a directionally picked-up wake-up voice signal. As the description above shows, weighted synthesis according to the sound source direction suppresses uncorrelated sound from other directions well, but is less effective against signals that are correlated with the speech, such as echo signals, which is why the echo is cancelled before the weighted synthesis.

Step 260, identifying whether the directionally picked-up wake-up voice signal is a wake-up word.

In this embodiment, the following step is added before the wake-up voice signals picked up by the microphone array are weighted and synthesized according to the sound source direction: acquiring an echo cancellation reference line, and cancelling the echo signal in the wake-up voice signal according to the echo cancellation reference line. Correspondingly, the weighted synthesis is performed on the wake-up voice signal from which the echo signal has been cancelled. This reduces interference from echo noise and further improves the recognition accuracy of far-field voice wake-up in a noisy environment.

Example 3

Fig. 3 is a flowchart of a wake-up voice pickup method according to a third embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment. In this embodiment, the following step is added after the directionally picked-up wake-up voice signal is generated and before it is identified as a wake-up word or not: performing noise reduction and amplification on the directionally picked-up wake-up voice signal. Correspondingly, identifying whether the directionally picked-up wake-up voice signal is a wake-up word includes: identifying whether the processed directionally picked-up wake-up voice signal is a wake-up word.

Step 310, controlling one microphone in the microphone array to detect whether a sound signal is picked up, and judging whether the sound signal is a voice activation signal.

Step 320, when the sound signal is judged to be a voice activation signal, controlling all microphones in the microphone array to pick up the voice activation signal.

Step 330, calculating the sound source direction according to the voice activation signal.

Step 340, performing weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction to generate a directionally picked-up wake-up voice signal.

Step 350, performing noise reduction and amplification on the directionally picked-up wake-up voice signal.

Although directional pickup effectively reduces the noise in the wake-up voice signal, some other noise inevitably remains in the directionally picked-up wake-up voice signal, so further noise removal by filtering is required. For example, a band-pass filter may be used to filter out sound waves outside the frequency range of human speech, thereby filtering out the noise.

In addition, since the sound source may be far from the microphone array, the wake-up voice signal picked up by the microphone array may be attenuated, so that its signal strength differs considerably from that of a normal wake-up voice signal. The directionally picked-up wake-up voice signal therefore needs to be amplified; for example, an automatic gain control (AGC) circuit may be used.
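The band-pass filtering and gain control described above can be approximated in software. The sketch below is an illustration under stated assumptions, not the circuit the description refers to: the band-pass is a gentle first-order high-pass followed by a first-order low-pass with cut-offs taken from the 300-3000 Hz speech band mentioned earlier, and the gain stage scales a whole block to a hypothetical target RMS level instead of adapting continuously as a hardware AGC would.

```python
import math


def band_pass(samples, sample_rate, low_hz=300.0, high_hz=3000.0):
    """First-order high-pass followed by first-order low-pass (gentle roll-off)."""
    dt = 1.0 / sample_rate
    rc_high = 1.0 / (2 * math.pi * low_hz)
    rc_low = 1.0 / (2 * math.pi * high_hz)
    a_high = rc_high / (rc_high + dt)
    a_low = dt / (rc_low + dt)
    out, prev_in, hp, lp = [], 0.0, 0.0, 0.0
    for x in samples:
        hp = a_high * (hp + x - prev_in)  # attenuates content below low_hz
        prev_in = x
        lp = lp + a_low * (hp - lp)       # attenuates content above high_hz
        out.append(lp)
    return out


def automatic_gain(samples, target_rms=0.1, max_gain=100.0):
    """Scale a block so its RMS reaches target_rms, capped at max_gain.

    target_rms and max_gain are hypothetical tuning values.
    """
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    gain = min(target_rms / rms, max_gain)
    return [gain * x for x in samples]
```

Filtering before amplification keeps the gain stage from boosting out-of-band noise along with the speech; a sharper filter would normally be preferred where stronger rejection is needed.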

Step 360, identifying whether the processed directionally picked-up wake-up voice signal is a wake-up word.

After the processing in the above steps, the directionally picked-up wake-up voice signal is closer to the sound signal uttered by the user, so it can be sent to a voice recognition engine for recognition.

In this embodiment, the following step is added after the directionally picked-up wake-up voice signal is generated and before it is identified as a wake-up word or not: performing noise reduction and amplification on the directionally picked-up wake-up voice signal. Correspondingly, identifying whether the directionally picked-up wake-up voice signal is a wake-up word includes: identifying whether the processed directionally picked-up wake-up voice signal is a wake-up word. As a result, the signal being recognized is closer to the sound signal uttered by the user, which further improves the recognition accuracy of far-field voice wake-up in a noisy environment.

Example 4

Fig. 4 is a flowchart of a wake-up voice pickup method according to a fourth embodiment of the present invention. This embodiment is optimized on the basis of the foregoing embodiment. In this embodiment, determining whether a voice activation signal is picked up according to the frequency, energy and/or intensity of the sound signal received by one microphone in the microphone array is specifically optimized as: performing echo cancellation on the sound signal received by one microphone in the microphone array; and judging whether a voice activation signal is picked up according to the frequency, energy and/or intensity of the sound signal after echo cancellation.

Step 410, controlling one microphone in the microphone array to detect whether a sound signal is picked up, and performing echo cancellation on the sound signal received by that microphone.

If the user is in a relatively large enclosed space, an acoustic echo may be produced after the user speaks, and if the microphone detects the echo signal, the echo may be misjudged as a voice activation signal. To avoid this, in this embodiment, echo cancellation is performed on the sound signal received by one microphone in the microphone array. For example, the echo signal may be cancelled using the echo cancellation reference line; for the specific implementation, refer to the foregoing embodiment, which is not repeated here.

Step 420, determining whether a voice activation signal is picked up according to the frequency, energy and/or intensity of the sound signal after echo cancellation.

Step 430, when the sound signal is judged to be a voice activation signal, controlling all microphones in the microphone array to pick up the voice activation signal.

Step 440, calculating the sound source direction according to the voice activation signal.

Step 450, performing weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction to generate a directionally picked-up wake-up voice signal.

Step 460, identifying whether the directionally picked-up wake-up voice signal is a wake-up word.

In this embodiment, determining whether a voice activation signal is picked up according to the frequency, energy and/or intensity of the sound signal received by one microphone in the microphone array is specifically optimized as: performing echo cancellation on the sound signal received by one microphone in the microphone array; and judging whether a voice activation signal is picked up according to the frequency, energy and/or intensity of the sound signal after echo cancellation. This avoids misjudgment caused by echoes and effectively reduces the power consumption of the microphone array.

Example 5

Fig. 5 is a schematic structural diagram of a wake-up voice pickup apparatus according to a fifth embodiment of the present invention, as shown in fig. 5, where the apparatus includes:

a judging module 510, configured to control one microphone in the microphone array to detect whether a sound signal is picked up, and judge whether the sound signal is a voice activation signal;

the control module 520 is configured to control all microphones in the microphone array to pick up the voice activation signal when the sound signal is judged to be a voice activation signal;

a calculating module 530, configured to calculate a sound source direction according to the voice activation signal;

a generating module 540, configured to perform weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction, and generate a directionally picked-up wake-up voice signal;

and the recognition module 550 is configured to recognize whether the directionally picked-up wake-up voice signal is a wake-up word.

With the wake-up voice pickup device of this embodiment, the angle of the sound source is determined from the voice signals picked up by all microphones in the microphone array, and the voice is picked up directionally according to that angle. This reduces interference from uncorrelated noise and improves the recognition accuracy of far-field voice wake-up in a noisy environment.

On the basis of the above embodiments, the device further includes:

and the pickup module is used for controlling the microphone array to pick up the interactive voice signal and carrying out voice recognition on the interactive voice signal if the wake-up voice signal picked up by the directivity is recognized as a wake-up word.

On the basis of the above embodiments, the device further includes:

the cancellation module is used for acquiring an echo cancellation reference line and canceling an echo signal in the awakening voice signal according to the echo cancellation reference line;

the generating module comprises:

and the synthesis unit is used for carrying out weighted synthesis on the wake-up voice signal after the echo signal is eliminated according to the sound source direction.

On the basis of the above embodiment, the apparatus further includes:

the processing module is used for carrying out noise reduction and amplification processing on the directivity pickup wake-up voice signal;

the identification module is used for:

and identifying whether the processed wake-up voice signal picked up by directivity is a wake-up word.

On the basis of the above embodiment, the judging module includes:

and the judging unit is used for judging whether the voice activation signal is picked up or not according to the frequency, the energy and/or the intensity of the voice signal received by one microphone in the microphone array.

On the basis of the above embodiment, the judging unit is configured to:

echo cancellation is performed on sound signals received by one microphone in the microphone array;

and judging whether the voice activation signal is picked up or not according to the frequency, the energy and/or the intensity of the voice signal after the echo cancellation.

The wake-up voice pickup device provided by the embodiment of the invention can execute the wake-up voice pickup method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Example six

Fig. 6 is a schematic structural diagram of an apparatus according to a sixth embodiment of the present invention. Fig. 6 shows a block diagram of an exemplary device 12 suitable for use in implementing embodiments of the invention. The device 12 shown in fig. 6 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in fig. 6, device 12 is in the form of a general purpose computing device. Components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components (including the system memory 28 and the processing units 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with device 12, such as a microphone array (not shown), and/or any device (e.g., network card, modem, etc.) that enables device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, device 12 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, via network adapter 20. As shown, network adapter 20 communicates with other modules of device 12 over bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, implementing the wake-up voice pick-up method provided by the embodiment of the present invention.
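
The control flow that such a program might follow is sketched below. Each stage is passed in as a callable so that the ordering described in the embodiments is visible: one always-on microphone gates the rest of the array, and the wake-word recognizer only sees the processed directional signal. The class name `WakePipeline` and every field name are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]

@dataclass
class WakePipeline:
    is_activation: Callable[[Frame], bool]                 # single-microphone check
    capture_all: Callable[[], List[Frame]]                 # turn on and read all microphones
    locate_source: Callable[[Sequence[Frame]], float]      # sound source direction
    cancel_echo: Callable[[List[Frame]], List[Frame]]      # per-channel echo removal
    synthesize: Callable[[Sequence[Frame], float], Frame]  # weighted synthesis
    enhance: Callable[[Frame], Frame]                      # noise reduction and amplification
    is_wake_word: Callable[[Frame], bool]                  # wake-word recognizer

    def on_frame(self, frame: Frame) -> bool:
        """Handle one frame from the always-on microphone.

        Returns True when the wake-up word is recognized."""
        if not self.is_activation(frame):
            return False  # the remaining microphones are never opened
        channels = self.capture_all()
        direction = self.locate_source(channels)
        cleaned = self.cancel_echo(channels)
        directional = self.synthesize(cleaned, direction)
        return self.is_wake_word(self.enhance(directional))
```

Keeping the stages injectable makes it straightforward to test the gating behavior with stubs before any real signal processing is attached.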

Example seven

Embodiment seven of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the wake-up voice pickup method provided in the above embodiments.

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A method of wake-up speech pickup, comprising:

controlling one microphone in the microphone array to detect whether a sound signal is picked up or not, and judging whether the sound signal is a voice activation signal or not; one microphone of the microphone array is turned on for a long period of time;

when judging the voice activation signal, controlling all microphones in the microphone array to be turned on for picking up the voice activation signal;

calculating a sound source direction according to the voice activation signal;

performing weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction, to generate a directional pickup wake-up voice signal;

performing noise reduction and amplification processing on the directional pickup wake-up voice signal, comprising: selecting a band-pass filter to filter out sound waves outside the frequency range of sounds produced by humans, and using an automatic gain control circuit to amplify the directional pickup wake-up voice signal;

recognizing whether the processed directional pickup wake-up voice signal is a wake-up word;

before the wake-up voice signals picked up by the microphone array are weighted and synthesized according to the sound source direction, the method further comprises the following steps:

acquiring an echo cancellation reference line, and canceling an echo signal in the wake-up voice signal according to the echo cancellation reference line;

the echo cancellation reference line is obtained by obtaining echo original signals similar to echo at a plurality of corners of a closed space, performing displacement and/or anti-phase intervention on the echo original signals, and amplifying the amplitude of the intervention result to the range of amplitude values of average secondary sound sources;

the controlling one microphone in the microphone array to detect whether a voice activation signal is picked up comprises:

judging whether a voice activation signal is picked up or not according to the frequency, energy or strength of the voice signal after echo cancellation;

the step of performing weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction comprises the following steps:

performing delay compensation after performing delay estimation on wake-up voice signals picked up by each microphone so as to synchronize the wake-up voice signals picked up by each microphone;

multiplying the synchronized wake-up voice signal picked up by each microphone by a respective weighting value, summing the weighted signals, and then taking the average; wherein the weighting value is determined from the angle between the microphone and the sound source;

wherein, the weighting synthesis is performed on the wake-up voice signals picked up by the microphone array according to the sound source direction, and the method further comprises:

and carrying out weighted synthesis on the wake-up voice signal after echo signal elimination according to the sound source direction.

2. The method of claim 1, further comprising, after performing weighted synthesis on the wake-up voice signals picked up by the microphone array according to the sound source direction:

if the directional pickup wake-up voice signal is recognized as a wake-up word, controlling the microphone array to pick up an interactive voice signal, and performing voice recognition on the interactive voice signal.

3. A wake-up speech pickup apparatus, comprising:

the judging module is used for controlling one microphone in the microphone array to detect whether a sound signal is picked up or not and judging whether the sound signal is a voice activation signal or not; one microphone of the microphone array is turned on for a long period of time;

the control module is used for controlling all microphones in the microphone array to be opened when judging the voice activation signal and picking up the voice activation signal;

the recognition module is used for recognizing whether the directional pickup wake-up voice signal is a wake-up word;

the apparatus further comprises:

the generating module is specifically configured to:

wherein the apparatus further comprises:

the processing module is used for performing noise reduction and amplification processing on the directional pickup wake-up voice signal, comprising: selecting a band-pass filter to filter out sound waves outside the frequency range of sounds produced by humans, and using an automatic gain control circuit to amplify the directional pickup wake-up voice signal;

the recognition module is specifically used for recognizing whether the processed directional pickup wake-up voice signal is a wake-up word;

wherein, the generating module includes:

4. A wake-up speech pickup device, the device comprising:

one or more processors;

a storage means for storing one or more programs;

the microphone array is used for picking up external sounds;

when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the wake-up speech pick-up method of any of claims 1-2.

5. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a wake-up speech pick-up method according to any of claims 1-2.