CN109087650B - Voice wake-up method and device

Info

Publication number: CN109087650B
Application number: CN201811246595.3A
Authority: CN
Inventors: 杨敬文; 熊达蔚
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2018-10-24
Filing date: 2018-10-24
Publication date: 2022-02-22
Anticipated expiration: 2038-10-24
Also published as: CN109087650A

Abstract

The disclosure relates to a voice wake-up method and device, and belongs to the technical field of voice processing. The method comprises: collecting a first voice signal in an environment where an electronic device is located; if audio is being played in the environment when the first voice signal is collected, collecting an audio signal of the audio clip played during the collection period of the first voice signal; performing noise reduction processing on the first voice signal according to the audio signal to obtain a second voice signal; and executing a voice wake-up operation according to the second voice signal. The disclosure can filter the audio signal out of the first voice signal so that the resulting second voice signal does not include the audio signal, and then execute the voice wake-up operation according to the second voice signal. This solves the problem that the audio signal interferes with the voice wake-up operation and lowers its success rate, thereby improving the success rate of the voice wake-up operation.

Description

Voice wake-up method and device

Technical Field

The present disclosure relates to the field of voice processing technologies, and in particular, to a voice wake-up method and apparatus.

Background

Voice wake-up means that a user wakes up an electronic device by speaking a wake-up word, so that the electronic device enters a state of waiting for a voice instruction or executes the voice instruction.

The voice signal collected by the electronic device includes not only the voice of the wake-up word spoken by the user but also the noise present in the environment of the electronic device at that time. When the noise is loud, the success rate of voice wake-up is affected.

Disclosure of Invention

To solve the problems in the related art, the present disclosure provides a voice wake-up method and apparatus.

According to a first aspect of the embodiments of the present disclosure, there is provided a voice wake-up method, including:

acquiring a first voice signal in an environment where electronic equipment is located;

if the audio is being played in the environment when the first voice signal is collected, collecting the audio signal of the audio clip played in the collection time period of the first voice signal;

performing noise reduction processing on the first voice signal according to the audio signal to obtain a second voice signal;

and executing voice awakening operation according to the second voice signal.

According to a second aspect of the embodiments of the present disclosure, there is provided a voice wake-up apparatus, the apparatus including:

a first acquisition module configured to acquire a first voice signal within an environment in which an electronic device is located;

a second acquisition module configured to acquire an audio signal of an audio clip played during the acquisition period of the first voice signal if audio is being played in the environment when the first voice signal is acquired;

the noise reduction module is configured to perform noise reduction processing on the first voice signal acquired by the first acquisition module according to the audio signal acquired by the second acquisition module to obtain a second voice signal;

and the awakening module is configured to execute voice awakening operation according to the second voice signal obtained by the noise reduction module.

According to a third aspect of the embodiments of the present disclosure, there is provided a voice wake-up apparatus, the apparatus including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

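collect a first voice signal in an environment where an electronic device is located;

if audio is being played in the environment when the first voice signal is collected, collect an audio signal of the audio clip played during the collection period of the first voice signal;

perform noise reduction processing on the first voice signal according to the audio signal to obtain a second voice signal;

and execute a voice wake-up operation according to the second voice signal.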

According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the voice wake-up method according to the first aspect.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:

The first voice signal in the environment where the electronic device is located is collected, together with the audio signal of the audio clip played during the collection period of the first voice signal. Noise reduction processing can then be performed on the first voice signal according to the audio signal, that is, the audio signal is filtered out of the first voice signal, so that the resulting second voice signal does not include the audio signal. The voice wake-up operation is then executed according to the second voice signal. This solves the problem that the audio signal interferes with the voice wake-up operation and lowers its success rate, thereby improving the success rate of the voice wake-up operation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a schematic diagram illustrating a successful wake-up of an electronic device according to an example embodiment.

Fig. 2 is a diagram illustrating a successful receipt of a voice instruction according to an example embodiment.

Fig. 3 is a flow chart illustrating a voice wake-up method according to an example embodiment.

Fig. 4 is a flow chart illustrating a voice wake-up method according to another exemplary embodiment.

Fig. 5 is a flow chart illustrating a voice wake-up method according to another exemplary embodiment.

Fig. 6 is a block diagram illustrating a voice wake-up apparatus in accordance with an exemplary embodiment.

Fig. 7 is a block diagram illustrating a voice wake-up apparatus in accordance with an exemplary embodiment.

Fig. 8 is a block diagram illustrating an apparatus for voice wake-up in accordance with an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The following explains the terms to which the present disclosure relates.

Voice wake-up operation: the voice wake-up operation includes two parts, a wake-up operation and a recognition operation. In the wake-up operation, a user speaks a wake-up word to wake up the electronic device, so that the electronic device enters a state of waiting for a voice instruction. In the recognition operation, after the electronic device has been woken up, the user speaks a voice instruction to control the electronic device to perform the corresponding voice operation.

Optionally, when the electronic device is successfully woken up, it may display an interface indicating the successful wake-up, to show that it is waiting for the user to speak a voice instruction. Referring to fig. 1, the electronic device displays a prompt box on the desktop, and the prompt box displays the prompt message "kayen, you say …".

Optionally, when the electronic device receives the voice instruction, it may display an interface indicating that the voice instruction was successfully received. Referring to fig. 2, if the voice instruction is "weather today", the prompt message displayed in the prompt box may be "weather today".

Wake-up word: a character string used to wake up the electronic device. For example, the wake-up word may be "love classmate".

Voice instruction: an instruction by which the user controls the electronic device by voice to perform a corresponding voice operation. For example, the voice instruction may be "navigate home", "play music", or the like.

The following presents a simplified summary of the disclosure.

When it is inconvenient for the user to operate the electronic device manually, the user can perform a voice wake-up operation on the electronic device. That is, the user may speak the wake-up word to wake up the electronic device, and then speak a voice instruction to control the electronic device to perform the corresponding voice operation. Two application scenarios in which audio is played during voice wake-up are described below.

In a first application scenario, the electronic device transmits audio in the electronic device to a playing device, and the playing device plays the audio. The playing device may be a vehicle-mounted playing device, a speaker, a television, a computer, or the like that is connected to the electronic device, which is not limited in this embodiment. For example, the electronic device establishes a Bluetooth connection with the vehicle-mounted playing device and transmits its songs to the vehicle-mounted playing device for playing. Alternatively, the electronic device establishes a Bluetooth or WiFi (Wireless Fidelity) connection with the speaker and transmits its songs to the speaker for playing. Alternatively, the electronic device establishes a WiFi connection with the television (or the computer) and transmits its songs or videos to the television (or the computer) for playing.

In a second application scenario, the playing device obtains audio from a device other than the electronic device and plays it. The other device may be a server, another electronic device, or the like, and the playing device may be a vehicle-mounted playing device, a speaker, a television, a computer, or the like that establishes a connection with the other device. For example, the television establishes a WiFi connection with the server and obtains audio or video from the server for playing. Alternatively, the television establishes a WiFi connection with another electronic device and obtains audio or video from that device for playing.

In both application scenarios, if audio is being played in the environment where the electronic device is located, the wake-up operation and the recognition operation may be interfered with.

Take the wake-up word "love classmate" as an example. If audio is being played in the environment, the voice signal collected by the electronic device includes not only the voice saying "love classmate" but also the audio signal of the audio. If the interference from the audio signal is strong, the electronic device may fail to recognize "love classmate", which lowers the success rate of the wake-up operation. Alternatively, a human voice similar to "love classmate" may appear in the audio although the user has not said "love classmate", in which case the electronic device may be woken up by mistake, thereby disturbing the user.

Take the voice instruction "weather today" as an example. If audio is being played in the environment, the voice signal collected by the electronic device includes not only the voice saying "weather today" but also the audio signal of the audio. If the interference from the audio signal is strong, the electronic device may fail to recognize "weather today" and cannot broadcast today's weather forecast to the user, which lowers the success rate of the recognition operation. Alternatively, a human voice similar to "weather today" may appear in the audio although the user has not said "weather today", in which case the electronic device may broadcast today's weather forecast by mistake, thereby disturbing the user.

As can be seen from the above, the electronic device needs to process the interference, and the details are described in the following embodiments.

Fig. 3 is a flowchart illustrating a voice wake-up method applied to an electronic device according to an exemplary embodiment, where the voice wake-up method includes the following steps, as shown in fig. 3.

In step 301, a first speech signal within an environment in which an electronic device is located is acquired.

In step 302, if the audio is being played in the environment when the first voice signal is collected, the audio signal of the audio clip played in the collection period of the first voice signal is collected.

In step 303, the first speech signal is subjected to noise reduction processing according to the audio signal to obtain a second speech signal.

In step 304, a voice wakeup operation is performed according to the second voice signal.

In summary, the voice wake-up method provided by the present disclosure collects the first voice signal in the environment where the electronic device is located, together with the audio signal of the audio clip played during the collection period of the first voice signal. Noise reduction processing can then be performed on the first voice signal according to the audio signal, that is, the audio signal is filtered out of the first voice signal, so that the resulting second voice signal does not include the audio signal. The voice wake-up operation is then executed according to the second voice signal. This solves the problem that the audio signal interferes with the voice wake-up operation and lowers its success rate, thereby improving the success rate of the voice wake-up operation.
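The flow of steps 301 to 304 can be sketched as follows. This is a minimal illustration only: the function names, the sample-by-sample subtraction, and the energy-based "detector" are hypothetical stand-ins chosen for the example and are not specified by the disclosure.

```python
from typing import Callable, List, Optional, Sequence

Signal = Sequence[float]


def voice_wake_up(
    first_voice_signal: Signal,
    playback_reference: Optional[Signal],
    reduce_noise: Callable[[Signal, Signal], Signal],
    run_wake_up: Callable[[Signal], bool],
) -> bool:
    """Steps 301-304: denoise with the playback reference if present, then wake."""
    if playback_reference is None:
        # No audio was playing during collection: use the microphone signal as-is.
        return run_wake_up(first_voice_signal)
    # Step 303: remove the played audio from the first voice signal.
    second_voice_signal = reduce_noise(first_voice_signal, playback_reference)
    # Step 304: run the wake-up operation on the second voice signal.
    return run_wake_up(second_voice_signal)


# Hypothetical toy stand-ins, used only to exercise the flow above.
def subtract_reference(mic: Signal, ref: Signal) -> List[float]:
    """Subtract the reference sample by sample (assumes perfect alignment)."""
    return [m - r for m, r in zip(mic, ref)]


def has_energy(signal: Signal) -> bool:
    """Pretend a wake-up word is present whenever residual energy remains."""
    return sum(s * s for s in signal) > 1e-9
```

A real system would replace `subtract_reference` with a proper noise reduction algorithm and `has_energy` with a wake-up word recognizer; the sketch only shows how the reference audio decides which signal reaches the wake-up stage.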

Fig. 4 is a flowchart illustrating a voice wake-up method according to another exemplary embodiment, which is applied to an electronic device in a first application scenario, and as shown in fig. 4, the voice wake-up method includes the following steps.

In step 401, a first speech signal within an environment in which an electronic device is located is acquired.

The electronic device provides an option for the voice wake-up operation. When the user turns on the option, the electronic device is in a state that supports the voice wake-up operation and executes the voice wake-up method provided by this embodiment; when the user turns off the option, the electronic device is in a state that does not support the voice wake-up operation, and the process ends.

In this embodiment, when the user turns on the option, the electronic device needs to acquire the first voice signal in real time because the user may perform the voice wakeup operation at any time. The first voice signal may include a wakeup word or a voice command, and the embodiment is not limited.

When the first voice signal is collected, the electronic device may turn on the microphone, collect the sound signal in the environment where the electronic device is located in real time through the microphone, and use the sound signal as the first voice signal.

As mentioned in the description of the first application scenario, if audio is being played in the environment where the electronic device is located when the first voice signal is collected, the first voice signal includes the audio signal in addition to the human voice, and the audio signal interferes with the voice wake-up operation. It is therefore necessary to detect whether audio is being played in the environment when the first voice signal is collected, that is, to execute step 402.

In step 402, it is detected whether the electronic device is transmitting audio to a playing device when the first voice signal is captured.

Since the audio in the first application scenario is sent to the playing device by the electronic device, the electronic device may detect whether it is transmitting audio to the playing device when the first voice signal is collected. When the electronic device is transmitting audio to the playing device, it determines that audio is being played in the environment when the first voice signal is collected, that is, the first voice signal includes the audio signal, and step 403 is executed. When the electronic device is not transmitting audio to the playing device, it determines that no audio is being played in the environment when the first voice signal is collected, that is, the first voice signal does not include the audio signal; the electronic device then executes the voice wake-up operation according to the first voice signal, and the process ends.

To detect whether the electronic device is transmitting audio to the playing device when the first voice signal is collected, the electronic device may detect whether it is connected to the playing device at that time. If the electronic device is connected to the playing device when the first voice signal is collected, it is determined that the electronic device is transmitting audio to the playing device when the first voice signal is collected; if the electronic device is not connected to the playing device at that time, it is determined that the electronic device is not transmitting audio to the playing device when the first voice signal is collected. It should be noted that the connection between the electronic device and the playing device described herein includes a direct connection and an indirect connection, where a direct connection means that a connection is established between the electronic device and the playing device themselves, and an indirect connection means that the electronic device and the playing device are located in the same network.

If the electronic device is connected to the playing device but the playing device has paused audio playback when the first voice signal is collected, the connection-based detection described above gives an inaccurate result. Optionally, the electronic device may instead detect whether it is sending data to the playing device when the first voice signal is collected. If it is sending data to the playing device at that time, it is determined that the electronic device is transmitting audio to the playing device when the first voice signal is collected; if it is not sending data to the playing device at that time, it is determined that the electronic device is not transmitting audio to the playing device when the first voice signal is collected.
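The difference between the two detection modes of step 402 can be illustrated with a short sketch. The `LinkState` record and both function names are hypothetical; the disclosure does not define how the connection or data-transfer state is queried on a real device.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Hypothetical snapshot of the link to the playing device at collection time."""
    connected: bool      # a direct or indirect connection exists
    sending_data: bool   # the electronic device is currently sending data


def is_transmitting_by_connection(link: LinkState) -> bool:
    # First detection mode: any existing connection is treated as "transmitting audio".
    return link.connected


def is_transmitting_by_data(link: LinkState) -> bool:
    # Optional detection mode: only an active data transfer is treated as
    # "transmitting audio", which handles the paused-playback case.
    return link.sending_data
```

With playback paused (`connected=True`, `sending_data=False`), the connection-based check reports that audio is being transmitted while the data-based check does not, which is the inaccuracy the optional mode is meant to avoid.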

In step 403, when the electronic device is transmitting audio to the playback device when the first voice signal is captured, it is determined that audio is being played in the environment when the first voice signal is captured.

In step 404, if audio is being played in the environment while the first voice signal is being captured, audio signals of audio clips transmitted to the playing device in the capture period are captured.

In this embodiment, the electronic device may collect an audio signal of an audio transmitted to the playing device by the electronic device in real time, and then intercept the audio signal of the audio clip within the collection time period from the audio signal. The acquisition period referred to herein is a period in which the electronic device acquires the first speech signal.
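Extracting the audio clip that falls within the collection period can be sketched as a simple time-to-sample conversion. The function name, the use of seconds for timestamps, and the assumption that the transmitted audio is available as one contiguous sample list with a known start time are all illustrative choices, not details given in the disclosure.

```python
from typing import List, Sequence


def clip_for_period(
    stream: Sequence[float],
    stream_start_s: float,
    sample_rate_hz: int,
    period_start_s: float,
    period_end_s: float,
) -> List[float]:
    """Return the samples of `stream` that lie inside the collection period."""
    # Convert the period boundaries into sample indices relative to the stream start.
    first = max(0, round((period_start_s - stream_start_s) * sample_rate_hz))
    last = min(len(stream), round((period_end_s - stream_start_s) * sample_rate_hz))
    # An empty list means no transmitted audio overlaps the collection period.
    return list(stream[first:last]) if last > first else []
```

For example, with a stream that starts at 1.0 s and is sampled at 4 Hz, a collection period of 1.5 s to 2.5 s selects samples 2 through 5 of the stream.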

In step 405, the audio signal included in the first speech signal is filtered to obtain a second speech signal.

The electronic device may input the first voice signal and the audio signal into a DSP (Digital Signal Processor), and the DSP processes the first voice signal and the audio signal synchronously.

In one possible processing mode, the DSP takes the audio signal in the first speech signal as noise, and filters the audio signal in the first speech signal through a noise reduction algorithm, so that only the voice spoken by the user remains in the first speech signal, and the processed first speech signal is referred to as a second speech signal.
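The disclosure does not name a specific noise reduction algorithm. As one possible illustration, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common choice for removing a known reference signal from a microphone signal. The function name, tap count, and step size are assumptions made for the example.

```python
from typing import List, Sequence


def nlms_cancel(
    mic: Sequence[float],
    ref: Sequence[float],
    taps: int = 8,
    step: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Remove the reference audio from the microphone signal with an NLMS filter.

    `mic` plays the role of the first voice signal and `ref` the played audio
    signal; the returned residual plays the role of the second voice signal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual: List[float] = []
    for m, r in zip(mic, ref):
        history = [r] + history[:-1]
        # Estimate how the played audio appears at the microphone.
        estimate = sum(w * x for w, x in zip(weights, history))
        error = m - estimate
        # Normalized update: step size scaled by the reference energy.
        norm = sum(x * x for x in history) + eps
        weights = [w + step * error * x / norm for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

Once the filter has adapted to the path between the playing device and the microphone, the residual contains mainly what the reference cannot explain, such as the user's voice. Production systems typically add double-talk handling and longer filters, which this sketch omits.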

In step 406, a voice wakeup operation is performed according to the second voice signal.

Fig. 5 is a flowchart illustrating a voice wake-up method according to another exemplary embodiment, which is applied to an electronic device in a second application scenario, and as shown in fig. 5, the voice wake-up method includes the following steps.

In step 501, a first speech signal within an environment in which an electronic device is located is collected.

The implementation process of step 501 is described in step 401, and is not described herein again.

In step 502, it is detected whether the electronic device receives the audio transmitted by the playing device when the first voice signal is collected.

Since the audio in the second application scenario is sent to the electronic device by the playing device, the electronic device may detect whether it receives audio transmitted by the playing device when the first voice signal is collected. When the electronic device receives audio transmitted by the playing device, it determines that audio is being played in the environment when the first voice signal is collected, that is, the first voice signal includes the audio signal, and step 503 is executed. When the electronic device does not receive audio transmitted by the playing device, it determines that no audio is being played in the environment when the first voice signal is collected, that is, the first voice signal does not include the audio signal; the electronic device then executes the voice wake-up operation according to the first voice signal, and the process ends.

To detect whether the electronic device receives audio transmitted by the playing device when the first voice signal is collected, the electronic device may detect whether it is connected to the playing device at that time. If the electronic device is connected to the playing device when the first voice signal is collected, it is determined that the electronic device receives audio transmitted by the playing device when the first voice signal is collected; if the electronic device is not connected to the playing device at that time, it is determined that the electronic device does not receive audio transmitted by the playing device when the first voice signal is collected. It should be noted that the connection between the electronic device and the playing device described herein includes a direct connection and an indirect connection, where a direct connection means that a connection is established between the electronic device and the playing device themselves, and an indirect connection means that the electronic device and the playing device are located in the same network.

If the electronic device is connected to the playing device but the playing device has paused audio playback when the first voice signal is collected, the connection-based detection described above gives an inaccurate result. Optionally, the electronic device may instead detect whether it receives data sent by the playing device when the first voice signal is collected. If it receives data sent by the playing device at that time, it is determined that the electronic device receives audio transmitted by the playing device when the first voice signal is collected; if it does not receive data sent by the playing device at that time, it is determined that the electronic device does not receive audio transmitted by the playing device when the first voice signal is collected.

In step 503, when the electronic device receives the audio transmitted by the playing device when the first voice signal is captured, it is determined that the audio is being played in the environment when the first voice signal is captured.

In step 504, if the audio is being played in the environment when the first voice signal is collected, the audio signal of the audio clip transmitted to the electronic device by the playing device in the collection period is collected.

In this embodiment, the playing device may transmit an audio signal of an audio to the electronic device in real time, and the electronic device intercepts the audio signal of the audio segment within the acquisition time period from the audio signal. The acquisition period referred to herein is a period in which the electronic device acquires the first speech signal.
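When the audio arrives from the playing device in real time, it is likely to arrive in pieces. The sketch below assumes, purely for illustration, that each received chunk carries its own start time, and assembles the samples that fall inside the collection period. The `Chunk` layout and the function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical chunk layout: (start time in seconds, samples in that chunk).
Chunk = Tuple[float, List[float]]


def assemble_clip(
    chunks: List[Chunk],
    sample_rate_hz: int,
    period_start_s: float,
    period_end_s: float,
) -> List[float]:
    """Collect, in time order, the received samples inside the collection period."""
    clip: List[float] = []
    # Chunks may arrive out of order, so sort them by start time first.
    for start_s, samples in sorted(chunks, key=lambda c: c[0]):
        for i, sample in enumerate(samples):
            t = start_s + i / sample_rate_hz
            # Keep samples in the half-open interval [period_start, period_end).
            if period_start_s <= t < period_end_s:
                clip.append(sample)
    return clip
```

The resulting clip can then be passed, together with the first voice signal, to the noise reduction stage of step 505.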

In step 505, the audio signal included in the first speech signal is filtered to obtain a second speech signal.

The electronic device can input the first voice signal and the audio signal into the DSP, and the first voice signal and the audio signal are synchronously processed through the DSP.

In step 506, a voice wakeup operation is performed according to the second voice signal.

Fig. 6 is a block diagram illustrating a voice wake-up apparatus applied to an electronic device according to an exemplary embodiment, where the voice wake-up apparatus includes: a first acquisition module 610, a second acquisition module 620, a noise reduction module 630, and a wake-up module 640.

The first acquisition module 610 is configured to acquire a first voice signal in an environment where the electronic device is located;

the second acquisition module 620 is configured to acquire an audio signal of an audio clip played during the acquisition period of the first voice signal if audio is being played in the environment when the first voice signal is acquired;

the noise reduction module 630 is configured to perform noise reduction processing on the first voice signal acquired by the first acquisition module 610 according to the audio signal acquired by the second acquisition module 620 to obtain a second voice signal;

the wake-up module 640 is configured to perform a voice wake-up operation according to the second voice signal obtained by the noise reduction module 630.

In summary, the voice wake-up apparatus provided by the present disclosure collects the first voice signal in the environment where the electronic device is located, together with the audio signal of the audio clip played during the collection period of the first voice signal. Noise reduction processing can then be performed on the first voice signal according to the audio signal, that is, the audio signal is filtered out of the first voice signal, so that the resulting second voice signal does not include the audio signal. The voice wake-up operation is then executed according to the second voice signal. This solves the problem that the audio signal interferes with the voice wake-up operation and lowers its success rate, thereby improving the success rate of the voice wake-up operation.

Fig. 7 is a block diagram illustrating a voice wake-up apparatus applied to an electronic device according to an exemplary embodiment, where the voice wake-up apparatus includes, as shown in fig. 7: a first acquisition module 710, a second acquisition module 720, a noise reduction module 730, and a wake-up module 740.

The first acquisition module 710 is configured to acquire a first voice signal in an environment where the electronic device is located;

the second acquisition module 720 is configured to acquire an audio signal of an audio clip played during the acquisition period of the first voice signal if audio is being played in the environment when the first voice signal is acquired;

the noise reduction module 730 is configured to perform noise reduction processing on the first voice signal acquired by the first acquisition module 710 according to the audio signal acquired by the second acquisition module 720 to obtain a second voice signal;

the wake-up module 740 is configured to perform a voice wake-up operation according to the second voice signal obtained by the noise reduction module 730.

In an exemplary embodiment of the present disclosure, the apparatus further includes: a first detection module 750 and a first determination module 760;

the first detection module 750 is configured to detect whether the electronic device is transmitting audio to the playing device when the first voice signal is acquired;

the first determination module 760 is configured to determine that audio is being played in the environment when the first voice signal is acquired, when the first detection module 750 determines that the electronic device is transmitting audio to the playing device when the first voice signal is acquired.

In an exemplary embodiment of the disclosure, the second acquisition module 720 is further configured to: when the electronic equipment transmits audio to the playing equipment when the first voice signal is collected, collecting the audio signal of the audio clip transmitted to the playing equipment in the collection time period.

In an exemplary embodiment of the present disclosure, the apparatus further includes: a second detection module 770 and a second determination module 780;

the second detection module 770 is configured to detect whether the electronic device receives the audio transmitted by the playing device when the first voice signal is acquired;

the second determination module 780 is configured to determine that audio is being played in the environment when the first voice signal is acquired, when the second detection module 770 determines that the electronic device receives the audio transmitted by the playing device when the first voice signal is acquired.

In an exemplary embodiment of the disclosure, the second acquisition module 720 is further configured to: when the electronic equipment receives the audio transmitted by the playing equipment when the first voice signal is collected, the audio signal of the audio clip transmitted to the electronic equipment by the playing equipment in the collection time period is collected.

In an exemplary embodiment of the present disclosure, the noise reduction module 730 is further configured to: filter out the audio signal contained in the first voice signal to obtain the second voice signal.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An exemplary embodiment of the present disclosure provides a voice wake-up apparatus, which can implement the voice wake-up method provided by the present disclosure, and the voice wake-up apparatus includes: a processor, a memory for storing processor-executable instructions;

wherein the processor is configured to:

perform noise reduction processing on the first voice signal according to the audio signal to obtain a second voice signal;

and perform a voice wake-up operation according to the second voice signal.

Fig. 8 is a block diagram illustrating an apparatus 800 for voice wake-up in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to Fig. 8, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or some of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support the operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

The power component 806 provides power to the various components of the device 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front-facing camera and rear-facing camera may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor component 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, where the instructions are executable by the processor 820 of the device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer-readable storage medium is provided, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the above voice wake-up method.

An exemplary embodiment of the present disclosure provides a computer-readable storage medium having stored therein at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by a processor to implement the voice wake-up method described above.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A voice wake-up method, the method comprising:

detecting whether the electronic device is connected to a playing device when the first voice signal is collected;

in response to the electronic device being connected to the playing device, detecting whether the electronic device receives audio transmitted by the playing device when the first voice signal is collected, wherein the detecting comprises detecting whether the electronic device receives data transmitted by the playing device when the first voice signal is collected, and, if the electronic device receives the data transmitted by the playing device when the first voice signal is collected, determining that the electronic device receives the audio transmitted by the playing device when the first voice signal is collected;

when the electronic device receives the audio transmitted by the playing device when the first voice signal is collected, determining that the audio is being played in the environment when the first voice signal is collected;

collecting the audio signal of the audio clip transmitted to the electronic device by the playing device during the collection period of the first voice signal, wherein the playing device is configured to transmit the audio signal of the audio to the electronic device, and the electronic device intercepts the audio signal of the audio clip within the collection period from the audio signal of the audio;

filtering out the audio signal of the audio clip contained in the first voice signal to obtain a second voice signal, wherein the first voice signal and the audio signal of the audio clip are input into a DSP, and the DSP is configured to process the first voice signal and the audio signal of the audio clip synchronously;

and performing a voice wake-up operation according to the second voice signal.

2. The method of claim 1, further comprising:

detecting whether the electronic device is transmitting the audio to a playing device when the first voice signal is collected;

when the electronic device is transmitting the audio to the playing device when the first voice signal is collected, determining that the audio is being played in the environment when the first voice signal is collected.

3. The method of claim 2, wherein collecting the audio signal of the audio clip played during the collection period of the first voice signal comprises:

when the electronic device is transmitting the audio to the playing device when the first voice signal is collected, collecting the audio signal of the audio clip transmitted to the playing device during the collection period.

4. A voice wake-up apparatus, the apparatus comprising:

a second detection module configured to detect whether the electronic device is connected to a playing device when the first voice signal is collected, and, in response to the electronic device being connected to the playing device, to detect whether the electronic device receives audio transmitted by the playing device when the first voice signal is collected, wherein the detecting comprises detecting whether the electronic device receives data transmitted by the playing device when the first voice signal is collected, and, if the electronic device receives the data transmitted by the playing device when the first voice signal is collected, determining that the electronic device receives the audio transmitted by the playing device when the first voice signal is collected;

a second determination module configured to determine that the audio is being played in the environment when the first voice signal is collected, when the second detection module determines that the electronic device receives the audio transmitted by the playing device when the first voice signal is collected;

a second acquisition module configured to collect the audio signal of the audio clip transmitted to the electronic device by the playing device during the collection period of the first voice signal, wherein the playing device is configured to transmit the audio signal of the audio to the electronic device, and the electronic device intercepts the audio signal of the audio clip within the collection period from the audio signal of the audio;

a noise reduction module configured to filter out the audio signal contained in the first voice signal to obtain a second voice signal, wherein the first voice signal and the audio signal of the audio clip are input into a DSP, and the DSP is configured to process the first voice signal and the audio signal of the audio clip synchronously;

5. The apparatus of claim 4, further comprising:

a first detection module configured to detect whether the electronic device is transmitting the audio to a playing device when the first voice signal is collected;

a first determination module configured to determine that the audio is being played in the environment when the first voice signal is collected, when the first detection module determines that the electronic device is transmitting the audio to the playing device when the first voice signal is collected.

6. The apparatus of claim 5, wherein the second acquisition module is further configured to:

7. A voice wake-up apparatus, the apparatus comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

collect the audio signal of the audio clip transmitted to the electronic device by the playing device during the collection period of the first voice signal, wherein the playing device is configured to transmit the audio signal of the audio to the electronic device, and the electronic device intercepts the audio signal of the audio clip within the collection period from the audio signal of the audio;

and perform a voice wake-up operation according to the second voice signal.

8. A computer-readable storage medium having stored therein at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the voice wake-up method according to any one of claims 1 to 3.