CN114694661A - First terminal device, second terminal device and voice awakening method - Google Patents

Publication number
CN114694661A
Authority
CN
China
Prior art keywords
signal
terminal device
recognized
voice
awakening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210187574.9A
Other languages
Chinese (zh)
Inventor
杨香斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd
Priority to CN202210187574.9A
Publication of CN114694661A
Priority to PCT/CN2022/142800 (WO2023155607A1)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions

Abstract

The disclosure relates to a first terminal device, a second terminal device and a voice wake-up method, and in particular to the technical field of voice interaction of terminal devices. The first terminal device includes: a first communication module configured to receive a voice signal to be recognized sent by the second terminal device; and a first processing module configured to collect an audio reference signal in response to the voice signal to be recognized, and to remove the audio reference signal from the voice signal to be recognized to obtain a target recognition signal. The first communication module is further configured to send a notification signal to the second terminal device. The embodiments of the disclosure can eliminate the influence of other terminal devices while a terminal device is being woken up, and reduce the false wake-up rate.

Description

First terminal device, second terminal device and voice awakening method
Technical Field
The disclosure relates to the technical field of voice interaction of terminal devices, and in particular to a first terminal device, a second terminal device and a voice awakening method.
Background
With the rapid development of voice recognition technology, voice interaction is used in more and more scenarios. A terminal device that supports voice interaction is generally in a standby state, so a user who wants to interact with it must first wake it up. In the prior art, this is mainly done by a wake-up algorithm that recognizes a wake-up word in the voice input. However, the terminal device may be woken up by mistake when noise is input, which inconveniences the user.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, the present disclosure provides a first terminal device, a second terminal device, and a voice wake-up method, which can eliminate the influence caused by other terminal devices in the process of waking up the terminal devices, and reduce the false wake-up rate.
In order to achieve the above object, the embodiments of the present disclosure provide the following technical solutions:
In a first aspect, a first terminal device is provided, which includes:
a first communicator configured to: receive a voice signal to be recognized sent by a second terminal device, wherein the first terminal device is connected with the second terminal device by short-distance wireless communication;
a first controller configured to: collect an audio reference signal in response to the voice signal to be recognized; and remove the audio reference signal from the voice signal to be recognized to obtain a target recognition signal;
the first communicator being further configured to: send a notification signal to the second terminal device, wherein the notification signal is the target recognition signal, or the notification signal is used for instructing the second terminal device to enter an awakening state.
As an optional implementation of the embodiment of the present disclosure, the first controller is specifically configured to: calculate the time delay between the voice signal to be recognized and the audio reference signal; if the time delay is greater than a preset time delay threshold, correct the audio reference signal based on the voice signal to be recognized to obtain an audio reference signal synchronized with the voice signal to be recognized; and remove the synchronized audio reference signal from the voice signal to be recognized to obtain the target recognition signal.
As an optional implementation of the embodiment of the present disclosure, the first controller is specifically configured to: remove the audio reference signal from the voice signal to be recognized to obtain the target recognition signal; recognize the target recognition signal based on an awakening word model to obtain a plurality of keywords included in the target recognition signal; judge whether the plurality of keywords include a target awakening word; and, if the plurality of keywords include the target awakening word, generate a notification signal instructing the second terminal device to enter the awakening state.
As an optional implementation of the embodiment of the present disclosure, the awakening word model includes a plurality of preset awakening words, and the first controller is specifically configured to: calculate the similarity between the keyword and each preset awakening word in the awakening word model; compute a weighted sum of the similarities to obtain a total similarity; and determine the keyword to be the target awakening word if the total similarity reaches a preset similarity threshold.
In a second aspect, a second terminal device is provided, which is equipped with a microphone array and includes:
a second communicator configured to: send a voice signal to be recognized to a first terminal device; and receive a target recognition signal fed back by the first terminal device;
a second controller configured to: in response to the target recognition signal, recognize the target recognition signal based on an awakening word model to obtain a plurality of keywords included in the target recognition signal; judge whether the plurality of keywords include a target awakening word; and control the second terminal device to enter an awakening state if the plurality of keywords include the target awakening word.
In a third aspect, a second terminal device is provided, which is equipped with a microphone array and includes:
a second communicator configured to: send a voice signal to be recognized to a first terminal device; and receive a notification signal fed back by the first terminal device, wherein the notification signal is used for instructing the second terminal device to enter an awakening state;
a second controller configured to: control the second terminal device to enter the awakening state in response to the notification signal.
As an optional implementation of the embodiment of the present disclosure, the second communicator is further configured to: receive a voice signal to be recognized input by a user;
the second controller is further configured to: perform voice recognition in response to the voice signal to be recognized; control the second terminal device to enter a to-be-awakened state if the voice signal to be recognized includes a preset awakening word; and send the voice signal to be recognized to the first terminal device in the to-be-awakened state.
In a fourth aspect, a voice wake-up method is provided, the method comprising: receiving a voice signal to be recognized sent by a second terminal device, wherein the first terminal device is connected with the second terminal device by short-distance wireless communication; collecting an audio reference signal in response to the voice signal to be recognized; removing the audio reference signal from the voice signal to be recognized to obtain a target recognition signal; and sending a notification signal to the second terminal device, wherein the notification signal is the target recognition signal, or the notification signal is used for instructing the second terminal device to enter an awakening state.
As an optional implementation manner of the embodiment of the present disclosure, removing an audio reference signal from a speech signal to be recognized to obtain a target recognition signal includes: calculating the time delay between the voice signal to be recognized and the audio reference signal; if the time delay is larger than a preset time delay threshold value, correcting the audio reference signal based on the voice signal to be recognized to obtain an audio reference signal synchronous with the voice signal to be recognized; and removing the synchronized audio reference signal from the voice signal to be recognized to obtain a target recognition signal.
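The delay check and synchronization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the disclosure: the delay is estimated with a brute-force cross-correlation, the "correction" is a plain sample shift, and the removal step is a direct subtraction. The function names, the `max_lag` search window and the default threshold are all hypothetical.

```python
from typing import List, Sequence


def estimate_delay(mic: Sequence[float], ref: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which ref best correlates with mic.

    A positive lag means the reference shows up `lag` samples later in mic.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(mic):
                score += mic[m] * r
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_reference(ref: Sequence[float], lag: int, length: int) -> List[float]:
    """Shift ref by `lag` samples, zero-padding or trimming to `length`."""
    out = [0.0] * length
    for n, r in enumerate(ref):
        m = n + lag
        if 0 <= m < length:
            out[m] = r
    return out


def remove_reference(
    mic: Sequence[float],
    ref: Sequence[float],
    delay_threshold: int = 0,  # hypothetical preset delay threshold, in samples
    max_lag: int = 16,         # hypothetical search window
) -> List[float]:
    """Synchronize ref to mic if the delay exceeds the threshold, then subtract it."""
    lag = estimate_delay(mic, ref, max_lag)
    # Only correct the reference when the delay is larger than the threshold.
    shift = lag if abs(lag) > delay_threshold else 0
    synced = align_reference(ref, shift, len(mic))
    # Assumption: plain subtraction stands in for the removal step.
    return [m - r for m, r in zip(mic, synced)]
```

In practice the removal step would more likely be an adaptive echo canceller than a direct subtraction, since the playback reaches the microphone through an unknown acoustic path.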
As an optional implementation manner of the embodiment of the present disclosure, after removing the audio reference signal from the speech signal to be recognized to obtain the target recognition signal and before sending the notification signal to the second terminal device, the method includes: removing the audio reference signal from the voice signal to be recognized to obtain a target recognition signal; identifying the target identification signal based on the awakening word model to obtain a plurality of keywords included in the target identification signal; judging whether the plurality of keywords comprise target awakening words or not; and generating a notification signal to indicate the second terminal equipment to enter the awakening state under the condition that the target awakening word is included in the plurality of keywords.
As an optional implementation manner of the embodiment of the present disclosure, the method for determining whether the target wake-up word is included in the plurality of keywords includes: respectively calculating the similarity between each preset awakening word and the keyword in the awakening word model; weighting and summing the multiple similarities to obtain a total similarity; and determining the keyword as a target awakening word under the condition that the total similarity reaches a preset similarity threshold value.
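The weighted-similarity check described above can be sketched as follows. The disclosure does not say how similarity is measured or how the weights are chosen, so this sketch assumes a string similarity from Python's `difflib.SequenceMatcher` and caller-supplied weights; the preset words, weights and threshold in the usage note are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict


def total_similarity(keyword: str, preset_weights: Dict[str, float]) -> float:
    """Weighted sum of the similarity between keyword and each preset wake word.

    preset_weights maps each preset wake word to its weight (assumed to sum to 1).
    """
    return sum(
        weight * SequenceMatcher(None, preset, keyword).ratio()
        for preset, weight in preset_weights.items()
    )


def is_target_wake_word(
    keyword: str, preset_weights: Dict[str, float], threshold: float
) -> bool:
    """True if the total similarity reaches the preset similarity threshold."""
    return total_similarity(keyword, preset_weights) >= threshold
```

For example, with hypothetical presets `{"hello tv": 0.5, "hi tv": 0.5}` and a threshold of 0.5, the keyword `"hello tv"` is accepted, while a keyword that shares no characters with either preset scores 0.0 and is rejected.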
In a fifth aspect, a voice wake-up method is provided, which includes: sending a voice signal to be recognized to first terminal equipment; receiving a target identification signal fed back by first terminal equipment; responding to the target identification signal, and identifying the target identification signal based on the awakening word model to obtain a plurality of keywords included in the target identification signal; judging whether the plurality of keywords comprise target awakening words or not; and controlling to enter an awakening state under the condition that the target awakening word is included in the plurality of keywords.
In a sixth aspect, a voice wake-up method is provided, which includes: sending a voice signal to be recognized to first terminal equipment; receiving a notification signal fed back by the first terminal device, wherein the notification signal is used for indicating the second terminal device to enter an awakening state; in response to the notification signal, control enters an awake state.
As an optional implementation manner of the embodiment of the present disclosure, sending a to-be-recognized voice signal to a first terminal device includes: receiving a voice signal to be recognized input by a user; responding to a voice signal to be recognized, and performing voice recognition; controlling the second terminal equipment to enter a state to be awakened under the condition that the voice signal to be recognized comprises a preset awakening word; and sending a voice signal to be recognized to the first terminal equipment in the state of waiting to be awakened.
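The state handling on the second terminal device can be sketched as a small state machine: standby, to-be-awakened after a local wake-word hit, and awake after the first terminal device confirms. This is only an illustration; the class and callback names are hypothetical, signals are modeled as strings for brevity, and falling back to standby on a negative confirmation is an assumption the disclosure does not state.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    STANDBY = "standby"
    TO_BE_AWAKENED = "to_be_awakened"
    AWAKE = "awake"


class SecondDevice:
    """Hypothetical model of the second terminal device's wake-up states."""

    def __init__(
        self,
        local_detector: Callable[[str], bool],  # first-pass wake-word check
        send_to_first: Callable[[str], None],   # forwards the signal to the first device
    ) -> None:
        self.state = State.STANDBY
        self._local_detector = local_detector
        self._send_to_first = send_to_first

    def on_voice(self, signal: str) -> None:
        """Handle a voice signal to be recognized that was input by the user."""
        if self.state is State.STANDBY and self._local_detector(signal):
            self.state = State.TO_BE_AWAKENED
            # The signal is forwarded only in the to-be-awakened state.
            self._send_to_first(signal)

    def on_notification(self, wake: bool) -> None:
        """Handle the notification fed back by the first terminal device."""
        if self.state is not State.TO_BE_AWAKENED:
            return
        # Assumption: a negative result returns the device to standby.
        self.state = State.AWAKE if wake else State.STANDBY
```

The first-pass check keeps the wireless link idle for ordinary speech, so only candidate wake-ups are sent to the first terminal device for the second check.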
In a seventh aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the voice wake-up method according to the fourth aspect or any one of its optional implementations, the voice wake-up method according to the fifth aspect, or the voice wake-up method according to the sixth aspect or any one of its optional implementations.
In an eighth aspect, a computer program product is provided which, when run on a computer, causes the computer to implement the voice wake-up method according to the fourth aspect or any one of its optional implementations, the voice wake-up method according to the fifth aspect, or the voice wake-up method according to the sixth aspect or any one of its optional implementations.
Compared with the prior art, the technical solutions provided by the embodiments of the disclosure have the following advantages. The embodiments of the disclosure provide a first terminal device, a second terminal device and a voice wake-up method for the case where the first terminal device and the second terminal device are connected by short-distance wireless communication and a user needs to control the second terminal device by voice while noise from the first terminal device interferes. In the first terminal device, the communicator receives the voice signal to be recognized sent by the second terminal device. The controller, in response to the voice signal to be recognized, collects the local audio reference signal of the first terminal device and removes it from the voice signal to be recognized to obtain a target recognition signal. The communicator then sends a notification signal to the second terminal device, either so that the second terminal device performs the corresponding operation using the target recognition signal, or to notify the second terminal device to perform the wake-up operation. In this way, interference from noise generated by other devices is eliminated, the wake-up rate is improved, and the false wake-up rate is reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1A is a schematic view of a first implementation scenario of a voice wake-up method according to an embodiment of the present disclosure;
Fig. 1B is a schematic view of an implementation scenario of a voice wake-up method according to an embodiment of the present disclosure;
Fig. 2A is a block diagram of a hardware configuration of a first terminal device according to an embodiment of the disclosure;
Fig. 2B is a block diagram of a configuration of a second terminal device according to an embodiment of the disclosure;
Fig. 3 is a schematic diagram of the software configuration in the first terminal device or the second terminal device according to an embodiment of the disclosure;
Fig. 4 is a first flowchart of a voice wake-up method according to an embodiment of the disclosure;
Fig. 5 is a schematic flowchart of a voice wake-up method according to an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of obtaining a target recognition signal according to an embodiment of the disclosure;
Fig. 7 is a schematic flowchart of a secondary wake-up check according to an embodiment of the present disclosure;
Fig. 8 is a flowchart of another voice wake-up method according to an embodiment of the disclosure;
Fig. 9 is a flowchart of another voice wake-up method according to an embodiment of the disclosure;
Fig. 10 is a first architecture diagram of a first terminal device and a second terminal device according to an embodiment of the disclosure;
Fig. 11 is a second architecture diagram of a first terminal device and a second terminal device according to an embodiment of the disclosure.
Detailed Description
To make the objects, embodiments and advantages of the present application clearer, the exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described exemplary embodiments are only a part of the embodiments of the present application, not all of them.
All other embodiments that a person skilled in the art can derive from the exemplary embodiments described herein without inventive effort fall within the scope of the appended claims. In addition, while the disclosure is presented in terms of one or more exemplary examples, each aspect of the disclosure may also separately constitute a complete embodiment. The brief descriptions of terms in the present application are only for ease of understanding the embodiments described below and are not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be understood in their ordinary and customary meaning.
In a smart home scenario, voice interaction technology is used to control multiple terminal devices in order to meet users' diverse needs. When a user controls one terminal device by voice, audio being played by other terminal devices may interfere with that device's recognition of the user's voice. For example, the user tries to wake up a smart speaker by saying "love classmates" while a television drama playing on the television mentions "white classmates". The smart speaker receives both pieces of audio, of which "white classmates" is noise, and has difficulty correctly recognizing the user's voice signal and performing the wake-up operation. Noise generated by other terminal devices thus interferes with voice recognition and reduces the wake-up rate.
In order to solve the above problem, embodiments of the present disclosure provide a first terminal device, a second terminal device and a voice wake-up method for the case where the first terminal device and the second terminal device are connected by short-distance wireless communication and the user needs to control the second terminal device by voice while noise from the first terminal device interferes. The first terminal device receives, through its communication module, the voice signal to be recognized sent by the second terminal device. Its processing module then, in response to the voice signal to be recognized, collects the local audio reference signal of the first terminal device and removes it from the voice signal to be recognized to obtain a target recognition signal. A notification signal is then sent to the second terminal device, either so that the second terminal device performs the corresponding operation using the target recognition signal, or to notify the second terminal device to perform the wake-up operation. With the first terminal device, interference from noise generated by other devices is eliminated, the wake-up rate is improved, and the false wake-up rate is reduced.
To more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the following briefly introduces terms used in the description of the embodiments or the prior art:
Acoustic Echo Cancellation (AEC): an adaptive filtering algorithm adjusts the weight vector of a filter so that the estimated echo path approximates the true echo path. This yields an estimated echo signal, which is removed from the mixture of clean speech and echo to cancel the echo.
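As an illustration of this principle, the sketch below uses a normalized least-mean-squares (NLMS) update, one common adaptive filtering algorithm. The disclosure does not name a specific algorithm, so NLMS, the function name, and the `taps`, `mu` and `eps` parameters are assumptions made for the example.

```python
from typing import List, Sequence


def nlms_echo_cancel(
    mic: Sequence[float],   # mixture of speech and echo picked up by the microphone
    ref: Sequence[float],   # reference signal that was played back
    taps: int = 4,          # assumed length of the echo-path estimate
    mu: float = 0.5,        # assumed step size (0 < mu < 2)
    eps: float = 1e-8,      # guards against division by zero
) -> List[float]:
    """Return the mixture with the estimated echo removed, sample by sample."""
    w = [0.0] * taps  # filter weight vector: the estimated echo path
    x = [0.0] * taps  # most recent reference samples, newest first
    out: List[float] = []
    for d, r in zip(mic, ref):
        x = [r] + x[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_estimate  # residual: speech plus remaining echo
        norm = sum(xi * xi for xi in x) + eps
        # Move the weights toward the true echo path in proportion to the error.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out
```

When the microphone signal contains only echo and the true echo path fits within `taps` samples, the residual shrinks toward zero as the weights converge; with speech present, the residual is the speech plus whatever echo the filter has not yet modeled.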
In some embodiments, terminal devices include, but are not limited to, desktop devices, handheld Personal Computers (PCs), personal digital assistants, embedded processors, digital signal processors, graphics devices, video game devices, set-top boxes, microcontrollers, cellular telephones, portable media players, smart home devices, handheld devices, wearable devices, virtual reality and/or augmented reality devices, Internet of Things devices, in-vehicle infotainment devices, streaming media client devices, e-book reading devices, Point of Sale (POS) devices, electric vehicle control systems, and various other electronic devices.
As shown in fig. 1A, fig. 1A is a schematic view of a first implementation scenario of a voice wakeup method according to an embodiment of the present disclosure, where fig. 1A includes a first terminal device 200, a second terminal device 100, an intelligent device 300, and a server 400.
In the implementation scenario shown in Fig. 1A, a user wants to control the second terminal device 100 by voice while the first terminal device 200 is in a working state and playing audio. The second terminal device 100 therefore receives not only the user's voice but also the audio of the first terminal device 200, and the resulting voice signal to be recognized affects both the wake-up rate and the false wake-up rate of the second terminal device 100. In some embodiments of the present disclosure, the second terminal device 100 transmits the received voice signal to be recognized to the first terminal device 200, which processes it. The first terminal device 200 first receives the voice signal to be recognized, then acquires the audio reference signal generated locally and removes it from the voice signal to be recognized to obtain a target recognition signal. It then sends the target recognition signal to the second terminal device 100, or sends the second terminal device 100 a notification signal generated after the target recognition signal has been verified. This improves the wake-up rate of voice interaction and reduces the false wake-up rate.
In some embodiments, the user may operate the first terminal device 200 through the smart device 300 or the second terminal device 100, and the first terminal device 200 performs data communication with the server 400.
In some embodiments, the second terminal device 100 may be a remote controller, and the communication between the remote controller and the terminal device includes infrared protocol communication or bluetooth protocol communication, and other short-distance communication methods, and the first terminal device 200 is controlled by wireless or wired methods. The user may input a user instruction through a key on a remote controller, voice input, control panel input, etc., to control the first terminal apparatus 200.
In some embodiments, the smart device 300 (e.g., mobile terminal, tablet, computer, notebook, etc.) may also be used to control the first terminal device 200. For example, the first terminal device 200 is controlled using an application program running on the smart device.
In some embodiments, the first terminal device 200 may also receive the user's control through touch or gesture, etc., instead of receiving the instruction using the smart device or the control device described above.
In some embodiments, the first terminal device 200 may also be controlled in a manner other than the second terminal device 100 and the smart device 300, for example, the voice instruction control of the user may be directly received by a module configured inside the first terminal device 200 to obtain a voice instruction, or may be received by a voice control device provided outside the first terminal device 200.
In some embodiments, the first terminal device 200 may establish a communication connection through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks. The server 400 can provide various contents and interactions to the first terminal device 200. The server 400 may be one cluster or a plurality of clusters, may include one or more types of servers, or may be a cloud server. The above is merely an example, and this embodiment is not limited thereto.
Fig. 1B is a schematic view of an implementation scenario of the voice wake-up method provided by the embodiment of the present disclosure, which includes a smart sound box 201 and a smart television 202. In the scenario shown in Fig. 1B, the user wants to wake up the smart sound box 201 by saying "little-a classmate" while the smart television 202 is playing a television series and "little-white classmate" comes out of its speaker. The voice signal to be recognized received by the smart sound box 201 therefore contains both "little-a classmate" and "little-white classmate". Because the smart television 202 caches the signal data corresponding to "little-white classmate", it can remove that content from the voice signal to be recognized. The smart sound box 201 therefore sends the voice signal to be recognized to the smart television 202. The smart television 202 acquires the signal data corresponding to "little-white classmate" and removes it from the voice signal to be recognized, which eliminates the noise generated by the smart television 202 and yields the target recognition signal "little-a classmate". The smart television 202 then either sends the target recognition signal to the smart sound box 201, so that the smart sound box 201 performs recognition and enters the awakening state, or processes the target recognition signal itself and sends the resulting notification signal to the smart sound box 201 to instruct it to enter the awakening state. The smart sound box 201 and the smart television 202 thus cooperate to improve the wake-up rate and reduce the false wake-up rate.
Fig. 2A is a block diagram of a hardware configuration of a first terminal device according to an embodiment of the present disclosure. The first terminal device 200 shown in Fig. 2A includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface 280. The controller includes a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and first to nth interfaces for input/output. The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen. The tuner demodulator 210 receives broadcast television signals in a wired or wireless manner, and demodulates audio/video signals, such as EPG data signals, from a plurality of wireless or wired broadcast television signals. The detector 230 is used to collect signals from the external environment or from interaction with the outside. The controller 250 and the tuner demodulator 210 may be located in separate devices; that is, the tuner demodulator 210 may also be located in a device external to the main device where the controller 250 is located, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the first terminal apparatus 200. The User may input a User command through a Graphical User Interface (GUI) displayed on the display 260, and the User input Interface receives the User input command through the GUI. Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A common presentation form of a user interface is a graphical user interface, which refers to a user interface displayed in a graphical manner and related to computer operations. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc. visual interface elements.
Fig. 2B is a configuration block diagram of a second terminal device according to the embodiment of the disclosure. As shown in fig. 2B, the second terminal device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply source. The second terminal device 100 may receive an input operation instruction of the user and convert the operation instruction into an instruction recognizable and responsive to the first terminal device 200, serving as an interaction intermediary between the user and the first terminal device 200.
In some embodiments, the controller includes at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), a Random Access Memory (RAM), a Read-Only Memory (ROM), first to nth interfaces for input/output, a Digital Signal Processor (DSP), a communication bus (Bus), and the like.
The CPU is used for executing operating system and application program instructions stored in the memory, and for executing various application programs, data and contents according to interactive instructions received from external input, so as to finally display and play various audio-video contents. The CPU may include a plurality of processors, e.g., a main processor and one or more sub-processors.
The embodiment of the present disclosure provides a first terminal device, where the first terminal device includes:
a first communicator configured to: receive a voice signal to be recognized sent by a second terminal device, wherein the first terminal device is connected with the second terminal device in a short-distance wireless communication mode;
a first controller configured to: collect an audio reference signal in response to the voice signal to be recognized; and remove the audio reference signal from the voice signal to be recognized to obtain a target recognition signal;
the first communicator being further configured to: send a notification signal to the second terminal device, wherein the notification signal is the target recognition signal, or the notification signal is used for instructing the second terminal device to enter a wake-up state.
The first terminal device and the second terminal device are connected in a short-distance wireless communication mode. The first terminal device receives, through the communicator, the voice signal to be recognized sent by the second terminal device. The controller then, in response to the voice signal to be recognized, collects the audio reference signal generated by the first terminal device itself and removes it from the voice signal to be recognized to obtain the target recognition signal. The communicator then sends the target recognition signal to the second terminal device, or sends the second terminal device a notification signal obtained by processing the target recognition signal. In this way, in a smart home scenario, the first terminal device eliminates the influence of its own audio signal on the voice received by the second terminal device, which improves the signal-to-noise ratio of the voice signal to be recognized, raises the voice-interaction wake-up rate of the terminal device, and lowers its false wake-up rate.
The embodiment of the present disclosure provides a second terminal device, the second terminal device is provided with a microphone array, and the second terminal device includes:
a second communicator configured to: send a voice signal to be recognized to the first terminal device; and receive a target recognition signal fed back by the first terminal device;
a second controller configured to: in response to the target recognition signal, recognize the target recognition signal based on a wake-up word model to obtain a plurality of keywords included in the target recognition signal; judge whether the plurality of keywords include a target wake-up word; and control the second terminal device to enter a wake-up state in the case that the plurality of keywords include the target wake-up word.
The second terminal equipment is connected with the first terminal equipment in a short-distance wireless communication mode. The second terminal device sends the voice signal to be recognized to the first terminal device through the communicator, and then receives the target recognition signal fed back by the first terminal device, wherein the target recognition signal eliminates noise mixed in the voice signal to be recognized, and particularly eliminates an audio reference signal generated by the first terminal device. Further, the second terminal device responds to the target identification signal through the controller, identifies the target identification signal based on the awakening word model to obtain a plurality of keywords included in the target identification signal, judges whether the keywords include the target awakening word, and determines to execute awakening operation if the keywords include the target awakening word, so that the second terminal device enters an awakening state. The method realizes the optimization of the awakening rate and the false awakening rate at the same time.
The embodiment of the present disclosure provides another second terminal device, the second terminal device is provided with a microphone array, and the second terminal device includes:
a second communicator configured to: send a voice signal to be recognized to the first terminal device; and receive a notification signal fed back by the first terminal device, wherein the notification signal is used for instructing the second terminal device to enter a wake-up state;
a second controller configured to: perform a wake-up operation in response to the notification signal to enter the wake-up state.
The second terminal equipment is connected with the first terminal equipment in a short-distance wireless communication mode. The second terminal device sends a voice signal to be recognized to the first terminal device through the communicator, then receives a notification signal fed back by the first terminal device, the notification signal indicates that the second terminal device enters the awakening state, and the controller responds to the notification signal and executes the awakening operation to enable the second terminal device to enter the awakening state. The method and the device have the advantages that in a system formed by the first terminal device and the second terminal device, power consumption is reduced, and awakening performance is improved.
Fig. 3 is a schematic diagram of the software configuration in the first terminal device or the second terminal device according to the embodiment of the disclosure. As shown in fig. 3, the system is divided into four layers, which are, from top to bottom, an application (Applications) layer ("application layer" for short), an application framework (Application Framework) layer ("framework layer" for short), an Android runtime and system library layer ("system runtime library layer" for short), and a kernel layer. The kernel layer comprises at least one of the following drivers: an audio driver, a display driver, a Bluetooth driver, a camera driver, a Wi-Fi driver, a USB driver, an HDMI driver, sensor drivers (e.g., for a fingerprint sensor, a temperature sensor, a pressure sensor, etc.), a power driver, and the like.
In summary, the embodiments of the present disclosure provide a first terminal device, a second terminal device, and a voice wake-up method. In the case that the first terminal device and the second terminal device are connected in a short-distance wireless communication mode, and a user needs to control the second terminal device by voice while the first terminal device causes noise interference, the first terminal device operates as follows: the communicator receives the voice signal to be recognized sent by the second terminal device; the controller, in response to the voice signal to be recognized, collects the audio reference signal of the first terminal device itself and removes it from the voice signal to be recognized to obtain a target recognition signal; and a notification signal is then sent to the second terminal device, so that the second terminal device is notified either to perform the corresponding operation using the target recognition signal or to perform the wake-up operation. In this way, the interference of noise generated by other devices is eliminated, the wake-up rate is improved, and the false wake-up rate is reduced.
As shown in fig. 4, fig. 4 is a first schematic flowchart of a voice wake-up method according to an embodiment of the present disclosure, where the method includes:
S401, sending a voice signal to be recognized to the first terminal device.
The second terminal device is the terminal device that the user expects to perform the operation indicated by voice, and the first terminal device is a terminal device that is in the same environment as the second terminal device and plays multimedia data such as audio or video. The second terminal device is provided with a microphone array, and the first terminal device and the second terminal device are connected in a short-distance wireless communication mode.
The short-range wireless communication mode may include, but is not limited to: a wired network, a wireless network, wherein the wired network comprises a local area network; the wireless network includes: bluetooth (Bluetooth), wireless local area network 802.11(Wi-Fi), Infrared Data Association (IrDA), and other networks that implement wireless communications.
The microphone array is formed by arranging a group of microphone sensors on the second terminal device in a certain pattern so as to receive sound signals in space. After a certain amount of processing, characteristic information of the received signal, such as amplitude, frequency, and direction, can be extracted. According to the spatial distribution of the microphone sensors, the microphone array may have the following topologies: linear arrays, circular arrays, spherical arrays, and so on; more specifically, straight-line, cross-shaped, planar, spiral, spherical, and random arrays. The number of array elements, that is, the number of microphones, may be 4, for example, and is not limited by the present disclosure.
In some embodiments, the second terminal device receives a to-be-recognized voice signal input by a user through the microphone array, where the to-be-recognized voice signal is a voice signal obtained by directly obtaining a voice of the user and performing analog-to-digital conversion.
After receiving the voice signal to be recognized, the second terminal device performs voice recognition in response to it to obtain a plurality of keywords included in the voice signal to be recognized. It then judges whether the plurality of keywords include a preset wake-up word. When they do, the second terminal device is determined to enter a to-be-woken state, in which a secondary wake-up check is needed to finally determine whether the second terminal device enters the wake-up state, so that the wake-up rate is guaranteed. The to-be-woken state is a state between the standby state and the wake-up state.
Exemplarily, the television is the first terminal device, the smart speaker is the second terminal device, the smart speaker is connected with the television through Bluetooth, and the preset wake-up word is "little A classmate". The user wakes up the smart speaker by saying "little A classmate". After receiving the user's voice, the smart speaker performs voice recognition to obtain a plurality of keywords, determines that the keywords include the preset wake-up word "little A classmate", and determines that the smart speaker enters the to-be-woken state.
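The first check on the second terminal device can be pictured as a small state transition. The following is only a minimal sketch under stated assumptions: the names `DeviceState` and `first_wake_check` are hypothetical, the keyword list stands in for the output of a speech recognizer that is not shown, and the wake-up word string is the one used in the example above.

```python
from enum import Enum


class DeviceState(Enum):
    STANDBY = "standby"
    TO_BE_WOKEN = "to_be_woken"  # intermediate state awaiting the secondary check
    AWAKE = "awake"


def first_wake_check(keywords, preset_wake_word, state=DeviceState.STANDBY):
    """Return the next state of the second terminal device after local recognition.

    `keywords` is assumed to come from an on-device recognizer (not shown).
    """
    if state is DeviceState.STANDBY and preset_wake_word in keywords:
        # Preset wake-up word heard: do not wake yet, enter the to-be-woken
        # state so that the first terminal device can run the secondary check.
        return DeviceState.TO_BE_WOKEN
    return state


# Hypothetical keyword list for illustration only.
print(first_wake_check(["little A classmate", "play", "music"], "little A classmate"))
```

Keeping the to-be-woken state separate from the wake-up state is what lets the second terminal device defer the final decision to the secondary check.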
In the embodiment of the disclosure, where the first terminal device generates noise interference and has strong computing power, the second terminal device, while itself keeping low computing power, uses the strong computing power of the first terminal device to perform the secondary check. This reduces the power consumption of the system formed by the first terminal device and the second terminal device: on the one hand the wake-up rate is improved, and on the other hand the power consumption of the device is reduced and energy is saved.
In some embodiments, for the process of performing voice recognition on the voice signal to be recognized, the embodiment of the present disclosure provides the following implementation. Because the voices of different users correspond to different voiceprint features, the second terminal device determines, according to the voiceprint features included in the voice signal to be recognized, whether the user corresponding to the voice signal is a valid user. A valid user is a user who has registered and logged in beforehand, for example, a homeowner who has registered and logged in to a user account on the second terminal device in a smart home environment. Voice recognition is performed only when the user corresponding to the voice signal to be recognized is a valid user, and the result is used to judge whether to enter the to-be-woken state, which improves the security of voice wake-up.
In other embodiments, because the voice signal to be recognized includes non-human-voice signals such as white noise or environmental noise, noise reduction is performed before voice recognition to remove these signals and thereby improve the accuracy of voice recognition. Noise reduction methods include, but are not limited to: an adaptive (LMS) filter, an adaptive notch filter, basic spectral subtraction, a Wiener filter, etc., and the present disclosure does not limit the noise reduction method.
In the above embodiment, the second terminal device sends the voice signal to be recognized to the first terminal device, so that the first terminal device performs the secondary check. In this process, the voice input by the user is received through the microphone array; the user's identity can then be identified from the voiceprint features, which guarantees security, and non-human-voice signals can be removed by a noise reduction method, which improves the accuracy of voice recognition.
S402, receiving a voice signal to be recognized sent by the second terminal device.
S403, collecting an audio reference signal in response to the voice signal to be recognized.
The audio reference signal is generated by the multimedia data played by the first terminal device.
In some embodiments, in a smart home scenario, because the voice signal to be recognized includes the audio reference signal generated by the first terminal device, the first terminal device collects the audio reference signal that it is playing. The audio reference signal may be collected as a digital signal or as an analog signal: a digital signal is convenient to process, while an analog signal is closer to the audio played by the first terminal device that is mixed into the voice signal to be recognized.
S404, removing the audio reference signal from the voice signal to be recognized to obtain a target recognition signal.
In some embodiments, since the first terminal device and the second terminal device are connected by short-range communication, there is a delay in the communication interaction process, for example, when the first terminal device and the second terminal device are connected by bluetooth, the delay of the voice signal is less than 80 ms. It should be emphasized that, in practical applications, the time delay between the speech signal to be recognized and the audio reference signal is less than 100ms, which can ensure the synchronization of the two signals during the subsequent processing.
In order to accurately remove the audio reference signal from the voice signal to be recognized, the time delay between the voice signal to be recognized and the audio reference signal needs to be limited to no more than a preset time delay threshold. Therefore, as shown in fig. 5, fig. 5 is a second schematic flowchart of the voice wake-up method according to an embodiment of the disclosure, and step S404 includes the following steps S404a to S404d:
S404a, calculating the time delay between the voice signal to be recognized and the audio reference signal.
The time delay is a propagation delay, i.e., the time taken for the voice signal to be recognized to propagate over a certain distance in the channel.
The time delay between the voice signal to be recognized and the audio reference signal is calculated based on the formula: time delay = length of the channel / propagation rate of the signal on the channel.
S404b, judging whether the time delay between the voice signal to be recognized and the audio reference signal is greater than a preset threshold.
In case the time delay between the speech signal to be recognized and the audio reference signal is greater than a preset threshold, performing S404 c;
in the case that the time delay between the speech signal to be recognized and the audio reference signal is less than or equal to the preset threshold, it is determined that the speech signal to be recognized and the audio reference signal are synchronized, and S404d is performed.
S404c, correcting the audio reference signal based on the voice signal to be recognized to obtain an audio reference signal synchronized with the voice signal to be recognized.
S404d, removing the synchronized audio reference signal from the voice signal to be recognized to obtain a target recognition signal.
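Steps S404a to S404d can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the function names are hypothetical, the "correction" in S404c is assumed to be a plain shift of the reference by the delay expressed in samples, the removal in S404d is reduced to sample-wise subtraction, and all numeric values are made up.

```python
def propagation_delay_s(channel_length_m, propagation_rate_m_per_s):
    """S404a: time delay = channel length / propagation rate on the channel."""
    return channel_length_m / propagation_rate_m_per_s


def synchronize_reference(reference, delay_s, sample_rate_hz, threshold_s):
    """S404b/S404c: shift the reference only when the delay exceeds the threshold."""
    if delay_s <= threshold_s:
        return list(reference)               # treated as already synchronized
    shift = round(delay_s * sample_rate_hz)  # delay expressed in samples
    # Delay the reference by `shift` samples while keeping its original length.
    return ([0.0] * shift + list(reference))[: len(reference)]


def remove_reference(speech, reference):
    """S404d (simplified): subtract the synchronized reference sample by sample."""
    return [s - r for s, r in zip(speech, reference)]


# Illustrative numbers only.
sample_rate_hz = 1000
delay_s = propagation_delay_s(1.02, 340.0)
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0, 0.0]
speech = [0.5, 0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5]  # constant voice 0.5 + delayed reference
synced = synchronize_reference(reference, delay_s, sample_rate_hz, threshold_s=0.001)
target = remove_reference(speech, synced)
print(target)
```

In practice the subtraction would be replaced by an adaptive echo canceller, since the reference reaches the microphone through an unknown acoustic path rather than unchanged.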
In some embodiments, the audio reference signal is removed from the voice signal to be recognized based on an echo cancellation algorithm to obtain the target recognition signal. Echo cancellation algorithms include, but are not limited to, the Least Mean Square (LMS) algorithm and the Normalized Least Mean Square (NLMS) algorithm.
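As an illustration of the NLMS option, the sketch below adapts a short FIR filter so that the filtered reference tracks the echo in the microphone signal, and outputs the residual as the target recognition signal. The filter length, step size, synthetic echo path, and test signals are all assumptions chosen for the demonstration and do not come from the disclosure.

```python
import math
import random


def nlms_cancel(mic, ref, taps=8, mu=0.2, eps=1e-8):
    """Remove the part of `mic` that is linearly predictable from `ref` (NLMS).

    Returns the error signal e[n] = mic[n] - w . [ref[n], ref[n-1], ...].
    """
    w = [0.0] * taps    # adaptive filter weights
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        e = d - y                                   # residual (cleaned sample)
        norm = sum(b * b for b in buf) + eps        # reference power for normalization
        step = mu * e / norm
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out


def tail_energy(x, last=1000):
    """Mean power of the last `last` samples."""
    return sum(v * v for v in x[-last:]) / last


# Synthetic demonstration data (made up).
rng = random.Random(0)
n = 4000
ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # audio played by the first device
echo_path = [0.6, 0.3, 0.1]                        # hypothetical acoustic path
echo = [sum(h * ref[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
        for i in range(n)]
speech = [0.05 * math.sin(2 * math.pi * i / 50) for i in range(n)]  # stand-in for the user
mic = [e + s for e, s in zip(echo, speech)]
target = nlms_cancel(mic, ref)
print(tail_energy(mic), tail_energy(target))
```

After the filter has converged, the residual power should be close to that of the stand-in speech alone, which is the signal-to-noise improvement the method relies on.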
Illustratively, in fig. 6, (a) is a waveform diagram of the voice signal to be recognized, (b) is a waveform diagram of the audio reference signal, and (c) is a waveform diagram of the target recognition signal. Based on an echo cancellation algorithm, the audio reference signal shown in (b) may be removed from the voice signal to be recognized shown in (a), so as to obtain the target recognition signal shown in (c).
In some embodiments, a second wake-up check is performed by the first terminal device after obtaining the target identification signal.
As shown in fig. 7, fig. 7 is a schematic flow chart of a secondary wake-up check according to an embodiment of the present disclosure, and the flow chart includes the following steps S701 to S704:
S701, recognizing the target recognition signal based on the wake-up word model to obtain a plurality of keywords included in the target recognition signal.
In some embodiments, while recognizing the target recognition signal, Voice Activity Detection (VAD) may be performed to eliminate long silent segments in the target recognition signal, so as to remove unnecessary signal, reduce the amount of data processed for wake-up recognition, and improve recognition efficiency.
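One simple way to drop long silent segments is an energy-based detector. The sketch below is an assumption-laden illustration rather than the VAD used in the disclosure: the frame length, energy threshold, and maximum pause length are hypothetical parameters.

```python
def drop_long_silence(samples, frame_len=160, energy_threshold=1e-4,
                      max_silent_frames=5):
    """Remove runs of low-energy frames longer than `max_silent_frames`.

    Short pauses are kept so that neighbouring words are not glued together.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    kept, silent_run = [], []
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        if energy < energy_threshold:
            silent_run.append(frame)        # postpone the decision for this frame
            continue
        if len(silent_run) <= max_silent_frames:
            kept.extend(silent_run)         # short pause: keep it
        silent_run = []
        kept.append(frame)
    if len(silent_run) <= max_silent_frames:
        kept.extend(silent_run)             # short trailing pause
    return [s for frame in kept for s in frame]


# Two voiced frames, ten silent frames, one voiced, two silent, one voiced.
signal = [0.5] * 320 + [0.0] * 1600 + [0.5] * 160 + [0.0] * 320 + [0.5] * 160
trimmed = drop_long_silence(signal)
print(len(signal), len(trimmed))
```

Only the ten-frame silence is removed here; the two-frame pause survives because it is shorter than the chosen limit.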
S702, judging whether the plurality of keywords comprise target awakening words or not.
In some embodiments, there is one wake-up word model, and it includes a plurality of preset wake-up words. For a recognized keyword, the similarity between the keyword and each preset wake-up word in the model is calculated to obtain a plurality of similarities; the similarities are then weighted and summed to obtain a total similarity, and the keyword is determined to be the target wake-up word when the total similarity reaches a preset similarity threshold. These operations are performed for each of the plurality of keywords in turn to determine whether each keyword is a target wake-up word.
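The weighted-sum decision can be sketched as below. The disclosure does not specify how similarity is measured, so this sketch substitutes the string similarity ratio from Python's `difflib` as a stand-in; the function names, preset words, weights, and threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def total_similarity(keyword, preset_wake_words, weights):
    """Weighted sum of the keyword's similarity to each preset wake-up word."""
    sims = [SequenceMatcher(None, keyword, w).ratio() for w in preset_wake_words]
    return sum(wt * s for wt, s in zip(weights, sims))


def find_target_wake_words(keywords, preset_wake_words, weights, threshold):
    """Return the keywords whose total similarity reaches the threshold."""
    return [k for k in keywords
            if total_similarity(k, preset_wake_words, weights) >= threshold]


# Hypothetical model contents and parameters.
presets = ["little a classmate", "hello little a"]
weights = [0.8, 0.2]
hits = find_target_wake_words(["little a classmate", "weather"], presets, weights, 0.8)
print(hits)
```

A real wake-up word model would score acoustic features rather than text, but the thresholded weighted sum has the same shape.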
In other embodiments, there are multiple wake-up word models. The similarity between the keyword output by each wake-up word model and the preset wake-up word is calculated, and the wake-up condition is met when the number of models whose similarity reaches the set threshold exceeds one half of the total number of models.
For example, suppose the wake-up module in the terminal device contains 3 different types of models A, B, and C. The target recognition signal is input into models A, B, and C respectively, and each model outputs a keyword, giving 3 keywords. If 2 of the 3 keywords include the target wake-up word, the target recognition signal is determined to include the wake-up word.
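The multi-model rule amounts to a strict majority vote. A minimal sketch, assuming each model reports a similarity score and using a hypothetical function name and threshold:

```python
def majority_wake_decision(model_scores, score_threshold):
    """Wake-up condition: more than half of the models reach the threshold."""
    passed = sum(1 for score in model_scores if score >= score_threshold)
    return passed / len(model_scores) > 0.5


# Models A, B, C with made-up scores: two of three pass the threshold.
print(majority_wake_decision([0.9, 0.85, 0.3], score_threshold=0.8))
```

Note that exactly one half is not enough under this rule, so an even number of models needs a clear majority.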
In the case where the target wakeup word is included in the plurality of keywords, S703 is executed;
in a case where none of the plurality of keywords includes the target wake word, S704 is performed.
S703, generating a notification signal instructing the second terminal device to enter the wake-up state.
S704, generating a notification signal instructing the second terminal device to return to the standby state and wait for the next wake-up.
In steps S701 to S704, the first terminal device determines through the wake-up word model whether the target recognition signal includes the target wake-up word, thereby performing the secondary wake-up check for the second terminal device and improving the wake-up rate.
S405, sending the notification signal to the second terminal equipment.
The notification signal is a target identification signal, or the notification signal is used for indicating the second terminal device to enter an awake state.
In some embodiments, after obtaining the target recognition signal, the first terminal device sends it directly to the second terminal device, so that the second terminal device performs recognition on the target recognition signal. Because the audio reference signal generated by the first terminal device has been removed from the target recognition signal, the second terminal device recognizes it more accurately, which improves the user experience.
In other embodiments, after the first terminal device recognizes the target recognition signal, it generates a notification signal according to the recognition result. If the target recognition signal contains the target wake-up word, this indicates that the user wants to wake up the second terminal device by voice, so the notification signal sent by the first terminal device instructs the second terminal device to enter the wake-up state. If the target recognition signal does not contain the target wake-up word, this indicates that the audio reference signal generated by the first terminal device, mixed into the voice signal to be recognized, caused a false wake-up of the second terminal device, so the notification signal instructs the second terminal device to exit the to-be-woken state and enter the standby state to wait for the next wake-up.
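The two kinds of notification signal can be modelled as one message type with a kind field. The sketch below is hypothetical throughout: the disclosure does not define a message format, and `NotificationKind`, `Notification`, and `build_notification` are names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class NotificationKind(Enum):
    TARGET_SIGNAL = "target_signal"    # forward the cleaned signal as-is
    ENTER_WAKE = "enter_wake"          # result of S703
    RETURN_STANDBY = "return_standby"  # result of S704


@dataclass
class Notification:
    kind: NotificationKind
    payload: Optional[Sequence[float]] = None


def build_notification(target_signal, keywords=None, target_wake_word=None):
    """Build the message the first terminal device sends in S405.

    If `keywords` is None, no secondary check was run on the first terminal
    device, so the target recognition signal itself is forwarded.
    """
    if keywords is None:
        return Notification(NotificationKind.TARGET_SIGNAL, target_signal)
    if target_wake_word in keywords:
        return Notification(NotificationKind.ENTER_WAKE)
    return Notification(NotificationKind.RETURN_STANDBY)


print(build_notification([0.1, 0.2], ["little A classmate"], "little A classmate").kind)
```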
In the above embodiment, the first terminal device sends different notification signals to the second terminal device, where the notification signals are target identification signals or signals for instructing the second terminal device to enter the wake-up state. Based on different notification signals, the following will respectively describe the processing procedure of the second terminal device in the case that the notification signal is the target identification signal and in the case that the notification signal is used to instruct the second terminal device to enter the wake-up state:
(1) The notification signal is the target recognition signal
In some embodiments, after receiving the target recognition signal sent by the first terminal device, the second terminal device first recognizes the target recognition signal based on the wake-up word model to obtain a plurality of keywords included in the target recognition signal. While recognizing the target recognition signal, Voice Activity Detection (VAD) may be performed to eliminate long silent segments in the target recognition signal, so as to remove unnecessary signal, reduce the amount of data processed for wake-up recognition, and improve recognition efficiency.
The second terminal device then judges whether the plurality of keywords include the target wake-up word. In some embodiments, there is one wake-up word model, and it includes a plurality of preset wake-up words. For a recognized keyword, the similarity between the keyword and each preset wake-up word in the model is calculated to obtain a plurality of similarities; the similarities are then weighted and summed to obtain a total similarity, and the keyword is determined to be the target wake-up word when the total similarity reaches a preset similarity threshold. These operations are performed for each of the plurality of keywords in turn to determine whether each keyword is a target wake-up word.
In other embodiments, there are multiple wake-up word models. The similarity between the keyword output by each wake-up word model and the preset wake-up word is calculated, and the wake-up condition is met when the number of models whose similarity reaches the set threshold exceeds one half of the total number of models.
In the case that the plurality of keywords include the target wake-up word, the wake-up operation is performed, so that the second terminal device enters the wake-up state.
In the case that the plurality of keywords do not include the target wake-up word, the second terminal device enters the standby state and waits for the next wake-up.
(2) The notification signal is used for instructing the second terminal device to enter the wake-up state
In some embodiments, after the second terminal device receives the notification signal sent by the first terminal device, since the notification signal is generated after recognition processing by the first terminal device and instructs the second terminal device to enter the wake-up state, the second terminal device directly responds to the notification signal and performs the wake-up operation to enter the wake-up state.
In some embodiments, the notification signal instructs the second terminal device to enter the standby state, and the second terminal device performs the corresponding operation in response: if it is in the to-be-woken state, it exits that state and enters the standby state; or, in the case of the first check, it directly enters the standby state.
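On the receiving side, case (2) reduces to a direct state change with no further recognition. A minimal sketch, assuming hypothetical string encodings `"enter_wake"` and `"return_standby"` for the two instructions and a hypothetical `State` enum:

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    TO_BE_WOKEN = "to_be_woken"
    AWAKE = "awake"


def handle_notification(state, instruction):
    """Apply the first terminal device's verdict on the second terminal device."""
    if instruction == "enter_wake":
        return State.AWAKE      # perform the wake-up operation directly
    if instruction == "return_standby":
        return State.STANDBY    # leave the to-be-woken state, wait for next wake-up
    raise ValueError(f"unknown instruction: {instruction!r}")


print(handle_notification(State.TO_BE_WOKEN, "enter_wake"))
```

Because no wake-up word model runs on the second terminal device in this case, its computing and power requirements stay low.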
In summary, in the case that the first terminal device and the second terminal device are connected in a short-distance wireless communication mode, and a user needs to control the second terminal device by voice while the first terminal device causes noise interference, the first terminal device receives the voice signal to be recognized sent by the second terminal device, collects its own audio reference signal in response to the voice signal to be recognized, removes the audio reference signal from the voice signal to be recognized to obtain a target recognition signal, and then sends a notification signal to the second terminal device, so as to notify the second terminal device either to perform the corresponding operation using the target recognition signal or to perform the wake-up operation. In this way, the interference of noise generated by other devices is eliminated, the wake-up rate is improved, and the false wake-up rate is reduced.
As shown in fig. 8, fig. 8 is a flowchart of another voice wake-up method provided in the embodiment of the present disclosure, where the method includes steps S801 to S806:
S801, sending a voice signal to be recognized to the first terminal device.
The voice signal to be recognized is received by a microphone array arranged on the second terminal device, and includes the voice signal of the user and the audio reference signal generated by the first terminal device.
S802, receiving a target recognition signal fed back by the first terminal device.
S803, in response to the target recognition signal, recognizing the target recognition signal based on the wake-up word model to obtain a plurality of keywords included in the target recognition signal.
S804, judging whether the plurality of keywords include the target wake-up word.
In the case that the plurality of keywords include the target wake-up word, S805 is performed;
in the case that the plurality of keywords do not include the target wake-up word, S806 is performed.
S805, controlling to enter the wake-up state.
S806, controlling to enter the standby state and waiting for the next wake-up.
The detailed implementation of the above steps is the same as or similar to the embodiments described in steps S401 to S405, and is not repeated here.
As shown in fig. 9, fig. 9 is a flowchart of a voice wake-up method provided in an embodiment of the present disclosure, where the method includes steps S901 to S903:
S901, sending a voice signal to be recognized to the first terminal device.
The specific recognition operation is the one described in the embodiments of the above voice wake-up method, and is not described here again. In the case that the voice signal to be recognized is recognized as containing the preset wake-up word, the second terminal device is determined to enter the to-be-woken state, and the secondary wake-up check is performed by the first terminal device.
In the to-be-woken state, the voice signal to be recognized is sent to the first terminal device.
S902, receiving a notification signal fed back by the first terminal device.
The notification signal is used for instructing the second terminal device to enter the wake-up state.
S903, in response to the notification signal, controlling to enter the wake-up state.
In addition, when the notification signal fed back by the first terminal device instructs the second terminal device to exit the to-be-woken state and enter the standby state, the corresponding operation is performed as instructed by the notification signal.
The detailed implementation of the above steps is the same as or similar to the embodiments described in steps S401 to S405, and is not repeated here.
Fig. 10 is a first architecture diagram of a first terminal device and a second terminal device according to an embodiment of the disclosure, as shown in fig. 10, the first terminal device 200 includes a first communicator 1010 and a first controller 1020, and the second terminal device 100 includes a second communicator 1030 and a second controller 1040. The first communicator 1010 includes a first receiving module 1011 and a first transmitting module 1012, and the first controller 1020 includes an acquiring module 1021 and a first processing module 1022; the second communicator 1030 includes a second receiving module 1031, a second sending module 1032, and the second controller 1040 includes a wakeup word model 1041, a determining module 1042, and a second processing module 1043.
In the first architecture shown in fig. 10, the second terminal device 100 sends a voice signal to be recognized to the first terminal device 200 through the second sending module 1032. The first terminal device 200 receives the voice signal to be recognized through the first receiving module 1011, collects an audio reference signal through the collecting module 1021, removes the audio reference signal from the voice signal to be recognized through the first processing module 1022 to obtain a target recognition signal, and sends the target recognition signal to the second terminal device 100 through the first sending module 1012. The second terminal device 100 receives the target recognition signal through the second receiving module 1031 and recognizes it based on the wakeup word model 1041 to obtain a plurality of keywords included in the target recognition signal. The determining module 1042 then determines whether the plurality of keywords include the target wakeup word; if they do, the second processing module 1043 controls the second terminal device 100 to enter the wakeup state.
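The reference-removal step performed by the first processing module, together with the delay correction recited in claim 2, can be illustrated with a minimal sketch. It assumes sample lists at a common sample rate, estimates the delay by a brute-force cross-correlation, and uses plain subtraction as a stand-in for the removal; a practical device would more likely use adaptive echo cancellation. The function names and the `delay_threshold` and `max_lag` parameters are hypothetical.

```python
def estimate_delay(mixed, reference, max_lag):
    """Return the lag (in samples) at which reference best matches mixed."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate mixed[n] with reference[n - lag] over the overlap.
        score = sum(
            mixed[n] * reference[n - lag]
            for n in range(lag, min(len(mixed), len(reference) + lag))
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_reference(reference, lag, length):
    """Delay the reference by `lag` samples, then pad or trim to `length`."""
    shifted = [0.0] * lag + list(reference)
    shifted += [0.0] * max(0, length - len(shifted))
    return shifted[:length]


def remove_reference(mixed, reference, delay_threshold=0, max_lag=16):
    """Subtract the (optionally delay-corrected) reference from the mix."""
    lag = estimate_delay(mixed, reference, max_lag)
    # Correct the reference only when the delay exceeds the threshold.
    shift = lag if lag > delay_threshold else 0
    aligned = align_reference(reference, shift, len(mixed))
    return [m - r for m, r in zip(mixed, aligned)]
```

For example, if the microphone signal contains the reference delayed by three samples plus a small amount of speech, `remove_reference` shifts the reference by three samples before subtracting it, leaving approximately the speech component.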
Fig. 11 is a second architecture diagram of the first terminal device and the second terminal device according to the embodiment of the disclosure. As shown in fig. 11, the first terminal device 200 includes a first communicator 1110 and a first controller 1120, and the second terminal device 100 includes a second communicator 1130 and a second controller 1150. The first communicator 1110 includes a first receiving module 1111 and a first sending module 1112, and the first controller 1120 includes a collecting module 1121, a wakeup word model 1122, a determining module 1123, and a first processing module 1124; the second communicator 1130 includes a second receiving module 1131 and a second sending module 1132, and the second controller 1150 includes a wakeup module 1151.
In the second architecture shown in fig. 11, the second terminal device 100 sends a voice signal to be recognized to the first terminal device 200 through the second sending module 1132. The first terminal device 200 receives the voice signal to be recognized through the first receiving module 1111, collects an audio reference signal through the collecting module 1121, and removes the audio reference signal from the voice signal to be recognized through the first processing module 1124 to obtain a target recognition signal. The first terminal device 200 then recognizes the target recognition signal based on the wakeup word model 1122 to obtain a plurality of keywords included in the target recognition signal, and determines through the determining module 1123 whether the plurality of keywords include a target awakening word. If they do, the first processing module 1124 generates a notification signal, and the first sending module 1112 sends the notification signal to the second terminal device 100, which is controlled to enter the awakening state through the wakeup module 1151.
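The keyword check performed by the determining module, using the weighted similarity recited in claim 4, can be sketched as follows. This is an assumption-laden illustration: the keywords are treated as text strings, the similarity measure is the ratio from Python's standard `difflib.SequenceMatcher` (the disclosure does not specify a measure), and the function names, weights, threshold, and the dictionary form of the notification are all hypothetical.

```python
from difflib import SequenceMatcher


def total_similarity(keyword, preset_words, weights):
    """Weighted sum of the keyword's similarity to each preset wake-up word."""
    similarities = [
        SequenceMatcher(None, keyword, preset).ratio() for preset in preset_words
    ]
    return sum(w * s for w, s in zip(weights, similarities))


def find_target_wake_word(keywords, preset_words, weights, threshold):
    """Return the first keyword whose total similarity reaches the threshold."""
    for keyword in keywords:
        if total_similarity(keyword, preset_words, weights) >= threshold:
            return keyword
    return None


def make_notification(keywords, preset_words, weights, threshold):
    """Build a hypothetical notification for the second terminal device."""
    target = find_target_wake_word(keywords, preset_words, weights, threshold)
    return {"enter_awake": target is not None, "wake_word": target}
```

With a single preset word `"hello tv"`, a weight of 1.0, and a threshold of 0.9, the keyword list `["play", "hello tv"]` yields a notification that tells the second terminal device to enter the awakening state, while `["play"]` alone does not.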
An embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process corresponding to a first terminal device in a voice wake-up method in the foregoing method embodiments, or implements each process corresponding to a second terminal device in a voice wake-up method in the foregoing method embodiments, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Embodiments of the present invention provide a computer program product in which a computer program is stored. When executed by a processor, the computer program implements each process corresponding to the first terminal device in the voice wake-up method of the foregoing method embodiments, or each process corresponding to the second terminal device in the voice wake-up method of the foregoing method embodiments, and can achieve the same technical effect; to avoid repetition, details are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium.
In the present disclosure, the processor may be a Central Processing Unit (CPU), or may be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In the present disclosure, the memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
In the present disclosure, computer-readable media include both permanent and non-permanent, removable and non-removable storage media. Storage media may implement information storage by any method or technology, and the information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A first terminal device, comprising:
a first communicator configured to: receiving a voice signal to be recognized sent by second terminal equipment, wherein the first terminal equipment is connected with the second terminal equipment in a short-distance wireless communication mode;
a first controller configured to: responding to the voice signal to be recognized, and collecting an audio reference signal;
removing the audio reference signal from the voice signal to be recognized to obtain a target recognition signal;
the first communicator further configured to: and sending a notification signal to the second terminal device, wherein the notification signal is the target identification signal, or the notification signal is used for indicating the second terminal device to enter an awakening state.
2. The device according to claim 1, wherein the first controller is specifically configured to:
calculating the time delay between the voice signal to be recognized and the audio reference signal;
if the time delay is larger than a preset time delay threshold value, correcting the audio reference signal based on the voice signal to be recognized to obtain an audio reference signal synchronous with the voice signal to be recognized;
and removing the synchronized audio reference signal from the voice signal to be recognized to obtain the target recognition signal.
3. The device of claim 1, wherein the first controller is specifically configured to:
removing the audio reference signal from the voice signal to be recognized to obtain the target recognition signal;
identifying the target identification signal based on a wakeup word model to obtain a plurality of keywords included in the target identification signal;
judging whether the plurality of keywords comprise target awakening words or not;
and generating the notification signal to indicate the second terminal equipment to enter the awakening state under the condition that the target awakening word is included in the keywords.
4. The device according to claim 3, wherein the wake-up word model includes a plurality of preset wake-up words, and the first controller is specifically configured to:
respectively calculating the similarity between each preset awakening word in the awakening word model and the keyword;
weighting and summing the multiple similarities to obtain a total similarity;
and determining the keyword as a target awakening word under the condition that the total similarity reaches a preset similarity threshold value.
5. A second terminal device, characterized in that the second terminal device is provided with a microphone array, comprising:
a second communicator configured to: sending a voice signal to be recognized to first terminal equipment;
receiving a target identification signal fed back by the first terminal equipment;
a second controller configured to: responding to the target identification signal, and identifying the target identification signal based on a wakeup word model to obtain a plurality of keywords included in the target identification signal;
judging whether the plurality of keywords comprise target awakening words or not;
and controlling to enter an awakening state under the condition that the plurality of keywords comprise the target awakening word.
6. A second terminal device, characterized in that the second terminal device is provided with a microphone array, comprising:
a second communicator configured to: sending a voice signal to be recognized to first terminal equipment;
receiving a notification signal fed back by the first terminal device, wherein the notification signal is used for indicating the second terminal device to enter an awakening state;
a second controller configured to: and controlling to enter an awakening state in response to the notification signal.
7. The device of claim 6, wherein the second communicator is further configured to: receiving the voice signal to be recognized input by a user;
the second controller further configured to: responding to the voice signal to be recognized, and performing voice recognition;
under the condition that the voice signal to be recognized comprises a preset awakening word, controlling the second terminal equipment to enter a state to be awakened;
and sending the voice signal to be recognized to the first terminal equipment in the state to be awakened.
8. A voice wake-up method is applied to a first terminal device, and comprises the following steps:
receiving a voice signal to be recognized sent by second terminal equipment, wherein the first terminal equipment is connected with the second terminal equipment in a short-distance wireless communication mode;
responding to the voice signal to be recognized, and collecting an audio reference signal;
removing the audio reference signal from the voice signal to be recognized to obtain a target recognition signal;
and sending a notification signal to the second terminal device, wherein the notification signal is the target identification signal, or the notification signal is used for indicating the second terminal device to enter an awakening state.
9. A voice wake-up method is applied to a second terminal device, and comprises the following steps:
sending a voice signal to be recognized to first terminal equipment;
receiving a target identification signal fed back by the first terminal equipment;
responding to the target identification signal, and identifying the target identification signal based on a wakeup word model to obtain a plurality of keywords included in the target identification signal;
judging whether the plurality of keywords comprise target awakening words or not;
and controlling to enter an awakening state under the condition that the plurality of keywords comprise the target awakening word.
10. A voice wake-up method is applied to a second terminal device, and comprises the following steps:
sending a voice signal to be recognized to first terminal equipment;
receiving a notification signal fed back by the first terminal device, wherein the notification signal is used for indicating the second terminal device to enter an awakening state;
and controlling to enter an awakening state in response to the notification signal.
CN202210187574.9A 2022-02-17 2022-02-28 First terminal device, second terminal device and voice awakening method Pending CN114694661A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210187574.9A CN114694661A (en) 2022-02-28 2022-02-28 First terminal device, second terminal device and voice awakening method
PCT/CN2022/142800 WO2023155607A1 (en) 2022-02-17 2022-12-28 Terminal devices and voice wake-up methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210187574.9A CN114694661A (en) 2022-02-28 2022-02-28 First terminal device, second terminal device and voice awakening method

Publications (1)

Publication Number Publication Date
CN114694661A true CN114694661A (en) 2022-07-01

Family

ID=82137452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210187574.9A Pending CN114694661A (en) 2022-02-17 2022-02-28 First terminal device, second terminal device and voice awakening method

Country Status (1)

Country Link
CN (1) CN114694661A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155607A1 (en) * 2022-02-17 2023-08-24 海信视像科技股份有限公司 Terminal devices and voice wake-up methods

Similar Documents

Publication Publication Date Title
US11443744B2 (en) Electronic device and voice recognition control method of electronic device
CN112863510B (en) Method for executing operation on client device platform and client device platform
JP2020527753A (en) View-based voice interaction methods, devices, servers, terminals and media
US20200219384A1 (en) Methods and systems for ambient system control
CN109101517B (en) Information processing method, information processing apparatus, and medium
US11514890B2 (en) Method for user voice input processing and electronic device supporting same
US10911910B2 (en) Electronic device and method of executing function of electronic device
US11393490B2 (en) Method, apparatus, device and computer-readable storage medium for voice interaction
US10950221B2 (en) Keyword confirmation method and apparatus
US20200075008A1 (en) Voice data processing method and electronic device for supporting same
US20220020358A1 (en) Electronic device for processing user utterance and operation method therefor
KR20190096308A (en) electronic device
US10952075B2 (en) Electronic apparatus and WiFi connecting method thereof
CN113519022A (en) Electronic device and control method thereof
CN114694661A (en) First terminal device, second terminal device and voice awakening method
CN112384974A (en) Electronic device and method for providing or obtaining data for training an electronic device
WO2023155607A1 (en) Terminal devices and voice wake-up methods
US20230362026A1 (en) Output device selection
CN111539214A (en) Method, equipment and system for disambiguating natural language content title
US11127400B2 (en) Electronic device and method of executing function of electronic device
CN113593559A (en) Content display method, display equipment and server
US20230215422A1 (en) Multimodal intent understanding for automated assistant
CN115148188A (en) Language identification method, language identification device, electronic equipment and medium
KR20230123343A (en) Device and method for providing voice assistant service
CN114547367A (en) Electronic equipment, searching method based on audio instruction and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination