CN111862965A - Awakening processing method and device, intelligent sound box and electronic equipment - Google Patents

Info

Publication number
CN111862965A
Authority
CN
China
Prior art keywords
voice
wake
word
awakening
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910351630.6A
Other languages
Chinese (zh)
Inventor
姚海通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910351630.6A priority Critical patent/CN111862965A/en
Publication of CN111862965A publication Critical patent/CN111862965A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

Embodiments of the invention provide a wake-up processing method and device, a smart speaker, and electronic equipment. The method comprises: in a monitoring state, collecting a first voice of a user and, if a wake-up word is detected, switching to a wake-up state; in the wake-up state, collecting a second voice of the user while verifying the text of the wake-up word detected in the monitoring state, or the first voice corresponding to that wake-up word; if the verification passes, performing processing in response to the second voice, and if the verification fails, terminating the wake-up state. After the smart device is woken up, the user's voice is collected in parallel with verification of the wake-up word, and the voice collected in the wake-up state is not transmitted until the verification result has been obtained. This preserves processing efficiency while protecting user privacy.

Description

Awakening processing method and device, intelligent sound box and electronic equipment
Technical Field
The application relates to a wake-up processing method and device, an intelligent sound box and electronic equipment, and belongs to the technical field of computers.
Background
The smart speaker is an upgrade of the traditional speaker that lets users access the internet by voice, for example to request songs, shop online, or check the weather forecast. A smart speaker can also control smart home devices, for example opening curtains or setting the temperature of a refrigerator.
However, some smart speakers in the prior art pose a security risk: through its microphone, a smart speaker can monitor users' conversations and activities at home or at work, which seriously threatens users' work and personal life.
Disclosure of Invention
The embodiment of the invention provides a wake-up processing method and device, an intelligent sound box and electronic equipment, and aims to protect user privacy.
In order to achieve the above object, an embodiment of the present invention provides a wake-up processing method, including:
in a monitoring state, collecting a first voice of a user, if a wake-up word is detected, switching to a wake-up state, and executing the following processing:
in the awakening state, acquiring second voice of the user, and checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification is passed, processing is performed in response to the second voice, and if the verification is not passed, the awake state is terminated.
The embodiment of the invention also provides a wake-up processing method, which comprises the following steps:
receiving the text of a wake-up word, or a first voice corresponding to the wake-up word, sent by the smart device; verifying the text of the wake-up word or the first voice against the wake-up word registered in the user registration information; if the verification passes, returning a result indicating that the verification passed to the smart device; otherwise, returning a result indicating that the verification failed to the smart device.
An embodiment of the present invention further provides a wake-up processing apparatus, including:
the first voice acquisition module is used for acquiring a first voice of a user in a monitoring state, and switching to an awakening state if an awakening word is detected;
the second voice acquisition module is used for acquiring second voice of the user in the awakening state and sending the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word to the verification end for verification, wherein the verification end comprises any one or more of a server, other equipment ends except the current equipment and a processing unit used for executing local verification in the current equipment;
and the first processing module is used for performing processing in response to the second voice if the verification passes, and terminating the wake-up state if the verification fails.
An embodiment of the present invention further provides a wake-up processing apparatus, including:
the awakening word receiving module is used for receiving the text of the awakening word sent by the intelligent equipment or the first voice corresponding to the awakening word;
and the awakening word checking module is used for checking the awakening words registered in the user registration information according to the text of the awakening words or the first voice, if the awakening words pass the checking, a result that the checking passes is returned to the intelligent equipment, and otherwise, a result that the checking fails is returned to the intelligent equipment.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
a processor, coupled to the memory, for executing the program for:
in a monitoring state, collecting a first voice of a user, if a wake-up word is detected, switching to a wake-up state, and executing the following processing:
in the awakening state, acquiring second voice of the user, and checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification is passed, processing is performed in response to the second voice, and if the verification is not passed, the awake state is terminated.
The embodiment of the present invention further provides an intelligent sound box, including:
the microphone is used for collecting user voice;
the loudspeaker is used for playing audio according to the control instruction;
the monitoring processing module is used for collecting a first voice of a user through a microphone in a monitoring state, and triggering and switching to an awakening state if an awakening word is detected;
the awakening processing module is used for collecting the second voice of the user by the microphone in the awakening state, and sending the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word to the verification end for verification;
and the verification result processing module is used for responding to the second voice to execute processing under the condition of receiving a verification result which is returned by the verification end and passes the verification, and terminating the awakening state under the condition of receiving a verification result which is returned by the verification end and fails the verification.
The embodiment of the invention also provides a wake-up processing method, which comprises the following steps:
in a monitoring state, collecting a first voice of a user, and if a wake-up word is detected, switching to a wake-up state;
checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification fails, the wake-up state is terminated.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
a processor, coupled to the memory, for executing the program for:
in a monitoring state, collecting a first voice of a user, and if a wake-up word is detected, switching to a wake-up state;
checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification fails, the wake-up state is terminated.
In the embodiments of the invention, after the smart device is woken up, the user's voice is collected in parallel with verification of the wake-up word, and the voice collected in the wake-up state is not transmitted until the verification result has been obtained. This preserves processing efficiency while protecting user privacy.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly and its objects, features, and advantages become more apparent.
Drawings
FIG. 1 is a schematic view of an application scenario of a wake-up processing method, taking a smart speaker as an example, according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a wake-up processing method according to an embodiment of the invention;
FIG. 3 is a second flowchart illustrating a wake-up processing method according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a wake-up processing apparatus according to an embodiment of the present invention;
FIG. 5 is a second schematic structural diagram of a wake-up processing apparatus according to a second embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a smart speaker according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The technical solution of the present invention is further illustrated by some specific examples.
To prevent arbitrary user speech from triggering the smart speaker to perform processing, and to spare the smart speaker from meaningless instruction recognition, the smart speaker is provided with a wake-up mode: only when the smart speaker is in the wake-up state does it recognize the user's voice control instructions and execute them.
With the development of cloud technology, many control instructions involve cloud processing. For example, for "please play xxxx songs", the cloud retrieves the song and sends the retrieved song data to the smart speaker; for "please tell me weather conditions today", the cloud retrieves weather data and provides it to the smart speaker. In addition, to make voice recognition more accurate and intelligent, recognition of the user's voice control instructions is also performed in the cloud. For these reasons, after the smart speaker is woken up, the user voice it collects locally is sent to the cloud over the network.
One major difference between the wake-up state and the non-wake-up (monitoring) state is this: in the wake-up state, the smart speaker continuously collects user voice for a certain period, generates a voice file, and transmits it to the cloud over the network so that the cloud can recognize control instructions and trigger the corresponding processing actions. In the non-wake-up state, the smart speaker is in the monitoring state. Although it can collect user voice through a sound pickup device such as a microphone, the collected voice file exists only in memory, is used only for recognizing wake-up words, and is not sent to the network; control-instruction recognition is not performed on user voice collected in this state.
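The distinction between the two states can be illustrated with a minimal sketch. The class name `AudioGate`, the state labels, and the `uploaded` list (a stand-in for network transmission) are assumptions made for illustration and do not come from the patent text.

```python
from collections import deque


class AudioGate:
    """Routes captured frames according to the speaker's current state."""

    def __init__(self, max_frames=50):
        self.state = "monitoring"               # or "awake"
        self.buffer = deque(maxlen=max_frames)  # in-memory only, bounded
        self.uploaded = []                      # stand-in for network transmission

    def on_frame(self, frame):
        if self.state == "monitoring":
            # Kept in memory solely for wake-word recognition; never sent out.
            self.buffer.append(frame)
        else:
            # Awake: frames are forwarded for control-instruction recognition.
            self.uploaded.append(frame)
```

The bounded `deque` reflects the idea that monitoring-state audio is transient: old frames are dropped rather than stored or transmitted.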
Currently, there are two types of wake-up mechanism, a conventional wake-up mode and a fast wake-up mode. The two modes are described separately below.
Conventional wake-up mode
In general, a user must first speak a wake-up word to wake the smart speaker and put it in a state of receiving voice instructions. The smart speaker then collects the user's voice and transmits it to the cloud over the network for voice instruction recognition, where the control instruction is extracted and the corresponding processing action is triggered. For example, the user may need to say "XXX sprite" (e.g., the name of the smart speaker) to wake the smart speaker and then speak a voice control instruction to exercise further control. In the conventional wake-up mode, the boundary between the wake-up word and the control instruction is relatively clear.
Fast wake-up mode
To make voice control of the smart speaker more convenient, some quick wake-up instructions may be allowed. A quick wake-up instruction combines a wake-up word with a control instruction. For example, suppose the quick wake-up instruction is "next". In the monitoring state, once the smart speaker detects the quick wake-up instruction, it switches to the wake-up state and transmits the quick wake-up instruction, together with user voice collected subsequently, to the cloud over the network, where the control instruction is recognized so that the corresponding processing action can be executed. For some simple quick wake-up instructions, the control instruction can also be recognized locally; if it is recognized, it can be executed directly on the device without cloud processing.
For the sake of distinction, the wake-up word in the conventional wake-up mode is referred to as a "conventional wake-up word", and the wake-up word in the fast wake-up mode is referred to as a "quick wake-up instruction". That is, unless otherwise specified, the term "wake-up word" in this description covers both conventional wake-up words and quick wake-up instructions.
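As a small illustration of this terminology, the sketch below classifies recognized text as a conventional wake-up word, a quick wake-up instruction, or neither. The two sets reuse the examples from the description ("XXX sprite" and "next"); the function name and the lowercase normalization are assumptions.

```python
# Example entries taken from the description; a real device would load its own.
CONVENTIONAL_WAKE_WORDS = {"xxx sprite"}
QUICK_WAKE_INSTRUCTIONS = {"next"}


def classify_wake_word(text):
    """Return "conventional", "quick", or None if the text is not a wake-up word."""
    normalized = text.strip().lower()
    if normalized in CONVENTIONAL_WAKE_WORDS:
        return "conventional"
    if normalized in QUICK_WAKE_INSTRUCTIONS:
        return "quick"
    return None
```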
Whether it is woken by a conventional wake-up word or by a quick wake-up instruction, the smart speaker spends a period of time after waking collecting user voice and transmitting it to the cloud over the network. An attacker may break into the smart speaker and modify the wake-up word or quick wake-up instruction, for example by setting words common in everyday conversation as wake-up words or quick wake-up instructions. The smart speaker is then woken easily, and the voice files it transmits to the cloud can be intercepted, which seriously harms the user's interests and poses a major security risk.
It should be noted that the embodiments of the present invention are designed for the situation in which, after wake-up, the user's voice is transmitted to the cloud by default for further recognition (only in this situation does network transmission create a risk of privacy disclosure). The situation in which recognition of voice control instructions is completed entirely locally is outside the scope considered by the present invention.
In addition, a typical smart speaker is configured so that, after it is woken by a wake-up word, an indicator light reminds the user that the smart speaker has been woken up; that is, the user knows that the smart speaker is collecting the user's voice. Because wake-up by a quick wake-up instruction is brief, current smart speakers generally do not notify the user that they are in the wake-up state. This creates a security risk: personal private information may be collected without the user being aware of it.
Wake-up recognition is generally implemented using ASR (Automatic Speech Recognition) technology: a wake-up word or quick wake-up instruction is preconfigured in the ASR logic to trigger the wake-up of the smart device.
To avoid delays in wake-up processing, wake-up recognition is mostly performed locally. However, if the smart speaker is compromised and the ASR decision logic is tampered with so that the wake-up word or quick wake-up instruction is modified, the user remains unaware and privacy is easily disclosed. For example, an attacker could arbitrarily configure multiple wake-up words or quick wake-up instructions so that the smart device wakes up frequently, collects user voice, and transmits it to the network, seriously endangering the user's privacy.
To address this situation, the invention provides a cloud verification mechanism for wake-up words: the smart speaker is allowed to send user voice collected after wake-up to the cloud only after the wake-up word has passed verification. While the wake-up word is being verified in the cloud, the smart speaker can still collect the user's voice; it simply cannot send that voice to the cloud until the cloud verification result has been obtained.
In order to distinguish the collected user voice before and after awakening, the user voice collected before awakening is called as first voice, and the user voice collected after awakening is called as second voice.
FIG. 1 is a schematic view of an application scenario of the wake-up processing method, taking a smart speaker as an example, according to the embodiment of the present invention. After the smart speaker is started, it is in the monitoring state by default. In the monitoring state, the microphone collects sound signals, and when a sound signal exceeds a certain threshold, detection and recognition of the wake-up word begins. As shown in FIG. 1, after the first voice is collected, wake-up word detection is performed. One specific detection approach is based on ASR: the text corresponding to the first voice is extracted and compared with the wake-up words (including conventional wake-up words and quick wake-up instructions) preset in the smart speaker. If the text matches a wake-up word, the smart speaker switches from the monitoring state to the wake-up state; otherwise, it remains in the monitoring state.
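The local detection step described above can be sketched roughly as follows. The `transcribe` callable stands in for an ASR engine and is hypothetical, as are the peak-amplitude test and the default threshold value; the patent states only that detection starts when the sound signal exceeds "a certain threshold".

```python
def detect_wake_word(samples, transcribe, wake_words, threshold=0.1):
    """Return the matched wake-up word, or None to remain in the monitoring state.

    samples:    sequence of audio samples (floats, assumed in the range -1..1)
    transcribe: hypothetical ASR callable mapping samples to text
    wake_words: set of lowercase wake-up words preset on the device
    """
    # Skip recognition entirely unless the signal exceeds the threshold.
    if not samples or max(abs(s) for s in samples) <= threshold:
        return None
    text = transcribe(samples).strip().lower()
    return text if text in wake_words else None
```

Returning `None` in both the quiet case and the no-match case corresponds to the speaker staying in the monitoring state.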
After switching to the wake-up state, two branches of processing are performed simultaneously. In one branch, the second voice continues to be collected in the wake-up state but is not transmitted to the cloud. In the other branch, the detected wake-up word is transmitted to the cloud for wake-up word verification, and the smart speaker waits for the cloud's verification result.
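One way to picture the two simultaneous branches is with a thread pool, as in the sketch below. The callables `capture_window` (yields audio frames for a finite capture window) and `verify_wake_word` (asks the verification end for a result) are hypothetical placeholders, and threads are only one possible way to run the branches in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_awake_branches(capture_window, verify_wake_word, wake_word):
    """Collect the second voice locally while the wake-up word is verified."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Branch 1: keep collecting the second voice; nothing is transmitted here.
        collecting = pool.submit(lambda: list(capture_window()))
        # Branch 2: send the detected wake-up word for verification and wait.
        verifying = pool.submit(verify_wake_word, wake_word)
        second_voice = collecting.result()
        passed = verifying.result()
    return passed, second_voice
```

The caller receives both the verification result and the locally held second voice, and can then decide whether anything leaves the device.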
To make wake-up word verification possible, after a wake-up word is defined locally on the device, the user must also register it in the cloud. What the smart speaker transmits to the cloud may be the text of the wake-up word, in which case the cloud verifies it by text comparison. Alternatively, the smart speaker may transmit the first voice collected in the monitoring state directly to the cloud. The cloud then either extracts text from the first voice and compares it with the text of the pre-registered wake-up word, or compares the first voice directly with the user's voice that was pre-registered as the wake-up word.
After the smart speaker receives the cloud's verification result, it decides according to that result whether the second voice collected in the wake-up state should be transmitted to the cloud. Specifically, if the verification passed, the second voice is transmitted to the cloud to trigger further control-instruction recognition and execution of the action corresponding to the control instruction. In the conventional wake-up mode, the control instruction spoken by the user appears only in the second voice, so only the second voice needs to be transmitted to the cloud. In the fast wake-up mode, the quick wake-up instruction serving as the wake-up word contains part or all of the control instruction, so after wake-up both the first voice and the second voice need to be transmitted to the cloud so that the cloud can recognize the complete control instruction. Of course, if the cloud already received the first voice during wake-up word verification, the smart speaker may transmit only the second voice after the verification passes.
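The transmission rule in the preceding paragraph can be summarized in a short sketch. The function name, the mode labels "conventional" and "quick", and the `first_voice_already_sent` flag are illustrative assumptions.

```python
def select_upload(verified, mode, first_voice, second_voice,
                  first_voice_already_sent=False):
    """Return the list of recordings to transmit to the cloud, in order."""
    if not verified:
        # Verification failed: nothing collected after wake-up leaves the device.
        return []
    if mode == "quick" and not first_voice_already_sent:
        # The quick wake-up instruction carries part of the control instruction,
        # so the cloud needs the first voice as well.
        return [first_voice, second_voice]
    # Conventional mode, or the cloud already holds the first voice.
    return [second_voice]
```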
After receiving the second voice, the cloud end can identify the control instruction, execute a processing action to be executed by the cloud end based on the control instruction, for example, retrieve information or acquire content data, and send the identified control instruction and/or content data to the smart speaker, and the smart speaker executes action processing according to the received control instruction and/or content data.
The above describes the improvement that the embodiments of the present invention make to the wake-up mechanism of the smart speaker. In a separate respect, with a conventional wake-up word, the user is prompted through an indicator light to speak the control instruction, so the user knows that voice is being collected. With a quick wake-up instruction, however, often no prompt is given, so the user tends to overlook being recorded and may not notice the risk of privacy disclosure. To address this, the embodiments of the present invention also prompt the user in the quick wake-up case to draw the user's attention. The prompt may use an indicator light or a voice message to tell the user that recording is in progress.
It should be noted that the cloud may be a cloud server providing cloud processing, a distributed cloud server cluster, or a cloud processing platform, and of course, may also be a network server using a conventional technology.
In addition, voice control based on a wake-up mechanism is used not only in smart speakers but also in many other smart devices, such as office equipment like mobile phones and computers and household appliances like refrigerators and air conditioners. When these smart devices interact with the cloud, the wake-up mechanism creates the same risk of privacy disclosure, so the technical solution provided by the embodiments of the present invention applies to them as well.
In the embodiments of the invention, after the smart device is woken up, the user's voice is collected in parallel with cloud verification of the wake-up word, and the voice collected in the wake-up state is not sent to the cloud until the cloud verification result has been obtained. This preserves processing efficiency while protecting user privacy.
Example one
In the embodiment of the invention, first voice of a user is collected in a monitoring state, if a wake-up word is detected, the monitoring state is switched to a wake-up state, second voice input by a subsequent user is collected continuously, text of the detected wake-up word or the first voice corresponding to the wake-up word is verified, and if the verification fails, the wake-up state is terminated, so that the second voice collected after wake-up is not sent to a network, and the security of user privacy is ensured. The verification end for performing the verification operation may be any one or more of the server, the other device end except the current device, and the processing unit for performing the local verification in the current device. Therefore, the text or the first voice of the detected awakening word can be sent to the verification end for further verification.
In practical applications, many control instructions of the smart device involve processing by the cloud server, for example, after waking up, the server needs to search for songs, or the server controls other smart devices. Therefore, the server may be generally selected as a verification end for verifying the wakeup word. The following description will mainly use a server as a verification end for exemplary purposes.
Fig. 2 is a schematic flow chart of a wake-up processing method according to an embodiment of the present invention, where the flow is executed on the smart device side, and the flow includes:
S101: In the monitoring state, collect the first voice of the user, and if a wake-up word is detected, switch to the wake-up state. In this step, after switching to the wake-up state, a prompt indicating that recording is in progress may be issued to remind the user to pay attention to personal privacy; the prompt may be an indicator light or a voice prompt.
S102: In the wake-up state, collect the second voice of the user.
S103: Send the text of the wake-up word detected in the monitoring state, or the first voice corresponding to the wake-up word, to the server to trigger verification of the wake-up word. Steps S102 and S103 may be performed in parallel, thereby improving processing efficiency.
S104: Receive the verification result returned by the server. If the verification failed, execute S105; if the verification passed, execute S106.
S105: Terminate the wake-up state. After the wake-up state is terminated, the device may switch back to the monitoring state and continue monitoring user voice.
S106: Perform processing in response to the second voice. In a typical application scenario, the second voice is sent to the server; in some cases, local control processing may instead be performed directly based on the second voice. In this step, in the fast wake-up mode, the first voice and the second voice may be sent to the server together.
After step S106, the method may further include: the smart device receives the control instruction and/or content data, returned by the server, that corresponds to the second voice or to the first voice and the second voice, and performs processing according to the control instruction and/or content data. Many control instructions must be carried out by the server and the smart device together. For example, for the control instruction "play XXX song", after the server recognizes the control instruction, the song must be downloaded and transmitted to the smart device, which then plays it according to the downloaded song and the control instruction. It should be noted that recognition of the control instruction may also be completed on the smart device. In that case the control instruction is still transmitted to the server, because completing many control instructions requires the server's cooperation: after transmitting the control instruction, the smart device waits for the content data sent by the server and then completes the corresponding action.
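The device-side flow of steps S101 to S106 can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the class and method names are invented, `verify` and `send_to_server` are hypothetical callables standing in for the verification end and the upload path, and verification is invoked sequentially here for brevity even though S102 and S103 may run in parallel.

```python
from enum import Enum


class State(Enum):
    MONITORING = "monitoring"
    AWAKE = "awake"


class WakeController:
    def __init__(self, wake_words, verify, send_to_server):
        self.wake_words = wake_words
        self.verify = verify                  # hypothetical call to the verification end
        self.send_to_server = send_to_server  # hypothetical upload function
        self.state = State.MONITORING
        self.wake_word = None
        self.pending = []                     # second voice held locally

    def on_first_voice(self, text):
        """S101: switch to the wake-up state if a wake-up word is detected."""
        if self.state is State.MONITORING and text in self.wake_words:
            self.state = State.AWAKE
            self.wake_word = text

    def on_second_voice(self, audio):
        """S102: collect the second voice without transmitting it."""
        if self.state is State.AWAKE:
            self.pending.append(audio)

    def finish_verification(self):
        """S103 to S106: verify the wake-up word, then upload or discard."""
        passed = self.verify(self.wake_word)   # S103, S104
        if passed:
            for audio in self.pending:         # S106
                self.send_to_server(audio)
        else:
            self.state = State.MONITORING      # S105
        self.pending.clear()
        return passed
```

Holding the second voice in `pending` until `finish_verification` returns is what keeps unverified audio off the network in this sketch.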
As shown in fig. 3, which is a second flowchart of the wake-up processing method according to the embodiment of the present invention, at the server side, the verification process for the wake-up word is executed, which specifically includes:
S201: Receive the text of the wake-up word, or the first voice corresponding to the wake-up word, sent by the smart device.
S202: Verify the text of the wake-up word or the first voice against the wake-up word registered in the user registration information. If the verification passes, execute S203; if it fails, execute S204.
S203: Return a result indicating that the verification passed to the smart device.
S204: Return a result indicating that the verification failed to the smart device.
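Steps S201 to S204 on the server side might look like the sketch below. The request field names (`user_id`, `wake_text`, `first_voice`), the shape of the registration data, and the `transcribe` callable are all assumptions for illustration; only the text-comparison path is shown, not direct voice-to-voice comparison.

```python
def handle_verification(request, registrations, transcribe=None):
    """Verify a wake-up word against the user's registered wake-up words.

    request:       dict with "user_id" and either "wake_text" or "first_voice"
    registrations: dict mapping user id to a set of lowercase registered words
    transcribe:    hypothetical ASR callable, required for "first_voice" requests
    """
    # S201: receive the wake-up word text or the first voice.
    registered = registrations.get(request["user_id"], set())
    if "wake_text" in request:
        text = request["wake_text"]
    else:
        if transcribe is None:
            raise ValueError("a transcribe callable is required for voice input")
        text = transcribe(request["first_voice"])
    # S202: compare with the registered wake-up words.
    passed = text.strip().lower() in registered
    # S203 / S204: return the pass or fail result to the smart device.
    return {"verified": passed}
```

Because the comparison uses the server's copy of the registered words, a wake-up word altered only on the device would fail this check.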
In addition, another wake-up processing method is provided in an embodiment of the present invention, including:
S301: in a monitoring state, collecting a first voice of a user, and if a wake-up word is detected, switching to a wake-up state;
S302: checking the text of the wake-up word detected in the monitoring state or the first voice corresponding to the wake-up word;
S303: if the check fails, terminating the wake-up state.
According to the wake-up processing method provided by the embodiment of the invention, after the intelligent device is woken up, the voice of the user is collected while the wake-up word is verified by the server in parallel, and the voice of the user collected in the wake-up state is not sent to the server before the server's verification result is obtained. Processing efficiency is thereby preserved while the privacy of the user is protected.
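The device-side behaviour summarized above can be sketched as a small state machine that holds the second voice locally until the verification result arrives. The class and method names are hypothetical, and returning to the monitoring state on failure is an assumed interpretation of "terminating the wake-up state".

```python
from enum import Enum
from typing import Callable, List, Optional


class State(Enum):
    MONITORING = "monitoring"
    AWAKE = "awake"


class WakeController:
    def __init__(self, send_to_server: Callable[[bytes], None]) -> None:
        self.state = State.MONITORING
        self._send = send_to_server
        self._pending: List[bytes] = []         # second voice held back locally
        self._verified: Optional[bool] = None   # None: result not yet known

    def on_wake_word_detected(self) -> None:
        """Switch to the wake-up state and start a fresh verification cycle."""
        self.state = State.AWAKE
        self._pending.clear()
        self._verified = None

    def on_second_voice(self, chunk: bytes) -> None:
        """Collect audio; upload it only once verification has passed."""
        if self.state is not State.AWAKE:
            return
        if self._verified:
            self._send(chunk)
        else:
            self._pending.append(chunk)

    def on_verification_result(self, passed: bool) -> None:
        """Flush buffered audio on success, discard it on failure."""
        if self.state is not State.AWAKE:
            return
        if passed:
            self._verified = True
            for chunk in self._pending:
                self._send(chunk)
            self._pending.clear()
        else:
            self._pending.clear()
            self._verified = None
            self.state = State.MONITORING  # assumed meaning of "terminate"
```

Buffering rather than dropping audio is what lets collection proceed without waiting for the server, while still guaranteeing that nothing captured in the wake-up state leaves the device before a passing result is received.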
Example two
Fig. 4 is a schematic structural diagram of a wake-up processing apparatus according to an embodiment of the present invention. The apparatus is disposed on an intelligent device and includes:
The first voice collecting module 11 is configured to collect a first voice of a user in a monitoring state, and to switch to a wake-up state if a wake-up word is detected. In addition, after switching to the wake-up state, the first voice collecting module 11 may notify the user that the device is recording, to remind the user to pay attention to personal privacy; the prompt may be an indicator light or a voice prompt.
The second voice collecting module 12 is configured to collect a second voice of the user in the wake-up state, and to send the text of the wake-up word detected in the monitoring state or the first voice corresponding to the wake-up word to the verification end for verification. The verification end may include any one or more of a server, a device other than the current device, and a processing unit in the current device for performing local verification.
The first processing module 13 is configured to terminate the wake-up state if the check fails. Specifically, taking the server as the verification end as an example, the wake-up state is terminated when a verification result indicating failure is received from the server, and the second voice is sent to the server when a verification result indicating a pass is received from the server.
Furthermore, the apparatus may further include:
The second processing module 14 is configured to receive a control instruction and/or content data, returned by the server, corresponding to the second voice or to the first voice and the second voice, and to execute processing according to the control instruction and/or the content data.
Fig. 5 is a second schematic structural diagram of a wake-up processing apparatus according to an embodiment of the present invention. The apparatus is disposed on a server and includes:
The wake-up word receiving module 21 is configured to receive the text of the wake-up word, or the first voice corresponding to the wake-up word, sent by the intelligent device.
The wake-up word checking module 22 is configured to check the text of the wake-up word or the first voice against the wake-up word registered in the user registration information, and to return a result indicating a pass to the intelligent device if the check passes, or a result indicating failure otherwise.
According to the wake-up processing apparatus provided by the embodiment of the invention, after the intelligent device is woken up, the voice of the user is collected while the wake-up word is verified by the server in parallel, and the voice of the user collected in the wake-up state is not sent to the server before the server's verification result is obtained. Processing efficiency is thereby preserved while the privacy of the user is protected.
Example three
Fig. 6 is a schematic structural diagram of the smart speaker according to the embodiment of the present invention, which includes:
The microphone 31 is configured to collect the user's voice.
The loudspeaker 32 is configured to play audio according to the control instruction.
The monitoring processing module 33 is configured to collect the first voice of the user through the microphone in a monitoring state, and to trigger switching to a wake-up state if a wake-up word is detected.
The wake-up processing module 34 is configured to collect the second voice of the user through the microphone in the wake-up state, and to send the text of the wake-up word detected in the monitoring state or the first voice corresponding to the wake-up word to the verification end for verification. The verification end may include any one or more of a server, a device other than the current device, and a processing unit in the current device for performing local verification.
The verification result processing module 35 is configured to terminate the wake-up state if the verification fails. Specifically, taking the server as the verification end as an example, the wake-up state is terminated when a verification result indicating failure is received from the server, and the second voice is sent to the server when a verification result indicating a pass is received from the server.
According to the smart speaker provided by the embodiment of the invention, after the speaker is woken up, the voice of the user is collected while the wake-up word is verified by the server in parallel, and the voice of the user collected in the wake-up state is not sent to the server before the server's verification result is obtained. Processing efficiency is thereby preserved while the privacy of the user is protected.
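The parallelism between collecting the second voice and verifying the wake-up word can be sketched with a background task. This is an illustration under stated assumptions only: `handle_wake`, `verify`, and `record_chunks` are hypothetical names, and a thread pool is one of several ways the overlap could be achieved.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def handle_wake(
    verify: Callable[[str], bool],
    record_chunks: Callable[[], Iterable[bytes]],
    wake_word_text: str,
) -> Optional[List[bytes]]:
    """Record the second voice while verification runs in the background.

    Returns the recorded audio if verification passed (it may now be sent
    to the server), or None if verification failed (the audio is discarded).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start verification without blocking audio capture.
        future = pool.submit(verify, wake_word_text)
        # Meanwhile, collect the second voice locally.
        second_voice = list(record_chunks())
        passed = future.result()
    return second_voice if passed else None
```

Because recording does not wait for the verification round trip, the user can continue speaking immediately after the wake-up word, and the audio is released only when the result is positive.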
Example four
The foregoing embodiments describe the processing flow and the device structure according to embodiments of the present invention. The functions of the method and the device can be implemented by an electronic device. Fig. 7 is a schematic structural diagram of the electronic device according to an embodiment of the present invention, which specifically includes: a memory 110 and a processor 120.
The memory 110 is configured to store a program.
In addition to the programs described above, the memory 110 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 110 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 120 is coupled to the memory 110 and executes the program in the memory 110 to perform the following processing:
in a monitoring state, collecting a first voice of a user, and if a wake-up word is detected, switching to a wake-up state and executing the following processing:
in the wake-up state, collecting a second voice of the user, and checking the text of the wake-up word detected in the monitoring state or the first voice corresponding to the wake-up word;
if the check fails, terminating the wake-up state.
Checking the detected text of the wake-up word or the first voice corresponding to the wake-up word may include:
sending the detected text of the wake-up word or the first voice corresponding to the wake-up word to a verification end for verification, wherein the verification end includes any one or more of a server, a device other than the current device, and a processing unit in the current device for performing local verification.
Checking the detected text of the wake-up word or the first voice corresponding to the wake-up word may also include:
sending the detected text of the wake-up word or the first voice corresponding to the wake-up word to a server for verification;
and terminating the wake-up state if a verification result indicating failure is received from the server, or sending the second voice to the server if a verification result indicating a pass is received from the server.
The wake-up word may be a quick wake-up instruction that both wakes the device and carries a control instruction; in that case, after switching to the wake-up state, a prompt indicating the recording state is issued to the user.
The wake-up word may be a regular wake-up word, and the processing may further include:
receiving a control instruction and/or content data corresponding to the second voice returned by the server, and executing processing according to the control instruction and/or the content data.
The wake-up word may be a quick wake-up instruction that both wakes the device and carries a control instruction, and the processing may further include:
receiving a control instruction and/or content data corresponding to the first voice and the second voice returned by the server, and executing processing according to the control instruction and/or the content data.
As another embodiment, the processing may include:
receiving the text of a wake-up word, or a first voice corresponding to the wake-up word, sent by the intelligent device; checking the text of the wake-up word or the first voice against the wake-up word registered in the user registration information; if the check passes, returning a result indicating a pass to the intelligent device, and otherwise returning a result indicating failure to the intelligent device.
As another embodiment, the processing may include:
in a monitoring state, collecting a first voice of a user, and if a wake-up word is detected, switching to a wake-up state;
checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the check fails, terminating the wake-up state.
The detailed description of the above processing procedure, the detailed description of the technical principle, and the detailed analysis of the technical effect are described in the foregoing embodiments, and are not repeated herein.
Further, as shown in fig. 7, the electronic device may further include: a communication component 130, a power component 140, an audio component 150, a display 160, and other components. Only some of the components are schematically shown in the figure, which does not mean that the electronic device includes only the components shown.
The communication component 130 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 130 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 130 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 140 provides power to the various components of the electronic device. The power component 140 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 150 is configured to output and/or input audio signals. For example, the audio component 150 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may further be stored in the memory 110 or transmitted via the communication component 130. In some embodiments, the audio component 150 also includes a speaker for outputting audio signals.
The display 160 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (13)

1. A wake-up processing method, comprising:
in a monitoring state, collecting a first voice of a user, if a wake-up word is detected, switching to a wake-up state, and executing the following processing:
In the awakening state, acquiring second voice of the user, and checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification is passed, processing is performed in response to the second voice, and if the verification is not passed, the awake state is terminated.
2. The method of claim 1, wherein verifying the text of the detected wake word or the first speech corresponding to the wake word comprises:
and sending the detected text of the awakening word or the first voice corresponding to the awakening word to a checking end for checking, wherein the checking end comprises any one or more of a server, other equipment ends except the current equipment and a processing unit used for executing local checking in the current equipment.
3. The method of claim 1, wherein verifying the text of the detected wake word or the first speech corresponding to the wake word comprises:
sending the detected text of the awakening word or the first voice corresponding to the awakening word to a server for verification;
and if a verification result that the verification returned by the server fails is received, terminating the awakening state, and if the verification result that the verification returned by the server passes is received, sending the second voice to the server.
4. The method of claim 1, wherein if the wake word is a quick wake command for executing wake and control commands, a prompt is issued to the user when switching to the wake state.
5. The method of claim 1, wherein the wake word is a regular wake word, the method further comprising:
and receiving a control instruction and/or content data corresponding to the second voice returned by the server, and executing processing according to the control instruction and/or the content data.
6. The method of claim 3, wherein the wake word is a quick wake instruction for executing wake and control instructions, the method further comprising:
and receiving a control instruction and/or content data corresponding to the first voice and the second voice returned by the server, and executing processing according to the control instruction and/or the content data.
7. A wake-up processing method, comprising:
receiving a text of a wake-up word sent by the intelligent equipment or a first voice corresponding to the wake-up word, verifying the text of the wake-up word or the first voice against a registered wake-up word in user registration information, if the verification passes, returning a result that the verification passes to the intelligent equipment, and otherwise returning a result that the verification fails to the intelligent equipment.
8. A wake-up processing apparatus comprising:
the first voice acquisition module is used for acquiring a first voice of a user in a monitoring state, and switching to an awakening state if an awakening word is detected;
the second voice acquisition module is used for acquiring second voice of the user in the awakening state and sending the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word to the verification end for verification, wherein the verification end comprises any one or more of a server, other equipment ends except the current equipment and a processing unit used for executing local verification in the current equipment;
and the first processing module is used for executing processing in response to the second voice under the condition that the verification is passed, and terminating the awakening state under the condition that the verification is not passed.
9. A wake-up processing apparatus comprising:
the awakening word receiving module is used for receiving the text of the awakening word sent by the intelligent equipment or the first voice corresponding to the awakening word;
and the awakening word checking module is used for checking the awakening words registered in the user registration information according to the text of the awakening words or the first voice, if the awakening words pass the checking, a result that the checking passes is returned to the intelligent equipment, and otherwise, a result that the checking fails is returned to the intelligent equipment.
10. An electronic device, comprising:
a memory for storing a program;
a processor, coupled to the memory, for executing the program for:
in a monitoring state, collecting a first voice of a user, if a wake-up word is detected, switching to a wake-up state, and executing the following processing:
in the awakening state, acquiring second voice of the user, and checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification is passed, processing is performed in response to the second voice, and if the verification is not passed, the awake state is terminated.
11. A smart sound box, comprising:
the microphone is used for collecting user voice;
the loudspeaker is used for playing audio according to the control instruction;
the monitoring processing module is used for collecting a first voice of a user through a microphone in a monitoring state, and triggering and switching to an awakening state if an awakening word is detected;
the awakening processing module is used for collecting the second voice of the user by the microphone in the awakening state, and sending the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word to the verification end for verification;
and the verification result processing module is used for executing processing in response to the second voice under the condition of receiving a verification result, returned by the verification end, that the verification passes, and terminating the awakening state under the condition of receiving a verification result, returned by the verification end, that the verification fails.
12. A wake-up processing method, comprising:
in a monitoring state, collecting a first voice of a user, and if a wake-up word is detected, switching to a wake-up state;
checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification is not passed, the awake state is terminated.
13. An electronic device, comprising:
a memory for storing a program;
a processor, coupled to the memory, for executing the program for:
in a monitoring state, collecting a first voice of a user, and if a wake-up word is detected, switching to a wake-up state;
checking the text of the awakening word detected in the monitoring state or the first voice corresponding to the awakening word;
if the verification is not passed, the awake state is terminated.
CN201910351630.6A 2019-04-28 2019-04-28 Awakening processing method and device, intelligent sound box and electronic equipment Pending CN111862965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910351630.6A CN111862965A (en) 2019-04-28 2019-04-28 Awakening processing method and device, intelligent sound box and electronic equipment

Publications (1)

Publication Number Publication Date
CN111862965A true CN111862965A (en) 2020-10-30

Family

ID=72965241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910351630.6A Pending CN111862965A (en) 2019-04-28 2019-04-28 Awakening processing method and device, intelligent sound box and electronic equipment

Country Status (1)

Country Link
CN (1) CN111862965A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886957A (en) * 2017-11-17 2018-04-06 广州势必可赢网络科技有限公司 The voice awakening method and device of a kind of combination Application on Voiceprint Recognition
US20180321905A1 (en) * 2017-05-03 2018-11-08 Transcendent Technologies Corp. Enhanced control, customization, and/or security of a sound controlled device such as a voice controlled assistance device
JP2019015952A (en) * 2017-07-05 2019-01-31 バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド Wake up method, device and system, cloud server and readable medium
CN109493849A (en) * 2018-12-29 2019-03-19 联想(北京)有限公司 Voice awakening method, device and electronic equipment
CN109545211A (en) * 2018-12-07 2019-03-29 苏州思必驰信息科技有限公司 Voice interactive method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634897A (en) * 2020-12-31 2021-04-09 青岛海尔科技有限公司 Equipment awakening method and device, storage medium and electronic device
CN112634897B (en) * 2020-12-31 2022-10-28 青岛海尔科技有限公司 Equipment awakening method and device, storage medium and electronic device
CN113066501A (en) * 2021-03-15 2021-07-02 Oppo广东移动通信有限公司 Method and device for starting terminal by voice, medium and electronic equipment
CN113066490A (en) * 2021-03-16 2021-07-02 海信视像科技股份有限公司 Prompting method of awakening response and display equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination