CN115132172A - Intelligent equipment awakening method and device - Google Patents

Intelligent equipment awakening method and device

Info

Publication number
CN115132172A
CN115132172A
Authority
CN
China
Prior art keywords
scene
wake
voice
awakening
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110313234.1A
Other languages
Chinese (zh)
Inventor
晏小辉
袁牧人
宋凯凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110313234.1A priority Critical patent/CN115132172A/en
Publication of CN115132172A publication Critical patent/CN115132172A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention discloses a smart device wake-up method and apparatus in the field of artificial intelligence. The method comprises: receiving a first voice containing a wake-up word, the wake-up word being a preset word for waking up the smart device; obtaining the confidence that the first voice contains the wake-up word; and determining a first scene, and the wake-up threshold corresponding to the first scene, according to the previous round of interaction intent corresponding to the first voice. The previous round of interaction intent is the interaction intent corresponding to a second voice; the second voice is a voice received before the first voice and contains an instruction statement for controlling the smart device to execute an operation; and the first scene is the scene corresponding to the previous round of interaction intent. The smart device is awakened when the confidence is greater than the wake-up threshold. In this technical scheme, the scene and the wake-up threshold corresponding to that scene are determined based on the previous round of interaction intent, so that whether to wake up the smart device is decided according to both the confidence and the wake-up threshold.

Description

Intelligent equipment awakening method and device
Technical Field
The invention relates to human-computer interaction technology in the field of artificial intelligence, and in particular to a smart device wake-up method and apparatus.
Background
As a research direction in the field of artificial intelligence, human-computer interaction has developed from interaction via keyboard, mouse, touch screen, and the like, represented by the Personal Computer (PC) and the smart phone, to voice interaction represented by intelligent dialog systems such as mobile phone voice assistants, smart speakers, smart large screens, and smart cars. Before voice interaction, the user needs to wake up the smart device by voice with a wake-up word such as "hi, schoolchild", so that the smart device enters a working state from the sleep state and can normally process user instructions. The quality of the voice wake-up effect therefore greatly influences the user experience of voice interaction.
The smart device is preset with a wake-up model; the collected user voice is input into the wake-up model to obtain the confidence that the user voice contains the wake-up word. If the confidence of the wake-up word is greater than the wake-up threshold, the smart device is awakened and enters a state in which it can normally process user instructions. Common wake-up models include statistical models such as the Hidden Markov Model (HMM) and the Deep Neural Network (DNN). In most current schemes, the wake-up threshold is a fixed value. If the wake-up threshold is set too high, then in crowded scenes such as a user's party, where sound sources from all directions mix together and voice timbres vary, the smart device is very difficult to wake up; if the wake-up threshold is set too low, false wake-ups easily occur in relatively quiet scenes such as when the user is resting. A fixed wake-up threshold therefore cannot meet the requirements that different users and different environments place on the wake-up threshold, resulting in poor user experience.
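For contrast with the scheme proposed below, the fixed-threshold behavior described above can be sketched as follows (the function and parameter names are illustrative; the patent gives no code):

```python
def should_wake(confidence: float, wake_threshold: float = 0.5) -> bool:
    """Fixed-threshold wake decision: wake only if the wake-word
    confidence produced by the wake-up model exceeds the threshold."""
    return confidence > wake_threshold

# A single fixed threshold cannot fit both noisy and quiet scenes:
# in a noisy party, genuine wake words tend to score lower confidence,
# so a high threshold misses them; in a quiet room, a low threshold
# lets stray speech through as false wake-ups.
```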
Disclosure of Invention
The embodiments of the invention provide a smart device wake-up method and apparatus to solve the problem that a fixed wake-up threshold cannot meet the requirements that different users and different environments place on the wake-up threshold, which results in poor user experience. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides an intelligent device wake-up method, which is applied to an intelligent device, and the method includes:
receiving a first voice containing a wake-up word, wherein the wake-up word is a preset word for waking up the intelligent device;
obtaining the confidence that the first voice contains the wake-up word;
determining a first scene and an awakening threshold corresponding to the first scene according to the previous round of interaction intention corresponding to the first voice; the last round of interaction intention is an interaction intention corresponding to a second voice, the second voice is a voice received before the first voice is received, the second voice contains an instruction statement used for controlling the intelligent device to execute operation, and the first scene is a scene corresponding to the last round of interaction intention;
and awakening the intelligent device under the condition that the confidence coefficient is greater than the awakening threshold value.
According to the technical scheme, the corresponding scene and the awakening threshold corresponding to the scene are determined based on the previous round of interaction intention, so that whether the intelligent equipment is awakened or not is determined according to the confidence coefficient and the awakening threshold.
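A minimal sketch of this decision flow, under the assumption of a simple lookup-table implementation (the function name, table entries, and threshold values are illustrative, not from the patent):

```python
def wake_decision(confidence, prev_intent, intent_to_scene, scene_thresholds,
                  default_scene="daily standby"):
    """Scene-aware wake decision: map the previous round's interaction
    intent to a scene, look up that scene's wake-up threshold, and wake
    only if the wake-word confidence exceeds it."""
    scene = intent_to_scene.get(prev_intent, default_scene)
    threshold = scene_thresholds[scene]
    return scene, confidence > threshold

# Illustrative mapping and per-scene thresholds (hypothetical values).
intent_to_scene = {
    "play music": "entertainment",
    "play children's content": "children",
}
scene_thresholds = {"entertainment": 0.4, "children": 0.35, "daily standby": 0.5}
```

With these tables, a confidence of 0.45 wakes the device after a "play music" round (threshold 0.4) but not in the daily standby scene (threshold 0.5), which is the scene-dependent behavior the first aspect describes.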
In one possible implementation, the wake-up threshold is a threshold that is automatically updated based on historical interactive behavior data, where the historical interactive behavior data includes a "scene trigger" count, which is the cumulative number of times a certain scene is triggered, and a "scene-wake event pair" count, which is the cumulative number of times the smart device is awakened in a certain scene.
In one possible implementation, the method further comprises:
when the smart device is awakened, updating the "scene trigger" count and the "scene-wake event pair" count corresponding to the first scene;
and updating the wake-up threshold corresponding to the first scene according to the updated "scene trigger" count and "scene-wake event pair" count, thereby achieving the effect of automatically adjusting the wake-up threshold based on historical interactive behavior data.
In a possible implementation, updating the wake-up threshold corresponding to the first scene according to the updated "scene trigger" count and "scene-wake event pair" count specifically includes:
updating the wake-up probability corresponding to the first scene according to the updated "scene trigger" count and "scene-wake event pair" count corresponding to the first scene;
and updating the wake-up threshold corresponding to the first scene according to the updated wake-up probability.
In this possible implementation, the wake-up probability corresponding to the first scene is automatically adjusted based on historical interactive behavior data, and the wake-up threshold corresponding to the first scene is then automatically adjusted according to the adjusted wake-up probability.
In a possible implementation, the wake-up probability corresponding to the first scene is calculated based on the following formula:

P(w|c1) = n(c1, w) / n(c1)

where P(w|c1) denotes the wake-up probability corresponding to the first scene, n(c1, w) denotes the "scene-wake event pair" count corresponding to the first scene, and n(c1) denotes the "scene trigger" count corresponding to the first scene.
In a possible implementation, the wake-up threshold corresponding to the first scene is calculated based on the following formula:

θ(c1) = θ(c0) - α · (P(w|c1) - P(w|c0))

where θ(c1) denotes the wake-up threshold corresponding to the first scene, θ(c0) denotes the wake-up threshold corresponding to the daily standby scene, which is 0.5, P(w|c1) denotes the wake-up probability corresponding to the first scene, P(w|c0) denotes the wake-up probability corresponding to the daily standby scene, and α denotes the adjustment amplitude.
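The probability and threshold formulas above can be sketched in code. The probability is simply wakes divided by triggers; for the threshold update, the exact functional form and the default α = 0.1 are assumptions made for illustration, since only the variables (θ(c0) = 0.5, P(w|c1), P(w|c0), α) are named here:

```python
def wake_probability(n_scene_wake: int, n_scene_trigger: int) -> float:
    """P(w|c) = n(c, w) / n(c): wake count divided by trigger count."""
    return n_scene_wake / n_scene_trigger

def updated_threshold(p_scene: float, p_standby: float,
                      theta_standby: float = 0.5, alpha: float = 0.1) -> float:
    """Assumed linear rule: lower the threshold for scenes where the
    user historically wakes the device more often than in daily
    standby, and raise it for scenes where wakes are rarer."""
    return theta_standby - alpha * (p_scene - p_standby)
```

For example, a scene with wake probability 0.6 against a standby probability of 0.3 gets its threshold lowered from 0.5 to 0.47 with α = 0.1, making the device more sensitive in that scene.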
In one possible implementation, the method further comprises:
recording a first wake-up event corresponding to the first voice, and determining a first time of the first wake-up event corresponding to the first voice.
In one possible implementation, the method further comprises:
determining a time difference between the first moment and a second moment, wherein the second moment is the moment at which the interaction intention corresponding to the second voice is determined;
and determining that the first voice and the second voice are in the same session when the time difference is smaller than a preset time threshold.
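This session check can be sketched as follows; the 30-second value is an assumed stand-in, since the preset time threshold is left unspecified here:

```python
PRESET_TIME_THRESHOLD_S = 30.0  # assumed value; the patent does not fix it

def in_same_session(wake_event_time: float, prev_intent_time: float,
                    threshold_s: float = PRESET_TIME_THRESHOLD_S) -> bool:
    """Two voice inputs belong to the same session when the gap between
    the wake event's moment and the moment the previous round's intent
    was determined is below the preset time threshold."""
    return abs(wake_event_time - prev_intent_time) < threshold_s
```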
In a second aspect, an embodiment of the present invention further provides an apparatus for waking up a smart device, comprising at least one processor, wherein the processor is configured to execute instructions stored in a memory so as to cause the apparatus to perform:
the method according to the first aspect and the various steps in the various possible implementations.
In a third aspect, an embodiment of the present invention provides a computer program product including instructions, which when run on a computer, cause the computer to perform:
the method according to the first aspect and the various steps in the various possible implementations.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method and various possible implementations of the first aspect are performed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of an artificial intelligence framework according to an embodiment of the present invention;
FIG. 2 is a diagram of a system architecture for automatically adjusting wake-up sensitivity according to an embodiment of the present invention;
fig. 3 is a detailed structural schematic diagram of a sensitivity control module according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a method for waking up an intelligent device according to an embodiment of the present invention;
fig. 5 is another flowchart of a method for waking up an intelligent device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the term "and/or" in this application merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first" and "second" in the description and claims of the embodiments of the present invention are used to distinguish different objects, not to describe a particular order of the objects. For example, the first voice and the second voice are used to distinguish different voices, not to describe a specific order of the target objects. In the embodiments of the present invention, words such as "exemplary", "for example", or "such as" are used to mean serving as an example, illustration, or explanation. Any embodiment or design described as "exemplary", "for example", or "such as" is not to be construed as more advantageous than other embodiments or designs; rather, these words are intended to present related concepts in a concrete fashion. In the description of the embodiments of the present invention, "a plurality" means two or more unless otherwise specified.
FIG. 1 shows a schematic diagram of an artificial intelligence body framework that describes the overall workflow of an artificial intelligence system, applicable to the general artificial intelligence field requirements.
The artificial intelligence topic framework described above is set forth below in terms of two dimensions, the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "smart information chain" reflects a list of processes processed from the acquisition of data. For example, the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making and intelligent execution and output can be realized. In this process, the data undergoes a "data-information-knowledge-wisdom" process of consolidation.
The 'IT value chain' reflects the value of the artificial intelligence to the information technology industry from the bottom infrastructure of the artificial intelligence, information (realization of providing and processing technology) to the industrial ecological process of the system.
(1) Infrastructure:
the infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform. Communicating with the outside through a sensor; the computing power is provided by intelligent chips (hardware acceleration chips such as a CPU, an NPU, a GPU, an ASIC, an FPGA and the like); the basic platform comprises distributed computing framework, network and other related platform guarantees and supports, and can comprise cloud storage and computing, interconnection and intercommunication networks and the like. For example, sensors and external communications acquire data that is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference means a process of simulating an intelligent human inference mode in a computer or an intelligent system, using formalized information to think about and solve a problem by a machine according to an inference control strategy, and a typical function is searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General purpose capabilities
After the above-mentioned data processing, further general capabilities may be formed based on the results of the data processing, such as algorithms or a general system, for example, translation, analysis of text, computer vision processing, speech recognition, recognition of images, and so on.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productize intelligent information decisions, and realize practical applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent medical treatment, intelligent security, automatic driving, safe cities, intelligent terminals, and the like.
To facilitate the explanation of the present solution, first a definition of terms that may be involved in embodiments of the present invention is given:
(1) Wake-up word: a preset word used for waking up the smart device. The smart device may enter the working state from the dormant state in response to a voice uttered by the user containing a wake-up word. The wake-up word may be, for example, "xiaoi", "hi, …", and the like.
(2) Wake-up sensitivity: the degree to which the smart device reacts to a voice input by the user containing a wake-up word. In the embodiment of the invention, the higher the awakening sensitivity is, the easier the intelligent device is awakened, and the lower the awakening sensitivity is, the less easy the intelligent device is awakened. The wake-up sensitivity may be in the form of, for example, "high", "medium", "low", or a quantized value. It should be clear that a person skilled in the art may also use the opposite logic of the embodiments of the invention to set the wake-up sensitivity without going beyond the scope covered by the present application.
(3) Wake-up threshold: the triggering conditions for the intelligent device to be awakened by the awakening words are represented, and the awakening thresholds of different awakening words can be the same or different. The wake-up threshold may be measured in terms of a numerical value. That is, the wake threshold is the minimum threshold for the smart device to wake up by the wake word. In general, when the confidence of a wakeup word contained in the user speech received by the intelligent device is greater than the wakeup threshold, the intelligent device is awakened; and when the confidence degree of the awakening words contained in the user voice received by the intelligent device is smaller than the awakening threshold value, the intelligent device is not awakened. It should be clear that a person skilled in the art may also use the opposite logic of the embodiments of the present invention to set the wake-up threshold without going beyond the scope covered by the present application.
(4) Wake-up event: indicates that the smart device is awakened when the confidence of the wake-up word contained in the user voice is greater than the wake-up threshold corresponding to the scene to which the previous round of interaction intent is mapped. For example, if the confidence of the wake-up word contained in the user voice is 0.6, which is greater than the wake-up threshold of 0.4 corresponding to that scene, the smart device is woken up. A wake-up event may be expressed as, for example, "play music, wake up", where "play music" is the previous round of interaction intent.
(5) Session: several user voice inputs that are relatively continuous in time. In the embodiment of the invention, a preset time threshold may be used to judge whether two successive user voice inputs are in the same session. For example, if the time difference between the wake-up event corresponding to a later user voice input received by the sensitivity control module 207 and the interaction intent corresponding to the earlier user voice input is smaller than the preset time threshold, the two voice inputs may be considered to be in the same session.
(6) And (3) interaction intention: characterizing the intent that the user's speech is intended to convey. For example, the user's voice "plays a popular song at will", and the mapped interaction intent is "play music". As another example, the user speech "play a children's story" and the mapped interaction intent is "play children's content".
(7) Scene: the environment of the session in which the smart device is located, judged from the user's voice input. The corresponding interaction intent is determined from the user's voice input, and the intent-to-scene mapping table (i.e., Table 1) maps the interaction intent to a corresponding scene, e.g., "entertainment", "children", and "daily standby" as shown in Table 1.
(8) "Scene trigger" count: characterizes the number of times a certain scene is triggered. The "scene trigger" count may be cumulatively recorded and updated. For example, assume the "scene trigger" count of the "daily standby" scene is n(c0) = 299; after the "daily standby" scene is triggered again, the count is updated to n(c0) = 299 + 1 = 300.
(9) "Scene-wake event pair" count: characterizes the number of times the smart device is awakened when a certain scene is triggered. The "scene-wake event pair" count may be cumulatively recorded and updated. For example, the "scene trigger" count of the "daily standby" scene is n(c0) = 299 (assume these 299 triggers include 99 in which the smart device was awakened and 200 in which it was not), so the original value of the "scene-wake event pair" count is n(c0, w) = 99. After the "daily standby" scene is triggered again, if the smart device is awakened, the count is updated to n(c0, w) = 99 + 1 = 100.
(10) Interactive behavior data: includes the "scene-wake event pair" count, the "scene trigger" count, an interaction information table, and the like. The "scene-wake event pair" count and the "scene trigger" count may be used to calculate the wake-up probability of the smart device in a certain scene and then update the wake-up threshold corresponding to that scene. The interaction information table records the time at which the sensitivity control module 207 receives a wake-up event and whether a wake-up event is received, as well as the interaction intent received by the sensitivity control module 207, the scene to which that intent is mapped, and the time at which the intent is received.
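The counters in definitions (8)-(10) can be sketched as a small bookkeeping class (the class and method names are hypothetical; the counts reuse the 299/99 example above):

```python
from collections import defaultdict

class InteractionStats:
    """Cumulative 'scene trigger' n(c) and 'scene-wake event pair'
    n(c, w) counters, as described in definitions (8) and (9)."""

    def __init__(self):
        self.scene_triggers = defaultdict(int)    # n(c)
        self.scene_wake_pairs = defaultdict(int)  # n(c, w)

    def record(self, scene: str, woke: bool) -> None:
        """Every trigger increments n(c); a wake also increments n(c, w)."""
        self.scene_triggers[scene] += 1
        if woke:
            self.scene_wake_pairs[scene] += 1

    def wake_probability(self, scene: str) -> float:
        """P(w|c) = n(c, w) / n(c), per the first formula above."""
        return self.scene_wake_pairs[scene] / self.scene_triggers[scene]
```

Replaying the "daily standby" example (299 triggers, 99 wakes, then one more triggered wake) yields n(c0) = 300 and n(c0, w) = 100.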
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. In the case of inconsistency, the meaning described in the specification or the meaning derived from the content described in the specification shall be applied. In addition, the terminology used in the present application is for the purpose of describing the embodiments of the present invention only and is not intended to be limiting of the present application.
Next, a possible application scenario and a solved technical problem of the solution provided by the embodiment of the present invention are first described.
The wide application of artificial intelligence systems in various fields has made human-computer interaction applications more and more popular. For example, in the smart home field, a user can control home appliances through voice recognition. Before voice interaction, the user needs to wake up the smart device by voice with a wake-up word such as "hi schoolmate", so that the smart device enters a working state from the dormant state and can normally process user instructions. The quality of the voice wake-up effect therefore greatly influences the user experience of voice interaction. If the wake-up threshold is set too high (i.e., the wake-up sensitivity is set too low), then in crowded scenes, where sound sources from all directions mix together and voice timbres vary, the smart device is very difficult to wake up; if the wake-up threshold is set too low (i.e., the wake-up sensitivity is set too high), false wake-ups easily occur in quiet scenes.
In one possible implementation, the smart device dynamically adjusts its wake-up threshold according to preset parameter information collected from its surrounding environment, so as to meet the user's requirements on the wake-up threshold in different usage scenes. However, this method does not consider user behavior: the preset parameter information collected from the surrounding environment is the same for different user behaviors. For example, the smart device collects the same preset parameter information whether the user's voice input contains a wake-up word, such as "i am a little scholar", or contains an instruction for controlling the smart device to execute an operation, such as "play a popular song". The preset parameter information may be user activity data; the user activity data may be, for example, network traffic data detected by a router connected to the smart device. As this analysis shows, the scheme only adjusts the wake-up threshold according to preset parameter information in the surrounding environment and cannot adjust it according to different user behaviors, degrading the experience of some users.
In another possible implementation, the smart device may adjust the wake-up threshold of a wake-up word by obtaining the wake-up threshold of at least one wake-up word and counting the number of abnormal wake-ups corresponding to that wake-up word, so that the threshold better adapts to changes in the environment in which the terminal is located. An abnormal wake-up means that the actual wake-up behavior of the smart device differs from the expected behavior; it may be a false wake-up or a missed wake-up. A false wake-up means the smart device was actually woken up when it should not have been; a missed wake-up means the smart device was not woken up when it should have been. This approach does not take user behavior and scenes into account, which may cause the device to swing back and forth between false wake-ups and missed wake-ups.
Therefore, the embodiment of the invention provides a smart device wake-up method. The method automatically identifies the scene based on the previous round of interaction intent within the user's same session; automatically calculates the user's wake-up probability for the smart device in that scene from historical interactive behavior data; and automatically adjusts the wake-up threshold according to that wake-up probability, thereby automatically adjusting the wake-up sensitivity. With this technical scheme, when the smart device is to be awakened, information such as the previous round of interaction intent corresponding to the user voice containing the wake-up word and the scene to which that intent is mapped can be automatically identified, so that the wake-up threshold is adjusted flexibly and automatically. The threshold is thus more reasonable and better adapted to changes in the scene where the smart device is located, the smart device responds better to the wake-up word, and user experience is improved. The above method may be implemented using the system architecture for automatically adjusting the wake-up threshold shown in fig. 2.
A detailed description of the system architecture for automatically adjusting the wake-up threshold is provided below.
Referring to fig. 2, the system architecture diagram includes, but is not limited to, a microphone 201, an audio signal acquisition and processing module 202, an audio stream Ring Buffer (Ring Buffer)203, a voice assistant Application (APP) 204, a voice assistant cloud 205, a wake-up engine 206, and a sensitivity control module 207, where a detailed structural diagram of the sensitivity control module 207 is shown in fig. 3, and includes, but is not limited to, an interactive behavior statistics sub-module 2071 and a scene recognition sub-module 2072.
In operation, the microphone 201 collects the user's voice input. The audio signal acquisition and processing module 202 obtains the voice input from the microphone 201, converts it into an audio stream, and sends the audio stream to the audio stream Ring Buffer 203. The wake-up engine 206 obtains, from the sensitivity control module 207, the wake-up threshold corresponding to the scene to which the previous round of interaction intention is mapped. The wake-up engine 206 monitors the audio stream in the Ring Buffer 203 in real time, segments it into a plurality of voice segments, and processes the segments as a stream. For example, if the confidence of the wake-up word obtained from the first voice segment is less than the wake-up threshold, the wake-up engine 206 processes the second voice segment; once the confidence of the wake-up word exceeds the wake-up threshold, the wake-up engine 206 stops processing subsequent voice segments, wakes up the smart device, and sends a wake-up event to the sensitivity control module 207. After the smart device wakes up successfully, the wake-up engine 206 activates the voice assistant APP 204, which then receives the user's voice input from the Ring Buffer 203. In one possible implementation, the voice assistant APP 204 sends the voice input to the voice assistant cloud 205 to request the corresponding interaction intention. It should be noted that the voice assistant cloud 205 may perform Natural Language Understanding (NLU) to obtain the interaction intention corresponding to the user's voice input.
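The streaming comparison described above (score each voice segment in turn and stop at the first one whose wake-word confidence exceeds the threshold) can be sketched as follows. This is a minimal illustration: the scoring function and the confidence values are stand-ins, not the patent's acoustic model.

```python
def detect_wake_word(segments, score_fn, wake_threshold):
    """Score voice segments one by one; stop at the first segment whose
    wake-word confidence exceeds the threshold (streaming processing)."""
    for i, seg in enumerate(segments):
        if score_fn(seg) > wake_threshold:
            return True, i  # wake up; remaining segments are skipped
    return False, None

# Illustrative confidences: the first segment scores below the
# threshold and the second above it, as in the description above.
confidences = {"segment-1": 0.3, "segment-2": 0.7}
woke, stop_index = detect_wake_word(["segment-1", "segment-2"],
                                    confidences.get, 0.5)
```

Once a segment crosses the threshold, later segments are never scored, which is what lets the engine wake the device before the whole utterance has been processed.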
After receiving the request from the voice assistant APP 204, the voice assistant cloud 205 returns the interaction intention corresponding to the user's voice input. Finally, the voice assistant APP 204 passes the interaction intention, which may be, for example, "play music" or "play children's content", to the sensitivity control module 207. At this point the sensitivity control module 207 has received one interaction intention, and it records information such as the time, the scene, and whether the event is a wake-up event. For convenience of description, the time when the sensitivity control module 207 receives the current interaction intention is denoted the third time, and the user voice corresponding to this intention is denoted the first user voice.
After the sensitivity control module 207 has received an interaction intention, if it then receives the next wake-up event passed to it by the wake-up engine 206, it determines whether the user voice corresponding to that wake-up event and the user voice corresponding to the last received interaction intention (the first user voice) are in the same session. The specific process is as follows:
When the sensitivity control module 207 receives the next wake-up event passed to it by the wake-up engine 206 (the user voice corresponding to this wake-up event is denoted the second user voice), the time of receipt is denoted the fourth time. The sensitivity control module 207 compares the fourth time with the third time (the time when it last received an interaction intention from the voice assistant APP 204); if the difference between the two is within a preset time threshold, the second user voice and the first user voice are considered to be in the same session. For example, suppose the sensitivity control module 207 receives the wake-up event at 13:01:00 on January 24, 2021, received the interaction intention from the voice assistant APP 204 at 13:00:00 on January 24, 2021, and the preset time threshold is 3 minutes. Since the difference between the two times is 1 minute, less than the preset time threshold of 3 minutes, the sensitivity control module 207 determines that the second user voice and the first user voice are in the same session. This is intended only as an example.
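The same-session test above is a simple time-difference comparison. A sketch under the stated assumptions (3-minute threshold, the example timestamps from the text):

```python
from datetime import datetime, timedelta

def in_same_session(third_time, fourth_time,
                    time_threshold=timedelta(minutes=3)):
    """The second user voice is in the same session as the first if the
    wake-up event (fourth time) follows the last recorded interaction
    intention (third time) within the preset time threshold."""
    return fourth_time - third_time <= time_threshold

# The example above: intention at 13:00:00, wake event at 13:01:00.
same = in_same_session(datetime(2021, 1, 24, 13, 0, 0),
                       datetime(2021, 1, 24, 13, 1, 0))
```

A wake-up event arriving more than 3 minutes after the last intention would start a new session instead.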
After the above steps, the interactive behavior statistics submodule 2071 of the sensitivity control module 207 counts the user interaction behavior data and can calculate the wake-up probability of the smart device in each scene, from which the wake-up threshold corresponding to each scene is calculated.
In the embodiment of the present invention, a system architecture diagram as shown in fig. 2 is used to implement a smart device wake-up method. How to automatically identify a scene based on the interaction intention of the previous round in the same session of the user, automatically calculate the wake-up probability of the smart device in the scene according to the historical interaction behavior data, and automatically adjust the wake-up threshold according to the wake-up probability of the smart device in the scene will be described in more detail below in conjunction with the system architecture diagram shown in fig. 2.
The microphone 201 shown in fig. 2 collects the user's third voice, which here denotes the user voice containing the wake-up word. The audio signal acquisition and processing module 202 shown in fig. 2 converts it into an audio stream and sends it to the audio stream Ring Buffer 203. The wake-up engine 206 queries the sensitivity control module 207 shown in fig. 2 for the wake-up threshold of the current scene. For example, the sensitivity control module 207 finds that the previous round of interaction intention in the same session is "standby state"; according to the intent-scene mapping table shown in table 1, the intention "standby state" in the second column maps to the scene "daily standby" in the first column, that scene maps to the wake-up threshold 0.5 in the third column, and the wake-up sensitivity requirement in this scene is "unchanged". As shown in fig. 2, the wake-up engine 206 monitors the audio stream in the Ring Buffer 203 in real time, divides it into a plurality of voice segments, and processes them as a stream. When the confidence of the wake-up word obtained by processing some voice segment exceeds the wake-up threshold of 0.5 corresponding to the "daily standby" scene, the wake-up engine 206 stops processing subsequent segments and wakes up the smart device.
Next, assume the preset wake-up word is "Xiaoyi". Taking the user's third voice "Xiaoyi, how is the weather today; Xiaoyi, what is the specific temperature" as an example, the process by which the wake-up engine 206 determines whether to wake up the smart device according to the third voice is described in detail:
After the audio signal acquisition and processing module 202 converts the third voice into an audio stream, one possible implementation is that the wake-up engine 206 cuts the audio stream into voice segments of a preset duration at preset intervals. For example, with a preset interval of 1 second and a preset duration of 5 seconds, seconds 0-5 form the first voice segment, seconds 1-6 the second, seconds 2-7 the third, and so on, until the whole audio stream is split. In the embodiment of the invention, the resulting segments may be "Xiaoyi" (voice segment A), "Xiaoyi, today's weather" (voice segment B), "how is the weather" (voice segment C), "how is it, Xiaoyi" (voice segment D), "Xiaoyi, specific temperature" (voice segment E), and "what temperature" (voice segment F). One possible implementation is that the wake-up engine 206 processes voice segments A, B, C, D, E, and F as a stream. First, the wake-up engine 206 processes voice segment A and outputs a confidence of 0.2 that the third voice contains the wake-up word "Xiaoyi", which is less than the wake-up threshold of 0.5. Next, it processes voice segment B and outputs a confidence of 0.3, still less than the wake-up threshold of 0.5. It then processes voice segment C and outputs a confidence of 0.2, again less than 0.5. Finally, it processes voice segment D and outputs a confidence of 0.6, which is greater than the wake-up threshold. At this point, the wake-up engine stops processing subsequent voice segments and wakes up the smart device.
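The sliding-window segmentation described above (5-second windows taken every 1 second) can be sketched as follows; segment boundaries are in seconds, and the 10-second example duration is illustrative:

```python
def split_into_segments(total_seconds, window=5, hop=1):
    """Cut an audio stream into windows of `window` seconds taken every
    `hop` seconds: 0-5 s, 1-6 s, 2-7 s, ... (as in the example above)."""
    return [(start, start + window)
            for start in range(0, total_seconds - window + 1, hop)]

# A 10-second stream yields six overlapping segments, matching the
# six segments A-F in the example.
segments = split_into_segments(10)
```

The overlap between consecutive windows is what allows a wake word spoken across a window boundary to still fall entirely inside some later segment.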
The smart device includes, but is not limited to, a smart speaker, a smart large screen, a smart vehicle, a mobile phone, a tablet, a notebook computer, a desktop computer, a watch, and a bracelet.
Table 1: intent-to-scene mapping table
| Scene | Interaction intention | Wake-up threshold | Wake-up sensitivity requirement |
| --- | --- | --- | --- |
| Daily standby | Standby state | 0.5 | Unchanged |
| Do not disturb | - | 0.6872 | Lowered |
| Entertainment | Play music; play audiobook; play game | 0.3837 | Raised |
| Children | Play children's content | - | - |
It should be noted that the wake-up threshold corresponding to the "daily standby" scene is preset and unchanged; it is always a fixed value (0.5 in the embodiment of the invention, as shown in table 1). The wake-up thresholds for other scenes change dynamically. Comparing the wake-up threshold of a given scene with that of the "daily standby" scene: if it is greater, the wake-up sensitivity needs to be reduced in that scene; if it is smaller, the wake-up sensitivity needs to be increased in that scene. For example, as shown in table 1, the wake-up threshold 0.6872 of the "do not disturb" scene is greater than the threshold 0.5 of the "daily standby" scene, so the wake-up sensitivity needs to be reduced in the "do not disturb" scene; the wake-up threshold 0.3837 of the "entertainment" scene is less than 0.5, so the wake-up sensitivity needs to be increased in the "entertainment" scene.
It should also be noted that multiple previous-round interaction intentions in the same session can map to the same scene. As shown in table 1, the interaction intentions "play music", "play audiobook", and "play game" all map to the same scene, "entertainment".
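The many-to-one mapping from intentions to scenes can be represented as a plain lookup table. The entries below are only those named in the surrounding text and table 1; the dictionary layout itself is an illustrative sketch, not the patent's data structure:

```python
# Intent -> scene entries named in the surrounding text (table 1).
INTENT_TO_SCENE = {
    "standby state": "daily standby",
    "play music": "entertainment",
    "play audiobook": "entertainment",
    "play game": "entertainment",
    "play children's content": "children",
}

def scene_for_intent(intent):
    """Map a previous-round interaction intention to its scene."""
    return INTENT_TO_SCENE.get(intent)

# Several intentions share one scene (the many-to-one property).
entertainment_intents = [i for i, s in INTENT_TO_SCENE.items()
                         if s == "entertainment"]
```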
After the smart device is woken up, the wake-up engine 206 activates the voice assistant APP 204 shown in fig. 2 and sends the wake-up event (triggered by the third voice containing the wake-up word) to the sensitivity control module 207 for recording. In one possible implementation, the interactive behavior statistics submodule 2071 of the sensitivity control module 207 (shown in fig. 3) writes the received wake-up event and the time of its receipt into the interaction information table (for example, tables 2 and 3); that is, after a new wake-up event occurs, the table is updated with a new wake-up event record, which may include information such as the time, the interaction intention, the scene, and whether the event is a wake-up event. For example, table 3 shows the interaction information table obtained by updating the original table (table 2) after a new wake-up event occurs. Compared with table 2, table 3 adds a record occurring at 11:00:01 on February 20, 2021; the interaction intention corresponding to the wake-up event is "none", the scene is "none", and "wake-up event" is "yes". Such a wake-up event may be triggered by a user voice that contains a wake-up word but no instruction controlling the smart device to perform an operation, e.g. "hi, Xiaoyi" or "Xiaoyi classmate".
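The record-keeping described above can be sketched as appending rows to a list of dictionaries. The field names and record shape are assumptions for illustration; the text only specifies which pieces of information a record carries:

```python
from datetime import datetime

# Interaction information table as a list of records (cf. tables 2 and 3).
interaction_table = [
    {"time": datetime(2021, 1, 30, 12, 54, 23),
     "intent": "play children's content", "scene": "children",
     "wake_event": False},
]

def record_wake_event(table, when):
    """Append a wake-event record; a bare wake-up carries no instruction,
    so its interaction intention and scene are recorded as none."""
    table.append({"time": when, "intent": None, "scene": None,
                  "wake_event": True})

record_wake_event(interaction_table, datetime(2021, 2, 20, 11, 0, 1))
```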
Next, taking as an example the wake-up event "previous round of interaction intention: standby state; wake-up", where the scene to which the previous round of interaction intention "standby state" maps is "daily standby", the process of calculating the wake-up probability of the smart device in the "daily standby" scene is described in detail:
The wake-up engine 206 sends the wake-up event "previous round of interaction intention: standby state; wake-up" to the sensitivity control module 207. "Wake-up" indicates that the event is a wake-up event, and "previous round of interaction intention: standby state" indicates that the last interaction intention received from the voice assistant APP 204 before this wake-up event was "standby state". The scene recognition submodule 2072 of the sensitivity control module (shown in fig. 3) maps the interaction intention "standby state" to the scene "daily standby" according to the intent-scene mapping table (table 1), and updates the count n(c_0, w) of "scene-wake-up event" pairs for the "daily standby" scene.
Assume the original value of n(c_0, w), i.e., the accumulated number of times the smart device was woken up within the preset time threshold in the "daily standby" scene before this wake-up event, is 500. After the wake-up event, the count increases by one, so n(c_0, w) is updated to n(c_0, w) = 500 + 1 = 501.
Assume further that the original value of the "scene trigger" count n(c_0) for the "daily standby" scene is 1000, i.e., the "daily standby" scene had been triggered 1000 times before this wake-up event. After the event, that count also increases by one, so n(c_0) is updated to n(c_0) = 1000 + 1 = 1001.
Accordingly, the wake-up probability P(w|c_i) of the smart device in a given scene is given by formula (1):

P(w|c_i) = n(c_i, w) / n(c_i)    (1)

where c_i represents a scene, i is an integer greater than or equal to 0, and w represents a wake-up event.
When i = 0, the wake-up probability of the smart device in the "daily standby" scene is:

P(w|c_0) = n(c_0, w) / n(c_0) = 501/1001 ≈ 0.5005

where c_0 represents the "daily standby" scene and w represents a wake-up event.

Here n(c_0) - n(c_0, w) = 1001 - 501 = 500 means that, over the 1001 times the "daily standby" scene was triggered, the smart device was not woken up within the preset time threshold a total of 500 times.
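Formula (1) and the daily-standby example can be checked with a few lines of arithmetic; the counts are the ones from the worked example above:

```python
def wake_probability(n_scene_wake, n_scene):
    """Formula (1): P(w|c_i) = n(c_i, w) / n(c_i)."""
    return n_scene_wake / n_scene

# Daily-standby counts after the wake-up event in the example above.
n_c0, n_c0_w = 1001, 501
p_daily_standby = wake_probability(n_c0_w, n_c0)
not_woken = n_c0 - n_c0_w  # scene triggers with no wake-up in time
```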
Table 2: mutual information table (1)
Time Intention to interact Scene Wake-up event
2020.12.01 12:30:05 Is free of Is free of Is that
2020.12.01 12:30:25 Playing children content Children's toy Whether or not
2021.01.30 12:54:08 Is free of Is free of Is that
2021.01.30 12:54:23 Playing children content Children's toy Whether or not
Table 3: Interaction information table (2)

| Time | Interaction intention | Scene | Wake-up event |
| --- | --- | --- | --- |
| 2020.12.01 12:30:05 | None | None | Yes |
| 2020.12.01 12:30:25 | Play children's content | Children | No |
| 2021.01.30 12:54:08 | None | None | Yes |
| 2021.01.30 12:54:23 | Play children's content | Children | No |
| 2021.02.20 11:00:01 | None | None | Yes |
Fig. 4 is a schematic flowchart of a method for waking up an intelligent device according to an embodiment of the present invention, where the schematic flowchart includes: S402-S420. The flow diagram is described in detail below.
S402, receiving second voice.
After the voice assistant APP204 is activated, the microphone 201 collects a second voice of the user. The second voice is the user voice for controlling the intelligent device to execute the operation instruction. For example, the second voice may be "play Song A". The audio signal collecting and processing module 202 collects the second voice of the user from the microphone 201, converts the second voice into an audio stream, and sends the audio stream to the Ring Buffer 203. The voice assistant APP204 sends the audio stream from the audio stream Ring Buffer 203 to the voice assistant cloud 205 shown in fig. 2, and acquires a corresponding interaction intention and instruction. The instructions are for instructing the smart device to perform an operation corresponding to the second voice.
S404, determining, according to the second voice, the current round of interaction intention corresponding to the second voice and the first scene corresponding to that interaction intention.
Taking the user's second voice "play a song for a party" as an example, the voice assistant cloud 205 returns to the voice assistant APP 204 the current round of interaction intention "play music" and the corresponding instruction "play song", together with the address of a song suitable for a party; the address may be, for example, a Uniform Resource Locator (URL).
The voice assistant APP 204 executes the instruction, obtains the party song from the URL, plays it, and sends the interaction intention "play music" to the sensitivity control module 207. The interactive behavior statistics submodule 2071 of the sensitivity control module 207 records this interaction intention: after a new interaction intention arrives, the interaction information table is updated with a new record containing the time the intention was received, the intention itself, the scene corresponding to the intention, and whether the corresponding event is a wake-up event. In addition, after the scene recognition submodule 2072 of the sensitivity control module 207 receives the interaction intention from the interactive behavior statistics submodule 2071, it maps the intention to the corresponding scene according to the intent-scene mapping table (table 1), and that scene may also be recorded in the interaction information table.
Still taking the second voice "play a song for a party" as an example, assume that the interaction information table before the smart device receives the second voice is table 3. After the smart device receives the second voice and the sensitivity control module 207 receives the corresponding interaction intention "play music", the interaction information table (table 3) is updated; the updated table may be, for example, as shown in table 4. Compared with table 3, table 4 contains one more record: a "play music" interaction intention occurring at 11:00:15 on February 20, 2021. Specifically, the voice assistant APP 204 sends the current round of interaction intention "play music" to the sensitivity control module 207; the scene recognition submodule 2072 maps it to the first scene "entertainment" according to the intent-scene mapping table (table 1); and the interactive behavior statistics submodule 2071 records in the interaction information table that the event occurring at 11:00:15 on February 20, 2021 has the current round of interaction intention "play music" and the first scene "entertainment". Here, the event refers to the event corresponding to a user voice that controls the smart device to execute an operation instruction.
Table 4: Interaction information table (3)

| Time | Interaction intention | Scene | Wake-up event |
| --- | --- | --- | --- |
| 2020.12.01 12:30:05 | None | None | Yes |
| 2020.12.01 12:30:25 | Play children's content | Children | No |
| 2021.01.30 12:54:08 | None | None | Yes |
| 2021.01.30 12:54:23 | Play children's content | Children | No |
| 2021.02.20 11:00:01 | None | None | Yes |
| 2021.02.20 11:00:15 | Play music | Entertainment | No |
S406, receiving a first voice. The first voice is the voice of the user containing the wake word when the intelligent device is awakened again.
Still taking the second voice "play a song for a party" as an example: after step S404 is performed, the smart device starts playing the party song. If the user wants to wake up the smart device again while it is playing, the user needs to speak a voice containing the wake-up word again. In the embodiment of the invention, this voice spoken again by the user is denoted the first voice. The microphone 201 of the smart device collects the first voice input by the user, e.g. "Xiaoyi classmate".
S408, determining the confidence of the wake-up word contained in the first voice; querying the previous round of interaction intention corresponding to the first voice (i.e., the current round of interaction intention corresponding to the second voice); and determining the first scene and the wake-up threshold corresponding to the first scene according to that previous round of interaction intention.
The wake-up engine 206 queries the previous round of interaction intention corresponding to the first voice (i.e., the current round of interaction intention corresponding to the second voice) from the sensitivity control module 207, and determines the first scene and the wake-up threshold corresponding to the first scene according to the intention-scene mapping table (i.e., table 1).
First, taking the first voice "Xiaoyi classmate" as an example, the process by which the wake-up engine 206 determines the confidence of the wake-up word contained in the first voice is described in detail:
The audio signal acquisition and processing module 202 converts the first voice into an audio stream and sends it to the audio stream Ring Buffer 203. The wake-up engine 206 monitors the audio stream in the Ring Buffer 203 in real time and cuts it into several voice segments; for example, the first voice "Xiaoyi classmate" may be cut into three segments: "Xiaoyi" (voice segment M), "yi classmate" (voice segment N), and "classmate" (voice segment P). The wake-up engine 206 processes voice segments M, N, and P as a stream to obtain the confidence of the wake-up word contained in the first voice. The interactive behavior statistics submodule 2071 of the sensitivity control module 207 then records the first wake-up event corresponding to the first voice. Finally, the sensitivity control module 207 calculates, from the time of the first wake-up event recorded by the interactive behavior statistics submodule 2071 and the time of the current round of interaction intention corresponding to the second voice, that the difference between the two times is less than the preset time threshold; it therefore determines that the first voice and the second voice are two user voices in the same session.
Next, taking the second voice "play a song for a party", the first scene "entertainment", and the first voice "Xiaoyi classmate" as an example, the process by which the wake-up engine 206 queries the sensitivity control module 207 for the wake-up threshold corresponding to the first scene "entertainment" is described in detail:
First, in response to the wake-up engine 206's request for the wake-up threshold, the sensitivity control module 207 queries the interaction information table and finds that the previous round of interaction intention in the same session is "play music", i.e., the interaction intention corresponding to the second voice is "play music". The sensitivity control module 207 then determines, according to the intent-scene mapping table (table 1), that the first scene corresponding to this previous round of interaction intention is "entertainment" and that the wake-up threshold corresponding to the first scene "entertainment" is 0.3837. Finally, the sensitivity control module 207 returns the wake-up threshold 0.3837 to the wake-up engine 206, completing the query.
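The lookup above (last recorded intention → scene → threshold) can be sketched as follows; the record shape and the fallback behavior when no intention is found are illustrative assumptions:

```python
def threshold_for_first_scene(records, intent_to_scene, scene_threshold):
    """Walk the interaction information table backwards to the last
    recorded interaction intention, map it to its scene, and return
    that scene's wake-up threshold."""
    for rec in reversed(records):
        if rec["intent"] is not None:
            scene = intent_to_scene[rec["intent"]]
            return scene, scene_threshold[scene]
    return None, None  # no previous intention on record

records = [{"intent": None}, {"intent": "play music"}]
scene, threshold = threshold_for_first_scene(
    records, {"play music": "entertainment"}, {"entertainment": 0.3837})
```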
S410, determining whether to awaken the intelligent device or not according to the confidence coefficient of the awakening word contained in the first voice and the awakening threshold corresponding to the first scene.
Still taking the first scene "entertainment" and the first voice "Xiaoyi classmate" as examples, step S410 is described in detail: when the wake-up engine 206 processes the voice segment M ("Xiaoyi") obtained in S408, if the output confidence of the wake-up word is 0.5, which is greater than the wake-up threshold 0.3837 corresponding to the first scene "entertainment", the wake-up engine 206 stops processing subsequent voice segments and wakes up the smart device again.
S412, if the intelligent device is awakened, updating the awakening probability of the intelligent device in the first scene.
After the smart device is woken up and before the wake-up probability of the smart device in the first scene is updated, the "scene trigger" count and the "scene-wake-up event pair" count corresponding to the first scene need to be updated. Taking the first scene "entertainment" as an example, the process of updating these two counts is described below:
First, taking the second voice "play a song for a party" with corresponding first scene "entertainment" as an example, the specific process of updating the "scene trigger" count n(c_1) corresponding to the first scene "entertainment" (denoted c_1) is as follows.
Assume n(c_1) = 299, i.e., the first scene "entertainment" had been triggered 299 times before this scene trigger. After this trigger, the count increases by one, so n(c_1) is updated to n(c_1) = 299 + 1 = 300. The "scene-wake-up event pair" count n(c_1, w) corresponding to the first scene "entertainment" is 200, i.e., in the "entertainment" scene the smart device has been woken up within the preset time threshold a total of 200 times.
In addition, n(c_1) - n(c_1, w) = 300 - 200 = 100 means that, over the 300 times the "entertainment" scene was triggered, the smart device was not woken up within the preset time threshold a total of 100 times.
Next, taking the occurrence of a wake-up event in the first scene "entertainment" as an example, the specific process of updating the "scene-wake-up event pair" count n(c_1, w) corresponding to the first scene "entertainment" (denoted c_1) is as follows.
After the confidence of the wake-up word contained in the first voice is determined in S408, the wake-up engine 206 sends the wake-up event "previous round of interaction intention: play music; wake-up" (denoted the first wake-up event) to the sensitivity control module 207 for recording. The interactive behavior statistics submodule 2071 of the sensitivity control module 207 writes the first wake-up event and the time of its receipt into the interaction information table, i.e., table 4 is updated after the new wake-up event occurs. The interactive behavior statistics submodule 2071 adds a record of the first wake-up event to table 4, and the updated table is shown as table 5. The record of the first wake-up event includes the time it was received and whether it is a wake-up event, and appears as the last row of table 5: compared with table 4, table 5 has one more record, the first wake-up event at 11:01:15 on February 20, 2021. The sensitivity control module 207 determines that the difference between the first time, at which the interactive behavior statistics submodule 2071 recorded the later first wake-up event, and the second time, at which it recorded the earlier "previous round of interaction intention: play music", is less than the preset time threshold, which indicates that the first voice and the second voice are in the same session; the "scene-wake-up event pair" count n(c_1, w) corresponding to the first scene "entertainment" is therefore increased by 1.
In the embodiment of the invention, assume the preset time threshold is 3 minutes. For example, the wake-up engine 206 sends the first wake-up event "previous round of interaction intention: play music; wake-up" to the sensitivity control module 207 at 11:01:15 on February 20, 2021. The sensitivity control module 207 determines that the difference between the first time, at which the interactive behavior statistics submodule 2071 recorded the first wake-up event (11:01:15 on February 20, 2021), and the second time, at which it recorded "previous round of interaction intention: play music" (11:00:15 on February 20, 2021), is 1 minute, less than the preset time threshold of 3 minutes, so the first voice and the second voice are in the same session. The scene recognition submodule 2072 of the sensitivity control module 207 maps the previous round of interaction intention "play music" to the first scene "entertainment" and updates the "scene-wake-up event pair" count n(c_1, w) for the "entertainment" scene: after this first wake-up event, the number of times the smart device was woken up within the preset time threshold in the "entertainment" scene increases by one. Thus n(c_1, w) is updated to n(c_1, w) = 200 + 1 = 201.
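The counter update above, gated on the same-session check, can be sketched with the example's numbers (300 scene triggers, 200 pairs before this wake-up event; the dictionary layout is illustrative):

```python
from datetime import datetime, timedelta

# Counts for the "entertainment" scene from the example above.
counts = {"triggers": 300, "wake_pairs": 200}

def on_wake_event(counts, intent_time, wake_time,
                  time_threshold=timedelta(minutes=3)):
    """Count a scene-wake-event pair only if the wake-up event falls in
    the same session as the previous-round intention."""
    if wake_time - intent_time <= time_threshold:
        counts["wake_pairs"] += 1

on_wake_event(counts,
              datetime(2021, 2, 20, 11, 0, 15),   # "play music" recorded
              datetime(2021, 2, 20, 11, 1, 15))   # first wake-up event
```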
Then, based on the updated "scene trigger" count and "scene-wake-up event pair" count, the wake-up probability of the smart device in the first scene can be updated. Taking the first scene "entertainment" as an example, according to formula (1), when i = 1, the wake-up probability of the smart device in the first scene "entertainment" is:

P(w|c_1) = n(c_1, w) / n(c_1) = 201/300 = 0.67

where c_1 represents the first scene "entertainment" and w represents the first wake-up event.
And S414, updating the awakening threshold corresponding to the first scene according to the updated awakening probability of the intelligent device in the first scene.
According to formula (2), the wake-up threshold corresponding to a certain scene is:

θ_{c_i} = θ_{c_0} − f(P(w|c_i))

wherein f(P(w|c_i)) = (P(w|c_i) − P(w|c_0))·α, θ_{c_0} and P(w|c_0) are respectively the wake-up threshold and the wake-up probability of the intelligent device in the "daily standby" scene, and α is the adjustment amplitude. The wake-up threshold corresponding to the first scene "entertainment" is therefore:

θ_{c_1} = θ_{c_0} − (P(w|c_1) − P(w|c_0))·α
In the embodiment of the present invention, α is assumed to be 0.7. It can be seen that the wake-up threshold of the first scene "entertainment" is reduced from 0.3837 to 0.38135; because the wake-up threshold decreases, the wake-up sensitivity increases. In this way, the wake-up threshold is adjusted automatically according to the previous round of interaction intention "play music", and the wake-up sensitivity is adjusted automatically with it. The intelligent device starts to play music in response to the user's second voice; because of the playing music, the ambient sound around the device becomes louder than it was before playback started, making it harder for the device to capture the user's voice command. In theory the device should therefore lower the wake-up threshold at this moment, i.e. raise the wake-up sensitivity, so that it can detect the user's voice containing the wake-up word more sensitively. The embodiment of the present invention provides the method flow shown in fig. 4: the method automatically identifies the scene based on the previous round of interaction intention within the same session of the user, automatically calculates the user's wake-up probability of the intelligent device in that scene from historical interactive behavior data, and automatically adjusts the wake-up threshold according to that wake-up probability, thereby automatically adjusting the wake-up sensitivity.
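The threshold adjustment of formula (2) can be sketched as below. The text states α = 0.7 and a daily-standby threshold of 0.5; the two probability values used in the example are illustrative assumptions, chosen only to show that a higher scene wake-up probability yields a lower threshold.

```python
def wake_threshold(p_scene: float, p_standby: float = 0.5,
                   theta_standby: float = 0.5, alpha: float = 0.7) -> float:
    # Formula (2): theta_c1 = theta_c0 - (P(w|c1) - P(w|c0)) * alpha
    return theta_standby - (p_scene - p_standby) * alpha

before = wake_threshold(p_scene=0.66)  # hypothetical probability before the event
after = wake_threshold(p_scene=0.67)   # probability rises after one more wake-up
# Higher wake-up probability -> lower threshold -> higher wake-up sensitivity.
assert after < before
```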
It should be noted that, if the time difference between the first time at which the interactive behavior statistics submodule 2071 records the first wake-up event and the second time at which it recorded the "previous round of interaction intention — play music" is greater than the preset time threshold of 3 minutes, the sensitivity control module 207 determines that the first voice and the second voice are not in the same session, and the intelligent device is then treated as having been woken up in the "daily standby" scene.
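The same-session decision above reduces to a time-difference comparison; a minimal sketch, using the 3-minute threshold and the timestamps from the example:

```python
from datetime import datetime, timedelta

SESSION_THRESHOLD = timedelta(minutes=3)  # preset time threshold from the text

def same_session(wake_time: datetime, intent_time: datetime) -> bool:
    """Two events belong to one session when less than 3 minutes apart."""
    return abs(wake_time - intent_time) < SESSION_THRESHOLD

# Times taken from the example above (2021.02.20): 1 min apart -> same session.
print(same_session(datetime(2021, 2, 20, 11, 1, 15),
                   datetime(2021, 2, 20, 11, 0, 15)))  # True
# A wake-up 5 minutes later would fall back to the daily-standby scene.
print(same_session(datetime(2021, 2, 20, 11, 5, 15),
                   datetime(2021, 2, 20, 11, 0, 15)))  # False
```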
Table 5: mutual information table (4)
Time Intention to interact Scene Wake-up event
2020.12.01 12:30:05 Is composed of Is free of Is that
2020.12.01 12:30:25 Playing children content Children's toy Whether or not
2021.01.30 12:54:08 Is free of Is composed of Is that
2021.01.30 12:54:23 Playing children content Children's toy Whether or not
2021.02.20 11:00:01 Is free of Is free of Is that
2021.02.20 11:00:15 Playing music Entertainment system Whether or not
2021.02.20 11:01:15 Is free of Is free of Is that
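From records like those in table 5, the statistics submodule's counting can be sketched as follows. The pairing rule follows the text: a wake-up event occurring within the 3-minute threshold after a scene intent increments that scene's "scene–wake-up event pair" count. The record layout and names here are assumptions for illustration.

```python
from datetime import datetime, timedelta

# (time, scene or None, is_wake_event) — the last session of table 5.
records = [
    (datetime(2021, 2, 20, 11, 0, 1), None, True),
    (datetime(2021, 2, 20, 11, 0, 15), "entertainment", False),
    (datetime(2021, 2, 20, 11, 1, 15), None, True),
]

scene_triggers = {}    # n(c): accumulated triggers per scene
scene_wake_pairs = {}  # n(c, w): wake-ups within the threshold per scene
last_scene, last_scene_time = None, None
for t, scene, is_wake in records:
    if scene is not None:
        scene_triggers[scene] = scene_triggers.get(scene, 0) + 1
        last_scene, last_scene_time = scene, t
    elif is_wake and last_scene is not None and t - last_scene_time < timedelta(minutes=3):
        scene_wake_pairs[last_scene] = scene_wake_pairs.get(last_scene, 0) + 1

print(scene_triggers)    # {'entertainment': 1}
print(scene_wake_pairs)  # {'entertainment': 1}
```

The first wake-up at 11:00:01 is not paired with any scene because no intent precedes it, matching the "daily standby" fallback described above.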
Repeating the above steps continuously updates the wake-up probability P(w|c_i) of the intelligent device and, in turn, the wake-up threshold θ_{c_i}, thereby continuously adjusting the wake-up sensitivity.
Fig. 5 is another schematic flow chart of a method for waking up an intelligent device according to an embodiment of the present invention; the flow includes S502-S508.
s502, receiving a first voice containing a wake-up word, wherein the wake-up word is a preset word for waking up the intelligent device.
In the embodiment of the invention, a preset word for waking up the intelligent device is used as the wake-up word. The smart device may receive a first voice containing a wake word.
S504, obtaining the confidence of the awakening words contained in the first voice.
In the embodiment of the invention, after receiving a first voice containing a wakeup word, the intelligent device acquires the confidence of the wakeup word contained in the first voice.
S506, according to the previous round of interaction intention corresponding to the first voice, determining a first scene and a wake-up threshold corresponding to the first scene; wherein the interaction intention of the previous round is an interaction intention corresponding to a second voice, the second voice is a voice received before the first voice is received, the second voice contains an instructional sentence for controlling the intelligent device to execute an operation, and the first scene is a scene corresponding to the interaction intention of the previous round.
In this embodiment of the present invention, the wake-up threshold in this step includes a threshold that is automatically updated based on historical interactive behavior data, where the historical interactive behavior data includes a "scene trigger" number and a "scene-wake-up event pair" number, where the "scene trigger" number is an accumulated trigger number of a certain scene, and the "scene-wake-up event pair" number is an accumulated wake-up number of the smart device in the certain scene.
S508, when the confidence coefficient is larger than the awakening threshold value, awakening the intelligent equipment.
According to the technical scheme, the corresponding scene and the awakening threshold corresponding to the scene are determined based on the previous round of interaction intention, so that whether the intelligent equipment is awakened or not is determined according to the confidence coefficient and the awakening threshold.
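The decision in S508 is a simple comparison of the wake-word confidence against the scene-specific threshold; a sketch with illustrative values (0.38135 is the lowered "entertainment" threshold from the earlier example, 0.5 the daily-standby threshold):

```python
def should_wake(confidence: float, wake_threshold: float) -> bool:
    """S508: wake the device only when confidence exceeds the threshold."""
    return confidence > wake_threshold

print(should_wake(0.45, 0.38135))  # True under the lowered "entertainment" threshold
print(should_wake(0.45, 0.5))      # False under the daily-standby threshold
```

The same confidence value can thus wake the device in one scene but not in another, which is precisely the scene-dependent sensitivity the scheme aims for.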
In the embodiment of the present invention, first, in a case of waking up the smart device, the "scene trigger" number and the "scene-wake event pair" number corresponding to the first scene are updated. And secondly, updating the awakening probability corresponding to the first scene according to the updated scene triggering times and scene-awakening event pair times corresponding to the first scene. Specifically, the wake-up probability corresponding to the first scene is updated according to the following formula:
P(w|c_1) = n(c_1, w) / n(c_1)

wherein P(w|c_1) represents the wake-up probability corresponding to the first scene, n(c_1, w) represents the "scene–wake-up event pair" count corresponding to the first scene, and n(c_1) represents the "scene trigger" count corresponding to the first scene. Finally, the wake-up threshold corresponding to the first scene is updated according to the updated wake-up probability of the intelligent device corresponding to the first scene. Specifically, the wake-up threshold corresponding to the first scene is updated according to the following formula:
θ_{c_1} = θ_{c_0} − (P(w|c_1) − P(w|c_0))·α

wherein θ_{c_1} represents the wake-up threshold corresponding to the first scene, θ_{c_0} represents the wake-up threshold corresponding to the daily standby scene, the wake-up threshold corresponding to the daily standby scene being 0.5, P(w|c_1) represents the wake-up probability corresponding to the first scene, P(w|c_0) represents the wake-up probability corresponding to the daily standby scene, and α represents the adjustment amplitude. According to the technical scheme, the wake-up probability corresponding to the first scene is automatically adjusted based on historical interactive behavior data, and the wake-up threshold corresponding to the first scene is then automatically adjusted according to the adjusted wake-up probability.
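The two updates above can be combined into one end-to-end sketch: on a wake-up in the first scene, both counters are incremented, the probability is recomputed, and the threshold is recomputed from it. α = 0.7 and the standby values 0.5 come from the text; the starting counts are hypothetical.

```python
def update_after_wake(n_pair: int, n_trigger: int,
                      p_standby: float = 0.5, theta_standby: float = 0.5,
                      alpha: float = 0.7):
    """Update counts, then P(w|c1), then the threshold, after one wake-up."""
    n_pair += 1      # one more "scene-wake event pair"
    n_trigger += 1   # one more "scene trigger"
    p_scene = n_pair / n_trigger
    theta_scene = theta_standby - (p_scene - p_standby) * alpha
    return n_pair, n_trigger, p_scene, theta_scene

# Hypothetical prior counts: 200 paired wake-ups out of 299 triggers.
n_pair, n_trigger, p, theta = update_after_wake(n_pair=200, n_trigger=299)
print(n_pair, n_trigger)  # 201 300
```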
It should be noted that, in this embodiment of the present invention, the interactive behavior statistics sub-module 2071 records a first wake-up event corresponding to the first voice, and determines a first time of the first wake-up event corresponding to the first voice. Further, the sensitivity control module 207 determines a time difference between the first time and a second time, where the second time is a time of the interaction intention corresponding to the second voice determined by the interaction behavior statistics submodule 2071; in the case that the time difference is smaller than a preset time threshold, the sensitivity control module 207 determines that the first voice and the second voice are in the same session.
An embodiment of the present invention further provides an apparatus for waking up an intelligent device, including at least one processor, where the processor is configured to execute a program stored in a memory, and when the program is executed, the apparatus is enabled to perform the following steps:
receiving a first voice containing a wake-up word, wherein the wake-up word is a preset word for waking up the intelligent device; obtaining the confidence of a wakeup word contained in the first voice; determining a first scene and a wake-up threshold corresponding to the first scene according to the previous round of interaction intention corresponding to the first voice; wherein the interaction intention of the previous round is an interaction intention corresponding to a second voice, the second voice is a voice received before the first voice is received, the second voice contains an instructional sentence for controlling the intelligent device to execute an operation, and the first scene is a scene corresponding to the interaction intention of the previous round; and awakening the intelligent device under the condition that the confidence coefficient is greater than the awakening threshold value.
Embodiments of the present invention also provide a computer program product containing instructions, which when run on a computer, cause the following steps to be performed by the computer:
receiving a first voice containing a wake-up word, wherein the wake-up word is a preset word for waking up the intelligent device; obtaining the confidence of a wakeup word contained in the first voice; determining a first scene and a wake-up threshold corresponding to the first scene according to the previous round of interaction intention corresponding to the first voice; wherein the interaction intention of the previous round is an interaction intention corresponding to a second voice, the second voice is a voice received before the first voice is received, the second voice contains an instructional sentence for controlling the smart device to perform an operation, and the first scene is a scene corresponding to the interaction intention of the previous round; and awakening the intelligent device under the condition that the confidence coefficient is greater than the awakening threshold value.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the following steps are performed:
receiving a first voice containing a wake-up word, wherein the wake-up word is a preset word for waking up the intelligent device; obtaining the confidence of a wakeup word contained in the first voice; determining a first scene and a wake-up threshold corresponding to the first scene according to the previous round of interaction intention corresponding to the first voice; wherein the interaction intention of the previous round is an interaction intention corresponding to a second voice, the second voice is a voice received before the first voice is received, the second voice contains an instructional sentence for controlling the intelligent device to execute an operation, and the first scene is a scene corresponding to the interaction intention of the previous round; and awakening the intelligent device under the condition that the confidence coefficient is greater than the awakening threshold value.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a few embodiments of the present invention, and those skilled in the art can make various modifications or alterations to the present invention without departing from the spirit and scope of the present invention as disclosed in the specification.

Claims (11)

1. A smart device wake-up method is applied to a smart device, and is characterized by comprising the following steps:
receiving a first voice containing a wake-up word, wherein the wake-up word is a preset word for waking up the intelligent device;
obtaining the confidence of a wakeup word contained in the first voice;
determining a first scene and a wake-up threshold corresponding to the first scene according to the previous round of interaction intention corresponding to the first voice; wherein the interaction intention of the previous round is an interaction intention corresponding to a second voice, the second voice is a voice received before the first voice is received, the second voice contains an instructional sentence for controlling the smart device to perform an operation, and the first scene is a scene corresponding to the interaction intention of the previous round;
and awakening the intelligent device under the condition that the confidence coefficient is greater than the awakening threshold value.
2. The method of claim 1, wherein the wake-up threshold comprises a threshold that is automatically updated based on historical interactive behavior data, wherein the historical interactive behavior data comprises a "scene trigger" number and a "scene-wake event pair" number, wherein the "scene trigger" number is an accumulated trigger number of a certain scene, and the "scene-wake event pair" number is an accumulated wake-up number of the smart device in the certain scene.
3. The method of claim 2, further comprising:
under the condition of waking up the intelligent equipment, updating the 'scene trigger' times and the 'scene-wake event pair' times corresponding to the first scene;
and updating the awakening threshold corresponding to the first scene according to the updated scene triggering times and scene-awakening event pair times corresponding to the first scene.
4. The method according to claim 3, wherein the updating the wake-up threshold corresponding to the first scene according to the updated "scene trigger" times and the "scene-wake-up event pair" times corresponding to the first scene specifically includes:
updating the awakening probability corresponding to the first scene according to the updated scene triggering times and scene-awakening event pair times corresponding to the first scene;
and updating the awakening threshold corresponding to the first scene according to the updated awakening probability of the intelligent equipment corresponding to the first scene.
5. The method of claim 4, wherein the wake-up probability corresponding to the first scenario is calculated based on the following formula:
P(w|c_1) = n(c_1, w) / n(c_1)

wherein the P(w|c_1) is used for representing the wake-up probability corresponding to the first scene, the n(c_1, w) is used for representing the "scene-wake event pair" number corresponding to the first scene, and the n(c_1) is used for representing the "scene trigger" number corresponding to the first scene.
6. The method of claim 5, wherein the wake-up threshold corresponding to the first scenario is calculated based on the following formula:
θ_{c_1} = θ_{c_0} − (P(w|c_1) − P(w|c_0))·α

wherein the θ_{c_1} is used for representing the wake-up threshold corresponding to the first scene, the θ_{c_0} is used for representing the wake-up threshold corresponding to the daily standby scene, the wake-up threshold corresponding to the daily standby scene being 0.5, the P(w|c_1) is used for representing the wake-up probability corresponding to the first scene, the P(w|c_0) is used for representing the wake-up probability corresponding to the daily standby scene, and α is used for representing the adjustment amplitude.
7. The method according to any one of claims 1 to 6, further comprising:
recording a first wake-up event corresponding to the first voice, and determining a first time of the first wake-up event corresponding to the first voice.
8. The method of claim 7, further comprising:
determining a time difference between the first moment and a second moment, wherein the second moment is a moment of determining an interaction intention corresponding to the second voice;
and under the condition that the time difference is smaller than a preset time threshold, determining that the first voice and the second voice are in the same conversation.
9. An intelligent device wake-up apparatus, comprising at least one processor configured to execute a program stored in a memory, which when executed, causes the apparatus to perform the method of any of claims 1-8.
10. A computer program product comprising instructions which, when the computer program product is run on a computer, cause the computer to perform the method according to any one of claims 1-8.
11. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-8.
CN202110313234.1A 2021-03-24 2021-03-24 Intelligent equipment awakening method and device Pending CN115132172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110313234.1A CN115132172A (en) 2021-03-24 2021-03-24 Intelligent equipment awakening method and device


Publications (1)

Publication Number Publication Date
CN115132172A true CN115132172A (en) 2022-09-30

Family

ID=83373954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110313234.1A Pending CN115132172A (en) 2021-03-24 2021-03-24 Intelligent equipment awakening method and device

Country Status (1)

Country Link
CN (1) CN115132172A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination