CN111192591B - Awakening method and device of intelligent equipment, intelligent sound box and storage medium - Google Patents


Info

Publication number
CN111192591B
CN111192591B (application CN202010085053.3A)
Authority
CN
China
Prior art keywords
intelligent
time
acoustic signal
information
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010085053.3A
Other languages
Chinese (zh)
Other versions
CN111192591A (en)
Inventor
刘洋 (Liu Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Shanghai Xiaodu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd, Shanghai Xiaodu Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010085053.3A priority Critical patent/CN111192591B/en
Publication of CN111192591A publication Critical patent/CN111192591A/en
Application granted granted Critical
Publication of CN111192591B publication Critical patent/CN111192591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Electric Clocks (AREA)

Abstract

The application discloses a method and a device for awakening intelligent equipment, an intelligent sound box and a storage medium, and relates to the technical field of voice recognition. The method is applied to a wireless network comprising a cloud end and two or more intelligent devices; the specific implementation scheme is as follows: when voice information containing a wake-up word is received, recording time information for identifying the wake-up word and acoustic signal intensity of the voice information; sending the time information and the acoustic signal intensity to a cloud; and receiving indication information sent by the cloud, wherein the indication information is used for indicating one intelligent device in the wireless network as a target device to enter an awakening mode. The method and the device can be applied to scenes in which a plurality of intelligent devices coexist, and can quickly select the intelligent device which is most likely to be awakened by the user, so that the chaotic voice interaction condition caused by the simultaneous awakening of the plurality of intelligent devices is avoided, the efficiency and the quality of the voice interaction are improved, and the user experience is better.

Description

Awakening method and device of intelligent equipment, intelligent sound box and storage medium
Technical Field
The present application relates to voice recognition technologies in the field of data processing technologies, and in particular, to a method and an apparatus for waking up an intelligent device, an intelligent sound box, and a storage medium.
Background
With the continuous development of speech recognition technology, more and more devices are provided with speech recognition functions. The smart speaker has a powerful voice recognition function, and can perform various operations through voice interaction with a user.
At present, when a plurality of intelligent sound boxes coexist, if a user utters voice information containing a wake-up word, all of the intelligent sound boxes may respond to the wake-up word simultaneously, so that all of them enter a listening state.
However, if multiple intelligent sound boxes are in the listening state at the same time, the voice interaction becomes chaotic, the on-site audio environment becomes noisy, and the user experience suffers.
Disclosure of Invention
The present application provides a wake-up method and apparatus for an intelligent device, an intelligent sound box, and a storage medium. The method can be applied to a scene in which multiple intelligent sound boxes coexist, quickly selecting the one intelligent sound box the user is most likely intending to wake up, thereby avoiding the chaotic voice interaction caused by multiple intelligent sound boxes waking up simultaneously, improving the efficiency and quality of voice interaction, and providing a better user experience.
In a first aspect, an embodiment of the present application provides a method for waking up an intelligent device, which is applied to a wireless network including a cloud and two or more intelligent devices; the method comprises the following steps:
when voice information containing a wake-up word is received, recording time information for identifying the wake-up word and acoustic signal intensity of the voice information;
sending the time information and the acoustic signal intensity to a cloud;
and receiving indication information issued by the cloud, wherein the indication information is used for indicating one intelligent device in the wireless network as a target device to enter an awakening mode.
In the embodiment, voice information containing awakening words is received; performing recognition processing on the voice information to obtain time information of the recognized awakening words and acoustic signal intensity; and sending the time information and the acoustic signal intensity to a cloud end, selecting one intelligent device from a wireless network through the cloud end as a target device, and controlling the target device to enter an awakening mode. Therefore, the method can be applied to the scene where multiple intelligent devices coexist, one intelligent device which is most likely to be awakened by the user can be quickly selected, the disordered voice interaction condition caused by the simultaneous awakening of the multiple intelligent devices is avoided, the efficiency and the quality of the voice interaction are improved, and the user experience is better.
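The device-side steps above can be illustrated with a minimal Python sketch. The message fields, the `recognize` hook, and the `measure_strength` hook are illustrative stand-ins, not names taken from the patent; real implementations would use the device's own wake-word engine and network stack.

```python
import time
from dataclasses import dataclass

# Hypothetical report a device sends to the cloud after detecting the wake word.
@dataclass
class WakeReport:
    device_id: str
    wake_word_timestamp: float   # first timestamp: moment the wake word was recognized
    signal_strength_db: float    # acoustic signal strength of the voice information

def on_voice_input(device_id, audio_frame, recognize, measure_strength):
    """If the frame contains the wake word, record the recognition time and
    acoustic signal strength and build the report to send to the cloud."""
    if recognize(audio_frame):                 # speech-recognition step
        return WakeReport(
            device_id=device_id,
            wake_word_timestamp=time.time(),   # record the recognition time
            signal_strength_db=measure_strength(audio_frame),
        )
    return None                                # no wake word: remain idle
```

The device then transmits the report and waits for the cloud's indication before entering the wake-up mode.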
In one possible design, before recording time information of the recognition of the wakeup word and the acoustic signal strength of the voice information, the method further includes:
and recognizing the wake-up word from the voice information through a speech recognition algorithm.
In this embodiment, when a user utters voice information containing the wake-up word, the intelligent device can recognize the wake-up word in the voice information through a speech recognition algorithm and then record the time information of the moment at which the wake-up word was recognized.
In one possible design, controlling the target device to enter an awake mode includes:
controlling the target device to listen;
and in a listening state, executing corresponding operation tasks according to the voice information of the user.
In this embodiment, after the target device enters the listening state, the other intelligent devices do not respond to the voice information sent by the user. The target device identifies the voice information, extracts the voice control instruction and executes corresponding operation according to the voice control instruction. Therefore, one-to-one interaction can be realized, confusion of voice interaction is avoided, and interaction efficiency and quality are improved.
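The listening-state behavior described above can be sketched as a simple keyword dispatch. The `transcribe` hook and the command table are illustrative assumptions; an actual device would use its full speech recognition and intent-parsing pipeline.

```python
def execute_voice_command(audio, transcribe, command_table):
    """Sketch of the listening state: transcribe the user's speech, extract a
    control instruction by keyword match, and execute the matching operation."""
    text = transcribe(audio)
    for keyword, action in command_table.items():
        if keyword in text:
            return action()          # run the corresponding operation task
    return None                      # no recognized control instruction
```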
In one possible design, the acoustic signal strength includes: the sound pressure intensity of the voice information received by the smart device.
In this embodiment, the sound pressure intensity of the voice information received by the intelligent device is used as a basis for judgment. Voice information attenuates as it propagates through open space, and the greater the distance between the intelligent device and the user, the greater the loss. The sound pressure intensity of the received voice information therefore reflects the distance between the user and the intelligent device, so the intelligent device closest to the user can be selected from the wireless network and woken into a listening state. This avoids the chaotic voice interaction caused by multiple intelligent devices waking up simultaneously, improves the efficiency and quality of voice interaction, and provides a better user experience.
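The attenuation-with-distance relationship can be made concrete under an idealized free-field model, where sound pressure level falls by 6 dB per doubling of distance: SPL(d) = SPL(d0) - 20·log10(d/d0). The patent only relies on comparing measured strengths, so this model is an illustrative assumption, not part of the claimed method.

```python
import math

def estimated_distance_ratio(spl_near_db, spl_far_db):
    """Under the idealized free-field model SPL(d) = SPL(d0) - 20*log10(d/d0),
    return the distance ratio d_far/d_near implied by two measured sound
    pressure levels. A 20 dB drop corresponds to 10x the distance."""
    return 10 ** ((spl_near_db - spl_far_db) / 20.0)
```

Real rooms add reflections and obstructions, which is why the measured strength is treated only as a relative judgment basis for picking the closest device.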
In one possible design, the smart devices are located within a preset geographic range, and at least two accounts of the smart devices are different.
The embodiment can be applied to a wireless network consisting of two or more intelligent devices, the intelligent devices in the wireless network are positioned in a preset geographic range, and the accounts of at least two intelligent devices in the wireless network are different. Through the comparison of the time or the voice intensity acquired by the awakening words, one intelligent device which is most likely to be awakened by the user can be quickly selected from the wireless network, the condition that voice interaction is disordered due to the fact that a plurality of intelligent devices are awakened simultaneously is avoided, the voice interaction efficiency and quality are improved, and user experience is better.
In a second aspect, an embodiment of the present application provides a method for waking up an intelligent sound box, which is applied to a wireless network including a cloud and two or more intelligent devices; the method comprises the following steps:
respectively receiving time information and acoustic signal strength sent by at least two intelligent devices;
respectively acquiring time differences corresponding to the at least two intelligent devices;
selecting one intelligent device from each intelligent device as a target device according to the time difference and the acoustic signal intensity corresponding to the at least two intelligent devices;
and sending indication information to the target equipment so as to enable the target equipment to enter an awakening mode.
In this embodiment, the cloud receives time information and acoustic signal intensity sent by the smart device, obtains a time difference between the time when the cloud receives the time information and the acoustic signal intensity and the time when the smart device recognizes the wakeup word, selects one smart device from the wireless network as the target device according to the time difference and the acoustic signal intensity, and controls the target device to enter the wakeup mode. Therefore, the method can be applied to the scene where multiple intelligent devices coexist, one intelligent device which is most likely to be awakened by the user can be quickly selected, the disordered voice interaction condition caused by the simultaneous awakening of the multiple intelligent devices is avoided, the efficiency and the quality of the voice interaction are improved, and the user experience is better.
In one possible design, the respectively obtaining the time differences corresponding to the at least two smart devices includes:
acquiring a first time stamp from the time information, and acquiring a time difference between the first time stamp and a second time stamp; the second timestamp is the time when the cloud receives the time information and the acoustic signal strength.
In a possible design, the selecting one smart device from the smart devices as a target device according to the time difference and the acoustic signal strength corresponding to the at least two smart devices includes:
selecting intelligent equipment with the acoustic signal intensity larger than a preset threshold value from all intelligent equipment as candidate equipment;
and selecting the candidate equipment with the minimum time difference from the candidate equipment as the target equipment.
In this embodiment, the wake-up method solves the problem that, when a user interacts with a plurality of intelligent devices, a single utterance of the wake-up word wakes up all of the devices at once, so that all of them respond to and execute the voice instruction. The acoustic signal strength, together with the time difference between wake-word recognition and the cloud's receipt of the time information, serves as the basis for judging the distance between each intelligent sound box and the user, so that the most appropriate intelligent sound box can be accurately selected for control. This avoids voice interference among multiple devices and optimizes the user's interaction experience across devices.
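The cloud-side selection described above (filter by acoustic signal strength, then pick the smallest time difference) can be sketched as follows. The data-structure shapes are illustrative assumptions about how the reports might be held in memory.

```python
def select_target_device(reports, received_at, strength_threshold_db):
    """Cloud-side selection: keep devices whose acoustic signal strength
    exceeds a preset threshold, then pick the one with the smallest time
    difference between the first timestamp (wake-word recognition on the
    device) and the second timestamp (receipt at the cloud).

    reports:     device_id -> (wake_word_timestamp, signal_strength_db)
    received_at: device_id -> cloud receive time (second timestamp)
    """
    candidates = [
        dev for dev, (_, strength) in reports.items()
        if strength > strength_threshold_db
    ]
    if not candidates:
        return None  # the patent leaves this case open; assume no device wakes
    return min(candidates,
               key=lambda dev: received_at[dev] - reports[dev][0])
```

The cloud would then send the indication information only to the returned device, which enters the wake-up mode while the others stay idle.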
In one possible design, the acoustic signal strength includes: the sound pressure intensity of the voice information received by the smart device.
In this embodiment, the sound pressure intensity of the voice information received by the intelligent device is used as a basis for judgment. Voice information attenuates as it propagates through open space, and the greater the distance between the intelligent device and the user, the greater the loss. The sound pressure intensity of the received voice information therefore reflects the distance between the user and the intelligent device, so the intelligent device closest to the user can be selected from the wireless network and woken into a listening state. This avoids the chaotic voice interaction caused by multiple intelligent devices waking up simultaneously, improves the efficiency and quality of voice interaction, and provides a better user experience.
In one possible design, the smart devices are located within a preset geographic range, and at least two account numbers of the smart devices are different.
The embodiment can be applied to a wireless network consisting of two or more intelligent devices, the intelligent devices in the wireless network are located in a preset geographic range, and the accounts of at least two intelligent devices in the wireless network are different. Through the comparison of the time or the voice intensity acquired by the awakening words, one intelligent device which is most likely to be awakened by the user can be quickly selected from the wireless network, the condition that voice interaction is disordered due to the fact that a plurality of intelligent devices are awakened simultaneously is avoided, the voice interaction efficiency and quality are improved, and user experience is better.
In a third aspect, the present application provides a wake-up apparatus for an intelligent device, which is applied to a wireless network including a cloud and two or more intelligent devices; the device comprises:
the recognition module is used for recording time information of the recognized awakening words and acoustic signal intensity of the voice information when the voice information containing the awakening words is received;
the sending module is used for sending the time information and the acoustic signal intensity to a cloud end;
and the receiving module is used for receiving indication information sent by the cloud end, wherein the indication information is used for indicating one intelligent device in the wireless network as a target device to enter an awakening mode.
In the embodiment, voice information containing awakening words is received; performing recognition processing on the voice information to obtain time information for recognizing the awakening word and acoustic signal intensity; and sending the time information and the acoustic signal intensity to a cloud end, selecting one intelligent device from a wireless network through the cloud end as a target device, and controlling the target device to enter an awakening mode. Therefore, the method can be applied to the scene where multiple intelligent devices coexist, one intelligent device which is most likely to be awakened by a user is rapidly selected, the chaotic voice interaction condition caused by the simultaneous awakening of the multiple intelligent devices is avoided, the efficiency and the quality of the voice interaction are improved, and the user experience is better.
In one possible design, the identification module is specifically configured to:
and recognizing the wake-up word from the voice information through a speech recognition algorithm.
In this embodiment, when a user utters voice information containing the wake-up word, the intelligent device can recognize the wake-up word in the voice information through a speech recognition algorithm and then record the time information of the moment at which the wake-up word was recognized.
In one possible design, further comprising: the control module is specifically configured to:
controlling the target device to listen;
and in a listening state, executing corresponding operation tasks according to the voice information of the user.
In this embodiment, after the target device enters the listening state, the other intelligent devices do not respond to the voice information sent by the user. The target device identifies the voice information, extracts the voice control instruction and executes corresponding operation according to the voice control instruction. Therefore, one-to-one interaction can be realized, confusion of voice interaction is avoided, and interaction efficiency and quality are improved.
In one possible design, the acoustic signal strength includes: the sound pressure intensity of the voice information received by the smart device.
In this embodiment, the sound pressure intensity of the voice information received by the intelligent device is used as a basis for judgment. Voice information attenuates as it propagates through open space, and the greater the distance between the intelligent device and the user, the greater the loss. The sound pressure intensity of the received voice information therefore reflects the distance between the user and the intelligent device, so the intelligent device closest to the user can be selected from the wireless network and woken into a listening state. This avoids the chaotic voice interaction caused by multiple intelligent devices waking up simultaneously, improves the efficiency and quality of voice interaction, and provides a better user experience.
In one possible design, the smart devices are located within a preset geographic range, and at least two account numbers of the smart devices are different.
The embodiment can be applied to a wireless network consisting of two or more intelligent devices, the intelligent devices in the wireless network are located in a preset geographic range, and the accounts of at least two intelligent devices in the wireless network are different. Through the comparison of the time or the voice intensity acquired by the awakening words, one intelligent device which is most likely to be awakened by the user can be quickly selected from the wireless network, the condition that voice interaction is disordered due to the fact that a plurality of intelligent devices are awakened simultaneously is avoided, the voice interaction efficiency and quality are improved, and user experience is better.
In a fourth aspect, the present application provides a wake-up apparatus for an intelligent device, which is applied to a wireless network including a cloud and two or more intelligent devices; the device comprises:
the receiving module is used for respectively receiving the time information and the acoustic signal strength sent by at least two intelligent devices;
the acquisition module is used for respectively acquiring time differences corresponding to the at least two intelligent devices;
the determining module is used for selecting one intelligent device from the intelligent devices as a target device according to the time difference and the acoustic signal intensity corresponding to the at least two intelligent devices;
and the sending module is used for sending indication information to the target device so as to enable the target device to enter a wake-up mode.
In this embodiment, the cloud receives time information and acoustic signal intensity sent by the smart device, obtains a time difference between the time when the cloud receives the time information and the acoustic signal intensity and the time when the smart device recognizes the wakeup word, selects one smart device from the wireless network as the target device according to the time difference and the acoustic signal intensity, and controls the target device to enter the wakeup mode. Therefore, the method can be applied to the scene where multiple intelligent devices coexist, one intelligent device which is most likely to be awakened by the user can be quickly selected, the disordered voice interaction condition caused by the simultaneous awakening of the multiple intelligent devices is avoided, the efficiency and the quality of the voice interaction are improved, and the user experience is better.
In one possible design, the obtaining module is specifically configured to: acquiring a first time stamp from the time information, and acquiring a time difference between the first time stamp and a second time stamp; the second timestamp is the time when the cloud receives the time information and the acoustic signal strength.
In this embodiment, the first timestamp and the second timestamp are respectively used for recording the time when the intelligent device recognizes the wakeup word and the time when the cloud receives the time information and the intensity of the acoustic signal; based on the time difference between the first timestamp and the second timestamp, the time consumed by the intelligent device for sending data to the cloud end can be reflected. Therefore, the speed of data interaction between the intelligent device and the cloud can be known, the intelligent device with high interaction speed is selected to be used as the target device to be awakened, and the use experience of a user can be improved.
In one possible design, the determining module is specifically configured to:
selecting intelligent equipment with acoustic signal intensity larger than a preset threshold value from all intelligent equipment as candidate equipment;
and selecting the candidate equipment with the minimum time difference from the candidate equipment as the target equipment.
In this embodiment, the wake-up method solves the problem that, when a user interacts with a plurality of intelligent devices, a single utterance of the wake-up word wakes up all of the devices at once, so that all of them respond to and execute the voice instruction. The acoustic signal strength, together with the time difference between wake-word recognition and the cloud's receipt of the time information, serves as the basis for judging the distance between each intelligent sound box and the user, so that the most appropriate intelligent sound box can be accurately selected for control. This avoids voice interference among multiple devices and optimizes the user's interaction experience across devices.
In one possible design, the acoustic signal strength includes: the sound pressure intensity of the voice information received by the smart device.
In this embodiment, the sound pressure intensity of the voice information received by the intelligent device is used as a basis for judgment. Voice information attenuates as it propagates through open space, and the greater the distance between the intelligent device and the user, the greater the loss. The sound pressure intensity of the received voice information therefore reflects the distance between the user and the intelligent device, so the intelligent device closest to the user can be selected from the wireless network and woken into a listening state. This avoids the chaotic voice interaction caused by multiple intelligent devices waking up simultaneously, improves the efficiency and quality of voice interaction, and provides a better user experience.
In one possible design, the smart devices are located within a preset geographic range, and at least two accounts of the smart devices are different.
The embodiment can be applied to a wireless network consisting of two or more intelligent devices, the intelligent devices in the wireless network are located in a preset geographic range, and the accounts of at least two intelligent devices in the wireless network are different. Through the comparison of the time or the voice intensity acquired by the awakening words, one intelligent device which is most likely to be awakened by the user can be quickly selected from the wireless network, the condition that voice interaction is disordered due to the fact that a plurality of intelligent devices are awakened simultaneously is avoided, the voice interaction efficiency and quality are improved, and user experience is better.
In a fifth aspect, the present application provides a smart speaker, comprising: a processor and a memory; the memory stores executable instructions of the processor; wherein the processor is configured to perform the wake-up method of the smart device of any of the first aspects via execution of the executable instructions.
In a sixth aspect, the present application provides a server, comprising: a processor and a memory; the memory stores executable instructions of the processor; wherein the processor is configured to perform the wake-up method of the smart device of any of the first aspects via execution of the executable instructions.
In a seventh aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for waking up a smart device according to any one of the first aspect.
In an eighth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the wake-up method of the smart device according to any one of the second aspects.
In a ninth aspect, an embodiment of the present application provides a program product, where the program product includes: a computer program stored in a readable storage medium, from which the computer program can be read by at least one processor of a server, execution of the computer program by the at least one processor causing the server to perform the wake-up method of a smart device of any of the first aspects.
One embodiment of the above application has the following advantages or benefits: the method can be applied to a scene in which multiple intelligent devices coexist and quickly selects the one intelligent sound box the user is most likely intending to wake up, improving the efficiency and quality of voice interaction and providing a better user experience. When voice information containing the wake-up word is received, the time information at which the wake-up word is recognized and the acoustic signal intensity of the voice information are recorded; the time information and the acoustic signal intensity are sent to the cloud; and indication information issued by the cloud is received, indicating one intelligent device in the wireless network as the target device to enter the wake-up mode. This technical means overcomes the problem of chaotic voice interaction caused by multiple intelligent devices being woken up at the same time: using the time information and acoustic signal intensity received at the cloud, the one intelligent device the user most likely intends to wake up is quickly selected, achieving the technical effects of improved voice interaction efficiency and quality and a better user experience.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic diagram of a wake-up method of a smart device, which can implement an embodiment of the present application;
FIG. 2 is a schematic diagram according to a first embodiment of the present application;
FIG. 3 is a schematic illustration according to a second embodiment of the present application;
FIG. 4 is a schematic illustration according to a third embodiment of the present application;
FIG. 5 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 6 is a schematic illustration according to a fifth embodiment of the present application;
fig. 7 is a block diagram of a smart sound box for implementing an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
The smart speaker has a powerful voice recognition capability and can perform various operations through voice interaction with a user. When multiple smart speakers coexist and a user utters voice information containing the wake-up word, all of the smart speakers may respond to the wake-up word at the same time, so that all of them enter the listening state. With multiple speakers listening simultaneously, voice interaction becomes chaotic, the voice broadcast environment of the scene becomes noisy, and the user experience is poor.
In view of the above technical problems, the present application provides a wake-up method and apparatus for a smart device, a smart speaker and a storage medium. They can be applied to scenes where multiple smart speakers coexist, quickly selecting the smart speaker the user is most likely to wake up, thereby avoiding the chaotic voice interaction caused by multiple smart speakers being woken up at the same time, improving the efficiency and quality of voice interaction, and providing a better user experience. The method provided by this application can be applied to smart devices with a voice interaction function, such as smart speakers, smartphones and smartwatches. The smart speaker is used as the example throughout this application; the implementation principle of other smart devices is the same.
Fig. 1 is a schematic diagram of the principle of a wake-up method for a smart device that can implement an embodiment of the present application. As shown in fig. 1, it is now common for a single family to purchase and use multiple smart devices. In the living environment of a typical Chinese family, when several speakers are present, a user shouting the wake-up word once often causes multiple speakers to respond at the same time. Meanwhile, because the speaker devices are at different distances from the user, a far speaker often receives the voice instruction poorly, leading to voice recognition errors; as a result, all of the woken speakers start to execute instructions and play sound, but the far speakers generally execute wrong instructions. The user then has to issue a stop instruction to each smart device executing a wrong instruction, which is cumbersome and inconvenient and makes for a poor interactive experience. The purpose of this application is to quickly select the one smart device the user is most likely to wake up, avoiding the chaotic voice interaction caused by multiple smart devices being woken up at the same time. According to this application, the smart device records the time information of recognizing the wake-up word and the acoustic signal strength of the voice information.
The acoustic signal strength can be used as a basis for judging distance, so that when a user facing multiple speaker devices speaks the wake-up word, the devices that receive the sound need not calculate their specific distance to the user: the Sound Pressure Level (SPL) can be obtained by comparing the sound picked up by each microphone, and a larger value indicates that the user is closer to that device. The cloud can determine which smart device responds to the user request according to the time difference between the first timestamp and the second timestamp together with the acoustic signal strength. The first timestamp is the time at which the smart device recognized the wake-up word, and the second timestamp is the time at which the cloud received the time information and the acoustic signal strength. The time difference between the two timestamps therefore indicates the response speed of the smart device under the current network load: the smaller the time difference, the lower the network delay and the faster the smart device responds to the user. The acoustic signal strength, in turn, serves as the basis for judging distance. Therefore, the smart device in the network with the smallest time difference and the strongest acoustic signal can be selected as the target device, and indication information is sent to it so that it enters the listening state to respond to the user request.
The method can thus be applied to scenes where multiple smart devices coexist: using the time information and acoustic signal strength received by the cloud, the one smart device the user is most likely to wake up is quickly selected, overcoming the technical problem of chaotic voice interaction caused by multiple smart devices being woken up at the same time and achieving the technical effects of improved efficiency and quality of voice interaction and better user experience.
Fig. 2 is a schematic view of a first embodiment of the present application, and as shown in fig. 2, the method in this embodiment may be applied to a wireless network including a cloud and two or more smart devices, and the method includes:
S101, when voice information containing the wake-up word is received, recording the time information of recognizing the wake-up word and the acoustic signal strength of the voice information.
In this embodiment, when receiving the voice information containing the wake-up word, the smart device records the time information of recognizing the wake-up word and the acoustic signal strength of the voice information. Based on the time information and the acoustic signal strength, the cloud can determine which smart device responds to the user request.
Specifically, it is now common for a single family to purchase and use multiple smart speakers. In the living environment of a typical Chinese family, when several speakers are present, shouting the wake-up word once often causes multiple speakers to respond at the same time. Meanwhile, because the speaker devices are at different distances from the user, a far speaker often receives the voice instruction poorly, leading to voice recognition errors; as a result, all of the woken speakers start to execute instructions and play sound, but the far speakers generally execute wrong instructions. The user then has to issue a stop instruction to each smart speaker executing a wrong instruction, which is cumbersome and inconvenient and makes for a poor interactive experience. The purpose of this application is to quickly select the one smart speaker the user is most likely to wake up, avoiding the chaotic voice interaction caused by multiple smart speakers being woken up at the same time. In this application, the smart speaker records the time information of recognizing the wake-up word and the acoustic signal strength of the voice information, and the cloud uses them as the basis for determining which smart device responds to the user request.
Optionally, before recording the time information of the recognized wake word and the acoustic signal strength of the voice information, the method further includes: and identifying the awakening words from the voice information through a voice identification algorithm.
Specifically, the smart speaker may recognize the wake-up word from the voice information through a voice recognition algorithm. Different systems configure different wake-up words, such as "Xiaodu Xiaodu" or "Xiao Ai Tongxue". When the smart speaker receives voice information, it can perform semantic recognition on it through the voice recognition algorithm. If the recognition result contains the configured wake-up word, the time information of recognizing the wake-up word and the acoustic signal strength of the voice information are recorded.
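As a minimal device-side sketch of the recording step just described (the function name, the pinyin wake-word strings, and the transcript/SPL inputs are illustrative assumptions, not part of the patent), the check-and-record logic might look like:

```python
import time

# Hypothetical wake-up words, e.g. "Xiaodu Xiaodu" / "Xiao Ai Tongxue"
WAKE_WORDS = ("xiao du xiao du", "xiao ai tong xue")

def on_voice_info(transcript: str, sound_pressure_db: float):
    """If the ASR transcript contains a configured wake-up word, record the
    recognition time and the acoustic signal strength; otherwise do nothing."""
    if any(word in transcript for word in WAKE_WORDS):
        return {
            "first_ts": time.time(),       # time of recognizing the wake-up word
            "spl_db": sound_pressure_db,   # acoustic signal strength of the voice info
        }
    return None  # no wake-up word: record nothing, upload nothing
```

Here `transcript` stands in for the output of the speech recognition algorithm and `sound_pressure_db` for the SPL measured by the microphone array; the returned record is what the device would then upload to the cloud.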
Optionally, the acoustic signal strength comprises: the sound pressure intensity of the voice information received by the smart device.
Specifically, the acoustic signal strength can be used as a basis for judging distance. After a user speaks the wake-up word to multiple devices at different distances, each device recognizes the wake-up word through its microphone array and measures the acoustic signal strength of the wake-up word's sound signal; the stronger the signal, the closer the distance. The principle is that sound waves are a form of energy propagation whose medium here is air; during propagation they undergo diffusion, absorption, scattering and other effects, so the energy of the sound wave gradually attenuates as the distance increases. For example, if the sound pressure levels at two points at distances r1 and r2 from a point sound source are Lp1 and Lp2 respectively, the sound pressure level difference between the distances r1 and r2 is Lp1 - Lp2 = 20 lg(r2/r1). When r2/r1 = 2 the attenuation is 6 dB; that is, each doubling of the distance attenuates the sound pressure level by 6 dB. The sound pressure level (SPL) is an index of sound pressure magnitude and a relative index of sound propagation energy; in theory it attenuates by 6 dB each time the propagation distance doubles. Therefore, when a user facing multiple speaker devices speaks the wake-up word, the devices that receive the sound need not calculate their specific distance to the user: it suffices to compare the SPL of the sound picked up by each microphone, and a larger value indicates that the user is closer to that device.
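The point-source attenuation formula above can be checked numerically. This short sketch simply evaluates Lp1 - Lp2 = 20 lg(r2/r1):

```python
import math

def spl_drop_db(r1: float, r2: float) -> float:
    """Sound pressure level difference (in dB) between distances r1 and r2
    from a point sound source: Lp1 - Lp2 = 20 * lg(r2 / r1)."""
    return 20.0 * math.log10(r2 / r1)

# Doubling the distance attenuates the level by about 6 dB:
print(round(spl_drop_db(1.0, 2.0), 2))  # 6.02
```

The 6 dB-per-doubling figure is what lets the cloud rank devices by raw SPL instead of computing absolute distances.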
Optionally, the smart device is located within a preset geographic range, and the accounts of at least two smart devices are different.
Specifically, when a user utters the wake-up word and a query, if multiple machines are logged in with the same account, the cloud can judge by position and account and send the query result to only one speaker rather than to every speaker on that account. When multiple machines are logged in with different accounts, however, each may be woken up and respond to the query separately; in this case, owing to network transmission and terminal load, the scene becomes very noisy once the query results are issued, and the user experience is poor. The method of this application quickly selects, within the local area, the one smart speaker the user is most likely to wake up, so it is applicable when the accounts of at least two smart speakers in the wireless network differ; that is, it can wake up a local multi-speaker setup logged in under any accounts.
S102, sending the time information and the acoustic signal strength to the cloud.
In this embodiment, the smart device may send the time information and the acoustic signal strength to the cloud, and the cloud performs analysis to determine which smart device in the wireless network is the target device that responds to the user request. The cloud wakes up only one smart device in the wireless network into the wake-up state, which improves the efficiency and quality of voice interaction and the user experience.
S103, receiving indication information issued by the cloud.
In this embodiment, the smart device may receive an indication message sent by the cloud, where the indication message is used to indicate that one smart device in the wireless network enters the wake-up mode as the target device.
In this embodiment, the target device may be controlled to listen; and in a listening state, executing corresponding operation tasks according to the voice information of the user.
Specifically, after the smart speaker receives the indication information, if the smart speaker is not the target device, the smart speaker maintains the silent state and does not respond to the user request. If the intelligent sound box is the target device, entering a listening state. In a listening state, the smart sound box can recognize the voice information of the user in real time through a voice recognition algorithm and execute a response operation. For example, there are smart speaker a and smart speaker B in the network, where smart speaker a is the target device, and smart speaker a enters the listening state. When the voice information of the user is 'music playing', the intelligent sound box A can identify the voice information and open a music player to play music.
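A sketch of the device-side reaction to the indication information described above (the message field and state names are assumed for illustration): only the named target device enters the listening state, all others remain silent.

```python
def on_indication(my_device_id: str, indication: dict) -> str:
    """React to the cloud's indication message: the indicated target device
    enters the listening state; every other smart device stays silent."""
    if indication.get("target_device_id") == my_device_id:
        return "listening"  # e.g. speaker A now handles "play music" directly
    return "silent"         # not the target: do not respond to the request
```

In the example from the text, with smart speakers A and B in the network and A as the target, `on_indication` would leave B silent while A enters the listening state and opens the music player on "play music".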
In this embodiment, when receiving voice information containing the wake-up word, the smart device records the time information of recognizing the wake-up word and the acoustic signal strength of the voice information; sends the time information and the acoustic signal strength to the cloud; receives indication information issued by the cloud, the indication information indicating one smart device in the wireless network as the target device; and controls the target device to enter the wake-up mode. The method can thus be applied to scenes where multiple smart devices coexist: using the time information and acoustic signal strength received by the cloud, the one smart speaker the user is most likely to wake up is quickly selected, overcoming the technical problem of chaotic voice interaction caused by multiple smart devices being woken up at the same time and achieving the technical effects of improved efficiency and quality of voice interaction and better user experience.
FIG. 3 is a schematic diagram according to a second embodiment of the present application; as shown in fig. 3, the method in this embodiment may be applied to a wireless network including a cloud and two or more smart devices, and the method includes:
S201, respectively receiving the time information and acoustic signal strength sent by at least two smart devices.
In this embodiment, the cloud can simultaneously receive the time information and acoustic signal strength sent by multiple smart devices.
Specifically, the cloud can simultaneously receive the time information recorded by multiple smart speakers and the acoustic signal strength of the voice information, and use them as the basis for determining which smart device responds to the user request.
Optionally, the acoustic signal strength comprises: the sound pressure intensity of the voice information received by the smart device.
Specifically, the acoustic signal strength can be used as a basis for judging distance. After a user speaks the wake-up word to multiple devices at different distances, each device recognizes the wake-up word through its microphone array and measures the acoustic signal strength of the wake-up word's sound signal; the stronger the signal, the closer the distance. The principle is that sound waves are a form of energy propagation whose medium here is air; during propagation they undergo diffusion, absorption, scattering and other effects, so the energy of the sound wave gradually attenuates as the distance increases. For example, if the sound pressure levels at two points at distances r1 and r2 from a point sound source are Lp1 and Lp2 respectively, the sound pressure level difference between the distances r1 and r2 is Lp1 - Lp2 = 20 lg(r2/r1). When r2/r1 = 2 the attenuation is 6 dB; that is, each doubling of the distance attenuates the sound pressure level by 6 dB. The sound pressure level (SPL) is an index of sound pressure magnitude and a relative index of sound propagation energy; in theory it attenuates by 6 dB each time the propagation distance doubles. Therefore, when a user facing multiple speaker devices speaks the wake-up word, the devices that receive the sound need not calculate their specific distance to the user: it suffices to compare the SPL of the sound picked up by each microphone, and a larger value indicates that the user is closer to that device.
Optionally, the smart device is located within a preset geographic range, and the account numbers of at least two smart devices are different.
Specifically, when a user utters the wake-up word and a query, if multiple machines are logged in with the same account, the cloud can judge by position and account and send the query result to only one speaker rather than to every speaker on that account. When multiple machines are logged in with different accounts, however, each may be woken up and respond to the query separately; in this case, owing to network transmission and terminal load, the scene becomes noisy once the query results are issued, and the user experience is poor. The method of this application quickly selects, within the local area, the one smart speaker the user is most likely to wake up, so it is applicable when the accounts of at least two smart speakers in the wireless network differ; that is, it can wake up a local multi-speaker setup logged in under any accounts.
S202, time differences corresponding to at least two intelligent devices are obtained respectively.
In this embodiment, the first timestamp may be obtained from the time information, and the time difference between the first timestamp and the second timestamp may be computed. The first timestamp is the time at which the smart device recognized the wake-up word, and the second timestamp is the time at which the cloud received the time information and the acoustic signal strength. The time difference between the two timestamps therefore indicates the response speed of the smart device under the current network load: the smaller the time difference, the lower the network delay and the faster the smart device responds to the user. The cloud can use this index as a basis for determining which smart device responds to the user request.
S203, selecting one intelligent device from the intelligent devices as a target device according to the time difference and the acoustic signal intensity corresponding to the at least two intelligent devices.
In this embodiment, the smart devices whose acoustic signal strength is greater than a preset threshold are selected as candidate devices; from the candidate devices, the one with the smallest time difference is selected as the target device.
Specifically, the above-described time difference indicates the response speed of the smart device under the network load. The smaller the time difference is, the lower the network delay is, and the faster the response speed of the intelligent device to the user is. The strength of the acoustic signal can be used as a basis for judging the distance. Therefore, the intelligent device with the smallest time difference and the strongest acoustic signal intensity can be selected as the target device in the network to enter a listening state to respond to the user request.
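A cloud-side sketch of the selection just described (the names are hypothetical, and the fallback when no device passes the SPL threshold is an assumption the patent does not specify): filter candidates by acoustic signal strength, then pick the candidate with the smallest first/second-timestamp difference.

```python
from dataclasses import dataclass

@dataclass
class Report:
    device_id: str
    first_ts: float   # first timestamp: device recognized the wake-up word
    second_ts: float  # second timestamp: cloud received this report
    spl_db: float     # acoustic signal strength (sound pressure level)

def pick_target(reports, spl_threshold_db: float) -> Report:
    """Select the target device: candidates are devices whose SPL exceeds
    the preset threshold; among them, the smallest time difference wins."""
    candidates = [r for r in reports if r.spl_db > spl_threshold_db]
    if not candidates:           # assumed fallback: nobody passed the threshold
        candidates = list(reports)
    return min(candidates, key=lambda r: r.second_ts - r.first_ts)
```

For example, with reports from three speakers where A and C both exceed the threshold, the cloud would pick whichever of the two shows the lower network delay and send the indication information only to it.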
S204, sending indication information to the intelligent device to enable the target device to enter an awakening mode.
In this embodiment, the cloud end can send instruction information to the intelligent device. The indication information is used for indicating one intelligent device in the wireless network as a target device so that the target device enters an awakening mode.
In this embodiment, the time information and acoustic signal strength sent by the smart devices are received; a first timestamp is obtained from the time information, and the time difference between the first timestamp and a second timestamp is obtained, the second timestamp being the time at which the cloud received the time information and the acoustic signal strength; one smart device is selected from the smart devices as the target device according to the time differences and acoustic signal strengths; and indication information is sent to the smart devices, the indication information indicating one smart device in the wireless network as the target device so that the target device enters the wake-up mode. The method can thus be applied to scenes where multiple smart devices coexist: using the time information and acoustic signal strength received by the cloud, the one smart speaker the user is most likely to wake up is quickly selected, overcoming the technical problem of chaotic voice interaction caused by multiple smart devices being woken up at the same time and achieving the technical effects of improved efficiency and quality of voice interaction and better user experience.
FIG. 4 is a schematic illustration according to a third embodiment of the present application; as shown in fig. 4, the method in this embodiment may include:
S301, when voice information containing the wake-up word is received, recording the time information of recognizing the wake-up word and the acoustic signal strength of the voice information.
S302, sending the time information and the acoustic signal intensity to a cloud terminal.
S303, receiving the time information and the acoustic signal strength sent by the smart devices.
S304, acquiring a first time stamp from the time information, and acquiring a time difference between the first time stamp and the second time stamp.
And S305, selecting one intelligent device from the intelligent devices as a target device according to the time difference and the strength of the acoustic signal.
S306, sending indication information to the intelligent device.
S307, receiving indication information issued by the cloud.
S308, controlling the target device to enter the wake-up mode.
In this embodiment, for concrete implementation processes and technical principles of steps S301 to S302 and steps S307 to S308, reference is made to relevant descriptions in steps S101 to S103 in the method shown in fig. 2, and details are not repeated here.
In this embodiment, please refer to the related description in step S201 to step S204 in the method shown in fig. 3 for the specific implementation process and technical principle of step S303 to step S306, which is not described herein again.
In this embodiment, when receiving voice information containing the wake-up word, the smart device records the time information of recognizing the wake-up word and the acoustic signal strength of the voice information; sends the time information and the acoustic signal strength to the cloud; receives indication information issued by the cloud, the indication information indicating one smart device in the wireless network as the target device; and controls the target device to enter the wake-up mode. The method can thus be applied to scenes where multiple smart devices coexist: using the time information and acoustic signal strength received by the cloud, the one smart speaker the user is most likely to wake up is quickly selected, overcoming the technical problem of chaotic voice interaction caused by multiple smart devices being woken up at the same time and achieving the technical effects of improved efficiency and quality of voice interaction and better user experience.
FIG. 5 is a schematic illustration according to a fourth embodiment of the present application; as shown in fig. 5, the apparatus in this embodiment may be applied to a wireless network including a cloud and two or more smart devices; the device comprises:
the recognition module 31 is configured to, when receiving the voice information including the wakeup word, record time information of the recognized wakeup word and an acoustic signal intensity of the voice information;
the sending module 32 is configured to send the time information and the acoustic signal strength to the cloud;
the receiving module 33 is configured to receive indication information issued by the cloud, where the indication information is used to indicate that an intelligent device in a wireless network enters an awake mode as a target device.
In the embodiment, the voice information containing the awakening words is received; carrying out recognition processing on the voice information to obtain time information for recognizing the awakening words and the strength of the acoustic signal; and sending the time information and the acoustic signal intensity to a cloud end, and selecting one intelligent device from the wireless network as a target device through the cloud end to enter an awakening mode. Therefore, the method can be applied to the scene where multiple intelligent devices coexist, one intelligent device which is most likely to be awakened by the user can be quickly selected, the disordered voice interaction condition caused by the simultaneous awakening of the multiple intelligent devices is avoided, the efficiency and the quality of the voice interaction are improved, and the user experience is better.
In one possible design, the identification module 31 is specifically configured to:
and identifying the awakening words from the voice information through a voice identification algorithm.
In this embodiment, when the user sends out the voice message including the wakeup word, the intelligent device may recognize the wakeup word in the voice message through the voice algorithm, and then record the time information when the wakeup word is recognized.
In one possible design, the control module 34 is further included, and is specifically configured to:
controlling the target equipment to listen;
and in a listening state, executing corresponding operation tasks according to the voice information of the user.
In this embodiment, after the target device enters the listening state, the other intelligent devices do not respond to the voice information sent by the user. And the target equipment identifies the voice information, extracts the voice control instruction and executes corresponding operation according to the voice control instruction. Therefore, one-to-one interaction can be realized, confusion of voice interaction is avoided, and interaction efficiency and quality are improved.
In one possible design, the acoustic signal strength includes: the sound pressure intensity of the voice information received by the smart device.
In this embodiment, the sound pressure level of the voice information received by the smart device is used as a basis for judgment: voice information loses energy as it propagates through open space, and the farther the smart device is from the user, the greater the loss. The sound pressure level of the voice information therefore reflects the distance between the user and the smart device, so the smart device closest to the user can be woken from the wireless network into the listening state. This avoids the chaotic voice interaction caused by multiple smart devices being woken up at the same time, improves the efficiency and quality of voice interaction, and gives a better user experience.
In one possible design, the smart devices are located within a preset geographic range, and at least two smart devices have different account numbers.
This embodiment can be applied to a wireless network composed of two or more smart devices, where the smart devices are located within a preset geographic range and the accounts of at least two of them differ. By comparing the acquisition times of the wake-up word or the voice signal strengths, the one smart device the user is most likely to wake up can be quickly selected from the wireless network, avoiding the chaotic voice interaction caused by multiple smart devices being woken up at the same time, improving the efficiency and quality of voice interaction, and giving a better user experience.
The wake-up apparatus of the intelligent device in this embodiment may execute the technical solutions in the methods shown in fig. 2 and fig. 4, and the specific implementation process and technical principle of the wake-up apparatus refer to the relevant descriptions in the methods shown in fig. 2 and fig. 4, which are not described herein again.
In this embodiment, when receiving voice information containing a wake-up word, the intelligent device records time information identifying when the wake-up word was recognized and the acoustic signal intensity of the voice information; sends the time information and the acoustic signal intensity to the cloud; receives indication information issued by the cloud, the indication information designating one intelligent device in the wireless network as the target device; and controls the target device to enter the wake-up mode. The method is thus applicable to scenarios where multiple intelligent devices coexist: from the time information and acoustic signal intensity received at the cloud, the one smart speaker the user most likely intends to wake is quickly selected, overcoming the technical problem of confused voice interaction caused by multiple intelligent devices waking simultaneously, and achieving the technical effect of improving the efficiency and quality of voice interaction and the user experience.
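A minimal sketch of this device-side flow follows. The JSON report layout, the field names, and the shape of the cloud's indication are assumptions for illustration; the patent does not specify a message format, and the transport between device and cloud is omitted:

```python
import json
import time

def build_wake_report(device_id: str, acoustic_strength: float) -> str:
    """Package what the device records on recognizing the wake-up word:
    the recognition time (the first timestamp) and the acoustic signal
    strength of the voice information, ready to send to the cloud."""
    return json.dumps({
        "device_id": device_id,
        "first_timestamp": time.time(),  # when the wake-up word was recognized
        "acoustic_strength": acoustic_strength,
    })

def should_wake(indication: dict, device_id: str) -> bool:
    """Enter the wake-up mode only if the cloud's indication information
    names this device as the target device."""
    return indication.get("target_device") == device_id
```

Only the recorded data and the final decision are shown; every device in the wireless network would run the same logic, and only the one named in the indication proceeds to the listening state.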
FIG. 6 is a schematic illustration according to a fifth embodiment of the present application; as shown in fig. 6, the apparatus in this embodiment may be applied to a wireless network including a cloud and two or more smart devices; the device comprises:
a receiving module 41, configured to receive the time information and acoustic signal strength sent by each of at least two smart devices;
an obtaining module 42, configured to obtain the time differences respectively corresponding to the at least two smart devices;
a determining module 43, configured to select one smart device from the smart devices as the target device according to the time differences and acoustic signal strengths corresponding to the at least two smart devices;
a sending module 44, configured to send indication information to the smart device, so that the target device enters the wake-up mode.
In this embodiment, the cloud receives the time information and acoustic signal intensity sent by each smart device, obtains the time difference between the moment the cloud received them and the moment the smart device recognized the wake-up word, selects one smart device from the wireless network as the target device according to the time differences and acoustic signal intensities, and controls the target device to enter the wake-up mode. The method is thus applicable to scenarios where multiple intelligent devices coexist: the one intelligent device the user most likely intends to wake is quickly selected, the confusion in voice interaction caused by multiple intelligent devices waking simultaneously is avoided, the efficiency and quality of voice interaction are improved, and the user experience is better.
In one possible design, the obtaining module 42 is specifically configured to: acquiring a first time stamp from the time information, and acquiring a time difference between the first time stamp and a second time stamp; the second timestamp is the time when the cloud receives the time information and the acoustic signal strength.
In this embodiment, the first timestamp and the second timestamp are respectively used for recording the time when the intelligent device recognizes the wakeup word and the time when the cloud receives the time information and the intensity of the acoustic signal; based on the time difference between the first timestamp and the second timestamp, the time consumed by the intelligent device for sending data to the cloud end can be reflected. Therefore, the speed of data interaction between the intelligent device and the cloud can be known, the intelligent device with high interaction speed is selected to be used as the target device to be awakened, and the use experience of a user can be improved.
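A minimal sketch of this computation by the obtaining module follows; the report structure and field names are assumptions for illustration, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class WakeReport:
    device_id: str
    first_timestamp: float   # when the device recognized the wake-up word
    acoustic_strength: float

def time_difference(report: WakeReport, second_timestamp: float) -> float:
    """Delay between wake-word recognition on the device (first timestamp)
    and receipt of its report at the cloud (second timestamp).

    A small difference suggests a fast uplink, making the device a better
    candidate to wake.
    """
    return second_timestamp - report.first_timestamp
```

The second timestamp would be stamped by the cloud on arrival of each report, so the difference captures the data-interaction speed between that device and the cloud.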
In one possible design, the determining module 43 is specifically configured to:
selecting, from among the smart devices, those whose acoustic signal intensity is greater than a preset threshold as candidate devices; and selecting, from the candidate devices, the one with the smallest time difference as the target device.
In this embodiment, the wake-up method for multiple intelligent devices solves the problem that, when a user interacts with several intelligent devices, a single utterance of the wake-up word wakes all of them simultaneously, so that all of them respond and execute the voice instruction. The acoustic signal intensity, together with the time difference until the cloud receives the time information, serves as the basis for judging the distance between each smart speaker and the user, so that the most suitable smart speaker can be accurately selected and controlled, voice interference among multiple devices is avoided, and the user's interaction experience across devices is optimized.
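The two-stage selection rule performed by the determining module can be sketched as follows. The threshold value and the tuple layout are assumptions for illustration:

```python
STRENGTH_THRESHOLD = 50.0  # preset threshold; the value here is assumed

def select_target(reports):
    """Pick the target device per the two-stage rule described above.

    `reports` is a list of (device_id, time_difference, acoustic_strength)
    tuples. Stage 1 keeps devices whose acoustic signal strength exceeds
    the preset threshold as candidates; stage 2 picks the candidate with
    the smallest time difference. Returns None if no device qualifies.
    """
    candidates = [r for r in reports if r[2] > STRENGTH_THRESHOLD]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r[1])[0]

reports = [("kitchen", 0.12, 62.0), ("bedroom", 0.08, 48.0), ("living", 0.09, 66.0)]
# "bedroom" fails the strength filter; "living" wins on the smaller time difference.
```

Filtering on strength first means a distant device with a fast uplink cannot outrank a nearby one, which matches the intent of waking the device closest to the user.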
In one possible design, the acoustic signal strength includes: the sound pressure intensity of the voice information received by the smart device.
In this embodiment, the sound pressure intensity of the voice information received by the smart device serves as the judgment basis. The voice information suffers a certain loss as it propagates through open space, and the greater the distance between the smart device and the user, the greater the loss. The sound pressure intensity of the voice information therefore reflects the distance between the user and the smart device, so the smart device closest to the user in the wireless network can be woken into the listening state. This avoids the confusion in voice interaction caused by multiple smart devices waking simultaneously, improves the efficiency and quality of voice interaction, and provides a better user experience.
In one possible design, the smart devices are located within a preset geographic range, and at least two smart devices have different account numbers.
This embodiment can be applied to a wireless network formed by two or more intelligent devices, where the intelligent devices are located within a preset geographic range and at least two of them are bound to different accounts. By comparing the times at which the wake-up word was recognized, or the acoustic signal intensities, the one intelligent device the user most likely intends to wake can be quickly selected from the wireless network. This avoids the confusion in voice interaction caused by multiple intelligent devices waking simultaneously, improves the efficiency and quality of voice interaction, and provides a better user experience.
The wake-up apparatus of the intelligent device in this embodiment may execute the technical solutions in the methods shown in fig. 3 and fig. 4, and specific implementation processes and technical principles of the wake-up apparatus refer to the relevant descriptions in the methods shown in fig. 3 and fig. 4, which are not described herein again.
In this embodiment, the cloud receives the time information and acoustic signal strength sent by the smart devices; acquires the first timestamp from the time information and obtains the time difference between the first timestamp and the second timestamp, the second timestamp being the time when the cloud received the time information and the acoustic signal strength; selects one smart device from the smart devices as the target device according to the time differences and acoustic signal strengths; and sends indication information designating one smart device in the wireless network as the target device, so that the target device enters the wake-up mode. The method is thus applicable to scenarios where multiple intelligent devices coexist: from the time information and acoustic signal intensity received at the cloud, the one smart speaker the user most likely intends to wake is quickly selected, overcoming the technical problem of confused voice interaction caused by multiple intelligent devices waking simultaneously, and achieving the technical effect of improving the efficiency and quality of voice interaction and the user experience.
FIG. 7 is a block diagram of a smart speaker used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 7, the smart speaker includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). A single processor 501 is taken as an example in fig. 7.
The memory 502 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the wake-up method of the smart device provided in the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the wake-up method of the smart device provided herein.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the wake-up method of the smart device in the embodiments of the present application. The processor 501 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 502, that is, implements the wake-up method of the smart device in the above method embodiments.
The memory 502 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application program required for at least one function, and the data storage area may store data created according to the use of the smart speaker, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, which may be connected to the smart speaker via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The smart speaker of fig. 7 may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 7.
The input device 503 may receive entered numeric or character information and generate key signal inputs related to user settings and function control of the smart speaker, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), GPUs (graphics processing units), FPGA (field-programmable gate array) devices, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited in this respect, as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (15)

1. The method for waking up the intelligent equipment is characterized by being applied to a wireless network comprising a cloud end and two or more intelligent equipment; the method comprises the following steps:
when voice information containing a wake-up word is received, recording time information for identifying the wake-up word and acoustic signal intensity of the voice information;
sending the time information and the acoustic signal intensity to a cloud;
receiving indication information issued by the cloud end, wherein the indication information is used for indicating one intelligent device in the wireless network to enter an awakening mode as a target device, the target device is determined by the cloud end according to time difference and acoustic signal intensity corresponding to at least two intelligent devices, the time difference is a time difference between a first time stamp and a second time stamp, the first time stamp is the time when the intelligent device recognizes an awakening word, and the second time stamp is the time when the cloud end receives the time information and the acoustic signal intensity.
2. The method according to claim 1, wherein before recording the time information of the recognized wake-up word and the acoustic signal strength of the voice information, the method further comprises:
and identifying the awakening words from the voice information through a voice identification algorithm.
3. The method of claim 1, further comprising:
controlling the target device to enter a listening state;
and in a listening state, executing corresponding operation tasks according to the voice information of the user.
4. The method of any of claims 1-3, wherein the acoustic signal strength comprises: the sound pressure intensity of the voice information received by the smart device.
5. The method according to any one of claims 1-3, wherein the smart devices are located within a preset geographic range, and account numbers of at least two smart devices are different.
6. The method for waking up the intelligent equipment is characterized by being applied to a wireless network comprising a cloud end and two or more intelligent equipment; the method comprises the following steps:
respectively receiving time information and acoustic signal strength sent by at least two intelligent devices;
acquiring a first time stamp from the time information, wherein the first time stamp is the time when the intelligent equipment identifies the awakening word;
acquiring a time difference between the first timestamp and a second timestamp, wherein the second timestamp is the time when the cloud receives the time information and the acoustic signal strength;
selecting one intelligent device from each intelligent device as a target device according to the time difference and the acoustic signal intensity corresponding to the at least two intelligent devices;
and sending indication information to the target equipment so as to enable the target equipment to enter an awakening mode.
7. The method according to claim 6, wherein selecting one smart device from the smart devices as a target device according to the time difference and the acoustic signal strength corresponding to the at least two smart devices comprises:
selecting intelligent equipment with the acoustic signal intensity larger than a preset threshold value from all intelligent equipment as candidate equipment;
and selecting the candidate equipment with the minimum time difference from the candidate equipment as the target equipment.
8. The method of claim 6, wherein the acoustic signal strength comprises: the sound pressure intensity of the voice information received by the smart device.
9. The method according to any one of claims 6-8, wherein the smart devices are located within a preset geographic range, and account numbers of at least two smart devices are different.
10. The awakening device of the intelligent equipment is characterized by being applied to a wireless network comprising a cloud end and two or more intelligent equipment; the device comprises:
the recognition module is used for recording time information of the recognized awakening words and acoustic signal intensity of the voice information when the voice information containing the awakening words is received;
the sending module is used for sending the time information and the acoustic signal intensity to a cloud end;
the receiving module is used for receiving indication information issued by the cloud end, the indication information is used for indicating one intelligent device in the wireless network to enter an awakening mode as a target device, the target device is determined by the cloud end according to time difference and acoustic signal intensity corresponding to at least two intelligent devices, the time difference is the time difference between a first time stamp and a second time stamp, the first time stamp is the time when the intelligent device recognizes an awakening word, and the second time stamp is the time when the cloud end receives the time information and the acoustic signal intensity.
11. The awakening device of the intelligent equipment is characterized by being applied to a wireless network comprising a cloud end and two or more intelligent equipment; the device comprises:
the receiving module is used for respectively receiving the time information and the acoustic signal strength sent by at least two intelligent devices;
the acquisition module is used for acquiring a first timestamp from the time information, wherein the first timestamp is the time of the intelligent equipment for identifying the awakening word; the time difference between the first time stamp and a second time stamp is acquired, and the second time stamp is the time when the cloud receives the time information and the acoustic signal intensity;
the determining module is used for selecting one intelligent device from the intelligent devices as a target device according to the time difference and the acoustic signal intensity corresponding to the at least two intelligent devices;
and the sending module is used for sending indication information to the intelligent equipment so as to enable the target equipment to enter an awakening mode.
12. An intelligent sound box, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
13. A server, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 6-9.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
15. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 6-9.
CN202010085053.3A 2020-02-10 2020-02-10 Awakening method and device of intelligent equipment, intelligent sound box and storage medium Active CN111192591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010085053.3A CN111192591B (en) 2020-02-10 2020-02-10 Awakening method and device of intelligent equipment, intelligent sound box and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010085053.3A CN111192591B (en) 2020-02-10 2020-02-10 Awakening method and device of intelligent equipment, intelligent sound box and storage medium

Publications (2)

Publication Number Publication Date
CN111192591A CN111192591A (en) 2020-05-22
CN111192591B true CN111192591B (en) 2022-12-13

Family

ID=70710399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010085053.3A Active CN111192591B (en) 2020-02-10 2020-02-10 Awakening method and device of intelligent equipment, intelligent sound box and storage medium

Country Status (1)

Country Link
CN (1) CN111192591B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917616A (en) * 2020-06-30 2020-11-10 星络智能科技有限公司 Voice wake-up control method, device, system, computer device and storage medium
CN112331197A (en) * 2020-08-03 2021-02-05 北京京东尚科信息技术有限公司 Response method and response device of electronic equipment, computer system and storage medium
CN112735391B (en) * 2020-12-29 2024-05-31 科大讯飞股份有限公司 Distributed voice response method and related device
CN112750439B (en) * 2020-12-29 2023-10-03 恒玄科技(上海)股份有限公司 Speech recognition method, electronic device and storage medium
CN112929724B (en) * 2020-12-31 2022-09-30 海信视像科技股份有限公司 Display device, set top box and far-field pickup awakening control method
CN115086096A (en) * 2021-03-15 2022-09-20 Oppo广东移动通信有限公司 Method, apparatus, device and storage medium for responding control voice
CN113241068A (en) * 2021-03-26 2021-08-10 青岛海尔科技有限公司 Voice signal response method and device, storage medium and electronic device
CN113628621A (en) * 2021-08-18 2021-11-09 北京声智科技有限公司 Method, system and device for realizing nearby awakening of equipment
CN113793608B (en) * 2021-09-06 2024-03-22 广州联动万物科技有限公司 Method and device for controlling intelligent household appliances through voice
CN114121003A (en) * 2021-11-22 2022-03-01 云知声(上海)智能科技有限公司 Multi-intelligent-equipment cooperative voice awakening method based on local area network
CN118251719A (en) * 2021-11-30 2024-06-25 华为技术有限公司 Control method and device of equipment
CN116935841A (en) * 2022-03-31 2023-10-24 华为技术有限公司 Voice control method and electronic equipment
CN115019793A (en) * 2022-05-31 2022-09-06 四川虹美智能科技有限公司 Awakening method, device, system, medium and equipment based on cooperative error correction
CN115312050A (en) * 2022-06-30 2022-11-08 青岛海尔科技有限公司 Command response method, storage medium and electronic device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180085931A (en) * 2017-01-20 2018-07-30 삼성전자주식회사 Voice input processing method and electronic device supporting the same
CN107919119A (en) * 2017-11-16 2018-04-17 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and the computer-readable medium of more equipment interaction collaborations
CN107919124B (en) * 2017-12-22 2021-07-13 北京小米移动软件有限公司 Equipment awakening method and device
CN109377987B (en) * 2018-08-31 2020-07-28 百度在线网络技术(北京)有限公司 Interaction method, device, equipment and storage medium between intelligent voice equipment
CN109391528A (en) * 2018-08-31 2019-02-26 百度在线网络技术(北京)有限公司 Awakening method, device, equipment and the storage medium of speech-sound intelligent equipment
CN110517676A (en) * 2019-08-21 2019-11-29 Oppo广东移动通信有限公司 A kind of voice awakening method and terminal, server, storage medium

Also Published As

Publication number Publication date
CN111192591A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111192591B (en) Awakening method and device of intelligent equipment, intelligent sound box and storage medium
US11227600B2 (en) Virtual assistant identification of nearby computing devices
US20160162469A1 (en) Dynamic Local ASR Vocabulary
CN111261159B (en) Information indication method and device
CN112533041A (en) Video playing method and device, electronic equipment and readable storage medium
CN108681440A (en) A kind of smart machine method for controlling volume and system
CN105122353A (en) Natural human-computer interaction for virtual personal assistant systems
CN112669831B (en) Voice recognition control method and device, electronic equipment and readable storage medium
CN110501918B (en) Intelligent household appliance control method and device, electronic equipment and storage medium
CN112908318A (en) Awakening method and device of intelligent sound box, intelligent sound box and storage medium
CN111755002B (en) Speech recognition device, electronic apparatus, and speech recognition method
WO2016094418A1 (en) Dynamic local asr vocabulary
CN105100672A (en) Display apparatus and method for performing videotelephony using the same
CN112530419B (en) Speech recognition control method, device, electronic equipment and readable storage medium
CN112382279B (en) Voice recognition method and device, electronic equipment and storage medium
CN110399474A (en) A kind of Intelligent dialogue method, apparatus, equipment and storage medium
CN112133307A (en) Man-machine interaction method and device, electronic equipment and storage medium
CN110633357A (en) Voice interaction method, device, equipment and medium
CN111128201A (en) Interaction method, device, system, electronic equipment and storage medium
CN110706701A (en) Voice skill recommendation method, device, equipment and storage medium
CN113157240A (en) Voice processing method, device, equipment, storage medium and computer program product
CN115810356A (en) Voice control method, device, storage medium and electronic equipment
CN112767916A (en) Voice interaction method, device, equipment, medium and product of intelligent voice equipment
CN111627441B (en) Control method, device, equipment and storage medium of electronic equipment
CN113160782B (en) Audio processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210518

Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant after: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

Applicant after: Shanghai Xiaodu Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant