CN110288997B - Device wake-up method and system for acoustic networking - Google Patents

Device wake-up method and system for acoustic networking

Info

Publication number
CN110288997B
CN110288997B
Authority
CN
China
Prior art keywords
intelligent voice
user
intelligent
voice device
awakening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910660543.9A
Other languages
Chinese (zh)
Other versions
CN110288997A (en)
Inventor
Zhou Qiang (周强)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Priority to CN201910660543.9A
Publication of CN110288997A
Application granted
Publication of CN110288997B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0638 Interactive procedures
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An embodiment of the invention provides a device wake-up method for acoustic networking. The method comprises: determining gain parameters for the various types of intelligent voice devices; monitoring each intelligent voice device under the acoustic network and, when devices are activated by a wake-up instruction, acquiring the audio recorded by each activated device together with its confidence in the wake-up instruction; applying gain calibration to the energy of each recorded audio according to the gain parameters, and from the calibrated audio determining the distance and direction between each candidate intelligent voice device and the user; and inputting the distance and direction between each candidate device and the user, together with the wake-up confidence, into a pre-trained neural network for information-fusion analysis, then waking up the intelligent device that will respond to the user. An embodiment of the invention also provides a device wake-up system for acoustic networking. The embodiments reduce the configuration requirements on intelligent voice devices in the acoustic network, effectively determine the target acoustic device that responds, and improve the user's interaction experience.

Description

Device wake-up method and system for acoustic networking
Technical Field
The invention relates to the field of intelligent voice interaction, in particular to a device awakening method and system for acoustic networking.
Background
With the rapid development of intelligent voice technology, more and more devices with voice interaction capabilities are available to users: smart speakers, smart televisions, smart story machines, and even comparatively advanced smart desk lamps now support voice interaction. These devices wake up in response to a user's wake-up instruction and then interact with the user.
When purchasing intelligent voice devices, a user may be loyal to a particular brand, or may buy devices of the same brand so that they work with the devices the user already owns. Devices of the same brand generally share the same wake-up phrase; for example, the wake-up instruction for every device might be "Little C, Little C". When the user utters this wake-up phrase in an environment containing several same-brand devices, all of the surrounding devices are mistakenly activated at once. To avoid this, the devices are configured under the same network using acoustic networking, and the acoustic network determines which intelligent device responds to the user.
In the course of implementing the invention, the inventor found at least the following problems in the related art:
to determine which intelligent device should respond, the position of each device relative to the user must be known. Typically this uses far-field sound-source localization or multiple microphone arrays: the sound intensity of the wake-up segment is measured, the direction of arrival is computed with a beamforming algorithm, and the user's position is then obtained by triangulation. However, these localization methods require the intelligent voice devices in the network to carry an ultrasonic system for measuring distance, and the devices in a real acoustic network are of heterogeneous types, so the configuration requirement is difficult to meet. Robustness is also poor, so a device that does not match the user's expectation (one that is far from the user or facing the wrong way) may be scheduled to respond, making these methods hard to apply to a real networked voice-device system.
Disclosure of Invention
Embodiments of the invention aim to solve at least the problems in the prior art that acoustic-networking schemes place high configuration requirements on devices, have poor robustness, and are difficult to apply to a real networked voice-device system.
In a first aspect, an embodiment of the present invention provides a device wake-up method for acoustic networking, including:
performing gain calibration on the different types of intelligent voice devices under the same acoustic networking protocol using a preset standard training voice, and determining a gain parameter for each type of device;
monitoring each intelligent voice device under the acoustic network and, when at least one device is activated by a user's wake-up instruction, acquiring the audio recorded during the activation by each candidate intelligent voice device together with each candidate device's confidence in recognizing the wake-up instruction;
applying energy-analysis gain calibration to the audio recorded by each candidate device according to that device's gain parameter; determining the distance between each candidate device and the user at least from a direct-sound to reverberation ratio analysis of the gain-calibrated audio; and determining the direction between each candidate device and the user from a high/low-frequency consistency analysis of the gain-calibrated audio;
and inputting the distance and direction between each candidate device and the user, together with the confidence in the wake-up instruction, into a pre-trained neural network for information-fusion analysis, and waking up the intelligent device that will respond to the user.
In a second aspect, an embodiment of the present invention provides a device wake-up system for acoustic networking, including:
a gain parameter determination program module, configured to perform gain calibration on the different types of intelligent voice devices under the same acoustic networking protocol using a preset standard training voice and to determine a gain parameter for each type of device;
a confidence determination program module, configured to monitor each intelligent voice device under the acoustic network and, when at least one device is activated by a user's wake-up instruction, to acquire the audio recorded during the activation by each candidate intelligent voice device together with each candidate device's confidence in recognizing the wake-up instruction;
a parameter determination program module, configured to apply energy-analysis gain calibration to the audio recorded by each candidate device according to that device's gain parameter, to determine the distance between each candidate device and the user at least from a direct-sound to reverberation ratio analysis of the gain-calibrated audio, and to determine the direction between each candidate device and the user from a high/low-frequency consistency analysis of the gain-calibrated audio;
and a wake-up program module, configured to input the distance and direction between each candidate device and the user, together with the confidence in the wake-up instruction, into a pre-trained neural network for information-fusion analysis, and to wake up the intelligent device that will respond to the user.
In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the steps of the device wake-up method for acoustic networking of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement the steps of the device wake-up method for acoustic networking according to any embodiment of the present invention.
The embodiment of the invention has the beneficial effects that: according to gain measurement in a distribution network, gain parameters of different devices are determined, so that the difference between a microphone and an acoustic structure carried by each intelligent voice device and the error of audio frequency recording amplitude are relieved, and the configuration requirement of intelligent voice in an acoustic network is reduced. Through the fusion analysis of the trained neural network, a plurality of dimensions are considered, the fed back target acoustic equipment is effectively determined, and the interaction experience of a user is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in their description are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a device wake-up method for acoustic networking according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a device wake-up system for acoustic networking according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a device wake-up method for acoustic networking according to an embodiment of the present invention, including the following steps:
s11: gain calibration is carried out on different types of intelligent voice equipment under the same acoustic networking protocol through preset standard training voice, and gain parameters of the different types of intelligent voice equipment are determined;
s12: monitoring each intelligent voice device under the acoustic networking, and when at least one intelligent voice device is activated by a wake-up instruction of a user, acquiring audio recorded by the intelligent voice device to be selected in each activation and confidence of each intelligent voice device to be selected for recognizing the wake-up instruction;
s13: performing energy analysis gain calibration on the audio recorded by each intelligent voice device according to the gain parameter of each intelligent voice device to be selected, determining the distance between each intelligent voice device to be selected and the user at least according to the direct sound and reverberation ratio analysis on the audio subjected to gain calibration, performing high-low frequency consistency analysis on the audio subjected to gain calibration, and determining the direction between the intelligent voice device to be selected and the user;
s14: and inputting the distance and the direction between each intelligent voice device to be selected and the user and the confidence coefficient of the awakening instruction into a pre-trained neural network for information fusion analysis, and awakening the intelligent device for feeding back the user.
In this embodiment, networking refers to network-building technology. There are many types of computer networks, classified differently according to the networking technology used; here, the acoustic network is a dedicated network built from intelligent voice devices.
When the acoustic network chooses a device to respond to the user, if it schedules a device that is far away, or considers only distance while ignoring the direction the user is facing and other factors, the scheduled device can still deliver a response, but the user's experience suffers because the responding device is far from the user, facing away from the user, or otherwise poorly placed.
For step S11: the intelligent voice devices in the network are of different types, for example a smart speaker, a smart television, a smart story machine, and a smart desk lamp. Because these devices carry different microphones and acoustic structures, the amplitudes of the audio they record also differ, so a gain measurement is performed on the different devices in the acoustic network to bring their recorded amplitudes to roughly the same level. A standard training voice is preset: a segment of standard speech is recorded and used to calibrate the gain of the different device types under the same acoustic networking protocol. An acoustic network is generally a group of devices sharing the same networking protocol; for example, devices of the same brand share a protocol. During network establishment, the same-brand devices are placed together, each records the segment of standard speech, and through gain verification against the standard voice each device obtains its own gain parameter.
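The calibration step above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the reference amplitude, function names, and device names are assumptions introduced for the example.

```python
import numpy as np

# Assumed common target amplitude that all devices are calibrated toward.
REFERENCE_RMS = 0.1

def rms(audio: np.ndarray) -> float:
    """Root-mean-square amplitude of a mono recording."""
    return float(np.sqrt(np.mean(np.square(audio))))

def calibrate_gains(standard_recordings: dict) -> dict:
    """Gain parameter per device: the factor that scales that device's
    recording of the shared standard training voice to REFERENCE_RMS."""
    return {dev: REFERENCE_RMS / rms(x) for dev, x in standard_recordings.items()}
```

After calibration, multiplying a device's later recordings by its gain parameter puts all devices' audio at a comparable amplitude, which is what the energy analysis in step S13 relies on.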
For step S12: after the acoustic network is established in step S11, the intelligent voice devices under it are monitored in real time. When at least one device is activated by the user's wake-up instruction, for example when the user says "Little C, Little C" in a bedroom containing a smart television, a smart speaker, and a smart story machine, all three devices are activated. The acoustic network then acquires the audio recorded during the activation by each candidate intelligent voice device (the smart television, smart speaker, and smart story machine). Because the three devices sit at different positions relative to the user, their recordings differ, and so do the confidences with which each device recognizes the wake-up instruction.
For step S13: the audio acquired in step S12 is given an energy-analysis gain calibration using the gain parameters determined in step S11, so that the recordings from the different devices share the same reference level. The direct-sound to reverberation ratio of each gain-calibrated recording is then analyzed to determine each device's distance from the user, and a high/low-frequency consistency analysis of the calibrated audio determines each device's direction relative to the user.
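The two acoustic cues of step S13 can be sketched as below. This is a hedged illustration: the patent does not disclose its direct-to-reverberation or high/low-frequency analyses, so `drr_proxy` and `hf_lf_ratio` are simplified stand-ins, and the 1 kHz band split and 50 ms frame are assumptions.

```python
import numpy as np

def band_energy(audio: np.ndarray, sr: int, lo: float, hi: float) -> float:
    """Energy in the [lo, hi) Hz band of a mono recording."""
    spec = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(len(audio), 1.0 / sr)
    return float(np.sum(spec[(freqs >= lo) & (freqs < hi)]))

def hf_lf_ratio(audio: np.ndarray, sr: int, split_hz: float = 1000.0) -> float:
    """Direction cue: speech radiated toward a device retains more
    high-frequency energy than speech radiated away from it, so a larger
    ratio suggests the user is facing the device."""
    hf = band_energy(audio, sr, split_hz, sr / 2)
    lf = band_energy(audio, sr, 20.0, split_hz)
    return hf / (lf + 1e-12)

def drr_proxy(audio: np.ndarray, sr: int, frame_ms: int = 50) -> float:
    """Crude distance cue in dB: treat the strongest short frame as
    direct-sound-dominated and the remainder as reverberation; the value
    falls as the talker moves away from the device."""
    n = max(1, int(sr * frame_ms / 1000))
    usable = len(audio) // n * n
    frames = np.square(audio[:usable]).reshape(-1, n).sum(axis=1)
    direct = frames.max()
    reverb = frames.sum() - direct
    return 10.0 * np.log10(direct / (reverb + 1e-12))
```

In a deployed system both cues would be computed on the gain-calibrated wake-word segment from each candidate device and mapped to a distance and direction estimate by a calibrated model.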
For step S14: at least the distance and direction between each candidate device and the user, together with the confidence in the wake-up instruction, are input into a pre-trained neural network for information-fusion analysis. The network is trained in advance on data from the intelligent devices; the training data contain known distances and directions between devices and the user, wake-up confidences, and the device chosen to respond. For example: smart television {confidence 85%, distance 2 m, direction: facing the user}, smart speaker {confidence 83%, distance 1.8 m, direction: facing away}, smart story machine {confidence 80%, distance 2.5 m, direction: facing away}. The network outputs 0.5, 0.3, and 0.2 respectively, so the device woken up to respond to the user is the smart television. The information-fusion analysis at least comprises fusion via a decision tree and/or a support vector machine and/or maximum likelihood.
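A minimal stand-in for the fusion step of S14 is sketched below. The patent does not disclose the pre-trained network's topology or weights, so a single linear scoring layer followed by a softmax across devices is used here; the feature order, weights, and device names are all illustrative assumptions.

```python
import numpy as np

# Hand-picked illustrative weights standing in for the trained network;
# feature order per device: [wake-word confidence, distance in m, facing flag].
W = np.array([2.0, -0.5, 1.0])

def fuse_and_pick(devices: dict):
    """Score each candidate device, softmax across devices, and return the
    per-device probabilities plus the device to wake up."""
    names = list(devices)
    scores = np.array([devices[n] for n in names]) @ W
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return dict(zip(names, probs.tolist())), names[int(np.argmax(probs))]

# Scenario mirroring the example above: the smart TV faces the user.
probs, winner = fuse_and_pick({
    "smart_tv":      [0.85, 2.0, 1.0],  # facing the user
    "smart_speaker": [0.83, 1.8, 0.0],  # facing away
    "story_machine": [0.80, 2.5, 0.0],  # facing away
})
```

With these weights the facing smart TV receives the highest fused probability, matching the outcome described in the text even though the actual network would learn such trade-offs from training data rather than have them hard-coded.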
In this embodiment, gain parameters for the different devices are determined from gain measurements made during network setup, which mitigates the differences among the devices' microphones and acoustic structures and the resulting errors in recorded amplitude, reducing the configuration requirements on intelligent voice devices in the acoustic network. The fusion analysis of the trained neural network takes multiple dimensions into account, effectively determines the target acoustic device that responds, and improves the user's interaction experience.
As one implementation, in this embodiment the method further comprises:
determining the signal-to-noise ratio of the audio recorded by each candidate intelligent voice device, which reflects the clarity of that device's recording;
and inputting at least the clarity of each candidate device's recording, the distance and direction between each candidate device and the user, and the confidence in the wake-up instruction into a pre-trained neural network for information-fusion analysis, and waking up the intelligent device that will respond to the user.
In this embodiment, the clarity of each candidate device's recording is used as an additional dimension when deciding which device to wake. The signal-to-noise ratio is the ratio of the signal power at a device's output to the noise power output at the same time, usually expressed in decibels; a higher device SNR means less self-generated noise. In general, the larger the SNR, the less noise is mixed into the signal and the higher the quality of sound reproduction, and vice versa. The SNR should generally be no lower than 70 dB, and a hi-fi speaker should reach 110 dB or more.
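The SNR computation can be sketched as below, assuming a wake-word segment and a noise-only segment from the same device are available; segment selection itself is not shown, and the function name is hypothetical.

```python
import numpy as np

def snr_db(wake_segment: np.ndarray, noise_segment: np.ndarray) -> float:
    """SNR in decibels: power of the (noise-subtracted) wake-word segment
    over the power of a noise-only segment from the same device."""
    p_total = float(np.mean(np.square(wake_segment)))
    p_noise = float(np.mean(np.square(noise_segment))) + 1e-12
    # Clamp so the log stays defined when the segment is all noise.
    return 10.0 * np.log10(max(p_total - p_noise, 1e-12) / p_noise)
```

The resulting value, in dB, is the clarity feature that the implementation adds to the fusion network's inputs alongside distance, direction, and confidence.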
Likewise, when the neural network is trained, the signal-to-noise ratio is added to the training data as a further parameter.
This implementation considers more dimensions, determines the responding target acoustic device more effectively, and improves the user experience.
As one implementation, in this embodiment the method further comprises:
establishing a spatial coordinate system for the different types of intelligent voice devices under the same acoustic networking protocol, and sending a sounding instruction to each device in turn;
determining each device's coordinates in the spatial coordinate system from the sound it emits in response to the sounding instruction, and determining the user's coordinates in the spatial coordinate system from the utterance of the user's wake-up instruction;
and inputting the coordinates and directions of each candidate device and the user, together with the confidence in the wake-up instruction, into a pre-trained neural network for information-fusion analysis, and waking up the intelligent device that will respond to the user.
In this embodiment, to respond more accurately, a spatial coordinate system is established for the different device types under the same acoustic networking protocol. The devices then sound in turn and localize one another, which fixes each device's coordinates in the coordinate system. Likewise, when the user speaks, the user's coordinates are determined from the user's distance to each intelligent voice device.
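One way to obtain the user's coordinates described above is linearized least-squares trilateration from the per-device distance estimates. The patent does not specify a solver, so this is an illustrative sketch under that assumption.

```python
import numpy as np

def locate_user(device_coords, distances) -> np.ndarray:
    """Least-squares position of the user, given each device's coordinates
    in the shared frame and its estimated distance to the user (at least
    three non-collinear devices are needed in 2-D)."""
    a = np.asarray(device_coords, dtype=float)
    d = np.asarray(distances, dtype=float)
    a0, d0 = a[0], d[0]
    # Subtracting the first range equation from the rest cancels the
    # quadratic term, leaving a linear system in the user's position.
    A = 2.0 * (a[1:] - a0)
    b = d0**2 - d[1:]**2 + np.sum(a[1:]**2, axis=1) - np.sum(a0**2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos
```

The same routine locates a newly sounding device from the already-placed devices' range estimates, which is how the mutual-positioning pass can build up the coordinate system device by device.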
Likewise, when the neural network is trained, the coordinates are added to the training data as a further parameter.
Compared with distance alone, coordinates capture the spatial layout more precisely, so this implementation determines the responding target acoustic device more effectively and improves the user experience.
Fig. 2 is a schematic structural diagram of a device wake-up system for acoustic networking according to an embodiment of the present invention. The system can execute the device wake-up method for acoustic networking of any of the above embodiments and is configured in a terminal.
The device wake-up system for acoustic networking provided by the embodiment includes: a gain parameter determination program module 11, a confidence determination program module 12, a parameter determination program module 13 and a wake-up program module 14.
The gain parameter determination program module 11 performs gain calibration on the different types of intelligent voice devices under the same acoustic networking protocol using a preset standard training voice and determines a gain parameter for each type of device. The confidence determination program module 12 monitors each intelligent voice device under the acoustic network and, when at least one device is activated by a user's wake-up instruction, acquires the audio recorded during the activation by each candidate device together with each candidate device's confidence in recognizing the wake-up instruction. The parameter determination program module 13 applies energy-analysis gain calibration to each recording according to the corresponding candidate device's gain parameter, determines the distance between each candidate device and the user at least from a direct-sound to reverberation ratio analysis of the gain-calibrated audio, and determines the direction between each candidate device and the user from a high/low-frequency consistency analysis of the calibrated audio. The wake-up program module 14 inputs at least the distance and direction between each candidate device and the user, together with the confidence in the wake-up instruction, into a pre-trained neural network for information-fusion analysis and wakes up the intelligent device that will respond to the user.
Further, the parameter determination program module is also configured to:
determine the signal-to-noise ratio of the audio recorded by each candidate intelligent voice device, which reflects the clarity of that device's recording.
The wake-up program module is also configured to input the clarity of each candidate device's recording, the distance and direction between each candidate device and the user, and the confidence in the wake-up instruction into the pre-trained neural network for information-fusion analysis, and to wake up the intelligent device that will respond to the user.
Further, the system comprises:
a spatial coordinate establishment program module, configured to establish a spatial coordinate system for the different types of intelligent voice devices under the same acoustic networking protocol and to send a sounding instruction to each device in turn;
and a coordinate determination program module, configured to determine each device's coordinates in the spatial coordinate system from the sound it emits in response to the sounding instruction, and to determine the user's coordinates in the spatial coordinate system from the utterance of the user's wake-up instruction.
The wake-up program module is also configured to input the coordinates and directions of each candidate device and the user, together with the confidence in the wake-up instruction, into the pre-trained neural network for information-fusion analysis, and to wake up the intelligent device that will respond to the user.
Further, the information-fusion analysis at least comprises fusion via a decision tree and/or a support vector machine and/or maximum likelihood to wake up the intelligent device that will respond to the user.
An embodiment of the invention also provides a non-volatile computer storage medium storing computer-executable instructions that can execute the device wake-up method for acoustic networking of any of the above method embodiments.
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
perform gain calibration on the different types of intelligent voice devices under the same acoustic networking protocol using a preset standard training voice, and determine a gain parameter for each type of device;
monitor each intelligent voice device under the acoustic network and, when at least one device is activated by a user's wake-up instruction, acquire the audio recorded during the activation by each candidate intelligent voice device together with each candidate device's confidence in recognizing the wake-up instruction;
apply energy-analysis gain calibration to the audio recorded by each candidate device according to that device's gain parameter; determine the distance between each candidate device and the user at least from a direct-sound to reverberation ratio analysis of the gain-calibrated audio; and determine the direction between each candidate device and the user from a high/low-frequency consistency analysis of the gain-calibrated audio;
and input the distance and direction between each candidate device and the user, together with the confidence in the wake-up instruction, into a pre-trained neural network for information-fusion analysis, and wake up the intelligent device that will respond to the user.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the device wake-up method in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the device wake-up method for acoustic networking in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created during use of the device. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor and connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the steps of the device wake-up method for acoustic networking of any embodiment of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability and are primarily aimed at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players, handheld game consoles, e-book readers, smart toys, and portable vehicle-mounted navigation devices.
(4) Other electronic devices with intelligent voice functions.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Likewise, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A device wake-up method for acoustic networking, comprising:
performing gain calibration on different types of intelligent voice devices under the same acoustic networking protocol using a preset standard training voice, and determining gain parameters of the different types of intelligent voice devices;
monitoring each intelligent voice device in the acoustic network, and when at least one intelligent voice device is activated by a wake-up instruction from a user, acquiring the audio recorded by each candidate intelligent voice device during the activation and the confidence with which each candidate intelligent voice device recognized the wake-up instruction;
performing energy-analysis gain calibration on the audio recorded by each candidate intelligent voice device according to its gain parameter; determining the distance between each candidate intelligent voice device and the user at least from a direct-to-reverberant ratio analysis of the gain-calibrated audio; and performing a high-low frequency consistency analysis on the gain-calibrated audio to determine the direction of each candidate intelligent voice device relative to the user; and
inputting the distance and direction between each candidate intelligent voice device and the user, together with the confidence of the wake-up instruction, into a pre-trained neural network for information fusion analysis, and waking up the intelligent device that is to respond to the user.
2. The method of claim 1, wherein the method further comprises:
determining the signal-to-noise ratio of the audio recorded by each candidate intelligent voice device, the signal-to-noise ratio reflecting the clarity of the audio recorded by each candidate intelligent voice device; and
inputting at least the clarity of the audio recorded by each candidate intelligent voice device, the distance and direction between each candidate intelligent voice device and the user, and the confidence of the wake-up instruction into a pre-trained neural network for information fusion analysis, and waking up the intelligent device that is to respond to the user.
3. The method of claim 1, wherein the method further comprises:
establishing a spatial coordinate system for the different types of intelligent voice devices under the same acoustic networking protocol, and sending a pronunciation instruction to each intelligent voice device;
determining the coordinates of each intelligent voice device in the spatial coordinate system based on the sound each device emits in response to the pronunciation instruction, and determining the coordinates of the user in the spatial coordinate system from the utterance of the user's wake-up instruction; and
inputting the coordinates and directions of each candidate intelligent voice device and of the user, together with the confidence of the wake-up instruction, into a pre-trained neural network for information fusion analysis, and waking up the intelligent device that is to respond to the user.
4. The method of claim 1, wherein the information fusion analysis at least comprises: performing information fusion analysis through a decision tree and/or a support vector machine and/or maximum likelihood estimation, so as to wake up the intelligent device that is to respond to the user.
5. A device wake-up system for acoustic networking, comprising:
a gain parameter determination program module, configured to perform gain calibration on different types of intelligent voice devices under the same acoustic networking protocol using a preset standard training voice, and to determine gain parameters of the different types of intelligent voice devices;
a confidence determination program module, configured to monitor each intelligent voice device in the acoustic network, and when at least one intelligent voice device is activated by a wake-up instruction from a user, to acquire the audio recorded by each candidate intelligent voice device during the activation and the confidence with which each candidate intelligent voice device recognized the wake-up instruction;
a parameter determination program module, configured to perform energy-analysis gain calibration on the audio recorded by each candidate intelligent voice device according to its gain parameter, to determine the distance between each candidate intelligent voice device and the user at least from a direct-to-reverberant ratio analysis of the gain-calibrated audio, and to perform a high-low frequency consistency analysis on the gain-calibrated audio to determine the direction of each candidate intelligent voice device relative to the user; and
a wake-up program module, configured to input the distance and direction between each candidate intelligent voice device and the user, together with the confidence of the wake-up instruction, into a pre-trained neural network for information fusion analysis, and to wake up the intelligent device that is to respond to the user.
6. The system of claim 5, wherein the parameter determination program module is further to:
determine the signal-to-noise ratio of the audio recorded by each candidate intelligent voice device, the signal-to-noise ratio reflecting the clarity of the audio recorded by each candidate intelligent voice device;
wherein the wake-up program module is further configured to input the clarity of the audio recorded by each candidate intelligent voice device, the distance and direction between each candidate intelligent voice device and the user, and the confidence of the wake-up instruction into a pre-trained neural network for information fusion analysis, and to wake up the intelligent device that is to respond to the user.
7. The system of claim 5, wherein the system further comprises:
a spatial coordinate establishing program module, configured to establish a spatial coordinate system for the different types of intelligent voice devices under the same acoustic networking protocol, and to send a pronunciation instruction to each intelligent voice device; and
a coordinate determination program module, configured to determine the coordinates of each intelligent voice device in the spatial coordinate system based on the sound each device emits in response to the pronunciation instruction, and to determine the coordinates of the user in the spatial coordinate system from the utterance of the user's wake-up instruction;
wherein the wake-up program module is further configured to input the coordinates and directions of each candidate intelligent voice device and of the user, together with the confidence of the wake-up instruction, into a pre-trained neural network for information fusion analysis, and to wake up the intelligent device that is to respond to the user.
8. The system of claim 5, wherein the information fusion analysis at least comprises: performing information fusion analysis through a decision tree and/or a support vector machine and/or maximum likelihood estimation, so as to wake up the intelligent device that is to respond to the user.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-4.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN201910660543.9A 2019-07-22 2019-07-22 Device wake-up method and system for acoustic networking Active CN110288997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910660543.9A CN110288997B (en) 2019-07-22 2019-07-22 Device wake-up method and system for acoustic networking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910660543.9A CN110288997B (en) 2019-07-22 2019-07-22 Device wake-up method and system for acoustic networking

Publications (2)

Publication Number Publication Date
CN110288997A CN110288997A (en) 2019-09-27
CN110288997B true CN110288997B (en) 2021-04-16

Family

ID=68023752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910660543.9A Active CN110288997B (en) 2019-07-22 2019-07-22 Device wake-up method and system for acoustic networking

Country Status (1)

Country Link
CN (1) CN110288997B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110718227A (en) * 2019-10-17 2020-01-21 深圳市华创技术有限公司 Multi-mode interaction based distributed Internet of things equipment cooperation method and system
CN111091828B (en) * 2019-12-31 2023-02-14 华为技术有限公司 Voice wake-up method, device and system
CN111223497B (en) * 2020-01-06 2022-04-19 思必驰科技股份有限公司 Nearby wake-up method and device for terminal, computing equipment and storage medium
CN111276139B (en) * 2020-01-07 2023-09-19 百度在线网络技术(北京)有限公司 Voice wake-up method and device
CN111276142B (en) * 2020-01-20 2023-04-07 北京声智科技有限公司 Voice wake-up method and electronic equipment
CN111276143B (en) * 2020-01-21 2023-04-25 北京远特科技股份有限公司 Sound source positioning method, sound source positioning device, voice recognition control method and terminal equipment
CN113495710A (en) * 2020-03-18 2021-10-12 中国电信股份有限公司 Sound awakening processing method and device, sound analysis platform and storage medium
CN111613221A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Nearby awakening method, device and system
CN111739521B (en) * 2020-06-19 2021-06-22 腾讯科技(深圳)有限公司 Electronic equipment awakening method and device, electronic equipment and storage medium
CN112130918B (en) * 2020-09-25 2024-07-23 深圳市欧瑞博科技股份有限公司 Intelligent device awakening method, device and system and intelligent device
CN112260860B (en) * 2020-10-09 2024-03-29 北京小米松果电子有限公司 Equipment debugging method and device, electronic equipment and storage medium
CN112420051A (en) * 2020-11-18 2021-02-26 青岛海尔科技有限公司 Equipment determination method, device and storage medium
CN112599126B (en) * 2020-12-03 2022-05-27 海信视像科技股份有限公司 Awakening method of intelligent device, intelligent device and computing device
CN112837694B (en) * 2021-01-29 2022-12-06 青岛海尔科技有限公司 Equipment awakening method and device, storage medium and electronic device
CN112992140B (en) * 2021-02-18 2021-11-16 珠海格力电器股份有限公司 Control method, device and equipment of intelligent equipment and storage medium
CN113674761B (en) * 2021-07-26 2023-07-21 青岛海尔科技有限公司 Device determination method and device determination system
CN113889102A (en) * 2021-09-23 2022-01-04 达闼科技(北京)有限公司 Instruction receiving method, system, electronic device, cloud server and storage medium
CN114465837B (en) * 2022-01-30 2024-03-08 云知声智能科技股份有限公司 Collaborative wake-up processing method and device for intelligent voice equipment
CN115171703B (en) * 2022-05-30 2024-05-24 青岛海尔科技有限公司 Distributed voice awakening method and device, storage medium and electronic device
CN116206618A (en) * 2022-12-29 2023-06-02 海尔优家智能科技(北京)有限公司 Equipment awakening method, storage medium and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215663A (en) * 2018-10-11 2019-01-15 北京小米移动软件有限公司 Equipment awakening method and device
CN109256134A (en) * 2018-11-22 2019-01-22 深圳市同行者科技有限公司 A kind of voice awakening method, storage medium and terminal
CN109427336A (en) * 2017-09-01 2019-03-05 华为技术有限公司 Voice object identifying method and device
CN110033773A (en) * 2018-12-13 2019-07-19 蔚来汽车有限公司 For the audio recognition method of vehicle, device, system, equipment and vehicle

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10685666B2 (en) * 2018-04-06 2020-06-16 Intel Corporation Automatic gain adjustment for improved wake word recognition in audio systems

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109427336A (en) * 2017-09-01 2019-03-05 华为技术有限公司 Voice object identifying method and device
CN109215663A (en) * 2018-10-11 2019-01-15 北京小米移动软件有限公司 Equipment awakening method and device
CN109256134A (en) * 2018-11-22 2019-01-22 深圳市同行者科技有限公司 A kind of voice awakening method, storage medium and terminal
CN110033773A (en) * 2018-12-13 2019-07-19 蔚来汽车有限公司 For the audio recognition method of vehicle, device, system, equipment and vehicle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Baiyang, "Smart Home Control System Based on Voice Interaction", China Masters' Theses Full-text Database, Engineering Science and Technology II, 2018. *

Also Published As

Publication number Publication date
CN110288997A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110288997B (en) Device wake-up method and system for acoustic networking
US9560449B2 (en) Distributed wireless speaker system
US9402145B2 (en) Wireless speaker system with distributed low (bass) frequency
US20090034750A1 (en) System and method to evaluate an audio configuration
KR20200021093A (en) Detection of Replay Attacks
CN109195090B (en) Method and system for testing electroacoustic parameters of microphone in product
US10997965B2 (en) Automated voice processing testing system and method
US10602270B1 (en) Similarity measure assisted adaptation control
CN107112014A (en) Application foci in voice-based system
CN109658935B (en) Method and system for generating multi-channel noisy speech
CN103366756A (en) Sound signal reception method and device
CN106162427A (en) A kind of sound obtains directive property method of adjustment and the device of element
US20180053512A1 (en) Reverberation compensation for far-field speaker recognition
CN110956976B (en) Echo cancellation method, device and equipment and readable storage medium
CN109885162B (en) Vibration method and mobile terminal
Iijima et al. Audio hotspot attack: An attack on voice assistance systems using directional sound beams and its feasibility
CN113010139A (en) Screen projection method and device and electronic equipment
CN116868265A (en) System and method for data enhancement and speech processing in dynamic acoustic environments
CN111105803A (en) Method and device for quickly identifying gender and method for generating algorithm model for identifying gender
CN109377430B (en) Learning plan recommendation method and learning client
CN110351629B (en) Radio reception method, radio reception device and terminal
CN114464184B (en) Method, apparatus and storage medium for speech recognition
US12112741B2 (en) System and method for data augmentation and speech processing in dynamic acoustic environments
CN111312244B (en) Voice interaction system and method for sand table
Panek et al. Challenges in adopting speech control for assistive robots

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Ltd.