CN110610711A - Full-house intelligent voice interaction method and system of distributed Internet of things equipment - Google Patents

Publication number
CN110610711A
Authority
CN
China
Prior art keywords
voice
equipment
sub
things
awakening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910966907.6A
Other languages
Chinese (zh)
Inventor
郑敏
郑炜乔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huachuang Technology Co Ltd
Original Assignee
Shenzhen Huachuang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huachuang Technology Co Ltd filed Critical Shenzhen Huachuang Technology Co Ltd
Priority to CN201910966907.6A priority Critical patent/CN110610711A/en
Publication of CN110610711A publication Critical patent/CN110610711A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a full-house intelligent voice interaction method and system for distributed Internet of things devices. Each distributed Internet of things device acquires voice signals in real time through its own microphone array and sends the enhanced voice signal, the wake-up information, the frequency-domain transform of the signals acquired by the microphone array, and the results of the related function calculations to a voice interaction control center. The voice interaction control center arbitrates to determine which device should wake up and respond to the user, and at the same time clears the wake-up information of the other distributed sub-devices. The user's voice command picked up by the responding device is sent over a communication connection to a voice cloud server for recognition and semantic understanding, and, according to the voice processing result, the corresponding control command and voice reply content are issued to the Internet of things device that responded to the user's wake-up. The invention improves the wake-up response accuracy of distributed Internet of things devices and the user experience of full-house intelligent voice interaction.

Description

Full-house intelligent voice interaction method and system of distributed Internet of things equipment
Technical Field
The invention relates to the technical field of artificial-intelligence voice technology, and in particular to a full-house intelligent voice interaction method and system for distributed Internet of things devices.
Background
Continuous progress in artificial intelligence has steadily improved the accuracy of voice recognition, and intelligent devices with voice interaction have entered daily life. With a microphone or microphone array built into an intelligent device, a user can interact with it at close range or in the far field within a certain distance, but beyond that range the accuracy of voice interaction drops or interaction fails altogether. At present, several intelligent devices with voice interaction are often distributed around a home, for example an intelligent voice speaker in the living room and an intelligent desk lamp in the bedroom. With the rapid development of the Internet of things, interconnecting multiple voice-enabled intelligent devices is an inevitable technical trend and a practical need of the intelligent home, and this scenario calls for a full-house intelligent voice interaction method for distributed Internet of things devices. In the prior art, distributed Internet of things devices use the same wake-up word, so after the user speaks the wake-up word all devices respond; it cannot be determined which device should respond to the user's request, which seriously degrades the user experience.
Disclosure of Invention
To solve the above problems, the invention makes an arbitration decision on the wake-up information of each distributed Internet of things device within the local area network, and quickly decides on and notifies the device that should wake up and respond. This reduces network delay, improves response speed, reduces the resource usage of the distributed Internet of things devices, and saves cost. It also effectively solves the problem of interconnecting multiple voice input devices in a home and having them work cooperatively, improving both the wake-up response accuracy of the distributed Internet of things devices and the user experience of full-house intelligent voice interaction.
Therefore, according to one aspect of the invention, a full-house intelligent voice interaction method of distributed internet of things equipment is provided, which comprises the following steps:
S100, each sub-device of the distributed Internet of things devices acquires the user's voice locally in real time and performs voice wake-up judgment;
S200, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array;
S300, on each sub-device, the covariance matrix is weighted using the controllable response power based on principal component eigenvectors to obtain a controllable response power function;
S400, the average value of the controllable response power function is calculated on each sub-device, the average value characterizing the strength of the azimuth information of the user's voice signal received by that sub-device;
S500, the voice interaction control center determines the sub-device with the largest average value as the sub-device that responds to the user's wake-up, notifies that sub-device to continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and sends a voice request to the cloud;
S600, the voice cloud server performs voice recognition, semantic understanding, dialogue management, voice synthesis and related operations in real time to process the user's voice command, and returns a response result.
The method effectively solves the problem of interconnecting multiple voice input devices in a home and having them work cooperatively, improves the wake-up response accuracy of the distributed Internet of things devices, and improves the user experience of full-house intelligent voice interaction.
In some embodiments, the method comprises the following steps:
S110, each sub-device of the distributed Internet of things devices acquires the user's voice locally in real time and performs voice wake-up judgment;
S120, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array;
S130, on each sub-device, the covariance matrix is weighted using the controllable response power based on principal component eigenvectors to obtain a first controllable response power function;
S140, the average value of the first controllable response power function is calculated on each sub-device, the average value characterizing the strength of the azimuth information of the user's voice signal received by that sub-device;
S150, the voice interaction control center determines the sub-device with the largest average value as the sub-device that responds to the user's wake-up, notifies that sub-device to continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and initiates a voice request to the cloud;
S160, the voice cloud server performs voice recognition, semantic understanding, dialogue management, voice synthesis and related operations in real time to process the user's voice command, and returns a response result.
On each sub-device, the covariance matrix is weighted using the controllable response power based on principal component eigenvectors to obtain a first controllable response power function. The average value of each first controllable response power function is calculated and used to characterize the strength of the azimuth information of the received user voice signal, and the sub-device with the largest average value is determined as the sub-device that responds to the user's wake-up. This improves the accuracy and reliability of the wake-up response of the distributed Internet of things devices.
In some embodiments, the method further comprises the steps of:
S310, each sub-device of the distributed Internet of things devices acquires the user's voice locally in real time and performs voice wake-up judgment;
S320, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array;
S330, on each sub-device, the covariance matrix is weighted using the controllable response power based on the improved principal component eigenvector to obtain a second controllable response power function;
S340, the average value of the second controllable response power function is calculated on each sub-device, the average value characterizing the strength of the azimuth information of the user's voice signal received by that sub-device;
S350, the voice interaction control center determines the sub-device with the largest average value as the sub-device that responds to the user's wake-up, notifies that sub-device to continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and initiates a voice request to the cloud;
S360, the voice cloud server performs voice recognition, semantic understanding, dialogue management, voice synthesis and related operations in real time to process the user's voice command, and returns a response result.
The method is robust to interference, noise and reverberation. Moreover, because the SRP algorithm carries direction information, the wake-up decision lets the Internet of things sub-device closest to the user respond accurately, which further improves the accuracy and robustness of the wake-up response of the distributed Internet of things devices.
In some embodiments, the full-house intelligent voice interaction method performs voice signal acquisition, preprocessing, voice enhancement and wake-up on the distributed Internet of things devices, while the voice interaction control center makes the decision and requests and forwards the voice processing.
In some embodiments, the distributed internet of things device comprises a plurality of internet of things terminal devices.
In some embodiments, each of the internet of things terminal devices is provided with a respective microphone array;
the microphone array includes, but is not limited to, a linear 2-microphone, linear 4-microphone, linear 6-microphone or annular 4-microphone array, or another irregular microphone array.
According to another aspect of the present invention, a whole-house intelligent voice interaction system of distributed internet of things devices is provided, including a distributed internet of things device, a voice interaction control center and a voice cloud server, wherein:
the distributed Internet of things devices collect voice signals in real time, perform signal processing, signal enhancement, voice wake-up and playback of voice reply content, and exchange data with the voice interaction control center through a communication connection;
the voice interaction control center performs an arbitration that fuses voice localization and voice wake-up according to the content uploaded by each distributed Internet of things device, determines which of the distributed Internet of things devices should wake up and respond, clears the wake-up information of the other distributed sub-devices, sends the user's voice command from the responding device to the voice cloud server through a communication connection, and issues the corresponding control command and voice reply content to the responding device according to the cloud voice recognition and semantic understanding results;
and the voice cloud server executes processing operations such as voice recognition, semantic understanding, dialogue management, voice synthesis and the like, and returns a response result to the voice interaction control center.
In some embodiments, the distributed internet of things device comprises:
the microphone array audio acquisition module is used for acquiring a voice signal in real time;
the echo cancellation module is used for canceling echo in the voice signal;
a noise reduction module for reducing noise in the speech signal;
a voice wake-up detection module for voice wake-up;
the wake-up post-processing module is used for performing signal processing operation on the wake-up voice;
the network communication module is used for transmitting the processed voice signals over the communication connection to realize data transmission;
the voice wake-up response module is used for responding to the wake-up voice signal;
a voice command execution module for executing the received voice command;
and the voice reply and broadcast module is used for playing the voice reply content.
In some embodiments, the voice interaction control center comprises:
the voice agent service module is used for carrying out fusion voice positioning on voice signals uploaded by the distributed Internet of things equipment;
the voice awakening arbitration module is used for carrying out voice awakening arbitration on voice signals uploaded by the distributed Internet of things equipment;
the signal analysis and processing module is used for analyzing equipment needing awakening response in the distributed Internet of things equipment and clearing awakening information of other distributed sub-equipment;
and the network communication module A is used for being in communication connection with the distributed Internet of things equipment and the voice cloud server.
In some embodiments, the voice cloud server comprises:
the voice recognition module is used for recognizing the voice signals uploaded by the voice interaction control center;
a semantic understanding module for understanding the speech signal;
the dialogue management module is used for carrying out dialogue processing on the voice signals;
a skill scheduling module for scheduling the conversation skills;
a skills and content module for managing conversational skills and content;
a dialog response module for generating a response to the dialog content;
the synthesis module is used for carrying out voice synthesis on the voice signal uploaded by the voice interaction control center;
and the network communication module B is used for being in communication connection with the distributed Internet of things equipment and the voice interaction control center.
Compared with the prior art, the invention has the following beneficial effects:
the voice interaction center disclosed by the invention carries out arbitration decision on the awakening information of each distributed Internet of things device in the local area network, and quickly decides and informs the device needing to be awakened and responded, so that the network delay is reduced, the response speed is improved, the resource occupation of the distributed Internet of things devices is reduced, and the cost is saved. In addition, the problem that a plurality of voice input devices in a family scene are interconnected and work cooperatively is effectively solved, the response accuracy of awakening of the distributed Internet of things devices is improved, and meanwhile, the user experience of whole-house intelligent voice interaction is improved.
Drawings
Fig. 1 is a flowchart of a full-house intelligent voice interaction method of a distributed internet of things device according to the present invention;
Fig. 2 is a flowchart of a full-house intelligent voice interaction method for distributed internet of things devices according to an embodiment of the present invention;
Fig. 3 is a flowchart of a full-house intelligent voice interaction method for a distributed internet of things device according to another embodiment of the present invention;
Fig. 4 is a diagram illustrating the distribution of a full-house intelligent distributed internet of things device and its microphone array according to an embodiment of the present invention;
Fig. 5 is a block diagram of a full-house intelligent voice interaction system of the distributed internet of things device of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 schematically shows a flowchart of a full-house intelligent voice interaction method for a distributed internet of things device according to the present invention, and as shown in fig. 1, the full-house intelligent voice interaction method for the distributed internet of things device includes the following steps:
S100, each sub-device of the distributed Internet of things devices acquires the user's voice locally in real time and performs voice wake-up judgment;
S200, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array;
S300, on each sub-device, the covariance matrix is weighted using the controllable response power based on principal component eigenvectors to obtain a controllable response power function;
S400, the average value of the controllable response power function is calculated on each sub-device, the average value characterizing the strength of the azimuth information of the user's voice signal received by that sub-device;
S500, the voice interaction control center determines the sub-device with the largest average value as the sub-device that responds to the user's wake-up, notifies that sub-device to continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and sends a voice request to the cloud;
S600, the voice cloud server performs voice recognition, semantic understanding, dialogue management, voice synthesis and related operations in real time to process the user's voice command, and returns a response result.
Fig. 2 schematically shows a flowchart of a full-house intelligent voice interaction method for distributed internet of things devices according to an embodiment of the present invention, and as shown in fig. 2, the embodiment includes the following steps:
S110: each sub-device of the distributed Internet of things devices acquires the user's voice locally in real time and performs voice wake-up judgment.
In this embodiment, the distributed Internet of things devices are a plurality of intelligent terminals, each with its own microphone array, including but not limited to a linear 2-microphone, linear 4-microphone, linear 6-microphone or annular 4-microphone array, or another irregular microphone array. After a user utters a wake-up voice signal, every distributed Internet of things device in the house may receive it. If several sub-devices that detect the wake-up all respond to the user at once, the user experience and the quality of voice interaction suffer greatly. A decision therefore has to be made from the direction information and wake-up strength of the user's voice to select the Internet of things sub-device or intelligent terminal that will respond to the user's wake-up voice signal, that is, the woken device best suited to interact with the user.
S120: each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array.
For example, let $x_m(t)$ and $x_n(t)$ be the wake-up voice signals received by the m-th and n-th microphones of a sub-array. Each signal is divided into frames 10 milliseconds long, and processing proceeds frame by frame.
In this embodiment, the sub-device transforms the voice signal to the frequency domain for processing, that is, the sub-device hit by the voice wake-up calculates the covariance matrix of its voice signal in the frequency domain. The continuous spectrum of the received signal can be approximated by the discrete Fourier transform, with $L$ the number of points of the transform. In the formulas below, $l$ indexes the frequency component and $m$ the frame: $X(l,m)$ denotes the vector, across the microphones of the array, of the $l$-th frequency component of the discrete Fourier transform of the $m$-th frame of the signals $x_n[k]$. In the frequency domain, the covariance matrix can be recursively estimated as:
$$R_{xx}(l,m)=\alpha R_{xx}(l,m-1)+(1-\alpha)\,X(l,m)\,X^{H}(l,m)$$
where $R_{xx}(l,m)$ is the covariance matrix estimate updated with the $m$-th frame of data in the frequency domain, $\alpha$ is a smoothing factor, and the recursion is initialized with $R_{xx}(l,1)=X(l,1)X^{H}(l,1)$.
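The recursion in S120 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the use of an unwindowed L-point DFT per frame, and the default smoothing factor `alpha=0.9` are all hypothetical choices made for the example.

```python
import cmath


def dft_frame(frame):
    """L-point DFT of one frame (a list of L real samples)."""
    L = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * cmath.pi * l * k / L) for k in range(L))
            for l in range(L)]


def outer(x):
    """Outer product x x^H of a complex vector x."""
    return [[a * b.conjugate() for b in x] for a in x]


def update_covariance(R_prev, X, alpha):
    """One step of R(l,m) = alpha*R(l,m-1) + (1-alpha)*X X^H.

    R_prev is None for the first frame, which initializes R = X X^H.
    """
    XXh = outer(X)
    if R_prev is None:
        return XXh
    n = len(X)
    return [[alpha * R_prev[i][j] + (1 - alpha) * XXh[i][j] for j in range(n)]
            for i in range(n)]


def covariance_per_bin(mic_frames, alpha=0.9):
    """mic_frames[m][n] is frame m of microphone n (a list of L samples).

    Returns one N x N covariance estimate per frequency bin l.
    alpha=0.9 is an assumed value, not taken from the source.
    """
    R = None
    for frame_set in mic_frames:
        spectra = [dft_frame(f) for f in frame_set]  # spectra[n][l]
        L = len(spectra[0])
        if R is None:
            R = [None] * L
        for l in range(L):
            X = [spec[l] for spec in spectra]        # X(l, m), length N
            R[l] = update_covariance(R[l], X, alpha)
    return R
```

Each returned matrix is Hermitian by construction, which is what the eigendecomposition in the next step relies on.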
S130: on each sub-device, the covariance matrix is weighted using the controllable response power based on the principal component eigenvector (steered response power, SRP-PE) to obtain a first controllable response power function.
In this embodiment, the eigendecomposition of the covariance matrix estimate is expressed as
$$R_{xx}(l,m)=\sum_{i=1}^{N}\lambda_i(l,m)\,q_i(l,m)\,q_i^{H}(l,m)$$
where $\lambda_i(l,m)$ are the eigenvalues, sorted in descending order $\lambda_1(l,m)\ge\lambda_2(l,m)\ge\dots\ge\lambda_N(l,m)$, and $q_i(l,m)$ are the corresponding eigenvectors. The controllable response power based on the principal component eigenvector (SRP-PE), i.e., the first controllable response power function, can be calculated by the following formula:
The above calculation of the controllable response power takes into account the information of all frequencies and can accurately calculate the azimuth information of the voice sound source.
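The SRP-PE formula itself does not survive in this text, so the sketch below uses one form commonly seen for principal-eigenvector steered response power, $P(\theta)=\sum_l |a^{H}(l,\theta)\,q_1(l,m)|^2$, purely as an assumption. The far-field steering vector for a linear array, the speed of sound `c=343.0`, the power-iteration eigenvector solver and all function names are likewise illustrative choices, not the patent's method.

```python
import cmath
import math


def principal_eigenvector(R, iters=100):
    """Power iteration for the dominant eigenvector of a Hermitian PSD matrix."""
    n = len(R)
    j = max(range(n), key=lambda i: R[i][i].real)  # start from the strongest column
    q = [R[i][j] for i in range(n)]
    for _ in range(iters):
        norm = math.sqrt(sum(abs(v) ** 2 for v in q))
        if norm == 0.0:
            return [0j] * n
        q = [v / norm for v in q]
        q = [sum(R[i][k] * q[k] for k in range(n)) for i in range(n)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in q))
    return [v / norm for v in q] if norm else [0j] * n


def steering_vector(freq_hz, positions_m, theta_rad, c=343.0):
    """Assumed far-field model for microphones on a line (positions in metres)."""
    return [cmath.exp(-2j * math.pi * freq_hz * p * math.cos(theta_rad) / c)
            for p in positions_m]


def srp_pe(R_bins, freqs_hz, positions_m, thetas_rad):
    """Assumed SRP-PE form: sum over all bins of |a^H q1|^2 per direction."""
    q1 = [principal_eigenvector(R) for R in R_bins]
    power = []
    for theta in thetas_rad:
        total = 0.0
        for q, f in zip(q1, freqs_hz):
            a = steering_vector(f, positions_m, theta)
            total += abs(sum(ai.conjugate() * qi for ai, qi in zip(a, q))) ** 2
        power.append(total)
    return power
```

Summing over every frequency bin mirrors the statement that the calculation takes all frequencies into account; the candidate direction with the largest value is the estimated azimuth under this model.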
S140: an average of the first controllable response power function is calculated at each of the sub-devices, the average being used to characterize the strength of the corresponding sub-device receiving the orientation information of the user speech signal.
S150: based on the average values of the first controllable response power function received from the sub-devices, the voice interaction control center determines the sub-device with the largest average value as the sub-device that responds to the user's wake-up, notifies that sub-device to give a response prompt and continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and forwards the user's voice command from that sub-device to the voice cloud server as a voice processing request.
S160: the voice cloud server performs voice recognition, semantic understanding, dialogue management, voice synthesis and related operations in real time to process the user's voice command, and returns a response result.
It can be seen from the above embodiments that, in the process of determining the sub-devices responding to the user wake-up, the covariance matrix can be weighted on each sub-device in a controllable response power mode based on the principal component eigenvector to obtain a weighted first controllable response power function, an average value of each first controllable response power function is calculated, the average value is used for representing the strength of the direction information of the received user voice signal, and then the sub-device corresponding to the maximum average value is determined as the sub-device responding to the user wake-up, so that the response accuracy and reliability of the distributed internet of things device wake-up can be improved.
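The arbitration in S140 and S150 reduces to comparing one number per woken sub-device. The following is a minimal sketch of that decision, assuming each sub-device uploads its own mean; the `WakeReport` type, the device identifiers and the action strings are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def mean_srp(srp_values: List[float]) -> float:
    """S140: average of the controllable response power function on one sub-device."""
    return sum(srp_values) / len(srp_values) if srp_values else 0.0


@dataclass
class WakeReport:
    device_id: str    # hypothetical identifier of a woken sub-device
    srp_mean: float   # value computed on the sub-device with mean_srp


def arbitrate(reports: List[WakeReport]) -> Optional[str]:
    """S150: pick the sub-device with the largest average value."""
    if not reports:
        return None
    return max(reports, key=lambda r: r.srp_mean).device_id


def dispatch(reports: List[WakeReport]) -> Dict[str, str]:
    """Tell the winner to respond and every other sub-device to clear its wake-up."""
    winner = arbitrate(reports)
    return {r.device_id: ("respond" if r.device_id == winner else "clear_wake")
            for r in reports}
```

For example, reports with means 3.2, 1.1 and 2.7 from the living room, kitchen and bedroom would leave only the living-room device responding. Keeping the decision to a single scalar per device is what lets the control center arbitrate inside the local area network without shipping audio around.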
Fig. 3 schematically shows a flowchart of a full-house intelligent voice interaction method for distributed Internet of things devices according to another embodiment of the present invention. In this embodiment, voice signal acquisition, preprocessing, voice enhancement and wake-up are performed on the distributed Internet of things devices, while the voice interaction control center makes the decision and requests and forwards the voice processing. The distributed Internet of things devices comprise a plurality of sub-devices, each with its own microphone array; the method may also be used on a decision device based on a distributed microphone array comprising a plurality of sub-microphone arrays. Building on the method shown in Fig. 2, the method of this embodiment includes the following steps:
S310, each sub-device of the distributed Internet of things devices acquires the user's voice locally in real time and performs voice wake-up judgment; this step is the same as S110 and is not described again here.
S320, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array; this step is the same as S120 and is not described again here.
S330, on each sub-device, the covariance matrix is weighted using the controllable response power based on the improved principal component eigenvector to obtain a second controllable response power function; the eigendecomposition of the covariance matrix estimate is the same as in step S130 and is not described again here.
To further reduce computational complexity and improve noise robustness while still accurately calculating the azimuth information of the voice sound source, the controllable response power based on the principal component eigenvector is improved (denoted SRP-PE-M): the second controllable response power function is calculated using only the information at low-noise frequencies, and is expressed as follows:
where the ratio of the second-largest eigenvalue to the largest eigenvalue of the covariance matrix estimate $R_{xx}(l,m)$ is
$$\lambda_2(l,m)/\lambda_1(l,m)$$
and only the frequency information for which this ratio is small is used to calculate the SRP. The value of $\Delta(l,m)$ in the above formula is expressed as:
The threshold $\delta$ is greater than 0 and less than 1 and can be fine-tuned for the actual application scenario; a suggested value is $\delta = 0.3$.
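The expression for $\Delta(l,m)$ is not reproduced in this text. A natural reading of the description, taken here as an assumption, is a 0/1 selector that keeps a frequency bin only when $\lambda_2(l,m)/\lambda_1(l,m)$ falls below the threshold $\delta$. The sketch below implements that reading; the strict inequality, the handling of bins with zero energy and the function names are illustrative choices.

```python
def select_low_noise_bins(eigenvalue_pairs, delta=0.3):
    """Return an assumed 0/1 mask Delta(l, m), one entry per frequency bin.

    eigenvalue_pairs[l] = (lambda1, lambda2) with lambda1 >= lambda2 >= 0.
    delta=0.3 is the value suggested in the text.
    """
    mask = []
    for lam1, lam2 in eigenvalue_pairs:
        # Treat an empty bin as noisy so that it is never selected.
        ratio = lam2 / lam1 if lam1 > 0 else 1.0
        mask.append(1 if ratio < delta else 0)
    return mask


def masked_srp(per_bin_power, mask):
    """Sum per-bin steered power over the selected bins only."""
    return sum(p * d for p, d in zip(per_bin_power, mask))
```

A small ratio means one dominant source explains most of the energy in that bin, so dropping the other bins both cuts the amount of computation and removes bins where noise or reverberation would blur the direction estimate.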
S340, the average value of the second controllable response power function is calculated on each sub-device, the average value characterizing the strength of the azimuth information of the user's voice signal received by that sub-device.
S350, based on the average values of the second controllable response power function received from the sub-devices, the voice interaction control center determines the sub-device with the largest average value as the sub-device that responds to the user's wake-up, notifies that sub-device to give a response prompt and continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and forwards the user's voice command from that sub-device to the voice cloud server as a voice processing request. This step is the same as S150 and is not described again here.
S360, the voice cloud server performs voice recognition, semantic understanding, dialogue management, voice synthesis and related operations in real time to process the user's voice command, and returns a response result. This step is the same as S160 and is not described again here.
It can be seen from the above embodiment that, in the process of determining the sub-device that responds to the user's wake-up, the covariance matrix can be weighted on each sub-device using the controllable response power method based on the improved principal component eigenvector to obtain the second controllable response power function. The average value of each second controllable response power function is then calculated to represent the strength of the azimuth information of the received user voice signal, and the sub-device corresponding to the maximum average value is determined as the sub-device that responds to the user's wake-up.
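The averaging and arbitration steps (S340 and S350) can be sketched as follows. This is only an illustration under stated assumptions: the device identifiers, the sample SRP values, and the function names `srp_average` and `arbitrate` are hypothetical, and the transport between sub-devices and the control center is omitted.

```python
from statistics import fmean

def srp_average(srp_values):
    """Sub-device side (S340): mean of the SRP function over its evaluated points."""
    return fmean(srp_values)

def arbitrate(averages):
    """Control-center side (S350).

    averages: dict mapping a (hypothetical) device id to its reported average.
    Returns the responding device and the devices whose wake-up info is cleared.
    """
    if not averages:
        raise ValueError("no sub-device reported a wake-up")
    # The sub-device with the maximum average responds to the user.
    winner = max(averages, key=averages.get)
    cleared = [dev for dev in averages if dev != winner]
    return winner, cleared

# Illustrative values only; real values come from each sub-device's SRP function.
reports = {
    "living_room": srp_average([0.9, 0.7]),
    "kitchen": srp_average([0.2, 0.4]),
    "bedroom": srp_average([0.5, 0.5]),
}
winner, cleared = arbitrate(reports)
```

Only one scalar per sub-device has to reach the control center, so the decision itself is a single maximum search regardless of the microphone array used on each sub-device.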
Fig. 4 schematically shows the layout of the full-house intelligent distributed Internet of Things devices and their microphone arrays according to an embodiment of the invention. In the full-house intelligent voice interaction method, the acquisition, preprocessing, enhancement and wake-up of voice signals are performed on the distributed Internet of Things devices, while the voice interaction control center makes decisions and requests and forwards voice processing.
The distributed Internet of Things devices comprise a plurality of Internet of Things terminal devices. Each Internet of Things terminal device is provided with its own microphone array; microphone arrays include, but are not limited to, linear 2-microphone, linear 4-microphone, linear 6-microphone and circular 4-microphone arrays, or other irregular microphone arrays. The method may be used on a decision device based on a distributed microphone array, the distributed microphone array comprising a plurality of sub-microphone arrays. Internet of Things terminal devices are placed in the living room, the kitchen, the bedrooms and the bathroom, each with its own microphone array: the terminal device in the living room has a circular microphone array, the terminal device in the kitchen has a linear microphone array, the terminal device in one bedroom has a circular microphone array, the terminal device in the other bedroom has a linear microphone array, and the terminal device in the bathroom has a linear microphone array.
Fig. 5 schematically shows a block diagram of a full-house intelligent voice interaction system of distributed Internet of Things devices according to the present invention. The system includes the distributed Internet of Things devices, a voice interaction control center and a voice cloud server, wherein:
the distributed Internet of Things devices collect voice signals in real time, perform signal processing, signal enhancement, voice wake-up and playback of voice reply content, and exchange data with the voice interaction control center through a communication connection;
the voice interaction control center performs fused voice localization and voice wake-up arbitration according to the content uploaded by each distributed Internet of Things device, determines which of the distributed Internet of Things devices needs to wake up and respond, clears the wake-up information of the other distributed sub-devices, sends the user's voice command from the responding device to the voice cloud server through a communication connection, and issues the corresponding control command and voice reply content to the responding device according to the cloud voice recognition and semantic understanding results;
and the voice cloud server performs processing operations such as voice recognition, semantic understanding, dialogue management and voice synthesis, and returns a response result to the voice interaction control center.
The distributed Internet of Things device includes:
a microphone array audio acquisition module 1 for acquiring voice signals in real time;
an echo cancellation module 2 for cancelling echo in the voice signal;
a noise reduction module 3 for reducing noise in the voice signal;
a voice wake-up detection module 4 for voice wake-up;
a wake-up post-processing module 5 for performing signal processing on the wake-up voice;
a network communication module 6 for transmitting the processed voice signal and exchanging data;
a voice wake-up response module 7 for responding to the wake-up voice signal;
a voice command execution module 8 for executing the received voice command;
and a voice reply and broadcast module 9 for playing the voice reply content.
The voice interaction control center includes:
a voice agent service module 10 for performing fused voice localization on the voice signals uploaded by the distributed Internet of Things devices;
a voice wake-up arbitration module 11 for performing voice wake-up arbitration on the voice signals uploaded by the distributed Internet of Things devices;
a signal analysis and processing module 12 for determining which of the distributed Internet of Things devices needs to wake up and respond, and for clearing the wake-up information of the other distributed sub-devices;
and a network communication module A 13 for communicating with the distributed Internet of Things devices and the voice cloud server.
The voice cloud server includes:
a voice recognition module 14 for recognizing the voice signals uploaded by the voice interaction control center;
a semantic understanding module 15 for understanding the voice signal;
a dialogue management module 16 for performing dialogue processing on the voice signal;
a skill scheduling module 17 for scheduling dialogue skills;
a skills and content module 18 for managing dialogue skills and content;
a dialogue response module 19 for generating a response to the dialogue content;
a synthesis module 20 for performing voice synthesis on the voice signal uploaded by the voice interaction control center;
and a network communication module B 21 for communicating with the distributed Internet of Things devices and the voice interaction control center.
Each distributed Internet of Things device collects voice signals in real time through its own microphone array and performs voice localization, enhancement and wake-up processing locally. Through a communication connection, it sends the enhanced voice signal, the wake-up information, the frequency-domain transform of the signals collected by the microphone array and the results of the related function calculations to the voice interaction control center. The control center arbitrates and determines the device that needs to wake up and respond to the user, and clears the wake-up information of the other distributed sub-devices. The user's voice command from the responding device is then sent to the voice cloud server through a communication connection for recognition and semantic understanding, and the corresponding control command and voice reply content are issued, according to the voice processing result, to the Internet of Things device that responds to the user's wake-up. This completes the full-house intelligent voice interaction process of the distributed Internet of Things devices. In the invention, the voice wake-up results of the distributed Internet of Things devices are arbitrated by the voice interaction control center and response control is performed according to the user's command, which improves the accuracy with which the distributed Internet of Things devices respond to a wake-up.
The above description covers only the preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.
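The frequency-domain transform and covariance matrix computed on each sub-device can be sketched in Python as follows. This is a sketch under assumptions, not the estimator of the invention: the Hann window, the recursive smoothing with factor `alpha`, and the function names `stft_frames` and `recursive_covariance` are all hypothetical choices made for illustration.

```python
import numpy as np

def stft_frames(frames):
    """Transform time-domain frames to the frequency domain.

    frames: real array of shape (n_frames, n_mics, frame_len).
    Returns a complex array of shape (n_frames, n_bins, n_mics).
    """
    window = np.hanning(frames.shape[-1])
    spectrum = np.fft.rfft(frames * window, axis=-1)  # (n_frames, n_mics, n_bins)
    return np.transpose(spectrum, (0, 2, 1))

def recursive_covariance(X, alpha=0.9):
    """Estimate one covariance matrix per frame l and frequency bin m.

    Assumption: R(l, m) = alpha * R(l-1, m) + (1 - alpha) * x x^H.
    X: complex array of shape (n_frames, n_bins, n_mics).
    Returns an array of shape (n_frames, n_bins, n_mics, n_mics).
    """
    n_frames, n_bins, n_mics = X.shape
    R = np.zeros((n_frames, n_bins, n_mics, n_mics), dtype=complex)
    prev = np.zeros((n_bins, n_mics, n_mics), dtype=complex)
    for l in range(n_frames):
        # Outer product x x^H for every frequency bin of frame l.
        outer = X[l][:, :, None] * X[l][:, None, :].conj()
        prev = alpha * prev + (1 - alpha) * outer
        R[l] = prev
    return R

# Synthetic input: 5 frames from a 4-microphone array, 64 samples per frame.
rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 4, 64))
X = stft_frames(frames)
R = recursive_covariance(X)
```

Each resulting matrix is Hermitian, so its eigenvalues are real and can be used directly for principal component weighting.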

Claims (10)

1. A full-house intelligent voice interaction method for distributed Internet of Things devices, characterized by comprising the following steps:
S100, each sub-device of the distributed Internet of Things devices locally acquires the user's voice in real time and performs voice wake-up judgment;
S200, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array;
S300, on each sub-device, the covariance matrix is weighted using a controllable response power method based on principal component eigenvectors to obtain a controllable response power function;
S400, the average value of the controllable response power function is calculated on each sub-device, the average value representing the strength of the azimuth information of the user's voice signal received by the corresponding sub-device;
S500, the voice interaction control center determines the sub-device corresponding to the maximum average value as the sub-device that responds to the user's wake-up, notifies that sub-device to continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and sends a voice request to the cloud;
and S600, the voice cloud server performs operations such as voice recognition, semantic understanding, dialogue management and voice synthesis in real time to process the user's voice command, and returns a response result.
2. The full-house intelligent voice interaction method for distributed Internet of Things devices as claimed in claim 1, characterized by comprising the following steps:
S110, each sub-device of the distributed Internet of Things devices locally acquires the user's voice in real time and performs voice wake-up judgment;
S120, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array;
S130, on each sub-device, the covariance matrix is weighted using a controllable response power method based on principal component eigenvectors to obtain a first controllable response power function;
S140, the average value of the first controllable response power function is calculated on each sub-device, the average value representing the strength of the azimuth information of the user's voice signal received by the corresponding sub-device;
S150, the voice interaction control center determines the sub-device corresponding to the maximum average value as the sub-device that responds to the user's wake-up, notifies that sub-device to continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and initiates a voice request to the cloud;
and S160, the voice cloud server performs operations such as voice recognition, semantic understanding, dialogue management and voice synthesis in real time to process the user's voice command, and returns a response result.
3. The full-house intelligent voice interaction method for distributed Internet of Things devices as claimed in claim 2, further comprising the following steps:
S310, each sub-device of the distributed Internet of Things devices locally acquires the user's voice in real time and performs voice wake-up judgment;
S320, each sub-device hit by the voice wake-up calculates the frequency-domain transform and the covariance matrix corresponding to the signals received by its microphone array;
S330, on each sub-device, the covariance matrix is weighted using a controllable response power method based on the improved principal component eigenvector to obtain a second controllable response power function;
S340, the average value of the second controllable response power function is calculated on each sub-device, the average value representing the strength of the azimuth information of the user's voice signal received by the corresponding sub-device;
S350, the voice interaction control center determines the sub-device corresponding to the maximum average value as the sub-device that responds to the user's wake-up, notifies that sub-device to continue picking up the user's voice command, clears the wake-up information of the other distributed sub-devices, and initiates a voice request to the cloud;
and S360, the voice cloud server performs operations such as voice recognition, semantic understanding, dialogue management and voice synthesis in real time to process the user's voice command, and returns a response result.
4. The full-house intelligent voice interaction method for distributed Internet of Things devices according to any one of claims 1 to 3, characterized in that the acquisition, preprocessing, enhancement and wake-up of voice signals are performed on the distributed Internet of Things devices, and the voice interaction control center makes decisions and requests and forwards voice processing.
5. The full-house intelligent voice interaction method for distributed Internet of Things devices according to any one of claims 1 to 3, characterized in that the distributed Internet of Things devices comprise a plurality of Internet of Things terminal devices.
6. The full-house intelligent voice interaction method for distributed Internet of Things devices as claimed in claim 5, characterized in that each Internet of Things terminal device is provided with its own microphone array;
the microphone array includes, but is not limited to, a linear 2-microphone, linear 4-microphone, linear 6-microphone or circular 4-microphone array, or another irregular microphone array.
7. A full-house intelligent voice interaction system of distributed Internet of Things devices, characterized by comprising distributed Internet of Things devices, a voice interaction control center and a voice cloud server, wherein:
the distributed Internet of Things devices collect voice signals in real time, perform signal processing, signal enhancement, voice wake-up and playback of voice reply content, and exchange data with the voice interaction control center through a communication connection;
the voice interaction control center performs fused voice localization and voice wake-up arbitration according to the content uploaded by each distributed Internet of Things device, determines which of the distributed Internet of Things devices needs to wake up and respond, clears the wake-up information of the other distributed sub-devices, sends the user's voice command from the responding device to the voice cloud server through a communication connection, and issues the corresponding control command and voice reply content to the responding device according to the cloud voice recognition and semantic understanding results;
and the voice cloud server performs processing operations such as voice recognition, semantic understanding, dialogue management and voice synthesis, and returns a response result to the voice interaction control center.
8. The full-house intelligent voice interaction system of distributed Internet of Things devices as claimed in claim 7, wherein the distributed Internet of Things device comprises:
a microphone array audio acquisition module (1) for acquiring voice signals in real time;
an echo cancellation module (2) for cancelling echo in the voice signal;
a noise reduction module (3) for reducing noise in the voice signal;
a voice wake-up detection module (4) for voice wake-up;
a wake-up post-processing module (5) for performing signal processing on the wake-up voice;
a network communication module (6) for transmitting the processed voice signal and exchanging data;
a voice wake-up response module (7) for responding to the wake-up voice signal;
a voice command execution module (8) for executing the received voice command;
and a voice reply and broadcast module (9) for playing the voice reply content.
9. The full-house intelligent voice interaction system of distributed Internet of Things devices as claimed in claim 7, wherein the voice interaction control center comprises:
a voice agent service module (10) for performing fused voice localization on the voice signals uploaded by the distributed Internet of Things devices;
a voice wake-up arbitration module (11) for performing voice wake-up arbitration on the voice signals uploaded by the distributed Internet of Things devices;
a signal analysis and processing module (12) for determining which of the distributed Internet of Things devices needs to wake up and respond, and for clearing the wake-up information of the other distributed sub-devices;
and a network communication module A (13) for communicating with the distributed Internet of Things devices and the voice cloud server.
10. The full-house intelligent voice interaction system of distributed Internet of Things devices as claimed in claim 7, wherein the voice cloud server comprises:
a voice recognition module (14) for recognizing the voice signals uploaded by the voice interaction control center;
a semantic understanding module (15) for understanding the voice signal;
a dialogue management module (16) for performing dialogue processing on the voice signal;
a skill scheduling module (17) for scheduling dialogue skills;
a skills and content module (18) for managing dialogue skills and content;
a dialogue response module (19) for generating a response to the dialogue content;
a synthesis module (20) for performing voice synthesis on the voice signal uploaded by the voice interaction control center;
and a network communication module B (21) for communicating with the distributed Internet of Things devices and the voice interaction control center.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910966907.6A CN110610711A (en) 2019-10-12 2019-10-12 Full-house intelligent voice interaction method and system of distributed Internet of things equipment

Publications (1)

Publication Number Publication Date
CN110610711A true CN110610711A (en) 2019-12-24

Family

ID=68894468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910966907.6A Pending CN110610711A (en) 2019-10-12 2019-10-12 Full-house intelligent voice interaction method and system of distributed Internet of things equipment

Country Status (1)

Country Link
CN (1) CN110610711A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145746A (en) * 2019-12-27 2020-05-12 安徽讯呼信息科技有限公司 Man-machine interaction method based on artificial intelligence voice
CN111613232A (en) * 2020-05-22 2020-09-01 苏州思必驰信息科技有限公司 Voice interaction method and system for multi-terminal equipment
CN112420043A (en) * 2020-12-03 2021-02-26 深圳市欧瑞博科技股份有限公司 Intelligent awakening method and device based on voice, electronic equipment and storage medium
CN113129905A (en) * 2021-03-31 2021-07-16 深圳鱼亮科技有限公司 Distributed voice awakening system based on multiple microphone array nodes
CN113506570A (en) * 2021-06-11 2021-10-15 杭州控客信息技术有限公司 Method for waking up voice equipment nearby in whole-house intelligent system
CN114110912A (en) * 2021-11-08 2022-03-01 珠海格力电器股份有限公司 Voice distributed recognition method combined with PLC
WO2023020076A1 (en) * 2021-08-18 2023-02-23 青岛海尔科技有限公司 Device wake-up method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150169284A1 (en) * 2013-12-16 2015-06-18 Nuance Communications, Inc. Systems and methods for providing a virtual assistant
CN109215663A (en) * 2018-10-11 2019-01-15 北京小米移动软件有限公司 Equipment awakening method and device
CN110223684A (en) * 2019-05-16 2019-09-10 华为技术有限公司 A kind of voice awakening method and equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Wan Xinwang: "Improved steered response power method for sound source localization based on principal eigenvector", Applied Acoustics *
Xiaojun Xiong et al.: "Speaker localization based on ratio between the second and the first eigenvalue", 2016 8th International Conference on Wireless Communications & Signal Processing *
Wan Xinwang (万新旺): "Research on sound source localization algorithms based on array signal processing and spatial hearing", HTTPS://D.WANFANGDATA.COM.CN/THESIS/CHJUAGVZAXNOZXDTMJAYMTA1MTKSCFKYMDU0NZM4GGG2DXO5DHHYYG%3D%3D *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191224)