CN112543390A - Intelligent infant sound box and interaction method thereof - Google Patents
- Publication number
- CN112543390A (application CN202011336049.6A)
- Authority
- CN
- China
- Prior art keywords
- wolf
- infant
- module
- voice
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets ; Supports therefor; Mountings therein
- H04R1/028—Casings; Cabinets ; Supports therefor; Mountings therein associated with devices performing functions other than acoustics, e.g. electric candles
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets ; Supports therefor; Mountings therein
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets ; Supports therefor; Mountings therein
- H04R1/023—Screens for loudspeakers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0638—Interactive procedures
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/02—Details casings, cabinets or mounting therein for transducers covered by H04R1/02 but not provided for in any of its subgroups
Abstract
The invention provides an intelligent infant sound box and an interaction method thereof. The intelligent infant sound box comprises a sound box body, wherein a central processing unit, a storage and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body; a voice acquisition module, an infant voiceprint acquisition module, a wake-up module, an output module and an intelligent control module are arranged in the central processing unit, and a storage module is arranged in the storage; the output module is connected with the display screen through a circuit, and the intelligent control module is electrically connected with the voice acquisition module, the infant voiceprint acquisition module, the wake-up module, the storage module and the output module. The voice acquisition module is used for acquiring adult voice information; the infant voiceprint acquisition module is used for acquiring infant voice signals; the awakening module is used for awakening the intelligent sound box through voice; the output module is used for responding to a user instruction, and the output content of the output module comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, user instruction response and dynamic addition of infant awakening words.
Description
Technical Field
The invention relates to the technical field of voice recognition technology and artificial intelligence, in particular to an intelligent infant sound box and an interaction method thereof.
Background
With the maturing of artificial intelligence and the development of speech recognition technology, intelligent sound boxes have begun to penetrate people's daily lives. Besides the audio and video playback of traditional voice equipment, the intelligent sound box also offers intelligent, interactive and control functions. The sound boxes currently popular on the market have good interactivity and intelligence, but they provide a poor experience for infants who have only just learned to speak: for example, the wake-up words are too long, and the infants' instructions cannot be recognized correctly.
A neural network simulates the thinking function of the human brain: it has strong self-learning and association capabilities, high precision, and requires little manual intervention or expert knowledge. A typical neural network comprises an input layer, one or more hidden layers, and an output layer. Meta-heuristic algorithms can find global solutions in a multi-dimensional search space and are widely applied to the parameter training of neural networks. However, neural network training also has inherent defects: it easily falls into local optima, its precision can be low, and its learning speed slow. Moreover, the processor performance of existing intelligent sound boxes is ordinary, and their data processing capability is poor.
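For illustration, the layered structure described above can be sketched as a minimal single-hidden-layer network (the layer sizes and the tanh activation are illustrative assumptions, not taken from the invention):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: input -> hidden (tanh) -> linear output."""
    h = np.tanh(x @ W1 + b1)  # hidden-layer activations
    return h @ W2 + b2        # output-layer scores

# e.g. 12 speech-feature inputs, 8 hidden units, 2 output classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(12, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)
scores = mlp_forward(rng.normal(size=12), W1, b1, W2, b2)
```

The weight matrices W1 and W2 are exactly the parameters that a meta-heuristic optimizer would train.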
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides an intelligent infant sound box and an interaction method thereof; by optimizing the neural network parameters with an improved algorithm, the sound box can intelligently distinguish between adult wake-up and infant wake-up.
The purpose of the invention can be realized by the following technical scheme: the intelligent infant sound box comprises a sound box body, wherein a central processing unit, a storage and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body; a voice acquisition module, an infant voiceprint acquisition module, a wake-up module, an output module and an intelligent control module are arranged in the central processing unit, and a storage module is arranged in the storage; the voice acquisition module is used for acquiring adult voice information and comprises a plurality of single voice acquisition modules; the infant voiceprint acquisition module is used for acquiring infant voice signals; the awakening module is used for awakening the intelligent sound box through voice, and comprises an adult awakening module and an infant awakening module; the storage module is used for storing adult voice recognition information, awakening words, infant common instructions, infant historical browsing information and cache data; the output module is used for responding to a user instruction, and the output content of the output module comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, user instruction response and dynamic addition of infant awakening words; the network connector is used for connecting the intelligent equipment with the Internet.
In the above intelligent infant sound box, the plurality of single voice acquisition modules specifically comprise a first adult administrator voice acquisition module, a second adult administrator voice acquisition module, a third adult administrator voice acquisition module, a fourth adult administrator voice acquisition module, a fifth adult administrator voice acquisition module and a sixth adult administrator voice acquisition module.
The voice acquisition module can acquire the voice information of six adults in total (for example, parents and grandparents); after recognition training through the intelligent control module, these six adults can control the infant's authority to operate the intelligent sound box.
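A hypothetical sketch of this permission scheme (the class and all identifiers here are invented for illustration; the patent does not specify an implementation):

```python
class PermissionManager:
    """Six enrolled adult administrators may grant or revoke the infant's access."""
    def __init__(self, admin_ids):
        self.admins = set(admin_ids)   # the six recognized adult voiceprints
        self.infant_allowed = True

    def set_infant_access(self, admin_id, allowed):
        # only a voice recognized as one of the enrolled adults may change access
        if admin_id not in self.admins:
            raise PermissionError("only an enrolled adult administrator may change this")
        self.infant_allowed = allowed

pm = PermissionManager(["adult1", "adult2", "adult3", "adult4", "adult5", "adult6"])
pm.set_infant_access("adult2", False)  # an administrator revokes infant access
```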
The interaction method of the intelligent infant sound box comprises the following steps:
A. the method for recognizing adult speech comprises the following steps:
1) inputting adult sample voice;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting adult training voice;
5) extracting MFCC characteristic parameters;
6) and carrying out neural network speech recognition training through the neural network model constructed in the step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters as trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting a test result by a neural network;
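Steps a and b above (feeding in feature data and normalizing it) might look as follows; min-max scaling is an assumption here, since the patent does not name the normalization method:

```python
import numpy as np

def minmax_normalize(train, test):
    """Scale both sets into [0, 1] using statistics of the training data only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (train - lo) / span, (test - lo) / span

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # toy MFCC-like features
test = np.array([[2.5, 25.0]])
train_n, test_n = minmax_normalize(train, test)
```

Using training statistics for both sets keeps the test data from leaking into the model, which matters when the optimizer in step d is judged on test predictions.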
B. the method for recognizing the infant voice comprises the following steps:
1) inputting a sample voice of the infant;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting a training voice of the infant;
5) extracting MFCC characteristic parameters;
6) and carrying out neural network speech recognition training through the neural network model constructed in the step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters as trained parameters;
f. constructing a neural network through the normalized training data;
g. and predicting and outputting a test result by the neural network.
In the interaction method of the intelligent infant sound box, the compact grey wolf algorithm comprises the following steps:
1) initializing the relevant parameters, for example maximum iteration count Max_iter = 500, upper position limit ub = 1 and lower position limit lb = 0, and randomly generating a grey wolf Position; mu and sicma are calculated as in equations (1) and (2):
mu=zeros(3,dim); (1)
sicma=10*ones(3,dim); (2)
mu and sicma represent the mean and standard deviation of the Gaussian distribution; dim is the dimension of the search space, i.e. the number of neural network parameters being optimized;
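Formulas (1) and (2) translate directly into, for example, NumPy (dim = 50 is an arbitrary illustrative value):

```python
import numpy as np

dim = 50                          # number of neural-network parameters to optimize
mu = np.zeros((3, dim))           # eq. (1): means of the three Gaussian models
sicma = 10.0 * np.ones((3, dim))  # eq. (2): wide spread, so the start is near-uniform
```

One row of mu/sicma is kept per leader wolf (alpha, beta, delta), so the whole "population" is represented by just these two small arrays.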
2) initializing the alpha, beta and delta wolf positions according to formulas (3)-(5):
Alpha_pos=ub*generateIndividualR(mu(1),sigma2(1)); (3)
Beta_pos=ub*generateIndividualR(mu(2),sigma2(2)); (4)
Delta_pos=ub*generateIndividualR(mu(3),sigma2(3)); (5)
the generateIndividualR function generates a grey wolf position from the mean and standard deviation of the Gaussian distribution;
3) the generateIndividualR(mu, sigma) function is calculated according to formulas (6)-(9):
r=rand(); (6)
erfA=erf((mu+1)/(sqrt(2)*sigma)); (7)
erfB=erf((mu-1)/(sqrt(2)*sigma)); (8)
samplerand=erfinv(-erfA-r*erfB+r*erfA)*sigma*sqrt(2)+mu; (9)
rand() generates a uniform random number in [0, 1]; erf() is the error function, the integral of the Gaussian probability density function; sqrt() is the square root function; erfinv() is the inverse error function; samplerand is the function return value;
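Formulas (6)-(9) amount to inverse-CDF sampling from a Gaussian truncated to [-1, 1]. A standard-library Python sketch (erfinv is built from statistics.NormalDist, since the math module has no inverse error function; the keyword r lets a fixed random draw be injected for testing):

```python
import math
import random
from statistics import NormalDist

def erfinv(x):
    # inverse error function via the standard normal quantile:
    # erf(z) = 2*Phi(z*sqrt(2)) - 1  =>  erfinv(x) = Phi^-1((x+1)/2) / sqrt(2)
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)

def generate_individual_r(mu, sigma, r=None):
    """Formulas (6)-(9): draw one coordinate from a Gaussian truncated to [-1, 1]."""
    r = random.random() if r is None else r                     # eq. (6)
    erf_a = math.erf((mu + 1.0) / (math.sqrt(2.0) * sigma))     # eq. (7)
    erf_b = math.erf((mu - 1.0) / (math.sqrt(2.0) * sigma))     # eq. (8)
    return erfinv(-erf_a - r * erf_b + r * erf_a) * sigma * math.sqrt(2.0) + mu  # eq. (9)
```

Note that r = 0 and r = 1 map exactly to the interval endpoints -1 and 1, confirming that the sampler never leaves the normalized search range.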
4) calling the objective function, formula (10), to obtain the objective function values of the Alpha, Beta and Delta wolves as Alpha_score, Beta_score and Delta_score respectively;
n is the number of the neural network training samples, y is a training sample label, and y' represents a sample prediction result;
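Formula (10) itself is not reproduced in the source text; a common fitness choice consistent with the variables described (n samples, labels y, predictions y') is the mean squared error, sketched here purely as an assumption:

```python
import numpy as np

def objective(y, y_pred):
    """Assumed form of formula (10): mean squared error over the n training samples."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    n = y.shape[0]                     # number of neural-network training samples
    return float(np.sum((y - y_pred) ** 2) / n)
```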
5) calculating the position to which the grey wolf moves next: looping over each dimension of the wolf, update according to formulas (11)-(15):
a=2-l*(2/Max_iter); (11)
X1=Alpha_pos(j)-(2*a*rand()-a)*abs(2*rand()*Alpha_pos(j)-Position(j)); (12)
X2=Beta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Beta_pos(j)-Position(j)); (13)
X3=Delta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Delta_pos(j)-Position(j)); (14)
Position(j)=(X1+X2+X3)/3; (15)
l is the current iteration number and j indexes the jth dimension of the wolf; a controls the balance between the global and local search capabilities of the algorithm; X1, X2 and X3 are the attractions of the alpha, beta and delta wolves on the grey wolf, respectively; abs() is the absolute value function;
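Formulas (11)-(15) in Python form (a direct transcription; position and the three leader positions are plain lists):

```python
import random

def update_position(position, alpha_pos, beta_pos, delta_pos, l, max_iter):
    """Eqs. (11)-(15): move the grey wolf toward the alpha, beta and delta leaders."""
    a = 2.0 - l * (2.0 / max_iter)  # eq. (11): decays linearly from 2 to 0
    new_pos = []
    for j in range(len(position)):
        xs = [leader[j] - (2*a*random.random() - a) * abs(2*random.random()*leader[j] - position[j])
              for leader in (alpha_pos, beta_pos, delta_pos)]  # eqs. (12)-(14)
        new_pos.append(sum(xs) / 3.0)  # eq. (15): average of the three attractions
    return new_pos
```

At the final iteration (l = Max_iter) the coefficient a is 0, so the wolf lands exactly on the average of the three leaders: the search has turned from exploration to pure exploitation.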
6) comparing the updated grey wolf position with the alpha wolf: winner1 is the one of the two with the better objective function value, loser1 the one with the worse;
7) update mu (1) and sicma (1), traverse each dimension of the wolf, update as follows (16) - (21):
winner1(j)=(winner1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (16)
loser1(j)=(loser1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (17)
mut=mu(1,j); (18)
mu(1,j)=mu(1,j)+(1/200)*(winner1(j)-loser1(j)); (19)
t=sicma(1,j)^2+mut^2-mu(1,j)^2+(1/200)*(winner1(j)^2-loser1(j)^2); (20)
sicma(1,j)=sqrt(t); (21)
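Formulas (16)-(20) shift the alpha model toward the winner; the same pattern repeats below for the beta and delta models. The final step, taking sicma as the square root of t, follows the usual compact-algorithm update rule and is an assumption here (the source text computes t without showing its further use):

```python
import math

def update_model(mu_row, sicma_row, winner, loser, ub=1.0, lb=0.0):
    """Eqs. (16)-(20) plus the assumed sicma = sqrt(t) step, applied per dimension."""
    for j in range(len(mu_row)):
        w = (winner[j] - (ub + lb) / 2.0) / ((ub - lb) / 2.0)  # eq. (16): map into [-1, 1]
        s = (loser[j] - (ub + lb) / 2.0) / ((ub - lb) / 2.0)   # eq. (17)
        mut = mu_row[j]                                        # eq. (18): remember old mean
        mu_row[j] += (1.0 / 200.0) * (w - s)                   # eq. (19)
        t = sicma_row[j]**2 + mut**2 - mu_row[j]**2 + (1.0 / 200.0) * (w**2 - s**2)  # eq. (20)
        sicma_row[j] = math.sqrt(max(t, 0.0))                  # assumed final step

mu1, sic1 = [0.0], [10.0]
update_model(mu1, sic1, winner=[1.0], loser=[0.0])
```

The 1/200 factor acts like the inverse of a virtual population size: each winner/loser pair nudges the Gaussian model only slightly, so no explicit population ever needs to be stored.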
8) comparing the updated grey wolf position with the beta wolf: winner2 is the one of the two with the better objective function value, loser2 the one with the worse;
9) update mu (2) and sicma (2), traverse each dimension of the wolf, update as follows (22) - (27):
winner2(j)=(winner2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (22)
loser2(j)=(loser2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (23)
mut=mu(2,j); (24)
mu(2,j)=mu(2,j)+(1/200)*(winner2(j)-loser2(j)); (25)
t=sicma(2,j)^2+mut^2-mu(2,j)^2+(1/200)*(winner2(j)^2-loser2(j)^2); (26)
sicma(2,j)=sqrt(t); (27)
10) comparing the updated grey wolf position with the delta wolf: winner3 is the one of the two with the better objective function value, loser3 the one with the worse;
11) update mu (3) and sicma (3), traverse each dimension of the wolf, update as follows (28) - (33):
winner3(j)=(winner3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (28)
loser3(j)=(loser3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (29)
mut=mu(3,j); (30)
mu(3,j)=mu(3,j)+(1/200)*(winner3(j)-loser3(j)); (31)
t=sicma(3,j)^2+mut^2-mu(3,j)^2+(1/200)*(winner3(j)^2-loser3(j)^2); (32)
sicma(3,j)=sqrt(t); (33)
12) the loop ends, and the optimal values of winner1, winner2 and winner3 are output.
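Putting steps 1)-12) together, a condensed, self-contained Python sketch follows (not the patent's exact implementation: as noted in the docstring, the truncated-Gaussian sampler of eqs. (6)-(9) is replaced by a clipped Gaussian draw, and the toy objective stands in for the neural-network fitness of formula (10)):

```python
import math
import random

def compact_gwo(objective, dim, max_iter=200, ub=1.0, lb=0.0, seed=0):
    """Condensed sketch of steps 1)-12). Three (mu, sicma) pairs model the alpha,
    beta and delta wolves; each iteration a candidate wolf is moved by eqs. (11)-(15)
    and each model is nudged toward whichever of (candidate, leader) scored better.
    Implementation shortcut: leaders are drawn as clipped Gaussians rather than via
    the exact truncated-Gaussian sampler of eqs. (6)-(9)."""
    rnd = random.Random(seed)
    mu = [[0.0] * dim for _ in range(3)]           # eq. (1)
    sic = [[10.0] * dim for _ in range(3)]         # eq. (2)
    def sample(k):  # draw a position from model k, clipped into [lb, ub]
        return [min(ub, max(lb, lb + (ub - lb) * (rnd.gauss(mu[k][j], sic[k][j]) + 1) / 2))
                for j in range(dim)]
    leaders = [sample(k) for k in range(3)]        # eqs. (3)-(5)
    scores = [objective(p) for p in leaders]       # formula (10)
    pos = sample(0)
    for l in range(1, max_iter + 1):
        a = 2.0 - l * (2.0 / max_iter)             # eq. (11)
        for j in range(dim):                       # eqs. (12)-(15)
            xs = [ld[j] - (2*a*rnd.random() - a) * abs(2*rnd.random()*ld[j] - pos[j])
                  for ld in leaders]
            pos[j] = min(ub, max(lb, sum(xs) / 3.0))
        f = objective(pos)
        for k in range(3):                         # steps 6)-11)
            better = f < scores[k]
            win = pos if better else leaders[k]
            lose = leaders[k] if better else pos
            for j in range(dim):                   # eqs. (16)-(20) per leader model
                w = (win[j] - (ub + lb) / 2) / ((ub - lb) / 2)
                s = (lose[j] - (ub + lb) / 2) / ((ub - lb) / 2)
                mut = mu[k][j]
                mu[k][j] += (w - s) / 200.0
                t = sic[k][j]**2 + mut**2 - mu[k][j]**2 + (w*w - s*s) / 200.0
                sic[k][j] = math.sqrt(max(t, 1e-12))
            if better:
                leaders[k], scores[k] = pos[:], f
    best = min(range(3), key=lambda k: scores[k])  # step 12): output the best winner
    return leaders[best], scores[best]

# toy check: minimize distance to 0.3 in each of 3 dimensions
best_pos, best_score = compact_gwo(lambda x: sum((v - 0.3) ** 2 for v in x), dim=3)
```

Because only the three (mu, sicma) pairs and a handful of positions are stored, the memory footprint stays constant in the population size, which is the property that makes the method suitable for the embedded CPU mentioned below.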
Compared with the prior art, the intelligent infant sound box and the interaction method thereof have the following advantages:
the method can dynamically add awakening words, efficiently identify infant voice instructions, intelligently control the authority of infants to access the intelligent sound box, construct an efficient neural network voice training model, optimize neural network parameters in an embedded CPU with limited operation capability by the improved compact wolf algorithm, avoid the problem that the neural network is trapped in a local trap, effectively improve the prediction accuracy and accelerate the prediction process.
Drawings
FIG. 1 is a system diagram of the present invention;
FIG. 2 is a block diagram of an adult speech recognition process of the present invention;
FIG. 3 is a block diagram of a baby speech recognition process according to the present invention;
FIG. 4 is a flow chart of neural network speech recognition training of the present invention;
FIG. 5 is a diagram of a neural network architecture of the present invention;
FIG. 6 is a flow chart of the improved compact grey wolf algorithm of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings:
as shown in fig. 1, the infant intelligent sound box comprises a sound box body, wherein a central processing unit, a memory and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body, and is characterized in that a voice acquisition module, an infant voiceprint acquisition module, a wake-up module, an output module and an intelligent control module are arranged in the central processing unit, a storage module is arranged in the memory, the output module is connected with the display screen through a circuit, and the intelligent control module is electrically connected with the voice acquisition module, the infant voiceprint acquisition module, the wake-up module, the storage module and the output module; the voice acquisition module is used for acquiring adult voice information and comprises a plurality of single voice acquisition modules; the infant voiceprint acquisition module is used for acquiring infant voice signals; the awakening module is used for awakening the intelligent sound box through voice and comprises an adult awakening module and an infant awakening module; the storage module is used for storing adult voice recognition information, awakening words, infant common instructions, infant historical browsing information and cache data; the output module is used for responding to a user instruction, and the output content of the output module comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, user instruction response and dynamic addition of infant awakening words; the network connector is used for connecting the intelligent equipment with the Internet.
In the above intelligent infant sound box, the plurality of single voice acquisition modules specifically comprise a first adult administrator voice acquisition module, a second adult administrator voice acquisition module, a third adult administrator voice acquisition module, a fourth adult administrator voice acquisition module, a fifth adult administrator voice acquisition module and a sixth adult administrator voice acquisition module.
The voice acquisition module can acquire the voice information of six adults in total (for example, parents and grandparents); after recognition training through the intelligent control module, these six adults can control the infant's authority to operate the intelligent sound box.
The interaction method of the intelligent infant sound box comprises the following steps:
A. as shown in FIG. 2, the method for adult speech recognition:
1) inputting adult sample voice;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting adult training voice;
5) extracting MFCC characteristic parameters;
6) as shown in FIG. 4, carrying out neural network speech recognition training through the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network (as shown in FIG. 5);
d. calling the compact grey wolf algorithm (as shown in FIG. 6);
e. setting the neural network parameters as the trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting a test result by a neural network;
B. as shown in FIG. 3, the method for infant speech recognition:
1) inputting a sample voice of the infant;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting a training voice of the infant;
5) extracting MFCC characteristic parameters;
6) as shown in FIG. 4, carrying out neural network speech recognition training through the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network (as shown in FIG. 5);
d. calling the compact grey wolf algorithm (as shown in FIG. 6);
e. setting the neural network parameters as the trained parameters;
f. constructing a neural network through the normalized training data;
g. and predicting and outputting a test result by the neural network.
As shown in FIG. 6, in the interaction method of the intelligent infant sound box, the compact grey wolf algorithm comprises the following steps:
1) initializing the relevant parameters, for example maximum iteration count Max_iter = 500, upper position limit ub = 1 and lower position limit lb = 0, and randomly generating a grey wolf Position; mu and sicma are calculated as in equations (1) and (2):
mu=zeros(3,dim); (1)
sicma=10*ones(3,dim); (2)
mu and sicma represent the mean and standard deviation of the Gaussian distribution; dim is the dimension of the search space, i.e. the number of neural network parameters being optimized;
2) initializing the alpha, beta and delta wolf positions according to formulas (3)-(5):
Alpha_pos=ub*generateIndividualR(mu(1),sigma2(1)); (3)
Beta_pos=ub*generateIndividualR(mu(2),sigma2(2)); (4)
Delta_pos=ub*generateIndividualR(mu(3),sigma2(3)); (5)
the generateIndividualR function generates a grey wolf position from the mean and standard deviation of the Gaussian distribution;
3) the generateIndividualR(mu, sigma) function is calculated according to formulas (6)-(9):
r=rand(); (6)
erfA=erf((mu+1)/(sqrt(2)*sigma)); (7)
erfB=erf((mu-1)/(sqrt(2)*sigma)); (8)
samplerand=erfinv(-erfA-r*erfB+r*erfA)*sigma*sqrt(2)+mu; (9)
rand() generates a uniform random number in [0, 1]; erf() is the error function, the integral of the Gaussian probability density function; sqrt() is the square root function; erfinv() is the inverse error function; samplerand is the function return value;
4) calling the objective function, formula (10), to obtain the objective function values of the Alpha, Beta and Delta wolves as Alpha_score, Beta_score and Delta_score respectively;
n is the number of the neural network training samples, y is a training sample label, and y' represents a sample prediction result;
5) calculating the position to which the grey wolf moves next: looping over each dimension of the wolf, update according to formulas (11)-(15):
a=2-l*(2/Max_iter); (11)
X1=Alpha_pos(j)-(2*a*rand()-a)*abs(2*rand()*Alpha_pos(j)-Position(j)); (12)
X2=Beta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Beta_pos(j)-Position(j)); (13)
X3=Delta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Delta_pos(j)-Position(j)); (14)
Position(j)=(X1+X2+X3)/3; (15)
l is the current iteration number and j indexes the jth dimension of the wolf; a controls the balance between the global and local search capabilities of the algorithm; X1, X2 and X3 are the attractions of the alpha, beta and delta wolves on the grey wolf, respectively; abs() is the absolute value function;
6) comparing the updated grey wolf position with the alpha wolf: winner1 is the one of the two with the better objective function value, loser1 the one with the worse;
7) update mu (1) and sicma (1), traverse each dimension of the wolf, update as follows (16) - (21):
winner1(j)=(winner1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (16)
loser1(j)=(loser1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (17)
mut=mu(1,j); (18)
mu(1,j)=mu(1,j)+(1/200)*(winner1(j)-loser1(j)); (19)
t=sicma(1,j)^2+mut^2-mu(1,j)^2+(1/200)*(winner1(j)^2-loser1(j)^2); (20)
sicma(1,j)=sqrt(t); (21)
8) comparing the updated grey wolf position with the beta wolf: winner2 is the one of the two with the better objective function value, loser2 the one with the worse;
9) update mu (2) and sicma (2), traverse each dimension of the wolf, update as follows (22) - (27):
winner2(j)=(winner2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (22)
loser2(j)=(loser2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (23)
mut=mu(2,j); (24)
mu(2,j)=mu(2,j)+(1/200)*(winner2(j)-loser2(j)); (25)
t=sicma(2,j)^2+mut^2-mu(2,j)^2+(1/200)*(winner2(j)^2-loser2(j)^2); (26)
sicma(2,j)=sqrt(t); (27)
10) comparing the updated grey wolf position with the delta wolf: winner3 is the one of the two with the better objective function value, loser3 the one with the worse;
11) update mu (3) and sicma (3), traverse each dimension of the wolf, update as follows (28) - (33):
winner3(j)=(winner3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (28)
loser3(j)=(loser3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (29)
mut=mu(3,j); (30)
mu(3,j)=mu(3,j)+(1/200)*(winner3(j)-loser3(j)); (31)
t=sicma(3,j)^2+mut^2-mu(3,j)^2+(1/200)*(winner3(j)^2-loser3(j)^2); (32)
sicma(3,j)=sqrt(t); (33)
12) the loop ends, and the optimal values of winner1, winner2 and winner3 are output.
Compared with the prior art, the intelligent infant sound box and the interaction method thereof have the following advantages:
the method can dynamically add awakening words, efficiently identify infant voice instructions, intelligently control the authority of infants to access the intelligent sound box, construct an efficient neural network voice training model, optimize neural network parameters in an embedded CPU with limited operation capability by the improved compact wolf algorithm, avoid the problem that the neural network is trapped in a local trap, effectively improve the prediction accuracy and accelerate the prediction process.
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.
Claims (4)
1. An intelligent infant sound box, comprising a sound box body, wherein a central processing unit, a storage and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body, characterized in that a voice acquisition module, an infant voiceprint acquisition module, a wake-up module, an output module and an intelligent control module are arranged in the central processing unit, and a storage module is arranged in the storage; the voice acquisition module is used for acquiring adult voice information and comprises a plurality of single voice acquisition modules; the infant voiceprint acquisition module is used for acquiring infant voice signals; the awakening module is used for awakening the intelligent sound box through voice, and comprises an adult awakening module and an infant awakening module; the storage module is used for storing adult voice recognition information, awakening words, infant common instructions, infant historical browsing information and cache data; the output module is used for responding to a user instruction, and the output content of the output module comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, user instruction response and dynamic addition of infant awakening words; the network connector is used for connecting the intelligent equipment with the Internet.
2. The intelligent sound box for infants as defined in claim 1, wherein the plurality of single voice collecting modules specifically comprises a first adult administrator voice collecting module, a second adult administrator voice collecting module, a third adult administrator voice collecting module, a fourth adult administrator voice collecting module, a fifth adult administrator voice collecting module and a sixth adult administrator voice collecting module.
3. The interaction method of the intelligent infant sound box as claimed in claim 1, characterized by comprising the following steps:
A. the method for recognizing adult speech comprises the following steps:
1) inputting adult sample voice;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting adult training voice;
5) extracting MFCC characteristic parameters;
6) and carrying out neural network speech recognition training through the neural network model constructed in the step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters as trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting a test result by a neural network;
B. the method for recognizing the infant voice comprises the following steps:
1) inputting a sample voice of the infant;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting a training voice of the infant;
5) extracting MFCC characteristic parameters;
6) and carrying out neural network speech recognition training through the neural network model constructed in the step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters as trained parameters;
f. constructing a neural network through the normalized training data;
g. and predicting and outputting a test result by the neural network.
4. The interaction method of the intelligent infant sound box as claimed in claim 3, wherein the compact grey wolf algorithm comprises the following steps:
1) initializing the relevant parameters, for example maximum iteration count Max_iter = 500, upper position limit ub = 1 and lower position limit lb = 0, and randomly generating a grey wolf Position; mu and sicma are calculated as in equations (1) and (2):
mu=zeros(3,dim); (1)
sicma=10*ones(3,dim); (2)
mu and sicma represent the mean and variance of the Gaussian distribution, dim is the dimension of the search space, and the number of parameters of the optimized neural network is represented;
2) initializing the alpha, beta and delta wolf positions according to equations (3)-(5):
Alpha_pos=ub*generateIndividualR(mu(1),sigma(1)); (3)
Beta_pos=ub*generateIndividualR(mu(2),sigma(2)); (4)
Delta_pos=ub*generateIndividualR(mu(3),sigma(3)); (5)
the generateIndividualR function generates a grey wolf position from the mean and standard deviation of the Gaussian distribution;
3) the generateIndividualR(mu, sigma) function is computed as in equations (6)-(9):
r=rand(); (6)
erfA=erf((mu+1)/(sqrt(2)*sigma)); (7)
erfB=erf((mu-1)/(sqrt(2)*sigma)); (8)
samplerand=erfinv(-erfA-r*erfB+r*erfA)*sigma*sqrt(2)+mu; (9)
rand () generates a random variable of [0, 1 ]; erf () is an error function, which is the integral of the gaussian probability density function; sqrt is a function for square root; erfiv () represents the inverse error function; samplerand is a function return value;
4) calling the objective function, equation (10), to obtain the objective function values of the alpha, beta and delta wolves as Alpha_score, Beta_score and Delta_score respectively, where n is the number of neural network training samples, y is the training sample label, and y' is the sample prediction result;
5) calculating the position to which the grey wolf moves next: traverse each dimension of the wolf and update according to equations (11)-(15):
a=2-l*(2/Max_iter); (11)
X1=Alpha_pos(j)-(2*a*rand()-a)*abs(2*rand()*Alpha_pos(j)-Position(j)); (12)
X2=Beta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Beta_pos(j)-Position(j)); (13)
X3=Delta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Delta_pos(j)-Position(j)); (14)
Position(j)=(X1+X2+X3)/3; (15)
l is the current iteration number and j indexes the jth dimension of the wolf; a controls the balance between the global and local search capabilities of the algorithm; X1, X2 and X3 are the attractions of the alpha, beta and delta wolves on the grey wolf, respectively; abs() is the absolute value function;
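The position update of equations (11)-(15) can be sketched in Python as below. This is a minimal sketch: the leader positions, dimension count and iteration values fed to it are made-up inputs, not values from the claim:

```python
import random

def update_position(position, alpha_pos, beta_pos, delta_pos, l, max_iter):
    """One grey wolf position update per equations (11)-(15).

    a decays linearly from 2 to 0 over the run (eq. 11), shifting the
    search from exploration toward exploitation of the three leaders.
    """
    a = 2 - l * (2 / max_iter)                       # eq. (11)
    new_pos = []
    for j in range(len(position)):
        xs = []
        for leader in (alpha_pos, beta_pos, delta_pos):
            # eqs. (12)-(14): stochastic attraction toward each leader
            x = leader[j] - (2 * a * random.random() - a) * abs(
                2 * random.random() * leader[j] - position[j])
            xs.append(x)
        new_pos.append(sum(xs) / 3)                  # eq. (15): average
    return new_pos

random.seed(1)
dim = 4
pos = [0.5] * dim
new = update_position(pos, [0.2] * dim, [0.4] * dim, [0.6] * dim,
                      l=10, max_iter=500)
```

Averaging the three candidate moves (eq. 15) is what makes the alpha, beta and delta leaders jointly steer the single candidate wolf, in place of the population a standard grey wolf optimizer would maintain.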
6) comparing the updated grey wolf position with the alpha wolf: winner1 is the wolf with the better objective function value, loser1 the wolf with the worse one;
7) updating mu(1) and sigma(1): traverse each dimension of the wolf and update according to equations (16)-(21):
winner1(j)=(winner1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (16)
loser1(j)=(loser1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (17)
mut=mu(1,j); (18)
mu(1,j)=mu(1,j)+(1/200)*(winner1(j)-loser1(j)); (19)
t=sigma(1,j)^2+mut^2-mu(1,j)^2+(1/200)*(winner1(j)^2-loser1(j)^2); (20)
sigma(1,j)=sqrt(t); (21)
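The distribution update of equations (16)-(21) can be sketched as follows. The claim computes t in equation (20); the sketch assumes the concluding equation writes sqrt(t) back as the new sigma (the standard compact-algorithm rule), with a small clamp since the expression can go fractionally negative in floating point. Names and the test inputs are my own:

```python
def compact_update(mu, sigma, winner, loser, ub, lb, n_virtual=200):
    """Update one Gaussian (mu, sigma) from a winner/loser pair,
    per equations (16)-(21); 1/n_virtual is the learning step
    (the claim uses 1/200)."""
    new_mu, new_sigma = [], []
    for j in range(len(mu)):
        # eqs. (16)-(17): rescale both positions to [-1, 1]
        w = (winner[j] - (ub[j] + lb[j]) / 2) / ((ub[j] - lb[j]) / 2)
        s = (loser[j] - (ub[j] + lb[j]) / 2) / ((ub[j] - lb[j]) / 2)
        mut = mu[j]                                   # eq. (18)
        m = mu[j] + (w - s) / n_virtual               # eq. (19): mean drifts to winner
        t = (sigma[j] ** 2 + mut ** 2 - m ** 2
             + (w ** 2 - s ** 2) / n_virtual)         # eq. (20)
        new_mu.append(m)
        new_sigma.append(max(t, 1e-12) ** 0.5)        # eq. (21), clamped nonnegative
    return new_mu, new_sigma

mu, sigma = [0.0, 0.0], [10.0, 10.0]
winner, loser = [0.9, 0.1], [0.2, 0.8]
ub, lb = [1.0, 1.0], [0.0, 0.0]
new_mu, new_sigma = compact_update(mu, sigma, winner, loser, ub, lb)
```

The effect is that the mean moves a small step (1/200 of the rescaled gap) toward the winner in every dimension, while sigma tracks the running second moment, so the sampling distribution gradually tightens around good solutions.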
8) comparing the updated grey wolf position with the beta wolf: winner2 is the wolf with the better objective function value, loser2 the wolf with the worse one;
9) updating mu(2) and sigma(2): traverse each dimension of the wolf and update according to equations (22)-(27):
winner2(j)=(winner2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (22)
loser2(j)=(loser2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (23)
mut=mu(2,j); (24)
mu(2,j)=mu(2,j)+(1/200)*(winner2(j)-loser2(j)); (25)
t=sigma(2,j)^2+mut^2-mu(2,j)^2+(1/200)*(winner2(j)^2-loser2(j)^2); (26)
sigma(2,j)=sqrt(t); (27)
10) comparing the updated grey wolf position with the delta wolf: winner3 is the wolf with the better objective function value, loser3 the wolf with the worse one;
11) updating mu(3) and sigma(3): traverse each dimension of the wolf and update according to equations (28)-(33):
winner3(j)=(winner3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (28)
loser3(j)=(loser3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (29)
mut=mu(3,j); (30)
mu(3,j)=mu(3,j)+(1/200)*(winner3(j)-loser3(j)); (31)
t=sigma(3,j)^2+mut^2-mu(3,j)^2+(1/200)*(winner3(j)^2-loser3(j)^2); (32)
sigma(3,j)=sqrt(t); (33)
12) the loop ends, and the optimal values of winner1, winner2 and winner3 are output.
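Steps 1)-12) can be assembled into a minimal, self-contained Python sketch. Several details the claim leaves implicit are assumptions here: the objective is minimized, a leader is replaced when the candidate wolf beats it, positions are clamped to [lb, ub], and the sigma write-back takes sqrt of the variance expression. The toy objective stands in for the neural-network loss of equation (10):

```python
import random
from statistics import NormalDist

def sample_truncated(mu, sigma):
    """Truncated-Gaussian sample on [-1, 1] (eqs. (6)-(9))."""
    d = NormalDist(mu, sigma)
    lo, hi = d.cdf(-1.0), d.cdf(1.0)
    return d.inv_cdf(lo + random.random() * (hi - lo))

def compact_gwo(objective, dim, max_iter=200, ub=1.0, lb=0.0, nv=200):
    """Minimal compact grey wolf optimizer per steps 1)-12).

    One candidate wolf plus three Gaussians (for the alpha, beta and
    delta leaders) replace the population of a standard GWO.
    """
    mu = [[0.0] * dim for _ in range(3)]          # eq. (1)
    sigma = [[10.0] * dim for _ in range(3)]      # eq. (2)
    # step 2): initial leader positions, eqs. (3)-(5)
    leaders = [[ub * sample_truncated(mu[k][j], sigma[k][j])
                for j in range(dim)] for k in range(3)]
    scores = [objective(p) for p in leaders]      # step 4), eq. (10) stand-in
    position = [random.uniform(lb, ub) for _ in range(dim)]
    for l in range(max_iter):
        a = 2 - l * (2 / max_iter)                # eq. (11)
        for j in range(dim):                      # step 5), eqs. (12)-(15)
            xs = [leaders[k][j] - (2 * a * random.random() - a) *
                  abs(2 * random.random() * leaders[k][j] - position[j])
                  for k in range(3)]
            position[j] = min(max(sum(xs) / 3, lb), ub)  # clamped (assumption)
        score = objective(position)
        for k in range(3):                        # steps 6)-11)
            if score < scores[k]:                 # minimization assumed
                winner, loser = position[:], leaders[k][:]
                leaders[k], scores[k] = position[:], score
            else:
                winner, loser = leaders[k][:], position[:]
            for j in range(dim):                  # eqs. (16)-(21) and analogues
                w = (winner[j] - (ub + lb) / 2) / ((ub - lb) / 2)
                s = (loser[j] - (ub + lb) / 2) / ((ub - lb) / 2)
                mut = mu[k][j]
                mu[k][j] += (w - s) / nv
                t = sigma[k][j] ** 2 + mut ** 2 - mu[k][j] ** 2 + (w ** 2 - s ** 2) / nv
                sigma[k][j] = max(t, 1e-12) ** 0.5
    best = min(range(3), key=lambda k: scores[k])  # step 12)
    return leaders[best], scores[best]

random.seed(7)
# toy objective: squared distance from 0.25 per dimension (minimum 0 at x = 0.25)
best_pos, best_score = compact_gwo(lambda x: sum((v - 0.25) ** 2 for v in x), dim=3)
```

The memory footprint is what makes the method "compact": only the candidate position and six parameter vectors (three means, three deviations) are stored, regardless of how many virtual wolves the 1/200 learning step emulates, which suits the embedded hardware of a smart speaker.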
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011336049.6A CN112543390B (en) | 2020-11-25 | 2020-11-25 | Intelligent infant sound box and interaction method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112543390A true CN112543390A (en) | 2021-03-23 |
CN112543390B CN112543390B (en) | 2023-03-24 |
Family
ID=75015144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011336049.6A Active CN112543390B (en) | 2020-11-25 | 2020-11-25 | Intelligent infant sound box and interaction method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112543390B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180277117A1 (en) * | 2017-03-23 | 2018-09-27 | Alex Lauren HERGENROEDER | Method and Apparatus for Speech Interaction with Children |
WO2019160396A2 (en) * | 2019-04-11 | 2019-08-22 | LG Electronics Inc. | Guide robot and operation method for guide robot |
CN110534099A (en) * | 2019-09-03 | 2019-12-03 | 腾讯科技(深圳)有限公司 | Voice wakes up processing method, device, storage medium and electronic equipment |
CN110696002A (en) * | 2019-08-31 | 2020-01-17 | 左建 | Intelligent early education robot |
CN211063690U (en) * | 2019-12-25 | 2020-07-21 | 安徽淘云科技有限公司 | Drawing book recognition equipment |
CN111638787A (en) * | 2020-05-29 | 2020-09-08 | 百度在线网络技术(北京)有限公司 | Method and device for displaying information |
CN111816188A (en) * | 2020-06-23 | 2020-10-23 | 漳州龙文维克信息技术有限公司 | Man-machine voice interaction method for intelligent robot |
Also Published As
Publication number | Publication date |
---|---|
CN112543390B (en) | 2023-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device | |
CN110491416B (en) | Telephone voice emotion analysis and identification method based on LSTM and SAE | |
Zhang et al. | Cooperative learning and its application to emotion recognition from speech | |
US20200402497A1 (en) | Systems and Methods for Speech Generation | |
CN104115221B (en) | Changed based on Text To Speech and semantic audio human interaction proof | |
CN110415686A (en) | Method of speech processing, device, medium, electronic equipment | |
CN101023469B (en) | Digital filtering method, digital filtering equipment | |
CN110265040A (en) | Training method, device, storage medium and the electronic equipment of sound-groove model | |
US20210174805A1 (en) | Voice user interface | |
CN107329996A (en) | A kind of chat robots system and chat method based on fuzzy neural network | |
CN110211599A (en) | Using awakening method, device, storage medium and electronic equipment | |
CN106601229A (en) | Voice awakening method based on soc chip | |
CN110400571A (en) | Audio-frequency processing method, device, storage medium and electronic equipment | |
CN115762536A (en) | Small sample optimization bird sound recognition method based on bridge transform | |
CN116189681A (en) | Intelligent voice interaction system and method | |
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism | |
CN111091809B (en) | Regional accent recognition method and device based on depth feature fusion | |
Lu et al. | Deep convolutional neural network with transfer learning for environmental sound classification | |
Sagi et al. | A biologically motivated solution to the cocktail party problem | |
CN113571045A (en) | Minnan language voice recognition method, system, equipment and medium | |
CN112543390B (en) | Intelligent infant sound box and interaction method thereof | |
CN112382301A (en) | Noise-containing voice gender identification method and system based on lightweight neural network | |
Cao et al. | Emotion recognition from children speech signals using attention based time series deep learning | |
CN116434758A (en) | Voiceprint recognition model training method and device, electronic equipment and storage medium | |
CN113707172B (en) | Single-channel voice separation method, system and computer equipment of sparse orthogonal network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||