CN112543390A - Intelligent infant sound box and interaction method thereof - Google Patents

Intelligent infant sound box and interaction method thereof

Info

Publication number
CN112543390A
CN112543390A (application CN202011336049.6A)
Authority
CN
China
Prior art keywords
wolf
infant
module
voice
neural network
Prior art date
Legal status
Granted
Application number
CN202011336049.6A
Other languages
Chinese (zh)
Other versions
CN112543390B (en)
Inventor
岳莉亚
胡沛
韩璞
韩凌
杨植森
Current Assignee
Nanyang Institute of Technology
Original Assignee
Nanyang Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanyang Institute of Technology filed Critical Nanyang Institute of Technology
Priority to CN202011336049.6A priority Critical patent/CN112543390B/en
Publication of CN112543390A publication Critical patent/CN112543390A/en
Application granted granted Critical
Publication of CN112543390B publication Critical patent/CN112543390B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/02Casings; Cabinets ; Supports therefor; Mountings therein
    • H04R1/028Casings; Cabinets ; Supports therefor; Mountings therein associated with devices performing functions other than acoustics, e.g. electric candles
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/02Casings; Cabinets ; Supports therefor; Mountings therein
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/02Casings; Cabinets ; Supports therefor; Mountings therein
    • H04R1/023Screens for loudspeakers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0631Creating reference templates; Clustering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0638Interactive procedures
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2201/00Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/02Details casings, cabinets or mounting therein for transducers covered by H04R1/02 but not provided for in any of its subgroups

Abstract

The invention provides an intelligent infant sound box and an interaction method thereof. The intelligent infant sound box comprises a sound box body; a central processing unit, a storage and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body. A voice acquisition module, an infant voiceprint acquisition module, a wake-up module, an output module and an intelligent control module are arranged in the central processing unit; a storage module is arranged in the storage; the output module is connected with the display screen through a circuit, and the intelligent control module is electrically connected with the voice acquisition module, the infant voiceprint acquisition module, the wake-up module, the storage module and the output module. The voice acquisition module is used for acquiring adult voice information; the infant voiceprint acquisition module is used for acquiring infant voice signals; the wake-up module is used for waking up the smart speaker by voice; the output module is used for responding to user instructions, and its output content comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, user instruction response and dynamic addition of infant wake-up words.

Description

Intelligent infant sound box and interaction method thereof
Technical Field
The invention relates to the technical fields of speech recognition and artificial intelligence, and in particular to an intelligent infant sound box and an interaction method thereof.
Background
With the maturing of artificial intelligence and the development of speech recognition technology, smart speakers have begun to enter people's daily lives. A smart speaker not only offers the audio and video playback of traditional audio equipment but also adds intelligence, interaction and control functions. The speakers currently popular on the market provide good interactivity and intelligence, yet give a poor experience to infants who have only recently learned to speak: for example, the wake-up words are too long and the infants' instructions cannot be recognized correctly.
A neural network imitates the thinking function of the human brain: it has strong self-learning and association capabilities and high accuracy, and it requires little manual intervention and little expert knowledge. A typical neural network architecture comprises an input layer, one or more hidden layers, and an output layer. Meta-heuristic algorithms can find a global solution in a multi-dimensional search space and are widely used to train neural network parameters. Nevertheless, neural networks have inherent drawbacks such as easily falling into local optima, limited precision and slow learning. In addition, the processors of existing smart speakers offer only average performance and poor data-processing capability.
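Purely for illustration (this sketch is not part of the patent), the following Python code shows a minimal feed-forward network of the kind described above, with a single tanh hidden layer and all weights held in one flat parameter vector, which is the form a meta-heuristic algorithm needs in order to train the network; the layer sizes and the activation function are assumptions.

```python
import numpy as np

def unpack(theta, n_in, n_hidden, n_out):
    """Split one flat parameter vector into the weights and biases of a
    network with a single hidden layer."""
    i = n_in * n_hidden
    W1 = theta[:i].reshape(n_in, n_hidden)
    b1 = theta[i:i + n_hidden]
    j = i + n_hidden
    W2 = theta[j:j + n_hidden * n_out].reshape(n_hidden, n_out)
    b2 = theta[j + n_hidden * n_out:]
    return W1, b1, W2, b2

def forward(theta, X, n_in, n_hidden, n_out):
    """Input layer -> tanh hidden layer -> linear output layer."""
    W1, b1, W2, b2 = unpack(theta, n_in, n_hidden, n_out)
    return np.tanh(X @ W1 + b1) @ W2 + b2

def n_params(n_in, n_hidden, n_out):
    """Total number of trainable parameters, i.e. the dimension of the
    search space a meta-heuristic has to explore."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out
```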
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides an intelligent infant sound box and an interaction method thereof, in which an improved algorithm optimizes the neural network parameters so that the sound box can intelligently distinguish whether it has been woken up by an adult or by an infant.
The purpose of the invention is achieved by the following technical scheme: the intelligent infant sound box comprises a sound box body, wherein a central processing unit, a storage and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body; a voice acquisition module, an infant voiceprint acquisition module, an awakening module, an output module and an intelligent control module are arranged in the central processing unit, a storage module is arranged in the storage, the output module is connected with the display screen through a circuit, and the intelligent control module is electrically connected with the voice acquisition module, the infant voiceprint acquisition module, the awakening module, the storage module and the output module; the voice acquisition module is used for acquiring adult voice information and comprises a plurality of single voice acquisition modules; the infant voiceprint acquisition module is used for acquiring infant voice signals; the awakening module is used for awakening the intelligent sound box through voice, and comprises an adult awakening module and an infant awakening module; the storage module is used for storing adult voice recognition information, awakening words, common infant instructions, infant browsing history and cache data; the output module is used for responding to user instructions, and its output content comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, responding to user instructions and dynamically adding infant awakening words; and the network connector is used for connecting the device to the Internet.
In the above intelligent infant sound box, the plurality of single voice acquisition modules specifically comprise a first adult administrator voice acquisition module, a second adult administrator voice acquisition module, a third adult administrator voice acquisition module, a fourth adult administrator voice acquisition module, a fifth adult administrator voice acquisition module and a sixth adult administrator voice acquisition module.
The voice acquisition module can therefore collect the voice information of up to six adults in total (for example, parents and grandparents); after recognition training by the intelligent control module, these six adults can control the infant's permission to operate the smart speaker.
The interaction method of the intelligent infant sound box comprises the following steps:
A. the method for recognizing adult speech comprises the following steps:
1) inputting adult sample voice;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting adult training voice;
5) extracting MFCC characteristic parameters;
6) carrying out neural network speech recognition training using the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters to the trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting a test result by a neural network;
B. the method for recognizing the infant voice comprises the following steps:
1) inputting a sample voice of the infant;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting a training voice of the infant;
5) extracting MFCC characteristic parameters;
6) carrying out neural network speech recognition training using the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters to the trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting the test results by the neural network.
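As a hedged illustration of steps 1)-5) and sub-steps a-b above, the sketch below extracts MFCC feature parameters and applies min-max normalization; librosa is an assumed third-party library, and the sampling rate, coefficient count and frame averaging are illustrative choices not specified in the patent.

```python
import numpy as np
import librosa  # assumed audio library; the patent does not name a toolkit

def extract_mfcc(wav_path, n_mfcc=13, sr=16000):
    """Load one recording and reduce it to a fixed-length MFCC feature vector."""
    signal, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                     # average over frames

def minmax_normalize(train, test):
    """Scale training and test feature matrices to [0, 1] using the
    training-set range only (sub-step b)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (train - lo) / span, (test - lo) / span
```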
In the interaction method of the intelligent infant sound box, the compact grey wolf algorithm comprises the following steps:
1) initializing the relevant parameters, for example the maximum number of iterations Max_iter = 500, the upper position limit ub = 1 and the lower position limit lb = 0, and randomly generating a grey wolf Position; mu and sicma are initialized as shown in formulas (1) and (2):
mu=zeros(3,dim); (1)
sicma=10*ones(3,dim); (2)
mu and sicma represent the mean and variance of the Gaussian distribution; dim is the dimension of the search space, namely the number of neural network parameters to be optimized;
2) initializing the alpha, beta and delta wolf positions according to formulas (3) to (5):
Alpha_pos=ub*generateIndividualR(mu(1),sigma2(1)); (3)
Beta_pos=ub*generateIndividualR(mu(2),sigma2(2)); (4)
Delta_pos=ub*generateIndividualR(mu(3),sigma2(3)); (5)
the generateIndividualR function generates a grey wolf position from the mean and variance of the Gaussian distribution;
3) the generateIndividualR(mu, sigma) function is computed according to formulas (6) to (9):
r=rand(); (6)
erfA=erf((mu+1)/(sqrt(2)*sigma)); (7)
erfB=erf((mu-1)/(sqrt(2)*sigma)); (8)
samplerand=erfinv(-erfA-r*erfB+r*erfA)*sigma*sqrt(2)+mu; (9)
rand() generates a random number in [0, 1]; erf() is the error function, i.e. the integral of the Gaussian probability density function; sqrt() is the square-root function; erfinv() is the inverse error function; samplerand is the value returned by the function;
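Formulas (6)-(9) can be transcribed almost directly into Python; the sketch below uses scipy's erf and erfinv, returns one coordinate sampled in [-1, 1], and is provided for illustration only.

```python
import numpy as np
from scipy.special import erf, erfinv

def generate_individual_r(mu, sigma):
    """Sample one coordinate of a wolf position from the truncated Gaussian
    described by (mu, sigma), following formulas (6)-(9)."""
    r = np.random.rand()                                 # (6): uniform in [0, 1)
    erf_a = erf((mu + 1.0) / (np.sqrt(2.0) * sigma))     # (7)
    erf_b = erf((mu - 1.0) / (np.sqrt(2.0) * sigma))     # (8)
    # (9): invert the error function to map r back into [-1, 1]
    return erfinv(-erf_a - r * erf_b + r * erf_a) * sigma * np.sqrt(2.0) + mu
```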
4) calling the objective function given by formula (10) to obtain the objective function values of the alpha, beta and delta wolves, denoted Alpha_score, Beta_score and Delta_score respectively;
f=(1/n)*sum((y-y')^2); (10)
n is the number of the neural network training samples, y is a training sample label, and y' represents a sample prediction result;
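Formula (10), reconstructed above as the mean squared error over the n training samples (an interpretation consistent with the variables described, not an explicit statement of the patent), can be sketched as:

```python
import numpy as np

def objective(y_true, y_pred):
    """f = (1/n) * sum((y - y')^2); lower values mean a better wolf."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)
```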
5) calculating the position to which the grey wolf moves next: traverse each dimension of the wolf in a loop and update according to formulas (11) to (15):
a=2-l*(2/Max_iter); (11)
X1=Alpha_pos(j)-(2*a*rand()-a)*abs(2*rand()*Alpha_pos(j)-Position(j)); (12)
X2=Beta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Beta_pos(j)-Position(j)); (13)
X3=Delta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Delta_pos(j)-Position(j)); (14)
Position(j)=(X1+X2+X3)/3; (15)
l is the current iteration number and j denotes the jth dimension of the wolf; a controls the balance between the global and local search capabilities of the algorithm; X1, X2 and X3 are the attractions of the alpha, beta and delta wolves toward the grey wolf, respectively; abs() is the absolute-value function;
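Formulas (11)-(15) translate line for line into the following illustrative sketch, which moves the candidate wolf toward the three leaders:

```python
import numpy as np

def update_position(position, alpha_pos, beta_pos, delta_pos, l, max_iter):
    """Apply formulas (11)-(15) to every dimension of the candidate wolf."""
    a = 2.0 - l * (2.0 / max_iter)                # (11): decreases linearly to 0
    new_pos = np.empty_like(position, dtype=float)
    for j in range(position.size):
        x1 = alpha_pos[j] - (2 * a * np.random.rand() - a) * abs(
            2 * np.random.rand() * alpha_pos[j] - position[j])        # (12)
        x2 = beta_pos[j] - (2 * a * np.random.rand() - a) * abs(
            2 * np.random.rand() * beta_pos[j] - position[j])         # (13)
        x3 = delta_pos[j] - (2 * a * np.random.rand() - a) * abs(
            2 * np.random.rand() * delta_pos[j] - position[j])        # (14)
        new_pos[j] = (x1 + x2 + x3) / 3.0                              # (15)
    return new_pos
```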
6) comparing the updated grey wolf position with the alpha wolf, winner1 being the wolf with the best objective function value, loser1 being the wolf with the worst objective function value;
7) updating mu(1) and sicma(1): traverse each dimension of the wolf and update according to formulas (16) to (21):
winner1(j)=(winner1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (16)
loser1(j)=(loser1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (17)
mut=mu(1,j); (18)
mu(1,j)=mu(1,j)+(1/200)*(winner1(j)-loser1(j)); (19)
t=sicma(1,j)^2+mut^2-mu(1,j)^2+(1/200)*(winner1(j)^2-loser1(j)^2); (20)
sicma(1,j)=sqrt(t); (21)
8) comparing the updated grey wolf position with the beta wolf, winner2 being the wolf with the best objective function value, loser2 being the wolf with the worst objective function value;
9) updating mu(2) and sicma(2): traverse each dimension of the wolf and update according to formulas (22) to (27):
winner2(j)=(winner2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (22)
loser2(j)=(loser2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (23)
mut=mu(2,j); (24)
mu(2,j)=mu(2,j)+(1/200)*(winner2(j)-loser2(j)); (25)
t=sicma(2,j)^2+mut^2-mu(2,j)^2+(1/200)*(winner2(j)^2-loser2(j)^2); (26)
sicma(2,j)=sqrt(t); (27)
10) comparing the updated grey wolf position with the gamma wolf, winner3 being the wolf with the best objective function value, loser3 being the wolf with the worst objective function value;
11) updating mu(3) and sicma(3): traverse each dimension of the wolf and update according to formulas (28) to (33):
winner3(j)=(winner3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (28)
loser3(j)=(loser3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (29)
mut=mu(3,j); (30)
mu(3,j)=mu(3,j)+(1/200)*(winner3(j)-loser3(j)); (31)
t=sicma(3,j)^2+mut^2-mu(3,j)^2+(1/200)*(winner3(j)^2-loser3(j)^2); (32)
sicma(3,j)=sqrt(t); (33)
12) the loop ends, and the optimal values of winner1, winner2 and winner3 are output.
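For illustration, steps 7), 9) and 11) all apply the same winner/loser update to one row of (mu, sicma); the sketch below follows formulas (16)-(21), with the 1/200 factor read as a virtual population size and with a small numerical floor added before the square root as a safeguard that is not part of the patent's formulas.

```python
import numpy as np

N_P = 200  # virtual population size implied by the 1/200 factor in (19)-(20)

def compact_update(mu_row, sigma_row, winner, loser, ub, lb):
    """Refresh one (mu, sicma) row from a winner/loser pair, following
    formulas (16)-(21)."""
    for j in range(mu_row.size):
        # (16)-(17): rescale winner and loser from [lb, ub] to [-1, 1]
        w = (winner[j] - (ub[j] + lb[j]) / 2.0) / ((ub[j] - lb[j]) / 2.0)
        s = (loser[j] - (ub[j] + lb[j]) / 2.0) / ((ub[j] - lb[j]) / 2.0)
        mut = mu_row[j]                                    # (18): keep the old mean
        mu_row[j] = mu_row[j] + (w - s) / N_P              # (19)
        t = sigma_row[j]**2 + mut**2 - mu_row[j]**2 + (w**2 - s**2) / N_P  # (20)
        sigma_row[j] = np.sqrt(max(t, 1e-12))              # (21), as reconstructed above
    return mu_row, sigma_row
```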
Compared with the prior art, the intelligent infant sound box and the interaction method thereof have the following advantages:
the method can dynamically add awakening words, efficiently identify infant voice instructions, intelligently control the authority of infants to access the intelligent sound box, construct an efficient neural network voice training model, optimize neural network parameters in an embedded CPU with limited operation capability by the improved compact wolf algorithm, avoid the problem that the neural network is trapped in a local trap, effectively improve the prediction accuracy and accelerate the prediction process.
Drawings
FIG. 1 is a system diagram of the present invention;
FIG. 2 is a block diagram of an adult speech recognition process of the present invention;
FIG. 3 is a block diagram of a baby speech recognition process according to the present invention;
FIG. 4 is a flow chart of neural network speech recognition training of the present invention;
FIG. 5 is a diagram of a neural network architecture of the present invention;
FIG. 6 is a flow chart of the improved compact wolf algorithm of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings:
as shown in fig. 1, the infant intelligent sound box comprises a sound box body, wherein a central processing unit, a memory and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body, and is characterized in that a voice acquisition module, an infant voiceprint acquisition module, a wake-up module, an output module and an intelligent control module are arranged in the central processing unit, a storage module is arranged in the memory, the output module is connected with the display screen through a circuit, and the intelligent control module is electrically connected with the voice acquisition module, the infant voiceprint acquisition module, the wake-up module, the storage module and the output module; the voice acquisition module is used for acquiring adult voice information and comprises a plurality of single voice acquisition modules; the infant voiceprint acquisition module is used for acquiring infant voice signals; the awakening module is used for awakening the intelligent sound box through voice and comprises an adult awakening module and an infant awakening module; the storage module is used for storing adult voice recognition information, awakening words, infant common instructions, infant historical browsing information and cache data; the output module is used for responding to a user instruction, and the output content of the output module comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, user instruction response and dynamic addition of infant awakening words; the network connector is used for connecting the intelligent equipment with the Internet.
In the above intelligent infant sound box, the plurality of single voice acquisition modules specifically comprise a first adult administrator voice acquisition module, a second adult administrator voice acquisition module, a third adult administrator voice acquisition module, a fourth adult administrator voice acquisition module, a fifth adult administrator voice acquisition module and a sixth adult administrator voice acquisition module.
The voice acquisition module can therefore collect the voice information of up to six adults in total (for example, parents and grandparents); after recognition training by the intelligent control module, these six adults can control the infant's permission to operate the smart speaker.
The interaction method of the intelligent infant sound box comprises the following steps:
A. As shown in FIG. 2, the method for adult speech recognition comprises the following steps:
1) inputting adult sample voice;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting adult training voice;
5) extracting MFCC characteristic parameters;
6) as shown in FIG. 4, carrying out neural network speech recognition training using the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network (as shown in FIG. 5);
d. calling the compact grey wolf algorithm (as shown in FIG. 6);
e. setting the neural network parameters to the trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting a test result by a neural network;
B. As shown in FIG. 3, the method for infant speech recognition comprises the following steps:
1) inputting a sample voice of the infant;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting a training voice of the infant;
5) extracting MFCC characteristic parameters;
6) as shown in FIG. 4, carrying out neural network speech recognition training using the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network (as shown in FIG. 5);
d. calling the compact grey wolf algorithm (as shown in FIG. 6);
e. setting the neural network parameters to the trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting the test results by the neural network.
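Purely as an illustration of how sub-steps a-g fit together, the following sketch normalizes the feature data, hands a flat weight vector to an optimizer (standing in for the compact grey wolf algorithm of FIG. 6; any callable minimizing f(theta) over [lb, ub]^dim can be substituted), fixes the network parameters to the returned optimum and predicts the test outputs; the function names and the single-hidden-layer shape are assumptions, not taken from the patent.

```python
import numpy as np

def train_and_test(X_tr, y_tr, X_te, n_hidden, optimizer):
    # b. normalize training and test data with the training-set range
    lo, hi = X_tr.min(axis=0), X_tr.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    X_tr, X_te = (X_tr - lo) / span, (X_te - lo) / span

    # c. network shape determines the dimension of the search space
    n_in, n_out = X_tr.shape[1], 1
    dim = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

    def forward(theta, X):
        i = n_in * n_hidden
        W1 = theta[:i].reshape(n_in, n_hidden)
        b1 = theta[i:i + n_hidden]
        W2 = theta[i + n_hidden:i + n_hidden + n_hidden * n_out].reshape(n_hidden, n_out)
        b2 = theta[i + n_hidden + n_hidden * n_out:]
        return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

    # d.-e. run the optimizer and keep the best weight vector it returns
    def loss(theta):
        return np.mean((np.asarray(y_tr, float) - forward(theta, X_tr)) ** 2)
    theta_best = optimizer(loss, dim, lb=0.0, ub=1.0)

    # f.-g. the trained network predicts and outputs the test results
    return forward(theta_best, X_te)
```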
As shown in FIG. 6, in the interaction method of the intelligent infant sound box, the compact grey wolf algorithm comprises the following steps:
1) initializing the relevant parameters, for example the maximum number of iterations Max_iter = 500, the upper position limit ub = 1 and the lower position limit lb = 0, and randomly generating a grey wolf Position; mu and sicma are initialized as shown in formulas (1) and (2):
mu=zeros(3,dim); (1)
sicma=10*ones(3,dim); (2)
mu and sicma represent the mean and variance of the Gaussian distribution; dim is the dimension of the search space, namely the number of neural network parameters to be optimized;
2) initializing the alpha, beta and delta wolf positions according to formulas (3) to (5):
Alpha_pos=ub*generateIndividualR(mu(1),sigma2(1)); (3)
Beta_pos=ub*generateIndividualR(mu(2),sigma2(2)); (4)
Delta_pos=ub*generateIndividualR(mu(3),sigma2(3)); (5)
the generateIndividualR function generates a grey wolf position from the mean and variance of the Gaussian distribution;
3) the generateIndividualR(mu, sigma) function is computed according to formulas (6) to (9):
r=rand(); (6)
erfA=erf((mu+1)/(sqrt(2)*sigma)); (7)
erfB=erf((mu-1)/(sqrt(2)*sigma)); (8)
samplerand=erfinv(-erfA-r*erfB+r*erfA)*sigma*sqrt(2)+mu; (9)
rand() generates a random number in [0, 1]; erf() is the error function, i.e. the integral of the Gaussian probability density function; sqrt() is the square-root function; erfinv() is the inverse error function; samplerand is the value returned by the function;
4) calling the objective function given by formula (10) to obtain the objective function values of the alpha, beta and delta wolves, denoted Alpha_score, Beta_score and Delta_score respectively;
f=(1/n)*sum((y-y')^2); (10)
n is the number of the neural network training samples, y is a training sample label, and y' represents a sample prediction result;
5) calculating the position to which the grey wolf moves next: traverse each dimension of the wolf in a loop and update according to formulas (11) to (15):
a=2-l*(2/Max_iter); (11)
X1=Alpha_pos(j)-(2*a*rand()-a)*abs(2*rand()*Alpha_pos(j)-Position(j)); (12)
X2=Beta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Beta_pos(j)-Position(j)); (13)
X3=Delta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Delta_pos(j)-Position(j)); (14)
Position(j)=(X1+X2+X3)/3; (15)
l is the current iteration number and j denotes the jth dimension of the wolf; a controls the balance between the global and local search capabilities of the algorithm; X1, X2 and X3 are the attractions of the alpha, beta and delta wolves toward the grey wolf, respectively; abs() is the absolute-value function;
6) comparing the updated grey wolf position with the alpha wolf, winner1 being the wolf with the best objective function value, loser1 being the wolf with the worst objective function value;
7) updating mu(1) and sicma(1): traverse each dimension of the wolf and update according to formulas (16) to (21):
winner1(j)=(winner1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (16)
loser1(j)=(loser1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (17)
mut=mu(1,j); (18)
mu(1,j)=mu(1,j)+(1/200)*(winner1(j)-loser1(j)); (19)
t=sicma(1,j)^2+mut^2-mu(1,j)^2+(1/200)*(winner1(j)^2-loser1(j)^2); (20)
sicma(1,j)=sqrt(t); (21)
8) comparing the updated grey wolf position with the beta wolf, winner2 being the wolf with the best objective function value, loser2 being the wolf with the worst objective function value;
9) updating mu(2) and sicma(2): traverse each dimension of the wolf and update according to formulas (22) to (27):
winner2(j)=(winner2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (22)
loser2(j)=(loser2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (23)
mut=mu(2,j); (24)
mu(2,j)=mu(2,j)+(1/200)*(winner2(j)-loser2(j)); (25)
t=sicma(2,j)^2+mut^2-mu(2,j)^2+(1/200)*(winner2(j)^2-loser2(j)^2); (26)
sicma(2,j)=sqrt(t); (27)
10) comparing the updated grey wolf position with the gamma wolf, winner3 being the wolf with the best objective function value, loser3 being the wolf with the worst objective function value;
11) updating mu(3) and sicma(3): traverse each dimension of the wolf and update according to formulas (28) to (33):
winner3(j)=(winner3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (28)
loser3(j)=(loser3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (29)
mut=mu(3,j); (30)
mu(3,j)=mu(3,j)+(1/200)*(winner3(j)-loser3(j)); (31)
t=sicma(3,j)^2+mut^2-mu(3,j)^2+(1/200)*(winner3(j)^2-loser3(j)^2); (32)
sicma(3,j)=sqrt(t); (33)
12) the loop ends, and the optimal values of winner1, winner2 and winner3 are output.
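Tying steps 1)-12) together, a main loop in the spirit of FIG. 6 could look as follows; it reuses the generate_individual_r, update_position and compact_update sketches given earlier, and the choice to replace a leader whenever the candidate wolf scores better is an interpretation of steps 6)-11) rather than an explicit statement of the patent.

```python
import numpy as np

def compact_gwo(loss, dim, max_iter=500, lb=0.0, ub=1.0):
    """Compact grey wolf optimizer sketch: one candidate wolf, three leaders,
    and a 3 x dim probability vector (mu, sicma) updated from winner/loser pairs."""
    ub_v, lb_v = np.full(dim, ub), np.full(dim, lb)
    mu = np.zeros((3, dim))                          # (1)
    sigma = 10.0 * np.ones((3, dim))                 # (2)
    position = lb_v + np.random.rand(dim) * (ub_v - lb_v)   # random grey wolf

    # (3)-(5): initial alpha, beta, delta positions sampled from the probability vector
    leaders = [ub_v * np.array([generate_individual_r(mu[k, j], sigma[k, j])
                                for j in range(dim)]) for k in range(3)]
    scores = [loss(p) for p in leaders]              # Alpha_score, Beta_score, Delta_score

    for l in range(max_iter):
        # (11)-(15): move the candidate wolf toward the three leaders
        position = update_position(position, leaders[0], leaders[1], leaders[2], l, max_iter)
        pos_score = loss(position)

        # steps 6)-11): winner/loser comparison against each leader, then PV update
        for k in range(3):
            if pos_score < scores[k]:
                winner, loser = position, leaders[k]
                leaders[k], scores[k] = position.copy(), pos_score
            else:
                winner, loser = leaders[k], position
            mu[k], sigma[k] = compact_update(mu[k], sigma[k], winner, loser, ub_v, lb_v)

    # step 12): output the best of the winners found
    return leaders[int(np.argmin(scores))]
```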
Compared with the prior art, the intelligent infant sound box and the interaction method thereof have the following advantages:
the method can dynamically add awakening words, efficiently identify infant voice instructions, intelligently control the authority of infants to access the intelligent sound box, construct an efficient neural network voice training model, optimize neural network parameters in an embedded CPU with limited operation capability by the improved compact wolf algorithm, avoid the problem that the neural network is trapped in a local trap, effectively improve the prediction accuracy and accelerate the prediction process.
It should be understood that the above description is not intended to limit the present invention, which is not limited to the above examples; those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.

Claims (4)

1. An intelligent infant sound box, comprising a sound box body, wherein a central processing unit, a storage and a network connector are arranged in the sound box body, and a display screen is arranged on the surface of the sound box body, characterized in that a voice acquisition module, an infant voiceprint acquisition module, an awakening module, an output module and an intelligent control module are arranged in the central processing unit, a storage module is arranged in the storage, the output module is connected with the display screen through a circuit, and the intelligent control module is electrically connected with the voice acquisition module, the infant voiceprint acquisition module, the awakening module, the storage module and the output module; the voice acquisition module is used for acquiring adult voice information and comprises a plurality of single voice acquisition modules; the infant voiceprint acquisition module is used for acquiring infant voice signals; the awakening module is used for awakening the intelligent sound box through voice, and comprises an adult awakening module and an infant awakening module; the storage module is used for storing adult voice recognition information, awakening words, infant common instructions, infant historical browsing information and cache data; the output module is used for responding to a user instruction, and the output content of the output module comprises sound and video; the intelligent control module is used for adult voice recognition, infant voice recognition, user instruction response and dynamic addition of infant awakening words; and the network connector is used for connecting the intelligent device with the Internet.
2. The intelligent infant sound box as claimed in claim 1, wherein the plurality of single voice acquisition modules specifically comprise a first adult administrator voice acquisition module, a second adult administrator voice acquisition module, a third adult administrator voice acquisition module, a fourth adult administrator voice acquisition module, a fifth adult administrator voice acquisition module and a sixth adult administrator voice acquisition module.
3. An interaction method of the intelligent infant sound box as claimed in claim 1, comprising the following steps:
A. the method for recognizing adult speech comprises the following steps:
1) inputting adult sample voice;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting adult training voice;
5) extracting MFCC characteristic parameters;
6) carrying out neural network speech recognition training using the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters to the trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting a test result by a neural network;
B. the method for recognizing the infant voice comprises the following steps:
1) inputting a sample voice of the infant;
2) extracting MFCC characteristic parameters;
3) constructing a neural network model;
4) inputting a training voice of the infant;
5) extracting MFCC characteristic parameters;
6) carrying out neural network speech recognition training using the neural network model constructed in step 3), wherein the training method comprises the following steps:
a. inputting speech characteristic parameter training and testing data;
b. normalizing the training data and the test data;
c. constructing a neural network;
d. calling the compact grey wolf algorithm;
e. setting the neural network parameters to the trained parameters;
f. constructing a neural network through the normalized training data;
g. predicting and outputting the test results by the neural network.
4. The interaction method of the intelligent infant sound box as claimed in claim 3, wherein the compact grey wolf algorithm comprises the following steps:
1) initializing the relevant parameters, for example the maximum number of iterations Max_iter = 500, the upper position limit ub = 1 and the lower position limit lb = 0, and randomly generating a grey wolf Position; mu and sicma are initialized as shown in formulas (1) and (2):
mu=zeros(3,dim); (1)
sicma=10*ones(3,dim); (2)
mu and sicma represent the mean and variance of the Gaussian distribution; dim is the dimension of the search space, namely the number of neural network parameters to be optimized;
2) initializing the alpha, beta and delta wolf positions according to formulas (3) to (5):
Alpha_pos=ub*generateIndividualR(mu(1),sigma2(1)); (3)
Beta_pos=ub*generateIndividualR(mu(2),sigma2(2)); (4)
Delta_pos=ub*generateIndividualR(mu(3),sigma2(3)); (5)
the generateIndividualR function generates a grey wolf position from the mean and variance of the Gaussian distribution;
3) the generateIndividualR(mu, sigma) function is computed according to formulas (6) to (9):
r=rand(); (6)
erfA=erf((mu+1)/(sqrt(2)*sigma)); (7)
erfB=erf((mu-1)/(sqrt(2)*sigma)); (8)
samplerand=erfinv(-erfA-r*erfB+r*erfA)*sigma*sqrt(2)+mu; (9)
rand() generates a random number in [0, 1]; erf() is the error function, i.e. the integral of the Gaussian probability density function; sqrt() is the square-root function; erfinv() is the inverse error function; samplerand is the value returned by the function;
4) calling the objective function given by formula (10) to obtain the objective function values of the alpha, beta and delta wolves, denoted Alpha_score, Beta_score and Delta_score respectively;
f=(1/n)*sum((y-y')^2); (10)
n is the number of the neural network training samples, y is a training sample label, and y' represents a sample prediction result;
5) calculating the position to which the grey wolf moves next: traverse each dimension of the wolf in a loop and update according to formulas (11) to (15):
a=2-l*(2/Max_iter); (11)
X1=Alpha_pos(j)-(2*a*rand()-a)*abs(2*rand()*Alpha_pos(j)-Position(j)); (12)
X2=Beta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Beta_pos(j)-Position(j)); (13)
X3=Delta_pos(j)-(2*a*rand()-a)*abs(2*rand()*Delta_pos(j)-Position(j)); (14)
Position(j)=(X1+X2+X3)/3; (15)
l is the current iteration number and j denotes the jth dimension of the wolf; a controls the balance between the global and local search capabilities of the algorithm; X1, X2 and X3 are the attractions of the alpha, beta and delta wolves toward the grey wolf, respectively; abs() is the absolute-value function;
6) comparing the updated grey wolf position with the alpha wolf, winner1 being the wolf with the best objective function value, loser1 being the wolf with the worst objective function value;
7) updating mu(1) and sicma(1): traverse each dimension of the wolf and update according to formulas (16) to (21):
winner1(j)=(winner1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (16)
loser1(j)=(loser1(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (17)
mut=mu(1,j); (18)
mu(1,j)=mu(1,j)+(1/200)*(winner1(j)-loser1(j)); (19)
t=sicma(1,j)^2+mut^2-mu(1,j)^2+(1/200)*(winner1(j)^2-loser1(j)^2); (20)
sicma(1,j)=sqrt(t); (21)
8) comparing the updated grey wolf position with the beta wolf, winner2 being the wolf with the best objective function value, loser2 being the wolf with the worst objective function value;
9) updating mu(2) and sicma(2): traverse each dimension of the wolf and update according to formulas (22) to (27):
winner2(j)=(winner2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (22)
loser2(j)=(loser2(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (23)
mut=mu(2,j); (24)
mu(2,j)=mu(2,j)+(1/200)*(winner2(j)-loser2(j)); (25)
t=sicma(2,j)^2+mut^2-mu(2,j)^2+(1/200)*(winner2(j)^2-loser2(j)^2); (26)
sicma(2,j)=sqrt(t); (27)
10) comparing the updated grey wolf position with the gamma wolf, winner3 being the wolf with the best objective function value, loser3 being the wolf with the worst objective function value;
11) updating mu(3) and sicma(3): traverse each dimension of the wolf and update according to formulas (28) to (33):
winner3(j)=(winner3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (28)
loser3(j)=(loser3(j)-(ub(j)+lb(j))/2)/((ub(j)-lb(j))/2); (29)
mut=mu(3,j); (30)
mu(3,j)=mu(3,j)+(1/200)*(winner3(j)-loser3(j)); (31)
t=sicma(3,j)^2+mut^2-mu(3,j)^2+(1/200)*(winner3(j)^2-loser3(j)^2); (32)
sicma(3,j)=sqrt(t); (33)
12) the loop ends, and the optimal values of winner1, winner2 and winner3 are output.
CN202011336049.6A 2020-11-25 2020-11-25 Intelligent infant sound box and interaction method thereof Active CN112543390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011336049.6A CN112543390B (en) 2020-11-25 2020-11-25 Intelligent infant sound box and interaction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011336049.6A CN112543390B (en) 2020-11-25 2020-11-25 Intelligent infant sound box and interaction method thereof

Publications (2)

Publication Number Publication Date
CN112543390A true CN112543390A (en) 2021-03-23
CN112543390B CN112543390B (en) 2023-03-24

Family

ID=75015144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011336049.6A Active CN112543390B (en) 2020-11-25 2020-11-25 Intelligent infant sound box and interaction method thereof

Country Status (1)

Country Link
CN (1) CN112543390B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180277117A1 (en) * 2017-03-23 2018-09-27 Alex Lauren HERGENROEDER Method and Apparatus for Speech Interaction with Children
WO2019160396A2 (en) * 2019-04-11 2019-08-22 엘지전자 주식회사 Guide robot and operation method for guide robot
CN110534099A (en) * 2019-09-03 2019-12-03 腾讯科技(深圳)有限公司 Voice wakes up processing method, device, storage medium and electronic equipment
CN110696002A (en) * 2019-08-31 2020-01-17 左建 Intelligent early education robot
CN211063690U (en) * 2019-12-25 2020-07-21 安徽淘云科技有限公司 Drawing book recognition equipment
CN111638787A (en) * 2020-05-29 2020-09-08 百度在线网络技术(北京)有限公司 Method and device for displaying information
CN111816188A (en) * 2020-06-23 2020-10-23 漳州龙文维克信息技术有限公司 Man-machine voice interaction method for intelligent robot

Also Published As

Publication number Publication date
CN112543390B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant