CN113269305A - Feedback voice strengthening method for strengthening memory - Google Patents

Feedback voice strengthening method for strengthening memory

Info

Publication number
CN113269305A
CN113269305A
Authority
CN
China
Prior art keywords
voice
voice signal
frequency
module
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110551052.8A
Other languages
Chinese (zh)
Other versions
CN113269305B (en)
Inventor
胡文莉
杨向格
尚季玲
梁超慧
尚宇
许卫红
刘博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Railway Vocational and Technical College
Original Assignee
Zhengzhou Railway Vocational and Technical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Railway Vocational and Technical College
Priority to CN202110551052.8A
Publication of CN113269305A
Application granted
Publication of CN113269305B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00: Computing arrangements based on biological models
            • G06N3/02: Neural networks
              • G06N3/04: Architecture, e.g. interconnection topology
              • G06N3/08: Learning methods
      • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
        • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
          • G09B5/00: Electrically-operated educational appliances
            • G09B5/04: Electrically-operated educational appliances with audible presentation of the material to be studied
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L13/00: Speech synthesis; Text to speech systems
            • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
          • G10L15/00: Speech recognition
            • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
            • G10L15/28: Constructional details of speech recognition systems
          • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
              • G10L21/0208: Noise filtering

Abstract

The invention discloses a feedback speech enhancement method for strengthening memory, which comprises the following steps: S1, the learner inputs the words to be memorized through the voice input module; S2, the voice signal data enter an analysis processing module for voice signal preprocessing, voice signal feature extraction, and voice signal enhancement in sequence; S3, after step S2, speech synthesis is performed on the voice signal data: an expected amplitude is constructed in the time domain, an expected phase spectrum is constructed in the frequency domain, and the time-frequency-domain waveform of the voice signal is obtained from the amplitude and the phase spectrum; S4, keywords are extracted from the synthesized voice signal and key information is retrieved from a knowledge base; S5, the English result is output and fed back to the learner through the output module. The control module directs the input module to capture speech; the voice signal is then enhanced and synthesized, which improves the intelligibility of the speech and helps strengthen the memorization effect.

Description

Feedback voice strengthening method for strengthening memory
Technical Field
The invention belongs to the field of memory-aid devices, and particularly relates to a feedback speech enhancement method for strengthening memory.
Background
English is a language course, so learning it requires memorizing a large number of words and grammar rules. At present, however, there is hardly any device that strengthens memory through feedback of the student's own speech during English learning; learners must therefore rely on rote memorization alone, learning efficiency is low, and the memorization effect is limited.
Existing English learning devices generally play back pre-recorded words and grammar, or words and grammar downloaded from the Internet, repeatedly, and learners study by listening and repeating.
Chinese patent application No. 201810440869.6 discloses an interactive English learning system comprising a plurality of mobile terminals, a server, and a data storage device. A communication module is arranged in each mobile terminal, through which the terminal is electrically connected to the server; the server is electrically connected to the data storage device. Each mobile terminal contains a touch display screen unit and a processor; the touch display screen unit consists of an audio module, a display screen, a handwriting module, a tool module, a database module, and a synchronization device, each of which is electrically connected to the processor. This scheme supports interactive learning through the handwriting module and does help strengthen memory, but it cannot obtain effective feedback information or deepen auditory memory in time, so its memorization effect leaves room for improvement.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a feedback speech enhancement method for strengthening memory. A control module directs an input module to capture speech; the voice signal is then enhanced and synthesized, which improves the intelligibility of the speech and makes it easier to recognize; the quality of the voice signal is enhanced, its intelligibility is improved, and the voice-feedback memory effect is strengthened. Pruning the voice signal data improves its accuracy and precision, which in turn improves the accuracy of subsequent translation and speech output and strengthens the memorization effect.
The invention provides the following technical scheme:
A feedback speech enhancement method for strengthening memory comprises the following steps: S1, the learner inputs the words to be memorized through the voice input module, and the voice recognition module automatically calls the Baidu speech recognition API to recognize the input speech and convert it into voice signal data;
S2, the voice signal data enter an analysis processing module for voice signal preprocessing, voice signal feature extraction, and voice signal enhancement in sequence, yielding time-domain and frequency-domain plots and amplitude-phase features;
S3, after step S2, speech synthesis is performed on the voice signal data: an expected amplitude is constructed in the time domain, an expected phase spectrum is constructed in the frequency domain, and the time-frequency-domain waveform of the voice signal is obtained from the amplitude and the phase spectrum;
S4, keywords are extracted from the synthesized voice signal, key information is retrieved from a knowledge base, and the extracted key information is then translated by the translation module;
S5, the retrieved keywords and the corresponding text are converted by calling the Baidu AI platform API, and the resulting English is finally output and fed back to the learner through the output module.
Preferably, in voice signal feature extraction, the amplitude and energy count of the speech-segment signal data are extracted, and the neural-network speech enhancement is configured according to these values to reduce noise interference. The neural network takes the preprocessed voice signal as input and the clean voice signal as output; the input layer and the output layer each have 90 neurons; 80,000 feature vectors are then drawn from the feature-extracted speech data as input; and three hidden layers of 500 neurons each are used.
Preferably, the feedback speech enhancement method for strengthening memory uses a memory system comprising a control module, a voice input module, and an output module. After speech is input, it is recognized; the recognized voice signal is then analyzed and processed, which improves the intelligibility of the speech and makes it easier to recognize;
after the voice signal is analyzed and processed, it is synthesized by the speech synthesis module, which reduces the influence of time variation on the signal and increases the clarity of the auditory feedback;
the speech synthesis module is connected to an extraction module that retrieves the keywords; after retrieval, the connected translation module translates the result into English, which is simultaneously output and fed back to the learner. The voice input module, analysis processing module, synthesis module, extraction module, translation module, and output module are connected in sequence and communicate over signal lines; all are also connected to the control module, with which they exchange communication and control signals.
Preferably, speech synthesis comprises the following steps: first, the voice signal data are processed by the analysis processing module and the time-domain, frequency-domain, and time-frequency-domain features of the signal are extracted; then an expected amplitude is constructed in the time domain, an expected phase spectrum is constructed in the frequency domain, and the time-frequency-domain waveform of the voice signal is obtained from the amplitude and the phase spectrum.
Preferably, the analysis processing of the voice signal comprises voice signal preprocessing, voice signal feature extraction, and voice signal enhancement.
Preferably, voice signal preprocessing comprises inputting the original speech, boosting its high-frequency part, and then resampling the signal at 16 kHz; the resampled data are framed and windowed, silent segments and speech segments are distinguished by the short-time energy of each frame, and feature extraction is performed on the speech segments.
Preferably, the voice signal enhancement uses a neural network to build an enhancement model; processing the voice signal through this model enhances signal quality, improves intelligibility, and strengthens the voice-feedback memory effect.
Preferably, the feature parameters are extracted as follows:
a. the analysis processing module acquires the speech-segment signal data after voice signal preprocessing;
b. the energy count of the voice signal is accumulated: the portions of the speech-segment amplitudes that exceed the threshold are added to the energy count;
c. the energy count is compared with the stored maximum; if it is larger, the stored maximum is replaced and the new maximum energy count is recorded;
d. whether the speech-segment signal has ended is judged; if speech input has stopped, the acquisition ends, and the recorded amplitude and energy count of the speech-segment data are stored for use in voice signal enhancement.
Preferably, after the voice signal is processed by the neural-network enhancement model, the analysis processing module prunes the signal data iteratively, improving the accuracy and precision of the data and hence the accuracy of subsequent translation and speech output.
Preferably, the control module is an STM32 microcontroller; the voice input module (lb3320), translation module (icm20602), and output module (loudspeaker) communicate in sequence over one-way serial ports, and the STM32 microcontroller communicates with the voice input, translation, and output modules over bidirectional serial ports.
Preferably, in step S2, after the learner inputs Chinese or English speech, the system automatically calls the Baidu speech recognition API; once the input is recognized as a voice signal, the analysis processing module preprocesses it as follows:
A1, pre-emphasis: the high-frequency part of the voice signal is boosted and the signal data are filtered; the filtered response satisfies H(z) = 1 - b·z^(-1), where b is the pre-emphasis coefficient with a value range of 0.89-1 and z is the initial frequency of the speech signal data. The pre-emphasized speech signal satisfies x2(n) = x1(n) - λ·x1(n-1), where x1(n) is the input voice signal data and λ is an adjustment coefficient in the range 0.76-0.97. Pre-emphasis removes lip-vibration noise produced during vocalization, reduces the influence of noise on the speech data, and improves the accuracy of the voice signal data.
A2, resampling: the pre-emphasized data are uniformly resampled at 16 kHz.
A3, framing: the voice signal is cut in short time steps along its time-domain waveform to obtain short segments of signal parameters, which are combined into feature parameters along the time sequence of the whole signal, completing the framing.
A4, windowing: each reassembled speech frame is multiplied by a window function, here the cosine window w(n) = 0.52 - 0.32·cos(2πn/(N-1)), with the window length equal to the frame length; the windowed speech signal satisfies y = x(n)·w(n), where x(n) is a single speech frame, 0 < n < N, and N is the number of sampling points per frame. This reduces the influence of sharply changing signals at the two ends of each frame on the speech signal analysis and eliminates high-frequency interference.
A5, voice signal detection: the short-time energy of each frame serves as the threshold for feature extraction; the short-time energies of every ten frames are averaged, and that mean is used as the energy-amplitude threshold during feature extraction, eliminating the influence of silent segments, removing unreliable speech-segment data, and improving the accuracy of the voice signal.
After preprocessing, features are extracted from the voice signal: the amplitude and energy count of the speech-segment data are extracted, and the neural-network speech enhancement is configured according to these values to reduce noise interference. Specifically, the network takes the preprocessed voice signal as input and the clean voice signal as output; the input layer and the output layer each have 90 neurons; 80,000 feature vectors are then drawn from the feature-extracted speech data as input; and three hidden layers of 500 neurons each are used. With too many hidden layers, training tends to fall into local optima, causing the trained model to overfit the speech data; three hidden layers are therefore chosen, which helps improve speech intelligibility after enhancement.
In step S2, the analysis processing module prunes the voice signal data iteratively as follows. First, a pruning threshold for the signal parameters is set from the absolute values of the amplitudes obtained during feature extraction; when an absolute amplitude is below the threshold it is zeroed, using a set of masking matrices in which the weight of any entry whose absolute value falls below the pruning threshold is set to zero and the weight of entries above it is set to 1. Second, the data produced by neural-network training are pruned a second time in the same way. Finally, the accuracy of the enhanced voice signal data is compared against a decision threshold: when the accuracy is below the threshold, the clean voice data output by the last network pass are emitted; when it is above the threshold, the pruning threshold is updated and pruning is repeated until it is complete. Pruning the neural-network training model enhances the voice signal data, improves their accuracy, and improves the accuracy of the voice feedback.
In addition, the complete data-analysis flow of the analysis processing module is as follows: after the voice signal data are input, low-frequency environmental noise is removed by band-pass filtering; linear cancellation is then applied to the data to remove stimulus-sound interference; a threshold decision follows, in which the data are compared with the threshold and averages are superimposed to remove impulse noise and raise the signal-to-noise ratio; dynamic tracking filtering is then applied and new voice signal data are synthesized; finally, a Fourier transform yields the time-domain and frequency-domain plots and the amplitude-phase features.
In addition, in step S3, during speech synthesis the expected amplitude constructed in the time domain is set as a constant function of frequency, so the amplitude does not change with frequency. The phase spectrum is constructed in the frequency domain via the group delay: if the frequency increases from f1 to f2 as time runs from 0 to T seconds, the instant t at which each frequency of the speech signal occurs satisfies t = T·(f - f1)/(f2 - f1), where T is the duration of the sweep from f1 to f2; the phase spectrum therefore satisfies φ = -π·T·f·(f - f1)/(f2 - f1). After speech synthesis, the data frequency of the voice signal is continuous, the resolution is high, and sound production is efficient; the instantaneous frequency varies continuously from low to high within the band according to a fixed rule, which improves the recognizability of the voice signal, increases the sensitivity of the cochlear nerve center, and, through voice signal synthesis, excites the auditory efferent nervous system, forming feedback-type memory in the brain and improving the learner's memory capacity and memorization effect. After synthesis, keywords are extracted from the voice signal, key information is retrieved from the knowledge base, the extracted key information is translated into English by the translation module, the retrieved keywords and corresponding text are converted by calling the Baidu AI platform API, and the English result is finally output and fed back to the learner through the output module.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the feedback speech enhancement method for strengthening memory, the control module directs the input module to capture speech; the voice signal is then enhanced and synthesized, which improves the intelligibility and recognizability of the speech, enhances the quality of the voice signal, and strengthens the voice-feedback memory effect.
(2) Pruning the voice signal data improves its accuracy and precision, which in turn improves the accuracy of subsequent translation and speech output.
(3) The short-time energy of each frame of the speech segment serves as the feature-extraction threshold; averaging the short-time energy over every ten frames and using that mean as the energy-amplitude threshold eliminates the influence of silent segments, removes unreliable speech-segment data, and improves the accuracy of the voice signal.
(4) Constraining the relation between the pre-emphasized speech signal and the input signal data removes lip-vibration noise produced during vocalization, reduces the influence of noise on the speech data, and improves the accuracy of the voice signal data.
(5) Constructing the phase spectrum in the frequency domain constrains the relation between instantaneous time and frequency, improves the recognizability of the voice signal, increases the sensitivity of the cochlear nerve center, and, through voice signal synthesis, excites the auditory efferent nervous system, helping form feedback-type memory in the brain and improving the learner's memory capacity and memorization effect.
(6) The analysis processing of the speech and the speech synthesis act together to improve the recognizability of the voice signal, the accuracy of the voice signal data, and the accuracy of the voice feedback, further strengthening the feedback-memory effect.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. The following drawings show only some embodiments of the invention and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a block diagram of the system framework of the present invention.
FIG. 2 is a flow chart of speech signal preprocessing of the present invention.
FIG. 3 is a diagram of a neural network topology of the present invention.
FIG. 4 is a flow chart of the iterative pruning of speech signals of the present invention.
FIG. 5 is a data flow diagram of the speech signal data analysis process of the present invention.
FIG. 6 is a flow chart of the method of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clear, the technical solutions of the embodiments are described below completely and in detail with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the invention.
Accordingly, the following detailed description of the embodiments presented in the figures is not intended to limit the claimed scope of the invention, but merely represents selected embodiments of it.
Embodiment 1:
As shown in FIG. 6, a feedback speech enhancement method for strengthening memory comprises the following steps: S1, the learner inputs the words to be memorized through the voice input module, and the voice recognition module automatically calls the Baidu speech recognition API to recognize the input speech and convert it into voice signal data;
S2, the voice signal data enter an analysis processing module for voice signal preprocessing, voice signal feature extraction, and voice signal enhancement in sequence, yielding time-domain and frequency-domain plots and amplitude-phase features;
S3, after step S2, speech synthesis is performed on the voice signal data: an expected amplitude is constructed in the time domain, an expected phase spectrum is constructed in the frequency domain, and the time-frequency-domain waveform of the voice signal is obtained from the amplitude and the phase spectrum;
S4, keywords are extracted from the synthesized voice signal, key information is retrieved from a knowledge base, and the extracted key information is then translated by the translation module;
S5, the retrieved keywords and the corresponding text are converted by calling the Baidu AI platform API, and the resulting English is finally output and fed back to the learner through the output module.
In voice signal feature extraction, the amplitude and energy count of the speech-segment signal data are extracted, and the neural-network speech enhancement is configured according to these values to reduce noise interference. Specifically, the network takes the preprocessed voice signal as input and the clean voice signal as output; the input layer and the output layer each have 90 neurons; 80,000 feature vectors are then drawn from the feature-extracted speech data as input; and three hidden layers of 500 neurons each are used. With too many hidden layers, training tends to fall into local optima, causing the trained model to overfit the speech data; three hidden layers are therefore chosen, which helps improve speech intelligibility after enhancement.
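As a concrete illustration of the architecture just described, the following is a minimal sketch assuming PyTorch. The layer sizes (90 inputs, 90 outputs, three hidden layers of 500 neurons) and the 80,000 training vectors follow the text; the ReLU activations, Adam optimizer, mean-squared-error loss, batch size, and the random tensors standing in for real noisy/clean feature pairs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EnhancementNet(nn.Module):
    def __init__(self, n_features: int = 90, n_hidden: int = 500):
        super().__init__()
        # 90-500-500-500-90 topology from the text; ReLU is an assumption
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_features),  # clean-speech features out
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = EnhancementNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# random tensors stand in for the 80,000 noisy/clean feature-vector pairs
noisy = torch.randn(80_000, 90)   # preprocessed (noisy) feature vectors
clean = torch.randn(80_000, 90)   # clean targets

for start in range(0, len(noisy), 512):
    x, y = noisy[start:start + 512], clean[start:start + 512]
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```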
Embodiment 2:
As shown in FIG. 1, the feedback speech enhancement method for strengthening memory uses a system comprising a control module, a voice input module, and an output module. After speech is input, it is recognized; the recognized voice signal is then analyzed and processed, which improves the intelligibility of the speech and makes it easier to recognize;
after the voice signal is analyzed and processed, it is synthesized by the speech synthesis module, which reduces the influence of time variation on the signal and increases the clarity of the auditory feedback;
the speech synthesis module is connected to an extraction module that retrieves the keywords; after retrieval, the connected translation module translates the result into English, which is simultaneously output and fed back to the learner. The voice input module, analysis processing module, synthesis module, extraction module, translation module, and output module are connected in sequence and communicate over signal lines; all are also connected to the control module, with which they exchange communication and control signals.
The steps of speech synthesis are: first, the voice signal data are processed by the analysis processing module and the time-domain, frequency-domain, and time-frequency-domain features of the signal are extracted; then an expected amplitude is constructed in the time domain, an expected phase spectrum is constructed in the frequency domain, and the time-frequency-domain waveform of the voice signal is obtained from the amplitude and the phase spectrum.
The analysis processing of the voice signal comprises voice signal preprocessing, voice signal feature extraction, and voice signal enhancement.
The control module is an STM32 microcontroller; the voice input module (lb3320), translation module (icm20602), and output module (loudspeaker) communicate in sequence over one-way serial ports, and the STM32 microcontroller communicates with the voice input, translation, and output modules over bidirectional serial ports.
Embodiment 3:
As shown in FIG. 2, building on Embodiment 1, voice signal preprocessing comprises inputting the original speech, boosting its high-frequency part, and then resampling the signal at 16 kHz; the resampled data are framed and windowed, silent segments and speech segments are distinguished by the short-time energy of each frame, and feature extraction is performed on the speech segments.
After the learner inputs Chinese or English speech, the system automatically calls the Baidu speech recognition API; once the input is recognized as a voice signal, the analysis processing module preprocesses it as follows:
A1, pre-emphasis: the high-frequency part of the voice signal is boosted and the signal data are filtered; the filtered response satisfies H(z) = 1 - b·z^(-1), where b is the pre-emphasis coefficient with a value range of 0.89-1 and z is the initial frequency of the speech signal data. The pre-emphasized speech signal satisfies x2(n) = x1(n) - λ·x1(n-1), where x1(n) is the input voice signal data and λ is an adjustment coefficient in the range 0.76-0.97. Pre-emphasis removes lip-vibration noise produced during vocalization, reduces the influence of noise on the speech data, and improves the accuracy of the voice signal data.
A2, resampling: the pre-emphasized data are uniformly resampled at 16 kHz.
A3, framing: the voice signal is cut in short time steps along its time-domain waveform to obtain short segments of signal parameters, which are combined into feature parameters along the time sequence of the whole signal, completing the framing.
A4, windowing: each reassembled speech frame is multiplied by a window function, here the cosine window w(n) = 0.56 - 0.36·cos(2πn/(N-1)), with the window length equal to the frame length; the windowed speech signal satisfies y = x(n)·w(n), where x(n) is a single speech frame, 0 < n < N, and N is the number of sampling points per frame. This reduces the influence of sharply changing signals at the two ends of each frame on the speech signal analysis and eliminates high-frequency interference.
A5, voice signal detection: the short-time energy of each frame serves as the threshold for feature extraction; the short-time energies of every ten frames are averaged, and that mean is used as the energy-amplitude threshold during feature extraction, eliminating the influence of silent segments, removing unreliable speech-segment data, and improving the accuracy of the voice signal.
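To make the A1-A5 chain concrete, here is a minimal sketch assuming NumPy and SciPy. The pre-emphasis form x2(n) = x1(n) - λ·x1(n-1), the uniform 16 kHz resampling, the cosine window, and the ten-frame short-time-energy averaging follow the text; the frame length of 400 samples (25 ms at 16 kHz), the frame shift, and λ = 0.95 are illustrative assumptions, and the input is assumed to be at least one frame long.

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess(x: np.ndarray, fs: int, lam: float = 0.95,
               frame_len: int = 400, frame_shift: int = 160):
    # A1: pre-emphasis x2(n) = x1(n) - lam * x1(n-1), lam in 0.76..0.97
    x = np.append(x[0], x[1:] - lam * x[:-1])
    # A2: uniform resampling to 16 kHz
    x = resample_poly(x, 16000, fs)
    # A3: framing, cutting the waveform into short overlapping segments
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([x[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # A4: cosine window w(n) = 0.56 - 0.36*cos(2*pi*n/(N-1)),
    # with the window length equal to the frame length
    n = np.arange(frame_len)
    frames = frames * (0.56 - 0.36 * np.cos(2 * np.pi * n / (frame_len - 1)))
    # A5: short-time energy; the mean energy over each block of ten
    # frames is the detection threshold for the frames in that block
    energy = np.sum(frames ** 2, axis=1)
    is_speech = np.zeros(len(energy), dtype=bool)
    for start in range(0, len(energy), 10):
        block = energy[start:start + 10]
        is_speech[start:start + 10] = block > block.mean()
    return frames, energy, is_speech
```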
Embodiment 4:
As shown in FIGS. 3-4, the voice signal enhancement uses a neural network to build an enhancement model; processing the voice signal through this model enhances signal quality, improves intelligibility, and strengthens the voice-feedback memory effect.
The feature parameters are extracted as follows (see the sketch after this list):
a. the analysis processing module acquires the speech-segment signal data after voice signal preprocessing;
b. the energy count of the voice signal is accumulated: the portions of the speech-segment amplitudes that exceed the threshold are added to the energy count;
c. the energy count is compared with the stored maximum; if it is larger, the stored maximum is replaced and the new maximum energy count is recorded;
d. whether the speech-segment signal has ended is judged; if speech input has stopped, the acquisition ends, and the recorded amplitude and energy count of the speech-segment data are stored for use in voice signal enhancement.
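A minimal sketch of steps a-d, assuming NumPy; the accumulation of above-threshold amplitudes into the energy count and the tracking of the running maximum follow the text, while the function signature, the per-frame organization of the data, and the returned peak amplitude are illustrative assumptions.

```python
import numpy as np

def extract_features(frames: np.ndarray, threshold: float):
    """Return (peak amplitude, final energy count, maximum energy count)."""
    energy_count = 0.0
    max_count = 0.0
    peak_amplitude = 0.0
    for frame in frames:                     # step a: speech-segment data
        amp = np.abs(frame)
        # step b: add the above-threshold part of the amplitude to the count
        energy_count += float(np.sum(amp[amp > threshold]))
        peak_amplitude = max(peak_amplitude, float(amp.max()))
        # step c: compare with the stored maximum and replace if larger
        if energy_count > max_count:
            max_count = energy_count
    # step d: the segment has ended; the recorded values are stored for
    # use by the enhancement stage
    return peak_amplitude, energy_count, max_count
```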
After the voice signal is processed by the neural-network enhancement model, the analysis processing module prunes the signal data iteratively, improving the accuracy and precision of the data and hence the accuracy of subsequent translation and speech output.
The analysis processing module prunes the voice signal data iteratively as follows. First, a pruning threshold for the signal parameters is set from the absolute values of the amplitudes obtained during feature extraction; when an absolute amplitude is below the threshold it is zeroed, using a set of masking matrices in which the weight of any entry whose absolute value falls below the pruning threshold is set to zero and the weight of entries above it is set to 1. Second, the data produced by neural-network training are pruned a second time in the same way. Finally, the accuracy of the enhanced voice signal data is compared against a decision threshold: when the accuracy is below the threshold, the clean voice data output by the last network pass are emitted; when it is above the threshold, the pruning threshold is updated and pruning is repeated until it is complete. Pruning the neural-network training model enhances the voice signal data, improves their accuracy, and improves the accuracy of the voice feedback.
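The pruning loop could look like the following sketch, assuming NumPy. The masking rule (entries below the pruning threshold zeroed, entries above it kept with mask value 1) and the accuracy test against a decision threshold follow the text; the accuracy function, the threshold update factor, and the round limit are illustrative assumptions.

```python
import numpy as np

def iterative_prune(weights, prune_threshold, accuracy_fn,
                    accuracy_threshold, step=1.1, max_rounds=10):
    """weights: list of weight matrices; accuracy_fn: assumed callable
    that scores the enhanced speech data produced by the pruned model."""
    pruned = [w.copy() for w in weights]
    for _ in range(max_rounds):
        # masking matrices: 0 below the pruning threshold, 1 above it
        masks = [(np.abs(w) >= prune_threshold).astype(w.dtype)
                 for w in pruned]
        pruned = [w * m for w, m in zip(pruned, masks)]
        # if accuracy falls below the decision threshold, keep the clean
        # output of the last pass; otherwise raise the threshold and repeat
        if accuracy_fn(pruned) < accuracy_threshold:
            break
        prune_threshold *= step
    return pruned
```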
Embodiment 5:
As shown in FIG. 5, the complete data-analysis flow of the analysis processing module is as follows: after the voice signal data are input, low-frequency environmental noise is removed by band-pass filtering; linear cancellation is then applied to the data to remove stimulus-sound interference; a threshold decision follows, in which the data are compared with the threshold and averages are superimposed to remove impulse noise and raise the signal-to-noise ratio; dynamic tracking filtering is then applied and new voice signal data are synthesized; finally, a Fourier transform yields the time-domain and frequency-domain plots and the amplitude-phase features.
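A minimal sketch of this analysis flow, assuming SciPy. The band-pass filtering, impulse-noise averaging, and final Fourier transform follow the text; the passband edges, filter order, and averaging length are illustrative assumptions, and the linear-cancellation and dynamic-tracking-filter stages are left out for brevity.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def analyze(x: np.ndarray, fs: int = 16000):
    # band-pass filtering removes low-frequency environmental noise
    # (the 80 Hz - 7 kHz passband and 4th order are assumptions)
    b, a = butter(4, [80, 7000], btype="bandpass", fs=fs)
    x = filtfilt(b, a, x)
    # superimpose a short running average to suppress impulse noise
    x = np.convolve(x, np.ones(5) / 5, mode="same")
    # Fourier transform: frequency-domain plot plus amplitude/phase features
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return freqs, np.abs(spectrum), np.angle(spectrum)
```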
Embodiment 6:
During speech synthesis, the expected amplitude constructed in the time domain is set as a constant function of frequency, so the amplitude does not change with frequency. The phase spectrum is constructed in the frequency domain via the group delay: if the frequency increases from f1 to f2 as time runs from 0 to T seconds, the instant t at which each frequency of the speech signal occurs satisfies t = T·(f - f1)/(f2 - f1), where T is the duration of the sweep from f1 to f2; the phase spectrum therefore satisfies φ = -π·T·f·(f - f1)/(f2 - f1). After speech synthesis, the data frequency of the voice signal is continuous, the resolution is high, and sound production is efficient; the instantaneous frequency varies continuously from low to high within the band according to a fixed rule, which improves the recognizability of the voice signal, increases the sensitivity of the cochlear nerve center, and, through voice signal synthesis, excites the auditory efferent nervous system, forming feedback-type memory in the brain and improving the learner's memory capacity and memorization effect. After synthesis, keywords are extracted from the voice signal, key information is retrieved from the knowledge base, the extracted key information is translated into English by the translation module, the retrieved keywords and corresponding text are converted by calling the Baidu AI platform API, and the English result is finally output and fed back to the learner through the output module.
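The frequency-domain construction can be sketched as follows, assuming NumPy: a constant expected amplitude plus the group-delay phase spectrum φ(f) = -π·T·f·(f - f1)/(f2 - f1) from the text. The band edges f1 = 300 Hz and f2 = 3400 Hz, the sweep duration T = 1 s, and the 16 kHz sampling rate are illustrative values.

```python
import numpy as np

def synthesize(f1=300.0, f2=3400.0, T=1.0, fs=16000, amplitude=1.0):
    n = int(T * fs)
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    # expected amplitude: a constant function of frequency inside the band
    mag = np.where((freqs >= f1) & (freqs <= f2), amplitude, 0.0)
    # phase spectrum constructed from the group delay, as in the text
    phi = -np.pi * T * freqs * (freqs - f1) / (f2 - f1)
    spectrum = mag * np.exp(1j * phi)
    # the inverse transform gives the time-frequency-domain waveform
    return np.fft.irfft(spectrum, n=n)
```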
The above technical scheme yields a feedback speech enhancement method for strengthening memory: the control module directs the input module to capture speech, after which the voice signal is enhanced and synthesized, improving the intelligibility and recognizability of the speech; the quality and intelligibility of the voice signal are improved and the voice-feedback memory effect is strengthened. Pruning the voice signal data improves its accuracy and precision and hence the accuracy of subsequent translation and speech output. The short-time energy of each frame serves as the feature-extraction threshold; averaging the short-time energy over every ten frames and using that mean as the energy-amplitude threshold eliminates the influence of silent segments, removes unreliable speech-segment data, and improves the accuracy of the voice signal. Constraining the relation between the pre-emphasized speech signal and the input signal data removes lip-vibration noise produced during vocalization, reduces the influence of noise on the speech data, and improves the accuracy of the voice signal data. Constructing the phase spectrum in the frequency domain constrains the relation between instantaneous time and frequency, improves the recognizability of the voice signal, increases the sensitivity of the cochlear nerve center, and, through voice signal synthesis, excites the auditory efferent nervous system, forming feedback-type memory in the brain and improving the learner's memory capacity and memorization effect. The analysis processing of the speech and the speech synthesis combine to improve the recognizability of the voice signal, the accuracy of the voice signal data, and the accuracy of the voice feedback, further strengthening the feedback-memory effect.
Other technical solutions not described in detail in the present invention are prior art in the field, and are not described herein again.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (7)

1. A feedback speech enhancement method for strengthening memory, characterized by comprising the following steps: S1, the learner inputs the words to be memorized through the voice input module, and the voice recognition module automatically calls the Baidu speech recognition API to recognize the input speech and convert it into voice signal data;
S2, the voice signal data enter an analysis processing module for voice signal preprocessing, voice signal feature extraction, and voice signal enhancement in sequence, yielding time-domain and frequency-domain plots and amplitude-phase features;
S3, after step S2, speech synthesis is performed on the voice signal data: an expected amplitude is constructed in the time domain, an expected phase spectrum is constructed in the frequency domain, and the time-frequency-domain waveform of the voice signal is obtained from the amplitude and the phase spectrum;
S4, keywords are extracted from the synthesized voice signal, key information is retrieved from a knowledge base, and the extracted key information is then translated by the translation module;
S5, the retrieved keywords and the corresponding text are converted by calling the Baidu AI platform API, and the translated speech is finally output and fed back to the learner through the output module.
2. The method of claim 1, wherein in step S2 the preprocessing of the speech signal comprises: A1, pre-emphasis: the high-frequency part of the voice signal is boosted and the signal data are filtered; A2, resampling: the pre-emphasized data are uniformly resampled at 16 kHz; A3, framing: the voice signal is cut in short time steps along its time-domain waveform to obtain short segments of signal parameters, which are combined into feature parameters along the time sequence of the whole signal, completing the framing.
3. The method of claim 2, wherein the preprocessing further comprises: A4, windowing: each reassembled speech frame is multiplied by a window function, reducing the influence of sharply changing signals at the two ends of each frame on the speech signal analysis and eliminating high-frequency interference; A5, voice signal detection: the short-time energy of each frame serves as the feature-extraction threshold; the short-time energies of every ten frames are averaged and that mean is used as the energy-amplitude threshold during feature extraction, eliminating the influence of silent segments, removing unreliable speech-segment data, and improving the accuracy of the voice signal.
4. The method of claim 2, wherein voice signal feature extraction extracts the amplitude and energy count of the speech-segment signal data, and neural-network speech enhancement is configured according to these values to reduce noise interference; the neural network takes the preprocessed voice signal as input and the clean voice signal as output, the input layer and the output layer each have 90 neurons, 80,000 feature vectors are drawn from the feature-extracted speech data as input, and three hidden layers of 500 neurons each are used.
5. The method of claim 1, wherein in step S3, when the expected amplitude is constructed in the time domain during speech synthesis, the amplitude is set as a constant function of frequency and does not change with frequency.
6. The method of claim 1, wherein the phase spectrum is constructed in the frequency domain via the group delay: if the frequency increases from f1 to f2 as time runs from 0 to T seconds, the instant t at which each frequency of the speech signal occurs satisfies t = T·(f - f1)/(f2 - f1), where T is the duration of the sweep from f1 to f2; the phase spectrum therefore satisfies φ = -π·T·f·(f - f1)/(f2 - f1).
7. The method of claim 1, wherein after speech synthesis the data frequency of the voice signal is continuous, the resolution is high, and sound production is efficient; the instantaneous frequency of the voice signal varies continuously from low to high within the band according to a fixed rule, improving the recognizability of the speech signal and increasing the sensitivity of the cochlear nerve center.
CN202110551052.8A 2021-05-20 2021-05-20 Feedback voice strengthening method for strengthening memory Active CN113269305B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110551052.8A | 2021-05-20 | 2021-05-20 | Feedback voice strengthening method for strengthening memory (granted as CN113269305B)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110551052.8A | 2021-05-20 | 2021-05-20 | Feedback voice strengthening method for strengthening memory (granted as CN113269305B)

Publications (2)

Publication Number | Publication Date
CN113269305A (en) | 2021-08-17
CN113269305B (en) | 2024-05-03

Family

ID=77232029

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110551052.8A | Feedback voice strengthening method for strengthening memory (Active, CN113269305B) | 2021-05-20 | 2021-05-20

Country Status (1)

Country Link
CN (1) CN113269305B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR7600292A (en) * 1975-04-02 1976-10-05 Rockwell Int Corp SYSTEM TO DETECT ONE OR MORE KEY WORDS IN CONTINUOUS CONVERSATION
US20130262096A1 (en) * 2011-09-23 2013-10-03 Lessac Technologies, Inc. Methods for aligning expressive speech utterances with text and systems therefor
US20150011842A1 (en) * 2012-01-18 2015-01-08 Shirley Steinberg-Shapira Method and device for stuttering alleviation
US20140108020A1 (en) * 2012-10-15 2014-04-17 Digimarc Corporation Multi-mode audio recognition and auxiliary data encoding and decoding
US20170270919A1 (en) * 2016-03-21 2017-09-21 Amazon Technologies, Inc. Anchored speech detection and speech recognition
CN110121749A (en) * 2016-11-23 2019-08-13 通用电气公司 Deep learning medical system and method for Image Acquisition
WO2020072759A1 (en) * 2018-10-03 2020-04-09 Visteon Global Technologies, Inc. A voice assistant system for a vehicle cockpit system
CN109767760A (en) * 2019-02-23 2019-05-17 天津大学 Far field audio recognition method based on the study of the multiple target of amplitude and phase information
CN110136741A (en) * 2019-05-16 2019-08-16 哈尔滨工业大学 A kind of single-channel voice Enhancement Method based on multiple dimensioned context
CN111145606A (en) * 2019-11-30 2020-05-12 合肥微澜特网络科技有限责任公司 English interactive learning platform system
CN111078010A (en) * 2019-12-06 2020-04-28 智语科技(江门)有限公司 Man-machine interaction method and device, terminal equipment and readable storage medium
CN112735456A (en) * 2020-11-23 2021-04-30 西安邮电大学 Speech enhancement method based on DNN-CLSTM network

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
GEONMIN KIM et al.: "Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition", IEEE Signal Processing Letters, no. 1, 9 December 2018, p. 159 *
SAMIA ABD EL-MONEIM et al.: "Text-independent speaker recognition using LSTM-RNN and speech enhancement", p. 1, retrieved from the Internet <URL:https://link.springer.com/article/10.1007/s11042-019-08293-7> *
戴经国: "Speech Signal Processing Technology and Its Applications" (in Chinese), Computer and Information Technology, no. 06, 30 December 2000, pp. 35-38 *
时文华 et al.: "Speech Enhancement Combining Sparse Non-negative Matrix Factorization and Neural Networks" (in Chinese), Journal of Computer Research and Development, no. 11, 15 November 2018, pp. 2430-2438 *
李蜜: "Research on Speaker Identification Based on Speech Processing" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology, no. 1, pp. 5-7 *
极限元: "Deep Learning Applications in Speech Synthesis and Enhancement at a Glance" (in Chinese), pp. 1-3, retrieved from the Internet <URL:https://www.leiphone.com/category/ai/trT0kxuTmx67dtPk.html> *
袁文浩 et al.: "A Speech Enhancement Method Based on Time-Frequency Domain Feature Fusion" (in Chinese), Computer Engineering, 23 October 2020, pp. 75-81 *
黄万伟 et al.: "Design of an Immersive English Learning System Based on Kinect Motion-Sensing Recognition" (in Chinese), Industry and Information Technology Education, no. 08, 25 August 2018, pp. 84-88 *

Also Published As

Publication Number | Publication Date
CN113269305B (en) | 2024-05-03

Similar Documents

Publication Number | Title
CN108597496B (en) Voice generation method and device based on generation type countermeasure network
CN109326299B (en) Speech enhancement method, device and storage medium based on full convolution neural network
CN110782872A (en) Language identification method and device based on deep convolutional recurrent neural network
CN111916111B (en) Intelligent voice outbound method and device with emotion, server and storage medium
CN109887489B (en) Speech dereverberation method based on depth features for generating countermeasure network
CN109256118B (en) End-to-end Chinese dialect identification system and method based on generative auditory model
CN108597505A (en) Audio recognition method, device and terminal device
CN115602165B (en) Digital employee intelligent system based on financial system
CN111986679A (en) Speaker confirmation method, system and storage medium for responding to complex acoustic environment
CN111489763B (en) GMM model-based speaker recognition self-adaption method in complex environment
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN112183582A (en) Multi-feature fusion underwater target identification method
CN114495969A (en) Voice recognition method integrating voice enhancement
CN109346104A (en) A kind of audio frequency characteristics dimension reduction method based on spectral clustering
CN113571095A (en) Speech emotion recognition method and system based on nested deep neural network
Singh et al. Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition
CN115472168B (en) Short-time voice voiceprint recognition method, system and equipment for coupling BGCC and PWPE features
US20230186943A1 (en) Voice activity detection method and apparatus, and storage medium
CN113269305A (en) Feedback voice strengthening method for strengthening memory
CN113257219A (en) Feedback type voice stimulation memory system
Sunny et al. Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam
CN115168563A (en) Airport service guiding method, system and device based on intention recognition
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
Boril et al. Data-driven design of front-end filter bank for Lombard speech recognition
Tzudir et al. Low-resource dialect identification in Ao using noise robust mean Hilbert envelope coefficients

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant