CN113257219A - Feedback type voice stimulation memory system - Google Patents


Info

Publication number
CN113257219A
CN113257219A · Application CN202110551051.3A
Authority
CN
China
Prior art keywords
voice
voice signal
module
signal
feedback
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110551051.3A
Other languages
Chinese (zh)
Inventor
胡文莉
杨向格
尚季玲
梁超慧
尚宇
许卫红
刘博�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Railway Vocational and Technical College
Original Assignee
Zhengzhou Railway Vocational and Technical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Zhengzhou Railway Vocational and Technical College filed Critical Zhengzhou Railway Vocational and Technical College
Priority application: CN202110551051.3A
Publication: CN113257219A
Legal status: Pending


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
        • G10L 13/02 — Methods for producing synthetic speech; speech synthesisers
        • G10L 15/005 — Language recognition
        • G10L 15/02 — Feature extraction for speech recognition; selection of recognition unit
        • G10L 15/04 — Segmentation; word boundary detection
        • G10L 15/063 — Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
        • G10L 15/16 — Speech classification or search using artificial neural networks
        • G10L 21/0232 — Noise filtering: processing in the frequency domain
        • G10L 25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science
  • Physics & Mathematics
  • Computational Linguistics
  • Health & Medical Sciences
  • Audiology, Speech & Language Pathology
  • Human Computer Interaction
  • Acoustics & Sound
  • Multimedia
  • Signal Processing
  • Artificial Intelligence
  • Computer Vision & Pattern Recognition
  • Quality & Reliability
  • Evolutionary Computation
  • Electrically Operated Instructional Devices

Abstract

The invention discloses a feedback-type voice stimulation memory system comprising a control module, a voice input module and an output module. Input speech is first recognized; the recognized speech signal is then analyzed and processed, which improves its intelligibility and makes it more recognizable. After analysis, the signal is synthesized by a speech synthesis module, reducing the effect of time variation on the signal and sharpening the clarity of the auditory feedback. The control module drives the input module to capture speech, after which the signal is enhanced and synthesized; this improves intelligibility and recognizability, raises signal quality, and strengthens the voice-feedback memory effect. Pruning the speech-signal data increases its accuracy and correctness, and with it the accuracy of the subsequent translation and voice output.

Description

Feedback type voice stimulation memory system
Technical Field
The invention belongs to the field of memory equipment, and particularly relates to a feedback type voice stimulation memory system.
Background
English is a language course, so learning it requires memorizing large amounts of vocabulary and grammar. At present there is hardly any device that strengthens memory through the learner's own speech feedback during English study; learners can therefore only rely on rote memorization, which is inefficient and leaves little lasting impression.
Existing English-learning devices generally play back pre-recorded or downloaded words and grammar repeatedly, and the learner studies by repeating after the playback.
Chinese patent application No. 201810440869.6 discloses an interactive English learning system comprising several mobile terminals, a server and a data storage device. Each mobile terminal contains a communication module through which it connects to the server, and the server is connected to the data storage device. The terminal also contains a touch-display unit and a processor; the touch-display unit consists of an audio module, a display screen, a handwriting module, a tool module, a database module and a synchronization device, each of which is electrically connected to the processor. Although interactive study with the handwriting module helps strengthen memory, the scheme provides no effective feedback that deepens auditory memory in time, so its memorization effect still leaves room for improvement.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a feedback-type voice stimulation memory system in which a control module drives an input module to capture speech; the speech signal is then enhanced and synthesized, improving its intelligibility and recognizability, raising its quality, and strengthening the voice-feedback memory effect. Pruning the speech-signal data increases its accuracy and correctness, and with it the accuracy of the subsequent translation and voice output.
The invention provides the following technical scheme:
A feedback-type voice stimulation memory system comprises a control module, a voice input module and an output module. Input speech is first recognized; the recognized speech signal is then analyzed and processed, which improves its intelligibility and makes it more recognizable;
after analysis, the signal is synthesized by the speech synthesis module, reducing the effect of time variation on the signal and sharpening the clarity of the auditory feedback;
the speech synthesis module is connected to an extraction module that retrieves keywords; after retrieval, the speech is translated into English by the connected translation module and output as feedback to the learner. The voice input module, analysis processing module, synthesis module, extraction module, translation module and output module are connected in sequence and communicate over signal lines; each is also connected to the control module, with which it exchanges data and control signals.
Preferably, speech synthesis comprises the following steps: first, the analysis processing module processes the speech-signal data and extracts its time-domain, frequency-domain and time-frequency-domain features; an expected amplitude is then constructed in the time domain and an expected phase spectrum in the frequency domain, and the time-frequency-domain waveform of the speech signal is obtained from the amplitude and phase spectrum.
Preferably, the analysis processing of the speech signal comprises preprocessing, feature extraction and enhancement processing.
Preferably, speech-signal preprocessing comprises inputting the original speech, boosting its high-frequency part, and resampling the signal at 16 kHz; the resampled data are then framed and windowed, silent and voiced segments are distinguished by the short-time energy of each frame, and features are extracted from the voiced segments.
Preferably, the speech-signal enhancement uses a neural network to build an enhancement model; processing the signal through this model raises its quality, improves its intelligibility and strengthens the voice-feedback memory effect.
Preferably, the characteristic parameters are extracted as follows:
a. the analysis processing module acquires the voiced-segment data after speech-signal preprocessing;
b. an energy count is accumulated: whenever a sample amplitude in the voiced segment exceeds the threshold, the excess is added to the count;
c. the energy count is compared with the stored maximum; if it is larger, it replaces the stored maximum, which is recorded;
d. the module checks whether the voiced segment has ended; when speech input stops, the acquisition ends, and the recorded amplitudes and energy count of the voiced-segment data are stored for use in the enhancement processing.
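Steps a–d above can be sketched as follows (the function name, data layout and returned structure are illustrative assumptions, not taken from the patent):

```python
def extract_energy_features(frames, threshold):
    """Steps a-d: accumulate an energy count from the parts of each voiced-segment
    amplitude that exceed the threshold, and track the running maximum count."""
    energy_count = 0.0
    max_count = 0.0
    amplitudes = []
    for frame in frames:                       # step a: voiced-segment data
        for sample in frame:
            amp = abs(sample)
            amplitudes.append(amp)
            if amp > threshold:                # step b: add the excess to the count
                energy_count += amp - threshold
        if energy_count > max_count:           # step c: keep the running maximum
            max_count = energy_count
    # step d: input has ended; store amplitudes and counts for enhancement
    return {"amplitudes": amplitudes,
            "energy_count": energy_count,
            "max_count": max_count}
```

With two short frames and a threshold of 0.2, only the excess above 0.2 of each sample contributes to the count.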
Preferably, after the speech signal has been processed by the neural-network enhancement model, the analysis processing module prunes the signal data iteratively, increasing their accuracy and precision and with them the accuracy of the subsequent translation and voice output.
Preferably, the control module uses an STM32 microcontroller; the voice input module (lb3320), translation module (icm20602) and output module (loudspeaker) are connected in sequence by one-way serial links, and the STM32 communicates with the voice input, translation and output modules over bidirectional serial ports.
Preferably, after the learner inputs Chinese or English speech, the system automatically calls the Baidu speech-recognition API to recognize it; once the speech has been recognized as a signal, the analysis processing module preprocesses it as follows. A1, pre-emphasis: the high-frequency part of the signal is boosted by filtering the data with a filter whose transfer function satisfies H(z) = 1 - b·z^-1, where b is the pre-emphasis coefficient with a value in the range 0.89–1 and z is the z-domain variable of the signal data; the pre-emphasized signal is x2(n) = x1(n) - λ·x1(n-1), where x1(n) is the input signal data and λ is an adjustment coefficient in the range 0.76–0.97. Pre-emphasis removes the noise of lip vibration during utterance, reduces its influence on the speech data, and improves the accuracy of the signal data. A2, resampling: the pre-emphasized data are uniformly resampled at 16 kHz. A3, framing: the time-domain waveform of the signal is cut into short segments, each yielding a set of signal parameters; recombining these parameters along the time axis of the whole signal completes the framing. A4, windowing: each recombined frame is multiplied by a cosine window w(n) = 0.56 - 0.36·cos(2πn/(N-1)), with the window length equal to the frame length; the windowed signal satisfies y(n) = x(n)·w(n), where x(n) is a single speech frame, 0 < n < N, and N is the number of samples per frame. This reduces the influence of the rapidly changing signal at both ends of a frame on the analysis and removes high-frequency interference. A5, voice detection: the short-time energy of each frame of the speech segment serves as the feature-extraction threshold; the short-time energies are averaged over every ten frames, and this mean is used as the energy-amplitude threshold during feature extraction, eliminating the influence of silent segments, discarding unreliable segment data and improving the accuracy of the signal.
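The preprocessing chain A1–A5 can be sketched in a few lines (the function name, frame length and return values are illustrative assumptions; the 16 kHz resampling of step A2 is assumed to have been done already, and the window coefficients follow the formulas in the text):

```python
import math

def preprocess(x, lam=0.9, frame_len=160):
    """A1 pre-emphasis, A3 framing, A4 windowing, A5 short-time-energy
    voice detection; lam is the adjustment coefficient (0.76-0.97)."""
    # A1: x2(n) = x1(n) - lam * x1(n-1)
    emph = [x[0]] + [x[n] - lam * x[n - 1] for n in range(1, len(x))]
    # A3: cut the waveform into frames of frame_len samples
    frames = [emph[i:i + frame_len]
              for i in range(0, len(emph) - frame_len + 1, frame_len)]
    # A4: multiply each frame by w(n) = 0.56 - 0.36*cos(2*pi*n/(N-1))
    N = frame_len
    w = [0.56 - 0.36 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    windowed = [[f[n] * w[n] for n in range(N)] for f in frames]
    # A5: mean short-time energy over frames as the voicing threshold
    energies = [sum(s * s for s in f) for f in windowed]
    thresh = sum(energies) / len(energies) if energies else 0.0
    voiced = [f for f, e in zip(windowed, energies) if e > thresh]
    return voiced, thresh
```

A silent half followed by a loud half of a signal yields exactly one voiced frame, since the silent frame's energy falls below the mean-energy threshold.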
After preprocessing, features are extracted from the speech signal: the amplitudes and energy count of the voiced-segment data are obtained and used to configure the neural-network enhancement, reducing noise interference. The network takes the preprocessed signal as input and the clean signal as output; the input and output layers each have 90 neurons, eighty thousand feature vectors are drawn from the feature-extracted data as training input, and there are three hidden layers of 500 neurons each. With too many hidden layers the training easily falls into a local optimum, causing the trained model to overfit the speech data; three hidden layers are therefore chosen, which favors the intelligibility of the enhanced speech.
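The 90–500–500–500–90 topology described above can be sketched as a plain feed-forward network (the weights here are random placeholders; training against clean-speech targets, which the patent assumes, is not shown, and the activation choice is an assumption):

```python
import math
import random

def make_enhancer(n_in=90, n_hidden=500, n_layers=3, n_out=90, seed=0):
    """Feed-forward enhancement net per the text: 90 inputs, three hidden
    layers of 500 neurons, 90 outputs; tanh hidden units, linear output."""
    rng = random.Random(seed)
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    layers = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        W = [[rng.uniform(-0.05, 0.05) for _ in range(a)] for _ in range(b)]
        bias = [0.0] * b
        layers.append((W, bias))

    def forward(x):
        h = x
        for i, (W, bias) in enumerate(layers):
            z = [sum(wj * hj for wj, hj in zip(row, h)) + bj
                 for row, bj in zip(W, bias)]
            # tanh on hidden layers, linear on the output layer
            h = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
        return h

    return forward
```

A zero input vector passes through to a zero output, since the biases are zero and tanh(0) = 0.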
In addition, the analysis processing module prunes the speech-signal data iteratively as follows. First, a pruning threshold is set from the absolute values of the extracted amplitudes; parameters whose absolute value falls below the threshold are set to zero by a set of masking matrices, in which entries corresponding to values below the pruning threshold are 0 and entries above it are 1. Second, the data obtained after neural-network training are pruned a second time in the same way. Finally, the accuracy of the enhanced signal data is compared with a decision threshold: if it falls below the threshold, the clean speech produced by the previous network pass is output; if it is above the threshold, the pruning threshold is updated and pruning is repeated until it is complete. Pruning the trained network model enhances the speech-signal data, improving their accuracy and with it the accuracy of the voice feedback.
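The masked, iterative pruning loop described above can be sketched as follows (the function names, the threshold-update rule `step`, and the accuracy callback are illustrative assumptions):

```python
def prune_weights(weights, threshold):
    """One pruning pass: a 0/1 mask zeroes entries whose absolute value is
    below the threshold (mask value 1 keeps the entry)."""
    mask = [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]
    pruned = [[w * m for w, m in zip(row, mrow)]
              for row, mrow in zip(weights, mask)]
    return pruned, mask

def iterative_prune(weights, accuracy_fn, threshold, acc_floor,
                    step=1.5, max_iter=10):
    """Repeat pruning while accuracy stays above the floor; when accuracy
    drops below it, return the last weights that were still acceptable."""
    best = weights
    for _ in range(max_iter):
        cand, _ = prune_weights(best, threshold)
        if accuracy_fn(cand) < acc_floor:
            return best        # accuracy fell: keep the previous result
        best = cand
        threshold *= step      # update the pruning threshold and repeat
    return best
```

With a toy accuracy function (fraction of nonzero weights), the loop stops just before pruning degrades accuracy below the floor.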
In addition, the full data-analysis flow of the analysis processing module is: after the speech data are input, low-frequency environmental noise is removed by band-pass filtering; stimulus-sound interference is then removed by linear cancellation; a threshold decision follows, in which the data are compared with the threshold and averaged to remove impulse noise and raise the signal-to-noise ratio; dynamic tracking filtering then synthesizes new speech data; and finally a Fourier transform yields the time-domain and frequency-domain plots and the amplitude and phase characteristics.
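The tail of that pipeline, from impulse-noise suppression to the Fourier transform that yields amplitude and phase features, can be sketched as below (a minimal sketch: the band-pass and linear-cancellation stages are omitted, the clipping threshold and 3-point smoother stand in for the patent's threshold decision and averaging, and a plain DFT replaces a production FFT):

```python
import cmath

def dft_magphase(x):
    """Plain DFT returning the amplitude and phase features the pipeline
    ends with; a real implementation would use an FFT."""
    N = len(x)
    spec = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]
    return [abs(c) for c in spec], [cmath.phase(c) for c in spec]

def clean_and_transform(x, clip):
    """Clip impulse noise against a threshold, smooth with a 3-point moving
    average, then transform to magnitude/phase features."""
    clipped = [max(-clip, min(clip, v)) for v in x]
    n = len(clipped)
    smooth = [(clipped[max(0, i - 1)] + clipped[i]
               + clipped[min(n - 1, i + 1)]) / 3 for i in range(n)]
    return dft_magphase(smooth)
```

A unit impulse transforms to a flat magnitude spectrum, which is a quick sanity check on the DFT.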
In addition, during speech synthesis the expected amplitude constructed in the time domain is set as a constant function of frequency, so the amplitude does not change with frequency. The phase spectrum in the frequency domain is constructed from the group delay: if the frequency rises from f1 to f2 as time runs from 0 to T seconds, each frequency f of the signal occurs at the instant t = T·(f - f1)/(f2 - f1), where T is the duration of the sweep from f1 to f2; the phase spectrum therefore satisfies φ = -π·T·f·(f - f1)/(f2 - f1). After synthesis, the frequency of the signal is continuous, its resolution and sounding efficiency are high, and its instantaneous frequency varies continuously from low to high within the band according to a fixed rule; this improves the recognizability of the signal, increases the sensitivity of the cochlear nerve center, and, by triggering excitation of the auditory efferent nervous system through the synthesized signal, forms a feedback-type memory in the brain that improves the learner's memory capacity and retention. After synthesis, keywords are extracted from the speech signal and key information is retrieved from a knowledge base; the extracted information is translated into English by the translation module, which calls the Baidu AI API with the retrieved keywords and the corresponding text, and the English result is finally output to the learner through the output module.
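The constant-amplitude sweep whose instantaneous frequency rises linearly from f1 to f2 over T seconds, as constructed above via the group delay, can be sketched directly in the time domain (the sample rate and amplitude defaults are illustrative assumptions):

```python
import math

def synthesize_chirp(f1, f2, T, fs=16000, amp=1.0):
    """Constant-amplitude linear sweep: instantaneous frequency rises from
    f1 to f2 over T seconds, matching the group-delay construction."""
    n = int(T * fs)
    out = []
    for i in range(n):
        t = i / fs
        # time-domain phase of a linear chirp:
        # 2*pi*(f1*t + (f2 - f1)*t^2 / (2*T))
        phase = 2 * math.pi * (f1 * t + (f2 - f1) * t * t / (2 * T))
        out.append(amp * math.sin(phase))
    return out
```

A 10 ms sweep at 16 kHz yields 160 samples starting at zero and bounded by the chosen amplitude.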
Compared with the prior art, the invention has the following beneficial effects:
(1) In the feedback-type voice stimulation memory system, the control module drives the input module to capture speech; the signal is then enhanced and synthesized, improving its intelligibility and recognizability, raising its quality, and strengthening the voice-feedback memory effect.
(2) Pruning the speech-signal data increases its accuracy and precision, and with it the accuracy of the subsequent translation and voice output.
(3) Using the short-time energy of each frame as the feature-extraction threshold, and the mean short-time energy of every ten frames as the energy-amplitude threshold, eliminates the influence of silent segments, discards unreliable segment data and improves the accuracy of the speech signal.
(4) Constraining the relation between the pre-emphasized signal and the input signal data removes the noise of lip vibration during utterance, reduces its influence on the speech data and increases the accuracy of the signal data.
(5) Constructing the phase spectrum in the frequency domain constrains the relation between instantaneous time and frequency, improving the recognizability of the signal, increasing the sensitivity of the cochlear nerve center, and inducing excitation of the auditory efferent nervous system through synthesis, which favors the formation of feedback-type memory and improves the learner's memory ability and retention.
(6) The analysis processing of the speech and its synthesis act together, improving the recognizability of the signal, the accuracy of the data and of the voice feedback, and further strengthening the feedback memory effect.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings required by the embodiments are briefly described below. The following drawings illustrate only some embodiments of the invention and should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a block diagram of the system framework of the present invention.
Fig. 2 is a flow chart of speech signal preprocessing of the present invention.
Fig. 3 is a diagram of a neural network topology of the present invention.
Fig. 4 is a flow chart of the iterative pruning of speech signals of the present invention.
Fig. 5 is a data flow diagram of the speech signal data analysis process of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments clear, they are described below in detail and in full with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention.
Thus the following detailed description of the embodiments shown in the figures is not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by those skilled in the art without inventive effort on the basis of these embodiments fall within the scope of the invention.
The first embodiment is as follows:
As shown in fig. 1, a feedback-type voice stimulation memory system comprises a control module, a voice input module and an output module. Input speech is first recognized; the recognized speech signal is then analyzed and processed, which improves its intelligibility and makes it more recognizable;
after analysis, the signal is synthesized by the speech synthesis module, reducing the effect of time variation on the signal and sharpening the clarity of the auditory feedback;
the speech synthesis module is connected to an extraction module that retrieves keywords; after retrieval, the speech is translated into English by the connected translation module and output as feedback to the learner. The voice input module, analysis processing module, synthesis module, extraction module, translation module and output module are connected in sequence and communicate over signal lines; each is also connected to the control module, with which it exchanges data and control signals.
The steps of speech synthesis are: first, the analysis processing module processes the speech-signal data and extracts its time-domain, frequency-domain and time-frequency-domain features; an expected amplitude is then constructed in the time domain and an expected phase spectrum in the frequency domain, and the time-frequency-domain waveform of the speech signal is obtained from the amplitude and phase spectrum.
The analysis processing of the speech signal comprises preprocessing, feature extraction and enhancement processing.
The control module uses an STM32 microcontroller; the voice input module (lb3320), translation module (icm20602) and output module (loudspeaker) are connected in sequence by one-way serial links, and the STM32 communicates with the voice input, translation and output modules over bidirectional serial ports.
Example two:
As shown in fig. 2, on the basis of the first embodiment, speech-signal preprocessing comprises inputting the original speech, boosting its high-frequency part, and resampling the signal at 16 kHz; the resampled data are then framed and windowed, silent and voiced segments are distinguished by the short-time energy of each frame, and features are extracted from the voiced segments.
After the learner inputs Chinese or English speech, the system automatically calls the Baidu speech-recognition API to recognize it; once the speech has been recognized as a signal, the analysis processing module preprocesses it as follows. A1, pre-emphasis: the high-frequency part of the signal is boosted by filtering the data with a filter whose transfer function satisfies H(z) = 1 - b·z^-1, where b is the pre-emphasis coefficient with a value in the range 0.89–1 and z is the z-domain variable of the signal data; the pre-emphasized signal is x2(n) = x1(n) - λ·x1(n-1), where x1(n) is the input signal data and λ is an adjustment coefficient in the range 0.76–0.97. Pre-emphasis removes the noise of lip vibration during utterance, reduces its influence on the speech data, and improves the accuracy of the signal data. A2, resampling: the pre-emphasized data are uniformly resampled at 16 kHz. A3, framing: the time-domain waveform of the signal is cut into short segments, each yielding a set of signal parameters; recombining these parameters along the time axis of the whole signal completes the framing. A4, windowing: each recombined frame is multiplied by a cosine window w(n) = 0.56 - 0.36·cos(2πn/(N-1)), with the window length equal to the frame length; the windowed signal satisfies y(n) = x(n)·w(n), where x(n) is a single speech frame, 0 < n < N, and N is the number of samples per frame. This reduces the influence of the rapidly changing signal at both ends of a frame on the analysis and removes high-frequency interference. A5, voice detection: the short-time energy of each frame of the speech segment serves as the feature-extraction threshold; the short-time energies are averaged over every ten frames, and this mean is used as the energy-amplitude threshold during feature extraction, eliminating the influence of silent segments, discarding unreliable segment data and improving the accuracy of the signal.
Example three:
As shown in figs. 3-4, the speech-signal enhancement uses a neural network to build an enhancement model; processing the signal through this model raises its quality, improves its intelligibility and strengthens the voice-feedback memory effect.
The characteristic parameters are extracted as follows:
a, the analysis processing module acquires the voice-segment signal data after voice signal preprocessing;
b, the energy count value of the voice signal is accumulated: the part of the voice-segment amplitude exceeding the threshold is added to the energy count;
c, the energy count value is compared with the stored maximum; if it is larger, the stored value is replaced and the maximum energy count is recorded;
d, whether the voice-segment signal has ended is judged; if voice input stops, the segment ends, and the recorded amplitude and energy count value of the voice-segment data are stored for use in voice signal enhancement.
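Steps a-d amount to a running-tally loop over the frame amplitudes of one voice segment. The sketch below is illustrative only: the function name and the use of per-frame amplitudes as input are assumptions, and in practice the count would reset at each new voice segment.

```python
def extract_energy_features(frame_amplitudes, threshold):
    """Steps b-d: accumulate amplitude above threshold, track the maximum (sketch)."""
    energy_count = 0.0
    max_count = 0.0
    for amp in frame_amplitudes:
        # b: add the part of the amplitude that exceeds the threshold
        if amp > threshold:
            energy_count += amp - threshold
        # c: replace the stored maximum when the running count exceeds it
        if energy_count > max_count:
            max_count = energy_count
    # d: on end of input, return the recorded values for enhancement
    return energy_count, max_count
```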
After the speech signal is processed by the neural network enhancement model, the analysis processing module prunes the speech signal data by adopting an iteration method, so that the accuracy and the precision of the speech signal data are improved, and the accuracy of subsequent translation and speech output is improved.
After preprocessing, features are extracted from the voice signal: the amplitude and energy count value of the voice-segment data are extracted, the neural-network enhancement is configured according to these values, and noise interference is reduced. Specifically, the network takes the preprocessed voice signal as input and the clean voice signal as output; the input layer and the output layer each have 90 neurons; 80,000 feature vectors are then drawn from the feature-extracted voice signal data as training input; and three hidden layers of 500 neurons each are used. With too many hidden layers the training process easily falls into a local optimum, causing the trained model to overfit the voice signal data; three hidden layers are therefore chosen, which favors improved intelligibility after enhancement.
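The 90-500-500-500-90 topology described above can be sketched as a stack of dense layers. The source specifies only the layer sizes, so the activation (ReLU on hidden layers), the initialisation scheme, and the function names here are assumptions for illustration; a real model would be trained on noisy/clean feature-vector pairs.

```python
import numpy as np

def build_enhancement_mlp(n_in=90, n_hidden=500, n_layers=3, n_out=90, seed=0):
    """Randomly initialised weights for the 90-500-500-500-90 topology (sketch)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """ReLU hidden layers, linear output: maps a noisy feature vector to a clean one."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x
```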
The analysis processing module prunes the voice signal data iteratively as follows. First, a pruning threshold is set for the voice signal parameters according to the absolute amplitude obtained in feature extraction; parameters whose absolute value falls below the threshold are zeroed, using a set of masking matrices in which entries corresponding to parameters below the pruning threshold have weight 0 and entries at or above it have weight 1. Second, the data are pruned a second time in the same way after neural-network training. Finally, the accuracy of the enhanced voice signal data is used as a decision threshold: when the accuracy is below the threshold, the clean voice data output by the last network pass are emitted; when it is above the threshold, the pruning threshold is updated and pruning is repeated until complete. Pruning the trained model in this way enhances the voice signal data, improves its accuracy, and improves the accuracy of the voice feedback.
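The masking-matrix step is standard magnitude pruning: build a 0/1 mask from the threshold and multiply it in. A minimal sketch with a hypothetical function name; the surrounding accuracy-driven iteration loop is omitted.

```python
import numpy as np

def prune_weights(W, threshold):
    """Magnitude pruning: entries with |w| below the threshold are zeroed (sketch)."""
    mask = (np.abs(W) >= threshold).astype(W.dtype)  # 1 keeps the entry, 0 prunes it
    return W * mask, mask
```

Reapplying the same call after retraining gives the "secondary pruning" pass described above.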
Example four:
As shown in Figure 5, the full data-analysis flow of the analysis processing module is as follows: after the voice signal data are input, low-frequency environmental noise is removed by band-pass filtering; linear cancellation then removes stimulation-artifact interference from the data; threshold judgment follows, in which values are compared with the threshold and averages are superimposed to remove impulse noise and raise the signal-to-noise ratio; dynamic tracking filtering synthesizes new voice signal data; and finally a Fourier transform yields the time-domain plot, the frequency-domain plot, and the amplitude-phase characteristics.
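Assuming the frames have already been band-pass filtered, the tail of this pipeline (superimpose/average to suppress impulse noise, then Fourier transform for magnitude and phase) might look like the following illustrative sketch:

```python
import numpy as np

def analyse(frames):
    """Average aligned frames to raise SNR, then FFT for magnitude and phase (sketch)."""
    avg = frames.mean(axis=0)            # superimposing averages out impulse noise
    spectrum = np.fft.rfft(avg)          # frequency-domain representation
    return np.abs(spectrum), np.angle(spectrum)
```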
Example five:
In speech synthesis, when the expected amplitude is constructed in the time domain, the amplitude is set as a constant function of frequency, i.e. it does not change with frequency. The phase spectrum is constructed in the frequency domain from the group delay: if the frequency increases from f1 to f2 as time runs from 0 to T seconds, the instant t at which each frequency f of the voice signal data occurs satisfies t = T·(f − f1)/(f2 − f1), where T is the duration of the sweep from f1 to f2; the phase spectrum φ accordingly satisfies φ(f) = −π·T·(f − f1)²/(f2 − f1). After synthesis the voice signal is continuous in frequency, with high resolution and high sound-production efficiency; its instantaneous frequency changes continuously from low to high according to a fixed rule within the frequency range, which improves the recognizability of the voice signal, increases the sensitivity of the cochlear nerve center, and triggers excitation of the auditory efferent nervous system, forming feedback memory in the brain and improving the learner's memory capacity and effect. After synthesis, keywords of the voice signal are extracted and the key information is retrieved from a knowledge base; the retrieved key information is translated into English by the translation module, which calls the Baidu AI API with the retrieved keywords and the corresponding text; finally the English output is fed back to the learner through the output module.
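The flat-magnitude, group-delay construction described here is the frequency-domain description of a linear chirp. A sketch under stated assumptions: the sample rate and function names are illustrative, the time-domain generator below is the equivalent sweep with instantaneous frequency rising linearly from f1 to f2, and the phase function is derived from the stated group delay t(f) = T·(f − f1)/(f2 − f1).

```python
import numpy as np

def linear_chirp(f1, f2, T, fs=16000):
    """Constant-amplitude sweep from f1 to f2 Hz over T seconds (sketch)."""
    t = np.arange(int(T * fs)) / fs
    # instantaneous frequency f(t) = f1 + (f2 - f1) * t / T rises linearly,
    # so the accumulated phase is 2*pi * (f1*t + (f2 - f1) * t**2 / (2*T))
    phase = 2 * np.pi * (f1 * t + (f2 - f1) * t ** 2 / (2 * T))
    return np.cos(phase)

def group_delay_phase(f, f1, f2, T):
    """Phase spectrum phi(f) = -pi*T*(f - f1)**2 / (f2 - f1), obtained by
    integrating the group delay t(f) = T*(f - f1)/(f2 - f1) over frequency."""
    return -np.pi * T * (f - f1) ** 2 / (f2 - f1)
```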
The device obtained by the above technical scheme is a feedback type voice stimulation memory system. The control module directs the input module to perform voice input; the voice signal is then enhanced and synthesized, which increases its intelligibility and makes recognition more reliable. Enhancement raises the quality of the voice signal, improves its intelligibility, and strengthens the voice-feedback memory effect. Pruning the voice signal data increases its accuracy and correctness and improves subsequent translation and voice output. The short-time energy of each frame of a voice segment serves as the feature-extraction threshold: the energies of every ten frames are averaged, the mean is used as the energy-amplitude threshold, and silent segments are excluded, removing unreliable data from the voice segments and improving the accuracy of the voice signal. Constraining the relation between the pre-emphasized signal and the input voice data removes lip-radiation noise produced during phonation and reduces the influence of noise on the voice data. Constructing the phase spectrum in the frequency domain and constraining the relation between instant and frequency improves the recognizability of the voice signal, increases the sensitivity of the cochlear nerve center, and triggers excitation of the auditory efferent nervous system, forming feedback memory in the brain and improving the learner's memory capacity and effect. Through the combination of voice analysis processing and voice synthesis, the recognizability of the voice signal, the accuracy of the voice signal data, and the accuracy of the voice feedback are all improved, further strengthening the feedback-memory effect.
Other technical solutions not described in detail in the present invention are prior art in the field, and are not described herein again.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A feedback type voice stimulation memory system, comprising a control module, a voice input module and an output module; characterized in that after voice input, the voice is recognized; after recognition, the input voice signal is analyzed and processed; and after analysis the intelligibility of the voice is increased, so that recognition is more reliable;
after the voice signal is analyzed and processed, it is synthesized by the provided voice synthesis module, reducing the influence of time variation on the voice signal and increasing the clarity of the auditory feedback;
the voice synthesis module is connected with an extraction module; the extraction module retrieves keywords, after which the connected translation module translates the speech signals into English, which is simultaneously output and fed back to the learner; the voice input module, analysis processing module, synthesis module, extraction module, translation module and output module are connected in sequence and communicate through signal lines; all are connected to the control module and exchange communication and control signals with it.
2. The feedback voice-stimulated memory system of claim 1, wherein the voice synthesis comprises: firstly, processing voice signal data through an analysis processing module, and extracting time domain, frequency domain and time-frequency domain characteristics of the voice signal; and constructing an expected amplitude value on a time domain, constructing an expected phase spectrum on a frequency domain, and obtaining the time-frequency domain waveform of the voice signal according to the amplitude value and the phase spectrum.
3. The feedback voice stimulation memory system according to claim 1, wherein the voice signal analyzing and processing module comprises voice signal preprocessing, voice signal feature extraction, and voice signal enhancement processing.
4. The feedback voice stimulation memory system of claim 3, wherein the voice signal preprocessing comprises: after voice input, the high-frequency portion of the voice signal is boosted and the voice signal is resampled at 16 kHz; the resampled voice signal data are framed and windowed, the silent sections and voice sections are distinguished according to the short-time energy of each frame, and feature extraction is performed on the voice sections.
5. The feedback voice stimulation memory system according to claim 3, wherein the voice signal enhancement processing uses a neural network to build a voice signal enhancement model, and the neural network enhancement model is used to process the voice signal, so as to enhance the quality of the voice signal, improve the intelligibility of the voice signal, and enhance the voice feedback memory effect.
6. The feedback voice stimulation memory system according to claim 4, wherein the characteristic parameters are extracted by the following method:
a, an analysis processing module acquires signal data of a voice section after voice signal preprocessing;
b, counting the energy count value of the voice signal, wherein the part of the voice-segment amplitude exceeding the threshold is added to the energy count;
c, comparing the energy count value with the maximum storage value, if the energy count value is larger than the maximum storage value, replacing, and recording the maximum energy count value;
and d, judging whether the voice-segment signal has ended; if voice input stops, the segment ends, and the recorded amplitude and energy count value of the voice-segment data are stored and used for voice signal enhancement.
7. The feedback voice stimulation memory system according to claim 5, wherein after the voice signal is processed by the neural network enhancement model, the analysis processing module prunes the voice signal data by using an iterative method, so as to increase the accuracy and precision of the voice signal data and the precision of subsequent translation and voice output.
CN202110551051.3A 2021-05-20 2021-05-20 Feedback type voice stimulation memory system Pending CN113257219A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110551051.3A CN113257219A (en) 2021-05-20 2021-05-20 Feedback type voice stimulation memory system

Publications (1)

Publication Number Publication Date
CN113257219A true CN113257219A (en) 2021-08-13

Family

ID=77182983


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination