CN107481732B - Noise reduction method and device in spoken language evaluation and terminal equipment - Google Patents

Noise reduction method and device in spoken language evaluation and terminal equipment

Info

Publication number
CN107481732B
Authority
CN
China
Prior art keywords
audio signal
signal
frequency domain
time
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710769182.2A
Other languages
Chinese (zh)
Other versions
CN107481732A (en)
Inventor
梁金辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd filed Critical Guangdong Genius Technology Co Ltd
Priority to CN201710769182.2A
Publication of CN107481732A
Application granted
Publication of CN107481732B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G10L19/0216: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation using wavelet decomposition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention belongs to the technical field of audio processing, and particularly relates to a noise reduction method and device in spoken language evaluation, and to terminal equipment. The method comprises the following steps: collecting a first audio signal of a person to be evaluated; performing noise reduction processing on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal, wherein the first sample signal is the frequency spectrum of an audio signal of the person to be evaluated acquired while the background audio signal is smaller than a preset threshold value; and determining the second audio signal as the target evaluation signal of the person to be evaluated. The invention can effectively filter out noise, thereby reducing the deviation that noise introduces into the evaluation result, so that the result accurately reflects the true level of the person to be evaluated and provides a good user experience.

Description

Noise reduction method and device in spoken language evaluation and terminal equipment
Technical Field
The invention belongs to the technical field of audio processing, and particularly relates to a noise reduction method and device in spoken language evaluation and terminal equipment.
Background
A large amount of spoken language evaluation software and terminal equipment currently exists on the market. Its basic principle is to compare the spoken pronunciation of a person to be evaluated with a preset standard pronunciation, evaluate the person's spoken language level according to the similarity between the two, and give a corresponding score. However, the person to be evaluated may perform the evaluation in any of a variety of noisy environments, and when the noise is too strong the evaluation result may deviate considerably, fail to reflect the person's true level, and lead to a poor user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a noise reduction method and apparatus in spoken language evaluation, and a terminal device, so as to solve the prior-art problem that, when the noise is too strong, the evaluation result deviates considerably, fails to reflect the true level of the person to be evaluated, and gives the person to be evaluated a poor user experience.
The first aspect of the embodiments of the present invention provides a noise reduction method in spoken language assessment, which may include:
collecting a first audio signal of a person to be evaluated;
carrying out noise reduction processing on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal, wherein the first sample signal is the frequency spectrum of the audio signal of the person to be evaluated, which is acquired under the condition that a background audio signal is smaller than a preset threshold value;
and determining the second audio signal as a target evaluation signal of the person to be evaluated.
Further, the performing noise reduction processing on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal may include:
respectively obtaining a first time-frequency domain matrix of the first sample signal and a second time-frequency domain matrix of the first audio signal through wavelet packet transformation;
respectively calculating a first feature vector of the first time-frequency domain matrix and a second feature vector of the second time-frequency domain matrix;
calculating a first ratio of each corresponding element between the first feature vector and the second feature vector;
determining a vector composed of each first ratio as a first weight vector;
performing weighting processing on the second time-frequency domain matrix according to the first weight vector to obtain a third time-frequency domain matrix;
and performing inverse wavelet packet transformation on the third time-frequency domain matrix to obtain the second audio signal.
Further, after obtaining the third time-frequency domain matrix, the method may further include:
calculating a third feature vector of the third time-frequency domain matrix;
carrying out mean value filtering processing on the first feature vector and the third feature vector to obtain a fourth feature vector;
calculating a second ratio of each corresponding element between the fourth feature vector and the first feature vector;
determining a vector composed of each second ratio as a second weight vector;
weighting the first time-frequency domain matrix according to the second weight vector to obtain a fourth time-frequency domain matrix;
performing inverse wavelet packet transformation on the fourth time-frequency domain matrix to obtain a second sample signal;
determining the second sample signal as the first sample signal at the next evaluation.
Further, before performing noise reduction processing on the first audio signal according to a preset first sample signal, the method may further include:
collecting a background audio signal when the person to be evaluated does not produce sound;
calculating a first spectrum of the first audio signal and a second spectrum of the background audio signal, respectively;
generating a third spectrum having the same amplitude and opposite phase as the second spectrum;
superposing the first frequency spectrum and the third frequency spectrum to obtain a fourth frequency spectrum;
and determining the audio signal corresponding to the fourth frequency spectrum as the adjusted first audio signal.
Further, before performing noise reduction processing on the first audio signal according to a preset first sample signal, the method may further include:
calculating a first frequency spectrum of the first audio signal;
respectively calculating the sample frequency spectrum of each sample signal in a preset audio sample library;
screening out, from the sample frequency spectrums of the sample signals, a first sample frequency spectrum whose coincidence degree with the frequency spectrum range of the first frequency spectrum is the highest and exceeds a preset threshold value;
and determining a sample signal corresponding to the first sample spectrum as the first sample signal.
A second aspect of an embodiment of the present invention provides a noise reduction device in spoken language evaluation, which may include:
the signal acquisition module is used for acquiring a first audio signal of a person to be evaluated;
the noise reduction processing module is used for carrying out noise reduction processing on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal, wherein the first sample signal is the frequency spectrum of the audio signal of the person to be evaluated, which is acquired under the condition that a background audio signal is smaller than a preset threshold value;
and the evaluation signal determining module is used for determining the second audio signal as a target evaluation signal of the person to be evaluated.
Further, the noise reduction processing module may include:
a wavelet packet transformation unit, configured to obtain a first time-frequency domain matrix of the first sample signal and a second time-frequency domain matrix of the first audio signal through wavelet packet transformation, respectively;
a first vector calculation unit, configured to calculate a first feature vector of the first time-frequency domain matrix and a second feature vector of the second time-frequency domain matrix, respectively;
a first ratio calculation unit, configured to calculate a first ratio of each corresponding element between the first feature vector and the second feature vector;
a first weight determining unit, configured to determine a vector composed of each of the first ratios as a first weight vector;
the first weighting processing unit is used for weighting the second time-frequency domain matrix according to the first weight vector to obtain a third time-frequency domain matrix;
and the first inverse wavelet packet transformation unit is used for performing inverse wavelet packet transformation on the third time-frequency domain matrix to obtain the second audio signal.
Further, the noise reduction processing module may further include:
a second vector calculation unit, configured to calculate a third feature vector of the third time-frequency domain matrix;
the filtering processing unit is used for carrying out mean value filtering processing on the first feature vector and the third feature vector to obtain a fourth feature vector;
a second ratio calculation unit configured to calculate a second ratio of each corresponding element between the fourth feature vector and the first feature vector;
a second weight determining unit, configured to determine a vector formed by each of the second ratios as a second weight vector;
the second weighting processing unit is used for weighting the first time-frequency domain matrix according to the second weight vector to obtain a fourth time-frequency domain matrix;
the second inverse wavelet packet transformation unit is used for performing inverse wavelet packet transformation on the fourth time-frequency domain matrix to obtain a second sample signal;
and the sample signal determining unit is used for determining the second sample signal as the first sample signal at the next evaluation.
Further, the noise reduction device in the spoken language evaluation may further include:
the background signal acquisition module is used for acquiring a background audio signal when the person to be evaluated does not produce sound;
a spectrum calculation module, configured to calculate a first spectrum of the first audio signal and a second spectrum of the background audio signal, respectively;
the inverse frequency spectrum generating module is used for generating a third frequency spectrum which has the same amplitude and is opposite to the phase of the second frequency spectrum;
the frequency spectrum superposition module is used for superposing the first frequency spectrum and the third frequency spectrum to obtain a fourth frequency spectrum;
and the signal adjusting module is used for determining the audio signal corresponding to the fourth frequency spectrum as the adjusted first audio signal.
Further, the noise reduction device in the spoken language evaluation may further include:
a signal spectrum calculation module for calculating a first spectrum of the first audio signal;
the sample spectrum calculation module is used for calculating the sample spectrum of each sample signal in a preset audio sample library respectively;
the frequency spectrum screening module is used for screening out, from the sample frequency spectrums of the sample signals, a first sample frequency spectrum whose coincidence degree with the frequency spectrum range of the first frequency spectrum is the highest and exceeds a preset threshold value;
a sample signal determination module, configured to determine a sample signal corresponding to the first sample spectrum as the first sample signal.
A third aspect of the embodiments of the present invention provides a noise reduction terminal device in spoken language evaluation, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above noise reduction methods in spoken language evaluation when executing the computer program.
A fourth aspect of an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the noise reduction method in any one of the above spoken language assessments.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: a first audio signal of a person to be evaluated is collected; noise reduction processing is then performed on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal; and the second audio signal is finally determined as the target evaluation signal of the person to be evaluated, so that spoken language evaluation is performed on the basis of that signal. Because the first sample signal is an audio signal of the person to be evaluated acquired while the background audio signal is small, using it as a reference allows the noise to be filtered out effectively, which reduces the deviation that noise introduces into the evaluation result, lets the result accurately reflect the true level of the person to be evaluated, and provides a good user experience.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a noise reduction method in spoken language evaluation according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the denoising process in step S102 according to the embodiment of the present invention;
FIG. 3 is a schematic block diagram of a noise reduction apparatus for spoken language evaluation according to an embodiment of the present invention;
fig. 4 is a schematic block diagram of a noise reduction terminal device in spoken language evaluation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without inventive effort shall fall within the protection scope of the present invention.
Embodiment one:
fig. 1 is a schematic flowchart of a noise reduction method in spoken language evaluation according to an embodiment of the present invention, where the method includes:
step S101, collecting a first audio signal of a person to be evaluated.
In this embodiment, the first audio signal may be collected by a single microphone; alternatively, two or more channels of audio from the person to be evaluated may be collected by two or more microphones placed in different directions, and these channels are then synthesized into the first audio signal.
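By way of illustration only, one simple way to synthesize two or more microphone channels into a single first audio signal is to average the time-aligned channels. The following Python sketch (not part of the original disclosure) assumes equally long, already aligned channels; the embodiment does not prescribe a particular synthesis method, and beamforming or per-channel weighting could be used instead.

    import numpy as np

    def synthesize_channels(channels):
        # Combine two or more time-aligned microphone channels into one
        # first audio signal by simple averaging (an assumed strategy).
        stacked = np.vstack([np.asarray(c, dtype=float) for c in channels])
        return stacked.mean(axis=0)  # average across microphones, sample by sample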
Preferably, the first audio signal may be pre-denoised using a background audio signal. First, a background audio signal is collected while the person to be evaluated is not speaking, and a first frequency spectrum of the first audio signal and a second frequency spectrum of the background audio signal are calculated. A third frequency spectrum with the same amplitude as the second frequency spectrum but the opposite phase is then generated, the first and third frequency spectrums are superposed to obtain a fourth frequency spectrum, and the audio signal corresponding to the fourth frequency spectrum is finally determined as the adjusted first audio signal. This pre-denoising process uses the additionally collected background audio signal to effectively cancel the background noise in the first audio signal.
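This pre-denoising step can be sketched in Python with NumPy as follows. The sketch is illustrative only: it assumes both recordings share the same sample rate, matches their lengths by zero-padding or truncating the background recording, and assumes the background noise is roughly stationary between the two recordings; none of these details are fixed by the embodiment.

    import numpy as np

    def pre_denoise(first_audio, background_audio):
        n = len(first_audio)
        # First spectrum of the first audio signal and second spectrum of the
        # separately collected background audio signal.
        first_spectrum = np.fft.rfft(first_audio)
        second_spectrum = np.fft.rfft(background_audio, n)
        # Third spectrum: same amplitude as the second spectrum, opposite phase.
        third_spectrum = -second_spectrum
        # Fourth spectrum: superposition of the first and third spectra.
        fourth_spectrum = first_spectrum + third_spectrum
        # The audio signal corresponding to the fourth spectrum is the
        # adjusted first audio signal.
        return np.fft.irfft(fourth_spectrum, n)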
Step S102, carrying out noise reduction processing on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal.
The first sample signal is the frequency spectrum of an audio signal of the person to be evaluated acquired while the background audio signal is smaller than a preset threshold, and it serves as the reference signal when the first audio signal is denoised. The threshold may be set according to the actual situation, for example to 1/100 or 1/1000 of the intensity of the audio signal; its specific value is not limited in this embodiment.
It should be noted that, in practical applications, several persons to be evaluated may share the same evaluation device, in which sample signals of all of them are preset; in that case a sample signal matching the first audio signal must first be screened out and used as the first sample signal. Specifically, the first frequency spectrum of the first audio signal is calculated, the sample frequency spectrum of each sample signal in a preset audio sample library is calculated, a first sample frequency spectrum whose coincidence degree with the frequency spectrum range of the first frequency spectrum is the highest and exceeds a preset threshold is then screened out from these sample frequency spectrums, and the sample signal corresponding to the first sample frequency spectrum is finally determined as the first sample signal.
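For illustration, the screening of the first sample signal could look like the following Python sketch. The "coincidence degree" is read here as the fraction of frequency bins in which both spectra carry significant energy, and the 0.6 threshold and the 1%-of-peak activity floor are assumptions rather than values from the embodiment.

    import numpy as np

    def select_first_sample(first_audio, sample_library, coincidence_threshold=0.6):
        # sample_library: {person_id: clean sample signal recorded in quiet conditions}
        n = len(first_audio)

        def active_bins(signal):
            spectrum = np.abs(np.fft.rfft(signal, n))
            return spectrum > 0.01 * spectrum.max()  # bins carrying significant energy

        first_active = active_bins(first_audio)
        best_id, best_degree = None, 0.0
        for person_id, sample in sample_library.items():
            degree = np.mean(first_active & active_bins(sample))  # coincidence degree in [0, 1]
            if degree > best_degree:
                best_id, best_degree = person_id, degree
        # Accept the best match only if its coincidence degree exceeds the threshold.
        return best_id if best_degree > coincidence_threshold else None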
As shown in fig. 2, the denoising process in step S102 may specifically include:
step S1021, a first time-frequency domain matrix of the first sample signal and a second time-frequency domain matrix of the first audio signal are obtained through wavelet packet transformation respectively.
Step S1022, respectively calculating a first feature vector of the first time-frequency domain matrix and a second feature vector of the second time-frequency domain matrix.
Step S1023, calculating a first ratio of each corresponding element between the first feature vector and the second feature vector.
Step S1024, determining the vector formed by the first ratios as a first weight vector.
Step S1025, performing weighting processing on the second time-frequency domain matrix according to the first weight vector to obtain a third time-frequency domain matrix.
Step S1026, inverse wavelet packet transformation is performed on the third time-frequency domain matrix to obtain the second audio signal.
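As an illustration of steps S1021 to S1026, the following Python sketch uses the PyWavelets (pywt) library. The wavelet ('db4'), the decomposition level, the choice of per-sub-band mean absolute amplitude as the feature, and the reading of the first ratio as the sample feature divided by the noisy-signal feature are all assumptions; the embodiment does not fix them.

    import numpy as np
    import pywt

    def wp_matrix(signal, wavelet='db4', level=4):
        # Wavelet packet transform: stack the deepest-level sub-band coefficients
        # into a time-frequency domain matrix (one row per sub-band).
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                                mode='symmetric', maxlevel=level)
        nodes = wp.get_level(level, order='freq')
        return np.vstack([node.data for node in nodes]), [node.path for node in nodes]

    def feature_vector(tf_matrix):
        # Feature vector: mean absolute coefficient amplitude per sub-band
        # (an assumed feature; per-band energy would work equally well).
        return np.mean(np.abs(tf_matrix), axis=1) + 1e-12

    def inverse_wp(tf_matrix, paths, wavelet='db4', level=4):
        # Inverse wavelet packet transform of a (possibly re-weighted) matrix.
        wp = pywt.WaveletPacket(data=None, wavelet=wavelet,
                                mode='symmetric', maxlevel=level)
        for path, coeffs in zip(paths, tf_matrix):
            wp[path] = coeffs
        return wp.reconstruct(update=False)

    def denoise(first_audio, first_sample, wavelet='db4', level=4):
        first_tf, _ = wp_matrix(first_sample, wavelet, level)       # first matrix (clean sample)
        second_tf, paths = wp_matrix(first_audio, wavelet, level)   # second matrix (noisy audio)
        first_weights = feature_vector(first_tf) / feature_vector(second_tf)  # first weight vector
        third_tf = second_tf * first_weights[:, None]               # third matrix (weighted)
        # Trim to the input length, since reconstruction may pad at the boundary.
        second_audio = inverse_wp(third_tf, paths, wavelet, level)[:len(first_audio)]
        return second_audio, third_tf

With this reading, sub-bands in which the noisy recording carries much more energy than the clean sample receive weights well below one and are attenuated, while sub-bands dominated by the speaker's voice keep weights close to one.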
Traditional signal analysis and processing generally relies on Fourier analysis, whose window function is fixed and which therefore cannot capture non-stationary, short-duration characteristics of a signal or localize it jointly in the time and frequency domains. Wavelet analysis is a time-frequency localization method whose window has a fixed area but a variable shape, i.e. variable time and frequency windows. However, because at each decomposition step it only decomposes the low-frequency part further and leaves the high-frequency part untouched, its frequency resolution decreases as the frequency increases.
In this embodiment, wavelet packet analysis is therefore preferred for signal transformation and processing. Wavelet packet analysis offers a finer analysis than wavelet analysis: it divides the time-frequency plane more finely and provides higher resolution in the high-frequency part of the signal. Moreover, building on wavelet analysis theory, it introduces the concept of optimal basis selection: the frequency band is divided over multiple levels, and the optimal basis function is selected adaptively according to the characteristics of the analyzed signal so as to match the signal and improve the analysis.
From the perspective of function theory, the wavelet packet transform projects a signal onto a space spanned by wavelet packet basis functions. From a signal-processing point of view, it passes the signal through a bank of filters with different center frequencies but equal bandwidth.
In this embodiment, the first audio signal is weighted in the time-frequency domain after the wavelet packet transform according to the first sample signal. Since the noise in the first sample signal is almost negligible relative to the useful signal, the weighting enhances the useful portion of the first audio signal and attenuates the noise portion, which effectively achieves noise reduction.
Preferably, the first sample signal can be further adjusted so that it better reflects the sound characteristics of the person to be evaluated. Specifically, a third feature vector of the third time-frequency domain matrix is calculated; mean filtering is performed on the first feature vector and the third feature vector to obtain a fourth feature vector; the second ratios of the corresponding elements between the fourth feature vector and the first feature vector are calculated; the vector composed of these second ratios is determined as a second weight vector; the first time-frequency domain matrix is weighted according to the second weight vector to obtain a fourth time-frequency domain matrix; an inverse wavelet packet transform is performed on the fourth time-frequency domain matrix to obtain a second sample signal; and the second sample signal is finally determined as the first sample signal for the next evaluation.
The essence of this adjustment is that the newly acquired audio signal of the person to be evaluated is fused with the original sample signal, which enriches the sample signal and lets the fused sample better reflect the person's sound characteristics. The adjustment is a continuous accumulation process: each time the person to be evaluated records an audio signal, that signal can be used to adjust the sample signal once, so the reliability of the sample signal keeps improving through this repeated forward-feedback process.
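Continuing the sketch above and reusing its helper functions, the sample-signal adjustment could be written as follows; reading the "mean value filtering" of the two feature vectors as their element-wise average is an assumption.

    def update_first_sample(first_sample, third_tf, wavelet='db4', level=4):
        first_tf, paths = wp_matrix(first_sample, wavelet, level)   # first matrix
        first_features = feature_vector(first_tf)                   # first feature vector
        third_features = feature_vector(third_tf)                   # third feature vector
        fourth_features = 0.5 * (first_features + third_features)   # mean-filtered fourth feature vector
        second_weights = fourth_features / first_features           # second weight vector
        fourth_tf = first_tf * second_weights[:, None]              # fourth matrix
        # The inverse wavelet packet transform yields the second sample signal,
        # which replaces the first sample signal at the next evaluation.
        return inverse_wp(fourth_tf, paths, wavelet, level)[:len(first_sample)]

In use, the returned second sample signal would simply be stored in place of the previous sample and passed as first_sample the next time the person is evaluated.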
Step S103, determining the second audio signal as a target evaluation signal of the person to be evaluated.
Due to the series of noise reduction processes, the second audio signal can better reflect the true spoken language level of the person to be evaluated than the first audio signal, and thus can be determined as the target evaluation signal of the person to be evaluated.
In summary, in the embodiment of the present invention, a first audio signal of a person to be evaluated is collected; noise reduction processing is then performed on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal; and the second audio signal is finally determined as the target evaluation signal of the person to be evaluated, so that spoken language evaluation is performed on the basis of that signal. Because the first sample signal is an audio signal of the person to be evaluated acquired while the background audio signal is small, using it as a reference allows the noise to be filtered out effectively, which reduces the deviation that noise introduces into the evaluation result, lets the result accurately reflect the true level of the person to be evaluated, and provides a good user experience.
Embodiment two:
fig. 3 is a schematic block diagram of a device for reducing noise in spoken language evaluation according to an embodiment of the present invention, where the device may include:
the signal acquisition module 301 is used for acquiring a first audio signal of a person to be evaluated;
the noise reduction processing module 302 is configured to perform noise reduction processing on the first audio signal according to a preset first sample signal to obtain a denoised second audio signal, where the first sample signal is a frequency spectrum of an audio signal of the person to be evaluated, where the frequency spectrum is acquired when a background audio signal is smaller than a preset threshold;
and the evaluation signal determining module 303 is configured to determine the second audio signal as a target evaluation signal of the person to be evaluated.
Further, the noise reduction processing module 302 may include:
a wavelet packet transformation unit, configured to obtain a first time-frequency domain matrix of the first sample signal and a second time-frequency domain matrix of the first audio signal through wavelet packet transformation, respectively;
a first vector calculation unit, configured to calculate a first feature vector of the first time-frequency domain matrix and a second feature vector of the second time-frequency domain matrix, respectively;
a first ratio calculation unit, configured to calculate a first ratio of each corresponding element between the first feature vector and the second feature vector;
a first weight determining unit, configured to determine a vector composed of each of the first ratios as a first weight vector;
the first weighting processing unit is used for weighting the second time-frequency domain matrix according to the first weight vector to obtain a third time-frequency domain matrix;
and the first inverse wavelet packet transformation unit is used for performing inverse wavelet packet transformation on the third time-frequency domain matrix to obtain the second audio signal.
Further, the noise reduction processing module 302 may further include:
a second vector calculation unit, configured to calculate a third feature vector of the third time-frequency domain matrix;
the filtering processing unit is used for carrying out mean value filtering processing on the first feature vector and the third feature vector to obtain a fourth feature vector;
a second ratio calculation unit configured to calculate a second ratio of each corresponding element between the fourth feature vector and the first feature vector;
a second weight determining unit, configured to determine a vector formed by each of the second ratios as a second weight vector;
the second weighting processing unit is used for weighting the first time-frequency domain matrix according to the second weight vector to obtain a fourth time-frequency domain matrix;
the second inverse wavelet packet transformation unit is used for performing inverse wavelet packet transformation on the fourth time-frequency domain matrix to obtain a second sample signal;
and the sample signal determining unit is used for determining the second sample signal as the first sample signal at the next evaluation.
Further, the noise reduction device in the spoken language evaluation may further include:
a background signal acquisition module 304, configured to acquire a background audio signal when the person to be evaluated does not produce a sound;
a spectrum calculation module 305, configured to calculate a first spectrum of the first audio signal and a second spectrum of the background audio signal, respectively;
an inverse spectrum generating module 306, configured to generate a third spectrum having the same amplitude and an opposite phase as the second spectrum;
a spectrum stacking module 307, configured to stack the first spectrum and the third spectrum to obtain a fourth spectrum;
a signal adjusting module 308, configured to determine the audio signal corresponding to the fourth spectrum as the adjusted first audio signal.
Further, the noise reduction device in the spoken language evaluation may further include:
a signal spectrum calculation module 309, configured to calculate a first spectrum of the first audio signal;
a sample spectrum calculating module 310, configured to calculate sample spectra of sample signals in a preset audio sample library respectively;
a spectrum screening module 311, configured to screen out, from the sample spectrums of the respective sample signals, a first sample spectrum whose coincidence degree with the spectrum range of the first spectrum is the highest and exceeds a preset threshold;
a sample signal determining module 312, configured to determine a sample signal corresponding to the first sample spectrum as the first sample signal.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 4 is a schematic block diagram of a noise reduction terminal device in spoken language evaluation according to an embodiment of the present invention. As shown in fig. 4, the noise reduction terminal device 4 in the spoken language evaluation of the embodiment includes: a processor 40, a memory 41 and a computer program 42 stored in said memory 41 and executable on said processor 40. The processor 40 executes the computer program 42 to implement the steps in the noise reduction method embodiments in the above-mentioned respective spoken language evaluations, for example, the steps S101 to S103 shown in fig. 1. Alternatively, the processor 40, when executing the computer program 42, implements the functions of each module/unit in the above-mentioned device embodiments, for example, the functions of the modules 301 to 303 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 42 in the noise reduction terminal device 4. For example, the computer program 42 may be divided into a signal acquisition module, a noise reduction processing module, and an evaluation signal determination module.
The noise reduction terminal device 4 in the spoken language evaluation may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The noise reduction terminal device in the spoken language evaluation may include, but is not limited to, a processor 40 and a memory 41. It will be understood by those skilled in the art that fig. 4 is merely an example of the noise reduction terminal device 4 in the spoken language evaluation and does not constitute a limitation on it; the device may include more or fewer components than those shown, may combine some components, or may use different components. For example, the noise reduction terminal device 4 in the spoken language evaluation may further include an input-output device, a network access device, a bus, and the like.
The Processor 40 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the noise reduction terminal device 4 in the spoken language evaluation, for example, a hard disk or a memory of the noise reduction terminal device 4 in the spoken language evaluation. The memory 41 may also be an external storage device of the noise reduction terminal device 4 in the spoken language evaluation, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash memory Card (FlashCard), or the like, which is equipped on the noise reduction terminal device 4 in the spoken language evaluation. Further, the memory 41 may include both an internal storage unit and an external storage device of the noise reduction terminal device 4 in the spoken language evaluation. The memory 41 is used for storing the computer program and other programs and data required by the noise reduction terminal device 4 in the spoken language assessment. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (6)

1. A noise reduction method in spoken language evaluation is characterized by comprising the following steps:
collecting a first audio signal of a person to be evaluated;
respectively obtaining a first time-frequency domain matrix of a first sample signal and a second time-frequency domain matrix of the first audio signal through wavelet packet transformation, wherein the first sample signal is a frequency spectrum of an audio signal of the person to be evaluated, which is acquired under the condition that a background audio signal is smaller than a preset threshold value;
respectively calculating a first feature vector of the first time-frequency domain matrix and a second feature vector of the second time-frequency domain matrix;
calculating a first ratio of each corresponding element between the first feature vector and the second feature vector;
determining a vector composed of each first ratio as a first weight vector;
performing weighting processing on the second time-frequency domain matrix according to the first weight vector to obtain a third time-frequency domain matrix;
performing inverse wavelet packet transformation on the third time-frequency domain matrix to obtain a denoised second audio signal;
determining the second audio signal as a target evaluation signal of the person to be evaluated;
calculating a third feature vector of the third time-frequency domain matrix;
carrying out mean value filtering processing on the first feature vector and the third feature vector to obtain a fourth feature vector;
calculating a second ratio of each corresponding element between the fourth feature vector and the first feature vector;
determining a vector composed of each second ratio as a second weight vector;
weighting the first time-frequency domain matrix according to the second weight vector to obtain a fourth time-frequency domain matrix;
performing inverse wavelet packet transformation on the fourth time-frequency domain matrix to obtain a second sample signal;
determining the second sample signal as the first sample signal at the next evaluation.
2. The method of noise reduction in spoken language evaluation according to claim 1, further comprising, before the noise reduction processing of the first audio signal according to a preset first sample signal:
collecting a background audio signal when the person to be evaluated does not produce sound;
calculating a first spectrum of the first audio signal and a second spectrum of the background audio signal, respectively;
generating a third spectrum having the same amplitude and opposite phase as the second spectrum;
superposing the first frequency spectrum and the third frequency spectrum to obtain a fourth frequency spectrum;
and determining the audio signal corresponding to the fourth frequency spectrum as the adjusted first audio signal.
3. The method of noise reduction in spoken language evaluation according to any of claims 1-2, further comprising, before the noise reduction processing of the first audio signal according to a preset first sample signal:
calculating a first frequency spectrum of the first audio signal;
respectively calculating the sample frequency spectrum of each sample signal in a preset audio sample library;
screening out, from the sample frequency spectrums of the sample signals, a first sample frequency spectrum whose coincidence degree with the frequency spectrum range of the first frequency spectrum is the highest and exceeds a preset threshold value;
and determining a sample signal corresponding to the first sample spectrum as the first sample signal.
4. A noise reduction device in spoken language evaluation, comprising:
the signal acquisition module is used for acquiring a first audio signal of a person to be evaluated;
the noise reduction processing module is used for respectively obtaining a first time-frequency domain matrix of a first sample signal and a second time-frequency domain matrix of the first audio signal through wavelet packet transformation, wherein the first sample signal is a frequency spectrum of the audio signal of the person to be evaluated, which is acquired under the condition that a background audio signal is smaller than a preset threshold value; respectively calculating a first feature vector of the first time-frequency domain matrix and a second feature vector of the second time-frequency domain matrix; calculating a first ratio of each corresponding element between the first feature vector and the second feature vector; determining a vector composed of each first ratio as a first weight vector; performing weighting processing on the second time-frequency domain matrix according to the first weight vector to obtain a third time-frequency domain matrix; performing inverse wavelet packet transformation on the third time-frequency domain matrix to obtain a denoised second audio signal;
the evaluation signal determining module is used for determining the second audio signal as a target evaluation signal of the person to be evaluated;
the noise reduction processing module is further configured to calculate a third feature vector of the third time-frequency domain matrix; carrying out mean value filtering processing on the first feature vector and the third feature vector to obtain a fourth feature vector; calculating a second ratio of each corresponding element between the fourth feature vector and the first feature vector; determining a vector composed of each second ratio as a second weight vector; weighting the first time-frequency domain matrix according to the second weight vector to obtain a fourth time-frequency domain matrix; performing inverse wavelet packet transformation on the fourth time-frequency domain matrix to obtain a second sample signal; determining the second sample signal as the first sample signal at the next evaluation.
5. A noise reduction terminal device in spoken language evaluation, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, carries out the steps of the noise reduction method in spoken language evaluation according to any one of claims 1 to 3.
6. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the noise reduction method in spoken language evaluation according to any one of claims 1 to 3.
CN201710769182.2A 2017-08-31 2017-08-31 Noise reduction method and device in spoken language evaluation and terminal equipment Active CN107481732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710769182.2A CN107481732B (en) 2017-08-31 2017-08-31 Noise reduction method and device in spoken language evaluation and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710769182.2A CN107481732B (en) 2017-08-31 2017-08-31 Noise reduction method and device in spoken language evaluation and terminal equipment

Publications (2)

Publication Number Publication Date
CN107481732A CN107481732A (en) 2017-12-15
CN107481732B 2020-10-02

Family

ID=60603401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710769182.2A Active CN107481732B (en) 2017-08-31 2017-08-31 Noise reduction method and device in spoken language evaluation and terminal equipment

Country Status (1)

Country Link
CN (1) CN107481732B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111599234A (en) * 2020-05-19 2020-08-28 黑龙江工业学院 Automatic English spoken language scoring system based on voice recognition
CN111640447B (en) * 2020-05-26 2023-03-21 广东小天才科技有限公司 Method for reducing noise of audio signal and terminal equipment
CN116631410B (en) * 2023-07-25 2023-10-24 陈志丰 Voice recognition method based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102419972A (en) * 2011-11-28 2012-04-18 西安交通大学 Method of detecting and identifying sound signals
GB2516208A (en) * 2012-10-25 2015-01-21 Azenby Ltd Noise reduction in voice communications
CN106128477A (en) * 2016-06-23 2016-11-16 南阳理工学院 A kind of spoken identification correction system
CN106157952A (en) * 2016-08-30 2016-11-23 北京小米移动软件有限公司 Sound identification method and device
CN106952654A (en) * 2017-04-24 2017-07-14 北京奇虎科技有限公司 Robot noise-reduction method, device and robot

Also Published As

Publication number Publication date
CN107481732A (en) 2017-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant