CN114639390A - Voice noise analysis method and system

Voice noise analysis method and system

Info

Publication number
CN114639390A
Authority
CN
China
Prior art keywords
noise
intensity
level
audio
analyzed
Prior art date
Legal status
Pending
Application number
CN202011499230.9A
Other languages
Chinese (zh)
Inventor
Liu Gang (刘刚)
Gong Ke (龚科)
Current Assignee
DMAI Guangzhou Co Ltd
Original Assignee
DMAI Guangzhou Co Ltd
Priority date
Filing date
Publication date
Application filed by DMAI Guangzhou Co Ltd
Priority to CN202011499230.9A
Publication of CN114639390A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/27 - Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a voice noise analysis method and system. The method comprises: acquiring voice data to be analyzed; extracting noise audio segments containing only noise from the voice data to be analyzed; determining the noise intensity level of each noise audio segment based on the noise intensity index of the segment and preset noise intensity classification levels; and determining a noise level evaluation result for the voice data to be analyzed according to the distribution of the noise intensity levels of the noise audio segments. By computing noise intensity indexes only for segments that contain noise alone and analyzing them independently, and then deriving the overall noise level evaluation from the distribution of the per-segment intensity levels, the method avoids interference from normal speech, achieves an objective evaluation of the noise level of the voice data without requiring any reference audio, has a wide application range, and can accurately reflect the noise conditions in various scenes.

Description

Voice noise analysis method and system
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a voice noise analysis method and system.
Background
With the rapid development of the mobile internet, communication software is used more and more widely; for example, more and more teachers give online teaching guidance to students through instant messaging software in place of the traditional face-to-face teaching mode. When communication software is used, however, noise can seriously degrade the quality of the communication audio, and some applications place high demands on noise; for example, when a student listens online to an audio lesson recorded by a teacher, the noise in the audio should be as low as possible so as to preserve the teaching effect. Because the volume of online teaching audio is huge, the traditional approach of manually analyzing the noise of every lesson entails an enormous workload and yields highly subjective results.
In the prior art, the indexes used to objectively evaluate a noise condition (such as the signal-to-noise ratio and the segmental signal-to-noise ratio) require, when measuring the noise condition of an audio recording, a reference audio whose speech content is identical to and strictly time-aligned with the recording under test. For teaching scenarios and other situations where such a reference audio cannot be obtained, the existing noise evaluation methods cannot be applied, so achieving an objective evaluation of speech noise without a reference audio is an urgent problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a voice noise analysis method and system to overcome the problem in the prior art that an objective evaluation of speech noise is difficult to achieve without a reference audio.
The embodiment of the invention provides a voice noise analysis method, which comprises the following steps:
acquiring voice data to be analyzed;
extracting a noise audio segment only containing noise from the voice data to be analyzed;
determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
Optionally, the extracting a noise audio segment containing only noise from the voice data to be analyzed includes:
dividing the voice data to be analyzed into a plurality of audio segments based on the total duration of the voice data to be analyzed and a preset extraction duration;
converting each audio segment into a magnitude spectrum;
inputting the magnitude spectrum corresponding to each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise;
and screening out the noise audio segments containing only noise from the audio segments based on a preset probability threshold.
Optionally, the determining, based on the noise intensity index of each noise audio segment and the preset noise intensity classification levels, the noise intensity level corresponding to each noise audio segment includes:
calculating the noise intensity index corresponding to each noise audio segment;
acquiring the noise intensity index ranges corresponding to the different noise intensity levels in the preset noise intensity classification levels;
determining the current noise intensity index range corresponding to a current noise audio segment according to the noise intensity index corresponding to the current noise audio segment;
and determining the noise intensity level corresponding to the current noise intensity index range as the noise intensity level of the current noise audio segment.
Optionally, the determining the noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments includes:
obtaining the ratio of each of the different noise intensity levels among the noise audio segments;
and determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and a preset ratio evaluation index.
Optionally, the noise intensity level comprises: high intensity noise level, medium intensity noise level, and low intensity noise level.
Optionally, the determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and the preset ratio evaluation index includes:
acquiring the ratio of the high-intensity noise level;
and determining the noise level evaluation result of the voice data to be analyzed according to the relation between the ratio of the high-intensity noise level and the preset high-intensity-level ratio range in the preset ratio evaluation index.
Optionally, the noise level evaluation result includes: a low noise level, a moderate noise level and a high noise level, wherein,
when the ratio of the high-intensity noise level is smaller than the minimum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the low noise level;
when the ratio of the high-intensity noise level lies within the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the moderate noise level;
and when the ratio of the high-intensity noise level is larger than the maximum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the high noise level.
The embodiment of the present invention further provides a speech noise analysis system, including:
the acquisition module is used for acquiring voice data to be analyzed;
the first processing module is used for extracting noise audio segments containing only noise from the voice data to be analyzed;
the second processing module is used for determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and the third processing module is used for determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
An embodiment of the present invention further provides an electronic device, including a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the voice noise analysis method provided by the embodiment of the present invention.
The embodiment of the invention also provides a computer-readable storage medium, which stores computer instructions, and the computer instructions are used for enabling the computer to execute the voice noise analysis method provided by the embodiment of the invention.
The technical scheme of the invention has the following advantages:
the embodiment of the invention provides a voice noise analysis method and a voice noise analysis system, wherein voice data to be analyzed are obtained; extracting a noise audio segment only containing noise from voice data to be analyzed; determining the noise intensity grade corresponding to each noise audio frequency segment based on the noise intensity index of each noise audio frequency segment and the preset noise intensity classification grade; and determining a noise level evaluation result of the voice data to be analyzed according to the distribution condition of the noise intensity level corresponding to each noise audio clip. Therefore, the noise intensity indexes of the noise audio frequency segments only containing noise are calculated to carry out independent analysis on the noise intensity levels of all the noise audio frequency segments, then the noise level evaluation result of the whole voice data to be analyzed is determined according to the distribution condition of the noise intensity levels of all the noise audio frequency segments, the influence of normal voice in the voice data to be analyzed is avoided, the objective evaluation of the noise level of the voice data to be analyzed is realized, the reference audio frequency is not needed, the application range is wider, and the noise conditions under various scenes can be accurately reflected.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a speech noise analysis method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of the process of inputting the magnitude spectrum of each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise in the embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a speech noise analysis system according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical features mentioned in the different embodiments of the invention described below can be combined with each other as long as they do not conflict with each other.
With the rapid development of the mobile internet, traditional education is gradually being replaced by online education, and more and more teachers now teach students through instant messaging software, which also makes intelligent classroom analysis more convenient. Noise is one factor that affects the quality of students' classroom learning, so it needs to be detected in order to create a quiet environment for students and ensure the learning effect. However, a massive amount of online teaching audio and video is generated every day; manually analyzing the noise condition of every lesson entails a huge workload and yields highly subjective results, so an objective, intelligent analysis of audio noise is particularly necessary.
At present, the indexes used to objectively evaluate a noise condition (such as the signal-to-noise ratio and the segmental signal-to-noise ratio) are severely limited in application: measuring the noise condition of a teaching audio requires a reference audio whose speech content is exactly the same as, and strictly time-aligned with, that of the teaching audio, and such a reference is extremely difficult to obtain in a teaching scenario. How to realize noise evaluation without a reference therefore urgently needs to be solved.
The embodiment of the invention provides a voice noise analysis method, which can be applied to noise analysis of an online teaching platform, and as shown in fig. 1, the voice noise analysis method mainly comprises the following steps:
step S101: and acquiring voice data to be analyzed. Specifically, the speech data to be analyzed is audio data containing noise, such as: the teaching audio recorded on the online teaching platform or corresponding audio data extracted from the teaching video containing voice data and the like. The obtaining method of the voice data to be analyzed may be directly downloading the audio data or extracting from a preset voice database to be analyzed, and the like, and the present invention is not limited thereto.
Step S102: extracting noise audio segments containing only noise from the voice data to be analyzed. Specifically, voice data containing noise includes both normal speech and noise; to avoid the need for a reference audio, i.e., the normal speech, when evaluating the noise, segments containing only noise are extracted, so that the loudness or energy of those segments can be used directly as an intuitive measure of the noise level.
Step S103: determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels. Specifically, the noise energy or loudness can differ considerably between noise audio segments; grading each segment makes the segments easier to compare intuitively, which in turn facilitates evaluating the noise level of the whole voice data to be analyzed.
step S104: and determining a noise level evaluation result of the voice data to be analyzed according to the distribution condition of the noise intensity level corresponding to each noise audio clip. Specifically, since a plurality of noise audio segments are contained in a complete voice data to be analyzed, in order to improve the accuracy of noise evaluation on the whole voice data to be analyzed, the noise level evaluation result is obtained by considering the distribution of the noise intensity levels of all the noise audio segments, and the objective noise evaluation on the voice data to be analyzed is realized.
Through the above steps S101 to S104, the voice noise analysis method provided in the embodiment of the present invention analyzes the noise intensity level of each noise audio segment independently by calculating the noise intensity indexes of the segments that contain only noise, and then determines the noise level evaluation result of the whole voice data to be analyzed from the distribution of the noise intensity levels of all noise audio segments. This avoids the influence of normal speech in the voice data to be analyzed, achieves an objective evaluation of its noise level without requiring any reference audio, has a wide application range, and can accurately reflect the noise conditions in various scenes.
Specifically, in an embodiment, the step S102 specifically includes the following steps:
step S201: and dividing the voice data to be analyzed into a plurality of audio segments based on the total time length of the voice data to be analyzed and a preset extraction time length period. Specifically, the voice data to be analyzed is divided into a plurality of audio clips with equal length and smaller length according to the time axis of the total length, and the preset extraction length period can be flexibly set according to the total length and the accuracy requirement of noise analysis, for example, 1s, 3s, and the like, which is not limited by the invention.
Step S202: converting each audio segment into a magnitude spectrum. Specifically, each audio segment is converted into a magnitude spectrum by framing it, applying the Fourier transform, computing the amplitudes, and normalizing the amplitudes.
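As an illustrative sketch only (not the applicant's code), steps S201 and S202 could be implemented along the following lines in Python with NumPy; the segment duration, frame length, hop size and Hann window are assumed choices rather than values fixed by the embodiment:

import numpy as np

def split_into_segments(samples, sample_rate, segment_seconds=1.0):
    # Step S201: cut the waveform into equal-length segments (1 s assumed here).
    seg_len = int(segment_seconds * sample_rate)
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]

def magnitude_spectrum(segment, frame_len=512, hop=256):
    # Step S202: framing, windowed FFT, magnitude, and amplitude normalization.
    starts = range(0, len(segment) - frame_len + 1, hop)
    frames = np.stack([segment[s:s + frame_len] for s in starts])
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return mags / (mags.max() + 1e-8)  # normalize so the largest amplitude is 1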
Step S203: inputting the magnitude spectrum corresponding to each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise. The preset noise classification model is a classification model built in advance: its input is an audio segment, its output is the predicted probability that the segment contains only noise, and it is trained with a large number of labeled audio segments.
In the embodiment of the invention, as shown in fig. 2, the classification model uses MobileNet-V2 as its backbone network to obtain a number of depth features of the audio, then aggregates the depth features into a dense feature of the audio, and finally feeds the dense feature into a classifier for classification. The backbone MobileNet-V2 replaces conventional convolutions with depthwise separable convolutions and therefore has a higher inference speed; it is widely used in industry and is not described in further detail here. In the feature aggregation stage, a more effective aggregation method, NetVLAD pooling, is adopted. Suppose the backbone network produces the depth features {x_1, x_2, ..., x_T}. The intermediate output of NetVLAD pooling is a K x D matrix V, where K is a predefined number of clusters and D is the dimension of each cluster center, and each row of the matrix V is obtained by the following formula:

V(k, j) = \sum_{t=1}^{T} \frac{e^{w_k^T x_t + b_k}}{\sum_{k'} e^{w_{k'}^T x_t + b_{k'}}} \left( x_t(j) - c_k(j) \right)

where {w_k}, {b_k} and {c_k} are trainable parameters learned together with the classification model. The matrix V is L2-normalized and its rows are concatenated to obtain the NetVLAD-pooled feature, which is then fed into a fully connected layer for binary classification. The whole classification model is trained with a binary cross-entropy loss as the objective.
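For illustration, the NetVLAD pooling stage described by the formula above could be sketched in PyTorch as follows; the class name, parameter initialization and tensor layout are assumptions, not details disclosed by the embodiment:

import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLADPooling(nn.Module):
    # Aggregates T depth features of dimension D into a K x D matrix V, then flattens it.
    def __init__(self, num_clusters, dim):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)                   # soft-assignment weights w_k and biases b_k
        self.centers = nn.Parameter(torch.randn(num_clusters, dim))  # cluster centers c_k

    def forward(self, x):                               # x: (batch, T, D) depth features from the backbone
        soft = F.softmax(self.assign(x), dim=-1)        # (batch, T, K) soft assignment of each feature to each cluster
        residual = x.unsqueeze(2) - self.centers        # (batch, T, K, D) residuals x_t - c_k
        v = (soft.unsqueeze(-1) * residual).sum(dim=1)  # (batch, K, D) weighted sum over the T features
        v = F.normalize(v, p=2, dim=-1)                 # L2 normalization of each row of V
        return v.flatten(1)                             # concatenate rows into a (batch, K*D) aggregated feature

The flattened K*D feature would then feed a fully connected layer trained with the binary cross-entropy loss, as stated above.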
Step S204: screening out the noise audio segments containing only noise from the audio segments based on a preset probability threshold. Specifically, the probability obtained for each audio segment in step S203 is compared with a preset probability threshold; if the probability exceeds the threshold, the segment is considered to contain only noise, otherwise it also contains non-noise content such as human speech. All segments determined to contain only noise, i.e., the noise audio segments, are retained, and the other segments are discarded.
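A minimal sketch of steps S203 and S204, assuming a hypothetical noise_model callable that maps a magnitude spectrum to the probability that the segment contains only noise, and an assumed threshold of 0.5:

def extract_noise_segments(segments, spectra, noise_model, prob_threshold=0.5):
    # Keep only the segments whose predicted noise-only probability exceeds the threshold.
    noise_segments = []
    for segment, spectrum in zip(segments, spectra):
        p_noise = float(noise_model(spectrum))   # probability that this segment contains only noise
        if p_noise > prob_threshold:
            noise_segments.append(segment)       # retained as a noise audio segment
    return noise_segments                        # segments with speech or other non-noise content are discarded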
Specifically, in an embodiment, the step S103 specifically includes the following steps:
step S301: and respectively calculating the noise intensity index corresponding to each noise audio segment. Specifically, the noise intensity index may be an index such as energy or volume of noise, which may reflect a degree of noise, and in the embodiment of the present invention, the noise energy index is adopted, that is, energy of a noise audio segment is calculated. Suppose a noise audio clip is a ═ a1,a2,…,aNExpressing that N is the number of sample points contained in the audio, the energy of the audio is calculated as follows:
Figure BDA0002838532090000091
where energy represents the energy of the audio, t represents the duration of the audio, N is the number of sample points included in the audio, a1,a2,…,aNRepresenting the energy value of each sample point of the audio.
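A sketch of this computation, under the assumption that the energy is the sum of squared sample values divided by the duration t (the squaring and the division by t follow the symbols stated above rather than an explicit formula in the original text):

import numpy as np

def noise_energy(segment, sample_rate):
    # Noise intensity index of a noise audio segment: sum of squared samples over the duration t.
    t = len(segment) / sample_rate            # duration of the segment in seconds
    return float(np.sum(np.square(segment)) / t)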
Step S302: acquiring the noise intensity index ranges corresponding to the different noise intensity levels in the preset noise intensity classification levels. Specifically, the embodiment of the invention takes as an example preset noise intensity classification levels comprising a low-intensity noise level, a medium-intensity noise level and a high-intensity noise level, with the classification based on noise intensity index ranges defined by two energy thresholds T_l and T_h: energy below the low threshold T_l is low-intensity noise, energy between T_l and T_h is medium-intensity noise, and energy above the high threshold T_h is high-intensity noise.
Step S303: determining the current noise intensity index range corresponding to the current noise audio segment according to the noise intensity index corresponding to that segment. Specifically, the noise energy value calculated in step S301 is compared with the two energy thresholds T_l and T_h from step S302 to determine the noise intensity index range to which the current segment belongs.
Step S304: determining the noise intensity level corresponding to the current noise intensity index range as the noise intensity level of the current noise audio segment. Specifically, assume the energy value of the current noise audio segment is A and T_l < A < T_h; the current noise intensity index range to which the segment belongs then corresponds to the medium-intensity noise level, so the noise level of this segment is determined to be the medium-intensity noise level.
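Steps S302 to S304 amount to bucketing each energy value with the two thresholds; a sketch, with T_l and T_h passed in as parameters since their values are not specified here:

def intensity_level(energy, t_low, t_high):
    # Map a noise intensity index to a level using the thresholds T_l < T_h.
    if energy < t_low:
        return "low"        # low-intensity noise level
    elif energy < t_high:
        return "medium"     # medium-intensity noise level
    else:
        return "high"       # high-intensity noise level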
Specifically, in an embodiment, the step S104 specifically includes the following steps:
step S401: and acquiring the ratio of different noise intensity levels in each noise audio clip. Specifically, the proportion of noise audio segments belonging to low-intensity noise levels in all noise audio segments in the whole voice data to be analyzed in the total number of the noise audio segments is calculated, and the proportion corresponding to the noise audio segments with medium-intensity noise levels and the proportion corresponding to the noise audio segments with high-intensity noise levels are calculated.
Step S402: determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and a preset ratio evaluation index. Specifically, the preset ratio evaluation index may be set according to actual needs; for example, the noise intensity level with the largest ratio may be taken as the noise level evaluation result of the voice data to be analyzed, or weights may be assigned to the different noise intensity levels, the weighted ratios compared, and the level with the largest weighted ratio taken as the result, and so on. The invention is not limited in this respect.
In the embodiment of the invention, where high-intensity noise that disturbs students' learning is of particular concern, the sensitivity of the noise level evaluation result to high-intensity noise is improved as follows: the per-segment noise level results are aggregated to obtain the ratio of the high-intensity noise level, and the noise level evaluation result of the voice data to be analyzed is determined from the relation between this ratio and the preset high-intensity-level ratio range in the preset ratio evaluation index. When the ratio of the high-intensity noise level is smaller than the minimum value of the preset range, the evaluation result is a low noise level; when the ratio lies within the preset range, the evaluation result is a moderate noise level; and when the ratio is larger than the maximum value of the preset range, the evaluation result is a high noise level. In practical application, the preset high-intensity-level ratio range can also be defined by two thresholds T_L and T_H: a ratio below T_L indicates that the overall noise level of the voice data to be analyzed is low, a ratio between T_L and T_H indicates that it is moderate, and a ratio above T_H indicates that it is high.
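The evaluation described in this paragraph could be sketched as follows; the threshold values standing in for T_L and T_H are placeholders, not values given by the embodiment:

def evaluate_noise_level(levels, t_low_ratio=0.1, t_high_ratio=0.4):
    # levels: list of per-segment intensity levels ("low" / "medium" / "high").
    if not levels:
        return "low noise level"                        # no noise-only segments were detected
    high_ratio = levels.count("high") / len(levels)     # ratio of the high-intensity noise level
    if high_ratio < t_low_ratio:
        return "low noise level"
    elif high_ratio <= t_high_ratio:
        return "moderate noise level"
    else:
        return "high noise level"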
By performing the above steps, the voice noise analysis method provided by the embodiment of the invention acquires voice data to be analyzed; extracts noise audio segments containing only noise from the voice data; determines the noise intensity level of each noise audio segment based on its noise intensity index and preset noise intensity classification levels; and determines the noise level evaluation result of the voice data according to the distribution of the noise intensity levels of the segments. By computing noise intensity indexes only for segments that contain noise alone and analyzing their intensity levels independently, and then deriving the overall noise level evaluation result from the distribution of those levels, the method avoids the influence of normal speech in the voice data to be analyzed, achieves an objective evaluation of its noise level without requiring any reference audio, has a wide application range, and can accurately reflect the noise conditions in various scenes.
An embodiment of the present invention further provides a speech noise analysis system, as shown in fig. 3, the speech noise analysis system includes:
the obtaining module 101 is configured to obtain voice data to be analyzed. For details, refer to the related description of step S101 in the above method embodiment, and no further description is provided here.
A noise extraction module 102, configured to extract a noise audio segment that only contains noise from the voice data to be analyzed. For details, refer to the related description of step S102 in the above method embodiment, and no further description is provided here.
The noise estimation module 103 is configured to determine a noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and a preset noise intensity classification level. For details, refer to the related description of step S103 in the above method embodiment, and no further description is provided here.
And the noise statistics module 104 is configured to determine a noise level evaluation result of the speech data to be analyzed according to a distribution condition of the noise intensity levels corresponding to the noise audio segments. For details, refer to the related description of step S104 in the above method embodiment, and no further description is provided here.
Through the cooperation of the above components, the voice noise analysis system provided by the embodiment of the invention acquires voice data to be analyzed; extracts noise audio segments containing only noise from the voice data; determines the noise intensity level of each noise audio segment based on its noise intensity index and preset noise intensity classification levels; and determines the noise level evaluation result of the voice data according to the distribution of the noise intensity levels of the segments. As with the method, noise intensity indexes are computed only for segments that contain noise alone and analyzed independently, and the overall evaluation result is derived from the distribution of the per-segment levels, so the influence of normal speech is avoided, the noise level is evaluated objectively without any reference audio, the application range is wide, and the noise conditions in various scenes are accurately reflected.
There is also provided an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device may include a processor 901 and a memory 902, which may be connected by a bus or in another manner; fig. 4 takes connection by a bus as an example.
The processor 901 may be a Central Processing Unit (CPU). The processor 901 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 902, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the method embodiments of the present invention. The processor 901 executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the methods in the above-described method embodiments.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902, which when executed by the processor 901 performs the methods in the above-described method embodiments.
The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, and the program can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for speech noise analysis, comprising:
acquiring voice data to be analyzed;
extracting a noise audio segment only containing noise from the voice data to be analyzed;
determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
2. The method according to claim 1, wherein the extracting noise audio segments containing only noise from the voice data to be analyzed comprises:
dividing the voice data to be analyzed into a plurality of audio segments based on the total duration of the voice data to be analyzed and a preset extraction duration;
converting each audio segment into a magnitude spectrum;
inputting the magnitude spectrum corresponding to each audio segment into a preset noise classification model to obtain the probability that each audio segment contains only noise;
and screening out the noise audio segments containing only noise from the audio segments based on a preset probability threshold.
3. The method of claim 1, wherein the determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels comprises:
calculating the noise intensity index corresponding to each noise audio segment;
acquiring the noise intensity index ranges corresponding to the different noise intensity levels in the preset noise intensity classification levels;
determining the current noise intensity index range corresponding to a current noise audio segment according to the noise intensity index corresponding to the current noise audio segment;
and determining the noise intensity level corresponding to the current noise intensity index range as the noise intensity level of the current noise audio segment.
4. The method according to claim 3, wherein the determining the noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments comprises:
obtaining the ratio of each of the different noise intensity levels among the noise audio segments;
and determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and a preset ratio evaluation index.
5. The method of claim 4, wherein the noise intensity level comprises: a high-intensity noise level, a medium-intensity noise level, and a low-intensity noise level.
6. The method according to claim 5, wherein the determining the noise level evaluation result of the voice data to be analyzed according to the ratios of the different noise intensity levels and the preset ratio evaluation index comprises:
acquiring the ratio of the high-intensity noise level;
and determining the noise level evaluation result of the voice data to be analyzed according to the relation between the ratio of the high-intensity noise level and the preset high-intensity-level ratio range in the preset ratio evaluation index.
7. The method of claim 6, wherein the noise level evaluation result comprises: a low noise level, a moderate noise level and a high noise level, wherein,
when the ratio of the high-intensity noise level is smaller than the minimum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the low noise level;
when the ratio of the high-intensity noise level lies within the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the moderate noise level;
and when the ratio of the high-intensity noise level is larger than the maximum value of the preset high-intensity-level ratio range in the preset ratio evaluation index, the noise level evaluation result is determined to be the high noise level.
8. A speech noise analysis system, comprising:
the acquisition module is used for acquiring voice data to be analyzed;
the noise extraction module is used for extracting noise audio segments containing only noise from the voice data to be analyzed;
the noise estimation module is used for determining the noise intensity level corresponding to each noise audio segment based on the noise intensity index of each noise audio segment and preset noise intensity classification levels;
and the noise statistics module is used for determining a noise level evaluation result of the voice data to be analyzed according to the distribution of the noise intensity levels corresponding to the noise audio segments.
9. An electronic device, comprising:
a memory and a processor communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202011499230.9A (priority date 2020-12-15, filing date 2020-12-15): Voice noise analysis method and system, status pending, published as CN114639390A

Priority Applications (1)

Application Number: CN202011499230.9A; Priority Date: 2020-12-15; Filing Date: 2020-12-15; Title: Voice noise analysis method and system

Publications (1)

Publication Number: CN114639390A; Publication Date: 2022-06-17

Family

ID=81945389

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117497004A (en) * 2024-01-03 2024-02-02 深圳市九天睿芯科技有限公司 Noise level monitoring device and method based on neural network and electronic equipment
CN117497004B (en) * 2024-01-03 2024-04-26 深圳市九天睿芯科技有限公司 Noise level monitoring device and method based on neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination