CN114387991A - Audio data processing method, apparatus, and medium for recognizing field environmental sounds - Google Patents


Info

Publication number
CN114387991A
CN114387991A
Authority
CN
China
Prior art keywords
audio
sound
sample
specified type
audio samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111416357.4A
Other languages
Chinese (zh)
Inventor
杨胜男
蔡富东
吕昌峰
刘焕云
郭国信
边竞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Xinxinda Electric Technology Co ltd
Original Assignee
Jinan Xinxinda Electric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Xinxinda Electric Technology Co ltd filed Critical Jinan Xinxinda Electric Technology Co ltd
Priority to CN202111416357.4A
Publication of CN114387991A
Pending legal-status Critical Current

Classifications

    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Pattern recognition; classification techniques
    • G10L21/0324: Speech enhancement, e.g. noise reduction, by changing the amplitude; details of processing therefor
    • G10L21/0332: Details of processing involving modification of waveforms
    • G10L25/51: Speech or voice analysis specially adapted for comparison or discrimination


Abstract

The application discloses an audio data processing method, device, and medium for recognizing field environmental sounds. The method comprises: acquiring audio samples collected by a sound collection device; recognizing the audio samples through a pre-trained recognition model to obtain a recognition result; selecting, according to the recognition result, audio samples conforming to a first specified type as environmental audio samples; selecting, according to the recognition result, audio samples conforming to a second specified type and performing data enhancement processing on them to obtain background audio samples; and integrating the environmental audio samples and the background audio samples as training samples for a field environmental sound recognition model to be trained. This expands the sample set and greatly enriches the diversity of the training samples. Meanwhile, by comparing the recognition accuracy of the recognition model before and after data enhancement, a more accurate recognition effect can be obtained, ensuring accurate recognition of the environmental audio around power facilities and further reducing their potential safety hazards.

Description

Audio data processing method, apparatus, and medium for recognizing field environmental sounds
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to an audio data processing method, device, and medium for recognizing field environmental sounds.
Background
With the development of power systems, more and more power facilities are being built. However, power facilities such as transmission lines and towers in outdoor scenes often face various safety hazards, such as lightning strikes, icing, and bird damage. If these potential safety hazards are not promptly addressed and effectively managed, the normal operation of the power system can be disrupted, severely affecting social and economic development.
In the prior art, in order to accurately identify such safety hazards, workers often use sound recognition technology to detect them and give early warning. However, power facilities such as transmission lines and towers are usually built in outdoor scenes with complex background sounds, so the collected sound samples are difficult to recognize and recognition accuracy is low.
Disclosure of Invention
In order to solve the above problems, that is, the high recognition difficulty and low recognition accuracy of collected sound samples in field environmental sound recognition, the present application provides an audio data processing method, device, and medium for recognizing field environmental sounds.
In a first aspect, the present application provides an audio data processing method for recognizing field environmental sounds, including: acquiring audio samples collected by a sound collection device; recognizing the audio samples through a pre-trained recognition model to obtain a recognition result, and selecting, according to the recognition result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type includes at least one of: thunderstorm sounds, animal calls; selecting, according to the recognition result, audio samples conforming to a second specified type, wherein the second specified type includes at least one of: silence, wind noise, noisy background sound; performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing includes at least one of: waveform displacement, pitch transformation; and integrating the environmental audio samples and the background audio samples as training samples for the field environmental sound recognition model to be trained.
In one example, recognizing the audio samples through the pre-trained recognition model to obtain a recognition result, and selecting audio samples conforming to the first specified type as environmental audio samples according to the recognition result, specifically includes: inputting the audio samples into the pre-trained recognition model to obtain a recognition result, the recognition result including the sound types corresponding to each audio segment in the audio samples; for any audio segment determined, according to the recognition result, to contain more than three sound types, deleting that segment; and taking the remaining audio samples whose sound type conforms to the first specified type as environmental audio samples, wherein the first specified type includes at least one of: thunderstorm sounds, animal calls.
In one example, after taking the remaining audio samples whose sound type conforms to the first specified type as environmental audio samples according to the recognition result, the method further includes: acquiring the position coordinates and operation time period of the sound collection device, and determining by query whether thunderstorm weather occurred at the position coordinates during the operation time period; acquiring the occurrence period of the thunderstorm weather and the audio samples collected within that period; performing decibel intensity detection on those audio samples to obtain a detection result, and selecting, according to the detection result, samples whose decibel intensity is greater than a first preset threshold as enhancement samples; and adding the enhancement samples to the environmental audio samples.
In one example, before performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, the method further includes: classifying the audio samples conforming to the second specified type by sound type to obtain several groups of classified audio samples; for each group, acquiring the waveform signals of the group's samples, averaging those waveform signals to obtain a processed audio sample, and taking it as the audio template for that group; and replacing the audio samples conforming to the second specified type with the resulting groups of audio templates.
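The per-class averaging described above can be sketched as follows. This is an illustrative numpy formulation, not part of the original disclosure; it assumes all waveforms in a group have equal length, and the function names are hypothetical:

```python
import numpy as np

def make_audio_template(class_waveforms):
    """Average a group of equal-length waveform signals belonging to one
    sound class into a single template waveform for that class."""
    stacked = np.stack(class_waveforms, axis=0)  # shape: (n_clips, n_frames)
    return stacked.mean(axis=0)

def templates_by_class(grouped):
    """grouped: dict mapping a sound-type label to a list of waveforms.
    Returns one audio template per class, replacing the raw samples."""
    return {label: make_audio_template(ws) for label, ws in grouped.items()}
```

With this sketch, the raw second-type samples of each group are collapsed into one representative template per sound type.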
In one example, performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples specifically includes: acquiring the waveform signals of those audio samples, and randomly rolling each waveform signal along the time axis within a preset time-domain range to obtain first processed audio samples; acquiring the pitch values of the first processed audio samples, and raising the pitch values according to a second preset threshold to obtain second processed audio samples; and taking the second processed audio samples as background audio samples.
In one example, after performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, the method further includes: acquiring network audio samples conforming to the second specified type, i.e., audio samples of the same type obtained from a network; and optimizing the background audio samples with the network audio samples in at least one of the following manners: audio splicing, audio superposition.
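The two optimization manners named above, audio splicing and audio superposition, can be sketched as follows; the `mix` weight and the equal-length assumption for superposition are illustrative assumptions, not specified by the original disclosure:

```python
import numpy as np

def splice(a, b):
    """Audio splicing: join two clips end to end along the time axis."""
    return np.concatenate([a, b])

def superpose(a, b, mix=0.5):
    """Audio superposition: mix two equal-length clips sample-by-sample.
    `mix` weights clip b (e.g. a downloaded network sample) against
    clip a (a background sample); 0.5 is an assumed default ratio."""
    assert len(a) == len(b), "superposition assumes equal-length clips"
    return (1.0 - mix) * a + mix * b
```

For example, a collected wind-noise clip could be superposed with a same-type network clip to diversify the background set.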
In one example, after integrating the environmental audio samples and the background audio samples as training samples for the field environmental sound recognition model to be trained, the method further includes: performing supervised training of the model with the training samples to obtain a first field environmental sound recognition model; performing supervised training of the model with the original audio samples to obtain a second field environmental sound recognition model; acquiring accident case information of field power facilities, and determining accident cause information and pre-accident audio information from it; inputting the pre-accident audio information into the first and second field environmental sound recognition models to obtain a first and a second recognition result; and comparing each recognition result with the accident cause information to obtain a comparison result, from which the recognition accuracies of the first and second field environmental sound recognition models are obtained.
In one example, after obtaining the recognition accuracies of the first and second field environmental sound recognition models from the comparison results, the method further includes: determining whether the recognition accuracy of the first field environmental sound recognition model is greater than a preset multiple of the recognition accuracy of the second field environmental sound recognition model; if so, determining that the training effect of the training samples is qualified; if not, re-collecting audio samples through the sound collection device, expanding the training samples with them to obtain expanded training samples, and retraining the first field environmental sound recognition model with the expanded training samples.
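The accuracy comparison and qualification check described in this example can be sketched as below. The preset multiple of 1.1 is an assumed illustrative value; the disclosure does not fix a number:

```python
def accuracy(predictions, truths):
    """Fraction of recognition results matching the accident-cause labels."""
    assert len(predictions) == len(truths)
    hits = sum(p == t for p, t in zip(predictions, truths))
    return hits / len(truths)

def training_effect_qualified(acc_enhanced, acc_baseline, preset_multiple=1.1):
    """Qualified when the data-enhanced model's accuracy exceeds the
    baseline model's accuracy by the preset multiple (1.1 is assumed)."""
    return acc_enhanced > preset_multiple * acc_baseline
```

If the check fails, the flow above re-collects samples and retrains the first model on the expanded training set.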
In another aspect, the present application further provides an audio data processing device for recognizing field environmental sounds, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to: acquire audio samples collected by a sound collection device; recognize the audio samples through a pre-trained recognition model to obtain a recognition result, and select, according to the recognition result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type includes at least one of: thunderstorm sounds, animal calls; select, according to the recognition result, audio samples conforming to a second specified type, wherein the second specified type includes at least one of: silence, wind noise, noisy background sound; perform data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing includes at least one of: waveform displacement, pitch transformation; and integrate the environmental audio samples and the background audio samples as training samples for the field environmental sound recognition model to be trained.
In another aspect, the present application also provides a non-transitory computer storage medium storing computer-executable instructions configured to: acquire audio samples collected by a sound collection device; recognize the audio samples through a pre-trained recognition model to obtain a recognition result, and select, according to the recognition result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type includes at least one of: thunderstorm sounds, animal calls; select, according to the recognition result, audio samples conforming to a second specified type, wherein the second specified type includes at least one of: silence, wind noise, noisy background sound; perform data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing includes at least one of: waveform displacement, pitch transformation; and integrate the environmental audio samples and the background audio samples as training samples for the field environmental sound recognition model to be trained.
The audio data processing method, device, and medium for recognizing field environmental sounds provided by the present application can bring the following beneficial effects: by adopting techniques for extracting field environmental sounds and enhancing data, and by providing an audio fusion method, data processing is performed directly on the collected audio samples, which expands the sample set and greatly enriches the diversity of the training samples. Meanwhile, by comparing the recognition accuracy of the recognition models before and after data enhancement, a more accurate recognition effect can be obtained, ensuring accurate recognition of the environmental audio around power facilities and further reducing their potential safety hazards.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow chart of an audio data processing method for recognizing a field environmental sound according to an embodiment of the present application;
fig. 2 is a schematic diagram of an audio data processing device for recognizing field environmental sounds in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the audio data processing method for recognizing field environmental sounds described in the present application is stored, in program form, in a corresponding system or server according to the flow designed in the present application. In the embodiment of the present application, a system is taken as an example for explanation. The system is deployed in a corresponding terminal, where the terminal includes but is not limited to: a mobile phone, a tablet computer, a computer, or other terminal equipment with corresponding computing power and functions. Through a corresponding programming language, the system can determine the interaction relationship between itself and the hardware devices in the terminal, so as to call resources or transmit information for each hardware device. In addition, the starting mode of the system can be determined through corresponding software settings, including but not limited to direct opening, opening through an APP, or opening by logging in to a corresponding WEB page, so as to meet users' needs for operating, monitoring, or debugging the system, thereby realizing the technical solutions described in the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an audio data processing method for identifying a field environmental sound provided by an embodiment of the present application includes:
s101: and acquiring an audio sample acquired by the sound acquisition equipment.
Specifically, power facilities in outdoor scenes, such as transmission lines and towers, can be equipped with a sound collection device used to collect various audio data near the facilities. In the embodiment of the present application, the specific model of the sound collection device is not limited; any sound collection device in the prior art that can satisfy the technical solution of the present application may be used.
Furthermore, the audio data near the power facility can be collected by starting the recording mode of the sound collection device. In the embodiment of the present application, to facilitate producing the final training sample set, audio samples need to be collected in a uniform format.
By correspondingly configuring the back-end program of the sound collection device, the device is set to collect audio samples at a 44,100 Hz sampling frequency, in single-channel, 16-bit PCM encoding format. With this setting, one audio sample is obtained every ten seconds, that is, each audio sample is ten seconds long.
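The clip format specified above fixes the size of every sample. The following sketch (illustrative only; the function name and the silent placeholder clip are not part of the disclosure) computes the per-clip frame count and raw PCM size implied by 44,100 Hz, mono, 16-bit, ten-second capture:

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz, as specified in the embodiment
CHANNELS = 1          # single channel (mono)
BIT_DEPTH = 16        # 16-bit PCM encoding
CLIP_SECONDS = 10     # one audio sample every ten seconds

def clip_shape_and_size(sr=SAMPLE_RATE, seconds=CLIP_SECONDS,
                        channels=CHANNELS, bit_depth=BIT_DEPTH):
    """Return (frames per clip, raw PCM size in bytes) for one clip."""
    frames = sr * seconds
    size_bytes = frames * channels * (bit_depth // 8)
    return frames, size_bytes

frames, size_bytes = clip_shape_and_size()
# A silent placeholder clip in the same int16 PCM format:
clip = np.zeros(frames, dtype=np.int16)
```

So every ten-second sample holds 441,000 frames, about 862 KiB of raw PCM.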
With this configuration, the system can obtain the audio samples collected by the sound collection device.
S102: the audio samples are identified through a pre-trained identification model to obtain an identification result, and audio samples conforming to a first specified type are selected according to the identification result and serve as environmental audio samples, wherein the first specified type at least comprises one of the following types: thunder and rain, animal cry.
It should be noted that, in the embodiment of the present application, millions of audio samples may be obtained through step S101. However, for the task of recognizing field environmental sounds, the number of audio samples that actually need attention in this mass of data is very small. Manual screening is time-consuming and labor-intensive, and judging sound types purely by ear easily leads to misjudgment, so a data screening scheme needs to be provided.
To solve these problems, the recognition model is introduced: the millions of audio samples can be recognized by the recognition model, so that audio samples of the required types are quickly screened out.
In the embodiment of the present application, the pre-trained recognition model may adopt the open-source PANNs network model, which can be trained on the large-scale audio data set AudioSet. AudioSet comprises 632 audio categories and more than two million ten-second sound clips manually labeled with audio categories, covering a wide range of sound types such as human and animal sounds and everyday environmental sounds.
Specifically, the system inputs the acquired audio samples into the pre-trained recognition model to obtain recognition results, where the recognition results include the sound type corresponding to each audio segment in the audio samples. Because the pre-trained recognition model is trained on the AudioSet data set, it can recognize many sound types within the range of audio categories contained in AudioSet.
Further, according to the recognition result, for any of the audio segments in the audio samples, the system determines whether the segment contains more than three sound types and, if so, deletes it. Because a segment with too many sound types has little reference value, deleting audio with more than three sound types helps obtain higher-precision training samples.
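The screening rule above can be sketched as a simple filter. The `(audio, predicted_types)` pair shape is an assumed representation of the recognition result, not specified by the disclosure:

```python
def filter_segments(segments):
    """Keep only segments whose recognition result lists at most three
    distinct sound types; discard the rest as having no reference value.

    `segments` is a list of (audio, predicted_types) pairs, where
    predicted_types is an iterable of sound-type labels (assumed shape).
    """
    return [(audio, types) for audio, types in segments
            if len(set(types)) <= 3]
```

A segment labeled with, say, wind, rain, birds, and thunder at once would be dropped, while a segment labeled only with wind would be kept.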
Further, according to the recognition result, the system takes the remaining audio samples whose sound type conforms to the first specified type as environmental audio samples, where the first specified type includes at least one of: thunderstorm sounds, animal calls. In the embodiment of the present application, thunderstorms and various animals can easily pose hidden dangers to power facilities, so selecting audio samples of these two sound types is particularly important. It should be further noted that, since birds have a particularly large influence on power facilities and pose significant hidden dangers, audio samples containing various bird calls may be preferentially selected.
Further, after taking the remaining audio samples whose sound type conforms to the first specified type as environmental audio samples according to the recognition result, the system may additionally perform the following technical solution:
the system acquires the position coordinates and the operation time period of the sound collection equipment, and determines whether the position coordinates are in the operation time period and thunderstorm weather exists through inquiry. Specifically, the query mode may be set as: according to the position coordinates of the sound collection equipment, weather information near the position coordinates is inquired, the coverage time period of the weather information is long, and in the embodiment of the application, the weather information of the past fifteen days is collected; and then, selecting weather information in the operation time period according to the operation time period of the sound collection equipment.
From the weather information within the operation time period, the system can determine whether thunderstorm weather occurred, that is, it acquires the occurrence period of the thunderstorm weather and the audio samples collected within that period.
At this point, it can be determined that audio signals corresponding to thunderstorm weather exist in the audio samples within the occurrence period. To ensure the accuracy of the samples, decibel intensity detection is performed on the audio samples within the occurrence period to obtain a detection result, and according to the detection result, samples whose decibel intensity is greater than a first preset threshold are selected as enhancement samples.
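One plausible reading of the decibel intensity detection is an RMS level check per clip. The sketch below uses dB relative to full scale for float samples in [-1, 1]; the dBFS convention and the threshold value are assumptions, since the disclosure does not fix them:

```python
import numpy as np

def rms_decibels(waveform, eps=1e-12):
    """Approximate decibel intensity of a clip as its RMS level in dBFS
    (0 dB = full scale), for float samples in [-1, 1]."""
    rms = np.sqrt(np.mean(np.square(waveform)) + eps)
    return 20.0 * np.log10(rms + eps)

def select_enhancement_samples(clips, threshold_db):
    """Return clips whose decibel intensity exceeds the first preset
    threshold; the threshold value itself is an assumed parameter."""
    return [c for c in clips if rms_decibels(c) > threshold_db]
```

Loud thunder clips then pass the threshold while near-silent clips from the same period are discarded.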
Furthermore, the system adds the enhancement samples to the environmental audio samples; adding them can further improve the accuracy of the environmental audio samples and thereby improve the training effect of the subsequent recognition model.
S103: selecting audio samples according with a second specified type according to the identification result, wherein the second specified type at least comprises one of the following types: silence, wind noise, noisy sound.
It should be noted that when sound waves reach the sound collection device through the air, different propagation paths exist in different scenes or spaces. Owing to the complexity of the scene or space, the sound waves are refracted, diffracted, and reflected, and are finally superposed together when collected by the device. Because the influencing factors of a scene or space are manifold, any change in the relative positions of the sound source, the various obstacles in the scene or space, and the sound collection device changes the collected audio samples and thus the representation of the sound signal, so audio samples collected in different scenes or spaces differ greatly.
In the field environment where power facilities are located, the various sound sources that bring potential safety hazards differ and change greatly, making them very difficult to identify and confirm. Moreover, the field environment is relatively open and sound waves disperse as they propagate, so the audio samples collected by the sound collection device are severely attenuated.
Therefore, to solve the above problems, the present application adds background audio samples, which may include the silence, wind-noise, and noisy-background sound signals that easily interfere with recognition; training the recognition model with these added background audio samples achieves a better recognition effect.
Specifically, the audio samples conforming to the second specified type are selected according to the recognition result obtained in step S102, where the second specified type at least includes one of the following types: silence, wind noise, noisy sound.
S104: performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing comprises at least one of the following modes: waveform displacement and pitch transformation.
Specifically, the system collects the waveform signals of the audio samples conforming to the second specified type and randomly rolls them along the time axis within a preset time-domain range to obtain first-processed audio samples. The random rolling, i.e., the waveform displacement, operates on the waveform signal, a common time-domain representation of how a sound changes over time. Rolling the waveform randomly along the time axis generates a new signal different from the original; for example, within a ten-second range, the waveform of the last five seconds may be rolled to the first second, where it reinforces the signal intensity at that point and thereby enhances the waveform over the whole ten-second range.
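As an illustration, the random rolling described above amounts to a circular shift of the waveform array; a minimal NumPy sketch, in which the function name, sample rate, and shift range are assumptions for the example rather than details fixed by the patent:

```python
import numpy as np

def waveform_shift(signal, max_shift, seed=None):
    """Randomly roll a waveform along its time axis (circular shift).

    Samples rolled past either end wrap around, so the new signal has
    the same length and the same set of sample values as the original.
    """
    rng = np.random.default_rng(seed)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, shift)

# Roll a 10-second tone (16 kHz sample rate assumed) by up to 5 seconds
sr = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(10 * sr) / sr)
augmented = waveform_shift(wave, max_shift=5 * sr, seed=0)
```

Because the shift is circular, no audio content is lost; only its position in time changes.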
Further, the system collects the pitch value of the first-processed audio sample and raises it according to a second preset threshold to obtain a second-processed audio sample. Pitch is one of the three major characteristics of sound, distinct from intensity and timbre; it refers to how high or low a sound is, and every tone has a different pitch. The second processing here, the pitch transformation, raises the pitch value of the original sound without affecting its speed, so the duration of the pitch-transformed audio sample does not change.
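A duration-preserving pitch raise can be sketched crudely as follows; this is only an illustrative assumption (a production system would use a phase vocoder), and the function name and sample rate are invented for the example:

```python
import numpy as np

def raise_pitch(signal, factor):
    """Crudely raise pitch without changing duration (illustrative only).

    Reading the signal `factor` times faster via linear-interpolation
    resampling raises the pitch but shortens the clip; tiling and
    trimming back to the original length keeps the duration fixed,
    as the patent requires of its pitch transformation.
    """
    n = len(signal)
    idx = np.arange(0, n, factor)                  # faster read -> higher pitch
    shifted = np.interp(idx, np.arange(n), signal)
    reps = int(np.ceil(n / len(shifted)))          # restore original duration
    return np.tile(shifted, reps)[:n]

sr = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s at 220 Hz
higher = raise_pitch(tone, 1.5)                       # ~330 Hz content, still 1 s
```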
Further, the second-processed audio sample is taken as a background audio sample. Because the field environment contains many complex factors, the audio samples conforming to the second specified type vary in intensity and have low representativeness. In this embodiment of the application, introducing the first and second processing, i.e., waveform displacement and pitch transformation, effectively strengthens the feature values of the audio samples, making them easier to recognize; used as training samples, they also achieve a better training effect.
In addition, because the audio samples conforming to the second specified type are very numerous, and performing data enhancement on such a large number of samples is difficult, a unique audio template is established for each class of the second specified type before data enhancement, so as to reduce the number of audio samples conforming to the second specified type.
Specifically, before performing data enhancement processing on an audio sample conforming to a second specified type to obtain a background audio sample, the method further includes:
The system classifies the audio samples conforming to the second specified type by sound type, obtaining several groups of classified audio samples. In this embodiment of the application, since the second specified type includes three classes, three groups of classified audio samples are obtained, each comprising a plurality of audio samples.
Further, for each of the groups of classified audio samples, the system obtains the waveform signals of the samples in that group and averages them; the averaged result serves as the audio sample uniquely corresponding to that group, i.e., its audio template. The averaging superimposes the waveform signals, takes the mean at each time point according to the number of signals, and finally generates a single uniquely corresponding audio sample.
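The averaging step above is a point-wise mean over stacked waveforms of equal length; a minimal NumPy sketch (the function name and clip length are assumptions for the example):

```python
import numpy as np

def build_audio_template(samples):
    """Average equal-length waveforms into one audio template.

    The waveform signals are superimposed and the mean is taken at
    every time point, yielding a single representative signal for
    the class.
    """
    stacked = np.stack(samples)      # shape: (num_samples, num_points)
    return stacked.mean(axis=0)      # point-wise mean

# Three toy clips of one class, e.g. "wind noise"
rng = np.random.default_rng(42)
clips = [rng.standard_normal(1000) for _ in range(3)]
template = build_audio_template(clips)
```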
Further, the system obtains the audio templates corresponding to the groups of classified audio samples, namely three audio templates, and replaces the audio samples conforming to the second specified type with these audio templates.
In addition, after the data enhancement processing is performed on the audio sample conforming to the second specified type to obtain the background audio sample, the method further includes:
The system acquires network audio samples conforming to the second specified type, i.e., audio samples of the same type acquired from the network.
Furthermore, the system optimizes the background audio samples through the network audio samples of the second specified type, where the optimization comprises at least one of the following modes: audio splicing and audio superposition.
Specifically, to further improve the representativeness of the background audio samples, they are optimized using the network audio samples.
Audio splicing refers to concatenating the network audio sample with the corresponding background audio sample of the same type along the time-domain axis; this lengthens the background audio sample in the time domain, increases the number of features it contains, and improves the subsequent training effect.
Audio superposition refers to mixing the network audio sample with the corresponding background audio sample of the same type within the time-domain range; this processing raises the waveform signal intensity at time points where the background audio sample has low feature values.
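In time-domain terms, splicing is concatenation and superposition is sample-wise addition; a sketch under the assumption that both signals share a sample rate (function names invented for the example):

```python
import numpy as np

def splice(background, network):
    """Audio splicing: concatenate along the time axis, lengthening
    the background sample and adding new feature content."""
    return np.concatenate([background, network])

def superpose(background, network):
    """Audio superposition: mix the signals over their common length,
    boosting low-energy regions of the background sample."""
    n = min(len(background), len(network))
    return background[:n] + network[:n]

bg = np.zeros(100)    # quiet background clip
net = np.ones(150)    # network-sourced clip of the same type
longer = splice(bg, net)      # 250 points: time-domain length increased
mixed = superpose(bg, net)    # 100 points: low-energy region boosted
```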
Through the above optimization, richer background audio samples are obtained; the number of samples in the sample set increases, as does their diversity, which effectively improves recognition accuracy as well as the training effect of the model.
S105: and integrating the environmental audio sample and the background audio sample to be used as a training sample of the field environmental sound recognition model to be trained.
After the system integrates the environmental audio samples and the background audio samples as training samples of the field environmental sound recognition model to be trained, the method further comprises:
and carrying out supervision training on the field environment sound recognition model to be trained through the training sample to obtain a first field environment sound recognition model.
Supervised training is performed on the field environmental sound recognition model to be trained using the audio samples, obtaining a second field environmental sound recognition model. It should be noted that the audio samples here are those collected by the sound collection device, without any processing.
To verify the recognition effect of the first field environmental sound recognition model, the application introduces a corresponding verification scheme, which specifically comprises:
the system acquires accident case information of the field power facility, and determines accident inducement information and audio information before an accident according to the accident case information.
The pre-accident audio information is input to the first and second field environmental sound recognition models respectively to obtain a first identification result and a second identification result. It should be noted that the accident case information may include a plurality of cases, and the first and second identification results may accordingly each include a plurality of results.
And comparing the first identification result and the second identification result with the accident incentive information respectively to obtain a comparison result, and obtaining the identification accuracy of the first field environment sound identification model and the identification accuracy of the second field environment sound identification model according to the comparison result.
Further, it is determined whether the recognition accuracy of the first field environmental sound recognition model exceeds that of the second field environmental sound recognition model by a preset multiple.
In this embodiment of the application, the preset multiple requires the recognition accuracy of the first field environmental sound recognition model to be 20% higher than that of the second field environmental sound recognition model.
And if so, determining that the training effect corresponding to the training sample is qualified.
If not, audio samples are collected again through the sound collection device and used to expand the training samples, obtaining expanded training samples.
The first field environmental sound recognition model is then retrained with the expanded training samples until its recognition accuracy is 20% higher than that of the second field environmental sound recognition model.
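The qualification test above reduces to a simple margin check; note that reading the patent's "20% higher" as a relative improvement is an assumption of this sketch, as is the function name:

```python
def training_qualified(acc_first, acc_second, margin=0.20):
    """Return True when the first model's accuracy exceeds the second
    model's by the preset margin (20% in this embodiment), interpreted
    here as a relative improvement."""
    return acc_first >= acc_second * (1.0 + margin)

# 0.84 vs 0.70 is exactly a 20% relative improvement, so training passes;
# anything less triggers re-collection and expansion of the training set.
qualified = training_qualified(0.84, 0.70)
```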
In one embodiment, as shown in fig. 2, the present application further provides an audio data processing device for recognizing a field environmental sound, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform instructions for:
acquiring audio samples collected by a sound collection device;
identifying the audio samples through a pre-trained recognition model to obtain an identification result, and selecting, according to the identification result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type comprises at least one of the following: thunder and rain sounds, and animal cries;
selecting audio samples conforming to a second specified type according to the identification result, wherein the second specified type comprises at least one of the following: silence, wind noise, and noisy sound;
performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing comprises at least one of the following modes: waveform displacement and pitch transformation;
and integrating the environmental audio samples and the background audio samples as training samples of the field environmental sound recognition model to be trained.
In one embodiment, the present application further proposes a non-transitory computer storage medium storing computer-executable instructions configured to:
acquiring audio samples collected by a sound collection device;
identifying the audio samples through a pre-trained recognition model to obtain an identification result, and selecting, according to the identification result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type comprises at least one of the following: thunder and rain sounds, and animal cries;
selecting audio samples conforming to a second specified type according to the identification result, wherein the second specified type comprises at least one of the following: silence, wind noise, and noisy sound;
performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing comprises at least one of the following modes: waveform displacement and pitch transformation;
and integrating the environmental audio samples and the background audio samples as training samples of the field environmental sound recognition model to be trained.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device and media embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference may be made to some descriptions of the method embodiments for relevant points.
The device and the medium provided by the embodiment of the application correspond to the method one to one, so the device and the medium also have the similar beneficial technical effects as the corresponding method, and the beneficial technical effects of the method are explained in detail above, so the beneficial technical effects of the device and the medium are not repeated herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An audio data processing method for recognizing a field environmental sound, comprising:
acquiring audio samples collected by a sound collection device;
identifying the audio samples through a pre-trained recognition model to obtain an identification result, and selecting, according to the identification result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type comprises at least one of the following: thunder and rain sounds, and animal cries;
selecting audio samples conforming to a second specified type according to the identification result, wherein the second specified type comprises at least one of the following: silence, wind noise, and noisy sound;
performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing comprises at least one of the following modes: waveform displacement and pitch transformation;
and integrating the environmental audio samples and the background audio samples as training samples of the field environmental sound recognition model to be trained.
2. The audio data processing method for recognizing field environmental sounds according to claim 1, wherein identifying the audio samples through a pre-trained recognition model to obtain an identification result, and selecting audio samples conforming to a first specified type as environmental audio samples according to the identification result, specifically comprises:
inputting the audio samples into the pre-trained recognition model to obtain an identification result, wherein the identification result comprises the sound type corresponding to each audio segment in the audio samples;
according to the identification result, for any audio segment in the audio samples that is determined to contain more than three sound types, deleting that segment;
according to the identification result, taking, from the remaining audio samples, those whose corresponding sound type conforms to the first specified type as environmental audio samples, wherein the first specified type comprises at least one of the following: thunder and rain sounds, and animal cries.
3. The audio data processing method for recognizing field environmental sounds according to claim 2, wherein after the remaining audio samples whose corresponding sound type conforms to the first specified type are taken as environmental audio samples according to the identification result, the method further comprises:
acquiring the position coordinates and the operation time period of the sound collection device, and determining by query whether thunderstorm weather occurred at the position coordinates during the operation time period;
acquiring the occurrence time period of the thunderstorm weather, and acquiring an audio sample in the occurrence time period;
carrying out decibel intensity detection on the audio samples in the occurrence time period to obtain a detection result, and selecting the samples with the decibel intensities larger than a first preset threshold value as enhancement samples according to the detection result;
adding the enhancement sample to the environmental audio sample.
4. The audio data processing method for recognizing field environmental sounds according to claim 1, wherein before performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, the method further comprises:
classifying the audio samples conforming to the second specified type by sound type to obtain several groups of classified audio samples;
for each of the groups of classified audio samples, obtaining the waveform signals corresponding to the samples in the group, averaging these waveform signals to obtain a processed audio sample, and taking the processed audio sample as the audio template corresponding to the group;
and obtaining a plurality of groups of audio templates corresponding to the plurality of groups of classified audio samples respectively, and replacing the audio samples conforming to the second specified type with the plurality of groups of audio templates.
5. The audio data processing method for recognizing field environmental sounds according to claim 1, wherein the data enhancement processing is performed on the audio samples conforming to the second specified type to obtain background audio samples, and specifically comprises:
acquiring the waveform signals of the audio samples conforming to the second specified type, and randomly rolling the waveform signals along the corresponding time axis within a preset time-domain range to obtain first-processed audio samples;
collecting the pitch value of each first-processed audio sample, and raising the pitch value according to a second preset threshold to obtain a second-processed audio sample;
and taking the second-processed audio sample as a background audio sample.
6. The audio data processing method for recognizing field environmental sounds according to claim 1, wherein after performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, the method further comprises:
acquiring network audio samples conforming to the second specified type, wherein the network audio samples are audio samples of the same type acquired from the network;
optimizing the background audio sample by the second specified type of network audio sample in a manner that includes at least one of: audio splicing and audio superposition.
7. The audio data processing method for recognizing field environmental sounds according to claim 1, wherein after the environmental audio samples and the background audio samples are integrated as training samples of the field environmental sound recognition model to be trained, the method further comprises:
performing supervised training on the field environmental sound recognition model to be trained with the training samples to obtain a first field environmental sound recognition model;
performing supervised training on the field environmental sound recognition model to be trained with the audio samples to obtain a second field environmental sound recognition model;
acquiring accident case information of a field power facility, and determining accident inducement information and audio information before an accident according to the accident case information;
inputting the audio information before the accident to the first field environment sound identification model and the second field environment sound identification model respectively to obtain a first identification result and a second identification result;
and comparing the first identification result and the second identification result with the accident incentive information respectively to obtain a comparison result, and obtaining the identification accuracy of the first field environment sound identification model and the identification accuracy of the second field environment sound identification model according to the comparison result.
8. The audio data processing method for identifying a field environmental sound according to claim 7, wherein the first identification result and the second identification result are compared with the accident inducement information respectively to obtain comparison results, and after the identification accuracy of the first field environmental sound identification model and the identification accuracy of the second field environmental sound identification model are obtained according to the comparison results, the method further comprises:
determining whether the recognition accuracy of the first field environmental sound recognition model exceeds that of the second field environmental sound recognition model by a preset multiple;
if so, determining that the training effect corresponding to the training sample is qualified;
if not, acquiring the audio sample again through the sound acquisition equipment, and expanding the training sample through the audio sample to obtain an expanded training sample;
and retraining the first field environment sound recognition model through the expanded training sample.
9. An audio data processing apparatus for recognizing field environmental sounds, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform instructions for:
acquiring audio samples collected by a sound collection device;
identifying the audio samples through a pre-trained recognition model to obtain an identification result, and selecting, according to the identification result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type comprises at least one of the following: thunder and rain sounds, and animal cries;
selecting audio samples conforming to a second specified type according to the identification result, wherein the second specified type comprises at least one of the following: silence, wind noise, and noisy sound;
performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing comprises at least one of the following modes: waveform displacement and pitch transformation;
and integrating the environmental audio samples and the background audio samples as training samples of the field environmental sound recognition model to be trained.
10. A non-transitory computer storage medium storing computer-executable instructions, the computer-executable instructions configured to:
acquiring audio samples collected by a sound collection device;
identifying the audio samples through a pre-trained recognition model to obtain an identification result, and selecting, according to the identification result, audio samples conforming to a first specified type as environmental audio samples, wherein the first specified type comprises at least one of the following: thunder and rain sounds, and animal cries;
selecting audio samples conforming to a second specified type according to the identification result, wherein the second specified type comprises at least one of the following: silence, wind noise, and noisy sound;
performing data enhancement processing on the audio samples conforming to the second specified type to obtain background audio samples, wherein the data enhancement processing comprises at least one of the following modes: waveform displacement and pitch transformation;
and integrating the environmental audio samples and the background audio samples as training samples of the field environmental sound recognition model to be trained.
CN202111416357.4A 2021-11-25 2021-11-25 Audio data processing method, apparatus, and medium for recognizing field environmental sounds Pending CN114387991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111416357.4A CN114387991A (en) 2021-11-25 2021-11-25 Audio data processing method, apparatus, and medium for recognizing field environmental sounds

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111416357.4A CN114387991A (en) 2021-11-25 2021-11-25 Audio data processing method, apparatus, and medium for recognizing field environmental sounds

Publications (1)

Publication Number Publication Date
CN114387991A true CN114387991A (en) 2022-04-22

Family

ID=81196124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111416357.4A Pending CN114387991A (en) 2021-11-25 2021-11-25 Audio data processing method, apparatus, and medium for recognizing field environmental sounds

Country Status (1)

Country Link
CN (1) CN114387991A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275519A (en) * 2023-11-22 2023-12-22 珠海高凌信息科技股份有限公司 Voice type identification correction method, system, device and medium
CN117275519B (en) * 2023-11-22 2024-02-13 珠海高凌信息科技股份有限公司 Voice type identification correction method, system, device and medium

CN111724769A (en) Production method of intelligent household voice recognition model
CN116884435A (en) Voice event detection method and device based on audio prompt learning
CN116168727A (en) Transformer abnormal sound detection method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination