CN114974229A - Method and system for extracting abnormal behaviors based on audio data of power field operation - Google Patents
- Publication number: CN114974229A
- Application number: CN202210576129.1A
- Authority
- CN
- China
- Prior art keywords
- voice
- data
- noise
- voice data
- power field
- Prior art date
- Legal status: Pending (assumed; not a legal conclusion)
Classifications
- G10L15/16: Speech classification or search using artificial neural networks
- G10L15/02: Feature extraction for speech recognition; selection of recognition unit
- G10L21/0208: Speech enhancement; noise filtering
- G10L25/24: Speech or voice analysis in which the extracted parameters are the cepstrum
- G10L25/30: Speech or voice analysis using neural networks
- Y04S10/50: Systems or methods supporting power network operation or management, involving interaction with load-side end-user applications
Abstract
The invention provides a method and a system for extracting abnormal behaviors based on audio data of power field operations. The extraction method comprises: collecting a data set of historical voice data generated by various operation behaviors during power field operation; determining the set of keywords that appears when the voice content generated by each operation behavior meets the standard requirements, and labeling the collected data set with this keyword set; training an acoustic recognition model on the labeled data set; recognizing the voice data captured at the job site with the trained model; and performing keyword detection on the recognized voice data. If the detected keywords do not match the standard keyword set, the behavior corresponding to the current voice data is judged to be abnormal. By training a speech recognition model on the voice keywords of standard behaviors and detecting those keywords, the method recognizes abnormal behaviors.
Description
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to a method and a system for extracting abnormal behaviors based on audio data of power field operation.
Background
As the power grid expands, power operation activities grow more frequent, and the traditional mode of manual on-site supervision with after-the-fact review can no longer meet the lean, modern management requirements of power supply enterprises under the new situation. Power enterprises urgently need a visual, intelligent management and control platform for power operation sites that enables more efficient and intelligent collaborative supervision.
Human error is one of the main causes of power system accidents, and reducing human error in grid operation as far as possible is a development requirement of the national smart grid. Advances in artificial intelligence and Internet-of-Things technology provide a technical basis for automated, intelligent monitoring of power operation sites. Data collection at job sites has become diversified, yet the power industry is still exploring how to use these data more efficiently for automatic monitoring. Most current research on intelligent monitoring focuses on analyzing and processing video data and makes little use of voice signals. How to use voice signals to identify abnormal behaviors in power field operations, such as a work ticket not being read out as required, has therefore become an urgent problem to be solved.
Disclosure of Invention
In view of the above, the present invention aims to solve the problem that current power field operations lack voice-signal-based recognition of abnormal behavior in the operation process.
In order to solve the technical problems, the invention provides the following technical scheme:
in a first aspect, the present invention provides a method for extracting abnormal behavior based on audio data of power field operation, including the following steps:
collecting a data set of historical voice data generated by various operation behaviors during the power field operation process;
determining a keyword set which appears when voice content generated by various operation behaviors meets the standard requirement, and labeling the acquired data set based on the keyword set;
training an acoustic recognition model by using the labeled data set;
recognizing voice data transmitted in the operation field based on the trained acoustic recognition model;
and detecting keywords of the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, determining that the behavior corresponding to the current voice data is abnormal behavior.
Further, the data set comprises a voice data set and a noise data set, and when the operation behavior is to read the work ticket text, the data acquisition is specifically carried out according to historical voice data generated by reading the work ticket text in the power field operation process:
acquiring a filled-in work ticket text from a work ticket historical database as voice content;
constructing a speech sample set from recordings of a plurality of speakers reading the content aloud according to the standard requirements, wherein the sample set also comprises recordings of reading non-work-ticket text;
the noise data set includes noise selected from a preset noise data set, the noise including cafe noise, white noise, restaurant noise, and factory floor noise.
Further, when the job behavior is to read the work ticket text, the keyword set specifically includes:
work place, work task, planned work time, work conditions, notes, safety measures, barrier, signboard, and ground wire.
Further, before recognizing the speech data transmitted by the job site based on the trained acoustic recognition model, the method further includes:
and denoising the voice data transmitted in the operation field by using an adaptive filter.
Further, the acoustic recognition model adopts a CNN neural network model, and recognizing the voice data transmitted in the operation field based on the trained acoustic recognition model specifically includes:
extracting MFCC features from the voice data transmitted from the job site, wherein the MFCC features comprise the MFCC coefficients and their first-order and second-order differences;
MFCC features of the speech data are input into the CNN neural network model to identify keywords.
In a second aspect, the present invention provides a system for extracting abnormal behavior based on audio data of power field operation, including:
the data acquisition unit is used for collecting a data set of historical voice data generated by various operation behaviors during the power field operation process;
the data processing unit is used for determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement and labeling the acquired data set based on the keyword set;
the voice recognition model training unit is used for training the acoustic recognition model by using the labeled data set;
the voice recognition unit is used for recognizing voice data transmitted in the operation field based on the trained acoustic recognition model;
and the abnormal behavior judging unit is used for carrying out keyword detection on the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, the behavior corresponding to the current voice data is considered to be abnormal behavior.
Further, the data acquisition unit includes: a voice data acquisition unit and a noise data acquisition unit;
when the operation behavior is reading the work ticket text, the voice data acquisition unit is specifically used for acquiring filled-in work ticket texts from the work ticket history database as voice content, and for constructing a speech sample set from recordings of a plurality of speakers reading the content aloud according to the standard requirements, wherein the sample set also comprises recordings of reading non-work-ticket text;
the noise data collection unit is used for selecting noise from a preset noise data set, wherein the noise comprises cafe noise, white noise, restaurant noise and factory floor noise.
Further, when the job behavior is to read the work ticket text, the keyword set specifically includes:
work place, work task, planned work time, work conditions, notes, safety measures, barrier, signboard, and ground wire.
Further, still include: a voice denoising unit;
the voice denoising unit is used for denoising the voice data transmitted in the operation field by using the self-adaptive filter.
Further, the acoustic recognition model adopts a CNN neural network model, and the speech recognition unit specifically includes: the device comprises a feature extraction unit and an acoustic recognition unit;
the feature extraction unit is used for extracting MFCC features from the voice data transmitted from the job site, wherein the MFCC features comprise the MFCC coefficients and their first-order and second-order differences;
the acoustic recognition unit is used for inputting MFCC characteristics of the voice data into the CNN neural network model to recognize keywords.
In conclusion, the invention provides a method and a system for extracting abnormal behaviors based on audio data of power field operations. The extraction method comprises: collecting a data set of historical voice data generated by various operation behaviors during power field operation; determining the set of keywords that appears when the voice content generated by each operation behavior meets the standard requirements, and labeling the collected data set with this keyword set; training an acoustic recognition model on the labeled data set; recognizing the voice data captured at the job site with the trained model; and performing keyword detection on the recognized voice data, judging the behavior corresponding to the current voice data to be abnormal if the detected keywords do not match the standard keyword set. The invention trains the voice recognition model on the voice keywords of the various standard behaviors in the power field operation process and applies voice recognition to detect whether the on-site voice meets the standard requirements, thereby recognizing abnormal behaviors.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a method for extracting abnormal behavior based on audio data of power field operation according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a CNN network according to an embodiment of the present invention;
fig. 3 is a flowchart of voice keyword detection according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In view of the problems described in the Background, the embodiments of the invention provide a method and a system for extracting abnormal behaviors based on audio data of power field operation.
An embodiment of the method for extracting abnormal behavior based on audio data of power field operation according to the present invention is described in detail below.
Referring to fig. 1, the present embodiment provides a method for extracting abnormal behavior based on audio data of power field operation, including:
s100: and acquiring data according to historical voice data generated by various operation behaviors in the power field operation process.
It should be noted that the data set of this embodiment comprises a voice data set and a noise data set. Taking the THCHS-30 and NOISEX-92 corpora as examples, cafe noise and white noise may be selected from the THCHS-30 data set, and representative restaurant noise and factory floor noise from the NOISEX-92 data set. The voice data set is self-constructed.
According to the "two tickets" regulations for power operations, before any power operation the person in charge of the operation must read the work ticket aloud to all operators, clearly and specifically stating the work place, work task, work time, safety measures and notes. In practice, the historical voice data of work-ticket reading is collected by taking filled-in work ticket texts from the power company's work ticket history database as the voice content. Thirty speakers were selected, yielding 8200 utterances in total at a sampling rate of 8 kHz; 90% of the voice samples are read exactly according to the work ticket text, and the remaining 10% are read from non-work-ticket text. The size of the voice data set may be chosen according to actual requirements; the figures here are illustrative rather than limiting.
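Noisy training material can then be produced by mixing the clean readings with the selected noise clips at a chosen signal-to-noise ratio. A minimal numpy sketch; the SNR value and the mixing scheme are illustrative assumptions, not specified by the patent:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add."""
    if len(noise) < len(speech):
        # loop the noise clip when it is shorter than the utterance
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Each clean utterance could be mixed with the cafe, white, restaurant and factory-floor noise clips in turn to enlarge the training set.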
S200: and determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement, and labeling the acquired data set based on the keyword set.
After the voice data set is collected, all recordings are labeled to form a text data set. By analyzing the text data of electric work tickets, the following keywords were selected as recognition content: "work place", "work task", "planned work time", "work conditions", "notes", "safety measures", "barrier", "signboard" and "ground wire". The speech training data for these 9 keywords are labeled at the phoneme level with a flat-start method, and the generated phoneme labels are stored in a phones0.mlf file. For other power-field operation behaviors, the keyword set can likewise be determined from the voice or text data generated under standard-compliant behavior, i.e., the vocabulary that appears when the behavior meets the standard requirements.
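Utterance-level labeling against the nine keywords can be sketched as follows; the English keyword strings stand in for the original Chinese terms, and the transcript handling is illustrative:

```python
KEYWORDS = ["work place", "work task", "planned work time", "work conditions",
            "notes", "safety measures", "barrier", "signboard", "ground wire"]

def label_transcript(transcript, keywords=KEYWORDS):
    """Return the list of keywords present in one utterance transcript."""
    return [kw for kw in keywords if kw in transcript]
```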
S300: and training the acoustic identification model by using the labeled data set.
The acoustic recognition model of this embodiment may adopt a CNN, with the feature matrix extracted by MFCC as its input; the recognition model is obtained through training. The network comprises five convolutional layers followed by three fully-connected layers: the hierarchical convolutional stack effectively extracts useful features from the input, and the fully-connected layers map these features to the category of the voice signal. The network structure is shown in fig. 2. Each convolutional layer comprises a convolution operation, batch normalization and an activation function; the first, second and fifth convolutional layers additionally apply max pooling, which down-samples the convolved feature maps and keeps the numerical values stable.
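The layer arrangement can be sanity-checked by tracing feature-map sizes through the stack. The patent specifies only the layer counts and which layers pool; the kernel size, padding and input size below are illustrative assumptions:

```python
def conv_shape(h, w, k=3, stride=1, pad=1):
    """Output size of a square-kernel convolution."""
    return ((h + 2 * pad - k) // stride + 1, (w + 2 * pad - k) // stride + 1)

def pool_shape(h, w, k=2, stride=2):
    """Output size of max pooling."""
    return ((h - k) // stride + 1, (w - k) // stride + 1)

def trace_cnn(h, w):
    """Five conv layers; max pooling after conv1, conv2 and conv5, per the text."""
    shapes = []
    for layer in range(1, 6):
        h, w = conv_shape(h, w)          # 'same' convolution keeps the size
        if layer in (1, 2, 5):
            h, w = pool_shape(h, w)      # down-sample the feature map
        shapes.append((h, w))
    return shapes
```

Tracing shapes this way catches mismatches between the convolutional output and the first fully-connected layer before any training is run.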
Before training the CNN on the labeled text data set, MFCC acoustic features must be extracted. The waveform files of the keyword speech are converted into feature-vector sequence files through MFCC feature extraction: the HCopy command (an HTK tool) parameterizes the voice data according to the corresponding coding configuration file config and automatically generates .mfc files of MFCC vectors. MFCC feature extraction is detailed in the subsequent steps.
The CNN is then trained to classify keywords from the MFCC data: the five convolutional layers extract useful features from the input, and the three fully-connected layers turn them into a classification of whether the voice signal contains any of the 9 keywords.
S400: and recognizing the voice data transmitted in the operation field based on the trained acoustic recognition model.
At a power operation site, the periodic operation of electrical equipment generates high-energy colored noise that strongly affects voice signals. Conventional feature parameters are easily distorted in such a noise environment, so the accuracy of the trained model suffers. Since the target application is real-time voice monitoring and analysis of a power operation site, with high requirements on noise robustness and real-time performance, this embodiment denoises the voice data transmitted from the job site with adaptive filtering before voice recognition.
The adaptive filter is specifically a least-mean-square (LMS) adaptive filter. The input to the system is the noise-contaminated speech signal x(k). The filter takes the noise as the desired signal, denoted d(k); the error signal e(k) between the adaptive filter output y(k) and d(k) is the estimate of the desired target signal.
Specifically, the LMS filter is computed as follows:
1) Initialize W(0) = 0 and choose a step size 0 < μ < 1/λ_max, where λ_max is the largest eigenvalue of the autocorrelation matrix R_xx of the input signal;
2) Compute the output y(k) = W^T(k)X(k) and the estimation error e(k) = d(k) - y(k);
3) Update the filter coefficients for the next instant: W(k+1) = W(k) + μ·e(k)·X(k). The LMS algorithm is sensitive to the choice of step size: too large a step causes tracking failure, while too small a step slows model convergence, lengthening the denoising time and so degrading the real-time performance of the overall voice recognition. The invention therefore improves the weight-update formula of the LMS filter so that the model converges faster; the normalized update takes the form W(k+1) = W(k) + μ·e(k)·X(k)/(α + X^T(k)X(k)), where μ is the adjustment step size and α is a constant that prevents the denominator from being zero.
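The LMS recursion described above can be sketched in numpy. The filter order, step size and the pairing of the two input signals are illustrative assumptions; the weight update uses a normalized step with a small α guarding the denominator:

```python
import numpy as np

def lms_filter(x, d, order=8, mu=0.5, alpha=1e-6):
    """Adaptive LMS sketch: x is the filter input, d the desired signal.
    Returns the error signal e (the estimate of the target signal) and the
    final weight vector W."""
    w = np.zeros(order)
    e = np.zeros(len(x))
    for k in range(order - 1, len(x)):
        xk = x[k - order + 1:k + 1][::-1]   # newest sample first
        y = w @ xk                          # filter output y(k) = W^T X(k)
        e[k] = d[k] - y                     # e(k) = d(k) - y(k)
        # normalized weight update: stable convergence over varying input power
        w = w + mu * e[k] * xk / (alpha + xk @ xk)
    return e, w
```

In the noise-cancellation setting described in the text, the error signal e(k) serves as the denoised speech estimate.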
Voice recognition of the denoised speech with the convolutional neural network comprises two parts: feature extraction and the acoustic recognition model. The feature extraction part processes the voice signal into features useful for training the neural network; the acoustic recognition part feeds the extracted features into the CNN, which is trained to recognize the keywords correctly.
The feature extraction part adopts an MFCC feature extraction algorithm, and comprises the following specific steps:
1) Format conversion: voice files not in WAV format are converted into 16 kHz, 16-bit, single-channel WAV files;
2) Pre-emphasis: the voice signal is pre-emphasized with the high-pass filter H(z) = 1 - μz^(-1), with μ = 0.96; this boosts the high-frequency part, removes the spectral tilt, and compensates the high frequencies of the speech signal suppressed by the articulation system;
3) Framing: the speech signal is divided into frames with frame length N = 400 sampling points, i.e. 25 ms per frame. To achieve smooth transitions between speech frames and guarantee temporal continuity, an overlap of 10 ms is set between adjacent frames;
4) Windowing: each frame is multiplied by a Hamming window, which increases the continuity at both ends of the frame;
5) FFT: a fast Fourier transform of each windowed frame yields its spectrum;
6) Filtering: conventional methods use a triangular (mel) filter bank; to simplify the computation and ease implementation, the invention adopts a matrix filter bank;
7) The logarithmic energy of each filter-bank output is computed;
8) The MFCC coefficients are obtained via the discrete cosine transform (DCT);
9) Dynamic difference parameters are extracted: the standard cepstral MFCC parameters reflect only the static characteristics of the speech; its dynamic characteristics can be described by the difference spectrum of these static features, so first-order and second-order differences are extracted.
The MFCC coefficients of the voice signal, together with their first-order and second-order differences, form the feature matrix that is input to the CNN neural network model.
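Steps 2) to 9) can be sketched with numpy. A standard triangular mel filter bank is used here in place of the patent's simplified matrix filter bank, whose exact construction is not given; frame sizes follow the text (400-sample / 25 ms frames with 10 ms overlap, i.e. a 15 ms hop at 16 kHz):

```python
import numpy as np

def mfcc(signal, sr=16000, frame_len=400, hop=240, n_fft=512,
         n_mels=26, n_ceps=13, preemph=0.96):
    """MFCC sketch: pre-emphasis, framing, Hamming window, FFT,
    mel filter bank, log energy, DCT, and dynamic difference features."""
    # 2) pre-emphasis: H(z) = 1 - 0.96 z^(-1)
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 3) framing: 25 ms frames, 10 ms overlap -> 15 ms hop
    n_frames = 1 + max(0, (len(sig) - frame_len) // hop)
    frames = np.stack([sig[i*hop:i*hop + frame_len] for i in range(n_frames)])
    # 4) Hamming window
    frames = frames * np.hamming(frame_len)
    # 5) power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 6) triangular mel filter bank (assumed in place of the matrix bank)
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c): fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r): fbank[m - 1, k] = (r - k) / max(r - c, 1)
    # 7) log filter-bank energies
    feat = np.log(spec @ fbank.T + 1e-10)
    # 8) DCT-II -> cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    ceps = feat @ dct.T
    # 9) first- and second-order differences as dynamic features
    d1 = np.diff(ceps, axis=0, prepend=ceps[:1])
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return np.concatenate([ceps, d1, d2], axis=1)
```

Each frame yields 13 cepstral coefficients plus their deltas and delta-deltas, giving a 39-dimensional feature vector per frame.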
S500: and detecting keywords of the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, determining that the behavior corresponding to the current voice data is abnormal behavior.
It should be noted that keyword detection may employ keyword recognition based on a lattice confusion network. The overall flow is: the voice signal to be detected is denoised and ADC-processed; MFCC acoustic features are extracted; the features are decoded with the CNN acoustic model using speaker-adaptive decoding; an index network is built from the decoding results; keywords are searched in the index network; and the search finally yields the detection result. The overall detection flow is shown in fig. 3.
The index is established by combining the extracted acoustic features with a CNN acoustic model, decoding the extracted voice features, finding a state sequence matched with the observation sequence and constructing a lattice confusion network. Calculating the posterior probability of each word according to the likelihood score information in lattice, then accumulating the posterior probabilities of the same words appearing in the same time period as the final score, and finally summarizing by using L independent linked lists according to the sequence of the posterior probabilities from large to small to construct an index.
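The posterior accumulation and descending-order indexing described above can be illustrated with a toy lattice. The arc format, the words, and the scores below are invented for illustration and are not from the source; real lattices also carry path topology that this sketch omits.

```python
import math
from collections import defaultdict

# Toy lattice arcs: (word, start_time, end_time, log-likelihood score).
arcs = [
    ("safety",  0.0, 0.4, -2.0),
    ("safety",  0.0, 0.4, -3.0),   # same word, same period, different path
    ("measure", 0.4, 0.9, -1.5),
    ("sites",   0.0, 0.4, -4.0),
]

def build_index(arcs):
    # 1) Posterior of each arc: normalise the likelihoods over the arcs
    #    that compete within the same time interval.
    by_span = defaultdict(list)
    for a in arcs:
        by_span[(a[1], a[2])].append(a)
    scores = defaultdict(float)
    for group in by_span.values():
        z = sum(math.exp(a[3]) for a in group)
        for word, s, e, ll in group:
            # 2) Accumulate posteriors of the same word in the same period.
            scores[(word, s, e)] += math.exp(ll) / z
    # 3) One posting list per word, sorted by posterior, high to low
    #    (standing in for the L sorted linked lists of the text).
    index = defaultdict(list)
    for (word, s, e), p in scores.items():
        index[word].append((p, s, e))
    for word in index:
        index[word].sort(reverse=True)
    return dict(index)

idx = build_index(arcs)
```

The two competing "safety" arcs merge into one entry whose accumulated posterior dominates its time span, which is exactly the per-word final score the text describes.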
Wherein, the keyword search is implemented with a token-passing algorithm. A single keyword is searched for directly in the index; for a multi-word keyword, each word is searched for individually, and the hits are then filtered by correct word order and short time intervals between consecutive words. The lattice contains the pinyin of each character; to improve recall, the index can be built directly on pinyin, the search is then performed with the pinyin of the words in the keyword list, and the word identity is taken into account only when the confidence score is computed.
Wherein, the confidence evaluation is as follows: the search result is scored by its posterior probability in the lattice, and the result is output when the score exceeds a threshold.
Finally, the output keyword detection result is analyzed: if it meets the specification requirements, the corresponding behavior is judged to be normal; otherwise, the corresponding behavior is judged to be abnormal.
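The multi-word search with order and time-gap filtering, followed by the confidence threshold, can be sketched as follows. The index contents, the `max_gap` value, and the threshold are hypothetical, and the sketch searches the index directly rather than implementing token passing itself.

```python
# Hypothetical single-word index: word -> [(posterior, start, end)], sorted
# in descending order of posterior, as produced by the lattice indexing step.
index = {
    "work":   [(0.95, 1.00, 1.30), (0.40, 5.00, 5.20)],
    "ticket": [(0.90, 1.35, 1.70), (0.30, 8.00, 8.30)],
}

def search_phrase(index, words, max_gap=0.3):
    """Multi-word keyword search: look up each word individually, then keep
    only hits where the words occur in order with short gaps between them."""
    hits = []
    for p0, s0, e0 in index.get(words[0], []):
        path = [(p0, s0, e0)]
        prev_end = e0
        for w in words[1:]:
            cand = [h for h in index.get(w, [])
                    if 0.0 <= h[1] - prev_end <= max_gap]  # ordered, short gap
            if not cand:
                path = None
                break
            best = max(cand)            # highest posterior among admissible hits
            path.append(best)
            prev_end = best[2]
        if path is not None:
            score = min(p for p, _, _ in path)   # conservative phrase confidence
            hits.append((score, path[0][1], path[-1][2]))
    return sorted(hits, reverse=True)

def confident_hits(hits, threshold=0.5):
    # Confidence evaluation: output only results whose score exceeds the threshold.
    return [h for h in hits if h[0] > threshold]

hits = confident_hits(search_phrase(index, ["work", "ticket"]))
```

Here the occurrence of "work" at 5.00 s is discarded because no "ticket" follows it within the allowed gap, leaving one confident phrase hit spanning 1.00–1.70 s.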
This embodiment provides a method for extracting abnormal behaviors from audio data of power field operation, applied to the intelligent monitoring of a power operation site. According to power operation regulations, before any power operation the person in charge of the operation must read out its work ticket. Previously, compliance with work-ticket reading could only be checked by supervisors present on site; because this is inefficient, it was done only by sampling inspection, and broad, full-coverage supervision of field operations was impossible.
To solve this problem, this embodiment trains the speech recognition model on the speech keywords of the various standard behaviors in the power field operation process, and applies speech recognition to detect whether the on-site speech meets the specification requirements, thereby recognizing abnormal behavior. The power operation site has its own particularities: the working environment is noisy, and the inspection personnel are not fixed. To handle speech recognition in such a complex operating environment, LMS adaptive denoising, based on the inherent characteristics of human speech and background noise, is combined with a CNN neural network for speech recognition, and keyword recognition uses a lattice confusion network built on top of them. With this technique, the speech of field workers during power operations can be analyzed in real time, and by configuring the corresponding keyword searches it can be determined whether the workers have problems such as failing to read out the work ticket properly.
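A minimal sketch of the LMS adaptive noise cancellation referenced above, assuming a separate reference channel that picks up the background noise; the filter order, step size, and simulated noise path are illustrative, not from the source.

```python
import numpy as np

def lms_denoise(primary, reference, order=8, mu=0.02):
    """Adaptive noise cancellation: an LMS filter predicts the noise
    component of the primary channel from the reference noise channel;
    the prediction error is the denoised speech estimate."""
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order - 1, len(primary)):
        x = reference[n - order + 1:n + 1][::-1]  # ref[n], ref[n-1], ...
        noise_est = w @ x
        e = primary[n] - noise_est       # error signal = cleaned sample
        w += mu * e * x                  # LMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 440 * np.arange(4000) / 8000.0)  # stand-in "speech"
src = rng.standard_normal(4000)                              # reference noise
noise = np.convolve(src, [0.5, 0.3, 0.1])[:4000]             # noise after a short FIR path
cleaned = lms_denoise(speech + noise, src)

# Residual noise power over the final (converged) samples:
err_noisy = np.mean(noise[-1000:] ** 2)
err_clean = np.mean((cleaned[-1000:] - speech[-1000:]) ** 2)
```

Because the speech is uncorrelated with the reference noise, the filter converges toward the noise path and the error signal retains the speech while the noise power drops substantially.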
The above is a detailed description of an embodiment of the method for extracting abnormal behavior based on audio data of power field operation according to the present invention, and the following is a detailed description of an embodiment of the system for extracting abnormal behavior based on audio data of power field operation according to the present invention.
The invention provides a system for extracting abnormal behaviors based on audio data of power field operation, which comprises: the device comprises a data acquisition unit, a data processing unit, a voice recognition model training unit, a voice recognition unit and an abnormal behavior judgment unit.
In this embodiment, the data acquisition unit is used for acquiring data according to historical voice data generated by various operation behaviors in the power field operation process.
It should be noted that the data acquisition unit includes a voice data acquisition unit and a noise data acquisition unit;
when the job behavior is reading out the work ticket text, the voice data acquisition unit is specifically configured to acquire filled-in work ticket texts from the work ticket history database as the voice content, and to construct language samples from this content read aloud by a plurality of speakers according to the specification requirements, wherein the language samples also include readings of non-work-ticket texts;
the noise data collection unit is used for selecting noise from a preset noise data set, wherein the noise comprises coffee house noise, white noise, restaurant noise and factory workshop noise.
In this embodiment, the data processing unit is configured to determine a keyword set that appears when speech content generated by various job behaviors meets specification requirements, and perform tagging processing on a collected data set based on the keyword set.
It should be noted that, when the job behavior is to read the work ticket text, the keyword set specifically includes:
work sites, work tasks, scheduled work hours, work conditions, notes, safety measures, canopies, signs, and ground wires.
In this embodiment, the speech recognition model training unit is configured to train the acoustic recognition model using the labeled data set.
It should be noted that the acoustic recognition model adopts a CNN neural network model, and the speech recognition unit specifically includes: the device comprises a feature extraction unit and an acoustic recognition unit;
the feature extraction unit is used for extracting MFCC features from voice data transmitted in a job site, wherein the MFCC features comprise MFCC coefficients and first-order difference and second-order difference of the voice data;
the acoustic recognition unit is used for inputting MFCC characteristics of the voice data into the CNN neural network model to recognize keywords.
In this embodiment, the speech recognition unit is configured to recognize speech data transmitted by the job site based on a trained acoustic recognition model.
In this embodiment, the abnormal behavior determination unit is configured to perform keyword detection on the recognized voice data, and if the detected keyword does not match the keyword set meeting the specification requirement, it is determined that the behavior corresponding to the current voice data is an abnormal behavior.
Further, the method also comprises the following steps: a voice denoising unit;
the voice denoising unit is used for denoising the voice data transmitted in the operation field by using the self-adaptive filter.
It should be noted that the abnormal behavior extraction system provided in this embodiment is used to implement the abnormal behavior extraction method of the foregoing embodiment; the specific configuration of each unit follows the complete implementation of that method and is not repeated here.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for extracting abnormal behaviors based on audio data of power field operation, characterized by comprising the following steps:
acquiring data according to historical voice data generated by various operation behaviors in the power field operation process;
determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement, and labeling the acquired data set based on the keyword set;
training an acoustic recognition model by using the labeled data set;
recognizing voice data transmitted in the operation field based on the trained acoustic recognition model;
and detecting keywords of the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, determining that the behavior corresponding to the current voice data is abnormal behavior.
2. The method for extracting abnormal behaviors based on audio data of power field operation according to claim 1, wherein the data set comprises a voice data set and a noise data set, and when the operation behavior is to read a work ticket text, the data acquisition according to historical voice data generated by reading the work ticket text in the process of power field operation is specifically as follows:
acquiring a filled-in work ticket text from a work ticket historical database as voice content;
constructing a language sample by utilizing the speech content read aloud by a plurality of speakers according to the standard requirement, wherein the language sample also comprises readings of non-work-ticket texts;
the noise data set includes noise selected from a preset noise data set, the noise including cafe noise, white noise, restaurant noise, and factory floor noise.
3. The method for extracting abnormal behavior based on audio data of power field operation according to claim 2, wherein when the operation behavior is a reading of a work ticket text, the keyword set specifically includes:
work sites, work tasks, scheduled work hours, work conditions, notes, safety measures, canopies, signs, and ground wires.
4. The method for extracting abnormal behaviors based on audio data of power field operation according to claim 2, wherein before recognizing the voice data transmitted by the operation field based on the trained acoustic recognition model, the method further comprises:
and denoising the voice data transmitted in the operation field by using an adaptive filter.
5. The method for extracting abnormal behaviors based on audio data of power field operation according to claim 1, wherein the acoustic recognition model adopts a CNN neural network model, and recognizing the voice data transmitted in the operation field based on the trained acoustic recognition model specifically comprises:
extracting MFCC features from voice data transmitted at a job site, the MFCC features including MFCC coefficients and first and second order differences for the voice data;
inputting MFCC features of the speech data into the CNN neural network model to identify keywords.
6. A system for extracting abnormal behaviors based on audio data of power field operation, characterized by comprising:
the data acquisition unit is used for acquiring data according to historical voice data generated by various operation behaviors in the power field operation process;
the data processing unit is used for determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement and labeling the acquired data set based on the keyword set;
a speech recognition model training unit for training an acoustic recognition model using the labeled data set;
the voice recognition unit is used for recognizing voice data transmitted in an operation field based on the trained acoustic recognition model;
and the abnormal behavior judging unit is used for carrying out keyword detection on the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, the behavior corresponding to the current voice data is considered to be abnormal behavior.
7. The system for extracting abnormal behavior based on audio data of power field operation according to claim 6, wherein the data acquisition unit comprises: a voice data acquisition unit and a noise data acquisition unit;
when the operation behavior is reading out a work ticket text, the voice data acquisition unit is specifically used for acquiring the filled-in work ticket text from a work ticket historical database as the voice content, and constructing a language sample by utilizing the speech content read aloud by a plurality of speakers according to the standard requirement, wherein the language sample also comprises readings of non-work-ticket texts;
the noise data acquisition unit is configured to select noise from a preset noise data set, the noise including cafe noise, white noise, restaurant noise, and factory floor noise.
8. The system for extracting abnormal behavior based on audio data of power field operation according to claim 7, wherein when the operation behavior is a reading of a work ticket text, the keyword set specifically includes:
work sites, work tasks, scheduled work hours, work conditions, notes, safety measures, canopies, signs, and ground wires.
9. The system for extracting abnormal behavior based on audio data of power field operation according to claim 7, further comprising: a voice denoising unit;
the voice denoising unit is used for denoising the voice data transmitted in the operation field by using the self-adaptive filter.
10. The system for extracting abnormal behavior based on audio data of power field operation according to claim 6, wherein the acoustic recognition model is a CNN neural network model, and the voice recognition unit specifically comprises: the device comprises a feature extraction unit and an acoustic recognition unit;
the feature extraction unit is used for extracting MFCC features from voice data transmitted in a job site, wherein the MFCC features comprise MFCC coefficients and first-order difference and second-order difference of the voice data;
the acoustic recognition unit is used for inputting MFCC characteristics of the voice data into the CNN neural network model to recognize keywords.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210576129.1A CN114974229A (en) | 2022-05-25 | 2022-05-25 | Method and system for extracting abnormal behaviors based on audio data of power field operation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114974229A true CN114974229A (en) | 2022-08-30 |
Family
ID=82955279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210576129.1A Pending CN114974229A (en) | 2022-05-25 | 2022-05-25 | Method and system for extracting abnormal behaviors based on audio data of power field operation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114974229A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN116258466A (en) * | 2023-05-15 | 2023-06-13 | 国网山东省电力公司菏泽供电公司 | Multi-mode power scene operation specification detection method, system, equipment and medium |
CN116258466B (en) * | 2023-05-15 | 2023-10-27 | 国网山东省电力公司菏泽供电公司 | Multi-mode power scene operation specification detection method, system, equipment and medium |
CN116825140A (en) * | 2023-08-29 | 2023-09-29 | 北京龙德缘电力科技发展有限公司 | Voice interaction method and system for standardizing action flow in operation ticket |
CN116825140B (en) * | 2023-08-29 | 2023-10-31 | 北京龙德缘电力科技发展有限公司 | Voice interaction method and system for standardizing action flow in operation ticket |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102163427B (en) | Method for detecting audio exceptional event based on environmental model | |
CN109034046B (en) | Method for automatically identifying foreign matters in electric energy meter based on acoustic detection | |
CN109473123A (en) | Voice activity detection method and device | |
CN114974229A (en) | Method and system for extracting abnormal behaviors based on audio data of power field operation | |
Das et al. | Recognition of isolated words using features based on LPC, MFCC, ZCR and STE, with neural network classifiers | |
CN110211594B (en) | Speaker identification method based on twin network model and KNN algorithm | |
CN112735383A (en) | Voice signal processing method, device, equipment and storage medium | |
CN109801646B (en) | Voice endpoint detection method and device based on fusion features | |
CN111724770B (en) | Audio keyword identification method for generating confrontation network based on deep convolution | |
CN110910891B (en) | Speaker segmentation labeling method based on long-time and short-time memory deep neural network | |
CN109817227B (en) | Abnormal sound monitoring method and system for farm | |
CN112885372A (en) | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound | |
CN112397054B (en) | Power dispatching voice recognition method | |
CN114023354A (en) | Guidance type acoustic event detection model training method based on focusing loss function | |
CN113823293A (en) | Speaker recognition method and system based on voice enhancement | |
CN116741148A (en) | Voice recognition system based on digital twinning | |
Pak et al. | Convolutional neural network approach for aircraft noise detection | |
CN111968628B (en) | Signal accuracy adjusting system and method for voice instruction capture | |
CN113077812A (en) | Speech signal generation model training method, echo cancellation method, device and equipment | |
CN112885379A (en) | Customer service voice evaluation method, system, device and storage medium | |
CN112329819A (en) | Underwater target identification method based on multi-network fusion | |
CN115457966B (en) | Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion | |
CN116741159A (en) | Audio classification and model training method and device, electronic equipment and storage medium | |
CN115619117A (en) | Power grid intelligent scheduling method based on duty system | |
CN115391523A (en) | Wind power plant multi-source heterogeneous data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||