CN114974229A - Method and system for extracting abnormal behaviors based on audio data of power field operation - Google Patents
- Publication number: CN114974229A
- Application number: CN202210576129.1A
- Authority
- CN
- China
- Prior art keywords
- voice
- data
- noise
- voice data
- power field
- Prior art date
- Legal status: Pending (assumed; not a legal conclusion)
Classifications
- G10L15/16: Speech classification or search using artificial neural networks
- G10L15/02: Feature extraction for speech recognition; selection of recognition unit
- G10L21/0208: Speech enhancement; noise filtering
- G10L25/24: Speech or voice analysis in which the extracted parameters are the cepstrum
- G10L25/30: Speech or voice analysis using neural networks
- Y04S10/50: Systems or methods supporting power network operation or management, involving interaction with load-side end-user applications
Abstract
The invention provides a method and a system for extracting abnormal behaviors based on audio data of power field operations. The extraction method comprises: collecting a data set of historical voice data generated by various operation behaviors during power field operation; determining the set of keywords that appears when the voice content generated by each operation behavior meets the standard requirements, and labeling the collected data set with this keyword set; training an acoustic recognition model on the labeled data set; recognizing the voice data captured at the job site with the trained model; and performing keyword detection on the recognized voice data. If the detected keywords do not match the standard keyword set, the behavior corresponding to the current voice data is judged to be abnormal. By training a speech recognition model on the voice keywords of standard behaviors and detecting those keywords, the method recognizes abnormal behaviors.
Description
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to a method and a system for extracting abnormal behaviors based on audio data of power field operation.
Background
As the power grid expands, power operation activities grow more frequent, and the traditional mode of manual on-site supervision with after-the-fact review can no longer meet the lean, modern management requirements of power supply enterprises under the new situation. Power enterprises urgently need a visual, intelligent management and control platform for power operation sites that enables more efficient and intelligent collaborative supervision.
Human error is one of the main causes of power system accidents, and reducing human error in grid operation as far as possible is a development requirement of the national smart grid. Advances in artificial intelligence and Internet-of-Things technology provide a technical basis for automated, intelligent monitoring of power operation sites. Data collection at job sites has become diversified, yet the power industry is still exploring how to use these data more efficiently for automatic monitoring. Most current research on intelligent monitoring focuses on analyzing and processing video data and makes little use of voice signals. How to use voice signals to identify abnormal behaviors in power field operations, such as a work ticket not being read out as required, has therefore become an urgent problem to be solved.
Disclosure of Invention
In view of the above, the present invention aims to solve the problem that current power field operations lack voice-signal-based recognition of abnormal behavior in the operation process.
In order to solve the technical problems, the invention provides the following technical scheme:
in a first aspect, the present invention provides a method for extracting abnormal behavior based on audio data of power field operation, including the following steps:
collecting a data set of historical voice data generated by various operation behaviors during the power field operation process;
determining a keyword set which appears when voice content generated by various operation behaviors meets the standard requirement, and labeling the acquired data set based on the keyword set;
training an acoustic recognition model by using the labeled data set;
recognizing voice data transmitted in the operation field based on the trained acoustic recognition model;
and detecting keywords of the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, determining that the behavior corresponding to the current voice data is abnormal behavior.
Further, the data set comprises a voice data set and a noise data set, and when the operation behavior is to read the work ticket text, the data acquisition is specifically carried out according to historical voice data generated by reading the work ticket text in the power field operation process:
acquiring a filled-in work ticket text from a work ticket historical database as voice content;
constructing a speech sample set from recordings of a plurality of speakers reading the content aloud according to the standard requirements, wherein the sample set also comprises recordings of reading non-work-ticket text;
the noise data set includes noise selected from a preset noise data set, the noise including cafe noise, white noise, restaurant noise, and factory floor noise.
Further, when the job behavior is to read the work ticket text, the keyword set specifically includes:
work place, work task, planned work time, work conditions, notes, safety measures, barrier, signboard, and ground wire.
Further, before recognizing the speech data transmitted by the job site based on the trained acoustic recognition model, the method further includes:
and denoising the voice data transmitted in the operation field by using an adaptive filter.
Further, the acoustic recognition model adopts a CNN neural network model, and recognizing the voice data transmitted in the operation field based on the trained acoustic recognition model specifically includes:
extracting MFCC features from the voice data transmitted from the job site, wherein the MFCC features comprise the MFCC coefficients and their first-order and second-order differences;
MFCC features of the speech data are input into the CNN neural network model to identify keywords.
In a second aspect, the present invention provides a system for extracting abnormal behavior based on audio data of power field operation, including:
the data acquisition unit is used for collecting a data set of historical voice data generated by various operation behaviors during the power field operation process;
the data processing unit is used for determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement and labeling the acquired data set based on the keyword set;
the voice recognition model training unit is used for training the acoustic recognition model by using the labeled data set;
the voice recognition unit is used for recognizing voice data transmitted in the operation field based on the trained acoustic recognition model;
and the abnormal behavior judging unit is used for carrying out keyword detection on the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, the behavior corresponding to the current voice data is considered to be abnormal behavior.
Further, the data acquisition unit includes: a voice data acquisition unit and a noise data acquisition unit;
when the operation behavior is reading the work ticket text, the voice data acquisition unit is specifically used for acquiring filled-in work ticket texts from the work ticket history database as voice content, and for constructing a speech sample set from recordings of a plurality of speakers reading the content aloud according to the standard requirements, wherein the sample set also comprises recordings of reading non-work-ticket text;
the noise data collection unit is used for selecting noise from a preset noise data set, wherein the noise comprises cafe noise, white noise, restaurant noise and factory floor noise.
Further, when the job behavior is to read the work ticket text, the keyword set specifically includes:
work place, work task, planned work time, work conditions, notes, safety measures, barrier, signboard, and ground wire.
Further, still include: a voice denoising unit;
the voice denoising unit is used for denoising the voice data transmitted in the operation field by using the self-adaptive filter.
Further, the acoustic recognition model adopts a CNN neural network model, and the speech recognition unit specifically includes: the device comprises a feature extraction unit and an acoustic recognition unit;
the feature extraction unit is used for extracting MFCC features from the voice data transmitted from the job site, wherein the MFCC features comprise the MFCC coefficients and their first-order and second-order differences;
the acoustic recognition unit is used for inputting MFCC characteristics of the voice data into the CNN neural network model to recognize keywords.
In conclusion, the invention provides a method and a system for extracting abnormal behaviors based on audio data of power field operations. The extraction method comprises: collecting a data set of historical voice data generated by various operation behaviors during power field operation; determining the set of keywords that appears when the voice content generated by each operation behavior meets the standard requirements, and labeling the collected data set with this keyword set; training an acoustic recognition model on the labeled data set; recognizing the voice data captured at the job site with the trained model; and performing keyword detection on the recognized voice data, judging the behavior corresponding to the current voice data to be abnormal if the detected keywords do not match the standard keyword set. The invention trains the voice recognition model on the voice keywords of the various standard behaviors in the power field operation process and applies voice recognition to detect whether the on-site voice meets the standard requirements, thereby recognizing abnormal behaviors.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a method for extracting abnormal behavior based on audio data of power field operation according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a CNN network according to an embodiment of the present invention;
fig. 3 is a flowchart of voice keyword detection according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In view of the problems described in the Background, the embodiments of the invention provide a method and a system for extracting abnormal behaviors based on audio data of power field operation.
An embodiment of the method for extracting abnormal behavior based on audio data of power field operation according to the present invention is described in detail below.
Referring to fig. 1, the present embodiment provides a method for extracting abnormal behavior based on audio data of power field operation, including:
s100: and acquiring data according to historical voice data generated by various operation behaviors in the power field operation process.
It should be noted that the data set of this embodiment comprises a voice data set and a noise data set. Taking the THCHS-30 and NOISEX-92 corpora as examples, cafe noise and white noise may be selected from the THCHS-30 data set, and representative restaurant noise and factory floor noise from the NOISEX-92 data set. The voice data set is self-constructed.
According to the "two tickets" regulations for power operations, before any power operation the person in charge of the operation must read the work ticket aloud to all operators, clearly and specifically stating the work place, work task, work time, safety measures and notes. In practice, the historical voice data of work-ticket reading is collected by taking filled-in work ticket texts from the power company's work ticket history database as the voice content. Thirty speakers were selected, yielding 8200 utterances in total at a sampling rate of 8 kHz; 90% of the voice samples are read exactly according to the work ticket text, and the remaining 10% are read from non-work-ticket text. The size of the voice data set may be chosen according to actual requirements; the figures here are illustrative rather than limiting.
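Noisy training material can then be produced by mixing the clean readings with the selected noise clips at a chosen signal-to-noise ratio. A minimal numpy sketch; the SNR value and the mixing scheme are illustrative assumptions, not specified by the patent:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add."""
    if len(noise) < len(speech):
        # loop the noise clip when it is shorter than the utterance
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Each clean utterance could be mixed with the cafe, white, restaurant and factory-floor noise clips in turn to enlarge the training set.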
S200: and determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement, and labeling the acquired data set based on the keyword set.
After the voice data set is collected, all recordings are labeled to form a text data set. By analyzing the text data of electric work tickets, the following keywords were selected as recognition content: "work place", "work task", "planned work time", "work conditions", "notes", "safety measures", "barrier", "signboard" and "ground wire". The speech training data for these 9 keywords are labeled at the phoneme level with a flat-start method, and the generated phoneme labels are stored in a phones0.mlf file. For other power-field operation behaviors, the keyword set can likewise be determined from the voice or text data generated under standard-compliant behavior, i.e., the vocabulary that appears when the behavior meets the standard requirements.
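Utterance-level labeling against the nine keywords can be sketched as follows; the English keyword strings stand in for the original Chinese terms, and the transcript handling is illustrative:

```python
KEYWORDS = ["work place", "work task", "planned work time", "work conditions",
            "notes", "safety measures", "barrier", "signboard", "ground wire"]

def label_transcript(transcript, keywords=KEYWORDS):
    """Return the list of keywords present in one utterance transcript."""
    return [kw for kw in keywords if kw in transcript]
```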
S300: and training the acoustic identification model by using the labeled data set.
The acoustic recognition model of this embodiment may adopt a CNN, with the feature matrix extracted by MFCC as its input; the recognition model is obtained through training. The network comprises five convolutional layers followed by three fully-connected layers: the hierarchical convolutional stack effectively extracts useful features from the input, and the fully-connected layers map these features to the category of the voice signal. The network structure is shown in fig. 2. Each convolutional layer comprises a convolution operation, batch normalization and an activation function; the first, second and fifth convolutional layers additionally apply max pooling, which down-samples the convolved feature maps and keeps the numerical values stable.
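The layer arrangement can be sanity-checked by tracing feature-map sizes through the stack. The patent specifies only the layer counts and which layers pool; the kernel size, padding and input size below are illustrative assumptions:

```python
def conv_shape(h, w, k=3, stride=1, pad=1):
    """Output size of a square-kernel convolution."""
    return ((h + 2 * pad - k) // stride + 1, (w + 2 * pad - k) // stride + 1)

def pool_shape(h, w, k=2, stride=2):
    """Output size of max pooling."""
    return ((h - k) // stride + 1, (w - k) // stride + 1)

def trace_cnn(h, w):
    """Five conv layers; max pooling after conv1, conv2 and conv5, per the text."""
    shapes = []
    for layer in range(1, 6):
        h, w = conv_shape(h, w)          # 'same' convolution keeps the size
        if layer in (1, 2, 5):
            h, w = pool_shape(h, w)      # down-sample the feature map
        shapes.append((h, w))
    return shapes
```

Tracing shapes this way catches mismatches between the convolutional output and the first fully-connected layer before any training is run.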
Before training the CNN on the labeled text data set, MFCC acoustic features must be extracted. The waveform files of the keyword speech are converted into feature-vector sequence files through MFCC feature extraction: the HCopy command (an HTK tool) parameterizes the voice data according to the corresponding coding configuration file config and automatically generates .mfc files of MFCC vectors. MFCC feature extraction is detailed in the subsequent steps.
The CNN is then trained to classify keywords from the MFCC data: the five convolutional layers extract useful features from the input, and the three fully-connected layers turn them into a classification of whether the voice signal contains any of the 9 keywords.
S400: and recognizing the voice data transmitted in the operation field based on the trained acoustic recognition model.
At a power operation site, the periodic operation of electrical equipment generates high-energy colored noise that strongly affects voice signals. Conventional feature parameters are easily distorted in such a noise environment, so the accuracy of the trained model suffers. Since the target application is real-time voice monitoring and analysis of a power operation site, with high requirements on noise robustness and real-time performance, this embodiment denoises the voice data transmitted from the job site with adaptive filtering before voice recognition.
The adaptive filter is specifically a least-mean-square (LMS) adaptive filter. The input to the system is the noise-contaminated speech signal x(k). The filter takes the noise as the desired signal, denoted d(k); the error signal e(k) between the adaptive filter output y(k) and d(k) is the estimate of the desired target signal.
Specifically, the LMS filter is computed as follows:
1) Initialize W(0) = 0 and choose a step size 0 < μ < 1/λ_max, where λ_max is the largest eigenvalue of the autocorrelation matrix R_xx of the input signal;
2) Compute the output y(k) = W^T(k)X(k) and the estimation error e(k) = d(k) - y(k);
3) Update the filter coefficients for the next instant: W(k+1) = W(k) + μ·e(k)·X(k). The LMS algorithm is sensitive to the choice of step size: too large a step causes tracking failure, while too small a step slows model convergence, lengthening the denoising time and so degrading the real-time performance of the overall voice recognition. The invention therefore improves the weight-update formula of the LMS filter so that the model converges faster; the normalized update takes the form W(k+1) = W(k) + μ·e(k)·X(k)/(α + X^T(k)X(k)), where μ is the adjustment step size and α is a constant that prevents the denominator from being zero.
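The LMS recursion described above can be sketched in numpy. The filter order, step size and the pairing of the two input signals are illustrative assumptions; the weight update uses a normalized step with a small α guarding the denominator:

```python
import numpy as np

def lms_filter(x, d, order=8, mu=0.5, alpha=1e-6):
    """Adaptive LMS sketch: x is the filter input, d the desired signal.
    Returns the error signal e (the estimate of the target signal) and the
    final weight vector W."""
    w = np.zeros(order)
    e = np.zeros(len(x))
    for k in range(order - 1, len(x)):
        xk = x[k - order + 1:k + 1][::-1]   # newest sample first
        y = w @ xk                          # filter output y(k) = W^T X(k)
        e[k] = d[k] - y                     # e(k) = d(k) - y(k)
        # normalized weight update: stable convergence over varying input power
        w = w + mu * e[k] * xk / (alpha + xk @ xk)
    return e, w
```

In the noise-cancellation setting described in the text, the error signal e(k) serves as the denoised speech estimate.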
Voice recognition of the denoised speech with the convolutional neural network comprises two parts: feature extraction and the acoustic recognition model. The feature extraction part processes the voice signal into features useful for training the neural network; the acoustic recognition part feeds the extracted features into the CNN, which is trained to recognize the keywords correctly.
The feature extraction part adopts an MFCC feature extraction algorithm, and comprises the following specific steps:
1) Format conversion: voice files not in WAV format are converted into 16 kHz, 16-bit, single-channel WAV files;
2) Pre-emphasis: the voice signal is pre-emphasized with the high-pass filter H(z) = 1 - μz^(-1), with μ = 0.96; this boosts the high-frequency part, removes the spectral tilt, and compensates the high frequencies of the speech signal suppressed by the articulation system;
3) Framing: the speech signal is divided into frames with frame length N = 400 sampling points, i.e. 25 ms per frame. To achieve smooth transitions between speech frames and guarantee temporal continuity, an overlap of 10 ms is set between adjacent frames;
4) Windowing: each frame is multiplied by a Hamming window, which increases the continuity at both ends of the frame;
5) FFT: a fast Fourier transform of each windowed frame yields its spectrum;
6) Filtering: conventional methods use a triangular (mel) filter bank; to simplify the computation and ease implementation, the invention adopts a matrix filter bank;
7) The logarithmic energy of each filter-bank output is computed;
8) The MFCC coefficients are obtained via the discrete cosine transform (DCT);
9) Dynamic difference parameters are extracted: the standard cepstral MFCC parameters reflect only the static characteristics of the speech; its dynamic characteristics can be described by the difference spectrum of these static features, so first-order and second-order differences are extracted.
The MFCC coefficients of the voice signal, together with their first-order and second-order differences, form the feature matrix that is input to the CNN neural network model.
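Steps 2) to 9) can be sketched with numpy. A standard triangular mel filter bank is used here in place of the patent's simplified matrix filter bank, whose exact construction is not given; frame sizes follow the text (400-sample / 25 ms frames with 10 ms overlap, i.e. a 15 ms hop at 16 kHz):

```python
import numpy as np

def mfcc(signal, sr=16000, frame_len=400, hop=240, n_fft=512,
         n_mels=26, n_ceps=13, preemph=0.96):
    """MFCC sketch: pre-emphasis, framing, Hamming window, FFT,
    mel filter bank, log energy, DCT, and dynamic difference features."""
    # 2) pre-emphasis: H(z) = 1 - 0.96 z^(-1)
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 3) framing: 25 ms frames, 10 ms overlap -> 15 ms hop
    n_frames = 1 + max(0, (len(sig) - frame_len) // hop)
    frames = np.stack([sig[i*hop:i*hop + frame_len] for i in range(n_frames)])
    # 4) Hamming window
    frames = frames * np.hamming(frame_len)
    # 5) power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 6) triangular mel filter bank (assumed in place of the matrix bank)
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c): fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r): fbank[m - 1, k] = (r - k) / max(r - c, 1)
    # 7) log filter-bank energies
    feat = np.log(spec @ fbank.T + 1e-10)
    # 8) DCT-II -> cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    ceps = feat @ dct.T
    # 9) first- and second-order differences as dynamic features
    d1 = np.diff(ceps, axis=0, prepend=ceps[:1])
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return np.concatenate([ceps, d1, d2], axis=1)
```

Each frame yields 13 cepstral coefficients plus their deltas and delta-deltas, giving a 39-dimensional feature vector per frame.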
S500: and detecting keywords of the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, determining that the behavior corresponding to the current voice data is abnormal behavior.
It should be noted that keyword detection may employ keyword recognition based on a lattice confusion network. The overall flow is: the voice signal to be detected is denoised and ADC-processed; MFCC acoustic features are extracted; the features are decoded with the CNN acoustic model using speaker-adaptive decoding; an index network is built from the decoding results; keywords are searched in the index network; and the search finally yields the detection result. The overall detection flow is shown in fig. 3.
The index is established by combining the extracted acoustic features with a CNN acoustic model, decoding the extracted voice features, finding a state sequence matched with the observation sequence and constructing a lattice confusion network. Calculating the posterior probability of each word according to the likelihood score information in lattice, then accumulating the posterior probabilities of the same words appearing in the same time period as the final score, and finally summarizing by using L independent linked lists according to the sequence of the posterior probabilities from large to small to construct an index.
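The posterior accumulation and descending-order indexing described above can be illustrated with a toy lattice. The arc format, the words, and the scores below are invented for illustration and are not from the source; real lattices also carry path topology that this sketch omits.

```python
import math
from collections import defaultdict

# Toy lattice arcs: (word, start_time, end_time, log-likelihood score).
arcs = [
    ("safety",  0.0, 0.4, -2.0),
    ("safety",  0.0, 0.4, -3.0),   # same word, same period, different path
    ("measure", 0.4, 0.9, -1.5),
    ("sites",   0.0, 0.4, -4.0),
]

def build_index(arcs):
    # 1) Posterior of each arc: normalise the likelihoods over the arcs
    #    that compete within the same time interval.
    by_span = defaultdict(list)
    for a in arcs:
        by_span[(a[1], a[2])].append(a)
    scores = defaultdict(float)
    for group in by_span.values():
        z = sum(math.exp(a[3]) for a in group)
        for word, s, e, ll in group:
            # 2) Accumulate posteriors of the same word in the same period.
            scores[(word, s, e)] += math.exp(ll) / z
    # 3) One posting list per word, sorted by posterior, high to low
    #    (standing in for the L sorted linked lists of the text).
    index = defaultdict(list)
    for (word, s, e), p in scores.items():
        index[word].append((p, s, e))
    for word in index:
        index[word].sort(reverse=True)
    return dict(index)

idx = build_index(arcs)
```

The two competing "safety" arcs merge into one entry whose accumulated posterior dominates its time span, which is exactly the per-word final score the text describes.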
Wherein, the keyword search is implemented with a token-passing algorithm. A single keyword is searched for directly in the index; for a multi-word keyword, each word is searched for individually, and the hits are then filtered by correct word order and short time intervals between consecutive words. The lattice contains the pinyin of each character; to improve recall, the index can be built directly on pinyin, the search is then performed with the pinyin of the words in the keyword list, and the word identity is taken into account only when the confidence score is computed.
Wherein, the confidence evaluation is as follows: the search result is scored by its posterior probability in the lattice, and the result is output when the score exceeds a threshold.
Finally, the output keyword detection result is analyzed: if it meets the specification requirements, the corresponding behavior is judged to be normal; otherwise, the corresponding behavior is judged to be abnormal.
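The multi-word search with order and time-gap filtering, followed by the confidence threshold, can be sketched as follows. The index contents, the `max_gap` value, and the threshold are hypothetical, and the sketch searches the index directly rather than implementing token passing itself.

```python
# Hypothetical single-word index: word -> [(posterior, start, end)], sorted
# in descending order of posterior, as produced by the lattice indexing step.
index = {
    "work":   [(0.95, 1.00, 1.30), (0.40, 5.00, 5.20)],
    "ticket": [(0.90, 1.35, 1.70), (0.30, 8.00, 8.30)],
}

def search_phrase(index, words, max_gap=0.3):
    """Multi-word keyword search: look up each word individually, then keep
    only hits where the words occur in order with short gaps between them."""
    hits = []
    for p0, s0, e0 in index.get(words[0], []):
        path = [(p0, s0, e0)]
        prev_end = e0
        for w in words[1:]:
            cand = [h for h in index.get(w, [])
                    if 0.0 <= h[1] - prev_end <= max_gap]  # ordered, short gap
            if not cand:
                path = None
                break
            best = max(cand)            # highest posterior among admissible hits
            path.append(best)
            prev_end = best[2]
        if path is not None:
            score = min(p for p, _, _ in path)   # conservative phrase confidence
            hits.append((score, path[0][1], path[-1][2]))
    return sorted(hits, reverse=True)

def confident_hits(hits, threshold=0.5):
    # Confidence evaluation: output only results whose score exceeds the threshold.
    return [h for h in hits if h[0] > threshold]

hits = confident_hits(search_phrase(index, ["work", "ticket"]))
```

Here the occurrence of "work" at 5.00 s is discarded because no "ticket" follows it within the allowed gap, leaving one confident phrase hit spanning 1.00–1.70 s.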
This embodiment provides a method for extracting abnormal behaviors from audio data of power field operation, applied to the intelligent monitoring of a power operation site. According to power operation regulations, before any power operation the person in charge of the operation must read out its work ticket. Previously, compliance with work-ticket reading could only be checked by supervisors present on site; because this is inefficient, it was done only by sampling inspection, and broad, full-coverage supervision of field operations was impossible.
To solve this problem, this embodiment trains the speech recognition model on the speech keywords of the various standard behaviors in the power field operation process, and applies speech recognition to detect whether the on-site speech meets the specification requirements, thereby recognizing abnormal behavior. The power operation site has its own particularities: the working environment is noisy, and the inspection personnel are not fixed. To handle speech recognition in such a complex operating environment, LMS adaptive denoising, based on the inherent characteristics of human speech and background noise, is combined with a CNN neural network for speech recognition, and keyword recognition uses a lattice confusion network built on top of them. With this technique, the speech of field workers during power operations can be analyzed in real time, and by configuring the corresponding keyword searches it can be determined whether the workers have problems such as failing to read out the work ticket properly.
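A minimal sketch of the LMS adaptive noise cancellation referenced above, assuming a separate reference channel that picks up the background noise; the filter order, step size, and simulated noise path are illustrative, not from the source.

```python
import numpy as np

def lms_denoise(primary, reference, order=8, mu=0.02):
    """Adaptive noise cancellation: an LMS filter predicts the noise
    component of the primary channel from the reference noise channel;
    the prediction error is the denoised speech estimate."""
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order - 1, len(primary)):
        x = reference[n - order + 1:n + 1][::-1]  # ref[n], ref[n-1], ...
        noise_est = w @ x
        e = primary[n] - noise_est       # error signal = cleaned sample
        w += mu * e * x                  # LMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 440 * np.arange(4000) / 8000.0)  # stand-in "speech"
src = rng.standard_normal(4000)                              # reference noise
noise = np.convolve(src, [0.5, 0.3, 0.1])[:4000]             # noise after a short FIR path
cleaned = lms_denoise(speech + noise, src)

# Residual noise power over the final (converged) samples:
err_noisy = np.mean(noise[-1000:] ** 2)
err_clean = np.mean((cleaned[-1000:] - speech[-1000:]) ** 2)
```

Because the speech is uncorrelated with the reference noise, the filter converges toward the noise path and the error signal retains the speech while the noise power drops substantially.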
The above is a detailed description of an embodiment of the method for extracting abnormal behavior based on audio data of power field operation according to the present invention, and the following is a detailed description of an embodiment of the system for extracting abnormal behavior based on audio data of power field operation according to the present invention.
The invention provides a system for extracting abnormal behaviors based on audio data of power field operation, which comprises: the device comprises a data acquisition unit, a data processing unit, a voice recognition model training unit, a voice recognition unit and an abnormal behavior judgment unit.
In this embodiment, the data acquisition unit is used for acquiring data according to historical voice data generated by various operation behaviors in the power field operation process.
It should be noted that the data acquisition unit includes a voice data acquisition unit and a noise data acquisition unit;
when the job behavior is reading out the work ticket text, the voice data acquisition unit is specifically configured to acquire filled-in work ticket texts from the work ticket history database as the voice content, and to construct language samples from this content read aloud by a plurality of speakers according to the specification requirements, wherein the language samples also include readings of non-work-ticket texts;
the noise data collection unit is used for selecting noise from a preset noise data set, wherein the noise comprises coffee house noise, white noise, restaurant noise and factory workshop noise.
In this embodiment, the data processing unit is configured to determine a keyword set that appears when speech content generated by various job behaviors meets specification requirements, and perform tagging processing on a collected data set based on the keyword set.
It should be noted that, when the job behavior is to read the work ticket text, the keyword set specifically includes:
work sites, work tasks, scheduled work hours, work conditions, notes, safety measures, canopies, signs, and ground wires.
In this embodiment, the speech recognition model training unit is configured to train the acoustic recognition model using the labeled data set.
It should be noted that the acoustic recognition model adopts a CNN neural network model, and the speech recognition unit specifically includes: the device comprises a feature extraction unit and an acoustic recognition unit;
the feature extraction unit is used for extracting MFCC features from voice data transmitted in a job site, wherein the MFCC features comprise MFCC coefficients and first-order difference and second-order difference of the voice data;
the acoustic recognition unit is used for inputting MFCC characteristics of the voice data into the CNN neural network model to recognize keywords.
In this embodiment, the speech recognition unit is configured to recognize speech data transmitted by the job site based on a trained acoustic recognition model.
In this embodiment, the abnormal behavior determination unit is configured to perform keyword detection on the recognized voice data, and if the detected keyword does not match the keyword set meeting the specification requirement, it is determined that the behavior corresponding to the current voice data is an abnormal behavior.
Further, the method also comprises the following steps: a voice denoising unit;
the voice denoising unit is used for denoising the voice data transmitted in the operation field by using the self-adaptive filter.
It should be noted that the abnormal behavior extraction system provided in this embodiment is used to implement the abnormal behavior extraction method of the foregoing embodiment; the specific configuration of each unit follows the complete implementation of that method and is not repeated here.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for extracting abnormal behaviors based on audio data of power field operation, characterized by comprising the following steps:
acquiring data according to historical voice data generated by various operation behaviors in the power field operation process;
determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement, and labeling the acquired data set based on the keyword set;
training an acoustic recognition model by using the labeled data set;
recognizing voice data transmitted in the operation field based on the trained acoustic recognition model;
and detecting keywords of the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, determining that the behavior corresponding to the current voice data is abnormal behavior.
2. The method for extracting abnormal behaviors based on audio data of power field operation according to claim 1, wherein the data set comprises a voice data set and a noise data set, and when the operation behavior is to read a work ticket text, the data acquisition according to historical voice data generated by reading the work ticket text in the process of power field operation is specifically as follows:
acquiring a filled-in work ticket text from a work ticket historical database as voice content;
constructing a language sample by utilizing the speech content read aloud by a plurality of speakers according to the standard requirement, wherein the language sample also comprises readings of non-work-ticket texts;
the noise data set includes noise selected from a preset noise data set, the noise including cafe noise, white noise, restaurant noise, and factory floor noise.
3. The method for extracting abnormal behavior based on audio data of power field operation according to claim 2, wherein when the operation behavior is a reading of a work ticket text, the keyword set specifically includes:
work sites, work tasks, scheduled work hours, work conditions, notes, safety measures, canopies, signs, and ground wires.
4. The method for extracting abnormal behaviors based on audio data of power field operation according to claim 2, wherein before recognizing the voice data transmitted by the operation field based on the trained acoustic recognition model, the method further comprises:
and denoising the voice data transmitted in the operation field by using an adaptive filter.
5. The method for extracting abnormal behaviors based on audio data of power field operation according to claim 1, wherein the acoustic recognition model adopts a CNN neural network model, and recognizing the voice data transmitted in the operation field based on the trained acoustic recognition model specifically comprises:
extracting MFCC features from voice data transmitted at a job site, the MFCC features including MFCC coefficients and first and second order differences for the voice data;
inputting MFCC features of the speech data into the CNN neural network model to identify keywords.
6. A system for extracting abnormal behaviors based on audio data of power field operation, characterized by comprising:
the data acquisition unit is used for acquiring data according to historical voice data generated by various operation behaviors in the power field operation process;
the data processing unit is used for determining a keyword set which appears when the voice content generated by various operation behaviors meets the standard requirement and labeling the acquired data set based on the keyword set;
a speech recognition model training unit for training an acoustic recognition model using the labeled data set;
the voice recognition unit is used for recognizing voice data transmitted in an operation field based on the trained acoustic recognition model;
and the abnormal behavior judging unit is used for carrying out keyword detection on the recognized voice data, and if the detected keywords are not matched with the keyword set meeting the standard requirement, the behavior corresponding to the current voice data is considered to be abnormal behavior.
7. The system for extracting abnormal behavior based on audio data of power field operation according to claim 6, wherein the data acquisition unit comprises: a voice data acquisition unit and a noise data acquisition unit;
when the operation behavior is reading out a work ticket text, the voice data acquisition unit is specifically used for acquiring the filled-in work ticket text from a work ticket historical database as the voice content, and constructing a language sample by utilizing the speech content read aloud by a plurality of speakers according to the standard requirement, wherein the language sample also comprises readings of non-work-ticket texts;
the noise data acquisition unit is configured to select noise from a preset noise data set, the noise including cafe noise, white noise, restaurant noise, and factory floor noise.
8. The system for extracting abnormal behavior based on audio data of power field operation according to claim 7, wherein when the operation behavior is a reading of a work ticket text, the keyword set specifically includes:
work sites, work tasks, scheduled work hours, work conditions, notes, safety measures, canopies, signs, and ground wires.
9. The system for extracting abnormal behavior based on audio data of power field operation according to claim 7, further comprising: a voice denoising unit;
the voice denoising unit is used for denoising the voice data transmitted in the operation field by using the self-adaptive filter.
10. The system for extracting abnormal behavior based on audio data of power field operation according to claim 6, wherein the acoustic recognition model is a CNN neural network model, and the voice recognition unit specifically comprises: the device comprises a feature extraction unit and an acoustic recognition unit;
the feature extraction unit is used for extracting MFCC features from voice data transmitted in a job site, wherein the MFCC features comprise MFCC coefficients and first-order difference and second-order difference of the voice data;
the acoustic recognition unit is used for inputting MFCC characteristics of the voice data into the CNN neural network model to recognize keywords.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210576129.1A CN114974229A (en) | 2022-05-25 | 2022-05-25 | Method and system for extracting abnormal behaviors based on audio data of power field operation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114974229A true CN114974229A (en) | 2022-08-30 |
Family
ID=82955279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210576129.1A Pending CN114974229A (en) | 2022-05-25 | 2022-05-25 | Method and system for extracting abnormal behaviors based on audio data of power field operation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114974229A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN116258466A (en) * | 2023-05-15 | 2023-06-13 | 国网山东省电力公司菏泽供电公司 | Multi-mode power scene operation specification detection method, system, equipment and medium |
CN116258466B (en) * | 2023-05-15 | 2023-10-27 | 国网山东省电力公司菏泽供电公司 | Multi-mode power scene operation specification detection method, system, equipment and medium |
CN116825140A (en) * | 2023-08-29 | 2023-09-29 | 北京龙德缘电力科技发展有限公司 | Voice interaction method and system for standardizing action flow in operation ticket |
CN116825140B (en) * | 2023-08-29 | 2023-10-31 | 北京龙德缘电力科技发展有限公司 | Voice interaction method and system for standardizing action flow in operation ticket |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102163427B (en) | Method for detecting audio exceptional event based on environmental model | |
CN109034046B (en) | Method for automatically identifying foreign matters in electric energy meter based on acoustic detection | |
CN109473123A (en) | Voice activity detection method and device | |
CN114974229A (en) | Method and system for extracting abnormal behaviors based on audio data of power field operation | |
Das et al. | Recognition of isolated words using features based on LPC, MFCC, ZCR and STE, with neural network classifiers | |
CN110211594B (en) | Speaker identification method based on twin network model and KNN algorithm | |
CN112735383A (en) | Voice signal processing method, device, equipment and storage medium | |
CN109801646B (en) | Voice endpoint detection method and device based on fusion features | |
CN111724770B (en) | Audio keyword identification method for generating confrontation network based on deep convolution | |
CN110910891B (en) | Speaker segmentation labeling method based on long-time and short-time memory deep neural network | |
CN109817227B (en) | Abnormal sound monitoring method and system for farm | |
CN112885372A (en) | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound | |
CN112397054B (en) | Power dispatching voice recognition method | |
CN114023354A (en) | Guidance type acoustic event detection model training method based on focusing loss function | |
CN113823293A (en) | Speaker recognition method and system based on voice enhancement | |
CN116741148A (en) | Voice recognition system based on digital twinning | |
Pak et al. | Convolutional neural network approach for aircraft noise detection | |
CN111968628B (en) | Signal accuracy adjusting system and method for voice instruction capture | |
CN113077812A (en) | Speech signal generation model training method, echo cancellation method, device and equipment | |
CN112885379A (en) | Customer service voice evaluation method, system, device and storage medium | |
CN112329819A (en) | Underwater target identification method based on multi-network fusion | |
CN115457966B (en) | Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion | |
CN116741159A (en) | Audio classification and model training method and device, electronic equipment and storage medium | |
CN115619117A (en) | Power grid intelligent scheduling method based on duty system | |
CN115391523A (en) | Wind power plant multi-source heterogeneous data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||