CN112489627A

CN112489627A - Audio identification method and device for industrial production line and storage medium

Info

Publication number: CN112489627A
Application number: CN202011294784.5A
Authority: CN
Inventors: 刘军; 徐梓涵; 张建行; 侯青; 刘洋; 孙思琪; 姜佳宁
Original assignee: Wuhan Institute of Technology
Current assignee: Wuhan Institute of Technology
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2021-03-12

Abstract

The invention provides an industrial production line audio recognition method, an industrial production line audio recognition device and a storage medium, wherein the method comprises the following steps: importing a plurality of original audio data, respectively performing data conversion processing on the original audio data to obtain a plurality of audio data to be processed, and respectively performing dimensionality reduction processing on the audio data to be processed to obtain a plurality of dimensionality reduction audio data; and respectively carrying out feature extraction on the plurality of dimension reduction audio data to obtain a plurality of audio feature data, and collecting the plurality of audio feature data to obtain an audio feature data set. The method improves the resolution accuracy of audio noise and silence, can realize the rapid and accurate determination of small samples in a short time, replaces manual processing of component data, realizes the aims of high intelligence, high accuracy and high efficiency, is simple to realize, is suitable for general popularization, and has wide market prospect.

Description

Audio identification method and device for industrial production line and storage medium

Technical Field

The invention mainly relates to the technical field of audio identification, in particular to an audio identification method and device for an industrial production line and a storage medium.

Background

Audio classification is essentially an audio recognition process comprising two basic steps: feature extraction and classification. It is one of the important means of solving audio structuring problems and extracting the semantics of audio content, is a research hotspot in the field of content-based audio retrieval, and has great application value in fields such as remote teaching, digital libraries and news program retrieval. Audio classification is the basis and prerequisite for deeper processing of audio. Classification makes it possible to determine the acoustic environment of speech in advance and provides cues for adaptively adjusting the speech model, thereby improving the accuracy of speech recognition. The classification problem is therefore a core problem of content-based audio retrieval. Audio classification is an interdisciplinary research field that draws on the auditory characteristics of the human ear, signals and systems, digital signal processing, speech signal processing, pattern recognition, statistical learning, artificial intelligence and the like. Current research in this field focuses mainly on two aspects: audio feature analysis and extraction, and classifier design. Before automatic audio classification is performed, feature information must first be extracted from the original data. The selected features should adequately represent the important classification characteristics of the audio in the time-frequency domain, and should be robust and general with respect to environmental changes.

At present, industrial audio identification solutions on the market generally take too long and have low accuracy, which reduces the efficiency of processing industrial audio.

Disclosure of Invention

The invention aims to solve the technical problem of the prior art and provides an industrial production line audio identification method, an industrial production line audio identification device and a storage medium.

The technical scheme for solving the technical problems is as follows: an industrial production line audio identification method comprises the following steps:

importing a plurality of original audio data, and respectively performing data conversion processing on the plurality of original audio data to obtain a plurality of audio data to be processed;

respectively carrying out dimensionality reduction on the audio data to be processed to obtain a plurality of dimensionality reduction audio data;

respectively extracting the features of the dimension reduction audio data to obtain a plurality of audio feature data, and collecting the plurality of audio feature data to obtain an audio feature data set;

constructing a training model, and training the audio characteristic data set according to the training model to obtain an audio characteristic model;

updating parameters of the audio characteristic model to obtain an updated audio characteristic model;

and identifying the audio data to be identified according to the updated audio characteristic model to obtain an identification result.

Another technical solution of the present invention for solving the above technical problems is as follows: an industrial pipeline audio recognition device, comprising:

the data conversion processing module is used for importing a plurality of original audio data and respectively carrying out data conversion processing on the original audio data to obtain a plurality of audio data to be processed;

the dimensionality reduction processing module is used for respectively carrying out dimensionality reduction processing on the audio data to be processed to obtain a plurality of dimensionality reduction audio data;

the characteristic extraction module is used for respectively carrying out characteristic extraction on the dimension reduction audio data to obtain a plurality of audio characteristic data, and collecting the audio characteristic data to obtain an audio characteristic data set;

the model training module is used for constructing a training model and training the audio characteristic data set according to the training model to obtain an audio characteristic model;

the parameter updating module is used for updating parameters of the audio characteristic model to obtain an updated audio characteristic model;

and the identification result obtaining module is used for identifying the audio data to be identified according to the updated audio characteristic model to obtain an identification result.

The invention has the beneficial effects that: the method comprises the steps of obtaining a plurality of audio data to be processed by respectively converting a plurality of original audio data, obtaining a plurality of dimension reduction audio data by respectively carrying out dimension reduction processing on the plurality of audio data to be processed, obtaining an audio feature data set by respectively carrying out feature extraction on the plurality of dimension reduction audio data, obtaining an audio feature model according to training of the training model on the audio feature data set, improving the resolution accuracy of audio noise and silence, realizing quick and accurate determination of a small sample in a short time, replacing manual processing of component data, achieving the aims of high intelligence, high accuracy and high efficiency, and being simple in implementation method, suitable for general popularization and wide in market prospect.

Drawings

FIG. 1 is a schematic flow chart of an audio recognition method for an industrial pipeline according to an embodiment of the present invention;

FIG. 2 is a block diagram of an audio recognition apparatus for an industrial pipeline according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

Fig. 1 is a schematic flowchart of an audio identification method for an industrial pipeline according to an embodiment of the present invention.

As shown in fig. 1, an industrial pipeline audio identification method includes the following steps:

It should be understood that the raw audio data is collected by a sound collection device while the industrial pipeline equipment is running.

It should be understood that the audio feature model is verified according to preset judgment data to obtain a verification result, and the audio feature model is subjected to parameter updating according to the verification result to obtain an updated audio feature model.

Specifically, the audio class is determined comprehensively by comparison with a standard measurement baseline of a control group and by comparison of fitness and confidence scores. For the model of the invention, the evaluation criteria are the coefficient of determination and the mean squared error of prediction. A model with good prediction capability should have a high coefficient of determination on the prediction set and a low mean squared error of the predicted values. The mean squared error of prediction reflects the accuracy of the model; the coefficient of determination is the goodness of fit, indicating how well changes in the dependent variable can be predicted from changes in the independent variables.
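The two evaluation quantities named above, the decision coefficient (coefficient of determination, R^2) and the squared mean error, can be illustrated with a minimal sketch. It assumes the usual textbook definitions; the function names and the sample values are hypothetical and do not come from the patent.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; values near 1 indicate a good fit."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical reference scores and model predictions (illustrative only).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
mse = mean_squared_error(reference, predicted)
r2 = coefficient_of_determination(reference, predicted)
```

Under these definitions a well-fitted model shows a coefficient of determination close to 1 together with a small mean squared error.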

In the above embodiment, the plurality of audio data to be processed are obtained by respectively performing data conversion processing on the plurality of original audio data, the plurality of dimension reduction audio data are obtained by respectively performing dimensionality reduction processing on the plurality of audio data to be processed, the audio feature data set is obtained by respectively extracting features of the plurality of dimension reduction audio data, and the audio feature model is obtained by training the audio feature data set according to the training model. The resolution accuracy of audio noise and silence is improved, small samples can be determined quickly and accurately in a short time, manual processing of component data is replaced, and the aims of high intelligence, high accuracy and high efficiency are achieved; the implementation method is simple, suitable for general popularization, and has a wide market prospect.

Optionally, as an embodiment of the present invention, before the process of respectively performing data conversion processing on a plurality of pieces of original audio data to obtain a plurality of pieces of audio data to be processed, the process further includes:

and respectively carrying out character string recognition on the plurality of original audio data by utilizing a preset Python standard library to obtain a plurality of audio stream character strings.

It should be understood that a plurality of said audio data to be processed is stored.

It should be appreciated that Python is a cross-platform computer programming language. It is a high-level scripting language that combines interpreted, compiled, interactive and object-oriented capabilities. It was originally designed for writing automation scripts (shell), and with the continual release of new versions and the addition of new language features it is increasingly used for the development of independent, large-scale projects. The Python standard library is the library of modules distributed with Python.

Specifically, the one-dimensional WAV original audio data is read through the wave module of the Python standard library, and the audio stream character string carries the number of channels, the quantization bit depth, the sampling rate and the total number of sampling points. The character strings are converted into two-column arrays, which are stored as CSV files, namely the audio data to be processed, and the CSV files are stored in a database.
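The conversion step can be sketched with the `wave`, `struct` and `csv` modules of the Python standard library. This is an illustration under assumptions, not the patented implementation: the function name `wav_to_csv`, the restriction to mono 16-bit PCM, and the choice of (sample index, amplitude) as the two columns are all hypothetical, and the database storage step is omitted.

```python
import csv
import struct
import wave


def wav_to_csv(wav_path, csv_path):
    """Read a mono 16-bit WAV file and store (sample_index, amplitude) rows as CSV."""
    with wave.open(wav_path, "rb") as wav_file:
        n_channels = wav_file.getnchannels()       # number of channels
        sample_width = wav_file.getsampwidth()     # quantization, in bytes per sample
        frame_rate = wav_file.getframerate()       # sampling rate in Hz
        n_frames = wav_file.getnframes()           # total number of sampling points
        raw_bytes = wav_file.readframes(n_frames)  # audio stream as a byte string

    if n_channels != 1 or sample_width != 2:
        raise ValueError("this sketch only handles mono 16-bit PCM")

    # Unpack little-endian signed 16-bit samples from the byte string.
    samples = struct.unpack("<%dh" % n_frames, raw_bytes)

    # Assumed two-column layout: sample index and amplitude.
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["sample_index", "amplitude"])
        writer.writerows(enumerate(samples))

    return {"channels": n_channels, "bits": 8 * sample_width,
            "rate": frame_rate, "frames": n_frames}
```

The returned dictionary mirrors the four header values named in the description (channels, quantization bits, sampling rate, total sampling points).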

In the above embodiment, the preset Python standard library is used to identify the character strings of the plurality of original audio data respectively to obtain the plurality of audio stream character strings, so that subsequent data processing is facilitated, the resolution accuracy of audio noise and silence is improved, the short-time rapid and accurate determination of a small sample can be realized, manual processing of component data is replaced, the aims of high intelligence, high accuracy and high efficiency are fulfilled, the implementation method is simple, and the method is suitable for general popularization and has a wide market prospect.

Optionally, as an embodiment of the present invention, the process of performing dimension reduction processing on the multiple pieces of audio data to be processed respectively to obtain multiple pieces of dimension reduction audio data includes:

and performing dimensionality reduction on the plurality of audio data to be processed by utilizing a Principal Component Analysis (PCA) algorithm to obtain a plurality of dimensionality reduction audio data.

It should be appreciated that the optimal orthogonal transformation based on second order statistical properties of the data is performed by a PCA principal component analysis algorithm. The new components generated after the transformation are orthogonal or uncorrelated, so that the data is processed in a low-dimensional feature space, and most information of the original data is saved. Therefore, singular value decomposition is performed on the two-dimensional samples, and the largest component is selected from the obtained characteristic values to represent the whole two-dimensional array. After the PCA processing, the loss caused by operations such as noise, compression and the like of data can be effectively reduced.

Specifically, the PCA principal component analysis algorithm is a typical statistical analysis method that applies an optimal orthogonal transformation based on the second-order statistical characteristics of the data. The new components generated by the transformation are orthogonal, or uncorrelated, so that the data can be processed in a low-dimensional feature space while most of the information of the original data is kept. For two-dimensional samples X1 and X2, the sample mean is μ = (X1 + X2)/2 and the scatter matrix is S = (X1 − μ)(X1 − μ)^T + (X2 − μ)(X2 − μ)^T. With PCA, new components can be derived that are linear combinations of the original data and are uncorrelated with each other. When these new components are used to reconstruct the original data, the mean square error is minimized. The new coordinate system is therefore composed of the eigenvectors corresponding to the non-zero eigenvalues of the scatter matrix S. The eigenvalues and eigenvectors are solved by singular value decomposition. The obtained eigenvalues are arranged from largest to smallest and the largest component is taken, which expresses the whole data most accurately and thereby achieves the purpose of dimensionality reduction.
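The selection of the largest component can be illustrated for two-column data. The sketch below is an assumption-laden stand-in: instead of the singular value decomposition named in the description, it uses the closed-form eigen-decomposition of the 2x2 scatter matrix, which yields the same leading direction for this special case. The function name and the input layout (a list of (x, y) pairs) are hypothetical.

```python
import math


def first_principal_component(samples):
    """Project 2-D samples onto the direction of largest variance.

    samples: list of (x, y) pairs. Returns (unit_direction, projections).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n

    # Scatter matrix S = sum((X_i - mu)(X_i - mu)^T) = [[a, b], [b, c]].
    a = sum((x - mean_x) ** 2 for x, _ in samples)
    b = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    c = sum((y - mean_y) ** 2 for _, y in samples)

    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector; fall back to an axis when S is already diagonal.
    if b != 0:
        vx, vy = b, largest - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # One coordinate per sample: the centered sample's dot product with the direction.
    projections = [(x - mean_x) * direction[0] + (y - mean_y) * direction[1]
                   for x, y in samples]
    return direction, projections
```

Each two-dimensional sample is thus reduced to a single coordinate along the leading eigenvector, which is the sense in which the largest component represents the whole two-dimensional array.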

In the embodiment, the PCA principal component analysis algorithm is used for respectively carrying out dimensionality reduction on the plurality of audio data to be processed to obtain the plurality of dimensionality reduction audio data, so that the influence caused by noise is effectively reduced.

Optionally, as an embodiment of the present invention, the process of respectively performing feature extraction on the plurality of dimension reduction audio data to obtain a plurality of audio feature data includes:

constructing a spectrogram of the dimension reduction audio data to obtain a plurality of spectrograms;

and respectively extracting the features of the corresponding dimension reduction audio data according to the plurality of spectrogram to obtain audio feature data corresponding to the dimension reduction audio data.

In the above embodiment, a plurality of spectrograms are obtained by respectively constructing a spectrogram of each of the plurality of dimension reduction audio data, and the audio characteristic data corresponding to each dimension reduction audio data is obtained by extracting features of the corresponding dimension reduction audio data according to the spectrograms, so that the audio characteristic data can be subdivided in a distinguishing manner and different types of features are amplified. The resolution accuracy of audio noise and silence is improved, small samples can be determined quickly and accurately in a short time, manual processing of component data is replaced, and the aims of high intelligence, high accuracy and high efficiency are achieved; the implementation method is simple, suitable for general popularization, and has a wide market prospect.

Optionally, as an embodiment of the present invention, the process of respectively constructing spectrograms of the plurality of dimension reduction audio data to obtain a plurality of spectrograms includes:

importing a preset sliding window, and respectively performing framing processing on each dimensionality reduction audio data according to the sliding window to obtain framed audio data corresponding to each dimensionality reduction audio data;

respectively carrying out fast Fourier transform on the audio data after each frame is divided to obtain an audio data abscissa and an audio data frequency scale corresponding to each dimension reduction audio data;

respectively calculating power spectrum estimated values of the abscissa of each audio data by using a periodogram algorithm to obtain power spectrum estimated values corresponding to each dimensionality reduction audio data;

respectively calculating the quality scores of the power spectrum estimated values to obtain the quality scores corresponding to the dimensionality reduction audio data;

and constructing a two-dimensional graph according to each quality score and the corresponding audio data frequency scale to obtain a plurality of spectrogram.

It should be understood that, given that the dimension reduction audio data samples are short and mixed with noise, the invention uses a customized spectrogram to accurately sample frequency-domain parameters of the feature information of the dimension reduction audio data, such as formants and energy. The sampling rate is 8000 Hz, the window length is 512 data points, and the frame shift is 1/4 of the window length, i.e. 128 data points. Gabor filtering, which is a form of windowed Fourier transform, is then applied; the Gabor function can extract relevant features at different scales and in different directions in the frequency domain.
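The framing parameters stated above (512-point window, 128-point shift, 8000 Hz sampling rate) can be illustrated with a short sliding-window sketch. The function name `frame_signal`, the all-zero placeholder signal, and the decision to drop an incomplete trailing frame are assumptions made for illustration; the description does not say how the tail of a recording is handled.

```python
def frame_signal(samples, window_length=512, frame_shift=128):
    """Split a sample sequence into overlapping frames using a sliding window."""
    if window_length <= 0 or frame_shift <= 0:
        raise ValueError("window_length and frame_shift must be positive")
    frames = []
    start = 0
    # Keep only complete frames; trailing samples that do not fill a window are dropped.
    while start + window_length <= len(samples):
        frames.append(list(samples[start:start + window_length]))
        start += frame_shift
    return frames


SAMPLE_RATE_HZ = 8000
one_second = [0.0] * SAMPLE_RATE_HZ    # placeholder signal, one second long
frames = frame_signal(one_second)
frame_count = len(frames)              # (8000 - 512) // 128 + 1 complete frames
```

At 8000 Hz a 512-point window spans 64 ms and a 128-point shift spans 16 ms, so consecutive frames overlap by three quarters of their length.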

In the above embodiment, according to the sliding window, the audio data after framing corresponding to each piece of dimension-reduced audio data is obtained by framing each piece of dimension-reduced audio data, and the audio data abscissa and the audio data frequency scale corresponding to each piece of dimension-reduced audio data are obtained by fast fourier transform of each piece of framed audio data, so that audio characteristic data can be subdivided differently, different types of characteristics can be amplified, the resolution accuracy of audio noise and silence can be improved, short-time rapid and accurate measurement of a small sample can be realized, manual processing of component data is replaced, the purposes of high intelligence, high accuracy and high efficiency are achieved, the implementation method is simple, and the method is suitable for general popularization and has a wide market prospect.

Optionally, as an embodiment of the present invention, the calculating of the power spectrum estimation value by using a periodogram algorithm on the abscissa of each piece of audio data respectively to obtain the power spectrum estimation value corresponding to each piece of reduced-dimension audio data includes:

calculating power spectrum estimation values of the abscissa of each audio data through a first equation to obtain the power spectrum estimation value corresponding to each dimensionality reduction audio data, wherein the first equation is as follows:

Y＝X*X，

where X is the audio data abscissa and Y is the power spectrum estimate.

It should be appreciated that the periodogram algorithm is a method of estimating the power spectral density of a signal. Since the discrete Fourier transform of the sequence x(n) is periodic, the resulting power spectrum is also periodic, which is why it is commonly called a periodogram.
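The first equation can be illustrated as follows. The sketch reads Y = X*X as the squared magnitude of each Fourier coefficient, i.e. X multiplied by its complex conjugate; this reading is an assumption, since the description does not state how complex values are handled. A naive discrete Fourier transform stands in for the fast Fourier transform to keep the example dependency-free, and no normalization by frame length is applied because the formula shows none.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform; stands in for an FFT routine."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def periodogram(frame):
    """Power estimate per frequency bin: Y[k] = X[k] * conj(X[k]) = |X[k]|^2."""
    return [(x * x.conjugate()).real for x in dft(frame)]


# A cosine completing exactly 2 cycles in 16 samples puts its power in bins 2 and 14.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
power = periodogram(frame)
peak_bin = max(range(8), key=lambda k: power[k])   # search the non-mirrored half
```

The mirrored peak at bin 14 reflects the symmetry of the transform of a real-valued signal, so only the first half of the bins needs to be kept for a spectrogram.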

In the above embodiment, the power spectrum estimated value corresponding to each dimensionality reduction audio data is obtained by calculating the power spectrum estimated value of the abscissa of each audio data through the first equation, so that the audio characteristic data can be subdivided in a distinguishing manner and different types of characteristics are amplified. The resolution accuracy of audio noise and silence is improved, small samples can be determined quickly and accurately in a short time, manual processing of component data is replaced, and the aims of high intelligence, high accuracy and high efficiency are achieved; the implementation method is simple, suitable for general popularization, and has a wide market prospect.

Optionally, as an embodiment of the present invention, the step of calculating the quality score for each power spectrum estimation value to obtain the quality score corresponding to each dimensionality reduction audio data includes:

calculating the quality scores of the power spectrum estimated values respectively through a second formula to obtain the quality scores corresponding to the dimensionality reduction audio data, wherein the second formula is as follows:

M＝10*log10(Y)，

where M is the quality score and Y is the power spectrum estimate.
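The second formula is the familiar conversion of a power value to decibels. A minimal sketch follows; the function name and the small floor that guards against taking the logarithm of zero are assumptions added for numerical safety and are not part of the patent's formula.

```python
import math


def quality_score_db(power, floor=1e-12):
    """Second formula, M = 10 * log10(Y), with a small floor to avoid log10(0)."""
    return 10.0 * math.log10(max(power, floor))


# Each tenfold increase in power raises the score by 10; zero power hits the floor.
scores = [quality_score_db(y) for y in (1.0, 10.0, 100.0, 0.0)]
```

The logarithmic scale compresses the wide dynamic range of the power spectrum, which makes weak components easier to distinguish when the values are drawn as a two-dimensional graph.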

In the above embodiment, the quality scores corresponding to the dimensionality reduction audio data are obtained by calculating the quality scores of the estimated values of the power spectrums through the second formula, the audio characteristic data can be subdivided in a distinguishing manner, different types of characteristics can be amplified, the resolution accuracy of audio noise and silence can be improved, the small sample can be quickly and accurately determined in a short time, manual processing of component data is replaced, the purposes of high intelligence, high accuracy and high efficiency are achieved, the implementation method is simple, the method is suitable for general popularization, and the market prospect is wide.

Optionally, as an embodiment of the present invention, the constructing a training model, and training the audio feature data set according to the training model to obtain an audio feature model includes:

constructing a training model, wherein the training model comprises a 9x9 time domain-frequency domain filter, a 4x3 filter, a linear layer, an LSTM long-short term memory artificial neural network and a full-connection deep neural network;

sequentially inputting the audio characteristic data set into the 9x9 time domain-frequency domain filter and the 4x3 filter for filtering processing to obtain a filtered audio characteristic data set;

inputting the filtered audio characteristic data set into the linear layer for dimensionality reduction processing to obtain a dimensionality-reduced audio characteristic data set;

inputting the audio characteristic data set subjected to dimensionality reduction into the LSTM long-short term memory artificial neural network for data selection to obtain a selected audio characteristic data set;

and inputting the selected audio characteristic data set into the fully-connected deep neural network for characteristic extraction to obtain an audio characteristic model.

It should be understood that the invention adopts a semi-supervised learning method so as to preset the training parameters of the training model in the training process; the training parameters mainly include learning rate and iteration times.

Specifically, the CNN part of the invention consists of two CNN layers, each with 256 feature maps; the first layer uses a 9x9 time domain-frequency domain filter and the second layer uses a 4x3 filter, which highlight the features of the image so that they can be captured better. The pooling layer adopts a max-pooling strategy: the pooling size after the first layer is 3, and the second CNN layer is not followed by a pooling layer. Because the output dimension of the last CNN layer is large, namely feature maps × time × frequency, a linear layer is inserted after the CNN and before the LSTM for dimensionality reduction; experiments also show that this dimensionality reduction does not have a large influence on accuracy. The output of the linear layer has 256 dimensions.

The CNN is followed by 2 LSTM layers, which address the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. As the training model becomes deeper, iterative updating becomes more difficult, mainly because of the gradient problem; exploding and vanishing gradients cannot be avoided entirely, so the LSTM is introduced. The main function of the LSTM is selective conduction: through adaptive selection of information during training it reduces the difficulty of modifying parameters and, to a certain extent, delays the risk of vanishing gradients, which is why it is frequently used in deep models. Each LSTM layer uses 832 cells and a 512-dimensional projection layer for dimensionality reduction. The output state label is delayed by 5 frames, which allows the DNN output to predict the current frame better. Because of the input characteristics of the CNN, the input is extended by l frames to the left and r frames to the right; r is set to 0 to ensure that the LSTM does not use more than 5 frames of future context.

Finally, after frequency-domain and time-domain modeling, the output of the LSTM is connected to several fully-connected DNN layers that collect the features. Drawing on the application of CNNs in the image field, long-time and short-time features are combined: the input features of the CNN are fed directly to the LSTM as short-time features forming part of its input, and the output features of the CNN are fed directly to the DNN as part of its input features, so as to obtain the audio characteristic model.
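The statement that the last CNN layer outputs feature maps × time × frequency values, which the linear layer then reduces to 256 dimensions, can be made concrete with some shape bookkeeping. Everything beyond the filter sizes, pooling size, feature-map count and linear-layer width is an assumption: the sketch supposes stride-1 convolutions without padding, filter shapes written as frequency by time, non-overlapping pooling along frequency only, and a hypothetical input patch of 40 frequency bins by 11 frames. The patent does not specify any of these.

```python
def conv_output_size(input_size, kernel_size):
    """Output length of a stride-1 convolution without padding."""
    return input_size - kernel_size + 1


def cnn_output_dimension(freq_bins, frames, feature_maps=256, pool_size=3):
    """Flattened size of the second CNN layer's output under the stated assumptions."""
    # Layer 1: 9x9 filter, then max-pooling (assumed along frequency only).
    freq = conv_output_size(freq_bins, 9) // pool_size
    time = conv_output_size(frames, 9)
    # Layer 2: 4x3 filter (assumed 4 along frequency, 3 along time), no pooling.
    freq = conv_output_size(freq, 4)
    time = conv_output_size(time, 3)
    if freq <= 0 or time <= 0:
        raise ValueError("input patch too small for the two filters")
    return feature_maps * time * freq


LINEAR_LAYER_OUTPUT = 256                                   # stated in the description
flattened = cnn_output_dimension(freq_bins=40, frames=11)   # hypothetical input patch
reduction_factor = flattened / LINEAR_LAYER_OUTPUT
```

With this hypothetical patch the flattened CNN output is several times larger than the 256-dimensional linear layer output, which illustrates why a reduction step before the LSTM keeps the parameter count manageable.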

In the above embodiment, the audio feature data set is sequentially input into the 9x9 time-frequency domain filter and the 4x3 filter to be filtered to obtain a filtered audio feature data set, the filtered audio feature data set is input into the linear layer to be subjected to dimensionality reduction to obtain a dimensionality reduced audio feature data set, the dimensionality reduced audio feature data set is input into the LSTM long-short term memory artificial neural network to select data to obtain a selected audio feature data set, the selected audio feature data set is input into the fully-connected deep neural network to be subjected to feature extraction to obtain an audio feature model, so that the resolution accuracy of audio noise and silence is improved, the small sample can be quickly and accurately determined in a short time, the artificial processing of component data is replaced, the purposes of high intelligence, high accuracy and high efficiency are achieved, and the implementation method is simple, is suitable for general popularization and has wide market prospect.

Alternatively, as another embodiment of the present invention, as shown in fig. 2, an industrial pipeline audio recognition apparatus includes:

Optionally, as an embodiment of the present invention, before the data conversion processing module, the method further includes:

Alternatively, another embodiment of the present invention provides an industrial pipeline audio recognition apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the industrial pipeline audio recognition method as described above is implemented. The device may be a computer or the like.

Optionally, another embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the industrial pipeline audio recognition method as described above.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An industrial production line audio identification method is characterized by comprising the following steps:

2. The audio identification method for industrial pipelines according to claim 1, wherein before the process of respectively performing data conversion processing on the plurality of original audio data to obtain a plurality of audio data to be processed, the method further comprises:

3. The audio identification method for the industrial pipeline according to claim 1, wherein the step of performing the dimension reduction processing on the plurality of audio data to be processed respectively to obtain the plurality of dimension reduction audio data comprises:

4. The audio recognition method for industrial production line according to claim 3, wherein the step of performing feature extraction on the plurality of dimension-reduced audio data to obtain a plurality of audio feature data comprises:

5. The audio identification method for industrial production line according to claim 4, wherein the process of constructing a spectrogram by respectively performing spectrogram construction on the dimension-reduced audio data comprises:

6. The audio recognition method for industrial pipelines according to claim 5, wherein the step of calculating the power spectrum estimation value for each abscissa of the audio data by using a periodogram algorithm to obtain the power spectrum estimation value corresponding to each of the dimensionality-reduced audio data comprises:

Y＝X*X，

where X is the audio data abscissa and Y is the power spectrum estimate.

7. The industrial pipeline audio identification method according to claim 6, wherein the step of calculating the quality score for each power spectrum estimation value to obtain the quality score corresponding to each dimensionality reduction audio data comprises:

M＝10*log10(Y)，

where M is the quality score and Y is the power spectrum estimate.

8. The industrial pipeline audio recognition method of claim 1, wherein the constructing a training model and the training of the audio feature data set according to the training model to obtain an audio feature model comprises:

9. An industrial pipeline audio recognition device, comprising:

10. The industrial pipeline audio recognition device of claim 9, further comprising, prior to the data conversion processing module: