CN115579017A - Audio data processing method and device - Google Patents


Info

Publication number
CN115579017A
CN115579017A
Authority
CN
China
Prior art keywords
audio data
voice
data
processing method
noise filtering
Prior art date
Legal status
Pending
Application number
CN202211172793.6A
Other languages
Chinese (zh)
Inventor
李尔涵
孙云飞
成玉龙
瞿伟
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211172793.6A
Publication of CN115579017A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0272 Voice signal separating
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

An embodiment of the present application provides an audio data processing method and device in the field of audio processing. The method comprises the following steps: collecting audio data, performing a preprocessing operation, and classifying the preprocessed audio data according to a set application scenario; performing noise filtering on the classified audio data according to a set machine learning model; and, if the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, extracting voice features of the audio data to determine the corresponding single voice data and performing content compliance verification to obtain a content compliance verification result for the single voice data. With the method and device, single voice data can be accurately extracted for recognition and analysis.

Description

Audio data processing method and device
Technical Field
The present application relates to the field of audio processing, and in particular, to an audio data processing method and apparatus.
Background
With the popularization of audio and video software in daily life, speech content recognition is applied ever more widely. In banking, speech recognition can help monitor audio conferences and effectively identify whether non-compliant or sensitive information is present.
Existing speech recognition technology can, after noise filtering, extract the voice data of multiple speakers for recognition, but it cannot extract single voice data for analysis.
Disclosure of Invention
In view of the problems in the prior art, the present application provides an audio data processing method and device that can accurately extract single voice data for recognition and analysis.
In order to solve at least one of the above problems, the present application provides the following technical solutions:
in a first aspect, the present application provides an audio data processing method, including:
acquiring audio data, carrying out preprocessing operation, and classifying the audio data subjected to the preprocessing operation according to a set application scene;
carrying out noise filtering processing on the classified audio data according to a set machine learning model;
and if the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard threshold, extracting the voice characteristics of the audio data to determine the corresponding single voice data and performing content compliance verification to obtain a content compliance verification result of the single voice data.
Further, the acquiring audio data and performing preprocessing operations include:
pre-emphasis processing is carried out on the collected audio data through a preset first-order FIR high-pass digital filter;
and carrying out windowing and framing processing on the audio data subjected to the pre-emphasis processing to obtain the audio data subjected to the windowing and framing processing.
Further, the classifying the audio data after the preprocessing operation according to a set application scenario includes:
classifying the audio data subjected to the preprocessing operation according to at least one of a preset conference area, a conference mechanism and conference properties;
and determining the monitoring level of the audio data according to the classification result.
Further, before the noise filtering processing is performed on the classified audio data according to the set machine learning model, the method includes:
performing model training on a preset machine learning model according to preset standard human voice, music and background noise as training samples;
and obtaining a machine learning model according to the model training result.
Further, the performing noise filtering processing on the classified audio data according to the set machine learning model includes:
carrying out noise filtration on the classified data with the categories of music and background noise in the audio data according to a set machine learning model;
and obtaining audio data only containing human voice data according to the noise filtering result.
Further, if the signal-to-noise ratio of the audio data after the noise filtering process exceeds a preset standard threshold, extracting the voice feature of the audio data to determine the corresponding single voice data and performing content compliance verification, further comprising:
judging whether the signal-to-noise ratio of the audio data after the noise filtering processing exceeds the preset standard of 110 dB;
if so, performing voice feature extraction operation, otherwise, performing workflow rollback.
Further, the extracting the voice feature of the audio data to determine the corresponding single voice data includes:
carrying out Fourier transform on the audio data to obtain frequency spectrums of each frame;
performing power spectrum filtering on each frame spectrum according to a preset Mel filter bank and a scene type corresponding to the audio data;
and separating the multi-person voice signals according to the Mel frequency cepstrum coefficient obtained after the power spectrum filtering to obtain corresponding single-person voice data.
In a second aspect, the present application provides an audio data processing apparatus comprising:
the audio preprocessing module is used for acquiring audio data, performing preprocessing operation and classifying the audio data subjected to the preprocessing operation according to a set application scene;
the noise filtering processing module is used for carrying out noise filtering processing on the classified audio data according to a set machine learning model;
and the single recognition and verification module is used for extracting the voice characteristics of the audio data to determine the corresponding single voice data and carrying out content compliance verification if the signal-to-noise ratio of the audio data subjected to the noise filtering exceeds a preset standard threshold value, so as to obtain a content compliance verification result of the single voice data.
In a third aspect, the present application provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the audio data processing method when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the audio data processing method described above.
In a fifth aspect, the present application provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the audio data processing method described above.
According to the above technical solution, the audio data processing method and device collect audio data and perform a preprocessing operation, classify the preprocessed audio data according to a set application scenario, and perform noise filtering on the classified audio data according to a set machine learning model. If the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, voice features are extracted to determine the corresponding single voice data and content compliance verification is performed, obtaining a content compliance verification result for the single voice data, so that single voice data can be accurately extracted for recognition and analysis.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic flowchart of an audio data processing method according to an embodiment of the present application;
FIG. 2 is a second flowchart illustrating an audio data processing method according to an embodiment of the present application;
FIG. 3 is a third flowchart illustrating an audio data processing method according to an embodiment of the present application;
FIG. 4 is a fourth flowchart illustrating an audio data processing method according to an embodiment of the present application;
FIG. 5 is a fifth flowchart illustrating an audio data processing method according to an embodiment of the present application;
FIG. 6 is a sixth flowchart illustrating an audio data processing method according to an embodiment of the present application;
FIG. 7 is a seventh flowchart illustrating an audio data processing method according to an embodiment of the present application;
fig. 8 is a block diagram of an audio data processing apparatus in an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.
In the technical solution of the present application, the acquisition, storage, use, and processing of data comply with the relevant provisions of national laws and regulations.
In view of the problems in the prior art, the present application provides an audio data processing method and device: audio data are collected and preprocessed, and the preprocessed audio data are classified according to a set application scenario; noise filtering is performed on the classified audio data according to a set machine learning model; and, if the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, voice features are extracted to determine the corresponding single voice data and content compliance verification is performed, obtaining a content compliance verification result for the single voice data, so that single voice data can be accurately extracted for recognition and analysis.
In order to accurately extract single voice data for recognition and analysis, the present application provides an embodiment of an audio data processing method, and referring to fig. 1, the audio data processing method specifically includes the following contents:
step S101: the method comprises the steps of collecting audio data, carrying out preprocessing operation, and classifying the audio data after the preprocessing operation according to a set application scene.
Optionally, the present application may collect all voice data in the conference, perform the preprocessing operation on the input data, and then classify the voice data according to different application scenarios.
Step S102: and carrying out noise filtering processing on the classified audio data according to a set machine learning model.
Optionally, the application may build a machine learning model to distinguish and filter speech from noise.
Step S103: and if the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard threshold, extracting the voice characteristics of the audio data to determine the corresponding single voice data and performing content compliance verification to obtain a content compliance verification result of the single voice data.
Optionally, in the filtered data, the application may determine whether the speech signal-to-noise ratio meets a preset standard, and if so, extract the speech features to obtain the individual speech data of each participant, and then identify the content compliance of the speech data of each participant and perform corresponding operations on the identified result.
Optionally, with the aid of artificial intelligence technology, the method and the device can preprocess various kinds of audio conference data according to different application scenarios, using factors such as conference area, conference mechanism, and conference nature, and can add, through modeling, enough noise types for recognition and filtering. This addresses the weak anti-interference capability of existing speech recognition technology, so that non-compliant and sensitive information in a voice conference can be effectively identified and intervened upon in a targeted manner.
As can be seen from the above description, the audio data processing method provided in the embodiment of the present application collects audio data and performs a preprocessing operation, classifies the preprocessed audio data according to a set application scenario, and performs noise filtering on the classified audio data according to a set machine learning model. If the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, voice features are extracted to determine the corresponding single voice data and content compliance verification is performed, obtaining a content compliance verification result for the single voice data, so that single voice data can be accurately extracted for recognition and analysis.
In an embodiment of the audio data processing method of the present application, referring to fig. 2, the following may be further included:
step S201: and pre-emphasis processing is carried out on the collected audio data through a preset first-order FIR high-pass digital filter.
Step S202: and carrying out windowing and framing processing on the audio data subjected to the pre-emphasis processing to obtain the audio data subjected to the windowing and framing processing.
Optionally, the preprocessing operation on the speech data includes pre-emphasis, windowing, and framing. Pre-emphasis is realized by a first-order FIR high-pass digital filter to increase the high-frequency resolution of the speech; windowing and framing improve the continuity and stationarity of the speech data.
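As an illustration, the pre-emphasis and windowing/framing described above can be sketched as follows (a minimal sketch: the coefficient 0.97, the 400-sample frame length, and the 160-sample hop are assumed typical values and are not specified in this application):

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order FIR high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming window
    to each frame, improving short-time continuity and stationarity."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)
```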
In an embodiment of the audio data processing method of the present application, referring to fig. 3, the following may be further included:
step S301: and classifying the audio data subjected to the preprocessing operation according to at least one of a preset conference area, a conference mechanism and conference properties.
Step S302: and determining the monitoring level of the audio data according to the classification result.
Optionally, the collected voice data are classified according to different application scenarios. Specifically, conference scenes can be classified according to the conference area into international conferences and domestic conferences, with the monitoring level of an international conference higher than that of a domestic conference.
Conference scenes can also be classified according to the conference mechanism, for example into the head office, first-level institutions, second-level institutions, and so on. The higher the organizational level, the higher the monitoring level.
Conference scenes can likewise be classified according to conference properties into decision-making conferences, negotiation conferences, and daily office conferences. Among these three scenes, the decision-making conference has the highest monitoring level and the daily office conference the lowest.
After scene classification is completed, the audio conference monitoring system applies, according to its preset configuration, a different monitoring level and corresponding monitoring intensity to each conference scene.
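The scene-to-monitoring-level mapping described above can be sketched as a simple lookup (the numeric levels are assumptions for illustration; only their relative ordering follows the description):

```python
# Illustrative monitoring-level tables; higher number = stricter monitoring.
AREA_LEVEL = {"international": 3, "domestic": 1}
ORG_LEVEL = {"head_office": 3, "first_level": 2, "second_level": 1}
NATURE_LEVEL = {"decision": 3, "negotiation": 2, "daily_office": 1}

def monitoring_level(area, organization, nature):
    """Combine the three scene dimensions into one monitoring level."""
    return AREA_LEVEL[area] + ORG_LEVEL[organization] + NATURE_LEVEL[nature]
```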
In an embodiment of the audio data processing method of the present application, referring to fig. 4, the following may be further included:
step S401: and performing model training on the preset machine learning model according to preset standard human voice, music and background noise as training samples.
Step S402: and obtaining a machine learning model according to the model training result.
Optionally, the present application may build the machine learning model using a convolutional neural network (CNN) and connectionist temporal classification (CTC), adding training samples to the model in the training stage. The training samples cover the following three sound sources: standard human voice, music, and background noise. The model is trained on these samples and a sound classifier is designed that, frame by frame, assigns each audio frame to one of the three sound-source categories. After the training stage ends, the trained model is saved.
After the model is trained, the proportions of the three sound sources in the training sample are adjusted to form a validation set. The data in the validation set are fed to the model to evaluate its performance, and the proportions of the three sound sources in the validation set are adjusted repeatedly until the model performs best on the validation set.
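The application builds its classifier with a CNN and CTC; as a lightweight stand-in that illustrates the same idea of a per-frame three-way sound classifier, one can sketch a nearest-centroid classifier over frame features (purely illustrative, not the patent's model):

```python
import numpy as np

CLASSES = ("voice", "music", "background_noise")

def train_centroids(frames_by_class):
    """frames_by_class: dict mapping class name -> (n_frames, n_features)
    array. Returns one mean feature vector (centroid) per sound source."""
    return {c: feats.mean(axis=0) for c, feats in frames_by_class.items()}

def classify_frame(frame, centroids):
    """Assign the frame to the nearest centroid, i.e. to one of the
    three sound sources used as training samples."""
    return min(centroids, key=lambda c: np.linalg.norm(frame - centroids[c]))
```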
In an embodiment of the audio data processing method of the present application, referring to fig. 5, the following may be further included:
step S501: and carrying out noise filtration on the data with the categories of music and background noise in the classified audio data according to a set machine learning model.
Step S502: and obtaining audio data only containing human voice data according to the noise filtering result.
Optionally, the log energy and mel-frequency cepstral coefficients (MFCC) of the data, with the conference scene as the dimension, may be extracted as speech features, and the sound classifier used to assign a category to each audio frame of these data. The voice data are then put into the model for training, distinguishing which voice data are conference speech and which are conference music and background noise.
Using the result trained on the data with the conference scene as the dimension, the data whose category is music or background noise are filtered out of the voice data frame by frame; the remaining data, i.e., the voice data of all participants in the conference, serve as the valid data of the audio conference.
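The frame-by-frame filtering described above amounts to a masking operation, sketched below (the label names are assumptions for illustration):

```python
import numpy as np

def keep_voice_frames(frames, labels):
    """Keep only the frames the classifier labelled as human voice,
    dropping music and background-noise frames."""
    mask = np.array([label == "voice" for label in labels])
    return frames[mask]
```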
In an embodiment of the audio data processing method of the present application, referring to fig. 6, the following may be further included:
step S601: and judging whether the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard of 110dB.
Step S602: if so, performing voice feature extraction operation, otherwise, performing workflow rollback.
Optionally, the method may calculate the signal-to-noise ratio of the voice in the noise-filtered data and determine whether it meets the preset standard of 110 dB. If so, the process enters the feature-parameter extraction stage; if not, it returns to the noise recognition and filtering stage, and music and background noise continue to be filtered from the voice data until the preset standard is met.
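The signal-to-noise check can be sketched as follows (a minimal sketch; how the residual-noise estimate is obtained is left out and assumed to come from the filtering stage, and the 110 dB threshold is the one stated above):

```python
import numpy as np

def snr_db(voice, residual_noise):
    """Signal-to-noise ratio in dB between the voice estimate and the
    noise estimated to remain after filtering."""
    p_sig = np.mean(voice ** 2)
    p_noise = np.mean(residual_noise ** 2)
    return 10.0 * np.log10(p_sig / p_noise)

def meets_standard(voice, residual_noise, threshold_db=110.0):
    """Gate for the feature-extraction stage; below the threshold the
    workflow rolls back to noise recognition and filtering."""
    return snr_db(voice, residual_noise) >= threshold_db
```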
In an embodiment of the audio data processing method of the present application, referring to fig. 7, the following may be further included:
step S701: and carrying out Fourier transform on the audio data to obtain the frequency spectrum of each frame.
Step S702: and performing power spectrum filtering on each frame spectrum according to a preset Mel filter bank and a scene type corresponding to the audio data.
Step S703: and separating the multi-person voice signals according to the Mel frequency cepstrum coefficient obtained after the power spectrum filtering to obtain corresponding single-person voice data.
The valid data include the voice information of all participants in the conference. The voice data of each participant are then extracted from these data in the following steps:
1. Perform a Fourier transform on the valid data to obtain the spectrum of each frame.
2. Design a mel filter bank, setting upper and lower frequency limits according to the selected conference scene, and smooth the spectrum; then filter the power spectrum of each frame with the mel filters.
3. Sum the filtered energies of each frame, take the logarithm, and perform a discrete cosine transform to extract the mel-frequency cepstral coefficients (MFCC).
4. Use the extracted mel-frequency cepstral coefficients to separate the multi-person voice signals, masking the voices of other speakers to obtain the individual voice data of each participant.
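Steps 1-3 can be sketched as a dependency-free MFCC computation (the sample rate, filter count, and coefficient count are assumed typical values; in the method above the frequency limits f_low/f_high would be chosen per conference scene):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, f_low, f_high):
    """Triangular mel filters between f_low and f_high (the scene-dependent
    lower and upper frequency limits)."""
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(frame, sr=16000, n_filters=26, n_ceps=13, f_low=0.0, f_high=8000.0):
    """FFT power spectrum -> mel filtering -> log -> DCT (steps 1-3 above)."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2 / n_fft
    fb = mel_filterbank(n_filters, n_fft, sr, f_low, f_high)
    energies = np.log(fb @ power + 1e-10)
    # Type-II DCT implemented directly to avoid extra dependencies
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ energies
```

The speaker-separation step (step 4) would then cluster or match these MFCC vectors per speaker; that part is not sketched here.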
Optionally, in other embodiments of the present application, the content compliance of each participant's voice data may be identified according to the monitoring level determined in the conference scene classification step. When non-compliant content or sensitive information is detected in the data of a given voice feature for the first time, the speech recognition system warns the participant; when it is detected in the data of that voice feature a second time, the speech recognition system removes the participant from the conference.
In order to accurately extract single voice data for recognition and analysis, the present application provides an embodiment of an audio data processing apparatus for implementing all or part of the content of the audio data processing method, and referring to fig. 8, the audio data processing apparatus specifically includes the following contents:
the audio preprocessing module 10 is configured to collect audio data, perform preprocessing operation, and classify the audio data after the preprocessing operation according to a set application scenario.
And a noise filtering module 20, configured to perform noise filtering processing on the classified audio data according to a set machine learning model.
And the single recognition and verification module 30 is configured to extract the voice feature of the audio data to determine the corresponding single voice data and perform content compliance verification if the signal-to-noise ratio of the audio data after the noise filtering process exceeds a preset standard threshold, so as to obtain a content compliance verification result of the single voice data.
As can be seen from the above description, the audio data processing apparatus provided in the embodiment of the present application collects audio data and performs a preprocessing operation, classifies the preprocessed audio data according to a set application scenario, and performs noise filtering on the classified audio data according to a set machine learning model. If the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, voice features are extracted to determine the corresponding single voice data and content compliance verification is performed, obtaining a content compliance verification result for the single voice data, so that single voice data can be accurately extracted for recognition and analysis.
In terms of hardware, in order to accurately extract single-person voice data for recognition and analysis, the present application provides an embodiment of an electronic device for implementing all or part of the contents in the audio data processing method, where the electronic device specifically includes the following contents:
a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface communicate with one another through the bus. The communication interface is used for information transmission between the audio data processing device and related equipment such as a core service system, user terminals, and related databases. The electronic device may be a desktop computer, a tablet computer, a mobile terminal, and the like, but this embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiment of the audio data processing method and the embodiment of the audio data processing apparatus, the contents of which are incorporated herein; repeated descriptions are omitted.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), a vehicle-mounted device, a smart wearable device, and the like. Wherein, intelligence wearing equipment can include intelligent glasses, intelligent wrist-watch, intelligent bracelet etc..
In practical applications, part of the audio data processing method may be performed on the electronic device side as described above, or all operations may be performed in the client device. The selection may be specifically performed according to the processing capability of the client device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. The client device may further include a processor if all operations are performed in the client device.
The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
Fig. 9 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 9, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 9 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the audio data processing method functions may be integrated into the central processor 9100. The central processor 9100 may be configured to control as follows:
step S101: the method comprises the steps of collecting audio data, carrying out preprocessing operation, and classifying the audio data after the preprocessing operation according to a set application scene.
Step S102: and carrying out noise filtering processing on the classified audio data according to a set machine learning model.
Step S103: and if the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard threshold, extracting the voice characteristics of the audio data to determine the corresponding single voice data and performing content compliance verification to obtain a content compliance verification result of the single voice data.
As can be seen from the above description, the electronic device provided in the embodiment of the present application acquires audio data, performs a preprocessing operation, and classifies the preprocessed audio data according to a set application scenario; performs noise filtering on the classified audio data according to a set machine learning model; and, if the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, extracts the voice features of the audio data to determine the corresponding single-person voice data and performs content compliance verification, obtaining a content compliance verification result for the single-person voice data. In this way, single-person voice data can be accurately extracted for recognition and analysis.
In another embodiment, the audio data processing apparatus may be configured separately from the central processor 9100, for example, the audio data processing apparatus may be configured as a chip connected to the central processor 9100, and the audio data processing method function is realized by the control of the central processor.
As shown in fig. 9, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 does not necessarily include all of the components shown in fig. 9; in addition, the electronic device 9600 may further include components not shown in fig. 9, for which reference may be made to the prior art.
As shown in fig. 9, the central processor 9100, sometimes referred to as a controller or operation controller, can include a microprocessor or other processor device and/or logic device; the central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store fault-related information as well as the programs that process that information, and the central processor 9100 can execute such programs stored in the memory 9140 to implement information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid-state memory, e.g., Read-Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, and that can be selectively erased and rewritten with new data; an example of such a memory is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer) and an application/function storage portion 9142, which stores the application programs and function programs, or the flow, by which the central processor 9100 executes the operations of the electronic device 9600.
The memory 9140 can also include a data storage portion 9143 for storing data such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers of the electronic device for the communication function and/or for executing other functions of the electronic device (e.g., a messaging application, a contact book application, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements all the steps of the audio data processing method in the foregoing embodiments, whether the execution subject is the server or the client. For example, when the processor executes the computer program, the following steps are implemented:
step S101: the method comprises the steps of collecting audio data, carrying out preprocessing operation, and classifying the audio data after the preprocessing operation according to a set application scene.
Step S102: and carrying out noise filtering processing on the classified audio data according to a set machine learning model.
Step S103: and if the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard threshold, extracting the voice characteristics of the audio data to determine the corresponding single voice data and performing content compliance verification to obtain a content compliance verification result of the single voice data.
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application acquires audio data, performs a preprocessing operation, and classifies the preprocessed audio data according to a set application scenario; performs noise filtering on the classified audio data according to a set machine learning model; and, if the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, extracts the voice features of the audio data to determine the corresponding single-person voice data and performs content compliance verification, obtaining a content compliance verification result for the single-person voice data. In this way, single-person voice data can be accurately extracted for recognition and analysis.
Embodiments of the present application further provide a computer program product capable of implementing all the steps of the audio data processing method in the foregoing embodiments, whether the execution subject is the server or the client; when executed by a processor, the computer program/instructions implement the steps of the audio data processing method, for example, the following steps:
step S101: the method comprises the steps of collecting audio data, carrying out preprocessing operation, and classifying the audio data after the preprocessing operation according to a set application scene.
Step S102: and carrying out noise filtering processing on the classified audio data according to a set machine learning model.
Step S103: and if the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard threshold, extracting the voice characteristics of the audio data to determine the corresponding single voice data and performing content compliance verification to obtain a content compliance verification result of the single voice data.
As can be seen from the foregoing description, the computer program product provided in the embodiment of the present application acquires audio data, performs a preprocessing operation, and classifies the preprocessed audio data according to a set application scenario; performs noise filtering on the classified audio data according to a set machine learning model; and, if the signal-to-noise ratio of the noise-filtered audio data exceeds a preset standard threshold, extracts the voice features of the audio data to determine the corresponding single-person voice data and performs content compliance verification, obtaining a content compliance verification result for the single-person voice data. In this way, single-person voice data can be accurately extracted for recognition and analysis.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (11)

1. A method of audio data processing, the method comprising:
acquiring audio data, carrying out preprocessing operation, and classifying the audio data subjected to the preprocessing operation according to a set application scene;
carrying out noise filtering processing on the classified audio data according to a set machine learning model;
and if the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard threshold, extracting the voice characteristics of the audio data to determine the corresponding single voice data and performing content compliance verification to obtain a content compliance verification result of the single voice data.
2. The audio data processing method of claim 1, wherein the acquiring audio data and performing pre-processing operations comprises:
pre-emphasis processing is carried out on the collected audio data through a preset first-order FIR high-pass digital filter;
and carrying out windowing and framing processing on the audio data subjected to the pre-emphasis processing to obtain the audio data subjected to the windowing and framing processing.
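A minimal sketch of the two preprocessing steps of claim 2, assuming a pre-emphasis coefficient of 0.97, a Hamming window, and a 25 ms frame with 10 ms hop at 16 kHz (common defaults; the claim does not specify the filter coefficient, window type, or frame sizes):

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    # First-order FIR high-pass digital filter: y[n] = x[n] - alpha * x[n-1]
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    # Split the signal into overlapping frames, then apply a Hamming window to each.
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)
```

The output is the windowed-and-framed audio data that the later classification and noise-filtering stages operate on.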
3. The audio data processing method of claim 1, wherein the classifying the audio data after the preprocessing operation according to a set application scenario comprises:
classifying the audio data subjected to the preprocessing operation according to at least one of a preset conference area, a conference mechanism and conference properties;
and determining the monitoring level of the audio data according to the classification result.
4. The audio data processing method according to claim 1, wherein before the noise filtering the classified audio data according to the set machine learning model, the method comprises:
performing model training on a preset machine learning model according to preset standard human voice, music and background noise as training samples;
and obtaining a machine learning model according to the model training result.
5. The audio data processing method according to claim 1, wherein the performing noise filtering processing on the classified audio data according to a set machine learning model includes:
carrying out noise filtration on data with the categories of music and background noise in the classified audio data according to a set machine learning model;
and obtaining audio data only containing human voice data according to the noise filtering result.
6. The audio data processing method of claim 1, wherein if the snr of the audio data after the noise filtering process exceeds a preset standard threshold, extracting the speech feature of the audio data to determine a corresponding single-person speech data and performing content compliance verification, further comprising:
judging whether the signal-to-noise ratio of the audio data after the noise filtering processing exceeds a preset standard of 110 dB;
if so, performing voice feature extraction operation, otherwise, performing workflow rollback.
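The gate of claim 6 can be illustrated as follows; the 110 dB standard is taken from the claim, while the function names and the representation of the signal/noise components are assumptions made for the sketch:

```python
import numpy as np

SNR_STANDARD_DB = 110.0  # preset standard stated in claim 6

def snr_db(signal: np.ndarray, residual_noise: np.ndarray) -> float:
    # Signal-to-noise ratio of the noise-filtered audio, in decibels.
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(residual_noise ** 2))

def gate(signal: np.ndarray, residual_noise: np.ndarray) -> str:
    # Proceed to voice feature extraction only above the standard;
    # otherwise roll the workflow back.
    if snr_db(signal, residual_noise) > SNR_STANDARD_DB:
        return "extract_features"
    return "workflow_rollback"
```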
7. The audio data processing method of claim 1, wherein the extracting the voice feature of the audio data to determine the corresponding single person voice data comprises:
carrying out Fourier transform on the audio data to obtain the frequency spectrum of each frame;
performing power spectrum filtering on each frame spectrum according to a preset Mel filter bank and a scene type corresponding to the audio data;
and separating the multi-person voice signals according to the Mel frequency cepstrum coefficient obtained after the power spectrum filtering to obtain corresponding single-person voice data.
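The Fourier transform → Mel filter bank → cepstral coefficient chain of claim 7 can be sketched for a single windowed frame as follows. The FFT length, number of filters, coefficient count, and sample rate are illustrative assumptions, and the claimed scene-dependent filter selection and multi-speaker separation are omitted:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters with centers evenly spaced on the Mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(frame, n_fft=512, n_filters=26, n_coeffs=13, sr=16000):
    # Power spectrum of one windowed frame (Fourier transform step).
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
    # Mel filter-bank energies (power spectrum filtering step), then log.
    energies = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    # DCT-II decorrelates the log energies into Mel-frequency cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ energies
```

In the claimed method these per-frame coefficients would then feed the separation of multi-person voice signals into single-person voice data.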
8. An audio data processing apparatus, comprising:
the audio preprocessing module is used for acquiring audio data, performing preprocessing operation and classifying the audio data subjected to the preprocessing operation according to a set application scene;
the noise filtering processing module is used for carrying out noise filtering processing on the classified audio data according to a set machine learning model;
and the single recognition and verification module is used for extracting the voice characteristics of the audio data to determine the corresponding single voice data and carrying out content compliance verification if the signal-to-noise ratio of the audio data subjected to the noise filtering exceeds a preset standard threshold value, so as to obtain a content compliance verification result of the single voice data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the audio data processing method according to any of claims 1 to 7 are implemented when the processor executes the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the audio data processing method of any one of claims 1 to 7.
11. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the audio data processing method of any of claims 1 to 7.
CN202211172793.6A 2022-09-26 2022-09-26 Audio data processing method and device Pending CN115579017A (en)


Publications (1)

Publication Number Publication Date
CN115579017A true CN115579017A (en) 2023-01-06

Family

ID=84583023



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination