CN111241335A - Audio advertisement detection method, system, mobile terminal and storage medium

Info

Publication number: CN111241335A
Application number: CN202010013182.1A
Authority: CN
Inventors: 陈剑超; 肖龙源; 李稀敏; 蔡振华; 刘晓葳
Original assignee: Xiamen Kuaishangtong Technology Co Ltd
Current assignee: Xiamen Kuaishangtong Technology Co Ltd
Priority date: 2020-01-07
Filing date: 2020-01-07
Publication date: 2020-06-05

Abstract

The invention provides an audio advertisement detection method, a system, a mobile terminal and a storage medium. The method comprises the following steps: acquiring audio data and performing feature extraction to obtain audio features; performing matrix calculation on the audio features to obtain a self-similarity matrix; setting the maximum peak point in the self-similarity matrix as a similar segment reference point, and querying similar points in the self-similarity matrix according to the similar segment reference point; and setting the set of queried similar points as an advertisement segment according to the query result, and performing advertisement elimination processing on the audio data according to the advertisement segment. By computing a self-similarity matrix from the audio features, the invention can set the similar segment reference point effectively, query similar points automatically from that reference point, and obtain and remove advertisement segments based on the query result.

Description

Audio advertisement detection method, system, mobile terminal and storage medium

Technical Field

The invention belongs to the technical field of audio detection, and particularly relates to an audio advertisement detection method, an audio advertisement detection system, a mobile terminal and a storage medium.

Background

With the development and popularization of the internet, a large amount of information has accumulated online, including a large amount of speech audio. Much of this speech audio carries advertisements. When a user plays audio on demand, audio containing advertisements can significantly degrade the experience. Take story machine products as an example: their core function is to return the story audio that the user requests. The story audio database of a story machine holds thousands of albums containing hundreds of thousands of audio files. However, the quality of the audio is uneven, and many files contain advertising information from third parties. If a low-quality story resource containing advertisements is played when the user requests a story, the user has a poor on-demand experience. Therefore, how to quickly screen out audio containing advertisements when data is put into storage is a problem worth solving.

The existing audio advertisement detection method uses template matching based on audio fingerprints. Workers listen to audio data, manually cut out advertisement segments, and store them in a database as audio fingerprints. When an advertisement detection task is run on audio data, audio fingerprints are extracted from the broadcast audio and compared one by one with the audio fingerprints in the database.

If an audio fingerprint matches, the detected audio segment is an advertisement segment, and specific information about the advertisement can also be determined. If fingerprint matching fails, the detected audio is not an advertisement recorded in the audio fingerprint database. Because audio data contains many kinds of advertisements, the audio fingerprint library built by the workers often becomes very large. During advertisement detection, every audio segment under test must be compared against every audio fingerprint in the library, which makes audio advertisement detection inefficient.

Disclosure of Invention

The embodiment of the invention aims to provide an audio advertisement detection method, an audio advertisement detection system, a mobile terminal and a storage medium, so as to solve the problems of low detection efficiency and long processing time in the existing audio advertisement detection method.

The embodiment of the invention is realized in such a way that an audio advertisement detection method comprises the following steps:

acquiring audio data, and performing feature extraction on the audio data to obtain audio features;

performing matrix calculation on the audio features to obtain a self-similarity matrix;

setting the maximum peak point in the self-similarity matrix as a similar segment reference point, and querying similar points in the self-similarity matrix according to the similar segment reference point;

and setting the set of queried similar points as an advertisement segment according to the query result, and performing advertisement elimination processing on the audio data according to the advertisement segment.

Further, the step of querying similar points in the self-similarity matrix according to the similar segment reference point comprises:

setting the similar segment reference point as a midpoint, and extending equidistantly along the diagonal of the self-similarity matrix to obtain a start position and an end position;

calculating the similarity between the start position and the end position to obtain a similarity value;

judging whether the similarity value is smaller than a similarity threshold;

stopping the extension of the start position and the end position when the similarity value is judged to be smaller than the similarity threshold;

and setting the points between the start position and the end position as the similar points.

Further, after the step of performing advertisement elimination processing on the audio data according to the advertisement segments, the method further includes:

performing feature calculation on the audio signals in the audio data to obtain audio feature vectors;

inputting the audio feature vector into a gradient boosting tree model, and controlling the gradient boosting tree model to classify all audio frames of the audio data;

and when the classification result of an audio frame is judged to be the advertisement class, marking the audio frame as an advertisement frame, and deleting the consecutive advertisement frames in the audio signal.

Further, the step of performing feature calculation on the audio signal in the audio data comprises:

performing framing and windowing on the audio signal, and extracting MFCC (Mel-frequency cepstral coefficient) features, zero-crossing rate features, short-time energy features, energy entropy features, spectral centroid features, spectral spread features and spectral flux features;

concatenating the MFCC features, the zero-crossing rate features, the short-time energy features, the energy entropy features, the spectral centroid features, the spectral spread features and the spectral flux features into a vector to obtain the audio feature vector.

Further, the step of performing a matrix calculation on the audio features comprises:

performing cosine calculation on the audio features to obtain cosine similarity;

and looking up a target distance formula according to the cosine similarity, and performing matrix calculation on the audio features according to the target distance formula.

Further, before the step of setting the maximum peak point in the self-similarity matrix as the similar segment reference point, the method further includes:

performing convolution processing on the self-similarity matrix to remove outliers from the self-similarity matrix.

Further, after the step of extracting the features of the audio data, the method further includes:

classifying the audio features, and performing audio segmentation according to a classification result to obtain a plurality of groups of different audio features;

and sequentially carrying out matrix calculation on the audio features of different categories.

It is another object of an embodiment of the present invention to provide an audio advertisement detection system, which includes:

the characteristic extraction module is used for acquiring audio data and extracting the characteristics of the audio data to obtain audio characteristics;

the matrix calculation module is used for performing matrix calculation on the audio features to obtain a self-similarity matrix;

the similar point query module is used for setting the maximum peak point in the self-similar matrix as a similar segment reference point and performing similar point query in the self-similar matrix according to the similar segment reference point;

and the advertisement removal module is used for setting the set of queried similar points as an advertisement segment according to the query result and performing advertisement elimination processing on the audio data according to the advertisement segment.

Another object of an embodiment of the present invention is to provide a mobile terminal, including a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal execute the above audio advertisement detection method.

Another object of an embodiment of the present invention is to provide a storage medium, which stores a computer program used in the mobile terminal, wherein the computer program, when executed by a processor, implements the steps of the audio advertisement detection method.

According to the embodiment of the invention, a self-similarity matrix is computed from the audio features so that the similar segment reference point can be set effectively; similar points are then queried automatically from the similar segment reference point, and advertisement segments are obtained and removed based on the query result.

Drawings

FIG. 1 is a flowchart of an audio advertisement detection method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of an audio advertisement detection method according to a second embodiment of the present invention;

FIG. 3 is a flowchart of an audio advertisement detection method according to a third embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an audio advertisement detection system according to a fourth embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a mobile terminal according to a fifth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Example one

Please refer to fig. 1, which is a flowchart illustrating an audio advertisement detection method according to a first embodiment of the present invention, including the steps of:

step S10, audio data are obtained, and feature extraction is carried out on the audio data to obtain audio features;

wherein the step of performing feature extraction on the audio data comprises: preprocessing the audio data; after preprocessing, performing silence-frame detection on the audio features; and, when the audio corresponding to the audio features is judged to be a silence frame, judging the type of the audio data to be silent audio;

when the audio corresponding to the audio features is judged not to be a silence frame, inputting the audio features into a classifier to determine the type attribute of the corresponding audio data, wherein the type attribute may be pure music, background sound, noise, pure speech, noisy speech, and the like;

specifically, in this step, the preprocessing steps may be set as required. In this embodiment, the preprocessing includes: pre-emphasis, framing, windowing, fast Fourier transform, triangular band-pass filtering, discrete cosine transform (DCT) to obtain the MFCC coefficients, calculation of log energy, and extraction of dynamic difference (delta) parameters to obtain the audio features; that is, the features obtained in this embodiment are Mel-frequency cepstral coefficients (abbreviated as MFCC features);
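The first three preprocessing stages listed above (pre-emphasis, framing, windowing) can be sketched as follows. This is a minimal illustration only and not the implementation of the embodiment: the function names, the pre-emphasis coefficient 0.97, the Hamming window, and the frame and hop lengths (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz sampling rate) are all assumptions, since the source does not specify them. The later stages (FFT, filter bank, DCT, delta parameters) are omitted.

```python
import math

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop_len):
    """Split the signal into overlapping frames, dropping an incomplete tail frame."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop_len)]

def hamming_window(frame_len):
    """Hamming window coefficients (the window type is an assumption)."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]

def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame, and window a mono signal given as a list of floats."""
    emphasized = pre_emphasize(signal, alpha)
    window = hamming_window(frame_len)
    return [[x * w for x, w in zip(frame, window)]
            for frame in frame_signal(emphasized, frame_len, hop_len)]
```

Each returned frame would then be passed to the spectral stages of the MFCC pipeline.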

step S20, performing matrix calculation on the audio features to obtain a self-similarity matrix, and setting the maximum peak point in the self-similarity matrix as a reference point of a similar segment;

in the self-similarity matrix, the numerical values on the line segments distributed along the diagonal line represent the similarity of the audio contents of the corresponding horizontal axis time period and the vertical axis time period, so that the query of the advertisement segment can be effectively carried out by setting the maximum peak point in the self-similarity matrix as the reference point of the similar segment;

preferably, because fluctuations in the similarity values also produce peak points in regions of the self-similarity matrix that do not correspond to similar segments, this step determines the similar segment reference point by taking the maximum peak point of the matrix, which effectively improves the accuracy of setting the similar segment reference point;
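Step S20 can be illustrated with the sketch below, which builds a self-similarity matrix from per-frame feature vectors and picks its largest entry as the reference point. It is a sketch under stated assumptions: cosine similarity is used as the similarity measure, and entries on the main diagonal are skipped through a hypothetical `min_lag` parameter because every frame is trivially identical to itself. The source does not say how the main diagonal is handled, and the function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def self_similarity_matrix(features):
    """Symmetric matrix whose entry (i, j) is the similarity of frames i and j."""
    n = len(features)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = cosine_similarity(features[i], features[j])
            ssm[i][j] = ssm[j][i] = s
    return ssm

def max_peak(ssm, min_lag=1):
    """Return the (row, col) of the largest entry at least min_lag above the main diagonal.

    Skipping the main diagonal is an assumption of this sketch, not a statement of the source.
    """
    best, best_pos = -math.inf, None
    n = len(ssm)
    for i in range(n):
        for j in range(i + min_lag, n):
            if ssm[i][j] > best:
                best, best_pos = ssm[i][j], (i, j)
    return best_pos
```

In practice `min_lag` would be chosen larger than one frame so that neighbouring, naturally similar frames are not mistaken for a repeated segment.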

step S30, according to the similar segment reference points, similar point query is carried out in the self-similar matrix;

in this step, because the values on line segments distributed along the diagonal direction in the self-similarity matrix represent the similarity of the audio content in the corresponding horizontal-axis and vertical-axis time periods, an advertisement segment appears as a set of consecutive points with high similarity distributed along the diagonal direction. The advertisement segment can therefore be obtained by querying similar points along the diagonal direction, starting from the similar segment reference point;

step S40, setting the set of queried similar points as an advertisement segment according to the query result, and performing advertisement elimination processing on the audio data according to the advertisement segment;

based on the query of the similarity points in the step S30, the start time, the advertisement duration and the end time of the advertisement segment can be effectively obtained, thereby facilitating the location of the advertisement segment, and based on the start time, the advertisement duration and the end time, the advertisement segment in the audio data can be effectively removed;

preferably, this embodiment can also be applied to detecting repeated playback of the same advertisement within the same time period, which effectively prevents such repeated playback and reduces advertising costs;

in this embodiment, a self-similarity matrix is computed from the audio features so that the similar segment reference point can be set effectively; similar points are then queried automatically from the similar segment reference point, and advertisement segments are obtained and removed based on the query result.

Example two

Please refer to fig. 2, which is a flowchart illustrating an audio advertisement detection method according to a second embodiment of the present invention, including the steps of:

step S11, audio data are obtained, and feature extraction is carried out on the audio data to obtain audio features;

step S21, classifying the audio features, and performing audio segmentation according to the classification result to obtain a plurality of groups of audio features of different classes;

step S31, sequentially performing cosine calculation on the audio features of the different categories to obtain cosine similarity, and looking up a target distance formula according to the cosine similarity;

the most important step in computing the self-similarity matrix is selecting a suitable distance formula, because a suitable distance formula makes the repeated segments of the sequence more prominent in the self-similarity matrix; in this step the distance formula is therefore looked up based on the cosine similarity, which effectively improves the accuracy of the subsequent self-similarity matrix calculation;
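For reference, the cosine similarity named in step S31 has the standard form shown first below, written for two frame feature vectors x_i and x_j (symbols introduced here for illustration; the source does not name them). The cosine distance shown second is one common distance derived from it and is given only as an example, since the source does not state which target distance formula is selected.

```latex
S_{ij} = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert},
\qquad
d_{ij} = 1 - S_{ij}
```

With this choice, two frames with identical direction in feature space have S_{ij} = 1 and d_{ij} = 0.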

step S41, performing matrix calculation on the audio features according to the target distance formula to obtain a self-similarity matrix, and setting the maximum peak point in the self-similarity matrix as a reference point of a similar segment;

preferably, in this step, before the step of setting the maximum peak point in the self-similarity matrix as the reference point of the similar segment, the method further includes:

performing convolution processing on the self-similarity matrix to remove outliers from the self-similarity matrix;

by applying convolution processing to the self-similarity matrix, outliers in the matrix are effectively smoothed and the values of similar segments are enhanced, so the convolution operation greatly improves the robustness of the self-similarity matrix to outliers;
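One way such a convolution could work is sketched below: each entry is averaged with its neighbours along the diagonal direction, so that a stripe of consecutive high values keeps its strength while an isolated high value is damped. The diagonal averaging kernel, its half-width, and the renormalization at the matrix border are assumptions of this sketch; the source does not specify the kernel.

```python
def smooth_along_diagonal(ssm, half_width=1):
    """Average each entry with its neighbours along the diagonal direction.

    Uses a diagonal averaging kernel of length 2 * half_width + 1, renormalized
    by the number of taps that fall inside the matrix (an assumed design).
    """
    n = len(ssm)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total, count = 0.0, 0
            for k in range(-half_width, half_width + 1):
                a, b = i + k, j + k
                if 0 <= a < n and 0 <= b < n:
                    total += ssm[a][b]
                    count += 1
            out[i][j] = total / count  # count >= 1 because k = 0 is always in bounds
    return out
```

A larger `half_width` smooths more strongly but also blurs the boundaries of short repeated segments.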

step S51, setting the reference point of the similar segment as a midpoint, and extending equidistantly along the diagonal of the self-similar matrix to obtain a starting point position and an end point position;

the distance length of the equidistant extension can be set according to requirements, for example, one point is used as an interval for extension, or two points are used as intervals for extension;

preferably, in this step the extension proceeds one point at a time: taking the similar segment reference point as the midpoint, points are added along the diagonal of the self-similarity matrix to obtain the start position and the end position;

step S61, calculating a similarity between the start position and the end position to obtain a similarity value;

the larger the similarity value, the more similar the audio content corresponding to the start position and the end position; the smaller the similarity value, the less similar that audio content is;

step S71, judging whether the similarity value is smaller than a similarity threshold value;

the similarity threshold may be set according to requirements, for example, the similarity threshold may be set to 90%, 80%, or 70%;

when it is determined in step S71 that the similarity value is greater than or equal to the similarity threshold, continuing the extension of the start point and the end point;

when it is determined in step S71 that the similarity value is smaller than the similarity threshold, performing step S81;

step S81, stopping the extension of the start position and the end position, and setting the points between the start position and the end position as the similar points;

when the similarity value is judged to be smaller than the similarity threshold, the audio content corresponding to the start position and the end position is judged to be different; stopping the extension at this point prevents valid program audio from being marked as advertisement audio, which improves the accuracy of the audio advertisement detection method;
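Steps S51 to S81 can be sketched as the loop below. It is an interpretation and not the method of the embodiment: the source does not define how the similarity between the start position and the end position is computed, so this sketch assumes it is the smaller of the two matrix values at the newly reached positions. The function name, the default threshold of 0.8, and the `step` parameter are illustrative.

```python
def extend_along_diagonal(ssm, ref, threshold=0.8, step=1):
    """Grow a segment around ref = (row, col) equidistantly along the diagonal.

    Returns the list of (row, col) points from the start position to the
    end position, inclusive.
    """
    n = len(ssm)
    i, j = ref
    d = 0  # accepted half-length of the segment around the reference point
    while True:
        cand = d + step
        si, sj = i - cand, j - cand  # candidate start position
        ei, ej = i + cand, j + cand  # candidate end position
        if si < 0 or sj < 0 or ei >= n or ej >= n:
            break  # reached the border of the matrix
        # Assumption: the similarity value of the two ends is the smaller matrix value.
        if min(ssm[si][sj], ssm[ei][ej]) < threshold:
            break  # step S81: stop extending
        d = cand
    return [(i + k, j + k) for k in range(-d, d + 1)]
```

The row indices of the returned points give one occurrence of the repeated segment and the column indices give the other.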

step S91, setting the set of queried similar points as an advertisement segment according to the query result, and performing advertisement elimination processing on the audio data according to the advertisement segment;

once the extension of the start position and the end position has stopped in step S81, the start time, duration and end time of the advertisement segment can be obtained, which makes it easy to locate the advertisement segment; based on the start time, duration and end time, the advertisement segment can be removed from the audio data;
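The conversion from frame positions to times, and the removal itself, can be sketched as follows. The hop length, frame length and sampling rate are assumed parameters that the source does not specify, and the function names are hypothetical; the sketch treats the audio as a plain list of samples.

```python
def segment_times(start_frame, end_frame, hop_len, frame_len, sample_rate):
    """Convert inclusive frame indices to (start_s, duration_s, end_s) in seconds."""
    start_sample = start_frame * hop_len
    end_sample = end_frame * hop_len + frame_len  # the last frame is included in full
    start_s = start_sample / sample_rate
    end_s = end_sample / sample_rate
    return start_s, end_s - start_s, end_s

def remove_segment(samples, start_s, end_s, sample_rate):
    """Return a copy of samples with the interval [start_s, end_s) cut out."""
    a = max(0, int(round(start_s * sample_rate)))
    b = min(len(samples), int(round(end_s * sample_rate)))
    return samples[:a] + samples[b:]
```

When several advertisement segments are removed from the same recording, cutting them from the latest to the earliest keeps the remaining time offsets valid.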

in this embodiment, a self-similarity matrix is computed from the audio features so that the similar segment reference point can be set effectively; similar points are then queried automatically from the similar segment reference point, and advertisement segments are obtained and removed based on the query result.

Example three

Please refer to fig. 3, which is a flowchart illustrating an audio advertisement detection method according to a third embodiment of the present invention, including the steps of:

step S12, audio data are obtained, and feature extraction is carried out on the audio data to obtain audio features;

step S22, performing matrix calculation on the audio features to obtain a self-similarity matrix;

step S32, setting the maximum peak point in the self-similarity matrix as a similar segment reference point, and performing similar point query in the self-similarity matrix according to the similar segment reference point;

step S42, setting the set of queried similar points as an advertisement segment according to the query result, and performing advertisement elimination processing on the audio data according to the advertisement segment;

step S52, carrying out feature calculation on the audio signals in the audio data to obtain audio feature vectors;

specifically, in this step, the step of performing feature calculation on the audio signal in the audio data includes:

step S521, performing framing and windowing on the audio signal, and extracting MFCC (Mel-frequency cepstral coefficient) features, zero-crossing rate features, short-time energy features, energy entropy features, spectral centroid features, spectral spread features and spectral flux features;

step S522, concatenating the MFCC features, the zero-crossing rate features, the short-time energy features, the energy entropy features, the spectral centroid features, the spectral spread features and the spectral flux features into a vector to obtain the audio feature vector;
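Three of the time-domain features named in step S521, together with the concatenation of step S522, can be sketched as follows. The definitions used here are common textbook forms and are assumptions, since the source gives no formulas; the number of sub-blocks for the energy entropy is likewise assumed. The MFCC and spectral features are not computed in this sketch and are passed in through a hypothetical `extra_features` argument.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def energy_entropy(frame, n_blocks=4):
    """Shannon entropy (in bits) of the energy distribution over n_blocks sub-blocks."""
    size = len(frame) // n_blocks
    energies = [sum(x * x for x in frame[k * size:(k + 1) * size])
                for k in range(n_blocks)]
    total = sum(energies)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)

def frame_feature_vector(frame, extra_features=()):
    """Concatenate the per-frame features into one vector (step S522)."""
    return [zero_crossing_rate(frame),
            short_time_energy(frame),
            energy_entropy(frame)] + list(extra_features)
```

The resulting vector has one entry per scalar feature followed by whatever MFCC and spectral values are supplied by the caller.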

step S62, inputting the audio feature vector into a gradient boosting tree model, and controlling the gradient boosting tree model to classify all audio frames of the audio data;

having the gradient boosting tree model classify all audio frames of the audio data determines the audio type of every frame, which enables the subsequent advertisement judgment based on the frame classification;

step S72, when the classification result of an audio frame is judged to be the advertisement class, marking the audio frame as an advertisement frame, and deleting the consecutive advertisement frames in the audio signal;

marking audio frames classified as advertisements as advertisement frames makes it possible to find and delete advertisement audio that does not repeat within the audio data, which improves the accuracy of audio advertisement detection so that all advertisement audio in the audio data can be identified and deleted.
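The deletion in step S72 can be sketched as below, starting from a list of per-frame labels such as a classifier would produce. The classifier itself is not shown. The label value "ad" and the minimum run length `min_run` are assumptions of this sketch: the source only says that consecutive advertisement frames are deleted and does not state a minimum length.

```python
def advertisement_runs(labels, ad_label="ad", min_run=3):
    """Return (start, end) index pairs, end exclusive, of runs of consecutive ad frames.

    Runs shorter than min_run frames are ignored (an assumed safeguard against
    isolated misclassified frames).
    """
    runs, start = [], None
    for idx, label in enumerate(labels):
        if label == ad_label:
            if start is None:
                start = idx
        elif start is not None:
            if idx - start >= min_run:
                runs.append((start, idx))
            start = None
    if start is not None and len(labels) - start >= min_run:
        runs.append((start, len(labels)))
    return runs

def delete_runs(frames, runs):
    """Return the frames that fall outside every (start, end) run."""
    drop = set()
    for a, b in runs:
        drop.update(range(a, b))
    return [f for k, f in enumerate(frames) if k not in drop]
```

Setting `min_run=1` reproduces the literal reading in which every frame labeled as an advertisement is deleted.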

Example four

Please refer to FIG. 4, which is a schematic structural diagram of an audio advertisement detection system 100 according to a fourth embodiment of the present invention, including: a feature extraction module 10, a matrix calculation module 11, a similar point query module 12 and an advertisement removal module 13, wherein:

The feature extraction module 10 is configured to acquire audio data and perform feature extraction on the audio data to obtain audio features.

The matrix calculation module 11 is configured to perform matrix calculation on the audio features to obtain a self-similarity matrix.

The matrix calculation module 11 is further configured to: perform cosine calculation on the audio features to obtain cosine similarity; and look up a target distance formula according to the cosine similarity, and perform matrix calculation on the audio features according to the target distance formula.

The matrix calculation module 11 is further configured to: classify the audio features, and perform audio segmentation according to the classification result to obtain a plurality of groups of audio features of different categories; and sequentially perform matrix calculation on the audio features of the different categories.

The similar point query module 12 is configured to set the maximum peak point in the self-similarity matrix as a similar segment reference point, and query similar points in the self-similarity matrix according to the similar segment reference point.

The similar point query module 12 is further configured to: set the similar segment reference point as a midpoint, and extend equidistantly along the diagonal of the self-similarity matrix to obtain a start position and an end position; calculate the similarity between the start position and the end position to obtain a similarity value; judge whether the similarity value is smaller than a similarity threshold; stop the extension of the start position and the end position when the similarity value is judged to be smaller than the similarity threshold; and set the points between the start position and the end position as the similar points.

Preferably, the similar point query module 12 is further configured to: perform convolution processing on the self-similarity matrix to remove outliers from the self-similarity matrix.

The advertisement removal module 13 is configured to set the set of queried similar points as an advertisement segment according to the query result, and perform advertisement elimination processing on the audio data according to the advertisement segment.

Preferably, the audio advertisement detection system 100 further comprises:

an audio frame classification module 14, configured to perform feature calculation on the audio signal in the audio data to obtain an audio feature vector.

Further, the audio frame classification module 14 is further configured to: perform framing and windowing on the audio signal, and extract MFCC (Mel-frequency cepstral coefficient) features, zero-crossing rate features, short-time energy features, energy entropy features, spectral centroid features, spectral spread features and spectral flux features; and concatenate the MFCC features, the zero-crossing rate features, the short-time energy features, the energy entropy features, the spectral centroid features, the spectral spread features and the spectral flux features into a vector to obtain the audio feature vector.

Example five

Referring to fig. 5, a mobile terminal 101 according to a fifth embodiment of the present invention includes a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal 101 execute the audio advertisement detection method.

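The present embodiment also provides a storage medium storing a computer program used in the above mobile terminal 101, and the program, when executed, performs the following steps:

acquiring audio data, and performing feature extraction on the audio data to obtain audio features;

performing matrix calculation on the audio features to obtain a self-similarity matrix;

setting the maximum peak point in the self-similarity matrix as a similar segment reference point, and querying similar points in the self-similarity matrix according to the similar segment reference point;

and setting the set of queried similar points as an advertisement segment according to the query result, and performing advertisement elimination processing on the audio data according to the advertisement segment. The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disk.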

It will be apparent to those skilled in the art that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practical applications, the functions may be allocated to different functional units or modules as needed; that is, the internal structure of the storage device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and are not used to limit the protection scope of the present application.

Those skilled in the art will appreciate that the component configuration shown in FIG. 4 does not limit the audio advertisement detection system of the present invention, which may include more or fewer components than those shown, combine some components, or arrange the components differently; likewise, the audio advertisement detection methods of FIGS. 1-3 may be implemented using more or fewer components than those shown in FIG. 4, a combination of some components, or a different arrangement of components. The units, modules, etc. referred to herein are a series of computer programs that can be executed by a processor (not shown) of the target audio advertisement detection system to perform specific functions, and all of them may be stored in a storage device (not shown) of the target audio advertisement detection system.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. An audio advertisement detection method, the method comprising:

2. The audio advertisement detection method of claim 1, wherein the step of querying similar points in the self-similarity matrix according to the similar segment reference point comprises:

3. The audio advertisement detection method of claim 1, wherein after the step of performing advertisement elimination processing on the audio data according to the advertisement segment, the method further comprises:

4. The audio advertisement detection method of claim 3, wherein the step of performing feature calculation on the audio signal in the audio data comprises:

5. The audio advertisement detection method of claim 1, wherein the step of performing matrix calculation on the audio features comprises:

6. The audio advertisement detection method of claim 1, wherein before the step of setting the maximum peak point in the self-similarity matrix as a similar segment reference point, the method further comprises:

7. The audio advertisement detection method of claim 1, wherein after the step of performing feature extraction on the audio data, the method further comprises:

8. An audio advertisement detection system, the system comprising:

9. A mobile terminal, characterized in that it comprises a storage device for storing a computer program and a processor running the computer program to make the mobile terminal execute the audio advertisement detection method according to any one of claims 1 to 7.

10. A storage medium, characterized in that it stores a computer program for use in a mobile terminal according to claim 9, which computer program, when executed by a processor, carries out the steps of the audio advertisement detection method according to any one of claims 1 to 7.