CN112420023B - Music infringement detection method - Google Patents
- Publication number: CN112420023B
- Application number: CN202011352226.XA
- Authority
- CN
- China
- Prior art keywords
- music
- vectors
- frequency spectrum
- library
- spectrum signal
- Prior art date
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
- G06F21/10 — Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; digital rights management [DRM]
- G06F21/12 — Protecting executable software
- G10L15/16 — Speech classification or search using artificial neural networks
Abstract
The invention relates to a music infringement detection method comprising the following steps: S1: sequentially carrying out a short-time Fourier transform on each piece of music in the music library to obtain the frequency spectrum signal corresponding to its music ID; S2: performing dynamic resolution compression on the frequency spectrum signal; S3: calculating the extreme points of each frequency band interval from the compressed frequency spectrum signal; S4: filtering the extreme points and subtracting them pairwise to obtain the music vectors of the music library; S5: compressing the music vectors of the music library bit-wise into int32; S6: establishing a hash table with int32 as Key and music ID as Value, where music IDs mark each piece of music incrementally in the order in which music enters the library; S7: inputting training audio to obtain an infringement probability; S8: inputting test audio to obtain an infringement probability. The method extracts features from the spectral information of the music through a convolutional neural network and a fully connected network, so useful features can be extracted in multiple dimensions without manual screening, improving both the accuracy and the efficiency of detection.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a music infringement detection method.
Background
The spread of the internet has made music widely popular: people can conveniently listen to music and use it in various ways to create videos. Music is copyrighted, however, and using it without authorization in commercial video causes infringement problems and harms the rights and interests of music creators.
The patent with publication number CN101493918A discloses an online music piracy monitoring method and system, which comprises, in sequence, the following steps: an audio fingerprint extraction module obtains an audio download address from the internet; the audio fingerprint extraction module reads the audio file from that address and processes it to obtain an audio fingerprint; a monitoring analysis module compares this fingerprint with the audio fingerprint of the genuine audio file; if the comparison result exceeds a set threshold, an infringement positioning module further obtains information on the suspected infringer and issues a warning. Compared with the prior art, the technical effect of that invention is: through technical means such as web crawlers, audio fingerprint extraction, and feature code extraction, networked digital music resources are effectively monitored and evidence of infringing behavior is collected and warnings issued; the whole process is automatic, greatly saving cost and time and ensuring timely rights enforcement.
Although the above patent can judge music infringement, neither the accuracy nor the performance of its infringement detection can be guaranteed.
Disclosure of Invention
To solve these problems, the invention provides a music infringement detection method that, on the basis of judging infringement, can greatly improve the accuracy and performance of infringement detection through deep learning.
The technical scheme of the invention is as follows:
a music infringement detection method comprises the following steps:
s1: carrying out short-time Fourier transform on music in a music library to obtain a frequency spectrum signal;
s2: performing dynamic resolution compression on the frequency spectrum signal;
s3: calculating an extreme point of each frequency band interval according to the compressed frequency spectrum signal;
s4: filtering the extreme points, and subtracting every two extreme points to obtain music vectors of the music library;
s5: compressing music vectors of a music library into int32 according to bits;
s6: repeating the steps S1-S5 for all music in the music library, constructing a hash table with int32 as Key and music ID as Value, wherein music IDs mark each piece of music incrementally in the order in which music enters the library;
s7: inputting training audio, acquiring a frequency spectrum signal of the training audio by using short-time Fourier transform, repeating the steps S2-S5, acquiring training music vectors, colliding with a hash table containing all music library music vectors, sequencing according to the time of collision, calculating the Euclidean distance between the two vectors, and normalizing to obtain infringement probability;
s8: inputting a test audio, acquiring a frequency spectrum signal of the test audio by using short-time Fourier transform, repeating the steps S2-S5, acquiring a test music vector, colliding with a hash table containing all music library music vectors, sequencing according to the time of collision, calculating the Euclidean distance between the two vectors, and normalizing to obtain the infringement probability.
Preferably, the specific process of the dynamic resolution compression in step S2 is as follows:
s2.1: forming a spectrogram by using the input Fourier transformed spectrum signal;
s2.2: vertically and uniformly dividing the spectrogram into a plurality of regions;
s2.3: performing feature extraction on the region in the step S2.2 through a convolutional neural network;
s2.4: judging whether each region contains useful features, and rejecting the regions that do not;
s2.5: and splicing the rest regions into a new spectrogram again.
Preferably, the convolutional neural network comprises six convolutional layers and three fully-connected layers, the convolutional layers comprise eight 1 × 1 convolutional kernels, two layers of the fully-connected layers comprise 1024 neurons, and one layer comprises 2 neurons.
Preferably, the calculation of the extreme points in step S3 is to find a maximum value and a minimum value in each frequency band interval.
Preferably, the filtering in step S4 comprises the following steps:
s4.1: screening all extreme points through a multilayer fully-connected neural network;
s4.2: eliminating the extreme point which is output as 0 after passing through the multilayer fully-connected neural network, and reserving the extreme point which is output as 1;
s4.3: and splicing the residual extreme points of different frequency bands and outputting.
Preferably, the fully-connected neural network comprises three layers, wherein each of the first and second layers comprises 1024 neurons, and the third layer comprises 2 neurons.
The invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the music infringement detection method when executing the computer program.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the music infringement detection method.
The beneficial effects of the invention are: features are extracted from the spectral information of the music through a convolutional neural network and a fully connected network, so useful features can be extracted in multiple dimensions without manual screening, improving both the accuracy and the efficiency of detection.
Drawings
Fig. 1 is a flowchart of a method provided in an embodiment of the present invention.
Fig. 2 is a detailed flowchart of dynamic resolution compression.
FIG. 3 is a flow chart of extreme point filtering.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides a music infringement detection method, which includes the following specific steps:
1. and carrying out short-time Fourier transform on the music in the music library to obtain a spectrogram.
Using the short-time Fourier transform, a sliding time window is applied to the input music signal and the signal inside each window is Fourier transformed, yielding the time-varying spectrum of the signal and thereby converting the time-domain signal into a frequency-domain signal.
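The sliding-window transform described above can be sketched in Python with NumPy; the window length, hop size, Hann window, and test tone below are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def stft(signal, win_len=1024, hop=512):
    """Slide a time window over the signal and Fourier-transform each
    windowed frame, converting the time-domain signal into a
    time-varying (frequency-domain) spectrum."""
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))  # magnitude spectrum of one frame
    return np.array(frames)  # shape: (n_frames, win_len // 2 + 1)

# A 440 Hz tone sampled at 8 kHz stands in for a piece of library music.
sr = 8000
t = np.arange(sr) / sr
spectrogram = stft(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `spectrogram` is the magnitude spectrum of one window position; the 440 Hz tone concentrates its energy near bin 440 × 1024 / 8000 ≈ 56.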
2. And performing dynamic resolution compression on the acquired spectrogram.
3. And calculating all extreme points of each frequency band interval.
4. Filtering the extreme points.
5. And obtaining vectors by pairwise subtraction of the extreme points.
In steps 3-5, each piece of music corresponds to one vector: specifically, for each frequency band the minimum value is subtracted from the maximum value to obtain one number, and the numbers obtained from all frequency bands are combined into the vector.
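A minimal sketch of this max-minus-min vector construction, assuming a compressed spectrum already organized into frequency-band intervals (the band and bin counts here are illustrative, not from the patent):

```python
import numpy as np

# Hypothetical compressed spectrum organized into 8 frequency-band
# intervals of 32 bins each (both counts are illustrative).
rng = np.random.default_rng(0)
band_spectrum = rng.standard_normal((8, 32))

# One number per band (maximum minus minimum), combined across bands
# into the vector representing this piece of music.
music_vector = band_spectrum.max(axis=1) - band_spectrum.min(axis=1)
```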
6. Repeating the steps 1-5 for each piece of music in the music library to obtain vectors corresponding to all pieces of music, compressing the vectors into int32 according to bits, and constructing a HashTable by taking int32 as Key and music ID as Value.
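The bit-wise compression into int32 and the HashTable construction in step 6 can be sketched as follows. The quantization rule (thresholding each component at the vector's median) and the four rotation-based library vectors are assumptions for illustration, since the patent does not specify how vector values map to bits:

```python
def vector_to_int32(vec):
    """Quantize a 32-component music vector to one bit per component
    (above/below the median) and pack the bits into a 32-bit key.
    The median-threshold rule is an assumption for illustration."""
    median = sorted(vec)[len(vec) // 2]
    key = 0
    for v in vec:
        key = (key << 1) | (1 if v > median else 0)
    return key & 0xFFFFFFFF  # keep the low 32 bits

# Four stand-in library vectors (rotations of 0..31, purely illustrative);
# music IDs mark entries incrementally in the order they enter the library.
library = {name: [(j + r) % 32 for j in range(32)]
           for r, name in enumerate(["A", "B", "C", "D"])}
hash_table = {}
for music_id, (name, vec) in enumerate(library.items(), start=1):
    hash_table[vector_to_int32(vec)] = music_id
```

Looking up a query vector's int32 key in `hash_table` then acts as the "collision" of steps 7 and 8.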
7. Inputting training audio, repeating steps 2-5 to obtain its vector, colliding it with the HashTable, ordering the collisions of each piece of music by time, calculating infringement probabilities, and comparing the infringement results with the labels to train the model.
8. Inputting test audio, repeating steps 2-5 to obtain its vector, colliding it with the HashTable, ordering the collisions of each piece of music by time, calculating infringement probabilities, and outputting whether infringement exists.
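The comparison-and-scoring part of steps 7 and 8 can be sketched as below. The squashing of Euclidean distance into (0, 1] via 1 / (1 + d) is an assumed normalization (the patent only says "normalizing"), and this toy example ranks by score rather than modeling the by-time ordering of collisions:

```python
import math

def infringement_probabilities(query_vec, library_vectors):
    """Compare the query vector with every library vector, squash the
    Euclidean distance into (0, 1] with 1 / (1 + d), and rank by score.
    Both the squashing function and the score ordering are assumptions."""
    results = []
    for music_id, vec in library_vectors.items():
        dist = math.dist(query_vec, vec)          # Euclidean distance
        results.append((music_id, 1.0 / (1.0 + dist)))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy 2-dimensional library vectors keyed by music ID.
library_vectors = {1: [0.0, 0.0], 2: [3.0, 4.0]}
ranked = infringement_probabilities([0.0, 0.0], library_vectors)
```

An exact match yields probability 1.0; larger distances decay toward 0, which matches the intuition that closer vectors are more likely to be infringing.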
As an embodiment of the present invention, as shown in fig. 2, the specific process in step 2 is:
2.1, inputting a spectrogram.
2.2, vertically dividing the spectrogram into a plurality of regions; in this embodiment the spectrogram is divided into 256 regions.
2.3, extracting features from the regions of step 2.2 through a convolutional neural network.
2.4, judging whether each region contains useful features, and rejecting the regions that do not.
2.5, splicing the remaining regions into a new spectrogram.
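The split-reject-splice procedure of steps 2.1-2.5 can be sketched as follows. The usefulness test here (above-mean region energy) is a stand-in for the patent's convolutional-network classifier, and the axis along which the spectrogram is "vertically" divided is an assumption:

```python
import numpy as np

def compress_spectrogram(spec, n_regions=8):
    """Split the spectrogram into n_regions equal vertical slices, reject
    slices without 'useful features', and splice the survivors together.
    The usefulness test below (above-mean energy) is a stand-in for the
    patent's convolutional-network classifier."""
    regions = np.array_split(spec, n_regions, axis=1)
    energies = [float(np.abs(r).sum()) for r in regions]
    mean_energy = sum(energies) / len(energies)
    kept = [r for r, e in zip(regions, energies) if e >= mean_energy]
    return np.concatenate(kept, axis=1)

# Toy spectrogram: energy only in the first 4 of 16 time columns,
# so only the first of four regions survives compression.
spec = np.zeros((4, 16))
spec[:, :4] = 1.0
compressed = compress_spectrogram(spec, n_regions=4)
```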
As an embodiment of the present invention, the calculation of the extreme point in step 3 is to find a maximum value and a minimum value, and the maximum value is calculated by the following formula:the calculation formula of the minimum value is as follows:。
as an embodiment of the present invention, the convolutional neural network includes six convolutional layers and three fully-connected layers, wherein the convolutional layers include eight convolutional cores of 1 × 1, two layers of the fully-connected layers include 1024 neurons, and one layer includes 2 neurons.
As an embodiment of the present invention, as shown in fig. 3, the specific process of filtering in step 4 is:
4.1, inputting an extreme point;
4.2, passing it through the fully connected neural network;
4.3, outputting whether the extreme point is retained;
4.4, splicing the remaining extreme points and outputting them.
As an embodiment of the present invention, the fully-connected neural network comprises three layers, wherein the first layer and the second layer each comprise 1024 neurons, and the third layer comprises 2 neurons.
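A forward pass through such a three-layer fully connected filter can be sketched in NumPy. The layer sizes (1024, 1024, 2) come from this embodiment, but the weights below are random stand-ins for trained parameters, and the 64-dimensional extreme-point input feature is an assumed size:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_layer(n_in, n_out):
    """Random stand-in weights; a real system would use trained parameters."""
    return rng.standard_normal((n_in, n_out)) * 0.05, np.zeros(n_out)

# Layer sizes from the embodiment: 1024, 1024, then 2 output neurons
# (retain vs. reject). The 64-dim input feature size is an assumption.
W1, b1 = make_layer(64, 1024)
W2, b2 = make_layer(1024, 1024)
W3, b3 = make_layer(1024, 2)

def keep_extreme_point(x):
    """Three-layer fully connected forward pass; output 1 means the
    extreme point is retained, output 0 means it is rejected."""
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)
    logits = h @ W3 + b3
    return int(np.argmax(logits))

points = rng.standard_normal((5, 64))          # five candidate extreme points
kept = [p for p in points if keep_extreme_point(p) == 1]
```

The 2-neuron output layer makes the keep/reject decision described in steps 4.2-4.3; the retained points (`kept`) are what gets spliced and output in step 4.4.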
The invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the music infringement detection method when executing the computer program.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the music infringement detection method.
A practical example of the method is as follows. The music library contains music A, B, C, D with music IDs 1, 2, 3 and 4 respectively. Taking A as an example, a short-time Fourier transform is first applied to A, giving the spectrum signal [1.6393873e-05, -2.2720376e-05, -1.9727035e-05, …, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00]. Dynamic resolution compression is then applied to this spectrum signal by the convolutional neural network, giving [1.6393873e-05, -1.3622489e-05, -3.8468256e-05, …, 1.4637652e-05, -2.58741654e-05, -1.8945687e-05]. Maximum and minimum values are then found in the spectrum signal region by region, and extreme-value filtering is performed by the neural network, yielding the maximum-value sequence [1.6393873e-05, 2.9647521e-05, 3.7123548, …, 1.9647581e-05, 2.4874165e-05, 1.5512479e-05] and the minimum-value sequence [-1.3222547e-05, -1.39852657e-05, -3.7988510e-05, …, -1.3347891e-05, -2.6955249e-05, -2.58741654e-05]. Subtracting the two sequences gives the vector of library music A: [2.96164200e-05, 4.36327867e-05, 3.71239279e+00, …, 3.29954720e-05, 5.18294140e-05, 4.13866444e-05]. B, C and D are processed in the same way to obtain their corresponding vectors. The data in each vector are then compressed bit-wise into int32, and the following hash table is constructed:
Key | vector of music A | vector of music B | vector of music C | vector of music D
---|---|---|---|---
Value | 1 | 2 | 3 | 4
Then the vector of training music T is computed by the same method and collided with the table above: each library music vector is compared pairwise with T and the Euclidean distance is calculated, and the collision results (infringement probabilities) are ordered by the time at which the collisions occur, giving the probability that T infringes each piece of music in the library.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the scope of protection is not limited to them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features equivalently substituted, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the invention and are intended to be covered by its scope of protection. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (5)
1. A music infringement detection method is characterized by comprising the following steps:
s1: carrying out short-time Fourier transform on music in a music library to obtain a frequency spectrum signal;
s2: performing dynamic resolution compression on the frequency spectrum signal;
s3: calculating an extreme point of each frequency band interval according to the compressed frequency spectrum signal;
s4: filtering the extreme points, and subtracting every two extreme points to obtain music vectors of the music library;
s5: compressing music vectors of a music library into int32 according to bits;
s6: repeating the steps S1-S5 aiming at all music in the music library, constructing a hash table by taking int32 as Key and music ID as Value, wherein the music ID progressively marks each piece of music according to the time sequence of the music entering the music library;
s7: inputting training audio, acquiring a frequency spectrum signal of the training audio by using short-time Fourier transform, repeating the steps S2-S5, acquiring training music vectors, colliding with a hash table containing all music library music vectors, sequencing according to the time of collision, calculating the Euclidean distance between the two vectors, and normalizing to obtain infringement probability;
s8: inputting a test audio, acquiring a frequency spectrum signal of the test audio by using short-time Fourier transform, repeating the steps S2-S5 to acquire a test music vector, colliding with a hash table containing all music library music vectors, sequencing according to the time of collision, calculating the Euclidean distance between the two vectors, and normalizing to obtain infringement probability; the specific process of dynamic resolution compression in step S2 is:
s2.1: forming a spectrogram by using the input Fourier transformed spectrum signal;
s2.2: vertically and uniformly dividing the spectrogram into a plurality of regions;
s2.3: performing feature extraction on the region in the step S2.2 through a convolutional neural network;
s2.4: judging whether the region belongs to useful features or not, and rejecting partial regions not containing the useful features;
s2.5: splicing the rest areas into a new spectrogram again;
the convolutional neural network comprises six convolutional layers and three fully-connected layers, wherein the convolutional layers comprise eight 1x1 convolutional kernels, two layers of the fully-connected layers comprise 1024 neurons, and one layer comprises 2 neurons.
3. The music infringement detection method of claim 1, wherein the filtering in step S4 comprises:
s4.1: screening all extreme points through a multilayer fully-connected neural network;
s4.2: eliminating the extreme point which is output as 0 after passing through the multilayer fully-connected neural network, and reserving the extreme point which is output as 1;
s4.3: and splicing the residual extreme points of different frequency bands and outputting.
4. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 3 when executing the computer program.
5. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011352226.XA CN112420023B (en) | 2020-11-26 | 2020-11-26 | Music infringement detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112420023A CN112420023A (en) | 2021-02-26 |
CN112420023B true CN112420023B (en) | 2022-03-25 |
Family
ID=74843766
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101014953A (en) * | 2003-09-23 | 2007-08-08 | 音乐Ip公司 | Audio fingerprinting system and method |
CN101493918A (en) * | 2008-10-21 | 2009-07-29 | 深圳市牧笛科技有限公司 | On-line music pirate monitoring method and system |
CN104567674A (en) * | 2014-12-29 | 2015-04-29 | 北京理工大学 | Bilateral fitting confocal measuring method |
CN108899037A (en) * | 2018-07-05 | 2018-11-27 | 平安科技(深圳)有限公司 | Animal vocal print feature extracting method, device and electronic equipment |
CN109918539A (en) * | 2019-02-28 | 2019-06-21 | 华南理工大学 | A kind of mutual search method of sound, video for clicking behavior based on user |
CN111652177A (en) * | 2020-06-12 | 2020-09-11 | 中国计量大学 | Signal feature extraction method based on deep learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190385610A1 (en) * | 2017-12-08 | 2019-12-19 | Veritone, Inc. | Methods and systems for transcription |
EP3608918A1 (en) * | 2018-08-08 | 2020-02-12 | Tata Consultancy Services Limited | Parallel implementation of deep neural networks for classifying heart sound signals |
Non-Patent Citations (1)
Title |
---|
Research on Multimedia Perceptual Hashing Algorithms and Applications; Zhao Yuxin; Master's thesis; 2010-12-31; full text *
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |