CN114925231A - Pirated audio detection method, apparatus and computer program product


Info

Publication number
CN114925231A
CN114925231A
Authority
CN
China
Prior art keywords: audio, detected, group, candidate, frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210567984.6A
Other languages
Chinese (zh)
Inventor
He Li (何礼)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd filed Critical Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202210567984.6A priority Critical patent/CN114925231A/en
Publication of CN114925231A publication Critical patent/CN114925231A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval of audio data
    • G06F 16/63: Querying
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval of audio data
    • G06F 16/65: Clustering; Classification
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: specially adapted for particular use
    • G10L 25/51: specially adapted for particular use for comparison or discrimination

Abstract

The application relates to the field of audio technologies and provides a pirated audio detection method, a computer device, and a computer program product, with which pirated audio in an audio library can be detected efficiently and accurately. The method comprises the following steps: determining, from an audio library, similar audio that matches the audio features of the audio to be detected, to obtain an audio group to be detected comprising the audio to be detected and the similar audio; inputting the audio cover image of each audio in the audio group to be detected into a trained audio classification model to obtain the audio classification result of each audio in the group output by the model; determining the benchmark audio in the audio group to be detected according to the classification results; and identifying pirated audio in the audio group to be detected based on the benchmark audio.

Description

Pirated audio detection method, apparatus and computer program product
Technical Field
The present application relates to the field of audio technologies, and in particular, to a pirated audio detection method, a computer device, and a computer program product.
Background
With the development of the internet and audio technology, audio applications provide users with services such as audio playing, audio searching, and query-by-listening song recognition. Accurately and efficiently detecting pirated audio in the audio libraries behind these services can improve audio service quality and user experience.
In the prior art, pirated audio in an audio library is mainly identified by manual review. This approach is time-consuming and labor-intensive, suffers from low detection efficiency, and cannot cope with the massive stock data and daily incremental data in an audio library.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a pirated audio detection method, a computer device and a computer program product that solve the above technical problems.
In a first aspect, the present application provides a method for detecting pirated audio. The method comprises the following steps:
determining, from an audio library, similar audio that matches the audio features of the audio to be detected, to obtain an audio group to be detected comprising the audio to be detected and the similar audio;
inputting the audio cover image of each audio in the audio group to be detected into a trained audio classification model to obtain the audio classification result of each audio in the audio group to be detected output by the audio classification model, the audio classification result indicating whether the audio is genuine audio;
determining the benchmark audio in the audio group to be detected according to the audio classification result;
and identifying pirated audio in the audio group to be detected based on the benchmark audio.
In a second aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor that, when executing the computer program, implements the following steps:
determining, from an audio library, similar audio that matches the audio features of the audio to be detected, to obtain an audio group to be detected comprising the audio to be detected and the similar audio; inputting the audio cover image of each audio in the audio group to be detected into a trained audio classification model to obtain the audio classification result of each audio in the audio group to be detected output by the audio classification model, the audio classification result indicating whether the audio is genuine audio; determining the benchmark audio in the audio group to be detected according to the audio classification results; and identifying pirated audio in the audio group to be detected based on the benchmark audio.
In a third aspect, the present application also provides a computer program product. The computer program product comprises a computer program that, when executed by a processor, performs the following steps:
determining, from an audio library, similar audio that matches the audio features of the audio to be detected, to obtain an audio group to be detected comprising the audio to be detected and the similar audio; inputting the audio cover image of each audio in the audio group to be detected into a trained audio classification model to obtain the audio classification result of each audio in the audio group to be detected output by the audio classification model, the audio classification result indicating whether the audio is genuine audio; determining the benchmark audio in the audio group to be detected according to the audio classification results; and identifying pirated audio in the audio group to be detected based on the benchmark audio.
According to the pirated audio detection method, the computer device and the computer program product, similar audio that matches the audio features of the audio to be detected is determined from an audio library to obtain an audio group to be detected comprising the audio to be detected and the similar audio; the audio cover image of each audio in the group is then input into a trained audio classification model to obtain the audio classification result of each audio in the group; the benchmark audio in the group is determined according to the classification results; and pirated audio in the group is identified based on the benchmark audio. The scheme recalls, through an audio recognition system, similar audio matching the audio features of the audio to be detected from the audio library and forms the audio group to be detected together with the audio to be detected; the cover image of each audio in the group is then fed into the audio classification model to determine the benchmark audio in the group; finally, whether the other audio in the group is pirated audio can be determined based on the benchmark audio. In this way, pirated audio in the audio library can be detected efficiently and accurately.
Drawings
FIG. 1 is a flow diagram illustrating a method for pirated audio detection in one embodiment;
FIG. 2 is a flowchart illustrating the steps of retrieving homogeneous audio in one embodiment;
FIG. 3 is a flow diagram illustrating the steps of determining homogeneous audio in one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it.
The pirated audio detection method provided by the embodiments of the application can be executed by a computer device such as a server. The server may be implemented as an independent server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in FIG. 1, there is provided a pirated audio detection method comprising the following steps:
and S101, determining the similar audio matched with the audio characteristics of the audio to be detected from an audio library to obtain the audio to be detected group comprising the audio to be detected and the similar audio.
In this step, after the audio to be detected is obtained, audio that matches its audio features is recalled from the audio library; such audio is referred to as similar audio, and there are generally multiple similar audios. The audio to be detected and the multiple similar audios are then combined into the audio group to be detected. In practical applications, an audio recognition system may be used to determine the similar audio from the audio library. The audio recognition system has audio recognition capability and is mainly used to identify, in the audio library, audio that is similar to the audio to be detected; specifically, it identifies such audio as similar audio according to the audio features of the audio to be detected. Landmark audio fingerprints, for example, may be used as the audio features.
In a specific application, taking songs as the audio, the audio recognition system may be a Landmark-based query-by-listening song recognition system. After a song to be detected is obtained, songs matching the Landmark audio fingerprints of that song are determined from a song library as similar songs through the Landmark-based recognition system, and the similar songs and the song to be detected form a song group to be detected.
Step S102: inputting the audio cover image of each audio in the audio group to be detected into the trained audio classification model to obtain the audio classification result of each audio in the audio group to be detected output by the audio classification model.
As an example, if the audio in the audio group to be detected is songs, the audio cover image may be the album cover of each song, and the album cover may include at least one of the following: the singer's portrait, the singer's name, the album release time, and the names of the songs contained in the album.
In this step, the audio cover image of each audio in the audio group to be detected is obtained and input into the trained audio classification model, and the audio classification model outputs the audio classification result of the corresponding audio according to its cover image. The audio classification result of each audio in the group is thus obtained, and the result indicates whether the corresponding audio is genuine audio.
Specifically, the audio classification model may be a residual network model used for object classification in the computer vision field, and more specifically a binary classification model based on audio cover images; a ResNet50 deep neural network or another model may be used in a specific implementation. Although a cover-image-based binary classification model can already make a fairly accurate preliminary distinction between genuine audio and pirated audio, this embodiment further improves the accuracy of identifying pirated audio by avoiding identification based on the single dimension of the cover image: the classification result is instead used to determine the genuine audio within the audio group to be detected.
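As a concrete illustration, the following is a minimal inference sketch of such a cover-image binary classifier, assuming a PyTorch/torchvision environment; the preprocessing values, label convention and function names are illustrative assumptions rather than details from this application.

```python
# A minimal sketch of the cover-image binary classifier described above, assuming
# PyTorch/torchvision; model head, checkpoint handling and label order are assumptions.
import torch
from torch import nn
from torchvision import models, transforms
from PIL import Image

def build_cover_classifier(num_classes: int = 2) -> nn.Module:
    # ResNet50 backbone with a 2-way head: genuine vs. pirated cover image.
    model = models.resnet50(weights=None)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify_cover(model: nn.Module, image_path: str) -> str:
    model.eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    # Index 0 = genuine, index 1 = pirated (an assumed label convention).
    return "genuine" if logits.argmax(dim=1).item() == 0 else "pirated"
```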
Here, pirated audio refers to audio whose audio content is consistent with the original audio but whose other information is inconsistent. Taking songs as an example, a pirated song has the same audio content as the original song, but information such as the singer's name or the song title differs from the original.
Step S103: determining the benchmark audio in the audio group to be detected according to the audio classification results.
In this step, after the audio classification result of each audio in the audio group to be detected is output by the audio classification model, whether an audio is the benchmark audio can be determined according to whether its classification result indicates genuine audio. Specifically, if only one audio in the group is classified as genuine, that audio can be determined as the benchmark audio of the group; if several audios in the group are classified as genuine, the benchmark audio can be further determined from among those genuine audios.
Step S104: identifying pirated audio in the audio group to be detected based on the benchmark audio.
In this step, the benchmark audio determined in step S103 is used to identify whether the other audio in the audio group to be detected is pirated. Specifically, the benchmark audio can be compared with the other audio in the group with respect to other audio information, and whether an audio is pirated is judged according to the comparison result. For example, the other audio information may be the audio publisher and the creator: the publisher and creator of the benchmark audio are compared with those of the other audio in the group, and any audio that is inconsistent with the benchmark audio is identified as pirated audio.
According to the pirated audio detection method above, similar audio matching the audio features of the audio to be detected is recalled from the audio library and forms an audio group to be detected; the cover image of each audio in the group is then input into the audio classification model to determine the benchmark audio in the group; finally, whether the other audio in the group is pirated audio can be determined based on the benchmark audio. In this way, pirated audio in the audio library can be detected efficiently and accurately.
As for determining, in step S101, the similar audio that matches the audio features of the audio to be detected from the audio library, in an embodiment, as shown in FIG. 2, this may include the following steps:
step S201, segmenting the audio to be detected to obtain a plurality of audio segments.
The audio to be detected has a certain duration and can be segmented at fixed time intervals to obtain a plurality of audio segments. For example, a 3-minute song to be detected may be segmented at 6-second intervals, yielding 30 audio segments in total.
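For illustration, a minimal segmentation sketch in Python is given below, assuming the audio is available as a mono sample array; the sample rate and window length are illustrative values, not requirements of this application.

```python
# A minimal sketch of the segmentation in step S201, assuming 16 kHz mono PCM
# samples in a NumPy array; the 6-second window matches the example above.
import numpy as np

def segment_audio(samples: np.ndarray, sample_rate: int = 16000,
                  segment_seconds: float = 6.0) -> list[np.ndarray]:
    """Split audio into consecutive fixed-length segments (a trailing partial segment is dropped)."""
    seg_len = int(segment_seconds * sample_rate)
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]

# Example: a 3-minute song at 16 kHz yields 30 segments of 6 seconds each.
song = np.zeros(180 * 16000, dtype=np.float32)
assert len(segment_audio(song)) == 30
```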
Step S202: for each audio segment, determining from the audio library a candidate similar audio group that matches the audio features of that segment, to obtain a plurality of candidate similar audio groups.
In this step, for each audio segment obtained in step S201, audio matching its audio features is recalled from the audio library as candidate similar audio. There are generally multiple candidate similar audios, and they form the candidate similar audio group of the corresponding audio segment; each audio segment thus corresponds to one candidate similar audio group, giving a plurality of candidate similar audio groups. In an example, the audio segments may each be input into the audio recognition system, which matches the candidate similar audio of the current audio segment from the audio library, the current audio segment being any one of the plurality of audio segments.
Step S203: obtaining the similar audio that matches the audio features of the audio to be detected according to the candidate similar audio commonly contained in all candidate similar audio groups.
Specifically, each audio segment corresponds to one candidate similar audio group. In this step, the candidate similar audio contained in every candidate similar audio group is taken as the similar audio matching the audio features of the audio to be detected. In a specific implementation, if audio is represented by an audio identifier (audio ID), the audio ID sets of all candidate similar audio groups are intersected, and the audio corresponding to the IDs in the intersection is taken as the similar audio matching the audio features of the audio to be detected.
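A small sketch of this intersection step, assuming each candidate similar audio group is represented simply as a set of audio IDs:

```python
# A sketch of step S203: keep only the audio IDs that appear in every candidate
# similar audio group recalled for the individual segments.
def common_candidates(candidate_groups: list[set[str]]) -> set[str]:
    """Intersect the candidate audio-ID sets of all segments."""
    if not candidate_groups:
        return set()
    common = set(candidate_groups[0])
    for group in candidate_groups[1:]:
        common &= group
    return common

# Example: only "song_B" is recalled by every segment, so only it is kept.
groups = [{"song_A", "song_B"}, {"song_B", "song_C"}, {"song_B"}]
assert common_candidates(groups) == {"song_B"}
```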
In this embodiment, candidate similar audio groups matching the individual audio segments are recalled by segmenting the audio to be detected, and the similar audio of the audio to be detected is then obtained from the candidate similar audio commonly contained in those groups, which improves the accuracy of recalling the similar audio of the audio to be detected from the audio library.
Further, in an embodiment, as shown in fig. 3, the step S203 may specifically include:
step S301, obtaining the feature matching time of each candidate similar audio and a plurality of audio segments in each candidate similar audio group, and the matching degree of each candidate similar audio.
As described in the above embodiments, audio matching the audio features of a segment is recalled from the audio library as candidate similar audio, which forms the candidate similar audio group of the corresponding audio segment. When a candidate similar audio is recalled, its matching degree with the corresponding audio segment can be output at the same time; for example, the audio recognition system can output the matching degree of each candidate similar audio. In addition, because candidate similar audio is recalled based on the audio features of the segment, each feature match corresponds to a time point in the audio, referred to as the feature matching time. For example, if a certain audio feature of the audio segment is identified as matching an audio feature of an audio in the audio library, that audio is taken as a candidate similar audio, and the time point at which the feature appears in the audio segment (time point a) and the time point at which it appears in the candidate similar audio (time point b) are also obtained; time points a and b are the feature matching times. This step therefore also obtains the feature matching times between each candidate similar audio in each candidate similar audio group and the plurality of audio segments.
Step S302: for each candidate similar audio group, determining the candidate similar audio in the group whose matching degree satisfies a matching-degree threshold condition and/or whose feature matching time difference satisfies a time-difference threshold condition, to obtain screened candidate similar audio groups.
This step further screens the candidate similar audio in each recalled candidate similar audio group; the screening basis may include at least one of the matching degree and the feature matching time difference.
Screening based on the matching degree may be called matching-degree screening. In this case, all candidate similar audio in each candidate similar audio group is filtered through a matching-degree threshold. For any candidate similar audio, its matching degree is compared with the threshold: if the matching degree is greater than the threshold, the candidate satisfies the matching-degree threshold condition and is kept in its candidate similar audio group; if the matching degree is less than or equal to the threshold, the candidate can be removed from its group. Through this process, candidate similar audio groups screened by matching degree are obtained.
Screening based on the feature matching time difference may be called matching-time screening. Specifically, each candidate similar audio group that has been screened by matching degree is further filtered through a time-difference threshold. For any candidate similar audio in a group, the difference between its feature matching time and the feature matching time of the corresponding audio segment is calculated to obtain its feature matching time difference, which is then compared with the time-difference threshold. If the time difference is smaller than the threshold, the candidate satisfies the time-difference threshold condition and is kept in its group; if the time difference is greater than or equal to the threshold, the candidate can be removed from the group. Through this process, candidate similar audio groups screened by matching time are obtained.
In this way, the candidate similar audio in the candidate similar audio groups is screened in at least one of the above manners, and the screened candidate similar audio groups are obtained.
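The two screening passes can be illustrated with the following sketch; the record fields and threshold values are illustrative assumptions, not parameters given in this application.

```python
# A sketch of the two screening passes in step S302.
from dataclasses import dataclass

@dataclass
class Candidate:
    audio_id: str
    match_degree: float     # matching degree returned with the recall
    segment_time: float     # feature matching time in the query segment (seconds)
    candidate_time: float   # feature matching time in the candidate audio (seconds)

def screen_group(candidates: list[Candidate],
                 degree_threshold: float = 0.6,
                 time_diff_threshold: float = 1.0) -> list[Candidate]:
    # Pass 1: keep candidates whose matching degree exceeds the matching-degree threshold.
    by_degree = [c for c in candidates if c.match_degree > degree_threshold]
    # Pass 2: keep candidates whose feature matching time difference is below the time-difference threshold.
    return [c for c in by_degree
            if abs(c.candidate_time - c.segment_time) < time_diff_threshold]
```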
Step S303: obtaining the similar audio that matches the audio features of the audio to be detected according to the candidate similar audio commonly contained in the screened candidate similar audio groups.
In this step, the candidate similar audio contained in every screened candidate similar audio group is taken as the similar audio matching the audio features of the audio to be detected.
On the basis of recalling candidate similar audio by segment matching, this embodiment further screens the candidates using at least one of the matching degree and the feature matching time to obtain the similar audio, which ensures the similarity between the audio to be detected and the recalled similar audio and further improves the accuracy of recalling similar audio from the audio library.
The similar-audio recall process described in steps S201 to S203 and S301 to S303 is illustrated below with a song to be detected as the audio to be detected, a Landmark-based query-by-listening song recognition system as the audio recognition system, and Landmark audio fingerprints as the audio features:
First, the song to be detected is segmented to obtain a plurality of song segments to be detected; each segment is then matched through the Landmark-based song recognition system, which returns the identifiers, feature matching times and matching degrees of the songs matching the segment.
As for the matching process performed by the Landmark-based song recognition system on a song segment to be detected: the audio feature used by the system is the Landmark audio fingerprint. For any audio, the Landmark fingerprint is extracted roughly as follows. The time-domain audio is transformed to the frequency domain by a Fourier transform to obtain a time-frequency spectrogram, whose horizontal axis is the time index and whose vertical axis is the frequency index. Local peak points of the time-frequency signal are then extracted from the spectrogram, and for each local peak point other peak points in its neighbourhood are searched to form peak-point pairs (t1, f1, t2, f2), i.e. peaks located at the time-frequency points (t1, f1) and (t2, f2). Each pair is further reduced to the simplified information (f1, f2, t2-t1), which is then hashed to obtain a hash value. A hash value can therefore be computed for every song in the song library, and a song-library index is built from these hash values. For a song segment to be detected, the recognition system computes its hash values, uses them as query hashes to look up the song-library index, and returns the matching song identifiers, feature matching times and matching degrees.
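For illustration only, the following sketch outlines a Landmark-style fingerprinting routine of the kind described above, using SciPy for the short-time Fourier transform; the window sizes, peak-neighbourhood parameters and hash packing are assumed values, not parameters from this application.

```python
# A compact sketch of Landmark-style fingerprinting: spectrogram -> local peaks ->
# peak pairs (f1, f2, t2 - t1) -> hash values keyed by the anchor time.
import numpy as np
from scipy import signal
from scipy.ndimage import maximum_filter

def landmark_hashes(samples: np.ndarray, sample_rate: int = 16000) -> list[tuple[int, int]]:
    # 1. Time-domain audio -> time-frequency spectrogram via the Fourier transform.
    freqs, times, spec = signal.stft(samples, fs=sample_rate, nperseg=1024, noverlap=512)
    mag = np.abs(spec)

    # 2. Local peak picking: a bin is a peak if it equals the maximum of its neighbourhood.
    peaks = (mag == maximum_filter(mag, size=(15, 15))) & (mag > mag.mean())
    f_idx, t_idx = np.nonzero(peaks)

    # 3. Pair each anchor peak (t1, f1) with nearby target peaks (t2, f2), reduce each
    #    pair to (f1, f2, t2 - t1), and hash the reduced tuple.
    order = np.argsort(t_idx)
    f_idx, t_idx = f_idx[order], t_idx[order]
    hashes = []
    for i in range(len(t_idx)):
        for j in range(i + 1, min(i + 6, len(t_idx))):   # small fan-out per anchor
            dt = t_idx[j] - t_idx[i]
            if 0 < dt <= 64:
                h = hash((int(f_idx[i]), int(f_idx[j]), int(dt)))
                hashes.append((h, int(t_idx[i])))        # (hash value, anchor time index)
    return hashes
```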
Each song segment to be detected thus recalls a corresponding group of candidate similar songs. To ensure consistency between the song to be detected and the recalled similar songs, the candidate similar songs recalled by each segment can first be screened through a matching-degree threshold, keeping only candidates whose matching degree is greater than the threshold, to obtain candidate similar song groups screened by matching degree. Then, for each candidate in those groups, the difference between its feature matching time and the feature matching time of the corresponding song segment is calculated as the feature matching time difference, and candidates whose time difference is smaller than the time-difference threshold are kept, yielding candidate similar song groups screened by matching time. Finally, the candidate similar songs contained in every one of these groups are taken as the similar songs of the song to be detected.
In an embodiment, determining the benchmark audio in the audio group to be detected according to the audio classification results in step S103 specifically includes:
acquiring the audio popularity of each audio in the audio group to be detected; and determining the benchmark audio in the audio group to be detected according to the audio popularity and the audio classification results.
In this embodiment, the audio popularity of each audio in the audio group to be detected is obtained, and the benchmark audio is determined by combining the audio popularity with the classification results provided by the audio classification model. In a specific application, the audio popularity can be determined from information such as play counts, click counts and the release date of the audio. Determining the benchmark audio from both the popularity and the model's classification results avoids the problem that relying on the single dimension of the classification result might affect the accuracy of pirated audio detection; relatively speaking, original audio is usually better produced and more popular, so the benchmark audio in the group can be determined more accurately, further improving the robustness of pirated audio detection.
Further, in an embodiment, determining the benchmark audio in the audio group to be detected according to the audio popularity and the audio classification results specifically includes:
and determining the audio frequency of which the audio frequency heat in the audio frequency group to be detected meets the preset heat condition and the audio frequency classification result is the legal audio frequency as the benchmark audio frequency.
This embodiment selects, from the audio group to be detected, audio that the audio classification model judges to be genuine and that has high popularity as the benchmark audio. Specifically, audio whose popularity is greater than or equal to a preset popularity threshold may be regarded as satisfying the preset popularity condition, or the audio with the highest popularity may be regarded as satisfying it; the audio satisfying the preset popularity condition is then selected as the benchmark audio from among the at least one audio classified as genuine.
In one example, when the benchmark audio is selected from a plurality of genuine audios, only one of them is selected as the benchmark. Specifically, the audio contents of the audios in the audio group to be detected are consistent and may differ only in their audio description information; it can be understood that they are all generated from the same audio content. Therefore, the single most credible audio among the genuine audios can be regarded as the audio-content source of the other similar audio in the group and taken as the benchmark audio.
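A brief sketch of this benchmark-selection logic is given below; the popularity formula and field names are assumptions, since the application only states that popularity can be derived from signals such as play counts, click counts and the release date.

```python
# A sketch of combining the model's classification result with audio popularity
# to pick a single benchmark audio from the audio group to be detected.
from dataclasses import dataclass

@dataclass
class AudioItem:
    audio_id: str
    is_genuine: bool      # audio classification result from the cover-image model
    play_count: int
    click_count: int

def popularity(item: AudioItem) -> float:
    # An assumed popularity score; the weighting is illustrative only.
    return item.play_count + 0.5 * item.click_count

def pick_benchmark(group: list[AudioItem]) -> AudioItem | None:
    genuine = [a for a in group if a.is_genuine]
    if not genuine:
        return None
    # Among the audio classified as genuine, take the single most popular one.
    return max(genuine, key=popularity)
```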
In one embodiment, identifying pirated audio in the audio group to be detected based on the benchmark audio in step S104 may include:
acquiring the audio description information of the benchmark audio and of the audio to be compared in the audio group to be detected; and determining whether the audio to be compared is pirated audio according to the consistency between the audio description information of the audio to be compared and that of the benchmark audio.
In this embodiment, after the benchmark audio in the audio group to be detected is identified, the other audio in the group can be taken as audio to be compared. The audio description information of each audio to be compared is compared with that of the benchmark audio, their consistency is judged, and whether the audio to be compared is pirated is determined according to the consistency. Specifically, the audio description information may be text describing the audio, such as the duration, the creator and the name of the audio. If the description information of the two is consistent, the audio to be compared can be determined to be original audio; if it is inconsistent, the audio to be compared can be determined to be pirated audio. Because pirated audio is characterized by tampering with the description information of the original audio, comparing the description information of the benchmark audio and the audio to be compared allows pirated audio in the group to be identified more accurately and efficiently.
Further, in some embodiments, the audio description information may include a plurality of pieces of audio description sub-information corresponding to a plurality of audio description items. Taking songs as the audio, the audio description items may include the singer's name and the song title; the sub-information corresponding to the singer's name is the specific name of the singer, and the sub-information corresponding to the song title is the specific title of the song. On this basis, determining whether the audio to be compared is pirated audio according to the consistency between its audio description information and that of the benchmark audio includes:
and if the audio to be compared is inconsistent with the audio descriptor information corresponding to the benchmark audio on at least one audio description item, determining that the audio to be compared is pirate audio.
In this embodiment, the audio description sub-information of the audio to be compared and of the benchmark audio can be compared item by item; each audio description item corresponds to a comparison result indicating whether the sub-information of the two is the same for that item. If the audio to be compared differs from the benchmark audio in the sub-information of at least one item, such as the singer's name or the song title, it can be determined to be pirated audio; if the sub-information is identical for every item, it can be determined to be original audio. This scheme also allows the user to configure the set of audio description items, so that pirated audio in the audio group to be detected can be detected more accurately and efficiently.
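The item-by-item comparison can be sketched as follows; the particular description items and the string normalisation are illustrative assumptions.

```python
# A sketch of comparing audio description sub-information item by item.
def is_pirated(candidate_info: dict[str, str], benchmark_info: dict[str, str],
               items: tuple[str, ...] = ("singer_name", "song_title")) -> bool:
    """Return True if the candidate differs from the benchmark on any description item."""
    for item in items:
        a = candidate_info.get(item, "").strip().lower()
        b = benchmark_info.get(item, "").strip().lower()
        if a != b:
            return True     # inconsistent on at least one item -> treated as pirated
    return False

# Example: same recording, but the singer's name was changed -> flagged as pirated.
benchmark = {"singer_name": "Artist A", "song_title": "Song X"}
suspect = {"singer_name": "Artist B", "song_title": "Song X"}
assert is_pirated(suspect, benchmark)
```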
In an embodiment, the method may further include the following steps for training the audio classification model:
acquiring audio cover images of genuine audio and of pirated audio; taking the cover images of the genuine audio as positive samples and the cover images of the pirated audio as negative samples; and training the audio classification model to be trained with the positive and negative samples based on a preset loss function, the trained audio classification model being obtained when the loss value of the preset loss function satisfies a preset loss threshold condition.
Specifically, considering the characteristics of pirated audio, besides tampering with information such as the audio name and the publisher name of the genuine audio, pirated audio typically uses cover images that differ from those of the genuine audio. This embodiment therefore trains an audio classification model on the cover images of genuine and pirated audio and uses it to distinguish genuine audio from pirated audio based on the cover image.
In this embodiment, cover images of genuine audio and pirated audio are first obtained; the cover images of genuine audio are used as positive samples and those of pirated audio as negative samples; a residual network model for object classification, such as ResNet50, is used as the audio classification model to be trained; and a cross-entropy loss function can be used as the preset loss function. The model is trained with the positive and negative samples based on the cross-entropy loss, and when the loss value satisfies the preset loss threshold condition, for example when it is less than or equal to a preset loss threshold, the trained cover-image-based audio classification model is obtained. It may be a binary classification model based on cover images and is used to identify whether an audio is genuine based on its input cover image.
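A minimal training sketch along these lines is given below, assuming PyTorch and an ImageFolder-style dataset with genuine and pirated cover images in separate sub-folders; the hyper-parameters and the stopping threshold are illustrative assumptions.

```python
# A minimal training sketch for the cover-image classifier: ResNet50 backbone,
# two classes (genuine / pirated covers), cross-entropy loss, and a simple
# loss-threshold stopping condition.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

def train_cover_classifier(data_dir: str, epochs: int = 10,
                           loss_threshold: float = 0.05) -> nn.Module:
    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    # Positive samples: covers of genuine audio; negative samples: covers of pirated audio.
    dataset = datasets.ImageFolder(data_dir, transform=tfm)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    model = models.resnet50(weights=None)
    model.fc = nn.Linear(model.fc.in_features, 2)
    criterion = nn.CrossEntropyLoss()                     # the preset loss function
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for _ in range(epochs):
        epoch_loss = 0.0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item() * images.size(0)
        # Stop once the average loss satisfies the preset loss threshold condition.
        if epoch_loss / len(dataset) <= loss_threshold:
            break
    return model
```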
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not limited to the order illustrated and may be performed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time and may be performed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data such as audio. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a piracy audio detection method.
Those skilled in the art will appreciate that the architecture shown in FIG. 4 is merely a block diagram of some of the structures related to the present application and does not limit the computer devices to which the present application may be applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (10)

1. A method for pirated audio detection, the method comprising:
determining, from an audio library, similar audio that matches the audio features of the audio to be detected, to obtain an audio group to be detected comprising the audio to be detected and the similar audio;
inputting the audio cover image of each audio in the audio group to be detected into a trained audio classification model to obtain the audio classification result of each audio in the audio group to be detected output by the audio classification model, the audio classification result indicating whether the audio is genuine audio;
determining the benchmark audio in the audio group to be detected according to the audio classification result;
and identifying pirated audio in the audio group to be detected based on the benchmark audio.
2. The method of claim 1, wherein determining the benchmark audio in the audio group to be detected according to the audio classification result comprises:
acquiring the audio popularity of each audio in the audio group to be detected;
and determining the benchmark audio in the audio group to be detected according to the audio popularity and the audio classification result.
3. The method of claim 2, wherein determining the benchmark audio in the audio group to be detected according to the audio popularity and the audio classification result comprises:
determining, as the benchmark audio, the audio in the audio group to be detected whose audio popularity satisfies a preset popularity condition and whose audio classification result is genuine audio.
4. The method of any of claims 1 to 3, wherein identifying pirated audio in the audio group to be detected based on the benchmark audio comprises:
acquiring the audio description information of the benchmark audio and of the audio to be compared in the audio group to be detected, the audio to be compared being the audio in the group other than the benchmark audio;
and determining whether the audio to be compared is pirated audio according to the consistency between the audio description information of the audio to be compared and the audio description information of the benchmark audio.
5. The method according to claim 4, wherein the audio description information comprises a plurality of pieces of audio description sub-information corresponding to a plurality of audio description items; and determining whether the audio to be compared is pirated audio according to the consistency between the audio description information of the audio to be compared and the audio description information of the benchmark audio comprises:
if the audio to be compared is inconsistent with the benchmark audio in the audio description sub-information corresponding to at least one audio description item, determining that the audio to be compared is pirated audio.
6. The method of claim 1, wherein determining the similar audio that matches the audio features of the audio to be detected from the audio library comprises:
segmenting the audio to be detected to obtain a plurality of audio segments;
for each audio segment, determining from the audio library a candidate similar audio group matching the audio features of the audio segment, to obtain a plurality of candidate similar audio groups;
and obtaining the similar audio matching the audio features of the audio to be detected according to the candidate similar audio commonly contained in the candidate similar audio groups.
7. The method according to claim 6, wherein obtaining the similar audio matching the audio features of the audio to be detected according to the candidate similar audio commonly contained in the candidate similar audio groups comprises:
acquiring the feature matching times between each candidate similar audio in each candidate similar audio group and the plurality of audio segments, and the matching degree of each candidate similar audio;
for each candidate similar audio group, determining the candidate similar audio in the group whose matching degree satisfies a matching-degree threshold condition and/or whose feature matching time difference satisfies a time-difference threshold condition, to obtain screened candidate similar audio groups, the feature matching time difference being the difference between the feature matching time of the candidate similar audio and the feature matching time of the corresponding audio segment;
and obtaining the similar audio matching the audio features of the audio to be detected according to the candidate similar audio commonly contained in the screened candidate similar audio groups.
8. The method of claim 1, further comprising:
acquiring audio cover images of genuine audio and of pirated audio;
taking the cover images of the genuine audio as positive samples and the cover images of the pirated audio as negative samples;
and training the audio classification model to be trained with the positive samples and the negative samples based on a preset loss function, and obtaining the audio classification model when the loss value of the preset loss function satisfies a preset loss threshold condition.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 8 when executed by a processor.
CN202210567984.6A 2022-05-24 2022-05-24 Pirated audio detection method, apparatus and computer program product Pending CN114925231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210567984.6A CN114925231A (en) 2022-05-24 2022-05-24 Pirated audio detection method, apparatus and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210567984.6A CN114925231A (en) 2022-05-24 2022-05-24 Pirated audio detection method, apparatus and computer program product

Publications (1)

Publication Number Publication Date
CN114925231A 2022-08-19

Family

ID=82810153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210567984.6A Pending CN114925231A (en) 2022-05-24 2022-05-24 Pirated audio detection method, apparatus and computer program product

Country Status (1)

Country Link
CN (1) CN114925231A (en)


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination