CN110808065A - Method and device for detecting refrain, electronic equipment and storage medium - Google Patents

Method and device for detecting refrain, electronic equipment and storage medium

Info

Publication number
CN110808065A
CN110808065A (application CN201911031441.7A)
Authority
CN
China
Prior art keywords
audio
similarity
determining
segment
clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911031441.7A
Other languages
Chinese (zh)
Inventor
张文文
张存义
李佳文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Reach Best Technology Co Ltd
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Reach Best Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Reach Best Technology Co Ltd
Priority to CN201911031441.7A
Publication of CN110808065A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/78 - Detection of presence or absence of voice signals

Abstract

The present disclosure provides a refrain detection method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring a plurality of audio segments from an audio file to be detected; for each audio segment, determining its similarity to each subsequent audio segment; for each audio segment, counting the number of similarities exceeding a preset threshold to determine the segment's repetition count; and taking the audio segment with the highest repetition count as the refrain. The present disclosure enables accurate detection of refrains.

Description

Method and device for detecting refrain, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of multimedia technologies, and in particular, to a method and an apparatus for detecting refrain, an electronic device, and a computer-readable storage medium.
Background
With the development of multimedia technology, people often use audio playing applications to play audio files. For example, a song may be played using audio playback software. The verse and the refrain (chorus) are the main components of a popular song. The verse is generally the part before the climax: it builds the melody gradually toward the climax and sets out the story behind the song. The refrain is the emotional peak; its melody contrasts strongly with the verse, and it is the climax of the song, the segment with the most concentrated ideas and the most intense emotion. It is the center of the whole song and usually its most memorable part.
In the related art, the refrain (climax) of a song is usually located manually, which is not only inefficient but also incurs considerable time and labor costs.
Disclosure of Invention
In view of this, the present disclosure provides a refrain detection method and apparatus, an electronic device, and a computer-readable storage medium.
A first aspect of the present disclosure provides a refrain detection method, specifically comprising:
acquiring a plurality of audio segments from an audio file to be detected;
for each audio segment, determining the similarity between the audio segment and each subsequent audio segment;
for each audio segment, counting the number of similarities exceeding a preset threshold and determining the repetition count of the audio segment;
and taking the audio segment with the highest repetition count as the refrain.
Optionally, the similarity is the ratio of the length of the identical content in the two audio segments to half the sum of the lengths of the two segments.
Optionally, the counting, for each audio segment, of the number of similarities exceeding a preset threshold and the determining of the repetition count of the audio segment comprise:
setting the similarity to 1 if it is greater than the preset threshold, and to 0 otherwise;
and counting, for each audio segment, the number of similarities equal to 1 and determining the repetition count of the audio segment.
Optionally, the counting, for each audio segment, of the number of similarities equal to 1 and the determining of the repetition count of the audio segment comprise:
constructing a similarity matrix based on the similarity between each audio segment and the other audio segments, wherein the coordinates of each point in the similarity matrix are the positions, in the audio file to be detected, of two audio segments whose similarity is 1;
determining runs of consecutive points in the similarity matrix and filtering out the runs shorter than a specified number;
and determining the repetition count of each audio segment based on the filtered similarity matrix.
Optionally, the determining, for each audio segment, of the repetition count based on the filtered similarity matrix comprises:
summing the number of points in each column of the filtered similarity matrix to obtain the repetition count of each audio segment.
Optionally, the total number of audio segments contained in the audio file to be detected is not less than twice the specified number.
Optionally, the audio file to be detected is a lyric file of the audio to be detected, and each audio segment is one line of lyrics.
Optionally, the acquiring of a plurality of audio segments from the audio file to be detected comprises:
preprocessing the lyric file of the audio to be detected to obtain multiple lines of lyrics.
Optionally, the preprocessing comprises any one or more of:
normalizing the lyric text format; filtering out lyrics whose total word count is below a specified threshold; removing non-lyric portions of the lyric text; merging lyric lines with inconsistent sentence breaks; and removing lyrics containing non-specified languages.
According to a second aspect of the embodiments of the present disclosure, there is provided a refrain detection apparatus, comprising:
an audio segment acquisition unit configured to acquire a plurality of audio segments from an audio file to be detected;
a similarity determining unit configured to determine, for each audio segment, the similarity between the audio segment and each subsequent audio segment;
a repetition count determining unit configured to count, for each audio segment, the number of similarities exceeding a preset threshold and determine the repetition count of the audio segment;
and a refrain determining unit configured to take the audio segment with the highest repetition count as the refrain.
Optionally, the similarity is the ratio of the length of the identical content in the two audio segments to half the sum of the lengths of the two segments.
Optionally, the repetition count determining unit comprises:
a setting subunit configured to set the similarity to 1 if it is greater than the preset threshold, and to 0 otherwise;
and a repetition count calculating subunit configured to count, for each audio segment, the number of similarities equal to 1 and determine the repetition count of the audio segment.
Optionally, the repetition count calculating subunit comprises:
a matrix constructing module configured to construct a similarity matrix based on the similarity between each audio segment and the other audio segments, the coordinates of each point in the similarity matrix being the positions, in the audio file to be detected, of two audio segments whose similarity is 1;
a filtering module configured to determine runs of consecutive points in the similarity matrix and filter out the runs shorter than a specified number;
and a repetition count determining module configured to determine the repetition count of each audio segment based on the filtered similarity matrix.
Optionally, the repetition count determining module is configured to:
sum the number of points in each column of the filtered similarity matrix to obtain the repetition count of each audio segment.
Optionally, the total number of audio segments contained in the audio file to be detected is not less than twice the specified number.
Optionally, the audio file to be detected is a lyric file of the audio to be detected, and each audio segment is one line of lyrics.
Optionally, the audio segment acquisition unit is configured to:
preprocess the lyric file of the audio to be detected to obtain multiple lines of lyrics.
Optionally, the preprocessing comprises any one or more of:
normalizing the lyric text format; filtering out lyrics whose total word count is below a specified threshold; deleting non-lyric portions of the lyric text; and merging lyric lines with inconsistent sentence breaks.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any of the first aspects.
According to a fourth aspect of embodiments of the present disclosure, there is also provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of any one of the methods of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed, implements the steps of any one of the methods of the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the method and the device for determining the refrain of the karaoke are characterized in that a plurality of audio clips are obtained from an audio file to be detected, then the similarity of each audio clip with each audio clip behind the audio clip is determined, the repetition times of the audio clips are determined according to the similarity, the audio clip with the most repetition times is used as a refrain part, the implementation process is simple and efficient, the obtained refrain is high in accuracy, a user does not need to drag a progress bar to find the climax of the song, the determination of the refrain can help the user to find favorite videos and music more effectively, the tedious operation of the user is reduced, and the use experience of the user is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
FIG. 1 is a flowchart illustrating a refrain detection method according to an exemplary embodiment of the present disclosure;
FIG. 2A is a first similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 2B is a second similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a second refrain detection method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a third similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 5A is a fourth similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 5B is a fifth similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram of a refrain detection apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 is a block diagram of a device for performing an embodiment of the refrain detection method according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
To address the inefficiency and high cost of manually searching for the climax of a song in the related art, the embodiments of the present disclosure provide a refrain detection method. The method can be executed by an electronic device, such as a computer, a smartphone, a tablet, a personal digital assistant, a server, or another computing device, and enables the climax of a song to be found simply and efficiently.
Referring to fig. 1, a flowchart of a refrain detection method according to an exemplary embodiment of the present disclosure is shown. The method comprises:
In step S101, a plurality of audio segments are acquired from an audio file to be detected.
In step S102, for each audio segment, the similarity between the audio segment and each subsequent segment is determined.
In step S103, for each audio segment, the number of similarities exceeding a preset threshold is counted and the repetition count of the segment is determined.
In step S104, the audio segment with the highest repetition count is taken as the refrain.
In an embodiment, the electronic device may preprocess the audio file to be detected to obtain a plurality of audio segments. The preprocessing may normalize the format of the audio file, preprocess a lyric text contained in the file (for example, filtering out lyrics whose total word count is below a specified threshold), or preprocess a melody text contained in the file (for example, deleting non-melody parts of the melody text).
The audio file to be detected may be a lyric file or a melody file of the audio to be detected. Note that the lyric file is a text file that has been divided into individual lyric lines in advance, and the melody file is a text file that has been divided into melody sections in advance. Accordingly, if the audio file to be detected is a lyric file, each audio segment is one line of lyrics; if it is a melody file (i.e., a musical score), each audio segment is a section of melody.
In an embodiment, after the electronic device obtains a plurality of audio segments, it determines, for each audio segment, the similarity between that segment and each subsequent segment, where the similarity is the ratio of the length of the identical content in the two segments to half the sum of their lengths. For example, if the electronic device obtains 4 audio segments from the audio file to be detected, the 1st segment is compared with the 2nd, 3rd, and 4th segments, giving the 1st segment 3 similarities; the 2nd segment is compared with the 3rd and 4th segments, giving it 2 similarities; and so on, the 3rd segment has 1 similarity and the 4th segment has 0 similarities.
In one implementation, assuming the audio segment is the i-th, its similarity si(i, j) with the j-th (i < j) audio segment is:
si(i, j) = 2 × sameNum / (numi + numj)
where numi is the character length of the i-th audio segment, numj is the character length of the j-th audio segment, and sameNum is the character length of the content that the i-th and j-th audio segments have in common.
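As an illustration, the similarity defined above can be sketched in Python. The patent only calls sameNum "the length of the same content", so taking it to be the length of the longest common substring of the two segments is an assumption of this sketch:

```python
from difflib import SequenceMatcher

def similarity(seg_i: str, seg_j: str) -> float:
    """Ratio of the shared content length to half the sum of the two
    segment lengths, i.e. 2 * sameNum / (numi + numj).

    Assumption: sameNum is taken as the length of the longest common
    substring of the two segments."""
    if not seg_i or not seg_j:
        return 0.0
    matcher = SequenceMatcher(None, seg_i, seg_j)
    match = matcher.find_longest_match(0, len(seg_i), 0, len(seg_j))
    same_num = match.size
    return 2 * same_num / (len(seg_i) + len(seg_j))
```

With this definition, two identical lines have similarity 1 and wholly different lines have similarity 0, so a threshold such as 0.5 can binarize the value as described in this embodiment.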
In this embodiment, after determining the similarity between each audio segment and each subsequent segment, the electronic device counts, for each segment, the number of similarities exceeding a preset threshold and determines the segment's repetition count, so that the segment with the highest repetition count can be taken as the refrain. The method is simple and efficient to implement, and the refrain can be determined quickly and accurately. Users no longer need to drag a progress bar to search for the climax of a song; determining the refrain helps users find favorite videos and music more effectively, reduces tedious operations, and improves the user experience. It should be understood that the embodiments of the present disclosure do not limit the value of the threshold, which may be set according to the actual situation; for example, the threshold may be 0.5.
In some scenarios, when the user previews a song, only the refrain may be played, so that the user can quickly decide whether it is the desired song; alternatively, the refrain may be marked on the song's progress bar, so that the user can jump directly to it by clicking or dragging the progress bar.
In a possible implementation, when determining the similarity between each audio segment and each subsequent segment, the electronic device sets the similarity to 1 if it is greater than the preset threshold and to 0 otherwise. After all similarities have been determined, the electronic device determines each segment's repetition count from the number of similarities equal to 1 for that segment.
It should be noted that if the repetition counts of all audio segments are 0, the audio file to be detected contains no repeated content, the refrain cannot be detected through repetition, and the electronic device ends the refrain detection process.
In an embodiment, non-refrain parts of a song, such as the verse, transition lines, and ending lines, also repeat frequently. After analyzing a large number of songs in the course of implementing the present disclosure, the inventors found that the refrain and the ending lines are the most repeated parts of a song, but the ending is generally short (for example, fewer than 4 lines) while the climax is generally longer (for example, at least 4 lines). Based on this, to further improve the accuracy of refrain detection, the electronic device may construct a similarity matrix from the similarity between each audio segment and the other segments, where the coordinates of each point are the positions, in the audio file to be detected, of two segments whose similarity is 1. It then determines runs of consecutive points in the matrix, filters out the runs shorter than a specified number (for example, runs shorter than 4), and finally determines each segment's repetition count from the filtered matrix. It should be understood that the embodiments of the present disclosure do not limit the specific value of the specified number, which may be set according to the actual situation.
As one possible implementation, the electronic device may sum the number of points in each column of the filtered similarity matrix to obtain the repetition count of each audio segment.
As an example, suppose audio segments A to G are extracted from an audio file to be detected (their order in the file matching the alphabetical order) and are, in turn, {"you", "good", "you", "I", "you", "good", "you"}. Then segment A has similarity 1 with segments C, E, and G, and similarity 0 with the others; segment B has similarity 1 with segment F and 0 with the others; segment C has similarity 1 with segments E and G and 0 with the others; segment D has similarity 0 with all other segments; segment E has similarity 1 with segment G; and segment F has similarity 0 with the remaining segments. The electronic device may construct a 7 × 7 similarity matrix diagram as shown in fig. 2A from the similarity between each segment and the others; the diagram marks, for each segment on the horizontal axis and each segment on the vertical axis, the pairs whose similarity is 1, and the coordinates of each point are the positions of the two segments in the audio file to be detected.
As can be seen from fig. 2A, the points at coordinates (A, E), (B, F), and (C, G) are consecutive, while the points at other coordinates are not. With the specified number set to 2, the electronic device filters out the parts of the similarity matrix where the number of consecutive points is smaller than the specified number, yielding the filtered matrix shown in fig. 2B. The electronic device then determines each segment's repetition count from the filtered matrix; in one implementation, it sums the number of points in each column of the filtered matrix to obtain the repetition count of each segment, and finally takes the segment with the highest repetition count as the refrain. For example, the refrain obtained from fig. 2B consists of segments A, B, and C.
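The worked example above can be reproduced with a short sketch. This is an illustration under stated assumptions, not the patent's implementation: exact string equality stands in for "similarity above the threshold", and the function and variable names are invented:

```python
def detect_refrain(segments, min_run=2):
    """Build the binarized similarity matrix, keep only diagonal runs of at
    least min_run consecutive points, and return the indices of the segments
    with the highest repetition count (the detected refrain)."""
    n = len(segments)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # point at (i, j): segment i repeats as the later segment j
            m[i][j] = int(segments[i] == segments[j])

    # keep only runs of consecutive diagonal points of length >= min_run
    keep = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] and not (i > 0 and m[i - 1][j - 1]):  # start of a run
                run = 0
                while i + run < n and j + run < n and m[i + run][j + run]:
                    run += 1
                if run >= min_run:
                    for k in range(run):
                        keep[i + k][j + k] = 1

    # repetition count per segment: number of kept points in its row
    # (equivalent to the per-column sums in the transposed layout of fig. 2B)
    counts = [sum(row) for row in keep]
    best = max(counts) if counts else 0
    return [] if best == 0 else [i for i, c in enumerate(counts) if c == best]

# Segments A..G from the example: {"you", "good", "you", "I", "you", "good", "you"}
segs = ["you", "good", "you", "I", "you", "good", "you"]
print(detect_refrain(segs))  # → [0, 1, 2], i.e. segments A, B, and C
```

The isolated points such as (A, C) form runs of length 1 and are filtered out, leaving only the diagonal run through (A, E), (B, F), (C, G), which matches fig. 2B.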
It should be noted that, to ensure the accuracy of refrain detection, the embodiments of the present disclosure detect refrains that repeat at least twice. Therefore, the total number of audio segments contained in the audio file to be detected should be not less than twice the specified number; for audio with fewer segments than twice the specified number, the refrain cannot be detected through repetition.
Please refer to fig. 3, which is a flowchart of a second refrain detection method according to an exemplary embodiment of the present disclosure. In this embodiment, the audio file to be detected is a lyric file of the audio to be detected, and each audio segment is one line of lyrics. The method comprises:
In step S301, multiple lines of lyrics are obtained from the lyric file of the audio to be detected.
In step S302, for each line of lyrics, the similarity between that line and each subsequent line is determined.
In step S303, for each line of lyrics, the number of similarities exceeding a preset threshold is counted and the repetition count of the line is determined.
In step S304, the line of lyrics with the highest repetition count is taken as the refrain.
In an embodiment, the electronic device preprocesses the lyric file of the audio to be detected to obtain multiple lines of lyrics, where the preprocessing includes any one or more of the following operations: normalizing the lyric text format; filtering out lyrics whose total word count is below a specified threshold; deleting non-lyric parts of the lyric text (such as credit metadata like "lyricist: Xiaoming", "composer: Xiaobai", "producer: Mingming"); and merging lyric lines with inconsistent sentence breaks. It should be understood that the present disclosure does not limit the specific value of the specified threshold, which may be set according to the actual situation; for example, the specified threshold is 2.
Normalization is needed because lyric text formats are inconsistent: in some lyric texts one line of lyrics corresponds to each timestamp, while in others each word corresponds to a timestamp. All lyric texts may therefore be normalized so that each timestamp corresponds to one line of lyrics.
In addition, a song may contain a lyric line A several times, while a later occurrence of A is split into two lines A1 and A2. To avoid bias in the similarity computation and improve the accuracy of refrain detection, A1 and A2 are merged during preprocessing; that is, lyric lines with inconsistent sentence breaks are merged.
It should be noted that the embodiments of the present disclosure place no restriction on the lyric language; refrain detection can be performed on repetitive songs in any language.
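A minimal sketch of this preprocessing follows. It assumes LRC-style "[mm:ss.xx]" timestamps and Chinese credit prefixes such as 词/曲 (lyricist/composer); the patent does not specify the file format, and the merging of inconsistently broken lines is omitted here:

```python
import re

# Assumed formats: LRC-style timestamps and common Chinese credit prefixes
TIMESTAMP = re.compile(r"\[\d{1,2}:\d{2}(?:\.\d{2,3})?\]")
CREDITS = re.compile(r"^(词|曲|作词|作曲|编曲|制作人)\s*[:：]")

def preprocess_lyrics(raw_lines, min_chars=2):
    """Normalize lyric lines: strip timestamps, drop credit metadata,
    and filter out lines shorter than min_chars characters."""
    lines = []
    for raw in raw_lines:
        line = TIMESTAMP.sub("", raw).strip()
        if not line or CREDITS.match(line):
            continue
        if len(line) < min_chars:
            continue
        lines.append(line)
    return lines
```

For a file with one timestamp per line, this yields one lyric line per list entry, matching the per-line similarity computation of this embodiment.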
In an embodiment, after obtaining multiple lines of lyrics, the electronic device determines, for each line, the similarity between that line and each subsequent line. For example, assuming the line is the i-th, its similarity si(i, j) with the j-th (i < j) line is:
si(i, j) = 2 × sameNum / (numi + numj)
where numi is the character length of the i-th lyric line, numj is the character length of the j-th lyric line, and sameNum is the character length of the content the two lines have in common. For example, if the lyric file of the audio to be detected is Chinese, sameNum is the number of identical Chinese characters between the two lines; if the lyric file is English, sameNum is the number of identical words between the two lines.
In an embodiment, the electronic device may count, for each line of lyrics, the number of similarities exceeding a preset threshold and determine the repetition count of the line; it should be understood that the threshold may be set according to the actual situation, for example, to 0.5.
In one implementation, for convenience of subsequent computation, the similarity is set to 1 if it is greater than the preset threshold and to 0 otherwise. If a song has N lines, an N × N similarity matrix diagram can be obtained after the similarities are determined. For example, if a song has 10 lines, with the 2nd line identical to the 6th, the 3rd to the 7th, the 4th to the 8th, and the 5th to the 9th, the resulting similarity matrix is as shown in fig. 4 (the 0s and 1s in fig. 4 are for illustration; the horizontal axis represents i and the vertical axis represents j). The repetition count of each line, and which lines repeat each other, can be read directly from the diagram according to the number of similarities equal to 1 for each line; for example, the coordinates (3, 6) indicate that the 3rd and 6th lines are repetitions of each other.
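The fig. 4 example can be reproduced with a short sketch. The placeholder line contents are invented for illustration, and exact equality stands in for "similarity above the threshold":

```python
def binarized_matrix(lines):
    """0/1 similarity matrix over pairs i < j; equality stands in for
    'similarity above the threshold' in this sketch."""
    n = len(lines)
    return [[1 if i < j and lines[i] == lines[j] else 0 for j in range(n)]
            for i in range(n)]

# 10 lines where (1-based) line 2 == line 6, 3 == 7, 4 == 8, and 5 == 9
lines = ["l1", "x", "y", "z", "w", "x", "y", "z", "w", "l10"]
m = binarized_matrix(lines)
print(m[1][5], m[2][6], m[3][7], m[4][8])  # → 1 1 1 1
```

The four 1s lie on one diagonal, which is exactly the pattern a repeated refrain leaves in the similarity matrix diagram.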
In an example, please refer to fig. 5A, taking the song "Chengdu" as an example, where white points indicate a similarity of 1. After preprocessing the lyric text and computing the similarities, a similarity matrix diagram as shown in fig. 5A can be constructed from the similarity between each line and the other lines, according to the order of the lines in the audio file to be detected. The electronic device can determine the repetition count of each line from the number of similarities equal to 1 (the number of white points) for that line; for example, abscissa 20 (representing the 21st line) is repeated 2 times, by the 36th and 46th lines respectively.
In an embodiment, considering that non-refrain parts in a song, such as a main song, a transition sentence, a final sentence, etc. of the song also frequently repeat, the inventor finds out after analyzing a large number of songs in the process of implementing the present disclosure that: the refrain and the ending sentence of the song are the most repeated parts, but the ending sentence is generally shorter, such as less than 4 sentences, and the climax part is generally longer, such as at least 4 sentences. Based on the above, in order to further improve the accuracy of detecting the refrain, the electronic device may construct a similarity matrix based on the similarity between each lyric and other lyrics, where a point in the similarity matrix is located in an arrangement order of two lyrics with a similarity of 1 in the audio file to be detected, then determine continuous points in the similarity matrix, filter out a portion where the continuous number of the continuous points in the similarity matrix is smaller than a specified number, and finally determine the number of repetitions of the lyrics based on the similarity matrix after the filtering processing for each lyric; such as filtering out a consecutive number of portions smaller than 4; it is understood that, in the embodiments of the present disclosure, specific values of the specified quantities are not limited at all, and may be specifically set according to actual situations.
In one example, referring to the similarity matrix diagram shown in fig. 5A, if the specified number is 4, the lyrics with a repetition count of 0 are filtered out, and the parts in which fewer than 4 consecutive sentences repeat are filtered out; that is, the line segments with fewer than 4 consecutive points in the similarity matrix diagram shown in fig. 5A are removed to obtain fig. 5B. Each column in fig. 5B is then summed, i.e., the white points are added up vertically, to obtain the values shown in table 1:
table 1 (the sums of the omitted parts are the same before and after; in the original publication the table is rendered as an image listing the per-column sums of white points in fig. 5B)
It can be seen that the part with the maximum value is the most repeated part and is determined to be the detected refrain: abscissas 16 to 22, i.e., the 17th to 23rd sentences.
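The column-summing step above can be sketched as follows (pure-Python, with illustrative names; indices are 0-based, which is why abscissas 16-22 correspond to the 17th-23rd sentences):

```python
def repetition_counts(filtered):
    """Sum the points (1s) in each column of the filtered similarity matrix."""
    n = len(filtered)
    return [sum(filtered[i][j] for i in range(n)) for j in range(n)]

def refrain_columns(filtered):
    """The columns attaining the maximum count are taken as the refrain."""
    counts = repetition_counts(filtered)
    best = max(counts)
    return [j for j, c in enumerate(counts) if c == best]
```

Note that when no repetition survives the filtering, every count is 0 and every column attains the maximum; a real implementation would treat that case as "no refrain detected".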
It should be noted that, in order to ensure the accuracy of refrain detection, the embodiments of the present disclosure detect audio whose refrain part repeats at least 2 times. Therefore, the total number of lyric sentences contained in the lyric file in the embodiments of the present disclosure is not less than 2 times the specified number; for audio whose total number of lyric sentences is less than 2 times the specified number, the refrain cannot be detected through repetition.
In this embodiment, multiple sentences of lyrics are obtained from the lyric file of the audio to be detected; then, for each sentence of lyrics, the similarity between that sentence and each subsequent sentence is determined, the repetition times of the sentence are determined according to the similarities, and the lyrics with the most repetition times are taken as part of the refrain.
Corresponding to the embodiments of the refrain detection method, the present disclosure also provides embodiments of a refrain detecting apparatus, an electronic device, and a computer-readable storage medium.
Referring to fig. 6, a block diagram of an embodiment of a refrain detecting apparatus according to an embodiment of the present disclosure is shown, the apparatus including:
an audio clip acquiring unit 401, configured to acquire a plurality of audio clips from an audio file to be detected; the beginning of each audio segment corresponds to the beginning of the lyrics.
A similarity determining unit 402, configured to determine, for each audio segment, the similarity between the audio segment and each subsequent audio segment.
A repetition number determining unit 403, configured to count, for each audio segment, the number of similarities exceeding a preset threshold, and determine the repetition times of the audio segment.
A refrain determining unit 404, configured to determine the audio segment with the most repetition times as the refrain.
Optionally, the similarity is the ratio of the length of the same content in two audio segments to half of the sum of the lengths of the two audio segments.
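This ratio — the matched length divided by half the sum of the two lengths, i.e. 2*M / (len(a) + len(b)) — can be sketched with Python's standard difflib, whose `ratio()` computes exactly 2*M/T. Treating each segment as a plain string, and the sample strings below, are illustrative assumptions:

```python
from difflib import SequenceMatcher

def segment_similarity(a: str, b: str) -> float:
    """Length of the shared content over half the sum of the two
    segments' lengths: 2*M / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b).ratio()

print(segment_similarity("hello world", "hello world"))  # identical segments -> 1.0
```

Two identical segments yield a similarity of 1, which is the value binarized against the preset threshold in the units below.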
Optionally, the repetition number determining unit 403 includes:
And the setting subunit is configured to set the similarity to 1 if the similarity is greater than the preset threshold, and to 0 otherwise.
And the repetition number calculation subunit is configured to count, for each audio segment, the number of similarities equal to 1, and determine the repetition times of the audio segment.
Optionally, the repetition number calculation subunit includes:
and the matrix construction module is used for constructing a similarity matrix based on the similarity of each audio clip and other audio clips, and the coordinates of points in the similarity matrix are the arrangement sequence of the two audio clips with the similarity of 1 in the audio file to be detected.
And the filtering module is used for determining continuous points in the similarity matrix and filtering the part of which the continuous number of the continuous points in the similarity matrix is less than the specified number.
And the repetition frequency determining module is used for determining the repetition frequency of each audio segment based on the similarity matrix after the filtering processing.
Optionally, the repetition number determining module includes:
and summing the point numbers of each column in the similarity matrix after filtering processing to obtain the repetition times of each audio frequency segment.
Optionally, the total number of audio clips contained in the audio file to be detected is not less than 2 times the specified number.
Optionally, the audio file to be detected is a lyric file of the audio to be detected, and each audio clip is a sentence of lyrics.
Optionally, the audio segment obtaining unit 401 includes:
and preprocessing the lyric file of the audio to be detected to obtain multiple sentences of lyrics.
Optionally, the pre-processing comprises any one or more of:
the lyric text format is normalized, lyrics with a total number of words less than a specified threshold are filtered, non-lyric portions of the lyric text are deleted, and lyrics with inconsistent merging and breaking sentences are removed.
For the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement this without inventive effort.
Accordingly, fig. 7 is a block diagram illustrating an apparatus for performing the above-described method embodiments according to an exemplary embodiment of the present disclosure.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an apparatus to perform the method embodiments of any of fig. 1 or 3 described above.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The embodiment of the present disclosure further provides an electronic device 50, which includes a processor 51; a memory 52 (e.g., a non-volatile memory) for storing executable instructions, wherein the processor is configured to execute the instructions to implement the method embodiments of any of fig. 1 or 3 described above.
The processor 51 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 52 stores executable instructions of the refrain detection method. The memory 52 may include at least one type of storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The device may also cooperate with a network storage device that performs the storage function of the memory through a network connection. The memory 52 may be an internal storage unit of the device 50, such as a hard disk or memory of the device 50, or an external storage device of the device 50, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the device 50. Further, the memory 52 may include both internal and external storage units of the device 50. The memory 52 is used for storing a computer program 53 as well as other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The various embodiments described herein may be implemented using a computer-readable medium, such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For a software implementation, an embodiment such as a process or a function may be implemented with a separate software module that performs at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, which may be stored in the memory and executed by the controller.
The electronic device 50 includes, but is not limited to, the following forms: (1) a mobile communication device: such a device has mobile communication capability and mainly aims to provide voice and data communication; such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, low-end phones, etc.; (2) an ultra-mobile personal computer device: such a device belongs to the category of personal computers, has computing and processing functions, and generally also has mobile internet access; such terminals include PDA, MID, and UMPC devices, e.g., the iPad; (3) a portable entertainment device: such a device can display and play multimedia content, and includes audio and video players (e.g., the iPod), handheld game consoles, electronic books, smart toys, and portable vehicle navigation devices; (4) a server: a device providing computing services; a server includes a processor, a hard disk, a memory, a system bus, and the like, and is similar to a general-purpose computer architecture, but because highly reliable services need to be provided, it has higher requirements on processing capacity, stability, reliability, security, expandability, manageability, and the like; (5) other electronic devices with a data interaction function. The device may include, but is not limited to, the processor 51 and the memory 52; as shown in fig. 7, the electronic device generally further includes a memory 53 and a network interface 54. Of course, those skilled in the art will appreciate that fig. 7 is merely an example of the electronic device 50 and does not constitute a limitation on it; the device may include more or fewer components than those shown, combine some components, or use different components; for example, the device may also include an input/output device, a network access device, a bus, a camera device, etc.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
The disclosed embodiments also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of the electronic device, enable the electronic device to perform the method embodiment of any one of fig. 1 or fig. 3.
The disclosed embodiments also provide a computer program product comprising executable program code, wherein the program code, when executed by the above-described apparatus, implements the method embodiments according to any of fig. 1 or fig. 3.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The present disclosure is to be considered as limited only by the preferred embodiments and not limited to the specific embodiments described herein, and all changes, equivalents, and modifications that come within the spirit and scope of the disclosure are desired to be protected.

Claims (10)

1. A method for detecting a refrain, comprising:
acquiring a plurality of audio clips from an audio file to be detected;
for each audio segment, determining the similarity between the audio segment and each subsequent audio segment;
for each audio clip, counting the number of similarities exceeding a preset threshold value, and determining the repetition times of the audio clip;
and taking the audio clip with the most repetition times as the refrain.
2. The method of claim 1, wherein the similarity is a ratio of the length of the same content in two audio segments to half of the sum of the lengths of the two audio segments.
3. The method of claim 1, wherein the counting, for each audio segment, the number of similarities exceeding a preset threshold and determining the repetition times of the audio segment comprises:
setting the similarity to 1 if the similarity is greater than the preset threshold, and to 0 otherwise;
and counting, for each audio segment, the number of similarities equal to 1, and determining the repetition times of the audio segment.
4. The method of claim 3, wherein the counting the number of similarities equal to 1 for each audio segment and determining the repetition times of the audio segment comprises:
constructing a similarity matrix based on the similarity between each audio clip and the other audio clips, wherein the coordinates of a point in the similarity matrix are the arrangement orders, in the audio file to be detected, of two audio clips whose similarity is 1;
determining consecutive points in the similarity matrix, and filtering out the portions of the similarity matrix in which the number of consecutive points is less than a specified number;
and determining the repetition times of each audio segment based on the filtered similarity matrix.
5. The method of claim 4, wherein the determining the repetition times of each audio segment based on the filtered similarity matrix comprises:
summing the number of points in each column of the filtered similarity matrix to obtain the repetition times of each audio segment.
6. The method according to claim 4, wherein the total number of audio clips contained in the audio file to be detected is not less than 2 times the specified number.
7. The method according to any one of claims 1 to 6, wherein the audio file to be detected is a lyric file of the audio to be detected, and each audio clip is a sentence of lyrics.
8. A refrain detecting apparatus, comprising:
an audio clip acquisition unit, configured to acquire a plurality of audio clips from an audio file to be detected;
a similarity determining unit, configured to determine, for each audio clip, the similarity between the audio clip and each subsequent audio clip;
a repetition number determining unit, configured to count, for each audio clip, the number of similarities exceeding a preset threshold, and determine the repetition times of the audio clip;
and a refrain determining unit, configured to take the audio clip with the most repetition times as the refrain.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
CN201911031441.7A 2019-10-28 2019-10-28 Method and device for detecting refrain, electronic equipment and storage medium Pending CN110808065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911031441.7A CN110808065A (en) 2019-10-28 2019-10-28 Method and device for detecting refrain, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN110808065A true CN110808065A (en) 2020-02-18

Family

ID=69489392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911031441.7A Pending CN110808065A (en) 2019-10-28 2019-10-28 Method and device for detecting refrain, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110808065A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383669A (en) * 2020-03-19 2020-07-07 杭州网易云音乐科技有限公司 Multimedia file uploading method, device, equipment and computer readable storage medium
CN111583963A (en) * 2020-05-18 2020-08-25 合肥讯飞数码科技有限公司 Method, device and equipment for detecting repeated audio and storage medium
CN112989109A (en) * 2021-04-14 2021-06-18 腾讯音乐娱乐科技(深圳)有限公司 Music structure analysis method, electronic equipment and storage medium
CN113035160A (en) * 2021-02-26 2021-06-25 成都潜在人工智能科技有限公司 Music automatic editing implementation method and device based on similarity matrix and storage medium
CN113377992A (en) * 2021-06-21 2021-09-10 腾讯音乐娱乐科技(深圳)有限公司 Song segmentation method, device and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1577877A1 (en) * 2002-10-24 2005-09-21 National Institute of Advanced Industrial Science and Technology Musical composition reproduction method and device, and method for detecting a representative motif section in musical composition data
US20060080095A1 (en) * 2004-09-28 2006-04-13 Pinxteren Markus V Apparatus and method for designating various segment classes
US7659471B2 (en) * 2007-03-28 2010-02-09 Nokia Corporation System and method for music data repetition functionality
US20110112672A1 (en) * 2009-11-11 2011-05-12 Fried Green Apps Systems and Methods of Constructing a Library of Audio Segments of a Song and an Interface for Generating a User-Defined Rendition of the Song
WO2012091935A1 (en) * 2010-12-30 2012-07-05 Dolby Laboratories Licensing Corporation Repetition detection in media data
CN102903357A (en) * 2011-07-29 2013-01-30 华为技术有限公司 Method, device and system for extracting chorus of song
CN104282322A (en) * 2014-10-29 2015-01-14 深圳市中兴移动通信有限公司 Mobile terminal and method and device for identifying chorus part of song thereof
CN105161116A (en) * 2015-09-25 2015-12-16 广州酷狗计算机科技有限公司 Method and device for determining climax fragment of multimedia file
CN106409311A (en) * 2015-07-31 2017-02-15 阿里巴巴集团控股有限公司 Refrain extracting apparatus and method
CN106782601A (en) * 2016-12-01 2017-05-31 腾讯音乐娱乐(深圳)有限公司 A kind of multimedia data processing method and its device
CN108648767A (en) * 2018-04-08 2018-10-12 中国传媒大学 A kind of popular song emotion is comprehensive and sorting technique


Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
NICOLA ORIO: ""Experiments on Segmentation Techniques for Music Documents Indexing"", 《PROC. IN 6TH ISMIR》 *
孙佳音: ""音乐要素自动分析关键技术研究"", 《中国博士学位论文全文数据库(信息科技辑)》 *
孙翌: "《学科化服务技术与应用》", 31 January 2013 *
李相莲: ""基于音色单元分布的音乐结构分析"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *
村松純: ""歌謡曲における 「さび」 の楽譜情報に基づく特徴抽出一小室哲哉の場合一"", 《情処研報音楽情報科学》 *
石自强: ""音乐结构自动分析研究"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *
蒋盛益 等: ""基于歌词的歌曲高潮片段自动提取"", 《小型微型计算机系统》 *
邱莉榕 等: "《算法设计与优化》", 31 December 2016 *
陈廷梁: ""音乐结构分析及应用"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *
韩圣龙: "《数字音乐信息组织》", 30 June 2005 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383669A (en) * 2020-03-19 2020-07-07 杭州网易云音乐科技有限公司 Multimedia file uploading method, device, equipment and computer readable storage medium
CN111583963A (en) * 2020-05-18 2020-08-25 合肥讯飞数码科技有限公司 Method, device and equipment for detecting repeated audio and storage medium
CN111583963B (en) * 2020-05-18 2023-03-21 合肥讯飞数码科技有限公司 Repeated audio detection method, device, equipment and storage medium
CN113035160A (en) * 2021-02-26 2021-06-25 成都潜在人工智能科技有限公司 Music automatic editing implementation method and device based on similarity matrix and storage medium
CN113035160B (en) * 2021-02-26 2022-08-02 成都潜在人工智能科技有限公司 Music automatic editing implementation method and device based on similarity matrix and storage medium
CN112989109A (en) * 2021-04-14 2021-06-18 腾讯音乐娱乐科技(深圳)有限公司 Music structure analysis method, electronic equipment and storage medium
CN113377992A (en) * 2021-06-21 2021-09-10 腾讯音乐娱乐科技(深圳)有限公司 Song segmentation method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200218