CN110808065A - Method and device for detecting refrain, electronic equipment and storage medium - Google Patents

Method and device for detecting refrain, electronic equipment and storage medium

Info

Publication number
CN110808065A
CN110808065A (application CN201911031441.7A)
Authority
CN
China
Prior art keywords
audio
similarity
determining
segment
clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911031441.7A
Other languages
Chinese (zh)
Inventor
张文文
张存义
李佳文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Reach Best Technology Co Ltd
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Reach Best Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Reach Best Technology Co Ltd
Priority to CN201911031441.7A
Publication of CN110808065A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/78 - Detection of presence or absence of voice signals

Abstract

The present disclosure provides a refrain detection method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring a plurality of audio segments from an audio file to be detected; for each audio segment, determining its similarity to each subsequent audio segment; for each audio segment, counting the number of similarities exceeding a preset threshold to determine the segment's repetition count; and taking the audio segment with the highest repetition count as the refrain. The present disclosure enables accurate detection of refrains.

Description

Method and device for detecting refrain, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of multimedia technologies, and in particular, to a method and an apparatus for detecting refrain, an electronic device, and a computer-readable storage medium.
Background
With the development of multimedia technology, people often use audio playing applications to play audio files. For example, a song may be played using audio playback software. The verse and the refrain (chorus) are the main components of a popular song. The verse is generally the part before the climax: it builds the melody gradually toward the climax and sets out the story behind the song. The refrain is the emotional peak; its melody contrasts strongly with the verse, and it is the climax of the song, the segment with the most concentrated ideas and the most intense emotion. It is the center of the whole song and usually its most memorable part.
In the related art, the refrain (climax) of a song is usually located manually, which is not only inefficient but also incurs considerable time and labor costs.
Disclosure of Invention
In view of this, the present disclosure provides a refrain detection method and apparatus, an electronic device, and a computer-readable storage medium.
A first aspect of the present disclosure provides a refrain detection method, specifically comprising:
acquiring a plurality of audio segments from an audio file to be detected;
for each audio segment, determining the similarity between the audio segment and each subsequent audio segment;
for each audio segment, counting the number of similarities exceeding a preset threshold and determining the repetition count of the audio segment;
and taking the audio segment with the highest repetition count as the refrain.
Optionally, the similarity is the ratio of the length of the identical content in the two audio segments to half the sum of the lengths of the two segments.
Optionally, the counting, for each audio segment, of the number of similarities exceeding a preset threshold and the determining of the repetition count of the audio segment comprise:
setting the similarity to 1 if it is greater than the preset threshold, and to 0 otherwise;
and counting, for each audio segment, the number of similarities equal to 1 and determining the repetition count of the audio segment.
Optionally, the counting, for each audio segment, of the number of similarities equal to 1 and the determining of the repetition count of the audio segment comprise:
constructing a similarity matrix based on the similarity between each audio segment and the other audio segments, wherein the coordinates of each point in the similarity matrix are the positions, in the audio file to be detected, of two audio segments whose similarity is 1;
determining runs of consecutive points in the similarity matrix and filtering out the runs shorter than a specified number;
and determining the repetition count of each audio segment based on the filtered similarity matrix.
Optionally, the determining, for each audio segment, of the repetition count based on the filtered similarity matrix comprises:
summing the number of points in each column of the filtered similarity matrix to obtain the repetition count of each audio segment.
Optionally, the total number of audio segments contained in the audio file to be detected is not less than twice the specified number.
Optionally, the audio file to be detected is a lyric file of the audio to be detected, and each audio segment is one line of lyrics.
Optionally, the acquiring of a plurality of audio segments from the audio file to be detected comprises:
preprocessing the lyric file of the audio to be detected to obtain multiple lines of lyrics.
Optionally, the preprocessing comprises any one or more of:
normalizing the lyric text format; filtering out lyrics whose total word count is below a specified threshold; removing non-lyric portions of the lyric text; merging lyric lines with inconsistent sentence breaks; and removing lyrics containing non-specified languages.
According to a second aspect of the embodiments of the present disclosure, there is provided a refrain detection apparatus, comprising:
an audio segment acquisition unit configured to acquire a plurality of audio segments from an audio file to be detected;
a similarity determining unit configured to determine, for each audio segment, the similarity between the audio segment and each subsequent audio segment;
a repetition count determining unit configured to count, for each audio segment, the number of similarities exceeding a preset threshold and determine the repetition count of the audio segment;
and a refrain determining unit configured to take the audio segment with the highest repetition count as the refrain.
Optionally, the similarity is the ratio of the length of the identical content in the two audio segments to half the sum of the lengths of the two segments.
Optionally, the repetition count determining unit comprises:
a setting subunit configured to set the similarity to 1 if it is greater than the preset threshold, and to 0 otherwise;
and a repetition count calculating subunit configured to count, for each audio segment, the number of similarities equal to 1 and determine the repetition count of the audio segment.
Optionally, the repetition count calculating subunit comprises:
a matrix constructing module configured to construct a similarity matrix based on the similarity between each audio segment and the other audio segments, the coordinates of each point in the similarity matrix being the positions, in the audio file to be detected, of two audio segments whose similarity is 1;
a filtering module configured to determine runs of consecutive points in the similarity matrix and filter out the runs shorter than a specified number;
and a repetition count determining module configured to determine the repetition count of each audio segment based on the filtered similarity matrix.
Optionally, the repetition count determining module is configured to:
sum the number of points in each column of the filtered similarity matrix to obtain the repetition count of each audio segment.
Optionally, the total number of audio segments contained in the audio file to be detected is not less than twice the specified number.
Optionally, the audio file to be detected is a lyric file of the audio to be detected, and each audio segment is one line of lyrics.
Optionally, the audio segment acquisition unit is configured to:
preprocess the lyric file of the audio to be detected to obtain multiple lines of lyrics.
Optionally, the preprocessing comprises any one or more of:
normalizing the lyric text format; filtering out lyrics whose total word count is below a specified threshold; deleting non-lyric portions of the lyric text; and merging lyric lines with inconsistent sentence breaks.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any of the first aspects.
According to a fourth aspect of embodiments of the present disclosure, there is also provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of any one of the methods of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed, implements the steps of any one of the methods of the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the method and the device for determining the refrain of the karaoke are characterized in that a plurality of audio clips are obtained from an audio file to be detected, then the similarity of each audio clip with each audio clip behind the audio clip is determined, the repetition times of the audio clips are determined according to the similarity, the audio clip with the most repetition times is used as a refrain part, the implementation process is simple and efficient, the obtained refrain is high in accuracy, a user does not need to drag a progress bar to find the climax of the song, the determination of the refrain can help the user to find favorite videos and music more effectively, the tedious operation of the user is reduced, and the use experience of the user is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
FIG. 1 is a flowchart illustrating a refrain detection method according to an exemplary embodiment of the present disclosure;
FIG. 2A is a first similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 2B is a second similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a second refrain detection method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a third similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 5A is a fourth similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 5B is a fifth similarity matrix diagram according to an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram of a refrain detection apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 is a block diagram of a device for performing an embodiment of the refrain detection method according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
To address the inefficiency and high cost of manually searching for the climax of a song in the related art, the embodiments of the present disclosure provide a refrain detection method. The method can be executed by an electronic device, such as a computer, a smartphone, a tablet, a personal digital assistant, a server, or another computing device, and enables the climax of a song to be found simply and efficiently.
Referring to fig. 1, a flowchart of a refrain detection method according to an exemplary embodiment of the present disclosure is shown. The method comprises:
In step S101, a plurality of audio segments are acquired from an audio file to be detected.
In step S102, for each audio segment, the similarity between the audio segment and each subsequent segment is determined.
In step S103, for each audio segment, the number of similarities exceeding a preset threshold is counted and the repetition count of the segment is determined.
In step S104, the audio segment with the highest repetition count is taken as the refrain.
In an embodiment, the electronic device may preprocess the audio file to be detected to obtain a plurality of audio segments. The preprocessing may normalize the format of the audio file, preprocess a lyric text contained in the file (for example, filtering out lyrics whose total word count is below a specified threshold), or preprocess a melody text contained in the file (for example, deleting non-melody parts of the melody text).
The audio file to be detected may be a lyric file or a melody file of the audio to be detected. Note that the lyric file is a text file that has been divided into individual lyric lines in advance, and the melody file is a text file that has been divided into melody sections in advance. Accordingly, if the audio file to be detected is a lyric file, each audio segment is one line of lyrics; if it is a melody file (i.e., a musical score), each audio segment is a section of melody.
In an embodiment, after the electronic device obtains a plurality of audio segments, it determines, for each audio segment, the similarity between that segment and each subsequent segment, where the similarity is the ratio of the length of the identical content in the two segments to half the sum of their lengths. For example, if the electronic device obtains 4 audio segments from the audio file to be detected, the 1st segment is compared with the 2nd, 3rd, and 4th segments, giving the 1st segment 3 similarities; the 2nd segment is compared with the 3rd and 4th segments, giving it 2 similarities; and so on, the 3rd segment has 1 similarity and the 4th segment has 0 similarities.
In one implementation, assuming the audio segment is the i-th, its similarity si(i, j) with the j-th (i < j) audio segment is:
si(i, j) = 2 × sameNum / (numi + numj)
where numi is the character length of the i-th audio segment, numj is the character length of the j-th audio segment, and sameNum is the character length of the content that the i-th and j-th audio segments have in common.
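As an illustration, the similarity defined above can be sketched in Python. The patent only calls sameNum "the length of the same content", so taking it to be the length of the longest common substring of the two segments is an assumption of this sketch:

```python
from difflib import SequenceMatcher

def similarity(seg_i: str, seg_j: str) -> float:
    """Ratio of the shared content length to half the sum of the two
    segment lengths, i.e. 2 * sameNum / (numi + numj).

    Assumption: sameNum is taken as the length of the longest common
    substring of the two segments."""
    if not seg_i or not seg_j:
        return 0.0
    matcher = SequenceMatcher(None, seg_i, seg_j)
    match = matcher.find_longest_match(0, len(seg_i), 0, len(seg_j))
    same_num = match.size
    return 2 * same_num / (len(seg_i) + len(seg_j))
```

With this definition, two identical lines have similarity 1 and wholly different lines have similarity 0, so a threshold such as 0.5 can binarize the value as described in this embodiment.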
In this embodiment, after determining the similarity between each audio segment and each subsequent segment, the electronic device counts, for each segment, the number of similarities exceeding a preset threshold and determines the segment's repetition count, so that the segment with the highest repetition count can be taken as the refrain. The method is simple and efficient to implement, and the refrain can be determined quickly and accurately. Users no longer need to drag a progress bar to search for the climax of a song; determining the refrain helps users find favorite videos and music more effectively, reduces tedious operations, and improves the user experience. It should be understood that the embodiments of the present disclosure do not limit the value of the threshold, which may be set according to the actual situation; for example, the threshold may be 0.5.
In some scenarios, when the user previews a song, only the refrain may be played, so that the user can quickly decide whether it is the desired song; alternatively, the refrain may be marked on the song's progress bar, so that the user can jump directly to it by clicking or dragging the progress bar.
In a possible implementation, when determining the similarity between each audio segment and each subsequent segment, the electronic device sets the similarity to 1 if it is greater than the preset threshold and to 0 otherwise. After all similarities have been determined, the electronic device determines each segment's repetition count from the number of similarities equal to 1 for that segment.
It should be noted that if the repetition counts of all audio segments are 0, the audio file to be detected contains no repeated content, the refrain cannot be detected through repetition, and the electronic device ends the refrain detection process.
In an embodiment, non-refrain parts of a song, such as the verse, transition lines, and ending lines, also repeat frequently. After analyzing a large number of songs in the course of implementing the present disclosure, the inventors found that the refrain and the ending lines are the most repeated parts of a song, but the ending is generally short (for example, fewer than 4 lines) while the climax is generally longer (for example, at least 4 lines). Based on this, to further improve the accuracy of refrain detection, the electronic device may construct a similarity matrix from the similarity between each audio segment and the other segments, where the coordinates of each point are the positions, in the audio file to be detected, of two segments whose similarity is 1. It then determines runs of consecutive points in the matrix, filters out the runs shorter than a specified number (for example, runs shorter than 4), and finally determines each segment's repetition count from the filtered matrix. It should be understood that the embodiments of the present disclosure do not limit the specific value of the specified number, which may be set according to the actual situation.
As one possible implementation, the electronic device may sum the number of points in each column of the filtered similarity matrix to obtain the repetition count of each audio segment.
As an example, suppose audio segments A to G are extracted from an audio file to be detected (their order in the file matching the alphabetical order) and are, in turn, {"you", "good", "you", "I", "you", "good", "you"}. Then segment A has similarity 1 with segments C, E, and G, and similarity 0 with the others; segment B has similarity 1 with segment F and 0 with the others; segment C has similarity 1 with segments E and G and 0 with the others; segment D has similarity 0 with all other segments; segment E has similarity 1 with segment G; and segment F has similarity 0 with the remaining segments. The electronic device may construct a 7 × 7 similarity matrix diagram as shown in fig. 2A from the similarity between each segment and the others; the diagram marks, for each segment on the horizontal axis and each segment on the vertical axis, the pairs whose similarity is 1, and the coordinates of each point are the positions of the two segments in the audio file to be detected.
As can be seen from fig. 2A, the points at coordinates (A, E), (B, F), and (C, G) are consecutive, while the points at other coordinates are not. With the specified number set to 2, the electronic device filters out the parts of the similarity matrix where the number of consecutive points is smaller than the specified number, yielding the filtered matrix shown in fig. 2B. The electronic device then determines each segment's repetition count from the filtered matrix; in one implementation, it sums the number of points in each column of the filtered matrix to obtain the repetition count of each segment, and finally takes the segment with the highest repetition count as the refrain. For example, the refrain obtained from fig. 2B consists of segments A, B, and C.
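The worked example above can be reproduced with a short sketch. This is an illustration under stated assumptions, not the patent's implementation: exact string equality stands in for "similarity above the threshold", and the function and variable names are invented:

```python
def detect_refrain(segments, min_run=2):
    """Build the binarized similarity matrix, keep only diagonal runs of at
    least min_run consecutive points, and return the indices of the segments
    with the highest repetition count (the detected refrain)."""
    n = len(segments)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # point at (i, j): segment i repeats as the later segment j
            m[i][j] = int(segments[i] == segments[j])

    # keep only runs of consecutive diagonal points of length >= min_run
    keep = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] and not (i > 0 and m[i - 1][j - 1]):  # start of a run
                run = 0
                while i + run < n and j + run < n and m[i + run][j + run]:
                    run += 1
                if run >= min_run:
                    for k in range(run):
                        keep[i + k][j + k] = 1

    # repetition count per segment: number of kept points in its row
    # (equivalent to the per-column sums in the transposed layout of fig. 2B)
    counts = [sum(row) for row in keep]
    best = max(counts) if counts else 0
    return [] if best == 0 else [i for i, c in enumerate(counts) if c == best]

# Segments A..G from the example: {"you", "good", "you", "I", "you", "good", "you"}
segs = ["you", "good", "you", "I", "you", "good", "you"]
print(detect_refrain(segs))  # → [0, 1, 2], i.e. segments A, B, and C
```

The isolated points such as (A, C) form runs of length 1 and are filtered out, leaving only the diagonal run through (A, E), (B, F), (C, G), which matches fig. 2B.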
It should be noted that, to ensure the accuracy of refrain detection, the embodiments of the present disclosure detect refrains that repeat at least twice. Therefore, the total number of audio segments contained in the audio file to be detected should be not less than twice the specified number; for audio with fewer segments than twice the specified number, the refrain cannot be detected through repetition.
Please refer to fig. 3, which is a flowchart of a second refrain detection method according to an exemplary embodiment of the present disclosure. In this embodiment, the audio file to be detected is a lyric file of the audio to be detected, and each audio segment is one line of lyrics. The method comprises:
In step S301, multiple lines of lyrics are obtained from the lyric file of the audio to be detected.
In step S302, for each line of lyrics, the similarity between that line and each subsequent line is determined.
In step S303, for each line of lyrics, the number of similarities exceeding a preset threshold is counted and the repetition count of the line is determined.
In step S304, the line of lyrics with the highest repetition count is taken as the refrain.
In an embodiment, the electronic device preprocesses the lyric file of the audio to be detected to obtain multiple lines of lyrics, where the preprocessing includes any one or more of the following operations: normalizing the lyric text format; filtering out lyrics whose total word count is below a specified threshold; deleting non-lyric parts of the lyric text (such as credit metadata like "lyricist: Xiaoming", "composer: Xiaobai", "producer: Mingming"); and merging lyric lines with inconsistent sentence breaks. It should be understood that the present disclosure does not limit the specific value of the specified threshold, which may be set according to the actual situation; for example, the specified threshold is 2.
Normalization is needed because lyric text formats are inconsistent: in some lyric texts one line of lyrics corresponds to each timestamp, while in others each word corresponds to a timestamp. All lyric texts may therefore be normalized so that each timestamp corresponds to one line of lyrics.
In addition, a song may contain a lyric line A several times, while a later occurrence of A is split into two lines A1 and A2. To avoid bias in the similarity computation and improve the accuracy of refrain detection, A1 and A2 are merged during preprocessing; that is, lyric lines with inconsistent sentence breaks are merged.
It should be noted that the embodiments of the present disclosure place no restriction on the lyric language; refrain detection can be performed on repetitive songs in any language.
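A minimal sketch of this preprocessing follows. It assumes LRC-style "[mm:ss.xx]" timestamps and Chinese credit prefixes such as 词/曲 (lyricist/composer); the patent does not specify the file format, and the merging of inconsistently broken lines is omitted here:

```python
import re

# Assumed formats: LRC-style timestamps and common Chinese credit prefixes
TIMESTAMP = re.compile(r"\[\d{1,2}:\d{2}(?:\.\d{2,3})?\]")
CREDITS = re.compile(r"^(词|曲|作词|作曲|编曲|制作人)\s*[:：]")

def preprocess_lyrics(raw_lines, min_chars=2):
    """Normalize lyric lines: strip timestamps, drop credit metadata,
    and filter out lines shorter than min_chars characters."""
    lines = []
    for raw in raw_lines:
        line = TIMESTAMP.sub("", raw).strip()
        if not line or CREDITS.match(line):
            continue
        if len(line) < min_chars:
            continue
        lines.append(line)
    return lines
```

For a file with one timestamp per line, this yields one lyric line per list entry, matching the per-line similarity computation of this embodiment.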
In an embodiment, after obtaining multiple lines of lyrics, the electronic device determines, for each line, the similarity between that line and each subsequent line. For example, assuming the line is the i-th, its similarity si(i, j) with the j-th (i < j) line is:
si(i, j) = 2 × sameNum / (numi + numj)
where numi is the character length of the i-th lyric line, numj is the character length of the j-th lyric line, and sameNum is the character length of the content the two lines have in common. For example, if the lyric file of the audio to be detected is Chinese, sameNum is the number of identical Chinese characters between the two lines; if the lyric file is English, sameNum is the number of identical words between the two lines.
In an embodiment, the electronic device may count, for each line of lyrics, the number of similarities exceeding a preset threshold and determine the repetition count of the line; it should be understood that the threshold may be set according to the actual situation, for example, to 0.5.
In one implementation, for convenience of subsequent computation, the similarity is set to 1 if it is greater than the preset threshold and to 0 otherwise. If a song has N lines, an N × N similarity matrix diagram can be obtained after the similarities are determined. For example, if a song has 10 lines, with the 2nd line identical to the 6th, the 3rd to the 7th, the 4th to the 8th, and the 5th to the 9th, the resulting similarity matrix is as shown in fig. 4 (the 0s and 1s in fig. 4 are for illustration; the horizontal axis represents i and the vertical axis represents j). The repetition count of each line, and which lines repeat each other, can be read directly from the diagram according to the number of similarities equal to 1 for each line; for example, the coordinates (3, 6) indicate that the 3rd and 6th lines are repetitions of each other.
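The fig. 4 example can be reproduced with a short sketch. The placeholder line contents are invented for illustration, and exact equality stands in for "similarity above the threshold":

```python
def binarized_matrix(lines):
    """0/1 similarity matrix over pairs i < j; equality stands in for
    'similarity above the threshold' in this sketch."""
    n = len(lines)
    return [[1 if i < j and lines[i] == lines[j] else 0 for j in range(n)]
            for i in range(n)]

# 10 lines where (1-based) line 2 == line 6, 3 == 7, 4 == 8, and 5 == 9
lines = ["l1", "x", "y", "z", "w", "x", "y", "z", "w", "l10"]
m = binarized_matrix(lines)
print(m[1][5], m[2][6], m[3][7], m[4][8])  # → 1 1 1 1
```

The four 1s lie on one diagonal, which is exactly the pattern a repeated refrain leaves in the similarity matrix diagram.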
In an example, please refer to fig. 5A, taking the song "Chengdu" as an example, where white points indicate a similarity of 1. After preprocessing the lyric text and computing the similarities, a similarity matrix diagram as shown in fig. 5A can be constructed from the similarity between each line and the other lines, according to the order of the lines in the audio file to be detected. The electronic device can determine the repetition count of each line from the number of similarities equal to 1 (the number of white points) for that line; for example, abscissa 20 (representing the 21st line) is repeated 2 times, by the 36th and 46th lines respectively.
In an embodiment, considering that non-refrain parts in a song, such as a main song, a transition sentence, a final sentence, etc. of the song also frequently repeat, the inventor finds out after analyzing a large number of songs in the process of implementing the present disclosure that: the refrain and the ending sentence of the song are the most repeated parts, but the ending sentence is generally shorter, such as less than 4 sentences, and the climax part is generally longer, such as at least 4 sentences. Based on the above, in order to further improve the accuracy of detecting the refrain, the electronic device may construct a similarity matrix based on the similarity between each lyric and other lyrics, where a point in the similarity matrix is located in an arrangement order of two lyrics with a similarity of 1 in the audio file to be detected, then determine continuous points in the similarity matrix, filter out a portion where the continuous number of the continuous points in the similarity matrix is smaller than a specified number, and finally determine the number of repetitions of the lyrics based on the similarity matrix after the filtering processing for each lyric; such as filtering out a consecutive number of portions smaller than 4; it is understood that, in the embodiments of the present disclosure, specific values of the specified quantities are not limited at all, and may be specifically set according to actual situations.
In one example, referring to the similarity matrix diagram shown in fig. 5A, if the specified number is 4, the lyrics with a repetition count of 0 are filtered out, and the parts in which fewer than 4 consecutive sentences repeat are filtered out; that is, the line segments with fewer than 4 consecutive points in the similarity matrix diagram shown in fig. 5A are removed to obtain fig. 5B. Each column in fig. 5B is then summed, i.e., the white points are added up vertically, to obtain the values shown in table 1:
table 1 (the sums of the omitted parts are the same before and after; in the original publication the table is rendered as an image listing the per-column sums of white points in fig. 5B)
It can be seen that the part with the maximum value is the most repeated part and is determined to be the detected refrain: abscissas 16 to 22, i.e., the 17th to 23rd sentences.
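The column-summing step above can be sketched as follows (pure-Python, with illustrative names; indices are 0-based, which is why abscissas 16-22 correspond to the 17th-23rd sentences):

```python
def repetition_counts(filtered):
    """Sum the points (1s) in each column of the filtered similarity matrix."""
    n = len(filtered)
    return [sum(filtered[i][j] for i in range(n)) for j in range(n)]

def refrain_columns(filtered):
    """The columns attaining the maximum count are taken as the refrain."""
    counts = repetition_counts(filtered)
    best = max(counts)
    return [j for j, c in enumerate(counts) if c == best]
```

Note that when no repetition survives the filtering, every count is 0 and every column attains the maximum; a real implementation would treat that case as "no refrain detected".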
It should be noted that, in order to ensure the accuracy of refrain detection, the embodiments of the present disclosure detect audio whose refrain part repeats at least 2 times. Therefore, the total number of lyric sentences contained in the lyric file in the embodiments of the present disclosure is not less than 2 times the specified number; for audio whose total number of lyric sentences is less than 2 times the specified number, the refrain cannot be detected through repetition.
In this embodiment, multiple sentences of lyrics are obtained from the lyric file of the audio to be detected; then, for each sentence of lyrics, the similarity between that sentence and each subsequent sentence is determined, the repetition times of the sentence are determined according to the similarities, and the lyrics with the most repetition times are taken as part of the refrain.
Corresponding to the embodiments of the refrain detection method, the present disclosure also provides embodiments of a refrain detecting apparatus, an electronic device, and a computer-readable storage medium.
Referring to fig. 6, a block diagram of an embodiment of a refrain detecting apparatus according to an embodiment of the present disclosure is shown, the apparatus including:
an audio clip acquiring unit 401, configured to acquire a plurality of audio clips from an audio file to be detected; the beginning of each audio segment corresponds to the beginning of the lyrics.
A similarity determining unit 402, configured to determine, for each audio segment, the similarity between the audio segment and each subsequent audio segment.
A repetition number determining unit 403, configured to count, for each audio segment, the number of similarities exceeding a preset threshold, and determine the repetition times of the audio segment.
A refrain determining unit 404, configured to determine the audio segment with the most repetition times as the refrain.
Optionally, the similarity is the ratio of the length of the same content in two audio segments to half of the sum of the lengths of the two audio segments.
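This ratio — the matched length divided by half the sum of the two lengths, i.e. 2*M / (len(a) + len(b)) — can be sketched with Python's standard difflib, whose `ratio()` computes exactly 2*M/T. Treating each segment as a plain string, and the sample strings below, are illustrative assumptions:

```python
from difflib import SequenceMatcher

def segment_similarity(a: str, b: str) -> float:
    """Length of the shared content over half the sum of the two
    segments' lengths: 2*M / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b).ratio()

print(segment_similarity("hello world", "hello world"))  # identical segments -> 1.0
```

Two identical segments yield a similarity of 1, which is the value binarized against the preset threshold in the units below.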
Optionally, the repetition number determining unit 403 includes:
And the setting subunit is configured to set the similarity to 1 if the similarity is greater than the preset threshold, and to 0 otherwise.
And the repetition number calculation subunit is configured to count, for each audio segment, the number of similarities equal to 1, and determine the repetition times of the audio segment.
Optionally, the repetition number calculation subunit includes:
and the matrix construction module is used for constructing a similarity matrix based on the similarity of each audio clip and other audio clips, and the coordinates of points in the similarity matrix are the arrangement sequence of the two audio clips with the similarity of 1 in the audio file to be detected.
And the filtering module is used for determining continuous points in the similarity matrix and filtering the part of which the continuous number of the continuous points in the similarity matrix is less than the specified number.
And the repetition frequency determining module is used for determining the repetition frequency of each audio segment based on the similarity matrix after the filtering processing.
Optionally, the repetition number determining module includes:
and summing the point numbers of each column in the similarity matrix after filtering processing to obtain the repetition times of each audio frequency segment.
Optionally, the total number of audio clips contained in the audio file to be detected is not less than 2 times the specified number.
Optionally, the audio file to be detected is a lyric file of the audio to be detected, and each audio clip is a sentence of lyrics.
Optionally, the audio segment obtaining unit 401 includes:
and preprocessing the lyric file of the audio to be detected to obtain multiple sentences of lyrics.
Optionally, the pre-processing comprises any one or more of:
the lyric text format is normalized, lyrics with a total number of words less than a specified threshold are filtered, non-lyric portions of the lyric text are deleted, and lyrics with inconsistent merging and breaking sentences are removed.
For the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement this without inventive effort.
Accordingly, fig. 7 is a block diagram illustrating an apparatus for performing the above-described method embodiments according to an exemplary embodiment of the present disclosure.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an apparatus to perform the method embodiments of any of fig. 1 or 3 described above.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The embodiment of the present disclosure further provides an electronic device 50, which includes a processor 51; a memory 52 (e.g., a non-volatile memory) for storing executable instructions, wherein the processor is configured to execute the instructions to implement the method embodiments of any of fig. 1 or 3 described above.
The processor 51 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 52 stores executable instructions of the refrain detection method. The memory 52 may include at least one type of storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The device may also cooperate with a network storage device that performs the storage function of the memory through a network connection. The memory 52 may be an internal storage unit of the device 50, such as a hard disk or memory of the device 50, or an external storage device of the device 50, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the device 50. Further, the memory 52 may include both internal and external storage units of the device 50. The memory 52 is used for storing a computer program 53 as well as other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The various embodiments described herein may be implemented using a computer-readable medium, such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For a software implementation, an embodiment such as a process or a function may be implemented with a separate software module that performs at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, which may be stored in the memory and executed by the controller.
The electronic device 50 includes, but is not limited to, the following forms: (1) a mobile communication device: such a device has mobile communication capability and mainly aims to provide voice and data communication; such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, low-end phones, etc.; (2) an ultra-mobile personal computer device: such a device belongs to the category of personal computers, has computing and processing functions, and generally also has mobile internet access; such terminals include PDA, MID, and UMPC devices, e.g., the iPad; (3) a portable entertainment device: such a device can display and play multimedia content, and includes audio and video players (e.g., the iPod), handheld game consoles, electronic books, smart toys, and portable vehicle navigation devices; (4) a server: a device providing computing services; a server includes a processor, a hard disk, a memory, a system bus, and the like, and is similar to a general-purpose computer architecture, but because highly reliable services need to be provided, it has higher requirements on processing capacity, stability, reliability, security, expandability, manageability, and the like; (5) other electronic devices with a data interaction function. The device may include, but is not limited to, the processor 51 and the memory 52; as shown in fig. 7, the electronic device generally further includes a memory 53 and a network interface 54. Of course, those skilled in the art will appreciate that fig. 7 is merely an example of the electronic device 50 and does not constitute a limitation on it; the device may include more or fewer components than those shown, combine some components, or use different components; for example, the device may also include an input/output device, a network access device, a bus, a camera device, etc.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
The disclosed embodiments also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of the electronic device, enable the electronic device to perform the method embodiment of any one of fig. 1 or fig. 3.
The disclosed embodiments also provide a computer program product comprising executable program code, wherein the program code, when executed by the above-described apparatus, implements the method embodiments according to any of fig. 1 or fig. 3.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The present disclosure is to be considered as limited only by the preferred embodiments and not limited to the specific embodiments described herein, and all changes, equivalents, and modifications that come within the spirit and scope of the disclosure are desired to be protected.

Claims (10)

1. A method for detecting a refrain, comprising:
acquiring a plurality of audio clips from an audio file to be detected;
for each audio segment, determining the similarity between the audio segment and each subsequent audio segment;
for each audio clip, counting the number of similarities exceeding a preset threshold value, and determining the repetition times of the audio clip;
and taking the audio clip with the most repetition times as the refrain.
2. The method of claim 1, wherein the similarity is a ratio of the length of the same content in two audio segments to half of the sum of the lengths of the two audio segments.
3. The method of claim 1, wherein the counting, for each audio segment, the number of similarities exceeding a preset threshold and determining the repetition times of the audio segment comprises:
setting the similarity to 1 if the similarity is greater than the preset threshold, and to 0 otherwise;
and counting, for each audio segment, the number of similarities equal to 1, and determining the repetition times of the audio segment.
4. The method of claim 3, wherein the counting the number of similarities equal to 1 for each audio segment and determining the repetition times of the audio segment comprises:
constructing a similarity matrix based on the similarity between each audio clip and the other audio clips, wherein the coordinates of a point in the similarity matrix are the arrangement orders, in the audio file to be detected, of two audio clips whose similarity is 1;
determining consecutive points in the similarity matrix, and filtering out the portions of the similarity matrix in which the number of consecutive points is less than a specified number;
and determining the repetition times of each audio segment based on the filtered similarity matrix.
5. The method of claim 4, wherein the determining the repetition times of each audio segment based on the filtered similarity matrix comprises:
summing the number of points in each column of the filtered similarity matrix to obtain the repetition times of each audio segment.
6. The method according to claim 4, wherein the total number of audio clips contained in the audio file to be detected is not less than 2 times the specified number.
7. The method according to any one of claims 1 to 6, wherein the audio file to be detected is a lyric file of the audio to be detected, and each audio clip is a sentence of lyrics.
8. A refrain detecting apparatus, comprising:
an audio clip acquisition unit, configured to acquire a plurality of audio clips from an audio file to be detected;
a similarity determining unit, configured to determine, for each audio clip, the similarity between the audio clip and each subsequent audio clip;
a repetition number determining unit, configured to count, for each audio clip, the number of similarities exceeding a preset threshold, and determine the repetition times of the audio clip;
and a refrain determining unit, configured to take the audio clip with the most repetition times as the refrain.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
CN201911031441.7A 2019-10-28 2019-10-28 Method and device for detecting refrain, electronic equipment and storage medium Pending CN110808065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911031441.7A CN110808065A (en) 2019-10-28 2019-10-28 Method and device for detecting refrain, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN110808065A true CN110808065A (en) 2020-02-18

Family

ID=69489392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911031441.7A Pending CN110808065A (en) 2019-10-28 2019-10-28 Method and device for detecting refrain, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110808065A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383669A (en) * 2020-03-19 2020-07-07 杭州网易云音乐科技有限公司 Multimedia file uploading method, device, equipment and computer readable storage medium
CN111583963A (en) * 2020-05-18 2020-08-25 合肥讯飞数码科技有限公司 Method, device and equipment for detecting repeated audio and storage medium
CN112989109A (en) * 2021-04-14 2021-06-18 腾讯音乐娱乐科技(深圳)有限公司 Music structure analysis method, electronic equipment and storage medium
CN113035160A (en) * 2021-02-26 2021-06-25 成都潜在人工智能科技有限公司 Music automatic editing implementation method and device based on similarity matrix and storage medium
CN113377992A (en) * 2021-06-21 2021-09-10 腾讯音乐娱乐科技(深圳)有限公司 Song segmentation method, device and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1577877A1 (en) * 2002-10-24 2005-09-21 National Institute of Advanced Industrial Science and Technology Musical composition reproduction method and device, and method for detecting a representative motif section in musical composition data
US20060080095A1 (en) * 2004-09-28 2006-04-13 Pinxteren Markus V Apparatus and method for designating various segment classes
US7659471B2 (en) * 2007-03-28 2010-02-09 Nokia Corporation System and method for music data repetition functionality
US20110112672A1 (en) * 2009-11-11 2011-05-12 Fried Green Apps Systems and Methods of Constructing a Library of Audio Segments of a Song and an Interface for Generating a User-Defined Rendition of the Song
WO2012091935A1 (en) * 2010-12-30 2012-07-05 Dolby Laboratories Licensing Corporation Repetition detection in media data
CN102903357A (en) * 2011-07-29 2013-01-30 华为技术有限公司 Method, device and system for extracting chorus of song
CN104282322A (en) * 2014-10-29 2015-01-14 深圳市中兴移动通信有限公司 Mobile terminal and method and device for identifying chorus part of song thereof
CN105161116A (en) * 2015-09-25 2015-12-16 广州酷狗计算机科技有限公司 Method and device for determining climax fragment of multimedia file
CN106409311A (en) * 2015-07-31 2017-02-15 阿里巴巴集团控股有限公司 Refrain extracting apparatus and method
CN106782601A (en) * 2016-12-01 2017-05-31 腾讯音乐娱乐(深圳)有限公司 A kind of multimedia data processing method and its device
CN108648767A (en) * 2018-04-08 2018-10-12 中国传媒大学 A kind of popular song emotion is comprehensive and sorting technique


Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
NICOLA ORIO: ""Experiments on Segmentation Techniques for Music Documents Indexing"", 《PROC. IN 6TH ISMIR》 *
孙佳音: ""音乐要素自动分析关键技术研究"", 《中国博士学位论文全文数据库(信息科技辑)》 *
孙翌: "《学科化服务技术与应用》", 31 January 2013 *
李相莲: ""基于音色单元分布的音乐结构分析"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *
村松純: ""歌謡曲における 「さび」 の楽譜情報に基づく特徴抽出一小室哲哉の場合一"", 《情処研報音楽情報科学》 *
石自强: ""音乐结构自动分析研究"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *
蒋盛益 等: ""基于歌词的歌曲高潮片段自动提取"", 《小型微型计算机系统》 *
邱莉榕 等: "《算法设计与优化》", 31 December 2016 *
陈廷梁: ""音乐结构分析及应用"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *
韩圣龙: "《数字音乐信息组织》", 30 June 2005 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383669A (en) * 2020-03-19 2020-07-07 杭州网易云音乐科技有限公司 Multimedia file uploading method, device, equipment and computer readable storage medium
CN111583963A (en) * 2020-05-18 2020-08-25 合肥讯飞数码科技有限公司 Method, device and equipment for detecting repeated audio and storage medium
CN111583963B (en) * 2020-05-18 2023-03-21 合肥讯飞数码科技有限公司 Repeated audio detection method, device, equipment and storage medium
CN113035160A (en) * 2021-02-26 2021-06-25 成都潜在人工智能科技有限公司 Music automatic editing implementation method and device based on similarity matrix and storage medium
CN113035160B (en) * 2021-02-26 2022-08-02 成都潜在人工智能科技有限公司 Music automatic editing implementation method and device based on similarity matrix and storage medium
CN112989109A (en) * 2021-04-14 2021-06-18 腾讯音乐娱乐科技(深圳)有限公司 Music structure analysis method, electronic equipment and storage medium
CN113377992A (en) * 2021-06-21 2021-09-10 腾讯音乐娱乐科技(深圳)有限公司 Song segmentation method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200218