CN104882152A - Method and apparatus for generating lyric file - Google Patents


Info

Publication number
CN104882152A
CN104882152A
Authority
CN
China
Prior art keywords
audio file
file
target audio
candidate
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510257914.0A
Other languages
Chinese (zh)
Other versions
CN104882152B (en)
Inventor
武大伟
赵普
任思豪
龚维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd filed Critical Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201510257914.0A priority Critical patent/CN104882152B/en
Publication of CN104882152A publication Critical patent/CN104882152A/en
Application granted granted Critical
Publication of CN104882152B publication Critical patent/CN104882152B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a method and an apparatus for generating a lyric file, belonging to the technical field of audio processing. The method includes: obtaining a reference audio file corresponding to a to-be-processed target audio file, where the reference audio file and the target audio file are different versions of the same song; calculating the time deviation between the reference audio file and the target audio file; and correcting the timestamps of the lyric file of the reference audio file according to the time deviation, the corrected lyric file then serving as the lyric file of the target audio file. The method and apparatus solve the problems of low efficiency and high cost of generating lyric files manually, improving the efficiency of lyric-file generation and lowering its cost.

Description

Method and apparatus for generating a lyric file
Technical Field
The present invention relates to the field of audio signal processing, and in particular to a method and an apparatus for generating a lyric file.
Background
As users' expectations for their audiovisual experience keep rising, they expect music applications to be able to display lyrics while they watch, listen to, or sing along with musical works.
To meet this demand, application developers need to generate a matching lyric file for every song file. In the related art, lyric files are generated manually.
However, manual generation of lyric files is both inefficient and costly, and as music libraries keep growing, these drawbacks become increasingly severe.
Summary of the Invention
To solve the problems of low efficiency and high cost associated with generating lyric files manually in the related art, embodiments of the present invention provide a method and an apparatus for generating a lyric file. The technical solutions are as follows:
In a first aspect, a method for generating a lyric file is provided, the method comprising:
obtaining a reference audio file corresponding to a to-be-processed target audio file, where the reference audio file and the target audio file are different versions of the same song;
calculating the time deviation between the reference audio file and the target audio file;
correcting the timestamps of the lyric file of the reference audio file according to the time deviation, and using the corrected lyric file as the lyric file of the target audio file.
Optionally, obtaining the reference audio file corresponding to the to-be-processed target audio file comprises:
obtaining at least one candidate reference audio file corresponding to the target audio file, where each candidate reference audio file and the target audio file are different versions of the same song;
sorting the at least one candidate reference audio file according to a predetermined sorting rule;
selecting the candidate reference audio files one by one according to the sorting result;
detecting whether a strong correlation exists between the selected candidate reference audio file and the target audio file;
upon finding the first candidate reference audio file that has a strong correlation with the target audio file, stopping the selection of further candidates, and using that first strongly correlated candidate reference audio file as the reference audio file.
Optionally, detecting whether a strong correlation exists between the selected candidate reference audio file and the target audio file comprises:
calculating a cross-correlation sequence between the selected candidate reference audio file and the target audio file, the cross-correlation sequence containing at least one cross-correlation coefficient;
selecting the maximum cross-correlation coefficient p0 from the cross-correlation sequence;
obtaining the position deviation m0 corresponding to the maximum p0;
selecting the maximum p1 among the cross-correlation coefficients within a first position-deviation interval [m0 + m_min, m0 + m_max] and a second position-deviation interval [m0 − m_max, m0 − m_min], where 1 ≤ m_min < m_max;
detecting whether the ratio p0/p1 of the maximum p0 to the maximum p1 is greater than a predetermined threshold;
if the ratio p0/p1 is greater than the predetermined threshold, determining that a strong correlation exists between the selected candidate reference audio file and the target audio file.
Optionally, calculating the cross-correlation sequence between the selected candidate reference audio file and the target audio file comprises:
sampling the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sample sequence, and sampling the target audio file at the same preset sampling rate to obtain a target audio sample sequence;
extracting audio data of a preset length from the same position in the candidate audio sample sequence and the target audio sample sequence, to obtain a candidate audio data sequence and a target audio data sequence respectively;
calculating the cross-correlation sequence between the candidate audio data sequence and the target audio data sequence.
Optionally, calculating the cross-correlation sequence between the candidate audio data sequence and the target audio data sequence comprises:
calculating the cross-correlation sequence R_xy(m) between the candidate audio data sequence x(n) and the target audio data sequence y(n) according to the following formula:
R_xy(m) = Σ_{n=0}^{N−1} x(n+m)·y(n);
where m ∈ [−(N−1), N−1], 0 ≤ n ≤ N−1, 0 ≤ n+m ≤ N−1, and N is a positive integer.
Optionally, calculating the cross-correlation sequence between the candidate audio data sequence and the target audio data sequence comprises:
extracting one audio sample at every predetermined interval from the candidate audio data sequence x(n) to obtain a decimated candidate sequence x′(n), and extracting one audio sample at every predetermined interval from the target audio data sequence y(n) to obtain a decimated target sequence y′(n), where x′(n) = x(k×n), y′(n) = y(k×n), the predetermined interval is k audio samples, and k is a positive integer;
calculating the coarse cross-correlation sequence R_xy′(m) between the decimated candidate sequence x′(n) and the decimated target sequence y′(n) according to the following formula:
R_xy′(m) = Σ_{n=0}^{(N−1)/k} x′(n+m)·y′(n);
where m ∈ [−(N−1)/k, (N−1)/k], 0 ≤ n ≤ (N−1)/k, 0 ≤ n+m ≤ (N−1)/k, and N is a positive integer;
obtaining the position deviation m1 corresponding to the maximum of the coarse cross-correlation sequence R_xy′(m);
taking the position deviation between the candidate audio data sequence x(n) and the target audio data sequence y(n) to be k×m1, and intercepting, from the corresponding positions of x(n) and y(n), a truncated candidate sequence x″(n) and a truncated target sequence y″(n) of a target length;
calculating the fine cross-correlation sequence R_xy″(m) between the truncated candidate sequence x″(n) and the truncated target sequence y″(n) according to the following formula:
R_xy″(m) = Σ_{n=0}^{N0} x″(n+m)·y″(n);
where m ∈ [k×m1 − a, k×m1 + a], a ≥ k, and N0 is the target length, a preset value; the position deviation m2 corresponding to the maximum of the fine cross-correlation sequence R_xy″(m) is the exact position deviation.
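This coarse-to-fine search can be sketched in code as follows. The sketch is illustrative only and not part of the patent text: the decimation factor k and the refinement half-window a are arbitrary example values (with a ≥ k, as the scheme requires), and the brute-force correlation stands in for whatever optimized routine a real system would use.

```python
def coarse_to_fine_offset(x, y, k=4, a=None):
    """Estimate the sample offset between x and y: first correlate the
    k-fold decimated sequences (coarse pass), then refine by searching
    only m in [k*m1 - a, k*m1 + a] on the full-rate sequences."""
    if a is None:
        a = k  # the scheme requires a >= k

    def corr_at(xs, ys, m):
        # One term of R_xy(m) = sum_n xs(n+m) * ys(n), skipping out-of-range n+m.
        return sum(xs[n + m] * ys[n]
                   for n in range(len(ys)) if 0 <= n + m < len(xs))

    # Coarse pass on decimated sequences x'(n) = x(k*n), y'(n) = y(k*n).
    xd, yd = x[::k], y[::k]
    Nd = len(yd)
    m1 = max(range(-(Nd - 1), Nd), key=lambda m: corr_at(xd, yd, m))
    # Fine pass: only 2a + 1 candidate offsets around k*m1 are examined.
    return max(range(k * m1 - a, k * m1 + a + 1),
               key=lambda m: corr_at(x, y, m))

# y equals x shifted earlier by 6 samples, so the exact offset is 6.
x = [0] * 6 + [1, 3, 5, 3, 1] + [0] * 9
y = x[6:] + [0] * 6   # y(n) = x(n + 6)
print(coarse_to_fine_offset(x, y))  # 6
```

The point of the two passes is that the coarse pass scans the full offset range on 1/k of the samples, while the fine pass runs at full resolution over only a small window, so the exact deviation is recovered at a fraction of the cost of a full-rate exhaustive search.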
Optionally, obtaining the at least one candidate reference audio file corresponding to the target audio file comprises:
obtaining the classification of the target audio file, the classification being any one of a single class, a live class, an accompaniment class, and a noise-reduction class;
determining, according to the classification of the target audio file, the target classification in which to search for candidate reference audio files;
searching the target classification for audio files that satisfy a preset selection condition, and using the found audio files as candidate reference audio files, where the preset selection condition includes at least one of: the audio file has a manually bound lyric file, and the audio file is a high-sound-quality audio file.
Optionally, determining, according to the classification of the target audio file, the target classification in which to search for candidate reference audio files comprises:
when the target audio file belongs to the single class, determining that the single class is the target classification; or
when the target audio file belongs to the live class, determining that the live class is the target classification; or
when the target audio file belongs to the accompaniment class, determining that the accompaniment class, the single class, and the live class are the target classifications; or
when the target audio file belongs to the noise-reduction class, determining that the noise-reduction class, the single class, and the live class are the target classifications.
In a second aspect, an apparatus for generating a lyric file is provided, the apparatus comprising:
an acquisition module, configured to obtain a reference audio file corresponding to a to-be-processed target audio file, where the reference audio file and the target audio file are different versions of the same song;
a calculation module, configured to calculate the time deviation between the reference audio file and the target audio file;
a correction module, configured to correct the timestamps of the lyric file of the reference audio file according to the time deviation, and to use the corrected lyric file as the lyric file of the target audio file.
Optionally, the acquisition module comprises an acquisition submodule, a sorting submodule, a selection submodule, a detection submodule, and a determination submodule;
the acquisition submodule is configured to obtain at least one candidate reference audio file corresponding to the target audio file, where each candidate reference audio file and the target audio file are different versions of the same song;
the sorting submodule is configured to sort the at least one candidate reference audio file according to a predetermined sorting rule;
the selection submodule is configured to select the candidate reference audio files one by one according to the sorting result;
the detection submodule is configured to detect whether a strong correlation exists between the selected candidate reference audio file and the target audio file;
the determination submodule is configured to, upon the first candidate reference audio file with a strong correlation to the target audio file being found, stop the selection of further candidates and use that first strongly correlated candidate reference audio file as the reference audio file.
Optionally, the detection submodule comprises a calculation unit, a first selection unit, an acquisition unit, a second selection unit, a detection unit, and a determination unit;
the calculation unit is configured to calculate a cross-correlation sequence between the selected candidate reference audio file and the target audio file, the cross-correlation sequence containing at least one cross-correlation coefficient;
the first selection unit is configured to select the maximum cross-correlation coefficient p0 from the cross-correlation sequence;
the acquisition unit is configured to obtain the position deviation m0 corresponding to the maximum p0;
the second selection unit is configured to select the maximum p1 among the cross-correlation coefficients within a first position-deviation interval [m0 + m_min, m0 + m_max] and a second position-deviation interval [m0 − m_max, m0 − m_min], where 1 ≤ m_min < m_max;
the detection unit is configured to detect whether the ratio p0/p1 of the maximum p0 to the maximum p1 is greater than a predetermined threshold;
the determination unit is configured to determine, when the ratio p0/p1 is greater than the predetermined threshold, that a strong correlation exists between the selected candidate reference audio file and the target audio file.
Optionally, the calculation unit comprises a sampling subunit, an extraction subunit, and a calculation subunit;
the sampling subunit is configured to sample the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sample sequence, and to sample the target audio file at the same preset sampling rate to obtain a target audio sample sequence;
the extraction subunit is configured to extract audio data of a preset length from the same position in the candidate audio sample sequence and the target audio sample sequence, to obtain a candidate audio data sequence and a target audio data sequence respectively;
the calculation subunit is configured to calculate the cross-correlation sequence between the candidate audio data sequence and the target audio data sequence.
Optionally, the calculation subunit is specifically configured to:
calculate the cross-correlation sequence R_xy(m) between the candidate audio data sequence x(n) and the target audio data sequence y(n) according to the following formula:
R_xy(m) = Σ_{n=0}^{N−1} x(n+m)·y(n);
where m ∈ [−(N−1), N−1], 0 ≤ n ≤ N−1, 0 ≤ n+m ≤ N−1, and N is a positive integer.
Optionally, the calculation subunit is specifically configured to:
extract one audio sample at every predetermined interval from the candidate audio data sequence x(n) to obtain a decimated candidate sequence x′(n), and extract one audio sample at every predetermined interval from the target audio data sequence y(n) to obtain a decimated target sequence y′(n), where x′(n) = x(k×n), y′(n) = y(k×n), the predetermined interval is k audio samples, and k is a positive integer;
calculate the coarse cross-correlation sequence R_xy′(m) between the decimated candidate sequence x′(n) and the decimated target sequence y′(n) according to the following formula:
R_xy′(m) = Σ_{n=0}^{(N−1)/k} x′(n+m)·y′(n);
where m ∈ [−(N−1)/k, (N−1)/k], 0 ≤ n ≤ (N−1)/k, 0 ≤ n+m ≤ (N−1)/k, and N is a positive integer;
obtain the position deviation m1 corresponding to the maximum of the coarse cross-correlation sequence R_xy′(m);
taking the position deviation between x(n) and y(n) to be k×m1, intercept, from the corresponding positions of x(n) and y(n), a truncated candidate sequence x″(n) and a truncated target sequence y″(n) of a target length;
calculate the fine cross-correlation sequence R_xy″(m) between the truncated candidate sequence x″(n) and the truncated target sequence y″(n) according to the following formula:
R_xy″(m) = Σ_{n=0}^{N0} x″(n+m)·y″(n);
where m ∈ [k×m1 − a, k×m1 + a], a ≥ k, and N0 is the target length, a preset value; the position deviation m2 corresponding to the maximum of the fine cross-correlation sequence R_xy″(m) is the exact position deviation.
Optionally, the acquisition submodule comprises a classification acquisition unit, a classification determination unit, and a search unit;
the classification acquisition unit is configured to obtain the classification of the target audio file, the classification being any one of a single class, a live class, an accompaniment class, and a noise-reduction class;
the classification determination unit is configured to determine, according to the classification of the target audio file, the target classification in which to search for candidate reference audio files;
the search unit is configured to search the target classification for audio files that satisfy a preset selection condition and to use the found audio files as candidate reference audio files, where the preset selection condition includes at least one of: the audio file has a manually bound lyric file, and the audio file is a high-sound-quality audio file.
Optionally, the classification determination unit comprises:
a first classification determination subunit, configured to determine, when the target audio file belongs to the single class, that the single class is the target classification; and/or
a second classification determination subunit, configured to determine, when the target audio file belongs to the live class, that the live class is the target classification; and/or
a third classification determination subunit, configured to determine, when the target audio file belongs to the accompaniment class, that the accompaniment class, the single class, and the live class are the target classifications; and/or
a fourth classification determination subunit, configured to determine, when the target audio file belongs to the noise-reduction class, that the noise-reduction class, the single class, and the live class are the target classifications.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
by calculating the time deviation between the reference audio file and the target audio file and then correcting the timestamps of the lyric file of the reference audio file according to this deviation, the lyric file of the target audio file is obtained. This solves the problems of low efficiency and high cost of manual lyric-file generation in the related art, achieving the technical effects of improving the efficiency of generating lyric files and reducing cost.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for generating a lyric file according to an embodiment of the present invention;
Fig. 2A is a flowchart of a method for generating a lyric file according to another embodiment of the present invention;
Fig. 2B is a flowchart of step 201 according to another embodiment of the present invention;
Fig. 2C is a flowchart of step 204 according to another embodiment of the present invention;
Fig. 2D is a flowchart of step 204a according to another embodiment of the present invention;
Fig. 3 is a block diagram of an apparatus for generating a lyric file according to an embodiment of the present invention;
Fig. 4 is a block diagram of an apparatus for generating a lyric file according to another embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The method for generating a lyric file provided by the embodiments of the present invention can be applied to any electronic device with computing capability. For example, the electronic device may be a server, or a terminal such as a mobile phone, a multimedia player, or a computer.
Referring to Fig. 1, which shows a flowchart of a method for generating a lyric file according to an embodiment of the present invention, the method may include the following steps:
Step 102: obtain a reference audio file corresponding to a to-be-processed target audio file, where the reference audio file and the target audio file are different versions of the same song.
Step 104: calculate the time deviation between the reference audio file and the target audio file.
Step 106: correct the timestamps of the lyric file of the reference audio file according to the time deviation, and use the corrected lyric file as the lyric file of the target audio file.
In summary, the method provided by this embodiment calculates the time deviation between a reference audio file and a target audio file and then corrects the timestamps of the reference audio file's lyric file according to that deviation, thereby obtaining the lyric file of the target audio file. This solves the problems of low efficiency and high cost of manual lyric-file generation in the related art, achieving the technical effects of improving the efficiency of generating lyric files and reducing cost.
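As an informal illustration (not part of the patent text), steps 104 and 106 amount to measuring a single time deviation and shifting every lyric timestamp by it. The sketch below assumes LRC-style [mm:ss.xx] timestamps; the helper name and the deviation value are invented for the example.

```python
import re

def shift_lrc(lyric_text: str, deviation_ms: int) -> str:
    """Shift every [mm:ss.xx] timestamp in an LRC lyric text by deviation_ms."""
    def shift(match):
        minutes, seconds, cents = (int(g) for g in match.groups())
        total_ms = (minutes * 60 + seconds) * 1000 + cents * 10 + deviation_ms
        total_ms = max(total_ms, 0)  # clamp: a line cannot start before 0
        m, rest = divmod(total_ms, 60_000)
        s, ms = divmod(rest, 1000)
        return f"[{m:02d}:{s:02d}.{ms // 10:02d}]"
    return re.sub(r"\[(\d+):(\d{2})\.(\d{2})\]", shift, lyric_text)

# The reference lyric file starts a line at 00:10.50; suppose the target
# version has a 2.3 s longer intro, so the measured deviation is +2300 ms.
print(shift_lrc("[00:10.50]first line", 2300))  # [00:12.80]first line
```

A negative deviation handles a target version with a shorter intro; clamping at zero keeps early lines from acquiring impossible negative timestamps.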
Referring to Fig. 2A, which shows a flowchart of a method for generating a lyric file according to another embodiment of the present invention, the method may include the following steps:
Step 201: obtain at least one candidate reference audio file corresponding to the target audio file, where each candidate reference audio file and the target audio file are different versions of the same song.
In a music library, the same song (or entry) usually exists in several different versions, such as a single version, a live version, an accompaniment version, and so on. Versions of the same type may also exist in multiple variants: a song may have several single versions performed by different singers, or several live versions sung by the same singer at different concerts. When generating the lyric file for a target audio file, other audio files that belong to the same song as the target audio file are looked up in the music library as candidate reference audio files.
Optionally, as shown in Fig. 2B, this step may include the following sub-steps:
Step 201a: obtain the classification of the target audio file.
The classification of the target audio file includes, but is not limited to, any one of the following: the single class, the live class, the accompaniment class, and the noise-reduction class.
Step 201b: determine, according to the classification of the target audio file, the target classification in which to search for candidate reference audio files.
In one possible embodiment, the classification of the target audio file is directly taken as the target classification for the search; that is, once the classification of the target audio file is obtained, candidate reference audio files are looked up within that same classification.
In another possible embodiment, step 201b may cover the following cases:
1) when the target audio file belongs to the single class, the single class is the target classification;
2) when the target audio file belongs to the live class, the live class is the target classification;
3) when the target audio file belongs to the accompaniment class, the accompaniment class, the single class, and the live class are the target classifications;
4) when the target audio file belongs to the noise-reduction class, the noise-reduction class, the single class, and the live class are the target classifications.
Of course, the two embodiments above are merely exemplary and explanatory; this embodiment does not exclude other possible implementations.
Step 201c: search the target classification for audio files that satisfy a preset selection condition, and use them as candidate reference audio files.
The preset selection condition includes at least one of: the audio file has a manually bound lyric file, and the audio file is a high-sound-quality audio file. By selecting only audio files that satisfy this condition as candidate reference audio files, this embodiment ensures that each candidate has an exactly matching lyric file and is of adequate quality, which helps improve the accuracy of the subsequent calculation and correction.
Step 202: sort the at least one candidate reference audio file according to a predetermined sorting rule.
After the at least one candidate reference audio file corresponding to the target audio file is obtained, the candidates are sorted according to a predetermined sorting rule. The predetermined sorting rule includes at least one of: candidates in the same classification as the target audio file come first, high-sound-quality candidates come first, and highly popular candidates come first. By setting such a rule, the candidates most similar to the target audio file and of the highest quality are tried first in the subsequent matching calculation, which improves selection efficiency and saves computation and processing overhead.
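For illustration only, such a priority ordering can be expressed as a composite sort key; the record layout and field names (`class`, `high_quality`, `popularity`) are assumptions made for this sketch, not terms from the patent.

```python
def sort_candidates(candidates, target_class):
    """Order candidate reference audio files so that same-class,
    high-quality, popular candidates are tried first."""
    return sorted(
        candidates,
        key=lambda c: (
            c["class"] != target_class,   # same class as the target first
            not c["high_quality"],        # then high-sound-quality files
            -c["popularity"],             # then by popularity, descending
        ),
    )

candidates = [
    {"id": "a", "class": "live",   "high_quality": False, "popularity": 90},
    {"id": "b", "class": "single", "high_quality": True,  "popularity": 50},
    {"id": "c", "class": "single", "high_quality": False, "popularity": 99},
]
print([c["id"] for c in sort_candidates(candidates, "single")])  # ['b', 'c', 'a']
```

Because Python sorts tuples lexicographically and False sorts before True, each criterion acts as a tie-breaker for the ones before it, matching the "at least one of" priority rules above.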
Step 203: select the candidate reference audio files one by one according to the sorting result.
Step 204: detect whether a strong correlation exists between the selected candidate reference audio file and the target audio file.
In this embodiment, to guarantee the correction precision of the lyric file, the correlation between each candidate reference audio file and the target audio file is analyzed when choosing the reference audio file from the candidates, and a candidate with a strong correlation to the target audio file is chosen as the final reference audio file.
Optionally, as shown in Fig. 2C, this step may include the following sub-steps:
Step 204a: calculate the cross-correlation sequence between the selected candidate reference audio file and the target audio file, the cross-correlation sequence containing at least one cross-correlation coefficient.
As shown in Fig. 2D, in one possible embodiment, to reduce the amount of computation and improve efficiency, step 204a may include the following sub-steps:
Step 204a1: sample the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sample sequence, and sample the target audio file at the same preset sampling rate to obtain a target audio sample sequence.
To handle audio files of different bit rates conveniently and to reduce computation time, this embodiment downsamples both the selected candidate reference audio file and the target audio file to a preset sampling rate. The preset sampling rate can be set in advance according to actual requirements; for example, it may be 8 kHz or 4 kHz.
Step 204a2: extract audio data of a preset length from the same position in the candidate audio sample sequence and the target audio sample sequence, to obtain a candidate audio data sequence and a target audio data sequence respectively.
The preset length can be set in advance according to actual requirements; for example, it may be 10 s.
Step 204a3: calculate the cross-correlation sequence between the candidate audio data sequence and the target audio data sequence.
Optionally, the cross-correlation sequence R_xy(m) between the candidate audio data sequence x(n) and the target audio data sequence y(n) is calculated according to the following formula:
R_xy(m) = Σ_{n=0}^{N−1} x(n+m)·y(n);
where m ∈ [−(N−1), N−1], 0 ≤ n ≤ N−1, 0 ≤ n+m ≤ N−1, and N is a positive integer.
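A literal, pure-Python rendering of this formula may help fix the indexing conventions (illustrative only; a production implementation would typically use an FFT-based correlation rather than this O(N²) loop):

```python
def cross_correlation(x, y):
    """R_xy(m) = sum_n x(n+m) * y(n) for m in [-(N-1), N-1],
    with terms dropped when n+m falls outside [0, N-1]."""
    N = len(x)
    assert len(y) == N
    R = {}
    for m in range(-(N - 1), N):
        R[m] = sum(x[n + m] * y[n]
                   for n in range(N)
                   if 0 <= n + m < N)
    return R

# y equals x shifted earlier by 2 samples, so the correlation peaks at m = 2.
x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [1, 2, 1, 0, 0, 0, 0, 0]
R = cross_correlation(x, y)
print(max(R, key=R.get))  # 2
```

The position deviation at which R_xy(m) peaks is the sample offset between the two sequences, which, divided by the preset sampling rate, yields the time deviation used to correct the lyric timestamps.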
Step 204b: choose the maximum cross-correlation coefficient p0 from the cross-correlation coefficient sequence.
Step 204c: obtain the position deviation m0 corresponding to the maximum value p0.
Step 204d: choose the maximum value p1 among the cross-correlation coefficients corresponding to the first position-deviation interval [m0 + m_min, m0 + m_max] and the second position-deviation interval [m0 - m_max, m0 - m_min] determined by the position deviation m0, where 1 ≤ m_min < m_max.
Step 204e: detect whether the ratio p0/p1 between the maximum value p0 and the maximum value p1 is greater than a predetermined threshold.
Step 204f: if the ratio p0/p1 is greater than the predetermined threshold, determine that strong correlation exists between the selected candidate reference audio file and the target audio file.
It should be noted that the cross-correlation analysis in steps 204b to 204f is the most important part of this embodiment: it determines the performance of the whole system, i.e., the accuracy of the subsequent time-deviation calculation and correction. Although a maximum value p0 can always be found in the calculated cross-correlation coefficient sequence R_xy(m), the time deviation derived from its position deviation m0 is not necessarily trustworthy. The reason is that when p0 is larger than the other coefficients in R_xy(m) but not sufficiently larger, the correlation between the selected candidate reference audio file and the target audio file is not strong. Therefore, in this embodiment, the ratio p0/p1 is compared against the predetermined threshold to judge whether strong correlation exists between the selected candidate reference audio file and the target audio file. If strong correlation exists, the selected candidate reference audio file is used as the final reference audio file; otherwise the next candidate reference audio file is chosen, until a candidate reference audio file with strong correlation to the target audio file is obtained.
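The decision rule of steps 204b to 204f can be sketched as follows. This is illustrative only: the window bounds m_min, m_max and the threshold are placeholder values, not values specified by the embodiment.

```python
def is_strong_correlation(R, m_min=5, m_max=50, threshold=2.0):
    """R maps lag m -> cross-correlation coefficient.

    Returns (strong, m0): the global peak p0 at lag m0 must exceed the
    largest sidelobe p1 found in [m0+m_min, m0+m_max] and
    [m0-m_max, m0-m_min] by the given ratio threshold.
    """
    m0 = max(R, key=R.get)          # position deviation of the peak
    p0 = R[m0]
    sidelobes = [R[m] for m in R
                 if m0 + m_min <= m <= m0 + m_max
                 or m0 - m_max <= m <= m0 - m_min]
    if not sidelobes:
        return False, m0
    p1 = max(sidelobes)
    return p1 > 0 and p0 / p1 > threshold, m0
```

A sharp, isolated peak passes the test; a broad, flat sequence (weakly correlated files) fails it, triggering the fallback to the next candidate.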
Step 205: when the first candidate reference audio file having strong correlation with the target audio file is obtained, stop choosing further candidate reference audio files, and use that first candidate reference audio file as the reference audio file.
The reference audio file is thus the selected audio file that has strong correlation with the target audio file. The reference audio file has a lyric file that was bound manually, so this lyric file can be regarded as an exact match for it.
Step 206: calculate the time deviation between the reference audio file and the target audio file.
After the reference audio file is selected, the time deviation between the two files is calculated from the cross-correlation coefficients between the reference audio file and the target audio file.
Optionally, the time deviation τ = m0/k0, where m0 is the position deviation corresponding to the maximum cross-correlation coefficient p0, and k0 is the default sampling rate.
Step 207: correct the timestamps in the lyric file of the reference audio file according to the time deviation, and use the corrected lyric file as the lyric file of the target audio file.
After the time deviation τ is calculated, it is used to correct the timestamps in the lyric file of the reference audio file. In this embodiment, the correction is applied to the lyric file as a whole: the timestamp of every lyric line is shifted by the same amount τ. The corrected lyric file is then used as the lyric file of the target audio file.
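A whole-file timestamp shift might look like the sketch below. This assumes the lyric file uses the common LRC-style "[mm:ss.xx]" tags, which the patent does not specify; the clamping of negative results to zero is likewise an illustrative choice.

```python
import re

TIMESTAMP = re.compile(r"\[(\d+):(\d{2}(?:\.\d+)?)\]")

def shift_lyrics(lrc_text, tau):
    """Shift every [mm:ss.xx] tag in `lrc_text` by `tau` seconds."""
    def shift(match):
        total = int(match.group(1)) * 60 + float(match.group(2)) + tau
        total = max(0.0, total)  # clamp so tags never go negative
        return "[%02d:%05.2f]" % (int(total // 60), total % 60)
    return TIMESTAMP.sub(shift, lrc_text)
```

Every line's tag moves by the same τ, which is exactly the "whole-file correction" described above.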
In summary, the method provided by this embodiment calculates the time deviation between the reference audio file and the target audio file, then corrects the timestamps of the reference audio file's lyric file according to that deviation, obtaining the lyric file of the target audio file. This solves the low efficiency and high cost of generating lyric files manually in the related art, improving the efficiency of lyric-file generation and reducing cost.
In addition, the method provided by this embodiment chooses a candidate reference audio file with strong correlation to the target audio file as the reference audio file before performing the subsequent time-deviation calculation and correction, which substantially improves the accuracy of the final generated lyric file and guarantees system performance.
One further point: to further improve the efficiency of the cross-correlation calculation and save computation and processing overhead, embodiments of the present invention also provide the following two ways of calculating the cross-correlation coefficients.
The first way can comprise the following steps:
1. Extract one audio sample every predetermined interval from the candidate audio data sequence x(n) to obtain a candidate audio data extraction sequence x'(n), and extract one audio sample every predetermined interval from the target audio data sequence y(n) to obtain a target audio data extraction sequence y'(n); where x'(n) = x(k×n), y'(n) = y(k×n), the predetermined interval is k audio samples, and k is a positive integer.
The value of the predetermined interval k can be set after weighing computational accuracy against computational efficiency; for example, k = 4.
2. Calculate the rough cross-correlation coefficient sequence R_xy'(m) between the candidate audio data extraction sequence x'(n) and the target audio data extraction sequence y'(n) according to the following formula:
R_xy'(m) = Σ_{n=0}^{(N-1)/k} x'(n+m)·y'(n);
where m ∈ [-(N-1)/k, (N-1)/k], 0 ≤ n ≤ (N-1)/k, 0 ≤ n+m ≤ (N-1)/k, and N is a positive integer.
3. Obtain the position deviation m1 corresponding to the maximum value in the rough cross-correlation coefficient sequence R_xy'(m).
The position deviation m1 is the rough position deviation.
4. With the position deviation between the candidate audio data sequence x(n) and the target audio data sequence y(n) taken to be k×m1, intercept a candidate audio data interception sequence x''(n) and a target audio data interception sequence y''(n) of a target length from the corresponding positions of x(n) and y(n), respectively.
5. Calculate the accurate cross-correlation coefficient sequence R_xy''(m) between the candidate audio data interception sequence x''(n) and the target audio data interception sequence y''(n) according to the following formula:
R_xy''(m) = Σ_{n=0}^{N0} x''(n+m)·y''(n);
where m ∈ [k×m1 - a, k×m1 + a], a ≥ k, and N0 is the target length, a preset value. The position deviation m2 corresponding to the maximum value in the accurate cross-correlation coefficient sequence R_xy''(m) is the exact position deviation.
In the first way above, a rough position deviation is calculated first and an exact position deviation is then calculated around it. From the computation formula of the cross-correlation coefficient sequence, the computational complexity for two audio data sequences of length N is O(N²). The first way therefore shortens the computation time to roughly 1/k² of the original; for example, when k = 4 the computation time is reduced to about 1/16.
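The coarse-to-fine search above can be sketched as follows. This is illustrative only: corr_at evaluates one lag of the summation formula, the decimation follows x'(n) = x(k×n), and the fine window follows m ∈ [k×m1 - a, k×m1 + a].

```python
def corr_at(x, y, m):
    """One lag of R_xy: sum of x(n+m)*y(n) over valid n."""
    N = len(x)
    return sum(x[n + m] * y[n] for n in range(N) if 0 <= n + m < N)

def coarse_to_fine_lag(x, y, k=4, a=None):
    """Two-stage lag search: coarse peak m1 on k-decimated data,
    then an exact search over lags in [k*m1 - a, k*m1 + a], a >= k."""
    if a is None:
        a = k
    xd, yd = x[::k], y[::k]            # x'(n) = x(k*n), y'(n) = y(k*n)
    Nd = len(xd)
    m1 = max(range(-(Nd - 1), Nd), key=lambda m: corr_at(xd, yd, m))
    return max(range(k * m1 - a, k * m1 + a + 1),
               key=lambda m: corr_at(x, y, m))
```

Relative to an exhaustive search, the coarse stage touches only about N/k samples per lag over about 2N/k lags, which is where the roughly 1/k² saving comes from; the fine stage then only scans 2a + 1 lags.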
In the second way, the FFT (Fast Fourier Transform) is adopted to calculate the cross-correlation coefficients.
A method of computing the cross-correlation coefficient sequence with the FFT can be derived from its computation formula:
R_xy = IFFT(conj(FFT(y)) × FFT(x));
where conj() denotes the complex conjugate.
The second way can be implemented with existing, mature and efficient FFT computing modules, such as the FFTW (Fastest Fourier Transform in the West) library. Calculating the cross-correlation coefficients with the FFT likewise improves the efficiency of the calculation and saves computation and processing overhead.
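To illustrate the identity R_xy = IFFT(conj(FFT(y)) × FFT(x)), the sketch below uses a naive O(N²) DFT from the standard library so that it stays self-contained; a real implementation would substitute a genuine FFT (numpy.fft, FFTW, etc.) for `dft`/`idft` to obtain the speedup. Note that the transform-domain product yields the *circular* cross-correlation; zero-padding both sequences to length at least 2N-1 before transforming recovers the linear one.

```python
import cmath

def dft(a):
    N = len(a)
    return [sum(a[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(A):
    N = len(A)
    return [sum(A[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def xcorr_via_dft(x, y):
    """Circular cross-correlation: R_xy(m) = sum_n x((n+m) mod N) * y(n)."""
    prod = [yk.conjugate() * xk for xk, yk in zip(dft(x), dft(y))]
    return [c.real for c in idft(prod)]
```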
The following are apparatus embodiments of the present invention, which may be used to perform the method embodiments of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
Please refer to Fig. 3, which illustrates a block diagram of an apparatus for generating a lyric file provided by one embodiment of the present invention. The apparatus can be applied in any electronic device with computing capability, and can comprise: an acquisition module 310, a computing module 320 and a correcting module 330.
The acquisition module 310 is configured to obtain a reference audio file corresponding to a to-be-processed target audio file, the reference audio file and the target audio file belonging to different versions of the same song.
The computing module 320 is configured to calculate the time deviation between the reference audio file and the target audio file.
The correcting module 330 is configured to correct the timestamps in the lyric file of the reference audio file according to the time deviation, and to use the corrected lyric file as the lyric file of the target audio file.
In summary, the apparatus provided by this embodiment calculates the time deviation between the reference audio file and the target audio file, then corrects the timestamps of the reference audio file's lyric file according to that deviation, obtaining the lyric file of the target audio file. This solves the low efficiency and high cost of generating lyric files manually in the related art, improving the efficiency of lyric-file generation and reducing cost.
Please refer to Fig. 4, which illustrates a block diagram of an apparatus for generating a lyric file provided by another embodiment of the present invention. The apparatus can be applied in any electronic device with computing capability, and can comprise: an acquisition module 310, a computing module 320 and a correcting module 330.
The acquisition module 310 is configured to obtain a reference audio file corresponding to a to-be-processed target audio file, the reference audio file and the target audio file belonging to different versions of the same song.
The computing module 320 is configured to calculate the time deviation between the reference audio file and the target audio file.
The correcting module 330 is configured to correct the timestamps in the lyric file of the reference audio file according to the time deviation, and to use the corrected lyric file as the lyric file of the target audio file.
Optionally, the acquisition module 310 comprises: an acquisition submodule 310a, a sorting submodule 310b, a choosing submodule 310c, a detection submodule 310d and a determining submodule 310e.
The acquisition submodule 310a is configured to obtain at least one candidate reference audio file corresponding to the target audio file, each candidate reference audio file and the target audio file belonging to different versions of the same song.
The sorting submodule 310b is configured to sort the at least one candidate reference audio file according to a predetermined ordering rule.
The choosing submodule 310c is configured to choose the candidate reference audio files one by one according to the sorting result.
The detection submodule 310d is configured to detect whether strong correlation exists between the selected candidate reference audio file and the target audio file.
The determining submodule 310e is configured to, when the first candidate reference audio file having strong correlation with the target audio file is obtained, stop choosing further candidate reference audio files, and use that first candidate reference audio file as the reference audio file.
Optionally, the detection submodule 310d comprises: a computing unit 310d1, a first choosing unit 310d2, an acquiring unit 310d3, a second choosing unit 310d4, a detecting unit 310d5 and a determining unit 310d6.
The computing unit 310d1 is configured to calculate the cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, the sequence comprising at least one cross-correlation coefficient.
The first choosing unit 310d2 is configured to choose the maximum cross-correlation coefficient p0 from the cross-correlation coefficient sequence.
The acquiring unit 310d3 is configured to obtain the position deviation m0 corresponding to the maximum value p0.
The second choosing unit 310d4 is configured to choose the maximum value p1 among the cross-correlation coefficients corresponding to the first position-deviation interval [m0 + m_min, m0 + m_max] and the second position-deviation interval [m0 - m_max, m0 - m_min] determined by the position deviation m0, where 1 ≤ m_min < m_max.
The detecting unit 310d5 is configured to detect whether the ratio p0/p1 between the maximum value p0 and the maximum value p1 is greater than a predetermined threshold.
The determining unit 310d6 is configured to determine, when the ratio p0/p1 is greater than the predetermined threshold, that strong correlation exists between the selected candidate reference audio file and the target audio file.
Optionally, the computing unit 310d1 comprises: a sampling subunit 310d11, an extraction subunit 310d12 and a computation subunit 310d13.
The sampling subunit 310d11 is configured to sample the selected candidate reference audio file at a default sampling rate to obtain a candidate audio sample sequence, and to sample the target audio file at the same default sampling rate to obtain a target audio sample sequence.
The extraction subunit 310d12 is configured to extract audio data of a preset length from the same position in the candidate audio sample sequence and the target audio sample sequence, obtaining a candidate audio data sequence and a target audio data sequence, respectively.
The computation subunit 310d13 is configured to calculate the cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence.
Optionally, the computation subunit 310d13 is specifically configured to:
calculate the cross-correlation coefficient sequence R_xy(m) between the candidate audio data sequence x(n) and the target audio data sequence y(n) according to the following formula:
R_xy(m) = Σ_{n=0}^{N-1} x(n+m)·y(n);
where m ∈ [-(N-1), N-1], 0 ≤ n ≤ N-1, 0 ≤ n+m ≤ N-1, and N is a positive integer.
Optionally, the computation subunit 310d13 is specifically configured to:
extract one audio sample every predetermined interval from the candidate audio data sequence x(n) to obtain a candidate audio data extraction sequence x'(n), and extract one audio sample every predetermined interval from the target audio data sequence y(n) to obtain a target audio data extraction sequence y'(n); where x'(n) = x(k×n), y'(n) = y(k×n), the predetermined interval is k audio samples, and k is a positive integer;
calculate the rough cross-correlation coefficient sequence R_xy'(m) between the candidate audio data extraction sequence x'(n) and the target audio data extraction sequence y'(n) according to the following formula:
R_xy'(m) = Σ_{n=0}^{(N-1)/k} x'(n+m)·y'(n);
where m ∈ [-(N-1)/k, (N-1)/k], 0 ≤ n ≤ (N-1)/k, 0 ≤ n+m ≤ (N-1)/k, and N is a positive integer;
obtain the position deviation m1 corresponding to the maximum value in the rough cross-correlation coefficient sequence R_xy'(m);
with the position deviation between the candidate audio data sequence x(n) and the target audio data sequence y(n) taken to be k×m1, intercept a candidate audio data interception sequence x''(n) and a target audio data interception sequence y''(n) of a target length from the corresponding positions of x(n) and y(n), respectively;
calculate the accurate cross-correlation coefficient sequence R_xy''(m) between the candidate audio data interception sequence x''(n) and the target audio data interception sequence y''(n) according to the following formula:
R_xy''(m) = Σ_{n=0}^{N0} x''(n+m)·y''(n);
where m ∈ [k×m1 - a, k×m1 + a], a ≥ k, and N0 is the target length, a preset value; the position deviation m2 corresponding to the maximum value in the accurate cross-correlation coefficient sequence R_xy''(m) is the exact position deviation.
Optionally, the acquisition submodule 310a comprises: a classification acquiring unit 310a1, a classification determining unit 310a2 and a lookup unit 310a3.
The classification acquiring unit 310a1 is configured to obtain the classification of the target audio file, the classification being any one of the single class, the live class, the accompaniment class and the noise-reduction class.
The classification determining unit 310a2 is configured to determine, according to the classification of the target audio file, the target classification in which to search for candidate reference audio files.
The lookup unit 310a3 is configured to search the target classification for audio files meeting a preset choosing condition and use them as candidate reference audio files; wherein the preset choosing condition comprises at least one of: the audio file has a lyric file that was bound manually; the audio file is a high-sound-quality audio file.
Optionally, the classification determining unit 310a2 comprises:
a first classification determining subunit 310a21, configured to determine, when the target audio file belongs to the single class, that the single class is the target classification; and/or
a second classification determining subunit 310a22, configured to determine, when the target audio file belongs to the live class, that the live class is the target classification; and/or
a third classification determining subunit 310a23, configured to determine, when the target audio file belongs to the accompaniment class, that the accompaniment class, the single class and the live class are the target classifications; and/or
a fourth classification determining subunit 310a24, configured to determine, when the target audio file belongs to the noise-reduction class, that the noise-reduction class, the single class and the live class are the target classifications.
In summary, the apparatus provided by this embodiment calculates the time deviation between the reference audio file and the target audio file, then corrects the timestamps of the reference audio file's lyric file according to that deviation, obtaining the lyric file of the target audio file. This solves the low efficiency and high cost of generating lyric files manually in the related art, improving the efficiency of lyric-file generation and reducing cost.
In addition, the apparatus provided by this embodiment chooses a candidate reference audio file with strong correlation to the target audio file as the reference audio file before performing the subsequent time-deviation calculation and correction, which substantially improves the accuracy of the final generated lyric file and guarantees system performance.
Furthermore, when calculating the cross-correlation coefficient sequence, the apparatus provided by this embodiment adopts a coarse-to-fine computation scheme, further improving the efficiency of the cross-correlation calculation and saving computation and processing overhead.
It should be noted that when the apparatus for generating a lyric file provided by the above embodiments generates a lyric file, the division into the functional modules described above is only an example. In practice, the above functions can be assigned to different functional modules as required; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments for generating a lyric file provided above belong to the same conception; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
One of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, etc.
The foregoing are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A method for generating a lyric file, characterized in that the method comprises:
obtaining a reference audio file corresponding to a to-be-processed target audio file, the reference audio file and the target audio file belonging to different versions of the same song;
calculating the time deviation between the reference audio file and the target audio file;
correcting the timestamps in the lyric file of the reference audio file according to the time deviation, and using the corrected lyric file as the lyric file of the target audio file.
2. The method according to claim 1, characterized in that obtaining the reference audio file corresponding to the to-be-processed target audio file comprises:
obtaining at least one candidate reference audio file corresponding to the target audio file, each candidate reference audio file and the target audio file belonging to different versions of the same song;
sorting the at least one candidate reference audio file according to a predetermined ordering rule;
choosing the candidate reference audio files one by one according to the sorting result;
detecting whether strong correlation exists between the selected candidate reference audio file and the target audio file;
when the first candidate reference audio file having strong correlation with the target audio file is obtained, stopping choosing further candidate reference audio files, and using that first candidate reference audio file as the reference audio file.
3. The method according to claim 2, characterized in that detecting whether strong correlation exists between the selected candidate reference audio file and the target audio file comprises:
calculating the cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, the sequence comprising at least one cross-correlation coefficient;
choosing the maximum cross-correlation coefficient p0 from the cross-correlation coefficient sequence;
obtaining the position deviation m0 corresponding to the maximum value p0;
choosing the maximum value p1 among the cross-correlation coefficients corresponding to the first position-deviation interval [m0 + m_min, m0 + m_max] and the second position-deviation interval [m0 - m_max, m0 - m_min] determined by the position deviation m0, where 1 ≤ m_min < m_max;
detecting whether the ratio p0/p1 between the maximum value p0 and the maximum value p1 is greater than a predetermined threshold;
if the ratio p0/p1 is greater than the predetermined threshold, determining that strong correlation exists between the selected candidate reference audio file and the target audio file.
4. The method according to claim 3, characterized in that calculating the cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file comprises:
sampling the selected candidate reference audio file at a default sampling rate to obtain a candidate audio sample sequence, and sampling the target audio file at the same default sampling rate to obtain a target audio sample sequence;
extracting audio data of a preset length from the same position in the candidate audio sample sequence and the target audio sample sequence, obtaining a candidate audio data sequence and a target audio data sequence, respectively;
calculating the cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence.
5. The method according to claim 4, characterized in that calculating the cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence comprises:
calculating the cross-correlation coefficient sequence R_xy(m) between the candidate audio data sequence x(n) and the target audio data sequence y(n) according to the following formula:
R_xy(m) = Σ_{n=0}^{N-1} x(n+m)·y(n);
where m ∈ [-(N-1), N-1], 0 ≤ n ≤ N-1, 0 ≤ n+m ≤ N-1, and N is a positive integer.
6. The method according to claim 4, characterized in that calculating the cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence comprises:
extracting one audio sample every predetermined interval from the candidate audio data sequence x(n) to obtain a candidate audio data extraction sequence x'(n), and extracting one audio sample every predetermined interval from the target audio data sequence y(n) to obtain a target audio data extraction sequence y'(n); where x'(n) = x(k×n), y'(n) = y(k×n), the predetermined interval is k audio samples, and k is a positive integer;
calculating the rough cross-correlation coefficient sequence R_xy'(m) between the candidate audio data extraction sequence x'(n) and the target audio data extraction sequence y'(n) according to the following formula:
R_xy'(m) = Σ_{n=0}^{(N-1)/k} x'(n+m)·y'(n);
where m ∈ [-(N-1)/k, (N-1)/k], 0 ≤ n ≤ (N-1)/k, 0 ≤ n+m ≤ (N-1)/k, and N is a positive integer;
obtaining the position deviation m1 corresponding to the maximum value in the rough cross-correlation coefficient sequence R_xy'(m);
with the position deviation between the candidate audio data sequence x(n) and the target audio data sequence y(n) taken to be k×m1, intercepting a candidate audio data interception sequence x''(n) and a target audio data interception sequence y''(n) of a target length from the corresponding positions of x(n) and y(n), respectively;
calculating the accurate cross-correlation coefficient sequence R_xy''(m) between the candidate audio data interception sequence x''(n) and the target audio data interception sequence y''(n) according to the following formula:
R_xy''(m) = Σ_{n=0}^{N0} x''(n+m)·y''(n);
where m ∈ [k×m1 - a, k×m1 + a], a ≥ k, and N0 is the target length, a preset value; the position deviation m2 corresponding to the maximum value in the accurate cross-correlation coefficient sequence R_xy''(m) is the exact position deviation.
7. The method according to any one of claims 2 to 6, characterized in that obtaining the at least one candidate reference audio file corresponding to the target audio file comprises:
obtaining the classification of the target audio file, the classification being any one of the single class, the live class, the accompaniment class and the noise-reduction class;
determining, according to the classification of the target audio file, the target classification in which to search for candidate reference audio files;
searching the target classification for audio files meeting a preset choosing condition and using them as candidate reference audio files; wherein the preset choosing condition comprises at least one of: the audio file has a lyric file that was bound manually; the audio file is a high-sound-quality audio file.
8. The method according to claim 7, characterized in that determining, according to the classification of the target audio file, the target classification in which to search for candidate reference audio files comprises:
when the target audio file belongs to the single class, determining that the single class is the target classification; or
when the target audio file belongs to the live class, determining that the live class is the target classification; or
when the target audio file belongs to the accompaniment class, determining that the accompaniment class, the single class and the live class are the target classifications; or
when the target audio file belongs to the noise-reduction class, determining that the noise-reduction class, the single class and the live class are the target classifications.
9. An apparatus for generating a lyric file, characterized in that the apparatus comprises:
an acquisition module, configured to obtain a reference audio file corresponding to a to-be-processed target audio file, the reference audio file and the target audio file belonging to different versions of the same song;
a computing module, configured to calculate the time deviation between the reference audio file and the target audio file;
a correcting module, configured to correct the timestamps in the lyric file of the reference audio file according to the time deviation, and to use the corrected lyric file as the lyric file of the target audio file.
10. The apparatus according to claim 9, characterized in that the acquisition module comprises: an acquisition submodule, a sorting submodule, a choosing submodule, a detection submodule and a determining submodule;
the acquisition submodule is configured to obtain at least one candidate reference audio file corresponding to the target audio file, each candidate reference audio file and the target audio file belonging to different versions of the same song;
the sorting submodule is configured to sort the at least one candidate reference audio file according to a predetermined ordering rule;
the choosing submodule is configured to choose the candidate reference audio files one by one according to the sorting result;
the detection submodule is configured to detect whether strong correlation exists between the selected candidate reference audio file and the target audio file;
the determining submodule is configured to, when the first candidate reference audio file having strong correlation with the target audio file is obtained, stop choosing further candidate reference audio files, and use that first candidate reference audio file as the reference audio file.
11. The apparatus according to claim 10, wherein the detection submodule comprises: a computing unit, a first selection unit, an acquiring unit, a second selection unit, a detecting unit, and a determining unit;
the computing unit is configured to calculate a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, the cross-correlation coefficient sequence comprising at least one cross-correlation coefficient;
the first selection unit is configured to select the maximum value p0 of the cross-correlation coefficients from the cross-correlation coefficient sequence;
the acquiring unit is configured to obtain the position deviation m0 corresponding to the maximum value p0;
the second selection unit is configured to select, according to the position deviation m0, the maximum value p1 among the cross-correlation coefficients corresponding to a first position-deviation interval [m0 + m_min, m0 + m_max] and a second position-deviation interval [m0 − m_max, m0 − m_min], where 1 ≤ m_min < m_max;
the detecting unit is configured to detect whether the ratio p0/p1 between the maximum value p0 and the maximum value p1 is greater than a predetermined threshold; and
the determining unit is configured to determine, when the ratio p0/p1 is greater than the predetermined threshold, that a strong correlation exists between the selected candidate reference audio file and the target audio file.
12. The apparatus according to claim 11, wherein the computing unit comprises: a sampling subunit, an extraction subunit, and a computation subunit;
the sampling subunit is configured to sample the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sample sequence, and to sample the target audio file at the same preset sampling rate to obtain a target audio sample sequence;
the extraction subunit is configured to extract audio data of a preset length from the same position in the candidate audio sample sequence and in the target audio sample sequence, obtaining a candidate audio data sequence and a target audio data sequence, respectively; and
the computation subunit is configured to calculate the cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence.
13. The apparatus according to claim 12, wherein the computation subunit is specifically configured to:
calculate the cross-correlation coefficient sequence R_xy(m) between the candidate audio data sequence x(n) and the target audio data sequence y(n) according to the following formula:
R_xy(m) = Σ_{n=0}^{N−1} x(n+m) · y(n);
where m ∈ [−(N−1), N−1], 0 ≤ n ≤ N−1, 0 ≤ n+m ≤ N−1, and N is a positive integer.
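The formula of claim 13 is an unnormalized sliding-dot-product cross-correlation, which `numpy.correlate` in `full` mode computes directly: output index m + N − 1 corresponds to position deviation m. A small sketch (the example signals are illustrative):

```python
import numpy as np

def cross_correlation(x, y):
    """R_xy(m) = sum_n x(n+m) * y(n) for m in [-(N-1), N-1], summing only
    over n with 0 <= n <= N-1 and 0 <= n+m <= N-1 (out-of-range terms drop)."""
    N = len(x)
    assert len(y) == N
    # numpy's "full" correlation of (x, y) is exactly this sequence,
    # indexed so that R[m + N - 1] corresponds to position deviation m.
    return np.correlate(x, y, mode="full")

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0, 0.0])    # y is x advanced by one sample
R = cross_correlation(x, y)
m = int(np.argmax(R)) - (len(x) - 1)  # recover the position deviation
print(m)  # → 1, i.e. x(n+1) best matches y(n)
```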
14. The apparatus according to claim 12, wherein the computation subunit is specifically configured to:
extract one audio sample every predetermined interval from the candidate audio data sequence x(n) to obtain a candidate audio data extraction sequence x′(n), and extract one audio sample every predetermined interval from the target audio data sequence y(n) to obtain a target audio data extraction sequence y′(n); where x′(n) = x(k × n), y′(n) = y(k × n), the predetermined interval is k audio samples, and k is a positive integer;
calculate the coarse cross-correlation coefficient sequence R_xy′(m) between the candidate audio data extraction sequence x′(n) and the target audio data extraction sequence y′(n) according to the following formula:
R_xy′(m) = Σ_{n=0}^{(N−1)/k} x′(n+m) · y′(n);
where m ∈ [−(N−1)/k, (N−1)/k], 0 ≤ n ≤ (N−1)/k, 0 ≤ n+m ≤ (N−1)/k, and N is a positive integer;
obtain the position deviation m1 corresponding to the maximum value in the coarse cross-correlation coefficient sequence R_xy′(m);
with the position deviation between the candidate audio data sequence x(n) and the target audio data sequence y(n) taken to be k × m1, intercept a candidate audio data interception sequence x″(n) and a target audio data interception sequence y″(n) of a target length from the corresponding positions of x(n) and y(n), respectively; and
calculate the fine cross-correlation coefficient sequence R_xy″(m) between the candidate audio data interception sequence x″(n) and the target audio data interception sequence y″(n) according to the following formula:
R_xy″(m) = Σ_{n=0}^{N0} x″(n+m) · y″(n);
where m ∈ [k × m1 − a, k × m1 + a], a ≥ k, N0 denotes the target length, and N0 is a preset value; the position deviation m2 corresponding to the maximum value in the fine cross-correlation coefficient sequence R_xy″(m) is the exact position deviation.
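Claim 14 describes a coarse-to-fine speedup: a full correlation on k-fold decimated copies yields a rough deviation k·m1, which a second, full-rate correlation then refines within [k·m1 − a, k·m1 + a]. A sketch under assumed parameters (the values of k, a, and the target length N0 below are illustrative, not values from the patent):

```python
import numpy as np

def coarse_to_fine_deviation(x, y, k=4, a=8, n0=256):
    """Estimate the position deviation m2 with R(m) = sum_n x(n+m) * y(n).
    Coarse pass: full correlation on k-fold decimated copies gives m1.
    Fine pass: evaluate R(m) at full rate only for m in [k*m1 - a, k*m1 + a],
    over windows capped at the target length n0 (claim 14's N0)."""
    # --- coarse pass on decimated sequences x'(n) = x(kn), y'(n) = y(kn) ---
    xd, yd = x[::k], y[::k]
    Rc = np.correlate(xd, yd, mode="full")
    m1 = int(np.argmax(Rc)) - (len(yd) - 1)
    # --- fine pass: direct correlation near k*m1 at full rate ---
    n = len(x)
    best_m, best_r = None, -np.inf
    for m in range(k * m1 - a, k * m1 + a + 1):
        lo, hi = max(0, -m), min(n, n - m)   # y-indices with both terms in range
        seg = min(hi - lo, n0)               # cap the window at target length N0
        if seg <= 0:
            continue
        r = float(np.dot(x[lo + m:lo + m + seg], y[lo:lo + seg]))
        if r > best_r:
            best_m, best_r = m, r
    return best_m  # exact position deviation m2
```

Decimation cuts the coarse pass's cost by roughly k² while the fine pass touches only 2a + 1 lags, which is the efficiency argument behind the two-stage design.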
15. The apparatus according to any one of claims 10 to 14, wherein the acquisition submodule comprises: a category acquiring unit, a category determining unit, and a file searching unit;
the category acquiring unit is configured to obtain the category to which the target audio file belongs, the category being any one of a single category, a live category, an accompaniment category, and a noise-reduction category;
the category determining unit is configured to determine, according to the category to which the target audio file belongs, a target category in which to search for the candidate reference audio file; and
the file searching unit is configured to search the target category for audio files satisfying a preset selection condition as the candidate reference audio files; wherein the preset selection condition comprises at least one of: the audio file having a lyric file that has been manually bound, and the audio file being a high-quality audio file.
16. The apparatus according to claim 15, wherein the category determining unit comprises:
a first category determining subunit, configured to determine, when the category to which the target audio file belongs is the single category, that the single category is the target category; and/or
a second category determining subunit, configured to determine, when the category to which the target audio file belongs is the live category, that the live category is the target category; and/or
a third category determining subunit, configured to determine, when the category to which the target audio file belongs is the accompaniment category, that the accompaniment category, the single category, and the live category are the target categories; and/or
a fourth category determining subunit, configured to determine, when the category to which the target audio file belongs is the noise-reduction category, that the noise-reduction category, the single category, and the live category are the target categories.
CN201510257914.0A 2015-05-18 2015-05-18 Method and apparatus for generating a lyric file Active CN104882152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510257914.0A CN104882152B (en) 2015-05-18 2015-05-18 Method and apparatus for generating a lyric file

Publications (2)

Publication Number Publication Date
CN104882152A true CN104882152A (en) 2015-09-02
CN104882152B CN104882152B (en) 2018-04-10

Family

ID=53949619

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105575414A (en) * 2015-12-15 2016-05-11 广州酷狗计算机科技有限公司 Generating method and device of lyric file
CN106649644A (en) * 2016-12-08 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Lyric file generation method and device
CN109145149A (en) * 2018-08-16 2019-01-04 科大讯飞股份有限公司 A kind of information alignment schemes, device, equipment and readable storage medium storing program for executing
CN113920786A (en) * 2021-09-07 2022-01-11 北京小唱科技有限公司 Singing teaching method and singing teaching device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120150849A1 (en) * 2009-06-19 2012-06-14 Thomson Licensing Method for selecting versions of a document from a plurality of versions received after a search, and related receiver
CN104142989A (en) * 2014-07-28 2014-11-12 腾讯科技(深圳)有限公司 Matching detection method and device

Similar Documents

Publication Publication Date Title
CN104978962B (en) Singing search method and system
CN107591149B (en) Audio synthesis method, device and storage medium
EP2685450B1 (en) Device and method for recognizing content using audio signals
EP1550297B1 (en) Fingerprint extraction
EP2659481B1 (en) Scene change detection around a set of seed points in media data
US8886635B2 (en) Apparatus and method for recognizing content using audio signal
CN104882152A (en) Method and apparatus for generating lyric file
Gulati et al. Time-delayed melody surfaces for rāga recognition
US8754315B2 (en) Music search apparatus and method, program, and recording medium
CN106503184B (en) Determine the method and device of the affiliated class of service of target text
CN102063904B (en) Melody extraction method and melody recognition system for audio files
CN111326171B (en) Method and system for extracting vocal melody based on numbered musical notation recognition and fundamental frequency extraction
CN108665903A (en) A kind of automatic testing method and its system of audio signal similarity degree
CN105825850B (en) Audio processing method and device
US11907288B2 (en) Audio identification based on data structure
CN106055659A (en) Matching method for lyrics data and equipment thereof
CN107293308A (en) A kind of audio-frequency processing method and device
CN103854661A (en) Method and device for extracting music characteristics
CN105117030A (en) Recommendation method and terminal for associative vocabularies in input method
CN106782601B (en) multimedia data processing method and device
Gulati et al. A two-stage approach for tonic identification in Indian art music
US7849042B2 (en) Search result optimization method and device that adjusts reliability values of documents and orders retrieved documents
CN104598473A (en) Information processing method and electronic device
CN109271501A (en) A kind of management method and system of audio database
CN104978961A (en) Audio processing method, device and terminal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Self-numbered units 1-17, No. 315 Whampoa Avenue, Guangzhou, Guangdong 510660

Applicant after: Guangzhou KuGou Networks Co., Ltd.

Address before: 13F, Building B1, No. 16 Keyun Road, Guangzhou, Guangdong 510000, China

Applicant before: Guangzhou KuGou Networks Co., Ltd.

GR01 Patent grant