CN105975568A

CN105975568A - Audio processing method and apparatus

Info

Publication number: CN105975568A
Application number: CN201610288300.3A
Authority: CN
Inventors: 孙嘉骏; 王志豪; 赵伟峰; 杨雍; 车斌; 周旋
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2016-09-28
Anticipated expiration: 2036-04-29
Also published as: CN105975568B

Abstract

Embodiments of the invention provide an audio processing method and apparatus. The method can comprise the steps of extracting target audio data of a preset duration from a to-be-processed target audio file; performing offset slicing processing on the target audio data to obtain at least one audio slice; acquiring fingerprint information of the at least one audio slice and comparing the fingerprint information of the at least one audio slice with a preset fingerprint information library; and locating a characteristic position of the target audio file according to a comparison result, wherein the characteristic position is a slice head position or a slice tail position. According to the method and apparatus, the characteristic position such as the slice head position or the slice tail position of the audio file can be automatically located, so that the audio processing efficiency and accuracy are improved.

Description

An audio processing method and apparatus

Technical field

The present invention relates to the field of Internet technology, specifically to the field of audio technology, and in particular to an audio processing method and apparatus.

Background

Audio files may include, but are not limited to: songs, song clips and spoken-word programs in Internet audio libraries; and songs, song clips and spoken-word programs broadcast by radio and television stations. The head of an audio file is the audio data located at the beginning of the file that serves as an opening, and the tail of an audio file is the audio data located at the end of the file that serves as a summary or closing. Some audio files have a head or a tail, and others have neither. In the prior art, whether an audio file has a head or a tail is judged manually, and the head position or tail position is usually marked by hand. As the number of audio files grows day by day, manual work can no longer meet the increasingly high requirements that audio processing places on efficiency and accuracy.

Summary of the invention

Embodiments of the present invention provide an audio processing method and apparatus that can automatically locate a characteristic position of an audio file, such as its head or tail, thereby improving the efficiency and accuracy of audio processing.

A first aspect of the embodiments of the present invention provides an audio processing method, which may include:

extracting target audio data of a preset duration from a to-be-processed target audio file;

performing offset slicing processing on the target audio data to obtain at least one audio slice;

acquiring fingerprint information of the at least one audio slice, and comparing the fingerprint information of the at least one audio slice with a preset fingerprint information library; and

locating a characteristic position of the target audio file according to a comparison result, wherein the characteristic position is a head position or a tail position.

Preferably, before the extracting of target audio data of a preset duration from a to-be-processed target audio file, the method further includes:

creating the preset fingerprint information library, wherein the preset fingerprint information library contains at least one album fingerprint information library, and one album fingerprint information library contains the fingerprint information of at least one audio file belonging to the same album.

Preferably, the extracting of target audio data of a preset duration from a to-be-processed target audio file includes:

extracting, in forward order from the start position of the to-be-processed target audio file, first audio data of a first preset duration; or

extracting, in reverse order from the end position of the to-be-processed target audio file, second audio data of a second preset duration.

Preferably, the performing of offset slicing processing on the target audio data to obtain at least one audio slice includes:

extracting, starting from the start position of the target audio data, one audio slice of a preset slice duration at every preset offset time; and

storing the obtained at least one audio slice in order, and recording a time attribute of the at least one audio slice;

wherein the time attribute of an audio slice includes: its start and end time, and its offset time relative to the start position of the target audio data.

Preferably, the comparing of the fingerprint information of the at least one audio slice with the preset fingerprint information library includes:

querying the target album to which the target audio file belongs;

selecting a target album fingerprint information library from the preset fingerprint information library, and reading the fingerprint information of at least one audio file in the target album fingerprint information library;

selecting a current audio slice from the at least one audio slice in ascending order of offset time, and comparing the fingerprint information of the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library;

if the number of audio files in the target album fingerprint information library whose fingerprint information matches the fingerprint information of the selected current audio slice is greater than or equal to a preset number threshold, determining that the selected current audio slice is a matching audio slice; and

if the number of audio files in the target album fingerprint information library whose fingerprint information matches the fingerprint information of the selected current audio slice is less than the preset number threshold, determining that the selected current audio slice is a non-matching audio slice, and stopping the comparison of the fingerprint information of all audio slices after the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

Preferably, the locating of the characteristic position of the target audio file according to the comparison result includes:

obtaining, in ascending order of offset time, the time attribute of the first matching audio slice and the time attribute of the last matching audio slice;

if the target audio data is the first audio data, determining the head start position of the target audio file according to the time attribute of the first matching audio slice, and determining the head end position of the target audio file according to the time attribute of the last matching audio slice; and

if the target audio data is the second audio data, determining the tail end position of the target audio file according to the time attribute of the first matching audio slice, and determining the tail start position of the target audio file according to the time attribute of the last matching audio slice.

A second aspect of the embodiments of the present invention provides an audio processing apparatus, which may include:

an extraction unit, configured to extract target audio data of a preset duration from a to-be-processed target audio file;

a processing unit, configured to perform offset slicing processing on the target audio data to obtain at least one audio slice;

an acquisition unit, configured to acquire fingerprint information of the at least one audio slice;

a comparison unit, configured to compare the fingerprint information of the at least one audio slice with a preset fingerprint information library; and

a locating unit, configured to locate a characteristic position of the target audio file according to a comparison result, wherein the characteristic position is a head position or a tail position.

Preferably, the apparatus further includes:

a creation unit, configured to create the preset fingerprint information library, wherein the preset fingerprint information library contains at least one album fingerprint information library, and one album fingerprint information library contains the fingerprint information of at least one audio file belonging to the same album.

Preferably, the extraction unit is specifically configured to extract, in forward order from the start position of the to-be-processed target audio file, first audio data of a first preset duration; or to extract, in reverse order from the end position of the to-be-processed target audio file, second audio data of a second preset duration.

Preferably, the processing unit includes:

an audio slice extraction unit, configured to extract, starting from the start position of the target audio data, one audio slice of a preset slice duration at every preset offset time; and

a storage unit, configured to store the obtained at least one audio slice in order and to record a time attribute of the at least one audio slice.

Preferably, the comparison unit includes:

a target album query unit, configured to query the target album to which the target audio file belongs;

a library selection unit, configured to select a target album fingerprint information library from the preset fingerprint information library;

a fingerprint information reading unit, configured to read the fingerprint information of at least one audio file in the target album fingerprint information library;

a current selection unit, configured to select a current audio slice from the at least one audio slice in ascending order of offset time;

a current comparison unit, configured to compare the fingerprint information of the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library; and

a result determination unit, configured to determine that the selected current audio slice is a matching audio slice if the number of audio files in the target album fingerprint information library whose fingerprint information matches the fingerprint information of the selected current audio slice is greater than or equal to a preset number threshold; or, if that number is less than the preset number threshold, to determine that the selected current audio slice is a non-matching audio slice and to stop comparing the fingerprint information of all audio slices after the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

Preferably, the locating unit includes:

a time attribute obtaining unit, configured to obtain, in ascending order of offset time, the time attribute of the first matching audio slice and the time attribute of the last matching audio slice;

a head position determination unit, configured to, if the target audio data is the first audio data, determine the head start position of the target audio file according to the time attribute of the first matching audio slice, and determine the head end position of the target audio file according to the time attribute of the last matching audio slice; and

a tail position determination unit, configured to, if the target audio data is the second audio data, determine the tail end position of the target audio file according to the time attribute of the first matching audio slice, and determine the tail start position of the target audio file according to the time attribute of the last matching audio slice.

The embodiments of the present invention can extract target audio data of a preset duration from a to-be-processed target audio file, perform offset slicing processing on the target audio data to obtain at least one audio slice, compare the fingerprint information of the at least one audio slice against a preset fingerprint information library, and locate the head position or tail position of the target audio file by analyzing the comparison result. This process locates the head or tail position of the target audio file automatically, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the accompanying drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of an audio processing method provided by an embodiment of the present invention;

Fig. 2 is a flowchart of another audio processing method provided by an embodiment of the present invention;

Fig. 3 is a flowchart of yet another audio processing method provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

In the embodiments of the present invention, audio files may include, but are not limited to: songs, song clips and spoken-word programs in Internet audio libraries; and songs, song clips and spoken-word programs broadcast by radio and television stations. To make the audio processing more accurate, the audio file referred to in the following embodiments is preferably a file in a raw audio format, more preferably a mono WAV (an audio file format) file with an 8 kHz sample rate and 16-bit quantization. If the to-be-processed audio file is in another audio format, such as MP3 (Moving Picture Experts Group Audio Layer III), WMA (Windows Media Audio, a digital audio format) or APE (a lossless digital audio compression format), it must first undergo format conversion.
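As a non-limiting illustration, the format conversion could be delegated to the ffmpeg command-line tool. The sketch below only builds the argument list; the helper name `build_convert_command` and the file names are hypothetical, and it assumes that ffmpeg is installed on the system.

```python
def build_convert_command(src_path, dst_path):
    """Build an ffmpeg argument list that converts src_path into an
    8 kHz, 16-bit PCM, mono WAV file (helper name is hypothetical)."""
    return [
        "ffmpeg",
        "-i", src_path,        # input file in any format ffmpeg can decode
        "-ar", "8000",         # resample to 8 kHz
        "-ac", "1",            # downmix to a single (mono) channel
        "-c:a", "pcm_s16le",   # encode as 16-bit little-endian PCM
        dst_path,
    ]


command = build_convert_command("song.mp3", "song.wav")
```

The resulting list can be passed to `subprocess.run(command, check=True)` to perform the conversion.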

In the prior art, the head position or tail position of an audio file is usually marked by hand; as the number of audio files grows day by day, manual marking can no longer meet the requirements for efficiency and accuracy. In view of this, the embodiments of the present invention can extract target audio data of a preset duration from a to-be-processed target audio file, perform offset slicing processing on the target audio data to obtain at least one audio slice, compare the fingerprint information of the at least one audio slice against a preset fingerprint information library, and locate the head position or tail position of the target audio file by analyzing the comparison result. This process locates the head or tail position of the target audio file automatically, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.

Based on the foregoing description, an embodiment of the present invention provides an audio processing method. Referring to Fig. 1, the method may include the following steps S101 to S105.

S101, extract target audio data of a preset duration from a to-be-processed target audio file.

Generally, neither the head nor the tail of an audio file lasts very long. Based on this characteristic, this step can extract target audio data of a preset duration from the target audio file for the subsequent head or tail analysis. Because only one segment of target audio data of the preset duration is analyzed, rather than the audio data of the entire audio file, interference among the audio data is reduced and the efficiency of audio processing is improved. It should be noted that the preset duration can be set according to practical experience. For example, the head of an audio file usually lasts 5s-120s and the tail usually lasts 5s-60s. Therefore, to locate the head position of the target audio file, the first 2 minutes (120s) of target audio data can be extracted from the target audio file for analysis; to locate the tail position of the target audio file, the last 1 minute (60s) of target audio data can be extracted from the target audio file for analysis.
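The extraction in step S101 can be sketched as follows, assuming the decoded audio is held as a flat list of samples. The function name `extract_target_audio` and its parameters are hypothetical and are used only for illustration.

```python
def extract_target_audio(samples, sample_rate, preset_seconds, from_end=False):
    """Return the first (or, if from_end is True, the last) preset_seconds
    of audio. A file shorter than the preset duration is returned whole."""
    count = min(len(samples), int(preset_seconds * sample_rate))
    if from_end:
        return samples[len(samples) - count:]   # tail analysis
    return samples[:count]                      # head analysis


# Toy data: 10 "samples" at 1 sample per second.
audio = list(range(10))
head_data = extract_target_audio(audio, 1, 3)                  # first 3 s
tail_data = extract_target_audio(audio, 1, 3, from_end=True)   # last 3 s
```

With real audio at an 8 kHz sample rate, the values from the text would be `preset_seconds=120` for the head and `preset_seconds=60` for the tail.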

S102, perform offset slicing processing on the target audio data to obtain at least one audio slice.

Offset slicing processing means cutting one audio slice of a certain duration at every certain offset time. For example, assume that the offset time is 1s and the slice duration is 10s. Then, starting from the start position of the target audio data, a first audio slice of 10s is cut at an offset of 0s; the offset time of this first audio slice is 0s and its start and end time is 0s-10s. A second audio slice of 10s is cut at an offset of 1s; the offset time of this second audio slice is 1s and its start and end time is 1s-11s. A third audio slice of 10s is cut at an offset of 2s; the offset time of this third audio slice is 2s and its start and end time is 2s-12s. The rest can be deduced by analogy. It can be seen that, among the at least one audio slice obtained by the offset slicing processing, all audio slices have the same duration and the audio data they contain overlap, but the start and end time and the offset time differ from one audio slice to the next. In a specific implementation, an audio processing tool may be used to perform the offset slicing processing on the target audio data; such tools may include, but are not limited to, ffmpeg (Fast Forward MPEG, an open-source computer program for recording and converting digital audio and video and turning them into streams). Preferably, each audio slice is a mono WAV file with an 8 kHz sample rate and 16-bit quantization.
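The offset slicing described above can be sketched as follows. The class `AudioSlice` and the function `offset_slices` are hypothetical names, and dropping a trailing slice that would be shorter than the slice duration is an assumption that the text does not address.

```python
from dataclasses import dataclass


@dataclass
class AudioSlice:
    offset_s: float   # offset relative to the start of the target audio data
    start_s: float    # start time of the slice
    end_s: float      # end time of the slice
    samples: list     # audio samples covered by the slice


def offset_slices(samples, sample_rate, offset_step_s=1.0, slice_len_s=10.0):
    """Cut one slice of slice_len_s seconds at every offset_step_s seconds."""
    step = int(offset_step_s * sample_rate)
    length = int(slice_len_s * sample_rate)
    if step <= 0 or length <= 0:
        raise ValueError("offset step and slice length must be positive")
    slices = []
    # Assumption: an incomplete slice at the end of the data is discarded.
    for begin in range(0, len(samples) - length + 1, step):
        offset_s = begin / sample_rate
        slices.append(AudioSlice(offset_s, offset_s, offset_s + slice_len_s,
                                 samples[begin:begin + length]))
    return slices


# 12 s of toy data at 1 sample per second gives slices 0-10, 1-11 and 2-12.
slices = offset_slices(list(range(12)), sample_rate=1)
```

Neighbouring slices overlap by design, which is what allows the boundary of a head or tail to be found with a resolution equal to the offset time.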

S103, acquire the fingerprint information of the at least one audio slice.

The fingerprint information of a piece of audio is a compact, content-based digital signature that represents the important acoustic features of that audio. It has the following main advantages: 1. robustness: even if the audio suffers fairly severe distortion, noise, pitch change and the like, the fingerprint information can still identify and characterize the important acoustic features of the audio; 2. distinctiveness: a piece of fingerprint information uniquely identifies a piece of audio, and the fingerprint information of different pieces of audio differs; 3. reliability: the probability of misidentification when audio is identified by its fingerprint information is low. In other words, the fingerprint information of an audio slice is a compact, content-based digital signature that represents the important acoustic features of that audio slice. In a specific implementation, an audio fingerprint extraction algorithm may be used to acquire the fingerprint information of each audio slice; such algorithms may include, but are not limited to: the big fingerprint feature algorithm, hash algorithms, the complex cepstrum transform algorithm, the wavelet packet transform algorithm, and so on. Each audio slice corresponds to one piece of fingerprint information.
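To make the notion of a compact, content-based signature concrete, the following toy sketch derives one bit from each pair of adjacent frames according to whether the frame energy rises. This is only an illustrative stand-in and is not one of the algorithms named above; the function name `toy_fingerprint` and the default frame length are assumptions.

```python
def toy_fingerprint(samples, frame_len=400):
    """Return a list of bits: 1 where the energy of a frame exceeds the
    energy of the previous frame, otherwise 0. At 8 kHz, 400 samples
    correspond to 50 ms per frame (an arbitrary illustrative choice)."""
    energies = [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


quiet_loud_quiet = [0] * 4 + [1] * 4 + [0] * 4
fingerprint = toy_fingerprint(quiet_loud_quiet, frame_len=4)
# Scaling the volume leaves the bits unchanged, a crude form of robustness.
louder = toy_fingerprint([3 * s for s in quiet_loud_quiet], frame_len=4)
```

Because only the direction of the energy change is kept, a uniform change in volume does not alter the bits, which hints at the robustness property listed above.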

S104, compare the fingerprint information of the at least one audio slice with a preset fingerprint information library.

The preset fingerprint information library stores the fingerprint information of at least one audio file. In a specific implementation, the fingerprint information of the at least one audio slice can be compared in turn with the fingerprint information of each audio file in the preset fingerprint information library. If the similarity between the fingerprint information of an audio slice and the fingerprint information of an audio file reaches or exceeds a preset value (which can be set as needed, for example 85% or 90%), the audio slice can be considered to match that audio file in the preset fingerprint information library.
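The threshold test can be sketched as follows, assuming that fingerprints are equal-purpose bit sequences and that similarity is the fraction of agreeing positions. Both the similarity measure and the names `fingerprint_similarity` and `is_match` are assumptions; the text does not prescribe how similarity is computed.

```python
def fingerprint_similarity(fp_a, fp_b):
    """Fraction of positions at which two bit sequences agree, measured
    over the length of the shorter sequence."""
    length = min(len(fp_a), len(fp_b))
    if length == 0:
        return 0.0
    agreeing = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return agreeing / length


def is_match(fp_a, fp_b, preset_value=0.85):
    """A slice matches a file when the similarity reaches the preset value."""
    return fingerprint_similarity(fp_a, fp_b) >= preset_value


reference = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
one_bit_off = [0] + reference[1:]        # 9 of 10 bits agree
two_bits_off = [0, 1] + reference[2:]    # 8 of 10 bits agree
```

With the default preset value of 85%, `one_bit_off` counts as a match for `reference` while `two_bits_off` does not.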

S105, locate a characteristic position of the target audio file according to the comparison result, wherein the characteristic position is a head position or a tail position.

Usually, the heads or tails of audio files are repetitive and identical. Therefore, if the fingerprint information of an audio slice matches the fingerprint information of multiple audio files in the preset fingerprint information library, the preset fingerprint information library contains multiple identical, repeated pieces of fingerprint information, and the audio slice can be considered to belong to a head or a tail. Based on this principle, this step can determine, from the comparison result obtained in step S104 between each audio slice and the preset fingerprint information library, whether the audio slice belongs to a head or a tail, and can further locate the head position or tail position of the target audio file.

The audio processing method of this embodiment of the present invention can extract target audio data of a preset duration from a to-be-processed target audio file, perform offset slicing processing on the target audio data to obtain at least one audio slice, compare the fingerprint information of the at least one audio slice against a preset fingerprint information library, and locate the head position or tail position of the target audio file by analyzing the comparison result. This process locates the head or tail position of the target audio file automatically, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.

An embodiment of the present invention further provides another audio processing method. The method of this embodiment focuses on describing how to locate the head position of the target audio file. Referring to Fig. 2, the method may include the following steps S201 to S213.

S201, create a preset fingerprint information library, wherein the preset fingerprint information library contains at least one album fingerprint information library. An album contains at least one audio file, and an album fingerprint information library contains the fingerprint information of at least one audio file belonging to the same album.

In this embodiment, the preset fingerprint information library can be represented by the following Table one:

Table one: preset fingerprint information library

As Table one shows, the preset fingerprint information library stores the fingerprint information of at least one audio file. Preferably, in this embodiment, the preset fingerprint information library is partitioned by album: the fingerprint information of all audio files belonging to the same album is stored together in the same album fingerprint information library. In this way, the subsequent processing of the target audio file only needs to be carried out within the album fingerprint information library to which the target audio file belongs, which greatly improves the efficiency of audio processing.
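One possible in-memory layout for such an album-partitioned library is a mapping from album ID to a mapping from audio file ID to fingerprint. The sketch below assumes this layout; the function name `create_preset_library`, the record format and the sample IDs are hypothetical.

```python
from collections import defaultdict


def create_preset_library(records):
    """Group fingerprints by album.

    records: iterable of (album_id, audio_id, fingerprint) tuples.
    Returns {album_id: {audio_id: fingerprint}}.
    """
    library = defaultdict(dict)
    for album_id, audio_id, fingerprint in records:
        library[album_id][audio_id] = fingerprint
    return dict(library)


preset_library = create_preset_library([
    ("A", "a1", [1, 0, 1]),
    ("A", "a2", [1, 0, 0]),
    ("B", "b1", [0, 1, 1]),
])
```

Looking up `preset_library["A"]` then yields only the fingerprints of the files in album A, so later comparisons never touch the other albums.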

S202, extract, in forward order from the start position of the to-be-processed target audio file, first audio data of a first preset duration.

The first preset duration can be set according to practical experience. For example, the head of an audio file usually lasts 5s-120s, so the first preset duration can be set to a value in the range 5s-120s. In this embodiment, assume that the target audio file is a 5-minute song a1 and that the first preset duration is 120s; the first audio data, namely the first 2 minutes (120s) of song a1, can then be extracted for analysis.

S203, extract, starting from the start position of the first audio data, one audio slice of a preset slice duration at every preset offset time.

S204, store the obtained at least one audio slice in order, and record the time attribute of the at least one audio slice.

Steps S203-S204 of this embodiment may be a specific refinement of step S102 of the embodiment shown in Fig. 1. In steps S203-S204, both the preset offset time and the preset slice duration can be set as needed. In this embodiment, assume that the preset offset time is 1s and the preset slice duration is 10s. Following the example in step S202, the first audio data comprising the first 2 minutes is extracted from song a1, and the start position of the first audio data is the moment at which song a1 starts, namely 0s. Then, in steps S203-S204, a first audio slice of 10s is cut at an offset of 0s; its offset time relative to the start position of the first audio data is 0s and its start and end time is 0s-10s. A second audio slice of 10s is cut at an offset of 1s; its offset time relative to the start position of the first audio data is 1s and its start and end time is 1s-11s. A third audio slice of 10s is cut at an offset of 2s; its offset time relative to the start position of the first audio data is 2s and its start and end time is 2s-12s. The rest can be deduced by analogy. The obtained at least one audio slice can be represented by the following Table two:

Table two: audio slices

Name	Offset time	Start and end time
First audio slice	0s	0s-10s
Second audio slice	1s	1s-11s
Third audio slice	2s	2s-12s
…	…	…

S205, acquire the fingerprint information of the at least one audio slice. For this step, refer to step S103 of the embodiment shown in Fig. 1; details are not repeated here.

S206, query the target album to which the target audio file belongs.

In an Internet audio library or a radio and television program library, each album has its own unique ID, and each audio file belonging to an album also has its own unique ID. The Internet audio library or radio and television program library stores the ID of each album and the IDs of the audio files belonging to each album, and also stores the membership relationship between audio files and albums. In step S206, the target album to which the target audio file belongs can be determined from the Internet audio library or radio and television program library according to the ID of the target audio file, and the ID of the target album can be read.

S207, select a target album fingerprint information library from the preset fingerprint information library.

S208, read the fingerprint information of at least one audio file in the target album fingerprint information library.

In steps S207-S208, based on the ID of the target album that has been read, the target album fingerprint information library can be selected from Table one of this embodiment, and the fingerprint information of at least one audio file in the target album fingerprint information library can be read from it. In the example of this embodiment, the target audio file is song a1, which belongs to album A. The album A fingerprint information library can therefore be selected from Table one as the target album fingerprint information library, and the fingerprint information of each audio file belonging to album A can be read from it.
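Steps S206 to S208 amount to two lookups, which can be sketched as follows. The mapping `membership` (audio file ID to album ID), the nested-dictionary layout of the library and the function name `read_target_album_fingerprints` are assumptions made for illustration.

```python
def read_target_album_fingerprints(audio_id, membership, preset_library):
    """Return the target album ID and the fingerprints stored for it."""
    album_id = membership[audio_id]              # S206: query the target album
    album_library = preset_library[album_id]     # S207: select its library
    return album_id, list(album_library.values())  # S208: read fingerprints


membership = {"a1": "A", "a2": "A", "b1": "B"}
preset_library = {
    "A": {"a1": [1, 0], "a2": [0, 1]},
    "B": {"b1": [1, 1]},
}
album_id, fingerprints = read_target_album_fingerprints(
    "a1", membership, preset_library)
```

For song a1 this selects album A and returns the fingerprints of the files stored under album A only.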

S209, select a current audio slice from the at least one audio slice in ascending order of offset time, and compare the fingerprint information of the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

S210, if the number of audio files in the target album fingerprint information library whose fingerprint information matches the fingerprint information of the selected current audio slice is greater than or equal to a preset number threshold, determine that the selected current audio slice is a matching audio slice.

S211, if the number of audio files in the target album fingerprint information library whose fingerprint information matches the fingerprint information of the selected current audio slice is less than the preset number threshold, determine that the selected current audio slice is a non-matching audio slice, and stop comparing the fingerprint information of all audio slices after the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

Steps S209-S211 describe the process of comparing the at least one audio slice with the target album fingerprint information library, as follows. In ascending order of offset time, and with reference to Table 2 above, the first audio slice is selected first as the current audio slice, and its fingerprint information is compared with the fingerprint information of each audio file in the album A fingerprint information library. If the fingerprint information of a number of audio files greater than or equal to the preset number threshold matches the fingerprint information of the first audio slice (the preset number threshold can be set according to actual needs, for example to 3 or 5), the first audio slice is determined to be a matching audio slice; the second audio slice is then selected as the current audio slice according to Table 2, and the above steps are repeated. If the fingerprint information of fewer audio files than the preset number threshold matches the fingerprint information of the first audio slice, the first audio slice is determined to be a non-matching audio slice, and no further current audio slice is selected from Table 2; that is, the comparison of all audio slices after the first audio slice is stopped.
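The loop described in steps S209-S211 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: `find_matching_slices` is a hypothetical name, the `matches` predicate is supplied by the caller because the text does not specify how two pieces of fingerprint information are compared, and the toy data below represents each album file's fingerprint as a set of slice labels.

```python
from typing import Any, Callable, List, Sequence


def find_matching_slices(
    slice_fps: Sequence[Any],
    album_fps: Sequence[Any],
    threshold: int,
    matches: Callable[[Any, Any], bool],
) -> List[int]:
    """Return the indices of matching audio slices.

    slice_fps must already be sorted in ascending order of offset time.
    Scanning stops at the first non-matching slice (step S211), so later
    slices are never compared.
    """
    matched: List[int] = []
    for index, slice_fp in enumerate(slice_fps):
        # Count the album files whose fingerprint matches this slice.
        hit_count = sum(1 for file_fp in album_fps if matches(slice_fp, file_fp))
        if hit_count >= threshold:
            matched.append(index)   # S210: matching audio slice
        else:
            break                   # S211: non-matching, stop comparing
    return matched


# Toy data (assumed): each album file fingerprint is a set of slice labels.
album_fps = [{"s0", "s1", "s2"}, {"s0", "s1"}, {"s0", "s1", "s2"}, {"s0"}]
slice_fps = ["s0", "s1", "s2", "s3"]
print(find_matching_slices(slice_fps, album_fps, 3, lambda s, f: s in f))
```

With a threshold of 3, slices `s0` and `s1` match (4 and 3 files respectively), `s2` matches only 2 files, and `s3` is never examined.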

Steps S206-S211 of this embodiment may be a specific refinement of step S104 of the embodiment shown in Fig. 1.

S212, obtain, in ascending order of offset time, the time attribute of the first matching audio slice and the time attribute of the last matching audio slice.

S213, determine the head start position of the target audio file according to the time attribute of the first matching audio slice, and determine the head end position of the target audio file according to the time attribute of the last matching audio slice.

Steps S212-S213 of this embodiment may be a specific refinement of step S105 of the embodiment shown in Fig. 1. In steps S212-S213, assuming that the preset number threshold is 3, the comparison result can be expressed as in Table 3 below:

Table 3: Comparison results

As Table 3 shows, the target album fingerprint information library contains the fingerprint information of only 1 audio file that matches the fingerprint information of the ninth audio slice; that is, fewer audio files than the preset number threshold match the ninth audio slice selected as the current audio slice. All audio slices after the ninth audio slice therefore undergo no further fingerprint comparison. In ascending order of offset time, the first audio slice is the first matching audio slice and the eighth audio slice is the last matching audio slice; that is, the first through eighth audio slices belong to the head. With reference to Table 2 above, the first audio slice has an offset time of 0s relative to the start position of the first audio data (i.e. the start position of song a1) and a start time of 0s; the eighth audio slice has an offset time of 6s relative to the start position of the first audio data (i.e. the start position of song a1) and a start time of 7s. It can therefore be determined that the head start position of the target audio file, song a1, is the offset time (or start time) of the first audio slice, namely 0s, and that the head end position, computed from the offset time and start time of the eighth audio slice, is 6+7=13s.
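The arithmetic of the worked example above can be reproduced in a few lines. The sketch below only mirrors that example (head end = offset time plus start time of the last matching slice); the text does not state a general formula, so this rule, the `TimeAttribute` class, and the `locate_head` function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TimeAttribute:
    offset_s: int  # offset time relative to the start of the first audio data
    start_s: int   # start time as recorded for the slice


def locate_head(first: TimeAttribute, last: TimeAttribute) -> Tuple[int, int]:
    """Return (head start, head end) in seconds from the start of the song.

    Assumption: follows the worked example, where the head end is the sum of
    the last matching slice's offset time and start time.
    """
    head_start = first.offset_s
    head_end = last.offset_s + last.start_s
    return head_start, head_end


# Values taken from the worked example: first slice (0s, 0s), eighth slice (6s, 7s).
print(locate_head(TimeAttribute(0, 0), TimeAttribute(6, 7)))  # (0, 13)
```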

An embodiment of the present invention further provides another audio processing method; this embodiment focuses on how to locate the tail position of the target audio file. Referring to Fig. 3, the method may include the following steps S301-S313.

S301, create a preset fingerprint information library, the preset fingerprint information library containing at least one album fingerprint information library. An album contains at least one audio file, and an album fingerprint information library contains the fingerprint information of at least one audio file belonging to the same album.

For step S301 of this embodiment, see step S201 of the embodiment shown in Fig. 2; it is not repeated here.

S302, extract second audio data of a second preset duration in reverse order from the end position of the to-be-processed target audio file.

The second preset duration can be set according to practical experience. For example, the tail of an audio file typically lasts 5s-60s, so the second preset duration can be set to 5s-60s. In this embodiment, assume that the target audio file is a 5-minute song a1 and that the second preset duration is 60s; the second audio data, i.e. the last 1 minute (60s) of song a1, can then be extracted for analysis.

S303, starting from the start position of the second audio data, extract an audio slice of a preset slice duration at every preset offset time.

S304, store the obtained at least one audio slice in sequence, and record the time attribute of the at least one audio slice.

For steps S303-S304 of this embodiment, see steps S203-S204 of the embodiment shown in Fig. 2; they are not repeated here. Note, however, that in this embodiment the second audio data is extracted in reverse order from the end position of the target audio file (song a1), so the start position of the second audio data is the end position of song a1, i.e. the 5-minute mark. In steps S303-S304, therefore: at an offset of 0s, a first audio slice of 10s duration is cut, whose offset time relative to the start position of the second audio data (i.e. the end position of song a1) is 0s and whose start and end times are 0s-10s; at an offset of 1s, a second audio slice of 10s duration is cut, whose offset time relative to the start position of the second audio data (i.e. the end position of song a1) is 1s and whose start and end times are 1s-11s; at an offset of 2s, a third audio slice of 10s duration is cut, whose offset time relative to the start position of the second audio data (i.e. the end position of song a1) is 2s and whose start and end times are 2s-12s; and so on.
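The offset slicing in steps S303-S304 can be illustrated by generating only the time attributes of the slices. This is a minimal sketch: `slice_time_attributes` is a hypothetical name, the 10s slice duration and 1s offset step come from the example above, and dropping any slice that would run past the extracted audio data is an assumption the text does not address. All times are measured from the start position of the second audio data, which here is the end position of song a1.

```python
from typing import List, Tuple


def slice_time_attributes(
    total_s: int, slice_s: int = 10, step_s: int = 1
) -> List[Tuple[int, int, int]]:
    """Return (offset time, start time, end time) for each audio slice.

    Assumption: a slice is kept only if it fits entirely inside the
    extracted audio data of length total_s seconds.
    """
    attributes: List[Tuple[int, int, int]] = []
    offset = 0
    while offset + slice_s <= total_s:
        attributes.append((offset, offset, offset + slice_s))
        offset += step_s
    return attributes


attrs = slice_time_attributes(60)
print(attrs[:3])   # [(0, 0, 10), (1, 1, 11), (2, 2, 12)]
print(len(attrs))  # 51
```

The first three tuples correspond to the first, second, and third audio slices in the example above.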

S305, acquire the fingerprint information of the at least one audio slice.

S306, query the target album to which the target audio file belongs.

S307, select the target album fingerprint information library from the preset fingerprint information library.

S308, read the fingerprint information of at least one audio file in the target album fingerprint information library.

S309, select a current audio slice from the at least one audio slice in ascending order of offset time, and compare the fingerprint information of the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

S310, if the fingerprint information of a number of audio files greater than or equal to a preset number threshold in the target album fingerprint information library matches the fingerprint information of the selected current audio slice, determine that the selected current audio slice is a matching audio slice.

S311, if the fingerprint information of fewer audio files than the preset number threshold in the target album fingerprint information library matches the fingerprint information of the selected current audio slice, determine that the selected current audio slice is a non-matching audio slice, and stop comparing the fingerprint information of all audio slices after the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

S312, obtain, in ascending order of offset time, the time attribute of the first matching audio slice and the time attribute of the last matching audio slice.

S313, determine the tail end position of the target audio file according to the time attribute of the first matching audio slice, and determine the tail start position of the target audio file according to the time attribute of the last matching audio slice.

For steps S305-S313 of this embodiment, see steps S205-S213 of the embodiment shown in Fig. 2; they are not repeated here. Note, however, that in this embodiment, in ascending order of offset time, the first audio slice is the first matching audio slice and the eighth audio slice is the last matching audio slice; that is, the first through eighth audio slices belong to the tail. With reference to Table 2 above, the first audio slice has an offset time of 0s relative to the start position of the second audio data (i.e. the end position of song a1) and a start time of 0s; the eighth audio slice has an offset time of 6s relative to the start position of the second audio data (i.e. the end position of song a1) and a start time of 7s. It can therefore be determined that the tail end position of the target audio file, song a1, is the offset time (or start time) of the first audio slice relative to the start position of the second audio data (i.e. the end position of song a1), namely 0s, which is the end position of song a1 at the 5-minute mark. The tail start position, computed from the offset time and start time of the eighth audio slice relative to the start position of the second audio data (i.e. the end position of song a1), is 6+7=13s, i.e. 13s before the end of song a1, which is the 4 min 47 s mark of song a1.
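Because the times in this embodiment are measured from the end of the song, converting them to positions in the song is a subtraction from the song duration. The sketch below reproduces the figures of the example above; `locate_tail` and `fmt` are hypothetical helper names, and the rule "offset time plus start time of the last matching slice" is taken from the worked example rather than from a stated general formula.

```python
from typing import Tuple


def locate_tail(
    duration_s: int, first_offset_s: int, last_offset_s: int, last_start_s: int
) -> Tuple[int, int]:
    """Return (tail start, tail end) in seconds from the start of the song.

    Inputs are measured from the end of the song, as in this embodiment.
    """
    tail_end = duration_s - first_offset_s
    tail_start = duration_s - (last_offset_s + last_start_s)
    return tail_start, tail_end


def fmt(seconds: int) -> str:
    """Format a number of seconds as minutes:seconds."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes}:{rest:02d}"


# Song a1 lasts 5 minutes (300s); first slice offset 0s; eighth slice (6s, 7s).
tail_start, tail_end = locate_tail(300, 0, 6, 7)
print(fmt(tail_start), fmt(tail_end))  # 4:47 5:00
```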

Based on the description of the above method embodiments, the audio processing apparatus provided by the embodiments of the present invention is described in detail below with reference to Fig. 4. Note that the following audio processing apparatus can be used to perform the audio processing methods shown in Fig. 1-Fig. 3. Specifically, an embodiment of the present invention provides an audio processing apparatus; referring to Fig. 4, the apparatus runs the following units:

Extraction unit 101, configured to extract target audio data of a preset duration from a to-be-processed target audio file.

Processing unit 102, configured to perform offset slicing processing on the target audio data to obtain at least one audio slice.

Acquisition unit 103, configured to acquire the fingerprint information of the at least one audio slice.

Comparison unit 104, configured to compare the fingerprint information of the at least one audio slice with a preset fingerprint information library.

Locating unit 105, configured to locate a characteristic position of the target audio file according to the comparison result, the characteristic position being a head position or a tail position.

In a specific implementation, the apparatus also runs the following unit:

Creation unit 106, configured to create the preset fingerprint information library, the preset fingerprint information library containing at least one album fingerprint information library, and an album fingerprint information library containing the fingerprint information of at least one audio file belonging to the same album.

In a specific implementation, when running the extraction unit 101, the apparatus specifically extracts first audio data of a first preset duration in forward order from the start position of the to-be-processed target audio file; or specifically extracts second audio data of a second preset duration in reverse order from the end position of the to-be-processed target audio file.

In a specific implementation, when running the processing unit 102, the apparatus specifically runs the following units:

Audio slice extraction unit 1001, configured to extract, starting from the start position of the target audio data, an audio slice of a preset slice duration at every preset offset time.

Storage unit 1002, configured to store the obtained at least one audio slice in sequence and record the time attribute of the at least one audio slice. The time attribute of an audio slice includes a start time and an offset time relative to the start position of the target audio data.

In a specific implementation, when running the comparison unit 104, the apparatus specifically runs the following units:

Target album query unit 2001, configured to query the target album to which the target audio file belongs.

Library selection unit 2002, configured to select the target album fingerprint information library from the preset fingerprint information library.

Fingerprint information reading unit 2003, configured to read the fingerprint information of at least one audio file in the target album fingerprint information library.

Current selection unit 2004, configured to select a current audio slice from the at least one audio slice in ascending order of offset time.

Current comparison unit 2005, configured to compare the fingerprint information of the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

Result determination unit 2006, configured to determine that the selected current audio slice is a matching audio slice if the fingerprint information of a number of audio files greater than or equal to a preset number threshold in the target album fingerprint information library matches the fingerprint information of the selected current audio slice; or, if the fingerprint information of fewer audio files than the preset number threshold in the target album fingerprint information library matches the fingerprint information of the selected current audio slice, to determine that the selected current audio slice is a non-matching audio slice and to stop comparing the fingerprint information of all audio slices after the selected current audio slice with the fingerprint information of the at least one audio file in the target album fingerprint information library.

In a specific implementation, when running the locating unit 105, the apparatus specifically runs the following units:

Time attribute acquisition unit 3001, configured to obtain, in ascending order of offset time, the time attribute of the first matching audio slice and the time attribute of the last matching audio slice.

Head position determination unit 3002, configured to, if the target audio data is the first audio data, determine the head start position of the target audio file according to the time attribute of the first matching audio slice, and determine the head end position of the target audio file according to the time attribute of the last matching audio slice.

Tail position determination unit 3003, configured to, if the target audio data is the second audio data, determine the tail end position of the target audio file according to the time attribute of the first matching audio slice, and determine the tail start position of the target audio file according to the time attribute of the last matching audio slice.

Because the audio processing apparatus shown in Fig. 4 can be used to perform the methods of the embodiments shown in Fig. 1-Fig. 3, the function of each unit shown in Fig. 4 is covered by the description of the corresponding steps of the methods shown in Fig. 1-Fig. 3 and is not repeated here. It should be particularly noted that the audio processing apparatus shown in Fig. 4 may be an application program running in a physical device, and has at least the following two feasible implementations:

In one feasible implementation, the audio processing apparatus runs in a single physical device and works independently. For example, the audio processing apparatus may run in a terminal, which may include but is not limited to a PC (Personal Computer), a mobile phone, a PDA (tablet computer), a smart wearable device, and the like, with the terminal independently implementing the method flows shown in Fig. 1-Fig. 3; or the audio processing apparatus may run in a server, with the server independently implementing the method flows shown in Fig. 1-Fig. 3.

In another feasible implementation, the audio processing apparatus is distributed across multiple physical devices, and the distributed parts work in coordination. For example, one part of the audio processing apparatus may run in a terminal and another part in a server, with the terminal and the server cooperating to implement the method flows shown in Fig. 1-Fig. 3. In this implementation, the creation unit 106 and the comparison unit 104 shown in Fig. 4 may be located in the server, while the extraction unit 101, the processing unit 102, the acquisition unit 103, and the locating unit 105 may be located in the terminal. Correspondingly, when the method flows shown in Fig. 1-Fig. 3 are performed, the process of creating the preset fingerprint information library and the comparison process may take place in the server, and the other processes, including extracting the target audio data, obtaining the at least one audio slice, acquiring the fingerprint information of the at least one audio slice, and locating the characteristic position, may take place in the terminal. Specifically, the terminal may send the fingerprint information of the audio slices to the server for comparison, the server returns the comparison result, and the terminal then locates the characteristic position according to the comparison result.
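The division of work between terminal and server can be sketched with two small classes. This is a hedged illustration only: the class names `Server` and `Terminal`, their methods, and the toy fingerprint representation (each album file fingerprint is a set of slice labels) are all assumptions, and a direct method call stands in for whatever network transport an actual deployment would use.

```python
from typing import Dict, List, Optional, Set, Tuple


class Server:
    """Holds the preset fingerprint library and performs the comparison."""

    def __init__(self, preset_library: Dict[str, List[Set[str]]], threshold: int):
        self.preset_library = preset_library
        self.threshold = threshold

    def compare(self, album_id: str, slice_fps: List[str]) -> List[bool]:
        """Return one match flag per compared slice, stopping at the first miss."""
        album_fps = self.preset_library[album_id]
        results: List[bool] = []
        for slice_fp in slice_fps:
            hits = sum(1 for file_fp in album_fps if slice_fp in file_fp)
            if hits < self.threshold:
                results.append(False)
                break  # slices after a non-matching slice are not compared
            results.append(True)
        return results


class Terminal:
    """Sends slice fingerprints to the server and locates the matching span."""

    def __init__(self, server: Server):
        self.server = server

    def locate(self, album_id: str, slice_fps: List[str]) -> Optional[Tuple[int, int]]:
        results = self.server.compare(album_id, slice_fps)
        matched = [index for index, ok in enumerate(results) if ok]
        if not matched:
            return None
        # Indices of the first and last matching audio slices.
        return matched[0], matched[-1]


server = Server({"A": [{"s0", "s1"}, {"s0", "s1"}, {"s0", "s1", "s2"}]}, threshold=3)
terminal = Terminal(server)
print(terminal.locate("A", ["s0", "s1", "s2"]))  # (0, 1)
```

Keeping the library and the comparison on the server means the terminal only needs to transmit the fingerprint information of its slices and interpret the returned result.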

As with the method, in the embodiments of the audio processing apparatus of the present invention, target audio data of a preset duration can be extracted from a to-be-processed target audio file, and offset slicing processing is performed on the target audio data to obtain at least one audio slice; the preset fingerprint information library is used to compare the fingerprint information of the at least one audio slice, and the head position or tail position of the target audio file is located by analysis according to the comparison result. This process locates the head and tail positions of the target audio file automatically, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.

A person of ordinary skill in the art will understand that all or part of the flows of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.

What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims

1. An audio processing method, characterized by comprising:

2. The method of claim 1, characterized in that, before the extracting of target audio data of a preset duration from a to-be-processed target audio file, the method further comprises:

3. The method of claim 2, characterized in that the extracting of target audio data of a preset duration from a to-be-processed target audio file comprises:

4. The method of claim 2 or 3, characterized in that the performing of offset slicing processing on the target audio data to obtain at least one audio slice comprises:

5. The method of claim 4, characterized in that the comparing of the fingerprint information of the at least one audio slice with the preset fingerprint information library comprises:

6. The method of claim 5, characterized in that the locating of the characteristic position of the target audio file according to the comparison result comprises:

7. An audio processing apparatus, characterized by comprising:

8. The apparatus of claim 7, characterized by further comprising:

9. The apparatus of claim 8, characterized in that the extraction unit is specifically configured to extract first audio data of a first preset duration in forward order from the start position of the to-be-processed target audio file; or to extract second audio data of a second preset duration in reverse order from the end position of the to-be-processed target audio file.

10. The apparatus of claim 8 or 9, characterized in that the processing unit comprises:

11. The apparatus of claim 10, characterized in that the comparison unit comprises:

12. The apparatus of claim 11, characterized in that the locating unit comprises: