CN105975568B - Audio processing method and device - Google Patents


Publication number
CN105975568B
Authority
CN
China
Prior art keywords
audio
fingerprint information
target
fragment
preset
Prior art date
Legal status
Active
Application number
CN201610288300.3A
Other languages
Chinese (zh)
Other versions
CN105975568A (en)
Inventor
孙嘉骏
王志豪
赵伟峰
杨雍
车斌
周旋
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201610288300.3A
Publication of CN105975568A
Application granted
Publication of CN105975568B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 - Retrieval characterised by using metadata automatically derived from the content
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides an audio processing method and device, the method comprising the following steps: extracting target audio data of a preset duration from a target audio file to be processed; performing offset slicing on the target audio data to obtain at least one audio fragment; collecting fingerprint information of the at least one audio fragment and comparing it, fragment by fragment, against a preset fingerprint information base; and locating a characteristic position of the target audio file according to the comparison results, where the characteristic position is a leader position or a trailer position. The invention enables automatic location of characteristic positions of an audio file, such as the leader or trailer, and improves the efficiency and accuracy of audio processing.

Description

Audio processing method and device
Technical Field
The present invention relates to the field of internet technology, in particular to audio technology, and specifically to an audio processing method and device.
Background
Audio files may include, but are not limited to: songs, song segments, and voice programs in an internet audio library; songs, song clips, voice programs, and the like played on radio or television. The leader of an audio file is the audio data at its head end that serves as an opening; the trailer is the audio data at its tail end that serves as a summary or ending. Some audio files have a leader and trailer, while others do not. In the prior art, whether an audio file has a leader or trailer is judged manually, and the leader or trailer position is generally marked by manual dotting. As the number of audio files grows, manual operation can no longer meet the increasing efficiency and accuracy requirements of audio processing.
Disclosure of Invention
The embodiment of the invention provides an audio processing method and device that can automatically locate characteristic positions of an audio file, such as the leader or trailer, and improve the efficiency and accuracy of audio processing.
A first aspect of an embodiment of the present invention provides an audio processing method, which may include:
extracting target audio data with preset duration from a target audio file to be processed;
performing offset slicing processing on the target audio data to obtain at least one audio fragment;
collecting fingerprint information of the at least one audio fragment, and comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
and positioning the characteristic position of the target audio file according to the comparison result, wherein the characteristic position is a leader position or a trailer position.
Preferably, before extracting the target audio data of the preset duration from the target audio file to be processed, the method further includes:
and creating a preset fingerprint information base, wherein the preset fingerprint information base comprises at least one album fingerprint information base, and one album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
Preferably, the extracting target audio data with a preset duration from the target audio file to be processed includes:
sequentially extracting first audio data of a first preset duration from the starting position of a target audio file to be processed; or,
extracting second audio data of a second preset duration in reverse order from the end position of the target audio file to be processed.
Preferably, the performing offset slicing processing on the target audio data to obtain at least one audio slice includes:
extracting audio fragments with preset fragment duration from the initial position of the target audio data at intervals of preset offset time;
sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
Preferably, the comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively includes:
inquiring a target album to which the target audio file belongs;
selecting a target album fingerprint information base from the preset fingerprint information base, and reading fingerprint information of at least one audio file in the target album fingerprint information base;
sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the fingerprint information base of the target album;
if the number of audio files in the target album fingerprint information base whose fingerprint information matches that of the selected current audio fragment is greater than or equal to a preset number threshold, determining the selected current audio fragment to be a matching audio fragment;
if that number is smaller than the preset number threshold, determining the selected current audio fragment to be a non-matching audio fragment, and stopping the comparison of all audio fragments after it against the fingerprint information of the audio files in the target album fingerprint information base.
Preferably, the positioning the feature position of the target audio file according to the comparison result includes:
acquiring the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment according to the sequence of the offset time from small to large;
if the target audio data is the first audio data, determining the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and the end position of the leader according to the time attribute of the last matching audio fragment;
and if the target audio data is the second audio data, determining the end position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and the start position of the trailer according to the time attribute of the last matching audio fragment.
A second aspect of the embodiments of the present invention provides an audio processing apparatus, which may include:
the extraction unit is used for extracting target audio data with preset duration from a target audio file to be processed;
the processing unit is used for carrying out offset slicing processing on the target audio data to obtain at least one audio fragment;
the acquisition unit is used for acquiring fingerprint information of the at least one audio fragment;
the comparison unit is used for comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
and the positioning unit is used for positioning the characteristic position of the target audio file according to the comparison result, wherein the characteristic position is a leader position or a trailer position.
Preferably, the apparatus further comprises:
a creating unit, configured to create a preset fingerprint information base, where the preset fingerprint information base comprises at least one album fingerprint information base, and an album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
Preferably, the extracting unit is specifically configured to sequentially extract first audio data of a first preset duration from a start position of a target audio file to be processed; or, the method is used for extracting the second audio data with the second preset duration from the end position of the target audio file to be processed in reverse order.
Preferably, the processing unit includes:
an audio slice extracting unit, configured to extract an audio slice with a preset slice duration from the start position of the target audio data every preset offset time;
the storage unit is used for sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
Preferably, the alignment unit comprises:
a target album querying unit configured to query a target album to which the target audio file belongs;
the library selection unit is used for selecting a target album fingerprint information library from the preset fingerprint information libraries;
the fingerprint information reading unit is used for reading the fingerprint information of at least one audio file in the target album fingerprint information base;
the current selection unit is used for sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large;
the current comparison unit is used for comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base;
a result determining unit, configured to determine that the selected current audio fragment is a matching audio fragment if the number of audio files in the target album fingerprint information base whose fingerprint information matches that of the selected current audio fragment is greater than or equal to a preset number threshold; or to determine that the selected current audio fragment is a non-matching audio fragment if that number is smaller than the preset number threshold, and to stop comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of the audio files in the target album fingerprint information base.
Preferably, the positioning unit includes:
the time attribute acquisition unit is used for acquiring the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment according to the sequence of the offset time from small to large;
a leader position determining unit, configured to determine, if the target audio data is the first audio data, the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and the end position of the leader according to the time attribute of the last matching audio fragment;
and a trailer position determining unit, configured to determine, if the target audio data is the second audio data, the end position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and the start position of the trailer according to the time attribute of the last matching audio fragment.
In the embodiment of the invention, target audio data of a preset duration is extracted from a target audio file to be processed and subjected to offset slicing to obtain at least one audio fragment; the fingerprint information of each audio fragment is compared against a preset fingerprint information base, and the leader or trailer position of the target audio file is located from the comparison results. This process automates the location of the leader or trailer position of an audio file, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in their description are briefly introduced below. The drawings described here are obviously only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another audio processing method according to an embodiment of the present invention;
fig. 3 is a flowchart of another audio processing method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the invention.
In the embodiments of the present invention, audio files may include, but are not limited to: songs, song segments, and voice programs in an internet audio library; songs, song clips, voice programs, and the like played on radio or television. To process audio more accurately, the audio files in the following embodiments preferably refer to files in a raw audio format, i.e. mono WAV (a sound file format) with an 8 kHz sampling rate and 16-bit quantization. If the audio file to be processed is in another format, for example MP3 (MPEG Audio Layer III), WMA (Windows Media Audio), or APE (a lossless digital audio compression format), it must first undergo format conversion.
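The format-conversion step above can be sketched with FFmpeg. The patent does not give a concrete invocation, so the helper below is an illustrative assumption; the flags are standard FFmpeg audio options for producing the mono, 8 kHz, 16-bit WAV described above:

```python
import shlex

def ffmpeg_convert_cmd(src_path: str, dst_path: str) -> list[str]:
    """Build an ffmpeg command that converts an audio file (e.g. MP3, WMA,
    APE) to the raw format the method expects: mono WAV, 8 kHz sample
    rate, 16-bit signed PCM samples."""
    return [
        "ffmpeg",
        "-i", src_path,           # input file in any supported format
        "-ar", "8000",            # resample to 8 kHz
        "-ac", "1",               # downmix to mono
        "-acodec", "pcm_s16le",   # 16-bit signed PCM (standard WAV encoding)
        dst_path,
    ]

print(shlex.join(ffmpeg_convert_cmd("song.mp3", "song.wav")))
```

Running the returned command with `subprocess.run(..., check=True)` performs the actual conversion, provided FFmpeg is installed.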
In the prior art, the leader or trailer position of an audio file is generally marked by manual dotting, which, as the number of audio files grows, can no longer meet the efficiency and accuracy requirements. The embodiment of the present invention therefore extracts target audio data of a preset duration from a target audio file to be processed and performs offset slicing on it to obtain at least one audio fragment; the fingerprint information of each audio fragment is compared against a preset fingerprint information base, and the leader or trailer position of the target audio file is located from the comparison results. This process automates the location of the leader or trailer position, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.
Based on the above description, an embodiment of the present invention provides an audio processing method, please refer to fig. 1, which may include the following steps S101 to S105.
S101, extracting target audio data with preset duration from a target audio file to be processed.
Generally, the leader or trailer of an audio file is not very long. Based on this characteristic, target audio data of a preset duration can be extracted from the target audio file for subsequent leader or trailer analysis. The preset duration may be set from practical experience. For example, the leader of an audio file is typically 5 s-120 s long and the trailer 5 s-60 s long; so to locate the leader position of the target audio file, the target audio data of the first 2 minutes (120 s) can be extracted for analysis, and to locate the trailer position, the target audio data of the last 1 minute (60 s) can be extracted.
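As a minimal sketch of step S101 (not part of the original disclosure; the function name and seconds-based interface are assumptions), the head or tail extraction window can be computed like this:

```python
def extraction_window(total_s, preset_s, from_end=False):
    """Return the (start, end) second range of the target audio data.

    from_end=False: the first `preset_s` seconds (for locating the leader).
    from_end=True:  the last  `preset_s` seconds (for locating the trailer).
    The window is clipped to the file length."""
    span = min(preset_s, total_s)
    if from_end:
        return (total_s - span, total_s)
    return (0.0, span)

# A 5-minute song: first 120 s for leader analysis,
# last 60 s for trailer analysis.
print(extraction_window(300, 120))                # (0.0, 120)
print(extraction_window(300, 60, from_end=True))  # (240, 300)
```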
S102, carrying out offset slicing processing on the target audio data to obtain at least one audio slice.
Offset slicing means cutting audio slices at a fixed offset interval. For example, with an offset time of 1 s and a slice duration of 10 s: cutting at an offset of 0 s from the start of the target audio data yields a first audio slice of duration 10 s, with offset time 0 s and start-stop time 0 s-10 s; cutting at an offset of 1 s yields a second audio slice with offset time 1 s and start-stop time 1 s-11 s; cutting at an offset of 2 s yields a third audio slice with offset time 2 s and start-stop time 2 s-12 s; and so on. All audio slices obtained this way therefore have the same duration and overlapping audio data, but different start-stop and offset times. In a specific implementation, audio processing tools may be used for the offset slicing, including but not limited to FFmpeg (an open-source program for recording, converting, and streaming digital audio and video). Preferably, each audio slice is a mono WAV file with an 8 kHz sampling rate and 16-bit quantization.
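The slicing bookkeeping described above can be modeled in a few lines of Python. This is an illustrative sketch, with the `(offset, start, end)` triple representation an assumption, since the actual cutting would be done by a tool such as FFmpeg:

```python
def offset_slices(total_s, slice_s=10, step_s=1):
    """Enumerate overlapping audio slices of `slice_s` seconds, advancing
    the start by `step_s` each time, as in the worked example (10 s
    slices, 1 s offset). Returns (offset, start, end) triples in
    seconds; only slices fully inside the audio are produced."""
    slices = []
    offset = 0
    while offset + slice_s <= total_s:
        slices.append((offset, offset, offset + slice_s))
        offset += step_s
    return slices

s = offset_slices(120)
print(s[:3])   # first three slices, at offsets 0 s, 1 s, 2 s
print(len(s))  # number of slices covering 120 s of audio
```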
S103, collecting fingerprint information of the at least one audio fragment.
Audio fingerprint information is a compact digital signature, based on the content of a piece of audio, that represents its important acoustic features. Its main advantages are: (1) robustness: even if the audio suffers severe distortion, noise, or pitch variation, the fingerprint can still identify and represent its important acoustic features; (2) distinctiveness: a piece of fingerprint information uniquely identifies a segment of audio, and the fingerprints of different audio differ; (3) reliability: the probability of misidentification when recognizing audio by its fingerprint is low.
S104, comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively.
The preset fingerprint information base stores the fingerprint information of at least one audio file. In a specific implementation, the fingerprint information of each audio fragment may be compared in turn with the fingerprint information of each audio file in the preset fingerprint information base; if the similarity between a fragment's fingerprint and a file's fingerprint reaches or exceeds a preset value (set as needed, e.g. 85% or 90%), the fragment is considered to match that audio file in the preset fingerprint information base.
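A hedged sketch of the similarity test in this step: the patent does not fix a fingerprint format, so the equal-length bit-vector representation and position-agreement similarity below are illustrative assumptions:

```python
def fingerprint_similarity(fp_a, fp_b):
    """Fraction of positions at which two equal-length fingerprints agree."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have equal length")
    agree = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return agree / len(fp_a)

def is_match(fp_a, fp_b, threshold=0.9):
    """A fragment matches a library file when the similarity reaches
    the preset value (e.g. 85% or 90%)."""
    return fingerprint_similarity(fp_a, fp_b) >= threshold

# 9 of 10 positions agree, so this meets the 90% threshold.
print(is_match([1, 0, 1, 1, 0, 1, 0, 1, 1, 1],
               [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]))
```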
And S105, positioning the characteristic position of the target audio file according to the comparison result, wherein the characteristic position is a leader position or a trailer position.
Generally, the leader or trailer of an audio file is repetitive and identical across files. If the fingerprint information of an audio fragment matches the fingerprint information of multiple audio files in the preset fingerprint information base, many identical, repeated fingerprints exist in the base, and the fragment can be considered to belong to a leader or trailer. On this principle, this step determines from the comparison results of step S104 whether each audio fragment belongs to the leader or the trailer, and thereby locates the leader or trailer position of the target audio file.
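The locating principle of step S105 can be sketched as follows. Mapping the first matching slice to the leader's start and the last matching slice to its end follows the first-aspect claims; the function name and slice-triple representation are assumptions:

```python
def locate_leader(matched, slice_s=10):
    """Given the matched slices as (offset, start, end) triples sorted by
    offset, take the first match for the leader's start and the last
    match for its end. Returns (start_s, end_s), or None when no slice
    matched (the file is taken to have no leader)."""
    if not matched:
        return None
    first, last = matched[0], matched[-1]
    return (first[1], last[2])  # start of first match, end of last match

# Slices at offsets 0..20 matched the base: the leader spans 0 s to 30 s.
matched = [(o, o, o + 10) for o in range(21)]
print(locate_leader(matched))  # (0, 30)
```

For the trailer case the same pairing applies in reverse order, since the second audio data is extracted backwards from the end of the file.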
With the audio processing method of this embodiment, target audio data of a preset duration is extracted from a target audio file to be processed and subjected to offset slicing to obtain at least one audio fragment; the fingerprint information of each audio fragment is compared against a preset fingerprint information base, and the leader or trailer position of the target audio file is located from the comparison results. This process automates the location of the leader or trailer position, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.
The embodiment of the invention further provides another audio processing method, which focuses on how to locate the leader position of a target audio file. Referring to fig. 2, the method may include the following steps S201 to S213.
S201, a preset fingerprint information base is established, wherein the preset fingerprint information base comprises at least one album fingerprint information base. Wherein, an album contains at least one audio file, and an album fingerprint information base contains fingerprint information of at least one audio file belonging to the same album.
In this embodiment, the preset fingerprint information base may be represented by the following table one:
table one: preset fingerprint information base
[Table 1 appears as an image in the original; it lists, for each album fingerprint information base, the fingerprint information of the audio files belonging to that album.]
As Table 1 shows, the preset fingerprint information base stores the fingerprint information of at least one audio file. Preferably, in this embodiment the preset fingerprint information base is partitioned by album: the fingerprint information of all audio files belonging to the same album is stored in one album fingerprint information base. Subsequent processing of a target audio file then only needs to be carried out within the album fingerprint information base to which it belongs, which greatly improves audio processing efficiency.
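The album-partitioned structure of the preset fingerprint information base can be sketched as a nested dictionary; the flat record shape and the identifiers below are illustrative assumptions:

```python
def build_fingerprint_base(files):
    """Group per-file fingerprints by album, mirroring Table 1: one album
    fingerprint information base per album, each holding the fingerprints
    of that album's audio files. `files` is an iterable of
    (album_id, file_id, fingerprint) records."""
    base = {}
    for album_id, file_id, fingerprint in files:
        base.setdefault(album_id, {})[file_id] = fingerprint
    return base

base = build_fingerprint_base([
    ("album_A", "song_A1", b"\x01\x02"),
    ("album_A", "song_A2", b"\x03\x04"),
    ("album_B", "song_B1", b"\x05\x06"),
])
print(sorted(base))             # the album fingerprint information bases
print(sorted(base["album_A"]))  # the files inside one album base
```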
S202, first audio data with a first preset duration are sequentially extracted from the starting position of the target audio file to be processed.
The first preset duration may be set from practical experience; for example, since the leader of an audio file is typically 5 s-120 s long, the first preset duration can be set to 5 s-120 s. In this embodiment, assuming the target audio file is Song A1 with a length of 5 minutes and the first preset duration is 120 s, the first audio data of the first 2 minutes (120 s) of Song A1 may be extracted for analysis.
S203, extracting audio fragments with preset fragment duration from the initial position of the first audio data at intervals of preset offset time.
S204, sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment.
Steps S203-S204 of this embodiment are a detailed refinement of step S102 of the embodiment shown in fig. 1. In steps S203-S204, the preset offset time and preset slice duration may be set as needed. Assume the preset offset time is 1 s and the preset slice duration is 10 s. Following the example of step S202, the first audio data of the first 2 minutes is extracted from Song A1, and its start position is the start of Song A1, i.e. time 0 s. Then, in steps S203-S204: cutting at an offset of 0 s yields a first audio slice of duration 10 s, with offset time 0 s relative to the start of the first audio data and start-stop time 0 s-10 s; cutting at an offset of 1 s yields a second audio slice with offset time 1 s and start-stop time 1 s-11 s; cutting at an offset of 2 s yields a third audio slice with offset time 2 s and start-stop time 2 s-12 s; and so on. The resulting audio slices may be represented by Table 2 below:
table two: audio slicing
Name                  Offset time    Start-stop time
First audio slice     0s             0s-10s
Second audio slice    1s             1s-11s
Third audio slice     2s             2s-12s
S205, collecting fingerprint information of the at least one audio fragment. This step can be referred to step S103 in the embodiment shown in fig. 1, which is not described herein.
And S206, inquiring a target album to which the target audio file belongs.
In the internet audio library or the radio and television program library, each album has a corresponding unique ID, each audio file belonging to the same album also has a corresponding unique ID, and the internet audio library or the radio and television program library stores the ID of each album, the ID of the audio file belonging to each album, and the association relationship between the audio file and the album. In step S206, a target album to which the target audio file belongs may be determined from the internet audio library or the radio and television program library according to the ID of the target audio file, and the ID of the target album may be read.
And S207, selecting a target album fingerprint information base from the preset fingerprint information bases.
And S208, reading the fingerprint information of at least one audio file in the target album fingerprint information base.
In steps S207-S208, the target album fingerprint information base is selected from Table 1 of this embodiment according to the read ID of the target album, and the fingerprint information of at least one audio file in it is read. Following the running example, the target audio file is Song A1, which belongs to Album A; the Album A fingerprint information base can therefore be selected from Table 1 as the target album fingerprint information base, and the fingerprint information of each audio file belonging to Album A can be read from it.
S209, sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the fingerprint information base of the target album.
S210, if the number of audio files in the target album fingerprint information base whose fingerprint information matches that of the selected current audio fragment is greater than or equal to a preset number threshold, the selected current audio fragment is determined to be a matching audio fragment.
S211, if that number is smaller than the preset number threshold, the selected current audio fragment is determined to be a non-matching audio fragment, and the comparison of all audio fragments after it against the fingerprint information of the audio files in the target album fingerprint information base is stopped.
Steps S209-S211 describe comparing the at least one audio fragment against the target album fingerprint information base. Specifically: in order of offset time from small to large, the first audio fragment is selected as the current audio fragment by referring to table two, and its fingerprint information is compared with the fingerprint information of each audio file in the album A fingerprint information base. If fingerprint information of a number of audio files greater than or equal to a preset number threshold (the threshold can be set according to actual needs, for example to 3 or 5) matches the fingerprint information of the first audio fragment, the first audio fragment is determined to be a matching audio fragment; the second audio fragment is then selected as the current audio fragment according to table two, and the process repeats. If fewer than the preset number threshold of audio files have fingerprint information matching the first audio fragment, the first audio fragment is determined to be a non-matching audio fragment, no further current audio fragment is selected from table two, and the comparison process for all audio fragments after the first audio fragment stops.
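The loop of steps S209-S211 can be sketched in Python as follows. This is an illustrative version under simplified, assumed types: fingerprints are opaque values and matching is delegated to a caller-supplied `match` predicate, since the patent does not prescribe a concrete fingerprint representation:

```python
def locate_matching_fragments(fragment_fps, album_fps, match, threshold=3):
    """Steps S209-S211 sketch: walk the fragments in order of increasing
    offset time; a fragment is a matching audio fragment if at least
    `threshold` audio files in the target album fingerprint base match it.
    Stop at the first non-matching fragment."""
    matching = []
    for fp in fragment_fps:  # fragment_fps is already ordered by offset time
        hits = sum(1 for album_fp in album_fps if match(fp, album_fp))
        if hits >= threshold:
            matching.append(fp)
        else:
            break  # S211: stop comparing all fragments after this one
    return matching
```

With toy values, `locate_matching_fragments([1, 2, 3], [1, 1, 1, 2], lambda f, a: f == a)` returns `[1]`: fragment `1` matches three album files, fragment `2` matches only one, so the scan stops there and fragment `3` is never compared.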
Steps S206-S211 of the present embodiment may be a detailed refinement of step S104 of the embodiment shown in fig. 1.
S212, the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment are obtained according to the sequence of the offset time from small to large.
S213, determining the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and determining the end position of the leader of the target audio file according to the time attribute of the last matching audio fragment.
Steps S212-S213 of the present embodiment may be a detailed refinement of step S105 of the embodiment shown in fig. 1. In steps S212-S213, assuming that the preset number threshold is 3, the comparison result can be represented as the following table three:
table three: comparison results
(Table three is provided as an image in the original publication; for each audio fragment it lists how many audio files in the target album fingerprint information base have matching fingerprint information.)
As can be seen from table three, the fingerprint information of only 1 audio file in the target album fingerprint information base matches the fingerprint information of the ninth audio fragment; that is, when the ninth audio fragment is the selected current audio fragment, fewer than the preset number threshold of audio files match it, so none of the audio fragments after the ninth undergo the fingerprint comparison process. In order of offset time from small to large, the first audio fragment is the first matching audio fragment and the eighth audio fragment is the last matching audio fragment; that is, the first through eighth audio fragments belong to the leader. Referring to table two above, the offset time of the first audio fragment relative to the start position of the first audio data (i.e., the start position of song a1) is 0 s and its start time is 0 s; the offset time of the eighth audio fragment relative to the start position of the first audio data (i.e., the start position of song a1) is 6 s and its start time is 7 s. It can thus be determined that the start position of the leader of the target audio file, song a1, is the offset time (start time) of the first audio fragment, 0 s, and the end position of the leader is calculated from the offset time and start time of the eighth audio fragment as 6 + 7 = 13 s.
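The boundary arithmetic of steps S212-S213 can be written out directly, taking the time attributes exactly as the example gives them (offset time 6 s and start time 7 s for the eighth audio fragment). A minimal sketch; the dictionary layout of a fragment's time attribute is assumed for illustration:

```python
def leader_bounds(first_frag, last_frag):
    """Steps S212-S213 sketch: the leader's start position is the offset
    time of the first matching fragment; its end position is the offset
    time plus the start time of the last matching fragment."""
    start = first_frag["offset"]
    end = last_frag["offset"] + last_frag["start"]
    return start, end
```

With the example's values this yields a leader running from 0 s to 6 + 7 = 13 s.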
According to the audio processing method provided by this embodiment of the invention, target audio data of a preset duration can be extracted from a target audio file to be processed, and offset slicing processing can be performed on the target audio data to obtain at least one audio fragment; the fingerprint information of the at least one audio fragment is compared against a preset fingerprint information base, and the leader position or trailer position of the target audio file is located according to the comparison result. This process automatically locates the leader or trailer position of the target audio file, saving labor cost and effectively improving the efficiency and accuracy of audio processing.
An embodiment of the invention also provides another audio processing method; the method of this embodiment focuses on describing how to locate the trailer of the target audio file. Referring to fig. 3, the method may include the following steps S301 to S313.
S301, a preset fingerprint information base is established, wherein the preset fingerprint information base comprises at least one album fingerprint information base. Wherein, an album contains at least one audio file, and an album fingerprint information base contains fingerprint information of at least one audio file belonging to the same album.
Step S301 of this embodiment can refer to step S201 of the embodiment shown in fig. 2, which is not described herein again.
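One possible in-memory shape for the preset fingerprint information base of step S301 is a two-level mapping: album ID to album fingerprint base, and audio-file ID to fingerprint within each base. This is only an illustrative layout; the album IDs, file IDs, and fingerprint bytes below are invented:

```python
# Hypothetical preset fingerprint information base: one fingerprint
# sub-base per album, keyed by album ID (step S301).
preset_fp_base = {
    "A": {"a1": b"fp-a1", "a2": b"fp-a2"},  # album A fingerprint base
    "B": {"b1": b"fp-b1"},                  # album B fingerprint base
}

def select_album_base(album_id):
    """Steps S307-S308 sketch: select the target album fingerprint base
    and read the fingerprint information of each audio file it contains."""
    return list(preset_fp_base[album_id].values())
```

`select_album_base("A")` then returns the fingerprint of every audio file belonging to album A, as steps S307-S308 require.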
S302, extracting second audio data with a second preset duration from the end position of the target audio file to be processed in a reverse order.
The second preset duration may be set according to practical experience. For example, the trailer of an audio file is generally 5 s-60 s long, so the second preset duration may be set to 5 s-60 s. In this embodiment, assuming that the target audio file is song a1 with a length of 5 minutes and the second preset duration is 60 s, the last 1 minute (60 s) of song a1 may be extracted as the second audio data for analysis.
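Extracting the second audio data "in reverse order from the end position" can be sketched on raw samples: take the last 60 s and reverse them, so that offset 0 of the extracted data coincides with the end position of the file. A simplified sketch assuming the audio is available as a flat list of PCM samples at a known sample rate:

```python
def extract_tail(samples, sample_rate, seconds):
    """Step S302 sketch: take the last `seconds` of audio and reverse it,
    so that offset 0 of the second audio data is the end of the file."""
    tail = samples[-sample_rate * seconds:]
    return tail[::-1]
```

With a 10-sample file at 1 sample/s, `extract_tail(samples, 1, 3)` returns the last three samples in reverse order, i.e. the file's final sample first.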
S303, extracting an audio slice with a preset slice duration from the start position of the second audio data every preset offset time.
S304, sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment.
Steps S303-S304 in this embodiment may refer to steps S203-S204 of the embodiment shown in fig. 2 and are not repeated here. Note, however, that because the second audio data is extracted in reverse order from the end position of the target audio file (song a1), the start position of the second audio data is the end position of song a1, i.e., the 5-minute mark. Then, in steps S303-S304: at an offset of 0 s, a first audio fragment with a slice duration of 10 s is extracted; its offset time relative to the start position of the second audio data (i.e., the end position of song a1) is 0 s, and its start-stop time is 0 s-10 s. At an offset of 1 s, a second audio fragment with a slice duration of 10 s is extracted; its offset time relative to the start position of the second audio data (i.e., the end position of song a1) is 1 s, and its start-stop time is 1 s-11 s. At an offset of 2 s, a third audio fragment with a slice duration of 10 s is extracted; its offset time relative to the start position of the second audio data (i.e., the end position of song a1) is 2 s, and its start-stop time is 2 s-12 s; and so on.
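The offset slicing of steps S303-S304 (a 10 s fragment every 1 s, with its time attributes recorded) can be sketched as follows; the 10 s slice duration and 1 s offset step mirror the embodiment's example, and the per-fragment record layout is an assumption for illustration:

```python
def offset_slices(samples, sample_rate, slice_s=10, step_s=1):
    """Steps S303-S304 sketch: every `step_s` seconds, cut a `slice_s`-second
    fragment from the start of the target audio data, storing each fragment
    together with its time attributes (offset time and start-stop time)."""
    frags = []
    offset = 0
    while (offset + slice_s) * sample_rate <= len(samples):
        frags.append({
            "offset": offset,                    # relative to the data start
            "range": (offset, offset + slice_s), # start-stop time in seconds
            "data": samples[offset * sample_rate:(offset + slice_s) * sample_rate],
        })
        offset += step_s
    return frags
```

For 12 s of audio this produces fragments with start-stop times 0 s-10 s, 1 s-11 s, and 2 s-12 s, matching the pattern described in the text.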
S305, collecting fingerprint information of the at least one audio fragment.
S306, inquiring a target album to which the target audio file belongs.
S307, selecting a target album fingerprint information base from the preset fingerprint information base.
S308, reading the fingerprint information of at least one audio file in the target album fingerprint information base.
S309, sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of the at least one audio file in the fingerprint information base of the target album.
S310, if fingerprint information of a number of audio files greater than or equal to the preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a matching audio fragment.
S311, if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a non-matching audio fragment, and stopping comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
S312, the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment are obtained according to the sequence of the offset time from small to large.
S313, determining the ending position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and determining the starting position of the trailer of the target audio file according to the time attribute of the last matching audio fragment.
Steps S305-S313 of this embodiment can refer to steps S205-S213 of the embodiment shown in fig. 2 and are not repeated here. Note that in this embodiment, in order of offset time from small to large, the first audio fragment is the first matching audio fragment and the eighth audio fragment is the last matching audio fragment; that is, the first through eighth audio fragments all belong to the trailer. Referring to table two above, the offset time of the first audio fragment relative to the start position of the second audio data (i.e., the end position of song a1) is 0 s and its start time is 0 s; the offset time of the eighth audio fragment relative to the start position of the second audio data (i.e., the end position of song a1) is 6 s and its start time is 7 s. It can thus be determined that the ending position of the trailer of the target audio file, song a1, corresponds to an offset of 0 s from the start position of the second audio data, i.e., the 5-minute mark (the end position of song a1); and the starting position of the trailer is calculated from the offset time and start time of the eighth audio fragment as 6 + 7 = 13 s before the end, i.e., 4 minutes 47 seconds into song a1.
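Because the second audio data runs backwards from the file end, the trailer arithmetic of step S313 maps an offset of t seconds to (song length - t) seconds in the original timeline. A sketch using the example's numbers (a 5-minute song, with offset time 6 s and start time 7 s for the eighth fragment); the fragment record layout is assumed:

```python
def trailer_bounds(song_len_s, first_frag, last_frag):
    """Step S313 sketch: the trailer's ending position is the song end
    minus the first matching fragment's offset; its starting position is
    the song end minus (offset time + start time) of the last matching
    fragment."""
    end = song_len_s - first_frag["offset"]
    start = song_len_s - (last_frag["offset"] + last_frag["start"])
    return start, end
```

For the example this gives a trailer from 287 s (4 min 47 s) to 300 s, the end of song a1.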
According to the audio processing method provided by this embodiment of the invention, target audio data of a preset duration can be extracted from a target audio file to be processed, and offset slicing processing can be performed on the target audio data to obtain at least one audio fragment; the fingerprint information of the at least one audio fragment is compared against a preset fingerprint information base, and the leader position or trailer position of the target audio file is located according to the comparison result. This process automatically locates the leader or trailer position of the target audio file, saving labor cost and effectively improving the efficiency and accuracy of audio processing.
Based on the above description of the method embodiments, an audio processing apparatus according to an embodiment of the present invention is described in detail below with reference to fig. 4. It should be noted that the audio processing apparatus described below can be used to execute the audio processing methods shown in fig. 1 to 3. Specifically, an embodiment of the present invention provides an audio processing apparatus; referring to fig. 4, the apparatus includes the following units:
the extracting unit 101 is configured to extract target audio data of a preset duration from a target audio file to be processed.
A processing unit 102, configured to perform offset slicing on the target audio data to obtain at least one audio slice.
An acquiring unit 103, configured to acquire fingerprint information of the at least one audio slice.
A comparing unit 104, configured to compare the fingerprint information of the at least one audio fragment with a preset fingerprint information base, respectively.
And the positioning unit 105 is configured to position a feature position of the target audio file according to the comparison result, where the feature position is a leader position or a trailer position.
In a specific implementation, the apparatus further includes:
the creating unit 106 is configured to create a preset fingerprint information base, where the preset fingerprint information base includes at least one album fingerprint information base, and one album fingerprint information base includes fingerprint information of at least one audio file belonging to the same album.
In a specific implementation, the extracting unit 101 is specifically configured to sequentially extract first audio data of a first preset duration from the start position of the target audio file to be processed; or to extract, in reverse order, second audio data of a second preset duration from the end position of the target audio file to be processed.
In a specific implementation, the processing unit 102 includes the following units:
an audio slice extracting unit 1001 configured to extract an audio slice with a preset slice duration from the start position of the target audio data every preset offset time.
The storage unit 1002 is configured to sequentially store the obtained at least one audio fragment, and record a time attribute of the at least one audio fragment. Wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
In a specific implementation, the comparing unit 104 includes the following units:
a target album querying unit 2001, configured to query a target album to which the target audio file belongs.
A library selecting unit 2002 for selecting a target album fingerprint information library from the preset fingerprint information libraries.
A fingerprint information reading unit 2003, configured to read fingerprint information of at least one audio file in the target album fingerprint information base.
A current selecting unit 2004, configured to sequentially select a current audio slice from the at least one audio slice according to the order of the offset time from small to large.
The current comparing unit 2005 is configured to compare the fingerprint information of the selected current audio fragment with fingerprint information of at least one audio file in the target album fingerprint information base.
A result determining unit 2006, configured to determine that the selected current audio fragment is a matching audio fragment if fingerprint information of a number of audio files greater than or equal to the preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment; or to determine that the selected current audio fragment is a non-matching audio fragment if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, and to stop comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
In a specific implementation, the positioning unit 105 includes the following units:
the time attribute obtaining unit 3001 is configured to obtain a time attribute of the first matching audio slice and a time attribute of the last matching audio slice according to the sequence of the offset time from small to large.
A leader position determining unit 3002, configured to determine, if the target audio data is the first audio data, the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and determine the end position of the leader of the target audio file according to the time attribute of the last matching audio fragment.
A trailer position determining unit 3003, configured to determine, if the target audio data is the second audio data, the ending position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and determine the starting position of the trailer of the target audio file according to the time attribute of the last matching audio fragment.
Since the audio processing apparatus shown in fig. 4 can be used to execute the method of the embodiment shown in fig. 1-3, the functions of the units shown in fig. 4 can be referred to the related descriptions of the steps of the method shown in fig. 1-3, and are not described herein again. It should be noted that the audio processing apparatus shown in fig. 4 may be an application program running in a physical device, and there are at least two possible implementations:
In one possible embodiment, the audio processing apparatus may run in a single physical device and work independently. For example, the apparatus may run in a terminal, which may include but is not limited to: a PC (personal computer), a mobile phone, a tablet computer, a smart wearable device, and the like; the terminal then independently implements the method flows shown in fig. 1 to 3. Alternatively, the audio processing apparatus may run in a server, and the server independently implements the method flows shown in fig. 1 to 3.
In another possible embodiment, the audio processing apparatus may be distributed across a plurality of physical devices that work in coordination. For example, one part of the audio processing apparatus may run in a terminal and the other part in a server, and the terminal and the server cooperate to implement the method flows shown in fig. 1 to 3. In this embodiment, the creating unit 106 and the comparing unit 104 shown in fig. 4 may be located in the server, while the extracting unit 101, the processing unit 102, the acquiring unit 103, and the positioning unit 105 may be located in the terminal. When the methods shown in fig. 1 to 3 are executed, creating the preset fingerprint information base and the comparison process may take place in the server, while the other processes, including extracting the target audio data, obtaining the at least one audio fragment, collecting the fingerprint information of the at least one audio fragment, and locating the feature position, may take place in the terminal. Specifically, the terminal sends the fingerprint information of the audio fragments to the server for comparison, the server returns the comparison result, and the terminal locates the feature position according to the comparison result.
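The terminal-server split described above can be sketched as two cooperating functions: the server side holds the album fingerprint base and applies the preset number threshold, and the terminal side counts the leading matches in the returned comparison result. Fingerprint equality stands in for real fingerprint matching here, and all names are illustrative:

```python
def server_compare(fragment_fps, album_fps, threshold=3):
    """Server side sketch: for each fragment fingerprint, report whether
    at least `threshold` audio files in the album base match it."""
    return [sum(1 for a in album_fps if fp == a) >= threshold
            for fp in fragment_fps]

def terminal_locate(fragment_fps, compare):
    """Terminal side sketch: send the fragment fingerprints via `compare`,
    then count the leading matching fragments in the returned result
    (the fragments that belong to the leader or trailer)."""
    n = 0
    for ok in compare(fragment_fps):
        if not ok:
            break
        n += 1
    return n
```

In a real deployment `compare` would be a network call; here a lambda wiring `server_compare` to a toy album base plays that role.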
In the embodiment of the audio processing apparatus, target audio data of a preset duration can be extracted from a target audio file to be processed, and offset slicing processing can be performed on the target audio data to obtain at least one audio fragment; the fingerprint information of the at least one audio fragment is compared against a preset fingerprint information base, and the leader position or trailer position of the target audio file is located according to the comparison result. This process automatically locates the leader or trailer position of the target audio file, saving labor cost and effectively improving the efficiency and accuracy of audio processing.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and certainly cannot be used to limit the scope of the claims; equivalent variations made according to the claims of the present invention still fall within the scope of the invention.

Claims (13)

1. An audio processing method, comprising:
extracting target audio data with preset duration from a target audio file to be processed;
performing offset slicing processing on the target audio data to obtain at least one audio fragment; the audio data contained in two adjacent audio fragments in the at least one audio fragment are overlapped;
collecting fingerprint information of the at least one audio fragment, and comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
and positioning a feature position of the target audio file according to a comparison result, wherein the feature position is a leader position or a trailer position, the comparison result is used for determining whether the at least one audio fragment belongs to the leader or the trailer, the at least one audio fragment comprises a matching audio fragment, the matching audio fragment belongs to the leader or the trailer, and the fingerprint information of the matching audio fragment matches fingerprint information of a number of audio files greater than or equal to a preset number threshold in the preset fingerprint information base.
2. The method as claimed in claim 1, wherein before extracting the target audio data of the preset duration from the target audio file to be processed, the method further comprises:
and creating a preset fingerprint information base, wherein the preset fingerprint information base comprises at least one album fingerprint information base, and one album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
3. The method as claimed in claim 2, wherein the extracting the target audio data of the preset duration from the target audio file to be processed comprises:
sequentially extracting first audio data of a first preset duration from the starting position of a target audio file to be processed; or,
and extracting second audio data with a second preset duration from the end position of the target audio file to be processed in a reverse order.
4. The method of claim 2 or 3, wherein the offset slicing the target audio data to obtain at least one audio slice comprises:
extracting audio fragments with preset fragment duration from the initial position of the target audio data at intervals of preset offset time;
sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
5. The method of claim 4, wherein comparing the fingerprint information of the at least one audio slice with a preset fingerprint information base respectively comprises:
inquiring a target album to which the target audio file belongs;
selecting a target album fingerprint information base from the preset fingerprint information base, and reading fingerprint information of at least one audio file in the target album fingerprint information base;
sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the fingerprint information base of the target album;
if fingerprint information of a number of audio files greater than or equal to the preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a matching audio fragment;
if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a non-matching audio fragment, and stopping comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
6. The method of claim 5, wherein the locating the feature location of the target audio file according to the comparison result comprises:
acquiring the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment according to the sequence of the offset time from small to large;
if the target audio data is the first audio data, determining the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and determining the end position of the leader of the target audio file according to the time attribute of the last matching audio fragment;
and if the target audio data is the second audio data, determining the ending position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and determining the starting position of the trailer of the target audio file according to the time attribute of the last matching audio fragment.
7. An audio processing apparatus, comprising:
the extraction unit is used for extracting target audio data with preset duration from a target audio file to be processed;
the processing unit is used for carrying out offset slicing processing on the target audio data to obtain at least one audio fragment; the audio data contained in two adjacent audio fragments in the at least one audio fragment are overlapped;
the acquisition unit is used for acquiring fingerprint information of the at least one audio fragment;
the comparison unit is used for comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
the positioning unit is used for positioning a feature position of the target audio file according to a comparison result, wherein the feature position is a leader position or a trailer position, the comparison result is used for determining whether the at least one audio fragment belongs to the leader or the trailer, the at least one audio fragment comprises a matching audio fragment, the matching audio fragment belongs to the leader or the trailer, and the fingerprint information of the matching audio fragment matches fingerprint information of a number of audio files greater than or equal to a preset number threshold in the preset fingerprint information base.
8. The apparatus of claim 7, further comprising:
the creating unit is used for creating a preset fingerprint information base, wherein the preset fingerprint information base comprises at least one album fingerprint information base, and one album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
9. The apparatus according to claim 8, wherein the extracting unit is specifically configured to sequentially extract the first audio data of a first preset duration from a start position of the target audio file to be processed; or, the method is used for extracting the second audio data with the second preset duration from the end position of the target audio file to be processed in reverse order.
10. The apparatus of claim 8 or 9, wherein the processing unit comprises:
an audio slice extracting unit, configured to extract an audio slice with a preset slice duration from the start position of the target audio data every preset offset time;
the storage unit is used for sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
11. The apparatus of claim 10, wherein the alignment unit comprises:
a target album querying unit configured to query a target album to which the target audio file belongs;
the library selection unit is used for selecting a target album fingerprint information library from the preset fingerprint information libraries;
the fingerprint information reading unit is used for reading the fingerprint information of at least one audio file in the target album fingerprint information base;
the current selection unit is used for sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large;
the current comparison unit is used for comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base;
a result determining unit, configured to determine that the selected current audio fragment is a matching audio fragment if fingerprint information of a number of audio files greater than or equal to a preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment; or to determine that the selected current audio fragment is a non-matching audio fragment if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, and to stop comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
12. The apparatus of claim 11, wherein the positioning unit comprises:
a time attribute acquiring unit, configured to acquire, in ascending order of offset time, the time attribute of the first matching audio slice and the time attribute of the last matching audio slice;
a head position determining unit, configured to: if the target audio data is first audio data, determine a head start position of the target audio file according to the time attribute of the first matching audio slice, and determine a head end position of the target audio file according to the time attribute of the last matching audio slice;
and a tail position determining unit, configured to: if the target audio data is second audio data, determine a tail start position of the target audio file according to the time attribute of the first matching audio slice, and determine a tail end position of the target audio file according to the time attribute of the last matching audio slice.
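The positioning of claim 12 can be sketched as below. Here a time attribute is taken, as an assumption, to be an `(offset, duration)` pair in seconds; the function name and signature are hypothetical:

```python
def locate_segment(is_head, first_attr, last_attr):
    """Derive a segment's boundaries from the first and last matching
    audio slices (claim 12).

    is_head    -- True for the first audio data (head of the target file),
                  False for the second audio data (tail)
    first_attr -- (offset, duration) of the first matching slice
    last_attr  -- (offset, duration) of the last matching slice,
                  in ascending offset order
    """
    start = first_attr[0]               # start of the first matching slice
    end = last_attr[0] + last_attr[1]   # end of the last matching slice
    label = "head" if is_head else "tail"
    return label, start, end
```

Under this reading, the head and tail cases differ only in which part of the target file the matched span is taken to delimit, not in the arithmetic itself.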
13. A computer-readable storage medium, in which program instructions are stored, the program instructions being for executing the audio processing method according to any one of claims 1 to 6.
CN201610288300.3A 2016-04-29 2016-04-29 Audio processing method and device Active CN105975568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610288300.3A CN105975568B (en) 2016-04-29 2016-04-29 Audio processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610288300.3A CN105975568B (en) 2016-04-29 2016-04-29 Audio processing method and device

Publications (2)

Publication Number Publication Date
CN105975568A CN105975568A (en) 2016-09-28
CN105975568B true CN105975568B (en) 2020-04-03

Family

ID=56993679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610288300.3A Active CN105975568B (en) 2016-04-29 2016-04-29 Audio processing method and device

Country Status (1)

Country Link
CN (1) CN105975568B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205550B (en) * 2016-12-16 2021-03-12 北京酷我科技有限公司 Audio fingerprint generation method and device
CN108198573B (en) * 2017-12-29 2021-04-30 北京奇艺世纪科技有限公司 Audio recognition method and device, storage medium and electronic equipment
CN108305622B (en) * 2018-01-04 2021-06-11 海尔优家智能科技(北京)有限公司 Voice recognition-based audio abstract text creating method and device
CN108630208B (en) * 2018-05-14 2020-10-27 平安科技(深圳)有限公司 Server, voiceprint-based identity authentication method and storage medium
CN112632321A (en) * 2019-09-23 2021-04-09 北京国双科技有限公司 Audio file processing method and device and audio file playing method and device
CN110650366B (en) * 2019-10-29 2021-09-24 成都超有爱科技有限公司 Interactive dubbing method and device, electronic equipment and readable storage medium
CN110990632B (en) * 2019-12-19 2023-05-02 腾讯科技(深圳)有限公司 Video processing method and device
CN113347489B (en) * 2021-07-09 2022-11-18 北京百度网讯科技有限公司 Video clip detection method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314875A (en) * 2011-08-01 2012-01-11 北京百度网讯科技有限公司 Audio file identification method and device
CN103021440A (en) * 2012-11-22 2013-04-03 腾讯科技(深圳)有限公司 Method and system for tracking audio streaming media
CN104462537A (en) * 2014-12-24 2015-03-25 北京奇艺世纪科技有限公司 Method and device for classifying voice data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050154987A1 (en) * 2004-01-14 2005-07-14 Isao Otsuka System and method for recording and reproducing multimedia

Also Published As

Publication number Publication date
CN105975568A (en) 2016-09-28

Similar Documents

Publication Publication Date Title
CN105975568B (en) Audio processing method and device
CN105825850B (en) Audio processing method and device
CN1998168B (en) Method and apparatus for identification of broadcast source
KR101578279B1 (en) Methods and systems for identifying content in a data stream
JP4658598B2 (en) System and method for providing user control over repetitive objects embedded in a stream
US9451048B2 (en) Methods and systems for identifying information of a broadcast station and information of broadcasted content
JP5362178B2 (en) Extracting and matching characteristic fingerprints from audio signals
US20140214190A1 (en) Method and System for Content Sampling and Identification
Herley ARGOS: Automatically extracting repeating objects from multimedia streams
EP2602630A2 (en) Method of characterizing the overlap of two media segments
US20080154401A1 (en) Method and System For Content Sampling and Identification
US9773058B2 (en) Methods and systems for arranging and searching a database of media content recordings
CA2905385C (en) Methods and systems for arranging and searching a database of media content recordings
WO2016189307A1 (en) Audio identification method
CN110795597A (en) Video keyword determination method, video retrieval method, video keyword determination device, video retrieval device, storage medium and terminal
George et al. Scalable and robust audio fingerprinting method tolerable to time-stretching
CN109271501A (en) A kind of management method and system of audio database
CN103294696A (en) Audio and video content retrieval method and system
CN113420178A (en) Data processing method and equipment
CN108198573B (en) Audio recognition method and device, storage medium and electronic equipment
US7985915B2 (en) Musical piece matching judging device, musical piece recording device, musical piece matching judging method, musical piece recording method, musical piece matching judging program, and musical piece recording program
TWI516098B (en) Record the signal detection method of the media
CN107844578B (en) Method and device for identifying repeated segments in audio stream
WO2010038187A1 (en) Method for data clusters indexing, recognition and retrieval in presence of noise
CN108268572B (en) Song synchronization method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant