CN105975568B - Audio processing method and device - Google Patents


Publication number
CN105975568B
Authority
CN
China
Prior art keywords
audio
fingerprint information
target
fragment
preset
Prior art date
Legal status
Active
Application number
CN201610288300.3A
Other languages
Chinese (zh)
Other versions
CN105975568A (en)
Inventor
孙嘉骏
王志豪
赵伟峰
杨雍
车斌
周旋
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201610288300.3A
Publication of CN105975568A
Application granted
Publication of CN105975568B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 - Retrieval characterised by using metadata automatically derived from the content
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides an audio processing method and device, the method comprising the following steps: extracting target audio data of a preset duration from a target audio file to be processed; performing offset slicing on the target audio data to obtain at least one audio fragment; collecting fingerprint information of the at least one audio fragment and comparing it, fragment by fragment, against a preset fingerprint information base; and locating a characteristic position of the target audio file according to the comparison results, where the characteristic position is a leader position or a trailer position. The invention enables automatic location of characteristic positions of an audio file, such as the leader or trailer, and improves the efficiency and accuracy of audio processing.

Description

Audio processing method and device
Technical Field
The present invention relates to the field of internet technology, in particular to audio technology, and specifically to an audio processing method and device.
Background
Audio files may include, but are not limited to: songs, song segments, and voice programs in an internet audio library; songs, song clips, voice programs, and the like played on radio or television. The leader of an audio file is the audio data at its head end that serves as an opening; the trailer is the audio data at its tail end that serves as a summary or ending. Some audio files have a leader and trailer, while others do not. In the prior art, whether an audio file has a leader or trailer is judged manually, and the leader or trailer position is generally marked by manual dotting. As the number of audio files grows, manual operation can no longer meet the increasing efficiency and accuracy requirements of audio processing.
Disclosure of Invention
The embodiment of the invention provides an audio processing method and device that can automatically locate characteristic positions of an audio file, such as the leader or trailer, and improve the efficiency and accuracy of audio processing.
A first aspect of an embodiment of the present invention provides an audio processing method, which may include:
extracting target audio data with preset duration from a target audio file to be processed;
performing offset slicing processing on the target audio data to obtain at least one audio fragment;
collecting fingerprint information of the at least one audio fragment, and comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
and positioning the characteristic position of the target audio file according to the comparison result, wherein the characteristic position is a leader position or a trailer position.
Preferably, before extracting the target audio data of the preset duration from the target audio file to be processed, the method further includes:
and creating a preset fingerprint information base, wherein the preset fingerprint information base comprises at least one album fingerprint information base, and one album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
Preferably, the extracting target audio data with a preset duration from the target audio file to be processed includes:
sequentially extracting first audio data of a first preset duration from the starting position of a target audio file to be processed; or,
extracting second audio data of a second preset duration in reverse order from the end position of the target audio file to be processed.
Preferably, the performing offset slicing processing on the target audio data to obtain at least one audio slice includes:
extracting audio fragments with preset fragment duration from the initial position of the target audio data at intervals of preset offset time;
sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
Preferably, the comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively includes:
inquiring a target album to which the target audio file belongs;
selecting a target album fingerprint information base from the preset fingerprint information base, and reading fingerprint information of at least one audio file in the target album fingerprint information base;
sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the fingerprint information base of the target album;
if the number of audio files in the target album fingerprint information base whose fingerprint information matches that of the selected current audio fragment is greater than or equal to a preset number threshold, determining the selected current audio fragment to be a matching audio fragment;
if that number is smaller than the preset number threshold, determining the selected current audio fragment to be a non-matching audio fragment, and stopping the comparison of all audio fragments after it against the fingerprint information of the audio files in the target album fingerprint information base.
Preferably, the positioning the feature position of the target audio file according to the comparison result includes:
acquiring the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment according to the sequence of the offset time from small to large;
if the target audio data is the first audio data, determining the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and the end position of the leader according to the time attribute of the last matching audio fragment;
and if the target audio data is the second audio data, determining the end position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and the start position of the trailer according to the time attribute of the last matching audio fragment.
A second aspect of the embodiments of the present invention provides an audio processing apparatus, which may include:
the extraction unit is used for extracting target audio data with preset duration from a target audio file to be processed;
the processing unit is used for carrying out offset slicing processing on the target audio data to obtain at least one audio fragment;
the acquisition unit is used for acquiring fingerprint information of the at least one audio fragment;
the comparison unit is used for comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
and the positioning unit is used for positioning the characteristic position of the target audio file according to the comparison result, wherein the characteristic position is a leader position or a trailer position.
Preferably, the apparatus further comprises:
a creating unit, configured to create a preset fingerprint information base, where the preset fingerprint information base comprises at least one album fingerprint information base, and an album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
Preferably, the extracting unit is specifically configured to sequentially extract first audio data of a first preset duration from a start position of a target audio file to be processed; or, the method is used for extracting the second audio data with the second preset duration from the end position of the target audio file to be processed in reverse order.
Preferably, the processing unit includes:
an audio slice extracting unit, configured to extract an audio slice with a preset slice duration from the start position of the target audio data every preset offset time;
the storage unit is used for sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
Preferably, the alignment unit comprises:
a target album querying unit configured to query a target album to which the target audio file belongs;
the library selection unit is used for selecting a target album fingerprint information library from the preset fingerprint information libraries;
the fingerprint information reading unit is used for reading the fingerprint information of at least one audio file in the target album fingerprint information base;
the current selection unit is used for sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large;
the current comparison unit is used for comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base;
a result determining unit, configured to determine that the selected current audio fragment is a matching audio fragment if the number of audio files in the target album fingerprint information base whose fingerprint information matches that of the selected current audio fragment is greater than or equal to a preset number threshold; or to determine that the selected current audio fragment is a non-matching audio fragment if that number is smaller than the preset number threshold, and to stop comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of the audio files in the target album fingerprint information base.
Preferably, the positioning unit includes:
the time attribute acquisition unit is used for acquiring the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment according to the sequence of the offset time from small to large;
a leader position determining unit, configured to determine, if the target audio data is the first audio data, the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and the end position of the leader according to the time attribute of the last matching audio fragment;
and a trailer position determining unit, configured to determine, if the target audio data is the second audio data, the end position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and the start position of the trailer according to the time attribute of the last matching audio fragment.
In the embodiment of the invention, target audio data of a preset duration is extracted from a target audio file to be processed and subjected to offset slicing to obtain at least one audio fragment; the fingerprint information of each audio fragment is compared against a preset fingerprint information base, and the leader or trailer position of the target audio file is located from the comparison results. This process automates the location of the leader or trailer position of an audio file, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in their description are briefly introduced below. The drawings described here are obviously only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another audio processing method according to an embodiment of the present invention;
fig. 3 is a flowchart of another audio processing method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the invention.
In the embodiments of the present invention, audio files may include, but are not limited to: songs, song segments, and voice programs in an internet audio library; songs, song clips, voice programs, and the like played on radio or television. To process audio more accurately, the audio files in the following embodiments preferably refer to files in a raw audio format, i.e. mono WAV (a sound file format) with an 8 kHz sampling rate and 16-bit quantization. If the audio file to be processed is in another format, for example MP3 (MPEG Audio Layer III), WMA (Windows Media Audio), or APE (a lossless digital audio compression format), it must first undergo format conversion.
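The format-conversion step above can be sketched with FFmpeg. The patent does not give a concrete invocation, so the helper below is an illustrative assumption; the flags are standard FFmpeg audio options for producing the mono, 8 kHz, 16-bit WAV described above:

```python
import shlex

def ffmpeg_convert_cmd(src_path: str, dst_path: str) -> list[str]:
    """Build an ffmpeg command that converts an audio file (e.g. MP3, WMA,
    APE) to the raw format the method expects: mono WAV, 8 kHz sample
    rate, 16-bit signed PCM samples."""
    return [
        "ffmpeg",
        "-i", src_path,           # input file in any supported format
        "-ar", "8000",            # resample to 8 kHz
        "-ac", "1",               # downmix to mono
        "-acodec", "pcm_s16le",   # 16-bit signed PCM (standard WAV encoding)
        dst_path,
    ]

print(shlex.join(ffmpeg_convert_cmd("song.mp3", "song.wav")))
```

Running the returned command with `subprocess.run(..., check=True)` performs the actual conversion, provided FFmpeg is installed.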
In the prior art, the leader or trailer position of an audio file is generally marked by manual dotting, which, as the number of audio files grows, can no longer meet the efficiency and accuracy requirements. The embodiment of the present invention therefore extracts target audio data of a preset duration from a target audio file to be processed and performs offset slicing on it to obtain at least one audio fragment; the fingerprint information of each audio fragment is compared against a preset fingerprint information base, and the leader or trailer position of the target audio file is located from the comparison results. This process automates the location of the leader or trailer position, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.
Based on the above description, an embodiment of the present invention provides an audio processing method, please refer to fig. 1, which may include the following steps S101 to S105.
S101, extracting target audio data with preset duration from a target audio file to be processed.
Generally, the leader or trailer of an audio file is not very long. Based on this characteristic, target audio data of a preset duration can be extracted from the target audio file for subsequent leader or trailer analysis. The preset duration may be set from practical experience. For example, the leader of an audio file is typically 5 s-120 s long and the trailer 5 s-60 s long; so to locate the leader position of the target audio file, the target audio data of the first 2 minutes (120 s) can be extracted for analysis, and to locate the trailer position, the target audio data of the last 1 minute (60 s) can be extracted.
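As a minimal sketch of step S101 (not part of the original disclosure; the function name and seconds-based interface are assumptions), the head or tail extraction window can be computed like this:

```python
def extraction_window(total_s, preset_s, from_end=False):
    """Return the (start, end) second range of the target audio data.

    from_end=False: the first `preset_s` seconds (for locating the leader).
    from_end=True:  the last  `preset_s` seconds (for locating the trailer).
    The window is clipped to the file length."""
    span = min(preset_s, total_s)
    if from_end:
        return (total_s - span, total_s)
    return (0.0, span)

# A 5-minute song: first 120 s for leader analysis,
# last 60 s for trailer analysis.
print(extraction_window(300, 120))                # (0.0, 120)
print(extraction_window(300, 60, from_end=True))  # (240, 300)
```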
S102, carrying out offset slicing processing on the target audio data to obtain at least one audio slice.
Offset slicing means cutting audio slices at a fixed offset interval. For example, with an offset time of 1 s and a slice duration of 10 s: cutting at an offset of 0 s from the start of the target audio data yields a first audio slice of duration 10 s, with offset time 0 s and start-stop time 0 s-10 s; cutting at an offset of 1 s yields a second audio slice with offset time 1 s and start-stop time 1 s-11 s; cutting at an offset of 2 s yields a third audio slice with offset time 2 s and start-stop time 2 s-12 s; and so on. All audio slices obtained this way therefore have the same duration and overlapping audio data, but different start-stop and offset times. In a specific implementation, audio processing tools may be used for the offset slicing, including but not limited to FFmpeg (an open-source program for recording, converting, and streaming digital audio and video). Preferably, each audio slice is a mono WAV file with an 8 kHz sampling rate and 16-bit quantization.
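The slicing bookkeeping described above can be modeled in a few lines of Python. This is an illustrative sketch, with the `(offset, start, end)` triple representation an assumption, since the actual cutting would be done by a tool such as FFmpeg:

```python
def offset_slices(total_s, slice_s=10, step_s=1):
    """Enumerate overlapping audio slices of `slice_s` seconds, advancing
    the start by `step_s` each time, as in the worked example (10 s
    slices, 1 s offset). Returns (offset, start, end) triples in
    seconds; only slices fully inside the audio are produced."""
    slices = []
    offset = 0
    while offset + slice_s <= total_s:
        slices.append((offset, offset, offset + slice_s))
        offset += step_s
    return slices

s = offset_slices(120)
print(s[:3])   # first three slices, at offsets 0 s, 1 s, 2 s
print(len(s))  # number of slices covering 120 s of audio
```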
S103, collecting fingerprint information of the at least one audio fragment.
Audio fingerprint information is a compact digital signature, based on the content of a piece of audio, that represents its important acoustic features. Its main advantages are: (1) robustness: even if the audio suffers severe distortion, noise, or pitch variation, the fingerprint can still identify and represent its important acoustic features; (2) distinctiveness: a piece of fingerprint information uniquely identifies a segment of audio, and the fingerprints of different audio differ; (3) reliability: the probability of misidentification when recognizing audio by its fingerprint is low.
S104, comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively.
The preset fingerprint information base stores the fingerprint information of at least one audio file. In a specific implementation, the fingerprint information of each audio fragment may be compared in turn with the fingerprint information of each audio file in the preset fingerprint information base; if the similarity between a fragment's fingerprint and a file's fingerprint reaches or exceeds a preset value (set as needed, e.g. 85% or 90%), the fragment is considered to match that audio file in the preset fingerprint information base.
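A hedged sketch of the similarity test in this step: the patent does not fix a fingerprint format, so the equal-length bit-vector representation and position-agreement similarity below are illustrative assumptions:

```python
def fingerprint_similarity(fp_a, fp_b):
    """Fraction of positions at which two equal-length fingerprints agree."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have equal length")
    agree = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return agree / len(fp_a)

def is_match(fp_a, fp_b, threshold=0.9):
    """A fragment matches a library file when the similarity reaches
    the preset value (e.g. 85% or 90%)."""
    return fingerprint_similarity(fp_a, fp_b) >= threshold

# 9 of 10 positions agree, so this meets the 90% threshold.
print(is_match([1, 0, 1, 1, 0, 1, 0, 1, 1, 1],
               [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]))
```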
And S105, positioning the characteristic position of the target audio file according to the comparison result, wherein the characteristic position is a leader position or a trailer position.
Generally, the leader or trailer of an audio file is repetitive and identical across files. If the fingerprint information of an audio fragment matches the fingerprint information of multiple audio files in the preset fingerprint information base, many identical, repeated fingerprints exist in the base, and the fragment can be considered to belong to a leader or trailer. On this principle, this step determines from the comparison results of step S104 whether each audio fragment belongs to the leader or the trailer, and thereby locates the leader or trailer position of the target audio file.
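The locating principle of step S105 can be sketched as follows. Mapping the first matching slice to the leader's start and the last matching slice to its end follows the first-aspect claims; the function name and slice-triple representation are assumptions:

```python
def locate_leader(matched, slice_s=10):
    """Given the matched slices as (offset, start, end) triples sorted by
    offset, take the first match for the leader's start and the last
    match for its end. Returns (start_s, end_s), or None when no slice
    matched (the file is taken to have no leader)."""
    if not matched:
        return None
    first, last = matched[0], matched[-1]
    return (first[1], last[2])  # start of first match, end of last match

# Slices at offsets 0..20 matched the base: the leader spans 0 s to 30 s.
matched = [(o, o, o + 10) for o in range(21)]
print(locate_leader(matched))  # (0, 30)
```

For the trailer case the same pairing applies in reverse order, since the second audio data is extracted backwards from the end of the file.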
With the audio processing method of this embodiment, target audio data of a preset duration is extracted from a target audio file to be processed and subjected to offset slicing to obtain at least one audio fragment; the fingerprint information of each audio fragment is compared against a preset fingerprint information base, and the leader or trailer position of the target audio file is located from the comparison results. This process automates the location of the leader or trailer position, saves labor cost, and effectively improves the efficiency and accuracy of audio processing.
The embodiment of the invention further provides another audio processing method, which focuses on how to locate the leader position of a target audio file. Referring to fig. 2, the method may include the following steps S201 to S213.
S201, a preset fingerprint information base is established, wherein the preset fingerprint information base comprises at least one album fingerprint information base. Wherein, an album contains at least one audio file, and an album fingerprint information base contains fingerprint information of at least one audio file belonging to the same album.
In this embodiment, the preset fingerprint information base may be represented by the following table one:
table one: preset fingerprint information base
[Table 1 appears as an image in the original; it lists, for each album fingerprint information base, the fingerprint information of the audio files belonging to that album.]
As Table 1 shows, the preset fingerprint information base stores the fingerprint information of at least one audio file. Preferably, in this embodiment the preset fingerprint information base is partitioned by album: the fingerprint information of all audio files belonging to the same album is stored in one album fingerprint information base. Subsequent processing of a target audio file then only needs to be carried out within the album fingerprint information base to which it belongs, which greatly improves audio processing efficiency.
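The album-partitioned structure of the preset fingerprint information base can be sketched as a nested dictionary; the flat record shape and the identifiers below are illustrative assumptions:

```python
def build_fingerprint_base(files):
    """Group per-file fingerprints by album, mirroring Table 1: one album
    fingerprint information base per album, each holding the fingerprints
    of that album's audio files. `files` is an iterable of
    (album_id, file_id, fingerprint) records."""
    base = {}
    for album_id, file_id, fingerprint in files:
        base.setdefault(album_id, {})[file_id] = fingerprint
    return base

base = build_fingerprint_base([
    ("album_A", "song_A1", b"\x01\x02"),
    ("album_A", "song_A2", b"\x03\x04"),
    ("album_B", "song_B1", b"\x05\x06"),
])
print(sorted(base))             # the album fingerprint information bases
print(sorted(base["album_A"]))  # the files inside one album base
```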
S202, first audio data with a first preset duration are sequentially extracted from the starting position of the target audio file to be processed.
The first preset duration may be set from practical experience; for example, since the leader of an audio file is typically 5 s-120 s long, the first preset duration can be set to 5 s-120 s. In this embodiment, assuming the target audio file is Song A1 with a length of 5 minutes and the first preset duration is 120 s, the first audio data of the first 2 minutes (120 s) of Song A1 may be extracted for analysis.
S203, extracting audio fragments with preset fragment duration from the initial position of the first audio data at intervals of preset offset time.
S204, sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment.
Steps S203-S204 of this embodiment are a detailed refinement of step S102 of the embodiment shown in fig. 1. In steps S203-S204, the preset offset time and preset slice duration may be set as needed. Assume the preset offset time is 1 s and the preset slice duration is 10 s. Following the example of step S202, the first audio data of the first 2 minutes is extracted from Song A1, and its start position is the start of Song A1, i.e. time 0 s. Then, in steps S203-S204: cutting at an offset of 0 s yields a first audio slice of duration 10 s, with offset time 0 s relative to the start of the first audio data and start-stop time 0 s-10 s; cutting at an offset of 1 s yields a second audio slice with offset time 1 s and start-stop time 1 s-11 s; cutting at an offset of 2 s yields a third audio slice with offset time 2 s and start-stop time 2 s-12 s; and so on. The resulting audio slices may be represented by Table 2 below:
table two: audio slicing
Name                  Offset time    Start-stop time
First audio slice     0s             0s-10s
Second audio slice    1s             1s-11s
Third audio slice     2s             2s-12s
S205, collecting fingerprint information of the at least one audio fragment. This step can be referred to step S103 in the embodiment shown in fig. 1, which is not described herein.
And S206, inquiring a target album to which the target audio file belongs.
In the internet audio library or the radio and television program library, each album has a corresponding unique ID, each audio file belonging to the same album also has a corresponding unique ID, and the internet audio library or the radio and television program library stores the ID of each album, the ID of the audio file belonging to each album, and the association relationship between the audio file and the album. In step S206, a target album to which the target audio file belongs may be determined from the internet audio library or the radio and television program library according to the ID of the target audio file, and the ID of the target album may be read.
And S207, selecting a target album fingerprint information base from the preset fingerprint information bases.
And S208, reading the fingerprint information of at least one audio file in the target album fingerprint information base.
In steps S207-S208, the target album fingerprint information base is selected from Table 1 of this embodiment according to the read ID of the target album, and the fingerprint information of at least one audio file in it is read. Following the running example, the target audio file is Song A1, which belongs to Album A; the Album A fingerprint information base can therefore be selected from Table 1 as the target album fingerprint information base, and the fingerprint information of each audio file belonging to Album A can be read from it.
S209, sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the fingerprint information base of the target album.
S210, if the number of audio files in the target album fingerprint information base whose fingerprint information matches that of the selected current audio fragment is greater than or equal to a preset number threshold, the selected current audio fragment is determined to be a matching audio fragment.
S211, if that number is smaller than the preset number threshold, the selected current audio fragment is determined to be a non-matching audio fragment, and the comparison of all audio fragments after it against the fingerprint information of the audio files in the target album fingerprint information base is stopped.
Steps S209-S211 describe comparing the at least one audio fragment against the target album fingerprint information base. Specifically: in order of offset time from small to large, the first audio fragment is selected as the current audio fragment by referring to table two, and its fingerprint information is compared with the fingerprint information of each audio file in the album A fingerprint information base. If fingerprint information of a number of audio files greater than or equal to a preset number threshold (the threshold can be set according to actual needs, for example to 3 or 5) matches the fingerprint information of the first audio fragment, the first audio fragment is determined to be a matching audio fragment; the second audio fragment is then selected as the current audio fragment according to table two, and the process repeats. If fewer than the preset number threshold of audio files have fingerprint information matching the first audio fragment, the first audio fragment is determined to be a non-matching audio fragment, no further current audio fragment is selected from table two, and the comparison process for all audio fragments after the first audio fragment stops.
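The loop of steps S209-S211 can be sketched in Python as follows. This is an illustrative version under simplified, assumed types: fingerprints are opaque values and matching is delegated to a caller-supplied `match` predicate, since the patent does not prescribe a concrete fingerprint representation:

```python
def locate_matching_fragments(fragment_fps, album_fps, match, threshold=3):
    """Steps S209-S211 sketch: walk the fragments in order of increasing
    offset time; a fragment is a matching audio fragment if at least
    `threshold` audio files in the target album fingerprint base match it.
    Stop at the first non-matching fragment."""
    matching = []
    for fp in fragment_fps:  # fragment_fps is already ordered by offset time
        hits = sum(1 for album_fp in album_fps if match(fp, album_fp))
        if hits >= threshold:
            matching.append(fp)
        else:
            break  # S211: stop comparing all fragments after this one
    return matching
```

With toy values, `locate_matching_fragments([1, 2, 3], [1, 1, 1, 2], lambda f, a: f == a)` returns `[1]`: fragment `1` matches three album files, fragment `2` matches only one, so the scan stops there and fragment `3` is never compared.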
Steps S206-S211 of the present embodiment may be a detailed refinement of step S104 of the embodiment shown in fig. 1.
S212, the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment are obtained according to the sequence of the offset time from small to large.
S213, determining the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and determining the end position of the leader of the target audio file according to the time attribute of the last matching audio fragment.
Steps S212-S213 of the present embodiment may be a detailed refinement of step S105 of the embodiment shown in fig. 1. In steps S212-S213, assuming that the preset number threshold is 3, the comparison result can be represented as the following table three:
table three: comparison results
(Table three is provided as an image in the original publication; for each audio fragment it lists how many audio files in the target album fingerprint information base have matching fingerprint information.)
As can be seen from table three, the fingerprint information of only 1 audio file in the target album fingerprint information base matches the fingerprint information of the ninth audio fragment; that is, when the ninth audio fragment is the selected current audio fragment, fewer than the preset number threshold of audio files match it, so none of the audio fragments after the ninth undergo the fingerprint comparison process. In order of offset time from small to large, the first audio fragment is the first matching audio fragment and the eighth audio fragment is the last matching audio fragment; that is, the first through eighth audio fragments belong to the leader. Referring to table two above, the offset time of the first audio fragment relative to the start position of the first audio data (i.e., the start position of song a1) is 0 s and its start time is 0 s; the offset time of the eighth audio fragment relative to the start position of the first audio data (i.e., the start position of song a1) is 6 s and its start time is 7 s. It can thus be determined that the start position of the leader of the target audio file, song a1, is the offset time (start time) of the first audio fragment, 0 s, and the end position of the leader is calculated from the offset time and start time of the eighth audio fragment as 6 + 7 = 13 s.
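The boundary arithmetic of steps S212-S213 can be written out directly, taking the time attributes exactly as the example gives them (offset time 6 s and start time 7 s for the eighth audio fragment). A minimal sketch; the dictionary layout of a fragment's time attribute is assumed for illustration:

```python
def leader_bounds(first_frag, last_frag):
    """Steps S212-S213 sketch: the leader's start position is the offset
    time of the first matching fragment; its end position is the offset
    time plus the start time of the last matching fragment."""
    start = first_frag["offset"]
    end = last_frag["offset"] + last_frag["start"]
    return start, end
```

With the example's values this yields a leader running from 0 s to 6 + 7 = 13 s.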
According to the audio processing method provided by this embodiment of the invention, target audio data of a preset duration can be extracted from a target audio file to be processed, and offset slicing processing can be performed on the target audio data to obtain at least one audio fragment; the fingerprint information of the at least one audio fragment is compared against a preset fingerprint information base, and the leader position or trailer position of the target audio file is located according to the comparison result. This process automatically locates the leader or trailer position of the target audio file, saving labor cost and effectively improving the efficiency and accuracy of audio processing.
An embodiment of the invention also provides another audio processing method; the method of this embodiment focuses on describing how to locate the trailer of the target audio file. Referring to fig. 3, the method may include the following steps S301 to S313.
S301, a preset fingerprint information base is established, wherein the preset fingerprint information base comprises at least one album fingerprint information base. Wherein, an album contains at least one audio file, and an album fingerprint information base contains fingerprint information of at least one audio file belonging to the same album.
Step S301 of this embodiment can refer to step S201 of the embodiment shown in fig. 2, which is not described herein again.
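One possible in-memory shape for the preset fingerprint information base of step S301 is a two-level mapping: album ID to album fingerprint base, and audio-file ID to fingerprint within each base. This is only an illustrative layout; the album IDs, file IDs, and fingerprint bytes below are invented:

```python
# Hypothetical preset fingerprint information base: one fingerprint
# sub-base per album, keyed by album ID (step S301).
preset_fp_base = {
    "A": {"a1": b"fp-a1", "a2": b"fp-a2"},  # album A fingerprint base
    "B": {"b1": b"fp-b1"},                  # album B fingerprint base
}

def select_album_base(album_id):
    """Steps S307-S308 sketch: select the target album fingerprint base
    and read the fingerprint information of each audio file it contains."""
    return list(preset_fp_base[album_id].values())
```

`select_album_base("A")` then returns the fingerprint of every audio file belonging to album A, as steps S307-S308 require.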
S302, extracting second audio data with a second preset duration from the end position of the target audio file to be processed in a reverse order.
The second preset duration may be set according to practical experience. For example, the trailer of an audio file is generally 5 s-60 s long, so the second preset duration may be set to 5 s-60 s. In this embodiment, assuming that the target audio file is song a1 with a length of 5 minutes and the second preset duration is 60 s, the last 1 minute (60 s) of song a1 may be extracted as the second audio data for analysis.
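Extracting the second audio data "in reverse order from the end position" can be sketched on raw samples: take the last 60 s and reverse them, so that offset 0 of the extracted data coincides with the end position of the file. A simplified sketch assuming the audio is available as a flat list of PCM samples at a known sample rate:

```python
def extract_tail(samples, sample_rate, seconds):
    """Step S302 sketch: take the last `seconds` of audio and reverse it,
    so that offset 0 of the second audio data is the end of the file."""
    tail = samples[-sample_rate * seconds:]
    return tail[::-1]
```

With a 10-sample file at 1 sample/s, `extract_tail(samples, 1, 3)` returns the last three samples in reverse order, i.e. the file's final sample first.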
S303, extracting an audio slice with a preset slice duration from the start position of the second audio data every preset offset time.
S304, sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment.
Steps S303-S304 in this embodiment may refer to steps S203-S204 of the embodiment shown in fig. 2 and are not repeated here. Note, however, that because the second audio data is extracted in reverse order from the end position of the target audio file (song a1), the start position of the second audio data is the end position of song a1, i.e., the 5-minute mark. Then, in steps S303-S304: at an offset of 0 s, a first audio fragment with a slice duration of 10 s is extracted; its offset time relative to the start position of the second audio data (i.e., the end position of song a1) is 0 s, and its start-stop time is 0 s-10 s. At an offset of 1 s, a second audio fragment with a slice duration of 10 s is extracted; its offset time relative to the start position of the second audio data (i.e., the end position of song a1) is 1 s, and its start-stop time is 1 s-11 s. At an offset of 2 s, a third audio fragment with a slice duration of 10 s is extracted; its offset time relative to the start position of the second audio data (i.e., the end position of song a1) is 2 s, and its start-stop time is 2 s-12 s; and so on.
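The offset slicing of steps S303-S304 (a 10 s fragment every 1 s, with its time attributes recorded) can be sketched as follows; the 10 s slice duration and 1 s offset step mirror the embodiment's example, and the per-fragment record layout is an assumption for illustration:

```python
def offset_slices(samples, sample_rate, slice_s=10, step_s=1):
    """Steps S303-S304 sketch: every `step_s` seconds, cut a `slice_s`-second
    fragment from the start of the target audio data, storing each fragment
    together with its time attributes (offset time and start-stop time)."""
    frags = []
    offset = 0
    while (offset + slice_s) * sample_rate <= len(samples):
        frags.append({
            "offset": offset,                    # relative to the data start
            "range": (offset, offset + slice_s), # start-stop time in seconds
            "data": samples[offset * sample_rate:(offset + slice_s) * sample_rate],
        })
        offset += step_s
    return frags
```

For 12 s of audio this produces fragments with start-stop times 0 s-10 s, 1 s-11 s, and 2 s-12 s, matching the pattern described in the text.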
S305, collecting fingerprint information of the at least one audio fragment.
S306, inquiring a target album to which the target audio file belongs.
S307, selecting a target album fingerprint information base from the preset fingerprint information base.
S308, reading the fingerprint information of at least one audio file in the target album fingerprint information base.
S309, sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of the at least one audio file in the fingerprint information base of the target album.
S310, if fingerprint information of a number of audio files greater than or equal to the preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a matching audio fragment.
S311, if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a non-matching audio fragment, and stopping comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
S312, the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment are obtained according to the sequence of the offset time from small to large.
S313, determining the ending position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and determining the starting position of the trailer of the target audio file according to the time attribute of the last matching audio fragment.
Steps S305-S313 of this embodiment can refer to steps S205-S213 of the embodiment shown in fig. 2 and are not repeated here. Note that in this embodiment, in order of offset time from small to large, the first audio fragment is the first matching audio fragment and the eighth audio fragment is the last matching audio fragment; that is, the first through eighth audio fragments all belong to the trailer. Referring to table two above, the offset time of the first audio fragment relative to the start position of the second audio data (i.e., the end position of song a1) is 0 s and its start time is 0 s; the offset time of the eighth audio fragment relative to the start position of the second audio data (i.e., the end position of song a1) is 6 s and its start time is 7 s. It can thus be determined that the ending position of the trailer of the target audio file, song a1, corresponds to an offset of 0 s from the start position of the second audio data, i.e., the 5-minute mark (the end position of song a1); and the starting position of the trailer is calculated from the offset time and start time of the eighth audio fragment as 6 + 7 = 13 s before the end, i.e., 4 minutes 47 seconds into song a1.
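Because the second audio data runs backwards from the file end, the trailer arithmetic of step S313 maps an offset of t seconds to (song length - t) seconds in the original timeline. A sketch using the example's numbers (a 5-minute song, with offset time 6 s and start time 7 s for the eighth fragment); the fragment record layout is assumed:

```python
def trailer_bounds(song_len_s, first_frag, last_frag):
    """Step S313 sketch: the trailer's ending position is the song end
    minus the first matching fragment's offset; its starting position is
    the song end minus (offset time + start time) of the last matching
    fragment."""
    end = song_len_s - first_frag["offset"]
    start = song_len_s - (last_frag["offset"] + last_frag["start"])
    return start, end
```

For the example this gives a trailer from 287 s (4 min 47 s) to 300 s, the end of song a1.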
According to the audio processing method provided by this embodiment of the invention, target audio data of a preset duration can be extracted from a target audio file to be processed, and offset slicing processing can be performed on the target audio data to obtain at least one audio fragment; the fingerprint information of the at least one audio fragment is compared against a preset fingerprint information base, and the leader position or trailer position of the target audio file is located according to the comparison result. This process automatically locates the leader or trailer position of the target audio file, saving labor cost and effectively improving the efficiency and accuracy of audio processing.
Based on the above description of the method embodiments, an audio processing apparatus according to an embodiment of the present invention is described in detail below with reference to fig. 4. It should be noted that the audio processing apparatus described below can be used to execute the audio processing methods shown in fig. 1 to 3. Specifically, an embodiment of the present invention provides an audio processing apparatus; referring to fig. 4, the apparatus includes the following units:
the extracting unit 101 is configured to extract target audio data of a preset duration from a target audio file to be processed.
A processing unit 102, configured to perform offset slicing on the target audio data to obtain at least one audio slice.
An acquiring unit 103, configured to acquire fingerprint information of the at least one audio slice.
A comparing unit 104, configured to compare the fingerprint information of the at least one audio fragment with a preset fingerprint information base, respectively.
And the positioning unit 105 is configured to position a feature position of the target audio file according to the comparison result, where the feature position is a leader position or a trailer position.
In a specific implementation, the apparatus further includes:
the creating unit 106 is configured to create a preset fingerprint information base, where the preset fingerprint information base includes at least one album fingerprint information base, and one album fingerprint information base includes fingerprint information of at least one audio file belonging to the same album.
In a specific implementation, the extracting unit 101 is specifically configured to sequentially extract first audio data of a first preset duration from the start position of the target audio file to be processed; or to extract, in reverse order, second audio data of a second preset duration from the end position of the target audio file to be processed.
In a specific implementation, the processing unit 102 includes the following units:
an audio slice extracting unit 1001 configured to extract an audio slice with a preset slice duration from the start position of the target audio data every preset offset time.
The storage unit 1002 is configured to sequentially store the obtained at least one audio fragment, and record a time attribute of the at least one audio fragment. Wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
In a specific implementation, the comparing unit 104 includes the following units:
a target album querying unit 2001, configured to query a target album to which the target audio file belongs.
A library selecting unit 2002 for selecting a target album fingerprint information library from the preset fingerprint information libraries.
A fingerprint information reading unit 2003, configured to read fingerprint information of at least one audio file in the target album fingerprint information base.
A current selecting unit 2004, configured to sequentially select a current audio slice from the at least one audio slice according to the order of the offset time from small to large.
The current comparing unit 2005 is configured to compare the fingerprint information of the selected current audio fragment with fingerprint information of at least one audio file in the target album fingerprint information base.
A result determining unit 2006, configured to determine that the selected current audio fragment is a matching audio fragment if fingerprint information of a number of audio files greater than or equal to the preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment; or to determine that the selected current audio fragment is a non-matching audio fragment if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, and to stop comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
In a specific implementation, the positioning unit 105 includes the following units:
the time attribute obtaining unit 3001 is configured to obtain a time attribute of the first matching audio slice and a time attribute of the last matching audio slice according to the sequence of the offset time from small to large.
A leader position determining unit 3002, configured to determine, if the target audio data is the first audio data, the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and determine the end position of the leader of the target audio file according to the time attribute of the last matching audio fragment.
A trailer position determining unit 3003, configured to determine, if the target audio data is the second audio data, the ending position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and determine the starting position of the trailer of the target audio file according to the time attribute of the last matching audio fragment.
Since the audio processing apparatus shown in fig. 4 can be used to execute the method of the embodiment shown in fig. 1-3, the functions of the units shown in fig. 4 can be referred to the related descriptions of the steps of the method shown in fig. 1-3, and are not described herein again. It should be noted that the audio processing apparatus shown in fig. 4 may be an application program running in a physical device, and there are at least two possible implementations:
In one possible embodiment, the audio processing apparatus may run in a single physical device and work independently. For example, the apparatus may run in a terminal, which may include but is not limited to: a PC (personal computer), a mobile phone, a tablet computer, a smart wearable device, and the like; the terminal then independently implements the method flows shown in fig. 1 to 3. Alternatively, the audio processing apparatus may run in a server, and the server independently implements the method flows shown in fig. 1 to 3.
In another possible embodiment, the audio processing apparatus may be distributed across a plurality of physical devices that work in coordination. For example, one part of the audio processing apparatus may run in a terminal and the other part in a server, and the terminal and the server cooperate to implement the method flows shown in fig. 1 to 3. In this embodiment, the creating unit 106 and the comparing unit 104 shown in fig. 4 may be located in the server, while the extracting unit 101, the processing unit 102, the acquiring unit 103, and the positioning unit 105 may be located in the terminal. When the methods shown in fig. 1 to 3 are executed, creating the preset fingerprint information base and the comparison process may take place in the server, while the other processes, including extracting the target audio data, obtaining the at least one audio fragment, collecting the fingerprint information of the at least one audio fragment, and locating the feature position, may take place in the terminal. Specifically, the terminal sends the fingerprint information of the audio fragments to the server for comparison, the server returns the comparison result, and the terminal locates the feature position according to the comparison result.
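The terminal-server split described above can be sketched as two cooperating functions: the server side holds the album fingerprint base and applies the preset number threshold, and the terminal side counts the leading matches in the returned comparison result. Fingerprint equality stands in for real fingerprint matching here, and all names are illustrative:

```python
def server_compare(fragment_fps, album_fps, threshold=3):
    """Server side sketch: for each fragment fingerprint, report whether
    at least `threshold` audio files in the album base match it."""
    return [sum(1 for a in album_fps if fp == a) >= threshold
            for fp in fragment_fps]

def terminal_locate(fragment_fps, compare):
    """Terminal side sketch: send the fragment fingerprints via `compare`,
    then count the leading matching fragments in the returned result
    (the fragments that belong to the leader or trailer)."""
    n = 0
    for ok in compare(fragment_fps):
        if not ok:
            break
        n += 1
    return n
```

In a real deployment `compare` would be a network call; here a lambda wiring `server_compare` to a toy album base plays that role.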
In the embodiment of the audio processing apparatus, target audio data of a preset duration can be extracted from a target audio file to be processed, and offset slicing processing can be performed on the target audio data to obtain at least one audio fragment; the fingerprint information of the at least one audio fragment is compared against a preset fingerprint information base, and the leader position or trailer position of the target audio file is located according to the comparison result. This process automatically locates the leader or trailer position of the target audio file, saving labor cost and effectively improving the efficiency and accuracy of audio processing.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and certainly cannot be used to limit the scope of the claims; equivalent variations made according to the claims of the present invention still fall within the scope of the invention.

Claims (13)

1. An audio processing method, comprising:
extracting target audio data with preset duration from a target audio file to be processed;
performing offset slicing processing on the target audio data to obtain at least one audio fragment; the audio data contained in two adjacent audio fragments in the at least one audio fragment are overlapped;
collecting fingerprint information of the at least one audio fragment, and comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
and positioning a feature position of the target audio file according to a comparison result, wherein the feature position is a leader position or a trailer position, the comparison result is used for determining whether the at least one audio fragment belongs to the leader or the trailer, the at least one audio fragment comprises a matching audio fragment, the matching audio fragment belongs to the leader or the trailer, and the fingerprint information of the matching audio fragment matches fingerprint information of a number of audio files greater than or equal to a preset number threshold in the preset fingerprint information base.
2. The method as claimed in claim 1, wherein before extracting the target audio data of the preset duration from the target audio file to be processed, the method further comprises:
and creating a preset fingerprint information base, wherein the preset fingerprint information base comprises at least one album fingerprint information base, and one album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
3. The method as claimed in claim 2, wherein the extracting the target audio data of the preset duration from the target audio file to be processed comprises:
sequentially extracting first audio data of a first preset duration from the starting position of a target audio file to be processed; or,
and extracting second audio data with a second preset duration from the end position of the target audio file to be processed in a reverse order.
4. The method of claim 2 or 3, wherein the offset slicing the target audio data to obtain at least one audio slice comprises:
extracting audio fragments with preset fragment duration from the initial position of the target audio data at intervals of preset offset time;
sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
5. The method of claim 4, wherein comparing the fingerprint information of the at least one audio slice with a preset fingerprint information base respectively comprises:
inquiring a target album to which the target audio file belongs;
selecting a target album fingerprint information base from the preset fingerprint information base, and reading fingerprint information of at least one audio file in the target album fingerprint information base;
sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large, and comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the fingerprint information base of the target album;
if fingerprint information of a number of audio files greater than or equal to the preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a matching audio fragment;
if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, determining the selected current audio fragment as a non-matching audio fragment, and stopping comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
6. The method of claim 5, wherein the locating the feature location of the target audio file according to the comparison result comprises:
acquiring the time attribute of the first matched audio fragment and the time attribute of the last matched audio fragment according to the sequence of the offset time from small to large;
if the target audio data is the first audio data, determining the start position of the leader of the target audio file according to the time attribute of the first matching audio fragment, and determining the end position of the leader of the target audio file according to the time attribute of the last matching audio fragment;
and if the target audio data is the second audio data, determining the ending position of the trailer of the target audio file according to the time attribute of the first matching audio fragment, and determining the starting position of the trailer of the target audio file according to the time attribute of the last matching audio fragment.
7. An audio processing apparatus, comprising:
the extraction unit is used for extracting target audio data with preset duration from a target audio file to be processed;
the processing unit is used for carrying out offset slicing processing on the target audio data to obtain at least one audio fragment; the audio data contained in two adjacent audio fragments in the at least one audio fragment are overlapped;
the acquisition unit is used for acquiring fingerprint information of the at least one audio fragment;
the comparison unit is used for comparing the fingerprint information of the at least one audio fragment with a preset fingerprint information base respectively;
the positioning unit is used for positioning a feature position of the target audio file according to a comparison result, wherein the feature position is a leader position or a trailer position, the comparison result is used for determining whether the at least one audio fragment belongs to the leader or the trailer, the at least one audio fragment comprises a matching audio fragment, the matching audio fragment belongs to the leader or the trailer, and the fingerprint information of the matching audio fragment matches fingerprint information of a number of audio files greater than or equal to a preset number threshold in the preset fingerprint information base.
8. The apparatus of claim 7, further comprising:
the creating unit is used for creating a preset fingerprint information base, wherein the preset fingerprint information base comprises at least one album fingerprint information base, and one album fingerprint information base comprises fingerprint information of at least one audio file belonging to the same album.
9. The apparatus according to claim 8, wherein the extracting unit is specifically configured to sequentially extract the first audio data of a first preset duration from a start position of the target audio file to be processed; or, the method is used for extracting the second audio data with the second preset duration from the end position of the target audio file to be processed in reverse order.
10. The apparatus of claim 8 or 9, wherein the processing unit comprises:
an audio slice extracting unit, configured to extract an audio slice with a preset slice duration from the start position of the target audio data every preset offset time;
the storage unit is used for sequentially storing the obtained at least one audio fragment and recording the time attribute of the at least one audio fragment;
wherein the time attribute of an audio slice comprises: a start time and an offset time relative to a start position of the target audio data.
11. The apparatus of claim 10, wherein the alignment unit comprises:
a target album querying unit configured to query a target album to which the target audio file belongs;
the library selection unit is used for selecting a target album fingerprint information library from the preset fingerprint information libraries;
the fingerprint information reading unit is used for reading the fingerprint information of at least one audio file in the target album fingerprint information base;
the current selection unit is used for sequentially selecting the current audio fragment from the at least one audio fragment according to the sequence of the offset time from small to large;
the current comparison unit is used for comparing the fingerprint information of the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base;
a result determining unit, configured to determine that the selected current audio fragment is a matching audio fragment if fingerprint information of a number of audio files greater than or equal to a preset number threshold in the target album fingerprint information base matches the fingerprint information of the selected current audio fragment; or to determine that the selected current audio fragment is a non-matching audio fragment if fewer than the preset number threshold of audio files in the target album fingerprint information base have fingerprint information matching the fingerprint information of the selected current audio fragment, and to stop comparing the fingerprint information of all audio fragments after the selected current audio fragment with the fingerprint information of at least one audio file in the target album fingerprint information base.
12. The apparatus of claim 11, wherein the positioning unit comprises:
a time attribute acquiring unit, configured to acquire, in ascending order of offset time, the time attribute of the first matching audio slice and the time attribute of the last matching audio slice;
a head position determining unit, configured to: if the target audio data is first audio data, determine a head start position of the target audio file according to the time attribute of the first matching audio slice, and determine a head end position of the target audio file according to the time attribute of the last matching audio slice;
and a tail position determining unit, configured to: if the target audio data is second audio data, determine a tail start position of the target audio file according to the time attribute of the first matching audio slice, and determine a tail end position of the target audio file according to the time attribute of the last matching audio slice.
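The positioning of claim 12 can be sketched as below. Here a time attribute is taken, as an assumption, to be an `(offset, duration)` pair in seconds; the function name and signature are hypothetical:

```python
def locate_segment(is_head, first_attr, last_attr):
    """Derive a segment's boundaries from the first and last matching
    audio slices (claim 12).

    is_head    -- True for the first audio data (head of the target file),
                  False for the second audio data (tail)
    first_attr -- (offset, duration) of the first matching slice
    last_attr  -- (offset, duration) of the last matching slice,
                  in ascending offset order
    """
    start = first_attr[0]               # start of the first matching slice
    end = last_attr[0] + last_attr[1]   # end of the last matching slice
    label = "head" if is_head else "tail"
    return label, start, end
```

Under this reading, the head and tail cases differ only in which part of the target file the matched span is taken to delimit, not in the arithmetic itself.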
13. A computer-readable storage medium, in which program instructions are stored, the program instructions being for executing the audio processing method according to any one of claims 1 to 6.
CN201610288300.3A 2016-04-29 2016-04-29 Audio processing method and device Active CN105975568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610288300.3A CN105975568B (en) 2016-04-29 2016-04-29 Audio processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610288300.3A CN105975568B (en) 2016-04-29 2016-04-29 Audio processing method and device

Publications (2)

Publication Number Publication Date
CN105975568A CN105975568A (en) 2016-09-28
CN105975568B true CN105975568B (en) 2020-04-03

Family

ID=56993679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610288300.3A Active CN105975568B (en) 2016-04-29 2016-04-29 Audio processing method and device

Country Status (1)

Country Link
CN (1) CN105975568B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205550B (en) * 2016-12-16 2021-03-12 北京酷我科技有限公司 Audio fingerprint generation method and device
CN108198573B (en) * 2017-12-29 2021-04-30 北京奇艺世纪科技有限公司 Audio recognition method and device, storage medium and electronic equipment
CN108305622B (en) * 2018-01-04 2021-06-11 海尔优家智能科技(北京)有限公司 Voice recognition-based audio abstract text creating method and device
CN108630208B (en) * 2018-05-14 2020-10-27 平安科技(深圳)有限公司 Server, voiceprint-based identity authentication method and storage medium
CN112632321A (en) * 2019-09-23 2021-04-09 北京国双科技有限公司 Audio file processing method and device and audio file playing method and device
CN110650366B (en) * 2019-10-29 2021-09-24 成都超有爱科技有限公司 Interactive dubbing method and device, electronic equipment and readable storage medium
CN110990632B (en) * 2019-12-19 2023-05-02 腾讯科技(深圳)有限公司 Video processing method and device
CN113347489B (en) * 2021-07-09 2022-11-18 北京百度网讯科技有限公司 Video clip detection method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314875A (en) * 2011-08-01 2012-01-11 北京百度网讯科技有限公司 Audio file identification method and device
CN103021440A (en) * 2012-11-22 2013-04-03 腾讯科技(深圳)有限公司 Method and system for tracking audio streaming media
CN104462537A (en) * 2014-12-24 2015-03-25 北京奇艺世纪科技有限公司 Method and device for classifying voice data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050154987A1 (en) * 2004-01-14 2005-07-14 Isao Otsuka System and method for recording and reproducing multimedia

Also Published As

Publication number Publication date
CN105975568A (en) 2016-09-28

Similar Documents

Publication Publication Date Title
CN105975568B (en) Audio processing method and device
CN105825850B (en) Audio processing method and device
CN1998168B (en) Method and apparatus for identification of broadcast source
KR101578279B1 (en) Methods and systems for identifying content in a data stream
JP4658598B2 (en) System and method for providing user control over repetitive objects embedded in a stream
US9451048B2 (en) Methods and systems for identifying information of a broadcast station and information of broadcasted content
JP5362178B2 (en) Extracting and matching characteristic fingerprints from audio signals
US20140214190A1 (en) Method and System for Content Sampling and Identification
Herley ARGOS: Automatically extracting repeating objects from multimedia streams
EP2602630A2 (en) Method of characterizing the overlap of two media segments
US20080154401A1 (en) Method and System For Content Sampling and Identification
US9773058B2 (en) Methods and systems for arranging and searching a database of media content recordings
CA2905385C (en) Methods and systems for arranging and searching a database of media content recordings
WO2016189307A1 (en) Audio identification method
CN110795597A (en) Video keyword determination method, video retrieval method, video keyword determination device, video retrieval device, storage medium and terminal
George et al. Scalable and robust audio fingerprinting method tolerable to time-stretching
CN109271501A (en) A kind of management method and system of audio database
CN103294696A (en) Audio and video content retrieval method and system
CN113420178A (en) Data processing method and equipment
CN108198573B (en) Audio recognition method and device, storage medium and electronic equipment
US7985915B2 (en) Musical piece matching judging device, musical piece recording device, musical piece matching judging method, musical piece recording method, musical piece matching judging program, and musical piece recording program
TWI516098B (en) Record the signal detection method of the media
CN107844578B (en) Method and device for identifying repeated segments in audio stream
WO2010038187A1 (en) Method for data clusters indexing, recognition and retrieval in presence of noise
CN108268572B (en) Song synchronization method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant