KR20130061504A - Method for searching content using audio features - Google Patents
Method for searching content using audio features
- Publication number
- KR20130061504A
- Authority
- KR
- South Korea
- Prior art keywords
- content
- audio
- audio signal
- sample
- feature
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 29
- 230000005236 sound signal Effects 0.000 claims abstract description 55
- 238000000605 extraction Methods 0.000 claims abstract description 14
- 238000007781 pre-processing Methods 0.000 claims abstract description 8
- 238000001228 spectrum Methods 0.000 claims description 10
- 238000001914 filtration Methods 0.000 claims description 5
- 239000000284 extract Substances 0.000 abstract description 16
- 238000010586 diagram Methods 0.000 description 6
- 230000002123 temporal effect Effects 0.000 description 2
- 238000012952 Resampling Methods 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000001747 exhibiting effect Effects 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Mathematical Analysis (AREA)
- Multimedia (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Mathematical Optimization (AREA)
- Signal Processing (AREA)
- Pure & Applied Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
The present invention relates to a content retrieval method, and more particularly, to a content retrieval method that extracts an audio feature using signal magnitude information for a specific frequency band of the audio signal included in content, and searches for the content using the extracted audio feature.
When a user has only a part of some content among the countless audio/video content on the Internet, a technique is needed for finding the content that includes that part. In general, a video includes an audio signal synchronized with a video signal. Because features of the audio signal are easier to compute and more compact than features of the video signal, audio features are used as a means of searching for video.
To search for content using audio features, the features must be robust to audio signal deformations such as resampling, lossy compression such as MP3, and equalization, and the search process should be simple enough to run in real time.
Conventionally, audio features are extracted using the spectral flatness of each subband of an audio signal, and audio is searched using distortion discriminant analysis (DDA). However, the conventional search method using such audio features is not robust to distortions applied to the audio signal, and it takes a long time to search for an audio file.
The present invention was devised to solve the above problems. An object of the present invention is to provide a content retrieval method using audio features that indexes the audio feature at a specific position of the audio signal in content and performs content search using the indexed audio feature.
It is also an object of the present invention to provide a content retrieval method using an audio feature that is robust to distortion, achieved by placing the feature values that are insensitive to distortion in the most significant bits.
To this end, a content retrieval method using an audio feature according to the present invention comprises the steps of: receiving a content sample; pre-processing the audio signal of the content sample; obtaining the sound power for a specific frequency band of the audio signal; extracting a reference position in the specific frequency band based on the sound power; extracting an audio feature in at least two sections based on the reference position; and retrieving content including the content sample from a database using the audio feature.
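The first steps of the method (mono conversion, sound power, and reference position extraction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the signal has already been limited to the band of interest, measures sound power as the mean squared amplitude of non-overlapping frames, and takes the position of the maximum power in each block as the reference position. The function names and the `frame_len`, `block_len`, and `hop` parameters are hypothetical.

```python
from typing import List, Sequence


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average all channels sample by sample to obtain a mono signal."""
    n = min(len(ch) for ch in channels)
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]


def frame_power(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in signal[start:start + frame_len]) / frame_len
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def reference_positions(power: Sequence[float], block_len: int, hop: int) -> List[int]:
    """Frame index of the maximum power in each block.

    Blocks advance by `hop` frames, which must be smaller than the block
    length so that neighboring blocks overlap (assumed interpretation).
    """
    if not 0 < hop < block_len:
        raise ValueError("hop must be positive and smaller than block_len")
    positions: List[int] = []
    for start in range(0, len(power) - block_len + 1, hop):
        block = power[start:start + block_len]
        peak = start + max(range(block_len), key=block.__getitem__)
        # Overlapping blocks often share a peak; keep each position once.
        if not positions or positions[-1] != peak:
            positions.append(peak)
    return positions
```

Because overlapping blocks tend to report the same peak, the list of reference positions is usually much shorter than the number of blocks, which keeps the number of features per content item small.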
The content retrieval method using the audio feature according to the present invention indexes the audio feature at a specific position of the audio signal in the content and performs the content retrieval using the indexed audio feature. As a result, specific content can be found quickly.
In addition, when retrieving content from a large database, the content retrieval method using the audio feature according to the present invention does not use features from the entire content, but uses only a minimal set of audio features at specific positions of the audio signal in the content. This can improve search efficiency.
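One way to picture indexing a small number of features per content item is an inverted index keyed by the 16-bit feature value. The sketch below is an assumption for illustration only: the `FeatureIndex` class, its exact-match lookup, and the simple voting rule are hypothetical and are not taken from the patent text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Posting = Tuple[str, int]  # (content id, reference position)


class FeatureIndex:
    """Hypothetical inverted index: 16-bit feature -> where it occurs."""

    def __init__(self) -> None:
        self._postings: Dict[int, List[Posting]] = defaultdict(list)

    def add(self, content_id: str, features: Iterable[Tuple[int, int]]) -> None:
        """Register (position, feature) pairs extracted from one content item."""
        for position, feature in features:
            self._postings[feature & 0xFFFF].append((content_id, position))

    def search(self, sample_features: Iterable[int]) -> Optional[str]:
        """Return the content id matched by the most sample features, or None."""
        votes: Counter = Counter()
        for feature in sample_features:
            hits = self._postings.get(feature & 0xFFFF, [])
            # Count each content item at most once per sample feature.
            for content_id in {cid for cid, _ in hits}:
                votes[content_id] += 1
        return votes.most_common(1)[0][0] if votes else None
```

A lookup touches only the postings for the features found in the sample, so the cost depends on the number of sample features rather than on the total length of the stored content.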
FIG. 1 is a diagram showing the configuration of a content retrieval apparatus using audio features according to the present invention;
FIG. 2 is a diagram illustrating the configuration of the filter unit illustrated in FIG. 1;
FIG. 3 is a conceptual diagram illustrating a maximum value extraction section and a movement interval of a feature extractor;
FIG. 4 is a flowchart illustrating a content retrieval method using audio features according to the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The configuration of the present invention and the operation and effect thereof will be clearly understood through the following detailed description.
Prior to the detailed description of the present invention, note that the same components are denoted by the same reference numerals even when they appear in different drawings, and that detailed descriptions of well-known configurations are omitted where they could obscure the gist of the present invention.
FIG. 1 is a block diagram of a content retrieval apparatus using an audio feature according to an embodiment of the present invention, FIG. 2 is a block diagram of the filter unit shown in FIG. 1, and FIG. 3 is a conceptual diagram illustrating a maximum value extraction section and a movement interval of the feature extractor.
Referring to FIG. 1, the media search apparatus according to the present invention includes a
The preprocessing
If the content sample is audio content, the preprocessor 1100 converts the audio signal of the content sample into a mono signal. Here, the
If the input content sample is not audio content, the
The
The
The
The
At this time, the subband powers in the first section are indexed by i (i = 1, 2, ..., 16) in order from low frequency to high frequency, and the subband powers in the second section are indexed in the same way. The feature value is represented by 16 bits, and its k-th bit (k = 1, 2, ..., 16) is defined by Equations 1 and 2. Equation 1 defines the 1st bit (most significant bit) through the 8th bit of the feature value, and Equation 2 defines the 9th bit through the 16th bit (least significant bit).
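The subband powers that feed these equations can be computed as in the following sketch. It is illustrative only: the naive DFT, the equal-width grouping of frequency bins into 16 subbands, and the function names are assumptions, and the bit rules of Equations 1 and 2 are not reproduced here.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(section: Sequence[float]) -> List[float]:
    """Squared DFT magnitude for bins 0 .. N/2 - 1 (naive O(N^2) DFT)."""
    n = len(section)
    spectrum = []
    for k in range(n // 2):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(section)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_powers(spectrum: Sequence[float], n_bands: int = 16) -> List[float]:
    """Sum the power of equal-width groups of bins, low to high frequency."""
    width = len(spectrum) // n_bands
    if width == 0:
        raise ValueError("spectrum has fewer bins than subbands")
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]
```

Calling `subband_powers(power_spectrum(section))` once for the first section and once for the second section yields the two sets of 16 values described above. A production system would normally use an FFT routine instead of the naive DFT shown here.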
As mentioned earlier, the feature value consists of 16 bits. For audio with the same content, when the signal is only partially distorted, for example by band pass filtering, only the lower bits of the feature value change. This is advantageous for indexing and processing.
In other words, because the most significant bits of the feature value are determined by sound power differences between neighboring frames, they remain unchanged for audio signals with the same content unless the distortion is very severe. The most significant bits of the feature value are therefore unlikely to be altered, and two signals whose feature values differ only in some of the lower bits are most likely audio signals with quite similar content. When indexing the feature values, search efficiency can be increased by comparing and processing them sequentially from the most significant bit to the least significant bit.
At least one feature value may be extracted based on the position of the maximum value of each block, and within a feature value the bits are arranged in order of significance, starting with those least likely to be altered by distortion.
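The comparison from the most significant bit downward can be illustrated with a short sketch. The helper names, the requirement that the upper 8 bits match exactly, and the tolerance of 2 differing lower bits are hypothetical choices for illustration; the text above states only that the upper bits are the stable ones.

```python
def first_differing_bit(a: int, b: int) -> int:
    """1-based position (1 = most significant bit) of the first bit where two
    16-bit features differ, or 0 if they are identical."""
    diff = (a ^ b) & 0xFFFF
    return 0 if diff == 0 else 17 - diff.bit_length()


def likely_same_content(a: int, b: int, max_low_bit_errors: int = 2) -> bool:
    """Accept when bits 1-8 agree and few of bits 9-16 differ.

    The tolerance is an assumed tuning parameter, not a value from the patent.
    """
    diff = (a ^ b) & 0xFFFF
    if diff >> 8:  # any mismatch in the upper 8 bits rejects immediately
        return False
    return bin(diff).count("1") <= max_low_bit_errors
```

Checking the upper bits first lets most non-matching candidates be rejected after a single shift, and only candidates that pass need the lower-bit count.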
The
FIG. 4 is a flowchart illustrating a content retrieval method using audio features according to the present invention.
Referring to FIG. 4, the
The
If the content sample is audio content, the
On the other hand, if the content sample is not audio content, the
The
Subsequently, the
When the position where the sound power is maximum is extracted, the
When the feature value is extracted, the
The embodiments disclosed in the specification of the present invention are not intended to limit the present invention. The scope of the present invention should be construed according to the following claims, and all the techniques within the scope of equivalents should be construed as being included in the scope of the present invention.
110: preprocessing unit 120: filter unit
130: location extraction unit 140: feature extraction unit
150: search unit 160: DB
Claims (10)
Preprocessing an audio signal of the content sample;
Obtaining sound power for a specific frequency band of the audio signal;
Extracting a reference position in the specific frequency band based on the sound power;
Extracting an audio feature in at least two sections based on the reference position;
And searching for a content including the content sample in a database using the audio feature.
Confirming whether the content sample is audio content;
Extracting an audio signal from the content sample if the content sample is not audio content;
And converting the audio signal extracted from the content sample into a mono signal.
And converting the audio signal of the content sample into a mono signal if the content sample is audio content.
And an audio signal of the specific frequency band is extracted by sequentially performing band pass filtering and low pass filtering on the audio signal.
Setting a block that is a location extraction section and extracting a reference position from each block while moving the block in the specific frequency band.
Content search method using an audio feature, characterized in that for moving at a smaller interval than the location extraction section.
And detecting the position of the maximum or minimum sound power in each of the blocks.
Obtaining a power spectrum of an audio signal in at least two specific sections based on the reference position;
Dividing the power spectrum into a plurality of subbands and obtaining subband power of each subband;
And extracting feature values of an audio signal using subband power in the specific section.
Content retrieval method using an audio feature, characterized in that consisting of 16 bits.
Content retrieval method using an audio feature, characterized in that a plurality of extractable based on the reference position of each block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110127848A KR20130061504A (en) | 2011-12-01 | 2011-12-01 | Method for searching content using audio features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110127848A KR20130061504A (en) | 2011-12-01 | 2011-12-01 | Method for searching content using audio features |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20130061504A | 2013-06-11 |
Family
ID=48859608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020110127848A KR20130061504A (en) | 2011-12-01 | 2011-12-01 | Method for searching content using audio features |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20130061504A (en) |
-
2011
- 2011-12-01 KR KR1020110127848A patent/KR20130061504A/en not_active Application Discontinuation
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR20120064582A (en) | Method of searching multi-media contents and apparatus for the same | |
TWI480855B (en) | Extraction and matching of characteristic fingerprints from audio signals | |
EP2458584A2 (en) | Audio visual signature, method of deriving a signature, and method of comparing audio-visual data | |
Anguera et al. | Mask: Robust local features for audio fingerprinting | |
CN106802960B (en) | Fragmented audio retrieval method based on audio fingerprints | |
US10089994B1 (en) | Acoustic fingerprint extraction and matching | |
CN105989836B (en) | Voice acquisition method and device and terminal equipment | |
CN106708990B (en) | Music piece extraction method and equipment | |
JP2006505821A (en) | Multimedia content with fingerprint information | |
CN111640411B (en) | Audio synthesis method, device and computer readable storage medium | |
CN109644283B (en) | Audio fingerprinting based on audio energy characteristics | |
US20190213214A1 (en) | Audio matching | |
US8543228B2 (en) | Coded domain audio analysis | |
US8108164B2 (en) | Determination of a common fundamental frequency of harmonic signals | |
CN103294696A (en) | Audio and video content retrieval method and system | |
KR20130061504A (en) | Method for searching content using audio features | |
KR101661666B1 (en) | Hybrid audio fingerprinting apparatus and method | |
US9215350B2 (en) | Sound processing method, sound processing system, video processing method, video processing system, sound processing device, and method and program for controlling same | |
US9183840B2 (en) | Apparatus and method for measuring quality of audio | |
Wang et al. | Audio fingerprint based on spectral flux for audio retrieval | |
KR101303256B1 (en) | Apparatus and Method for real-time detecting and decoding of morse signal | |
Mapelli et al. | Audio hashing technique for automatic song identification | |
CN108268572B (en) | Song synchronization method and system | |
CN110910899B (en) | Real-time audio signal consistency comparison detection method | |
Chickanbanjar | Comparative analysis between audio fingerprinting algorithms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |