TWI516098B - Signal detection method for recorded media


Info

Publication number
TWI516098B
Authority
TW
Taiwan
Prior art keywords
multimedia
segment
feature
index
recording medium
Prior art date
Application number
TW101134398A
Other languages
Chinese (zh)
Other versions
TW201414289A (en)
Original Assignee
Chunghwa Telecom Co Ltd
Application filed by Chunghwa Telecom Co Ltd
Priority to TW101134398A (TWI516098B)
Priority to CN2012105322318A (CN103065661A)
Publication of TW201414289A
Application granted
Publication of TWI516098B

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Television Signal Processing For Recording (AREA)

Description

Signal detection method for recorded media

The present invention relates to a method for detecting signals in recorded media, and in particular to a method that uses signal processing and matching techniques to compare a recorded signal against signals supplied by a content provider and automatically detect similar segments.

With advances in technology, multimedia information has become central to daily life. In multimedia search, responding quickly to user needs has long been an active research topic. In this era of information explosion the volume of information grows by multiples, and searching manually is both time-consuming and laborious. It is therefore desirable to use data indexing techniques to extract, from otherwise disorganized data, content features close to the user's interests, so that retrieval is fast and effective.

Current retrieval techniques for multimedia data rely mainly on the textual information attached to the media. A similar concept appears in earlier patents. Taiwan publication No. 200307874 (DigitalInn) describes a method and system in which a portable device uploads an audio file to a server, which uses an audio fingerprint to identify the content and search a database for the same audio file. In that patent, however, the audio fingerprint is limited to peripheral textual information such as the time and region in which a song was broadcast, and does not include audio information extracted from the music itself.

The literature also contains methods that use audio information. Microsoft's Taiwan Patent No. I329455 describes a system and method for identifying and extracting repeating audio or video objects from a multimedia stream, using the autocorrelation coefficient as the identification criterion. That patent does not include an indexing technique to accelerate search, and the music features it compares, such as BPM and Bark spectra, are unlikely to identify music segments effectively. HP's US Patent No. 6995309 describes a system and method for music identification: a music sample is recorded, a feature vector is generated for the sample, and feature differences are computed against the music feature vectors in a music library; if the song-matching rules are satisfied, song information is provided to the user. That patent focuses on its matching method, which uses FFT overlapped convolution and the cosine of the included angle; the feature extraction method is outside its claims. Dolby's US publication US20100205174 describes a technique that improves the accuracy of audio/video fingerprint search by combining multiple searches.

For an audio/video segment, it obtains fingerprint features and searches a database for fingerprints to find possible matches. Measures of the difference between fingerprints include Hamming distance, bit error rate, Lp norm, L2 distance, and autocorrelation coefficient. If a match is found, a success message is returned to the user; otherwise the system reports that the result is not in the database. The drawback of the two preceding patents is that they involve many mathematical operations and a heavy computational load, so responses are slow. In addition, GraceNote's US 7,549,051 B2 builds an index and performs pattern matching on audio fingerprints based on the first-order change of the signal's time-frequency components. Shazam's US 2009/0265174A9 is based on time-frequency landmarks of the signal and generates two kinds of hash values, invariant and variant, for pairs of frequency peaks; at search time it first matches the invariant pattern to find all candidate frequency-peak pairs, then uses a histogram of relative time offsets to find the music that shows a linear relationship (a histogram peak).

Both of the preceding patents apply only to music files and cannot handle multimedia video.

The conventional approaches above thus still have many shortcomings; they are not well designed and need improvement.

In view of the shortcomings of the conventional approaches above, the inventors sought to improve on them and, after extensive research, completed the signal detection method for recorded media of the present invention.

With the rapid development of compression techniques, digital content has become part of daily life. In this environment we often find content interesting yet cannot locate information about it with conventional keyword search. Searching with recorded media can achieve the goal in such cases. For example, when we hear a piece of music that interests us but know nothing about it, we can record a clip with a recording device, have a search system analyze the clip to obtain its low-level feature values, and use those values to find the music in a music database most likely to contain the clip. Even though we cannot supply keywords for the music, the system can complete the search from the characteristics of the music itself.

The object of the present invention is to provide a signal detection method for recorded media. It can be applied on mobile devices to find similar songs or films from an audio or video recording, and in electronic storage devices to automatically analyze, tag, and organize digital content.

The method that achieves this object extracts features from the signal content supplied by a content provider, clusters them, and builds an index. When a user wants to query a media signal, the signal content of a media device is recorded and analyzed: features are extracted, the distance between each feature and each of the content provider's cluster centers is computed, and the cluster at the smallest distance is taken as the representative. The index file is then used to find the content positions associated with that representative, and the similarity between the recorded signal content and the content near those positions is computed to determine the most similar segment.

To give the Office a fuller understanding of the technical content of the invention, a preferred embodiment is described below.

The present invention is a signal detection method for recorded media that records multimedia signal content, analyzes and processes it, and searches for similar multimedia segments. It comprises at least a multimedia index building method and a recorded-media index matching method, which are linked.

The multimedia index building method may comprise the following steps:

a. Time segmentation: N multimedia files, numbered 1 to N, are divided in chronological order into multimedia segments several seconds long. Each segment is named with the multimedia file number, an underscore, and a time segment number; this number equals the second at which the segment starts in the original file.

b. Feature extraction: if the multimedia segments from step a are video, a 960-dimensional scene-oriented feature (GIST) is extracted; if they are audio, a 13-dimensional Mel-frequency cepstral coefficient (MFCC) feature is extracted.

c. Feature encoding: for each segment, the per-dimension difference between the feature extracted in step b and the feature extracted from the following segment is binarized. A difference greater than 0 is encoded as 1; otherwise it is encoded as 0. A video segment therefore has 960 dimensions of 0 or 1, and an audio segment has 13 dimensions of 0 or 1. This is referred to below as the high-dimensional code or feature code.

d. Index building.
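Steps a and c above can be illustrated with a minimal Python sketch. The function names, the toy three-dimensional features, and the five-second segment length are hypothetical. The direction of the subtraction (current segment minus following segment) and the handling of the last segment, which has no successor, are assumptions, since the text does not state them.

```python
def segment_names(file_no, duration_s, seg_len_s):
    """Step a: name each segment '<file number>_<start second>'."""
    return [f"{file_no}_{start}" for start in range(0, duration_s, seg_len_s)]


def encode_features(features):
    """Step c: binarize the per-dimension difference between consecutive segments.

    features: one feature vector per segment (e.g. 13 MFCC values for audio).
    Only segments that have a successor receive a code, so N segments
    yield N - 1 codes (an assumption of this sketch).
    """
    codes = []
    for current, following in zip(features, features[1:]):
        # Assumed direction: current minus following; a difference > 0 encodes as 1.
        codes.append([1 if c - f > 0 else 0 for c, f in zip(current, following)])
    return codes


names = segment_names(file_no=3, duration_s=20, seg_len_s=5)
toy_features = [[0.5, 1.0, -2.0], [0.2, 1.5, -2.0], [0.9, 0.1, -3.0]]
codes = encode_features(toy_features)
```

A difference of exactly 0 falls into the "otherwise" branch and encodes as 0, which matches the rule stated in step c.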

Step d builds the multimedia index: the high-dimensional feature codes are clustered with a clustering algorithm to obtain cluster centers, and the multimedia file number and time segment number corresponding to each high-dimensional code are recorded in the index of the cluster center closest to that binary code.
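As a rough illustration of this index-building step, the sketch below assigns each code to its nearest center and records the file and segment numbers there. The dictionary layouts and function names are hypothetical, and the centers are assumed to be binary and already produced by clustering; centers from a real clustering run may be real-valued.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_index(codes, centers):
    """Record each segment under the cluster center nearest to its code.

    codes: dict mapping (file_no, segment_no) -> binary code (list of 0/1).
    centers: list of binary codes, assumed to come from a prior clustering step.
    Returns a dict mapping center id -> list of (file_no, segment_no).
    """
    index = defaultdict(list)
    for key, code in codes.items():
        # Ties go to the lowest center id because min() keeps the first minimum.
        nearest = min(range(len(centers)), key=lambda i: hamming(code, centers[i]))
        index[nearest].append(key)
    return dict(index)


centers = [[0, 0, 0, 0], [1, 1, 1, 1]]
codes = {
    (1, 0): [0, 0, 0, 1],
    (1, 5): [1, 1, 0, 1],
    (2, 0): [1, 1, 1, 1],
}
index = build_index(codes, centers)
```

Storing only (file number, segment number) pairs per center keeps the index small; the codes themselves can live in a separate table keyed by the same pair.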

The clustering algorithm is an unsupervised clustering method, and the number of cluster centers is set to the square root of the total number of multimedia items.
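The square-root rule can be sketched together with a plain Lloyd-style k-means, which is one common unsupervised clustering method. Rounding to the nearest integer, using the first k points as initial centers, and the fixed iteration count are all assumptions made for this example, not details given in the text.

```python
import math


def num_clusters(total_items):
    """Square-root rule; rounding to the nearest integer is an assumption."""
    return max(1, round(math.sqrt(total_items)))


def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means using squared Euclidean distance.

    Initial centers are the first k points, a deterministic simplification.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            groups[nearest].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


codes = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
k = num_clusters(len(codes))
centers = kmeans(codes, k)
```

Note that the resulting centers are averages and therefore generally not binary, so comparing a code against them calls for the Euclidean distance rather than the Hamming distance.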

The distance is computed as the Hamming distance or the Euclidean distance.
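Both distance measures are short to write down. The sketch below uses hypothetical function names and a toy pair of five-bit codes. For codes made only of 0 and 1, the squared Euclidean distance equals the Hamming distance, so either measure ranks binary codes in the same order.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


code_a = [1, 0, 1, 1, 0]
code_b = [0, 0, 1, 0, 0]
h = hamming_distance(code_a, code_b)
e = euclidean_distance(code_a, code_b)
```

The Hamming distance applies only when both operands are binary; the Euclidean form also works when one operand is a real-valued cluster center.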

In the signal detection method for recorded media of the present invention, the recorded-media index matching method may comprise the following steps:

a. Time segmentation: the multimedia file is divided in chronological order into multimedia segments several seconds long. Each segment is named with a time segment number equal to the second at which the segment starts in the original file;

b. Feature extraction: if the multimedia segments from step a are video, a 960-dimensional scene-oriented feature (GIST) is extracted; if they are audio, a 13-dimensional Mel-frequency cepstral coefficient (MFCC) feature is extracted;

c. Feature encoding: for each segment, the per-dimension difference between the feature extracted in step b and the feature extracted from the following segment is binarized. A difference greater than 0 is encoded as 1; otherwise it is encoded as 0. A video segment therefore has 960 dimensions of 0 or 1, and an audio segment has 13 dimensions of 0 or 1. This is referred to below as the high-dimensional code or feature code.

d. Index matching.

In step d, index matching compares the feature code of each segment with the cluster centers one by one to find the cluster center closest to that high-dimensional code. Then, for each multimedia file name and time segment listed in that cluster center's index, the distance between the features of the recorded media content and the features of that time segment is computed; the segment with the smallest distance is the most similar multimedia segment.
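The two-stage lookup described here (nearest center first, then only the segments listed under that center) might look like the following sketch. The data layouts, function names, and binary centers are hypothetical choices for illustration, and the Hamming distance is used as one of the two permitted measures.

```python
def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def find_most_similar(query_code, centers, index, stored_codes):
    """Return ((file_no, segment_no), distance) for the closest indexed segment.

    centers: list of binary cluster centers.
    index: dict mapping center id -> list of (file_no, segment_no).
    stored_codes: dict mapping (file_no, segment_no) -> binary code.
    Returns None when the nearest center has no indexed segments.
    """
    nearest = min(range(len(centers)), key=lambda i: hamming(query_code, centers[i]))
    candidates = index.get(nearest, [])
    if not candidates:
        return None
    best = min(candidates, key=lambda key: hamming(query_code, stored_codes[key]))
    return best, hamming(query_code, stored_codes[best])


centers = [[0, 0, 0, 0], [1, 1, 1, 1]]
index = {0: [(1, 0)], 1: [(1, 5), (2, 0)]}
stored_codes = {(1, 0): [0, 0, 0, 1], (1, 5): [1, 1, 0, 1], (2, 0): [1, 1, 1, 1]}
result = find_most_similar([1, 0, 1, 1], centers, index, stored_codes)
```

Only the segments under one center are compared in full, which is where the speed-up over a scan of the whole database comes from.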

The distance used in the index matching of step d is computed as the Hamming distance or the Euclidean distance.

The invention is illustrated here with signal detection for music; the music signal may be in mp3 or wav file format.

Referring to Figure 1, when a content provider wants to index its music content, it can use the multimedia index building method proposed by the present invention and perform the following four steps in order:

a. Cut all music into 38 segments per second.

b. Extract features from each segment using the Mel-frequency cepstral coefficient (MFCC) method. Steps a and b are shown in Figure 3.

c. Take the difference between each segment's MFCC vector and the vector of the following segment. Each value in the difference vector is set to 1 if it is greater than 0 and to 0 otherwise. The result is shown in the first column of Figure 5.

d. Use the K-means algorithm to divide the difference vectors built in step c into 10,0000 clusters (digit grouping as written in the original), as in Figure 4, and build the cluster centers and the index file. The index file format is shown in columns 2 and 3 of Figure 5.
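Step a of this embodiment, cutting audio into 38 segments per second, can be sketched as simple framing of a sample sequence. The 44100 Hz sample rate, the non-overlapping frames, and the dropping of leftover samples are assumptions of this example; the text specifies only the rate of 38 segments per second.

```python
def split_into_frames(samples, sample_rate, frames_per_second=38):
    """Cut a mono sample sequence into consecutive, non-overlapping frames.

    Frame length is sample_rate // frames_per_second samples. Dropping the
    rounding remainder and any trailing partial frame is an assumption.
    """
    frame_len = sample_rate // frames_per_second
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


sample_rate = 44100                     # assumed sample rate, not given in the text
samples = [0.0] * (2 * sample_rate)     # two seconds of silence as placeholder audio
frames = split_into_frames(samples, sample_rate)
```

Each frame would then go to an MFCC extractor (step b), which is left out here because the text does not name a particular implementation.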

Referring to Figure 2, when a user wants to look up music content in the index, the signal detection method for recorded media proposed by the present invention can be used, performing the following four steps in order:

e. Cut the query music into 38 segments per second.

f. Extract features from each segment using the Mel-frequency cepstral coefficient (MFCC) method. Steps e and f are shown in Figure 3.

g. Take the difference between each segment's MFCC vector and the vector of the following segment. Each value in the difference vector is set to 1 if it is greater than 0 and to 0 otherwise. The result is shown in the first column of Figure 5.

h. Compare each code built in step g with the index file built in step d to find all similar segments in the database. For each one, compare the codes of the query segment and its neighboring segments with the codes of the database segment and its neighbors, and sum the Hamming distances. The music in the database with the smallest Hamming distance is the result.

Compared with conventional techniques, the signal detection method for recorded media provided by the present invention has the following advantages:

1. The invention processes multimedia content automatically and generates a multimedia index.

2. The invention automatically finds the corresponding multimedia content segment from a recorded media signal.

3. The invention automatically analyzes the digital content in electronic storage devices for automatic tagging and organization.

4. The invention uses encoding and sampling techniques to find results relatively quickly.

5. The invention uses clustering and fault-tolerance techniques to withstand interference or noise introduced when the media signal is recorded.

The detailed description above addresses one feasible embodiment of the invention. The embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the spirit of the invention falls within the patent scope of this case.

In summary, this case is innovative in its technical concept and offers the effects described above, which conventional methods do not achieve. It fully meets the statutory requirements of novelty and inventive step for an invention patent. The application is filed in accordance with the law, and the Office is respectfully requested to approve it.

100: Time segmentation

200: Feature extraction

300: Feature encoding

400: Index building

500: Index matching

For a fuller understanding of the technical content, objects, and effects of the invention, refer to the detailed description and the accompanying drawings. Figure 1 is a flowchart of the multimedia content index building method of the invention. Figure 2 is a flowchart of the recorded-media signal detection method of the invention. Figure 3 is a schematic of the time segmentation and feature extraction steps of both methods. Figure 4 is an example of feature encoding in both methods. Figure 5 is an example of index building in the multimedia content index building method.


Claims (7)

1. A signal detection method for recorded media, which records multimedia signal content, analyzes and processes it, and searches for similar multimedia segments, comprising at least a multimedia index building method and a recorded-media index matching method that are linked, wherein the multimedia index building method comprises the steps of: a. time segmentation, in which N multimedia files, numbered 1 to N, are divided in chronological order into multimedia segments several seconds long, each segment being named with the multimedia file number, an underscore, and a time segment number equal to the second at which the segment starts in the original file; b. feature extraction, in which a 960-dimensional scene-oriented feature (GIST) is extracted if the multimedia segments from step a are video, and a 13-dimensional Mel-frequency cepstral coefficient (MFCC) feature is extracted if they are audio; c. feature encoding, in which, for each segment, the per-dimension difference between the feature extracted in step b and the feature extracted from the following segment is binarized, a difference greater than 0 being encoded as 1 and otherwise as 0, so that a video segment has 960 dimensions of 0 or 1 and an audio segment has 13 dimensions of 0 or 1, hereinafter referred to as the high-dimensional code or feature code; and d. index building.

2. The signal detection method for recorded media according to claim 1, wherein the index building of step d creates a multimedia index: the high-dimensional feature codes are clustered with a clustering algorithm to obtain cluster centers, and the multimedia file number and time segment number corresponding to each high-dimensional code are recorded in the index of the cluster center closest to that binary code.

3. The signal detection method for recorded media according to claim 2, wherein the clustering algorithm is an unsupervised clustering method, and the number of cluster centers is set to the square root of the total number of multimedia items.

4. The signal detection method for recorded media according to claim 2, wherein the distance is computed as the Hamming distance or the Euclidean distance.

5. The signal detection method for recorded media according to claim 1, wherein the recorded-media index matching method comprises the steps of: a. time segmentation, in which the multimedia file is divided in chronological order into multimedia segments several seconds long, each segment being named with a time segment number equal to the second at which the segment starts in the original file; b. feature extraction, in which a 960-dimensional scene-oriented feature (GIST) is extracted if the multimedia segments from step a are video, and a 13-dimensional Mel-frequency cepstral coefficient (MFCC) feature is extracted if they are audio; c. feature encoding, in which, for each segment, the per-dimension difference between the feature extracted in step b and the feature extracted from the following segment is binarized, a difference greater than 0 being encoded as 1 and otherwise as 0, so that a video segment has 960 dimensions of 0 or 1 and an audio segment has 13 dimensions of 0 or 1, hereinafter referred to as the high-dimensional code or feature code; and d. index matching.

6. The signal detection method for recorded media according to claim 5, wherein the index matching of step d compares the feature code of each segment with the cluster centers one by one to find the cluster center closest to that high-dimensional code, and then, for each multimedia file name and time segment listed in that cluster center's index, computes the distance between the features of the recorded media content and the features of that time segment, the segment with the smallest distance being the most similar multimedia segment.

7. The signal detection method for recorded media according to claim 6, wherein the distance used in the index matching of step d is computed as the Hamming distance or the Euclidean distance.
TW101134398A 2012-09-20 2012-09-20 Signal detection method for recorded media TWI516098B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW101134398A TWI516098B (en) 2012-09-20 2012-09-20 Signal detection method for recorded media
CN2012105322318A CN103065661A (en) 2012-09-20 2012-12-11 Signal detection method for recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW101134398A TWI516098B (en) 2012-09-20 2012-09-20 Signal detection method for recorded media

Publications (2)

Publication Number Publication Date
TW201414289A TW201414289A (en) 2014-04-01
TWI516098B true TWI516098B (en) 2016-01-01

Family

ID=48108256

Family Applications (1)

Application Number Title Priority Date Filing Date
TW101134398A TWI516098B (en) 2012-09-20 2012-09-20 Signal detection method for recorded media

Country Status (2)

Country Link
CN (1) CN103065661A (en)
TW (1) TWI516098B (en)


Also Published As

Publication number Publication date
TW201414289A (en) 2014-04-01
CN103065661A (en) 2013-04-24
