CN101673264B

CN101673264B - Audio content searching device

Info

Publication number: CN101673264B
Application number: CN2008100428555A
Authority: CN
Inventors: 连惠城; 程建章
Original assignee: Chuanxian Network Technology Shanghai Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2008-09-12
Filing date: 2008-09-12
Publication date: 2012-11-07
Anticipated expiration: 2028-09-12
Also published as: CN101673264A

Abstract

The invention discloses an audio content searching device comprising an audio fingerprint extraction module, an audio fingerprint segmentation module, an index generation module and a search module. The audio fingerprint extraction module extracts the audio fingerprints of a plurality of audio files; the audio fingerprint segmentation module is connected with the audio fingerprint extraction module and segments the extracted audio fingerprints; the index generation module is connected with the audio fingerprint segmentation module and generates an audio fingerprint index from the segmentation result; and the search module is connected with the index generation module and searches for matching audio files by using the audio fingerprint index. The device first applies the word segmentation technique of text search engines to the audio fingerprint files, then applies the indexing technique of the text search field to index the audio fingerprints; once indexing is complete, the search engine can search for an audio segment input by a user. The device thus makes searching convenient for the user and improves search efficiency.

Description

Audio content searching device

Technical field

The present invention relates to an audio content searching device.

Background technology

With the development of the Internet, search engines have become one of the essential tools for going online. Traditional search engines are all based on text search and are called text search engines. They work as follows: the search engine server collects a large number of web pages, extracts the text from them according to established rules, and performs word segmentation on it. Common segmentation methods include methods based on string matching, methods based on understanding, and methods based on statistics. The text search engine then uses a text dictionary to build an index table for fast searching. When searching, the user submits text to the server; the server segments the text, looks it up quickly in the index table, and returns the results.

At present, search engines are all text-based. Even engines that search for pictures or audio do so through textual information such as the title, description, introduction or tags of the picture or audio program. No search engine yet searches directly on the signal content of the audio.

Audio fingerprinting was proposed long ago. For example, Jaap Haitsma and Ton Kalker published "A Highly Robust Audio Fingerprinting System" in the Proceedings of the International Conference on Music Information Retrieval in 2002. Using signal processing, that system converts the audio signal in each short interval (for example 11.6 ms) of an audio file into a fingerprint of 32 bits, so that an audio file can be converted into a fingerprint file. Once all audio fingerprint files have been indexed into a table, the system can retrieve audio fingerprints quickly.
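The text above does not spell out how each 32-bit fingerprint is derived. As a minimal sketch, assuming the sign-of-energy-difference scheme generally attributed to the cited paper, each bit can be taken from the sign of the energy difference between adjacent frequency bands, differenced again against the previous frame. The function name `fingerprint_frames` and the precomputed `band_energy` input (33 bands per frame) are assumptions made for illustration, not part of this document.

```python
from typing import List


def fingerprint_frames(band_energy: List[List[float]]) -> List[int]:
    """Turn per-frame band energies into one 32-bit fingerprint per frame.

    band_energy[n][m] is the energy of band m in frame n. Each frame is
    assumed to carry 33 bands, so that 32 adjacent-band differences exist.
    The first frame has no predecessor and therefore yields no fingerprint.
    """
    fingerprints = []
    for n in range(1, len(band_energy)):
        cur, prev = band_energy[n], band_energy[n - 1]
        value = 0
        for m in range(32):
            # Difference between adjacent bands, differenced again over time.
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            value = (value << 1) | (1 if diff > 0 else 0)
        fingerprints.append(value)
    return fingerprints


# Hypothetical energies: a flat frame, a falling spectrum, then flat again.
frames = [[0.0] * 33, [float(33 - m) for m in range(33)], [0.0] * 33]
result = fingerprint_frames(frames)
```

With one fingerprint every 11.6 ms, a file of about 10,000 fingerprints corresponds to roughly 116 seconds of audio.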

When the number of audio fingerprint files is small (for example 10,000), all fingerprint files can be held in computer memory and, once indexed, retrieved quickly and conveniently. The paper cited above, "A Highly Robust Audio Fingerprinting System", gives the detailed steps of this method. In practice, however, the number of audio files far exceeds 10,000. For example, the number of audio files on the Internet currently exceeds 10 million and keeps growing. It is therefore difficult to build a practical search engine with this method.
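The scaling problem can be made concrete with a rough estimate. The sketch below assumes 32-bit fingerprints and an average of about 10,000 fingerprints per file (the average figure given in the embodiment), and counts raw fingerprint data only, ignoring any index overhead. The constant and function names are illustrative.

```python
BYTES_PER_FINGERPRINT = 4        # one 32-bit fingerprint
FINGERPRINTS_PER_FILE = 10_000   # assumed average per file


def corpus_bytes(num_files: int) -> int:
    """Raw size of all fingerprint files, without any index structures."""
    return num_files * FINGERPRINTS_PER_FILE * BYTES_PER_FINGERPRINT


small = corpus_bytes(10_000)       # the in-memory case
large = corpus_bytes(10_000_000)   # the Internet-scale case
```

Under these assumptions, 10,000 files need about 400 MB of raw fingerprint data, while 10 million files need about 400 GB, which is why keeping every fingerprint file in memory stops being practical.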

Summary of the invention

To solve the above technical problem, the present invention provides an audio content searching device; this search engine is called an audio fingerprint search engine.

The present invention adopts the following technical solution:

An audio content searching device, comprising:

An audio fingerprint extraction module, used to extract the audio fingerprints of a plurality of audio files;

An audio fingerprint segmentation module, connected with said audio fingerprint extraction module and used to segment the extracted audio fingerprints;

An index generation module, connected with said audio fingerprint segmentation module and used to generate an audio fingerprint index from the segmentation result;

A search module, connected with said index generation module and used to search for matching audio files using the audio fingerprint index.

Further, the device also comprises a storage module connected between said index generation module and said search module, used to store the audio fingerprints, said audio fingerprint index and the corresponding audio files.

The present invention applies the word segmentation technique of text search engines to the audio fingerprint files, and then applies the indexing technique of the text search field to index the audio fingerprints. Once indexing is complete, the search engine can search for an audio fragment input by the user. This both makes searching convenient for the user and improves search efficiency.

The present invention is further described below with reference to the accompanying drawing and an embodiment.

Description of drawings

Fig. 1 is a schematic structural diagram of an embodiment of the audio fingerprint searching device of the present invention.

Embodiment

As shown in Fig. 1, an audio content searching device comprises the audio fingerprint extraction module, the audio fingerprint segmentation module, the index generation module and the search module set out in the summary of the invention.

The device also comprises a storage module connected between said index generation module and said search module, used to store the audio fingerprints, said audio fingerprint index and the corresponding audio files.

A search can take as input the audio file or audio file fragment to be retrieved: its audio fingerprint is extracted by said audio fingerprint extraction module and segmented by the audio fingerprint segmentation module, and matching audio files are then searched for in the audio fingerprint index according to the segmentation result. Alternatively, the user can input an audio fingerprint directly; after the audio fingerprint segmentation module segments it, matching audio files are searched for in the audio fingerprint index according to the segmentation result.

The segmentation in the above embodiment can be implemented in several ways; some of them are listed and explained below.
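The index and search steps can be sketched as an inverted index that maps each fingerprint word to the files containing it. The text does not specify the index layout or how matches are ranked, so the names `build_index` and `search`, the use of file names as identifiers, and the ranking by number of shared words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A fingerprint word is a run of consecutive fingerprints.
FingerprintWord = Tuple[int, ...]


def build_index(
    segmented: Dict[str, Iterable[FingerprintWord]]
) -> Dict[FingerprintWord, Set[str]]:
    """Map every fingerprint word to the set of files in which it occurs."""
    index: Dict[FingerprintWord, Set[str]] = defaultdict(set)
    for file_id, words in segmented.items():
        for word in words:
            index[word].add(file_id)
    return index


def search(
    index: Dict[FingerprintWord, Set[str]],
    query_words: Iterable[FingerprintWord],
    top_k: int = 5,
) -> List[Tuple[str, int]]:
    """Rank files by how many distinct query words they share (assumed scoring)."""
    votes: Counter = Counter()
    for word in set(query_words):
        for file_id in index.get(word, ()):
            votes[file_id] += 1
    return votes.most_common(top_k)


# Hypothetical segmented corpus and query.
corpus = {"a.mp3": [(1, 2), (3,)], "b.mp3": [(3,), (4, 5)]}
index = build_index(corpus)
hits = search(index, [(1, 2), (3,)])
```

Because the query is segmented by the same module as the indexed files, a lookup touches only the files that share at least one fingerprint word with the query rather than scanning every fingerprint file.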

Mode one

Segmentation of the audio fingerprints is performed with a statistics-based Chinese word segmentation method. First, 15,000 audio files are converted into fixed-width fingerprint files by the method of Jaap Haitsma and Ton Kalker cited above; the width can be 32 bits or 16 bits, and each resulting fingerprint file consists on average of about 10,000 fixed-width fingerprints. Each 32-bit or 16-bit value is treated as one Chinese character. The 15,000 fingerprint files, each made up of such "characters", are treated as 15,000 "articles", and these "articles" serve as the corpus for Chinese word segmentation. During the statistics, the frequency of each combination of adjacently co-occurring "characters" in the audio corpus is counted. A combination with a high co-occurrence frequency is regarded as a word, called a "fingerprint word". For example, the combination of 7 consecutive fingerprints with the binary value "00000000000000000000000000000000" and the combination of 5 consecutive fingerprints with the binary value "11111111111111111111111111111111" are both found by the statistics to occur with high frequency, so they are used as "fingerprint words".
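The statistical step can be sketched as counting every run of adjacent fingerprints in the corpus and keeping the frequent runs as fingerprint words. The text does not state a frequency threshold, a maximum word length, or how a new fingerprint sequence is cut once the words are known; the `min_count` and `max_len` parameters and the greedy longest-match segmentation below are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Word = Tuple[int, ...]


def count_combinations(corpus: Iterable[Sequence[int]], max_len: int = 7) -> Counter:
    """Count every run of 2..max_len adjacent fingerprints across all files."""
    counts: Counter = Counter()
    for fingerprints in corpus:
        for n in range(2, max_len + 1):
            for i in range(len(fingerprints) - n + 1):
                counts[tuple(fingerprints[i:i + n])] += 1
    return counts


def pick_words(counts: Counter, min_count: int) -> Set[Word]:
    """Keep combinations whose co-occurrence frequency reaches the threshold."""
    return {combo for combo, c in counts.items() if c >= min_count}


def segment(fingerprints: Sequence[int], words: Set[Word], max_len: int = 7) -> List[Word]:
    """Greedy longest-match segmentation; unknown fingerprints stand alone."""
    out: List[Word] = []
    i = 0
    while i < len(fingerprints):
        for n in range(min(max_len, len(fingerprints) - i), 0, -1):
            candidate = tuple(fingerprints[i:i + n])
            if n == 1 or candidate in words:
                out.append(candidate)
                i += n
                break
    return out


# Tiny hypothetical corpus in which the run (0, 0, 0) recurs in every file.
corpus = [[0, 0, 0, 5, 9], [7, 0, 0, 0, 1], [0, 0, 0, 2]]
counts = count_combinations(corpus, max_len=3)
words = pick_words(counts, min_count=3)
pieces = segment([4, 0, 0, 0, 8], words, max_len=3)
```

In this toy corpus the run of three zero fingerprints plays the role of the all-zero "fingerprint word" in the example above, while fingerprints that belong to no frequent combination are emitted as single-fingerprint words.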

Mode two

An audio fingerprint extraction method with a fingerprint width of 16 bits is used. Specifically, the 32-bit fingerprints of Mode one are sampled at intervals to obtain 16-bit fingerprints. Segmentation of the audio fingerprints is then performed with the same statistics-based Chinese word segmentation method as in Mode one.
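The text does not say which bits the interval sampling keeps. One plausible reading, shown here purely as an assumption, is to keep every other bit of the 32-bit fingerprint, starting from the least significant bit. The function name `downsample_fingerprint` is illustrative.

```python
def downsample_fingerprint(fp32: int) -> int:
    """Keep bits 0, 2, 4, ..., 30 of a 32-bit fingerprint, giving 16 bits."""
    fp16 = 0
    for k in range(16):
        bit = (fp32 >> (2 * k)) & 1  # sample every other bit
        fp16 |= bit << k
    return fp16


all_ones = downsample_fingerprint(0xFFFFFFFF)
even_bits = downsample_fingerprint(0x55555555)  # only even-numbered bits set
odd_bits = downsample_fingerprint(0xAAAAAAAA)   # only odd-numbered bits set
```

Halving the width shrinks the "character" alphabet from 2^32 to 2^16 possible values, so identical adjacent combinations become more frequent in the corpus statistics.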

Claims

1. An audio content searching device, characterized by comprising:

A search module, connected with said index generation module and used to search for matching audio files using the audio fingerprint index;

Said audio fingerprint segmentation comprises the following steps:

1) generating an audio fingerprint file of fixed width from an audio file;

2) counting the frequency of combinations of adjacently co-occurring fingerprints in the audio fingerprint files;

3) regarding combinations with a high co-occurrence frequency as words.

2. The audio content searching device according to claim 1, characterized by further comprising: a storage module connected between said index generation module and said search module, used to store the audio fingerprints, said audio fingerprint index and the corresponding audio files.