CN104834740A

CN104834740A - Full-automatic audio/video structuralized accurate searching method

Info

Publication number: CN104834740A
Application number: CN201510258687.3A
Authority: CN
Inventors: 常锴; 罗振坤
Original assignee: East Shenzhen Tai Ming Science And Technology Ltd
Priority date: 2015-05-20
Filing date: 2015-05-20
Publication date: 2015-08-12

Abstract

Embodiments of the invention relate to a fully automatic method for structuring and precisely searching audio/video, and in particular to a method that, based on large-scale speech recognition technology, fully automatically structures audio/video content as text and then searches that content with per-second precision. The method is used to structure, rapidly and at scale, the massive audio/video content in internet data, helping users improve the accuracy of audio/video content searches and reducing both search time and the cost of obtaining search results. The method comprises two aspects: a fully automatic data structuring method for audio/video content, and a precise search method for the structured audio/video.

Description

A fully automatic method for structuring and precisely searching audio/video

Technical field

Embodiments of the present invention relate to a fully automatic method for structuring and precisely searching audio/video, and in particular to a method that, based on large-scale speech recognition technology, fully automatically structures audio/video content as text and then searches that content with per-second precision.

Background art

The purpose of automatically structuring and precisely searching audio/video content is to help users find the content they most want, faster and more accurately, among the massive audio/video content on the internet, saving the time and reducing the cost of obtaining precisely the relevant content.

With the rapid development of internet technology and internet services, the data types on the internet keep growing rapidly; besides text and images, they also include large amounts of audio/video. Among these data types, text and images are already structured data, so users can find what they need quickly and precisely. However, the massive audio/video content in internet data has not yet been turned into structured data at scale. How to quickly and effectively apply fully automatic content structuring to massive amounts of audio/video, and how to search that content precisely, has therefore become a problem to be solved.

The audio/video search method in common use today searches the words in manually edited titles, summaries, or tags. Its drawbacks are that the searchable words are limited and are all added manually after the fact, so the objectivity and accuracy of the search results are low. Moreover, this kind of search cannot pinpoint key content at a particular second within an audio/video item.

Another current audio/video search method extracts certain key audio tracks or key frames from the audio/video and uses the static information in them as a feature, which is then matched and screened track by track or frame by frame against the audio/video to be searched. Its drawback is that key tracks or key frames must be repeatedly matched and screened in time order, so the computational load of a search is very large. As the audio/video library to be searched keeps growing, the search efficiency of this method declines exponentially and searches take too long.

Summary of the invention

To solve the above problems and overcome the problems in the related art, embodiments of the present invention disclose a fully automatic method for structuring and precisely searching audio/video. The method is used to structure the massive audio/video content in internet data rapidly and at scale, help users improve the accuracy of audio/video content searches, and reduce both search time and the cost of obtaining search results.

The method disclosed in embodiments of the present invention comprises two aspects: a fully automatic data structuring method for audio/video content, and a precise search method for the structured audio/video.

According to a first aspect of the disclosed embodiments, a fully automatic data structuring method for audio/video content is provided. The process is as follows.

The system automatically extracts, in batches, the audio/video to be structured from the internet or a local area network (LAN), and records the internet or LAN address of each extracted item.

Using audio analysis techniques, the system automatically extracts, in batches, the complete audio track of each extracted audio/video item and compresses it into an audio signal of no less than 16 bits for later use.

The system automatically and logically cuts each extracted track, already compressed into an audio signal of no less than 16 bits, into multiple short tracks whose lengths are measured in seconds.

The system automatically marks the logically cut short tracks, in order, with millisecond-level start and end time codes.
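The logical cutting and time-code marking described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Segment` and `cut_track`, the 10-second default segment length, and the half-open time interval are all assumptions, since the text only says that segments are measured in seconds and marked with millisecond-level time codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    index: int     # position of the short track within the complete track
    start_ms: int  # start time code in milliseconds (inclusive)
    end_ms: int    # end time code in milliseconds (exclusive)


def cut_track(duration_ms: int, segment_seconds: int = 10) -> list[Segment]:
    """Logically cut a track into consecutive short tracks.

    No audio is copied: each Segment only records where it starts and ends.
    The 10-second default is an assumed value, not taken from the patent.
    """
    if duration_ms < 0 or segment_seconds <= 0:
        raise ValueError("duration must be >= 0 and segment length must be > 0")
    step_ms = segment_seconds * 1000
    segments = []
    for index, start_ms in enumerate(range(0, duration_ms, step_ms)):
        # The last segment may be shorter than the others.
        end_ms = min(start_ms + step_ms, duration_ms)
        segments.append(Segment(index, start_ms, end_ms))
    return segments
```

Because the cut is purely logical, the time codes alone are enough to address any short track inside the complete track later on.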

The system automatically submits the logically cut, time-coded short tracks to multiple speech recognition servers simultaneously, in batch multithreaded fashion, and uses speech recognition technology to complete the fully automatic conversion of sound into text characters.
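One way to picture the batch multithreaded submission is a thread pool that sends every short track to a recognizer at once. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the function `fake_recognize` is a hypothetical stand-in for a call to a speech recognition server, and the worker count is an assumed value, since the source names no specific recognizer or protocol.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_recognize(segment):
    """Hypothetical stand-in for a request to a speech recognition server."""
    start_ms, end_ms = segment
    return f"text[{start_ms}-{end_ms}]"


def transcribe_all(segments, recognize=fake_recognize, max_workers=4):
    """Submit every (start_ms, end_ms) segment concurrently.

    Executor.map yields results in input order, so each text fragment
    stays paired with the time codes of the short track it came from.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(recognize, segments))
    return [(start, end, text) for (start, end), text in zip(segments, texts)]
```

Keeping the results in submission order matters here, because the later steps reassemble the fragments sequentially.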

The system automatically retrieves the converted text fragment for each short track, and marks every character in all converted text fragments, in order, with its corresponding millisecond-level start and end time codes.

The system automatically recombines, in order, all the time-coded characters and text fragments into a complete text, in which every character has its own millisecond-level start and end time codes.
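The source does not say how a time code is assigned to each individual character. The sketch below assumes the simplest possible rule, giving every character an equal share of its short track's time span, and then concatenates the fragments in time order. The names `mark_characters` and `assemble` are hypothetical, and a real recognizer might instead return its own per-word or per-character timings.

```python
def mark_characters(start_ms, end_ms, text):
    """Assign each character an equal slice of the segment's time span.

    Uniform spacing is an assumption made for illustration only.
    """
    if not text:
        return []
    span = end_ms - start_ms
    n = len(text)
    return [
        (ch, start_ms + span * i // n, start_ms + span * (i + 1) // n)
        for i, ch in enumerate(text)
    ]


def assemble(transcribed_segments):
    """Recombine (start_ms, end_ms, text) fragments into one complete text.

    Returns the full text and a parallel list of (char, start_ms, end_ms).
    """
    timed_chars = []
    for start_ms, end_ms, text in sorted(transcribed_segments):
        timed_chars.extend(mark_characters(start_ms, end_ms, text))
    full_text = "".join(ch for ch, _, _ in timed_chars)
    return full_text, timed_chars
```

With this layout, the position of a character in the complete text directly indexes its time codes.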

The system automatically and synchronously establishes a complete, unique mapping between the time-coded complete text, its corresponding complete track, and the audio/video to be structured. That is, each sound in the complete track of the audio/video to be structured has one uniquely corresponding text character marked with millisecond-level start and end time codes.

The system automatically enters the internet or LAN address of the audio/video to be structured, its corresponding complete track, and its corresponding unique, time-coded complete text into the structured audio/video index database as character strings.

This completes the fully automatic data structuring process for audio/video content.
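As an illustration of the index record produced by this process, the sketch below stores the address, a reference to the complete track, the complete text, and the per-character time codes as strings in a SQLite table. The table layout, the JSON encoding of time codes, and the function names `open_index`, `add_record`, and `locate` are assumptions; the source only states that these items are entered into the index database as character strings.

```python
import json
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the hypothetical structured audio/video index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS av_index ("
        " address TEXT PRIMARY KEY,"   # internet or LAN address
        " track_ref TEXT NOT NULL,"    # reference to the complete track
        " full_text TEXT NOT NULL,"    # complete recognized text
        " time_codes TEXT NOT NULL)"   # JSON list of [start_ms, end_ms] per character
    )
    return conn


def add_record(conn, address, track_ref, timed_chars):
    """Store one item; timed_chars is a list of (char, start_ms, end_ms)."""
    full_text = "".join(ch for ch, _, _ in timed_chars)
    codes = json.dumps([[start, end] for _, start, end in timed_chars])
    conn.execute(
        "INSERT OR REPLACE INTO av_index VALUES (?, ?, ?, ?)",
        (address, track_ref, full_text, codes),
    )
    conn.commit()


def locate(conn, keyword):
    """Return (address, start_ms, end_ms) for the first hit in each item."""
    if not keyword:
        return []
    hits = []
    rows = conn.execute(
        "SELECT address, full_text, time_codes FROM av_index ORDER BY address"
    )
    for address, full_text, codes in rows:
        pos = full_text.find(keyword)
        if pos >= 0:
            spans = json.loads(codes)
            hits.append((address, spans[pos][0], spans[pos + len(keyword) - 1][1]))
    return hits
```

Because every character carries its own time codes, a keyword hit translates directly into a playback position within the audio/video item.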

According to a second aspect of the disclosed embodiments, a precise search method for structured audio/video is provided. The process is as follows.

The system receives a precise video search request initiated by a user. The request carries at least keyword characters for the video content, or characters describing the video as the user subjectively imagines it.

Using full-text retrieval, the system automatically extracts, from the structured audio/video index database described in the first aspect, the character strings that exactly match the user's search request. It then uses a clustering algorithm to determine the audio/video resources to present as search results, and determines a string matching score for each resource to be presented.

Using contextual semantic analysis, the system automatically extracts, from the structured audio/video index database described in the first aspect, the character strings that approximately match the user's search request. It then uses a clustering algorithm to determine the audio/video resources to present as search results, and determines a semantic matching score for each resource to be presented.

The system automatically calculates the final score of each audio/video resource to be presented using the formula: final score = string matching score + semantic matching score.

According to the final score of each audio/video resource to be presented, the system returns the final search results to the user as a list in descending order.
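The scoring and ranking steps can be sketched as follows. The source does not specify the clustering algorithm or how individual match scores are computed, so this example simply groups hits by resource address as a stand-in for clustering and takes the per-hit scores as given inputs. The names `group_hits` and `rank` are hypothetical.

```python
from collections import defaultdict


def group_hits(hits):
    """Sum the scores of (address, score) hits per audio/video resource.

    Plain grouping by address stands in for the unspecified clustering step.
    """
    totals = defaultdict(float)
    for address, score in hits:
        totals[address] += score
    return dict(totals)


def rank(string_hits, semantic_hits):
    """Final score = string matching score + semantic matching score.

    Returns (address, final_score) pairs in descending score order;
    ties are broken by address so the ordering is deterministic.
    """
    string_scores = group_hits(string_hits)
    semantic_scores = group_hits(semantic_hits)
    addresses = set(string_scores) | set(semantic_scores)
    final = {
        a: string_scores.get(a, 0.0) + semantic_scores.get(a, 0.0)
        for a in addresses
    }
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))
```

A resource found by only one of the two retrieval modes still appears in the list, with the missing score counted as zero; whether the patent intends this is not stated.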

Brief description of the drawings

Fig. 1 on the drawings page of the specification is a flowchart of an implementation of the fully automatic method for structuring and precisely searching audio/video in an embodiment of the present invention.

Claims

1. A fully automatic method for structuring and precisely searching audio/video, characterized in that the method mainly comprises: an automatic audio/video extraction module, an automatic sound-to-text conversion module, an automatic sound-to-text matching module, and a structured content search module.

2. The automatic audio/video extraction module of the fully automatic method for structuring and precisely searching audio/video according to claim 1, characterized in that it automatically extracts, in batches, the audio/video to be structured from the internet or a local area network (LAN) and records the corresponding internet or LAN addresses, extracts and compresses the corresponding tracks, logically cuts each track into multiple short tracks whose lengths are measured in seconds, and marks them in order with start and end time codes.

3. The automatic sound-to-text conversion module of the fully automatic method for structuring and precisely searching audio/video according to claim 1, characterized in that it uses speech recognition technology, in batch multithreaded fashion, to complete the fully automatic conversion of the sound of all the above short tracks into text characters, and marks each character in all converted text fragments, in order, with its corresponding start and end time codes.

4. The automatic sound-to-text matching module of the fully automatic method for structuring and precisely searching audio/video according to claim 1, characterized in that the system automatically and synchronously establishes a complete, unique mapping between the above complete text marked with millisecond-level start and end time codes, its corresponding complete track, and the audio/video to be structured, so that each sound in the complete track has one uniquely corresponding text character marked with millisecond-level start and end time codes; and in that the system automatically enters the internet or LAN address of the audio/video to be structured, its corresponding complete track, and its corresponding unique, time-coded complete text into the structured audio/video index database as character strings.

5. The structured content search module of the fully automatic method for structuring and precisely searching audio/video according to claim 1, characterized in that the system automatically extracts, by full-text retrieval from the above structured audio/video index database, the character strings that exactly match the user's search request, and uses a clustering algorithm to determine the audio/video resources to present as search results; and in that the system automatically extracts, by contextual semantic analysis from the above structured audio/video index database, the character strings that approximately match the user's search request, and uses a clustering algorithm to determine the audio/video resources to present as search results.