CN106055570A

CN106055570A - Video retrieval device based on audio data and video retrieval method for same

Info

Publication number: CN106055570A
Application number: CN201610339063.9A
Authority: CN
Inventors: 高万林; 李佳璇; 冯慧; 张莉; 于丽娜; 宋越
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2016-05-19
Filing date: 2016-05-19
Publication date: 2016-10-26

Abstract

The invention discloses a video retrieval device based on audio data and a video retrieval method for the same. The device comprises a video database module, a first audio-video separation module, an audio database module, an audio-video data receiving module, a second audio-video separation module, an audio data matching module and a video retrieval display module, wherein the video database module is used to store video data; the first audio-video separation module is used to separate the audio data from the video data in the video database module; the audio database module is used to store the audio data obtained by the first audio-video separation module; the audio-video data receiving module is used to receive audio or video data input by a user; the second audio-video separation module is used to separate the audio data from received video data after the audio-video data receiving module receives the video data; the audio data matching module is used to match the audio data input by the user, or the audio data obtained by the second audio-video separation module, against the audio data in the audio database module, so that one or more pieces of target audio data can be obtained; and the video retrieval display module is used to display to the user the target video data corresponding to the target audio data.

Description

Video retrieval device based on audio data and video retrieval method thereof

Technical field

The present invention relates to the field of multimedia technology, and in particular to a video retrieval device based on audio data and a video retrieval method thereof.

Background

In the era of big data, video data is growing rapidly, with many varieties and an enormous volume. How to retrieve video in real time, efficiently and accurately is one of the problems that today's information society urgently needs to solve. People are no longer satisfied with obtaining video content through its metadata (such as the video name or author); they would rather use a short video clip of unknown origin to obtain, quickly and intelligently, complete information about the video that contains it. Content-based video retrieval has therefore become a research hotspot in recent years. A video is an aggregate of data that carries many kinds of information, such as images, text and sound, so current content-based video retrieval usually combines several information modalities. Image retrieval is usually the main retrieval mode, while audio information usually serves as auxiliary information for optimizing the retrieval; research that starts from the audio alone is scarce. On the other hand, content-based audio retrieval aims to retrieve the complete information of an audio item from features of the audio content itself, and content-based music retrieval has already been implemented in many apps. However, there is as yet no reasonably complete research on a system that extends this "identify a song by listening" function to video.

Summary of the invention

In view of the above problems, the present invention proposes a video retrieval device based on audio data and a video retrieval method thereof that overcome the above problems or at least partially solve them.

To this end, in a first aspect, the present invention proposes a video retrieval device based on audio data, comprising:

a video database module, configured to store video data and to receive video data input by a user and/or an administrator for updating the video database;

a first audio-video separation module, configured to separate the audio data from the video data stored in the video database module;

an audio database module, configured to store the audio data separated by the first audio-video separation module;

an audio-video data receiving module, configured to receive audio data or video data input by a user;

a second audio-video separation module, configured to separate, after the audio-video data receiving module receives video data, the audio data from the video data received by the audio-video data receiving module;

an audio data matching module, configured to match the audio data input by the user, or the audio data separated by the second audio-video separation module, against the audio data stored in the audio database module to obtain one or more pieces of target audio data, the target audio data being the audio data stored in the audio database module that matches the audio data input by the user; and

a video retrieval display module, configured to display to the user the target video data corresponding to the one or more pieces of target audio data, the target video data being video data stored in the video database module.

Optionally, the first audio-video separation module comprises:

a separation submodule, configured to separate the audio data from the video data stored in the video database module; and

a labeling submodule, configured to add a label to the audio data separated by the separation submodule, the label indicating the correspondence between the audio data and the video data.

Correspondingly, the audio database module is configured to store the labeled audio data.
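The labeling step can be pictured as attaching the source video's identifier to each separated audio track. The following is only a minimal sketch: the record layout, the `audio-of-...` key format and the names `LabeledAudio`, `label_audio` and `audio_db` are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledAudio:
    audio_id: str   # hypothetical key used in the audio database
    video_id: str   # the label: identifies the video this audio came from
    samples: tuple  # placeholder for the separated audio samples


def label_audio(video_id, samples):
    """Attach a label that ties separated audio back to its source video."""
    return LabeledAudio(
        audio_id=f"audio-of-{video_id}",  # assumed naming scheme
        video_id=video_id,
        samples=tuple(samples),
    )


# The audio database stores the labeled record, so a later match on the
# audio can be mapped back to the corresponding video.
audio_db = {}
record = label_audio("video-001", [0.1, -0.2, 0.3])
audio_db[record.audio_id] = record
```

Storing the label alongside the audio is what lets the display module look up the target video once a target audio item has been found.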

Optionally, the device further comprises:

a first audio fingerprint extraction module, configured to extract audio fingerprints from the audio data stored in the audio database module based on a preset audio fingerprint extraction rule;

a fingerprint database module, configured to store the audio fingerprints extracted by the first audio fingerprint extraction module;

an index database module, configured to store the index relationship between the audio fingerprints extracted by the first audio fingerprint extraction module and the audio data; and

a first audio classification module, configured to classify the audio data stored in the audio database module based on the audio fingerprints stored in the fingerprint database module.
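The text does not specify the fingerprint extraction rule or the classification scheme, so the sketch below substitutes deliberately simple stand-ins to show how the fingerprint database, the index database and the classification result could relate. The energy-difference fingerprint, the "rising"/"steady" categories and all names (`toy_fingerprint`, `toy_category`, `register`) are assumptions made for illustration only.

```python
def toy_fingerprint(samples, frame=4):
    """Toy rule: one bit per frame pair, set when frame energy rises."""
    energies = [
        sum(x * x for x in samples[i:i + frame])
        for i in range(0, len(samples) - frame + 1, frame)
    ]
    return tuple(int(b > a) for a, b in zip(energies, energies[1:]))


def toy_category(fingerprint):
    """Toy classifier: 'rising' if more than half of the bits are set."""
    return "rising" if 2 * sum(fingerprint) > len(fingerprint) else "steady"


fingerprint_db = {}  # fingerprint_id -> fingerprint (fingerprint database)
index_db = {}        # audio_id -> fingerprint_id (index database)
category_of = {}     # audio_id -> category (classification result)


def register(audio_id, samples):
    """Extract, store, index and classify one stored audio item."""
    fingerprint = toy_fingerprint(samples)
    fingerprint_id = f"fp-{audio_id}"  # assumed identifier scheme
    fingerprint_db[fingerprint_id] = fingerprint
    index_db[audio_id] = fingerprint_id
    category_of[audio_id] = toy_category(fingerprint)
    return fingerprint


register("a1", [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
register("a2", [2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0])
```

A real system would replace both toy functions with a robust fingerprinting scheme and a trained or rule-based audio classifier; only the relationship between the three stores is the point here.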

Optionally, the device further comprises:

a second audio fingerprint extraction module, configured to extract, based on the preset audio fingerprint extraction rule, an audio fingerprint from the audio data input by the user and received by the audio-video data receiving module, or from the audio data separated by the second audio-video separation module; and

a second audio classification module, configured to classify the audio data input by the user, or the audio data separated by the second audio-video separation module, based on the audio fingerprint extracted by the second audio fingerprint extraction module.

Optionally, the audio data matching module comprises:

a subunit for determining audio data to be retrieved, configured to determine each piece of audio data to be retrieved from the audio data stored in the audio database module, based on the category of the audio data obtained by the second audio classification module and the categories, obtained by the first audio classification module, of the audio data stored in the audio database module, wherein the category of each piece of audio data to be retrieved is the same as the category of the audio data obtained by the second audio classification module;

a subunit for determining audio fingerprints of the audio data to be retrieved, configured to determine the audio fingerprint corresponding to each piece of audio data to be retrieved, based on the index relationship between audio fingerprints and audio data stored in the index database module; and

an audio fingerprint matching subunit, configured to match the audio fingerprint obtained by the second audio fingerprint extraction module against the audio fingerprint corresponding to each piece of audio data to be retrieved, as determined by the subunit for determining audio fingerprints of the audio data to be retrieved, to obtain one or more pieces of target audio data.
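The three subunits amount to a filter, a lookup and a comparison. The sketch below illustrates that flow under stated assumptions: fingerprints are modeled as bit tuples, similarity is the fraction of matching bits, and the 0.7 threshold, the category names and the function names are all hypothetical.

```python
def bit_similarity(a, b):
    """Fraction of positions at which two bit-tuple fingerprints agree."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def match(query_fp, query_category, category_of, index_db, fingerprint_db,
          threshold=0.7):
    # Subunit 1: keep only stored audio in the same category as the query.
    candidates = [aid for aid, cat in category_of.items()
                  if cat == query_category]
    # Subunit 2: look up each candidate's fingerprint through the index.
    candidate_fps = {aid: fingerprint_db[index_db[aid]] for aid in candidates}
    # Subunit 3: compare fingerprints and keep sufficiently similar items.
    return [aid for aid, fp in candidate_fps.items()
            if bit_similarity(query_fp, fp) > threshold]


category_of = {"a1": "speech", "a2": "speech", "a3": "music"}
index_db = {"a1": "f1", "a2": "f2", "a3": "f3"}
fingerprint_db = {"f1": (1, 0, 1, 1), "f2": (0, 1, 0, 0), "f3": (1, 0, 1, 1)}

targets = match((1, 0, 1, 0), "speech", category_of, index_db, fingerprint_db)
```

Filtering by category first means fingerprints are compared only within one class, which is presumably why the classification modules exist: it shrinks the search space before the more expensive comparison.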

In a second aspect, the present invention further proposes a video retrieval method based on the device of the first aspect, comprising:

receiving, by the audio-video data receiving module, audio data or video data input by a user;

after the audio-video data receiving module receives video data, separating, by the second audio-video separation module, the audio data from the video data received by the audio-video data receiving module;

matching, by the audio data matching module, the audio data input by the user, or the audio data separated by the second audio-video separation module, against the audio data stored in the audio database module to obtain one or more pieces of target audio data, the target audio data being the audio data stored in the audio database module that matches the audio data input by the user; and

displaying to the user, by the video retrieval display module, the target video data corresponding to the one or more pieces of target audio data, the target video data being video data stored in the video database module, wherein the audio data in the audio database module is obtained by the first audio-video separation module separating the video data in the video database module.

Optionally, after the audio-video data receiving module receives the audio data or video data input by the user, the method further comprises:

extracting, by the second audio fingerprint extraction module and based on a preset audio fingerprint extraction rule, an audio fingerprint from the audio data input by the user and received by the audio-video data receiving module, or from the audio data separated by the second audio-video separation module; and

classifying, by the second audio classification module, the audio data input by the user, or the audio data separated by the second audio-video separation module, based on the audio fingerprint extracted by the second audio fingerprint extraction module.

Optionally, the matching, by the audio data matching module, of the audio data input by the user, or the audio data separated by the second audio-video separation module, against the audio data stored in the audio database module to obtain one or more pieces of target audio data comprises:

determining, by the audio data matching module, each piece of audio data to be retrieved from the audio data stored in the audio database module, based on the category of the audio data obtained by the second audio classification module and the categories, obtained by the first audio classification module, of the audio data stored in the audio database module, wherein the category of each piece of audio data to be retrieved is the same as the category of the audio data obtained by the second audio classification module;

determining, by the audio data matching module, the audio fingerprint corresponding to each piece of audio data to be retrieved, based on the index relationship between audio fingerprints and audio data stored in the index database module; and

matching, by the audio data matching module, the audio fingerprint obtained by the second audio fingerprint extraction module against the audio fingerprint corresponding to each piece of audio data to be retrieved, as determined by the subunit for determining audio fingerprints of the audio data to be retrieved, to obtain one or more pieces of target audio data.

Compared with the prior art, the video retrieval device based on audio data and the video retrieval method thereof proposed by the present invention retrieve the entire complete video containing similar audio content from the audio in a short video clip that interests the user, overcoming the deficiency that existing video retrieval schemes do not perform retrieval based solely on the audio data in the video.

Brief description of the drawings

Fig. 1 is a structural diagram of a video retrieval device based on audio data provided by a first embodiment of the present invention;

Fig. 2 is a flowchart of a video retrieval method of a video retrieval device based on audio data provided by a second embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention.

It should be noted that, herein, "first" and "second" are used only to distinguish items with the same name and do not imply any relationship or order between them.

As shown in Fig. 1, this embodiment discloses a video retrieval device based on audio data, which may include the following modules: a video database module 11, a first audio-video separation module 12, an audio database module 13, an audio-video data receiving module 14, a second audio-video separation module 15, an audio data matching module 16 and a video retrieval display module 17. Each module is described in detail below:

The video database module 11 is configured to store video data and to receive video data input by a user and/or an administrator for updating the video database. In this embodiment, the video data stored in the video database module 11 can be updated, and either a user or an administrator may update the video database module 11. In a specific application, the video database module 11 may be implemented by storage hardware, such as a hard disk, in combination with the associated database software.

The first audio-video separation module 12 is configured to separate the audio data from the video data stored in the video database module 11. In this embodiment, the first audio-video separation module 12 separates out the audio data of the video data stored in the video database module 11, which facilitates video retrieval based on audio data. In a specific application, the first audio-video separation module 12 may be implemented by processor hardware such as a microcontroller, a DSP or an ARM processor.
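The text leaves the separation technique open. One common software approach, offered here purely as an assumption and not as the patent's method, is to call the FFmpeg command-line tool and drop the video stream. The helper names, the mono 16 kHz output format and the file names are illustrative choices.

```python
import subprocess


def build_extract_command(video_path, audio_path, sample_rate=16000):
    """Build an FFmpeg command that keeps only the audio track.

    -y   overwrite the output file if it exists
    -i   input file
    -vn  disable video in the output
    -ac  number of audio channels (1 = mono)
    -ar  audio sample rate in Hz
    """
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        audio_path,
    ]


def extract_audio(video_path, audio_path):
    """Run the extraction; requires the ffmpeg binary to be installed."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)


command = build_extract_command("in.mp4", "out.wav")
```

Keeping command construction separate from execution makes the step easy to inspect or log before any file is touched.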

The audio database module 13 is configured to store the audio data separated by the first audio-video separation module 12. In this embodiment, the audio database module 13 may be implemented by storage hardware, such as a hard disk, in combination with the associated database software.

The audio-video data receiving module 14 is configured to receive audio data or video data input by a user. In this embodiment, the audio-video data receiving module 14 may consist of a microphone, a denoising device, a USB interface and a display. The display provides the user's operation interface, through which the user can choose to play a video clip directly or to copy the video clip to the terminal that carries the video retrieval device. Auxiliary information about the queried data can also be provided in the operation interface, including whether the audio type is unique, the main audio type, whether this is the first query, and so on.

The second audio-video separation module 15 is configured to separate, after the audio-video data receiving module 14 receives video data, the audio data from the video data received by the audio-video data receiving module 14. In a specific application, if the user inputs audio data through the microphone, the denoising device denoises the audio data and then passes it to the audio data matching module 16; if the user inputs video data through the USB interface, the second audio-video separation module 15 separates out the audio data in the video data and passes it to the audio data matching module 16.
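The two input paths described above can be sketched as a small dispatch function. The noise gate standing in for the denoising device, the dictionary layout of the video payload and the source names `"microphone"` and `"usb_video"` are assumptions for illustration.

```python
def denoise(samples, floor=0.01):
    """Hypothetical noise gate: zero out samples quieter than `floor`."""
    return [0.0 if abs(x) < floor else x for x in samples]


def separate_audio(video):
    """Stand-in for the second audio-video separation module."""
    return list(video["audio_track"])


def route_input(source, payload):
    """Return the audio that should be handed to the matching module."""
    if source == "microphone":
        # Microphone path: denoise, then forward to matching.
        return denoise(payload)
    if source == "usb_video":
        # USB video path: separate the audio track, then forward to matching.
        return separate_audio(payload)
    raise ValueError(f"unsupported input source: {source}")


mic_audio = route_input("microphone", [0.5, 0.001, -0.3])
usb_audio = route_input("usb_video", {"audio_track": (1, 2)})
```

Either path ends with plain audio data, so the matching module does not need to know which input the user chose.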

The audio data matching module 16 is configured to perform audio similarity matching between the audio data input by the user, or the audio data separated by the second audio-video separation module 15, and the audio data stored in the audio database module 13 to obtain one or more pieces of target audio data, the target audio data being the audio data stored in the audio database module 13 that matches the audio data input by the user. In this embodiment, audio data in the audio database module 13 whose audio similarity is greater than a preset audio similarity threshold may be taken as the target audio data.
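The threshold rule can be stated in a few lines. In this sketch the similarity scores are assumed to be precomputed numbers between 0 and 1, since the text does not say how similarity is measured; `select_targets` and the sample scores are hypothetical.

```python
def select_targets(similarity_by_audio, threshold):
    """Keep stored audio whose similarity strictly exceeds the threshold."""
    return [audio_id for audio_id, score in similarity_by_audio.items()
            if score > threshold]


scores = {"a1": 0.62, "a2": 0.91, "a3": 0.80}
targets = select_targets(scores, threshold=0.80)
```

The comparison is strict because the text says "greater than" the preset threshold, so an item scoring exactly at the threshold is not selected.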

The video retrieval display module 17 is configured to display to the user the target video data corresponding to the one or more pieces of target audio data, the target video data being video data stored in the video database module 11. In this embodiment, if there are multiple pieces of target video data, the video retrieval display module 17 may sort them in descending order of audio similarity and display the sorted target video data to the user. Of course, to make the retrieval result more useful, only the several top-ranked pieces of video data may be selected for display, for example the top 3. The user can select among the displayed video data.
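The ordering and top-3 cut described above can be sketched as follows. The mapping from audio identifiers to video identifiers stands in for the label that ties stored audio to its video; the function name and the sample data are illustrative assumptions.

```python
def rank_results(similarity_by_audio, video_of_audio, top_k=3):
    """Sort matches by descending similarity and keep the first top_k."""
    ranked = sorted(similarity_by_audio.items(),
                    key=lambda item: item[1], reverse=True)
    return [(video_of_audio[audio_id], score)
            for audio_id, score in ranked[:top_k]]


scores = {"a1": 0.62, "a2": 0.91, "a3": 0.75, "a4": 0.88}
video_of_audio = {"a1": "v1", "a2": "v2", "a3": "v3", "a4": "v4"}

shown = rank_results(scores, video_of_audio)
```

With four matches and the default `top_k=3`, the lowest-scoring video is simply not displayed.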

It can be seen that the video retrieval device based on audio data disclosed in this embodiment matches the audio data corresponding to a video clip of interest, as input by the user, against the audio data stored in the audio database module, thereby retrieving the complete video and meeting the user's need to retrieve the complete video that contains a video clip of interest.

The video retrieval device based on audio data disclosed in this embodiment retrieves the entire complete video containing similar audio content from the audio in a short video clip that interests the user, overcoming the deficiency that existing video retrieval schemes do not perform retrieval based solely on the audio data in the video.

In a specific example, the first audio-video separation module 12 includes:

a separation submodule, configured to separate the audio data from the video data stored in the video database module 11;

Correspondingly, the audio database module 13 is configured to store the audio data to which the label has been added.

In a specific example, the device further includes the following modules, not shown in Fig. 1:

a first audio fingerprint extraction module, configured to extract audio fingerprints from the audio data stored in the audio database module 13 based on a preset audio fingerprint extraction rule;

a first audio classification module, configured to classify the audio data stored in the audio database module 13 based on the audio fingerprints stored in the fingerprint database module;

a second audio fingerprint extraction module, configured to extract, based on the preset audio fingerprint extraction rule, an audio fingerprint from the audio data input by the user and received by the audio-video data receiving module 14, or from the audio data separated by the second audio-video separation module 15; and

a second audio classification module, configured to classify the audio data input by the user, or the audio data separated by the second audio-video separation module 15, based on the audio fingerprint extracted by the second audio fingerprint extraction module.

In a specific example, the audio data matching module 16 includes:

a subunit for determining audio data to be retrieved, configured to determine each piece of audio data to be retrieved from the audio data stored in the audio database module 13, based on the category of the audio data obtained by the second audio classification module and the categories, obtained by the first audio classification module, of the audio data stored in the audio database module 13, wherein the category of each piece of audio data to be retrieved is the same as the category of the audio data obtained by the second audio classification module;

As shown in Fig. 2, this embodiment discloses a video retrieval method based on the video retrieval device based on audio data disclosed in the above embodiment. The method may include the following steps 201 to 204:

201. The audio-video data receiving module 14 receives audio data or video data input by a user;

202. After the audio-video data receiving module 14 receives video data, the second audio-video separation module 15 separates the audio data from the video data received by the audio-video data receiving module 14;

203. The audio data matching module 16 matches the audio data input by the user, or the audio data separated by the second audio-video separation module 15, against the audio data stored in the audio database module 13 to obtain one or more pieces of target audio data, the target audio data being the audio data stored in the audio database module 13 that matches the audio data input by the user;

204. The video retrieval display module 17 displays to the user the target video data corresponding to the one or more pieces of target audio data, the target video data being video data stored in the video database module 11; the audio data in the audio database module 13 is obtained by the first audio-video separation module 12 separating the video data in the video database module 11.
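Steps 201 to 204 can be read as one short pipeline. The sketch below wires them together with stand-in data: the position-wise `overlap` similarity, the dictionary layouts and the file names are assumptions chosen only to keep the example self-contained, not details given in the text.

```python
def overlap(a, b):
    """Toy similarity: fraction of positions where two sequences agree."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def retrieve(user_input, audio_db, video_db, threshold):
    # Step 201: receive the user's input as a (kind, payload) pair.
    kind, payload = user_input
    # Step 202: for video input, separate out the audio first.
    query_audio = payload["audio"] if kind == "video" else payload
    # Step 203: match the query against every stored audio item.
    scored = [(audio_id, overlap(query_audio, rec["samples"]))
              for audio_id, rec in audio_db.items()]
    targets = [(audio_id, s) for audio_id, s in scored if s > threshold]
    # Step 204: map target audio to its video, most similar first.
    targets.sort(key=lambda item: item[1], reverse=True)
    return [video_db[audio_db[audio_id]["video_id"]]
            for audio_id, _ in targets]


video_db = {"v1": "lecture.mp4", "v2": "concert.mp4"}
audio_db = {
    "a1": {"video_id": "v1", "samples": [1, 2, 3, 4]},
    "a2": {"video_id": "v2", "samples": [9, 9, 3, 4]},
}

results = retrieve(("video", {"audio": [1, 2, 3, 4]}), audio_db, video_db, 0.4)
```

Raising the threshold to 0.6 in this toy data would leave only the first video, which shows how the threshold trades recall for precision.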

It can be seen that the video retrieval method of the video retrieval device based on audio data disclosed in this embodiment matches the audio data corresponding to a video clip of interest, as input by the user, against the audio data stored in the audio database module, thereby retrieving the complete video and meeting the user's need to retrieve the complete video that contains a video clip of interest.

The video retrieval method of the video retrieval device based on audio data disclosed in this embodiment retrieves the entire complete video containing similar audio content from the audio in a short video clip that interests the user, overcoming the deficiency that existing video retrieval schemes do not perform retrieval based solely on the audio data in the video.

In a specific example, after the audio-video data receiving module 14 receives the audio data or video data input by the user, the method further includes the following steps, not shown in Fig. 2:

the second audio fingerprint extraction module extracts, based on a preset audio fingerprint extraction rule, an audio fingerprint from the audio data input by the user and received by the audio-video data receiving module 14, or from the audio data separated by the second audio-video separation module 15;

the second audio classification module classifies the audio data input by the user, or the audio data separated by the second audio-video separation module 15, based on the audio fingerprint extracted by the second audio fingerprint extraction module.

In a specific example, the matching, by the audio data matching module 16, of the audio data input by the user, or the audio data separated by the second audio-video separation module 15, against the audio data stored in the audio database module 13 to obtain one or more pieces of target audio data includes:

the audio data matching module 16 determines each piece of audio data to be retrieved from the audio data stored in the audio database module 13, based on the category of the audio data obtained by the second audio classification module and the categories, obtained by the first audio classification module, of the audio data stored in the audio database module 13, wherein the category of each piece of audio data to be retrieved is the same as the category of the audio data obtained by the second audio classification module;

the audio data matching module 16 determines the audio fingerprint corresponding to each piece of audio data to be retrieved, based on the index relationship between audio fingerprints and audio data stored in the index database module;

the audio data matching module 16 matches the audio fingerprint obtained by the second audio fingerprint extraction module against the audio fingerprint corresponding to each piece of audio data to be retrieved, as determined by the subunit for determining audio fingerprints of the audio data to be retrieved, to obtain one or more pieces of target audio data.

Those skilled in the art will understand that the units in an embodiment may be combined into one unit, and may furthermore be divided into multiple subunits. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification, and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.

Those skilled in the art will also appreciate that, although some embodiments described herein include certain features that are included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments.

Although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present invention, and all such modifications and variations fall within the scope defined by the claims.

Claims

1. A video retrieval device based on audio data, characterized by comprising:

a video database module, configured to store video data and to receive video data input by a user and/or an administrator for updating the video database;

a first audio-video separation module, configured to separate the audio data from the video data stored in the video database module;

a second audio-video separation module, configured to separate, after the audio-video data receiving module receives video data, the audio data from the video data received by the audio-video data receiving module;

an audio data matching module, configured to match the audio data input by the user, or the audio data separated by the second audio-video separation module, against the audio data stored in the audio database module to obtain one or more pieces of target audio data, the target audio data being the audio data stored in the audio database module that matches the audio data input by the user;

a video retrieval display module, configured to display to the user the target video data corresponding to the one or more pieces of target audio data, the target video data being video data stored in the video database module.

2. The device according to claim 1, characterized in that

the first audio-video separation module comprises:

a labeling submodule, configured to add a label to the audio data separated by the separation submodule, the label indicating the correspondence between the audio data and the video data;

3. The device according to claim 1, characterized in that the device further comprises:

a first audio fingerprint extraction module, configured to extract audio fingerprints from the audio data stored in the audio database module based on a preset audio fingerprint extraction rule;

an index database module, configured to store the index relationship between the audio fingerprints extracted by the first audio fingerprint extraction module and the audio data;

a first audio classification module, configured to classify the audio data stored in the audio database module based on the audio fingerprints stored in the fingerprint database module.

4. The device according to claim 3, characterized in that the device further comprises:

a second audio fingerprint extraction module, configured to extract, based on the preset audio fingerprint extraction rule, an audio fingerprint from the audio data input by the user and received by the audio-video data receiving module, or from the audio data separated by the second audio-video separation module;

a second audio classification module, configured to classify the audio data input by the user, or the audio data separated by the second audio-video separation module, based on the audio fingerprint extracted by the second audio fingerprint extraction module.

5. The device according to claim 4, characterized in that the audio data matching module comprises:

a subunit for determining audio data to be retrieved, configured to determine each piece of audio data to be retrieved from the audio data stored in the audio database module, based on the category of the audio data obtained by the second audio classification module and the categories, obtained by the first audio classification module, of the audio data stored in the audio database module, wherein the category of each piece of audio data to be retrieved is the same as the category of the audio data obtained by the second audio classification module;

a subunit for determining audio fingerprints of the audio data to be retrieved, configured to determine the audio fingerprint corresponding to each piece of audio data to be retrieved, based on the index relationship between audio fingerprints and audio data stored in the index database module;

an audio fingerprint matching subunit, configured to match the audio fingerprint obtained by the second audio fingerprint extraction module against the audio fingerprint corresponding to each piece of audio data to be retrieved, as determined by the subunit for determining audio fingerprints of the audio data to be retrieved, to obtain one or more pieces of target audio data.

6. A video retrieval method based on the device according to any one of claims 1 to 5, characterized by comprising:

After the audio-video data receiving module receives video data, the second audio-video separation module separates the audio data from the video data received by the audio-video data receiving module;

The audio data matching module matches the audio data input by the user, or the audio data separated by the second audio-video separation module, against the audio data stored in the audio database module, to obtain one or more pieces of target audio data; the target audio data is audio data stored in the audio database module that matches the audio data input by the user;

The video retrieval display module displays to the user the target video data corresponding to the one or more pieces of target audio data, the target video data being video data stored in the video database module; the audio data in the audio database module is obtained by the first audio-video separation module separating the video data in the video database module.
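The method steps above can be sketched as a single control flow. This is an illustrative sketch under stated assumptions, not the claimed implementation: the user input is modelled as a dictionary with hypothetical `kind` and `data` keys, the separation and matching steps are passed in as callables (`separate_audio`, `match_audio`), and `audio_to_video` is an assumed mapping from each stored audio item to the video it was separated from.

```python
from typing import Any, Callable, Dict, List


def retrieve_videos(
    user_input: Dict[str, Any],
    separate_audio: Callable[[Any], Any],    # stands in for the second audio-video separation module
    match_audio: Callable[[Any], List[str]], # stands in for the audio data matching module
    audio_to_video: Dict[str, str],          # assumed audio-to-source-video mapping
) -> List[str]:
    # If the user supplied video, separate its audio track first;
    # if the user supplied audio, use it directly.
    if user_input["kind"] == "video":
        audio = separate_audio(user_input["data"])
    else:
        audio = user_input["data"]

    # Match the query audio against the audio database to get target audio IDs.
    target_audio_ids = match_audio(audio)

    # Map each target audio item back to its video for display.
    return [audio_to_video[a] for a in target_audio_ids if a in audio_to_video]
```

Because every stored audio item was separated from a stored video, the final lookup only needs the mapping recorded at separation time.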

7. The method according to claim 6, characterized in that, after the audio-video data receiving module receives the audio data or video data input by the user, the method further comprises:

The second audio fingerprint extraction module extracts, based on a preset audio fingerprint extraction rule, an audio fingerprint from the audio data input by the user and received by the audio-video data receiving module, or from the audio data separated by the second audio-video separation module;

The second audio classification module classifies, based on the audio fingerprint extracted by the second audio fingerprint extraction module, the audio data input by the user or the audio data separated by the second audio-video separation module.
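The claims leave the preset extraction rule and the classification rule unspecified. Purely for illustration, the sketch below assumes a toy rule: one fingerprint bit per pair of adjacent frames, set when frame energy rises. It also assumes classification by nearest reference fingerprint in Hamming distance. Both rules, and the names `extract_fingerprint`, `classify`, `frame_size`, and `references`, are assumptions, not the patent's method.

```python
from typing import Dict, Sequence


def extract_fingerprint(samples: Sequence[float], frame_size: int = 4) -> int:
    """Toy extraction rule (assumed): one bit per adjacent frame pair, 1 if energy rises."""
    # Energy of each non-overlapping frame.
    energies = [
        sum(x * x for x in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    fingerprint = 0
    for prev, cur in zip(energies, energies[1:]):
        fingerprint = (fingerprint << 1) | (1 if cur > prev else 0)
    return fingerprint


def classify(fingerprint: int, references: Dict[str, int]) -> str:
    """Toy classifier (assumed): category whose reference fingerprint is nearest in Hamming distance."""
    return min(references, key=lambda c: bin(fingerprint ^ references[c]).count("1"))
```

A production system would more likely derive fingerprints from spectral features; the toy rule is used here only to keep the example self-contained.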

8. The method according to claim 7, characterized in that the audio data matching module matching the audio data input by the user, or the audio data separated by the second audio-video separation module, against the audio data stored in the audio database module to obtain one or more pieces of target audio data comprises:

The audio data matching module determines each piece of audio data to be retrieved from the audio data stored in the audio database module, based on the category of the audio data obtained by the second audio classification module and on the categories, obtained by the first audio classification module, of the audio data stored in the audio database module; the category of each piece of audio data to be retrieved is the same as the category of the audio data obtained by the second audio classification module;

The audio data matching module determines the audio fingerprint corresponding to each piece of audio data to be retrieved, based on the index relationship between audio fingerprints and audio data stored in the index database module;

The audio data matching module matches the audio fingerprint obtained by the second audio fingerprint extraction module against the audio fingerprints corresponding to each piece of audio data to be retrieved, as determined by the audio fingerprint determining subunit for the audio data to be retrieved, to obtain one or more pieces of target audio data.