CN105005630A - Method for multi-dimensional detection of specific targets from omnimedia

Info

Publication number: CN105005630A
Application number: CN201510515893.8A
Authority: CN
Inventors: 薛丹; 陈淑珊; 张松涛; 迟立明
Original assignee: REDASEN TECHNOLOGY (DALIAN) CO Ltd
Current assignee: REDASEN TECHNOLOGY (DALIAN) CO Ltd
Priority date: 2015-08-18
Filing date: 2015-08-18
Publication date: 2015-10-28
Anticipated expiration: 2035-08-18
Also published as: CN105005630B

Abstract

A method for multi-dimensional detection of specific targets from omnimedia specifically comprises steps as follows: data types of target reference sample data to be searched and recognized by search engines and detection and recognition engines are determined according to search condition samples; matched detection and recognition engines are selected according to the data types of the target reference sample data to be searched and recognized by the search engines and the detection and recognition engines; the result of each detection and recognition engine is analyzed, and search keywords and target characteristic quantity data are acquired and used as search conditions to be sent to the search engines for search; each related search engine searches data conforming to the conditions from input target search data and records data fragments and appearance positions; the search engines search different data to obtain different search results which are then summarized and output in a classified manner; the data recall ratio and precision ratio are increased with multiple search methods in different dimensions.

Description

Method for multi-dimensional detection of specific targets in omnimedia

Technical field

The present invention relates to a method for detecting occurrences of a specific target in omnimedia data, and in particular to a method for multi-dimensional detection of specific targets in omnimedia.

Background art

Omnimedia information comprises data in many forms, such as text, voice, pictures, and video. Finding a specific target (a person or an object) in this information involves multiple technologies, including voiceprint recognition, speech recognition, image recognition, video fingerprinting, and text analysis, and is a complex systems engineering task. Moreover, because voiceprint, speech, and image recognition and video fingerprinting are all still developing, no single technology can meet the expected performance requirements for recall and precision. The voiceprint, speech, image, video fingerprint, and text information in media are internally related. For example, video generally contains text, sound, and video frames; audio data contains speech that can be recognized as text, and also contains the biometric characteristics that distinguish the speaker from other people. Through content analysis, relationships can be established among these kinds of information, which provides the technical basis for retrieving a common target in several ways.

Based on long-term study of voiceprint, speech, image, video fingerprint, and text information, we found that statistical analysis can extract common features or content descriptions shared by two, three, or more of these kinds of information, so that the result of one retrieval mode can be extended into cooperative retrieval across several modes to provide an integrated retrieval result. For example, voiceprint detection determines who the speaker is and, at the same time, extracts the segments in which that person speaks. Once the speaker is known, speech recognition can find content that mentions the speaker; pictures and related video segments of the speaker can also be queried; and related text can be found as well.

Speech recognition, image recognition, and video fingerprint recognition use technologies such as DNN and HMM, most of which are statistical models. These technologies all have certain shortcomings, and no single technique achieves the expected recognition performance. Improving a single technique requires greatly enlarging the sample model library used for statistical analysis. However, external factors such as ambient noise and the speaker's accent, speaking rate, and gender affect speech and voiceprint recognition, while the illumination, resolution, and background complexity of captured images and video strongly affect image recognition and video fingerprint recognition. Because no single technique gives satisfactory results, multiple techniques need to be combined to improve recognition recall.

Summary of the invention

The present invention retrieves different types of feature vectors of omnimedia information in several ways, for example text keywords, voiceprints, speech content, image color, and image semantics, and gathers all the information about the query target. It can thereby obtain more comprehensively the metadata fragments related to the retrieval target and record their positions. Retrieval in several ways and in different dimensions improves the recall and precision of the data.

To achieve the above object, the technical solution adopted by the present invention is a method for multi-dimensional detection of specific targets in omnimedia, with the following specific steps:

S1: According to the search condition samples, such as text keywords, voiceprint feature voice, content voice, feature pictures, and feature video, determine the data types of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize;

S2: According to the data types of the target reference sample data to be retrieved and recognized by the search engines and the detection and recognition engines, select the matching detection and recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech semantic recognition engine, and a shape recognition engine;

S3: Analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions;

S4: Each relevant search engine retrieves the data that satisfy the conditions from the input target retrieval data, and records the data fragments and the positions where they occur;

S5: The search engines retrieve different data and obtain different retrieval results; these results are then summarized and output by category.
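The flow of steps S1 to S5 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the `Sample` type, the `RECOGNITION_ENGINES` table, the `detect` function, and the substring matching used in place of real search engines are all hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    data_type: str   # e.g. "text" or "voiceprint" (hypothetical labels)
    payload: str     # toy stand-in for the raw sample data

# Hypothetical recognition engines: each maps a sample to one search keyword.
RECOGNITION_ENGINES = {
    "text": lambda s: s.payload.lower(),
    "voiceprint": lambda s: "speaker:" + s.payload.lower(),
}

def detect(samples, corpus):
    """Run S1-S5 over a corpus of (data_type, location, content) records."""
    # S1: determine the data types present in the search condition samples.
    types = {s.data_type for s in samples}
    # S2: select the recognition engines that match those data types.
    engines = {t: RECOGNITION_ENGINES[t] for t in types if t in RECOGNITION_ENGINES}
    # S3: run each engine and collect its output as a search condition.
    conditions = [engines[s.data_type](s) for s in samples if s.data_type in engines]
    # S4: search the corpus, recording each matching fragment and its position.
    hits = [(dtype, loc, content)
            for dtype, loc, content in corpus
            if any(cond in content.lower() for cond in conditions)]
    # S5: summarize the hits, classified by data type.
    summary = {}
    for dtype, loc, content in hits:
        summary.setdefault(dtype, []).append((loc, content))
    return summary
```

A real system would replace the lambdas with actual recognition engines and the substring test with per-type search engines; the sketch only shows how the outputs of one stage feed the next.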

Further, in step S2, if there are search conditions of several different data types, several detection and recognition engines are selected.

Further, in step S3, if the search condition contains more than 3 keywords, it is further decomposed into keyword groups.

Further, in step S3, if a certain type of data does not require the corresponding recognition engine to process it, the condition value is set to null.
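The two refinements of step S3 can be illustrated with a short Python sketch. The text does not say how keyword groups are formed, so the grouping rule below (consecutive groups of at most three keywords) is an assumption, as are the function names and the list of data type labels.

```python
def to_keyword_groups(keywords, group_size=3):
    """Split a keyword list into groups once it exceeds `group_size` items.

    Assumption: groups are consecutive slices of the input, in input order.
    """
    if len(keywords) <= group_size:
        return [list(keywords)]
    return [list(keywords[i:i + group_size])
            for i in range(0, len(keywords), group_size)]

def build_conditions(samples_by_type,
                     all_types=("text", "voiceprint", "speech", "picture", "video")):
    """Return one condition per data type; types with no sample are set to None (null)."""
    return {t: samples_by_type.get(t) for t in all_types}
```

Setting unused conditions to `None` lets later stages skip the corresponding recognition engine with a simple null check.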

Further, the target retrieval data in step S4 come from databases, data files, and network streaming media, and comprise text, voice, picture, and video data.

Further, in step S5, the retrieval result is one or more of text, voice, picture, and video; for voice and video retrieval results, the associated content fragment is extracted, or the entry point and duration are recorded.

Further, in step S5, the retrieval result is obtained according to the following formula:

$$SR = \sum_{i=1}^{N} SE_{i}\left(\sum_{j=1}^{N}\ \sum_{k=1,\ p_{j} \neq \mathrm{NULL}}^{M} RE_{j}(k, p_{j}),\ q_{i}\right)$$

Variables and symbols in the formula:

SR, retrieval result; SE_i, search engine; i, engine number, e.g. SE_1 denotes the voiceprint search engine and SE_2 the speech search engine; N, the number of data types in the omnimedia; RE_j, detection and recognition engine: a detection and recognition engine provides target detection and target recognition; depending on the data, it may have both the detection and the recognition function or only one of them, and different detection and recognition engines process different data content; j, detection and recognition engine number, e.g. RE_1 denotes the voiceprint recognition engine, which identifies who the speaker is, and RE_2 the speech recognition engine, which recognizes the content and keywords in speech; k, the sample number in the sample library, which is also the sample recognition loop count; M, the number of samples in the sample library, i.e. how many samples are to be recognized and verified; p_j, the target reference sample data that the search engines and detection and recognition engines are to retrieve and recognize; q_i, the retrieval object of the search engine, i.e. the data in which the search engine searches for the target information.
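One way to read the formula is as two nested aggregations: the inner double sum collects the outputs of every recognition engine RE_j over the M library samples, skipping engines whose reference sample p_j is null, and the outer sum passes those outputs to each search engine SE_i together with its retrieval object q_i. The Python sketch below follows that reading. Treating each sum as list concatenation is an assumption, and the function name and callable signatures are hypothetical.

```python
def evaluate_sr(search_engines, recognition_engines, references, targets, num_samples):
    """Evaluate SR, reading each sum in the formula as list concatenation.

    search_engines[i](conditions, q_i) -> list of hits        (SE_i)
    recognition_engines[j](k, p_j)     -> list of conditions  (RE_j)
    references[j] is p_j (None means NULL); targets[i] is q_i;
    num_samples is M, the number of samples in the sample library.
    """
    conditions = []
    for re_j, p_j in zip(recognition_engines, references):
        if p_j is None:                      # the p_j != NULL constraint
            continue
        for k in range(1, num_samples + 1):  # k = 1..M over the sample library
            conditions.extend(re_j(k, p_j))
    results = []
    for se_i, q_i in zip(search_engines, targets):
        results.extend(se_i(conditions, q_i))
    return results
```

With toy engines, for instance a recognition engine that returns a speaker name when library sample k matches p_j and a search engine that returns the words of q_i found among the conditions, the function returns the hits from every search engine in order.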

Further, the detection and recognition engines work in two layers, detection/recognition and retrieval, and the engines that process data objects of different types serve as different processing dimensions.

By adopting the above technical solution, the present invention achieves the following technical effects. Different types of feature vectors of omnimedia information, such as text keywords, voiceprints, speech content, image color, and image semantics, are retrieved in several ways, and all the information about the query target is gathered, so that the metadata fragments related to the retrieval target and their recorded positions are obtained more comprehensively. Retrieval in several ways and in different dimensions improves the recall and precision of the data. The method compensates for the low recall of a single recognition engine and improves the recall and precision of omnimedia retrieval; depending on the application environment and the retrieval samples, recall can be improved by 10% to 30%.

Brief description of the drawings

The present invention has one accompanying drawing:

Fig. 1 is a flow chart of the present invention.

Detailed description of the embodiments

The technical solution of the present invention is further explained below through a specific embodiment and with reference to the accompanying drawing.

As shown in Fig. 1, the present invention provides a method for multi-dimensional detection of specific targets in omnimedia, with the following specific steps:

S1: According to the search condition samples, such as text keywords, text sentences, voiceprint feature voice (audio of the speaker's voice, or of the sound made by another object to be retrieved), content voice (speech data in which the retrieval target is mentioned), feature pictures (pictures showing the features of a face, human figure, object shape, color, or gathering state), and feature video (a short piece of video data containing the features of a face, human figure, object shape, color, or gathering state), determine the data types of the target reference sample data that the search engines and the detection and recognition engines are to retrieve and recognize. A search condition sample plays a role similar to the search keywords of an ordinary search engine, except that a condition for omnimedia retrieval may be one of, or a combination of, text, voice (clip), picture, and video (clip). Text may be a combination of keywords, a text sentence, or text that mixes Chinese with other languages. Voice (clip) is an input segment of audio data; the method supports the WAV format by default, audio in other formats can be converted, and the speech content may be a complete sentence or a phrase. Pictures use the basic BMP format, and pictures in other formats can be converted to BMP; a picture must contain the person or object to be retrieved, with a minimum resolution of 32×32 and no restriction on color values. Video (clip) is based on the AVI format, and other formats can be converted; it must contain the person or target to be retrieved, and the resolution of the target must be no lower than 32×32 pixels.
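The sample constraints stated for step S1 (default formats WAV, BMP, and AVI, conversion of other formats, and a minimum target resolution of 32×32) can be checked with a small helper. The following Python sketch is illustrative only; the function name, the kind labels, and the return convention are assumptions.

```python
# Default sample formats named in the text; other formats need conversion.
DEFAULT_FORMAT = {"voice": "wav", "picture": "bmp", "video": "avi"}
MIN_TARGET_SIDE = 32  # the target must be at least 32x32 pixels

def check_sample(kind, fmt=None, target_size=None):
    """Return (usable, needs_conversion) for one search condition sample.

    kind: "text", "voice", "picture" or "video" (hypothetical labels).
    fmt: file format in lower case; ignored for text.
    target_size: (width, height) of the target, for pictures and video.
    """
    if kind == "text":
        return True, False
    if kind not in DEFAULT_FORMAT:
        return False, False
    needs_conversion = fmt != DEFAULT_FORMAT[kind]
    if kind in ("picture", "video"):
        # Reject samples whose target is missing or smaller than 32x32.
        if target_size is None or min(target_size) < MIN_TARGET_SIDE:
            return False, needs_conversion
    return True, needs_conversion
```

For example, `check_sample("voice", "mp3")` reports a usable sample that needs conversion to WAV, while a picture whose target is 16 pixels wide is rejected regardless of its format.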

S2: According to the data types of the target reference sample data to be retrieved and recognized by the search engines and the detection and recognition engines, select the matching detection and recognition engines, such as a keyword recognition engine, a voiceprint recognition engine, a speech semantic recognition engine, and a shape recognition engine. RE_1 to RE_n in Fig. 1 represent the different detection and recognition engines, which can detect or recognize features such as text keywords, voiceprints, speech semantics, video fingerprints, shapes, object colors, and gathering states.

S3: Analyze the result of each detection and recognition engine to obtain search keywords and target feature data, and send them to the search engines as search conditions. The results of the detection and recognition engines are:

Keyword detection and recognition engine: extracts the keywords from a text sentence;

Voiceprint detection and recognition engine: identifies who the speaker is; the speaker's ID or name is used as the keyword, together with the voiceprint feature vector, for searching;

Color detection and recognition engine: determines the main color of the target in a picture; the color value is used as a keyword for searching;

Shape detection and recognition engine: determines the shape of the target in a picture; the shape is used as a keyword, together with the shape feature vector, for searching;

Crowd event (gathering state) detection and recognition engine: determines the gathering state of the targets in a picture; the recognition result is used as a keyword, together with the gathering-state feature vector, for searching.
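As an illustration of how one such engine might turn a sample into a keyword plus feature vector, the Python sketch below identifies a speaker by comparing a voiceprint vector against an enrolled library. The text does not specify the matching method; cosine similarity, the 0.8 threshold, and all names here are assumptions chosen for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_speaker(sample_vec, library, threshold=0.8):
    """Return (speaker_id, feature_vector) for the closest enrolled speaker, or None.

    library maps a speaker ID to its enrolled voiceprint feature vector.
    """
    best_id, best_score = None, threshold
    for speaker_id, vec in library.items():
        score = cosine(sample_vec, vec)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    if best_id is None:
        return None
    # The ID serves as the search keyword; the vector as the feature data.
    return best_id, library[best_id]
```

The returned pair corresponds to the two outputs described for the voiceprint engine: the speaker ID as keyword and the voiceprint feature vector for similarity search.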

S4: Each relevant search engine retrieves the data that satisfy the conditions from the input target retrieval data, and records the data fragments and the positions where they occur. The target retrieval data are metadata sets of text, voice, pictures, and video, which may come from databases, data files, or data streams; the present invention retrieves from these data according to the search condition samples. The supported formats are:

Video and streaming media: AVI, MPEG-1/2/4, H.263/264/265, M-JPEG, MP4;

Voice data: WAV, MP3, PCM;

Pictures: BMP, JPG/JPEG, GIF, TIFF, PNG, PIC.

S5: The search engines retrieve different data and obtain different retrieval results; these results are then summarized and output by category. For the detection results on omnimedia information, the present invention outputs the following:

Text data: text fragments;

Sound and video data: the entry point (time at which the target appears) and the duration;

Picture files: the file names and storage paths of the files in which the target appears.
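The classified output of step S5 can be sketched as a grouping step over the raw hits. In the Python sketch below, the hit dictionaries and their field names (`fragment`, `entry_time`, `duration`, `file_name`, `path`) are hypothetical; only the three output categories come from the text.

```python
from collections import defaultdict

def summarize(hits):
    """Group hits by data type, keeping only the fields the method outputs.

    Each hit is a dict with a "type" key plus type-specific fields.
    """
    out = defaultdict(list)
    for h in hits:
        if h["type"] == "text":
            out["text"].append(h["fragment"])
        elif h["type"] in ("voice", "video"):
            # Entry point and duration of the target's appearance.
            out[h["type"]].append((h["entry_time"], h["duration"]))
        elif h["type"] == "picture":
            out["picture"].append((h["file_name"], h["path"]))
    return dict(out)
```

Hits of any other type are ignored in this sketch; a fuller version might report them separately.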

In step S2, if there are search conditions of several different data types, several detection and recognition engines are selected. In step S3, if the search condition is complex, it is further decomposed into keyword groups. Also in step S3, the relevant conditions in the search condition must not be null (the original text queries here whether this description is accurate); if a certain type of data does not require the corresponding recognition engine to process it, the condition value is set to null. The target retrieval data in step S4 come from databases, data files, and network streaming media, and comprise text, voice, picture, and video data. In step S5, the retrieval result is one or more of text, voice, picture, and video; for voice and video retrieval results, the associated content fragment is extracted, or the entry point and duration are recorded. In step S5, the retrieval result is obtained according to the following formula:

$$SR = \sum_{i=1}^{N} SE_{i}\left(\sum_{j=1}^{N}\ \sum_{k=1,\ p_{j} \neq \mathrm{NULL}}^{M} RE_{j}(k, p_{j}),\ q_{i}\right)$$

Variables and symbols in the formula: SR, retrieval result; SE_i, search engine: the different search engines take the results of the detection and recognition engines and retrieve the specific target data in the media data; i, engine number, e.g. SE_1 denotes the voiceprint search engine and SE_2 the speech search engine; N, the number of data types in the omnimedia; RE_j, detection and recognition engine: a detection and recognition engine provides target detection and target recognition; depending on the data, it may have both the detection and the recognition function or only one of them, and different detection and recognition engines process different data content; j, detection and recognition engine number, e.g. RE_1 denotes the voiceprint recognition engine, which identifies who the speaker is, and RE_2 the speech recognition engine, which recognizes the content and keywords in speech; k, the sample number in the sample library, which is also the sample recognition loop count; M, the number of samples in the sample library, i.e. how many samples are to be recognized and verified; p_j, the target reference sample data that the search engines and detection and recognition engines are to retrieve and recognize; q_i, the retrieval object of the search engine, i.e. the data in which the search engine searches for the target information.

The detection and recognition engines work in two layers, detection/recognition and retrieval, and the engines that process data objects of different types serve as different processing dimensions.

Because omnimedia information such as text, voice, pictures, and video has a complex data structure, a large information volume, and diverse forms, a single data retrieval method cannot achieve satisfactory results, and the recall and precision of the data are relatively low. Voice and images in particular have complex internal feature types, and retrieving different types of features yields different results: for example, retrieving the voiceprint features of speech data determines who the speaker is, while recognizing the semantic content of the speech yields the text of what was said. The present invention provides a dynamic, multi-dimensional method for detecting omnimedia data: by determining the data types of the omnimedia information, it dynamically loads the retrieval and recognition engines that match those data types, detects the omnimedia data from different directions, and obtains the metadata fragments associated with the query target and their storage locations. The text, voice, pictures, and images of omnimedia data carry information such as text keywords, voiceprints, speech content, language, image semantics, image color, target shape, target behavior, and target gathering state, and each kind of data requires a specific engine to process it. Multi-dimensional detection means that engines such as text keyword retrieval, voiceprint recognition, speech content recognition, language identification, image semantic analysis, image color recognition, image target detection, target shape recognition, target behavior recognition, and target gathering-state recognition work in two layers, detection/recognition and retrieval. The engines that process different data objects serve as different processing dimensions, and, depending on the data type of the detected object, specific targets are detected, recognized, and retrieved from different dimensions.
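The dynamic loading of engines by data type described above can be pictured as a registry keyed on data type, where each registered engine represents one processing dimension. The Python sketch below is a hypothetical illustration; the registry, the decorator, and the three placeholder engine classes are assumptions and do not implement any recognition logic.

```python
ENGINE_REGISTRY = {}

def register(data_type):
    """Class decorator that records an engine class under a data type."""
    def wrap(cls):
        ENGINE_REGISTRY.setdefault(data_type, []).append(cls)
        return cls
    return wrap

@register("voice")
class VoiceprintEngine:
    dimension = "voiceprint"

@register("voice")
class SpeechContentEngine:
    dimension = "speech content"

@register("picture")
class ColorEngine:
    dimension = "image color"

def load_engines(data_types):
    """Instantiate only the engines registered for the detected data types."""
    return [cls() for t in data_types for cls in ENGINE_REGISTRY.get(t, [])]
```

One data type can map to several engines, which matches the idea that a single voice sample is examined in both the voiceprint and the speech content dimensions.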

The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the protection scope of the present invention.

Claims

1. the method for multi-dimensions test specific objective in full media, it is characterized in that, concrete steps are as follows:

S1: according to search condition sample, determines search engine and detects the data type identifying the object reference sample data that engine will be retrieved and identify;

S2: according to search engine and detect the data type of object reference sample data identifying that engine will be retrieved and identify, select the detection identification engine of coupling;

2. the method for multi-dimensions test specific objective in full media according to claim 1, is characterized in that, then in step S2, if any multiple different types of data search condition, then selects multiple detection to identify engine.

3. the method for multi-dimensions test specific objective in full media according to claim 1, is characterized in that, then in step S3, as the keyword containing more than 3 in search condition, then resolves into crucial phrase further.

4. the method for multi-dimensions test specific objective in full media according to claim 3, is characterized in that, then in step S3, if a certain data are without the need to enabling identification engines handle data corresponding with it, conditional value is arranged to null value.

5. the method for multi-dimensions test specific objective in the full media according to any one of claim 1-4, it is characterized in that, target retrieval data in step S4, from database, data file, network flow-medium, comprising: text, voice, picture, video data.

6. the method for multi-dimensions test specific objective in full media according to claim 5, it is characterized in that, in step S5, result for retrieval is one or more in text, voice, picture, video, for the result for retrieval of voice, video, then extracts content association fragment or is recorded into a little and duration.

7. the method for multi-dimensions test specific objective in full media according to claim 6, is characterized in that, in step s 5, result for retrieval realizes according to formula below:

$$SR = \sum_{i=1}^{N} SE_{i}\left(\sum_{j=1}^{N}\ \sum_{k=1,\ p_{j} \neq \mathrm{NULL}}^{M} RE_{j}(k, p_{j}),\ q_{i}\right)$$

Variables and symbols in the formula: SR, retrieval result; SE_i, search engine; i, engine number; N, the number of data types in the omnimedia; RE_j, detection and recognition engine; j, detection and recognition engine number; k, the sample number in the sample library, which is also the sample recognition loop count; M, the number of samples in the sample library; p_j, the target reference sample data that the search engines and detection and recognition engines are to retrieve and recognize; q_i, the retrieval object of the search engine.

8. The method for multi-dimensional detection of specific targets in omnimedia according to claim 2 or 4, characterized in that the detection and recognition engines work in two layers, detection/recognition and retrieval, and the engines that process data objects of different types serve as different processing dimensions.