CN105760472A - Video retrieval method and system - Google Patents

Video retrieval method and system

Info

Publication number
CN105760472A
CN105760472A (application CN201610084093.XA)
Authority
CN
China
Prior art keywords
facial image
camera lens
image
similarity
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610084093.XA
Other languages
Chinese (zh)
Inventor
杨颖
李丹阳
贾静丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN201610084093.XA priority Critical patent/CN105760472A/en
Publication of CN105760472A publication Critical patent/CN105760472A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/285Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video retrieval method and system. The method comprises the following steps: when a search keyword is received, the video to be retrieved is segmented into multiple shots; the first N frames of each shot are extracted and the extracted frames are checked for face images, where N is an integer greater than or equal to 1; all face images are then detected in those shots whose first N frames contain a face image; according to the search keyword, the sample set corresponding to the keyword is compared with the detected face images, and the similarity between each face image and the sample set is calculated; the face images whose similarity exceeds a first preset value are integrated into the shots they belong to, and the integrated shots are concatenated to obtain the target video. The method addresses the difficulty, in the prior art, of locating the segments a user is interested in, and increases video retrieval speed, thereby improving the user's viewing experience.

Description

Video retrieval method and system
Technical field
The present invention relates to the field of multimedia technology, and in particular to a video retrieval method and system.
Background art
In 2010, Google's smart-TV initiative formally opened the era of intelligent television, and users' demands on video have since developed toward personalization and user-friendliness.
In everyday video search, a user is often interested only in the video segments featuring one or a few particular people. Even when a video resource contains such segments, the resource itself is usually long, so the user has to watch the whole video to find the segments of interest, or misses some of them because they cannot be located precisely. As a result, retrieving the parts of interest is difficult, the user spends a long time searching, and the viewing experience is greatly degraded.
Summary of the invention
In view of the above defects in the prior art, the present invention provides a video retrieval method and system to solve the problem that the segments of interest are difficult to find.
In a first aspect, the present invention provides a video retrieval method, comprising:
when a search keyword is received, segmenting the video to be retrieved into multiple shots;
extracting the first N frames of each shot and detecting whether a face image exists in the extracted frames, N being an integer greater than or equal to 1;
detecting all face images in the shots whose first N frames contain a face image;
according to the search keyword, comparing the sample set corresponding to the keyword with the detected face images, and calculating the similarity between each face image and the sample set;
integrating the face images whose similarity exceeds a first preset value into the shots they belong to, and concatenating the integrated shots to obtain the target video.
Preferably, segmenting the video to be retrieved into multiple shots comprises:
extracting visual features of the video to be retrieved;
measuring the similarity between adjacent frames according to the visual features;
when the similarity is smaller than a second preset value, splitting the adjacent frames into two shots.
Preferably, detecting all face images in the shots whose first N frames contain a face image comprises:
using a cascade classifier to detect all face images in those shots.
Preferably, comparing the sample set corresponding to the search keyword with the face images and calculating the similarity between each face image and the sample set comprises:
according to the search keyword, extracting from a face sample database the sample set related to the keyword, the sample set being multiple face sample images of the same person;
expressing each face image as a linear combination of the face sample images;
calculating the similarity between the image and the sample set from the coefficients of the linear combination.
Preferably, integrating the face images whose similarity exceeds the first preset value into the shots they belong to comprises:
clustering the face images whose similarity exceeds the first preset value within the shot each image belongs to;
associating the clustered face images with their corresponding time and audio information to regenerate the shot containing those face images.
In a second aspect, the present invention provides a video retrieval system, comprising:
a video shot segmentation module, configured to segment the video to be retrieved into multiple shots when a search keyword is received;
a shot detection module, configured to extract the first N frames of each shot and detect whether a face image exists in the extracted frames, N being an integer greater than or equal to 1;
a face image detection module, configured to detect all face images in the shots whose first N frames contain a face image;
a face image retrieval module, configured to compare, according to the search keyword, the sample set corresponding to the keyword with the detected face images, and calculate the similarity between each face image and the sample set;
a target video generation module, configured to integrate the face images whose similarity exceeds a first preset value into the shots they belong to, and concatenate the integrated shots to obtain the target video.
Preferably, the video shot segmentation module is specifically configured to:
extract visual features of the video to be retrieved;
measure the similarity between adjacent frames according to the visual features;
and split the adjacent frames into two shots when the similarity is smaller than a second preset value.
Preferably, the face image detection module is specifically configured to use a cascade classifier to detect all face images in the shots whose first N frames contain a face image.
Preferably, the face image retrieval module is specifically configured to:
extract, according to the search keyword, the sample set related to the keyword from a face sample database, the sample set being multiple face sample images of the same person;
express each face image as a linear combination of the face sample images;
and calculate the similarity between the image and the sample set from the coefficients of the linear combination.
Preferably, the target video generation module is specifically configured to:
cluster the face images whose similarity exceeds the first preset value within the shot each image belongs to;
and associate the clustered face images with their corresponding time and audio information to regenerate the shot containing those face images.
As can be seen from the above technical solution, the video retrieval method and system of the present invention segment the video to be retrieved into multiple shots, perform face detection on the shots whose first N frames contain a face, calculate the similarity between the face images and the sample set corresponding to the search keyword, integrate the face images whose similarity exceeds the first preset value into the shots they belong to, and finally concatenate the integrated shots to obtain the target video. This effectively increases video retrieval speed and improves the user's viewing experience.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the video retrieval method provided by one embodiment of the present invention;
Fig. 2 is a schematic flowchart of the video retrieval method provided by another embodiment of the present invention;
Fig. 3 is a schematic diagram of the feature templates provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the video retrieval system provided by one embodiment of the present invention.
Detailed description of embodiments
To make the purpose, technical solution and advantages of the embodiments of the present invention clearer, the technical solution is described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative work fall within the protection scope of the present invention.
Fig. 1 shows a schematic flowchart of the video retrieval method provided by one embodiment of the present invention. As shown in Fig. 1, the method of this embodiment is as follows.
101. When a search keyword is received, segment the video to be retrieved into multiple shots.
It should be understood that a broadcast video is usually produced as a sequence of shots, and the scene and content within each shot are continuous. Shot boundaries can therefore be identified by measuring the difference between adjacent frames, so that the video to be retrieved is segmented into multiple independent shots.
In practice, step 101 includes sub-steps 1011 to 1013, not shown in the figure.
1011. Extract visual features of the video to be retrieved.
For example, the colour histogram or the pixel values of the video can be extracted as its visual features.
1012. Measure the similarity between adjacent frames according to the visual features.
For example, a positional similarity can be used as the similarity between adjacent frames, expressed as:
S = Σ_{i=1}^{N} W_i × [(x_i′ − x_i) + (y_i′ − y_i)]
where (x_i, y_i) is the coordinate of a point in the i-th frame, (x_i′, y_i′) is the coordinate of the corresponding point in the next frame, and W_i is the weight of that point.
1013. When the similarity is smaller than the second preset value, split the adjacent frames into two shots.
In the above manner, when the positional similarity S falls below the second preset value, the current position is taken as a shot boundary. The second preset value is an empirical value, and this embodiment does not limit its concrete value.
Traversing the whole video in this way segments the video to be retrieved into multiple shots.
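As a sketch of sub-steps 1011 to 1013, the following pure-Python fragment segments a frame sequence into shots by thresholding the similarity of adjacent colour histograms (one of the visual features mentioned in sub-step 1011). The histogram-intersection measure and the function names are illustrative assumptions, not the weighted positional similarity defined above.

```python
def histogram_similarity(h1, h2):
    """Similarity of two normalised colour histograms (histogram intersection)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def segment_shots(histograms, threshold=0.5):
    """Split a frame sequence into shots: a shot boundary is declared
    wherever adjacent-frame similarity drops below the second preset value."""
    boundaries = [0]
    for i in range(1, len(histograms)):
        if histogram_similarity(histograms[i - 1], histograms[i]) < threshold:
            boundaries.append(i)
    ends = boundaries[1:] + [len(histograms)]
    # (start, end) frame index pairs, end exclusive
    return list(zip(boundaries, ends))
```

The threshold plays the role of the second preset value and, as the text notes, would be tuned empirically.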
102. Extract the first N frames of each shot, and detect whether a face image exists in the extracted frames.
Here N is an integer greater than or equal to 1. The frames of a broadcast video are complex and diverse in content, so to improve retrieval efficiency the first N frames of each shot are treated as its key frames and examined. If the first N frames contain a face image, the shot is retained as a shot to be retrieved; otherwise the shot is discarded and undergoes no further detection or retrieval.
103. Detect all face images in the shots whose first N frames contain a face image.
In practice, a cascade classifier can be used to detect all face images in such a shot.
Specifically, several weak classifiers are cascaded into a strong classifier, and several strong classifiers are in turn cascaded to form the cascade classifier. When a frame is tested, it first passes through the first strong classifier; if that classifier judges it to be a face image, the frame is passed to the second strong classifier, and so on until all strong classifiers have been applied. As soon as any stage in the cascade judges the frame to be a non-face image, the remaining stages are skipped and the frame is classified as non-face. In this way a large number of non-target windows are eliminated early, most non-face frames are filtered out, and detection speed is greatly improved.
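The early-exit behaviour described above can be sketched as follows; the stage functions are placeholders standing in for trained strong classifiers, not the actual trained stages.

```python
def cascade_detect(window, stages):
    """Pass a candidate window through the cascade stages in order.
    The first stage that rejects stops the evaluation, so most
    non-face windows are discarded after only a few cheap tests."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: classified as non-face, later stages skipped
    return True  # accepted by every stage: classified as a face window
```

With stages ordered from cheapest to most discriminative, the average cost per window stays low even though the full cascade is deep.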
104. According to the search keyword, compare the sample set corresponding to the keyword with the detected face images, and calculate the similarity between each face image and the sample set.
It should be noted that the sample set is a set of multiple images of a given person, showing the same person under different illumination, from different angles, or with different facial expressions.
In practice, the images in the sample set are usually denoised, scaled to 100 × 100 pixels, and labelled with the person's name. During retrieval, the sample set can be looked up with the person's name as the search keyword, and the similarity between each face image and the sample set is then calculated.
105. Integrate the face images whose similarity exceeds the first preset value into the shots they belong to, and concatenate the integrated shots to obtain the target video.
Specifically, step 105 includes sub-steps 1051 and 1052, not shown in the figure.
1051. Cluster the face images whose similarity exceeds the first preset value within the shot each image belongs to.
For example, a suitable threshold can be chosen and the face images whose similarity exceeds it clustered; the face images within one cluster are then similar in scene, close in content and temporally continuous. Clustering the face images first therefore effectively improves the efficiency of the subsequent integration.
1052. Associate the clustered face images with their corresponding time and audio information to regenerate the shot containing those face images.
It should be understood that every frame in a video corresponds to unique time and audio information on the timeline. Only by re-attaching this time and audio information to the clustered frames can each segment of the original video be reconstructed; the regenerated shots are then concatenated to produce a target video containing only the content related to the search keyword.
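Sub-step 1051's temporal clustering can be sketched as grouping matched frame indices into contiguous runs; the gap parameter below is an assumed stand-in for the clustering threshold.

```python
def cluster_frames(frame_indices, max_gap=5):
    """Group matched face frames into temporal clusters: consecutive
    frames closer than max_gap fall into the same run, and each run is
    later re-associated with its time span and audio (sub-step 1052)."""
    runs, current = [], []
    for f in sorted(frame_indices):
        if current and f - current[-1] > max_gap:
            runs.append(current)  # gap too large: close the current run
            current = []
        current.append(f)
    if current:
        runs.append(current)
    return runs
```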
The video retrieval method of this embodiment segments the video to be retrieved into multiple shots, performs face detection on the shots whose first N frames contain a face, calculates the similarity between the face images and the sample set corresponding to the search keyword, integrates the face images whose similarity exceeds the first preset value into the shots they belong to, and finally concatenates the integrated shots to obtain the target video. This effectively increases video retrieval speed and improves the user's viewing experience.
Fig. 2 shows a schematic flowchart of the video retrieval method provided by another embodiment of the present invention. As shown in Fig. 2, the method of this embodiment is as follows.
201. When a search keyword is received, segment the video to be retrieved into multiple shots.
202. Extract the first N frames of each shot, and detect whether a face image exists in the extracted frames.
203. Use a cascade classifier to detect all face images in the shots whose first N frames contain a face image.
In one practicable manner, the cascade classifier of this embodiment can be trained as follows.
First, N images are chosen as training samples, smoothed and denoised, and scaled to 24 × 24. The facial features of these samples are then computed to construct weak classifiers.
Specifically, the feature templates shown in Fig. 3 can be used to obtain the rectangular features of a face image. Note that each template can be scaled to a detection window of any size. For a template of scale s × t, the number of detection windows obtained is:
([24/s] + [23/s] + … + [1/s]) × ([24/t] + [23/t] + … + [1/t])
where [·] denotes the floor function.
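The sum-of-floors count can be evaluated directly; for instance a 1 × 1 base template in a 24 × 24 frame yields 300 × 300 = 90 000 windows. A sketch (the function name is illustrative):

```python
def window_count(s, t, size=24):
    """Number of detection windows for an s×t feature template inside a
    size×size training image, per the sum-of-floors formula above."""
    horiz = sum(k // s for k in range(1, size + 1))  # [size/s] + ... + [1/s]
    vert = sum(k // t for k in range(1, size + 1))   # [size/t] + ... + [1/t]
    return horiz * vert
```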
Next, the feature value of each detection window is calculated. The feature value of a window is the sum of all pixels inside its black rectangles minus the sum of all pixels inside its white rectangles, and it can be computed efficiently with an integral image.
For example, let i(m, n) be the pixel value at point (m, n), S(m, n) the cumulative sum along the current row, and A(m, n) the integral image, i.e. the sum of all pixels above and to the left of (m, n). Scanning the image row by row and computing recursively:
S(m, n) = S(m, n − 1) + i(m, n)
A(m, n) = A(m − 1, n) + S(m, n)
yields the feature value of every detection window.
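The row-sum recursion above can be sketched in a few lines of pure Python, with lists of lists standing in for image arrays:

```python
def integral_image(img):
    """Compute A(m, n), the sum of all pixels above and to the left of
    (m, n) inclusive, via S(m,n) = S(m,n-1) + i(m,n) and
    A(m,n) = A(m-1,n) + S(m,n), scanning the image row by row."""
    rows, cols = len(img), len(img[0])
    A = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        S = 0  # cumulative sum along the current row, S(m, n)
        for n in range(cols):
            S += img[m][n]
            A[m][n] = (A[m - 1][n] if m > 0 else 0) + S
    return A
```

With A precomputed, the pixel sum of any rectangle, and hence any rectangular feature value, costs only a handful of lookups.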
Since different samples X_i (i = 1, 2, …, N) have different feature values f_j(X_i) in different detection windows K_j (j = 1, 2, …, M), a suitable threshold is chosen for each feature to decide whether an image is a face image or a non-face image.
For example, the weak classifier wh_j(X) corresponding to detection window K_j can take the form wh_j(X) = 1 if p × f_j(X) < p × θ_j, and 0 otherwise.
Here wh_j(X_i) = 1 means the image is a face image; otherwise it is a non-face image. p indicates the direction of the inequality and takes the value ±1: when the mean of the j-th feature over all samples is smaller than the threshold θ_j, p is −1, otherwise p is 1. θ_j is the optimal threshold of the j-th feature over all samples.
The threshold θ_j is determined as follows: for each feature, compute the feature value f_j(X_i) of every sample X_i and sort the values in ascending order. Let T⁺ be the total proportion of face sample images and T⁻ the total proportion of non-face sample images, and let S⁺ and S⁻ be the proportions of face and non-face samples ranked before f_j(X_i). The classification error e at the current f_j(X_i) is then:
e = min(S⁺ + (T⁻ − S⁻), S⁻ + (T⁺ − S⁺))
The optimal threshold θ_j, together with j and f_j(X_i), is the one minimising e. The features with the smallest classification error are selected as weak classifiers.
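The threshold search can be sketched as a decision-stump sweep over sorted feature values; equal sample weights and the return convention below are assumptions.

```python
def best_stump(values, labels):
    """Sweep candidate thresholds over one feature's sorted values and
    return (error, threshold, polarity p) minimising
    e = min(S+ + (T- - S-), S- + (T+ - S+)), with equal sample weights."""
    n = len(values)
    w = 1.0 / n
    T_pos = sum(w for lab in labels if lab == 1)  # total face-sample weight
    T_neg = 1.0 - T_pos                           # total non-face-sample weight
    S_pos = S_neg = 0.0                           # weight ranked before the candidate
    best = (float("inf"), None, 0)
    for i in sorted(range(n), key=lambda k: values[k]):
        e1 = S_pos + (T_neg - S_neg)  # misclassified if faces score above threshold
        e2 = S_neg + (T_pos - S_pos)  # misclassified if faces score below threshold
        if e1 < best[0]:
            best = (e1, values[i], -1)
        if e2 < best[0]:
            best = (e2, values[i], 1)
        if labels[i] == 1:
            S_pos += w
        else:
            S_neg += w
    return best
```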
Then, the strong classifier for face detection is constructed. All samples are initially given the same weight; the first weak classifier classifies the N samples, after which the weights of the misclassified samples are increased and the weights of the correctly classified samples are decreased. The second weak classifier is then trained on the re-weighted samples, and the weights are updated in the same way. After P iterations, P weak classifiers have been generated.
The P weak classifiers are then combined with appropriate weights to obtain a strong classifier, and several strong classifiers are assembled into the cascade classifier used to detect the face image frames described above.
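One boosting round — weighted error, classifier vote, sample re-weighting — can be sketched as follows. The log-based vote alpha follows the standard AdaBoost rule, an assumption here since the text only speaks of "certain weights".

```python
import math

def adaboost_round(weights, predictions, labels):
    """One boosting iteration: weighted error of the current weak
    classifier, its vote alpha, and the updated (renormalised) sample
    weights -- misclassified samples gain weight, correct ones lose it."""
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    alpha = 0.5 * math.log((1.0 - err) / err)
    updated = [w * math.exp(alpha if p != y else -alpha)
               for w, p, y in zip(weights, predictions, labels)]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```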
204. According to the search keyword, extract the sample set related to the keyword from the face sample database.
The sample set consists of multiple face sample images of the same person.
In practice, the face sample images can be classified by person. For example, with k persons the samples fall into k classes, and the full collection can be written as [d_{11}, d_{12}, …, d_{1n}, d_{21}, d_{22}, …, d_{2n}, …, d_{k1}, d_{k2}, …, d_{kn}], where each column vector d_{ij} (i = 1, …, k; j = 1, …, n) is one sample image of one person. Further, let D_i = [d_{i1}, d_{i2}, …, d_{in}]; then D = (D_1, D_2, …, D_k) is the face sample image database composed of the k sample sets.
205. Express the face image as a linear combination of the face sample images.
From the structure of the sample set it can be seen that each face image can be expressed as a linear combination of the face sample images, for example as Y = DA, where A is a sparse coefficient matrix. When the image shows one of the persons in the database, it can be written as Y = a_{i1} × d_{i1} + a_{i2} × d_{i2} + … + a_{in} × d_{in}, where a_{i1}, a_{i2}, …, a_{in} are the sparse coefficients, i.e. one column of A.
206. Calculate the similarity between the image and the sample set from the coefficients of the linear combination.
Specifically, the coefficient sum ΣA_i = a_{i1} + a_{i2} + … + a_{in} can be used as the similarity between the image frame Y and the sample set.
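Given a solved coefficient vector A, the per-person coefficient sums ΣA_i fall out of a block-wise sum over the dictionary layout D = (D_1, …, D_k); the function name and the flat-list layout are assumptions.

```python
def person_similarity(coeffs, n):
    """Sum the sparse coefficients block-wise: block i of length n holds
    the coefficients on person i's sample images, so its sum is the
    similarity between the query frame Y and that person's sample set."""
    return [sum(coeffs[i:i + n]) for i in range(0, len(coeffs), n)]
```

The person with the largest block sum is the best match for the query frame.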
207. Integrate the face images whose similarity exceeds the first preset value into the shots they belong to, and concatenate the integrated shots to obtain the target video.
For example, if the sample set contains n face sample images, then an image whose coefficient sum satisfies ΣA_i > 0.8n is taken as a retrieved face image. The retrieved face images are integrated into the shots they belong to, and the integrated shots are concatenated to obtain the target video.
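Step 207's decision and integration can be sketched as keeping the shots that contain at least one frame passing the 0.8n test, then returning them in time order ready for concatenation; the names and data shapes are assumptions.

```python
def select_shots(shots, frame_scores, n_samples, factor=0.8):
    """Keep each (start, end) shot containing at least one frame whose
    coefficient sum exceeds factor * n_samples; concatenating the
    returned shots in order yields the target video."""
    hits = {f for f, score in frame_scores.items() if score > factor * n_samples}
    return [(a, b) for a, b in shots if any(a <= f < b for f in hits)]
```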
The video retrieval method of this embodiment segments the video to be retrieved into multiple shots, performs face detection on the shots whose first N frames contain a face, calculates the similarity between the face images and the sample set corresponding to the search keyword, integrates the face images whose similarity exceeds the first preset value into the shots they belong to, and finally concatenates the integrated shots to obtain the target video. This effectively increases video retrieval speed and improves the user's viewing experience.
Fig. 4 shows the video retrieval system provided by one embodiment of the present invention. As shown in Fig. 4, the system of this embodiment includes: a video shot segmentation module 41, a shot detection module 42, a face image detection module 43, a face image retrieval module 44 and a target video generation module 45.
The video shot segmentation module 41 is configured to segment the video to be retrieved into multiple shots when a search keyword is received.
The shot detection module 42 is configured to extract the first N frames of each shot and detect whether a face image exists in the extracted frames, N being an integer greater than or equal to 1.
The face image detection module 43 is configured to detect all face images in the shots whose first N frames contain a face image.
The face image retrieval module 44 is configured to compare, according to the search keyword, the sample set corresponding to the keyword with the detected face images, and calculate the similarity between each face image and the sample set.
The target video generation module 45 is configured to integrate the face images whose similarity exceeds a first preset value into the shots they belong to, and concatenate the integrated shots to obtain the target video.
Preferably, the video shot segmentation module 41 is specifically configured to extract visual features of the video to be retrieved, measure the similarity between adjacent frames according to the visual features, and split the adjacent frames into two shots when the similarity is smaller than a second preset value.
Preferably, the face image detection module 43 is specifically configured to use a cascade classifier to detect all face images in the shots whose first N frames contain a face image.
Preferably, the face image retrieval module 44 is specifically configured to extract, according to the search keyword, the sample set related to the keyword from a face sample database, the sample set being multiple face sample images of the same person; express each face image as a linear combination of the face sample images; and calculate the similarity between the image and the sample set from the coefficients of the linear combination.
Preferably, the target video generation module 45 is specifically configured to cluster the face images whose similarity exceeds the first preset value within the shot each image belongs to, and associate the clustered face images with their corresponding time and audio information to regenerate the shot containing those face images.
The video retrieval system of this embodiment can be used to execute the technical solution of the method embodiments shown in Fig. 1 or Fig. 2; its implementation principle and technical effect are similar and are not repeated here.
The video retrieval system of this embodiment segments the video to be retrieved into multiple shots, performs face detection on the shots whose first N frames contain a face, calculates the similarity between the face images and the sample set corresponding to the search keyword, integrates the face images whose similarity exceeds the first preset value into the shots they belong to, and finally concatenates the integrated shots to obtain the target video. This effectively increases video retrieval speed and improves the user's viewing experience.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described therein can still be modified, or some or all of their technical features can be replaced by equivalents, without such modifications or replacements departing from the scope of the claims of the present invention.

Claims (10)

1. A video retrieval method, characterised in that the method comprises:
when a search keyword is received, segmenting the video to be retrieved into multiple shots;
extracting the first N frames of each shot and detecting whether a face image exists in the extracted frames, N being an integer greater than or equal to 1;
detecting all face images in the shots whose first N frames contain a face image;
according to the search keyword, comparing the sample set corresponding to the keyword with the detected face images, and calculating the similarity between each face image and the sample set;
integrating the face images whose similarity exceeds a first preset value into the shots they belong to, and concatenating the integrated shots to obtain the target video.
2. The method according to claim 1, characterized in that segmenting the video to be retrieved into multiple shots comprises:
extracting visual features of the video to be retrieved;
measuring the similarity between adjacent frames according to the visual features;
when the similarity is less than a second preset value, dividing the adjacent frames into two shots.
3. The method according to claim 1, characterized in that detecting all facial images in the shots whose first N frames contain a facial image comprises:
using a cascade classifier to detect all facial images in the shots whose first N frames contain a facial image.
4. The method according to claim 1, characterized in that comparing the sample set corresponding to the search keyword with the facial images and calculating the similarity between each facial image and the sample set comprises:
according to the search keyword, extracting a sample set related to the search keyword from a face sample database, the sample set being multiple face sample images of the same person;
expressing the facial image as a linear combination of the face sample images;
calculating the similarity between the facial image and the sample set according to the coefficients of the linear combination.
5. The method according to claim 1, characterized in that integrating the facial images whose similarity exceeds the first preset value into the shots to which they belong comprises:
clustering the facial images whose similarity exceeds the first preset value within the shots to which they belong;
associating the clustered facial images with the corresponding temporal information and audio information to generate a shot containing the facial image.
6. A video retrieval system, characterized in that the system comprises:
a video shot segmentation module, configured to segment a video to be retrieved into multiple shots when a search keyword is received;
a shot detection module, configured to extract the first N frames of each shot and detect whether a facial image is present in the extracted frames, N being an integer greater than or equal to 1;
a facial image detection module, configured to detect all facial images in the shots whose first N frames contain a facial image;
a facial image retrieval module, configured to compare, according to the search keyword, the sample set corresponding to the search keyword with the facial images, and to calculate the similarity between each facial image and the sample set;
a target video generation module, configured to integrate the facial images whose similarity exceeds a first preset value into the shots to which they belong, and to concatenate the integrated shots to obtain a target video.
7. The system according to claim 6, characterized in that the video shot segmentation module is specifically configured to:
extract visual features of the video to be retrieved;
measure the similarity between adjacent frames according to the visual features; and
when the similarity is less than a second preset value, divide the adjacent frames into two shots.
8. The system according to claim 6, characterized in that the facial image detection module is specifically configured to:
use a cascade classifier to detect all facial images in the shots whose first N frames contain a facial image.
9. The system according to claim 6, characterized in that the facial image retrieval module is specifically configured to:
extract, according to the search keyword, a sample set related to the search keyword from a face sample database, the sample set being multiple face sample images of the same person;
express the facial image as a linear combination of the face sample images; and
calculate the similarity between the facial image and the sample set according to the coefficients of the linear combination.
10. The system according to claim 6, characterized in that the target video generation module is specifically configured to:
cluster the facial images whose similarity exceeds the first preset value within the shots to which they belong; and
associate the clustered facial images with the corresponding temporal information and audio information to generate a shot containing the facial image.
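The similarity calculation of claims 4 and 9 can be sketched as follows. The claims leave the mapping from linear-combination coefficients to a similarity score unspecified; this minimal sketch assumes least-squares coefficients and a reconstruction-residual score (a common choice in sparse-representation face matching), with all function names hypothetical:

```python
import numpy as np

def similarity_to_sample_set(query, samples):
    """Express the query face (flattened to a vector) as a linear combination of
    the sample images (columns of A) via least squares, then score by how well
    that combination reconstructs the query: small residual -> high similarity."""
    A = np.stack([s.ravel().astype(float) for s in samples], axis=1)
    y = query.ravel().astype(float)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = np.linalg.norm(A @ coeffs - y)
    return 1.0 / (1.0 + residual)  # in (0, 1]; 1 means exact reconstruction

rng = np.random.default_rng(0)
samples = [rng.random((4, 4)) for _ in range(3)]   # tiny stand-ins for face samples
inside = 0.5 * samples[0] + 0.5 * samples[1]       # lies in the span of the samples
outside = rng.random((4, 4)) * 10                  # far from the span
print(similarity_to_sample_set(inside, samples))   # close to 1.0
print(similarity_to_sample_set(outside, samples))  # much smaller
```

A detected face would then be kept when this score exceeds the first preset value, as in claims 1 and 5.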
CN201610084093.XA 2016-02-06 2016-02-06 Video retrieval method and system Pending CN105760472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610084093.XA CN105760472A (en) 2016-02-06 2016-02-06 Video retrieval method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610084093.XA CN105760472A (en) 2016-02-06 2016-02-06 Video retrieval method and system

Publications (1)

Publication Number Publication Date
CN105760472A true CN105760472A (en) 2016-07-13

Family

ID=56330041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610084093.XA Pending CN105760472A (en) 2016-02-06 2016-02-06 Video retrieval method and system

Country Status (1)

Country Link
CN (1) CN105760472A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169071A (en) * 2017-05-08 2017-09-15 浙江大华技术股份有限公司 A kind of video searching method and device
WO2018033152A1 (en) * 2016-08-19 2018-02-22 中兴通讯股份有限公司 Video playing method and apparatus
CN107948730A (en) * 2017-10-30 2018-04-20 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on picture generation video
CN108764067A (en) * 2018-05-08 2018-11-06 北京大米科技有限公司 Video intercepting method, terminal, equipment and readable medium based on recognition of face
CN108881813A (en) * 2017-07-20 2018-11-23 北京旷视科技有限公司 A kind of video data handling procedure and device, monitoring system
CN109034174A (en) * 2017-06-08 2018-12-18 北京君正集成电路股份有限公司 A kind of cascade classifier training method and device
CN110545443A (en) * 2018-05-29 2019-12-06 优酷网络技术(北京)有限公司 Video clip acquisition method and device
CN110598048A (en) * 2018-05-25 2019-12-20 北京中科寒武纪科技有限公司 Video retrieval method and video retrieval mapping relation generation method and device
CN110866148A (en) * 2018-08-28 2020-03-06 富士施乐株式会社 Information processing system, information processing apparatus, and storage medium
CN113837022A (en) * 2021-09-02 2021-12-24 北京新橙智慧科技发展有限公司 Method for rapidly searching video pedestrian
US11995556B2 (en) 2018-05-18 2024-05-28 Cambricon Technologies Corporation Limited Video retrieval method, and method and apparatus for generating video retrieval mapping relationship

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040218814A1 (en) * 1993-10-20 2004-11-04 Takafumi Miyatake Video retrieval method and apparatus
CN101650740A (en) * 2009-08-27 2010-02-17 中国科学技术大学 Method and device for detecting television advertisements
CN103530652A (en) * 2013-10-23 2014-01-22 北京中视广信科技有限公司 Face clustering based video categorization method and retrieval method as well as systems thereof
CN103761284A (en) * 2014-01-13 2014-04-30 中国农业大学 Video retrieval method and video retrieval system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040218814A1 (en) * 1993-10-20 2004-11-04 Takafumi Miyatake Video retrieval method and apparatus
CN101650740A (en) * 2009-08-27 2010-02-17 中国科学技术大学 Method and device for detecting television advertisements
CN103530652A (en) * 2013-10-23 2014-01-22 北京中视广信科技有限公司 Face clustering based video categorization method and retrieval method as well as systems thereof
CN103761284A (en) * 2014-01-13 2014-04-30 中国农业大学 Video retrieval method and video retrieval system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HE Wei: "Research on similar-shot retrieval technology based on multiple key frames", Wanfang Data *
LIU Bo et al.: "A fast and scalable subspace clustering algorithm", Pattern Recognition and Artificial Intelligence *
ZHUO Jing: "The influence and application of film shooting and editing techniques in animation shot language", Pin Yi Chang Lang *
PENG Yuxin et al.: "A method for video retrieval by video clips", Journal of Software *
JIAN Cairen et al.: "Gene expression data clustering based on projection least squares regression subspace segmentation", Pattern Recognition and Artificial Intelligence *
CHEN Lizhen et al.: "Face image retrieval in video based on subspace incremental learning", Journal of Computer-Aided Design & Computer Graphics *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033152A1 (en) * 2016-08-19 2018-02-22 中兴通讯股份有限公司 Video playing method and apparatus
CN107770528A (en) * 2016-08-19 2018-03-06 中兴通讯股份有限公司 Video broadcasting method and device
CN107770528B (en) * 2016-08-19 2023-08-25 中兴通讯股份有限公司 Video playing method and device
CN107169071A (en) * 2017-05-08 2017-09-15 浙江大华技术股份有限公司 A kind of video searching method and device
CN107169071B (en) * 2017-05-08 2020-02-14 浙江大华技术股份有限公司 Video searching method and device
CN109034174A (en) * 2017-06-08 2018-12-18 北京君正集成电路股份有限公司 A kind of cascade classifier training method and device
CN109034174B (en) * 2017-06-08 2021-07-09 北京君正集成电路股份有限公司 Cascade classifier training method and device
CN108881813A (en) * 2017-07-20 2018-11-23 北京旷视科技有限公司 A kind of video data handling procedure and device, monitoring system
CN107948730B (en) * 2017-10-30 2020-11-20 百度在线网络技术(北京)有限公司 Method, device and equipment for generating video based on picture and storage medium
CN107948730A (en) * 2017-10-30 2018-04-20 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on picture generation video
CN108764067A (en) * 2018-05-08 2018-11-06 北京大米科技有限公司 Video intercepting method, terminal, equipment and readable medium based on recognition of face
US11995556B2 (en) 2018-05-18 2024-05-28 Cambricon Technologies Corporation Limited Video retrieval method, and method and apparatus for generating video retrieval mapping relationship
CN110598048A (en) * 2018-05-25 2019-12-20 北京中科寒武纪科技有限公司 Video retrieval method and video retrieval mapping relation generation method and device
CN110545443A (en) * 2018-05-29 2019-12-06 优酷网络技术(北京)有限公司 Video clip acquisition method and device
CN110866148A (en) * 2018-08-28 2020-03-06 富士施乐株式会社 Information processing system, information processing apparatus, and storage medium
CN113837022A (en) * 2021-09-02 2021-12-24 北京新橙智慧科技发展有限公司 Method for rapidly searching video pedestrian

Similar Documents

Publication Publication Date Title
CN105760472A (en) Video retrieval method and system
CN109993160B (en) Image correction and text and position identification method and system
US11062123B2 (en) Method, terminal, and storage medium for tracking facial critical area
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
Zhang et al. Probabilistic graphlet transfer for photo cropping
CN109635686B (en) Two-stage pedestrian searching method combining human face and appearance
CN111985621A (en) Method for building neural network model for real-time detection of mask wearing and implementation system
CN113779308B (en) Short video detection and multi-classification method, device and storage medium
CN107358141B (en) Data identification method and device
CN110738262B (en) Text recognition method and related product
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN104463232A (en) Density crowd counting method based on HOG characteristic and color histogram characteristic
Nag et al. A new unified method for detecting text from marathon runners and sports players in video (PR-D-19-01078R2)
CN108268875A (en) A kind of image meaning automatic marking method and device based on data smoothing
Elharrouss et al. FSC-set: counting, localization of football supporters crowd in the stadiums
CN111753923A (en) Intelligent photo album clustering method, system, equipment and storage medium based on human face
Gu et al. Embedded and real-time vehicle detection system for challenging on-road scenes
CN106203448A (en) A kind of scene classification method based on Nonlinear Scale Space Theory
CN104866826A (en) Static gesture language identification method based on KNN algorithm and pixel ratio gradient features
Agrawal et al. Redundancy removal for isolated gesture in Indian sign language and recognition using multi-class support vector machine
CN110728214B (en) Weak and small figure target detection method based on scale matching
CN110458203B (en) Advertisement image material detection method
Rakowski et al. Hand shape recognition using very deep convolutional neural networks
CN117011932A (en) Running behavior detection method, electronic device and storage medium
Wan et al. Face detection method based on skin color and adaboost algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160713