CN107169106B - Video retrieval method, device, storage medium and processor - Google Patents


Info

Publication number
CN107169106B
CN107169106B (application CN201710351135.6A)
Authority
CN
China
Prior art keywords
target
image
video
feature
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710351135.6A
Other languages
Chinese (zh)
Other versions
CN107169106A (en)
Inventor
周文明
王志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Thinkjoy Information Technology Co ltd
Original Assignee
Zhuhai Thinkjoy Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Thinkjoy Information Technology Co ltd filed Critical Zhuhai Thinkjoy Information Technology Co ltd
Priority to CN201710351135.6A
Publication of CN107169106A
Application granted
Publication of CN107169106B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor, of video data
    • G06F16/73: Querying
    • G06F16/732: Query formulation
    • G06F16/7335: Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor, of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval using metadata automatically derived from the content
    • G06F16/7837: Retrieval using objects detected or recognised in the video content
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a video retrieval method, a video retrieval device, a storage medium and a processor. The method comprises the following steps: acquiring a target retrieval picture and a plurality of video images; preprocessing the plurality of video images to obtain at least one first target video image; processing the at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image; processing all target image sequences of each first target video image according to a second preset model to obtain a first feature and a second feature of each first target video image; clustering the first features and the second features according to a preset algorithm to obtain a retrieval model; performing matting processing on the target retrieval picture to obtain a target area image; and retrieving the target area image to obtain a retrieval result. The application addresses the technical problems of low video retrieval precision and low retrieval efficiency in the prior art.

Description

Video retrieval method, device, storage medium and processor
Technical Field
The present application relates to the field of digital intelligence, and in particular, to a video retrieval method, apparatus, storage medium, and processor.
Background
With the construction and popularization of projects such as safe cities and smart communities, video security monitoring equipment has gradually been installed in every corner of cities, recording and collecting video image data continuously, around the clock. For traffic and community monitoring systems of enormous scale, emerging intelligent video analysis based on computer vision makes automatic analysis and target identification of massive video possible. Surveillance video is used mainly for community and public security maintenance, and plays a vital role in safeguarding social security through real-time evidence collection and after-the-fact retrieval. However, video is unstructured data of huge volume with sparse effective information, and its structured storage still poses many problems. Real-time, fast retrieval of video data likewise faces many challenges: manual retrieval is unsuitable for practical application because of its heavy workload, the large number of retrieval targets, the ease of missing targets, and its low efficiency. Against this background, video retrieval technology in the prior art mainly takes the following two forms:
The first is semantic-based video retrieval. This approach matches queries against keywords, which may be titles, topics, characters, video events and the like, drawn from semantic description data that is added manually or generated automatically for each video. In security monitoring applications, however, the accuracy of semantic-based retrieval depends on a large amount of semantic description information, and when little description is available for a single specific target the retrieval effect is quite limited. For example, when looking for a target person in a huge volume of public security video, if the description of that person amounts only to "a person wearing a blue coat and black trousers", the person's deeper characteristics cannot be described, the search lacks specificity, and the returned results are cluttered.
The second is content-based video retrieval. This approach generally relies on traditional image processing, extracting low-level information from video frames such as color, texture, edges and feature points, and using the similarity between videos as the retrieval basis. Compared with semantic retrieval, content-based retrieval makes effective use of the low-level features in image and video data, improving retrieval efficiency. However, most current content-based image retrieval techniques still depend on traditional image features, whose descriptive power is limited; the feature vectors used for retrieval are high-dimensional, similarity computation is slow, and real-time retrieval is hard to achieve.
In summary, existing video retrieval technology suffers from poor search specificity, low precision, low efficiency and poor real-time performance; that is, the prior art has the technical problems of low video retrieval precision and low retrieval efficiency.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a video retrieval method, a device, a storage medium and a processor, which are used for at least solving the technical problems of low video retrieval precision and low retrieval efficiency in the prior art.
According to an aspect of an embodiment of the present application, there is provided a video retrieval method, including: acquiring a target retrieval picture and a plurality of video images; preprocessing the plurality of video images to obtain at least one first target video image; performing target detection processing and target tracking processing on the at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image in the at least one first target video image; performing feature extraction processing on all target image sequences of each first target video image according to a second preset model to obtain first features and second features of each first target video image, wherein the first features are binary hash features of the first target video images, and the second features are original features of the first target video images; clustering the first feature and the second feature according to a preset approximate nearest neighbor algorithm to obtain a retrieval model; carrying out matting processing on the target search image to obtain a target area image; and searching the target area image according to the search model to obtain a search result.
Further, the searching the target area image according to the searching model to obtain a searching result includes: acquiring a third feature and a fourth feature of the target area image, wherein the third feature is a binary hash feature of the target area image, and the fourth feature is an original feature of the target area image; calculating a hamming distance between the third feature and the first feature of each of the first target video images to obtain at least one second target video image; calculating the Euclidean distance between the fourth feature and the second feature of each second target video image in the at least one second target video image to obtain a target image frame, wherein the similarity between the target image frame and the target retrieval image is greater than a preset similarity threshold; acquiring a frame ID of the target image frame; and searching the video image corresponding to the frame ID in the plurality of video images to obtain the search result.
Further, after performing feature extraction processing on all target image sequences of each of the first target video images according to a second preset model, the method further includes: the at least one first target video image, the sequence of target images, the first feature and the second feature are stored in a database.
Further, the preset approximate nearest neighbor algorithm is a locality-sensitive hashing (LSH) algorithm.
Further, the preprocessing the plurality of video images to obtain at least one first target video image includes: and sequentially performing length normalization processing and decoding processing on each of the plurality of video images to obtain the first target video image.
Further, the method further comprises the steps of: training the first preset model and the second preset model according to a random gradient descent algorithm until the first preset model and the second preset model reach a convergence state.
According to another aspect of the embodiment of the present application, there is also provided a video retrieval apparatus including: an acquisition unit configured to acquire a target retrieval picture and a plurality of video images; the first processing unit is used for preprocessing the plurality of video images to obtain at least one first target video image; the second processing unit is used for carrying out target detection processing and target tracking processing on the at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image in the at least one first target video image; the third processing unit is configured to perform feature extraction processing on all the target image sequences of each of the first target video images according to a second preset model to obtain a first feature and a second feature of each of the first target video images, where the first feature is a binary hash feature of the first target video image, and the second feature is an original feature of the first target video image; the fourth processing unit is used for carrying out clustering processing on the first characteristic and the second characteristic according to a preset approximate nearest neighbor algorithm to obtain a retrieval model; a fifth processing unit, configured to perform matting processing on the target search image to obtain a target area image; and the searching unit is used for searching the target area image according to the searching model to obtain a searching result.
Further, the search unit includes: a first obtaining subunit, configured to obtain a third feature and a fourth feature of the target area image, where the third feature is a binary hash feature of the target area image, and the fourth feature is an original feature of the target area image; a first computing subunit, configured to calculate a hamming distance between the third feature and the first feature of each of the first target video images, so as to obtain at least one second target video image; a second calculating subunit, configured to calculate a euclidean distance between the fourth feature and the second feature of each of the at least one second target video image, so as to obtain a target image frame, where a similarity between the target image frame and the target search image is greater than a preset similarity threshold; a second acquisition subunit configured to acquire a frame ID of the target image frame; and a search subunit configured to search the plurality of video images for the video image corresponding to the frame ID, and obtain the search result.
According to still another aspect of the embodiments of the present application, there is further provided a storage medium including a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to execute the video retrieval method.
According to still another aspect of the embodiment of the present application, there is further provided a processor, where the processor is configured to execute a program, and the video searching method is executed when the program runs.
In the embodiment of the application, the following approach is adopted: acquiring a target retrieval picture and a plurality of video images; preprocessing the plurality of video images to obtain at least one first target video image; performing target detection processing and target tracking processing on the at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image; performing feature extraction processing on all target image sequences of each first target video image according to a second preset model to obtain the first feature and the second feature of each first target video image, where the first feature is the binary hash feature and the second feature is the original feature of the first target video image; clustering the first features and the second features according to a preset approximate nearest neighbor algorithm to obtain a retrieval model; performing matting processing on the target retrieval picture to obtain a target area image; and retrieving the target area image according to the retrieval model to obtain a retrieval result. This improves video retrieval precision and efficiency and reduces retrieval time and labor cost, thereby solving the technical problems of low video retrieval precision and low retrieval efficiency in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of an alternative video retrieval method according to an embodiment of the application;
FIG. 2 is a flow chart of another alternative video retrieval method according to an embodiment of the application;
FIG. 3 is a schematic diagram of an alternative video retrieval device according to an embodiment of the present application;
fig. 4 is a schematic structural view of another alternative video retrieval device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present application, an embodiment of a video retrieval method is provided. It should be noted that the steps shown in the flowchart of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flow chart of an alternative video retrieval method according to an embodiment of the present application, as shown in fig. 1, the method includes the steps of:
step S102, obtaining a target retrieval picture and a plurality of video images;
step S104, preprocessing a plurality of video images to obtain at least one first target video image;
step S106, performing target detection processing and target tracking processing on at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image in the at least one first target video image;
step S108, carrying out feature extraction processing on all target image sequences of each first target video image according to a second preset model to obtain first features and second features of each first target video image, wherein the first features are binary hash features of the first target video image, and the second features are original features of the first target video image;
step S110, clustering the first feature and the second feature according to a preset approximate nearest neighbor algorithm to obtain a retrieval model;
step S112, carrying out matting processing on the target retrieval image to obtain a target area image;
step S114, searching the target area image according to the search model to obtain a search result.
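The indexing portion of the flow (steps S102 through S110) can be sketched as a minimal pipeline. All function bodies below are hypothetical stand-ins for the preset models described in the text, not the patent's actual implementations:

```python
# Hypothetical stand-ins for the processing stages of steps S102-S110.
def preprocess(video_path):
    """S104: length normalization and decoding (stubbed)."""
    return video_path  # a decoded, size-normalized frame in practice

def detect_and_track(frame):
    """S106: first preset model, yielding target image sequences (stubbed)."""
    return [[frame]]  # one image sequence per detected target

def extract_features(sequence):
    """S108: second preset model, yielding (original, binary hash) features."""
    original = [float(len(item)) for item in sequence]  # toy stand-in feature
    binary = [1 if x > 0 else 0 for x in original]
    return original, binary

def build_index(video_paths):
    """Assemble the per-target records from which the retrieval model is built."""
    index = []
    for path in video_paths:
        frame = preprocess(path)
        for seq in detect_and_track(frame):
            original, binary = extract_features(seq)
            index.append({"video": path, "feat": original, "hash": binary})
    return index
```

Step S110 would then cluster the `hash` and `feat` entries of this index into the retrieval model.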
In the embodiment of the application, the following modes are adopted: acquiring a target retrieval picture and a plurality of video images; preprocessing a plurality of video images to obtain at least one first target video image; performing target detection processing and target tracking processing on at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image in the at least one first target video image; performing feature extraction processing on all target image sequences of each first target video image according to a second preset model to obtain first features and second features of each first target video image, wherein the first features are binary hash features of the first target video image, and the second features are original features of the first target video image; clustering the first features and the second features according to a preset approximate nearest neighbor algorithm to obtain a retrieval model; carrying out matting processing on the target retrieval image to obtain a target area image; the method and the device achieve the aim of obtaining the search result by searching the target area image according to the search model, thereby achieving the technical effects of improving the search precision and the search efficiency of the video, reducing the search time cost and the labor cost, and further solving the technical problems of lower search precision and lower search efficiency of the video in the prior art.
Alternatively, the plurality of video images may be understood as massive video images, and the target search picture is input by the user, where it is noted that the target search picture may or may not be included in the plurality of video images.
Optionally, steps S102 to S110 may be performed by processing the massive video images and extracting features from each video image (through target detection, target tracking and feature extraction). The features include an original feature (higher-dimensional) and a binary hash feature (lower-dimensional, consisting only of the digits 0 and 1); the original features and binary hash features of the video images are then stored and clustered to construct the retrieval service model.
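To illustrate the relationship between the two feature types, a real-valued original feature can be compressed into a binary hash by thresholding each component. This is a simplified sketch with an assumed threshold of zero; in the patent the mapping is learned by the network's hash layer:

```python
def binarize(feature, threshold=0.0):
    """Map each real-valued component to 0 or 1 by thresholding (assumed scheme)."""
    return [1 if x > threshold else 0 for x in feature]

original_feature = [0.8, -1.2, 0.05, -0.3, 2.1]  # toy "second feature" (original)
hash_feature = binarize(original_feature)        # compact "first feature" (binary hash)
```

The short binary code supports cheap Hamming comparisons, while the longer original feature is kept for precise distance computation later.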
Optionally, in the case that the user inputs a single picture as the target search picture, the step S112 may be executed to pre-process the single picture input by the user, remove information irrelevant to the target area image in the picture, and extract the target area image alone.
Optionally, the first preset model may include two sub-models, which are a target detection sub-model based on deep learning and a target tracking sub-model based on deep learning, respectively; the second preset model may be a deep learning based target feature extraction model.
Optionally, fig. 2 is a flowchart of another optional video retrieval method according to an embodiment of the present application, as shown in fig. 2, in step S114, retrieving a target area image according to a retrieval model, where obtaining a retrieval result includes:
step S202, obtaining a third feature and a fourth feature of the target area image, wherein the third feature is a binarized hash feature of the target area image, and the fourth feature is an original feature of the target area image;
step S204, calculating the Hamming distance between the third feature and the first feature of each first target video image to obtain at least one second target video image;
step S206, calculating the Euclidean distance between the fourth feature and the second feature of each second target video image in at least one second target video image to obtain a target image frame, wherein the similarity between the target image frame and the target retrieval image is larger than a preset similarity threshold;
step S208, obtaining the frame ID of the target image frame;
step S210, searching video images corresponding to the frame ID in the plurality of video images to obtain a search result.
Optionally, step S202 is performed, whereby the higher-dimensional original feature and the lower-dimensional binary hash feature of the target area image may be obtained.
Optionally, step S204 is performed, whereby the Hamming distance between the binarized feature of the user input image and the binarized features of the massive video data may be calculated, narrowing the search range and yielding a reduced-range subset of the massive video data features. The Hamming distance characterizes the similarity between features: the greater the Hamming distance, the lower the similarity. For example, suppose the massive database contains hundreds of thousands of video images and the user inputs a picture of a Husky; after computing the Hamming distances, the range may be narrowed to perhaps ten thousand video images, all of which may contain dogs.
Optionally, executing steps S206 to S210 may calculate the Euclidean distance between the original feature of the user input image and the original features of the narrowed-down massive video data, so as to obtain the top N image frames with the highest similarity to the input image. Related information, such as the corresponding video identifier and the frame number where the image is located, is then looked up in the massive video data according to the image frame ID, finally yielding the video retrieval result. Continuing the example above, by calculating Euclidean distances, one thousand video images containing only Huskies can be obtained from the ten thousand video images containing dogs. Thus, calculating the Hamming distance and then the Euclidean distance progressively narrows the search range.
Optionally, based on the above, the position of the corresponding bucket is first obtained from the binary hash feature of the target retrieval picture via the normal-distribution-based bucketing, the corresponding set of binary vectors is fetched from redis according to the bucket label, and the binary hash features with the highest similarity are obtained through Hamming distance comparison and sorting, completing the preliminary retrieval. A further precise search can then be performed by calculating Euclidean distances over the original features of the target retrieval picture. Finally, the top N image frames with the highest similarity are obtained by comparison and sorting, and the corresponding information, such as the video identifier and the frame number of the image, is looked up according to the image frame ID to produce the video retrieval result. Here N is set to 10, i.e. the retrieval returns the 10 most similar video sequences.
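The coarse-to-fine retrieval of steps S202 to S210 can be sketched as follows: a Hamming filter on the binary hashes narrows the candidate set, and a Euclidean re-ranking on the original features then selects the top-N frames. The Hamming radius and the record layout are illustrative assumptions:

```python
def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance between two real-valued feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def search(query_hash, query_feat, records, hamming_radius=1, top_n=10):
    # Stage 1: cheap Hamming filtering on the binary hashes narrows the range.
    candidates = [r for r in records
                  if hamming(query_hash, r["hash"]) <= hamming_radius]
    # Stage 2: exact Euclidean re-ranking on the original features.
    candidates.sort(key=lambda r: euclidean(query_feat, r["feat"]))
    return [r["frame_id"] for r in candidates[:top_n]]
```

A record here is a dict with hypothetical `frame_id`, `hash` and `feat` keys; the returned frame IDs would then be mapped back to video identifiers and frame numbers as in step S210.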
Optionally, after step S108 is completed, that is, after the feature extraction processing has been performed on all target image sequences of each first target video image according to the second preset model, the method may further include:
step S10, at least one first target video image, a target image sequence, a first feature and a second feature are stored in a database in a structured manner. The database can be a Mongodb database or a Poseidon database, and can be used as a search database, and when video image search is carried out, the target characteristics are required to be compared with the data in the database, so that a search result is obtained.
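A structured per-target record of the kind described above might look like the following; the field names are illustrative assumptions, not the patent's actual schema, but a document store such as MongoDB would accept such a record directly:

```python
import json

record = {
    "video_id": "cam01",                 # source video the target came from
    "frame_id": 1024,                    # frame in which the target appears
    "sequence": ["t0.jpg", "t1.jpg"],    # target image sequence from tracking
    "feat": [0.8, -1.2, 0.05],           # original feature (high-dimensional in practice)
    "hash": [1, 0, 1],                   # binary hash feature
}
serialized = json.dumps(record)          # ready for insertion into a document store
```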
Optionally, the preset approximate nearest neighbor algorithm is a locality-sensitive hashing algorithm. Specifically, the structured information of the video files is clustered based on an ANN (Approximate Nearest Neighbor) algorithm. Bucketing is performed on the binary hashes based on a normal-distribution partition, and the bucketed binary vector data is stored in the in-memory store redis, thereby constructing the retrieval service.
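Locality-sensitive hashing can be sketched with the random-hyperplane scheme: vectors that fall on the same side of every hyperplane share a bucket key, and the bucket map below stands in for the redis store. The hyperplane construction and all parameters are illustrative assumptions, since the patent does not fully specify its bucketing:

```python
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals (one per output bit)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_planes)]

def bucket_key(vec, planes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

planes = make_hyperplanes(num_planes=4, dim=3)
buckets = {}  # stand-in for the redis store of per-bucket binary vector sets
for vec_id, vec in enumerate([[1.0, 2.0, 0.5], [2.0, 4.0, 1.0], [-3.0, 0.1, -2.0]]):
    buckets.setdefault(bucket_key(vec, planes), []).append(vec_id)
```

At query time, only the vectors in the query's own bucket need Hamming comparison, which is what makes the preliminary retrieval fast.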
Optionally, performing step S104, that is, preprocessing the plurality of video images to obtain at least one first target video image includes:
step S20, sequentially performing a length normalization process and a decoding process on each of the plurality of video images, to obtain a first target video image.
Specifically, length normalization cuts the continuous video stream into fixed-length segments, which facilitates later analysis and storage. During decoding, the video file may be decoded with OpenCV and a size-scaling normalization operation performed on each frame. Scaling uses a bilinear interpolation algorithm, and the scaled size is 1920×1080.
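Bilinear interpolation, mentioned above for frame scaling, computes each output pixel as a weighted average of the four nearest input pixels. A minimal grayscale sketch follows; the actual pipeline would scale full-color frames to 1920×1080 with OpenCV rather than this pure-Python illustration:

```python
def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at fractional coordinates (x, y)."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def resize(img, out_w, out_h):
    """Resize by bilinear interpolation, mapping output pixels to input coordinates."""
    in_h, in_w = len(img), len(img[0])
    return [[bilinear_sample(img, x * (in_w - 1) / max(out_w - 1, 1),
                             y * (in_h - 1) / max(out_h - 1, 1))
             for x in range(out_w)] for y in range(out_h)]
```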
Optionally, the method may further include: step S30, training the first preset model and the second preset model according to a random gradient descent algorithm until the first preset model and the second preset model reach a convergence state.
Specifically, the first preset model may be trained as follows. First, the image data set and its corresponding category label information are divided into two parts, one serving as a training sample set and the other as a test sample set; each sample in both sets comprises an image and its corresponding category label. Two sub-models of the first preset model are then constructed: a deep-learning-based target detection sub-model using the classical YOLO architecture, and a deep-learning-based target tracking sub-model using an RNN architecture. Finally, the target detection and target tracking sub-models are trained on the training sample set by stochastic gradient descent (SGD), with the learning rate set to 0.01.
Specifically, the second preset model may be trained as follows. First, the image data set and its corresponding category label information are divided into a training sample set and a test sample set, each sample comprising an image and its corresponding category label. A deep convolutional neural network architecture is then constructed, comprising a convolutional sub-network, a hash layer and a loss layer: the convolutional sub-network (a VGG architecture) learns the original features of an image; the hash layer compresses and reduces the dimensionality of the original features and converts them into binary codes, yielding the binary hash features of the input image; and the loss layer measures the Softmax classification error. The original feature has 4096 dimensions and the binarized hash feature has 128 dimensions. Finally, the second preset model is trained on the training sample set by stochastic gradient descent (SGD) to obtain the deep-learning-based target feature extraction model, with the learning rate set to 0.01.
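The SGD update used to train both preset models can be shown on a toy objective. The learning rate of 0.01 matches the text; the quadratic loss below is an illustrative stand-in for the networks' real losses:

```python
def sgd_step(params, grads, lr=0.01):
    """One stochastic gradient descent update with the step size from the text."""
    return [p - lr * g for p, g in zip(params, grads)]

# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(500):
    w = sgd_step(w, [2 * (w[0] - 3)])
```

Each step moves the parameter against the gradient, so `w` converges toward the minimizer at 3; training the real models repeats the same update over mini-batch gradients until convergence.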
In the embodiment of the present application, the following approach is adopted: acquiring a target retrieval picture and a plurality of video images; preprocessing the plurality of video images to obtain at least one first target video image; performing target detection processing and target tracking processing on the at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image; performing feature extraction processing on all target image sequences of each first target video image according to a second preset model to obtain a first feature and a second feature of each first target video image, where the first feature is the binarized hash feature and the second feature is the original feature of the first target video image; clustering the first features and the second features according to a preset approximate nearest neighbor algorithm to obtain a retrieval model; performing matting processing on the target retrieval picture to obtain a target area image; and retrieving the target area image according to the retrieval model to obtain a retrieval result. This improves the retrieval precision and retrieval efficiency of video, reduces retrieval time cost and labor cost, and thereby solves the technical problem of low video retrieval precision and low retrieval efficiency in the prior art.
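The approximate-nearest-neighbor clustering step can be sketched as locality-sensitive hash bucketing: frames whose binary codes share a hash key land in the same bucket, so a query is only compared against its own bucket rather than the whole database. The 4-bit prefix key and the toy 8-bit codes below are illustrative assumptions, not the patent's actual bucketing function.

```python
from collections import defaultdict

# Sketch of building the retrieval model by bucketing binarized hash
# features. Here the bucket key is simply the first 4 bits of the code;
# a real LSH scheme would use one or more learned or random hash keys.

def bucket_key(code, prefix_bits=4):
    return tuple(code[:prefix_bits])

def build_index(frames):
    """frames: {frame_id: binary code} -> {bucket key: [frame ids]}."""
    index = defaultdict(list)
    for frame_id, code in frames.items():
        index[bucket_key(code)].append(frame_id)
    return index

index = build_index({
    "f1": [0, 1, 1, 0, 1, 0, 0, 1],
    "f2": [0, 1, 1, 0, 0, 1, 1, 0],   # same 4-bit prefix as f1
    "f3": [1, 0, 0, 1, 1, 1, 0, 0],
})
candidates = index[bucket_key([0, 1, 1, 0, 1, 1, 1, 1])]  # query code
```

Only the candidates in the matching bucket then proceed to the distance comparisons described in the retrieval steps.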
Example 2
According to another aspect of the embodiment of the present application, there is also provided a video retrieval apparatus, as shown in fig. 3, including: an acquisition unit 301, a first processing unit 303, a second processing unit 305, a third processing unit 307, a fourth processing unit 309, a fifth processing unit 311, and a retrieval unit 313.
The acquiring unit 301 is configured to acquire a target retrieval picture and a plurality of video images; the first processing unit 303 is configured to preprocess the plurality of video images to obtain at least one first target video image; the second processing unit 305 is configured to perform target detection processing and target tracking processing on the at least one first target video image according to a first preset model, so as to obtain all target image sequences of each first target video image; the third processing unit 307 is configured to perform feature extraction processing on all target image sequences of each first target video image according to a second preset model, so as to obtain a first feature and a second feature of each first target video image, where the first feature is the binarized hash feature and the second feature is the original feature of the first target video image; the fourth processing unit 309 is configured to cluster the first features and the second features according to a preset approximate nearest neighbor algorithm to obtain a retrieval model; the fifth processing unit 311 is configured to perform matting processing on the target retrieval picture to obtain a target area image; and the retrieval unit 313 is configured to retrieve the target area image according to the retrieval model to obtain a retrieval result.
Alternatively, as shown in fig. 4, the retrieving unit 313 may include: a first acquisition subunit 401, a first calculation subunit 403, a second calculation subunit 405, a second acquisition subunit 407, and a retrieval subunit 409.
The first obtaining subunit 401 is configured to obtain a third feature and a fourth feature of the target area image, where the third feature is a binary hash feature of the target area image, and the fourth feature is an original feature of the target area image; a first calculating subunit 403, configured to calculate a hamming distance between the third feature and the first feature of each first target video image, so as to obtain at least one second target video image; a second calculating subunit 405, configured to calculate a euclidean distance between the fourth feature and the second feature of each of the at least one second target video image, to obtain a target image frame, where a similarity between the target image frame and the target search image is greater than a preset similarity threshold; a second acquisition subunit 407 for acquiring a frame ID of the target image frame; the retrieving subunit 409 is configured to retrieve, from among the plurality of video images, a video image corresponding to the frame ID, and obtain a retrieval result.
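The two-stage retrieval performed by these subunits can be sketched as a coarse Hamming-distance pass over the binary codes followed by a Euclidean re-ranking over the original features. The toy codes, features, and the Hamming radius of 1 are illustrative assumptions; the patent's features are 128-bit codes and 4096-dimensional vectors.

```python
# Sketch of the two-stage retrieval: Hamming distance on binary hash
# codes selects candidates (subunit 403), then Euclidean distance on
# original features re-ranks them (subunit 405).

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

database = {
    "f1": {"code": [0, 1, 1, 0], "feat": [0.9, 0.1, 0.0]},
    "f2": {"code": [0, 1, 0, 0], "feat": [0.2, 0.8, 0.1]},
    "f3": {"code": [1, 0, 0, 1], "feat": [0.0, 0.1, 0.9]},
}
query = {"code": [0, 1, 1, 0], "feat": [0.85, 0.15, 0.05]}

# Stage 1: keep frames whose Hamming distance is within a small radius.
coarse = [fid for fid, e in database.items()
          if hamming(query["code"], e["code"]) <= 1]
# Stage 2: re-rank candidates by Euclidean distance on original features.
result = sorted(coarse,
                key=lambda fid: euclidean(query["feat"],
                                          database[fid]["feat"]))
```

The frame IDs at the head of `result` correspond to the target image frames whose video images are then looked up by frame ID to produce the retrieval result.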
Example 3
According to still another aspect of the embodiments of the present application, there is further provided a storage medium comprising a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to perform the video retrieval method of Embodiment 1 of the present application.
According to still another aspect of the embodiments of the present application, there is further provided a processor configured to run a program, wherein the program, when run, performs the video retrieval method of Embodiment 1 of the present application.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for portions not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of the units is merely a logical function division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the couplings, direct couplings, or communication connections shown or discussed may be implemented through certain interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (9)

1. A video retrieval method, characterized in that, first, the position of the corresponding sub-bucket is obtained through a standard front distribution icon according to the binarized hash feature of a target retrieval picture; a corresponding binary vector set is obtained from redis according to the sub-bucket identifier; and the binarized hash features with high similarity are obtained through Hamming distance comparison and sorting, completing the preliminary retrieval; finally, the first N image frames with the highest similarity are obtained through comparison and sorting, and the corresponding video identifier and frame number information of each image are looked up according to the image frame ID, thereby obtaining the video retrieval result; wherein N is set to 10, that is, the first 10 video sequences with the highest similarity are retrieved and returned;
the method specifically comprises the following steps:
acquiring a target retrieval picture and a plurality of video images;
preprocessing the plurality of video images to obtain at least one first target video image;
performing target detection processing and target tracking processing on the at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image in the at least one first target video image;
performing feature extraction processing on all target image sequences of each first target video image according to a second preset model to obtain first features and second features of each first target video image, wherein the first features are binary hash features of the first target video images, and the second features are original features of the first target video images;
clustering the first features and the second features according to a preset approximate nearest neighbor algorithm to obtain a retrieval model;
carrying out matting processing on the target retrieval image to obtain a target area image;
searching the target area image according to the search model to obtain a search result;
the target area image is searched according to the search model, and the search result is obtained comprises the following steps:
acquiring third and fourth features of the target area image, wherein the third feature is a binary hash feature of the target area image, and the fourth feature is an original feature of the target area image;
calculating a hamming distance between the third feature and the first feature of each first target video image to obtain at least one second target video image;
calculating the Euclidean distance between the fourth feature and the second feature of each second target video image in the at least one second target video image to obtain a target image frame, wherein the similarity between the target image frame and the target retrieval image is larger than a preset similarity threshold;
acquiring a frame ID of the target image frame;
and searching the video images corresponding to the frame IDs in the plurality of video images to obtain the search result.
2. The method according to claim 1, wherein after performing feature extraction processing on all target image sequences of each of the first target video images according to a second preset model, the method further comprises:
the at least one first target video image, the sequence of target images, the first feature and the second feature are stored in a database.
3. The method of claim 1, wherein the preset approximate nearest neighbor algorithm is a locality-sensitive hashing algorithm.
4. The method of claim 1, wherein preprocessing the plurality of video images to obtain at least one first target video image comprises:
and sequentially performing length normalization processing and decoding processing on each video image in the plurality of video images to obtain the first target video image.
5. The method according to claim 1, wherein the method further comprises:
training the first preset model and the second preset model according to a random gradient descent algorithm until the first preset model and the second preset model reach a convergence state.
6. A video retrieval device for performing the video retrieval method according to claim 1, characterized by comprising: an acquisition unit configured to acquire a target retrieval picture and a plurality of video images;
the first processing unit is used for preprocessing the plurality of video images to obtain at least one first target video image;
the second processing unit is used for carrying out target detection processing and target tracking processing on the at least one first target video image according to a first preset model to obtain all target image sequences of each first target video image in the at least one first target video image;
the third processing unit is used for carrying out feature extraction processing on all target image sequences of each first target video image according to a second preset model to obtain first features and second features of each first target video image, wherein the first features are binarized hash features of the first target video image, and the second features are original features of the first target video image;
the fourth processing unit is used for carrying out clustering processing on the first features and the second features according to a preset approximate nearest neighbor algorithm to obtain a retrieval model;
a fifth processing unit, configured to perform matting processing on the target search image to obtain a target area image;
and the retrieval unit is used for retrieving the target area image according to the retrieval model to obtain a retrieval result.
7. The apparatus of claim 6, wherein the retrieval unit comprises:
a first obtaining subunit, configured to obtain a third feature and a fourth feature of the target area image, where the third feature is a binary hash feature of the target area image, and the fourth feature is an original feature of the target area image;
a first computing subunit, configured to calculate a hamming distance between the third feature and the first feature of each of the first target video images, to obtain at least one second target video image;
a second calculating subunit, configured to calculate a euclidean distance between the fourth feature and the second feature of each of the at least one second target video image, to obtain a target image frame, where a similarity between the target image frame and the target search image is greater than a preset similarity threshold;
a second acquisition subunit configured to acquire a frame ID of the target image frame;
and the searching subunit is used for searching the video images corresponding to the frame ID in the plurality of video images to obtain the searching result.
8. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the video retrieval method of any one of claims 1 to 5.
9. A processor for running a program, wherein the program when run performs the video retrieval method of any one of claims 1 to 5.
CN201710351135.6A 2017-05-18 2017-05-18 Video retrieval method, device, storage medium and processor Active CN107169106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710351135.6A CN107169106B (en) 2017-05-18 2017-05-18 Video retrieval method, device, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710351135.6A CN107169106B (en) 2017-05-18 2017-05-18 Video retrieval method, device, storage medium and processor

Publications (2)

Publication Number Publication Date
CN107169106A CN107169106A (en) 2017-09-15
CN107169106B true CN107169106B (en) 2023-08-18

Family

ID=59816651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710351135.6A Active CN107169106B (en) 2017-05-18 2017-05-18 Video retrieval method, device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN107169106B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107705259A (en) * 2017-09-24 2018-02-16 合肥麟图信息科技有限公司 A kind of data enhancement methods and device under mobile terminal preview, screening-mode
CN107844753A (en) * 2017-10-20 2018-03-27 珠海习悦信息技术有限公司 Pedestrian in video image recognition methods, device, storage medium and processor again
CN109697451B (en) * 2017-10-23 2022-01-07 北京京东尚科信息技术有限公司 Similar image clustering method and device, storage medium and electronic equipment
CN108573032A (en) * 2018-03-27 2018-09-25 麒麟合盛网络技术股份有限公司 Video recommendation method and device
CN109086866B (en) * 2018-07-02 2021-07-30 重庆大学 Partial binary convolution method suitable for embedded equipment
CN108932509A (en) * 2018-08-16 2018-12-04 新智数字科技有限公司 A kind of across scene objects search methods and device based on video tracking
CN110929058B (en) * 2018-08-30 2023-01-31 北京蓝灯鱼智能科技有限公司 Trademark picture retrieval method and device, storage medium and electronic device
CN110162665B (en) * 2018-12-28 2023-06-16 腾讯科技(深圳)有限公司 Video searching method, computer device and storage medium
CN109871763B (en) * 2019-01-16 2020-11-06 清华大学 Specific target tracking method based on YOLO
CN113255828B (en) * 2021-06-17 2021-10-15 长沙海信智能系统研究院有限公司 Feature retrieval method, device, equipment and computer storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2412471A1 (en) * 2002-12-17 2004-06-17 Concordia University A framework and a system for semantic content extraction in video sequences
CN103150375A (en) * 2013-03-11 2013-06-12 浙江捷尚视觉科技有限公司 Quick video retrieval system and quick video retrieval method for video detection
CN104182959A (en) * 2013-05-22 2014-12-03 浙江大华技术股份有限公司 Target searching method and target searching device
CN105808732A (en) * 2016-03-10 2016-07-27 北京大学 Integration target attribute identification and precise retrieval method based on depth measurement learning
CN106033426A (en) * 2015-03-11 2016-10-19 中国科学院西安光学精密机械研究所 A latent semantic min-Hash-based image retrieval method
CN106227851A (en) * 2016-07-29 2016-12-14 汤平 Based on the image search method searched for by depth of seam division that degree of depth convolutional neural networks is end-to-end
CN106407352A (en) * 2016-09-06 2017-02-15 广东顺德中山大学卡内基梅隆大学国际联合研究院 Traffic image retrieval method based on depth learning
CN106682233A (en) * 2017-01-16 2017-05-17 华侨大学 Method for Hash image retrieval based on deep learning and local feature fusion


Also Published As

Publication number Publication date
CN107169106A (en) 2017-09-15

Similar Documents

Publication Publication Date Title
CN107169106B (en) Video retrieval method, device, storage medium and processor
TWI623842B (en) Image search and method and device for acquiring image text information
CN104679818B (en) A kind of video key frame extracting method and system
US9373040B2 (en) Image matching using motion manifolds
CN106599226A (en) Content recommendation method and content recommendation system
CN107818307B (en) Multi-label video event detection method based on LSTM network
CN102165464A (en) Method and system for automated annotation of persons in video content
CN103150375A (en) Quick video retrieval system and quick video retrieval method for video detection
CN104239420A (en) Video fingerprinting-based video similarity matching method
CN102890700A (en) Method for retrieving similar video clips based on sports competition videos
CN107103615A (en) A kind of monitor video target lock-on tracing system and track lock method
CN101872415A (en) Video copying detection method being suitable for IPTV
CN107229710A (en) A kind of video analysis method accorded with based on local feature description
CN105589974A (en) Surveillance video retrieval method and system based on Hadoop platform
CN112258254B (en) Internet advertisement risk monitoring method and system based on big data architecture
CN104317946A (en) Multi-key image-based image content retrieval method
Parihar et al. Multiview video summarization using video partitioning and clustering
CN103187083B (en) A kind of storage means based on time domain video fusion and system thereof
CN109241315B (en) Rapid face retrieval method based on deep learning
Lv et al. Efficient large scale near-duplicate video detection base on spark
Hezel et al. Video search with sub-image keyword transfer using existing image archives
Tseytlin et al. Content based video retrieval system for distorted video queries
Hu et al. STRNN: End-to-end deep learning framework for video partial copy detection
CN113010731B (en) Multimodal video retrieval system
CN112069331A (en) Data processing method, data retrieval method, data processing device, data retrieval device, data processing equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant