CN106610987B - Video image retrieval method, device and system

Info

Publication number: CN106610987B
Application number: CN201510694753.1A
Authority: CN
Inventors: 郭斌; 蔡巍伟
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2015-10-22
Filing date: 2015-10-22
Publication date: 2020-02-21
Anticipated expiration: 2035-10-22
Also published as: CN106610987A

Abstract

The invention discloses a video image retrieval method, device and system. The method comprises: acquiring color statistical data of a target picture, wherein the color statistical data of the target picture is data representing the color of a first feature in the target picture; parsing the encapsulation layer of a pre-stored video frame to obtain color statistical data of the video frame, wherein the video frame is obtained by encapsulating the color statistical data of the video frame together with video coding data; matching the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame whose matching result meets a preset condition; and performing image retrieval on the target video frame according to the first feature. The invention solves the technical problem in the prior art that image retrieval is inefficient because images must be decoded and analyzed on the fly during retrieval.

Description

Video image retrieval method, device and system

Technical Field

The invention relates to the field of image processing and analysis, in particular to a video image retrieval method, device and system.

Background

With the rapid development of image processing technology, the amount of video data is increasing, and the difficulty of image retrieval from massive video data is increased. In the prior art, temporary decoding and analysis of the image are required in the image retrieval process, but the image decoding and analysis consume a large amount of processing time, which seriously reduces the efficiency of the image retrieval. In addition, image retrieval in the prior art excessively depends on the efficiency and accuracy of target object detection, but in a video monitoring scene, due to the existence of factors such as different distances, angles, background interference and the like, the accuracy and reliability of target object detection can be seriously reduced, and meanwhile, the problems of missed detection and false detection of a target object can also seriously affect the accuracy and reliability of target object detection.

Aiming at the problem of low image retrieval efficiency caused by the temporary decoding and analysis of images in the image retrieval process in the prior art, no effective solution is provided at present.

Disclosure of Invention

The embodiment of the invention provides a video image retrieval method, a video image retrieval device and a video image retrieval system, which are used for at least solving the technical problem of low image retrieval efficiency caused by the need of temporarily decoding and analyzing an image in the image retrieval process in the prior art.

According to an aspect of an embodiment of the present invention, there is provided a video image processing method, including: acquiring color statistical data of a target picture, wherein the color statistical data of the target picture is data representing the color of a first characteristic in the target picture; analyzing a pre-stored packaging layer of a video frame to obtain color statistical data of the video frame, wherein the video frame is obtained after packaging the color statistical data of the video frame and video coding data; matching the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame of which the matching result meets a preset condition; and performing image retrieval on the target video frame according to the first characteristic.

Further, the color statistical data of the target picture includes color information of the first characteristic and a pixel number corresponding to the color information of the first characteristic, wherein the acquiring the color statistical data of the target picture includes: acquiring color information of the first characteristic; acquiring the number of pixels of a video frame; calculating the percentage of the size of the first feature in the target picture; and calculating the number of pixels corresponding to the color information of the first feature, wherein the number of pixels corresponding to the color information of the first feature is the product of the number of pixels of the video frame and the percentage.

Further, the color statistical data of the video frame includes color information of each image feature in the video frame and a pixel number corresponding to the color information of each image feature, wherein matching the color statistical data of the target picture with the color statistical data of the video frame includes: matching the color information of the first characteristic with the color information of each image characteristic respectively; and matching the number of pixels corresponding to the color information of the first feature with the number of pixels corresponding to the color information of each image feature.

Further, the image retrieval of the target video frame according to the first feature comprises: decoding the video coding data of the target video frame to obtain a decoded video image; judging whether the decoded video image contains a first feature or not; and if the decoded video image is judged to contain the first feature, determining the decoded video image as the retrieval result of the target picture.

According to another aspect of an embodiment of the present invention, there is provided a video image processing system including: the video camera is used for acquiring color statistical data and video coding data of a video image and packaging the color statistical data and the video coding data into a video frame; the storage server is used for acquiring color statistical data of the target picture; analyzing the packaging layer of the video frame to obtain color statistical data of the video frame; matching the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame of which the matching result meets a preset condition; and carrying out image retrieval on the target video frame according to the first characteristic, wherein the color statistical data of the target picture is data representing the color of the first characteristic in the target picture, and the video frame is obtained by packaging the color statistical data of the video frame and the video coding data.

Further, the camera is configured to encapsulate the color statistics and the video encoding data into video frames in the following manner: establishing a corresponding relation between color statistical data and video coding data; and packaging the color statistical data and the video coding data according to the corresponding relation to obtain a video frame.

Further, the camera is used for acquiring the color statistical data of the video image according to the following modes: acquiring color information of image characteristics in a video image; and classifying colors of each pixel point in the video image, and counting the number of pixels corresponding to the color information of the image characteristics.

Further, the camera is used for classifying colors of each pixel point in the video image according to the following modes, and counting the number of pixels corresponding to color information of image features: acquiring a color value of a first pixel in a video image; judging whether the color value of the first pixel is matched with the color value corresponding to the color information of the image characteristic; and if the color value of the first pixel is judged to be matched with the color value corresponding to the color information of the image characteristic, classifying the color of the first pixel into the color information of the image characteristic, and accumulating the number of pixels corresponding to the color information of the image characteristic.

According to another aspect of the embodiments of the present invention, there is also provided a video image processing apparatus, including: a first acquisition module, configured to acquire color statistical data of a target picture, wherein the color statistical data of the target picture is data representing the color of a first feature in the target picture; an analysis module, configured to parse the encapsulation layer of a pre-stored video frame to obtain color statistical data of the video frame, wherein the video frame is obtained by encapsulating the color statistical data of the video frame together with video coding data; a second acquisition module, configured to match the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame whose matching result meets a preset condition; and a retrieval module, configured to perform image retrieval on the target video frame according to the first feature.

Further, the color statistical data of the target picture includes color information of the first feature and the number of pixels corresponding to the color information of the first feature, wherein the first obtaining module includes: the first obtaining submodule is used for obtaining color information of the first characteristic; the second obtaining submodule is used for obtaining the pixel number of the video frame; the first calculation module is used for calculating the percentage of the size of the first feature in the target picture; and the second calculation module is used for calculating the number of pixels corresponding to the color information of the first characteristic, wherein the number of pixels corresponding to the color information of the first characteristic is the product of the number of pixels of the video frame and the percentage.

In the embodiment of the invention, the color statistical data of the video image is stored in the storage server in the video acquisition process, when the video image is retrieved, the target video frame matched with the color statistical data of the target image is firstly screened out according to the color statistical data of the target image, then the video coding data of the target video frame is decoded to obtain the decoded video image, and then the video image containing the first characteristic in the target image is retrieved from the decoded video image, so that the aim of not decoding and analyzing all the video images in the video image retrieval process is achieved, the technical effect of improving the video image retrieval efficiency is realized, and the technical problem of low image retrieval efficiency caused by the fact that the images are temporarily decoded and analyzed in the image retrieval process in the prior art is solved.
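The two-stage flow described above (screen on color statistics first, decode only the surviving candidates) can be sketched as follows. This is a minimal illustration, not the patented implementation: `StoredFrame`, `retrieve`, and the `decode` and `contains_feature` callables are hypothetical names, and the screening rule used here (every target entry must appear unchanged in the frame's statistics) is only one possible reading of the "preset condition".

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class StoredFrame(NamedTuple):
    """Hypothetical stored frame: statistics readable without decoding."""
    color_stats: Dict[str, int]  # color name -> pixel count
    encoded: bytes               # video coding data, still compressed


def retrieve(
    frames: Iterable[StoredFrame],
    target_stats: Dict[str, int],
    decode: Callable[[bytes], object],
    contains_feature: Callable[[object], bool],
) -> List[object]:
    """Return decoded images that pass both the color screen and the feature check."""
    results = []
    for frame in frames:
        # Stage 1: cheap screen on color statistics only (no decoding).
        if any(frame.color_stats.get(c) != n for c, n in target_stats.items()):
            continue
        # Stage 2: decode only the candidate frames and look for the feature.
        image = decode(frame.encoded)
        if contains_feature(image):
            results.append(image)
    return results
```

The point of the design is that `decode` is called only for frames that survive stage 1, so the cost of decoding scales with the number of candidates rather than with the size of the archive.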

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of a video image retrieval system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the processing flow of data information of a video image in a digital camera according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the processing flow of data information of a video image in an analog camera according to an embodiment of the present invention;

FIG. 4 is a flowchart of a video image retrieval method according to an embodiment of the present invention; and

FIG. 5 is a schematic diagram of a video image retrieval apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an embodiment of the present invention, there is provided a video image retrieval system, and fig. 1 is a schematic diagram of a video image retrieval system according to an embodiment of the present invention, as shown in fig. 1, the system includes:

the video camera 10 is used for acquiring color statistical data and video coded data of a video image, and packaging the color statistical data and the video coded data into a video frame; and

the storage server 20 is used for acquiring color statistical data of the target picture; analyzing the packaging layer of the video frame to obtain color statistical data of the video frame; matching the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame of which the matching result meets a preset condition; and carrying out image retrieval on the target video frame according to the first characteristic, wherein the color statistical data of the target picture is data representing the color of the first characteristic in the target picture, and the video frame is obtained by packaging the color statistical data of the video frame and the video coding data.

Optionally, the system may further include a client 30, configured to input a target picture through the client, where the target picture is a picture that a user wants to retrieve according to actual needs, the target picture may be input by the user using a dedicated client, and the client may be in a hardware form or a software form. The retrieval source of the target picture is composed of a video image captured by a camera 10 (such as a digital camera or an analog camera), i.e., the target picture is retrieved from the video image captured by the camera 10. The camera 10 transmits the data information of the acquired video image (including the color statistics data and the video encoding data of the video image) to the storage server 20 for storage, and the storage server 20 interacts with the client 30 through a network (a wired network or a wireless network). As shown in fig. 1, the camera 10 processes the acquired video image, and sends data information of the processed video image to the storage server 20 for storage, the client 30 sends the target picture to the storage server 20, the storage server 20 performs retrieval on the target picture, and sends a retrieval result of the target picture (including the retrieved picture and a video identifier where the retrieved picture is located) to the client 30. Alternatively, when the storage server 20 meets the requirement of running software, the storage server 20 and the client 30 may be deployed on the same hardware device to reduce the consumption of network bandwidth for data transmission between the client 30 and the storage server 20.

The storage server 20 is used for storing data information of video images, and the data information of video images of one or more videos can be stored in the storage server 20, wherein different videos can be distinguished and represented by different video identifiers. Optionally, the video corresponding to any one of the video identifiers in the storage server 20 may include a plurality of video frames, data information of each frame of video image includes color statistical data and video coded data of the frame of video image, and the color statistical data and the video coded data are stored in the storage server 20 in a packaged format, that is, packaged data of a plurality of video frames are stored in the storage server 20, where the packaged data of the video frames are data obtained by packaging the color statistical data and the video coded data of the video frames.

Optionally, the camera 10 is configured to encapsulate the color statistical data and the video encoding data of the acquired video image into a video frame according to the following manner, and send the encapsulated video frame to the storage server 20: establishing a corresponding relation between color statistical data and video coding data; and the camera encapsulates the color statistical data and the video coding data according to the corresponding relation to obtain a video frame.

Optionally, the color statistics of the video image include: the color information of the image features in the video image and the number of pixels corresponding to the color information of the image features. The camera 10 is configured to obtain color statistics of a video image according to the following manner: the camera 10 acquires color information of image features in a video image; and the camera 10 classifies the color of each pixel point in the video image, and counts the number of pixels corresponding to the color information of the image characteristics. In particular, the camera 10 is configured to perform the following steps:

Step S1: Acquire color information of image features in the video image. When the camera captures video images, it performs color discrimination and pixel-count statistics on each frame of video image, where the types and number of colors depend on the color-space calculation applied to the pixels; the video image retrieval method does not depend on the particular color discrimination algorithm. Optionally, the color information of the image features may be obtained during video acquisition, through color discrimination and pixel-count statistics. It may also be obtained by decoding the video stored in the storage server 20 and then performing color analysis on each video frame of the decoded video, where the color analysis includes color discrimination and pixel-count statistics.

Step S2: Count the number of pixels corresponding to the color information of the image features in the video image by classifying the color of each pixel in the video image. Assuming that the color discrimination algorithm of this embodiment can distinguish seven colors, namely red, orange, yellow, green, cyan, blue and violet, all pixels of each frame of video image are classified by color so as to count the number of pixels corresponding to each color.

Optionally, the camera 10 is configured to classify colors of each pixel point in the video image according to the following method, and count the number of pixels corresponding to the color information of the image feature: the camera 10 acquires a color value of a first pixel in a video image; the camera 10 determines whether the color value of the first pixel matches the color value corresponding to the color information of the image feature; and if the camera 10 judges that the color value of the first pixel is matched with the color value corresponding to the color information of the image characteristic, classifying the color of the first pixel into the color information of the image characteristic, and accumulating the number of pixels corresponding to the color information of the image characteristic. That is, the camera 10 color-classifies each pixel point in the video image by performing the following steps:

Step S21: Acquire the color value of a first pixel in the video frame, where the first pixel is any pixel in the video frame image. After capturing a video image, the camera 10 first obtains the YUV data of the video image; a capture device (such as a video camera) generally outputs the video image in NV12 format. The color space of the video image is then converted from YUV to RGB. Assuming the YUV value of a pixel of the video image is (y, u, v), the conversion is given by the following formulas:

r = y + 1.13983 * (v - 128)

g = y - 0.39465 * (u - 128) - 0.58060 * (v - 128)

b = y + 2.03211 * (u - 128)
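As a minimal sketch, the three formulas above translate directly into code. The function name `yuv_to_rgb` and the rounding and clamping to the 0 to 255 range are assumptions added for illustration; the source gives only the linear formulas.

```python
def yuv_to_rgb(y: float, u: float, v: float) -> tuple:
    """Convert one YUV pixel (8-bit, chroma centered at 128) to 8-bit RGB."""
    r = y + 1.13983 * (v - 128)
    g = y - 0.39465 * (u - 128) - 0.58060 * (v - 128)
    b = y + 2.03211 * (u - 128)

    # Assumption: round and clamp to the valid 8-bit range.
    def clamp(x: float) -> int:
        return max(0, min(255, int(round(x))))

    return clamp(r), clamp(g), clamp(b)
```

With neutral chroma (u = v = 128) the chroma terms vanish, so a gray pixel keeps r = g = b = y.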

Then the video image is converted from the RGB space to the HSV space. Assuming the RGB value of a pixel of the video image is (r, g, b), the conversion is as follows:

max = max(r, g, b)

min = min(r, g, b)

v = max
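The text above retains only the max, min and v = max terms of the RGB to HSV conversion. As a hedged illustration, the conventional conversion from Python's standard library can stand in for it; it likewise takes v to be the largest channel. The function name `rgb_to_hsv_deg`, the normalization of the channels to [0, 1], and the use of degrees for hue are assumptions and may differ from the scaling the patent intended.

```python
import colorsys


def rgb_to_hsv_deg(r: int, g: int, b: int) -> tuple:
    """Convert 8-bit RGB to (hue in degrees, saturation in [0, 1], value in [0, 1])."""
    # colorsys works on floats in [0, 1] and returns hue as a fraction of a turn.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v
```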

Step S22: Judge whether the color value of the first pixel matches the color value corresponding to the color information of the image feature. The color value corresponding to the color information of the image feature is stored in a preset scene color cluster table. The preset scene color cluster table is a color cluster table established in advance for the monitored scene according to actual needs, so different scenes may use different tables. The table may contain one or more colors and may be expressed, for example, as:

color_tab[6] = (h_i, s_i, v_i), i ∈ [0:6]

where (h_i, s_i, v_i) are the HSV values of the center points of the 7 colors "red, orange, yellow, green, cyan, blue, violet" in this scene mode.

Step S23: if the color value of the first pixel is judged to be matched with the color value corresponding to the color information of the image characteristic, classifying the color of the first pixel into the color information of the image characteristic, and accumulating the number of pixels corresponding to the color information of the image characteristic. For example, the color value (h, v, s) of the first pixel of the video image is associated with the color center point (h) corresponding to a color in the preset scene color cluster table_i,s_i,v_i),i∈[0:6]And matching, and classifying the color of the first pixel with the color when the matching is successful.

The video image retrieval system of the embodiment considers the situation that the image has color cast under different monitoring scenes (such as color temperature, illumination and the like), and if the same color template is adopted for color statistics, the accuracy of the statistical result is greatly influenced. The video image retrieval system of the embodiment reduces the problem of image retrieval accuracy reduction caused by image color deviation by using the pre-established preset scene color clustering table, and achieves the effect of improving the video image retrieval accuracy.

When the color value of the first pixel is judged to match the color value corresponding to the color information of the image feature, the color of the first pixel is classified as that color and, at the same time, the color counter c_i, i ∈ [0:6], corresponding to the color information of the image feature is incremented by one. The number of pixels corresponding to the color information of the image feature is calculated as shown in the following formula, where T is an empirical value, generally the image noise variance:

c_i += 1, if D((h, s, v), (h_i, s_i, v_i)) < T
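The counting rule above can be sketched as follows. The source does not define the distance D or give the table values, so everything concrete here is an assumption made for illustration: the seven center points in `COLOR_TAB`, the Euclidean distance with a wrapped and normalized hue term in `distance`, and the default threshold of 0.1.

```python
import math

# Hypothetical scene color cluster table: (name, (hue in degrees, s, v)).
COLOR_TAB = [
    ("red", (0.0, 1.0, 1.0)),
    ("orange", (30.0, 1.0, 1.0)),
    ("yellow", (60.0, 1.0, 1.0)),
    ("green", (120.0, 1.0, 1.0)),
    ("cyan", (180.0, 1.0, 1.0)),
    ("blue", (240.0, 1.0, 1.0)),
    ("violet", (270.0, 1.0, 1.0)),
]


def distance(p, q):
    """Assumed D: Euclidean distance in HSV with hue wrapped and scaled to [0, 1]."""
    dh = abs(p[0] - q[0]) % 360.0
    dh = min(dh, 360.0 - dh) / 180.0
    return math.sqrt(dh ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2)


def count_colors(pixels_hsv, color_tab=COLOR_TAB, threshold=0.1):
    """Apply c_i += 1 whenever D(pixel, center_i) < T, for every pixel and center."""
    counts = [0] * len(color_tab)
    for pixel in pixels_hsv:
        for i, (_, center) in enumerate(color_tab):
            if distance(pixel, center) < threshold:
                counts[i] += 1
    return counts
```

A pixel that is not within T of any center (for example a gray pixel) is simply not counted, which follows from the formula as written.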

The above method for acquiring the color statistical data of a video image applies to the video images corresponding to all video frames in the video corresponding to any video identifier. By computing statistics over the global color features of the image, the video image retrieval system of this embodiment avoids the low retrieval reliability that results from relying only on target object detection, and thereby improves the reliability of video image retrieval.

Optionally, after acquiring the color statistical data of the video image, the camera 10 encodes the video image to obtain video coding data. The camera 10 establishes a correspondence between the color statistical data of the video image and the video coding data, so that they correspond one to one, and encapsulates the color statistical data and the video coding data according to that correspondence to obtain the video frame corresponding to the video image, that is, the encapsulated data of the video frame. When encapsulating, this embodiment uses a proprietary encapsulation method permitted by the encapsulation format standard.

The camera 10 can acquire all video frames in the video corresponding to any video identifier according to the method. It should be noted that, for the analog camera and the digital camera, the execution subjects of the video image acquisition, the color statistic calculation, the encoding, and the packaging process are slightly different. Fig. 2 is a schematic view of a processing flow of data information of a video image in a digital camera according to an embodiment of the present invention, and as shown in fig. 2, for the digital camera, the processes of capturing the video image, calculating color statistics, encoding, and packaging are all performed inside the digital camera. Fig. 3 is a schematic diagram of a processing flow of data information of video images in an analog camera according to an embodiment of the present invention, and as shown in fig. 3, for the analog camera, the analog camera itself only completes a process of acquiring video images, and a video encoder completes processes of calculating color statistics, encoding and packaging.

It should be noted that, the data information of the video image in the storage server 20 has no difference to conventional real-time preview, transmission, storage, and the like, and the difference is that the video encoded data of each frame of video image has color statistical data corresponding to the frame of video image, and the reading of the color statistical data of the frame of video image does not need to decode the video encoded data of the frame of video image, and can be obtained only by analyzing the encapsulation layer, so that the effect of improving the video image retrieval efficiency is achieved.
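To illustrate why the statistics can be read without decoding, the sketch below packs per-color pixel counts in front of the encoded payload and reads them back by parsing only the leading bytes. The byte layout is entirely hypothetical (one count byte, then little-endian 32-bit counters, then a 32-bit payload length and the payload); the source states only that a proprietary encapsulation permitted by the container standard is used.

```python
import struct
from typing import List


def pack_frame(color_counts: List[int], encoded: bytes) -> bytes:
    """Encapsulate color statistics and video coding data in one frame (hypothetical layout)."""
    n = len(color_counts)
    header = struct.pack(f"<B{n}I", n, *color_counts)
    return header + struct.pack("<I", len(encoded)) + encoded


def read_color_stats(frame: bytes) -> List[int]:
    """Read the color counters from the encapsulation layer; the payload is never touched."""
    n = frame[0]
    return list(struct.unpack_from(f"<{n}I", frame, 1))
```

`read_color_stats` inspects only the first 1 + 4n bytes, so its cost is independent of the size and codec of the encoded payload.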

Optionally, the storage server 20 supports the functions of adding, modifying and deleting the data information of the video images, and the data information of the video images in the storage server 20 may be pre-stored, or may be added, modified and deleted in real time. After the data information of the video image is stored in the storage server 20, the target picture is retrieved from the storage server 20, and color analysis needs to be performed on the target picture sent by the client 30, wherein a calculation method of the color analysis is the same as a calculation method of the color statistical data of the video image, and is not described here again.

The color statistical data of the target picture comprises color information of the first characteristic and the number of pixels corresponding to the color information of the first characteristic. For example, if the target picture is a picture including a red car, the first feature is the car, and the color information of the first feature is red. Optionally, the obtaining of the color statistic data of the target picture includes: acquiring color information of the first characteristic; and acquiring the number of pixels corresponding to the color information of the first characteristic.

Since the target picture is likely to be inconsistent with the video image to be retrieved in the storage server 20 in terms of size ratio and image pixels, the number of pixels corresponding to the target color of the target picture and the number of pixels of the video image to be retrieved in the storage server 20 need to be adjusted correspondingly according to the actual ratio to obtain the more exact number of pixels corresponding to the target color. Optionally, the storage server 20 obtains the number of pixels corresponding to the color information of the first feature by: acquiring the number of pixels of a video frame stored in advance in the storage server 20; calculating the percentage of the size of the first feature in the target picture; and calculating the number of pixels corresponding to the color information of the first feature, wherein the number of pixels corresponding to the color information of the first feature is the product of the number of pixels of the video frame in the storage server 20 and the percentage.

For example, assume that the target picture contains a red car, so the first feature is the car and its color information is red; the car picture itself has 2 million pixels, and the video frame in the storage server 20 also has 2 million pixels. The number of pixels corresponding to the color information of the first feature must be converted to the video scene: assuming the car occupies one percent of the target picture, the number of pixels corresponding to the color information of the first feature is 2,000,000 × 1% = 20,000 pixels, that is, the color statistical data of the target picture is: red, 20,000 pixels.
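The conversion in this example is a single multiplication: the pixel count of the stored video frame times the fraction of the target picture occupied by the first feature. A minimal sketch, with `expected_pixel_count` as a hypothetical helper name and rounding to a whole pixel as an added assumption:

```python
def expected_pixel_count(frame_pixels: int, feature_fraction: float) -> int:
    """Pixels the feature's color should occupy in a stored frame of frame_pixels pixels."""
    return round(frame_pixels * feature_fraction)


# The red-car example: a 2,000,000-pixel frame and a feature covering 1% of the picture.
red_pixels = expected_pixel_count(2_000_000, 0.01)
```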

Alternatively, matching the size of the first feature in the target picture with the size of the video image in the storage server 20 may be performed in two ways: one is a manual matching method; the other is an image algorithm matching method. The manual matching method refers to that a user views a real-time video stored in the storage server 20, and when a target object (such as a first feature, a car) appears in the real-time video, a picture is captured. And calculating the size proportion of the target object in the intercepted picture by using a grid method for the intercepted picture so as to obtain the approximate pixel value of the target object in the video scene. The image algorithm matching means that a user selects a video record containing various moving objects and classifies the moving objects through a human-vehicle object classification algorithm, wherein the human-vehicle object classification algorithm for the video in the machine vision field is mature, and is not described in detail here. The result of the algorithm for classifying the human, vehicle and object includes the normalized sizes of various targets of the human, vehicle and object, and the approximate pixel value of the target object in the video scene can be obtained by combining the pixel value of the video image in the storage server 20.

After acquiring the color statistical data of the target picture, the storage server 20 acquires, from the pre-stored video frames, a target video frame whose color statistical data matches the color statistical data of the target picture. It should be noted that the preset condition to be satisfied by the matching result is preferably that the color statistical data of the target picture is a subset of the color statistical data of the target video frame; that is, whenever this subset relation holds, the two sets of color statistical data are considered to match. Assume that the color statistical data of the target picture is: red, 20,000 pixels. As long as the color statistical data of the target video frame includes "red, 20,000 pixels", for example: red, 20,000 pixels; blue, 20,000 pixels, the color statistical data of the target picture matches that of the target video frame. When a target picture is retrieved from the storage server, the encapsulation layer of the video image must be parsed in order to read the color statistical data of the video image.

Optionally, the storage server 20 may match the color statistical data of the target picture with the color statistical data of the video frame by performing the following steps, to obtain the target video frame whose matching result meets the preset condition:

Step S4: Acquire the color statistical data of a first video frame, where the plurality of video frames stored in the storage server 20 include the first video frame, and the first video frame may be any video frame of the video corresponding to any video identifier. Acquiring the color statistical data of the first video frame includes: acquiring color information of a first color in the first video frame; and acquiring the number of pixels corresponding to the first color. The first color is the color of a certain feature in the first video frame image.

Step S5: Determine whether the color statistical data of the first video frame matches the color statistical data of the target picture. Specifically: determine whether the color information of the first color matches the color information of the target color; and determine whether the number of pixels corresponding to the first color matches the number of pixels corresponding to the target color. If the image of the first video frame does not contain the target color, or contains the target color but with far fewer pixels than the target color has in the target picture, the first video frame is skipped directly and the next video frame image in the storage server is examined.

Step S6: If the color statistical data of the first video frame is determined to match the color statistical data of the target picture, determine the first video frame as the target video frame. That is, if the first video frame image contains the target color, and the number of pixels corresponding to the target color in the first video frame image is consistent with the number of pixels corresponding to the target color in the target picture, the first video frame is determined as the target video frame.
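Steps S4 to S6 amount to a cheap per-frame test on the stored statistics. The sketch below is one possible reading, not the patented implementation: the patent does not quantify "far less" or "consistent", so the parameter `min_ratio` is a hypothetical tolerance, and color statistics are assumed to be a mapping from a color name to a pixel count.

```python
def frame_matches(target_stats: dict, frame_stats: dict, min_ratio: float = 0.5) -> bool:
    """Return True if the frame should be kept as a candidate target video frame.

    target_stats / frame_stats: color name -> pixel count.
    min_ratio: assumed tolerance; a frame whose count for a target color is
    below min_ratio * target count is treated as having "far fewer" pixels.
    """
    for color, wanted in target_stats.items():
        found = frame_stats.get(color, 0)
        if found == 0 or found < min_ratio * wanted:
            return False  # Step S5 fails: skip this frame
    return True  # Step S6: keep the frame as a target video frame
```

A lower bound alone is used here because a frame may legitimately contain other objects of the same color; a stricter reading of "consistent" would add an upper bound as well.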

Optionally, after the storage server 20 acquires the target video frame, the storage server 20 performs image retrieval on the target video frame according to the first feature, and specifically, the storage server 20 may perform image retrieval on the target video frame by performing the following steps:

Step S7: Decode the video encoded data of the target video frame to obtain a decoded video image.

After acquiring the target video frame, the storage server 20 decodes the video encoded data of the target video frame to obtain a decoded video image. Using the color statistical data of the target picture, the storage server 20 of this embodiment screens out, from the pre-stored video frames, the target video frames whose color statistical data matches, filters out the video frames whose color statistical data obviously does not match, and decodes only the video encoded data of the target video frames. This avoids unnecessary decoding, overcomes the low image retrieval efficiency caused in the prior art by decoding all video frame images, and thereby saves a large amount of computing resources and improves video image retrieval efficiency.

In step S8, it is determined whether the decoded video image includes the first feature.

In step S9, if it is determined that the decoded video image contains the first feature, the decoded video image is determined as a search result of the target picture.

Take as an example retrieving a target picture containing a red car from any video (the first feature is the car, the color information of the first feature is red, and the number of pixels corresponding to that color information is 20,000). The color statistical data describing each frame of image, that is, the private data recording the color distribution of each frame of video data, is read from the encapsulation layer of the video image. If the frame contains no red, or the number of red pixels is far less than 20,000, the frame is skipped directly. Otherwise, the conventional image search algorithm is executed: the video encoded data of the frame is decoded to obtain a decoded video image, and it is then determined whether the decoded video image contains the car; if so, the frame is taken as a retrieval result of the target picture. The retrieval result may include a result picture and the video corresponding to the result picture.
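The walk-through above can be condensed into a filter-then-decode loop. This is a sketch under assumptions: `decode` and `contains_feature` are placeholder callables standing in for a real video decoder and a real image search algorithm, `min_ratio` is a hypothetical tolerance for "far less", and each stored frame is modeled as a pair of color statistics and encoded bytes.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Stats = Dict[str, int]
Frame = Tuple[Stats, bytes]  # (color statistics, encoded video data)


def retrieve(
    frames: Iterable[Frame],
    target_stats: Stats,
    decode: Callable[[bytes], object],
    contains_feature: Callable[[object], bool],
    min_ratio: float = 0.5,
) -> List[object]:
    """Decode only the frames whose color statistics pass the pre-filter."""
    results = []
    for stats, encoded in frames:
        # Skip frames lacking a target color or holding far too few such pixels.
        if any(stats.get(c, 0) < max(1, min_ratio * n) for c, n in target_stats.items()):
            continue
        image = decode(encoded)        # Step S7: decode the candidate frame
        if contains_feature(image):    # Step S8: run the image search
            results.append(image)      # Step S9: record the retrieval result
    return results
```

The saving comes from the `continue` branch: frames rejected there never reach `decode`, which is the expensive step.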

Optionally, the storage server 20 is further configured to send the retrieval result to the client 30 after obtaining the retrieval result of the target picture, thereby completing a full search-by-image operation.

The video image retrieval system of the embodiment screens out, from the pre-stored video frames, the target video frames that match the color statistical data of the target picture, decodes the video encoded data of the target video frames to obtain decoded video images, and then retrieves from the decoded video images the video image containing the first feature of the target picture. The system thereby solves the technical problem of low image retrieval efficiency caused in the prior art by having to decode and analyze images on the fly during retrieval, and achieves the technical effect of improving video image retrieval efficiency. Meanwhile, the system stores the color statistical data of all video images in the storage server during video acquisition. By counting the global color features of the video images, it effectively avoids the low reliability caused by relying only on target object detection; by using the pre-established preset scene color clustering table, it reduces the loss of accuracy caused by image color deviation. It thereby achieves the technical effect of improving the accuracy and reliability of video image retrieval.

According to an embodiment of the present invention, a video image retrieval method is also provided. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one given here.

The video image retrieval method of the embodiment is executed by the storage server provided by the embodiment of the invention. Fig. 4 is a flowchart of a video image retrieval method according to an embodiment of the present invention, as shown in fig. 4, the method including the steps of:

step S402, acquiring color statistical data of a target picture, wherein the color statistical data of the target picture is data representing the color of a first characteristic in the target picture;

step S404, analyzing a pre-stored packaging layer of a video frame to obtain color statistical data of the video frame, wherein the video frame is obtained after packaging the color statistical data of the video frame and video coding data;

step S406, matching the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame of which the matching result meets the preset condition; and

step S408, image retrieval is carried out on the target video frame according to the first characteristic.

The above steps are a retrieval process of a target picture, where the retrieval process may be executed by a storage server in the video image retrieval system according to the embodiment of the present invention, and the target picture may be input by a client in the video image retrieval system according to the embodiment of the present invention, and a retrieval source of the target picture may be acquired by a camera in the video image retrieval system according to the embodiment of the present invention.

The storage server can store data information of video images of one or more videos, wherein different videos can be distinguished and represented by different video identifiers. Optionally, the video corresponding to any one of the video identifiers in the storage server may include a plurality of video frames, data information of each frame of video image includes color statistical data and video coded data of the frame of video image, and the color statistical data and the video coded data are stored in the storage server in a packaged format, that is, packaged data of a plurality of video frames are stored in the storage server, where the packaged data of the video frames are data obtained by packaging the color statistical data and the video coded data of the video frames.

The video frames of the video corresponding to any video identifier in the storage server are sent by the camera. Optionally, the camera encapsulating the collected color statistical data and video encoded data of a video image into a video frame and sending the encapsulated video frame to the storage server includes: the camera establishes a correspondence between the color statistical data and the video encoded data; and the camera encapsulates the color statistical data and the video encoded data according to the correspondence to obtain the video frame.

Optionally, the color statistics of the video image include: the color information of the image features in the video image and the number of pixels corresponding to the color information of the image features. The method for acquiring the color statistical data of the video image by the camera comprises the following steps:

Step S1: Acquire the color information of the image features in the video image. When the camera collects video images, it performs color discrimination and pixel counting on each frame of video image. The types and number of colors depend on the color space calculation applied to the pixels; the video image retrieval method does not depend on any particular color discrimination method. Optionally, the color information of the image features may be acquired during video acquisition, through color discrimination and pixel counting. It may also be acquired by decoding the video stored in the storage server and then performing color analysis on each video frame of the decoded video, where the color analysis comprises color discrimination and pixel counting.

Optionally, the camera classifies the color of each pixel in the video image, and counting the number of pixels corresponding to the color information of the image features includes the following steps:

Step S21: Acquire the color value of a first pixel in the video frame, where the first pixel is any pixel in the video frame image. After the camera captures a video image, the YUV data of the video image is acquired first; a capture device such as a camera usually outputs video images in NV12 format. The color space of the video image is then converted from YUV to RGB. Assuming that the YUV value of a pixel of the video image is (y, u, v), the conversion is given by the following formulas:

r = y + 1.13983 * (v - 128)

g = y - 0.39465 * (u - 128) - 0.58060 * (v - 128)

b = y + 2.03211 * (u - 128)

max = max(r, g, b)

min = min(r, g, b)

v = max
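The formulas above translate directly into code. The sketch below is illustrative only: clamping the results to the range 0 to 255 is an added assumption not stated in the text, and the brightness is returned under the name `value` because the symbol v is already used for the chroma component of the YUV input. The function names are hypothetical.

```python
def yuv_to_rgb(y: float, u: float, v: float):
    """Convert one YUV pixel to RGB using the coefficients given above."""
    r = y + 1.13983 * (v - 128)
    g = y - 0.39465 * (u - 128) - 0.58060 * (v - 128)
    b = y + 2.03211 * (u - 128)
    # Assumption: clamp to the 8-bit range, since the raw result can overshoot.
    clamp = lambda x: min(255.0, max(0.0, x))
    return clamp(r), clamp(g), clamp(b)


def value_channel(r: float, g: float, b: float) -> float:
    """Brightness as defined above: the largest of the three RGB components."""
    return max(r, g, b)
```

A neutral gray input such as (128, 128, 128) has no chroma offset, so it maps to equal R, G, and B components.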

color_tab[6] = (h_i, s_i, v_i), i ∈ [0:6]

The video image retrieval method of the embodiment takes into account that images exhibit color cast in different monitoring scenes (owing to color temperature, illumination, and the like); if the same color template were used for color statistics in every scene, the accuracy of the statistical result would be greatly affected. By using the pre-established preset scene color clustering table, the method reduces the loss of retrieval accuracy caused by image color deviation and improves the accuracy of video image retrieval.

c_i += 1, if D((h, s, v), (h_i, s_i, v_i)) < T,

The above method for acquiring the color statistical data of a video image applies to the video images of all video frames in the video corresponding to any video identifier. Because the video image retrieval method of the embodiment counts the global color features of the image, it effectively avoids the low retrieval reliability caused by relying only on target object detection, and improves the reliability of video image retrieval.
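The counting rule c_i += 1 if D((h, s, v), (h_i, s_i, v_i)) < T can be sketched as follows. Several details are assumptions, since the text does not give them: D is taken to be the plain Euclidean distance (which ignores the circular nature of hue), the six table entries and the threshold are invented for illustration, and hue is expressed in degrees while saturation and value lie between 0 and 1.

```python
import math


def count_colors(pixels_hsv, color_tab, threshold):
    """Count, per table entry, the pixels lying within `threshold` of it.

    pixels_hsv: iterable of (h, s, v) tuples.
    color_tab: list of reference (h_i, s_i, v_i) tuples for the scene.
    Implements c_i += 1 whenever D(pixel, color_tab[i]) < threshold.
    """
    counts = [0] * len(color_tab)
    for pixel in pixels_hsv:
        for i, ref in enumerate(color_tab):
            if math.dist(pixel, ref) < threshold:  # assumed Euclidean D
                counts[i] += 1
    return counts


# Hypothetical six-entry scene color table.
color_tab = [(0, 1, 1), (60, 1, 1), (120, 1, 1), (180, 1, 1), (240, 1, 1), (300, 1, 1)]
pixels = [(2, 0.9, 0.95), (118, 1.0, 1.0), (30, 1.0, 1.0)]
counts = count_colors(pixels, color_tab, threshold=10)
```

Taken literally, the rule lets a pixel increment more than one counter when the threshold is large enough for table entries to overlap; assigning each pixel only to its nearest entry would be a different design.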

Optionally, after acquiring the color statistical data of the video image, the camera encodes the video image to obtain video encoded data. The camera establishes a correspondence between the color statistical data of the video image and the video encoded data, so that the two correspond one to one, and encapsulates the color statistical data and the video encoded data according to the correspondence to obtain the video frame corresponding to the video image, that is, the encapsulated data of the video frame. When encapsulating, the embodiment uses a private encapsulation method permitted by the encapsulation format standard.

After acquiring all video frames of the video corresponding to any video identifier, the camera sends the video frames to the storage server. It should be noted that the data information of the video images in the storage server behaves no differently from conventional data for real-time preview, transmission, storage, and the like. The difference is that the video encoded data of each frame of video image is accompanied by the color statistical data of that frame, and reading the color statistical data does not require decoding the video encoded data; it can be obtained merely by parsing the encapsulation layer, which improves video image retrieval efficiency.
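The idea of reading the statistics without touching the encoded payload can be illustrated with a toy container. The byte layout below (a 4-byte length, a JSON statistics block, then the encoded video data) is purely hypothetical; the patent only states that a private encapsulation method permitted by the encapsulation format standard is used, and does not specify the format.

```python
import json
import struct

_HEADER = struct.Struct(">I")  # 4-byte big-endian length of the statistics block


def encapsulate(color_stats: dict, encoded_video: bytes) -> bytes:
    """Pack the color statistics in front of the encoded video data."""
    meta = json.dumps(color_stats, sort_keys=True).encode("utf-8")
    return _HEADER.pack(len(meta)) + meta + encoded_video


def read_color_stats(packet: bytes) -> dict:
    """Parse only the statistics block; the video payload is never decoded."""
    (meta_len,) = _HEADER.unpack_from(packet, 0)
    start = _HEADER.size
    return json.loads(packet[start:start + meta_len].decode("utf-8"))


def read_encoded_video(packet: bytes) -> bytes:
    """Return the encoded video data that follows the statistics block."""
    (meta_len,) = _HEADER.unpack_from(packet, 0)
    return packet[_HEADER.size + meta_len:]
```

Because the length prefix tells the reader where the statistics end, the retrieval side can inspect the color data of every frame while leaving the video bytes untouched until a frame passes the filter.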

Optionally, the storage server supports functions of adding, modifying and deleting data information of the video images, and the data information of the video images in the storage server may be pre-stored, or may be added, modified and deleted in real time. After the data information of the video image is stored in the storage server, the target picture is retrieved from the storage server, and color analysis needs to be performed on the target picture sent by the client, wherein a calculation method of the color analysis is the same as a calculation method of the color statistical data of the video image, and is not repeated here.

Since the target picture is likely to be inconsistent with the video image to be retrieved in the storage server in terms of size ratio and image pixels, the number of pixels corresponding to the target color of the target picture and the number of pixels of the video image to be retrieved in the storage server need to be adjusted correspondingly according to the actual ratio, so as to obtain the more exact number of pixels corresponding to the target color. Optionally, the obtaining, by the storage server, the number of pixels corresponding to the color information of the first feature includes: acquiring the pixel number of a video frame prestored in a storage server; calculating the percentage of the size of the first feature in the target picture; and calculating the number of pixels corresponding to the color information of the first feature, wherein the number of pixels corresponding to the color information of the first feature is the product of the number of pixels of the video frame in the storage server and the percentage.

For example, suppose that the target picture contains a red car, so that the first feature is the car and the color information of the first feature is red. The car picture itself has 2 million pixels, and the video frames in the storage server also have 2 million pixels. The number of pixels corresponding to the color information of the first feature in the target picture must be converted to the video scene. Assuming that the car occupies one percent of the target picture, the number of pixels corresponding to the color information of the first feature is 20,000; that is, the color statistical data of the target picture is: red, 20,000 pixels.

Optionally, the size of the first feature in the target picture may be matched to the size of the video image in the storage server in two ways: manual matching, or image algorithm matching. In manual matching, a user views the real-time video stored in the storage server and captures a picture when a target object (such as the first feature, a car) appears in the video. The size proportion of the target object in the captured picture is then calculated with a grid method, which gives the approximate pixel count of the target object in the video scene. In image algorithm matching, a user selects a video recording that contains various moving objects, and the moving objects are classified by a person/vehicle/object classification algorithm. Such classification algorithms for video are mature in the machine vision field and are not described in detail here. The result of the classification algorithm includes the normalized sizes of the various person, vehicle, and object targets; combining these with the pixel count of the video image in the storage server gives the approximate pixel count of the target object in the video scene.

After the storage server obtains the color statistical data of the target picture, step S406 is executed; that is, a target video frame whose color statistical data matches the color statistical data of the target picture is obtained from the pre-stored video frames. It should be noted that the preset condition to be satisfied by the matching result is preferably that the color statistical data of the target picture is a subset of the color statistical data of the target video frame; that is, whenever this subset relation holds, the two sets of color statistical data are considered to match. Assume that the color statistical data of the target picture is: red, 20,000 pixels. As long as the color statistical data of the target video frame includes "red, 20,000 pixels", for example: red, 20,000 pixels; blue, 20,000 pixels, the color statistical data of the target picture matches that of the target video frame. When a target picture is retrieved from the storage server, the encapsulation layer of the video image must be parsed in order to read the color statistical data of the video image.
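Read literally, the preferred preset condition is a subset test on (color, pixel count) pairs. A minimal sketch, assuming the color statistics are held as a mapping from color name to pixel count; the function name `is_stats_subset` is hypothetical.

```python
def is_stats_subset(target_stats: dict, frame_stats: dict) -> bool:
    """True if every (color, count) pair of the target appears in the frame."""
    # dict.items() views are set-like, so <= performs a subset comparison.
    return target_stats.items() <= frame_stats.items()


target = {"red": 20000}
frame = {"red": 20000, "blue": 20000}
matched = is_stats_subset(target, frame)
```

An exact count match is strict; since the text elsewhere speaks of skipping only frames with far fewer pixels, a practical system would presumably compare counts with some tolerance rather than for equality.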

Optionally, the color statistical data of the video frame includes color information of each image feature in the video frame and a pixel number corresponding to the color information of each image feature, and matching the color statistical data of the target picture with the color statistical data of the video frame includes: matching the color information of the first feature with the color information of each image feature; and matching the number of pixels corresponding to the color information of the first feature with the number of pixels corresponding to the color information of each image feature. Specifically, the step S406 of matching, by the storage server, the color statistical data of the target picture with the color statistical data of the video frame to obtain the target video frame of which the matching result meets the preset condition may include the following steps:

Step S4: Acquire the color statistical data of a first video frame, where the plurality of video frames stored in the storage server include the first video frame, and the first video frame may be any video frame of the video corresponding to any video identifier. Acquiring the color statistical data of the first video frame includes: acquiring color information of a first color in the first video frame; and acquiring the number of pixels corresponding to the first color. The first color is the color of a certain feature in the first video frame image.

Step S5: Determine whether the color statistical data of the first video frame matches the color statistical data of the target picture. Specifically: determine whether the color information of the first color matches the color information of a target color, where the target color is the color of the first feature in the target picture; and determine whether the number of pixels corresponding to the first color matches the number of pixels corresponding to the target color. If the image of the first video frame does not contain the target color, or contains the target color but with far fewer pixels than the target color has in the target picture, the first video frame is skipped directly and the next video frame image in the storage server is examined.

Optionally, the step S408 of performing image retrieval on the target video frame according to the first feature includes the following steps:

After the target video frame is obtained, the storage server decodes the video encoded data of the target video frame to obtain a decoded video image. Using the color statistical data of the target picture, the storage server of the embodiment screens out, from the pre-stored video frames, the target video frames whose color statistical data matches, filters out the video frames whose color statistical data obviously does not match, and decodes only the video encoded data of the target video frames. This avoids unnecessary decoding, overcomes the low image retrieval efficiency caused in the prior art by decoding all video frame images, and thereby saves a large amount of computing resources and improves video image retrieval efficiency.

Optionally, after obtaining the retrieval result of the target picture, the storage server sends the retrieval result to the client, thereby completing a complete operation of searching the picture by using the picture.

The video image retrieval method of the embodiment screens out a target video frame matched with the color statistical data of the target picture from pre-stored video frames according to the color statistical data of the target picture, then decodes the video coding data of the target video frame to obtain a decoded video image, and then retrieves a video image containing the first characteristic in the target picture from the decoded video image. The video image retrieval method of the embodiment solves the technical problem of low image retrieval efficiency caused by the fact that images need to be temporarily decoded and analyzed in the image retrieval process in the prior art, and achieves the technical effect of improving the video image retrieval efficiency.

According to an embodiment of the present invention, there is also provided a video image retrieval device, and it should be noted that the video image retrieval device in this embodiment may be used to execute the video image retrieval method in this embodiment of the present invention, and the video image retrieval method in this embodiment may also be executed in the video image retrieval device in this embodiment.

Fig. 5 is a schematic diagram of a video image retrieval apparatus according to an embodiment of the present invention, as shown in fig. 5, the video image retrieval apparatus including: a first acquisition module 52, a parsing module 54, a second acquisition module 56, and a retrieval module 58.

The first obtaining module 52 is configured to obtain color statistics data of the target picture, where the color statistics data of the target picture is data representing a color of a first feature in the target picture.

Optionally, the color statistical data of the target picture includes color information of the first feature and a pixel number corresponding to the color information of the first feature, where the first obtaining module 52 includes: the first obtaining submodule is used for obtaining color information of the first characteristic; the second obtaining submodule is used for obtaining the pixel number of the video frame; the first calculation module is used for calculating the percentage of the size of the first feature in the target picture; and the second calculation module is used for calculating the number of pixels corresponding to the color information of the first characteristic, wherein the number of pixels corresponding to the color information of the first characteristic is the product of the number of pixels of the video frame and the percentage.

The parsing module 54 is configured to parse a pre-stored encapsulation layer of a video frame to obtain color statistical data of the video frame, where the video frame is obtained by encapsulating the color statistical data of the video frame and video encoding data.

The second obtaining module 56 is configured to match the color statistical data of the target picture with the color statistical data of the video frame, and obtain a target video frame whose matching result meets a preset condition.

Optionally, the color statistical data of the video frame includes color information of each image feature in the video frame and a pixel number corresponding to the color information of each image feature, where the second obtaining module 56 includes: the first matching module is used for matching the color information of the first characteristic with the color information of each image characteristic; and the second matching module is used for matching the number of pixels corresponding to the color information of the first characteristic with the number of pixels corresponding to the color information of each image characteristic.

The retrieval module 58 is configured to perform image retrieval on the target video frame according to the first feature.

Optionally, the retrieval module 58 includes: a decoding module, configured to decode the video encoded data of the target video frame to obtain a decoded video image; a judging module, configured to determine whether the decoded video image contains the first feature; and a determining module, configured to determine the decoded video image as the retrieval result of the target picture when the decoded video image is determined to contain the first feature.

In the video image retrieval device of this embodiment, the first obtaining module 52 may be configured to perform step S402 of the video image retrieval method of the embodiment of the present invention, the parsing module 54 may be configured to perform step S404, the second obtaining module 56 may be configured to perform step S406, and the retrieval module 58 may be configured to perform step S408. The video image retrieval device of the embodiment solves the technical problem of low image retrieval efficiency caused in the prior art by having to decode and analyze images on the fly during retrieval, and achieves the technical effect of improving video image retrieval efficiency.

Through the analysis, the invention achieves the following technical effects:

Firstly, the color statistical data of the video images is stored in the storage server in advance. During video image retrieval, the target video frames that match the color statistical data of the target picture are screened out from the storage server, and the image search algorithm is then run on those target video frames to obtain the retrieval result of the target picture.

Secondly, the invention effectively avoids the problem of low retrieval reliability caused by retrieving only the target object by counting the global color information of the video image.

Thirdly, the invention adopts the pre-established preset scene color cluster table to analyze the number of pixels corresponding to the target color, thereby effectively reducing the problem of low retrieval accuracy rate caused by image color deviation.

The method can be used for searching the video images, particularly can effectively improve the searching efficiency of the massive video images, can effectively reduce the manual searching cost and shorten the positioning time of key events and targets in the massive video images.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for retrieving a video image, comprising:

acquiring color statistical data of a target picture, wherein the color statistical data of the target picture is data representing a color of a first feature in the target picture;

analyzing a pre-stored packaging layer of a video frame to obtain color statistical data of the video frame, wherein the video frame comprises the color statistical data and video coding data of the video frame;

matching the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame of which the matching result meets a preset condition; and

carrying out image retrieval on the target video frame according to the first feature.

2. The video image retrieval method according to claim 1, wherein the color statistical data of the target picture includes color information of the first feature and a pixel number corresponding to the color information of the first feature, and wherein acquiring the color statistical data of the target picture includes:

acquiring the color information of the first feature;

acquiring the number of pixels of the video frame;

calculating the percentage of the size of the first feature in the target picture; and

calculating the number of pixels corresponding to the color information of the first feature, wherein the number of pixels corresponding to the color information of the first feature is the product of the number of pixels of the video frame and the percentage.
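As a worked illustration of the product recited in claim 2, the sketch below computes the pixel number from a frame resolution and a size fraction. The function name, the use of a 0-to-1 fraction in place of a percentage, and the rounding to a whole pixel count are assumptions made for the example.

```python
def target_pixel_count(frame_width: int, frame_height: int,
                       feature_fraction: float) -> int:
    """Pixel number for the first feature: frame pixel count times size fraction.

    `feature_fraction` is the share of the target picture occupied by the
    feature, expressed as a value between 0 and 1 (for example 0.05 for 5%).
    """
    if not 0.0 <= feature_fraction <= 1.0:
        raise ValueError("feature_fraction must be between 0 and 1")
    return round(frame_width * frame_height * feature_fraction)
```

For a 1920 x 1080 video frame (2,073,600 pixels) and a feature covering 5% of the target picture, this gives 103,680 pixels.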

3. The method of claim 2, wherein the color statistics of the video frame comprise color information of each image feature in the video frame and a number of pixels corresponding to the color information of each image feature, and wherein matching the color statistics of the target picture with the color statistics of the video frame comprises:

matching the color information of the first feature with the color information of each image feature respectively; and

matching the number of pixels corresponding to the color information of the first feature with the number of pixels corresponding to the color information of each image feature.
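The two matching steps of claim 3 could look like the following sketch. The preset condition is not spelled out in this section, so the relative `tolerance` on the pixel number, the dictionary layout of the frame statistics and the function name are hypothetical.

```python
from typing import Dict, Tuple


def matches(target: Tuple[str, int], frame_stats: Dict[str, int],
            tolerance: float = 0.5) -> bool:
    """Return True if the frame has the target color with a similar pixel number.

    target:      (color information, pixel number) of the first feature
    frame_stats: color information -> pixel number for the image features of a frame
    tolerance:   assumed relative deviation allowed on the pixel number
    """
    color, wanted = target
    have = frame_stats.get(color)
    if have is None:
        # Step 1 failed: no image feature in the frame has the target color.
        return False
    # Step 2: compare the pixel numbers under the assumed tolerance.
    return abs(have - wanted) <= tolerance * wanted
```

A stricter or looser condition only changes the last comparison; the color check still runs first, so frames without the target color are rejected without looking at pixel numbers.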

4. The video image retrieval method of claim 1, wherein performing image retrieval on the target video frame according to the first feature comprises:

decoding the video coding data of the target video frame to obtain a decoded video image;

judging whether the decoded video image contains the first feature; and

if the decoded video image is judged to contain the first feature, determining the decoded video image as the retrieval result of the target picture.

5. A video image retrieval system, comprising:

the video camera is used for acquiring color statistical data and video coding data of a video image and packaging the color statistical data and the video coding data into a video frame; and

the storage server is used for acquiring color statistical data of a target picture; analyzing the packaging layer of the video frame to obtain the color statistical data of the video frame; matching the color statistical data of the target picture with the color statistical data of the video frame to obtain a target video frame of which the matching result meets a preset condition; and performing image retrieval on the target video frame according to a first feature, wherein the color statistical data of the target picture is data representing the color of the first feature in the target picture, and the video frame comprises the color statistical data and video coding data of the video frame.

6. The video image retrieval system of claim 5, wherein the camera is configured to pack the color statistics and the video encoding data into video frames in the following manner:

establishing a corresponding relation between the color statistical data and the video coding data; and

encapsulating the color statistical data and the video coding data according to the corresponding relation to obtain the video frame.
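The encapsulation of claim 6 and the later parsing of the packaging layer can be illustrated with a toy container. The byte layout used here (a 4-byte length prefix, a JSON statistics block, then the coded data) is purely an assumption for the example and is not the format used by the system described.

```python
import json
import struct
from typing import Dict

# Assumed header: 4-byte big-endian length of the statistics block.
_HEADER = struct.Struct(">I")


def pack_frame(color_stats: Dict[str, int], coded_data: bytes) -> bytes:
    """Encapsulate the statistics and the coded data into one frame."""
    stats = json.dumps(color_stats, sort_keys=True).encode("utf-8")
    return _HEADER.pack(len(stats)) + stats + coded_data


def parse_color_stats(frame: bytes) -> Dict[str, int]:
    """Read only the statistics block; the coded data is never decoded."""
    (length,) = _HEADER.unpack_from(frame, 0)
    start = _HEADER.size
    return json.loads(frame[start:start + length].decode("utf-8"))


def coded_payload(frame: bytes) -> bytes:
    """Return the coded data that follows the statistics block."""
    (length,) = _HEADER.unpack_from(frame, 0)
    return frame[_HEADER.size + length:]
```

Because the statistics sit in front of the coded data with an explicit length, a reader can extract them with a slice and a JSON parse, leaving the video payload untouched until a frame has been selected.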

7. The video image retrieval system of claim 6, wherein the camera is configured to obtain color statistics for the video image by:

acquiring color information of image features in the video image; and

classifying colors of each pixel point in the video image, and counting the number of pixels corresponding to the color information of the image features.

8. The video image retrieval system of claim 7, wherein the camera is configured to classify colors of each pixel point in the video image and count the number of pixels corresponding to the color information of the image features according to the following method:

acquiring a color value of a first pixel in the video image;

judging whether the color value of the first pixel matches the color value corresponding to the color information of the image features; and

if the color value of the first pixel is judged to match the color value corresponding to the color information of the image features, classifying the color of the first pixel into the color information of the image features, and accumulating the number of pixels corresponding to the color information of the image features.

9. A video image retrieval apparatus, comprising:

the first obtaining module is used for acquiring color statistical data of a target picture, wherein the color statistical data of the target picture is data representing a color of a first feature in the target picture;

the parsing module is used for analyzing a pre-stored packaging layer of a video frame to acquire color statistical data of the video frame, wherein the video frame comprises the color statistical data and video coding data of the video frame;

the second obtaining module is used for matching the color statistical data of the target picture with the color statistical data of the video frame to acquire a target video frame of which the matching result meets a preset condition; and

the retrieval module is used for carrying out image retrieval on the target video frame according to the first feature.

10. The video image retrieval device according to claim 9, wherein the color statistical data of the target picture includes color information of the first feature and a pixel number corresponding to the color information of the first feature, and wherein the first obtaining module includes:

the first obtaining submodule is used for obtaining the color information of the first feature;

the second obtaining submodule is used for obtaining the pixel number of the video frame;

the first calculation module is used for calculating the percentage of the size of the first feature in the target picture; and

the second calculation module is used for calculating the number of pixels corresponding to the color information of the first feature, wherein the number of pixels corresponding to the color information of the first feature is the product of the number of pixels of the video frame and the percentage.

11. A computer-readable storage medium storing at least one instruction which, when executed by a processor, performs the video image retrieval method of any one of claims 1-4.