CN109800685A - Method and device for determining an object in a video - Google Patents

Method and device for determining an object in a video

Info

Publication number
CN109800685A
Authority
CN
China
Prior art keywords
image
image information
detection
identification
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811648094.8A
Other languages
Chinese (zh)
Inventor
万一木
徐珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Yi Tu Network Technology Co Ltd
Shanghai Yitu Network Technology Co Ltd
Original Assignee
Fujian Yi Tu Network Technology Co Ltd
Shanghai Yitu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Yi Tu Network Technology Co Ltd, Shanghai Yitu Network Technology Co Ltd filed Critical Fujian Yi Tu Network Technology Co Ltd
Priority to CN201811648094.8A
Publication of CN109800685A
Legal status: Pending (current)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method and device for determining an object in a video. The method includes: obtaining a to-be-processed video shot by a monitoring device within a preset time period; if a first image is a detection frame image, detecting the type of each identification object in the first image and the detection image information corresponding to each identification object; if the first image is a prediction frame image, predicting, according to the image information corresponding to each identification object in a second image, the prediction image information corresponding to each identification object in the first image; and determining the identification image of an identification object according to the detection image information and prediction image information of the same identification object in different frame images. Because whether the detection image information and the prediction image information are images to be identified is judged first, the precision of the identification image of the object to be identified is improved, which further improves the accuracy of subsequently filing with the identification image of the object to be identified.

Description

Method and device for determining an object in a video
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for determining an object in a video.
Background
In today's society, monitoring devices are deployed in streets, communities, buildings, and other public places to meet security management requirements. When an incident occurs, an image of the suspect or the suspect vehicle is determined from the video data collected by the monitoring devices, and police officers then search for the suspect or the suspect vehicle based on that image.
In the prior art, after a monitoring device collects a video stream, every frame of image in the video stream is usually detected and recognized to determine objects such as faces or vehicles in each frame, and the objects detected in the individual frames are then matched to determine all images of a given object.
Disclosure of Invention
The embodiments of the invention provide a method and a device for determining an object in a video, so as to solve the technical problems in the prior art that every frame of image needs to be detected and recognized, the amount of computation is large, and the efficiency is low.
The embodiment of the invention provides a method for determining an object in a video, which comprises the following steps:
acquiring a to-be-processed video shot by monitoring equipment within a preset time period, wherein the to-be-processed video comprises N frames of images; N is greater than or equal to 2;
for a first image, if the first image is a detection frame image, detecting the type of each identification object in the first image and detection image information corresponding to each identification object; if the first image is a prediction frame image, predicting prediction image information corresponding to each recognition object in the first image according to image information corresponding to each recognition object in a second image; the first image is any one of the N frames of images, the second image is an adjacent image of the first image, and the image information corresponding to the identification object is determined or predicted;
and determining the identification image of the identification object according to the detection image information and the prediction image information of the same identification object in different frame images.
In the method, whether the detection image information and the prediction image information are images to be identified is judged first, and the identification image of the object to be identified is then determined from the detection image information and prediction image information that are determined to be images to be identified. This improves the accuracy of the identification image of the object to be identified, and further improves the accuracy of subsequently filing with the identification image of the object to be identified.
In one possible implementation manner, if the first image is a detection frame image, determining detection image information corresponding to each recognition object includes:
when the first image is determined to be a detection frame image, performing object detection on the first image, and determining first image information corresponding to each first object in the first image;
predicting second image information corresponding to each second object in the first image according to third image information corresponding to each second object in a third image; the third image is an adjacent image of the first image acquired by the monitoring equipment and is a predicted frame image;
when second image information of a second object and first image information of a first object meet set conditions, the second object and the first object are determined to be the same identification object, and detection image information of the same identification object is determined according to the first image information and the second image information.
In one possible implementation manner, if the first image is a detection frame image, determining the type of each identification object in the first image and detection image information corresponding to each identification object includes:
if the first image is a detection frame image, inputting the first image into a classifier model, and determining the type of each identification object in the first image and detection image information corresponding to each identification object; the types used by the classifier to distinguish include motor vehicles, non-motor vehicles and pedestrians.
In one possible implementation manner, determining an identification image of an identification object according to detection image information and prediction image information of the same identification object in different frame images includes:
selecting K pieces of image information as the identification images of the identification objects according to the detection image information and the prediction image information of the same identification object in different frame images; or,
and selecting K pieces of image information according to the detection image information and the prediction image information of the same identification object in different frame images, and generating the identification image of the identification object according to the K pieces of image information.
The embodiment of the invention provides a device for determining an object in a video, which comprises:
the device comprises an acquisition unit and a processing unit, wherein the acquisition unit is used for acquiring a video to be processed which is shot by monitoring equipment in a preset time period, and the video to be processed comprises N frames of images; N is greater than or equal to 2;
the processing unit is used for detecting the type of each identification object and detection image information corresponding to each identification object in a first image if the first image is a detection frame image; if the first image is a prediction frame image, predicting prediction image information corresponding to each recognition object in the first image according to image information corresponding to each recognition object in a second image; the first image is any one of the N frames of images, the second image is an adjacent image of the first image, and the image information corresponding to the identification object is determined or predicted;
the processing unit is further used for determining the identification image of the identification object according to the detection image information and the prediction image information of the same identification object in different frame images.
In a possible implementation manner, the processing unit is specifically configured to:
when the first image is determined to be a detection frame image, performing object detection on the first image, and determining first image information corresponding to each first object in the first image;
predicting second image information corresponding to each second object in the first image according to third image information corresponding to each second object in a third image; the third image is an adjacent image of the first image acquired by the monitoring equipment and is a predicted frame image;
when second image information of a second object and first image information of a first object meet set conditions, the second object and the first object are determined to be the same identification object, and detection image information of the same identification object is determined according to the first image information and the second image information.
In a possible implementation manner, the processing unit is specifically configured to:
if the first image is a detection frame image, inputting the first image into a classifier model, and determining the type of each identification object in the first image and detection image information corresponding to each identification object; the types used by the classifier to distinguish include motor vehicles, non-motor vehicles and pedestrians.
In a possible implementation manner, the processing unit is specifically configured to:
selecting K pieces of image information as the identification images of the identification objects according to the detection image information and the prediction image information of the same identification object in different frame images; or,
and selecting K pieces of image information according to the detection image information and the prediction image information of the same identification object in different frame images, and generating the identification image of the identification object according to the K pieces of image information.
The embodiment of the present application further provides an apparatus having the function of implementing the method for determining an object in a video described above. This function may be implemented by hardware executing corresponding software. In one possible design, the apparatus includes a processor, a transceiver, and a memory; the memory is used for storing computer-executable instructions, the transceiver is used for communication between the apparatus and other communication entities, and the processor is connected with the memory through a bus. When the apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the apparatus performs the method for determining an object in a video described above.
An embodiment of the present invention further provides a computer storage medium, where a software program is stored, and when the software program is read and executed by one or more processors, the software program implements the method for determining an object in a video described in the foregoing various possible implementations.
Embodiments of the present invention also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method for determining an object in a video described in the above-mentioned various possible implementations.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below.
FIG. 1 is a diagram of a system architecture suitable for use with an embodiment of the present invention;
fig. 2 is a schematic flowchart illustrating a method for determining an object in a video according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a method for determining an identified object in a predicted frame image according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for determining an object in a video according to an embodiment of the present invention.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings, and the specific operation methods in the method embodiments can also be applied to the apparatus embodiments.
The determination of an object in the embodiments of the present invention may be applied to image archiving. For example, when a video stream acquired by a monitoring device is used for archiving, the object frame of the same object in each frame of image in the video stream may be determined by using the method for determining an object in a video in the embodiments of the present invention, and the object frames corresponding to the object are then used as the identification image of the object for subsequent archiving.
Fig. 1 illustrates a schematic diagram of a system architecture to which an embodiment of the present invention is applicable, which includes a monitoring device 101 and a server 102. The monitoring device 101 collects a video stream in real time and sends the collected video stream to the server 102; the server 102 includes a device for determining an object in a video, acquires images from the video stream, and determines the image region corresponding to the object in each image. The monitoring device 101 is connected to the server 102 via a wireless network and is an electronic device with an image capturing function, such as a camera or a video recorder. The server 102 may be a single server, a server cluster composed of several servers, or a cloud computing center.
Based on the system architecture shown in fig. 1, fig. 2 exemplarily shows a flowchart corresponding to a method for determining an object in a video according to an embodiment of the present invention, where the flowchart of the method may be executed by a device for determining an object in a video, and the device for determining an object in a video may be the server 102 shown in fig. 1, as shown in fig. 2, the method specifically includes the following steps:
step 201, acquiring a to-be-processed video shot by a monitoring device within a preset time period.
The video to be processed may comprise N frames of images; n is greater than or equal to 2.
Further, the acquired video to be processed may consist of a second image and a first image that are output by the monitoring device in time order within the preset time period, where the second image is output earlier than the first image.
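As an illustrative sketch only (the patent does not prescribe any particular library), step 201 can be pictured as reading the N frames of a recorded clip with OpenCV; the file path and frame limit below are assumed parameters.
```python
import cv2

def load_frames(video_path, max_frames=None):
    """Read the frames of a to-be-processed video captured by the monitoring device."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break                     # end of the clip
        frames.append(frame)
        if max_frames is not None and len(frames) >= max_frames:
            break
    capture.release()
    return frames                     # the N (N >= 2) frames to be processed
```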
Step 202, for the first image, if the first image is a detection frame image, detecting the type of each identification object and the detection image information corresponding to each identification object in the first image.
Each image information corresponds to one frame of image, and the first object to be identified is any object to be identified contained in any frame of image.
Specifically, the images of the video to be processed are marked in advance: images in the video to be processed that need to be detected are marked as detection frame images, and images that need to be predicted are marked as prediction frame images. For example, suppose a video stream contains 10 frames; the first frame and the fifth frame are marked as detection frame images, and the second to fourth frames and the sixth to tenth frames are marked as prediction frame images.
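A minimal sketch of this pre-marking step, assuming 0-based frame indices; the default detection_indices follow the 10-frame example above, in which only the first and fifth frames are detection frames.
```python
def mark_frames(num_frames, detection_indices=(0, 4)):
    """Label each frame as a detection frame or a prediction frame."""
    return ["detection" if i in detection_indices else "prediction"
            for i in range(num_frames)]

# mark_frames(10) marks frames 1 and 5 (1-based) as detection frames
# and frames 2-4 and 6-10 as prediction frames, matching the example.
```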
The images in the video stream acquired by the monitoring equipment are divided into detection frame images and non-detection frame images. When the first image is acquired, whether the first image is a detection frame image is judged; if so, the objects to be identified in the first image are detected; otherwise, the objects to be identified in the first image are predicted from the objects to be identified in the second image. In this way, not every frame of image to be identified needs to be detected and identified, which reduces the amount of computation for determining the objects to be identified in the images and improves efficiency.
Further, if the first image is a detection frame image, the first image may be input into a classifier model to determine the type of each recognition object in the first image and the detection image information corresponding to each recognition object. The types distinguished by the classifier include motor vehicles, non-motor vehicles, and pedestrians. That is, the type of each recognition object is one of motor vehicle, non-motor vehicle, and pedestrian, and the detection image information corresponding to the recognition objects accordingly includes detection image information corresponding to motor vehicles, detection image information corresponding to non-motor vehicles, and detection image information corresponding to pedestrians.
Specifically, the type of each recognition object in the first image may be detected, and detection image information corresponding to each recognition object in the first image may be determined.
Further, object detection may be performed on the first image to determine the detection image region corresponding to each recognition object in the first image, and each detection image region may then be input into the classifier model to determine the type of the corresponding recognition object. An image region may be an image frame of regular shape or an image frame of irregular shape.
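The detect-then-classify flow just described can be sketched as follows. `detect_regions` and `classifier` are placeholder callables standing in for whatever detector and classifier model an implementation plugs in; the patent does not name concrete models, so this is only an assumed interface.
```python
TYPES = ("motor_vehicle", "non_motor_vehicle", "pedestrian")

def detect_and_classify(image, detect_regions, classifier):
    """Return the type and detection image region of each recognition object."""
    results = []
    for region in detect_regions(image):      # candidate regions as (x, y, w, h)
        x, y, w, h = region
        crop = image[y:y + h, x:x + w]        # cut the detection image region out
        object_type = classifier(crop)        # expected to return one of TYPES
        results.append({"type": object_type, "region": region})
    return results
```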
Optionally, for any recognition object, key points of the object are detected within the image region of the recognition object, and the image region of the recognition object in the first image is adjusted according to the key points.
Specifically, the key points of the object are key points for identifying the object, for example, the key points of the motor vehicle may include a license plate, a window, wheels, and the like, and the key points of the pedestrian may include a head, four limbs, an upper body, a lower body, and the like.
Since the key points of the object to be recognized are detected and then the image area is adjusted based on the key points, the image area is more accurate.
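One way to realize the key-point-based adjustment, sketched under the assumption that regions are axis-aligned boxes and key points are (x, y) pixel coordinates, is simply to grow the box until it covers every detected key point:
```python
def adjust_region_to_keypoints(region, keypoints, margin=0):
    """Expand an (x, y, w, h) region so that every (kx, ky) key point lies inside it."""
    x, y, w, h = region
    x0, y0, x1, y1 = x, y, x + w, y + h
    for kx, ky in keypoints:
        x0 = min(x0, kx - margin)
        y0 = min(y0, ky - margin)
        x1 = max(x1, kx + margin)
        y1 = max(y1, ky + margin)
    return (x0, y0, x1 - x0, y1 - y0)   # adjusted image region
```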
Step 203, if the first image is a prediction frame image, the prediction image information corresponding to each recognition object in the first image is predicted from the image information corresponding to each recognition object in the second image.
The first image is any one of N frames of images, the second image is an image which is adjacent to the first image and is collected by the monitoring equipment, and the image information corresponding to the identification object is determined or predicted. Specifically, the second image may be a detection frame image or a non-detection frame image.
Furthermore, the image region corresponding to each recognition object in the first image may be predicted according to the image region corresponding to each recognition object in the second image, and the image information of each object to be recognized within its predicted image region, that is, the image information corresponding to each recognition object in the first image, may then be obtained.
Specifically, for the image region corresponding to any identification object in the second image, an image region whose similarity to that region is greater than a preset threshold is determined from the first image as the predicted image region of the identification object in the first image.
For example, suppose the image frame corresponding to each recognition object in the second image is known. For the image frame corresponding to recognition object A in the second image, an image frame in the first image whose similarity to the image frame of recognition object A is greater than a preset threshold is determined as the image frame of recognition object A in the first image. For instance, suppose the object to be recognized is a motor vehicle and the vehicle frame corresponding to the motor vehicle in the second image is known; for each vehicle frame in the second image, a frame in the first image whose similarity to that vehicle frame is greater than the preset threshold is determined as the vehicle frame of the motor vehicle in the first image.
In one possible embodiment, a region with distinctive features is selected from the recognition object, that region is compared with the first image, an image region whose similarity to the region is greater than a preset threshold is determined from the first image, and that image region is then enlarged to serve as the image region of the recognition object in the first image.
For example, suppose the object to be identified is a motor vehicle and the vehicle frame corresponding to the motor vehicle in the second image is known. For that vehicle frame, regions with distinctive features, such as the wheels and windows, are first selected; these regions are compared with the first image, an image region whose similarity to them is greater than the preset threshold is determined from the first image, and that image region is then enlarged to serve as the vehicle frame corresponding to the motor vehicle in the first image.
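As one possible reading of the similarity search above, OpenCV template matching can stand in for the "similarity greater than a preset threshold" comparison; matching a distinctive sub-region and then enlarging the hit, as in the vehicle example, would work the same way on a smaller template. The threshold value is an assumption.
```python
import cv2

def predict_region(prev_frame, curr_frame, prev_region, threshold=0.7):
    """Predict where the region of a recognition object from prev_frame lies in curr_frame."""
    x, y, w, h = prev_region
    template = prev_frame[y:y + h, x:x + w]          # the object's region in the second image
    scores = cv2.matchTemplate(curr_frame, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)
    if best_score < threshold:
        return None                                   # no sufficiently similar region found
    bx, by = best_loc
    return (bx, by, w, h)                             # predicted image region in the first image
```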
Considering that there may be more than one image area with the similarity greater than the preset threshold, the embodiment of the present invention may further determine the number of image areas with the similarity greater than the preset threshold.
If there is only one image region whose similarity is greater than the preset threshold, that image region is used as the image region corresponding to the identification object in the first image.
If there is more than one image region whose similarity is greater than the preset threshold, one possible implementation is to select the image region with the highest similarity as the image region corresponding to the recognition object in the first image, which prevents one recognition object from corresponding to multiple image regions in one image.
Another possible implementation manner is that the image area closest to the position of the image area corresponding to the identification object in the second image is selected as the image area corresponding to the identification object in the first image.
Because the time interval between images acquired by the monitoring equipment is short, the same object to be identified moves only a short distance between two adjacent images. When predicting the image region of the identified object in the first image, the image region closest to the image region of the identified object in the second image can therefore be selected as the corresponding image region of the identified object in the first image, which prevents one identified object from corresponding to multiple image regions in one image.
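The two disambiguation rules above can be sketched as follows, assuming each candidate carries its region and its similarity score:
```python
def pick_candidate(candidates, prev_region, rule="closest"):
    """candidates: dicts with 'region' as (x, y, w, h) and 'score' as similarity."""
    if not candidates:
        return None
    if rule == "most_similar":                 # keep the region with the highest similarity
        return max(candidates, key=lambda c: c["score"])
    # rule == "closest": keep the region whose centre is nearest to the previous region
    px, py, pw, ph = prev_region
    prev_cx, prev_cy = px + pw / 2.0, py + ph / 2.0

    def centre_distance(candidate):
        x, y, w, h = candidate["region"]
        cx, cy = x + w / 2.0, y + h / 2.0
        return (cx - prev_cx) ** 2 + (cy - prev_cy) ** 2

    return min(candidates, key=centre_distance)
```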
Alternatively, after the corresponding image region of each recognition object in the first image is predicted, for any recognition object, the key points of the recognition object are detected within its image region, and the image region of the recognition object in the first image is adjusted according to the key points.
For example, suppose the object to be identified is a motor vehicle. After the corresponding vehicle frame of the motor vehicle in the first image is predicted, the key points within the vehicle frame are detected. If the detected key points include a front window, this indicates that the motor vehicle in the vehicle frame is seen from the front; if the front wheels are not included in the vehicle frame, the vehicle frame may be expanded downward so that the front wheels of the motor vehicle are included in the vehicle frame.
Since the key point of the recognition object is detected after the image area of the recognition object in the first image is predicted, and then the image area of the recognition object is adjusted based on the key point, the image area of the recognition object is more accurate.
After the M pieces of image information of the first object to be identified are found, the embodiment of the present invention may further judge whether the M pieces of image information (including the detection image information and the prediction image information) are identification images (such as pedestrian images, motor vehicle images, or non-motor vehicle images).
In this way, whether the detection image information and the prediction image information are images to be identified is judged first, and the identification image of the object to be identified is then determined from the detection image information and prediction image information that are determined to be images to be identified. This improves the accuracy of the identification image of the object to be identified, and further improves the accuracy of subsequently filing with the identification image of the object to be identified.
Further, considering that when the first image is a prediction frame image the image information of each recognition object is obtained by prediction, in order to improve the accuracy of the image information of each recognition object, an embodiment of the present invention provides a method for determining a recognition object in a prediction frame image; its flowchart is shown in fig. 3 and specifically includes the following steps:
step 301, when the first image is determined to be the detection frame image, performing object detection on the first image, and determining first image information corresponding to each first object in the first image.
Step 302, predicting second image information corresponding to each second object in the first image according to the third image information corresponding to each second object in the third image.
The third image is an adjacent image of the first image acquired by the monitoring device and the third image is a predicted frame image.
Step 303, when the second image information of a second object and the first image information of a first object meet the set condition, determining that the second object and the first object are the same identification object, and determining the detection image information of the same identification object according to the first image information and the second image information.
In one possible embodiment, the second object and the first object are determined to be the same object when there is an intersection between the second image region of the second object and the first image region of the first object.
For example, object detection is performed on the first image and a first image frame of recognition object A is determined, and a second image frame corresponding to each second object in the first image is predicted according to a third image frame corresponding to each second object in the second image. Suppose one of the second image frames intersects the first image frame of identification object A; the identification object in that second image frame is then determined to be identification object A.
In one possible embodiment, when the distance between the position of the second image region of the second object and the position of the first image region of the first object is smaller than a set threshold, it is determined that the second object and the first object are the same object.
According to the third image information corresponding to each second object in the second image, the second image information corresponding to each second object in the first image is predicted. When the second image information of a second object and the first image information of a first object meet the set condition, the second object and the first object are determined to be the same object, which prevents the same object from having two image regions in one image to be recognized and, at the same time, ensures the continuity of the same object across the images to be recognized.
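A minimal sketch of the two set conditions (intersection, or centre distance below a threshold), assuming axis-aligned (x, y, w, h) boxes; the distance threshold is an illustrative value.
```python
def boxes_intersect(a, b):
    """True if two (x, y, w, h) boxes overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def centres_close(a, b, max_distance):
    """True if the centres of two boxes are closer than max_distance pixels."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = (ax + aw / 2.0) - (bx + bw / 2.0)
    dy = (ay + ah / 2.0) - (by + bh / 2.0)
    return (dx * dx + dy * dy) ** 0.5 < max_distance

def is_same_object(detected_box, predicted_box, max_distance=50.0):
    """Set condition: the boxes intersect, or their centres are close enough."""
    return boxes_intersect(detected_box, predicted_box) or \
           centres_close(detected_box, predicted_box, max_distance)
```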
Optionally, after step 203 is executed, the position information of the identification object in the different frame images may also be recorded to form the track information of the identification object under the monitoring device. This makes it more convenient to track the identification object later.
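Recording the per-frame positions to form track information could look like the following sketch; the object identifiers are assumed to come from the matching step above.
```python
from collections import defaultdict

class TrackRecorder:
    """Collects, per identification object, its region in every frame it appears in."""

    def __init__(self):
        self.tracks = defaultdict(list)      # object id -> [(frame index, region), ...]

    def record(self, frame_index, object_id, region):
        self.tracks[object_id].append((frame_index, region))

    def track_of(self, object_id):
        return self.tracks[object_id]        # the object's track under the monitoring device
```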
Step 204, determining the identification image of the identification object according to the detection image information and the prediction image information of the same identification object in different frame images.
There are various ways to determine the identification image of the identification object. One possible implementation is to select K pieces of image information as the identification images of the identification object according to the detection image information and the prediction image information of the same identification object in different frame images. The K pieces of image information can be selected in various ways: they may be selected according to the quality of each piece of image information, for example, selecting image information of better quality (clear and complete images), or according to the shooting angle of each piece of image information, for example, selecting image information with a better (frontal) shooting angle; this is not specifically limited.
For example, suppose the same object to be recognized has 10 pieces of image information in 10 frames of images to be recognized; the 8 pieces of image information with better quality are selected from the 10 as the identification images of the object.
Another possible implementation manner is that K pieces of image information are selected according to the detection image information and the prediction image information of the same identification object in different frame images, and an identification image of the identification object is generated according to the K pieces of image information. Specifically, the K pieces of image information are selected in a manner similar to that described above, and will not be described in detail here.
For example, suppose the same object to be recognized has 10 pieces of image information in 10 frames of images to be recognized; the 8 pieces of image information with better quality are selected from the 10, and the 8 pieces of image information are fused into 1 identification image.
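A minimal sketch of the selection step, assuming a caller-supplied quality (or shooting-angle) scoring function; the fusion variant described above would combine the K selected crops into a single image instead of keeping them separately.
```python
def select_identification_images(image_infos, score, k=8):
    """Keep the K best pieces of image information of one identification object."""
    ranked = sorted(image_infos, key=score, reverse=True)   # best quality or angle first
    return ranked[:k]                                        # the K identification images
```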
Alternatively, attribute information corresponding to the detection image information and the prediction image information of the same recognition object may be further extracted, thereby determining the attribute information of the recognition object. Further, the identification object may be identified or archived according to attribute information of the identification object.
In order to better explain the embodiment of the present invention, a method for determining an object in a video, which may be performed by a device for determining an object in a video, is described below with reference to a specific implementation scenario.
Suppose the video stream includes 10 frames of images to be recognized, where the first frame and the third frame are detection frame images. First, the first frame of image to be recognized is detected, and the first detection frame of each recognition object in the first frame is determined. For the first detection frame of recognition object A in the first frame, the key points of recognition object A are detected within the first detection frame, and the first detection frame is adjusted according to the detected key points. The second prediction frame corresponding to recognition object A in the second frame of image to be recognized is then predicted from the first detection frame. The key points of recognition object A are detected within the second prediction frame, and the second prediction frame is adjusted according to the detected key points. Next, the third prediction frame corresponding to recognition object A in the third frame of image to be recognized is predicted from the second prediction frame, the key points of recognition object A are detected within the third prediction frame, and the third prediction frame is adjusted according to the detected key points. At the same time, the third frame of image to be recognized is detected, and the third detection frames of the third frame are determined. Suppose one of the third detection frames intersects the third prediction frame of recognition object A; that third detection frame is used to correct the third prediction frame of recognition object A. It is then judged whether the corrected third prediction frame of recognition object A is a face image; if so, the fourth prediction frame corresponding to the fourth frame of image to be recognized is predicted from the corrected third prediction frame of recognition object A, and so on, until no prediction frame for recognition object A can be predicted in the next frame of image to be recognized. Suppose recognition object A has 8 corresponding frames in the 10 frames of images to be recognized; these 8 frames are used as the identification image of recognition object A for subsequently identifying recognition object A or filing recognition object A.
In addition, when a third face detection frame of the third frame of image to be recognized has no intersection with any third face prediction frame of the third frame of image to be recognized, the third face detection frame is determined as a new face detection frame corresponding to a new face. Key points are detected within the third face detection frame, and the third face detection frame is adjusted according to the detected key points. It is then judged whether the third face detection frame is a face image; if so, the fourth face prediction frame corresponding to the fourth frame of image to be recognized is predicted from the third face detection frame, and so on, until the face prediction frame in the next frame of image to be recognized can no longer be predicted.
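Putting the scenario together, a hypothetical end-to-end loop might look like the sketch below. The detector, region predictor, and match test are passed in as callables and are assumptions, not the patent's concrete models: detection frames correct or start tracked objects, while prediction frames only carry existing objects forward.
```python
def track_objects(frames, frame_labels, detect, predict_region, is_same_object):
    """Return {object id: [(frame index, region), ...]} for one video clip."""
    tracks = {}      # object id -> list of (frame index, region)
    active = {}      # object id -> region in the previous frame
    next_id = 0
    for index, (frame, label) in enumerate(zip(frames, frame_labels)):
        # carry every active object forward into this frame by prediction
        predicted = {}
        for obj_id, region in active.items():
            new_region = predict_region(frames[index - 1], frame, region)
            if new_region is not None:
                predicted[obj_id] = new_region
        if label == "detection":
            # detected regions correct matching predictions or start new objects
            for detected_region in detect(frame):
                match = next((obj_id for obj_id, region in predicted.items()
                              if is_same_object(detected_region, region)), None)
                if match is None:
                    match, next_id = next_id, next_id + 1
                predicted[match] = detected_region
        active = predicted
        for obj_id, region in active.items():
            tracks.setdefault(obj_id, []).append((index, region))
    return tracks
```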
Based on the same technical concept, an embodiment of the present invention provides an apparatus for determining an object in a video, as shown in fig. 4, the apparatus includes an obtaining unit 401 and a processing unit 402, where:
an obtaining unit 401, configured to obtain a to-be-processed video shot by a monitoring device within a preset time period, where the to-be-processed video includes N frames of images; N is greater than or equal to 2;
a processing unit 402, configured to detect, for a first image, a type of each identification object and detection image information corresponding to each identification object in the first image if the first image is a detection frame image; if the first image is a prediction frame image, predicting prediction image information corresponding to each recognition object in the first image according to image information corresponding to each recognition object in a second image; the first image is any one of the N frames of images, the second image is an adjacent image of the first image, and the image information corresponding to the identification object is determined or predicted;
the processing unit 402 is further configured to determine an identification image of the identification object according to the detection image information and the prediction image information of the same identification object in different frame images.
In a possible implementation manner, the processing unit 402 is specifically configured to:
when the first image is determined to be a detection frame image, performing object detection on the first image, and determining first image information corresponding to each first object in the first image;
predicting second image information corresponding to each second object in the first image according to third image information corresponding to each second object in a third image; the third image is an adjacent image of the first image acquired by the monitoring equipment and is a predicted frame image;
when second image information of a second object and first image information of a first object meet set conditions, the second object and the first object are determined to be the same identification object, and detection image information of the same identification object is determined according to the first image information and the second image information.
In a possible implementation manner, the processing unit 402 is specifically configured to:
if the first image is a detection frame image, inputting the first image into a classifier model, and determining the type of each identification object in the first image and detection image information corresponding to each identification object; the types used by the classifier to distinguish include motor vehicles, non-motor vehicles and pedestrians.
In a possible implementation manner, the processing unit 402 is specifically configured to:
selecting K pieces of image information as the identification images of the identification objects according to the detection image information and the prediction image information of the same identification object in different frame images; or,
and selecting K pieces of image information according to the detection image information and the prediction image information of the same identification object in different frame images, and generating the identification image of the identification object according to the K pieces of image information.
The embodiment of the present application further provides an apparatus having the function of implementing the method for determining an object in a video described above. This function may be implemented by hardware executing corresponding software. In one possible design, the apparatus includes a processor, a transceiver, and a memory; the memory is used for storing computer-executable instructions, the transceiver is used for communication between the apparatus and other communication entities, and the processor is connected with the memory through a bus. When the apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the apparatus performs the method for determining an object in a video described above.
An embodiment of the present invention further provides a computer storage medium, where a software program is stored, and when the software program is read and executed by one or more processors, the software program implements the method for determining an object in a video described in the foregoing various possible implementations.
Embodiments of the present invention also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method for determining an object in a video described in the above-mentioned various possible implementations.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for determining an object in a video, the method comprising:
acquiring a to-be-processed video shot by monitoring equipment within a preset time period, wherein the to-be-processed video comprises N frames of images; N is greater than or equal to 2;
for a first image, if the first image is a detection frame image, detecting the type of each identification object in the first image and detection image information corresponding to each identification object; if the first image is a prediction frame image, predicting prediction image information corresponding to each recognition object in the first image according to image information corresponding to each recognition object in a second image; the first image is any one of the N frames of images, the second image is an adjacent image of the first image, and the image information corresponding to the identification object is determined or predicted;
and determining the identification image of the identification object according to the detection image information and the prediction image information of the same identification object in different frame images.
2. The method according to claim 1, wherein determining the detection image information corresponding to each recognition object if the first image is a detection frame image comprises:
when the first image is determined to be a detection frame image, performing object detection on the first image, and determining first image information corresponding to each first object in the first image;
predicting second image information corresponding to each second object in the first image according to third image information corresponding to each second object in a third image; the third image is an adjacent image of the first image acquired by the monitoring equipment and is a predicted frame image;
when second image information of a second object and first image information of a first object meet set conditions, the second object and the first object are determined to be the same identification object, and detection image information of the same identification object is determined according to the first image information and the second image information.
3. The method of claim 1, wherein determining the type of each recognition object and the detection image information corresponding to each recognition object in the first image if the first image is a detection frame image comprises:
if the first image is a detection frame image, inputting the first image into a classifier model, and determining the type of each identification object in the first image and detection image information corresponding to each identification object; the types used by the classifier to distinguish include motor vehicles, non-motor vehicles and pedestrians.
4. The method according to any one of claims 1 to 3, wherein determining the identification image of the identification object based on the detection image information and the prediction image information of the same identification object in different frame images comprises:
selecting K pieces of image information as the identification images of the identification objects according to the detection image information and the prediction image information of the same identification object in different frame images; or,
and selecting K pieces of image information according to the detection image information and the prediction image information of the same identification object in different frame images, and generating the identification image of the identification object according to the K pieces of image information.
5. An apparatus for determining objects in a video, the apparatus comprising:
the device comprises an acquisition unit and a processing unit, wherein the acquisition unit is used for acquiring a video to be processed which is shot by monitoring equipment in a preset time period, and the video to be processed comprises N frames of images; N is greater than or equal to 2;
the processing unit is used for detecting the type of each identification object and detection image information corresponding to each identification object in a first image if the first image is a detection frame image; if the first image is a prediction frame image, predicting prediction image information corresponding to each recognition object in the first image according to image information corresponding to each recognition object in a second image; the first image is any one of the N frames of images, the second image is an adjacent image of the first image, and the image information corresponding to the identification object is determined or predicted;
the processing unit is further used for determining the identification image of the identification object according to the detection image information and the prediction image information of the same identification object in different frame images.
6. The apparatus according to claim 5, wherein the processing unit is specifically configured to:
when the first image is determined to be a detection frame image, performing object detection on the first image, and determining first image information corresponding to each first object in the first image;
predicting second image information corresponding to each second object in the first image according to third image information corresponding to each second object in a third image; the third image is an adjacent image of the first image acquired by the monitoring equipment and is a predicted frame image;
when second image information of a second object and first image information of a first object meet set conditions, the second object and the first object are determined to be the same identification object, and detection image information of the same identification object is determined according to the first image information and the second image information.
7. The apparatus according to claim 5, wherein the processing unit is specifically configured to:
if the first image is a detection frame image, inputting the first image into a classifier model, and determining the type of each identification object in the first image and detection image information corresponding to each identification object; the types used by the classifier to distinguish include motor vehicles, non-motor vehicles and pedestrians.
8. The apparatus according to any one of claims 5 to 7, wherein the processing unit is specifically configured to:
selecting K pieces of image information as the identification images of the identification objects according to the detection image information and the prediction image information of the same identification object in different frame images; or,
and selecting K pieces of image information according to the detection image information and the prediction image information of the same identification object in different frame images, and generating the identification image of the identification object according to the K pieces of image information.
9. A computer-readable storage medium, characterized in that the storage medium stores instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 4.
10. A computer device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 4 in accordance with the obtained program.
CN201811648094.8A 2018-12-29 2018-12-29 Method and device for determining an object in a video Pending CN109800685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811648094.8A CN109800685A (en) 2018-12-29 2018-12-29 Method and device for determining an object in a video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811648094.8A CN109800685A (en) 2018-12-29 2018-12-29 Method and device for determining an object in a video

Publications (1)

Publication Number Publication Date
CN109800685A true CN109800685A (en) 2019-05-24

Family

ID=66558125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811648094.8A Pending CN109800685A (en) 2018-12-29 2018-12-29 Method and device for determining an object in a video

Country Status (1)

Country Link
CN (1) CN109800685A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631419A (en) * 2015-12-24 2016-06-01 浙江宇视科技有限公司 Face recognition method and device
CN106845357A (en) * 2016-12-26 2017-06-13 银江股份有限公司 A kind of video human face detection and recognition methods based on multichannel network
CN108876812A (en) * 2017-11-01 2018-11-23 北京旷视科技有限公司 Image processing method, device and equipment for object detection in video
CN108986138A (en) * 2018-05-24 2018-12-11 北京飞搜科技有限公司 Method for tracking target and equipment
CN109035304A (en) * 2018-08-07 2018-12-18 北京清瑞维航技术发展有限公司 Method for tracking target, calculates equipment and device at medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675433A (en) * 2019-10-31 2020-01-10 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
US11450027B2 (en) 2019-10-31 2022-09-20 Beijing Dajia Internet Information Technologys Co., Ltd. Method and electronic device for processing videos
CN114466218A (en) * 2022-02-18 2022-05-10 广州方硅信息技术有限公司 Live video character tracking method, device, equipment and storage medium
CN114466218B (en) * 2022-02-18 2024-04-23 广州方硅信息技术有限公司 Live video character tracking method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190524