CN111582025B - Method and device for identifying moving object and storage medium - Google Patents

Method and device for identifying moving object and storage medium Download PDF

Info

Publication number
CN111582025B
CN111582025B CN202010241664.2A
Authority
CN
China
Prior art keywords
motion trail
intra
moving
class distance
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010241664.2A
Other languages
Chinese (zh)
Other versions
CN111582025A (en)
Inventor
林晓明
江金陵
鲁邹尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202010241664.2A priority Critical patent/CN111582025B/en
Publication of CN111582025A publication Critical patent/CN111582025A/en
Application granted granted Critical
Publication of CN111582025B publication Critical patent/CN111582025B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method, a device and a storage medium for identifying a moving object are provided. The method comprises: collecting video; detecting the motion trail of a moving object in the collected video; determining, within the detected motion trail, a first distance between each moving region and the center position of the plurality of moving regions contained in the motion trail, and obtaining an intra-class distance of the motion trail from the plurality of first distances; and, for a motion trail whose intra-class distance meets a predetermined condition, identifying each picture corresponding to the motion trail in the video with a pre-trained picture classification model and determining from the identification result whether the video contains a specified object. The application can accurately identify the moving object.

Description

Method and device for identifying moving object and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a method and apparatus for identifying a moving object, and a storage medium.
Background
As living standards improve, sanitation and safety become more and more important, and food safety is a key part of them. Kitchen safety is an important component of food safety, and among kitchen hazards the danger posed by rats is very high. However, it is difficult to monitor at all times whether a rat appears in a kitchen; and although rat traps and rat poison are effective means of removing rats, when the rats themselves are not seen it is hard to decide where to place them, so removal efficiency is much lower. It therefore becomes very important to monitor whether a rat is present in the kitchen and, when one is present, to locate its movement track. The present application is directed at night-time video, and to keep the algorithm practical each video is typically one to two minutes long; lengthy videos need to be split into short ones, which ensures that the overall background of each video varies little.
In the prior art, a picture classification model based on deep learning is generally used: if a mouse is present in a picture, the whole picture is classified as "mouse", using common image classification models such as resnet or densenet. However, in kitchen video frames the mouse occupies only a small area of the picture. On the one hand, if the model is to identify mice in the picture well, the deep learning model has to be large, and the computation and storage costs grow accordingly. On the other hand, a model with good performance requires massive data, and because different kitchen backgrounds differ greatly, even a well-trained classification model is hard to generalize. In addition, classifying the whole picture cannot locate the mouse, so no movement track of the mouse can be obtained.
In the prior art, a target detection model based on deep learning is also used; such a model can identify targets in a picture and locate their positions. By training a deep learning target detection model for mice, such as Faster-RCNN, SSD or YOLO-v3, mice in a picture can be identified effectively, achieving the aim of detecting mice in video. However, on the one hand, an effective detection model can be trained for the kitchen of one store, but that model is likely to perform worse in other stores: the detection model must judge not only what is a mouse but also what is not, and store backgrounds differ greatly, so the deep learning target detection model performs poorly across stores. Training a highly generalizable detection model would require many mouse pictures in many different backgrounds, which is difficult to obtain. On the other hand, the computation cost of a target detection model is relatively high, so deploying it to many stores at the same time is expensive.
In the prior art, a Gaussian background modeling and a deep learning-based picture classification model are also mixed, and mainly comprise two parts:
a) Gaussian mixture background modeling. In moving-target detection and extraction, the background model is crucial for target identification and tracking. Gaussian mixture background modeling is suited to separating background and foreground from an image sequence when the camera is fixed: in that case the background changes slowly and is influenced mostly by light, wind, and so on. By modeling the background, the foreground, which is generally the moving object, is separated from the background in a given image, thereby achieving the purpose of detecting moving objects.
b) A picture classification model based on deep learning. A common image classification network, such as resnet, VGG16 or densenet, can identify the class of a given picture after pictures are labeled with classes and a classification model is trained. Because the pictures obtained from movement detection cover only small areas, the required model is smaller, and its generalization ability is better than that of a model obtained by classifying whole pictures directly.
In a restaurant kitchen, the camera is fixed and the background changes little, so moving-object detection works well. By combining the moving-object detection model based on Gaussian mixture background modeling with the image classification model, mice in the video can be detected effectively and their positions located: the position of a moving object in the video is detected, the corresponding picture is classified with the picture classification model, and it is judged whether the moving object is a mouse, thereby judging whether a mouse appears in the video and where it is.
However, the deep learning classification model is limited by its training data and model quality, and can hardly achieve a perfect result. In the kitchen there are refrigerators, power strips, the flashing of various electric appliances and reflections from steel surfaces; because the movement detection algorithm is imperfect, these are sometimes misdetected as moving objects, and some lights look much like the eyes of mice, so such objects are also easily misdetected as mice.
Therefore, a model is needed that generalizes well, performs well, and is not prone to false detections caused by brightness changes in the video. Besides the picture itself, the movement trajectories of objects must also be used to exclude flash-like trajectories that are not mice, so that whether a moving object is a mouse can be judged correctly.
Disclosure of Invention
The application provides a method and a device for identifying a moving object and a storage medium, which can achieve the purpose of accurately identifying the moving object.
The application provides a method for identifying a moving object, which comprises the following steps: collecting video; detecting the motion trail of a moving object in the collected video; determining, within the detected motion trail, a first distance between each moving region and the center position of the plurality of moving regions contained in the motion trail, and obtaining an intra-class distance of the motion trail from the plurality of first distances, where a moving region is a rectangular area forming the motion trail in a frame picture of the video; and, for a motion trail whose intra-class distance meets a predetermined condition, identifying each picture corresponding to the motion trail in the video with a pre-trained picture classification model, and determining from the identification result whether the video contains a specified object.
Compared with the related art, by determining the intra-class distance of the motion trail the method and device can identify trails that may correspond to a moving object and then confirm them with the picture classification model, which improves the accuracy of identifying the moving object.
In an exemplary embodiment, determining the intra-class distance of the moving object's motion trail makes it possible to exclude flash-like trails that do not belong to the specified object, which further improves the accuracy of identifying the moving object.
In an exemplary embodiment, the predicted values obtained by the picture classification model for each frame picture are processed in a predetermined manner to obtain a prediction result for the whole motion trail.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. Other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The accompanying drawings are included to provide an understanding of the principles of the application, and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain, without limitation, the principles of the application.
FIG. 1 is a flow chart of a method for identifying a moving object according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a module of an identification device for a moving object according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a video picture based on moving object recognition under a specific scene according to an embodiment of the present application;
FIG. 4 is a picture of light refraction in a specific scene according to an embodiment of the present application;
FIG. 5 is a picture of light flashing in a specific scene according to an embodiment of the present application.
Detailed Description
The present application has been described in terms of several embodiments, but the description is illustrative and not restrictive, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the described embodiments. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or in place of any other feature or element of any other embodiment unless specifically limited.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The disclosed embodiments, features and elements of the present application may also be combined with any conventional features or elements to form a unique inventive arrangement as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive arrangements to form another unique inventive arrangement as defined in the claims. It is therefore to be understood that any of the features shown and/or discussed in the present application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.
Furthermore, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Furthermore, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
As shown in fig. 1, the present application provides a method for identifying a moving object, including the following operations:
s1, collecting video;
in one exemplary embodiment, video capture may be performed by a camera device. For example, when it is desired to identify a mouse in a kitchen, video in the kitchen may be collected by a camera fixedly installed in the kitchen.
In one exemplary embodiment, for the mouse detection scenario, to reduce computation only mice active at night are detected, because during the daytime there is a great deal of human activity in the kitchen: if every detected movement had to be judged by the deep learning model, the amount of computation would be much larger. The camera therefore needs an infrared shooting function, which can acquire clear pictures at night, and the later model processing is based on the video acquired after infrared shooting is switched on.
S2, detecting a motion trail of a moving object in the acquired video;
In one exemplary embodiment, a Gaussian mixture model is used to detect the motion trails of moving objects in the video. The main principle of Gaussian mixture background modeling is to construct a background of the video; then, for each frame, on the one hand the frame is differenced against the background so that the "foreground" is detected, the foreground being taken to be the moving objects; on the other hand, the frame is used to update the background, yielding a new background.
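For illustration only, the following Python sketch uses OpenCV's MOG2 background subtractor, which is one implementation of Gaussian mixture background modeling and is not necessarily the one used by the application; the video path, area threshold and kernel size are assumptions.

```python
# Illustrative sketch: Gaussian-mixture background subtraction with OpenCV's MOG2;
# the path, thresholds and kernel size are assumed values, not taken from the patent.
import cv2

def detect_moving_regions(video_path, min_area=100):
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    regions_per_frame = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        mask = subtractor.apply(frame)                               # foreground = moving pixels
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)   # drop shadow pixels
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)        # suppress speckle noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        rects = []
        for c in contours:
            if cv2.contourArea(c) < min_area:
                continue
            x, y, dx, dy = cv2.boundingRect(c)
            rects.append((x, y, dx, dy, w, h))   # same tuple layout as used later in the text
        regions_per_frame.append(rects)
    cap.release()
    return regions_per_frame
```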
S3, determining, in the detected motion trail, first distances between each moving region and the central position of the plurality of moving regions contained in the motion trail, and obtaining the intra-class distance of the motion trail according to the first distances; the moving region refers to a rectangular area forming the motion trail in a frame picture contained in the video;
In an exemplary embodiment, a moving region detected by the Gaussian mixture model is a rectangular area, in the frame picture corresponding to the motion trail, where a moving object appears; it is generally marked with a rectangle, and a trail A contains the moving regions rectangle_1, rectangle_2, ..., rectangle_n in sequence. Each rectangle corresponds to a screenshot of part of a frame picture.
In an exemplary embodiment, one frame picture may contain one or more moving regions.
As shown in fig. 3, the figure is one frame picture of the acquired video; the rectangular area in the picture is a moving region, w represents the width of the whole picture and h represents its height. (x, y) are the coordinates of the upper-left corner of the moving region (the upper-left corner of the whole picture is taken as (0, 0) by default), and (dx, dy) are the width and height of the moving region, respectively.
In an exemplary implementation, the center positions of the plurality of moving areas included in the motion trajectory in operation S3 refer to an average value of the plurality of moving area positions included in the motion trajectory;
In an exemplary implementation, obtaining the intra-class distance of the motion trail from the plurality of first distances in operation S3 comprises: taking the average of the plurality of first distances as the intra-class distance of the motion trail; the center position of the plurality of moving regions contained in the motion trail is the average of the positions of those moving regions. The calculation formula is:

distance_within_class(A) = (1/n) · Σ_{i=1}^{n} distance(rectangle_i, center(A))

wherein A represents a motion trail; distance_within_class(A) represents the intra-class distance between the moving regions of the motion trail A in the frame pictures; n represents the number of moving regions corresponding to the motion trail; i is the sequence number of each moving region in the motion trail.

The above formula is obtained by the following reasoning:

a) Suppose rectangle_1 = (x1, y1, dx1, dy1, w, h) and rectangle_2 = (x2, y2, dx2, dy2, w, h); define the distance between the two rectangles rectangle_1 and rectangle_2.

b) First, the distance between the center points of the two rectangles is determined by the distance between their abscissas and the distance between their ordinates, normalized by the picture size:

distance_x = |(x1 + dx1/2) − (x2 + dx2/2)| / w
distance_y = |(y1 + dy1/2) − (y2 + dy2/2)| / h

Then, the distance between the two rectangular frames is:

distance(rectangle_1, rectangle_2) = sqrt(distance_x² + distance_y²)

c) Define the "center point" (i.e., center position) and the "intra-class distance" of a trail. Suppose trail A = [rectangle_1, rectangle_2, ..., rectangle_n]; the center point of A is (center_x, center_y), where:

center_x = (1/n) · Σ_{i=1}^{n} (x_i + dx_i/2) / w
center_y = (1/n) · Σ_{i=1}^{n} (y_i + dy_i/2) / h

The intra-class distance of trail A is:

distance_within_class(A) = (1/n) · Σ_{i=1}^{n} sqrt( ((x_i + dx_i/2)/w − center_x)² + ((y_i + dy_i/2)/h − center_y)² )
in other embodiments, the intra-class distance of the motion trail may be obtained by taking an average value according to the distance between the partial moving area and the center position, or may be obtained by taking a median value of the distance between the partial moving area and the center position, or may be obtained by taking an average value after screening the abnormal value, or the like.
By determining the intra-class distance of the motion trail, the embodiment of the application can distinguish real moving objects. For example, in the practical application scenario of detecting mice, the moving objects recorded in the video may, apart from mice, include refrigerators, power strips, the flashing of various electric appliances and reflections from steel; these are sometimes misdetected as moving objects, and some lights look much like the eyes of mice, so such objects are also easily misdetected as mice, as shown in fig. 4 and fig. 5. By determining the intra-class distance of the moving object's motion trail, the embodiment of the application judges whether the trail can really correspond to a moving object, thereby eliminating false alarms.
S4, for the motion trail of which the intra-class distance meets the preset condition, a pre-trained picture classification model is adopted to identify each picture corresponding to the motion trail in the video, and whether the video contains a specified object is determined according to an identification result.
Typically, the pictures in the motion trail are partial screenshots of the frame pictures; in other modes, whole frame pictures may be used directly.
In one exemplary embodiment, an image classification model based on a convolutional network, such as resnet or densenet, is trained in advance.
In an exemplary embodiment, the picture classification model is trained using sample data that is labeled as to whether the moving region contains a specified object.
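The patent only states that a convolutional classification model such as resnet or densenet is trained on labeled moving-region samples; as one hypothetical way to do this, a small PyTorch fine-tuning loop might look like the following, where the dataset layout, hyper-parameters and class names are assumptions.

```python
# Hypothetical fine-tuning of a ResNet-18 binary classifier on moving-region crops;
# folder names, hyper-parameters and the two-class labeling are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Expects crops sorted into two folders, e.g. crops/mouse and crops/other
train_set = datasets.ImageFolder("crops", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)   # two classes: specified object / other
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```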
In an exemplary embodiment, the method further comprises the operations of: s5: and judging whether the intra-class distance of the motion trail meets a preset condition.
In an exemplary embodiment, the operation S5 includes the following operations:
s50, comparing the intra-class distance of the motion trail with a preset minimum intra-class distance;
s51, when the intra-class distance of the motion trail is smaller than a preset minimum intra-class distance, determining that the intra-class distance of the motion trail does not meet a preset condition;
s52, when the intra-class distance of the motion trail is larger than or equal to the preset minimum intra-class distance, determining that the intra-class distance of the motion trail meets the preset condition.
In an exemplary embodiment, the method further comprises S6: and determining that the video does not contain the specified object for the motion trail of which the intra-class distance does not meet the preset condition.
For example, in the application scenario of detecting mice, a "minimum intra-class distance" (min_distance) may be predefined according to research, such as min_distance = 0.15. When distance_within_class(A) < min_distance, the trail is considered unlikely to be a mouse and is most likely a false detection of a moving object caused by flashing light. When distance_within_class(A) ≥ min_distance, the prediction result of the picture classification model is used as the basis.
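A minimal sketch of this filtering step, reusing distance_within_class from the earlier sketch and assuming the example value min_distance = 0.15:

```python
# Minimal sketch of the intra-class distance filter; distance_within_class is the
# function from the earlier sketch, and 0.15 is the example threshold given above.
def filter_trails(trails, min_distance=0.15):
    kept, rejected = [], []
    for trail in trails:
        if distance_within_class(trail) < min_distance:
            rejected.append(trail)   # likely a flashing light, not a mouse
        else:
            kept.append(trail)       # passed on to the picture classification model
    return kept, rejected
```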
For a motion trail whose intra-class distance does not meet the predetermined condition, it can be determined directly that the video does not contain the specified object; for a motion trail whose intra-class distance does meet the predetermined condition, no result can be obtained directly, and further judgment is required, namely further prediction with the picture classification model.
Therefore, in an exemplary embodiment, operation S4 (for a motion trail whose intra-class distance meets the predetermined condition, identifying each picture corresponding to the motion trail in the video with the pre-trained picture classification model and determining from the identification result whether the video contains the specified object) comprises the following operations:
S41, predicting whether each picture corresponding to the motion trail contains the specified moving object, to obtain a prediction result value that each such picture contains the specified moving object;
S42, obtaining, in a predetermined manner, a prediction result value that the motion trail contains the specified moving object, from the prediction result values that the individual pictures contain the specified moving object;
S43, determining whether the video contains the specified object according to the obtained prediction result value that the motion trail contains the specified moving object.
For example, in the above application scenario for detecting mice, when distance_within_class(A) ≥ min_distance, all pictures in trail A are predicted with the classification model trained and stored by the model training module. Suppose the prediction results corresponding to rectangle_1, rectangle_2, ..., rectangle_n are [pred_1, pred_2, ..., pred_n], where pred_i is a number between 0 and 1; the closer it is to 1, the more likely the corresponding picture is to contain a mouse.
Since the above gives a prediction for each frame picture but not for the whole motion trail, the motion trail itself must be predicted in a predetermined manner.
In one exemplary embodiment, the prediction result value that the motion trail contains the specified moving object is determined by averaging the prediction result values of all the frame pictures corresponding to the motion trail; alternatively, in another exemplary embodiment, it is determined by averaging the prediction result values of a specified number of frame pictures ranked first among all the frame pictures when sorted from large to small.
For example, in the above application scenario for detecting mice, the prediction result values of the pictures in the motion trail are combined into a single result, for instance by taking the mean:

mean_pred = (1/n) · (pred_1 + pred_2 + ... + pred_n)

or, assuming that sorting [pred_1, pred_2, ..., pred_n] from large to small gives [pred_1', pred_2', ..., pred_n'], i.e. pred_1' >= pred_2' >= ... >= pred_n', defining a top-k mean:

topk_pred = (1/k) · (pred_1' + pred_2' + ... + pred_k')

or using the median:

median_pred = median([pred_1, pred_2, ..., pred_n])
In this way a number between 0 and 1 is obtained, and this value is used as the predicted value of whether the moving object in trail A is a mouse.
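For illustration, the three combination schemes described above (mean, top-k mean and median) could be written as follows; the function names and the default value of k are assumptions.

```python
# Illustrative aggregation of per-picture prediction values into one trail score;
# function names and the default k are assumptions, not taken from the patent.
from statistics import median

def mean_pred(preds):
    return sum(preds) / len(preds)

def topk_pred(preds, k=5):
    top = sorted(preds, reverse=True)[:k]   # k highest per-picture scores
    return sum(top) / len(top)

def median_pred(preds):
    return median(preds)

# Example: a trail whose aggregated score exceeds a threshold such as 0.5
# would be treated as containing the specified object.
score = topk_pred([0.1, 0.9, 0.8, 0.2], k=2)
```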
In an exemplary embodiment, the above method further comprises: when it is determined that the video contains the specified object, drawing a movement track graph of the specified object based on the motion trail.
As in the mouse example above, when the prediction result of trail A is greater than a given threshold, such as 0.5, the moving object in trail A is judged to be a mouse, and a movement track graph of the mouse is drawn based on trail A.
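A minimal sketch of drawing such a movement track graph with OpenCV, assuming the (x, y, dx, dy, w, h) rectangles of trail A; the colors and line widths are illustrative choices.

```python
# Sketch of drawing the movement trajectory of the specified object on one frame;
# colors and line widths are illustrative choices, not taken from the patent.
import cv2
import numpy as np

def draw_track(frame, trail):
    # pixel-space centers of the moving regions in the trail
    points = np.array(
        [(int(x + dx / 2), int(y + dy / 2)) for x, y, dx, dy, w, h in trail],
        dtype=np.int32,
    )
    cv2.polylines(frame, [points], isClosed=False, color=(0, 0, 255), thickness=2)
    for x, y, dx, dy, w, h in trail:
        cv2.rectangle(frame, (x, y), (x + dx, y + dy), (0, 255, 0), 1)
    return frame
```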
As shown in fig. 2, the present application provides an apparatus for recognizing a moving object, comprising:
a video acquisition module 10 for acquiring video;
a detection module 20 for detecting a motion trail of a moving object in the acquired video;
an intra-class distance determining module 30, configured to determine, in the detected motion trail, a first distance between each moving region and the central position of the plurality of moving regions contained in the motion trail, and to obtain the intra-class distance of the motion trail from these first distances; the moving region refers to a rectangular area forming the motion trail in a frame picture contained in the video;
the picture classification module 40 is configured to identify each picture corresponding to a motion track in the video by using a pre-trained picture classification model for the motion track with an intra-class distance meeting a predetermined condition, and determine whether the video contains a specified object according to the identification result.
The application also provides a device for identifying a moving object, comprising a processor and a memory, wherein the memory stores a program for the targeted delivery of content, and the processor is configured to read the program for the targeted delivery of content and execute the method of any one of the above.
The application also provides a computer storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method of any one of the above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Claims (8)

1. A method of identifying a moving object, comprising:
collecting video;
detecting a motion trail of a moving object in the acquired video;
determining a first distance between each moving region and the central positions of a plurality of moving regions contained in the moving track in the detected moving track, and obtaining an intra-class distance of the moving track according to the plurality of first distances; the moving area refers to a rectangular area forming the motion trail in a frame picture contained in the video;
for the motion trail of which the intra-class distance meets the preset condition, identifying each picture corresponding to the motion trail in the video by adopting a pre-trained picture classification model, and determining whether the video contains a specified object according to an identification result,
The obtaining the intra-class distance of the motion trail according to the first distances comprises the following steps:
the intra-class distance of the motion trail is obtained by taking an average value according to the plurality of first distances; the center positions of the plurality of moving areas included in the motion trail refer to an average value of the positions of the plurality of moving areas included in the motion trail,
the method further comprises the step of judging whether the intra-class distance of the motion trail meets a preset condition;
the judging whether the intra-class distance of the motion trail meets the preset condition comprises the following steps:
comparing the intra-class distance of the motion trail with a preset minimum intra-class distance;
when the intra-class distance of the motion trail is smaller than the preset minimum intra-class distance, determining that the intra-class distance of the motion trail does not meet the preset condition; and
and when the intra-class distance of the motion trail is larger than or equal to the preset minimum intra-class distance, determining that the intra-class distance of the motion trail meets the preset condition.
2. The method of claim 1, wherein the picture classification model is trained using sample data that is labeled as to whether the moving region contains a specified object.
3. The method according to claim 1, characterized in that: the identifying each picture corresponding to the motion trail in the video by adopting a pre-trained picture classification model, and determining whether the video contains the specified object according to the identification result, comprises the following steps:
predicting whether each picture corresponding to the motion trail contains the appointed object or not to obtain a prediction result value of each picture corresponding to the motion trail containing the appointed object;
according to the obtained predicted result value of each picture containing the specified object, obtaining the predicted result value of the motion trail containing the specified object in a preset mode;
and determining whether the video contains the specified object according to the obtained predicted result value of the specified object contained in the motion trail.
4. The method according to claim 3, characterized in that: the obtaining, in a predetermined manner, the predicted result value of the motion trail containing the specified object comprises:
determining the predicted result value of the motion trail containing the specified object by averaging the predicted result values of the specified object corresponding to all the frame pictures;
or determining the predicted result value of the motion trail containing the specified object by averaging the predicted result values of a specified number of frame pictures ranked first among all the frame pictures.
5. The method according to claim 1, characterized in that: the method further comprises the steps of: and drawing a motion trail graph of the specified object based on the motion trail when the video is determined to contain the specified object.
6. An apparatus for identifying a moving object, the apparatus comprising:
the video acquisition module is used for acquiring videos;
the detection module is used for detecting the motion trail of the moving object in the acquired video;
the in-order distance determining module is used for determining first distances between each moving area and the central positions of a plurality of moving areas contained in the moving track in the detected moving track, and obtaining in-order distances of the moving track according to the first distances; the moving area refers to a rectangular area forming the motion trail in a frame picture contained in the video;
a picture classification module, configured to identify each picture corresponding to a motion track in the video by using a pre-trained picture classification model for the motion track with an intra-class distance meeting a predetermined condition, determine whether the video contains a specified object according to an identification result,
the obtaining the intra-class distance of the motion trail according to the first distances comprises the following steps:
the intra-class distance of the motion trail is obtained by taking an average value according to the plurality of first distances; the center positions of the plurality of moving areas included in the motion trail refer to an average value of the positions of the plurality of moving areas included in the motion trail,
the device is also used for judging whether the intra-class distance of the motion trail meets a preset condition;
the judging whether the intra-class distance of the motion trail meets the preset condition comprises the following steps:
comparing the intra-class distance of the motion trail with a preset minimum intra-class distance;
when the intra-class distance of the motion trail is smaller than the preset minimum intra-class distance, determining that the intra-class distance of the motion trail does not meet the preset condition; and
and when the intra-class distance of the motion trail is larger than or equal to the preset minimum intra-class distance, determining that the intra-class distance of the motion trail meets the preset condition.
7. A device for identifying a moving object, comprising a processor and a memory, wherein the memory stores a program for the targeted delivery of content; the processor is configured to read the program for the targeted delivery of content and perform the method of any one of claims 1-5.
8. A computer storage medium having stored thereon a computer program, which when executed by a processor implements the method according to any of claims 1-5.
CN202010241664.2A 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium Active CN111582025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010241664.2A CN111582025B (en) 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010241664.2A CN111582025B (en) 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium

Publications (2)

Publication Number Publication Date
CN111582025A CN111582025A (en) 2020-08-25
CN111582025B true CN111582025B (en) 2023-11-24

Family

ID=72117049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010241664.2A Active CN111582025B (en) 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium

Country Status (1)

Country Link
CN (1) CN111582025B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10206119A (en) * 1997-01-23 1998-08-07 Naldec Kk Object recognizing device
CN101983389A (en) * 2008-10-27 2011-03-02 松下电器产业株式会社 Moving body detection method and moving body detection device
CN108876822A (en) * 2018-07-09 2018-11-23 山东大学 A kind of behavior risk assessment method and household safety-protection nursing system
CN110569770A (en) * 2019-08-30 2019-12-13 重庆博拉智略科技有限公司 Human body intrusion behavior recognition method and device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2002029728A1 (en) * 2000-09-29 2004-02-12 トヨタ自動車株式会社 Position recognition device and position recognition method, and billing processing device and billing processing method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10206119A (en) * 1997-01-23 1998-08-07 Naldec Kk Object recognizing device
CN101983389A (en) * 2008-10-27 2011-03-02 松下电器产业株式会社 Moving body detection method and moving body detection device
CN108876822A (en) * 2018-07-09 2018-11-23 山东大学 A kind of behavior risk assessment method and household safety-protection nursing system
CN110569770A (en) * 2019-08-30 2019-12-13 重庆博拉智略科技有限公司 Human body intrusion behavior recognition method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙志旻 (Sun Zhimin). Fast recognition and tracking of moving targets in complex backgrounds. Microelectronics & Computer. 2003, (No. 11), pp. 1-4. *

Also Published As

Publication number Publication date
CN111582025A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
US10782688B2 (en) Method, control apparatus, and system for tracking and shooting target
St-Charles et al. Universal background subtraction using word consensus models
US9710716B2 (en) Computer vision pipeline and methods for detection of specified moving objects
US9530221B2 (en) Context aware moving object detection
US7969470B2 (en) Moving object detection apparatus, method and program
US9922423B2 (en) Image angle variation detection device, image angle variation detection method and image angle variation detection program
WO2019083738A9 (en) Methods and systems for applying complex object detection in a video analytics system
CN112418069A (en) High-altitude parabolic detection method and device, computer equipment and storage medium
US9367748B1 (en) System and method for autonomous lock-on target tracking
US20190138821A1 (en) Camera blockage detection for autonomous driving systems
US10110801B2 (en) Methods and systems for controlling a camera to perform a task
US10692225B2 (en) System and method for detecting moving object in an image
CN109255360B (en) Target classification method, device and system
KR20090043416A (en) Surveillance camera apparatus for detecting and suppressing camera shift and control method thereof
CN111428626B (en) Method and device for identifying moving object and storage medium
KR102584708B1 (en) System and Method for Crowd Risk Management by Supporting Under and Over Crowded Environments
KR20180138558A (en) Image Analysis Method and Server Apparatus for Detecting Object
CN111428642A (en) Multi-target tracking algorithm, electronic device and computer readable storage medium
US20230394795A1 (en) Information processing device, information processing method, and program recording medium
CN111444758A (en) Pedestrian re-identification method and device based on spatio-temporal information
CN109727268A (en) Method for tracking target, device, computer equipment and storage medium
KR20140141239A (en) Real Time Object Tracking Method and System using the Mean-shift Algorithm
Tsesmelis et al. Tamper detection for active surveillance systems
CN111582025B (en) Method and device for identifying moving object and storage medium
Wang et al. Tracking objects through occlusions using improved Kalman filter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant