CN111582025A - Moving object identification method and device and storage medium - Google Patents

Moving object identification method and device and storage medium Download PDF

Info

Publication number
CN111582025A
CN111582025A (application CN202010241664.2A)
Authority
CN
China
Prior art keywords
intra
motion
video
moving
class distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010241664.2A
Other languages
Chinese (zh)
Other versions
CN111582025B (en)
Inventor
林晓明
江金陵
鲁邹尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202010241664.2A priority Critical patent/CN111582025B/en
Publication of CN111582025A publication Critical patent/CN111582025A/en
Application granted granted Critical
Publication of CN111582025B publication Critical patent/CN111582025B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

A moving object identification method, apparatus and storage medium. The method comprises: collecting a video; detecting the motion trajectory of a moving object in the collected video; for each detected motion trajectory, determining a first distance between each moving area contained in the trajectory and the center position of the plurality of moving areas it contains, and obtaining the intra-class distance of the trajectory from these first distances; and, for a motion trajectory whose intra-class distance meets a predetermined condition, identifying each picture corresponding to the trajectory in the video with a pre-trained picture classification model and determining from the identification results whether the video contains the specified object. The method and the device can identify moving objects accurately.

Description

Moving object identification method and device and storage medium
Technical Field
The present disclosure relates to the field of computers, and more particularly, to a method, an apparatus, and a storage medium for identifying a moving object.
Background
With the improvement of people's living standards, hygiene and safety have become more and more important, and food safety is central among them. Kitchen hygiene, where the risk posed by rats is very high, is an important component of food safety. On the one hand, it is difficult to constantly monitor whether a mouse is present in the kitchen; on the other hand, mouse traps and mouse poison are effective means of killing mice. It is therefore important to monitor whether a rat appears in the kitchen and, when one does, to determine its movement trajectory. To increase the feasibility of the algorithm, the present invention is directed at night-time video, and each video is typically one to two minutes in length; longer videos are cut into short ones, which ensures that the overall background variation within a video is small.
In the prior art, a picture classification model based on deep learning is generally adopted: for a given picture, if a mouse is present in it, the whole picture is classified as "mouse". Commonly used image classification models such as resnet and densenet may be applied. In kitchen video frames, however, a mouse occupies only a small area of the picture. On the one hand, for such a model to recognize the mouse well, the deep learning model must be large, with correspondingly large computation and storage costs. On the other hand, obtaining a well-performing model requires massive data, and because different kitchen backgrounds differ greatly, even a well-trained classification model is difficult to generalize. In addition, classifying the whole picture cannot locate the mouse, so a moving track graph of the mouse cannot be obtained.
The prior art also adopts target detection models based on deep learning, which can identify a target in a picture and locate its position. By training a deep learning target detection model for mice, such as fast-RCNN, SSD or YOLO-v3, mice in a picture can be identified effectively, achieving the aim of detecting mice in video. On the one hand, however, although an effective target detection model may be trained for the kitchen of one store, that model is likely to perform poorly in other stores: a target detection model must judge not only what in a picture is a mouse but also what is not, and since the backgrounds of different stores differ greatly, a deep learning target detection model performs poorly across stores. Training a target detection model with good generalization ability would require many pictures of rats against different backgrounds, which is difficult to achieve. On the other hand, target detection models are computationally heavy, and deploying them to many stores at once incurs a high computation cost.
The prior art also combines Gaussian background modeling with a deep-learning-based image classification model; this hybrid approach mainly comprises two parts:
a) Mixed Gaussian background modeling. In moving target detection and extraction, the background is important for identifying and tracking targets. Mixed Gaussian background modeling is suited to separating background and foreground in an image sequence taken by a fixed camera. With the camera fixed, the background changes slowly, mostly under the influence of illumination, wind and the like; by modeling the background, foreground and background can be separated in a given image. Generally speaking, the foreground is the moving object, so the purpose of detecting moving objects is achieved.
b) Deep-learning-based image classification. Common image classification networks, such as resnet, VGG16 and densenet, can identify the class of a given image after images are labeled with their classes and a classification model is trained. Classifying the small-area pictures obtained from movement detection requires a smaller model, and such a model generalizes better than one that classifies the full frame directly.
In a restaurant kitchen scene, the camera is fixed and the background changes little, so the moving object detection algorithm works well. Combining the mixed-Gaussian-background moving object detection model with the image classification model can effectively detect a mouse in the video and locate its position: first the position of a moving object in the video is detected, then the picture classification model classifies the corresponding pictures to judge whether the moving object is a mouse, and thereby whether a mouse appears in the video and where it is.
However, the deep learning classification model is limited by its training data, and its effect cannot be perfect. A kitchen contains refrigerators, sockets, the flashing of various electrical appliances and reflections from steel surfaces; since the movement detection algorithm is imperfect, such lights are sometimes falsely detected as moving objects, and because these lights are bright like the eyes of a mouse, such objects are also easily misdetected as mice.
What is needed, therefore, is a model that generalizes well across videos, performs well, and is robust to the various false detections caused by light changes in the video. Besides the picture itself, the flashing tracks of non-mouse objects must be excluded on the basis of the objects' movement trajectories in order to judge correctly whether a moving object is a mouse.
Disclosure of Invention
The application provides a method and a device for identifying a moving object and a storage medium, which can achieve the aim of identifying the moving object more accurately.
The application provides a method for identifying a moving object, which comprises the following steps: collecting a video; detecting a motion track of a moving object in the acquired video; determining a first distance between each moving area and the center position of a plurality of moving areas contained in the motion trail in the detected motion trail, and obtaining the intra-class distance of the motion trail according to the first distances; the moving area is a rectangular area which forms the motion track in a frame picture contained in the video; and for the motion track with the intra-class distance meeting the preset condition, adopting a pre-trained picture classification model to identify each picture corresponding to the motion track in the video, and determining whether the video contains the specified object according to the identification result.
Compared with the related art, the present application first screens motion trajectories by their intra-class distance to identify likely false detections, and then uses the image classification model for further confirmation, so the identification accuracy for the corresponding moving objects can be improved.
In an exemplary embodiment, the embodiment of the present application excludes the flashing tracks of the unspecified objects from the moving objects by determining the intra-class distance of the motion track, so that the identification accuracy of the corresponding moving objects can be improved.
In an exemplary embodiment, the prediction value of each frame of picture is processed in a predetermined manner through a picture classification model, so that a prediction result of a movement track can be obtained.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flowchart illustrating a method for identifying a moving object according to an embodiment of the present application;
FIG. 2 is a block diagram of an apparatus for recognizing a moving object according to an embodiment of the present application;
FIG. 3 is a schematic diagram of identifying video pictures based on moving objects in a specific scene according to an embodiment of the present application;
FIG. 4 is a picture of light refraction in a specific scene according to an embodiment of the present application;
FIG. 5 is a picture of light flashing in a specific scene according to an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
As shown in fig. 1, the present application provides a method for identifying a moving object, comprising the operations of:
s1, collecting videos;
In one exemplary embodiment, the video may be captured by a camera device. For example, when mice in a kitchen need to be identified, video of the kitchen can be captured by a camera fixedly installed there.
In one exemplary embodiment, for the rat detection scenario, only rats active at night are detected, in order to reduce the amount of computation. In the daytime many people are active in the kitchen; if moving object detection were run then, many moving objects would be detected and the deep learning model would need to make many judgments, at great computational cost. The camera therefore needs an infrared shooting function, which can acquire clear pictures at night, and subsequent model processing is based on video acquired after the infrared function is enabled.
S2, detecting the motion trail of the moving object in the collected video;
In one exemplary embodiment, a mixed Gaussian model is used to detect the motion trajectory of a moving object in the video. The main principle of mixed Gaussian background modeling is to construct a background of the video; then, for each frame picture, on the one hand the picture is differenced against the background to detect the "foreground" of the picture, the foreground being the recognized moving object, and on the other hand the picture is used to update the background to obtain a new background.
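The patent relies on mixed Gaussian background modeling (for example, as provided by OpenCV's BackgroundSubtractorMOG2). As a minimal, dependency-light illustration of the detect-then-update loop just described, the sketch below keeps a single running-average background per pixel instead of a full Gaussian mixture; the function name and parameters are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def detect_foreground(frames, alpha=0.05, threshold=30):
    """Simplified single-Gaussian (running-average) background model.

    For each frame: difference it against the current background to
    extract a foreground mask (the recognized moving object), then
    blend the frame into the background (the 'update' step in S2).
    """
    background = frames[0].astype(np.float64)
    masks = []
    for frame in frames[1:]:
        frame = frame.astype(np.float64)
        # Foreground: pixels that deviate strongly from the background.
        mask = np.abs(frame - background) > threshold
        masks.append(mask)
        # Update the background with the current frame.
        background = (1 - alpha) * background + alpha * frame
    return masks

# A static scene with a bright "moving object" entering the second frame.
static = np.zeros((8, 8), dtype=np.uint8)
moving = static.copy()
moving[2:4, 2:4] = 255
masks = detect_foreground([static, moving])
print(masks[0].sum())  # 4 foreground pixels
```

A real deployment would use a per-pixel mixture of several Gaussians so that repetitive background motion (flicker, swaying) is absorbed into the model rather than reported as foreground.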
S3, determining a first distance between each moving area and the center position of a plurality of moving areas contained in the detected moving track, and obtaining the intra-class distance of the moving track according to the first distances; the moving area is a rectangular area which forms the motion track in a frame picture contained in the video;
In an exemplary embodiment, a moving area detected by the Gaussian mixture model is a rectangular area in the frame picture corresponding to the motion trajectory; the area is an area where a moving object appears and is denoted rectangle, and the areas through which one track A moves in sequence are rectangle_1, rectangle_2, ..., rectangle_n. Each rectangle corresponds to a screenshot in one frame picture.
In an exemplary embodiment, one frame picture may include one or more moving areas.
As shown in fig. 3, which is a frame picture from the captured video, the rectangular region in the picture is a moving area; w denotes the width of the whole picture and h its height. (x, y) are the coordinates of the upper left corner of the moving area (the upper left corner of the whole picture defaults to (0, 0)), and (dx, dy) are the width and height of the moving area, respectively.
In an exemplary implementation, the central positions of the moving areas included in the motion trajectory in operation S3 refer to an average value of the positions of the moving areas included in the motion trajectory;
In an exemplary implementation, deriving the intra-class distance of the motion trajectory from the plurality of first distances in operation S3 comprises averaging the plurality of first distances; the center position of the moving areas included in the trajectory is the average of their positions. The calculation formula is:

$$\mathrm{distance\_within\_class}(A) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{distance}(\mathrm{rectangle}_i,\ \mathrm{center}_A)$$

where A denotes a motion trajectory; distance_within_class(A) denotes the intra-class distance between the moving areas in the frame pictures of trajectory A; n denotes the number of moving areas corresponding to the trajectory; and i is the sequence number of each moving area in the trajectory.
The above formula is obtained by the following reasoning:
hypothesis retangle1=(x1,y1,dx1,dy1,w,h),rectangle2(x2, y2, dx2, dy2, w, h) defines two rectangles rectangle1And retangle2The distance between them.
First, the distance between the center points of the two rectangles is decomposed into the distance between their abscissas and the distance between their ordinates (normalized by the picture size):

$$\mathrm{distance}_x = \frac{\left|\left(x_1 + \tfrac{dx_1}{2}\right) - \left(x_2 + \tfrac{dx_2}{2}\right)\right|}{w}$$

$$\mathrm{distance}_y = \frac{\left|\left(y_1 + \tfrac{dy_1}{2}\right) - \left(y_2 + \tfrac{dy_2}{2}\right)\right|}{h}$$
Then, the distance between the two rectangular frames is:

$$\mathrm{distance}(\mathrm{rectangle}_1, \mathrm{rectangle}_2) = \sqrt{\mathrm{distance}_x^2 + \mathrm{distance}_y^2}$$
c) Define the "center point" (i.e. the center position) and the "intra-class distance" of a track. Suppose the trajectory A = [rectangle_1, rectangle_2, ..., rectangle_n]. The center point of track A is (center_x, center_y), where:
$$\mathrm{center}_x = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i + \tfrac{dx_i}{2}}{w}$$

$$\mathrm{center}_y = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i + \tfrac{dy_i}{2}}{h}$$
The intra-class distance of track A is:

$$\mathrm{distance\_within\_class}(A) = \frac{1}{n}\sum_{i=1}^{n}\sqrt{\left(\frac{x_i + \tfrac{dx_i}{2}}{w} - \mathrm{center}_x\right)^2 + \left(\frac{y_i + \tfrac{dy_i}{2}}{h} - \mathrm{center}_y\right)^2}$$
In other embodiments, the intra-class distance of the motion trajectory may instead be obtained by averaging the distances between a subset of the moving areas and the center position, by taking the median of the distances between the moving areas and the center position, by averaging after screening out abnormal values, and so on.
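The center-point and intra-class distance computation described above can be sketched as follows; the normalized-coordinate layout of each rectangle tuple is an assumption of this sketch, since the patent's formula images are not fully legible:

```python
import math

def intra_class_distance(track):
    """Compute the intra-class distance of a trajectory.

    `track` is a list of moving areas (x, y, dx, dy, w, h), where
    (x, y) is the top-left corner of the rectangle, (dx, dy) its size,
    and (w, h) the frame size used to normalize coordinates.
    """
    n = len(track)
    # Normalized center of every rectangle.
    centers = [((x + dx / 2) / w, (y + dy / 2) / h)
               for x, y, dx, dy, w, h in track]
    # Center point of the whole track: mean of the rectangle centers.
    cx = sum(c[0] for c in centers) / n
    cy = sum(c[1] for c in centers) / n
    # Mean distance of each rectangle center to the track center.
    return sum(math.hypot(c[0] - cx, c[1] - cy) for c in centers) / n

# A track that barely moves (e.g. a light flicker) vs. one that travels.
flicker = [(100, 100, 10, 10, 640, 480)] * 5
running = [(50 + 100 * i, 100, 10, 10, 640, 480) for i in range(5)]
print(intra_class_distance(flicker))         # 0.0
print(intra_class_distance(running) > 0.15)  # True
```

A stationary flash produces an intra-class distance of zero, while a genuinely travelling object produces a larger value, which is exactly the property the filtering step exploits.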
Determining the intra-class distance of the movement track makes it possible to distinguish moving objects. For example, in a practical mouse-detection scene, the moving objects recorded in a video may, besides mice, include refrigerators, sockets, the flashing of various electrical appliances and reflections from steel, all of which can be falsely detected as moving objects; some lights are bright like the eyes of a mouse, so these objects are also easily misdetected as mice, as shown in figs. 4 and 5. The embodiment of the present application identifies such cases by determining the intra-class distance of the object's motion trajectory, thereby eliminating false alarms.
And S4, for the motion track with the intra-class distance meeting the preset condition, adopting a pre-trained picture classification model to identify each picture corresponding to the motion track in the video, and determining whether the video contains the specified object according to the identification result.
Generally, a picture in the movement track is a partial screenshot of the frame picture; in other implementations the whole frame picture may be used directly.
In an exemplary embodiment, an image classification model based on a convolutional network, such as resnet or densenet, is trained in advance.
In an exemplary embodiment, the image classification model is obtained by training on sample data labeled as to whether each moving region contains the specified object.
In one exemplary embodiment, the method further comprises the operations of: s5: and judging whether the intra-class distance of the motion trail meets a preset condition.
In an exemplary embodiment, the operation S5 includes the following operations:
s50, comparing the intra-class distance of the motion trail with a preset minimum intra-class distance;
s51, when the intra-class distance of the motion track is smaller than a preset minimum intra-class distance, determining that the intra-class distance of the motion track does not meet a preset condition;
s52, when the intra-class distance of the motion trail is larger than or equal to the preset minimum intra-class distance, determining that the intra-class distance of the motion trail meets the preset condition.
In an exemplary embodiment, the method further includes S6: and determining that the video does not contain the specified object for the motion track with the intra-class distance not meeting the preset condition.
For example, in the mouse-detection application scenario, a "minimum intra-class distance" (min_distance) may be predefined empirically, such as min_distance = 0.15. When distance_within_class(A) < min_distance, the track is considered unlikely to be a mouse and is probably a moving object falsely detected from a light flash. When distance_within_class(A) ≥ min_distance, the prediction result of the picture classification model prevails.
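The S50–S52 comparison can be written as a simple predicate; min_distance = 0.15 follows the example value given in the text:

```python
MIN_DISTANCE = 0.15  # preset minimum intra-class distance (example value)

def meets_condition(distance_within_class, min_distance=MIN_DISTANCE):
    """S50-S52: a track meets the predetermined condition only if its
    intra-class distance is at least the preset minimum; smaller
    distances are treated as likely light-flash false detections."""
    return distance_within_class >= min_distance

print(meets_condition(0.05))  # False: likely a flashing light
print(meets_condition(0.30))  # True: handed to the picture classifier
```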
A motion trajectory whose intra-class distance does not meet the predetermined condition is determined to contain no specified object; a motion trajectory whose intra-class distance does meet the condition does not directly yield a result, and further prediction with the picture classification model is needed.
Therefore, in an exemplary embodiment, in the operation S4, for a motion trajectory of which the intra-class distance meets a predetermined condition, a pre-trained image classification model is used to identify each image corresponding to the motion trajectory in the video, and whether a specified object is included in the video is determined according to an identification result, including the following operations:
s41, predicting whether each picture corresponding to the motion trail contains the specified moving object or not to obtain a prediction result value of each picture corresponding to the motion trail containing the specified moving object;
s42, obtaining the prediction result value of the motion trail containing the appointed moving object by adopting a preset mode according to the fact that each picture contains the prediction result value of the appointed moving object;
and S43, determining whether the video contains the specified object or not according to the fact that the obtained motion trail contains the prediction result value of the specified moving object.
For example, in the mouse-detection application scenario, when distance_within_class(A) ≥ min_distance, all pictures in trajectory A are predicted with the classification model trained and stored by the model training module. Suppose the prediction results corresponding to rectangle_1, rectangle_2, ..., rectangle_n are [pred_1, pred_2, ..., pred_n] respectively, where pred_i is a number between 0 and 1; the closer it is to 1, the more likely this picture contains a mouse.
Since the above predicts a result for each frame picture and not for the whole movement trajectory, the movement trajectory must be predicted in a predetermined manner.
In an exemplary embodiment, the prediction result value for the motion trajectory containing the specified moving object is obtained by averaging the prediction result values of all the frame pictures corresponding to the trajectory; or, in another exemplary embodiment, by averaging the prediction result values of a specified number of the top-ranked frame pictures after sorting all of them in descending order.
For example, in the mouse-detection application scenario, each frame picture on the motion trajectory has a prediction result value for containing the moving object, and these results are combined into a unified one, such as by the mean:
$$\mathrm{mean\_pred} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{pred}_i$$
or assume [ pred1,pred2,,,,predn]The big to little permutation result is [ pred1’,pred2’,,,,predn’]I.e. pred1’>=pred2’>=predn', definition:
$$\mathrm{topk\_pred} = \frac{1}{k}\sum_{i=1}^{k}\mathrm{pred}_i'$$
or use the median:
median_pred = median([pred_1, pred_2, ..., pred_n])
Each of these yields a number between 0 and 1, which is used as the predicted value of whether the moving object in trajectory A is a mouse.
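The three aggregation schemes just listed (mean, top-k mean, median) can be sketched as one helper; the function name and the choice of k are illustrative assumptions:

```python
from statistics import mean, median

def aggregate_track_prediction(preds, method="mean", k=3):
    """Combine per-frame prediction values pred_i in [0, 1] into a
    single score for the whole trajectory, using one of the three
    schemes described above."""
    if method == "mean":
        return mean(preds)
    if method == "topk":
        # Average of the k largest per-frame predictions.
        return mean(sorted(preds, reverse=True)[:k])
    if method == "median":
        return median(preds)
    raise ValueError(f"unknown method: {method}")

preds = [0.9, 0.8, 0.1, 0.2, 0.85]
print(aggregate_track_prediction(preds, "mean"))    # 0.57
print(aggregate_track_prediction(preds, "topk"))    # 0.85
print(aggregate_track_prediction(preds, "median"))  # 0.8
```

The top-k variant is less sensitive to frames where the object is blurred or partly occluded, while the median is robust to a few spuriously high scores.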
In an exemplary embodiment, the method further includes: when it is determined whether the video contains the specified object, drawing a movement track graph of the specified object based on the movement track.
In the mouse example above, when the prediction result of trajectory A is larger than a given threshold, such as 0.5, the moving object in trajectory A is judged to be a mouse, and a moving track graph of the mouse is drawn based on trajectory A.
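Putting operations S3 through S6 together, the overall decision can be sketched end-to-end as follows; the data layout, helper name and toy classifier are all assumptions for illustration, not the patent's implementation:

```python
def identify_mouse(tracks, classify, min_distance=0.15, threshold=0.5):
    """End-to-end decision sketch: tracks whose intra-class distance is
    below min_distance are discarded as likely light flashes (S5/S6);
    the remaining tracks are scored picture-by-picture with the
    classifier and the mean score is compared to the threshold (S4).

    `tracks` maps a track id to (intra_class_distance, pictures);
    `classify(picture)` returns a mouse probability in [0, 1].
    """
    detections = {}
    for track_id, (distance, pictures) in tracks.items():
        if distance < min_distance:
            continue  # flashing light, not a mouse
        preds = [classify(p) for p in pictures]
        score = sum(preds) / len(preds)
        detections[track_id] = score > threshold
    return detections

# Toy classifier: a "picture" here is just a precomputed probability.
tracks = {
    "flash": (0.05, [0.9, 0.9]),   # high scores but no real movement
    "mouse": (0.30, [0.8, 0.7]),   # moves and looks like a mouse
    "shadow": (0.30, [0.1, 0.2]),  # moves but does not look like one
}
result = identify_mouse(tracks, classify=lambda p: p)
print(result)  # {'mouse': True, 'shadow': False}
```

Note that the "flash" track never reaches the classifier at all, which is the computational saving the intra-class distance filter provides.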
As shown in fig. 2, the present application provides an identification apparatus for a moving object, including:
the video acquisition module 10 is used for acquiring videos;
a detection module 20, configured to detect a motion trajectory of a moving object in the acquired video;
an intra-class distance determining module 30, configured to determine, in a detected motion trajectory, a first distance between each moving area and the center position of the plurality of moving areas contained in the trajectory, and to obtain the intra-class distance of the trajectory from the first distances; a moving area is a rectangular area of the motion trajectory in a frame picture contained in the video;
and the picture classification module 40 is configured to identify each picture corresponding to a motion trajectory in the video by using a pre-trained picture classification model for the motion trajectory of which the intra-class distance meets a predetermined condition, and determine whether the video contains a specified object according to an identification result.
The present application also provides a device for identifying a moving object, comprising a processor and a memory, wherein the memory stores a moving object identification program, and the processor is configured to read the program and execute the method of any one of the above.
The present application also provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any of the above.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, and the functional modules/units in the systems and devices disclosed above, may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media, as is known to those skilled in the art.

Claims (10)

1. A method of identifying a moving object, comprising:
collecting a video;
detecting a motion track of a moving object in the acquired video;
determining, in the detected motion track, a first distance between each moving area and the center position of a plurality of moving areas contained in the motion track, and obtaining the intra-class distance of the motion track according to the plurality of first distances; wherein the moving area is a rectangular area which forms the motion track in a frame picture contained in the video;
and for a motion track whose intra-class distance meets a preset condition, identifying each picture corresponding to the motion track in the video by adopting a pre-trained picture classification model, and determining whether the video contains a specified object according to the identification result.
2. The method of claim 1, wherein said deriving the intra-class distance of the motion trajectory from a plurality of first distances comprises:
obtaining the intra-class distance of the motion track by averaging the plurality of first distances; wherein the center position of the plurality of moving areas contained in the motion track is the average value of the positions of the plurality of moving areas contained in the motion track.
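A minimal sketch of the intra-class distance computation described in claims 1-2. The function name and the representation of each moving area by the (x, y) center of its bounding rectangle are illustrative assumptions, not terms taken from the patent:

```python
def intra_class_distance(areas):
    """areas: list of (x, y) centers of the moving areas on one motion track."""
    n = len(areas)
    # Center position of the track = average of the area positions (claim 2).
    cx = sum(x for x, _ in areas) / n
    cy = sum(y for _, y in areas) / n
    # First distance = Euclidean distance from each moving area to that center.
    dists = [((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in areas]
    # Intra-class distance = average of the first distances (claim 2).
    return sum(dists) / n
```

A near-stationary object yields a small intra-class distance (all areas cluster around the center), which is what the preset-condition filter in claim 4 exploits.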
3. The method of claim 1, wherein the picture classification model is trained using sample data labeled according to whether the moving area contains the specified object.
4. The method of claim 1, further comprising determining whether the intra-class distance of the motion track meets the preset condition;
the step of judging whether the intra-class distance of the motion trail meets a preset condition comprises the following steps:
comparing the intra-class distance of the motion track with a preset minimum intra-class distance;
when the intra-class distance of the motion track is smaller than the preset minimum intra-class distance, determining that the intra-class distance of the motion track does not meet the preset condition;
and when the intra-class distance of the motion track is greater than or equal to the preset minimum intra-class distance, determining that the intra-class distance of the motion track meets the preset condition.
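The predetermined-condition check in claim 4 reduces to a single comparison against the preset minimum. This sketch assumes both quantities are plain numbers; the names are illustrative, not from the patent:

```python
def meets_condition(intra_class_distance, min_intra_class_distance):
    # A track qualifies for picture classification only when its
    # intra-class distance is at least the preset minimum (claim 4);
    # smaller values indicate a near-stationary region and are filtered out.
    return intra_class_distance >= min_intra_class_distance
```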
5. The method of claim 1, wherein, for the motion track with the intra-class distance meeting the preset condition, identifying each picture corresponding to the motion track in the video by adopting the pre-trained picture classification model, and determining whether the video contains the specified object according to the identification result, comprises:
predicting whether each picture corresponding to the motion track contains the specified moving object, to obtain, for each picture, a prediction result value that the picture contains the specified moving object;
obtaining, in a predetermined manner, a prediction result value that the motion track contains the specified moving object from the prediction result values obtained for the pictures;
and determining whether the video contains the specified object according to the prediction result value that the motion track contains the specified moving object.
6. The method of claim 5, wherein obtaining, in the predetermined manner, the prediction result value that the motion track contains the specified moving object comprises:
averaging the prediction result values of all the frame pictures corresponding to the motion track;
or averaging the prediction result values of a specified number of the highest-ranked frame pictures among all the frame pictures corresponding to the motion track.
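A sketch of the two aggregation options in claim 6, assuming the per-frame prediction result values are plain numbers (e.g. probabilities in [0, 1]); the function name and `top_k` parameter are hypothetical labels for "the specified number of highest-ranked frame pictures":

```python
def trajectory_score(frame_scores, top_k=None):
    """Aggregate per-frame prediction result values into one value for
    the motion track (claim 6).

    frame_scores: prediction result values that each frame picture
        contains the specified moving object.
    top_k: if given, average only the top_k highest-ranked values;
        otherwise average all of them.
    """
    if top_k:
        scores = sorted(frame_scores, reverse=True)[:top_k]
    else:
        scores = list(frame_scores)
    return sum(scores) / len(scores)
```

The top-k variant makes the track-level decision robust to frames where the object is occluded or blurred, since low-scoring frames are simply excluded from the average.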
7. The method of claim 1, wherein: the method further comprises the following steps: and when the video is determined to contain the specified object, drawing a movement track graph of the specified object based on the movement track.
8. An apparatus for identifying a moving object, the apparatus comprising:
the video acquisition module is used for acquiring videos;
the detection module is used for detecting the motion trail of a moving object in the acquired video;
the intra-class distance determining module is used for determining, in the detected motion track, a first distance between each moving area and the center position of the plurality of moving areas contained in the motion track, and obtaining the intra-class distance of the motion track according to the first distances; the moving area is a rectangular area which forms the motion track in a frame picture contained in the video;
and the picture classification module is used for identifying each picture corresponding to the motion track in the video by adopting a pre-trained picture classification model for the motion track with the intra-class distance meeting the preset condition, and determining whether the video contains the specified object according to the identification result.
9. An apparatus for identifying a moving object, comprising a processor and a memory, wherein the memory stores a program for identifying a moving object, and the processor is configured to read the program and execute the method of any of claims 1-7.
10. A computer storage medium on which a computer program is stored, which computer program, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010241664.2A 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium Active CN111582025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010241664.2A CN111582025B (en) 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010241664.2A CN111582025B (en) 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium

Publications (2)

Publication Number Publication Date
CN111582025A true CN111582025A (en) 2020-08-25
CN111582025B CN111582025B (en) 2023-11-24

Family

ID=72117049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010241664.2A Active CN111582025B (en) 2020-03-31 2020-03-31 Method and device for identifying moving object and storage medium

Country Status (1)

Country Link
CN (1) CN111582025B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10206119A (en) * 1997-01-23 1998-08-07 Naldec Kk Objet recognizing device
US20030158799A1 (en) * 2000-09-29 2003-08-21 Masaki Kakihara Position recognizind device and position recognizing method, and accounting device and accounting method
CN101983389A (en) * 2008-10-27 2011-03-02 松下电器产业株式会社 Moving body detection method and moving body detection device
CN108876822A (en) * 2018-07-09 2018-11-23 山东大学 A kind of behavior risk assessment method and household safety-protection nursing system
CN110569770A (en) * 2019-08-30 2019-12-13 重庆博拉智略科技有限公司 Human body intrusion behavior recognition method and device, storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10206119A (en) * 1997-01-23 1998-08-07 Naldec Kk Objet recognizing device
US20030158799A1 (en) * 2000-09-29 2003-08-21 Masaki Kakihara Position recognizind device and position recognizing method, and accounting device and accounting method
CN101983389A (en) * 2008-10-27 2011-03-02 松下电器产业株式会社 Moving body detection method and moving body detection device
CN108876822A (en) * 2018-07-09 2018-11-23 山东大学 A kind of behavior risk assessment method and household safety-protection nursing system
CN110569770A (en) * 2019-08-30 2019-12-13 重庆博拉智略科技有限公司 Human body intrusion behavior recognition method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SUN Zhimin: "Fast Recognition and Tracking of Moving Targets in Complex Backgrounds" *

Also Published As

Publication number Publication date
CN111582025B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
US10782688B2 (en) Method, control apparatus, and system for tracking and shooting target
CN105745687B (en) Context aware Moving target detection
JP6018674B2 (en) System and method for subject re-identification
US9299162B2 (en) Multi-mode video event indexing
US20200380269A1 (en) Camera blockage detection for autonomous driving systems
US20160171311A1 (en) Computer Vision Pipeline and Methods for Detection of Specified Moving Objects
US9922423B2 (en) Image angle variation detection device, image angle variation detection method and image angle variation detection program
KR101697161B1 (en) Device and method for tracking pedestrian in thermal image using an online random fern learning
JP2004534315A (en) Method and system for monitoring moving objects
WO2011067790A2 (en) Cost-effective system and method for detecting, classifying and tracking the pedestrian using near infrared camera
US10692225B2 (en) System and method for detecting moving object in an image
AU2015203666A1 (en) Methods and systems for controlling a camera to perform a task
CN111428626B (en) Method and device for identifying moving object and storage medium
KR20140141239A (en) Real Time Object Tracking Method and System using the Mean-shift Algorithm
KR102584708B1 (en) System and Method for Crowd Risk Management by Supporting Under and Over Crowded Environments
CN111582025B (en) Method and device for identifying moving object and storage medium
KR101595334B1 (en) Method and apparatus for movement trajectory tracking of moving object on animal farm
CN111160156B (en) Method and device for identifying moving object
US20230360355A1 (en) Hybrid video analytics for small and specialized object detection
CN115457447B (en) Moving object identification method, device and system, electronic equipment and storage medium
CN113674315A (en) Object detection method, device and computer readable storage medium
CN117710401A (en) Target tracking detection method and device based on data association
Cao et al. An automatic vehicle detection method based on traffic videos
JP6020188B2 (en) Object detection apparatus and program
CN113392678A (en) Pedestrian detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant