CN115457447B - Moving object identification method, device and system, electronic equipment and storage medium


Info

Publication number: CN115457447B
Application number: CN202211387815.0A
Authority: CN (China)
Prior art keywords: detection area, video frame, target, detection, confidence coefficient
Legal status: Active (granted)
Other versions: CN115457447A (Chinese, zh)
Inventors: 陆韶琦, 冯雪涛
Current assignee: Zhejiang Shenxiang Intelligent Technology Co., Ltd.
Original assignee: Zhejiang Lianhe Technology Co., Ltd.
Application filed by Zhejiang Lianhe Technology Co., Ltd.; priority to CN202211387815.0A
Published as CN115457447A; application granted and published as CN115457447B

Classifications

    • G06V 20/42: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items, of sport video content
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a moving object identification method, device and system, an electronic device and a storage medium, comprising: detecting the object to be recognized contained in a selected target video frame by using a detection model for the object to be recognized, to obtain a first detection area containing the object to be recognized and a corresponding confidence coefficient; comparing an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, to obtain a second detection area with a moving foreground; performing image fusion according to the images in which the first detection area and the second detection area are located; calculating, one by one, the ratio of the second detection area within the first detection area in the fused video frame; determining a target confidence coefficient threshold t of the first detection area according to the ratio r; and if the confidence coefficient of the first detection area is greater than t, judging that the first detection area contains the object to be identified. This solves the problem that the prior art cannot sensitively and accurately discover objects to be identified in a monitored area.

Description

Moving object identification method, device and system, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computer image identification, in particular to a method, a device and a system for identifying a moving object, electronic equipment and a computer readable storage medium.
Background
With the gradual improvement of people's living standard, sanitary safety has become more and more important. For the catering industry, guaranteeing food sanitation and safety is the lifeline on which an enterprise stands. Video monitoring equipment is therefore generally installed in kitchens and stores in the catering industry, and is used to supervise the dressing and operating practices of workers, to discover abnormal intrusion of pests such as mice, and to store on-site records. Manually checking the large amount of video data produced every day is time-consuming and labor-intensive, and anomalies are difficult to discover promptly. Manual investigation is especially difficult, and misses are likely, for rare abnormal events such as mice and large insects, which mostly appear at night.
To save manpower and improve detection efficiency, the prior art no longer relies solely on manual investigation: a machine-trained recognition model can identify abnormal objects such as mice and large insects in a video image, but such methods monitor only a single-frame image. The prior art can also find moving objects by comparing changes across consecutive images, combining appearance features with motion features, and thereby identify abnormal objects. In actual use scenes, however, disturbances that change the appearance of the image, such as swaying leaves and flickering light, are easily misidentified as mice, large insects and the like.
Therefore, how to sensitively and accurately find abnormal objects such as intruding rats and large insects in various environments has become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a method and a device for identifying a moving object, electronic equipment and a computer readable storage medium, which are used for solving the problem that an abnormal object cannot be accurately found and identified in various environments in the prior art.
The embodiment of the application provides a method for identifying a moving object, which comprises the following steps:
detecting the object to be recognized contained in the selected target video frame by using a detection model aiming at the object to be recognized to obtain a first detection area containing the object to be recognized and a corresponding confidence coefficient;
comparing an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the auxiliary video frame and the target video frame;
carrying out image fusion on the first detection area and the second detection area according to the images of the first detection area and the second detection area to obtain a fused video frame which marks the first detection area and the second detection area;
calculating the proportion of the second detection area in the first detection area in the fused video frame one by one;
determining a target confidence coefficient threshold t of the first detection area by a preset confidence coefficient threshold determination method according to the proportion r;
and judging whether the confidence coefficient of the first detection area is greater than the target confidence coefficient threshold t, if so, judging that the first detection area contains the object to be identified.
Optionally, the method further includes:
when judging whether the confidence coefficient of the first detection area is greater than the target confidence coefficient threshold t, if the judgment result is negative, providing the first detection area to a first classification model for distinguishing high-difficulty samples;
and the first classification model identifies and judges the first detection area and determines whether the first detection area contains an object to be identified.
Optionally, the method further includes:
finding out a non-associated area which is not associated with the first detection area in the second detection area;
respectively providing the non-associated regions to a pre-trained second classification model for identifying high-difficulty samples;
and the second classification model identifies and judges the non-associated region and determines whether the non-associated region contains an object to be identified.
Optionally, the determining, according to the ratio r, a target confidence threshold t of the first detected region by a predetermined confidence threshold determination method includes:
and within a preset interval of the ratio r, determining the target confidence coefficient threshold t according to a monotonically decreasing function t = f (r) between the ratio r and the target confidence coefficient threshold t.
Optionally, the detecting, by the detection model for the object to be recognized, of the object to be recognized contained in the selected target video frame, to obtain a first detection area containing the object to be recognized and a corresponding confidence coefficient, includes:
the detection model is a data-driven deep learning model, obtained by training on data collected, simulated and synthesized in the monitoring scene.
Optionally, the first classification model and the second classification model may be the same classification model or two separately trained classification models; the classification models are data-driven deep learning models, obtained by training on data collected, simulated and synthesized in the monitoring scene;
the sample data used by the classification model training comprises marked positive samples and marked negative samples; the positive sample is a frame picture containing the object to be identified, and the negative sample is a frame picture not containing the object to be identified and a frame picture containing other moving objects.
Optionally, before the detection model for the object to be recognized is used to detect the object to be recognized included in the selected target video frame and obtain the first detection region including the object to be recognized and the corresponding confidence, the method further includes:
acquiring a video image by using image acquisition equipment, and segmenting the acquired video image according to preset segmentation duration;
and performing frame extraction detection on the acquired video image, and selecting a target video frame.
Optionally, the comparing of an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the two, includes:
the auxiliary video frame separated in time from the target video frame by the predetermined time length is the video frame immediately preceding the target video frame.
Optionally, the method is arranged on an intelligent edge server or a collection device with computing power, and is performed for live monitoring video collected in real time.
Optionally, the object to be identified is one or more of human, mouse, snake, livestock and insect.
The embodiment of the present application further provides a system for monitoring a field moving object, including: the system comprises video acquisition equipment, edge computing equipment and a server;
the video acquisition equipment is arranged at a monitoring site and used for acquiring a site video in real time;
the edge computing device is arranged at or near the monitoring site, and can also be integrated with the video acquisition device; a program of the moving object identification method is arranged in it. The edge computing device obtains the live video collected by the video acquisition device, selects a target video frame from it, and performs the following steps: detecting the object to be recognized contained in the selected target video frame by using a detection model for the object to be recognized, to obtain a first detection area containing the object to be recognized and a corresponding confidence coefficient; comparing an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the two video frames; performing image fusion according to the images in which the first detection area and the second detection area are located, to obtain a fused video frame marking the first detection area and the second detection area; calculating, one by one, the ratio of the second detection area within the first detection area in the fused video frame; determining a target confidence coefficient threshold t of the first detection area by a predetermined confidence coefficient threshold determination method according to the ratio r; judging whether the confidence coefficient of a first detection area is greater than the target confidence coefficient threshold t, and if so, judging that the first detection area contains the object to be identified; and if a first detection area containing the object to be identified is detected, intercepting and sending out a video frame marking the first detection area;
and the server receives the video frame identifying the first detection area and performs confirmation and/or alarm.
The embodiment of the present application further provides a device for identifying a moving object, including:
the first identification unit is used for detecting the object to be identified contained in the selected target video frame by using the detection model aiming at the object to be identified, and obtaining a first detection area containing the object to be identified and a corresponding confidence coefficient;
a second identification unit, configured to compare an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtain a second detection area with a moving foreground according to the image difference between the auxiliary video frame and the target video frame;
the fusion unit is used for carrying out image fusion on the first detection area and the second detection area according to the images of the first detection area and the second detection area to obtain a fusion video frame marking the first detection area and the second detection area;
the calculating unit is used for calculating the occupation ratio of the second detection area in the first detection area in the fused video frame one by one;
the determining unit is used for determining a target confidence coefficient threshold t of the first detection area according to the ratio r and a predetermined confidence coefficient threshold determining method;
and the judging unit is used for judging whether the confidence coefficient of the first detection area is greater than the target confidence coefficient threshold t, and if so, judging that the first detection area contains the object to be identified.
An embodiment of the present application further provides an electronic device, including:
a processor;
a memory;
the memory stores a program for a method of moving object identification, which program, when read and executed by the processor, performs the following operations:
detecting the object to be recognized contained in the selected target video frame by using a detection model aiming at the object to be recognized to obtain a first detection area containing the object to be recognized and a corresponding confidence coefficient;
comparing an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the auxiliary video frame and the target video frame;
carrying out image fusion on the first detection area and the second detection area according to the images of the first detection area and the second detection area to obtain a fused video frame which marks the first detection area and the second detection area;
calculating the occupation ratio of the second detection area in the first detection area in the fused video frame one by one;
determining a target confidence coefficient threshold t of the first detection area by a preset confidence coefficient threshold determination method according to the proportion r;
and judging whether the confidence coefficient of the first detection area is greater than the target confidence coefficient threshold t, if so, judging that the first detection area contains the object to be identified.
An embodiment of the present application further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the following operations are performed:
detecting the object to be recognized contained in the selected target video frame by using a detection model aiming at the object to be recognized to obtain a first detection area containing the object to be recognized and a corresponding confidence coefficient;
comparing an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the auxiliary video frame and the target video frame;
carrying out image fusion on the first detection area and the second detection area according to the images in which they are located, to obtain a fused video frame marking the first detection area and the second detection area;
calculating the occupation ratio of the second detection area in the first detection area in the fused video frame one by one;
determining a target confidence coefficient threshold t of the first detection area by a preset confidence coefficient threshold determination method according to the proportion r;
and judging whether the confidence coefficient of the first detection area is greater than the target confidence coefficient threshold t, if so, judging that the first detection area contains the object to be identified.
Compared with the prior art, the embodiment of the application has the following advantages:
the method for identifying the moving object provided by the embodiment of the application detects the object to be identified contained in the selected target video frame by using the detection model aiming at the object to be identified, and obtains a first detection area containing the object to be identified and a corresponding confidence coefficient; comparing the auxiliary video frame obtained by temporally separating the auxiliary video frame from the target video frame by a preset time length with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the auxiliary video frame and the target video frame; carrying out image fusion on the first detection area and the second detection area according to the images of the first detection area and the second detection area to obtain a fused video frame which marks the first detection area and the second detection area; calculating the occupation ratio of the second detection area in the first detection area in the fused video frame one by one; determining a target confidence coefficient threshold t of the first detection area by a preset confidence coefficient threshold determination method according to the proportion r; and judging whether the confidence coefficient of a first detection area is greater than the target confidence coefficient threshold value t, if so, judging that the first detection area contains an object to be identified. Therefore, the image recognition of the single-frame target video frame and the motion characteristics obtained by the target video frame and the auxiliary video frame are combined, the object to be recognized can be sensitively and accurately found in various environments, and the detection rate of the object to be recognized is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of a moving object identification method according to an embodiment of the present application.
Fig. 2 is a flowchart of a moving object identification method according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a first detection region and a corresponding confidence level according to an embodiment of the present application.
Fig. 4 is a schematic diagram of the relationship between the ratio r and the target confidence coefficient threshold t provided in the embodiment of the present application.
Fig. 5 is a block diagram of units of a device for identifying a moving object according to an embodiment of the present application.
Fig. 6 is a block diagram of units of a system for monitoring a moving object on site according to an embodiment of the present application.
Fig. 7 is a schematic logical structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the application provide a method, system, device, electronic equipment and computer storage medium for identifying a moving object, which make it easier to find the object to be identified sensitively and accurately in various environments and improve the detection rate of the object to be identified.
In order to enable those skilled in the art to better understand the technical solution of the present application, the present application will be clearly and completely described below with reference to the accompanying drawings of the embodiments. The described embodiments are only a part of the embodiments of this application, not all of them; all other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided herein without creative effort fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," "third," and the like in the claims, the description, and the drawings of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. The data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some technical terms related to the present application are explained:
video frame: also known as video Frame rate (Frame rate), is a measure for measuring the number of display frames. The units of measurement are the number of display Frames Per Second (FPS) or "Hertz" (Hz).
Confidence coefficient: also referred to as confidence, or confidence level. In this application it is the likelihood that the object to be identified falls within the first detection area, and is expressed by a confidence score s, with 0 ≤ s ≤ 1.
Detection area: a target area obtained according to an algorithm model. In this application the detection area is rectangular and can be expressed as a quadruple (x, y, w, h), where (x, y) is the coordinate of the upper-left point of the rectangle and w and h are the width and height of the rectangle.
Moving foreground: an area of a digital video frame image in which pixel values change greatly; generally speaking, the target area with the most prominent motion in the video.
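For concreteness, these definitions translate directly into code. The following sketch is illustrative only and is not part of the patent; it fixes a representation for the quadruple (x, y, w, h) and the confidence score s, together with a rectangle-overlap helper used by the later steps. Python and the names DetectionArea, area and intersection_area are choices made here, not terms from the application.

```python
from dataclasses import dataclass

@dataclass
class DetectionArea:
    """Rectangular detection area: upper-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int
    s: float = 0.0  # confidence score, 0 <= s <= 1 (meaningful for first detection areas)

    def area(self) -> int:
        return self.w * self.h

    def intersection_area(self, other: "DetectionArea") -> int:
        # Overlap of two axis-aligned rectangles; 0 when they do not intersect.
        ix = max(0, min(self.x + self.w, other.x + other.w) - max(self.x, other.x))
        iy = max(0, min(self.y + self.h, other.y + other.h) - max(self.y, other.y))
        return ix * iy
```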
To facilitate understanding of the methods provided by the embodiments of the present application, a background of the embodiments of the present application is described before the embodiments of the present application are described.
Food sanitation and safety problems in the catering industry are vital for enterprises. To guarantee standardized practice, video monitoring equipment is generally installed in the kitchen or store and can be used to discover abnormal intrusion of rats and other pests and to store on-site records. However, for the large amount of daily video data, manual troubleshooting is time-consuming and labor-intensive, and problems cannot be discovered and resolved in time. Manual investigation is especially difficult, and misses are likely, for abnormal events such as the intrusion of rats and large insects, which mostly occur at night.
In the prior art, frame extraction detection is performed on a video, and a pre-trained detection model is used to detect whether a video frame image contains an abnormal object. However, this method inspects a single-frame image; in practical use its recognition rate is not high at night, when rats, large insects and the like appear most often, so abnormal objects are easily missed and the food safety of the catering industry cannot be well guaranteed. The prior art can also use a method combining appearance features with motion features, finding moving objects by comparing changes across consecutive images and thereby picking out abnormal objects. With this motion-feature method, however, swaying leaves and flickering light, especially at night, are easily misidentified as mice, large insects and the like, so the recognition rate is not high and false detections readily occur.
In order to solve the problems in the prior art, the application provides a method for identifying a moving object: detecting the object to be identified contained in a selected target video frame by using a detection model for the object to be identified, to obtain a first detection area containing the object to be identified and a corresponding confidence coefficient; comparing an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the two; performing image fusion according to the images in which the first detection area and the second detection area are located, to obtain a fused video frame marking the first detection area and the second detection area; calculating, one by one, the ratio of the second detection area within the first detection area in the fused video frame; determining a target confidence coefficient threshold t of the first detection area by a predetermined confidence coefficient threshold determination method according to the ratio r; and judging whether the confidence coefficient of the first detection area is greater than the target confidence coefficient threshold t, and if so, judging that the first detection area contains the object to be identified. By combining image recognition on the single target video frame with the motion features obtained from the target video frame and the auxiliary video frame, the object to be recognized can be found sensitively and accurately in various environments, and the detection rate of the object to be recognized is improved.
With the background introduction of the above, those skilled in the art can understand the problems existing in the prior art, and the following provides a detailed description of an application scenario of the method for identifying a moving object provided in the present application. The moving object identification method provided by the embodiment of the application can be applied to the technical field of computer image identification or other related technical fields with computer image identification requirements.
First, an application scenario of the method for identifying a moving object according to the embodiment of the present application is described below.
Fig. 1 is a schematic view of an application scenario of a moving object identification method according to a first embodiment of the present application.
As shown in fig. 1, the application scenario includes an image capture device 101, an edge computing device 102, an application background 103, and a mobile terminal 104. The image acquisition device 101, the edge computing device 102, the application background 103 and the mobile terminal 104 are all connected through network communication.
It should be noted that fig. 1 is a schematic view of an application scenario of the moving object identification method provided in the embodiment of the present application; the embodiment does not limit the devices included in fig. 1, nor the numbers of the image capturing device 101, the edge computing device 102, the application background 103 and the mobile terminal 104. For example, in the application scenario shown in fig. 1, the image capturing device 101 may be an image capturing device already installed in a restaurant. The edge computing device 102 may be an intelligent edge server, or may be an image capturing device 101 with its own computing capability; that is, the image capturing device 101 may be an all-in-one device that integrates image capture and image computing functions. The application background 103 may include a task center and a user console. When an abnormal event is detected by the edge computing device 102, a screenshot is reported; each reported picture is stored in the task center of the application background 103 as a record awaiting rectification, and pictures carrying risk are pushed, marked with a conspicuous rectangular area, to the mobile terminal of the relevant responsible person for handling. After the user confirms in the user console of the application background 103 whether the picture shows a real risk event, rectification or other follow-up actions can be carried out. In some embodiments of the present application, the application background 103 may also be omitted, and the relevant person may be prompted by sound, light or similar cues. The mobile terminal 104 may be a smart phone, smart band, tablet computer, wearable device, multimedia player, e-reader, or other device with communication functions.
In the embodiment of the present application, the number of the image capturing device 101, the edge computing device 102, the application background 103, and the mobile terminal 104 in fig. 1 may be changed. The specific implementation process of the application scenario can be referred to the following scheme description of each embodiment.
The method, apparatus, electronic device, and computer-readable storage medium described herein are further described in detail with reference to the following embodiments and the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the application and do not limit the application.
A method for identifying a moving object according to an embodiment of the present application is described in detail below with reference to fig. 2. Fig. 2 is a schematic flowchart of a moving object identification method according to an embodiment of the present application. It should be noted that the steps shown in the flowchart may be performed in a computer system such as a set of computer-executable instructions, and in some cases, the steps shown may be performed in a different logical order than that shown in the flowchart.
The method for identifying the moving object is arranged on an intelligent edge server or a collection device with computing power, and is executed aiming at a real-time collected field monitoring video.
As shown in fig. 2, a method for identifying a moving object provided in an embodiment of the present application includes the following steps:
step S201, detecting the object to be recognized contained in the selected target video frame by using a detection model aiming at the object to be recognized, and obtaining a first detection area containing the object to be recognized and a corresponding confidence coefficient;
the method comprises the following steps of using a detection model of the object to be recognized to detect the object to be recognized contained in the selected target video frame, and obtaining a first detection area containing the object to be recognized and a corresponding confidence coefficient.
Before this step is performed, the target video frame needs to be acquired first, and the method for acquiring the target video frame may be: acquiring a video image by using image acquisition equipment, and segmenting the acquired video image according to preset segmentation duration; and performing frame extraction detection on the acquired video image, and selecting a target video frame.
The image capturing device utilized in this step may be one already installed in the restaurant or store.
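As a concrete illustration of the frame acquisition just described, the sketch below samples target video frames from a stream with OpenCV. It is a minimal sketch, not the patent's prescribed implementation; the source form and the sampling interval every_n_frames are illustrative assumptions.

```python
import cv2

def sample_target_frames(source: str, every_n_frames: int = 25):
    """Yield (frame_index, frame) pairs sampled from a video file or stream.

    `source` may be a file path or an RTSP URL; `every_n_frames` is an
    illustrative interval (roughly one frame per second at 25 FPS).
    """
    cap = cv2.VideoCapture(source)
    idx = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % every_n_frames == 0:
                yield idx, frame
            idx += 1
    finally:
        cap.release()
```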
The detection model for the object to be recognized is a data-driven deep learning model, obtained by training on data collected, simulated and synthesized in the monitoring scene.
The object to be identified can be one or more of human, mouse, snake, livestock and insect. When the preset object to be recognized is a mouse, the positive samples for model training are frame pictures containing a mouse; the negative samples are frame pictures not containing a mouse or containing other moving objects such as insects and snakes. The more accurately the detection model is trained, the more accurate the result of detecting the object to be recognized in the selected target video frame: the obtained first detection area containing the object to be recognized is more precise and the corresponding confidence coefficient is higher.
Fig. 3 is a schematic diagram of first detection areas containing the object to be recognized, and the corresponding confidence levels, in a target video frame. As can be seen from fig. 3, the detection model detects three first detection areas in the target video frame, and a confidence score is marked above each detection area. After the frame-sampled image is identified, each first detection area and its corresponding confidence coefficient can be expressed as (x, y, w, h, s), where (x, y) denotes the coordinates of the upper-left point of the first detection area rectangle, w the width of that rectangle, and h its height; through these four values the exact position of the detected rectangular area on the image can be located. The confidence score s indicates the likelihood that the first detection area contains the object to be identified. The confidence scores of the first detection areas shown in fig. 3 are 0.3, 0.8 and 0.1 in this order.
This step detects the object to be recognized contained in the selected target video frame using the detection model for the object to be recognized, obtaining a first detection area containing the object to be recognized and a corresponding confidence coefficient. Detecting the object to be identified on the video frame image provides a basis for subsequently combining the detection result with motion features to obtain a more accurate detection result.
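A sketch of this step, reusing the DetectionArea type from the term definitions above. The detector interface here is hypothetical: the patent requires only a trained, data-driven deep learning model, so any model returning (x, y, w, h, s) tuples can be substituted.

```python
def detect_first_areas(model, frame) -> list[DetectionArea]:
    """Step S201: run the detection model on one target video frame.

    `model` is a hypothetical callable returning (x, y, w, h, s) tuples,
    standing in for the trained detection model of the application.
    """
    return [DetectionArea(x, y, w, h, s) for (x, y, w, h, s) in model(frame)]
```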
Step S202, comparing an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtaining a second detection area with a moving foreground according to the image difference between the two;
This step compares the auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtains a second detection area with a moving foreground according to the difference between the two images.
Because the objects to be identified all move quickly, the time length between the selected auxiliary video frame and the target video frame should be as small as possible in order to improve the accuracy of the obtained second detection area with a moving foreground. Preferably, the auxiliary video frame is the video frame immediately preceding the target video frame.
And performing motion foreground extraction on the images of the target video frame and the auxiliary video frame through the image difference of the two frames of videos with smaller time interval to obtain a second detection area with motion foreground. The second detection region is mainly a region having a large movement range and a large change in image pixel value.
The second detection area with a moving foreground obtained from the image difference between the target video frame and the auxiliary video frame can be expressed as (x, y, w, h), identically to the first detection area: (x, y) denotes the coordinates of the upper-left point of the second detection area rectangle, w the width of that rectangle, and h its height, through which the exact position of the second detected rectangular area on the image can be located.
In the extraction of the moving foreground, a swaying leaf, a flickering light, or even a moonlight reflection in the video frame image may be detected as a second detection area.
This step compares the auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtains a second detection area with a moving foreground from the difference between their images. Identifying the areas that carry motion features provides a basis for subsequently combining the areas found by target detection with the areas carrying motion features, to obtain a more accurate recognition result.
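A minimal sketch of this step using simple frame differencing with OpenCV. The patent requires only some moving-foreground extraction from the image difference, not this particular one; the threshold diff_thresh and the minimum blob size min_area are illustrative assumptions that would be tuned per scene.

```python
import cv2

def detect_second_areas(target_frame, aux_frame,
                        diff_thresh: int = 25, min_area: int = 100) -> list[DetectionArea]:
    """Step S202: extract moving-foreground regions by frame differencing."""
    g1 = cv2.cvtColor(target_frame, cv2.COLOR_BGR2GRAY)
    g0 = cv2.cvtColor(aux_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g1, g0)  # per-pixel change between the two frames
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area:  # drop tiny noise blobs
            boxes.append(DetectionArea(x, y, w, h))
    return boxes
```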
Step S203, carrying out image fusion on the first detected area and the second detected area according to the images of the first detected area and the second detected area to obtain a fused video frame marking the first detected area and the second detected area;
the step is used for carrying out image fusion on the first detected area and the second detected area according to the images of the first detected area and the second detected area to obtain a fused video frame marking the first detected area and the second detected area.
Image fusion is performed according to the frame images in which the first detection area obtained in step S201 and the second detection area obtained in step S202 are located; the resulting fused video frame contains both the first detection area and the second detection area. This step provides a basis for the subsequent calculation of the correlation between the detection areas in the fused video frame.
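A sketch of the fusion step: both sets of areas are marked on a single copy of the target frame. The rendering choices (green for first detection areas with their confidence scores, red for moving-foreground areas) are illustrative, not specified by the application.

```python
import cv2

def fuse_and_mark(frame, first_areas, second_areas):
    """Step S203: return a fused frame with both region sets marked."""
    fused = frame.copy()
    for a in first_areas:
        cv2.rectangle(fused, (a.x, a.y), (a.x + a.w, a.y + a.h), (0, 255, 0), 2)
        cv2.putText(fused, f"{a.s:.2f}", (a.x, a.y - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    for b in second_areas:
        cv2.rectangle(fused, (b.x, b.y), (b.x + b.w, b.y + b.h), (0, 0, 255), 2)
    return fused
```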
Step S204, calculating the proportion of the second detection area in the first detection area in the fused video frames one by one;
the step is used for calculating the occupation ratio of the second detection area in the first detection area in the fusion video frame one by one.
With each first detection area expressed as (x, y, w, h, s) and each second detection area expressed as (x, y, w, h), the ratio r of each second detection area within each first detection area is calculated one by one, that is, the fraction of the second detection area that falls inside the first detection area. If a frame contains three first detection areas and two second detection areas, the ratio of each second detection area within each first detection area must be calculated for the target video frame, i.e. 3 × 2 = 6 ratio values in this case.
This step calculates the ratio of each second detection area within each first detection area for each target video frame. It can be understood that when a first detection area and a second detection area do not overlap, the ratio of that second detection area within that first detection area is 0; when a second detection area is completely contained in a first detection area, its ratio within that first detection area is 1.
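A sketch of the ratio calculation, consistent with the boundary cases just stated (0 for no overlap, 1 for complete containment). The exact formula image in the original is not reproduced here; the fraction-of-second-area form below is inferred from those boundary cases.

```python
def occupation_ratio(first: DetectionArea, second: DetectionArea) -> float:
    """Step S204: fraction of the second (moving-foreground) area inside the first."""
    if second.area() == 0:
        return 0.0
    return second.intersection_area(first) / second.area()

def all_ratios(first_areas, second_areas):
    # Three first areas and two second areas yield 3 x 2 = 6 ratio values.
    return [[occupation_ratio(a, b) for b in second_areas] for a in first_areas]
```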
Step S205, according to the ratio r, determining a target confidence threshold t of the first detection area by a preset confidence threshold determination method;
this step is used for determining the target confidence threshold t of the first detection region according to the ratio obtained in step S204 and a predetermined confidence threshold determination method according to the ratio r.
The method for determining the preset confidence threshold value is that when the ratio r is in a preset interval, the target confidence threshold value t is determined according to a monotonically decreasing function t = f (r) between the ratio r and the target confidence threshold value t. Namely, within a preset interval of the ratio r, the ratio r and the target confidence threshold t are monotonically decreased functions. The larger the ratio r, the smaller the target confidence threshold t.
As shown in fig. 4, the relationship between the ratio r and the target confidence threshold t is shown schematically.
This step does not prescribe a specific functional relationship between the ratio r and the target confidence coefficient threshold t; t = f (r) need only be monotonically decreasing on the predetermined interval, and the decreasing function may be linear or nonlinear. Fig. 4 (a) is a schematic diagram of a linear relationship between the ratio r and the target confidence coefficient threshold t, and fig. 4 (b) of a nonlinear one. Fig. 4 is a simple illustration and is not intended to limit the relationship between r and t.
This step determines the target confidence coefficient threshold t of the first detection area by a predetermined confidence coefficient threshold determination method, from the ratio r of the second detection area within the first detection area. Deriving different confidence coefficient thresholds t from the ratio r makes the method better suited to different environments.
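A sketch of one admissible f(r): a linear, monotonically decreasing map as in fig. 4 (a). The endpoints t_max and t_min are illustrative assumptions; any monotonically decreasing form, including the nonlinear one of fig. 4 (b), satisfies the method.

```python
def confidence_threshold(r: float, t_max: float = 0.8, t_min: float = 0.3) -> float:
    """Step S205: monotonically decreasing t = f(r) on the interval r in [0, 1]."""
    r = min(max(r, 0.0), 1.0)           # clamp the ratio to the predetermined interval
    return t_max - (t_max - t_min) * r  # more motion evidence -> lower threshold
```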
Step S206, determining whether the confidence of the first detected region is greater than the target confidence threshold t, and if so, determining that the first detected region includes the object to be recognized.
This step compares the confidence score s of the first detection area with the target confidence coefficient threshold t determined in the previous step; if s is greater than t, the area is judged to be a high-confidence target area containing the object to be recognized, and the recognition result can be output.
An embodiment of the present application provides a method for identifying a moving object, in which a detection model for the object to be identified detects the object to be identified contained in a selected target video frame, obtaining a first detection area containing the object to be identified and a corresponding confidence coefficient; an auxiliary video frame, separated in time from the target video frame by a predetermined time length, is compared with the target video frame, and a second detection area with a moving foreground is obtained from the image difference between the two; image fusion is performed according to the images in which the first detection area and the second detection area are located, yielding a fused video frame marking both; the ratio of the second detection area within the first detection area in the fused video frame is calculated one by one; a target confidence coefficient threshold t of the first detection area is determined by a predetermined confidence coefficient threshold determination method according to the ratio r; and it is judged whether the confidence coefficient of a first detection area is greater than the target confidence coefficient threshold t, and if so, the first detection area is judged to contain an object to be identified. By combining image recognition on the single target video frame with the motion features obtained from the target video frame and the auxiliary video frame, the object to be recognized can be found sensitively and accurately in various environments, and the detection rate of the object to be recognized is improved.
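Pulling the step sketches together, one pass over a single target frame might look as follows. Note one interpretive assumption flagged in the comments: where several second detection areas overlap one first detection area, the largest ratio is taken to set that area's threshold; the text does not pin down how multiple ratios per first area are combined.

```python
def identify_in_frame(model, target_frame, aux_frame):
    """One pass of steps S201-S206 over a single target frame (sketch)."""
    first_areas = detect_first_areas(model, target_frame)           # S201
    second_areas = detect_second_areas(target_frame, aux_frame)     # S202
    fused = fuse_and_mark(target_frame, first_areas, second_areas)  # S203
    hits = []
    for a in first_areas:
        ratios = [occupation_ratio(a, b) for b in second_areas]     # S204
        r = max(ratios, default=0.0)  # assumption: strongest motion evidence wins
        t = confidence_threshold(r)                                 # S205
        if a.s > t:                                                 # S206
            hits.append(a)
    return hits, fused
```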
In addition to the above steps, an embodiment of the present application further provides that, in the determining whether the confidence level in the first detection region is greater than the target confidence level threshold t, if the determination result is negative, the first detection region is provided to a first classification model for distinguishing high-difficulty samples;
and the first classification model identifies and judges the first detection area and determines whether the first detection area contains an object to be identified.
If the confidence level in the first detection area is smaller than or equal to the target confidence level threshold t, the relevant area is not directly abandoned, but the first detection area is used as a low confidence level target and provided to a trained first classification model specially used for distinguishing high-difficulty samples. And carrying out re-recognition judgment on the low-confidence target through the first classification model, and determining whether the low-confidence target contains an object to be recognized.
This step further processes the low-confidence targets among the first detection areas: the trained first classification model identifies and judges each low-confidence target and determines whether it contains the object to be identified; if so, the areas containing the object to be identified are found among the low-confidence targets, preventing missed detection.
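A sketch of this re-examination: the low-confidence area is cropped from the frame and handed to the hard-sample classifier. The classifier interface here is hypothetical; the application only requires a trained classification model for distinguishing high-difficulty samples.

```python
def reexamine_low_confidence(classifier, frame, area: DetectionArea) -> bool:
    """Re-judge a first detection area whose confidence did not exceed t.

    `classifier` is a hypothetical binary callable returning True when
    the cropped patch contains the object to be identified.
    """
    crop = frame[area.y:area.y + area.h, area.x:area.x + area.w]
    return bool(classifier(crop))
```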
In addition to the above steps, an embodiment of the present application further provides that, in the second detected region, a non-associated region that is not associated with the first detected region is found;
respectively providing the non-associated regions to a pre-trained second classification model for identifying high-difficulty samples;
and the second classification model identifies and judges the non-associated region and determines whether the non-associated region contains an object to be identified.
This step processes the areas of the second detection areas that are not associated with any first detection area. Specifically, the non-associated areas are identified and each is provided to a pre-trained second classification model for identifying high-difficulty samples; the second classification model then identifies and judges each non-associated area and determines whether it contains the object to be identified.
Subjecting the second detection areas that are not associated with any first detection area to the second classification model prevents missed detection.
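A sketch of selecting the non-associated areas; "not associated" is read here, as an assumption, as zero rectangle overlap with every first detection area. Each selected area would then be cropped and re-judged exactly as in the previous sketch, with the second classification model in place of the first.

```python
def non_associated_regions(first_areas, second_areas):
    """Second detection areas that overlap no first detection area at all."""
    return [b for b in second_areas
            if all(b.intersection_area(a) == 0 for a in first_areas)]
```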
In the above step, the first classification model and the second classification model may be the same classification model or two classification models trained respectively, the classification models are data-driven deep learning models, and the classification models are obtained by training data collected, simulated and synthesized in a monitoring scene;
the sample data used for training the classification model comprises marked positive samples and marked negative samples; the positive sample is a frame picture containing the object to be identified, and the negative sample is a frame picture not containing the object to be identified and a frame picture containing other moving objects.
In the two steps, a first detection area with the confidence coefficient smaller than or equal to a target confidence coefficient threshold value t and a second detection area which is not associated with the first detection area are further processed, and two low confidence coefficient targets are respectively identified through a specially trained first classification model and a specially trained second classification model.
The third embodiment of the present application also provides a device for identifying a moving object, which is basically similar to the method embodiment and therefore is described more simply, and the details of the related technical features can be found in the corresponding description of the method embodiment provided above, and the following description of the device embodiment is only illustrative. As shown in fig. 5, a block diagram of the moving object recognition apparatus provided in this embodiment includes:
a first identification unit 501, configured to detect, using a detection model for an object to be identified, the object to be identified included in the selected target video frame, and obtain a first detection area including the object to be identified and a corresponding confidence level;
a second identifying unit 502, configured to compare an auxiliary video frame, separated in time from the target video frame by a predetermined time length, with the target video frame, and obtain a second detection area with a moving foreground according to the image difference between the auxiliary video frame and the target video frame;
a fusion unit 503, configured to perform image fusion on the first detected region and the second detected region according to the image where the first detected region and the second detected region are located, so as to obtain a fused video frame that identifies the first detected region and the second detected region;
a calculating unit 504, configured to calculate, one by one, a ratio of the second detected region to the first detected region in the fused video frame;
a determining unit 505, configured to determine a target confidence threshold t of the first detection region according to the ratio r and a predetermined confidence threshold determining method;
a judging unit 506, configured to judge whether the confidence coefficient of a first detection area is greater than the target confidence coefficient threshold t, and if so, judge that the first detection area contains an object to be identified.
Optionally, the determining unit is further configured to:
when judging whether the confidence coefficient of the first detection area is greater than the target confidence coefficient threshold t, if the judgment result is negative, providing the first detection area to a first classification model for distinguishing high-difficulty samples;
and the first classification model identifies and judges the first detection area and determines whether the first detection area contains an object to be identified.
Optionally, the judging unit is further configured to:
find, among the second detection areas, the non-associated areas that are not associated with any first detection area;
provide each non-associated area to a pre-trained second classification model for identifying high-difficulty samples;
the second classification model then identifies and judges the non-associated area and determines whether it contains an object to be identified. One possible association test is sketched below.
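The patent does not define when a second detection area counts as associated with a first detection area. A minimal sketch, assuming association means box overlap above a small IoU threshold (the value 0.05 is an assumption):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def non_associated_regions(second_boxes, first_boxes, iou_threshold=0.05):
    """Second detection areas overlapping no first detection area; each of
    these is handed to the pre-trained second classification model."""
    return [s for s in second_boxes
            if all(iou(s, f) < iou_threshold for f in first_boxes)]
```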
Optionally, the determining unit is further configured to:
within a preset interval of the ratio r, determine the target confidence threshold t according to a monotonically decreasing function t = f(r) of the ratio r; one possible instantiation is sketched below.
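A minimal sketch of one possible monotonically decreasing t = f(r): linear interpolation over a preset interval of r. The interval endpoints and the threshold values t_high and t_low are illustrative assumptions, not values given in the patent.

```python
def confidence_threshold(r, r_lo=0.1, r_hi=0.9, t_high=0.8, t_low=0.3):
    """The larger the moving-foreground ratio r of a first detection area,
    the lower the confidence threshold t required to accept it."""
    if r <= r_lo:
        return t_high
    if r >= r_hi:
        return t_low
    # linearly decreasing inside the preset interval [r_lo, r_hi]
    return t_high - (t_high - t_low) * (r - r_lo) / (r_hi - r_lo)
```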
Optionally, in the first identification unit:
the detection model is a data-driven deep learning model, trained with data collected in the monitoring scene together with simulated and synthesized data.
Optionally, the first classification model and the second classification model may be the same classification model or two separately trained classification models; the classification models are data-driven deep learning models, trained with data collected in the monitoring scene together with simulated and synthesized data;
the sample data used for training the classification models comprises labelled positive samples and labelled negative samples: a positive sample is a frame picture containing the object to be identified, and a negative sample is a frame picture that does not contain the object to be identified or a frame picture that contains another moving object.
Optionally, the first identification unit is further configured to:
acquire a video image with an image acquisition device, and segment the acquired video image according to a preset segment duration;
perform frame-extraction detection on the acquired video image and select a target video frame. One possible sampling scheme is sketched below.
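A minimal sketch of the segmentation and frame-extraction step, assuming OpenCV for capture; the segment duration of 60 s and the sampling rate of one frame per second are illustrative assumptions that the patent leaves to configuration.

```python
import cv2

def sample_target_frames(video_path, segment_seconds=60, frames_per_second=1):
    """Split the captured video into fixed-length segments and yield
    (segment_index, frame) pairs as candidate target video frames."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if fps unknown
    stride = max(1, int(round(fps / frames_per_second)))
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % stride == 0:  # frame extraction at the sampling stride
            yield int(index / (fps * segment_seconds)), frame
        index += 1
    capture.release()
```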
Optionally, in the second identification unit:
the auxiliary video frame temporally separated from the target video frame by a predetermined time length is the video frame immediately preceding the target video frame.
Optionally, the device is deployed on an intelligent edge server or on an acquisition device with computing power, and operates on live monitoring video captured in real time.
Optionally, the object to be identified in the device is one or more of a human, a mouse, a snake, livestock, and an insect.
The fourth embodiment of the present application further provides a system for monitoring a moving object on site. As shown in fig. 6, the system provided in this embodiment includes a video acquisition device 601, an edge computing device 602, and a server 603;
the video acquisition device 601 is arranged at the monitoring site and captures live video in real time;
the edge computing device 602 is arranged at or near the monitoring site and may also be integrated with the video acquisition device; a program implementing the moving object identification method is deployed on it. The edge computing device obtains the live video captured by the video acquisition device, selects a target video frame from it, and performs the following steps: detecting the object to be identified contained in the selected target video frame using a detection model for the object to be identified, and obtaining a first detection area containing the object to be identified and a corresponding confidence; comparing the target video frame with an auxiliary video frame temporally separated from it by a preset time length, and obtaining a second detection area having a moving foreground according to the image difference between the two frames; performing image fusion on the first detection area and the second detection area according to the images in which they are located, and obtaining a fused video frame marking the first detection area and the second detection area; calculating, one by one, the ratio r of the second detection area within each first detection area in the fused video frame; determining a target confidence threshold t for the first detection area from the ratio r by a predetermined confidence threshold determination method; judging whether the confidence of a first detection area is greater than the target confidence threshold t and, if so, determining that the first detection area contains an object to be identified. If a first detection area containing the object to be identified is detected, the edge computing device captures the video frame marking that first detection area and sends it out; a sketch of this edge-side loop is given below;
the server 603 receives the video frame marking the first detection area and performs confirmation and/or alarm.
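Under stated assumptions, the edge-side loop could look like the following sketch. The HTTP endpoint, the JPEG transport, and the `recognise_fn` callable (standing in for the recognition steps sketched after the device units above) are all assumptions; the patent does not prescribe a transport protocol between the edge computing device and the server.

```python
import cv2
import requests

SERVER_URL = "http://server.example/alert"  # hypothetical endpoint

def edge_loop(stream_url, recognise_fn):
    """Capture live video, run recognition on consecutive frame pairs, and
    send any frame marking a first detection area to the server."""
    capture = cv2.VideoCapture(stream_url)
    previous = None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if previous is not None:
            # e.g. recognise_fn = lambda cur, prev: recognise(cur, prev,
            #                                                 run_detector, f)
            boxes = recognise_fn(frame, previous)
            if boxes:
                marked = frame.copy()
                for x, y, w, h in boxes:  # mark the first detection areas
                    cv2.rectangle(marked, (x, y), (x + w, y + h),
                                  (0, 0, 255), 2)
                ok_jpg, jpeg = cv2.imencode(".jpg", marked)
                if ok_jpg:  # the server confirms and/or raises an alarm
                    requests.post(SERVER_URL, data=jpeg.tobytes(),
                                  headers={"Content-Type": "image/jpeg"})
        previous = frame
    capture.release()
```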
The deployed program corresponds to the moving object identification method provided in the second embodiment of the present application, so the description of the system for monitoring a moving object on site is relatively brief; for details of the relevant technical features, refer to the corresponding description of the method embodiment provided above. The description of the system is merely illustrative.
In addition, an embodiment of the present application further provides an electronic device. Since this embodiment is substantially similar to the method embodiment, it is described relatively briefly; for details of the relevant technical features, refer to the corresponding description of the method embodiment provided above. The following description of the electronic device embodiment is merely illustrative. For understanding of this embodiment, refer to fig. 7, a schematic view of the electronic device provided in this embodiment.
As shown in fig. 7, the electronic device provided in this embodiment includes: a processor 701, a memory 702, a communication bus 703, and a communication interface 704.
The processor 701 is configured to execute one or more computer instructions to implement the steps of the above method embodiment.
The memory 702 is used for storing a program of the moving object identification method; when read and executed by the processor, the program performs the following operations:
detecting the object to be identified contained in the selected target video frame using a detection model for the object to be identified, and obtaining a first detection area containing the object to be identified and a corresponding confidence;
comparing the target video frame with an auxiliary video frame temporally separated from it by a preset time length, and obtaining a second detection area having a moving foreground according to the image difference between the two frames;
performing image fusion on the first detection area and the second detection area according to the images in which they are located, and obtaining a fused video frame marking the first detection area and the second detection area;
calculating, one by one, the ratio r of the second detection area within each first detection area in the fused video frame;
determining a target confidence threshold t for the first detection area from the ratio r by a predetermined confidence threshold determination method;
judging whether the confidence of the first detection area is greater than the target confidence threshold t and, if so, determining that the first detection area contains the object to be identified.
The communication bus 703 connects the processor 701 and the memory 702 mounted on it.
The communication interface 704 provides a connection interface for the processor 701 and the memory 702.
The above embodiments provide a moving object identification method, together with a device and an electronic device corresponding to the method. In addition, an embodiment of the present application further provides a computer-readable storage medium for implementing the moving object identification method. The description of the computer-readable storage medium embodiment is relatively brief; for the relevant parts, refer to the corresponding description of the method embodiment above. The embodiment described below is merely illustrative.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the following operations:
detecting the object to be identified contained in the selected target video frame using a detection model for the object to be identified, and obtaining a first detection area containing the object to be identified and a corresponding confidence;
comparing the target video frame with an auxiliary video frame temporally separated from it by a preset time length, and obtaining a second detection area having a moving foreground according to the image difference between the two frames;
performing image fusion on the first detection area and the second detection area according to the images in which they are located, and obtaining a fused video frame marking the first detection area and the second detection area;
calculating, one by one, the ratio r of the second detection area within each first detection area in the fused video frame;
determining a target confidence threshold t for the first detection area from the ratio r by a predetermined confidence threshold determination method;
judging whether the confidence of the first detection area is greater than the target confidence threshold t and, if so, determining that the first detection area contains the object to be identified.
In the above embodiments of the present application, each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), in a computer-readable medium. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Although the present application has been described with reference to preferred embodiments, these are not intended to limit it. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; the scope of protection of the present application should therefore be determined by the claims that follow.

Claims (14)

1. A moving object identification method, comprising:
detecting the object to be identified contained in the selected target video frame using a detection model for the object to be identified, and obtaining a first detection area containing the object to be identified and a corresponding confidence;
comparing the target video frame with an auxiliary video frame temporally separated from the target video frame by a preset time length, and obtaining a second detection area having a moving foreground according to the image difference between the two frames;
performing image fusion on the first detection area and the second detection area according to the images in which they are located, and obtaining a fused video frame marking the first detection area and the second detection area;
calculating, one by one, the ratio r of the second detection area within each first detection area in the fused video frame;
determining a target confidence threshold t for the first detection area from the ratio r by a predetermined confidence threshold determination method;
and judging whether the confidence of the first detection area is greater than the target confidence threshold t and, if so, determining that the first detection area contains the object to be identified.
2. The method of claim 1, further comprising:
when judging whether the confidence of the first detection area is greater than the target confidence threshold t, if the judgment result is negative, providing the first detection area to a first classification model for distinguishing high-difficulty samples;
and identifying and judging, by the first classification model, the first detection area to determine whether the first detection area contains an object to be identified.
3. The method of claim 2, further comprising:
finding, among the second detection areas, the non-associated areas that are not associated with any first detection area;
providing each non-associated area to a pre-trained second classification model for identifying high-difficulty samples;
and identifying and judging, by the second classification model, the non-associated area to determine whether the non-associated area contains an object to be identified.
4. The method of claim 1, wherein determining the target confidence threshold t for the first detection area from the ratio r by a predetermined confidence threshold determination method comprises:
determining, within a preset interval of the ratio r, the target confidence threshold t according to a monotonically decreasing function t = f(r) of the ratio r.
5. The method of claim 1, wherein, in detecting the object to be identified contained in the selected target video frame using a detection model for the object to be identified and obtaining a first detection area containing the object to be identified and a corresponding confidence:
the detection model is a data-driven deep learning model, trained with data collected in the monitoring scene together with simulated and synthesized data.
6. The method of claim 3, wherein the first classification model and the second classification model may be the same classification model or two separately trained classification models; the classification models are data-driven deep learning models, trained with data collected in the monitoring scene together with simulated and synthesized data;
the sample data used for training the classification models comprises labelled positive samples and labelled negative samples: a positive sample is a frame picture containing the object to be identified, and a negative sample is a frame picture that does not contain the object to be identified or a frame picture that contains another moving object.
7. The method of claim 1, wherein, before detecting the object to be identified contained in the selected target video frame using a detection model for the object to be identified and obtaining a first detection area containing the object to be identified and a corresponding confidence, the method further comprises:
acquiring a video image with an image acquisition device, and segmenting the acquired video image according to a preset segment duration;
and performing frame-extraction detection on the acquired video image and selecting a target video frame.
8. The method of claim 1, wherein, in comparing the target video frame with the auxiliary video frame temporally separated from it by a preset time length and obtaining a second detection area having a moving foreground according to the image difference between the two frames:
the auxiliary video frame temporally separated from the target video frame by a preset time length is the video frame immediately preceding the target video frame.
9. The method of claim 1, wherein the method is deployed on an intelligent edge server or on a capture device with computing power, and is performed on live surveillance video captured in real time.
10. The method of claim 1, wherein the object to be identified is one or more of a human, a mouse, a snake, livestock, and an insect.
11. A system for monitoring a moving object on site, characterized by comprising a video acquisition device, an edge computing device, and a server;
the video acquisition device is arranged at the monitoring site and captures live video in real time;
the edge computing device is arranged at or near the monitoring site and may also be integrated with the video acquisition device; a program implementing the moving object identification method is deployed on it; the edge computing device obtains the live video captured by the video acquisition device, selects a target video frame from it, and performs the following steps: detecting the object to be identified contained in the selected target video frame using a detection model for the object to be identified, and obtaining a first detection area containing the object to be identified and a corresponding confidence; comparing the target video frame with an auxiliary video frame temporally separated from it by a preset time length, and obtaining a second detection area having a moving foreground according to the image difference between the two frames; performing image fusion on the first detection area and the second detection area according to the images in which they are located, and obtaining a fused video frame marking the first detection area and the second detection area; calculating, one by one, the ratio r of the second detection area within each first detection area in the fused video frame; determining a target confidence threshold t for the first detection area from the ratio r by a predetermined confidence threshold determination method; judging whether the confidence of a first detection area is greater than the target confidence threshold t and, if so, determining that the first detection area contains an object to be identified; and, if a first detection area containing the object to be identified is detected, capturing and sending out the video frame marking the first detection area;
and the server receives the video frame marking the first detection area and performs confirmation and/or alarm.
12. A moving object identification device, characterized by comprising:
a first identification unit, configured to detect the object to be identified contained in the selected target video frame using a detection model for the object to be identified, and obtain a first detection area containing the object to be identified and a corresponding confidence;
a second identification unit, configured to compare the target video frame with an auxiliary video frame temporally separated from the target video frame by a predetermined time length, and obtain a second detection area having a moving foreground according to the image difference between the two frames;
a fusion unit, configured to perform image fusion on the first detection area and the second detection area according to the images in which they are located, and obtain a fused video frame marking the first detection area and the second detection area;
a calculating unit, configured to calculate, one by one, the ratio r of the second detection area within each first detection area in the fused video frame;
a determining unit, configured to determine a target confidence threshold t for the first detection area from the ratio r by a predetermined confidence threshold determination method;
and a judging unit, configured to judge whether the confidence of a first detection area is greater than the target confidence threshold t and, if so, to determine that the first detection area contains an object to be identified.
13. An electronic device, comprising a processor and a memory storing computer program instructions executable on the processor; when executing the computer program instructions, the processor implements the moving object identification method of any one of claims 1-10.
14. A computer-readable storage medium storing one or more computer instructions which, when executed by a processor, perform the moving object identification method of any one of claims 1-10.
CN202211387815.0A 2022-11-07 2022-11-07 Moving object identification method, device and system, electronic equipment and storage medium Active CN115457447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211387815.0A CN115457447B (en) 2022-11-07 2022-11-07 Moving object identification method, device and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211387815.0A CN115457447B (en) 2022-11-07 2022-11-07 Moving object identification method, device and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115457447A CN115457447A (en) 2022-12-09
CN115457447B true CN115457447B (en) 2023-03-28

Family

ID=84310871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211387815.0A Active CN115457447B (en) 2022-11-07 2022-11-07 Moving object identification method, device and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115457447B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052158A (en) * 2021-03-30 2021-06-29 济南博观智能科技有限公司 Night infrared scene target identification method and device, electronic equipment and storage medium
CN113869137A (en) * 2021-09-06 2021-12-31 中电科新型智慧城市研究院有限公司 Event detection method and device, terminal equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4159794B2 (en) * 2001-05-02 2008-10-01 本田技研工業株式会社 Image processing apparatus and method
US9785247B1 (en) * 2014-05-14 2017-10-10 Leap Motion, Inc. Systems and methods of tracking moving hands and recognizing gestural interactions
CN106878674B (en) * 2017-01-10 2019-08-30 哈尔滨工业大学深圳研究生院 A kind of parking detection method and device based on monitor video
CN107330922A (en) * 2017-07-04 2017-11-07 西北工业大学 Video moving object detection method of taking photo by plane based on movable information and provincial characteristics
CN109919009A (en) * 2019-01-24 2019-06-21 北京明略软件系统有限公司 The monitoring method of target object, apparatus and system
CN110047095B (en) * 2019-03-06 2023-07-21 平安科技(深圳)有限公司 Tracking method and device based on target detection and terminal equipment
CN110430443B (en) * 2019-07-11 2022-01-25 平安科技(深圳)有限公司 Method and device for cutting video shot, computer equipment and storage medium
CN110340893B (en) * 2019-07-12 2022-06-17 哈尔滨工业大学(威海) Mechanical arm grabbing method based on semantic laser interaction
CN111428626B (en) * 2020-03-23 2023-05-23 北京明略软件系统有限公司 Method and device for identifying moving object and storage medium
CN113312981B (en) * 2021-05-07 2024-05-28 联通(广东)产业互联网有限公司 Machine room murine image recognition method, system and storage medium
CN115249356B (en) * 2022-09-21 2023-02-03 浙江莲荷科技有限公司 Identification method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231128

Address after: Room 801-6, No. 528 Yan'an Road, Gongshu District, Hangzhou City, Zhejiang Province, 310005

Patentee after: Zhejiang Shenxiang Intelligent Technology Co.,Ltd.

Address before: 310030 room 5034, building 3, No. 820, Wener West Road, Xihu District, Hangzhou City, Zhejiang Province

Patentee before: ZHEJIANG LIANHE TECHNOLOGY Co.,Ltd.
