CN118196136A - Motion detection method, system, electronic device and storage medium - Google Patents

Motion detection method, system, electronic device and storage medium Download PDF

Info

Publication number
CN118196136A
CN118196136A
Authority
CN
China
Prior art keywords
frame
image
target detection
image frame
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211597881.0A
Other languages
Chinese (zh)
Inventor
高若飞
祝淑琼
王威
李小涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Communications Ltd Research Institute
Priority to CN202211597881.0A
Publication of CN118196136A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a motion detection method, a motion detection system, an electronic device and a storage medium. The motion detection method comprises: invoking a target detection model to process a first image frame and a second image frame respectively, so as to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame, the first image frame and the second image frame being two consecutive image frames captured by an infrared camera; fusing the first target detection frame and the second target detection frame into a third target detection frame; and performing inter-frame difference calculation on the first image frame and the second image frame within the third target detection frame, and detecting whether a target moves based on the calculation result.

Description

Motion detection method, system, electronic device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a motion detection method, a motion detection system, an electronic device, and a storage medium.
Background
When detecting a target in a dark environment such as at night, an infrared camera is required for capturing images. Since an infrared image is a grayscale image, it carries little usable information and considerable noise, which makes detecting target motion difficult and keeps detection accuracy low.
Disclosure of Invention
In order to solve the related technical problems, the embodiment of the application provides a motion detection method, a motion detection system, electronic equipment and a storage medium.
The technical scheme of the embodiment of the application is realized as follows:
The embodiment of the application provides a motion detection method, which comprises the following steps:
Invoking a target detection model to respectively process a first image frame and a second image frame to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; the first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera;
fusing the first target detection frame and the second target detection frame into a third target detection frame;
And carrying out inter-frame difference calculation on the first image frame and the second image frame in the third target detection frame, and detecting whether a target moves or not based on a calculation result.
In the above scheme, fusing the first target detection frame and the second target detection frame into a third target detection frame includes:
determining coordinates of the first vertex of the third target detection frame based on the coordinates of the first vertex of the first target detection frame and the coordinates of the first vertex of the second target detection frame; wherein,
The first vertex characterizes an upper left vertex or a lower left vertex or an upper right vertex or a lower right vertex of the target detection frame.
In the above aspect, the performing, in the third object detection frame, inter-frame difference calculation on the first image frame and the second image frame, and detecting whether the object moves based on a calculation result, includes:
calculating absolute values of differences of the first image frame and the second image frame in the same pixel position in the third target detection frame;
determining that a target moves between the first image frame and the second image frame if the parameter value of the first parameter is greater than a first threshold; wherein,
The first parameter characterizes a ratio of a first number to an area of the third target detection frame; the first number characterizes a number of pixels having an absolute value of the difference value greater than a second threshold.
In the above solution, before the calling the object detection model processes the first image frame and the second image frame respectively, the method further includes:
invoking an encoder in an image enhancement neural network to extract depth features of the first image frame and the second image frame respectively to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
And invoking a decoder in the image enhancement neural network to up-sample the first depth feature and the second depth feature respectively to obtain a first image frame subjected to image enhancement and a second image frame subjected to image enhancement.
In the above solution, before invoking the image enhancement neural network and the target detection model, the method further includes:
Performing first-stage training on the first model to obtain a first model subjected to the first-stage training; the first model comprises an image enhancement neural network and a target detection model, and the output of the image enhancement neural network is used as the input of the target detection model in the first model;
Performing second-stage training on the first model trained in the first stage to obtain the first model trained in the second stage, and extracting an image enhancement neural network and a target detection model for calling from the first model trained in the second stage; wherein,
The loss value corresponding to the first stage is determined based on the first loss value and the second loss value; the first loss value characterizes a loss value of the image enhancement neural network; the second loss value is determined based on the loss value of the target detection model; and determining the loss value corresponding to the second stage based on the loss value of the target detection model.
In the above scheme, the method further comprises:
Performing style migration on the images in the first image set to obtain training data of the first stage; wherein,
Style migration takes images in the second image set as the target domain; the first image set characterizes an RGB image set with labels; the second image set characterizes the non-annotated infrared image set.
In the above scheme, the number of training data used in the first stage is greater than the number of training data used in the second stage.
The embodiment of the application also provides a motion detection system, which comprises:
The target detection model is used for respectively processing the first image frame and the second image frame to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; the first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera;
The motion detection module is used for fusing the first target detection frame and the second target detection frame into a third target detection frame; and performing inter-frame difference calculation on the first image frame and the second image frame in the third object detection frame, and detecting whether an object moves or not based on a calculation result.
Wherein, in the above scheme, the system further includes:
The image enhancement neural network is used for calling an encoder to extract depth features of the first image frame and the second image frame before the target detection model respectively processes the first image frame and the second image frame, so as to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
The image enhancement neural network is further used for calling a decoder to up-sample the first depth feature and the second depth feature respectively to obtain a first image frame subjected to image enhancement and a second image frame subjected to image enhancement.
The embodiment of the application also provides electronic equipment, which comprises: a first processor and a first communication interface; wherein,
The first communication interface is used for acquiring at least two frames of first images captured by the infrared camera;
The first processor is used for calling a target detection model to respectively process a first image frame and a second image frame to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; fusing the first target detection frame and the second target detection frame into a third target detection frame; performing inter-frame difference calculation on the first image frame and the second image frame in the third target detection frame, and detecting whether a target moves or not based on a calculation result; wherein,
The first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera.
In the above scheme, the first processor is further configured to invoke an encoder in an image enhancement neural network to perform depth feature extraction on the first image frame and the second image frame, so as to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
The first processor is further configured to invoke a decoder in the image enhancement neural network to up-sample the first depth feature and the second depth feature, respectively, to obtain an image-enhanced first image frame and an image-enhanced second image frame.
The embodiment of the application also provides electronic equipment, which is characterized by comprising: a first processor and a first memory for storing a computer program capable of running on the processor,
Wherein the first processor is configured to execute the steps of any of the methods described above when the computer program is run.
The embodiment of the application also provides a storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of any of the methods described above.
In the motion detection method, system, electronic device and storage medium provided by the embodiments of the application, for two consecutive image frames captured by an infrared camera, a target detection model is called to process each frame, so as to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; the first target detection frame and the second target detection frame are fused into a third target detection frame; then, inter-frame difference calculation is performed on the first image frame and the second image frame within the third target detection frame, and whether the target moves is detected according to the calculation result. Here, the target detection frames of two consecutive image frames are fused, and the inter-frame difference calculation is performed on the two frames within the fused target detection frame to detect whether the target moves; this effectively reduces the influence of the heavy noise in infrared image frames on the motion detection result and improves the detection accuracy of motion detection.
Drawings
FIG. 1 is a schematic flow chart of a motion detection method according to an embodiment of the application;
FIG. 2 is a schematic diagram of an image enhancement neural network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a motion detection system according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training principle of a motion detection system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a motion detection system according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the application.
Detailed Description
When performing motion detection on a target in dark environments such as at night, a conventional camera cannot work properly, so an infrared camera needs to be used for capturing images. In the related art, motion detection of the target is performed directly on the images acquired by the infrared camera. Compared with a conventional RGB image, an infrared image contains only one channel and is a grayscale image, so the usable information it carries is far less than that of an RGB image, which easily reduces detection accuracy. In addition, because the infrared camera is sensitive to the motion of tiny targets, noise easily appears during detection, and the detection result obtained in this way is easily affected by that noise, which also reduces detection accuracy.
Based on this, in each embodiment of the present application, for two consecutive image frames captured by an infrared camera, a target detection model is called to process each frame, so as to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; the first target detection frame and the second target detection frame are fused into a third target detection frame; then, inter-frame difference calculation is performed on the first image frame and the second image frame within the third target detection frame, and whether the target moves is detected according to the calculation result. Fusing the target detection frames of two consecutive image frames and performing the inter-frame difference calculation within the fused target detection frame effectively reduces the influence of the heavy noise in infrared image frames on the motion detection result and improves the detection accuracy of motion detection.
The present application will be described in further detail with reference to the accompanying drawings and examples.
The embodiment of the application provides a motion detection method, as shown in fig. 1, comprising the following steps:
Step 101: and calling a target detection model to respectively process the first image frame and the second image frame to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame.
Wherein the first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera.
Specifically, consecutive first and second image frames are extracted from the video captured by an infrared camera; it is understood that both the first image frame and the second image frame are infrared images here. For ease of explanation, let the first image frame be the N-th frame of the video, with pixel values I_N(a, b), and the second image frame be the (N+1)-th frame of the video, with pixel values I_{N+1}(a, b).
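For illustration only, the following Python sketch shows one way such a pair of consecutive frames could be read with OpenCV; the file name and variable names are assumptions and are not part of the application.

```python
import cv2

cap = cv2.VideoCapture("infrared_video.mp4")   # hypothetical video source

ok_n, frame_n = cap.read()                     # N-th frame, pixel values I_N(a, b)
ok_n1, frame_n1 = cap.read()                   # (N+1)-th frame, pixel values I_{N+1}(a, b)
cap.release()

if ok_n and ok_n1:
    # Infrared frames are single-channel; convert if the decoder returns three identical channels.
    frame_n = cv2.cvtColor(frame_n, cv2.COLOR_BGR2GRAY)
    frame_n1 = cv2.cvtColor(frame_n1, cv2.COLOR_BGR2GRAY)
```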
Step 102: and fusing the first target detection frame and the second target detection frame into a third target detection frame.
Here, let the first target detection frame be D_N = (a1_N, b1_N, a2_N, b2_N) and the second target detection frame be D_{N+1} = (a1_{N+1}, b1_{N+1}, a2_{N+1}, b2_{N+1}), where a1 denotes the abscissa of the upper-left corner of a target detection frame, b1 the ordinate of the upper-left corner, a2 the abscissa of the lower-right corner, and b2 the ordinate of the lower-right corner.
In an embodiment, fusing the first target detection frame and the second target detection frame into a third target detection frame includes:
determining coordinates of the first vertex of the third target detection frame based on the coordinates of the first vertex of the first target detection frame and the coordinates of the first vertex of the second target detection frame; wherein,
The first vertex characterizes an upper left vertex or a lower left vertex or an upper right vertex or a lower right vertex of the target detection frame.
That is, when fusing the target detection frames, the abscissa of vertex A of the first target detection frame or the abscissa of vertex A of the second target detection frame may be determined as the abscissa of vertex A of the third target detection frame; likewise, the ordinate of vertex A of the first target detection frame or the ordinate of vertex A of the second target detection frame may be determined as the ordinate of vertex A of the third target detection frame. Here, vertex A may be the upper-left, lower-left, upper-right or lower-right vertex of the target detection frame.
In practical application, the fused third target detection frame D(a1, b1, a2, b2) is obtained by the following formula:
a1 = min(a1_N, a1_{N+1}), b1 = min(b1_N, b1_{N+1}), a2 = max(a2_N, a2_{N+1}), b2 = max(b2_N, b2_{N+1})
Through the formula, the third target detection frame can cover the areas of the first target detection frame and the second target detection frame, so that the influence of noise in the infrared image on the detection result can be effectively reduced by carrying out inter-frame difference calculation based on the third target detection frame, and a more accurate detection result is obtained.
It should be noted that, when fusing the target detection frames, if the target detection result of the N-th frame is null (that is, no first target detection frame is output) while the target detection result of the (N+1)-th frame is not null, then D = D_{N+1}, i.e. the second target detection frame is taken as the third target detection frame; if the target detection result of the (N+1)-th frame is null (no second target detection frame is output) while that of the N-th frame is not null, then D = D_N, i.e. the first target detection frame is taken as the third target detection frame. If the target detection results of both the N-th and (N+1)-th frames are null, i.e. neither the first nor the second target detection frame is output, D is null; the method then returns to step 101 and performs the detection-frame fusion on the (N+1)-th and (N+2)-th frames.
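For illustration only, the fusion rule above can be sketched in Python as follows; the function name is an assumption, and each box is assumed to be an (a1, b1, a2, b2) tuple, or None when the detector outputs nothing.

```python
def fuse_boxes(box_n, box_n1):
    """Fuse the detection frames of two consecutive frames into the third frame D."""
    if box_n is None and box_n1 is None:
        return None                              # D is null: move on to the next frame pair
    if box_n is None:
        return box_n1                            # D = D_{N+1}
    if box_n1 is None:
        return box_n                             # D = D_N
    # Union bounding box, so that D covers the areas of both detection frames.
    a1 = min(box_n[0], box_n1[0])
    b1 = min(box_n[1], box_n1[1])
    a2 = max(box_n[2], box_n1[2])
    b2 = max(box_n[3], box_n1[3])
    return (a1, b1, a2, b2)
```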
Step 103: and carrying out inter-frame difference calculation on the first image frame and the second image frame in the third target detection frame, and detecting whether a target moves or not based on a calculation result.
In an embodiment, the performing an inter-frame difference calculation on the first image frame and the second image frame in the third object detection frame, and detecting whether an object moves based on a calculation result includes:
calculating absolute values of differences of the first image frame and the second image frame in the same pixel position in the third target detection frame;
in the event that the parameter value of the first parameter is greater than a first threshold, determining that a motion of an object occurs between the first image frame and the second image frame.
Wherein the first parameter characterizes a ratio of a first number to an area of the third target detection frame; the first number characterizes a number of pixels having an absolute value of the difference value greater than a second threshold.
Here, the inter-frame difference between the first image frame and the second image frame is calculated within the third target detection frame, that is, the absolute value of the pixel-value difference between the first image frame and the second image frame at each position inside the third target detection frame. Then, in the inter-frame difference result, the number K of pixels whose absolute difference exceeds the set second threshold T is counted, and the motion ratio R is calculated:
R=K/[(a2-a1)×(b2-b1)]
Here, (a2-a1)×(b2-b1) is the area of the third target detection frame, so R can be understood as the ratio of the pixel count K to the area of the third target detection frame. When R is greater than the set first threshold M, it is determined that the target has moved between the first image frame and the second image frame; when R is less than or equal to M, it is determined that the target has not moved between the two frames.
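For illustration only, a minimal NumPy sketch of this inter-frame difference test is given below; the function name and the default values of the thresholds T and M are assumptions, since the application does not specify concrete values.

```python
import numpy as np

def detect_motion(frame_n, frame_n1, box, T=25, M=0.05):
    """Return True if the target is judged to have moved between the two frames."""
    a1, b1, a2, b2 = box
    roi_n = frame_n[b1:b2, a1:a2].astype(np.int16)
    roi_n1 = frame_n1[b1:b2, a1:a2].astype(np.int16)
    diff = np.abs(roi_n - roi_n1)               # |I_N(a, b) - I_{N+1}(a, b)| inside D
    K = int(np.count_nonzero(diff > T))         # pixels whose difference exceeds the threshold T
    R = K / ((a2 - a1) * (b2 - b1))             # motion ratio: K over the area of D
    return R > M
```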
To further improve the detection accuracy of the motion detection, in an embodiment, before the invoking the object detection model to process the first image frame and the second image frame, respectively, the method further comprises:
invoking an encoder in an image enhancement neural network to extract depth features of the first image frame and the second image frame respectively to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
And invoking a decoder in the image enhancement neural network to up-sample the first depth feature and the second depth feature respectively to obtain a first image frame subjected to image enhancement and a second image frame subjected to image enhancement.
Specifically, the input of the image enhancement neural network is an original infrared image, and the output is an enhanced infrared image of the same size as the input; the image size is 1×H×W, i.e. the number of channels is 1, the image height is H, and the width is W. As shown in fig. 2, the image enhancement neural network consists of two parts, an encoder and a decoder. The encoder performs feature extraction on the input original infrared image while reducing the feature size, which enlarges the receptive field and reduces the amount of computation. The decoder up-samples the features extracted by the encoder to restore the original infrared image size. Fewer convolution kernels are used here, which makes the network more suitable for mobile-end scenarios.
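As an illustrative sketch only, a PyTorch encoder-decoder of this kind could look as follows; the layer counts and channel widths are assumptions, since the application only specifies that the encoder shrinks the feature size and the decoder up-samples back to the 1×H×W input size with few convolution kernels.

```python
import torch
import torch.nn as nn

class EnhanceNet(nn.Module):
    """Sketch of the image enhancement network: encoder downsamples, decoder restores the size."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # reduces feature size: larger receptive field, less computation
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(            # up-samples the depth features back to 1×H×W
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, 1, H, W); H and W assumed divisible by 4
        return self.decoder(self.encoder(x))     # enhanced image of the same size as the input
```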
After the image enhancement neural network is introduced, as shown in fig. 3, consecutive N-th and (N+1)-th frame images are extracted from the video shot by the infrared camera and first input into the image enhancement neural network to obtain the image-enhanced N-th and (N+1)-th frames. The target detection model then yields a first target detection frame for the N-th frame and a second target detection frame for the (N+1)-th frame. The moving-target detection module fuses the first and second target detection frames into a third target detection frame and performs inter-frame difference calculation on the N-th and (N+1)-th frames within the third target detection frame, so as to determine whether the target moves. Here, the image enhancement neural network can adaptively adjust and enhance the image so that its output is better suited to the target detection model, and it can also denoise the image, thereby improving the accuracy of target detection. The target detection model performs target detection on the infrared image and outputs the position information of the target in the image, and the motion detection module detects the motion of the target by an inter-frame difference method based on the target detection result. On the whole, this scheme improves the detection accuracy of motion detection.
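Purely as an illustration of the data flow in fig. 3, the sketches above can be chained as follows, reusing EnhanceNet, fuse_boxes and detect_motion from the earlier sketches; run_detector is a placeholder for whatever target detection model is used and is assumed to return a single (a1, b1, a2, b2) box or None.

```python
import torch

def motion_detection_step(net, run_detector, frame_n, frame_n1):
    """One step of the fig. 3 pipeline for a pair of consecutive uint8 infrared frames."""
    with torch.no_grad():
        t_n = torch.from_numpy(frame_n).float().div(255)[None, None]    # shape (1, 1, H, W)
        t_n1 = torch.from_numpy(frame_n1).float().div(255)[None, None]
        enh_n = net(t_n).squeeze().mul(255).byte().numpy()              # image-enhanced N-th frame
        enh_n1 = net(t_n1).squeeze().mul(255).byte().numpy()            # image-enhanced (N+1)-th frame
    box = fuse_boxes(run_detector(enh_n), run_detector(enh_n1))         # third target detection frame
    if box is None:
        return None                          # no detection in either frame; try the next frame pair
    return detect_motion(enh_n, enh_n1, box)
```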
In an embodiment of the present application, the image enhancement neural network and the target detection model shown in fig. 3 are trained end-to-end in two stages. In an embodiment, before invoking the image enhancement neural network and the target detection model, the method further comprises:
Performing first-stage training on the first model to obtain a first model subjected to the first-stage training; the first model comprises an image enhancement neural network and a target detection model, and the output of the image enhancement neural network is used as the input of the target detection model in the first model;
And performing second-stage training on the first model trained in the first stage to obtain the first model trained in the second stage, and extracting the image enhancement neural network and the target detection model for calling from the first model trained in the second stage.
The loss value corresponding to the first stage is determined based on the first loss value and the second loss value; the first loss value characterizes a loss value of the image enhancement neural network; the second loss value is determined based on the loss value of the target detection model; and determining the loss value corresponding to the second stage based on the loss value of the target detection model.
Specifically, as shown in FIG. 4, in the first-stage training, training is performed on the first-stage training data set. The input image of the image enhancement neural network is x_t, the target image is x_g, the target detection frame label is y, the output of the image enhancement neural network is x_p, and the output of the target detection model is pred. In practical application, for the target detection model, GIoU loss and cross-entropy loss are adopted, and a second loss value loss_o is calculated based on pred and y; for the image enhancement neural network, MSE loss is calculated based on x_p and x_g to obtain a first loss value loss_e. The loss value loss corresponding to the first stage is then:
loss = loss_o + α·loss_e
Wherein the coefficient α is greater than 0 and less than or equal to 1.
Here, the first-stage training can be understood as a pre-training process, while in the second-stage training the first model is fine-tuned, and the corresponding loss value is determined based only on the loss value of the target detection model. Through this two-stage training, the image produced by the image enhancement neural network can focus on the region where the target is located; using the trained image enhancement neural network to enhance the infrared image therefore removes noise in the target region more effectively, and finally improves the accuracy of the motion detection result.
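For illustration only, the first-stage composite loss could be computed as in the sketch below; detection_loss stands in for the GIoU plus cross-entropy loss of the target detection model, and the default value of alpha is an assumption within the stated range (0, 1].

```python
import torch.nn.functional as F

def stage1_loss(pred, y, x_p, x_g, detection_loss, alpha=0.5):
    """Combined first-stage loss; the second stage would use loss_o alone."""
    loss_o = detection_loss(pred, y)          # second loss value: target detection loss (GIoU + CE)
    loss_e = F.mse_loss(x_p, x_g)             # first loss value: image enhancement MSE loss
    return loss_o + alpha * loss_e            # loss = loss_o + α·loss_e
```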
In an embodiment, the method further comprises:
And performing style migration and graying operation on the images in the first image set to obtain training data of the first stage.
The style migration takes the images in the second image set as the target domain; the first image set represents an RGB image set with labels; the second image set characterizes the non-annotated infrared image set.
Here, the second image set D_t, which carries no target detection annotations, is formed from a large number of infrared images already acquired by the infrared camera; in addition, an open-source, annotated RGB target detection data set D_o is taken as the first image set. Before training, CycleGAN is used to perform style migration of the original RGB images in D_o towards infrared images, with D_t as the target domain, thereby obtaining the training data of the first stage. Each set of data is a triple (x_i, x̃_i, y_i), where x_i represents the original RGB image, x̃_i represents the image after style migration, and y_i represents the target detection label.
Further, before training, a graying operation may also be applied to the images in the first-stage training data; each set of data can then be expressed as (x̄_i, x̃_i, y_i), where x̄_i represents the image after graying.
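For illustration only, assembling one such first-stage sample could look like the sketch below; the CycleGAN style migration is assumed to have been run beforehand, the file handling is illustrative, and the choice of which image serves as the enhancement input and which as the target is an assumption, not something stated in the application.

```python
import cv2

def build_stage1_sample(rgb_path, migrated_path, y):
    """Assumed mapping: style-migrated image as input x_t, grayed RGB image as target x_g."""
    x_t = cv2.imread(migrated_path, cv2.IMREAD_GRAYSCALE)           # x̃: image after style migration
    x_g = cv2.cvtColor(cv2.imread(rgb_path), cv2.COLOR_BGR2GRAY)    # x̄: grayed original RGB image
    return x_t, x_g, y                                              # y: target detection label
```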
It can be seen that the first-stage training data is generated from a large open-source data set, while the second-stage training fine-tunes the first model, so a small amount of data from the specific motion detection scene can be used as its training data in order to improve the detection performance of the first model in that scene. Thus, in an embodiment, the amount of training data used in the first stage is greater than the amount of training data used in the second stage.
In addition, based on the two-stage training, the trained image enhancement neural network can be independently used for enhancing the infrared image, so that the applicability of the model is improved.
Based on the above embodiments, the target detection frames of two consecutive image frames in the video acquired by the infrared camera are fused, and the inter-frame difference calculation is performed on the two frames within the fused target detection frame to detect whether the target moves. This takes into account the large pixel variations in infrared images, effectively reduces the influence of the heavy noise in infrared image frames on the motion detection result, and improves the detection accuracy of motion detection. In addition, the model training method provided by the embodiments of the application is carried out in two stages, so that the image produced by the image enhancement neural network can focus on the region where the target is located; enhancing the infrared image with the trained image enhancement neural network removes noise in the target region more effectively and finally improves the accuracy of the motion detection result. Moreover, the trained image enhancement neural network can also be used on its own to enhance infrared images, which improves the applicability of the model.
In order to implement the motion detection method according to the embodiment of the present application, the embodiment of the present application further provides a motion detection system, as shown in fig. 5, including:
The object detection model 501 is configured to invoke the object detection model to process a first image frame and a second image frame, respectively, so as to obtain a first object detection frame of the first image frame and a second object detection frame of the second image frame; the first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera;
The motion detection module 502 is configured to fuse the first target detection frame and the second target detection frame into a third target detection frame; and performing inter-frame difference calculation on the first image frame and the second image frame in the third object detection frame, and detecting whether an object moves or not based on a calculation result.
Wherein, in one embodiment, the motion detection module 502 is configured to:
determining coordinates of the first vertex of the third target detection frame based on the coordinates of the first vertex of the first target detection frame and the coordinates of the first vertex of the second target detection frame; wherein,
The first vertex characterizes an upper left vertex or a lower left vertex or an upper right vertex or a lower right vertex of the target detection frame.
In one embodiment, the motion detection module 502 is configured to:
calculating absolute values of differences of the first image frame and the second image frame in the same pixel position in the third target detection frame;
determining that a target moves between the first image frame and the second image frame if the parameter value of the first parameter is greater than a first threshold; wherein,
The first parameter characterizes a ratio of a first number to an area of the third target detection frame; the first number characterizes a number of pixels having an absolute value of the difference value greater than a second threshold.
In an embodiment, the system further comprises:
The image enhancement neural network is used for calling an encoder to extract depth features of a first image frame and a second image frame before the calling target detection model respectively processes the first image frame and the second image frame, so as to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
The image enhancement neural network is further used for calling a decoder to up-sample the first depth feature and the second depth feature respectively to obtain a first image frame subjected to image enhancement and a second image frame subjected to image enhancement.
In an embodiment, the system further comprises:
The model training module is used for carrying out first-stage training on the first model to obtain a first model which is subjected to the first-stage training; the first model comprises an image enhancement neural network and a target detection model, and the output of the image enhancement neural network is used as the input of the target detection model in the first model;
The model training module is also used for carrying out second-stage training on the first model trained in the first stage to obtain the first model trained in the second stage, and extracting the image enhancement neural network and the target detection model for calling from the first model trained in the second stage.
The loss value corresponding to the first stage is determined based on the first loss value and the second loss value; the first loss value characterizes a loss value of the image enhancement neural network; the second loss value is determined based on the loss value of the target detection model; and determining the loss value corresponding to the second stage based on the loss value of the target detection model.
In an embodiment, the system further comprises:
The training data module is used for carrying out style migration on the images in the first image set to obtain training data of the first stage; wherein,
Style migration takes images in the second image set as the target domain; the first image set represents an RGB image set with labels; the second image set characterizes the non-annotated infrared image set.
In an embodiment, the number of training data used in the first stage is greater than the number of training data used in the second stage.
In practical application, the modules can be realized by a processor in the motion detection system. In addition, the training data module, the target detection model, the motion detection module and the image enhancement neural network can be deployed on different equipment entities, namely, the training and the use of the model are respectively realized through the different equipment entities.
It should be noted that: in the motion detection system provided in the above embodiment, only the division of each program module is used for illustration, and in practical application, the process allocation may be performed by different program modules according to needs, that is, the internal structure of the device is divided into different program modules, so as to complete all or part of the processes described above. In addition, the motion detection system and the motion detection method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the motion detection system and the motion detection method are detailed in the method embodiments and are not described herein again.
Based on the hardware implementation of the program modules, and in order to implement the method on the electronic device side in the embodiment of the present application, the embodiment of the present application further provides an electronic device, as shown in fig. 6, an electronic device 600 includes:
The first communication interface 601 is capable of performing information interaction with other network nodes;
The first processor 602 is connected to the first communication interface 601 to implement information interaction with other network nodes, and is configured, when running a computer program, to execute the method provided by one or more of the technical solutions on the electronic device side; the computer program is stored in the first memory 603.
Specifically, the first processor 602 is configured to invoke a target detection model to process a first image frame and a second image frame, so as to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; the first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera;
fusing the first target detection frame and the second target detection frame into a third target detection frame;
And carrying out inter-frame difference calculation on the first image frame and the second image frame in the third target detection frame, and detecting whether a target moves or not based on a calculation result.
In one embodiment, fusing the first target detection frame and the second target detection frame into a third target detection frame includes:
determining coordinates of the first vertex of the third target detection frame based on the coordinates of the first vertex of the first target detection frame and the coordinates of the first vertex of the second target detection frame; wherein,
The first vertex characterizes an upper left vertex or a lower left vertex or an upper right vertex or a lower right vertex of the target detection frame.
In an embodiment, the performing an inter-frame difference calculation on the first image frame and the second image frame in the third object detection frame, and detecting whether an object moves based on a calculation result includes:
calculating absolute values of differences of the first image frame and the second image frame in the same pixel position in the third target detection frame;
determining that a target moves between the first image frame and the second image frame if the parameter value of the first parameter is greater than a first threshold; wherein,
The first parameter characterizes a ratio of a first number to an area of the third target detection frame; the first number characterizes a number of pixels having an absolute value of the difference value greater than a second threshold.
In an embodiment, the first processor 602 is further configured to, before the invoking target detection model processes a first image frame and a second image frame respectively, invoke an encoder in an image enhancement neural network to extract depth features of the first image frame and the second image frame respectively, so as to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
the first processor 602 is further configured to invoke a decoder in the image enhancement neural network to up-sample the first depth feature and the second depth feature, respectively, to obtain an image-enhanced first image frame and an image-enhanced second image frame.
In an embodiment, the first processor 602 is further configured to perform first-stage training on a first model to obtain the first model trained in the first stage; the first model comprises the image enhancement neural network and the target detection model, and in the first model the output of the image enhancement neural network is used as the input of the target detection model;
The first processor 602 is further configured to perform second-stage training on the first model trained in the first stage to obtain the first model trained in the second stage, and to extract the image enhancement neural network and the target detection model for calling from the first model trained in the second stage; wherein,
The loss value corresponding to the first stage is determined based on the first loss value and the second loss value; the first loss value characterizes a loss value of the image enhancement neural network; the second loss value is determined based on the loss value of the target detection model; and determining the loss value corresponding to the second stage based on the loss value of the target detection model.
In an embodiment, the first processor 602 is further configured to perform style migration on the images in the first image set to obtain training data of the first stage; wherein,
Style migration takes images in the second image set as the target domain; the first image set represents an RGB image set with labels; the second image set characterizes the non-annotated infrared image set.
In an embodiment, the number of training data used in the first stage is greater than the number of training data used in the second stage.
It should be noted that: the specific processing of the first processor 602 and the first communication interface 601 may be understood with reference to the above-described methods.
Of course, in actual practice, the various components in electronic device 600 are coupled together via bus system 604. It is understood that the bus system 604 is used to enable connected communications between these components. The bus system 604 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration, the various buses are labeled as bus system 604 in fig. 6.
The first memory 603 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device 600. Examples of such data include: any computer program for operating on the electronic device 600.
The method disclosed in the above embodiment of the present application may be applied to the first processor 602 or implemented by the first processor 602. The first processor 602 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method may be implemented by an integrated logic circuit of hardware or an instruction in software form in the first processor 602. The first Processor 602 described above may be a general purpose Processor, a digital signal Processor (DSP, digital Signal Processor), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The first processor 602 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the application can be directly embodied in the hardware of the decoding processor or can be implemented by combining hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the first memory 603, said first processor 602 reading information in the first memory 603 and performing the steps of the method described above in connection with its hardware.
In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, programmable logic devices (PLDs, programmable Logic Device), complex Programmable logic devices (CPLDs, complex Programmable Logic Device), field-Programmable gate arrays (FPGAs), general purpose processors, controllers, microcontrollers (MCUs, micro Controller Unit), microprocessors (microprocessors), or other electronic elements for performing the aforementioned methods.
It will be appreciated that the first memory 603 of embodiments of the application may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The non-volatile Memory may be, among other things, a Read Only Memory (ROM), a programmable Read Only Memory (PROM, programmable Read-Only Memory), erasable programmable Read-Only Memory (EPROM, erasable Programmable Read-Only Memory), electrically erasable programmable Read-Only Memory (EEPROM, ELECTRICALLY ERASABLE PROGRAMMABLE READ-Only Memory), Magnetic random access Memory (FRAM, ferromagnetic random access Memory), flash Memory (Flash Memory), magnetic surface Memory, optical disk, or compact disk-Only (CD-ROM, compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be random access memory (RAM, random Access Memory) which acts as external cache memory. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, static Random Access Memory), synchronous static random access memory (SSRAM, synchronous Static Random Access Memory), dynamic random access memory (DRAM, dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, synchronous Dynamic Random Access Memory), and, Double data rate synchronous dynamic random access memory (DDRSDRAM, double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, enhanced Synchronous Dynamic Random Access Memory), synchronous link dynamic random access memory (SLDRAM, syncLink Dynamic Random Access Memory), Direct memory bus random access memory (DRRAM, direct Rambus Random Access Memory). The memory described by embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
In an exemplary embodiment, the present application further provides a storage medium, i.e. a computer storage medium, in particular a computer readable storage medium, for example comprising a first memory 603 storing a computer program, which is executable by the first processor 602 of the electronic device 600 to perform the steps of the aforementioned electronic device side method. The computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
It should be noted that: "first," "second," etc. are used to distinguish similar objects and not necessarily to describe a particular order or sequence.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any combination of any one or more of at least two of the plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
In addition, the embodiments of the present application may be arbitrarily combined without any collision.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the present application.

Claims (13)

1. A method of motion detection, comprising:
Invoking a target detection model to respectively process a first image frame and a second image frame to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; the first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera;
fusing the first target detection frame and the second target detection frame into a third target detection frame;
And carrying out inter-frame difference calculation on the first image frame and the second image frame in the third target detection frame, and detecting whether a target moves or not based on a calculation result.
2. The method of claim 1, wherein fusing the first target detection frame and the second target detection frame into a third target detection frame comprises:
determining coordinates of the first vertex of the third target detection frame based on the coordinates of the first vertex of the first target detection frame and the coordinates of the first vertex of the second target detection frame; wherein,
The first vertex characterizes an upper left vertex or a lower left vertex or an upper right vertex or a lower right vertex of the target detection frame.
3. The method according to claim 1, wherein the performing an inter-frame difference calculation on the first image frame and the second image frame within the third object detection frame and detecting whether an object moves based on a result of the calculation, comprises:
calculating absolute values of differences of the first image frame and the second image frame in the same pixel position in the third target detection frame;
determining that a target moves between the first image frame and the second image frame if the parameter value of the first parameter is greater than a first threshold; wherein,
The first parameter characterizes a ratio of a first number to an area of the third target detection frame; the first number characterizes a number of pixels having an absolute value of the difference value greater than a second threshold.
4. A method according to any one of claims 1 to 3, wherein before the invoking the object detection model to process the first image frame and the second image frame, respectively, the method further comprises:
invoking an encoder in an image enhancement neural network to extract depth features of the first image frame and the second image frame respectively to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
And invoking a decoder in the image enhancement neural network to up-sample the first depth feature and the second depth feature respectively to obtain a first image frame subjected to image enhancement and a second image frame subjected to image enhancement.
5. The method of claim 4, wherein prior to invoking the image enhancement neural network and the object detection model, the method further comprises:
Performing first-stage training on the first model to obtain a first model subjected to the first-stage training; the first model comprises an image enhancement neural network and a target detection model, and the output of the image enhancement neural network is used as the input of the target detection model in the first model;
Performing second-stage training on the first model trained in the first stage to obtain the first model trained in the second stage, and extracting an image enhancement neural network and a target detection model for calling from the first model trained in the second stage; wherein,
The loss value corresponding to the first stage is determined based on the first loss value and the second loss value; the first loss value characterizes a loss value of the image enhancement neural network; the second loss value is determined based on the loss value of the target detection model; and determining the loss value corresponding to the second stage based on the loss value of the target detection model.
6. The method of claim 5, wherein the method further comprises:
Performing style migration on the images in the first image set to obtain training data of the first stage; wherein,
Style migration takes images in the second image set as the target domain; the first image set represents an RGB image set with labels; the second image set characterizes the non-annotated infrared image set.
7. The method of claim 5 or 6, wherein the amount of training data used in the first stage is greater than the amount of training data used in the second stage.
8. A motion detection system, comprising:
The target detection model is used for respectively processing the first image frame and the second image frame to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; the first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera;
The motion detection module is used for fusing the first target detection frame and the second target detection frame into a third target detection frame; and performing inter-frame difference calculation on the first image frame and the second image frame in the third object detection frame, and detecting whether an object moves or not based on a calculation result.
9. The system of claim 8, wherein the system further comprises:
The image enhancement neural network is used for calling an encoder to extract depth features of the first image frame and the second image frame before the target detection model respectively processes the first image frame and the second image frame, so as to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
The image enhancement neural network is further used for calling a decoder to up-sample the first depth feature and the second depth feature respectively to obtain a first image frame subjected to image enhancement and a second image frame subjected to image enhancement.
10. An electronic device, comprising: a first processor and a first communication interface; wherein,
The first communication interface is used for acquiring at least two frames of first images captured by the infrared camera;
The first processor is used for calling a target detection model to respectively process a first image frame and a second image frame to obtain a first target detection frame of the first image frame and a second target detection frame of the second image frame; fusing the first target detection frame and the second target detection frame into a third target detection frame; performing inter-frame difference calculation on the first image frame and the second image frame in the third target detection frame, and detecting whether a target moves or not based on a calculation result; wherein,
The first image frame and the second image frame characterize two consecutive image frames captured by an infrared camera.
11. The electronic device of claim 10, wherein the electronic device comprises a memory device,
The first processor is further configured to invoke an encoder in the image enhancement neural network to perform depth feature extraction on the first image frame and the second image frame, so as to obtain a first depth feature of the first image frame and a second depth feature of the second image frame; wherein a first size of the depth features extracted by the encoder is smaller than a second size of the image frames input to the encoder;
The first processor is further configured to invoke a decoder in the image enhancement neural network to up-sample the first depth feature and the second depth feature, respectively, to obtain an image-enhanced first image frame and an image-enhanced second image frame.
12. An electronic device, comprising: a first processor and a first memory for storing a computer program capable of running on the processor,
Wherein the first processor is adapted to perform the steps of the method of any of claims 1 to 7 when the computer program is run.
13. A storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method according to any of claims 1 to 7.
CN202211597881.0A 2022-12-12 2022-12-12 Motion detection method, system, electronic device and storage medium Pending CN118196136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211597881.0A CN118196136A (en) 2022-12-12 2022-12-12 Motion detection method, system, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN118196136A true CN118196136A (en) 2024-06-14

Family

ID=91412636



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination