CN112287805A - Moving object detection method and device, readable storage medium and electronic equipment


Info

Publication number
CN112287805A
Authority
CN
China
Prior art keywords
frame image
motion
image
determining
current frame
Legal status (assumption, not a legal conclusion)
Pending
Application number
CN202011160883.4A
Other languages
Chinese (zh)
Inventor
王洪东
谭洪贺
孟南
白鹏飞
Current Assignee
Horizon Shanghai Artificial Intelligence Technology Co Ltd
Original Assignee
Horizon Shanghai Artificial Intelligence Technology Co Ltd
Application filed by Horizon Shanghai Artificial Intelligence Technology Co Ltd
Priority to CN202011160883.4A
Publication of CN112287805A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/254 Analysis of motion involving subtraction of images
    • G06T 7/90 Determination of colour characteristics
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a moving object detection method, including: determining inter-frame difference information of a current frame image and a previous frame image based on the two images; determining a multi-frame image set based on the current frame image and a predetermined number of historical frame images before it; determining a first motion region in the current frame image based on the inter-frame difference information of each frame image in the multi-frame image set; determining at least one detected region of a predetermined scale based on the first motion region; and determining a moving target object in the detected region based on the detected region in the current frame image. Because image scaling and moving-target detection are performed only on the first motion region, the amount of calculation in the moving object detection process is reduced and calculation efficiency is improved.

Description

Moving object detection method and device, readable storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for detecting a moving object, a readable storage medium, and an electronic device.
Background
A traditional image pyramid algorithm down-samples the input image and then performs image recognition on the down-sampling results. The scale supported by the module that detects the moving target object (for example, an SoC chip capable of performing neural network operations) is fixed, but the scale of the target object is not known in advance, so a hardware module in the system-on-chip (SoC) must detect every layer of the pyramid result, and the detection module selects one layer according to the scale it supports. For large images this approach is computationally expensive: every input frame requires multi-layer pyramid processing, and reaching frame rates of 30 or 60 frames per second or higher places heavy demands on the computing power and bandwidth of the SoC. In practical applications, especially in fields such as security monitoring, the backgrounds of consecutive frames are essentially the same and only a small part of each image changes, so repeatedly processing every full frame wastes a large amount of bandwidth and computing power.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. Embodiments of the disclosure provide a moving object detection method and device, a readable storage medium, and an electronic device, in which moving-target detection is performed on a detected region derived from a first motion region, avoiding detection in regions of the image outside the first motion region, thereby greatly reducing the amount of calculation in the moving object detection process and improving calculation efficiency.
According to an aspect of the present disclosure, there is provided a method of detecting a moving object, including:
determining inter-frame difference information of a current frame image and a previous frame image based on the current frame image and the previous frame image;
determining a multi-frame image set based on the current frame image and a preset number of historical frame images before the current frame image;
determining a first motion area in the current frame image based on the inter-frame difference information of each frame image in the multi-frame image set;
determining at least one detected region of a predetermined scale based on the first motion region;
and determining a moving target object in the detected area based on the detected area in the current frame image.
According to a second aspect of the present disclosure, there is provided a moving object detection device, including:
a difference information acquisition module, configured to determine inter-frame difference information of a current frame image and a previous frame image based on the current frame image and the previous frame image;
a multi-frame image set determination module, configured to determine a multi-frame image set based on the current frame image and a predetermined number of historical frame images before the current frame image;
a motion region acquisition module, configured to determine a first motion region in the current frame image based on the inter-frame difference information of each frame image in the multi-frame image set;
a scale conversion module, configured to determine at least one detected region of a predetermined scale based on the first motion region;
a moving object detection module, configured to determine a moving target object in the detected region based on the detected region in the current frame image.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the method of detecting a moving object described in any one of the above.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to realize any one of the moving object detection methods.
In the four technical solutions of the present disclosure, the first motion region in the current frame image is determined from the difference information in the multi-frame image set, a detected region of a predetermined scale is then obtained from the first motion region, and the moving target object is detected in that detected region, which reduces the amount of calculation in the moving object detection process and improves calculation efficiency.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic flowchart of a moving object detection method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of difference information determination in a moving object detection method according to another exemplary embodiment of the present disclosure.
Fig. 3 is a schematic flow chart of the first motion region determination of the moving object detection method according to another exemplary embodiment of the present disclosure.
Fig. 4 is a schematic flowchart of a first motion region determination of a moving object detection method according to another exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating merging of second motion regions of a moving object detection method according to another exemplary embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating merging of second motion regions of a moving object detection method according to another exemplary embodiment of the present disclosure.
Fig. 7 is a schematic flowchart of detected reference image determination of a moving object detection method according to another exemplary embodiment of the present disclosure.
Fig. 8 is a flowchart illustrating a reference image determination of a moving object detection method according to another exemplary embodiment of the disclosure.
Fig. 9 is a schematic diagram of a moving object detection device according to an exemplary embodiment of the present disclosure.
Fig. 10 is a schematic diagram of a differential information acquisition module of a moving object detection device according to another exemplary embodiment of the present disclosure.
Fig. 11 is a schematic diagram of a first motion region acquisition module of a moving object detection device according to another exemplary embodiment of the present disclosure.
Fig. 12 is a schematic diagram of a first motion region submodule of a moving object detection device according to another exemplary embodiment of the present disclosure.
Fig. 13 is a schematic diagram of a first merging subunit of a moving object detection device according to another exemplary embodiment of the present disclosure.
Fig. 14 is a schematic diagram of a first motion region determination unit of a moving object detection device according to another exemplary embodiment of the present disclosure.
Fig. 15 is a schematic diagram of modules for determining a target object in a reference image of a moving object detection device according to another exemplary embodiment of the disclosure.
Fig. 16 is a schematic diagram of a reference image determination module of a moving object detection device according to another exemplary embodiment of the present disclosure.
Fig. 17 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
Summary of the application
For moving object detection, especially when a neural network is used to detect moving objects in video frames, the imaged sizes of objects differ because of differing sensor resolutions of the image acquisition module or differing distances to the photographed objects; it is therefore generally necessary to scale each frame of the video acquired by the image acquisition module to an image of a specified scale before detecting the moving object.
In the technical solution of the present disclosure, inter-frame difference information is obtained by differencing the current frame image and the previous frame image; a multi-frame image set is determined from several historical frame images before the current frame image; a first motion region is determined from the inter-frame difference information of the multi-frame image set; in the subsequent image recognition process the first motion region of the current frame image is scaled to obtain a detected region of a predetermined scale; and the moving target object is determined based on the detected region. Because only the first motion region is scaled and the part outside it does not participate in the calculation, the amount of computation in image scaling is greatly reduced and calculation efficiency is greatly improved.
Exemplary method
Fig. 1 shows a moving object detection method provided by an exemplary embodiment of the present disclosure, including:
step 101, determining inter-frame difference information of a current frame image and a previous frame image based on the current frame image and the previous frame image of the current frame image;
in some embodiments, the current frame image refers to an image of a frame image being processed during image processing; the previous frame image refers to a frame image that precedes and is adjacent to the current frame image; the inter-frame difference information refers to a difference image obtained by performing difference calculation on the current frame image and the previous frame image and the brightness or gray scale information of each pixel in the difference image.
102, determining a multi-frame image set based on the current frame image and a preset number of historical frame images before the current frame image;
in some embodiments, the historical frame images are multiple frames before the current frame image; in some preferred embodiments they are consecutive frames immediately before the current frame image. The multi-frame image set is a set containing several frame images preceding the current frame image. The number of historical frame images can be chosen according to the real-time requirement of video processing: when the real-time requirement is high, a single historical frame image is used; when it is low, two or three historical frame images may be chosen. In practice the number may also be set according to the bandwidth of the system-on-chip (SoC): when the SoC is bandwidth-sensitive, a single historical frame image may be used so that each frame computes and updates the motion region, and when bandwidth is sufficient, the accumulated difference result of consecutive multiple frames may be computed.
103, determining a first motion area in the current frame image based on the inter-frame difference information of each frame image in the multi-frame image set;
in some embodiments, the inter-frame difference information is the difference image obtained by differencing the current frame image and the previous frame image, together with the luminance or grayscale information of each pixel in that difference image. Several consecutive pieces of inter-frame difference information in the multi-frame image set are accumulated to obtain an accumulation result. When the accumulated value at a particular pixel exceeds a specified threshold, the corresponding pixel in the last frame of the multi-frame image set is considered a motion pixel, and the positions occupied by the motion pixel and the set of pixels within a predetermined distance around it form a first motion region. Because accumulating the inter-frame difference information of the multi-frame image set accumulates the motion pixels of all the frames, the result contains the motion pixels of every frame, so a moving object produces ghosting along its direction of motion in the accumulation result; the number of frame images in the multi-frame image set of the previous step can therefore be kept small, for example no more than 3, to avoid ghosting.
104, determining at least one detected area with a preset scale based on the first motion area;
in some embodiments, the first moving area is scaled to obtain at least one detected area with a predetermined scale, and in a subsequent processing process, the detected area with the predetermined scale is sent to a module for detecting a moving target object, where the module for detecting the moving target object may be an SOC chip, an FPGA chip, an ASIC chip, or the like, for example, an SOC chip capable of performing a neural network operation. The predetermined scale is determined by a scale supported by a module for detecting a moving object, and the number of detected regions is determined by a scale of the first moving region and a scale supported by a module for detecting a moving object.
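As an illustration of this step, the following is a minimal Python sketch, assuming OpenCV is available; the region format (left, top, width, height) and the 224 × 224 target scale are assumptions standing in for whatever input scale the detection module actually supports:

    import cv2  # assumed available; any image-resize routine would serve

    DETECTOR_SIZE = (224, 224)  # hypothetical scale supported by the detection module

    def extract_detected_region(frame, region, size=DETECTOR_SIZE):
        # Crop the first motion region and scale it to the predetermined scale;
        # pixels outside the region never enter the computation.
        x, y, w, h = region
        crop = frame[y:y + h, x:x + w]
        return cv2.resize(crop, size)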
Step 105, determining a moving target object in the detected area based on the detected area in the current frame image.
In some embodiments, the detected region is a partial image of the current frame image having a moving target object, and the detection of the moving target object is performed based on the detected region, so that the amount of calculation can be reduced, and the accuracy of detection can be ensured. The moving object is an object whose position has changed between the current frame image and the previous frame image before the current frame image, and is, for example, a traveling vehicle, a running animal, a walking person, or the like.
In the technical solution of the present disclosure, the first motion region in the current frame image is determined from the difference information in the multi-frame image set, a detected region of a predetermined scale is then obtained from the first motion region, and the moving target object is detected in that detected region, which reduces the amount of calculation and improves calculation efficiency.
As shown in fig. 2, on the basis of the above embodiment of fig. 1, step 101 may further include the following steps:
step 1011, determining a corresponding pixel difference value of a predetermined color channel in a predetermined color space based on the pixel values corresponding to the current frame image and the previous frame image;
in some embodiments, each frame image has pixel values in the predetermined color channels of a predetermined color space; when calculating a pixel difference value, the pixel values of a predetermined color channel of the current frame image and of the previous frame image are subtracted to obtain the pixel difference value. Taking calculation of the current frame image and the previous frame image in YUV space as an example, the pixel value of the Y channel of the current frame image is subtracted from the pixel value of the Y channel of the previous frame image to obtain the pixel difference value of the two frames in the Y channel.
Taking the calculation of the current frame image and the previous frame image in the YUV space as an example, the calculation process is as follows:
the pixel difference value calculation formula corresponding to the Y channel is as follows:
|Y(x,y,t)–Y(x,y,t-1)|;
the pixel difference value calculation formula corresponding to the U channel is as follows:
|U(x,y,t)–U(x,y,t-1)|;
the pixel difference value calculation formula corresponding to the V channel is as follows:
|V(x,y,t)–V(x,y,t-1)|;
wherein x and y are coordinates of the image; t, t-1 is the number of frame images of the current frame image and the previous frame image; y, U, V respectively denote three channels, and specifically, Y (x, Y, t) denotes a pixel value at coordinates (x, Y) in the Y channel in the YUV space of the current frame image.
It should be understood by those skilled in the art that the above description is only given by way of example of a calculation process in YUV space, and that the calculation process may also be performed in RGB, HSV or HSL space.
Step 1012, determining the inter-frame difference information based on the pixel difference value and a predetermined weight value.
In some embodiments, the pixel difference value is the per-channel difference in a predetermined color space computed in the previous step, and the differences are weighted by predetermined weight values to obtain the inter-frame difference information. The weight values are set according to the moving target object of interest in the video and the background of the picture. For example, if the information of the moving target object is concentrated mainly in the Y channel, the identification requirement on the Y channel is high, so the weight value of the Y channel is set high; the Y-channel pixel difference then has a large influence on the inter-frame difference information, which helps identify the Y-channel information. Conversely, if the background information is concentrated mainly in the V channel, the background need not be attended to during detection, so the weight value of the V channel is set low; the V-channel pixel difference then has little influence on the inter-frame difference information, which helps suppress the background and highlight the Y-channel information.
Still taking the current frame image and the previous frame image to calculate in YUV space as an example, the calculation process is as follows:
D(x,y,t) = |Y(x,y,t) – Y(x,y,t-1)| × W0
+ |U(x,y,t) – U(x,y,t-1)| × W1
+ |V(x,y,t) – V(x,y,t-1)| × W2;
where W0 + W1 + W2 = 100%.
In the above formula, D(x, y, t) represents the inter-frame difference information of the pixel of the current frame image at coordinates (x, y); Y(x, y, t) represents the pixel value of the current frame image at coordinates (x, y) of the Y channel in YUV space, and Y(x, y, t-1), U(x, y, t), U(x, y, t-1), V(x, y, t), and V(x, y, t-1) are analogous; W0, W1, and W2 denote the weight values of the pixel difference values of the three color channels. In some embodiments the weight values are set according to the moving target object of interest and the background of the picture, as described above: a channel carrying the target information (for example Y) is given a high weight so that its difference strongly influences the inter-frame difference information, while a channel carrying mostly background information (for example V) is given a low weight so that its difference has little influence.
Confirming the inter-frame difference information by the method of this embodiment allows the pixel difference values of all color channels to be taken into account, making the calculation result sensitive to the gray-scale changes of the moving object. Of course, in some scenarios, for example to adapt to scenes such as signal lights, only one or several of the YUV components may be attended to, making the result sensitive to color changes of objects in the image.
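As a concrete reading of the weighted difference above, here is a minimal NumPy sketch, assuming frames arrive as H × W × 3 arrays in Y, U, V channel order; the weight values are illustrative only:

    import numpy as np

    def inter_frame_difference(curr, prev, weights=(0.6, 0.2, 0.2)):
        # D = |dY|*W0 + |dU|*W1 + |dV|*W2, with W0 + W1 + W2 = 100%.
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        w = np.asarray(weights, dtype=np.float32)
        return (diff * w).sum(axis=2)  # H x W map of inter-frame difference information

A scene that should be sensitive to color changes (for example signal lights) would simply shift the weights toward the U and V channels.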
As shown in fig. 3, on the basis of the embodiment shown in fig. 1, the step 103 may further include the following steps:
step 1031, accumulating the inter-frame difference information of each frame image in the multi-frame image set, and determining the accumulated difference information of the multi-frame image set;
in some embodiments, the inter-frame difference information of each frame image in the multi-frame image set is accumulated, and the accumulation result is used as the accumulated difference information of the multi-frame image set; this accumulated difference information is subsequently used to determine the first motion region of the current frame image. The calculation formula of the accumulated difference information is as follows:
C(x,y,Δt)=D(x,y,t1)+D(x,y,t1+1)+…+D(x,y,t2);
where C(x, y, Δt) represents the accumulated pixel difference result at point (x, y) over the multi-frame image set spanning frames t1 through t2, and D(x, y, t1) represents the pixel difference value at (x, y) of frame t1.
Step 1032, determining a motion pixel point in the current frame image based on a comparison result of the accumulated difference information and a predetermined threshold;
in some embodiments, the predetermined threshold is set according to the random noise of the sensor acquiring the image: the larger the sensor noise, the larger the predetermined threshold is generally set, the purpose being to filter out recognition errors caused by noise. For example, if two images are substantially identical but the threshold is set too small, a motion region will be falsely identified on them because of sensor noise.
Step 1033, determining a first motion region of the current frame image based on the motion pixel points in the current frame image.
In some embodiments, after the accumulated difference information of all pixels in the current frame image is determined, the pixels whose accumulated difference information is greater than the predetermined threshold are selected as motion pixels, and the positions they occupy form the first motion region. In general, to keep the moving target object whole, the pixels within a certain distance around each motion pixel are also included in the first motion region.
In the above embodiment, the inter-frame difference information in the multi-frame image set is accumulated, the moving pixel point is identified according to the accumulation result, and the moving region of the current frame is identified in the multi-frame image set, so that the calculation amount for identifying the moving region and the calculation amount for detecting a subsequent moving target object can be reduced, and the efficiency for identifying a moving object is greatly improved.
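A minimal sketch of steps 1031 and 1032, assuming the inter_frame_difference function sketched earlier and a purely illustrative threshold value:

    def motion_pixel_mask(frames, threshold=30.0):
        # frames: the multi-frame image set in temporal order (at least two frames).
        # Accumulate C(x, y, dt) = sum of D(x, y, t) over the set, then mark
        # pixels whose accumulated difference exceeds the threshold.
        acc = None
        for prev, curr in zip(frames, frames[1:]):
            d = inter_frame_difference(curr, prev)
            acc = d if acc is None else acc + d
        return acc > threshold  # boolean H x W mask of motion pixels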
As shown in fig. 4, step 1033 may further include the following steps based on the embodiment shown in fig. 3:
step 10331, determining a pixel point set whose distance from the motion pixel point is within the predetermined distance range based on the motion pixel point and a predetermined distance;
in some embodiments, the predetermined distance is set according to the scene of each frame image: it is set large when objects in the scene are large, and small when they are small. The pixel point set contains the motion pixel and the pixels whose distance from it is smaller than the predetermined distance. The purpose of this setting is to make the first motion region contain the whole moving target object. For example, when only part of a moving target object moves, using the motion pixels alone as the motion region clearly cannot contain the whole object; but with a reasonable predetermined distance, taking the pixels within that distance around each motion pixel as elements of the pixel set generally allows the first motion region to contain the moving target object.
Step 10332, determining the first motion region based on the set of pixel points.
In some embodiments, the positions in the current frame image of the motion pixels and of the pixels within a predetermined distance around them are combined to obtain a first motion region. For example, when the predetermined distance is set to 16 pixels, the first motion region determined by a single motion pixel is 32 × 32 pixels. The first motion region then contains both the motion pixel and the pixels within the predetermined distance around it.
By adopting the technical solution of this embodiment, the positions occupied in the current frame image by the motion pixels and by the pixels within the predetermined distance around them are taken as the first motion region, which ensures that the whole moving target object is contained in the first motion region and facilitates its subsequent identification there.
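A sketch of how a single motion pixel expands into a first motion region under the 16-pixel predetermined distance of the example above; the clipping to image bounds is an assumption:

    def pixel_to_region(px, py, distance=16, shape=None):
        # The pixel set within the predetermined distance of a motion pixel
        # yields, per the example above, a 32 x 32 region around that pixel.
        left, top = max(px - distance, 0), max(py - distance, 0)
        right, bottom = px + distance, py + distance
        if shape is not None:  # keep the region inside the image
            bottom, right = min(bottom, shape[0]), min(right, shape[1])
        return (left, top, right, bottom)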
As shown in fig. 5, based on the embodiment shown in fig. 4, step 10332 may further include the following steps:
step 103321, determining a second motion region based on the set of pixel points;
in some embodiments, each motion pixel determines a pixel point set, and the positions occupied by the pixels of each set are combined to form a second motion region; each second motion region therefore contains not only a motion pixel but also the pixels within the predetermined distance around it. When the distance between two motion pixels is smaller than the predetermined distance, overlapping pixels will inevitably occur between the second motion regions determined by their corresponding pixel sets.
Step 103322 is to merge two or more second motion regions into a first motion region based on overlapping pixels of the two or more second motion regions.
In some embodiments, when two second motion regions have overlapping pixels, this indicates that one motion pixel lies within the second motion region corresponding to the other; the two motion pixels can then be regarded as belonging to the same moving target object, so the two second motion regions are merged into one first motion region. This ensures that the same moving target object is not split apart, which facilitates subsequent moving-target identification.
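A minimal sketch of this merging rule, representing second motion regions as (left, top, right, bottom) boxes; the repeated pairwise scan is for clarity, not efficiency:

    def overlaps(a, b):
        # True when two second motion regions share at least one pixel.
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def merge_overlapping(regions):
        # Merge overlapping second motion regions into first motion regions.
        regions, merged = list(regions), True
        while merged:
            merged = False
            for i in range(len(regions)):
                for j in range(i + 1, len(regions)):
                    if overlaps(regions[i], regions[j]):
                        a, b = regions[i], regions.pop(j)
                        regions[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                      max(a[2], b[2]), max(a[3], b[3]))
                        merged = True
                        break
                if merged:
                    break
        return regions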
As shown in fig. 6, based on the embodiment shown in fig. 4, step 10332 may further include the following steps:
step 103321, determining a second motion region based on the set of pixel points;
in some embodiments, each motion pixel point can determine a pixel point set, and the positions occupied by the pixel points of each pixel set are combined to form the second motion region. When the distance between two adjacent second motion areas is small, the two second motion areas can be generally considered as motion areas on the same object.
Step 103323, determining a merging result of two adjacent second motion areas based on the distance between the two adjacent second motion areas;
in some embodiments, when part of a moving object resembles the background image, the second motion regions recognized for the same moving object may be discontinuous. To ensure that the same moving object is assigned to the same first motion region, whether to merge two adjacent second motion regions is determined by the distance between them: when the distance is not less than a specific distance, the two second motion regions are kept as two separate first motion regions. The specific distance may be set according to the scene of the picture in the video or the moving object to be recognized.
And 103324, merging two adjacent second motion areas with the distance meeting the preset condition into the first motion area based on the merging result.
In some embodiments, the predetermined condition may be that the distance between the two second motion regions is less than a specific distance; based on the above steps, two second motion regions in the current frame image whose distance is smaller than the specific distance are merged into one first motion region. The specific distance should be set according to the scene of the picture in the video or the moving object to be identified: for example, when the moving object occupies a large proportion of the video image the specific distance can be set large, and when the scene's colors are uniform it can be set small. Merging the two nearby second motion regions ensures that the same moving target object is assigned to the same first motion region. If the specific distance is set too small, a complete moving object may be split so that subsequent modules cannot identify it correctly; if it is set too large, the first motion region becomes larger and the computation for its subsequent scaling and for recognition of the detected region increases.
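The fig. 6 variant differs from the overlap-based merge only in its condition; a sketch of the gap test, with the specific distance value purely illustrative:

    SPECIFIC_DISTANCE = 8  # hypothetical; tuned to the scene as described above

    def gap(a, b):
        # Axis-aligned gap between two second motion regions (0 if they touch or overlap).
        dx = max(b[0] - a[2], a[0] - b[2], 0)
        dy = max(b[1] - a[3], a[1] - b[3], 0)
        return max(dx, dy)

    # Reuse merge_overlapping() from the earlier sketch, replacing its overlaps(a, b)
    # test with gap(a, b) < SPECIFIC_DISTANCE so that nearby regions merge as well.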
As shown in fig. 7, on the basis of the embodiment shown in fig. 1, the method further includes the following steps:
step 106, determining a reference image based on the comparison result of the current frame image and a predetermined condition;
in some embodiments, the pictures in the video show scenes whose background changes slowly; in this case reference images need to be used intermittently, which helps the system track the slowly changing background. The predetermined condition may be determined by the background of the picture in the video or by its moving objects. For example, a current frame image may be selected as a reference image every fixed number of historical frame images according to how fast the background changes: when the background changes quickly the interval is small, and when it changes slowly the interval is large. Whether the current frame image is used as the reference image can also be determined by the proportion of the first motion region in the current frame image.
Step 107, successively performing down-sampling processing on the reference image to obtain at least one detected reference image with a predetermined scale;
in some embodiments, the reference image is likewise limited by the scale supported by the module that detects the moving target object (for example, an SoC chip capable of performing neural network operations); therefore the reference image must be down-sampled at least once to obtain at least one detected reference image of a predetermined scale. Since a single down-sampling yields an image no smaller than 1/2 of the original scale, when the scale of the reference image differs greatly from the predetermined scale the reference image must be down-sampled several times, each down-sampling producing one detected reference image of a predetermined scale, until a detected reference image of the scale supported by the detection module is obtained. The number of detected reference images is therefore determined by the scale supported by the detection module and by the scale of the reference image; the predetermined scale may be determined by the scale supported by the detection module.
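A minimal sketch of the successive down-sampling, assuming OpenCV's pyrDown (which halves each dimension per step); the stopping scale is an assumption standing in for the scale supported by the detection module:

    import cv2

    def downsample_pyramid(reference, min_side=224):
        # Halve the reference image repeatedly, keeping each level as a
        # detected reference image, until the next level would drop below
        # the scale supported by the detection module.
        levels = [reference]
        while min(levels[-1].shape[:2]) // 2 >= min_side:
            levels.append(cv2.pyrDown(levels[-1]))
        return levels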
Step 108, based on the detected reference image of the at least one predetermined scale, determining the moving object in the reference image.
In some embodiments, after the detected reference image is obtained, the detected reference image is sent to a module for detecting a moving object, such as an SOC chip capable of performing neural network operations. The module is used for detecting the moving target object in the reference image.
In the above embodiment, by selecting the reference image at intervals, the background change of the picture in the video can be identified, so that the influence of the background change on the identification of the moving object is eliminated. For example, with the change of time, the illumination change caused by the rising and falling of the sun can affect the brightness of each frame of image in the video.
On the basis of the embodiment shown in fig. 7, step 106 includes: and determining a reference image based on the number of frame images of the current frame image and the previous reference image interval.
In some embodiments, the reference image is determined statically: one frame in every fixed number of frames is taken as the reference image. Call the reference image an A frame and a frame in which only the first motion region is scaled a B frame; by setting a period T, the detected reference image obtained by scaling the reference image is periodically sent to the moving-target recognition module every T. For example, for a video with a frame rate of 50 Hz and T = 100 milliseconds, one frame in every five is taken as the reference image, and the arrangement of reference images and motion-region-scaled frames is ABBBBABBBB…. Since the frame rate is fixed within one video, each frame occupies a fixed time, so defining a fixed period is equivalent to determining the number of interval frames. Of course, a fixed number of interval frames can also be set directly: for example, with an interval of 3 frames, one frame in every four is taken as the reference image, and the full-image-scaled frames and motion-region-scaled frames are arranged as ABBBABBB.
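The static scheme reduces to a modulo test on the frame index; a one-line sketch with the interval value from the ABBB example:

    def is_reference_frame(frame_index, interval=4):
        # One reference (A) frame every `interval` frames, the rest are B frames:
        # interval=4 gives the arrangement A B B B A B B B ...
        return frame_index % interval == 0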
By setting the reference image in the manner of this embodiment, the influence caused by slow change of the background environment can be eliminated, which is beneficial to accurately identifying the moving target object.
As shown in fig. 8, on the basis of the embodiment shown in fig. 7, step 106 may further include the following steps:
step 1061, determining the proportion of the first motion area in the current frame image to the current frame image;
in some embodiments, the ratio of the first motion region to the current frame image is a ratio of the number of pixels in the first motion region to the number of all pixels in the current frame image. In some embodiments, the first motion region may be multiple, and when the first motion region has multiple numbers, the ratio refers to a ratio of a sum of the numbers of pixels in the multiple first motion regions to the number of all pixels in the current frame image.
Step 1062, determining a reference image based on the comparison result of the ratio and the predetermined threshold range.
In some embodiments, when the detected motion region exceeds a certain proportion of the full image, the full image is automatically scaled and sent to the moving-object detection module for processing; the proportion can be configured for different application scenes. For example, in scenes with dense flows of people, traffic, or animal activity, setting the proportion smaller helps capture the activity details of the picture; the ratio of A frames to B frames is then dynamically changing. For example, if the first motion regions of the first, fourth to sixth, and twelfth frame images exceed the set proportion of the full image, those frames are taken as reference frames, while the other frames, whose first motion regions do not exceed the proportion, have only their first motion regions scaled; the arrangement of reference images and motion-region-scaled frames is then ABBAAABBBBBABB.
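A sketch of the dynamic decision, assuming first motion regions as (left, top, right, bottom) boxes that do not overlap after merging; the threshold value is illustrative and, as noted above, configurable per scene:

    def needs_full_image(motion_regions, frame_shape, ratio_threshold=0.4):
        # Treat the frame as an A (reference) frame when the first motion
        # regions cover more than ratio_threshold of the full image.
        h, w = frame_shape[:2]
        covered = sum((r - l) * (b - t) for l, t, r, b in motion_regions)
        return covered / float(h * w) > ratio_threshold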
By adopting the technical scheme of the embodiment, the reference image can be determined according to the proportion of the moving target object in the current frame image, so that the activity details of the picture moving target object in the video can be fully identified.
It will be appreciated by those skilled in the art that the above-described ways of setting the reference image, both static and dynamic, can be switched. For example, the ratio (or absolute number) of the moving pixels in the unit time is detected, and when the ratio is low, the reference image is set to be selected in a static mode. For another example, according to the scene, it is specified that the reference image is selected in a static manner at night and in a dynamic manner in the day.
On the basis of the embodiment shown in fig. 7, the following steps may be further included after step 107:
and determining a detected image of a predetermined scale corresponding to the current frame image, based on the detected region of the current frame image and the detected reference image of a reference image preceding the current frame image.
In some embodiments, this step can be implemented in two ways.

The first is as follows: from the coordinates of the first motion region, the position of the motion region relative to the start of the image is obtained, which can be converted into an address offset relative to the start of the image. When the module for detecting the moving target object needs the full-image scaling result of the current frame image, the scaling result of the detected region and the detected reference image of the frame before the current frame image can be read out simultaneously, and the detected reference image and the detected region are combined according to the address offset of the first motion region; the combined result is the full-image scaling result of the current frame image.

The second is as follows: if the module for detecting the moving target object needs the full-image scaling result of the current frame image directly, without passing through the storage unit, the previously stored detected reference image can be read out during the calculation of the detected region. According to whether the current image pixel lies in a motion region, it is decided whether to combine the detected region with the previous detected reference image and output the result to the subsequent processing module, or to output the previous detected reference image alone: if the pixel lies in the first motion region, the combination of the detected region and the previous detected reference image is output; otherwise the previous detected reference image is output directly for subsequent processing. This processing transmits the predetermined-scale detected region obtained from the motion region of the current frame image to the detection module directly, saving the steps of writing the detected region into the storage unit and reading it back out, and greatly reducing image-processing latency.
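A sketch of the first implementation described above, in which the scaled detected region is written into the previous detected reference image at the address offset of the first motion region; the text speaks of combining the two at that offset, and overwriting the motion-region pixels is one plausible reading of it (arrays assumed to be NumPy images):

    def compose_full_scale(prev_reference, detected_region, offset):
        # Combine the previous detected reference image with the scaled detected
        # region at the offset of the first motion region, producing the
        # full-image scaling result of the current frame.
        x, y = offset  # offset relative to the start of the image
        out = prev_reference.copy()
        h, w = detected_region.shape[:2]
        out[y:y + h, x:x + w] = detected_region
        return out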
Exemplary devices
Fig. 9 shows a moving object detection device according to an exemplary embodiment of the present disclosure, including:
a difference information obtaining module 901, configured to determine inter-frame difference information of a current frame image and a previous frame image based on the current frame image and the previous frame image of the current frame image;
in some embodiments, the current frame image refers to an image of a frame image being processed during image processing; the previous frame image refers to a frame image that precedes and is adjacent to the current frame image; the inter-frame difference information refers to a difference image obtained by performing difference calculation on the current frame image and the previous frame image and the brightness or gray scale information of each pixel in the difference image.
A multi-frame image set determining module 902, configured to determine a multi-frame image set based on the current frame image and a predetermined number of historical frame images before the current frame image;
in some embodiments, the historical frame images are multiple frames before the current frame image; in some preferred embodiments they are consecutive frames immediately before the current frame image. The multi-frame image set is a set containing several frame images preceding the current frame image. The number of historical frame images can be chosen according to the real-time requirement of video processing: when the real-time requirement is high, a single historical frame image is used; when it is low, two or three historical frame images may be chosen. In practice the number may also be set according to the bandwidth of the system-on-chip (SoC): when the SoC is bandwidth-sensitive, a single historical frame image may be used so that each frame computes and updates the motion region, and when bandwidth is sufficient, the accumulated difference result of consecutive multiple frames may be computed.
A first motion region obtaining module 903, configured to determine a first motion region in the current frame image based on the inter-frame difference information of each frame image in the multi-frame image set;
in some embodiments, the inter-frame difference information is the difference image obtained by differencing the current frame image and the previous frame image, together with the luminance or grayscale information of each pixel in that difference image. Several consecutive pieces of inter-frame difference information in the multi-frame image set are accumulated to obtain an accumulation result. When the accumulated value at a particular pixel exceeds a specified threshold, the corresponding pixel in the last frame of the multi-frame image set is considered a motion pixel, and the positions occupied by the motion pixel and the set of pixels within a predetermined distance around it form a first motion region. Because accumulating the inter-frame difference information of the multi-frame image set accumulates the motion pixels of all the frames, the result contains the motion pixels of every frame, so a moving object produces ghosting along its direction of motion in the accumulation result; the number of frame images in the multi-frame image set of the previous step can therefore be kept small, for example no more than 3, to avoid ghosting.
A scale conversion module 904 for determining a detected region of at least one predetermined scale based on the first motion region;
in some embodiments, the first moving area is scaled to obtain at least one detected area with a predetermined scale, and in a subsequent processing process, the detected area with the predetermined scale is sent to a module for detecting a moving target object, where the module for detecting the moving target object may be an SOC chip, an FPGA chip, an ASIC chip, or the like, for example, an SOC chip capable of performing a neural network operation. The predetermined scale is determined by a scale supported by a module for detecting a moving object, and the number of detected regions is determined by a scale of the first moving region and a scale supported by a module for detecting a moving object.
A moving object detection module 905, configured to determine a moving object in the detected region based on the detected region in the current frame image.
In some embodiments, the detected region is a partial image of the current frame image having a moving target object, and the detection of the moving target object is performed based on the detected region, so that the amount of calculation can be reduced, and the accuracy of detection can be ensured. The moving object is an object whose position has changed between the current frame image and the previous frame image before the current frame image, and is, for example, a traveling vehicle, a running animal, a walking person, or the like.
In the technical solution of the present disclosure, the first motion region in the current frame image is determined from the difference information in the multi-frame image set, a detected region of a predetermined scale is then obtained from the first motion region, and the moving target object is detected in that detected region, which reduces the amount of calculation and improves calculation efficiency.
As shown in fig. 10, on the basis of the embodiment of fig. 9, the difference information obtaining module 901 may further include the following sub-modules:
a pixel difference sub-module 9011, configured to determine, based on a pixel value corresponding to the current frame image and the previous frame image, a corresponding pixel difference value of a predetermined color channel in a predetermined color space;
in some embodiments, each frame image has pixel values in the predetermined color channels of a predetermined color space; when calculating a pixel difference value, the pixel values of a predetermined color channel of the current frame image and of the previous frame image are subtracted to obtain the pixel difference value. Taking calculation of the current frame image and the previous frame image in YUV space as an example, the pixel value of the Y channel of the current frame image is subtracted from the pixel value of the Y channel of the previous frame image to obtain the pixel difference value of the two frames in the Y channel.
Taking the calculation of the current frame image and the previous frame image in the YUV space as an example, the calculation process is as follows:
the pixel difference value calculation formula corresponding to the Y channel is as follows:
|Y(x,y,t)–Y(x,y,t-1)|;
the pixel difference value calculation formula corresponding to the U channel is as follows:
|U(x,y,t)–U(x,y,t-1)|;
the pixel difference value calculation formula corresponding to the V channel is as follows:
|V(x,y,t)–V(x,y,t-1)|;
wherein x and y are coordinates of the image; t, t-1 is the number of frame images of the current frame image and the previous frame image; y, U, V respectively denote three channels, and specifically, Y (x, Y, t) denotes a pixel value at coordinates (x, Y) in the Y channel in the YUV space of the current frame image.
It should be understood by those skilled in the art that the above description is only given by way of example of a calculation process in YUV space, and that the calculation process may also be performed in RGB, HSV or HSL space.
An inter-frame difference sub-module 9012, configured to determine the inter-frame difference information based on the pixel difference value and a predetermined weight value.
In some embodiments, the pixel difference value is the per-channel difference in a predetermined color space computed in the previous step, and the differences are weighted by predetermined weight values to obtain the inter-frame difference information. The weight values are set according to the moving target object of interest in the video and the background of the picture. For example, if the information of the moving target object is concentrated mainly in the Y channel, the identification requirement on the Y channel is high, so the weight value of the Y channel is set high; the Y-channel pixel difference then has a large influence on the inter-frame difference information, which helps identify the Y-channel information. Conversely, if the background information is concentrated mainly in the V channel, the background need not be attended to during detection, so the weight value of the V channel is set low; the V-channel pixel difference then has little influence on the inter-frame difference information, which helps suppress the background and highlight the Y-channel information.
Still taking the current frame image and the previous frame image to calculate in YUV space as an example, the calculation process is as follows:
D(x,y,t) = (|Y(x,y,t) – Y(x,y,t-1)| × W0) +
(|U(x,y,t) – U(x,y,t-1)| × W1) +
(|V(x,y,t) – V(x,y,t-1)| × W2)
where W0 + W1 + W2 = 100%.
In the above formula, D(x, y, t) represents the inter-frame difference information of the pixel of the current frame image at coordinates (x, y); Y(x, y, t) represents the pixel value of the current frame image at coordinates (x, y) in the Y channel of YUV space, and Y(x, y, t-1), U(x, y, t), U(x, y, t-1), V(x, y, t), and V(x, y, t-1) are defined similarly; W0, W1, and W2 denote the weight values of the pixel difference values of the three color channels, set as described above according to the moving target object of interest and the background of the picture in the video.
By determining the inter-frame difference information in this way, the pixel difference values of all color channels are taken into account, so the calculation result is sensitive to grayscale changes of the moving object. Of course, in some scenarios, for example to adapt to scenes such as signal lights, only one or more of the YUV components may be considered, so that the calculation result is sensitive to color changes of objects in the image.
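As a sketch of the weighted combination above (assuming NumPy, an H×W×3 YUV layout, and multiplication by fractional weights in place of fixed-point weighting):

```python
import numpy as np

def inter_frame_difference(curr_yuv: np.ndarray, prev_yuv: np.ndarray,
                           weights=(0.6, 0.2, 0.2)) -> np.ndarray:
    """Weighted inter-frame difference D(x,y,t) over the Y, U, V channels.
    The weights correspond to W0, W1, W2 and must sum to 1 (i.e. 100%);
    raising a channel's weight makes D more sensitive to that channel."""
    assert abs(sum(weights) - 1.0) < 1e-6, "W0 + W1 + W2 must equal 100%"
    d = np.zeros(curr_yuv.shape[:2], dtype=np.float32)
    for c, w in enumerate(weights):
        diff = np.abs(curr_yuv[..., c].astype(np.int16) -
                      prev_yuv[..., c].astype(np.int16))
        d += w * diff  # accumulate the weighted per-channel difference
    return d
```

Weights such as (1.0, 0.0, 0.0) reproduce the grayscale-sensitive case, while (0.0, 0.5, 0.5) would make the result sensitive only to chroma changes, e.g. for signal lights.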
As shown in fig. 11, on the basis of the embodiment shown in fig. 9, the first motion region acquiring module 903 may further include the following sub-modules:
the difference accumulation submodule 9031 is configured to accumulate inter-frame difference information of each frame of image in the multi-frame image set, and determine accumulated difference information of the multi-frame image set;
in some embodiments, the inter-frame difference information of each frame image in the multi-frame image set is accumulated, and the accumulated result is used as the accumulated difference information of the multi-frame image set; this accumulated difference information is subsequently used to determine the first motion region of the current frame image. The specific calculation formula of the accumulated difference information is as follows:
C(x,y,Δt)=D(x,y,t1)+D(x,y,t1+1)+…+D(x,y,t2);
where C(x, y, Δt) represents the accumulated difference information at point (x, y) over the frames t1 to t2; D(x, y, t1) represents the inter-frame difference information at (x, y) of the t1-th frame image.
A moving pixel point determining submodule 9032, configured to determine a moving pixel point in the current frame image based on a comparison result between the accumulated difference information and a predetermined threshold;
in some embodiments, the predetermined threshold is set according to the stray noise of the sensor that acquires the images: the larger the stray noise of the sensor, the larger the predetermined threshold is generally set. The purpose of the predetermined threshold is to filter out identification errors caused by stray noise. For example, if two images are substantially identical but the threshold is set too small, a motion region may be falsely identified on them purely because of sensor stray noise.
The first motion region sub-module 9033 determines a first motion region of the current frame image based on the motion pixel points in the current frame image.
In some embodiments, after the accumulated difference information of all pixel points in the current frame image is determined, the pixels whose accumulated difference information is greater than the predetermined threshold are screened out as motion pixel points, and the positions occupied by the motion pixel points form the first motion region. In general, to ensure the integrity of the moving target object, pixels within a certain distance around each motion pixel point are also included in the first motion region.
In the above embodiment, the inter-frame difference information over the multi-frame image set is accumulated, motion pixel points are identified from the accumulated result, and the motion region of the current frame is thereby identified within the multi-frame image set, which reduces both the amount of calculation for identifying the motion region and the amount of calculation for the subsequent detection of the moving target object.
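A minimal sketch of the accumulation and thresholding (NumPy assumed; the threshold value is a stand-in that would in practice be tuned to the sensor's stray noise):

```python
import numpy as np

def motion_pixel_mask(diff_stack: np.ndarray, threshold: float) -> np.ndarray:
    """Accumulate the per-frame difference maps D(x,y,t1)..D(x,y,t2) into
    C(x,y,dt) and compare against the predetermined threshold; True entries
    mark motion pixel points of the current frame image."""
    accumulated = diff_stack.sum(axis=0)  # C(x, y, dt)
    return accumulated > threshold

# diff_stack holds one D map per frame of the multi-frame image set.
diffs = np.stack([np.full((480, 640), 8.0, dtype=np.float32)] * 5)
mask = motion_pixel_mask(diffs, threshold=30.0)  # 5 * 8 = 40 > 30 -> motion
```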
As shown in fig. 12, on the basis of the embodiment shown in fig. 11, the first motion region sub-module 9033 may further include the following units:
a pixel point set determining unit 90331, configured to determine, based on the moving pixel point and a predetermined distance, a pixel point set whose distance from the moving pixel point is within the predetermined distance range;
in some embodiments, the predetermined distance is set according to the scene of each frame image: when the objects in the scene are large, the predetermined distance is set large; when they are small, it is set small. The pixel point set comprises the motion pixel point itself and the pixel points whose distance from it is smaller than the predetermined distance. The purpose of this setting is to make the whole moving target object fall inside the first motion region. For example, when only a part of a moving target object moves, using the motion pixel points alone as the motion region obviously cannot enclose the whole moving target object; but when a reasonable predetermined distance is set and the pixels within that distance around each motion pixel point are taken as elements of the pixel point set, the first motion region determined from the set can generally enclose the moving target object.
A first motion region determining unit 90332, configured to determine the first motion region based on the pixel point set.
In some embodiments, the positions of the motion pixel points in the current frame image and the positions of the pixels within the predetermined distance around them are combined to obtain the first motion region. For example, when the predetermined distance is set to 16, the first motion region determined by one motion pixel point has a dimension of 32 × 32. The first motion region then contains both the motion pixel point and the pixels within the predetermined distance around it.
By adopting the technical solution of this embodiment, the positions occupied in the current frame image by the motion pixel points and by the pixels within the predetermined distance around them are taken as the first motion region, which ensures that the whole moving target object is contained in the first motion region and facilitates the subsequent identification of the moving target object within it.
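A sketch of expanding each motion pixel point into its pixel point set (NumPy assumed; a production implementation would typically use a morphological dilation such as scipy.ndimage.binary_dilation instead of the explicit loop):

```python
import numpy as np

def expand_motion_pixels(mask: np.ndarray, distance: int = 16) -> np.ndarray:
    """For every motion pixel point, mark all pixels whose distance from it
    is within `distance`; the union of these pixel point sets is the mask
    of the first motion region(s)."""
    h, w = mask.shape
    region = np.zeros_like(mask, dtype=bool)
    for y, x in zip(*np.nonzero(mask)):
        y0, y1 = max(0, y - distance), min(h, y + distance + 1)
        x0, x1 = max(0, x - distance), min(w, x + distance + 1)
        region[y0:y1, x0:x1] = True  # square neighborhood around the pixel
    return region
```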
As shown in fig. 13, on the basis of the above-mentioned embodiment shown in fig. 12, the first motion region determining unit 90332 may further include the following sub-units:
a second motion region determining subunit 903321, configured to determine a second motion region based on the set of pixel points;
in some embodiments, each motion pixel point determines one pixel point set, and the positions occupied by the pixels of each set are combined into one second motion region. Because each second motion region contains not only the motion pixel point but also the pixels within the predetermined distance around it, when the distance between two motion pixel points is smaller than the predetermined distance, the second motion regions determined by their corresponding pixel point sets inevitably contain overlapping pixels.
A first merging subunit 903322, configured to merge two or more second motion regions into one first motion region based on overlapping pixel points shared by the two or more second motion regions.
In some embodiments, when two second motion regions have overlapping pixel points, one motion pixel point necessarily lies inside the second motion region corresponding to the other; the two motion pixel points can then be regarded as belonging to the same moving target object. The two second motion regions are therefore merged into one first motion region, which ensures that the same moving target object is not split apart and facilitates the subsequent identification of the moving target object.
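A sketch of the overlap-based merge, treating each second motion region as an axis-aligned box (x0, y0, x1, y1); the box representation and helper names are assumptions of the example:

```python
def merge_overlapping(boxes):
    """Repeatedly merge boxes that share pixels until no two boxes overlap;
    each surviving box is one first motion region."""
    def overlaps(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    def union(a, b):
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))

    boxes, merged = list(boxes), True
    while merged:
        merged, out = False, []
        while boxes:
            box = boxes.pop()
            for i, other in enumerate(out):
                if overlaps(box, other):
                    out[i] = union(box, other)  # absorb into existing region
                    merged = True
                    break
            else:
                out.append(box)
        boxes = out
    return boxes

# Two overlapping regions collapse into one; the distant one stays separate.
print(merge_overlapping([(0, 0, 20, 20), (10, 10, 40, 40), (100, 100, 120, 120)]))
```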
As shown in fig. 14, on the basis of the embodiment shown in fig. 12, the first motion region determining unit 90332 may further include the following sub-units:
a second motion region determining subunit 903321, configured to determine a second motion region based on the set of pixel points;
in some embodiments, each motion pixel point determines one pixel point set, and the positions occupied by the pixels of each set are combined into one second motion region. When the distance between two adjacent second motion regions is small, the two can generally be considered motion regions on the same object.
A merging result confirmation subunit 903323, configured to determine a merging result of two adjacent second motion regions based on a distance between the two adjacent second motion regions;
in some embodiments, when a part of the moving object resembles the image of the background area, the two second motion regions recognized for the same moving object may be discontinuous. To ensure that the same moving object is assigned to the same first motion region, whether to merge two adjacent second motion regions is decided according to the distance between them: when the distance is smaller than a specific distance, the two second motion regions are merged; when the distance is not smaller than the specific distance, they are kept as two separate first motion regions. The specific distance may be set according to the scene of the picture in the video or according to the moving object to be recognized.
A second merging subunit 903324, configured to merge two adjacent second motion areas with a distance meeting a preset condition into a first motion area based on the merging result.
In some embodiments, the preset condition may be that the distance between two second motion regions is smaller than the specific distance; based on the above steps, two second motion regions in the current frame image whose distance is smaller than the specific distance are merged into one first motion region. The specific distance should be set according to the scene of the picture in the video, or according to the moving object to be identified: for example, when the moving object occupies a large proportion of the video image, the specific distance can be set large; when the scene colors are uniform, it can be set small. Merging second motion regions whose distance is below the specific distance ensures that the same moving target object is assigned to one first motion region. If the specific distance is set too small, a complete moving object may be split and the subsequent modules may fail to identify it correctly; if it is set too large, the first motion region becomes larger, and the subsequent scaling of the first motion region and the calculation for identifying the detected region both increase.
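A sketch of the distance-based merge, reusing the box representation above; the gap measure (Chebyshev distance between box borders) and the threshold value are assumptions of the example:

```python
def box_gap(a, b):
    """Smallest axis-aligned gap between two boxes; 0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)

def merge_by_distance(boxes, specific_distance):
    """Merge adjacent second motion regions whose gap is below the specific
    distance into one first motion region, repeating until stable."""
    boxes, changed = list(boxes), True
    while changed:
        changed, out = False, []
        while boxes:
            box = boxes.pop()
            for i, other in enumerate(out):
                if box_gap(box, other) < specific_distance:
                    out[i] = (min(box[0], other[0]), min(box[1], other[1]),
                              max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                out.append(box)
        boxes = out
    return boxes

# With a specific distance of 8, two boxes 5 pixels apart are merged.
print(merge_by_distance([(0, 0, 10, 10), (15, 0, 25, 10)], specific_distance=8))
```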
As shown in fig. 15, on the basis of the embodiment shown in fig. 9, the apparatus further includes the following modules:
a reference image determining module 906 for determining a reference image based on a comparison result of the current frame image with a predetermined condition;
in some embodiments, the pictures in the video show a scene whose background changes slowly; in this case, reference images need to be used intermittently to help the system track the slowly changing background. The predetermined condition may be determined according to the background of the picture in the video, or according to its moving objects. For example, according to the background change speed, the current frame image may be selected as a reference image every fixed number of historical frame images: when the background changes quickly, the number of interval historical frame images is small; when it changes slowly, the number is large. Whether the current frame image is used as a reference image can also be determined according to the proportion of the first motion region in the current frame image.
A detected reference image determining module 907, configured to perform downsampling on the reference images successively to obtain at least one detected reference image with a predetermined scale;
in some embodiments, the reference image is likewise limited by the scale supported by the module that detects the moving target object (e.g., an SoC chip capable of performing neural network operations); the reference image therefore needs to be down-sampled at least once to obtain a detected reference image of at least one predetermined scale. Since a single down-sampling reduces the image scale to at most 1/2 of the original, when the scale of the reference image differs greatly from the predetermined scale the reference image must be down-sampled several times, each down-sampling yielding one frame of detected reference image, until a detected reference image of a scale supported by the detection module is obtained. The number of detected reference images is therefore determined by the scale supported by the detection module and the scale of the reference image. The predetermined scale may be determined by the scale supported by the module that detects moving objects.
A reference image object determination module 908 for determining the moving object in the reference image based on the detected reference image of the at least one predetermined scale.
In some embodiments, after the detected reference image is obtained, it is sent to the module for detecting the moving target object, such as an SoC chip capable of performing neural network operations, and this module detects the moving target object in the reference image.
In the above embodiment, by selecting reference images at intervals, background changes of the picture in the video can be identified, eliminating the influence of background change on the identification of the moving object. For example, as time passes, illumination changes caused by sunrise and sunset affect the brightness of every frame image in the video.
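A sketch of producing the detected reference images by successive factor-of-two down-sampling (2×2 averaging; NumPy assumed, with the predetermined scale expressed as a minimum side length for the purposes of the example):

```python
import numpy as np

def downsample_2x(img: np.ndarray) -> np.ndarray:
    """One down-sampling step: halve each dimension by 2x2 averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    f = img[:h, :w].astype(np.float32)
    return ((f[0::2, 0::2] + f[0::2, 1::2] +
             f[1::2, 0::2] + f[1::2, 1::2]) / 4.0).astype(img.dtype)

def detected_reference_images(reference: np.ndarray, min_side: int) -> list:
    """Down-sample the reference image successively, keeping each result,
    until the scale supported by the detection module is reached."""
    results, img = [], reference
    while min(img.shape[:2]) // 2 >= min_side:
        img = downsample_2x(img)
        results.append(img)
    return results  # at least one detected reference image

refs = detected_reference_images(np.zeros((1080, 1920), np.uint8), min_side=270)
print([r.shape for r in refs])  # [(540, 960), (270, 480)]
```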
On the basis of the embodiment shown in fig. 15, the reference image determining module 906 is configured to determine a reference image based on the number of frame images of the current frame image and the previous reference image.
In some embodiments, the reference image is determined statically: one frame image is taken as the reference image every fixed number of frame images. Denote the reference image as an A frame and a frame image in which only the first motion region is scaled as a B frame. A period T is set, and every period T the detected reference image obtained by scaling a reference image is sent to the moving target recognition module for processing. Taking a video with a frame rate of 50 Hz as an example, each frame occupies 20 milliseconds; when T is 100 milliseconds, one frame image is taken as a reference image every 5 frames, and the arrangement of the reference images and the frame images with first-motion-region scaling is ABBBBABBBB…. Since the frame rate is fixed within one video, the time occupied by each frame image is fixed, so defining a fixed period is equivalent to determining the number of interval frame images. Of course, a fixed number of interval frame images may also be set directly: for example, with an interval of 3 frame images, one frame image out of every four is taken as the reference image, and the frame images with full-image scaling and the frame images with first-motion-region scaling are arranged as ABBBABBB.
By setting the reference image in the manner of this embodiment, the influence of slow changes in the background environment can be eliminated, which is beneficial to accurately identifying the moving target object.
As shown in fig. 16, based on the embodiment shown in fig. 15, the reference image determining module 906 may further include the following modules:
the proportion confirming submodule 9061 is configured to determine a proportion of the first motion region in the current frame image to the current frame image;
in some embodiments, the proportion of the first motion region to the current frame image is the ratio of the number of pixels in the first motion region to the number of all pixels in the current frame image. There may be multiple first motion regions; in that case, the proportion refers to the ratio of the sum of the numbers of pixels in all first motion regions to the number of all pixels in the current frame image.
A reference image confirmation sub-module 9062 is configured to determine a reference image based on the comparison result of the ratio and the predetermined threshold range.
In some embodiments, when the detected motion region exceeds a certain proportion of the full image, the full image is automatically scaled and then sent to the moving object detection module for processing; the motion-region proportion can be configured for different application scenarios. For example, in a scene with dense flows of people, vehicles, or animal activity, setting the proportion smaller helps identify the activity details of the picture in the video. In this case the ratio of A frames to B frames changes dynamically. For example, if the first motion regions of the first frame image, the fourth to sixth frame images, and the twelfth frame image exceed the configured proportion of the full image, those frames are taken as reference frames, while the other frame images, whose first motion regions do not exceed the proportion, have only their first motion regions scaled; the arrangement of the reference images and the frame images with first-motion-region scaling is then ABBAAABBBBBABB.
By adopting the technical solution of this embodiment, the reference image can be determined according to the proportion of the moving target object in the current frame image, so that the activity details of the moving target object in the video picture can be fully identified.
It will be appreciated by those skilled in the art that the static and dynamic ways of setting the reference image described above can be switched between. For example, the proportion (or absolute number) of motion pixels per unit time can be monitored, and when the proportion is low the reference image is selected in the static mode. For another example, depending on the scene, the reference image can be selected statically at night and dynamically during the day.
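A sketch of the switchable selection policy (the period, ratio threshold, and function name are assumptions of the example):

```python
def is_reference_frame(frame_idx: int, motion_ratio: float, *,
                       static_mode: bool = True,
                       period: int = 4,
                       ratio_threshold: float = 0.25) -> bool:
    """Static mode: one reference image (A frame) every `period` frames,
    giving an A B B B ... pattern. Dynamic mode: a frame becomes a
    reference image whenever its first motion region covers more than
    `ratio_threshold` of the full image."""
    if static_mode:
        return frame_idx % period == 0
    return motion_ratio > ratio_threshold

# Static, period 4: frames 0, 4, ... are A frames (pattern ABBBABBB).
pattern = "".join("A" if is_reference_frame(i, 0.0) else "B" for i in range(8))
print(pattern)  # ABBBABBB
```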
On the basis of the embodiment shown in fig. 15, the apparatus may further include the following modules:
and a current-frame full-image scaling module, configured to determine a detected image of a predetermined scale corresponding to the current frame image based on the detected region of the current frame image and the detected reference image of the reference image of a frame image preceding the current frame image.
In some embodiments, this module may be implemented in two ways. The first way: the position of the motion region relative to the starting point of the image is obtained from the coordinates of the first motion region, and this relative position can be converted into an address offset relative to the image starting point. When the module for detecting the moving target object needs the full-image scaling result of the current frame image, the scaling result of the detected region and the detected reference image of a frame before the current frame image can be read out simultaneously; the detected reference image and the detected region are combined according to the address offset of the first motion region, and the combined result is the full-image scaling result of the current frame image. The second way: if the module for detecting the moving target object needs the full-image scaling result directly, without passing through a storage unit, the previous detected reference image stored in the storage unit can be read out during the calculation of the detected region; according to whether the current image pixel lies in a motion region, either the detected region combined with the previous detected reference image, or the previous detected reference image from the storage unit, is output to the subsequent processing module. That is, if the pixel is in the first motion region, the combined result is output; otherwise the previous detected reference image is output as-is. This processing delivers the detected region of the predetermined scale, obtained from the motion region of the current frame image, to the module that detects the moving target object, saving the steps of writing the detected region into the storage unit and reading it out again, and greatly reducing the latency of image processing.
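A sketch of the first way, composing the full-image scaling result from the previous detected reference image and the scaled detected region via its address offset (NumPy assumed; names are illustrative):

```python
import numpy as np

def compose_full_scaled(prev_detected_ref: np.ndarray,
                        detected_region: np.ndarray,
                        offset: tuple) -> np.ndarray:
    """Start from the previous detected reference image and write the scaled
    detected region in at its (row, col) address offset; the result is the
    full-image scaling result of the current frame image."""
    full = prev_detected_ref.copy()
    y, x = offset
    h, w = detected_region.shape[:2]
    full[y:y + h, x:x + w] = detected_region
    return full
```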
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 17. FIG. 17 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 17, the electronic device 11 includes one or more processors 111 and memory 112.
The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 11 to perform desired functions.
Memory 112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 111 to implement the moving object detection methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 11 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 113 may be, for example, a communication network connector for receiving an input signal.
The input device 113 may also include, for example, a keyboard, a mouse, and the like.
The output device 114 may output various information to the outside, and the output device 114 may include, for example, a display, a speaker, a printer, and a communication network and a remote output device connected thereto, and the like.
Of course, for simplicity, only some of the components of the electronic device 11 relevant to the present disclosure are shown in fig. 17, and components such as buses and input/output interfaces are omitted. In addition, the electronic device 11 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the moving object detection method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the moving object detection method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. However, the advantages, benefits, and effects mentioned in this disclosure are merely examples, not limitations, and should not be considered essential to the various embodiments of the present disclosure. Furthermore, the specific details disclosed above are for the purpose of illustration and ease of understanding only, not limitation; the disclosure is not limited to implementation with the specific details described above.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including", "comprising", and "having" are open-ended words that mean "including, but not limited to" and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or", unless the context clearly dictates otherwise. The phrase "such as" as used herein means, and is used interchangeably with, the phrase "such as but not limited to".
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A method of detecting a moving object, comprising:
determining inter-frame difference information of a current frame image and a previous frame image based on the current frame image and the previous frame image of the current frame image;
determining a multi-frame image set based on the current frame image and a preset number of historical frame images before the current frame image;
determining a first motion area in the current frame image based on the inter-frame difference information of each frame image in the multi-frame image set;
determining at least one detected region of a predetermined scale based on the first motion region;
and determining a moving target object in the detected area based on the detected area in the current frame image.
2. The method of claim 1, wherein determining inter-frame difference information for a current frame image and a previous frame image based on the current frame image and the previous frame image comprises:
determining a corresponding pixel difference value of a predetermined color channel in a predetermined color space based on the pixel values corresponding to the current frame image and the previous frame image;
determining the inter-frame difference information based on the pixel difference value and a predetermined weight value.
3. The method of claim 1, wherein determining a first motion region in the current frame image based on inter-frame difference information for each frame image in the set of multiple frame images comprises:
accumulating the interframe difference information of each frame of image in the multi-frame image set to determine the accumulated difference information of the multi-frame image set;
determining a motion pixel point in the current frame image based on the comparison result of the accumulated difference information and a preset threshold value;
and determining a first motion area of the current frame image based on the motion pixel points in the current frame image.
4. The method of claim 3, wherein determining the first motion region of the current frame image based on the motion pixel points in the current frame image comprises:
determining a pixel point set with the distance between the pixel point set and the motion pixel point within a preset distance range based on the motion pixel point and a preset distance;
determining the first motion region based on the set of pixel points.
5. The method of claim 4, wherein the determining the first motion region based on the set of pixel points comprises:
determining a second motion region based on the set of pixel points;
and merging two or more second motion regions into a first motion region based on overlapping pixel points shared by the two or more second motion regions.
6. The method of claim 4, wherein the determining the first motion region based on the set of pixel points comprises:
determining a second motion region based on the set of pixel points;
determining a combination result of two adjacent second motion areas based on the distance between the two adjacent second motion areas;
and merging two adjacent second motion areas with the distance meeting the preset condition into a first motion area based on the merging result.
7. The method of claim 1, wherein the method further comprises:
determining a reference image based on a comparison result of the current frame image with a predetermined condition;
successively carrying out down-sampling processing on the reference image to obtain at least one detected reference image with a preset scale;
and determining the moving target object in the reference image based on the detected reference image with the at least one predetermined scale.
8. A detection device for a moving object, comprising:
a difference information acquisition module, configured to determine inter-frame difference information of a current frame image and a previous frame image based on the current frame image and the previous frame image;
a multi-frame image set determination module, configured to determine a multi-frame image set based on the current frame image and a predetermined number of frame images before the current frame image;
a motion region acquisition module, configured to determine a first motion region in the current frame image based on the inter-frame difference information of each frame image in the multi-frame image set;
a scale conversion module, configured to determine at least one detected region of a predetermined scale based on the first motion region;
and a moving object detection module, configured to determine a moving target object in the detected region based on the detected region in the current frame image.
9. A computer-readable storage medium, which stores a computer program for executing the method for detecting a moving object according to any one of claims 1 to 7.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method for detecting a moving object according to any one of claims 1 to 7.
CN202011160883.4A 2020-10-29 2020-10-29 Moving object detection method and device, readable storage medium and electronic equipment Pending CN112287805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011160883.4A CN112287805A (en) 2020-10-29 2020-10-29 Moving object detection method and device, readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112287805A true CN112287805A (en) 2021-01-29

Family

ID=74373367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011160883.4A Pending CN112287805A (en) 2020-10-29 2020-10-29 Moving object detection method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112287805A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030169340A1 (en) * 2002-03-07 2003-09-11 Fujitsu Limited Method and apparatus for tracking moving objects in pictures
CN102307274A (en) * 2011-08-31 2012-01-04 南京南自信息技术有限公司 Motion detection method based on edge detection and frame difference
CN104700430A (en) * 2014-10-05 2015-06-10 安徽工程大学 Method for detecting movement of airborne displays
CN107886086A (en) * 2017-12-01 2018-04-06 中国农业大学 A kind of target animal detection method and device based on image/video
CN108109163A (en) * 2017-12-18 2018-06-01 中国科学院长春光学精密机械与物理研究所 A kind of moving target detecting method for video of taking photo by plane
CN110751678A (en) * 2018-12-12 2020-02-04 北京嘀嘀无限科技发展有限公司 Moving object detection method and device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419522A (en) * 2022-03-29 2022-04-29 以萨技术股份有限公司 Target object structured analysis method, device and equipment
CN115965653A (en) * 2022-12-14 2023-04-14 北京字跳网络技术有限公司 Light spot tracking method and device, electronic equipment and storage medium
CN115965653B (en) * 2022-12-14 2023-11-07 北京字跳网络技术有限公司 Light spot tracking method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination