WO2022135511A1 - Method and apparatus for positioning a moving object, and electronic device and storage medium - Google Patents

Method and apparatus for positioning a moving object, and electronic device and storage medium

Info

Publication number
WO2022135511A1
WO2022135511A1 PCT/CN2021/140765 CN2021140765W WO2022135511A1 WO 2022135511 A1 WO2022135511 A1 WO 2022135511A1 CN 2021140765 W CN2021140765 W CN 2021140765W WO 2022135511 A1 WO2022135511 A1 WO 2022135511A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
frame
area
moving object
sampling
Prior art date
Application number
PCT/CN2021/140765
Other languages
English (en)
Chinese (zh)
Inventor
马欣
吴臻志
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2022135511A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras

Definitions

  • the embodiments of the present disclosure relate to the technical field of image recognition, and in particular, to a method, apparatus, electronic device, and storage medium for positioning a moving object.
  • For the video image obtained by the camera assembly, feature extraction is usually performed directly on the global image, and moving objects are located in the image according to the extracted image features.
  • Embodiments of the present disclosure provide a method, device, electronic device, and storage medium for positioning a moving object.
  • An embodiment of the present disclosure provides a method for locating a moving object. The positioning method includes: acquiring event stream information through a dynamic vision sensor, and acquiring image information through a target camera component; sampling the event stream information according to a preset sampling period to obtain a sampling event frame; determining, according to the event stream information corresponding to the sampling event frame, the predicted position area of the moving object in the sampling event frame; and determining, according to the predicted position area, a location area in the image information that matches the predicted position area.
  • An embodiment of the present disclosure provides a positioning device for a moving object. The positioning device includes: an information acquisition module, configured to acquire event stream information through a dynamic vision sensor and to acquire image information through a target camera assembly; a sampling execution module, configured to sample the event stream information according to a preset sampling period to obtain a sampling event frame, and to determine the predicted location area of a moving object in the sampling event frame according to the event stream information corresponding to the sampling event frame; and a classification execution module, configured to determine, according to the predicted position region, a positioning region in the image information that matches the predicted position region.
  • Embodiments of the present disclosure provide an electronic device, the electronic device comprising: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for locating a moving object described in any embodiment of the present disclosure.
  • an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method for locating a moving object described in any embodiment of the present disclosure.
  • In the embodiments of the present disclosure, after the event stream information is acquired through the dynamic vision sensor, the predicted position area of the moving object in the sampling event frame is determined, and the matching positioning area is determined in the image information of the target camera assembly, which improves the positioning efficiency of moving objects and, in particular, the real-time detection of high-speed moving objects.
  • FIG. 1A is a schematic flowchart of a method for locating a moving object according to an embodiment of the present disclosure
  • FIG. 1B is a schematic flowchart of a method for determining a contour region of a moving object in a sampling event frame according to an embodiment of the present disclosure
  • FIG. 1C is a schematic diagram of a predicted position area of a moving object provided by an embodiment of the present disclosure;
  • FIG. 1D is a schematic flowchart of a method for determining a location area in image information that matches a predicted location area in an embodiment of the present disclosure;
  • FIG. 2 is a schematic flowchart of another method for locating a moving object provided by an embodiment of the present disclosure
  • FIG. 3 is a structural block diagram of a device for positioning a moving object provided by an embodiment of the present disclosure
  • FIG. 4 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 1A is a schematic flowchart of a method for locating a moving object according to an embodiment of the present disclosure.
  • The embodiment of the present disclosure can be used to detect whether there is a moving object in the image information captured by the target camera assembly, and to locate, identify and classify the moving object. The method can be performed by the device for positioning moving objects in the embodiments of the present disclosure; the device can be implemented by software and/or hardware and integrated in electronic equipment. The method specifically includes the following steps: steps S110 to S140.
  • In step S110, event stream information is acquired through a dynamic vision sensor, and image information is acquired through a target camera assembly.
  • A Dynamic Vision Sensor (DVS) is an image acquisition device that adopts an asynchronous per-pixel mechanism based on Address-Event Representation (AER).
  • Unlike a conventional camera, a DVS does not need to read out all the pixels in the picture; it only needs to obtain the addresses and information of the pixels whose light intensity changes.
  • When the sensor detects that the light intensity change of a certain pixel is greater than or equal to the preset threshold value, it sends out an event signal for that pixel: if the light intensity change is positive, that is, the pixel jumps from low brightness to high brightness, it sends out a “+1” signal; if the light intensity change is negative, that is, the pixel jumps from high brightness to low brightness, it sends out a “-1” signal. If the light intensity change is less than the preset threshold value, no event signal is sent and the pixel is marked as having no event. The dynamic vision sensor uses the event marking of each pixel to form the event stream information (an illustrative sketch of this labeling rule is given below).
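  • As an illustration only, the following Python sketch mimics the labeling rule above by differencing two intensity frames (the frame-difference formulation, threshold value and variable names are assumptions for illustration; an actual DVS performs this comparison asynchronously per pixel in hardware):

```python
import numpy as np

def label_events(prev_intensity, curr_intensity, threshold=15):
    """Label each pixel +1 (positive event), -1 (negative event) or 0 (no event)."""
    delta = curr_intensity.astype(np.int32) - prev_intensity.astype(np.int32)
    events = np.zeros(delta.shape, dtype=np.int8)
    events[delta >= threshold] = 1    # low -> high brightness jump: "+1" signal
    events[delta <= -threshold] = -1  # high -> low brightness jump: "-1" signal
    return events                     # |delta| < threshold: marked as no event (0)
```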
  • The target camera assembly is a shooting device that converts optical image signals into electrical signals and then stores or transmits the electrical signals; it can include various types of shooting devices, such as high-speed image acquisition equipment and surveillance cameras. A high-speed image acquisition device is an image acquisition device used for high-speed acquisition of digital video image information, which can transmit, display and store the acquired image data stream according to a pre-arranged path. In the embodiments of the present disclosure, the high-speed image acquisition equipment can quickly capture RGB (red, green and blue three-channel) images in the visible light range and generate high-speed picture frames, so as to ensure the acquisition of high-speed moving object trajectories.
  • The frame rate of the generated picture frames can be on the order of one thousand to one hundred thousand frames per second.
  • the event flow information of the target scene is acquired by the dynamic vision sensor, and the image information of the target scene is acquired by the target camera component.
  • The event stream information and the image information are shots of the same scene, and the content of the shot images is the same. The event stream information and the image information can be acquired at the same time; alternatively, the event stream information can be acquired first through the dynamic vision sensor, and after positioning in the sampling event frame, the image information can be obtained through the target camera component.
  • The dynamic vision sensor and the target camera assembly can be placed in adjacent shooting positions (for example, the dynamic vision sensor and the target camera assembly can be integrated in the same electronic device), so that the cameras of the two devices are close enough to reduce the parallax between their shooting angles, and the shooting angles of the cameras of the two devices can be adjusted to ensure that images of the same scene are obtained.
  • Step S120 Sampling the event stream information according to a preset sampling period to obtain a sampling event frame.
  • Step S130 Determine the predicted position area of the moving object in the sampling event frame according to the event stream information corresponding to the sampling event frame.
  • When a moving object passes through the picture, the light intensity of the corresponding pixels in the area it passes will change to different degrees.
  • In the area where the moving object appears, the light intensity will increase significantly;
  • in the area where the moving object disappears, the light intensity of the pixels will be significantly reduced. Therefore, according to the event stream information, it can be determined which pixels in the picture may correspond to a moving object.
  • If at least one event is recorded for a pixel, the pixel may be a pixel related to a moving object; the sampling event frame is the frame obtained by sampling the event stream information within the preset sampling period.
  • The event stream information of the sampling event frame includes event information corresponding to multiple pixels, and the event information corresponding to each pixel includes at least one labeled event;
  • from these labeled events, the location area of the moving object can be predicted.
  • The preset sampling period can be set according to actual needs. For example, in order to improve the detection efficiency of moving objects in the event stream information, the preset sampling period can be set to a lower value; in order to reduce the image processing pressure, the preset sampling period can be set to a higher value. In particular, due to the high detection accuracy of the DVS, the detection of event signals at pixel points can reach the nanosecond level (for example, 1000 nanoseconds, that is, pixel events are acquired every 1000 nanoseconds).
  • The preset sampling period is usually set to the millisecond level (for example, 10 milliseconds). Therefore, within one sampling period, the light intensity of a pixel may change multiple times, that is, the DVS may send out multiple event signals for one pixel. As long as the event information of the pixel includes at least one positive event and/or negative event within the preset sampling period, the pixel is included in the predicted position area of the moving object.
  • Step S130, determining the predicted location area of the moving object in the sampling event frame according to the event stream information corresponding to the sampling event frame, may further include: determining the contour area of the moving object in the sampling event frame according to the event stream information corresponding to the sampling event frame, and marking the contour area with a region-of-interest frame to obtain the predicted position area of the moving object.
  • A Region Of Interest (ROI) is a box, circle, ellipse or polygon that outlines the area that needs to be processed. Since the contour of a moving object is usually an irregular shape, it is not easy to locate directly in the image.
  • Therefore, the region-of-interest frame can be marked in the image as a rectangular frame, namely the smallest rectangle that includes both the appearance contour and the disappearance contour of the moving object;
  • the area of this rectangle is the predicted position area of the moving object.
  • The contour region of the moving object can also be obtained in the sampling event frame through a target detection algorithm, for example a sliding window detector or R-CNN (Regions with CNN features, i.e. region features based on a convolutional neural network).
  • FIG. 1B is a schematic flowchart of a method for determining a contour region of a moving object in a sampling event frame according to an embodiment of the present disclosure.
  • Step S130, determining the contour region of the moving object in the sampling event frame according to the event stream information corresponding to the sampling event frame, may further include steps S131 to S133.
  • Step S131 Acquire an event occurrence frame and an event disappearance frame according to the event stream information corresponding to the sampled event frame.
  • The event stream information of the sampling event frame includes event information corresponding to a plurality of pixels in the sampling event frame; the event information corresponding to each pixel includes at least one labeled event, and a labeled event is either a positive event or a negative event.
  • Step S131, acquiring the event occurrence frame and the event disappearance frame according to the event stream information corresponding to the sampled event frame, may further include: among the event information corresponding to the plurality of pixels, determining the pixels whose labeled events are positive events as event occurrence pixels, and determining the pixels whose labeled events are negative events as event disappearance pixels; generating the event occurrence frame from all event occurrence pixels, and generating the event disappearance frame from all event disappearance pixels.
  • The sampling event frame describes the event information of all pixels;
  • the event appearance frame describes the information of the pixels corresponding to all positive events;
  • the event disappearance frame describes the information of the pixels corresponding to all negative events.
  • the pixel resolution of the event appearance frame, the event disappearance frame and the sampling event frame is the same, and the pixel resolution of the sampling event frame is the same as that of the dynamic vision sensor DVS.
  • In the event occurrence frame, the pixel values corresponding to all event occurrence pixels are set to the first pixel value, and the pixel values corresponding to all non-event-occurrence pixels are set to the second pixel value; in the event disappearance frame, the pixel values corresponding to all event disappearance pixels are set to the first pixel value, and the pixel values corresponding to all non-event-disappearance pixels are set to the second pixel value.
  • For example, the first pixel value may be set to the maximum pixel value, that is, 255, and the second pixel value may be set to the minimum pixel value, that is, 0.
  • the event occurrence frame may be represented by an event occurrence matrix
  • the event disappearance frame may be represented by an event disappearance matrix.
  • Each element in the event occurrence matrix corresponds by position to a pixel of the event occurrence frame, and the value of each element in the event occurrence matrix is the pixel value of the corresponding pixel;
  • each element in the event disappearance matrix corresponds by position to a pixel of the event disappearance frame, and the value of each element in the event disappearance matrix is the pixel value of the corresponding pixel.
  • The value of each element in the event occurrence empty matrix is initialized to the second pixel value (such as 0); the number of element rows in the event occurrence empty matrix corresponds to the number of pixel rows in the pixel resolution of the sampling event frame, the number of element columns corresponds to the number of pixel columns in the pixel resolution of the sampling event frame, and each element in the event occurrence empty matrix corresponds to one pixel.
  • Likewise, the value of each element in the event disappearance empty matrix is initialized to the second pixel value (such as 0); the number of element rows in the event disappearance empty matrix corresponds to the number of pixel rows in the pixel resolution of the sampling event frame, the number of element columns corresponds to the number of pixel columns, and each element in the event disappearance empty matrix corresponds to one pixel.
  • For example, if the resolution of the dynamic vision sensor is 1024 (horizontal pixels) × 648 (vertical pixels),
  • the event appearance empty matrix and the event disappearance empty matrix are both 1024 (row) × 648 (column) matrices.
  • Each element in the event occurrence empty matrix and the event disappearance empty matrix is then assigned a value to obtain the event occurrence matrix and the event disappearance matrix.
  • The values of the elements in the event occurrence empty matrix and the event disappearance empty matrix are initialized to the second pixel value, for example 0. If, within the preset sampling period, the labeled event of a pixel is a positive event, the element corresponding to that pixel in the event occurrence empty matrix is assigned the first pixel value (that is, 255). By assigning the first pixel value (i.e. 255) to the elements corresponding to all pixels containing positive events within the preset sampling period, while keeping the elements corresponding to all pixels containing only negative events or no events at the second pixel value (i.e. 0), the event occurrence matrix is obtained. In the resulting event occurrence matrix, the positions of the elements whose value is 255 indicate where the edge of the moving object appeared in the picture during the preset sampling period; therefore, according to the event occurrence matrix, the highlighted appearance contour of the moving object can be obtained in the image.
  • Similarly, if, within the preset sampling period, the labeled event of a pixel is a negative event, the element corresponding to that pixel in the event disappearance empty matrix is assigned the first pixel value (that is, 255). By assigning the first pixel value (i.e. 255) to the elements corresponding to all pixels containing negative events within the preset sampling period, while keeping the elements corresponding to all pixels containing only positive events or no events at the second pixel value (i.e. 0), the event disappearance matrix is obtained. In the resulting event disappearance matrix, the positions of the elements whose value is 255 indicate where the edge of the moving object disappeared in the picture during the preset sampling period; therefore, according to the event disappearance matrix, the highlighted disappearance contour of the moving object can be obtained in the image. Finally, the union of the appearance contour and the disappearance contour of the moving object is taken as the contour information of the moving object. A minimal sketch of this construction is given below.
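  • A minimal sketch of this construction, assuming the events collected within one preset sampling period are available as (row, column, polarity) tuples (this tuple layout, the frame size and the helper name are illustrative assumptions, not the disclosed data format):

```python
import numpy as np

def build_event_frames(events, height=648, width=1024,
                       first_pixel_value=255, second_pixel_value=0):
    """Build the event occurrence frame and the event disappearance frame."""
    # Both "empty matrices" are initialized to the second pixel value (e.g. 0).
    occurrence = np.full((height, width), second_pixel_value, dtype=np.uint8)
    disappearance = np.full((height, width), second_pixel_value, dtype=np.uint8)
    for row, col, polarity in events:
        if polarity > 0:     # positive event: part of the appearance contour
            occurrence[row, col] = first_pixel_value
        elif polarity < 0:   # negative event: part of the disappearance contour
            disappearance[row, col] = first_pixel_value
    return occurrence, disappearance
```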
  • Step S132 Determine the predicted appearance area of the moving object according to the event appearance frame, and determine the predicted disappearance area of the moving object according to the event disappearance frame.
  • In step S132, the predicted appearance area of the moving object is determined according to the positions of all pixels whose pixel value is the first pixel value in the event occurrence frame; the predicted appearance area of the moving object is the area where the aforementioned appearance contour of the moving object is located.
  • The predicted disappearance area of the moving object is determined according to the positions of all pixels whose pixel value is the first pixel value in the event disappearance frame; the predicted disappearance area of the moving object is the area where the disappearance contour of the moving object is located.
  • the position of the pixel point can be represented by two-dimensional position coordinates.
  • Step S133 Determine the contour area of the moving object according to the predicted appearance area and the predicted disappearance area.
  • The predicted appearance area of the moving object is the area where the appearance contour of the moving object is located, and may be called the appearance contour area of the moving object;
  • the predicted disappearance area of the moving object is the area where the disappearance contour of the moving object is located, and may be called the disappearance contour area of the moving object.
  • FIG. 1C is a schematic diagram of a predicted position area of a moving object provided by an embodiment of the present disclosure. As shown in FIG. 1C:
  • The appearance contour area of the moving object can be denoted ROI_1 = [x11, y11, x12, y12], where (x11, y11) and (x12, y12) are respectively
  • the two-dimensional position coordinates of the upper-left corner vertex A1 (the pixel in the upper left corner of the corresponding area) and of the lower-right corner vertex B1 (the pixel in the lower right corner of the corresponding area); similarly, the disappearance contour area can be denoted ROI_2 = [x21, y21, x22, y22].
  • The predicted position region ROI_DVS of the moving object in the frame can then be expressed by the following formula:
  • ROI_DVS = [min(x11, x21), min(y11, y21), max(x12, x22), max(y12, y22)].
  • In the example shown in FIG. 1C, this gives ROI_DVS = [x11, y11, x22, y22], that is, taking (x11, y11) as the position coordinates of the upper-left corner vertex of the predicted position region ROI_DVS and (x22, y22) as the position coordinates of its lower-right corner vertex, the predicted position region ROI_DVS is determined (a small worked example follows).
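  • A small worked example of the union formula above (the box layout [x_left, y_top, x_right, y_bottom] and the sample coordinates are illustrative):

```python
def union_roi(roi_appear, roi_disappear):
    """Merge the appearance and disappearance contour areas into ROI_DVS."""
    x11, y11, x12, y12 = roi_appear       # appearance contour area ROI_1
    x21, y21, x22, y22 = roi_disappear    # disappearance contour area ROI_2
    return [min(x11, x21), min(y11, y21), max(x12, x22), max(y12, y22)]

# e.g. union_roi([100, 80, 180, 160], [140, 120, 240, 220]) -> [100, 80, 240, 220]
```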
  • The method for determining the contour area may further include: processing the event appearing frame and/or the event disappearing frame to remove noise points.
  • The event occurrence matrix corresponding to the event appearing frame and the event disappearance matrix corresponding to the event disappearing frame are both sparse matrices. Due to the sensitivity of the dynamic vision sensor, sparse noise will also appear in the background area of the picture, outside the moving object, so it is necessary to remove the sparse noise points. Specifically, an erosion operation and a dilation operation are performed on the pixels with non-zero pixel values in the event appearing frame and/or the event disappearing frame to remove the noise points, so that when the contour area of the moving object is detected on the binarized event appearing frame and/or event disappearing frame, the influence of noise points is effectively reduced and the accuracy of detecting the contour area is improved (a minimal sketch of this denoising step is given below).
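  • A minimal denoising sketch using OpenCV morphology, assuming the event frame is a binarized array with 255 at event pixels and 0 elsewhere (the 3×3 kernel size is an illustrative choice, not specified in the disclosure):

```python
import cv2
import numpy as np

def remove_sparse_noise(event_frame, kernel_size=3):
    """Erode then dilate (morphological opening) to suppress isolated noise pixels."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    eroded = cv2.erode(event_frame, kernel, iterations=1)   # removes sparse noise points
    return cv2.dilate(eroded, kernel, iterations=1)         # restores the remaining contours
```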
  • S140 Determine, according to the predicted location area, a positioning area in the image information that matches the predicted location area.
  • After the predicted position area of the moving object in the sampling event frame of the dynamic vision sensor is determined, if the resolutions of the dynamic vision sensor and the target camera component are the same, the sampling event frame sampled by the dynamic vision sensor and the image information obtained by the target camera component have the same resolution. In that case, the predicted position area in the sampling event frame and the positioning area in the image information are the same area: the image to be detected, having the same shooting time, shooting position and shooting angle as the sampling event frame, is obtained from the image information, and according to the predicted position area, the area in the image to be detected that is the same as the predicted position area is directly used as the positioning area. If the resolutions are different, the predicted position area is scaled as described below.
  • FIG. 1D is a schematic flowchart of a method for determining a location area in image information that matches a predicted location area in an embodiment of the present disclosure.
  • Step S140, determining, according to the predicted location area, the positioning area in the image information that matches the predicted location area, may further include steps S141 to S143.
  • Step S141 acquiring the proportional relationship between the resolutions of the dynamic vision sensor and the target camera assembly.
  • Step S142 scaling the predicted location area according to the proportional relationship.
  • Step S143 Map the scaled predicted location area into the image information to determine the positioning area that matches the predicted location area.
  • the proportional relationship between the resolutions of the dynamic vision sensor and the target camera assembly includes the ratio of the horizontal resolution (horizontal pixels) of the dynamic vision sensor to the horizontal resolution (horizontal pixels) of the target camera assembly, And the ratio of the vertical resolution (vertical pixels) of the dynamic vision sensor to the vertical resolution (vertical pixels) of the target camera assembly.
  • For example, the ratio of the horizontal resolution (horizontal pixels) of the dynamic vision sensor to the horizontal resolution (horizontal pixels) of the target camera assembly is 1024/1280, and the ratio of the vertical resolution (vertical pixels) of the dynamic vision sensor to the vertical resolution (vertical pixels) of the target camera assembly is 648/960.
  • In step S142, the ratio of the horizontal resolution of the dynamic vision sensor to the horizontal resolution of the target camera assembly is used as the horizontal adjustment factor, and the ratio of the vertical resolution of the dynamic vision sensor to the vertical resolution of the target camera assembly is used as the vertical adjustment factor. According to the horizontal adjustment factor and the vertical adjustment factor, the predicted position region ROI_DVS is scaled in the horizontal direction and the vertical direction to obtain the scaled predicted position region ROI.
  • Since the adjustment factors are the DVS-to-camera resolution ratios, the scaled predicted location area can be represented, for example, as ROI = [min(x11, x21)/r_h, min(y11, y21)/r_v, max(x12, x22)/r_h, max(y12, y22)/r_v], where r_h is the horizontal adjustment factor and r_v is the vertical adjustment factor.
  • In step S143, the scaled predicted position area is mapped into the image information, and the area in the image information that is the same as the scaled predicted position area is the matching positioning area, thereby determining the positioning area that matches the predicted position area.
  • The positioning area of the moving object in the image information, i.e. the matching positioning area, can therefore be represented by the same expression as the scaled predicted position region ROI given above (a sketch of this mapping is given below).
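  • A sketch of steps S141 to S143 under the definitions above, where the adjustment factors are the DVS-to-camera resolution ratios and DVS coordinates are divided by them to land in the camera image (the direction of the division and the example resolutions are assumptions consistent with the definitions, not a reproduction of the original formulas):

```python
def map_roi_to_image(roi_dvs, dvs_size=(1024, 648), cam_size=(1280, 960)):
    """Map [x_left, y_top, x_right, y_bottom] from DVS coordinates to camera coordinates."""
    r_h = dvs_size[0] / cam_size[0]   # horizontal adjustment factor, e.g. 1024/1280
    r_v = dvs_size[1] / cam_size[1]   # vertical adjustment factor, e.g. 648/960
    x1, y1, x2, y2 = roi_dvs
    # Dividing by the adjustment factors scales the ROI up to the camera resolution.
    return [int(x1 / r_h), int(y1 / r_v), int(x2 / r_h), int(y2 / r_v)]

# e.g. map_roi_to_image([100, 80, 240, 220]) -> [125, 118, 300, 325]
```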
  • In this way, after the event stream information is obtained through the dynamic vision sensor, the predicted position area of the moving object in the sampling event frame is determined, and the matching positioning area is determined in the image information of the target camera component, which improves the positioning efficiency of moving objects and, in particular, the real-time detection of high-speed moving objects.
  • FIG. 2 is a schematic flowchart of another method for locating a moving object provided by an embodiment of the present disclosure.
  • the trained image classification model can identify and classify the positioning area to determine whether there is a moving object in the image information, so as to realize the identification, classification and tracking of the moving object in the image information.
  • the positioning method may include the following steps: step S210 to step S250.
  • step S210 For the specific description of step S210, reference may be made to the above description of step S110, and details are not repeated here.
  • S220 Sample the event stream information according to a preset sampling period to obtain a sampled event frame.
  • step S220 For the specific description of step S220, reference may be made to the above description of step S120, which will not be repeated here.
  • Step S230 Determine the predicted position area of the moving object in the sampling event frame according to the event stream information corresponding to the sampling event frame.
  • step S230 For the specific description of step S230, reference may be made to the above description of step S130, which will not be repeated here.
  • step S240 For a specific description of step S240, reference may be made to the above description of step S140, which will not be repeated here.
  • In step S250, the positioning area in the image information is identified and classified according to a pre-trained image classification model, to determine whether there is a moving object in the image information. The image classification model is a classification model that is pre-trained on sample images. Its function is to extract image features from the image data of the input positioning area to obtain a feature vector, and then output the corresponding image classification probability according to the obtained feature vector, where the image classification probability represents the probability that the image data of the input positioning area is a positive sample or a negative sample. Classification is then performed according to the image classification probability (i.e. binary classification) to determine whether there is a moving object in the image data of the input positioning area, thereby realizing the recognition and classification of moving objects in the positioning area of the image information.
  • The image features can include the color features, texture features, shape features and spatial relationship features of the image. The color features describe the surface properties of the scene corresponding to the image or the image area and are based on individual pixels; the texture features describe the surface properties of the scene corresponding to the image or the image area and require statistical calculation over an area containing multiple pixels; the shape features describe the contour features of the outer boundary of the object and the overall regional features; the spatial relationship features describe the mutual spatial position or relative direction relationship between the multiple targets segmented from the video image, such as connection relationships, overlapping relationships, and inclusion relationships.
  • the types of the extracted image features are not specifically limited.
  • Before identifying and classifying the positioning area in the image information according to the pre-trained image classification model, the method further includes: judging whether the number of pixels in the positioning area is greater than a preset detection threshold. Identifying and classifying the positioning area in the image information according to the pre-trained image classification model then includes: if the number of pixels in the positioning area is greater than the preset detection threshold, identifying and classifying the positioning area according to the pre-trained image classification model.
  • If the number of pixels in the positioning area is less than or equal to the preset detection threshold, no further processing is performed on the positioning area.
  • For example, in order to avoid misdetecting a small interfering object (e.g. a flying insect) as the target moving object to be monitored (e.g., when monitoring objects thrown from high buildings, the thrown object is the target moving object to be monitored), the preset detection threshold can be set to a larger value to effectively prevent false detection of interfering objects.
  • Alternatively, in order to improve the detection accuracy of moving objects in the image information, the preset detection threshold can also be set to a small value, for example 0, so that whenever there are pixels that change within the positioning area, the corresponding positioning area is identified and classified by the image classification model.
  • In other words, the preset detection threshold is set so that a positioning area is further processed only when the number of pixels in the picture whose light intensity changes exceeds the preset detection threshold.
  • When the image feature extraction calculation is performed by the image classification model, only the positioning area in the image information needs to be processed, which effectively improves the efficiency of identifying and analyzing moving objects, saves computing resources, reduces computational pressure and improves computational efficiency (an illustrative gating sketch is given below).
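  • The gating described above can be sketched as follows, here estimating the number of changed pixels in the positioning area directly from the two event frames (the way the count is computed and the threshold value are illustrative assumptions):

```python
import numpy as np

def should_classify(occurrence, disappearance, roi, detection_threshold=50):
    """Run the classifier only if enough event pixels fall inside the positioning area."""
    x1, y1, x2, y2 = roi
    patch = np.maximum(occurrence[y1:y2, x1:x2], disappearance[y1:y2, x1:x2])
    changed_pixels = int(np.count_nonzero(patch))
    return changed_pixels > detection_threshold
```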
  • Before identifying and classifying the positioning area in the image information according to the pre-trained image classification model, the method further includes: acquiring a sample image set, and performing image classification training on the image classification model using the sample image set to obtain the pre-trained image classification model.
  • The image classification model is constructed based on a neural network, that is, it is a mathematical model based on neural networks (NNs).
  • The image classification model is trained on a sample image set composed of positive sample images and negative sample images, so that the trained image classification model has the ability to output the corresponding image classification probability according to the image data of the input positioning area, and then to output the category judgment result for the image data of the input positioning area (one possible concrete form of such a classifier is sketched below).
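  • One possible concrete form of such a binary classifier is sketched below in PyTorch (the architecture, layer sizes and input format are assumptions for illustration; the disclosure does not specify the network structure):

```python
import torch
import torch.nn as nn

class ROIClassifier(nn.Module):
    """Tiny CNN that outputs the probability that an ROI crop contains the moving object."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, x):                     # x: (N, 3, H, W) RGB crop of the positioning area
        feats = self.features(x).flatten(1)   # (N, 32) feature vector
        return torch.sigmoid(self.classifier(feats))  # image classification probability

# Training on positive/negative sample crops would use, for example, nn.BCELoss().
```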
  • In the embodiments of the present disclosure, after the event stream information is obtained through the dynamic vision sensor, the predicted position area of the moving object in the sampling event frame is determined, and the matching positioning area is determined in the image information of the target camera component;
  • image recognition and classification are then performed on the image of the positioning area to determine whether there is a moving object in the image. While realizing the positioning of the moving object, this improves the detection accuracy of moving objects in the image information and helps reduce false detections of moving objects.
  • FIG. 3 is a structural block diagram of an apparatus for positioning a moving object provided by an embodiment of the present disclosure.
  • the apparatus specifically includes: an information acquisition module 310 , a sampling execution module 320 , and a classification execution module 330 .
  • the information acquisition module 310 is configured to acquire event stream information through a dynamic vision sensor, and acquire image information through a target camera assembly.
  • the sampling execution module 320 is configured to sample the event stream information according to the preset sampling period to obtain the sampled event frame; and determine the predicted location area of the moving object in the sampled event frame according to the event stream information corresponding to the sampled event frame.
  • the classification execution module 330 is configured to determine, according to the predicted location area, a positioning area in the image information that matches the predicted location area.
  • With the device for positioning a moving object provided by the embodiments of the present disclosure, after the event stream information is obtained through the dynamic vision sensor, the predicted position area of the moving object in the sampling event frame is determined, and the matching positioning area is determined in the image information of the target camera component, which improves the positioning efficiency of moving objects and, in particular, the real-time detection of high-speed moving objects.
  • the sampling execution module 320 is configured to determine the contour area of the moving object in the sampling event frame according to the event flow information corresponding to the sampling event frame, and mark the contour area through the area of interest frame to obtain the motion The predicted location area of the object.
  • the sampling execution module 320 may further include: a frame processing unit, a prediction area acquisition unit, and a contour area acquisition unit.
  • the frame processing unit is used for acquiring the event appearing frame and the event disappearing frame according to the event stream information corresponding to the sampled event frame.
  • the predicted area obtaining unit is used for determining the predicted appearing area of the moving object according to the event appearing frame, and determining the predicted disappearing area of the moving object according to the event disappearing frame.
  • the contour area acquisition unit is used for determining the contour area of the moving object according to the predicted appearance area and the predicted disappearance area.
  • the event stream information corresponding to the sampling event frame includes event information of multiple pixels, and the event information of the pixel includes at least one labeling event;
  • The frame processing unit is configured to: among the event information corresponding to the multiple pixels, determine the pixels whose labeled events are positive events as event occurrence pixels, and determine the pixels whose labeled events are negative events as event disappearance pixels; generate the event appearing frame from all event occurrence pixels, and generate the event disappearing frame from all event disappearance pixels.
  • The pixel resolutions of the event occurrence frame, the event disappearance frame and the sampling event frame are the same. In the event occurrence frame, the pixel values corresponding to all event occurrence pixels are set to the first pixel value, and the pixel values corresponding to all non-event-occurrence pixels are set to the second pixel value; in the event disappearance frame, the pixel values corresponding to all event disappearance pixels are set to the first pixel value, and the pixel values corresponding to all non-event-disappearance pixels are set to the second pixel value.
  • The prediction area acquisition unit is configured to: determine the predicted appearance area of the moving object according to the positions of all pixels whose pixel value is the first pixel value in the event appearance frame, and determine the predicted disappearance area of the moving object according to the positions of all pixels whose pixel value is the first pixel value in the event disappearance frame.
  • The classification execution module 330 is configured to: obtain the proportional relationship between the resolutions of the dynamic vision sensor and the target camera assembly; perform scaling processing on the predicted location area according to the proportional relationship; and map the scaled predicted location area into the image information to determine the matching positioning area.
  • The device for positioning a moving object further includes: a classification processing execution module, which is configured to identify and classify the positioning area in the image information according to the pre-trained image classification model, to determine whether there is a moving object in the image information.
  • the apparatus for locating a moving object further includes: a judgment execution module configured to judge whether the number of pixels in the positioning area is greater than a preset detection threshold.
  • the classification execution module 330 is configured to identify the location area in the image information according to the pre-trained image classification model if the number of pixels in the location area is greater than the preset detection threshold and classification processing.
  • The apparatus for positioning a moving object further includes: a pre-training execution module, which is configured to obtain a sample image set and perform image classification training on the image classification model using the sample image set, so as to obtain
  • the pre-trained image classification model; wherein the image classification model is constructed based on a neural network.
  • the above-mentioned positioning device can execute the positioning method of a moving object provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 4 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 4 shows a structural block diagram of an exemplary electronic device 12 suitable for implementing the positioning method described in the embodiment of the present disclosure.
  • the electronic device 12 shown in FIG. 4 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 12 takes the form of a general-purpose computing device.
  • Components of electronic device 12 may include, but are not limited to, one or more processors or processing units 16 , memory 28 , and a bus 18 connecting various system components including memory 28 and processing unit 16 .
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including both volatile and non-volatile media, removable and non-removable media.
  • Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32 .
  • Electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive”).
  • A disk drive may be provided for reading from and writing to a removable non-volatile magnetic disk (e.g. a "floppy disk"), and an optical drive may be provided for reading from and writing to a removable non-volatile optical disk (e.g. a CD-ROM, DVD-ROM or other optical media).
  • each drive may be connected to bus 18 through one or more data media interfaces.
  • Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present disclosure.
  • A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28; such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment.
  • Program modules 42 generally perform the functions and/or methods of the embodiments described in this disclosure.
  • The electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., a network card, modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the electronic device 12 via the bus 18. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
  • the processing unit 16 executes various functional applications and data processing by running the programs stored in the memory 28, for example, implementing the method for positioning a moving object provided by any embodiment of the present disclosure. That is: obtain event stream information through the dynamic vision sensor, and obtain image information through the target camera component; sample the event stream information according to the preset sampling period to obtain the sampled event frame, and according to the event stream information corresponding to the sampled event frame, Determine the predicted location area of the moving object in the sampling event frame; determine the location area in the image information that matches the predicted location area according to the predicted location area.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the method for locating a moving object according to any embodiment of the present disclosure. The method includes: acquiring event stream information through the dynamic vision sensor, and acquiring image information through the target camera component; sampling the event stream information according to a preset sampling period to obtain a sampled event frame; determining, according to the event stream information corresponding to the sampled event frame, the predicted location area of the moving object in the sampled event frame; and determining, according to the predicted location area, the location area in the image information that matches the predicted location area.
  • the computer storage medium of the embodiments of the present disclosure may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and apparatus for positioning a moving object, and to an electronic device and a storage medium. The method comprises: acquiring event stream information by means of a dynamic vision sensor and acquiring image information by means of a target camera assembly (S110); sampling the event stream information according to a preset sampling period so as to obtain a sampled event frame (S120); determining a predicted location area of a moving object in the sampled event frame according to event stream information corresponding to the sampled event frame (S130); and determining, according to the predicted location area and within the image information, a positioning area that matches the predicted location area (S140). By means of the method, the positioning efficiency of a moving object is improved and, in particular, the real-time detection performance for a high-speed moving object is improved.
PCT/CN2021/140765 2020-12-24 2021-12-23 Method and apparatus for positioning a moving object, and electronic device and storage medium WO2022135511A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011552412.8A CN112669344B (zh) 2020-12-24 2020-12-24 一种运动物体的定位方法、装置、电子设备及存储介质
CN202011552412.8 2020-12-24

Publications (1)

Publication Number Publication Date
WO2022135511A1 true WO2022135511A1 (fr) 2022-06-30

Family

ID=75410041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140765 WO2022135511A1 (fr) 2020-12-24 2021-12-23 Method and apparatus for positioning a moving object, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112669344B (fr)
WO (1) WO2022135511A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416602A (zh) * 2023-04-17 2023-07-11 江南大学 基于事件数据与图像数据联合的运动目标检测方法及系统

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909047B (zh) * 2019-11-28 2022-05-17 大连海事大学 一种面向指定时刻的日常行为识别方法
CN112669344B (zh) * 2020-12-24 2024-05-28 北京灵汐科技有限公司 一种运动物体的定位方法、装置、电子设备及存储介质
CN113096158A (zh) * 2021-05-08 2021-07-09 北京灵汐科技有限公司 运动对象的识别方法、装置、电子设备及可读存储介质
WO2023279286A1 (fr) * 2021-07-07 2023-01-12 Harman International Industries, Incorporated Procédé et système d'étiquetage automatique de trames dvs
CN113506321A (zh) * 2021-07-15 2021-10-15 清华大学 图像处理方法及装置、电子设备和存储介质
CN114140365B (zh) * 2022-01-27 2022-07-22 荣耀终端有限公司 基于事件帧的特征点匹配方法及电子设备
CN114549442B (zh) * 2022-02-14 2022-09-20 常州市新创智能科技有限公司 一种运动物体的实时监测方法、装置、设备及存储介质
CN114677443B (zh) * 2022-05-27 2022-08-19 深圳智华科技发展有限公司 光学定位方法、装置、设备及存储介质
CN116055844B (zh) * 2023-01-28 2024-05-31 荣耀终端有限公司 一种跟踪对焦方法、电子设备和计算机可读存储介质
CN117975920A (zh) * 2024-03-28 2024-05-03 深圳市戴乐体感科技有限公司 一种鼓槌动态识别定位方法、装置、设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160210513A1 (en) * 2015-01-15 2016-07-21 Samsung Electronics Co., Ltd. Object recognition method and apparatus
CN107123131A (zh) * 2017-04-10 2017-09-01 安徽清新互联信息科技有限公司 一种基于深度学习的运动目标检测方法
CN110660088A (zh) * 2018-06-30 2020-01-07 华为技术有限公司 一种图像处理的方法和设备
US20200011668A1 (en) * 2018-07-09 2020-01-09 Samsung Electronics Co., Ltd. Simultaneous location and mapping (slam) using dual event cameras
CN111831119A (zh) * 2020-07-10 2020-10-27 Oppo广东移动通信有限公司 眼球追踪方法、装置、存储介质及头戴式显示设备
CN111951313A (zh) * 2020-08-06 2020-11-17 北京灵汐科技有限公司 图像配准方法、装置、设备及介质
CN112669344A (zh) * 2020-12-24 2021-04-16 北京灵汐科技有限公司 一种运动物体的定位方法、装置、电子设备及存储介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844128B (zh) * 2015-01-15 2021-03-02 北京三星通信技术研究有限公司 身份识别方法和装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160210513A1 (en) * 2015-01-15 2016-07-21 Samsung Electronics Co., Ltd. Object recognition method and apparatus
CN107123131A (zh) * 2017-04-10 2017-09-01 安徽清新互联信息科技有限公司 一种基于深度学习的运动目标检测方法
CN110660088A (zh) * 2018-06-30 2020-01-07 华为技术有限公司 一种图像处理的方法和设备
US20200011668A1 (en) * 2018-07-09 2020-01-09 Samsung Electronics Co., Ltd. Simultaneous location and mapping (slam) using dual event cameras
CN111831119A (zh) * 2020-07-10 2020-10-27 Oppo广东移动通信有限公司 眼球追踪方法、装置、存储介质及头戴式显示设备
CN111951313A (zh) * 2020-08-06 2020-11-17 北京灵汐科技有限公司 图像配准方法、装置、设备及介质
CN112669344A (zh) * 2020-12-24 2021-04-16 北京灵汐科技有限公司 一种运动物体的定位方法、装置、电子设备及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416602A (zh) * 2023-04-17 2023-07-11 江南大学 基于事件数据与图像数据联合的运动目标检测方法及系统
CN116416602B (zh) * 2023-04-17 2024-05-24 江南大学 基于事件数据与图像数据联合的运动目标检测方法及系统

Also Published As

Publication number Publication date
CN112669344A (zh) 2021-04-16
CN112669344B (zh) 2024-05-28

Similar Documents

Publication Publication Date Title
WO2022135511A1 (fr) Method and apparatus for positioning a moving object, and electronic device and storage medium
US11643076B2 (en) Forward collision control method and apparatus, electronic device, program, and medium
US7944454B2 (en) System and method for user monitoring interface of 3-D video streams from multiple cameras
US20070052858A1 (en) System and method for analyzing and monitoring 3-D video streams from multiple cameras
US10242294B2 (en) Target object classification using three-dimensional geometric filtering
US10685263B2 (en) System and method for object labeling
CN112800860B (zh) 一种事件相机和视觉相机协同的高速抛撒物检测方法和系统
WO2022199360A1 (fr) Procédé et appareil de positionnement d'objet mobile, dispositif électronique et support de stockage
WO2021031954A1 (fr) Procédé et appareil de détermination de quantité d'objets, ainsi que support de stockage et dispositif électronique
JP7272024B2 (ja) 物体追跡装置、監視システムおよび物体追跡方法
US20210124928A1 (en) Object tracking methods and apparatuses, electronic devices and storage media
Benito-Picazo et al. Deep learning-based video surveillance system managed by low cost hardware and panoramic cameras
US20210342593A1 (en) Method and apparatus for detecting target in video, computing device, and storage medium
TWI726278B (zh) 行車偵測方法、車輛及行車處理裝置
Liu et al. A cloud infrastructure for target detection and tracking using audio and video fusion
CN108229281B (zh) 神经网络的生成方法和人脸检测方法、装置及电子设备
CN115359406A (zh) 一种邮局场景人物交互行为识别方法及系统
CN113076889B (zh) 集装箱铅封识别方法、装置、电子设备和存储介质
Zhou et al. A kinematic analysis-based on-line fingerlings counting method using low-frame-rate camera
US10916016B2 (en) Image processing apparatus and method and monitoring system
CN113762027B (zh) 一种异常行为的识别方法、装置、设备及存储介质
US11270442B2 (en) Motion image integration method and motion image integration system capable of merging motion object images
CN114511592B (zh) 一种基于rgbd相机和bim系统的人员轨迹追踪方法及系统
Osborne et al. Temporally stable feature clusters for maritime object tracking in visible and thermal imagery
CN116994201B (zh) 对高空抛物进行溯源监测的方法及计算设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21909506

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.11.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21909506

Country of ref document: EP

Kind code of ref document: A1