WO2022213729A1 - Method and apparatus, device and medium for detecting motion information of a target

Method and apparatus, device and medium for detecting motion information of a target

Info

Publication number
WO2022213729A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
detection frame
information
coordinate system
Prior art date
Application number
PCT/CN2022/076765
Other languages
English (en)
French (fr)
Inventor
孟文明
朱红梅
张骞
Original Assignee
地平线征程(杭州)人工智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 地平线征程(杭州)人工智能科技有限公司
Priority to JP2022557731A (patent JP7306766B2)
Priority to EP22783799.4A (patent EP4246437A1)
Publication of WO2022213729A1

Classifications

    • G06T 7/20: Image analysis; Analysis of motion
    • G06T 7/254: Analysis of motion involving subtraction of images
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/24: Image preprocessing; Aligning, centring, orientation detection or correction of the image
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06T 2207/10016: Image acquisition modality; Video; Image sequence
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • G06T 2207/30244: Subject of image; Camera pose
    • G06T 2207/30252: Subject of image; Vehicle exterior; Vicinity of vehicle
    • G06T 2207/30261: Subject of image; Obstacle

Definitions

  • the present disclosure relates to computer vision technology, in particular to a method and apparatus for detecting motion information of a target, a method and apparatus for controlling a traveling object based on the motion information of the target, an electronic device and a storage medium.
  • the estimation of the motion speed and direction of objects is the focus of research in the fields of unmanned driving, security monitoring, and scene understanding.
  • the decision-making layer can control the vehicle to slow down or even stop to ensure the safe driving of the vehicle.
  • lidar is mostly used for data collection.
  • the laser beam is emitted at a high frequency, and then the distance to the target point is calculated according to the emission time and reception time of the laser beam.
  • the target detection and target tracking are performed on the point cloud data collected at two times corresponding to a certain time range, and then the movement speed and direction of the target within the time range are calculated.
  • Embodiments of the present disclosure provide a method and apparatus for detecting motion information of a target, a method and apparatus for controlling a traveling object based on the motion information of the target, an electronic device, and a storage medium.
  • a method for detecting motion information of a target including:
  • the first image is an image of a scene outside the driving object collected by a camera on the driving object during the driving of the driving object;
  • the depth information of the detection frame of the first target is determined, and the first coordinates of the first target in the first camera coordinate system are determined based on the position of the detection frame of the first target in the image coordinate system and the depth information of the detection frame of the first target;
  • the pose change information of the camera device from the collection of the second image to the collection of the first image is acquired; wherein the second image is an image in the image sequence containing the first image that precedes the first image and is spaced from the first image by a preset number of frames;
  • the second target is the target in the second image corresponding to the first target
  • the motion information of the first target within a corresponding time range from the acquisition moment of the second image to the acquisition moment of the first image is determined.
  • an intelligent driving control method including:
  • the image sequence of the scene outside the driving object is collected by the camera device on the driving object;
  • a control command for controlling the traveling state of the traveling object is generated according to the motion information of the target.
  • an apparatus for detecting motion information of a target including:
  • a detection module configured to perform target detection on a first image to obtain a detection frame of the first target, where the first image is an image of a scene outside the driving object collected by a camera on the driving object during the driving process of the driving object;
  • a first acquisition module configured to acquire depth information of the first image in the corresponding first camera coordinate system
  • a first determination module configured to determine the depth information of the detection frame of the first target according to the depth information of the first image acquired by the first acquisition module
  • a second determination module configured to determine, based on the position in the image coordinate system of the detection frame of the first target obtained by the detection module and the depth information of the detection frame of the first target determined by the first determination module, a first coordinate of the first target in the first camera coordinate system;
  • a second acquisition module configured to acquire the pose change information of the camera device from the acquisition of the second image to the acquisition of the first image; wherein the second image is an image in the image sequence where the first image is located that precedes the first image and is spaced from the first image by a preset number of frames;
  • a conversion module configured to convert, according to the pose change information obtained by the second acquisition module, the second coordinate of the second target in the second camera coordinate system corresponding to the second image into the third coordinate in the first camera coordinate system; wherein the second target is the target in the second image corresponding to the first target;
  • a third determination module configured to determine, based on the first coordinates determined by the second determination module and the third coordinates converted by the conversion module, the motion information of the first target within the time range corresponding to the interval from the acquisition moment of the second image to the acquisition moment of the first image.
  • an intelligent driving control device including:
  • a camera device which is arranged on the driving object, and is used for collecting the image sequence of the scene outside the driving object during the driving process of the driving object;
  • a motion information detection device configured to use at least one frame image in the image sequence as the first image, use at least one frame image that precedes the first image in the image sequence and is spaced from the first image by a preset number of frames as the second image, and determine the motion information of the target in the scene;
  • the motion information detection device includes the device for detecting the motion information of the target according to any embodiment of the present disclosure
  • a control device is configured to generate a control instruction for controlling the traveling state of the traveling object according to the motion information of the target detected by the motion information detection device.
  • a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the method for detecting motion information of a target according to any of the above-mentioned embodiments of the present disclosure, or the method for controlling a traveling object based on the motion information of the target.
  • an electronic device comprising:
  • a memory for storing the processor-executable instructions
  • the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method for detecting motion information of a target described in any of the foregoing embodiments of the present disclosure, or the method for controlling a traveling object based on the motion information of the target.
  • the image of the scene outside the driving object is collected by the camera device on the driving object during the driving process of the driving object; target detection is performed on the collected first image to obtain the detection frame of the first target; the depth information of the first image in the corresponding first camera coordinate system is obtained, and the depth information of the detection frame of the first target is determined according to the depth information of the first image.
  • the embodiment of the present disclosure utilizes computer vision technology to determine the motion information of the target in the driving scene based on the sequence of images
  • the image sequence of the scene outside the traveling object is collected by the camera on the traveling object; at least one frame image in the image sequence is used as the first image, and at least one frame image that precedes the first image in the image sequence and is separated from the first image by a preset number of frames is used as the second image; the method for detecting motion information of a target described in any embodiment of the present disclosure is used to determine the motion information of the target in the driving scene, and a control instruction for controlling the driving state of the driving object is then generated according to the motion information of the target.
  • detecting the motion information of the target in the driving scene and performing intelligent driving control of the driving object by means of computer vision technology helps meet the requirement for real-time intelligent driving control of the driving object in unmanned scenarios, so as to ensure the safe driving of the driving object.
  • FIG. 1 is a scene diagram to which the present disclosure is applicable.
  • FIG. 2 is a schematic flowchart of a method for detecting motion information of a target provided by an exemplary embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of a method for detecting motion information of a target provided by another exemplary embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of a method for detecting motion information of a target provided by another exemplary embodiment of the present disclosure.
  • FIG. 5 is a schematic flowchart of a method for detecting motion information of a target provided by yet another exemplary embodiment of the present disclosure.
  • FIG. 6 is a schematic flowchart of a method for detecting motion information of a target provided by yet another exemplary embodiment of the present disclosure.
  • FIG. 7 is a schematic flowchart of a method for detecting motion information of a target provided by another exemplary embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of an application flow of a method for detecting motion information of a target provided by an exemplary embodiment of the present disclosure.
  • FIG. 9 is a schematic flowchart of a method for controlling a traveling object based on motion information of a target provided by an exemplary embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an apparatus for detecting motion information of a target provided by an exemplary embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of an apparatus for detecting motion information of a target provided by another exemplary embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of an apparatus for controlling a traveling object based on motion information of a target provided by an exemplary embodiment of the present disclosure.
  • FIG. 13 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • in the present disclosure, "a plurality" may refer to two or more, and "at least one" may refer to one, two or more.
  • the term "and/or" in the present disclosure merely describes an association relationship between associated objects and indicates that three kinds of relationships may exist; for example, A and/or B may mean that A exists alone, that A and B exist at the same time, or that B exists alone.
  • the character "/" in the present disclosure generally indicates that the related objects are an "or" relationship.
  • Embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known terminal equipment, computing systems, environments and/or configurations suitable for use with electronic equipment such as terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the foregoing, among others.
  • Electronic devices such as terminal devices, computer systems, servers, etc., may be described in the general context of computer system-executable instructions, such as program modules, being executed by the computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • Computer systems/servers may be implemented in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located on local or remote computing system storage media including storage devices.
  • lidar can obtain the depth values of several points in an instantaneous scene, but cannot directly obtain information such as the moving speed and direction of an object.
  • the embodiments of the present disclosure provide a technical solution for obtaining the motion information of a target in a driving scene based on a sequence of images of the driving scene by using computer vision technology: a camera on the driving object collects images of the scene outside the driving object during the driving process of the driving object; target detection and target tracking are performed on the first image and the second image, which are separated by a preset number of frames in the collected image sequence; the first coordinate of a target in the first camera coordinate system corresponding to the first image is determined, and the second coordinate of the same target in the second camera coordinate system corresponding to the second image is converted into the third coordinate in the first camera coordinate system; then, based on the first coordinate and the third coordinate, the motion information of the target within the time range corresponding to the interval from the acquisition moment of the second image to the acquisition moment of the first image is determined.
  • the embodiments of the present disclosure do not need to rely on lidar, which can avoid a large amount of calculation processing, save processing time, improve processing efficiency, and help meet the needs of scenarios with high real-time requirements such as unmanned driving.
  • a control instruction for controlling the driving state of the driving object can be generated according to the motion information of the target, thereby using computer vision technology to detect the motion information of the target in the driving scene and to perform intelligent driving control of the driving object, which helps meet the requirement for real-time intelligent driving control of the driving object in unmanned scenarios and ensures the safe driving of the driving object.
  • the embodiments of the present disclosure can be applied to intelligent driving control scenarios of vehicles, robots, toy cars and other driving objects: a control command for controlling the driving state of the driving object is generated according to the detected motion information, and the driving state of the driving object is controlled accordingly.
  • FIG. 1 is a scene diagram to which the present disclosure is applicable.
  • an image sequence collected by an image acquisition module 101 (e.g., a camera device such as a camera) is provided to the motion information detection device 102; the motion information detection device 102 takes each frame of image in the image sequence, or a frame of image selected at intervals of several frames, as the second image, and takes a frame of image that is located after the second image in the image sequence and is spaced from the second image by a certain number of frames as the first image; target detection is performed on the first image to obtain the detection frame of the first target; the depth information of the first image in the corresponding first camera coordinate system is obtained, and the depth information of the detection frame of the first target is determined according to the depth information of the first image; based on the position of the detection frame of the first target in the image coordinate system and the depth information of the detection frame of the first target, the first coordinates of the first target in the first camera coordinate system are determined; according to the pose change information of the camera from collecting the second image to collecting the first image, the second coordinates of the second target in the second camera coordinate system corresponding to the second image are converted into the third coordinates in the first camera coordinate system; further, based on the first coordinates and the third coordinates, the motion information of the first target within the time range corresponding to the interval from the acquisition moment of the second image to the acquisition moment of the first image is determined and output.
  • the control device 103, based on the motion information of the first target output by the motion information detection device 102 within the corresponding time range, controls the traveling state of traveling objects such as vehicles, robots and toy cars.
  • in the application scenario where the control device 103 controls the driving state of a driving object, if it is determined from the movement information of the first target (which may include the movement speed and movement direction) and the driving state of the driving object (which may include the driving speed and driving direction) that the driving object and the first target may collide within the next 5 seconds, the control device 103 generates a control command for controlling the driving object to decelerate and outputs it to the driving object, so as to control the driving object to decelerate and avoid a collision with the first target.
  • the embodiments of the present disclosure do not limit specific application scenarios.
  • FIG. 2 is a schematic flowchart of a method for detecting motion information of a target provided by an exemplary embodiment of the present disclosure.
  • This embodiment can be applied to electronic devices, and can also be applied to traveling objects such as vehicles, robots, and toy cars.
  • the method for detecting motion information of a target in this embodiment includes the following steps:
  • Step 201 performing target detection on the first image to obtain a detection frame of the first target.
  • the first image is an image of a scene outside the driving object collected by a camera on the driving object during the driving process of the driving object.
  • the first image may be an RGB (red, green and blue) image or a grayscale image, and the embodiment of the present disclosure does not limit the first image.
  • the target in the embodiments of the present disclosure may be any target of interest in the scene outside the driving object, such as a moving or stationary person, small animal or object, where the object may be, for example, a vehicle, a building on either side of the road, green plants, road markings, traffic lights, etc.; the embodiments of the present disclosure do not limit the targets to be detected, which can be determined according to actual needs.
  • a preset target detection framework can be used to perform target detection on the first image, for example, region-based algorithms such as the Region-based Convolutional Neural Network (RCNN), Fast RCNN and Mask RCNN, regression-based algorithms such as You Only Look Once (YOLO), or the Single Shot MultiBox Detector (SSD) algorithm, which combines ideas from Faster RCNN and YOLO.
  • the first target is the target in the first image, which may be one target or multiple targets; the multiple targets may be targets of the same type (for example, all persons) or targets of different types (for example, including people, vehicles, etc.). Correspondingly, the detection frames of the first target obtained by performing target detection on the first image may be one or multiple.
  • the embodiments of the present disclosure do not limit the quantity and type of the first targets.
  • the detection box in the embodiment of the present disclosure is the bounding box of the target (Bounding Box).
  • a four-dimensional vector (x, y, w, h) can be used to represent each detection frame, where (x, y) represents the coordinates of the detection frame in the image coordinate system, which can be the coordinates of the center point of the detection frame or of any preset vertex in the image coordinate system; w and h represent the width and height of the detection frame, respectively.
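  • As an illustrative sketch only (the patent merely requires some preset target detection framework), the snippet below shows how detections from an off-the-shelf detector could be converted into the (x, y, w, h) detection-frame representation described above; the torchvision Faster R-CNN model and the score threshold are assumptions.

```python
# Hypothetical sketch: obtain (x, y, w, h) detection frames from a generic detector.
# The torchvision model and the 0.5 score threshold are illustrative assumptions.
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def detect_frames(image_rgb_float):
    """Run detection on an HxWx3 float RGB image in [0, 1]; return a list of (x, y, w, h)."""
    tensor = torch.from_numpy(image_rgb_float).permute(2, 0, 1).float()
    with torch.no_grad():
        output = detector([tensor])[0]
    frames = []
    for box, score in zip(output["boxes"], output["scores"]):
        if score < 0.5:          # keep only confident detections
            continue
        x1, y1, x2, y2 = box.tolist()
        # (x, y) here is the top-left corner; the patent allows the centre point
        # or any preset vertex to be used instead.
        frames.append((x1, y1, x2 - x1, y2 - y1))
    return frames
```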
  • Step 202 Acquire depth information of the first image in the corresponding first camera coordinate system.
  • the depth information is used to represent the distance between each point in the scene (corresponding to each pixel point in the image) and the camera device.
  • the depth information can be specifically expressed as a depth map.
  • a depth map is an image or image channel that contains the distance information between points in the scene and the camera. A depth map is similar to a grayscale image: each of its pixel values is the actual distance from the camera to a point in the scene, and each pixel value can be stored as a short integer.
  • a neural network can be used to acquire depth information of the first image in the corresponding first camera coordinate system.
  • the neural network is a pre-trained neural network, which can perform depth prediction based on the input image and output the depth information of the scene in the image.
  • an end-to-end U-shaped deep neural network and a monocular depth prediction method based on deep learning can be used to perform depth prediction on the input first image to obtain the depth information of the first image in the corresponding first camera coordinate system.
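  • The patent does not prescribe a specific network, so the following sketch only illustrates the interface of such a monocular depth predictor: a toy encoder-decoder stands in for a trained U-shaped depth network and maps an RGB image to a per-pixel depth map; the layer sizes and the Softplus output are assumptions.

```python
# Minimal stand-in for a trained U-shaped monocular depth network (assumption:
# a real system would load a pretrained model instead of this toy module).
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Softplus(),
        )

    def forward(self, rgb):                        # rgb: Bx3xHxW, values in [0, 1]
        return self.decoder(self.encoder(rgb))     # Bx1xHxW positive depth map

depth_net = TinyDepthNet().eval()
with torch.no_grad():
    first_image = torch.rand(1, 3, 256, 512)       # placeholder for the first image
    depth_map = depth_net(first_image)             # depth information of the first image
```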
  • the camera coordinate system is a three-dimensional (3D) coordinate system established with the focus center of the camera device as the origin and the optical axis (ie, the depth direction) as the Z axis.
  • since the camera device on the driving object is in a moving state, the pose of the camera device is also changing, and the corresponding 3D coordinate systems at different moments are therefore different.
  • the first camera coordinate system corresponding to the first image is the 3D coordinate system at the moment the camera device captures the first image.
  • step 202 and step 201 may be performed simultaneously, or may be performed in any time sequence, which is not limited in this embodiment of the present disclosure.
  • Step 203 Determine the depth information of the detection frame of the first target according to the depth information of the first image, and determine the first coordinates of the first target in the first camera coordinate system based on the position of the detection frame of the first target in the image coordinate system and the depth information of the detection frame of the first target.
  • the depth information of the first image refers to the depth information of the first image in the corresponding first camera coordinate system determined in step 202
  • the depth information of the detection frame of the first target refers to the depth information of the detection frame of the first target in the first camera coordinate system.
  • Step 204 acquiring the pose change information of the camera device from the collection of the second image to the collection of the first image.
  • the second image is an image whose time sequence precedes the first image in the image sequence where the first image is located and is spaced from the first image by a preset number of frames.
  • the specific value of the preset number of frames can be set according to actual needs (for example, the specific scene, the motion state of the driving object, the image collection frequency of the camera device, etc.), and can be 0, 1, 2, 3, etc.; when the preset number of frames is 0, the second image and the first image are two adjacent frames of images.
  • in a scene where the driving object or the targets move quickly, the value of the preset number of frames may be small, so as to prevent a target appearing in the second image from having moved out of the shooting range of the camera device by the time the first image is captured and thus failing to appear in the first image, thereby effectively detecting the motion information of the targets in the scene outside the driving object.
  • in the driving scene of a crowded urban road, where the driving object moves slowly, the value of the preset number of frames may be larger, so that the motion information of the same target can still be detected within the time range from the moment the second image is collected to the moment the first image is collected, while also avoiding the computing and storage resources occupied by overly frequent execution of the motion information detection method and improving resource utilization.
  • the pose change information in the embodiment of the present disclosure refers to the difference between the pose of the camera device when the first image is collected and the pose when the second image is collected.
  • the pose change information is the pose change information based on the 3D space, which can be specifically expressed as a matrix, so it can be called a pose change matrix.
  • the pose change information may include translation information and rotation information of the camera.
  • the translation information of the camera device may include: displacement amounts of the camera device on the three coordinate axes XYZ in the 3D coordinate system respectively.
  • the rotation information of the camera device may be a rotation vector based on roll (Roll), yaw (Yaw) and pitch (Pitch), which includes rotation component vectors in the three rotation directions of Roll, Yaw and Pitch, where Roll, Yaw and Pitch respectively represent rotations of the camera device around the three coordinate axes X, Y and Z of the 3D coordinate system.
  • computer vision technology can be used to obtain the pose change information of the camera device from the acquisition of the second image to the acquisition of the first image, for example, by using a Simultaneous Localization And Mapping (SLAM) method.
  • the first image (an RGB image) and its depth information, together with the second image and its depth information, can be input into the RGBD (Red Green Blue Depth) model of the open-source Oriented FAST and Rotated BRIEF (ORB)-SLAM framework, and the pose change information is output by the RGBD model.
  • the embodiments of the present disclosure may also adopt other manners, for example, using a Global Positioning System (GPS) and an angular velocity sensor, to acquire the pose change information of the camera device from the acquisition of the second image to the acquisition of the first image.
  • the embodiment of the present disclosure does not limit the specific manner of acquiring the pose change information of the camera device from the collection of the second image to the collection of the first image.
  • Step 205 Convert the second coordinates of the second target in the second camera coordinate system corresponding to the second image into the third coordinates in the first camera coordinate system according to the pose change information of the camera from collecting the second image to collecting the first image.
  • the second target is the target in the second image corresponding to the first target; like the first target, the second target may be one target or multiple targets, and the multiple targets may be targets of the same type (for example, all people) or targets of different types (for example, including people, vehicles, etc.).
  • the second camera coordinate system corresponding to the second image is the 3D coordinate system when the camera device collects the second image.
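  • A minimal numerical sketch of this conversion (not the patent's own code), assuming the pose change information has been assembled into a 4x4 homogeneous pose change matrix from the translation and Roll/Yaw/Pitch rotation components described above; the angle values, the translation and the axis convention are illustrative assumptions.

```python
# Sketch of converting a second coordinate P2 (second camera coordinate system)
# into the third coordinate P3 (first camera coordinate system) using a 4x4
# pose change matrix. All numeric values are illustrative.
import numpy as np

def pose_matrix(roll, yaw, pitch, translation):
    """Homogeneous pose change matrix from rotation angles (radians) and translation.
    Assumed convention: Roll/Yaw/Pitch are rotations about the X/Y/Z axes."""
    cr, sr = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll about X
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw about Y
    Rz = np.array([[cp, -sp, 0], [sp, cp, 0], [0, 0, 1]])   # pitch about Z
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = translation
    return T

# Illustrative values: a slight yaw and 1.5 m forward translation between frames.
T_prev_to_cur = pose_matrix(roll=0.0, yaw=0.02, pitch=0.0, translation=[0.0, 0.0, 1.5])

P2 = np.array([2.0, 0.5, 12.0])                  # second coordinate of the second target
P3 = (T_prev_to_cur @ np.append(P2, 1.0))[:3]    # third coordinate in the first camera frame
```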
  • steps 204 to 205 and steps 201 to 203 may be performed simultaneously, or may be performed in any time sequence, which is not limited in this embodiment of the present disclosure.
  • Step 206 based on the first coordinates and the third coordinates, determine the motion information of the first target within a time range corresponding to the time when the second image is collected to the time when the first image is collected.
  • the motion information of the first target may include a motion speed and a motion direction of the first target within a corresponding time range.
  • in this way, the image of the scene outside the driving object is collected by the camera device on the driving object during the driving process of the driving object; target detection is performed on the collected first image to obtain the detection frame of the first target; the depth information of the first image in the corresponding first camera coordinate system is obtained, and the depth information of the detection frame of the first target is determined according to the depth information of the first image; then, based on the position of the detection frame of the first target in the image coordinate system and the depth information of the detection frame of the first target, the first coordinates of the first target in the first camera coordinate system are determined.
  • the pose change information of the camera from collecting the second image to collecting the first image is obtained, wherein the second image precedes the first image in the image sequence where the first image is located and is separated from the first image by a preset number of frames; then, taking the target in the second image corresponding to the first target as the second target, the second coordinates of the second target in the second camera coordinate system corresponding to the second image are converted into the third coordinates in the first camera coordinate system according to the pose change information; finally, based on the first coordinates and the third coordinates, the motion information of the first target within the time range corresponding to the interval from the acquisition moment of the second image to the acquisition moment of the first image is determined.
  • the embodiments of the present disclosure use computer vision technology to determine the motion information of the target in the driving scene based on the image sequence of the driving scene, without the aid of lidar.
  • compared with emitting laser beams to construct point cloud data, performing target detection and target tracking on two sets of point cloud data, and then calculating the movement speed and direction of the target, this avoids a large amount of computational processing, saves processing time and improves processing efficiency, which helps meet the needs of scenarios with high real-time requirements such as unmanned driving.
  • FIG. 3 is a schematic flowchart of a method for detecting motion information of a target provided by another exemplary embodiment of the present disclosure. As shown in FIG. 3 , on the basis of the above-mentioned embodiment shown in FIG. 2 , step 203 may include the following steps:
  • Step 2031 Obtain the depth value of each pixel in the detection frame of the first target from the depth information of the first image.
  • the depth information of the first image includes the depth value of each pixel in the first image, and the depth value of each pixel in the detection frame of the first target can be queried from the depth information of the first image.
  • Step 2032 Determine the depth information of the detection frame of the first target based on the depth value of each pixel in the detection frame of the first target in a preset manner.
  • the detection frame of the first target includes a plurality of pixels, and each pixel has its own depth value. In this embodiment, the depth information of the detection frame of the first target is determined comprehensively based on the depth values of the pixels in the detection frame of the first target, so that the first coordinates of the first target in the first camera coordinate system can be determined accurately from this depth information and the position of the detection frame of the first target in the image coordinate system, thereby improving the accuracy of the first coordinates of the first target in the first camera coordinate system.
  • the depth value with the highest frequency of occurrence may be selected as the depth information of the detection frame of the first target.
  • the inventors found through research that, in practical applications, vibration and lighting during vehicle driving may affect the quality of the image captured by the camera, resulting in some noise points in the image; the depth values of these noise points cannot be obtained accurately, so that their depth values in the depth information are too large or too small.
  • the distance between each point on the same target and the camera device is similar, and the depth value of the corresponding pixel is also similar.
  • the depth value with the highest frequency of occurrence is the depth value shared by the most pixels; by selecting it, the depth values of individual pixels with large deviations can be ignored, which avoids the influence of the depth values of noise pixels in the first image on the depth information of the detection frame of the entire first target and improves the accuracy of the depth information of the detection frame of the first target.
  • alternatively, the depth information of the detection frame of the first target may be determined from the depth value range containing the largest number of pixels: for example, the maximum value, the minimum value, the average of the maximum and minimum values, or the median of the depth value range with the largest number of pixels may be used as the depth value of the detection frame of the first target.
  • specifically, depth value ranges may be divided in advance, and the number of pixels in the detection frame of the first target whose depth values fall within each preset depth value range is counted.
  • the greater the number of pixels within a certain depth value range, the more points on the surface of the first target correspond to that range; by determining the depth of the detection frame of the first target based on the depth value range with the largest number of pixels, the depth values of a few widely deviating pixels can be ignored, which avoids the influence of the depth values of noise pixels in the first image on the depth information of the detection frame of the entire first target and improves the accuracy of the depth information of the detection frame of the first target.
  • the average value of the depth values of each pixel in the detection frame of the first target may also be obtained as the depth information of the detection frame of the first target.
  • obtaining the average of the depth values of the pixels in the detection frame of the first target as the depth information of the detection frame of the first target allows the depth information of the detection frame of the first target to be determined quickly, and reduces the influence of individual pixels with widely deviating depth values on the depth information of the detection frame of the entire first target, thereby improving the accuracy of the depth information of the detection frame of the first target.
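  • The three aggregation strategies described above (most frequent depth value, most populated depth value range, and mean depth value) can be sketched as follows; the quantisation resolution and bin width are assumed parameters, not values taken from the patent.

```python
# Sketch of turning the per-pixel depths inside a detection frame into a single
# depth value for the frame. The 0.1 m resolution and 0.5 m bin width are assumptions.
import numpy as np

def frame_depth_mode(depths, resolution=0.1):
    """Most frequent depth value (depths quantised to `resolution` metres)."""
    quantised = np.round(np.asarray(depths) / resolution).astype(int)
    values, counts = np.unique(quantised, return_counts=True)
    return float(values[np.argmax(counts)] * resolution)

def frame_depth_busiest_range(depths, bin_width=0.5):
    """Median of the depth value range containing the most pixels."""
    depths = np.asarray(depths, dtype=float)
    bins = np.arange(depths.min(), depths.max() + bin_width + 1e-9, bin_width)
    counts, edges = np.histogram(depths, bins=bins)
    lo, hi = edges[np.argmax(counts)], edges[np.argmax(counts) + 1]
    selected = depths[(depths >= lo) & (depths <= hi)]
    return float(np.median(selected))

def frame_depth_mean(depths):
    """Average depth of all pixels in the detection frame."""
    return float(np.mean(depths))
```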
  • FIG. 4 is a schematic flowchart of a method for detecting motion information of a target provided by another exemplary embodiment of the present disclosure. As shown in FIG. 4 , on the basis of the above-mentioned embodiment shown in FIG. 2 or FIG. 3 , before step 205 , the following steps may be further included:
  • Step 301 Determine the correspondence between at least one object in the first image and at least one object in the second image.
  • At least one object in the first image includes the above-mentioned first object.
  • At least one target in the first image and at least one target in the second image may be any target of interest in the scene outside the driving object, such as various types of targets such as people, vehicles, buildings, etc. .
  • the first object is one or more objects in the at least one object in the first image
  • the second object is one or more objects in the at least one object in the second image.
  • the first target is a target that needs to be detected with motion information in the first image
  • the second target is a target that belongs to the same target as the first target in the second image.
  • Step 302 determine the target in the second image corresponding to the first target as the second target.
  • after the correspondence between the at least one object in the first image and the at least one object in the second image has been determined in step 301, the target in the second image corresponding to the first target in the first image can be determined as the second target based on this correspondence.
  • the corresponding relationship between the objects in the two images can be determined for the two images.
  • the second object in the second image corresponding to the first object can then be directly determined according to this correspondence, which improves the efficiency of determining the second target in the second image.
  • the detection frame of at least one target in the second image may be tracked to obtain the correspondence between the at least one target in the first image and the at least one target in the second image.
  • the correspondence between objects in different images can be obtained by tracking the detection frame of the object.
  • FIG. 5 is a schematic flowchart of a method for detecting motion information of a target provided by yet another exemplary embodiment of the present disclosure. As shown in FIG. 5 , in other embodiments, step 301 may include the following steps:
  • Step 3011 Obtain optical flow information from the second image to the first image.
  • the optical flow information is used to represent motion or timing information of pixels between images in a video or image sequence.
  • the optical flow information from the second image to the first image, that is, the two-dimensional motion field of pixels from the second image to the first image, is used to represent the movement of the pixels in the second image into the first image.
  • computer vision technology can be used, for example the open-source computer vision library OpenCV (Open Source Computer Vision Library): the second image and the first image are input into an OpenCV-based optical flow model, and the model outputs the optical flow information between the second image and the first image.
  • Step 3012 For the detection frame of each of the at least one target in the second image, based on the optical flow information and the detection frame of the target in the second image, determine the positions in the first image to which the pixel points in the detection frame of the target in the second image are transferred.
  • Step 3013 Obtain the Intersection over Union (IoU) between the set of positions in the first image to which the pixel points in the detection frame of the target in the second image are transferred and each detection frame in the first image, that is, the coverage ratio between the set and each detection frame in the first image.
  • specifically, the intersection I and the union U between the above-mentioned set and each detection frame in the first image can be obtained, and the ratio of the intersection I to the union U is calculated for each detection frame and taken as the coverage ratio between the set and that detection frame in the first image.
  • Step 3014 Establish the correspondence between the target in the second image and the target corresponding to the detection frame with the largest IoU in the first image, that is, take the target corresponding to the detection frame with the largest IoU in the first image as the target corresponding to that target in the second image.
  • in this way, the set of positions to which the pixels in the detection frame of a certain target in the second image are transferred in the first image is determined based on the optical flow information between the two images, and the IoU between this set and each detection frame in the first image is obtained; the larger the IoU, the greater the overlap between a detection frame in the first image and the pixels in the above set, and the greater the probability that the detection frame with the largest IoU in the first image belongs to the same target as the one in the second image. Determining the correspondence between targets in the two images through the IoU between the transferred pixel set and each detection frame in the first image therefore makes the correspondence more accurate and objective.
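  • A rough sketch of steps 3011 to 3014 is given below; the patent only specifies an OpenCV-based optical flow model, so the dense Farneback optical flow call and its parameters are assumptions for illustration.

```python
# Sketch: transfer the pixels of one detection frame from the second image into the
# first image with dense optical flow, then pick the first-image detection frame
# with the largest IoU against the transferred pixel set. Parameters are illustrative.
import cv2
import numpy as np

def match_target(second_gray, first_gray, box_prev, boxes_cur):
    """box_prev, boxes_cur: (x, y, w, h) in pixels. Returns index of best match or None."""
    flow = cv2.calcOpticalFlowFarneback(second_gray, first_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y, w, h = [int(v) for v in box_prev]
    ys, xs = np.mgrid[y:y + h, x:x + w]
    moved_x = xs + flow[ys, xs, 0]            # pixel positions transferred to the first image
    moved_y = ys + flow[ys, xs, 1]
    transferred = np.zeros(first_gray.shape, dtype=bool)
    inside = (moved_x >= 0) & (moved_x < first_gray.shape[1]) & \
             (moved_y >= 0) & (moved_y < first_gray.shape[0])
    transferred[moved_y[inside].astype(int), moved_x[inside].astype(int)] = True

    best_idx, best_iou = None, 0.0
    for idx, (bx, by, bw, bh) in enumerate(boxes_cur):
        box_mask = np.zeros_like(transferred)
        box_mask[int(by):int(by + bh), int(bx):int(bx + bw)] = True
        inter = np.logical_and(transferred, box_mask).sum()
        union = np.logical_or(transferred, box_mask).sum()
        iou = inter / union if union else 0.0   # coverage ratio between set and frame
        if iou > best_iou:
            best_idx, best_iou = idx, iou
    return best_idx
```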
  • FIG. 6 is a schematic flowchart of a method for detecting motion information of a target provided by yet another exemplary embodiment of the present disclosure. As shown in FIG. 6 , on the basis of the above-mentioned embodiment shown in FIG. 2 or FIG. 3 , step 206 may include the following steps:
  • Step 2061 Obtain a vector formed from the third coordinate to the first coordinate.
  • the vector formed from the third coordinate to the first coordinate is the displacement vector from the third coordinate to the first coordinate, that is, the directed line segment from the third coordinate to the first coordinate; the magnitude of the displacement vector is the straight-line distance from the third coordinate to the first coordinate, and its direction points from the third coordinate to the first coordinate.
  • Step 2062 Based on the direction of the vector formed from the third coordinate to the first coordinate, determine the movement direction of the first target within the time range corresponding to the interval from the acquisition moment of the second image to the acquisition moment of the first image; based on the norm of the vector formed from the third coordinate to the first coordinate and the above time range, determine the moving speed of the first target within the above time range.
  • for example, the ratio of the norm of the vector formed from the third coordinate to the first coordinate to the length of the above time range can be taken as the moving speed of the first target within the above time range.
  • the movement direction and movement speed of the first target within the above-mentioned time range constitute the movement information of the first target in the above-mentioned time range.
  • the movement direction and movement speed of the first target within the above-mentioned corresponding time range can be accurately determined based on the vector formed from the third coordinate to the first coordinate, so as to know the movement state of the first target.
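  • A worked sketch of steps 2061 to 2062; the inter-frame interval derived from a 10 Hz camera, the frame gap and the coordinate values are illustrative assumptions.

```python
# Sketch: motion information of the first target from its first coordinate P1
# (first image) and the converted third coordinate P3 (second image, expressed in
# the first camera coordinate system). Frame rate, gap and coordinates are assumptions.
import numpy as np

P3 = np.array([2.0, 0.5, 12.0])      # third coordinate (from the second image)
P1 = np.array([2.4, 0.5, 10.8])      # first coordinate (from the first image)

frame_rate = 10.0                    # assumed camera frequency in Hz
preset_gap = 1                       # assumed preset number of frames between the images
delta_t = (preset_gap + 1) / frame_rate   # time range between the two acquisition moments

displacement = P1 - P3               # vector formed from the third to the first coordinate
speed = np.linalg.norm(displacement) / delta_t            # movement speed (m/s)
direction = displacement / np.linalg.norm(displacement)   # unit movement direction
print(f"speed = {speed:.2f} m/s, direction = {direction}")
```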
  • FIG. 7 is a schematic flowchart of a method for detecting motion information of a target provided by another exemplary embodiment of the present disclosure. As shown in FIG. 7 , on the basis of the above-mentioned embodiments shown in FIGS. 2 to 6 , before step 205 , the following steps may be further included:
  • Step 401 perform target detection on the second image to obtain a detection frame of the second target.
  • Step 402 acquiring depth information of the second image in the second camera coordinate system.
  • the depth information of the detection frame of the second target is determined according to the depth information of the second image in the second camera coordinate system.
  • the depth information of the detection frame of the second target refers to the depth information of the detection frame of the second target in the second camera coordinate system.
  • Step 403 Determine the second coordinates of the second target in the second camera coordinate system based on the position of the detection frame of the second target in the image coordinate system and the depth information of the detection frame of the second target.
  • in this way, target detection and depth information acquisition can be performed in advance on the second image, which precedes the first image in the image sequence, and the second coordinates of the second target in the second camera coordinate system can thus be determined in advance, so that subsequently the second coordinates of the second target only need to be converted in order to determine the motion information of the first target within the corresponding time range, thereby improving the detection efficiency of the target motion information in the scene.
  • the second coordinates of the second target may also be stored for subsequent direct query, which further improves the detection efficiency of the target motion information in the scene.
  • furthermore, the first image may be taken as a new second image, and a third image that is positioned after the first image in the image sequence may be taken as a new first image, so that the method for detecting motion information of a target described in any of the foregoing embodiments of the present disclosure can be performed again to determine the motion information of the target in the third image within the time range corresponding to the interval from the acquisition moment of the first image to the acquisition moment of the third image.
  • in this way, the motion information of the target can be detected frame by frame, or at intervals of several frames, over the image sequence (a rough sketch of such a sliding loop is given below), so that the motion state of the targets in the scene outside the driving object is detected continuously during the driving process of the driving object, and the driving of the driving object can be controlled according to the motion state of the targets to ensure the safe driving of the driving object.
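  • The frame-by-frame (or interval-based) reuse described above could look roughly like the following loop; detect_motion_info is only a placeholder name standing in for the detection method of this disclosure, and the frame gap is an assumption.

```python
# Sketch of continuously reusing the method over an image sequence: each image
# serves as the "first image" of one detection and the "second image" of the next.
def detect_motion_info(second_image, first_image):
    ...  # placeholder: target detection, depth estimation, pose change, coordinate conversion

def run_over_sequence(image_sequence, preset_gap=1):
    step = preset_gap + 1
    results = []
    for i in range(step, len(image_sequence), step):
        second_image = image_sequence[i - step]   # earlier frame
        first_image = image_sequence[i]           # later frame
        results.append(detect_motion_info(second_image, first_image))
    return results
```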
  • FIG. 8 is a schematic diagram of an application flow of a method for detecting motion information of a target provided by an exemplary embodiment of the present disclosure.
  • the method for detecting motion information of a target in an embodiment of the present disclosure is further described below by taking an application embodiment as an example.
  • the application embodiment includes:
  • Step 501 During the driving process of the driving object, a camera on the driving object collects images of scenes outside the driving object to obtain an image sequence.
  • Step 506 is performed for the camera device.
  • Step 502 Using a preset target detection framework, perform target detection on the second image I_{t-1} to obtain the detection frames of the targets in the second image I_{t-1}; since there may be one or more detected targets, the detection frame set BBox_{t-1} is used to represent the detected detection frames of the targets in the second image I_{t-1}, and the detection frame of the target numbered k at time t-1 (hereinafter referred to as target k) is described as a vector (x, y, w, h), where:
  • (x, y) represents the coordinates of the detection frame of target k in the image coordinate system
  • w and h represent the width and height of the detection frame of target k, respectively.
  • Step 503 Using a preset depth estimation method, perform depth estimation on the second image I_{t-1} to obtain a depth map D_{t-1} corresponding to the second image I_{t-1}.
  • the depth map D_{t-1} includes the depth values in the second camera coordinate system corresponding to the different pixel points of the second image I_{t-1} at time t-1; the depth value of the pixel point (i, j) in the second image I_{t-1} in the second camera coordinate system can be expressed as D_{t-1}(i, j).
  • Step 504 Obtain the depth value of each pixel in the detection frame of each target in the second image I_{t-1} from the depth map D_{t-1} corresponding to the second image I_{t-1}, and use a preset method to determine the depth value of the detection frame of each target in the second image I_{t-1} based on the depth values of the pixels in that detection frame.
  • here, the depth value of each pixel in the detection frame of each target in the second image I_{t-1} means the depth value of that pixel in the second camera coordinate system.
  • steps 503 to 504 and step 502 may be performed simultaneously, or may be performed in any time sequence, which is not limited in this embodiment of the present disclosure.
  • Step 505 For the detection frame of each target in the second image I_{t-1}, based on the position of the detection frame of the target in the image coordinate system and the depth value of the detection frame of the target, determine the corresponding 3D coordinates (i.e., the second coordinates) of the target in the second camera coordinate system at time t-1.
  • the 3D coordinates in the second camera coordinate system corresponding to the detection frame of target k at time t-1 can be obtained as follows
  • K is an internal parameter of the camera device, which is used to represent the properties of the camera device itself, and can be obtained by calibration in advance.
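  • The formula itself is not reproduced in this text; a standard pinhole back-projection consistent with the surrounding description (the detection frame position (x, y) in the image coordinate system, its depth value, and the intrinsic parameter K) is sketched below. Whether it matches the patent's exact formula is an assumption, and all numeric values are illustrative.

```python
# Sketch of back-projecting the detection frame position (x, y) in the image
# coordinate system, with depth value d, into 3D camera coordinates using the
# intrinsic matrix K. All numeric values are illustrative.
import numpy as np

K = np.array([[1000.0,    0.0, 640.0],    # fx, 0,  cx
              [   0.0, 1000.0, 360.0],    # 0,  fy, cy
              [   0.0,    0.0,   1.0]])

def back_project(x, y, d, K):
    """Return the 3D point in the camera coordinate system for pixel (x, y) at depth d."""
    pixel_h = np.array([x, y, 1.0])            # homogeneous pixel coordinates
    return d * (np.linalg.inv(K) @ pixel_h)    # 3D coordinates (X, Y, Z), with Z == d

second_coord_k = back_project(x=700.0, y=400.0, d=12.3, K=K)   # e.g. target k at time t-1
```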
  • Step 506 Obtain the pose change matrix T_{t-1→t} of the camera from time t-1 to time t.
  • step 506 , steps 502 to 505 , and steps 508 to 513 may be performed simultaneously, or may be performed in any time sequence, which is not limited in this embodiment of the present disclosure.
  • Step 507 According to the above-mentioned pose change matrix T_{t-1→t}, convert the second coordinates of each target in the second image I_{t-1} in the second camera coordinate system into the 3D coordinates in the first camera coordinate system (i.e., the third coordinates above).
  • for example, the second coordinates of the detection frame of the target k in the second image I_{t-1} can be converted into the corresponding third coordinates in this way.
  • Step 508 Using a preset target detection framework, perform target detection on the first image I_t to obtain the detection frames of the targets (that is, the above-mentioned first targets) in the first image I_t; since there may be one or more detected targets, the detection frames of the first targets are represented by the detection frame set BBox_t, and the detection frame of the target numbered k' (hereinafter referred to as target k') among the first targets at time t is described as a vector (x, y, w, h), where:
  • (x, y) represents the coordinates of the detection frame of the target k' in the image coordinate system
  • w and h respectively represent the width and height of the detection frame of the target k'.
  • Step 509: using a preset depth estimation method, perform depth estimation on the first image I_t to obtain the depth map D_t corresponding to I_t.
  • The depth map D_t contains, for the different pixels of the first image I_t, their depth values in the first camera coordinate system at time t; the depth value of pixel (i, j) of I_t in the first camera coordinate system can be expressed as d_{t}^{(i,j)} = D_{t}(i, j).
  • Step 510: obtain the depth value of each pixel in the detection frame of the first target from the depth map D_t corresponding to the first image I_t, and, using a preset method, determine the depth value of the detection frame of the first target based on the depth values of the pixels in that detection frame.
  • the depth value of the detection frame of the first target refers to the depth value of the detection frame of the first target in the first camera coordinate system.
  • steps 509 to 510 and step 508 may be performed simultaneously, or may be performed in any order in time, which is not limited in this embodiment of the present disclosure.
  • Step 511: based on the position of the detection frame of the first target in the image coordinate system and the depth value of the detection frame of the first target, determine the first coordinate of the first target in the first camera coordinate system at time t.
  • The first target may be one target or multiple targets. When the first target is multiple targets, for each of those targets, the 3D coordinate in the first camera coordinate system corresponding to that target's detection frame at time t (i.e., the above-mentioned first coordinate) is determined based on the position of its detection frame in the image coordinate system and its depth value.
  • For example, continuing with target k′ at time t, the 3D coordinate in the first camera coordinate system corresponding to the detection frame of target k′ at time t can be obtained as P_{t}^{k′} = d_{t}^{k′} · K^{-1} · [x, y, 1]^T.
  • K is the intrinsic parameter matrix of the camera device, which represents the camera device's own properties and can be obtained in advance by calibration.
  • Step 512: determine the correspondence between the first target in the first image I_t and the targets in the second image I_{t-1}.
  • Step 513: according to the above correspondence, determine the target in the second image corresponding to the first target as the second target.
  • Corresponding to the first target, the second target may be one target or multiple targets, and the multiple targets may be of the same type (for example, all people) or of different types (for example, including people, vehicles, buildings, etc.).
  • In steps 512 to 513, the second target in the second image corresponding to the first target may be determined in the manner described in any of the above-mentioned embodiments of FIG. 4 to FIG. 5 of the present disclosure.
  • Steps 512 to 513 only need to be executed after steps 502 and 508; relative to the other steps in this application embodiment, they may be executed simultaneously or in any temporal order, which is not limited in this embodiment of the present disclosure.
  • Step 514: based on the first coordinate of the first target and the third coordinate of the corresponding second target, determine the motion information of the first target within the time range Δt corresponding to time t-1 to time t.
  • The first target may be one target or multiple targets; when the first target is multiple targets, step 514 is performed for each first target respectively.
  • Assume the second target k in the second image I_{t-1} corresponds to the first target k′ in the first image I_t. Then, based on the first coordinate P_{t}^{k′} of the first target k′ at time t and the third coordinate P'^{k}_{t-1→t} of the corresponding second target k at time t-1, the motion information of the first target k′ within the corresponding time range Δt is determined. Specifically, the vector formed from the third coordinate to the first coordinate is obtained, and its direction is taken as the movement direction of the first target k′ within Δt, expressed as direction = P_{t}^{k′} − P'^{k}_{t-1→t}; the norm of this vector, ‖P_{t}^{k′} − P'^{k}_{t-1→t}‖, divided by Δt gives the movement speed v of the first target k′ within Δt, i.e., v = ‖P_{t}^{k′} − P'^{k}_{t-1→t}‖ / Δt.
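  • A minimal sketch of this last step, computing direction and speed from the two coordinates (hypothetical helper name; it follows the vector and norm relations above):

```python
import numpy as np

def motion_info(third_coord, first_coord, delta_t):
    """Direction and speed of the first target over the time range delta_t,
    from the vector formed from the third coordinate to the first coordinate."""
    vec = np.asarray(first_coord, dtype=float) - np.asarray(third_coord, dtype=float)
    norm = np.linalg.norm(vec)
    direction = vec / norm if norm > 0 else vec   # unit movement direction
    speed = norm / delta_t                        # ||vector|| / time range
    return direction, speed
```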
  • FIG. 9 is a schematic flowchart of a method for controlling a traveling object based on motion information of a target provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to traveling objects such as vehicles, robots, and toy cars. As shown in FIG. 9 , the method for controlling a traveling object based on the motion information of the target in this embodiment includes the following steps:
  • Step 601 During the driving process of the driving object, an image sequence of a scene outside the driving object is collected by a camera device on the driving object.
  • Step 602: taking at least one frame in the image sequence as the first image, and taking at least one frame in the image sequence that is located before the first image and separated from the first image by a preset number of frames as the second image, determine the motion information of the targets in the scene outside the driving object by using the method for detecting motion information of a target according to any of the above embodiments of the present disclosure.
  • Step 603 Generate a control instruction for controlling the driving state of the driving object according to the motion information of the above target, so as to control the driving state of the driving object.
  • Based on this embodiment, during the driving process of the driving object, the method for detecting motion information of a target described in any embodiment of the present disclosure can be used to determine the motion information of targets in the driving scene, and a control instruction for controlling the driving state of the driving object can then be generated according to that motion information. This realizes detection of target motion information in the driving scene using computer vision technology and intelligent driving control of the driving object, which helps meet the need for real-time intelligent driving control of the driving object in unmanned-driving scenarios, so as to ensure the safe driving of the driving object.
  • The control instructions may include, but are not limited to, at least one of the following: a control instruction for maintaining the motion speed; a control instruction for adjusting the motion speed (for example, a deceleration control instruction or an acceleration control instruction); a control instruction for maintaining the motion direction; a control instruction for adjusting the motion direction (for example, a left-turn control instruction, a right-turn control instruction, a control instruction for merging into the left lane, or a control instruction for merging into the right lane); a control instruction for an early warning (for example, a reminder message to pay attention to the target ahead); a control instruction for switching the driving mode (for example, switching to automatic cruise mode or switching to manual driving mode); and so on. The control instructions may be set according to actual requirements and are not limited to those listed above. A simplified sketch of mapping detected motion information to such an instruction follows.
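  • The sketch below is illustrative only: the thresholds, helper name, and instruction labels are hypothetical and are not part of the disclosed method; it merely shows one way detected motion information could be turned into a control instruction type from the list above.

```python
def make_control_instruction(closing_speed, time_to_collision):
    """Hypothetical mapping from detected motion information to an instruction."""
    if time_to_collision is not None and time_to_collision < 3.0:
        return "decelerate"      # adjust (reduce) the motion speed
    if closing_speed > 0.0:
        return "warn"            # early-warning control instruction
    return "keep_speed"          # maintain the current motion speed
```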
  • the method for detecting the motion information of the target or the method for controlling the traveling object based on the motion information of the target provided by any of the above embodiments of the present disclosure can be executed by any appropriate device with data processing capabilities, including but not limited to: terminal equipment and servers etc.
  • Alternatively, the method for detecting the motion information of the target or the method for controlling the traveling object based on the motion information of the target provided by any of the above-mentioned embodiments of the present disclosure may be executed by a processor; for example, the processor calls corresponding instructions stored in a memory to execute the method for detecting motion information of a target or the method for controlling a traveling object based on the motion information of the target provided by any of the above embodiments of the present disclosure. Details are not repeated below.
  • FIG. 10 is a schematic structural diagram of an apparatus for detecting motion information of a target provided by an exemplary embodiment of the present disclosure.
  • The device for detecting the motion information of the target may be installed in electronic equipment such as terminal devices and servers, or may be installed on traveling objects such as vehicles, robots, and toy cars, to execute the method for detecting motion information of a target according to any of the above embodiments of the present disclosure.
  • The apparatus for detecting motion information of a target includes: a detection module 701, a first acquisition module 702, a first determination module 703, a second determination module 704, a second acquisition module 705, a conversion module 706, and a third determination module 707, wherein:
  • the detection module 701 is configured to perform target detection on a first image to obtain a detection frame of the first target, where the first image is an image of a scene outside the driving object collected by a camera on the driving object during the driving process of the driving object.
  • the first acquiring module 702 is configured to acquire depth information of the first image in the corresponding first camera coordinate system.
  • the first determining module 703 is configured to determine the depth information of the detection frame of the first target according to the depth information of the first image obtained by the first obtaining module 702 .
  • The second determination module 704 is configured to determine the first coordinate of the first target in the first camera coordinate system, based on the position in the image coordinate system of the detection frame of the first target obtained by the detection module 701 and the depth information of the detection frame of the first target determined by the first determination module 703.
  • the second obtaining module 705 is configured to obtain the pose change information of the camera device from collecting the second image to collecting the first image.
  • the second image is an image whose time sequence is located before the first image in the image sequence where the first image is located and is spaced from the first image by a preset number of frames.
  • The conversion module 706 is configured to convert, according to the pose change information obtained by the second acquisition module 705, the second coordinate of the second target in the second camera coordinate system corresponding to the second image into the third coordinate in the first camera coordinate system.
  • the second target is the target in the second image corresponding to the first target.
  • The third determination module 707 is configured to determine, based on the first coordinate determined by the second determination module 704 and the third coordinate obtained by the conversion module 706, the motion information of the first target within the time range from the capture time of the second image to the capture time of the first image.
  • Based on this embodiment, computer vision technology is used to determine the motion information of targets in the driving scene from images of the scene outside the driving object collected during driving, without the aid of lidar. Compared with obtaining a target's motion speed and direction with lidar, there is no need to construct point cloud data by emitting laser beams at a high frequency, perform target detection and tracking on two point clouds, and then calculate the target's motion speed and direction; this avoids a large amount of computation, saves processing time, improves processing efficiency, and helps meet the needs of scenarios with high real-time requirements such as unmanned driving.
  • FIG. 11 is a schematic structural diagram of an apparatus for detecting motion information of a target provided by another exemplary embodiment of the present disclosure.
  • The first determination module 703 includes: a first acquisition unit 7031, configured to obtain the depth value of each pixel in the detection frame of the first target from the depth information of the first image; and a first determination unit 7032, configured to determine, using a preset method, the depth information of the detection frame of the first target based on the depth values of the pixels in the detection frame obtained by the first acquisition unit 7031.
  • Optionally, in some implementations, the first determination unit 7032 is specifically configured to select, among the depth values of the pixels in the detection frame of the first target acquired by the first acquisition unit 7031, the depth value with the highest occurrence frequency as the depth information of the detection frame of the first target.
  • Alternatively, in other implementations, the first determination unit 7032 is specifically configured to determine, among the depth values of the pixels in the detection frame of the first target, the number of pixels falling within each preset depth value range, and to determine the depth information of the detection frame of the first target based on the depth value range containing the largest number of pixels.
  • Alternatively, in still other implementations, the first determination unit 7032 is specifically configured to take the average of the depth values of the pixels in the detection frame of the first target as the depth information of the detection frame of the first target. A sketch of these three aggregation strategies is given below.
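  • A minimal sketch of the three aggregation strategies above (hypothetical helper name and bin width; the histogram variant returns the mid-value of the most populated depth range as one possible choice):

```python
import numpy as np

def box_depth(pixel_depths, method="mode", bin_width=0.5):
    """Aggregate per-pixel depth values inside a detection frame into one value."""
    d = np.asarray(pixel_depths, dtype=float).ravel()
    if method == "mode":                          # most frequent depth value
        vals, counts = np.unique(d, return_counts=True)
        return float(vals[np.argmax(counts)])
    if method == "histogram":                     # depth range with the most pixels
        edges = np.arange(d.min(), d.max() + bin_width, bin_width)
        if len(edges) < 2:
            return float(d.mean())
        hist, edges = np.histogram(d, bins=edges)
        i = int(np.argmax(hist))
        return 0.5 * (edges[i] + edges[i + 1])    # e.g. the mid-value of that range
    if method == "mean":                          # average depth value
        return float(d.mean())
    raise ValueError(f"unknown method: {method}")
```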
  • Optionally, the apparatus for detecting the motion information of the target in the above embodiment may further include a fourth determination module 708 and a fifth determination module 709, wherein:
  • the fourth determination module 708 is configured to determine the correspondence between at least one object in the first image and at least one object in the second image; wherein, the objects in the first image include the above-mentioned first object.
  • The fifth determination module 709 is configured to determine, according to the correspondence determined by the fourth determination module 708, the target in the second image corresponding to the first target as the above-mentioned second target.
  • the fourth determination module 708 is specifically configured to track the detection frame of at least one target in the second image to obtain at least one target in the first image and the target in the second image. Correspondence between at least one target.
  • Alternatively, in other implementations, the fourth determination module 708 may include: a second acquisition unit 7081, configured to acquire the optical flow information from the second image to the first image; a second determination unit 7082, configured to determine, for the detection frame of each of the at least one target in the second image, the positions in the first image to which the pixels in that detection frame are transferred, based on the optical flow information and that detection frame; a third acquisition unit 7083, configured to obtain the intersection-over-union between the set of positions to which the pixels in the target's detection frame are transferred in the first image and each detection frame in the first image; and an establishing unit 7084, configured to establish the correspondence between the target in the second image and the target corresponding to the detection frame with the largest intersection-over-union in the first image. A sketch of this matching is given below.
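  • A minimal sketch of the flow-based matching (hypothetical helper names; boxes are assumed to be (x1, y1, x2, y2), the dense flow an (H, W, 2) array, and the set of transferred pixel positions is approximated here by its bounding box):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_by_flow(box_prev, flow, boxes_cur):
    """Warp the pixels of a detection frame in the second image into the first
    image with dense optical flow, then pick the current detection frame with
    the largest IoU against the warped region."""
    x1, y1, x2, y2 = [int(round(v)) for v in box_prev]
    ys, xs = np.mgrid[y1:y2, x1:x2]
    xs_new = xs + flow[ys, xs, 0]            # horizontal displacement
    ys_new = ys + flow[ys, xs, 1]            # vertical displacement
    warped = (xs_new.min(), ys_new.min(), xs_new.max(), ys_new.max())
    scores = [iou(warped, b) for b in boxes_cur]
    best = int(np.argmax(scores))
    return best, scores[best]
```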
  • Optionally, in some implementations, the third determination module 707 includes: a fourth acquisition unit 7071, configured to acquire the vector formed from the third coordinate to the first coordinate; and a third determination unit 7072, configured to determine the movement direction of the first target within the above time range based on the direction of the vector obtained by the fourth acquisition unit 7071, and to determine the movement speed of the first target within the above time range based on the norm of the vector and the above time range.
  • the detection module 701 may also be configured to perform target detection on the second image to obtain a detection frame of the second target.
  • the first obtaining module 702 may also be configured to obtain depth information of the second image in the second camera coordinate system.
  • The second determination module 704 may also be configured to determine the second coordinate of the second target in the second camera coordinate system, based on the position in the image coordinate system of the detection frame of the second target obtained by the detection module 701 and the depth information of the detection frame of the second target determined by the first determination module 703.
  • Optionally, the apparatus for detecting motion information of a target in the above embodiment may further include a storage module 710, configured to store the second coordinate of the second target determined by the second determination module 704.
  • Optionally, the first image may also be used as the new second image, and a third image located after the first image in the image sequence may be used as the new first image; each module in the apparatus for detecting motion information of a target then performs the corresponding operations to determine the motion information of the target in the third image within the time range from the capture time of the first image to the capture time of the third image. A loop sketch of this sliding processing is given below.
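  • A minimal sketch of sliding over the image sequence (the per-pair detection is passed in as a callable, since its implementation is described in the embodiments above; the frame-spacing parameter is hypothetical):

```python
def process_sequence(frames, detect_motion, preset_gap=0):
    """Each processed first image becomes the new second image; a later frame
    becomes the new first image. detect_motion(second, first) stands in for
    the per-pair motion information detection described above."""
    step = preset_gap + 1                 # spacing between second and first image
    for i in range(step, len(frames), step):
        second_image, first_image = frames[i - step], frames[i]
        yield detect_motion(second_image, first_image)
```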
  • FIG. 12 is a schematic structural diagram of an apparatus for controlling a traveling object based on motion information of a target provided by an exemplary embodiment of the present disclosure.
  • The device for controlling the traveling object based on the motion information of the target may be installed on traveling objects such as vehicles, robots, and toy cars, to control the traveling object based on the motion information of the target.
  • As shown in FIG. 12, the device for controlling the traveling object based on the motion information of the target includes a camera device 801, a motion information detection device 802, and a control device 803, wherein:
  • the camera device 801 is arranged on the driving object, and is used for collecting an image sequence of a scene outside the driving object during the driving process of the driving object.
  • The motion information detection device 802 is configured to determine the motion information of targets in the scene outside the driving object, using at least one frame in the above image sequence as the first image and at least one frame in the above image sequence that is located before the first image and separated from the first image by a preset number of frames as the second image.
  • the motion information detection apparatus 802 may be specifically implemented by the apparatus for detecting motion information of a target according to any of the embodiments in FIG. 10 to FIG. 11 .
  • the control device 803 is configured to generate a control instruction for controlling the traveling state of the traveling object according to the motion information of the target detected by the motion information detection device 802 .
  • Based on this embodiment, during the driving process of the driving object, the camera device on the driving object collects an image sequence of the scene outside the driving object; at least one frame in the image sequence is used as the first image, and at least one frame located before the first image and separated from it by a preset number of frames is used as the second image. The motion information of targets in the driving scene is determined by the method for detecting motion information of a target described in any embodiment of the present disclosure, and a control instruction for controlling the driving state of the driving object is then generated according to that motion information. In this way, computer vision technology is used to detect the motion information of targets in the driving scene and to realize intelligent driving control of the driving object, which helps meet the need for real-time intelligent driving control of the driving object in unmanned-driving scenarios, so as to ensure the safe driving of the driving object.
  • The control instructions may include, but are not limited to, at least one of the following: a control instruction for maintaining the motion speed, a control instruction for adjusting the motion speed, a control instruction for maintaining the motion direction, a control instruction for adjusting the motion direction, a control instruction for an early warning, a control instruction for switching the driving mode, and so on.
  • FIG. 13 illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.
  • the electronic device includes one or more processors 11 and memory 12 .
  • the processor 11 may be a central processing unit (Central Processing Unit, CPU) or other form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 10 to perform desired functions.
  • Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (Random Access Memory, RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, a read-only memory (Read-Only Memory, ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 11 may run the program instructions to implement the method for detecting motion information of a target or the method for controlling a traveling object based on the motion information of the target according to the various embodiments of the present disclosure described above, and/or other desired functions.
  • Various contents such as depth information of an image, depth information of a detection frame of a target, and pose change information of a camera can also be stored in the computer-readable storage medium.
  • the electronic device 10 may also include an input device 13 and an output device 14 interconnected by a bus system and/or other form of connection mechanism (not shown).
  • the input device 13 may be the aforementioned microphone or microphone array, or the input device 13 may be a communication network connector.
  • the input device 13 may also include, for example, a keyboard, a mouse, and the like.
  • the output device 14 can output various information to the outside, including the determined motion information of the first target within a time range corresponding to the time of collection of the second image to the time of collection of the first image.
  • the output devices 14 may include, for example, displays, speakers, printers, and communication networks and their connected remote output devices, among others.
  • the electronic device 10 may also include any other suitable components according to the specific application.
  • In addition to the above methods and devices, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the method for detecting motion information of a target or the method for controlling a traveling object based on the motion information of the target according to the various embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
  • The computer program product may be written with program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having computer program instructions stored thereon that, when executed by a processor, cause the processor to perform the steps of the method for detecting motion information of a target or the method for controlling a traveling object based on the motion information of the target according to the various embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
  • the computer-readable storage medium may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may include, for example, but not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or a combination of any of the above.
  • readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM) or flash memory), optical fiber, portable compact disk read-only memory (Compact Disc Read-Only Memory, CD-ROM), optical storage devices, magnetic storage devices, Or any suitable combination of the above.
  • the methods and apparatus of the present disclosure may be implemented in many ways.
  • the methods and apparatus of the present disclosure may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described order of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise.
  • the present disclosure can also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing methods according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
  • each component or each step may be decomposed and/or recombined. These disaggregations and/or recombinations should be considered equivalents of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

公开了一种对目标的运动信息进行检测的方法和装置、设备和介质,其中,运动信息检测方法包括:对第一图像进行目标检测,得到第一目标的检测框;获取第一图像在对应的第一相机坐标系中的深度信息并由此确定第一目标的检测框的深度信息,基于第一目标的检测框在图像坐标系中的位置和深度信息确定第一目标在第一相机坐标系中的第一坐标;根据摄像装置的位姿变化信息,将第二目标在第二图像对应的第二相机坐标系中的第二坐标转换到第一相机坐标系中的第三坐标;基于第一坐标和第三坐标确定第一目标的运动信息。本公开实施例提供的技术方案,可以避免大量的计算处理,提高处理效率。

Description

对目标的运动信息进行检测的方法和装置、设备和介质 技术领域
本公开涉及计算机视觉技术,尤其是一种对目标的运动信息进行检测的方法和装置、基于目标的运动信息控制行驶对象的方法和装置、电子设备及存储介质。
背景技术
物体的运动速度和方向估计是无人驾驶、安防监控、场景理解等领域的研究重点。在无人驾驶、安防监控、场景理解等领域中,需要估计出场景中所有物体的运动速度和方向并提供给决策层,以便决策层进行相应决策。例如,在无人驾驶系统中,在感知到处于道路旁边的运动物体(如人或者动物等)向道路中央靠近时,决策层可以控制车辆减速行驶,甚至停车,以保障车辆的安全行驶。
目前,在无人驾驶、安防监控、场景理解等场景中,大多采用激光雷达进行数据采集,通过高频率的发射激光束,然后根据激光束的发出时间和接收时间来计算与目标点之间的距离,得到点云数据,然后在某个时间范围对应的两个时刻采集得到的点云数据上进行目标检测和目标追踪,再计算目标在该时间范围内的运动速度和方向。
发明内容
为了解决上述技术问题,提出了本公开。本公开的实施例提供了一种对目标的运动信息进行检测的方法和装置、基于目标的运动信息控制行驶对象的方法和装置、电子设备及存储介质。
根据本公开实施例的一个方面,提供了一种对目标的运动信息进行检测的方法,包括:
对第一图像进行目标检测,得到第一目标的检测框,所述第一图像为行驶对象上的摄像装置在所述行驶对象行驶过程中采集的所述行驶对象外场景的图像;
获取所述第一图像在对应的第一相机坐标系中的深度信息;
根据所述第一图像的深度信息,确定所述第一目标的检测框的深度信息,并基于所述第一目标的检测框在图像坐标系中的位置和所述第一目标的检测框的深度信息,确定所述第一目标在所述第一相机坐标系中的第一坐标;
获取摄像装置从采集第二图像到采集所述第一图像的位姿变化信息;其中,所述第二图像为所述第一图像所在图像序列中时序位于所述第一图像之前、且与所述第一图像间隔预设帧数的图像;
根据所述位姿变化信息,将第二目标在所述第二图像对应的第二相机坐标系中的第二坐标转换到所述第一相机坐标系中的第三坐标;其中,所述第二目标为所述第一目标对应的第二图像中的目标;
基于所述第一坐标和所述第三坐标,确定所述第一目标从所述第二图像的采集时刻到所述第一图像的采集时刻对应时间范围内的运动信息。
根据本公开实施例的另一个方面,提供了一种智能驾驶控制方法,包括:
在行驶对象行驶过程中,通过所述行驶对象上的摄像装置采集所述行驶对象外场景的图像序列;
以所述图像序列中的至少一频帧图像作为第一图像、以所述图像序列中位于所述第一图像之前、且与所述第一图像间隔预设帧数的至少一帧图像作为第二图像,利用本公开任一实施例所述对目标的运动信息进行检测的方法,确定所述场景中目标的运动信息;
根据所述目标的运动信息生成用于控制所述行驶对象行驶状态的控制指令。
根据本公开实施例的又一个方面,提供了一种对目标的运动信息进行检测的装置,包括:
检测模块,用于对第一图像进行目标检测,得到第一目标的检测框,所述第一图像为行驶对象上的摄像装置在所述行驶对象行驶过程中采集的所述行驶对象外场景的图像;
第一获取模块,用于获取所述第一图像在对应的第一相机坐标系中的深度信息;
第一确定模块,用于根据所述第一获取模块获取的所述第一图像的深度信息,确定所述第一目标的检测框的深度信息;
第二确定模块,用于基于所述检测模块得到的所述第一目标的检测框在图像坐标系中的位置和所述第一确定模块确定的所述第一目标的检测框的深 度信息,确定所述第一目标在所述第一相机坐标系中的第一坐标;
第二获取模块,用于获取摄像装置从采集第二图像到采集所述第一图像的位姿变化信息;其中,所述第二图像为所述第一图像所在图像序列中时序位于所述第一图像之前、且与所述第一图像间隔预设帧数的图像;
转换模块,用于根据所述第二获取模块获取的所述位姿变化信息,将第二目标在所述第二图像对应的第二相机坐标系中的第二坐标转换到所述第一相机坐标系中的第三坐标;其中,所述第二目标为所述第一目标对应的第二图像中的目标;
第三确定模块,用于基于所述第二确定模块确定的所述第一坐标和所述转换模块转换到的所述第三坐标,确定所述第一目标从所述第二图像的采集时刻到所述第一图像的采集时刻对应时间范围内的运动信息。
根据本公开实施例的再一个方面,提供了一种智能驾驶控制装置,包括:
摄像装置,设置于行驶对象上,用于在行驶对象行驶过程中,采集所述行驶对象外场景的图像序列;
运动信息检测装置,用于以所述图像序列中的至少一频帧图像作为第一图像、以所述图像序列中位于所述第一图像之前、且与所述第一图像间隔预设帧数的至少一帧图像作为第二图像,确定所述场景中目标的运动信息;所述运动信息检测装置包括本公开任一实施例所述对目标的运动信息进行检测的装置;
控制装置,用于根据所述运动信息检测装置检测到的所述目标的运动信息,生成用于控制所述行驶对象行驶状态的控制指令。
根据本公开实施例的又一个方面,提供了一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于执行本公开上述任一实施例所述的对目标的运动信息进行检测的方法或者基于目标的运动信息控制行驶对象的方法。
根据本公开实施例的再一个方面,提供了一种电子设备,所述电子设备包括:
处理器;
用于存储所述处理器可执行指令的存储器;
所述处理器,用于从所述存储器中读取所述可执行指令,并执行所述指令以实现本公开上述任一实施例所述的对目标的运动信息进行检测的方法或 者基于目标的运动信息控制行驶对象的方法。
基于本公开上述实施例提供的对目标的运动信息进行检测的方法和装置、电子设备及存储介质,通过行驶对象上的摄像装置在行驶对象行驶过程中采集该行驶对象外场景的图像,对采集到的第一图像进行目标检测,得到第一目标的检测框,获取第一图像在对应的第一相机坐标系中的深度信息,并根据该第一图像的深度信息确定第一目标的检测框的深度信息,然后,基于第一目标的检测框在图像坐标系中的位置和第一目标的检测框的深度信息,确定第一目标在第一相机坐标系中的第一坐标;获取摄像装置从采集第二图像到采集第一图像的位姿变化信息,其中的第二图像为第一图像所在图像序列中时序位于第一图像之前、且与第一图像间隔预设帧数的图像,然后根据该位姿变化信息,以第一目标对应的第二图像中的目标作为第二目标,将第二目标在第二图像对应的第二相机坐标系中的第二坐标转换到第一相机坐标系中的第三坐标,进而,基于第一坐标和第三坐标,确定第一目标从第二图像的采集时刻到第一图像的采集时刻对应时间范围内的运动信息。本公开实施例利用计算机视觉技术,基于驾驶场景图像序列确定驾驶场景中目标的运动信息,无需借助于激光雷达,相比于采用激光雷
基于本公开上述实施例提供的基于目标的运动信息控制行驶对象的方法和装置、电子设备及存储介质,在行驶对象行驶过程中,通过行驶对象上的摄像装置采集行驶对象外场景的图像序列,以图像序列中的至少一频帧图像作为第一图像、以图像序列中位于第一图像之前、且与第一图像间隔预设帧数的至少一帧图像作为第二图像,利用本公开任一实施例所述对目标的运动信息进行检测的方法确定驾驶场景中目标的运动信息,进而根据该目标的运动信息生成用于控制行驶对象行驶状态的控制指令,从而实现了利用计算机视觉技术检测驾驶场景中目标的运动信息、对行驶对象的智能驾驶控制,有利于满足无人驾驶场景中对行驶对象的实时智能驾驶控制,以保障行驶对象的安全行驶。
下面通过附图和实施例,对本公开的技术方案做进一步的详细描述。
附图说明
通过结合附图对本公开实施例进行更详细的描述,本公开的上述以及其他目的、特征和优势将变得更加明显。附图用来提供对本公开实施例的进一 步理解,并且构成说明书的一部分,与本公开实施例一起用于解释本公开,并不构成对本公开的限制。在附图中,相同的参考标号通常代表相同部件或步骤。
图1是本公开所适用的场景图。
图2是本公开一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。
图3是本公开另一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。
图4是本公开又一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。
图5是本公开再一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。
图6是本公开还一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。
图7是本公开又一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。
图8是本公开一示例性实施例提供的对目标的运动信息进行检测的方法的一个应用流程示意图。
图9是本公开一示例性实施例提供的基于目标的运动信息控制行驶对象的方法的流程示意图。
图10是本公开一示例性实施例提供的对目标的运动信息进行检测的装置的结构示意图。
图11是本公开另一示例性实施例提供的对目标的运动信息进行检测的装置的结构示意图。
图12是本公开一示例性实施例提供的基于目标的运动信息控制行驶对象的装置的结构示意图。
图13是本公开一示例性实施例提供的电子设备的结构图。
具体实施方式
下面,将参考附图详细地描述根据本公开的示例实施例。显然,所描述的实施例仅仅是本公开的一部分实施例,而不是本公开的全部实施例,应理 解,本公开不受这里描述的示例实施例的限制。
应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
本领域技术人员可以理解,本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。
还应理解,在本公开实施例中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。
还应理解,对于本公开实施例中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。
另外,本公开中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本公开中字符“/”,一般表示前后关联对象是一种“或”的关系。
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
本公开实施例可以应用于终端设备、计算机系统、服务器等电子设备,其可与众多其它通用或专用计算系统环境或配置一起操作。适于与终端设备、计算机系统、服务器等电子设备一起使用的众所周知的终端设备、计算系统、环境和/或配置的例子包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统、大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端设备、计算机系统、服务器等电子设备可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
申请概述
在实现本公开的过程中,本公开发明人通过研究发现,激光雷达可以获得一个瞬时场景中若干个点的深度值,但无法直接得到某个物体的运动速度和方向等信息,若想获知物体在某个时间范围内的运动速度和方向,还需要在该时间范围对应的两个时刻采集得到的点云数据上进行目标检测和目标追踪,再计算目标在该时间范围内的运动速度和方向,需要大量的计算处理,所需时间较长,效率较低,无法满足无人驾驶等对实时性要求较高的场景需求。
本公开实施例提供了一种利用计算机视觉技术,基于驾驶场景图像序列获取驾驶场景中目标的运动信息的技术方案,通过行驶对象上的摄像装置在行驶对象行驶过程中采集该行驶对象外场景的图像,对采集到的图像序列中间隔预设帧数的第一图像和第二图像进行目标检测和目标跟踪,同一目标在第一图像对应的第一相机坐标系中的第一坐标和在第二图像对应的第二相机坐标系中的第二坐标转换到第一相机坐标系得到的第三坐标,再基于第一坐标和第三坐标确定该目标在第一图像和第二图像的采集时刻对应时间范围内的运动信息。本公开实施例无需借助于激光雷达,可以避免大量的计算处理,节省处理时间,提高处理效率,有利于满足无人驾驶等对实时性要求较高的场景需求。
基于本公开上述实施例提供的上述技术方案检测到驾驶场景中目标的运动信息后,可以根据目标的运动信息生成用于控制行驶对象行驶状态的控制指令,从而实现了利用计算机视觉技术检测驾驶场景中目标的运动信息、对行驶对象的智能驾驶控制,有利于满足无人驾驶场景中对行驶对象的实时智能驾驶控制,以保障行驶对象的安全行驶。
示例性系统
本公开实施例可以应用于行驶对象、机器人、玩具车等行驶对象的智能驾驶控制场景,通过检测行驶对象的驾驶场景中目标的运动信息,生成用于控制行驶对象行驶状态的控制指令,对行驶对象的行驶状态进行控制。
图1是本公开所适用的一个场景图。如图1所示,本公开实施例应用于行驶对象的智能驾驶控制场景时,由行驶对象上的图像采集模块101(例如摄像头等摄像装置)采集得到图像序列输入本公开实施例的运动信息检测装置102;运动信息检测装置102,以该图像序列中的每一帧图像或者间隔若干帧选取的一帧图像作为第二图像,以该图像序列中时序位于第二图像之后、与第二图像间隔一定帧数的一帧图像作为第一图像,对第一图像进行目标检测,得到第一目标的检测框;获取第一图像在对应的第一相机坐标系中的深度信息,并根据该第一图像的深度信息确定第一目标的检测框的深度信息;基于第一目标的检测框在图像坐标系中的位置和第一目标的检测框的深度信息确定第一目标在第一相机坐标系中的第一坐标;根据摄像装置从采集第二图像到采集第一图像的位姿变化信息,将第二目标在第二图像对应的第二相机坐标系中的第二坐标转换到第一相机坐标系中的第三坐标;进而,基于第一坐标和第三坐标,确定第一目标从第二图像的采集时刻到第一图像的采集时刻对应时间范围内的运动信息并输出;控制装置103,基于运动信息检测装置102输出的第一目标在对应时间范围内的运动信息,控制车辆、机器人、玩具车等行驶对象的行驶状态。例如,在控制行驶对象行驶状态的应用场景中,若基于第一目标的运动信息(该运动信息可包括运动速度和运动方向)和行驶对象的行驶状态(该行驶状态可包括行驶速度和行驶方向),确定行驶对象与第一目标在未来5秒钟内可能发生碰撞,则控制装置103生成用于控制行驶对象减速行驶的控制指令并输出给该行驶对象,以控制当前行驶对象减速行驶,避免行驶对象与第一目标发生碰撞。本公开实施例对具体的应用场景不做限制。
示例性方法
图2是本公开一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。本实施例可应用在电子设备上,也可以应用于车辆、机器人、 玩具车等行驶对象上。如图2所示,该实施例的对目标的运动信息进行检测的方法包括如下步骤:
步骤201,对第一图像进行目标检测,得到第一目标的检测框。
其中,第一图像为行驶对象上的摄像装置在该行驶对象行驶过程中采集的该行驶对象外场景的图像。该第一图像可以为RGB(红绿蓝)图像,也可以为灰度图像,本公开实施例对第一图像不做限制。
可选地,本公开实施例中的目标,可以是行驶对象外场景中任意感兴趣的目标,例如运动或静止的人、小动物、物体等,其中的物体例如可以是车辆、道路两侧的建筑物、绿植、道路标线、交通交通信号灯等,本公开实施例对需要检测的目标不做限定,可以根据实际需求确定。
可选地,在其中一些实施方式中,可以采用预设目标检测框架,例如,循环卷积神经网络(Recurrent Neural Network,RCNN)、加速循环卷积神经网络(Fast RCNN)、掩模(Mask RCNN)等基于区域的算法,只需瞄一眼(You Only Look Once,YOLO)等基于回归的算法,Faster RCNN和YOLO结合得到的单步多框检测(Single Shot MultiBox Detector,SSD)算法,等等,对第一图像进行目标检测。本公开实施例对目标检测的具体方式、采用的目标检测框架不做限制。
本公开实施例中,第一目标为第一图像中的目标,可以为一个目标,也可以为多个目标,多个目标可以为相同类型的目标(例如都为人),也可以为不同类型的目标(例如包括人、车辆等)。相应地,对第一图像进行目标检测,得到第一目标的检测框可以为一个,也可以为多个。本公开实施例对第一目标的数量和类型不做限制。
本公开实施例中的检测框为目标的边界框(Bounding Box)。可选地,可以使用思维向量(x,y,w,h)来表示每个检测框,其中,(x,y)表示检测框在图像坐标系中的坐标,可以是检测框的中心点或预设任一顶点在图像坐标系中的坐标;w、h分别表示检测框的宽和高。
步骤202,获取第一图像在对应的第一相机坐标系中的深度信息。
本公开实施例中,深度(Depth)信息用于场景中各点(分别对应于图像中的各像素点)与摄像装置之间的距离信息,在其中一些实施方式中,深度信息具体可以表示为深度图。深度图是包含场景中各点与摄像装置之间的距离信息的图像或图像通道。深度图类似于灰度图像,它的每个像素值是摄像 装置距离场景中一个点的实际距离(L),每个像素值占用一个短(short)长度来存储摄像装置到对应的一个点的距离。
可选地,在其中一些实施方式中,可以通过一个神经网络,来获取第一图像在对应的第一相机坐标系中的深度信息。其中的神经网络为预先训练好的神经网络,可以基于输入的图像进行深度预测,并输出该图像中场景的深度信息。例如,可以采用一个端到端的U-型深度神经网络,基于深度学习的单目深度预测方法,对输入的第一图像进行深度预测,得到第一图像在对应的第一相机坐标系中的深度信息。
本公开实施例中,相机坐标系是以摄像装置的聚焦中心为原点,以光轴(即深度方向)为Z轴建立的三维(3D)坐标系。在行驶对象行驶过程中,行驶对象上的摄像装置处于运动状态下,摄像装置的位姿也处于变化状态中,相应建立的3D坐标系也不相同,第一图像对应的第一相机坐标系即摄像装置采集第一图像时的3D坐标系。
可选地,步骤202与步骤201可以同时执行,也可以以任意时间顺序执行,本公开实施例对此不做限制。
步骤203,根据第一图像的深度信息,确定第一目标的检测框的深度信息,并基于第一目标的检测框在图像坐标系中的位置和第一目标的检测框的深度信息,确定第一目标在第一相机坐标系中的第一坐标。
其中,在步骤203中,第一图像的深度信息,指的是通过步骤202确定的第一图像在对应的第一相机坐标系中的深度信息,第一目标的检测框的深度信息指的是第一目标的检测框在第一相机坐标系中的深度信息。
步骤204,获取摄像装置从采集第二图像到采集第一图像的位姿变化信息。
其中,第二图像为第一图像所在图像序列中时序位于第一图像之前、且与第一图像间隔预设帧数的图像。
本公开实施例中,预设帧数的具体取值可以根据实际需求(例如具体场景、行驶对象的运动状态、摄像装置的图像采集频率等)设置,可以为0、1、2、3等,预设帧数为0时,第二图像和第一图像为相邻的两帧图像。例如,在高速驾驶场景,行驶对象的运动速度较大和/或摄像装置的图像采集频率较高时,预设帧数的取值较小,以避免第二图像中的目标在摄像装置采集第一图像时已经移动至摄像装置的拍摄范围之外、从而无法出现在第一图像中,实现对行驶对象外场景中目标的运动信息的有效检测;而在拥挤的城市道路 驾驶场景,行驶对象的运动速度较小和/或摄像装置的图像采集频率较低时,预设帧数的取值较大,这样,既可以检测到同一目标在第二图像的采集时刻到第一图像的采集时刻对应时间范围内的运动信息,还可以避免频繁执行运动信息检测方法所需占用的计算资源和存储资源,提高资源利用率。
可选的,本公开实施例中的位姿变化信息是指:摄像装置在采集第一图像时的位姿,与在采集第二图像时的位姿之间的差异。该位姿变化信息为基于3D空间的位姿变化信息,具体可以表示为矩阵,因此可以称为位姿变化矩阵。该位姿变化信息可以包括:摄像装置的平移信息和旋转信息。其中,摄像装置的平移信息可以包括:摄像装置分别在3D坐标系中三个坐标轴XYZ上的位移量。其中的摄像装置的旋转信息可以为:基于俯仰(Roll)、偏航(Yaw)和翻滚(Pitch)的旋转向量,其包括基于Roll、Yaw和Pitch这三个旋转方向的旋转分量向量,其中,Roll、Yaw和Pitch分别表示摄像装置绕3D坐标系中三个坐标轴XYZ的旋转。
可选的,在其中一些实施方式中,可以利用视觉技术,来获取摄像装置从采集第二图像到第一图像时的位姿变化信息,例如,利用即时定位与地图构建(Simultaneous Localization And Mapping,SLAM)方式,获取位姿变化信息。例如,可以将第一图像(RGB图像)和第一图像的深度信息以及第二图像(RGB图像)输入开源定向快速和旋转摘要(Oriented FAST and Rotated BRIEF,ORB)-SLAM框架的红绿蓝深度(Red Green Blue Depth,RGBD)模型,由RGBD模型输出位姿变化信息。另外,本公开实施例也可以采用其他方式,例如,利用全球定位系统(Global Positioning System,GPS)和角速度传感器,获取摄像装置从采集第二图像到第一图像时的位姿变化信息。本公开实施例对获取摄像装置从采集第二图像到第一图像时的位姿变化信息的具体方式不做限制。
步骤205,根据摄像装置从采集第二图像到采集第一图像的位姿变化信息,将第二目标在第二图像对应的第二相机坐标系中的第二坐标转换到第一相机坐标系中的第三坐标。
其中,第二目标为第一目标对应的第二图像中的目标,与第一目标相应地,第二目标可以为一个目标,也可以为多个目标,多个目标可以为相同类型的目标(例如都为人),也可以为不同类型的目标(例如包括人、车辆等)。
本公开实施例中,第二图像对应的第二相机坐标系即摄像装置采集第二 图像时的3D坐标系。
可选地,步骤204~步骤205与步骤201~步骤203可以同时执行,也可以以任意时间顺序执行,本公开实施例对此不做限制。
步骤206,基于第一坐标和第三坐标,确定第一目标从第二图像的采集时刻到第一图像的采集时刻对应时间范围内的运动信息。
本公开实施例中,第一目标的运动信息可以包括第一目标在对应时间范围内的运动速度和运动方向。
本实施例中,通过行驶对象上的摄像装置在行驶对象行驶过程中采集该行驶对象外场景的图像,对采集到的第一图像进行目标检测,得到第一目标的检测框,获取第一图像在对应的第一相机坐标系中的深度信息,并根据该第一图像的深度信息确定第一目标的检测框的深度信息,然后,基于第一目标的检测框在图像坐标系中的位置和第一目标的检测框的深度信息,确定第一目标在第一相机坐标系中的第一坐标;获取摄像装置从采集第二图像到采集第一图像的位姿变化信息,其中的第二图像为第一图像所在图像序列中时序位于第一图像之前、且与第一图像间隔预设帧数的图像,然后根据该位姿变化信息,以第一目标对应的第二图像中的目标作为第二目标,将第二目标在第二图像对应的第二相机坐标系中的第二坐标转换到第一相机坐标系中的第三坐标,进而,基于第一坐标和第三坐标,确定第一目标从第二图像的采集时刻到第一图像的采集时刻对应时间范围内的运动信息。本公开实施例利用计算机视觉技术,基于驾驶场景图像序列确定驾驶场景中目标的运动信息,无需借助于激光雷达,相比于采用激光雷达获取目标运动速度和方向的方式,由于无需通过高频率的发射激光束构建点云数据、在两个点云数据上进行目标检测和目标追踪、计算目标的运动速度和方向,可以避免大量的计算处理,节省处理时间,提高处理效率,有利于满足无人驾驶等对实时性要求较高的场景需求。
图3是本公开另一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。如图3所示,在上述图2所示实施例的基础上,步骤203可包括如下步骤:
步骤2031,从第一图像的深度信息中获取第一目标的检测框中各像素点的深度值。
第一图像的深度信息包括第一图像中各像素点的深度值,可以从第一图 像的深度信息中查询第一目标的检测框中各像素点的深度值。
步骤2032,采用预设方式,基于第一目标的检测框中各像素点的深度值,确定第一目标的检测框的深度信息。
第一目标的检测框中包括多个像素点,每个像素都有各自的深度值,基于本实施例,综合基于第一目标的检测框中各像素点的深度值确定第一目标的检测框的深度信息,以便根据该深度信息和第一目标的检测框在图像坐标系中的位置准确确定第一目标在第一相机坐标系中的第一坐标,可以提高第一目标在第一相机坐标系中坐标的准确性。
例如,在其中一些实施方式中,可以选取第一目标的检测框中各像素点的深度值中,出现频率最高的深度值作为第一目标的检测框的深度信息。
在实现本公开发明的过程中,发明人通过研究发现,在实际应用中,由于车辆行驶过程中的振动、光线等原因,可能影响摄像装置采集的图像质量,导致图像中存在一些噪声点,无法准确获取这些噪声点的深度值,导致深度信息中这些噪声点的深度值过大或过小。而场景中同一个目标上各点与摄像装置之间的距离相近,对应像素的深度值也相近,本实施例中,选取第一目标的检测框中各像素点的深度值中,出现频率最高的深度值即最多像素点对应的深度值,可以忽略个别差异较大的像素点的深度值,避免第一图像中噪声像素点的深度值对整个第一目标的检测框的深度信息的影响,提高第一目标的检测框的深度信息的准确性。
或者,在另一些实施方式中,也可以确定第一目标的检测框中各像素点的深度值中,分别处于预设各深度值范围内的像素点的数量,然后,基于深度值处于同一深度值范围内的像素点的数量最多的深度值范围,确定第一目标的检测框的深度信息,例如,以该深度值处于同一深度值范围内的像素点的数量最多的深度值范围的最大值、最小值、最大值和最小值的平均值、或者中值等,作为第一目标的检测框的深度值。
本实施例中,可以预先划分各深度值范围,统计第一目标的检测框中各像素点的深度值中分别处于预设各深度值范围内的像素点的数量,处于某一深度值范围内的像素点的数量越多,对应的第一目标表面上的点越多,基于深度值中处于某一深度值范围内的像素点的数量最多的深度值范围确定第一目标的检测框的深度信息,可以忽略部分差异较大的像素点的深度值,避免第一图像中噪声像素点的深度值对整个第一目标的检测框的深度信息的影响, 进而提高第一目标的检测框的深度信息的准确性。
或者,在又一些实施方式中,还可以获取第一目标的检测框中各像素点的深度值的平均值,作为第一目标的检测框的深度信息。
本实施例中,获取第一目标的检测框中各像素点的深度值的平均值作为第一目标的检测框的深度信息,可以快速确定第一目标的检测框的深度信息,并降低个别差异较大的像素点的深度值对整个第一目标的检测框的深度信息的影响,进而提高第一目标的检测框的深度信息的准确性。
图4是本公开又一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。如图4所示,在上述图2或图3所示实施例的基础上,在步骤205之前,还可包括如下步骤:
步骤301,确定第一图像中的至少一个目标与第二图像中的至少一个目标之间的对应关系。
其中,第一图像中的至少一个目标包括上述第一目标。
本公开实施例中,第一图像中的至少一个目标、第二图像中的至少一个目标,可以是行驶对象外场景中任意感兴趣的目标,例如人、车辆、建筑物等各种类型的目标。其中的第一目标为第一图像中的至少一个目标中的一个目标或者多个目标,第二目标为第二图像中的至少一个目标中的一个目标或者多个目标。
确定第一图像中的至少一个目标与第二图像中的至少一个目标之间的对应关系,即确定第一图像和第二图像中的目标之间,哪些目标属于同一个目标,在第一图像和第二图像中属于同一个目标的两个目标之间建立对应关系。例如,第一目标为第一图像中需要进行运动信息检测的目标,第二目标即第二图像中与第一目标属于同一个目标的目标。
步骤302,根据上述对应关系,确定第一目标对应的第二图像中的目标作为第二目标。
通过步骤301,确定第一图像中的至少一个目标与第二图像中的至少一个目标之间的对应关系后,基于该对应关系,可以确定第一图像中的第一目标对应的第二图像中的目标,即为第二目标。
基于本实施例,可以针对两个图像,确定两个图像中目标之间的对应关系,这样,便可以直接根据对应关系确定第一目标对应的第二图像中的第二目标,从而确定第二目标的效率。
可选地,在其中一些实施方式中,步骤301中,可以对第二图像中的至少一个目标的检测框进行跟踪,得到第一图像中的至少一个目标与第二图像中的至少一个目标之间的对应关系。
基于本实施例,可以通过对目标的检测框进行跟踪的方式,得到不同图像中目标之间的对应关系。
图5是本公开再一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。如图5所示,在另一些实施方式中,步骤301可包括如下步骤:
步骤3011,获取第二图像到第一图像的光流信息。
本公开实施例中,光流信息用于表示视频或图像序列中图像之间像素点的运动或时序信息。第二图像到第一图像的光流信息,即第二图像到第一图像中的像素的二维运动场,用于表示第二图像中的像素点移动到第一图像中的移动情况。在其中一些实施方式中,可以利用视觉技术,例如,利用开源计算机视觉库(Open Source Computer Vision Library,OpenCV)方式,例如,将第二图像和第一图像输入基于OpenCV的模型中,由该模型输出第二图像和第一图像之间的光流信息。
步骤3012,分别针对第二图像中的至少一个目标中各目标的检测框,基于光流信息和第二图像中的目标的检测框,确定第二图像中的目标的检测框中像素点转移到第一图像中的位置。
步骤3013,获取第二图像中的目标的检测框中像素点转移到第一图像中的位置的集合与第一图像中的各检测框之间的交并比(Intersection over Union,IoU),即该集合与第一图像中的各检测框之间的覆盖比例。
可选地,在其中一些实施方式中,可以获取上述集合与第一图像中的各检测框之间的交集I、上述集合与第一图像中的各检测框之间的并集U,分别计算上述集合与第一图像中的各检测框之间的交集I与并集U之间的比值,作为集合与第一图像中的各检测框之间的覆盖比例。
步骤3014,建立第二图像中的目标与第一图像中交并比最大的检测框对应目标之间的对应关系,即以该第一图像中交并比最大的检测框对应目标作为该第二图像中的目标对应的目标。
基于本实施例,基于两个图像之间的光流信息确定第二图像中某一目标的检测框中像素点转移到第一图像中的位置的集合,分别获取该集合与第一 图像中的各检测框之间的交并比,交并比越大,说明第一图像中的该检测框与上述集合中像素的重复比例越大,第一图像中各检测框中与该集合交并比最大的检测框为第二图像中该目标的检测框的概率越大,通过两个图像之间的光流信息和第二图像中的目标的检测框中像素点转移到第一图像中的位置的集合与第一图像中的各检测框之间的交并比来确定两个图像中目标之间的对应关系,可以较准确、客观的确定两个图像中目标之间的对应关系。
图6是本公开还一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。如图6所示,在上述图2或图3所示实施例的基础上,步骤206可包括如下步骤:
步骤2061,获取第三坐标到第一坐标形成的向量。
其中,第三坐标到第一坐标形成的向量即从第三坐标到第一坐标形成的位移(displacement)向量,即从第三坐标到第一坐标形成的有向线段,该位移向量的大小,是从第三坐标到第一坐标的直线距离,该位移向量的方向是从第三坐标指向第一坐标。
步骤2062,基于第三坐标到第一坐标形成的向量的方向,确定第一目标在从第二图像的采集时刻到第一图像的采集时刻对应时间范围内的运动方向,基于第三坐标到第一坐标形成的向量的范数与上述时间范围确定第一目标在上述时间范围内的运动速度,例如,可以获取第三坐标到第一坐标形成的向量的范数与上述时间范围的比值,作为第一目标在上述时间范围内的运动速度。其中,第一目标在上述时间范围内的运动方向和运动速度,构成第一目标在上述时间范围内的运动信息。
基于本实施例,可以基于第三坐标到第一坐标形成的向量,准确确定第一目标在上述对应时间范围内的运动方向和运动速度,从而获知第一目标的运动状态。
图7是本公开又一示例性实施例提供的对目标的运动信息进行检测的方法的流程示意图。如图7所示,在上述图2-图6所示实施例的基础上,在步骤205之前,还可包括如下步骤:
步骤401,对第二图像进行目标检测,得到第二目标的检测框。
步骤402,获取第二图像在第二相机坐标系中的深度信息。
另外,在确定第二图像在第二相机坐标系中的深度信息之后,根据该第二图像在第二相机坐标系中的深度信息,确定第二目标的检测框的深度信息。 其中,第二目标的检测框的深度信息,指的是第二目标的检测框在第二相机坐标系中的深度信息。
步骤403,基于第二目标的检测框在图像坐标系中的位置和第二目标的检测框的深度信息,确定第二目标在第二相机坐标系中的第二坐标。
基于本实施例,可以预先针对图像序列中时序位于第一图像之前的第二图像进行目标检测和获取深度信息,并由此确定第二目标在第二相机坐标系中的第二坐标,以便后续直接对该第二目标的第二坐标进行转换处理来确定第一目标在对应时间范围内的运动信息,从而提高场景中目标运动信息的检测效率。
可选地,在其中一些实施方式中,基于上述图7所示实施例,确定第二目标在所述第二相机坐标系中的第二坐标后,还可以存储第二目标的第二坐标,以便后续直接查询使用,从而提高场景中目标运动信息的检测效率。
可选地,还可以以第一图像作为新的第二图像,以图像序列中时序位于第一图像之后的第三图像作为新的第一图像,执行本公开上述任一实施例所述对目标的运动信息进行检测的方法,确定第三图像中目标从上述第一图像的采集时刻到第三图像的采集时刻对应时间范围内的运动信息。
基于本实施例,可以针对图像序列逐帧或间隔若干帧检测图像中目标的运动信息,从而实现在行驶对象的行驶过程中,对行驶对象外场景中目标的运动状态的持续检测,以便根据目标的运动状态控制行驶对象的行驶,保障行驶对象的安全行驶。
图8是本公开一示例性实施例提供的对目标的运动信息进行检测的方法的一个应用流程示意图。以下以一个应用实施例为例,对本公开实施例对目标的运动信息进行检测的方法进行进一步说明。如图8所示,该应用实施例包括:
步骤501,在行驶对象行驶过程中,行驶对象上的摄像装置采集该行驶对象外场景的图像,得到图像序列。
以该图像序列中,t-1时刻采集的图像作为第二图像,表示为I t-1,执行步骤502~步骤505以及步骤507;以该图像序列中,t时刻采集的图像作为第一图像,表示为I t,执行步骤508~步骤511。针对摄像装置执行步骤506。
步骤502,采用预设目标检测框架,对第二图像I t-1进行目标检测,得到第二图像I t-1中目标的检测框,由于检测到的目标的检测框可能为一个 或多个,以检测框集合BBox t-1来表示检测到的第二图像I t-1中目标的检测框,t-1时刻编号为k的目标(以下称为:目标k)的检测框描述为:
bbox_{t-1}^{k} = (x, y, w, h)
其中,(x,y)表示目标k的检测框在图像坐标系中的坐标,w、h分别表示目标k的检测框的宽和高。
步骤503,采用预设深度估计方式,对第二图像I t-1进行深度估计,得到第二图像I t-1对应的深度图D t-1
其中,深度图D t-1中包括第二图像I t-1中不同像素点在t-1时刻对应的第二相机坐标系中的深度值,第二图像I t-1中像素点(i,j)在第二相机坐标系中的深度值可以表示为
d_{t-1}^{(i,j)} = D_{t-1}(i, j)
步骤504,从第二图像I t-1对应的深度图D t-1中获取第二图像I t-1中各目标的检测框中各像素点的深度值,并采用预设方式,基于第二图像I t-1中各目标的检测框中各像素点的深度值,确定第二图像I t-1中各目标的检测框的深度值。
其中,第二图像I t-1中各目标的检测框中各像素点的深度值,指的是第二图像I t-1中各目标的检测框中各像素点在第二相机坐标系中的深度值。
继续以t-1时刻目标k为例，从第二图像I_{t-1}对应的深度图D_{t-1}中，获取目标k的检测框 bbox_{t-1}^{k} 中各像素点的深度值，然后采用本公开上述实施例的方式，基于目标k的检测框 bbox_{t-1}^{k} 中各像素点的深度值，确定目标k的检测框的深度值 d_{t-1}^{k}。
其中,步骤503~步骤504与步骤502可以同时执行,也可以以任意时间顺序执行,本公开实施例对此不做限制。
步骤505,分别针对第二图像I t-1中各目标的检测框,基于各目标的检测框在图像坐标系中的位置和各目标的检测框的深度值,确定各目标在t-1时刻对应的第二相机坐标系中的3D坐标(第二坐标)。
例如,继续以t-1时刻目标k为例,可以通过如下方式得到目标k的检测框在t-1时刻对应的第二相机坐标系中的3D坐标
P_{t-1}^{k}：
P_{t-1}^{k} = d_{t-1}^{k} · K^{-1} · [x, y, 1]^T
其中,K为摄像装置的内参,用于表示摄像装置自身属性,可以预先通过标定获得。
步骤506,获取摄像装置从t-1时刻到t时刻的位姿变化矩阵T t-1→t
其中,步骤506与步骤502~步骤505、以及步骤508~513可以同时执行,也可以以任意时间顺序执行,本公开实施例对此不做限制。
步骤507,根据上述位姿变化矩阵T t-1→t,分别将第二图像I t-1中各目标在第二相机坐标系中的第二坐标转换到第一相机坐标系中的3D坐标(即上述第三坐标)。
例如，继续以t-1时刻目标k为例，可以通过如下方式将第二图像I_{t-1}中目标k的检测框的第二坐标 P_{t-1}^{k} 转换到第三坐标 P'^{k}_{t-1→t}：
P'^{k}_{t-1→t} = T_{t-1→t} · P_{t-1}^{k}
步骤508,采用预设目标检测框架,对第一图像I t进行目标检测,得到第一图像I t中目标(即上述第一目标)的检测框,由于检测到的目标的检测框可能为一个或多个,以检测框集合BBox t来表示第一目标的检测框,t时刻第一目标中编号为k 的目标(以下称为:目标k )的检测框描述为:
bbox_{t}^{k′} = (x, y, w, h)
其中，(x,y)表示目标k′的检测框在图像坐标系中的坐标，w、h分别表示目标k′的检测框的宽和高。
步骤509,采用预设深度估计方式,对第一图像I t进行深度估计,得到第一图像I t对应的深度图D t
其中,深度图D t中包括第一图像I t中不同像素点在t时刻对应的第一相机坐标系中的深度值,第一图像I t中像素点(i,j)在第一相机坐标系中的深度值可以表示为
d_{t}^{(i,j)} = D_{t}(i, j)
步骤510,从第一图像I t对应的深度图D t中获取第一目标的检测框中各像素点的深度值,并采用预设方式,基于第一目标的检测框中各像素点的深度值,确定第一目标的检测框的深度值。
其中,第一目标的检测框的深度值,指的是第一目标的检测框在第一相机坐标系中的深度值。
继续以t时刻目标k′为例，从第一图像I_t对应的深度图D_t中，获取目标k′的检测框 bbox_{t}^{k′} 中各像素点的深度值，然后采用本公开上述实施例的方式，基于目标k′的检测框 bbox_{t}^{k′} 中各像素点的深度值，确定目标k′的检测框的深度值 d_{t}^{k′}。
其中,步骤509~步骤510与步骤508可以同时执行,也可以以任意时间顺序执行,本公开实施例对此不做限制。
步骤511,基于第一目标的检测框在图像坐标系中的位置和第一目标的检测框的深度值,确定第一目标在t时刻对应的第一相机坐标系中的第一坐标。
其中的第一目标可以是一个目标,也可以是多个目标,第一目标是多个目标时,分别针对第一目标中的每个目标,基于每个目标的检测框在图像坐标系中的位置和深度值,确定该目标的检测框在t时刻对应的第一相机坐标系中的3D坐标(即上述第一坐标)。例如,继续以t时刻目标k 为例,可以通过如下方式得到目标k 的检测框在t时刻对应的第一相机坐标系中的3D坐标
P_{t}^{k′}：
P_{t}^{k′} = d_{t}^{k′} · K^{-1} · [x, y, 1]^T
其中,K为摄像装置的内参,用于表示摄像装置自身属性,可以预先通过标定获得。
步骤512，确定第一图像I_t中第一目标和第二图像I_{t-1}中目标之间的对应关系。
步骤513,根据上述对应关系,确定与第一目标对应的第二图像中的目标作为第二目标。
其中的第二目标可以是一个目标,也可以是多个目标。与第一目标相应地,第二目标可以为一个目标,也可以为多个目标,多个目标可以为相同类型的目标(例如都为人),也可以为不同类型的目标(例如包括人、车辆、建筑物等)。
其中,步骤512~步骤513可以通过本公开上述图4~图5任一实施例所述的方式,确定与第一目标对应的第二图像中的第二目标,
其中,步骤512~步骤513在通过步骤502和步骤508之后执行即可,与本应用实施例中的上述其他步骤之间可以同时执行,也可以以任意时间顺序执行,本公开实施例对此不做限制。
步骤514,基于第一目标的第一坐标和对应的第二目标的第三坐标,确定第一目标在从t-1时刻到t时刻对应时间范围Δ t内的运动信息。
其中的第一目标可以是一个目标,也可以是多个目标,第一目标是多个目标时,分别针对每个第一目标,执行该步骤514。
假设第二图像I_{t-1}中的第二目标k与第一图像I_t中的第一目标k′对应，根据第一目标k′在t时刻的第一坐标 P_{t}^{k′} 与对应的第二目标k在t-1时刻的第三坐标 P'^{k}_{t-1→t}，确定第一目标k′在对应时间范围Δt内的运动信息。具体来说，获取第三坐标 P'^{k}_{t-1→t} 到第一坐标 P_{t}^{k′} 形成的向量，以该向量的方向作为第一目标k′在对应时间范围Δt内的运动方向，表示为：
direction = P_{t}^{k′} − P'^{k}_{t-1→t}
获取第三坐标 P'^{k}_{t-1→t} 到第一坐标 P_{t}^{k′} 形成的向量的范数 ‖P_{t}^{k′} − P'^{k}_{t-1→t}‖，通过如下方式获取第一目标k′在对应时间范围Δt内的运动速度v：
v = ‖P_{t}^{k′} − P'^{k}_{t-1→t}‖ / Δt
图9是本公开一示例性实施例提供的基于目标的运动信息控制行驶对象的方法的流程示意图。本实施例可应用在车辆、机器人、玩具车等行驶对象上。如图9所示,该实施例的基于目标的运动信息控制行驶对象的方法包括如下步骤:
步骤601,在行驶对象行驶过程中,通过行驶对象上的摄像装置采集行驶对象外场景的图像序列。
步骤602,以图像序列中的至少一频帧图像作为第一图像、以图像序列中位于所述第一图像之前、且与第一图像间隔预设帧数的至少一帧图像作为第二图像,利用本公开上述任一实施例行驶中的运动信息检测方法的方法,确定行驶对象外场景中目标的运动信息。
步骤603,根据上述目标的运动信息,生成用于控制行驶对象行驶状态的控制指令,以便控制行驶对象的行驶状态。
基于本实施例,可以在行驶对象行驶过程中,利用本公开任一实施例所述行驶中的运动信息检测方法确定驾驶场景中目标的运动信息,进而根据该目标的运动信息生成用于控制行驶对象行驶状态的控制指令,从而实现了利用计算机视觉技术检测驾驶场景中目标的运动信息、对行驶对象的智能驾驶控制,有利于满足无人驾驶场景中对行驶对象的实时智能驾驶控制,以保障行驶对象的安全行驶。
可选地,在其中一些实施方式中,上述控制指令例如可以包括但不限于以下至少之一:用于保持运动速度大小的控制指令、用于调整运动速度大小的控制指令(例如减速行驶的控制指令、加速行驶的控制指令等)、用于保持运动方向的控制指令、用于调整运动方向的控制指令(例如左转向的控制指令、右转向的控制指令、向左侧车道并线的控制指令、或者向右侧车道并线的控制指令等)、用于预警提示的控制指令(例如请注意前方目标等的提示消 息)、用于进行驾驶模式切换的控制指令(例如切换为自动巡航驾驶模式的控制指令、切换为人工驾驶模式的控制指令等)等等。本公开实施例的控制指令可以根据实际需求设置,而不限于上述控制指令。
本公开上述任一实施例提供的对目标的运动信息进行检测的方法或者基于目标的运动信息控制行驶对象的方法,可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本公开上述任一实施例提供的对目标的运动信息进行检测的方法或者基于目标的运动信息控制行驶对象的方法,可以由处理器执行,如处理器通过调用存储器存储的相应指令,来执行本公开上述任一实施例提供的对目标的运动信息进行检测的方法或者基于目标的运动信息控制行驶对象的方法。下文不再赘述。
示例性装置
图10是本公开一示例性实施例提供的对目标的运动信息进行检测的装置的结构示意图。该对目标的运动信息进行检测的装置可以设置于终端设备、服务器等电子设备中,也可以设置于车辆、机器人、玩具车等行驶对象上,执行本公开上述任一实施例的对目标的运动信息进行检测的方法。如图10所示,该对目标的运动信息进行检测的装置包括:检测模块701、第一获取模块702、第一确定模块703、第二确定模块704、第二获取模块705、转换模块706和第三确定模块707。其中:
检测模块701,用于对第一图像进行目标检测,得到第一目标的检测框,其中的第一图像为行驶对象上的摄像装置在行驶对象行驶过程中采集的行驶对象外场景的图像。
第一获取模块702,用于获取第一图像在对应的第一相机坐标系中的深度信息。
第一确定模块703,用于根据第一获取模块702获取的第一图像的深度信息,确定第一目标的检测框的深度信息。
第二确定模块704,用于基于检测模块701得到的第一目标的检测框在图像坐标系中的位置和第一确定模块703确定的第一目标的检测框的深度信息,确定第一目标在第一相机坐标系中的第一坐标。
第二获取模块705,用于获取摄像装置从采集第二图像到采集第一图像的位姿变化信息。其中,第二图像为第一图像所在图像序列中时序位于第一图 像之前、且与第一图像间隔预设帧数的图像。
转换模块706,用于根据第二获取模块705获取的位姿变化信息,将第二目标在第二图像对应的第二相机坐标系中的第二坐标转换到第一相机坐标系中的第三坐标。其中,第二目标为第一目标对应的第二图像中的目标。
第三确定模块707,用于基于第二确定模块704确定的第一坐标和转换模块706转换到的第三坐标,确定第一目标从第二图像的采集时刻到第一图像的采集时刻对应时间范围内的运动信息。
基于本实施例,利用计算机视觉技术,基于在行驶对象行驶过程中采集该行驶对象外场景的确定驾驶场景中目标的运动信息,无需借助于激光雷达,相比于采用激光雷达获取目标运动速度和方向的方式,由于无需通过高频率的发射激光束构建点云数据、在两个点云数据上进行目标检测和目标追踪、计算目标的运动速度和方向,可以避免大量的计算处理,节省处理时间,提高处理效率,有利于满足无人驾驶等对实时性要求较高的场景需求。
图11是本公开另一示例性实施例提供的对目标的运动信息进行检测的装置的结构示意图。如图11所示,在上述图11所示实施例的基础上,该实施例对目标的运动信息进行检测的装置中,第一确定模块703包括:第一获取单元7031,用于从第一图像的深度信息中获取第一目标的检测框中各像素点的深度值;第一确定单元7032,用于采用预设方式,基于第一获取单元7031获取的第一目标的检测框中各像素点的深度值,确定第一目标的检测框的深度信息。
可选地,在其中一些实施方式中,第一确定单元7032,具体用于选取第一获取单元7031获取的第一目标的检测框中各像素点的深度值中,出现频率最高的深度值作为第一目标的检测框的深度信息。
或者,在另一些实施方式中,第一确定单元7032,具体用于确定第一目标的检测框中各像素点的深度值中,分别处于预设各深度值范围内的像素点的数量;基于深度值处于同一深度值范围内的像素点的数量最多的深度值范围,确定第一目标的检测框的深度信息。
或者,在又一些实施方式中,第一确定单元7032,具体用于获取第一目标的检测框中各像素点的深度值的平均值,作为第一目标的检测框的深度信息。
可选地,再参见图11,在上述实施例对目标的运动信息进行检测的装置 中,还可以包括:第四确定模块708和第五确定模块709。其中:
第四确定模块708,用于确定第一图像中的至少一个目标与第二图像中的至少一个目标之间的对应关系;其中,第一图像中的目标包括上述第一目标。
第五确定模块709,用于根据第四模块708确定的对应关系,确定第一目标对应的第二图像中的目标作为上述第二目标。
可选地,在其中一些实施方式中,第四确定模块708,具体用于对第二图像中的至少一个目标的检测框进行跟踪,得到第一图像中的至少一个目标与第二图像中的至少一个目标之间的对应关系。
或者,在另一些实施方式中,第四确定模块708可以包括:第二获取单元7081,用于获取第二图像到第一图像的光流信息;第二确定单元7082,用于分别针对第二图像中的至少一个目标中各目标的检测框,基于上述光流信息和第二图像中的目标的检测框,确定第二图像中的目标的检测框中像素点转移到第一图像中的位置;第三获取单元7083,用于获取目标的检测框中像素点转移到第一图像中的位置的集合与第一图像中的各检测框之间的交并比;建立单元7084,用于建立第二图像中的目标与第一图像中交并比最大的检测框对应目标之间的对应关系。
可选地,再参见图11,在其中一些实施方式中,第三确定模块707包括:第四获取单元7071,用于获取第三坐标到第一坐标形成的向量;第三确定单元7072,用于基于第四获取单元7071获取的向量的方向,确定第一目标在上述时间范围内的运动方向,基于上述向量的范数与上述时间范围确定第一目标在上述时间范围内的运动速度。
可选地,在上述各实施例对目标的运动信息进行检测的装置中,检测模块701,还可用于对第二图像进行目标检测,得到第二目标的检测框。第一获取模块702,还可用于获取第二图像在第二相机坐标系中的深度信息。第二确定模块704,还可用于基于检测模块701得到的第二目标的检测框在图像坐标系中的位置和第一确定模块703确定的第二目标的检测框的深度信息,确定第二目标在第二相机坐标系中的第二坐标。
可选地,再参见图11,在上述实施例对目标的运动信息进行检测的装置中,还可以包括:存储模块710,用于存储第二确定模块704确定的第二目标的所述第二坐标。
可选地,在上述各实施例对目标的运动信息进行检测的装置中,还可以 以第一图像作为新的第二图像,以图像序列中时序位于第一图像之后的第三图像作为新的第一图像,由对目标的运动信息进行检测的装置中的各模块执行相应的操作,以确定第三图像中目标从上述第一图像的采集时刻到第三图像的采集时刻对应时间范围内的运动信息。
图12是本公开一示例性实施例提供的基于目标的运动信息控制行驶对象的装置的结构示意图。该行驶中的基于目标的运动信息控制行驶对象的装置可以设置于车辆、机器人、玩具车等行驶对象上,来对行驶对象进行基于目标的运动信息控制行驶对象的基于目标的运动信息控制行驶对象的。如图12所示,该基于目标的运动信息控制行驶对象的装置包括:摄像装置801、运动信息检测装置802和控制装置803。其中:
摄像装置801,设置于行驶对象上,用于在行驶对象行驶过程中,采集行驶对象外场景的图像序列。
运动信息检测装置802,用于以上述图像序列中的至少一频帧图像作为第一图像、以上述图像序列中位于第一图像之前、且与第一图像间隔预设帧数的至少一帧图像作为第二图像,确定行驶对象外场景中目标的运动信息。该运动信息检测装置802具体可以通过上述图10-图11中任一实施例的对目标的运动信息进行检测的装置实现。
控制装置803,用于根据运动信息检测装置802检测到的目标的运动信息,生成用于控制行驶对象行驶状态的控制指令。
基于本实施例,在行驶对象行驶过程中,通过行驶对象上的摄像装置采集行驶对象外场景的图像序列,以图像序列中的至少一频帧图像作为第一图像、以图像序列中位于第一图像之前、且与第一图像间隔预设帧数的至少一帧图像作为第二图像,利用本公开任一实施例所述对目标的运动信息进行检测的方法确定驾驶场景中目标的运动信息,进而根据该目标的运动信息生成用于控制行驶对象行驶状态的控制指令,从而实现了利用计算机视觉技术检测驾驶场景中目标的运动信息、对行驶对象的智能驾驶控制,有利于满足无人驾驶场景中对行驶对象的实时智能驾驶控制,以保障行驶对象的安全行驶。
可选地,在其中一些实施方式中,上述控制指令例如可以包括但不限于以下至少之一:用于保持运动速度大小的控制指令、用于调整运动速度大小的控制指令、用于保持运动方向的控制指令、用于调整运动方向的控制指令、用于预警提示的控制指令、用于进行驾驶模式切换的控制指令等等。
示例性电子设备
下面,参考图13来描述根据本公开实施例的电子设备。图13图示了根据本公开实施例的电子设备的框图。如图13所示,电子设备包括一个或多个处理器11和存储器12。
处理器11可以是中央处理单元(Central Processing Unit,CPU)或者具有数据处理能力和/或指令执行能力的其他形式的处理单元,并且可以控制电子设备10中的其他组件以执行期望的功能。
存储器12可以包括一个或多个计算机程序产品,所述计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。所述易失性存储器例如可以包括随机存取存储器(Random Access Memory,RAM)和/或高速缓冲存储器(cache)等。所述非易失性存储器例如可以包括只读存储器(Read-Only Memory,ROM)、硬盘、闪存等。在所述计算机可读存储介质上可以存储一个或多个计算机程序指令,处理器11可以运行所述程序指令,以实现上文所述的本公开的各个实施例的对目标的运动信息进行检测的方法或者基于目标的运动信息控制行驶对象的方法以及/或者其他期望的功能。在所述计算机可读存储介质中还可以存储诸如图像的深度信息、目标的检测框的深度信息、摄像装置的位姿变化信息等各种内容。
在一个示例中,电子设备10还可以包括:输入装置13和输出装置14,这些组件通过总线系统和/或其他形式的连接机构(未示出)互连。
例如,该输入装置13可以是上述的麦克风或麦克风阵列,或者,该输入装置13可以是通信网络连接器。
此外,该输入设备13还可以包括例如键盘、鼠标等等。
该输出装置14可以向外部输出各种信息,包括确定出的第一目标从第二图像的采集时刻到第一图像的采集时刻对应的时间范围内的运动信息等。该输出设备14可以包括例如显示器、扬声器、打印机、以及通信网络及其所连接的远程输出设备等等。
当然,为了简化,图13中仅示出了该电子设备10中与本公开有关的组件中的一些,省略了诸如总线、输入/输出接口等等的组件。除此之外,根据具体应用情况,电子设备10还可以包括任何其他适当的组件。
示例性计算机程序产品和计算机可读存储介质
除了上述方法和设备以外,本公开的实施例还可以是计算机程序产品,其包括计算机程序指令,所述计算机程序指令在被处理器运行时使得所述处理器执行本说明书上述“示例性方法”部分中描述的根据本公开各种实施例的对目标的运动信息进行检测的方法或者基于目标的运动信息控制行驶对象的方法中的步骤。
所述计算机程序产品可以以一种或多种程序设计语言的任意组合来编写用于执行本公开实施例操作的程序代码,所述程序设计语言包括面向对象的程序设计语言,诸如Java、C++等,还包括常规的过程式程序设计语言,诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。
此外,本公开的实施例还可以是计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令在被处理器运行时使得所述处理器执行本说明书上述“示例性方法”部分中描述的根据本公开各种实施例的对目标的运动信息进行检测的方法或者基于目标的运动信息控制行驶对象的方法中的步骤。
所述计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以包括但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器((Erasable Programmable Read-Only Memory,EPROM)或闪存)、光纤、便携式紧凑盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。
以上结合具体实施例描述了本公开的基本原理,但是,需要指出的是,在本公开中提及的优点、优势、效果等仅是示例而非限制,不能认为这些优点、优势、效果等是本公开的各个实施例必须具备的。另外,上述公开的具体细节仅是为了示例的作用和便于理解的作用,而非限制,上述细节并不限 制本公开为必须采用上述具体的细节来实现。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本公开中涉及的器件、装置、设备、系统的方框图仅作为例示性的例子并且不意图要求或暗示必须按照方框图示出的方式进行连接、布置、配置。如本领域技术人员将认识到的,可以按任意方式连接、布置、配置这些器件、装置、设备、系统。诸如“包括”、“包含”、“具有”等等的词语是开放性词汇,指“包括但不限于”,且可与其互换使用。这里所使用的词汇“或”和“和”指词汇“和/或”,且可与其互换使用,除非上下文明确指示不是如此。这里所使用的词汇“诸如”指词组“诸如但不限于”,且可与其互换使用。
可能以许多方式来实现本公开的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明,本公开的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
还需要指出的是,在本公开的装置、设备和方法中,各部件或各步骤是可以分解和/或重新组合的。这些分解和/或重新组合应视为本公开的等效方案。
提供所公开的方面的以上描述以使本领域的任何技术人员能够做出或者使用本公开。对这些方面的各种修改对于本领域技术人员而言是非常显而易见的,并且在此定义的一般原理可以应用于其他方面而不脱离本公开的范围。因此,本公开不意图被限制到在此示出的方面,而是按照与在此公开的原理和新颖的特征一致的最宽范围。
为了例示和描述的目的已经给出了以上描述。此外,此描述不意图将本公开的实施例限制到在此公开的形式。尽管以上已经讨论了多个示例方面和实施例,但是本领域技术人员将认识到其某些变型、修改、改变、添加和子组合。

Claims (11)

  1. 一种对目标的运动信息进行检测的方法,包括:
    对第一图像进行目标检测,得到第一目标的检测框,所述第一图像为行驶对象上的摄像装置在所述行驶对象行驶过程中采集的所述行驶对象外场景的图像;
    获取所述第一图像在对应的第一相机坐标系中的深度信息;
    根据所述第一图像的深度信息,确定所述第一目标的检测框的深度信息,并基于所述第一目标的检测框在图像坐标系中的位置和所述第一目标的检测框的深度信息,确定所述第一目标在所述第一相机坐标系中的第一坐标;
    获取所述摄像装置从采集第二图像到采集所述第一图像的位姿变化信息;其中,所述第二图像为所述第一图像所在图像序列中时序位于所述第一图像之前、且与所述第一图像间隔预设帧数的图像;
    根据所述位姿变化信息,将第二目标在所述第二图像对应的第二相机坐标系中的第二坐标转换到所述第一相机坐标系中的第三坐标;其中,所述第二目标为所述第一目标对应的第二图像中的目标;
    基于所述第一坐标和所述第三坐标,确定所述第一目标从所述第二图像的采集时刻到所述第一图像的采集时刻对应时间范围内的运动信息。
  2. 根据权利要求1所述的方法,其中,所述根据所述第一图像在对应的第一相机坐标系中的深度信息,确定所述第一目标的检测框的深度信息,包括:
    从所述第一图像的深度信息中获取所述第一目标的检测框中各像素点的深度值;
    采用预设方式,基于所述第一目标的检测框中各像素点的深度值,确定所述第一目标的检测框的深度信息。
  3. 根据权利要求1所述的方法,其中,所述根据所述位姿变化信息,将第二目标在所述第二图像对应的第二相机坐标系中的第二坐标转换到所述第一相机坐标系中的第三坐标之前,还包括:
    确定所述第一图像中的至少一个目标与所述第二图像中的至少一个目标之间的对应关系;
    所述第一图像中的至少一个目标包括所述第一目标;
    根据所述对应关系,确定所述第一目标对应的第二图像中的目标作为所述第二目标。
  4. 根据权利要求3所述的方法,其中,所述确定所述第一图像中的至少一个目标与所述第二图像中的至少一个目标之间的对应关系,包括:
    对所述第二图像中的至少一个目标的检测框进行跟踪,得到所述第一图像中的至少一个目标与所述第二图像中的至少一个目标之间的对应关系;
    或者,
    获取所述第二图像到所述第一图像的光流信息;
    分别针对所述第二图像中的至少一个目标中各目标的检测框,基于所述光流信息和所述第二图像中的目标的检测框,确定所述第二图像中的目标的检测框中像素点转移到所述第一图像中的位置;
    获取所述目标的检测框中像素点转移到所述第一图像中的位置的集合与所述第一图像中的各检测框之间的交并比;
    建立所述第二图像中的目标与所述第一图像中交并比最大的检测框对应目标之间的对应关系。
  5. 根据权利要求1所述的方法,其中,所述基于所述第一坐标和所述第三坐标,确定所述第一目标从所述第二图像的采集时刻到所述第一图像的采集时刻对应时间范围内的运动信息,包括:
    获取所述第三坐标到所述第一坐标形成的向量;
    基于所述向量的方向确定所述第一目标在所述时间范围内的运动方向,基于所述向量的范数与所述时间范围确定所述第一目标在所述时间范围内的运动速度,其中,所述第一目标在所述时间范围内的运动信息包括:所述第一目标在所述时间范围内的运动方向和运动速度。
  6. 根据权利要求1-5任一所述的方法,其中,所述根据所述位姿变化信息,将第二目标在所述第二图像对应的第二相机坐标系中的第二坐标转换到所述第一相机坐标系中的第三坐标之前,还包括:
    对所述第二图像进行目标检测,得到所述第二目标的检测框;
    获取所述第二图像在所述第二相机坐标系中的深度信息,并根据所述第二图像在所述第二相机坐标系中的深度信息,确定所述第二目标的检测框的深度信息;
    基于所述第二目标的检测框在图像坐标系中的位置和所述第二目标的检测框的深度信息,确定所述第二目标在所述第二相机坐标系中的第二坐标。
  7. 一种基于目标的运动信息控制行驶对象的方法,包括:
    在行驶对象行驶过程中,通过所述行驶对象上的摄像装置采集所述行驶对象外场景的图像序列;
    以所述图像序列中的至少一帧图像作为第一图像、以所述图像序列中位于所述第一图像之前、且与所述第一图像间隔预设帧数的至少一帧图像作为第二图像,利用权利要求1-7任一所述的方法,确定所述场景中目标的运动信息;
    根据所述目标的运动信息生成用于控制所述行驶对象行驶状态的控制指令。
  8. 一种对目标的运动信息进行检测的装置,包括:
    检测模块,用于对第一图像进行目标检测,得到第一目标的检测框,所述第一图像为行驶对象上的摄像装置在所述行驶对象行驶过程中采集的所述行驶对象外场景的图像;
    第一获取模块,用于获取所述第一图像在对应的第一相机坐标系中的深度信息;
    第一确定模块,用于根据所述第一获取模块获取的所述第一图像的深度信息,确定所述第一目标的检测框的深度信息;
    第二确定模块,用于基于所述检测模块得到的所述第一目标的检测框在图像坐标系中的位置和所述第一确定模块确定的所述第一目标的检测框的深度信息,确定所述第一目标在所述第一相机坐标系中的第一坐标;
    第二获取模块,用于获取摄像装置从采集第二图像到采集所述第一图像的位姿变化信息;其中,所述第二图像为所述第一图像所在图像序列中时序位于所述第一图像之前、且与所述第一图像间隔预设帧数的图像;
    转换模块,用于根据所述第二获取模块获取的所述位姿变化信息,将第二目标在所述第二图像对应的第二相机坐标系中的第二坐标转换到所述第一相机坐标系中的第三坐标;其中,所述第二目标为所述第一目标对应的第二图像中的目标;
    第三确定模块,用于基于所述第二确定模块确定的所述第一坐标和所述转换模块转换到的所述第三坐标,确定所述第一目标从所述第二图像的采集 时刻到所述第一图像的采集时刻对应时间范围内的运动信息。
  9. 一种基于目标的运动信息控制行驶对象的装置,包括:
    摄像装置,设置于行驶对象上,用于在行驶对象行驶过程中,采集所述行驶对象外场景的图像序列;
    运动信息检测装置,用于以所述图像序列中的至少一频帧图像作为第一图像、以所述图像序列中位于所述第一图像之前、且与所述第一图像间隔预设帧数的至少一帧图像作为第二图像,确定所述场景中目标的运动信息;所述运动信息检测装置包括权利要求10-16任一所述的装置;
    控制装置,用于根据所述运动信息检测装置检测到的所述目标的运动信息,生成用于控制所述行驶对象行驶状态的控制指令。
  10. 一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于执行上述权利要求1-7任一所述的方法。
  11. 一种电子设备,所述电子设备包括:
    处理器;
    用于存储所述处理器可执行指令的存储器;
    所述处理器,用于从所述存储器中读取所述可执行指令,并执行所述指令以实现上述权利要求1-7任一所述的方法。
PCT/CN2022/076765 2021-04-07 2022-02-18 对目标的运动信息进行检测的方法和装置、设备和介质 WO2022213729A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022557731A JP7306766B2 (ja) 2021-04-07 2022-02-18 ターゲット動き情報検出方法、装置、機器及び媒体
EP22783799.4A EP4246437A1 (en) 2021-04-07 2022-02-18 Method and apparatus for detecting motion information of target, and device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110373003.XA CN113096151B (zh) 2021-04-07 2021-04-07 Method and apparatus for detecting motion information of target, and device and medium
CN202110373003.X 2021-04-07

Publications (1)

Publication Number Publication Date
WO2022213729A1 (zh)

Family

ID=76674988

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/076765 WO2022213729A1 (zh) Method and apparatus for detecting motion information of target, and device and medium

Country Status (4)

Country Link
EP (1) EP4246437A1 (zh)
JP (1) JP7306766B2 (zh)
CN (1) CN113096151B (zh)
WO (1) WO2022213729A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115890639A (zh) * 2022-11-17 2023-04-04 浙江荣图智能科技有限公司 Robot vision-guided positioning and grasping control system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096151B (zh) * 2021-04-07 2022-08-09 地平线征程(杭州)人工智能科技有限公司 Method and apparatus for detecting motion information of target, and device and medium
CN113936042B (zh) * 2021-12-16 2022-04-05 深圳佑驾创新科技有限公司 Target tracking method, apparatus, and computer-readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160034040A1 (en) * 2014-07-29 2016-02-04 Sony Computer Entertainment Inc. Information processing device, information processing method, and computer program
CN110415276A (zh) * 2019-07-30 2019-11-05 北京字节跳动网络技术有限公司 Motion information calculation method, apparatus, and electronic device
CN111723716A (zh) * 2020-06-11 2020-09-29 深圳地平线机器人科技有限公司 Method, apparatus, system, medium, and electronic device for determining orientation of target object
CN112419385A (zh) * 2021-01-25 2021-02-26 国汽智控(北京)科技有限公司 3D depth information estimation method, apparatus, and computer device
CN112509047A (зh) * 2020-12-10 2021-03-16 北京地平线信息技术有限公司 Image-based pose determination method, apparatus, storage medium, and electronic device
CN112541553A (зh) * 2020-12-18 2021-03-23 深圳地平线机器人科技有限公司 State detection method, apparatus, medium, and electronic device for target object
CN113096151A (зh) * 2021-04-07 2021-07-09 地平线征程(杭州)人工智能科技有限公司 Method and apparatus for detecting motion information of target, and device and medium

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019084804A1 (zh) * 2017-10-31 2019-05-09 深圳市大疆创新科技有限公司 Visual odometer and implementation method thereof
CN108734726A (zh) * 2017-12-04 2018-11-02 北京猎户星空科技有限公司 Target tracking method, apparatus, electronic device, and storage medium
CN111344644B (zh) 2018-08-01 2024-02-20 深圳市大疆创新科技有限公司 Techniques for motion-based automatic image capture
CN111354037A (zh) 2018-12-21 2020-06-30 北京欣奕华科技有限公司 Positioning method and system
CN109816690A (zh) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth features
CN111402286B (zh) * 2018-12-27 2024-04-02 杭州海康威视系统技术有限公司 Target tracking method, apparatus, system, and electronic device
CN109727273B (zh) * 2018-12-29 2020-12-04 北京茵沃汽车科技有限公司 Moving target detection method based on a vehicle-mounted fisheye camera
EP3680858A1 (en) * 2019-01-11 2020-07-15 Tata Consultancy Services Limited Dynamic multi-camera tracking of moving objects in motion streams
CN111213153A (zh) * 2019-01-30 2020-05-29 深圳市大疆创新科技有限公司 Method, device, and storage medium for detecting motion state of target object
CN111247557A (зh) * 2019-04-23 2020-06-05 深圳市大疆创新科技有限公司 Method, system, and movable platform for moving target object detection
CN112015170A (зh) * 2019-05-29 2020-12-01 北京市商汤科技开发有限公司 Moving object detection and intelligent driving control method, apparatus, medium, and device
JP7383870B2 (ja) * 2019-05-30 2023-11-21 Mobileye Vision Technologies Ltd. Device, method, system, and computer program
CN110533699B (зh) * 2019-07-30 2024-05-24 平安科技(深圳)有限公司 Dynamic multi-frame speed measurement method based on pixel changes using optical flow
JP7339616B2 (ja) * 2019-08-07 2023-09-06 眞次 中村 Speed measurement device and speed measurement method
CN110929567B (зh) * 2019-10-17 2022-09-27 北京全路通信信号研究设计院集团有限公司 Method and system for measuring position and speed of a target in a monocular-camera surveillance scene
CN111179311B (зh) 2019-12-23 2022-08-19 全球能源互联网研究院有限公司 Multi-target tracking method, apparatus, and electronic device
CN111583329B (зh) * 2020-04-09 2023-08-04 深圳奇迹智慧网络有限公司 Augmented reality glasses display method, apparatus, electronic device, and storage medium
CN111897429A (зh) 2020-07-30 2020-11-06 腾讯科技(深圳)有限公司 Image display method, apparatus, computer device, and storage medium
CN112541938A (зh) * 2020-12-17 2021-03-23 通号智慧城市研究设计院有限公司 Pedestrian speed measurement method, system, medium, and computing device

Also Published As

Publication number Publication date
CN113096151A (zh) 2021-07-09
JP2023523527A (ja) 2023-06-06
JP7306766B2 (ja) 2023-07-11
CN113096151B (zh) 2022-08-09
EP4246437A1 (en) 2023-09-20

Similar Documents

Publication Publication Date Title
TWI691730B (zh) Method and system for detecting environmental information of a vehicle
WO2022213729A1 (zh) Method and apparatus for detecting motion information of target, and device and medium
Shin et al. Roarnet: A robust 3d object detection based on region approximation refinement
JP7345504B2 (ja) Lidarデータと画像データの関連付け
WO2020052540A1 (zh) Object labeling method, movement control method, apparatus, device, and storage medium
WO2019179464A1 (zh) Method for predicting direction of motion of target object, vehicle control method, and apparatus
CN110363058B (zh) Three-dimensional object localization for obstacle avoidance using a single-shot convolutional neural network
US11270457B2 (en) Device and method for detection and localization of vehicles
US10668921B2 (en) Enhanced vehicle tracking
EP3766044B1 (en) Three-dimensional environment modeling based on a multicamera convolver system
US10679369B2 (en) System and method for object recognition using depth mapping
WO2019129255A1 (zh) Target tracking method and apparatus
Manglik et al. Forecasting time-to-collision from monocular video: Feasibility, dataset, and challenges
KR20210022703A (ko) Moving object detection and intelligent driving control method, apparatus, medium, and device
WO2020233436A1 (zh) Vehicle speed determination method and vehicle
WO2023036083A1 (zh) Sensor data processing method, system, and readable storage medium
Manglik et al. Future near-collision prediction from monocular video: Feasibility, dataset, and challenges
Li et al. Vehicle object detection based on rgb-camera and radar sensor fusion
Mukherjee et al. Ros-based pedestrian detection and distance estimation algorithm using stereo vision, leddar and cnn
Pandey et al. Light-weight object detection and decision making via approximate computing in resource-constrained mobile robots
US20220375134A1 (en) Method, device and system of point cloud compression for intelligent cooperative perception system
WO2021223166A1 (zh) State information determination method, apparatus, system, movable platform, and storage medium
Chi et al. Dynamic small target detection and tracking based on hierarchical network and adaptive input image stream
US20240062386A1 (en) High throughput point cloud processing
Chen et al. 3D Car Tracking using Fused Data in Traffic Scenes for Autonomous Vehicle.

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022557731

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 17907662

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22783799

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022783799

Country of ref document: EP

Effective date: 20230614

NENP Non-entry into the national phase

Ref country code: DE