WO2022099510A1 - Object identification method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number: WO2022099510A1
Application number: PCT/CN2020/128125
Authority: WIPO (PCT)
Prior art keywords: point cloud, image, feature, target, features
Other languages: French (fr), Chinese (zh)
Inventor: 张磊杰
Original Assignee: 深圳元戎启行科技有限公司
Application filed by 深圳元戎启行科技有限公司
Priority applications: CN202080092994.8A (CN115004259B); PCT/CN2020/128125
Publication of WO2022099510A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Description

  • The present application relates to an object recognition method and apparatus, a computer device, and a storage medium.
  • Self-driving cars are intelligent vehicles that achieve unmanned driving through a computer system.
  • The computer system automatically and safely controls the vehicle without active human operation.
  • While an autonomous vehicle is driving, it must detect obstacles along the way and avoid them in time.
  • The inventor has realized that current obstacle identification methods cannot identify obstacles accurately, resulting in a low obstacle avoidance capability and thus low safety for the autonomous vehicle.
  • Accordingly, an object recognition method, an apparatus, a computer device, and a storage medium are provided.
  • An object recognition method includes: acquiring a current scene image and a current scene point cloud corresponding to a target moving object; performing image feature extraction on the current scene image to obtain initial image features, and performing point cloud feature extraction on the current scene point cloud to obtain initial point cloud features; acquiring a target image position corresponding to the current scene image, and fusing the initial image features based on the point cloud features, among the initial point cloud features, that correspond to the target image position, to obtain target image features; acquiring a target point cloud position corresponding to the current scene point cloud, and fusing the initial point cloud features based on the image features, among the initial image features, that correspond to the target point cloud position, to obtain target point cloud features; determining an object position corresponding to a scene object based on the target image features and the target point cloud features; and controlling the target moving object to move based on the position corresponding to the scene object.
  • An object recognition device includes:
  • a current scene image acquisition module, configured to acquire a current scene image and a current scene point cloud corresponding to a target moving object;
  • an initial point cloud feature obtaining module, configured to perform image feature extraction on the current scene image to obtain initial image features, and to perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features;
  • a target image feature obtaining module, configured to obtain a target image position corresponding to the current scene image, and to fuse the initial image features based on the point cloud features, among the initial point cloud features, that correspond to the target image position, to obtain target image features;
  • a target point cloud feature obtaining module, configured to obtain a target point cloud position corresponding to the current scene point cloud, and to fuse the initial point cloud features based on the image features, among the initial image features, that correspond to the target point cloud position, to obtain target point cloud features;
  • a position determination module, configured to determine an object position corresponding to a scene object based on the target image features and the target point cloud features; and
  • a motion control module, configured to control the target moving object to move based on the position corresponding to the scene object.
  • A computer device includes a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps of the above object recognition method, including controlling the target moving object to move based on the position corresponding to the scene object.
  • One or more computer storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above object recognition method, including controlling the target moving object to move based on the position corresponding to the scene object.
  • FIG. 1 is an application scenario diagram of an object recognition method in accordance with one or more embodiments.
  • FIG. 2 is a schematic flowchart of an object recognition method in accordance with one or more embodiments.
  • FIG. 3 is a schematic flowchart of the steps for obtaining target point cloud features in accordance with one or more embodiments.
  • FIG. 4 is a schematic diagram of an object recognition system in accordance with one or more embodiments.
  • FIG. 5 is a block diagram of an apparatus for object recognition in accordance with one or more embodiments.
  • FIG. 6 is a block diagram of a computer device in accordance with one or more embodiments.
  • the object recognition method provided by this application can be applied to the application environment shown in FIG. 1 .
  • the application environment includes a terminal 102 and a server 104, and a point cloud collection device and an image collection device are installed in the terminal 102.
  • the point cloud collection device is used to collect point cloud data, such as the point cloud of the current scene.
  • the image acquisition device is used to acquire images, such as the current scene image.
  • The terminal 102 can transmit the collected current scene image and current scene point cloud to the server 104. The server 104 can obtain the current scene image and the current scene point cloud corresponding to the terminal 102 (the target moving object), perform image feature extraction on the current scene image to obtain initial image features, and perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features. The server 104 can then obtain the target image position corresponding to the current scene image and fuse the initial image features based on the point cloud features, among the initial point cloud features, that correspond to the target image position, to obtain target image features; obtain the target point cloud position corresponding to the current scene point cloud and fuse the initial point cloud features based on the image features, among the initial image features, that correspond to the target point cloud position, to obtain target point cloud features; determine the object position corresponding to the scene object based on the target image features and the target point cloud features; and control the terminal 102 to move based on the position corresponding to the scene object.
  • the terminal 102 may be, but is not limited to, self-driving cars and mobile robots.
  • the server 104 can be implemented by an independent server or a server cluster composed of multiple servers.
  • the point cloud collection device can be any device that can collect point cloud data, and it can be but not limited to lidar.
  • the image acquisition device may be any device that can acquire image data, and may be, but not limited to, a camera.
  • the above application scenario is only an example, and does not constitute a limitation on the object recognition method provided by the embodiment of the present application.
  • the object recognition method provided by the embodiment of the present application can also be applied to other application scenarios.
  • Alternatively, the above object recognition method may be performed by the terminal 102.
  • In one embodiment, an object recognition method is provided. The method is described here as applied to the server 104 in FIG. 1 as an example, and includes the following steps:
  • S202: Acquire a current scene image and a current scene point cloud corresponding to the target moving object.
  • A moving object refers to an object in a state of motion. It can be a living object, such as, but not limited to, a human or an animal, or an inanimate object, such as, but not limited to, a vehicle or a drone, for example, an autonomous vehicle.
  • the target moving object refers to the moving object whose movement is to be controlled according to the scene image and the scene point cloud.
  • the target moving object is, for example, the terminal 102 in FIG. 1 .
  • the scene image refers to the image corresponding to the scene where the moving object is located.
  • the scene image may reflect the environment where the moving object is located, for example, the scene image may include one or more of lanes, vehicles, pedestrians or obstacles in the environment.
  • The scene image may be acquired by an image acquisition device built into the moving object, for example, by a camera installed in an autonomous vehicle. It may also be acquired by an image acquisition device that is external to the moving object and associated with it, for example, a device connected to the moving object through a cable or a network, such as a camera on the road where the autonomous vehicle is located that is connected to the vehicle via a network.
  • the current scene image refers to an image corresponding to the current scene where the target moving object is located at the current time.
  • the current scene refers to the scene where the target moving object is located at the current time.
  • the external image acquisition device can transmit the acquired scene image to the moving object.
  • A point cloud refers to a collection of three-dimensional data points in a three-dimensional coordinate system, for example, the collection of three-dimensional data points corresponding to the surface of an object in a three-dimensional coordinate system. A point cloud can thus represent the shape of an object's outer surface.
  • a three-dimensional data point refers to a point in a three-dimensional space, and the three-dimensional data point includes three-dimensional coordinates, and the three-dimensional coordinates may include, for example, an X coordinate, a Y coordinate, and a Z coordinate.
  • the three-dimensional data points may also include at least one of RGB color, grayscale value, or time.
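  • As a concrete illustration of this data layout, the following is a minimal sketch of a three-dimensional data point carrying coordinates plus the optional attributes listed above; the field and class names are illustrative assumptions, not taken from the application:

      from dataclasses import dataclass
      from typing import List, Optional, Tuple

      @dataclass
      class Point3D:
          # Mandatory three-dimensional coordinates.
          x: float
          y: float
          z: float
          # Optional attributes a sensor may attach to each point.
          rgb: Optional[Tuple[int, int, int]] = None  # RGB color
          gray: Optional[float] = None                # grayscale value
          time: Optional[float] = None                # capture time

      # A point cloud is a collection of such three-dimensional data points.
      PointCloud = List[Point3D]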
  • the scene point cloud refers to a collection of 3D data points corresponding to the scene.
  • The point cloud can be obtained by scanning with a lidar. A lidar is an active sensor: it emits a laser beam, the beam bounces off the surface of an object, and the bounced laser signal is collected to obtain the point cloud of the object's surface.
  • the scene point cloud refers to the point cloud corresponding to the scene where the moving object is located.
  • The scene point cloud can be collected by a point cloud collection device built into the moving object, for example, by scanning with a lidar installed in an autonomous vehicle. It can also be collected by a point cloud collection device that is external to the moving object and associated with it, for example, a device connected to the moving object through a cable or a network, such as a lidar on the road where the autonomous vehicle is located that is connected to the vehicle via a network.
  • the current scene point cloud refers to the point cloud corresponding to the current scene where the target moving object is located at the current time.
  • the external point cloud acquisition device can transmit the scanned scene point cloud to the moving object.
  • the target moving object can collect the current scene in real time through an image acquisition device to obtain an image of the current scene, and can collect the current scene in real time through a point cloud acquisition device to obtain a point cloud of the current scene.
  • The target moving object can send the collected current scene image and current scene point cloud to the server. The server can determine the positions of obstacles on the running path of the target moving object according to the current scene image and the current scene point cloud, and transmit the positions to the target moving object, so that the target moving object can avoid the obstacles while moving.
  • S204: Perform image feature extraction on the current scene image to obtain initial image features, and perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features.
  • The image feature (Image Feature) is used to reflect the characteristics of an image.
  • The point cloud feature (Point Feature) is used to reflect the characteristics of a point cloud.
  • Image features have strong representation ability for slender objects such as pedestrians.
  • the point cloud feature can be represented in the form of a vector, and the point cloud feature can also be called a point cloud feature vector, and the point cloud feature vector can be, for example, (a1, b1, c1).
  • Point cloud features can also be called point features.
  • Point cloud features have lossless representation ability for point cloud information.
  • the image feature may be represented in the form of a vector, and the image feature may also be referred to as an image feature vector, and the image feature vector may be (a2, b2, c2), for example.
  • the initial image features refer to image features obtained by feature extraction from the current scene image.
  • the initial point cloud feature refers to the point cloud feature obtained by feature extraction from the current scene point cloud.
  • the server may obtain the object recognition model, and the object recognition model may include an image feature extraction layer and a point cloud feature extraction layer.
  • the server may input the current scene image into the image feature extraction layer, and the image feature extraction layer performs feature extraction on the current scene image, such as convolution, to obtain image features.
  • The server can obtain the initial image features according to the image features output by the image feature extraction layer; for example, the image features output by the image feature extraction layer can be used as the initial image features.
  • the server can input the current scene point cloud into the point cloud feature extraction layer, and the point cloud feature extraction layer performs feature extraction on the current scene point cloud, such as convolution, to obtain point cloud features.
  • The server can obtain the initial point cloud features according to the point cloud features output by the point cloud feature extraction layer; for example, the point cloud features output by the point cloud feature extraction layer can be used as the initial point cloud features.
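  • As an illustration of how two such extraction layers might look, here is a minimal sketch using PyTorch-style modules; the layer sizes, class names, and the PointNet-style shared MLP for points are assumptions for illustration, not details given in the application:

      import torch.nn as nn

      class ImageFeatureExtraction(nn.Module):
          # Convolutional image feature extraction layer (illustrative sizes).
          def __init__(self, out_channels: int = 64):
              super().__init__()
              self.conv = nn.Sequential(
                  nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, out_channels, kernel_size=3, padding=1), nn.ReLU(),
              )

          def forward(self, image):    # image: (B, 3, H, W)
              return self.conv(image)  # initial image features: (B, C, H, W)

      class PointFeatureExtraction(nn.Module):
          # Per-point feature extraction via a shared MLP (an assumption).
          def __init__(self, out_channels: int = 64):
              super().__init__()
              self.mlp = nn.Sequential(
                  nn.Linear(3, 32), nn.ReLU(),
                  nn.Linear(32, out_channels), nn.ReLU(),
              )

          def forward(self, points):   # points: (B, N, 3)
              return self.mlp(points)  # initial point cloud features: (B, N, C)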
  • the image feature extraction layer and the point cloud feature extraction layer are jointly trained.
  • During joint training, the server can input a scene image into the image feature extraction layer and a scene point cloud into the point cloud feature extraction layer, and obtain the predicted image features output by the image feature extraction layer and the predicted point cloud features output by the point cloud feature extraction layer. The server can also obtain the standard image features corresponding to the scene image and the standard point cloud features corresponding to the scene point cloud.
  • The standard image features refer to the real image features, and the standard point cloud features refer to the real point cloud features.
  • The first loss value is determined according to the predicted image features; for example, the first loss value is obtained according to the difference between the predicted image features and the standard image features.
  • The second loss value is determined according to the predicted point cloud features; for example, the second loss value is obtained according to the difference between the predicted point cloud features and the standard point cloud features.
  • the total loss value is determined according to the first loss value and the second loss value, and the total loss value may include the first loss value and the second loss value, for example, may be the result of adding the first loss value and the second loss value.
  • the server can use the total loss value to adjust the parameters of the image feature extraction layer and the parameters of the point cloud feature extraction layer to obtain the image feature extraction layer after training and the point cloud feature extraction layer after training.
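  • A minimal sketch of one joint training step, reusing the illustrative modules from the sketch above; adding the two loss values follows the example in the description, while mean-squared error as the difference measure is an assumption:

      import torch
      import torch.nn.functional as F

      image_net = ImageFeatureExtraction()   # from the sketch above
      point_net = PointFeatureExtraction()   # from the sketch above
      optimizer = torch.optim.Adam(
          list(image_net.parameters()) + list(point_net.parameters()), lr=1e-3)

      def joint_training_step(scene_image, scene_points,
                              std_image_feat, std_point_feat):
          pred_image_feat = image_net(scene_image)   # predicted image features
          pred_point_feat = point_net(scene_points)  # predicted point cloud features
          loss1 = F.mse_loss(pred_image_feat, std_image_feat)  # first loss value
          loss2 = F.mse_loss(pred_point_feat, std_point_feat)  # second loss value
          total = loss1 + loss2   # total loss value: first plus second
          optimizer.zero_grad()
          total.backward()
          optimizer.step()        # adjusts the parameters of both layers
          return total.item()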
  • S206: Acquire the target image position corresponding to the current scene image, and fuse the initial image features based on the point cloud features, among the initial point cloud features, that correspond to the target image position, to obtain the target image features.
  • the image position refers to the position of the image in the image coordinate system, and may include the corresponding positions of each pixel in the image in the image coordinate system.
  • the image coordinate system refers to the coordinate system adopted by the image acquired by the image acquisition device, and the coordinates of each pixel in the image can be obtained according to the image coordinate system.
  • the target image position refers to the position of each pixel in the current scene image in the image coordinate system.
  • the image position may be determined according to the parameters of the image acquisition device, for example, the parameters of the image acquisition device may be camera parameters, and the camera parameters may include external parameters of the camera and internal parameters of the camera.
  • the image coordinate system is a two-dimensional coordinate system, and the coordinates in the image coordinate system include abscissa and ordinate.
  • the point cloud feature corresponding to the target image position refers to the point cloud feature at the position in the point cloud coordinate system corresponding to the target image position in the initial point cloud feature.
  • the position in the point cloud coordinate system corresponding to the target image position may or may not overlap with the position corresponding to the initial point cloud feature.
  • the server can fuse the point cloud features corresponding to the overlapping positions with the initial image features to obtain the target image features.
  • For example, if the target image position is position A, the position in the point cloud coordinate system corresponding to position A is position B, the position of the initial point cloud features in the point cloud coordinate system is position C, and the overlapping part of position C and position B is position D, then the point cloud features corresponding to position D can be fused into the initial image features.
  • the fusion process refers to establishing an association relationship between different features at the same position in the same coordinate system, for example, establishing an association relationship between the image feature a corresponding to the position A and the point cloud feature b.
  • the fusion process may also be to obtain fusion features including the different features according to different features at the same position in the same coordinate system, for example, according to the image feature a corresponding to the position A and the point cloud feature b to obtain the fusion feature including a and b.
  • the fusion features can be represented in vector form.
  • The server may obtain the position in the point cloud coordinate system corresponding to the target image position, and fuse the initial image features according to the point cloud features, among the initial point cloud features, at the position in the point cloud coordinate system corresponding to the target image position, to obtain the target image features.
  • The object recognition model may also include an image spatial-domain fusion layer. The server may input the initial point cloud features and the initial image features into the image spatial-domain fusion layer; the layer may determine the coincident positions between the positions of the initial point cloud features and the positions of the initial image features, extract the point cloud features at the coincident positions from the initial point cloud features, and fuse them into the initial image features to obtain the target image features.
  • S208: Acquire the target point cloud position corresponding to the current scene point cloud, and fuse the initial point cloud features based on the image features, among the initial image features, that correspond to the target point cloud position, to obtain the target point cloud features.
  • The point cloud position refers to the position of the point cloud in the point cloud coordinate system, and may include the positions, in the point cloud coordinate system, of each three-dimensional data point in the point cloud.
  • the coordinates corresponding to each 3D data point in the point cloud can be obtained according to the point cloud coordinate system.
  • the target point cloud position refers to the point cloud position corresponding to each 3D data point in the current scene point cloud.
  • the position of the point cloud may be determined according to the parameters of the point cloud collection device, and the parameters of the point cloud collection device may be, for example, the parameters of the laser radar.
  • the point cloud coordinate system is a three-dimensional coordinate system, and the coordinates in the point cloud coordinate system may include X coordinate, Y coordinate and Z coordinate. Of course, the point cloud coordinate system can also be other types of three-dimensional coordinate systems, which are not limited here.
  • the image feature corresponding to the target point cloud position refers to the image feature at the position in the image coordinate system corresponding to the target point cloud position in the initial image feature.
  • the position in the image coordinate system corresponding to the target point cloud position may or may not overlap with the position corresponding to the initial image feature.
  • the server can fuse the image features corresponding to the overlapping positions with the initial point cloud features to obtain the target point cloud features.
  • The server may obtain the position in the image coordinate system corresponding to the target point cloud position, and fuse the initial point cloud features according to the image features, among the initial image features, at the position in the image coordinate system corresponding to the target point cloud position, to obtain the target point cloud features.
  • The object recognition model may also include a point cloud spatial-domain fusion layer. The server may input the initial point cloud features and the initial image features into the point cloud spatial-domain fusion layer; the layer may determine the coincident positions between the positions of the initial point cloud features and the positions of the initial image features, extract the image features at the coincident positions from the initial image features, and fuse them into the initial point cloud features to obtain the target point cloud features.
  • S210: Determine the object position corresponding to the scene object based on the target image features and the target point cloud features.
  • A scene object refers to an object in the scene where the target moving object is located. A scene object may be a living object, such as a person or an animal, or an inanimate object, such as a vehicle, a tree, or a stone.
  • the object position may include at least one of the position of the scene object in the current scene image or the position of the scene object in the current scene point cloud.
  • the scene objects in the current scene image and the scene objects in the current scene point cloud may be the same, or there may be differences.
  • the server may perform calculation according to the position of the target image feature and the position of the target point cloud feature to obtain the position of each scene object.
  • the server may perform time series fusion of target image features obtained from different video frames to obtain fused target image features, and perform image task learning according to the fused target image features.
  • Temporal fusion refers to concatenating image features of different frames, or concatenating point cloud features of different frames, or concatenating voxel features of different frames.
  • the server can fuse the target point cloud features obtained from different scene point clouds in time series, obtain the fused target point cloud features, and perform point cloud task learning according to the fused target point cloud features.
  • The server can further fuse the fused target image features and the fused target point cloud features to obtain secondary-fused target image features and secondary-fused target point cloud features, use the secondary-fused target image features to perform image task learning, and use the secondary-fused target point cloud features to perform point cloud task learning.
  • the target moving object is controlled to move based on the position corresponding to the scene object.
  • The server can transmit the position corresponding to the scene object to the target moving object. The target moving object can determine a movement route that avoids the scene object according to the position corresponding to the scene object and move according to that route, thereby avoiding the scene object and ensuring safe movement.
  • In the above object recognition method, the current scene image and the current scene point cloud corresponding to the target moving object are obtained; image feature extraction is performed on the current scene image to obtain initial image features, and point cloud feature extraction is performed on the current scene point cloud to obtain initial point cloud features; the target image position corresponding to the current scene image is obtained, and the initial image features are fused based on the point cloud features, among the initial point cloud features, that correspond to the target image position, to obtain target image features; the target point cloud position corresponding to the current scene point cloud is obtained, and the initial point cloud features are fused based on the image features, among the initial image features, that correspond to the target point cloud position, to obtain target point cloud features; the object position corresponding to the scene object is determined based on the target image features and the target point cloud features; and the target moving object is controlled to move based on the position corresponding to the scene object. The position of the scene object can thus be obtained accurately, so that the target moving object can avoid the scene object and move safely.
  • In one embodiment, obtaining the target image position corresponding to the current scene image and fusing the initial image features based on the point cloud features, among the initial point cloud features, that correspond to the target image position to obtain the target image features includes: converting the target point cloud position into a position in the image coordinate system according to the coordinate conversion relationship between the point cloud coordinate system and the image coordinate system, to obtain a first conversion position; and obtaining a first coincident position between the first conversion position and the target image position, and fusing the point cloud features, among the initial point cloud features, that correspond to the first coincident position into the image features, among the initial image features, that correspond to the first coincident position, to obtain the target image features.
  • the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system refers to the transformation relationship between the coordinates in the point cloud coordinate system and the coordinates in the image coordinate system.
  • the object corresponding to the coordinates before transformation in the point cloud coordinate system is the same as the object corresponding to the coordinates after transformation in the image coordinate system.
  • the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system is referred to as the first transformation relationship.
  • the coordinates of the position represented by the coordinates in the point cloud coordinate system in the image coordinate system can be determined through the first transformation relationship, that is, the image position corresponding to the target point cloud location in the image coordinate system can be determined through the first transformation relationship.
  • (x1, y1, z1) in the point cloud coordinate system can be converted into coordinates (x2, y2) in the image coordinate system through the first conversion relationship.
  • converting coordinates in one coordinate system to coordinates in another coordinate system can be called the process of physical space projection.
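  • A minimal sketch of this physical-space projection under a pinhole camera model; the calibration matrices are assumed inputs (for example, from lidar-camera calibration), not parameters given in the application:

      import numpy as np

      def project_points_to_image(points_3d, extrinsic, intrinsic):
          """Convert (x1, y1, z1) in the point cloud coordinate system into
          (x2, y2) in the image coordinate system (the first conversion
          relationship).

          points_3d: (N, 3) array in the point cloud coordinate system.
          extrinsic: (4, 4) lidar-to-camera transform (assumed calibration).
          intrinsic: (3, 3) camera intrinsic matrix.
          """
          # Homogeneous coordinates, then into the camera coordinate system.
          ones = np.ones((points_3d.shape[0], 1))
          pts_cam = (extrinsic @ np.hstack([points_3d, ones]).T)[:3]  # (3, N)
          # Perspective projection onto the image plane.
          pts_img = intrinsic @ pts_cam                               # (3, N)
          uv = pts_img[:2] / pts_img[2]                               # divide by depth
          return uv.T, pts_cam[2]  # pixel coordinates (N, 2) and depths (N,)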
  • the first transformation position refers to the position corresponding to the target point cloud position in the image coordinate system, and the first transformation position is the position in the two-dimensional coordinate system.
  • the first conversion position may include the two-dimensional coordinates in the image coordinate system of the three-dimensional coordinates of all or part of the three-dimensional data points corresponding to the target point cloud position.
  • the first coincident position refers to a position where the first conversion position coincides with the target image position.
  • the point cloud feature corresponding to the first coincidence position refers to the point cloud feature corresponding to the position of the first coincidence position in the point cloud coordinate system.
  • For example, if the first conversion position includes (x1, y1), (x2, y2) and (x3, y3), and the target image position includes (x2, y2), (x3, y3) and (x4, y4), then the first coincident position includes (x2, y2) and (x3, y3). If the position of (x2, y2) in the point cloud coordinate system is (x1, y1, z1), and the position of (x3, y3) in the point cloud coordinate system is (x2, y2, z2), then the point cloud features corresponding to the first coincident position include the point cloud features corresponding to (x1, y1, z1) and the point cloud features corresponding to (x2, y2, z2).
  • The server may splice (concatenate) the point cloud features corresponding to the first coincident position with the image features corresponding to the first coincident position to obtain the target image features.
  • For example, if the image feature corresponding to the first coincident position is vector A and the point cloud feature corresponding to the first coincident position is vector B, the server can splice vector B onto vector A to obtain a spliced vector.
  • The target image features can be obtained from the spliced vector; for example, the spliced vector can be used as the target image features, or the spliced vector can be processed to obtain the target image features.
  • Alternatively, the server may convert the target image position into a position in the point cloud coordinate system according to the coordinate conversion relationship between the image coordinate system and the point cloud coordinate system to obtain the point cloud position corresponding to the target image position, extract the corresponding point cloud features from the initial point cloud features according to that point cloud position, and fuse them into the initial image features to obtain the target image features.
  • In this embodiment, the target point cloud position is converted into a position in the image coordinate system to obtain the first conversion position; the first coincident position between the first conversion position and the target image position is obtained; and the point cloud features, among the initial point cloud features, that correspond to the first coincident position are fused into the image features, among the initial image features, that correspond to the first coincident position, to obtain the target image features. The target image features thus include both image features and point cloud features, which improves the richness of the features in the target image features and the representation ability of the target image features.
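  • A minimal sketch of this image-side fusion, building on the projection helper sketched earlier; rounding projected points to integer pixel positions as the coincidence test, and channel-wise concatenation as the fusion, are assumed implementation details:

      import numpy as np

      def fuse_point_features_into_image(image_feat, points_3d, point_feat,
                                         extrinsic, intrinsic):
          """image_feat: (C_img, H, W) initial image features.
          point_feat: (N, C_pt) initial point cloud features.
          Returns target image features of shape (C_img + C_pt, H, W)."""
          C_img, H, W = image_feat.shape
          N, C_pt = point_feat.shape
          fused_extra = np.zeros((C_pt, H, W), dtype=image_feat.dtype)

          uv, depth = project_points_to_image(points_3d, extrinsic, intrinsic)
          cols = np.round(uv[:, 0]).astype(int)  # first conversion position (u)
          rows = np.round(uv[:, 1]).astype(int)  # first conversion position (v)
          # First coincident positions: projected points that land inside the
          # image (and in front of the camera).
          valid = (depth > 0) & (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)
          fused_extra[:, rows[valid], cols[valid]] = point_feat[valid].T

          # Splice the point cloud features onto the image features channel-wise.
          return np.concatenate([image_feat, fused_extra], axis=0)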
  • In one embodiment, obtaining the target point cloud position corresponding to the current scene point cloud and fusing the initial point cloud features based on the image features, among the initial image features, that correspond to the target point cloud position to obtain the target point cloud features includes: converting the target image position into a position in the point cloud coordinate system according to the coordinate conversion relationship between the image coordinate system and the point cloud coordinate system, to obtain a second conversion position; and obtaining a second coincident position between the second conversion position and the target point cloud position, and fusing the image features, among the initial image features, that correspond to the second coincident position into the point cloud features, among the initial point cloud features, that correspond to the second coincident position, to obtain the target point cloud features.
  • the coordinate transformation relationship between the image coordinate system and the point cloud coordinate system refers to the transformation relationship between the coordinates in the image coordinate system and the coordinates in the point cloud coordinate system.
  • the object corresponding to the coordinates before transformation in the image coordinate system is the same as the object corresponding to the coordinates after transformation in the point cloud coordinate system.
  • In the following description, the coordinate transformation relationship between the image coordinate system and the point cloud coordinate system is referred to as the second transformation relationship.
  • the coordinates of the position represented by the coordinates in the image coordinate system in the point cloud coordinate system can be determined through the second conversion relationship.
  • the second conversion position refers to the position corresponding to the target image position in the point cloud coordinate system, and the second conversion position is the position in the three-dimensional coordinate system.
  • the second conversion position may include the three-dimensional coordinates of all or part of the two-dimensional coordinates corresponding to the target image position in the point cloud coordinate system.
  • the second coincident position refers to the position where the second transformation position coincides with the target point cloud position.
  • the image feature corresponding to the second coincident position refers to the image feature corresponding to the two-dimensional coordinates in the image coordinate system corresponding to the second coincident position.
  • the target point cloud feature is a feature obtained by fusing the image feature corresponding to the second coincident position into the point cloud feature corresponding to the second coincident position in the initial point cloud feature.
  • the server may perform feature fusion between the image feature corresponding to the second coincident position and the point cloud feature corresponding to the second coincident position to obtain the target point cloud feature.
  • Feature fusion may include one or more of arithmetic operations, combination or concatenation of features. Arithmetic operations may include one or more of addition, subtraction, multiplication or division.
  • the server may obtain the target point cloud feature by splicing the image feature corresponding to the second coincident position to the point cloud feature corresponding to the second coincident position.
  • For example, if the point cloud feature corresponding to the second coincident position is vector C and the image feature corresponding to the second coincident position is vector D, the server can splice vector D onto vector C to obtain a spliced vector, and obtain the target point cloud features from the spliced vector.
  • For example, the spliced vector can be used as the target point cloud features, or the spliced vector can be processed to obtain the target point cloud features.
  • Alternatively, the server may convert the target point cloud position into a position in the image coordinate system according to the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system to obtain the image position corresponding to the target point cloud position, extract the corresponding image features from the initial image features according to that image position, and fuse them into the initial point cloud features to obtain the target point cloud features.
  • The image features at exactly the same position as that image position can be extracted from the initial image features, or the image features at a position whose difference from that image position is smaller than a position difference threshold can be extracted from the initial image features, and fused into the initial point cloud features to obtain the target point cloud features.
  • the position difference threshold can be set as required, or can be preset.
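  • A small sketch of this threshold-based matching; using Euclidean pixel distance as the position difference is an assumption:

      import numpy as np

      def match_within_threshold(projected_uv, image_positions, threshold):
          """For each projected point position, find the index of the nearest
          image feature position, or -1 if the difference exceeds the
          position difference threshold.

          projected_uv: (N, 2) projected point positions in the image plane.
          image_positions: (M, 2) positions where initial image features live.
          """
          matches = np.full(projected_uv.shape[0], -1, dtype=int)
          for i, uv in enumerate(projected_uv):
              d = np.linalg.norm(image_positions - uv, axis=1)  # pixel distances
              j = int(np.argmin(d))
              if d[j] < threshold:  # position difference below the threshold
                  matches[i] = j
          return matches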
  • In this embodiment, the target image position is converted into a position in the point cloud coordinate system to obtain the second conversion position; the second coincident position between the second conversion position and the target point cloud position is obtained; and the image features, among the initial image features, that correspond to the second coincident position are fused into the point cloud features, among the initial point cloud features, that correspond to the second coincident position, to obtain the target point cloud features. The target point cloud features thus include both image features and point cloud features, which improves the feature richness of the target point cloud features and their representation ability.
  • In one embodiment, obtaining the second coincident position between the second conversion position and the target point cloud position and fusing the image features, among the initial image features, that correspond to the second coincident position into the point cloud features, among the initial point cloud features, that correspond to the second coincident position to obtain the target point cloud features includes: voxelizing the current scene point cloud to obtain a voxelization result; performing voxel feature extraction according to the voxelization result to obtain initial voxel features; fusing the image features, among the initial image features, that correspond to the second coincident position into the point cloud features, among the initial point cloud features, that correspond to the second coincident position, to obtain intermediate point cloud features; and performing steps S308 and S310 below.
  • A voxel is an abbreviation of volume element (Volume Pixel).
  • Voxelization refers to dividing a point cloud into multiple voxels according to a given voxel size.
  • the dimensions of each voxel in the X, Y and Z axis directions may be, for example, w, h and e, respectively.
  • the voxels obtained by segmentation include empty voxels and non-empty voxels, empty voxels do not include points in the point cloud, and non-empty voxels include points in the point cloud.
  • the voxelization result may include at least one of the number of voxels obtained after voxelization, the position information of the voxels, or the size of the voxels.
  • a voxel feature is a feature used to represent a voxel.
  • Voxel features can accelerate the convergence of the network model and reduce its complexity.
  • The server can sample the same number of points from each voxel according to the number of points included in the voxel in the voxelization result to obtain the sampling points corresponding to the voxel, and perform feature extraction according to the sampling points corresponding to the voxel to obtain the initial voxel features corresponding to the voxel.
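  • A minimal sketch of voxelization plus fixed-count point sampling as described above; the voxel size (w, h, e) and the sample count are illustrative assumptions:

      import numpy as np

      def voxelize(points_3d, voxel_size=(0.2, 0.2, 0.4), samples_per_voxel=32):
          """Divide a point cloud into voxels of size (w, h, e) and sample the
          same number of points from each non-empty voxel.

          Returns a dict mapping voxel index (ix, iy, iz) to an array of shape
          (samples_per_voxel, 3)."""
          rng = np.random.default_rng(0)
          indices = np.floor(points_3d / np.asarray(voxel_size)).astype(int)
          voxels = {}
          for idx in np.unique(indices, axis=0):
              in_voxel = points_3d[(indices == idx).all(axis=1)]  # non-empty voxel
              # Sample with replacement so every voxel yields the same count.
              picks = rng.choice(len(in_voxel), size=samples_per_voxel, replace=True)
              voxels[tuple(idx)] = in_voxel[picks]
          return voxels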
  • the voxel feature recognition model refers to a model that extracts voxel features.
  • the object recognition model further includes a voxel feature extraction layer, and the voxel feature extraction layer may be obtained by joint training with the image feature extraction layer and the point cloud feature extraction layer.
  • the server can input the scene point cloud into the voxel feature extraction layer, and obtain the voxel feature output by the voxel feature extraction layer.
  • the intermediate point cloud feature is a feature obtained by fusing the image feature corresponding to the second coincident position into the point cloud feature corresponding to the second coincident position in the initial point cloud feature.
  • S308: Obtain the target voxel position corresponding to the current scene point cloud, and convert the target voxel position into a position in the point cloud coordinate system according to the coordinate transformation relationship between the voxel coordinate system and the point cloud coordinate system, to obtain a third conversion position.
  • the voxel position refers to the position of the voxel in the voxel coordinate system.
  • the target voxel position refers to the position of the voxel corresponding to the current scene point cloud in the voxel coordinate system.
  • the target voxel position may include the respective positions of each voxel corresponding to the current scene point cloud in the voxel coordinate system.
  • the coordinates of the voxel can be obtained according to the voxel coordinate system.
  • the coordinate conversion relationship between the voxel coordinate system and the point cloud coordinate system refers to the conversion relationship between the coordinates in the voxel coordinate system and the coordinates in the point cloud coordinate system.
  • the voxel coordinate system is a three-dimensional coordinate system. In the following description, the coordinate conversion relationship between the voxel coordinate system and the point cloud coordinate system is referred to as the third conversion relationship.
  • the third transformation position refers to the corresponding position of the target voxel position in the point cloud coordinate system.
  • the third transformed position is the position in the point cloud coordinate system.
  • S310: Obtain a third coincident position between the third conversion position and the target voxel position, and fuse the voxel features, among the initial voxel features, that correspond to the third coincident position into the point cloud features, among the intermediate point cloud features, that correspond to the third conversion position, to obtain the target point cloud features.
  • the third coincident position refers to the coincidence position of the third conversion position and the target voxel position.
  • the voxel feature corresponding to the third coincident position refers to the voxel feature at the corresponding position of the third coincident position in the voxel coordinate system.
  • the server may perform feature fusion between the voxel feature corresponding to the third coincident position in the initial voxel feature and the point cloud feature corresponding to the third transformation position in the intermediate point cloud feature to obtain the target point cloud feature.
  • In this embodiment, the current scene point cloud is voxelized to obtain the voxelization result; voxel feature extraction is performed according to the voxelization result to obtain the initial voxel features; the second coincident position between the second conversion position and the target point cloud position is obtained; the image features, among the initial image features, that correspond to the second coincident position are fused into the point cloud features, among the initial point cloud features, that correspond to the second coincident position, to obtain the intermediate point cloud features; the target voxel position corresponding to the current scene point cloud is obtained; the target voxel position is converted into a position in the point cloud coordinate system to obtain the third conversion position; the third coincident position between the third conversion position and the target voxel position is obtained; and the voxel features, among the initial voxel features, that correspond to the third coincident position are fused into the point cloud features, among the intermediate point cloud features, that correspond to the third conversion position, to obtain the target point cloud features. Since the intermediate point cloud features include point cloud features and image features, the target point cloud features include image features, point cloud features, and voxel features, which improves the feature richness of the target point cloud features.
  • In one embodiment, the method further includes: voxelizing the current scene point cloud to obtain a voxelization result; performing voxel feature extraction according to the voxelization result to obtain initial voxel features; obtaining the target voxel position corresponding to the current scene point cloud; converting the target image position into a position in the voxel coordinate system according to the coordinate conversion relationship between the image coordinate system and the voxel coordinate system, to obtain a fourth conversion position; obtaining a fourth coincident position between the fourth conversion position and the target voxel position; and fusing the image features, among the initial image features, that correspond to the fourth coincident position into the voxel features, among the initial voxel features, that correspond to the fourth coincident position, to obtain target voxel features.
  • the coordinate conversion relationship between the image coordinate system and the voxel coordinate system refers to the conversion relationship of converting coordinates in the image coordinate system into coordinates in the voxel coordinate system.
  • the object corresponding to the coordinates before transformation in the image coordinate system is the same as the object corresponding to the coordinates after transformation in the voxel coordinate system.
  • In the following description, the coordinate conversion relationship between the image coordinate system and the voxel coordinate system is referred to as the fourth conversion relationship.
  • the coordinates of the position represented by the coordinates in the image coordinate system in the voxel coordinate system can be determined through the fourth conversion relationship.
  • the fourth transformation position refers to the position corresponding to the target image position in the voxel coordinate system, and the fourth transformation position is the position in the three-dimensional coordinate system.
  • the fourth conversion position may include the three-dimensional coordinates in the voxel coordinate system of all or part of the two-dimensional coordinates corresponding to the target image position.
  • the fourth coincident position refers to a position where the fourth transformation position coincides with the target voxel position.
  • the image feature corresponding to the fourth overlapping position refers to the image feature corresponding to the two-dimensional coordinates in the image coordinate system corresponding to the fourth overlapping position.
  • the target voxel feature is a feature obtained by fusing the image feature corresponding to the fourth coincident position into the voxel feature corresponding to the fourth coincident position in the initial voxel feature.
  • the server may perform feature fusion between the image feature corresponding to the fourth coincident position and the voxel feature corresponding to the fourth coincident position to obtain the target voxel feature.
  • Alternatively, the server may convert the target voxel position into a position in the image coordinate system according to the coordinate transformation relationship between the voxel coordinate system and the image coordinate system to obtain the image position corresponding to the target voxel position, extract the corresponding image features from the initial image features according to that image position, and fuse them into the initial voxel features to obtain the target voxel features.
  • For example, the center position of each voxel can be projected into the image coordinate system to obtain a center image position, and the image features, among the initial image features, at positions whose difference from the center image position is less than a difference threshold can be extracted and fused into the initial voxel features to obtain the target voxel features.
  • the difference threshold can be set as required or preset.
  • The object recognition model may further include a voxel spatial fusion layer. The server may input the image features and the voxel features into the voxel spatial fusion layer; the layer may determine the coincident positions between the positions of the image features and the positions of the voxel features, extract the image features at the coincident positions from the image features, and fuse them into the voxel features to obtain the target voxel features.
  • The object recognition model can also include a point and voxel fusion layer. The server can input the target voxel features and the intermediate point cloud features into the point and voxel fusion layer; the layer may determine the coincident positions between the positions of the target voxel features and the positions of the intermediate point cloud features, extract the voxel features at the coincident positions from the target voxel features, and fuse them into the intermediate point cloud features to obtain the target point cloud features.
  • the point and voxel fusion layer can also be referred to as a point cloud and voxel fusion layer.
  • In this embodiment, the current scene point cloud is voxelized to obtain the voxelization result; voxel feature extraction is performed according to the voxelization result to obtain the initial voxel features; the target voxel position corresponding to the current scene point cloud is obtained; the target image position is converted into a position in the voxel coordinate system to obtain the fourth conversion position; the fourth coincident position between the fourth conversion position and the target voxel position is obtained; and the image features, among the initial image features, that correspond to the fourth coincident position are fused into the voxel features, among the initial voxel features, that correspond to the fourth coincident position, to obtain the target voxel features. The target voxel features thus include both voxel features and image features, which improves the representation ability of the target voxel features and the richness of the features.
  • In one embodiment, determining the object position corresponding to the scene object based on the target image features and the target point cloud features includes: acquiring the associated scene image corresponding to the current scene image and the associated scene point cloud corresponding to the current scene point cloud; acquiring the associated image features corresponding to the associated scene image and the associated point cloud features corresponding to the associated scene point cloud; performing feature fusion on the target image features and the associated image features in chronological order to obtain target image time-series features; performing feature fusion on the target point cloud features and the associated point cloud features in chronological order to obtain target point cloud time-series features; and determining the object position corresponding to the scene object based on the target image time-series features and the target point cloud time-series features.
  • the associated scene image refers to an image associated with the current scene image.
  • the associated scene image may be a forward frame collected before the current moment or a backward frame collected later by the image capture device that obtained the current scene image.
  • The forward frame can be used directly as the associated scene image, or the current scene image and the forward frame can be checked for coincident detected objects: if a coincident detected object exists between the current scene image and the forward frame, the forward frame is used as the associated scene image of the current scene image. For example, if vehicle A exists in the current scene image and vehicle A also exists in the forward frame, the forward frame may be used as the associated scene image of the current scene image.
  • the current scene image and the associated scene image may be different video frames in the same video, for example, may be different video frames in the video captured by the image capturing device.
  • the associated scene image may be a video frame captured before or after the current scene image.
  • the method of obtaining the associated image features may refer to the obtaining method of the target image features.
  • the associated scene point cloud refers to the point cloud associated with the current scene point cloud.
  • the associated scene point cloud may be the scene point cloud collected before or after the current moment by the point cloud collection device that collected the current scene point cloud.
  • the method of obtaining the associated point cloud features can refer to the obtaining method of the target point cloud features.
  • The server may combine the target image features and the associated image features according to the chronological order of the associated scene image and the current scene image to obtain combined image features, in which features that are earlier in time are arranged before features that are later in time.
  • The server may obtain the target image time-series features according to the combined image features; for example, the combined image features may be used as the target image time-series features, or the combined image features may be processed to obtain the target image time-series features.
  • The server may combine the target point cloud features and the associated point cloud features according to the chronological order of the associated scene point cloud and the current scene point cloud to obtain combined point cloud features, in which the earlier point cloud features are arranged before the later ones. The server can then obtain the target point cloud time-series features according to the combined point cloud features; for example, the combined point cloud features can be used as the target point cloud time-series features, or the combined point cloud features can be processed to obtain the target point cloud time-series features.
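  • A minimal sketch of this time-series fusion by chronological concatenation; the per-frame feature shape and the concatenation axis are assumptions:

      import numpy as np

      def temporal_fusion(feats_by_time):
          """feats_by_time: list of (timestamp, features) pairs, where each
          features array has shape (N, C). Features that are earlier in time
          are arranged before features that are later in time, then joined."""
          ordered = [f for _, f in sorted(feats_by_time, key=lambda tf: tf[0])]
          return np.concatenate(ordered, axis=-1)  # combined time-series features

      # e.g. fusing associated (earlier) and target (current) image features:
      # target_image_ts = temporal_fusion([(t0, assoc_feat), (t1, target_feat)])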
  • The server may also obtain the associated voxel features corresponding to the associated scene point cloud, and perform feature fusion on the target voxel features and the associated voxel features in chronological order to obtain target voxel time-series features.
  • The server may then perform feature fusion among the target image time-series features, the target point cloud time-series features, and the target voxel time-series features to obtain secondary-fusion image features, secondary-fusion voxel features, and secondary-fusion point cloud features.
  • The feature fusion among the target image time-series features, the target point cloud time-series features, and the target voxel time-series features can refer to the feature fusion method among the initial image features, the initial point cloud features, and the initial voxel features.
  • The server can use the secondary-fusion image features, the secondary-fusion voxel features, and the secondary-fusion point cloud features to perform image task learning, voxel task learning, and point cloud task learning, respectively.
  • The image features may include position information of an object. The server may obtain the position of the object in the target image features as a first position and the position of the object in the associated image features as a second position, and determine the motion state of the object according to the first position and the second position. For example, whether the object has changed lanes or turned can be determined according to the relative relationship between the first position and the second position, and the movement speed of the object can be determined according to the difference between the first position and the second position.
  • Similarly, the point cloud features and the voxel features may also include position information of the object, so the point cloud features and the voxel features may likewise be used to determine the motion state of the object, as sketched below.
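  • A small sketch of such a motion-state estimate from two frames; treating the positions as planar coordinates and dividing by the frame interval is an assumed concretization:

      import numpy as np

      def motion_state(first_position, second_position, dt):
          """Estimate speed and heading from the object's position in the
          target features (first position) and in the associated features
          (second position), dt seconds apart."""
          delta = np.asarray(second_position) - np.asarray(first_position)
          speed = float(np.linalg.norm(delta)) / dt                    # movement speed
          heading = float(np.degrees(np.arctan2(delta[1], delta[0])))  # direction
          return speed, heading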
  • In this embodiment, the associated scene image corresponding to the current scene image and the associated scene point cloud corresponding to the current scene point cloud are obtained; the associated image features corresponding to the associated scene image and the associated point cloud features corresponding to the associated scene point cloud are obtained; the target image features and the associated image features are fused in chronological order to obtain the target image time-series features; the target point cloud features and the associated point cloud features are fused in chronological order to obtain the target point cloud time-series features; and the object position corresponding to the scene object is determined based on the target image time-series features and the target point cloud time-series features. The target image time-series features thus include image features of different scene images, and the target point cloud time-series features include point cloud features of different scene point clouds, which improves the accuracy of the scene object position.
  • determining the object position corresponding to the scene object based on the target image feature and the target point cloud feature includes: determining a combined position between the target image feature and the target point cloud feature to obtain the target combined position; and taking the target combined position as the object position corresponding to the scene object.
  • the combined position may be a combination of the position corresponding to the target image feature and the position corresponding to the target point cloud feature.
  • the server can represent the position corresponding to the target image feature and the position corresponding to the target point cloud feature with coordinates in the same coordinate system; for example, using coordinates in the image coordinate system, it can obtain the first feature position corresponding to the target image feature and the second feature position corresponding to the target point cloud feature, and calculate the combination of the first feature position and the second feature position to obtain the object position corresponding to the scene object.
  • the combined position between the target image feature and the target point cloud feature is determined to obtain the target combined position, and the target combined position is used as the object position corresponding to the scene object, which improves the accuracy of the object position; a sketch of such a combination follows.
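A minimal sketch, assuming "combining" means taking the union of the two position sets once both are expressed in the image coordinate system; the application does not fix the exact combination rule, so this is one plausible reading.

```python
import numpy as np

def combine_positions(first_feature_pos, second_feature_pos):
    """Combine the position sets from the two modalities, expressed in
    the same (image) coordinate system; the union serves as the object
    position corresponding to the scene object."""
    combined = np.vstack([first_feature_pos, second_feature_pos])
    return np.unique(combined, axis=0)

first_feature_pos = np.array([[120, 80], [121, 80]])   # from target image feature
second_feature_pos = np.array([[121, 80], [122, 81]])  # from projected point cloud feature
object_position = combine_positions(first_feature_pos, second_feature_pos)
```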
  • the server may perform task learning using at least one of target image features, target point cloud features, or target voxel features.
  • Tasks can include low-level tasks and high-level tasks; low-level tasks can include point-level semantic segmentation and scene flow estimation, voxel-level semantic segmentation and scene flow estimation, and pixel-level semantic segmentation and scene flow estimation.
  • High-level tasks can include object detection, scene recognition, and instance segmentation.
  • an object recognition system mainly includes a first multi-sensor feature extraction (Multi-Sensor Feature Extraction) module, a temporal fusion (Temporal Fusion) module, a second multi-sensor feature extraction module, an Image View Tasks learning module, a Voxel Tasks learning module and a Point Tasks learning module.
  • each module can be implemented by one or more neural network models.
  • the multi-sensor feature extraction module supports the fusion method of a single sensor and multiple sensors, that is, the input can be the data collected by a single sensor, or the data collected by multiple sensors separately.
  • the sensor may be, for example, at least one of an image acquisition device or a point cloud acquisition device.
  • the multi-sensor feature extraction module includes an Image Feature Extraction module, a Point Feature Extraction module, a Voxel Feature Extraction module, an Image Spatial Fusion module, a Point Cloud Spatial Fusion module, a Voxel Spatial Fusion module, and a Point-Voxel Fusion module.
  • the image spatial domain fusion module is used to fuse point cloud features into image features
  • the point cloud spatial domain fusion module is used to fuse image features into point features
  • the voxel spatial domain fusion module is used to fuse image features into voxel features.
  • the point-voxel fusion module is used to fuse point features into voxel features and voxel features into point features.
  • the temporal fusion module is used to fuse the features obtained from different frames, that is, to concatenate them along the feature dimension.
  • the temporal fusion module thus fuses the preceding and following temporal information of the features.
  • the image features can be concatenated in the pixel dimension, or the two features can be correlated.
  • for point cloud features, similarly to FlowNet3D, feature extraction can be performed over each point's neighborhood, analogous to a correlation operation; a sketch of such neighborhood aggregation follows. Operations on voxel features are similar to those on image features, except that voxel features deal with three-dimensional data.
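The following sketch shows a simplified stand-in for a FlowNet3D-style neighborhood feature operation: for each point, the features of all points within a radius are aggregated by max-pooling. The radius, pooling choice, and shapes are illustrative assumptions.

```python
import numpy as np

def group_point_features(points, feats, radius=1.0):
    """For each point, max-pool the features of all points within
    `radius` of it, a crude neighborhood feature extraction."""
    n = points.shape[0]
    out = np.empty_like(feats)
    for i in range(n):
        d = np.linalg.norm(points - points[i], axis=1)
        neighbors = feats[d <= radius]     # always includes the point itself
        out[i] = neighbors.max(axis=0)     # max-pool over the neighborhood
    return out

pts = np.random.rand(100, 3) * 10   # N x 3 point coordinates
f = np.random.rand(100, 16)         # N x C per-point features
fused = group_point_features(pts, f)
```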
  • multi-sensor multi-task fusion can be performed by an object recognition system, which mainly includes the following steps (a sketch of the overall flow follows the steps):
  • Step 1: input the images and point clouds of consecutive frames;
  • Step 2: input the image and point cloud at each moment into the multi-sensor feature extraction module;
  • Step 3: the multi-sensor feature extraction module outputs the image features, point features and voxel features at each moment respectively;
  • Step 4: the image features, point features and voxel features output by the multi-sensor feature extraction module are each fused in time series, yielding three time series features: image time series features, point time series features and voxel time series features;
  • Step 5: input the three time series features obtained in Step 4 into the multi-sensor feature extraction module and perform feature fusion again to obtain the final image features, final point features and final voxel features;
  • Step 6: based on the final image feature (Final ImageView Feature), the final point feature (Final Point Feature) and the final voxel feature (Final Voxel Feature), perform task learning at the image level, point level and voxel level.
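A hypothetical end-to-end sketch of steps 1-6. Every callable here (`extract`, `temporal_fuse`, `refuse`, and the task heads) is a placeholder for a neural network module; the signatures are assumptions made for illustration.

```python
def run_pipeline(frames, extract, temporal_fuse, refuse, heads):
    """frames: list of (image, point_cloud) pairs from consecutive moments.
    extract(image, pc) -> (image_feat, point_feat, voxel_feat)      # steps 2-3
    temporal_fuse(list_of_feats) -> one time series feature         # step 4
    refuse(img_ts, pt_ts, vox_ts) -> three final fused features     # step 5
    heads: dict of task heads for image/point/voxel task learning   # step 6
    """
    per_moment = [extract(img, pc) for img, pc in frames]   # step 1 input
    img_ts = temporal_fuse([m[0] for m in per_moment])
    pt_ts = temporal_fuse([m[1] for m in per_moment])
    vox_ts = temporal_fuse([m[2] for m in per_moment])
    final_img, final_pt, final_vox = refuse(img_ts, pt_ts, vox_ts)
    return (heads["image"](final_img),
            heads["point"](final_pt),
            heads["voxel"](final_vox))
```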
  • the multi-sensor feature extraction module can reselect its feature input according to which sensors are valid, that is, it can select the valid sensors.
  • the data collected by the valid sensors is used as the input data of the multi-sensor feature extraction module; for example, if the camera fails, the data collected by the lidar can still be used for the point tasks and voxel tasks. A camera failure may be a malfunction of the camera; a valid sensor is a properly functioning sensor. A sketch of this selection follows.
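A minimal sketch of the valid-sensor selection, assuming each sensor reports a validity flag alongside its data; the dictionary layout is an illustrative assumption.

```python
def select_inputs(sensors):
    """Keep only the data from sensors that are currently functioning.

    sensors: dict mapping sensor name to (is_valid, data). If the camera
    has failed, the lidar data alone still feeds the point and voxel tasks.
    """
    return {name: data for name, (is_valid, data) in sensors.items() if is_valid}

inputs = select_inputs({
    "camera": (False, None),          # camera malfunction
    "lidar": (True, "point_cloud"),   # placeholder payload
})
# inputs == {"lidar": "point_cloud"} -> only point/voxel tasks are learned
```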
  • the effectiveness of task learning is improved because the tasks cover the range from low level to high level.
  • all the tasks can be trained jointly to improve the performance of the target task.
  • in deep learning, inference refers to applying the capability learned during training to new data.
  • the inference phase can be understood as the phase in which the trained model is used.
  • the object recognition system and object recognition method proposed above can be applied to autonomous driving perception algorithms.
  • tasks such as object detection, semantic segmentation, and scene flow estimation can be achieved.
  • the results of scene flow estimation and semantic segmentation can be used as clues for non-deep learning object detection methods based on point clouds, such as the cost term of clustering in cluster-based object detection.
  • although the steps in the flowcharts of FIGS. 2-4 are shown in sequence according to the arrows, these steps are not necessarily executed in the sequence shown by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily executed or completed at the same time, but may be executed at different times, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
  • an object recognition apparatus includes: a current scene image acquisition module 502, an initial point cloud feature obtaining module 504, a target image feature obtaining module 506, a target point cloud feature obtaining module 508, a position determination module 510 and a motion control module 512, wherein:
  • the current scene image acquisition module 502 is configured to acquire the current scene image and the current scene point cloud corresponding to the target moving object.
  • the initial point cloud feature obtaining module 504 is configured to perform image feature extraction on the current scene image to obtain initial image features, and perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features.
  • the target image feature obtaining module 506 is used to obtain the target image position corresponding to the current scene image, and based on the point cloud features corresponding to the target image position among the initial point cloud features, fuse the initial image features to obtain the target image features.
  • the target point cloud feature obtaining module 508 is used to obtain the target point cloud position corresponding to the current scene point cloud, and based on the image features corresponding to the target point cloud position among the initial image features, fuse the initial point cloud features to obtain the target point cloud features.
  • the position determination module 510 is configured to determine the object position corresponding to the scene object based on the target image feature and the target point cloud feature.
  • the motion control module 512 is configured to control the target moving object to move based on the position corresponding to the scene object.
  • the target image feature obtaining module 506 includes:
  • the first conversion position obtaining unit is used for converting the target point cloud position into the position in the image coordinate system according to the coordinate conversion relationship between the point cloud coordinate system and the image coordinate system, so as to obtain the first conversion position.
  • the target image feature obtaining unit is used to obtain the first coincidence position of the first conversion position and the target image position, and fuse the point cloud feature corresponding to the first coincidence position in the initial point cloud feature into the image feature corresponding to the first coincidence position in the initial image feature, to obtain the target image feature.
  • the target point cloud feature obtaining module 508 includes:
  • the second conversion position obtaining unit is used for converting the target image position into a position in the point cloud coordinate system according to the coordinate conversion relationship between the image coordinate system and the point cloud coordinate system to obtain the second conversion position.
  • the target point cloud feature obtaining unit is used to obtain the second coincidence position between the second conversion position and the target point cloud position, and fuse the image feature corresponding to the second coincidence position in the initial image feature into the point cloud feature corresponding to the second coincidence position in the initial point cloud feature, to obtain the target point cloud feature.
  • the target point cloud feature obtaining unit is further configured to: voxelize the current scene point cloud to obtain a voxelization result; perform voxel feature extraction according to the voxelization result to obtain an initial voxel feature; obtain the second coincidence position between the second conversion position and the target point cloud position, and fuse the image features corresponding to the second coincidence position in the initial image features into the point cloud features corresponding to the second coincidence position in the initial point cloud features, to obtain an intermediate point cloud feature;
  • obtain the target voxel position corresponding to the current scene point cloud, and convert the target voxel position into a position in the point cloud coordinate system according to the coordinate conversion relationship between the voxel coordinate system and the point cloud coordinate system, to obtain a third conversion position; and obtain the third coincidence position of the third conversion position and the target point cloud position, and fuse the voxel features corresponding to the third coincidence position in the initial voxel features into the point cloud features corresponding to the third coincidence position in the intermediate point cloud features, to obtain the target point cloud feature.
  • the apparatus further includes:
  • the voxelization result obtaining module is used to voxelize the point cloud of the current scene to obtain the voxelization result.
  • the initial voxel feature obtaining module is used to extract the voxel feature according to the voxelization result to obtain the initial voxel feature.
  • the fourth conversion position obtaining module is used to obtain the target voxel position corresponding to the current scene point cloud, and convert the target image position into a position in the voxel coordinate system according to the coordinate conversion relationship between the image coordinate system and the voxel coordinate system, to obtain the fourth conversion position.
  • the target voxel feature obtaining module is used to obtain the fourth coincidence position between the fourth conversion position and the target voxel position, and fuse the image features corresponding to the fourth coincidence position in the initial image features into the voxel features corresponding to the fourth coincidence position in the initial voxel features, to obtain the target voxel features; a sketch of a simple voxelization step follows.
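For the voxelization step referenced above, here is a minimal sketch using a common simple scheme: each 3-D point is binned into a grid cell and the points in each occupied cell are averaged. The voxel size and the centroid-as-feature choice are assumptions for illustration.

```python
import numpy as np

def voxelize(points, voxel_size=0.2):
    """Assign each 3-D point to a voxel grid cell and average the points
    in each occupied cell."""
    indices = np.floor(points / voxel_size).astype(np.int64)  # cell index per point
    voxels = {}
    for idx, pt in zip(map(tuple, indices), points):
        voxels.setdefault(idx, []).append(pt)
    # One centroid (a crude initial voxel feature) per occupied cell.
    return {idx: np.mean(pts, axis=0) for idx, pts in voxels.items()}

cloud = np.random.rand(1000, 3) * 5.0    # current scene point cloud
voxel_result = voxelize(cloud)
```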
  • the location determination module 510 includes:
  • the associated scene image acquisition unit is configured to acquire the associated scene image corresponding to the current scene image and the associated scene point cloud corresponding to the current scene point cloud.
  • the associated image feature acquisition unit is configured to acquire associated image features corresponding to the associated scene images and associated point cloud features corresponding to the associated scene point clouds.
  • the target image time sequence feature obtaining unit is used to perform feature fusion on the target image feature and the associated image feature according to the time sequence to obtain the target image time sequence feature.
  • the target point cloud time sequence feature obtaining unit is used to perform feature fusion on the target point cloud feature and the associated point cloud feature according to the time sequence to obtain the target point cloud time sequence feature.
  • the position determination unit is used for determining the position of the object corresponding to the scene object based on the time sequence feature of the target image and the time sequence feature of the target point cloud.
  • the location determination module 510 includes:
  • the target combined position obtaining unit is used to determine the combined position between the target image feature and the target point cloud feature to obtain the target combined position.
  • the object position obtaining unit is used for taking the target combined position as the object position corresponding to the scene object.
  • Each module in the above-mentioned object recognition device may be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6 .
  • the computer device includes a processor, memory, a network interface, and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store data such as the current scene image, the current scene point cloud, point cloud features, image features, and voxel features.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions when executed by a processor, implement a method of object recognition.
  • FIG. 6 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • a computer device includes a memory and one or more processors, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processors, cause the one or more processors to perform the steps of the above object identification method.
  • One or more computer storage media storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above object identification method.
  • the computer storage medium is a readable storage medium, and the readable storage medium may be non-volatile or volatile.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

An object identification method, comprising: acquiring a current scene image and a current scene point cloud which correspond to a target movement object (S202); performing image feature extraction on the current scene image to obtain initial image features, and performing point cloud feature extraction on the current scene point cloud to obtain initial point cloud features (S204); acquiring a target image position corresponding to the current scene image, and performing fusion processing on the initial image features on the basis of the point cloud feature, corresponding to the target image position, among the initial point cloud features, so as to obtain a target image feature (S206); acquiring a target point cloud position corresponding to the current scene point cloud, and performing fusion processing on the initial point cloud features on the basis of the image feature, corresponding to the target point cloud position, among the initial image features, so as to obtain a target point cloud feature (S208); determining, on the basis of the target image feature and the target point cloud feature, an object position corresponding to a scene object (S210); and controlling, on the basis of the position corresponding to the scene object, the target movement object to move (S212).

Description

Object recognition method, apparatus, computer equipment and storage medium

Technical Field

The present application relates to an object recognition method, apparatus, computer equipment and storage medium.

Background Art
With the development of artificial intelligence, self-driving cars have appeared. Self-driving cars are intelligent cars that realize unmanned driving through computer systems; relying on artificial intelligence, visual computing, radar, monitoring devices and the Global Positioning System working in concert, the computer system automatically and safely controls the driving of the car without active human operation. While an autonomous vehicle is driving, it is necessary to detect obstacles along the way and avoid them in time.
However, the inventor realizes that current methods for identifying obstacles cannot always identify obstacles accurately, resulting in a low obstacle avoidance capability of the autonomous vehicle and thus low safety of the autonomous vehicle.
SUMMARY OF THE INVENTION

According to various embodiments disclosed in the present application, an object recognition method, apparatus, computer device and storage medium are provided.
An object recognition method includes:

acquiring the current scene image and the current scene point cloud corresponding to a target moving object;

performing image feature extraction on the current scene image to obtain initial image features, and performing point cloud feature extraction on the current scene point cloud to obtain initial point cloud features;

acquiring the target image position corresponding to the current scene image, and performing fusion processing on the initial image features based on the point cloud features corresponding to the target image position among the initial point cloud features, to obtain target image features;

acquiring the target point cloud position corresponding to the current scene point cloud, and performing fusion processing on the initial point cloud features based on the image features corresponding to the target point cloud position among the initial image features, to obtain target point cloud features;

determining the object position corresponding to a scene object based on the target image features and the target point cloud features; and

controlling the target moving object to move based on the position corresponding to the scene object.
An object recognition device includes:

a current scene image acquisition module, used to acquire the current scene image and the current scene point cloud corresponding to a target moving object;

an initial point cloud feature obtaining module, used to perform image feature extraction on the current scene image to obtain initial image features, and perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features;

a target image feature obtaining module, used to acquire the target image position corresponding to the current scene image, and perform fusion processing on the initial image features based on the point cloud features corresponding to the target image position among the initial point cloud features, to obtain target image features;

a target point cloud feature obtaining module, used to acquire the target point cloud position corresponding to the current scene point cloud, and perform fusion processing on the initial point cloud features based on the image features corresponding to the target point cloud position among the initial image features, to obtain target point cloud features;

a position determination module, used to determine the object position corresponding to a scene object based on the target image features and the target point cloud features; and

a motion control module, used to control the target moving object to move based on the position corresponding to the scene object.
A computer device includes a memory and one or more processors, where computer-readable instructions are stored in the memory; when executed by the processors, the computer-readable instructions cause the one or more processors to perform the following steps:

acquiring the current scene image and the current scene point cloud corresponding to a target moving object;

performing image feature extraction on the current scene image to obtain initial image features, and performing point cloud feature extraction on the current scene point cloud to obtain initial point cloud features;

acquiring the target image position corresponding to the current scene image, and performing fusion processing on the initial image features based on the point cloud features corresponding to the target image position among the initial point cloud features, to obtain target image features;

acquiring the target point cloud position corresponding to the current scene point cloud, and performing fusion processing on the initial point cloud features based on the image features corresponding to the target point cloud position among the initial image features, to obtain target point cloud features;

determining the object position corresponding to a scene object based on the target image features and the target point cloud features; and

controlling the target moving object to move based on the position corresponding to the scene object.
One or more computer storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:

acquiring the current scene image and the current scene point cloud corresponding to a target moving object;

performing image feature extraction on the current scene image to obtain initial image features, and performing point cloud feature extraction on the current scene point cloud to obtain initial point cloud features;

acquiring the target image position corresponding to the current scene image, and performing fusion processing on the initial image features based on the point cloud features corresponding to the target image position among the initial point cloud features, to obtain target image features;

acquiring the target point cloud position corresponding to the current scene point cloud, and performing fusion processing on the initial point cloud features based on the image features corresponding to the target point cloud position among the initial image features, to obtain target point cloud features;

determining the object position corresponding to a scene object based on the target image features and the target point cloud features; and

controlling the target moving object to move based on the position corresponding to the scene object.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, drawings, and claims.
Description of Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings required in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.
FIG. 1 is an application scenario diagram of an object recognition method according to one or more embodiments;

FIG. 2 is a schematic flowchart of an object recognition method according to one or more embodiments;

FIG. 3 is a schematic flowchart of the steps for obtaining target point cloud features according to one or more embodiments;

FIG. 4 is a schematic diagram of an object recognition system according to one or more embodiments;

FIG. 5 is a block diagram of an object recognition apparatus according to one or more embodiments;

FIG. 6 is a block diagram of a computer device according to one or more embodiments.
Detailed Description

In order to make the technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, not to limit it.
The object recognition method provided by this application can be applied to the application environment shown in FIG. 1. The application environment includes a terminal 102 and a server 104; a point cloud collection device and an image collection device are installed in the terminal 102. The point cloud collection device is used to collect point cloud data, such as the current scene point cloud. The image collection device is used to collect images, such as the current scene image. The terminal 102 can transmit the collected current scene image and current scene point cloud to the server 104. The server 104 can acquire the current scene image and the current scene point cloud corresponding to the terminal 102, perform image feature extraction on the current scene image to obtain initial image features, perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features, acquire the target image position corresponding to the current scene image, perform fusion (Fusion) processing on the initial image features based on the point cloud features corresponding to the target image position among the initial point cloud features to obtain target image features, acquire the target point cloud position corresponding to the current scene point cloud, perform fusion processing on the initial point cloud features based on the image features corresponding to the target point cloud position among the initial image features to obtain target point cloud features, determine the object position corresponding to the scene object based on the target image features and the target point cloud features, and control the terminal 102 to move based on the position corresponding to the scene object. The terminal 102 may be, but is not limited to, a self-driving car or a mobile robot. The server 104 can be implemented as an independent server or as a server cluster composed of multiple servers. The point cloud collection device can be any device that can collect point cloud data, such as, but not limited to, a lidar. The image collection device can be any device that can collect image data, such as, but not limited to, a camera.
It can be understood that the above application scenario is only an example and does not limit the object recognition method provided by the embodiments of the present application; the method can also be applied in other application scenarios. For example, the above object recognition method may be performed by the terminal 102.
In some embodiments, as shown in FIG. 2, an object recognition method is provided. The method is described by taking its application to the server 104 in FIG. 1 as an example, and includes the following steps:

S202: acquire the current scene image and the current scene point cloud corresponding to the target moving object.
Specifically, a moving object refers to an object in a state of motion. It can be a living object, such as, but not limited to, a person or an animal, or an inanimate object, such as, but not limited to, a vehicle or a drone, for example an autonomous vehicle. The target moving object refers to the moving object whose motion is to be controlled according to the scene image and the scene point cloud. The target moving object is, for example, the terminal 102 in FIG. 1.
The scene image refers to the image corresponding to the scene where the moving object is located. The scene image can reflect the environment of the moving object; for example, it may include one or more of the lanes, vehicles, pedestrians or obstacles in the environment. The scene image may be collected by an image collection device built into the moving object, for example a camera installed in an autonomous vehicle, or by an image collection device external to and associated with the moving object, for example a device connected to the moving object through a cable or a network, such as a camera on the road where the autonomous vehicle is located that is connected to the vehicle through a network. The current scene image refers to the image corresponding to the current scene where the target moving object is located at the current time. The current scene refers to the scene where the target moving object is located at the current time. The external image collection device can transmit the collected scene image to the moving object.
A point cloud refers to a set of three-dimensional data points in a three-dimensional coordinate system, for example the set of three-dimensional data points corresponding to the surface of an object in a three-dimensional coordinate system; a point cloud can represent the shape of an object's outer surface. A three-dimensional data point refers to a point in three-dimensional space and includes three-dimensional coordinates, which may include, for example, an X coordinate, a Y coordinate and a Z coordinate. A three-dimensional data point may also include at least one of RGB color, grayscale value or time. The scene point cloud refers to the set of three-dimensional data points corresponding to the scene. A point cloud can be obtained by lidar scanning. A lidar is an active sensor: it emits a laser beam, and after the beam hits the surface of an object it is reflected; the reflected laser signal is collected to obtain the point cloud of the object.
The scene point cloud refers to the point cloud corresponding to the scene where the moving object is located. The scene point cloud may be collected by a point cloud collection device built into the moving object, for example obtained by scanning with a lidar installed in an autonomous vehicle, or by a point cloud collection device external to and associated with the moving object, for example a device connected to the moving object through a cable or a network, such as a lidar on the road where the autonomous vehicle is located that is connected to the vehicle through a network. The current scene point cloud refers to the point cloud corresponding to the current scene where the target moving object is located at the current time. The external point cloud collection device can transmit the scanned scene point cloud to the moving object.
In some embodiments, the target moving object can collect the current scene in real time through the image collection device to obtain the current scene image, and collect the current scene in real time through the point cloud collection device to obtain the current scene point cloud. The target moving object can send the collected current scene image and current scene point cloud to the server; the server can determine the positions of obstacles on the path of the target moving object according to the current scene image and the current scene point cloud, and can transmit the obstacle positions to the target moving object, so that the target moving object can avoid the obstacles while moving.
S204: perform image feature extraction on the current scene image to obtain initial image features, and perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features.
Specifically, an image feature (Image Feature) reflects the characteristics of an image, and a point cloud feature (Point Feature) reflects the characteristics of a point cloud. Image features have a strong ability to represent slender objects such as pedestrians. A point cloud feature can be represented in vector form and may also be called a point cloud feature vector, for example (a1, b1, c1); a point cloud feature may also be called a point feature. Point cloud features can represent the information of a point cloud losslessly. An image feature can likewise be represented in vector form and may also be called an image feature vector, for example (a2, b2, c2). The initial image features refer to the image features obtained by feature extraction from the current scene image. The initial point cloud features refer to the point cloud features obtained by feature extraction from the current scene point cloud.
In some embodiments, the server may obtain an object recognition model, which may include an image feature extraction layer and a point cloud feature extraction layer. The server may input the current scene image into the image feature extraction layer, which performs feature extraction on the current scene image, for example by convolution, to obtain image features. The server can obtain the initial image features from the image features output by the image feature extraction layer; for example, the output image features can be used directly as the initial image features. The server may input the current scene point cloud into the point cloud feature extraction layer, which performs feature extraction on the current scene point cloud, for example by convolution, to obtain point cloud features. The server can obtain the initial point cloud features from the point cloud features output by the point cloud feature extraction layer; for example, the output point cloud features can be used directly as the initial point cloud features.
In some embodiments, the image feature extraction layer and the point cloud feature extraction layer are obtained by joint training. Specifically, the server can input a scene image into the image feature extraction layer and a scene point cloud into the point cloud feature extraction layer, obtain the predicted image features output by the image feature extraction layer and the predicted point cloud features output by the point cloud feature extraction layer, and acquire the standard image features corresponding to the scene image (the standard image features refer to the real image features) and the standard point cloud features corresponding to the scene point cloud (the standard point cloud features refer to the real point cloud features). A first loss value is determined from the predicted image features, for example from the difference between the predicted image features and the standard image features. A second loss value is determined from the predicted point cloud features, for example from the difference between the predicted point cloud features and the standard point cloud features. A total loss value is determined from the first loss value and the second loss value; the total loss value may include both, and may for example be their sum. The server can use the total loss value to adjust the parameters of the image feature extraction layer and of the point cloud feature extraction layer, obtaining the trained image feature extraction layer and the trained point cloud feature extraction layer. A sketch of such joint training follows.
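A minimal PyTorch sketch of the joint training step described above. The stand-in layers (a single convolution and a single linear layer), the MSE loss, and all tensor shapes are assumptions for illustration; the real extraction layers would be full networks.

```python
import torch
import torch.nn as nn

# Stand-in extraction layers; real layers are deeper networks.
image_layer = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # image feature extraction layer
point_layer = nn.Linear(3, 16)                             # point cloud feature extraction layer
optimizer = torch.optim.SGD(
    list(image_layer.parameters()) + list(point_layer.parameters()), lr=1e-3)
criterion = nn.MSELoss()

scene_image = torch.rand(1, 3, 64, 64)       # training scene image
scene_points = torch.rand(1024, 3)           # training scene point cloud
std_image_feat = torch.rand(1, 16, 64, 64)   # standard (real) image features
std_point_feat = torch.rand(1024, 16)        # standard (real) point cloud features

pred_image_feat = image_layer(scene_image)
pred_point_feat = point_layer(scene_points)
loss1 = criterion(pred_image_feat, std_image_feat)   # first loss value
loss2 = criterion(pred_point_feat, std_point_feat)   # second loss value
total_loss = loss1 + loss2                           # total loss value
optimizer.zero_grad()
total_loss.backward()                                # adjusts both layers jointly
optimizer.step()
```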
S206: acquire the target image position corresponding to the current scene image, and based on the point cloud features corresponding to the target image position among the initial point cloud features, perform fusion processing on the initial image features to obtain the target image features.
Specifically, the image position refers to the position of the image in the image coordinate system, and may include the position of each pixel of the image in the image coordinate system. The image coordinate system refers to the coordinate system adopted by the images collected by the image collection device; the coordinates of each pixel in the image can be obtained from it. The target image position refers to the position of each pixel of the current scene image in the image coordinate system. The image position may be determined according to the parameters of the image collection device; these may, for example, be camera parameters, including the camera's extrinsic and intrinsic parameters. The image coordinate system is a two-dimensional coordinate system, and coordinates in it include an abscissa and an ordinate.
The point cloud features corresponding to the target image position refer to the point cloud features, among the initial point cloud features, at the position in the point cloud coordinate system corresponding to the target image position. The position in the point cloud coordinate system corresponding to the target image position may or may not overlap with the positions corresponding to the initial point cloud features. The server can fuse the point cloud features corresponding to the overlapping positions with the initial image features to obtain the target image features. For example, if the target image position is position A, the corresponding position in the point cloud coordinate system is position B, the position of the initial point cloud features in the point cloud coordinate system is position C, and the overlap of position C and position B is position D, then the point cloud features corresponding to position D can be fused into the initial image features.
Fusion processing refers to establishing an association between different features at the same position in the same coordinate system, for example establishing an association between the image feature a and the point cloud feature b corresponding to position A. Fusion processing may also mean obtaining, from different features at the same position in the same coordinate system, a fused feature containing those features, for example obtaining a fused feature including a and b from the image feature a and the point cloud feature b corresponding to position A. A fused feature can be represented in vector form.
In some embodiments, the server may obtain the position in the point cloud coordinate system corresponding to the target image position, and fuse the initial image features with the point cloud features, among the initial point cloud features, at that position, to obtain the target image features. Specifically, the object recognition model may also include an image spatial fusion layer; the server may input the initial point cloud features and the initial image features into the image spatial fusion layer, which can determine the coincident positions between the positions of the initial point cloud features and the positions of the initial image features, extract the point cloud features at the coincident positions from the initial point cloud features, and fuse them into the initial image features to obtain the target image features. A sketch of this fusion follows.
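A minimal sketch of fusing projected point cloud features into an image feature map at the coincident pixel positions. Channel concatenation is one plausible fusion scheme among several the text allows (association or combined features); the shapes and the zero-fill for pixels without a projected point are illustrative assumptions.

```python
import numpy as np

def image_spatial_fusion(image_feat, point_feats, projected_uv):
    """Fuse projected point cloud features into the image feature map.

    image_feat: (C_img, H, W) initial image features.
    point_feats: (N, C_pt) initial point cloud features.
    projected_uv: (N, 2) integer pixel coordinates of each point after
    projection into the image coordinate system.
    Returns a (C_img + C_pt, H, W) fused map; pixels with no projected
    point keep zeros in the appended channels.
    """
    c_img, h, w = image_feat.shape
    extra = np.zeros((point_feats.shape[1], h, w), dtype=image_feat.dtype)
    for (u, v), f in zip(projected_uv, point_feats):
        if 0 <= v < h and 0 <= u < w:      # keep only coincident positions
            extra[:, v, u] = f
    return np.concatenate([image_feat, extra], axis=0)

fused = image_spatial_fusion(np.random.rand(8, 32, 32),
                             np.random.rand(50, 4),
                             np.random.randint(0, 32, size=(50, 2)))
```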
S208: acquire the target point cloud position corresponding to the current scene point cloud, and based on the image features corresponding to the target point cloud position among the initial image features, perform fusion processing on the initial point cloud features to obtain the target point cloud features.
Specifically, the point cloud position refers to the position of the point cloud in the point cloud coordinate system, and may include the position of each three-dimensional data point of the point cloud in the point cloud coordinate system. The coordinates of each three-dimensional data point in the point cloud can be obtained from the point cloud coordinate system. The target point cloud position refers to the point cloud position of each three-dimensional data point in the current scene point cloud. The point cloud position may be determined according to the parameters of the point cloud collection device, for example the parameters of the lidar. The point cloud coordinate system is a three-dimensional coordinate system, and coordinates in it may include an X coordinate, a Y coordinate and a Z coordinate. Of course, the point cloud coordinate system can also be another type of three-dimensional coordinate system, which is not limited here.
The image features corresponding to the target point cloud position refer to the image features, among the initial image features, at the position in the image coordinate system corresponding to the target point cloud position. The position in the image coordinate system corresponding to the target point cloud position may or may not overlap with the positions corresponding to the initial image features. The server can fuse the image features corresponding to the overlapping positions with the initial point cloud features to obtain the target point cloud features.
In some embodiments, the server may obtain the position in the image coordinate system corresponding to the target point cloud position, and fuse the initial point cloud features with the image features, among the initial image features, at that position, to obtain the target point cloud features. Specifically, the object recognition model may also include a point cloud spatial fusion layer; the server may input the initial point cloud features and the initial image features into the point cloud spatial fusion layer, which can determine the coincident positions between the positions of the initial point cloud features and the positions of the initial image features, extract the image features at the coincident positions from the initial image features, and fuse them into the initial point cloud features to obtain the target point cloud features.
S210: determine the object position corresponding to the scene object based on the target image features and the target point cloud features.
Specifically, a scene object refers to an object in the scene where the target moving object is located. A scene object can be a living object, such as a person or an animal, or an inanimate object, such as a vehicle, a tree or a stone. There can be multiple scene objects. The object position may include at least one of the position of the scene object in the current scene image or its position in the current scene point cloud. The scene objects in the current scene image and those in the current scene point cloud may be the same or may differ.
In some embodiments, the server may compute the position of each scene object from the position of the target image features and the position of the target point cloud features.
In some embodiments, the server may perform time series fusion of the target image features obtained from different video frames to obtain fused target image features, and perform image task learning based on them. Time series fusion refers to concatenating image features of different frames, concatenating point cloud features of different frames, or concatenating voxel features of different frames. The server may likewise perform time series fusion of the target point cloud features obtained from different scene point clouds to obtain fused target point cloud features, and perform point cloud task learning based on them. The server may further fuse the fused target image features with the fused target point cloud features to obtain secondarily fused target image features and secondarily fused target point cloud features, using the former for image task learning and the latter for point cloud task learning.
S212: control the target moving object to move based on the position corresponding to the scene object.
Specifically, the server can transmit the position corresponding to the scene object to the target moving object; the target moving object can determine a motion route that avoids the scene object according to that position and move along the route, thereby avoiding the scene object and ensuring safe motion.
In the above object recognition method, the current scene image and the current scene point cloud corresponding to the target moving object are acquired; image feature extraction is performed on the current scene image to obtain initial image features, and point cloud feature extraction is performed on the current scene point cloud to obtain initial point cloud features; the target image position corresponding to the current scene image is acquired, and the initial image features are fused based on the point cloud features corresponding to the target image position among the initial point cloud features to obtain the target image features; the target point cloud position corresponding to the current scene point cloud is acquired, and the initial point cloud features are fused based on the image features corresponding to the target point cloud position among the initial image features to obtain the target point cloud features; the object position corresponding to the scene object is determined based on the target image features and the target point cloud features; and the target moving object is controlled to move based on the position corresponding to the scene object. The position of the scene object is thus obtained accurately, so that the target moving object can move while avoiding the scene object, which improves the safety of the target moving object during motion.
In some embodiments, acquiring the target image position corresponding to the current scene image and fusing the initial image features based on the point cloud features corresponding to the target image position among the initial point cloud features to obtain the target image features includes: converting the target point cloud position into a position in the image coordinate system according to the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system, to obtain a first converted position; and acquiring a first coincident position between the first converted position and the target image position, and fusing the point cloud features corresponding to the first coincident position among the initial point cloud features into the image features corresponding to the first coincident position among the initial image features, to obtain the target image features.
Specifically, the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system refers to the transformation that converts coordinates in the point cloud coordinate system into coordinates in the image coordinate system. The object corresponding to the coordinates before transformation in the point cloud coordinate system is the same as the object corresponding to the coordinates after transformation in the image coordinate system. In the following description, the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system is referred to as the first transformation relationship. Through the first transformation relationship, the coordinates in the image coordinate system of the position represented by coordinates in the point cloud coordinate system can be determined; that is, the image position corresponding to the target point cloud position in the image coordinate system can be determined through the first transformation relationship. For example, coordinates (x1, y1, z1) in the point cloud coordinate system can be converted into coordinates (x2, y2) in the image coordinate system through the first transformation relationship. Converting coordinates in one coordinate system into coordinates in another coordinate system may be referred to as physical space projection.
The first converted position refers to the position corresponding to the target point cloud position in the image coordinate system, and is a position in a two-dimensional coordinate system. The first converted position may include the two-dimensional coordinates, in the image coordinate system, of the three-dimensional coordinates of all or some of the three-dimensional data points corresponding to the target point cloud position. The first coincident position refers to a position where the first converted position coincides with the target image position. The point cloud features corresponding to the first coincident position refer to the point cloud features corresponding to the position of the first coincident position in the point cloud coordinate system. For example, if the first converted position includes (x1, y1), (x2, y2), and (x3, y3), and the target image position includes (x2, y2), (x3, y3), and (x4, y4), then the first coincident position includes (x2, y2) and (x3, y3). If the position of (x2, y2) in the point cloud coordinate system is (x1, y1, z1) and the position of (x3, y3) in the point cloud coordinate system is (x2, y2, z2), then the point cloud features corresponding to the first coincident position include the point cloud features corresponding to (x1, y1, z1) and the point cloud features corresponding to (x2, y2, z2).
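One conventional way to realize the first transformation relationship is a pinhole-camera projection. The following Python sketch is a minimal, non-limiting illustration under that assumption; the intrinsic matrix K and the extrinsics R, t are illustrative values, not parameters from the embodiment:

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 point cloud coordinates into the image plane.

    K: 3x3 camera intrinsic matrix; R (3x3) and t (3,) transform points
    from the point cloud frame into the camera frame.  Returns Nx2 pixel
    coordinates, i.e. the "first converted positions".
    """
    cam = points_3d @ R.T + t          # to camera coordinates
    uvw = cam @ K.T                    # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division

# Illustrative camera parameters and two lidar points.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[1.0, 2.0, 10.0], [0.5, -1.0, 8.0]])
pixels = project_points(pts, K, R, t)  # compare against target image positions
```

Comparing the returned pixel coordinates against the target image position then yields the coincident positions described above.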
In some embodiments, the server may splice the point cloud features corresponding to the first coincident position with the image features corresponding to the first coincident position to obtain the target image features. For example, the server may append the point cloud features corresponding to the first coincident position after the image features corresponding to the first coincident position to obtain the target image features. If the point cloud feature corresponding to the first coincident position is represented by a vector A and the image feature corresponding to the first coincident position is represented by a vector B, the server may splice vector B with vector A to obtain a spliced vector, and obtain the target image feature from the spliced vector; for example, the spliced vector may be used directly as the target image feature, or the spliced vector may be further processed to obtain the target image feature.
In some embodiments, the server may convert the target image position into a position in the point cloud coordinate system according to the coordinate transformation relationship between the image coordinate system and the point cloud coordinate system to obtain the point cloud position corresponding to the target image position, extract the corresponding point cloud features from the initial point cloud features according to the point cloud position, and fuse them into the initial image features to obtain the target image features.
In the above embodiment, the target point cloud position is converted into a position in the image coordinate system according to the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system to obtain the first converted position; the first coincident position between the first converted position and the target image position is acquired; and the point cloud features corresponding to the first coincident position among the initial point cloud features are fused into the image features corresponding to the first coincident position among the initial image features, to obtain the target image features. The target image features therefore include both image features and point cloud features, which increases the richness of the features in the target image features and improves their representation capability.
In some embodiments, acquiring the target point cloud position corresponding to the current scene point cloud and fusing the initial point cloud features based on the image features corresponding to the target point cloud position among the initial image features to obtain the target point cloud features includes: converting the target image position into a position in the point cloud coordinate system according to the coordinate transformation relationship between the image coordinate system and the point cloud coordinate system, to obtain a second converted position; and acquiring a second coincident position between the second converted position and the target point cloud position, and fusing the image features corresponding to the second coincident position among the initial image features into the point cloud features corresponding to the second coincident position among the initial point cloud features, to obtain the target point cloud features.
Specifically, the coordinate transformation relationship between the image coordinate system and the point cloud coordinate system refers to the transformation that converts coordinates in the image coordinate system into coordinates in the point cloud coordinate system. The object corresponding to the coordinates before transformation in the image coordinate system is the same as the object corresponding to the coordinates after transformation in the point cloud coordinate system. In the following description, the coordinate transformation relationship between the image coordinate system and the point cloud coordinate system is referred to as the second transformation relationship. Through the second transformation relationship, the coordinates in the point cloud coordinate system of the position represented by coordinates in the image coordinate system can be determined.
The second converted position refers to the position corresponding to the target image position in the point cloud coordinate system, and is a position in a three-dimensional coordinate system. The second converted position may include the three-dimensional coordinates, in the point cloud coordinate system, of all or some of the two-dimensional coordinates corresponding to the target image position. The second coincident position refers to a position where the second converted position coincides with the target point cloud position. The image features corresponding to the second coincident position refer to the image features corresponding to the two-dimensional coordinates, in the image coordinate system, of the second coincident position. The target point cloud features are obtained by fusing the image features corresponding to the second coincident position into the point cloud features corresponding to the second coincident position among the initial point cloud features.
In some embodiments, the server may perform feature fusion between the image features corresponding to the second coincident position and the point cloud features corresponding to the second coincident position to obtain the target point cloud features. Feature fusion may include one or more of arithmetic operations on features, combination, or splicing. Arithmetic operations may include one or more of addition, subtraction, multiplication, or division. For example, the server may append the image features corresponding to the second coincident position after the point cloud features corresponding to the second coincident position to obtain the target point cloud features. If the point cloud feature corresponding to the second coincident position is represented by a vector C and the image feature corresponding to the second coincident position is represented by a vector D, the server may splice vector C with vector D to obtain a spliced vector, and obtain the target point cloud feature from the spliced vector; for example, the spliced vector may be used directly as the target point cloud feature, or may be further processed to obtain the target point cloud feature.
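As a non-limiting sketch of the fusion operations listed above (splicing and arithmetic operations), the following Python function fuses a point cloud feature vector with an image feature vector; the function name, mode flag, and vector values are illustrative assumptions:

```python
import numpy as np

def fuse_features(point_feat, image_feat, mode="concat"):
    """Fuse a point cloud feature vector with an image feature vector.

    "concat" appends one vector to the other (splicing); "add" assumes
    equal lengths and sums elementwise (an arithmetic operation).
    """
    if mode == "concat":
        return np.concatenate([point_feat, image_feat])
    if mode == "add":
        return point_feat + image_feat
    raise ValueError(f"unknown fusion mode: {mode}")

C = np.array([0.2, 0.7, 0.1])  # point cloud feature at the coincident position
D = np.array([0.5, 0.3, 0.9])  # image feature at the same position
fused_concat = fuse_features(C, D, "concat")  # length 6
fused_sum = fuse_features(C, D, "add")        # length 3
```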
In some embodiments, the server may convert the target point cloud position into a position in the image coordinate system according to the coordinate transformation relationship between the point cloud coordinate system and the image coordinate system to obtain the image position corresponding to the target point cloud position, extract the corresponding image features from the initial image features according to the image position, and fuse them into the initial point cloud features to obtain the target point cloud features. For example, the image features at the same position as the image position may be extracted from the initial image features, or the image features at positions whose difference from the image position is smaller than a position difference threshold may be extracted from the initial image features, and fused into the initial point cloud features to obtain the target point cloud features. The position difference threshold may be set as required, or may be preset.
In the above embodiment, the target image position is converted into a position in the point cloud coordinate system according to the coordinate transformation relationship between the image coordinate system and the point cloud coordinate system to obtain the second converted position; the second coincident position between the second converted position and the target point cloud position is acquired; and the image features corresponding to the second coincident position among the initial image features are fused into the point cloud features corresponding to the second coincident position among the initial point cloud features, to obtain the target point cloud features. The target point cloud features therefore include both image features and point cloud features, which increases the richness of the features in the target point cloud features and improves their representation capability.
In some embodiments, as shown in FIG. 3, acquiring the second coincident position between the second converted position and the target point cloud position, and fusing the image features corresponding to the second coincident position among the initial image features into the point cloud features corresponding to the second coincident position among the initial point cloud features to obtain the target point cloud features includes:
S302: Voxelize the current scene point cloud to obtain a voxelization result.
Specifically, voxel is short for volume pixel (Volume Pixel). Voxelization refers to dividing a point cloud into multiple voxels according to a given voxel size. For example, the dimensions of each voxel in the X-, Y-, and Z-axis directions may be w, h, and e, respectively. The voxels obtained by division include empty voxels and non-empty voxels: an empty voxel contains no points of the point cloud, and a non-empty voxel contains points of the point cloud. The voxelization result may include at least one of the number of voxels obtained after voxelization, the position information of the voxels, or the size of the voxels.
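A minimal, non-limiting voxelization sketch in Python follows; it assigns each point to a voxel index given per-axis voxel sizes (w, h, e) and keeps only the non-empty voxels. All names and sizes are illustrative assumptions:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Divide an Nx3 point cloud into voxels of per-axis size (w, h, e).

    Returns a dict mapping voxel index (ix, iy, iz) -> array of the
    points falling inside that voxel; only non-empty voxels appear.
    """
    indices = np.floor(points / voxel_size).astype(np.int64)
    voxels = {}
    for idx, pt in zip(map(tuple, indices), points):
        voxels.setdefault(idx, []).append(pt)
    return {k: np.asarray(v) for k, v in voxels.items()}

points = np.random.rand(1000, 3) * 10.0          # synthetic scene point cloud
voxels = voxelize(points, np.array([0.5, 0.5, 0.5]))
print(len(voxels), "non-empty voxels")
```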
S304: Perform voxel feature extraction according to the voxelization result to obtain initial voxel features.
Specifically, a voxel feature (Voxel Feature) is a feature used to represent a voxel. Voxel features can accelerate the convergence of a network model and reduce its complexity. According to the number of points contained inside each voxel in the voxelization result, the server may sample the same number of points from inside the voxel to obtain the sampling points corresponding to the voxel, and perform feature extraction according to those sampling points to obtain the initial voxel features corresponding to the voxel. For example, the center coordinates of the point cloud formed by the sampling points in each voxel may be computed, the points in the voxel may be center-normalized with respect to those center coordinates to obtain a data matrix, and the data matrix may be input into a trained voxel feature recognition model to obtain the initial voxel features. A voxel feature recognition model refers to a model that extracts voxel features.
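The sampling and center-normalization step described above might be sketched as follows; this is a non-limiting illustration in which the sample count, seed, and function name are assumptions (the resulting matrix would then be fed to a trained voxel feature recognition model):

```python
import numpy as np

def voxel_input(points, num_samples=32):
    """Build a per-voxel data matrix by sampling and center-normalizing.

    Samples a fixed number of points from one voxel (with replacement
    when the voxel holds fewer points), then subtracts the centroid of
    the sampled points.  Returns an array of shape (num_samples, 3).
    """
    rng = np.random.default_rng(0)
    idx = rng.choice(len(points), size=num_samples,
                     replace=len(points) < num_samples)
    sampled = points[idx]
    centroid = sampled.mean(axis=0)
    return sampled - centroid

voxel_points = np.random.rand(10, 3)   # points falling inside one voxel
matrix = voxel_input(voxel_points)     # input to the voxel feature model
```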
In some embodiments, the object recognition model further includes a voxel feature extraction layer, which may be obtained through joint training with the image feature extraction layer and the point cloud feature extraction layer. The server may input the scene point cloud into the voxel feature extraction layer to obtain the voxel features output by the voxel feature extraction layer.
S306: Acquire the second coincident position between the second converted position and the target point cloud position, and fuse the image features corresponding to the second coincident position among the initial image features into the point cloud features corresponding to the second coincident position among the initial point cloud features, to obtain intermediate point cloud features.
Specifically, the intermediate point cloud features are obtained by fusing the image features corresponding to the second coincident position into the point cloud features corresponding to the second coincident position among the initial point cloud features.
S308: Acquire the target voxel position corresponding to the current scene point cloud, and convert the target voxel position into a position in the point cloud coordinate system according to the coordinate transformation relationship between the voxel coordinate system and the point cloud coordinate system, to obtain a third converted position.
Specifically, a voxel position refers to the position of a voxel in the voxel coordinate system. The target voxel position refers to the position, in the voxel coordinate system, of the voxels corresponding to the current scene point cloud, and may include the positions of each of those voxels in the voxel coordinate system. The coordinates of a voxel can be obtained from the voxel coordinate system. The coordinate transformation relationship between the voxel coordinate system and the point cloud coordinate system refers to the transformation that converts coordinates in the voxel coordinate system into coordinates in the point cloud coordinate system. The voxel coordinate system is a three-dimensional coordinate system. In the following description, the coordinate transformation relationship between the voxel coordinate system and the point cloud coordinate system is referred to as the third transformation relationship.
The third converted position refers to the position corresponding to the target voxel position in the point cloud coordinate system, and is a position in the point cloud coordinate system.
S310: Acquire a third coincident position between the third converted position and the target voxel position, and fuse the voxel features corresponding to the third coincident position among the initial voxel features into the point cloud features corresponding to the third converted position among the intermediate point cloud features, to obtain the target point cloud features.
Specifically, the third coincident position refers to the position where the third converted position coincides with the target voxel position. The voxel features corresponding to the third coincident position refer to the voxel features at the position, in the voxel coordinate system, corresponding to the third coincident position. The server may perform feature fusion between the voxel features corresponding to the third coincident position among the initial voxel features and the point cloud features corresponding to the third converted position among the intermediate point cloud features, to obtain the target point cloud features.
In the above embodiment, the current scene point cloud is voxelized to obtain a voxelization result; voxel feature extraction is performed according to the voxelization result to obtain initial voxel features; the second coincident position between the second converted position and the target point cloud position is acquired, and the image features corresponding to the second coincident position among the initial image features are fused into the point cloud features corresponding to the second coincident position among the initial point cloud features, to obtain intermediate point cloud features; the target voxel position corresponding to the current scene point cloud is acquired and converted into a position in the point cloud coordinate system according to the coordinate transformation relationship between the voxel coordinate system and the point cloud coordinate system, to obtain a third converted position; and the third coincident position between the third converted position and the target voxel position is acquired, and the voxel features corresponding to the third coincident position among the initial voxel features are fused into the point cloud features corresponding to the third converted position among the intermediate point cloud features, to obtain the target point cloud features. The intermediate point cloud features thus include point cloud features and image features, so that the target point cloud features include image features, point cloud features, and voxel features, which increases the richness of the features in the target point cloud features and improves their representation capability. Combining the ease of learning offered by voxel features with the lossless information retention of point cloud features achieves complementary advantages.
In some embodiments, the method further includes: voxelizing the current scene point cloud to obtain a voxelization result; performing voxel feature extraction according to the voxelization result to obtain initial voxel features; acquiring the target voxel position corresponding to the current scene point cloud, and converting the target image position into a position in the voxel coordinate system according to the coordinate transformation relationship between the image coordinate system and the voxel coordinate system, to obtain a fourth converted position; and acquiring a fourth coincident position between the fourth converted position and the voxel position, and fusing the image features corresponding to the fourth coincident position among the initial image features into the voxel features corresponding to the fourth coincident position among the initial voxel features, to obtain target voxel features.
Specifically, the coordinate transformation relationship between the image coordinate system and the voxel coordinate system refers to the transformation that converts coordinates in the image coordinate system into coordinates in the voxel coordinate system. The object corresponding to the coordinates before transformation in the image coordinate system is the same as the object corresponding to the coordinates after transformation in the voxel coordinate system. In the following description, the coordinate transformation relationship between the image coordinate system and the voxel coordinate system is referred to as the fourth transformation relationship. Through the fourth transformation relationship, the coordinates in the voxel coordinate system of the position represented by coordinates in the image coordinate system can be determined.
The fourth converted position refers to the position corresponding to the target image position in the voxel coordinate system, and is a position in a three-dimensional coordinate system. The fourth converted position may include the three-dimensional coordinates, in the voxel coordinate system, of all or some of the two-dimensional coordinates corresponding to the target image position. The fourth coincident position refers to a position where the fourth converted position coincides with the target voxel position. The image features corresponding to the fourth coincident position refer to the image features corresponding to the two-dimensional coordinates, in the image coordinate system, of the fourth coincident position. The target voxel features are obtained by fusing the image features corresponding to the fourth coincident position into the voxel features corresponding to the fourth coincident position among the initial voxel features.
In some embodiments, the server may perform feature fusion between the image features corresponding to the fourth coincident position and the voxel features corresponding to the fourth coincident position, to obtain the target voxel features.
In some embodiments, the server may convert the target voxel position into a position in the image coordinate system according to the coordinate transformation relationship between the voxel coordinate system and the image coordinate system to obtain the image position corresponding to the target voxel position, extract the corresponding image features from the initial image features according to the image position, and fuse them into the initial voxel features to obtain the target voxel features. For example, the center position of a voxel may be projected into the image coordinate system to obtain a center image position, and the image features at positions whose difference from the center image position is smaller than a difference threshold may be extracted from the initial image features and fused into the initial voxel features to obtain the target voxel features. The difference threshold may be set as required, or may be preset.
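As a non-limiting illustration of gathering image features around a projected voxel center within a difference threshold, the following sketch averages the image features whose pixel positions fall within an assumed threshold; all names and the threshold value are illustrative:

```python
import numpy as np

def gather_image_features(voxel_centers_2d, pixel_coords, image_feats,
                          threshold=2.0):
    """For each projected voxel center, average the image features whose
    pixel positions lie within `threshold` pixels of it.

    voxel_centers_2d: (V, 2) projected centers; pixel_coords: (P, 2)
    positions of the image features; image_feats: (P, C).  Centers with
    no nearby pixel receive a zero vector.
    """
    out = np.zeros((len(voxel_centers_2d), image_feats.shape[1]))
    for i, c in enumerate(voxel_centers_2d):
        dist = np.linalg.norm(pixel_coords - c, axis=1)
        near = dist < threshold
        if near.any():
            out[i] = image_feats[near].mean(axis=0)
    return out

centers = np.array([[10.0, 10.0], [50.0, 50.0]])
coords = np.array([[9.0, 10.0], [11.0, 9.5], [100.0, 100.0]])
feats = np.random.rand(3, 8)
gathered = gather_image_features(centers, coords, feats)  # (2, 8)
```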
In some embodiments, the object recognition model may further include a voxel spatial fusion layer. The server may input the image features and the voxel features into the voxel spatial fusion layer, which may determine the coincident positions between the positions of the image features and the positions of the voxel features, extract the image features at the coincident positions from the image features, and fuse them into the voxel features to obtain the target voxel features. The object recognition model may further include a point-voxel fusion layer. The server may input the target voxel features and the intermediate point cloud features into the point-voxel fusion layer, which may determine the coincident positions between the positions of the target voxel features and the positions of the intermediate point cloud features, extract the voxel features at the coincident positions from the target voxel features, and fuse them into the intermediate point cloud features to obtain the target point cloud features. The point-voxel fusion layer may also be referred to as a point cloud-voxel fusion layer.
In the above embodiment, the current scene point cloud is voxelized to obtain a voxelization result; voxel feature extraction is performed according to the voxelization result to obtain initial voxel features; the target voxel position corresponding to the current scene point cloud is acquired; the target image position is converted into a position in the voxel coordinate system according to the coordinate transformation relationship between the image coordinate system and the voxel coordinate system, to obtain a fourth converted position; the fourth coincident position between the fourth converted position and the voxel position is acquired; and the image features corresponding to the fourth coincident position among the initial image features are fused into the voxel features corresponding to the fourth coincident position among the initial voxel features, to obtain the target voxel features. The target voxel features thus include both voxel features and image features, which improves the representation capability of the target voxel features and the richness of the features.
In some embodiments, determining the object position corresponding to the scene object based on the target image features and the target point cloud features includes: acquiring an associated scene image corresponding to the current scene image and an associated scene point cloud corresponding to the current scene point cloud; acquiring associated image features corresponding to the associated scene image and associated point cloud features corresponding to the associated scene point cloud; performing feature fusion on the target image features and the associated image features in chronological order to obtain target image temporal features; performing feature fusion on the target point cloud features and the associated point cloud features in chronological order to obtain target point cloud temporal features; and determining the object position corresponding to the scene object based on the target image temporal features and the target point cloud temporal features.
Specifically, the associated scene image refers to an image associated with the current scene image. For example, the associated scene image may be a forward frame captured before the current moment, or a backward frame captured after it, by the image capture device that captured the current scene image. The forward frame may be used directly as the associated scene image, or coincident object detection may be performed on the current scene image and the forward frame: if a coincident detected object exists between the current scene image and the forward frame, the forward frame is used as the associated scene image of the current scene image. For example, if vehicle A exists in the current scene image and vehicle A also exists in the forward frame, the forward frame may be used as the associated scene image of the current scene image. The current scene image and the associated scene image may be different video frames of the same video, for example, different video frames of a video captured by the image capture device. The associated scene image may be a video frame captured before or after the current scene image. The associated image features may be obtained in the same way as the target image features.
The associated scene point cloud refers to a point cloud associated with the current scene point cloud. For example, the associated scene point cloud may be a scene point cloud collected before or after the current moment by the point cloud collection device that collected the current scene point cloud. The associated point cloud features may be obtained in the same way as the target point cloud features.
In some embodiments, the server may combine the target image features and the associated image features according to the chronological order of the associated scene image and the current scene image to obtain combined image features; in the combined image features, the earlier image features may be arranged before the later image features. The server may obtain the target image temporal features from the combined image features; for example, the combined image features may be used directly as the target image temporal features, or may be further processed to obtain the target image temporal features.
In some embodiments, the server may combine the target point cloud features and the associated point cloud features according to the chronological order of the associated scene point cloud and the current scene point cloud to obtain combined point cloud features; in the combined point cloud features, the earlier point cloud features may be arranged before the later point cloud features. The server may obtain the target point cloud temporal features from the combined point cloud features; for example, the combined point cloud features may be used directly as the target point cloud temporal features, or may be further processed to obtain the target point cloud temporal features.
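A minimal, non-limiting sketch of the chronological combination described above, assuming each feature carries a timestamp (the pairing scheme and values are assumptions for illustration):

```python
import numpy as np

def combine_chronologically(features_by_time):
    """Concatenate features ordered from earliest to latest.

    features_by_time: list of (timestamp, feature_vector) pairs; earlier
    features are placed before later ones, matching the ordering rule
    described above.
    """
    ordered = sorted(features_by_time, key=lambda pair: pair[0])
    return np.concatenate([feat for _, feat in ordered])

combined = combine_chronologically([
    (2.0, np.array([0.4, 0.6])),   # current frame feature
    (1.0, np.array([0.1, 0.9])),   # earlier associated frame feature
])
# result: [0.1, 0.9, 0.4, 0.6] -- earlier feature comes first
```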
In some embodiments, the server may acquire the associated voxel features corresponding to the associated scene point cloud, and perform feature fusion on the target voxel features and the associated voxel features in chronological order to obtain target voxel temporal features.
In some embodiments, the server may perform feature fusion using the target image temporal features, the target point cloud temporal features, and the target voxel temporal features, to obtain secondarily fused image features, secondarily fused voxel features, and secondarily fused point cloud features. The feature fusion among the target image temporal features, the target point cloud temporal features, and the target voxel temporal features may follow the feature fusion method among the initial image features, the initial point cloud features, and the initial voxel features. The server may use the secondarily fused image features, the secondarily fused voxel features, and the secondarily fused point cloud features to perform image task learning, voxel task learning, and point cloud task learning, respectively.
In some embodiments, the image features may include position information of an object. The server may acquire the position of the object in the target image features to obtain a first position, and the position of the object in the associated image features to obtain a second position, and determine the motion state of the object according to the first position and the second position. For example, whether the object has changed lanes or turned may be determined according to the relative relationship between the first position and the second position, and the movement speed of the object may be determined according to the difference between the first position and the second position. Of course, the point cloud features and the voxel features may also include position information of the object, and may likewise be used to determine the motion state of the object.
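By way of a non-limiting illustration, the speed estimate from the difference between the first and second positions might look as follows; the time gap dt and the lane-normal direction used as a crude lane-change cue are assumptions, not values from the embodiment:

```python
import numpy as np

def motion_state(pos_prev, pos_curr, dt, lane_normal):
    """Estimate speed and lateral displacement between two observations.

    The speed follows from the positional difference over the time gap;
    the component along an assumed lane-normal direction gives a crude
    lane-change cue.
    """
    delta = np.asarray(pos_curr) - np.asarray(pos_prev)
    speed = np.linalg.norm(delta) / dt
    lateral = float(np.dot(delta, lane_normal))
    return speed, lateral

speed, lateral = motion_state([10.0, 2.0], [14.0, 2.5], dt=0.5,
                              lane_normal=np.array([0.0, 1.0]))
# speed in position units per second; a large |lateral| suggests a lane change
```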
In this embodiment, the associated scene image corresponding to the current scene image and the associated scene point cloud corresponding to the current scene point cloud are acquired; the associated image features corresponding to the associated scene image and the associated point cloud features corresponding to the associated scene point cloud are acquired; feature fusion is performed on the target image features and the associated image features in chronological order to obtain target image temporal features; feature fusion is performed on the target point cloud features and the associated point cloud features in chronological order to obtain target point cloud temporal features; and the object position corresponding to the scene object is determined based on the target image temporal features and the target point cloud temporal features. The temporal features thus include image features of different scene images and point cloud features of different scene point clouds, which improves the accuracy of the scene object position.
In some embodiments, determining the object position corresponding to the scene object based on the target image features and the target point cloud features includes: determining the combined position between the target image features and the target point cloud features to obtain a target combined position; and using the target combined position as the object position corresponding to the scene object.
Specifically, the combined position may be the merging of the position corresponding to the target image features and the position corresponding to the target point cloud features. The server may express the position corresponding to the target image features and the position corresponding to the target point cloud features as coordinates in the same coordinate system, for example both as coordinates in the image coordinate system, to obtain a first feature position corresponding to the target image features and a second feature position corresponding to the target point cloud features, and compute the result of merging the first feature position with the second feature position to obtain the object position corresponding to the scene object. There may be multiple scene objects.
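A non-limiting sketch of merging the first feature position with the second feature position, both already expressed in image coordinates (duplicate removal is an illustrative choice, not mandated by the embodiment):

```python
import numpy as np

def combine_positions(image_positions, point_cloud_positions_2d):
    """Merge the positions supported by the image features with those
    supported by the point cloud features, both given in the same
    (image) coordinate system, removing exact duplicates.
    """
    merged = np.vstack([image_positions, point_cloud_positions_2d])
    return np.unique(merged, axis=0)

obj_pos = combine_positions(np.array([[3, 4], [5, 6]]),
                            np.array([[5, 6], [7, 8]]))
# -> [[3 4], [5 6], [7 8]]
```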
In this embodiment, the combined position between the target image features and the target point cloud features is determined to obtain the target combined position, and the target combined position is used as the object position corresponding to the scene object, which improves the accuracy of the object position.
In some embodiments, the server may perform task learning using at least one of the target image features, the target point cloud features, or the target voxel features. The tasks may include low-level tasks and high-level tasks. The low-level tasks may include point-level semantic segmentation (Semantic Segmentation) and scene flow (Scene Flow) estimation, voxel-level semantic segmentation and scene flow estimation, and pixel-level semantic segmentation and scene flow estimation. The high-level tasks may include object detection, scene recognition, and instance segmentation (Instance Segmentation).
In some embodiments, as shown in FIG. 4, an object recognition system is provided. The object recognition system mainly includes a first multi-sensor feature extraction (Multi-Sensor Feature Extraction) module, a temporal fusion (Temporal Fusion) module, a second multi-sensor feature extraction module, an image task (Image View Tasks) learning module, a voxel task (Voxel Tasks) learning module, and a point task (Point Tasks) learning module. Each module may be implemented by one or more neural network models.
The multi-sensor feature extraction module supports both single-sensor and multi-sensor fusion methods; that is, the input may be data collected by a single sensor or data collected separately by multiple sensors. A sensor may be, for example, at least one of an image capture device or a point cloud collection device. The multi-sensor feature extraction module includes an image feature extraction module (Image Feature Extraction), a point cloud feature extraction module (Point Feature Extraction), a voxel feature extraction module (Voxel Feature Extraction), an image spatial fusion module (Image Spatial Fusion), a point cloud spatial fusion module (Point Spatial Fusion), a voxel spatial fusion module (Voxel Spatial Fusion), and a point cloud-voxel fusion module (Point-Voxel Fusion). The image spatial fusion module fuses point cloud features into image features; the point cloud spatial fusion module fuses image features into point features; the voxel spatial fusion module fuses image features into voxel features; and the point cloud-voxel fusion module fuses point features into voxel features and voxel features into point features. The temporal fusion module fuses the features obtained from images of different frames, i.e., performs concatenation along the feature dimension.
The temporal fusion module fuses the temporal information of the features across frames. For image features, the features may be concatenated along the pixel (Pixel) dimension, for example a pixel-dimension concatenation (Concate), or a correlation operation may be performed on two features. For point cloud features, similar to FlowNet3D, a feature extraction operation over each point's neighborhood may be performed, which is analogous to the correlation operation. Operations on voxel features are similar to those on image features, except that voxel feature operations process three-dimensional data.
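As a non-limiting illustration, a pixel-dimension concatenation and a simplified per-pixel correlation over two (C, H, W) feature maps might be sketched as follows; the correlation here is a plain channel-wise dot product, a simplified stand-in for the correlation operators used in optical-flow-style networks:

```python
import numpy as np

def pixel_concat(f1, f2):
    """Concatenate two (C, H, W) feature maps along the channel axis."""
    return np.concatenate([f1, f2], axis=0)

def correlation(f1, f2):
    """Per-pixel correlation of two feature maps: the channel-wise dot
    product at each spatial location."""
    return (f1 * f2).sum(axis=0)   # shape (H, W)

a = np.random.rand(8, 16, 16)
b = np.random.rand(8, 16, 16)
stacked = pixel_concat(a, b)       # (16, 16, 16)
corr = correlation(a, b)           # (16, 16)
```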
In some embodiments, multi-sensor multi-task fusion may be performed by the object recognition system, mainly including the following steps (a sketch of the overall flow is given after the list):
Step 1: Input the images and point clouds of the preceding and following frames;
Step 2: Input the image and point cloud at each moment into the multi-sensor feature extraction module;
Step 3: The multi-sensor feature extraction module outputs the image features, point features, and voxel features at each moment;
Step 4: Perform temporal fusion on the image features, point features, and voxel features output by the multi-sensor feature extraction module, to obtain three kinds of temporal features, namely image temporal features, point temporal features, and voxel temporal features;
Step 5: Input the three kinds of temporal features obtained in Step 4 into the multi-sensor feature extraction module and perform feature fusion again, to obtain the final image features, the final point features, and the final voxel features;
Step 6: Based on the final image features (Final ImageView Feature), the final point features (Final Point Feature), and the final voxel features (Final Voxel Feature), perform task learning at the image level, the point level, and the voxel level.
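The six steps above can be summarized as executable pseudocode. In the following non-limiting sketch, every helper (extract_features, temporal_fuse) is a hypothetical stand-in for the FIG. 4 modules, returning random fixed-size features purely so the control flow runs:

```python
import numpy as np

def extract_features(image, cloud):
    # Stand-in for the multi-sensor feature extraction module:
    # returns (image feature, point feature, voxel feature) for one moment.
    return np.random.rand(8), np.random.rand(8), np.random.rand(8)

def temporal_fuse(features):
    # Stand-in for the temporal fusion module: feature-dimension concatenation.
    return np.concatenate(features)

def multi_sensor_multi_task(frames):
    per_frame = [extract_features(img, cloud) for img, cloud in frames]  # steps 1-3
    img_seq = temporal_fuse([f[0] for f in per_frame])                   # step 4
    pt_seq = temporal_fuse([f[1] for f in per_frame])
    vox_seq = temporal_fuse([f[2] for f in per_frame])
    # Step 5: a second fusion pass, here modeled as one more concatenation.
    final = temporal_fuse([img_seq, pt_seq, vox_seq])
    return final  # step 6 would feed this to the image/point/voxel task heads

frames = [(np.zeros((3, 4, 4)), np.zeros((16, 3))) for _ in range(2)]
features = multi_sensor_multi_task(frames)
```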
With the object recognition system proposed in the above embodiments, different feature representations are adopted, i.e., multiple kinds of features, such as image features and point cloud features, are used and fused with one another, which improves the effectiveness of feature learning. Since the features can be obtained from data collected by different types of sensors, multi-sensor fusion is achieved and the robustness of the algorithm is improved: the multi-sensor feature extraction module can select the sensor feature inputs according to the validity of the sensors, i.e., the data collected by valid sensors can be selected as the input data of the multi-sensor feature extraction module. For example, if the camera fails, the data collected by the lidar can be used for the point tasks and the voxel tasks. A camera failure may mean that the camera is not functioning normally; a valid sensor may be a sensor that is functioning normally. Since the tasks cover low-level to high-level tasks, the effectiveness of task learning is improved. During training, all tasks may be trained together to improve the performance of the target task, and, according to business needs, only the network branch corresponding to the required task may be output during the inference (Inference) stage, to reduce the amount of computation. Inference is deep learning applying the capability learned during training to actual work; the inference stage can be understood as the stage in which the trained model is used. The object recognition system and object recognition method proposed above can be applied to autonomous driving perception algorithms: for autonomous vehicles equipped with cameras and lidars, tasks such as object detection, semantic segmentation, and scene flow estimation can be achieved. The results of scene flow estimation and semantic segmentation can serve as cues for non-deep-learning object detection methods based on point clouds, such as the cost term of clustering in cluster-based object detection.
It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In some embodiments, as shown in FIG. 5, an object recognition apparatus is provided, including: a current scene image acquisition module 502, an initial point cloud feature obtaining module 504, a target image feature obtaining module 506, a target point cloud feature obtaining module 508, a position determination module 510, and a motion control module 512, wherein:
The current scene image acquisition module 502 is configured to acquire the current scene image and the current scene point cloud corresponding to the target moving object.
The initial point cloud feature obtaining module 504 is configured to perform image feature extraction on the current scene image to obtain initial image features, and perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features.
The target image feature obtaining module 506 is configured to acquire the target image position corresponding to the current scene image, and fuse the initial image features based on the point cloud features corresponding to the target image position among the initial point cloud features, to obtain the target image features.
The target point cloud feature obtaining module 508 is configured to acquire the target point cloud position corresponding to the current scene point cloud, and fuse the initial point cloud features based on the image features corresponding to the target point cloud position among the initial image features, to obtain the target point cloud features.
The position determination module 510 is configured to determine the object position corresponding to the scene object based on the target image features and the target point cloud features.
The motion control module 512 is configured to control the target moving object to move based on the position corresponding to the scene object.
在一些实施例中,目标图像特征得到模块506,包括:In some embodiments, the target image feature obtaining module 506 includes:
第一转换位置得到单元,用于根据点云坐标系与图像坐标系之间的坐标转换关系,将目标点云位置转换为图像坐标系中的位置,得到第一转换位置。The first conversion position obtaining unit is used for converting the target point cloud position into the position in the image coordinate system according to the coordinate conversion relationship between the point cloud coordinate system and the image coordinate system, so as to obtain the first conversion position.
目标图像特征得到单元,用于获取第一转换位置与目标图像位置的第一重合位置,将初始点云特征中,第一重合位置对应的点云特征,融合到初始图像特征中第一重合位置对应的图像特征中,得到目标图像特征。The target image feature obtaining unit is used to obtain the first coincidence position of the first conversion position and the target image position, and fuse the point cloud feature corresponding to the first coincidence position in the initial point cloud feature into the first coincidence position in the initial image feature From the corresponding image features, the target image features are obtained.
在一些实施例中,目标点云特征得到模块508,包括:In some embodiments, the target point cloud feature obtaining module 508 includes:
第二转换位置得到单元,用于根据图像坐标系与点云坐标系之间的坐标转换关系,将目标图像位置转换为点云坐标系中的位置,得到第二转换位置。The second conversion position obtaining unit is used for converting the target image position into a position in the point cloud coordinate system according to the coordinate conversion relationship between the image coordinate system and the point cloud coordinate system to obtain the second conversion position.
目标点云特征得到单元,用于获取第二转换位置与目标点云位置的第二重合位置,将初始图像特征中,第二重合位置对应的图像特征,融合到初始点云特征中第二重合位置对应的点云特征中,得到目标点云特征。The target point cloud feature obtaining unit is used to obtain the second coincidence position between the second conversion position and the target point cloud position, and fuse the image features corresponding to the second coincidence position in the initial image feature into the second coincidence position in the initial point cloud feature. In the point cloud feature corresponding to the position, the target point cloud feature is obtained.
在一些实施例中,目标点云特征得到单元,还用于对当前场景点云进行体素化,得到体素化结果;根据体素化结果进行体素特征提取,得到初始体素特征;获取第二转换位置与目标点云位置的第二重合位置,将初始图像特征中,第二重合位置对应的图像特征,融合到初始点云特征中第二重合位置对应的点云特征中,得到中间点云特征;获取当前场景点云对应的目标体素位置,根据体素坐标系与点云坐标系之间的坐标转换关系,将目标体素位置转换为点云坐标系中的位置,得到第三转换位置;及获取第三转换位置与目标体素位置的第三重合位置,将初始体素特征中,第三重合位置对应的体素特征,融合到中间点云特征中第三转换位置对应的点云特征中,得到目标点云特征。In some embodiments, the target point cloud feature obtaining unit is further configured to voxelize the current scene point cloud to obtain a voxelization result; perform voxel feature extraction according to the voxelization result to obtain an initial voxel feature; obtain The second conversion position and the second coincidence position of the target point cloud position, the image features corresponding to the second coincidence position in the initial image features are fused into the point cloud features corresponding to the second coincidence position in the initial point cloud features, and the middle point is obtained. Point cloud feature; obtain the target voxel position corresponding to the point cloud of the current scene, and convert the target voxel position to the position in the point cloud coordinate system according to the coordinate transformation relationship between the voxel coordinate system and the point cloud coordinate system, and obtain the first Three transformation positions; and obtaining the third overlapping position of the third transformation position and the target voxel position, and merging the voxel features corresponding to the third overlapping position in the initial voxel features into the third transformation in the intermediate point cloud feature In the point cloud feature corresponding to the position, the target point cloud feature is obtained.
In some embodiments, the apparatus further includes:
The voxelization result obtaining module is used to voxelize the current scene point cloud to obtain a voxelization result.
The initial voxel feature obtaining module is used to perform voxel feature extraction on the voxelization result to obtain initial voxel features.
The fourth converted position obtaining module is used to obtain target voxel positions corresponding to the current scene point cloud, and to convert the target image position into a position in the voxel coordinate system according to the coordinate conversion relationship between the image coordinate system and the voxel coordinate system, to obtain a fourth converted position.
The target voxel feature obtaining module is used to obtain a fourth overlapping position between the fourth converted position and the voxel positions, and to fuse the image features corresponding to the fourth overlapping position in the initial image features into the voxel features corresponding to the fourth overlapping position in the initial voxel features, to obtain target voxel features.
In some embodiments, the position determination module 510 includes:
The associated scene image acquisition unit is used to acquire an associated scene image corresponding to the current scene image, and an associated scene point cloud corresponding to the current scene point cloud.
The associated image feature acquisition unit is used to acquire associated image features corresponding to the associated scene image, and associated point cloud features corresponding to the associated scene point cloud.
The target image temporal feature obtaining unit is used to perform feature fusion on the target image features and the associated image features in chronological order, to obtain target image temporal features.
The target point cloud temporal feature obtaining unit is used to perform feature fusion on the target point cloud features and the associated point cloud features in chronological order, to obtain target point cloud temporal features.
The position determination unit is used to determine the object position corresponding to the scene object based on the target image temporal features and the target point cloud temporal features.
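The embodiments leave the temporal fusion operator open; purely as one hedged illustration of "feature fusion in chronological order", an exponential moving average over frame-wise feature maps (a learned GRU/ConvLSTM or channel concatenation followed by a convolution would equally fit):

```python
import numpy as np

def fuse_temporal(feature_frames, decay=0.5):
    """Fuse a chronologically ordered list (oldest -> newest) of feature maps
    with an exponential moving average, so recent frames weigh more."""
    fused = np.asarray(feature_frames[0], dtype=float)
    for feat in feature_frames[1:]:
        fused = decay * fused + (1.0 - decay) * np.asarray(feat, dtype=float)
    return fused

# Hypothetical usage: associated (earlier) frame features, then the current one.
# temporal_feats = fuse_temporal([assoc_feats_t2, assoc_feats_t1, target_feats])
```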
In some embodiments, the position determination module 510 includes:
The target combined position obtaining unit is used to determine a combined position between the target image features and the target point cloud features, to obtain a target combined position.
The object position obtaining unit is used to take the target combined position as the object position corresponding to the scene object.
For specific limitations on the object recognition apparatus, reference may be made to the limitations on the object recognition method above, which are not repeated here. Each module in the above object recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in a computer device in the form of hardware, or stored in a memory in the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to each of the above modules.
In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store data such as the current scene image, the current scene point cloud, point cloud features, image features, and voxel features. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions, when executed by the processor, implement an object recognition method.
Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps of the above object recognition method.
One or more computer storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above object recognition method.
The computer storage medium is a readable storage medium, which may be non-volatile or volatile.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims (10)

  1. An object recognition method, comprising:
    acquiring a current scene image and a current scene point cloud corresponding to a target moving object;
    performing image feature extraction on the current scene image to obtain initial image features, and performing point cloud feature extraction on the current scene point cloud to obtain initial point cloud features;
    acquiring a target image position corresponding to the current scene image, and performing fusion processing on the initial image features based on point cloud features that correspond to the target image position among the initial point cloud features, to obtain target image features;
    acquiring a target point cloud position corresponding to the current scene point cloud, and performing fusion processing on the initial point cloud features based on image features that correspond to the target point cloud position among the initial image features, to obtain target point cloud features;
    determining an object position corresponding to a scene object based on the target image features and the target point cloud features; and
    controlling the target moving object to move based on the position corresponding to the scene object.
  2. The method according to claim 1, wherein acquiring the target image position corresponding to the current scene image, and performing fusion processing on the initial image features based on the point cloud features that correspond to the target image position among the initial point cloud features, to obtain the target image features, comprises:
    converting the target point cloud position into a position in the image coordinate system according to a coordinate conversion relationship between the point cloud coordinate system and the image coordinate system, to obtain a first converted position; and
    acquiring a first overlapping position between the first converted position and the target image position, and fusing the point cloud features corresponding to the first overlapping position in the initial point cloud features into the image features corresponding to the first overlapping position in the initial image features, to obtain the target image features.
  3. The method according to claim 1, wherein acquiring the target point cloud position corresponding to the current scene point cloud, and performing fusion processing on the initial point cloud features based on the image features that correspond to the target point cloud position among the initial image features, to obtain the target point cloud features, comprises:
    converting the target image position into a position in the point cloud coordinate system according to a coordinate conversion relationship between the image coordinate system and the point cloud coordinate system, to obtain a second converted position; and
    acquiring a second overlapping position between the second converted position and the target point cloud position, and fusing the image features corresponding to the second overlapping position in the initial image features into the point cloud features corresponding to the second overlapping position in the initial point cloud features, to obtain the target point cloud features.
  4. The method according to claim 3, wherein acquiring the second overlapping position between the second converted position and the target point cloud position, and fusing the image features corresponding to the second overlapping position in the initial image features into the point cloud features corresponding to the second overlapping position in the initial point cloud features, to obtain the target point cloud features, comprises:
    voxelizing the current scene point cloud to obtain a voxelization result;
    performing voxel feature extraction on the voxelization result to obtain initial voxel features;
    acquiring the second overlapping position between the second converted position and the target point cloud position, and fusing the image features corresponding to the second overlapping position in the initial image features into the point cloud features corresponding to the second overlapping position in the initial point cloud features, to obtain intermediate point cloud features;
    acquiring target voxel positions corresponding to the current scene point cloud, and converting the target voxel positions into positions in the point cloud coordinate system according to a coordinate conversion relationship between the voxel coordinate system and the point cloud coordinate system, to obtain a third converted position; and
    acquiring a third overlapping position between the third converted position and the target voxel positions, and fusing the voxel features corresponding to the third overlapping position in the initial voxel features into the point cloud features corresponding to the third converted position in the intermediate point cloud features, to obtain the target point cloud features.
  5. The method according to claim 1, wherein the method further comprises:
    voxelizing the current scene point cloud to obtain a voxelization result;
    performing voxel feature extraction on the voxelization result to obtain initial voxel features;
    acquiring target voxel positions corresponding to the current scene point cloud, and converting the target image position into a position in the voxel coordinate system according to a coordinate conversion relationship between the image coordinate system and the voxel coordinate system, to obtain a fourth converted position; and
    acquiring a fourth overlapping position between the fourth converted position and the voxel positions, and fusing the image features corresponding to the fourth overlapping position in the initial image features into the voxel features corresponding to the fourth overlapping position in the initial voxel features, to obtain target voxel features.
  6. The method according to claim 1, wherein determining the object position corresponding to the scene object based on the target image features and the target point cloud features comprises:
    acquiring an associated scene image corresponding to the current scene image, and an associated scene point cloud corresponding to the current scene point cloud;
    acquiring associated image features corresponding to the associated scene image, and associated point cloud features corresponding to the associated scene point cloud;
    performing feature fusion on the target image features and the associated image features in chronological order, to obtain target image temporal features;
    performing feature fusion on the target point cloud features and the associated point cloud features in chronological order, to obtain target point cloud temporal features; and
    determining the object position corresponding to the scene object based on the target image temporal features and the target point cloud temporal features.
  7. The method according to claim 1, wherein determining the object position corresponding to the scene object based on the target image features and the target point cloud features comprises:
    determining a combined position between the target image features and the target point cloud features, to obtain a target combined position; and
    taking the target combined position as the object position corresponding to the scene object.
  8. An object recognition apparatus, comprising:
    a current scene image acquisition module, used to acquire a current scene image and a current scene point cloud corresponding to a target moving object;
    an initial point cloud feature obtaining module, used to perform image feature extraction on the current scene image to obtain initial image features, and to perform point cloud feature extraction on the current scene point cloud to obtain initial point cloud features;
    a target image feature obtaining module, used to acquire a target image position corresponding to the current scene image, and to perform fusion processing on the initial image features based on point cloud features that correspond to the target image position among the initial point cloud features, to obtain target image features;
    a target point cloud feature obtaining module, used to acquire a target point cloud position corresponding to the current scene point cloud, and to perform fusion processing on the initial point cloud features based on image features that correspond to the target point cloud position among the initial image features, to obtain target point cloud features;
    a position determination module, used to determine an object position corresponding to a scene object based on the target image features and the target point cloud features; and
    a motion control module, used to control the target moving object to move based on the position corresponding to the scene object.
  9. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 7.
  10. One or more computer storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/128125 2020-11-11 2020-11-11 Object identification method and apparatus, computer device, and storage medium WO2022099510A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080092994.8A CN115004259B (en) 2020-11-11 2020-11-11 Object recognition method, device, computer equipment and storage medium
PCT/CN2020/128125 WO2022099510A1 (en) 2020-11-11 2020-11-11 Object identification method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/128125 WO2022099510A1 (en) 2020-11-11 2020-11-11 Object identification method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022099510A1 true WO2022099510A1 (en) 2022-05-19

Family

ID=81601893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128125 WO2022099510A1 (en) 2020-11-11 2020-11-11 Object identification method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN115004259B (en)
WO (1) WO2022099510A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740669B (en) * 2023-08-16 2023-11-14 之江实验室 Multi-view image detection method, device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110045729A (en) * 2019-03-12 2019-07-23 广州小马智行科技有限公司 A kind of Vehicular automatic driving method and device
US10634793B1 (en) * 2018-12-24 2020-04-28 Automotive Research & Testing Center Lidar detection device of detecting close-distance obstacle and method thereof
CN111191600A (en) * 2019-12-30 2020-05-22 深圳元戎启行科技有限公司 Obstacle detection method, obstacle detection device, computer device, and storage medium
CN111563923A (en) * 2020-07-15 2020-08-21 浙江大华技术股份有限公司 Method for obtaining dense depth map and related device
CN111797734A (en) * 2020-06-22 2020-10-20 广州视源电子科技股份有限公司 Vehicle point cloud data processing method, device, equipment and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246287A (en) * 2023-03-15 2023-06-09 北京百度网讯科技有限公司 Target object recognition method, training device and storage medium
CN116246287B (en) * 2023-03-15 2024-03-22 北京百度网讯科技有限公司 Target object recognition method, training device and storage medium
CN116958766A (en) * 2023-07-04 2023-10-27 阿里巴巴(中国)有限公司 Image processing method

Also Published As

Publication number Publication date
CN115004259A (en) 2022-09-02
CN115004259B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
WO2022099510A1 (en) Object identification method and apparatus, computer device, and storage medium
CN111160302B (en) Obstacle information identification method and device based on automatic driving environment
CN110163904B (en) Object labeling method, movement control method, device, equipment and storage medium
Cheng et al. Noise-aware unsupervised deep lidar-stereo fusion
CN111223135B (en) System and method for enhancing range estimation by monocular cameras using radar and motion data
US11113526B2 (en) Training methods for deep networks
US20210183083A1 (en) Self-supervised depth estimation method and system
CN111191600A (en) Obstacle detection method, obstacle detection device, computer device, and storage medium
US20210097266A1 (en) Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision
KR20190087258A (en) Object pose estimating method and apparatus
JP7135665B2 (en) VEHICLE CONTROL SYSTEM, VEHICLE CONTROL METHOD AND COMPUTER PROGRAM
KR20210025942A (en) Method for stereo matching usiing end-to-end convolutional neural network
EP3992908A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
US11436839B2 (en) Systems and methods of detecting moving obstacles
US11321859B2 (en) Pixel-wise residual pose estimation for monocular depth estimation
US11443151B2 (en) Driving assistant system, electronic device, and operation method thereof
CN116469079A (en) Automatic driving BEV task learning method and related device
US11625905B2 (en) System and method for tracking occluded objects
CN116681739A (en) Target motion trail generation method and device and electronic equipment
US20230109473A1 (en) Vehicle, electronic apparatus, and control method thereof
US20230400863A1 (en) Information processing device, information processing system, method, and program
CN115346184A (en) Lane information detection method, terminal and computer storage medium
WO2022127451A1 (en) Method and apparatus for determining spatial state of elevator, and storage medium
CN115703234A (en) Robot control method, robot control device, robot, and storage medium
WO2019188392A1 (en) Information processing device, information processing method, program, and moving body

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961069

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.10.2023)