WO2022193193A1

WO2022193193A1 - Data processing method and device

Info

Publication number: WO2022193193A1
Application number: PCT/CN2021/081387
Authority: WO
Inventors: 江灿森; 陈琦; 衡量; 沈劭劼
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2022-09-22
Also published as: CN116762094A

Abstract

A data processing method and a device. The method comprises: when a movable platform moves in a space scene, controlling an image sensor, which is located on the movable platform, to capture multi-frame image data of the space scene; processing the multi-frame image data to obtain map metadata, wherein the map metadata comprises any one or a combination of a three-dimensional feature point, texture data, and semantic information; determining whether the map metadata meets a mapping quality requirement; and if the map metadata meets the mapping quality requirement, generating map data according to the map metadata, wherein the map data is used for controlling the movable platform to move in the space scene. By means of this solution, a customization requirement of a mobile platform for map data can be met, and the accuracy of the generated map data can also be ensured.

Description

Data processing method and device

Technical field

The present application relates to the technical field of automatic driving, and in particular, to a data processing method and device.

Background art

High-precision map data is an important basis for vehicle autonomous driving and is of great significance to the development of intelligent vehicles.

High-precision map data is usually provided by map suppliers. Map suppliers usually provide only the high-precision map data that is in relatively high demand, and do not provide high-precision map data that is in relatively low demand.

However, smart cars have different requirements for high-precision maps, and the existing high-precision map supply methods cannot adapt to the personalized map requirements of smart cars.

Summary of the invention

The embodiments of the present application provide a data processing method and device, aiming to provide a solution that can adapt to the personalized map requirements of different mobile platforms.

In a first aspect, the present application provides a data processing method, comprising:

when the movable platform moves in a space scene, controlling an image sensor located on the movable platform to collect multiple frames of image data of the space scene;

processing the multi-frame image data to obtain map metadata, wherein the map metadata includes any one or a combination of three-dimensional feature points, texture data and semantic information;

determining whether the map metadata meets the mapping quality requirement; and

if the map metadata meets the mapping quality requirement, generating map data according to the map metadata, wherein the map data is used to control the movable platform to move within the space scene.

In a second aspect, the present application provides a control device, comprising: a memory for storing instructions and a processor for executing the instructions stored in the memory, where the processor is used to specifically execute:

Judging whether the map metadata meets the quality requirements for mapping;

In a third aspect, the present application provides a movable platform including an image sensor and the control device of the second aspect.

To sum up, the embodiments of the present application provide a data processing method and device. The movable platform collects image data of a space scene while moving, generates map data based on the image data, and then uses the generated map data to control its movement, which meets the movable platform's personalized requirements for map data. Because the map data is generated from image data, the image sensors already present on the movable platform can be used, and there is no need to configure high-cost sensors such as lidar to collect point clouds, which reduces the cost of map construction. Moreover, the map data is generated according to the map metadata only after the map metadata is judged to meet the mapping quality requirement, which ensures the accuracy of the map data generated by the movable platform. In addition, because the map data is generated from image data, its storage footprint is lighter, which is convenient for real-time update and maintenance of the map.

Description of the drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the accompanying drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a movable platform according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a data processing method provided by another embodiment of the present application;

FIG. 3 is a schematic flowchart of a data processing method provided by another embodiment of the present application;

FIG. 4 is a schematic flowchart of a data processing method provided by another embodiment of the present application;

FIG. 5 is a schematic flowchart of a data processing method provided by another embodiment of the present application;

FIG. 6 is a schematic structural diagram of a control device provided by another embodiment of the present application.

Detailed description

In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.

It should be noted that when a component is referred to as being "fixed to" another component, it can be directly on the other component or an intervening component may be present. When a component is considered to be "connected" to another component, it may be directly connected to the other component or an intervening component may be present at the same time.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein in the specification of the application are for the purpose of describing specific embodiments only, and are not intended to limit the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and features in the embodiments may be combined with each other without conflict.

The existing high-precision map supply methods cannot meet the personalized map requirements of smart cars. To solve this technical problem, the present application provides a data processing method and device. The technical idea of the present application is as follows: the movable platform collects image data of the space scene while moving in the space scene, and the collected data is used to generate the map data, which can adapt to the personalized needs of the movable platform for map data. Before the map data is generated, it is judged whether the map metadata meets the mapping quality requirement, so as to ensure that the map data generated by the movable platform accurately reflects the space scene. In addition, because the map data is generated based on the image data collected by the image sensor, no high-cost sensor is needed, and the cost of map construction is reduced.

As shown in FIG. 1, an embodiment of the present application provides a movable platform 100. The movable platform 100 includes an image sensor 101, a driving sensor (not shown) and a control device (not shown). The image sensor 101 is used to collect image data of the scene around the movable platform 100, the driving sensor is used to collect driving data of the movable platform, and the control device is used to execute the data processing method described below, which is not repeated here.

This application can be used to solve the automatic parking problem in the automatic driving function, and can be used for map construction in the process of short-distance automatic parking, for example, within 300 meters. The map is mainly used to record various landmarks in the parking lot, including parking spaces, traffic signs, road lane lines, landmark buildings, etc. After the map is constructed, it can assist in the realization of functions such as parking lot location recognition in the process of automatic parking, automatic search for parking spaces in the map area, and locating vehicles at any location in the map area.

As shown in Figure 2, the application provides a data processing method, the execution subject of the method is a control device, and the method specifically includes the following steps:

S201. When the movable platform moves in the space scene, the control device controls the image sensor located on the movable platform to collect multiple frames of image data of the space scene.

Wherein, when the movable platform enters a certain space scene, the image sensor on the movable platform is controlled to work, the image sensor collects image data of the space scene, and the image sensor transmits the image data to the control device.

Preferably, the driving sensor collects the position information of the movable platform and transmits it to the control device. The control device determines, according to the position information, whether the movable platform has entered a designated space scene. When it determines that the movable platform has entered the designated space scene, it controls the image sensor on the movable platform to work, and the image sensor transmits the collected multi-frame image data to the control device.
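The position-based trigger can be illustrated with a minimal sketch. It assumes that the designated space scene is described by an axis-aligned rectangle in a local planar coordinate frame; the names `SceneRegion` and `should_capture` and the region dimensions are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneRegion:
    """Hypothetical axis-aligned region (metres, local planar frame)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def should_capture(position, region):
    """Return True when the reported position lies inside the designated scene."""
    x, y = position
    return region.contains(x, y)


# Assumed example: a 120 m x 80 m parking lot.
parking_lot = SceneRegion(0.0, 0.0, 120.0, 80.0)
capture_inside = should_capture((35.0, 20.0), parking_lot)
capture_outside = should_capture((150.0, 20.0), parking_lot)
```

In such a sketch the control device would start the image sensor only while `should_capture` returns True.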

S202, the control device processes the multi-frame image data to obtain map metadata.

The map metadata is used to generate map data, and the map metadata includes any one or a combination of three-dimensional feature points, texture data, and semantic information.

The three-dimensional feature points are used to reflect the position and shape of objects in the scene space, the texture data are used to reflect the surface information of the objects in the scene space, and the semantic information is used to reflect the categories of objects represented by the texture data and the three-dimensional feature points.

The above-mentioned map metadata is obtained by extracting two-dimensional feature data from image data, matching two-dimensional feature data in multi-frame image data, and semantic recognition.

Further, to determine whether the map metadata meets the quality requirements for mapping, step S203 enumerates a specific implementation:

S203. If the map metadata meets the quality requirements for mapping, the control device generates map data according to the map metadata.

The mapping quality requirement is used to determine whether the above-mentioned map metadata is rich enough, that is, whether the quantity of map metadata is sufficient, and whether the data types of the map metadata are sufficient.

If the map metadata is rich enough, the quality of the map data constructed using the map metadata will be higher, that is, the map data will more accurately describe the spatial scene. If the data volume of the map metadata is small and the types are single, the quality of the map data constructed by using the map metadata is low, that is, the map data cannot accurately describe the spatial scene.

When the map metadata meets the mapping quality requirements, the map metadata is processed to obtain map data, for example, each layer is obtained by image processing of the map metadata.

The map data is used to control the movement of the movable platform within the spatial scene. The control device can control the movable platform to move within the space scene at the current moment according to the map data generated at the previous moment. The control device can also control the movable platform to move in the space scene according to the generated map data when entering the space scene again next time.

In the above technical solution, the movable platform collects image data while moving, generates map data based on the image data, and then uses the generated map data to control its movement. This meets the personalized map data requirements of the movable platform, and the existing image sensors of the movable platform can be used directly to collect data, with no need to configure high-cost sensors such as lidar. In addition, the map data is generated according to the map metadata only after the map metadata is judged to meet the mapping quality requirement, which ensures the accuracy of the generated map data.

As shown in FIG. 3 , another embodiment of the present application provides a data processing method, the execution subject of the method is a control device, and the method specifically includes the following steps:

S301. When the movable platform moves in the space scene, the control device controls the image sensor located on the movable platform to collect multiple frames of image data of the space scene.

A plurality of image sensors are arranged on the movable platform. The image sensors are located around the movable platform and are used to collect image data of the space scene in which the movable platform is located. Each frame of image data corresponds to the image sensor that collected it.

S302, the control device processes the multi-frame image data to obtain map metadata.

Wherein, this step has been described in detail in the above embodiments, and will not be repeated here.

S303, the control device uses the identification of the image sensor to mark the source of the map metadata, and obtains the marked map metadata.

Wherein, after multiple frames of image data are processed to obtain map metadata, the obtained map metadata is marked with an identifier of an image sensor that collects the image data. That is to mark the data source of the map metadata.
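Source marking can be sketched as attaching a sensor identifier to each metadata item. This is only an illustration: the record layout `MarkedMetadata`, the function `mark_source` and the identifier `cam_front` are hypothetical names, since the application does not specify a data format.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple


@dataclass(frozen=True)
class MarkedMetadata:
    kind: str       # "feature_point", "texture" or "semantic"
    value: Any      # payload, e.g. an (x, y, z) tuple or a category label
    sensor_id: str  # identifier of the image sensor that collected the image


def mark_source(items: Iterable[Tuple[str, Any]], sensor_id: str) -> List[MarkedMetadata]:
    """Attach the collecting sensor's identifier to every metadata item."""
    return [MarkedMetadata(kind, value, sensor_id) for kind, value in items]


# Metadata derived from images of an assumed front camera.
front_items = [("feature_point", (1.0, 2.0, 0.5)), ("semantic", "lane_line")]
marked = mark_source(front_items, sensor_id="cam_front")
```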

S304. If the marked map metadata meets the quality requirements for mapping, the control device generates marked map data according to the marked map metadata.

The mapping quality requirement is used to judge whether the marked map metadata is rich enough. The map metadata includes any one or a combination of three-dimensional feature points, texture data, and semantic information.

The mapping quality requirements include at least one of the following: the total number of three-dimensional feature points reaches a first number threshold; there are at least two three-dimensional feature points with different components on the three coordinate axes; the total number of texture data reaches a second number threshold; the number of types of texture data reaches a third number threshold; the total number of semantic information reaches a fourth number threshold; and the number of types of semantic information reaches a fifth number threshold.

Judging whether the total number of three-dimensional feature points reaches the first number threshold determines whether the three-dimensional feature points are sufficiently abundant in number. Judging whether there are at least two three-dimensional feature points with different components on the three coordinate axes determines whether the three-dimensional feature points are sufficiently rich in type. If all three-dimensional feature points lie on the same plane, that is, they all have the same component on one of the coordinate axes (for example, the same component in the z-axis direction), the three-dimensional feature points can only represent a plane and cannot represent a rich three-dimensional space scene.

Judging whether the total number of texture data reaches the second number threshold determines whether the texture data is sufficiently abundant in number. Judging whether the number of types of texture data reaches the third number threshold determines whether the texture data is sufficiently rich in type.

Judging whether the total number of semantic information reaches the fourth number threshold determines whether the semantic information is sufficiently abundant in number. Judging whether the number of types of semantic information reaches the fifth number threshold determines whether the semantic information is sufficiently rich in type.

For the above-mentioned map metadata, it is determined whether the map metadata meets the quality requirements of map construction in combination with its richness in quantity and type. The map data generated based on the map metadata can accurately reflect the spatial scene.
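The richness checks can be sketched as follows. This is an illustrative reading rather than the application's implementation: it applies all six criteria together (the application requires at least one of them), it interprets the coordinate-axis criterion as "the points do not share a single value on any axis", and the threshold values and the names `MapMetadata`, `QualityThresholds`, `spans_all_axes` and `meets_mapping_quality` are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MapMetadata:
    points: list = field(default_factory=list)     # (x, y, z) tuples
    textures: list = field(default_factory=list)   # (texture_type, payload)
    semantics: list = field(default_factory=list)  # (category, payload)


@dataclass(frozen=True)
class QualityThresholds:
    # All values below are assumed for illustration only.
    min_points: int = 4
    min_textures: int = 2
    min_texture_types: int = 2
    min_semantics: int = 2
    min_semantic_types: int = 2
    axis_tolerance: float = 1e-6


def spans_all_axes(points, tol):
    """True if the points differ on every coordinate axis (not a flat set)."""
    if len(points) < 2:
        return False
    for axis in range(3):
        values = [p[axis] for p in points]
        if max(values) - min(values) <= tol:
            return False
    return True


def meets_mapping_quality(meta, th):
    """Check number and type richness of each kind of map metadata."""
    checks = [
        len(meta.points) >= th.min_points,
        spans_all_axes(meta.points, th.axis_tolerance),
        len(meta.textures) >= th.min_textures,
        len({t for t, _ in meta.textures}) >= th.min_texture_types,
        len(meta.semantics) >= th.min_semantics,
        len({c for c, _ in meta.semantics}) >= th.min_semantic_types,
    ]
    return all(checks)


textures = [("brick", None), ("paint", None)]
semantics = [("lane_line", None), ("parking_space", None)]
rich = MapMetadata([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)], textures, semantics)
flat = MapMetadata([(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)], textures, semantics)
rich_ok = meets_mapping_quality(rich, QualityThresholds())
flat_ok = meets_mapping_quality(flat, QualityThresholds())
```

The `flat` example fails only because all of its points share the same z component, which mirrors the planar case described above.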

After obtaining map metadata that meets the requirements of mapping quality, map data is generated according to the marked map metadata. The process of generating map data specifically includes at least one of the following:

A marked feature data layer is generated according to the marked three-dimensional feature points and marked texture data; and a marked semantic information layer is generated according to the marked semantic information.

Preferably, image processing is performed on the marked three-dimensional features and the marked texture data to obtain a marked feature data layer. The marked semantic information is imaged to obtain the marked semantic information layer.

In other embodiments, a parking space layer can be generated according to the marked semantic information. Specifically, semantic information representing parking spaces is extracted from the marked semantic information, and image processing is performed on that semantic information to generate the marked parking space layer.
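The grouping of marked metadata into layers can be sketched as below. The sketch only performs the grouping step and omits the image processing that renders each layer; the dictionary keys, the layer names and the category label `parking_space` are hypothetical.

```python
from collections import defaultdict

PARKING_CATEGORY = "parking_space"  # hypothetical category label


def build_layers(marked_items):
    """Group marked metadata records (dicts) into named layers."""
    layers = defaultdict(list)
    for item in marked_items:
        if item["kind"] in ("feature_point", "texture"):
            layers["feature_data"].append(item)
        elif item["kind"] == "semantic":
            layers["semantic_info"].append(item)
            # Parking spaces additionally go into their own layer.
            if item["value"] == PARKING_CATEGORY:
                layers["parking_space"].append(item)
    return dict(layers)


items = [
    {"kind": "feature_point", "value": (1.0, 2.0, 0.5), "sensor_id": "cam_front"},
    {"kind": "texture", "value": "brick", "sensor_id": "cam_front"},
    {"kind": "semantic", "value": "lane_line", "sensor_id": "cam_rear"},
    {"kind": "semantic", "value": "parking_space", "sensor_id": "cam_rear"},
]
layers = build_layers(items)
```

Each record keeps its `sensor_id`, so every layer remains marked with its data source.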

The marked map data includes marking information that indicates the source of the data. When the map data is used, it can be filtered according to its marking information and the moving direction, and the filtered map data is then used to control the movement of the movable platform. This reduces the amount of data processed when using the map data, so that the movable platform can generate control instructions from the map data more quickly.

In the above technical solution, whether the map metadata meets the mapping quality requirement is judged according to the richness of each kind of map metadata in number and type, so that map data that accurately reflects the space scene can be obtained from the map metadata. In addition, source marking is performed on the map metadata, so that the resulting map data also reflects the data source. When the map data is used, it can be filtered according to the data source to reduce the amount of data processed, so that control instructions can be generated quickly from the map data to control the precise movement of the movable platform.

As shown in FIG. 4 , another embodiment of the present application provides a data processing method, the execution subject of the method is a control device, and the method specifically includes the following steps:

S401. When the movable platform moves in the space scene, the control device controls the image sensor located on the movable platform to collect multiple frames of image data of the space scene.

S402, the control device processes the multi-frame image data to obtain map metadata.

S403, the control device uses the identification of the image sensor to mark the source of the map metadata, and obtains the marked map metadata.

S404. If the marked map metadata meets the quality requirements for mapping, the control device generates marked map data according to the marked map metadata.

Among them, S401 to S404 have been described in detail in the above embodiments, and will not be repeated here.

S405, the control device acquires the moving direction of the movable platform and the real-time data collected by the image sensor.

Wherein, after the control device generates the marked map data, the control device can control the movable platform to move in the space scene at the current moment according to the map data generated at the previous moment. It is also possible to control the movable platform to move in the space scene according to the generated map data when entering the space scene again next time.

When the movable platform moves in the space scene, the moving direction of the movable platform is collected by the driving sensor, and the real-time data of the space scene is collected by the image sensor. The control device controls the movable platform to move in the space scene in real time based on the moving direction, real-time data and map data of the movable platform.

S406, the control device acquires target data matching the moving direction from the marked map data according to the marking information.

The marker information of the map data is used to reflect the data source of the map data, that is, the image sensor that collects the image data corresponding to the map data can also be determined. The installation position of the image sensor on the movable platform is fixed, and then the position information of the image sensor that collects the image data corresponding to the map data can be determined according to the marker information. The target data is selected from the map data in combination with the moving direction of the movable platform and the above-mentioned position information.

More specifically, if the movable platform is moving forward, the map data derived from the image sensor installed at the front of the movable platform is obtained from the map data as the target data. If the movable platform is moving backward, the map data derived from the image sensor installed at the rear of the movable platform is obtained from the map data as the target data.
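The selection of target data by moving direction can be sketched as a lookup from sensor identifier to installation position. The identifiers, the `SENSOR_MOUNT` table and the function `select_target_data` are hypothetical; the application only states that the installation position of each image sensor is fixed.

```python
# Hypothetical table: where each image sensor is installed on the platform.
SENSOR_MOUNT = {
    "cam_front": "front",
    "cam_rear": "rear",
    "cam_left": "left",
    "cam_right": "right",
}

# Installation position whose data matches each moving direction.
DIRECTION_TO_MOUNT = {"forward": "front", "backward": "rear"}


def select_target_data(marked_map_data, moving_direction, sensor_mount=SENSOR_MOUNT):
    """Keep only the map elements collected by sensors facing the moving direction."""
    wanted = DIRECTION_TO_MOUNT[moving_direction]
    return [e for e in marked_map_data if sensor_mount.get(e["sensor_id"]) == wanted]


map_data = [
    {"id": "pillar_7", "sensor_id": "cam_front"},
    {"id": "sign_2", "sensor_id": "cam_rear"},
    {"id": "wall_1", "sensor_id": "cam_left"},
]
forward_targets = select_target_data(map_data, "forward")
backward_targets = select_target_data(map_data, "backward")
```

Only the selected subset is matched against the real-time data, which is where the reduction in data processing comes from.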

S407, the control device generates a control instruction according to the real-time data and the target data.

The real-time data collected by the image sensor is also image data. The control device performs feature extraction on this image data, determines the location information of the movable platform according to the processed real-time data and the map data, and then generates, based on the location information and the map data, control instructions for controlling the movement of the movable platform, so that the movable platform moves within the space scene under the control of the control instructions.

When the location information of the movable platform is determined according to the processed real-time data and the map data, the processed real-time data and the map data are matched to obtain a matching result, and the location information of the movable platform is determined according to the successfully matched map data.

In another embodiment, after the control device matches the real-time data with the target data to obtain a matching result, the reliability value of the target data is set according to the matching result.

More specifically, if the matching result is that the matching is successful, the reliability value of the target data is set to the first reliability value, and if the matching result is that the matching fails, the reliability value of the target data is set to the second reliability value. Wherein, the first reliability value is greater than the second reliability value.

After obtaining the reliability values of the target data, the control device computes statistics over them to obtain a reliability statistical result. When the reliability statistical result meets the low reliability condition, the target data is deleted, thereby optimizing the map data.

If the reliability statistical result is the average of the reliability values, the low reliability condition is that the average is smaller than a preset average value.
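The reliability bookkeeping can be sketched as follows, using the average as the statistical result. The concrete values 1.0, 0.0 and 0.5 are assumptions chosen only so that the first reliability value is greater than the second; the class `ReliabilityTracker` and its methods are hypothetical names. Elements that have never been matched are kept, which is also an assumption.

```python
from collections import defaultdict
from statistics import mean

FIRST_RELIABILITY = 1.0   # assumed value set when matching succeeds
SECOND_RELIABILITY = 0.0  # assumed value set when matching fails
PRESET_AVERAGE = 0.5      # assumed preset average value


class ReliabilityTracker:
    """Records a reliability value per match attempt and prunes weak map data."""

    def __init__(self):
        self._values = defaultdict(list)

    def record(self, element_id, matched):
        value = FIRST_RELIABILITY if matched else SECOND_RELIABILITY
        self._values[element_id].append(value)

    def is_low_reliability(self, element_id, preset_average=PRESET_AVERAGE):
        values = self._values.get(element_id)
        if not values:
            return False  # never observed: keep the element
        return mean(values) < preset_average

    def prune(self, map_data, preset_average=PRESET_AVERAGE):
        """Return the map data without elements meeting the low reliability condition."""
        return {
            key: value
            for key, value in map_data.items()
            if not self.is_low_reliability(key, preset_average)
        }


tracker = ReliabilityTracker()
for matched in (True, True, False):
    tracker.record("pillar_7", matched)
for matched in (False, False, True):
    tracker.record("sign_2", matched)

map_data = {"pillar_7": "...", "sign_2": "...", "wall_1": "..."}
optimized = tracker.prune(map_data)
```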

In the above technical solution, after generating the map data with source marks, the control device selects target data from the map data according to the marking information, and controls the movable platform to move according to the target data and the real-time data collected by the image sensor. Obtaining the target data by filtering reduces the amount of data processed when using the map data, so the control device can generate control instructions more quickly and the movable platform can move reliably in the space scene. In addition, when the control device matches the real-time data with the target data, the target data is marked with the matching result, which allows the map data to be optimized.

The data processing method provided by the present application is described below by taking the movable platform as an intelligent car as an example. The execution subject of the method is a control device in the intelligent car, such as a trip computer, and the method specifically includes the following steps:

S501. When the movable platform moves in the space scene, the control device controls the image sensor located on the movable platform to collect multiple frames of image data of the space scene.

The smart car is equipped with a monocular camera, such as a driving recorder, and with fisheye cameras installed around the smart car. These cameras are used to collect image data in a certain space scene, such as an underground parking lot. The smart car is also equipped with driving sensors, such as a low-precision inertial navigation unit, an odometer, GPS, etc.

The data processing method provided by the present application does not require the smart car to add new sensors, and the above sensors can be used to generate map data and control the driving of the smart vehicle.

S502, the control device processes the multi-frame image data to obtain map metadata.

After acquiring the multi-frame image data collected by the cameras and the driving data collected by the driving sensors, the control device processes the multi-frame image data to obtain map metadata. Specifically, this includes the following steps:

S5001. Use the above driving data to calculate the inter-frame pose between two frames of image data.

Visual-inertial odometry (VIO) and visual odometry (VO) algorithms can be used to process the image data and the data collected by the driving sensors to estimate the inter-frame pose between two frames of image data. The inter-frame pose serves as the basis for image data processing. It is also possible to estimate the inter-frame pose using the driving sensors alone, for example by integrating the data collected by the odometer and the inertial measurement unit.
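The sensor-only estimate can be illustrated with a planar dead-reckoning sketch. It assumes that the odometer provides a forward speed and the inertial measurement unit provides a yaw rate, sampled between the two frame timestamps, and that motion is confined to a plane; the function name and sample format are hypothetical, and a real system would also handle noise, bias and full three-dimensional motion.

```python
import math


def integrate_inter_frame_pose(samples):
    """Integrate (dt, speed, yaw_rate) samples into a planar pose change.

    samples: iterable of (dt_seconds, speed_m_per_s, yaw_rate_rad_per_s).
    Returns (dx, dy, dyaw) expressed in the coordinate frame of the first image.
    """
    x = y = yaw = 0.0
    for dt, speed, yaw_rate in samples:
        # Use the heading at the middle of the interval (midpoint rule).
        mid_yaw = yaw + 0.5 * yaw_rate * dt
        x += speed * dt * math.cos(mid_yaw)
        y += speed * dt * math.sin(mid_yaw)
        yaw += yaw_rate * dt
    return x, y, yaw


# Straight motion: 1 m/s for 1 s.
straight = integrate_inter_frame_pose([(0.1, 1.0, 0.0)] * 10)

# Quarter-circle turn: 1 m/s while turning at pi/2 rad/s for 1 s.
turn = integrate_inter_frame_pose([(0.001, 1.0, math.pi / 2)] * 1000)
```

For the turn, the arc radius is speed divided by yaw rate, so the platform ends near (2/π, 2/π) with a heading change of π/2.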

S5002. Extract two-dimensional feature data from the multi-frame image data.

Feature extraction is performed on the image data collected by the monocular camera and on the image data collected by the fisheye camera. Preferably, geometric features are extracted from the image data, such as object edges, corners, planes, salient points, special textures, and the like. Texture data, gradient data, pixel color and other data are also extracted from the image data. This feature information is stable over time, viewing angle and scale, and can be observed stably and consistently at different angles, distances, and time periods.

When the feature extraction is performed on the image data to obtain the two-dimensional feature data, the extracted texture data, gradient data, pixel color, etc. are also used to encode the two-dimensional feature data, so as to perform feature matching and build a map dictionary.

During feature extraction, the image data collected by the fisheye camera needs to be corrected, for example by converting the image data under the fisheye camera model into image data under the pinhole camera model. The two-dimensional feature data in the pinhole image data and the two-dimensional feature data in the converted fisheye image data can then be fused, which facilitates feature matching.
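The conversion of a single feature location can be sketched under a specific lens assumption. The application does not name a fisheye model, so the sketch assumes the ideal equidistant model, in which the image radius equals the focal length times the incidence angle, and an ideal pinhole model, in which the radius equals the focal length times the tangent of that angle. The function name, the shared principal point and all numeric values are hypothetical, and lens distortion coefficients are ignored.

```python
import math


def fisheye_to_pinhole(u, v, f_fish, f_pin, cx, cy):
    """Map a pixel from an equidistant fisheye image to a pinhole image.

    Assumes both images share the principal point (cx, cy).
    Returns None when the ray is at 90 degrees or more from the optical axis,
    because such rays have no pinhole projection.
    """
    dx, dy = u - cx, v - cy
    r_fish = math.hypot(dx, dy)
    if r_fish == 0.0:
        return (cx, cy)
    theta = r_fish / f_fish          # equidistant model: r = f * theta
    if theta >= math.pi / 2:
        return None
    r_pin = f_pin * math.tan(theta)  # pinhole model: r = f * tan(theta)
    scale = r_pin / r_fish
    return (cx + dx * scale, cy + dy * scale)


# A ray 45 degrees off-axis, lying on the horizontal image axis.
u_fish = 640.0 + 300.0 * math.pi / 4
converted = fisheye_to_pinhole(u_fish, 480.0, 300.0, 300.0, 640.0, 480.0)
centre = fisheye_to_pinhole(640.0, 480.0, 300.0, 300.0, 640.0, 480.0)
```

At 45 degrees the tangent is 1, so the converted point lies one pinhole focal length (300 pixels) from the principal point.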

S5003. Perform feature matching on the two-dimensional feature data in combination with the inter-frame pose.

This step mainly performs temporal association on the extracted features. Common temporal association methods include inter-frame association, window association, and loop-closure association.

Inter-frame association mainly applies to two adjacent images, for example images whose acquisition times are 50 milliseconds apart, or whose acquisition positions are 20 cm apart. Usually, inter-frame matching yields a relatively large number of feature associations.

Window association mainly refers to associating all features within a period of time or a distance. By counting feature associations, performance indicators such as feature stability and consistency can be measured.

For example, when a piece of two-dimensional feature data can be associated across a large number of images within a window, such as 30 frames, it is high-quality two-dimensional feature data and is more robust to temporal and spatial changes.
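The counting of feature associations within a window can be sketched as follows. The sketch assumes that matching has already assigned a persistent identifier to each feature, so that each frame is represented by the set of identifiers observed in it; the function names are hypothetical, and the default of 30 frames follows the example in the text.

```python
from collections import Counter


def association_counts(window_frames):
    """Count, for each feature identifier, the frames in the window that contain it."""
    counts = Counter()
    for frame in window_frames:
        counts.update(set(frame))  # count each feature at most once per frame
    return counts


def stable_features(window_frames, min_frames=30):
    """Return identifiers of features associated across at least min_frames frames."""
    counts = association_counts(window_frames)
    return {feature_id for feature_id, n in counts.items() if n >= min_frames}


# 40-frame window: feature "a" is seen in every frame, "b" only in the first 10.
frames = [{"a", "b"} if i < 10 else {"a"} for i in range(40)]
counts = association_counts(frames)
stable = stable_features(frames)
```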

S5004. Calculate three-dimensional feature points according to the feature matching result.

The calculation of three-dimensional feature points is based on the three-point coplanarity assumption: the same object is observed in two images taken at different positions, and the three-dimensional coordinates of the object are calculated. In the present application, the image feature data is triangulated using the inter-frame pose and the matching results of two frames of image data to obtain the three-dimensional feature points.
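Two-view triangulation can be illustrated with the midpoint method, which is one common technique and is not necessarily the one used in the application. The sketch assumes that the inter-frame pose has already been used to express both camera centres and both viewing rays of the matched feature in one common coordinate frame; all function names are hypothetical.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(origin, direction, s):
    return tuple(o + s * d for o, d in zip(origin, direction))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Return the midpoint of the shortest segment between two viewing rays.

    c1, c2: camera centres; d1, d2: ray directions towards the matched feature.
    Returns None when the rays are (nearly) parallel and depth is unobservable.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if denom <= eps:
        return None
    s = (b * e - c * d) / denom  # parameter along the first ray
    t = (a * e - b * d) / denom  # parameter along the second ray
    p1 = _along(c1, d1, s)
    p2 = _along(c2, d2, t)
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two camera centres 0.5 m apart observing the same point at (1, 2, 5).
target = (1.0, 2.0, 5.0)
cam1, cam2 = (0.0, 0.0, 0.0), (0.5, 0.0, 0.0)
point = triangulate_midpoint(cam1, _sub(target, cam1), cam2, _sub(target, cam2))
parallel = triangulate_midpoint(cam1, (0.0, 0.0, 1.0), cam2, (0.0, 0.0, 1.0))
```

With noise-free rays the two closest points coincide and the midpoint is the observed point itself; with noisy rays the midpoint is a compromise between them.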

S5005. Extract the semantic information of the multi-frame image data.

The extraction of semantic information mainly extracts information about objects with clear categories in the space scene, such as lane lines, parking spaces and indicating arrows on the ground, overhead collision bars and hanging signs, and large walls and pillars. Semantic information is usually a relatively stable element that can accurately reflect the space scene; usually it becomes invalid only when the environment changes, for example when the parking lot is maintained or rebuilt.

S503. Use the identifier of the image sensor to mark the source of the map metadata, and obtain the marked map metadata.

In order to construct the map data in a single pass, image data is collected using cameras facing different directions, for example the monocular camera and the fisheye cameras. When the image data is collected, marking is performed according to the characteristics of the image data collected by the different cameras. When the map data is used, different map metadata is selected for matching according to the driving direction of the smart car.

S504. If the marked map metadata meets the quality requirements for mapping, the control device generates marked map data according to the marked map metadata.

Whether the obtained map metadata meets the mapping quality requirements is judged from information such as the richness of the spatial three-dimensional feature points, the richness of the semantic information, and the richness of the texture data. If the requirements are not met, the control device issues warning information prompting the user to choose a more suitable spatial scene or a more suitable time period for building the map.

If the obtained map metadata meets the mapping quality requirements, the marked map data is generated. More specifically, the marked three-dimensional feature points and the marked texture data are processed into a marked feature data layer, and the marked semantic information is processed into a marked semantic information layer.

After the feature data layer and the semantic information layer are generated, further layers can be generated according to the individual requirements of the smart car's autonomous driving, for example by building a dictionary of key frames.

The process of generating map data is described below by taking the map data of a parking lot as an example. When the intelligent vehicle drives along the path in the figure, the fisheye cameras and the front-view camera collect image data in the parking lot. The image data collected by the fisheye cameras is stitched into a look-around top view, and the look-around top view is then spliced with the images collected by the monocular camera into a ground image. A deep learning method is then used to identify and extract semantic information from the ground image, mainly lane lines, parking spaces, ground indicating arrows and the like, which forms an important part of the semantic map.

By identifying parking spaces, information such as valid parking spaces, invalid parking spaces, exclusive parking spaces and parking space numbers can be identified and stored in the map, and the user can select among them interactively during automatic parking.

Noise, false detections and similar errors may appear in the parking space detection results. The location, type, size and other information of the parking spaces observed multiple times therefore need to be fused and filtered to obtain a semantic layer, which is then combined with the feature data layer of the parking lot to form the map data of the parking lot.
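By way of a non-limiting illustration, fusing and filtering parking space detections observed multiple times can be sketched as follows. The greedy distance-based clustering, the radius, the minimum observation count and the Detection fields are assumptions chosen for the example; the present application does not specify the fusion method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    x: float      # parking space center in map coordinates (assumed, metres)
    y: float
    kind: str     # e.g. "valid", "invalid", "exclusive" (assumed labels)

def fuse_detections(detections: List[Detection],
                    radius: float = 0.5,
                    min_observations: int = 3) -> List[Detection]:
    """Group repeated detections of the same space and drop sparse ones as noise."""
    clusters: List[List[Detection]] = []
    for d in detections:
        for cluster in clusters:
            cx = sum(m.x for m in cluster) / len(cluster)
            cy = sum(m.y for m in cluster) / len(cluster)
            if (d.x - cx) ** 2 + (d.y - cy) ** 2 <= radius ** 2:
                cluster.append(d)
                break
        else:
            clusters.append([d])  # no nearby cluster: start a new one

    fused = []
    for cluster in clusters:
        if len(cluster) < min_observations:
            continue  # seen too rarely: treated as noise or false detection
        kind = Counter(m.kind for m in cluster).most_common(1)[0][0]
        fused.append(Detection(
            x=sum(m.x for m in cluster) / len(cluster),
            y=sum(m.y for m in cluster) / len(cluster),
            kind=kind,
        ))
    return fused
```

Averaging the position and taking the majority type is one simple choice; size and orientation could be fused in the same way.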

After the map data is generated, it can be optimized. Specifically, the optimization covers the positions of the three-dimensional points, the poses of the camera, and the quality of the semantic map.

S505, the control device acquires the moving direction of the movable platform and the real-time data collected by the image sensor.

Among them, the control device obtains the driving direction of the smart car, and obtains the image data collected by the fisheye camera and the monocular camera.

S506. Acquire target data matching the moving direction from the marked map data according to the marked information.

When the intelligent vehicle is driving forward, the map data corresponding to the image data collected by the monocular camera is used for positioning; when the intelligent vehicle is driving backward, the map data corresponding to the image data collected by the fisheye camera is used for positioning. This gives the intelligent vehicle faster and more powerful positioning capabilities.
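By way of a non-limiting illustration, selecting the target data that matches the moving direction can be sketched as follows. The dictionary-based map entries, the direction labels and the sensor identifiers are hypothetical and are introduced only for the example.

```python
from typing import Dict, List

# Hypothetical mapping from moving direction to the identifier of the
# image sensor whose map data should be used for positioning.
DIRECTION_TO_SOURCE: Dict[str, str] = {
    "forward": "front_monocular",
    "backward": "rear_fisheye",
}

def select_target_data(marked_map_data: List[dict], moving_direction: str) -> List[dict]:
    """Keep only the map entries whose source mark matches the moving direction."""
    source = DIRECTION_TO_SOURCE[moving_direction]
    return [entry for entry in marked_map_data if entry["source"] == source]
```

Filtering before matching means the real-time data is compared against a subset of the map rather than all of it.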

S507. The control device generates a control instruction according to the real-time data and the target data.

The control device matches the real-time data with the map data to obtain a matching result, determines the location information of the intelligent vehicle according to the successfully matched map data, and then controls the intelligent vehicle to drive according to the location information of the intelligent vehicle.

In the above technical solution, after the control device generates the map data with the source mark, the target data is selected from the map data according to the marking information, and the driving of the vehicle is controlled according to the target data and the real-time data collected by the image sensor. This reduces the processing involved in using the map data, so that the smart car is positioned rapidly and its driving is then controlled more reliably.

As shown in FIG. 5 , another embodiment of the present application provides a data processing method. The execution body of the method is a control device in a smart car, such as a trip computer. The method specifically includes the following steps:

S601. Obtain an inter-frame pose according to multi-frame data collected by a sensor on a movable platform.

Among them, the movable platform is a smart car, a driving recorder is installed in front of the smart car, and a pinhole camera in the driving recorder is used as an image sensor. A fisheye camera is installed around the smart car, which also acts as an image sensor. Pinhole cameras and fisheye cameras are used to collect image data of spatial scenes.

The smart car is also equipped with an odometer, an inertial measurement unit (IMU) and GPS. The IMU, GPS and odometer are used to collect the driving data of the smart car, such as acceleration, speed, mileage and driving location.

After the pinhole camera has collected the multi-frame image data and the odometer and IMU have collected the driving data of the smart car, the inter-frame pose is estimated based on the multi-frame image data and the driving data collected by the odometer and the IMU.

When estimating the inter-frame pose, a monocular visual-inertial system (VINS) algorithm, a visual-inertial odometry (VIO) algorithm or a visual odometry (VO) algorithm can be used. The inter-frame pose can also be estimated by integrating the wheel speed and the driving data output by the IMU.
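By way of a non-limiting illustration, estimating the inter-frame pose by integrating wheel speed and IMU output can be sketched as planar dead reckoning. The planar (x, y, yaw) state, the use of the IMU yaw rate only, the fixed sampling interval and the function names are assumptions made for the example; a practical system would typically also handle sensor bias and noise.

```python
import math
from typing import Iterable, Tuple

Pose = Tuple[float, float, float]  # x (m), y (m), yaw (rad)

def integrate_step(pose: Pose, speed: float, yaw_rate: float, dt: float) -> Pose:
    """Advance a planar pose by one sample using forward Euler integration."""
    x, y, yaw = pose
    x += speed * math.cos(yaw) * dt
    y += speed * math.sin(yaw) * dt
    yaw += yaw_rate * dt
    return (x, y, yaw)

def inter_frame_pose(samples: Iterable[Tuple[float, float]], dt: float) -> Pose:
    """Integrate (wheel speed, yaw rate) samples recorded between two frames.

    The result is the pose of the second frame expressed in the first frame.
    """
    pose: Pose = (0.0, 0.0, 0.0)
    for speed, yaw_rate in samples:
        pose = integrate_step(pose, speed, yaw_rate, dt)
    return pose
```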

Estimating the inter-frame pose is an important foundation of the entire data processing method, and its quality directly affects the computation time of the subsequent map data optimization steps.

S602. Extract two-dimensional feature data in the multi-frame image data.

Among them, this step is mainly to perform feature extraction on the image data collected by the pinhole camera and the image data collected by the fisheye camera to obtain feature data. The feature data is two-dimensional feature data.

Two-dimensional feature data includes geometric feature data, such as object edges, corners, planes, salient points, special textures and other features. Such features are stable over time, viewing angle and scale: they remain relatively stable at different angles, distances and time periods, and can be observed consistently. In addition, the two-dimensional feature data is given an effective representation: texture data, gradient data, pixel color data and the like are used to encode it, and the encoded two-dimensional feature data is used for feature matching and as dictionary data.

During feature extraction, the image data collected by the fisheye camera is corrected so that it is converted from the fisheye camera model to the pinhole camera model. After the conversion, the two-dimensional feature data of the image data collected by the pinhole camera and the two-dimensional feature data of the image data collected by the fisheye camera can be fused, which achieves feature matching with higher stability and consistency.
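By way of a non-limiting illustration, converting a pixel from a fisheye camera model to a pinhole camera model can be sketched as follows. The sketch assumes an ideal equidistant fisheye model (image radius r = f·θ) with no additional distortion terms and the same focal length f for both models; the present application does not state which fisheye model is used, so these are assumptions made for the example.

```python
import math
from typing import Tuple

def fisheye_to_pinhole(u: float, v: float, f: float,
                       cx: float, cy: float) -> Tuple[float, float]:
    """Map a pixel from an equidistant fisheye image to a pinhole image.

    Equidistant fisheye: r_fisheye = f * theta
    Pinhole:             r_pinhole = f * tan(theta)
    where theta is the angle between the incoming ray and the optical axis.
    """
    dx, dy = u - cx, v - cy
    r_fisheye = math.hypot(dx, dy)
    if r_fisheye == 0.0:
        return (u, v)  # the principal point maps to itself
    theta = r_fisheye / f
    if theta >= math.pi / 2:
        raise ValueError("ray is at or beyond 90 degrees; no pinhole projection exists")
    scale = f * math.tan(theta) / r_fisheye
    return (cx + dx * scale, cy + dy * scale)
```

Rays at 90 degrees or more from the optical axis cannot be represented in a pinhole image, which is why only part of a wide fisheye field of view survives the conversion.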

S603, extracting semantic information on the image data collected by the pinhole camera.

The semantic information processing mainly handles objects with clear meaning in the image data of the spatial scene. Category recognition is performed on ground objects such as lane lines, parking spaces and indicating arrows, and on above-ground objects such as anti-collision bars, hanging signs, large walls and pillars, to obtain semantic information. Semantic information is usually a relatively stable element; it usually becomes unavailable only when the environment changes on a large scale, for example when a parking lot is renovated or rebuilt.

S604, stitching the images collected by the fisheye camera into a look-around top view.

The specific stitching method adopts the prior art and is not repeated here. After the look-around top view is obtained, semantic information is extracted from it, mainly lane lines, parking spaces, ground indicating arrows and the like, which can serve as an important part of the semantic information layer.

By identifying parking spaces in the top view, information such as valid parking spaces, invalid parking spaces, exclusive parking spaces and parking space numbers can be identified and stored in the map, and the user can select among them interactively during automatic parking.

S605. Match the two-dimensional feature data extracted in S602 based on the poses between frames estimated in S601.

The matching of two-dimensional feature data mainly performs time-series association on the extracted two-dimensional feature data. Algorithms such as inter-frame association, window association and loopback association can be used for matching.

Inter-frame association mainly associates two adjacent images, for example images taken 50 milliseconds apart or 20 centimeters apart. Inter-frame matching usually yields a relatively large number of feature associations.

Window association mainly associates all features within a fixed or variable time span or distance range, and obtains performance indicators such as the stability and consistency of a feature by counting the number of its associations. For example, when a piece of two-dimensional feature data can be associated with a large number of images within a window, such as 30 frames, this indicates that the two-dimensional feature data is of very high quality and is robust to temporal and spatial changes.
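By way of a non-limiting illustration, counting feature associations within a window can be sketched as follows. Representing each association as a (frame index, feature identifier) pair, and the function names and thresholds, are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Observation = Tuple[int, int]  # (frame index, feature identifier), assumed format

def association_counts(observations: Iterable[Observation]) -> Dict[int, int]:
    """Count in how many distinct frames each feature was observed."""
    frames_per_feature: Dict[int, Set[int]] = defaultdict(set)
    for frame, feature_id in observations:
        frames_per_feature[feature_id].add(frame)
    return {fid: len(frames) for fid, frames in frames_per_feature.items()}

def high_quality_features(observations: Iterable[Observation],
                          window: Tuple[int, int],
                          min_frames: int) -> Set[int]:
    """Return features observed in at least min_frames frames of [start, end)."""
    start, end = window
    in_window = [(f, fid) for f, fid in observations if start <= f < end]
    return {fid for fid, n in association_counts(in_window).items() if n >= min_frames}
```

With a window of 40 frames and min_frames set to 30, a feature associated with 30 or more of those frames would be kept as high-quality feature data.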

Loopback association addresses the case where data is collected multiple times in the same spatial scene. In this case, associating the two-dimensional feature data with image data from different time periods makes it possible both to identify whether a map has already been constructed at the current location and to effectively fuse and update the image data of the spatial scene observed multiple times.

S606, construct a dictionary of key frames in the map data based on the two-dimensional feature data extracted in S602.

Wherein, constructing the key frame dictionary refers to clustering the features of the image data of the spatial scene, and expressing the current scene by using a combination of multiple two-dimensional feature data. The construction of the key frame dictionary can use not only two-dimensional feature data, but also semantic information, deep learning descriptors, and so on.

The key frame dictionary serves two functions. First, it expresses the richness of the scene. If the key frame dictionary in the map data of a spatial scene is rich, the map data of the spatial scene has rich texture data, geometric features, semantic information and so on. If the richness of the key frame dictionary is low, the quality of the map data of the spatial scene is poor and the user cannot use the map to control the intelligent vehicle, for example for automatic parking; indicating this helps manage the user's expectations. Second, the key frame dictionary can be used for position recognition during parking relocation. During relocation initialization, the vehicle needs to find its current location on the map. By matching against the key frame dictionary, the approximate current position of the vehicle in the map can be found quickly, and an accurate position estimate can then be obtained by matching semantic information and two-dimensional feature data.
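By way of a non-limiting illustration, the use of a key frame dictionary for coarse position recognition can be sketched with a bag-of-words representation. The sketch assumes that a vocabulary of cluster centers has already been obtained by clustering feature descriptors, and it uses cosine similarity between word histograms; both choices, and the function names, are assumptions made for the example rather than the method of the present application.

```python
import numpy as np

def bow_histogram(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Express one frame as a normalized histogram over vocabulary words.

    descriptors: (n, d) feature descriptors extracted from the frame.
    vocabulary:  (k, d) cluster centers obtained beforehand (assumed given).
    """
    distances = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = distances.argmin(axis=1)  # nearest word for each descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def best_keyframe(query_hist: np.ndarray, keyframe_hists: np.ndarray) -> int:
    """Return the index of the key frame most similar to the query frame."""
    scores = keyframe_hists @ query_hist  # cosine similarity of unit vectors
    return int(scores.argmax())
```

The index returned by best_keyframe gives only an approximate location; a finer match on semantic information and two-dimensional feature data would follow, as described above.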

S607, extracting parking space information from the look-around top view and the image data collected by the pinhole camera.

The image data collected by the look-around fisheye cameras is spliced with the image data collected by the pinhole camera, so that the result reflects the complete ground image out to a long distance. A deep learning method is used to identify and extract the parking spaces; the deep learning method is not described in detail here.

Noise, misrecognition and similar problems may occur in the parking space recognition results. The location, type, size and other information of the parking spaces therefore need to be fused and filtered across the image data collected multiple times to obtain a continuous, stable and high-quality parking space layer.

S608. Obtain three-dimensional feature points according to the feature matching result.

Wherein, obtaining three-dimensional feature points according to the feature matching result is to realize triangulation of feature data, that is, to convert two-dimensional feature data into three-dimensional feature data.

The triangulation step is mainly based on the three-point coplanarity assumption, and the three-dimensional coordinates of the object are calculated by using the images obtained by observing the same object at two different positions. Traditional structure-from-motion (SFM) techniques need to estimate the poses between the cameras and the three-dimensional point cloud at the same time. In this application, after the inter-frame pose is estimated, the two-dimensional feature data is matched based on the inter-frame pose, and the feature data is then triangulated quickly based on the matching result.

S609. Determine whether the map metadata meets the quality requirements for mapping.

The map metadata refers to the three-dimensional feature points, semantic information and texture data obtained in the above steps. Based on information such as the number of three-dimensional feature points, the richness of the three-dimensional feature points in space, the richness of the semantic features, the richness of the scene texture and the ambient illumination, it is comprehensively judged whether map data constructed from the currently obtained map metadata would meet the mapping quality requirements. The specific mapping quality requirements have been described in detail in the foregoing embodiments and are not repeated here.

If the mapping quality requirements are not met, the control device issues a warning message prompting the user to build the map in a different spatial scene, in better weather or at a better time.

S610. Generate map data according to the map metadata, and optimize the map data.

When the mapping quality requirements are met, the map data constructed from the map metadata can satisfy the needs of automatic parking control, and the map data is then optimized as a whole. The optimization covers the positions of the three-dimensional feature points, the poses of the camera, and the quality of the semantic layer, for example the position, angle, size and category of the semantic elements.

S611. Store the map data according to different layers.

Among them, after the map data is constructed, the map data is stored in different layers, such as: feature data layer, semantic information layer, navigation data layer, key frame dictionary layer, etc. Different layers can be used to provide data for the automatic parking location function.
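By way of a non-limiting illustration, storing map data in separate layers can be sketched as follows. The layer names follow the layers listed above, while the entry format, the function names and the use of JSON as the storage format are assumptions made for the example.

```python
import json
from typing import Dict, List

# Layer names taken from the description; the identifiers are hypothetical.
LAYER_NAMES = (
    "feature_data",
    "semantic_information",
    "navigation_data",
    "keyframe_dictionary",
)

def split_into_layers(entries: List[dict]) -> Dict[str, list]:
    """Group map entries by the layer named in their 'layer' field."""
    layers: Dict[str, list] = {name: [] for name in LAYER_NAMES}
    for entry in entries:
        layers[entry["layer"]].append(entry["data"])
    return layers

def serialize_layers(layers: Dict[str, list]) -> Dict[str, str]:
    """Serialize each layer separately so that it can be stored and loaded alone."""
    return {name: json.dumps(items) for name, items in layers.items()}
```

Serializing each layer on its own lets a positioning function load only the layers it needs.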

In addition, in order to construct the map data in a single pass, the map metadata of the image data collected by different cameras is encoded, and the map metadata of the image data collected by different cameras is used for matching in different positioning stages. For example, when the vehicle is moving forward, the map metadata of the image data collected by the front-view camera is used for matching and positioning, and when the vehicle is moving backward, the map metadata of the image data collected by the rear-view fisheye camera is used for matching and positioning, to achieve faster and more powerful map building capabilities.

When the map data is used for automatic parking, the control device increases the weight of map metadata that is matched and reduces the weight of map metadata that is not matched, and adds a certain amount of high-quality map metadata to the map. The map metadata is thereby updated in time, which ensures the long-term timeliness and high quality of the map.

In the above technical solution, lightweight map data is constructed purely from vision, which avoids the complex, manually assisted construction of high-precision maps. The map data has extremely low hardware cost requirements, and high-quality map data can be constructed quickly. Extensive testing shows that the constructed map data can be used effectively for positioning during automatic parking. Through extensive learning, functions such as selecting an exclusive parking space, parking in any selected parking space and cruising a parking lot to park can be realized, which assists the user well. The map data also provides map navigation and real-time path planning for functions such as valet parking and memory parking.

As shown in FIG. 6, the present application provides a control device 700. The control device 700 specifically includes a memory 701 for storing instructions and a processor 702 for executing the instructions stored in the memory, and the processor 702 is configured to specifically execute:

Judging whether the map metadata meets the quality requirements for mapping;

Optionally, the mapping quality requirements include at least one of the following:

The total number of three-dimensional feature points reaches the first number threshold;

There are at least two three-dimensional feature points with different components on the three coordinate axes;

The total amount of texture data reaches the second amount threshold;

The number of types of texture data reaches the third number threshold;

The total quantity of semantic information reaches the fourth quantity threshold;

The number of kinds of semantic information reaches the fifth number threshold.
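By way of a non-limiting illustration, evaluating the mapping quality requirements listed above can be sketched as follows. The threshold values, the representation of texture data and semantic information as lists of category labels, and the reading of the second requirement as "some pair of points differs on all three coordinate axes" are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]

@dataclass
class Thresholds:
    points: int          # first number threshold
    textures: int        # second quantity threshold
    texture_kinds: int   # third number threshold
    semantics: int       # fourth quantity threshold
    semantic_kinds: int  # fifth number threshold

def has_pair_differing_on_all_axes(points: Sequence[Point]) -> bool:
    """True if two points exist whose x, y and z components all differ."""
    return any(all(a[i] != b[i] for i in range(3))
               for a, b in combinations(points, 2))

def check_quality(points: Sequence[Point],
                  textures: Sequence[str],
                  semantics: Sequence[str],
                  th: Thresholds) -> Dict[str, bool]:
    """Evaluate each requirement separately; the caller decides which ones apply."""
    return {
        "point_count": len(points) >= th.points,
        "point_spread": has_pair_differing_on_all_axes(points),
        "texture_count": len(textures) >= th.textures,
        "texture_kinds": len(set(textures)) >= th.texture_kinds,
        "semantic_count": len(semantics) >= th.semantics,
        "semantic_kinds": len(set(semantics)) >= th.semantic_kinds,
    }
```

Returning one result per requirement, rather than a single verdict, reflects that the requirements include at least one of the listed items and lets a warning name the requirement that failed.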

Optionally, the processor 702 is configured to specifically execute:

Use the identity of the image sensor to mark the source of the map metadata, and obtain the marked map metadata;

Generate map data based on map metadata, including:

Generate marked map data according to the marked map metadata;

Wherein, the marked map data includes marking information used to indicate the source of the data.

Optionally, the processor 702 is configured to specifically execute at least one of the following:

According to the marked 3D feature points and the marked texture data, the marked feature data layer is generated;

Generate a marked semantic information layer according to the marked semantic information.

Optionally, the processor 702 is configured to specifically execute:

Obtain the moving direction of the movable platform and real-time data collected by the image sensor;

Obtain target data matching the moving direction from the marked map data according to the marked information;

Generate control instructions according to real-time data and target data; wherein, the control instructions are used to control the movement of the movable platform.

Optionally, the processor 702 is configured to specifically execute:

If the moving direction is forward movement, the map data derived from the image sensor installed at the front of the movable platform is obtained from the map data as the target data;

If the moving direction is backward movement, the map data derived from the image sensor installed at the rear of the movable platform is obtained from the map data as the target data.

Optionally, the processor 702 is configured to specifically execute:

Match real-time data and target data to obtain matching results;

Obtain the position information of the movable platform according to the successfully matched target data;

Control commands are generated according to the position information of the movable platform.

Optionally, the processor 702 is configured to specifically execute:

Set the reliability value of the target data according to the matching result.

Optionally, the processor 702 is configured to specifically execute:

If the matching result is that the matching is successful, set the reliability value of the target data to the first reliability value;

If the matching result is that the matching fails, set the reliability value of the target data to the second reliability value;

Wherein, the first reliability value is greater than the second reliability value.

Optionally, the processor 702 is configured to specifically execute:

Count the reliability values of the target data to obtain a reliability statistics result;

When the reliability statistics result meets the low reliability condition, delete the target data.
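By way of a non-limiting illustration, setting reliability values from matching results and deleting target data of low reliability can be sketched as follows. The concrete reliability values, the use of the mean as the statistic, and the low reliability condition (mean below a threshold after a minimum number of samples) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

FIRST_RELIABILITY = 1.0   # assigned on a successful match (assumed value)
SECOND_RELIABILITY = 0.0  # assigned on a failed match (assumed value)

class ReliabilityTracker:
    def __init__(self, min_mean: float = 0.3, min_samples: int = 5) -> None:
        self.min_mean = min_mean
        self.min_samples = min_samples
        self.values: Dict[str, List[float]] = defaultdict(list)

    def record(self, data_id: str, matched: bool) -> None:
        """Set the reliability value of one piece of target data for one match attempt."""
        value = FIRST_RELIABILITY if matched else SECOND_RELIABILITY
        self.values[data_id].append(value)

    def should_delete(self, data_id: str) -> bool:
        """Low reliability condition: enough samples and a low mean value."""
        history = self.values.get(data_id, [])
        if len(history) < self.min_samples:
            return False
        return sum(history) / len(history) < self.min_mean
```

Requiring a minimum number of samples avoids deleting target data after a single failed match.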

Optionally, the image sensor includes a monocular camera and/or a fisheye camera.

Those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be completed by hardware under the control of program instructions. The aforementioned program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media that can store program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks and optical disks.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and that these modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

A data processing method, comprising:

When the movable platform moves in the space scene, controlling the image sensor located on the movable platform to collect multiple frames of image data of the space scene;

Process the multi-frame image data to obtain map metadata; wherein, the map metadata includes any one or a combination of three-dimensional feature points, texture data and semantic information;

Judging whether the map metadata meets the quality requirements for mapping;

If the map metadata meets the mapping quality requirement, map data is generated according to the map metadata; wherein the map data is used to control the movable platform to move within the spatial scene.
The method according to claim 1, wherein the mapping quality requirements include at least one of the following:

The total number of the three-dimensional feature points reaches a first number threshold;

There are at least two three-dimensional feature points with different components on the three coordinate axes;

The total quantity of the texture data reaches a second quantity threshold;

The number of types of the texture data reaches a third number threshold;

The total quantity of the semantic information reaches a fourth quantity threshold;

The number of kinds of the semantic information reaches a fifth number threshold.
The method according to claim 1 or 2, wherein after the multi-frame image data is processed to obtain map metadata, the method further comprises:

Using the identifier of the image sensor to mark the source of the map metadata to obtain the marked map metadata;

Generate map data according to the map metadata, which specifically includes:

generating marked map data according to the marked map metadata;

Wherein, the marked map data includes marked information for indicating the source of the data.
The method according to claim 3, wherein generating the map data according to the marked map metadata includes at least one of the following:

According to the marked 3D feature points and the marked texture data, the marked feature data layer is generated;

Generate a marked semantic information layer according to the marked semantic information.
The method according to claim 3 or 4, wherein after generating the marked map data according to the marked map metadata, the method further comprises:

acquiring the moving direction of the movable platform and real-time data collected by the image sensor;

Obtain target data matching the moving direction from the marked map data according to the marked information;

A control instruction is generated according to the real-time data and the target data; wherein, the control instruction is used to control the movement of the movable platform.
The method according to claim 5, wherein acquiring target data matching the moving direction from the marked map data according to the marked information specifically includes:

If the moving direction is forward movement, map data derived from an image sensor installed at the front of the movable platform is obtained from the map data as the target data;

If the moving direction is backward movement, map data derived from an image sensor installed at the rear of the movable platform is obtained from the map data as the target data.
The method according to claim 5, wherein generating a control instruction according to the real-time data and the target data specifically includes:

Matching the real-time data and the target data to obtain a matching result;

Obtain the position information of the movable platform according to the successfully matched target data;

The control instruction is generated according to the position information of the movable platform.
The method according to claim 7, wherein after matching the real-time data and the target data to obtain a matching result, the method further comprises:

The reliability value of the target data is set according to the matching result.
The method according to claim 8, wherein setting the reliability value of the target data according to the matching result specifically includes:

If the matching result is that the matching is successful, setting the reliability value of the target data to a first reliability value;

If the matching result is a matching failure, setting the reliability value of the target data to a second reliability value;

Wherein, the first reliability value is greater than the second reliability value.
The method according to claim 8, wherein after setting the reliability value of the target data according to the matching result, the method further comprises:

Count the reliability values of the target data to obtain reliability statistics results;

When the reliability statistics result satisfies the low reliability condition, the target data is deleted.
The method according to any one of claims 1 to 10, wherein the image sensor comprises a monocular camera and/or a fisheye camera.
A control device, characterized in that it comprises: a memory for storing instructions and a processor for executing instructions stored in the memory, wherein the processor is used to specifically execute:

When the movable platform moves in the space scene, controlling the image sensor located on the movable platform to collect multiple frames of image data of the space scene;

Process the multi-frame image data to obtain map metadata; wherein, the map metadata includes any one or a combination of three-dimensional feature points, texture data and semantic information;

Judging whether the map metadata meets the quality requirements for mapping;

If the map metadata meets the mapping quality requirement, map data is generated according to the map metadata; wherein the map data is used to control the movable platform to move within the spatial scene.
The control device according to claim 12, wherein the mapping quality requirements include at least one of the following:

The total number of the three-dimensional feature points reaches a first number threshold;

There are at least two three-dimensional feature points with different components on the three coordinate axes;

The total quantity of the texture data reaches a second quantity threshold;

The number of types of the texture data reaches a third number threshold;

The total quantity of the semantic information reaches a fourth quantity threshold;

The number of kinds of the semantic information reaches a fifth number threshold.
The control device according to claim 12 or 13, wherein the processor is configured to specifically execute:

Using the identifier of the image sensor to mark the source of the map metadata to obtain the marked map metadata;

Generate map data according to the map metadata, which specifically includes:

generating marked map data according to the marked map metadata;

Wherein, the marked map data includes marked information for indicating the source of the data.
The control device according to claim 14, wherein the processor is configured to specifically execute at least one of the following:

According to the marked three-dimensional feature points and the marked texture data, the marked feature data layer is generated;

Generate a marked semantic information layer according to the marked semantic information.
The control device according to claim 14 or 15, wherein the processor is configured to specifically execute:

acquiring the moving direction of the movable platform and real-time data collected by the image sensor;

Obtain target data matching the moving direction from the marked map data according to the marked information;

A control instruction is generated according to the real-time data and the target data; wherein, the control instruction is used to control the movement of the movable platform.
The control device according to claim 16, wherein the processor is configured to specifically execute:

If the moving direction is forward movement, map data derived from an image sensor installed at the front of the movable platform is obtained from the map data as the target data;

If the moving direction is backward movement, map data derived from an image sensor installed at the rear of the movable platform is obtained from the map data as the target data.
The control device according to claim 16, wherein the processor is configured to specifically execute:

Matching the real-time data and the target data to obtain a matching result;

Obtain the position information of the movable platform according to the successfully matched target data;

Control commands are generated according to the position information of the movable platform.
The control device according to claim 18, wherein the processor is configured to specifically execute:

The reliability value of the target data is set according to the matching result.
The control device according to claim 19, wherein the processor is configured to specifically execute:

If the matching result is that the matching is successful, setting the reliability value of the target data to a first reliability value;

If the matching result is a matching failure, setting the reliability value of the target data to a second reliability value;

Wherein, the first reliability value is greater than the second reliability value.
The control device according to claim 19, wherein the processor is configured to specifically execute:

Count the reliability values of the target data to obtain reliability statistics results;

When the reliability statistics result satisfies the low reliability condition, the target data is deleted.
The control device according to any one of claims 12 to 21, wherein the image sensor comprises a monocular camera and/or a fisheye camera.
A movable platform, characterized by comprising an image sensor and the control device according to any one of claims 12 to 22.