WO2022089577A1 - Pose determination method and related device thereof - Google Patents

Pose determination method and related device thereof

Info

Publication number
WO2022089577A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
corner points
corner
pose
camera device
Prior art date
Application number
PCT/CN2021/127380
Other languages
French (fr)
Chinese (zh)
Inventor
宋佳蓉
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022089577A1 publication Critical patent/WO2022089577A1/en

Classifications

    • G - PHYSICS
      • G01 - MEASURING; TESTING
        • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
          • G01C11/00 - Photogrammetry or videogrammetry, e.g. stereogrammetry; photographic surveying
          • G01C21/00 - Navigation; navigational instruments not provided for in groups G01C1/00-G01C19/00
            • G01C21/26 - specially adapted for navigation in a road network
              • G01C21/28 - with correlation of data from several navigational instruments
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 - Computing arrangements based on biological models
            • G06N3/02 - Neural networks
              • G06N3/04 - Architecture, e.g. interconnection topology
                • G06N3/045 - Combinations of networks
              • G06N3/08 - Learning methods
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T7/00 - Image analysis
            • G06T7/20 - Analysis of motion
              • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
            • G06T7/70 - Determining position or orientation of objects or cameras
              • G06T7/73 - using feature-based methods
            • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
          • G06T2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T2207/10 - Image acquisition modality
              • G06T2207/10016 - Video; image sequence
            • G06T2207/20 - Special algorithmic details
              • G06T2207/20081 - Training; learning
              • G06T2207/20084 - Artificial neural networks [ANN]
            • G06T2207/30 - Subject of image; context of image processing
              • G06T2207/30244 - Camera pose

Definitions

  • the present application relates to the technical field of image processing, and more particularly, to a pose determination method and related devices.
  • Common positioning schemes include pure vision-based positioning technology.
  • the main idea of this scheme is to solve the pose of the moving body based on visual feature point matching and global optimization.
  • Specifically, corner points are extracted from each image, and the matching corner points between two frames are used to calculate the pose change of the camera device between those two frames.
  • Pose changes can include position changes as well as rotational angle changes.
  • the present application provides a method for determining a pose, the method comprising:
  • acquiring a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device and each includes a dynamic target, a dynamic target being an object that is displaced relative to the ground when the camera device captures the image frames.
  • the so-called displacement relative to the ground when the imaging device captures the image frame means that the dynamic target moves relative to the ground in space, for example, from position A to position B (position A and position B are two different positions).
  • the dynamic target may be a vehicle, such as a car, a truck, a passenger car, a trailer, an incomplete vehicle, a motorcycle, etc.
  • the first image frame and the second image frame may include the same dynamic target or different dynamic targets; corner detection is performed on the first image frame and the second image frame respectively to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame.
  • In some cases this can be understood as follows: the human eye recognizes an image through a small local area or window, and if the grayscale of the image within the window changes significantly when the window is shifted slightly in every direction, the window can be considered to contain a corner point.
  • the plurality of first corner points and the plurality of second corner points may be Features from Accelerated Segment Test (FAST) corner points, Harris corner points, Binary Robust Invariant Scalable Keypoints (BRISK) corner points, or the like; the corner type is not limited in this embodiment of the present application. An illustrative extraction sketch follows.
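  • As an illustrative sketch (not part of the publication), corner points of these types can be extracted with off-the-shelf OpenCV detectors; the file name and parameter values are assumptions:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # assumed input image

# FAST corner detection (one of the corner types named above).
fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
fast_keypoints = fast.detect(img, None)

# Harris-style corners via goodFeaturesToTrack.
harris_corners = cv2.goodFeaturesToTrack(
    img, maxCorners=200, qualityLevel=0.01, minDistance=10,
    useHarrisDetector=True, k=0.04)
```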
  • the second corner points located in the region of the second image frame where the dynamic target is located are likewise culled, to obtain a plurality of culled second corner points.
  • When there are dynamic corner points (that is, corner points in the region where a dynamic target is located), the observation of the same corner point across different camera states includes not only the parallax caused by the camera's own motion but also the parallax caused by the motion of the feature point itself. Culling the corner points in that region keeps the visual reprojection error term reliable.
  • According to the plurality of culled first corner points and the plurality of culled second corner points, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame is determined.
  • the pose change may refer to the change of the position of the photographing device and the change of the rotation angle.
  • the change in position may represent the distance between the position at which the capturing device captured the second image frame and the position at which the first image frame was captured.
  • the change in the rotation angle may represent an angular difference between the rotation angle when the photographing device photographed the second image frame and the rotation angle when the first image frame was photographed.
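  • In SE(3) notation (an illustration consistent with the definitions above, not a formula reproduced from the publication), if $T_{wc_1}$ and $T_{wc_2}$ denote the camera poses in a world frame at the two capture times, the pose change is

$$
T_{c_1 c_2} = T_{wc_1}^{-1}\, T_{wc_2} = \begin{bmatrix} R & t \\ \mathbf{0}^{\top} & 1 \end{bmatrix},
$$

  • where the position change is $\lVert t \rVert$ and the rotation-angle change is $\theta = \arccos\bigl((\operatorname{tr}(R) - 1)/2\bigr)$.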
  • Taking the first image frame as the image frame preceding the second image frame, the multiple corner points extracted from the first image frame can be tracked by optical flow to find the matching corner points between the first image frame and the second image frame, and the optical flow information of the matching corner points is obtained.
  • the optical flow information represents the motion of the matching corner points across the two adjacent images (viewed from the image frames, this can be called parallax); further, the pose change of the imaging device when capturing the second image frame relative to when capturing the first image frame can be determined based on the optical flow information.
  • the algorithm used for the optical flow can be the Lucas-Kanade optical flow algorithm or another algorithm; besides optical flow, descriptor matching or a direct method can also be used to match the corner points. A matching sketch follows.
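  • A minimal sketch of the matching step using OpenCV's pyramidal Lucas-Kanade implementation (file names and window parameters are illustrative assumptions):

```python
import cv2

prev_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # first image frame
next_gray = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)  # second image frame

# Corners extracted from the first image frame.
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=10)

# Pyramidal Lucas-Kanade: track the first-frame corners into the second frame.
p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None,
                                           winSize=(21, 21), maxLevel=3)

ok = status.flatten() == 1
matched_prev = p0[ok]                     # matching corners in the first frame
matched_next = p1[ok]                     # their positions in the second frame
disparity = matched_next - matched_prev   # per-corner parallax (optical flow)
```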
  • the observed amount of the same corner point in different camera states includes not only the parallax caused by the movement of the camera itself, but also the parallax caused by the movement of the corner point itself.
  • the above approach targets dynamic environments.
  • the corner points located in the region where the dynamic target is located are culled so that the visual reprojection error is more reliable, overcoming the error that dynamic features would otherwise introduce into the determined pose change.
  • the dynamic target is a vehicle.
  • In one possible implementation, the method is applied to a target vehicle on which the camera device, an inertial measurement unit (IMU), and a wheel speedometer are fixedly mounted. The method further includes: acquiring motion state data of the target vehicle measured by the IMU during the period from capturing the first image frame to capturing the second image frame; and acquiring wheel speed data of the target vehicle measured by the wheel speedometer during the same period.
  • determining the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame then includes: determining, according to the disparity of the plurality of culled first corner points and the plurality of culled second corner points in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the first pose change does not include scale information; determining, according to the motion state data, the wheel speed data, and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the second pose change includes scale information; and performing nonlinear optimization on the second pose change to obtain the pose change.
  • the motion state data may be vehicle acceleration data and angular velocity data measured by the inertial measurement unit (IMU), and the wheel speed data may be vehicle wheel speed, steering wheel angle data, and the like.
  • the culled first corner points and the culled second corner points may be aligned by timestamp with the motion state data and the wheel speed data; because the data rates differ, a single frame of image data (with its culled corner points) corresponds to multiple motion-state and wheel-speed samples.
  • the motion state data and wheel speed data associated with each image frame can be pre-integrated to provide initial pose values for the image.
  • using a sliding-window method, purely visual data is first used for initialization to obtain the poses, without scale information (scale is also called depth), of the camera device for all image frames in the sliding window.
  • the scale information is then restored with the aid of the wheel speed data, the positions of all corner points are recalculated, and nonlinear optimization is performed on the restored scale information and the inverse depths of the corner points to obtain the pose change.
  • performing nonlinear optimization on the second pose change includes: performing nonlinear optimization on the second pose change according to a preset optimization function, where the preset optimization function includes a wheel speedometer residual term.
  • Nonlinear optimization refers to finding the value of x that minimizes a given objective function, that is, $\min_x f(x)$, where f(x) is a nonlinear function. A toy sketch follows.
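  • A toy sketch of such an optimization using scipy.optimize.least_squares; the residual model here is synthetic and merely stands in for the stacked visual, IMU, and wheel-speedometer terms described in this application:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x):
    # Synthetic measurements of a line with small nonlinear noise; in the
    # real system this would stack reprojection, IMU, and wheel residuals.
    t = np.linspace(0.0, 1.0, 20)
    observed = 3.0 * t + 1.0 + 0.01 * np.sin(20.0 * t)
    predicted = x[0] * t + x[1]
    return predicted - observed

# min_x f(x) with f nonlinear, solved by iterative least squares.
result = least_squares(residuals, x0=np.array([0.0, 0.0]))
print(result.x)   # roughly [3.0, 1.0]
```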
  • the corner points in the area where the dynamic target is located among the multiple corner points are eliminated to ensure that the visual reprojection error is more reliable, thereby overcoming the positioning error introduced by dynamic features.
  • in addition, the wheel speedometer residual term is added: the pose is initialized and computed from the joint information of vision, the IMU, and the wheel speedometer, so that when the IMU converges poorly the wheel speedometer can compensate, improving the result.
  • the method further includes:
  • a pre-trained neural network is used to detect the dynamic target in the first image frame and in the second image frame, so as to obtain the region where the dynamic target is located in the first image frame and the region where the dynamic target is located in the second image frame.
  • closed-loop (loop closure) detection can also be performed between the current key frame and the map; a detected frame is a closed-loop frame, and the co-visible feature points between the closed-loop frame and the frames in the sliding window can be found. A reprojection error term is established and added to the nonlinear optimization; four-degree-of-freedom optimization is performed on the optimized key frames that slide out of the window and on their associated frames; and the optimized key frames are inserted into the map. In actual use, the map, that is, loop closure detection, can be turned on or off. When positioning and mapping run together, the final position is expressed relative to the first fixed camera position; of course, it can be fixed to a specific reference frame by a subsequent coordinate transformation.
  • the present application provides a pose determination device, the device comprising:
  • an acquisition module configured to acquire a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device and each includes a dynamic target;
  • a corner extraction module configured to perform corner detection on the first image frame and the second image frame to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame;
  • a culling module configured to cull the first corner points of the region where the dynamic target included in the first image frame is located among the plurality of first corner points, so as to obtain a plurality of culled first corner points;
  • a positioning module configured to determine, according to the plurality of culled first corner points and the plurality of culled second corner points, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
  • the plurality of first corner points and the plurality of second corner points include one of the following: Features from Accelerated Segment Test (FAST) corner points, Harris corner points, and Binary Robust Invariant Scalable Keypoints (BRISK) corner points.
  • the apparatus is applied to a target vehicle on which the camera device, an inertial measurement unit (IMU), and a wheel speedometer are fixedly mounted; the acquisition module is configured to acquire the motion state data of the target vehicle measured by the IMU during the period from capturing the first image frame to capturing the second image frame, and to acquire the wheel speed data of the target vehicle measured by the wheel speedometer during the same period;
  • the positioning module is configured to determine, according to the disparity of the plurality of culled first corner points and the plurality of culled second corner points in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the first pose change does not include scale information;
  • determine, according to the motion state data, the wheel speed data, and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the second pose change includes scale information;
  • and perform nonlinear optimization on the second pose change to obtain the pose change.
  • the positioning module is configured to perform nonlinear optimization on the second pose change according to a preset optimization function, where the preset optimization function includes a wheel speedometer residual term.
  • the apparatus further includes:
  • a dynamic target detection module configured to detect the dynamic target in the first image frame and in the second image frame through a pre-trained neural network, so as to obtain the region where the dynamic target is located in the first image frame and the region where the dynamic target is located in the second image frame.
  • the dynamic target is a vehicle.
  • a computer-readable storage medium stores program codes, wherein the program codes include instructions for performing part or all of the operations in the method described in the first aspect.
  • an embodiment of the present application provides a computer program product that, when the computer program product runs on a communication device, causes the communication device to perform some or all of the operations in the method described in the first aspect.
  • In a fifth aspect, a chip is provided; the chip includes a processor configured to perform some or all of the operations in the method described in the first aspect.
  • An embodiment of the present application provides a method for determining a pose. The method includes: acquiring a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device and each includes a dynamic target; performing corner detection on the first image frame and the second image frame respectively to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame; culling, from the plurality of first corner points, the first corner points located in the region of the first image frame where the dynamic target is located, to obtain a plurality of culled first corner points; culling, from the plurality of second corner points, the second corner points located in the region of the second image frame where the dynamic target is located, to obtain a plurality of culled second corner points; and determining, according to the plurality of culled first corner points and the plurality of culled second corner points, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
  • the observation of the same corner point across different camera states includes not only the parallax caused by the camera's own motion but also the parallax caused by the motion of the corner point itself.
  • the above approach targets dynamic environments.
  • the corner points located in the region where the dynamic target is located are culled so that the visual reprojection error is more reliable, overcoming the pose-change determination error introduced by dynamic features.
  • FIG. 1 is a functional block diagram of a vehicle provided by an embodiment of the present invention.
  • FIG. 2 is a functional block diagram of an automatic driving system provided by an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for determining a pose provided by an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a pose determination method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a pose determination device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the present application.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device may be components.
  • One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between 2 or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • these components may communicate by way of local and/or remote processes, for example in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).
  • FIG. 1 is a functional block diagram of a vehicle 100 according to an embodiment of the present invention.
  • the vehicle 100 is configured in a fully or partially autonomous driving mode.
  • while in an autonomous driving mode, the vehicle 100 can control itself: it can determine the current state of the vehicle and its surroundings, determine the possible behavior of at least one other vehicle in the surrounding environment, determine a confidence level corresponding to the likelihood that the other vehicle performs that behavior, and control the vehicle 100 based on the determined information.
  • the vehicle 100 may be configured to operate without human interaction.
  • Vehicle 100 may include various subsystems, such as travel system 102 , sensor system 104 , control system 106 , one or more peripherals 108 and power supply 110 , computer system 112 , and user interface 116 .
  • vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. Additionally, each of the subsystems and elements of the vehicle 100 may be interconnected by wire or wirelessly.
  • the travel system 102 may include components that provide powered motion for the vehicle 100 .
  • propulsion system 102 may include engine 118 , energy source 119 , transmission 120 , and wheels/tires 121 .
  • the engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or a combination of engine types, such as a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air compression engine.
  • Engine 118 converts energy source 119 into mechanical energy.
  • Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity.
  • the energy source 119 may also provide energy to other systems of the vehicle 100 .
  • Transmission 120 may transmit mechanical power from engine 118 to wheels 121 .
  • Transmission 120 may include a gearbox, a differential, and a driveshaft.
  • transmission 120 may also include other devices, such as clutches.
  • the drive shaft may include one or more axles that may be coupled to one or more wheels 121 .
  • the sensor system 104 may include several sensors that sense information about the environment surrounding the vehicle 100 .
  • the sensor system 104 may include a positioning system 122 (the positioning system may be a GPS system, a Beidou system or other positioning systems), an inertial measurement unit (IMU) 124, a radar system 126, a laser rangefinder 128 and camera 130 .
  • the sensor system 104 may also include sensors of the internal systems of the vehicle 100 being monitored (eg, an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). This detection and identification is a critical function for the safe operation of the autonomous vehicle 100 .
  • with the development of advanced driver assistance systems (ADAS) and unmanned driving technologies, higher requirements are placed on the performance of the radar system 126, such as range and angular resolution.
  • improving the range and angular resolution of the vehicle-mounted radar system 126 enables it to detect multiple measurement points on a target object when imaging the target, forming high-resolution point cloud data.
  • for this reason, the radar system 126 in this application may also be referred to as a point cloud imaging radar.
  • the positioning system 122 may be used to estimate the geographic location of the vehicle 100 .
  • the IMU 124 is used to sense position and orientation changes of the vehicle 100 based on inertial acceleration.
  • IMU 124 may be a combination of an accelerometer and a gyroscope.
  • Radar system 126 may utilize radio signals to sense objects within the surrounding environment of vehicle 100 . In some embodiments, in addition to sensing objects, radar system 126 may be used to sense the speed and/or heading of objects.
  • the laser rangefinder 128 may utilize laser light to sense objects in the environment in which the vehicle 100 is located.
  • the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
  • Camera 130 may be used to capture multiple images of the surrounding environment of vehicle 100 .
  • Camera 130 may be a still camera or a video camera.
  • the camera 130 may also be referred to as an imaging device.
  • Control system 106 controls the operation of the vehicle 100 and its components.
  • Control system 106 may include various elements including steering system 132 , throttle 134 , braking unit 136 , computer vision system 140 , route control system 142 , and obstacle avoidance system 144 .
  • the steering system 132 is operable to adjust the heading of the vehicle 100 .
  • it may be a steering wheel system.
  • the throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 100 .
  • the braking unit 136 is used to control the deceleration of the vehicle 100 .
  • the braking unit 136 may use friction to slow the wheels 121 .
  • the braking unit 136 may convert the kinetic energy of the wheels 121 into electrical current.
  • the braking unit 136 may also take other forms to slow the wheels 121 to control the speed of the vehicle 100 .
  • Computer vision system 140 may process and analyze images captured by camera 130 in order to identify objects and/or features in the environment surrounding vehicle 100 .
  • the objects and/or features may include traffic signals, road boundaries and obstacles.
  • Computer vision system 140 may use object recognition algorithms, Structure from Motion (SFM) algorithms, video tracking, and other computer vision techniques.
  • the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and the like.
  • the route control system 142 is used to determine the travel route of the vehicle 100 .
  • the route control system 142 may combine data from the sensors 138 , the GPS 122 , and one or more predetermined maps to determine a driving route for the vehicle 100 .
  • the obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise traverse potential obstacles in the environment of the vehicle 100 .
  • control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be omitted.
  • Peripherals 108 may include a wireless communication system 146 , an onboard computer 148 , a microphone 150 and/or a speaker 152 .
  • peripherals 108 provide a means for a user of vehicle 100 to interact with user interface 116 .
  • the onboard computer 148 may provide information to the user of the vehicle 100 .
  • User interface 116 may also operate on-board computer 148 to receive user input.
  • the onboard computer 148 can be operated via a touch screen.
  • peripheral devices 108 may provide a means for vehicle 100 to communicate with other devices located within the vehicle.
  • microphone 150 may receive audio (eg, voice commands or other audio input) from a user of vehicle 100 .
  • speakers 152 may output audio to a user of vehicle 100 .
  • Wireless communication system 146 may wirelessly communicate with one or more devices, either directly or via a communication network.
  • the wireless communication system 146 may use 3G cellular communication, such as code division multiple access (CDMA), EV-DO, or global system for mobile communications (GSM)/general packet radio service (GPRS); 4G cellular communication, such as LTE; or 5G cellular communication.
  • the wireless communication system 146 may communicate with a wireless local area network (WLAN) using WiFi.
  • the wireless communication system 146 may communicate directly with the device using an infrared link, Bluetooth, or ZigBee.
  • other wireless protocols may also be used, such as various vehicle communication systems; for example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communications between vehicles and/or roadside stations.
  • the power supply 110 may provide power to various components of the vehicle 100 .
  • the power source 110 may be a rechargeable lithium-ion or lead-acid battery.
  • One or more battery packs of such a battery may be configured as a power source to provide power to various components of the vehicle 100 .
  • power source 110 and energy source 119 may be implemented together, such as in some all-electric vehicles.
  • Computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as data storage device 114 .
  • Computer system 112 may also be multiple computing devices that control individual components or subsystems of vehicle 100 in a distributed fashion.
  • the processor 113 may be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor may be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor.
  • although FIG. 1 functionally illustrates the processor, memory, and other elements of the computer 110 in the same block, one of ordinary skill in the art will understand that the processor, computer, or memory may actually comprise multiple processors, computers, or memories that may or may not be stored within the same physical enclosure.
  • the memory may be a hard drive or other storage medium located within an enclosure other than computer 110 .
  • reference to a processor or computer will be understood to include reference to a collection of processors or computers or memories that may or may not operate in parallel.
  • some components such as the steering and deceleration components, may each have their own processors that only perform computations related to component-specific functions.
  • the processor 113 may acquire data from the camera 130 and other sensor devices, and perform vehicle positioning based on the acquired data.
  • a processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are performed on a processor disposed within the vehicle while others are performed by a remote processor, including taking steps necessary to perform a single maneuver.
  • data storage 114 may include instructions 115 (eg, program logic) executable by processor 113 to perform various functions of vehicle 100 , including those described above.
  • Data storage 114 may also contain additional instructions, including sending data to, receiving data from, interacting with, and/or performing data processing on one or more of propulsion system 102 , sensor system 104 , control system 106 , and peripherals 108 . control commands.
  • data storage 114 may store data such as road maps, route information, vehicle location, direction, speed, and other vehicle data, among other information. Such information may be used by the vehicle 100 and the computer system 112 while the vehicle 100 is in autonomous, semi-autonomous, and/or manual modes.
  • a user interface 116 for providing information to or receiving information from a user of the vehicle 100 .
  • the user interface 116 may include one or more input/output devices within the set of peripheral devices 108 , such as a wireless communication system 146 , an onboard computer 148 , a microphone 150 and a speaker 152 .
  • Computer system 112 may control functions of vehicle 100 based on input received from various subsystems (eg, travel system 102 , sensor system 104 , and control system 106 ) and from user interface 116 .
  • computer system 112 may utilize input from control system 106 in order to control steering unit 132 to avoid obstacles detected by sensor system 104 and obstacle avoidance system 144 .
  • computer system 112 is operable to provide control of various aspects of vehicle 100 and its subsystems.
  • one or more of these components described above may be installed or associated with the vehicle 100 separately.
  • data storage device 114 may exist partially or completely separate from vehicle 100 .
  • the above-described components may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 1 should not be construed as a limitation on the embodiment of the present invention.
  • a self-driving car traveling on a road can recognize objects within its surroundings to determine adjustments to the current speed.
  • the objects may be other vehicles, traffic control devices, or other types of objects.
  • each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and distance from the vehicle, may be used to determine the speed to which the autonomous vehicle is to adjust.
  • the autonomous vehicle 100, or a computing device associated with the autonomous vehicle 100, may predict the behavior of the identified objects based on the characteristics of the identified objects and the state of the surrounding environment (e.g., traffic, rain, ice on the road).
  • optionally, since the behavior of each identified object depends on the behavior of the others, all identified objects can also be considered together to predict the behavior of a single identified object.
  • the vehicle 100 can adjust its speed based on the predicted behavior of the identified object.
  • the self-driving car can determine what steady state the vehicle will need to adjust to (eg, accelerate, decelerate, or stop) based on the predicted behavior of the object.
  • other factors may also be considered to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 in the road being traveled, the curvature of the road, the proximity of static and dynamic objects, and the like.
  • in addition to providing instructions to adjust the speed, the computing device may provide instructions to modify the steering angle of the vehicle 100 so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in its vicinity (e.g., cars in adjacent lanes on the road).
  • the above-mentioned vehicle 100 can be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, a playground vehicle, construction equipment, a tram, a golf cart, a train, a cart, etc.
  • this is not particularly limited in the embodiments of the present invention.
  • as shown in FIG. 2, computer system 101 includes a processor 103 coupled to a system bus 105.
  • the processor 103 may be one or more processors, each of which may include one or more processor cores.
  • a video adapter 107 which can drive a display 109, is coupled to the system bus 105.
  • the system bus 105 is coupled to an input-output (I/O) bus through a bus bridge 111 .
  • I/O interface 115 is coupled to the I/O bus.
  • I/O interface 115 communicates with a variety of I/O devices, such as an input device 117 (e.g., keyboard, mouse, touch screen), a media tray 121 (e.g., CD-ROM, multimedia interface), a transceiver 123 (which can send and/or receive radio communication signals), a camera 155 (which can capture still and moving digital video images), and an external USB port 125.
  • the interface connected to the I/O interface 115 may be a USB interface.
  • the processor 103 may be any conventional processor, including a reduced instruction set computing (reduced instruction set computing, RISC) processor, a complex instruction set computing (complex instruction set computing, CISC) processor or a combination of the above.
  • the processor may be a special purpose device such as an ASIC.
  • the processor 103 may be a neural network processor or a combination of a neural network processor and the above-mentioned conventional processors.
  • computer system 101 may be located remotely from the autonomous vehicle and may communicate wirelessly with the autonomous vehicle.
  • some of the processes herein are performed on a processor disposed within an autonomous vehicle, others are performed by a remote processor, including taking actions required to perform a single maneuver.
  • Network interface 129 is a hardware network interface, such as a network card.
  • the network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet network or a virtual private network (VPN).
  • the network 127 may also be a wireless network, such as a WiFi network, a cellular network, and the like.
  • the hard disk drive interface is coupled to the system bus 105 .
  • the hard drive interface is connected to the hard drive.
  • System memory 135 is coupled to system bus 105 . Data running in system memory 135 may include operating system 137 and application programs 143 of computer 101 .
  • the operating system includes a Shell 139 and a kernel 141 .
  • Shell 139 is an interface between the user and the operating system's kernel.
  • the shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system: waiting for user input, interpreting user input to the operating system, and processing various operating system output.
  • Kernel 141 consists of those parts of the operating system that manage memory, files, peripherals, and system resources.
  • the kernel 141 directly interacts with the hardware, and the operating system kernel usually runs processes, provides inter-process communication, provides CPU time slice management, interrupts, memory management, IO management, and the like.
  • Application 143 includes programs that control the autonomous driving of the car, for example, programs that manage the interaction of the autonomous vehicle with road obstacles, programs that control the route or speed of the autonomous vehicle, and programs that control the interaction between the autonomous vehicle and other autonomous vehicles on the road.
  • Application 143 also exists on the system of the software deployment server 149.
  • computer system 101 may download application 143 from software deployment server 149 when application 143 needs to be executed.
  • Sensor 153 is associated with computer system 101 .
  • Sensor 153 is used to detect the environment around computer system 101 .
  • the sensor 153 can detect animals, cars, obstacles and pedestrian crossings, etc.
  • Optionally, the sensor 153 can also detect the environment around the above-mentioned animals, cars, obstacles, and pedestrian crossings, for example: other animals present around an animal, weather conditions, and ambient light levels.
  • When the computer 101 is located in a self-driving car, the sensor may be a radar system or the like.
  • a common positioning scheme is based on pure vision positioning technology, namely visual SLAM.
  • the main idea of this scheme is to solve the pose of the moving body based on visual feature point matching and global optimization: features are first extracted from the images, the feature points matched between two frames are used to calculate the relative pose transformation between those frames, and finally this information is used to compute the odometry.
  • FIG. 3 is a schematic flowchart of a method for determining a pose provided by an embodiment of the present application. As shown in FIG. 3 , the method for determining a pose provided by an embodiment of the present application includes:
  • 301. Acquire a first image frame and a second image frame captured by an imaging device, where the first image frame and the second image frame are adjacent image frames captured by the imaging device and each includes a dynamic target.
  • in this embodiment of the present application, the first image frame and the second image frame captured by the camera device can be acquired, where the first image frame and the second image frame may be two consecutive image frames captured by the camera device.
  • the first image frame and the second image frame each include a dynamic target, where the dynamic target is a target that is displaced relative to the ground when the imaging device captures the image frames.
  • the so-called displacement relative to the ground when the imaging device captures the image frame means that the dynamic target moves relative to the ground in space, for example, from position A to position B (position A and position B are two different positions).
  • the dynamic target may be a vehicle, such as a car, a truck, a passenger car, a trailer, an incomplete vehicle, a motorcycle, and the like.
  • the first image frame and the second image frame may include the same dynamic target, with the dynamic target located in different regions of the two frames; the first image frame and the second image frame may also include different dynamic targets.
  • for example, the first image frame may include vehicle 1 and vehicle 2, and the second image frame may also include vehicle 1 and vehicle 2, where the position of vehicle 1 in the first image frame differs from its position in the second image frame, and the position of vehicle 2 in the first image frame differs from its position in the second image frame.
  • as another example, the first image frame may include vehicle 1 and vehicle 2, while the second image frame includes vehicle 1 and vehicle 3; that is, the first image frame does not include vehicle 3, the second image frame does not include vehicle 2, and the position of vehicle 1 in the first image frame differs from its position in the second image frame.
  • 302. Perform corner detection on the first image frame and the second image frame to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame.
  • step 302 may be performed by the processor of the vehicle itself, that is, the processor of the vehicle itself may perform corner detection on the first image frame and the second image frame to obtain the first image a plurality of first corner points of the frame and a plurality of second corner points of the second image frame.
  • in another implementation, step 302 may be performed by a server on the cloud side; that is, the vehicle may send the first image frame and the second image frame captured by the camera device to the server, and the server performs corner detection on the first image frame and the second image frame to obtain the plurality of first corner points of the first image frame and the plurality of second corner points of the second image frame.
  • the plurality of first corner points and the plurality of second corner points may be Features from Accelerated Segment Test (FAST) corner points, Harris corner points, Binary Robust Invariant Scalable Keypoints (BRISK) corner points, or the like; the corner type is not limited in this embodiment of the present application.
  • the recognition of the image by the human eye is usually completed in a small local area or small window. If the grayscale of the image in the area of the window changes greatly when the small window is moved in a small range in all directions, then it can be considered that there are corner points in the window. If the grayscale of the image in the area of the window does not change when the small window is moved in a small range in all directions, then it can be considered that there are no corners in the window.
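  • For reference, the Harris criterion formalizes this window intuition (a standard textbook formula, not reproduced from the publication): a structure matrix is built from the image gradients $I_x, I_y$ over a window $W$ with weighting $w$, and each pixel is scored as

$$
M = \sum_{(x,y)\in W} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},
\qquad
R = \det(M) - k\,\bigl(\operatorname{tr}(M)\bigr)^2,
$$

  • with $k$ typically 0.04 to 0.06; a window contains a corner when $R$ is large, since the grayscale then changes strongly for shifts in every direction.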
  • the manner of extracting the plurality of first corner points and the plurality of second corner points may be based on existing implementations, and details are not described herein again.
  • the first corner points located in the region of the first image frame where the dynamic target is located may be culled from the plurality of first corner points, and the second corner points located in the region of the second image frame where the dynamic target is located may be culled from the plurality of second corner points; there is no restriction on the order in which the operation of culling the first corner points and the operation of culling the second corner points are performed.
  • specifically, a pre-trained neural network may be used to detect the dynamic target in the first image frame and in the second image frame, so as to obtain the region where the dynamic target is located in each frame; the first corner points located in the detected dynamic-target region of the first image frame are culled from the plurality of first corner points to obtain the culled first corner points, and the second corner points located in the detected dynamic-target region of the second image frame are culled from the plurality of second corner points to obtain the culled second corner points. A sketch of this culling step follows.
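  • A minimal sketch of the culling step, assuming the detector returns axis-aligned bounding boxes for the dynamic targets (the function and variable names are illustrative, not from the publication):

```python
import numpy as np

def cull_dynamic_corners(corners, boxes):
    """Drop corners that fall inside any detected dynamic-target bounding box.

    corners: (N, 2) array of (x, y) pixel coordinates.
    boxes:   iterable of (x_min, y_min, x_max, y_max) detector outputs.
    """
    keep = np.ones(len(corners), dtype=bool)
    for (x0, y0, x1, y1) in boxes:
        inside = ((corners[:, 0] >= x0) & (corners[:, 0] <= x1) &
                  (corners[:, 1] >= y0) & (corners[:, 1] <= y1))
        keep &= ~inside          # discard corners inside this dynamic region
    return corners[keep]
```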
  • the optical flow method can also be used to track the corner points and to add new ones; each corner point is then given a unique id, and the normalized coordinates of the corner points in the camera coordinate system, the pixel coordinates, the pixel velocities, and so on are calculated.
  • when dynamic corner points are present, the observation of the same corner point across different camera states includes not only the parallax caused by the camera's own motion but also the parallax caused by the motion of the feature point itself.
  • in the embodiment of the present application, culling the corner points located in the region where the dynamic target is located keeps the visual reprojection error term reliable.
  • the visual reprojection optimization term can take a form such as

$$
r_{l,j} = \hat{p}^{\,c_j}_{l} - \pi_c\!\left(T_{c_j w}\, P^{w}_{l}\right),
$$

  • where $\hat{p}^{\,c_j}_{l}$ is the observed coordinate of the l-th landmark point in the j-th camera's normalized camera coordinate system, $P^{w}_{l}$ is the position of the l-th landmark point in the world frame, $T_{c_j w}$ is the pose of the j-th camera, and $\pi_c$ is the camera internal parameter (projection) function.
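  • The same term as minimal Python, under the assumption that the residual is expressed in normalized image-plane coordinates:

```python
import numpy as np

def reprojection_residual(p_obs_norm, P_w, R_cw, t_cw):
    """Observed normalized-plane corner minus the projected landmark.

    p_obs_norm: observed (x, y) in the camera's normalized image plane.
    P_w:        landmark position in the world frame, shape (3,).
    R_cw, t_cw: world-to-camera rotation (3, 3) and translation (3,).
    """
    P_c = R_cw @ P_w + t_cw                 # landmark in the camera frame
    return p_obs_norm - P_c[:2] / P_c[2]    # pi_c for normalized coordinates
```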
  • the observation of the same corner in different camera states includes not only the parallax caused by the motion of the camera itself, but also the parallax caused by the movement of the corner itself.
  • the dynamic corner points are removed during image processing to improve the accuracy of the visual residual term.
  • for example, when the imaging device is a forward-looking monocular camera, feature extraction, tracking, and dynamic-target culling can be performed on the image frames captured by the camera, and monocular structure-from-motion (SFM) based on the essential matrix can then be used to remove the influence of dynamic feature points (also called dynamic corner points) on the visual residuals, as sketched below.
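  • A sketch of this scale-free monocular step using OpenCV's essential-matrix routines, reusing matched_prev and matched_next from the optical-flow sketch above (the intrinsic matrix K is an assumed placeholder):

```python
import cv2
import numpy as np

K = np.array([[700.0,   0.0, 320.0],     # assumed camera intrinsics
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Essential matrix from the culled (static) matching corners, with RANSAC
# rejecting remaining outliers.
E, inliers = cv2.findEssentialMat(matched_prev, matched_next, K,
                                  method=cv2.RANSAC, prob=0.999, threshold=1.0)

# Decompose into rotation R and unit-norm translation t: a pose change that
# carries no scale information, matching the "first pose change" above.
_, R, t, _ = cv2.recoverPose(E, matched_prev, matched_next, K, mask=inliers)
```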
  • the first corner points located in the region of the first image frame where the dynamic target is located are culled from the plurality of first corner points to obtain a plurality of culled first corner points;
  • after the second corner points located in the region of the second image frame where the dynamic target is located are culled from the plurality of second corner points to obtain a plurality of culled second corner points, the culled first corner points and the culled second corner points can be used to determine the pose change of the imaging device when capturing the second image frame relative to when capturing the first image frame.
  • the embodiments of the present application can be applied to a target vehicle in which the camera device, the inertial measurement unit (IMU), and the wheel speedometer are fixedly installed. The motion state data of the target vehicle measured by the IMU during the period from capturing the first image frame to capturing the second image frame can be obtained, as can the wheel speed data of the target vehicle measured by the wheel speedometer during the same period.
  • further, the pose change of the imaging device when capturing the second image frame relative to when capturing the first image frame is determined according to the motion state data, the wheel speed data, the plurality of culled first corner points, and the plurality of culled second corner points.
  • the motion state data may be vehicle acceleration data and angular velocity data measured by an inertial sensor (IMU), and the wheel speed data may be vehicle wheel speed, steering wheel angle data, and the like.
  • specifically, a first pose change that does not include scale information is determined from the disparity of the culled first corner points and the culled second corner points in their respective image frames;
  • according to the motion state data, the wheel speed data, and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame is determined, where the second pose change includes scale information;
  • nonlinear optimization is then performed on the second pose change to obtain the pose change.
  • nonlinear optimization may be performed on the second pose change according to a preset optimization function, wherein the preset optimization function includes a wheel speed meter residual term.
  • Nonlinear optimization refers to finding the value of x that minimizes a given objective function, that is, $\min_x f(x)$; when f(x) is a nonlinear function, the optimization is nonlinear optimization.
  • the plurality of culled first corner points and the plurality of culled second corner points may be aligned by timestamp with the motion state data and the wheel speed data; because the data rates differ, a single frame of image data (with its culled corner points) corresponds to multiple motion-state and wheel-speed samples.
  • the motion state data and wheel speed data of each image frame can be pre-integrated to provide initial pose values for the image.
  • using a sliding-window method, purely visual data is used for initialization to obtain the poses, without scale information (also called depth), of the camera device for all image frames in the sliding window.
  • the scale information is then restored with the aid of the wheel speed data, the positions of all corner points are recalculated, and nonlinear optimization is performed on the restored scale information and the inverse depths of the corner points to obtain the pose change. A simplified sketch of the alignment and pre-integration follows.
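  • A simplified sketch of the timestamp alignment and pre-integration described above, ignoring gravity, biases, and rotation integration (all names are illustrative assumptions; a real implementation would also integrate the gyroscope and account for gravity and bias):

```python
import numpy as np

def preintegrate_imu(imu_samples, t0, t1):
    """Integrate IMU samples whose timestamps fall between two image frames.

    imu_samples: list of (timestamp, accel (3,), gyro (3,)) sorted by time.
    Returns velocity and position deltas as rough initial pose values.
    """
    dv = np.zeros(3)   # velocity change
    dp = np.zeros(3)   # position change
    prev_t = t0
    for t, accel, gyro in imu_samples:   # gyro unused in this simplified sketch
        if t < t0 or t > t1:
            continue                     # timestamp alignment to the two frames
        dt = t - prev_t
        dp += dv * dt + 0.5 * accel * dt * dt
        dv += accel * dt
        prev_t = t
    return dv, dp
```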
  • the estimated values can include the accelerometer bias b_a and the gyroscope bias b_ω. Under two-dimensional motion the excitation is insufficient, so the acceleration bias is difficult to estimate; this causes the bias value to converge too slowly and makes the pose estimation inaccurate. Therefore, a wheel speedometer residual term is introduced in the embodiment of the present application.
  • specifically, pre-integration can be performed on the wheel speed data (including vehicle speed data, steering wheel data, and the like) measured by the wheel speedometer to obtain a pre-integration result, and the pre-integration residual is then established based on that result.
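  • Since the residual formula itself appears as an image in the source, the following is a representative wheel-odometer pre-integration residual written in common visual-inertial-odometry notation (an assumption, not the patent's exact expression):

$$
r_{\text{wheel}} = R_{b_k w}\left(p^{w}_{b_{k+1}} - p^{w}_{b_k}\right) - \hat{\alpha}^{\,b_k}_{b_{k+1}},
$$

  • where $R_{b_k w}$ rotates world coordinates into the body frame at frame $k$, $p^{w}_{b_k}$ and $p^{w}_{b_{k+1}}$ are the body positions when the two image frames are captured, and $\hat{\alpha}^{\,b_k}_{b_{k+1}}$ is the position increment pre-integrated from the wheel-speed measurements between the two frames.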
  • in view of the disadvantage that the two-dimensional-motion initialization error of a visual-inertial navigation system is relatively large, a wheel speedometer residual term is added.
  • the pose is initialized and computed from the joint information of vision, the IMU, and the wheel speedometer, so that when the IMU converges poorly the wheel speedometer can compensate, improving the result.
  • closed-loop (loop closure) detection can also be performed between the current key frame and the map; a detected frame is a closed-loop frame, and the co-visible feature points between the closed-loop frame and the frames in the sliding window can be found. A reprojection error term is established and added to the nonlinear optimization; four-degree-of-freedom optimization is performed on the optimized key frames that slide out of the window and on their associated frames; and the optimized key frames are inserted into the map. In actual use, the map, that is, loop closure detection, can be turned on or off. When positioning and mapping run together, the final position is expressed relative to the first fixed camera position; of course, it can be fixed to a specific reference frame by a subsequent coordinate transformation.
  • An embodiment of the present application provides a method for determining a pose. The method includes: acquiring a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device and each includes a dynamic target; performing corner detection on the first image frame and the second image frame respectively to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame; culling, from the plurality of first corner points, the first corner points located in the region of the first image frame where the dynamic target is located, to obtain a plurality of culled first corner points; culling, from the plurality of second corner points, the second corner points located in the region of the second image frame where the dynamic target is located, to obtain a plurality of culled second corner points; and determining, according to the plurality of culled first corner points and the plurality of culled second corner points, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
• when dynamic corner points exist, the observations of the same corner point under different camera states include not only the parallax caused by the camera's own motion but also the parallax caused by the motion of the corner point itself.
• the above method is therefore aimed at dynamic environments.
• the corner points in the region where the dynamic target is located are removed from the plurality of corner points to ensure that the visual reprojection error is more reliable, thereby overcoming the pose-change determination errors introduced by dynamic features.
  • FIG. 5 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the present application.
  • a pose determination apparatus 500 provided by an embodiment of the present application includes:
• the acquiring module 501 is configured to acquire a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device, and the first image frame and the second image frame each include a dynamic target;
• a corner extraction module 502 is configured to perform corner detection on the first image frame and the second image frame, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame;
• a removal module 503 is configured to remove, from the plurality of first corner points, the first corner points in the region where the dynamic target included in the first image frame is located, to obtain a plurality of first corner points after removal, and to remove, from the plurality of second corner points, the second corner points in the region where the dynamic target included in the second image frame is located, to obtain a plurality of second corner points after removal;
• a positioning module 504 is configured to determine, according to the plurality of first corner points after removal and the plurality of second corner points after removal, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
• the plurality of first corner points and the plurality of second corner points include one of the following: Features from Accelerated Segment Test (FAST) corner points, Harris corner points, and Binary Robust Invariant Scalable Keypoints (BRISK) corner points.
• the apparatus is applied to a target vehicle on which the camera device, an inertial measurement unit (IMU) and a wheel speedometer are fixedly installed; the acquiring module is configured to acquire motion state data of the target vehicle, measured by the IMU, during the period from when the camera device captures the first image frame to when it captures the second image frame, and to acquire wheel speed data of the target vehicle, measured by the wheel speedometer, during the same period;
• the positioning module is configured to determine, according to the parallax of the plurality of first corner points after removal and the plurality of second corner points after removal in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the first pose change does not include scale information;
• to determine, according to the motion state data, the wheel speed data and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the second pose change includes scale information;
• and to perform nonlinear optimization on the second pose change to obtain the pose change.
• the positioning module is configured to perform nonlinear optimization on the second pose change according to a preset optimization function, where the preset optimization function includes a wheel speedometer residual term. A sketch of such an optimization follows below.
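• A hedged sketch of how such an optimization function might be assembled and minimized (SciPy stands in for the solver here; the planar state layout, residual model and names are assumptions, not the application's implementation):

```python
import numpy as np
from scipy.optimize import least_squares

def make_residual_fn(wheel_pre):
    """Build a stacked residual over a window of planar poses.

    `x` packs poses (x, y, theta) per frame; `wheel_pre` holds the
    pre-integrated (dx, dy, dtheta) between consecutive frames. Each
    block mirrors one term of the optimization function; the IMU and
    visual reprojection blocks are omitted for brevity but would be
    stacked in the same way.
    """
    def residuals(x):
        poses = x.reshape(-1, 3)
        res = []
        for k, (dx, dy, dth) in enumerate(wheel_pre):
            # Wheel speedometer residual: relative pose predicted by
            # wheel pre-integration vs. the current pose estimates.
            p0, p1 = poses[k], poses[k + 1]
            c, s = np.cos(p0[2]), np.sin(p0[2])
            rel = np.array([ c * (p1[0] - p0[0]) + s * (p1[1] - p0[1]),
                            -s * (p1[0] - p0[0]) + c * (p1[1] - p0[1]),
                             p1[2] - p0[2]])
            res.extend(rel - np.array([dx, dy, dth]))
        return np.asarray(res)
    return residuals

# poses0: initial pose guesses from pre-integration, shape (N, 3)
# sol = least_squares(make_residual_fn(wheel_pre), poses0.ravel())
```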
  • the apparatus further includes:
• a dynamic target detection module is configured to detect the dynamic target in the first image frame and the dynamic target in the second image frame through a pre-trained neural network, to obtain the region where the dynamic target included in the first image frame is located and the region where the dynamic target included in the second image frame is located. An illustrative detector sketch follows below.
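• For instance (a sketch assuming a generic pre-trained detector from torchvision; the application does not specify the network, and the COCO label ids below are assumptions for illustration):

```python
import torch
import torchvision

# A generic pre-trained detector stands in for the application's network.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def dynamic_target_boxes(image_bgr, score_thresh=0.6, vehicle_labels=(3, 6, 8)):
    """Return bounding boxes of likely dynamic targets (COCO car/bus/truck).

    `image_bgr` is an HxWx3 uint8 numpy frame.
    """
    # Convert BGR uint8 to the RGB float CHW tensor the model expects.
    img = torch.from_numpy(image_bgr[..., ::-1].copy())
    img = img.permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([img])[0]
    boxes = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if score >= score_thresh and int(label) in vehicle_labels:
            boxes.append(tuple(box.tolist()))
    return boxes
```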
• an embodiment of the present application provides a pose determination apparatus 600, including a transceiver 610, a processor 620 and a memory 630; the memory 630 is used to store programs, instructions or code, and the processor 620 is used to execute the programs, instructions or code in the memory 630;
• the transceiver 610 is configured to receive the first image frame and the second image frame input by the camera device;
• the processor 620 is configured to perform corner detection on the first image frame and the second image frame respectively, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame; to remove, from the plurality of first corner points, the first corner points in the region where the dynamic target included in the first image frame is located, to obtain a plurality of first corner points after removal; to remove, from the plurality of second corner points, the second corner points in the region where the dynamic target included in the second image frame is located, to obtain a plurality of second corner points after removal; and to determine, according to the plurality of first corner points after removal and the plurality of second corner points after removal, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
  • the processor 620 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method may be completed by an integrated logic circuit of hardware in the processor 620 or an instruction in the form of software.
• the above-mentioned processor 620 may be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 630, and the processor 620 reads the information in the memory 630, and performs the above method steps in combination with its hardware.
  • embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
• These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Abstract

Provided in an embodiment of the present invention is a pose determination method. The method comprises: acquiring a first image frame and a second image frame captured by a camera device, each of the first image frame and the second image frame comprising a dynamic target; respectively performing corner detection on the first image frame and the second image frame, and obtaining multiple first corners of the first image frame and multiple second corners of the second image frame; removing, from the multiple first corners, a first corner at a region where the dynamic target of the first image frame is located, and obtaining multiple first corners without said removed first corner; removing, from the multiple second corners, a second corner at a region where the dynamic target of the second image frame is located, and obtaining multiple second corners without said removed second corner; and determining a pose change of the camera device according to the multiple first corners without said removed first corner and the multiple second corners without said removed second corner. In this embodiment, removal of a corner at a region where a dynamic target is located from multiple corners ensures that a visual reprojection error is more reliable, thereby preventing occurrence of an error in determination of a pose change due to introduction of a dynamic feature.

Description

A pose determination method and related device
This application claims priority to Chinese Patent Application No. 202011199017.6, filed with the Chinese Patent Office on October 31, 2020 and entitled "A pose determination method and related device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of image processing technologies, and more specifically, to a pose determination method and related devices.
Background
With the development of mobile robot technology, indoor real-time positioning technology has received extensive attention: a robot that knows its own location can provide real-time information to planning, control and other modules so that the desired task can be completed. However, in an indoor environment, the global positioning system (GPS) cannot be used for positioning because its signal is unstable. Some indoor positioning methods based on signal-generating devices, such as ultra wide band (UWB) and wireless fidelity (WiFi), require signal-generating equipment to be installed in the usage scenario, and tend to bring motion-area restrictions and cost problems. In addition, because laser sensors are expensive, laser-based indoor positioning technology suffers from excessive cost.
Because camera devices are low-cost and can capture rich information, vision-based active real-time positioning technology has been proposed. A common positioning scheme is pure-vision-based positioning. Its main idea is to solve the pose of the moving body based on visual feature-point matching and global optimization: corner points are first extracted from the images, and the corner points matched between two frames are used to calculate the pose change of the capture device between the two frames; the pose change can include a position change and a rotation-angle change. However, in some scenes there are dynamically moving objects; in this case the visual constraints cannot provide reliable observations, causing large errors and thereby affecting the calculation accuracy of the relative pose.
Summary of the Invention
In a first aspect, the present application provides a pose determination method, the method including:
acquiring a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device, and the first image frame and the second image frame each include a dynamic target, a dynamic target being a target that is displaced relative to the ground while the camera device captures the image frames. Displacement relative to the ground while the camera device captures the image frames means that the dynamic target moves in space relative to the ground, for example from position A to position B (A and B being two different positions). In one implementation, the dynamic target may be a vehicle, for example a car, a truck, a passenger car, a trailer, an incomplete vehicle, a motorcycle, and so on. It should be understood that the first image frame and the second image frame may include the same dynamic target or different dynamic targets; performing corner detection on the first image frame and the second image frame respectively, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame. In some cases, it can be understood that the human eye usually recognizes an image within a small local region or window. If, when this small window is moved slightly in all directions, the gray level of the image region inside the window changes greatly, it can be considered that a corner point exists inside the window; if the gray level does not change, it can be considered that no corner point exists inside the window. In some embodiments, the plurality of first corner points and the plurality of second corner points may be Features from Accelerated Segment Test (FAST) corner points, Harris corner points, Binary Robust Invariant Scalable Keypoints (BRISK) corner points, or the like, which is not limited in this embodiment of the present invention.
Removing, from the plurality of first corner points, the first corner points in the region where the dynamic target included in the first image frame is located, to obtain a plurality of first corner points after removal; and removing, from the plurality of second corner points, the second corner points in the region where the dynamic target included in the second image frame is located, to obtain a plurality of second corner points after removal. When dynamic corner points (that is, corner points in the region where a dynamic target is located) exist, the observations of the same corner point under different camera states include not only the parallax introduced by the camera's own motion but also the parallax brought by the motion of the feature point itself; in this embodiment of the present application, removing the corner points in the region where the dynamic target is located can ensure that the visual reprojection error term is reliable. According to the plurality of first corner points after removal and the plurality of second corner points after removal, determining the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
The pose change may refer to a change in the position of the capture device and a change in its rotation angle. The change in position may represent the distance between the position at which the capture device captured the second image frame and the position at which it captured the first image frame. The change in rotation angle may represent the angular difference between the rotation angle at which the capture device captured the second image frame and the rotation angle at which it captured the first image frame. When the capture device is fixed on a vehicle, the pose change of the capture device can be regarded as the pose change of the vehicle.
Taking the case where the first image frame precedes the second image frame as an example, optical flow can be run on the plurality of corner points extracted from the first image frame to find the matching corner points between the first image frame and the second image frame and to obtain the optical-flow information of the matching corner points. The optical-flow information represents the motion of the matching corner points across the two adjacent images (from the perspective of the image frames, this can be called parallax). Further, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame can be determined based on the optical-flow information. The algorithm used for optical flow may be the Lucas-Kanade optical flow algorithm or another algorithm; besides optical flow, descriptors or direct methods may also be used to match the corner points.
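As a compact, hedged sketch of this matching-and-pose step (assuming OpenCV's pyramidal Lucas-Kanade, a known camera intrinsic matrix K, and corner points from which dynamic regions have already been removed; not the application's exact implementation):

```python
import cv2
import numpy as np

def relative_pose(prev_gray, curr_gray, prev_pts, K):
    """Track corners from the first frame into the second and recover
    the relative pose (rotation R, unit-scale translation t)."""
    p0 = prev_pts.astype(np.float32).reshape(-1, 1, 2)
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    good = status.ravel() == 1
    p0, p1 = p0.reshape(-1, 2)[good], curr_pts.reshape(-1, 2)[good]
    # Epipolar geometry on the matched (and already culled) corners.
    E, inliers = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=inliers)
    return R, t  # t has unit norm: scale is unobservable from two views
```

Note that the translation recovered this way has no scale, which is exactly why the scale-recovery steps described below are needed.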
When dynamic corner points exist, the observations of the same corner point under different camera states include not only the parallax introduced by the camera's own motion but also the parallax brought by the motion of the corner point itself. Through the above approach, this embodiment addresses the problem of large positioning errors in dynamic environments: the corner points in the region where the dynamic target is located are removed, so that the visual reprojection error is more reliable, thereby overcoming the errors introduced by dynamic features into the pose change.
In a possible implementation, the dynamic target is a vehicle.
In a possible implementation, the method is applied to a target vehicle on which the camera device, an inertial measurement unit (IMU) and a wheel speedometer are fixedly installed; the method further includes: acquiring motion state data of the target vehicle, measured by the IMU, during the period from when the camera device captures the first image frame to when it captures the second image frame; and acquiring wheel speed data of the target vehicle, measured by the wheel speedometer, during the same period. Correspondingly, determining the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame according to the plurality of first corner points after removal and the plurality of second corner points after removal includes: determining, according to the parallax of the plurality of first corner points after removal and the plurality of second corner points after removal in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the first pose change does not include scale information; determining, according to the motion state data, the wheel speed data and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the second pose change includes scale information; and performing nonlinear optimization on the second pose change to obtain the pose change. The motion state data may be vehicle acceleration data and angular velocity data measured by the inertial sensor (IMU), and the wheel speed data may be vehicle wheel rotation speed, steering wheel angle data, and so on.
In one implementation, the plurality of first corner points after removal and the plurality of second corner points after removal can be time-stamp aligned with the motion state data and the wheel speed data; because the data rates differ, a single frame of image data (the corner points after removal) corresponds to multiple motion-state and wheel-speed samples. The motion state data and wheel speed data of each image frame can then be pre-integrated to provide an initial pose value for the image. After that, a sliding window method is used, and pure visual data is used for initialization, obtaining the poses of the camera device when capturing all image frames in the sliding window together with positions lacking scale information (scale information may also be called depth). The motion state data and wheel speed data are combined to recover the missing scale information, the positions of all corner points are recalculated, and nonlinear optimization is then performed on the recovered scale information and the inverse depths of the corner points to obtain the pose change.
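A minimal sketch of the timestamp-alignment step (the data layout is an assumption for illustration):

```python
import bisect

def align_measurements(frame_times, meas):
    """Group high-rate IMU/wheel samples by image-frame interval.

    `frame_times` is a sorted list of image timestamps; `meas` is a
    sorted list of (t, data) samples. Each interval [t_k, t_{k+1})
    collects several samples, since the sensors run faster than the
    camera, matching the many-to-one correspondence described above.
    """
    buckets = [[] for _ in range(len(frame_times) - 1)]
    for t, data in meas:
        k = bisect.bisect_right(frame_times, t) - 1
        if 0 <= k < len(buckets):
            buckets[k].append((t, data))
    return buckets
```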
In a possible implementation, performing nonlinear optimization on the second pose change includes: performing nonlinear optimization on the second pose change according to a preset optimization function, where the preset optimization function includes a wheel speedometer residual term.
In order to smooth the pose change or optimize it toward the true value, nonlinear optimization is usually performed. Nonlinear optimization means finding, for a given objective function f(x), the optimal set of values, i.e. x = argmin f(x). According to derivative theory, valid values of x can be obtained by solving the derivative equation ∇f(x) = 0; when f(x) is a nonlinear function, the optimization is nonlinear optimization. In the embodiment of the present application, the optimization function can consist of five terms, which can be expressed as: optimization function = prior residual + IMU residual term + wheel speedometer residual term + visual reprojection error term + loop-closure detection reprojection error term.
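Written out in symbols (the residual notation and covariance weights below are chosen for illustration, not taken from the application), this five-term function has the form:

```latex
\min_{\mathcal{X}}\;
\lVert r_p \rVert^2
+ \sum_{k}\lVert r_{\mathrm{IMU},k}(\mathcal{X}) \rVert^2_{\Sigma_I}
+ \sum_{k}\lVert r_{\mathrm{wheel},k}(\mathcal{X}) \rVert^2_{\Sigma_O}
+ \sum_{j}\lVert r_{\mathrm{vis},j}(\mathcal{X}) \rVert^2_{\Sigma_C}
+ \sum_{l}\lVert r_{\mathrm{loop},l}(\mathcal{X}) \rVert^2_{\Sigma_L}
```

where the state vector stacks the sliding-window quantities (poses, velocities, biases, inverse depths) and each Σ is the corresponding measurement covariance.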
In this embodiment of the present application, to address the problem of large positioning errors in dynamic environments, the corner points in the region where the dynamic target is located are removed from the plurality of corner points, so that the visual reprojection error is more reliable, thereby overcoming the positioning errors introduced by dynamic features. Moreover, to address the disadvantage that the two-dimensional-motion initialization error of a visual-inertial navigation system is relatively large, a wheel speedometer residual term is added. The pose is initialized and computed jointly from the vision, IMU and wheel speedometer information, so that when the IMU converges poorly, the wheel speedometer can be used to compensate, thereby improving the result.
In a possible implementation, the method further includes:
detecting the dynamic target in the first image frame and the dynamic target in the second image frame through a pre-trained neural network, to obtain the region where the dynamic target included in the first image frame is located and the region where the dynamic target included in the second image frame is located.
In one implementation, loop-closure detection can also be performed between the current keyframe and the map; a detected frame is a loop-closure frame, and the co-visible feature points between the loop-closure frame and the frames in the sliding window are found; a reprojection error term is established and added to the nonlinear optimization; four-degree-of-freedom optimization is performed on keyframes that have been optimized and have slid out of the window, together with their associated frames; and the optimized keyframes are inserted into the map. In actual use, mapping, that is, loop-closure detection, can be enabled or disabled; when enabled, the map is built while positioning, and the final positions obtained are all relative to the first fixed camera position, which can of course be fixed to a specific reference frame by a subsequent coordinate transformation.
In a second aspect, the present application provides a pose determination apparatus, the apparatus including:
an acquiring module, configured to acquire a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device, and the first image frame and the second image frame each include a dynamic target;
a corner extraction module, configured to perform corner detection on the first image frame and the second image frame respectively, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame;
a removal module, configured to remove, from the plurality of first corner points, the first corner points in the region where the dynamic target included in the first image frame is located, to obtain a plurality of first corner points after removal;
and to remove, from the plurality of second corner points, the second corner points in the region where the dynamic target included in the second image frame is located, to obtain a plurality of second corner points after removal;
a positioning module, configured to determine, according to the plurality of first corner points after removal and the plurality of second corner points after removal, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
In a possible implementation, the plurality of first corner points and the plurality of second corner points include one of the following: Features from Accelerated Segment Test (FAST) corner points, Harris corner points, and Binary Robust Invariant Scalable Keypoints (BRISK) corner points.
In a possible implementation, the apparatus is applied to a target vehicle on which the camera device, an inertial measurement unit (IMU) and a wheel speedometer are fixedly installed; the acquiring module is configured to acquire motion state data of the target vehicle, measured by the IMU, during the period from when the camera device captures the first image frame to when it captures the second image frame;
and to acquire wheel speed data of the target vehicle, measured by the wheel speedometer, during the period from when the camera device captures the first image frame to when it captures the second image frame.
Correspondingly, the positioning module is configured to determine, according to the parallax of the plurality of first corner points after removal and the plurality of second corner points after removal in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the first pose change does not include scale information;
to determine, according to the motion state data, the wheel speed data and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the second pose change includes scale information;
and to perform nonlinear optimization on the second pose change to obtain the pose change.
In a possible implementation, the positioning module is configured to perform nonlinear optimization on the second pose change according to a preset optimization function, where the preset optimization function includes a wheel speedometer residual term.
In a possible implementation, the apparatus further includes:
a dynamic target detection module, configured to detect the dynamic target in the first image frame and the dynamic target in the second image frame through a pre-trained neural network, to obtain the region where the dynamic target included in the first image frame is located and the region where the dynamic target included in the second image frame is located.
In a possible implementation, the dynamic target is a vehicle.
In a third aspect, a computer-readable storage medium is provided, where the computer-readable storage medium stores program code, and the program code includes instructions for performing some or all of the operations of the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer program product that, when run on a communication apparatus, causes the communication apparatus to perform some or all of the operations of the method described in the first aspect.
In a fifth aspect, a chip is provided, where the chip includes a processor configured to perform some or all of the operations of the method described in the first aspect.
An embodiment of the present application provides a pose determination method, the method including: acquiring a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device and each include a dynamic target; performing corner detection on the first image frame and the second image frame respectively, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame; removing, from the plurality of first corner points, the first corner points in the region where the dynamic target included in the first image frame is located, to obtain a plurality of first corner points after removal; removing, from the plurality of second corner points, the second corner points in the region where the dynamic target included in the second image frame is located, to obtain a plurality of second corner points after removal; and determining, according to the plurality of first corner points after removal and the plurality of second corner points after removal, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame. When dynamic corner points exist, the observations of the same corner point under different camera states include not only the parallax introduced by the camera's own motion but also the parallax brought by the motion of the corner point itself. Through the above approach, this embodiment addresses the problem of large positioning errors in dynamic environments: the corner points in the region where the dynamic target is located are removed from the plurality of corner points, so that the visual reprojection error is more reliable, thereby overcoming the pose-change determination errors introduced by dynamic features.
Brief Description of Drawings
FIG. 1 is a functional block diagram of a vehicle provided by an embodiment of the present invention;
FIG. 2 is a functional block diagram of an automatic driving system provided by an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a pose determination method provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a pose determination method provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the present application.
Detailed Description of Embodiments
The embodiments of the present invention are described below with reference to the accompanying drawings of the embodiments of the present invention.
The terms "first", "second", "third", "fourth" and the like in the description, the claims and the drawings of the present application are used to distinguish different objects, rather than to describe a specific order. Furthermore, the terms "include" and "have", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The terms "component", "module", "system" and the like used in this specification denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (such as data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet interacting with other systems by way of the signal).
FIG. 1 is a functional block diagram of a vehicle 100 provided by an embodiment of the present invention. In one embodiment, the vehicle 100 is configured in a fully or partially autonomous driving mode. For example, the vehicle 100 can control itself while in the autonomous driving mode; the current state of the vehicle and its surroundings can be determined, the possible behavior of at least one other vehicle in the surroundings can be determined, a confidence level corresponding to the likelihood that the other vehicle performs the possible behavior can be determined, and the vehicle 100 can be controlled based on the determined information. When the vehicle 100 is in the autonomous driving mode, it may be set to operate without human interaction.
The vehicle 100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripheral devices 108, a power supply 110, a computer system 112 and a user interface 116. Optionally, the vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, the subsystems and elements of the vehicle 100 may be interconnected by wire or wirelessly.
The travel system 102 may include components that provide powered motion for the vehicle 100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120 and wheels/tires 121. The engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or a combination of engine types, such as a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air compression engine. The engine 118 converts the energy source 119 into mechanical energy.
Examples of the energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed-gas-based fuels, ethanol, solar panels, batteries and other sources of electricity. The energy source 119 may also provide energy for other systems of the vehicle 100.
The transmission 120 may transmit mechanical power from the engine 118 to the wheels 121. The transmission 120 may include a gearbox, a differential and a drive shaft. In one embodiment, the transmission 120 may also include other devices, such as a clutch. The drive shaft may include one or more axles that may be coupled to one or more wheels 121.
The sensor system 104 may include several sensors that sense information about the environment surrounding the vehicle 100. For example, the sensor system 104 may include a positioning system 122 (which may be a GPS system, a BeiDou system or another positioning system), an inertial measurement unit (IMU) 124, a radar system 126, a laser rangefinder 128 and a camera 130. The sensor system 104 may also include sensors that monitor the internal systems of the vehicle 100 (for example, an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, speed, etc.). Such detection and identification is a key function for the safe operation of the autonomous vehicle 100.
With the development of advanced driver assistance systems (ADAS) and unmanned driving technologies, higher requirements are placed on the performance of the radar system 126, such as its range and angular resolution. The improved range and angular resolution of the vehicle-mounted radar system 126 mean that, when imaging a target, multiple measurement points are detected for one target object, forming high-resolution point cloud data; the radar system 126 in this application may therefore also be referred to as a point cloud imaging radar.
The positioning system 122 may be used to estimate the geographic location of the vehicle 100. The IMU 124 is used to sense position and orientation changes of the vehicle 100 based on inertial acceleration. In one embodiment, the IMU 124 may be a combination of an accelerometer and a gyroscope.
The radar system 126 may use radio signals to sense objects within the surrounding environment of the vehicle 100. In some embodiments, in addition to sensing objects, the radar system 126 may also be used to sense the speed and/or heading of the objects.
The laser rangefinder 128 may use laser light to sense objects in the environment in which the vehicle 100 is located. In some embodiments, the laser rangefinder 128 may include one or more laser sources, a laser scanner and one or more detectors, among other system components.
The camera 130 may be used to capture multiple images of the surrounding environment of the vehicle 100. The camera 130 may be a still camera or a video camera. In this embodiment of the present application, the camera 130 may also be referred to as a camera device.
The control system 106 controls the operation of the vehicle 100 and its components. The control system 106 may include various elements, including a steering system 132, a throttle 134, a braking unit 136, a computer vision system 140, a route control system 142 and an obstacle avoidance system 144.
The steering system 132 is operable to adjust the heading of the vehicle 100. For example, in one embodiment it may be a steering wheel system.
The throttle 134 is used to control the operating speed of the engine 118 and thereby the speed of the vehicle 100.
The braking unit 136 is used to control the deceleration of the vehicle 100. The braking unit 136 may use friction to slow the wheels 121. In other embodiments, the braking unit 136 may convert the kinetic energy of the wheels 121 into electric current. The braking unit 136 may also take other forms to slow the rotation speed of the wheels 121 and thus control the speed of the vehicle 100.
The computer vision system 140 may process and analyze images captured by the camera 130 in order to identify objects and/or features in the environment surrounding the vehicle 100. The objects and/or features may include traffic signals, road boundaries and obstacles. The computer vision system 140 may use object recognition algorithms, Structure from Motion (SFM) algorithms, video tracking and other computer vision techniques. In some embodiments, the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and so on.
The route control system 142 is used to determine the travel route of the vehicle 100. In some embodiments, the route control system 142 may combine data from the sensors 138, the GPS 122 and one or more predetermined maps to determine a travel route for the vehicle 100.
The obstacle avoidance system 144 is used to identify, evaluate and avoid or otherwise negotiate potential obstacles in the environment of the vehicle 100.
Of course, in one example, the control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be omitted.
The vehicle 100 interacts with external sensors, other vehicles, other computer systems or users through the peripheral devices 108. The peripheral devices 108 may include a wireless communication system 146, an onboard computer 148, a microphone 150 and/or a speaker 152.
In some embodiments, the peripheral devices 108 provide a means for a user of the vehicle 100 to interact with the user interface 116. For example, the onboard computer 148 may provide information to the user of the vehicle 100. The user interface 116 may also operate the onboard computer 148 to receive user input. The onboard computer 148 can be operated via a touch screen. In other cases, the peripheral devices 108 may provide a means for the vehicle 100 to communicate with other devices located within the vehicle. For example, the microphone 150 may receive audio (for example, voice commands or other audio input) from a user of the vehicle 100. Similarly, the speaker 152 may output audio to a user of the vehicle 100.
无线通信系统146可以直接地或者经由通信网络来与一个或多个设备无线通信。例如, 无线通信系统146可使用3G蜂窝通信,例如码分多址(code division multiple access,CDMA)、增强型多媒体盘片系统(enhanced versatile disk,EVD)、全球移动通讯系统(global system for mobile communications,GSM)/通用分组无线业务(general packet radio service,GPRS),或者4G蜂窝通信,例如LTE。或者5G蜂窝通信。无线通信系统146可利用WiFi与无线局域网(wireless local area network,WLAN)通信。在一些实施例中,无线通信系统146可利用红外链路、蓝牙或ZigBee与设备直接通信。其他无线协议,例如各种车辆通信系统,例如,无线通信系统146可包括一个或多个专用短程通信(dedicated short range communications,DSRC)设备,这些设备可包括车辆和/或路边台站之间的公共和/或私有数据通信。Wireless communication system 146 may wirelessly communicate with one or more devices, either directly or via a communication network. For example, the wireless communication system 146 may use 3G cellular communications, such as code division multiple access (CDMA), enhanced versatile disk (EVD), global system for mobile communications , GSM)/general packet radio service (GPRS), or 4G cellular communications such as LTE. Or 5G cellular communications. The wireless communication system 146 may communicate with a wireless local area network (WLAN) using WiFi. In some embodiments, the wireless communication system 146 may communicate directly with the device using an infrared link, Bluetooth, or ZigBee. Other wireless protocols, such as various vehicle communication systems, for example, wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may include communication between vehicles and/or roadside stations public and/or private data communications.
电源110可向车辆100的各种组件提供电力。在一个实施例中,电源110可以为可再充电锂离子或铅酸电池。这种电池的一个或多个电池组可被配置为电源为车辆100的各种组件提供电力。在一些实施例中,电源110和能量源119可一起实现,例如一些全电动车中那样。The power supply 110 may provide power to various components of the vehicle 100 . In one embodiment, the power source 110 may be a rechargeable lithium-ion or lead-acid battery. One or more battery packs of such a battery may be configured as a power source to provide power to various components of the vehicle 100 . In some embodiments, power source 110 and energy source 119 may be implemented together, such as in some all-electric vehicles.
车辆100的部分或所有功能受计算机系统112控制。计算机系统112可包括至少一个处理器113,处理器113执行存储在例如数据存储装置114这样的非暂态计算机可读介质中的指令115。计算机系统112还可以是采用分布式方式控制车辆100的个体组件或子系统的多个计算设备。Some or all of the functions of the vehicle 100 are controlled by the computer system 112 . Computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as data storage device 114 . Computer system 112 may also be multiple computing devices that control individual components or subsystems of vehicle 100 in a distributed fashion.
处理器113可以是任何常规的处理器,诸如商业可获得的中央处理器(central processing unit,CPU)。替选地,该处理器可以是诸如专用集成电路(application specific integrated circuit,ASIC)或其它基于硬件的处理器的专用设备。尽管图1功能性地图示了处理器、存储器、和在相同块中的计算机110的其它元件,但是本领域的普通技术人员应该理解该处理器、计算机、或存储器实际上可以包括可以或者可以不存储在相同的物理外壳内的多个处理器、计算机、或存储器。例如,存储器可以是硬盘驱动器或位于不同于计算机110的外壳内的其它存储介质。因此,对处理器或计算机的引用将被理解为包括对可以或者可以不并行操作的处理器或计算机或存储器的集合的引用。不同于使用单一的处理器来执行此处所描述的步骤,诸如转向组件和减速组件的一些组件每个都可以具有其自己的处理器,处理器只执行与特定于组件的功能相关的计算。The processor 113 may be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor may be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor. Although FIG. 1 functionally illustrates the processor, memory, and other elements of the computer 110 in the same block, one of ordinary skill in the art will understand that the processor, computer, or memory may actually include a processor, a computer, or a memory that may or may not Multiple processors, computers, or memories stored within the same physical enclosure. For example, the memory may be a hard drive or other storage medium located within an enclosure other than computer 110 . Thus, reference to a processor or computer will be understood to include reference to a collection of processors or computers or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as the steering and deceleration components, may each have their own processors that only perform computations related to component-specific functions.
本申请实施例中,处理器113可以获取到相机130以及其他传感器设备的数据,并基于获取的数据进行车辆的定位。In this embodiment of the present application, the processor 113 may acquire data from the camera 130 and other sensor devices, and perform vehicle positioning based on the acquired data.
在此处所描述的各个方面中,处理器可以位于远离该车辆并且与该车辆进行无线通信。在其它方面中,此处所描述的过程中的一些在布置于车辆内的处理器上执行而其它则由远程处理器执行,包括采取执行单一操纵的必要步骤。In various aspects described herein, a processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are performed on a processor disposed within the vehicle while others are performed by a remote processor, including taking steps necessary to perform a single maneuver.
在一些实施例中,数据存储装置114可包含指令115(例如,程序逻辑),指令115可被处理器113执行来执行车辆100的各种功能,包括以上描述的那些功能。数据存储装置114也可包含额外的指令,包括向推进系统102、传感器系统104、控制系统106和外围设备108中的一个或多个发送数据、从其接收数据、与其交互和/或对其进行控制的指令。In some embodiments, data storage 114 may include instructions 115 (eg, program logic) executable by processor 113 to perform various functions of vehicle 100 , including those described above. Data storage 114 may also contain additional instructions, including sending data to, receiving data from, interacting with, and/or performing data processing on one or more of propulsion system 102 , sensor system 104 , control system 106 , and peripherals 108 . control commands.
In addition to the instructions 115, the data storage device 114 may also store data, such as road maps, route information, the position, direction, and speed of the vehicle and other vehicle data, as well as other information. Such information may be used by the vehicle 100 and the computer system 112 while the vehicle 100 operates in autonomous, semi-autonomous, and/or manual modes.
The user interface 116 is configured to provide information to or receive information from a user of the vehicle 100. Optionally, the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as the wireless communication system 146, the onboard computer 148, the microphone 150, and the speaker 152.
The computer system 112 may control the functions of the vehicle 100 based on input received from various subsystems (for example, the travel system 102, the sensor system 104, and the control system 106) and from the user interface 116. For example, the computer system 112 may use input from the control system 106 to control the steering unit 132 to avoid obstacles detected by the sensor system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the vehicle 100 and its subsystems.
Optionally, one or more of the components described above may be installed separately from, or merely associated with, the vehicle 100. For example, the data storage device 114 may exist partially or completely separate from the vehicle 100. The components described above may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the components described above are merely an example. In practical applications, components in each of the foregoing modules may be added or removed according to actual needs, and FIG. 1 should not be construed as a limitation on the embodiments of the present invention.
A self-driving car traveling on a road, such as the vehicle 100 above, can identify objects in its surrounding environment to determine an adjustment to its current speed. The objects may be other vehicles, traffic control devices, or other types of objects. In some examples, each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and distance from the vehicle, may be used to determine the speed to which the self-driving car is to adjust.
Optionally, the self-driving vehicle 100, or a computing device associated with the self-driving vehicle 100 (such as the computer system 112, the computer vision system 140, or the data storage device 114 in FIG. 1), may predict the behavior of the identified objects based on the characteristics of those objects and the state of the surrounding environment (for example, traffic, rain, or ice on the road). Optionally, since the identified objects depend on each other's behavior, all of the identified objects may also be considered together to predict the behavior of a single identified object. The vehicle 100 can adjust its speed based on the predicted behavior of the identified objects. In other words, the self-driving car can determine, based on the predicted behavior of the objects, what steady state the vehicle will need to adjust to (for example, accelerate, decelerate, or stop). Other factors may also be considered in this process to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 on the road being traveled, the curvature of the road, and the proximity of static and dynamic objects.
In addition to providing instructions to adjust the speed of the self-driving car, the computing device may also provide instructions to modify the steering angle of the vehicle 100, so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in its vicinity (for example, cars in adjacent lanes on the road).
The vehicle 100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, a cart, or the like, which is not specifically limited in the embodiments of the present invention.
Scenario Example 1: Autonomous Driving System
Referring to FIG. 2, the computer system 101 includes a processor 103 coupled to a system bus 105. The processor 103 may be one or more processors, each of which may include one or more processor cores. A video adapter 107 may drive a display 109, and the display 109 is coupled to the system bus 105. The system bus 105 is coupled to an input/output (I/O) bus through a bus bridge 111. An I/O interface 115 is coupled to the I/O bus. The I/O interface 115 communicates with various I/O devices, such as an input device 117 (for example, a keyboard, a mouse, or a touch screen), a media tray 121 (for example, a compact disc read-only memory (CD-ROM) or a multimedia interface), a transceiver 123 (which can send and/or receive radio communication signals), a camera 155 (which can capture still and moving digital video images), and an external USB port 125. Optionally, the interface connected to the I/O interface 115 may be a USB interface.
The processor 103 may be any conventional processor, including a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, or a combination thereof. Alternatively, the processor may be a dedicated device such as an ASIC. Optionally, the processor 103 may be a neural network processor or a combination of a neural network processor and the conventional processors described above.
Optionally, in the various embodiments herein, the computer system 101 may be located remotely from the self-driving vehicle and may communicate wirelessly with it. In other aspects, some of the processes herein are executed on a processor disposed within the self-driving vehicle, and others are executed by a remote processor, including taking the actions required to perform a single maneuver.
The computer 101 may communicate with a software deployment server 149 through a network interface 129. For example, the network interface 129 is a hardware network interface, such as a network interface card. The network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet network or a virtual private network (VPN). Optionally, the network 127 may also be a wireless network, such as a Wi-Fi network or a cellular network.
A hard disk drive interface is coupled to the system bus 105 and connected to a hard disk drive. A system memory 135 is coupled to the system bus 105. Data running in the system memory 135 may include an operating system 137 and application programs 143 of the computer 101.
The operating system includes a shell 139 and a kernel 141. The shell 139 is an interface between the user and the kernel of the operating system. The shell is the outermost layer of the operating system; it manages the interaction between the user and the operating system by waiting for the user's input, interpreting that input to the operating system, and handling the wide variety of operating system output.
The kernel 141 consists of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with the hardware, the operating system kernel typically runs processes, provides inter-process communication, and provides CPU time-slice management, interrupt handling, memory management, I/O management, and the like.
The application programs 143 include programs related to controlling the autonomous driving of a car, for example, a program that manages the interaction between the self-driving car and obstacles on the road, a program that controls the route or speed of the self-driving car, and a program that controls the interaction between the self-driving car and other self-driving cars on the road. The application programs 143 also exist on the system of the deployment server 149. In one embodiment, the computer system 101 may download the application programs 143 from the software deployment server 149 when they need to be executed.
A sensor 153 is associated with the computer system 101. The sensor 153 is configured to detect the environment around the computer system 101. For example, the sensor 153 can detect animals, cars, obstacles, pedestrian crossings, and the like. Further, the sensor 153 can also detect the environment around such objects, for example, the environment around an animal, such as other animals appearing nearby, weather conditions, and the brightness of the surroundings. Optionally, if the computer 101 is located on a self-driving car, the sensor may be a radar system or the like.
A common positioning solution is a purely vision-based positioning technology, namely visual SLAM. The main idea of this solution is to solve for the pose of a moving body based on visual feature point matching and global optimization: features are first extracted from the images, the feature points matched between two frames are used to compute the relative pose transformation between those frames, and this information is finally used to compute odometry.
In addition to visual SLAM, positioning methods that fuse vision with inertial measurement unit (IMU) data are also widely used. The main idea of this solution is to fuse IMU data with visual data to position the moving body. However, in a dynamic environment, the visual constraints in such vision-IMU fusion methods cannot provide reliable observations, which results in large errors and thus degrades positioning accuracy.
To resolve the foregoing problems, refer to FIG. 3, which is a schematic flowchart of a pose determination method provided by an embodiment of the present application. As shown in FIG. 3, the pose determination method provided by this embodiment of the present application includes the following steps.
301. Acquire a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device, and each of the first image frame and the second image frame includes a dynamic target.
In this embodiment of the present application, when the vehicle is in an environment with no GPS or with unstable GPS, such as a tunnel, an underground parking lot, among high-rise buildings, or in a heavily occluded place, the first image frame and the second image frame captured by the camera device may be acquired, where the first image frame and the second image frame may be two consecutive image frames captured by the camera device.
In this embodiment of the present application, each of the first image frame and the second image frame includes a dynamic target, where the dynamic target is a target that is displaced relative to the ground while the camera device captures the image frames. Displacement relative to the ground while the camera device captures the image frames means that the dynamic target moves relative to the ground in space, for example, from position A to position B (where position A and position B are two different positions). In one implementation, the dynamic target may be a vehicle, for example, a car, a truck, a passenger car, a trailer, a non-holonomic vehicle, a motorcycle, or the like.
It should be understood that the first image frame and the second image frame may include the same dynamic target, with the dynamic target located in different regions of the two frames. The first image frame and the second image frame may also include different dynamic targets. For example, the first image frame may include a vehicle 1 and a vehicle 2, the second image frame may also include the vehicle 1 and the vehicle 2, the position of the vehicle 1 in the first image frame differs from its position in the second image frame, and the position of the vehicle 2 in the first image frame differs from its position in the second image frame. For another example, the first image frame may include a vehicle 1 and a vehicle 2, and the second image frame may include the vehicle 1 and a vehicle 3, so that the first image frame does not include the vehicle 3, the second image frame does not include the vehicle 2, and the position of the vehicle 1 in the first image frame differs from its position in the second image frame.
302. Perform corner detection on each of the first image frame and the second image frame, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame.
In this embodiment of the present application, after the first image frame and the second image frame captured by the camera device are acquired, corner detection may further be performed on the first image frame and the second image frame, to obtain the plurality of first corner points of the first image frame and the plurality of second corner points of the second image frame.
In one implementation, step 302 may be executed by a processor of the vehicle itself; that is, the processor of the vehicle may perform corner detection on the first image frame and the second image frame to obtain the plurality of first corner points of the first image frame and the plurality of second corner points of the second image frame. In another implementation, step 302 may be executed by a cloud-side server; that is, the vehicle may send the first image frame and the second image frame captured by the camera device to the cloud-side server, and the server may perform corner detection on the two frames to obtain the plurality of first corner points of the first image frame and the plurality of second corner points of the second image frame.
In some embodiments, the plurality of first corner points and the plurality of second corner points may be features from accelerated segment test (FAST) corner points, Harris corner points, binary robust invariant scalable keypoints (BRISK) corner points, or the like, which is not limited in the embodiments of the present invention.
In some cases, it can be understood that the human eye usually recognizes an image within a small local region or a small window. If the grayscale of the image region inside this small window changes significantly when the window is moved slightly in all directions, a corner point can be considered to exist inside the window. If the grayscale of the image region inside the window does not change when the window is moved slightly in all directions, no corner point can be considered to exist inside the window.
How to perform corner detection on the first image frame and the second image frame to obtain the plurality of first corner points of the first image frame and the plurality of second corner points of the second image frame may be based on existing implementations, and details are not described herein again.
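For illustration only (this is not part of the claimed method), a minimal per-frame corner detection sketch using OpenCV is shown below; the detector choice and the thresholds (`max_corners`, `quality_level`, `min_distance`) are assumptions for the example rather than values from this application:

```python
import cv2
import numpy as np

def detect_corners(frame_bgr, max_corners=500, quality_level=0.01, min_distance=10):
    """Detect Harris-style corners in one image frame; returns (N, 2) pixel coords."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(
        gray,
        maxCorners=max_corners,
        qualityLevel=quality_level,
        minDistance=min_distance,
        useHarrisDetector=True,  # Harris response; set False for Shi-Tomasi
    )
    return corners.reshape(-1, 2) if corners is not None else np.empty((0, 2))

# FAST corners, also named in this application, are an alternative detector:
# fast = cv2.FastFeatureDetector_create(threshold=20)
# keypoints = fast.detect(gray, None)
```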
303. Remove, from the plurality of first corner points, the first corner points in the region of the dynamic target included in the first image frame, to obtain a plurality of culled first corner points; and remove, from the plurality of second corner points, the second corner points in the region of the dynamic target included in the second image frame, to obtain a plurality of culled second corner points.
It should be understood that the first corner points in the region of the dynamic target included in the first image frame may be removed from the plurality of first corner points first, and then the second corner points in the region of the dynamic target included in the second image frame may be removed from the plurality of second corner points; alternatively, the second corner points may be removed first and then the first corner points; alternatively, the operation of removing the first corner points and the operation of removing the second corner points may be performed at the same time.
In one implementation, a pre-trained neural network may be used to detect the dynamic target in the first image frame and the dynamic target in the second image frame, so as to obtain the region of the dynamic target included in the first image frame and the region of the dynamic target included in the second image frame. The first corner points that fall within the detected dynamic-target region of the first image frame are then removed from the plurality of first corner points to obtain the culled first corner points, and the second corner points that fall within the detected dynamic-target region of the second image frame are removed from the plurality of second corner points to obtain the culled second corner points.
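A minimal sketch of this culling step is shown below, assuming the pre-trained detector returns axis-aligned bounding boxes `(x1, y1, x2, y2)` for the dynamic targets; a segmentation mask could be used in the same way:

```python
import numpy as np

def cull_dynamic_corners(corners, dynamic_boxes):
    """Remove corners that fall inside any detected dynamic-target region.

    corners: (N, 2) array of (u, v) pixel coordinates.
    dynamic_boxes: list of (x1, y1, x2, y2) boxes from a pre-trained detector.
    Returns the (M, 2) array of remaining, presumably static, corners.
    """
    keep = np.ones(len(corners), dtype=bool)
    for x1, y1, x2, y2 in dynamic_boxes:
        inside = (
            (corners[:, 0] >= x1) & (corners[:, 0] <= x2)
            & (corners[:, 1] >= y1) & (corners[:, 1] <= y2)
        )
        keep &= ~inside
    return corners[keep]
```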
In one implementation, an optical flow method may also be used to track the corner points and to supplement new corner points; each corner point is then assigned a unique identifier (id), and the normalized coordinates of the corner point in the camera coordinate system, its pixel coordinates, its pixel velocity, and the like are computed.
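One way such tracking could be organized is sketched below with pyramidal Lucas-Kanade optical flow; the id bookkeeping and the intrinsic matrix `K` are illustrative assumptions:

```python
import cv2
import numpy as np

def track_corners(prev_gray, cur_gray, prev_pts, prev_ids):
    """Track corners into the current frame, keeping each corner's unique id."""
    p0 = prev_pts.astype(np.float32).reshape(-1, 1, 2)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, p0, None)
    ok = status.ravel() == 1
    return p1.reshape(-1, 2)[ok], prev_ids[ok]

def normalize(pts, K):
    """Convert pixel coordinates to normalized camera coordinates."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.stack([(pts[:, 0] - cx) / fx, (pts[:, 1] - cy) / fy], axis=1)

# Pixel velocity of a tracked corner: (cur_pt - prev_pt) / frame_interval
```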
而当存在动态角点(也就是动态目标所在区域的角点)时,同一角点在不同相机状态下的观测量不仅包括由于相机自身运动引入的视差,还包括由于特征点本身运动带来的视差,本申请实施例中,通过剔除所述多个角点中所述动态目标所在区域的角点,可以保证视觉重投影误差项可靠,具体的,视觉重投影优化项可以参照如下公式:When there are dynamic corner points (that is, the corner points of the area where the dynamic target is located), the observation amount of the same corner point in different camera states includes not only the parallax caused by the motion of the camera itself, but also the parallax caused by the motion of the feature point itself. Parallax, in the embodiment of the present application, by eliminating the corner points in the area where the dynamic target is located among the plurality of corner points, the visual reprojection error term can be guaranteed to be reliable. Specifically, the visual reprojection optimization term can refer to the following formula:
$$r_{\mathcal{C}}\left(\hat{z}_{l}^{c_j},\,\mathcal{X}\right)=\hat{z}_{l}^{c_j}-\pi_c\!\left(\mathbf{P}_{l}^{c_j}\right)$$

where $\hat{z}_{l}^{c_j}$ is the observed coordinate of the $l$-th landmark point in the normalized camera coordinate system of the $j$-th camera frame, $\pi_c$ denotes the camera intrinsic projection, and $\mathbf{P}_{l}^{c_j}$ is the position of the $l$-th landmark point in the $j$-th camera frame as predicted from the estimated states (the published formula is rendered as an image; it is reconstructed here in the standard reprojection-residual form implied by these symbol definitions). Since the goal is to obtain the motion information of the camera itself, when dynamic corner points exist, the observations of the same corner point under different camera states include not only the parallax introduced by the camera's own motion but also the parallax introduced by the motion of the corner point itself. To avoid this situation, in this embodiment the dynamic corner points are removed during image processing to improve the accuracy of the visual residual term.
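For concreteness, one way such a residual could be evaluated is sketched below; the pose parameterization (camera-to-world rotation `R_wc` and camera center `t_wc`) is an assumption chosen for the example:

```python
import numpy as np

def reprojection_residual(z_obs_norm, p_world, R_wc, t_wc):
    """Residual between an observed normalized coordinate and the projection
    of the landmark's estimated world position into this camera frame."""
    p_cam = R_wc.T @ (p_world - t_wc)  # world point expressed in the camera frame
    proj = p_cam[:2] / p_cam[2]        # pinhole projection onto the normalized plane
    return z_obs_norm - proj
```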
For example, referring to FIG. 4, when the camera device is a forward-facing monocular camera, feature extraction and tracking and dynamic target removal may be performed on the image frames captured by the camera, and a monocular structure-from-motion (SFM) procedure is then used to obtain visual residuals free of the influence of dynamic feature points (which may also be referred to as dynamic corner points).
304. Determine, based on the culled first corner points and the culled second corner points, a pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
In this embodiment of the present application, after the first corner points in the region of the dynamic target included in the first image frame are removed from the plurality of first corner points to obtain the culled first corner points, and the second corner points in the region of the dynamic target included in the second image frame are removed from the plurality of second corner points to obtain the culled second corner points, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame may be determined based on the culled first corner points and the culled second corner points.
This embodiment of the present application may be applied to a target vehicle on which the camera device, an inertial measurement unit (IMU), and a wheel speedometer are fixedly installed. The motion state data of the target vehicle, measured by the IMU during the period from when the camera device captures the first image frame to when it captures the second image frame, may also be acquired, as well as the wheel speed data of the target vehicle measured by the wheel speedometer during the same period. The pose change of the camera device when capturing the second image relative to when capturing the first image may then be determined based on the motion state data, the wheel speed data, the culled first corner points, and the culled second corner points. The motion state data may be vehicle acceleration data and angular velocity data measured by the inertial sensor (IMU), and the wheel speed data may be vehicle wheel rotation speed, steering wheel angle data, and the like.
Specifically, in one implementation, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame may be determined based on the disparity of the culled first corner points and the culled second corner points in their respective image frames, where the first pose change does not include scale information. A second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame is then determined based on the motion state data, the wheel speed data, and the first pose change, where the second pose change includes scale information. Nonlinear optimization is then performed on the second pose change to obtain the pose change. In the nonlinear optimization process, the nonlinear optimization may be performed on the second pose change according to a preset optimization function, where the preset optimization function includes a wheel speedometer residual term.
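The up-to-scale step could, for example, use the essential matrix between the two culled corner sets, as in the sketch below; restoring metric scale from an integrated wheel-odometry distance is an illustrative assumption rather than the exact scheme of this application:

```python
import cv2

def relative_pose_up_to_scale(pts1, pts2, K):
    """Rotation and unit-norm translation between two frames from matched
    (culled) corners; the translation carries no scale information."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t_unit, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t_unit

def apply_wheel_scale(t_unit, wheel_distance):
    """Restore metric scale using the distance integrated from wheel speeds."""
    return t_unit * wheel_distance
```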
Nonlinear optimization refers to finding, for a given objective function, an optimal set of values $x^{*}=\arg\min f(x)$. According to derivative theory, valid values of $x$ can be obtained by solving the derivative equation $\nabla f(x)=0$; when $f(x)$ is a nonlinear function, this is a nonlinear optimization. In this embodiment of the present application, the optimization function may consist of five terms, which may be expressed as: optimization function = prior residual + IMU residual term + wheel speedometer residual term + visual reprojection error term + loop closure detection reprojection error term.
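Schematically, the five residual terms could be stacked into a single nonlinear least-squares objective as follows; the residual callables are placeholders standing in for the terms named above:

```python
import numpy as np
from scipy.optimize import least_squares

def total_residual(x, prior, imu_terms, wheel_terms, visual_terms, loop_terms):
    """Stack all residual blocks into one vector for nonlinear least squares."""
    blocks = [prior(x)]
    for r in (*imu_terms, *wheel_terms, *visual_terms, *loop_terms):
        blocks.append(r(x))  # each r(x) returns one residual vector
    return np.concatenate(blocks)

# result = least_squares(total_residual, x0,
#                        args=(prior, imu_terms, wheel_terms,
#                              visual_terms, loop_terms))
```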
In this embodiment of the present application, the culled first corner points and the culled second corner points may be timestamp-aligned with the motion state data and the wheel speed data. Because the data rates differ, a single frame of image data (the culled corner points) corresponds to multiple motion state data samples and wheel speed data samples. Pre-integration may then be performed on the motion state data and wheel speed data of each image frame to provide an initial pose value for that image. A sliding-window method is then used: purely visual data is used for initialization, yielding the poses of the camera device when capturing all image frames in the sliding window and their positions without scale information (the missing scale may also be referred to as depth). The motion state data and wheel speed data are combined to recover the missing scale information, the positions of all corner points are recomputed, and nonlinear optimization is then performed on the recovered scale information and the inverse depths of the corner points to obtain the pose change.
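The timestamp alignment could be organized as in the following sketch, which groups the higher-rate IMU and wheel samples between consecutive image timestamps; the `(timestamp, measurement)` sample format is an assumption for illustration:

```python
def bucket_by_frame(frame_ts, samples):
    """Group high-rate sensor samples between consecutive frame timestamps.

    frame_ts: sorted list of image timestamps.
    samples: time-sorted list of (timestamp, measurement) tuples (IMU or wheel).
    Returns one list of samples per inter-frame interval.
    """
    buckets = [[] for _ in range(len(frame_ts) - 1)]
    i = 0
    for t, meas in samples:
        while i < len(frame_ts) - 2 and t >= frame_ts[i + 1]:
            i += 1
        if frame_ts[i] <= t < frame_ts[i + 1]:
            buckets[i].append((t, meas))
    return buckets
```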
Specifically, refer to the following formula:
$$r_{\mathcal{B}}\left(\hat{z}_{b_{k+1}}^{b_k},\,\mathcal{X}\right)=\begin{bmatrix}\mathbf{R}_{w}^{b_k}\left(\mathbf{p}_{b_{k+1}}^{w}-\mathbf{p}_{b_k}^{w}-\mathbf{v}_{b_k}^{w}\Delta t_k+\tfrac{1}{2}\mathbf{g}^{w}\Delta t_k^{2}\right)-\hat{\boldsymbol{\alpha}}_{b_{k+1}}^{b_k}\\[2pt]\mathbf{R}_{w}^{b_k}\left(\mathbf{v}_{b_{k+1}}^{w}-\mathbf{v}_{b_k}^{w}+\mathbf{g}^{w}\Delta t_k\right)-\hat{\boldsymbol{\beta}}_{b_{k+1}}^{b_k}\\[2pt]2\left[\left(\hat{\boldsymbol{\gamma}}_{b_{k+1}}^{b_k}\right)^{-1}\otimes\left(\mathbf{q}_{b_k}^{w}\right)^{-1}\otimes\mathbf{q}_{b_{k+1}}^{w}\right]_{xyz}\\[2pt]\mathbf{b}_{a,k+1}-\mathbf{b}_{a,k}\\[2pt]\mathbf{b}_{\omega,k+1}-\mathbf{b}_{\omega,k}\end{bmatrix}$$

where $\mathbf{p}_{b_k}^{w}$ is the position of the $k$-th frame in the world coordinate system, $\mathbf{q}_{b_k}^{w}$ is the quaternion representation of the attitude of the $k$-th frame in the world coordinate system (with $\mathbf{R}_{w}^{b_k}$ the corresponding rotation from the world frame into the $k$-th body frame), $\mathbf{v}_{b_k}^{w}$ is the velocity of the $k$-th frame in the world coordinate system, $\hat{\boldsymbol{\alpha}}_{b_{k+1}}^{b_k}$, $\hat{\boldsymbol{\gamma}}_{b_{k+1}}^{b_k}$, and $\hat{\boldsymbol{\beta}}_{b_{k+1}}^{b_k}$ are the pre-integrated position, angle, and velocity between the two frames, $\mathbf{b}_{a,k}$ is the accelerometer bias of the $k$-th frame, and $\mathbf{b}_{\omega,k}$ is the gyroscope bias of the $k$-th frame; $\Delta t_k$ denotes the time between the two frames and $\mathbf{g}^{w}$ denotes gravity in the world coordinate system. (The published formula is rendered as an image; it is reconstructed here in the standard IMU pre-integration residual form consistent with these symbol definitions.)
It can be seen from the above formula that the estimated quantities include the accelerometer bias $\mathbf{b}_a$ and the gyroscope bias $\mathbf{b}_\omega$. Under two-dimensional motion the excitation is insufficient and the accelerometer bias is hard to estimate, which causes the bias values to converge too slowly and makes the pose estimation inaccurate. Therefore, a wheel speedometer residual term is introduced in this embodiment of the present application, which may be specifically shown in the following formula:
$$r_{\mathcal{O}}\left(\hat{z}_{o_{k+1}}^{o_k},\,\mathcal{X}\right)=\begin{bmatrix}\mathbf{R}_{w}^{b_k}\left(\mathbf{p}_{b_{k+1}}^{w}-\mathbf{p}_{b_k}^{w}\right)-\hat{\boldsymbol{\alpha}}_{o_{k+1}}^{o_k}\\[2pt]2\left[\left(\hat{\boldsymbol{\gamma}}_{o_{k+1}}^{o_k}\right)^{-1}\otimes\left(\mathbf{q}_{b_k}^{w}\right)^{-1}\otimes\mathbf{q}_{b_{k+1}}^{w}\right]_{xyz}\end{bmatrix}$$

where $\mathbf{q}_{b_k}^{w}$ is the quaternion representation of the attitude (angle) of the $k$-th frame in the world coordinate system, $\mathbf{p}_{b_k}^{w}$ is the position of the $k$-th frame in the world coordinate system, and $\hat{\boldsymbol{\alpha}}_{o_{k+1}}^{o_k}$ and $\hat{\boldsymbol{\gamma}}_{o_{k+1}}^{o_k}$ are the position and angle pre-integrated between the two frames from the wheel speedometer. (The published formula is rendered as an image; it is reconstructed here in a standard wheel-odometry residual form consistent with these symbol definitions.)
For example, referring to FIG. 4, the wheel speed data measured by the wheel speedometer (including wheel rotation speed data, steering wheel data, and the like) may be pre-integrated to obtain a pre-integration result, and the pre-integration residual is then established based on that result.
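A sketch of planar wheel-odometry pre-integration between two frames is shown below; the unicycle model with one forward speed and one yaw rate per sample is an illustrative assumption about how the wheel rotation speed and steering wheel angle are converted:

```python
import math

def preintegrate_wheel(samples):
    """Integrate the relative planar motion (dx, dy, dyaw) between two frames.

    samples: list of (dt, v, omega) tuples, where dt is the sample interval,
    v the forward speed from wheel rotation, and omega the yaw rate derived
    from the steering model.
    """
    x = y = yaw = 0.0
    for dt, v, omega in samples:
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += omega * dt
    return x, y, yaw
```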
In this embodiment of the present application, a wheel speedometer residual term is added to address the disadvantage that visual-inertial navigation systems have a large initialization error under two-dimensional motion. The pose is initialized and computed jointly from the vision, IMU, and wheel speedometer information, so that when the IMU converges poorly, the wheel speedometer can compensate, thereby improving the result.
In one implementation, loop closure detection may also be performed between the current keyframe and the map: a detected frame is a loop closure frame, and the co-visible feature points between the loop closure frame and the frames in the sliding window are found; a reprojection error term is established and added to the nonlinear optimization; four-degree-of-freedom optimization is performed on the keyframes that have been optimized and have slid out of the window, together with their associated frames; and the optimized keyframes are inserted into the map. In actual use, the map, that is, loop closure detection, can be enabled or disabled. When enabled, mapping is performed while positioning, and the final positions obtained are all relative to the first fixed camera position; this can subsequently be fixed to a specific reference frame through a coordinate transformation.
An embodiment of the present application provides a pose determination method, where the method includes: acquiring a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device, and each of the first image frame and the second image frame includes a dynamic target; performing corner detection on each of the first image frame and the second image frame, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame; removing, from the plurality of first corner points, the first corner points in the region of the dynamic target included in the first image frame, to obtain a plurality of culled first corner points; removing, from the plurality of second corner points, the second corner points in the region of the dynamic target included in the second image frame, to obtain a plurality of culled second corner points; and determining, based on the culled first corner points and the culled second corner points, a pose change of the camera device when capturing the second image frame relative to when capturing the first image frame. When dynamic corner points exist, the observations of the same corner point under different camera states include not only the parallax introduced by the camera's own motion but also the parallax introduced by the motion of the corner point itself. In this embodiment, to address the problem of large positioning errors in a dynamic environment, the corner points in the region of the dynamic target are removed from the plurality of corner points, making the visual reprojection error more reliable and thereby overcoming the pose-change determination error introduced by dynamic features.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the present application. As shown in FIG. 5, the pose determination apparatus 500 provided by this embodiment of the present application includes:
an acquisition module 501, configured to acquire a first image frame and a second image frame captured by a camera device, where the first image frame and the second image frame are adjacent image frames captured by the camera device, and each of the first image frame and the second image frame includes a dynamic target;
a corner extraction module 502, configured to perform corner detection on each of the first image frame and the second image frame, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame;
a culling module 503, configured to remove, from the plurality of first corner points, the first corner points in the region of the dynamic target included in the first image frame, to obtain a plurality of culled first corner points,
and to remove, from the plurality of second corner points, the second corner points in the region of the dynamic target included in the second image frame, to obtain a plurality of culled second corner points; and
a positioning module 504, configured to determine, based on the culled first corner points and the culled second corner points, a pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
In a possible implementation, the plurality of first corner points and the plurality of second corner points include one of the following: features from accelerated segment test (FAST) corner points, Harris corner points, and binary robust invariant scalable keypoints (BRISK) corner points.
In a possible implementation, the apparatus is applied to a target vehicle on which the camera device, an inertial measurement unit (IMU), and a wheel speedometer are fixedly installed; the acquisition module is configured to acquire motion state data of the target vehicle, measured by the IMU, during the period from when the camera device captures the first image frame to when it captures the second image frame;
and to acquire wheel speed data of the target vehicle, measured by the wheel speedometer, during the period from when the camera device captures the first image frame to when it captures the second image frame.
Correspondingly, the positioning module is configured to determine, based on the disparity of the culled first corner points and the culled second corner points in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the first pose change does not include scale information;
determine, based on the motion state data, the wheel speed data, and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, where the second pose change includes scale information; and
perform nonlinear optimization on the second pose change to obtain the pose change.
In a possible implementation, the positioning module is configured to perform the nonlinear optimization on the second pose change according to a preset optimization function, where the preset optimization function includes a wheel speedometer residual term.
In a possible implementation, the apparatus further includes:
a dynamic target detection module, configured to detect the dynamic target in the first image frame and the dynamic target in the second image frame through a pre-trained neural network, to obtain the region of the dynamic target included in the first image frame and the region of the dynamic target included in the second image frame.
Based on the same concept, referring to FIG. 6, an embodiment of the present application provides a pose determination apparatus 600, including a transceiver 610, a processor 620, and a memory 630. The memory 630 is configured to store programs, instructions, or code; the processor 620 is configured to execute the programs, instructions, or code in the memory 630.
The transceiver 610 is configured to receive a first image frame and a second image frame input by a camera device.
The processor 620 is configured to: perform corner detection on each of the first image frame and the second image frame, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame; remove, from the plurality of first corner points, the first corner points in the region of the dynamic target included in the first image frame, to obtain a plurality of culled first corner points; remove, from the plurality of second corner points, the second corner points in the region of the dynamic target included in the second image frame, to obtain a plurality of culled second corner points; and determine, based on the culled first corner points and the culled second corner points, a pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
The processor 620 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 620 or by instructions in the form of software. The processor 620 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 630, and the processor 620 reads the information in the memory 630 and performs the foregoing method steps in combination with its hardware.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments of the present application are described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these modifications and variations.

Claims (14)

1. A pose determination method, wherein the method comprises:
    acquiring a first image frame and a second image frame captured by a camera device, wherein the first image frame and the second image frame are adjacent image frames captured by the camera device, and each of the first image frame and the second image frame comprises a dynamic target;
    performing corner detection on each of the first image frame and the second image frame, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame;
    removing, from the plurality of first corner points, the first corner points in a region of the dynamic target comprised in the first image frame, to obtain a plurality of culled first corner points;
    removing, from the plurality of second corner points, the second corner points in a region of the dynamic target comprised in the second image frame, to obtain a plurality of culled second corner points; and
    determining, based on the culled first corner points and the culled second corner points, a pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
2. The method according to claim 1, wherein the plurality of first corner points and the plurality of second corner points comprise one of the following: features from accelerated segment test (FAST) corner points, Harris corner points, and binary robust invariant scalable keypoints (BRISK) corner points.
3. The method according to claim 1 or 2, wherein the method is applied to a target vehicle, the camera device, an inertial measurement unit (IMU), and a wheel speedometer are fixedly installed on the target vehicle, and the method further comprises:
    acquiring motion state data of the target vehicle, measured by the IMU, during a period from when the camera device captures the first image frame to when the camera device captures the second image frame; and
    acquiring wheel speed data of the target vehicle, measured by the wheel speedometer, during the period from when the camera device captures the first image frame to when the camera device captures the second image frame;
    wherein the determining, based on the culled first corner points and the culled second corner points, a pose change of the camera device when capturing the second image frame relative to when capturing the first image frame comprises:
    determining, based on a disparity of the culled first corner points and the culled second corner points in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, wherein the first pose change does not comprise scale information;
    determining, based on the motion state data, the wheel speed data, and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, wherein the second pose change comprises scale information; and
    performing nonlinear optimization on the second pose change to obtain the pose change.
4. The method according to claim 3, wherein the performing nonlinear optimization on the second pose change comprises:
    performing the nonlinear optimization on the second pose change according to a preset optimization function, wherein the preset optimization function comprises a wheel speedometer residual term.
5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    detecting the dynamic target in the first image frame and the dynamic target in the second image frame through a pre-trained neural network, to obtain the region of the dynamic target comprised in the first image frame and the region of the dynamic target comprised in the second image frame.
  6. The method according to any one of claims 1 to 5, wherein the dynamic target is a vehicle.
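[Illustrative note – not part of the claims. Claims 5 and 6 detect vehicles with a pre-trained neural network, and claim 1 culls corners inside the detected regions. The sketch below stands in a torchvision Faster R-CNN for the claim's network; the model choice, the "DEFAULT" weights argument (torchvision >= 0.13), and the COCO label ids for vehicles are all assumptions.]

    import torch
    import torchvision

    # Pre-trained detector standing in for the claim's neural network; any
    # detector that outputs vehicle bounding boxes would serve the same role.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    VEHICLE_LABELS = {3, 6, 8}  # COCO ids for car, bus, truck (an assumption)

    def vehicle_boxes(image_rgb, score_thresh=0.5):
        """Return [x1, y1, x2, y2] boxes of detected vehicles in an HxWx3 uint8 image."""
        tensor = torch.from_numpy(image_rgb).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            out = model([tensor])[0]
        return [b.tolist()
                for b, l, s in zip(out["boxes"], out["labels"], out["scores"])
                if int(l) in VEHICLE_LABELS and float(s) >= score_thresh]

    def cull_corners(corners, boxes):
        """Drop corners that fall inside any detected vehicle region (the culling step)."""
        def inside(pt, box):
            x, y = pt
            x1, y1, x2, y2 = box
            return x1 <= x <= x2 and y1 <= y <= y2
        return [pt for pt in corners if not any(inside(pt, b) for b in boxes)]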
  7. A pose determination apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire a first image frame and a second image frame captured by a camera device, wherein the first image frame and the second image frame are adjacent image frames captured by the camera device, and each of the first image frame and the second image frame includes a dynamic target;
    a corner extraction module, configured to perform corner detection on the first image frame and on the second image frame respectively, to obtain a plurality of first corner points of the first image frame and a plurality of second corner points of the second image frame;
    a culling module, configured to cull, from the plurality of first corner points, the first corner points located in the region of the first image frame in which the dynamic target is located, to obtain a plurality of culled first corner points,
    and to cull, from the plurality of second corner points, the second corner points located in the region of the second image frame in which the dynamic target is located, to obtain a plurality of culled second corner points; and
    a positioning module, configured to determine, according to the plurality of culled first corner points and the plurality of culled second corner points, the pose change of the camera device when capturing the second image frame relative to when capturing the first image frame.
  8. The apparatus according to claim 7, wherein the plurality of first corner points and the plurality of second corner points comprise one of the following: features-from-accelerated-segment-test (FAST) corner points, Harris corner points, and binary robust invariant scalable keypoints (BRISK) corner points.
  9. The apparatus according to claim 7 or 8, wherein the apparatus is applied to a target vehicle on which the camera device, an inertial measurement unit (IMU), and a wheel speedometer are fixedly mounted; the acquisition module is configured to acquire motion state data of the target vehicle, measured by the IMU, during the period from when the camera device captures the first image frame to when it captures the second image frame,
    and to acquire wheel speed data of the target vehicle, measured by the wheel speedometer, during the period from when the camera device captures the first image frame to when it captures the second image frame;
    correspondingly, the positioning module is configured to determine, according to the parallax of the plurality of culled first corner points and the plurality of culled second corner points in their respective image frames, a first pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, wherein the first pose change does not include scale information;
    to determine, according to the motion state data, the wheel speed data, and the first pose change, a second pose change of the camera device when capturing the second image frame relative to when capturing the first image frame, wherein the second pose change includes scale information; and
    to perform nonlinear optimization on the second pose change to obtain the pose change.
  10. The apparatus according to claim 9, wherein the positioning module is configured to perform nonlinear optimization on the second pose change according to a preset optimization function, wherein the preset optimization function includes a wheel speedometer residual term.
  11. The apparatus according to any one of claims 7 to 10, wherein the apparatus further comprises:
    a dynamic target detection module, configured to detect a dynamic target in the first image frame and a dynamic target in the second image frame by means of a pre-trained neural network, to obtain the region where the dynamic target included in the first image frame is located and the region where the dynamic target included in the second image frame is located.
  12. The apparatus according to any one of claims 7 to 11, wherein the dynamic target is a vehicle.
  13. A non-volatile computer-readable storage medium, wherein the non-volatile computer-readable storage medium contains computer instructions for executing the pose determination method according to any one of claims 1 to 6.
  14. A computing device, wherein the computing device comprises a memory and a processor, the memory stores code, and the processor is configured to obtain the code so as to execute the pose determination method according to any one of claims 1 to 6.
PCT/CN2021/127380 2020-10-31 2021-10-29 Pose determination method and related device thereof WO2022089577A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011199017.6 2020-10-31
CN202011199017.6A CN114445490A (en) 2020-10-31 2020-10-31 Pose determination method and related equipment thereof

Publications (1)

Publication Number Publication Date
WO2022089577A1 true WO2022089577A1 (en) 2022-05-05

Family

ID=81357589

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127380 WO2022089577A1 (en) 2020-10-31 2021-10-29 Pose determination method and related device thereof

Country Status (2)

Country Link
CN (1) CN114445490A (en)
WO (1) WO2022089577A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108731667A (en) * 2017-04-14 2018-11-02 百度在线网络技术(北京)有限公司 The method and apparatus of speed and pose for determining automatic driving vehicle
CN108364319A (en) * 2018-02-12 2018-08-03 腾讯科技(深圳)有限公司 Scale determines method, apparatus, storage medium and equipment
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber
CN111738085A (en) * 2020-05-22 2020-10-02 华南理工大学 System construction method and device for realizing automatic driving and simultaneously positioning and mapping

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913235A (en) * 2022-07-18 2022-08-16 合肥工业大学 Pose estimation method and device and intelligent robot
CN116753907A (en) * 2023-08-18 2023-09-15 中国电建集团昆明勘测设计研究院有限公司 Method, device, equipment and storage medium for detecting underground deep cavity
CN116753907B (en) * 2023-08-18 2023-11-10 中国电建集团昆明勘测设计研究院有限公司 Method, device, equipment and storage medium for detecting underground deep cavity

Also Published As

Publication number Publication date
CN114445490A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
CN110543814B (en) Traffic light identification method and device
CN112639883B (en) Relative attitude calibration method and related device
CN112640417B (en) Matching relation determining method and related device
WO2022001773A1 (en) Trajectory prediction method and apparatus
WO2021102955A1 (en) Path planning method for vehicle and path planning apparatus for vehicle
WO2021217420A1 (en) Lane tracking method and apparatus
CN110930323B (en) Method and device for removing reflection of image
CN112534483B (en) Method and device for predicting vehicle exit
WO2021057344A1 (en) Data presentation method and terminal device
CN112512887B (en) Driving decision selection method and device
CN113498529B (en) Target tracking method and device
WO2022089577A1 (en) Pose determination method and related device thereof
US20230227052A1 (en) Fault diagnosis method and fault diagnosis device for vehicle speed measurement device
WO2022204855A1 (en) Image processing method and related terminal device
CN112543877B (en) Positioning method and positioning device
CN112810603B (en) Positioning method and related product
WO2022051951A1 (en) Lane line detection method, related device, and computer readable storage medium
US20230048680A1 (en) Method and apparatus for passing through barrier gate crossbar by vehicle
EP4307251A1 (en) Mapping method, vehicle, computer readable storage medium, and chip
WO2021163846A1 (en) Target tracking method and target tracking apparatus
CN115398272A (en) Method and device for detecting passable area of vehicle
WO2022022284A1 (en) Target object sensing method and apparatus
WO2021159397A1 (en) Vehicle travelable region detection method and detection device
CN113022573B (en) Road structure detection method and device
CN113128497A (en) Target shape estimation method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21885303

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21885303

Country of ref document: EP

Kind code of ref document: A1