Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. It is obvious that the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present application may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
For convenience of understanding, technical terms related to the present application are explained as follows:
the "moving device" referred to in the present application may be any equipment with moving capability, including, but not limited to, automobiles, ships, submarines, airplanes, and other aircraft, wherein automobiles include vehicles at any of the six driving automation levels, L0-L5, defined by the Society of Automotive Engineers (SAE International) or by the national standard "Taxonomy of Driving Automation for Vehicles"; such a vehicle is hereinafter referred to as an autonomous driving vehicle (ADV).
An "autonomous driving vehicle (ADV)" as referred to herein may be a vehicle device or a robotic device having functions such as the following:
(1) manned functions, such as home cars, buses, and the like;
(2) cargo carrying functions, such as common trucks, van trucks, dump trailers, enclosed trucks, tank trucks, flat vans, container vans, dump trucks, special structure vans and the like;
(3) tool functions such as logistics distribution vehicles, Automated Guided Vehicles (AGV), patrol vehicles, cranes, excavators, bulldozers, forklifts, road rollers, loaders, off-road vehicles, armored vehicles, sewage treatment vehicles, sanitation vehicles, dust suction vehicles, ground cleaning vehicles, watering vehicles, sweeping robots, food delivery robots, shopping guide robots, lawn mowers, golf carts, etc.;
(4) entertainment functions, such as recreational vehicles, casino automatic drives, balance cars, and the like;
(5) special rescue functions, such as fire trucks, ambulances, electrical power rush-repair trucks, engineering rescue vehicles and the like.
Fig. 1 is a flowchart of a method for estimating dynamic and static states of an obstacle according to an embodiment of the present invention, which includes the following steps:
S11: acquiring image data of the environment where the mobile device is located;
S12: performing image analysis on the image data, and determining multiple pairs of background feature matching points and multiple pairs of foreground feature matching points of a foreground obstacle to be estimated in two adjacent frames of images;
S13: determining the correspondence between the two adjacent frames of images based on the coordinates of the multiple pairs of background feature matching points in a preset coordinate system;
S14: judging whether the coordinates of the multiple pairs of foreground feature matching points in the preset coordinate system conform to the determined correspondence between the two adjacent frames of images, so as to perform dynamic and static estimation on the foreground obstacle.
In the present embodiment, visual image information is captured using a mobile device equipped with an imaging device. As an embodiment, a mobile device includes: vehicles, aircraft, and underwater robots. The vehicle may be an unmanned vehicle, such as an unmanned sweeper, an unmanned ground washer, an unmanned logistics vehicle, and an unmanned taxi. The camera may be an onboard camera.
Taking a vehicle with an onboard camera as an example: for step S11, each of the mobile devices exemplified above may obtain image data of its own environment to perform obstacle dynamic and static estimation. In some special cases, however, a plurality of mobile devices may form a mobile network. For example, two aircraft may be communicatively connected to each other (which is common when multiple aircraft fly patterns together in the air), and each may obtain image data of its own environment; if the sensor of one aircraft is damaged so that it cannot obtain image data of the environment, that aircraft may obtain the image data of the other aircraft through the network for estimation. Even without networking, a single mobile device whose sensor is damaged may still obtain image data if other devices around it are available for cooperation (for example, a vehicle traveling on a road whose sensor is damaged may receive image data from surrounding cooperating vehicles, and obstacle estimation may still be performed).
For step S12, after the captured visual image data (i.e., image data) is obtained, image analysis is performed to analyze features, foreground, and background in the image, and further determine pairs of background feature matching points and pairs of foreground feature matching points of foreground obstacles that need to be dynamically and statically estimated in two adjacent frames of images.
As an embodiment, comprising:
carrying out ORB feature extraction and matching on two adjacent frames of images in the visual image data to obtain a plurality of pairs of feature matching points;
carrying out image segmentation on the visual image data to distinguish a foreground and a background;
carrying out 2D target detection on the visual image data, and selecting an obstacle to be estimated with a target detection box;
dividing, based on the image segmentation result, the multiple pairs of feature matching points into multiple pairs of foreground feature matching points and multiple pairs of (static) background feature matching points;
and associating the multiple pairs of foreground feature matching points with the target detection boxes, thereby determining the foreground obstacle to be estimated and its corresponding multiple pairs of foreground feature matching points.
In this embodiment, ORB (Oriented FAST and Rotated BRIEF) feature extraction may be performed on the visual image data in a first thread, where FAST (Features from Accelerated Segment Test) is a corner-detection feature based on an accelerated segment test, and BRIEF (Binary Robust Independent Elementary Features) is a binary feature descriptor. The ORB features extracted from two adjacent frames of images are matched to obtain multiple pairs of feature matching points. ORB feature extraction adopts an improved FAST keypoint detector, so that the extracted features have orientation, together with a BRIEF descriptor with rotation invariance (of course, the feature extraction method may be selected according to specific requirements, and is not limited here). Feature extraction and matching comprises three steps: keypoint detection, keypoint feature description, and keypoint matching. Features are not semantic information; they may be specific structures in the image, such as points and edges. A visual feature descriptor describes the local appearance around each feature point (a position in a coordinate system) and is (ideally) invariant under changes in illumination, translation, scale, and in-plane rotation. Features are matched by comparing the feature descriptors of the two images to identify similar features.
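The keypoint-matching step described above can be sketched as follows. This is a minimal illustration, not the full ORB pipeline: it assumes the binary descriptors (e.g., 256-bit descriptors packed into 32-byte rows, as BRIEF/ORB produces) have already been extracted, and brute-force matches them by Hamming distance with a cross-check.

```python
import numpy as np

def match_descriptors(desc1, desc2):
    """Brute-force Hamming matching with cross-check.

    desc1, desc2: (N, 32) and (M, 32) uint8 arrays, each row a packed
    256-bit binary descriptor. Returns (i, j) pairs of mutual nearest
    neighbours, i.e. descriptor i of frame 1 matched to j of frame 2.
    """
    # Pairwise Hamming distance: popcount of the XOR-ed descriptor bytes.
    xor = desc1[:, None, :] ^ desc2[None, :, :]        # (N, M, 32)
    dist = np.unpackbits(xor, axis=2).sum(axis=2)      # (N, M) bit counts
    best12 = dist.argmin(axis=1)   # best match in frame 2 for each of frame 1
    best21 = dist.argmin(axis=0)   # best match in frame 1 for each of frame 2
    # Keep only mutual nearest neighbours (cross-check).
    return [(i, int(j)) for i, j in enumerate(best12) if best21[j] == i]
```

In practice the descriptors would come from a library detector (for example OpenCV's `ORB_create().detectAndCompute`); any packed binary descriptor array works here.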
In a second thread, semantic segmentation or instance segmentation is performed on the visual image data (the specific segmentation method may be selected according to specific requirements, and the deep learning network it uses is not specifically limited), and the result is divided into two categories: foreground and background. The foreground substantially consists of the dynamic/static foreground obstacles to be estimated in the autonomous driving environment, i.e., traffic participants such as people, cyclists, and cars, while the background substantially consists of static background obstacles in the autonomous driving environment, such as poles, trees, and the ground.
Using the semantic/instance segmentation result from the second thread, the multiple pairs of feature matching points extracted in the first thread are divided into foreground obstacle feature matching points and background obstacle feature matching points. Here, foreground obstacles may include both dynamic obstacles and static obstacles, while background obstacles include only static obstacles. The determined multiple pairs of foreground feature matching points are then associated with the target detection boxes, so that the foreground obstacle to be estimated and its corresponding multiple pairs of foreground feature matching points are determined.
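The splitting and association just described can be sketched as follows; a minimal sketch assuming a per-pixel segmentation mask (nonzero = foreground) and axis-aligned 2D detection boxes, with illustrative function and variable names.

```python
import numpy as np

def split_and_associate(points, mask, boxes):
    """Split matched keypoints into background points and per-box foreground points.

    points: (N, 2) integer (u, v) pixel coordinates of matched keypoints;
    mask:   (H, W) segmentation mask, nonzero = foreground;
    boxes:  list of (u_min, v_min, u_max, v_max) 2D detection boxes.
    Returns (background_points, {box_index: foreground_points_inside_box}).
    """
    u, v = points[:, 0], points[:, 1]
    fg = mask[v, u] != 0                 # look the mask up at each keypoint
    background = points[~fg]
    per_box = {}
    for k, (u0, v0, u1, v1) in enumerate(boxes):
        inside = fg & (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
        if inside.any():
            per_box[k] = points[inside]
    return background, per_box
```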
For step S13, according to the imaging principle of the camera, when a moving camera (moving with the vehicle) photographs the same static object at two different positions, there is a definite correspondence between the two images. This correspondence can be determined relatively accurately from the coordinate mapping of the multiple pairs of background feature matching points, determined from the two images, in the preset coordinate system.
For step S14, based on the coordinates of the pairs of foreground feature matching points of the foreground obstacle extracted from the visual images, it may be verified whether these coordinate pairs conform to the correspondence determined between the two images. Because matched feature points from two adjacent frames are used for the judgment, the time dimension is added, so it can be accurately judged whether the feature matching point pairs of the foreground obstacle satisfy the correspondence between the two adjacent frames determined from the matching point pairs of the background (static) obstacles, that is, whether the foreground obstacle has static characteristics.
As an embodiment, estimation can be performed based on epipolar geometry and background obstacle features, and epipolar constraint in the epipolar geometry can be utilized in determining the correspondence between two frames of photos.
Taking a specific embodiment as an example: estimating epipolar lines in the corresponding second frame images based on the coordinates of the first feature matching points in the first frame images and the corresponding relation between the two adjacent frame images;
judging whether corresponding second feature matching points in a second frame image are far away from the epipolar line by utilizing epipolar constraint, and determining the second feature matching points far away from the epipolar line as dynamic points;
and performing dynamic and static estimation on the foreground obstacle according to the judgment results of all the second feature matching points.
According to the epipolar constraint, if the object is static, the second feature matching point matched with the first feature matching point always lies on the epipolar line in the second frame image. Using the epipolar constraint, whether a feature matching point is a dynamic point can thus be judged by whether it lies on the epipolar line. However, considering external factors in the real environment (e.g., camera shake, ground resonance, or disturbance from ambient wind), the inventors optimized the decision criterion to whether the point is far from the epipolar line, which gives the judgment more operational flexibility.
For the epipolar constraint, the correspondence between two adjacent frames of images can be represented by a fundamental matrix. In this case, the coordinates of the feature matching points are taken in the image (pixel) coordinate system. The fundamental matrix can be obtained by the eight-point algorithm, or from the essential matrix (see below), which contains the pose information of the camera, together with the camera intrinsic parameter matrix. The invention is not limited thereto.
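The eight-point algorithm mentioned above can be sketched as follows. This is a minimal illustration (with Hartley normalization added for numerical stability, as is standard practice); the function name and the synthetic camera used in testing are illustrative, not part of the embodiment.

```python
import numpy as np

def eight_point(p1, p2):
    """Estimate the fundamental matrix F satisfying x2^T F x1 = 0 from
    N >= 8 correspondences. p1, p2: (N, 2) pixel coordinates in the first
    and second frame images."""
    def normalize(p):
        # Hartley normalization: centroid at origin, mean distance sqrt(2).
        c = p.mean(axis=0)
        s = np.sqrt(2.0) / np.mean(np.linalg.norm(p - c, axis=1))
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
        ph = np.column_stack([p, np.ones(len(p))])
        return ph @ T.T, T

    x1, T1 = normalize(p1)
    x2, T2 = normalize(p2)
    # Each correspondence contributes one row of the linear system A f = 0,
    # with f the row-major flattening of F.
    A = (x2[:, :, None] * x1[:, None, :]).reshape(len(p1), 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2: a valid fundamental matrix is singular.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                  # undo the normalization
    return F / np.linalg.norm(F)       # fix the overall scale
```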
In order to judge whether a feature matching point is far from the epipolar line, the inventors designed a point-to-epipolar-line distance formula:

D = |P2^T F P1| / sqrt(X^2 + Y^2)

where D is the distance from the feature point to the epipolar line; P1 = [u1, v1, 1]^T are the homogeneous pixel coordinates of the foreground obstacle feature point in the first frame image; P2 = [u2, v2, 1]^T are the homogeneous pixel coordinates of the point matching P1 in the second frame image; u and v are the pixel coordinates in the image coordinate system (u in the X direction, v in the Y direction, with 1 as the homogeneous component); F is the fundamental matrix; and T denotes matrix transposition.
In the above formula, the epipolar line l2 in the second frame image is represented by the column vector F P1 = [X, Y, Z]^T, where X, Y, and Z are the coefficients of the line equation X·u + Y·v + Z = 0.
It can be seen that the numerator |P2^T F P1| is the epipolar constraint equation: when P2^T F P1 = 0, D = 0, indicating that the second feature matching point lies exactly on the epipolar line.
The inventors further designed the denominator sqrt(X^2 + Y^2), which normalizes the epipolar line. Thus, when the numerator is not 0 (meaning that the second feature matching point does not fall on the epipolar line), the distance of the second feature matching point from the epipolar line can be measured, and whether the point has moved away can be determined from the measured distance.
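The point-to-epipolar-line distance can be written directly in code. A minimal sketch, assuming the fundamental matrix F and the matched points are given in pixel coordinates; the function name is illustrative.

```python
import numpy as np

def epipolar_distance(F, p1, p2):
    """Distance D from the matched point p2 (second frame) to the epipolar
    line l2 = F @ P1 induced by p1 (first frame). p1, p2: (u, v) pixels."""
    P1 = np.array([p1[0], p1[1], 1.0])
    P2 = np.array([p2[0], p2[1], 1.0])
    X, Y, Z = F @ P1                   # epipolar line coefficients in frame 2
    # |P2^T F P1| over the line normal's length sqrt(X^2 + Y^2).
    return abs(P2 @ F @ P1) / np.hypot(X, Y)
```

In the check below, F is the fundamental matrix of a pure sideways translation with identity intrinsics, for which the epipolar lines are horizontal: a match at the same v lies on the line, and a match displaced in v is off it by exactly that displacement.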
Considering that the type of the mobile device may differ, the type of camera to be mounted may also differ; a monocular camera or an RGB-D/binocular camera, for example, may be selected. The camera can be chosen adaptively according to specific requirements, which is not limited herein.
As an alternative embodiment of the epipolar constraint, when the coordinates of the feature matching points are expressed in the camera coordinate system, the correspondence between two adjacent frames of images can be represented by an essential matrix. The essential matrix contains the pose information of the camera (the rotation matrix and translation vector of the camera between the two positions). When the camera pose is also needed in other application scenarios, the essential matrix may be chosen for the obstacle dynamic and static estimation method, and the computational load is reduced by reusing the data related to the essential matrix. As described above, the essential matrix and the fundamental matrix are related: the fundamental matrix can be obtained from the essential matrix together with the camera intrinsic parameter matrix. Therefore, a distance formula from the second feature matching point to the epipolar line in camera coordinates can also be designed based on the essential matrix, and whether the second feature matching point is far from the epipolar line can be determined by the corresponding distance formula, which is not repeated here.
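The relationship between the essential and fundamental matrices mentioned above can be sketched as follows. With one shared intrinsic matrix K, the standard relation is F = K^-T E K^-1 with E = [t]_x R; the rotation, translation, and intrinsics in the check are illustrative values.

```python
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix [t]_x, so skew(t) @ v == cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_to_fundamental(R, t, K):
    """E = [t]_x R encodes only the camera pose; F = K^-T E K^-1 folds in the
    intrinsics, so the constraint x2^T F x1 = 0 works on raw pixel coordinates."""
    E = skew(t) @ R
    Kinv = np.linalg.inv(K)
    return Kinv.T @ E @ Kinv
```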
Specifically, based on the epipolar constraint, the distance from each second feature matching point to the epipolar line is measured; a second feature matching point whose distance is larger than a set threshold is determined to be a dynamic point, and one whose distance is smaller than the set threshold is determined to be a static point.
In actual use, the set threshold value is adjusted according to the actual situation, which is not limited herein. When the distance D of a certain second feature matching point from the epipolar line exceeds a set threshold value, the second feature matching point can be determined to be a dynamic point. Similarly, when the distance D from the epipolar line of a second feature matching point is smaller than a set threshold, the second feature matching point can be determined to be a static point.
Therefore, dynamic and static estimation can be performed on the foreground obstacle according to the dynamic/static judgment results of all the second feature matching points; the overall steps are shown in Fig. 2.
According to this embodiment, foreground obstacles and background obstacles are distinguished in the visual image data, and the correspondence between two adjacent frames of images is determined using the feature matching points of the background obstacles. The dynamic and static estimation of the foreground obstacle therefore does not depend on accurate sensing of obstacle position or speed by a sensor, avoids failures caused by changes in the relative position between the sensor and the obstacle, does not require converting obstacle position or speed into an absolute coordinate system, and reduces the dependence of perception on the positioning data of the autonomous mobile device.
As an embodiment, performing dynamic and static estimation on the foreground obstacle according to the determination results of all the second feature matching points includes:
and performing dynamic and static estimation on the foreground barrier according to the number proportion of the dynamic points and/or the static points.
In this embodiment, the dynamic and static estimation is not performed simply according to the existence of dynamic points. For example, a pedestrian standing on the ground and swinging an arm does produce dynamic points under the above judgment, but the pedestrian is actually still standing in place; therefore the static points and dynamic points need to be judged comprehensively. Of course, the specific number ratio used for the judgment may be adjusted according to the specific situation, and is not limited here.
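The per-point threshold and the ratio-based judgment described above can be sketched as follows; both threshold values are illustrative placeholders, not values from the embodiment.

```python
def classify_points(distances, dist_thresh):
    """Label each second feature matching point by its distance D to the
    epipolar line: dynamic if D exceeds the threshold, otherwise static."""
    return ['dynamic' if d > dist_thresh else 'static' for d in distances]

def estimate_obstacle_state(distances, dist_thresh=2.0, ratio_thresh=0.5):
    """Judge the obstacle dynamic only if the share of dynamic points is large
    enough, so a standing pedestrian swinging one arm stays static."""
    labels = classify_points(distances, dist_thresh)
    if not labels:
        return 'unknown'
    dynamic_ratio = labels.count('dynamic') / len(labels)
    return 'dynamic' if dynamic_ratio > ratio_thresh else 'static'
```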
As an implementation manner, as shown in fig. 3, a flowchart of a method for dynamically and statically estimating an obstacle according to another embodiment of the present invention is provided, where after dynamically and statically estimating the foreground obstacle, the method further includes:
s15: performing 3D obstacle detection by using laser radar data of an environment where the mobile device is located, and projecting the detected 3D obstacle into the image data;
s16: fusing the 2D dynamic and static estimation information of the foreground obstacle into the corresponding 3D obstacle in the image data.
In the present embodiment, it is considered that the environment in which the vehicle travels is actually three-dimensional. For an even more accurate estimation, a fusion of 2D and 3D is performed. The mobile device is equipped with a camera for capturing the image data and a lidar for acquiring the lidar data.
In general, the method extends the dynamic and static estimation to 3D obstacles in a simple way; Fig. 4 is a general flowchart of 3D obstacle dynamic and static estimation. After the mobile device obtains visual image data (i.e., the image data), it detects 2D obstacles in the image, distinguishes the foreground from the background, performs dynamic and static estimation on the corresponding foreground obstacles, and finally obtains the dynamic and static information of the foreground 2D obstacles (i.e., the foreground obstacles; the specific dynamic and static estimation of foreground obstacles is described above and is not repeated here).
For step S15, after the lidar point cloud data is obtained, 3D obstacle detection is performed using a lidar target detection method (different detection methods may be selected for different environmental conditions), and the detected 3D obstacles are then projected into the visual image data using the calibrated intrinsic and extrinsic parameters of the camera and the lidar.
For step S16, the dynamic and static state of the foreground obstacle and the obstacle detected by the laser are matched and information fused with each other, so as to obtain fused dynamic and static information of the 3D obstacle.
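Steps S15 and S16 can be sketched as follows. This is a minimal illustration assuming the lidar-to-camera extrinsics (R, t) and the camera intrinsic matrix K are already calibrated, and it associates each detected 3D obstacle with a 2D obstacle simply by testing whether the projected 3D center falls inside the 2D detection box; the function names are illustrative.

```python
import numpy as np

def project_to_image(K, R, t, pts):
    """Project (N, 3) lidar-frame points into pixel coordinates.
    Returns the (N, 2) pixel positions and the (N,) camera-frame depths."""
    cam = pts @ R.T + t                  # lidar frame -> camera frame
    uv = cam @ K.T                       # apply intrinsics
    return uv[:, :2] / uv[:, 2:3], cam[:, 2]

def fuse_labels(K, R, t, centers_3d, boxes_2d, labels_2d):
    """Attach each 2D obstacle's dynamic/static label to the 3D obstacle
    whose projected center falls inside its 2D detection box."""
    uv, depth = project_to_image(K, R, t, np.asarray(centers_3d, dtype=float))
    fused = []
    for (u, v), z in zip(uv, depth):
        label = 'unknown'
        if z > 0:                        # only points in front of the camera
            for (u0, v0, u1, v1), lab in zip(boxes_2d, labels_2d):
                if u0 <= u <= u1 and v0 <= v <= v1:
                    label = lab
                    break
        fused.append(label)
    return fused
```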
According to the embodiment, after the dynamic and static information of the 2D obstacle in the visual image data is obtained, the dynamic and static estimation of the 3D obstacle is finally realized by utilizing the fusion of vision and the laser radar. Therefore, the accuracy of dynamic and static estimation of the obstacles in the real environment driving process is further improved.
The embodiment of the invention also provides a non-volatile computer storage medium, wherein the computer storage medium stores computer-executable instructions that can execute the obstacle dynamic and static estimation method of any of the above method embodiments.
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
performing image analysis on visual image data captured by the mobile device, and determining multiple pairs of background feature matching points and multiple pairs of foreground feature matching points of a foreground obstacle to be estimated in two adjacent frames of images;
determining the correspondence between the two adjacent frames of images based on the coordinates of the multiple pairs of background feature matching points in a preset coordinate system;
and judging whether the coordinates of the multiple pairs of foreground feature matching points in the preset coordinate system conform to the determined correspondence between the two adjacent frames of images, so as to perform dynamic and static estimation on the foreground obstacle.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the obstacle dynamic and static estimation method in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: the obstacle dynamic and static state estimation method comprises at least one processor and a memory which is connected with the at least one processor in a communication mode, wherein the memory stores instructions which can be executed by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the steps of the obstacle dynamic and static state estimation method of any embodiment of the invention.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to a method for estimating dynamic and static states of an obstacle provided in another embodiment of the present application, and as shown in fig. 5, the electronic device includes:
one or more processors 510 and memory 520, with one processor 510 being an example in fig. 5. The apparatus of the obstacle dynamic-static estimation method may further include: an input device 530 and an output device 540.
The processor 510, the memory 520, the input device 530, and the output device 540 may be connected by a bus or other means, and the bus connection is exemplified in fig. 5.
The memory 520, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the obstacle dynamic and static state estimation method in the embodiments of the present application. The processor 510 executes various functional applications and data processing of the server by executing the nonvolatile software programs, instructions and modules stored in the memory 520, so as to implement the obstacle dynamic and static state estimation method of the above method embodiment.
The memory 520 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of a camera, a lidar, and the like within the mobile device. Further, the memory 520 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 520 may optionally include memory located remotely from processor 510, which may be connected to a mobile device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Input device 530 may receive input numeric or character information and generate signals related to visual image data, lidar data. The output device 540 may include a display device such as a display screen.
The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the obstacle dynamic and static estimation method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones, multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, among others.
(3) Portable entertainment devices such devices may display and play multimedia content. The devices comprise audio and video players, handheld game consoles, electronic books, intelligent toys and portable vehicle-mounted navigation devices.
(4) Other on-board electronic devices with data interaction functions, such as a vehicle-mounted device mounted on a vehicle.
As used herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.