WO2022143237A1 - Target positioning method and system, and related device - Google Patents

Target positioning method and system, and related device

Info

Publication number
WO2022143237A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
camera
baseline
detection
Prior art date
Application number
PCT/CN2021/139421
Other languages
French (fr)
Chinese (zh)
Inventor
Tang Daolong (唐道龙)
Li Hongbo (李宏波)
Li Donghu (李冬虎)
Chang Sheng (常胜)
Shen Jianhui (沈建惠)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022143237A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/36 Input/output arrangements for on-board computers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/97 Determining parameters from multiple pictures

Definitions

  • The present application relates to the field of artificial intelligence (AI), and in particular to a target positioning method, system, and related device.
  • Stereo vision algorithms are currently widely used in intelligent security, autonomous driving, industrial inspection, 3D reconstruction, virtual reality, and other fields, reflecting their strong technical competitiveness.
  • Stereo vision algorithms usually use a multi-eye camera to photograph the target and obtain multi-channel images of it, and then determine the parallax of the target from those images, where parallax refers to the directional difference produced when the same target is observed from two observation points separated by a certain distance; from the distance between the cameras (i.e., the baseline length) and the parallax, the distance between the target and the camera can be calculated.
  • When the current stereo vision algorithm determines the distance between the target and the camera, since the target is not a single point in the multi-channel images but an image area, the disparity of every pixel in that area must be determined, and the distance between the target and the camera is then computed from those disparities and the baseline length of the multi-eye camera. This process not only consumes huge computing resources but is also prone to noise and calculation errors, resulting in poor target positioning accuracy, which in turn affects subsequent applications such as 3D reconstruction, autonomous driving, and security monitoring.
  • The present application provides a target positioning method, system, and related device, which are used to solve the problems that the target positioning process consumes huge computing resources and that the target positioning accuracy is poor.
  • A first aspect provides a method for locating a target, comprising the following steps: acquiring a first image and a second image, where the first image and the second image are obtained by photographing the same target at the same time with a multi-eye camera; performing target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, where both the first target area and the second target area include the target; performing feature point detection and matching on the first target area and the second target area to obtain a feature point matching result, where the feature point matching result includes the correspondence between the feature points in the first target area and the feature points in the second target area, and feature points with a correspondence describe the same feature of the target; and determining the position information of the target according to the feature point matching result and the parameter information of the multi-eye camera.
  • In a possible implementation, the parameter information includes at least the baseline length of the multi-eye camera and its focal length. The parallax information of the target can be obtained from the pixel differences between corresponding feature points in the feature point matching result; the parallax information includes the difference between the pixel coordinates of a feature point in the first target area and the pixel coordinates of the corresponding feature point in the second target area. The distance between the target and the camera is then determined from the parallax information of the target, the baseline length of the multi-eye camera, and the focal length of the multi-eye camera, yielding the position information of the target, as sketched below.
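  • As an illustrative sketch of this disparity-to-distance step (a minimal Python example; the function and variable names are assumptions for illustration, not taken from the patent), each matched feature pair yields a disparity d as the difference of horizontal pixel coordinates, and depth follows z = f * b / d:

      import numpy as np

      def depth_from_matches(pts_left, pts_right, baseline_m, focal_px):
          """Estimate the target distance from matched feature points.
          pts_left, pts_right: (N, 2) arrays of (x, y) pixel coordinates of
          corresponding feature points in the first and second image."""
          pts_left = np.asarray(pts_left, dtype=np.float64)
          pts_right = np.asarray(pts_right, dtype=np.float64)
          # Parallax of each feature point: difference of horizontal coordinates.
          disparity = pts_left[:, 0] - pts_right[:, 0]
          disparity = disparity[disparity > 0]        # drop degenerate matches
          depths = focal_px * baseline_m / disparity  # z = f * b / d per point
          # A robust statistic over the feature points serves as the distance.
          return float(np.median(depths))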
  • In a possible implementation, the multi-eye camera includes multiple camera groups, and each camera group includes multiple cameras. Based on this, baseline data of the multi-eye camera can be obtained, including the baseline length between the cameras in each group; a target baseline is obtained from the baseline data according to the measurement accuracy requirement of the target, and the first image and the second image are then acquired according to the target baseline, where they are captured by the camera group corresponding to the target baseline.
  • For example, if the multi-eye camera includes N cameras, where N is a positive integer and the cameras are numbered 1, 2, ..., N, every two cameras can form a binocular camera group with a corresponding baseline length: the baseline of the group composed of camera 1 and camera N is BL1, the baseline of the group composed of camera 1 and camera N-1 is BL2, and so on, giving C(N,2) = N(N-1)/2 binocular camera groups in total. It should be understood that the above examples are for illustration, and do not limit the number of cameras in the multi-eye camera or in each camera group.
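  • The pairing itself is simple combinatorics; a hypothetical sketch (the camera positions are illustrative values, not from the patent):

      from itertools import combinations

      # Hypothetical 1-D mounting positions (in meters) of N cameras on a rig.
      camera_positions = {1: 0.0, 2: 0.1, 3: 0.25, 4: 0.6}

      # Every unordered pair forms a binocular camera group whose baseline is
      # the distance between optical centers: C(N,2) = N*(N-1)/2 groups.
      baseline_data = {
          (a, b): abs(camera_positions[b] - camera_positions[a])
          for a, b in combinations(sorted(camera_positions), 2)
      }
      print(len(baseline_data))   # 6 groups for N = 4
      print(baseline_data)        # {(1, 2): 0.1, (1, 3): 0.25, ...}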
  • In a possible implementation, the target baseline can be determined according to the measurement accuracy requirement of the target. Specifically, a first accuracy index and a second accuracy index may first be determined for each camera group, where the first accuracy index is inversely proportional to the baseline length of the group and directly proportional to its common viewing area, and the second accuracy index is directly proportional to the baseline length and focal length of the group; the common viewing area is the area captured simultaneously by the cameras in the group. The weights of the first and second accuracy indices are then determined according to the measurement accuracy requirement of the target, the composite index of each camera group is obtained from the two indices and their weights, and the target baseline is determined according to the composite index of each group. A sketch of this scoring step is given below.
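  • The patent does not give the exact scoring formula, so the sketch below assumes a simple weighted sum of the two indices, with proportionality constants k1 and k2 as free parameters:

      def composite_index(baseline, focal, common_view_area, w1, w2,
                          k1=1.0, k2=1.0):
          """Score one camera group (assumed illustrative formula): the first
          index favors near targets (large common view, short baseline), the
          second favors far targets (long baseline times focal length)."""
          first = k1 * common_view_area / baseline
          second = k2 * baseline * focal
          return w1 * first + w2 * second

      def choose_target_baseline(groups, w1, w2):
          """groups: dict mapping baseline -> (focal_px, common_view_area)."""
          return max(groups, key=lambda b: composite_index(
              b, groups[b][0], groups[b][1], w1, w2))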
  • For a multi-eye camera with a fixed baseline, the ranging range within which accuracy can be guaranteed is also fixed. This is because the closer the target is to the camera, the more the common viewing area of the multi-eye camera approaches zero, where the common viewing area refers to the area that all cameras of the multi-eye camera can capture at the same time; at that point the target may have no imaging point in some of the cameras, so the parallax of the target cannot be calculated. Conversely, the farther the target is from the camera, the more blurred the target areas in the first image and the second image become, which degrades the parallax calculation. A fixed-baseline multi-eye camera therefore has a fixed ranging range.
  • In a possible implementation, the measurement accuracy requirement of the target to be measured may include the approximate distance between the target and the multi-eye camera, in other words, whether the target is a long-distance target or a short-distance target. This can be determined from the size of the image area the target occupies in the images collected by the multi-eye camera: a long-distance target occupies a very small image area, while a short-distance target occupies a very large one. Accordingly, when the image area is smaller than a first threshold, the target to be measured can be determined to be a long-distance target, and when the image area is larger than a second threshold, it can be determined to be a short-distance target. The measurement accuracy requirement may also include a measurement error threshold for the target to be measured, for example that the measurement error must not exceed 1 meter. It should be understood that the above examples are for illustration, and are not specifically limited in the present application.
  • In a possible implementation, the baseline data of the multi-eye camera includes not only the baseline length between the cameras in each camera group but also the size of the common viewing area between those cameras, where the size of the common viewing area can be determined from the shooting range of each camera in the group, and the shooting range refers to the geographic area recorded in the images captured by the camera. During specific implementation, the edge position points visible in the video picture of each channel can be identified, their pixel coordinates converted into geographic coordinates through a camera calibration algorithm, the shooting range of each channel determined from the area enclosed by those geographic coordinates, and the size of the common viewing area then obtained, as sketched below.
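  • A hedged sketch of this step, assuming a planar scene so that a calibrated pixel-to-ground homography maps each image border to a ground polygon (cv2 and shapely are illustrative library choices, not prescribed by the patent):

      import cv2
      import numpy as np
      from shapely.geometry import Polygon

      def shooting_range(homography, width, height):
          """Project the image border (edge position points) to geographic
          coordinates via a 3x3 pixel-to-ground homography."""
          corners = np.float32([[[0, 0]], [[width, 0]],
                                [[width, height]], [[0, height]]])
          ground = cv2.perspectiveTransform(corners, homography)
          return Polygon(ground.reshape(-1, 2))

      def common_view_area(h_left, h_right, width, height):
          """Common viewing area = intersection of the two shooting ranges."""
          left = shooting_range(h_left, width, height)
          right = shooting_range(h_right, width, height)
          return left.intersection(right).area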
  • In a possible implementation, a baseline adjustment request carrying the target baseline can be sent to the multi-eye camera, where the request is used to instruct the multi-eye camera to adjust the baseline length of one of its camera groups to the above target baseline, after which the first image and the second image captured by the camera group corresponding to the target baseline are received.
  • In the above solution, the target baseline is determined according to the size of the common viewing area of the multi-eye camera and the measurement accuracy requirement of the target to be measured, so measurement accuracy can be improved as much as possible while ensuring that the target is within the shooting range of the binocular camera corresponding to the target baseline. This solves the problem that a fixed-baseline multi-eye camera can only measure targets within a fixed ranging range, and expands the ranging range of the ranging and positioning system provided by the present application.
  • In a possible implementation, the multi-eye camera photographs the target to obtain a first channel video and a second channel video; after these are received, time synchronization processing is performed on them to obtain the first image and the second image at the same moment, where the first image is an image frame in the first channel video and the second image is an image frame in the second channel video.
  • Specifically, a reference frame can be obtained from the first channel video and multiple motion frames from the second channel video, where the reference frame and the motion frames all include a moving object; feature point matching is performed between the reference frame and each motion frame to obtain a synchronization frame among the motion frames, where the parallelism of the lines connecting feature points in the synchronization frame with the corresponding feature points in the reference frame satisfies a preset condition; time synchronization correction is then performed on the first and second channel videos according to the reference frame and the synchronization frame to obtain the first image and the second image at the same moment. Satisfying the preset condition may mean selecting the frame whose connecting lines have the highest parallelism as the synchronization frame.
  • For example, suppose the video collected by camera 2 runs 20 ms ahead of the video collected by camera 1. If disparity calculation is performed directly on the first and second channel videos collected by camera 1 and camera 2, the resulting disparity information will contain errors, hindering subsequent applications such as ranging and 3D reconstruction. Time-synchronizing the first channel images and the second channel images solves this problem.
  • In a possible implementation, the above reference frame and motion frames may be determined by an optical flow method, where optical flow refers to the instantaneous velocity of the pixel motion of a moving object in space on the observation imaging plane; over a small time interval, the optical flow is also equivalent to the displacement of the moving object.
  • During specific implementation, the flow of determining the reference frame and the motion frames can be as follows: first, perform detection of the synchronization target on each frame of the first channel video and the second channel video, obtaining one or more synchronization targets in each frame of image. When performing this detection, the detected synchronization target should be a target that may move, not a stationary target such as a building; the synchronization target may therefore be the target to be measured described above, or another target, which is not specifically limited in this application. For example, if the target to be measured is a pedestrian, the synchronization target used to achieve time synchronization can be a pedestrian or a vehicle; if the target to be measured is vehicle A, the synchronization target can be vehicles and pedestrians. The above examples are for illustration and are not limited in this application.
  • The target detection algorithm in this embodiment of the present application may use any existing, well-performing neural network model for target detection, for example the You Only Look Once (YOLO) model, the Single Shot multibox Detector (SSD) model, the Region-based Convolutional Neural Network (RCNN) model, or the Fast Region-based Convolutional Neural Network (Fast-RCNN) model, which is not specifically limited in this application.
  • Similarly, the optical flow method in the embodiments of the present application can be any industry-proven optical flow algorithm, such as the Lucas-Kanade (LK) optical flow method.
  • After the optical flow of each object in each frame (that is, the object's instantaneous velocity) is obtained, whether the object is a moving object can be determined by checking whether its velocity has a component along the image row direction. Specifically, since the cameras of the multi-eye camera (such as the one shown in FIG. 1) are fixed at the same height, an object moving along the row direction changes its row coordinate; so if the row coordinate of object X in motion frame Tn differs from the row coordinate of the same object X in the previous frame Tn-1 (or the next frame Tn+1), the object can be determined to be a moving object. It can be understood that an object moving vertically moves only in the column direction: it has no velocity component along the image row direction and therefore contributes nothing to the disparity calculation. Such vertically moving objects are treated as non-moving objects and do not participate in the parallax calculation, which reduces the amount of computation and improves the accuracy and efficiency of the parallax calculation.
  • Specifically, feature point matching can be performed between the moving object in the reference frame and the moving object in each motion frame, and the difference Δs of the row coordinates of each pair of matched feature points calculated. If Δs is non-zero, the two frames are not synchronized, so an accurate synchronization offset time Δt can be calculated from Δs, and Δt can then be used as the compensation for the row coordinates of each subsequent frame, yielding the synchronized first and second channel videos and, from them, the first image and the second image at each moment (see the sketch below).
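  • A hedged sketch of this synchronization step (the patent does not spell out how Δt is recovered from Δs; here Δt is assumed to be Δs divided by the object's pixel velocity along the row direction, estimated with LK optical flow):

      import cv2
      import numpy as np

      def row_velocity(prev_gray, cur_gray, pts, frame_dt):
          """Pixel velocity of tracked feature points along the image row
          direction via Lucas-Kanade optical flow; pts is an (N, 1, 2)
          float32 array. Objects with ~0 row velocity are treated as
          non-moving and excluded from the parallax calculation."""
          nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
          ok = status.ravel() == 1
          good_prev = pts[ok].reshape(-1, 2)
          good_next = nxt[ok].reshape(-1, 2)
          return float(np.median(good_next[:, 0] - good_prev[:, 0]) / frame_dt)

      def sync_offset_time(delta_s, v_row):
          """Assumed recovery formula Δt = Δs / v_row, where Δs is the
          residual row-coordinate difference of matched feature points
          between the reference frame and the synchronization frame."""
          return delta_s / v_row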
  • In a possible implementation, stereo rectification may also be performed on the first and second channel videos. The formulas used for parallax calculation are usually derived under the assumption that the multi-eye camera is in an ideal state, so before using the multi-eye camera for ranging and positioning, the actual camera should be rectified to that ideal state, in which the image planes of the left and right cameras are parallel, the optical axes are perpendicular to the image planes, and the epipoles lie at infinity. The embodiment of the present application may adopt any industry-proven stereo rectification method, such as the Bouguet epipolar rectification method, which is not specifically limited in the present application; a minimal rectification sketch follows.
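  • OpenCV's cv2.stereoRectify implements Bouguet's rectification; a minimal sketch (the calibration values below are placeholders, not from the patent):

      import cv2
      import numpy as np

      # Placeholder calibration: intrinsics K, distortion D for each camera,
      # and rotation R / translation T of the right camera w.r.t. the left.
      K1 = K2 = np.array([[1000., 0., 640.], [0., 1000., 360.], [0., 0., 1.]])
      D1 = D2 = np.zeros(5)
      R = np.eye(3)
      T = np.array([0.12, 0., 0.])    # 12 cm baseline along x
      size = (1280, 720)

      R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
      map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
      map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
      # rectified_left = cv2.remap(frame_left, map1x, map1y, cv2.INTER_LINEAR)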
  • In the above solution, time synchronization processing is performed on the first and second channel videos to obtain the first image and the second image, and the position information of the target is then determined from them, which improves the accuracy of the position information and hence of subsequent applications such as AR, VR, and 3D reconstruction.
  • In a possible implementation, the first image may be input into a detection and matching model to obtain a first detection and matching result of the first image, and the second image may be input into the detection and matching model to obtain a second detection and matching result of the second image, where each result includes a target frame and a label: the target frame indicates the area of the target in the image, and the same target carries the same label. The first target area is obtained from the first detection and matching result, and the second target area is obtained from the second detection and matching result.
  • During specific implementation, the target frame in a detection and matching result may be a rectangular frame, a circular frame, an oval frame, and so on, which is not specifically limited in this application. It should be understood that if there are multiple targets to be measured, the detection and matching results may include multiple target frames for the multiple targets. In the results, the same target is identified by the same label and different targets by different labels; in this way, when disparity is computed for the target, the same target can be identified across different video frames by its label, so that feature point matching of the same target in the first image and the second image at the same moment can be achieved and the parallax of the target obtained (see the sketch below).
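  • A hedged sketch of matching feature points inside two same-label target areas (ORB with brute-force Hamming matching is an illustrative choice; the patent does not mandate a specific feature detector):

      import cv2

      def match_target_features(img1, img2, box1, box2):
          """Match feature points between the same-label target frames of the
          first and second image; each box is (x, y, w, h) in pixels."""
          x1, y1, w1, h1 = box1
          x2, y2, w2, h2 = box2
          crop1 = img1[y1:y1 + h1, x1:x1 + w1]
          crop2 = img2[y2:y2 + h2, x2:x2 + w2]

          orb = cv2.ORB_create()
          kp1, des1 = orb.detectAndCompute(crop1, None)
          kp2, des2 = orb.detectAndCompute(crop2, None)

          matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
          matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

          # Return full-image pixel coordinates of corresponding feature points;
          # these pairs feed the disparity-to-distance step sketched earlier.
          pts1 = [(kp1[m.queryIdx].pt[0] + x1, kp1[m.queryIdx].pt[1] + y1)
                  for m in matches]
          pts2 = [(kp2[m.trainIdx].pt[0] + x2, kp2[m.trainIdx].pt[1] + y2)
                  for m in matches]
          return pts1, pts2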
  • During specific implementation, the detection and matching model may include a feature extraction module and a detection and matching module, where the feature extraction module extracts features from the input first and second images and generates high-dimensional feature vectors, and the detection and matching module generates, from those feature vectors, a detection and matching result containing the target frame and label.
  • Before the detection and matching model is used, it may be trained with a sample set comprising first image samples, second image samples, and sample ground truths, where each ground truth includes a target detection ground truth and a target matching ground truth: the target detection ground truth includes the target frames of the target in the first and second image samples, and the target matching ground truth includes the labels of the target in the first and second image samples. During training, a detection and matching loss used for backpropagation is determined from the gap between the model output and the sample ground truth, and the parameters of the detection and matching model are adjusted according to this loss until the loss reaches a threshold, after which the trained detection and matching model is obtained.
  • The feature extraction module can be a neural network backbone such as VGG or ResNet for extracting image features, and the detection and matching module can be a target detection network such as a YOLO network, an SSD network, or an RCNN, which is not specifically limited in this application; a minimal sketch follows.
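  • A minimal PyTorch sketch of such a two-module model (the architecture details are assumptions for illustration, not the patent's design): a ResNet backbone acts as the feature extraction module, and a toy head predicts one target frame plus an embedding whose nearest neighbor across the two images determines the shared label:

      import torch
      import torch.nn as nn
      import torchvision

      class DetectionMatchingModel(nn.Module):
          def __init__(self, embed_dim=64):
              super().__init__()
              backbone = torchvision.models.resnet18(weights=None)
              # Feature extraction module: everything except the classifier.
              self.features = nn.Sequential(*list(backbone.children())[:-1])
              self.box_head = nn.Linear(512, 4)            # target frame (x, y, w, h)
              self.embed_head = nn.Linear(512, embed_dim)  # matching embedding

          def forward(self, x):
              f = self.features(x).flatten(1)  # high-dimensional feature vector
              return self.box_head(f), self.embed_head(f)

      model = DetectionMatchingModel()
      boxes, embeddings = model(torch.randn(2, 3, 224, 224))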
  • In the above solution, the first target area and the second target area can be determined simply by checking whether the labels are the same, rather than by performing image recognition on the target areas to determine the same target in the first and second images, which reduces computational complexity, improves the efficiency of obtaining the first and second target areas, and further improves the efficiency of ranging and positioning.
  • In a second aspect, a target positioning system includes: an acquisition unit, configured to acquire a first image and a second image, obtained by photographing the same target at the same time with a multi-eye camera; a detection and matching unit, configured to perform target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, where both target areas include the target; the detection and matching unit is further configured to perform feature point detection and matching on the first and second target areas to obtain a feature point matching result, where the result includes the correspondence between feature points in the first target area and feature points in the second target area, and feature points with a correspondence describe the same feature of the target; and a position determination unit, configured to determine the position information of the target according to the feature point matching result and the parameter information of the multi-eye camera.
  • In summary, a target baseline can be determined according to the target to be measured, the camera group with the target baseline collects the target to obtain a first image and a second image, target detection and matching on the two images yield the first and second target areas where the target is located, and feature point detection and matching on those areas yield the feature point matching result, from which the disparity information of each feature point, and thus the position information of the target, is determined.
  • In this way, the system can flexibly select a camera group with the target baseline for data collection according to the target to be measured, avoiding the limited ranging range of fixed-baseline multi-eye cameras and expanding the ranging range of the target positioning system.
  • Moreover, the system determines the location information of the target from the parallax information of the feature points and does not need to perform matching and parallax calculation for every pixel in the first and second image areas, thereby reducing the computing resources required for positioning and ranging, avoiding problems such as background interference and noise, and improving the accuracy of ranging and positioning.
  • In a possible implementation, the parameter information includes at least the baseline length and the focal length of the multi-eye camera; the position determination unit is configured to obtain the disparity information of the target from the pixel differences between feature points with a correspondence in the feature point matching result, where the disparity information includes the difference between the pixel coordinates of a feature point in the first target area and the pixel coordinates of the corresponding feature point in the second target area; and the position determination unit is configured to determine the distance between the target and the camera from the disparity information of the target, the baseline length of the multi-eye camera, and the focal length of the multi-eye camera, obtaining the position information of the target.
  • In a possible implementation, the multi-eye camera includes multiple camera groups, each of which includes multiple cameras, and the system further includes a baseline determination unit configured to acquire the baseline data of the multi-eye camera, where the baseline data includes the baseline length between the cameras in each group; the baseline determination unit is configured to obtain the target baseline from the baseline data according to the measurement accuracy requirement of the target; and the acquisition unit is configured to acquire the first image and the second image according to the target baseline, where they are captured by the camera group corresponding to the target baseline.
  • the baseline determination unit is configured to send a baseline adjustment request carrying a target baseline to the multi-camera camera, where the baseline adjustment request is used to instruct the multi-camera camera to adjust the baseline length of the camera group it includes to the target baseline; an acquisition unit configured to receive the first image and the second image captured by the camera group corresponding to the target baseline.
  • In a possible implementation, the baseline determination unit is configured to determine a first accuracy index and a second accuracy index of each camera group, where the first accuracy index is inversely proportional to the baseline length of the group and directly proportional to its common viewing area (the area captured simultaneously by the cameras in the group), and the second accuracy index is directly proportional to the baseline length and focal length of the group; the baseline determination unit is configured to determine the weights of the first and second accuracy indices according to the measurement accuracy requirement of the target; the baseline determination unit is configured to obtain the composite index of each camera group from the first accuracy index, the second accuracy index, and the weights; and the baseline determination unit is configured to determine the target baseline according to the composite index of each group.
  • In a possible implementation, the system further includes a synchronization unit configured to receive the first channel video and the second channel video obtained by the multi-eye camera shooting the target, and to perform time synchronization processing on the two videos to obtain the first image and the second image at the same moment, where the first image is an image frame in the first channel video and the second image is an image frame in the second channel video.
  • In a possible implementation, the synchronization unit is configured to obtain a reference frame from the first channel video and multiple motion frames from the second channel video, where the reference frame and the motion frames include a moving object; the synchronization unit is configured to perform feature point matching between the reference frame and the motion frames to obtain a synchronization frame among the motion frames, where the parallelism of the lines connecting feature points in the synchronization frame with the corresponding feature points in the reference frame satisfies a preset condition; and the synchronization unit is configured to perform time synchronization correction on the first and second channel videos according to the reference frame and the synchronization frame to obtain the first image and the second image at the same moment.
  • In a possible implementation, the detection and matching unit is configured to input the first image into the detection and matching model to obtain a first detection and matching result of the first image, and to input the second image into the detection and matching model to obtain a second detection and matching result of the second image, where each result includes a target frame and a label, the target frame indicates the area of the target in the image, and the same target carries the same label; the detection and matching unit is configured to obtain the first target area from the first detection and matching result and the second target area from the second detection and matching result.
  • a computer program product comprising a computer program that, when read and executed by a computing device, implements the method described in the first aspect.
  • a computer-readable storage medium comprising instructions that, when executed on a computing device, cause the computing device to implement the method as described in the first aspect.
  • a computing device including a processor and a memory, where the processor executes code in the memory to implement the method described in the first aspect.
  • FIG. 1 is a schematic diagram of an imaging structure based on a binocular camera;
  • FIG. 2 is an architecture diagram of a stereo vision system provided by the present application;
  • FIG. 3 is a schematic diagram of the deployment of a target positioning system provided by the present application;
  • FIG. 5 is a schematic diagram of the imaging of a binocular camera with an excessively long baseline when shooting a target point;
  • FIG. 6 is a schematic flowchart of the steps of a target positioning method provided by the present application;
  • FIG. 7 is a flowchart of the steps of performing time synchronization on the first channel video and the second channel video provided by the present application;
  • FIG. 8 is an example diagram of a target detection and matching result provided by the present application;
  • FIG. 9 is a schematic structural diagram of a target detection model provided by the present application;
  • FIG. 11 is a schematic diagram of a feature point matching result of a target positioning method provided by the present application in an application scenario;
  • FIG. 12 is a schematic diagram of a textureless object provided by the present application;
  • FIG. 13 is a schematic flowchart of the steps for determining the first target area and the second target area in an occlusion scenario provided by the present application;
  • FIG. 14 is a schematic structural diagram of a target positioning system provided by the present application;
  • FIG. 15 is a schematic structural diagram of a computing device provided by the present application.
  • Stereoscopic positioning refers to determining the position, in three-dimensional world space, of objects appearing in the video or image information obtained by an image sensor. By analyzing the video information collected by the image sensor, people can realize target coordinate positioning, target ranging, 3D reconstruction, and so on, and feed the results back to a terminal or cloud processor to serve richer applications such as intelligent security, autonomous driving, industrial inspection, intelligent transportation, AR, VR, ADAS, and medicine.
  • Stereo vision positioning usually uses a binocular camera to photograph the target and obtain multi-channel video of it. The parallax is determined first from the multi-channel video, where parallax refers to the directional difference generated when the same target is observed from two observation points separated by a certain distance; from the distance between the cameras (i.e., the baseline length) and the parallax, the distance between the target and the camera can be calculated, giving the accurate position of the target in three-dimensional world space.
  • During specific implementation, the parallax may be the difference in the pixel coordinates of the target between images captured by different cameras. For example, if a binocular camera includes a left camera and a right camera, the pixel coordinates of target X in the picture captured by the left camera are (x, y), and its pixel coordinates in the picture captured by the right camera are (x+d, y), then d is the parallax of target X in the horizontal direction.
  • The method for determining the distance between the target and the camera from the baseline length and the parallax is illustrated below with reference to FIG. 1. Assume that camera parameters such as the focal length and the imaging plane length of the two cameras of the binocular camera are the same. In FIG. 1, P is the target point to be detected, O_L is the optical center of the left camera of the binocular camera, and O_R is the optical center of the right camera; the line segment AB is the imaging plane of the left camera, the line segment CD is the imaging plane of the right camera, and the line segment O_L O_R is the baseline of the binocular camera, of length b. The point P_L on imaging plane AB is the image of target point P on the left camera, and the point P_R on imaging plane CD is the image of target point P on the right camera. The distance X_L between P_L and point A at the leftmost edge of imaging plane AB is the image abscissa of target point P in the image captured by the left camera, and the distance X_R between P_R and point C at the leftmost edge of imaging plane CD is the image abscissa of target point P in the image captured by the right camera, so the parallax of the target point P to be detected is (X_L - X_R). With X_L - X_R as the parallax, b as the baseline length, and f as the focal length of the camera, the distance z between the target and the camera can be obtained from the parallax, baseline length, and focal length of the multi-eye camera, as reconstructed below. It should be understood that the derivation shown in FIG. 1 is used for illustration, and the present application does not limit the specific algorithm for determining the distance between the target and the camera from the parallax.
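  • By similar triangles in FIG. 1, the depth relation can be reconstructed as follows (a standard derivation consistent with the quantities defined above; whether this is exactly the patent's "formula 3" is not confirmed here):

      \frac{b - (X_L - X_R)}{b} = \frac{z - f}{z}
      \quad\Longrightarrow\quad
      z = \frac{b \cdot f}{X_L - X_R}

    For example, with b = 0.12 m, f = 1000 pixels, and a parallax of 4 pixels, z = 0.12 * 1000 / 4 = 30 m.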
  • When the current stereo vision algorithm determines the distance between the target and the camera, since the target is not a single point in the multi-channel images but an image area, the disparity of every pixel in that area must be determined, and the distance between the target and the camera is computed from those disparities and the baseline length of the multi-eye camera. This process not only consumes huge computing resources, but the large number of calculations is also prone to noise, calculation errors, and background interference, so the accuracy of ranging and positioning cannot be guaranteed, which in turn affects subsequent applications such as 3D reconstruction, autonomous driving, and security monitoring.
  • In order to solve the problems of large computing resource consumption and poor accuracy in ranging and positioning, the present application provides a target positioning system that can flexibly set the baseline length of the multi-eye camera according to the target to be measured, thereby solving the problem of a small ranging range. By performing target detection and matching on the first image and the second image captured by the multi-eye camera, the target areas where the target is located in the two images are determined; feature point detection and matching are then performed on those target areas, and the disparity information of the target is determined from the pixel differences between feature points in one target area and the corresponding feature points in the other, without matching every pixel in the target area. This reduces the computing resources required for ranging and positioning, avoids interference from the background image in the target parallax calculation, and improves the accuracy of the parallax calculation and hence of ranging and positioning.
  • FIG. 2 is a system architecture diagram of an embodiment of the present application. As shown in FIG. 2, the system architecture for stereo vision positioning provided by the present application may include a target positioning system 110, a multi-eye camera 120, and an application server 130, where the target positioning system 110 and the multi-eye camera 120 are connected through a network, as are the application server 130 and the target positioning system 110. The network may be a wireless network or a wired network, which is not specifically limited in this application.
  • The multi-eye camera 120 includes multiple camera groups, and each camera group includes multiple cameras. For example, if the multi-eye camera 120 includes N cameras, where N is a positive integer and the cameras are numbered 1, 2, ..., N, every two cameras can form a binocular camera group with a corresponding baseline length: the baseline of the group composed of camera 1 and camera N is BL1, the baseline of the group composed of camera 1 and camera N-1 is BL2, and so on, giving N(N-1)/2 binocular camera groups. It should be understood that the above examples are for illustration, and are not specifically limited in the present application.
  • The multi-eye camera 120 is used to send baseline data to the target positioning system 110, where the baseline data includes the baseline length between the cameras in each camera group. Still taking the above example, since the multi-eye camera 120 includes N(N-1)/2 binocular camera groups, the baseline data can include the baseline lengths of those N(N-1)/2 binocular camera groups.
  • the multi-eye camera 120 is further configured to receive the target baseline sent by the target positioning system 110 , and send the first image and the second image collected by the camera group corresponding to the target baseline to the target positioning system 110 .
  • the first image and the second image are images obtained by photographing the same target at the same time.
  • The target baseline is determined by the target positioning system 110 according to the measurement accuracy requirement of the target.
  • During specific implementation, the multi-eye camera 120 can receive a baseline adjustment request sent by the target positioning system 110, where the request carries the above target baseline; according to the request, the multi-eye camera 120 uses the camera group corresponding to the target baseline to capture the first image and the second image, which are then sent to the target positioning system 110.
  • In the above solution, the target positioning system determines the baseline length of the multi-eye camera 120 according to the measurement accuracy requirement of the target: for example, a binocular camera with a longer baseline can be used when ranging a long-distance target, and a binocular camera with a shorter baseline when ranging and positioning a short-distance target, which expands the range of ranging and positioning and solves the problem of a small ranging range.
  • The multi-eye camera 120 is further configured to use the camera group corresponding to the target baseline to shoot the target and obtain multi-channel videos, such as a first channel video and a second channel video, and then send them to the target positioning system 110, where the two videos include the above first image and second image; the target positioning system 110 can perform time synchronization processing on the first and second channel videos to obtain the above first image and second image.
  • It should be noted that the first channel video and the second channel video may be real-time video collected by the multi-eye camera 120, or cached historical video. For example, if the multi-eye camera 120 includes 10 cameras located at the gate of a community, each camera can collect the surveillance video of the gate from 8:00 a.m. to 8:00 p.m. and transmit it to the target positioning system 110 at 9:00 p.m. as the first and second channel videos for processing, or each camera can collect the surveillance video of the gate in real time and transmit it to the target positioning system 110 in real time through the network for processing, which is not specifically limited in this application.
  • Optionally, the cameras in the multi-eye camera 120 can also be replaced by a monocular movable camera, for example a camera system that only includes one camera mounted on a slidable support rod: by sliding along the rod, the camera can collect the first and second channel videos of the target from different angles, and the distance the camera moves along the rod is the above target baseline. Of course, the multi-eye camera 120 may also adopt other structures capable of capturing the first and second channel videos of the same target, which is not specifically limited in this application.
  • The application server 130 may be a single server or a server cluster composed of multiple servers; a server may be implemented by a general-purpose physical server, for example an ARM server or an X86 server, or by a virtual machine (VM) implemented with network functions virtualization (NFV) technology, such as a virtual machine in a data center, which is not specifically limited in this application.
  • the application server 130 is used to realize functions such as three-dimensional reconstruction, industrial detection, intelligent security, AR, VR, automatic driving, etc. according to the position information sent by the target positioning system 110 .
  • The target positioning system 110 is configured to receive the first image and the second image sent by the multi-eye camera 120, perform target detection and matching on them, and obtain the first target area of the first image and the second target area of the second image, where both target areas include the above target to be measured. Feature point detection and matching are then performed on the first and second target areas to obtain a feature point matching result, where the result includes the correspondence between the feature points in the first target area and those in the second target area, and feature points with a correspondence describe the same feature of the target. Finally, the parallax information of the target can be determined from the feature point matching result and the parameter information of the multi-eye camera, the position information of the target determined according to formula 3, and the result sent to the above application server 130, so that the application server 130 can realize functions such as 3D reconstruction, AR, VR, and automatic driving according to the position information.
  • In addition, the target positioning system 110 can also receive the above baseline data sent by the multi-eye camera 120, obtain the target baseline from the baseline data according to the measurement accuracy requirement of the target to be measured, and send it to the multi-eye camera 120, so that the multi-eye camera 120 uses the camera group corresponding to the target baseline to capture the target and obtain the first image and the second image. Optionally, when the multi-eye camera 120 collects the target with the camera group corresponding to the target baseline, it can also obtain the first channel video and the second channel video; after the target positioning system 110 receives them, time synchronization processing can be performed on the two videos to obtain the above first image and second image.
  • The target positioning system 110 provided by the present application is flexible in deployment and can be deployed in an edge environment; specifically, it can be an edge computing device in the edge environment or a software system running on one or more edge computing devices. The edge environment refers to a cluster of edge computing devices that are geographically close to the multi-eye camera 120 and provide computing, storage, and communication resources, for example edge computing all-in-one machines located on both sides of a road.
  • For example, the target positioning system 110 may be one or more edge computing devices located near an intersection, or a software system running on them, where camera 1, camera 2, and camera 3 are set up to monitor the intersection. The edge computing device can determine that the most suitable baseline for the target to be measured is BL3, and the cameras satisfying baseline BL3 are camera 1 and camera 3. The edge computing device can perform time synchronization processing on the first and second channel videos collected by camera 1 and camera 3 to obtain the first image and the second image at the same moment, then perform target detection on them to obtain the first target image and the second target image, both of which include the target to be measured. Next, feature point detection and matching are performed on the first and second target images to obtain the feature point matching result; from this result and the parameter information of camera 1 and camera 3, the parallax information of the target can be determined, the position information of the target determined in combination with formula 3, and the position information sent to the application server 130, so that the application server can realize functions such as 3D reconstruction, AR, VR, and autonomous driving according to the position information.
  • The target positioning system 110 may also be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform, where the cloud data center includes a large number of basic resources (computing, storage, and network resources) owned by the cloud service provider. The target positioning system 110 may be a server in the cloud data center, a virtual machine created in it, or a software system deployed on servers or virtual machines there; the software system may be distributed across multiple servers, across multiple virtual machines, or across both virtual machines and servers.
  • For example, the target positioning system 110 can also be deployed in a cloud data center far away from an intersection where camera 1, camera 2, and camera 3 are set up to monitor the intersection. The cloud data center can determine that the most suitable baseline for the target to be measured is BL3, with camera 1 and camera 3 satisfying baseline BL3; it can perform time synchronization processing on the first and second channel videos collected by camera 1 and camera 3 to obtain the first image and the second image at the same moment, determine the position information of the target from them as described above, and send the position information to the application server 130, so that the application server can implement functions such as 3D reconstruction, AR, VR, and automatic driving according to the position information.
  • Optionally, part of the target positioning system 110 may be deployed in an edge environment and part in a cloud environment. For example, the edge computing device is responsible for determining the baseline of the binocular camera according to the target to be measured, and the cloud data center determines the parallax information from the first and second channel videos collected by the binocular camera. Concretely, with camera 1, camera 2, and camera 3 monitoring the intersection, the edge computing device can determine that the most suitable baseline is BL3, satisfied by camera 1 and camera 3; the edge computing device can also perform time synchronization processing on the first and second channel videos collected by camera 1 and camera 3 to obtain the first image and the second image at the same moment, and send them to the cloud data center. The cloud data center performs target detection on the first image and the second image to obtain the first target image and the second target image, both of which include the target to be measured; it then performs feature point detection and matching on the two target images to obtain the feature point matching result, from which the parallax information of the target can be determined. The position information of the target is then determined in combination with formula 3 and sent to the application server 130, so that the application server can realize functions such as 3D reconstruction, AR, VR, and automatic driving according to the position information.
  • The unit modules inside the target positioning system 110 may be divided in multiple ways, and each module may be a software module, a hardware module, or partly software and partly hardware, which is not limited in this application; FIG. 2 shows an exemplary division.
  • As shown in FIG. 2, the target positioning system 110 may include a baseline determination unit 111, a synchronization unit 112, and a detection and matching unit 113. It should be noted that the modules of the target positioning system can be deployed on the same edge computing device, the same cloud data center, or the same physical machine, or partly on edge computing devices and partly in cloud data centers, for example with the baseline determination unit 111 deployed on an edge computing device and the synchronization unit 112 and detection and matching unit 113 deployed in a cloud data center, which is not specifically limited in this application.
  • The baseline determination unit 111 is configured to receive the baseline data sent by the multi-eye camera 120, determine the target baseline according to the measurement accuracy requirement of the target to be measured, and send it to the multi-eye camera 120. It should be understood that the baseline length of the multi-eye camera and the common viewing area both influence the measurement accuracy of the parallax, where the common viewing area refers to the area that all cameras of the multi-eye camera can photograph at the same time; the baseline determination unit 111 can therefore flexibly determine the target baseline according to the measurement accuracy requirement of the target to be measured, expanding the ranging range of the ranging and positioning system provided by the present application.
  • FIG. 4 is a schematic diagram of the ranging error of binocular cameras with different baselines for a fixed target. It can be seen from FIG. 4 that when the distance between the target and the multi-eye camera 120 is 50 meters, different baseline lengths give different ranging errors: the shorter the baseline, the greater the ranging error and the lower the ranging accuracy; the longer the baseline, the smaller the ranging error and the higher the ranging accuracy. However, a longer baseline is not always better: FIG. 5 is a schematic diagram of the imaging of a binocular camera with an excessively long baseline when shooting a target point. If the baseline of the binocular camera is too long, the target point P is not within the shooting range of the right camera; in other words, the target point P has no imaging point on the imaging plane CD of the right camera, so the position information of the target cannot be determined from the parallax.
  • During specific implementation, the baseline determination unit 111 can determine the target baseline in the following manner: first, determine the first accuracy index and the second accuracy index of each camera group, where the first accuracy index is inversely proportional to the baseline length of the group and directly proportional to its common viewing area (the area photographed simultaneously by the cameras in the group), and the second accuracy index is directly proportional to the baseline length and focal length of the group; then determine the weights of the first and second accuracy indices according to the measurement accuracy requirement of the target; then obtain the composite index of each camera group from the first accuracy index, the second accuracy index, and the weights; and finally determine the target baseline according to the composite index of each group.
• the measurement accuracy requirement may include the approximate distance between the target and the multi-camera 120, such as whether the target is a long-distance target or a short-distance target, which can be determined according to the size of the image area occupied by the target in the image captured by the camera: for example, a target whose image area is smaller than a first threshold is a long-distance target, and a target whose image area is greater than a second threshold is a short-distance target. The measurement accuracy requirement may also include the target measurement error threshold, for example, that the target measurement error must not exceed 1 meter. It should be understood that the above examples are for illustration, and are not specifically limited in the present application.
• the baseline determination unit 111 may send a baseline adjustment request carrying the target baseline to the multi-camera 120, where the baseline adjustment request is used to instruct the multi-camera 120 to adjust the baseline length of the camera group to the above-mentioned target baseline.
• the synchronization unit 112 is configured to receive the multi-channel videos, such as the first channel video and the second channel video, obtained by the multi-camera 120 capturing the target with the camera group of the target baseline, and then perform time synchronization processing on the first channel video and the second channel video to obtain a first image and a second image, wherein the first image and the second image are obtained by photographing the same target at the same moment.
• It should be understood that the two videos captured by the binocular camera may be out of sync in time; for example, the image with timestamp T1 from the left camera may describe a different world-time instant than the image with timestamp T1 from the right camera. After the synchronization unit 112 performs time synchronization on the first channel video and the second channel video, the first image and the second image at the same moment can be obtained, and using them for the subsequent disparity calculation can improve the accuracy of the final position information.
• the detection and matching unit 113 is used to detect and identify the target to be measured in the first image and the second image to obtain the first target area and the second target area, and then perform feature point detection and matching on the first target area and the second target area to obtain the feature point matching result, wherein the feature point matching result includes the correspondence between the feature points in the first target area and the feature points in the second target area, and feature points with a correspondence describe the same feature of the target. Then, the parallax information of each feature point is determined according to the feature point matching result, and the position information of the target is determined according to the parallax information and the parameter information of the multi-camera.
  • the disparity information includes the disparity of each feature point of the target, which may be the difference between the pixel coordinates of the feature points in the first target area and the pixel coordinates of the corresponding feature points in the second target area.
• For the description of the disparity, reference may be made to the embodiment in FIG. 1, which will not be repeated here.
  • the position information may include the distance between the target and the camera, which may be determined according to parallax and parameter information of the multi-camera. It can be known from the formula 3 in the embodiment of FIG. 1 that the parameter information at least includes the baseline length and the focal length of the multi-camera.
• the location information may also include the geographic coordinates of the target, where the geographic coordinates may be determined from the geographic coordinates of the multi-camera combined with the distance between the target and the camera, and may be provided according to the requirements of the application server 130. If the location information includes the geographic coordinates of the target, the parameter information of the multi-camera may include the geographic coordinates of the multi-camera.
• the detection and matching unit 113 determines the position information of the target according to the parallax information of the feature points, and does not need to perform matching and parallax calculation on each pixel in the first image area and the second image area, thereby reducing the computing resources required for positioning and ranging, while avoiding problems such as background interference and noise, and improving the accuracy of ranging and positioning.
• In summary, the target positioning system provided by the present application can determine the target baseline according to the target to be measured, use the camera group of the target baseline to capture the target to obtain the first image and the second image, perform target detection and matching on the first image and the second image to obtain the first target area and the second target area where the target is located, and then perform feature point detection and matching on the first target area and the second target area to obtain the feature point matching result. The disparity information of each feature point is determined according to the feature point matching result, thereby determining the position information of the target.
• the system can flexibly select the camera group of the target baseline for data collection according to the target to be measured, avoiding the limited ranging range caused by fixed-baseline multi-cameras and extending the ranging range of the target positioning system.
• the system determines the location information of the target according to the parallax information of the feature points, and does not need to perform matching and parallax calculation for each pixel in the first image area and the second image area, thereby reducing the computational resources required for positioning and ranging, avoiding problems such as background interference and noise, and improving the accuracy of ranging and positioning.
  • the present application provides a target positioning method, which can be applied to the architecture of the stereoscopic vision system shown in FIG. 2 .
  • the method can be executed by the aforementioned target positioning system 110 .
  • the method may include the following steps:
  • S310 Determine the target baseline according to the measurement accuracy requirement of the target to be measured.
• the above-mentioned multi-camera 120 may include N cameras, and every two cameras may form a camera group, i.e., a binocular camera. Optionally, the multi-camera 120 may send the baseline data of each camera group to the target positioning system 110 before step S310, and the target positioning system 110 can select a target baseline from the baseline data of the N(N-1)/2 kinds of binocular cameras according to the measurement accuracy requirement of the target to be measured, and send it to the multi-camera 120.
  • the target baseline can be determined according to the size of the common viewing area of the multi-camera 120 and the measurement accuracy requirements of the target to be measured.
• Specifically, the baseline data collected by the multi-camera 120 includes not only the baseline length between the multiple cameras in each camera group, but also the size of the common viewing area between the multiple cameras, where the size of the common viewing area can be determined according to the shooting range of each camera in the camera group, and the shooting range refers to the range of the geographical area recorded in the image captured by the camera. Specifically, the edge position points that can be displayed in the video picture of each channel of video can be determined, the pixel coordinates of each edge position point can be converted into geographic coordinates through a camera calibration algorithm, the shooting range of the video can be determined according to the area enclosed by these geographic coordinates, and the size of the common viewing area between multiple cameras can then be obtained.
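• As an illustrative sketch of this computation (the homographies mapping image pixels to ground-plane geographic coordinates are assumed to come from the camera calibration mentioned above; function and variable names are not from the patent):

```python
import cv2
import numpy as np

def shooting_range(h_img_to_ground, width, height):
    """Project the four image corner pixels to ground-plane (geographic)
    coordinates using a calibration-derived homography."""
    corners = np.float32([[0, 0], [width, 0],
                          [width, height], [0, height]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, h_img_to_ground).reshape(-1, 2)

def common_view_area(h1, h2, width, height):
    """Size of the common viewing area: the intersection of the two
    cameras' shooting ranges (both treated as convex quadrilaterals)."""
    area, _region = cv2.intersectConvexConvex(
        shooting_range(h1, width, height),
        shooting_range(h2, width, height))
    return area
```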
• the measurement accuracy requirement of the target to be measured may include the approximate distance between the target and the multi-camera 120; in other words, whether the target is a long-distance target or a short-distance target.
• whether the target to be measured is a long-distance target or a short-distance target can be determined according to the size of the image area where the target is located in the image collected by the multi-camera: the image area occupied by a long-distance target in the image is very small, while the image area occupied by a short-distance target is very large, so when the image area is smaller than the first threshold, it can be determined that the target to be measured is a long-distance target, and when the image area is greater than the second threshold, it can be determined that the target to be measured is a short-distance target.
• the measurement accuracy requirement may also include the measurement error threshold of the target to be measured, for example, that the measurement error must not exceed 1 meter. It should be understood that the above examples are for illustration and are not specifically limited in this application.
• Specifically, the first accuracy index p1 and the second accuracy index p2 of each group of cameras may be determined, wherein the first accuracy index p1 is inversely proportional to the baseline length of each group of cameras and proportional to the common viewing area of each group of cameras, the second accuracy index p2 is proportional to the baseline length and focal length of each group of cameras, and the common viewing area is the area captured in common by the multiple cameras in each group. Then, the weight ω of the first accuracy index p1 and the second accuracy index p2 is determined according to the measurement accuracy requirement of the target, the comprehensive index p of each group of cameras is obtained according to the first accuracy index p1, the second accuracy index p2 and the weight ω, and the target baseline is determined according to the comprehensive index of each group of cameras, as illustrated below.
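• A plausible form of the comprehensive index, consistent with the stated proportionalities and with the unit-conversion coefficients μ1 and μ2 described next (the exact weighted-sum form is an assumption, not quoted from the patent), is:

$$p = \omega\,\mu_{1}\,p_{1} + (1-\omega)\,\mu_{2}\,p_{2}, \qquad p_{1} \propto \frac{S_{\text{common}}}{b}, \qquad p_{2} \propto b\,f$$

where b is the baseline length, f the focal length, S_common the common viewing area and ω the weight.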
  • ⁇ 1 and ⁇ 2 are both unit conversion coefficients, so that the units of p 1 and p 2 are consistent, and can participate in the calculation of the comprehensive accuracy p.
• the weight ω∈(0,1), and the specific value of ω can be determined according to the above-mentioned ranging and positioning requirements. For example, when the target to be measured is a long-distance target, the baseline length has a great influence on the ranging accuracy, so the value of ω can be appropriately reduced in combination with the measurement error threshold of the target to be measured, increasing the influence of the baseline-based accuracy index p2 on the comprehensive index; when the target to be measured is a short-distance target, the size of the common viewing area has a greater impact on the ranging accuracy, so ω can be appropriately increased in combination with the measurement error threshold of the target to be measured, increasing the influence of the common-view-based accuracy index p1 on the comprehensive index.
• In this way, the comprehensive index p of each of the above N(N-1)/2 camera groups can be determined according to the target to be measured, and the baseline length corresponding to the largest comprehensive index p is then determined as the target baseline and sent to the multi-camera 120. It should be understood that the above examples are used for illustration, and the present application does not limit the specific formula of the comprehensive index.
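• A minimal sketch of this selection step, assuming the weighted-sum form of the comprehensive index above (the dictionary keys and the exact p1/p2 expressions are illustrative, since the patent only fixes their proportionalities):

```python
def pick_target_baseline(groups, omega, mu1=1.0, mu2=1.0):
    """groups: one dict per binocular camera group with 'baseline' (m),
    'focal' (px) and 'common_area' (m^2) from the reported baseline data.
    Returns the baseline of the group with the largest comprehensive index."""
    def comprehensive_index(g):
        p1 = g['common_area'] / g['baseline']  # inverse in baseline, prop. to common view
        p2 = g['baseline'] * g['focal']        # prop. to baseline and focal length
        return omega * mu1 * p1 + (1 - omega) * mu2 * p2
    return max(groups, key=comprehensive_index)['baseline']

# For N cameras there are N*(N-1)/2 such groups to evaluate.
```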
  • S320 Acquire the first image and the second image according to the target baseline.
• Specifically, the target baseline may be sent to the multi-camera 120, and then the first image and the second image captured by the camera group corresponding to the target baseline may be received.
• Optionally, a baseline adjustment request carrying the target baseline may be sent to the multi-camera, and after the multi-camera adjusts the baseline length of at least one camera group to the target baseline according to the baseline adjustment request and shoots a video or image, the first image and the second image captured by the camera group corresponding to the target baseline are received. The baseline adjustment request is used to instruct the multi-camera 120 to adjust the baseline length of the camera group to the above target baseline.
• Optionally, the first channel video and the second channel video captured by the camera group corresponding to the target baseline can also be received, and the first image and the second image are obtained after time synchronization processing is performed on the first channel video and the second channel video, wherein the first image and the second image are images at the same moment: the first image is a video frame in the first channel video, and the second image is a video frame in the second channel video.
• In a specific implementation, the video collected by camera 2 may be ahead of the video collected by camera 1 in time. If the disparity calculation is performed directly on the first video and the second video collected by camera 1 and camera 2, there will be errors in the obtained disparity information, which will hinder subsequent applications such as ranging and 3D reconstruction. Therefore, before the disparity calculation is performed, time synchronization processing may be performed on the first channel video and the second channel video at step S320, thereby improving the parallax calculation accuracy and in turn the accuracy of applications such as ranging and 3D reconstruction.
• Specifically, a reference frame can be obtained from the first channel video, and a plurality of motion frames can be obtained from the second channel video, wherein the reference frame and the plurality of motion frames include moving objects; then feature point matching is performed between the reference frame and the plurality of motion frames to obtain a synchronization frame among the multiple motion frames, wherein the parallelism of the lines between the feature points in the synchronization frame and the corresponding feature points in the reference frame satisfies a preset condition; finally, time synchronization correction is performed on the first channel video and the second channel video according to the reference frame and the synchronization frame to obtain the first image and the second image.
• Here, satisfying the preset condition may mean that the frame whose feature-point connection lines have the highest parallelism is determined as the synchronization frame.
  • the above-mentioned reference frame and motion frame may be determined by an optical flow method.
  • the optical flow refers to the instantaneous speed of the pixel motion of the space moving object on the observation imaging plane.
• when the time interval is small, the optical flow can also be regarded as equivalent to the displacement of the space moving object.
• the process of determining the reference frame and the motion frames can be as follows: first, perform target detection for the synchronization target on each frame of the first channel video and the second channel video, and obtain one or more synchronization targets in each frame of image.
• It should be noted that when target detection is performed on each frame of image, the detected synchronization target should be a target that may move, not a stationary target such as a building. Therefore, the synchronization target may be the target to be measured in the foregoing content, or may be another target, which is not specifically limited in this application.
• For example, if the target to be measured is a pedestrian, the synchronization target for time synchronization can be a pedestrian or a vehicle; if the target to be measured is vehicle A, the synchronization target for time synchronization can be vehicles and pedestrians. The above examples are used for illustration and are not limited in this application.
• the target detection algorithm in this embodiment of the present application may use any existing neural network model for target detection with better effects in the industry, for example, the one-stage unified real-time target detection (You Only Look Once: Unified, Real-Time Object Detection, YOLO) model, the Single Shot MultiBox Detector (SSD) model, the Region Convolutional Neural Network (RCNN) model or the Fast Region Convolutional Neural Network (Fast-RCNN) model, etc., which are not specifically limited in this application.
  • the optical flow method in the embodiments of the present application can also use any one of the optical flow methods that are already available in the industry for calculating optical flow with better effects, such as the Lucas-Kanade (LK) optical flow method.
• After obtaining the optical flow of each object in each frame (that is, the instantaneous speed of the object), it can be determined whether the object is a moving object by determining whether its speed has a component in the direction of the image row. Since the multi-camera (e.g., the multi-camera 120 shown in FIG. 2) is fixed, the row coordinates of a moving object will change between frames, so if the row coordinates of the same object X in the motion frame Tn and in the previous frame Tn-1 (or the next frame Tn+1) are not equal, it can be determined that the object is a moving object.
• It should be noted that a vertically moving object only moves in the column direction: this type of moving object has no velocity component in the image row direction, but only in the column direction, and therefore does not contribute to the disparity calculation. Thus the vertically moving object is also regarded as a non-moving object and does not participate in the parallax calculation, thereby reducing the amount of calculation and improving the accuracy and efficiency of the parallax calculation.
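• A sketch of this moving-object test using the LK optical flow mentioned above (the threshold and names are illustrative; "row direction" is taken here as the horizontal image axis, consistent with horizontal disparity):

```python
import cv2
import numpy as np

def row_direction_movers(prev_gray, next_gray, min_row_motion=0.5):
    """Track corners with Lucas-Kanade optical flow and keep only points
    whose displacement has a row-direction (horizontal) component; purely
    column-direction (vertical) movers are treated as non-moving because
    they do not contribute to the disparity calculation."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2), np.float32)
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    ok = status.ravel() == 1
    flow = (nxt - pts).reshape(-1, 2)[ok]
    return flow[np.abs(flow[:, 0]) > min_row_motion]
```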
• Specifically, feature point matching can be performed between the moving object in the reference frame and the moving object in each motion frame, and the mean difference Δs of the row coordinates of the matched feature points can be calculated for each motion frame. The synchronization offset time Δt is then computed from Δs1 and Δs2, the two values with the smallest absolute value among the feature point differences, and the video frame rate fr, as sketched below. The synchronization offset time Δt can be used as the compensation for the row coordinates of each subsequent frame, so as to obtain the synchronized first channel video and second channel video, and then the first image and the second image at each moment.
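• A plausible form of the Δt formula, assuming linear interpolation between the two best-matching motion frames (this exact form is an assumption rather than the patent's own formula), is:

$$\Delta t = \frac{\lvert\Delta s_{1}\rvert}{\lvert\Delta s_{1}\rvert + \lvert\Delta s_{2}\rvert}\cdot\frac{1}{f_{r}}$$

Under this form, Δt is bounded by one frame interval 1/fr, which matches its use as a sub-frame synchronization offset.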
• For example, assume the reference frame of camera 1 is frame P1 and the motion frames of camera 2 include frame Q1, frame Q2 and frame Q3. Feature point matching is performed between motion frame Q1 of camera 2 and reference frame P1 to obtain the mean value Δs1 of the row-coordinate differences of the feature points; feature point matching is performed between motion frame Q2 of camera 2 and reference frame P1 to obtain the mean value Δs2; and feature point matching is performed between motion frame Q3 of camera 2 and reference frame P1 to obtain the mean value Δs3. If the connection lines between the feature points of motion frame Q2 and the corresponding feature points of reference frame P1 have the highest parallelism, the motion frame Q2 and the reference frame P1 are the first image and the second image at the same moment, so frame P1 of camera 1 can be aligned with frame Q2 of camera 2; that is, the video captured by camera 1 is 1 frame slower than the video captured by camera 2. In this way, camera 1 and camera 2 can be synchronized. Further, if the offset time Δt is 3ms, that is, camera 1 is 3ms faster than camera 2, then camera 2 can be adjusted to be 3ms faster to achieve synchronization with the video of camera 1.
• It should be understood that the above example is used for illustration, and is not specifically limited in the present application.
  • stereoscopic correction may also be performed on the first channel of video and the second channel of video.
• It should be understood that the formula used in calculating the parallax is often derived under the assumption that the multi-camera is in an ideal state, so before using the multi-camera for distance measurement and positioning, the actually used multi-camera 120 can be corrected to the ideal state, in which the image planes of the left and right cameras are parallel, the optical axes are perpendicular to the image planes, and the epipoles are located at infinity.
• In a specific implementation, the embodiment of the present application may adopt any existing stereo correction method with better effects in the industry, such as the Bouguet epipolar rectification method, which is not specifically limited in the present application.
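• OpenCV's stereo rectification implements Bouguet's algorithm, so a minimal sketch of this correction step can be written as follows (the intrinsics K1/K2, distortions d1/d2 and relative pose R/T are assumed to come from a prior stereo calibration):

```python
import cv2

def rectify_pair(left_img, right_img, K1, d1, K2, d2, R, T, image_size):
    """Bouguet epipolar rectification: after remapping, the two image
    planes are parallel and corresponding points share the same row."""
    R1, R2, P1, P2, Q, _roi1, _roi2 = cv2.stereoRectify(
        K1, d1, K2, d2, image_size, R, T, alpha=0)
    m1 = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
    m2 = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)
    return (cv2.remap(left_img, m1[0], m1[1], cv2.INTER_LINEAR),
            cv2.remap(right_img, m2[0], m2[1], cv2.INTER_LINEAR), Q)
```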
• Optionally, in step S320, a multi-camera can also be used to capture the same target at the same moment to obtain the first image and the second image directly. In this case, the step of performing time synchronization processing on the first video and the second video in step S320 can be omitted, and step S330 is executed to perform the parallax calculation, which will not be repeated here.
  • S330 Perform target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image.
  • the first target area and the second target area include the above-mentioned target to be detected.
• Specifically, the first image may be input into the detection and matching model to obtain the first detection matching result of the first image, and the second image may be input into the detection and matching model to obtain the second detection matching result of the second image; the first target area is obtained according to the first detection matching result, and the second target area is obtained according to the second detection matching result.
• the first detection matching result and the second detection matching result include a target frame (bounding box) and a label, the target frame is used to indicate the area of the target to be detected in the image, and the labels of different targets are different. According to the labels in the first detection matching result and the second detection matching result, the same target in the first image and the second image can be determined, and the first target area and the second target area can then be determined in combination with the above target frames.
• the target frame in the detection matching result may be a rectangular frame, a circular frame, an oval frame, etc., which is not specifically limited in this application. It should be understood that if there are multiple targets to be measured, the detection matching results may include multiple target frames for the multiple targets. Therefore, in the detection matching results, the same target is identified by the same label and different targets are identified by different labels. In this way, when the disparity calculation is performed on a target, the same target in different video frames can be identified according to its label, so as to achieve feature point matching of the same target in the first image and the second image at the same moment, and then obtain the disparity of the target.
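• A sketch of how the two target areas can be paired purely by label (the detection dictionary layout is illustrative, not from the patent):

```python
def pair_target_areas(dets_first, dets_second):
    """Associate detections across the synchronized image pair by label:
    the detection matching model gives the same label to the same physical
    target, so the first/second target areas are the boxes sharing a label.
    Each detection is assumed to be {'label': ..., 'box': ...}."""
    second_by_label = {d['label']: d['box'] for d in dets_second}
    return {d['label']: (d['box'], second_by_label[d['label']])
            for d in dets_first if d['label'] in second_by_label}
```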
• For example, assuming frame P3 of camera 1 and frame Q4 of camera 2 are the first image and the second image at the same moment, FIG. 8 provides an example diagram of a target detection matching result in a target localization method, wherein the detection matching result is the rectangular target frame and ID label shown in FIG. 8: the tank truck framed in frame P3 and the tank truck framed in frame Q4 are the same vehicle, and the bus framed in frame P3 and the bus framed in frame Q4 are the same vehicle.
• It should be understood that FIG. 8 is used for illustration; the target frame can also take other forms such as a circular frame or an oval frame, and the ID label displayed in the detection matching result can also take other forms such as letters or numbers, which are not specifically limited in this application.
  • FIG. 9 is a schematic structural diagram of a target detection model in a target localization method provided by the present application.
• the detection and matching model may include a feature extraction module 610 and a detection and matching module 620, wherein the feature extraction module 610 is used to extract the features of the input first image and second image and generate high-dimensional feature vectors, and the detection and matching module 620 is used to generate the detection matching result including the target frame and the label according to the above feature vectors.
• Still taking the example in which frame P3 of camera 1 and frame Q4 of camera 2 are the first image and the second image at the same moment, frame P3 and frame Q4 can be input into the feature extraction module 610 to generate high-dimensional feature vectors, and the feature vectors are then input into the detection and matching module 620, which generates the detection matching result shown in FIG. 8. If the target to be measured is 001, the first target area and the second target area shown in FIG. 9 can be obtained. It should be understood that FIG. 9 is used for illustration and is not specifically limited in this application.
• Before the detection and matching model is used, a sample set may be used to train it. The sample set may include a first image sample, a second image sample and corresponding sample ground-truth values, where the sample ground-truth values include a target detection ground truth and a target matching ground truth: the target detection ground truth includes the target frames of the targets in the first image sample and the second image sample, and the target matching ground truth includes the labels of the targets in the first image sample and the second image sample. During training, the detection matching loss used for back propagation is determined according to the difference between the output value of the detection and matching module 620 and the sample ground truth, and the parameters of the detection and matching model are adjusted according to the detection matching loss until the detection matching loss reaches the threshold, thereby obtaining the trained detection and matching model.
• In a specific implementation, the feature extraction module 610 may be a neural network backbone structure for extracting image features, such as VGG or ResNet, and the above-mentioned detection and matching module 620 may be a target detection network, such as a YOLO network, an SSD network or an RCNN, which are not specifically limited in this application.
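• A minimal PyTorch-style sketch of this two-module structure (the backbone choice, head layout and output encoding are illustrative; the patent only requires a feature extractor plus a detection/matching head that emits target frames and labels):

```python
import torch.nn as nn
import torchvision

class DetectionMatchingModel(nn.Module):
    """Feature extraction module (ResNet backbone; VGG would also fit)
    followed by a placeholder detection/matching head that predicts, per
    anchor: 4 box coordinates, 1 objectness score and identity logits."""
    def __init__(self, num_ids, num_anchors=9):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.head = nn.Conv2d(512, num_anchors * (5 + num_ids), kernel_size=1)

    def forward(self, x):
        return self.head(self.features(x))
```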
• In this way, the first target area and the second target area can be determined according to whether the labels are the same, rather than by performing image recognition on the target to determine the same target in the first image and the second image, which can reduce computational complexity, improve the acquisition efficiency of the first target area and the second target area, and further improve the efficiency of ranging and positioning.
• S340 Perform feature point detection and matching on the first target area and the second target area to obtain a feature point matching result.
• the feature point matching result includes the correspondence between the feature points in the first target area and the feature points in the second target area, and feature points with a correspondence describe the same feature of the target to be measured. For example, if the target to be measured is a pedestrian and the feature points of the pedestrian include the eyes, nose and mouth, then there is a correspondence between the eyes of the pedestrian in the first target area and the eyes of the pedestrian in the second target area.
• Specifically, a feature point detection algorithm can be used to perform feature point detection on the first target area and the second target area to obtain the feature points of the first target area and the feature points of the second target area. Since the first target area and the second target area contain the same target, the feature points of the first target area have corresponding feature points in the second target area.
• the feature point detection and matching algorithm in this embodiment of the present application may be the features from accelerated segment test (FAST) feature point extraction algorithm, the binary robust independent elementary features (BRIEF) feature point description algorithm, the oriented FAST and rotated BRIEF (ORB) algorithm combining FAST and BRIEF, the speeded up robust features (SURF) algorithm, the accelerated KAZE features (AKAZE) algorithm, etc., which are not specifically limited in this application.
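• As a concrete example with one of the listed choices (ORB with brute-force Hamming matching; the parameters are illustrative):

```python
import cv2

def match_feature_points(area1, area2, max_matches=50):
    """Detect ORB feature points in the two target areas and match their
    descriptors; each returned pair links a point in the first target
    area to its corresponding point in the second target area."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(area1, None)
    kp2, des2 = orb.detectAndCompute(area2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt)
            for m in matches[:max_matches]]
```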
• For example, assuming the target to be measured is the vehicle with ID 001 and the first target area and the second target area are as shown in FIG. 10, which is an example of a feature point matching result provided by this application: FIG. 10 shows a partial feature point matching result, each feature point detected in the first target area has a corresponding feature point in the second target area, and the feature points with a correspondence are represented by connecting lines. In a specific implementation, the feature point matching result can also represent the correspondence between the feature points in other ways; FIG. 10 is used for illustration and is not specifically limited in this application.
  • S350 Determine the position information of the target according to the feature point matching result and the parameter information of the multi-camera.
  • the parameter information of the multi-camera includes at least the baseline length and focal length of the multi-camera, and may also include geographic coordinate information of the multi-camera.
  • the location information of the target may include the distance between the target and the multi-camera, and may also include the geographic coordinates of the target, which is not specifically limited in this application.
• Specifically, the disparity information of the target can be obtained according to the pixel differences between the feature points with correspondences in the feature point matching result, where the disparity information includes the differences between the pixel coordinates of the feature points in the first target area and the pixel coordinates of the corresponding feature points in the second target area; then the distance between the target and the multi-camera can be determined according to the disparity information, the baseline length b and the focal length f, and the geographic coordinates of the target can further be determined in combination with the geographic coordinates of the multi-camera.
• In a specific implementation, some credible pixel differences may be taken as the parallax, or the average value of the pixel differences may be taken as the disparity information of the target, and the disparity information is then used to perform the distance calculation, which is not specifically limited in this application.
• For example, assuming the first target area includes feature points A1 and B1 of target X and the second target area includes feature points A2 and B2 of target X, where A1 and A2 are the same feature point and B1 and B2 are the same feature point, the average value of the pixel difference D1 between A1 and A2 and the pixel difference D2 between B1 and B2 can be determined as the parallax of the target, and the distance between the target and the binocular camera is then obtained.
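• A sketch of this averaging step, assuming rectified images so the disparity is the horizontal pixel difference and depth follows z = f·b/d, the standard form of the formula 3 referenced above:

```python
import numpy as np

def target_distance(matched_points, focal_px, baseline_m):
    """matched_points: pairs of corresponding pixel coordinates from the
    two target areas. Average the per-feature-point disparities and
    convert the mean disparity to a distance with z = f * b / d."""
    disparities = [abs(p1[0] - p2[0]) for p1, p2 in matched_points]
    mean_d = np.mean(disparities)   # or filter to 'credible' values first
    return focal_px * baseline_m / mean_d
```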
• FIG. 11 is a schematic diagram of the feature point matching result of a target positioning method provided by the present application in an actual application scenario of measuring the distance between person Y and the binocular camera. In this scenario, the target baseline can first be determined according to the measurement accuracy requirements (for example, person Y is a short-distance target and the measurement error is plus or minus 1 meter) in combination with formulas (4) to (6); the target baseline is then sent to the multi-camera 120, the first image and the second image captured by the camera group corresponding to the target baseline are obtained, and the first image and the second image are input into the detection and matching model shown in FIG. 9 to obtain the first detection matching result and the second detection matching result.
• As shown in FIG. 11, according to the target frames and labels in the first detection matching result and the second detection matching result, the first target area and the second target area including person Y are obtained; after feature point detection and matching are performed on the first target area and the second target area, the feature point matching result shown in FIG. 11 can be obtained.
• Finally, according to the parallax, it can be determined that person Y is 14.2m away from the camera. It should be understood that FIG. 11 is used for illustration and is not specifically limited in the present application.
• Since the parallax is determined according to the differences between feature points, rather than the differences between each pixel in the first target area and the second target area, this not only reduces the amount of calculation but also improves the calculation efficiency of the parallax.
• Furthermore, feature points can lie not only on a pixel but also between pixels; in other words, the accuracy of determining the parallax based on pixel matching is at the integer (pixel) level, while the accuracy of determining the parallax based on feature point matching is at the sub-pixel level. Therefore, in the present application, the parallax calculation performed by means of feature point matching has higher accuracy, thereby making the accuracy of ranging and positioning higher.
• Moreover, the solution provided by the present application can also improve the parallax calculation accuracy for textureless objects, thereby improving their ranging and positioning accuracy. It is understandable that when a multi-camera is used to shoot textureless objects, the pixel differences across a textureless object are very small, so methods that determine the target parallax by calculating the differences between the pixels of the different images have very poor accuracy. With the solution provided in this application, the first target area and the second target area where the target is located are extracted first, feature point matching is then performed on the first target area and the second target area to obtain the feature point matching result, and the parallax is determined according to the feature point matching result, which can improve the matching accuracy for untextured objects.
• For example, FIG. 12 is a schematic diagram of a textureless object provided by the present application. Assume the textureless object Z is a checkerboard placed at a distance of 7.5m from the binocular camera. The depth value output by pixel-level matching is 6.7m, while the depth value output by the solution provided by this application is 7.2m. Therefore, the solution provided by the present application has higher parallax calculation accuracy and better ranging and positioning accuracy.
• In addition, the solution provided by the present application can also improve the accuracy of parallax calculation in occluded scenes, thereby improving the accuracy of ranging and positioning of occluded objects. It is understandable that since the pixels of an occluded object are covered and appear as the pixels of the occluder, methods that determine the target parallax by calculating the differences between the pixels of the different images have poor accuracy. With the solution provided by this application, after the detection and matching model shown in FIG. 9 is used to perform target detection and matching on the target, the position of the occluded object can be estimated and the occluded object can be completed, so as to obtain the completed first target area and second target area; feature point detection and matching are then performed on them to obtain the feature point matching result, the parallax information of the target is determined according to the feature point matching result, and the distance between the target and the multi-camera is obtained, so that the calculated parallax accuracy is higher and the ranging accuracy of occluded objects is also higher.
• For example, FIG. 13 is a schematic flowchart of the steps of determining the first target area and the second target area in an occlusion scenario provided by the present application. Assume that in the first target area the target 004 is not occluded by the target 005, while in the second target area the target 004 is occluded by the target 005. If the disparity information of the target is determined directly according to the pixel differences between the first target area and the second target area, since the target 004 is occluded by the target 005 in the right image, the finally obtained parallax will be inaccurate, resulting in low ranging and positioning accuracy.
• When the solution provided by this application is used to perform the parallax calculation on the target 004 in the first target area and the second target area, the position of the target 004 in the second target area can first be estimated, feature point detection and matching are then performed to obtain the feature point matching result, the disparity of the target 004 is thereby obtained, and the ranging and positioning result of the target 004 is then obtained. Therefore, the solution provided by the present application has higher disparity calculation accuracy in occlusion scenes.
• In summary, the present application provides a target positioning method, which can determine a target baseline according to the target to be measured, use the camera group of the target baseline to capture the target to obtain a first image and a second image, perform target detection and matching on the first image and the second image to obtain the first target area and the second target area where the target is located, and finally perform feature point detection and matching on the first target area and the second target area to obtain the feature point matching result, according to which the disparity information of each feature point is determined, thereby determining the position information of the target.
• the system can flexibly select the camera group of the target baseline for data collection according to the target to be measured, avoiding the limited ranging range caused by fixed-baseline multi-cameras and extending the ranging range of the target positioning system.
• the system determines the location information of the target according to the parallax information of the feature points, and does not need to perform matching and parallax calculation for each pixel in the first image area and the second image area, thereby reducing the computational resources required for positioning and ranging, avoiding problems such as background interference and noise, and improving the accuracy of ranging and positioning.
  • the target positioning system 110 may be divided into modules or units according to functions, and there may be various ways of division.
  • the target positioning system 110 may include a baseline determination unit 111 , a synchronization unit 112 and a detection matching unit 113 .
  • the target positioning system 110 may be further divided into units according to functions.
  • FIG. 14 is a schematic structural diagram of another target positioning system 110 provided by the present application.
• As shown in FIG. 14, the present application provides a target positioning system 110, which includes a baseline determination unit 1410, an acquisition unit 1420, a synchronization unit 1430, a detection and matching unit 1440, and a position determining unit 1450.
  • an acquisition unit 1420 configured to acquire a first image and a second image, the first image and the second image are obtained by photographing the same target at the same time by a multi-camera camera;
• the detection and matching unit 1440 is configured to perform target detection and matching on the first image and the second image to obtain the first target area of the first image and the second target area of the second image, wherein the first target area and the second target area include the target;
• the detection and matching unit 1440 is configured to perform feature point detection and matching on the first target area and the second target area to obtain a feature point matching result, wherein the feature point matching result includes the correspondence between the feature points in the first target area and the feature points in the second target area, and feature points with a correspondence describe the same feature of the target;
  • the position determining unit 1450 is configured to determine the position information of the target according to the feature point matching result and the parameter information of the multi-camera.
• Optionally, the parameter information includes at least the baseline length of the multi-camera and the focal length of the multi-camera. The position determination unit 1450 is configured to obtain the disparity information of the target according to the pixel differences between the feature points with correspondences in the feature point matching result, wherein the disparity information includes the differences between the pixel coordinates of the feature points in the first target area and the pixel coordinates of the corresponding feature points in the second target area; the position determination unit 1450 is further configured to determine the distance between the target and the camera according to the disparity information of the target, the baseline length of the multi-camera and the focal length of the multi-camera, and obtain the position information of the target.
• Optionally, the multi-camera includes a plurality of camera groups, and each of the plurality of camera groups includes a plurality of cameras. The baseline determination unit 1410 is configured to acquire baseline data of the multi-camera, the baseline data including the baseline length between the multiple cameras in each camera group; the baseline determination unit 1410 is configured to obtain the target baseline from the baseline data according to the measurement accuracy requirement of the target; and the acquisition unit 1420 is configured to obtain the first image and the second image according to the target baseline, wherein the first image and the second image are captured by the camera group corresponding to the target baseline.
• Optionally, the baseline determination unit 1410 is configured to send a baseline adjustment request carrying the target baseline to the multi-camera, where the baseline adjustment request is used to instruct the multi-camera to adjust the baseline length of the camera group included in the multi-camera to the target baseline; the acquisition unit 1420 is configured to receive the first image and the second image captured by the camera group corresponding to the target baseline.
• Optionally, the baseline determination unit 1410 is configured to determine the first accuracy index and the second accuracy index of each group of cameras, wherein the first accuracy index is inversely proportional to the baseline length of each group of cameras and proportional to the common viewing area of each group of cameras, the second accuracy index is proportional to the baseline length and focal length of each group of cameras, and the common viewing area is the area photographed in common by the multiple cameras in each group; the baseline determination unit 1410 is configured to determine the weights of the first accuracy index and the second accuracy index according to the measurement accuracy requirement of the target, obtain the comprehensive index of each group of cameras according to the first accuracy index, the second accuracy index and the weights, and determine the target baseline according to the comprehensive index of each group of cameras.
• Optionally, the synchronization unit 1430 is configured to receive the first channel video and the second channel video obtained by shooting the target with the multi-camera, and to perform time synchronization processing on the first channel video and the second channel video to obtain the first image and the second image at the same moment, wherein the first image is an image frame in the first channel video and the second image is an image frame in the second channel video.
• Optionally, the synchronization unit 1430 is configured to obtain a reference frame from the first channel video and obtain a plurality of motion frames from the second channel video, wherein the reference frame and the plurality of motion frames include moving objects; to perform feature point matching between the reference frame and the multiple motion frames to obtain a synchronization frame among the multiple motion frames, wherein the parallelism of the lines between the feature points in the synchronization frame and the corresponding feature points in the reference frame satisfies the preset condition; and to perform time synchronization correction on the first channel video and the second channel video according to the reference frame and the synchronization frame to obtain the first image and the second image at the same moment.
• Optionally, the detection and matching unit 1440 is configured to input the first image into the detection and matching model to obtain the first detection matching result of the first image, and input the second image into the detection and matching model to obtain the second detection matching result of the second image, wherein the first detection matching result and the second detection matching result include a target frame and a label, the target frame is used to indicate the area of the target in the image, and the label of the same target is the same; the detection and matching unit 1440 is configured to obtain the first target area according to the first detection matching result and obtain the second target area according to the second detection matching result.
• the unit modules inside the target positioning system 110 may also be divided in multiple other ways, and each module may be a software module, a hardware module, or partly a software module and partly a hardware module, which is not limited in this application.
• FIG. 2 and FIG. 14 are both exemplary division manners.
• For example, in some feasible solutions, the acquisition unit 1420 in FIG. 14 may also be omitted; in other feasible solutions, other units in FIG. 14 may be combined or omitted.
• In summary, the present application provides a target positioning system, which can determine a target baseline according to the target to be measured, use the camera group of the target baseline to capture the target to obtain a first image and a second image, perform target detection and matching on the first image and the second image to obtain the first target area and the second target area where the target is located, and finally perform feature point detection and matching on the first target area and the second target area to obtain the feature point matching result, according to which the disparity information of each feature point is determined, thereby determining the position information of the target.
  • the system can flexibly select the target baseline camera group for data collection according to the target to be measured, avoid the problem of limited ranging range caused by fixed baseline multi-eye cameras, and improve the ranging range of the target positioning system.
• the system determines the location information of the target according to the parallax information of the feature points, and does not need to perform matching and parallax calculation for each pixel in the first image area and the second image area, thereby reducing the computational resources required for positioning and ranging, avoiding problems such as background interference and noise, and improving the accuracy of ranging and positioning.
  • FIG. 15 is a schematic structural diagram of a computing device 900 provided by the present application, and the computing device 900 may be the target positioning system 110 in the foregoing content.
  • the computing device 900 includes a processor 910 , a communication interface 920 and a memory 930 .
  • the processor 910, the communication interface 920 and the memory 930 can be connected to each other through the internal bus 940, and can also communicate through other means such as wireless transmission.
  • the embodiment of the present application takes the connection through the bus 940 as an example, and the bus 940 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus or the like.
  • the bus 940 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is shown in Figure 15, but it does not mean that there is only one bus or one type of bus.
  • the processor 910 may be composed of at least one general-purpose processor, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (generic array logic, GAL) or any combination thereof.
  • Processor 910 executes various types of digitally stored instructions, such as software or firmware programs stored in memory 930, which enable computing device 900 to provide various services.
  • the memory 930 is used for storing program codes, and is controlled and executed by the processor 910 to execute the processing steps of the target positioning system in the above-mentioned embodiment.
• the program code may include one or more software modules, and the one or more software modules may be the software modules provided in the embodiment of FIG. 14, such as the acquisition unit, the detection and matching unit and the position determination unit, wherein the acquisition unit is used to acquire the first image and the second image, the detection and matching unit is used to input the first image and the second image into the detection and matching model to obtain the first target area and the second target area and then perform feature point detection and matching on the first target area and the second target area to obtain the feature point matching result, and the position determination unit is used to determine the position information of the target according to the feature point matching result and the parameter information of the multi-camera. Specifically, the program code can be used to execute steps S310 to S350 in the embodiment of FIG. 6 and their optional steps, and can also be used to implement other functions of the target positioning system 110 described in the embodiments of FIG. 1 to FIG. 13, which will not be repeated here.
• It should be noted that this embodiment can be implemented by a general physical server, for example, an ARM server or an X86 server, or can be implemented by a virtual machine based on a general physical server combined with NFV technology, where the virtual machine is a complete computer system that is simulated by software, has complete hardware system functions and runs in a completely isolated environment, which is not specifically limited in this application.
• The memory 930 may include volatile memory, such as random access memory (RAM); the memory 930 may also include non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD) or solid-state drive (SSD); the memory 930 may also include a combination of the above types.
  • the memory 930 may store program codes, and may specifically include program codes for executing other steps described in the embodiments of FIG. 1 to FIG. 13 , which will not be repeated here.
• the communication interface 920 may be a wired interface (such as an Ethernet interface), an internal interface (such as a peripheral component interconnect express (PCIe) bus interface), or a wireless interface (such as a cellular network interface or a wireless local area network interface), and is used to communicate with other devices or modules.
  • FIG. 15 is only a possible implementation manner of the embodiment of the present application.
• the computing device 900 may further include more or fewer components, which is not limited here.
  • the computing device shown in FIG. 15 may also be a computer cluster composed of at least one server, which is not specifically limited in this application.
• Embodiments of the present application further provide a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a processor, the method flow shown in FIG. 1 to FIG. 13 is implemented.
  • the embodiment of the present application further provides a computer program product, when the computer program product runs on the processor, the method flow shown in FIG. 1-FIG. 13 is realized.
• the above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions; when the computer program instructions are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are produced.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or data center, that contains one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), or semiconductor media.
  • The semiconductor media may be solid-state drives (SSDs).

Abstract

A target positioning method and system, and a related device. The method comprises the following steps: acquiring a first image and a second image; performing target detection and matching on the first image and the second image, so as to obtain a first target area of the first image and a second target area of the second image; performing feature point detection and matching on the first target area and the second target area, so as to obtain a feature point matching result; and determining position information of a target according to the feature point matching result. By means of the method, feature point detection and matching are performed on a target area where a target is located, and parallax information of the target is determined according to a pixel difference between feature points, without the need to match each pixel in a first target area and a second target area, such that computing resources required for target positioning are reduced, and a background image is prevented from interfering with target parallax computation, thereby improving the accuracy of parallax computation, and improving the precision of distance measurement and positioning.

Description

A target positioning method, system, and related device
This application claims priority to Chinese Patent Application No. 202011638235.5, filed with the China National Intellectual Property Administration on December 31, 2020 and entitled "A Data Processing Method, System and Device", and to Chinese Patent Application No. 202110567480.X, filed with the China National Intellectual Property Administration on May 24, 2021 and entitled "A Target Positioning Method, System and Related Device", both of which are incorporated herein by reference in their entireties.
Technical Field
The present application relates to the field of artificial intelligence (AI), and in particular, to a target positioning method, system, and related device.
Background
With the continuous development of AI technology, stereo vision algorithms are now widely used in fields such as intelligent security, autonomous driving, industrial inspection, 3D reconstruction, and virtual reality, demonstrating strong technical competitiveness. A stereo vision algorithm usually uses a multi-camera to photograph a target and obtain multiple images of it, and then determines the target's parallax from those images. Parallax is the difference in apparent direction of the same target observed from two viewpoints separated by a certain distance; from the distance between the cameras (that is, the baseline length) and the parallax, the distance between the target and the cameras can be calculated.
However, when current stereo vision algorithms determine the distance between a target and the cameras, the target is not a single point in the multiple images but an image region, so the parallax of every pixel in the region must be determined, and the distance between the target and the cameras is determined from the per-pixel parallax and the baseline length of the multi-camera. This process not only consumes enormous computing resources but is also prone to noise and calculation errors, resulting in poor target positioning accuracy, which in turn affects subsequent applications such as 3D reconstruction, autonomous driving, and security monitoring.
Summary of the Invention
The present application provides a target positioning method, system, and related device, which are used to solve the problems that the target positioning process consumes enormous resources and that target positioning accuracy is poor.
According to a first aspect, a target positioning method is provided. The method includes the following steps: acquiring a first image and a second image, where the first image and the second image are obtained by a multi-camera photographing the same target at the same moment; performing target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, where the first target area and the second target area include the target; performing feature point detection and matching on the first target area and the second target area to obtain a feature point matching result, where the feature point matching result includes correspondences between feature points in the first target area and feature points in the second target area, and corresponding feature points describe the same feature of the target; and determining position information of the target according to the feature point matching result and parameter information of the multi-camera.
In a specific implementation, the parameter information includes at least the baseline length and the focal length of the multi-camera. The parallax information of the target can be obtained from the pixel differences between corresponding feature points in the feature point matching result, where the parallax information includes the difference between the pixel coordinates of a feature point in the first target area and the pixel coordinates of the corresponding feature point in the second target area. The distance between the target and the cameras is then determined from the target's parallax information, the baseline length, and the focal length, yielding the position information of the target.
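As a hedged illustration of this computation, the Python sketch below aggregates per-feature-point disparities into a single target distance. The median aggregation and the keypoint data layout are assumptions chosen for illustration; the patent does not prescribe them.

```python
import statistics

def target_distance(matches, baseline_m, focal_px):
    """Estimate target distance from matched feature points.

    `matches` is a list of ((xL, yL), (xR, yR)) pixel-coordinate pairs,
    one per corresponding feature point (an assumed representation).
    """
    # Per-feature-point horizontal parallax d = xL - xR.
    disparities = [xl - xr for (xl, _), (xr, _) in matches]
    # Aggregate with the median so a few mismatched points do not dominate.
    d = statistics.median(disparities)
    if d <= 0:
        raise ValueError("non-positive disparity: check rectification/matching")
    # Depth from the standard stereo relation z = b * f / d.
    return baseline_m * focal_px / d
```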
By implementing the method described in the first aspect, the parallax information of the target is determined from the feature point matching result, and the position information of the target is then determined, without matching every pixel in the first image area and the second image area or computing per-pixel parallax. This reduces the computing resources required for positioning and ranging while avoiding problems such as background interference and noise, thereby improving ranging and positioning accuracy.
In a possible implementation of the first aspect, the multi-camera includes multiple camera groups, and each camera group includes multiple cameras. Based on this, baseline data of the multi-camera can be acquired, where the baseline data includes the baseline lengths between the cameras in each group. A target baseline is obtained from the baseline data according to the measurement accuracy requirement of the target, and the first image and the second image are then acquired according to the target baseline, where the first image and the second image are captured by the camera group corresponding to the target baseline.
For example, if the multi-camera includes N cameras, where N is a positive integer and the cameras are numbered 1, 2, ..., N, every two cameras can be combined into a binocular camera group with a corresponding baseline length. For example, the baseline of the binocular group formed by camera 1 and camera N is BL1, the baseline of the group formed by camera 1 and camera N-1 is BL2, and so on, giving C(N,2) = N(N-1)/2 binocular camera groups; the baseline data therefore includes N(N-1)/2 baseline lengths. The above example is for illustration only and does not limit the number of cameras in the multi-camera or in each camera group.
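The pair enumeration can be made concrete with a short sketch. The one-dimensional camera positions (cameras mounted along a single rail) are an assumption for illustration only.

```python
from itertools import combinations

def enumerate_baselines(camera_positions_m):
    """List every binocular pairing and its baseline length.

    `camera_positions_m` maps a camera id to its position in meters
    along the mounting rail (an assumed layout).
    """
    pairs = {}
    for a, b in combinations(sorted(camera_positions_m), 2):
        pairs[(a, b)] = abs(camera_positions_m[a] - camera_positions_m[b])
    return pairs  # len(pairs) == N * (N - 1) // 2

# Example: 4 cameras on one rail -> 6 candidate baselines.
print(enumerate_baselines({1: 0.0, 2: 0.1, 3: 0.25, 4: 0.6}))
```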
In a specific implementation, the target baseline can be determined according to the measurement accuracy requirement of the target. Specifically, a first accuracy index and a second accuracy index of each camera group can first be determined, where the first accuracy index is inversely proportional to the baseline length of the group and directly proportional to its common view area, the second accuracy index is directly proportional to the baseline length and the focal length of the group, and the common view area is the area captured jointly by the cameras in the group. Weights of the first accuracy index and the second accuracy index are then determined according to the measurement accuracy requirement of the target, a comprehensive index of each camera group is obtained from the two accuracy indices and the weights, and the target baseline is determined from the comprehensive index of each camera group.
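To make the weighted combination concrete, the following sketch scores candidate camera groups. The functional forms inside (and the omission of normalization) are illustrative assumptions consistent only with the stated proportionalities; the patent does not prescribe concrete formulas.

```python
def pick_target_baseline(groups, w1, w2):
    """Score each camera group and return the baseline of the best one.

    Each entry of `groups` is (baseline_m, focal_px, common_view_m2),
    an assumed representation for illustration.
    """
    best_score, best_baseline = float("-inf"), None
    for baseline, focal, common_view in groups:
        index1 = common_view / baseline    # favors wide common view, short baseline
        index2 = baseline * focal          # favors long baseline and focal length
        score = w1 * index1 + w2 * index2  # weights set by the accuracy requirement
        if score > best_score:
            best_score, best_baseline = score, baseline
    return best_baseline

# A near target weights the common view heavily (large w1); a far target
# weights ranging accuracy (large w2). All values here are illustrative.
print(pick_target_baseline([(0.1, 1200, 40.0), (0.4, 1200, 12.0)], w1=1.0, w2=0.01))
```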
It should be understood that a multi-camera with a fixed baseline also has a fixed ranging range if ranging accuracy is to be guaranteed. The closer the target is to the cameras, the more the common view area of the multi-camera approaches zero, where the common view area is the area that the cameras of the multi-camera can capture simultaneously; in that case the target may have no imaging point in some of the cameras, and its parallax cannot be computed. Conversely, the farther the target is from the cameras, the blurrier the regions of the target in the first image and the second image become, which affects the parallax calculation; a fixed-baseline multi-camera therefore has a fixed ranging range. Furthermore, the baseline length and the common view area of the multi-camera both affect parallax measurement accuracy: the longer the baseline, the higher the ranging accuracy, but the common view area gradually shrinks as the baseline grows, and the target may fall outside it. Therefore, the target baseline can be determined according to the size of the common view area of the multi-camera and the measurement accuracy requirement of the target to be measured.
The measurement accuracy requirement of the target to be measured may include the approximate distance between the target and the multi-camera, in other words, whether the target is a far target or a near target. Whether the target is far or near can be determined from the size of the image region the target occupies in the images collected by the multi-camera: a far target occupies a very small image region, while a near target occupies a very large one. Therefore, when the image region is larger than a first threshold, the target can be determined to be a near target, and when the image region is smaller than a second threshold, it can be determined to be a far target. The measurement accuracy requirement may also include a measurement error threshold for the target, for example, that the measurement error is no more than 1 meter. It should be understood that the above examples are for illustration and are not specifically limited in the present application.
The baseline data collected for the multi-camera includes not only the baseline length between the cameras in each camera group but may also include the size of the common view area between the cameras, where the common view area can be determined from the shooting range of each camera in the group, the shooting range being the geographic area recorded in the images captured by that camera. In a specific implementation, the outermost edge points visible in the video picture of each video stream can be determined, the pixel coordinates of each edge point obtained and converted into geographic coordinates through a camera calibration algorithm, and the shooting range of that stream determined from the region enclosed by these geographic coordinates, from which the size of the common view area is obtained.
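As one possible (assumed) realization of the pixel-to-geographic conversion and common-view computation, calibration can be summarized by an image-to-ground homography per camera; this concrete form, and the shapely dependency, are illustrative choices not prescribed by the patent.

```python
import numpy as np
import cv2
from shapely.geometry import Polygon

def footprint(homography, width, height):
    """Project a camera's image corners to ground-plane coordinates."""
    corners = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
    ground = cv2.perspectiveTransform(corners.reshape(-1, 1, 2), homography)
    return Polygon(ground.reshape(-1, 2))

def common_view_area(h1, h2, width, height):
    # The common view area is the intersection of the two camera footprints.
    return footprint(h1, width, height).intersection(footprint(h2, width, height)).area
```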
Optionally, a baseline adjustment request carrying the target baseline can be sent to the multi-camera, the baseline adjustment request instructing the multi-camera to adjust the baseline length of one of its camera groups to the target baseline; the first image and the second image captured by the camera group corresponding to the target baseline are then received.
In the above implementation, the target baseline is determined according to the size of the common view area of the multi-camera and the measurement accuracy requirement of the target to be measured. This improves measurement accuracy as much as possible while ensuring that the target is within the shooting range of the binocular camera corresponding to the target baseline, solves the problem that a fixed-baseline multi-camera can only range targets within a fixed distance range, and expands the ranging range of the ranging and positioning system provided by the present application.
In a possible implementation of the first aspect, the multi-camera can photograph the target to obtain a first video and a second video. After the first video and the second video are received, time synchronization processing can be performed on them to obtain the first image and the second image at the same moment, where the first image is an image frame in the first video and the second image is an image frame in the second video.
In a specific implementation, a reference frame can be obtained from the first video and multiple motion frames from the second video, where the reference frame and the motion frames include moving objects. The reference frame is then matched with the motion frames by feature points to obtain a synchronization frame among the motion frames, where the parallelism of the lines connecting feature points in the synchronization frame to the corresponding feature points in the reference frame satisfies a preset condition. Time synchronization correction is then performed on the first video and the second video according to the reference frame and the synchronization frame to obtain the first image and the second image at the same moment. Satisfying the preset condition may mean selecting the frame with the highest parallelism between the connecting lines as the synchronization frame.
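For illustration, the frame-selection rule can be sketched as follows. Scoring "parallelism" by the mean magnitude of the per-point row differences Δs (introduced later in this description) is an assumption, and the function and data layout are hypothetical.

```python
import numpy as np

def pick_sync_frame(ref_rows, candidate_rows_per_frame):
    """Pick the motion frame best synchronized with the reference frame.

    `ref_rows` holds the row coordinates of the reference frame's matched
    feature points; each entry of `candidate_rows_per_frame` holds the row
    coordinates of the corresponding points in one candidate motion frame.
    For synchronized frames the connecting lines are parallel and the row
    differences are all near zero, so we minimize their mean magnitude.
    """
    best_idx, best_score = None, float("inf")
    for idx, rows in enumerate(candidate_rows_per_frame):
        ds = np.abs(np.asarray(rows, float) - np.asarray(ref_rows, float))
        if ds.mean() < best_score:
            best_idx, best_score = idx, ds.mean()
    return best_idx
```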
It should be understood that, because the model, manufacturer, timestamp, and video frame rate of each camera in the multi-camera may differ, and because network transmission delay or poor camera computing performance can cause frame loss, it is difficult to guarantee time synchronization across the videos collected by multiple cameras. For example, suppose camera 1 and camera 2 monitor the same intersection, and camera 1 snapshots a vehicle running a red light at moment T1, so that the video frames of the real-time stream transmitted by camera 1 within 20 ms after T1 are lost, while camera 2 takes no snapshot and loses no frames. Then, after the target positioning system 110 receives the first video and the second video, the video collected by camera 2 runs 20 ms ahead of the video collected by camera 1 from moment T1 onward. If parallax were computed directly from the first video and the second video collected by camera 1 and camera 2, the resulting parallax information would contain errors, creating obstacles for subsequent applications such as ranging, positioning, and 3D reconstruction. Time-synchronizing the first video and the second video solves this problem.
In a specific implementation, the above reference frame and motion frames can be determined by an optical flow method, where optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane; when the time interval is very small, optical flow is also equivalent to the displacement of the moving object. Based on this, the procedure for determining the reference frame and the motion frames can be as follows: first, target detection of synchronization targets is performed on each frame of the first video and the second video to obtain one or more synchronization targets in each frame; then the optical flow of each synchronization target is determined by the optical flow method, and whether each synchronization target is a moving object is judged from its optical flow, thereby obtaining the motion frames containing moving objects and the reference frame containing the largest number of moving objects.
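As a hedged sketch of this step, OpenCV's Lucas-Kanade routine (the LK method named later in this description) can track each detected synchronization target between consecutive frames. The anchor-point representation and the 0.5-pixel threshold are illustrative assumptions.

```python
import numpy as np
import cv2

def moving_targets(prev_gray, cur_gray, target_points, min_row_motion=0.5):
    """Flag which detected targets moved between two frames via LK optical flow.

    `target_points` are (x, y) anchor points of detected synchronization
    targets (e.g. detection-box centers) in the previous frame.
    """
    pts = np.float32(target_points).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    moved = []
    for p0, p1, ok in zip(pts.reshape(-1, 2), nxt.reshape(-1, 2), status.ravel()):
        # Only motion along the image row (horizontal) direction contributes
        # to parallax, so purely vertical movers are treated as static.
        moved.append(bool(ok) and abs(p1[0] - p0[0]) > min_row_motion)
    return moved
```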
It is worth noting that, when target detection is performed on each frame, the detected synchronization target should be a target that can possibly move, not one that is certainly stationary, such as a building. The synchronization target may therefore be the target to be measured mentioned above, or another target; this is not specifically limited in the present application. For example, if the target to be measured is a utility pole, the synchronization target used for time synchronization can be a pedestrian or a vehicle; if the target to be measured is vehicle A, the synchronization targets can be vehicles and pedestrians. The above examples are for illustration and are not limited in this application.
It should be noted that the target detection algorithm in the embodiments of the present application may use any of the neural network models already available in the industry that perform well for target detection, for example, the one-stage unified real-time object detection (You Only Look Once: Unified, Real-Time Object Detection, YOLO) model, the Single Shot multibox Detector (SSD) model, the Region Convolutional Neural Network (RCNN) model, or the Fast Region Convolutional Neural Network (Fast-RCNN) model; this is not specifically limited in the present application. Likewise, the optical flow method in the embodiments of the present application may use any of the optical flow methods already available in the industry that perform well, for example, the Lucas-Kanade (LK) optical flow method; this is not specifically limited in the present application.
Optionally, after the optical flow (i.e., the instantaneous velocity) of each object in each frame is obtained, whether an object is a moving object can be judged by determining whether its velocity has a component along the image row direction. Specifically, since the cameras of the multi-camera (for example, the multi-camera shown in FIG. 1) are fixed at the same height, if an object moves along the row direction, its row coordinate changes; therefore, if the row coordinates of the same object X in motion frame Tn and in the previous frame Tn-1 (or the next frame Tn+1) are not equal, the object can be determined to be a moving object. It can be understood that a vertically moving object moves only in the column direction and has no velocity component along the row direction, so it contributes nothing to the parallax calculation; vertically moving objects can therefore be treated as non-moving objects and excluded from the parallax calculation, reducing the amount of computation and improving the accuracy and efficiency of the parallax calculation.
Further, when the reference frame is matched against a motion frame, feature point matching can be performed between the moving objects in the reference frame and those in the motion frame, and the difference Δs of the row coordinates of each pair of feature points calculated. The smaller Δs is, the closer the moving object in the reference frame is to the moving object in the motion frame, and the more parallel the lines connecting feature points in the reference frame to feature points in the motion frame. If Δs is 0, the two frames are synchronized; if it is not 0, the two frames are not synchronized. An accurate synchronization offset time Δt can therefore be calculated from Δs and used to compensate the row coordinates of every subsequent frame, yielding the synchronized first video and second video, and hence the first image and the second image at each moment.
Optionally, before the video synchronization processing is performed on the first video and the second video, stereo rectification can be performed on them. It should be understood that the formulas used to compute parallax are usually derived under the assumption that the multi-camera is in an ideal configuration, so before the multi-camera is used for ranging and positioning, the actual multi-camera must be rectified to the ideal state. Taking a binocular camera as an example, after stereo rectification the image planes of the left and right cameras are parallel, the optical axes are perpendicular to the image planes, and the epipoles lie at infinity; the epipolar line corresponding to a point (x0, y0) is then y = y0. In a specific implementation, the embodiments of the present application may adopt any of the stereo rectification methods already available in the industry that perform well, for example, the Bouguet epipolar rectification method; this is not specifically limited in the present application.
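For illustration, OpenCV's stereoRectify (commonly described as a Bouguet-style rectification) can realize this step; the parameter names below follow OpenCV's calibration outputs, and this is only one possible realization, not the patent's prescribed one.

```python
import cv2

def rectify_maps(K1, D1, K2, D2, image_size, R, T):
    """Build pixel remap tables that put a stereo pair into the ideal geometry.

    K: intrinsics, D: distortion coefficients, R/T: rotation and translation
    between the two cameras, as produced by stereo calibration.
    """
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)
    map1 = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map2 = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)
    return map1, map2  # apply with cv2.remap(frame, *map_i, cv2.INTER_LINEAR)
```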
In the above implementation, time synchronization processing is performed on the first video and the second video to obtain the first image and the second image, and the position information of the target is then determined from the first image and the second image. This improves the accuracy of the position information and hence the accuracy of subsequent applications such as AR, VR, and 3D reconstruction.
In a possible implementation of the first aspect, the first image can be input into a detection matching model to obtain a first detection matching result of the first image, and the second image can be input into the detection matching model to obtain a second detection matching result of the second image, where the first detection matching result and the second detection matching result include target boxes and labels, a target box indicates the region of a target in the image, and the same target carries the same label. The first target area is obtained from the first detection matching result, and the second target area from the second detection matching result.
The target box in a detection matching result may be a rectangular box, a circular box, an elliptical box, and so on, which is not specifically limited in this application. It should be understood that, if there are multiple targets to be measured, the detection matching result may include multiple target boxes for the multiple targets. In that case, the same target is identified with the same label and different targets with different labels, so that when parallax is computed, the same target can be recognized across the video frames of different streams by its label, thereby matching the feature points of the same target in the first image and the second image at the same moment and obtaining the parallax of that target.
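A minimal sketch of this label-based association follows; the dictionary-based detection-result format is a hypothetical representation, since the patent does not define one.

```python
def pair_target_areas(first_result, second_result):
    """Pair target areas across the two images by their shared labels."""
    boxes1 = {det["label"]: det["box"] for det in first_result}
    boxes2 = {det["label"]: det["box"] for det in second_result}
    # Only targets detected in both images can contribute to parallax.
    return {lbl: (boxes1[lbl], boxes2[lbl]) for lbl in boxes1.keys() & boxes2.keys()}

# Illustrative values: the same car carries the same label in both images.
pairs = pair_target_areas(
    [{"label": "car_0", "box": (120, 80, 260, 200)}],
    [{"label": "car_0", "box": (96, 80, 236, 200)}],
)
```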
Optionally, the detection matching model may include a feature extraction module and a detection matching module, where the feature extraction module extracts features from the input first image and second image to generate high-dimensional feature vectors, and the detection matching module 620 generates, from these feature vectors, detection matching results containing the target boxes and labels.
Optionally, before the first image and the second image are acquired, the detection matching model can be trained with a sample set, which may include first image samples, second image samples, and corresponding sample ground truths. The sample ground truth includes a target detection ground truth and a target matching ground truth, where the target detection ground truth includes the target boxes of the targets in the first and second image samples, and the target matching ground truth includes the labels of the targets in the first and second image samples. When the detection matching model is trained with this sample set, the detection matching loss used for backpropagation is determined from the gap between the output of the detection matching module and the sample ground truth, and the parameters of the detection matching model are adjusted according to this loss until the detection matching loss reaches a threshold, yielding the trained detection matching model.
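The combined loss can be sketched in PyTorch as follows. The smooth-L1 and cross-entropy terms and the weights are illustrative assumptions: the patent only states that the loss measures the gap between the module's output and the two ground truths.

```python
import torch
import torch.nn.functional as F

def detection_matching_loss(pred_boxes, true_boxes, pred_label_logits, true_labels,
                            w_det=1.0, w_match=1.0):
    """Combine a detection term and a matching term into one training loss."""
    det_loss = F.smooth_l1_loss(pred_boxes, true_boxes)           # box regression
    match_loss = F.cross_entropy(pred_label_logits, true_labels)  # label/identity
    return w_det * det_loss + w_match * match_loss
```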
In a specific implementation, the feature extraction module can be a neural network backbone used for extracting image features, such as VGG or ResNet, and the detection matching module can be a target detection network, such as a YOLO network, an SSD network, or an RCNN; this is not specifically limited in the present application.
In the above implementation, by marking the same target with the same label, after the first image and the second image are input into the detection matching model, the first target area and the second target area can be determined according to whether the labels are the same, rather than performing image recognition on the targets to determine the same target in the first image and the second image. This reduces computational complexity and improves the efficiency of obtaining the first and second target areas, and hence the efficiency of ranging and positioning.
According to a second aspect, a target positioning system is provided. The system includes: an acquisition unit, configured to acquire a first image and a second image, where the first image and the second image are obtained by a multi-camera photographing the same target at the same moment; a detection matching unit, configured to perform target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, where the first target area and the second target area include the target; the detection matching unit being further configured to perform feature point detection and matching on the first target area and the second target area to obtain a feature point matching result, where the feature point matching result includes correspondences between feature points in the first target area and feature points in the second target area, and corresponding feature points describe the same feature of the target; and a position determination unit, configured to determine position information of the target according to the feature point matching result and parameter information of the multi-camera.
By implementing the system described in the second aspect, a target baseline can be determined according to the target to be measured, the camera group with that target baseline collects the first image and the second image of the target, target detection and matching are performed on the first image and the second image to obtain the first and second target areas where the target is located, and finally feature point detection and matching are performed on the two target areas to obtain a feature point matching result, from which the parallax information of each feature point, and hence the position information of the target, is determined. The system can flexibly select the camera group with the target baseline for data collection according to the target to be measured, avoiding the limited ranging range caused by fixed-baseline multi-cameras and extending the ranging range of the target positioning system. At the same time, the system determines the position information of the target from the parallax information of the feature points, without matching every pixel in the first and second image areas or computing per-pixel parallax, which reduces the computing resources required for positioning and ranging while avoiding background interference and noise, improving ranging and positioning accuracy.
In a possible implementation of the second aspect, the parameter information includes at least the baseline length and the focal length of the multi-camera; the position determination unit is configured to obtain the parallax information of the target from the pixel differences between corresponding feature points in the feature point matching result, where the parallax information includes the difference between the pixel coordinates of feature points in the first target area and the pixel coordinates of the corresponding feature points in the second target area; and the position determination unit is configured to determine the distance between the target and the cameras from the target's parallax information, the baseline length, and the focal length, obtaining the position information of the target.
In a possible implementation of the second aspect, the multi-camera includes multiple camera groups, each camera group includes multiple cameras, and the system further includes a baseline determination unit, configured to acquire baseline data of the multi-camera, where the baseline data includes the baseline lengths between the cameras in each group, and to obtain the target baseline from the baseline data according to the measurement accuracy requirement of the target; the acquisition unit is configured to acquire the first image and the second image according to the target baseline, where the first image and the second image are captured by the camera group corresponding to the target baseline.
In a possible implementation of the second aspect, the baseline determination unit is configured to send to the multi-camera a baseline adjustment request carrying the target baseline, the baseline adjustment request instructing the multi-camera to adjust the baseline length of one of its camera groups to the target baseline; and the acquisition unit is configured to receive the first image and the second image captured by the camera group corresponding to the target baseline.
In a possible implementation of the second aspect, the baseline determination unit is configured to determine a first accuracy index and a second accuracy index of each camera group, where the first accuracy index is inversely proportional to the baseline length of the group and directly proportional to its common view area, the second accuracy index is directly proportional to the baseline length and the focal length of the group, and the common view area is the area captured jointly by the cameras in the group; to determine the weights of the first and second accuracy indices according to the measurement accuracy requirement of the target; to obtain the comprehensive index of each camera group from the first accuracy index, the second accuracy index, and the weights; and to determine the target baseline from the comprehensive index of each camera group.
In a possible implementation of the second aspect, the system further includes a synchronization unit, configured to receive the first video and the second video obtained by the multi-camera photographing the target, and to perform time synchronization processing on the first video and the second video to obtain the first image and the second image at the same moment, where the first image is an image frame in the first video and the second image is an image frame in the second video.
In a possible implementation of the second aspect, the synchronization unit is configured to obtain a reference frame from the first video and multiple motion frames from the second video, where the reference frame and the motion frames include moving objects; to match the reference frame with the motion frames by feature points to obtain a synchronization frame among the motion frames, where the parallelism of the lines connecting feature points in the synchronization frame to the corresponding feature points in the reference frame satisfies a preset condition; and to perform time synchronization correction on the first video and the second video according to the reference frame and the synchronization frame to obtain the first image and the second image at the same moment.
In a possible implementation of the second aspect, the detection matching unit is configured to input the first image into a detection matching model to obtain a first detection matching result of the first image, and to input the second image into the detection matching model to obtain a second detection matching result of the second image, where the first and second detection matching results include target boxes and labels, a target box indicates the region of a target in the image, and the same target carries the same label; the detection matching unit is configured to obtain the first target area from the first detection matching result and the second target area from the second detection matching result.
According to a third aspect, a computer program product is provided, including a computer program that, when read and executed by a computing device, implements the method described in the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided, including instructions that, when run on a computing device, cause the computing device to implement the method described in the first aspect.
According to a fifth aspect, a computing device is provided, including a processor and a memory, where the processor executes code in the memory to implement the method described in the first aspect.
Description of Drawings
FIG. 1 is a schematic diagram of an imaging structure based on a binocular camera;
FIG. 2 is an architecture diagram of a stereo vision system provided by the present application;
FIG. 3 is a schematic diagram of the deployment of a target positioning system provided by the present application;
FIG. 4 is a schematic diagram of the ranging errors of binocular cameras with different baselines for a fixed target, provided by the present application;
FIG. 5 is an imaging schematic diagram of a binocular camera with an overlong baseline when photographing a target point;
FIG. 6 is a schematic flowchart of the steps of a target positioning method provided by the present application;
FIG. 7 is a flowchart of the steps of time-synchronizing the first video and the second video, provided by the present application;
FIG. 8 is an example diagram of a target detection matching result provided by the present application;
FIG. 9 is a schematic structural diagram of a target detection model provided by the present application;
FIG. 10 is a schematic flowchart of feature point detection and matching provided by the present application;
FIG. 11 is a schematic diagram of a feature point matching result of a target positioning method provided by the present application in one application scenario;
FIG. 12 is a schematic diagram of a textureless object provided by the present application;
FIG. 13 is a schematic flowchart of the steps of determining the first target area and the second target area in an occlusion scenario, provided by the present application;
FIG. 14 is a schematic structural diagram of a target positioning system provided by the present application;
FIG. 15 is a schematic structural diagram of a computing device provided by the present application.
Detailed Description
The terms used in the embodiments of the present application are only used to explain specific embodiments of the present application and are not intended to limit the present application.
The application scenarios involved in this application are described below.
Stereo vision positioning refers to determining the position, in three-dimensional world space, of an object in an image from the video or image information acquired by image sensors. The video information collected by image sensors can be analyzed to realize target coordinate positioning, target ranging, 3D reconstruction, and so on, and the results fed back to a terminal or a cloud processor to serve richer applications, such as intelligent security, autonomous driving, industrial inspection, intelligent transportation, AR, VR, ADAS, and medicine.
Stereo vision positioning usually uses a binocular camera to photograph the target and obtain multiple videos of it, and first determines the parallax from the multiple videos, where parallax is the difference in apparent direction of the same target observed from two observation points separated by a certain distance. From the distance between the cameras (that is, the baseline length) and the parallax, the distance between the target and the cameras can be calculated, and the precise position of the target in three-dimensional world space obtained. In a specific implementation, the parallax may be the difference in the pixel coordinates of the target in the images captured by different cameras. For example, assume the binocular camera includes a left camera and a right camera; if the pixel coordinates of target X in the picture captured by the left camera are (x, y) and in the picture captured by the right camera are (x+d, y), then d is the parallax of target X in the horizontal direction. It should be understood that the above example is for illustration and is not specifically limited in the present application.
The above method of determining the distance between the target and the cameras from the baseline length and the parallax is illustrated below with reference to FIG. 1, which is a schematic diagram of an imaging structure based on a binocular camera. The camera parameters of the two cameras of the binocular camera, such as the focal length and the length of the imaging plane, are identical. P is the target point to be detected, O_L is the optical center of the left camera of the binocular camera, O_R is the optical center of the right camera, line segment AB is the imaging plane of the left camera, line segment CD is the imaging plane of the right camera, line segment O_L O_R is the baseline of the binocular camera with length b, and the distance between imaging plane AB and optical center O_L is the focal length f of the binocular camera.
The point P_L on imaging plane AB is the image of target point P in the left camera, and the point P_R on imaging plane CD is its image in the right camera. The distance X_L between P_L and point A at the leftmost edge of imaging plane AB is the image abscissa of target point P in the image captured by the left camera, and the distance X_R between P_R and point C at the leftmost edge of imaging plane CD is the image abscissa of target point P in the image captured by the right camera; the parallax of the target point P to be detected is therefore (X_L - X_R).
Since the imaging planes AB and CD are parallel to the baseline O_L O_R, triangle P O_L O_R is similar to triangle P P_L P_R. Let the perpendicular distance between target P and the baseline of the two cameras be z. By the similar-triangle theorem, the following equation is obtained:
\[
\frac{P_L P_R}{O_L O_R} = \frac{z - f}{z}
\]
Since the parameters of the two cameras of the binocular camera are identical, CD = AB, so the following equations are obtained:
\[
\frac{b - (X_L - X_R)}{b} = \frac{z - f}{z}
\]

\[
z = \frac{b \cdot f}{X_L - X_R}
\]
where X_L - X_R is the parallax, b is the baseline length, and f is the focal length of the camera; therefore, the distance z between the target and the cameras can be obtained from the parallax, the baseline length, and the focal length of the multi-camera. It should be understood that the derivation shown in FIG. 1 is for illustration; the present application does not limit the specific algorithm for determining the distance between the target and the cameras from the parallax.
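As a quick numeric check of this relation (the values are illustrative only, not from the patent): with a baseline b = 0.2 m, a focal length f = 1200 pixels, and a measured parallax X_L - X_R = 48 pixels, the distance is z = (0.2 x 1200) / 48 = 5 m.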
However, when current stereo vision algorithms determine the distance between the target and the cameras, the target is not a single point in the multiple images but an image region, so the parallax of every pixel in the region must be determined, and the distance between the target and the cameras is determined from the per-pixel parallax and the baseline length of the multi-camera. This process not only consumes enormous computing resources, but the large volume of computation is also prone to noise, calculation errors, background interference, and similar problems, so ranging and positioning accuracy cannot be guaranteed, which in turn affects subsequent applications such as 3D reconstruction, autonomous driving, and security monitoring.
To solve the problems that the above stereo vision algorithms consume enormous computing resources for ranging and positioning and deliver poor accuracy, the present application provides a target positioning system that can flexibly set the baseline length of the multi-camera according to the target to be measured, thereby solving the problem of a small ranging range. By performing target detection and matching on the first image and the second image captured by the multi-camera, the target areas where the target is located in the two images are determined; feature point detection and matching are then performed on those target areas, and the parallax information of the target is determined from the pixel differences between the feature points in each target area and the feature points in the other target area, without matching every pixel in the target areas. This reduces the computing resources required for ranging and positioning, avoids interference from the background image in the target parallax calculation, improves the accuracy of the parallax calculation, and improves ranging and positioning accuracy.
FIG. 2 is a system architecture diagram of an embodiment of the present application. As shown in FIG. 2, the system architecture for stereo vision positioning provided by the present application may include a target positioning system 110, a multi-camera 120, and an application server 130. The target positioning system 110 and the multi-camera 120 are connected through a network, and the application server 130 and the target positioning system 110 are connected through a network; the network may be a wireless network or a wired network, which is not specifically limited in this application.
The multi-camera 120 includes multiple camera groups, and each of the camera groups includes multiple cameras. For example, the multi-camera 120 includes N cameras, where N is a positive integer and the cameras are numbered 1, 2, ..., N. Every two cameras can be combined into a binocular camera group with a corresponding baseline length; for example, the baseline of the binocular camera group composed of camera 1 and camera N is BL1, the baseline of the binocular camera group composed of camera 1 and camera N-1 is BL2, and so on, so that N(N-1)/2 binocular camera groups can be obtained. It should be understood that the above example is for illustration and is not specifically limited in this application.
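As a concrete illustration of the pairwise combination described above, the following sketch enumerates the N(N-1)/2 binocular camera groups of an N-camera rig; the camera positions are invented for the example.

```python
from itertools import combinations

# Camera id -> position along the rig in meters (assumed example values).
camera_positions_m = {1: 0.0, 2: 0.1, 3: 0.3, 4: 0.7}

# Each unordered camera pair forms one binocular group with its own baseline.
baseline_data = {
    (i, j): abs(camera_positions_m[j] - camera_positions_m[i])
    for i, j in combinations(sorted(camera_positions_m), 2)
}
print(len(baseline_data))     # N = 4 cameras -> 6 binocular camera groups
print(baseline_data[(1, 4)])  # the longest available baseline, 0.7 m
```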
The multi-camera 120 is configured to send baseline data to the target positioning system 110, where the baseline data includes the baseline lengths between the multiple cameras in each camera group. Still taking the above example, the multi-camera 120 includes N(N-1)/2 binocular camera groups, so the baseline data may include the baseline lengths of those N(N-1)/2 binocular camera groups. The multi-camera 120 is further configured to receive the target baseline sent by the target positioning system 110, and to send the first image and the second image collected by the camera group corresponding to the target baseline to the target positioning system 110. The first image and the second image are images obtained by photographing the same target at the same moment. The target baseline is determined by the target positioning system 110 according to the measurement accuracy requirement of the target.
In a specific implementation, the multi-camera 120 may receive a baseline adjustment request sent by the target positioning system 110, where the baseline adjustment request carries the above target baseline. According to the baseline adjustment request, the multi-camera 120 may obtain the first image and the second image captured by the camera group corresponding to the target baseline, and then send them to the target positioning system 110.
It can be understood that, when a stereo vision algorithm is used to locate a target, the baseline length of the multi-camera is often fixed. With a fixed baseline, the farther the target, the worse the ranging accuracy, and the range of ranging and positioning is therefore also limited. Accordingly, the target positioning system determines the baseline length of the multi-camera 120 according to the measurement accuracy requirement of the target: for example, a binocular camera group with a longer baseline can be used when ranging a long-distance target, and a binocular camera group with a shorter baseline can be used when ranging and positioning a close-range target, thereby expanding the range of ranging and positioning and solving the problem of a small ranging range.
Optionally, the multi-camera 120 is further configured to photograph the target with the camera group corresponding to the target baseline to obtain multiple videos, such as a first video and a second video, and then send the first video and the second video to the target positioning system 110, where the first video and the second video contain the above first image and second image; the target positioning system 110 may obtain the first image and the second image after performing time synchronization processing on the first video and the second video.
It is worth noting that the first video and the second video may be videos collected by the multi-camera 120 in real time, or may be cached historical videos. For example, the multi-camera 120 includes 10 cameras located at the entrance of a residential community; after each camera collects surveillance video of the entrance from 8 a.m. to 8 p.m., the video may be transmitted to the target positioning system 110 at 9 p.m. as the first video and the second video for processing. Alternatively, each camera may collect surveillance video of the entrance in real time and transmit it to the target positioning system 110 through the network in real time for processing, which is not specifically limited in this application.
Optionally, when the target is a stationary target, the cameras in the above multi-camera 120 may also be a single movable monocular camera. Simply put, the camera system includes only one camera mounted on a slidable support rod; by means of the sliding support rod, the camera can collect the first video and the second video of the target from different angles, and the distance the camera moves along the sliding support rod is the above target baseline. It should be understood that the multi-camera 120 may also adopt other architectures capable of collecting the first video and the second video of the same target, which is not specifically limited in this application.
The application server 130 may be a single server or a server cluster composed of multiple servers. A server may be implemented by a general-purpose physical server, for example an ARM server or an X86 server, or may be a virtual machine (VM) implemented with network functions virtualization (NFV) technology, such as a virtual machine in a data center, which is not specifically limited in this application. The application server 130 is configured to implement functions such as 3D reconstruction, industrial inspection, intelligent security, AR, VR and autonomous driving according to the position information sent by the target positioning system 110.
The target positioning system 110 is configured to receive the first image and the second image sent by the multi-camera 120, and to perform target detection and matching on the first image and the second image to obtain a first target region of the first image and a second target region of the second image, where the first target region and the second target region include the above target to be measured. Feature point detection and matching are then performed on the first target region and the second target region to obtain a feature point matching result, where the feature point matching result includes the correspondence between feature points in the first target region and feature points in the second target region, and feature points for which the correspondence exists describe the same feature of the target. Finally, the disparity information of the target can be determined from the feature point matching result and the parameter information of the multi-camera, and the position information of the target is then determined in combination with formula 3 and sent to the above application server 130, so that the application server 130 can implement functions such as 3D reconstruction, AR, VR and autonomous driving according to the position information.
Optionally, the target positioning system 110 may also receive the above baseline data sent by the multi-camera 120, obtain the target baseline from the baseline data according to the measurement accuracy requirement of the target to be measured, and then send it to the multi-camera 120, so that the multi-camera 120 uses the camera group corresponding to the target baseline to capture the target and obtain the first image and the second image.
Optionally, after the multi-camera 120 captures the target with the cameras corresponding to the target baseline, a first video and a second video may also be obtained; after receiving the first video and the second video, the target positioning system 110 may perform time synchronization processing on them to obtain the above first image and second image.
In a specific implementation, the target positioning system 110 provided by this application can be deployed flexibly. It may be deployed in an edge environment, specifically as an edge computing device in the edge environment or as a software system running on one or more edge computing devices. The edge environment refers to a cluster of edge computing devices that are geographically close to the above multi-camera 120 and provide computing, storage and communication resources, such as edge computing all-in-one machines located on both sides of a road. For example, the target positioning system 110 may be one or more edge computing devices close to an intersection, or a software system running on edge computing devices close to the intersection, where camera 1, camera 2 and camera 3 are installed to monitor the intersection. The edge computing device may determine, according to the target to be measured, that the most suitable baseline is BL3, the cameras satisfying baseline BL3 including camera 1 and camera 3. The edge computing device may perform time synchronization processing on the first video and the second video collected by camera 1 and camera 3 to obtain the first image and the second image at the same moment; perform target detection on the first image and the second image to obtain the first target region and the second target region, both of which include the target to be measured; then perform feature point detection and matching on the first target region and the second target region to obtain a feature point matching result; determine the disparity information of the target from the feature point matching result and the parameter information of camera 1 and camera 3; determine the position information of the target in combination with formula 3; and send the position information to the application server 130, so that the application server can implement functions such as 3D reconstruction, AR, VR and autonomous driving according to the position information.
The target positioning system 110 may also be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users under the cloud computing model. The cloud environment includes a cloud data center and a cloud service platform, where the cloud data center includes a large number of basic resources (including computing, storage and network resources) owned by the cloud service provider. The target positioning system 110 may be a server in the cloud data center, a virtual machine created in the cloud data center, or a software system deployed on servers or virtual machines in the cloud data center; the software system may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or across both virtual machines and servers. For example, the target positioning system 110 may be deployed in a cloud data center far from an intersection where camera 1, camera 2 and camera 3 are installed for monitoring. The cloud data center may determine that the most suitable baseline for the target to be measured is BL3, the cameras satisfying baseline BL3 including camera 1 and camera 3; perform time synchronization processing on the first video and the second video collected by camera 1 and camera 3 to obtain the first image and the second image at the same moment; perform target detection on the first image and the second image to obtain the first target region and the second target region, both of which include the target to be measured; then perform feature point detection and matching on the first target region and the second target region to obtain a feature point matching result; determine the disparity information of the target from the feature point matching result and the parameter information of camera 1 and camera 3; determine the position information of the target in combination with formula 3; and send the position information to the application server 130, so that the application server can implement functions such as 3D reconstruction, AR, VR and autonomous driving according to the position information.
The target positioning system 110 may also be deployed partly in the edge environment and partly in the cloud environment; for example, the edge computing device is responsible for determining the baseline of the binocular camera group according to the target to be measured, and the cloud data center determines the disparity information from the first video and the second video collected by that binocular camera group. For example, as shown in FIG. 3, camera 1, camera 2 and camera 3 are installed at the intersection to monitor it. The edge computing device may determine that the most suitable baseline for the target to be measured is BL3, the cameras satisfying baseline BL3 including camera 1 and camera 3; the edge computing device may also perform time synchronization processing on the first video and the second video collected by camera 1 and camera 3 to obtain the first image and the second image at the same moment, and send them to the cloud data center. The cloud data center may perform target detection on the first image and the second image to obtain the first target region and the second target region, both of which include the target to be measured; then perform feature point detection and matching on the first target region and the second target region to obtain a feature point matching result; determine the disparity information of the target from the feature point matching result and the parameter information of camera 1 and camera 3; determine the position information of the target in combination with formula 3; and send the position information to the application server 130, so that the application server can implement functions such as 3D reconstruction, AR, VR and autonomous driving according to the position information.
It should be understood that the unit modules inside the target positioning system 110 can also be divided in multiple ways; each module may be a software module, a hardware module, or partly software and partly hardware, which is not limited in this application. FIG. 2 shows one exemplary division: as shown in FIG. 2, the target positioning system 110 may include a baseline determination unit 111, a synchronization unit 112 and a detection and matching unit 113. It should be noted that, since the target positioning system 110 is deployed flexibly, the modules of the target positioning system may be deployed on the same edge computing device, in the same cloud data center, or on the same physical machine; of course, they may also be deployed partly on edge computing devices and partly in a cloud data center, for example with the baseline determination unit 111 on an edge computing device and the synchronization unit 112 and detection and matching unit 113 in a cloud data center, which is not specifically limited in this application.
The baseline determination unit 111 is configured to receive the baseline data sent by the multi-camera 120, determine the target baseline according to the measurement accuracy requirement of the target to be measured, and send it to the multi-camera 120. It should be understood that the baseline length and the common view region of the multi-camera have a certain influence on the measurement accuracy of the disparity, where the common view region refers to the region that the cameras of the multi-camera can capture simultaneously.
It should be understood that a multi-camera with a fixed baseline also has a fixed ranging range if the ranging accuracy is to be guaranteed. This is because, the closer the target is to the camera, the more the common view region of the multi-camera approaches zero, and individual cameras of the multi-camera may have no imaging point of the target, so the disparity of the target cannot be calculated; conversely, the farther the target is from the camera, the more blurred the region of the target in the first image and the second image becomes, which affects the disparity calculation. A multi-camera with a fixed baseline therefore has a fixed ranging range. In the system provided by this application, the baseline determination unit 111 can flexibly determine the target baseline according to the measurement accuracy requirement of the target to be measured, thereby expanding the ranging range of the ranging and positioning system provided by this application.
The following briefly describes several factors that affect the ranging and positioning error of a multi-camera. Taking a binocular camera as an example, FIG. 4 is a schematic diagram of the ranging error of binocular cameras with different baselines for a fixed target. As can be seen from FIG. 4, when the distance between the target and the multi-camera 120 is 50 meters, binocular camera groups with different baseline lengths have different ranging errors: the shorter the baseline, the larger the ranging error and the lower the corresponding ranging accuracy; the longer the baseline, the smaller the ranging error and the higher the corresponding ranging accuracy.
However, when the baseline of a binocular camera is too long, its common view region keeps shrinking, and the left or right camera may be unable to capture the target. For example, FIG. 5 is a schematic diagram of the imaging of a binocular camera with an excessively long baseline when photographing a target point: if the baseline is too long, the target point P is not within the shooting range of the right camera; in other words, the target point P has no imaging point on the imaging plane CD of the right camera, so the position information of the target cannot be determined from the disparity.
Combining the above factors affecting the ranging accuracy of the multi-camera, the baseline determination unit 111 may determine the target baseline as follows: first, determine a first accuracy index and a second accuracy index for each camera group, where the first accuracy index is inversely proportional to the baseline length of the camera group and directly proportional to its common view region, the second accuracy index is directly proportional to the baseline length and focal length of the camera group, and the common view region is the region jointly captured by the multiple cameras in the camera group; then determine the weights of the first accuracy index and the second accuracy index according to the measurement accuracy requirement of the target; then obtain a comprehensive index for each camera group from the first accuracy index, the second accuracy index and the weights; and finally determine the target baseline according to the comprehensive index of each camera group.
The measurement accuracy requirement may include the approximate distance between the target and the multi-camera 120, that is, whether the target is a long-distance target or a close-range target, which can be determined from the size of the image region occupied by the target in the image captured by the camera: for example, a target whose image region is smaller than a first threshold is a long-distance target, and a target whose image region is larger than a second threshold is a close-range target. The measurement accuracy requirement may also include a measurement error threshold of the target, for example that the measurement error of the target must not exceed 1 meter. It should be understood that the above examples are for illustration and are not specifically limited in this application.
Optionally, after determining the target baseline, the baseline determination unit 111 may send a baseline adjustment request carrying the target baseline to the multi-camera 120, where the baseline adjustment request is used to instruct the multi-camera 120 to adjust the baseline length of a camera group to the above target baseline.
The synchronization unit 112 is configured to receive the multiple videos, such as a first video and a second video, obtained by the multi-camera 120 capturing the target with the camera group corresponding to the target baseline, and then to perform time synchronization processing on the first video and the second video to obtain the first image and the second image, where the first image and the second image are obtained by photographing the same target at the same moment. It should be understood that, because different cameras differ in timestamp, frame rate and network delay, the two videos captured by a binocular camera may be out of sync in time; for example, the image with timestamp T1 from the left camera describes the scene at world time T1, while the image with timestamp T1 from the right camera describes the scene at world time T1+Δt. After the synchronization unit 112 time-synchronizes the first video and the second video, the first image and the second image at the same moment can be obtained; using the first image and the second image for the subsequent disparity calculation can improve the accuracy of the finally obtained position information.
The detection and matching unit 113 is configured to detect and recognize the target to be measured in the first image and the second image to obtain the first target region and the second target region, and then to perform feature point detection and matching on the first target region and the second target region to obtain a feature point matching result, where the feature point matching result includes the correspondence between feature points in the first target region and feature points in the second target region, and feature points for which the correspondence exists describe the same feature of the target. The disparity information of each feature point is then determined from the feature point matching result, and the position information of the target is determined from the disparity information and the parameter information of the multi-camera.
The disparity information includes the disparity of each feature point of the target, which may specifically be the difference between the pixel coordinates of a feature point in the first target region and the pixel coordinates of the corresponding feature point in the second target region. For a description of disparity, reference may be made to the embodiment of FIG. 1, which is not repeated here.
The position information may include the distance between the target and the camera, which can be determined from the disparity and the parameter information of the multi-camera. It can be seen from formula 3 in the embodiment of FIG. 1 that the parameter information includes at least the baseline length and focal length of the multi-camera. The position information may also include the geographic coordinates of the target, which may be determined from the geographic coordinates of the multi-camera combined with the above distance between the target and the camera, depending on the requirements of the application server 130; if the position information includes the geographic coordinates of the target, the above parameter information of the multi-camera may include the geographic coordinates of the multi-camera.
It can be understood that the detection and matching unit 113 determines the position information of the target from the disparity information of the feature points, without performing matching and disparity calculation on every pixel in the first target region and the second target region, thereby reducing the computing resources required for positioning and ranging while avoiding problems such as background interference and noise, and improving the accuracy of ranging and positioning.
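A hedged sketch of this feature-point route is given below: keypoints are detected and matched only inside the two target regions, the column differences of the matched pairs give per-feature disparities, and a robust statistic of those disparities yields the distance. ORB and brute-force matching are stand-in choices here, since this application does not mandate a particular detector; the ROI column offsets are passed in so the disparities are expressed in full-image coordinates.

```python
import cv2
import numpy as np

def target_distance(roi_left, roi_right, x0_left, x0_right,
                    focal_px, baseline_m):
    """roi_*: grayscale crops of the first/second target region;
    x0_*: column of each crop's left edge in its full rectified image."""
    orb = cv2.ORB_create(nfeatures=500)
    kp_l, des_l = orb.detectAndCompute(roi_left, None)
    kp_r, des_r = orb.detectAndCompute(roi_right, None)
    if des_l is None or des_r is None:
        return None
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_l, des_r)
    # Column difference of each matched pair, in full-image coordinates.
    disp = np.array([(kp_l[m.queryIdx].pt[0] + x0_left) -
                     (kp_r[m.trainIdx].pt[0] + x0_right) for m in matches])
    disp = disp[disp > 0]
    if disp.size == 0:
        return None
    return focal_px * baseline_m / np.median(disp)  # z = f * b / d
```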
In summary, the target positioning system provided by this application can determine the target baseline according to the target to be measured, use the camera group with that target baseline to capture the target and obtain the first image and the second image, perform target detection and matching on the first image and the second image to obtain the first target region and the second target region in which the target is located, and finally perform feature point detection and matching on the first target region and the second target region to obtain a feature point matching result, from which the disparity information of each feature point, and hence the position information of the target, is determined. The system can flexibly select the camera group with the target baseline for data collection according to the target to be measured, avoiding the limited ranging range caused by a multi-camera with a fixed baseline and extending the ranging range of the target positioning system. At the same time, the system determines the position information of the target from the disparity information of the feature points, without performing matching and disparity calculation on every pixel in the first target region and the second target region, thereby reducing the computing resources required for positioning and ranging while avoiding problems such as background interference and noise, and improving the accuracy of ranging and positioning.
As shown in FIG. 6, this application provides a target positioning method, which can be applied to the architecture of the stereo vision system shown in FIG. 2; specifically, the method can be executed by the aforementioned target positioning system 110. The method may include the following steps:
S310: Determine the target baseline according to the measurement accuracy requirement of the target to be measured.
Referring to the foregoing, the above multi-camera 120 may include N cameras, which can be combined in pairs to obtain N(N-1)/2 binocular camera groups. The multi-camera 120 may send the baseline data of each camera group to the target positioning system 110 before step S310. The target positioning system 110 may select the target baseline from the baseline data of the above N(N-1)/2 binocular camera groups according to the measurement accuracy requirement of the target to be measured, and send it to the multi-camera 120.
In an embodiment, referring to the embodiments of FIG. 4 and FIG. 5, the longer the baseline length of the multi-camera 120, the higher the ranging accuracy, but the common view region gradually shrinks as the baseline length grows, and the target may fall outside the common view region of the multi-camera 120. Therefore, the target baseline can be determined according to the size of the common view region of the multi-camera 120 together with the measurement accuracy requirement of the target to be measured.
The baseline data collected by the multi-camera 120 includes not only the baseline length between the multiple cameras in each camera group, but may also include the size of the common view region between the multiple cameras, where the size of the common view region can be determined from the shooting range of each camera in the camera group, the shooting range being the geographic area recorded in the images captured by that camera.
In a specific implementation, the outermost edge points that can be displayed in the video picture of each video may be determined, the pixel coordinates of each edge point obtained and converted into geographic coordinates through a camera calibration algorithm, and the shooting range of that video determined from the region formed by these geographic coordinates, from which the size of the common view region between the multiple cameras is obtained.
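The computation described above can be sketched as follows, assuming each camera's image-to-ground homography H (a 3x3 matrix obtained from camera calibration) is already known, and using the shapely library for the polygon intersection; the image size and the homographies are assumptions of the example.

```python
import numpy as np
from shapely.geometry import Polygon

def ground_footprint(H, width, height):
    """Project the four image corners to ground coordinates through H."""
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]], float)
    pts = np.hstack([corners, np.ones((4, 1))]) @ H.T
    return Polygon(pts[:, :2] / pts[:, 2:3])  # normalize homogeneous coords

def common_view_area(H1, H2, width=1280, height=720):
    """Area of the ground region captured by both cameras."""
    return ground_footprint(H1, width, height).intersection(
        ground_footprint(H2, width, height)).area
```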
The measurement accuracy requirement of the target to be measured may include the approximate distance between the target and the multi-camera 120, in other words, whether the target is a long-distance target or a close-range target. Whether the target to be measured is a long-distance target or a close-range target can be determined from the size of the image region occupied by the target in the images collected by the multi-camera: a long-distance target occupies a small image region in the collected image, while a close-range target occupies a large image region. Therefore, when the image region is smaller than the first threshold, the target to be measured can be determined to be a long-distance target, and when the image region is larger than the second threshold, the target to be measured can be determined to be a close-range target. The measurement accuracy requirement may also include a measurement error threshold of the target to be measured, for example that the measurement error must not exceed 1 meter. It should be understood that the above examples are for illustration and are not specifically limited in this application.
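A minimal sketch of this area-based classification is given below; the two pixel-area thresholds are invented example values, since this application does not fix them.

```python
def classify_range(box_w: int, box_h: int,
                   first_thresh: float = 2000.0,
                   second_thresh: float = 40000.0) -> str:
    """Classify a target by the image area of its bounding box (pixels^2)."""
    area = box_w * box_h
    if area < first_thresh:
        return "long-distance target"
    if area > second_thresh:
        return "close-range target"
    return "intermediate"

print(classify_range(40, 30))    # 1200 px^2  -> long-distance target
print(classify_range(250, 200))  # 50000 px^2 -> close-range target
```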
In an embodiment, a first accuracy index p1 and a second accuracy index p2 may be determined for each camera group, where the first accuracy index p1 is inversely proportional to the baseline length of the camera group and directly proportional to its common view region, the second accuracy index p2 is directly proportional to the baseline length and focal length of the camera group, and the common view region is the region jointly captured by the multiple cameras in the camera group. The weight α of the first accuracy index p1 and the second accuracy index p2 is then determined according to the measurement accuracy requirement of the target, the comprehensive index p of each camera group is obtained from the first accuracy index p1, the second accuracy index p2 and the weight α, and the target baseline is determined according to the comprehensive index of each camera group.
In a specific implementation, combining the above conclusion that the common view region gradually shrinks as the baseline length grows, the relationship between the accuracy index p1, the common view region FOV and the baseline length b is:

p1 = FOV / b                (4)
Combining the above conclusion that the longer the baseline, the higher the ranging accuracy, the relationship between the baseline length b and the accuracy index p2 is:

p2 = f × b                (5)
Combining the above two conclusions, namely that a longer baseline gives higher ranging accuracy but at the same time a smaller common view region, the comprehensive index is:

p = αλ1p1 + (1 - α)λ2p2                (6)
where λ1 and λ2 are unit conversion coefficients that bring p1 and p2 to consistent units so that both can participate in the calculation of the comprehensive index p. The weight α ∈ (0, 1), and the specific value of α can be determined according to the above ranging and positioning requirement. For example, when the target to be measured is a long-distance target, the baseline length has a greater influence on the ranging accuracy, so the value of α can be appropriately reduced in combination with the measurement error threshold of the target, increasing the influence of the baseline-based accuracy index p2 on the comprehensive index; similarly, when the target to be measured is a close-range target, the size of the common view region has a greater influence on the ranging accuracy, so the value of α can be appropriately increased in combination with the measurement error threshold of the target, increasing the influence of the common-view-based accuracy index p1 on the comprehensive index.
It should be understood that, based on formulas (4) to (6), the comprehensive index p of each of the above N(N-1)/2 camera groups can be determined for the target to be measured, and the baseline length corresponding to the largest comprehensive index p is then determined as the target baseline and sent to the multi-camera 120. It should be understood that the above example is for illustration, and this application does not limit the specific formula of the comprehensive index.
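The selection step can be summarized in the sketch below, which evaluates formulas (4) to (6) for each candidate camera group and returns the baseline with the largest comprehensive index; the candidate groups, unit conversion coefficients λ1 and λ2, and weight α are illustrative assumptions.

```python
def pick_target_baseline(groups, alpha, lam1=1.0, lam2=1.0):
    """groups: list of dicts with baseline 'b' (m), focal 'f' (px),
    common view region 'fov' (m^2); returns the best baseline."""
    def comprehensive_index(g):
        p1 = g["fov"] / g["b"]   # formula (4): common-view term
        p2 = g["f"] * g["b"]     # formula (5): baseline term
        return alpha * lam1 * p1 + (1 - alpha) * lam2 * p2  # formula (6)
    return max(groups, key=comprehensive_index)["b"]

candidates = [{"b": 0.1, "f": 1200, "fov": 900.0},
              {"b": 0.3, "f": 1200, "fov": 700.0},
              {"b": 0.7, "f": 1200, "fov": 400.0}]
# Far target: a small alpha favors the long baseline -> prints 0.7
print(pick_target_baseline(candidates, alpha=0.2, lam1=0.01))
```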
S320: Acquire the first image and the second image according to the target baseline.
Specifically, the target baseline may be sent to the multi-camera 120, and the first image and the second image captured by the camera group corresponding to the target baseline are then received. In some embodiments, a baseline adjustment request carrying the target baseline may be sent to the multi-camera; after the multi-camera adjusts the baseline length of at least one camera group to the target baseline according to the baseline adjustment request and captures videos or images, the first image and the second image captured by the camera group corresponding to the target baseline are received. In short, the baseline adjustment request is used to instruct the multi-camera 120 to adjust the baseline length of a camera group to the above target baseline.
In an embodiment, after the target baseline, or a baseline adjustment request carrying the target baseline, is sent to the multi-camera, a first video and a second video captured by the camera group corresponding to the target baseline may also be received, and the first image and the second image are obtained after time synchronization processing is performed on the first video and the second video, where the first video and the second video are captured by the camera group corresponding to the target baseline and the first image and the second image are images at the same moment; the first image is a video frame in the first video, and the second image is a video frame in the second video.
It should be understood that the cameras of a multi-camera may differ in model, manufacturer, timestamp and video frame rate, that network transmission delay may cause frames to be lost in transit, and that frame loss is also likely when a camera's computing performance is poor, so it is difficult to guarantee time synchronization between the multiple videos collected by multiple cameras. For example, camera 1 and camera 2 monitor the same intersection; because camera 1 takes a snapshot of a vehicle running a red light at time T1, the video frames of the real-time video stream transmitted by camera 1 within 20 ms after the snapshot time T1 are all lost, while camera 2 takes no snapshot and loses no frames. Therefore, after the target positioning system 110 receives the first video and the second video, the video collected by camera 2 runs 20 ms ahead of the video collected by camera 1 from time T1 onward. If disparity calculation were performed directly on the first video and the second video collected by camera 1 and camera 2, the obtained disparity information would contain errors, which would in turn hinder subsequent applications such as ranging and positioning and 3D reconstruction. Therefore, before the disparity calculation, time synchronization processing may be performed on the first video and the second video at step S320, improving the accuracy of the disparity calculation and hence the accuracy of applications such as ranging and positioning and 3D reconstruction.
In an embodiment, a reference frame may be obtained from the first video and multiple motion frames may be obtained from the second video, where the reference frame and the motion frames contain moving objects; the reference frame is then matched against the motion frames by feature points to obtain a synchronization frame among the motion frames, where the parallelism of the lines connecting feature points in the synchronization frame with the corresponding feature points in the reference frame satisfies a preset condition; finally, time synchronization correction is performed on the first video and the second video according to the reference frame and the synchronization frame to obtain the first image and the second image. Satisfying the preset condition may mean that the frame whose connecting lines have the highest parallelism is determined to be the synchronization frame.
Specifically, the above reference frame and motion frames may be determined by an optical flow method. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane; when the time interval is very small, the optical flow is also equivalent to the displacement of the spatially moving object. On this basis, the procedure for determining the reference frame and the motion frames may be as follows: first perform target detection of synchronization targets on every frame of the first video and the second video to obtain one or more synchronization targets in each frame; then determine the optical flow of each synchronization target by the optical flow method, and judge from the optical flow of each synchronization target in each frame whether it is a moving object, thereby obtaining the motion frames that contain moving objects and the reference frame that contains the largest number of moving objects.
It is worth noting that, when target detection is performed on each frame, the detected synchronization target should be a target that can possibly move, rather than a necessarily stationary target such as a building. Therefore, the synchronization target may be the target to be measured in the foregoing, or another target, which is not specifically limited in this application. For example, if the target to be measured is a utility pole, the synchronization target used for time synchronization may be a pedestrian or a vehicle; if the target to be measured is vehicle A, the synchronization targets used for time synchronization may be vehicles and pedestrians. The above examples are for illustration and are not limited in this application.
It should be noted that the target detection algorithm in the embodiments of this application may adopt any of the neural network models known in the industry to perform well for target detection, for example the one-stage unified real-time object detection (You Only Look Once: Unified, Real-Time Object Detection, YOLO) model, the Single Shot multibox Detector (SSD) model, the Region Convolutional Neural Network (RCNN) model or the Fast Region Convolutional Neural Network (Fast-RCNN) model, which is not specifically limited in this application. Similarly, the optical flow method in the embodiments of this application may adopt any of the optical flow methods known in the industry to perform well for computing optical flow, such as the Lucas-Kanade (LK) optical flow method, which is not specifically limited in this application.
In an embodiment, after the optical flow (i.e. the instantaneous velocity) of each object in each frame is obtained, whether an object is a moving object can be judged by determining whether the velocity of the object has a component in the image row direction. Specifically, since the cameras of the multi-camera (for example, the multi-camera 120 shown in FIG. 2) are fixed at the same height, if an object moves in the row direction, its row coordinate will change; therefore, if the row coordinate of object X in motion frame Tn is not equal to the row coordinate of the same object X in the previous frame Tn-1 (or the following frame Tn+1), the object can be determined to be a moving object. It can be understood that a vertically moving object moves only in the column direction; such an object has no velocity component in the image row direction, only in the column direction, and therefore contributes nothing to the disparity calculation. Vertically moving objects can thus also be treated as non-moving objects and excluded from the disparity calculation, reducing the amount of computation and improving the accuracy and efficiency of the disparity calculation.
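A hedged sketch of this test with the Lucas-Kanade optical flow named above follows; the frame file names and the 1-pixel threshold are assumptions of the example.

```python
import cv2
import numpy as np

prev = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)  # assumed files
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

pts0 = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                               qualityLevel=0.01, minDistance=7)
pts1, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts0, None)

flow = (pts1 - pts0).reshape(-1, 2)[status.ravel() == 1]
# Keep only points whose motion has a horizontal component; purely
# vertical motion contributes nothing to the horizontal disparity.
moving = flow[np.abs(flow[:, 0]) > 1.0]
print(f"{len(moving)} of {len(flow)} tracked points moved horizontally")
```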
Further, when the reference frame is matched against a motion frame, feature point matching may be performed between the moving objects in the reference frame and those in the motion frame, and the difference Δs of the row coordinates of each pair of feature points calculated. The smaller Δs is, the closer the moving object in the reference frame is to the moving object in the motion frame, and the more parallel the lines connecting feature points in the reference frame with those in the motion frame. If Δs is 0, the two frames are synchronized; if it is not 0, the two frames are out of sync, so an accurate synchronization offset time Δt can be calculated from Δs. Exemplarily, the formula of Δt may be as follows:
Δt = Δs1 / ((Δs1 + Δs2) × fr)                (7)
where Δs1 and Δs2 are the two feature point differences with the smallest absolute values, and fr is the video frame rate. The synchronization offset time Δt can be used as compensation for the row coordinates of each subsequent frame, so as to obtain the synchronized first video and second video, and hence the first image and the second image at each moment.
For example, as shown in FIG. 7, assume the reference frame of camera 1 is frame P1 and the motion frames of camera 2 include frames Q1, Q2 and Q3. Matching the feature points of motion frame Q1 of camera 2 against reference frame P1 yields the mean Δs1 of the row coordinate differences of the feature points; matching motion frame Q2 against reference frame P1 yields the mean Δs2; and matching motion frame Q3 against reference frame P1 yields the mean Δs3, where Δs2 = 0. Therefore, the lines connecting the feature points of motion frame Q2 of camera 2 and reference frame P1 are parallel, and motion frame Q2 and reference frame P1 are the first image and the second image at the same moment, so frame P1 of camera 1 can be aligned with frame Q2 of camera 2; that is, the video collected by camera 1 is 1 frame behind the video collected by camera 2. Of course, camera 1 and camera 2 may also be synchronized after obtaining the synchronization offset time Δt according to formula (7); for example, if the offset time Δt = 3 ms, i.e. camera 1 is 3 ms ahead of camera 2, camera 2 can be advanced by 3 ms to achieve synchronization with the video of camera 1. It should be understood that FIG. 7 is for illustration and is not specifically limited in this application.
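The frame alignment in this example can be sketched as below: each candidate motion frame is matched against the reference frame, and the frame with the smallest mean row-coordinate difference is chosen as the synchronization frame. ORB matching is a stand-in detector choice, and the frame images are assumed inputs.

```python
import cv2
import numpy as np

def mean_row_diff(ref_img, cand_img):
    """Mean row-coordinate difference of matched feature points."""
    orb = cv2.ORB_create()
    kp_r, des_r = orb.detectAndCompute(ref_img, None)
    kp_c, des_c = orb.detectAndCompute(cand_img, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_r, des_c)
    return np.mean([abs(kp_r[m.queryIdx].pt[1] - kp_c[m.trainIdx].pt[1])
                    for m in matches])

def pick_sync_frame(ref_img, candidate_frames):
    """Return the index of the candidate most parallel to the reference."""
    diffs = [mean_row_diff(ref_img, c) for c in candidate_frames]
    return int(np.argmin(diffs)), diffs
```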
In an embodiment, before the video synchronization processing of the first video and the second video in step S320, stereo rectification may also be performed on the first video and the second video. It should be understood that the formulas used to calculate disparity are usually derived under the assumption that the multi-camera is in an ideal condition, so before the multi-camera is used for ranging and positioning, the actually used multi-camera 120 can be rectified into the ideal state. Taking a binocular camera as an example, after stereo rectification the image planes of the left and right cameras are parallel, the optical axes are perpendicular to the image planes, and the epipoles lie at infinity; at this point, the epipolar line corresponding to a point (x0, y0) is y = y0. In a specific implementation, the embodiments of this application may adopt any of the stereo rectification methods known in the industry to perform well, such as the Bouguet epipolar rectification method, which is not specifically limited in this application.
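A minimal sketch of this rectification with OpenCV, whose stereoRectify implements the Bouguet method, is given below; all calibration values are invented example parameters, not values from this application.

```python
import cv2
import numpy as np

K1 = K2 = np.array([[1200., 0., 640.],
                    [0., 1200., 360.],
                    [0., 0., 1.]])          # intrinsics (assumed)
d1 = d2 = np.zeros(5)                       # assume negligible distortion
R = np.eye(3)                               # relative rotation (assumed)
T = np.array([-0.5, 0., 0.])                # 0.5 m baseline along x (assumed)
size = (1280, 720)

R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
# After cv2.remap with these maps, corresponding points share a row, so
# the epipolar line of (x0, y0) is y = y0 as stated above.
```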
In an embodiment, in step S320 the multi-view camera may also capture the same target at the same moment to obtain the first image and the second image directly. It can be understood that, when the first image and the second image are collected in step S320 rather than the first video and the second video, the time synchronization of the two videos in step S320 can be omitted, and step S330 can be executed directly on the first image and the second image for the subsequent disparity calculation, which is not elaborated further here.
S330: Perform target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, where the first target area and the second target area include the above-mentioned target to be detected.
In an embodiment, the first image may be input into a detection and matching model to obtain a first detection and matching result of the first image, and the second image may be input into the detection and matching model to obtain a second detection and matching result of the second image; the first target area is obtained according to the first detection and matching result, and the second target area is obtained according to the second detection and matching result. The first detection and matching result and the second detection and matching result each include a target frame (bounding box) and a label. The target frame indicates the area of the target to be detected in the image, and different targets have different labels, so the same target in the first image and the second image can be identified from the labels in the two detection and matching results, and the first target area and the second target area can then be determined in combination with the target frames.
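The following minimal sketch (the dataclass and function names are illustrative assumptions, not part of this application) shows how two detection and matching results could be paired by label to obtain the two target areas for each target:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str   # identity label assigned by the model, e.g. "ID:001"
    box: tuple   # target frame (x, y, w, h) in pixel coordinates

def pair_target_areas(result1, result2):
    """Pair detections from the two images that carry the same label;
    each pair gives the first and second target areas of one target."""
    by_label = {d.label: d for d in result2}
    pairs = {}
    for d in result1:
        if d.label in by_label:
            # Same label => same physical target in both images.
            pairs[d.label] = (d.box, by_label[d.label].box)
    return pairs
```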
Specifically, the target frame in the detection and matching result may be a rectangular frame, a circular frame, an elliptical frame, or the like, which is not specifically limited in this application. It should be understood that if there are multiple targets to be detected, the detection and matching result may include multiple target frames for the multiple targets. Therefore, in the detection and matching result, the same target may be identified with the same label and different targets with different labels. In this way, when disparity is computed for a target, the same target can be identified across the video frames of the different channels according to its label, so that feature point matching can be performed on the same target in the first image and the second image captured at the same moment, and the disparity of that target can then be obtained.
For example, continuing the example shown in FIG. 7, in the synchronized first video and second video, frame P3 of camera 1 and frame Q4 of camera 2 are the first image and the second image at the same moment. Illustratively, after frame P3 of camera 1 and frame Q4 of camera 2 are input into the above detection and matching model, the obtained first and second detection and matching results may be as shown in FIG. 8, which is an example diagram of target detection and matching results in a target positioning method provided by this application. The detection and matching results are the rectangular target frames and the ID labels shown in FIG. 8, namely ID:001 and ID:002. From the first and second detection and matching results it can be determined that the tank truck framed in frame P3 and the tank truck framed in frame Q4 are the same vehicle, and that the bus framed in frame P3 and the bus framed in frame Q4 are the same vehicle. It should be understood that FIG. 8 is merely an example: the target frame may also take other forms such as a circular or elliptical frame, and the ID labels in the detection and matching results may also take other forms such as letters or numbers, which is not specifically limited in this application.
Optionally, as shown in FIG. 9, which is a schematic structural diagram of the detection and matching model in a target positioning method provided by this application, the detection and matching model may include a feature extraction module 610 and a detection and matching module 620. The feature extraction module 610 extracts features from the input first image and second image and generates high-dimensional feature vectors, and the detection and matching module 620 generates, from these feature vectors, detection and matching results containing the target frames and labels. For example, frame P3 of camera 1 and frame Q4 of camera 2 are the first image and the second image at the same moment; frame P3 and frame Q4 may first be input into the feature extraction module 610 to generate high-dimensional feature vectors, which are then input into the detection and matching module 620 to generate the detection and matching results shown in FIG. 8. If the target to be detected is ID:001, the first target area and the second target area shown in FIG. 9 can be obtained. It should be understood that FIG. 9 is merely an example and is not specifically limited in this application.
In an embodiment, before step S310, the detection and matching model may be trained with a sample set. The sample set may include first image samples, second image samples, and corresponding ground-truth values, where the ground-truth values include target detection ground truth and target matching ground truth: the target detection ground truth includes the target frames of the targets in the first and second image samples, and the target matching ground truth includes the labels of the targets in the first and second image samples. When the detection and matching model is trained with this sample set, the detection and matching loss used for back-propagation is determined from the gap between the output of the detection and matching module 620 and the ground-truth values, and the parameters of the detection and matching model are adjusted according to this loss until the loss reaches a threshold, at which point the trained detection and matching model is obtained.
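A minimal sketch of such a training procedure is given below (assuming PyTorch; the form of the combined loss, the model interface, and the data loader are illustrative placeholders for the model of FIG. 9 and its loss, not a definitive implementation):

```python
import torch
import torch.nn.functional as F

def detection_matching_loss(pred_boxes, pred_logits, gt_boxes, gt_labels):
    """Illustrative combined loss: box regression + label classification."""
    box_loss = F.smooth_l1_loss(pred_boxes, gt_boxes)      # detection term
    label_loss = F.cross_entropy(pred_logits, gt_labels)   # matching term
    return box_loss + label_loss

def train(model, loader, optimizer, loss_threshold, max_epochs=100):
    """Adjust the model parameters until the loss reaches the threshold."""
    model.train()
    for _ in range(max_epochs):
        for img1, img2, gt_boxes, gt_labels in loader:
            pred_boxes, pred_logits = model(img1, img2)
            loss = detection_matching_loss(pred_boxes, pred_logits,
                                           gt_boxes, gt_labels)
            optimizer.zero_grad()
            loss.backward()                                # back-propagation
            optimizer.step()
            if loss.item() < loss_threshold:
                return model
    return model
```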
In a specific implementation, the feature extraction module 610 may be a neural network backbone used to extract image features, such as VGG or ResNet, and the detection and matching module 620 may be a target detection network such as a YOLO network, an SSD network, or an RCNN, which is not specifically limited in this application.
It should be understood that, by marking the same target with the same label, this application can determine the first target area and the second target area according to whether the labels match after the first image and the second image are input into the detection and matching model, instead of performing image recognition on the target to determine the same target in the two images. This reduces computational complexity and improves the efficiency of obtaining the first and second target areas, thereby improving the efficiency of ranging and positioning.
S340: Perform feature point detection and matching on the first target area and the second target area to obtain a feature point matching result. The feature point matching result includes the correspondences between feature points in the first target area and feature points in the second target area, where corresponding feature points describe the same feature of the target to be detected. For example, if the target is a pedestrian whose feature points include the eyes, nose, and mouth, then the pedestrian's eyes in the first target area correspond to the pedestrian's eyes in the second target area.
In an embodiment, a feature point detection algorithm may be used to detect feature points in the first target area and the second target area. Since the target in both areas is the same target, each feature point in the first target area has a corresponding feature point in the second target area.
Optionally, the feature point detection and matching algorithm in the embodiments of this application may be FAST (features from accelerated segment test), BRIEF (binary robust independent elementary features), ORB (oriented FAST and rotated BRIEF, which combines FAST and BRIEF), SURF (speeded-up robust features), AKAZE (accelerated KAZE features), or the like, which is not specifically limited in this application.
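As an illustration of this step with one of the listed algorithms, the following sketch (assuming OpenCV and the two cropped target areas as inputs; the parameter values are illustrative) detects and matches ORB feature points between the two target areas:

```python
import cv2

def match_target_features(area1, area2, max_matches=50):
    """Detect ORB feature points in the two target areas and match them;
    returns the keypoints and the most reliable correspondences."""
    orb = cv2.ORB_create(nfeatures=500)
    kp1, des1 = orb.detectAndCompute(area1, None)
    kp2, des2 = orb.detectAndCompute(area2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    # Sort by descriptor distance and keep the strongest correspondences.
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return kp1, kp2, matches[:max_matches]
```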
Continuing the example described in the embodiments of FIGS. 7 to 9, the target to be detected is the vehicle with label ID:001, and the first target area and the second target area are as shown in FIG. 10, which is a schematic flowchart of feature point detection and matching in a target positioning method provided by this application. After feature point detection and matching are performed on the first target area and the second target area, the feature point matching result shown in FIG. 10 can be obtained. Illustratively, FIG. 10 shows part of the feature point matching result; each feature point detected in the first target area has a corresponding feature point in the second target area. It should be understood that FIG. 10 represents corresponding feature points with connecting lines; in a specific implementation, the feature point matching result may represent the correspondences between feature points in other ways. FIG. 10 is merely an example and is not specifically limited in this application.
S350: Determine the position information of the target according to the feature point matching result and the parameter information of the multi-view camera.
The parameter information of the multi-view camera includes at least the baseline length and the focal length of the multi-view camera, and may further include the geographic coordinate information of the multi-view camera. The position information of the target may include the distance between the target and the multi-view camera, and may further include the geographic coordinates of the target, which is not specifically limited in this application.
Specifically, the disparity information of the target can be obtained from the pixel differences between corresponding feature points in the feature point matching result; the disparity information includes the gaps between the pixel coordinates of the feature points in the first target area and the pixel coordinates of the corresponding feature points in the second target area. With reference to the embodiment of FIG. 1 and formula (3), the distance between the target and the multi-view camera can be determined from the disparity information, the baseline length b, and the focal length f, and the geographic coordinates of the target can be determined from the geographic coordinate information of the multi-view camera.
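For reference, the standard binocular relation presumably underlying formula (3) (stated here in its textbook form for a rectified pair; this is an assumption, since formula (3) itself appears earlier in the document) is:

```latex
% Depth of a point from its disparity in a rectified binocular pair:
% d = x_l - x_r is the disparity, b the baseline length, f the focal length.
Z = \frac{b \cdot f}{d}, \qquad d = x_l - x_r
```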
In a specific implementation, after the gap between the pixel coordinates of each feature point in the first target area and its pixel coordinates in the second target area is determined, a subset of the credible pixel differences may be taken as the disparity, or the average may be taken as the disparity information of the target, and the distance is then computed from this disparity information; this is not specifically limited in this application. For example, if the first target area includes feature points A1 and B1 of target X and the second target area includes feature points A2 and B2 of target X, where A1 and A2 are the same feature point and B1 and B2 are the same feature point, then after the pixel difference D1 between A1 and A2 and the pixel difference D2 between B1 and B2 are determined, the disparity of the target may be determined from the average of D1 and D2, from which the distance between the target and the binocular camera is obtained. It should be understood that this example is for illustration and is not specifically limited in this application.
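A minimal sketch of this computation follows (assuming rectified images, matched keypoints such as those returned by the sketch after step S340, a baseline `baseline_m` in meters, and a focal length `focal_px` in pixels; the interquartile filter is one illustrative way to keep "credible" differences, not a prescription of this application):

```python
import numpy as np

def estimate_distance(kp1, kp2, matches, baseline_m, focal_px):
    """Average credible per-feature disparities, then apply Z = b * f / d.

    Assumes rectified images and keypoint coordinates expressed in
    full-image pixels (crop offsets already added back).
    """
    # Column (x) coordinates of corresponding feature points (sub-pixel).
    x1 = np.array([kp1[m.queryIdx].pt[0] for m in matches])
    x2 = np.array([kp2[m.trainIdx].pt[0] for m in matches])
    disparities = x1 - x2
    disparities = disparities[disparities > 0]      # discard implausible values
    lo, hi = np.percentile(disparities, [25, 75])   # keep the credible middle
    credible = disparities[(disparities >= lo) & (disparities <= hi)]
    d = float(np.mean(credible))
    return baseline_m * focal_px / d                # distance in meters
```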
Illustratively, as shown in FIG. 11, which is a schematic diagram of a feature point matching result of a target positioning method provided by this application in one application scenario, take the practical scenario of measuring the distance between person Y and a binocular camera as an example. Using the target positioning method provided by this application, the target baseline may first be determined from the measurement accuracy requirement (for example, person Y is a short-range target and the measurement error must be within ±1 m) in combination with formulas (4) to (6). The target baseline is then sent to the multi-view camera 120, the first image and the second image captured by the camera group corresponding to the target baseline are obtained, and the two images are input into the detection and matching model shown in FIG. 9 to obtain the first and second detection and matching results shown in FIG. 11. From the target frames and labels in the two detection and matching results, the first target area and the second target area containing person Y are obtained; feature point detection and matching on these two areas yields the feature point matching result shown in FIG. 11; and from the pixel differences between the corresponding groups of feature points, the disparity of person Y is determined, locating person Y at a distance of 14.2 m from the camera. It should be understood that FIG. 11 is merely an example and is not specifically limited in this application.
It can be understood that, because the disparity is determined from the differences between feature points rather than from the differences between every pair of pixels in the first target area and the second target area, the amount of computation is reduced and the efficiency of disparity calculation is improved. Moreover, because a feature point may lie not only on a pixel but also between pixels, the precision of disparity determined by pixel matching is at the integer level, whereas the precision of disparity determined by feature point matching is at the sub-pixel (fractional) level. The disparity computed by feature point matching in this application is therefore more precise, which in turn makes the ranging and positioning more accurate.
The solution provided by this application can also improve the accuracy of disparity calculation for textureless objects, and thus the accuracy of their ranging and positioning. It can be understood that when a multi-view camera captures a textureless object, the pixel differences across the object are very small, so methods that determine the target disparity by comparing pixels between the images of different channels have poor accuracy. With the solution provided by this application, the first target area and the second target area containing the target are extracted, feature point matching is performed on them to obtain a feature point matching result, and the disparity is determined from that result, which improves the matching accuracy for textureless objects.
Illustratively, taking the practical scenario of measuring the distance between a textureless object Z and a binocular camera as an example, as shown in FIG. 12, which is a schematic diagram of a textureless object provided by this application, assume that the textureless object Z is a checkerboard placed 7.5 m from the binocular camera. A certain brand of binocular camera outputs a depth of 6.7 m for the checkerboard, whereas the solution provided by this application outputs a depth of 7.2 m. The solution provided by this application therefore computes disparity more accurately and achieves better ranging and positioning precision.
The solution provided by this application can also improve the accuracy of disparity calculation in occlusion scenarios, and thus the ranging and positioning accuracy for occluded objects. It can be understood that, because the pixels of an occluded object are covered and appear as pixels of the occluder, methods that determine the target disparity by comparing pixels between the images of different channels have poor accuracy. With the solution provided by this application, after target detection and matching are performed with the detection and matching model shown in FIG. 9, the position of the occluded object can be estimated and the occluded portion completed, yielding completed first and second target areas; feature point detection and matching are then performed on them to obtain a feature point matching result, the disparity information of the target is determined from that result, and the distance between the target and the multi-view camera is obtained. The disparity computed in this way is more accurate, and the ranging accuracy for occluded objects is correspondingly higher.
For example, as shown in FIG. 13, which is a schematic flowchart of determining the first target area and the second target area in an occlusion scenario provided by this application, assume that target 004 is not occluded by target 005 in the first target area but is occluded by target 005 in the second target area. If the disparity information of the target were determined directly from the pixel differences between the first target area and the second target area, the occlusion of target 004 in the right image would make the resulting disparity inaccurate and the ranging and positioning accuracy low. With the solution provided by this application, when disparity is computed for target 004 in this pair of target areas, the position of target 004 in the second target area can first be estimated, and feature point detection and matching are then performed to obtain the feature point matching result, from which the disparity of target 004 and then its ranging and positioning result are obtained. The solution provided by this application therefore computes disparity more accurately in occlusion scenarios.
In summary, this application provides a target positioning method. The method determines a target baseline according to the target to be detected, captures the target with the camera group corresponding to that baseline to obtain a first image and a second image, performs target detection and matching on the two images to obtain the first target area and the second target area containing the target, and finally performs feature point detection and matching on the two target areas to obtain a feature point matching result, from which the disparity information of each feature point, and hence the position information of the target, is determined. The method can flexibly select the camera group with the appropriate baseline for data collection according to the target to be detected, avoiding the limited ranging range of a fixed-baseline multi-view camera and extending the ranging range of the target positioning system. At the same time, because the position information of the target is determined from the disparity information of the feature points, there is no need to match and compute disparity for every pixel in the first and second image areas, which reduces the computing resources required for positioning and ranging while avoiding problems such as background interference and noise, improving the accuracy of ranging and positioning.
The methods of the embodiments of this application have been described in detail above. To facilitate better implementation of the above solutions, related devices for implementing them are provided below.
In this application, the target positioning system 110 may be divided into modules or units by function in multiple ways. For example, as shown in FIG. 2 above, the target positioning system 110 may include a baseline determination unit 111, a synchronization unit 112, and a detection and matching unit 113; for the functions of these modules, refer to the foregoing description, which is not repeated here. In another embodiment, the target positioning system 110 may be further divided into units by function; for example, FIG. 14 is a schematic structural diagram of another target positioning system 110 provided by this application.
As shown in FIG. 14, this application provides a target positioning system 110 that includes a baseline determination unit 1410, an acquisition unit 1420, a synchronization unit 1430, a detection and matching unit 1440, and a position determination unit 1450.
The acquisition unit 1420 is configured to acquire a first image and a second image, where the first image and the second image are obtained by a multi-view camera capturing the same target at the same moment.
The detection and matching unit 1440 is configured to perform target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, where the first target area and the second target area include the target.
The detection and matching unit 1440 is further configured to perform feature point detection and matching on the first target area and the second target area to obtain a feature point matching result, where the feature point matching result includes the correspondences between feature points in the first target area and feature points in the second target area, and corresponding feature points describe the same feature of the target.
The position determination unit 1450 is configured to determine the position information of the target according to the feature point matching result and the parameter information of the multi-view camera.
In an embodiment, the parameter information includes at least the baseline length and the focal length of the multi-view camera. The position determination unit 1450 is configured to obtain the disparity information of the target from the pixel differences between corresponding feature points in the feature point matching result, where the disparity information includes the gaps between the pixel coordinates of the feature points in the first target area and the pixel coordinates of the corresponding feature points in the second target area; and to determine the distance between the target and the camera from the disparity information of the target, the baseline length of the multi-view camera, and the focal length of the multi-view camera, obtaining the position information of the target.
In an embodiment, the multi-view camera includes multiple camera groups, each of which includes multiple cameras. The baseline determination unit 1410 is configured to acquire baseline data of the multi-view camera, where the baseline data includes the baseline lengths between the cameras in each camera group, and to obtain a target baseline from the baseline data according to the measurement accuracy requirement of the target. The acquisition unit 1420 is configured to acquire the first image and the second image according to the target baseline, where the first image and the second image are captured by the camera group corresponding to the target baseline.
In an embodiment, the baseline determination unit 1410 is configured to send, to the multi-view camera, a baseline adjustment request carrying the target baseline, where the baseline adjustment request instructs the multi-view camera to adjust the baseline length of a camera group included in the multi-view camera to the target baseline; the acquisition unit 1420 is configured to receive the first image and the second image captured by the camera group corresponding to the target baseline.
In an embodiment, the baseline determination unit 1410 is configured to determine a first precision index and a second precision index for each camera group, where the first precision index is inversely proportional to the baseline length of the camera group and directly proportional to the group's common-view area, the second precision index is directly proportional to the baseline length and the focal length of the camera group, and the common-view area is the area jointly captured by the multiple cameras in the group; to determine the weights of the first and second precision indices according to the measurement accuracy requirement of the target; to obtain a composite index for each camera group from the first precision index, the second precision index, and the weights; and to determine the target baseline according to the composite index of each camera group.
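The following sketch illustrates one way such a weighted composite index could be evaluated (the exact forms of the two indices are given by formulas (4) to (6) earlier in the document; the simple proportional forms used here, and all names, are illustrative assumptions):

```python
def select_target_baseline(groups, w1, w2):
    """Pick the baseline of the camera group with the best composite index.

    groups: list of dicts with 'baseline', 'focal', 'common_view_area'.
    w1, w2: weights derived from the measurement accuracy requirement.
    """
    def composite(g):
        # First index: inversely proportional to baseline, proportional to
        # common-view area (favors coverage for short-range targets).
        idx1 = g["common_view_area"] / g["baseline"]
        # Second index: proportional to baseline * focal length
        # (favors depth resolution for long-range targets).
        idx2 = g["baseline"] * g["focal"]
        return w1 * idx1 + w2 * idx2

    best = max(groups, key=composite)
    return best["baseline"]
```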
In an embodiment, the synchronization unit 1430 is configured to receive the first video and the second video obtained by the multi-view camera capturing the target, and to perform time synchronization on the first video and the second video to obtain the first image and the second image at the same moment, where the first image is an image frame in the first video and the second image is an image frame in the second video.
In an embodiment, the synchronization unit 1430 is configured to obtain a reference frame from the first video and multiple motion frames from the second video, where the reference frame and the motion frames include a moving object; to perform feature point matching between the reference frame and the motion frames to obtain a synchronized frame among the motion frames, where the parallelism of the lines connecting the feature points in the synchronized frame and the corresponding feature points in the reference frame satisfies a preset condition; and to perform time synchronization correction on the first video and the second video according to the reference frame and the synchronized frame, obtaining the first image and the second image at the same moment.
In an embodiment, the detection and matching unit 1440 is configured to input the first image into a detection and matching model to obtain a first detection and matching result of the first image, and to input the second image into the detection and matching model to obtain a second detection and matching result of the second image, where the first and second detection and matching results include target frames and labels, the target frame indicates the area of the target in the image, and the same target has the same label; and to obtain the first target area according to the first detection and matching result and the second target area according to the second detection and matching result.
It should be understood that the unit modules inside the target positioning system 110 may also be divided in multiple ways, and each module may be a software module, a hardware module, or partly software and partly hardware, which is not limited in this application. FIG. 2 and FIG. 14 are both exemplary divisions. For example, in some feasible solutions the acquisition unit 1420 in FIG. 14 may be omitted; in others the position determination unit 1450 in FIG. 14 may be omitted; and in still others the detection and matching unit 1440 in FIG. 14 may be further divided into multiple modules, such as an image detection and matching module for obtaining the first and second target areas and a feature point detection module for obtaining the feature point matching result, which is not limited in this application.
In summary, this application provides a target positioning system. The system determines a target baseline according to the target to be detected, captures the target with the camera group corresponding to that baseline to obtain a first image and a second image, performs target detection and matching on the two images to obtain the first target area and the second target area containing the target, and finally performs feature point detection and matching on the two target areas to obtain a feature point matching result, from which the disparity information of each feature point, and hence the position information of the target, is determined. The system can flexibly select the camera group with the appropriate baseline for data collection according to the target to be detected, avoiding the limited ranging range of a fixed-baseline multi-view camera and extending the ranging range of the target positioning system. At the same time, because the position information of the target is determined from the disparity information of the feature points, there is no need to match and compute disparity for every pixel in the first and second image areas, which reduces the computing resources required for positioning and ranging while avoiding problems such as background interference and noise, improving the accuracy of ranging and positioning.
FIG. 15 is a schematic structural diagram of a computing device 900 provided by this application; the computing device 900 may be the target positioning system 110 described above. As shown in FIG. 15, the computing device 900 includes a processor 910, a communication interface 920, and a memory 930, which may be connected to each other through an internal bus 940 or may communicate by other means such as wireless transmission. The embodiments of this application take connection through the bus 940 as an example; the bus 940 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in FIG. 15, but this does not mean that there is only one bus or one type of bus.
The processor 910 may consist of at least one general-purpose processor, for example a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 910 executes various types of digitally stored instructions, such as software or firmware programs stored in the memory 930, which enable the computing device 900 to provide a variety of services.
The memory 930 stores program code whose execution is controlled by the processor 910 so as to perform the processing steps of the target positioning system in the above embodiments. The program code may include one or more software modules, which may be the software modules provided in the embodiment of FIG. 14, such as the acquisition unit, the detection and matching unit, and the position determination unit: the acquisition unit acquires the first image and the second image; the detection and matching unit inputs the first image and the second image into the detection and matching model to obtain the first target area and the second target area, and then performs feature point detection and matching on the two target areas to obtain the feature point matching result; and the position determination unit determines the position information of the target according to the feature point matching result and the parameter information of the multi-view camera. The code may specifically be used to execute steps S310 to S350 of the embodiment of FIG. 6 and their optional steps, and may also be used to implement the other functions of the target positioning system 110 described in the embodiments of FIGS. 1 to 13, which are not repeated here.
It should be noted that this embodiment may be implemented by a general-purpose physical server, for example an ARM server or an X86 server, or by a virtual machine implemented on a general-purpose physical server in combination with NFV technology, a virtual machine being a complete software-simulated computer system with complete hardware system functions that runs in a fully isolated environment; this application is not specifically limited in this regard.
The memory 930 may include volatile memory, for example random access memory (RAM); it may also include non-volatile memory, for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); and it may also include combinations of the above. The memory 930 may store program code, which may specifically include code for executing the other steps described in the embodiments of FIGS. 1 to 13, not repeated here.
The communication interface 920 may be a wired interface (for example an Ethernet interface), an internal interface (for example a peripheral component interconnect express (PCIe) bus interface), or a wireless interface (for example a cellular network interface or a wireless local area network interface), used to communicate with other devices or modules.
It should be noted that FIG. 15 is only one possible implementation of the embodiments of this application; in practical applications, the computing device 900 may include more or fewer components, which is not limited here. For content not shown or described in the embodiments of this application, refer to the related descriptions in the embodiments of FIGS. 1 to 13, which are not repeated here.
It should be understood that the computing device shown in FIG. 15 may also be a computer cluster composed of at least one server, which is not specifically limited in this application.
The embodiments of this application further provide a computer-readable storage medium storing instructions that, when run on a processor, implement the method flows shown in FIGS. 1 to 13.
The embodiments of this application further provide a computer program product that, when run on a processor, implements the method flows shown in FIGS. 1 to 13.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired means (for example coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center containing one or more sets of usable media. The usable medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a high-density digital video disc (DVD)), or a semiconductor medium; the semiconductor medium may be an SSD.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (19)

1. A target positioning method, characterized in that the method comprises:
acquiring a first image and a second image, wherein the first image and the second image are obtained by a multi-view camera photographing the same target at the same moment;
performing target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, wherein the first target area and the second target area comprise the target;
performing feature point detection and matching on the first target area and the second target area to obtain a feature point matching result, wherein the feature point matching result comprises correspondences between feature points in the first target area and feature points in the second target area, and feature points for which the correspondence exists describe the same feature of the target;
determining position information of the target according to the feature point matching result and parameter information of the multi-view camera.
2. The method according to claim 1, characterized in that the parameter information comprises at least a baseline length of the multi-view camera and a focal length of the multi-view camera;
the determining position information of the target according to the feature point matching result and the parameter information of the multi-view camera comprises:
obtaining disparity information of the target according to pixel differences between corresponding feature points in the feature point matching result, wherein the disparity information comprises gaps between pixel coordinates of feature points in the first target area and pixel coordinates of the corresponding feature points in the second target area;
determining a distance between the target and the camera according to the disparity information of the target, the baseline length of the multi-view camera, and the focal length of the multi-view camera, to obtain the position information of the target.
3. The method according to claim 1 or 2, characterized in that the multi-view camera comprises a plurality of camera groups, each of the camera groups comprises a plurality of cameras, and the acquiring a first image and a second image comprises:
acquiring baseline data of the multi-view camera, wherein the baseline data comprises baseline lengths between the plurality of cameras in each camera group;
obtaining a target baseline from the baseline data according to a measurement accuracy requirement of the target;
acquiring the first image and the second image according to the target baseline, wherein the first image and the second image are captured by the camera group corresponding to the target baseline.
4. The method according to any one of claims 1 to 3, characterized in that the acquiring a first image and a second image comprises:
sending, to the multi-view camera, a baseline adjustment request carrying a target baseline, wherein the baseline adjustment request is used to instruct the multi-view camera to adjust the baseline length of a camera group comprised in the multi-view camera to the target baseline;
receiving the first image and the second image captured by the camera group corresponding to the target baseline.
5. The method according to claim 3 or 4, characterized in that the obtaining a target baseline from the baseline data according to the measurement accuracy requirement of the target comprises:
determining a first precision index and a second precision index of each camera group, wherein the first precision index is inversely proportional to the baseline length of the camera group and directly proportional to the common-view area of the camera group, the second precision index is directly proportional to the baseline length and the focal length of the camera group, and the common-view area is the area jointly captured by the plurality of cameras in the camera group;
determining weights of the first precision index and the second precision index according to the measurement accuracy requirement of the target;
obtaining a composite index of each camera group according to the first precision index, the second precision index, and the weights;
determining the target baseline according to the composite index of each camera group.
6. The method according to any one of claims 1 to 5, characterized in that the acquiring a first image and a second image comprises:
receiving a first video and a second video obtained by the multi-view camera photographing the target;
performing time synchronization processing on the first video and the second video to obtain the first image and the second image at the same moment, wherein the first image is an image frame in the first video and the second image is an image frame in the second video.
  7. 根据权利要求6所述的方法,其特征在于,所述对所述第一路视频和第二路视频进行时间同步处理,获得同一时刻的所述第一图像和所述第二图像包括:The method according to claim 6, wherein the performing time synchronization processing on the first channel of video and the second channel of video to obtain the first image and the second image at the same moment comprises:
    从所述第一路视频中获取参考帧,从所述第二路视频中获取多个运动帧,其中,所述参考帧和所述多个运动帧中包括运动物体;Obtain a reference frame from the first video, and obtain a plurality of motion frames from the second video, wherein the reference frame and the plurality of motion frames include moving objects;
    将所述参考帧与所述多个运动帧进行特征点匹配,获得所述多个运动帧中的同步帧,其中,所述同步帧中的特征点与所述参考帧中对应的特征点之间连线的平行度满足预设条件;Perform feature point matching on the reference frame and the plurality of motion frames to obtain a synchronization frame in the plurality of motion frames, wherein the feature point in the synchronization frame and the corresponding feature point in the reference frame are determined. The parallelism of the connecting lines satisfies the preset condition;
    根据所述参考帧和所述同步帧对所述第一路视频和所述第二路视频进行时间同步校正,获得同一时刻的所述第一图像和所述第二图像。Time synchronization correction is performed on the first channel of video and the second channel of video according to the reference frame and the synchronization frame to obtain the first image and the second image at the same moment.
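For illustration, the synchronization test of claims 6 and 7 exploits the geometry of a roughly rectified pair: if two frames were captured at the same instant, the lines joining matched feature points across the views are nearly parallel, whereas a time offset on a moving object skews them. A hypothetical sketch using OpenCV ORB features follows; the parallelism measure (low spread of connecting-line angles) and the threshold stand in for the preset condition, which the application does not spell out here.

```python
import cv2
import numpy as np

def find_sync_frame(reference, motion_frames, max_angle_std_deg=2.0):
    """Return the index of the motion frame best time-aligned with the
    reference frame, judged by how parallel the lines between matched
    feature points are; None if no frame meets the threshold."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    kp_r, des_r = orb.detectAndCompute(reference, None)

    best_idx, best_std = None, float("inf")
    for idx, frame in enumerate(motion_frames):
        kp_f, des_f = orb.detectAndCompute(frame, None)
        if des_r is None or des_f is None:
            continue
        matches = matcher.match(des_r, des_f)
        if len(matches) < 8:
            continue
        # Angle of the line joining each matched pair of points; for a
        # synchronized, near-rectified pair these cluster tightly.
        angles = []
        for m in matches:
            x1, y1 = kp_r[m.queryIdx].pt
            x2, y2 = kp_f[m.trainIdx].pt
            angles.append(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        spread = float(np.std(angles))
        if spread < best_std:
            best_idx, best_std = idx, spread

    return best_idx if best_std <= max_angle_std_deg else None
```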
  8. The method according to any one of claims 1 to 7, wherein the performing target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image comprises:
    inputting the first image into a detection and matching model to obtain a first detection and matching result of the first image, and inputting the second image into the detection and matching model to obtain a second detection and matching result of the second image, wherein the first detection and matching result and the second detection and matching result include target boxes and labels, a target box is used to indicate the area of a target in an image, and the labels of the same target are identical;
    obtaining the first target area according to the first detection and matching result, and obtaining the second target area according to the second detection and matching result.
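For illustration, because the detection and matching model of claim 8 gives the same physical target the same label in both views, pairing the two target areas reduces to a lookup by label. A minimal sketch with an assumed result format; the application does not specify how the model serializes its output.

```python
def pair_target_areas(first_results, second_results):
    """Pair target boxes across the two views by their shared label.
    Each result is assumed to look like
    {"label": "car_03", "box": (x, y, w, h)} -- hypothetical fields."""
    boxes_by_label = {r["label"]: r["box"] for r in second_results}
    pairs = []
    for r in first_results:
        box2 = boxes_by_label.get(r["label"])
        if box2 is not None:
            # (first target area, second target area) for one target
            pairs.append((r["box"], box2))
    return pairs
```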
  9. A target positioning system, characterized in that the system comprises:
    an acquisition unit, configured to acquire a first image and a second image, wherein the first image and the second image are obtained by a multi-eye camera photographing the same target at the same moment;
    a detection and matching unit, configured to perform target detection and matching on the first image and the second image to obtain a first target area of the first image and a second target area of the second image, wherein the first target area and the second target area include the target;
    the detection and matching unit being further configured to perform feature point detection and matching on the first target area and the second target area to obtain a feature point matching result, wherein the feature point matching result includes correspondences between the feature points in the first target area and the feature points in the second target area, and the feature points having a correspondence describe the same feature of the target; and
    a position determination unit, configured to determine the position information of the target according to the feature point matching result and the parameter information of the multi-eye camera.
  10. The system according to claim 9, wherein the parameter information includes at least the baseline length of the multi-eye camera and the focal length of the multi-eye camera;
    the position determination unit is configured to obtain parallax information of the target according to the pixel differences between the feature points having a correspondence in the feature point matching result, wherein the parallax information includes the difference between the pixel coordinates of a feature point in the first target area and the pixel coordinates of the corresponding feature point in the second target area; and
    the position determination unit is configured to determine the distance between the target and the multi-eye camera according to the parallax information of the target, the baseline length of the multi-eye camera and the focal length of the multi-eye camera, to obtain the position information of the target.
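For illustration, claim 10 states the standard triangulation relation for a rectified camera pair: with focal length f in pixels, baseline length B and disparity d between corresponding feature points, the distance is Z = f·B/d. A minimal sketch, assuming rectified images and pixel-unit disparity:

```python
def depth_from_disparity(disparity_px, baseline_m, focal_length_px):
    """Distance along the optical axis for a rectified pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# Example: a 0.5 m baseline, 1200 px focal length and 30 px disparity
# place the target 20 m from the camera.
assert depth_from_disparity(30, 0.5, 1200) == 20.0
```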
  11. The system according to claim 9 or 10, wherein the multi-eye camera includes a plurality of camera groups, each of the plurality of camera groups includes a plurality of cameras, and the system further comprises a baseline determination unit;
    the baseline determination unit is configured to acquire baseline data of the multi-eye camera, wherein the baseline data includes the baseline lengths between the plurality of cameras in each group of cameras;
    the baseline determination unit is configured to obtain a target baseline from the baseline data according to the measurement accuracy requirement of the target; and
    the acquisition unit is configured to acquire the first image and the second image according to the target baseline, wherein the first image and the second image are captured by the camera group corresponding to the target baseline.
  12. The system according to any one of claims 9 to 11, wherein:
    the baseline determination unit is configured to send a baseline adjustment request carrying the target baseline to the multi-eye camera, wherein the baseline adjustment request is used to instruct the multi-eye camera to adjust the baseline length of the camera group comprised in the multi-eye camera to the target baseline; and
    the acquisition unit is configured to receive the first image and the second image captured by the camera group corresponding to the target baseline.
  13. The system according to claim 11 or 12, wherein:
    the baseline determination unit is configured to determine a first accuracy index and a second accuracy index of each group of cameras, wherein the first accuracy index is inversely proportional to the baseline length of each group of cameras and directly proportional to the common viewing area of each group of cameras, the second accuracy index is directly proportional to the baseline length and the focal length of each group of cameras, and the common viewing area is the area jointly captured by the plurality of cameras in each group of cameras;
    the baseline determination unit is configured to determine the weights of the first accuracy index and the second accuracy index according to the measurement accuracy requirement of the target;
    the baseline determination unit is configured to obtain a composite index of each group of cameras according to the first accuracy index, the second accuracy index and the weights; and
    the baseline determination unit is configured to determine the target baseline according to the composite index of each group of cameras.
  14. The system according to any one of claims 9 to 13, wherein the system further comprises a synchronization unit;
    the synchronization unit is configured to receive a first video stream and a second video stream obtained by the multi-eye camera photographing the target; and
    the synchronization unit is configured to perform time synchronization processing on the first video stream and the second video stream to obtain the first image and the second image at the same moment, wherein the first image is an image frame in the first video stream, and the second image is an image frame in the second video stream.
  15. The system according to claim 14, wherein:
    the synchronization unit is configured to obtain a reference frame from the first video stream, and obtain a plurality of motion frames from the second video stream, wherein the reference frame and the plurality of motion frames include a moving object;
    the synchronization unit is configured to perform feature point matching between the reference frame and the plurality of motion frames to obtain a synchronization frame among the plurality of motion frames, wherein the parallelism of the lines connecting the feature points in the synchronization frame with the corresponding feature points in the reference frame satisfies a preset condition; and
    the synchronization unit is configured to perform time synchronization correction on the first video stream and the second video stream according to the reference frame and the synchronization frame to obtain the first image and the second image at the same moment.
  16. The system according to any one of claims 9 to 15, wherein:
    the detection and matching unit is configured to input the first image into a detection and matching model to obtain a first detection and matching result of the first image, and to input the second image into the detection and matching model to obtain a second detection and matching result of the second image, wherein the first detection and matching result and the second detection and matching result include target boxes and labels, a target box is used to indicate the area of a target in an image, and the labels of the same target are identical; and
    the detection and matching unit is configured to obtain the first target area according to the first detection and matching result, and to obtain the second target area according to the second detection and matching result.
  17. A computer-readable storage medium, characterized by comprising instructions which, when run on a computing device, cause the computing device to perform the method according to any one of claims 1 to 8.
  18. A computing device, characterized by comprising a processor and a memory, wherein the processor executes code in the memory to perform the method according to any one of claims 1 to 8.
  19. A computer program product, characterized by comprising a computer program which, when read and executed by a computing device, causes the computing device to perform the method according to any one of claims 1 to 8.
PCT/CN2021/139421 2020-12-31 2021-12-18 Target positioning method and system, and related device WO2022143237A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011638235.5 2020-12-31
CN202011638235 2020-12-31
CN202110567480.XA CN114693785A (en) 2020-12-31 2021-05-24 Target positioning method, system and related equipment
CN202110567480.X 2021-05-24

Publications (1)

Publication Number Publication Date
WO2022143237A1 (en)

Family

ID=82136525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139421 WO2022143237A1 (en) 2020-12-31 2021-12-18 Target positioning method and system, and related device

Country Status (2)

Country Link
CN (1) CN114693785A (en)
WO (1) WO2022143237A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409340B (en) * 2023-12-14 2024-03-22 上海海事大学 Unmanned aerial vehicle cluster multi-view fusion aerial photography port monitoring method, system and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109579868A (en) * 2018-12-11 2019-04-05 上海元城汽车技术有限公司 The outer object localization method of vehicle, device and automobile
CN110322702A (en) * 2019-07-08 2019-10-11 中原工学院 A kind of Vehicular intelligent speed-measuring method based on Binocular Stereo Vision System
CN110349221A (en) * 2019-07-16 2019-10-18 北京航空航天大学 A kind of three-dimensional laser radar merges scaling method with binocular visible light sensor
US20200134377A1 (en) * 2018-10-25 2020-04-30 Adobe Systems Incorporated Logo detection

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424353A (en) * 2022-09-07 2022-12-02 杭银消费金融股份有限公司 AI model-based service user feature identification method and system
CN115424353B (en) * 2022-09-07 2023-05-05 杭银消费金融股份有限公司 Service user characteristic identification method and system based on AI model
WO2024060981A1 (en) * 2022-09-20 2024-03-28 深圳市其域创新科技有限公司 Three-dimensional mesh optimization method, device, and storage medium
CN116819229A (en) * 2023-06-26 2023-09-29 广东电网有限责任公司 Distance measurement method, device, equipment and storage medium for power transmission line
CN117315033A (en) * 2023-11-29 2023-12-29 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium
CN117315033B (en) * 2023-11-29 2024-03-19 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium
CN117455940A (en) * 2023-12-25 2024-01-26 四川汉唐云分布式存储技术有限公司 Cloud-based customer behavior detection method, system, equipment and storage medium
CN117455940B (en) * 2023-12-25 2024-02-27 四川汉唐云分布式存储技术有限公司 Cloud-based customer behavior detection method, system, equipment and storage medium
CN117876608A (en) * 2024-03-11 2024-04-12 魔视智能科技(武汉)有限公司 Three-dimensional image reconstruction method, three-dimensional image reconstruction device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114693785A (en) 2022-07-01

Similar Documents

Publication Publication Date Title
WO2022143237A1 (en) Target positioning method and system, and related device
CN110349213B (en) Pose determining method and device based on depth information, medium and electronic equipment
JP6484729B2 (en) Unmanned aircraft depth image acquisition method, acquisition device, and unmanned aircraft
US11328479B2 (en) Reconstruction method, reconstruction device, and generation device
CN112384891B (en) Method and system for point cloud coloring
CN102997891B (en) Device and method for measuring scene depth
US8463024B1 (en) Combining narrow-baseline and wide-baseline stereo for three-dimensional modeling
CN103903263B (en) A kind of 360 degrees omnidirection distance-finding method based on Ladybug panorama camera image
CN113711276A (en) Scale-aware monocular positioning and mapping
CN115376109B (en) Obstacle detection method, obstacle detection device, and storage medium
CN115797408A (en) Target tracking method and device fusing multi-view image and three-dimensional point cloud
CN112950717A (en) Space calibration method and system
CN111882655B (en) Method, device, system, computer equipment and storage medium for three-dimensional reconstruction
WO2023083256A1 (en) Pose display method and apparatus, and system, server and storage medium
Schraml et al. An event-driven stereo system for real-time 3-D 360 panoramic vision
CN116468786A (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN115019208A (en) Road surface three-dimensional reconstruction method and system for dynamic traffic scene
CN113379801B (en) High-altitude parabolic monitoring and positioning method based on machine vision
CN113608234A (en) City data acquisition system
CN115937810A (en) Sensor fusion method based on binocular camera guidance
CN113895482B (en) Train speed measuring method and device based on trackside equipment
CN114782496A (en) Object tracking method and device, storage medium and electronic device
CN112215036B (en) Cross-mirror tracking method, device, equipment and storage medium
CN115761558A (en) Method and device for determining key frame in visual positioning
CN107610170B (en) Multi-view image refocusing depth acquisition method and system

Legal Events

Code 121 (EP): the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21913978; country of ref document: EP; kind code of ref document: A1.

Code NENP: non-entry into the national phase. Ref country code: DE.

Code 122 (EP): PCT application non-entry in the European phase. Ref document number: 21913978; country of ref document: EP; kind code of ref document: A1.