CN110889873A - Target positioning method and device, electronic equipment and storage medium - Google Patents

Target positioning method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN110889873A
Authority
CN
China
Prior art keywords
target
left view
calculating
view
disparity map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911175503.1A
Other languages
Chinese (zh)
Inventor
李子申
潘军道
吴海涛
李瑞东
刘振耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Opto Electronics of CAS
Original Assignee
Academy of Opto Electronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Academy of Opto Electronics of CAS filed Critical Academy of Opto Electronics of CAS
Priority to CN201911175503.1A priority Critical patent/CN110889873A/en
Publication of CN110889873A publication Critical patent/CN110889873A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target positioning method and device, an electronic device, and a storage medium. The method comprises the following steps: calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned; inputting the left view into a trained deep learning network and outputting a target mask in the left view; and calculating the three-dimensional space coordinates of the target to be positioned by a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view. The invention combines binocular stereo vision with deep learning: the binocular camera is used to calculate the positional deviation between corresponding points of the left and right views according to the triangulation principle, a deep learning method performs target identification on the image, and on the basis of target identification the scene targets are positioned in real time by combining three-dimensional reconstruction information. This simplifies the target positioning process, makes no distinction between primary and secondary targets, and calculates all target positions in the field of view simultaneously; the deep learning network can be trained for specific targets as well as general targets, expanding the scope of target positioning applications.

Description

Target positioning method and device, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of target positioning, and particularly relates to a target positioning method and device, electronic equipment and a storage medium.
Background
The airborne photoelectric imaging platform is all-weather photoelectric reconnaissance equipment which integrates high-precision measurement equipment such as a visible light camera, a thermal infrared imager, a television tracker, a laser range finder, an angle sensor and the like and is used for realizing functions such as aerial reconnaissance, target aiming, tracking, positioning and the like.
An airborne photoelectric platform generally adopts a single-point positioning method, in which the target pointed at by the cross-hair at the image center is positioned through an attitude-measurement/laser-ranging positioning model. Positioning multiple targets therefore requires frequently changing the spatial orientation of the airborne photoelectric platform and positioning repeatedly, which is time-consuming and makes real-time or quasi-real-time positioning of multiple targets difficult to achieve.
In the prior art, a multi-target autonomous positioning model based on pixel sight-line vectors has been proposed to position multiple targets simultaneously in real time or quasi-real time, establishing a multi-target autonomous positioning system for an airborne photoelectric imaging platform. The method obtains the pixel coordinates of each target in the field of view through a target detection algorithm, constructs a sight-line vector for each target according to the imaging principle of a single area-array Charge Coupled Device (CCD) sensor, calculates the pixel sight-line angle between each target and the main target at the image center, calculates the angle and distance relation between each target and the airborne photoelectric platform by combining the measured azimuth angle, elevation angle, and distance of the main target relative to the photoelectric platform, obtains the position and attitude information of the carrier aircraft using a Global Positioning System (GPS) and attitude measurement technology, and calculates the geodetic coordinates of multiple targets in a single image through homogeneous coordinate transformation.
After the photoelectric platform searches out a ground target, the main target is locked at the center of the field of view; the azimuth and elevation angles of the visual axis relative to the navigation attitude measurement system and the distance between the main target and the photoelectric platform are output, and at the same time the positioning data output by the GPS positioning system and the platform attitude data output by the navigation attitude measurement system are collected for coordinate conversion, from which the geodetic coordinates of the main target are calculated. For the other targets in the field of view (called secondary targets herein), the target detection module outputs their pixel coordinates, constructs a sight-line vector for each target and calculates its pixel sight-line angle relative to the primary target, calculates the distance and angle relation between each target and the photoelectric platform by combining the azimuth angle, elevation angle, and distance of the primary target relative to the photoelectric platform, and outputs the geodetic coordinates of the secondary targets through homogeneous coordinate transformation.
The target detection module simultaneously detects the pixel coordinates of a plurality of static or moving targets by adopting an image segmentation method, a frame difference method or an optical flow method.
Disclosure of Invention
To overcome the above problems, or at least partially solve them, embodiments of the present invention provide a target positioning method and apparatus, an electronic device, and a storage medium.
According to a first aspect of the embodiments of the present invention, there is provided a target positioning method, including:
calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned;
inputting the left view into the trained deep learning network, and outputting a target mask in the left view;
and calculating the three-dimensional space coordinates of the target to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view.
On the basis of the technical scheme, the invention can be improved as follows.
Preferably, the calculating the disparity map based on the left view and the right view captured by the binocular camera and containing the target to be positioned includes:
calibrating the binocular camera to obtain internal and external parameters of the binocular camera;
performing stereo rectification on the left view and the right view based on the internal and external parameters of the binocular camera, so that the left view and the right view keep line alignment;
and matching the left view and the right view by adopting a stereo matching method based on the corrected left view and the right view to obtain a disparity map.
Preferably, the stereo matching method is an efficient large-scale stereo matching method.
Preferably, the deep learning network is trained by:
training the deep learning network based on a left view training set, wherein the left view training set comprises a plurality of left views and pixel point positions of targets in each left view, the pixel point positions of the targets form a target mask, and the left view is captured by the binocular camera.
Preferably, the calculating three-dimensional space coordinates of the object to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the object mask in the left view includes:
calculating a reprojection matrix according to a stereoscopic vision principle and internal and external parameters of the binocular camera;
and calculating to obtain the three-dimensional space coordinate of the target to be positioned based on the reprojection matrix, the parallax map and the target mask in the left view.
Preferably, the calculating the three-dimensional space coordinate of the object to be positioned based on the reprojection matrix, the disparity map, and the object mask in the left view includes:
calculating the three-dimensional space coordinates of the object by the following formula:
$$ Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} $$
where Q is the reprojection matrix, (x, y) are the pixel coordinates of the target to be positioned in the left view, d is the disparity of the disparity map at position (x, y), and (X/W, Y/W, Z/W) are the corresponding three-dimensional space coordinates of the target to be positioned in the scene.
Preferably, the deep learning network is a Mask RCNN deep neural network.
Preferably, the left view and the right view contain one or more targets to be positioned.
According to a second aspect of the embodiments of the present invention, there is provided an object locating apparatus, including:
the first calculation module is used for calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned;
the output module is used for inputting the left view into the trained deep learning network and outputting a target mask in the left view;
and the second calculation module is used for calculating the three-dimensional space coordinates of the target to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view.
According to a third aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, is able to perform the target positioning method provided in any one of the various possible implementations of the first aspect.
According to a fourth aspect of embodiments of the present invention, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the target location method provided in any one of the various possible implementations of the first aspect.
The embodiments of the present invention provide a target positioning method and device, an electronic device, and a storage medium, which combine binocular stereo vision with deep learning: a binocular camera is used to calculate the positional deviation between corresponding points of the left and right views according to the triangulation principle, a deep learning method performs target identification on the image, and on the basis of target identification the scene targets are positioned in real time by combining three-dimensional reconstruction information. This simplifies the target positioning process, makes no distinction between primary and secondary targets, and calculates all target positions in the field of view simultaneously; the deep learning network can be trained for specific targets as well as general targets, expanding the scope of target positioning applications.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic overall flow chart of a target positioning method according to an embodiment of the present invention;
fig. 2 is a flowchart of acquiring a disparity map of left and right views according to an embodiment of the present invention;
FIG. 3-1 is a schematic diagram of a stereo rectification model;
FIG. 3-2 is a schematic view of a binocular optical axis parallel model;
FIG. 4 is a flow chart of a three-dimensional reconstruction projection provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of an overall structure of a target positioning apparatus according to an embodiment of the present invention;
fig. 6 is a schematic view of an overall structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In an embodiment of the present invention, a target positioning method is provided, and fig. 1 is a schematic overall flow chart of the target positioning method provided in the embodiment of the present invention, where the method includes:
calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned;
inputting the left view into the trained deep learning network, and outputting a target mask in the left view;
and calculating the three-dimensional space coordinates of the target to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view.
It can be understood that in the embodiment of the present invention a binocular camera is adopted to capture images containing the target to be positioned: a left view and a right view containing the target are captured by the binocular camera, a disparity map is calculated from the left and right views, and the target mask corresponding to the target to be positioned in the left view is extracted by the trained deep learning network. Finally, based on the disparity map and the target mask in the left view, the three-dimensional space coordinates of the target to be positioned are calculated by a three-dimensional reconstruction projection method.
The embodiment of the invention combines binocular stereo vision with deep learning: a binocular camera is used to calculate the positional deviation between corresponding points of the left and right views according to the triangulation principle, a deep learning method performs target identification on the image, and on the basis of target identification the scene targets are positioned in real time by combining three-dimensional reconstruction information. This simplifies the target positioning process, makes no distinction between primary and secondary targets, and calculates all target positions in the field of view simultaneously; the deep learning network can be trained for specific targets as well as general targets, expanding the scope of target positioning applications.
Referring to fig. 2, on the basis of the above embodiment, in the embodiment of the present invention, the calculating a disparity map based on the left view and the right view captured by the binocular camera and including the target to be positioned includes:
calibrating the binocular camera to obtain internal and external parameters of the binocular camera;
performing stereo rectification on the left view and the right view based on the internal and external parameters of the binocular camera, so that the left view and the right view keep line alignment;
and matching the left view and the right view by adopting a stereo matching method based on the corrected left view and the right view to obtain a disparity map.
It can be understood that, in the embodiment of the present invention, the method for calculating the disparity map from the left and right views containing the target comprises a first step of calibrating the binocular camera (the binocular optical-axis-parallel model) to obtain its internal and external parameters. In the embodiment of the invention, the binocular camera is calibrated directly with the MATLAB calibration toolbox, yielding the internal parameters of the left and right cameras and the pose of the right camera relative to the left camera.
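While the embodiment uses the MATLAB calibration toolbox, equivalent internal and external parameters can also be obtained with OpenCV. The following is a minimal sketch for illustration only, not part of the disclosure; the chessboard geometry and image file names are assumptions.

import glob
import cv2
import numpy as np

# Hedged sketch: an OpenCV stand-in for the MATLAB calibration toolbox.
# Board size and file names below are illustrative assumptions.
PATTERN = (9, 6)  # inner corners of an assumed chessboard calibration board
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
    gray_l = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    gray_r = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    ok_l, corners_l = cv2.findChessboardCorners(gray_l, PATTERN)
    ok_r, corners_r = cv2.findChessboardCorners(gray_r, PATTERN)
    if ok_l and ok_r:
        obj_pts.append(objp)
        left_pts.append(corners_l)
        right_pts.append(corners_r)

size = gray_l.shape[::-1]
# Internal parameters of each camera, then the pose (R, T) of the right
# camera relative to the left camera -- the "internal and external parameters".
_, K_l, D_l, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, K_r, D_r, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
_, K_l, D_l, K_r, D_r, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K_l, D_l, K_r, D_r, size,
    flags=cv2.CALIB_FIX_INTRINSIC)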
The second step is to perform stereo rectification on the left view and the right view based on the internal and external parameters of the binocular camera, so that the two views remain row-aligned. Specifically, in practical applications binocular stereo vision requires image distortion correction; the left and right views are stereo-rectified into a standard optical-axis-parallel model in which the two imaging planes are coplanar and row-aligned, so that matching points need only be searched along image rows, laying the foundation for stereo matching.
The third step is to match the rectified left and right views with a stereo matching method to obtain the disparity map. To ensure real-time performance and reliability, the embodiment of the invention adopts the Efficient Large-Scale Stereo Matching (ELAS) method, a Bayesian approach that can compute accurate disparity maps for high-resolution images at near-real-time frame rates.
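As a concrete illustration of steps two and three, the sketch below rectifies the views and computes a disparity map. ELAS itself is not bundled with OpenCV, so semi-global block matching (StereoSGBM) is used here purely as a stand-in matcher; K_l, D_l, K_r, D_r, R, T, and size come from the calibration sketch above, and left_img / right_img denote the captured views.

import cv2

# Hedged sketch of stereo rectification followed by stereo matching.
# StereoSGBM stands in for ELAS, which OpenCV does not provide.
R_l, R_r, P_l, P_r, Q, _, _ = cv2.stereoRectify(K_l, D_l, K_r, D_r, size, R, T)
map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, D_l, R_l, P_l, size, cv2.CV_32FC1)
map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, D_r, R_r, P_r, size, cv2.CV_32FC1)

# After remapping, the two views are row-aligned, so correspondence search
# reduces to a one-dimensional search along each image row.
left_rect = cv2.remap(left_img, map_lx, map_ly, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, map_rx, map_ry, cv2.INTER_LINEAR)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left_rect, right_rect).astype("float32") / 16.0

Note that cv2.stereoRectify also returns the reprojection matrix Q that is used in the three-dimensional reconstruction step described later.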
On the basis of the above embodiments, in the embodiments of the present invention, the deep learning network is trained in the following manner:
training the deep learning network based on a left view training set, wherein the left view training set comprises a plurality of left views and pixel point positions of targets in each left view, the pixel point positions of the targets form a target mask, and the left view is captured by the binocular camera.
It is understood that the binocular optical-axis-parallel model is the simplest stereoscopic vision model, and it is modeled here in order to obtain the three-dimensional coordinates of a point in space. In practice, it is difficult to make the imaging planes of the left and right cameras strictly coplanar through camera placement alone, so stereo rectification is necessary. Schematic diagrams of binocular stereo imaging with the two camera optical axes parallel are shown in FIG. 3-1 and FIG. 3-2: FIG. 3-1 is a schematic diagram of the stereo rectification model for the left and right views, and FIG. 3-2 is a schematic diagram of the binocular optical-axis-parallel model.
The cameras conform to the pinhole imaging model, and the baseline distance T between the left and right cameras is constant. Assume the two cameras are identical, with focal lengths $f_1 = f_2 = f$, and that the principal points $c_l$ and $c_r$ (the intersections of the optical axes with the image planes) have been calibrated to have the same pixel coordinates in the left and right images. The optical centers of the left and right cameras serve as the origins $O_l$ and $O_r$ of the left-eye and right-eye camera coordinate systems; the line connecting them is their common x-axis, their optical axes are the respective z-axes, and the y-axes are perpendicular to the xz-plane (not shown in the schematic). $S_l$ and $S_r$ in FIG. 3-2 are the projections of the left and right imaging plane coordinate systems on the x-axis; each imaging plane coordinate system takes the top-left vertex of the image as its origin. A point $P(X_w, Y_w, Z_w)$ in the physical world projects into the left-eye and right-eye image plane coordinate systems at $(x_l, y_l)$ and $(x_r, y_r)$, respectively. From FIG. 3-2:

$$ dx_l = x_l - c_l $$
$$ dx_r = x_r - c_r $$

Let $d = dx_l - dx_r$; since $c_l$ and $c_r$ share the same pixel coordinates, $d = x_l - x_r$ is the disparity.

From similar triangles:

$$ \frac{T - (x_l - x_r)}{Z - f} = \frac{T}{Z} $$

from which it follows that:

$$ Z = \frac{fT}{x_l - x_r} = \frac{fT}{d} $$

When the left-eye camera coordinate system is taken as the world coordinate system WCS (world coordinate system):

$$ X = \frac{Z \cdot x_l}{f} = \frac{T \cdot x_l}{d} $$

and in the same way:

$$ Y = \frac{Z \cdot y_l}{f} = \frac{T \cdot y_l}{d} $$

where $x_l$ and $x_r$ are in millimeters. In practical applications, pixel coordinates are used instead:

$$ x_l = (x_{pl} - c_{pl}) \cdot S_x $$
$$ x_r = (x_{pr} - c_{pr}) \cdot S_x $$
$$ Z = \frac{fT}{(x_{pl} - x_{pr}) \cdot S_x} $$

where $x_{pl}$ and $x_{pr}$ are the coordinate positions in pixels, $c_{pl}$ and $c_{pr}$ are the pixel coordinates of the left and right view centers (in pixels), and $S_x$ is the pixel size in millimeters.

From the above analysis, the three-dimensional space coordinates of any spatial point can be obtained once the pixel coordinates of that point in the left and right views are known.
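As a quick numerical check of the last formula (the numbers are illustrative assumptions, not values from the disclosure): with focal length $f = 4$ mm, baseline $T = 120$ mm, pixel size $S_x = 0.006$ mm, and a measured pixel disparity of $x_{pl} - x_{pr} = 40$ pixels,

$$ Z = \frac{fT}{(x_{pl} - x_{pr}) \cdot S_x} = \frac{4 \times 120}{40 \times 0.006} = \frac{480}{0.24} = 2000 \ \text{mm} = 2 \ \text{m}. $$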
Therefore, to locate the target in the scene, that is, to obtain the three-dimensional space coordinates of the target in the scene, it is necessary to first obtain the position coordinates of the pixel points of each point of the target in the left and right views.
In the embodiment of the invention, the target in the view and its pixel coordinates are extracted by a deep learning network. The deep learning network is trained on left views: the target in each left view is extracted and its pixel coordinates are annotated, the annotated pixel coordinates of the target form a target mask, the left views together with the target mask in each left view form a left view training set, and the deep learning network is trained on this set to obtain the trained deep learning network.
For the target to be positioned, a left view and a right view of the target are captured with the binocular camera, the left view is input into the trained deep learning network, and the target mask of the target in the left view is output.
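A minimal inference sketch follows, in which a pretrained torchvision Mask R-CNN stands in for the network trained on the left-view training set described above; the confidence and mask thresholds are illustrative assumptions.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Hedged sketch: a pretrained torchvision Mask R-CNN stands in for the
# network trained on the left-view training set.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

with torch.no_grad():
    # left_rect is the rectified left view (H x W x 3, uint8) from earlier.
    pred = model([to_tensor(left_rect)])[0]

keep = pred["scores"] > 0.7                       # assumed confidence threshold
# Each boolean mask marks the pixel positions of one detected target.
masks = (pred["masks"][keep, 0] > 0.5).cpu().numpy()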
Referring to fig. 4, on the basis of the foregoing embodiments, in the embodiment of the present invention, the calculating the three-dimensional space coordinate of the target to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view includes:
calculating a reprojection matrix according to a stereoscopic vision principle and internal and external parameters of the binocular camera;
and calculating to obtain the three-dimensional space coordinate of the target to be positioned based on the reprojection matrix, the parallax map and the target mask in the left view.
It can be understood that the above embodiment obtains the target mask of the target in the left view, and the three-dimensional space coordinates of the target to be positioned are then obtained by a three-dimensional reconstruction projection method. Three-dimensional reconstruction of the environment is usually completed in a non-contact manner, and non-contact three-dimensional reconstruction methods are divided into two types according to how the depth information of the target object is acquired: active and passive. Active three-dimensional reconstruction directly acquires the depth information of target objects in the environment by emitting light or energy sources such as laser and infrared toward them, and mainly includes the moire fringe method, the time-of-flight (TOF) method, and the structured light method. In contrast, passive three-dimensional reconstruction uses no specific light source: it relies on ambient illumination such as sunlight reflected from the scene, acquires image information of the object with cameras, and then realizes three-dimensional modeling of the object through a specific algorithm. The embodiment of the invention adopts a passive three-dimensional modeling method. The specific process is to calculate a reprojection matrix according to the stereoscopic vision principle and the internal and external parameters of the binocular camera, and then to calculate the three-dimensional space coordinates of the target based on the reprojection matrix, the disparity map, and the target mask in the left view.
On the basis of the foregoing embodiments, in the embodiments of the present invention, calculating the three-dimensional space coordinate of the target to be positioned based on the reprojection matrix, the disparity map, and the target mask in the left view includes:
calculating the three-dimensional space coordinates of the object by the following formula:
$$ Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} $$
where Q is the reprojection matrix, (x, y) are the pixel coordinates of the target to be positioned in the left view, d is the disparity of the disparity map at position (x, y), and (X/W, Y/W, Z/W) are the corresponding three-dimensional space coordinates of the target to be positioned in the scene.
The target mask of the target to be positioned in the left view is extracted by the deep learning network (the coordinates of every pixel point of the target can be read from the target mask), and the three-dimensional space coordinates corresponding to each pixel point of the target are then calculated from those coordinates according to the above formula.
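A minimal sketch of this step, assuming OpenCV: cv2.reprojectImageTo3D applies the Q-matrix formula above to every pixel of the disparity map, and the target mask then selects the target's pixels. disparity, Q, and masks come from the earlier sketches; the centroid is just one simple way to summarize the per-pixel coordinates.

import numpy as np
import cv2

# reprojectImageTo3D applies Q to [x, y, d(x, y), 1]^T for every pixel and
# returns the homogeneous-normalized (X/W, Y/W, Z/W) coordinates.
points_3d = cv2.reprojectImageTo3D(disparity, Q)   # shape (H, W, 3)

target_mask = masks[0]                  # boolean mask of one target
valid = target_mask & (disparity > 0)   # discard pixels with no valid match
target_points = points_3d[valid]        # (N, 3) coordinates of target pixels
centroid = target_points.mean(axis=0)   # one simple single-point estimate
print("target centroid (X, Y, Z):", centroid)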
On the basis of the above embodiments, in the embodiment of the present invention, the left view and the right view contain one or more targets to be positioned. When there are multiple targets to be positioned in a scene, the disparity map is calculated based on the left and right views captured by the binocular camera and containing the targets; the left view is input into the trained deep learning network, and a target mask is output for each target in the left view; and based on the disparity map and the target mask of each target in the left view, the three-dimensional space coordinates of each target are calculated by the three-dimensional reconstruction projection method. In this way every target in the scene is positioned, realizing the positioning of multiple targets in the scene, as shown in the sketch below.
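Extending the earlier sketch, looping the same reprojection over every detected mask localizes all targets in the field of view at once, with no primary/secondary distinction:

# Hedged continuation of the sketch above: one 3-D position per detected mask.
for i, m in enumerate(masks):
    pts = points_3d[m & (disparity > 0)]
    if len(pts) > 0:
        print(f"target {i}: centroid {pts.mean(axis=0)}")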
In another embodiment of the invention, an object localization apparatus is provided for implementing the methods of the preceding embodiments. Therefore, the descriptions and definitions in the embodiments of the target positioning method described above can be used for understanding the execution modules in the embodiments of the present invention. Fig. 5 is a schematic diagram of an overall structure of an object locating apparatus according to an embodiment of the present invention, which includes a first calculating module 51, an output module 52, and a second calculating module 53.
The first calculation module 51 is used for calculating a disparity map based on a left view and a right view which are captured by the binocular camera and contain a target to be positioned;
an output module 52, configured to input the left view into the trained deep learning network, and output a target mask in the left view;
and the second calculating module 53 is configured to calculate three-dimensional space coordinates of the object to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the object mask in the left view.
The target positioning device provided in the embodiments of the present invention corresponds to the target positioning methods provided in the embodiments described above, and the relevant technical features of the provided target positioning device may refer to the relevant technical features of the target positioning method, which is not described herein again.
Fig. 6 illustrates a physical structure diagram of an electronic device, which, as shown in fig. 6, may include: a processor (processor) 01, a communication interface (Communications Interface) 02, a memory (memory) 03, and a communication bus 04, where the processor 01, the communication interface 02, and the memory 03 communicate with one another through the communication bus 04. The processor 01 may call logic instructions in the memory 03 to execute the following method: calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned; inputting the left view into the trained deep learning network, and outputting a target mask in the left view; and calculating the three-dimensional space coordinates of the target to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view.
In addition, when sold or used as an independent product, the logic instructions in the memory 03 may be implemented in the form of a software functional unit and stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned; inputting the left view into the trained deep learning network, and outputting a target mask in the left view; and calculating the three-dimensional space coordinates of the target to be positioned by using a three-dimensional reconstruction projection method based on the target mask in the disparity map and the left view.
According to the target positioning method and device, electronic equipment, and storage medium described above, a binocular camera (binocular optical-axis-parallel model) is used to acquire the left and right views in real time and perform stereo rectification; the positional deviation between corresponding points of the two views is calculated according to the triangulation principle, specific target identification is performed on the image after scene information is acquired, and on the basis of target identification the scene targets are positioned in real time by combining three-dimensional reconstruction information. The method has the following advantages:
the binocular vision simulates the process of human eyes for perceiving the target object information in the space, and three-dimensional information of a space point is obtained through coordinates of one point in the space on left and right imaging planes on the basis of parallax and a triangular geometrical relationship by utilizing two cameras; compared with other devices, the binocular vision three-dimensional reconstruction does not need to add complex light source equipment, and has the advantages of reliability, convenience, appropriate precision, low cost, accordance with popular requirements and the like.
Detecting the pixel position of a target only requires acquiring image information of the target in advance and annotating it (marking the pixel coordinates of the target in the image). After a large number of acquired target images have been annotated, they are used to train the deep learning network until a parameter model meeting the requirements is obtained; the trained parameter model is then applied to newly input images to produce target detection results, the output being the pixel coordinates of the target in the left view.
Target detection based on deep learning is the mainstream detection approach in the current computer vision field; it relies on the hierarchical feature representations of images learned by a multilayer neural network and achieves higher accuracy than traditional detection methods. Binocular vision is combined with a deep learning network: the network identifies the target in the image and outputs its pixel position, and real-time positioning of the target is finally completed by combining the binocular three-dimensional reconstruction information. Mask RCNN extracts the target and its pixel positions, and the target position is solved by three-dimensional reprojection combined with binocular stereo vision, with a small amount of computation.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of locating an object, comprising:
calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned;
inputting the left view into the trained deep learning network, and outputting a target mask in the left view;
and calculating the three-dimensional space coordinates of the target to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view.
2. The method for locating the target according to claim 1, wherein the calculating the disparity map based on the left view and the right view captured by the binocular camera and containing the target to be located comprises:
calibrating the binocular camera to obtain internal and external parameters of the binocular camera;
performing stereo rectification on the left view and the right view based on the internal and external parameters of the binocular camera, so that the left view and the right view keep line alignment;
and matching the left view and the right view by adopting a stereo matching method based on the corrected left view and the right view to obtain a disparity map.
3. The method of claim 2, wherein the stereo matching method is an efficient large-scale stereo matching method.
4. The method of claim 1, wherein the deep learning network is trained by:
training the deep learning network based on a left view training set, wherein the left view training set comprises a plurality of left views and pixel point positions of targets in each left view, the pixel point positions of the targets form a target mask, and the left view is captured by the binocular camera.
5. The method for locating an object according to claim 1, wherein the calculating three-dimensional space coordinates of the object to be located by using a three-dimensional reconstruction projection method based on the disparity map and the object mask in the left view comprises:
calculating a reprojection matrix according to a stereoscopic vision principle and internal and external parameters of the binocular camera;
and calculating to obtain the three-dimensional space coordinate of the target to be positioned based on the reprojection matrix, the parallax map and the target mask in the left view.
6. The method of claim 5, wherein the calculating three-dimensional space coordinates of the object to be positioned based on the reprojection matrix, the disparity map, and the object mask in the left view comprises:
calculating the three-dimensional space coordinates of the object by the following formula:
$$ Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} $$
where Q is the reprojection matrix, (x, y) are the pixel coordinates of the target to be positioned in the left view, d is the disparity of the disparity map at position (x, y), and (X/W, Y/W, Z/W) are the corresponding three-dimensional space coordinates of the target to be positioned in the scene.
7. The method of claim 1 or 4, wherein the deep learning network is a Mask RCNN deep neural network.
8. The method as claimed in claim 1, wherein the left view and the right view contain one or more targets to be located.
9. An object positioning device, comprising:
the first calculation module is used for calculating a disparity map based on a left view and a right view which are captured by a binocular camera and contain a target to be positioned;
the output module is used for inputting the left view into the trained deep learning network and outputting a target mask code in the left view;
and the second calculation module is used for calculating the three-dimensional space coordinates of the target to be positioned by using a three-dimensional reconstruction projection method based on the disparity map and the target mask in the left view.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the object localization method as claimed in any one of claims 1 to 8 are implemented by the processor when executing the program.
CN201911175503.1A 2019-11-26 2019-11-26 Target positioning method and device, electronic equipment and storage medium Pending CN110889873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911175503.1A CN110889873A (en) 2019-11-26 2019-11-26 Target positioning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911175503.1A CN110889873A (en) 2019-11-26 2019-11-26 Target positioning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110889873A true CN110889873A (en) 2020-03-17

Family

ID=69748906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911175503.1A Pending CN110889873A (en) 2019-11-26 2019-11-26 Target positioning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110889873A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111413597A (en) * 2020-03-31 2020-07-14 北方夜视技术股份有限公司 Ultraviolet, infrared and visible light integrated high-voltage power transformation equipment detection method
CN111862758A (en) * 2020-09-02 2020-10-30 思迈(青岛)防护科技有限公司 Cardio-pulmonary resuscitation training and checking system and method based on artificial intelligence
CN111951332A (en) * 2020-07-20 2020-11-17 燕山大学 Glasses design method based on sight estimation and binocular depth estimation and glasses thereof
CN113658274A (en) * 2021-08-23 2021-11-16 海南大学 Individual spacing automatic calculation method for primate species behavior analysis
CN113870647A (en) * 2021-11-19 2021-12-31 山西宁志科技有限公司 Teaching training platform of visual identification system
CN115144879A (en) * 2022-07-01 2022-10-04 燕山大学 Multi-machine multi-target dynamic positioning system and method
CN115950436A (en) * 2023-03-13 2023-04-11 南京汽车人信息技术有限公司 Method and system for positioning moving object in given space and storage medium
CN116309849A (en) * 2023-05-17 2023-06-23 新乡学院 Crane positioning method based on visual radar

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103337094A (en) * 2013-06-14 2013-10-02 西安工业大学 Method for realizing three-dimensional reconstruction of movement by using binocular camera
CN103868460A (en) * 2014-03-13 2014-06-18 桂林电子科技大学 Parallax optimization algorithm-based binocular stereo vision automatic measurement method
CN106910222A (en) * 2017-02-15 2017-06-30 中国科学院半导体研究所 Face three-dimensional rebuilding method based on binocular stereo vision
CN108491810A (en) * 2018-03-28 2018-09-04 武汉大学 Vehicle limit for height method and system based on background modeling and binocular vision

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103337094A (en) * 2013-06-14 2013-10-02 西安工业大学 Method for realizing three-dimensional reconstruction of movement by using binocular camera
CN103868460A (en) * 2014-03-13 2014-06-18 桂林电子科技大学 Parallax optimization algorithm-based binocular stereo vision automatic measurement method
CN106910222A (en) * 2017-02-15 2017-06-30 中国科学院半导体研究所 Face three-dimensional rebuilding method based on binocular stereo vision
CN108491810A (en) * 2018-03-28 2018-09-04 武汉大学 Vehicle limit for height method and system based on background modeling and binocular vision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
田萱等 (Tian Xuan et al.): 《基于深度学习的图像语义分割技术》 [Image Semantic Segmentation Technology Based on Deep Learning], 31 May 2019 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111413597A (en) * 2020-03-31 2020-07-14 北方夜视技术股份有限公司 Ultraviolet, infrared and visible light integrated high-voltage power transformation equipment detection method
CN111413597B (en) * 2020-03-31 2022-02-15 北方夜视技术股份有限公司 Ultraviolet, infrared and visible light integrated high-voltage power transformation equipment detection method
CN111951332A (en) * 2020-07-20 2020-11-17 燕山大学 Glasses design method based on sight estimation and binocular depth estimation and glasses thereof
CN111951332B (en) * 2020-07-20 2022-07-19 燕山大学 Glasses design method based on sight estimation and binocular depth estimation and glasses thereof
CN111862758A (en) * 2020-09-02 2020-10-30 思迈(青岛)防护科技有限公司 Cardio-pulmonary resuscitation training and checking system and method based on artificial intelligence
CN113658274A (en) * 2021-08-23 2021-11-16 海南大学 Individual spacing automatic calculation method for primate species behavior analysis
CN113658274B (en) * 2021-08-23 2023-11-28 海南大学 Automatic individual spacing calculation method for primate population behavior analysis
CN113870647A (en) * 2021-11-19 2021-12-31 山西宁志科技有限公司 Teaching training platform of visual identification system
CN115144879A (en) * 2022-07-01 2022-10-04 燕山大学 Multi-machine multi-target dynamic positioning system and method
CN115950436A (en) * 2023-03-13 2023-04-11 南京汽车人信息技术有限公司 Method and system for positioning moving object in given space and storage medium
CN116309849A (en) * 2023-05-17 2023-06-23 新乡学院 Crane positioning method based on visual radar
CN116309849B (en) * 2023-05-17 2023-08-25 新乡学院 Crane positioning method based on visual radar

Similar Documents

Publication Publication Date Title
CN110889873A (en) Target positioning method and device, electronic equipment and storage medium
CN110296691B (en) IMU calibration-fused binocular stereo vision measurement method and system
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN109360240B (en) Small unmanned aerial vehicle positioning method based on binocular vision
CN109993793B (en) Visual positioning method and device
CN110044300B (en) Amphibious three-dimensional vision detection device and detection method based on laser
CN110176032B (en) Three-dimensional reconstruction method and device
CN107886477A (en) Unmanned neutral body vision merges antidote with low line beam laser radar
CN106408601B (en) A kind of binocular fusion localization method and device based on GPS
CN104376552A (en) Virtual-real registering algorithm of 3D model and two-dimensional image
CN202362833U (en) Binocular stereo vision-based three-dimensional reconstruction device of moving vehicle
CN111862180B (en) Camera set pose acquisition method and device, storage medium and electronic equipment
CN112837207B (en) Panoramic depth measurement method, four-eye fisheye camera and binocular fisheye camera
CN102072706A (en) Multi-camera positioning and tracking method and system
CN109425348A (en) A kind of while positioning and the method and apparatus for building figure
CN108764080B (en) Unmanned aerial vehicle visual obstacle avoidance method based on point cloud space binarization
CN111127540B (en) Automatic distance measurement method and system for three-dimensional virtual space
Jin et al. An Indoor Location‐Based Positioning System Using Stereo Vision with the Drone Camera
CN116128966A (en) Semantic positioning method based on environmental object
CN112330747B (en) Multi-sensor combined detection and display method based on unmanned aerial vehicle platform
CN210986289U (en) Four-eye fisheye camera and binocular fisheye camera
Ye et al. A calibration trilogy of monocular-vision-based aircraft boresight system
CN117115271A (en) Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process
Wu et al. Passive ranging based on planar homography in a monocular vision system
CN113034615B (en) Equipment calibration method and related device for multi-source data fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200317

RJ01 Rejection of invention patent application after publication