CN117315033B - Neural network-based identification positioning method and system and storage medium - Google Patents

Neural network-based identification positioning method and system and storage medium

Info

Publication number
CN117315033B
CN117315033B (application CN202311608176.0A)
Authority
CN
China
Prior art keywords
neural network
aiml
feature point
binocular
matching
Prior art date
Legal status
Active
Application number
CN202311608176.0A
Other languages
Chinese (zh)
Other versions
CN117315033A (en)
Inventor
高于科
张腾宇
叶杨笙
赵越
Current Assignee
Shanghai Xiangong Intelligent Technology Co ltd
Original Assignee
Shanghai Xiangong Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xiangong Intelligent Technology Co ltd
Priority to CN202311608176.0A
Publication of CN117315033A
Application granted
Publication of CN117315033B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The invention provides a neural network-based identification and positioning method, system, and storage medium, wherein the method comprises the following steps: collecting a reference image ref of the identification object and establishing a mask; collecting binocular images aimL and aimR containing the identification object; taking ref and aimL as input, obtaining their feature point and descriptor sets F_ref and F_aimL through image feature extraction, and obtaining the feature point matching results M_ref and M_aimL of ref and aimL through feature matching; recording the set of feature points of M_ref inside the mask region as P_ref, and the set of matching feature points of P_ref in M_aimL as P_aimL; performing binocular stereo correction on aimL and aimR to obtain a perspective transformation matrix Q for binocular stereo matching, obtaining the depth value z of each matched pixel point and thereby the spatial location (x, y, z) of all feature point pixels in P_aimL. The identification target is thus located through abstract feature points, giving the method greater universality.

Description

Neural network-based identification positioning method and system and storage medium
Technical Field
The present invention relates to positioning technology, and in particular to a method, a system, and a storage medium for identification and positioning through neural-network-based target feature matching.
Background
At present, two schemes are mainly used for visual identification and object positioning with a camera. The first trains a neural network on an annotated data set, so that a pre-trained neural network model can identify a target object in an image. The second performs three-dimensional reconstruction of image pixel points to obtain the spatial coordinate positions of the target object's pixels, thereby positioning the object.
For the first scheme, because the trained neural network model depends on the data set and its annotation, the process of providing a data set, annotating, and training must be repeated for every different target object, and variation factors such as illumination and viewing angle must all be considered during annotation and training, so the identification and positioning scheme is difficult to generalize. Moreover, moving from merely identifying an object to identifying the target object region can increase the resources consumed by the neural network geometrically. For the second scheme, at least the pixel region of the target object must first be identified, and the accuracy of the identified pixel region also strongly affects the positioning of the object.
Disclosure of Invention
Therefore, the main purpose of the invention is to provide a neural network-based identification and positioning method, system, and storage medium, so that identification targets can be located through abstract feature points, overcoming the interference caused by illumination and viewing-angle changes and achieving greater universality.
In order to achieve the above object, according to one aspect of the present invention, there is provided a neural network-based identification and positioning method, including the steps of:
step S100, collecting a reference image ref of the identification object, and establishing a mask corresponding to the identification object; collecting binocular images aimL and aimR containing the identification object;
step S200, through an image feature extraction neural network model, respectively reasoning on the reference image ref to obtain its feature point and descriptor set F_ref, and reasoning on one of the target images, aimL, to obtain the feature point set F_aimL;
Step S300 willAndinputting a feature matching neural network model for reasoning, and respectively obtaining the feature point matching results of ref and aimLThe method comprises the steps of carrying out a first treatment on the surface of the And recordThe set of feature points in the identifier mask region isA kind of electronic deviceAt the position ofThe corresponding set of the matching characteristic points is
Step S400 performs binocular stereo correction on aimL and aimR to obtain a perspective transformation matrix Q for binocular stereo matching to obtain a depth value Z of a matched pixel point, thereby obtaining a corresponding pixel pointSpatial location of all feature point pixels in (3)
In a possible preferred embodiment, the neural network-based identification positioning method further includes:
step S500, when the binocular stereo correction in step S400 fails or the binocular stereo matching result is invalid, calculating the matched feature points P_L, P_R in the binocular images aimL and aimR and, after normalization, calculating the corresponding depth value according to the parallax and the camera baseline distance so as to recover the spatial position information of the feature point pixels.
In a possibly preferred embodiment, the calculation of the matched feature points P_L, P_R in the binocular images in step S500 comprises:
step S510, inputting aimL and aimR into the image feature extraction neural network model to respectively obtain the feature point sets F_L and F_R of the first-eye and second-eye images containing the identification object, and inputting them into the feature matching neural network model to obtain the feature point matching results M_L, M_R; and recording the feature points of M_L lying within the minimum range of the pixel coordinates of the feature points in P_aimL as P_L, while recording the feature points of M_R matched with P_L as P_R.
In a possibly preferred embodiment, the normalization of P_L, P_R in step S500 comprises:
step S520, projecting the pixel coordinates (u_L, v_L) and (u_R, v_R) of the feature points in P_L and P_R onto the normalization plane, and recording the corresponding normalized feature points as p_L = (x_L, y_L) and p_R = (x_R, y_R);
step S530, calculating:
x_L = (u_L - c_xL) / f_xL,  y_L = (v_L - c_yL) / f_yL
x_R = (u_R - c_xR) / f_xR,  y_R = (v_R - c_yR) / f_yR
wherein f_xL, f_yL, f_xR, f_yR are the focal lengths of the first-eye and second-eye cameras on the imaging plane along the x and y axes respectively, and c_xL, c_yL, c_xR, c_yR are the imaging optical center coordinates of the first-eye and second-eye cameras on the x and y axes of the pixel coordinate system respectively.
In a possibly preferred embodiment, the step in step S500 of calculating the corresponding depth value according to the parallax and the camera baseline distance to recover the spatial position information of the feature point pixels comprises:
step S540, calculating the depth z corresponding to p_L as z = b / d, wherein the parallax d = ||p_L - p_R||_2 and b is the camera baseline distance;
step S550, the recovered spatial position information of the feature point pixel being (x_L * z, y_L * z, z).
In a possibly preferred embodiment, the step of obtaining the perspective transformation matrix Q in step S400 comprises:
step S410, setting up the binocular camera horizontally and calculating:
Q = [ 1   0    0      -c_x              ]
    [ 0   1    0      -c_y              ]
    [ 0   0    0       f                ]
    [ 0   0   -1/T_x  (c_x - c_x')/T_x  ]
wherein c_x, c_y and c_x', c_y' are the imaging optical center coordinates of the first-eye and second-eye cameras on the x and y axes of the pixel coordinate system respectively, T_x is the x-axis distance (baseline) in the first-eye camera coordinate system, and f is the focal length of the camera.
In a possibly preferred embodiment, the step of performing binocular stereo matching in step S400 comprises:
step S420, screening out the mutually matched pixels in aimL and aimR after binocular stereo correction, with pixel coordinates (u_L, v_L) and (u_R, v_R);
step S430, calculating:
[X, Y, Z, W']^T = Q * [u_L, v_L, disparity, 1]^T
to obtain the spatial position of the pixel point (X/W', Y/W', Z/W'), wherein the parallax disparity = u_L - u_R, the depth value of the corresponding pixel point z = f * T_x / disparity, and W is a proportionality coefficient (the input homogeneous coordinate, set to 1).
In order to achieve the above object, corresponding to the above method, according to a second aspect of the present invention, there is also provided a neural network-based identification and location system, comprising:
a storage unit, configured to store a program comprising the steps of the neural network-based identification and positioning method of any one of the above examples, for the acquisition unit and the processing unit to schedule and execute in due time;
an acquisition unit, configured to control the binocular camera to acquire binocular images aimL and aimR containing the identification object, as well as a reference image ref of the identification object;
a processing unit, configured to establish a mask corresponding to the identification object according to the reference image ref of the identification object; to respectively reason on ref through the neural network model SuperPoint, obtaining its feature point and descriptor set F_ref, and on one of the target images, aimL, obtaining the feature point set F_aimL; to input F_ref and F_aimL into the neural network model LightGlue for reasoning, respectively obtaining the feature point matching results M_ref and M_aimL of ref and aimL; to record the set of feature points of M_ref inside the identifier mask region as P_ref and the set of matching feature points of P_ref in M_aimL as P_aimL; and to perform binocular stereo correction on aimL and aimR to obtain a perspective transformation matrix Q for binocular stereo matching, obtaining the depth value z of each matched pixel point and thereby the spatial location (x, y, z) of all feature point pixels in P_aimL.
In a possible preferred embodiment of the neural network-based identification positioning system, the processing unit is further configured to, when the binocular stereo correction fails or the binocular stereo matching result is invalid, calculate the matched feature points P_L, P_R in the binocular images aimL and aimR and, after normalization, calculate the corresponding depth value according to the parallax and the camera baseline distance so as to recover the spatial position information of the feature point pixels.
In order to achieve the above object, corresponding to the above method, according to a third aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the neural network based identification positioning method as described in any of the above examples.
The neural network-based identification positioning method, system, and storage medium have the advantage that the design cleverly abstracts identifying the object into a feature point identification problem, so that a reference image can be collected in advance and the target object region annotated as a template, after which target detection is realized and the target is located through the feature point position information. Compared with conventional target detection and positioning schemes, the abstract feature point detection is therefore more universal and more economical in its computing power requirements. In addition, the feature point descriptors generated by this feature point identification scheme adapt better to homography transformations and can better overcome the interference caused by illumination variation and viewing-angle changes, so that the stability and reliability of the identification and positioning result are improved as a whole.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of steps of a neural network-based identification positioning method of the present invention;
FIG. 2 is a logic diagram of a neural network-based identification positioning method of the present invention;
FIG. 3 is a schematic diagram of an exemplary acquired identifier reference image in a neural network-based identification positioning method of the present invention;
FIG. 4 is a schematic diagram of a mask of an exemplary recognition object in the neural network-based recognition positioning method of the present invention;
FIG. 5 is a schematic diagram of example reference image (left) and left eye image (right) matching points in the neural network-based identification positioning method of the present invention;
FIG. 6 is a schematic diagram of an example pallet region matching result graph in the neural network-based recognition positioning method of the present invention;
FIG. 7 is a diagram illustrating an example binocular pallet region matching result in a neural network based recognition positioning method of the present invention;
FIG. 8 is a schematic diagram of an example binocular stereo matching point cloud (left) and pallet point cloud (right) in a neural network based identification positioning method of the present invention;
fig. 9 is a schematic structural diagram of the neural network-based identification and positioning system of the present invention.
Detailed Description
In order that those skilled in the art may better understand the technical solutions of the present invention, the following clearly and completely describes the specific technical solutions of the present invention in conjunction with the embodiments. It will be apparent that the embodiments described herein are merely some, but not all, embodiments of the invention. It should be noted that the embodiments and the features of the embodiments in this application may be combined with each other by those of ordinary skill in the art without conflict and without departing from the inventive concept. All other embodiments derived from the embodiments herein without creative effort by a person skilled in the art shall fall within the disclosure and the protection scope of the present invention.
Furthermore, the terms "first," "second," "S100," "S200," and the like in the description and in the claims and drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those described herein. Also, the terms "comprising" and "having" and any variations thereof herein are intended to cover a non-exclusive inclusion. Unless specifically stated or limited otherwise, the terms "disposed," "configured," "mounted," "connected," "coupled" and "connected" are to be construed broadly, e.g., as being either permanently connected, removably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this case will be understood by those skilled in the art in view of the specific circumstances and in combination with the prior art.
In order to locate an identification target through abstract feature points, referring to fig. 1 to 8, the invention provides an identification and location method based on a neural network, which comprises the following steps:
step S100, collecting a reference image ref of the identification object and establishing a mask corresponding to the identification object; collecting binocular images aimL and aimR containing the identification object.
In particular, this example, shown in figs. 3 to 8, illustrates identifying and locating pallets in a scene.
To identify the pallet, first, as shown in fig. 3, a color image containing the pallet to be identified is collected and recorded as the reference image ref; the pallet in the reference image should be well observed and unoccluded. Second, a template must be provided for the pallet image in which the pallet region is annotated; this image is called the mask. The pixel size of the pallet mask image is consistent with that of the original reference image, and the non-pallet and pallet regions are represented by different pixel values, as shown in fig. 4.
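By way of illustration only (this sketch is not part of the patent text), such a mask can be produced with OpenCV; the file names and the pallet polygon coordinates below are hypothetical placeholders for an actual annotation:

    import cv2
    import numpy as np

    # Load the reference image; "ref.png" is a hypothetical file name.
    ref = cv2.imread("ref.png")

    # Hypothetical annotated pallet region, as a polygon in ref's pixel coordinates.
    pallet_polygon = np.array([[120, 340], [620, 335], [630, 470], [115, 478]],
                              dtype=np.int32)

    # The mask has the same pixel size as ref; pallet and non-pallet regions
    # are represented by different pixel values (255 vs. 0), as described above.
    mask = np.zeros(ref.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [pallet_polygon], 255)
    cv2.imwrite("mask.png", mask)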
After the above work is completed, a set of images containing the pallet to be detected must be acquired with a binocular camera whose intrinsic and extrinsic parameters have been calibrated; these are called binocular images. As shown in fig. 5, the binocular images comprise pictures shot by a first-eye camera and a second-eye camera; in this example the first-eye image is called aimL and the second-eye image aimR. It should be noted that "first eye" and "second eye" are used here only to distinguish the two binocular shooting angles, and are not specified to correspond to "left eye" and "right eye" respectively. Depending on the embodiment, the first eye may be the left eye or the right eye; once the direction of the first eye is determined, the direction of the second eye follows, and the two may be interchanged without limitation in implementation.
In fig. 5 to 7, for ease of understanding, the first eye is the left eye image aimL, and the second eye is the right eye image aimR.
Step S200, extracting a neural network model through image features, respectively reasoning a reference image ref, and obtaining feature points and a description subset thereofAnd deducing one of the target images aimL to obtain a feature point set
Specifically, in the prior art there are two approaches to feature point identification and descriptor construction for an image: manual design and machine learning. The manual approach designs features by hand, obtains stable points in the image, and constructs identity features (descriptors) for those points; because it is customized for specific applications, its generalization ability and robustness are poor. The machine learning approach uses techniques such as deep learning and relies on data-driven feature extraction; by learning from a large number of samples it obtains deep, data-set-specific feature representations, so its expression of the data is more efficient and accurate.
The SuperPoint neural network model is an image feature extractor based on the machine learning approach; it detects keypoints and their corresponding descriptors in an image, generalizes well in corner detection and extraction, and builds descriptors that adapt to illumination changes. In this example, therefore, the SuperPoint neural network model is preferably used to infer the feature points and feature point descriptors of the images.
First, the reference image ref is input into the pretrained SuperPoint neural network model for inference, yielding the feature points, feature point scores, and corresponding feature point descriptors F_ref; then, likewise, the left-eye image aimL is input into SuperPoint for inference to obtain the left-eye feature point set F_aimL.
Each feature point contains its pixel coordinates (u, v) on the image, where u and v are the point's values on the x and y axes of the image pixel coordinate system, and the n feature points are represented as P = {p_1, p_2, ..., p_n}. For each feature point a quality score is preferably computed, giving a floating-point score s_i, and a 256-dimensional floating-point descriptor is generated; the descriptors corresponding to the n feature points are represented as D = {d_1, d_2, ..., d_n}.
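By way of illustration only, a minimal sketch of this inference step using the open-source cvg/LightGlue package, which bundles a pretrained SuperPoint; the module, function, and output key names follow that package's documentation and should be treated as assumptions, not as part of the patent:

    import torch
    from lightglue import SuperPoint
    from lightglue.utils import load_image

    extractor = SuperPoint(max_num_keypoints=2048).eval()

    with torch.no_grad():
        feats_ref = extractor.extract(load_image("ref.png"))    # F_ref
        feats_aiml = extractor.extract(load_image("aimL.png"))  # F_aimL

    # Assumed output keys, per the reference implementation:
    #   feats_ref["keypoints"]        (1, n, 2)   pixel coordinates (u, v)
    #   feats_ref["keypoint_scores"]  (1, n)      floating-point quality scores
    #   feats_ref["descriptors"]      (1, n, 256) 256-dimensional descriptors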
Step S300, inputting F_ref and F_aimL into a feature matching neural network model for reasoning, and respectively obtaining the feature point matching results M_ref and M_aimL of ref and aimL; and recording the set of feature points of M_ref inside the identifier mask region as P_ref, and the set of matching feature points of P_ref in M_aimL as P_aimL.
Specifically, in this example the feature matching neural network model may be SuperGlue, SGMNet, or LightGlue. SuperGlue is a feature matching algorithm based on a graph neural network architecture that predicts the partial matching relationship between the local feature point sets extracted from images A and B, and can match the feature points obtained from SuperPoint. LightGlue was developed from SuperGlue; it outperforms SuperGlue and SGMNet in accuracy with comparable recall, is clearly superior to the existing SuperGlue and SGMNet methods on SuperPoint features, greatly improves matching accuracy with DISK local features, produces better correspondences and more accurate relative poses, and reduces inference time by about 30%.
Thus, in this example, the LightGlue neural network model is preferably used to reason over the F_ref and F_aimL obtained in step S200, yielding (M_ref, M_aimL), where M_ref and M_aimL are two m-dimensional column vectors of indices into the two feature point sets, representing the two groups of successfully matched feature points; indexing accordingly gives the matching result inferred by the neural network model, as shown in fig. 5.
As shown in fig. 6, for the matching result M_ref of the reference image, it must be judged whether the pixel coordinates (u, v) of each of its feature points lie inside the pallet region of the mask; the set of feature points inside the mask region is recorded as P_ref, and the corresponding set of matching feature points in the left-eye image as P_aimL.
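Continuing the sketch above (again assuming the cvg/LightGlue API, where rbd removes the batch dimension), the matching and the mask filtering of M_ref into P_ref / P_aimL might look as follows:

    import torch
    from lightglue import LightGlue
    from lightglue.utils import rbd

    matcher = LightGlue(features="superpoint").eval()

    with torch.no_grad():
        out = matcher({"image0": feats_ref, "image1": feats_aiml})
    f_ref, f_aiml, out = rbd(feats_ref), rbd(feats_aiml), rbd(out)

    matches = out["matches"]                       # (m, 2) index pairs -> (M_ref, M_aimL)
    m_ref = f_ref["keypoints"][matches[:, 0]]      # matched pixels in ref
    m_aiml = f_aiml["keypoints"][matches[:, 1]]    # matched pixels in aimL

    # Keep only matches whose ref-side pixel lies inside the pallet mask (fig. 4).
    mask_t = torch.from_numpy(mask)                # mask from the earlier sketch
    u = m_ref[:, 0].round().long().clamp(0, mask_t.shape[1] - 1)
    v = m_ref[:, 1].round().long().clamp(0, mask_t.shape[0] - 1)
    inside = mask_t[v, u] > 0
    p_ref, p_aiml = m_ref[inside], m_aiml[inside]  # P_ref and P_aimL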
Step S400 performs binocular stereo correction on aimL and aimR to obtain a perspective transformation matrix Q for binocular stereo matching, obtaining the depth value z of each matched pixel point and thereby the spatial location (x, y, z) of all feature point pixels in P_aimL.
Specifically, step S400 restores the two-dimensional coordinates of the matching result screened in step S300 to three-dimensional space through the binocular extrinsic parameters, so as to estimate the distance and position of the matched target. The main steps are as follows:
Step S410, binocular stereo correction: taking a horizontally placed binocular camera as an example, the binocular camera shoots a picture of the target to be identified, divided into left-eye and right-eye images, and binocular stereo correction is applied to both. Through binocular stereo correction, matching in the image is reduced from a two-dimensional search along the x and y axes to a one-dimensional search along the x axis, and impossible match points can be filtered out. After binocular stereo correction, a 4 x 4 perspective transformation matrix Q is obtained; for a horizontally placed binocular camera, Q is denoted as:
Q = [ 1   0    0      -c_x              ]
    [ 0   1    0      -c_y              ]
    [ 0   0    0       f                ]
    [ 0   0   -1/T_x  (c_x - c_x')/T_x  ]
wherein c_x, c_y and c_x', c_y' are the imaging optical center coordinates of the first-eye and second-eye cameras on the x and y axes of the pixel coordinate system; after binocular stereo correction the y axes of the left-eye and right-eye pixel coordinate systems are aligned, so c_y = c_y'. For a binocular camera, the baseline represents the distance between the imaging centers of the two cameras, i.e. b = |T_x|, where the translation T = (T_x, T_y, T_z), an element of R^3, gives the x-, y-, and z-axis distances between the two cameras in the left-eye camera coordinate system, and f is the focal length of the camera.
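In practice, the rectification transforms and this Q matrix are typically obtained from the calibrated parameters with OpenCV's stereoRectify; a minimal sketch with placeholder calibration values (not values from the patent) is:

    import numpy as np
    import cv2

    # Placeholder calibrated intrinsics/extrinsics of the two cameras.
    K1 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
    K2 = K1.copy()
    d1 = np.zeros(5)                   # distortion coefficients, first eye
    d2 = np.zeros(5)                   # distortion coefficients, second eye
    R = np.eye(3)                      # rotation between the two cameras
    T = np.array([-0.12, 0.0, 0.0])    # translation; |T_x| = 0.12 m baseline
    image_size = (640, 480)

    # stereoRectify returns the rectification transforms and the 4 x 4
    # perspective transformation (reprojection) matrix Q described above.
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2,
                                                      image_size, R, T)
    print(Q)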
After that, the corrected binocular camera images can be matched pixel by pixel, line by line, by binocular stereo matching, as shown in fig. 7; each successfully matched point acquires a corresponding depth value z.
Step S420, binocular stereo matching: for the matched pixel points, the pixel coordinates on the left-eye image aimL and the right-eye image aimR are (u_L, v_L) and (u_R, v_R) respectively.
Step S430 calculates the parallax disparity = u_L - u_R for the pixel coordinates of all feature points in P_aimL and obtains the depth value of the corresponding pixel point as z = f * T_x / disparity. With the proportionality coefficient W of the input homogeneous coordinate set to 1, [X, Y, Z, W']^T = Q * [u_L, v_L, disparity, 1]^T is computed for each pixel coordinate point with a valid depth value (z > 0), finally obtaining the spatial position of the pixel point (X/W', Y/W', Z/W'). So far, according to the spatial position information of the identified feature points on the pallet, the position information of the pallet relative to the camera can be returned.
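By way of illustration, a minimal sketch of this reprojection for one matched pixel; Q is taken from the previous sketch and the pixel coordinates are hypothetical:

    import numpy as np

    def reproject(q, u, v, disparity):
        # Reproject a rectified pixel and its disparity to 3D with the 4 x 4
        # matrix Q, the input homogeneous coordinate W being set to 1 (step S430).
        X, Y, Z, W = q @ np.array([u, v, disparity, 1.0])
        return np.array([X / W, Y / W, Z / W])

    # Hypothetical matched pixel in rectified aimL / aimR.
    u_l, v_l, u_r = 412.0, 233.0, 396.5
    point = reproject(Q, u_l, v_l, u_l - u_r)   # spatial position (x, y, z)
    print(point)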
On the other hand, binocular stereo matching generally consumes less computing power than deep learning, but if the common region between the binocular images is small, binocular stereo correction may fail or the binocular stereo matching result may be invalid. Therefore, for the case where binocular stereo correction fails or binocular stereo matching cannot recover the spatial position information of the pixels corresponding to P_aimL, as shown in fig. 2, this example further includes the following steps:
step S500, when the binocular stereo correction in step S400 fails or the binocular stereo matching result is invalid, calculating the matched feature points P_L, P_R in the binocular images aimL and aimR and, after normalization, calculating the corresponding depth value according to the parallax and the camera baseline distance so as to recover the spatial position information of the feature point pixels.
Wherein, in step S500, the calculation of the matched feature points P_L, P_R in the binocular images comprises:
Step S510: aimL and aimR are input into SuperPoint to obtain the feature point sets F_L and F_R of the left and right images containing the identification object, which are then input into LightGlue to obtain the feature point matching results M_L, M_R. The feature points of M_L lying within the minimum range min_range (preferably set to 10 pixels) of the pixel coordinates of the feature points in P_aimL are recorded as P_L, and at the same time the feature points of M_R matched with P_L are recorded as P_R.
Wherein, in step S500, the normalization of P_L, P_R comprises:
Step S520: the pixel coordinates (u_L, v_L) and (u_R, v_R) of the feature points in P_L and P_R are projected onto the normalization plane, and the corresponding normalized feature points are recorded as p_L = (x_L, y_L) and p_R = (x_R, y_R).
Step S530: calculate
x_L = (u_L - c_xL) / f_xL,  y_L = (v_L - c_yL) / f_yL
x_R = (u_R - c_xR) / f_xR,  y_R = (v_R - c_yR) / f_yR
where f_xL, f_yL, f_xR, f_yR are the focal lengths of the left-eye and right-eye cameras on the imaging plane along the x and y axes respectively, and c_xL, c_yL, c_xR, c_yR are the imaging optical center coordinates of the left-eye and right-eye cameras on the x and y axes of the pixel coordinate system respectively.
In step S500, the step of calculating the corresponding depth value according to the parallax and the camera baseline distance to recover the spatial position information of the feature point pixels comprises:
Step S540: the depth z corresponding to a left-eye feature point can be estimated as z = b / d, where the parallax d = ||p_L - p_R||_2, i.e. the two-norm of the difference between the normalized feature points, and b is the camera baseline distance.
Step S550: the recovered spatial position information of the feature point pixel is (x_L * z, y_L * z, z).
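By way of illustration, a minimal sketch of this fallback path (steps S520 to S550) under a pinhole model; the intrinsic matrix, pixel coordinates, and baseline below are placeholders, not values from the patent:

    import numpy as np

    def recover_position(uv_l, uv_r, K_l, K_r, baseline):
        # Normalize both pixels with the camera intrinsics (steps S520/S530),
        # take the 2-norm of their difference as the parallax d (step S540),
        # and recover (x_L * z, y_L * z, z) (step S550).
        fx_l, fy_l, cx_l, cy_l = K_l[0, 0], K_l[1, 1], K_l[0, 2], K_l[1, 2]
        fx_r, fy_r, cx_r, cy_r = K_r[0, 0], K_r[1, 1], K_r[0, 2], K_r[1, 2]
        p_l = np.array([(uv_l[0] - cx_l) / fx_l, (uv_l[1] - cy_l) / fy_l])
        p_r = np.array([(uv_r[0] - cx_r) / fx_r, (uv_r[1] - cy_r) / fy_r])
        d = np.linalg.norm(p_l - p_r)      # parallax on the normalization plane
        z = baseline / d                   # depth z = b / d
        return np.array([p_l[0] * z, p_l[1] * z, z])

    K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
    print(recover_position((412.0, 233.0), (396.5, 233.4), K, K, 0.12))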
So far, according to the spatial position information of the identified characteristic points on the pallet, the position information of the pallet relative to the camera can be returned.
In addition, it should be noted that, although the pallet is taken as the example detection target, a person skilled in the art may adapt the example process according to its concept to detect other objects, needing only to annotate the object to be identified in the reference image as a template. The invention is therefore not limited in the kind of detection target, and any alternative example with other detection targets made under the example concept of the invention falls within the scope of the disclosure of the invention.
Meanwhile, although this example preferably uses the neural network model SuperPoint to generate the feature points and LightGlue to match them, it is understood that other neural network models suitable for these processes exist in the prior art, or that a single neural network model could complete the whole two-step matching process. The invention is therefore not limited in the type of neural network model used, and any alternative example with other neural network models made under the concept of the invention falls within the scope of the disclosure of the invention.
On the other hand, as shown in fig. 9, the present invention further provides a neural network-based identification positioning system, corresponding to the above method example, which includes:
a storage unit, configured to store a program comprising the steps of the neural network-based identification and positioning method of any one of the above examples, for the acquisition unit and the processing unit to schedule and execute in due time;
an acquisition unit, configured to control the binocular camera to acquire binocular images aimL and aimR containing the identification object, as well as a reference image ref of the identification object;
a processing unit, configured to establish a mask corresponding to the identification object according to the reference image ref of the identification object; to respectively reason on ref through the neural network model SuperPoint, obtaining its feature point and descriptor set F_ref, and on one of the target images, aimL, obtaining the feature point set F_aimL; to input F_ref and F_aimL into the neural network model LightGlue for reasoning, respectively obtaining the feature point matching results M_ref and M_aimL of ref and aimL; to record the set of feature points of M_ref inside the identifier mask region as P_ref and the set of matching feature points of P_ref in M_aimL as P_aimL; and to perform binocular stereo correction on aimL and aimR to obtain a perspective transformation matrix Q for binocular stereo matching, obtaining the depth value z of each matched pixel point and thereby the spatial location (x, y, z) of all feature point pixels in P_aimL.
Further, to address the case where binocular stereo correction fails or binocular stereo matching cannot recover the spatial position information of the corresponding pixel points, the processing unit is further configured to, when the binocular stereo correction fails or the binocular stereo matching result is invalid, calculate the matched feature points P_L, P_R in the binocular images aimL and aimR and, after normalization, calculate the corresponding depth value according to the parallax and the camera baseline distance so as to recover the spatial position information of the feature point pixels.
Further, in correspondence with the above method, the present invention also provides a computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the neural network based identification positioning method as described in any of the above examples.
In summary, the neural network-based identification positioning method, system, and storage medium have the beneficial effect that the design cleverly abstracts identifying the object into a feature point identification problem, so that a reference image can be collected in advance and the target object region annotated as a template, after which target detection is realized and the target is located through the feature point position information. Compared with conventional target detection and positioning schemes, the abstract feature point detection is therefore more universal and more economical in its computing power requirements. In addition, the feature point descriptors generated by this feature point identification scheme adapt better to homography transformations and can better overcome the interference caused by illumination variation and viewing-angle changes, so that the stability and reliability of the identification and positioning result are improved as a whole.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is to be limited only by the following claims and their full scope and equivalents, and any modifications, equivalents, improvements, etc., which fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
It will be appreciated by those skilled in the art that the system, apparatus and their respective modules provided by the present invention may be implemented entirely by logic programming method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., except for implementing the system, apparatus and their respective modules provided by the present invention in a purely computer readable program code. Therefore, the system, the apparatus, and the respective modules thereof provided by the present invention may be regarded as one hardware component, and the modules included therein for implementing various programs may also be regarded as structures within the hardware component; modules for implementing various functions may also be regarded as being either software programs for implementing the methods or structures within hardware components.
Furthermore, all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program, where the program is stored in a storage medium and includes several instructions for causing a single-chip microcomputer, chip, or processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In addition, any combination of various embodiments of the present invention may be performed, so long as the concept of the embodiments of the present invention is not violated, and the disclosure of the embodiments of the present invention should also be considered.

Claims (10)

1. A neural network-based identification positioning method, comprising the following steps:
step S100, collecting a reference image ref of the identification object, and establishing a mask corresponding to the identification object; collecting binocular images aimL and aimR containing the identification object;
step S200, through an image feature extraction neural network model, respectively reasoning on the reference image ref to obtain its feature point and descriptor set F_ref, and reasoning on one of the target images, aimL, to obtain the feature point set F_aimL;
Step S300 willAnd->Inputting a feature matching neural network model for reasoning, and respectively obtaining the feature point matching results of ref and aimL +.>The method comprises the steps of carrying out a first treatment on the surface of the And record->The set of feature points in the identifier mask region is +.>And->At->The set of corresponding matching feature points in (a) is +.>
Step S400 performs binocular stereo correction on aimL and aimR to obtain a perspective transformation matrix Q for binocular stereo matching to obtain a depth value Z of a matched pixel point, thereby obtaining a corresponding pixel pointSpatial location of all feature point pixels in (3)
2. The neural network-based identification positioning method of claim 1, further comprising:
step S500, when the binocular stereo correction fails or the binocular stereo matching result is invalid in step S400, calculating the matched feature points P_L, P_R in the binocular images aimL and aimR and, after normalization, calculating the corresponding depth value according to the parallax and the camera baseline distance so as to recover the spatial position information of the feature point pixels.
3. The neural network-based identification positioning method according to claim 2, wherein the calculation of the matched feature points P_L, P_R in the binocular images in step S500 comprises:
step S510, inputting aimL and aimR into the image feature extraction neural network model to respectively obtain the feature point sets F_L and F_R of the first-eye and second-eye images containing the identification object, and inputting them into the feature matching neural network model to obtain the feature point matching results M_L, M_R; and recording the feature points of M_L lying within the minimum range of the pixel coordinates of the feature points in P_aimL as P_L, while recording the feature points of M_R matched with P_L as P_R.
4. The neural network-based identification positioning method according to claim 2, wherein the normalization of P_L, P_R in step S500 comprises:
step S520, projecting the pixel coordinates (u_L, v_L) and (u_R, v_R) of the feature points in P_L and P_R onto the normalization plane, and recording the corresponding normalized feature points as p_L = (x_L, y_L) and p_R = (x_R, y_R);
step S530, calculating:
x_L = (u_L - c_xL) / f_xL,  y_L = (v_L - c_yL) / f_yL
x_R = (u_R - c_xR) / f_xR,  y_R = (v_R - c_yR) / f_yR
wherein f_xL, f_yL, f_xR, f_yR are the focal lengths of the first-eye and second-eye cameras on the imaging plane along the x and y axes respectively, and c_xL, c_yL, c_xR, c_yR are the imaging optical center coordinates of the first-eye and second-eye cameras on the x and y axes of the pixel coordinate system respectively.
5. The neural network-based identification positioning method according to claim 4, wherein the step of calculating the corresponding depth value according to the parallax and the camera baseline distance to recover the spatial position information of the feature point pixels in step S500 comprises:
step S540, calculating the depth z corresponding to p_L as z = b / d, wherein the parallax d = ||p_L - p_R||_2 and b is the camera baseline distance;
step S550, the recovered spatial position information of the feature point pixel being (x_L * z, y_L * z, z).
6. The neural network-based identification positioning method according to claim 1, wherein the step of acquiring the perspective transformation matrix Q in step S400 comprises:
step S410, setting up the binocular camera horizontally and calculating:
Q = [ 1   0    0      -c_x              ]
    [ 0   1    0      -c_y              ]
    [ 0   0    0       f                ]
    [ 0   0   -1/T_x  (c_x - c_x')/T_x  ]
wherein c_x, c_y and c_x', c_y' are the imaging optical center coordinates of the first-eye and second-eye cameras on the x and y axes of the pixel coordinate system respectively, so that after stereo correction c_y = c_y'; T_x is the x-axis distance (baseline) in the first-eye camera coordinate system, and f is the focal length of the camera.
7. The neural network-based identification positioning method according to claim 6, wherein the step of performing binocular stereo matching in step S400 comprises:
step S420, screening out the mutually matched pixels in aimL and aimR after binocular stereo correction, with pixel coordinates (u_L, v_L) and (u_R, v_R);
step S430, calculating:
[X, Y, Z, W']^T = Q * [u_L, v_L, disparity, 1]^T
to obtain the spatial position of the pixel point (X/W', Y/W', Z/W'), wherein the parallax disparity = u_L - u_R, the depth value of the corresponding pixel point z = f * T_x / disparity, and W is a proportionality coefficient (the input homogeneous coordinate, set to 1).
8. A neural network-based identification positioning system, comprising:
a storage unit, configured to store a program comprising the steps of the neural network-based identification positioning method of any one of claims 1 to 7, for the acquisition unit and the processing unit to schedule and execute in due time;
an acquisition unit, configured to control the binocular camera to acquire binocular images aimL and aimR containing the identification object, as well as a reference image ref of the identification object;
a processing unit, configured to establish a mask corresponding to the identification object according to the reference image ref of the identification object; to respectively reason on ref through the neural network model SuperPoint, obtaining its feature point and descriptor set F_ref, and on one of the target images, aimL, obtaining the feature point set F_aimL; to input F_ref and F_aimL into the neural network model LightGlue for reasoning, respectively obtaining the feature point matching results M_ref and M_aimL of ref and aimL; to record the set of feature points of M_ref inside the identifier mask region as P_ref and the set of matching feature points of P_ref in M_aimL as P_aimL; and to perform binocular stereo correction on aimL and aimR to obtain a perspective transformation matrix Q for binocular stereo matching, obtaining the depth value z of each matched pixel point and thereby the spatial location (x, y, z) of all feature point pixels in P_aimL.
9. The neural network-based identification positioning system of claim 8, wherein the processing unit is further configured to, when the binocular stereo correction fails or the binocular stereo matching result is invalid, calculate the matched feature points P_L, P_R in the binocular images aimL and aimR and, after normalization, calculate the corresponding depth value according to the parallax and the camera baseline distance so as to recover the spatial position information of the feature point pixels.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the neural network based identification positioning method of any of claims 1 to 7.
CN202311608176.0A 2023-11-29 2023-11-29 Neural network-based identification positioning method and system and storage medium Active CN117315033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311608176.0A CN117315033B (en) 2023-11-29 2023-11-29 Neural network-based identification positioning method and system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311608176.0A CN117315033B (en) 2023-11-29 2023-11-29 Neural network-based identification positioning method and system and storage medium

Publications (2)

Publication Number Publication Date
CN117315033A CN117315033A (en) 2023-12-29
CN117315033B true CN117315033B (en) 2024-03-19

Family

ID=89274051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311608176.0A Active CN117315033B (en) 2023-11-29 2023-11-29 Neural network-based identification positioning method and system and storage medium

Country Status (1)

Country Link
CN (1) CN117315033B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576652B (en) * 2024-01-19 2024-04-26 福思(杭州)智能科技有限公司 Road object identification method and device, storage medium and electronic equipment
CN117812466A (en) * 2024-02-29 2024-04-02 成都索贝数码科技股份有限公司 Calibration method, device and system for large-scene double-camera linkage camera

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894499A (en) * 2016-03-25 2016-08-24 华南理工大学 Binocular-vision-based rapid detection method for three-dimensional information of space object
CN106996748A (en) * 2017-03-16 2017-08-01 南京工业大学 A kind of wheel footpath measuring method based on binocular vision
CN108171753A (en) * 2016-12-07 2018-06-15 广州映博智能科技有限公司 Stereoscopic vision localization method based on centroid feature point Yu neighborhood gray scale cross correlation
CN111833333A (en) * 2020-07-16 2020-10-27 西安科技大学 Binocular vision-based boom type tunneling equipment pose measurement method and system
CN112668452A (en) * 2020-12-24 2021-04-16 杭州电子科技大学 Binocular vision-based occluded target identification and positioning method
WO2021072879A1 (en) * 2019-10-15 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for extracting target text in certificate, device, and readable storage medium
CN112950698A (en) * 2021-03-18 2021-06-11 北京拙河科技有限公司 Depth estimation method, device, medium, and apparatus based on binocular defocused image
WO2022143237A1 (en) * 2020-12-31 2022-07-07 华为技术有限公司 Target positioning method and system, and related device
CN115880344A (en) * 2022-11-18 2023-03-31 浙江大学 Binocular stereo matching data set parallax truth value acquisition method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101792501B1 (en) * 2011-03-16 2017-11-21 한국전자통신연구원 Method and apparatus for feature-based stereo matching

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894499A (en) * 2016-03-25 2016-08-24 华南理工大学 Binocular-vision-based rapid detection method for three-dimensional information of space object
CN108171753A (en) * 2016-12-07 2018-06-15 广州映博智能科技有限公司 Stereoscopic vision localization method based on centroid feature point Yu neighborhood gray scale cross correlation
CN106996748A (en) * 2017-03-16 2017-08-01 南京工业大学 A kind of wheel footpath measuring method based on binocular vision
WO2021072879A1 (en) * 2019-10-15 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for extracting target text in certificate, device, and readable storage medium
CN111833333A (en) * 2020-07-16 2020-10-27 西安科技大学 Binocular vision-based boom type tunneling equipment pose measurement method and system
CN112668452A (en) * 2020-12-24 2021-04-16 杭州电子科技大学 Binocular vision-based occluded target identification and positioning method
WO2022143237A1 (en) * 2020-12-31 2022-07-07 华为技术有限公司 Target positioning method and system, and related device
CN112950698A (en) * 2021-03-18 2021-06-11 北京拙河科技有限公司 Depth estimation method, device, medium, and apparatus based on binocular defocused image
CN115880344A (en) * 2022-11-18 2023-03-31 浙江大学 Binocular stereo matching data set parallax truth value acquisition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on target recognition and localization based on CNN binocular feature point matching; Jiang Qiangwei et al.; Radio Engineering (08); full text *
Research on 3D reconstruction and stitching technology based on binocular vision; Lyu Yaowen; Kang Kai; Optoelectronic Technology (04); full text *

Also Published As

Publication number Publication date
CN117315033A (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN117315033B (en) Neural network-based identification positioning method and system and storage medium
Loncomilla et al. Object recognition using local invariant features for robotic applications: A survey
Rad et al. Bb8: A scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth
Guo et al. 3D object recognition in cluttered scenes with local surface features: A survey
US9984280B2 (en) Object recognition system using left and right images and method
CN109523501A (en) One kind being based on dimensionality reduction and the matched battery open defect detection method of point cloud data
KR20120048370A (en) Object pose recognition apparatus and method using the same
Sharp et al. Invariant features and the registration of rigid bodies
CN111145232A (en) Three-dimensional point cloud automatic registration method based on characteristic information change degree
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
JP2007249592A (en) Three-dimensional object recognition system
CN109858433B (en) Method and device for identifying two-dimensional face picture based on three-dimensional face model
JP5536124B2 (en) Image processing system and image processing method
Ramisa et al. Mobile robot localization using panoramic vision and combinations of feature region detectors
KR101715782B1 (en) Object recognition system and method the same
CN111126436B (en) Visual matching method and device
Makris et al. Robust 3d human pose estimation guided by filtered subsets of body keypoints
CN108447084B (en) Stereo matching compensation method based on ORB characteristics
Takimoto et al. Automatic epipolar geometry recovery using two images
CN104422441B (en) A kind of electronic equipment and localization method
CN112529960A (en) Target object positioning method and device, processor and electronic device
CN112734854A (en) Camera self-calibration method based on trinocular polar line constraint
Taraglio et al. Evolutionary approach to epipolar geometry estimation
Nemra et al. Robust feature extraction and correspondence for UAV map building
Tang et al. Real-time recognition and localization method for deep-sea underwater object

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant