CN112529960A - Target object positioning method and device, processor and electronic device


Info

Publication number
CN112529960A
CN112529960A (application number CN202011497459.9A)
Authority
CN
China
Prior art keywords
camera
coordinate
target object
imaging plane
positioning
Prior art date
Legal status
Pending
Application number
CN202011497459.9A
Other languages
Chinese (zh)
Inventor
王开创 (Wang Kaichuang)
周海民 (Zhou Haimin)
吴崇龙 (Wu Chonglong)
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Gree Intelligent Equipment Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Gree Intelligent Equipment Co Ltd
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Gree Intelligent Equipment Co Ltd filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202011497459.9A
Publication of CN112529960A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a target object positioning method, a target object positioning device, a processor and an electronic device. Wherein, the method comprises the following steps: acquiring a camera image; acquiring a predicted position of a target object in a camera image by using a target detection mode based on deep learning; and based on the predicted position, carrying out three-dimensional positioning on the target object in a binocular vision positioning mode. The invention solves the technical problem of inaccurate positioning in the existing three-dimensional positioning method.

Description

Target object positioning method and device, processor and electronic device
Technical Field
The invention relates to the field of image processing, in particular to a target object positioning method, a target object positioning device, a target object positioning processor and an electronic device.
Background
Machine vision is an interdisciplinary field that relates to artificial intelligence, neurobiology, psychobiology, computer science, image processing, pattern recognition and other fields. Among other uses, machine vision techniques can be applied to the inspection, measurement and control of objects to be inspected.
Three-dimensional localization of a target object is an important part of machine vision applications, and it includes detection of the target object. In conventional target detection algorithms, however, the target is detected according to manually designed features: after an image of the target object is acquired, candidate regions are selected from the image, features are then extracted from those regions, and the features are finally classified by a classifier. Throughout this process, feature extraction relies on a hand-designed feature extractor.
Such traditional target detection algorithms therefore have poor stability and are only suitable for scenes with simple backgrounds; changes in illumination or the presence of noise in the application scene affect the detection result and may even cause target detection to fail.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a target object positioning method, a target object positioning device, a processor and an electronic device, and at least solves the technical problem of inaccurate positioning of the existing three-dimensional positioning method.
According to an aspect of the embodiments of the present invention, there is provided a method for positioning a target object, including: acquiring a camera image; acquiring a predicted position of a target object in a camera image by using a target detection mode based on deep learning; and based on the predicted position, carrying out three-dimensional positioning on the target object in a binocular vision positioning mode.
Further, the target detection mode based on deep learning comprises: a feature extraction network model, a region generation network model and a classification regression network model, and the target object positioning method further includes: extracting a feature map from the camera image using the feature extraction network model, wherein the feature extraction network model comprises a plurality of feature extraction layers, each of which comprises convolutional layers and a pooling layer, and the number of convolutional layers differs between the feature extraction layers; generating a region of interest in the feature map using the region generation network model; and performing a region-of-interest pooling operation on the feature map and the region of interest using the classification regression network model to generate feature map vectors, and performing feature integration on the feature map vectors to obtain the predicted position.
Further, the target object positioning method further includes: determining a left camera imaging plane and a right camera imaging plane of a binocular camera; and carrying out stereo matching on the left camera imaging plane and the right camera imaging plane based on the predicted positions to obtain a three-dimensional positioning result of the target object.
Further, the target object positioning method further includes: acquiring camera parameter information of the binocular camera, wherein the camera parameter information includes: the distance between the left camera and the right camera, the parallax, the focal length, the principal point coordinates of the left camera imaging plane and the principal point coordinates of the right camera imaging plane; acquiring first coordinate information of a first projection position of the spatial coordinate information of the predicted position on the left camera imaging plane and second coordinate information of a second projection position of the predicted position on the right camera imaging plane; and calculating the three-dimensional positioning result using the spatial coordinate information, the first coordinate information, the second coordinate information and the camera parameter information.
Further, the target object positioning method further includes: calculating a first coordinate value on a first coordinate axis and a second coordinate value on a second coordinate axis using the distance between the left camera and the right camera, the parallax, the principal point coordinates of the left camera imaging plane and the first coordinate information, or calculating the first coordinate value on the first coordinate axis and the second coordinate value on the second coordinate axis using the distance between the left camera and the right camera, the parallax, the principal point coordinates of the right camera imaging plane and the second coordinate information, wherein the coordinate plane determined by the first coordinate axis and the second coordinate axis is parallel to the left camera imaging plane and the right camera imaging plane; calculating a third coordinate value on a third coordinate axis using the distance between the left camera and the right camera, the parallax and the focal length, wherein the third coordinate axis is parallel to the optical axis of the left camera and the optical axis of the right camera; and determining the three-dimensional positioning result based on the first coordinate value, the second coordinate value and the third coordinate value.
Further, the target object positioning method further includes: and performing stereo calibration and alignment on the left camera imaging plane and the right camera imaging plane through camera calibration.
According to another aspect of the embodiments of the present invention, there is also provided a target object positioning apparatus, including: the first acquisition module is used for acquiring a camera image; the second acquisition module is used for acquiring the predicted position of the target object in the camera image by using a target detection mode based on deep learning; and the positioning module is used for carrying out three-dimensional positioning on the target object in a binocular vision positioning mode based on the predicted position.
According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned target object positioning method when running.
According to another aspect of the embodiments of the present invention, there is also provided a processor for executing a program, wherein the program is configured to execute the above-mentioned target object positioning method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the above-mentioned target object positioning method.
In the embodiment of the invention, a mode of combining a deep learning technology and a binocular vision technology is adopted: after a camera image is obtained, the predicted position of the target object in the camera image is obtained through a target detection method based on deep learning, and the target object is positioned in three dimensions through a binocular vision positioning mode based on the predicted position.
In the process, the target object is roughly positioned in a deep learning target detection mode, and on the basis, the positioning result of the rough positioning is positioned again in a binocular vision positioning mode, so that an accurate positioning result is obtained. In addition, because the target object is positioned twice, the influence of illumination or noise on the positioning result in practical application can be avoided, and the accuracy of the positioning result is further ensured.
Therefore, the scheme provided by the application achieves the purpose of three-dimensional positioning of the target object, so that the technical effect of improving the accuracy of the positioning result of the target object is achieved, and the technical problem of inaccurate positioning in the existing three-dimensional positioning method is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method for locating a target object according to an embodiment of the present invention;
FIG. 2 is a block diagram of an alternative fast R-CNN target detection architecture in accordance with embodiments of the present invention;
fig. 3 is a schematic network structure diagram of an alternative VGG-16 network according to an embodiment of the present invention;
FIG. 4 is a flow chart of an alternative binocular visual positioning according to an embodiment of the present invention;
FIG. 5 is a schematic view of an alternative imaging model of a binocular camera according to an embodiment of the present invention;
FIG. 6 is a schematic illustration of an alternative three-dimensional orientation according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a target object positioning apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of a method for locating a target object, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a flowchart of a target object locating method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S102, a camera image is acquired.
In step S102, the camera image includes at least a target object. For example, in workpiece detection, a plurality of workpieces may be photographed by a camera to obtain a camera image, that is, the camera image contains the plurality of workpieces, and the target object may be an abnormal workpiece. The workpiece detection device then analyzes the camera image to determine the position of the target object, and finally the workpiece detection device may control a robot or the like to pick the abnormal workpiece out of the plurality of workpieces based on the position of the abnormal workpiece.
The camera image may be an image captured by a single camera or an image captured by a binocular camera. In addition, in the present application, a camera image may include a complex background, that is, the method provided by the present application can achieve accurate positioning of the position of the target object in the complex environment background.
And step S104, acquiring the predicted position of the target object in the camera image by using a target detection mode based on the deep learning.
In step S104, the target object may be detected by using a target detection algorithm based on deep learning, and the position (i.e., the predicted position) of the target object in the camera image is outlined, wherein the predicted position is only an approximate position and not the precise position of the target object in the camera image.
It should be noted that the problem of poor stability of the target detection algorithm in a complex scene can be effectively solved through a deep learning target detection mode.
And S106, carrying out three-dimensional positioning on the target object in a binocular vision positioning mode based on the predicted position.
It is noted that, as can be seen from step S104, the position of the target object obtained in step S104 is not an accurate position, and the position of the target object may be further determined based on a binocular vision positioning method in order to obtain an accurate position. The binocular vision positioning method is a method for positioning by using two cameras, namely, two cameras fixed at different positions are used for shooting a target object, then coordinates of the target object on image planes of the two cameras are respectively obtained, and under the condition that the accurate relative positions of the two cameras are known, the coordinates of the target object in a coordinate system of a certain camera can be obtained by adopting a geometric method, namely, the position of the target object is determined.
Based on the solutions defined in steps S102 to S106, it can be seen that, in the embodiment of the present invention, a deep learning technique and a binocular vision technique are combined: after a camera image is acquired, a predicted position of the target object in the camera image is acquired by a target detection method based on deep learning, and the target object is three-dimensionally positioned in a binocular vision positioning manner based on the predicted position.
It is easy to notice that, in the above process, the target object is roughly positioned by the target detection method of deep learning, and on this basis, the positioning result of the rough positioning is re-positioned by the binocular vision positioning method, so as to obtain an accurate positioning result. In addition, because the target object is positioned twice, the influence of illumination or noise on the positioning result in practical application can be avoided, and the accuracy of the positioning result is further ensured.
Therefore, the scheme provided by the application achieves the purpose of three-dimensional positioning of the target object, so that the technical effect of improving the accuracy of the positioning result of the target object is achieved, and the technical problem of inaccurate positioning in the existing three-dimensional positioning method is solved.
In an alternative embodiment, after the camera image is acquired, the position of the target object in the camera image may be detected by a target detection method based on deep learning, wherein the target detection algorithm based on deep learning includes: a feature extraction network model, a region generation network model and a classification regression network model. Optionally, the target detection algorithm based on deep learning may be, but is not limited to, fast R-CNN (a fast target convolutional neural network), where fig. 2 is a schematic diagram of the fast R-CNN target detection framework. As shown in fig. 2, the camera image is processed by convolution layers to obtain a feature map from which features are extracted, and after ROI (Region of Interest) pooling layer processing, the features are output to a classifier for classification.
Specifically, the feature extraction network model is first used to extract a feature map from the camera image, the region generation network model is then used to generate a region of interest in the feature map, and finally the classification regression network model performs a region-of-interest pooling operation on the feature map and the region of interest to generate feature map vectors, and feature integration is performed on the feature map vectors to obtain the predicted position. The feature extraction network model comprises a plurality of feature extraction layers, each of which comprises convolution layers and a pooling layer, and the number of convolution layers differs between the feature extraction layers.
Optionally, in the feature extraction network model, a VGG-16 network may be used for feature extraction. For example, fig. 3 shows a schematic network structure diagram of an alternative VGG-16 network. As shown in fig. 3, the feature extraction part of the VGG-16 network includes a plurality of feature extraction layers; fig. 3 shows a configuration with five feature extraction layers (each dashed box in fig. 3 represents one feature extraction layer), each feature extraction layer is composed of convolution layers and a pooling layer, and the number of convolution layers differs between the feature extraction layers. For example, in fig. 3 the first feature extraction layer includes two 3 × 3 convolution layers with 64 channels and one pooling layer, as sketched below.
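The following is a minimal sketch, assuming a PyTorch implementation (the patent does not prescribe a framework), of the five VGG-16-style feature extraction layers described above; the layer counts and channel widths follow the standard VGG-16 configuration.

```python
import torch
import torch.nn as nn

def vgg_block(in_channels: int, out_channels: int, num_convs: int) -> nn.Sequential:
    """One feature extraction layer: `num_convs` 3x3 convolutions followed by one pooling layer."""
    layers = []
    for i in range(num_convs):
        layers.append(nn.Conv2d(in_channels if i == 0 else out_channels,
                                out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Five feature extraction layers of VGG-16 (2, 2, 3, 3, 3 convolution layers).
features = nn.Sequential(
    vgg_block(3, 64, 2),      # first layer: two 3x3, 64-channel convolutions + pooling
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)

feature_map = features(torch.randn(1, 3, 600, 800))  # e.g. one camera image tensor
print(feature_map.shape)  # torch.Size([1, 512, 18, 25]) after five pooling stages
```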
After the feature map is obtained, in the region-generating network model, the feature map generated in the feature extraction network model is used as an input of the region-generating network model, and a region of interest, i.e., an ROI region, is generated on the feature map. And then performing region-of-interest pooling operation on the feature map generated in the feature extraction network model and the region-of-interest generated by the region generation network model to generate feature map vectors, inputting the generated feature map vectors into two fully-connected layers for feature integration, and finally combining the two output layers for classification and regression to obtain the predicted position of the target object in the camera image.
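Below is a hedged sketch of how the three sub-models (feature extraction, region generation, classification/regression) could be assembled, assuming torchvision's generic FasterRCNN detector class; the patent only names a fast R-CNN style framework and does not prescribe a library, so the class choice, anchor sizes and ROI pooling settings here are illustrative assumptions.

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# Feature extraction network: the VGG-16 convolutional part (as in fig. 3),
# randomly initialized here; pretrained weights could be loaded instead.
backbone = torchvision.models.vgg16().features
backbone.out_channels = 512  # the detector needs the feature-map depth

# Region generation network (RPN) anchors and the region-of-interest pooling operator.
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

# Classification/regression head is built internally; num_classes = target + background.
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
model.eval()

with torch.no_grad():
    predictions = model([torch.randn(3, 600, 800)])  # a list of camera images
print(predictions[0]['boxes'])  # predicted positions as bounding boxes
```

In training, such a model would be given camera images together with ground-truth boxes of the target object; in inference mode, as shown, it returns the predicted positions as bounding boxes with confidence scores.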
It should be noted that, since the predicted position obtained by detecting the camera image with the deep learning target detection method is not exact, in order to improve the positioning accuracy, after the predicted position is obtained, the target object is three-dimensionally positioned in a binocular vision positioning manner based on the predicted position. Specifically, a left camera imaging plane and a right camera imaging plane of the binocular camera are determined, and then stereo matching is performed on the left camera imaging plane and the right camera imaging plane based on the predicted position to obtain a three-dimensional positioning result of the target object.
Optionally, fig. 4 shows a flow chart of an optional binocular vision positioning, and as can be seen from fig. 4, first, a camera image is obtained, then, the binocular camera is calibrated, then, the calibrated binocular camera is used to detect the target object, and finally, the three-dimensional positioning of the target object is realized through stereo matching.
Fig. 5 shows an imaging model of a binocular camera. In fig. 5, plane v1O1u1 is the left camera imaging plane, plane v2O2u2 is the right camera imaging plane, and P(XP, YP, ZP) is the three-dimensional coordinate of the target object. The coordinate system Oc1-X1Y1Z1 is the spatial coordinate system corresponding to the left camera, and the coordinate system Oc2-X2Y2Z2 is the spatial coordinate system corresponding to the right camera. C1(uc1, vc1) is the coordinate of the point Oc1 mapped onto plane v1O1u1, and C2(uc2, vc2) is the coordinate of the point Oc2 mapped onto plane v2O2u2. In the same way, P1(u1, v1) is the coordinate of the point P(XP, YP, ZP) mapped onto plane v1O1u1, and P2(u2, v2) is the coordinate of the point P(XP, YP, ZP) mapped onto plane v2O2u2. In fig. 5, the left camera optical axis is the Z1 axis and the right camera optical axis is the Z2 axis.
In an optional embodiment, in the process of performing stereo matching on the left camera imaging plane and the right camera imaging plane based on the predicted position to obtain the three-dimensional positioning result, camera parameter information of the binocular camera is first obtained, then first coordinate information of a first projection position of the spatial coordinate information of the predicted position on the left camera imaging plane and second coordinate information of a second projection position of the predicted position on the right camera imaging plane are obtained, and the three-dimensional positioning result is calculated using the spatial coordinate information, the first coordinate information, the second coordinate information and the camera parameter information. The camera parameter information includes: the distance between the left camera and the right camera, the parallax, the focal length, the principal point coordinates of the left camera imaging plane and the principal point coordinates of the right camera imaging plane. For example, in the illustration of three-dimensional positioning shown in fig. 6, B is the distance between the left and right cameras, f is the focal length, and P(XP, YP, ZP) is the predicted coordinate of the target object. The first projection position of P(XP, YP, ZP) on the left camera imaging plane is x1 in fig. 6, and the second projection position of P(XP, YP, ZP) on the right camera imaging plane is x2 in fig. 6; the first coordinate information corresponding to the first projection position is P1(u1, v1) in fig. 5, and the second coordinate information corresponding to the second projection position is P2(u2, v2) in fig. 5. In addition, the principal point coordinate of the left camera imaging plane is the coordinate (uc1, vc1) of C1, and the principal point coordinate of the right camera imaging plane is the coordinate (uc2, vc2) of C2.
Further, after the camera parameter information, the first coordinate information and the second coordinate information of the binocular camera are obtained, the three-dimensional positioning result is calculated using the spatial coordinate information, the first coordinate information, the second coordinate information and the camera parameter information. Specifically, a first coordinate value on a first coordinate axis and a second coordinate value on a second coordinate axis are first calculated using the distance between the left camera and the right camera, the parallax, the principal point coordinates of the left camera imaging plane and the first coordinate information, or the first coordinate value and the second coordinate value are calculated using the distance between the left camera and the right camera, the parallax, the principal point coordinates of the right camera imaging plane and the second coordinate information; a third coordinate value on a third coordinate axis is then calculated using the distance between the left camera and the right camera, the parallax and the focal length; and finally the three-dimensional positioning result is determined based on the first coordinate value, the second coordinate value and the third coordinate value. The coordinate plane determined by the first coordinate axis and the second coordinate axis is parallel to the left camera imaging plane and the right camera imaging plane, and the third coordinate axis is parallel to the left camera optical axis and the right camera optical axis.
It should be noted that the first coordinate value and the second coordinate value can be calculated according to the coordinate of the principal point of the left camera imaging plane and the first coordinate information, and the first coordinate value and the second coordinate value can also be calculated according to the coordinate of the principal point of the right camera imaging plane and the second coordinate information.
In addition, it should be noted that the first coordinate axis is the X axis of the spatial coordinate system and the second coordinate axis is the Y axis of the spatial coordinate system. For example, in fig. 5 the plane defined by the X1 axis and the Y1 axis is parallel to the left and right camera imaging planes, and likewise the plane defined by the X2 axis and the Y2 axis is parallel to the left and right camera imaging planes.
The first coordinate value XP satisfies the following formula:

XP = B * (u1 - uc1) / d

where B is the distance between the left and right cameras, u1 is the first component of the first coordinate information P1(u1, v1), uc1 is the corresponding principal point coordinate of the left camera imaging plane, and d is the parallax.

The second coordinate value YP satisfies the following formula:

YP = B * (v1 - vc1) / d

where B is the distance between the left and right cameras, v1 is the second component of the first coordinate information P1(u1, v1), vc1 is the corresponding principal point coordinate of the left camera imaging plane, and d is the parallax.

The third coordinate value ZP satisfies the following formula:

ZP = B * f / d

where B is the distance between the left and right cameras, f is the focal length, and d is the parallax.

The three-dimensional positioning result obtained from the first coordinate value, the second coordinate value and the third coordinate value is therefore:

P(XP, YP, ZP) = ( B(u1 - uc1)/d, B(v1 - vc1)/d, B*f/d )
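As a minimal sketch of the triangulation above, assuming rectified images so that the parallax is the column difference d = u1 - u2 and the pixel coordinates are taken from the left view; the helper name triangulate_point and the numbers in the usage example are illustrative, not from the patent.

```python
def triangulate_point(u1, v1, u2, B, f, u_c1, v_c1):
    """Return (X_P, Y_P, Z_P) of the target point in the left camera coordinate system."""
    d = u1 - u2                      # parallax between the two projections
    if d == 0:
        raise ValueError("zero disparity: point at infinity or mismatched projections")
    X_P = B * (u1 - u_c1) / d
    Y_P = B * (v1 - v_c1) / d
    Z_P = B * f / d
    return X_P, Y_P, Z_P

# Example values: baseline 120 mm, focal length 800 px, left principal point (640, 360).
print(triangulate_point(u1=700, v1=400, u2=660, B=120.0, f=800.0,
                        u_c1=640.0, v_c1=360.0))
# -> (180.0, 120.0, 2400.0): the point lies 2.4 m in front of the cameras (in mm here)
```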
it should be noted that, in order to improve the accuracy of the three-dimensional positioning result, after image acquisition, camera calibration and target detection are completed, the camera calibration is used to perform stereo calibration and alignment on the left camera imaging plane and the right camera imaging plane.
According to the method, the deep learning technology and the binocular vision technology are combined to address the inaccurate positioning of existing three-dimensional target positioning. A two-stage positioning method is provided: target detection is first completed through a target detection algorithm based on deep learning (coarse positioning), and the image processing technology is then combined with the binocular vision positioning principle to position the target in three dimensions (fine positioning), which improves the accuracy of three-dimensional target positioning.
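Putting the two stages together, a high-level sketch of the overall flow might look as follows; the detector, the stereo_match helper and the stereo_params tuple are illustrative assumptions built on the sketches above, not interfaces defined by the patent.

```python
def locate_target(left_image, right_image, detector, stereo_params):
    """Coarse positioning by deep-learning detection, fine positioning by binocular vision."""
    # Stage 1 (coarse): predicted position of the target object in the left camera image.
    u_min, v_min, u_max, v_max = detector(left_image)
    u1, v1 = (u_min + u_max) / 2.0, (v_min + v_max) / 2.0  # centre of the predicted region

    # Stage 2 (fine): match the predicted region in the right image, then triangulate.
    u2 = stereo_match(left_image, right_image, (u_min, v_min, u_max, v_max))
    B, f, u_c1, v_c1 = stereo_params  # baseline, focal length, left principal point
    return triangulate_point(u1, v1, u2, B, f, u_c1, v_c1)
```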
Example 2
According to an embodiment of the present invention, there is also provided an embodiment of a target object positioning apparatus, where fig. 7 is a schematic diagram of a target object positioning apparatus according to an embodiment of the present invention, as shown in fig. 7, the apparatus includes: a first obtaining module 701, a second obtaining module 703 and a positioning module 705.
The first acquiring module 701 is configured to acquire a camera image; a second obtaining module 703, configured to obtain a predicted position of the target object in the camera image by using a target detection method based on deep learning; and a positioning module 705, configured to perform three-dimensional positioning on the target object in a binocular vision positioning manner based on the predicted position.
It should be noted that the first obtaining module 701, the second obtaining module 703 and the positioning module 705 correspond to steps S102 to S106 in the above embodiment, and the three modules are the same as the corresponding steps in the implementation example and application scenarios, but are not limited to the disclosure in embodiment 1.
Optionally, the target detection method based on deep learning includes: a feature extraction network model, a region generation network model and a classification regression network model, and the second acquisition module comprises: an extraction module, a first generation module and a second generation module. The extraction module is used for extracting a feature map from the camera image using the feature extraction network model, wherein the feature extraction network model comprises a plurality of feature extraction layers, each of which comprises convolutional layers and a pooling layer, and the number of convolutional layers differs between the feature extraction layers; the first generation module is used for generating a region of interest in the feature map using the region generation network model; and the second generation module is used for performing a region-of-interest pooling operation on the feature map and the region of interest using the classification regression network model to generate feature map vectors, and performing feature integration on the feature map vectors to obtain the predicted position.
Optionally, the positioning module includes: a first determining module and a matching module. The first determining module is used for determining a left camera imaging plane and a right camera imaging plane of the binocular camera; and the matching module is used for performing stereo matching on the left camera imaging plane and the right camera imaging plane based on the predicted position to obtain a three-dimensional positioning result of the target object.
Optionally, the matching module includes: a third acquisition module, a fourth acquisition module and a first calculation module. The third acquisition module is used for acquiring the camera parameter information of the binocular camera, wherein the camera parameter information includes: the distance between the left camera and the right camera, the parallax, the focal length, the principal point coordinates of the left camera imaging plane and the principal point coordinates of the right camera imaging plane; the fourth acquisition module is used for acquiring first coordinate information of a first projection position of the spatial coordinate information of the predicted position on the left camera imaging plane and second coordinate information of a second projection position of the predicted position on the right camera imaging plane; and the first calculation module is used for calculating the three-dimensional positioning result using the spatial coordinate information, the first coordinate information, the second coordinate information and the camera parameter information.
Optionally, the first calculation module includes: a second calculation module, a third calculation module and a second determination module. The second calculation module is used for calculating a first coordinate value on a first coordinate axis and a second coordinate value on a second coordinate axis using the distance between the left camera and the right camera, the parallax, the principal point coordinates of the left camera imaging plane and the first coordinate information, or calculating the first coordinate value on the first coordinate axis and the second coordinate value on the second coordinate axis using the distance between the left camera and the right camera, the parallax, the principal point coordinates of the right camera imaging plane and the second coordinate information, wherein the coordinate plane determined by the first coordinate axis and the second coordinate axis is parallel to the left camera imaging plane and the right camera imaging plane; the third calculation module is used for calculating a third coordinate value on a third coordinate axis using the distance between the left camera and the right camera, the parallax and the focal length, wherein the third coordinate axis is parallel to the optical axis of the left camera and the optical axis of the right camera; and the second determining module is used for determining the three-dimensional positioning result based on the first coordinate value, the second coordinate value and the third coordinate value.
Optionally, the target object positioning apparatus further includes: an adjusting module used for performing stereo calibration and alignment on the left camera imaging plane and the right camera imaging plane through camera calibration.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium having a computer program stored therein, wherein the computer program is configured to execute the positioning method of the target object in the above embodiment 1 when running.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided a processor for executing a program, wherein the program is configured to execute the positioning method of the target object in the above embodiment 1 when running.
Example 5
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for locating a target object in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method for locating a target object, comprising:
acquiring a camera image;
acquiring a predicted position of a target object in the camera image by using a target detection mode based on deep learning;
and based on the predicted position, carrying out three-dimensional positioning on the target object in a binocular vision positioning mode.
2. The positioning method according to claim 1, wherein the target detection manner based on deep learning comprises: a feature extraction network model, a region generation network model and a classification regression network model, and obtaining the predicted position of the target object in the camera image by using the target detection mode based on deep learning comprises:
extracting a feature map from the camera image using the feature extraction network model, wherein the feature extraction network model comprises: a plurality of feature extraction layers, each feature extraction layer of the plurality of feature extraction layers comprising: convolutional layers and a pooling layer, wherein the number of convolutional layers differs between the feature extraction layers;
generating a region of interest in the feature map using the region-generating network model;
and performing region-of-interest pooling operation on the feature map and the region of interest by using the classification regression network model to generate feature map vectors, and performing feature integration on the feature map vectors to obtain the predicted position.
3. The positioning method according to claim 1, wherein three-dimensionally positioning the target object by the binocular vision positioning based on the predicted position comprises:
determining a left camera imaging plane and a right camera imaging plane of a binocular camera;
and carrying out stereo matching on the left camera imaging plane and the right camera imaging plane based on the predicted positions to obtain a three-dimensional positioning result of the target object.
4. The positioning method according to claim 3, wherein stereo matching the left camera imaging plane and the right camera imaging plane based on the predicted position, and obtaining the three-dimensional positioning result comprises:
acquiring camera parameter information of the binocular camera, wherein the camera parameter information includes: the distance between the left camera and the right camera, the parallax, the focal length, the principal point coordinates of the left camera imaging plane and the principal point coordinates of the right camera imaging plane;
acquiring first coordinate information of a first projection position of the space coordinate information of the predicted position on the left camera imaging plane and second coordinate information of a second projection position of the predicted position on the right camera imaging plane;
and calculating to obtain the three-dimensional positioning result by utilizing the space coordinate information, the first coordinate information, the second coordinate information and the camera parameter information.
5. The positioning method according to claim 4, wherein calculating the three-dimensional positioning result by using the spatial coordinate information, the first coordinate information, the second coordinate information, and the camera parameter information includes:
calculating a first coordinate value on a first coordinate axis and a second coordinate value on a second coordinate axis by using the distance between the left camera and the right camera, the parallax, the principal point coordinate of the imaging plane of the left camera and the first coordinate information, or calculating a first coordinate value on the first coordinate axis and a second coordinate value on the second coordinate axis by using the distance between the left camera and the right camera, the parallax, the principal point coordinate of the imaging plane of the right camera and the second coordinate information, wherein the coordinate plane determined by the first coordinate axis and the second coordinate axis is parallel to the imaging plane of the left camera and the imaging plane of the right camera;
calculating to obtain a third coordinate value on a third coordinate axis by using the distance between the left camera and the right camera, the parallax and the focal length, wherein the third coordinate axis is parallel to the optical axis of the left camera and the optical axis of the right camera;
and determining the three-dimensional positioning result based on the first coordinate value, the second coordinate value and the third coordinate value.
6. The method of claim 3, further comprising:
and carrying out stereo calibration and alignment on the left camera imaging plane and the right camera imaging plane through camera calibration.
7. An apparatus for locating a target object, comprising:
the first acquisition module is used for acquiring a camera image;
the second acquisition module is used for acquiring the predicted position of the target object in the camera image by using a target detection mode based on deep learning;
and the positioning module is used for carrying out three-dimensional positioning on the target object in a binocular vision positioning mode based on the predicted position.
8. A non-volatile storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of locating a target object as claimed in any one of claims 1 to 6 when run.
9. A processor for running a program, wherein the program is arranged to perform the method for locating a target object as claimed in any one of claims 1 to 6 when run.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method of locating a target object as claimed in any one of claims 1 to 6.
CN202011497459.9A 2020-12-17 2020-12-17 Target object positioning method and device, processor and electronic device Pending CN112529960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011497459.9A CN112529960A (en) 2020-12-17 2020-12-17 Target object positioning method and device, processor and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011497459.9A CN112529960A (en) 2020-12-17 2020-12-17 Target object positioning method and device, processor and electronic device

Publications (1)

Publication Number Publication Date
CN112529960A true CN112529960A (en) 2021-03-19

Family

ID=75001074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011497459.9A Pending CN112529960A (en) 2020-12-17 2020-12-17 Target object positioning method and device, processor and electronic device

Country Status (1)

Country Link
CN (1) CN112529960A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638898A (en) * 2022-05-23 2022-06-17 中国人民解放军国防科技大学 Small-sized flight target detection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015073855A (en) * 2013-10-11 2015-04-20 キヤノン株式会社 Image processor, and method thereof
CN108876855A (en) * 2018-05-28 2018-11-23 哈尔滨工程大学 A kind of sea cucumber detection and binocular visual positioning method based on deep learning
CN109084724A (en) * 2018-07-06 2018-12-25 西安理工大学 A kind of deep learning barrier distance measuring method based on binocular vision
CN110349251A (en) * 2019-06-28 2019-10-18 深圳数位传媒科技有限公司 A kind of three-dimensional rebuilding method and device based on binocular camera
CN111563415A (en) * 2020-04-08 2020-08-21 华南理工大学 Binocular vision-based three-dimensional target detection system and method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination