CN112819953A - Three-dimensional reconstruction method, network model training method and device and electronic equipment - Google Patents

Three-dimensional reconstruction method, network model training method and device and electronic equipment

Info

Publication number
CN112819953A
Authority
CN
China
Prior art keywords
image
target object
target
rectangular area
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110213255.6A
Other languages
Chinese (zh)
Other versions
CN112819953B (en)
Inventor
凌清 (Ling Qing)
翟光坤 (Zhai Guangkun)
吴兴华 (Wu Xinghua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Crownthought Science & Technology Co ltd
Original Assignee
Beijing Crownthought Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Crownthought Science & Technology Co ltd filed Critical Beijing Crownthought Science & Technology Co ltd
Priority to CN202110213255.6A priority Critical patent/CN112819953B/en
Publication of CN112819953A publication Critical patent/CN112819953A/en
Application granted granted Critical
Publication of CN112819953B publication Critical patent/CN112819953B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects (G06T: image data processing or generation, in general)
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F 18/21: design or setup of recognition systems or techniques; extraction of features in feature space)
    • G06F 18/22: Matching criteria, e.g. proximity measures (G06F 18/20: pattern recognition, analysing)
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods (G06T 7/00: image analysis)
    • G06V 20/00: Scenes; scene-specific elements
    • G06T 2207/20081: Training; learning (G06T 2207/20: indexing scheme for image analysis or image enhancement, special algorithmic details)
    • G06V 2201/06: Recognition of objects for industrial automation (G06V 2201/00: indexing scheme relating to image or video recognition or understanding)
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a three-dimensional reconstruction method, a network model training method and apparatus, and an electronic device, and belongs to the field of computer technology. The three-dimensional reconstruction method comprises the following steps: acquiring an image pair captured by a binocular camera, wherein the image pair comprises a first image and a second image; recognizing the target object in the first image using a pre-trained recognition model to obtain the rectangular region where the target object is located; using the rectangular region where the target object is located as an identification point, finding the corresponding target rectangular region in the second image; and obtaining the three-dimensional coordinates of the target object from the rectangular region where the target object is located and the target rectangular region. In the embodiments of the application, the recognition model identifies the target object in the first image to determine, in two-dimensional space, the range of the object to be reconstructed, thereby reducing the amount of reconstruction computation in three-dimensional space and solving the problems of low speed and high computational cost in conventional three-dimensional reconstruction.

Description

Three-dimensional reconstruction method, network model training method and device and electronic equipment
Technical Field
The application belongs to the field of computer technology, and particularly relates to a three-dimensional reconstruction method, a network model training method and apparatus, and an electronic device.
Background
In industrial scenarios such as material sorting and part grasping, a three-dimensional model of the object to be grasped is often required, and whether active-light sensing based on Time of Flight (TOF) or stereo vision based on deep learning is used, three-dimensional reconstruction faces the problems of low speed and high computational cost.
Disclosure of Invention
In view of this, an object of the present application is to provide a three-dimensional reconstruction method, a network model training method and apparatus, and an electronic device, so as to solve the problems of low three-dimensional reconstruction speed and high computational cost in the prior art.
The embodiments of the application are realized as follows:
In a first aspect, an embodiment of the present application provides a three-dimensional reconstruction method, comprising: acquiring an image pair captured by a binocular camera, wherein the image pair comprises a first image and a second image; recognizing the target object in the first image using a pre-trained recognition model to obtain the rectangular region where the target object is located; using the rectangular region where the target object is located as an identification point, finding the target rectangular region corresponding to that rectangular region in the second image; and obtaining the three-dimensional coordinates of the target object from the rectangular region where the target object is located and the target rectangular region. In this embodiment, the recognition model first identifies the target object in the first image, determining the range of the object to be reconstructed in two-dimensional space; the rectangular region where the target object is located is then used as an identification point to find the corresponding target rectangular region in the second image; finally, the three-dimensional coordinates of the target object are determined from the two regions. Because reconstruction is limited to the identified region, the amount of computation in three-dimensional space is greatly reduced.
With reference to a possible implementation of the first aspect, using the rectangular region where the target object is located as an identification point and finding the corresponding target rectangular region in the second image comprises: based on a binocular three-dimensional reconstruction algorithm, using the coordinates of each corner of the rectangular region where the target object is located as identification points, and finding in the second image the target point coordinates corresponding to each corner. In this embodiment, several three-dimensional coordinates of the target object can be computed from the several identification points and their corresponding target points, improving the positioning accuracy of the target object.
With reference to a possible implementation of the first aspect, before recognizing the target object in the first image with a pre-trained recognition model, the method further comprises: obtaining a training sample set, the training sample set comprising: an image in which the target object has been labeled and an image containing only the target object; and training an SSD recognition network with the training sample set to obtain the trained recognition model. During training, the image containing only the target object is used as a standard: a similarity score is computed for the region image in which the SSD recognition network identifies the target object within the labeled image, and the model parameters of the SSD recognition network are updated according to that score. In this way, the SSD model learns to identify the required target accurately and quickly.
With reference to a possible implementation of the first aspect, obtaining the training sample set comprises: labeling the target object in the obtained original image to obtain a JSON image; modifying the pixel values of the labeled and unlabeled parts of the JSON image to obtain a binarized image; cutting the region image where the target object is located out of the binarized image and converting it to a preset standard size; converting the labeled original image to the same preset standard size; and performing gray-level processing on the converted images to obtain the training sample set. Labeling the target object and modifying the pixel values of the labeled and unlabeled parts makes the features of the labeled image more salient, while cropping, resizing, and gray-level processing improve training precision and reduce interference from irrelevant information.
In a second aspect, an embodiment of the present application further provides a network model training method, comprising: obtaining a training sample set, the training sample set comprising: an image in which the target object has been labeled and an image containing only the target object; and training an SSD recognition network with the training sample set to obtain a trained recognition model, wherein during training the image containing only the target object is used as a standard, a similarity score is computed for the region image in which the SSD recognition network identifies the target object within the labeled image, and the model parameters of the SSD recognition network are updated according to that score.
With reference to a possible implementation of the second aspect, obtaining the training sample set comprises: labeling the target object in the obtained original image to obtain a JSON image; modifying the pixel values of the labeled and unlabeled parts of the JSON image to obtain a binarized image; cutting the region image where the target object is located out of the binarized image and converting it to a preset standard size; converting the labeled original image to the same preset standard size; and performing gray-level processing on the converted images to obtain the training sample set.
In a third aspect, an embodiment of the present application further provides a three-dimensional reconstruction apparatus, comprising an acquisition module, a recognition module, and a reconstruction module. The acquisition module is configured to acquire an image pair captured by a binocular camera, wherein the image pair comprises a first image and a second image. The recognition module is configured to recognize the target object in the first image using a pre-trained recognition model, obtaining the rectangular region where the target object is located. The reconstruction module is configured to use the rectangular region where the target object is located as an identification point, find the corresponding target rectangular region in the second image, and obtain the three-dimensional coordinates of the target object from the rectangular region where the target object is located and the target rectangular region.
In a fourth aspect, an embodiment of the present application further provides a network model training apparatus, comprising an obtaining module and a training module. The obtaining module is configured to obtain a training sample set, the training sample set comprising: an image in which the target object has been labeled and an image containing only the target object. The training module is configured to train an SSD recognition network with the training sample set to obtain a trained recognition model, wherein during training the image containing only the target object is used as a standard, a similarity score is computed for the region image in which the SSD recognition network identifies the target object within the labeled image, and the model parameters of the SSD recognition network are updated according to that score.
In a fifth aspect, an embodiment of the present application further provides an electronic device, comprising a memory and a processor coupled to the memory. The memory is configured to store a program; the processor is configured to invoke the program stored in the memory to perform the method provided by the first aspect embodiment and/or any possible implementation thereof, or to perform the method provided by the second aspect embodiment and/or any possible implementation thereof.
In a sixth aspect, an embodiment of the present application further provides a storage medium on which a computer program is stored, the computer program, when executed by a processor, performing the method provided by the first aspect embodiment and/or any possible implementation thereof, or the method provided by the second aspect embodiment and/or any possible implementation thereof.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application or of the prior art, the drawings needed for the embodiments are briefly described below. The drawings described here show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. The foregoing and other objects, features and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily drawn to scale, emphasis instead being placed upon illustrating the subject matter of the present application.
Fig. 1 shows a schematic flow chart of a network model training method provided in an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating a principle of obtaining a training sample set according to an embodiment of the present application.
Fig. 3 shows a schematic flowchart of a binocular vision-based three-dimensional reconstruction method according to an embodiment of the present application.
Fig. 4 shows a schematic diagram of a binocular detection principle provided by an embodiment of the present application.
Fig. 5 shows a module schematic diagram of a network model training apparatus according to an embodiment of the present application.
Fig. 6 shows a schematic block diagram of a binocular vision-based three-dimensional reconstruction apparatus according to an embodiment of the present application.
Fig. 7 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, relational terms such as "first," "second," and the like may be used solely in the description herein to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Further, the term "and/or" in the present application merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone.
The present application addresses the problems that existing three-dimensional reconstruction is too slow and too computationally expensive. The inventors analyzed various existing three-dimensional reconstruction schemes and found that most of the time cost arises because every three-dimensional object in the field of view is reconstructed. For example, the target object may occupy only a small part of the scene to be reconstructed, yet in point-cloud reconstruction a large amount of time is spent separating the target object from the background; owing to the disorder of point-cloud data, this operation is very time-consuming. Based on this, the embodiments of the application provide a binocular vision-based three-dimensional reconstruction method that determines the range of the object to be reconstructed in two-dimensional space, thereby reducing the amount of reconstruction computation in three-dimensional space and solving the problems of low speed and high computational cost in the prior art.
It should be noted that the above defects in the prior art were identified by the inventors after practice and careful study; the discovery of the above problems and the solutions that the following embodiments propose for them should therefore be regarded as the inventors' contribution to the present application.
To facilitate understanding of the binocular vision-based three-dimensional reconstruction method provided in the embodiments of the present application, the training method of the recognition model it relies on is described first, with reference to fig. 1.
Step S101: obtaining a training sample set, the training sample set comprising: an image of an annotated target object and an image containing only the target object.
In industrial scenarios such as material sorting and part grasping, locating the object to be grasped is time-consuming and three-dimensional reconstruction is slow. In the embodiments of the application, a training sample set containing images of a target object (the target to be grasped) is therefore obtained to train the network model. Which target object is labeled depends on the application scenario.
The training sample set may be obtained as follows: label the target object in the obtained original image to obtain a JSON image; modify the pixel values of the labeled and unlabeled parts of the JSON image (setting labeled pixels to 255 and unlabeled pixels to 0) to obtain a binarized image; cut the region image where the target object is located out of the binarized image and convert it to a preset standard size; convert the labeled original image to the same preset standard size; and perform gray-level processing on the converted images to obtain the training sample set.
For ease of understanding, the process of obtaining the training sample set is described with reference to the schematic diagram shown in fig. 2. After the original image is obtained, the target object in it may be labeled with annotation software: for example, the original image is opened with labelme, the object to be grasped (the target object) is outlined, and the software outputs a JSON image storing the user's annotation. To make the features of the labeled image more salient, the pixel values of the labeled and unlabeled parts of the JSON image are modified, e.g. labeled pixels are set to 255 and unlabeled pixels to 0, yielding a binarized image. The region image where the target object is located is cut out of the binarized image, giving an image containing only the target object, and converted to a preset standard size, e.g. 300 × 300; the labeled original image is likewise converted to the preset standard size, e.g. 300 × 300, and the converted images are gray-level processed, yielding the training sample data. After the cropped region image has been converted to the preset standard size, it may also be binarized again, with pixels whose value is 0 set to 255.
The preset standard size depends on the input size supported by the network model to be trained and is not limited to the 300 × 300 of the example above; different types of network models support different input sizes, and the corresponding preset standard size differs accordingly.
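For a concrete illustration of this sample-preparation pipeline, a minimal sketch using OpenCV follows. The labelme-style JSON layout, the single-polygon assumption, and the helper name prepare_sample are illustrative assumptions, not the patent's actual implementation:

    import json
    import cv2
    import numpy as np

    STANDARD_SIZE = (300, 300)  # preset standard size; 300 x 300 matches the SSD input used here

    def prepare_sample(image_path, labelme_json_path):
        """Build one training pair: the labeled image and the image containing
        only the target object, both at the preset standard size, in grayscale."""
        original = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)  # gray-level processing
        with open(labelme_json_path) as f:
            annotation = json.load(f)

        # Rasterize the labeled polygon: labeled pixels -> 255, unlabeled -> 0.
        points = np.array(annotation["shapes"][0]["points"], dtype=np.int32)
        mask = np.zeros(original.shape[:2], dtype=np.uint8)
        cv2.fillPoly(mask, [points], 255)

        # Cut the region where the target object is located out of the binarized image.
        x, y, w, h = cv2.boundingRect(points)
        target_only = cv2.bitwise_and(original, original, mask=mask)[y:y + h, x:x + w]

        # Convert both images to the preset standard size.
        full_image = cv2.resize(original, STANDARD_SIZE)
        target_only = cv2.resize(target_only, STANDARD_SIZE)

        # Optional re-binarization of the crop: 0-valued pixels set to 255, as in the text.
        target_only[target_only == 0] = 255
        return full_image, target_only

Note that the sketch loads the image in grayscale up front, whereas the text applies gray-level processing after resizing; for this illustration the two orderings are equivalent.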
Step S102: and training the SSD identification network by using the training sample set to obtain a trained identification model, wherein during training, the image only containing the target is used as a standard, similarity scoring is carried out on the image of the area where the target is located, which is identified by the SSD identification network from the image marked with the target, and the model parameters of the SSD identification network are updated according to the similarity scoring.
After the training sample set is obtained, an SSD (Single Shot MultiBox Detector) recognition network is trained with it. During training, the image containing only the target object is used as a standard: a similarity score is computed for the region image in which the SSD recognition network identifies the target object within the labeled image, and the model parameters of the SSD recognition network are updated according to that score, until the similarity score between the identified region image and the image containing only the target object exceeds a threshold, at which point the trained recognition model is obtained. The SSD recognition network is a one-stage object detection network whose backbone is VGG-16.
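A minimal training-loop sketch of this scheme is given below, using torchvision's ssd300_vgg16 as a stand-in for the VGG-16-backed SSD network. The cosine similarity score and the way it weights the detection loss are illustrative assumptions: the text states only that a similarity score is computed against the image containing only the target and that the model parameters are updated according to it.

    import torch
    import torch.nn.functional as F
    from torchvision.models.detection import ssd300_vgg16

    model = ssd300_vgg16(weights=None, num_classes=2)  # background + one target class
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    def cosine_score(a, b):
        """One plausible similarity score: cosine similarity of the flattened crops."""
        return F.cosine_similarity(a.flatten().unsqueeze(0), b.flatten().unsqueeze(0)).item()

    def train_step(image, gt_box, target_only):
        """image: float Tensor[3, H, W] (grayscale replicated over three channels);
        gt_box: Tensor[4], the labeled rectangle; target_only: float Tensor[3, 300, 300]."""
        # Standard SSD losses against the labeled rectangle.
        model.train()
        losses = model([image], [{"boxes": gt_box.unsqueeze(0),
                                  "labels": torch.tensor([1])}])
        det_loss = losses["bbox_regression"] + losses["classification"]

        # Score the currently identified region against the target-only reference image.
        model.eval()
        score = 0.0
        with torch.no_grad():
            det = model([image])[0]
            if len(det["boxes"]) > 0:
                x0, y0, x1, y1 = det["boxes"][0].round().int().tolist()
                if x1 > x0 and y1 > y0:
                    crop = image[:, y0:y1, x0:x1].unsqueeze(0)
                    crop = F.interpolate(crop, size=target_only.shape[-2:]).squeeze(0)
                    score = cosine_score(crop, target_only)

        # Update the parameters according to the similarity score: a poorly
        # matching region yields a larger effective step.
        model.train()
        optimizer.zero_grad()
        ((2.0 - score) * det_loss).backward()
        optimizer.step()
        return score

Training would stop once the score on the labeled images stays above the chosen threshold, as described above.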
After the recognition model is obtained, it can be used to recognize the object to be reconstructed, determining the range of the object to be reconstructed in two-dimensional space. The binocular vision-based three-dimensional reconstruction method provided by the embodiments of the present application is described below with reference to fig. 3.
Step S201: an image pair acquired by a binocular camera is acquired, the image pair including a first image and a second image.
An image pair captured by a binocular camera is acquired. The pair comprises the images captured of the same target object by the left and right cameras of the binocular rig, namely the first image and the second image.
The first image may be the image captured by the left camera and the second image the one captured by the right camera, or vice versa.
Step S202: and identifying the target object in the first image by using a pre-trained identification model to obtain a rectangular area where the target object is located.
In the application, to reduce the number of point-cloud points in three-dimensional reconstruction, a pre-trained recognition model recognizes the target object in the first image and yields the rectangular region where the target object is located: the first image is input to the pre-trained recognition model, which identifies and outputs the rectangular region where the target object is located, i.e. the rect region. The output includes the coordinates of each corner of the rectangular region. Determining the range of the object to be reconstructed in two-dimensional space in this way reduces the amount of reconstruction computation in three-dimensional space.
In one embodiment, the recognition model may be an SSD (Single Shot MultiBox Detector) recognition network, for example one based on MobileNet-v2. Other recognition networks, such as a YOLO network, may be used instead.
The recognition model is trained as described in the network model training method above. The target object labeled in the images of the training sample set should be the same object to be recognized, or at least an object of the same category.
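A hedged sketch of this recognition step follows, again using a torchvision SSD as a stand-in; the confidence threshold is an illustrative assumption:

    import torch

    @torch.no_grad()
    def detect_rect(model, first_image, conf_threshold=0.5):
        """Run the trained recognition model on the first image and return the
        best-scoring rect region as (x0, y0, x1, y1), or None if nothing is found."""
        model.eval()
        det = model([first_image])[0]  # torchvision returns boxes sorted by descending score
        keep = det["scores"] >= conf_threshold
        if not keep.any():
            return None
        return det["boxes"][keep][0].tolist()

The four corners of the returned rectangle serve as the identification points used in step S203.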
Step S203: and taking the rectangular area where the target object is located as an identification point, and finding a target rectangular area corresponding to the rectangular area in the second image.
Based on a binocular three-dimensional reconstruction algorithm, the rectangular region where the target object is located is used as an identification point, and the corresponding target rectangular region is found in the second image. Because only the rectangular region, rather than the point-cloud data of the entire first image, serves as the identification point, the number of points in the three-dimensional reconstruction is reduced.
Binocular three-dimensional reconstruction algorithms generally match pixel points by the disparity method. Assuming that recognition in this method is performed on the left-eye image, the rectangular region where the target object is located serves as the left-eye identification point, and the corresponding target rectangular region is searched for in the right-eye second image when the disparity method is applied.
Finding the target rectangular region corresponding to the rectangular region in the second image may proceed as follows: based on a binocular three-dimensional reconstruction algorithm, the coordinates of each corner of the rectangular region where the target object is located are used as identification points, and the target point coordinates corresponding to each corner are found in the second image. Assuming left-eye-based recognition, the rect region output by the SSD recognition model is expressed in the left-eye coordinate system, and each left-eye point is used as an identification point to search the right-eye image: the upper-left corner coordinate is used to find the corresponding target point in the right-eye image, and the same is done for the upper-right, lower-left, and lower-right corners.
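One way to realize this corner search is normalized cross-correlation along the epipolar line, sketched below under the assumption of rectified images (corresponding points lie on the same scanline); the window size and disparity range are illustrative choices, as the text specifies only disparity-based matching:

    import cv2
    import numpy as np

    def match_corner(left_gray, right_gray, corner, win=11, max_disp=128):
        """Find, in the right-eye image, the target point corresponding to one
        corner (u, v) of the rect region in the left-eye image."""
        u, v = int(corner[0]), int(corner[1])
        h = win // 2
        template = left_gray[v - h:v + h + 1, u - h:u + h + 1]
        # Search the same scanline band, to the left of u (positive disparity).
        lo = max(h, u - max_disp)
        strip = right_gray[v - h:v + h + 1, lo - h:u + h + 1]
        scores = cv2.matchTemplate(strip, template, cv2.TM_CCOEFF_NORMED)
        best = int(scores.argmax())  # scores has a single row, so argmax is the column offset
        return (lo + best, v)  # corresponding target point in the right-eye image

Applied to the four corners in turn, this yields the target rectangular region in the second image.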
Step S204: and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
After the target rectangular region corresponding to the rectangular region is found in the second image, the three-dimensional coordinates of the target object can be obtained from the rectangular region where the target object is located and the target rectangular region. Once the coordinates of corresponding points in the left- and right-eye images are known, a triangular relation is established and the three-dimensional coordinates of each corresponding point can be calculated.
For ease of understanding, the following describes how the three-dimensional coordinates of a matched point are computed from its left- and right-eye image coordinates, with reference to the schematic diagram shown in fig. 4. Take the left camera of the binocular rig as the origin OL. Let the spatial scene point P(X, Y, Z) project to p1(u1, v1) on the left image plane and p2(u2, v2) on the right image plane, let the projection centers be O1 and O2, and let (u0, v0) be the coordinates of the point where the main optical axis intersects the left image plane. Then X = b·(u1 − u0)/(u1 − u2); Y = b·(v1 − v0)/(u1 − u2); Z = b·f/(u1 − u2), where b is the baseline distance between the projection centers of the left and right cameras and f is the focal length of the cameras.
In this way, the three-dimensional coordinates corresponding to each point matched between the left- and right-eye images (the upper-left, lower-left, upper-right and lower-right corners) can be calculated.
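The computation above maps directly to a few lines of code; the sketch below assumes rectified pixel coordinates, a shared focal length f, and baseline b, all in consistent units:

    def triangulate(p1, p2, u0, v0, b, f):
        """Recover (X, Y, Z) from matched left/right pixel coordinates p1 = (u1, v1)
        and p2 = (u2, v2), principal point (u0, v0), baseline b and focal length f."""
        (u1, v1), (u2, _) = p1, p2
        d = u1 - u2  # disparity along the rectified scanline
        if d == 0:
            raise ValueError("zero disparity: point at infinity")
        X = b * (u1 - u0) / d
        Y = b * (v1 - v0) / d
        Z = b * f / d
        return X, Y, Z

    # The four matched corners give four 3-D points bounding the target object:
    # corners_3d = [triangulate(pl, pr, u0, v0, b, f) for pl, pr in matched_corners]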
An embodiment of the present application provides a network model training apparatus 100, as shown in fig. 5, the network model training apparatus 100 includes: an acquisition module 110 and a training module 120.
An obtaining module 110, configured to obtain a training sample set, where the training sample set includes: an image of an annotated target object and an image containing only the target object. Optionally, the obtaining module 110 is specifically configured to: marking the target object in the obtained original image to obtain a JSON image; modifying the values of the pixel points of the marked part and the unmarked part in the JSON image to obtain a binary image; cutting out an image of the area where the target object is located from the binarized image, converting the size of the image of the area where the target object is located into a preset standard size, converting the size of the marked original image into the preset standard size, and performing gray level processing on the converted image to obtain the training sample set.
The training module 120 is configured to train the SSD identification network by using the training sample set to obtain a trained identification model, where in training, the image only including the target is used as a standard, a similarity score is performed on an area image where the target is located, which is identified by the SSD identification network from the image labeled with the target, and a model parameter of the SSD identification network is updated according to the similarity score.
The network model training apparatus 100 provided in the embodiment of the present application has the same implementation principle and the same technical effect as those of the foregoing method embodiments, and for the sake of brief description, reference may be made to corresponding contents in the foregoing method embodiments for parts of embodiments that are not mentioned in the apparatus embodiments.
The embodiment of the present application further provides a three-dimensional reconstruction apparatus 200, as shown in fig. 6. The three-dimensional reconstruction apparatus 200 includes: an acquisition module 210, an identification module 220, and a reconstruction module 230.
The acquiring module 210 is configured to acquire an image pair acquired by the binocular camera, where the image pair includes a first image and a second image.
The recognition module 220 is configured to recognize the target object in the first image by using a pre-trained recognition model, so as to obtain a rectangular region where the target object is located.
A reconstruction module 230, configured to use a rectangular region where the target object is located as an identification point, and find a target rectangular region corresponding to the rectangular region in the second image; and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
Optionally, the reconstruction module 230 is specifically configured to, based on a binocular three-dimensional reconstruction algorithm, use coordinates of each corner of a rectangular region where the target object is located as an identification point, and find coordinates of a target point in the second image, where the coordinates of each corner of the rectangular region correspond to the coordinates of the target point.
Optionally, the three-dimensional reconstruction apparatus 200 further comprises a training module; correspondingly, the obtaining module 210 is further configured to obtain a training sample set, where the training sample set includes: an image in which the target object has been labeled and an image containing only the target object.
And the training module is used for training the SSD identification network by using the training sample set to obtain the trained identification model, wherein during training, the image only containing the target is used as a standard, similarity scoring is carried out on the area image where the target is identified from the image marked with the target by the SSD identification network, and the model parameter of the SSD identification network is updated according to the similarity scoring.
Optionally, the obtaining module 210 is further specifically configured to: marking the target object in the obtained original image to obtain a JSON image; modifying the values of the pixel points of the marked part and the unmarked part in the JSON image to obtain a binary image; cutting out an image of the area where the target object is located from the binarized image, converting the size of the image of the area where the target object is located into a preset standard size, converting the size of the marked original image into the preset standard size, and performing gray level processing on the converted image to obtain the training sample set.
The three-dimensional reconstruction apparatus 200 provided in the embodiment of the present application has the same implementation principle and the same technical effect as those of the foregoing method embodiments, and for the sake of brief description, no mention is made in the apparatus embodiment, and reference may be made to the corresponding contents in the foregoing method embodiments.
As shown in fig. 7, fig. 7 is a block diagram illustrating a structure of an electronic device 300 according to an embodiment of the present disclosure. The electronic device 300 includes: a transceiver 310, a memory 320, a communication bus 330, and a processor 340.
The elements of the transceiver 310, the memory 320 and the processor 340 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, these components may be electrically coupled to each other via one or more communication buses 330 or signal lines. The transceiver 310 is used for transceiving data. The memory 320 is used for storing a computer program, such as a software functional module shown in fig. 5 or fig. 6, that is, the network model training apparatus 100 shown in fig. 5 or the three-dimensional reconstruction apparatus 200 shown in fig. 6. The network model training apparatus 100 or the three-dimensional reconstruction apparatus 200 includes at least one software function module, which may be stored in the memory 320 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the electronic device 300.
Wherein, when the processor 340 is configured to execute a software functional module or a computer program included in the network model training apparatus 100, the processor 340 is configured to obtain a training sample set, where the training sample set includes: an image of an annotated target object and an image containing only the target object; and training the SSD identification network by using the training sample set to obtain a trained identification model, wherein during training, the image only containing the target is used as a standard, similarity scoring is carried out on the image of the area where the target is located, which is identified by the SSD identification network from the image marked with the target, and the model parameters of the SSD identification network are updated according to the similarity scoring.
Wherein, when the processor 340 is configured to execute the software functional module or the computer program included in the three-dimensional reconstruction apparatus 200, the processor 340 is configured to acquire an image pair acquired by the binocular camera, where the image pair includes a first image and a second image; recognizing the target object in the first image by using a pre-trained recognition model to obtain a rectangular area where the target object is located; taking the rectangular area where the target object is located as an identification point, and finding a target rectangular area corresponding to the rectangular area in the second image; and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
The Memory 320 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
Processor 340 may be an integrated circuit chip having signal processing capability. It may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; or a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. It can implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor 340 may be any conventional processor or the like.
The electronic device 300 includes, but is not limited to, a computer, a server, and the like.
The present embodiment also provides a non-volatile computer-readable storage medium (hereinafter, referred to as a storage medium), where a computer program is stored on the storage medium, and when the computer program is run by the electronic device 300, the three-dimensional reconstruction method or the network model training method described above is executed.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that substantially contributes beyond the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or an electronic device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of three-dimensional reconstruction, comprising:
acquiring an image pair acquired by a binocular camera, wherein the image pair comprises a first image and a second image;
recognizing the target object in the first image by using a pre-trained recognition model to obtain a rectangular area where the target object is located;
taking the rectangular area where the target object is located as an identification point, and finding a target rectangular area corresponding to the rectangular area in the second image;
and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
2. The method according to claim 1, wherein the step of finding a target rectangular area corresponding to the rectangular area in the second image by using the rectangular area where the target object is located as an identification point comprises:
based on a binocular three-dimensional reconstruction algorithm, coordinates of each corner of a rectangular region where the target object is located are used as identification points, and target point coordinates corresponding to the coordinates of each corner of the rectangular region are found in the second image.
3. The method of claim 1, wherein prior to identifying the object in the first image using a pre-trained identification model, the method further comprises:
obtaining a training sample set, the training sample set comprising: an image in which the target object has been labeled and an image containing only the target object;
and training the SSD identification network by using the training sample set to obtain the trained identification model, wherein during training, the image only containing the target is used as a standard, similarity scoring is carried out on the area image where the target is identified from the image marked with the target by the SSD identification network, and the model parameter of the SSD identification network is updated according to the similarity scoring.
4. The method of claim 3, wherein obtaining a training sample set comprises:
marking the target object in the obtained original image to obtain a JSON image;
modifying the values of the pixel points of the marked part and the unmarked part in the JSON image to obtain a binary image;
cutting out an image of the area where the target object is located from the binarized image, converting the size of the image of the area where the target object is located into a preset standard size, converting the size of the marked original image into the preset standard size, and performing gray level processing on the converted image to obtain the training sample set.
5. A network model training method is characterized by comprising the following steps:
obtaining a training sample set, the training sample set comprising: an image of an annotated target object and an image containing only the target object;
and training the SSD identification network by using the training sample set to obtain a trained identification model, wherein during training, the image only containing the target is used as a standard, similarity scoring is carried out on the image of the area where the target is located, which is identified by the SSD identification network from the image marked with the target, and the model parameters of the SSD identification network are updated according to the similarity scoring.
6. The method of claim 5, wherein obtaining a training sample set comprises:
marking the target object in the obtained original image to obtain a JSON image;
modifying the values of the pixel points of the marked part and the unmarked part in the JSON image to obtain a binary image;
cutting out an image of the area where the target object is located from the binarized image, converting the size of the image of the area where the target object is located into a preset standard size, converting the size of the marked original image into the preset standard size, and performing gray level processing on the converted image to obtain the training sample set.
7. A three-dimensional reconstruction apparatus, comprising:
the acquisition module is used for acquiring an image pair acquired by a binocular camera, wherein the image pair comprises a first image and a second image;
the recognition module is used for recognizing the target object in the first image by using a pre-trained recognition model to obtain a rectangular area where the target object is located;
the reconstruction module is used for taking the rectangular area where the target object is located as an identification point and finding a target rectangular area corresponding to the rectangular area in the second image; and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
8. A network model training apparatus, comprising:
an obtaining module, configured to obtain a training sample set, where the training sample set includes: an image of an annotated target object and an image containing only the target object;
and the training module is used for training the SSD identification network by using the training sample set to obtain a trained identification model, wherein during training, the image only containing the target is used as a standard, similarity scoring is carried out on the area image where the target is identified from the image marked with the target by the SSD identification network, and the model parameter of the SSD identification network is updated according to the similarity scoring.
9. An electronic device, comprising:
a memory and a processor, the processor coupled to the memory;
the memory is used for storing programs;
the processor, configured to invoke a program stored in the memory to perform the method according to any one of claims 1-4, or to perform the method according to any one of claims 5-6.
10. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-4 or the method of any one of claims 5-6.
CN202110213255.6A 2021-02-24 2021-02-24 Three-dimensional reconstruction method, network model training method, device and electronic equipment Active CN112819953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110213255.6A CN112819953B (en) 2021-02-24 2021-02-24 Three-dimensional reconstruction method, network model training method, device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110213255.6A CN112819953B (en) 2021-02-24 2021-02-24 Three-dimensional reconstruction method, network model training method, device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112819953A true CN112819953A (en) 2021-05-18
CN112819953B CN112819953B (en) 2024-01-19

Family

ID=75863889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110213255.6A Active CN112819953B (en) 2021-02-24 2021-02-24 Three-dimensional reconstruction method, network model training method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112819953B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960015A (en) * 2017-05-24 2018-12-07 优信拍(北京)信息科技有限公司 A kind of vehicle system automatic identifying method and device based on deep learning
JP2019086294A (en) * 2017-11-01 2019-06-06 オムロン株式会社 Three-dimensional measurement device, three-dimensional measurement method, and program
CN110310315A (en) * 2018-03-21 2019-10-08 北京猎户星空科技有限公司 Network model training method, device and object pose determine method, apparatus
JP2020013573A * 2018-07-19 2020-01-23 Conti Temic microelectronic GmbH (コンティ テミック マイクロエレクトロニック ゲゼルシャフト ミット ベシュレンクテル ハフツング) Three-dimensional image reconstruction method of vehicle
CN111062400A (en) * 2018-10-16 2020-04-24 浙江宇视科技有限公司 Target matching method and device
CN110009722A (en) * 2019-04-16 2019-07-12 成都四方伟业软件股份有限公司 Three-dimensional rebuilding method and device
CN111862296A (en) * 2019-04-24 2020-10-30 京东方科技集团股份有限公司 Three-dimensional reconstruction method, three-dimensional reconstruction device, three-dimensional reconstruction system, model training method and storage medium
CN111243085A (en) * 2020-01-20 2020-06-05 北京字节跳动网络技术有限公司 Training method and device for image reconstruction network model and electronic equipment
CN111783637A (en) * 2020-06-30 2020-10-16 上海木木聚枞机器人科技有限公司 Key point marking method and device and target object space pose determining method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
WANG Dehai; HONG Wei; CHENG Qunzhe: "Target recognition and localization based on binocular stereo vision" (基于双目立体视觉的目标识别与定位), Journal of Jilin University (Information Science Edition), no. 02 *
JIANG Meng; WANG Yaoyao; CHEN Bai: "Research on target recognition and localization based on binocular vision" (基于双目视觉的目标识别与定位研究), Journal of Mechanical & Electrical Engineering, no. 04 *
CAI Zeyu; JIN Chengqian: "Object contour recognition based on two-dimensional lidar point clouds" (基于二维激光雷达点云的物体轮廓识别), Applied Laser, no. 03 *
GUO Liang: "Research on small-area target reconstruction based on 3D VR technology" (基于三维VR技术的小区域目标重建研究), Modern Electronics Technique, no. 19 *
GAO Zhiwei; TAN Xiaodong; LIU Ke: "Object recognition, localization and grasping based on binocular vision" (基于双目视觉的物体识别定位与抓取), Science Technology and Engineering, no. 20 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327324A (en) * 2021-06-25 2021-08-31 广东博智林机器人有限公司 Method and device for constructing three-dimensional building model, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112819953B (en) 2024-01-19

Similar Documents

Publication Publication Date Title
CN107358149B (en) Human body posture detection method and device
US20230087526A1 (en) Neural network training method, image classification system, and related device
CN109063768B (en) Vehicle weight identification method, device and system
CN108734087B (en) Object automatic identification method and system, shopping equipment and storage medium
CN113379718B (en) Target detection method, target detection device, electronic equipment and readable storage medium
US10685263B2 (en) System and method for object labeling
CN110378278B (en) Neural network training method, object searching method, device and electronic equipment
US20180260661A1 (en) Image processing apparatus, image processing method, and image processing program
EP4138050A1 (en) Table generating method and apparatus, electronic device, storage medium and product
CN111639970A (en) Method for determining price of article based on image recognition and related equipment
CN112633084A (en) Face frame determination method and device, terminal equipment and storage medium
CN113705669A (en) Data matching method and device, electronic equipment and storage medium
CN111373393B (en) Image retrieval method and device and image library generation method and device
CN113255779B (en) Multi-source perception data fusion identification method, system and computer readable storage medium
US20160284104A1 (en) Determine the Shape of a Representation of an Object
CN112819953B (en) Three-dimensional reconstruction method, network model training method, device and electronic equipment
CN111814653A (en) Method, device, equipment and storage medium for detecting abnormal behaviors in video
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN115131826A (en) Article detection and identification method, and network model training method and device
CN112232317B (en) Target detection method and device, equipment and medium for target orientation recognition
CN110717406B (en) Face detection method and device and terminal equipment
CN114997264A (en) Training data generation method, model training method, model detection method, device and electronic equipment
Mikolajczyk et al. Local Image Features.
CN115457282A (en) Point cloud data processing method and device
CN113240723A (en) Monocular depth estimation method and device and depth evaluation equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant