CN112819953B - Three-dimensional reconstruction method, network model training method, device and electronic equipment - Google Patents


Info

Publication number
CN112819953B
CN112819953B (application CN202110213255.6A)
Authority
CN
China
Prior art keywords
image
target object
rectangular area
target
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110213255.6A
Other languages
Chinese (zh)
Other versions
CN112819953A (en)
Inventor
凌清
翟光坤
吴兴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Crownthought Science & Technology Co ltd
Original Assignee
Beijing Crownthought Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Crownthought Science & Technology Co ltd filed Critical Beijing Crownthought Science & Technology Co ltd
Priority to CN202110213255.6A
Publication of CN112819953A
Application granted
Publication of CN112819953B
Legal status: Active
Anticipated expiration

Classifications

    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V20/00: Scenes; Scene-specific elements
    • G06T2207/20081: Training; Learning
    • G06V2201/06: Recognition of objects for industrial automation
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Physics & Mathematics
  • General Physics & Mathematics
  • Data Mining & Analysis
  • Computer Vision & Pattern Recognition
  • Bioinformatics & Cheminformatics
  • Evolutionary Biology
  • Evolutionary Computation
  • Bioinformatics & Computational Biology
  • General Engineering & Computer Science
  • Artificial Intelligence
  • Life Sciences & Earth Sciences
  • Multimedia
  • Computer Graphics
  • Geometry
  • Software Systems
  • Image Analysis

Abstract

The application relates to a three-dimensional reconstruction method, a network model training method, a device and electronic equipment, and belongs to the technical field of computers. The three-dimensional reconstruction method comprises the following steps: acquiring an image pair from a binocular camera, the image pair comprising a first image and a second image; identifying the target object in the first image with a pre-trained recognition model to obtain the rectangular area where the target object is located; using the rectangular area where the target object is located as identification points, finding the corresponding target rectangular area in the second image; and obtaining the three-dimensional coordinates of the target object from the rectangular area where the target object is located and the target rectangular area. In the embodiment of the application, the recognition model identifies the target object in the first image so as to determine, in two-dimensional space, the range of the object to be reconstructed, thereby reducing the amount of reconstruction computation in three-dimensional space and solving the problems of slow speed and high computational cost in existing three-dimensional reconstruction.

Description

Three-dimensional reconstruction method, network model training method, device and electronic equipment
Technical Field
The application belongs to the technical field of computers, and particularly relates to a three-dimensional reconstruction method, a network model training method, a device and electronic equipment.
Background
In industrial scenes such as material sorting and part grabbing, a three-dimensional model of the object to be grasped must be acquired; whether using active-light recognition based on Time of Flight (TOF) or stereoscopic vision based on deep learning, the problems of slow three-dimensional reconstruction and high computational cost remain.
Disclosure of Invention
In view of the above, an object of the present application is to provide a three-dimensional reconstruction method, a network model training method, a device and an electronic apparatus, so as to solve the problems of the slow speed and high computational cost of existing three-dimensional reconstruction.
Embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a three-dimensional reconstruction method, including: acquiring an image pair from a binocular camera, the image pair comprising a first image and a second image; identifying a target object in the first image with a pre-trained recognition model to obtain the rectangular area where the target object is located; using the rectangular area where the target object is located as identification points, finding the corresponding target rectangular area in the second image; and obtaining the three-dimensional coordinates of the target object from the rectangular area where the target object is located and the target rectangular area. In the embodiment of the application, the recognition model identifies the object in the first image so as to determine the range of the object to be reconstructed in two-dimensional space; the rectangular area where the object is located is then used as identification points to find the corresponding target rectangular area in the second image; and finally the three-dimensional coordinates of the object are determined from the rectangular area where the object is located and the target rectangular area. By reducing the amount of reconstruction computation in three-dimensional space, this solves the problems of slow speed and high computational cost in existing three-dimensional reconstruction.
With reference to a possible implementation manner of the first aspect of the embodiment, the finding, in the second image, a target rectangular area corresponding to the rectangular area with the rectangular area where the target object is located as an identification point includes: and based on a binocular three-dimensional reconstruction algorithm, taking the coordinates of each corner of the rectangular area where the target object is located as identification points, and finding out target point coordinates corresponding to the coordinates of each corner of the rectangular area in the second image. In the embodiment of the application, based on a binocular three-dimensional reconstruction algorithm, coordinates of each corner of a rectangular area where a target object is located are used as identification points, and target point coordinates corresponding to the coordinates of each corner of the rectangular area are found in a second image, so that a plurality of three-dimensional coordinates of the target object are calculated based on a plurality of identification points and corresponding target point coordinates, and the positioning accuracy of the target object is improved.
With reference to a possible implementation manner of the embodiment of the first aspect, before identifying the target object in the first image by using a pre-trained identification model, the method further includes: obtaining a training sample set, the training sample set comprising: an image of the object that has been marked and an image that contains only the object; and training the SSD recognition network by using the training sample set to obtain the trained recognition model, wherein during training, the image only containing the target object is used as a standard, the SSD recognition network performs similarity scoring on the area image where the target object is located, which is recognized from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity scoring. In the embodiment of the application, the SSD recognition network is trained by using the image marked with the target object and the sample set of the image only containing the target object, and when the SSD recognition network is trained, the image only containing the target object is used as a standard, the similarity scoring is carried out on the area image of the target object recognized by the SSD recognition network from the image marked with the target object, and the model parameters of the SSD recognition network are updated according to the similarity scoring, so that the SSD model can accurately and rapidly recognize the required target object.
With reference to a possible implementation manner of the embodiment of the first aspect, obtaining a training sample set includes: labeling the target object in the acquired original image to obtain a JSON image; modifying the values of pixel points of the marked part and the unmarked part in the JSON image to obtain a binarized image; cutting out an area image of the target object from the binarized image, converting the size of the area image of the target object into a preset standard size, converting the size of the marked original image into the preset standard size, and carrying out gray processing on the converted image to obtain the training sample set. In the embodiment of the application, the target object in the original image is marked, and the values of the pixel points of the marked part and the unmarked part in the obtained JSON image are modified, so that the characteristics of the marked image are more obvious, then the area image where the target object is located is cut out, the area image is converted into the preset standard size, the size of the marked original image is converted into the preset standard size, and the converted image is subjected to gray processing, so that the training accuracy can be improved, and the interference of irrelevant information is reduced.
In a second aspect, an embodiment of the present application further provides a network model training method, including: obtaining a training sample set, the training sample set comprising: an image of a marked object and an image containing only the object; and training the SSD recognition network by using the training sample set to obtain a trained recognition model, wherein during training, the image only containing the target object is used as a standard, the SSD recognition network performs similarity scoring on the area image where the target object is located, which is recognized from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity scoring.
With reference to a possible implementation manner of the second aspect embodiment, obtaining a training sample set includes: labeling the target object in the acquired original image to obtain a JSON image; modifying the values of pixel points of the marked part and the unmarked part in the JSON image to obtain a binarized image; cutting out an area image of the target object from the binarized image, converting the size of the area image of the target object into a preset standard size, converting the size of the marked original image into the preset standard size, and carrying out gray processing on the converted image to obtain the training sample set.
In a third aspect, an embodiment of the present application further provides a three-dimensional reconstruction apparatus, including: the device comprises an acquisition module, an identification module and a reconstruction module; the acquisition module is used for acquiring an image pair acquired by the binocular camera, wherein the image pair comprises a first image and a second image; the identification module is used for identifying the target object in the first image by utilizing a pre-trained identification model to obtain a rectangular area where the target object is located; the reconstruction module is used for taking a rectangular area where the target object is located as an identification point, and finding a target rectangular area corresponding to the rectangular area in the second image; and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
In a fourth aspect, an embodiment of the present application further provides a network model training apparatus, including: the system comprises an acquisition module and a training module; an acquisition module for acquiring a training sample set, the training sample set comprising: an image of a marked object and an image containing only the object; the training module is used for training the SSD recognition network by using the training sample set to obtain a trained recognition model, wherein during training, the image only containing the target object is used as a standard, the SSD recognition network performs similarity scoring on the area image where the target object is located, which is recognized from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity scoring.
In a fifth aspect, embodiments of the present application further provide an electronic device, including: a memory and a processor, the processor connected with the memory; the memory is used for storing a program; the processor is configured to invoke the program stored in the memory to perform the method provided by the first aspect embodiment and/or any possible implementation manner of the first aspect embodiment, or to perform the method provided by the second aspect embodiment and/or any possible implementation manner of the second aspect embodiment.
In a sixth aspect, the embodiments of the present application further provide a storage medium having stored thereon a computer program which, when executed by a processor, performs the method provided by the first aspect embodiment and/or any possible implementation manner of the first aspect embodiment, or performs the method provided by the second aspect embodiment and/or any possible implementation manner of the second aspect embodiment.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objects and other advantages of the present application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art. The above and other objects, features and advantages of the present application will become more apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the several views of the drawings. The drawings are not intended to be drawn to scale, with emphasis instead being placed upon illustrating the principles of the present application.
Fig. 1 shows a flow chart of a network model training method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of acquiring a training sample set according to an embodiment of the present application.
Fig. 3 shows a flow chart of a three-dimensional reconstruction method based on binocular vision according to an embodiment of the present application.
Fig. 4 shows a schematic diagram of a binocular detection principle according to an embodiment of the present application.
Fig. 5 shows a schematic block diagram of a network model training device according to an embodiment of the present application.
Fig. 6 shows a schematic block diagram of a three-dimensional reconstruction device based on binocular vision according to an embodiment of the present application.
Fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, relational terms such as "first," "second," and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Furthermore, the term "and/or" in this application is merely an association relation describing an association object, and indicates that three relations may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone.
Existing three-dimensional reconstruction suffers from slow speed and high computational cost. The inventor of the application found that most of the time cost of three-dimensional reconstruction comes from reconstructing every object in the field of view in three dimensions, even though the target object usually occupies only a small part of the scene; in point-cloud reconstruction, separating the target object from the background consumes a great deal of time because point-cloud data is unordered. Based on this, the embodiment of the application provides a binocular-vision-based three-dimensional reconstruction method that determines the range of the object to be reconstructed in two-dimensional space, thereby reducing the amount of reconstruction computation in three-dimensional space and solving the problems of slow speed and high computational cost in existing three-dimensional reconstruction.
It should be noted that, the reasons for the defects of the prior art solutions are all the results obtained by the inventor after practice and careful study, and therefore, the discovery process of the above problems and the solutions presented below by the embodiments of the present invention for the above problems should be all contributions of the inventor to the present invention in the process of the present invention.
In order to facilitate understanding of the binocular vision-based three-dimensional reconstruction method provided in the embodiments of the present application, a training method of the recognition model related thereto will be described. The method for training a network model according to the embodiment of the present application will be described below with reference to fig. 1.
Step S101: obtaining a training sample set, the training sample set comprising: an image of a tagged object and an image containing only the object.
In industrial scenes such as material sorting and part grabbing, object-grasping operations suffer from long positioning times and slow three-dimensional reconstruction. In the embodiment of the application, a training sample set containing images of the target object (the target to be grasped) is therefore obtained to train a network model. The marked targets differ according to the application scene.
The training sample set may be obtained as follows: label the target object in the acquired original image to obtain a JSON image; modify the values of the pixel points of the labeled and unlabeled parts in the JSON image (set the labeled pixels to 255 and the unlabeled pixels to 0) to obtain a binarized image; cut out the area image of the target object from the binarized image and convert it to a preset standard size; convert the labeled original image to the same preset standard size; and perform gray processing on the converted images to obtain the training sample set.
For ease of understanding, the process of acquiring the training sample set is described with reference to the schematic diagram shown in fig. 2. After the original image is obtained, the target object can be labeled with labeling software: for example, open the original image with the labelme software and mark the object to be grasped (the target object); after labeling, the software outputs a JSON image that stores the user's labeling information. To make the features of the labeled image more obvious, the values of the labeled and unlabeled parts of the JSON image are modified, for example setting the labeled part to 255 and the unlabeled part to 0, giving a binarized image. The area image where the target object is located is then cut out of the binarized image to obtain an image containing only the target object, and resized to a preset standard size, for example 300 x 300; the labeled original image is also resized to the preset standard size, for example 300 x 300, and the converted images are gray-processed, yielding the training sample data. After the cut-out area image of the target object is converted to the preset standard size, it can be re-binarized, with pixels whose value is 0 modified to 255.
The preset standard size is related to the size supported by the network model to be trained, and is not limited to 300×300 in the above example, and the sizes supported by different types of network models are different, and the corresponding preset standard sizes are also different.
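The binarization and cropping steps above can be sketched in plain Python. This is a minimal illustration: `binarize` and `crop_target` are hypothetical helper names, the mask is a nested list standing in for an image, and a real pipeline would use an image library for the resize-to-standard-size and gray-processing steps, which are omitted here.

```python
def binarize(mask, labeled=1):
    # Labeled pixels become 255, everything else 0 (the "binarized image").
    return [[255 if v == labeled else 0 for v in row] for row in mask]

def crop_target(binary):
    # Cut out the smallest rectangle containing every labeled (255) pixel,
    # i.e. the area image where the target object is located.
    ys = [y for y, row in enumerate(binary) if 255 in row]
    xs = [x for row in binary for x, v in enumerate(row) if v == 255]
    top, bottom = min(ys), max(ys)
    left, right = min(xs), max(xs)
    return [row[left:right + 1] for row in binary[top:bottom + 1]]
```

In practice the cropped patch would then be resized to the preset standard size (e.g. 300 x 300) before being added to the sample set.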
Step S102: and training the SSD recognition network by using the training sample set to obtain a trained recognition model, wherein during training, the image only containing the target object is used as a standard, the SSD recognition network performs similarity scoring on the area image where the target object is located, which is recognized from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity scoring.
After the training sample set is obtained, the SSD (Single Shot MultiBox Detector) recognition network is trained with it. During training, the image containing only the target object is used as the standard: the area image of the target object that the SSD recognition network identifies in the labeled image is scored for similarity against that standard, and the model parameters of the SSD recognition network are updated according to the similarity score, until the score exceeds a threshold, at which point the trained recognition model is obtained. The SSD recognition network is a one-stage object detection network whose backbone is VGG16.
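The patent does not name the similarity metric used for scoring. As one plausible stand-in, the score could be the fraction of gray values that agree within a tolerance between the recognized patch and the target-only reference image; both function names below are hypothetical.

```python
def similarity_score(pred_patch, reference, tol=10):
    # Fraction of pixel pairs whose gray values agree within `tol`.
    # Illustrative stand-in; the patent does not specify the metric.
    total, matches = 0, 0
    for prow, rrow in zip(pred_patch, reference):
        for p, r in zip(prow, rrow):
            total += 1
            matches += abs(p - r) <= tol
    return matches / total

def training_converged(score, threshold=0.9):
    # Parameter updates stop once the recognized patch scores above the threshold.
    return score > threshold
```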
After the recognition model is obtained, the object to be reconstructed can be recognized by using the trained recognition model, so that the range of the object to be reconstructed is determined in two-dimensional space. The binocular vision-based three-dimensional reconstruction method provided in the embodiment of the present application will be described below with reference to fig. 3.
Step S201: an image pair acquired by a binocular camera is acquired, the image pair including a first image and a second image.
An image pair acquired by a binocular camera is acquired, wherein the image pair comprises images acquired by a left camera and a right camera in the binocular camera respectively for the same target object, namely a first image and a second image.
The first image may be the image collected by the left camera, in which case the second image is the image collected by the right camera; alternatively, the second image may be collected by the left camera, in which case the first image is collected by the right camera.
Step S202: and identifying the target object in the first image by using a pre-trained identification model to obtain a rectangular area where the target object is located.
In the application, to reduce the number of point clouds in three-dimensional reconstruction, a pre-trained recognition model is used to recognize the target object in the first image and obtain the rectangular area where it is located: the first image is input into the pre-trained recognition model, which recognizes the rectangular area where the target object is located and outputs it as the rect region. The output result contains the coordinates of each corner of the rectangular region. The range of the object to be reconstructed is thus determined in two-dimensional space, reducing the amount of reconstruction computation in three-dimensional space.
In one embodiment, the recognition model may be an SSD (Single Shot MultiBox Detector) recognition network, such as an SSD recognition network based on MobileNet-V2. Besides SSD recognition networks, other recognition networks, such as a YOLO network, may also be used.
The process of training the recognition model can refer to the process shown in the network model training method. The marked target objects in the images in the training sample set are consistent with the target objects to be identified, or the marked target objects in the images in the training sample set are the same type of target objects.
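Since the output rect region is consumed corner by corner in the matching step, a small helper can expand a rect into its four corner coordinates. This is a hypothetical convenience function, assuming the model reports rects as (x, y, width, height):

```python
def rect_corners(x, y, w, h):
    # Expand a rect (top-left corner plus width/height) into the four corner
    # coordinates later used as identification points.
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
```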
Step S203: and taking the rectangular area where the target object is located as an identification point, and finding out a target rectangular area corresponding to the rectangular area in the second image.
Based on a binocular three-dimensional reconstruction algorithm, the rectangular area where the target object is located is used as the identification points, and the target rectangular area corresponding to it is found in the second image. Because only the rectangular area where the target object is located, rather than the point-cloud data of the whole first image, serves as the identification points, the number of point clouds in three-dimensional reconstruction is reduced.
The binocular three-dimensional reconstruction algorithm generally matches pixel points by the disparity method. Assuming that target recognition is performed on the left view, the rectangular area where the target object is located is used as the left-view identification points when the disparity method is applied, and the corresponding target rectangular area is searched for in the right-view second image.
Taking the rectangular area where the target object is located as identification points, the corresponding target rectangular area may be found in the second image as follows: based on a binocular three-dimensional reconstruction algorithm, the coordinates of each corner of the rectangular area where the target object is located are used as identification points, and the target point coordinates corresponding to each corner are found in the second image. Assuming left-view-based target recognition, the rect region output by the SSD recognition model is expressed in the left-view coordinate system, and each left-view corner point is used as an identification point to find the corresponding point in the right view. For example, the upper-left, upper-right, lower-left and lower-right corner coordinates of the rectangular area are each used as identification points, and the points corresponding to them are located in the right view.
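For rectified stereo images the corresponding point of a left-view corner lies on the same image row, so the search for each corner's match can be sketched as a one-dimensional scan minimizing the sum of absolute differences (SAD). This is an illustrative sketch; the patent does not name the matching cost, and `match_along_row` is a hypothetical helper operating on single scanlines of gray values.

```python
def match_along_row(left_row, right_row, u, half=2):
    # Rectified stereo: the match lies on the same row. Compare a small
    # horizontal window around column u of the left row against every window
    # in the right row and return the column with the lowest SAD cost.
    target = left_row[u - half:u + half + 1]
    best_u, best_cost = None, float("inf")
    for c in range(half, len(right_row) - half):
        window = right_row[c - half:c + half + 1]
        cost = sum(abs(a - b) for a, b in zip(target, window))
        if cost < best_cost:
            best_u, best_cost = c, cost
    return best_u
```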
Step S204: obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
After the target rectangular area corresponding to the rectangular area is found in the second image, the three-dimensional coordinates of the target object can be obtained from the rectangular area where the target object is located and the target rectangular area. Once the coordinates of each pair of corresponding points in the left and right views are known, a triangular relationship can be established and the three-dimensional coordinates of each corresponding point calculated.
Here, for ease of understanding, how to calculate the three-dimensional coordinates of a point from its corresponding coordinates in the left and right views is described below with reference to the schematic diagram shown in fig. 4. Assume that the left camera in binocular vision is taken as the origin O_L, that the coordinates of the spatial scene point P(X, Y, Z) on the left and right camera imaging planes are p1(u1, v1) and p2(u2, v2) respectively, that the projection centers of the left and right cameras are O1 and O2 respectively, and that the main optical axis intersects the left image plane at (u0, v0). Then: X = b·(u1 − u0)/(u1 − u2); Y = b·(v1 − v0)/(u1 − u2); Z = b·f/(u1 − u2), where b is the baseline distance between the projection centers of the left and right cameras, and f is the focal length of the cameras.
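The three relations above can be sketched as a small helper function. This is a hypothetical illustration, not code from the patent; the symbols follow the text, with b the baseline distance and f the focal length in pixels:

```python
def triangulate(p1, p2, b, f, u0, v0):
    """Recover the 3D coordinates (X, Y, Z) of a scene point from its matched
    left/right pixel coordinates using the parallax relations
    X = b*(u1 - u0)/d, Y = b*(v1 - v0)/d, Z = b*f/d, with disparity d = u1 - u2."""
    u1, v1 = p1
    u2, _ = p2
    d = u1 - u2  # disparity; must be non-zero for a finite depth
    if d == 0:
        raise ValueError("zero disparity: point at infinity")
    X = b * (u1 - u0) / d
    Y = b * (v1 - v0) / d
    Z = b * f / d
    return X, Y, Z
```

For a point imaged at the principal point of the left view, X and Y come out zero and the depth Z is inversely proportional to the disparity, as the formulas require.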
In this way, the three-dimensional coordinates of each point matched between the left and right views (such as the upper-left, lower-left, upper-right, and lower-right corners) can be calculated.
An embodiment of the present application provides a network model training apparatus 100, as shown in fig. 5, where the network model training apparatus 100 includes: the acquisition module 110 and the training module 120.
The obtaining module 110 is configured to obtain a training sample set, where the training sample set includes: an image of a marked target object and an image containing only the target object. Optionally, the obtaining module 110 is specifically configured to: label the target object in the acquired original image to obtain a JSON image; modify the values of the pixel points of the marked part and the unmarked part in the JSON image to obtain a binarized image; cut out the area image of the target object from the binarized image, convert the size of the area image of the target object to a preset standard size, convert the size of the marked original image to the preset standard size, and perform grayscale processing on the converted images to obtain the training sample set.
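The binarize-then-crop portion of the sample-preparation steps above could be sketched as follows. This is an assumed, dependency-light stand-in: a real pipeline would likely parse labelme-style JSON annotations and use cv2.resize, and the function names here are hypothetical:

```python
import numpy as np

def binarize_and_crop(mask):
    """Turn an annotation mask (non-zero where the target object was labeled)
    into a 0/255 binarized image, then crop the tight bounding box of the
    object region -- mirroring the JSON image -> binarized image -> crop steps."""
    binary = np.where(mask > 0, 255, 0).astype(np.uint8)  # marked part -> 255
    ys, xs = np.nonzero(binary)
    return binary, binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def resize_nearest(img, size):
    """Nearest-neighbour resize to the preset standard size (h, w); a simple
    stand-in for the size-conversion step."""
    h, w = size
    rows = (np.arange(h) * img.shape[0]) // h
    cols = (np.arange(w) * img.shape[1]) // w
    return img[rows[:, None], cols]
```

Grayscale conversion of the marked original image is assumed to happen separately before both images enter the training sample set.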
The training module 120 is configured to train the SSD recognition network by using the training sample set to obtain a trained recognition model, wherein, during training, the image containing only the target object is used as a standard, similarity scoring is performed on the area image of the target object that the SSD recognition network recognizes from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity score.
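One plausible form of the similarity scoring between the standard image and the recognized region is a pixel-wise score; the exact scoring function and how it feeds back into the SSD loss are not specified by the text, so the MSE-based score below is an assumption for illustration only:

```python
import numpy as np

def similarity_score(standard, recognized):
    """Score how closely the region the network recognized matches the standard
    image containing only the object, as 1 / (1 + mean squared error). Both
    inputs are assumed already resized to the same standard size and grayscaled.
    Identical images score 1.0; the score decays toward 0 as they diverge."""
    a = standard.astype(float) / 255.0
    b = recognized.astype(float) / 255.0
    mse = ((a - b) ** 2).mean()
    return 1.0 / (1.0 + mse)
```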
The network model training apparatus 100 provided in the embodiments of the present application has the same implementation principle and technical effects as those of the foregoing method embodiments, and for brevity, reference may be made to the corresponding contents of the foregoing method embodiments for the parts of the apparatus embodiment that are not mentioned.
The embodiment of the application also provides a three-dimensional reconstruction device 200, as shown in fig. 6. The three-dimensional reconstruction apparatus 200 includes: an acquisition module 210, an identification module 220, a reconstruction module 230.
The acquiring module 210 is configured to acquire an image pair acquired by the binocular camera, where the image pair includes a first image and a second image.
The identifying module 220 is configured to identify the target object in the first image by using a pre-trained identifying model, so as to obtain a rectangular area where the target object is located.
A reconstruction module 230, configured to find a target rectangular area corresponding to the rectangular area in the second image, with the rectangular area where the target object is located as an identification point; and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
Optionally, the reconstruction module 230 is specifically configured to find, in the second image, coordinates of the target point corresponding to the coordinates of each corner of the rectangular area, using the coordinates of each corner of the rectangular area where the target object is located as the identification point based on a binocular three-dimensional reconstruction algorithm.
Optionally, the three-dimensional reconstruction apparatus 200 further includes a training module, and accordingly, the obtaining module 210 is further configured to obtain a training sample set, where the training sample set includes: an image of a marked target object and an image containing only the target object.
The training module is configured to train the SSD recognition network by using the training sample set to obtain the trained recognition model, wherein, during training, the image containing only the target object is used as a standard, similarity scoring is performed on the area image of the target object that the SSD recognition network recognizes from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity score.
Optionally, the obtaining module 210 is further specifically configured to: labeling the target object in the acquired original image to obtain a JSON image; modifying the values of pixel points of the marked part and the unmarked part in the JSON image to obtain a binarized image; cutting out an area image of the target object from the binarized image, converting the size of the area image of the target object into a preset standard size, converting the size of the marked original image into the preset standard size, and carrying out gray processing on the converted image to obtain the training sample set.
The three-dimensional reconstruction device 200 provided in the embodiment of the present application has the same implementation principle and technical effects as those of the foregoing method embodiment, and for brevity, reference may be made to the corresponding content in the foregoing method embodiment for the part of the device embodiment that is not mentioned.
As shown in fig. 7, fig. 7 shows a block diagram of an electronic device 300 according to an embodiment of the present application. The electronic device 300 includes: a transceiver 310, a memory 320, a communication bus 330, and a processor 340.
The transceiver 310, the memory 320, and the processor 340 are electrically connected to each other, directly or indirectly, to realize data transmission or interaction. For example, these components may be electrically coupled to each other via one or more communication buses 330 or signal lines. The transceiver 310 is used for receiving and transmitting data. The memory 320 is used for storing a computer program, such as the software functional modules shown in fig. 5 or fig. 6, i.e. the network model training apparatus 100 shown in fig. 5 or the three-dimensional reconstruction apparatus 200 shown in fig. 6. The network model training apparatus 100 or the three-dimensional reconstruction apparatus 200 includes at least one software functional module that may be stored in the memory 320 in the form of software or firmware, or embedded in the operating system (OS) of the electronic device 300.
When executing a software functional module or computer program included in the network model training apparatus 100, the processor 340 is configured to: obtain a training sample set, where the training sample set includes an image of a marked target object and an image containing only the target object; and train the SSD recognition network by using the training sample set to obtain a trained recognition model, wherein, during training, the image containing only the target object is used as a standard, similarity scoring is performed on the area image of the target object that the SSD recognition network recognizes from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity score.
When executing a software functional module or computer program included in the three-dimensional reconstruction apparatus 200, the processor 340 is configured to: obtain an image pair acquired by the binocular camera, where the image pair includes a first image and a second image; identify the target object in the first image by using a pre-trained recognition model to obtain a rectangular area where the target object is located; take the rectangular area where the target object is located as the identification points, and find a target rectangular area corresponding to the rectangular area in the second image; and obtain the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
The memory 320 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
The processor 340 may be an integrated circuit chip with signal processing capabilities. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; or a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. Such a processor can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 340 may be any conventional processor or the like.
The electronic device 300 includes, but is not limited to, a computer, a server, and the like.
The embodiments of the present application further provide a non-volatile computer readable storage medium (hereinafter referred to as a storage medium) on which a computer program is stored, where the computer program, when executed by a computer such as the electronic device 300 described above, performs the three-dimensional reconstruction method or the network model training method described above.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts between the embodiments, reference may be made to one another.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, an electronic device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A three-dimensional reconstruction method, comprising:
acquiring an image pair acquired by a binocular camera, wherein the image pair comprises a first image and a second image;
identifying a target object in the first image by using a pre-trained identification model to obtain a rectangular area where the target object is located;
taking a rectangular area where the target object is located as an identification point, and finding a target rectangular area corresponding to the rectangular area in the second image;
and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
2. The method according to claim 1, wherein the finding the target rectangular area corresponding to the rectangular area in the second image using the rectangular area where the target object is located as the identification point includes:
and based on a binocular three-dimensional reconstruction algorithm, taking the coordinates of each corner of the rectangular area where the target object is located as identification points, and finding out target point coordinates corresponding to the coordinates of each corner of the rectangular area in the second image.
3. The method of claim 1, wherein prior to identifying the object in the first image using the pre-trained identification model, the method further comprises:
obtaining a training sample set, the training sample set comprising: an image of the object that has been marked and an image that contains only the object;
and training the SSD recognition network by using the training sample set to obtain the trained recognition model, wherein during training, the image only containing the target object is used as a standard, the SSD recognition network performs similarity scoring on the area image where the target object is located, which is recognized from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity scoring.
4. A method according to claim 3, wherein obtaining a training sample set comprises:
labeling the target object in the acquired original image to obtain a JSON image;
modifying the values of pixel points of the marked part and the unmarked part in the JSON image to obtain a binarized image;
cutting out an area image of the target object from the binarized image, converting the size of the area image of the target object into a preset standard size, converting the size of the marked original image into the preset standard size, and carrying out gray processing on the converted image to obtain the training sample set.
5. A method for training a network model, comprising:
obtaining a training sample set, the training sample set comprising: an image of a marked object and an image containing only the object;
and training the SSD recognition network by using the training sample set to obtain a trained recognition model, wherein during training, the image only containing the target object is used as a standard, the SSD recognition network performs similarity scoring on the area image where the target object is located, which is recognized from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity scoring.
6. The method of claim 5, wherein obtaining a training sample set comprises:
labeling the target object in the acquired original image to obtain a JSON image;
modifying the values of pixel points of the marked part and the unmarked part in the JSON image to obtain a binarized image;
cutting out an area image of the target object from the binarized image, converting the size of the area image of the target object into a preset standard size, converting the size of the marked original image into the preset standard size, and carrying out gray processing on the converted image to obtain the training sample set.
7. A three-dimensional reconstruction apparatus, comprising:
the acquisition module is used for acquiring an image pair acquired by the binocular camera, wherein the image pair comprises a first image and a second image;
the identification module is used for identifying the target object in the first image by utilizing a pre-trained identification model to obtain a rectangular area where the target object is located;
the reconstruction module is used for taking a rectangular area where the target object is located as an identification point, and finding a target rectangular area corresponding to the rectangular area in the second image; and obtaining the three-dimensional coordinates of the target object according to the rectangular area where the target object is located and the target rectangular area.
8. A network model training apparatus, comprising:
an acquisition module for acquiring a training sample set, the training sample set comprising: an image of a marked object and an image containing only the object;
the training module is used for training the SSD recognition network by using the training sample set to obtain a trained recognition model, wherein during training, the image only containing the target object is used as a standard, the SSD recognition network performs similarity scoring on the area image where the target object is located, which is recognized from the image marked with the target object, and model parameters of the SSD recognition network are updated according to the similarity scoring.
9. An electronic device, comprising:
the device comprises a memory and a processor, wherein the processor is connected with the memory;
the memory is used for storing programs;
the processor is configured to invoke a program stored in the memory to perform the method of any of claims 1-4 or to perform the method of any of claims 5-6.
10. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of claims 1-4 or performs the method of any of claims 5-6.
CN202110213255.6A 2021-02-24 2021-02-24 Three-dimensional reconstruction method, network model training method, device and electronic equipment Active CN112819953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110213255.6A CN112819953B (en) 2021-02-24 2021-02-24 Three-dimensional reconstruction method, network model training method, device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112819953A CN112819953A (en) 2021-05-18
CN112819953B true CN112819953B (en) 2024-01-19

Family

ID=75863889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110213255.6A Active CN112819953B (en) 2021-02-24 2021-02-24 Three-dimensional reconstruction method, network model training method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112819953B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327324A (en) * 2021-06-25 2021-08-31 广东博智林机器人有限公司 Method and device for constructing three-dimensional building model, computer equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960015A (en) * 2017-05-24 2018-12-07 优信拍(北京)信息科技有限公司 A kind of vehicle system automatic identifying method and device based on deep learning
JP2019086294A (en) * 2017-11-01 2019-06-06 オムロン株式会社 Three-dimensional measurement device, three-dimensional measurement method, and program
CN110009722A (en) * 2019-04-16 2019-07-12 成都四方伟业软件股份有限公司 Three-dimensional rebuilding method and device
CN110310315A (en) * 2018-03-21 2019-10-08 北京猎户星空科技有限公司 Network model training method, device and object pose determine method, apparatus
JP2020013573A (en) * 2018-07-19 2020-01-23 コンティ テミック マイクロエレクトロニック ゲゼルシャフト ミット ベシュレンクテル ハフツングConti Temic microelectronic GmbH Three-dimensional image reconstruction method of vehicle
CN111062400A (en) * 2018-10-16 2020-04-24 浙江宇视科技有限公司 Target matching method and device
CN111243085A (en) * 2020-01-20 2020-06-05 北京字节跳动网络技术有限公司 Training method and device for image reconstruction network model and electronic equipment
CN111783637A (en) * 2020-06-30 2020-10-16 上海木木聚枞机器人科技有限公司 Key point marking method and device and target object space pose determining method and device
CN111862296A (en) * 2019-04-24 2020-10-30 京东方科技集团股份有限公司 Three-dimensional reconstruction method, three-dimensional reconstruction device, three-dimensional reconstruction system, model training method and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Research on small-area target reconstruction based on 3D VR technology; Guo Liang; Modern Electronics Technique, No. 19; full text *
Object contour recognition based on 2D lidar point clouds; Cai Zeyu, Jin Chengqian; Applied Laser, No. 3; full text *
Target recognition and localization based on binocular stereo vision; Wang Dehai, Hong Wei, Cheng Qunzhe; Journal of Jilin University (Information Science Edition), No. 2; full text *
Object recognition, localization and grasping based on binocular vision; Gao Zhiwei, Tan Xiaodong, Liu Ke; Science Technology and Engineering, No. 20; full text *
Research on target recognition and localization based on binocular vision; Jiang Meng, Wang Yaoyao, Chen Bai; Journal of Mechanical & Electrical Engineering, No. 4; full text *

Also Published As

Publication number Publication date
CN112819953A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
WO2022083402A1 (en) Obstacle detection method and apparatus, computer device, and storage medium
WO2019223382A1 (en) Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN108960211B (en) Multi-target human body posture detection method and system
CN111222395A (en) Target detection method and device and electronic equipment
CN109063768B (en) Vehicle weight identification method, device and system
CN110705405B (en) Target labeling method and device
CN113379718B (en) Target detection method, target detection device, electronic equipment and readable storage medium
CN109754009B (en) Article identification method, article identification device, vending system and storage medium
CN111639970A (en) Method for determining price of article based on image recognition and related equipment
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN112819953B (en) Three-dimensional reconstruction method, network model training method, device and electronic equipment
CN114842466A (en) Object detection method, computer program product and electronic device
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
Vidhyalakshmi et al. Text detection in natural images with hybrid stroke feature transform and high performance deep Convnet computing
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN116012712A (en) Object general feature-based target detection method, device, equipment and medium
CN113344121B (en) Method for training a sign classification model and sign classification
CN114997264A (en) Training data generation method, model training method, model detection method, device and electronic equipment
CN113721240A (en) Target association method and device, electronic equipment and storage medium
CN111753766A (en) Image processing method, device, equipment and medium
CN113281780B (en) Method and device for marking image data and electronic equipment
CN114882525B (en) Cross-modal pedestrian re-identification method based on modal specific memory network
Ma et al. Fast, accurate vehicle detection and distance estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant