CN113643420A - Three-dimensional reconstruction method and device - Google Patents

Three-dimensional reconstruction method and device

Info

Publication number
CN113643420A
Authority
CN
China
Prior art keywords
image
frame
reconstructed
images
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110749332.XA
Other languages
Chinese (zh)
Other versions
CN113643420B (en)
Inventor
谢日旭
王明晖
赵铮
魏晓林
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202110749332.XA
Publication of CN113643420A
Application granted
Publication of CN113643420B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

This specification discloses a three-dimensional reconstruction method and apparatus. The method performs target detection on each frame of image acquired of a reconstruction object and masks the positions of target objects belonging to preset types to determine the reconstructed images; the reconstructed images are grouped into a plurality of data groups according to the time-sequence information recorded when each was acquired; sequential feature matching is performed on the reconstructed images within each data group; and, for each data group, its first-frame reconstructed image is matched pairwise against the reconstructed images of the other data groups to find the common-view areas between groups, after which three-dimensional reconstruction is performed. By masking target objects that would interfere with feature matching, their impact on matching accuracy is removed; by grouping the images in time order, the computation time and workload of global brute-force feature matching are reduced, improving both the accuracy and the efficiency of the three-dimensional reconstruction.

Description

Three-dimensional reconstruction method and device
Technical Field
The present disclosure relates to the field of three-dimensional reconstruction technologies, and in particular, to a three-dimensional reconstruction method and apparatus.
Background
With the wide application of three-dimensional reconstruction techniques, the accuracy of three-dimensional reconstruction results is increasingly important. Currently, three-dimensional reconstruction techniques include vision-based three-dimensional reconstruction and laser-based three-dimensional reconstruction. In the vision-based technique, a plurality of frames of images of a reconstruction object (such as an article, a building group, an indoor scene and the like) are acquired; the images share certain common-view areas, so that feature detection can be performed on those areas and the images can be feature-matched against each other. After feature matching yields the feature points of the target objects, three-dimensional reconstruction is performed based on the obtained feature points.
However, in current vision-based three-dimensional reconstruction, feature matching is performed by brute-force matching: each frame of image is feature-matched pairwise against every other image, so that all possible matches among the acquired images are tried. Brute-force matching is computationally expensive and time-consuming, so the resulting three-dimensional reconstruction is inefficient.
Moreover, if similar or identical objects are placed at different positions of the reconstruction object (for example, the same ornaments placed at different positions in a room), brute-force matching of the images containing them may mismatch these similar objects, that is, feature points from images acquired at different positions are matched successfully. In this case, the ornaments become interferents for the three-dimensional reconstruction of the reconstruction object. Therefore, when interferents exist in the acquired images of the reconstruction object, the brute-force matching approach increases the probability of feature-point mismatches, making the three-dimensional reconstruction inaccurate or even causing it to fail.
Disclosure of Invention
The present specification provides a three-dimensional reconstruction method and apparatus to partially solve the above problems in the prior art.
The technical scheme adopted by the specification is as follows:
the present specification provides a three-dimensional reconstruction method, including:
acquiring a plurality of frames of images of a pre-acquired reconstruction object, and carrying out target detection on each frame of image;
determining image areas of target objects belonging to a preset type from each frame of image according to the detection result, and determining each reconstructed image by adding a mask to the image areas of the target objects in each frame of image;
acquiring time sequence information of each frame of reconstructed image, and grouping each frame of reconstructed image according to the time sequence information to obtain each data group;
for each data group, sequentially performing feature matching on each frame of reconstructed image in the data group and adjacent reconstructed images thereof according to the time sequence information, and performing feature matching on head and tail frame reconstructed images in the data group and each frame of reconstructed image in other data groups;
and performing three-dimensional reconstruction on the reconstruction object according to the obtained matching results.
Optionally, the target detection is performed on each frame of image, and specifically includes:
aiming at each frame of image, carrying out target detection on the image, and judging whether a target object exists in the image or not;
if so, determining the type and the position of the target object in the image, and taking the determined type and the determined position of the target object as a detection result;
and if not, taking the absence of a target object in the image as the detection result.
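The per-frame detection contract described above (type and position when a target exists, an empty result otherwise) can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the `Detection` and `FrameResult` names and fields are assumptions made here for exposition:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Detection:
    obj_type: str                   # e.g. "advertising_poster" (hypothetical label)
    box: Tuple[int, int, int, int]  # bounding box (x_min, y_min, x_max, y_max)


@dataclass
class FrameResult:
    frame_id: int
    detections: List[Detection] = field(default_factory=list)

    @property
    def has_target(self) -> bool:
        # "no target object exists" is represented by an empty detection list
        return len(self.detections) > 0


def detection_result(frame_id, raw_detections):
    """Package a detector's raw (type, box) output as the per-frame result."""
    return FrameResult(frame_id, [Detection(t, b) for t, b in raw_detections])
```

A frame with no detections simply carries an empty list, matching the "target object does not exist" branch of the claim.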
Optionally, determining an image region of the target object belonging to a preset type from each frame of image according to the detection result, and determining each reconstructed image by adding a mask to the image region of the target object in each frame of image, specifically including:
determining, according to the detection result, each frame image in which a target object of the preset type exists as a target image, and each frame image in which no preset-type target object exists, as well as each frame image in which no target object exists at all, as standard images;
adding a mask to the position of a target object belonging to a preset type in each frame of target image;
and taking each target image and each standard image after the mask is added as a reconstructed image.
Optionally, according to the time sequence information, grouping the reconstructed images of each frame to obtain each data group, specifically including:
determining the acquisition sequence of each frame of reconstructed image according to the time sequence information;
according to the acquisition sequence, sequentially determining the time interval between two adjacent reconstructed images;
judging whether two reconstructed images with time intervals larger than a preset interval exist or not;
if so, taking two reconstructed images with the time interval larger than the preset interval as identification images, and grouping the reconstructed images of the frames according to the obtained identification images to obtain data groups;
wherein the preset interval is greater than the inter-frame interval of image acquisition (i.e., the reciprocal of the capture frame rate).
Optionally, grouping the reconstructed images of the frames according to the obtained identification images to obtain data groups, specifically including:
aiming at each frame of identification image, determining an adjacent identification image of the frame of identification image according to the acquisition sequence;
judging whether other reconstructed images exist between the frame identification image and the adjacent identification image;
and if so, determining that the frame identification image, the adjacent identification image and the other reconstructed images between the frame identification image and the adjacent identification image are a data group.
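The grouping logic above (split the time-ordered frames wherever the gap between two adjacent reconstructed images exceeds the preset interval; the two frames around each gap act as the identification images bounding adjacent groups) can be sketched as a small Python function. The function name and its list-of-indices return format are choices made here for illustration, not part of the patent:

```python
def group_by_timestamp(timestamps, preset_interval):
    """Split a time-ordered list of frame timestamps into data groups.

    A gap larger than `preset_interval` between consecutive frames marks a
    group boundary; each returned group is a list of frame indices.
    """
    if not timestamps:
        return []
    groups, current = [], [0]
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > preset_interval:
            # the frames at i-1 and i are the "identification images"
            groups.append(current)
            current = [i]
        else:
            current.append(i)
    groups.append(current)
    return groups
```

For example, with timestamps `[0.0, 0.1, 0.2, 1.5, 1.6]` and a preset interval of 0.5 s, the 1.3 s gap splits the frames into two data groups.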
Optionally, according to the time sequence information, sequentially performing feature matching on each reconstructed image in the data set and its adjacent reconstructed image, specifically including:
according to the time sequence information, sequentially determining each frame of reconstructed image in the data set as a target matching image;
determining a matching interval of the target matching image according to a preset interval length;
and performing feature matching on the target matching image and other reconstructed images in the matching interval.
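The matching-interval scheme above, where each target matching image is matched only against the reconstructed images within a window of preset length rather than against every other image, can be sketched as follows. The function name and the interpretation of the interval as "the next `window` frames" are assumptions for illustration:

```python
def match_pairs_within_group(group, window):
    """Enumerate the image pairs matched inside one data group.

    Each frame is feature-matched only against the following `window`
    frames in acquisition order, instead of against every other frame
    as in brute-force matching.
    """
    pairs = []
    n = len(group)
    for i in range(n):
        for j in range(i + 1, min(i + 1 + window, n)):
            pairs.append((group[i], group[j]))
    return pairs
```

With `window=1` this degenerates to matching consecutive frames only; a larger window trades extra computation for robustness to occasional weak adjacent overlap.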
Optionally, before determining an image region of the target object belonging to the preset type from each frame of image according to the detection result, the method further includes:
acquiring a reference image of a target object belonging to a preset type and a plurality of pre-acquired frame images of a reconstruction object;
for each acquired frame image, performing similarity matching on the frame image and the reference image;
and determining the image area of the target object belonging to the preset type from each frame image according to the matching result.
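One simple way to realize the similarity matching against a reference image described above is normalized cross-correlation. The NumPy sketch below assumes, for brevity, that the candidate patch and the reference template have the same size; a real pipeline would slide the template across the frame. The function name and this same-size assumption are illustrative, not the patent's method:

```python
import numpy as np


def ncc(patch, template):
    """Normalized cross-correlation between an image patch and a same-size
    reference template; 1.0 means identical up to brightness/contrast."""
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom else 0.0
```

A patch whose score against the reference exceeds a chosen threshold would be treated as an image area of the preset-type target object.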
The present specification provides a three-dimensional reconstruction apparatus comprising:
the detection module is used for acquiring a plurality of frames of images of a pre-collected reconstruction object and carrying out target detection on each frame of image;
the determining module is used for determining an image area of the target object belonging to a preset type from each frame of image according to the detection result, and determining each reconstructed image by adding a mask to the image area of the target object in each frame of image;
the grouping module is used for acquiring time sequence information of each frame of reconstructed image, and grouping each frame of reconstructed image according to the time sequence information to obtain each data group;
the matching module is used for sequentially carrying out feature matching on each frame of reconstructed image in each data group and adjacent reconstructed images thereof according to the time sequence information and carrying out feature matching on the head and tail frame of reconstructed images in the data group and each frame of reconstructed image in other data groups;
and the reconstruction module is used for performing three-dimensional reconstruction on the reconstruction object according to the obtained matching results.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the three-dimensional reconstruction method described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above three-dimensional reconstruction method when executing the program.
The technical scheme adopted by the specification can achieve the following beneficial effects:
in the three-dimensional reconstruction method provided in this specification, target detection is performed on each frame of image acquired of the reconstruction object, and the positions of target objects belonging to preset types are masked to determine the reconstructed images; the reconstructed images are grouped into a plurality of data groups according to the time-sequence information recorded when each was acquired; sequential feature matching is performed within each data group; and, for each data group, its first-frame reconstructed image is matched pairwise against the reconstructed images of the other data groups to find the common-view areas between groups, after which three-dimensional reconstruction is performed.
By masking target objects that would interfere with feature matching, the method removes their impact on matching accuracy; by grouping the images in time order, it reduces the computation time and workload of global brute-force feature matching, improving both the accuracy and the efficiency of the three-dimensional reconstruction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and are incorporated in and constitute a part of it, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:
fig. 1 is a schematic flow chart of a three-dimensional reconstruction method in this specification;
FIG. 2 is a schematic diagram of feature matching provided herein;
FIG. 3 is a schematic diagram of a packet provided herein;
FIG. 4 is a schematic diagram of an acquisition path provided herein;
FIG. 5 is a schematic diagram of an acquisition path provided herein;
fig. 6 is a schematic diagram of a three-dimensional reconstruction apparatus provided herein;
fig. 7 is a schematic structural diagram of an electronic device corresponding to fig. 1 provided in this specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clear, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort belong to the protection scope of the present specification.
Currently, vision-based three-dimensional reconstruction requires that the acquired images of the reconstruction object share certain common-view areas; the images are usually obtained by photographing the reconstruction object as a whole from different angles or by photographing its different local areas. Because the full view of the reconstruction object cannot be captured from a single angle, the images must share common-view areas whether they are taken from different angles or of different local areas, so that during three-dimensional reconstruction feature matching can be performed on those areas to find, across different images, the feature points that correspond to the same three-dimensional point on the real reconstruction object. After matching results are obtained for all the acquired images, three-dimensional reconstruction is performed based on them.
Take the three-dimensional reconstruction of an indoor shopping-mall scene as an example. A mall is a relatively large reconstruction object with a complex internal structure, so when reconstructing such an indoor scene, local images usually need to be collected along paths inside the mall; after all the images have been collected, feature matching is performed, and three-dimensional reconstruction is carried out according to the matching results obtained.
However, current vision-based three-dimensional reconstruction performs feature matching by brute force, which is computationally expensive and time-consuming, so reconstruction based on brute-force matching is inefficient. Moreover, if similar interferents appear in different acquired images, brute-force matching increases the probability of feature-point mismatches (e.g., feature points of different objects being matched successfully), leading to inaccurate three-dimensional reconstruction or even reconstruction failure.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a three-dimensional reconstruction method in this specification, which specifically includes the following steps:
s100: acquiring a plurality of frames of images of a pre-acquired reconstruction object, and carrying out target detection on each frame of image.
In one or more embodiments of the present description, the three-dimensional reconstruction method may be performed by a server.
In one or more embodiments of the present disclosure, when reconstructing a three-dimensional reconstruction object, a plurality of frames of images of the reconstruction object, which are acquired in advance by an acquiring person, need to be acquired first.
Interferents may be captured when the images of the reconstruction object are collected. If similar interferents appear in different frames, they are easily mismatched during feature matching of the acquired images, so that feature points actually belonging to different positions are matched as if they were the same.
In one or more embodiments of the present specification, an object that appears at different positions in the environment corresponding to the reconstruction object (the external environment in which it is located, or, when the reconstruction object is an indoor scene, its own internal environment) and is prone to mismatching may be treated as an interferent. Interferents generally come in multiple types, and interferents of the same type (i.e., similar interferents) appearing at different positions have a higher probability of being mismatched.
In one or more embodiments of the present disclosure, for a given reconstruction object, the types of interferents existing in its environment may be determined as preset types of target objects according to the environment corresponding to the reconstruction object, so that the interference they cause to reconstruction accuracy can be eliminated in the subsequent steps.
Taking an indoor scene as the reconstruction object, ornaments usually exist in such scenes, for example decorative colored lights during the Spring Festival, Christmas trees at Christmas, decorative wall paintings, and advertising posters. Holiday decorations are temporary and are removed after the holiday, and decorative wall paintings, advertising posters and the like are replaced periodically. These non-fixed decorative items, placed for aesthetic or advertising purposes, do not need to participate in the three-dimensional reconstruction. Moreover, such items are often placed at multiple positions; if the same decoration appears at several positions indoors, it may cause mismatches when matching features across the acquired frames of the reconstruction object, and the decorations then become interferents.
Therefore, in one or more embodiments of the present specification, after acquiring the several pre-acquired frames of the reconstruction object, the server may perform target detection on each frame to detect whether interferents exist in it. To this end, it is first necessary to determine which targets were captured when the images were acquired, so that subsequent steps can further determine whether any of them belong to the preset types and may cause interference.
In one or more embodiments of the present disclosure, the server may perform target detection on each image, and determine whether a target object exists in the image. If yes, determining the type and the position of the target object in the image, and taking the determined type and the determined position of the target object as a detection result. And if not, taking the target object not existing in the image as a detection result.
A target object is an object identified from the image by target detection; through target detection, the server can determine which objects are at which positions in the image. For example, if a decorative hanging picture exists in a frame, the server may output the type of the target object as a decorative hanging picture and mark its specific position in the image with a bounding box.
In one or more embodiments of the present disclosure, the specific types of target object detectable by target detection may be set as desired, and the present disclosure is not limited in this respect. For example, suppose the detectable target objects include merchant logos, decorative wall paintings, and advertising posters. If a mall contains many similar wall paintings and advertising posters while no merchant logo is repeated, then decorative wall paintings and advertising posters can serve as the preset types of target object, and in the subsequent steps a mask is added wherever either of these two preset types is detected. Alternatively, considering that a mall may contain chain stores, or merchants whose stores occupy a large floor area, the same logo may appear at different positions in the mall; in that case the preset types may include merchant logos, decorative wall paintings, and advertising posters, and a mask is added wherever one or more of these three preset types is detected. The preset types of target object can therefore be adjusted to the actual situation of the environment corresponding to the reconstruction object.
It should be noted that target detection is a mature technology, and its detailed process is not described here. For example, the server may perform target detection with methods such as Fast R-CNN, R-FCN, YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), or RetinaNet, obtaining a bounding box and a type for each target object so as to determine its position and type. Of course, other target detection techniques may also be used, and the specification is not limited here.
In one or more embodiments of the present specification, the server may instead perform semantic segmentation on each frame of image: for each pixel in the frame, its category is determined, yielding several pixel groups belonging to different categories, each corresponding to a target object, thereby accomplishing target detection for each frame of the reconstruction object. When targets are detected through semantic segmentation, no image is left without a detected target, so the server may take the target object corresponding to each resulting pixel group as the detection result.
S102: and determining an image area of the target object belonging to a preset type from each frame image according to the detection result, and determining each reconstructed image by adding a mask to the image area of the target object in each frame image.
Because interferents may exist in the pre-acquired frames of the reconstruction object, after obtaining the target detection results the server can identify the interferents and add masks to them, so as to eliminate their influence on the accuracy of the three-dimensional reconstruction.
In one or more embodiments of the present disclosure, after obtaining the target detection results, the server may specifically determine, according to the detection results, each frame image in which a preset-type target object exists as a target image, and each frame image in which no preset-type target object (or no target object at all) exists as a standard image. For each frame of target image, the server can add a mask at the positions of the preset-type target objects in it, and the masked target images together with the standard images are used as the reconstructed images.
Here, the reconstructed images are the images on which feature matching is performed in the subsequent steps.
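The masking step itself amounts to overwriting the detected image areas so that they contribute no feature points downstream. A minimal NumPy sketch, assuming axis-aligned bounding boxes in (x_min, y_min, x_max, y_max) form; the function name and the choice of zero as the mask value are illustrative:

```python
import numpy as np


def add_mask(image, boxes, mask_value=0):
    """Return a copy of `image` with each detected preset-type target's
    bounding-box region overwritten, so masked pixels yield no features."""
    masked = image.copy()
    for (x_min, y_min, x_max, y_max) in boxes:
        masked[y_min:y_max, x_min:x_max] = mask_value
    return masked
```

Copying rather than mutating keeps the original frame available, e.g. for later texture mapping of the reconstructed model.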
The number of target objects of the same type affects the reconstruction of the reconstruction object to different degrees: in general, the more such objects there are, the greater their influence on the reconstruction, and the more likely mismatches are to occur during feature matching.
Therefore, in one or more embodiments of the present disclosure, the server may determine the preset types of target object in real time according to the number of each type of target object obtained by target detection. Specifically, for each detected type, the server may count the target objects of that type and judge whether the count exceeds a preset value; if so, the type is taken as one of the preset types, and if not, it is not.
In addition, considering the differences between reconstruction objects (or between their environments), different types of reconstruction object have different target objects that become interferents during three-dimensional reconstruction. For example, compare two reconstruction objects: the interior of a stadium and the interior of a mall. A stadium's walls may carry sports graffiti, which tends to repeat, and fitness equipment of a limited number of types but in large quantity may be placed at different positions; graffiti and equipment recurring at different positions are therefore likely interferents when reconstructing the stadium interior, and are suitable preset types of target object there. A mall, by contrast, usually contains various posters and stickers, and posters and stickers recurring at different positions are the likely interferents when reconstructing the mall interior, making them the more suitable preset types.
Therefore, in one or more embodiments of the present disclosure, the server may determine, according to the type of the reconstruction object itself or the corresponding environment, a preset type of the target object corresponding to the reconstruction object. That is, as described in step S100, the preset type of target object may be adjusted according to the actual situation of the environment corresponding to the reconstructed object.
S104: acquiring time sequence information of each frame of reconstructed image, and grouping each frame of reconstructed image according to the time sequence information to obtain each data group.
The aim is to reduce the time and computation consumed by feature matching while preserving reconstruction accuracy. The server can group the reconstructed frames into data groups; within each group, it feature-matches each reconstructed frame against its adjacent frames in acquisition order. Because the reconstructed images are acquired continuously, adjacent frames have sufficient common-view area, and matching each frame only against its neighbors greatly reduces the computation. Between data groups, the server matches the head and tail reconstructed frames of each group pairwise against the reconstructed frames of the other groups, performing a small amount of brute-force matching to find, in the other groups, the reconstructed images that share a common-view area. Combined with the masking of potentially interfering target objects in the previous steps, this ensures the accuracy of the three-dimensional reconstruction while reducing the computation of feature matching, improving reconstruction efficiency.
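The savings of this grouped scheme over all-pairs matching can be illustrated with a rough pair count. The sketch below counts sequential pairs inside each group plus the head and tail frames of each group matched against all frames of the other groups; since both directions of each cross-group pair are counted, the cross term is an upper bound. The function names and the `window` parameter are illustrative:

```python
def brute_force_pairs(n):
    """All-pairs (brute-force) matching: n choose 2."""
    return n * (n - 1) // 2


def grouped_pairs(group_sizes, window=1):
    """Rough pair count for the grouped scheme: each frame matched with
    the next `window` frames in its group, plus each group's head and
    tail frames matched against every frame of the other groups
    (cross-group pairs counted from both sides, hence an upper bound)."""
    n = sum(group_sizes)
    inside = sum(sum(min(window, s - 1 - i) for i in range(s))
                 for s in group_sizes)
    cross = sum(2 * (n - s) for s in group_sizes)  # head + tail vs others
    return inside + cross
```

For 1000 frames in four equal groups, brute force tries 499,500 pairs while the grouped scheme needs on the order of 7,000, two orders of magnitude fewer.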
Due to interference from environmental factors or the size and structure of the reconstruction object, when an acquisition person captures the frames of images of the reconstruction object, two temporally consecutive frames may have discontinuous image content, that is, no common-view area. Taking an indoor mall scene as the reconstruction object as an example: because crowds in the mall move around, the acquisition person may be blocked or disturbed and thus unable to acquire continuously along a path in the mall, and may need to bypass or skip the blocked position and acquire it later. Alternatively, because of the mall's complex structure, the acquisition person may walk into a dead end while acquiring along one path; acquisition must then be suspended, the person moves back to an already-acquired position (to ensure that the next acquired image shares a common-view area with the images already acquired), and continues along another path that has not yet been acquired (i.e., avoiding repeated acquisition while still guaranteeing a common-view area).
Therefore, in one or more embodiments of the present specification, after obtaining the reconstructed images, the server may group the frames of reconstructed images according to the time at which each frame was acquired, so that feature matching in the subsequent steps can be performed based on the grouping result.
In one or more embodiments of the present specification, the server may obtain the time sequence information of the acquired reconstructed images, and group the frames of reconstructed images according to it to obtain the data groups. The time sequence information records the timestamp at which each frame of reconstructed image was acquired and thus reflects the acquisition order; its specific form, structure and content may be set as needed and are not limited herein. In other words, the time sequence information is the timing with which the acquisition person captured the image of the reconstruction object corresponding to each frame of reconstructed image.
As described above, when two continuously acquired images have no common-view area because of the environment or the acquisition path, then even though the two images are consecutive in acquisition time, the time interval between them is large, larger than the interval between other normally acquired frames, because acquisition was suspended and only resumed after the interfering factor was removed.
Therefore, in one or more embodiments of the present specification, when grouping the frames of reconstructed images according to the time sequence information, the server may determine the acquisition order of the frames from the time sequence information and, following that order, determine the time interval between each pair of adjacent reconstructed images in turn. The server may then judge whether there exist two reconstructed images whose time interval is greater than a preset interval. If so, those two reconstructed images are taken as identification images, and the frames of reconstructed images are grouped according to the identification images so determined to obtain the data groups. The preset interval is greater than the frame interval of image acquisition (i.e., the reciprocal of the frame rate); its specific length may be set as needed, provided it exceeds the time interval between two adjacent frames under normal acquisition conditions, so that it can serve as a basis for splitting adjacent frames into different groups, and is not limited herein.
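The grouping rule just described can be sketched as follows; this is a minimal illustration, where the function name, the timestamp values and the threshold are examples rather than anything specified in the patent:

```python
def group_by_gap(timestamps, preset_interval):
    """Split time-ordered frame timestamps into data groups.

    A new group starts whenever the gap between two consecutive
    frames exceeds the preset interval; the two frames on either
    side of such a gap correspond to the identification images
    described above. Returns groups as lists of frame indices.
    """
    if not timestamps:
        return []
    groups = [[0]]
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > preset_interval:
            groups.append([])  # large gap found: start a new data group
        groups[-1].append(i)
    return groups

# Frames 0-2 are captured continuously; a 2.3 s pause precedes frame 3.
print(group_by_gap([0.0, 0.1, 0.2, 2.5, 2.6, 2.7], 1.0))
# -> [[0, 1, 2], [3, 4, 5]]
```

With a preset interval of 1.0 s, the pause between frames 2 and 3 splits the sequence into two data groups, and frames 2 and 3 become the tail and head identification images of their groups.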
In this way, the interference an acquisition person encounters when acquiring images for three-dimensional reconstruction is taken into account: the interruptions it causes, rather than being a mere nuisance, are exploited to group the acquired images. In the subsequent steps, sequential feature matching of adjacent reconstructed frames can then be performed within each data group, with only a small amount of brute-force matching between data groups, ensuring the accuracy of the three-dimensional reconstruction of the reconstruction object based on the matching results while reducing the amount of computation.
S106: and for each data group, sequentially performing feature matching on each frame of reconstructed image in the data group and adjacent reconstructed images thereof according to the time sequence information, and performing feature matching on head and tail frame reconstructed images in the data group and each frame of reconstructed image in other data groups.
In one or more embodiments of the present specification, after determining the data groups, the server may, for each data group, perform feature matching between each frame of reconstructed image in the group and its adjacent reconstructed images in turn according to the time sequence information, and perform feature matching between the head and tail frame reconstructed images of the group and the frames of reconstructed images in the other data groups. That is, reconstructed images are matched sequentially within a data group and brute-force matched between data groups. A number of feature point pairs are obtained from the intra-group and inter-group matching results.
In one or more embodiments of the present specification, when sequentially matching the reconstructed images within a data group, the server may specifically take each frame of reconstructed image in the group in turn as the target matching image according to the time sequence information, and determine the matching interval of that target matching image according to a preset interval length. The server may then perform feature matching between the target matching image and the other reconstructed images in the matching interval. Within the matching interval, a reconstructed image that has already been feature-matched with the target matching image is not matched with it again; that is, the server may first screen out, from the reconstructed images in the matching interval, those not yet matched with the target matching image, and perform feature matching only with the screened reconstructed images.
In one or more embodiments of the present specification, the interval length may be a length of time or a number of frames, which may be set as needed; the present specification is not limited herein.
For example, assume the interval length is a frame count, with the preset interval covering the 3 frames before and the 3 frames after the target matching image (6 frames in total), and assume one data group contains 7 reconstructed images, as shown in fig. 2. Fig. 2 is a schematic diagram of feature matching provided in the present specification. In the figure, each rectangle represents a reconstructed image, and the rectangles filled with oblique lines represent identification images. Matching interval 1 is the matching interval of target matching image 1; since target matching image 1 is the 1st frame of the data group and no image precedes it, its matching interval contains only 3 reconstructed images besides itself (the three frames after it, i.e., the 2nd, 3rd and 4th reconstructed images). Matching interval 2 is the matching interval of target matching image 2; since target matching image 2 is the 2nd frame of the data group and only 1 frame precedes it, its matching interval contains only 4 reconstructed images besides itself (the previous frame and the following three frames). However, because the images in a data group are matched in acquisition order, target matching image 1 is feature-matched before target matching image 2, and matching interval 1 contains target matching image 2, so the pair has already been matched by the time target matching image 2 is processed. When target matching image 2 is matched against the reconstructed images in matching interval 2, target matching image 1 (which lies in matching interval 2) is therefore not matched again, avoiding repeated feature matching.
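The de-duplicated intra-group matching in the example above can be sketched as follows, assuming a count-based interval of k frames on each side; emitting each pair exactly once with i < j reproduces the rule that an already-matched pair is never matched again (the function name is illustrative):

```python
def windowed_pairs(n, k):
    """Enumerate unique intra-group match pairs (i, j) with i < j,
    where each of the n images is matched against the k frames before
    and after it. Because every pair is emitted only in its i < j
    orientation, no (image A, image B) pair is ever repeated as
    (image B, image A)."""
    pairs = []
    for i in range(n):
        # j only runs forward: backward neighbours of i were already
        # paired with i when they were the earlier index.
        for j in range(i + 1, min(i + k + 1, n)):
            pairs.append((i, j))
    return pairs
```

For the 7-image, k = 3 example of fig. 2, image 0 is paired with images 1-3 only (its matching interval), image 1 adds pairs with images 2-4, and so on, for 15 unique pairs in total instead of the 21 a full brute-force pass would produce.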
In one or more embodiments of the present specification, when brute-force matching reconstructed images between data groups, the server does not need to match every pair of reconstructed images drawn from different data groups. The server may perform feature matching only between the first frame reconstructed image of each data group, i.e., its first identification image, and the frames of reconstructed images in the other data groups. Likewise, repeated matching is not performed during brute-force matching between data groups.
In one or more embodiments of the present specification, when performing feature matching between data groups, the server may, for each data group, match the first frame reconstructed image and/or the last frame reconstructed image of that group against the reconstructed images in the other data groups. That is, the server may match only the first frame of the group against each frame in the other groups, match only the last frame, or match both the first and the last frame against each frame in the other groups. This may be set as needed, and the present specification is not limited herein.
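The inter-group brute-force matching described above can be sketched as follows; the `use_head`/`use_tail` flags mirror the "first frame and/or last frame" options, and all names are illustrative rather than taken from the patent:

```python
def cross_group_pairs(groups, use_head=True, use_tail=True):
    """Enumerate inter-group brute-force match pairs: the head and/or
    tail frame of each data group against every frame of every other
    group. Pairs are stored in canonical (min, max) order so that a
    pair produced from both sides is counted only once, mirroring the
    rule that repeated matching is not performed between groups.

    groups: list of data groups, each a list of frame indices.
    """
    pairs = set()
    for gi, g in enumerate(groups):
        anchors = []
        if use_head:
            anchors.append(g[0])
        if use_tail and g[-1] != g[0]:
            anchors.append(g[-1])
        for gj, other in enumerate(groups):
            if gj == gi:
                continue
            for a in anchors:
                for b in other:
                    pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)
```

For two groups of three frames each, using both head and tail anchors yields 8 unique pairs, while using only the head frame of each group yields 5; either way the count stays far below the 9 pairs of full pairwise inter-group matching once groups grow larger.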
It should be noted that the first frame and the last frame referred to in this specification may each be a single frame or several frames; that is, "first frame" may refer to the first frame or the first several frames, and "last frame" to the last frame or the last several frames, as may be set as needed, and this specification is not limited herein.
In one or more embodiments of the present specification, when performing feature matching on the reconstructed images, the server may use an existing feature matching algorithm such as the Scale-Invariant Feature Transform (SIFT) algorithm or the ORB (Oriented FAST and Rotated BRIEF) algorithm; of course, other algorithms may also be used, as may be set as needed, and the present specification is not limited herein.
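The patent leaves the matcher open (SIFT, ORB, or others). As a library-free sketch of the nearest-neighbour matching with Lowe's ratio test that such pipelines commonly apply to extracted descriptors — the descriptor values below are toy data, not real SIFT/ORB output:

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbour matching with Lowe's ratio test:
    a candidate pair (i, best) is kept only when the best distance is
    clearly smaller than the second-best, rejecting ambiguous features
    (e.g. repeated signboards). Descriptors are equal-length tuples of
    floats; desc_b must contain at least two descriptors."""
    if len(desc_b) < 2:
        return []
    matches = []
    for i, d in enumerate(desc_a):
        # rank all candidates in desc_b by Euclidean distance to d
        ranked = sorted(range(len(desc_b)),
                        key=lambda j: math.dist(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if math.dist(d, desc_b[best]) < ratio * math.dist(d, desc_b[second]):
            matches.append((i, best))
    return matches
```

The ratio test is what makes masking the preset-type objects matter: without it, two distinct instances of the same signboard would produce near-identical descriptors and the ambiguous match would simply be discarded or, worse, kept as a mismatch.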
S108: and performing three-dimensional reconstruction on the reconstruction object according to the obtained matching results.
In one or more embodiments of the present specification, after performing intra-group and inter-group feature matching on all data groups, the server may perform three-dimensional reconstruction on the reconstruction object according to each obtained matching result.
In one or more embodiments of the present specification, the server may perform three-dimensional reconstruction on the reconstruction object through the Structure from Motion (SfM) technique, but other three-dimensional reconstruction techniques may also be used; the present specification is not limited herein.
In one or more embodiments of the present specification, when performing three-dimensional reconstruction, the server may randomly select, according to the matching results, two successfully matched reconstructed images from all reconstructed images to serve as two key frames, and determine the poses of the two key frames based on epipolar geometry. Then, according to the key-frame poses, the server may determine by triangulation the coordinates of the three-dimensional point corresponding to each group of feature point pairs shared by the two key frames. Next, from the three-dimensional points so obtained, the server may determine the next frame of reconstructed image having feature points corresponding to three-dimensional points of the two key frames, and solve the pose of that next frame (i.e., the third frame) by PnP (Perspective-n-Point) using the coordinates of those three-dimensional points. The server may then continue to triangulate the coordinates of more three-dimensional points from the pose of that frame together with the key-frame poses, solve the pose of the following reconstructed image (i.e., the fourth frame), and so cycle between triangulation and solving the next pose by PnP until no reconstructed image remains outside the loop.
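The incremental loop described above can be sketched at a high level; the geometric steps (epipolar pose, triangulation, PnP) are abstracted into comments, and only the order in which images join the reconstruction is simulated. The function name and the match-count data are illustrative assumptions:

```python
def incremental_registration_order(num_images, match_counts):
    """Simulate the incremental SfM loop: start from the image pair
    with the most feature matches (the two key frames), then repeatedly
    register the unregistered image sharing the most matches with the
    already-registered set, until no linked image remains.

    match_counts: dict {(i, j): n} with i < j, n = matched feature
    pairs between images i and j. Returns the registration order.
    """
    # initial pair: the two key frames, posed via epipolar geometry
    first = max(match_counts, key=match_counts.get)
    registered = list(first)
    remaining = set(range(num_images)) - set(first)
    while remaining:
        def score(img):
            return sum(match_counts.get((min(img, r), max(img, r)), 0)
                       for r in registered)
        nxt = max(remaining, key=score)
        if score(nxt) == 0:
            break  # no common-view link to the reconstructed model
        registered.append(nxt)  # in practice: solve pose by PnP,
        remaining.remove(nxt)   # then triangulate new 3-D points
    return registered
```

With match counts {(0,1): 100, (1,2): 50, (2,3): 30, (0,2): 10}, images 0 and 1 form the key-frame pair, image 2 joins next, then image 3; an image 4 with no matches never enters the loop, illustrating why every frame must share a common-view area with some matched frame.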
Based on the three-dimensional reconstruction method shown in fig. 1: target detection is performed on each acquired frame of images of the reconstruction object, and the positions of target objects belonging to preset types are masked to determine the reconstructed images; the reconstructed images are grouped according to the time sequence information of their acquisition to obtain a number of data groups; sequential feature matching of reconstructed images is performed within each data group; and, for each data group, its head frame reconstructed image is matched pairwise against the reconstructed images of the other data groups to find the common-view areas of different data groups, after which three-dimensional reconstruction is performed.
In this method, masks are added to the target objects that would interfere with feature matching, removing their impact on matching accuracy, and the images are grouped by time sequence, reducing the time and computation of global brute-force feature matching; both the accuracy and the efficiency of three-dimensional reconstruction are thereby improved.
In addition, in one or more embodiments provided in this specification, after obtaining the frames of images of the reconstruction object acquired in advance, the server may further filter out potentially interfering content by other means. Taking an indoor mall scene as the reconstruction object as an example: a mall contains many merchants, each with its own signboard. When the signboards of different merchants are similar, or when one merchant has a large storefront with several entrances each decorated with the same signboard, identical signboards at different positions in the mall can cause mismatches during feature matching. Therefore, the server may acquire the pre-collected frames of images of the reconstruction object and recognize the signboards, for example through Optical Character Recognition (OCR). Specifically, the server may perform OCR on each frame of image to determine whether a signboard region exists; if so, a mask is added to the recognized signboard region, and the masked frame is used as the reconstructed image.
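The OCR step itself requires a real recognition engine, but the mask applied to whatever signboard regions it returns can be sketched as follows; the image is represented as a 2-D list of pixel values, and the function name and box format are illustrative assumptions:

```python
def add_mask(image, boxes, fill=0):
    """Mask out detected regions before feature matching.

    image: 2-D list of pixel values; boxes: (top, left, bottom, right)
    bounding boxes (bottom/right exclusive), e.g. signboard regions
    returned by an OCR or object detector. Masked pixels are constant,
    so feature extractors find no keypoints there and repeated
    signboards cannot cause mismatches. The input image is not
    modified; a masked copy is returned.
    """
    out = [row[:] for row in image]
    for top, left, bottom, right in boxes:
        for r in range(top, min(bottom, len(out))):
            for c in range(left, min(right, len(out[r]))):
                out[r][c] = fill
    return out
```

Setting the region to a constant fill value is the simplest choice; the important property is only that the masked area yields no distinctive features during SIFT/ORB extraction.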
In one or more embodiments provided in this specification, before determining the target objects belonging to the preset types from the frames of images according to the detection results, the server may further obtain a reference image of a target object belonging to a preset type together with the pre-acquired frames of images of the reconstruction object. Then, for each acquired frame of image, the server may perform similarity matching between that frame and the reference image, determine the target objects belonging to the preset type in each frame according to the matching results, and perform the subsequent steps.
Wherein the reference image may be an image acquired from a plurality of perspectives. For example, the image may include a front view, a side view, a bottom view, a top view, or the like, or may include an image of a partially occluded object. The reference image may be an image of a preset type of target object acquired in advance, or may be determined from an image set, and may be set as needed, which is not limited herein.
In one or more embodiments provided in this specification, when performing similarity matching between a frame of image and the reference image, the server may match down to the specific object or perform only semantic matching. For example, taking the reference image as an image of a decorative wall painting: when matching to the specific target object, the match succeeds only if similarity matching determines that the frame and the reference image show the same wall painting; when only semantic matching is performed, the match succeeds as long as similarity matching determines that they show objects of the same type (decorative wall paintings). This may be set as needed, and this specification is not limited herein.
Since similarity matching is a mature technology, the specific process of similarity matching is not described in detail herein.
In one or more embodiments of the present specification, in step S104, when the server groups the frames of reconstructed images according to the identification images to obtain the data groups, it may specifically determine, for each frame of identification image, the adjacent identification image of that frame according to the acquisition order, and then judge whether other reconstructed images exist between the frame of identification image and its adjacent identification image; if so, the frame of identification image, the adjacent identification image and the other reconstructed images between them are determined as one data group.
Here, "adjacent" means closest in time sequence. Two adjacent reconstructed images are the two reconstructed images immediately before and after each other when the frames are arranged in time order. The adjacent identification image of a frame of identification image is the other identification image closest to it when the frames are arranged in time order.
Fig. 3 is a schematic diagram of grouping provided in the present specification. As shown in the figure, each rectangle represents a frame of image; the rectangles filled with oblique lines represent identification images, and the rectangles filled with white represent the other reconstructed images. Two data groups are shown: identification image 1, identification image 2 and the reconstructed images between them form one data group, while identification image 3, identification image 4 and the reconstructed images between them form the other. Since no other reconstructed image lies between identification image 2 and identification image 3, these two identification images are not divided into the same data group when the reconstructed images are grouped.
Fig. 4 is a schematic diagram of an acquisition path provided in the present specification, showing part of an indoor mall scene. The two rectangles represent two merchant areas in the mall; the dotted line represents the acquisition path of the acquisition person, and the arrows indicate the acquisition direction. The circle filled with oblique lines marks the start of the acquisition path, and the gray-filled circle marks the position where the acquisition person restarts acquisition after a pause. It can be seen that the acquisition person acquires downward from the start, pauses acquisition at the end of the horizontal arrow, then moves to the gray-filled circle and continues acquiring downward to the end of the vertical arrow.
Fig. 5 is a schematic diagram of an acquisition path provided in the present specification. As shown, the two triangles represent two merchant areas in the mall; the dotted line represents the acquisition path of the acquisition person, the arrows indicate the acquisition direction, and the circle filled with oblique lines marks the start of the acquisition path. It can be seen that the acquisition person starts from the start position toward the lower right, finishes acquiring the image of the right triangular area at the end of the arrow pointing left, continues without pause along the acquisition path to acquire the image of the left triangular area, and finishes acquiring at the end of the arrow pointing right.
In one or more embodiments provided in this specification, before step S106, the server may, after acquiring the pre-collected frames of images of the reconstruction object, first group the frames according to the time sequence information of their acquisition to obtain the data groups. The server may then perform target detection on the images in each data group, determine from them the image areas of target objects belonging to the preset types, and determine the reconstructed images by adding masks to the image areas of the target objects in each frame of image.
In one or more embodiments of the present specification, when the frames are grouped first and target detection is performed afterwards, the server may, during target detection, decline to add a mask to an image area if target objects of a given preset type appear repeatedly only in different images within the same data group; if the same preset type of target object appears across different data groups, a mask is added to its image area. Alternatively, the server may determine the preset types dynamically based on where target objects appear across the data groups. For example, when target objects of one type appear only in different images within a single data group, the server may decline to treat that type as a preset type; when the same type of target object appears across different data groups, that type is treated as a preset type.
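This cross-group rule for dynamically deciding the preset types can be sketched as follows; the appearance map and all names are illustrative assumptions:

```python
def infer_preset_types(appearances):
    """Treat an object type as a preset (interfering) type only when
    it is detected in more than one data group. A type confined to a
    single group is assumed to be one physical object seen from nearby
    frames, where matches on it are genuine, so it is left unmasked.

    appearances: dict {object_type: set of data-group indices in
    which that type was detected}.
    """
    return {t for t, groups in appearances.items() if len(groups) > 1}
```

A signboard detected in groups 0 and 2 would be masked as a likely repeated storefront sign, while a mural seen only within group 1 would be kept, since its features genuinely help link the adjacent frames of that group.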
In one or more embodiments of the present specification, for each type of target object, the number of images in which that type would appear during normal continuous acquisition may be determined in advance and used as a quantity threshold for that type. Then, after the server acquires the pre-collected frames of images of the reconstruction object, it may count the number of acquired images containing each type of target object. If that number is greater than the quantity threshold, the type may be treated as a preset type of target object, i.e., an interfering object that may disturb the reconstruction of the reconstruction object.
In addition, in one or more embodiments of the present specification, the frames of images of the reconstruction object may also be acquired by a robot, an unmanned device, or the like. Taking acquisition by an unmanned vehicle as an example: the unmanned vehicle may acquire images along a preset acquisition path, and a sensor mounted on it may detect obstacles encountered during acquisition. When the unmanned vehicle encounters moving obstacles (such as crowds in a mall) during acquisition, it may decide, according to the number and/or volume of the obstacles ahead, whether to adopt an obstacle-avoidance acquisition strategy. The obstacle-avoidance acquisition strategy may include one of a first strategy, a second strategy and a third strategy. First strategy: wait in place and suspend acquisition, and continue acquiring along the acquisition path after the moving obstacle has left the path ahead. Second strategy: suspend acquisition, move around the obstacle, and then continue acquiring. Third strategy: continue along the acquisition path but suspend acquisition in the areas with moving obstacles.
Specifically, the unmanned vehicle may judge whether the number of moving obstacles is greater than a preset obstacle count and, if so, adopt an obstacle-avoidance acquisition strategy. Alternatively, the unmanned vehicle may judge whether the volume of a moving obstacle is greater than a preset obstacle volume and, if so, adopt an obstacle-avoidance acquisition strategy. Of course, the number and the volume of the obstacles may also be combined to decide whether to avoid obstacles and which specific strategy to adopt; the present specification is not limited herein.
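A toy decision rule combining obstacle count and volume might look like the following; the thresholds and the mapping onto the three strategies are assumptions for illustration only, since the text deliberately leaves the exact combination open:

```python
def choose_strategy(num_obstacles, max_volume,
                    count_threshold=5, volume_threshold=2.0):
    """Illustrative obstacle-avoidance policy selector (all thresholds
    and the count/volume-to-strategy mapping are hypothetical).

    Returns None when no avoidance is needed, otherwise one of:
    'wait'              - first strategy: wait in place, resume later
    'pause_then_bypass' - second strategy: pause, go around, resume
    'skip_area'         - third strategy: keep moving, skip the area
    """
    if num_obstacles <= count_threshold and max_volume <= volume_threshold:
        return None  # few and small obstacles: keep acquiring normally
    if num_obstacles <= count_threshold:
        return "wait"               # few but large: wait them out
    if max_volume <= volume_threshold:
        return "pause_then_bypass"  # many but small: detour around
    return "skip_area"              # many and large: skip the region
```

Whatever mapping is chosen, the consequence for the pipeline is the same: any suspension produces a time gap in the time sequence information, which is exactly what the grouping step of S104 later exploits.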
In addition, in step S104 of this specification, when grouping the reconstructed images, the server may also arrange the frames in acquisition order according to the time sequence information and check in turn whether two adjacent reconstructed images have a time interval greater than the preset interval. If so, the earlier of the two reconstructed images, together with all reconstructed images before it in time order, is determined as one data group. The server then continues to check the remaining ungrouped reconstructed images for two images whose interval exceeds the preset interval and, if found, again takes the earlier ungrouped image together with all ungrouped images before it in time order as a data group, until all reconstructed images have been grouped. The server may then take, for each data group, its first frame and last frame reconstructed images as the identification images.
In one or more embodiments of the present specification, whether the images of the reconstruction object are acquired by an acquisition person or by a robot, an unmanned device or other equipment, when acquisition must be suspended because of an obstacle or a dead end and is continued after bypassing the obstacle or moving to a new location, acquisition resumes from an already-acquired position along a path not yet acquired, so as to ensure that the subsequently acquired images share a common-view area with the images acquired before the suspension. In addition, for the path sections on which acquisition was suspended in order to avoid obstacles, those sections may be acquired separately after the acquisition of the reconstruction object is finished, in order to complete the set of acquired images.
In addition, to facilitate the subsequent three-dimensional reconstruction of the reconstruction object as much as possible and reduce interference during acquisition, after a suspension, acquisition may be resumed from a position where no target object of a preset type is present; that is, acquisition continues along a not-yet-acquired path starting from an already-acquired position in which no preset type of target object appears.
In addition, in the present specification, the three-dimensional reconstruction of the frames of reconstructed images is performed without requiring position information to be acquired for the images.
In one or more embodiments of the present specification, when performing target detection, the server may also detect only target objects of the preset types; that is, the preset types of target objects in an image are obtained directly through target detection, and masks are added to their image areas, without screening the preset types out of the multiple types of target objects obtained by target detection.
Based on the same idea, the three-dimensional reconstruction method provided above for one or more embodiments of the present specification further provides a corresponding three-dimensional reconstruction apparatus, as shown in fig. 6.
Fig. 6 is a schematic diagram of a three-dimensional reconstruction apparatus provided in the present specification, the apparatus including:
the detection module 200 is configured to acquire a plurality of frames of images of a pre-acquired reconstruction object, and perform target detection on each frame of image;
a determining module 201, configured to determine, according to the detection result, an image region of the target object belonging to a preset type from each frame image, and determine each reconstructed image by adding a mask to the image region of the target object in each frame image;
the grouping module 202 is configured to acquire timing information of each frame of acquired reconstructed image, and group each frame of reconstructed image according to the timing information to obtain each data group;
the matching module 203 is configured to perform feature matching on each frame of reconstructed image in each data group and adjacent reconstructed images thereof in sequence according to the timing information, and perform feature matching on the head and tail frame of reconstructed images in the data group and each frame of reconstructed images in other data groups;
and the reconstruction module 204 is configured to perform three-dimensional reconstruction on the reconstructed object according to the obtained matching results.
Optionally, the detection module 200 is configured to perform target detection on each frame of image, determine whether a target object exists in the image, determine the type and position of the target object in the image if one exists and take the determined type and position as the detection result, and otherwise take the absence of a target object in the image as the detection result.
Optionally, the determining module 201 is configured to determine, according to the detection result, that each frame of image in which the target object belonging to the preset type exists is a target image, determine that each frame of image in which the target object belonging to the preset type does not exist and each frame of image in which any target object does not exist are standard images, add a mask to a position of the target object belonging to the preset type in each frame of target image, and use each target image and each standard image after the mask is added as a reconstructed image.
Optionally, the grouping module 202 is configured to determine an acquisition sequence of each frame of reconstructed image according to the timing information, sequentially determine the time interval between each two adjacent reconstructed images according to the acquisition sequence, and determine whether two reconstructed images whose time interval is larger than a preset interval exist; if so, the two reconstructed images whose time interval is larger than the preset interval are taken as identification images, and the reconstructed images are grouped according to the obtained identification images to obtain the data groups, where the preset interval is greater than the frame interval (the reciprocal of the frame rate) at which the images are acquired.
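The gap-based grouping this paragraph describes can be sketched as below. The timestamp values and the choice of `preset_interval` are illustrative assumptions; the patent only requires that the preset interval exceed the normal spacing between consecutively captured frames.

```python
def group_by_time_gap(timestamps, preset_interval):
    """Split frame indices into data groups wherever the time interval between
    two adjacent reconstructed images exceeds `preset_interval` (seconds)."""
    if not timestamps:
        return []
    groups, current = [], [0]
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > preset_interval:
            groups.append(current)  # gap found: the two frames bounding it
            current = []            # act as the tail/head identification images
        current.append(i)
    groups.append(current)
    return groups
```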
Optionally, the grouping module 202 is configured to determine, for each frame of the identification image, an adjacent identification image of the frame of the identification image according to the acquisition sequence, determine whether there are other reconstructed images between the frame of the identification image and the adjacent identification image, and if so, determine that the frame of the identification image, the adjacent identification image, and the other reconstructed images between the frame of the identification image and the adjacent identification image are a data group.
Optionally, the matching module 203 is configured to sequentially determine each frame of reconstructed image in the data group as a target matching image according to the timing information, determine a matching interval of the target matching image according to a preset interval length, and perform feature matching on the target matching image and the other reconstructed images in the matching interval.
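A sketch of the intra-group matching interval: each target matching image is matched only against the images that follow it in acquisition order, within a window. The `window` parameter stands in for the patent's unspecified "preset interval length".

```python
def intra_group_pairs(group, window):
    """For each target matching image in a data group, pair it with the other
    reconstructed images inside its matching interval of `window` frames."""
    pairs = []
    for i in range(len(group)):
        # Match forward only, so each pair is produced exactly once.
        for j in range(i + 1, min(i + 1 + window, len(group))):
            pairs.append((group[i], group[j]))
    return pairs
```

Restricting matching to a window keeps the pair count linear in the group size rather than quadratic, which is the usual motivation for sequential matching in structure-from-motion pipelines.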
Optionally, the determining module 201 is configured to acquire a reference image of a target object belonging to a preset type and a plurality of frames of images of a reconstruction object acquired in advance, perform similarity matching between each acquired frame of image and the reference image, and determine an image area of the target object belonging to the preset type from each frame of image according to a matching result.
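The reference-image alternative in this paragraph amounts to template matching. Below is a minimal sum-of-squared-differences sketch; a real system would more likely use OpenCV's `cv2.matchTemplate` or feature-based similarity, and the plain-list pixel format is an assumption for illustration only.

```python
def locate_reference(frame, ref):
    """Slide `ref` over `frame` and return the (x, y, w, h) window with the
    smallest sum of squared differences — the most likely region of the
    preset-type target object."""
    fh, fw = len(frame), len(frame[0])
    rh, rw = len(ref), len(ref[0])
    best, best_box = None, None
    for y in range(fh - rh + 1):
        for x in range(fw - rw + 1):
            ssd = sum((frame[y + r][x + c] - ref[r][c]) ** 2
                      for r in range(rh) for c in range(rw))
            if best is None or ssd < best:
                best, best_box = ssd, (x, y, rw, rh)
    return best_box
```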
The present specification also provides a computer-readable storage medium storing a computer program, which can be executed by a processor to perform the three-dimensional reconstruction method provided in fig. 1 above.
This specification also provides a schematic structural diagram of the electronic device shown in fig. 7. As shown in fig. 7, at the hardware level, the electronic device includes a processor, an internal bus, a memory, and a non-volatile memory, and may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the three-dimensional reconstruction method provided in fig. 1.
Of course, besides a software implementation, this specification does not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the above processing flow is not limited to logic units and may also be hardware or a logic device.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, as technology develops, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by a user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, today, instead of manually fabricating integrated circuit chips, this kind of programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the original code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained by merely programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functions can be implemented entirely by logically programming the method steps, so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included within it for implementing various functions may also be regarded as structures within the hardware component. Or, the means for implementing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described as being divided into various units by function. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include a volatile memory in a computer-readable medium, a random access memory (RAM), and/or a non-volatile memory, such as a read-only memory (ROM) or a flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (10)

1. A method of three-dimensional reconstruction, comprising:
acquiring a plurality of frames of images of a pre-acquired reconstruction object, and carrying out target detection on each frame of image;
determining image areas of the target object which belongs to a preset type from each frame of image according to the detection result, and determining each reconstructed image by adding a mask to the image areas of the target object in each frame of image;
acquiring time sequence information of each frame of reconstructed image, and grouping each frame of reconstructed image according to the time sequence information to obtain each data group;
for each data group, sequentially performing feature matching on each frame of reconstructed image in the data group and adjacent reconstructed images thereof according to the time sequence information, and performing feature matching on head and tail frame reconstructed images in the data group and each frame of reconstructed image in other data groups;
and performing three-dimensional reconstruction on the reconstruction object according to the obtained matching results.
2. The method according to claim 1, wherein performing target detection on each frame of image specifically comprises:
aiming at each frame of image, carrying out target detection on the image, and judging whether a target object exists in the image or not;
if so, determining the type and the position of the target object in the image, and taking the determined type and the determined position of the target object as a detection result;
and if not, taking the absence of the target object in the image as the detection result.
3. The method according to claim 2, wherein determining an image area of the object belonging to the preset type from each frame image according to the detection result, and determining each reconstructed image by adding a mask to the image area of the object in each frame image, specifically comprises:
determining that each frame image with the target object belonging to the preset type is a target image according to the detection result, and determining that each frame image without the target object belonging to the preset type and each frame image without any target object are standard images;
adding a mask to the position of a target object belonging to a preset type in each frame of target image;
and taking each target image and each standard image after the mask is added as a reconstructed image.
4. The method according to claim 1, wherein grouping each frame of reconstructed image according to the time sequence information to obtain each data group specifically comprises:
determining the acquisition sequence of each frame of reconstructed image according to the time sequence information;
according to the acquisition sequence, sequentially determining the time interval between two adjacent reconstructed images;
judging whether two reconstructed images with time intervals larger than a preset interval exist or not;
if so, taking two reconstructed images with the time interval larger than the preset interval as identification images, and grouping the reconstructed images of the frames according to the obtained identification images to obtain data groups;
wherein the preset interval is greater than the frame interval (the reciprocal of the frame rate) at which the images are acquired.
5. The method of claim 4, wherein grouping the reconstructed images of the frames according to the obtained identification images to obtain data groups comprises:
aiming at each frame of identification image, determining an adjacent identification image of the frame of identification image according to the acquisition sequence;
judging whether other reconstructed images exist between the frame identification image and the adjacent identification image;
and if so, determining that the frame identification image, the adjacent identification image and the other reconstructed images between the frame identification image and the adjacent identification image are a data group.
6. The method according to claim 1, wherein sequentially performing feature matching on each frame of reconstructed image in the data group and its adjacent reconstructed images according to the time sequence information specifically comprises:
according to the time sequence information, sequentially determining each frame of reconstructed image in the data group as a target matching image;
determining a matching interval of the target matching image according to a preset interval length;
and performing feature matching on the target matching image and other reconstructed images in the matching interval.
7. The method according to claim 1, wherein before determining an image area of the object belonging to the preset type from each frame image according to the detection result, the method further comprises:
acquiring a reference image of a target object belonging to a preset type and a plurality of pre-acquired frame images of a reconstruction object;
for each acquired frame image, performing similarity matching on the frame image and the reference image;
and determining the image area of the target object belonging to the preset type from each frame image according to the matching result.
8. A three-dimensional reconstruction apparatus, comprising:
the detection module is used for acquiring a plurality of frames of images of a pre-collected reconstruction object and carrying out target detection on each frame of image;
the determining module is used for determining an image area of the target object belonging to a preset type from each frame of image according to the detection result, and determining each reconstructed image by adding a mask to the image area of the target object in each frame of image;
the grouping module is used for acquiring time sequence information of each frame of reconstructed image, and grouping each frame of reconstructed image according to the time sequence information to obtain each data group;
the matching module is used for sequentially carrying out feature matching on each frame of reconstructed image in each data group and adjacent reconstructed images thereof according to the time sequence information and carrying out feature matching on the head and tail frame of reconstructed images in the data group and each frame of reconstructed image in other data groups;
and the reconstruction module is used for performing three-dimensional reconstruction on the reconstruction object according to the obtained matching results.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the program.
CN202110749332.XA 2021-07-02 2021-07-02 Three-dimensional reconstruction method and device Active CN113643420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110749332.XA CN113643420B (en) 2021-07-02 2021-07-02 Three-dimensional reconstruction method and device

Publications (2)

Publication Number Publication Date
CN113643420A true CN113643420A (en) 2021-11-12
CN113643420B CN113643420B (en) 2023-02-03

Family

ID=78416498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110749332.XA Active CN113643420B (en) 2021-07-02 2021-07-02 Three-dimensional reconstruction method and device

Country Status (1)

Country Link
CN (1) CN113643420B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820751A (en) * 2022-03-10 2022-07-29 中国海洋大学 Method and device for three-dimensional reconstruction of scene and three-dimensional reconstruction system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109272562A (en) * 2018-08-21 2019-01-25 上海联影医疗科技有限公司 A kind of system and method for iterative approximation
CN111401359A (en) * 2020-02-25 2020-07-10 北京三快在线科技有限公司 Target identification method and device, electronic equipment and storage medium
WO2020259264A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Subject tracking method, electronic apparatus, and computer-readable storage medium
CN112767541A (en) * 2021-01-15 2021-05-07 浙江商汤科技开发有限公司 Three-dimensional reconstruction method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BI Fukun et al.: "Detection and Tracking Algorithm for Specific Building Areas in Airborne Complex Remote-Sensing Scenes", Acta Electronica Sinica (《电子学报》) *

Also Published As

Publication number Publication date
CN113643420B (en) 2023-02-03

Similar Documents

Publication Publication Date Title
JP6821762B2 (en) Systems and methods for detecting POI changes using convolutional neural networks
CN109815843B (en) Image processing method and related product
CN101689394B (en) Method and system for video indexing and video synopsis
Gao et al. Robust RGB-D simultaneous localization and mapping using planar point features
WO2020131467A1 (en) Detecting objects in crowds using geometric context
US9104919B2 (en) Multi-cue object association
US8160371B2 (en) System for finding archived objects in video data
CN103093198B (en) A kind of crowd density monitoring method and device
CN110119148A (en) A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium
CN106663196A (en) Computerized prominent person recognition in videos
CN108596098B (en) Human body part analysis method, system, device and storage medium
CN111581423B (en) Target retrieval method and device
CN113490050B (en) Video processing method and device, computer readable storage medium and computer equipment
CN110321885A (en) A kind of acquisition methods and device of point of interest
CN110544268B (en) Multi-target tracking method based on structured light and SiamMask network
US9076062B2 (en) Feature searching along a path of increasing similarity
Zhao et al. Real-time visual-inertial localization using semantic segmentation towards dynamic environments
US20200327733A1 (en) Augmented reality image retrieval systems and methods
CN113643420B (en) Three-dimensional reconstruction method and device
CN109558790A (en) A kind of pedestrian target detection method, apparatus and system
WO2018149376A1 (en) Video abstract generation method and device
Zhou et al. Deep learning-based instance segmentation for indoor fire load recognition
CN113473181B (en) Video processing method and device, computer readable storage medium and computer equipment
CN106558069A (en) A kind of method for tracking target and system based under video monitoring
Pereira et al. Mirar: Mobile image recognition based augmented reality framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant