CN110276831B - Method and device for constructing three-dimensional model, equipment and computer-readable storage medium

Method and device for constructing three-dimensional model, equipment and computer-readable storage medium

Info

Publication number
CN110276831B
Authority
CN
China
Prior art keywords
main body
visible light
image
target
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910573384.9A
Other languages
Chinese (zh)
Other versions
CN110276831A (en
Inventor
康健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910573384.9A
Publication of CN110276831A
Application granted
Publication of CN110276831B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00: Image analysis
    • G06T7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/50: Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a method and an apparatus for constructing a three-dimensional model, a terminal device and a computer-readable storage medium. The method comprises: acquiring a visible light map and generating a central weight map corresponding to the visible light map, wherein the weight values represented by the central weight map decrease gradually from the center to the edge; inputting the visible light map and the central weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene; determining a target subject in the visible light map according to the subject region confidence map; acquiring depth information corresponding to the target subject; and performing three-dimensional reconstruction on the target subject according to the target subject and the corresponding depth information, and returning to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained, thereby improving the accuracy of three-dimensional model construction.

Description

Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for building a three-dimensional model, a terminal device, and a computer-readable storage medium.
Background
With the development of imaging technology, people are increasingly accustomed to shooting images or videos with image acquisition devices such as the camera on an electronic device in order to record all kinds of information, and three-dimensional image processing has attracted growing attention because of its stronger sense of realism.
When a three-dimensional model is constructed in a conventional manner, the construction is often affected by surrounding people or objects, which results in low accuracy of the constructed three-dimensional model.
Disclosure of Invention
The embodiments of the application provide a method and an apparatus for constructing a three-dimensional model, a terminal device and a computer-readable storage medium. An accurate subject region confidence map is obtained from a central weight map and a subject detection model, so that the target subject in an image is accurately identified; when the three-dimensional model is constructed, the three-dimensional model corresponding to the target subject is built accurately from the depth information of the target subject, which improves the accuracy of three-dimensional model construction.
A method of constructing a three-dimensional model, the method comprising:
acquiring a visible light map, and generating a central weight map corresponding to the visible light map, wherein the weight values represented by the central weight map decrease gradually from the center to the edge;
inputting the visible light map and the central weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene;
determining a target subject in the visible light map according to the subject region confidence map;
acquiring depth information corresponding to the target subject; and
performing three-dimensional reconstruction on the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
An apparatus for building a three-dimensional model, the apparatus comprising:
a processing module, configured to acquire a visible light map and generate a central weight map corresponding to the visible light map, wherein the weight values represented by the central weight map decrease gradually from the center to the edge;
a detection module, configured to input the visible light map and the central weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene;
a target subject determination module, configured to determine a target subject in the visible light map according to the subject region confidence map; and
a three-dimensional model building module, configured to acquire depth information corresponding to the target subject, perform three-dimensional reconstruction on the target subject according to the target subject and the corresponding depth information, and return to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
A terminal device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring a visible light map, and generating a central weight map corresponding to the visible light map, wherein the weight values represented by the central weight map decrease gradually from the center to the edge;
inputting the visible light map and the central weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene;
determining a target subject in the visible light map according to the subject region confidence map;
acquiring depth information corresponding to the target subject; and
performing three-dimensional reconstruction on the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a visible light map, and generating a central weight map corresponding to the visible light map, wherein the weight values represented by the central weight map decrease gradually from the center to the edge;
inputting the visible light map and the central weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene;
determining a target subject in the visible light map according to the subject region confidence map;
acquiring depth information corresponding to the target subject; and
performing three-dimensional reconstruction on the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
According to the above method and apparatus for constructing a three-dimensional model, terminal device and computer-readable storage medium, a visible light map is acquired and a corresponding central weight map is generated, wherein the weight values represented by the central weight map decrease gradually from the center to the edge; the visible light map and the central weight map are input into a subject detection model to obtain a subject region confidence map, the subject detection model being trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene; a target subject in the visible light map is determined according to the subject region confidence map; depth information corresponding to the target subject is acquired; and the target subject is three-dimensionally reconstructed according to the target subject and its depth information, with the step of acquiring the visible light map repeated to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained. The central weight map makes an object at the center of the image easier to detect, and the subject detection model trained with visible light maps, central weight maps, subject mask maps and the like can accurately identify the target subject in the visible light map, so the three-dimensional model corresponding to the target subject can be built accurately.
Drawings
In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a block diagram showing an internal configuration of a terminal device in one embodiment;
FIG. 2 is a flow diagram of a method for building a three-dimensional model according to one embodiment;
FIG. 3 is a schematic representation of a three-dimensional model of a target subject according to one embodiment;
FIG. 4 is a flow diagram for determining a target subject in the visible light map based on the subject region confidence map in one embodiment;
FIG. 5 is a diagram illustrating a network architecture of a subject detection model in one embodiment;
FIG. 6 is a diagram illustrating the detection effect of a subject according to an embodiment;
FIG. 7 is a block diagram of an apparatus for constructing a three-dimensional model according to an embodiment;
fig. 8 is an internal configuration diagram of a terminal device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for constructing the three-dimensional model in the embodiment of the application can be applied to terminal equipment. The terminal equipment can be computer equipment with a camera, a personal digital assistant, a tablet computer, a smart phone, wearable equipment and the like. When a camera in the terminal equipment shoots an image, automatic focusing can be carried out so as to ensure that the shot image is clear.
In one embodiment, the terminal device may include therein an Image Processing circuit, which may be implemented by hardware and/or software components, and may include various Processing units defining an ISP (Image Signal Processing) pipeline. FIG. 1 is a schematic diagram of an image processing circuit in one embodiment. As shown in fig. 1, for convenience of explanation, only aspects of the image processing technology related to the embodiments of the present application are shown.
As shown in fig. 1, the image processing circuit includes a first ISP processor 130, a second ISP processor 140 and control logic 150. The first camera 110 includes one or more first lenses 112 and a first image sensor 114. The first image sensor 114 may include a color filter array (e.g., a Bayer filter), and the first image sensor 114 may acquire light intensity and wavelength information captured with each imaging pixel of the first image sensor 114 and provide a set of image data that may be processed by the first ISP processor 130. The second camera 120 includes one or more second lenses 122 and a second image sensor 124. The second image sensor 124 may include a color filter array (e.g., a Bayer filter), and the second image sensor 124 may acquire light intensity and wavelength information captured with each imaging pixel of the second image sensor 124 and provide a set of image data that may be processed by the second ISP processor 140.
The first image collected by the first camera 110 is transmitted to the first ISP processor 130 for processing, after the first ISP processor 130 processes the first image, the statistical data (such as the brightness of the image, the contrast value of the image, the color of the image, etc.) of the first image may be sent to the control logic 150, and the control logic 150 may determine the control parameter of the first camera 110 according to the statistical data, so that the first camera 110 may perform operations such as auto focus and auto exposure according to the control parameter. The first image may be stored in the image memory 160 after being processed by the first ISP processor 130, and the first ISP processor 130 may also read the image stored in the image memory 160 for processing. In addition, the first image may be directly transmitted to the display 170 for display after being processed by the ISP processor 130, or the display 170 may read and display the image in the image memory 160.
Wherein the first ISP processor 130 processes the image data pixel by pixel in a plurality of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the first ISP processor 130 may perform one or more image processing operations on the image data, collecting statistical information about the image data. Wherein the image processing operations may be performed with the same or different bit depth precision.
The image Memory 160 may be a part of a Memory device, a storage device, or a separate dedicated Memory within a terminal device, and may include a DMA (Direct Memory Access) feature.
Upon receiving image data from the interface of the first image sensor 114, the first ISP processor 130 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 160 for additional processing before being displayed. The first ISP processor 130 receives the processed data from the image memory 160 and performs image data processing in the RGB and YCbCr color spaces on the processed data. The image data processed by the first ISP processor 130 may be output to the display 170 for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of the first ISP processor 130 may also be sent to the image memory 160, and the display 170 may read image data from the image memory 160. In one embodiment, the image memory 160 may be configured to implement one or more frame buffers.
The statistics determined by the first ISP processor 130 may be sent to the control logic 150. For example, the statistical data may include first image sensor 114 statistics such as auto-exposure, auto-white balance, auto-focus, flicker detection, black level compensation, first lens 112 shading correction, and the like. The control logic 150 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware) that may determine control parameters of the first camera 110 and control parameters of the first ISP processor 130 based on the received statistical data. For example, the control parameters of the first camera 110 may include gain, integration time of exposure control, anti-shake parameters, flash control parameters, first lens 112 control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters, and the like. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (e.g., during RGB processing), as well as first lens 112 shading correction parameters.
Similarly, the second image collected by the second camera 120 is transmitted to the second ISP processor 140 for processing. After the second ISP processor 140 processes the second image, the statistical data of the second image (such as the brightness, contrast value and color of the image) may be sent to the control logic 150, and the control logic 150 may determine the control parameters of the second camera 120 according to the statistical data, so that the second camera 120 may perform operations such as auto focus and auto exposure according to the control parameters. The second image may be stored in the image memory 160 after being processed by the second ISP processor 140, and the second ISP processor 140 may also read the image stored in the image memory 160 for processing. In addition, the second image may be directly transmitted to the display 170 for display after being processed by the second ISP processor 140, or the display 170 may read the image in the image memory 160 for display. The second camera 120 and the second ISP processor 140 may also implement the processes described for the first camera 110 and the first ISP processor 130.
In one embodiment, the first camera 110 may be a color camera and the second camera 120 may be a TOF (Time Of Flight) camera or a structured light camera. The TOF camera can acquire a TOF depth map, and the structured light camera can acquire a structured light depth map. Alternatively, the first camera 110 and the second camera 120 may both be color cameras, and a binocular depth map is then acquired through the two color cameras. The first ISP processor 130 and the second ISP processor 140 may be the same ISP processor.
The first camera 110 and the second camera 120 capture the same scene to obtain a visible light map and a depth map, respectively, and send the visible light map and the depth map to the ISP processor. The ISP processor can register the visible light image and the depth image according to the camera calibration parameters to keep the visual field completely consistent; then, generating a central weight graph corresponding to the visible light graph, wherein the weight value represented by the central weight graph is gradually reduced from the center to the edge; inputting the visible light image and the central weight image into a trained subject detection model to obtain a subject region confidence image, and determining a target subject in the visible light image according to the subject region confidence image; the visible light image, the depth image and the central weight image can also be input into a trained subject detection model to obtain a subject region confidence map, and a target subject in the visible light image is determined according to the subject region confidence map. The object positioned in the center of the image can be detected more easily by utilizing the center weight graph, the object closer to the camera can be detected easily by utilizing the depth graph, and the accuracy of main body detection is improved. When the three-dimensional model is built, the three-dimensional model corresponding to the target main body is accurately built through the depth information of the target main body, and the target main body can be accurately identified under the condition that an interference object exists, so that the building accuracy of the three-dimensional model corresponding to the target main body is improved.
FIG. 2 is a flow diagram of a method for building a three-dimensional model according to one embodiment. As shown in fig. 2, a method for constructing a three-dimensional model, which can be applied to the terminal device in fig. 1, includes:
step 202, acquiring a visible light map.
Subject detection refers to automatically processing the region of interest in a scene while selectively ignoring the regions of no interest; the region of interest is called the subject region. The visible light map is an RGB (Red, Green, Blue) image. A color camera can shoot any scene to obtain a color image, namely an RGB image. The visible light map may be stored locally by the terminal device, stored by another device, obtained from a network, or shot by the terminal device in real time, which is not limited here.
Specifically, the ISP processor or the central processing unit of the terminal device may obtain the visible light map from a local or other device or a network, or obtain the visible light map by shooting a scene through a camera.
Step 204, generating a central weight map corresponding to the visible light map, wherein the weight value represented by the central weight map is gradually reduced from the center to the edge.
The central weight map is a map used to record the weight value of each pixel point in the visible light map. The weight values recorded in the central weight map decrease gradually from the center to the four sides, i.e., the central weight is the largest and the weight values decrease gradually toward the four sides; in other words, the central weight map characterizes weight values that decrease gradually from the center pixel points of the visible light map to its edge pixel points.
The ISP processor or central processor may generate a corresponding central weight map according to the size of the visible light map. The weight values represented by the central weight map decrease gradually from the center to the four sides. The central weight map may be generated using a Gaussian function, a first-order equation or a second-order equation. The Gaussian function may be a two-dimensional Gaussian function.
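As a minimal sketch (not the patented implementation), a two-dimensional Gaussian version of such a central weight map could be generated as follows; the function name, the sigma choice and the normalization are assumptions made for illustration.

import numpy as np

def central_weight_map(height, width, sigma_scale=0.5):
    # Weight is largest at the image center and decays smoothly toward the edges.
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    sigma_y, sigma_x = sigma_scale * height, sigma_scale * width
    w = np.exp(-(yy ** 2 / (2 * sigma_y ** 2) + xx ** 2 / (2 * sigma_x ** 2)))
    return w / w.max()  # normalize so the center weight equals 1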
Step 206, inputting the visible light map and the central weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene.
The subject detection model is obtained by collecting a large amount of training data in advance and inputting the training data into a subject detection model containing initial network weights for training. Each set of training data comprises a visible light map, a central weight map and a labeled subject mask map corresponding to the same scene. The visible light map and the central weight map are used as inputs of the subject detection model to be trained, and the labeled subject mask map is used as the ground truth that the subject detection model is expected to output. The subject mask map is an image filter template used for identifying the subject in an image; it can shield other parts of the image and screen out the subject in the image. The subject detection model may be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
Specifically, the ISP processor or central processor may input the visible light map and the central weight map into the subject detection model and perform detection to obtain a subject region confidence map. The subject region confidence map records, for each pixel point, the probability that it belongs to each recognizable subject category; for example, the probability that a certain pixel point belongs to a person is 0.8, to a flower is 0.1, and to the background is 0.1.
Step 208, determining a target subject in the visible light map according to the subject region confidence map.
The subject refers to various subjects, such as human, flower, cat, dog, cow, blue sky, white cloud, background, etc. The target subject refers to a desired subject, and can be selected as desired.
Specifically, the ISP processor or the central processing unit may select, according to the subject region confidence map, the subject with the highest or next-highest confidence as the subject in the visible light map; if there is one subject, that subject is used as the target subject; if multiple subjects exist, one or more of them can be selected as target subjects according to configuration information or depth information. In one embodiment, the distance between each subject and the shooting terminal is determined according to the depth information corresponding to each subject, and the subject with the smallest distance is taken as the target subject.
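For instance, selecting the subject closest to the camera from several candidate subject masks could look like the following sketch; the function and variable names are assumptions, not part of the patented logic.

import numpy as np

def choose_target_subject(subject_masks, depth_map):
    # Pick the candidate subject whose mean depth is smallest,
    # i.e. the subject closest to the shooting terminal.
    mean_depths = [float(depth_map[mask > 0].mean()) for mask in subject_masks]
    return subject_masks[int(np.argmin(mean_depths))]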
Step 210, acquiring depth information corresponding to the target subject, performing three-dimensional reconstruction on the target subject according to the target subject and the corresponding depth information, and returning to step 202 to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
The target subject has an accurate contour, so only the depth information corresponding to each pixel point within the contour of the target subject needs to be acquired. The depth information corresponding to the target subject can be obtained from a depth map shot by a camera, and the way of acquiring the depth map is not limited.
Specifically, the target subject is three-dimensionally reconstructed according to the target subject and the depth information corresponding to the target subject, and the three-dimensional reconstruction method is not limited. The depth information represents the distance between each pixel point in the target subject and the shooting device, and the Z-axis coordinate of each pixel point of the target subject in three-dimensional space can be determined from the depth information, so that the target subject is three-dimensionally reconstructed. During three-dimensional reconstruction, pixel points with the same depth information lie in the same plane. A reference pixel point can be selected, its depth value used as the reference depth value, and the depth values of the other pixel points compared with the reference depth value, so as to determine the positions of the other pixel points in three-dimensional space relative to the reference pixel point. For example, if the depth value of a certain pixel point in the target subject is 10 and the depth value of an adjacent pixel point is 12, the point with depth value 10 is used as the reference pixel point and the relative depth of the adjacent pixel point is 2; three-dimensional reconstruction of the target subject thus assigns relative concave-convex information to the surface of the subject. The current visible light map only captures one part of the target subject, so visible light maps at other acquisition angles need to be acquired. If the visible light map is shot in real time, the acquisition angle can be changed so that all parts of the target subject are shot and the corresponding depth information is acquired, and the complete target subject is three-dimensionally reconstructed. If already-shot visible light maps are used, visible light maps at other shooting angles can be obtained directly, and the complete target subject is three-dimensionally reconstructed until a three-dimensional model corresponding to the target subject is obtained. FIG. 3 is a schematic diagram of a three-dimensional model of a target subject constructed according to an embodiment.
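A minimal sketch of this relative-depth reconstruction, assuming the subject is given as a binary mask, the depth is available per pixel, and the first subject pixel serves as the reference (these choices are illustrative assumptions):

import numpy as np

def reconstruct_subject_points(subject_mask, depth_map):
    # Use one subject pixel as the reference; every other subject pixel is
    # placed at (x, y, depth - reference_depth), which gives relative
    # concave-convex information for the subject surface.
    ys, xs = np.nonzero(subject_mask)
    ref_depth = depth_map[ys[0], xs[0]]     # reference pixel depth value
    zs = depth_map[ys, xs] - ref_depth      # relative depth w.r.t. the reference
    return np.stack([xs, ys, zs], axis=1).astype(np.float32)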
In the method for constructing a three-dimensional model in this embodiment, a visible light map is acquired and a corresponding central weight map is generated, wherein the weight values represented by the central weight map decrease gradually from the center to the edge; the visible light map and the central weight map are input into a subject detection model to obtain a subject region confidence map, the subject detection model being trained in advance on a visible light map, a central weight map and a corresponding labeled subject mask map of the same scene; a target subject in the visible light map is determined according to the subject region confidence map; depth information corresponding to the target subject is acquired; the target subject is three-dimensionally reconstructed according to the target subject and its depth information, and the step of acquiring the visible light map is repeated to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained. The central weight map makes an object at the center of the image easier to detect, and the subject detection model trained with visible light maps, central weight maps, subject mask maps and the like can accurately identify the target subject in the visible light map.
In one embodiment, as shown in FIG. 4, step 208 comprises:
Step 208A, processing the subject region confidence map to obtain a subject mask map.
Specifically, some scattered points with low confidence exist in the subject region confidence map, and the ISP processor or central processing unit may filter the subject region confidence map to obtain the subject mask map. The filtering may use a configured confidence threshold to remove the pixel points in the subject region confidence map whose confidence values are lower than the confidence threshold. The confidence threshold may be an adaptive confidence threshold, a fixed threshold, or a threshold configured per region.
In step 208B, the visible light map is detected to determine the highlight region in the visible light map.
The highlight region is a region having a luminance value greater than a luminance threshold value.
Specifically, the ISP processor or the central processing unit performs highlight detection on the visible light image, screens target pixels with brightness values larger than a brightness threshold, and performs connected domain processing on the target pixels to obtain a highlight area.
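One way the brightness-threshold and connected-domain processing could be realized is sketched below; the threshold value, the minimum area and the OpenCV-based approach are assumptions for illustration only.

import cv2
import numpy as np

def detect_highlight_region(rgb, brightness_threshold=220, min_area=50):
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    candidate = (gray > brightness_threshold).astype(np.uint8)   # target pixels
    count, labels, stats, _ = cv2.connectedComponentsWithStats(candidate, connectivity=8)
    highlight = np.zeros_like(candidate)
    for label in range(1, count):                # label 0 is the background
        if stats[label, cv2.CC_STAT_AREA] >= min_area:
            highlight[labels == label] = 1
    return highlight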
In step 208C, the target subject with the highlight eliminated in the visible light map is determined according to the highlight region in the visible light map and the subject mask map.
Specifically, the ISP processor or the central processing unit may perform a difference calculation or a logical AND calculation on the highlight region in the visible light map and the subject mask map to obtain the target subject with highlight eliminated in the visible light map.
In this embodiment, filtering the subject region confidence map to obtain a subject mask map improves the reliability of the subject region confidence map; the visible light map is detected to obtain the highlight region, which is then processed with the subject mask map to obtain a highlight-free target subject.
In one embodiment, step 208A includes: performing adaptive confidence threshold filtering on the subject region confidence map to obtain the subject mask map.
The adaptive confidence threshold is a confidence threshold whose value adapts to the local image content. The adaptive confidence threshold may be a locally adaptive confidence threshold, i.e., a binarization confidence threshold determined at each pixel point position according to the pixel value distribution of the neighborhood block of that pixel point. A higher binarization confidence threshold is configured for image regions with higher brightness, and a lower binarization confidence threshold is configured for image regions with lower brightness.
In one embodiment, the configuration process of the adaptive confidence threshold includes: when the brightness value of the pixel point is larger than the first brightness value, a first confidence threshold value is configured, when the brightness value of the pixel point is smaller than a second brightness value, a second confidence threshold value is configured, when the brightness value of the pixel point is larger than the second brightness value and smaller than the first brightness value, a third confidence threshold value is configured, wherein the second brightness value is smaller than or equal to the first brightness value, the second confidence threshold value is smaller than the third confidence threshold value, and the third confidence threshold value is smaller than the first confidence threshold value.
In one embodiment, the configuration process of the adaptive confidence threshold includes: when the brightness value of a pixel point is greater than a first brightness value, a first confidence threshold is configured, and when the brightness value of the pixel point is less than or equal to the first brightness value, a second confidence threshold is configured, wherein the second confidence threshold is smaller than the first confidence threshold.
When adaptive confidence threshold filtering is performed on the subject region confidence map, the confidence value of each pixel point in the subject region confidence map is compared with the corresponding confidence threshold; if the confidence value is greater than or equal to the confidence threshold, the pixel point is retained, and if the confidence value is smaller than the confidence threshold, the pixel point is removed.
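The three-level configuration and the filtering step might be sketched as follows; all the brightness and confidence values below are purely illustrative assumptions, not values taken from the application.

import numpy as np

def adaptive_confidence_threshold(brightness, first_lum=200, second_lum=80,
                                  t_high=0.6, t_mid=0.5, t_low=0.4):
    # Three-level configuration: brighter regions get a higher binarization threshold.
    if brightness > first_lum:
        return t_high
    if brightness < second_lum:
        return t_low
    return t_mid

def filter_confidence_map(confidence_map, brightness_map):
    # Keep a pixel if its confidence is at least its locally configured threshold.
    thresholds = np.vectorize(adaptive_confidence_threshold)(brightness_map)
    return (confidence_map >= thresholds).astype(np.uint8)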
In one embodiment, performing adaptive confidence threshold filtering on the subject region confidence map to obtain the subject mask map includes:
performing adaptive confidence threshold filtering on the subject region confidence map to obtain a binary mask map; and performing morphological processing and guided filtering on the binary mask map to obtain the subject mask map.
Specifically, after the ISP processor or the central processing unit filters the confidence map of the main area according to the adaptive confidence threshold, the confidence values of the retained pixel points are represented by 1, and the confidence values of the removed pixel points are represented by 0, so as to obtain the binary mask map.
Morphological processing may include erosion and dilation. An erosion operation is first performed on the binary mask map, followed by a dilation operation, to remove noise; guided filtering is then performed on the morphologically processed binary mask map to realize an edge filtering operation and obtain a subject mask map with extracted edges.
The morphological processing and guided filtering ensure that the obtained subject mask map has few or no noise points and softer edges.
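A possible realization of the erosion, dilation and guided-filtering steps is sketched below; guidedFilter comes from the opencv-contrib ximgproc module, and the kernel size, radius and eps values are assumed for illustration.

import cv2
import numpy as np

def refine_binary_mask(binary_mask, guide_rgb, radius=8, eps=1e-3):
    kernel = np.ones((3, 3), np.uint8)
    # Erode then dilate to remove small noise from the binary mask.
    denoised = cv2.dilate(cv2.erode(binary_mask, kernel), kernel)
    guide = cv2.cvtColor(guide_rgb, cv2.COLOR_RGB2GRAY)
    # Edge-aware smoothing guided by the visible light image.
    smoothed = cv2.ximgproc.guidedFilter(guide, denoised.astype(np.float32), radius, eps)
    return (smoothed > 0.5).astype(np.uint8)   # subject mask with softer edges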
In one embodiment, determining the target subject with highlight eliminated in the visible light map according to the highlight region in the visible light map and the subject mask map comprises: performing difference processing on the highlight region in the visible light map and the subject mask map to obtain the highlight-free target subject.
Specifically, the ISP processor or the central processor performs difference processing on the highlight region in the visible light map and the subject mask map, that is, the corresponding pixel values of the visible light map and the subject mask map are subtracted to obtain the target subject in the visible light map. The highlight-free target subject is obtained through this difference processing, and the calculation is simple.
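The difference processing itself is simple; a sketch under the assumption that both inputs are binary masks of the same size:

import numpy as np

def remove_highlight(subject_mask, highlight_mask):
    # Subtract the highlight region from the subject mask and clamp at zero,
    # leaving only the highlight-free subject pixels.
    diff = subject_mask.astype(np.int16) - highlight_mask.astype(np.int16)
    return np.clip(diff, 0, 1).astype(np.uint8)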
In one embodiment, the subject detection model includes an input layer, an intermediate layer, and an output layer in series. Inputting the visible light map and the central weight map into a subject detection model, including: applying the visible light map to an input layer of a subject detection model; applying the central weight map to an output layer of the subject detection model.
The subject detection model may employ a deep learning network model. The deep learning network model may comprise an input layer, an intermediate layer and an output layer connected in sequence. The intermediate layer may be a network structure of one layer or at least two layers. The visible light map is input at, i.e. acts on, the input layer of the subject detection model. The central weight map is input at, i.e. acts on, the output layer of the subject detection model. Applying the central weight map to the output layer reduces the influence of the other layers of the subject detection model on the central weight map, so that an object at the center of the image is more easily detected as the subject.
In one embodiment, acquiring the depth information corresponding to the target subject in step 210 includes: acquiring a depth map corresponding to the visible light map, wherein the depth map comprises at least one of a TOF depth map, a binocular depth map and a structured light depth map; registering the visible light map and the depth map to obtain a registered visible light map and a registered depth map; and determining the depth information corresponding to the target subject from the registered depth map according to the region where the target subject is located in the visible light map.
The depth map is a map containing depth information. The same scene is shot with a depth camera or a binocular camera to obtain the corresponding depth map. The depth camera may be a structured light camera or a TOF camera. The depth map may be at least one of a structured light depth map, a TOF depth map and a binocular depth map.
Specifically, the ISP processor or the central processor may capture the same scene through the cameras to obtain a visible light map and a corresponding depth map, and then register the visible light map and the depth map using the camera calibration parameters to obtain the registered visible light map and depth map. In the registered visible light map and depth map, each pixel point in the visible light map has a matched pixel point in the depth map, so the pixel points in the depth map corresponding to the region where the target subject is located can be obtained from this matching relation, and the depth values corresponding to the target subject can be obtained from the pixel values of those pixel points.
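As an illustrative simplification (the real registration would use the full camera calibration rather than a single homography), reading the subject's depth values from a registered depth map could look like this:

import cv2
import numpy as np

def subject_depth_values(depth_map, subject_mask, homography=None):
    if homography is not None:
        h, w = subject_mask.shape
        # Warp the depth map into the visible-light frame (assumed precomputed mapping).
        depth_map = cv2.warpPerspective(depth_map, homography, (w, h))
    return depth_map[subject_mask > 0]    # one depth value per subject pixel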
In one embodiment, when the depth map cannot be obtained by shooting, a simulated depth map can be generated automatically. The depth value of each pixel point in the simulated depth map can be a preset value; alternatively, the depth values of different pixel points in the simulated depth map can correspond to different preset values.
In one embodiment, the method comprises: inputting the registered visible light map, the depth map and the central weight map into the subject detection model to obtain the subject region confidence map, wherein the subject detection model is trained in advance on a visible light map, a depth map, a central weight map and a corresponding labeled subject mask map of the same scene.
The subject detection model is obtained by collecting a large amount of training data in advance and inputting the training data into a subject detection model containing initial network weights for training. Each set of training data comprises a visible light map, a depth map, a central weight map and a labeled subject mask map corresponding to the same scene. The visible light map, the depth map and the central weight map are used as inputs of the subject detection model to be trained, and the labeled subject mask map is used as the ground truth that the subject detection model is expected to output. The subject mask map is an image filter template used for identifying the subject in an image; it can shield other parts of the image and screen out the subject in the image. The subject detection model may be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
In this embodiment, the depth map and the central weight map are used as inputs of the subject detection model. With the depth information of the depth map, an object closer to the camera can be detected more easily; with the central attention mechanism of the central weight map, in which the central weight is large and the weights at the four sides are small, an object at the center of the image can be detected more easily. Introducing the depth map realizes depth feature enhancement of the subject, and introducing the central weight map realizes central attention feature enhancement of the subject, so that the target subject in a simple scene can be accurately identified and the identification accuracy of the subject in a complex scene is greatly improved; introducing the depth map also addresses the poor robustness of conventional target detection methods for highly variable natural images. A simple scene is a scene with a single subject and a low-contrast background region.
In one embodiment, the training mode of the subject detection model includes: acquiring a visible light map, a depth map and a labeled subject mask map of the same scene; generating a central weight map corresponding to the visible light map, wherein the weight values represented by the central weight map decrease gradually from the center to the edge; and applying the visible light map to the input layer of a subject detection model containing initial network weights, applying the depth map and the central weight map to the output layer of the initial subject detection model, taking the labeled subject mask map as the ground truth output of the subject detection model, and training the subject detection model containing the initial network weights to obtain the target network weights of the subject detection model.
A visible light map, a depth map and a corresponding labeled subject mask map of a scene may be collected. Semantic-level labeling is performed on the visible light map and the depth map, labeling the subject in both. A large number of visible light maps can be collected, and a large number of images with a pure or simple background can additionally be obtained by fusing foreground target images from the COCO dataset with simple background images; these are used as training visible light maps. The COCO dataset contains a large number of foreground objects.
The network structure of the subject detection model adopts a mobile-Unet-based architecture, and bridging between layers is added in the decoder part, so that high-level semantic features are transferred more fully during upsampling. The central weight map acts on the output layer of the subject detection model, introducing a central attention mechanism so that an object at the center of the image is more easily detected as the subject.
The network structure of the subject detection model comprises an input layer, convolutional layers (conv), pooling layers (pooling), bilinear interpolation layers (bilinear upsampling), convolutional feature connection layers (concat + conv), an output layer and so on. Bridging between a bilinear interpolation layer and a convolutional feature connection layer is realized by a deconvolution + add (deconvolution feature superposition) operation, so that high-level semantic features are transferred more fully during upsampling. The convolutional layers, pooling layers, bilinear interpolation layers, convolutional feature connection layers and the like may serve as the intermediate layers of the subject detection model.
The initial network weights are the initialized weights of each layer of the deep learning network model. The target network weights are the weights of each layer of the trained deep learning network model capable of detecting the image subject. The target network weights can be obtained by training for a preset number of iterations; alternatively, a loss function of the deep learning network model can be set, and when the loss value obtained during training is smaller than a loss threshold, the current network weights of the subject detection model are taken as the target network weights.
FIG. 5 is a diagram illustrating a network structure of a subject detection model according to an embodiment. As shown in fig. 5, the network structure of the subject detection model includes a convolutional layer 402, a pooling layer 404, a convolutional layer 406, a pooling layer 408, a convolutional layer 410, a pooling layer 412, a convolutional layer 414, a pooling layer 416, a convolutional layer 418, a convolutional layer 420, a bilinear interpolation layer 422, a convolutional layer 424, a bilinear interpolation layer 426, a convolutional layer 428, a convolutional feature connection layer 430, a bilinear interpolation layer 432, a convolutional layer 434, a convolutional feature connection layer 436, a bilinear interpolation layer 438, a convolutional layer 440, a convolutional feature connection layer 442 and so on; the convolutional layer 402 serves as the input layer of the subject detection model, and the convolutional feature connection layer 442 serves as its output layer. The network structure of the subject detection model in this embodiment is merely an example and is not a limitation of the present application. It is understood that the numbers of convolutional layers, pooling layers, bilinear interpolation layers, convolutional feature connection layers and so on in the network structure of the subject detection model may be set as needed.
The encoding portion of the subject detection model includes the convolutional layer 402, pooling layer 404, convolutional layer 406, pooling layer 408, convolutional layer 410, pooling layer 412, convolutional layer 414, pooling layer 416 and convolutional layer 418, and the decoding portion includes the convolutional layer 420, bilinear interpolation layer 422, convolutional layer 424, bilinear interpolation layer 426, convolutional layer 428, convolutional feature connection layer 430, bilinear interpolation layer 432, convolutional layer 434, convolutional feature connection layer 436, bilinear interpolation layer 438, convolutional layer 440 and convolutional feature connection layer 442. Convolutional layer 406 is concatenated (concatenation) with convolutional layer 434, convolutional layer 410 is concatenated with convolutional layer 428, and convolutional layer 414 is concatenated with convolutional layer 424. Bilinear interpolation layer 422 and convolutional feature connection layer 430 are bridged with deconvolution feature superposition (deconvolution + add); bilinear interpolation layer 432 and convolutional feature connection layer 436 are bridged with deconvolution feature superposition; bilinear interpolation layer 438 and convolutional feature connection layer 442 are bridged with deconvolution feature superposition.
The original image 450 (e.g., the visible light map) is input to the convolutional layer 402 of the subject detection model, the depth map 460 is applied to the convolutional feature connection layer 442 of the subject detection model, and the central weight map 470 is applied to the convolutional feature connection layer 442 of the subject detection model. The depth map 460 and the central weight map 470 are each input to the convolutional feature connection layer 442 as multiplication factors. After the original image 450, the depth map 460 and the central weight map 470 are input to the subject detection model, a confidence map 480 containing the subject is output.
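Interpreting "input as a multiplication factor" literally, the feature maps entering the output layer could be modulated element-wise by the depth map and the central weight map; the PyTorch-style sketch below reflects that interpretation (shapes N x C x H x W for the features and N x 1 x H x W for the maps) and is an assumption, not the patented network itself.

import torch

def modulate_output_features(features, depth_map, central_weight_map):
    # Element-wise multiplication broadcasts the single-channel maps over all
    # feature channels before the final confidence prediction.
    return features * depth_map * central_weight_map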
A dropout with a preset probability is applied to the depth map during training of the subject detection model. The preset value may be 50%. Introducing this probabilistic dropout for the depth map during training lets the subject detection model fully mine the information in the depth map while still outputting an accurate result when no depth map is available. Using dropout for the depth map input makes the subject detection model more robust to the depth map, so the subject region can be segmented accurately even without a depth map.
In addition, during normal shooting by the terminal device, capturing and computing a depth map is time-consuming, labor-intensive and hard to obtain; designing a 50% dropout probability for the depth map during training allows the subject detection model to still detect normally when no depth information is available.
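A hedged sketch of the 50% depth dropout during training; zero-filling the dropped depth input is an assumption, since the description only states that a preset dropout probability is used.

import torch

def maybe_drop_depth(depth_map, p_drop=0.5, training=True):
    # With probability p_drop the depth input is discarded for this sample, so
    # the subject detection model also learns to work without depth information.
    if training and torch.rand(1).item() < p_drop:
        return torch.zeros_like(depth_map)
    return depth_map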
Highlight detection is performed on the original image 450 using the highlight detection layer 444 to identify the highlight region in the original image. Adaptive threshold filtering is performed on the subject region confidence map output by the subject detection model to obtain a binary mask map; morphological processing and guided filtering are performed on the binary mask map to obtain a subject mask map; the subject mask map is then differenced with the original image containing the highlight region, and the highlight region is deleted from the subject mask map to obtain the highlight-removed subject. The subject region confidence map contains confidence values distributed from 0 to 1 and includes many noise points, such as scattered points with low confidence or small clustered regions of high confidence, so filtering with a region-adaptive confidence threshold yields the binary mask map. Morphological processing of the binary mask map further reduces noise, and guided filtering makes its edges smoother. It will be appreciated that the subject region confidence map may be regarded as a subject mask map containing noise.
In this embodiment, the depth map is used as a feature to enhance the network output rather than being fed directly into the network of the subject detection model. Alternatively, a dual deep-learning network structure may be designed, in which one deep-learning network processes the depth map and the other processes the RGB map; the outputs of the two networks are then combined by convolutional feature connection and output.
In one embodiment, the training mode of the subject detection model includes: acquiring a visible light map and a labeled subject mask map of the same scene; generating a central weight map corresponding to the visible light map, wherein the weight values represented by the central weight map decrease gradually from the center to the edge; and applying the visible light map to the input layer of a subject detection model containing initial network weights, applying the central weight map to the output layer of the initial subject detection model, taking the labeled subject mask map as the ground truth output of the subject detection model, and training the subject detection model containing the initial network weights to obtain the target network weights of the subject detection model.
The training in this embodiment uses a visible light map and a central weight map; that is, no depth map is introduced at the output layer in the network structure of the subject detection model in fig. 5. The visible light map acts on the convolutional layer 402, and the central weight map 470 acts on the convolutional feature connection layer 442 of the subject detection model.
FIG. 6 is a diagram illustrating the effect of target subject recognition in one embodiment. As shown in fig. 6, a butterfly exists in the RGB map 502; the RGB map is input to the subject detection model to obtain a subject region confidence map 504, then the subject region confidence map 504 is filtered and binarized to obtain a binary mask map 506, and the binary mask map 506 is subjected to morphological processing and guided filtering to realize edge enhancement, obtaining a subject mask map 508.
In one embodiment, the step of returning to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles includes: continuously changing the acquisition angle around the target subject with the target subject as the center, and acquiring visible light maps in real time under the condition that an overlapping area exists between adjacent visible light maps, so as to obtain the visible light maps at different acquisition angles.
Specifically, the change of acquisition angle can be realized by rotating the shooting device, and the rotation step can be customized; for example, when shooting a cubic object, visible light maps at different angles may be acquired by rotating 45 degrees each time. Since the acquired visible light maps need to cover the whole target subject, an overlapping area must exist between adjacent visible light maps to ensure the completeness of information acquisition, and the overlapping proportion can likewise be customized.
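As a small illustration of the relationship between rotation step and overlap, the number of capture positions for a full circle can be estimated as follows; the field-of-view value, the overlap ratio and the helper function are assumptions for illustration, not values given in this application.

```python
import math

def captures_for_full_circle(fov_deg=60.0, overlap_ratio=0.3):
    """Estimate how many shots cover 360 degrees when adjacent views must overlap.

    Each new view may advance by at most fov_deg * (1 - overlap_ratio) degrees
    so that adjacent visible light maps still share an overlapping area.
    """
    step = fov_deg * (1.0 - overlap_ratio)
    return math.ceil(360.0 / step)

print(captures_for_full_circle())          # 9 shots with a 60-degree view and 30% overlap
print(captures_for_full_circle(90, 0.5))   # a 45-degree step, matching the cube example above
```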
In this embodiment, the acquisition angle is continuously changed around the target subject, and acquiring the visible light maps in real time ensures the completeness of information acquisition, so that the target subject can be three-dimensionally reconstructed in real time.
In one embodiment, step 210 includes: acquiring a first depth value corresponding to a first plane pixel point on the target subject; obtaining a first three-dimensional pixel point corresponding to the first plane pixel point in the three-dimensional space according to the position of the first plane pixel point on the target subject and the first depth value; acquiring a second depth value corresponding to a second plane pixel point on the target subject; taking the first depth value as a reference depth value, determining the relative position of the second three-dimensional pixel point with respect to the first three-dimensional pixel point according to the second depth value; determining the position of the second three-dimensional pixel point in the three-dimensional space according to the relative position and the position of the second plane pixel point on the target subject; and connecting the three-dimensional pixel points in the three-dimensional space.
Specifically, three-dimensional pixel point matching may be performed for every plane pixel point on the target subject, or only for key plane pixel points on the target subject, which improves the efficiency of three-dimensional reconstruction and saves computing resources. In one embodiment, when the target subject is a human face, the key plane pixel points are feature key points obtained by face detection, such as points on the nose tip, eyes, mouth and eyebrows. The three-dimensional model is built by selecting one point, such as the first plane pixel point, as a calibration point, and computing the positions of the other points relative to it from their depth values; in this way, each three-dimensional pixel point corresponding to the target subject in the three-dimensional space is established step by step, and the three-dimensional model corresponding to the target subject is obtained by connecting the three-dimensional pixel points in the three-dimensional space.
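For illustration only, this relative construction might be sketched as follows; the pixel-to-space scale factor and the example key points (nose tip, eyes) are assumptions, not values taken from this application.

```python
import numpy as np

def reconstruct_relative_points(pixels, depths, pixel_scale=1.0):
    """Build 3-D points for the target subject relative to a reference pixel.

    pixels: (N, 2) array of plane pixel coordinates (u, v) on the target subject.
    depths: (N,)   array of depth values, one per pixel.
    The first pixel is taken as the calibration point; every other 3-D point is
    placed relative to it using its plane offset and its depth difference.
    """
    pixels = np.asarray(pixels, dtype=float)
    depths = np.asarray(depths, dtype=float)
    ref_uv, ref_depth = pixels[0], depths[0]           # first plane pixel = reference
    points = np.zeros((len(pixels), 3))
    points[:, :2] = (pixels - ref_uv) * pixel_scale    # relative position in the image plane
    points[:, 2] = depths - ref_depth                  # relative depth w.r.t. the reference
    return points                                      # connect these points to form the mesh

# Hypothetical usage: three face key points (nose tip, left eye, right eye)
pts = reconstruct_relative_points([(120, 200), (90, 150), (150, 150)],
                                  [48.0, 52.0, 52.5])
print(pts)
```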
In one embodiment, step 210 includes: determining a target type corresponding to the target subject, and acquiring an initial three-dimensional model of the same type according to the target type; acquiring actual depth values corresponding to key plane pixel points on the target subject; acquiring three-dimensional model pixel points matched with the key plane pixel points from the initial three-dimensional model; adjusting the three-dimensional space positions of the matched three-dimensional model pixel points according to the actual depth value proportions among the key plane pixel points; acquiring actual depth values corresponding to non-key plane pixel points on the target subject, acquiring three-dimensional model pixel points matched with the non-key plane pixel points from the initial three-dimensional model, and adjusting the three-dimensional space positions of the three-dimensional model pixel points matched with the non-key plane pixel points according to the actual depth value proportions between the non-key plane pixel points and the key plane pixel points, until every plane pixel point on the target subject has a matched and adjusted three-dimensional model pixel point; and forming the three-dimensional model corresponding to the target subject from the adjusted three-dimensional model pixel points.
Specifically, the subject detection model outputs not only the target subject outline in the visible light map but also the target type corresponding to the target subject, where the target type includes a person, a flower, a cat, a dog, and the like. Initial three-dimensional models corresponding to the target types can be established in advance, such as an initial human face three-dimensional model, an initial human body three-dimensional model, an initial flower three-dimensional model, an initial cat three-dimensional model and an initial dog three-dimensional model. The key plane pixel points are the key feature points that determine the stereoscopic shape of the three-dimensional model; for a human face, for example, they are obtained by face detection, such as points on the nose tip, eyes, mouth and eyebrows. The three-dimensional space positions of the matched three-dimensional model pixel points are adjusted according to the actual depth value proportions among the key plane pixel points, so that the initial three-dimensional model is adjusted to obtain a stereo contour matched with the target subject; for instance, if the nose of the initial three-dimensional model is lower while the nose of the actual target subject is higher, the nose part of the initial three-dimensional model is raised. The initial three-dimensional model is then further adjusted according to the actual depth values of the remaining non-key plane pixel points to obtain a three-dimensional model that better matches the target subject.
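A sketch of this ratio-based adjustment might look like the following; the proportional rescaling of the z coordinate and the vertex/key-point correspondence are assumptions made for illustration, not the claimed adjustment rule.

```python
import numpy as np

def adjust_initial_model(model_vertices, matched_idx, actual_depths):
    """Pull matched vertices of an initial 3-D model toward the subject's real shape.

    model_vertices: (M, 3) vertices of the pre-built initial model of the target type.
    matched_idx:    indices of vertices matched to key plane pixel points.
    actual_depths:  actual depth values measured at those key plane pixel points.
    The z coordinates of the matched vertices are rescaled so that their ratios
    follow the ratios of the actual depth values (e.g. raising a too-flat nose).
    """
    vertices = np.array(model_vertices, dtype=float)
    actual = np.asarray(actual_depths, dtype=float)
    model_z = vertices[matched_idx, 2]
    # Scale each key vertex so the adjusted depth ratios match the measured ratios
    scale = (actual / actual.mean()) / (model_z / model_z.mean())
    vertices[matched_idx, 2] = model_z * scale
    return vertices
```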
In this embodiment, since the initial three-dimensional model conforms to a conventional model of its type, the influence of black holes in the depth map, caused by limited accuracy in depth map calculation, on the construction of the three-dimensional model can be reduced. The initial three-dimensional model is corrected step by step to obtain a three-dimensional model matched with the target subject.
In one embodiment, the visible light map is acquired in real time, and the method further includes: displaying, on a preview interface and in real time, the construction process of the three-dimensional model corresponding to the target subject according to the visible light maps acquired in real time.
Specifically, visible light maps containing the target subject can be acquired in real time through the terminal device; while the visible light maps are being acquired, the three-dimensional model corresponding to the target subject is gradually constructed, and the construction process is displayed in real time on a preview interface, so that the user can view it intuitively. The user can then adjust the acquisition angle of the visible light maps according to the partially constructed three-dimensional model displayed, making the construction of the three-dimensional model corresponding to the target subject more accurate and efficient.
In one embodiment, the three-dimensional model corresponding to the target subject is given matching texture and color according to the texture and color of the visible light map. It should be understood that, although the steps in the flowcharts of fig. 2 and fig. 4 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2 and fig. 4 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their execution order is not necessarily sequential, and they may be performed in turns or alternately with other steps or with at least some of the sub-steps or stages of other steps.
FIG. 7 is a block diagram of an apparatus for constructing a three-dimensional model according to an embodiment. As shown in fig. 7, an apparatus for constructing a three-dimensional model includes a processing module 602, a detecting module 604, a target body determining module 606, and a three-dimensional model constructing module 608. Wherein:
the processing module 602 is configured to acquire a visible light map and generate a center weight map corresponding to the visible light map, where a weight value represented by the center weight map decreases gradually from a center to an edge.
The detection module 604 is configured to input the visible light map and the central weight map into a subject detection model to obtain a subject region confidence map, where the subject detection model is a model obtained by training in advance according to the visible light map, the central weight map and a corresponding labeled subject mask map of the same scene.
A target subject determination module 606, configured to determine a target subject in the visible light map according to the subject region confidence map.
The three-dimensional model building module 608 is configured to obtain depth information corresponding to the target subject, perform three-dimensional reconstruction on the target subject according to the depth information corresponding to the target subject and the target subject, and return to the processing module to obtain visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
With the apparatus for constructing a three-dimensional model in this embodiment, the center weight map makes an object in the center of the image easier to detect, and the subject detection model trained with the visible light map, the center weight map, the subject mask map and the like identifies the target subject in the visible light map more accurately. When the three-dimensional model is constructed, the depth information of the target subject enables accurate construction of the three-dimensional model corresponding to the target subject, and the target subject can also be accurately identified when interfering objects are present, so the accuracy of constructing the three-dimensional model corresponding to the target subject is improved.
In one embodiment, the target subject determination module 606 is further configured to process the subject region confidence map to obtain a subject mask map; detecting the visible light map, and determining a high light region in the visible light map; and determining the target subject with highlight eliminated in the visible light image according to the highlight area in the visible light image and the subject mask image.
In one embodiment, the target subject determination module 606 is further configured to perform an adaptive confidence threshold filtering process on the subject region confidence map to obtain a subject mask map.
In one embodiment, the target subject determination module 606 is further configured to perform adaptive confidence threshold filtering on the subject region confidence map to obtain a binary mask map, and to perform morphological processing and guided filtering on the binary mask map to obtain a subject mask map.
In one embodiment, the target subject determination module 606 is further configured to perform a difference process on the highlight region in the visible light map and the subject mask map to obtain the target subject in the visible light map.
In one embodiment, the subject detection model comprises an input layer, an intermediate layer and an output layer which are connected in sequence;
the detection module 604 is further configured to apply the visible light map to the input layer of the subject detection model, and to apply the center weight map to the output layer of the subject detection model.
In one embodiment, three-dimensional model construction module 608 is further configured to obtain a depth map corresponding to the visible light map; the depth map comprises at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and performing registration processing on the visible light image and the depth image to obtain a registered visible light image and a registered depth image, and determining depth information corresponding to the target subject from the registered depth image according to the region of the target subject in the visible light image.
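One common way to realize such a registration, when the depth camera's intrinsics and its rotation/translation relative to the RGB camera are known, is to reproject each depth pixel into the RGB camera's pixel grid; the matrices and the reprojection approach below are assumptions for illustration, not parameters given in this application.

```python
import numpy as np

def register_depth_to_rgb(depth, K_depth, K_rgb, R, t, rgb_shape):
    """Warp a depth map into the visible-light (RGB) camera's pixel grid.

    depth:  (H, W) depth map from the depth sensor (TOF / binocular / structured light).
    K_depth, K_rgb: 3x3 intrinsic matrices; R, t: rotation and translation from the
    depth camera to the RGB camera. Returns a depth map aligned with the RGB image.
    """
    h, w = depth.shape
    registered = np.zeros(rgb_shape[:2], dtype=depth.dtype)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    # Back-project valid depth pixels to 3-D points in the depth camera frame
    pix = np.stack([us[valid], vs[valid], np.ones(valid.sum())])
    pts = np.linalg.inv(K_depth) @ pix * depth[valid]
    # Transform into the RGB camera frame and project onto its image plane
    pts_rgb = R @ pts + t.reshape(3, 1)
    proj = K_rgb @ pts_rgb
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    inside = (u >= 0) & (u < rgb_shape[1]) & (v >= 0) & (v < rgb_shape[0])
    registered[v[inside], u[inside]] = pts_rgb[2, inside]
    return registered
```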
In one embodiment, the three-dimensional model construction module 608 is further configured to continuously transform the acquisition angles around the target subject centered on the target subject; and under the condition that an overlapping area exists between adjacent visible light images, the visible light images are acquired in real time to obtain the visible light images at different acquisition angles.
In one embodiment, the three-dimensional model construction module 608 is further configured to: obtain a first depth value corresponding to a first plane pixel point on the target subject; obtain a first three-dimensional pixel point corresponding to the first plane pixel point in the three-dimensional space according to the position of the first plane pixel point on the target subject and the first depth value; obtain a second depth value corresponding to a second plane pixel point on the target subject; determine, with the first depth value as a reference depth value, the relative position of a second three-dimensional pixel point with respect to the first three-dimensional pixel point according to the second depth value, wherein the second three-dimensional pixel point is the three-dimensional pixel point corresponding to the second plane pixel point in the three-dimensional space; determine the position of the second three-dimensional pixel point in the three-dimensional space according to the relative position and the position of the second plane pixel point on the target subject; and connect the three-dimensional pixel points in the three-dimensional space.
In one embodiment, the three-dimensional model construction module 608 is further configured to: determine a target type corresponding to the target subject, and obtain an initial three-dimensional model of the same type according to the target type; obtain actual depth values corresponding to key plane pixel points on the target subject; obtain three-dimensional model pixel points matched with the key plane pixel points from the initial three-dimensional model, and adjust the three-dimensional space positions of the matched three-dimensional model pixel points according to the actual depth value proportions among the key plane pixel points; obtain actual depth values corresponding to non-key plane pixel points on the target subject; obtain three-dimensional model pixel points matched with the non-key plane pixel points from the initial three-dimensional model; adjust the three-dimensional space positions of the three-dimensional model pixel points matched with the non-key plane pixel points according to the actual depth value proportions between the non-key plane pixel points and the key plane pixel points, until every plane pixel point on the target subject has a matched and adjusted three-dimensional model pixel point; and form the three-dimensional model corresponding to the target subject from the adjusted three-dimensional model pixel points.
In one embodiment, the apparatus further comprises:
a display module, used for displaying, on a preview interface and in real time, the construction process of the three-dimensional model corresponding to the target subject according to the visible light maps acquired in real time.
In one embodiment, the detection module 604 is further configured to input the registered visible light map, depth map, and center weight map into a subject detection model to obtain a subject region confidence map; the main body detection model is obtained by training in advance according to a visible light image, a depth image, a center weight image and a corresponding marked main body mask image of the same scene.
In one embodiment, the apparatus for constructing a three-dimensional model further includes a training image obtaining module, a training weight generating module, and a training module.
The training image acquisition module is used for acquiring a visible light image, a depth image and a labeled main body mask image of the same scene.
The training weight generation module is used for generating a center weight map corresponding to the visible light map, wherein the weight value represented by the center weight map is gradually reduced from the center to the edge.
The training module is used for applying the visible light map to the input layer of a subject detection model containing initial network weights, applying the depth map and the center weight map to the output layer of the initial subject detection model, taking the labeled subject mask map as the ground truth output of the subject detection model, and training the subject detection model containing the initial network weights to obtain the target network weights of the subject detection model. When the loss function of the subject detection model is smaller than a loss threshold, or the number of training iterations reaches a preset number, the current network weights of the subject detection model are taken as its target network weights.
Fig. 8 is a schematic diagram of the internal structure of the terminal device in one embodiment. As shown in fig. 8, the terminal device includes a processor and a memory connected by a system bus. The processor provides computing and control capabilities and supports the operation of the whole terminal device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the method for constructing a three-dimensional model provided by the embodiments. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium. The terminal device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
Each module in the apparatus for constructing a three-dimensional model provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on a terminal or a server, and the program modules constituted by the computer program may be stored in the memory of the terminal or the server. When the computer program is executed by a processor, the steps of the method described in the embodiments of the present application are performed.
The embodiment of the application also provides a computer readable storage medium. One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of a method of constructing a three-dimensional model.
A computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of a method of constructing a three-dimensional model.
Any reference to memory, storage, a database or another medium used in the embodiments of the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of constructing a three-dimensional model, the method comprising:
acquiring a visible light map, and generating a center weight map corresponding to the visible light map, wherein the center weight map is used for recording a weight value of each pixel point in the visible light map, and the weight values represented by the center weight map decrease gradually from the center to the edge;
inputting the visible light map and the center weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model obtained by training in advance according to a visible light map, a center weight map and a corresponding labeled subject mask map of a same scene;
determining a target subject in the visible light map according to the subject region confidence map;
acquiring depth information corresponding to the target subject; and
performing three-dimensional reconstruction on the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
2. The method of claim 1, wherein said determining a target subject in the visible light map from the subject region confidence map comprises:
processing the subject region confidence map to obtain a subject mask map;
detecting the visible light map, and determining a highlight region in the visible light map; and
determining a target subject with highlight eliminated in the visible light map according to the highlight region in the visible light map and the subject mask map.
3. The method of claim 1, wherein the obtaining depth information corresponding to the target subject comprises:
acquiring a depth map corresponding to the visible light map; the depth map comprises at least one of a TOF depth map, a binocular depth map, and a structured light depth map;
performing registration processing on the visible light map and the depth map to obtain a registered visible light map and a registered depth map; and
and determining depth information corresponding to the target subject from the registered depth map according to the region of the target subject in the visible light map.
4. The method of claim 1, wherein said step of returning to said step of acquiring visible light maps to acquire visible light maps for different acquisition angles comprises:
continuously changing the acquisition angle around the target subject with the target subject as a center; and
acquiring visible light maps in real time under the condition that an overlapping area exists between adjacent visible light maps, to obtain the visible light maps at different acquisition angles.
5. The method according to claim 1, wherein the three-dimensional reconstruction of the target subject according to the depth information corresponding to the target subject and the target subject comprises:
acquiring a first depth value corresponding to a first plane pixel point on the target subject;
obtaining a first three-dimensional pixel point corresponding to the first plane pixel point in the three-dimensional space according to the position of the first plane pixel point on the target subject and the first depth value;
acquiring a second depth value corresponding to a second plane pixel point on the target subject;
determining the relative position of a second three-dimensional pixel point with respect to the first three-dimensional pixel point according to the second depth value by taking the first depth value as a reference depth value, wherein the second three-dimensional pixel point is the three-dimensional pixel point corresponding to the second plane pixel point in the three-dimensional space;
determining the position of the second three-dimensional pixel point in the three-dimensional space according to the relative position and the position of the second plane pixel point on the target subject; and
and connecting each three-dimensional pixel point in the three-dimensional space.
6. The method according to any one of claims 1 to 5, wherein performing three-dimensional reconstruction on the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the step of acquiring the visible light map to acquire visible light maps at different acquisition angles until the three-dimensional model corresponding to the target subject is obtained, comprises:
determining a target type corresponding to the target subject, and acquiring an initial three-dimensional model of the same type according to the target type;
acquiring actual depth values corresponding to key plane pixel points on the target subject;
acquiring three-dimensional model pixel points matched with the key plane pixel points from the initial three-dimensional model, and adjusting the three-dimensional space positions of the matched three-dimensional model pixel points according to the actual depth value proportions among the key plane pixel points;
acquiring actual depth values corresponding to non-key plane pixel points on the target subject;
acquiring three-dimensional model pixel points matched with the non-key plane pixel points from the initial three-dimensional model;
adjusting the three-dimensional space positions of the three-dimensional model pixel points matched with the non-key plane pixel points according to the actual depth value proportions between the non-key plane pixel points and the key plane pixel points, until every plane pixel point on the target subject has a matched and adjusted three-dimensional model pixel point; and
forming the three-dimensional model corresponding to the target subject from the adjusted three-dimensional model pixel points.
7. The method of any one of claims 1 to 5, wherein the visible light map is acquired in real-time, the method further comprising:
displaying, on a preview interface and in real time, the construction process of the three-dimensional model corresponding to the target subject according to the visible light maps acquired in real time.
8. An apparatus for constructing a three-dimensional model, the apparatus comprising:
a processing module, used for acquiring a visible light map and generating a center weight map corresponding to the visible light map, wherein the center weight map is used for recording a weight value of each pixel point in the visible light map, and the weight values represented by the center weight map decrease gradually from the center to the edge;
a detection module, used for inputting the visible light map and the center weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model obtained by training in advance according to a visible light map, a center weight map and a corresponding labeled subject mask map of a same scene;
a target subject determination module, used for determining a target subject in the visible light map according to the subject region confidence map; and
a three-dimensional model construction module, used for acquiring depth information corresponding to the target subject, performing three-dimensional reconstruction on the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the processing module to acquire visible light maps at different acquisition angles until a three-dimensional model corresponding to the target subject is obtained.
9. A terminal device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201910573384.9A 2019-06-28 2019-06-28 Method and device for constructing three-dimensional model, equipment and computer-readable storage medium Active CN110276831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910573384.9A CN110276831B (en) 2019-06-28 2019-06-28 Method and device for constructing three-dimensional model, equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN110276831A CN110276831A (en) 2019-09-24
CN110276831B true CN110276831B (en) 2022-03-18

Family

ID=67963722

Country Status (1)

Country Link
CN (1) CN110276831B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784621A (en) * 2019-10-22 2021-05-11 华为技术有限公司 Image display method and apparatus
CN110874851A (en) * 2019-10-25 2020-03-10 深圳奥比中光科技有限公司 Method, device, system and readable storage medium for reconstructing three-dimensional model of human body
CN111366916B (en) * 2020-02-17 2021-04-06 山东睿思奥图智能科技有限公司 Method and device for determining distance between interaction target and robot and electronic equipment
CN112465890A (en) * 2020-11-24 2021-03-09 深圳市商汤科技有限公司 Depth detection method and device, electronic equipment and computer readable storage medium
CN116045852B (en) * 2023-03-31 2023-06-20 板石智能科技(深圳)有限公司 Three-dimensional morphology model determining method and device and three-dimensional morphology measuring equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825544A (en) * 2015-11-25 2016-08-03 维沃移动通信有限公司 Image processing method and mobile terminal
US9430850B1 (en) * 2015-04-02 2016-08-30 Politechnika Poznanska System and method for object dimension estimation using 3D models
CN107507272A (en) * 2017-08-09 2017-12-22 广东欧珀移动通信有限公司 Establish the method, apparatus and terminal device of human 3d model
CN108764180A (en) * 2018-05-31 2018-11-06 Oppo广东移动通信有限公司 Face identification method, device, electronic equipment and readable storage medium storing program for executing
CN108805018A (en) * 2018-04-27 2018-11-13 淘然视界(杭州)科技有限公司 Road signs detection recognition method, electronic equipment, storage medium and system
CN109685853A (en) * 2018-11-30 2019-04-26 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN109712105A (en) * 2018-12-24 2019-05-03 浙江大学 A kind of image well-marked target detection method of combination colour and depth information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9373190B2 (en) * 2014-07-09 2016-06-21 Google Inc. High-quality stereo reconstruction featuring depth map alignment and outlier identification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Semantic Image Segmentation Based on Attentions to Intra Scales and Inner Channels; Hongchao Lu et al.; 2018 International Joint Conference on Neural Networks (IJCNN); 2018-12-31; full text *
Multi-scale Fusion Semantic Segmentation of Aerial Images Based on an Attention Mechanism; Zheng Guping et al.; Journal of Graphics; 2018-12-31; pp. 1069-1077 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant