WO2022143366A1 - Image processing method and apparatus, electronic device, medium, and computer program product - Google Patents


Info

Publication number
WO2022143366A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
map
segmentation
predicted
target
Prior art date
Application number
PCT/CN2021/140683
Other languages
French (fr)
Chinese (zh)
Inventor
周芳汝
杨玫
安山
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2022143366A1 publication Critical patent/WO2022143366A1/en


Classifications

    • G: PHYSICS; G06: COMPUTING; CALCULATING OR COUNTING
    • G06T 7/00: Image analysis; G06T 7/10: Segmentation; Edge detection; G06T 7/11: Region-based segmentation
    • G06N 3/00: Computing arrangements based on biological models; G06N 3/02: Neural networks; G06N 3/04: Architecture, e.g. interconnection topology; G06N 3/045: Combinations of networks
    • G06N 3/08: Neural networks; Learning methods
    • G06T 7/70: Determining position or orientation of objects or cameras; G06T 7/73: using feature-based methods
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement; G06T 2207/10: Image acquisition modality; G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G06T 2207/20: Special algorithmic details; G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • G06T 2207/30: Subject of image; Context of image processing; G06T 2207/30244: Camera pose

Definitions

  • the embodiments of the present disclosure relate to the field of computer technology, and more particularly, to an image processing method, apparatus, electronic device, medium, and computer program product.
  • in the field of computer vision, depth estimation is part of 3D reconstruction and requires estimating depth information from 2D images.
  • for some specific tasks, such as a monocular robot avoiding or searching for a target object (for example, a person), segmenting the target object from the two-dimensional image and estimating its depth are extremely important.
  • embodiments of the present disclosure provide an image processing method, apparatus, electronic device, medium, and computer program product.
  • An aspect of the embodiments of the present disclosure provides an image processing method, including: acquiring a target image, wherein the target image includes a target object and a non-target object; performing image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; determining the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object; and processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
  • Another aspect of the embodiments of the present disclosure provides an image processing apparatus, including: an acquisition module for acquiring a target image, wherein the target image includes a target object and a non-target object; a first processing module for performing image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; a determining module for determining the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object; and a second processing module for processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
  • Another aspect of the embodiments of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above method.
  • Another aspect of the embodiments of the present disclosure provides a computer-readable storage medium having executable instructions stored thereon, the instructions, when executed by a processor, cause the processor to implement the above method.
  • Another aspect of the embodiments of the present disclosure provides a computer program product, where the computer program product includes a computer program, and the computer program is used to implement the above method when executed by a processor.
  • according to the embodiments of the present disclosure, the target image includes a target object and a non-target object; image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively.
  • the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object, and the predicted depth map is processed according to the position of the target object in the predicted depth map of the target image to obtain the predicted depth map of the target object.
  • since the position of the target object in the predicted depth map can be obtained from the predicted segmentation map, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position, the technical problem in the related art that depth estimation for the target object in the target image is difficult to realize is at least partially overcome; the depth of the target object in the target image is determined more accurately, and the method generalizes well.
  • FIG. 1 schematically shows an exemplary system architecture to which the image processing method and apparatus according to the embodiments of the present disclosure can be applied;
  • FIG. 2 schematically shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • FIG. 3 schematically shows a structural diagram of an image processing model according to an embodiment of the present disclosure
  • FIG. 4 schematically shows a flowchart of another image processing method according to an embodiment of the present disclosure
  • FIG. 5 schematically shows a schematic diagram of a target image according to an embodiment of the present disclosure
  • FIG. 6 schematically shows a predicted depth map of a target image according to an embodiment of the present disclosure
  • FIG. 7 schematically shows a predicted segmentation map of a target image according to an embodiment of the present disclosure
  • FIG. 8 schematically shows a predicted depth map of a target object according to an embodiment of the present disclosure
  • FIG. 9 schematically shows a predicted depth map of another target object according to an embodiment of the present disclosure.
  • FIG. 10 schematically shows a schematic diagram of still another target object according to an embodiment of the present disclosure.
  • FIG. 11 schematically shows a flowchart of still another image processing method according to an embodiment of the present disclosure
  • FIG. 12 schematically shows a flowchart of still another image processing method according to an embodiment of the present disclosure
  • FIG. 13 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
  • FIG. 14 schematically shows a block diagram of an electronic device suitable for an image processing method according to an embodiment of the present disclosure.
  • a phrase such as "at least one of A, B, and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, or A, B, and C together.
  • Embodiments of the present disclosure provide an image processing method, an image processing apparatus, and an electronic device applying the method.
  • the method includes acquiring a target image, wherein the target image includes a target object and a non-target object, and performing image segmentation processing and depth estimation processing on the target image to obtain the predicted segmentation map and the predicted depth map of the target image, respectively.
  • the location of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target image.
  • the predicted depth map is processed according to the position of the target object in the predicted depth map of the target image to obtain the predicted depth map of the target object.
  • FIG. 1 schematically shows an exemplary system architecture 100 to which an image processing method or apparatus may be applied according to an embodiment of the present disclosure.
  • FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure can be applied, so as to help those skilled in the art understand the technical content of the present disclosure, but it does not mean that the embodiments of the present disclosure cannot be applied to other devices, systems, environments, or scenarios.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as image processing applications, model building applications, search applications, instant messaging tools, email clients, and/or social platform software (examples only).
  • the terminal devices 101, 102, and 103 may be various electronic devices having a display screen and supporting image processing and web browsing, including but not limited to smart phones, tablet computers, laptop computers, and desktop computers.
  • the server 105 may be a server that provides various services, such as a background management server (merely an example) that provides support for websites browsed and images processed by users with the terminal devices 101, 102, and 103.
  • the background management server can process, analyze and save the received images, camera information and other data, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal device.
  • the image processing method provided by the embodiment of the present disclosure may generally be executed by the server 105 .
  • the image processing apparatus provided by the embodiments of the present disclosure may generally be provided in the server 105 .
  • the image processing method provided by the embodiment of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101 , 102 , 103 and/or the server 105 .
  • the image processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and can communicate with the terminal devices 101 , 102 , 103 and/or the server 105 .
  • the image processing method provided by the embodiments of the present disclosure may also be executed by the terminal device 101, 102, or 103, or by another terminal device different from the terminal devices 101, 102, and 103.
  • the image processing apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101 , 102 or 103 , or in other terminal devices different from the terminal device 101 , 102 or 103 .
  • the target image may be originally stored in any one of the terminal devices 101, 102, or 103 (e.g., the terminal device 101, but not limited thereto), or stored on an external storage device and imported into the terminal device 101. Then, the terminal device 101 may locally execute the image processing method provided by the embodiments of the present disclosure, or send the target image to another terminal device, server, or server cluster, which then executes the image processing method provided by the embodiments of the present disclosure.
  • terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • FIG. 2 schematically shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • the method includes operations S210-S240.
  • a target image is acquired, wherein the target image includes a target object and a non-target object.
  • the target image may be a monocular image
  • the target image may include a target object and a non-target object, wherein the target object may be a person in the target image, and the non-target object may be a background object in the target image, such as a desk, a tree, or a car. However, this is not limiting: any object can be designated as the target object according to actual needs, and objects of categories other than that of the designated target object are non-target objects.
  • the number of target objects can include one or more.
  • image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively.
  • performing depth estimation processing on the target image may be performing depth estimation on each pixel point on the target image according to the depth relationship reflected by the pixel value relationship.
  • the image segmentation may be a method of semantic segmentation, but is not limited thereto, and may also be a method of instance segmentation.
  • semantic segmentation classifies each pixel in the image and obtains a semantic segmentation mask corresponding to the image size, that is, the predicted semantic segmentation map of the image; however, semantic segmentation does not distinguish different objects of the same category, that is, it does not distinguish instances. Instance segmentation not only classifies pixels by category but also distinguishes different objects of the same category, that is, it distinguishes instances.
  • through instance segmentation, an instance segmentation mask corresponding to the image size can be obtained, that is, the predicted instance segmentation map of the image.
  • the predicted semantic segmentation map of the image and the predicted instance segmentation map of the image may be collectively referred to as the predicted segmentation map of the image.
  • different categories may be represented by different colors, and correspondingly, different colors in the predicted segmentation map of the image represent different categories.
  • semantic segmentation or instance segmentation can be used to perform image segmentation processing on the target image to obtain a predicted segmentation map of the target image.
  • the size of the predicted segmentation map of the target image may be determined according to the actual situation, which is not specifically limited here.
  • the predicted segmentation map size of the target image is the same as the size of the target image.
  • the predicted segmentation map size of the target image is one-half the size of the target image.
  • using the semantic segmentation method to perform image segmentation processing can achieve real-time speed, so as to meet the processing requirements of real-time tasks.
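  • As an illustration only (not the network of the present disclosure), the following Python sketch shows how a predicted semantic segmentation map can be obtained from a target image with an off-the-shelf model; it assumes torchvision >= 0.13 and a hypothetical input file name:

```python
# A minimal sketch of "image segmentation processing": per-pixel class
# prediction with a pretrained DeepLabV3 model. This stands in for the
# disclosure's own segmentation network, which is described later (FIG. 3).
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

model = deeplabv3_resnet50(weights="DEFAULT").eval()  # 21 Pascal VOC classes

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("target_image.jpg").convert("RGB")  # hypothetical file
x = preprocess(image).unsqueeze(0)                     # [1, 3, H, W]

with torch.no_grad():
    logits = model(x)["out"]                           # [1, 21, H, W]

# The predicted segmentation map: one class index per pixel, with the same
# height and width as the target image (cf. the size discussion above).
predicted_segmentation_map = logits.argmax(dim=1).squeeze(0)  # [H, W]
```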
  • the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object.
  • the position of the target object in the predicted segmentation map of the target image can be determined from the predicted segmentation map of the target image.
  • the predicted depth map and the predicted segmentation map of the target image correspond position by position. Therefore, the position of the target object in the predicted depth map of the target image can be obtained based on the position of the target object in the predicted segmentation map of the target image.
  • the predicted depth map is processed according to the position of the target object in the predicted depth map of the target image to obtain the predicted depth map of the target object.
  • in the related art, depth estimation can be performed on a monocular image by combining gradient and texture features; that is, the gradient information and texture information of the target image are used as depth cues to assist a deep convolutional network in learning the depth information of the target image, obtaining the predicted depth map of the target image.
  • the target image can also be fed into a depth estimation network based on camera pose estimation to obtain a predicted depth map of the target image.
  • the above methods can only perform depth estimation on the entire target image, and it is difficult to extract the depth of the target object in the target image.
  • a sample-based learning method can also be used to realize depth estimation.
  • the sample-based learning method is to construct a data set, convert the problem of depth estimation of the target object into a retrieval problem, and use the method of feature matching to retrieve the data set to obtain the depth estimation result of the target object in the target image.
  • the sample-based learning method can estimate the depth of the target object in the target image that has a matching relationship with the image in the dataset. But if an image matching the target image cannot be retrieved in the dataset, depth estimation for the target object in the target image cannot be achieved.
  • the above method can estimate the depth of the target object in the target image, but cannot obtain the position of the target object in the target image. This method has poor generalization and low estimation accuracy.
  • in the embodiments of the present disclosure, by contrast, the position of the target object in the predicted depth map of the target image can be obtained according to the predicted segmentation map of the target image, and the predicted depth map of the target object can be obtained by processing the predicted depth map at the position of the target object, so the depth of the target object in the target image can be determined more accurately.
  • since the depth estimation is not implemented by means of retrieval and matching, it is less affected by the samples; therefore, the solution of the embodiments of the present disclosure has strong generalization.
  • the methods provided in the above operations S210 to S240 may be used to obtain the predicted depth map of the target object.
  • when there are multiple target objects, their predicted depth maps may be presented on one predicted depth map, or the predicted depth map of each target object may be presented separately.
  • according to the embodiments of the present disclosure, the target image includes a target object and a non-target object; image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object; and the predicted depth map is processed according to that position to obtain the predicted depth map of the target object.
  • since the position of the target object in the predicted depth map can be obtained from the predicted segmentation map, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position, the technical problem in the related art that depth estimation for the target object in the target image is difficult to realize is at least partially overcome; the depth of the target object in the target image is determined more accurately, and the method generalizes well.
  • The method shown in FIG. 2 will be further described below with reference to FIGS. 3 to 10 in conjunction with specific embodiments.
  • FIG. 3 schematically shows a structural diagram of an image processing model according to an embodiment of the present disclosure.
  • the image processing model includes a feature extraction network, an image segmentation network and a depth estimation network.
  • the present disclosure constructs an image processing model, which is obtained by training an encoding-decoding network with training samples; that is, the target image to be predicted is input into the image processing model, the predicted segmentation map and the predicted depth map of the target image are output respectively, and the predicted depth map of the target object in the target image is then obtained.
  • the image processing model constructed in the present disclosure makes up for the technical deficiency in the related art that the predicted segmentation map and the predicted depth map of the target image cannot be output simultaneously.
  • FIG. 4 schematically shows a flowchart of another image processing method according to an embodiment of the present disclosure.
  • using the image processing model to process the target image and respectively obtaining the predicted segmentation map and the predicted depth map of the target image may include the following operations S410 to S460 .
  • a feature extraction network is used to process the target image to obtain a first intermediate feature map.
  • the image segmentation network is used to process the first intermediate feature map to obtain a second intermediate feature map.
  • the depth estimation network is used to process the first intermediate feature map to obtain a third intermediate feature map.
  • a fourth intermediate feature map is generated according to the second intermediate feature map and the third intermediate feature map.
  • a depth estimation network is used to process the fourth intermediate feature map to obtain a predicted depth map of the target image.
  • the image segmentation network is used to process the second intermediate feature map to obtain a predicted segmentation map of the target image.
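  • The dual-branch structure of operations S410 to S460 can be sketched as follows; this is a minimal PyTorch illustration, with a small convolutional stack standing in for the MobileNet+ASPP encoder and all channel counts chosen as assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageProcessingModel(nn.Module):
    """Sketch of FIG. 3: a shared encoder feeding a segmentation branch and
    a depth branch, with the segmentation feature map fused into the depth
    branch. Layer sizes are illustrative, not the disclosure's actual network."""

    def __init__(self, num_classes: int = 21):
        super().__init__()
        # Feature extraction network (encoder): target image -> f4
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.seg_branch = nn.Conv2d(128, 64, 3, padding=1)    # f4 -> f5
        self.seg_head = nn.Conv2d(64, num_classes, 1)
        self.depth_branch = nn.Conv2d(128, 64, 3, padding=1)  # f4 -> f6
        self.fuse = nn.Conv2d(64 + 64, 64, 3, padding=1)      # (f5, f6) -> f7
        self.depth_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f4 = self.encoder(x)                # S410: first intermediate feature map
        f5 = F.relu(self.seg_branch(f4))    # S420: second intermediate feature map
        f6 = F.relu(self.depth_branch(f4))  # S430: third intermediate feature map
        f7 = F.relu(self.fuse(torch.cat([f5, f6], dim=1)))  # S440: fourth map
        depth = self.depth_head(f7)         # S450: predicted depth map
        seg = self.seg_head(f5)             # S460: predicted segmentation map
        # Convolution + upsampling back to the input resolution
        depth = F.interpolate(depth, (h, w), mode="bilinear", align_corners=False)
        seg = F.interpolate(seg, (h, w), mode="bilinear", align_corners=False)
        return seg, depth
```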
  • for example, MobileNet combined with an ASPP (Atrous Spatial Pyramid Pooling) module may be used as the feature extraction network, that is, the encoding network.
  • the height and width of the target image can be recorded as H and W, respectively.
  • the output of the intermediate layer is a feature map f1.
  • the feature map f2 can be input into the ASPP module, and the obtained output is fused with the feature map f2 to output the feature map f3.
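  • An ASPP module, as commonly defined, applies parallel atrous convolutions at several dilation rates together with image-level pooling and fuses the results; the sketch below uses the conventional rates (6, 12, 18), which the text does not specify and which are therefore assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel atrous convolutions with
    different dilation rates plus global pooling, concatenated and fused
    by a 1x1 convolution. Rates and channel counts are assumed values."""

    def __init__(self, in_ch: int, out_ch: int = 256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 1),                         # 1x1 branch
            nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6),
            nn.Conv2d(in_ch, out_ch, 3, padding=12, dilation=12),
            nn.Conv2d(in_ch, out_ch, 3, padding=18, dilation=18),
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)     # image-level context
        self.pool_conv = nn.Conv2d(in_ch, out_ch, 1)
        self.project = nn.Conv2d(out_ch * 5, out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [F.relu(b(x)) for b in self.branches]
        pooled = F.relu(self.pool_conv(self.pool(x)))
        feats.append(F.interpolate(pooled, (h, w), mode="bilinear",
                                   align_corners=False))
        return F.relu(self.project(torch.cat(feats, dim=1)))
```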
  • both the image segmentation network and the depth estimation network take the same feature map f4 (i.e., the first intermediate feature map) as input, and respectively output the second intermediate feature map (i.e., feature map f5) and the third intermediate feature map (i.e., feature map f6). It should be noted that, since the first intermediate feature map processed by the feature extraction network is used as the input of both the depth estimation network and the image segmentation network, the feature extraction network comprehensively considers the detailed information and the abstract information of the target image.
  • the second intermediate feature map (i.e., feature map f5) output by the image segmentation network can be input into the depth estimation network and combined with the third intermediate feature map (i.e., feature map f6) to obtain the fourth intermediate feature map (i.e., feature map f7).
  • the fourth intermediate feature map is processed by convolution and upsampling to obtain the predicted depth map of the target image.
  • the second intermediate feature map is processed by convolution and upsampling to obtain the predicted segmentation map of the target image.
  • in this way, the prediction result of the depth estimation network is corrected, so that the depth estimation result is more accurate.
  • processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain the predicted depth map of the target object may include the following operations.
  • the pixel values at positions in the predicted depth map of the target image other than the position of the target object are set to a preset pixel value to obtain the predicted depth map of the target object.
  • the preset pixel value may be set according to the actual situation, which is not specifically limited here, for example, the preset pixel value may be 0.
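  • A minimal NumPy sketch of this masking step, assuming class-indexed maps of equal size and 0 as the preset pixel value:

```python
import numpy as np

def extract_target_depth(predicted_depth: np.ndarray,
                         predicted_segmentation: np.ndarray,
                         target_class: int,
                         preset_value: float = 0.0) -> np.ndarray:
    """Keep depth values only where the segmentation map predicts the
    target class; every other position is set to the preset pixel value."""
    target_depth = predicted_depth.copy()
    target_depth[predicted_segmentation != target_class] = preset_value
    return target_depth

# Hypothetical usage, with class index 15 standing in for "person":
# person_depth = extract_target_depth(depth_map, seg_map, target_class=15)
```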
  • FIG. 5 schematically shows a schematic diagram of a target image according to an embodiment of the present disclosure.
  • FIG. 6 schematically illustrates a predicted depth map of a target image according to an embodiment of the present disclosure.
  • FIG. 7 schematically shows a predicted segmentation map of a target image according to an embodiment of the present disclosure.
  • FIG. 8 schematically shows a predicted depth map of a target object according to an embodiment of the present disclosure.
  • the target object in Figure 8 is a person.
  • FIG. 9 schematically shows a predicted depth map of another target object according to an embodiment of the present disclosure.
  • the target object in Figure 9 is a refrigerator.
  • FIG. 10 schematically shows a schematic diagram of still another target object according to an embodiment of the present disclosure.
  • FIG. 10 includes multiple target objects.
  • the predicted depth map of the target object is obtained by applying the predicted segmentation map of the target image to the predicted depth map of the target image.
  • black represents the non-target area.
  • an image processing model is used to process the target image to obtain the predicted segmentation map and the predicted depth map of the target image, respectively, wherein the image processing model is trained using training samples, and the training samples include sample images and the depth labels and segmentation labels of the sample images.
  • the image processing model is obtained by training using training samples, and may include the following operations.
  • training samples are obtained, and a fully convolutional neural network model is trained with the training samples to obtain the image processing model.
  • the fully convolutional neural network model includes an initial feature extraction network, an initial image segmentation network, and an initial depth estimation network.
  • using training samples to train a fully convolutional neural network model to obtain an image processing model may include the following operations.
  • the sample image is processed using the initial feature extraction network to obtain a fifth intermediate feature map.
  • the fifth intermediate feature map is processed using the initial image segmentation network to obtain a sixth intermediate feature map.
  • the fifth intermediate feature map is processed using the initial depth estimation network to obtain the seventh intermediate feature map.
  • an eighth intermediate feature map is generated according to the sixth intermediate feature map and the seventh intermediate feature map.
  • the eighth intermediate feature map is processed using the initial depth estimation network to obtain the predicted depth map of the sample image.
  • the sixth intermediate feature map is processed using the initial image segmentation network to obtain a predicted segmentation map of the sample image. The depth label, the predicted depth map, the segmentation label, and the predicted segmentation map of the sample image are input into the loss function of the fully convolutional neural network model, and the loss result is output. The network parameters of the fully convolutional neural network model are adjusted according to the loss result until the loss function converges, and the trained fully convolutional neural network model is used as the image processing model.
  • the fifth intermediate feature map can be understood as the feature map f4 in FIG. 3.
  • the sixth intermediate feature map can be understood as the feature map f5 in FIG. 3, and the seventh intermediate feature map can be understood as the feature map f6 in FIG. 3.
  • the eighth intermediate feature map can be understood as the feature map f7 in FIG. 3.
  • after training, the initial feature extraction network, the initial image segmentation network, and the initial depth estimation network are respectively referred to as the feature extraction network, the image segmentation network, and the depth estimation network.
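  • A minimal sketch of one training step follows, reusing the ImageProcessingModel sketch given earlier; the loss functions (L1 for depth, cross-entropy for segmentation) and their equal weighting are assumptions, since the text only states which quantities enter the loss function:

```python
import torch
import torch.nn as nn

model = ImageProcessingModel(num_classes=21)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
depth_criterion = nn.L1Loss()                          # assumed depth loss
seg_criterion = nn.CrossEntropyLoss(ignore_index=-1)   # -1 = unmapped category

def train_step(sample_image, depth_label, segmentation_label):
    """One optimization step: forward pass, joint loss, parameter update."""
    pred_seg, pred_depth = model(sample_image)          # [N,C,H,W], [N,1,H,W]
    loss = (depth_criterion(pred_depth.squeeze(1), depth_label)
            + seg_criterion(pred_seg, segmentation_label))
    optimizer.zero_grad()
    loss.backward()       # adjust network parameters according to the loss
    optimizer.step()
    return loss.item()    # training continues until the loss converges
```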
  • FIG. 11 schematically shows a flowchart of still another image processing method according to an embodiment of the present disclosure.
  • performing image segmentation processing on the sample image to obtain the segmentation label of the sample image may include the following operations S1110 to S1130.
  • an instance segmentation process is performed on the sample image to obtain an instance segmentation label of the sample image.
  • the semantic segmentation label of the sample image is obtained according to the instance segmentation label of the sample image.
  • the semantic segmentation label of the sample image is used as the segmentation label of the sample image.
  • in instance segmentation, an image may include multiple instances belonging to the same category, and these instances need to be distinguished.
  • for example, the target image may include multiple objects belonging to the category of people, that is, it includes multiple people.
  • through instance segmentation, these multiple people are distinguished, and each person gets a corresponding instance segmentation label.
  • semantic segmentation is to classify each pixel in the image, but does not distinguish instances.
  • the target image may include multiple people belonging to the category of people, that is, including multiple people.
  • in semantic segmentation, there is no need to distinguish these multiple people, and all of them get the same semantic segmentation label.
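  • Deriving a semantic segmentation label from instance segmentation output can be sketched as follows, assuming per-instance binary masks each tagged with a category id:

```python
import numpy as np

def instances_to_semantic(masks, category_ids, height, width):
    """Merge per-instance binary masks into one semantic label map: all
    instances of the same category receive the same label, so instance
    identity is deliberately discarded (operation S1120)."""
    semantic = np.full((height, width), -1, dtype=np.int64)  # -1 = unlabeled
    for mask, category in zip(masks, category_ids):
        semantic[mask.astype(bool)] = category
    return semantic

# Hypothetical usage: two "person" instances sharing the same category id
# both map to the same label in the resulting semantic segmentation label.
```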
  • for example, a Mask_RCNN (Region-based Convolutional Neural Network) can be used to perform instance segmentation on sample image databases with depth labels, such as CAD_60, CAD_120, and EPFL, to obtain the instance segmentation labels.
  • semantic segmentation labels can also be used alone. Since the embodiment of the present disclosure adopts the sample image database for depth estimation, the depth label of the sample image can be obtained.
  • Mask_RCNN can detect and segment objects of 82 categories (including the background category). In practical applications, there may be fewer than 82 categories in the sample image database used for depth estimation; directly using all 82 categories as segmentation labels would expand the segmentation range and increase the error probability of the segmentation processing.
  • of these categories, 59 appear in the sample image database used; these are retained in order to construct dense segmentation labels.
  • as shown in Table 1, the categories of Mask_RCNN can be mapped, and the categories that are not involved are marked as -1, so as to reduce the error probability and improve the segmentation effect and accuracy of the image segmentation processing.
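  • The remapping can be sketched as follows; the actual Table 1 is not reproduced in this text, so the two-entry dictionary below is a placeholder assumption:

```python
import numpy as np

# Placeholder for Table 1 (82 Mask_RCNN categories -> the 59 categories
# present in the depth-labeled database); the entries are illustrative only.
CATEGORY_MAP = {1: 0, 63: 1}  # hypothetical: e.g. person -> 0, chair -> 1

def remap_labels(label_map: np.ndarray) -> np.ndarray:
    """Map Mask_RCNN category ids onto the reduced label set; categories
    that are not involved in the mapping are marked as -1."""
    remapped = np.full_like(label_map, -1)
    for src, dst in CATEGORY_MAP.items():
        remapped[label_map == src] = dst
    return remapped
```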
  • the result of instance segmentation labels obtained by using the Mask_RCNN network has high accuracy, which is beneficial to the construction of an image processing model.
  • semantic segmentation can also be directly performed on a sample image database with depth labels to obtain the semantic segmentation labels of the sample images.
  • image segmentation is performed on the sample images to obtain the segmentation labels of the sample images, so that the sample images have both depth labels and segmentation labels.
  • the segmentation label of the known sample image can also be used to perform depth estimation on the sample image to obtain the depth label of the sample image, so that the sample image has both the depth label and the segmentation label.
  • the accuracy of the predicted depth map of the target object in the target image obtained by the first method is higher.
  • FIG. 12 schematically shows a flowchart of yet another image processing method according to an embodiment of the present disclosure.
  • referring to FIG. 12, the operations of the image processing method may include the following.
  • the input image is normalized to obtain an RGB image, which is used as the input of the image processing model to obtain the predicted depth map and predicted segmentation map of the target image, respectively.
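  • The inference flow of FIG. 12 can be sketched as below; scaling to [0, 1] is an assumed normalization scheme, since the exact constants are not given in this text:

```python
import numpy as np
import torch

def preprocess(image_bgr: np.ndarray) -> torch.Tensor:
    """Normalize the input image into the RGB tensor fed to the model."""
    image_rgb = image_bgr[..., ::-1].astype(np.float32) / 255.0  # BGR -> RGB
    return torch.from_numpy(image_rgb).permute(2, 0, 1).unsqueeze(0)

# End-to-end flow, reusing the earlier sketches (names are assumptions):
# x = preprocess(frame)                      # frame: H x W x 3 uint8 array
# seg_logits, depth = model(x)               # ImageProcessingModel instance
# seg_map = seg_logits.argmax(dim=1)         # predicted segmentation map
# target_depth = extract_target_depth(depth.squeeze().numpy(),
#                                     seg_map.squeeze().numpy(),
#                                     target_class=15)
```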
  • the image processing method can be applied to an application scenario in which a monocular robot finds or avoids a specific object, wherein the specific object is the target object.
  • FIG. 13 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
  • the image processing apparatus 1300 includes an acquisition module 1310 , a first processing module 1320 , a determination module 1330 and a second processing module 1340 .
  • the acquiring module 1310 , the first processing module 1320 , the determining module 1330 and the second processing module 1340 are connected in communication.
  • the acquiring module 1310 is configured to acquire a target image, wherein the target image includes a target object and a non-target object.
  • the first processing module 1320 is configured to perform image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively.
  • the determining module 1330 is configured to determine the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object.
  • the second processing module 1340 is configured to process the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain the predicted depth map of the target object.
  • according to the embodiments of the present disclosure, the target image includes a target object and a non-target object; image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object; and the predicted depth map is processed according to that position to obtain the predicted depth map of the target object.
  • since the position of the target object in the predicted depth map can be obtained from the predicted segmentation map, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position, the technical problem in the related art that depth estimation for the target object in the target image is difficult to realize is at least partially overcome; the depth of the target object in the target image is determined more accurately, and the method generalizes well.
  • the first processing module 1320 includes a first processing unit.
  • the first processing unit is used to process the target image with an image processing model to obtain the predicted segmentation map and the predicted depth map, respectively, wherein the image processing model is trained using training samples, and the training samples include sample images and the depth labels and segmentation labels of the sample images.
  • the image processing model includes a feature extraction network, an image segmentation network, and a depth estimation network.
  • the first processing unit includes a first processing subunit, a second processing subunit, a third processing subunit, a fourth processing subunit, a fifth processing subunit, and a sixth processing subunit.
  • the first processing subunit is used to process the target image by using the feature extraction network to obtain the first intermediate feature map.
  • the second processing subunit is used to process the first intermediate feature map by using the image segmentation network to obtain the second intermediate feature map.
  • the third processing subunit is used to process the first intermediate feature map by using the depth estimation network to obtain a third intermediate feature map.
  • the fourth processing subunit is configured to generate a fourth intermediate feature map according to the second intermediate feature map and the third intermediate feature map.
  • the fifth processing subunit is used for processing the fourth intermediate feature map by using the depth estimation network to obtain the predicted depth map of the target image.
  • the sixth processing subunit is used to process the second intermediate feature map by using the image segmentation network to obtain the predicted segmentation map of the target image.
  • the image processing model is obtained by training using training samples, and may include the following operations.
  • training samples are obtained, and a fully convolutional neural network model is trained with the training samples to obtain the image processing model.
  • the fully convolutional neural network model includes an initial feature extraction network, an initial image segmentation network, and an initial depth estimation network.
  • Using the training samples to train a fully convolutional neural network model to obtain an image processing model may include the following operations.
  • the sample image is processed using the initial feature extraction network to obtain a fifth intermediate feature map.
  • the fifth intermediate feature map is processed using the initial image segmentation network to obtain a sixth intermediate feature map.
  • the fifth intermediate feature map is processed using the initial depth estimation network to obtain a seventh intermediate feature map.
  • an eighth intermediate feature map is generated according to the sixth intermediate feature map and the seventh intermediate feature map.
  • the eighth intermediate feature map is processed using the initial depth estimation network to obtain the predicted depth map of the sample image.
  • the sixth intermediate feature map is processed using the initial image segmentation network to obtain a predicted segmentation map of the sample image. The depth label, the predicted depth map, the segmentation label, and the predicted segmentation map of the sample image are input into the loss function of the fully convolutional neural network model to obtain the loss result. The network parameters of the fully convolutional neural network model are adjusted according to the loss result until the loss function converges, and the trained fully convolutional neural network model is used as the image processing model.
  • performing image segmentation processing on a sample image to obtain a segmentation label of the sample image may include the following operations.
  • Instance segmentation is performed on the sample image to obtain the instance segmentation label of the sample image.
  • the semantic segmentation label of the sample image is obtained according to the instance segmentation label of the sample image.
  • the semantic segmentation label of the sample image is used as the segmentation label of the sample image.
  • the second processing module 1340 includes a second processing unit.
  • the second processing unit is configured to set the pixel values at positions in the predicted depth map of the target image other than the position of the target object to a preset pixel value to obtain the predicted depth map of the target object.
  • image segmentation processing includes semantic segmentation processing or instance segmentation processing.
  • any of the modules or units according to the embodiments of the present disclosure, or at least part of the functions of any of them, may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be divided into multiple modules for implementation. Any one or more of the modules and units according to the embodiments of the present disclosure may be at least partially implemented as hardware circuits, such as Field Programmable Gate Arrays (FPGA), Programmable Logic Arrays (PLA), systems-on-chip, systems-on-substrate, systems-in-package, Application Specific Integrated Circuits (ASIC), or any other reasonable means of hardware or firmware that can integrate or package a circuit, or may be implemented in any one of the three implementation manners of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, one or more of the modules and units according to the embodiments of the present disclosure may be implemented at least in part as computer program modules, which, when executed, may perform corresponding functions.
  • any one of the acquisition module 1310, the first processing module 1320, the determination module 1330, and the second processing module 1340 may be combined in one module/unit for implementation, or any one of the modules/units may be split into multiple modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units and implemented in one module/unit.
  • At least one of the acquisition module 1310, the first processing module 1320, the determination module 1330, and the second processing module 1340 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), Programmable logic array (PLA), system-on-chip, system-on-substrate, system-on-package, application-specific integrated circuit (ASIC), or hardware or firmware that can be implemented by any other reasonable means of integrating or packaging circuits, Or it can be implemented in any one of the three implementation manners of software, hardware and firmware, or in an appropriate combination of any of them.
  • at least one of the acquisition module 1310, the first processing module 1320, the determination module 1330, and the second processing module 1340 may be implemented at least partially as a computer program module, which, when executed, may perform corresponding functions .
  • image processing apparatus part in the embodiment of the present disclosure corresponds to the image processing method part in the embodiment of the present disclosure, and the description of the image processing apparatus part refers to the image processing method part, which is not repeated here.
  • FIG. 14 schematically shows a block diagram of an electronic device suitable for implementing the method described above, according to an embodiment of the present disclosure.
  • the electronic device shown in FIG. 14 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • an electronic device 1400 includes a processor 1401, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1402 or a program loaded from a storage section 1408 into a random access memory (RAM) 1403.
  • the processor 1401 may include, for example, a general-purpose microprocessor (eg, a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (eg, an application-specific integrated circuit (ASIC)), among others.
  • the processor 1401 may also include on-board memory for caching purposes.
  • the processor 1401 may include a single processing unit or multiple processing units for performing different actions of the method flow according to the embodiments of the present disclosure.
  • the processor 1401, the ROM 1402, and the RAM 1403 are connected to each other through a bus 1404.
  • the processor 1401 performs various operations of the method flow according to an embodiment of the present disclosure by executing programs in the ROM 1402 and/or the RAM 1403. Note that the program may also be stored in one or more memories other than ROM 1402 and RAM 1403.
  • the processor 1401 may also perform various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
  • the electronic device 1400 may also include an input/output (I/O) interface 1405 that is also connected to the bus 1404 .
  • the electronic device 1400 may also include one or more of the following components connected to the I/O interface 1405: an input section 1406 including a keyboard, a mouse, and the like; an output section 1407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 1408 including a hard disk and the like; and a communication section 1409 including a network interface card such as a LAN card, a modem, and the like. The communication section 1409 performs communication processing via a network such as the Internet.
  • a drive 1410 is also connected to the I/O interface 1405 as needed.
  • a removable medium 1411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 1410 as needed so that a computer program read therefrom is installed into the storage section 1408 as needed.
  • the method flow according to an embodiment of the present disclosure may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication portion 1409, and/or installed from the removable medium 1411.
  • the above-described functions defined in the system of the embodiment of the present disclosure are performed.
  • the above-described systems, apparatuses, apparatuses, modules, units, etc. can be implemented by computer program modules.
  • the present disclosure also provides a computer-readable storage medium.
  • the computer-readable storage medium may be included in the device/apparatus/system described in the above embodiments, or it may exist alone without being assembled into the device/apparatus/system.
  • the above-mentioned computer-readable storage medium carries one or more programs, and when the one or more programs are executed, the method according to the embodiments of the present disclosure is implemented.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable storage medium may include one or more memories other than the ROM 1402 and/or RAM 1403 described above.
  • according to the embodiments of the present disclosure, the target image includes a target object and a non-target object; image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object; and the predicted depth map is processed according to that position to obtain the predicted depth map of the target object.
  • since the position of the target object in the predicted depth map can be obtained from the predicted segmentation map, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position, the technical problem in the related art that depth estimation for the target object in the target image is difficult to realize is at least partially overcome; the depth of the target object in the target image is determined more accurately, and the method generalizes well.
  • the embodiments of the present disclosure also include a computer program product, which includes a computer program, the computer program includes program codes for executing the methods provided by the embodiments of the present disclosure, and when the computer program product runs on an electronic device, the program The code is used to enable the electronic device to implement the image processing method provided by the embodiments of the present disclosure.
  • the computer program may rely on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like.
  • the computer program may also be transmitted, distributed in the form of a signal over a network medium, and downloaded and installed through the communication portion 1409, and/or installed from a removable medium 1411.
  • the program code embodied by the computer program may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
  • the program code for implementing the computer program provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; specifically, high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages, may be used to implement these computing programs. Programming languages include, but are not limited to, Java, C++, Python, "C", or similar programming languages.
  • the program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
  • each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not expressly recited in the present disclosure.
  • In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined without departing from the spirit and teachings of the present disclosure, and all such combinations fall within the scope of the present disclosure.

Abstract

Embodiments of the present disclosure provide an image processing method and apparatus, an electronic device, a medium, and a computer program product. The method comprises: acquiring a target image, the target image comprising a target object and a non-target object; performing image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; determining the position of the target object in the predicted depth map of the target image according to a predicted segmentation map of the target object; and processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.

Description

Image processing method, apparatus, electronic device, medium and computer program product
This application claims priority to Chinese Patent Application No. 202110002321.5, filed on January 4, 2021, the contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and more particularly, to an image processing method, apparatus, electronic device, medium, and computer program product.
Background
In the field of computer vision, depth estimation is part of three-dimensional reconstruction and requires estimating depth information from a two-dimensional image. For certain specific tasks, such as a monocular robot avoiding or finding a target object (for example, a person), segmenting the target object from the two-dimensional image and estimating the depth of the target object are extremely important.
In the process of realizing the concept of the present disclosure, the inventors found that the related art has at least the following problem: it is difficult to achieve depth estimation for a target object in a target image using the related art.
Summary of the Invention
In view of this, embodiments of the present disclosure provide an image processing method, apparatus, electronic device, medium, and computer program product.
An aspect of the embodiments of the present disclosure provides an image processing method, including: acquiring a target image, where the target image includes a target object and a non-target object; performing image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; determining the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object; and processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
Another aspect of the embodiments of the present disclosure provides an image processing apparatus, including: an acquisition module for acquiring a target image, where the target image includes a target object and a non-target object; a first processing module for performing image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; a determination module for determining the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object; and a second processing module for processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
Another aspect of the embodiments of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Another aspect of the embodiments of the present disclosure provides a computer-readable storage medium having executable instructions stored thereon which, when executed by a processor, cause the processor to implement the method described above.
Another aspect of the embodiments of the present disclosure provides a computer program product, the computer program product including a computer program which, when executed by a processor, implements the method described above.
According to the embodiments of the present disclosure, a target image including a target object and a non-target object is acquired; image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object; and the predicted depth map is processed according to that position to obtain a predicted depth map of the target object. Because image segmentation and depth estimation are combined, the position of the target object in the predicted depth map can be obtained from the predicted segmentation map, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position. The technical problem in the related art that depth estimation for a target object in a target image is difficult to achieve is therefore at least partially overcome, the depth of the target object in the target image can be determined relatively accurately, and the method generalizes well.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically shows an exemplary system architecture to which the image processing method and apparatus according to the embodiments of the present disclosure can be applied;
FIG. 2 schematically shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a structural diagram of an image processing model according to an embodiment of the present disclosure;
FIG. 4 schematically shows a flowchart of another image processing method according to an embodiment of the present disclosure;
FIG. 5 schematically shows a schematic diagram of a target image according to an embodiment of the present disclosure;
FIG. 6 schematically shows a predicted depth map of a target image according to an embodiment of the present disclosure;
FIG. 7 schematically shows a predicted segmentation map of a target image according to an embodiment of the present disclosure;
FIG. 8 schematically shows a predicted depth map of a target object according to an embodiment of the present disclosure;
FIG. 9 schematically shows a predicted depth map of another target object according to an embodiment of the present disclosure;
FIG. 10 schematically shows a schematic diagram of still another target object according to an embodiment of the present disclosure;
FIG. 11 schematically shows a flowchart of still another image processing method according to an embodiment of the present disclosure;
FIG. 12 schematically shows a flowchart of yet another image processing method according to an embodiment of the present disclosure;
FIG. 13 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 14 schematically shows a block diagram of an electronic device suitable for the image processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, numerous specific details are set forth for ease of explanation in order to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The terms "comprising", "including", and the like used herein indicate the presence of the stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of this specification, and should not be construed in an idealized or overly rigid manner.
Where an expression similar to "at least one of A, B, and C, etc." is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C, etc.). Where an expression similar to "at least one of A, B, or C, etc." is used, it should likewise be interpreted according to the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, or C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C, etc.).
Embodiments of the present disclosure provide an image processing method, an image processing apparatus, and an electronic device applying the method. The method includes acquiring a target image, where the target image includes a target object and a non-target object; performing image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; determining the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object; and processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
FIG. 1 schematically shows an exemplary system architecture 100 to which the image processing method or apparatus according to the embodiments of the present disclosure can be applied. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.
As shown in FIG. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as image processing applications, model building applications, search applications, instant messaging tools, email clients, and/or social platform software (by way of example only).
The terminal devices 101, 102, and 103 may be various electronic devices having a display screen and supporting image processing and web browsing, including but not limited to smart phones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example, a background management server (by way of example only) that provides support for websites browsed and images processed by users using the terminal devices 101, 102, and 103. The background management server may process, analyze, and store received data such as images and camera information, and feed processing results (such as web pages, information, or data obtained or generated according to user requests) back to the terminal devices.
It should be noted that the image processing method provided by the embodiments of the present disclosure may generally be executed by the server 105. Correspondingly, the image processing apparatus provided by the embodiments of the present disclosure may generally be provided in the server 105. The image processing method provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the image processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the image processing method provided by the embodiments of the present disclosure may also be executed by the terminal device 101, 102, or 103, or by another terminal device different from the terminal devices 101, 102, and 103. Correspondingly, the image processing apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103, or in another terminal device different from the terminal devices 101, 102, and 103.
For example, the target image may be originally stored in any one of the terminal devices 101, 102, or 103 (for example, the terminal device 101, but not limited thereto), or stored on an external storage device and imported into the terminal device 101. The terminal device 101 may then execute the image processing method provided by the embodiments of the present disclosure locally, or send the target image to another terminal device, server, or server cluster, which then executes the image processing method provided by the embodiments of the present disclosure on the received target image.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
FIG. 2 schematically shows a flowchart of an image processing method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method includes operations S210 to S240.
In operation S210, a target image is acquired, where the target image includes a target object and a non-target object.
According to an embodiment of the present disclosure, the target image may be a monocular image and may include a target object and a non-target object. The target object may be a person in the target image, and the non-target objects may be background objects in the target image, such as tables, trees, and cars, but this is not limiting: any category may be designated as the target object according to actual needs, and anything of a category different from the designated target object is a non-target object. There may be one or more target objects.
In operation S220, image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively.
According to an embodiment of the present disclosure, performing depth estimation processing on the target image may mean estimating the depth of each pixel of the target image according to the depth relationships reflected by the relationships between pixel values.
According to an embodiment of the present disclosure, the image segmentation may be semantic segmentation, but is not limited thereto and may also be instance segmentation. Semantic segmentation assigns a category to each pixel of an image, yielding a semantic segmentation mask whose size corresponds to the image size, i.e., the predicted semantic segmentation map of the image; however, semantic segmentation does not distinguish different objects of the same category, i.e., it does not distinguish instances. Instance segmentation not only classifies pixels but also distinguishes different objects of the same category, i.e., it distinguishes instances; performing instance segmentation on an image yields an instance segmentation mask whose size corresponds to the image size, i.e., the predicted instance segmentation map of the image. The predicted semantic segmentation map and the predicted instance segmentation map of an image may be collectively referred to as the predicted segmentation map of the image. In the embodiments of the present disclosure, different categories may be represented by different colors; correspondingly, different colors in the predicted segmentation map of an image represent different categories.
According to an embodiment of the present disclosure, semantic segmentation or instance segmentation may be used to perform image segmentation processing on the target image to obtain the predicted segmentation map of the target image. The size of the predicted segmentation map of the target image may be determined according to the actual situation and is not specifically limited here. For example, the size of the predicted segmentation map may be the same as that of the target image, or it may be one half of the size of the target image.
According to an embodiment of the present disclosure, using a semantic segmentation method for the image segmentation processing can achieve real-time speed, thereby meeting the processing requirements of real-time tasks.
In operation S230, the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object.
According to an embodiment of the present disclosure, the position of the target object in the predicted segmentation map of the target image may be determined from the predicted segmentation map of the target object. Since the position of the target object in the predicted segmentation map of the target image corresponds to the position of the target object in the predicted depth map of the target image, the position of the target object in the predicted depth map can be obtained based on its position in the predicted segmentation map.
In operation S240, the predicted depth map is processed according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
According to an embodiment of the present disclosure, in related art considered while implementing the concept of the present disclosure, depth estimation may be performed on a monocular image by combining gradient and texture features. That is, the gradient information and texture information of the target image may be used as depth cues to assist a deep convolutional network in learning the depth information of the target image, yielding the predicted depth map of the target image. The target image may also be input into a depth estimation network based on camera pose estimation to obtain the predicted depth map of the target image.
However, the above methods can only perform depth estimation on the entire target image; it is difficult to extract the depth of the target object in the target image.
In related art considered while implementing the present disclosure, a sample-based learning method may also be used for depth estimation. A sample-based learning method constructs a data set, converts the problem of estimating the depth of the target object into a retrieval problem, and retrieves from the data set by feature matching to obtain a depth estimation result for the target object in the target image. A sample-based learning method can estimate the depth of a target object in a target image that matches an image in the data set. However, if no image matching the target image can be retrieved from the data set, depth estimation for the target object in the target image cannot be achieved. In addition, the above method can only estimate the depth of the target object in the target image; it cannot obtain the position of the target object in the target image. The method generalizes poorly and its estimation accuracy is not high.
According to the embodiments of the present disclosure, because image segmentation and depth estimation are combined, the position of the target object in the predicted depth map of the target image can be obtained from the predicted segmentation map of the target image, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position. The depth of the target object in the target image can therefore be determined relatively accurately. In addition, since the depth estimation is not implemented by retrieval and matching, it is less affected by the samples; the solution of the embodiments of the present disclosure therefore generalizes well.
It should be noted that if there are at least two target objects, the method provided by operations S210 to S240 above may be applied to each target object to obtain its predicted depth map. For presentation, the predicted depth maps of all target objects may be presented in a single predicted depth map, or the predicted depth map of each target object may be presented separately.
According to the technical solutions of the embodiments of the present disclosure, a target image including a target object and a non-target object is acquired; image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object; and the predicted depth map is processed according to that position to obtain a predicted depth map of the target object. Because image segmentation and depth estimation are combined, the position of the target object in the predicted depth map can be obtained from the predicted segmentation map, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position. The technical problem in the related art that depth estimation for a target object in a target image is difficult to achieve is therefore at least partially overcome, the depth of the target object in the target image can be determined relatively accurately, and the method generalizes well.
The method shown in FIG. 2 is further described below with reference to FIGS. 3 to 10 in conjunction with specific embodiments.
FIG. 3 schematically shows a structural diagram of an image processing model according to an embodiment of the present disclosure.
As shown in FIG. 3, the image processing model includes a feature extraction network, an image segmentation network, and a depth estimation network.
According to an embodiment of the present disclosure, the present disclosure constructs an image processing model obtained by training an encoder-decoder network with training samples. The target image to be predicted is input into the image processing model, which outputs the predicted segmentation map and the predicted depth map of the target image, from which the predicted depth map of the target object in the target image is then obtained. The image processing model constructed in the present disclosure makes up for the deficiency in the related art that a predicted segmentation map and a predicted depth map of the target image cannot be output at the same time.
FIG. 4 schematically shows a flowchart of another image processing method according to an embodiment of the present disclosure.
As shown in FIG. 4, processing the target image with the image processing model to obtain the predicted segmentation map and the predicted depth map of the target image may include the following operations S410 to S460.
In operation S410, the target image is processed by the feature extraction network to obtain a first intermediate feature map.
In operation S420, the first intermediate feature map is processed by the image segmentation network to obtain a second intermediate feature map.
In operation S430, the first intermediate feature map is processed by the depth estimation network to obtain a third intermediate feature map.
In operation S440, a fourth intermediate feature map is generated according to the second intermediate feature map and the third intermediate feature map.
In operation S450, the fourth intermediate feature map is processed by the depth estimation network to obtain the predicted depth map of the target image.
In operation S460, the second intermediate feature map is processed by the image segmentation network to obtain the predicted segmentation map of the target image.
As shown in FIG. 3 and FIG. 4, in order to achieve real-time depth estimation of the target object, a MobileNet+ASPP (Atrous Spatial Pyramid Pooling) module is used as the feature extraction network, i.e., the encoding network, and depthwise separable convolutions are used as the decoding networks, i.e., the image segmentation network and the depth estimation network.
According to an embodiment of the present disclosure, the height and width of the target image are denoted H and W, respectively. The target image is input to the MobileNet module, which outputs a feature map f2; during this process, the output of an intermediate layer is taken as a feature map f1. The feature map f2 is input into the ASPP module, whose output is fused with f2 to produce a feature map f3. The feature map f3 is then upsampled and fused with the feature map f1, and the resulting feature map f4 is taken as the output of the feature extraction network; f4 is the first intermediate feature map. (The sizes of f1 and f2 and the upsampling target size, each a fixed fraction of H×W, are given only as formula images in the original publication.)
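As a rough, non-authoritative illustration, the following PyTorch sketch arranges the fusion steps just described; the backbone and ASPP stand-ins, channel counts, and strides are assumptions, since the exact feature-map sizes appear only as formula images in the original publication.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyASPP(nn.Module):
        # Stand-in for the ASPP module: parallel dilated 3x3 convolutions
        # whose outputs are concatenated and projected back to c_out channels.
        def __init__(self, c_in, c_out=256, rates=(1, 6, 12)):
            super().__init__()
            self.branches = nn.ModuleList(
                [nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates])
            self.project = nn.Conv2d(c_out * len(rates), c_out, 1)

        def forward(self, x):
            return self.project(torch.cat([b(x) for b in self.branches], dim=1))

    class FeatureExtractor(nn.Module):
        # Encoder sketch: a MobileNet-style backbone yields an intermediate
        # map f1 and a deeper map f2; the ASPP output is fused with f2 to
        # give f3, which is upsampled to f1's size and fused with f1 into f4.
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Sequential(  # stand-in for early MobileNet layers
                nn.Conv2d(3, 24, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(24, 24, 3, stride=2, padding=1), nn.ReLU())
            self.stage2 = nn.Sequential(  # stand-in for later MobileNet layers
                nn.Conv2d(24, 96, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(96, 96, 3, stride=2, padding=1), nn.ReLU())
            self.aspp = TinyASPP(96)
            self.fuse23 = nn.Conv2d(256 + 96, 256, 1)              # fuse ASPP output with f2
            self.fuse13 = nn.Conv2d(256 + 24, 256, 3, padding=1)   # fuse f1 with upsampled f3

        def forward(self, x):
            f1 = self.stage1(x)                           # intermediate-layer feature map f1
            f2 = self.stage2(f1)                          # backbone output feature map f2
            f3 = self.fuse23(torch.cat([self.aspp(f2), f2], dim=1))
            f3 = F.interpolate(f3, size=f1.shape[2:], mode="bilinear", align_corners=False)
            f4 = self.fuse13(torch.cat([f1, f3], dim=1))  # first intermediate feature map
            return f4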
According to an embodiment of the present disclosure, both the image segmentation network and the depth estimation network take the same feature map f4 (i.e., the first intermediate feature map) as input and output the second intermediate feature map (feature map f5) and the third intermediate feature map (feature map f6), respectively. It should be noted that, because the first intermediate feature map produced by the feature extraction network serves as the input of both the depth estimation network and the image segmentation network, the feature extraction network comprehensively considers both the detailed information and the abstract information of the target image.
According to an embodiment of the present disclosure, in the predicted depth map the depth values of a single target object are relatively close to one another, while the gradient of the depth values at the boundary of the target object may be large. Therefore, to obtain a more accurate predicted depth map, the second intermediate feature map (feature map f5) output by the image segmentation network can be input into the depth estimation network and combined with the third intermediate feature map (feature map f6) to obtain the fourth intermediate feature map (feature map f7). The fourth intermediate feature map is passed through convolution and upsampling to obtain the predicted depth map, and the second intermediate feature map is passed through convolution and upsampling to obtain the predicted segmentation map. (The output sizes, fixed fractions of H×W, are given only as formula images in the original publication.)
According to an embodiment of the present disclosure, inputting the second intermediate feature map from the image segmentation network into the depth estimation network corrects the prediction of the depth estimation network, making the depth estimation result more accurate.
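A minimal sketch of the two decoding branches under the same assumptions follows (the channel counts, number of classes, and upsampling factor are illustrative, not taken from the publication):
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def separable(c_in, c_out):
        # Depthwise separable convolution, as used in the decoding networks.
        return nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in),  # depthwise
            nn.Conv2d(c_in, c_out, 1), nn.ReLU())              # pointwise

    class DualHeadDecoder(nn.Module):
        # Decoder sketch: the segmentation branch maps f4 to f5 and the depth
        # branch maps f4 to f6; f5 is fed into the depth branch and combined
        # with f6 to give f7, from which the depth map is predicted.
        def __init__(self, c4=256, c_mid=128, num_classes=59):
            super().__init__()
            self.seg_branch = separable(c4, c_mid)
            self.depth_branch = separable(c4, c_mid)
            self.fuse = nn.Conv2d(2 * c_mid, c_mid, 1)
            self.seg_head = nn.Conv2d(c_mid, num_classes, 1)
            self.depth_head = nn.Conv2d(c_mid, 1, 1)

        def forward(self, f4):
            f5 = self.seg_branch(f4)                    # second intermediate feature map
            f6 = self.depth_branch(f4)                  # third intermediate feature map
            f7 = self.fuse(torch.cat([f5, f6], dim=1))  # fourth intermediate feature map
            seg = F.interpolate(self.seg_head(f5), scale_factor=2,
                                mode="bilinear", align_corners=False)
            depth = F.interpolate(self.depth_head(f7), scale_factor=2,
                                  mode="bilinear", align_corners=False)
            return seg, depth
Feeding f5 into the depth branch is what lets segmentation structure sharpen the depth prediction at object boundaries, matching the motivation described above.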
According to an embodiment of the present disclosure, processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain the predicted depth map of the target object may include the following operation.
The pixel values at positions in the predicted depth map of the target image other than the position of the target object are set to a preset pixel value, yielding the predicted depth map of the target object.
According to an embodiment of the present disclosure, the pixel values at positions in the predicted depth map of the target image other than the position of the target object may be set to a preset pixel value to obtain the predicted depth map of the target object. The preset pixel value may be set according to the actual situation and is not specifically limited here; for example, it may be 0.
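For instance, this masking step could be carried out as in the following sketch (NumPy, with a preset value of 0; the class-id convention is an assumption):
    import numpy as np

    def extract_object_depth(pred_depth, pred_seg, target_class, preset=0.0):
        # Keep depth values only at positions the predicted segmentation map
        # assigns to the target class; every other position is set to the
        # preset pixel value (0 here, as in the example above).
        mask = (pred_seg == target_class)
        return np.where(mask, pred_depth, preset)
If, say, persons were labeled with class id 1 (a hypothetical value), then extract_object_depth(pred_depth, pred_seg, 1) would yield the predicted depth map of the person.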
By way of example, FIG. 5 schematically shows a target image according to an embodiment of the present disclosure. FIG. 6 schematically shows the predicted depth map of the target image. FIG. 7 schematically shows the predicted segmentation map of the target image. FIG. 8 schematically shows a predicted depth map of a target object according to an embodiment of the present disclosure; the target object in FIG. 8 is a person. FIG. 9 schematically shows a predicted depth map of another target object; the target object in FIG. 9 is a refrigerator. FIG. 10 schematically shows still another example; FIG. 10 contains multiple target objects.
As shown in FIGS. 5 to 10, the predicted depth map of the target object is obtained by applying the predicted segmentation map of the target image to the predicted depth map of the target image. In FIGS. 7 to 10, black represents non-target-object regions.
According to an embodiment of the present disclosure, the target image is processed by the image processing model to obtain the predicted segmentation map and the predicted depth map of the target image, respectively, where the image processing model is trained with training samples, each training sample including a sample image and the depth label and segmentation label of the sample image.
According to an embodiment of the present disclosure, training the image processing model with the training samples may include the following operations.
Training samples are acquired, and a fully convolutional neural network model is trained with the training samples to obtain the image processing model.
According to an embodiment of the present disclosure, the fully convolutional neural network model includes an initial feature extraction network, an initial image segmentation network, and an initial depth estimation network.
According to an embodiment of the present disclosure, training the fully convolutional neural network model with the training samples to obtain the image processing model may include the following operations.
The sample image is processed by the initial feature extraction network to obtain a fifth intermediate feature map. The fifth intermediate feature map is processed by the initial image segmentation network to obtain a sixth intermediate feature map. The fifth intermediate feature map is processed by the initial depth estimation network to obtain a seventh intermediate feature map. An eighth intermediate feature map is generated according to the sixth intermediate feature map and the seventh intermediate feature map. The eighth intermediate feature map is processed by the initial depth estimation network to obtain a predicted depth map of the sample image. The sixth intermediate feature map is processed by the initial image segmentation network to obtain a predicted segmentation map of the sample image. The depth label, predicted depth map, segmentation label, and predicted segmentation map of the sample image are input into the loss function of the fully convolutional neural network model, which outputs a loss result. The network parameters of the fully convolutional neural network model are adjusted according to the loss result until the loss function converges. The trained fully convolutional neural network model is used as the image processing model.
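The publication does not specify the form of the loss function, so the following training-step sketch assumes a simple sum of an L1 depth term and a cross-entropy segmentation term over the two outputs:
    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, image, depth_label, seg_label):
        # One joint optimization step over both tasks. `model` is assumed
        # to return (predicted segmentation logits, predicted depth map).
        pred_seg, pred_depth = model(image)
        loss = (F.l1_loss(pred_depth.squeeze(1), depth_label)
                + F.cross_entropy(pred_seg, seg_label, ignore_index=-1))
        optimizer.zero_grad()
        loss.backward()     # adjust network parameters according to the loss
        optimizer.step()
        return loss.item()  # training repeats until the loss converges
The ignore_index=-1 setting is a natural fit for the label convention described later, where uninvolved categories are marked as -1, though the patent does not state how such pixels are handled.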
According to an embodiment of the present disclosure, this can be understood with reference to FIG. 3: in the training process of the image processing model, the fifth intermediate feature map can be understood as the feature map f4 in FIG. 3, the sixth intermediate feature map as the feature map f5, the seventh intermediate feature map as the feature map f6, and the eighth intermediate feature map as the feature map f7. It should be noted that after training is completed, the initial feature extraction network, the initial image segmentation network, and the initial depth estimation network are referred to as the feature extraction network, the image segmentation network, and the depth estimation network, respectively.
FIG. 11 schematically shows a flowchart of still another image processing method according to an embodiment of the present disclosure.
As shown in FIG. 11, performing image segmentation processing on the sample image to obtain the segmentation label of the sample image may include the following operations S1110 to S1130.
In operation S1110, instance segmentation processing is performed on the sample image to obtain an instance segmentation label of the sample image.
In operation S1120, a semantic segmentation label of the sample image is obtained according to the instance segmentation label of the sample image.
In operation S1130, the semantic segmentation label of the sample image is used as the segmentation label of the sample image.
According to an embodiment of the present disclosure, in instance segmentation an image may include multiple instances belonging to the same category, which need to be distinguished. For example, a target image may include multiple objects belonging to the category "person", i.e., multiple people; in instance segmentation, these people must be distinguished, and each person receives a corresponding instance segmentation label.
According to an embodiment of the present disclosure, semantic segmentation classifies each pixel of an image by category but does not distinguish instances. For example, if a target image includes multiple people, semantic segmentation does not distinguish between them; all of them receive the same semantic segmentation label.
According to an embodiment of the present disclosure, in order to produce more accurate semantic segmentation labels, a Mask_RCNN (Mask Region-based Convolutional Neural Network) network may be used to output instance segmentation labels on three depth estimation sample image databases, CAD_60, CAD_120, and EPFL; these instance segmentation labels are then converted into semantic segmentation labels, which serve as the semantic segmentation labels on the three depth estimation sample image databases. This is not limiting, however: semantic segmentation labels may also be used on their own. Because the embodiments of the present disclosure use depth estimation sample image databases, the depth labels of the sample images are available.
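The instance-to-semantic conversion can be pictured as in the following sketch, where the mask and class-id formats are assumptions about the detector's output rather than a documented interface:
    import numpy as np

    def instances_to_semantic(instance_masks, instance_classes, shape, background=-1):
        # Collapse per-instance boolean masks into one semantic label map:
        # every instance of a class receives the same label, so instances
        # are no longer distinguished. Unlabeled pixels keep the background value.
        semantic = np.full(shape, background, dtype=np.int64)
        for mask, cls in zip(instance_masks, instance_classes):
            semantic[mask] = cls
        return semantic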
According to other embodiments of the present disclosure, Mask_RCNN can detect and segment objects of 82 categories (including a background category). In practical applications, fewer than 82 categories may appear in a depth estimation sample image database; directly using all 82 categories as segmentation labels during segmentation processing would widen the segmentation scope and increase the error probability of the segmentation processing.
According to an embodiment of the present disclosure, 59 of these categories appear in the sample image databases used. In order to construct dense segmentation labels, the Mask_RCNN categories can therefore be remapped as shown in Table 1, with uninvolved categories marked as -1, thereby reducing the error probability and improving the segmentation effect and accuracy on the basis of the image segmentation processing; a sketch of such a remapping follows Table 1.
Table 1 (the category mapping table is reproduced only as an image in the original publication)
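Since the actual mapping survives only as an image, the sketch below uses purely illustrative id pairs; it demonstrates only the remapping mechanism, with uninvolved categories collapsing to -1:
    import numpy as np

    # Hypothetical fragment of the Table 1 mapping: detector class ids on the
    # left, remapped label ids on the right. The concrete pairs here are
    # placeholders, not the real table.
    CLASS_REMAP = {0: 0, 1: 1, 62: 2}

    def remap_labels(label_map, remap=CLASS_REMAP, default=-1):
        # Any class id absent from the mapping falls back to -1 (uninvolved).
        out = np.full_like(label_map, default)
        for src, dst in remap.items():
            out[label_map == src] = dst
        return out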
According to an embodiment of the present disclosure, the instance segmentation labels obtained with the Mask_RCNN network are highly accurate, which is beneficial to the construction of the image processing model.
According to an embodiment of the present disclosure, semantic segmentation may also be performed directly on a sample image database that has depth labels to obtain the semantic segmentation labels of the sample images. Moreover, in addition to the approach of performing image segmentation processing on sample images from a database with depth labels to obtain segmentation labels, so that the sample images have both depth labels and segmentation labels, it is also possible to start from sample images with known segmentation labels and perform depth estimation on them to obtain depth labels, so that the sample images likewise have both depth labels and segmentation labels. However, the predicted depth map of the target object in the target image obtained with the first approach is more accurate.
The technical solutions of the present disclosure are further described below with reference to specific embodiments; the operations of the image processing method may specifically be as follows.
FIG. 12 schematically shows a flowchart of yet another image processing method according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, as shown in FIG. 12, the operations of the image processing method may include the following.
A data set is created, a fully convolutional neural network model is built, and an image processing model is obtained by training.
The input image is normalized to obtain an RGB image, which serves as the input of the image processing model; the predicted depth map and the predicted segmentation map of the target image are obtained, respectively.
It is determined whether a given predicted segmentation map within the predicted segmentation map of the target image belongs to the target object; if so, the predicted segmentation map of the target object is applied to the predicted depth map of the target image to obtain the predicted depth map of the target object.
In addition, according to an embodiment of the present disclosure, it can also be determined whether the predicted segmentation maps of all target objects have been traversed. If not, the predicted segmentation map of the target object of the next category is traversed; if so, the predicted depth maps of all target objects are output.
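Putting the flow of FIG. 12 together, a per-class extraction loop might look like the following sketch, run after the input image has been normalized and passed through the model (the class ids and preset value are assumptions):
    import numpy as np

    def depth_maps_per_target(pred_seg, pred_depth, target_classes, preset=0.0):
        # Traverse all designated target-object classes; for each class that
        # appears in the predicted segmentation map, apply it to the predicted
        # depth map and collect the resulting per-object depth map.
        results = {}
        for cls in target_classes:
            mask = (pred_seg == cls)
            if mask.any():
                results[cls] = np.where(mask, pred_depth, preset)
        return results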
According to the embodiments of the present disclosure, the image processing method can be applied to scenarios in which a monocular robot finds or avoids a specific object, where the specific object is the target object.
FIG. 13 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in FIG. 13, the image processing apparatus 1300 includes an acquisition module 1310, a first processing module 1320, a determination module 1330, and a second processing module 1340.
The acquisition module 1310, the first processing module 1320, the determination module 1330, and the second processing module 1340 are communicatively connected.
The acquisition module 1310 is configured to acquire a target image, where the target image includes a target object and a non-target object.
The first processing module 1320 is configured to perform image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively.
The determination module 1330 is configured to determine the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object.
The second processing module 1340 is configured to process the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
According to the technical solutions of the embodiments of the present disclosure, a target image including a target object and a non-target object is acquired; image segmentation processing and depth estimation processing are performed on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively; the position of the target object in the predicted depth map of the target image is determined according to the predicted segmentation map of the target object; and the predicted depth map is processed according to that position to obtain a predicted depth map of the target object. Because image segmentation and depth estimation are combined, the position of the target object in the predicted depth map can be obtained from the predicted segmentation map, and the predicted depth map of the target object can be obtained by processing the predicted depth map according to that position. The technical problem in the related art that depth estimation for a target object in a target image is difficult to achieve is therefore at least partially overcome, the depth of the target object in the target image can be determined relatively accurately, and the method generalizes well.
According to an embodiment of the present disclosure, the first processing module 1320 includes a first processing unit.
The first processing unit is configured to process the target image with the image processing model to obtain the segmentation map and the depth map, respectively, where the image processing model is trained with training samples, each training sample including a sample image and the depth label and segmentation label of the sample image.
According to an embodiment of the present disclosure, the image processing model includes a feature extraction network, an image segmentation network, and a depth estimation network.
According to an embodiment of the present disclosure, the first processing unit includes a first processing subunit, a second processing subunit, a third processing subunit, a fourth processing subunit, a fifth processing subunit, and a sixth processing subunit.
The first processing subunit is configured to process the target image with the feature extraction network to obtain a first intermediate feature map.
The second processing subunit is configured to process the first intermediate feature map with the image segmentation network to obtain a second intermediate feature map.
The third processing subunit is configured to process the first intermediate feature map with the depth estimation network to obtain a third intermediate feature map.
The fourth processing subunit is configured to generate a fourth intermediate feature map according to the second intermediate feature map and the third intermediate feature map.
The fifth processing subunit is configured to process the fourth intermediate feature map with the depth estimation network to obtain the predicted depth map of the target image.
The sixth processing subunit is configured to process the second intermediate feature map with the image segmentation network to obtain the predicted segmentation map of the target image.
According to an embodiment of the present disclosure, training the image processing model by using the training samples may include the following operations.
The training samples are acquired. A fully convolutional neural network model is trained by using the training samples to obtain the image processing model.
According to an embodiment of the present disclosure, the fully convolutional neural network model includes an initial feature extraction network, an initial image segmentation network, and an initial depth estimation network.
Training the fully convolutional neural network model by using the training samples to obtain the image processing model may include the following operations.
The sample image is processed by using the initial feature extraction network to obtain a fifth intermediate feature map. The fifth intermediate feature map is processed by using the initial image segmentation network to obtain a sixth intermediate feature map. The fifth intermediate feature map is processed by using the initial depth estimation network to obtain a seventh intermediate feature map. An eighth intermediate feature map is generated according to the sixth intermediate feature map and the seventh intermediate feature map. The eighth intermediate feature map is processed by using the initial depth estimation network to obtain a predicted depth map of the sample image. The sixth intermediate feature map is processed by using the initial image segmentation network to obtain a predicted segmentation map of the sample image. The depth label, the predicted depth map, the segmentation label, and the predicted segmentation map of the sample image are input into the loss function of the fully convolutional neural network model to obtain a loss result. The network parameters of the fully convolutional neural network model are adjusted according to the loss result until the loss function converges. The trained fully convolutional neural network model is used as the image processing model.
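A minimal training-loop sketch consistent with the above procedure is given below, reusing the illustrative ImageProcessingModel from the earlier sketch. The joint loss (cross-entropy on the segmentation branch plus L1 on the depth branch, equally weighted), the Adam optimizer, and the convergence test are all assumptions; the original text only states that the parameters are adjusted according to the loss result until the loss function converges.

    import torch
    import torch.nn.functional as F

    def train_model(model, loader, epochs: int = 50, lr: float = 1e-4, tol: float = 1e-4):
        # `model` is the two-branch network sketched earlier; each batch from
        # `loader` is assumed to yield (sample image, depth label, segmentation label).
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        prev_loss = float("inf")
        for _ in range(epochs):
            total = 0.0
            for image, depth_label, seg_label in loader:
                pred_seg, pred_depth = model(image)
                # Joint loss over segmentation and depth labels (equal weighting assumed).
                loss = F.cross_entropy(pred_seg, seg_label) + F.l1_loss(pred_depth, depth_label)
                optimizer.zero_grad()
                loss.backward()   # adjust network parameters according to the loss result
                optimizer.step()
                total += loss.item()
            if abs(prev_loss - total) < tol:  # crude convergence test (assumption)
                break
            prev_loss = total
        return model  # the trained model serves as the image processing model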
According to an embodiment of the present disclosure, performing image segmentation processing on the sample image to obtain the segmentation label of the sample image may include the following operations.
Instance segmentation processing is performed on the sample image to obtain an instance segmentation label of the sample image. A semantic segmentation label of the sample image is obtained according to the instance segmentation label of the sample image. The semantic segmentation label of the sample image is used as the segmentation label of the sample image.
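One straightforward way to derive a semantic segmentation label from an instance segmentation label is to map each instance identifier to the semantic class it belongs to. The sketch below assumes integer instance IDs and a hypothetical id_to_class mapping; neither is specified in the original text.

    import numpy as np

    def instance_to_semantic(instance_label: np.ndarray, id_to_class: dict) -> np.ndarray:
        # `instance_label` holds one integer instance ID per pixel (0 = background);
        # `id_to_class` maps each instance ID to a semantic class ID (hypothetical).
        semantic = np.zeros_like(instance_label)
        for instance_id, class_id in id_to_class.items():
            semantic[instance_label == instance_id] = class_id
        return semantic

    # Usage: two object instances (IDs 1 and 2) both collapse to semantic class 1.
    label = np.array([[0, 1, 1], [2, 2, 0]])
    print(instance_to_semantic(label, {1: 1, 2: 1}))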
According to an embodiment of the present disclosure, the second processing module 1340 includes a second processing unit.
The second processing unit is configured to set pixel values at positions in the predicted depth map of the target image other than the position of the target object to a preset pixel value, so as to obtain the predicted depth map of the target object.
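In the simplest reading, this operation is a mask applied to the predicted depth map using the target-object region identified from the predicted segmentation map. The sketch below works under that assumption and uses 0 as the preset pixel value, a choice the text leaves open.

    import numpy as np

    def extract_object_depth(pred_depth: np.ndarray, pred_seg: np.ndarray,
                             target_class: int, preset_value: float = 0.0) -> np.ndarray:
        # Keep depth values only at the position of the target object; every
        # other position is set to the preset pixel value (0.0 is an assumption).
        object_depth = np.full_like(pred_depth, preset_value)
        mask = pred_seg == target_class  # position of the target object
        object_depth[mask] = pred_depth[mask]
        return object_depth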
According to an embodiment of the present disclosure, the image segmentation processing includes semantic segmentation processing or instance segmentation processing.
根据本公开的实施例的模块、单元中的任意多个、或其中任意多个的至少部分功能可以在一个模块中实现。根据本公开实施例的模块、单元中的任意一个或多个可以被拆分成多个模块来实现。根据本公开实施例的模块、单元中的任意一个或多个可以至少被部分地实现为硬件电路,例如现场可编程门阵列(Field Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Arrays,PLA)、片上系统、基板上的系统、封装上的系统、专用集成电路(Application Specific Integrated Circuit,ASIC),或可以通过对电路进行集成或封装的任何其他的合理方式的硬件或固件来实现,或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者,根据本公开实施例的模块、单元中的一个或多个可以至少被部分地实现为计算机程序模块,当该计算机程序模块被运行时,可以执行相应的功能。Any of the modules, units, or at least part of the functions of any of the modules according to the embodiments of the present disclosure may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be divided into multiple modules for implementation. Any one or more of the modules and units according to the embodiments of the present disclosure may be at least partially implemented as hardware circuits, such as Field Programmable Gate Arrays (FPGA), Programmable Logic Arrays (Programmable Logic Arrays, PLA), system-on-chip, system-on-substrate, system-on-package, Application Specific Integrated Circuit (ASIC), or any other reasonable means of hardware or firmware that can integrate or package a circuit, Or it can be implemented in any one of the three implementation manners of software, hardware and firmware, or in an appropriate combination of any of them. Alternatively, one or more of the modules and units according to the embodiments of the present disclosure may be implemented at least in part as computer program modules, which, when executed, may perform corresponding functions.
例如,获取模块1310、第一处理模块1320、确定模块1330和第二处理模块1340中的任意多个可以合并在一个模块/单元中实现,或者其中的任意一个模块/单元可以被拆分成多个模块/单元。或者,这些模块/单元中的一个或多个模块/单元的至少部分功能可以与其他模块/单元的至少部分功能相结合,并在一个模块/单元中实现。根据本公开的实施例,获取模块1310、第一处理模块1320、确定模块1330和第二处理模块1340中的至少一个可以至少被部分地实现为硬件电路,例如现场可编程门阵列(FPGA)、可编程逻辑阵列(PLA)、片上系统、基板上的系统、封装上的系统、专用集成电路(ASIC),或可以通过对电路进行集成或封装的任何其他的合理方式等硬件或固件来实现,或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者,获取模块1310、第一处理模块1320、确定模块1330和第二处理模块1340中的至少一个可以至少被部分地实现为计算机程序模块,当该计算机程序模块被运行时,可以执行相应的功能。For example, any one of the acquisition module 1310, the first processing module 1320, the determination module 1330, and the second processing module 1340 may be combined in one module/unit for implementation, or any one of the modules/units may be split into multiple modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units and implemented in one module/unit. According to an embodiment of the present disclosure, at least one of the acquisition module 1310, the first processing module 1320, the determination module 1330, and the second processing module 1340 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), Programmable logic array (PLA), system-on-chip, system-on-substrate, system-on-package, application-specific integrated circuit (ASIC), or hardware or firmware that can be implemented by any other reasonable means of integrating or packaging circuits, Or it can be implemented in any one of the three implementation manners of software, hardware and firmware, or in an appropriate combination of any of them. Alternatively, at least one of the acquisition module 1310, the first processing module 1320, the determination module 1330, and the second processing module 1340 may be implemented at least partially as a computer program module, which, when executed, may perform corresponding functions .
It should be noted that the image processing apparatus part in the embodiments of the present disclosure corresponds to the image processing method part in the embodiments of the present disclosure; for details of the image processing apparatus part, reference may be made to the image processing method part, which is not repeated here.
FIG. 14 schematically shows a block diagram of an electronic device suitable for implementing the method described above according to an embodiment of the present disclosure. The electronic device shown in FIG. 14 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 14, an electronic device 1400 according to an embodiment of the present disclosure includes a processor 1401, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1402 or a program loaded from a storage section 1408 into a random access memory (RAM) 1403. The processor 1401 may include, for example, a general-purpose microprocessor (such as a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (such as an application-specific integrated circuit (ASIC)). The processor 1401 may also include on-board memory for caching purposes. The processor 1401 may include a single processing unit, or multiple processing units, for performing the different actions of the method flow according to the embodiments of the present disclosure.
Various programs and data necessary for the operation of the electronic device 1400 are stored in the RAM 1403. The processor 1401, the ROM 1402, and the RAM 1403 are connected to one another through a bus 1404. The processor 1401 performs the various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1402 and/or the RAM 1403. Note that the programs may also be stored in one or more memories other than the ROM 1402 and the RAM 1403. The processor 1401 may also perform the various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 1400 may further include an input/output (I/O) interface 1405, which is also connected to the bus 1404. The electronic device 1400 may further include one or more of the following components connected to the I/O interface 1405: an input section 1406 including a keyboard, a mouse, and the like; an output section 1407 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker and the like; a storage section 1408 including a hard disk and the like; and a communication section 1409 including a network interface card such as a LAN card or a modem. The communication section 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as needed. A removable medium 1411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1410 as needed, so that a computer program read therefrom is installed into the storage section 1408 as needed.
According to an embodiment of the present disclosure, the method flow according to the embodiments of the present disclosure may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1409, and/or installed from the removable medium 1411. When the computer program is executed by the processor 1401, the above-described functions defined in the system of the embodiments of the present disclosure are performed. According to the embodiments of the present disclosure, the systems, devices, apparatuses, modules, units, and the like described above may be implemented by computer program modules.
The present disclosure further provides a computer-readable storage medium. The computer-readable storage medium may be included in the device/apparatus/system described in the above embodiments, or may exist alone without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to the embodiments of the present disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
For example, according to an embodiment of the present disclosure, the computer-readable storage medium may include the ROM 1402 and/or the RAM 1403 described above, and/or one or more memories other than the ROM 1402 and the RAM 1403.
An embodiment of the present disclosure further includes a computer program product, which includes a computer program containing program code for executing the method provided by the embodiments of the present disclosure; when the computer program product runs on an electronic device, the program code is used to cause the electronic device to implement the image processing method provided by the embodiments of the present disclosure.
When the computer program is executed by the processor 1401, the above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed. According to the embodiments of the present disclosure, the systems, apparatuses, modules, units, and the like described above may be implemented by computer program modules.
In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed through the communication section 1409, and/or installed from the removable medium 1411. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless, wired, and the like, or any suitable combination of the above.
According to the embodiments of the present disclosure, the program code for executing the computer program provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; specifically, high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages, may be used to implement these computing programs. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on a remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet by using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architectures, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions. Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, without departing from the spirit and teachings of the present disclosure, the features recited in the various embodiments and/or claims of the present disclosure can be combined in various ways. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments have been described separately above, this does not mean that the measures in the various embodiments cannot be used in combination to advantage. The scope of the present disclosure is defined by the appended claims and their equivalents. Without departing from the scope of the present disclosure, those skilled in the art can make various substitutions and modifications, all of which should fall within the scope of the present disclosure.

Claims (12)

1. An image processing method, comprising:
    acquiring a target image, wherein the target image includes a target object and a non-target object;
    performing image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively;
    determining the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object; and
    processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
2. The method according to claim 1, wherein performing image segmentation processing and depth estimation processing on the target image to obtain the predicted segmentation map and the predicted depth map of the target image, respectively, comprises:
    processing the target image by using an image processing model to obtain the predicted segmentation map and the predicted depth map of the target image, respectively, wherein the image processing model is obtained by training with training samples, and the training samples include a sample image and a depth label and a segmentation label of the sample image.
3. The method according to claim 2, wherein the image processing model includes a feature extraction network, an image segmentation network, and a depth estimation network; and
    processing the target image by using the image processing model to obtain the predicted segmentation map and the predicted depth map of the target image, respectively, comprises:
    processing the target image by using the feature extraction network to obtain a first intermediate feature map;
    processing the first intermediate feature map by using the image segmentation network to obtain a second intermediate feature map;
    processing the first intermediate feature map by using the depth estimation network to obtain a third intermediate feature map;
    generating a fourth intermediate feature map according to the second intermediate feature map and the third intermediate feature map;
    processing the fourth intermediate feature map by using the depth estimation network to obtain the predicted depth map of the target image; and
    processing the second intermediate feature map by using the image segmentation network to obtain the predicted segmentation map of the target image.
4. The method according to claim 2, wherein the image processing model being obtained by training with the training samples comprises:
    acquiring the training samples; and
    training a fully convolutional neural network model by using the training samples to obtain the image processing model.
5. The method according to claim 4, wherein the fully convolutional neural network model includes an initial feature extraction network, an initial image segmentation network, and an initial depth estimation network; and
    training the fully convolutional neural network model by using the training samples to obtain the image processing model comprises:
    processing the sample image by using the initial feature extraction network to obtain a fifth intermediate feature map;
    processing the fifth intermediate feature map by using the initial image segmentation network to obtain a sixth intermediate feature map;
    processing the fifth intermediate feature map by using the initial depth estimation network to obtain a seventh intermediate feature map;
    generating an eighth intermediate feature map according to the sixth intermediate feature map and the seventh intermediate feature map;
    processing the eighth intermediate feature map by using the initial depth estimation network to obtain a predicted depth map of the sample image;
    processing the sixth intermediate feature map by using the initial image segmentation network to obtain a predicted segmentation map of the sample image;
    inputting the depth label, the predicted depth map, the segmentation label, and the predicted segmentation map of the sample image into a loss function of the fully convolutional neural network model to obtain a loss result, and adjusting network parameters of the fully convolutional neural network model according to the loss result until the loss function converges; and
    using the trained fully convolutional neural network model as the image processing model.
6. The method according to claim 5, wherein performing image segmentation processing on the sample image to obtain the segmentation label of the sample image comprises:
    performing instance segmentation processing on the sample image to obtain an instance segmentation label of the sample image;
    obtaining a semantic segmentation label of the sample image according to the instance segmentation label of the sample image; and
    using the semantic segmentation label of the sample image as the segmentation label of the sample image.
7. The method according to claim 1, wherein processing the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain the predicted depth map of the target object comprises:
    setting pixel values at positions in the predicted depth map of the target image other than the position of the target object to a preset pixel value to obtain the predicted depth map of the target object.
8. The method according to any one of claims 1 to 7, wherein the image segmentation processing includes semantic segmentation processing or instance segmentation processing.
9. An image processing apparatus, comprising:
    an acquisition module configured to acquire a target image, wherein the target image includes a target object and a non-target object;
    a first processing module configured to perform image segmentation processing and depth estimation processing on the target image to obtain a predicted segmentation map and a predicted depth map of the target image, respectively;
    a determination module configured to determine the position of the target object in the predicted depth map of the target image according to the predicted segmentation map of the target object; and
    a second processing module configured to process the predicted depth map according to the position of the target object in the predicted depth map of the target image to obtain a predicted depth map of the target object.
10. An electronic device, comprising:
    one or more processors; and
    a memory configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 8.
11. A computer-readable storage medium having executable instructions stored thereon which, when executed by a processor, cause the processor to implement the method according to any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
PCT/CN2021/140683 2021-01-04 2021-12-23 Image processing method and apparatus, electronic device, medium, and computer program product WO2022143366A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110002321.5 2021-01-04
CN202110002321.5A CN113781493A (en) 2021-01-04 2021-01-04 Image processing method, image processing apparatus, electronic device, medium, and computer program product

Publications (1)

Publication Number Publication Date
WO2022143366A1

Family

ID=78835376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140683 WO2022143366A1 (en) 2021-01-04 2021-12-23 Image processing method and apparatus, electronic device, medium, and computer program product

Country Status (2)

Country Link
CN (1) CN113781493A (en)
WO (1) WO2022143366A1 (en)

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN116597213A (en) * 2023-05-18 2023-08-15 北京百度网讯科技有限公司 Target detection method, training device, electronic equipment and storage medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN113781493A (en) * 2021-01-04 2021-12-10 北京沃东天骏信息技术有限公司 Image processing method, image processing apparatus, electronic device, medium, and computer program product

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2012074361A1 (en) * 2010-12-03 2012-06-07 Mimos Berhad Method of image segmentation using intensity and depth information
CN109658413A (en) * 2018-12-12 2019-04-19 深圳前海达闼云端智能科技有限公司 A kind of method of robot target grasping body position detection
CN111311560A (en) * 2020-02-10 2020-06-19 中国铁道科学研究院集团有限公司基础设施检测研究所 Method and device for detecting state of steel rail fastener
CN111968129A (en) * 2020-07-15 2020-11-20 上海交通大学 Instant positioning and map construction system and method with semantic perception
CN113781493A (en) * 2021-01-04 2021-12-10 北京沃东天骏信息技术有限公司 Image processing method, image processing apparatus, electronic device, medium, and computer program product

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
JP2014238731A (en) * 2013-06-07 2014-12-18 株式会社ソニー・コンピュータエンタテインメント Image processor, image processing system, and image processing method
CN104346816B (en) * 2014-10-11 2017-04-19 京东方科技集团股份有限公司 Depth determining method and device and electronic equipment
US10019657B2 (en) * 2015-05-28 2018-07-10 Adobe Systems Incorporated Joint depth estimation and semantic segmentation from a single image
CN110969173B (en) * 2018-09-28 2023-10-24 杭州海康威视数字技术股份有限公司 Target classification method and device
US10846870B2 (en) * 2018-11-29 2020-11-24 Adobe Inc. Joint training technique for depth map generation
CN109785345A (en) * 2019-01-25 2019-05-21 中电健康云科技有限公司 Image partition method and device
CN110310229B (en) * 2019-06-28 2023-04-18 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, terminal device, and readable storage medium
CN110782468B (en) * 2019-10-25 2023-04-07 北京达佳互联信息技术有限公司 Training method and device of image segmentation model and image segmentation method and device



Also Published As

Publication number Publication date
CN113781493A (en) 2021-12-10


Legal Events

121: EP - the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21914103
    Country of ref document: EP
    Kind code of ref document: A1
NENP: Non-entry into the national phase
    Ref country code: DE
32PN: EP - public notification in the EP bulletin as the address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.10.2023)