WO2021164280A1 - Three-dimensional edge detection method and apparatus, storage medium, and computer device

Three-dimensional edge detection method and apparatus, storage medium, and computer device

Info

Publication number
WO2021164280A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
detection result
encoding
object detection
result
Prior art date
Application number
PCT/CN2020/121120
Other languages
English (en)
French (fr)
Inventor
柳露艳
马锴
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP20920101.1A (EP4016454A4)
Priority to JP2022522367A (JP7337268B2)
Publication of WO2021164280A1
Priority to US17/703,829 (US20220215558A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/20Contour coding, e.g. using detection of edges
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/647Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/04Indexing scheme for image data processing or generation, in general involving 3D image data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • G06T2207/10021Stereoscopic video; Stereoscopic image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • This application relates to the field of computer technology, in particular to a three-dimensional edge detection method, device, storage medium and computer equipment.
  • Edge detection is a basic problem in image processing and computer vision. It can provide important information for other computer vision tasks such as semantic segmentation, instance segmentation, and object tracking.
  • Most current edge detection is performed on two-dimensional images. Few techniques address edge detection for three-dimensional images, and the accuracy of the few existing three-dimensional approaches is low. No effective solution to this problem has yet been proposed.
  • a three-dimensional edge detection method, device, storage medium, and computer equipment are provided.
  • a three-dimensional edge detection method executed by a computer device, the method including:
  • Decoding is performed according to the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain an optimized three-dimensional edge detection result of the three-dimensional image.
  • a three-dimensional edge detection device includes:
  • the obtaining module is configured to obtain the two-dimensional object detection result and the two-dimensional edge detection result of each two-dimensional slice of the three-dimensional image, stack the two-dimensional object detection results into a three-dimensional object detection result, and stack the two-dimensional edge detection results into a three-dimensional edge detection result;
  • an encoding module, configured to perform encoding according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain an encoding result; and
  • the decoding module is configured to decode according to the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain an optimized three-dimensional edge detection result of the three-dimensional image.
  • a non-volatile storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the three-dimensional edge detection method.
  • a computer device includes a memory and a processor.
  • the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the three-dimensional edge detection method.
  • Fig. 1 is an application environment diagram of a three-dimensional edge detection method in an embodiment
  • FIG. 2 is a schematic diagram of the structure of a three-dimensional edge fine detection network in an embodiment
  • FIG. 3 is a schematic structural diagram of a three-dimensional edge fine detection network in another embodiment
  • FIG. 4 is a schematic diagram of a network structure applied by a three-dimensional edge detection method in an embodiment
  • FIG. 5 is a schematic structural diagram of an object detection model in an embodiment
  • Fig. 6 is a schematic structural diagram of an edge detection model in an embodiment
  • FIG. 7 is a comparison diagram of detection results of multiple edge detection methods in an embodiment
  • FIG. 8 is a comparison diagram of the continuity of the detection results of two edge detection methods in an embodiment
  • Fig. 9 is a structural block diagram of a three-dimensional edge detection device in an embodiment
  • Fig. 10 is a structural block diagram of a three-dimensional edge detection device in another embodiment.
  • Fig. 11 is a structural block diagram of a computer device in an embodiment.
  • Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer Vision is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure targets, and to further process the resulting graphics so that the computer-processed image is more suitable for human eyes to observe or for transmission to an instrument for detection.
  • Computer vision studies related theories and technologies trying to establish an artificial intelligence system that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Machine Learning is a multi-field interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects. Specializing in the study of how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills, and reorganize the existing knowledge structure to continuously improve its own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robotics, intelligent medical care, and intelligent customer service. It is believed that with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • the three-dimensional edge detection involved in the embodiments of this application is the basis of image processing and computer vision.
  • the three-dimensional edge detection result can provide important information for other computer vision tasks such as semantic segmentation, object detection, instance segmentation, and object tracking; it is a very basic but also very important computer vision task.
  • the results of 3D edge detection can help a large number of medical image segmentation or detection tasks.
  • a three-dimensional edge detection method is provided.
  • the method is mainly applied to computer equipment as an example.
  • the computer device may specifically be a terminal or a server.
  • the three-dimensional edge detection method specifically includes the following steps:
  • S102 Acquire a two-dimensional object detection result and a two-dimensional edge detection result of each two-dimensional segment of the three-dimensional image.
  • a three-dimensional image is an image with information in three dimensions. Dividing the three-dimensional image along one of the dimensions yields the two-dimensional slices of the three-dimensional image. Normally, the three dimensions of a three-dimensional image can be regarded as height, width, and depth; a two-dimensional slice is obtained by dividing the three-dimensional image along the depth dimension, and different two-dimensional slices correspond to different depth positions. Of course, in other embodiments, the three-dimensional image can also be divided along other dimensions, which is not limited here.
  • the computer device can apply a two-dimensional object detection method to each two-dimensional slice of the three-dimensional image to obtain the two-dimensional object detection result of each slice, and apply a two-dimensional edge detection algorithm to each two-dimensional slice to obtain the two-dimensional edge detection result of each slice.
  • the purpose of object detection is to identify the pixel area in the image where the object is located
  • the purpose of edge detection is to identify the pixels in the image where the pixel gray level changes significantly.
  • the edge usually exists between the object and the background.
  • Both object detection and edge detection can be pixel-level detection, that is, the category of each pixel is determined according to the detection task.
  • the object detection here only needs to detect the object; it does not need to classify the object, that is, it neither distinguishes between different objects nor determines what the object is.
  • S104 Stack each two-dimensional object detection result into a three-dimensional object detection result, and stack each two-dimensional edge detection result into a three-dimensional edge detection result.
  • a three-dimensional image is divided into more than one frame of two-dimensional slices along one dimension, and there is a definite order relationship between these slices, i.e. each frame of two-dimensional slice corresponds to a position in the divided dimension (such as a depth value). Stacking the two-dimensional object detection results of these slices according to the order relationship between the corresponding slices yields the three-dimensional object detection result; stacking the two-dimensional edge detection results of these slices according to the same order relationship yields the three-dimensional edge detection result.
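  • As an illustration only, the following minimal sketch shows how such stacking can be done, assuming the per-slice detection results are 2D arrays already ordered by their position along the divided dimension; the placeholder data and names are hypothetical.

```python
import numpy as np

# Hypothetical example: 8 two-dimensional slices of size 64x64, whose per-slice
# object and edge detection results have already been computed (random arrays
# stand in for real detector outputs here).
num_slices, height, width = 8, 64, 64
object_slices = [np.random.rand(height, width) for _ in range(num_slices)]
edge_slices = [np.random.rand(height, width) for _ in range(num_slices)]

# S104: stack the per-slice results along a new depth axis, preserving the
# order of the slices in the divided (depth) dimension.
object_3d = np.stack(object_slices, axis=0)  # 3D object detection result, shape (D, H, W)
edge_3d = np.stack(edge_slices, axis=0)      # 3D edge detection result, shape (D, H, W)
print(object_3d.shape, edge_3d.shape)        # (8, 64, 64) (8, 64, 64)
```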
  • S106 Perform encoding according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain an encoding result.
  • the feature map is used to reflect the features of the image, which is a form of existence of the features of the image.
  • A feature map may be, for example, the original RGB three-channel image itself, or a feature map output by a convolution operation performed on the image.
  • When encoding an image, a common method is to directly encode the feature map of the image.
  • encoding is performed based on three different types of data, the feature map, the three-dimensional object detection result, and the three-dimensional edge detection result. Further, it is also possible to encode the result of the operation after performing certain operations on these three types of data. In this way, more and more useful information can be obtained by referring to the three-dimensional object detection results and the three-dimensional edge detection results during the encoding process.
  • the computer device may use an encoder to perform encoding according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain the encoding result. Further, the computer device can perform a dot multiplication operation on the color feature map of the three-dimensional image and the three-dimensional object detection result, and then add it to the three-dimensional edge detection result, and then use an encoder to encode the result of the foregoing operation to obtain the coding result.
  • S106 includes: performing encoding more than once according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result, where the input of each encoding is the result of operating on the output of the previous encoding with the three-dimensional object detection result and the three-dimensional edge detection result; the outputs of the encodings differ from one another and are all feature maps of the three-dimensional image; and the feature map output by the last encoding is taken as the encoding result.
  • the computer device may use an encoder for encoding, and the encoder may include more than one encoding stage, so that more than one encoding operation can be performed according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result.
  • the input of each encoding stage of the encoder is the calculation result of the three-dimensional object detection result and the three-dimensional edge detection result after the output of the previous encoding stage is calculated.
  • the output of each encoding stage of the encoder is a feature map of the three-dimensional image, and the output feature maps of the encoding stages differ from one another.
  • the computer equipment can use the feature map output in the last encoding stage as the encoding result.
  • the computer device can implement the encoding process through a three-dimensional edge fine detection network (Joint Edge Refinement Network).
  • the three-dimensional edge fine detection network includes an encoder; the encoder may include four encoding stages, each encoding stage may include two convolution modules, and each convolution module may include a convolution layer, an activation function layer, and a normalization layer.
  • the activation function may specifically be a ReLU function, etc.
  • the normalization may be group normalization (Group Normalization), etc.
  • the model structure shown in FIG. 2 is only an example and does not limit the structure of the three-dimensional edge fine detection network; the actual three-dimensional edge fine detection network may include more or fewer components than those shown in FIG. 2, and the parameters of the structures shown in FIG. 2 may also differ.
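  • A minimal sketch of an encoder of this form is shown below, assuming 3D convolutions, ReLU activations, and group normalization as described; the channel widths, group count, and input size are illustrative assumptions rather than values given in this application.

```python
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch, groups=8):
    """One convolution module: a convolution layer, an activation function
    layer (ReLU), and a normalization layer (group normalization)."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.GroupNorm(groups, out_ch),
    )

class EncoderStage(nn.Module):
    """One encoding stage consisting of two convolution modules."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(conv_module(in_ch, out_ch),
                                   conv_module(out_ch, out_ch))

    def forward(self, x):
        return self.block(x)

# Four encoding stages; the channel progression is an assumption.
channels = [3, 16, 32, 64, 128]
encoder = nn.ModuleList(EncoderStage(channels[i], channels[i + 1]) for i in range(4))

x = torch.randn(1, 3, 16, 64, 64)  # (batch, RGB channels, depth, height, width)
stage_outputs = []
for stage in encoder:
    x = stage(x)
    stage_outputs.append(x)          # each stage outputs a different feature map
encoding_result = stage_outputs[-1]  # feature map of the last encoding stage
```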
  • In an embodiment, encoding more than once according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result includes: performing a dot multiplication of the color feature map of the three-dimensional image with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and then performing the current encoding; and performing a dot multiplication of the output of the current encoding with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and performing the next encoding, until the last encoding.
  • object detection is to identify the area where the object in the image is located
  • edge detection is to identify the pixels in the image whose gray level changes significantly. Since edges usually lie between the object and the background, object detection and edge detection can be considered to share a certain similarity. Assuming the object detection result is D_obj and the edge detection result is D_edg, the two results satisfy the following logical relationship:
  • F(g(I)·D_obj + D_edg) = D′_edg, where F(·) and g(·) are different edge detection operators, I is the input image feature, and D′_edg is an edge detection result more accurate than D_edg. This can be broadly understood as follows: the intersection of object detection and edge detection (i.e. the dot multiplication operation) is edge detection, and the union of the two (i.e. the addition operation) is object detection. Thus g(I)·D_obj + D_edg yields an edge detection result, and applying an edge detection operator to that result yields a more accurate edge detection result.
  • Specifically, the computer device may perform a dot multiplication of the color feature map of the three-dimensional image with the three-dimensional object detection result, add the result to the three-dimensional edge detection result, and use the operation result as the input of the first encoding stage; the feature map output by that encoding stage is then dot-multiplied with the three-dimensional object detection result and added to the three-dimensional edge detection result, and the operation result is used as the input of the next encoding stage, until the final encoding stage outputs the encoding result.
  • For the first encoding, the feature map of the three-dimensional image may be the original RGB color-channel feature map of the three-dimensional image; for subsequent encodings, the feature map output by the previous encoding is used.
  • the computer device can implement, through a mutual learning module (Mutual, M), the operation of dot-multiplying the feature map of the three-dimensional image with the three-dimensional object detection result and then adding the three-dimensional edge detection result. That is, the inputs of the mutual learning module (M) are the feature map of the three-dimensional image (F), the three-dimensional object detection result (O), and the three-dimensional edge detection result (E), and the output is a new feature map of the three-dimensional image (F). The mutual learning module (M) dot-multiplies the feature map (F) with the three-dimensional object detection result (O), adds the result to the three-dimensional edge detection result (E), and outputs the new feature map (F). For the first mutual learning module (M), the feature map (F) is the color feature map of the three-dimensional image; for subsequent mutual learning modules (M), the feature map (F) is the feature map output by the previous encoding.
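  • A sketch of this mutual learning operation (F′ = F·O + E, with element-wise multiplication and addition) is given below; the tensor shapes are hypothetical.

```python
import torch

def mutual_learning(feature_map, object_result, edge_result):
    """Mutual learning module (M): dot-multiply the feature map (F) with the
    three-dimensional object detection result (O), then add the
    three-dimensional edge detection result (E)."""
    return feature_map * object_result + edge_result

# Hypothetical single-channel volumes of size 16x64x64.
F = torch.randn(1, 1, 16, 64, 64)  # feature map of the three-dimensional image
O = torch.rand(1, 1, 16, 64, 64)   # three-dimensional object detection result
E = torch.rand(1, 1, 16, 64, 64)   # three-dimensional edge detection result
F_new = mutual_learning(F, O, E)   # new feature map fed to the next encoding stage
```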
  • Referring to FIG. 3, the computer device can input the color feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result into a mutual learning module (M), whose output is fed to the first encoding stage of the encoder; the output of the first encoding stage is input to a mutual learning module (M) together with the three-dimensional object detection result and the three-dimensional edge detection result, whose output is fed to the second encoding stage; the output of the second encoding stage is input to a mutual learning module (M) together with the three-dimensional object detection result and the three-dimensional edge detection result, whose output is fed to the third encoding stage; the output of the third encoding stage is input to a mutual learning module (M) together with the three-dimensional object detection result and the three-dimensional edge detection result, whose output is fed to the fourth encoding stage; and the output of the fourth encoding stage is used as the encoding result.
  • the model structure shown in FIG. 3 is only an example and does not limit the structure of the three-dimensional edge fine detection network; the actual three-dimensional edge fine detection network may include more or fewer components than those shown in FIG. 3, and the parameters of the structures shown in FIG. 3 may also differ.
  • In this case, in g(I)·D_obj + D_edg, I is the image feature from the previous encoding stage, the encoding itself can be regarded as the edge detection operator, and g(I) is the image feature output by the previous encoding stage.
  • In the above embodiment, the feature map of the three-dimensional image is dot-multiplied with the three-dimensional object detection result and then added to the three-dimensional edge detection result before encoding, and the data is encoded more than once, that is, it passes through more than one level of encoding operation, so that the resulting encoding output represents the features more accurately.
  • S108 Perform decoding according to the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain an optimized three-dimensional edge detection result of the three-dimensional image.
  • the three-dimensional edge detection result obtained by stacking the two-dimensional edge detection result in S104 is a relatively accurate detection result.
  • the optimized three-dimensional edge detection result is more accurate than the three-dimensional edge detection result obtained by stacking the two-dimensional edge detection results, and fits the real edge of the object more closely.
  • the optimized three-dimensional edge detection result is not limited to a result obtained by applying an optimization algorithm to the three-dimensional edge detection result stacked from the two-dimensional edge detection results; it may also be obtained by applying the stacked three-dimensional edge detection result in the specific process of performing edge detection on the three-dimensional image.
  • the decoding is performed based on three different types of data: the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result. Further, it is also possible to decode the operation result after performing certain operations on these three types of data. In this way, during the decoding process, more and more useful information can be obtained by referring to the three-dimensional object detection results and the three-dimensional edge detection results.
  • the computer device may use a decoder to decode according to the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain the decoding result, that is, to obtain the optimized three-dimensional edge detection result of the three-dimensional image. Further, the computer device can perform a dot multiplication operation on the encoding result and the three-dimensional object detection result, and then add it to the three-dimensional edge detection result, and then use a decoder to decode the calculation result of the foregoing operation to obtain the decoding result, that is, to obtain the three-dimensional image Optimized 3D edge detection results.
  • the optimized three-dimensional edge detection result of the three-dimensional image may be a three-dimensional image including two pixel values. Among them, one kind of pixel value indicates that the corresponding pixel is an edge pixel, and the other kind of pixel value indicates that the corresponding pixel is a non-edge pixel.
  • the optimized three-dimensional edge detection result of the three-dimensional image may be a three-dimensional probability matrix.
  • the probability value at each matrix position represents the probability that the corresponding pixel of the three-dimensional image is an edge pixel; when the probability is greater than a preset threshold, the pixel is regarded as an edge pixel.
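  • For example, if the optimized result is such a probability matrix, a binary edge map could be obtained by thresholding it; the 0.5 threshold below is only an assumed example.

```python
import numpy as np

prob_volume = np.random.rand(16, 64, 64)          # hypothetical 3D probability matrix
edge_mask = (prob_volume > 0.5).astype(np.uint8)  # 1 = edge pixel, 0 = non-edge pixel
```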
  • S108 includes: performing decoding more than once according to the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result, where the input of each decoding includes the result of operating on the output of the previous decoding with the three-dimensional object detection result and the three-dimensional edge detection result; and taking the output of the last decoding as the optimized three-dimensional edge detection result of the three-dimensional image.
  • the computer device may use a decoder for decoding, and the decoder may include more than one decoding stage, so that more than one decoding operation can be performed according to the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result. Since the input of the decoder undergoes multi-level decoding operations, the features extracted from the encoding can be accurately mapped to the output space. Among them, the input of each decoding stage of the decoder includes the operation result of the three-dimensional object detection result and the three-dimensional edge detection result calculated on the output of the previous decoding stage.
  • In this way, at each decoding stage, the three-dimensional object detection result and the three-dimensional edge detection result can be referenced for decoding, thereby improving the effectiveness of the decoding.
  • the output of each decoding stage of the decoder is the feature map of the three-dimensional image, and the output feature map of each decoding stage of the decoder is different.
  • the output space may be the detection result of whether each pixel belongs to a three-dimensional edge.
  • the feature map output in the last decoding stage may be an optimized three-dimensional edge detection result of the three-dimensional image.
  • the optimized three-dimensional edge detection result may specifically be a classification map of each pixel of the three-dimensional image.
  • the pixel value of the pixel on the classification map indicates the category to which the corresponding pixel of the three-dimensional image belongs. There are two categories here, one is the category that belongs to the edge, and the other is the category that does not belong to the edge.
  • the pixel value of the pixel on the classification map includes two types (0 and 1), 0 indicates that the corresponding pixel of the three-dimensional image is not an edge pixel, and 1 indicates that the corresponding pixel of the three-dimensional image is an edge pixel.
  • the two processes of encoding and decoding the three-dimensional graphics are the process of performing three-dimensional edge detection on the three-dimensional image to determine whether each pixel in the three-dimensional image is a pixel of the three-dimensional edge.
  • the optimized three-dimensional edge detection result may specifically be a probability distribution map in which each pixel of the three-dimensional image is an edge pixel.
  • the pixel value of the pixel on the probability distribution map represents the probability that the corresponding pixel of the three-dimensional image is an edge pixel.
  • the three-dimensional edge fine detection network may include a decoder, which may include three decoding stages, and each decoding stage may include two convolution modules, and each convolution module It can include a convolutional layer, an activation function layer, and a normalization layer.
  • the activation function may specifically be a ReLU function, etc.
  • the normalization may be group normalization (Group Normalization), etc.
  • the input of each decoding may also include the output of the encoding stage that is skip-connected with the current decoding stage.
  • In this way, the image features extracted by the earlier encoding can be combined during decoding, thereby further improving the decoding accuracy. For example, suppose the encoder includes four stages and the decoder includes three stages; then the first encoding stage can be skip-connected with the third decoding stage, the second encoding stage with the second decoding stage, and the third encoding stage with the first decoding stage.
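  • A sketch of this skip-connection pattern (encoder stage 3 → decoder stage 1, encoder stage 2 → decoder stage 2, encoder stage 1 → decoder stage 3) is shown below; combining the skip-connected features by channel concatenation, and the channel widths used, are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoding stage: combines its input with the feature map of the
    skip-connected encoding stage (here by channel concatenation) and applies
    a 3D convolution module."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.GroupNorm(8, out_ch),
        )

    def forward(self, x, skip):
        return self.conv(torch.cat([x, skip], dim=1))

# Hypothetical outputs of encoder stages 1..4 (same spatial size for simplicity).
enc_feats = [torch.randn(1, c, 16, 64, 64) for c in (16, 32, 64, 128)]

dec1 = DecoderStage(in_ch=128, skip_ch=64, out_ch=64)  # skip from encoder stage 3
dec2 = DecoderStage(in_ch=64, skip_ch=32, out_ch=32)   # skip from encoder stage 2
dec3 = DecoderStage(in_ch=32, skip_ch=16, out_ch=16)   # skip from encoder stage 1

d1 = dec1(enc_feats[3], enc_feats[2])
d2 = dec2(d1, enc_feats[1])
d3 = dec3(d2, enc_feats[0])  # output of the last decoding stage
```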
  • the data is decoded more than once, that is, more than one level of decoding operation is performed, so that the resulting decoded output has a more accurate classification result for pixels.
  • performing more than one decoding based on the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result includes: performing a dot multiplication of the encoding result with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and then performing the current decoding; and performing a dot multiplication of the output of the current decoding with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and performing the next decoding, until the last decoding.
  • the computer device can perform a dot multiplication of the encoding result with the three-dimensional object detection result, add the result to the three-dimensional edge detection result, and use the operation result as the input of the first decoding stage; the feature map output by that decoding stage is then dot-multiplied with the three-dimensional object detection result and added to the three-dimensional edge detection result, and the operation result is used as the input of the next decoding stage, until the final decoding stage outputs the optimized three-dimensional edge detection result of the three-dimensional image.
  • Alternatively, the computer device may perform a dot multiplication of the encoding result with the three-dimensional object detection result, add the result to the three-dimensional edge detection result, and use the operation result together with the output of the encoding stage skip-connected to the first decoding stage as the input of the first decoding stage; the feature map output by that decoding stage is then dot-multiplied with the three-dimensional object detection result and added to the three-dimensional edge detection result, and the operation result, together with the output of the encoding stage skip-connected to the current decoding stage, is used as the input of the next decoding stage, until the last decoding stage produces its output.
  • In the above embodiment, the feature map of the three-dimensional image is dot-multiplied with the three-dimensional object detection result and then added to the three-dimensional edge detection result before decoding. This makes the decoding focus on the area where the object of interest is located, and the potential edge detection results already present are reinforced in the input feature map, which can improve the decoding accuracy.
  • the three-dimensional edge detection method further includes: processing the encoding result through more than one hole convolution with different sampling rates to obtain more than one feature map, the sizes of the more than one feature maps being different; and concatenating the more than one feature maps and performing a convolution operation to obtain a multi-scale learning result.
  • Performing more than one decoding based on the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result includes: performing more than one decoding based on the multi-scale learning result, the three-dimensional object detection result, and the three-dimensional edge detection result.
  • Hole convolution (Atrous Convolution), also known as dilated convolution, introduces a parameter called the dilation rate into the standard convolution layer, which defines the spacing between the values sampled by the convolution kernel when it processes the data.
  • the purpose of hole convolution is to provide a larger receptive field without pooling (a pooling layer would cause information loss) while keeping the amount of computation equivalent.
  • the computer device may process the encoding result through more than one hole convolution with different sampling rates to obtain more than one feature map. Since different sampling rates may correspond to different convolution kernel sizes and/or different dilation rates, the feature maps obtained in this way differ in size.
  • the computer device then concatenates the more than one feature maps and performs a convolution operation to obtain the multi-scale learning result.
  • the multi-scale learning result can also be a feature map of a three-dimensional image.
  • the computer device can use a multi-scale learning module to implement "processing the encoding result through more than one hole convolution with different sampling rates to obtain more than one feature map; the size of more than one feature map is Each is different; connect more than one feature map and perform convolution operation to obtain a multi-scale learning result",
  • the multi-scale learning module can specifically be an atrous spatial pyramid pooling (ASPP) structure.
  • the three-dimensional edge fine detection network also includes an ASPP module located between the encoder and the decoder. The input of the ASPP module is the coding result output by the fourth coding stage. After the ASPP module extracts features of more than one scale from the input, it outputs the multi-scale learning result.
  • the encoding result is operated through multi-scale hole convolution, so that richer multi-scale and multi-view image features can be extracted, which is helpful for subsequent decoding operations.
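  • A sketch of such a multi-scale module built from 3D hole (dilated) convolutions is given below; the number of branches, the dilation rates, and the channel counts are assumptions, and the branches keep the same spatial size via padding so that their feature maps can be concatenated along the channel dimension.

```python
import torch
import torch.nn as nn

class MultiScaleModule(nn.Module):
    """Process the encoding result with several hole (dilated) convolutions at
    different sampling rates, concatenate the resulting feature maps, and fuse
    them with a convolution to obtain the multi-scale learning result."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv3d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # one feature map per sampling rate
        return self.fuse(torch.cat(feats, dim=1))        # multi-scale learning result

aspp = MultiScaleModule(in_ch=128, out_ch=128)
encoding_result = torch.randn(1, 128, 16, 64, 64)
multi_scale_result = aspp(encoding_result)
```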
  • performing more than one decoding based on the multi-scale learning result, the three-dimensional object detection result, and the three-dimensional edge detection result includes: dot-multiplying the multi-scale learning result with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and then performing the current decoding; and dot-multiplying the output of the current decoding with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and performing the next decoding, until the last decoding.
  • In another embodiment, performing more than one decoding based on the multi-scale learning result, the three-dimensional object detection result, and the three-dimensional edge detection result includes: dot-multiplying the multi-scale learning result with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and performing the current decoding together with the output of the corresponding intermediate (skip-connected) encoding; and dot-multiplying the output of the current decoding with the three-dimensional object detection result, adding the result to the three-dimensional edge detection result, and performing the next decoding together with the output of the encoding preceding that intermediate encoding, until the last decoding.
  • the computer device can dot-multiply the multi-scale learning result with the three-dimensional object detection result, add the result to the three-dimensional edge detection result, and use the operation result together with the output of the encoding stage skip-connected to the first decoding stage as the input of the first decoding stage; the feature map output by that decoding stage is then dot-multiplied with the three-dimensional object detection result and added to the three-dimensional edge detection result, and the operation result, together with the output of the encoding stage skip-connected to the current decoding stage, is used as the input of the next decoding stage, until the last decoding stage produces its output.
  • Referring to FIG. 3, the computer device can input the multi-scale learning result, the three-dimensional object detection result, and the three-dimensional edge detection result into a mutual learning module (M), whose output is input to the first decoding stage of the decoder together with the output of the third encoding stage; the output of the first decoding stage is input to a mutual learning module (M) together with the three-dimensional object detection result and the three-dimensional edge detection result, and the output of that mutual learning module (M) is input to the second decoding stage together with the output of the second encoding stage; the output of the second decoding stage is input to a mutual learning module (M) together with the three-dimensional object detection result and the three-dimensional edge detection result, and the output of that mutual learning module (M) is input to the third decoding stage together with the output of the first encoding stage; and the output of the third decoding stage is the optimized three-dimensional edge detection result (Subtle 3D Edge).
  • In this way, the feature maps output by the skip-connected encoding stages participate in the decoding, so that the input of each subsequent decoding not only carries refined image features but also incorporates the image features extracted by the earlier encoding, thereby further improving the decoding accuracy.
  • an optimized three-dimensional edge detection result can be obtained, thereby obtaining a fine three-dimensional edge (Subtle 3D Edge).
  • This fine three-dimensional edge can provide more and richer features and auxiliary results from other perspectives for various medical image tasks such as segmentation, detection, or tracking, and facilitate the realization of more accurate medical image-assisted diagnosis.
  • In the above three-dimensional edge detection method, the two-dimensional object detection results and two-dimensional edge detection results of the two-dimensional slices are stacked into a three-dimensional object detection result and a three-dimensional edge detection result, encoding is then performed with reference to the three-dimensional object detection result and the three-dimensional edge detection result, and decoding is then performed with reference to the three-dimensional object detection result and the three-dimensional edge detection result to obtain the optimized three-dimensional edge detection result of the three-dimensional image.
  • In this way, the two-dimensional detection results of the two-dimensional slices of the three-dimensional image are used in the three-dimensional edge detection, so that the characteristics of the two-dimensional detection results and the continuity of the spatial structure of the three-dimensional data complement each other to improve the accuracy of three-dimensional edge detection; moreover, the two-dimensional detection results include both object detection and edge detection results, and the two kinds of detection results can also learn from and promote each other, which further improves the accuracy of three-dimensional edge detection.
  • S106 and S108 in the foregoing embodiment may be implemented by a three-dimensional edge fine detection network (Joint Edge Refinement Network).
  • the three-dimensional edge fine detection network may include an encoder and a decoder.
  • the encoder may include multiple encoding stages, and the decoder may include multiple decoding stages.
  • the input of the first encoding stage can be the result of operating on the color feature map of the three-dimensional image with the three-dimensional object detection result and the three-dimensional edge detection result, and the input of each non-first encoding stage can be the result of operating on the output of the previous encoding stage with the three-dimensional object detection result and the three-dimensional edge detection result.
  • the input of the first decoding stage can include the result of operating on the encoding result with the three-dimensional object detection result and the three-dimensional edge detection result, and the input of each non-first decoding stage can include the result of operating on the output of the previous decoding stage with the three-dimensional object detection result and the three-dimensional edge detection result.
  • the operation on the three types of data included in the input of an encoding (or decoding) stage can be implemented through the mutual learning module.
  • the input of each decoding stage may also include the output of the encoding stage skip-connected with the current decoding stage.
  • the three-dimensional edge fine detection network may also include a multi-scale learning module (such as ASPP) located between the encoder and the decoder.
  • the input of the multi-scale learning module is the output of the last coding stage.
  • the input of the first decoding stage may be the output of the multi-scale learning module, the calculation result of the three-dimensional object detection result, and the three-dimensional edge detection result.
  • the three-dimensional edge fine detection network provided by the above embodiments can be obtained by deep supervision (Deep Supervision) learning through training samples with training labels.
  • each structure included in the network can be learned through deep supervision (Deep Supervision) learning.
  • the training samples input to the three-dimensional edge fine detection network are the three-dimensional object detection results stacked from the two-dimensional object detection results of the two-dimensional slices of the three-dimensional image samples, and the three-dimensional edge detection results stacked from the two-dimensional edge detection results of those two-dimensional slices.
  • the training label of the training sample is the three-dimensional edge label of the three-dimensional image sample.
  • the computer equipment can construct a loss function according to the training samples and training labels, and supervise the training of the three-dimensional edge fine detection network.
  • the above-mentioned supervised training loss function may be the Dice Loss function, computed over all N pixels of the three-dimensional image, where p_i is the predicted probability that the i-th pixel is an edge pixel and y_i is the training label of the i-th pixel.
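  • A sketch using the standard Dice Loss formulation over the N pixels, with predicted edge probabilities p_i and labels y_i, is given below; the smoothing constant and the exact algebraic form are assumptions, since the formula itself is not reproduced above.

```python
import torch

def dice_loss(p, y, eps=1e-6):
    """Standard Dice loss over all N pixels of the three-dimensional image.
    p: probability that each pixel is an edge pixel, shape (N,)
    y: training label of each pixel (1 = edge, 0 = non-edge), shape (N,)"""
    intersection = (p * y).sum()
    return 1.0 - (2.0 * intersection + eps) / (p.sum() + y.sum() + eps)

p = torch.rand(16 * 64 * 64)                  # hypothetical predicted probabilities
y = (torch.rand(16 * 64 * 64) > 0.9).float()  # hypothetical edge labels
loss = dice_loss(p, y)
```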
  • In an embodiment, obtaining the two-dimensional object detection result and the two-dimensional edge detection result of each two-dimensional slice of the three-dimensional image includes: obtaining the two-dimensional initial object detection result and the two-dimensional initial edge detection result of each two-dimensional slice of the three-dimensional image; for each two-dimensional slice of the three-dimensional image, performing a dot multiplication of the color feature map of the slice with the two-dimensional initial edge detection result of the slice, adding the two-dimensional initial object detection result of the slice, and then performing object detection to obtain the two-dimensional target object detection result of the slice; and performing a convolution operation on each two-dimensional slice of the three-dimensional image and obtaining the two-dimensional target edge detection result of each slice according to the output of the convolution operation and the two-dimensional object detection result of the corresponding slice.
  • the computer equipment can use different networks to independently perform object detection and edge detection.
  • the object detection model (Object Detection Module) is used for object detection
  • the edge detection model (Edge Detection Module) is used for edge detection.
  • the features extracted in object detection and edge detection can be transferred to each other in the process of network training and network use.
  • the computer equipment can realize mutual transmission of features extracted in object detection and edge detection through mutual learning modules.
  • the mutual learning module may specifically perform the operation g(I)·D_obj + D_edg, that is, the image feature and the object detection result are dot-multiplied and then added to the edge detection result.
  • the computer device may supervisely pre-train the object detection model and the edge detection model respectively. After pre-training, the two models are then connected through a mutual learning module to obtain a joint object and edge detection network (Mutual Object and Edge Detection Network), and then the joint object and edge detection network is further trained.
  • the object and edge joint detection network may add a mutual learning module before the object detection model and/or add a mutual learning module after the edge detection model.
  • the object detection model and the edge detection model obtained by the pre-training are used to obtain the initial two-dimensional detection result of the two-dimensional image according to the two-dimensional image.
  • the object and edge joint detection network obtained by further training is used to obtain the two-dimensional detection result of the target of the two-dimensional image according to the two-dimensional image.
  • the two-dimensional detection result of the target is used to stack into a three-dimensional detection result, which is used in steps such as S106 and S108.
  • Specifically, the computer device can input each two-dimensional slice of the three-dimensional image into the pre-trained object detection model to obtain the two-dimensional initial object detection result of each slice, and input each two-dimensional slice of the three-dimensional image into the pre-trained edge detection model to obtain the two-dimensional initial edge detection result of each slice.
  • the computer equipment inputs the two-dimensional slices of the three-dimensional image into the object and edge joint detection network.
  • In the object and edge joint detection network, the mutual learning module in front of the object detection model dot-multiplies the color feature map of the two-dimensional slice with the two-dimensional initial edge detection result of the slice and adds the two-dimensional initial object detection result of the slice; the result is input to the object detection model in the object and edge joint detection network, which outputs the two-dimensional target object detection result of the slice.
  • the edge detection model in the object and edge joint detection network performs convolution operations on the two-dimensional slice, and the mutual learning module after the edge detection model dot-multiplies the output of the convolution operation with the two-dimensional target object detection result and then adds the output of the convolution operation, thereby obtaining the two-dimensional target edge detection result of the slice.
  • In this way, object detection and edge detection learn from and promote each other, so that the obtained two-dimensional detection results are more accurate and the reference data used in the subsequent three-dimensional detection is therefore more accurate.
  • In an embodiment, for each two-dimensional slice of the three-dimensional image, dot-multiplying the color feature map of the slice with the two-dimensional initial edge detection result of the slice, adding the two-dimensional initial object detection result of the slice, and then performing object detection to obtain the two-dimensional target object detection result of the slice includes: for each frame of two-dimensional slice of the three-dimensional image, performing the following steps: dot-multiplying the color feature map of the slice with the two-dimensional initial edge detection result of the slice and adding the two-dimensional initial object detection result of the slice to obtain the data to be processed; and encoding the data to be processed more than once and decoding it more than once to obtain the two-dimensional target object detection result of the slice output by the last decoding.
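  • Under the interpretation above, a sketch of how the data to be processed for the two-dimensional object branch might be formed is shown below; the 2D tensor shapes are hypothetical.

```python
import torch

# Hypothetical inputs for one two-dimensional slice.
color_feat = torch.randn(1, 3, 64, 64)  # RGB color feature map of the slice
init_edge = torch.rand(1, 1, 64, 64)    # two-dimensional initial edge detection result
init_obj = torch.rand(1, 1, 64, 64)     # two-dimensional initial object detection result

# Dot-multiply the color feature map with the initial edge result, then add the
# initial object result; the sum is the data to be processed by the object
# detection encoder-decoder.
data_to_process = color_feat * init_edge + init_obj
```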
  • the computer device may use an encoder for encoding and a decoder for decoding; the encoder may include more than one encoding stage; the decoder may include more than one decoding stage.
  • the computer device can perform a dot multiplication operation on the color feature map of the two-dimensional segment and the two-dimensional initial object detection result of the two-dimensional segment, and then add it to the two-dimensional initial object detection result of the two-dimensional segment as the data to be processed , And then use the data to be processed as the input of the first encoding stage; then use the feature map output from the encoding stage as the input of the next encoding stage until the final encoding stage outputs the encoding result. Then use the encoding result as the input of the first decoding stage; then use the feature map output from the decoding stage as the input of the next decoding stage until the final decoding stage outputs the two-dimensional target object detection result.
  • the encoder in this embodiment and the encoder in S106 are different encoders, and their structures are different, and the dimensions of the encoded data are also different.
  • the decoder in this embodiment and the decoder in S108 are different decoders, and their structures are different, and the dimensions of the decoded data are also different.
  • In this embodiment, an encoding stage may also be skip-connected with a decoding stage.
  • In this case, the input of the first decoding stage of the decoder can be the output of the last encoding stage together with the output of the skip-connected encoding stage, and the input of each subsequent decoding stage can be the output of the previous decoding stage together with the output of the skip-connected encoding stage.
  • In the above embodiment, the result of operating on the initial detection results is used as the object of encoding, so that the initial detection results can be referred to during encoding and more useful information can be extracted by focusing on a specific area; encoding more than once makes the feature representation more accurate, and decoding more than once makes the resulting decoded output classify pixels more accurately.
  • encoding and decoding the data to be processed more than once to obtain the two-dimensional target object detection result output by the last decoding includes: encoding the data to be processed more than once to obtain the object detection encoding result output by the last encoding; processing the object detection encoding result with more than one atrous (dilated) convolution having different sampling rates to obtain more than one feature map, the feature maps covering different scales; connecting the feature maps and performing a convolution operation to obtain a multi-scale learning result; and decoding the multi-scale learning result more than once to obtain the two-dimensional target object detection result output by the last decoding.
  • the computer device may use an encoder for encoding and a decoder for decoding; the encoder may include more than one encoding stage; the decoder may include more than one decoding stage.
  • the computer device can multiply the colour feature map of the two-dimensional slice element-wise with the slice's two-dimensional initial object detection result, add the product to the initial result to form the data to be processed, feed the data to be processed into the first encoding stage, and then use the feature map output by each encoding stage as the input of the next encoding stage until the last encoding stage outputs the encoding result.
  • the object detection encoding result is then processed by more than one atrous convolution with different sampling rates to obtain more than one feature map covering different scales; the feature maps are connected and a convolution operation is performed to obtain the multi-scale learning result.
  • This process can be implemented through a multi-scale learning module.
  • the multi-scale learning module may specifically be an Atrous Spatial Pyramid Pooling (ASPP) structure.
  • the multi-scale learning result is used as the input of the first decoding stage; then the feature map output by the decoding stage is used as the input of the next decoding stage, until the final decoding stage outputs the two-dimensional target object detection result.
  • the encoding stages may also be skip-connected with the decoding stages.
  • the input of the first decoding stage of the decoder can be the output of the multi-scale learning module together with the output of the skip-connected encoding stage; the input of each subsequent decoding stage can be the output of the previous decoding stage together with the output of the skip-connected encoding stage.
  • operating on the encoding result with multi-scale atrous convolutions extracts richer multi-scale, multi-view image features, which helps the subsequent decoding operations.
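  • A minimal sketch of such a multi-scale (ASPP-style) module, assuming a PyTorch implementation; the channel counts and dilation rates are illustrative assumptions rather than values fixed by this disclosure. In the usual ASPP formulation the parallel branches keep the same spatial resolution (so they can be concatenated) while covering different receptive fields, i.e. different scales.

```python
import torch
import torch.nn as nn

class ASPP2D(nn.Module):
    """Parallel atrous (dilated) convolutions with different rates, concatenated
    and fused by a 1x1 convolution to give the multi-scale learning result."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]   # one feature map per sampling rate
        return self.fuse(torch.cat(feats, dim=1))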
  • the input (Input) of the object detection model is a two-dimensional slice of the three-dimensional image
  • the output (Output) is the object detection result of the two-dimensional slice.
  • the object detection model includes an encoder, a decoder, and an ASPP module located between the encoder and the decoder.
  • the encoder includes an input layer and four encoding stages.
  • the input layer includes a residual module, and the four coding stages respectively include 4, 6, 6, and 4 residual modules.
  • the input and output of each encoding stage are connected by an addition operation. Each encoding stage is followed by a convolution operation (kernel size such as 3×3) and an average pooling operation (kernel size such as 2×2), which downsample the feature map (e.g. to half its size).
  • the decoder includes four decoding stages and an output convolutional layer.
  • each decoding stage includes two residual modules. Before each decoding stage there is an upsampling operation (such as twofold upsampling) and a convolution operation (kernel size such as 1×1).
  • the encoding stages and decoding stages can be skip-connected, and the input layer and output layer can also be skip-connected.
  • each residual module includes two convolution modules, and each convolution module includes a convolution layer, a normalization layer, and an activation function layer.
  • the normalization may be Batch Normalization.
  • the activation function may be a ReLU function.
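  • A minimal sketch of the residual module described above, assuming a PyTorch implementation; the channel count is an assumption, and the 3×3 kernel follows the convention used elsewhere in this description.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # one "convolution module": convolution layer -> normalization layer -> activation layer
    def __init__(self, ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch),          # Batch Normalization, as mentioned above
            nn.ReLU(inplace=True),       # ReLU activation, as mentioned above
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

class ResidualModule(nn.Module):
    # two convolution modules whose output is added back to the module input
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(ConvBlock(ch), ConvBlock(ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)
```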
  • when training the object detection model, the loss function can be denoted as L_seg for supervised training.
  • the model structure shown in FIG. 5 is only an example and does not limit the structure of the object detection model; the actual object detection model may include more or fewer components than those shown in FIG. 5, and the parameters of the structures shown in FIG. 5 may also differ.
  • the computer device can construct a loss function according to the training sample (two-dimensional image) and the training label (object detection label) of the training sample, and train the object detection model in a supervised manner.
  • the above-mentioned supervised training loss function may be a two-class (binary) cross-entropy loss function, specifically L_seg = -[ y·log(p) + (1-y)·log(1-p) ], where y is the pixel-level label of the image and p is the probability, predicted by the model, that a pixel labelled 1 belongs to that category. A label of 1 can specifically indicate that the pixel belongs to the object.
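  • A minimal sketch of this pixel-level two-class cross-entropy loss, assuming a PyTorch implementation; the function name l_seg is hypothetical.

```python
import torch
import torch.nn.functional as F

def l_seg(pred_prob: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    # pred_prob: per-pixel probability p that the pixel belongs to the object (label 1)
    # label:     per-pixel ground-truth y in {0, 1}
    # L = -[ y*log(p) + (1 - y)*log(1 - p) ], averaged over all pixels
    return F.binary_cross_entropy(pred_prob, label.float())
```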
  • performing a convolution operation on each two-dimensional slice of the three-dimensional image and obtaining the two-dimensional target edge detection result of each slice from the output of the convolution operation and the corresponding slice's two-dimensional object detection result includes: for each frame of two-dimensional slice of the three-dimensional image, performing more than one stage of convolution operations on the slice; multiplying the output of each stage element-wise with the slice's two-dimensional initial object detection result and adding the product to the output of the current stage to obtain the stage detection result; and combining the detection results of the stages to obtain the slice's two-dimensional target edge detection result.
  • specifically, the computer device can perform more than one stage of convolution operations on each frame of two-dimensional slice, multiply the output of each stage element-wise with the slice's two-dimensional initial object detection result, add the product to the output of the current stage to obtain the stage detection result, and combine the detection results of the stages to obtain the slice's two-dimensional target edge detection result.
  • each stage includes more than one convolutional layer.
  • the detection result of each stage can also serve as a two-dimensional target edge detection result of the slice. Combining the detection results of the stages may be an element-wise addition of the stage detection results. In other embodiments, the output of each stage may be the element-wise sum of the outputs of the convolutional layers included in that stage, where the output of each convolutional layer may first undergo a convolution operation before the element-wise addition; and before the output of each stage is multiplied element-wise with the slice's two-dimensional initial object detection result, it may undergo a downsampling operation, a convolution operation and an upsampling operation.
  • performing more than one stage of convolution operations on the data and combining each stage's output with the object detection result to obtain that stage's edge detection result allows the object detection result to improve the accuracy of edge detection; combining the detection results of all stages to obtain the slice's two-dimensional target edge detection result integrates the information extracted at each stage and further improves the accuracy of edge detection.
  • the input (Input) of the edge detection model is a two-dimensional slice of the three-dimensional image
  • the output (Output) is an edge detection result of the two-dimensional slice.
  • the edge detection model includes more than one convolutional layer, and these convolutional layers are divided into more than one stage.
  • for example, referring to FIG. 6, the edge detection model includes 16 convolutional layers with a kernel size of 3×3, divided into 5 stages: the first stage includes 2 convolutional layers, the second and third stages each include 3 convolutional layers, and the fourth and fifth stages each include 4 convolutional layers. Each convolutional layer in a stage is followed by a convolution operation with a kernel size of 1×1, and the results are added together to obtain the feature map of that stage. After a 1×1 convolution operation and twofold upsampling, this feature map is input, together with the object detection result, into the mutual learning module M mentioned above; the five outputs thus obtained are connected to obtain the edge detection result of the two-dimensional slice. After each stage obtains its feature map, a pooling operation may downsample the feature map by a factor of two.
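  • A minimal sketch of one side-output path of such a stage, assuming a PyTorch implementation; the class and function names, the bilinear upsampling mode and the channel counts are assumptions. It shows the mutual learning fusion g(I)·D_obj + D_edg, where, for a stage of the edge model, g(I) and D_edg are the same upsampled stage output (as described further below).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mutual_learning(g_i: torch.Tensor, d_obj: torch.Tensor, d_edg: torch.Tensor) -> torch.Tensor:
    # M(g(I), D_obj, D_edg) = g(I) * D_obj + D_edg (element-wise operations)
    return g_i * d_obj + d_edg

class StageSideOutput(nn.Module):
    """1x1 convolution on the stage feature map, upsampling back to the slice
    resolution, then fusion with the object detection map via mutual learning."""
    def __init__(self, in_ch: int, up_factor: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, 1, kernel_size=1)
        self.up_factor = up_factor

    def forward(self, stage_feat: torch.Tensor, d_obj: torch.Tensor) -> torch.Tensor:
        g_i = F.interpolate(self.reduce(stage_feat), scale_factor=self.up_factor,
                            mode="bilinear", align_corners=False)
        # for the edge model's stages, g(I) and D_edg are both the stage output
        return mutual_learning(g_i, d_obj, g_i)
```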
  • the model structure shown in FIG. 6 is only an example and does not limit the structure of the edge detection model; the actual edge detection model may include more or fewer components than those shown in FIG. 6, and the parameters of the structures shown in FIG. 6 may also differ.
  • in the mutual learning module M(g(I)·D_obj + D_edg), the variables take the following values: g(I) and D_edg are both the feature map output by the current stage after the convolution operation and upsampling, and D_obj is the object detection result output by the pre-trained object detection model.
  • when training the edge detection model, the loss function can be denoted as L_edge for supervised training. When building the loss function, a loss function can be constructed for each stage; the loss function of each stage may be used to train and update the model parameters of the current stage and of all preceding stages, or only the model parameters of the current stage.
  • specifically, the computer device can construct a supervised training loss function based on the training samples (two-dimensional images) and their training labels (edge detection labels), and train the edge detection model in a supervised manner.
  • the above-mentioned supervised training loss function may be a focal loss, specifically FL(p) = -α·(1-p)^γ·log(p), where p is the probability, predicted by the model, that a pixel labelled 1 belongs to that category, α is the weighting factor for the label-1 class, and γ is an adjustable focusing factor that controls the modulating factor (1-p)^γ. A label of 1 can specifically indicate that the pixel is an edge pixel.
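  • A minimal sketch of this focal loss for pixel-level edge labels, assuming a PyTorch implementation; the disclosure only states the term for label-1 pixels, so the symmetric term for label-0 pixels and the default values of α and γ are assumptions added for illustration.

```python
import torch

def l_edge_focal(p: torch.Tensor, y: torch.Tensor,
                 alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    # p: per-pixel probability of being an edge pixel (label 1); y: ground truth in {0, 1}
    # FL(p) = -alpha * (1 - p)^gamma * log(p) for edge pixels
    eps = 1e-6
    p = p.clamp(eps, 1.0 - eps)
    pos = -alpha * (1.0 - p) ** gamma * torch.log(p) * y
    neg = -(1.0 - alpha) * p ** gamma * torch.log(1.0 - p) * (1.0 - y)
    return (pos + neg).mean()
```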
  • the output of each stage after passing through M, its element-wise additions with the outputs of the preceding stages, and the concatenation of all stage outputs give a total of 6 outputs on which L_edge is computed for back-propagation and gradient updates. These 6 outputs include: the output of the first mutual learning module; the element-wise addition of the outputs of the first and second mutual learning modules; the element-wise addition of the outputs of the first, second and third mutual learning modules; the element-wise addition of the outputs of the first, second, third and fourth mutual learning modules; the element-wise addition of the outputs of the first, second, third, fourth and fifth mutual learning modules; and the connection (concatenation) of the outputs of the first, second, third, fourth and fifth mutual learning modules.
  • the pre-trained edge detection model may not include a mutual learning module; that is, the feature maps of the stages are added element-wise after the convolution operation and upsampling to obtain the edge detection result of the two-dimensional image.
  • the input (Input) of the object and edge joint detection network is a two-dimensional slice of a three-dimensional image.
  • each two-dimensional slice is first processed by a mutual learning module before being input into the object detection model in the object and edge joint detection network.
  • the mutual learning module includes three inputs, which are the two-dimensional slice of the three-dimensional image, the two-dimensional initial object detection result of the two-dimensional slice, and the two-dimensional initial edge detection result of the two-dimensional slice.
  • the two-dimensional initial object detection result of the two-dimensional segment is obtained through the pre-trained object detection model, and the two-dimensional initial edge detection result of the two-dimensional segment is obtained through the pre-trained edge detection model.
  • the three inputs included in the mutual learning module can also be the two-dimensional segmentation of the three-dimensional image, the two-dimensional initial object detection result of the two-dimensional segmentation, and the output of the edge detection model in the object and edge joint detection network.
  • the output of the object detection model in the object and edge joint detection network is a two-dimensional segmented two-dimensional target object detection result.
  • after the two-dimensional slice is input into the edge detection model of the object and edge joint detection network, the outputs of the stages of the edge detection model are processed by the mutual learning modules and then superimposed to obtain the two-dimensional target edge detection result of the slice.
  • the mutual learning module connected after each stage includes two inputs, which are the output of the stage and the two-dimensional initial object detection result of the two-dimensional slice.
  • both g(I) and D edg are the outputs of this stage, so there are only two inputs.
  • the pre-trained object detection model has the same model structure as the object detection model in the object and edge joint detection network, but the model parameters differ; the object detection model in the joint network is obtained by further training on the basis of the pre-trained object detection model. Likewise, the pre-trained edge detection model has the same model structure as the edge detection model in the object and edge joint detection network, but the model parameters differ; the edge detection model in the joint network is obtained by further training on the basis of the pre-trained edge detection model.
  • the model structure of the object detection model can refer to the model structure shown in FIG. 5, and the model structure of the edge detection model can refer to the model structure shown in FIG. 6.
  • when the pre-trained object detection model and edge detection model are linked through the mutual learning modules and trained further, the input of the mutual learning module connected before the object detection model can be the two-dimensional slice of the three-dimensional image, the output of the pre-trained object detection model and the output of the pre-trained edge detection model; or the two-dimensional slice, the output of the pre-trained object detection model and the real-time output of the current edge detection model. That is, in g(I)·D_obj + D_edg, D_obj is the fixed output of the pre-trained model, while D_edg may be either the real-time output of the model being trained or the fixed output of the pre-trained model. The input of the mutual learning module connected after each stage of the edge detection model can be the real-time output of that stage and the real-time output of the current object detection model, or the real-time output of that stage and the output of the pre-trained object detection model. That is, g(I) and D_edg in g(I)·D_obj + D_edg are the real-time outputs of the stages of the edge detection model, while D_obj may be either the real-time output of the model being trained or the fixed output of the pre-trained model.
  • after the two-dimensional target object detection result and the two-dimensional target edge detection result of each two-dimensional slice of the three-dimensional image are obtained, the two-dimensional target object detection results can be stacked into a three-dimensional object detection result, and the two-dimensional target edge detection results can be stacked into a three-dimensional edge detection result.
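  • A minimal sketch of the stacking step, assuming the per-slice results are held as PyTorch tensors ordered by their position along the divided (e.g. depth) dimension; the function name is hypothetical.

```python
import torch

def stack_slice_results(slice_results: list) -> torch.Tensor:
    # slice_results: list of per-slice maps of shape (H, W), ordered by depth;
    # stacking along a new leading axis yields a (D, H, W) volume that serves
    # as the 3D object detection result or the 3D edge detection result
    return torch.stack(slice_results, dim=0)
```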
  • the three-dimensional image, the three-dimensional object detection result and the three-dimensional edge detection result are then input into the three-dimensional edge fine detection network, which outputs the optimized three-dimensional edge detection result, i.e. a fine three-dimensional edge map.
  • the model structure of the three-dimensional edge fine detection network can refer to the model structure shown in Figure 2.
  • a two-dimensional convolutional neural network can learn features such as rich texture and structure from two-dimensional data, while a three-dimensional convolutional neural network can learn information about spatial structure continuity from three-dimensional data; the two are mutually complementary.
  • the edge detection task and the object detection task have some similarities, and these two tasks can also learn from each other and promote each other.
  • the embodiments provided in this application implement joint learning of multi-level and multi-scale features in two-dimensional and three-dimensional data to accurately detect the edges of three-dimensional objects.
  • the network structure involved in the embodiments provided in this application includes two stages. The first stage is a joint object and edge detection network.
  • This stage focuses on learning the rich structure, texture, edge, and semantic features of objects in a single two-dimensional image.
  • the second stage is a three-dimensional edge fine detection network. This stage combines the objects and edge detection results learned in the previous stage to further learn continuous and fine three-dimensional object edges. In this way, the embodiments provided in the present application can accurately detect the three-dimensional edge that fits the real edge of the three-dimensional object.
  • the computer equipment has also tested and compared the three-dimensional edge detection method based on the embodiment of the present application with multiple existing edge detection algorithms.
  • Existing edge detection algorithms such as Holistically-Nested Edge Detection, HED; Richer Convolutional Features for Edge Detection, RCF; and Bi-Directional Cascade Network for Perceptual Edge Detection, BDCN.
  • as shown in FIG. 7, this figure compares the detection result obtained by the three-dimensional edge detection method provided by the embodiments of this application with the detection results obtained by other edge detection algorithms. It includes one two-dimensional slice of a three-dimensional image (Original), the boundary detection label (Label), the detection result of the first existing edge detection algorithm (HED), the detection result of the second existing edge detection algorithm (RCF), the detection result of the third existing edge detection algorithm (BDCN), and the detection result of the three-dimensional edge detection method of this application (Proposed).
  • the detection result of the three-dimensional edge detection method provided by the embodiment of the present application is more refined and closer to the real edge of the object.
  • although the existing HED, RCF and BDCN algorithms can all detect object edges accurately to varying degrees, their edge detection results are relatively rough and do not fit the real edges well.
  • as shown in FIG. 8, this figure compares the edge detection results of 5 consecutive frames of two-dimensional slices obtained by the three-dimensional edge detection method provided by the embodiments of this application with the edge detection results of the two-dimensional edge detection algorithm RCF on the same 5 frames of two-dimensional slices. It can be seen from FIG. 8 that the detection results of the three-dimensional edge detection method provided by the embodiments of this application have good continuity. This is because the three-dimensional edge detection method provided by the embodiments of this application can recover information that is easily missed by two-dimensional edge detection algorithms by learning the spatial continuity between different images.
  • moreover, the computer device also compared the experimental results of the three-dimensional edge detection method of the embodiments of this application with the existing edge detection algorithms (HED and RCF) on edge detection metrics. Table 1 compares the three-dimensional edge detection method provided by the embodiments of this application with the existing two-dimensional edge detection algorithms HED and RCF on the edge detection metrics ODS (R\P\F) and OIS (R\P\F). It can be seen from Table 1 that the three-dimensional edge detection method provided by the embodiments of this application outperforms the existing two-dimensional edge detection algorithms on every edge detection metric.
  • a three-dimensional edge detection device 900 includes: an acquisition module 901, an encoding module 902, and a decoding module 903.
  • Each module included in the three-dimensional edge detection device can be implemented in whole or in part by software, hardware or a combination thereof.
  • the obtaining module 901 is used to obtain the two-dimensional object detection result and the two-dimensional edge detection result of each two-dimensional segment of the three-dimensional image; stack the two-dimensional object detection results into the three-dimensional object detection result, and stack the two-dimensional edge detection results It is the result of 3D edge detection.
  • the encoding module 902 is configured to perform encoding according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain the encoding result.
  • the decoding module 903 is configured to decode according to the encoding result, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain an optimized three-dimensional edge detection result of the three-dimensional image.
  • the encoding module 902 is further configured to perform more than one encoding according to the feature map of the three-dimensional image, the three-dimensional object detection result and the three-dimensional edge detection result; the input of each encoding is the operation result obtained by operating on the output of the previous encoding with the three-dimensional object detection result and the three-dimensional edge detection result; the outputs of the encodings differ from one another and are all feature maps of the three-dimensional image; and the feature map output by the last encoding is obtained as the encoding result.
  • the encoding module 902 is also configured to multiply the colour feature map of the three-dimensional image element-wise with the three-dimensional object detection result, add the three-dimensional edge detection result and perform the current encoding; and to multiply the output of the current encoding element-wise with the three-dimensional object detection result, add the three-dimensional edge detection result and perform the next encoding, until the last encoding.
  • the decoding module 903 is further configured to perform more than one decoding based on the encoding result, the three-dimensional object detection result and the three-dimensional edge detection result; the input of each decoding includes the operation result obtained by operating on the output of the previous decoding with the three-dimensional object detection result and the three-dimensional edge detection result; and the output of the last decoding is obtained as the optimized three-dimensional edge detection result of the three-dimensional image.
  • the three-dimensional edge detection device 900 further includes a multi-scale processing module 904, configured to process the encoding result with more than one atrous convolution having different sampling rates to obtain more than one feature map, the feature maps covering different scales; and to connect the feature maps and perform a convolution operation to obtain a multi-scale learning result.
  • the decoding module 903 is also configured to perform more than one decoding based on the multi-scale learning result, the three-dimensional object detection result, and the three-dimensional edge detection result.
  • the decoding module 903 is further configured to multiply the multi-scale learning result element-wise with the three-dimensional object detection result, add the three-dimensional edge detection result, and perform the current decoding together with the output of an intermediate encoding; and to multiply the output of the current decoding element-wise with the three-dimensional object detection result, add the three-dimensional edge detection result, and perform the next decoding together with the output of the encoding preceding that intermediate encoding, until the last decoding.
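  • A minimal sketch of one such decoding step of the refinement network, assuming a PyTorch implementation with 3D convolutions; the channel counts are assumptions, and the object/edge volumes are assumed to have been resized to the feature resolution of the stage (the disclosure does not fix these details).

```python
import torch
import torch.nn as nn

def mutual_3d(feat: torch.Tensor, obj_3d: torch.Tensor, edge_3d: torch.Tensor) -> torch.Tensor:
    # 3D variant of the mutual learning operation: F * O + E on (B, C, D, H, W) tensors,
    # with obj_3d and edge_3d of shape (B, 1, D, H, W) broadcast over the channels
    return feat * obj_3d + edge_3d

class DecodeStage3D(nn.Module):
    """One decoding stage: the mutual-learning result is concatenated with the
    skip-connected encoder feature and passed through two 3D convolution blocks."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, prev: torch.Tensor, skip: torch.Tensor,
                obj_3d: torch.Tensor, edge_3d: torch.Tensor) -> torch.Tensor:
        x = mutual_3d(prev, obj_3d, edge_3d)
        return self.body(torch.cat([x, skip], dim=1))
```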
  • the acquiring module 901 is also configured to acquire the two-dimensional initial object detection result and the two-dimensional initial edge detection result of each two-dimensional slice of the three-dimensional image; for each slice, to multiply the colour feature map of the slice element-wise with the slice's two-dimensional initial object detection result, add the product to the two-dimensional initial object detection result and perform object detection to obtain the slice's two-dimensional target object detection result; and to perform convolution operations on each slice and obtain each slice's two-dimensional target edge detection result from the output of the convolution operations and the corresponding slice's two-dimensional object detection result.
  • the acquisition module 901 is further configured to perform the following steps for each frame of two-dimensional slice of the three-dimensional image: multiply the colour feature map of the slice element-wise with the slice's two-dimensional initial object detection result and add the product to the two-dimensional initial object detection result to form the data to be processed; and encode the data to be processed more than once and decode it more than once to obtain the two-dimensional target object detection result of the slice output by the last decoding.
  • the acquisition module 901 is also configured to encode the data to be processed more than once to obtain the object detection encoding result output by the last encoding; to process the object detection encoding result with more than one atrous convolution having different sampling rates to obtain more than one feature map covering different scales; to connect the feature maps and perform a convolution operation to obtain a multi-scale learning result; and to decode the multi-scale learning result more than once to obtain the two-dimensional target object detection result of the slice output by the last decoding.
  • the acquisition module 901 is further configured to perform the following steps for each frame of two-dimensional slice of the three-dimensional image: perform more than one stage of convolution operations on the slice; multiply the output of each stage element-wise with the slice's two-dimensional initial object detection result and add the product to the output of the current stage to obtain the stage detection result; and combine the detection results of the stages to obtain the slice's two-dimensional target edge detection result.
  • the above three-dimensional edge detection device obtains the two-dimensional object detection result and the two-dimensional edge detection result of each two-dimensional slice of the three-dimensional image, stacks them into a three-dimensional object detection result and a three-dimensional edge detection result, performs encoding according to the feature map of the three-dimensional image, the three-dimensional object detection result and the three-dimensional edge detection result, and then performs decoding in combination with the three-dimensional object detection result and the three-dimensional edge detection result to obtain the optimized three-dimensional edge detection result of the three-dimensional image. In this way, when performing three-dimensional edge detection on the three-dimensional image, the two-dimensional detection results of the two-dimensional slices are used in the three-dimensional edge detection, so the features of the two-dimensional detection results and the spatial structure continuity of the three-dimensional data complement each other and improve the accuracy of three-dimensional edge detection; moreover, the two-dimensional detection results include both object detection and edge detection results, and these two kinds of results can also learn from and promote each other, which further improves the accuracy of three-dimensional edge detection.
  • Fig. 11 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device includes a processor, a memory and a network interface connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, enables the processor to implement the three-dimensional edge detection method.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the three-dimensional edge detection method.
  • FIG. 11 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer parts than shown in the figure, combine certain parts, or have a different arrangement of parts.
  • the three-dimensional edge detection apparatus provided by the present application may be implemented in the form of a computer program, and the computer program may run on the computer device as shown in FIG. 11.
  • the memory of the computer device can store various program modules that make up the three-dimensional edge detection apparatus, for example, the acquisition module 901, the encoding module 902, and the decoding module 903 shown in FIG. 9.
  • the computer program composed of each program module causes the processor to execute the steps in the three-dimensional edge detection method of each embodiment of the present application described in this specification.
  • for example, the computer device shown in FIG. 11 can, through the acquisition module 901 in the three-dimensional edge detection apparatus shown in FIG. 9, perform the steps of obtaining the two-dimensional object detection result and the two-dimensional edge detection result of each two-dimensional slice of the three-dimensional image, stacking the two-dimensional object detection results into a three-dimensional object detection result, and stacking the two-dimensional edge detection results into a three-dimensional edge detection result.
  • the encoding module 902 executes the step of encoding according to the feature map of the three-dimensional image, the three-dimensional object detection result, and the three-dimensional edge detection result to obtain the encoding result.
  • the decoding module 903 executes the step of decoding according to the encoding result, the three-dimensional object detection result and the three-dimensional edge detection result to obtain the optimized three-dimensional edge detection result of the three-dimensional image.
  • a computer device which includes a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the above-mentioned three-dimensional edge detection method.
  • the steps of the three-dimensional edge detection method may be the steps in the three-dimensional edge detection method of each of the foregoing embodiments.
  • a computer-readable storage medium is provided, and a computer program is stored.
  • the computer program is executed by a processor, the processor executes the steps of the above-mentioned three-dimensional edge detection method.
  • the steps of the three-dimensional edge detection method may be the steps in the three-dimensional edge detection method of each of the foregoing embodiments.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the steps in the foregoing method embodiments.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.


Abstract

A three-dimensional edge detection method and apparatus, a computer-readable storage medium and a computer device. The method includes: obtaining a two-dimensional object detection result and a two-dimensional edge detection result of each two-dimensional slice of a three-dimensional image; stacking the two-dimensional object detection results into a three-dimensional object detection result, and stacking the two-dimensional edge detection results into a three-dimensional edge detection result; performing encoding according to a feature map of the three-dimensional image, the three-dimensional object detection result and the three-dimensional edge detection result to obtain an encoding result; and performing decoding according to the encoding result, the three-dimensional object detection result and the three-dimensional edge detection result to obtain an optimized three-dimensional edge detection result of the three-dimensional image.

Description

三维边缘检测方法、装置、存储介质和计算机设备
本申请要求于2020年2月20日提交中国专利局,申请号为2020101048501,申请名称为“三维边缘检测方法、装置、存储介质和计算机设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种三维边缘检测方法、装置、存储介质和计算机设备。
背景技术
随着计算机技术的发展,图像处理越来越普遍。边缘检测是图像处理和计算机视觉中的基本问题,可以为其他的计算机视觉任务如语义分割、实例分割以及物体跟踪等提供重要的信息。然而,目前的边缘检测大多为二维图像的边缘检测,很少有技术来解决三维图像的边缘检测问题,且少有的三维图像的边缘检测的准确性也较低,针对这一问题目前尚未提出有效的解决方案。
发明内容
根据本申请提供的各种实施例,提供一种三维边缘检测方法、装置、存储介质和计算机设备。
一种三维边缘检测方法,由计算机设备执行,所述方法包括:
获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果;
将各所述二维物体检测结果堆叠为三维物体检测结果,并将各所述二维边缘检测结果堆叠为三维边缘检测结果;
根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行编码,得到编码结果;及
根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行解码,得到所述三维图像的优化的三维边缘检测结果。
一种三维边缘检测装置,包括:
获取模块,用于获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果;将各所述二维物体检测结果堆叠为三维物体检测结果,并将各所述二维边缘检测结果堆叠为三维边缘检测结果;
编码模块,用于根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行编码,得到编码结果;及
解码模块,用于根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行解码,得到所述三维图像的优化的三维边缘检测结果。
一种存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行三维边缘检测方法的步骤。
一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行三维边缘检测方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为一个实施例中三维边缘检测方法的应用环境图;
图2为一个实施例中三维边缘精细检测网络的结构示意图;
图3为另一个实施例中三维边缘精细检测网络的结构示意图;
图4为一个实施例中三维边缘检测方法所应用的网络结构示意图;
图5为一个实施例中物体检测模型的结构示意图;
图6为一个实施例中边缘检测模型的结构示意图;
图7为一个实施例中多种边缘检测方法的检测结果对比图;
图8为一个实施例中两种边缘检测方法检测结果在连续性上的对比图;
图9为一个实施例中三维边缘检测装置的结构框图;
图10为另一个实施例中三维边缘检测装置的结构框图;及
图11为一个实施例中计算机设备的结构框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
其中,计算机视觉(Computer Vision,CV)是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样 模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、式教学习等技术。
随着人工智能技术研究和进步,人工智能技术在多个领域展开研究和应用,例如常见的智能家居、智能穿戴设备、虚拟助理、智能音箱、智能营销、无人驾驶、自动驾驶、无人机、机器人、智能医疗、智能客服等,相信随着技术的发展,人工智能技术将在更多的领域得到应用,并发挥越来越重要的价值。
本申请实施例提供的方案涉及人工智能的计算机视觉、机器学习/深度学习等技术,具体通过如下实施例进行说明。
本申请实施例中涉及的三维边缘检测是图像处理和计算机视觉中的基础,通过对三维图像进行三维的边缘检测,得到三维边缘检测结果可以为其他的计算机视觉如语义分割、物体检测、实例分割或者物体跟踪等提供重要的信息,是非常基础但是也非常重要的计算机视觉任务。在实际应用中,三维边缘检测结果可以为大量的医学图像分割或者检测等任务助力。
如图1所示,在一个实施例中,提供了一种三维边缘检测方法。本实施例主要以该方法应用于计算机设备来举例说明。该计算机设备具体可以是终端或者服务器等。参照图1,该三维边缘检测方法具体包括如下步骤:
S102,获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果。
其中,三维图像是在三个维度上存在信息的图像。将三维图像在其中一个维度上进行划分,即可得到三维图像的各个二维分片。通常情况下,三维图像的三维可以认为是高度、宽度和深度这三个维度;三维图像的二维分片即是在深度维度上对三维图像进行划分,不同的二维分片对应不同的深度位置。当然,在其他实施例中,也可以对三维图像在其他维度上进行划分,在此不做限定。
具体地,计算机设备可采用对二维图像进行物体检测的方式,对三维图像各二维分片进行处理,得到三维图像各二维分片的二维物体检测结果;并 采用对二维图像进行边缘检测的算法,对三维图像各二维分片进行处理,得到三维图像各二维分片的二维边缘检测结果。这里具体进行物体检测的方式和进行边缘检测的方式可参考后续实施例的详细描述。
其中,物体检测的目的是识别出图像中物体所在的像素点区域,边缘检测的目的是识别出图像中像素灰度显著变化的像素点。边缘通常存在于物体和背景之间。物体检测和边缘检测均可以是像素级别的检测,即根据检测任务确定各像素点所述的类别。在本申请实施例中,物体检测可以检测出物体即可,不需要对物体进行分类,即区分不同的物体,可以不必确定是什么物体。
S104,将各二维物体检测结果堆叠为三维物体检测结果,并将各二维边缘检测结果堆叠为三维边缘检测结果。
具体地,三维图像在其中一个维度上划分为多于一帧的二维分片,这些二维分片之间存在一定的顺序关系,即每帧二维分片对应该划分维度上的一个位置(比如深度值),将这些二维分片的二维物体检测结果按照相应二维分片之间的顺序关系堆叠,即可得到三维物体检测结果;将这些二维分片的二维边缘检测结果按照相应二维分片之间的顺序关系堆叠,即可得到三维边缘检测结果。
S106,根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行编码,得到编码结果。
其中,特征图用于反映图像的特征,是图像的特征的一种存在形式。特征图如图像原始的RGB三通道图或者对图像进行卷积操作输出的Feature Map等。
需要说明的是,在对图像进行编码时,常用的做法是直接对图像的特征图进行编码。而本申请实施例中,则是根据特征图、三维物体检测结果和三维边缘检测结果这三种不同的数据进行编码。进一步地,还可以是在对这三种数据进行一定的运算后,对运算结果进行编码。这样可以在编码的过程中,参考三维物体检测结果和三维边缘检测结果,得到更多更有用的信息。
具体地,计算机设备可采用编码器根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行编码,得到编码结果。进一步地,计算机设备可将三维图像的颜色特征图与三维物体检测结果进行点乘操作,再与三维 边缘检测结果相加,再采用编码器对前述运算的运算结果进行编码,得到编码结果。
在一个实施例中,S106包括:根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行多于一次编码;每次编码的输入为三维物体检测结果和三维边缘检测结果对前次编码的输出进行运算的运算结果;各次编码的输出各不相同且均为三维图像的特征图;及获取末次编码输出的特征图得到编码结果。
具体地,计算机设备可采用编码器进行编码,该编码器可以包括多于一个编码阶段,从而可以根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行多于一次的编码操作。这样对编码器的输入经过多于一个层次的编码操作,所得到的编码结果对特征的表示会更加精确。其中,编码器的每个编码阶段的输入,均为三维物体检测结果和三维边缘检测结果对前一个编码阶段的输出进行运算后的运算结果。这样,在每个编码阶段时都可以参考三维物体检测结果和三维边缘检测结果,可以提高编码有效性。编码器的每个编码阶段输出的均为三维图像的特征图,且编码器的每个编码阶段输出的特征图各不相同。计算机设备可将最后一个编码阶段输出的特征图作为编码结果。
在一个具体的实施例中,计算机设备可通过三维边缘精细检测网络(Joint Edge Refinement Network)实现编码过程。参考图2,三维边缘精细检测网络包括编码器,该编码器可包括四个编码阶段,每个编码阶段可包括两个卷积模块,每个卷积模块可包括卷积层、激活函数层和归一化层。其中,激活函数具体可以是ReLU函数等,归一化可以是组归一化(Group Normalization)等。
需要说明的是,图2所示的模型结构仅为举例说明,并不对三维边缘精细检测网络的结构造成限定,实际的三维边缘精细检测网络可以包括比图2所示更多或者更少的组成部分,且图2所包括的结构的参数也可以不同。
在一个实施例中,根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行多于一次编码包括:将三维图像的颜色特征图与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行当次编码;及将当次编码的输出与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行下次编码,直至末次编码。
可以理解,物体检测是识别出图像中物体所在的区域,边缘检测是识别出图像中像素灰度显著变化的像素点。由于边缘通常存在于物体和背景之间,那么可以认为物体检测与边缘检测存在一定的相似性。假设令物体检测的结果为D obj,边缘检测的结果为D edg,那么这两种结果之间存在如下逻辑关联:
D_obj = D_obj ∪ D_edg,   D_edg = D_obj ∩ D_edg          (1)
那么可以有F(g(I)·D obj+D edg)=D′ edg,其中,F(·),g(·)为不同的边缘检测算子,I为输入的图像特征,D′ edg是比D edg更准确的边缘检测的结果。可以通俗地理解为:物体检测与边缘检测的交集(即点乘操作)就是边缘检测,而两者的并集(即相加操作)就是物体检测。那么g(I)·D obj+D edg得到边缘检测结果,对该边缘检测结果再使用边缘检测算子,便可得到更加精准的边缘检测结果。
具体地,计算机设备可将三维图像的颜色特征图与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果作为首个编码阶段输入;继而将该编码阶段输出的特征图与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果作为下一个编码阶段输入,直至最后一个编码阶段输出编码结果。
可以理解,根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行第一次编码时,还未有编码输出,则三维图像的特征图可以是三维图像原始的RGB颜色通道特征图;在后续编码时,则可利用前次编码输出的特征图。
在一个具体的实施例中,参考图3,计算机设备可通过相互学习模块(Mutual,M)实现三维图像的特征图与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加的运算。即,相互学习模块(M)的输入为三维图像的特征图(F)、三维物体检测结果(O)和三维边缘检测结果(E);输出为新的三维图像的特征图(F)。具体地,相互学习模块(M)对三维图像的特征图(F)与三维物体检测结果(O)进行点乘操作
F ⊙ O
然后与三维边缘检测结果(E)相加
F ⊙ O + E
输出新的三维图像的特征图(F)。其中,对于第一个相互学习模块(M),三维图像的特征图(F)为三维图像的颜色特征图,后续的相互学习模块(M),三维图像的特征图(F)为编码输出的特征图。
继续参考图3,计算机设备可将三维图像的颜色特征图、三维物体检测结果和三维边缘检测结果输入相互学习模块(M),相互学习模块(M)输出至 编码器的第一编码阶段;第一编码阶段进行编码的输出与三维物体检测结果和三维边缘检测结果共同输入相互学习模块(M),相互学习模块(M)输出至编码器的第二编码阶段;第二编码阶段进行编码的输出与三维物体检测结果和三维边缘检测结果共同输入相互学习模块(M),相互学习模块(M)输出至编码器的第三编码阶段;第三编码阶段进行编码的输出与三维物体检测结果和三维边缘检测结果共同输入相互学习模块(M),相互学习模块(M)输出至编码器的第四编码阶段;第四编码阶段进行编码的输出为编码结果。需要说明的是,图3所示的模型结构仅为举例说明,并不对三维边缘精细检测网络的结构造成限定,实际的三维边缘精细检测网络可以包括比图3所示更多或者更少的组成部分,且图3所包括的结构的参数也可以不同。
可以理解,对于第一编码阶段前的相互学习模块(M),由于还未进行编码,所以g(I)·D obj+D edg中g(I)为图像原始的颜色特征图,编码可以看作是一个边缘检测算子,那么第一编码阶段的操作可以看作是进行F(g(I)·D obj+D edg)的运算。对于后面的编码阶段前的相互学习模块(M),由于已经经过了编码,那么g(I)·D obj+D edg中I为输入前一个编码阶段的图像特征,编码可以看作是一个边缘检测算子,g(I)则为前一个编码阶段输出的图像特征。
在本实施例中,在每次编码时,将三维图像的特征图与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后再进行编码,这使得编码时重点关注在感兴趣的物体所在区域,并且已有的潜在的边缘检测结果也在输入特征图中得到了加强和强化,这样可以提高编码输出对特征的表示更加准确。
上述实施例中,在数据进行多于一次的编码,即经过多于一个层次的编码操作,这样所得到的编码输出对特征的表示会更加精确。
S108,根据编码结果、三维物体检测结果和三维边缘检测结果进行解码,得到三维图像的优化的三维边缘检测结果。
需要说明的是,S104中通过堆叠二维的边缘检测结果所得到的三维的边缘检测结果,是相对准确的检测结果。优化的三维边缘检测结果,比堆叠二维的边缘检测结果所得到的三维的边缘检测结果的精确度更高,更贴合物体真实的边缘。优化的三维边缘检测结果,不限定是采用优化算法对堆叠二维的边缘检测结果所得到的三维的边缘检测结果进行优化所得到的结果,也可 以是将堆叠二维的边缘检测结果所得到的三维的边缘检测结果,应用到对三维图像进行边缘检测的具体过程中,所得到的三维边缘检测结果。
另外,还需要说明的是,在对编码结果进行解码时,常用的做法是直接对编码结果进行解码。而本申请实施例中,则是根据编码结果、三维物体检测结果和三维边缘检测结果这三种不同的数据进行解码。进一步地,还可以是在对这三种数据进行一定的运算后,对运算结果进行解码。这样可以在解码的过程中,参考三维物体检测结果和三维边缘检测结果,得到更多更有用的信息。
具体地,计算机设备可采用解码器根据编码结果、三维物体检测结果和三维边缘检测结果进行解码,得到解码结果,即得到三维图像的优化的三维边缘检测结果。进一步地,计算机设备可将编码结果与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加,再采用解码器对前述运算的运算结果进行解码,得到解码结果,即得到三维图像的优化的三维边缘检测结果。
在一个具体的实施例中,三维图像的优化的三维边缘检测结果可以是包括两种像素值的三维图像。其中,一种像素值表示相应的像素点为边缘上的像素点,另一种像素值表示相应的像素点为非边缘的像素点。
在一个具体的实施例中,三维图像的优化的三维边缘检测结果可以是三维概率矩阵。其中,每个矩阵位置的概率值,三维图像相应像素点属于边缘上的像素点的概率,当概率大于预设阈值时,则认为是边缘上的像素点。
在一个实施例中,S108包括:根据编码结果、三维物体检测结果和三维边缘检测结果进行多于一次解码;每次解码的输入包括三维物体检测结果和三维边缘检测结果对前次解码的输出进行运算的运算结果;及获取末次解码输出,得到三维图像的优化的三维边缘检测结果。
具体地,计算机设备可采用解码器进行解码,该解码器可以包括多于一个解码阶段,从而可以根据编码结果、三维物体检测结果和三维边缘检测结果进行多于一次的解码操作。由于对解码器的输入经过多层次的解码操作,可以准确地将编码提取到的特征映射到输出空间。其中,解码器的每个解码阶段的输入,均包括三维物体检测结果和三维边缘检测结果对前一个解码阶段的输出进行运算后的运算结果。这样,在每个解码阶段时都可以参考三维物体检测结果和三维边缘检测结果进行解码,以提高解码有效性。解码器的 每个解码阶段输出的均为三维图像的特征图,且解码器的每个解码阶段输出的特征图各不相同。输出空间可以是是否为三维边界的检测结果。
可以理解,最后一个解码阶段输出的特征图,可以是三维图像的优化的三维边缘检测结果。该优化的三维边缘检测结果具体可以是三维图像各像素点的分类图。该分类图上像素点的像素值表示三维图像相应像素点所属的类别。这里的类别包括两种,一种为属于边缘的类别,另一种为不属于边缘的类别。比如,分类图上像素点的像素值包括两种(0和1),0表示三维图像相应像素点不是边缘像素点,1表示三维图像相应像素点是边缘像素点。也就是说,三维图形进行编码和解码两个过程即对三维图像进行三维边缘检测的过程,确定三维图像中的每个像素点是否为三维边缘的像素点。
在另外的实施例中,该优化的三维边缘检测结果具体可以是三维图像各像素点为边缘像素点的概率分布图。该概率分布图上像素点的像素值表示三维图像相应像素点为边缘像素点的概率。
在一个具体的实施例中,继续参考图2,三维边缘精细检测网络可包括解码器,该解码器可包括三个解码阶段,每个解码阶段可包括两个卷积模块,每个卷积模块可包括卷积层、激活函数层和归一化层。其中,激活函数具体可以是ReLU函数等,归一化可以是组归一化(Group Normalization)等。
在另外的实施例中,每次解码的输入还可以包括与当前解码阶段跳跃连接(Skip Connection)的编码阶段的输出。这样可以在解码时还能结合在前的编码所提取的图像特征,从而进一步提高解码准确性。举例说明,假设编码器包括四个阶段,解码器包括三个阶段;那么可以将第一个编码阶段与第三个解码阶段跳跃连接,将第二个编码阶段与第二个解码阶段跳跃连接,将第三个编码阶段与第一个解码阶段跳跃连接。
在另外实施例中,对数据进行多于一次的解码,即经过多于一个层次的解码操作,这样所得到的解码输出对像素点的分类结果会更加精确。
在一个实施例中,根据编码结果、三维物体检测结果和三维边缘检测结果进行多于一次解码包括:将编码结果与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行当次解码;及将当次解码的输出与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行下次解码,直至末次解码。
具体地,计算机设备可将编码结果与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果作为首个解码阶段的输入;继而将该解码阶段输出的特征图与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果作为下一个解码阶段的输入,直至最后一个解码阶段输出三维图像的优化的三维边缘检测结果。
在另外的实施例中,计算机设备可将编码结果与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果和与首个解码阶段跳跃连接的编码阶段的输出共同作为首个解码阶段的输入;继而将该解码阶段输出的特征图与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果和与当前解码阶段跳跃连接的编码阶段的输出共同作为下一个解码阶段的输入,直至最后一个解码阶段输出。
上述实施例中,在每次解码时,将三维图像的特征图与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后再进行解码,这使得解码时重点关注在感兴趣的物体所在区域,并且已有的潜在的边缘检测结果也在输入特征图中得到了加强和强化,这样可以提高解码准确性。
在一个实施例中,该三维边缘检测方法还包括:通过多于一个采样率相异的空洞卷积对编码结果进行处理,得到多于一个特征图;多于一个特征图的尺寸各不相同;及将多于一个特征图连接后进行卷积操作,得到多尺度学习结果。根据编码结果、三维物体检测结果和三维边缘检测结果进行多于一次解码,包括:根据多尺度学习结果、三维物体检测结果和三维边缘检测结果进行多于一次解码。
其中,空洞卷积(Atrous Convolutions)又名扩张卷积(Dilated Convolutions),是在标准的卷积层引入了一个称为“扩张率(Dilation Rate)”的参数,该参数定义了卷积核处理数据时各值的间距。空洞卷积的目的是在不用池化(pooling)(pooling层会导致信息损失)且计算量相当的情况下,提供更大的感受野。
具体地,计算机设备可通过多于一个采样率相异的空洞卷积对编码结果进行处理,得到多于一个特征图。由于不同的采样率可以是不同的卷积核大小和/或不同的扩张率,这样得到的多于一个特征图的尺寸各不相同。计算机设备再将多于一个特征图连接后进行卷积操作,得到多尺度学习结果。该多 尺度学习结果也可以是三维图像的特征图。
在一个具体的实施例中,计算机设备可通过多尺度学习模块实现“通过多于一个采样率相异的空洞卷积对编码结果进行处理,得到多于一个特征图;多于一个特征图的尺寸各不相同;将多于一个特征图连接后进行卷积操作,得到多尺度学习结果”,该多尺度学习模块具体可以是空间金字塔结构(Atrous Spatial Pyramid Pooling,ASPP)。继续参考图2,三维边缘精细检测网络还包括位于编码器和解码器之间的ASPP模块。ASPP模块的输入为第四编码阶段输出的编码结果,ASPP模块对输入进行多于一个尺度的特征提取后,输出多尺度学习结果。
上述实施例中,通过多尺度的空洞卷积对编码结果进行操作,这样可以提取到更加丰富的多尺度多视角的图像特征,有助于后续的解码操作。
在一个实施例中,根据多尺度学习结果、三维物体检测结果和三维边缘检测结果进行多于一次解码,包括:将多尺度学习结果与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行当次解码;及将当次解码的输出与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行下次解码,直至末次解码。
在一个实施例中,根据多尺度学习结果、三维物体检测结果和三维边缘检测结果进行多于一次解码,包括:将多尺度学习结果与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后,与中间次编码的输出共同进行当次解码;及将当次解码的输出与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后,与中间次编码的前次编码的输出共同进行下次编码,直至末次解码。
具体地,计算机设备可将多尺度学习结果与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果和与首个解码阶段跳跃连接的编码阶段的输出共同作为首个解码阶段的输入;继而将该解码阶段输出的特征图与三维物体检测结果进行点乘操作,然后与三维边缘检测结果相加,再将运算结果和与当前解码阶段跳跃连接的编码阶段的输出共同作为下一个解码阶段的输入,直至最后一个解码阶段输出。
继续参考图3,计算机设备可将多尺度学习结果、三维物体检测结果和三维边缘检测结果输入相互学习模块(M),相互学习模块(M)的输出和第三 编码阶段的输出共同输入至解码器的第一解码阶段;第一解码阶段进行解码的输出与三维物体检测结果和三维边缘检测结果共同输入相互学习模块(M),相互学习模块(M)的输出和第二编码阶段的输出共同输入至解码器的第二解码阶段;第二解码阶段进行解码的输出与三维物体检测结果和三维边缘检测结果共同输入相互学习模块(M),相互学习模块(M)的输出和第一编码阶段的输出共同输入至解码器的第三解码阶段;第三解码阶段进行解码的输出为三维图像的优化的三维边缘检测结果(Subtle 3D Edge)。
上述实施例中,通过进行解码操作时,将跳跃连接的编码阶段输出的特征图共同进行解码,使得后续解码的输入即明确了图像特征,又能结合在前的编码所提取的图像特征,从而进一步提高解码准确性。
这样,在基于堆叠二维检测结果得到的三维检测结果,对三维图像的特征图进行编码和解码后,即可得到优化的三维边缘检测结果,从而得到精细的三维边缘(Subtle 3D Edge)。该精细的三维边缘可以为各个医学图像任务如分割、检测或者追踪等提供更多、更丰富的特征和其他视角的辅助结果,为更加精准的医学图像辅助诊断落地的实现助力。
上述三维边缘检测方法,在获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果后,堆叠出三维物体检测结果和三维边缘检测结果,然后根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行编码,再结合三维物体检测结果和三维边缘检测结果进行解码,得到三维图像的优化的三维边缘检测结果。这样在对三维图像进行三维的边缘检测时,将三维图像各二维分片的二维的检测结果用到三维边缘检测中去,可以巧妙地将二维检测结果的特征和三维数据的空间结构连续性进行相互补充,进而提高三维边缘检测的准确性;而且二维的检测结果包括物体检测和边缘检测两种检测结果,这两种检测结果之间也可相互学习相互促进,继而进一步提高了三维边缘检测的准确性。
在一个实施例中,上述实施例中S106和S108可通过三维边缘精细检测网络(Joint Edge Refinement Network)实现。该三维边缘精细检测网络可包括编码器和解码器。该编码器可包括多个编码阶段,该解码器可包括多个解码阶段。
其中,首个编码阶段的输入可以是三维图形的颜色特征图、三维物体检 测结果和三维边缘检测结果进行运算的运算结果,非首个编码阶段的输入可以是前一个编码阶段的输出、三维物体检测结果和三维边缘检测结果进行运算的运算结果。首个解码阶段的输入可以包括编码结果、三维物体检测结果和三维边缘检测结果进行运算的运算结果,非首个解码阶段的输入可以包括前一个解码阶段的输出、三维物体检测结果和三维边缘检测结果进行运算的运算结果。
在另外的实施例中,编码(解码)阶段的输入所包括的三种数据的运算结果,可通过相互学习模块实现这三种数据之间的运算。
在另外的实施例中,每个解码阶段的输入还可以包括与当前解码阶段跳跃连接的编码阶段的输出。
在另外的实施例中,该三维边缘精细检测网络还可以包括位于编码器和解码器之间的多尺度学习模块(如ASPP)。多尺度学习模块的输入为最后一个编码阶段的输出。此时,首个解码阶段的输入可以是多尺度学习模块的输出、三维物体检测结果和三维边缘检测结果进行运算的运算结果。
以上实施例仅表达了本申请三维边缘精细检测网络的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请三维边缘精细检测网络的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请三维边缘精细检测网络构思的前提下,还可以做出若干变形和改进,这些都属于本申请要保护的三维边缘精细检测网络。
以上实施例所提供的三维边缘精细检测网络,可通过有训练标签的训练样本进行深度监督(Deep Supervision)学习得到。继续参考图2,该网络中包括的各结构均可以通过深度监督(Deep Supervision)学习得到。
具体地,输入三维边缘精细检测网络的训练样本为三维图像样本、三维图像样本各二维分片的二维物体检测结果所堆叠成的三维物体检测结果、及三维图像样本各二维分片的二维边缘检测结果所堆叠成的三维边缘检测结果。训练样本的训练标签为三维图像样本的三维边缘标签。计算机设备可根据训练样本和训练标签,并构建损失函数,有监督地训练三维边缘精细检测网络。
在一个具体的实施例中,上述有监督训练的损失函数可以是Dice Loss损失函数,该损失函数具体如下式所示:
L_Dice = 1 - (2·Σ_{i=1}^{N} p_i·y_i) / (Σ_{i=1}^{N} p_i + Σ_{i=1}^{N} y_i)          (2)
其中,N为三维图像中像素点的数量,p i为第i个像素点为边缘像素点的概率,y i为第i个像素点的训练标签。
在一个实施例中,获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果,包括:获取三维图像各二维分片的二维初始物体检测结果和二维初始边缘检测结果;对于三维图像各二维分片,将二维分片的颜色特征图与二维分片的二维初始边缘检测结果进行点乘操作,再与二维分片的二维初始物体检测结果相加后进行物体检测,得到二维分片的二维目标物体检测结果;及对于三维图像各二维分片进行卷积操作,根据卷积操作的输出和相应二维分片的二维物体检测结果,得到各二维分片的二维目标边缘检测结果。
需要说明的是,计算机设备可采用不同的网络来独立进行物体检测和边缘检测。如,采用物体检测模型(Object Detection Module)进行物体检测,采用边缘检测模型(Edge Detection Module)进行边缘检测。但是由于物体检测和边缘检测中有很多特征是可以相互学习、相互影响和相互促进的,那么可以将物体检测和边缘检测中提取的特征在网络训练和网络使用的过程中相互输送。
结合前述实施例中描述的式(1),以及相关的逻辑原理描述,计算机设备可通过相互学习模块来实现物体检测和边缘检测中提取的特征的相互输送。相互学习模块具体可以是进行如下运算g(I)·D obj+D edg,即图像特征与物体检测结果进行点乘操作,再与边缘检测结果相加。
具体地,计算机设备可分别有监督地预训练物体检测模型和边缘检测模型。在预训练后,然后通过相互学习模块联系这两个模型,得到物体和边缘联合检测网络(Mutual Object and Edge Detection Network),再进一步训练物体和边缘联合检测网络。举例说明,参考图4,物体和边缘联合检测网络可以是在物体检测模型前增加相互学习模块和/或在边缘检测模型后增加相互学习模块。
其中,预训练得到的物体检测模型和边缘检测模型,用于根据二维图像得到该二维图像的初始的二维检测结果。进一步训练得到的物体和边缘联合检测网络,用于根据二维图像得到该二维图像的目标的二维检测结果。该目 标的二维检测结果用于堆叠成三维检测结果,用于如S106和S108的步骤中。
这样,计算机设备可将三维图像各二维分片分别输入预训练得到的物体检测模型,得到各二维分片的二维初始物体检测结果;并将三维图像各二维分片分别输入预训练得到的边缘检测模型,得到各二维分片的二维初始边缘检测结果。此后,计算机设备再将三维图像各二维分片输入物体和边缘联合检测网络,物体和边缘联合检测网络中在物体检测模型前的相互学习模块,将二维分片的颜色特征图与二维分片的二维初始边缘检测结果进行点乘操作,再与二维分片的二维初始物体检测结果相加后输入物体和边缘联合检测网络中的物体检测模型,输出二维分片的二维目标物体检测结果。物体和边缘联合检测网络中的边缘检测模型对二维分片进行卷积操作,物体和边缘联合检测网络中在边缘检测模型后的相互学习模块,将卷积操作的输出和二维物体检测结果进行点乘操作,再与卷积操作的输出相加后,得到二维分片的二维目标边缘检测结果。
上述实施例中,物体检测与边缘检测相互学习相互促进,使得得到的二维的检测结果更加准确,从而可以使得后续的三维检测时的参考数据更加准确。
在一个实施例中,对于三维图像各二维分片,将二维分片的颜色特征图与二维分片的二维初始物体检测结果进行点乘操作,再与二维分片的二维初始物体检测结果相加后进行物体检测,得到二维分片的二维目标物体检测结果,包括:对于三维图像的每帧二维分片分别执行以下步骤:将二维分片的颜色特征图与二维分片的二维初始物体检测结果进行点乘操作,再与二维分片的二维初始物体检测结果相加作为待处理数据;及对待处理数据进行多于一次编码以及多于一次解码,得到末次解码所输出的二维分片的二维目标物体检测结果。
具体地,计算机设备可采用编码器进行编码、解码器进行解码;该编码器可以包括多于一个编码阶段;该解码器可以包括多于一个解码阶段。这样,计算机设备可将二维分片的颜色特征图与二维分片的二维初始物体检测结果进行点乘操作,然后与二维分片的二维初始物体检测结果相加作为待处理数据,再将待处理数据作为首个编码阶段的输入;继而将该编码阶段输出的特征图作为下一个编码阶段的输入,直至最后一个编码阶段输出编码结果。然 后将该编码结果作为首个解码阶段的输入;继而将该解码阶段输出的特征图作为下一个解码阶段的输入,直至最后一个解码阶段输出二维目标物体检测结果。
需要说明的是,本实施例中的编码器与S106中的编码器是不同的编码器,他们结构不同,所编码的数据的维度也不同。本实施例中的解码器与S108中的解码器是不同的解码器,他们结构不同,所解码的数据的维度也不同。
在另外的实施例中,编码阶段还可以与解码阶段跳跃连接。此时,解码器的首个解码阶段的输入可以是:最后一个编码阶段的输出和跳跃连接的编码阶段的输出,后续的解码阶段的输入可以是:前一个解码阶段的输出和跳跃连接的编码阶段的输出。
上述实施例中,在对二维图像进行编码时,与初始的检测结果进行运算后再作为编码对象,可以在编码时参考初始的检测结果,关注特定的区域提取更有用的信息;而且多于一次的编码可以使得特征的表示更加准确,多于一次的解码可以使所得到的解码输出对像素点的分类结果更加精确。
在一个实施例中,对待处理数据进行多于一次编码以及多于一次解码,得到末次解码所输出的二维分片的二维目标物体检测结果,包括:对待处理数据进行多于一次编码,得到末次编码所输出的物体检测编码结果;通过多于一个采样率相异的空洞卷积对物体检测编码结果进行处理,得到多于一个特征图;多于一个特征图的尺寸各不相同;将多于一个特征图连接后进行卷积操作,得到多尺度学习结果;及对多尺度学习结果进行多于一次解码,得到末次解码所输出的二维分片的二维目标物体检测结果。
具体地,计算机设备可采用编码器进行编码、解码器进行解码;该编码器可以包括多于一个编码阶段;该解码器可以包括多于一个解码阶段。这样,计算机设备可将二维分片的颜色特征图与二维分片的二维初始物体检测结果进行点乘操作,然后与二维分片的二维初始物体检测结果相加作为待处理数据,再将待处理数据作为首个编码阶段的输入;继而将该编码阶段输出的特征图作为下一个编码阶段的输入,直至最后一个编码阶段输出编码结果。
然后,通过多于一个采样率相异的空洞卷积对物体检测编码结果进行处理,得到多于一个特征图;多于一个特征图的尺寸各不相同;将多于一个特征图连接后进行卷积操作,得到多尺度学习结果。该过程具体可通过多尺度 学习模块实现。多尺度学习模块具体如ASPP结构。
此后,再将该多尺度学习结果作为首个解码阶段的输入;继而将该解码阶段输出的特征图作为下一个解码阶段的输入,直至最后一个解码阶段输出二维目标物体检测结果。当然,在另外的实施例中,编码阶段还可以与解码阶段跳跃连接。此时,解码器的首个解码阶段的输入可以是:多尺度学习模块的输出和跳跃连接的编码阶段的输出,后续的解码阶段的输入可以是:前一个解码阶段的输出和跳跃连接的编码阶段的输出。
上述实施例中,通过多尺度的空洞卷积对编码结果进行操作,这样可以提取到更加丰富的多尺度多视角的图像特征,有助于后续的解码操作。
在一个具体的实施例中,参考图5,物体检测模型的输入(Input)为三维图像的二维分片,输出(Output)为二维分片的物体检测结果。物体检测模型包括编码器、解码器和位于编码器和解码器之间的ASPP模块。编码器包括一个输入层和四个编码阶段。输入层包括一个残差模块,四个编码阶段分别包括4、6、6、4个残差模块。每个编码阶段的输入和输出由相加操作连接,每个编码阶段之后均连接着一个卷积操作(核大小如3×3)和平均池化操作(核大小如2×2),将特征图降采样(如降采样到一半大小)。解码器包括四个解码阶段和一个输出卷积层。每个解码阶段包括两个残差模块,在每个解码阶段前有一个上采样(如二倍上采样)和卷积操作(核大小如1×1)。编码阶段和解码阶段可以跳跃连接,输入层和输出层也可以跳跃连接。其中,每个残差模块包括两个卷积模块,每个卷积模块包括卷积层、归一化层和激活函数层。归一化可以是批归一化(Batch Normalization)。激活函数可以是ReLU函数。在训练物体检测模型时,可将损失函数记为L seg进行有监督训练。
需要说明的是,图5所示的模型结构仅为举例说明,并不对物体检测模型的结构造成限定,实际的物体检测模型可以包括比图5所示更多或者更少的组成部分,且图5所包括的结构的参数也可以不同。
具体地,计算机设备可根据训练样本(二维图像)和训练样本的训练标签(物体检测标签),并构建损失函数,有监督地训练物体检测模型。
在一个具体的实施例中,上述有监督训练的损失函数可以是二分类的交叉墒损失函数,该损失函数具体如下式所示:
L_seg = -[ y·log(p) + (1-y)·log(1-p) ]          (3)
其中,y为图像像素级标签,p为模型预测的标签为1的像素属于该类别的概率值。标签为1具体可以表示像素点为物体的像素点。
在一个实施例中,对于三维图像各二维分片进行卷积操作,根据卷积操作的输出和相应二维分片的二维物体检测结果,得到各二维分片的二维目标边缘检测结果,包括:对于三维图像的每帧二维分片分别执行以下步骤:对二维分片进行多于一个阶段的卷积操作;将各阶段的输出与二维分片的二维初始物体检测结果进行点乘操作,再与当前阶段的输出相加得到阶段检测结果;及联合各阶段检测结果,得到二维分片的二维目标边缘检测结果。
具体地,计算机设备可对每帧二维分片进行多于一个阶段的卷积操作,然后将各阶段的输出与二维分片的二维初始物体检测结果进行点乘操作,再与当前阶段的输出相加得到阶段检测结果;联合各阶段检测结果,得到二维分片的二维目标边缘检测结果。
其中,每个阶段包括多于一层卷积层。每个阶段检测结果也可以作为二维分片的二维目标边缘检测结果。联合各阶段检测结果可以是将各阶段检测结果按位相加(Element-wise Addition)。
在另外的实施例中,各阶段的输出可以是该阶段包括的各卷积层的输出按位相加的结果。各卷积层的输出可先经过卷积操作后再按位相加。各阶段的输出与二维分片的二维初始物体检测结果进行点乘操作之前,可经过下采样操作、卷积操作以及上采样操作。
上述实施例中,对数据进行多于一个阶段的卷积操作,在每个阶段将输出与物体检测结果进行运算之后得到该阶段的边缘检测结果,可以结合物体检测结果,提高边缘检测的准确率;而且联合各阶段检测结果,得到二维分片的二维目标边缘检测结果,可以将各个阶段提取的信息综合起来,提高边缘检测的准确率。
在一个具体的实施例中,边缘检测模型的输入(Input)为三维图像的二维分片,输出(Output)为二维分片的边缘检测结果。边缘检测模型包括多于一个卷积层,这些卷积层被划分为多于一个阶段。比如,参考图6,边缘检测模型包括16个核大小为3×3卷积层,这些卷积层被划分为5个阶段,第一个阶段包括2个卷积层,第二、三个阶段包括3个卷积层,第四、五个阶段包括4个卷积层。每个阶段中的每个卷积层都在连接了一个核大小为1×1的 卷积操作之后相加起来得到每个阶段的特征图,该特征图经过1×1的卷积操作和二倍上采样之后,与物体检测结果一起输入到前文所提的相互学习模块M,得到的五个输出连接起来得到二维分片的边缘检测结果。其中,每个阶段得到该阶段的特征图后可以有一个池化操作对特征图进行二倍降采样。需要说明的是,图6所示的模型结构仅为举例说明,并不对物体检测模型的结构造成限定,实际的物体检测模型可以包括比图6所示更多或者更少的组成部分,且图6所包括的结构的参数也可以不同。
其中,相互学习模块M(g(I)·D obj+D edg)中各变量的取值具体为:g(I)和D edg均为当前阶段输出的特征图经过卷积操作和上采样之后的结果,D obj为预训练的物体检测模型输出的物体检测结果。在训练物体检测模型时,可将损失函数记为L edge进行有监督训练;且在构建损失函数时,可对每个阶段均构建一个损失函数,该每个阶段的损失函数用于训练更新当前阶段及当前阶段之前各阶段的模型参数,也可仅训练更新当前阶段的模型参数。
具体地,计算机设备可根据训练样本(二维图像)和训练样本的训练标签(边缘检测标签),并构建监督训练损失函数,有监督地训练物体检测模型。
在一个具体的实施例中,上述有监督训练的损失函数可以是Focal loss损失函数,该损失函数具体如下式所示:
FL(p) = -α·(1-p)^γ·log(p)          (4)
其中,p为模型预测的标签为1的像素属于该类别的概率值,α为标签为1的权重因子,γ是可调控的聚焦因子来调节调控因子(1-p) γ。标签为1具体可以表示像素点为边缘的像素点。每个阶段经过M后的输出与在前阶段的输出、以及所有阶段的输出按位相加的一共6个输出进行L edge的计算反传和梯度更新。
举例说明,如图6所示,这6个输出包括:第一个相互学习模块的输出,第一、二个相互学习模块的输出的按位相加(Element-wise Addition)的结果,第一、二、三个相互学习模块的输出的按位相加结果,第一、二、三、四个相互学习模块的输出的按位相加结果,第一、二、三、四、五个相互学习模块的输出的按位相加结果,以及第一、二、三、四、五个相互学习模块的输出连接(Concatention)的结果。
其中,预训练的边缘检测模型可以不包括相互学习模块。即,每个阶段 的特征图经过卷积操作和上采样之后按位相加,得到二维图像的边缘检测结果。
在一个具体的实施例中,继续参考图4,物体和边缘联合检测网络的输入(Input)为三维图像的二维分片。二维分片在输入物体和边缘联合检测网络中的物体检测模型之前,先经过相互学习模块的处理。相互学习模块包括三个输入,分别为三维图像的二维分片、二维分片的二维初始物体检测结果和二维分片的二维初始边缘检测结果。二维分片的二维初始物体检测结果通过预训练得到的物体检测模型得到,二维分片的二维初始边缘检测结果通过预训练得到的边缘检测模型得到。其中,相互学习模块包括的三个输入,也可以分别为三维图像的二维分片、二维分片的二维初始物体检测结果,以及物体和边缘联合检测网络中边缘检测模型的输出。物体和边缘联合检测网络中物体检测模型的输出为二维分片的二维目标物体检测结果。
另外,二维分片在输入物体和边缘联合检测网络中的边缘检测模型后,边缘检测模型各阶段的输出在经过相互学习模块的处理后叠加起来,即得到二维分片的二维目标物体检测结果。每个阶段后连接的相互学习模块包括两个输入,分别为该阶段的输出和二维分片的二维初始物体检测结果。这里,相互学习模块g(I)·D obj+D edg中g(I)和D edg均为该阶段的输出,故只有两个输入。
需要说明的是,预训练得到的物体检测模型,与物体和边缘联合检测网络中的物体检测模型的模型结构相同,但模型参数不同;物体和边缘联合检测网络中的物体检测模型,是在预训练得到的物体检测模型的基础上进一步训练得到。预训练得到的边缘检测模型,与物体和边缘联合检测网络中的边缘检测模型的模型结构相同,但模型参数不同;物体和边缘联合检测网络中的边缘检测模型,是在预训练得到的边缘检测模型的基础上进一步训练得到。物体检测模型的模型结构可参考图5所示的模型结构,边缘检测模型的模型结构可参考图6所示的模型结构。
其中,预训练得到的物体检测模型和边缘检测模型在通过相互学习模块关联后进一步训练时,物体检测模型连接的相互学习模块的输入可以是三维图像的二维分片、预训练的物体检测模型的输出和预训练的边缘检测模型的输出;或者,三维图像的二维分片、预训练的物体检测模型的输出,以及当前边缘检测模型实时的输出。也就是说,g(I)·D obj+D edg中的D obj是与旋律得到的 模型的固定的输出;g(I)·D obj+D edg中的D edg可以是正在训练的模型实时的输出,也可以预训练得到的模型固定的输出。边缘检测模型各阶段连接的相互学习模块的输入可以是各阶段实时的输出和当前物体检测模型实时的输出;或者,各阶段实时的输出和预训练的物体检测模型的输出。也就是说,g(I)·D obj+D edg中的g(I)和D edg都是边缘检测模型各阶段实时的输出;g(I)·D obj+D edg中的D obj可以是正在训练的模型实时的输出,也可以预训练得到的模型固定的输出。
在得到三维图像各二维分片的二维目标物体检测结果和二维目标边缘检测结果之后,可将各二维目标物体检测结果堆叠为三维物体检测结果,并将各二维目标边缘检测结果堆叠为三维边缘检测结果。将三维图像、三维物体检测结果和三维边缘检测结果输入三维边缘精细检测网络,得到输出的优化的三维边缘检测结果,精细的三维边缘图。三维边缘精细检测网络的模型结构可参考图2所示的模型结构。
可以理解,二维的卷积神经网络可以从二维数据中学习到图像丰富的纹理、结构等特征,而三维的卷积神经网络可以从三维数据中学习到空间结构连续性相关信息,这两部分是可以相互补充的。此外,边缘检测任务和物体检测任务有一部分相似性,这两个任务也是可以相互学习、相互促进的。综合以上思考,本申请所提供的实施例实现联合学习二维和三维数据中的多层次、多尺度特征来精确检测三维物体边缘。本申请所提供的实施例涉及的网络结构包括两个阶段,第一阶段是物体和边缘联合检测网络,该阶段着重学习单张二维图像中物体丰富的结构、纹理、边缘以及语义等特征。第二阶段是三维边缘精细检测网络,该阶段则结合前一个阶段学到的物体和边缘检测结果,进一步学习连续且精细的三维物体边缘。这样本申请所提供的实施例可以精确的检测到贴合三维物体真实边缘的三维边缘。
另外,计算机设备还基于本申请实施例的三维边缘检测方法与现有的多个边缘检测算法进行了测试对比。现有的边缘检测算法如:Holistically-Nested Edge Detection,HED;Richer Convolutional Features for Edge Detection,RCF;以及Bi-Directional Cascade Network for Perceptual Edge Detection,BDCN。
在一个具体的实施例中,如图7所示,该图为本申请实施例提供的三维边缘检测方法所得到的检测结果与其他边缘检测算法所得到的检测结果的对比图。其中包括三维图像其中一个二维分片(Original)、边界检测标签(Label)、 第一种现有边缘检测算法(HED)的检测结果、第二种现有边缘检测算法(RCF)的检测结果、第三种现有边缘检测算法(BDCN)的检测结果以及本申请三维边缘检测方法(Proposed)的检测结果。由图7可见,本申请实施例提供的三维边缘检测方法的检测结果更加的精细,更贴近物体的真实边缘。现有的HED、RCF以及BDCN算法虽然均能在不同程度上面准确的检测到物体边缘,但是其边缘检测结果较为粗糙,不够贴合真实的边缘。
在一个具体的实施例中,如图8所示,该图为本申请实施例提供的三维边缘检测方法在边缘检测上连续5帧二维分片的边缘检测结果,与二维边缘检测算法RCF在同样5帧二维分片上的边缘检测结果的对比图。由图8可见,本申请实施例提供的三维边缘检测方法的检测结果具有较好的连续性。这是由于本申请实施例提供的三维边缘检测方法可以通过学习不同图像直接的空间连续性将二维边缘检测算法中容易漏掉的信息补全回来。
而且,计算机设备还基于本申请实施例的三维边缘检测方法与现有的边缘检测算法(HED和RCF)在边缘检测指标上进行了实验结果对比。
表1:
方法        ODS-R    ODS-P    ODS-F    OIS-R    OIS-P    OIS-F
HED        0.5598   0.883    0.6852   0.567    0.8854   0.6913
RCF        0.7068   0.9941   0.8084   0.7115   0.9457   0.812
Proposed   0.7593   0.9549   0.846    0.7597   0.9553   0.8463
上述表1所示为本申请实施例提供的三维边缘检测方法与现有的二维边缘检测算法HED和RCF在边缘检测指标ODS(R\P\F)和OIS(R\P\F)上的实验结果对比。由表1可知,本申请实施例提供的三维边缘检测方法在各个边缘检测衡量指标上均优于现有的二维边缘检测算法。
应该理解的是,虽然上述各实施例的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述各实施例中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一 部分轮流或者交替地执行。
如图9所示,在一个实施例中,提供了一种三维边缘检测装置900。参照图9,该三维边缘检测装置900包括:获取模块901、编码模块902和解码模块903。三维边缘检测装置中包括的各个模块可全部或部分通过软件、硬件或其组合来实现。
获取模块901,用于获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果;将各二维物体检测结果堆叠为三维物体检测结果,并将各二维边缘检测结果堆叠为三维边缘检测结果。
编码模块902,用于根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行编码,得到编码结果。
解码模块903,用于根据编码结果、三维物体检测结果和三维边缘检测结果进行解码,得到三维图像的优化的三维边缘检测结果。
在一个实施例中,编码模块902还用于根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行多于一次编码;每次编码的输入为三维物体检测结果和三维边缘检测结果对前次编码的输出进行运算的运算结果;各次编码的输出各不相同且均为三维图像的特征图;及获取末次编码输出的特征图得到编码结果。
在一个实施例中,编码模块902还用于将三维图像的颜色特征图与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行当次编码;及将当次编码的输出与三维物体检测结果进行点乘操作,再与三维边缘检测结果相加后进行下次编码,直至末次编码。
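编码模块902的逐次编码运算可示意如下（PyTorch风格）。其中encoder_blocks的层数、通道数以及对检测结果做插值对齐的方式均为示例性假设。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 示例性假设的三个3D卷积编码块（实际层数、通道数、下采样方式以具体实现为准）
encoder_blocks = nn.ModuleList([
    nn.Conv3d(1, 16, 3, padding=1),
    nn.Conv3d(16, 32, 3, stride=2, padding=1),
    nn.Conv3d(32, 64, 3, stride=2, padding=1),
])

def encode(color_feat, obj_3d, edge_3d):
    # color_feat: 三维图像的颜色特征图 (1,1,D,H,W)；obj_3d/edge_3d: 三维物体/边缘检测结果
    skips, x = [], color_feat
    for block in encoder_blocks:
        x = x * obj_3d + edge_3d        # 与三维物体检测结果点乘，再与三维边缘检测结果相加
        x = block(x)                    # 进行当次编码，输出为三维图像的特征图
        skips.append(x)                 # 保留各次编码的输出，供解码阶段使用
        obj_3d = F.interpolate(obj_3d, size=x.shape[2:])    # 尺寸对齐（示例性处理）
        edge_3d = F.interpolate(edge_3d, size=x.shape[2:])
    return x, skips                     # 末次编码输出的特征图即编码结果
```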
在一个实施例中,解码模块903还用于根据编码结果、三维物体检测结果和三维边缘检测结果进行多于一次解码;每次解码的输入包括三维物体检测结果和三维边缘检测结果对前次解码的输出进行运算的运算结果;及获取末次解码输出,得到三维图像的优化的三维边缘检测结果。
如图10所示,在一个实施例中,三维边缘检测装置900还包括多尺度处理模块904,用于通过多于一个采样率相异的空洞卷积对编码结果进行处理,得到多于一个特征图;多于一个特征图的尺寸各不相同;及将多于一个特征图连接后进行卷积操作,得到多尺度学习结果。解码模块903还用于根据多尺度学习结果、三维物体检测结果和三维边缘检测结果进行多于一次解码。
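多尺度处理模块904采用不同采样率的空洞卷积，其结构可参照如下与ASPP类似的示意草图；具体的采样率与通道数均为示例性假设。

```python
import torch
import torch.nn as nn

class MultiScaleModule(nn.Module):
    # 用多个采样率（dilation）相异的空洞卷积处理编码结果，连接后再做一次卷积
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.fuse = nn.Conv3d(out_ch * len(rates), out_ch, 1)   # 连接后的卷积操作

    def forward(self, encoded):
        feats = [b(encoded) for b in self.branches]   # 每个分支对应一种感受野/尺度
        return self.fuse(torch.cat(feats, dim=1))     # 得到多尺度学习结果

# 用法示意
m = MultiScaleModule(in_ch=64, out_ch=64)
out = m(torch.randn(1, 64, 8, 32, 32))
```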
在一个实施例中，解码模块903还用于将多尺度学习结果与三维物体检测结果进行点乘操作，再与三维边缘检测结果相加后，与中间次编码的输出共同进行当次解码；及将当次解码的输出与三维物体检测结果进行点乘操作，再与三维边缘检测结果相加后，与中间次编码的前次编码的输出共同进行下次解码，直至末次解码。
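解码模块903结合编码阶段输出（跳跃连接）的逐次解码可示意如下。其中decoder_blocks、head以及skips（即前文示意的各次编码输出，按由浅到深排列）等均为示例性假设。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

decoder_blocks = nn.ModuleList([
    nn.ConvTranspose3d(64 + 32, 32, 2, stride=2),   # 与中间次编码的输出拼接后解码（示例性假设）
    nn.ConvTranspose3d(32 + 16, 16, 2, stride=2),
])
head = nn.Conv3d(16, 1, 1)                          # 输出优化的三维边缘检测结果

def decode(multi_scale, skips, obj_3d, edge_3d):
    x = multi_scale
    for block, skip in zip(decoder_blocks, reversed(skips[:-1])):
        o = F.interpolate(obj_3d, size=x.shape[2:])
        e = F.interpolate(edge_3d, size=x.shape[2:])
        x = x * o + e                               # 与三维物体检测结果点乘，再加三维边缘检测结果
        skip = F.interpolate(skip, size=x.shape[2:])
        x = block(torch.cat([x, skip], dim=1))      # 与对应编码层的输出共同进行当次解码
    return torch.sigmoid(head(x))                   # 末次解码输出即优化的三维边缘检测结果
```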
在一个实施例中,获取模块901还用于获取三维图像各二维分片的二维初始物体检测结果和二维初始边缘检测结果;对于三维图像各二维分片,将二维分片的颜色特征图与二维分片的二维初始物体检测结果进行点乘操作,再与二维分片的二维初始物体检测结果相加后进行物体检测,得到二维分片的二维目标物体检测结果;及对于三维图像各二维分片进行卷积操作,根据卷积操作的输出和相应二维分片的二维物体检测结果,得到各二维分片的二维目标边缘检测结果。
在一个实施例中,获取模块901还用于对于三维图像的每帧二维分片分别执行以下步骤:将二维分片的颜色特征图与二维分片的二维初始物体检测结果进行点乘操作,再与二维分片的二维初始物体检测结果相加作为待处理数据;及对待处理数据进行多于一次编码以及多于一次解码,得到末次解码所输出的二维分片的二维目标物体检测结果。
在一个实施例中,获取模块901还用于对待处理数据进行多于一次编码,得到末次编码所输出的物体检测编码结果;通过多于一个采样率相异的空洞卷积对物体检测编码结果进行处理,得到多于一个特征图;多于一个特征图的尺寸各不相同;将多于一个特征图连接后进行卷积操作,得到多尺度学习结果;及对多尺度学习结果进行多于一次解码,得到末次解码所输出的二维分片的二维目标物体检测结果。
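该二维目标物体检测分支与前述三维"编码-多尺度-解码"流程类似，只是作用于单帧二维分片，可用如下组合示意。各函数名均为示例性假设，mlm可参考前文相互学习模块的示意实现。

```python
def detect_object_2d(slice_img, d_obj_init, d_edge_init,
                     mlm, encode_2d, multi_scale_2d, decode_2d):
    # mlm: 前文示意的相互学习模块；其余为示例性假设的二维编码、多尺度、解码函数
    x = mlm(slice_img, d_obj_init, d_edge_init)   # 构造待处理数据
    x, skips = encode_2d(x)                       # 多于一次编码，取末次编码输出与各次编码输出
    x = multi_scale_2d(x)                         # 采样率相异的空洞卷积 + 连接 + 卷积
    return decode_2d(x, skips)                    # 多于一次解码，末次解码输出二维目标物体检测结果
```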
在一个实施例中,获取模块901还用于对于三维图像的每帧二维分片分别执行以下步骤:对二维分片进行多于一个阶段的卷积操作;将各阶段的输出与二维分片的二维初始物体检测结果进行点乘操作,再与当前阶段的输出相加得到阶段检测结果;及联合各阶段检测结果,得到二维分片的二维目标边缘检测结果。
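边缘检测分支中各阶段与相互学习运算的配合可示意如下；stages所代表的多阶段卷积骨干、通道压缩与上采样方式等均为示例性假设。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeBranch2D(nn.Module):
    def __init__(self, stages, num_stages=5):
        super().__init__()
        self.stages = stages                      # 多于一个阶段的卷积操作（示例性假设的骨干）
        self.fuse = nn.Conv2d(num_stages, 1, 1)   # 联合各阶段检测结果

    def forward(self, slice_img, d_obj_init):
        x, stage_results = slice_img, []
        for stage in self.stages:
            x = stage(x)                                           # 当前阶段的卷积输出
            g = F.interpolate(x.mean(dim=1, keepdim=True),         # 通道压缩并上采样到原尺寸（示例性处理）
                              size=slice_img.shape[2:])
            stage_results.append(g * d_obj_init + g)               # g(I)·D_obj+D_edge，此处D_edge即当前阶段输出
        return torch.sigmoid(self.fuse(torch.cat(stage_results, dim=1)))

# 用法示意（示例性假设的5个阶段，每阶段为一个简单卷积）
stages = nn.ModuleList([nn.Sequential(nn.Conv2d(c_in, 16, 3, padding=1), nn.ReLU())
                        for c_in in (3, 16, 16, 16, 16)])
branch = EdgeBranch2D(stages)
edge_map = branch(torch.randn(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
```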
上述三维边缘检测装置，在获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果后，堆叠出三维物体检测结果和三维边缘检测结果，然后根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行编码，再结合三维物体检测结果和三维边缘检测结果进行解码，得到三维图像的优化的三维边缘检测结果。这样在对三维图像进行三维的边缘检测时，将三维图像各二维分片的二维的检测结果用到三维边缘检测中去，可以巧妙地将二维检测结果的特征和三维数据的空间结构连续性进行相互补充，进而提高三维边缘检测的准确性；而且二维的检测结果包括物体检测和边缘检测两种检测结果，这两种检测结果之间也可相互学习相互促进，继而进一步提高了三维边缘检测的准确性。
图11示出了一个实施例中计算机设备的内部结构图。如图11所示，该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中，存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统，还可存储有计算机程序，该计算机程序被处理器执行时，可使得处理器实现三维边缘检测方法。该内存储器中也可储存有计算机程序，该计算机程序被处理器执行时，可使得处理器执行三维边缘检测方法。本领域技术人员可以理解，图11中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。
在一个实施例中,本申请提供的三维边缘检测装置可以实现为一种计算机程序的形式,计算机程序可在如图11所示的计算机设备上运行。计算机设备的存储器中可存储组成该三维边缘检测装置的各个程序模块,比如,图9所示的获取模块901、编码模块902和解码模块903。各个程序模块构成的计算机程序使得处理器执行本说明书中描述的本申请各个实施例的三维边缘检测方法中的步骤。
例如，图11所示的计算机设备可以通过如图9所示的三维边缘检测装置中的获取模块901执行获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果；将各二维物体检测结果堆叠为三维物体检测结果，并将各二维边缘检测结果堆叠为三维边缘检测结果的步骤。通过编码模块902执行根据三维图像的特征图、三维物体检测结果和三维边缘检测结果进行编码，得到编码结果的步骤。通过解码模块903执行根据编码结果、三维物体检测结果和三维边缘检测结果进行解码，得到三维图像的优化的三维边缘检测结果的步骤。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述三维边缘检测方法的步骤。此处三维边缘检测方法的步骤可以是上述各个实施例的三维边缘检测方法中的步骤。
在一个实施例中,提供了一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述三维边缘检测方法的步骤。此处三维边缘检测方法的步骤可以是上述各个实施例的三维边缘检测方法中的步骤。
在一个实施例中,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述各方法实施例中的步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合，为使描述简洁，未对上述实施例中的各个技术特征所有可能的组合都进行描述，然而，只要这些技术特征的组合不存在矛盾，都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (22)

  1. 一种三维边缘检测方法,由计算机设备执行,所述方法包括:
    获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果;
    将各所述二维物体检测结果堆叠为三维物体检测结果,并将各所述二维边缘检测结果堆叠为三维边缘检测结果;
    根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行编码,得到编码结果;及
    根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行解码,得到所述三维图像的优化的三维边缘检测结果。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行编码,得到编码结果,包括:
    根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行多于一次编码;每次编码的输入为所述三维物体检测结果和所述三维边缘检测结果对前次编码的输出进行运算的运算结果;各次编码的输出各不相同且均为所述三维图像的特征图;及
    获取末次编码输出的特征图得到编码结果。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行多于一次编码包括:
    将所述三维图像的颜色特征图与所述三维物体检测结果进行点乘操作,再与所述三维边缘检测结果相加后进行当次编码;及
    将当次编码的输出与所述三维物体检测结果进行点乘操作,再与所述三维边缘检测结果相加后进行下次编码,直至末次编码。
  4. 根据权利要求1所述的方法,其特征在于,所述根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行解码,得到所述三维图像的优化的三维边缘检测结果,包括:
    根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行多于一次解码;每次解码的输入包括所述三维物体检测结果和所述三维边缘检测结果对前次解码的输出进行运算的运算结果;及
    获取末次解码输出,得到所述三维图像的优化的三维边缘检测结果。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    通过多于一个采样率相异的空洞卷积对所述编码结果进行处理,得到多于一个特征图;所述多于一个特征图的尺寸各不相同;及
    将所述多于一个特征图连接后进行卷积操作,得到多尺度学习结果;
    所述根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行多于一次解码,包括:
    根据所述多尺度学习结果、所述三维物体检测结果和所述三维边缘检测结果进行多于一次解码。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述多尺度学习结果、所述三维物体检测结果和所述三维边缘检测结果进行多于一次解码,包括:
    将所述多尺度学习结果与所述三维物体检测结果进行点乘操作,再与所述三维边缘检测结果相加后,与中间次编码的输出共同进行当次解码;及
    将当次解码的输出与所述三维物体检测结果进行点乘操作，再与所述三维边缘检测结果相加后，与所述中间次编码的前次编码的输出共同进行下次解码，直至末次解码。
  7. 根据权利要求1所述的方法,其特征在于,所述获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果,包括:
    获取三维图像各二维分片的二维初始物体检测结果和二维初始边缘检测结果;
    对于所述三维图像各二维分片,将所述二维分片的颜色特征图与所述二维分片的二维初始物体检测结果进行点乘操作,再与所述二维分片的二维初始物体检测结果相加后进行物体检测,得到所述二维分片的二维目标物体检测结果;及
    对于所述三维图像各二维分片进行卷积操作,根据卷积操作的输出和相应二维分片的二维物体检测结果,得到各二维分片的二维目标边缘检测结果。
  8. 根据权利要求7所述的方法，其特征在于，所述对于所述三维图像各二维分片，将所述二维分片的颜色特征图与所述二维分片的二维初始物体检测结果进行点乘操作，再与所述二维分片的二维初始物体检测结果相加后进行物体检测，得到所述二维分片的二维目标物体检测结果，包括：
    对于所述三维图像的每帧二维分片分别执行以下步骤:
    将所述二维分片的颜色特征图与所述二维分片的二维初始物体检测结果进行点乘操作,再与所述二维分片的二维初始物体检测结果相加作为待处理数据;及
    对所述待处理数据进行多于一次编码以及多于一次解码,得到末次解码所输出的所述二维分片的二维目标物体检测结果。
  9. 根据权利要求8所述的方法,其特征在于,所述对所述待处理数据进行多于一次编码以及多于一次解码,得到末次解码所输出的所述二维分片的二维目标物体检测结果,包括:
    对所述待处理数据进行多于一次编码,得到末次编码所输出的物体检测编码结果;
    通过多于一个采样率相异的空洞卷积对所述物体检测编码结果进行处理,得到多于一个特征图;所述多于一个特征图的尺寸各不相同;
    将所述多于一个特征图连接后进行卷积操作,得到多尺度学习结果;及
    对所述多尺度学习结果进行多于一次解码,得到末次解码所输出的所述二维分片的二维目标物体检测结果。
  10. 根据权利要求7所述的方法,其特征在于,所述对于所述三维图像各二维分片进行卷积操作,根据卷积操作的输出和相应二维分片的二维物体检测结果,得到各二维分片的二维目标边缘检测结果,包括:
    对于所述三维图像的每帧二维分片分别执行以下步骤:
    对所述二维分片进行多于一个阶段的卷积操作;
    将各阶段的输出与所述二维分片的二维初始物体检测结果进行点乘操作,再与当前阶段的输出相加得到阶段检测结果;及
    联合各所述阶段检测结果,得到所述二维分片的二维目标边缘检测结果。
  11. 一种三维边缘检测装置,包括:
    获取模块,用于获取三维图像各二维分片的二维物体检测结果和二维边缘检测结果;将各所述二维物体检测结果堆叠为三维物体检测结果,并将各所述二维边缘检测结果堆叠为三维边缘检测结果;
    编码模块，用于根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行编码，得到编码结果；及
    解码模块,用于根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行解码,得到所述三维图像的优化的三维边缘检测结果。
  12. 根据权利要求11所述的装置,其特征在于,所述编码模块还用于根据所述三维图像的特征图、所述三维物体检测结果和所述三维边缘检测结果进行多于一次编码;每次编码的输入为所述三维物体检测结果和所述三维边缘检测结果对前次编码的输出进行运算的运算结果;各次编码的输出各不相同且均为所述三维图像的特征图;及获取末次编码输出的特征图得到编码结果。
  13. 根据权利要求12所述的装置,其特征在于,所述编码模块还用于将所述三维图像的颜色特征图与所述三维物体检测结果进行点乘操作,再与所述三维边缘检测结果相加后进行当次编码;及将当次编码的输出与所述三维物体检测结果进行点乘操作,再与所述三维边缘检测结果相加后进行下次编码,直至末次编码。
  14. 根据权利要求11所述的装置,其特征在于,所述解码模块还用于根据所述编码结果、所述三维物体检测结果和所述三维边缘检测结果进行多于一次解码;每次解码的输入包括所述三维物体检测结果和所述三维边缘检测结果对前次解码的输出进行运算的运算结果;及获取末次解码输出,得到所述三维图像的优化的三维边缘检测结果。
  15. 根据权利要求14所述的装置,其特征在于,所述装置还包括:
    多尺度处理模块,用于通过多于一个采样率相异的空洞卷积对所述编码结果进行处理,得到多于一个特征图;所述多于一个特征图的尺寸各不相同;及将所述多于一个特征图连接后进行卷积操作,得到多尺度学习结果;
    所述解码模块还用于根据所述多尺度学习结果、所述三维物体检测结果和所述三维边缘检测结果进行多于一次解码。
  16. 根据权利要求15所述的装置，其特征在于，所述解码模块还用于将所述多尺度学习结果与所述三维物体检测结果进行点乘操作，再与所述三维边缘检测结果相加后，与中间次编码的输出共同进行当次解码；及将当次解码的输出与所述三维物体检测结果进行点乘操作，再与所述三维边缘检测结果相加后，与所述中间次编码的前次编码的输出共同进行下次解码，直至末次解码。
  17. 根据权利要求11所述的装置,其特征在于,所述获取模块还用于获取三维图像各二维分片的二维初始物体检测结果和二维初始边缘检测结果;对于所述三维图像各二维分片,将所述二维分片的颜色特征图与所述二维分片的二维初始物体检测结果进行点乘操作,再与所述二维分片的二维初始物体检测结果相加后进行物体检测,得到所述二维分片的二维目标物体检测结果;及对于所述三维图像各二维分片进行卷积操作,根据卷积操作的输出和相应二维分片的二维物体检测结果,得到各二维分片的二维目标边缘检测结果。
  18. 根据权利要求17所述的装置,其特征在于,所述获取模块还用于对于所述三维图像的每帧二维分片分别执行以下步骤:将所述二维分片的颜色特征图与所述二维分片的二维初始物体检测结果进行点乘操作,再与所述二维分片的二维初始物体检测结果相加作为待处理数据;及对所述待处理数据进行多于一次编码以及多于一次解码,得到末次解码所输出的所述二维分片的二维目标物体检测结果。
  19. 根据权利要求18所述的装置,其特征在于,所述获取模块还用于对所述待处理数据进行多于一次编码,得到末次编码所输出的物体检测编码结果;通过多于一个采样率相异的空洞卷积对所述物体检测编码结果进行处理,得到多于一个特征图;所述多于一个特征图的尺寸各不相同;将所述多于一个特征图连接后进行卷积操作,得到多尺度学习结果;及对所述多尺度学习结果进行多于一次解码,得到末次解码所输出的所述二维分片的二维目标物体检测结果。
  20. 根据权利要求17所述的装置，其特征在于，所述获取模块还用于对于所述三维图像的每帧二维分片分别执行以下步骤：对所述二维分片进行多于一个阶段的卷积操作；将各阶段的输出与所述二维分片的二维初始物体检测结果进行点乘操作，再与当前阶段的输出相加得到阶段检测结果；及联合各所述阶段检测结果，得到所述二维分片的二维目标边缘检测结果。
  21. 一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行如权利要求1至10中任一项所述的方法的步骤。
  22. 一种存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行如权利要求1至10中任一项所述的方法的步骤。

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20920101.1A EP4016454A4 (en) 2020-02-20 2020-10-15 METHOD AND DEVICE FOR THREE-DIMENSIONAL EDGE DETECTION, STORAGE MEDIUM AND COMPUTER DEVICE
JP2022522367A JP7337268B2 (ja) 2020-02-20 2020-10-15 三次元エッジ検出方法、装置、コンピュータプログラム及びコンピュータ機器
US17/703,829 US20220215558A1 (en) 2020-02-20 2022-03-24 Method and apparatus for three-dimensional edge detection, storage medium, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010104850.1 2020-02-20
CN202010104850.1A CN111325766B (zh) 2020-02-20 2020-02-20 三维边缘检测方法、装置、存储介质和计算机设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/703,829 Continuation US20220215558A1 (en) 2020-02-20 2022-03-24 Method and apparatus for three-dimensional edge detection, storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2021164280A1 true WO2021164280A1 (zh) 2021-08-26

Family

ID=71172782

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121120 WO2021164280A1 (zh) 2020-02-20 2020-10-15 三维边缘检测方法、装置、存储介质和计算机设备

Country Status (5)

Country Link
US (1) US20220215558A1 (zh)
EP (1) EP4016454A4 (zh)
JP (1) JP7337268B2 (zh)
CN (1) CN111325766B (zh)
WO (1) WO2021164280A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325766B (zh) * 2020-02-20 2023-08-25 腾讯科技(深圳)有限公司 三维边缘检测方法、装置、存储介质和计算机设备
CN112991465A (zh) * 2021-03-26 2021-06-18 禾多科技(北京)有限公司 相机标定方法、装置、电子设备和计算机可读介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018183548A1 (en) * 2017-03-30 2018-10-04 Hologic, Inc. System and method for hierarchical multi-level feature image synthesis and representation
CN109410185A (zh) * 2018-10-10 2019-03-01 腾讯科技(深圳)有限公司 一种图像分割方法、装置和存储介质
CN109598722A (zh) * 2018-12-10 2019-04-09 杭州帝视科技有限公司 基于递归神经网络的图像分析方法
CN109872325A (zh) * 2019-01-17 2019-06-11 东北大学 基于双路三维卷积神经网络的全自动肝脏肿瘤分割方法
CN110648337A (zh) * 2019-09-23 2020-01-03 武汉联影医疗科技有限公司 髋关节分割方法、装置、电子设备和存储介质
CN111325766A (zh) * 2020-02-20 2020-06-23 腾讯科技(深圳)有限公司 三维边缘检测方法、装置、存储介质和计算机设备

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005211671A (ja) 2005-01-31 2005-08-11 Toshiba Corp 放射線治療計画装置
DE102005030646B4 (de) 2005-06-30 2008-02-07 Siemens Ag Verfahren zur Kontur-Visualisierung von zumindest einer interessierenden Region in 2D-Durchleuchtungsbildern
JP4394127B2 (ja) 2007-01-16 2010-01-06 ザイオソフト株式会社 領域修正方法
AU2016228027B2 (en) * 2015-03-04 2018-11-22 Institute of Mineral Resources, Chinese Academy of Geological Sciences Method for automatically extracting tectonic framework of potential field
CN107025642B (zh) * 2016-01-27 2018-06-22 百度在线网络技术(北京)有限公司 基于点云数据的车辆轮廓检测方法和装置
EP3542250A4 (en) * 2016-11-15 2020-08-26 Magic Leap, Inc. DEPTH LEARNING SYSTEM FOR DETECTION OF RUBBERS
CA3053487A1 (en) 2017-02-22 2018-08-30 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Detection of prostate cancer in multi-parametric mri using random forest with instance weighting & mr prostate segmentation by deep learning with holistically-nested networks
EP3468182A1 (en) * 2017-10-06 2019-04-10 InterDigital VC Holdings, Inc. A method and apparatus for encoding a point cloud representing three-dimensional objects
CN111126242B (zh) * 2018-10-16 2023-03-21 腾讯科技(深圳)有限公司 肺部图像的语义分割方法、装置、设备及存储介质
CN109598727B (zh) * 2018-11-28 2021-09-14 北京工业大学 一种基于深度神经网络的ct图像肺实质三维语义分割方法
CN110276408B (zh) * 2019-06-27 2022-11-22 腾讯科技(深圳)有限公司 3d图像的分类方法、装置、设备及存储介质
CN110599492B (zh) * 2019-09-19 2024-02-06 腾讯科技(深圳)有限公司 图像分割模型的训练方法、装置、电子设备及存储介质


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222745A (zh) * 2022-09-21 2022-10-21 南通未来文化科技有限公司 基于光学信息的古筝面板材料检测方法
CN115222745B (zh) * 2022-09-21 2022-12-13 南通未来文化科技有限公司 基于光学信息的古筝面板材料检测方法
CN115841625A (zh) * 2023-02-23 2023-03-24 杭州电子科技大学 一种基于改进U-Net模型的遥感建筑物影像提取方法

Also Published As

Publication number Publication date
US20220215558A1 (en) 2022-07-07
JP2022552663A (ja) 2022-12-19
JP7337268B2 (ja) 2023-09-01
CN111325766B (zh) 2023-08-25
EP4016454A4 (en) 2023-01-04
EP4016454A1 (en) 2022-06-22
CN111325766A (zh) 2020-06-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20920101; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020920101; Country of ref document: EP; Effective date: 20220315)
ENP Entry into the national phase (Ref document number: 2022522367; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)