CN112990305A - Method, device and equipment for determining occlusion relationship and storage medium - Google Patents

Method, device and equipment for determining occlusion relationship and storage medium

Info

Publication number
CN112990305A
CN112990305A (application number CN202110268994.5A)
Authority
CN
China
Prior art keywords
boundary
module
feature map
image
feature
Prior art date
Legal status
Granted
Application number
CN202110268994.5A
Other languages
Chinese (zh)
Other versions
CN112990305B (en)
Inventor
康学净 (Kang Xuejing)
冯盼贺 (Feng Panhe)
明安龙 (Ming Anlong)
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110268994.5A
Publication of CN112990305A
Application granted
Publication of CN112990305B
Legal status: Active

Classifications

    • G06F18/253 : Pattern recognition; Fusion techniques of extracted features
    • G06F18/214 : Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241 : Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 : Neural networks; Combinations of networks
    • G06N3/08 : Neural networks; Learning methods
    • G06V10/22 : Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Abstract

The embodiments of the invention provide a method, an apparatus, a device and a storage medium for determining an occlusion relationship. The method includes: acquiring an image to be determined; inputting the image to be determined into a pre-trained occlusion relationship description model to obtain a boundary prediction map and a direction prediction result; and determining the occlusion relationship of the objects in the image to be determined based on the boundary prediction map and the direction prediction result. The boundary prediction map and the direction prediction result are produced by a residual module and a feature processing module connected in sequence, and the occlusion relationship between objects in the image is then determined from them. By continuously adjusting the parameters of the neural network, the accuracy of the boundary prediction map and of the direction prediction result is improved, thereby improving the accuracy of the determined occlusion relationship between objects.

Description

Method, device and equipment for determining occlusion relationship and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining an occlusion relationship.
Background
Currently, in some scenarios, it is necessary to determine the occlusion relationship between objects. For example, in a three-dimensional reconstruction scene, a two-dimensional image of an object is acquired and the object is reconstructed in three-dimensional space from that image; during the reconstruction, the spatial positional relationship of the objects can be determined by determining the occlusion relationship between the objects in the two-dimensional image. As another example, in a robot obstacle-avoidance scene, the occlusion relationship between objects needs to be determined first, and the priority of obstacle-avoidance processing is set according to that relationship: if the robot needs to avoid obstacle A and obstacle B, and A is determined to occlude B, then A can be inferred to be closer to the robot than B, so A may be given the first priority, B the second priority, and so on.
Currently, the solution for determining the occlusion relationship between objects generally includes: segmenting the image to be processed to obtain the boundaries of the objects in the image; determining whether each boundary is an occlusion boundary and, if so, determining the front-to-back order of the objects on the two sides of the occlusion boundary; and determining the occlusion relationship between the objects from that order.
However, in the above solution the image is segmented first and the occlusion relationship is then determined from the segmented boundaries; that is, the occlusion relationship depends entirely on the image segmentation result. If the segmentation accuracy is low and the segmentation effect is poor, the accuracy of the determined occlusion relationship is correspondingly low.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device, equipment and a storage medium for determining an occlusion relationship, so as to improve the accuracy of the determined occlusion relationship. The specific technical scheme is as follows:
in order to achieve the above object, an embodiment of the present invention provides a method for determining an occlusion relationship, including:
acquiring an image to be determined;
inputting the image to be determined into a pre-trained occlusion relation description model to obtain a boundary prediction image and a direction prediction result;
determining the occlusion relationship of the objects in the image to be determined based on the boundary prediction map and the direction prediction result;
wherein the step of training the occlusion relation description model comprises:
acquiring a training sample, a standard boundary diagram corresponding to the training sample and a standard direction result;
inputting the training sample into a residual error module of a neural network to obtain a feature map of the training sample; the neural network comprises a residual error module and a feature processing module which are connected in sequence;
inputting the feature map into the feature processing module to obtain a fusion feature map and a separation feature map;
performing convolution on the fusion feature map to obtain a direction prediction result; performing convolution on the separation feature map to obtain a boundary prediction map;
judging whether the neural network converges based on a first loss value and a second loss value, wherein the first loss value is the loss value between the boundary prediction map and the standard boundary map, and the second loss value is the loss value between the direction prediction result and the standard direction result;
if not, adjusting parameters of the neural network, and returning to the step of inputting the training sample to a residual error module of the neural network until the neural network converges to obtain an occlusion relation description model.
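The training steps above can be illustrated, purely as a hedged sketch, by the following PyTorch-style loop. The names OcclusionNet-style `model`, `boundary_loss`, `direction_loss`, and the convergence threshold are hypothetical placeholders, not part of the disclosed embodiment, and the model is assumed to return one boundary prediction map and one direction prediction per image.

```python
import torch

def train_occlusion_model(model, optimizer, loader,
                          boundary_loss, direction_loss,
                          eps=1e-4, max_epochs=100):
    """Hypothetical training sketch: model(sample) is assumed to return
    (boundary_prediction_map, direction_prediction)."""
    prev = float("inf")
    for epoch in range(max_epochs):
        total = 0.0
        for sample, std_boundary, std_direction in loader:
            boundary_pred, direction_pred = model(sample)        # residual module + feature processing module
            loss = boundary_loss(boundary_pred, std_boundary) \
                 + direction_loss(direction_pred, std_direction) # first loss value + second loss value
            optimizer.zero_grad()
            loss.backward()                                      # adjust the parameters of the neural network
            optimizer.step()
            total += loss.item()
        if abs(prev - total) < eps:                              # crude convergence test (an assumption)
            break                                                # converged: occlusion relation description model obtained
        prev = total
    return model
```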
Optionally, a feature enhancement module is further included between the residual error module and the feature processing module; inputting the training sample into a residual error module of a neural network to obtain a feature map of the training sample, wherein the feature map comprises:
inputting the training sample into a residual error module of a neural network to obtain an initial characteristic diagram of the training sample;
and inputting the initial feature map into the feature enhancement module to obtain an enhanced feature map serving as the feature map of the training sample.
Optionally, the convolving the fusion feature map to obtain a direction prediction result includes:
performing convolution on the fusion feature map to obtain a first convolution feature map;
extracting the features of the training sample to obtain a feature map to be fused;
and fusing the first convolution feature map and the feature map to be fused to obtain a direction prediction result.
Optionally, the convolving the separation characteristic map to obtain a boundary prediction map includes:
performing convolution on the separation characteristic diagram to obtain a second convolution characteristic diagram;
the second convolution characteristic graph is subjected to up-sampling, and after the up-sampled second convolution characteristic graph is subjected to classification processing, a boundary probability graph is obtained;
extracting the features of the training sample to obtain a feature map to be fused;
fusing the boundary probability graph and the feature graph to be fused to obtain a boundary fusion graph;
and determining the boundary probability map and the boundary fusion map as a boundary prediction map.
Optionally, the determining whether the neural network converges based on the first loss value and the second loss value includes:
calculating a loss value of the loss function based on the first loss value and the second loss value using a loss function;
and judging whether the loss function is converged.
Optionally, the obtaining of the training sample, the standard boundary diagram corresponding to the training sample, and the standard direction result includes:
acquiring a sample image to be processed, a standard boundary diagram corresponding to the sample image and a standard direction result;
and segmenting the sample image to be processed to obtain a plurality of segmented images serving as training samples.
In order to achieve the above object, an embodiment of the present invention further provides an apparatus for determining an occlusion relationship, including:
the first acquisition module is used for acquiring an image to be determined;
the prediction module is used for inputting the image to be determined to a pre-trained occlusion relation description model to obtain a boundary prediction image and a direction prediction result;
the determining module is used for determining the occlusion relationship of the objects in the image to be determined based on the boundary prediction map and the direction prediction result;
the second acquisition module is used for acquiring the training sample, the standard boundary diagram corresponding to the training sample and the standard direction result;
the first input module is used for inputting the training sample to a residual error module of a neural network to obtain a feature map of the training sample; the neural network comprises a residual error module and a feature processing module which are connected in sequence;
the second input module is used for inputting the feature map into the feature processing module to obtain a fusion feature map and a separation feature map;
the first convolution module is used for performing convolution on the fusion characteristic graph to obtain a direction prediction result;
the second convolution module is used for performing convolution on the separation characteristic graph to obtain a boundary prediction graph;
an updating module, configured to determine whether the neural network converges based on a first loss value and a second loss value, where the first loss value is the loss value between the boundary prediction map and the standard boundary map and the second loss value is the loss value between the direction prediction result and the standard direction result; and, if not, to adjust the parameters of the neural network and return to the step of inputting the training sample into the residual module of the neural network until the neural network converges, thereby obtaining the occlusion relationship description model.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and a processor configured to implement any of the above methods for determining an occlusion relationship when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method for determining an occlusion relationship is implemented.
By applying the embodiments of the invention, a boundary prediction map and a direction prediction result are determined using a residual module and a feature processing module connected in sequence, and the occlusion relationship between objects in the image is then determined from the boundary prediction map and the direction prediction result. By continuously adjusting the parameters of the neural network, the accuracy of the boundary prediction map and of the direction prediction result is improved, thereby improving the accuracy of the determined occlusion relationship between objects.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a first flowchart of a method for determining an occlusion relationship according to an embodiment of the present invention;
fig. 2A is a schematic structural diagram of a neural network according to an embodiment of the present invention;
fig. 2B is a schematic structural diagram of a feature enhancing module according to an embodiment of the present invention;
fig. 2C is a schematic structural diagram of a feature processing module according to an embodiment of the present invention;
fig. 2D is a schematic structural diagram of an occlusion sharing module according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a process for training an occlusion relationship description model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an occlusion relationship according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of the present invention;
fig. 6 is a second flowchart of the method for determining an occlusion relationship according to the embodiment of the present invention;
fig. 7 is a schematic structural diagram of an apparatus for determining an occlusion relationship according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus for training an occlusion relationship description model according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
In order to achieve the above object, embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for determining an occlusion relationship, where the method and the apparatus may be applied to various electronic devices, and are not limited specifically. First, a method for determining the occlusion relationship will be described in detail below.
Fig. 1 is a first flowchart of a method for determining an occlusion relationship according to an embodiment of the present invention, including:
s101: and acquiring an image to be determined.
For example, in a three-dimensional reconstruction scene, the image to be determined may be an image including an object to be three-dimensionally reconstructed; or, in the robot obstacle avoidance scene, the image to be determined may be an image acquired by an image acquisition device such as a camera and a video camera carried by the robot; or the image to be determined may also be an image in a public image data set, and the like.
S102: and inputting the image to be determined into a pre-trained occlusion relation description model to obtain a boundary prediction image and a direction prediction result.
The occlusion relationship description model may be obtained by training a neural network to be trained, the structure of the neural network to be trained is the same as that of the occlusion relationship description model obtained after the training is completed, and the training process may be understood as a process of iteratively adjusting parameters in the neural network. The specific structure of the neural network and the occlusion relation description model is not limited.
For example, referring to fig. 2A, the neural network may include a residual module and a feature processing module. The input image enters the residual module for feature extraction, producing feature map 1 … feature map n; feature map 1 … feature map n are input to the feature processing module, which outputs a fusion feature map and a separation feature map; the fusion feature map and the separation feature map are then convolved to obtain the direction prediction result and the boundary prediction map, respectively. The number of feature maps output by the residual module is the same as the number of residual sub-modules it contains, and n is a positive integer equal to that number.
For example, the residual module may be formed by a plurality of residual sub-modules connected in sequence, each of which may consist of 3 convolutional layers. The input image enters the residual module for feature extraction, and each residual sub-module outputs one feature map of the input image. If the residual module includes 5 residual sub-modules, then after the input image enters the residual module, 5 feature maps are obtained, one from each residual sub-module. The size of each feature map corresponds to the position, within the residual module, of the residual sub-module that outputs it: the closer a residual sub-module is to the input end, the larger its output feature map; the farther from the input end, the smaller its output feature map. The number of convolutional layers in a residual sub-module may be 3, 4, or another value; the specific number is not limited.
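As an illustration only, a residual sub-module of the kind described above could be sketched in PyTorch as follows; the bottleneck layout, channel widths and stride-2 downsampling are assumptions, since the patent does not fix them.

```python
import torch
import torch.nn as nn

class ResidualSubModule(nn.Module):
    """One possible residual sub-module with 3 convolutional layers;
    channel widths and downsampling are illustrative assumptions."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        mid = out_ch // 4
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class ResidualModule(nn.Module):
    """Five residual sub-modules in sequence; each output is kept as feature map 1..5,
    with sizes shrinking the farther the sub-module is from the input end."""
    def __init__(self, chs=(3, 64, 128, 256, 512, 512)):
        super().__init__()
        self.blocks = nn.ModuleList(
            ResidualSubModule(chs[i], chs[i + 1]) for i in range(5))

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        return feats  # feature map 1 ... feature map 5
```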
For example, the feature processing module may be understood as a module that fuses and separates feature maps. After feature map 1 … feature map n enter the feature processing module, feature fusion is first performed by convolutional layers, and feature separation is then performed by convolutional layers to obtain a fusion feature map and a separation feature map.
Referring to fig. 3, a process of training an occlusion relationship description model is described below, where fig. 3 is a schematic flowchart of a process of training an occlusion relationship description model according to an embodiment of the present invention, and includes:
s301: and acquiring a training sample, a standard boundary diagram corresponding to the training sample and a standard direction result.
For example, the training sample may be an image in an image dataset, or an image acquired by an image acquisition device; the embodiment of the invention does not limit the image acquisition device. The image dataset may be PIOD (PASCAL Instance Occlusion Dataset), BSDS (the BSDS border-ownership dataset), or the like, and is not specifically limited. The standard boundary map may be understood as an image describing the boundaries that carry an occlusion relationship between objects in the training sample. The standard direction result may be understood as an image describing the occlusion order between objects in the training sample.
In one embodiment, S301 may include: acquiring a sample image to be processed, a standard boundary diagram corresponding to the sample image and a standard direction result; and segmenting the sample image to be processed to obtain a plurality of segmented images serving as training samples.
For example, the sample image to be processed may be an image in an image data set, or the training sample may be an image acquired by an image acquisition device. The embodiment of the invention does not limit the image acquisition device; the image data set may be a PIOD, a BSDS, etc., and is not particularly limited. The standard boundary map can be understood as an image describing the boundary with an occlusion relationship between objects in the sample image to be processed. The standard orientation result can be understood as an image describing the order of occlusion between objects in the sample image to be processed.
For example, the sample image to be processed may be segmented into images of preset sizes, such as segmenting the sample image to be processed into a plurality of segmented images of 320 pixels × 320 pixels. The predetermined size may be 320 pixels × 320 pixels, 160 pixels × 160 pixels, and the like, and the specific predetermined size is not limited. For example, if the size of the sample image to be processed is 960 pixels × 960 pixels, the sample image to be processed is segmented by using 320 pixels × 320 pixels as the preset size, and 9 segmented images of 320 pixels × 320 pixels are obtained. The size of the sample image to be processed may be 960 pixels × 960 pixels, 480 pixels × 480 pixels, and so on, and the size of the sample image to be processed is not limited.
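A minimal sketch of this tiling step is shown below; the non-overlapping grid and the handling of images whose sides are not exact multiples of the tile size are assumptions, since the patent only fixes the preset tile size.

```python
import numpy as np

def split_into_tiles(image: np.ndarray, tile: int = 320):
    """Split a sample image (H, W, C) into non-overlapping tile x tile segments,
    e.g. a 960x960 image yields 9 segments of 320x320. Remainders that do not
    fill a full tile are dropped here (an assumption)."""
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            tiles.append(image[top:top + tile, left:left + tile])
    return tiles
```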
In this embodiment, the sample image to be processed is divided into a plurality of segmented images of a preset size, and the segmented images are used as training samples. Splitting a larger image into several smaller images reduces the amount of computation when the neural network subsequently performs feature extraction, feature fusion and separation on the training samples, and thus improves training efficiency.
S302: inputting the training sample into a residual error module of a neural network to obtain a characteristic diagram of the training sample; the neural network comprises a residual error module and a feature processing module which are connected in sequence.
For example, the residual module may include a plurality of residual sub-modules. After the training sample enters the residual module, a plurality of feature maps are obtained, one from each residual sub-module. The size of each feature map corresponds to the position, within the residual module, of the residual sub-module that outputs it: the earlier the residual sub-module, the larger its output feature map; the later the residual sub-module, the smaller its output feature map.
In one embodiment, a feature enhancement module is further included between the residual module and the feature processing module, and S302 may include: inputting the training sample into a residual error module of a neural network to obtain an initial characteristic diagram of the training sample; and inputting the initial characteristic diagram into a characteristic enhancement module to obtain an enhanced characteristic diagram serving as a characteristic diagram of a training sample.
For example, the training sample may be input into the residual module of the neural network to obtain an initial feature map of the training sample. The initial feature map is then input into the feature enhancement module, whose structure may be as shown in fig. 2B. The feature enhancement module may comprise three paths: one path may comprise a 1×1 convolution unit followed by a 3×3 convolution unit; another path may comprise a 1×1 convolution unit followed by two 1×3 convolution units; and the third path may comprise a 1×1 convolution unit followed by two 3×1 convolution units. The output results of the three paths are added, and feature aggregation is then performed with a 1×1 convolution unit to obtain an enhanced feature map, which is used as the feature map of the training sample. The output of each convolution unit may be input directly to the next convolution unit, or may first be normalized and rectified. The feature enhancement module may also be SMOTE (Synthetic Minority Over-sampling Technique), SamplePairing, or the like; the specific feature enhancement module is not limited.
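A hedged sketch of the three-path feature enhancement module is given below. The choice of BatchNorm plus ReLU for the "normalization and rectification" step and the constant channel width are assumptions.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, kernel):
    # convolution unit: conv followed by normalization and rectification (assumed BatchNorm + ReLU)
    pad = tuple(k // 2 for k in kernel) if isinstance(kernel, tuple) else kernel // 2
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel, padding=pad),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class FeatureEnhancementModule(nn.Module):
    """Three paths (1x1 + 3x3, 1x1 + two 1x3, 1x1 + two 3x1), summed and
    aggregated by a final 1x1 convolution; channel width `ch` is an assumption."""
    def __init__(self, ch):
        super().__init__()
        self.path1 = nn.Sequential(conv_bn_relu(ch, ch, 1), conv_bn_relu(ch, ch, 3))
        self.path2 = nn.Sequential(conv_bn_relu(ch, ch, 1),
                                   conv_bn_relu(ch, ch, (1, 3)), conv_bn_relu(ch, ch, (1, 3)))
        self.path3 = nn.Sequential(conv_bn_relu(ch, ch, 1),
                                   conv_bn_relu(ch, ch, (3, 1)), conv_bn_relu(ch, ch, (3, 1)))
        self.aggregate = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return self.aggregate(self.path1(x) + self.path2(x) + self.path3(x))
```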
S303: and inputting the feature map into a feature processing module to obtain a fusion feature map and a separation feature map.
For example, if the residual module includes a plurality of residual sub-modules, the feature maps output by the last two residual sub-modules may be input into the feature processing module. The structure of the feature processing module may be as shown in fig. 2C: it may include convolution unit 1, convolution unit 2, convolution unit 3 and convolution unit 4, where convolution unit 1 and convolution unit 2 are connected in sequence, and convolution unit 3 and convolution unit 4 are each connected to convolution unit 2. Convolution unit 1 and convolution unit 2 perform feature fusion on feature map 1 and feature map 2 input to the feature processing module, and convolution unit 3 and convolution unit 4 perform feature separation on the fused features to obtain a fusion feature map and a separation feature map. The fusion feature map and the feature map output by the third-from-last residual sub-module in the residual module may then be input into the feature processing module, and so on.
For example, if the residual module includes 5 residual sub-modules, the feature maps output by the fifth residual sub-module and the fourth residual sub-module may be input to the feature processing module to obtain a fused feature map 1 and a separated feature map 1; then, the feature map output by the fusion feature map 1 and the third residual sub-module can be input to a feature processing module to obtain a fusion feature map 2 and a separation feature map 2; then, the feature map output by the fusion feature map 2 and the second residual sub-module can be input to a feature processing module to obtain a fusion feature map 3 and a separation feature map 3; and then inputting the fused feature map 3 and the feature map output by the first residual sub-module into a feature processing module to obtain a fused feature map 4 and a separated feature map 4.
In one case, the fusion feature map may be up-sampled first, and the up-sampled fusion feature map may then be input into the feature processing module. In this way, the up-sampled fusion feature map has the same size as the feature map with which it is to be fused, which reduces the amount of computation in the fusion process.
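The fuse-then-separate structure described above can be sketched as follows; the kernel sizes, channel widths and bilinear up-sampling are assumptions, and the cascade over the 5 residual sub-modules is indicated only in the comments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureProcessingModule(nn.Module):
    """Convolution units 1-2 fuse the two input feature maps; units 3 and 4 then
    separate the result into a fusion feature map and a separation feature map.
    The deeper input is up-sampled before fusion (an assumption about ordering)."""
    def __init__(self, ch_a, ch_b, out_ch):
        super().__init__()
        self.unit1 = nn.Conv2d(ch_a + ch_b, out_ch, 3, padding=1)  # convolution unit 1
        self.unit2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)       # convolution unit 2
        self.unit3 = nn.Conv2d(out_ch, out_ch, 3, padding=1)       # convolution unit 3 -> fusion feature map
        self.unit4 = nn.Conv2d(out_ch, out_ch, 3, padding=1)       # convolution unit 4 -> separation feature map

    def forward(self, feat_shallow, feat_deep):
        feat_deep = F.interpolate(feat_deep, size=feat_shallow.shape[2:],
                                  mode="bilinear", align_corners=False)   # up-sample before fusion
        fused = torch.relu(self.unit2(torch.relu(
            self.unit1(torch.cat([feat_shallow, feat_deep], dim=1)))))
        return self.unit3(fused), self.unit4(fused)
        # Cascade: the fusion feature map from one stage is paired with the next
        # shallower residual feature map (fused map 1 with sub-module 3, etc.).
```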
S304: and (5) performing convolution on the fusion characteristic graph to obtain a direction prediction result.
In one embodiment, S304 may include: performing convolution on the fusion characteristic diagram to obtain a first convolution characteristic diagram; extracting features of the training samples to obtain a feature map to be fused; and fusing the first convolution feature map and the feature map to be fused to obtain a direction prediction result.
For example, the fusion feature map may be convolved to obtain a first convolution feature map. Feature extraction may be performed on the training sample with 3 convolutional layers to obtain a feature map to be fused, whose size may be the same as that of the training sample. The first convolution feature map and the feature map to be fused are then fused, and the fused result is convolved to obtain a two-channel direction prediction result, where the two channels are the prediction result in the horizontal direction and the prediction result in the vertical direction, respectively. The embodiment of the invention does not limit the way the features of the training sample are extracted.
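A possible direction-prediction head following this description is sketched below; channel counts, the use of bilinear up-sampling, and concatenation as the fusion operation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectionHead(nn.Module):
    """Sketch: convolve the fusion feature map, extract a to-be-fused feature map
    from the input image with 3 convolutional layers, fuse the two, and convolve
    to a two-channel (horizontal, vertical) direction prediction."""
    def __init__(self, fused_ch, mid_ch=16):
        super().__init__()
        self.conv_fused = nn.Conv2d(fused_ch, mid_ch, 3, padding=1)     # first convolution feature map
        self.image_branch = nn.Sequential(                              # 3 conv layers on the training sample
            nn.Conv2d(3, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.predict = nn.Conv2d(2 * mid_ch, 2, 1)                      # two channels: horizontal / vertical

    def forward(self, fused_map, image):
        a = F.interpolate(self.conv_fused(fused_map), size=image.shape[2:],
                          mode="bilinear", align_corners=False)
        b = self.image_branch(image)                                    # feature map to be fused
        return self.predict(torch.cat([a, b], dim=1))                   # direction prediction result
```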
S305: and (4) performing convolution on the separation characteristic graph to obtain a boundary prediction graph.
In one embodiment, S305 may include: performing convolution on the separation characteristic diagram to obtain a second convolution characteristic diagram; the second convolution characteristic graph is subjected to up-sampling, and after the up-sampled second convolution characteristic graph is subjected to classification processing, a boundary probability graph is obtained; extracting features of the training samples to obtain a feature map to be fused; fusing the boundary probability graph and the feature graph to be fused to obtain a boundary fusion graph; and determining the boundary probability map and the boundary fusion map as a boundary prediction map.
For example, the separation feature map may be convolved to obtain a second convolution feature map; the second convolution feature map is up-sampled, and sigmoid (binary) classification is performed on the up-sampled second convolution feature map to obtain a boundary probability map. The classification processing may be sigmoid, softmax, or the like; the specific classification processing is not limited.
For example, feature extraction may be performed on a training sample by using 3 convolutional layers to obtain a feature map to be fused, where the size of the feature map to be fused may be the same as that of the training sample; the feature graph to be fused can be used as a boundary probability graph; fusing the boundary probability graph and the feature graph to be fused to obtain a boundary fusion graph; the boundary probability map and the boundary fusion map may be determined as a boundary prediction map.
For example, if 4 separation feature maps are obtained in S303, the 4 separation feature maps may be sequentially subjected to upsampling and classification processing to obtain 4 boundary probability maps, then the 4 boundary probability maps and the fusion feature map may be fused to obtain a boundary fusion map, the feature map to be fused may be used as one boundary probability map, that is, 5 boundary probability maps and one boundary fusion map are obtained, and then the 5 boundary probability maps and the boundary fusion map are determined as a boundary prediction map, that is, 6 boundary prediction maps are obtained.
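The boundary branch described above could be sketched as follows; the single-channel convolution, bilinear up-sampling, and the simple averaging used as the fusion step are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryHead(nn.Module):
    """Sketch: convolve a separation feature map, up-sample it to image size,
    and apply sigmoid classification to obtain a boundary probability map."""
    def __init__(self, sep_ch):
        super().__init__()
        self.conv = nn.Conv2d(sep_ch, 1, 3, padding=1)      # second convolution feature map, 1 channel

    def forward(self, sep_map, out_size):
        x = F.interpolate(self.conv(sep_map), size=out_size,
                          mode="bilinear", align_corners=False)
        return torch.sigmoid(x)                             # boundary probability map

def fuse_boundary_maps(prob_maps):
    # boundary fusion map: element-wise average of the boundary probability maps
    # (an assumption; the embodiment fuses them with the feature map to be fused)
    return torch.stack(prob_maps, dim=0).mean(dim=0)
```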
S306: judging whether the neural network converges or not based on the first loss value and the second loss value, wherein the first loss value is as follows: a loss value between the boundary prediction map and the standard boundary map, wherein the second loss value is: a loss value between the directional prediction result and the standard directional result. If not, S307 may be executed; if so, S308 may be performed.
Referring now to FIG. 4, a schematic diagram of the occlusion relationship is presented. In FIG. 4, the dashed line represents the occlusion boundary of the object. The area to the left of the dashed line represents the foreground, i.e., the object that occludes other objects in the occlusion relationship; the area to the right of the dashed line represents the background, i.e., the object that is occluded. Vector a represents the relationship between foreground and background in the horizontal direction, and vector b represents the relationship in the vertical direction; the two vectors are mutually orthogonal, both point toward the background side of the occlusion boundary, and the length of each vector represents the degree to which the foreground occludes the background. The solid line perpendicular to the dashed line represents the normal of the occlusion boundary, and θ represents the angle between vector a and that normal.
For example, the loss value between the boundary prediction map and the standard boundary map can be calculated as a class-balanced cross-entropy:

Γ_e(Ŷ, Y) = -β · Σ_{j∈Y+} log(ŷ_j) - α · Σ_{j∈Y-} log(1 - ŷ_j)

where Γ_e represents the loss value between the boundary prediction map Ŷ and the standard boundary map Y, Y- represents the non-boundary pixel points, Y+ represents the boundary pixel points, ŷ_j represents the j-th pixel point in the boundary prediction map, j represents the serial number of a pixel point in the boundary prediction map, α = λ·|Y+|/(|Y+|+|Y-|), β = |Y-|/(|Y+|+|Y-|), and λ represents a weight that controls the balance between boundary pixels and non-boundary pixels.
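Under the reconstruction above, a minimal sketch of the boundary loss could look like this; the value of λ and the numerical-stability epsilon are assumptions.

```python
import torch

def boundary_loss(pred, target, lam=1.1):
    """Class-balanced cross-entropy between the boundary prediction map `pred`
    (probabilities in [0, 1]) and the standard boundary map `target` (0/1),
    using the alpha/beta weights defined above; lam (lambda) is an assumed value."""
    eps = 1e-6
    pos = target > 0.5                          # Y+ : boundary pixels
    neg = ~pos                                  # Y- : non-boundary pixels
    n_pos, n_neg = pos.sum().float(), neg.sum().float()
    alpha = lam * n_pos / (n_pos + n_neg)
    beta = n_neg / (n_pos + n_neg)
    loss = -beta * torch.log(pred[pos] + eps).sum() \
           - alpha * torch.log(1.0 - pred[neg] + eps).sum()
    return loss
```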
For example, the loss value between the direction prediction result and the standard direction result may be calculated using the following equations:

Γ_o = Σ_{j∈Y+} f(x_j)

x_j = θ̂_j - θ_j, with θ̂_j = arctan(ô_j^y / ô_j^x)

f(x) = 0.5·x² if |x| < 1, and f(x) = |x| - 0.5 otherwise

where Γ_o represents the loss value between the direction prediction result and the standard direction result, j represents the serial number of a pixel point in the direction prediction map, Y+ represents the boundary pixel points, ô_j = (ô_j^x, ô_j^y) is the two-channel direction prediction result of pixel point j, ô_j^x represents the horizontal-direction prediction result of pixel point j, ô_j^y represents the vertical-direction prediction result of pixel point j, θ_j represents the standard direction result of pixel point j, and x represents the intermediate variable.
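Matching the reconstruction above, the direction loss could be sketched as a smooth-L1 penalty on the angular error at boundary pixels; the exact form of f and the absence of angle wrapping are assumptions.

```python
import torch

def direction_loss(pred, target_theta, boundary_mask):
    """Sketch of the orientation loss: `pred` has two channels (horizontal, vertical),
    `target_theta` is the standard direction result, `boundary_mask` marks Y+."""
    theta_hat = torch.atan2(pred[:, 1], pred[:, 0])          # predicted angle from the two channels
    x = (theta_hat - target_theta).abs()[boundary_mask]      # intermediate variable: angular error on Y+
    f = torch.where(x < 1.0, 0.5 * x * x, x - 0.5)           # smooth-L1 penalty
    return f.sum()
```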
In one embodiment, S306 may include: calculating a loss value of the loss function based on the first loss value and the second loss value using the loss function; it is determined whether the loss function converges. If not, S307 may be executed; if so, S308 may be performed.
For example, the loss function may be:

Γ = w_o · Γ_o + Σ_{i=1}^{n} w_e^i · Γ_e^i

where Γ represents the loss function, w_o represents the weight of the loss value between the direction prediction result and the standard direction result, Γ_o represents the loss value between the direction prediction result and the standard direction result, n represents the number of boundary prediction maps, i is the index of a boundary prediction map, w_e^i represents the weight of the loss value between the i-th boundary prediction map and the standard boundary map, and Γ_e^i represents the loss value between the i-th boundary prediction map and the standard boundary map.
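As a small sketch of the weighted sum above, with the default weights being assumptions:

```python
def total_loss(direction_term, boundary_terms, w_o=0.5, w_e=None):
    """Weighted sum of the direction loss and the n boundary losses,
    following the loss-function reconstruction above."""
    if w_e is None:
        w_e = [1.0] * len(boundary_terms)
    return w_o * direction_term + sum(w * g for w, g in zip(w_e, boundary_terms))
```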
Whether the loss function is converged can be judged according to the loss function value; if not, S307 may be executed; if so, S308 may be performed.
S307: parameters of the neural network are adjusted.
After adjusting the parameters of the neural network, the process may return to step S302.
S308: and obtaining an occlusion relation description model.
The occlusion relationship description model can be used to predict the boundaries between objects in an image and the front-to-back occlusion order between the objects.
S103: and determining the occlusion relation of the object in the image to be determined based on the boundary prediction image and the direction prediction result.
For example, the boundary prediction map and the direction prediction result may be fused using NMS (Non-Maximum Suppression) to obtain a boundary map with directions, and the occlusion relationship of the objects in the image to be determined may then be determined from that directed boundary map using the left-hand rule. The embodiment of the invention does not limit the specific way in which the boundary prediction map and the direction prediction result are fused.
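A rough sketch of reading an occlusion ordering out of the two predictions is given below. It is not the NMS fusion or the left-hand rule of the embodiment itself: the simple thresholding and the sign convention (foreground taken to lie to the left of the predicted direction vector) are assumptions for illustration.

```python
import numpy as np

def occlusion_pairs(boundary_prob, direction, thresh=0.5):
    """Sketch only: threshold the boundary prediction map and, at each boundary pixel,
    use the two-channel direction (shape (2, H, W)) to mark which neighbouring side
    is assumed to be the occluding (foreground) object."""
    fg_offsets = []
    ys, xs = np.nonzero(boundary_prob > thresh)
    for y, x in zip(ys, xs):
        ox, oy = direction[0, y, x], direction[1, y, x]   # horizontal / vertical channels
        # rotate the direction vector by +90 degrees toward the assumed foreground side
        fx, fy = -oy, ox
        fg_offsets.append(((y, x), (int(np.sign(fy)), int(np.sign(fx)))))
    return fg_offsets  # list of (boundary pixel, offset toward the occluding object)
```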
Referring to fig. 5, a specific embodiment is introduced, where the residual module includes a residual sub-module 1, a residual sub-module 2, a residual sub-module 3, a residual sub-module 4, and a residual sub-module 5, and after an image to be determined is input to the occlusion relation description model, the image is input to the residual sub-module 1 in the residual module, so as to obtain an initial feature map 1; inputting the output of the residual sub-module 1 to the residual sub-module 2 to obtain an initial characteristic diagram 2; inputting the output of the residual sub-module 2 to the residual sub-module 3 to obtain an initial characteristic diagram 3; inputting the output of the residual sub-module 3 into the residual sub-module 4 to obtain an initial characteristic diagram 4; and inputting the output of the residual submodule 4 into the residual submodule 5 to obtain an initial characteristic diagram 5. Inputting the initial feature map 4 and the initial feature map 5 into an occlusion sharing module, wherein the structure of the occlusion sharing module can be shown in fig. 2D, the occlusion sharing module comprises a feature enhancement sub-module, in the occlusion sharing module, the initial feature map 5 is firstly up-sampled, the initial feature map 4 is input into the feature enhancement sub-module to obtain an enhanced feature map 4, and the enhanced feature map 4 and the up-sampled initial feature map 5 are fused to obtain a fusion result; inputting the fusion result into the feature processing module 4 to obtain the separation feature map 1, up-sampling the separation feature map 1, and classifying the up-sampled separation feature map 1 to obtain the boundary prediction map 1.
Inputting the initial feature map 1, the initial feature map 2 and the initial feature map 3 into a feature enhancement module respectively to obtain an enhanced feature map 1, an enhanced feature map 2 and an enhanced feature map 3; inputting the output of the feature processing module 4 and the enhanced feature map 3 into the feature processing module 3 to obtain a separation feature map 2, up-sampling the separation feature map 2, and classifying the up-sampled separation feature map 2 to obtain a boundary prediction map 2; inputting the output of the feature processing module 3 and the enhanced feature map 2 into the feature processing module 2 to obtain a separation feature map 3, up-sampling the separation feature map 3, and classifying the up-sampled separation feature map 3 to obtain a boundary prediction map 3; the output of the feature processing module 2 and the enhanced feature map 1 are input to the feature processing module 1 to obtain a fusion feature map and a separation feature map 4, the separation feature map 4 is up-sampled, and the up-sampled separation feature map 4 is classified to obtain a boundary prediction map 4.
Performing convolution processing on the image to be determined to respectively obtain a feature map to be fused and a boundary prediction map 5; fusing the feature graph to be fused with the fused feature graph to obtain direction prediction results of the two channels; the boundary prediction diagram 6 is obtained by fusing the boundary prediction diagram 1, the boundary prediction diagram 2, the boundary prediction diagram 3, the boundary prediction diagram 4 and the boundary prediction diagram 5. The boundary prediction graph 6 may be determined as a boundary prediction result, the boundary prediction graph 6 and the direction prediction result may be fused by using NMS to obtain a boundary graph with a direction, and then, the occlusion relationship of an object in the image to be determined may be determined according to the boundary graph with the direction by using left-hand rule.
By applying the embodiments of the invention, a boundary prediction map and a direction prediction result are determined using a residual module and a feature processing module connected in sequence, and the occlusion relationship between objects in the image is then determined from the boundary prediction map and the direction prediction result. By continuously adjusting the parameters of the neural network, the accuracy of the boundary prediction map and of the direction prediction result is improved, thereby improving the accuracy of the determined occlusion relationship between objects.
Fig. 6 is a second flowchart of the method for determining an occlusion relationship according to the embodiment of the present invention, where the method includes:
s601: acquiring a sample image to be processed, a standard boundary diagram corresponding to the sample image and a standard direction result; and segmenting the sample image to be processed to obtain a plurality of segmented images serving as training samples.
For example, the sample image to be processed may be an image in an image data set, or the training sample may be an image acquired by an image acquisition device. The embodiment of the invention does not limit the image acquisition device; the image data set may be a PIOD, a BSDS, etc., and is not particularly limited. The standard boundary map can be understood as an image describing the boundary with an occlusion relationship between objects in the sample image to be processed. The standard orientation result can be understood as an image describing the order of occlusion between objects in the sample image to be processed.
For example, the sample image to be processed may be segmented into images of preset sizes, such as segmenting the sample image to be processed into a plurality of segmented images of 320 pixels × 320 pixels. The predetermined size may be 320 pixels × 320 pixels, 160 pixels × 160 pixels, and the like, and the specific predetermined size is not limited. For example, if the size of the sample image to be processed is 960 pixels × 960 pixels, the sample image to be processed is segmented by using 320 pixels × 320 pixels as the preset size, and 9 segmented images of 320 pixels × 320 pixels are obtained. The size of the sample image to be processed may be 960 pixels × 960 pixels, 480 pixels × 480 pixels, and so on, and the size of the sample image to be processed is not limited.
In this embodiment, the sample image to be processed is divided into a plurality of segmented images of a preset size, and the segmented images are used as training samples. Splitting a larger image into several smaller images reduces the amount of computation when the neural network subsequently performs feature extraction, feature fusion and separation on the training samples, and thus improves training efficiency.
S602: inputting the training sample into a residual error module of a neural network to obtain an initial characteristic diagram of the training sample; the neural network comprises a residual error module, a feature enhancement module and a feature processing module which are connected in sequence.
For example, the residual module may include a plurality of residual sub-modules, and then, after the training sample enters the residual module, a plurality of initial feature maps output by the plurality of residual sub-modules may be obtained, where each residual sub-module outputs one initial feature map, and the size of the initial feature map corresponds to the position of the residual sub-module outputting the initial feature map in the residual module, in one-to-one correspondence, for example, the closer the residual sub-module is to the input end, the larger the size of the output initial feature map is, and the farther the residual sub-module is from the input end, the smaller the size of the output initial feature map is.
S603: and inputting the initial characteristic diagram into a characteristic enhancement module to obtain an enhanced characteristic diagram serving as a characteristic diagram of a training sample.
For example, the initial feature map may be input to a feature enhancement module, the structure of which may be shown in fig. 2B, and the feature enhancement module may include three paths, wherein one path may include one 1 × 1 convolution unit and one 3 × 3 convolution unit in series, another path may include one 1 × 1 convolution unit and two 1 × 3 convolution units in series, and still another path may include one 1 × 1 convolution unit and two 3 × 1 convolution units in series; and finally, adding the output results of the three paths, and performing feature aggregation by using a convolution unit of 1 multiplied by 1 to obtain an enhanced feature map which is used as the feature map of the training sample. The output of each convolution unit can be directly input to the next convolution unit, or the output of each convolution unit can be input to the next convolution unit after being sequentially subjected to standardization and rectification. The feature enhancement module may also be: SMOTE, samplepair, etc., and the specific feature enhancement module is not limited.
S604: and inputting the feature map into a feature processing module to obtain a fusion feature map and a separation feature map.
For example, if the residual module includes a plurality of residual sub-modules, the feature maps output by the last two residual sub-modules may be input to the feature processing module, the structure of the feature processing module may be as shown in fig. 2C, and the feature processing module may include a convolution unit 1, a convolution unit 2, a convolution unit 3, and a convolution unit 4, where the convolution unit 1 and the convolution unit 2 are sequentially connected, and the convolution unit 3 and the convolution unit 4 are respectively connected to the convolution unit 2; the convolution unit 1 and the convolution unit 2 are used for performing feature fusion on the feature map 1 and the feature map 2 input to the feature processing module, and the convolution unit 3 and the convolution unit 4 are used for performing feature separation on the fused feature map to obtain a fusion feature map and a separation feature map after separation. The fused feature map and the feature map output by the last 3 residual sub-module in the residual module can be input into the feature processing module, and so on.
For example, if the residual module includes 5 residual sub-modules, the feature maps output by the fifth residual sub-module and the fourth residual sub-module may be input to the feature processing module to obtain a fused feature map 1 and a separated feature map 1; then, the feature map output by the fusion feature map 1 and the third residual sub-module can be input to a feature processing module to obtain a fusion feature map 2 and a separation feature map 2; then, the feature map output by the fusion feature map 2 and the second residual sub-module can be input to a feature processing module to obtain a fusion feature map 3 and a separation feature map 3; and then inputting the fused feature map 3 and the feature map output by the first residual sub-module into a feature processing module to obtain a fused feature map 4 and a separated feature map 4.
In one case, the fused feature map may be up-sampled first, and then the up-sampled fused feature map may be input to the feature processing module. In this way, the calculation amount in the fusion process is reduced by the fact that the size of the fused feature map after the upsampling is the same as that of the feature map to be fused with the fused feature map.
S605: performing convolution on the fusion characteristic diagram to obtain a first convolution characteristic diagram; extracting features of the training samples to obtain a feature map to be fused; and fusing the first convolution feature map and the feature map to be fused to obtain a direction prediction result.
For example, the fused feature map may be convolved to obtain a first convolved feature map; feature extraction can be performed on the training sample by using the 3 layers of convolutional layers to obtain a feature graph to be fused, wherein the size of the feature graph to be fused can be the same as that of the training sample; and fusing the first convolution characteristic diagram and the characteristic diagram to be fused, and performing convolution processing on the fused characteristic diagram to obtain direction prediction results of the two channels. The two channels are respectively a prediction result in the horizontal direction and a prediction result in the vertical direction. The embodiment of the invention does not limit the way of extracting the features of the training samples.
S606: performing convolution on the separation characteristic diagram to obtain a second convolution characteristic diagram; the second convolution characteristic graph is subjected to up-sampling, and after the up-sampled second convolution characteristic graph is subjected to classification processing, a boundary probability graph is obtained; fusing the boundary probability graph and the feature graph to be fused to obtain a boundary fusion graph; and determining the boundary probability map and the boundary fusion map as a boundary prediction map.
For example, the separation feature map may be convolved to obtain a second convolved feature map; and performing upsampling on the second convolution characteristic graph, and performing sigmoid (two-class) classification processing on the upsampled second convolution characteristic graph to obtain a boundary probability graph. The classification processing mode may be sigmoid, softmax (logistic regression), etc., and the specific classification processing mode is not limited.
For example, feature extraction may be performed on a training sample by using 3 convolutional layers to obtain a feature map to be fused, where the size of the feature map to be fused may be the same as that of the training sample; the feature graph to be fused can be used as a boundary probability graph; fusing the boundary probability graph and the feature graph to be fused to obtain a boundary fusion graph; the boundary probability map and the boundary fusion map may be determined as a boundary prediction map.
For example, if 4 separation feature maps are obtained in S604, the 4 separation feature maps may be sequentially subjected to upsampling and classification processing to obtain 4 boundary probability maps, then the 4 boundary probability maps and the fusion feature map may be fused to obtain a boundary fusion map, the feature map to be fused may be used as one boundary probability map, that is, 5 boundary probability maps and one boundary fusion map are obtained, and then the 5 boundary probability maps and the boundary fusion map are determined as a boundary prediction map, that is, 6 boundary prediction maps are obtained.
S607: calculating a loss value of the loss function based on the first loss value and the second loss value using the loss function; the first loss value is: a loss value between the boundary prediction map and the standard boundary map, wherein the second loss value is: a loss value between the directional prediction result and the standard directional result.
Referring now to FIG. 4, a schematic diagram of the occlusion relationship is presented. In FIG. 4, the dashed line represents the occlusion boundary of the object. The area to the left of the dashed line represents the foreground, i.e., the object that occludes other objects in the occlusion relationship; the area to the right of the dashed line represents the background, i.e., the object that is occluded. Vector a represents the relationship between foreground and background in the horizontal direction, and vector b represents the relationship in the vertical direction; the two vectors are mutually orthogonal, both point toward the background side of the occlusion boundary, and the length of each vector represents the degree to which the foreground occludes the background. The solid line perpendicular to the dashed line represents the normal of the occlusion boundary, and θ represents the angle between vector a and that normal.
For example, the loss value between the boundary prediction map and the standard boundary map can be calculated using the following equation:
$$\Gamma_e(\hat{Y}, Y) = -\,\beta \sum_{j \in Y_+} \log \hat{y}_j \;-\; \alpha \sum_{j \in Y_-} \log\left(1 - \hat{y}_j\right)$$

wherein Γe represents the loss value between the boundary prediction map and the standard boundary map, Ŷ represents the boundary prediction map, Y represents the standard boundary map, Y− represents the non-boundary pixel points, Y+ represents the boundary pixel points, ŷj represents the j-th pixel point in the boundary prediction map, j represents the serial number of the pixel point in the boundary prediction map, α = λ·|Y+| / (|Y+| + |Y−|), β = |Y−| / (|Y+| + |Y−|), and λ represents a weight that controls the balance between boundary pixels and non-boundary pixels.
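For reference, the class-balanced cross-entropy above can be written as a short function; the tensor layout, the value of the balance weight lam, and the eps guard are illustrative assumptions rather than the patent's code.

```python
import torch

def boundary_loss(pred, target, lam=1.1, eps=1e-6):
    """Class-balanced cross-entropy between a predicted boundary probability map
    `pred` and a binary standard boundary map `target` of the same shape. Sketch only."""
    pos = target > 0.5                      # boundary pixels Y+
    neg = ~pos                              # non-boundary pixels Y-
    n_pos = pos.sum().clamp(min=1).float()
    n_neg = neg.sum().clamp(min=1).float()
    alpha = lam * n_pos / (n_pos + n_neg)
    beta = n_neg / (n_pos + n_neg)
    loss_pos = -beta * torch.log(pred[pos] + eps).sum()
    loss_neg = -alpha * torch.log(1.0 - pred[neg] + eps).sum()
    return loss_pos + loss_neg
```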
For example, the loss value between the direction prediction result and the standard direction result may be calculated using the following equation:
$$\Gamma_o = \sum_{j \in Y_+} \operatorname{SmoothL1}\left(x_j\right)$$

wherein

$$\hat{\theta}_j = \arctan\!\left(\frac{\hat{y}_j^{\,v}}{\hat{y}_j^{\,h}}\right), \qquad x_j = \hat{\theta}_j - \theta_j$$

wherein Γo represents the loss value between the direction prediction result and the standard direction result, j represents the serial number of the pixel point in the direction prediction map, Y+ represents the boundary pixel points, (ŷj^h, ŷj^v) is the two-channel direction prediction result of pixel point j, ŷj^h represents the horizontal direction prediction result of pixel point j, ŷj^v represents the vertical direction prediction result of pixel point j, θj represents the standard direction result of pixel point j, and x represents an intermediate variable, namely the angular error between the predicted direction and the standard direction.
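The direction loss can be sketched in the same style; note that the smooth-L1 penalty on the angular error and the atan2 decoding of the two-channel prediction are assumptions consistent with the variable definitions above, not necessarily the patent's exact formula.

```python
import torch
import torch.nn.functional as F

def direction_loss(pred_h, pred_v, theta_gt, boundary_mask):
    """Sketch of the direction loss. `pred_h`/`pred_v` are the two-channel
    horizontal/vertical predictions, `theta_gt` the standard direction, and
    `boundary_mask` a boolean mask selecting the boundary pixels Y+."""
    theta_pred = torch.atan2(pred_v[boundary_mask], pred_h[boundary_mask])
    x = theta_pred - theta_gt[boundary_mask]          # intermediate variable x
    # wrap the error into (-pi, pi] so opposite-signed angles are compared fairly
    x = torch.atan2(torch.sin(x), torch.cos(x))
    return F.smooth_l1_loss(x, torch.zeros_like(x), reduction='sum')
```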
For example, the loss function may be:
$$\Gamma = w_o\, \Gamma_o + \sum_{i=1}^{n} w_e^{(i)}\, \Gamma_e^{(i)}$$

wherein Γ represents the loss function, w_o represents the weight of the loss value between the direction prediction result and the standard direction result, Γo represents the loss value between the direction prediction result and the standard direction result, n represents the number of boundary prediction maps, i represents the serial number of a boundary prediction map, w_e^(i) represents the weight of the loss value between the i-th boundary prediction map and the standard boundary map, and Γe^(i) represents the loss value between the i-th boundary prediction map and the standard boundary map.
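Combining the two terms into the overall loss is then a weighted sum; the sketch below only assumes that the per-map boundary losses and their weights are supplied as lists.

```python
def total_loss(boundary_losses, direction_loss_value, boundary_weights, w_o):
    """Weighted sum of the n boundary losses and the direction loss,
    matching the loss function above (argument names are illustrative)."""
    assert len(boundary_losses) == len(boundary_weights)
    loss = w_o * direction_loss_value
    for w_i, gamma_i in zip(boundary_weights, boundary_losses):
        loss = loss + w_i * gamma_i
    return loss
```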
S608: it is determined whether the loss function converges. If not, S609 may be executed; if so, S610 may be performed.
Whether the loss function has converged can be judged according to the loss value; if not, S609 may be executed; if so, S610 may be performed.
S609: parameters of the neural network are adjusted.
After adjusting the parameters of the neural network, the process may return to step S602.
S610: obtaining an occlusion relation description model.
The occlusion relation description model can be used to predict the boundaries between objects in an image and the front-back occlusion relationship between the objects, that is, which object occludes which.
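As a rough illustration of the S607-S609 loop, a training sketch might look as follows; model.loss, the optimizer, the structure of the data loader, and the loss-change convergence test are illustrative assumptions, since the embodiment only requires repeating parameter adjustment until the loss function converges.

```python
def train(model, optimizer, data_loader, max_epochs=50, tol=1e-4):
    """Sketch of the training loop: compute the combined loss, check a simple
    convergence criterion, and otherwise keep adjusting the network parameters."""
    prev_loss = None
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for sample, std_boundary, std_direction in data_loader:
            boundary_preds, direction_pred = model(sample)
            loss = model.loss(boundary_preds, direction_pred,
                              std_boundary, std_direction)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                       # adjust the parameters (S609)
            epoch_loss += loss.item()
        if prev_loss is not None and abs(prev_loss - epoch_loss) < tol:
            break                                  # treat the loss as converged (S608 -> S610)
        prev_loss = epoch_loss
    return model
```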
S611: acquiring an image to be determined; inputting the image to be determined into the occlusion relation description model to obtain a boundary prediction map and a direction prediction result; and determining the occlusion relationship of the object in the image to be determined based on the boundary prediction map and the direction prediction result.
For example, in a three-dimensional reconstruction scene, the image to be determined may be an image containing the object to be three-dimensionally reconstructed; in a robot obstacle-avoidance scene, the image to be determined may be an image acquired by an image acquisition device, such as a camera or video camera, carried by the robot; or the image to be determined may be an image from a public image data set, and so on. The image to be determined may be input into the occlusion relation description model obtained in S610 to obtain a boundary prediction map and a direction prediction result.
For example, the boundary prediction map and the direction prediction result may be fused by using NMS (non-maximum suppression) to obtain an oriented boundary map, and then the occlusion relationship of objects in the image to be determined may be determined from the oriented boundary map by using the left-hand rule. The embodiment of the invention does not limit the specific manner of fusing the boundary prediction map and the direction prediction result.
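The fusion and left-hand-rule step can be sketched as follows; the simple probability threshold standing in for NMS, the image-coordinate conventions, and the interpretation that the occluding side lies to the left of the predicted direction are assumptions, since the embodiment does not fix a specific fusion method.

```python
import numpy as np

def oriented_boundary(boundary_prob, theta, prob_thresh=0.5):
    """Sketch: turn a boundary probability map and a per-pixel direction
    prediction into an oriented boundary representation. A plain threshold
    stands in for the NMS step. Under the left-hand rule, walking along a
    boundary pixel in direction theta, the occluding (foreground) region is
    taken to lie on the left-hand side."""
    boundary = boundary_prob > prob_thresh
    ys, xs = np.nonzero(boundary)
    edges = []
    for y, x in zip(ys, xs):
        t = theta[y, x]
        direction = np.array([np.cos(t), np.sin(t)])      # unit vector along the boundary
        left_normal = np.array([-np.sin(t), np.cos(t)])   # points toward the assumed foreground side
        edges.append(((x, y), direction, left_normal))
    return edges
```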
By applying the embodiment of the invention, a boundary prediction map and a direction prediction result are determined by using the sequentially connected residual error module and feature processing module, and the occlusion relationship between objects in the image is then determined according to the boundary prediction map and the direction prediction result; by continuously adjusting the parameters of the neural network, the accuracy of the boundary prediction map and of the direction prediction result is improved, thereby improving the accuracy of the determined occlusion relationship between objects.
Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides an apparatus for determining an occlusion relationship, as shown in fig. 7, including:
a first obtaining module 701, configured to obtain an image to be determined;
a prediction module 702, configured to input the image to be determined to a pre-trained occlusion relation description model to obtain a boundary prediction graph and a direction prediction result;
a determining module 703, configured to determine an occlusion relationship of an object in the image to be determined based on the boundary prediction map and the direction prediction result.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a device for training an occlusion relation description model, as shown in fig. 8, including:
a second obtaining module 801, configured to obtain a training sample, a standard boundary diagram corresponding to the training sample, and a standard direction result;
a first input module 802, configured to input the training sample to a residual error module of a neural network, so as to obtain a feature map of the training sample; the neural network comprises a residual error module and a feature processing module which are connected in sequence;
a second input module 803, configured to input the feature map to the feature processing module, so as to obtain a fused feature map and a separated feature map;
the first convolution module 804 is configured to perform convolution on the fusion feature map to obtain a direction prediction result;
a second convolution module 805, configured to perform convolution on the separation feature map to obtain a boundary prediction map;
an updating module 806, configured to determine whether the neural network converges based on a first loss value and a second loss value, where the first loss value is: a loss value between the boundary prediction map and the standard boundary map, and the second loss value is: a loss value between the direction prediction result and the standard direction result; if not, adjusting parameters of the neural network, and returning to the step of inputting the training sample to the residual error module of the neural network until the neural network converges to obtain the occlusion relation description model.
In one embodiment, a feature enhancement module is further included between the residual error module and the feature processing module; the first input module 802 includes: a first input submodule, a second input submodule (not shown), wherein,
the first input submodule is used for inputting the training sample to a residual error module of a neural network to obtain an initial characteristic diagram of the training sample;
and the second input submodule is used for inputting the initial feature map into the feature enhancement module to obtain an enhanced feature map which is used as the feature map of the training sample.
In one embodiment, the first convolution module 804 includes: a first convolution sub-module, a first extraction sub-module, and a first fusion sub-module (not shown in the figure), wherein,
the first convolution submodule is used for performing convolution on the fusion feature map to obtain a first convolution feature map;
the first extraction submodule is used for extracting the features of the training sample to obtain a feature map to be fused;
and the first fusion submodule is used for fusing the first convolution feature map and the feature map to be fused to obtain a direction prediction result.
In one embodiment, the second convolution module 805 includes: a second convolution sub-module, a classification sub-module, a second extraction sub-module, a second fusion sub-module, and a determination sub-module (not shown in the figure), wherein,
the second convolution submodule is used for performing convolution on the separation feature map to obtain a second convolution feature map;
the classification submodule is used for up-sampling the second convolution feature map and performing classification processing on the up-sampled second convolution feature map to obtain a boundary probability map;
the second extraction submodule is used for extracting the features of the training sample to obtain a feature map to be fused;
the second fusion submodule is used for fusing the boundary probability graph and the feature graph to be fused to obtain a boundary fusion graph;
and the determining submodule is used for determining the boundary probability map and the boundary fusion map as a boundary prediction map.
In an embodiment, the update module 806 is specifically configured to:
calculating a loss value of the loss function based on the first loss value and the second loss value using a loss function; and judging whether the loss function is converged.
In an embodiment, the second obtaining module 801 is specifically configured to:
acquiring a sample image to be processed, a standard boundary diagram corresponding to the sample image and a standard direction result;
and segmenting the sample image to be processed to obtain a plurality of segmented images serving as training samples.
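A minimal sketch of this segmentation step follows, assuming non-overlapping square patches of an illustrative size; the patent does not fix a particular segmentation scheme.

```python
def split_into_patches(image, patch_size=320):
    """Cut a sample image (an H x W x C array) into non-overlapping patches
    that serve as training samples. `patch_size` is an illustrative choice."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches
```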
By applying the embodiment of the invention, a boundary prediction map and a direction prediction result are determined by using the sequentially connected residual error module and feature processing module, and the occlusion relationship between objects in the image is then determined according to the boundary prediction map and the direction prediction result; by continuously adjusting the parameters of the neural network, the accuracy of the boundary prediction map and of the direction prediction result is improved, thereby improving the accuracy of the determined occlusion relationship between objects.
An embodiment of the present invention further provides an electronic device, as shown in fig. 9, including a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with each other through the communication bus 904,
a memory 903 for storing computer programs;
the processor 901 is configured to implement any one of the above-described methods for determining an occlusion relationship when executing a program stored in the memory 903.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a further embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above-mentioned determination methods for occlusion relationship.
In a further embodiment, the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for determining an occlusion relationship according to any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described for simplicity as they are substantially similar to method embodiments, where relevant, reference may be made to some descriptions of method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A method for determining an occlusion relationship, comprising:
acquiring an image to be determined;
inputting the image to be determined into a pre-trained occlusion relation description model to obtain a boundary prediction map and a direction prediction result;
determining the occlusion relationship of the object in the image to be determined based on the boundary prediction map and the direction prediction result;
wherein the step of training the occlusion relation description model comprises:
acquiring a training sample, a standard boundary diagram corresponding to the training sample and a standard direction result;
inputting the training sample into a residual error module of a neural network to obtain a feature map of the training sample; the neural network comprises a residual error module and a feature processing module which are connected in sequence;
inputting the feature map into the feature processing module to obtain a fusion feature map and a separation feature map;
performing convolution on the fusion feature map to obtain a direction prediction result; and performing convolution on the separation feature map to obtain a boundary prediction map;
judging whether the neural network converges based on a first loss value and a second loss value, wherein the first loss value is: a loss value between the boundary prediction map and the standard boundary map, and the second loss value is: a loss value between the direction prediction result and the standard direction result;
if not, adjusting parameters of the neural network, and returning to the step of inputting the training sample to a residual error module of the neural network until the neural network converges to obtain an occlusion relation description model.
2. The method of claim 1, further comprising a feature enhancement module between the residual module and the feature processing module; inputting the training sample into a residual error module of a neural network to obtain a feature map of the training sample, wherein the feature map comprises:
inputting the training sample into a residual error module of a neural network to obtain an initial characteristic diagram of the training sample;
and inputting the initial feature map into the feature enhancement module to obtain an enhanced feature map serving as the feature map of the training sample.
3. The method according to claim 1, wherein the convolving the fused feature map to obtain the directional prediction result comprises:
performing convolution on the fusion feature map to obtain a first convolution feature map;
extracting the features of the training sample to obtain a feature map to be fused;
and fusing the first convolution feature map and the feature map to be fused to obtain a direction prediction result.
4. The method of claim 1, wherein the convolving the separated feature maps to obtain a boundary prediction map comprises:
performing convolution on the separation feature map to obtain a second convolution feature map;
up-sampling the second convolution feature map, and performing classification processing on the up-sampled second convolution feature map to obtain a boundary probability map;
extracting the features of the training sample to obtain a feature map to be fused;
fusing the boundary probability graph and the feature graph to be fused to obtain a boundary fusion graph;
and determining the boundary probability map and the boundary fusion map as a boundary prediction map.
5. The method of claim 1, wherein determining whether the neural network converges based on the first loss value and the second loss value comprises:
calculating a loss value of the loss function based on the first loss value and the second loss value using a loss function;
and judging whether the loss function is converged.
6. The method of claim 1, wherein the obtaining of the training samples and their corresponding standard boundary maps and standard orientation results comprises:
acquiring a sample image to be processed, a standard boundary diagram corresponding to the sample image and a standard direction result;
and segmenting the sample image to be processed to obtain a plurality of segmented images serving as training samples.
7. An apparatus for determining an occlusion relationship, comprising:
the first acquisition module is used for acquiring an image to be determined;
the prediction module is used for inputting the image to be determined into a pre-trained occlusion relation description model to obtain a boundary prediction map and a direction prediction result;
the determining module is used for determining the occlusion relationship of the object in the image to be determined based on the boundary prediction map and the direction prediction result;
the second acquisition module is used for acquiring the training sample, the standard boundary diagram corresponding to the training sample and the standard direction result;
the first input module is used for inputting the training sample to a residual error module of a neural network to obtain a feature map of the training sample; the neural network comprises a residual error module and a feature processing module which are connected in sequence;
the second input module is used for inputting the feature map into the feature processing module to obtain a fusion feature map and a separation feature map;
the first convolution module is used for performing convolution on the fusion characteristic graph to obtain a direction prediction result;
the second convolution module is used for performing convolution on the separation characteristic graph to obtain a boundary prediction graph;
an updating module, configured to determine whether the neural network converges based on a first loss value and a second loss value, where the first loss value is: a loss value between the boundary prediction map and the standard boundary map, and the second loss value is: a loss value between the direction prediction result and the standard direction result; if not, adjusting parameters of the neural network, and returning to the step of inputting the training sample to the residual error module of the neural network until the neural network converges to obtain the occlusion relation description model.
8. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN202110268994.5A 2021-03-12 2021-03-12 Method, device and equipment for determining occlusion relationship and storage medium Active CN112990305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110268994.5A CN112990305B (en) 2021-03-12 2021-03-12 Method, device and equipment for determining occlusion relationship and storage medium

Publications (2)

Publication Number Publication Date
CN112990305A true CN112990305A (en) 2021-06-18
CN112990305B CN112990305B (en) 2023-04-07

Family

ID=76335082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110268994.5A Active CN112990305B (en) 2021-03-12 2021-03-12 Method, device and equipment for determining occlusion relationship and storage medium

Country Status (1)

Country Link
CN (1) CN112990305B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764186A (en) * 2018-06-01 2018-11-06 合肥工业大学 Personage based on rotation deep learning blocks profile testing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIZHU YE ET AL.: "ADAPTIVE OCCLUSION BOUNDARY EXTRACTION FOR DEPTH INFERENCE", 《ICIP 2019》 *
PANHE FENG ET AL.: "DDNet: Dual-path Decoder Network for Occlusion Relationship Reasoning", 《ARXIV.ORG》 *
RUI LU ET AL.: "Occlusion-shared and Feature-separated Network for Occlusion Relationship Reasoning", 《2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881992A (en) * 2022-05-24 2022-08-09 北京安德医智科技有限公司 Skull fracture detection method and device and storage medium

Also Published As

Publication number Publication date
CN112990305B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
US20220230282A1 (en) Image processing method, image processing apparatus, electronic device and computer-readable storage medium
US9905015B2 (en) Systems and methods for non-obstacle area detection
CN111582021B (en) Text detection method and device in scene image and computer equipment
CN111402130B (en) Data processing method and data processing device
US11940803B2 (en) Method, apparatus and computer storage medium for training trajectory planning model
US9454851B2 (en) Efficient approach to estimate disparity map
CN112733820B (en) Obstacle information generation method and device, electronic equipment and computer readable medium
CN110222726A (en) Image processing method, device and electronic equipment
CN110084742B (en) Parallax map prediction method and device and electronic equipment
CN114970673A (en) Semi-supervised model training method, system and related equipment
CN114913325B (en) Semantic segmentation method, semantic segmentation device and computer program product
CN112990305B (en) Method, device and equipment for determining occlusion relationship and storage medium
CN112633066A (en) Aerial small target detection method, device, equipment and storage medium
CN110633595B (en) Target detection method and device by utilizing bilinear interpolation
CN116704203A (en) Target detection method, device, electronic equipment and computer readable storage medium
CN115830065A (en) Image-based speed determination method, device, equipment and storage medium
CN115620017A (en) Image feature extraction method, device, equipment and storage medium
CN114998668A (en) Feature extraction method and device, storage medium and electronic equipment
CN114240994A (en) Target tracking method and device, electronic equipment and storage medium
CN113902933A (en) Ground segmentation network model training method, device, equipment and medium
WO2020227933A1 (en) Six-degree-of-freedom attitude estimation method and apparatus, and computer-readable storage medium
CN111382696A (en) Method and apparatus for detecting boundary points of object
CN112330711B (en) Model generation method, information extraction device and electronic equipment
CN115049895B (en) Image attribute identification method, attribute identification model training method and device
CN112561778B (en) Image stylization processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant