CN109118539B - Method, device and equipment for fusing point cloud and picture based on multi-scale features - Google Patents

Method, device and equipment for fusing point cloud and picture based on multi-scale features

Info

Publication number
CN109118539B
Authority
CN
China
Prior art keywords
features
fusion
point cloud
convolution
convolution operation
Prior art date
Legal status
Active
Application number
CN201810779366.1A
Other languages
Chinese (zh)
Other versions
CN109118539A (en)
Inventor
徐楷
冯良炳
姚杰
严亮
Current Assignee
Shenzhen Cosmosvision Intelligent Technology Co ltd
Original Assignee
Shenzhen Cosmosvision Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Cosmosvision Intelligent Technology Co ltd
Priority to CN201810779366.1A
Publication of CN109118539A
Application granted
Publication of CN109118539B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention provides a method, a device and equipment for fusing a point cloud and a picture based on multi-scale features, wherein the method comprises the following steps: obtaining at least two groups of point cloud features and picture features through a feature extraction network, and performing a first convolution operation on the features; grouping the features according to abstraction degree and performing element-by-element averaging fusion on each group; performing a jump connection between the outputs of the first convolution operation and the feature maps obtained after the grouped element-by-element averaging fusion, and performing a linear fusion operation; performing a second convolution operation on the feature maps obtained after the jump connection and linear fusion; performing element-by-element averaging fusion on the four types of features obtained by the second convolution operation; and performing a third convolution operation on the new fusion features obtained after the averaging fusion, the result serving as the final output features. The method can accurately locate the target object and predict its direction, thereby improving the accuracy of target localization and direction prediction.

Description

Method, device and equipment for fusing point cloud and picture based on multi-scale features
Technical Field
The invention relates to the field of computer vision, in particular to a method, a device and equipment for fusing point cloud and pictures based on multi-scale features.
Background
At present, attention to the safety of automatic driving has made 3D target detection in the field of automatic driving a research hotspot. Compared with 2D target detection, 3D target detection requires depth information that 2D target detection does not, and therefore point cloud data containing depth information obtained by a radar sensor has become one of the data sources for 3D target detection. However, because point cloud data are often sparse and cannot convey rich texture information, detection algorithms relying on them alone do not achieve the expected effect. Compared with point cloud data, image data cannot represent depth information but do convey rich texture information. Under such circumstances, designing an algorithm that can use both point cloud data and image data for 3D target detection with good results has become a problem to be solved urgently.
However, existing point cloud and image fusion methods usually rely on simple operations such as linear addition or averaging. Such processing is too simple and allows no interaction between the data, so existing methods suffer from poor localization and low prediction accuracy in 3D target localization and direction prediction.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for fusing a point cloud and an image based on multi-scale features, which can accurately locate a target object and predict its direction, so as to improve the accuracy of target localization and direction prediction.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the invention provides a point cloud and picture fusion method based on multi-scale features, which comprises the following steps:
obtaining at least two groups of point cloud features and picture features through a feature extraction network, and performing a first convolution operation on the obtained point cloud features and picture features respectively through a convolution layer;
grouping the result features output after the point cloud features and the picture features are subjected to the first convolution operation according to the abstraction degree, and then performing element-by-element averaging fusion on the two types of features in each group to obtain two types of fused features;
performing a jump connection between the outputs of the first convolution operation on the point cloud features and the picture features and the feature maps obtained after the grouping and element-by-element averaging fusion operations, and performing a linear fusion operation;
performing a second convolution operation, through a convolution layer, on each of the feature maps obtained after the jump connection and linear fusion;
performing element-by-element averaging fusion on the features obtained by the second convolution operation to obtain new fusion features;
and performing a third convolution operation on the new fusion features, the result being used as the final output features of data fusion.
In some embodiments, the step of obtaining at least two groups of point cloud features and picture features through a feature extraction network, and performing a first convolution operation on the obtained point cloud features and picture features respectively through a convolution layer, further includes:
simultaneously controlling the number of feature maps output by the convolution layer, where the corresponding mathematical formulas are:
im1' = σ(w1_im1^T · im1 + b1_im1)
pc1' = σ(w1_pc1^T · pc1 + b1_pc1)
im2' = σ(w1_im2^T · im2 + b1_im2)
pc2' = σ(w1_pc2^T · pc2 + b1_pc2)
wherein im1', pc1', im2' and pc2' represent the output results of the different convolutional layers;
w1_im1^T, w1_pc1^T, w1_im2^T and w1_pc2^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
im1, pc1, im2 and pc2 represent the input features of the different convolutional layers;
b1_im1, b1_pc1, b1_im2 and b1_pc2 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
In some embodiments, the grouping of the result features output after the point cloud features and the picture features are subjected to the first convolution operation according to the abstraction degree, and the performing of element-by-element averaging fusion on the two types of features in each group to obtain two types of fused features, includes:
placing features with the same abstraction degree in the same group, so that the abstraction degrees differ between groups;
the corresponding mathematical formulas are:
impc1[b, h, w, i] = (im1'[b, h, w, i] + pc1'[b, h, w, i]) / 2
impc2[b, h, w, i] = (im2'[b, h, w, i] + pc2'[b, h, w, i]) / 2
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals;
im1', pc1', im2' and pc2' represent the output results of the different convolutional layers.
In some embodiments, the performing of a jump connection between the outputs of the first convolution operation on the point cloud features and the picture features and the feature maps obtained after the grouping and element-by-element averaging fusion operations, together with the linear fusion (concatenation) operation, corresponds to the following mathematical formulas, in which the concatenated features are denoted x1, x2, x3 and x4:
x1[b, h, w, i] = im1'[b, h, w, i]
x1[b, h, w, m + n] = impc2[b, h, w, n]
x2[b, h, w, i] = im2'[b, h, w, i]
x2[b, h, w, m + n] = impc1[b, h, w, n]
x3[b, h, w, i] = pc1'[b, h, w, i]
x3[b, h, w, m + n] = impc2[b, h, w, n]
x4[b, h, w, i] = pc2'[b, h, w, i]
x4[b, h, w, m + n] = impc1[b, h, w, n]
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals,
m, n and i are positive integers,
the ranges of b, h and w are the same across the different formulas, while the range of i differs;
x1, x2, x3 and x4 represent the concatenated features that serve as the input features of the subsequent convolutional layers;
im1', pc1', im2' and pc2' represent the output results of the different convolutional layers;
impc1 and impc2 represent the fused features obtained by element-by-element averaging fusion of the two types of features in each group.
In some embodiments, the performing of a second convolution operation, through a convolution layer, on each of the feature maps obtained after the jump connection and linear fusion further includes:
simultaneously controlling the number of feature maps output by the convolution layer, where the corresponding mathematical formulas are:
y1 = σ(w1^T · x1 + b1)
y2 = σ(w2^T · x2 + b2)
y3 = σ(w3^T · x3 + b3)
y4 = σ(w4^T · x4 + b4)
wherein y1, y2, y3 and y4 represent the output results of the different convolutional layers;
w1^T, w2^T, w3^T and w4^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
x1, x2, x3 and x4 represent the input features of the different convolutional layers;
b1, b2, b3 and b4 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
In some embodiments, the features obtained by the second convolution operation are subjected to element-by-element averaging fusion to obtain a new fused feature, where the corresponding formula is:
y5[b, h, w, i] = (y1[b, h, w, i] + y2[b, h, w, i] + y3[b, h, w, i] + y4[b, h, w, i]) / 4
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals;
y1, y2, y3 and y4 represent the output results of the different convolutional layers;
y5 represents the result of element-by-element averaging fusion of the features obtained by the second convolution operation, i.e., the input feature of the subsequent convolutional layer.
In some embodiments, the convolution kernel size of the convolutional layers is 1 × 1, the stride is 1, and the number of feature maps output by the convolutional layers is controlled to be 8.
In some embodiments, the method may further include: performing element-by-element averaging fusion on the features obtained by the second convolution operation to obtain new fusion features, performing a third convolution operation on the new fusion features, and taking the result as the final output features of the data fusion part, where the corresponding formula is:
y6 = σ(w6^T · y5 + b6)    (20)
wherein y6 represents the output of the convolutional layer;
w6^T represents the weight parameter of the convolutional layer; the weight parameter is obtained automatically through network learning;
y5 represents the input feature of the convolutional layer;
b6 represents the bias of the convolutional layer; the bias parameter is obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
The second aspect of the present invention further provides a device for fusing a point cloud and an image based on multi-scale features, which is applied to any one of the above methods for fusing a point cloud and an image based on multi-scale features, and the device includes:
the feature extraction module is used for obtaining point cloud features and picture features through a feature extraction network;
the first convolution module is used for performing first convolution operation on the point cloud characteristics and the image characteristics through a convolution layer respectively;
the grouping fusion module is used for grouping result characteristics output after the point cloud characteristics and the picture characteristics are subjected to the first convolution operation according to the abstraction degree, and then performing element-by-element averaging fusion on each group of two types of characteristics to obtain two types of fused characteristics;
the jump fusion module is used for performing jump connection on the point cloud characteristic and the image characteristic after the first convolution operation and the characteristic graph obtained after grouping and element-by-element averaging fusion operation, and performing linear fusion;
the linear fusion module is used for carrying out linear fusion operation on the feature graph after the jump connection;
the second convolution module is used for performing second convolution operation on the feature graphs obtained after jump connection and linear fusion respectively through convolution layers;
the average fusion module is used for carrying out element-by-element average fusion on the features obtained by the second convolution operation to obtain new fusion features;
and the third convolution module is used for performing element-by-element averaging fusion on the features obtained by the second convolution operation to obtain new fusion features, performing the third convolution operation on the new fusion features, and taking the result as the final output features of data fusion.
The third aspect of the present invention also provides a point cloud and picture fusion device based on multi-scale features, which includes a processor, a computer-readable storage medium, and a computer program stored on the computer-readable storage medium, wherein when the computer program is executed by the processor, the computer program implements the steps in the method according to any one of the above.
The method, device and equipment for fusing a point cloud and a picture based on multi-scale features provided by the embodiments of the invention can enhance the interaction between point cloud features and picture features while keeping the network features acquired from each single sensor independent; the method adopts a nonlinear fusion approach to enhance the expressiveness of the features; a flexible linear fusion mode is added within the nonlinear fusion framework, and quick jump connections improve the utilization of the features, so that the target object can be accurately located and its direction accurately predicted, thereby improving the accuracy of target localization and direction prediction.
Drawings
FIG. 1 is a visualization model diagram of a point cloud and picture fusion algorithm based on multi-scale features according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for fusing a point cloud and an image based on multi-scale features according to an embodiment of the present invention;
fig. 3 is a block diagram of a point cloud and picture fusion apparatus based on multi-scale features according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the advantageous effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art that point cloud and picture fusion methods are often processed by simple linear addition or averaging, that such processing allows no interaction between data, and that the localization effect and prediction accuracy in 3D target localization and direction prediction are therefore poor, the invention provides a method, a device and equipment for fusing a point cloud and a picture based on multi-scale features, which can accurately locate a target object and predict its direction, so as to improve the accuracy of target localization and direction prediction and further improve the safety of applying the technology in related fields.
Definitions and explanation of terms:
The convolutional layers mentioned in the embodiments of the present invention are 2D convolutions; each encapsulates a 2D convolutional layer and a ReLU activation layer in TensorFlow.
The initial parameters of the convolutional layers are drawn from a Gaussian distribution with mean 0 and variance 1.
The number of feature maps output by the convolutional layers takes into account the ratio of point cloud feature maps to picture feature maps in the fused data, and allows the ratio between the fused data and the single-sensor network feature data used for re-fusion to be controlled effectively. In this embodiment, the single sensor captures an image from which network feature data are extracted.
It should be noted that the point cloud features and the picture features are processed in the same way and both require abstraction. pc denotes point cloud features and im denotes picture features; different numbers denote different degrees of abstraction, and the same number denotes the same degree of abstraction. The degree of abstraction mainly refers to the number of convolution layers passed through: different degrees of abstraction mean different numbers of convolution layers, and the same degree of abstraction means the same number of convolution layers. For example, im1 has the same degree of abstraction as pc1; im2 has the same degree of abstraction as pc2; im1 and im2 have different degrees of abstraction; pc1 and pc2 have different degrees of abstraction.
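For illustration only, such a convolutional layer can be sketched in TensorFlow as follows; the helper name fuse_conv, its arguments and the use of the tf.keras API are assumptions of this sketch rather than the implementation of the original disclosure:

```python
import tensorflow as tf

def fuse_conv(x, filters=8, name=None):
    """1 x 1 convolution followed by ReLU, as described above.

    Weights are initialized from a Gaussian with mean 0 and variance 1.
    A fresh layer is created on each call; a real model would build the
    layers once and reuse them.
    """
    layer = tf.keras.layers.Conv2D(
        filters=filters,      # number of output feature maps (8 in the text)
        kernel_size=1,        # 1 x 1 convolution kernel
        strides=1,            # stride 1
        padding="same",
        activation="relu",    # sigma(x) = max{0, x}
        kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=1.0),
        name=name,
    )
    return layer(x)
```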
The first embodiment is as follows:
The invention provides a method for fusing a point cloud and a picture based on multi-scale features. Please refer to FIG. 1, which is a visualization model diagram of the multi-scale-feature-based point cloud and picture fusion algorithm provided by an embodiment of the invention, and to FIG. 2; the method specifically comprises the following steps:
S1: at least two groups of point cloud features and picture features are obtained through the feature extraction network, the obtained point cloud features and picture features are each subjected to a first convolution operation through a convolution layer, and the number of feature maps output by the convolution layer is controlled.
Specifically, point cloud features and picture features with different degrees of abstraction are obtained through the feature extraction network; in this embodiment they are im1, im2, pc1 and pc2 (pc denotes point cloud features and im denotes picture features; the same number indicates the same degree of abstraction of the two types of features, and different numbers indicate different degrees of abstraction). The two groups of point cloud features and picture features are each subjected to a convolution operation through a convolution layer to obtain four types of new features im1', im2', pc1' and pc2', and the number of feature maps output by the convolution layer is controlled at the same time; the corresponding mathematical formulas are:
im1' = σ(w1_im1^T · im1 + b1_im1)
pc1' = σ(w1_pc1^T · pc1 + b1_pc1)
im2' = σ(w1_im2^T · im2 + b1_im2)
pc2' = σ(w1_pc2^T · pc2 + b1_pc2)
wherein im1', pc1', im2' and pc2' represent the output results of the different convolutional layers;
w1_im1^T, w1_pc1^T, w1_im2^T and w1_pc2^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
im1, pc1, im2 and pc2 represent the input features of the different convolutional layers;
b1_im1, b1_pc1, b1_im2 and b1_pc2 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
The convolution kernels of the convolutional layers are all 1 × 1 in size, the strides are all 1, and the number of feature maps output by each convolutional layer is 8.
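A minimal sketch of step S1 using the illustrative fuse_conv helper above; the dummy input tensors and their shapes are assumptions that only stand in for the real extraction-network outputs:

```python
# Dummy stand-ins for the extraction-network outputs (shapes are illustrative).
im1 = tf.random.normal([2, 32, 32, 16])   # picture features, abstraction level 1
pc1 = tf.random.normal([2, 32, 32, 16])   # point cloud features, abstraction level 1
im2 = tf.random.normal([2, 32, 32, 32])   # picture features, abstraction level 2
pc2 = tf.random.normal([2, 32, 32, 32])   # point cloud features, abstraction level 2

# S1: first 1 x 1 convolution on each input, 8 output feature maps each.
im1_c = fuse_conv(im1, filters=8, name="conv1_im1")   # im1'
pc1_c = fuse_conv(pc1, filters=8, name="conv1_pc1")   # pc1'
im2_c = fuse_conv(im2, filters=8, name="conv1_im2")   # im2'
pc2_c = fuse_conv(pc2, filters=8, name="conv1_pc2")   # pc2'
```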
S2: the result features output after the point cloud features and the picture features are subjected to the first convolution operation are grouped according to the degree of abstraction, and element-by-element averaging fusion is then performed on the two types of features in each group to obtain two types of fused features.
Specifically, the point cloud features and the picture features from S1 are grouped according to their degree of abstraction after the first convolution operation; that is, features with the same degree of abstraction are placed in the same group, and the degrees of abstraction differ between groups. Element-by-element averaging fusion is then carried out on the two types of features in each group to obtain two types of fused features, namely impc1 and impc2, which correspond to the two left-hand fused features in FIG. 1.
The corresponding mathematical formulas are:
impc1[b, h, w, i] = (im1'[b, h, w, i] + pc1'[b, h, w, i]) / 2
impc2[b, h, w, i] = (im2'[b, h, w, i] + pc2'[b, h, w, i]) / 2
where b, h, w and i are non-negative integers representing tensor subscript ordinals.
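Continuing the sketch above, step S2 reduces to an element-wise mean within each abstraction-level group (variable names follow the previous sketch and are illustrative):

```python
# S2: element-wise mean fusion within each abstraction-level group.
impc1 = (im1_c + pc1_c) / 2.0   # fuses the level-1 picture and point cloud features
impc2 = (im2_c + pc2_c) / 2.0   # fuses the level-2 picture and point cloud features
```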
S3: a jump connection is performed between the outputs of the first convolution operation on the point cloud features and the picture features and the feature maps obtained after the grouping and element-by-element averaging fusion operations, and a linear fusion operation is performed.
Specifically, im1' and impc2, im2' and impc1, pc1' and impc2, and pc2' and impc1 are connected by jump connections and subjected to linear fusion, i.e., a concatenation operation, corresponding to the concatenation operation in FIG. 1; the concatenated features are denoted here as x1, x2, x3 and x4.
The corresponding mathematical formulas are:
x1[b, h, w, i] = im1'[b, h, w, i]
x1[b, h, w, m + n] = impc2[b, h, w, n]
x2[b, h, w, i] = im2'[b, h, w, i]
x2[b, h, w, m + n] = impc1[b, h, w, n]
x3[b, h, w, i] = pc1'[b, h, w, i]
x3[b, h, w, m + n] = impc2[b, h, w, n]
x4[b, h, w, i] = pc2'[b, h, w, i]
x4[b, h, w, m + n] = impc1[b, h, w, n]
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals;
b corresponds to a hyperparameter value set during network training (an integer set according to the actual situation);
h and w correspond to the length and width of the feature map, respectively, and can be set to appropriate integer values according to the actual situation;
i corresponds to the number of feature maps and can be set to an appropriate integer value according to the actual situation;
b, h, w and i have no explicit range limits; once the network structure is designed, their values are determined.
m, n and i are positive integers; the ranges of b, h and w are the same across the different formulas, while the range of i differs.
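Under the same assumptions, step S3 of the sketch pairs each convolved feature with the fused feature of the other abstraction level and concatenates along the channel axis:

```python
# S3: jump connection + linear fusion (channel-wise concatenation).
x1 = tf.concat([im1_c, impc2], axis=-1)   # im1' with impc2
x2 = tf.concat([im2_c, impc1], axis=-1)   # im2' with impc1
x3 = tf.concat([pc1_c, impc2], axis=-1)   # pc1' with impc2
x4 = tf.concat([pc2_c, impc1], axis=-1)   # pc2' with impc1
```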
S4: the feature maps obtained after the jump connection and linear fusion are each passed through a convolution layer for a second convolution operation.
Specifically, the feature maps obtained in S3 are respectively subjected to a second convolution operation by a convolution layer to obtain four new types of features, and the number of feature maps output by the convolution layer is controlled, where the corresponding mathematical formula is:
y1 = σ(w1^T · x1 + b1)
y2 = σ(w2^T · x2 + b2)
y3 = σ(w3^T · x3 + b3)
y4 = σ(w4^T · x4 + b4)
wherein y1, y2, y3 and y4 represent the output results of the different convolutional layers;
w1^T, w2^T, w3^T and w4^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
x1, x2, x3 and x4 represent the input features of the different convolutional layers;
b1, b2, b3 and b4 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
In this embodiment, the convolution kernels of the convolutional layers are all 1 × 1 in size, the strides are all 1, and the number of feature maps output by each convolutional layer is controlled to be 8.
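Step S4, continued in the same sketch, applies one more 1 × 1 convolution per concatenated branch (layer names are illustrative):

```python
# S4: second 1 x 1 convolution on each concatenated branch.
y1 = fuse_conv(x1, filters=8, name="conv2_1")
y2 = fuse_conv(x2, filters=8, name="conv2_2")
y3 = fuse_conv(x3, filters=8, name="conv2_3")
y4 = fuse_conv(x4, filters=8, name="conv2_4")
```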
S5: element-by-element averaging fusion is performed on the four types of features obtained by the second convolution operation to obtain a new fusion feature.
Specifically, the four types of features obtained in S4 are subjected to element-by-element averaging fusion to obtain a new fusion feature y5, which corresponds to the rightmost feature in FIG. 1.
The corresponding mathematical formula is:
y5[b, h, w, i] = (y1[b, h, w, i] + y2[b, h, w, i] + y3[b, h, w, i] + y4[b, h, w, i]) / 4
where b, h, w and i are non-negative integers representing tensor subscript ordinals;
y1, y2, y3 and y4 represent the output results of the different convolutional layers;
y5 represents the result obtained by element-by-element averaging fusion of the features obtained by the second convolution operation.
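Step S5 in the sketch is a plain element-wise mean over the four branches:

```python
# S5: element-wise average of the four second-convolution outputs.
y5 = (y1 + y2 + y3 + y4) / 4.0
# Equivalently: y5 = tf.reduce_mean(tf.stack([y1, y2, y3, y4], axis=0), axis=0)
```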
S6: a third convolution operation is performed on the new fusion feature obtained by the element-by-element averaging fusion of the four types of features from the second convolution operation, and the result is used as the final output feature of data fusion.
Specifically, the new fusion feature obtained in S5 is passed through a convolution layer for the third convolution operation, and the result is used as the final output feature of the data fusion part; the corresponding formula is:
y6 = σ(w6^T · y5 + b6)    (20)
wherein y6 represents the output of the convolutional layer;
w6^T represents the weight parameter of the convolutional layer; the weight parameter is obtained automatically through network learning;
y5 represents the input feature of the convolutional layer;
b6 represents the bias of the convolutional layer; the bias parameter is obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
The convolution kernels of the convolutional layers are all 1 × 1 in size, the strides are all 1, and the number of feature maps output by each convolutional layer is 8.
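Step S6 closes the sketch with a final 1 × 1 convolution whose output is the fused feature handed to the downstream detection network:

```python
# S6: third 1 x 1 convolution; y6 is the final output of the fusion block.
y6 = fuse_conv(y5, filters=8, name="conv3")
```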
The model of the above steps S1-S6 is expressed by two formulas (given as images in the original document) that combine the layer-L and layer-(L-k) picture and point cloud features through the operators C and M, with the following notation:
Note: f[L+1] represents the features of the (L+1)-th network layer, with different subscripts representing different input feature sources (the input feature source is the content within the parentheses immediately following f);
f[L](im2) represents the L-th-layer network features of the picture, with input source im2;
f[L](pc2) represents the L-th-layer network features of the point cloud, with input source pc2;
the operator is either C (concatenate, the linear fusion operation) or M (element-wise mean);
L represents the number of convolution layers;
k is a positive integer less than L.
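Putting the pieces together, the whole fusion block of steps S1-S6 can be sketched as a single function; it reuses the illustrative fuse_conv helper defined earlier, and all names, shapes and the choice of the tf.keras API are assumptions rather than the patented implementation:

```python
def multiscale_fusion(im1, pc1, im2, pc2, filters=8):
    """Sketch of the fusion block (steps S1-S6) for feature maps shaped
    [batch, height, width, channels]; `filters` follows the text (8)."""
    # S1: first 1 x 1 convolution on each input feature map.
    im1_c = fuse_conv(im1, filters, name="conv1_im1")
    pc1_c = fuse_conv(pc1, filters, name="conv1_pc1")
    im2_c = fuse_conv(im2, filters, name="conv1_im2")
    pc2_c = fuse_conv(pc2, filters, name="conv1_pc2")

    # S2: element-wise mean fusion within each abstraction level.
    impc1 = (im1_c + pc1_c) / 2.0
    impc2 = (im2_c + pc2_c) / 2.0

    # S3: jump connection + channel-wise concatenation across levels.
    x1 = tf.concat([im1_c, impc2], axis=-1)
    x2 = tf.concat([im2_c, impc1], axis=-1)
    x3 = tf.concat([pc1_c, impc2], axis=-1)
    x4 = tf.concat([pc2_c, impc1], axis=-1)

    # S4: second 1 x 1 convolution on each branch.
    y1 = fuse_conv(x1, filters, name="conv2_1")
    y2 = fuse_conv(x2, filters, name="conv2_2")
    y3 = fuse_conv(x3, filters, name="conv2_3")
    y4 = fuse_conv(x4, filters, name="conv2_4")

    # S5: element-wise mean of the four branches.
    y5 = (y1 + y2 + y3 + y4) / 4.0

    # S6: third 1 x 1 convolution -> final fused output feature.
    return fuse_conv(y5, filters, name="conv3")
```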
The method provided by the embodiment of the invention improves on the problems that existing fusion algorithms are too simple and allow no interaction between data: it strengthens the interaction between point cloud and picture data while preserving the independence of the data, adopts a more expressive nonlinear fusion mode, uses a linear fusion mode for flexible feature concatenation, and integrates features through small-scale convolution kernels. Tests show that the multi-scale-feature-based point cloud and picture fusion method provided by the invention achieves better results than existing fusion methods in terms of 3D target localization accuracy and direction prediction accuracy.
The point cloud and picture fusion method based on multi-scale features provided by the embodiment of the invention can enhance the interaction between point cloud features and picture features while keeping the network features acquired from each single sensor independent; the method adopts a nonlinear fusion approach to enhance the expressiveness of the features; a flexible linear fusion mode is added within the nonlinear fusion framework, and quick jump connections improve the utilization of the features, so that the target object can be accurately located and its direction accurately predicted, thereby improving the accuracy of target localization and direction prediction.
Example two
The embodiment of the present invention further provides a device for fusing a point cloud and an image based on multi-scale features, please refer to fig. 3, wherein the device includes the following modules:
the system comprises a feature extraction module 10, a first convolution module 20, a grouping fusion module 30, a jump connection module 40, a linear fusion module 50, a second convolution module 60, an average fusion module 70 and a third convolution module 80.
The feature extraction module 10 is configured to obtain point cloud features and image features by extracting a feature network.
The first convolution module 20 is configured to perform a first convolution operation on the obtained point cloud features and the obtained image features through a convolution layer respectively, and control the number of feature maps output by the convolution layer.
Specifically, at least two groups of point cloud features and picture features are obtained through the feature extraction module 10; the obtained point cloud features and picture features (with different degrees of abstraction for the two types of features) are then each convolved by the first convolution module 20, and the number of feature maps output by the convolution layer is controlled. The point cloud features and picture features are im1, im2, pc1 and pc2 respectively (pc denotes point cloud features and im denotes picture features; the same number indicates the same degree of abstraction of the two types of features, and different numbers indicate different degrees of abstraction). The two groups of point cloud features and picture features are each subjected to a convolution operation through a convolution layer to obtain four types of new features im1', im2', pc1' and pc2', and the number of feature maps output by the convolution layer is controlled at the same time; the corresponding mathematical formulas are:
im1' = σ(w1_im1^T · im1 + b1_im1)
pc1' = σ(w1_pc1^T · pc1 + b1_pc1)
im2' = σ(w1_im2^T · im2 + b1_im2)
pc2' = σ(w1_pc2^T · pc2 + b1_pc2)
wherein im1', pc1', im2' and pc2' represent the output results of the different convolutional layers;
w1_im1^T, w1_pc1^T, w1_im2^T and w1_pc2^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
im1, pc1, im2 and pc2 represent the input features of the different convolutional layers;
b1_im1, b1_pc1, b1_im2 and b1_pc2 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
The convolution kernels of the convolutional layers are all 1 × 1 in size, the strides are all 1, and the number of feature maps output by each convolutional layer is 8.
The grouping and fusing module 30 is configured to group result features output after the point cloud features and the picture features are subjected to the first convolution operation according to the abstraction degree, and perform element-by-element averaging and fusing on each group of two types of features respectively to obtain two types of fused features.
Specifically, the result features output after the point cloud features and the picture features are subjected to the first convolution operation are grouped according to the degree of abstraction; that is, features with the same degree of abstraction are placed in the same group, and the degrees of abstraction differ between groups. Element-by-element averaging fusion is then carried out on the two types of features in each group to obtain two types of fused features, namely impc1 and impc2, which correspond to the two left-hand fused features in FIG. 1.
The corresponding mathematical formulas are:
impc1[b, h, w, i] = (im1'[b, h, w, i] + pc1'[b, h, w, i]) / 2
impc2[b, h, w, i] = (im2'[b, h, w, i] + pc2'[b, h, w, i]) / 2
where b, h, w and i are non-negative integers representing tensor subscript ordinals.
And the jump fusion module 40 is used for performing jump connection on the output result of the point cloud characteristic and the image characteristic after the first convolution operation and the characteristic graph obtained after grouping and element-by-element averaging fusion operation, and performing linear fusion.
Specifically, im1' and impc2, im2' and impc1, pc1' and impc2, and pc2' and impc1 are connected by jump connections and subjected to linear fusion, i.e., a concatenation operation, corresponding to the concatenation operation in FIG. 1; the concatenated features are denoted here as x1, x2, x3 and x4.
The corresponding mathematical formulas are:
x1[b, h, w, i] = im1'[b, h, w, i]
x1[b, h, w, m + n] = impc2[b, h, w, n]
x2[b, h, w, i] = im2'[b, h, w, i]
x2[b, h, w, m + n] = impc1[b, h, w, n]
x3[b, h, w, i] = pc1'[b, h, w, i]
x3[b, h, w, m + n] = impc2[b, h, w, n]
x4[b, h, w, i] = pc2'[b, h, w, i]
x4[b, h, w, m + n] = impc1[b, h, w, n]
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals;
b corresponds to a hyperparameter value set during network training (an integer set according to the actual situation);
h and w correspond to the length and width of the feature map, respectively, and can be set to appropriate integer values according to the actual situation;
i corresponds to the number of feature maps and can be set to an appropriate integer value according to the actual situation;
b, h, w and i have no explicit range limits; once the network structure is designed, their values are determined.
m, n and i are positive integers; the ranges of b, h and w are the same across the different formulas, while the range of i differs.
And the linear fusion module 50 is configured to perform a linear fusion operation on the feature map after the jump connection.
And the second convolution module 60 is configured to perform a second convolution operation on the feature maps obtained by performing jump connection and linear fusion respectively through convolution layers.
Specifically, the feature maps obtained in S3 are respectively subjected to a second convolution operation by a convolution layer to obtain four new types of features, and the number of feature maps output by the convolution layer is controlled, where the corresponding mathematical formula is:
y1 = σ(w1^T · x1 + b1)
y2 = σ(w2^T · x2 + b2)
y3 = σ(w3^T · x3 + b3)
y4 = σ(w4^T · x4 + b4)
wherein y1, y2, y3 and y4 represent the output results of the different convolutional layers;
w1^T, w2^T, w3^T and w4^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
x1, x2, x3 and x4 represent the input features of the different convolutional layers;
b1, b2, b3 and b4 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
In this embodiment, the convolution kernels of the convolutional layers are all 1 × 1 in size, the strides are all 1, and the number of feature maps output by each convolutional layer is controlled to be 8.
The average fusion module 70 is configured to perform element-by-element average fusion on the four types of features obtained through the second convolution operation to obtain new fusion features.
Specifically, the four types of features obtained by the second convolution operation are subjected to element-by-element averaging fusion to obtain a new fusion feature y5, which corresponds to the rightmost feature in FIG. 1.
The corresponding mathematical formula is:
y5[b, h, w, i] = (y1[b, h, w, i] + y2[b, h, w, i] + y3[b, h, w, i] + y4[b, h, w, i]) / 4
where b, h, w and i are non-negative integers representing tensor subscript ordinals.
The third convolution module 80 is used for performing a third convolution operation on the new fusion feature obtained by element-by-element averaging fusion of the four types of features from the second convolution operation, the result being used as the final output feature of data fusion.
Specifically, the feature obtained in S5 is subjected to a convolution layer for the third convolution operation, and is used as the final output feature of the data fusion part, and the corresponding mathematical formula is:
y6 = σ(w6^T · y5 + b6)    (20)
wherein y6 represents the output of the convolutional layer;
w6^T represents the weight parameter of the convolutional layer; the weight parameter is obtained automatically through network learning;
y5 represents the input feature of the convolutional layer;
b6 represents the bias of the convolutional layer; the bias parameter is obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
The convolution kernels of the convolutional layers are all 1 × 1 in size, the strides are all 1, and the number of feature maps output by each convolutional layer is 8.
The point cloud and picture fusion device based on multi-scale features provided by the embodiment of the invention can enhance the interaction between point cloud features and picture features while keeping the network features acquired from each single sensor independent; in the embodiment of the invention, the first convolution module 20, the second convolution module 60, the third convolution module 80 and the jump connection module 40 enhance the expressiveness of the features; by adding a flexible linear fusion mode through the linear fusion module 50 within the nonlinear fusion framework and by using the quick jump connections of the jump connection module 40, the utilization of the features can be effectively improved, so that the target object can be accurately located and its direction predicted, thereby improving the accuracy of target localization and direction prediction.
Example three:
according to an embodiment of the present invention, the device includes a processor, a computer-readable storage medium, and a computer program stored on the computer-readable storage medium, where the computer program, when executed by the processor, implements the steps in the above method for fusing a point cloud and an image based on multi-scale features, and the specific steps are as described in the first embodiment, and are not described herein again.
The memory in the present embodiment may be used to store software programs as well as various data. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the mobile phone, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
According to an example of this embodiment, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program to instruct related hardware, where the program may be stored in a computer-readable storage medium, and in this embodiment of the present invention, the program may be stored in the storage medium of a computer system and executed by at least one processor in the computer system, so as to implement the processes including the embodiments of the methods described above. The storage medium includes, but is not limited to, a magnetic disk, a flash disk, an optical disk, a Read-Only Memory (ROM), and the like.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and are not to be construed as limiting the scope of the invention. Those skilled in the art can implement the invention with various modifications, for example by using features from one embodiment in another embodiment to yield yet a further embodiment, without departing from the scope and spirit of the invention. Any modification, equivalent replacement or improvement made within the technical idea of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A point cloud and picture fusion method based on multi-scale features is characterized by comprising the following steps:
obtaining at least two groups of point cloud features and picture features through a feature extraction network, and performing a first convolution operation on the obtained point cloud features and the obtained picture features through a convolution layer respectively;
grouping result features output after the point cloud features and the picture features are subjected to the first convolution operation according to the abstraction degree, and then respectively carrying out element-by-element averaging fusion on each group of two types of features to obtain two types of fused features;
performing one-time jump connection on the output results of the point cloud characteristics and the image characteristics after the first convolution operation and the characteristic graphs obtained after grouping and element-by-element averaging fusion operation, and performing linear fusion operation;
respectively carrying out a second convolution operation on the feature graphs obtained after the jump connection and the linear fusion through a convolution layer;
performing element-by-element averaging fusion on the features obtained by the second convolution operation to obtain new fusion features;
performing element-by-element averaging fusion on the features obtained by the second convolution operation to obtain new fusion features, performing a third convolution operation on the new fusion features, and taking the result as the final output features of data fusion;
wherein the grouping of the result features output after the point cloud features and the picture features are subjected to the first convolution operation according to the abstraction degree, and the performing of element-by-element averaging fusion on the two types of features in each group to obtain two types of fused features, comprises:
placing features with the same abstraction degree in the same group, so that the abstraction degrees differ between groups;
the corresponding mathematical formulas are:
impc1[b, h, w, i] = (im1'[b, h, w, i] + pc1'[b, h, w, i]) / 2
impc2[b, h, w, i] = (im2'[b, h, w, i] + pc2'[b, h, w, i]) / 2
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals;
im1', pc1', im2' and pc2' represent the output results of the different convolutional layers.
2. The method of claim 1, wherein the step of obtaining at least two groups of point cloud features and picture features through a feature extraction network, and performing a first convolution operation on the obtained point cloud features and picture features respectively through a convolution layer, further comprises:
simultaneously controlling the number of feature maps output by the convolution layer, where the corresponding mathematical formulas are:
im1' = σ(w1_im1^T · im1 + b1_im1)
pc1' = σ(w1_pc1^T · pc1 + b1_pc1)
im2' = σ(w1_im2^T · im2 + b1_im2)
pc2' = σ(w1_pc2^T · pc2 + b1_pc2)
wherein im1', pc1', im2' and pc2' represent the output results of the different convolutional layers;
w1_im1^T, w1_pc1^T, w1_im2^T and w1_pc2^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
im1, pc1, im2 and pc2 represent the input features of the different convolutional layers;
b1_im1, b1_pc1, b1_im2 and b1_pc2 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
3. The method for fusing a point cloud and a picture based on multi-scale features of claim 1, wherein the performing of a jump connection between the outputs of the first convolution operation on the point cloud features and the picture features and the feature maps obtained after the grouping and element-by-element averaging fusion operations, together with the linear fusion (concatenation) operation, corresponds to the following mathematical formulas, in which the concatenated features are denoted x1, x2, x3 and x4:
x1[b, h, w, i] = im1'[b, h, w, i]
x1[b, h, w, m + n] = impc2[b, h, w, n]
x2[b, h, w, i] = im2'[b, h, w, i]
x2[b, h, w, m + n] = impc1[b, h, w, n]
x3[b, h, w, i] = pc1'[b, h, w, i]
x3[b, h, w, m + n] = impc2[b, h, w, n]
x4[b, h, w, i] = pc2'[b, h, w, i]
x4[b, h, w, m + n] = impc1[b, h, w, n]
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals,
m, n and i are positive integers,
the ranges of b, h and w are the same across the different formulas, while the range of i differs;
x1, x2, x3 and x4 represent the concatenated features that serve as the input features of the subsequent convolutional layers;
im1', pc1', im2' and pc2' represent the output results of the different convolutional layers;
impc1 and impc2 represent the fused features obtained by element-by-element averaging fusion of the two types of features in each group.
4. The method for fusing a point cloud and a picture based on multi-scale features of claim 1, wherein performing the second convolution operation, through a convolution layer, on each of the feature maps obtained after the jump connection and linear fusion further comprises:
simultaneously controlling the number of feature maps output by the convolution layer, where the corresponding mathematical formulas are:
y1 = σ(w1^T · x1 + b1)
y2 = σ(w2^T · x2 + b2)
y3 = σ(w3^T · x3 + b3)
y4 = σ(w4^T · x4 + b4)
wherein y1, y2, y3 and y4 represent the output results of the different convolutional layers;
w1^T, w2^T, w3^T and w4^T represent the weight parameters of the different convolutional layers; the weight parameters are obtained automatically through network learning;
x1, x2, x3 and x4 represent the input features of the different convolutional layers;
b1, b2, b3 and b4 represent the biases of the different convolutional layers; the bias parameters are obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
5. The method for fusing a point cloud and a picture based on multi-scale features of claim 1, wherein the features obtained by the second convolution operation are subjected to element-by-element averaging fusion to obtain a new fused feature, where the corresponding formula is:
y5[b, h, w, i] = (y1[b, h, w, i] + y2[b, h, w, i] + y3[b, h, w, i] + y4[b, h, w, i]) / 4
wherein b, h, w and i are non-negative integers representing tensor subscript ordinals;
y1, y2, y3 and y4 represent the output results of the different convolutional layers;
y5 represents the result of element-by-element averaging fusion of the features obtained by the second convolution operation, i.e., the input feature of the subsequent convolutional layer.
6. The method of fusing point cloud and picture based on multi-scale features of claim 4 or 5, wherein the convolution kernel size is 1 x 1, the step size is 1, and the number of feature maps output by the convolution layer is controlled to be 8.
7. The method for fusing a point cloud and a picture based on multi-scale features of claim 1, wherein the features obtained by the second convolution operation are subjected to element-by-element averaging fusion to obtain new fusion features, the new fusion features are subjected to a third convolution operation, and the result is used as the final output features of the data fusion part, where the corresponding formula is:
y6 = σ(w6^T · y5 + b6)    (20)
wherein y6 represents the output of the convolutional layer;
w6^T represents the weight parameter of the convolutional layer; the weight parameter is obtained automatically through network learning;
y5 represents the input feature of the convolutional layer;
b6 represents the bias of the convolutional layer; the bias parameter is obtained automatically through network learning;
σ represents the activation function, corresponding to max{0, x}.
8. A multi-scale feature-based point cloud and picture fusion device applied to the multi-scale feature-based point cloud and picture fusion method of any one of claims 1 to 7, the device comprising:
the feature extraction module is used for obtaining point cloud features and picture features through a feature extraction network;
the first convolution module is used for performing first convolution operation on the point cloud characteristics and the image characteristics through a convolution layer respectively;
the grouping fusion module is used for grouping result characteristics output after the point cloud characteristics and the picture characteristics are subjected to the first convolution operation according to the abstraction degree, and then performing element-by-element averaging fusion on each group of two types of characteristics to obtain two types of fused characteristics;
the jump fusion module is used for performing jump connection on the point cloud characteristic and the image characteristic after the first convolution operation and the characteristic graph obtained after grouping and element-by-element averaging fusion operation, and performing linear fusion;
the linear fusion module is used for carrying out linear fusion operation on the feature graph after the jump connection;
the second convolution module is used for performing second convolution operation on the feature graphs obtained after jump connection and linear fusion respectively through convolution layers;
the average fusion module is used for carrying out element-by-element average fusion on the features obtained by the second convolution operation to obtain new fusion features;
and the third convolution module is used for performing element-by-element averaging fusion on the features obtained by the second convolution operation to obtain new fusion features, performing the third convolution operation on the new fusion features, and taking the result as the final output features of data fusion.
9. A multi-scale feature based point cloud and picture fusion device, comprising a processor, a computer readable storage medium, and a computer program stored on the computer readable storage medium, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 7.
CN201810779366.1A 2018-07-16 2018-07-16 Method, device and equipment for fusing point cloud and picture based on multi-scale features Active CN109118539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810779366.1A CN109118539B (en) 2018-07-16 2018-07-16 Method, device and equipment for fusing point cloud and picture based on multi-scale features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810779366.1A CN109118539B (en) 2018-07-16 2018-07-16 Method, device and equipment for fusing point cloud and picture based on multi-scale features

Publications (2)

Publication Number Publication Date
CN109118539A CN109118539A (en) 2019-01-01
CN109118539B true CN109118539B (en) 2020-10-09

Family

ID=64862857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810779366.1A Active CN109118539B (en) 2018-07-16 2018-07-16 Method, device and equipment for fusing point cloud and picture based on multi-scale features

Country Status (1)

Country Link
CN (1) CN109118539B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378398B (en) * 2019-06-27 2023-08-25 东南大学 Deep learning network improvement method based on multi-scale feature map jump fusion

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268935A (en) * 2014-09-18 2015-01-07 华南理工大学 Feature-based airborne laser point cloud and image data fusion system and method
EP2833322A1 (en) * 2013-07-30 2015-02-04 The Boeing Company Stereo-motion method of three-dimensional (3-D) structure information extraction from a video for fusion with 3-D point cloud data
CN105931234A (en) * 2016-04-19 2016-09-07 东北林业大学 Ground three-dimensional laser scanning point cloud and image fusion and registration method
CN108053367A (en) * 2017-12-08 2018-05-18 北京信息科技大学 A kind of 3D point cloud splicing and fusion method based on RGB-D characteristic matchings
CN108229548A (en) * 2017-12-27 2018-06-29 华为技术有限公司 A kind of object detecting method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8275194B2 (en) * 2008-02-15 2012-09-25 Microsoft Corporation Site modeling using image data fusion

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2833322A1 (en) * 2013-07-30 2015-02-04 The Boeing Company Stereo-motion method of three-dimensional (3-D) structure information extraction from a video for fusion with 3-D point cloud data
CN104268935A (en) * 2014-09-18 2015-01-07 华南理工大学 Feature-based airborne laser point cloud and image data fusion system and method
CN105931234A (en) * 2016-04-19 2016-09-07 东北林业大学 Ground three-dimensional laser scanning point cloud and image fusion and registration method
CN108053367A (en) * 2017-12-08 2018-05-18 北京信息科技大学 A kind of 3D point cloud splicing and fusion method based on RGB-D characteristic matchings
CN108229548A (en) * 2017-12-27 2018-06-29 华为技术有限公司 A kind of object detecting method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Door and Cabinet Recognition Using Convolutional Neural Nets and Real-Time Method Fusion for Handle Detection and Grasping; Adrian Llopart et al.; IEEE; 2017-06-08; pp. 144-149 *
Research on the fusion of laser imaging radar point cloud images and visible light images based on the SIFT algorithm; Li Zhida; China Master's Theses Full-text Database, Information Science and Technology; 2014-10-15; pp. 1-56 *

Also Published As

Publication number Publication date
CN109118539A (en) 2019-01-01

Similar Documents

Publication Publication Date Title
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN110226172B (en) Transforming a source domain image into a target domain image
CN108229479B (en) Training method and device of semantic segmentation model, electronic equipment and storage medium
CN110168560B (en) Method, system and medium for scene understanding and generation
EP3201881B1 (en) 3-dimensional model generation using edges
CN109816769A (en) Scene based on depth camera ground drawing generating method, device and equipment
JP7123133B2 (en) Binocular Image Depth Estimation Method and Apparatus, Equipment, Program and Medium
US20190301861A1 (en) Method and apparatus for binocular ranging
WO2020024585A1 (en) Method and apparatus for training object detection model, and device
EP3679521A1 (en) Segmenting objects by refining shape priors
US20190080462A1 (en) Method and apparatus for calculating depth map based on reliability
KR102140805B1 (en) Neural network learning method and apparatus for object detection of satellite images
CN110852349A (en) Image processing method, detection method, related equipment and storage medium
CN110176024B (en) Method, device, equipment and storage medium for detecting target in video
CN109919209A (en) A kind of domain-adaptive deep learning method and readable storage medium storing program for executing
CN112348828A (en) Example segmentation method and device based on neural network and storage medium
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN114627173A (en) Data enhancement for object detection by differential neural rendering
CN110807379A (en) Semantic recognition method and device and computer storage medium
CN109118539B (en) Method, device and equipment for fusing point cloud and picture based on multi-scale features
CN113657396B (en) Training method, translation display method, device, electronic equipment and storage medium
CN109035338B (en) Point cloud and picture fusion method, device and equipment based on single-scale features
CN108898557B (en) Image restoration method and apparatus, electronic device, computer program, and storage medium
CN108062761A (en) Image partition method, device and computing device based on adaptive tracing frame
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant