CN116468940B - Perception enhancement and motion judgment algorithm based on deep learning, storage medium and equipment

Perception enhancement and motion judgment algorithm based on deep learning, storage medium and equipment

Info

Publication number
CN116468940B
CN116468940B
Authority
CN
China
Prior art keywords
dynamic
potential
feature
image
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310390675.0A
Other languages
Chinese (zh)
Other versions
CN116468940A (en)
Inventor
陈孟元
程浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Polytechnic University
Original Assignee
Anhui Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Polytechnic University
Priority to CN202310390675.0A
Publication of CN116468940A
Application granted
Publication of CN116468940B
Active legal status
Anticipated expiration legal status

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a perception enhancement and motion judgment algorithm based on deep learning, which comprises the following steps: S1, accurately detecting dynamic blurred objects through a perception enhancement network that fuses a blur-region attention module and an enhanced detection module; S2, identifying objects in the image with a target-detection perceptron and, according to the acquired semantic information, dividing the objects in the scene into three classes of high dynamic, medium dynamic and low dynamic; S3, extracting feature points from the image and classifying the high-dynamic and medium-dynamic targets into potential dynamic regions for data association; S4, screening the feature points of the potential dynamic regions by constructing a global conditional random field, and finally eliminating the feature points judged to be dynamic in those regions. The method obtains the globally optimal labels by constructing and minimizing a conditional random field energy function built from unary and binary potential functions, eliminates the feature points judged to be dynamic in the potential dynamic regions, and thereby reduces the influence of dynamic points on the system.

Description

Perception enhancement and motion judgment algorithm based on deep learning, storage medium and equipment
Technical Field
The invention belongs to the technical field of simultaneous localization and mapping (Simultaneous Localization And Mapping, SLAM), and particularly relates to a perception enhancement and motion judgment algorithm based on deep learning, a storage medium and a device.
Background
Simultaneous localization and mapping (Simultaneous Localization And Mapping, SLAM) means that a mobile robot completes pose estimation and environment map construction using only its onboard sensors, without any prior information about the surrounding environment. Current mainstream SLAM systems achieve high-precision localization and mapping in static environments, but their pose estimation and map construction deteriorate in complex environments. In particular, in dynamic blurred environments the system struggles to identify scene objects, or identifies them incorrectly, and cannot accurately judge their motion state; dynamic objects then have a large influence on the localization accuracy of the SLAM system. In the prior art, dynamic and static objects in an image are identified and classified in order to label dynamic and static points, but an object of a dynamic class is not necessarily in motion; it only belongs to a potential dynamic region with a high probability of moving. Because the prior art cannot judge the potential dynamic regions using the global information provided by the acquired images, static and dynamic points are frequently mislabeled, the accuracy of motion judgment for objects in the image is insufficient, and the quality of the subsequent map construction suffers.
Disclosure of Invention
The invention aims to provide a perception enhancement and motion judgment algorithm based on deep learning, in order to solve the technical problem that the prior art cannot judge the potential dynamic regions in an acquired image from the global information the image provides, so that static and dynamic points are frequently mislabeled, the accuracy of motion judgment for objects in the image is insufficient, and a complete, fully static map cannot be built.
The perception enhancement and motion judgment algorithm based on deep learning comprises the following steps:
S1, accurately detecting dynamic blurred objects through a perception enhancement network that fuses a blur-region attention module and an enhanced detection module;
S2, identifying objects in the image with a target-detection perceptron, and dividing the objects in the scene into three classes of high dynamic, medium dynamic and low dynamic according to the acquired semantic information;
S3, extracting feature points from the image, and classifying the high-dynamic and medium-dynamic targets into potential dynamic regions for data association;
S4, screening the feature points of the potential dynamic regions by constructing a global conditional random field, and finally eliminating the feature points judged to be dynamic in those regions.
Preferably, in step S4 the motion state of each potential dynamic point is determined by constructing a global conditional random field; the global observation information comprises the observation conditions and the reprojection error of each point in different frames; the constructed global conditional random field model is converted into a Gibbs energy function to be solved; minimizing the energy function E(x) yields the optimal label assignment for all points, and an efficient mean-field approximation is used to minimize the energy function and obtain the globally optimal labels;
the energy function E(x) is:

E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)

the energy function is divided into two parts, a unary potential function ψ_u(x_i) and a binary potential function ψ_p(x_i, x_j), where i and j denote different nodes and x_i and x_j denote the class labels of nodes i and j respectively; the algorithm constructs a unary potential model to fit each vertex of the global conditional random field and a binary potential model to fit the edges connecting the vertices.
Preferably, a unary potential function is used to model the relationship between the feature point set and its observations, as follows:

ψ_u(x_i) = -ln((α_i - μ_α)² + (β_i - μ_β)² + (γ_i - μ_γ)²)

where ψ_u(x_i) is the constructed unary potential function, α_i is the reprojection error of the space point P_i, β_i is the total number of observations of the space point P_i, and γ_i is the distance from the corresponding pixel point to the epipolar line, the pixel point being the projection of the space point onto the image; μ_α, μ_β and μ_γ denote the mean values of the reprojection error, the total number of observations and the pixel-to-epipolar-line distance, respectively.
Preferably, the binary potential function models the relationship between the current node and the nodes in its neighborhood, and improves change-detection performance by modeling the spatial correlation between feature points, as shown in the following formula:

ψ_p(x_i, x_j) = μ(x_i, x_j)[k_1 exp(-(α_i - α_j)²/(2σ_α²) - (β_i - β_j)²/(2σ_β²)) + k_2 exp(-‖p_i - p_j‖²/(2σ_γ²))]

where α_i and α_j are the mean reprojection errors of nodes i and j, β_i and β_j are the numbers of observations of nodes i and j, p_i and p_j are the pose parameters of nodes i and j, k_1 and k_2 are a pair of weight parameters, σ_α, σ_β and σ_γ are constants that control the shape and scale of the Gaussian kernels, and μ(x_i, x_j) is the compatibility function given by:

μ(x_i, x_j) = 1 if x_i ≠ x_j, and 0 otherwise.
preferably, in the step S1, an input feature map is input to an improved channel attention module, a feature map channel expression mechanism is obtained, and pixel value parameter calculation operations are performed on the obtained channel expression mechanism, so that feature map specific channel and region expression is enhanced; improved channel attention derived regional weight change parameter F c The calculation method is as follows:
wherein η represents a Sigmoid function, Δ 0 、Δ 1 Is the weight of the 2-layer linear layer,and->As a result of the feature map after the pixel value parameter resolution, < >>With average pooling operations, +.>Using maximum pooling operation, and finally using F c Carrying out layer-by-layer channel weighting on the input feature map to obtain a channel dimension feature map;
the input feature map is also fed into an improved spatial attention module: after the pixel-value parameter computation the results are concatenated into a two-channel feature map, a single convolution layer performs the convolution, and the resulting feature vector is passed through a Sigmoid function to obtain the improved spatial attention region weight change parameter F_s, computed as:

F_s = η(f^{7×7}([F_avg^s ; F_max^s]))

where η denotes the Sigmoid function, f^{7×7} denotes a convolution with a 7×7 kernel, and F_avg^s and F_max^s are the feature maps obtained from the pixel-value parameter computation using an average pooling operation and a maximum pooling operation respectively;
F_c and F_s are then combined with the input feature map to obtain the improved channel- and spatial-attention-weighted feature map F_e.
Preferably, in step S2 a target-detection perceptron is added on top of the dual discriminators of the deblurring network; the perceptron processes the latent sharp image restored in the generator, downsamples it again through convolution blocks, and fuses its features with the feature maps of the same size in the generator. The convolutions in the perceptron backbone network are depthwise separable convolutions, each consisting of a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, and the computational cost of the depthwise separable convolution is C_k·C_k·M·C_w·C_h + M·N·C_w·C_h, where C_k denotes the size of the convolution kernel, M and N denote the numbers of input and output channels, and C_w and C_h denote the width and height of the output feature map, respectively.
Preferably, in step S2 the objects recognized in the image by the perceptron are classified, according to the acquired semantic information, into three classes of high dynamic, medium dynamic and low dynamic; an object with autonomous movement capability is defined as a high-dynamic object; an object that may be either stationary or moving is defined as a medium-dynamic object; an object that does not move in most cases is defined as a low-dynamic object.
Preferably, in step S3 the high-dynamic and medium-dynamic objects from step S2 are classified as potential dynamic objects, while the low-dynamic objects and the background are classified as static features; feature points are extracted from the acquired image and their states are associated with the acquired semantic information, so that feature points inside a potential dynamic object bounding box are potential dynamic feature points, and feature points on low-dynamic objects and the background are static feature points.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the deep-learning-based perception enhancement and motion judgment algorithm described above.
The invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, the processor implementing the steps of the deep-learning-based perception enhancement and motion judgment algorithm described above when executing the program.
The invention has the following advantages:
1. To address the prior art's lack of an effective judgment of the motion state of feature points inside potential dynamic regions, the algorithm introduces a global conditional random field to evaluate those regions. The global conditional random field refines the dynamic/static label assignment and removes dynamic points that were wrongly labelled as static, improving the accuracy of camera pose estimation. The globally optimal labels are obtained by constructing and minimizing a conditional random field energy function whose unary and binary potential functions are defined from three sets of complementary observed variables. Eliminating the feature points judged to be dynamic within the potential dynamic regions reduces the influence of dynamic points on the system, so the accuracy of motion judgment for objects in the image is greatly improved.
2. According to the semantic information acquired by the network, objects in the scene are given a dynamic rating: objects with autonomous movement capability are defined as high-dynamic objects; objects that are usually stationary but may move at any time are defined as medium-dynamic objects; objects that do not move in most cases are defined as low-dynamic objects. Feature points are extracted from the input image and associated with the semantic information. Feature points on low-dynamic objects and the background can be used directly for pose estimation, whereas feature points on medium- and high-dynamic objects, being potential dynamic objects, must undergo motion judgment so that the truly dynamic points can be removed.
3. The algorithm provides a single-stage network, DYNET, that merges deblurring and target detection. The network comprises a backbone module for recovering local image features, a blur-region attention module, and a perceptron module for enhanced recognition. The backbone adopts a MobileNetV2-based feature pyramid (FPN) structure; to strengthen the deblurring effect, a blur-region attention module is introduced into the generator, and feature maps rich in high-dimensional information are fed into improved spatial and channel attention modules, strengthening the network's learning of blur features, increasing the pixel weight of blurred regions, and improving the recognition rate of blurred objects. An improved target-detection perceptron is introduced on top of the original dual discriminators; the perceptron detects objects in the image once the front end of the network has finished deblurring.
Drawings
Fig. 1 is a schematic flow chart of perception enhancement and motion judgment based on deep learning in the invention.
FIG. 2 is a flow chart of the perception enhancement and motion determination based on deep learning according to the present invention.
Fig. 3 is a diagram of a perceptually enhanced network according to the present invention.
FIG. 4 is a graph of the results of the global conditional random field model proposed by the present invention.
Fig. 5 is a comparison of a blurred image before and after restoration in accordance with the present invention.
FIG. 6 is a graph of the number of features and matching contrast before and after deblurring an image in accordance with the present invention.
Fig. 7 is a histogram of feature matching data obtained from the present invention run on a TUM dataset.
FIG. 8 is a graph showing the comparison of detection accuracy before and after image restoration in the present invention.
FIG. 9 is a graph of a dynamic point culling comparison of the present invention run on a TUM data set.
Fig. 10 is a trajectory diagram of the present invention run under the TUM dataset W_half sequence.
Fig. 11 is a trajectory diagram of the present invention run under the TUM dataset W_xyz sequence.
Fig. 12 is a trajectory diagram of the present invention run under the TUM dataset W_rpy sequence.
Fig. 13 is a real experimental environment scenario and corresponding floor plan.
Fig. 14 is a comparison of the present invention before and after restoration of an acquired image in a real scene.
FIG. 15 is a graph showing the comparison of the accuracy of object detection before and after deblurring an image in a real scene according to the present invention.
Fig. 16 is a graph of the effect of dynamic point elimination in a real scene according to the present invention.
Fig. 17 is a diagram of a real scene motion trajectory of a mobile robot.
Detailed Description
The embodiments of the invention are described in detail below with reference to the accompanying drawings, by way of example only, to help those skilled in the art gain a more complete, accurate and thorough understanding of the inventive concept and technical solution of the invention.
Traditional SLAM algorithms struggle to identify objects correctly in dynamic blurred scenes, or identify them incorrectly, and cannot accurately judge the motion state of potential dynamic points, so the SLAM system ends up with poor localization and map construction. To address this, an improved SLAM algorithm is provided. The method comprises three stages: perception enhancement, potential dynamic point screening, and dynamic point elimination based on a global conditional random field. In the perception enhancement stage, the DYNET network proposed by this algorithm deblurs the input image and detects targets in it. In the potential dynamic point screening stage, objects in the scene are divided into three dynamic levels (high, medium and low) using the acquired semantic information, image feature points are extracted, and the semantic information is associated with the feature information. In the dynamic point elimination stage, a global conditional random field is constructed to judge the potential dynamic regions, and the feature points judged to be dynamic are eliminated. The scheme of the invention is obtained by improving the existing SLAM technology according to this principle.
Embodiment one:
As shown in figs. 1-17, the present invention provides a perception enhancement and motion judgment algorithm based on deep learning, comprising the following steps:
S1, accurately detecting dynamic blurred objects through a perception enhancement network that fuses a blur-region attention module and an enhanced detection module.
The blur-region attention module consists of an improved channel attention part and an improved spatial attention part. The input feature map is processed along the channel and spatial dimensions respectively, and the resulting feature maps are concatenated and fused to obtain the output F_e, which increases the pixel weight of blurred regions and improves the recognition rate and precision for blurred objects. The improved channel attention strengthens the expression of specific channels and regions of the feature map so that blur features are learned better; the improved spatial attention learns the pixel-value weights of blurred regions in the optimized feature map and continuously adjusts each weight through back-propagation, thereby guiding the network model to focus on the areas where the blur lies, so that dynamic blurred objects are accurately detected.
In step S1 the input feature map is fed into an improved channel attention module to obtain a channel-wise expression of the feature map, and pixel-value parameter computation (average pooling and maximum pooling) is applied to that channel expression so as to strengthen the expression of specific channels and regions of the feature map. The region descriptors F_avg^c and F_max^c obtained from the pixel-value parameter computation are passed through a multi-layer perceptron module, which strengthens the correlation between region expressions and redistributes the weight change parameters of each region so that blur features are learned better; the region weight change parameter F_c produced by the improved channel attention is computed as:

F_c = η(Δ_1(Δ_0(F_avg^c)) + Δ_1(Δ_0(F_max^c)))

where η denotes the Sigmoid function, Δ_0 and Δ_1 are the weights of the two linear layers, and F_avg^c and F_max^c are the feature descriptors obtained from the pixel-value parameter computation using an average pooling operation and a maximum pooling operation respectively. Finally F_c is used to weight the input feature map channel by channel to obtain the channel-dimension feature map.
In this step the input feature map is also fed into an improved spatial attention module: after the pixel-value parameter computation the results are concatenated into a two-channel feature map, a single convolution layer performs the convolution, and the resulting feature vector is passed through a Sigmoid function to obtain the improved spatial attention region weight change parameter F_s, computed as:

F_s = η(f^{7×7}([F_avg^s ; F_max^s]))

where η denotes the Sigmoid function, f^{7×7} denotes a convolution with a 7×7 kernel, and F_avg^s and F_max^s are the feature maps obtained from the pixel-value parameter computation using an average pooling operation and a maximum pooling operation respectively. F_c and F_s are then combined with the input feature map to obtain the improved channel- and spatial-attention-weighted feature map F_e.
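To make the attention computation above concrete, the following is a minimal PyTorch-style sketch of the blur-region attention, assuming a CBAM-like layout (channel attention from average- and max-pooled descriptors through a shared two-layer MLP, spatial attention from a 7×7 convolution over pooled maps). The module name, the reduction ratio and the example tensor sizes are illustrative assumptions, not the names or values used in the patented implementation.

```python
import torch
import torch.nn as nn

class BlurRegionAttention(nn.Module):
    """Sketch of the improved channel + spatial attention (F_c, F_s, F_e)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Two-layer MLP shared by the average- and max-pooled descriptors (plays the role of Δ0, Δ1).
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 convolution over the two-channel pooled map for spatial attention.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention F_c = sigmoid(MLP(avg_pool) + MLP(max_pool)).
        avg_c = self.mlp(x.mean(dim=(2, 3)))
        max_c = self.mlp(x.amax(dim=(2, 3)))
        f_c = torch.sigmoid(avg_c + max_c).view(b, c, 1, 1)
        x_c = x * f_c                      # channel-weighted feature map
        # Spatial attention F_s = sigmoid(conv7x7([avg_pool; max_pool])).
        avg_s = x_c.mean(dim=1, keepdim=True)
        max_s = x_c.amax(dim=1, keepdim=True)
        f_s = torch.sigmoid(self.spatial_conv(torch.cat([avg_s, max_s], dim=1)))
        return x_c * f_s                   # F_e: channel- and spatially weighted map

# Example: weight a feature map from the generator backbone.
feat = torch.randn(1, 64, 80, 80)
f_e = BlurRegionAttention(64)(feat)
```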
S2, identifying objects in the image with the target-detection perceptron, and dividing the objects in the scene into three classes of high dynamic, medium dynamic and low dynamic according to the acquired semantic information.
In this step a target-detection perceptron is added on top of the dual discriminators of the deblurring network; the perceptron processes the latent sharp image restored in the generator, downsamples it again through convolution blocks, and fuses its features with the feature maps of the same size in the generator. This allows the network to perform target detection while deblurring. To reduce the number of parameters, the standard convolutions in the perceptron backbone network are replaced with depthwise separable convolutions. A traditional convolution convolves the input feature map with the corresponding kernel in every channel and then sums the outputs; its computational cost is

C_k·C_k·M·N·C_w·C_h

The depthwise separable convolution splits this single operation into a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, with a computational cost of

C_k·C_k·M·C_w·C_h + M·N·C_w·C_h

so the ratio α between the costs of the depthwise separable convolution and the traditional convolution is

α = (C_k·C_k·M·C_w·C_h + M·N·C_w·C_h) / (C_k·C_k·M·N·C_w·C_h) = 1/N + 1/C_k²

where C_k denotes the size of the convolution kernel, M and N denote the numbers of input and output channels, and C_w and C_h denote the width and height of the output feature map, respectively. The kernel size is usually 3×3, so an ordinary convolution costs roughly 8-9 times as much as the improved network. The computation consumed while identifying objects in the image is therefore effectively reduced and the recognition efficiency is improved.
In this step the objects recognized in the image by the perceptron are classified, according to the acquired semantic information, into three classes: high dynamic, medium dynamic and low dynamic. Objects with autonomous mobility, such as humans and animals, are defined as high-dynamic objects; objects such as chairs and books are usually stationary but may move at any time, so such objects, whether currently stationary or moving, are defined as medium-dynamic objects; objects such as computers and tables do not move in most cases and are therefore defined as low-dynamic objects.
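One way to realize this semantic rating in code is a simple lookup from detected class labels to dynamic levels, as sketched below; the class names and level assignments are illustrative assumptions based only on the examples given above, not an exhaustive list from the patent.

```python
from enum import Enum

class DynamicLevel(Enum):
    HIGH = 2    # autonomous movers: judged further by the conditional random field
    MEDIUM = 1  # movable objects: also treated as potential dynamic regions
    LOW = 0     # rarely moving objects: usable directly as static features

# Illustrative mapping from detector class labels to dynamic levels.
SEMANTIC_RATING = {
    "person": DynamicLevel.HIGH,
    "animal": DynamicLevel.HIGH,
    "chair": DynamicLevel.MEDIUM,
    "book": DynamicLevel.MEDIUM,
    "computer": DynamicLevel.LOW,
    "table": DynamicLevel.LOW,
}

def rate_detection(class_label: str) -> DynamicLevel:
    """Unknown classes default to LOW, i.e. background-like static features."""
    return SEMANTIC_RATING.get(class_label, DynamicLevel.LOW)
```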
S3, extracting feature points from the image, and classifying the high-dynamic and medium-dynamic targets into potential dynamic regions for data association.
The high-dynamic and medium-dynamic objects from step S2 are classified as potential dynamic objects, while the low-dynamic objects and the background are classified as static features. Feature points are extracted from the acquired image and their states are associated with the acquired semantic information: feature points inside a potential dynamic object bounding box are potential dynamic feature points, and feature points on low-dynamic objects and the background are static feature points. The potential dynamic regions still need their motion state judged in step S4, whereas the static feature points can be used directly for pose estimation and map construction without further judgment.
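Continuing the sketch above (it reuses the DynamicLevel rating and rate_detection), the data association can be written as a point-in-box test against the rated detections; the (x1, y1, x2, y2) box format and the data structures here are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    level: DynamicLevel                     # rating from rate_detection()

def split_feature_points(points: List[Tuple[float, float]],
                         detections: List[Detection]):
    """Split extracted feature points into potential-dynamic and static sets."""
    potential, static = [], []
    for u, v in points:
        in_potential_box = any(
            d.level in (DynamicLevel.HIGH, DynamicLevel.MEDIUM)
            and d.box[0] <= u <= d.box[2] and d.box[1] <= v <= d.box[3]
            for d in detections
        )
        (potential if in_potential_box else static).append((u, v))
    # `static` goes straight to pose estimation; `potential` is passed to the
    # global conditional random field in step S4 for motion judgment.
    return potential, static
```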
S4, screening the feature points of the potential dynamic regions by constructing a global conditional random field, and finally eliminating the feature points judged to be dynamic in those regions.
The global conditional random field is constructed to judge the motion state of the potential dynamic points: it refines the dynamic/static label assignment and rejects dynamic points that were wrongly labelled as static, thereby improving the accuracy of camera pose estimation. In this model the global observation information comprises the observations and the reprojection error of each point in different frames. Based on this information the global conditional random field can autonomously learn the characteristics of dynamic and static points and use them to classify new points. The constructed probability model (i.e. the global conditional random field) is converted into a Gibbs energy function and solved; minimizing the energy function E(x) yields the optimal label assignment for all points. The energy function E(x) of the GCRF model proposed by this algorithm is:

E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)

The energy function is divided into two parts, a unary potential function ψ_u(x_i) and a binary potential function ψ_p(x_i, x_j), where i and j denote different nodes and x_i and x_j denote the class labels of nodes i and j respectively. The algorithm constructs a unary potential model to fit each vertex of the global conditional random field and a binary potential model to fit the edges connecting the vertices.
The unary potential function models the relationship between the feature point set and its observations. For the unary potential, three static-likelihood priors are defined from three sets of complementary observed variables, and the overall static probability is given as their weighted combination. The unary potential function is constructed as:

ψ_u(x_i) = -ln((α_i - μ_α)² + (β_i - μ_β)² + (γ_i - μ_γ)²)

where ψ_u(x_i) is the constructed unary potential function, α_i is the reprojection error of the space point P_i, β_i is the total number of observations of the space point P_i, and γ_i is the distance from the corresponding pixel point to the epipolar line; the pixel point is the projection of the space point onto the image. μ_α, μ_β and μ_γ denote the mean values of the reprojection error, the total number of observations and the pixel-to-epipolar-line distance, respectively. Three-dimensional space points are usually mapped onto two-dimensional pixel points by a projective transform known as perspective projection or the pinhole camera model.
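A minimal sketch of the unary potential, directly transcribing the formula above; the per-point inputs α, β, γ and their means are assumed to be precomputed elsewhere, and the small epsilon guard is an implementation assumption.

```python
import numpy as np

def unary_potential(alpha: np.ndarray, beta: np.ndarray, gamma: np.ndarray,
                    mu_alpha: float, mu_beta: float, mu_gamma: float) -> np.ndarray:
    """psi_u(x_i) = -ln((a_i - mu_a)^2 + (b_i - mu_b)^2 + (g_i - mu_g)^2)."""
    eps = 1e-12  # guard against log(0) when a point sits exactly at the means
    s = (alpha - mu_alpha) ** 2 + (beta - mu_beta) ** 2 + (gamma - mu_gamma) ** 2
    return -np.log(s + eps)

# Example: reprojection error, observation count and epipolar distance of 3 points.
alpha = np.array([0.8, 2.5, 0.6])
beta = np.array([12.0, 3.0, 15.0])
gamma = np.array([0.4, 3.1, 0.3])
psi_u = unary_potential(alpha, beta, gamma, alpha.mean(), beta.mean(), gamma.mean())
```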
The binary potential function models the relationship between the current node and the nodes in its neighborhood, and improves change-detection performance by modeling the spatial correlation between feature points. The invention fits this model with two Gaussian kernels, one on the observation edge and one on the positioning edge. The kernel on the observation edge expresses that nodes with similar numbers of observations and similar mean reprojection errors usually belong to the same class and carry the same label, whereas points with different labels differ clearly in their observation counts and mean reprojection errors. The kernel on the positioning edge expresses that adjacent space points should belong to the same object and therefore carry the same label, so a penalty term for points that are adjacent but differently labelled is added to the positioning kernel. The binary potential function thus promotes consistent labels for adjacent points, as shown in the following equation:

ψ_p(x_i, x_j) = μ(x_i, x_j)[k_1 exp(-(α_i - α_j)²/(2σ_α²) - (β_i - β_j)²/(2σ_β²)) + k_2 exp(-‖p_i - p_j‖²/(2σ_γ²))]

where α_i and α_j are the mean reprojection errors of nodes i and j, β_i and β_j are the numbers of observations of nodes i and j, p_i and p_j are the pose parameters of nodes i and j, k_1 and k_2 are a pair of weight parameters, σ_α, σ_β and σ_γ are constants that control the shape and scale of the Gaussian kernels, and μ(x_i, x_j) is the compatibility function:

μ(x_i, x_j) = 1 if x_i ≠ x_j, and 0 otherwise.
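A sketch of the pairwise term, under the assumption that the kernels take the standard bilateral-Gaussian form written above; the exact kernel form and the default parameter values are reconstructions for illustration, not quoted from the patent.

```python
import numpy as np

def compatibility(label_i: int, label_j: int) -> float:
    """Potts-style compatibility: penalize only when the two labels differ."""
    return 1.0 if label_i != label_j else 0.0

def binary_potential(a_i, a_j, b_i, b_j, p_i, p_j, label_i, label_j,
                     k1=1.0, k2=1.0, sig_a=1.0, sig_b=5.0, sig_g=0.5) -> float:
    """psi_p(x_i, x_j): observation-edge kernel + positioning-edge kernel."""
    obs_kernel = np.exp(-((a_i - a_j) ** 2) / (2 * sig_a ** 2)
                        - ((b_i - b_j) ** 2) / (2 * sig_b ** 2))
    pos_kernel = np.exp(-np.sum((np.asarray(p_i) - np.asarray(p_j)) ** 2)
                        / (2 * sig_g ** 2))
    return compatibility(label_i, label_j) * (k1 * obs_kernel + k2 * pos_kernel)

# Example: two nearby points with similar statistics but conflicting labels
# incur a large penalty, pushing the inference toward label agreement.
cost = binary_potential(0.7, 0.8, 10, 11, (0.1, 0.2, 1.5), (0.12, 0.21, 1.5), 0, 1)
```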
the global best tag is obtained by minimizing the energy function using an effective average field approximation.
After the dynamic points are removed in step S4, the algorithm has completed the motion judgment of the space points and the screening and removal of dynamic points. Pose estimation and map construction can then be carried out using the static points that remain.
The following describes the process of the above-mentioned perception enhancement and motion judgment algorithm based on deep learning with reference to a specific experiment.
Fig. 5 compares blurred images from the public TUM dataset before and after restoration by the present algorithm; scenes with blur were selected from the dataset to verify its effect. As the figure shows, introducing the blur-region attention module strengthens the feature representation of blurred regions, so the algorithm can restore fine-grained texture details in the image in a coarse-to-fine manner and can handle complex blur.
Fig. 6 compares feature point extraction and feature matching on images from the public TUM dataset before and after restoration. The image quality is first enhanced by the perception enhancement network and feature extraction and matching are then performed; the numbers of feature points and of feature matches both improve to a certain extent, and some erroneous feature points and matches are removed.
Fig. 7 is a histogram of feature matching counts on the dynamic sequences W_half, W_xyz and W_rpy of the public TUM dataset. It shows that after the blurred images are repaired the number of feature matches improves to a certain extent, so the algorithm is more robust in blurred environments.
Fig. 8 compares target detection accuracy before and after deblurring on the selected TUM data. It shows that detecting directly on the blurred images yields many missed and false detections, whereas repairing the images with the perception enhancement network improves the target detection accuracy.
Fig. 9 compares the feature extraction of this algorithm with other algorithms on the public TUM dataset. Conventional dynamic SLAM algorithms eliminate dynamic points using only a deep learning network, which causes a certain amount of erroneous elimination, and when too many points are wrongly eliminated the SLAM system can lose tracking. This algorithm combines the deep learning network with the global conditional random field to judge dynamic points comprehensively and can screen out the truly dynamic points, so the amount of retained feature information increases to a certain extent.
Figs. 10 to 12 show the motion trajectories of the algorithm on the dynamic sequences W_half, W_xyz and W_rpy of the public TUM dataset. The lines represent, respectively, the ground-truth camera trajectory, the camera trajectory estimated by the SLAM method, and the trajectory error. Because the perception enhancement network and the motion judgment based on the global conditional random field reduce the influence of dynamic objects on the system, the estimated trajectory stays close to the real one.
The method of this example is further described with another set of experiments. A school meeting room, 10 m × 5 m in size, is selected as the indoor experimental scene, as shown on the left of fig. 13. Fig. 13 also gives the floor plan of the real scene, including a workbench; section A-B is the path along which a pedestrian walks back and forth, and section C-D is the path along which the robot moves.
Fig. 14 compares images acquired in the real scene before and after restoration; the experiment demonstrates the algorithm's deblurring effect in an actual scene. It can be seen that the originally blurred areas are restored, the detail texture of the image is effectively improved, the sharp structure of the blurred object is reconstructed, and the visual smear in other areas is reduced.
Fig. 15 compares object detection accuracy before and after deblurring in the real scene, verifying the effectiveness of the algorithm on image frames acquired in an actual environment. Image restoration effectively improves the target detection precision and reduces mismatches.
Fig. 16 shows the effect of dynamic point elimination in the real scene. After the input image is processed by the perception enhancement network, the algorithm constructs a global conditional random field to judge the actual motion within the potential dynamic regions screened out by the semantic information. Compared with traditional dynamic SLAM algorithms, the feature point elimination of this algorithm is more accurate, erroneous elimination is reduced, and the robustness of the SLAM system in complex environments is improved.
Fig. 17 shows the motion trajectory of the mobile robot in the real scene. It can be seen that in a dynamic blurred environment the algorithm detects the blurred objects in the scene through the added DYNET perception enhancement network, performs data association using the acquired semantic information, screens and rejects dynamic points by constructing the global conditional random field, reduces erroneous rejection, and retains more feature points for pose estimation and map construction. The algorithm is therefore more robust in dynamic blurred environments.
Embodiment two:
In accordance with a second embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the following steps according to the method of the first embodiment:
S1, accurately detecting dynamic blurred objects through a perception enhancement network that fuses a blur-region attention module and an enhanced detection module.
S2, identifying objects in the image with the target-detection perceptron, and dividing the objects in the scene into three classes of high dynamic, medium dynamic and low dynamic according to the acquired semantic information.
S3, extracting feature points from the image, and classifying the high-dynamic and medium-dynamic targets into potential dynamic regions for data association.
S4, screening the feature points of the potential dynamic regions by constructing a global conditional random field, and finally eliminating the feature points judged to be dynamic in those regions.
The storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an optical disk, or any other medium capable of storing program code.
For the specific details of the steps implemented when the program in the computer-readable storage medium is executed, refer to the first embodiment; they are not described again here.
Embodiment III:
In accordance with a third embodiment of the present invention, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the following steps according to the method of the first embodiment:
S1, accurately detecting dynamic blurred objects through a perception enhancement network that fuses a blur-region attention module and an enhanced detection module.
S2, identifying objects in the image with the target-detection perceptron, and dividing the objects in the scene into three classes of high dynamic, medium dynamic and low dynamic according to the acquired semantic information.
S3, extracting feature points from the image, and classifying the high-dynamic and medium-dynamic targets into potential dynamic regions for data association.
S4, screening the feature points of the potential dynamic regions by constructing a global conditional random field, and finally eliminating the feature points judged to be dynamic in those regions.
For the specific details of the steps implemented by the computer device, refer to the first embodiment; they are not described again here.
It is noted that each block of the block diagrams and/or flowchart illustrations in this specification, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and machine instructions.
While the invention has been described above with reference to the accompanying drawings, it is apparent that the invention is not limited to the above embodiments. Whether the inventive concept and technical solution are used with insubstantial modifications or applied directly to other applications without modification, such uses all fall within the scope of protection of the invention.

Claims (6)

1. The perception enhancement and motion judgment algorithm based on deep learning is characterized in that: the method comprises the following steps:
S1, accurately detecting dynamic blurred objects through a perception enhancement network that fuses a blur-region attention module and an enhanced detection module;
S2, identifying objects in the image with a target-detection perceptron, and dividing the objects in the scene into three classes of high dynamic, medium dynamic and low dynamic according to the acquired semantic information;
S3, extracting feature points from the image, and classifying the high-dynamic and medium-dynamic targets into potential dynamic regions for data association;
S4, screening the feature points of the potential dynamic regions by constructing a global conditional random field, and finally eliminating the feature points judged to be dynamic in those regions;
in step S1 the input feature map is fed into an improved channel attention module to obtain a channel-wise expression of the feature map, and pixel-value parameter computation is applied to that channel expression so as to strengthen the expression of specific channels and regions of the feature map; the region weight change parameter F_c produced by the improved channel attention is computed as:

F_c = η(Δ_1(Δ_0(F_avg^c)) + Δ_1(Δ_0(F_max^c)))

where η denotes the Sigmoid function, Δ_0 and Δ_1 are the weights of the two linear layers, and F_avg^c and F_max^c are the feature descriptors obtained from the pixel-value parameter computation using an average pooling operation and a maximum pooling operation respectively; finally F_c is used to weight the input feature map channel by channel to obtain the channel-dimension feature map;
the input feature map is also fed into an improved spatial attention module: after the pixel-value parameter computation the results are concatenated into a two-channel feature map, a single convolution layer performs the convolution, and the resulting feature vector is passed through a Sigmoid function to obtain the improved spatial attention region weight change parameter F_s, computed as:

F_s = η(f^{7×7}([F_avg^s ; F_max^s]))

where η denotes the Sigmoid function, f^{7×7} denotes a convolution with a 7×7 kernel, and F_avg^s and F_max^s are the feature maps obtained from the pixel-value parameter computation using an average pooling operation and a maximum pooling operation respectively;
will F c 、F s Connecting with input feature map to obtain improved channel and spatial attention weighted feature map F e
in step S4 the motion state of each potential dynamic point is determined by constructing a global conditional random field; the global observation information comprises the observation conditions and the reprojection error of each point in different frames; the constructed global conditional random field model is converted into a Gibbs energy function to be solved; minimizing the energy function E(x) yields the optimal label assignment for all points, and an efficient mean-field approximation is used to minimize the energy function and obtain the globally optimal labels;
the energy function E(x) is:

E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)

the energy function is divided into two parts, a unary potential function ψ_u(x_i) and a binary potential function ψ_p(x_i, x_j), where i and j denote different nodes and x_i and x_j denote the class labels of nodes i and j respectively; the algorithm constructs a unary potential model to fit each vertex of the global conditional random field and a binary potential model to fit the edges connecting the vertices;
the unary potential function is used to model the relationship between the feature point set and its observations as follows:

ψ_u(x_i) = -ln((α_i - μ_α)² + (β_i - μ_β)² + (γ_i - μ_γ)²)

where ψ_u(x_i) is the constructed unary potential function, α_i is the reprojection error of the space point P_i, β_i is the total number of observations of the space point P_i, and γ_i is the distance from the corresponding pixel point to the epipolar line, the pixel point being the projection of the space point onto the image; μ_α, μ_β and μ_γ denote the mean values of the reprojection error, the total number of observations and the pixel-to-epipolar-line distance, respectively;
the binary potential function models the relationship between the current node and the nodes in its neighborhood, and improves change-detection performance by modeling the spatial correlation between feature points, as shown in the following formula:

ψ_p(x_i, x_j) = μ(x_i, x_j)[k_1 exp(-(α_i - α_j)²/(2σ_α²) - (β_i - β_j)²/(2σ_β²)) + k_2 exp(-‖p_i - p_j‖²/(2σ_γ²))]

where α_i and α_j are the mean reprojection errors of nodes i and j, β_i and β_j are the numbers of observations of nodes i and j, p_i and p_j are the pose parameters of nodes i and j, k_1 and k_2 are a pair of weight parameters, σ_α, σ_β and σ_γ are constants that control the shape and scale of the Gaussian kernels, and μ(x_i, x_j) is the compatibility function given by:

μ(x_i, x_j) = 1 if x_i ≠ x_j, and 0 otherwise.
2. The deep-learning-based perception enhancement and motion judgment algorithm of claim 1, wherein: in step S2 a target-detection perceptron is added on top of the dual discriminators of the deblurring network; the perceptron processes the latent sharp image restored in the generator, downsamples it again through convolution blocks, and fuses its features with the feature maps of the same size in the generator; the convolutions in the perceptron backbone network are depthwise separable convolutions, each consisting of a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, with a computational cost of C_k·C_k·M·C_w·C_h + M·N·C_w·C_h, where C_k denotes the size of the convolution kernel, M and N denote the numbers of input and output channels, and C_w and C_h denote the width and height of the output feature map, respectively.
3. The deep-learning-based perception enhancement and motion judgment algorithm of claim 1, wherein: in step S2 the objects recognized in the image by the perceptron are classified, according to the acquired semantic information, into three classes of high dynamic, medium dynamic and low dynamic; an object with autonomous movement capability is defined as a high-dynamic object; an object that may be either stationary or moving is defined as a medium-dynamic object; an object that does not move in most cases is defined as a low-dynamic object.
4. The deep-learning-based perception enhancement and motion judgment algorithm of claim 1, wherein: in step S3 the high-dynamic and medium-dynamic objects from step S2 are classified as potential dynamic objects, while the low-dynamic objects and the background are classified as static features; feature points are extracted from the acquired image and their states are associated with the acquired semantic information, so that feature points inside a potential dynamic object bounding box are potential dynamic feature points, and feature points on low-dynamic objects and the background are static feature points.
5. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program, when executed by a processor, implements the steps of the deep-learning-based perception enhancement and motion judgment algorithm as defined in any one of claims 1-4.
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and capable of running on the processor, characterized by: the processor, when executing the computer program, implements the steps of a deep learning based perception enhancement and motion determination algorithm as claimed in any one of claims 1-4.
CN202310390675.0A 2023-04-07 2023-04-07 Perception enhancement and motion judgment algorithm based on deep learning, storage medium and equipment Active CN116468940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310390675.0A CN116468940B (en) 2023-04-07 2023-04-07 Perception enhancement and motion judgment algorithm based on deep learning, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310390675.0A CN116468940B (en) 2023-04-07 2023-04-07 Perception enhancement and motion judgment algorithm based on deep learning, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN116468940A CN116468940A (en) 2023-07-21
CN116468940B true CN116468940B (en) 2023-09-19

Family

ID=87172998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310390675.0A Active CN116468940B (en) 2023-04-07 2023-04-07 Perception enhancement and motion judgment algorithm based on deep learning, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN116468940B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN110363816A (en) * 2019-06-25 2019-10-22 广东工业大学 A kind of mobile robot environment semanteme based on deep learning builds drawing method
CN112991447A (en) * 2021-03-16 2021-06-18 华东理工大学 Visual positioning and static map construction method and system in dynamic environment
WO2022074643A1 (en) * 2020-10-08 2022-04-14 Edgy Bees Ltd. Improving geo-registration using machine-learning based object identification
CN114707611A (en) * 2022-04-21 2022-07-05 安徽工程大学 Mobile robot map construction method, storage medium and equipment based on graph neural network feature extraction and matching
CN115713633A (en) * 2022-11-16 2023-02-24 哈尔滨工业大学(深圳) Visual SLAM method, system and storage medium based on deep learning in dynamic scene

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE602007012270D1 (en) * 2007-11-16 2011-03-10 Honda Res Inst Europe Gmbh Method and apparatus for continuous object-background segmentation in images from dynamic visual scenes
US11461998B2 (en) * 2019-09-25 2022-10-04 Samsung Electronics Co., Ltd. System and method for boundary aware semantic segmentation
WO2021141764A1 (en) * 2020-01-06 2021-07-15 Becton, Dickinson And Company Deep learning method in aiding patient diagnosis and aberrant cell population identification in flow cytometry

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN110363816A (en) * 2019-06-25 2019-10-22 广东工业大学 A kind of mobile robot environment semanteme based on deep learning builds drawing method
WO2022074643A1 (en) * 2020-10-08 2022-04-14 Edgy Bees Ltd. Improving geo-registration using machine-learning based object identification
CN112991447A (en) * 2021-03-16 2021-06-18 华东理工大学 Visual positioning and static map construction method and system in dynamic environment
CN114707611A (en) * 2022-04-21 2022-07-05 安徽工程大学 Mobile robot map construction method, storage medium and equipment based on graph neural network feature extraction and matching
CN115713633A (en) * 2022-11-16 2023-02-24 哈尔滨工业大学(深圳) Visual SLAM method, system and storage medium based on deep learning in dynamic scene

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Biomimetic SLAM Algorithm Based on Growing Self-Organizing Map; Mengyuan Chen; IEEE; full text *
DDL-SLAM: A Robust RGB-D SLAM in Dynamic Environments Combined With Deep Learning; Yongbao Ai; IEEE; full text *
Binocular vision SLAM algorithm based on dynamic region elimination in dynamic environments (动态环境下基于动态区域剔除的双目视觉SLAM算法); Wei Tong (魏彤); ROBOT; Vol. 42, No. 3; full text *
Improved visual-inertial SLAM method based on dual initialization and hierarchical optimization (基于双重初始化和分级优化的改进视觉惯性SLAM方法); Chen Mengyuan (陈孟元); Journal of Chinese Inertial Technology (中国惯性技术学报); Vol. 29, No. 2; full text *

Also Published As

Publication number Publication date
CN116468940A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
US11763485B1 (en) Deep learning based robot target recognition and motion detection method, storage medium and apparatus
US11144889B2 (en) Automatic assessment of damage and repair costs in vehicles
Von Stumberg et al. Gn-net: The gauss-newton loss for multi-weather relocalization
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN111950649B (en) Attention mechanism and capsule network-based low-illumination image classification method
Fang et al. Transcg: A large-scale real-world dataset for transparent object depth completion and a grasping baseline
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN111696118B (en) Visual loopback detection method based on semantic segmentation and image restoration in dynamic scene
CN111444881A (en) Fake face video detection method and device
CN107330439A (en) A kind of determination method, client and the server of objects in images posture
CN111461212A (en) Compression method for point cloud target detection model
JP7327077B2 (en) Road obstacle detection device, road obstacle detection method, and road obstacle detection program
CN113312973B (en) Gesture recognition key point feature extraction method and system
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
CN114663502A (en) Object posture estimation and image processing method and related equipment
CN111768415A (en) Image instance segmentation method without quantization pooling
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN111428664A (en) Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN110310305A (en) A kind of method for tracking target and device based on BSSD detection and Kalman filtering
CN111898566B (en) Attitude estimation method, attitude estimation device, electronic equipment and storage medium
CN111462184A (en) Online sparse prototype tracking method based on twin neural network linear representation model
Zhang et al. Planeseg: Building a plug-in for boosting planar region segmentation
Lee et al. Background subtraction using the factored 3-way restricted Boltzmann machines
CN112149528A (en) Panorama target detection method, system, medium and equipment
CN117115616A (en) Real-time low-illumination image target detection method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant