CN113838135B - Pose estimation method, system and medium based on LSTM double-flow convolutional neural network - Google Patents


Info

Publication number
CN113838135B
Authority
CN
China
Prior art keywords
depth
flow
color
feature map
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111181525.6A
Other languages
Chinese (zh)
Other versions
CN113838135A (en)
Inventor
罗元
曾勇超
胡章芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111181525.6A
Publication of CN113838135A
Application granted
Publication of CN113838135B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a pose estimation method, system and medium based on an LSTM double-flow convolutional neural network, wherein the method comprises the following steps: S1, preprocessing a color image and a depth image: cascading two adjacent frames of the color image and of the depth image respectively, further preprocessing the depth image with MND (minimum normal + depth) encoding, and finally normalizing the color image and the depth image; S2, inputting the preprocessed color image and depth image into the color stream and depth stream of a double-flow convolutional neural network, respectively, for feature extraction; S3, fusing the RGB feature map output by the color stream and the depth feature map output by the depth stream to generate a new fusion feature map; S4, applying global average pooling to the newly generated fusion feature map; S5, predicting the current pose through training with an LSTM neural network. Results show that the pose estimation model provided by the method achieves higher accuracy and robustness under motion blur and insufficient light.

Description

Pose estimation method, system and medium based on LSTM double-flow convolutional neural network
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a pose estimation method based on an LSTM double-flow convolutional neural network.
Background
Intelligent manufacturing, as proposed by Industry 4.0, is oriented to the full product life cycle and realizes informatized manufacturing under ubiquitous sensing conditions. Intelligent manufacturing technology builds on modern sensing technology, network technology, automation technology and artificial intelligence; through perception, human-machine interaction, decision-making, execution and feedback, it makes the product design process, the manufacturing process, and enterprise management and service intelligent, representing the deep fusion and integration of information technology and manufacturing technology. Indoor mobile robots are one of the representative products that integrate the modern sensing, network and automation technologies promoted by Industry 4.0.
Mobile robots are widely used in fields such as resource exploration and development, medical services, home entertainment, the military and aerospace; for example, Automated Guided Vehicles (AGVs) and cleaning robots have already been deployed in logistics transportation and household cleaning. In intelligent mobile robots, simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM) is the core technology. The navigation process of a mobile robot can be decomposed into three modules: localization, mapping and path planning. Localization determines the pose of the robot in the environment at the current moment; mapping integrates locally continuous observations of the surrounding environment into a globally consistent model; path planning determines the optimal navigation path within the map.
Artificial intelligence techniques, which can now simulate human reasoning, judgment and memory, are widely applied in areas such as face recognition and object classification. Similar to the application of deep learning in face recognition, visual odometry based on the feature-point method also needs to detect, match and screen feature points. Applying deep learning to the visual odometry component of SLAM is therefore feasible; deep-learning-based visual odometry is closer to the human mode of perception and has broad research potential and value. Most existing visual odometry methods go through stages such as feature extraction and matching, motion estimation and local optimization, and are strongly affected by camera parameters, motion blur and insufficient light.
The prior art includes: a target positioning method based on depth-image double-flow convolutional neural network regression learning (patent application number: 201910624713.8, patent publication number: CN 110443849A). In that method, a binocular camera captures two pictures simultaneously, a depth image is obtained through depth restoration in image preprocessing, and the color image is converted into a grayscale image during preprocessing. After preprocessing, the two types of images are input into separate convolutional neural networks for feature extraction, the two sets of features are fused by convolutional feature fusion, and the result is finally fed into a fully connected layer for regression. The present invention instead adopts an RGB-D camera as the sensor, which directly acquires the RGB image and the corresponding depth image without converting the RGB image into a grayscale image. After preprocessing, the RGB image and the depth image are input into a double-flow convolutional neural network to obtain the color features of the RGB image and the depth features of the depth image respectively; the two feature maps are input into a feature fusion unit for concatenation-based feature fusion, and the fused features are finally input into a long short-term memory recurrent neural network (LSTM) for time-sequence modeling to obtain pose information. Compared with 201910624713.8, the method differs in sensor, preprocessing method, convolutional neural network structure, feature fusion method and pose estimation method.
Through the search, the closest prior art is 201910624713.8, a target positioning method based on double-flow convolutional neural network regression learning from depth images, characterized by: S1, at each reference position, a binocular camera collects grayscale images and their corresponding depth images; S2, the grayscale image and the depth image are converted into three-channel images using image preprocessing techniques; S3, a double-flow CNN with shared weight coefficients performs offline regression learning to obtain a distance-based regression model; S4, after preprocessing of the grayscale image and the depth image, the final distance is estimated by the distance-based regression model. It makes some beneficial attempts, but still suffers from poor robustness and large pose estimation errors.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a pose estimation method, system and medium based on an LSTM double-flow convolutional neural network. The technical scheme of the invention is as follows:
a pose estimation method based on an LSTM double-flow convolutional neural network comprises the following steps:
s1, preprocessing a color image and a depth image acquired by an RGB-D camera, respectively cascading two adjacent frames of color images and depth images, preprocessing the depth images by adopting a MND (minimum normal+depth) method, and finally normalizing the color images and the depth images; s2, respectively inputting the preprocessed color image and the preprocessed depth image into a color flow and a depth flow of a double-flow convolutional neural network to perform feature extraction; s3, fusing the color feature map rgb feature map output by the color flow and the depth feature map depth feature map output by the depth flow to generate a new fused feature map fusion feature map; s4, carrying out global average pooling treatment on the newly generated fusion feature map; s5, predicting the current pose by training through the LSTM neural network.
Further, the color image preprocessing specifically consists of cascading adjacent frames of the color image to generate a color image of size 640 × 960. The depth image preprocessing first applies MND encoding to the depth image: the horizontal and vertical components of the surface normal are scaled to n_x and n_y, and the depth d is taken as the third channel of the image, so that the scaled surface normal [n_x, n_y, d] satisfies n_x^2 + n_y^2 + d^2 = 1. Adjacent frames of the depth image are then cascaded to generate a depth image of size 640 × 960.
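For illustration only, the following Python sketch shows one possible realization of this preprocessing; it is a minimal sketch under stated assumptions. The patent does not specify how the surface normal is computed, so the depth-gradient construction below (np.gradient and its sign convention) is hypothetical; only the unit-norm constraint on [n_x, n_y, d] and the cascading of two adjacent frames into a 640 × 960 image come from the text.

```python
# Hypothetical sketch of the S1 preprocessing; the normal computation is an
# assumption, the unit-norm constraint and frame cascading follow the patent.
import numpy as np

def mnd_encode(depth: np.ndarray) -> np.ndarray:
    """Encode an HxW depth map as 3 channels [n_x, n_y, d] of unit norm."""
    gy, gx = np.gradient(depth)                # assumed surface-slope estimate
    v = np.stack([-gx, -gy, depth], axis=-1)   # assumed construction of [n_x, n_y, d]
    norm = np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8
    return v / norm                            # enforces n_x^2 + n_y^2 + d^2 = 1

def cascade(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Stack two adjacent 480x640 frames vertically into one 960x640 image."""
    return np.concatenate([frame_a, frame_b], axis=0)

# Example: two adjacent depth frames become one MND-encoded 960x640x3 input.
d0, d1 = np.random.rand(480, 640), np.random.rand(480, 640)
depth_input = cascade(mnd_encode(d0), mnd_encode(d1))   # shape (960, 640, 3)
```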
Further, in step S2, the preprocessed color image and depth image are input into the color stream and depth stream of the double-flow convolutional neural network, respectively, for feature extraction, specifically: a double-flow convolutional neural network architecture is adopted in which the color stream and the depth stream have identical structures, each composed of 5 convolutional layers that extract features at different levels of the image, with ReLU activation units after the first four layers; the preprocessed color image I_rgb serves as the input of the color stream and the preprocessed depth image I_depth as the input of the depth stream, and a color feature map and a depth structure feature map are obtained through the respective convolution operations.
Further, the double-flow convolutional neural network adopts a parallel structure in which each branch consists of five convolutional layers, and the first four convolutional layers of each branch are followed by a ReLU activation unit, expressed as:

f(x) = max(0, x)    (1)

where x is the input and f(x) is the output after passing through the ReLU unit.
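As a non-authoritative illustration, a parallel two-branch network of this shape can be sketched in PyTorch as follows; the channel widths, kernel sizes and strides are assumptions, since the patent specifies only five convolutional layers per branch with ReLU units after the first four.

```python
# Hypothetical sketch of the double-flow CNN: two identical parallel branches,
# five conv layers each, ReLU f(x) = max(0, x) after the first four only.
import torch
import torch.nn as nn

def make_stream() -> nn.Sequential:
    layers, chs = [], [3, 64, 128, 256, 256, 512]   # channel widths are assumptions
    for i in range(5):
        layers.append(nn.Conv2d(chs[i], chs[i + 1], kernel_size=3, stride=2, padding=1))
        if i < 4:                                   # no activation after conv5
            layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

color_stream, depth_stream = make_stream(), make_stream()

x_rgb = torch.randn(1, 3, 960, 640)     # cascaded color frame pair I_rgb
x_depth = torch.randn(1, 3, 960, 640)   # cascaded MND-encoded pair I_depth
rgb_map, depth_map = color_stream(x_rgb), depth_stream(x_depth)   # conv5 outputs
```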
Further, in step S3, fusing the RGB feature map output by the color stream and the depth feature map output by the depth stream to generate a new fusion feature map specifically includes: the feature maps output by conv5 in the two data-stream networks are combined to form a new fusion feature map, which is passed through batch normalization and a ReLU nonlinear activation unit before global average pooling. The generated fusion feature is expressed as:

X_k = [X_k^rgb, X_k^depth]    (2)

where X_k is the fusion feature map, X_k^rgb is the RGB feature map, and X_k^depth is the depth feature map.
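Below is a minimal sketch of this fusion step, assuming the "combining" of the conv5 outputs is channel-wise concatenation (consistent with the splicing-based fusion described in the abstract); the channel counts follow the hypothetical stream definition above.

```python
# Hypothetical fusion module: concatenate the two conv5 feature maps, then
# batch normalization, ReLU, and global average pooling (steps S3-S4).
import torch
import torch.nn as nn

class Fusion(nn.Module):
    def __init__(self, rgb_ch: int = 512, depth_ch: int = 512):
        super().__init__()
        self.bn = nn.BatchNorm2d(rgb_ch + depth_ch)
        self.relu = nn.ReLU(inplace=True)
        self.gap = nn.AdaptiveAvgPool2d(1)          # global average pooling

    def forward(self, x_rgb: torch.Tensor, x_depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([x_rgb, x_depth], dim=1)      # X_k = [X_k^rgb, X_k^depth]
        x = self.relu(self.bn(x))
        return self.gap(x).flatten(1)               # one fused vector per frame pair

fusion = Fusion()
fused = fusion(rgb_map, depth_map)                  # shape (1, 1024)
```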
Further, the step S5 of predicting the current pose through training with the LSTM neural network specifically includes:

performing time-sequence modeling of the image sequence with an LSTM neural network and predicting the current pose information. The LSTM neural network consists of a forget gate, an input gate and an output gate; through learning, it memorizes information useful for estimating the current pose and forgets information that is useless for this estimate. The forget gate controls the forgetting of useless information from the previous state, with the formula:

f_k = σ(W_f · [h_{k-1}, x_k] + b_f)    (3)

where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate weight, h_{k-1} is the hidden state at the previous moment, x_k is the input at the current moment, and b_f is the bias of the forget gate.

The input gate determines what information to add to the current state; it consists of an input selection layer i_k and a candidate layer C̃_k, with the formulas:

i_k = σ(W_i · [h_{k-1}, x_k] + b_i)    (4)

C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)    (5)

where W_i is the input-gate weight, tanh is the hyperbolic tangent function, W_C is the candidate-layer weight, b_i is the bias of the selection layer, and b_C is the bias of the candidate layer.

The output gate decides what prediction to make, with the formula:

o_k = σ(W_o · [h_{k-1}, x_k] + b_o)    (6)

where W_o is the output-gate weight and b_o is the bias of the output gate.

Finally, a loss function is designed by minimizing the Euclidean distance between the real pose and the estimated pose:

L = (1/N) Σ_{i=1}^{N} ( ||p̂_i − p_i||² + w ||φ̂_i − φ_i||² )    (7)

where N is the number of samples, w is the weight coefficient balancing position and orientation, p̂_i and φ̂_i are the estimated position and orientation, and p_i and φ_i are the actual position and orientation.
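To make step S5 concrete, here is a hedged sketch of an LSTM pose-regression head together with the loss of equation (7); the hidden size, number of layers and the value of w are assumptions, and the 6-D pose layout [position p, orientation φ] is one common convention rather than a detail fixed by the patent.

```python
# Hypothetical LSTM head over the per-pair fused features, regressing a 6-DoF
# pose, plus the weighted Euclidean loss of equation (7).
import torch
import torch.nn as nn

class PoseLSTM(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, 6)        # [p (3), phi (3)]

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(seq)               # seq: (batch, time, feat_dim)
        return self.fc(out)                   # pose estimate at every time step

def pose_loss(pred: torch.Tensor, target: torch.Tensor, w: float = 100.0) -> torch.Tensor:
    """Equation (7): mean of ||p_hat - p||^2 + w * ||phi_hat - phi||^2."""
    p_err = (pred[..., :3] - target[..., :3]).pow(2).sum(-1)
    phi_err = (pred[..., 3:] - target[..., 3:]).pow(2).sum(-1)
    return (p_err + w * phi_err).mean()
```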
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pose estimation method based on the LSTM double-flow convolutional neural network described above.
An LSTM double-flow convolutional neural network pose estimation system based on the above method, comprising:

a preprocessing module: for preprocessing the color image and depth image acquired by an RGB-D camera, cascading two adjacent frames of the color image and of the depth image respectively, preprocessing the depth image with the minimum normal + depth method, and finally normalizing the color image and the depth image;

a feature extraction module: for inputting the preprocessed color image and depth image into the color stream and depth stream of the double-flow convolutional neural network, respectively, for feature extraction;

a fusion module: for fusing the RGB feature map output by the color stream and the depth feature map output by the depth stream to generate a new fusion feature map, and applying global average pooling to the newly generated fusion feature map;

a prediction module: for predicting the current pose through training with the LSTM neural network.
The invention has the advantages and beneficial effects as follows:
aiming at the problems that a visual odometer is sensitive to camera parameters and is greatly influenced by motion blur and insufficient light, the invention provides a convolutional neural network based on LSTM double flow, and the contour features extracted by depth flow supplement the color features extracted by color flow so as to improve the robustness of a pose estimation system in the motion blur and insufficient light environment.
Experiments on the public TUM dataset show that pose estimation is more robust in motion-blurred and poorly lit environments when it fuses the contour features extracted from the depth image. Compared with other pose estimation methods based on convolutional neural networks, the proposed model achieves smaller pose estimation errors and superior performance.
In the pose estimation method based on the LSTM double-flow convolutional neural network, step S2 inputs the preprocessed color image and depth image into the color stream and depth stream of the double-flow convolutional neural network, respectively, for feature extraction. The method provides a new double-flow convolutional neural network structure in which the color stream and the depth stream share the same structure of 5 convolutional layers, with ReLU activation units after the first four layers. By introducing depth features through this double-flow architecture, the method achieves higher accuracy and robustness than other pose regression systems based on convolutional neural networks, and performs particularly well in challenging environments.
According to claims 4-6, the method uses the double-flow convolutional neural network to extract color features and depth features, fuses them, and finally feeds the fused features into a long short-term memory recurrent neural network (LSTM) for time-sequence modeling to estimate the current pose. Pose estimation seeks temporal regularities in the image stream; the long short-term memory recurrent neural network can memorize previous states and find the association between the current moment and past moments, making it well suited to the pose regression problem. Common methods instead use a fully connected layer to predict pose information, which is better suited to object recognition and classification problems.
Drawings
FIG. 1 is a diagram of a pose estimation framework based on an LSTM dual-flow convolutional neural network in accordance with a preferred embodiment of the present invention;
FIG. 2 is a block diagram of an LSTM dual-flow convolutional neural network.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
s1, preprocessing a color image and a depth image, respectively cascading two adjacent frames of color images and depth images, further preprocessing the depth image by MND coding, and finally normalizing the color image and the depth image.
S2, inputting the preprocessed color image and depth image into the color stream and depth stream of the double-flow convolutional neural network, respectively, for feature extraction, then fusing the RGB feature map output by the color stream and the depth feature map output by the depth stream to generate a new fusion feature map. The double-flow convolutional neural network adopts a parallel structure in which each branch consists of five convolutional layers, and the first four convolutional layers of each branch are followed by a ReLU activation unit. The formula is as follows:
f(x)=max(0,x) (1)
where x is the input and f(x) is the output after passing through the ReLU unit.
The generated fusion feature is expressed as:

X_k = [X_k^rgb, X_k^depth]    (2)

where X_k is the fusion feature map, X_k^rgb is the RGB feature map, and X_k^depth is the depth feature map.
S3, predicting the current pose through training with an LSTM neural network. The LSTM neural network consists of a forget gate, an input gate and an output gate; through learning, it memorizes information useful for estimating the current pose and forgets information that is useless for this estimate. The forget gate controls the forgetting of useless information from the previous state, with the formula:

f_k = σ(W_f · [h_{k-1}, x_k] + b_f)    (3)

where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate weight, h_{k-1} is the hidden state at the previous moment, x_k is the input at the current moment, and b_f is the bias of the forget gate.
The input gate determines what information to add to the current state; it consists of an input selection layer i_k and a candidate layer C̃_k, with the formulas:

i_k = σ(W_i · [h_{k-1}, x_k] + b_i)    (4)

C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)    (5)

where W_i is the input-gate weight, tanh is the hyperbolic tangent function, W_C is the candidate-layer weight, b_i is the bias of the selection layer, and b_C is the bias of the candidate layer.
The output gate decides what prediction to make, with the formula:

o_k = σ(W_o · [h_{k-1}, x_k] + b_o)    (6)

where W_o is the output-gate weight and b_o is the bias of the output gate.
Finally, a loss function is designed by minimizing the Euclidean distance between the real pose and the estimated pose:

L = (1/N) Σ_{i=1}^{N} ( ||p̂_i − p_i||² + w ||φ̂_i − φ_i||² )    (7)

where N is the number of samples, w is the weight coefficient balancing position and orientation, p̂_i and φ̂_i are the estimated position and orientation, and p_i and φ_i are the actual position and orientation.
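Tying the sketches above together, one hypothetical training step could look as follows; the Adam optimizer and learning rate are assumptions not stated in the patent, and color_stream, depth_stream, Fusion, PoseLSTM and pose_loss refer to the illustrative definitions given earlier, not to any implementation disclosed by the inventors.

```python
# Illustrative end-to-end training step; assumes the hypothetical modules
# sketched earlier (color_stream, depth_stream, Fusion, PoseLSTM, pose_loss).
import torch

fusion, pose_lstm = Fusion(), PoseLSTM(feat_dim=1024)
params = (list(color_stream.parameters()) + list(depth_stream.parameters())
          + list(fusion.parameters()) + list(pose_lstm.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)       # optimizer choice is an assumption

def train_step(rgb_seq, depth_seq, gt_pose):
    # rgb_seq, depth_seq: (batch, time, 3, 960, 640) cascaded frame pairs;
    # gt_pose: (batch, time, 6) ground-truth [position, orientation].
    t = rgb_seq.shape[1]
    feats = [fusion(color_stream(rgb_seq[:, k]), depth_stream(depth_seq[:, k]))
             for k in range(t)]
    pred = pose_lstm(torch.stack(feats, dim=1))   # (batch, time, 6)
    loss = pose_loss(pred, gt_pose)               # equation (7)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```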
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include both persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims (7)

1. The pose estimation method based on the LSTM double-flow convolutional neural network is characterized by comprising the following steps of:
s1, preprocessing a color image and a depth image acquired by an RGB-D camera, respectively cascading two adjacent frames of color images and depth images, preprocessing the depth images by adopting a MND (minimum normal+depth) method, and finally normalizing the color images and the depth images; s2, respectively inputting the preprocessed color image and the preprocessed depth image into a color flow and a depth flow of a double-flow convolutional neural network to perform feature extraction; s3, fusing the color feature map rgb feature map output by the color flow and the depth feature map depth feature map output by the depth flow to generate a new fused feature map fusion feature map; s4, carrying out global average pooling treatment on the newly generated fusion feature map; s5, predicting the current pose by training through an LSTM neural network;
the color image preprocessing specifically comprises the steps of cascading adjacent frames of color images to generate color images with 640 x 960 size; the depth image preprocessing is specifically that firstly, MND encoding processing is carried out on the depth image, and the width and the height of the depth image are scaled to be n x And n y Taking depth d as the third channel of the image, for scaled surface normal [ n ] x ,n y ,d]Satisfy the following requirementsAdjacent frames of the depth image are then concatenated to generate a 640 x 960 size depth image.
2. The pose estimation method based on the LSTM double-flow convolutional neural network according to claim 1, wherein step S2 inputs the preprocessed color image and depth image into the color stream and depth stream of the double-flow convolutional neural network, respectively, for feature extraction, specifically: a double-flow convolutional neural network architecture is adopted in which the color stream and the depth stream have identical structures, each composed of 5 convolutional layers that extract features at different levels of the image, with ReLU activation units after the first four layers; the preprocessed color image I_rgb serves as the input of the color stream and the preprocessed depth image I_depth as the input of the depth stream, and a color feature map and a depth structure feature map are obtained through the respective convolution operations.
3. The pose estimation method based on the LSTM double-flow convolutional neural network according to any one of claims 1-2, wherein the double-flow convolutional neural network adopts a parallel structure, each branch of which consists of five convolutional layers, the first four convolutional layers of each branch being followed by a ReLU activation unit, expressed as:
f(x)=max(0,x) (1)
where x is the input and f(x) is the output after passing through the ReLU unit.
4. The pose estimation method based on the LSTM double-flow convolutional neural network according to claim 3, wherein step S3 of fusing the RGB feature map output by the color stream and the depth feature map output by the depth stream to generate a new fusion feature map specifically includes: the feature maps output by conv5 in the two data-stream networks are combined to form a new fusion feature map, which is passed through batch normalization and a ReLU nonlinear activation unit before global average pooling; the generated fusion feature is expressed as:

X_k = [X_k^rgb, X_k^depth]    (2)

where X_k is the fusion feature map, X_k^rgb is the RGB feature map, and X_k^depth is the depth feature map.
5. The pose estimation method based on the LSTM double-flow convolutional neural network according to claim 4, wherein step S5 of predicting the current pose through training with the LSTM neural network specifically includes:

performing time-sequence modeling of the image sequence with an LSTM neural network and predicting the current pose information; the LSTM neural network consists of a forget gate, an input gate and an output gate, memorizing through learning the information useful for estimating the current pose and forgetting the information useless for this estimate; the forget gate controls the forgetting of useless information from the previous state, with the formula:

f_k = σ(W_f · [h_{k-1}, x_k] + b_f)    (3)

where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate weight, h_{k-1} is the hidden state at the previous moment, x_k is the input at the current moment, and b_f is the bias of the forget gate;

the input gate determines what information to add to the current state and consists of an input selection layer i_k and a candidate layer C̃_k, with the formulas:

i_k = σ(W_i · [h_{k-1}, x_k] + b_i)    (4)

C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)    (5)

where W_i is the input-gate weight, tanh is the hyperbolic tangent function, W_C is the candidate-layer weight, b_i is the bias of the selection layer, and b_C is the bias of the candidate layer;

the output gate decides what prediction to make, with the formula:

o_k = σ(W_o · [h_{k-1}, x_k] + b_o)    (6)

where W_o is the output-gate weight and b_o is the bias of the output gate;

finally, a loss function is designed by minimizing the Euclidean distance between the real pose and the estimated pose:

L = (1/N) Σ_{i=1}^{N} ( ||p̂_i − p_i||² + w ||φ̂_i − φ_i||² )    (7)

where N is the number of samples, w is the weight coefficient balancing position and orientation, p̂_i and φ̂_i are the estimated position and orientation, and p_i and φ_i are the actual position and orientation.
6. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the pose estimation method based on the LSTM double-flow convolutional neural network according to any one of claims 1-5.
7. An LSTM double-flow convolutional neural network pose estimation system based on the method of any one of claims 1-5, comprising:

a preprocessing module: for preprocessing the color image and depth image acquired by an RGB-D camera, cascading two adjacent frames of the color image and of the depth image respectively, preprocessing the depth image with the minimum normal + depth method, and finally normalizing the color image and the depth image;

a feature extraction module: for inputting the preprocessed color image and depth image into the color stream and depth stream of the double-flow convolutional neural network, respectively, for feature extraction;

a fusion module: for fusing the RGB feature map output by the color stream and the depth feature map output by the depth stream to generate a new fusion feature map, and applying global average pooling to the newly generated fusion feature map;

a prediction module: for predicting the current pose through training with the LSTM neural network;

wherein the color image preprocessing specifically consists of cascading adjacent frames of the color image to generate a color image of size 640 × 960; the depth image preprocessing first applies MND encoding to the depth image: the horizontal and vertical components of the surface normal are scaled to n_x and n_y, and the depth d is taken as the third channel of the image, so that the scaled surface normal [n_x, n_y, d] satisfies n_x^2 + n_y^2 + d^2 = 1; adjacent frames of the depth image are then cascaded to generate a depth image of size 640 × 960.
CN202111181525.6A 2021-10-11 2021-10-11 Pose estimation method, system and medium based on LSTM double-flow convolutional neural network Active CN113838135B (en)

Priority Applications (1)

Application Number: CN202111181525.6A
Priority Date: 2021-10-11
Filing Date: 2021-10-11
Title: Pose estimation method, system and medium based on LSTM double-flow convolutional neural network

Applications Claiming Priority (1)

Application Number: CN202111181525.6A
Priority Date: 2021-10-11
Filing Date: 2021-10-11
Title: Pose estimation method, system and medium based on LSTM double-flow convolutional neural network

Publications (2)

Publication Number: CN113838135A (en), published 2021-12-24
Publication Number: CN113838135B (en), published 2024-03-19

Family

Family ID: 78968495

Family Applications (1)

Application Number: CN202111181525.6A (granted as CN113838135B)
Priority Date: 2021-10-11
Filing Date: 2021-10-11
Title: Pose estimation method, system and medium based on LSTM double-flow convolutional neural network
Status: Active

Country Status (1)

Country Link
CN (1) CN113838135B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615183B (en) * 2022-03-14 2023-09-05 广东技术师范大学 Routing method, device, computer equipment and storage medium based on resource prediction
CN115577755A (en) * 2022-11-28 2023-01-06 中环服(成都)科技有限公司 Robot posture correction method, apparatus, computer device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163909A (en) * 2018-02-12 2019-08-23 北京三星通信技术研究有限公司 For obtaining the method, apparatus and storage medium of equipment pose
CN110473254A (en) * 2019-08-20 2019-11-19 北京邮电大学 A kind of position and orientation estimation method and device based on deep neural network
CN111796681A (en) * 2020-07-07 2020-10-20 重庆邮电大学 Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction
CN111833400A (en) * 2020-06-10 2020-10-27 广东工业大学 Camera position and posture positioning method
CN112819853A (en) * 2021-02-01 2021-05-18 太原理工大学 Semantic prior-based visual odometer method
WO2021098766A1 (en) * 2019-11-20 2021-05-27 北京影谱科技股份有限公司 Orb feature visual odometer learning method and device based on image sequence


Also Published As

Publication number Publication date
CN113838135A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN111127513B (en) Multi-target tracking method
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
Mao et al. Fire recognition based on multi-channel convolutional neural network
CN110781262B (en) Semantic map construction method based on visual SLAM
CN113838135B (en) Pose estimation method, system and medium based on LSTM double-flow convolutional neural network
CN111079780B (en) Training method for space diagram convolution network, electronic equipment and storage medium
Akai et al. Simultaneous pose and reliability estimation using convolutional neural network and Rao–Blackwellized particle filter
CN112489081B (en) Visual target tracking method and device
CN112749726B (en) Training method and device for target detection model, computer equipment and storage medium
CN110443849B (en) Target positioning method for double-current convolution neural network regression learning based on depth image
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN111754546A (en) Target tracking method, system and storage medium based on multi-feature map fusion
CN110969648A (en) 3D target tracking method and system based on point cloud sequence data
CN114091554A (en) Training set processing method and device
CN112258565B (en) Image processing method and device
Sharjeel et al. Real time drone detection by moving camera using COROLA and CNN algorithm
Yu et al. LiDAR-based localization using universal encoding and memory-aware regression
Panda et al. Kernel density estimation and correntropy based background modeling and camera model parameter estimation for underwater video object detection
Alvar et al. Mixture of merged gaussian algorithm using RTDENN
CN117058235A (en) Visual positioning method crossing various indoor scenes
CN111578956A (en) Visual SLAM positioning method based on deep learning
CN113128285A (en) Method and device for processing video
CN112529025A (en) Data processing method and device
Xia et al. Hybrid feature adaptive fusion network for multivariate time series classification with application in AUV fault detection
CN114372999A (en) Object detection method and device, electronic equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant