CN113838135A - Pose estimation method, system and medium based on an LSTM dual-stream convolutional neural network - Google Patents

Pose estimation method, system and medium based on an LSTM dual-stream convolutional neural network

Info

Publication number
CN113838135A
CN113838135A (application CN202111181525.6A)
Authority
CN
China
Prior art keywords
depth
feature map
neural network
image
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111181525.6A
Other languages
Chinese (zh)
Other versions
CN113838135B (en)
Inventor
罗元
曾勇超
胡章芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202111181525.6A
Publication of CN113838135A
Application granted
Publication of CN113838135B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention seeks to protect a pose estimation method, system and medium based on an LSTM dual-stream convolutional neural network. The method comprises the following steps: S1, preprocess the color image and the depth image: concatenate the two adjacent frames of the color image and of the depth image respectively, further preprocess the depth image with MND encoding, and finally normalize the color image and the depth image; S2, input the preprocessed color image and depth image into the color stream and the depth stream of a dual-stream convolutional neural network, respectively, for feature extraction; S3, fuse the RGB feature map output by the color stream with the depth feature map output by the depth stream to generate a new fused feature map; S4, perform global mean pooling on the newly generated fused feature map; and S5, predict the current pose with a trained LSTM neural network. The results show that the proposed pose estimation model has higher accuracy and robustness under motion blur and insufficient light.

Description

Pose estimation method, system and medium based on an LSTM dual-stream convolutional neural network
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a pose estimation method based on an LSTM dual-stream convolutional neural network.
Background
The intelligent manufacturing proposed by Industry 4.0 is oriented to the whole life cycle of a product and realizes information-based manufacturing under ubiquitous sensing. Intelligent manufacturing technology builds on modern sensing technology, network technology, automation technology and artificial intelligence; through perception, human-machine interaction, decision, execution and feedback it makes product design, manufacturing processes and enterprise management and service intelligent, and it represents the deep integration of information technology and manufacturing technology. The indoor mobile robot is one of the representative products proposed by Industry 4.0 that fuses modern sensing, network and automation technology.
Mobile robot technology is widely applied in resource exploration and development, medical services, home entertainment, military and aerospace fields; for example, Automated Guided Vehicles (AGVs) and cleaning robots are used in logistics transportation and household cleaning. In an intelligent mobile robot, Simultaneous Localization and Mapping (SLAM) is the core technology. The navigation process of a mobile robot can be broken down into three modules: localization, mapping and path planning. Localization determines the pose of the robot in the environment at the current moment; mapping integrates local, continuous observations of the surrounding environment into a globally consistent model; and path planning determines the optimal navigation path in the map.
Artificial intelligence techniques that can simulate human reasoning, judgment and memory are widely applied, for example in face recognition and object classification. Similar to the application of deep learning in face recognition, visual odometry based on the feature point method also requires feature point detection, matching and screening. It is therefore feasible to apply deep learning to the visual odometry component of SLAM; a deep-learning-based visual odometer is closer to the human perception mode and has broad research potential and value. Most existing visual odometry methods go through steps such as feature extraction and matching, motion estimation and local optimization, and are strongly affected by camera parameters, motion blur and insufficient light.
The prior art includes: a target positioning method based on dual-stream convolutional neural network regression learning from depth images (patent application No. 201910624713.8, publication No. CN110443849A). That method uses a binocular camera to capture two photographs simultaneously, recovers depth through image preprocessing to obtain a depth image, and converts the color image into a grayscale image during preprocessing. After preprocessing, the two images are input into convolutional neural networks to extract features, the two sets of features are fused by convolution, and the result is fed into a fully connected layer for regression. The present invention instead uses an RGB-D camera as the sensor, which directly provides the RGB image and the corresponding depth image, so the RGB image does not need to be converted to grayscale. The RGB image and the depth image are preprocessed and input into a dual-stream convolutional neural network to obtain color features from the RGB image and depth features from the depth image; the two feature maps are fed into a feature fusion unit for concatenation-based fusion, and finally the fused features are input into a long short-term memory recurrent neural network (LSTM) for temporal modeling to obtain pose information. Compared with 201910624713.8, the present method differs in sensor, preprocessing method, convolutional neural network structure, feature fusion method and pose estimation method.
After retrieval, the closest prior art is: 201910624713.8, a target positioning method based on dual-stream convolutional neural network regression learning from depth images, characterized by: S1, at each reference position, a binocular camera collects a grayscale image and the corresponding depth image; S2, the grayscale image and the depth image are converted into three-channel images using image preprocessing; S3, a dual-stream CNN with shared weights performs offline regression learning to obtain a distance-based regression model; S4, after preprocessing of the grayscale and depth images, the final distance can be estimated by the distance-based regression model. This is a useful attempt, but it still suffers from poor robustness and large estimation errors.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a pose estimation method, system and medium based on an LSTM dual-stream convolutional neural network. The technical scheme of the invention is as follows:
A pose estimation method based on an LSTM dual-stream convolutional neural network comprises the following steps:
S1, preprocess the color image and the depth image acquired by an RGB-D camera: concatenate the two adjacent frames of the color image and of the depth image respectively, preprocess the depth image with minimum normal plus depth (MND) encoding, and finally normalize the color image and the depth image; S2, input the preprocessed color image and depth image into the color stream and the depth stream of a dual-stream convolutional neural network, respectively, for feature extraction; S3, fuse the color (RGB) feature map output by the color stream with the depth feature map output by the depth stream to generate a new fused feature map; S4, perform global mean pooling on the newly generated fused feature map; and S5, predict the current pose with the trained LSTM neural network.
Further, the color image preprocessing specifically comprises concatenating adjacent frames of the color image to generate a color image of size 640 × 960. The depth image preprocessing specifically comprises applying MND encoding to the depth image, scaling to obtain the surface-normal components n_x and n_y and using the depth d as the third channel of the image, so that the scaled surface normal [n_x, n_y, d] satisfies the normalization constraint of the MND encoding (equation given as an image in the original). Adjacent frames of the depth image are then concatenated to generate a depth image of size 640 × 960.
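For illustration only, the sketch below shows one possible form of this preprocessing in Python/NumPy. The exact MND scaling constraint appears only as an equation image in the original, so the surface-normal computation (central differences on the depth map followed by unit normalization) and the normalization of the depth channel are assumptions; stacking two adjacent 640 × 480 frames into a 640 × 960 input and the final normalization follow the text above.

import numpy as np

def mnd_encode(depth):
    """Assumed MND (minimum normal + depth) encoding: two surface-normal
    components plus the depth value as a third channel. The exact scaling in
    the patent is shown only as an equation image, so this is illustrative."""
    dz_dy, dz_dx = np.gradient(depth)                       # depth gradients
    normal = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normal /= np.linalg.norm(normal, axis=-1, keepdims=True) + 1e-8
    d = depth / (depth.max() + 1e-8)                        # assumed depth normalization
    return np.stack([normal[..., 0], normal[..., 1], d], axis=-1).astype(np.float32)

def preprocess_pair(rgb_t, rgb_t1, depth_t, depth_t1):
    """Concatenate two adjacent 640x480 frames into one 640x960 input and normalize."""
    rgb = np.concatenate([rgb_t, rgb_t1], axis=0).astype(np.float32) / 255.0
    dep = np.concatenate([mnd_encode(depth_t), mnd_encode(depth_t1)], axis=0)
    return rgb, dep                                          # both of shape (960, 640, 3)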
Further, step S2 inputs the preprocessed color image and depth image into the color stream and the depth stream of the dual-stream convolutional neural network, respectively, for feature extraction, specifically: a dual-stream convolutional neural network architecture is adopted, in which the color stream and the depth stream have identical structures, each composed of 5 convolutional layers that extract features at different levels of the image, with a ReLU activation unit after each of the first four layers; the preprocessed color image I_rgb serves as the input of the color stream and the preprocessed depth image I_depth as the input of the depth stream, and a color feature map and a depth structure feature map are obtained through the convolution operations, respectively.
Further, the dual-stream convolutional neural network adopts a parallel structure in which each branch consists of five convolutional layers, and the first four convolutional layers of each branch are followed by a ReLU activation unit, given by:
f(x) = max(0, x)   (1)
where x is the input and f(x) is the output of the ReLU unit.
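As a concrete illustration, a PyTorch sketch of one such branch is given below: five convolutional layers with a ReLU unit after each of the first four, as described. The channel widths, kernel sizes and strides are not stated in the text and are assumptions.

import torch.nn as nn

class FeatureStream(nn.Module):
    """One branch (color or depth) of the dual-stream CNN: five conv layers,
    ReLU after the first four only. Layer sizes are assumed for illustration."""
    def __init__(self, in_channels=3):
        super().__init__()
        widths = [64, 128, 256, 256, 512]                   # assumed channel widths
        layers, c_in = [], in_channels
        for i, c_out in enumerate(widths):
            layers.append(nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1))
            if i < 4:                                        # no ReLU after conv5
                layers.append(nn.ReLU(inplace=True))
            c_in = c_out
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        # x: (B, 3, 960, 640) stacked adjacent-frame input; returns the conv5 feature map.
        return self.conv(x)

The color stream and the depth stream would then simply be two separate instances of this module, since the text states that the two streams have identical structures.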
Further, in step S3 the color (RGB) feature map output by the color stream and the depth feature map output by the depth stream are fused to generate a new fused feature map, specifically: the feature maps output by conv5 of the two stream networks are combined to form a new fused feature map, which is passed through batch normalization and a ReLU nonlinear activation unit before global mean pooling; the generated fused feature is expressed as:
X_k = [X_k^{rgb}, X_k^{depth}]   (2)
where X_k is the fused feature map, X_k^{rgb} is the RGB feature map and X_k^{depth} is the depth feature map.
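Reading the fusion as channel-wise concatenation of the two conv5 feature maps (equation (2) is only available as an image, so the concatenation form is an assumption), followed by batch normalization, ReLU and global mean pooling as described, a PyTorch sketch could look as follows.

import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuse the conv5 outputs of the two streams, then BatchNorm -> ReLU ->
    global mean pooling. Channel concatenation is an assumed reading of eq. (2)."""
    def __init__(self, rgb_channels=512, depth_channels=512):
        super().__init__()
        self.bn = nn.BatchNorm2d(rgb_channels + depth_channels)
        self.relu = nn.ReLU(inplace=True)
        self.gap = nn.AdaptiveAvgPool2d(1)                   # global mean pooling

    def forward(self, x_rgb, x_depth):
        x_k = torch.cat([x_rgb, x_depth], dim=1)             # fused feature map X_k
        x_k = self.relu(self.bn(x_k))
        return self.gap(x_k).flatten(1)                      # (B, C) vector fed to the LSTM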
Further, in step S5 the current pose is predicted by the trained LSTM neural network, which specifically includes:
performing temporal modeling of the image sequence with the LSTM neural network and predicting the current pose information; the LSTM neural network consists of a forget gate, an input gate and an output gate, and through learning it memorizes information that is useful for estimating the current pose and forgets information that is not; the forget gate controls how much useless information from the previous state is forgotten, according to the formula:
f_k = σ(W_f · [h_{k-1}, x_k] + b_f)   (3)
where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate parameter, h_{k-1} is the hidden state at the previous time step, x_k is the input at the current time step, and b_f is the bias of the forget gate;
the input gate decides what information to add to the current state and consists of a selection layer i_k and a candidate layer C̃_k, given by:
i_k = σ(W_i · [h_{k-1}, x_k] + b_i)   (4)
C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)   (5)
where W_i is the input-gate parameter, tanh is the hyperbolic tangent function, W_C is the candidate parameter, b_i is the selection-layer bias and b_C is the candidate-layer bias;
the output gate decides what prediction to make, with formula:
o_k = σ(W_o · [h_{k-1}, x_k] + b_o)   (6)
where W_o is the output-gate parameter and b_o is the bias of the output gate;
and finally, the loss function is designed by minimizing the Euclidean distance between the real pose and the estimated pose, the loss function being:
L = (1/N) Σ_{k=1}^{N} ( ||p̂_k - p_k||² + w ||φ̂_k - φ_k||² )   (7)
where N is the number of samples, w is the weight coefficient between position and attitude, (p̂_k, φ̂_k) is the estimated pose and (p_k, φ_k) is the actual pose.
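Equations (3) to (6) are the standard LSTM gate equations, so a cell that writes them out explicitly can serve as a reference; in practice torch.nn.LSTM implements the same computation. The hidden width, the input width of 1024 (matching the two concatenated 512-channel streams in the sketches above) and the 6-DoF pose head are assumptions.

import torch
import torch.nn as nn

class PoseLSTMCell(nn.Module):
    """LSTM cell mirroring equations (3)-(6): forget gate f_k, input gate i_k,
    candidate state C~_k and output gate o_k acting on [h_{k-1}, x_k]."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_f = nn.Linear(input_size + hidden_size, hidden_size)  # forget gate
        self.W_i = nn.Linear(input_size + hidden_size, hidden_size)  # input gate
        self.W_C = nn.Linear(input_size + hidden_size, hidden_size)  # candidate layer
        self.W_o = nn.Linear(input_size + hidden_size, hidden_size)  # output gate

    def forward(self, x_k, h_prev, c_prev):
        z = torch.cat([h_prev, x_k], dim=1)        # [h_{k-1}, x_k]
        f_k = torch.sigmoid(self.W_f(z))           # eq. (3)
        i_k = torch.sigmoid(self.W_i(z))           # eq. (4)
        c_tilde = torch.tanh(self.W_C(z))          # eq. (5)
        c_k = f_k * c_prev + i_k * c_tilde         # cell state update
        o_k = torch.sigmoid(self.W_o(z))           # eq. (6)
        h_k = o_k * torch.tanh(c_k)                # hidden state used for pose regression
        return h_k, c_k

# Assumed usage: regress a 6-DoF pose (translation + rotation) from the hidden state.
hidden_size = 1000
cell = PoseLSTMCell(input_size=1024, hidden_size=hidden_size)
pose_head = nn.Linear(hidden_size, 6)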
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pose estimation method based on an LSTM dual-stream convolutional neural network described above.
A pose estimation system based on the LSTM dual-stream convolutional neural network and the above method comprises:
a preprocessing module: used for preprocessing the color image and the depth image acquired by an RGB-D camera, concatenating the two adjacent frames of the color image and of the depth image respectively, preprocessing the depth image with minimum normal plus depth (MND) encoding, and finally normalizing the color image and the depth image;
a feature extraction module: used for inputting the preprocessed color image and depth image into the color stream and the depth stream of the dual-stream convolutional neural network, respectively, for feature extraction;
a fusion module: used for fusing the color (RGB) feature map output by the color stream with the depth feature map output by the depth stream to generate a new fused feature map, and performing global mean pooling on the newly generated fused feature map;
a prediction module: used for predicting the current pose with the trained LSTM neural network.
The invention has the following advantages and beneficial effects:
Aiming at the problems that visual odometry is sensitive to camera parameters and strongly affected by motion blur and insufficient light, the invention proposes an LSTM-based dual-stream convolutional neural network in which the contour features extracted by the depth stream complement the color features extracted by the color stream, improving the robustness of the pose estimation system in environments with motion blur and insufficient light.
Testing on the public TUM dataset shows that fusing the contour features extracted from the depth image into the pose estimation gives better robustness in motion-blurred and poorly lit environments. Compared with other pose estimation methods based on convolutional neural networks, the model has smaller estimation error and superior performance.
The method proposes a new pose estimation method based on an LSTM dual-stream convolutional neural network in which, as in step S2 of claim 1, the preprocessed color image and depth image are input into the color stream and the depth stream of the dual-stream convolutional neural network, respectively, for feature extraction. The method provides a new dual-stream convolutional neural network structure: the color stream and the depth stream are identical, each composed of 5 convolutional layers with ReLU activation units after the first four. By introducing depth features through the dual-stream architecture, the system achieves higher accuracy and robustness than other pose regression systems based on convolutional neural networks, especially in challenging environments.
The method further proposes, as in claims 4-6, first using the dual-stream convolutional neural network to extract color features and depth features, then fusing them, and finally performing temporal modeling with the fused features as input to a long short-term memory recurrent neural network (LSTM) to estimate the current pose. Pose estimation amounts to finding the temporal regularities in the image stream, and the long short-term memory network can memorize previous states and discover the association between the current and past moments, which suits the pose regression problem. Common methods instead predict pose information with fully connected layers, which are better suited to object recognition and classification. A sketch of the full pipeline is given below.
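For orientation only, the sketch below wires the described stages together end to end, reusing the FeatureStream and FusionHead sketches given earlier and an nn.LSTM in place of the hand-written cell; the sequence handling, layer sizes and 6-DoF output are assumptions, not the patent's specification.

import torch
import torch.nn as nn

class PoseEstimationNet(nn.Module):
    """End-to-end sketch: dual-stream feature extraction -> fusion with global
    mean pooling -> LSTM temporal modeling -> per-step 6-DoF pose (sizes assumed)."""
    def __init__(self, feat_dim=1024, hidden_size=1000):
        super().__init__()
        self.color_stream = FeatureStream(in_channels=3)
        self.depth_stream = FeatureStream(in_channels=3)     # MND-encoded depth input
        self.fusion = FusionHead(512, 512)
        self.lstm = nn.LSTM(feat_dim, hidden_size, num_layers=2, batch_first=True)
        self.pose_head = nn.Linear(hidden_size, 6)

    def forward(self, rgb_seq, depth_seq):
        # rgb_seq, depth_seq: (B, T, 3, 960, 640) sequences of concatenated frame pairs
        feats = [self.fusion(self.color_stream(rgb_seq[:, t]),
                             self.depth_stream(depth_seq[:, t]))
                 for t in range(rgb_seq.shape[1])]
        h, _ = self.lstm(torch.stack(feats, dim=1))          # (B, T, hidden_size)
        return self.pose_head(h)                             # (B, T, 6) pose estimates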
Drawings
FIG. 1 is a pose estimation framework diagram based on the LSTM dual-stream convolutional neural network according to a preferred embodiment of the present invention;
FIG. 2 is a diagram of the LSTM dual-stream convolutional neural network.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
S1, the color image and the depth image are preprocessed: the two adjacent frames of the color image and of the depth image are concatenated respectively, the depth image is further preprocessed with MND encoding, and finally the color image and the depth image are normalized.
S2, the preprocessed color image and depth image are input into the color stream and the depth stream of the dual-stream convolutional neural network, respectively, for feature extraction. The RGB feature map output by the color stream and the depth feature map output by the depth stream are fused to generate a new fused feature map. The dual-stream convolutional neural network adopts a parallel structure in which each branch consists of five convolutional layers, and the first four convolutional layers of each branch are followed by a ReLU activation unit, expressed as:
f(x) = max(0, x)   (1)
where x is the input and f(x) is the output of the ReLU unit.
The resulting fused feature is expressed as:
X_k = [X_k^{rgb}, X_k^{depth}]   (2)
where X_k is the fused feature map, X_k^{rgb} is the RGB feature map and X_k^{depth} is the depth feature map.
S3, the current pose is predicted with the trained LSTM neural network. The LSTM neural network consists of a forget gate, an input gate and an output gate; through learning it memorizes information that is useful for estimating the current pose and forgets information that is not. The forget gate controls how much useless information from the previous state is forgotten, according to the formula:
f_k = σ(W_f · [h_{k-1}, x_k] + b_f)   (3)
where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate parameter, h_{k-1} is the hidden state at the previous time step, x_k is the input at the current time step, and b_f is the bias of the forget gate.
The input gate decides what information to add to the current state and consists of a selection layer i_k and a candidate layer C̃_k, given by:
i_k = σ(W_i · [h_{k-1}, x_k] + b_i)   (4)
C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)   (5)
where W_i is the input-gate parameter, tanh is the hyperbolic tangent function, W_C is the candidate parameter, b_i is the selection-layer bias and b_C is the candidate-layer bias.
The output gate decides what prediction to make:
o_k = σ(W_o · [h_{k-1}, x_k] + b_o)   (6)
where W_o is the output-gate parameter and b_o is the bias of the output gate.
Finally, the loss function is designed by minimizing the Euclidean distance between the real pose and the estimated pose; the loss function is:
L = (1/N) Σ_{k=1}^{N} ( ||p̂_k - p_k||² + w ||φ̂_k - φ_k||² )   (7)
where N is the number of samples, w is the weight coefficient between position and attitude, (p̂_k, φ̂_k) is the estimated pose and (p_k, φ_k) is the actual pose.
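Reading equation (7) as a mean of squared Euclidean distances in which w balances the position and attitude terms (the exact formula appears only as an image, and the split of the 6-DoF vector into three translation and three rotation components is an assumption), a minimal sketch of the loss is:

import torch

def pose_loss(pred, target, w=100.0):
    """Assumed form of eq. (7): squared position error plus w times the squared
    attitude error, averaged over N samples. pred/target: (N, 6) pose vectors,
    columns 0-2 translation and 3-5 rotation (assumed layout); w is illustrative."""
    pos_err = torch.sum((pred[:, :3] - target[:, :3]) ** 2, dim=1)
    att_err = torch.sum((pred[:, 3:] - target[:, 3:]) ** 2, dim=1)
    return (pos_err + w * att_err).mean()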
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
The above examples should be construed as merely illustrative and not limiting the present disclosure in any way. After reading the description of the invention, a person skilled in the art can make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope of the invention defined by the claims.

Claims (8)

1. A pose estimation method based on an LSTM dual-stream convolutional neural network, characterized by comprising the following steps:
S1, preprocessing the color image and the depth image acquired by an RGB-D camera: concatenating the two adjacent frames of the color image and of the depth image respectively, preprocessing the depth image with minimum normal plus depth (MND) encoding, and finally normalizing the color image and the depth image; S2, inputting the preprocessed color image and depth image into the color stream and the depth stream of a dual-stream convolutional neural network, respectively, for feature extraction; S3, fusing the color (RGB) feature map output by the color stream with the depth feature map output by the depth stream to generate a new fused feature map; S4, performing global mean pooling on the newly generated fused feature map; and S5, predicting the current pose with the trained LSTM neural network.
2. The pose estimation method based on the LSTM dual-stream convolutional neural network according to claim 1, wherein the color image preprocessing specifically comprises concatenating adjacent frames of the color image to generate a color image of size 640 × 960; the depth image preprocessing specifically comprises applying MND encoding to the depth image, scaling to obtain the surface-normal components n_x and n_y and using the depth d as the third channel of the image, so that the scaled surface normal [n_x, n_y, d] satisfies the normalization constraint of the MND encoding (equation given as an image in the original); adjacent frames of the depth image are then concatenated to generate a depth image of size 640 × 960.
3. The pose estimation method based on the LSTM dual-stream convolutional neural network according to claim 1, wherein step S2 inputs the preprocessed color image and depth image into the color stream and the depth stream of the dual-stream convolutional neural network, respectively, for feature extraction, specifically: a dual-stream convolutional neural network architecture is adopted, in which the color stream and the depth stream have identical structures, each composed of 5 convolutional layers that extract features at different levels of the image, with a ReLU activation unit after each of the first four layers; the preprocessed color image I_rgb serves as the input of the color stream and the preprocessed depth image I_depth as the input of the depth stream, and a color feature map and a depth structure feature map are obtained through the convolution operations, respectively.
4. The pose estimation method based on the LSTM dual-stream convolutional neural network according to any one of claims 1-3, wherein the dual-stream convolutional neural network adopts a parallel structure in which each branch consists of five convolutional layers, and the first four convolutional layers of each branch are followed by a ReLU activation unit, given by:
f(x) = max(0, x)   (1)
where x is the input and f(x) is the output of the ReLU unit.
5. The pose estimation method based on the LSTM dual-stream convolutional neural network according to claim 4, wherein in step S3 the color (RGB) feature map output by the color stream and the depth feature map output by the depth stream are fused to generate a new fused feature map, specifically: the feature maps output by conv5 of the two stream networks are combined to form a new fused feature map, which is passed through batch normalization and a ReLU nonlinear activation unit before global mean pooling; the generated fused feature is expressed as:
X_k = [X_k^{rgb}, X_k^{depth}]   (2)
where X_k is the fused feature map, X_k^{rgb} is the RGB feature map and X_k^{depth} is the depth feature map.
6. The pose estimation method based on the LSTM dual-stream convolutional neural network according to claim 5, wherein step S5 predicts the current pose with the trained LSTM neural network, which specifically includes:
performing temporal modeling of the image sequence with the LSTM neural network and predicting the current pose information; the LSTM neural network consists of a forget gate, an input gate and an output gate, and through learning it memorizes information that is useful for estimating the current pose and forgets information that is not; the forget gate controls how much useless information from the previous state is forgotten, according to the formula:
f_k = σ(W_f · [h_{k-1}, x_k] + b_f)   (3)
where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate parameter, h_{k-1} is the hidden state at the previous time step, x_k is the input at the current time step, and b_f is the bias of the forget gate;
the input gate decides what information to add to the current state and consists of a selection layer i_k and a candidate layer C̃_k, given by:
i_k = σ(W_i · [h_{k-1}, x_k] + b_i)   (4)
C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)   (5)
where W_i is the input-gate parameter, tanh is the hyperbolic tangent function, W_C is the candidate parameter, b_i is the selection-layer bias and b_C is the candidate-layer bias;
the output gate decides what prediction to make, with formula:
o_k = σ(W_o · [h_{k-1}, x_k] + b_o)   (6)
where W_o is the output-gate parameter and b_o is the bias of the output gate;
and finally, the loss function is designed by minimizing the Euclidean distance between the real pose and the estimated pose, the loss function being:
L = (1/N) Σ_{k=1}^{N} ( ||p̂_k - p_k||² + w ||φ̂_k - φ_k||² )   (7)
where N is the number of samples, w is the weight coefficient between position and attitude, (p̂_k, φ̂_k) is the estimated pose and (p_k, φ_k) is the actual pose.
7. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the LSTM dual-stream convolutional neural network-based pose estimation method of any of claims 1 to 6.
8. A pose estimation system based on the LSTM dual-stream convolutional neural network and the method of claims 1-6, characterized by comprising:
a preprocessing module: used for preprocessing the color image and the depth image acquired by an RGB-D camera, concatenating the two adjacent frames of the color image and of the depth image respectively, preprocessing the depth image with minimum normal plus depth (MND) encoding, and finally normalizing the color image and the depth image;
a feature extraction module: used for inputting the preprocessed color image and depth image into the color stream and the depth stream of the dual-stream convolutional neural network, respectively, for feature extraction;
a fusion module: used for fusing the color (RGB) feature map output by the color stream with the depth feature map output by the depth stream to generate a new fused feature map, and performing global mean pooling on the newly generated fused feature map;
a prediction module: used for predicting the current pose with the trained LSTM neural network.
CN202111181525.6A 2021-10-11 2021-10-11 Pose estimation method, system and medium based on LSTM double-flow convolutional neural network Active CN113838135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111181525.6A CN113838135B (en) 2021-10-11 2021-10-11 Pose estimation method, system and medium based on LSTM double-flow convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111181525.6A CN113838135B (en) 2021-10-11 2021-10-11 Pose estimation method, system and medium based on LSTM double-flow convolutional neural network

Publications (2)

Publication Number Publication Date
CN113838135A true CN113838135A (en) 2021-12-24
CN113838135B CN113838135B (en) 2024-03-19

Family

ID=78968495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111181525.6A Active CN113838135B (en) 2021-10-11 2021-10-11 Pose estimation method, system and medium based on LSTM double-flow convolutional neural network

Country Status (1)

Country Link
CN (1) CN113838135B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615183A (en) * 2022-03-14 2022-06-10 广东技术师范大学 Routing method and device based on resource prediction, computer equipment and storage medium
CN115577755A (en) * 2022-11-28 2023-01-06 中环服(成都)科技有限公司 Robot posture correction method, apparatus, computer device, and storage medium
CN116704026A (en) * 2023-05-24 2023-09-05 国网江苏省电力有限公司南京供电分公司 Positioning method, positioning device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163909A (en) * 2018-02-12 2019-08-23 北京三星通信技术研究有限公司 For obtaining the method, apparatus and storage medium of equipment pose
CN110473254A (en) * 2019-08-20 2019-11-19 北京邮电大学 A kind of position and orientation estimation method and device based on deep neural network
CN111796681A (en) * 2020-07-07 2020-10-20 重庆邮电大学 Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction
CN111833400A (en) * 2020-06-10 2020-10-27 广东工业大学 Camera position and posture positioning method
CN112819853A (en) * 2021-02-01 2021-05-18 太原理工大学 Semantic prior-based visual odometer method
WO2021098766A1 (en) * 2019-11-20 2021-05-27 北京影谱科技股份有限公司 Orb feature visual odometer learning method and device based on image sequence

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163909A (en) * 2018-02-12 2019-08-23 北京三星通信技术研究有限公司 For obtaining the method, apparatus and storage medium of equipment pose
CN110473254A (en) * 2019-08-20 2019-11-19 北京邮电大学 A kind of position and orientation estimation method and device based on deep neural network
WO2021098766A1 (en) * 2019-11-20 2021-05-27 北京影谱科技股份有限公司 Orb feature visual odometer learning method and device based on image sequence
CN111833400A (en) * 2020-06-10 2020-10-27 广东工业大学 Camera position and posture positioning method
CN111796681A (en) * 2020-07-07 2020-10-20 重庆邮电大学 Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction
CN112819853A (en) * 2021-02-01 2021-05-18 太原理工大学 Semantic prior-based visual odometer method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615183A (en) * 2022-03-14 2022-06-10 广东技术师范大学 Routing method and device based on resource prediction, computer equipment and storage medium
CN114615183B (en) * 2022-03-14 2023-09-05 广东技术师范大学 Routing method, device, computer equipment and storage medium based on resource prediction
CN115577755A (en) * 2022-11-28 2023-01-06 中环服(成都)科技有限公司 Robot posture correction method, apparatus, computer device, and storage medium
CN116704026A (en) * 2023-05-24 2023-09-05 国网江苏省电力有限公司南京供电分公司 Positioning method, positioning device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113838135B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
CN111127513B (en) Multi-target tracking method
US11823429B2 (en) Method, system and device for difference automatic calibration in cross modal target detection
CN113838135B (en) Pose estimation method, system and medium based on LSTM double-flow convolutional neural network
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN108764107B (en) Behavior and identity combined identification method and device based on human body skeleton sequence
CN110781262B (en) Semantic map construction method based on visual SLAM
Akai et al. Simultaneous pose and reliability estimation using convolutional neural network and Rao–Blackwellized particle filter
Wozniak et al. Scene recognition for indoor localization of mobile robots using deep CNN
CN112749726B (en) Training method and device for target detection model, computer equipment and storage medium
CN114708435B (en) Obstacle size prediction and uncertainty analysis method based on semantic segmentation
CN110969648A (en) 3D target tracking method and system based on point cloud sequence data
CN111709471A (en) Object detection model training method and object detection method and device
Ait Abdelali et al. An adaptive object tracking using Kalman filter and probability product kernel
Kadim et al. Deep-learning based single object tracker for night surveillance.
Sharjeel et al. Real time drone detection by moving camera using COROLA and CNN algorithm
Hwang et al. Interactions between specific human and omnidirectional mobile robot using deep learning approach: SSD-FN-KCF
Salimpour et al. Self-calibrating anomaly and change detection for autonomous inspection robots
CN110992404A (en) Target tracking method, device and system and storage medium
CN112529025A (en) Data processing method and device
Omidshafiei et al. Hierarchical bayesian noise inference for robust real-time probabilistic object classification
Dong et al. Combination of modified U‐Net and domain adaptation for road detection
Alvar et al. Mixture of merged gaussian algorithm using RTDENN
CN111008992B (en) Target tracking method, device and system and storage medium
Kim et al. Adaptive surveillance algorithms based on the situation analysis
Qayyum et al. Deep convolutional neural network processing of aerial stereo imagery to monitor vulnerable zones near power lines

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant