WO2021218282A1 - Scene depth and camera motion prediction method and apparatus, device, medium, and program - Google Patents
Scene depth and camera motion prediction method and apparatus, device, medium, and program
- Publication number
- WO2021218282A1 (PCT/CN2021/076038, CN2021076038W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image frame
- time
- hidden state
- state information
- prediction
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Definitions
- The present disclosure relates to the field of computer technology, and in particular, but not limited to, a scene depth and camera motion prediction method and apparatus, an electronic device, a computer-readable storage medium, and a computer program.
- The embodiments of the present disclosure propose technical solutions for a scene depth and camera motion prediction method and apparatus, an electronic device, a medium, and a program.
- An embodiment of the present disclosure provides a scene depth prediction method, including: obtaining a target image frame at time t; and performing scene depth prediction on the target image frame through a scene depth prediction network using the first hidden state information at time t-1, to determine the predicted depth map corresponding to the target image frame, wherein the first hidden state information includes feature information related to scene depth, and the scene depth prediction network is obtained based on auxiliary training with the camera motion prediction network.
- Using the first hidden state information at time t-1 to perform scene depth prediction on the target image frame through the scene depth prediction network to determine the predicted depth map corresponding to the target image frame includes: performing feature extraction on the target image frame to determine a first feature map corresponding to the target image frame, where the first feature map is a feature map related to scene depth; determining the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1; and determining the predicted depth map according to the first hidden state information at time t.
- The first hidden state information at time t-1 includes first hidden state information at different scales at time t-1. Performing feature extraction on the target image frame to determine the first feature map corresponding to the target image frame includes: performing multi-scale down-sampling on the target image frame to determine first feature maps at different scales corresponding to the target image frame. Determining the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1 includes: for any scale, determining the first hidden state information at that scale at time t according to the first feature map at that scale and the first hidden state information at that scale at time t-1. Determining the predicted depth map according to the first hidden state information at time t includes: performing feature fusion on the first hidden state information at different scales at time t to determine the predicted depth map.
- The method further includes: acquiring a sample image frame sequence corresponding to time t, where the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; performing camera pose prediction on the sample image frame sequence through the camera motion prediction network using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; performing scene depth prediction on the first sample image frame through the scene depth prediction network to be trained using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; constructing a loss function according to the sample predicted depth map and the sample predicted camera motion; and training the scene depth prediction network to be trained according to the loss function, to obtain the scene depth prediction network.
- Constructing the loss function according to the sample predicted depth map and the sample predicted camera motion includes: determining, according to the sample predicted camera motion, the reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determining a penalty function term according to the distribution continuity of the sample predicted depth map; and constructing the loss function according to the reprojection error term and the penalty function term.
- An embodiment of the present disclosure also provides a camera motion prediction method, including: acquiring an image frame sequence corresponding to time t, where the image frame sequence includes a target image frame at time t and adjacent image frames of the target image frame; and performing camera pose prediction on the image frame sequence through the camera motion prediction network using the second hidden state information at time t-1, to determine the predicted camera motion corresponding to the image frame sequence, where the second hidden state information includes feature information related to camera motion, and the camera motion prediction network is obtained based on auxiliary training with the scene depth prediction network.
- Using the second hidden state information at time t-1 to perform camera pose prediction on the image frame sequence through the camera motion prediction network to determine the predicted camera motion corresponding to the image frame sequence includes: performing feature extraction on the image frame sequence to determine a second feature map corresponding to the image frame sequence, where the second feature map is a feature map related to camera motion; determining the second hidden state information at time t according to the second feature map and the second hidden state information at time t-1; and determining the predicted camera motion according to the second hidden state information at time t.
- the predicted camera motion includes the relative pose between adjacent image frames in the sequence of image frames.
- The method further includes: acquiring a sample image frame sequence corresponding to time t, where the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; performing scene depth prediction on the first sample image frame through the scene depth prediction network using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; performing camera pose prediction on the sample image frame sequence through the camera motion prediction network to be trained using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; constructing a loss function according to the sample predicted depth map and the sample predicted camera motion; and training the camera motion prediction network to be trained according to the loss function, to obtain the camera motion prediction network.
- Constructing the loss function according to the sample predicted depth map and the sample predicted camera motion includes: determining, according to the sample predicted camera motion, the reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determining a penalty function term according to the distribution continuity of the sample predicted depth map; and constructing the loss function according to the reprojection error term and the penalty function term.
- An embodiment of the present disclosure also provides a scene depth prediction device, including: a first acquisition module configured to acquire a target image frame at time t; and a first scene depth prediction module configured to perform scene depth prediction on the target image frame through a scene depth prediction network using the first hidden state information at time t-1, to determine the predicted depth map corresponding to the target image frame, where the first hidden state information includes feature information related to scene depth, and the scene depth prediction network is obtained based on auxiliary training with the camera motion prediction network.
- The first scene depth prediction module includes: a first determining sub-module configured to perform feature extraction on the target image frame and determine the first feature map corresponding to the target image frame, where the first feature map is a feature map related to scene depth; a second determining sub-module configured to determine the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1; and a third determining sub-module configured to determine the predicted depth map according to the first hidden state information at time t.
- The first hidden state information at time t-1 includes first hidden state information at different scales at time t-1. The first determining sub-module is configured to: perform multi-scale down-sampling on the target image frame to determine first feature maps at different scales corresponding to the target image frame. The second determining sub-module is configured to: for any scale, determine the first hidden state information at that scale at time t according to the first feature map at that scale and the first hidden state information at that scale at time t-1. The third determining sub-module is configured to: perform feature fusion on the first hidden state information at different scales at time t to determine the predicted depth map.
- The device further includes a first training module configured to: acquire a sample image frame sequence corresponding to time t, where the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; perform camera pose prediction on the sample image frame sequence through the camera motion prediction network using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; perform scene depth prediction on the first sample image frame through the scene depth prediction network to be trained using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; construct a loss function according to the sample predicted depth map and the sample predicted camera motion; and train the scene depth prediction network to be trained according to the loss function, to obtain the scene depth prediction network.
- The first training module is configured to: determine, according to the sample predicted camera motion, the reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determine a penalty function term according to the distribution continuity of the sample predicted depth map; and construct the loss function according to the reprojection error term and the penalty function term.
- An embodiment of the present disclosure also provides a camera motion prediction device, including: a second acquisition module configured to acquire an image frame sequence corresponding to time t, where the image frame sequence includes a target image frame at time t and adjacent image frames of the target image frame; and a first camera motion prediction module configured to perform camera pose prediction on the image frame sequence through a camera motion prediction network using the second hidden state information at time t-1, to determine the predicted camera motion corresponding to the image frame sequence, where the second hidden state information includes feature information related to camera motion, and the camera motion prediction network is obtained based on auxiliary training with the scene depth prediction network.
- The first camera motion prediction module includes: a sixth determining sub-module configured to perform feature extraction on the image frame sequence and determine a second feature map corresponding to the image frame sequence, where the second feature map is a feature map related to camera motion; a seventh determining sub-module configured to determine the second hidden state information at time t according to the second feature map and the second hidden state information at time t-1; and an eighth determining sub-module configured to determine the predicted camera motion according to the second hidden state information at time t.
- the predicted camera motion includes the relative pose between adjacent image frames in the sequence of image frames.
- The device further includes a second training module configured to: acquire a sample image frame sequence corresponding to time t, where the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; perform scene depth prediction on the first sample image frame through the scene depth prediction network using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; perform camera pose prediction on the sample image frame sequence through the camera motion prediction network to be trained using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; construct a loss function according to the sample predicted depth map and the sample predicted camera motion; and train the camera motion prediction network to be trained according to the loss function, to obtain the camera motion prediction network.
- The second training module is configured to: determine, according to the sample predicted camera motion, the reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determine a penalty function term according to the distribution continuity of the sample predicted depth map; and construct the loss function according to the reprojection error term and the penalty function term.
- An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute any one of the foregoing methods.
- the embodiment of the present disclosure also provides a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, any one of the above methods is implemented.
- The embodiments of the present disclosure also provide a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing any one of the above methods.
- The target image frame corresponding to time t is acquired. Since the scene depths at adjacent times are associated in time series, the first hidden state information related to scene depth at time t-1 is used to perform scene depth prediction on the target image frame through the scene depth prediction network, so that a predicted depth map with higher prediction accuracy corresponding to the target image frame can be obtained.
- The image frame sequence corresponding to time t, including the target image frame at time t and the adjacent image frames of the target image frame, is acquired. Since the camera poses at adjacent times are associated in time series, the second hidden state information related to camera motion at time t-1 is used to perform camera pose prediction on the image frame sequence through the camera motion prediction network, so that predicted camera motion with higher prediction accuracy can be obtained.
- FIG. 1 is a flowchart of a scene depth prediction method according to an embodiment of the disclosure.
- FIG. 2 is a block diagram of a scene depth prediction network according to an embodiment of the disclosure.
- FIG. 3 is a block diagram of unsupervised network training according to an embodiment of the disclosure.
- FIG. 4 is a flowchart of a camera motion prediction method according to an embodiment of the disclosure.
- FIG. 5 is a schematic structural diagram of a scene depth prediction apparatus according to an embodiment of the disclosure.
- FIG. 6 is a schematic structural diagram of a camera motion prediction device according to an embodiment of the disclosure.
- FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
- FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
- Fig. 1 shows a flowchart of a scene depth prediction method according to an embodiment of the present disclosure.
- The scene depth prediction method shown in Figure 1 can be executed by a terminal device or other processing device, where the terminal device can be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. The other processing device can be a server or a cloud server.
- the scene depth prediction method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in Figure 1, the method may include:
- In step S11, the target image frame at time t is acquired. In step S12, scene depth prediction is performed on the target image frame through the scene depth prediction network using the first hidden state information at time t-1, to determine the predicted depth map corresponding to the target image frame, where the first hidden state information includes feature information related to scene depth, and the scene depth prediction network is obtained based on auxiliary training with the camera motion prediction network.
- The target image frame at time t is acquired. Since the scene depths at adjacent times are associated in time series, the first hidden state information related to scene depth at time t-1 is used to perform scene depth prediction on the target image frame through the scene depth prediction network, so that a predicted depth map with higher prediction accuracy corresponding to the target image frame can be obtained.
- Using the first hidden state information at time t-1 to perform scene depth prediction on the target image frame through the scene depth prediction network to determine the predicted depth map corresponding to the target image frame may include: performing feature extraction on the target image frame to determine the first feature map corresponding to the target image frame, where the first feature map is a feature map related to scene depth; determining the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1; and determining the predicted depth map according to the first hidden state information at time t.
- The scene depth prediction network can determine the first hidden state information related to scene depth at the current moment (for example, time t) by using the first feature map related to scene depth corresponding to the target image frame at the current moment and the first hidden state information related to scene depth at the previous moment (for example, time t-1), and then perform scene depth prediction on the target image frame based on the first hidden state information at the current moment, so that a predicted depth map with higher prediction accuracy corresponding to the target image frame at the current moment can be obtained.
- When the scene depth prediction network is used to predict the predicted depth map corresponding to each image frame in an image frame sequence (including the image frames from time 1 to time t), a preset initial value of the first hidden state information related to scene depth is set in the initialization phase of the scene depth prediction network. Based on the preset initial value and the first feature map related to scene depth corresponding to the image frame at time 1, the first hidden state at time 1 is determined, and scene depth prediction is then performed on the image frame at time 1 based on the first hidden state at time 1 to obtain the predicted depth map corresponding to the image frame at time 1. Based on the first hidden state at time 1 and the first feature map related to scene depth corresponding to the image frame at time 2, the first hidden state at time 2 is determined, and scene depth prediction is performed on the image frame at time 2 based on the first hidden state at time 2 to obtain the predicted depth map corresponding to the image frame at time 2. Based on the first hidden state at time 2 and the first feature map related to scene depth corresponding to the image frame at time 3, the first hidden state at time 3 is determined, and scene depth prediction is performed on the image frame at time 3 based on the first hidden state at time 3 to obtain the predicted depth map corresponding to the image frame at time 3. By analogy, the predicted depth map corresponding to each image frame in the image frame sequence is finally obtained, as sketched below.
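- The following is a minimal Python/PyTorch-style sketch of this recursion. The names `depth_encoder`, `conv_gru`, and `depth_decoder` are hypothetical stand-ins for the encoder, ConvGRU, and decoder of the disclosure, not the patent's exact implementation:

```python
def predict_depth_sequence(depth_encoder, conv_gru, depth_decoder, frames):
    """Recursively predict a depth map per frame, reusing the hidden state.

    `frames` holds the image frames from time 1 to time t; the hidden state
    starts from a preset initial value (None lets the cell use its default).
    """
    hidden = None
    depth_maps = []
    for frame in frames:
        feat = depth_encoder(frame)        # first feature map (scene-depth related)
        hidden = conv_gru(feat, hidden)    # fuse with the previous moment's hidden state
        depth_maps.append(depth_decoder(hidden))
    return depth_maps
```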
- The first hidden state information at time t-1 includes first hidden state information at different scales at time t-1. Performing feature extraction on the target image frame to determine the first feature map corresponding to the target image frame may include: performing multi-scale down-sampling on the target image frame to determine first feature maps at different scales corresponding to the target image frame. Determining the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1 may include: for any scale, determining the first hidden state information at that scale at time t according to the first feature map at that scale and the first hidden state information at that scale at time t-1. Determining the predicted depth map according to the first hidden state information at time t may include: fusing the first hidden state information at different scales at time t to determine the predicted depth map.
- FIG. 2 shows a block diagram of a scene depth prediction network according to an embodiment of the present disclosure. The scene depth prediction network includes a depth encoder 202, a multi-scale convolutional gated recurrent unit (ConvGRU), and a depth decoder 205.
- The target image frame 201 at time t is input to the depth encoder 202 for multi-scale down-sampling to obtain the first feature maps 203 at different scales corresponding to the target image frame: the first feature map at the first scale, the first feature map at the second scale, and the first feature map at the third scale. The scales of the first feature maps correspond one-to-one to the scales of the multi-scale ConvGRU, i.e., the multi-scale ConvGRU includes ConvGRU0 at the first scale, ConvGRU1 at the second scale, and ConvGRU2 at the third scale.
- ConvGRU0 performs feature fusion on the first feature map at the first scale and the first hidden state information at the first scale at time t-1 stored in ConvGRU0, to obtain the first hidden state at the first scale at time t (the multi-scale hidden state 204 includes the first hidden states at the three scales at time t); ConvGRU0 stores the first hidden state at the first scale at time t and outputs it to the depth decoder. ConvGRU1 performs feature fusion on the first feature map at the second scale and the first hidden state information at the second scale at time t-1 stored in ConvGRU1, to obtain the first hidden state at the second scale at time t; ConvGRU1 stores it and outputs it to the depth decoder. ConvGRU2 performs feature fusion on the first feature map at the third scale and the first hidden state information at the third scale at time t-1 stored in ConvGRU2, to obtain the first hidden state at the third scale at time t; ConvGRU2 stores it and outputs it to the depth decoder.
- The depth decoder 205 separately restores the scales of the first hidden states at the first, second, and third scales at time t to the scale of the target image frame 201 (hereinafter referred to as the target scale), obtaining three first hidden states at the target scale at time t. Since the first hidden state information includes feature information related to scene depth, it exists in the form of feature maps inside the scene depth prediction network; therefore, feature-map fusion is performed on the three first hidden states at the target scale at time t, thereby obtaining the predicted depth map D_t corresponding to the target image frame at time t. A sketch of one multi-scale time step follows.
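- A sketch of one time step of the multi-scale scheme of FIG. 2, assuming an encoder that returns three feature maps at decreasing scales and one ConvGRU cell per scale; all module names are assumptions, and PyTorch is used for illustration:

```python
import torch
import torch.nn.functional as F

def multi_scale_depth_step(encoder, gru_cells, decoder, frame, hiddens):
    """One time step: per-scale fusion, upsampling to the target scale, fusion.

    gru_cells: one ConvGRU per scale (ConvGRU0..ConvGRU2); hiddens: the
    per-scale first hidden states from time t-1 (or None at initialization).
    """
    feats = encoder(frame)  # multi-scale down-sampling -> three first feature maps
    if hiddens is None:
        hiddens = [None] * len(feats)
    new_hiddens = [cell(f, h) for cell, f, h in zip(gru_cells, feats, hiddens)]
    target_size = frame.shape[-2:]  # restore each hidden state to the target scale
    upsampled = [F.interpolate(h, size=target_size, mode="bilinear",
                               align_corners=False) for h in new_hiddens]
    depth = decoder(torch.cat(upsampled, dim=1))  # feature-map fusion -> D_t
    return depth, new_hiddens
```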
- The scene depth prediction method may further include: acquiring a sample image frame sequence corresponding to time t, where the sample image frame sequence includes the first sample image frame at time t and adjacent sample image frames of the first sample image frame; performing camera pose prediction on the sample image frame sequence through the camera motion prediction network using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; performing scene depth prediction on the first sample image frame through the scene depth prediction network to be trained using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; constructing a loss function according to the sample predicted depth map and the sample predicted camera motion; and training the scene depth prediction network to be trained according to the loss function, to obtain the scene depth prediction network.
- The scene depth prediction network is obtained based on auxiliary training with the camera motion prediction network, or the scene depth prediction network and the camera motion prediction network are jointly trained. A sliding-window data fusion mechanism is introduced to extract and memorize the hidden state information related to scene depth and camera motion at the target moment (time t) within the sliding-window sequence, and this hidden state information is used to perform unsupervised network training on the scene depth prediction network and/or the camera motion prediction network.
- A training set may be created in advance, the training set including sample image frame sequences continuously collected in time order, and the scene depth prediction network to be trained is then trained based on this training set.
- Fig. 3 shows a block diagram of unsupervised network training according to an embodiment of the present disclosure.
- The target time is time t, and the sample image frame sequence 301 corresponding to the target time includes: the first sample image frame I_t at time t, the adjacent sample image frame I_{t-1} at time t-1, and the adjacent sample image frame I_{t+1} at time t+1.
- the number of adjacent sample image frames of the first sample image frame in the sequence of sample image frames may be determined according to actual conditions, which is not specifically limited in the present disclosure.
- Figure 3 shows that the scene depth prediction network to be trained uses a single-scale feature fusion mechanism.
- the scene depth prediction network to be trained can adopt the single-scale feature fusion mechanism shown in FIG. 3, or the multi-scale feature fusion mechanism shown in FIG. 2, which is not specifically limited in the present disclosure.
- the scene depth prediction network to be trained includes a depth encoder 202, a ConvGRU, and a depth decoder 205.
- The first sample image frame I_t at time t is input to the depth encoder 202 for feature extraction to obtain the first feature map corresponding to the first sample image frame I_t. The first feature map is input to the ConvGRU, where feature fusion is performed on the first feature map and the first hidden state information at time t-1 stored in the ConvGRU to obtain the first hidden state at time t. The ConvGRU stores the first hidden state at time t and outputs it to the depth decoder 205 to obtain the sample predicted depth map D_t corresponding to the first sample image frame at time t.
- the camera motion prediction network includes a pose encoder 302, a ConvGRU, and a pose decoder 303.
- The sample image frame sequence is input to the pose encoder 302 for feature extraction to obtain the second feature map corresponding to the sample image frame sequence. The second feature map is input to the ConvGRU, where feature fusion is performed on the second feature map and the second hidden state information at time t-1 stored in the ConvGRU to obtain the second hidden state at time t. The ConvGRU stores the second hidden state at time t and outputs it to the pose decoder 303 to obtain the sample predicted camera motion [T_{t-1→t}, T_{t→t+1}]. According to the sample predicted camera motion [T_{t-1→t}, T_{t→t+1}], the reprojection error term L_reproj of the adjacent sample image frames I_{t-1} and I_{t+1} in the sample image frame sequence relative to the first sample image frame I_t is determined; the penalty function term L_smooth is determined according to the distribution continuity of the sample predicted depth map D_t. A hedged sketch of the reprojection term is given below.
- The loss function L(I_t, I_{t-1}, I_{t+1}, D_t, T_{t-1→t}, T_{t→t+1}) is constructed by the following formula (1), combining the reprojection error term and the penalty function term:
- L(I_t, I_{t-1}, I_{t+1}, D_t, T_{t-1→t}, T_{t→t+1}) = L_reproj + λ_smooth · L_smooth    (1)
- where λ_smooth is a weight coefficient; the value of λ_smooth can be determined according to actual conditions, which is not specifically limited in the present disclosure.
- The process of determining the penalty function term L_smooth is as follows: the gradient value of each pixel in the first sample image frame I_t is determined, and the gradient values can reflect the distribution continuity (also referred to as smoothness) of the first sample image frame I_t. According to the gradient value of each pixel, the edge region (the region constituted by pixels with a gradient value greater than or equal to a threshold) and the non-edge region (the region constituted by pixels with a gradient value less than the threshold) in the first sample image frame I_t are determined, and thereby the edge region and the non-edge region in the sample predicted depth map D_t corresponding to the first sample image frame I_t can be determined. The gradient value of each pixel in the sample predicted depth map D_t is then determined, and the penalty function term penalizes the depth gradients so as to ensure continuity of the distribution in the non-edge region of D_t while permitting discontinuity of the distribution in the edge region. A sketch of such an edge-aware penalty, together with formula (1), is shown below.
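- The patent's exact penalty expression is not reproduced here; the sketch below uses the common edge-aware smoothness formulation, which matches the described behavior (smooth in non-edge regions, discontinuity permitted at edges), together with the weighted sum of formula (1). `lambda_smooth = 1e-3` is only an illustrative value:

```python
import torch

def smoothness_penalty(D_t, I_t):
    """Edge-aware penalty: depth gradients are penalized, with the penalty
    attenuated where image gradients are large (edge regions).
    D_t: (1, H, W); I_t: (3, H, W)."""
    dDx = (D_t[:, :, 1:] - D_t[:, :, :-1]).abs()
    dDy = (D_t[:, 1:, :] - D_t[:, :-1, :]).abs()
    dIx = (I_t[:, :, 1:] - I_t[:, :, :-1]).abs().mean(0, keepdim=True)
    dIy = (I_t[:, 1:, :] - I_t[:, :-1, :]).abs().mean(0, keepdim=True)
    return (dDx * torch.exp(-dIx)).mean() + (dDy * torch.exp(-dIy)).mean()

def total_loss(l_reproj, l_smooth, lambda_smooth=1e-3):
    """Formula (1): L = L_reproj + lambda_smooth * L_smooth."""
    return l_reproj + lambda_smooth * l_smooth
```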
- The loss function, constructed according to the reprojection error term determined by the predicted camera motion obtained by the camera motion prediction network and the penalty function term determined by the predicted depth map obtained by the scene depth prediction network, is comprehensively used to train the scene depth prediction network to be trained. The trained scene depth prediction network can improve the prediction accuracy of scene depth prediction.
- The camera motion prediction network in FIG. 3 may be a camera motion prediction network to be trained. In this case, the camera motion prediction network to be trained can be trained together with the scene depth prediction network to be trained, realizing joint training of the two networks and obtaining the trained scene depth prediction network and camera motion prediction network.
- The loss function, constructed according to the reprojection error term determined by the predicted camera motion obtained by the camera motion prediction network and the penalty function term determined by the predicted depth map obtained by the scene depth prediction network, is comprehensively used to jointly train the scene depth prediction network and the camera motion prediction network; training the two networks in this way can improve the prediction accuracy of both scene depth prediction and camera motion prediction. One possible training step is sketched below.
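- One possible joint training step, building on the `reprojection_error`, `smoothness_penalty`, and `total_loss` sketches above; `depth_net` and `pose_net` are hypothetical wrappers that keep their ConvGRU hidden states internally, and the warping directions follow the conventions assumed earlier:

```python
import torch

def joint_training_step(depth_net, pose_net, optimizer, I_prev, I_t, I_next, K):
    """Unsupervised step on the sample sequence [I_{t-1}, I_t, I_{t+1}]."""
    D_t = depth_net(I_t)                                        # sample predicted depth map
    T_prev_to_t, T_t_to_next = pose_net([I_prev, I_t, I_next])  # sample predicted camera motion
    l_reproj = (reprojection_error(I_t, I_prev, D_t, torch.linalg.inv(T_prev_to_t), K)
                + reprojection_error(I_t, I_next, D_t, T_t_to_next, K))
    loss = total_loss(l_reproj, smoothness_penalty(D_t, I_t))
    optimizer.zero_grad()
    loss.backward()        # gradients flow into both networks -> joint training
    optimizer.step()
    return loss.item()
```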
- The depth encoder and the pose encoder may reuse the ResNet18 structure, the ResNet54 structure, or other structures, which are not specifically limited in the present disclosure.
- the depth decoder and the pose decoder may adopt the Unet network structure, and may also adopt other decoder network structures, which are not specifically limited in the present disclosure.
- The ConvGRU includes convolution operations, and the activation function in the ConvGRU is an ELU activation function. Compared with a GRU, which can only process one-dimensional data, the linear operations in the ConvGRU are replaced by convolution operations and the tanh activation function is replaced by the ELU activation function, so that the ConvGRU can perform convolution processing on the image frames corresponding to different moments in time sequence and obtain the first hidden state and/or the second hidden state corresponding to different moments.
- A convolutional long short-term memory (ConvLSTM) can also be used, and other structures that can realize sliding-window data fusion can also be used; the present disclosure does not specifically limit this. A minimal ConvGRU cell consistent with the above description is sketched below.
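- A minimal ConvGRU cell consistent with this description (linear operations replaced by convolutions, tanh replaced by ELU); the gate layout is the standard GRU one and is an assumption, not the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGRUCell(nn.Module):
    """GRU cell whose linear maps are 2D convolutions and whose candidate
    activation is ELU instead of tanh."""
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, kernel, padding=pad)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:  # preset initial value of the hidden state
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        n = F.elu(self.cand(torch.cat([x, r * h], 1)))  # ELU replaces tanh
        return (1 - z) * h + z * n
```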
- Fig. 4 shows a flowchart of a camera motion prediction method according to an embodiment of the present disclosure.
- The camera motion prediction method shown in FIG. 4 can be executed by a terminal device or other processing device, where the terminal device can be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. The other processing device can be a server or a cloud server.
- the camera motion prediction method can be implemented by a processor calling computer-readable instructions stored in a memory. As shown in Figure 4, the method may include:
- In step S41, an image frame sequence corresponding to time t is acquired, where the image frame sequence includes the target image frame at time t and adjacent image frames of the target image frame.
- In step S42, camera pose prediction is performed on the image frame sequence through the camera motion prediction network using the second hidden state information at time t-1, to determine the predicted camera motion corresponding to the image frame sequence, where the second hidden state information includes feature information related to camera motion, and the camera motion prediction network is obtained based on auxiliary training with the scene depth prediction network.
- The image frame sequence including the target image frame at time t and the adjacent image frames of the target image frame is acquired. Since the camera motions at adjacent times are associated in time series, the second hidden state information related to camera motion at time t-1 is used to perform camera pose prediction on the image frame sequence through the camera motion prediction network, so that predicted camera motion with higher prediction accuracy corresponding to the image frame sequence can be obtained.
- Using the second hidden state information at time t-1 to perform camera pose prediction on the image frame sequence through the camera motion prediction network to determine the predicted camera motion corresponding to the image frame sequence may include: performing feature extraction on the image frame sequence to determine the second feature map corresponding to the image frame sequence, where the second feature map is a feature map related to camera motion; determining the second hidden state information at time t according to the second feature map and the second hidden state information at time t-1; and determining the predicted camera motion according to the second hidden state information at time t.
- The camera motion prediction network can determine the second hidden state information related to camera motion at time t by using the second feature map related to camera motion corresponding to the image frame sequence at time t and the second hidden state information related to camera motion at time t-1, and then perform camera motion prediction on the image frame sequence at time t based on the second hidden state information at time t, so that predicted camera motion with higher prediction accuracy corresponding to the image frame sequence at time t can be obtained.
- The predicted camera motion may include the relative pose between adjacent image frames in the image frame sequence. The relative pose is a six-dimensional parameter, including three-dimensional rotation information and three-dimensional translation information; a sketch converting such a pose into a rigid transform follows.
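- A sketch converting the six-dimensional relative pose into a 4x4 rigid transform; the axis-angle (Rodrigues) convention for the three rotation parameters is an assumption, since the disclosure only states that the pose contains 3D rotation and 3D translation information:

```python
import torch

def pose_vec_to_matrix(vec):
    """vec: (6,) = 3 rotation parameters (axis-angle) + 3 translation parameters."""
    rot, trans = vec[:3], vec[3:]
    theta = rot.norm().clamp(min=1e-8)          # rotation angle
    kx, ky, kz = (rot / theta).tolist()         # unit rotation axis
    K = torch.tensor([[0., -kz,  ky],
                      [ kz,  0., -kx],
                      [-ky,  kx,  0.]])         # skew-symmetric cross-product matrix
    R = torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)  # Rodrigues
    T = torch.eye(4)
    T[:3, :3], T[:3, 3] = R, trans
    return T
```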
- The camera motion prediction network includes a pose encoder, a ConvGRU, and a pose decoder. The image frame sequence is input to the pose encoder for feature extraction to obtain the second feature map corresponding to the image frame sequence. The second feature map is input to the ConvGRU, where feature fusion is performed on the second feature map and the second hidden state information at time t-1 stored in the ConvGRU to obtain the second hidden state at time t. The ConvGRU stores the second hidden state at time t and outputs it to the pose decoder to obtain the predicted camera motion.
- When the camera motion prediction network is used to predict the camera motion corresponding to image frame sequences at different moments, a preset initial value of the second hidden state information related to camera motion is set in the initialization phase. Based on the preset initial value of the second hidden state information and the second feature map related to camera motion corresponding to the image frame sequence at time 1, the second hidden state at time 1 is determined, and camera motion prediction is performed on the image frame sequence at time 1 based on the second hidden state at time 1 to obtain the predicted camera motion corresponding to the image frame sequence at time 1. Based on the second hidden state at time 1 and the second feature map related to camera motion corresponding to the image frame sequence at time 2, the second hidden state at time 2 is determined, and camera motion prediction is performed on the image frame sequence at time 2 based on the second hidden state at time 2 to obtain the predicted camera motion corresponding to the image frame sequence at time 2. Based on the second hidden state at time 2 and the second feature map related to camera motion corresponding to the image frame sequence at time 3, the second hidden state at time 3 is determined, and camera motion prediction is performed on the image frame sequence at time 3 based on the second hidden state at time 3 to obtain the predicted camera motion corresponding to the image frame sequence at time 3. By analogy, the predicted camera motion corresponding to the image frame sequences at different moments is finally obtained.
- The camera motion prediction method may further include: acquiring a sample image frame sequence corresponding to time t, where the sample image frame sequence includes the first sample image frame at time t and adjacent sample image frames of the first sample image frame; performing scene depth prediction on the first sample image frame through the scene depth prediction network using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; performing camera pose prediction on the sample image frame sequence through the camera motion prediction network to be trained using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; constructing a loss function according to the sample predicted depth map and the sample predicted camera motion; and training the camera motion prediction network to be trained according to the loss function, to obtain the camera motion prediction network.
- Constructing the loss function according to the sample predicted depth map and the sample predicted camera motion may include: determining, according to the sample predicted camera motion, the reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determining the penalty function term according to the distribution continuity of the sample predicted depth map; and constructing the loss function according to the reprojection error term and the penalty function term.
- the camera motion prediction network is obtained based on the auxiliary training of the scene depth prediction network, or the scene depth prediction network and the camera motion prediction network are jointly trained.
- The camera motion prediction network to be trained can be trained based on the framework of FIG. 3 described above. In this case, the camera motion prediction network in FIG. 3 is the camera motion prediction network to be trained, and the scene depth prediction network in FIG. 3 can be either the scene depth prediction network to be trained (jointly training the scene depth prediction network to be trained and the camera motion prediction network to be trained) or an already trained scene depth prediction network (the camera motion prediction network to be trained is trained separately); the specific training process is the same as that in FIG. 3 and is not repeated here.
- The loss function, constructed according to the reprojection error term determined by the predicted camera motion obtained by the camera motion prediction network and the penalty function term determined by the predicted depth map obtained by the scene depth prediction network, is comprehensively used to jointly train the scene depth prediction network and the camera motion prediction network; training the two networks in this way can improve the prediction accuracy of both scene depth prediction and camera motion prediction.
- the scene depth prediction network and the camera motion prediction network trained by the network training method shown in FIG. 3 can perform environment depth prediction and three-dimensional scene construction.
- For example, the scene depth prediction network can be applied to navigation scenarios of indoor and outdoor mobile robots such as sweeping robots and lawn mowers: RGB images are obtained through a red-green-blue (RGB) camera, the scene depth prediction network is used to determine the predicted depth map corresponding to the RGB images, and the camera motion prediction network is used to determine the camera motion of the RGB camera, so as to realize distance measurement of obstacles and construction of three-dimensional scenes to complete obstacle avoidance and navigation tasks.
- The present disclosure also provides a scene depth/camera motion prediction apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the scene depth/camera motion prediction methods provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which will not be repeated here.
- Fig. 5 shows a block diagram of a scene depth prediction apparatus according to an embodiment of the present disclosure.
- the scene depth prediction device 50 includes:
- the first obtaining module 51 is configured to obtain the target image frame at time t;
- The first scene depth prediction module 52 is configured to perform scene depth prediction on the target image frame through the scene depth prediction network using the first hidden state information at time t-1, and determine the predicted depth map corresponding to the target image frame, where the first hidden state information includes feature information related to scene depth, and the scene depth prediction network is obtained based on auxiliary training with the camera motion prediction network.
- the first scene depth prediction module 52 includes:
- the first determining submodule is configured to perform feature extraction on the target image frame, and determine a first feature map corresponding to the target image frame, where the first feature map is a feature map related to the scene depth;
- the second determining submodule is configured to determine the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1;
- the third determining submodule is configured to determine the predicted depth map according to the first hidden state information at time t.
- the first hidden state information at time t-1 includes first hidden state information at different scales at time t-1;
- the first determining sub-module is specifically configured to: perform multi-scale down-sampling on the target image frame, and determine the first feature maps at different scales corresponding to the target image frame;
- the second determining sub-module is specifically configured to: for any scale, determine the first hidden state information at that scale at time t according to the first feature map at that scale and the first hidden state information at that scale at time t-1;
- the third determining submodule is specifically configured to perform feature fusion of the first hidden state information at different scales at time t to determine the predicted depth map.
- The scene depth prediction device 50 further includes a first training module configured to: acquire a sample image frame sequence corresponding to time t, where the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; perform camera pose prediction on the sample image frame sequence through the camera motion prediction network using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; perform scene depth prediction on the first sample image frame through the scene depth prediction network to be trained using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; construct a loss function according to the sample predicted depth map and the sample predicted camera motion; and train the scene depth prediction network to be trained according to the loss function, to obtain the scene depth prediction network.
- The first training module is configured to: determine, according to the sample predicted camera motion, the reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determine the penalty function term according to the distribution continuity of the sample predicted depth map; and construct the loss function according to the reprojection error term and the penalty function term.
- Fig. 6 shows a block diagram of a camera motion prediction device according to an embodiment of the present disclosure.
- the camera motion prediction device 60 includes:
- the second acquisition module 61 is configured to acquire an image frame sequence corresponding to time t, where the image frame sequence includes a target image frame at time t and adjacent image frames of the target image frame;
- the first camera motion prediction module 62 is configured to use the second hidden state information at time t-1 to perform camera pose prediction on the image frame sequence through the camera motion prediction network, and determine the predicted camera motion corresponding to the image frame sequence, where the second The hidden state information includes feature information related to camera motion, and the camera motion prediction network is obtained based on the auxiliary training of the scene depth prediction network.
- the first camera motion prediction module 62 includes:
- the sixth determining sub-module is configured to perform feature extraction on the image frame sequence and determine a second feature map corresponding to the image frame sequence, where the second feature map is a feature map related to camera motion;
- the seventh determining sub-module is configured to determine the second hidden state information at time t according to the second feature map and the second hidden state information at time t-1;
- the eighth determining sub-module is configured to determine the predicted camera motion according to the second hidden state information at time t.
- the predicted camera motion includes the relative pose between adjacent image frames in the image frame sequence.
- The camera motion prediction device 60 further includes a second training module configured to: acquire a sample image frame sequence corresponding to time t, where the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; perform scene depth prediction on the first sample image frame through the scene depth prediction network using the first hidden state information at time t-1, to determine the sample predicted depth map corresponding to the first sample image frame, where the first hidden state information includes feature information related to scene depth; perform camera pose prediction on the sample image frame sequence through the camera motion prediction network to be trained using the second hidden state information at time t-1, to determine the sample predicted camera motion corresponding to the sample image frame sequence, where the second hidden state information includes feature information related to camera motion; construct a loss function according to the sample predicted depth map and the sample predicted camera motion; and train the camera motion prediction network to be trained according to the loss function, to obtain the camera motion prediction network.
- The second training module is configured to: determine, according to the sample predicted camera motion, the reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determine the penalty function term according to the distribution continuity of the sample predicted depth map; and construct the loss function according to the reprojection error term and the penalty function term.
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
- the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
- the computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
- The embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute any one of the foregoing scene depth prediction methods or any one of the foregoing camera motion prediction methods.
- The embodiments of the present disclosure also provide a computer program product, which includes computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for realizing the scene depth and/or camera motion prediction method provided by any of the above embodiments.
- the embodiments of the present disclosure also provide another computer program product for storing computer-readable instructions, which when executed, cause the computer to perform the operations of the scene depth and/or camera motion prediction method provided by any of the foregoing embodiments.
- the electronic device can be provided as a terminal, server or other form of device.
- FIG. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
- the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, and a personal digital assistant.
- the electronic device 800 may include one or more of the following components: a first processing component 802, a first storage 804, a first power supply component 806, a multimedia component 808, an audio component 810, a first input/output (Input Output, I/O) interface 812, sensor component 814, and communication component 816.
- the first processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations.
- the first processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
- the first processing component 802 may include one or more modules to facilitate the interaction between the first processing component 802 and other components.
- the first processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the first processing component 802.
- the first memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method to operate on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
- the first memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
- the first power supply component 806 provides power for various components of the electronic device 800.
- the first power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
- the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
- the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
- the audio component 810 is configured to output and/or input audio signals.
- the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
- the received audio signal may be further stored in the first memory 804 or transmitted via the communication component 816.
- the audio component 810 further includes a speaker for outputting audio signals.
- the first input/output interface 812 provides an interface between the first processing component 802 and a peripheral interface module.
- the peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
- the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
- the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components; for example, the components are the display and the keypad of the electronic device 800. The sensor component 814 can also detect a position change of the electronic device 800 or a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a temperature change of the electronic device 800.
- the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
- the sensor component 814 may also include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications.
- the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
- the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
- the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
- the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
- the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, to execute any one of the foregoing scene depth prediction methods or any one of the foregoing camera motion prediction methods.
- a non-volatile computer-readable storage medium is also provided, such as the first memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete any one of the foregoing scene depth prediction methods or any one of the foregoing camera motion prediction methods.
- FIG. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- the electronic device 900 may be provided as a server.
- the electronic device 900 includes a second processing component 922, which further includes one or more processors, and a memory resource represented by the second memory 932 for storing instructions executable by the second processing component 922, such as application programs.
- the application program stored in the second memory 932 may include one or more modules each corresponding to a set of instructions.
- the second processing component 922 is configured to execute instructions to execute any one of the aforementioned scene depth prediction methods or any one of the aforementioned camera motion prediction methods.
- the electronic device 900 may also include a second power supply component 926 configured to perform power management of the electronic device 900, a wired or wireless network interface 950 configured to connect the electronic device 900 to a network, and a second input/output (I/O) interface 958.
- the electronic device 900 can operate based on an operating system stored in the second memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
- a non-volatile computer-readable storage medium is also provided, such as the second memory 932 including computer program instructions, which can be executed by the second processing component 922 of the electronic device 900 to complete any one of the above-mentioned scene depth prediction methods or any one of the above-mentioned camera motion prediction methods.
- the present disclosure may be a system, method and/or computer program product.
- the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
- the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- Computer-readable storage media include: portable computer disks, hard disks, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the above.
- the computer-readable storage medium used here is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
- the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
- the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to realize various aspects of the present disclosure.
- These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device, thereby producing a machine such that, when executed by the processor of the computer or other programmable data processing device, the instructions produce a device that implements the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
- each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for realizing the specified logical function. The functions marked in the blocks may also occur in an order different from the order marked in the drawings; for example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagram and/or flowchart, and the combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or can be realized by a combination of dedicated hardware and computer instructions.
- the computer program product can be specifically implemented by hardware, software, or a combination thereof.
- the computer program product is specifically embodied as a computer storage medium.
- the computer program product is specifically embodied as a software product, such as a software development kit (SDK) and the like.
- the embodiments of the present disclosure provide a scene depth and camera motion prediction method and device, electronic equipment, medium, and program.
- the method includes: acquiring a target image frame at time t; and performing scene depth prediction on the target image frame through a scene depth prediction network by using the first hidden state information at time t-1 to determine a predicted depth map corresponding to the target image frame, wherein the first hidden state information includes feature information related to scene depth, and the scene depth prediction network is obtained based on the auxiliary training of the camera motion prediction network.
- the embodiments of the present disclosure can obtain a predicted depth map with high prediction accuracy corresponding to a target image frame.
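As a hedged illustration of the recurrent prediction loop summarized above, the sketch below (PyTorch) shows one way a depth network could carry first hidden state information from time t-1 into the prediction at time t. The class name, layer choices, and shapes are assumptions for illustration, not the disclosed architecture.

```python
import torch
import torch.nn as nn

class RecurrentDepthPredictor(nn.Module):
    """Sketch: predicts a depth map for the frame at time t using the
    hidden state carried over from time t-1 (assumed architecture)."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(            # extracts depth-related features
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU())
        self.update = nn.Conv2d(feat_ch * 2, feat_ch, 3, padding=1)  # fuses h_{t-1}
        self.head = nn.Conv2d(feat_ch, 1, 3, padding=1)  # depth regression head

    def forward(self, frame_t, hidden_prev):
        feat = self.encoder(frame_t)                       # first feature map
        if hidden_prev is None:                            # first frame of the video
            hidden_prev = torch.zeros_like(feat)
        hidden_t = torch.tanh(self.update(torch.cat([feat, hidden_prev], dim=1)))
        depth_t = torch.sigmoid(self.head(hidden_t))       # predicted depth map
        return depth_t, hidden_t                           # hidden_t feeds time t+1
```

At inference time, the hidden state returned for the frame at time t is fed back in for the frame at time t+1, which is what lets the prediction exploit temporal continuity across the video stream.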
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Auxiliary Devices For Music (AREA)
- Studio Devices (AREA)
Abstract
Description
Claims (23)
- A scene depth prediction method, comprising: acquiring a target image frame at time t; and performing scene depth prediction on the target image frame through a scene depth prediction network by using first hidden state information at time t-1 to determine a predicted depth map corresponding to the target image frame, wherein the first hidden state information includes feature information related to scene depth, and the scene depth prediction network is obtained based on the auxiliary training of a camera motion prediction network.
- The method according to claim 1, wherein performing scene depth prediction on the target image frame through the scene depth prediction network by using the first hidden state information at time t-1 to determine the predicted depth map corresponding to the target image frame comprises: performing feature extraction on the target image frame to determine a first feature map corresponding to the target image frame, wherein the first feature map is a feature map related to scene depth; determining the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1; and determining the predicted depth map according to the first hidden state information at time t.
- The method according to claim 2, wherein the first hidden state information at time t-1 includes the first hidden state information at different scales at time t-1; performing feature extraction on the target image frame to determine the first feature map corresponding to the target image frame comprises: performing multi-scale downsampling on the target image frame to determine the first feature maps at different scales corresponding to the target image frame; determining the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1 comprises: for any scale, determining the first hidden state information at that scale at time t according to the first feature map at that scale and the first hidden state information at that scale at time t-1; and determining the predicted depth map according to the first hidden state information at time t comprises: performing feature fusion on the first hidden state information at different scales at time t to determine the predicted depth map.
- The method according to any one of claims 1 to 3, wherein the method further comprises: acquiring a sample image frame sequence corresponding to time t, wherein the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; performing camera pose prediction on the sample image frame sequence through a camera motion prediction network by using second hidden state information at time t-1 to determine a sample predicted camera motion corresponding to the sample image frame sequence, wherein the second hidden state information includes feature information related to camera motion; performing scene depth prediction on the first sample image frame through a scene depth prediction network to be trained by using the first hidden state information at time t-1 to determine a sample predicted depth map corresponding to the first sample image frame, wherein the first hidden state information includes feature information related to scene depth; constructing a loss function according to the sample predicted depth map and the sample predicted camera motion; and training the scene depth prediction network to be trained according to the loss function to obtain the scene depth prediction network.
- The method according to claim 4, wherein constructing the loss function according to the sample predicted depth map and the sample predicted camera motion comprises: determining, according to the sample predicted camera motion, a reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determining a penalty function term according to the distribution continuity of the sample predicted depth map; and constructing the loss function according to the reprojection error term and the penalty function term.
- A camera motion prediction method, comprising: acquiring an image frame sequence corresponding to time t, wherein the image frame sequence includes a target image frame at time t and adjacent image frames of the target image frame; and performing camera pose prediction on the image frame sequence through a camera motion prediction network by using second hidden state information at time t-1 to determine a predicted camera motion corresponding to the image frame sequence, wherein the second hidden state information includes feature information related to camera motion, and the camera motion prediction network is obtained based on the auxiliary training of a scene depth prediction network.
- The method according to claim 6, wherein performing camera pose prediction on the image frame sequence through the camera motion prediction network by using the second hidden state information at time t-1 to determine the predicted camera motion corresponding to the image frame sequence comprises: performing feature extraction on the image frame sequence to determine a second feature map corresponding to the image frame sequence, wherein the second feature map is a feature map related to camera motion; determining the second hidden state information at time t according to the second feature map and the second hidden state information at time t-1; and determining the predicted camera motion according to the second hidden state information at time t.
- The method according to claim 6 or 7, wherein the predicted camera motion includes relative poses between adjacent image frames in the image frame sequence.
- The method according to any one of claims 6 to 8, wherein the method further comprises: acquiring a sample image frame sequence corresponding to time t, wherein the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; performing scene depth prediction on the first sample image frame through a scene depth prediction network by using first hidden state information at time t-1 to determine a sample predicted depth map corresponding to the first sample image frame, wherein the first hidden state information includes feature information related to scene depth; performing camera pose prediction on the sample image frame sequence through a camera motion prediction network to be trained by using the second hidden state information at time t-1 to determine a sample predicted camera motion corresponding to the sample image frame sequence, wherein the second hidden state information includes feature information related to camera motion; constructing a loss function according to the sample predicted depth map and the sample predicted camera motion; and training the camera motion prediction network to be trained according to the loss function to obtain the camera motion prediction network.
- The method according to claim 9, wherein constructing the loss function according to the sample predicted depth map and the sample predicted camera motion comprises: determining, according to the sample predicted camera motion, a reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determining a penalty function term according to the distribution continuity of the sample predicted depth map; and constructing the loss function according to the reprojection error term and the penalty function term.
- A scene depth prediction device, comprising: a first acquisition module configured to acquire a target image frame at time t; and a first scene depth prediction module configured to perform scene depth prediction on the target image frame through a scene depth prediction network by using first hidden state information at time t-1 to determine a predicted depth map corresponding to the target image frame, wherein the first hidden state information includes feature information related to scene depth, and the scene depth prediction network is obtained based on the auxiliary training of a camera motion prediction network.
- The device according to claim 11, wherein the first scene depth prediction module comprises: a first determination sub-module configured to perform feature extraction on the target image frame to determine a first feature map corresponding to the target image frame, wherein the first feature map is a feature map related to scene depth; a second determination sub-module configured to determine the first hidden state information at time t according to the first feature map and the first hidden state information at time t-1; and a third determination sub-module configured to determine the predicted depth map according to the first hidden state information at time t.
- The device according to claim 12, wherein the first hidden state information at time t-1 includes the first hidden state information at different scales at time t-1; the first determination sub-module is specifically configured to perform multi-scale downsampling on the target image frame to determine the first feature maps at different scales corresponding to the target image frame; the second determination sub-module is specifically configured to, for any scale, determine the first hidden state information at that scale at time t according to the first feature map at that scale and the first hidden state information at that scale at time t-1; and the third determination sub-module is specifically configured to perform feature fusion on the first hidden state information at different scales at time t to determine the predicted depth map.
- The device according to any one of claims 11 to 13, wherein the device further comprises a first training module configured to: acquire a sample image frame sequence corresponding to time t, wherein the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; perform camera pose prediction on the sample image frame sequence through a camera motion prediction network by using second hidden state information at time t-1 to determine a sample predicted camera motion corresponding to the sample image frame sequence, wherein the second hidden state information includes feature information related to camera motion; perform scene depth prediction on the first sample image frame through a scene depth prediction network to be trained by using the first hidden state information at time t-1 to determine a sample predicted depth map corresponding to the first sample image frame, wherein the first hidden state information includes feature information related to scene depth; construct a loss function according to the sample predicted depth map and the sample predicted camera motion; and train the scene depth prediction network to be trained according to the loss function to obtain the scene depth prediction network.
- The device according to claim 14, wherein the first training module is specifically configured to: determine, according to the sample predicted camera motion, a reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determine a penalty function term according to the distribution continuity of the sample predicted depth map; and construct the loss function according to the reprojection error term and the penalty function term.
- A camera motion prediction device, comprising: a second acquisition module configured to acquire an image frame sequence corresponding to time t, wherein the image frame sequence includes a target image frame at time t and adjacent image frames of the target image frame; and a first camera motion prediction module configured to perform camera pose prediction on the image frame sequence through a camera motion prediction network by using second hidden state information at time t-1 to determine a predicted camera motion corresponding to the image frame sequence, wherein the second hidden state information includes feature information related to camera motion, and the camera motion prediction network is obtained based on the auxiliary training of a scene depth prediction network.
- The device according to claim 16, wherein the first camera motion prediction module comprises: a sixth determination sub-module configured to perform feature extraction on the image frame sequence to determine a second feature map corresponding to the image frame sequence, wherein the second feature map is a feature map related to camera motion; a seventh determination sub-module configured to determine the second hidden state information at time t according to the second feature map and the second hidden state information at time t-1; and an eighth determination sub-module configured to determine the predicted camera motion according to the second hidden state information at time t.
- The device according to claim 16 or 17, wherein the predicted camera motion includes relative poses between adjacent image frames in the image frame sequence.
- The device according to any one of claims 16 to 18, wherein the device further comprises a second training module configured to: acquire a sample image frame sequence corresponding to time t, wherein the sample image frame sequence includes a first sample image frame at time t and adjacent sample image frames of the first sample image frame; perform scene depth prediction on the first sample image frame through a scene depth prediction network by using first hidden state information at time t-1 to determine a sample predicted depth map corresponding to the first sample image frame, wherein the first hidden state information includes feature information related to scene depth; perform camera pose prediction on the sample image frame sequence through a camera motion prediction network to be trained by using the second hidden state information at time t-1 to determine a sample predicted camera motion corresponding to the sample image frame sequence, wherein the second hidden state information includes feature information related to camera motion; construct a loss function according to the sample predicted depth map and the sample predicted camera motion; and train the camera motion prediction network to be trained according to the loss function to obtain the camera motion prediction network.
- The device according to claim 19, wherein the second training module is specifically configured to: determine, according to the sample predicted camera motion, a reprojection error term of the adjacent sample image frames of the first sample image frame in the sample image frame sequence relative to the first sample image frame; determine a penalty function term according to the distribution continuity of the sample predicted depth map; and construct the loss function according to the reprojection error term and the penalty function term.
- An electronic device, comprising: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute the method according to any one of claims 1 to 10.
- A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 10.
- A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the method according to any one of claims 1 to 10.
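For readers who want a concrete picture of the camera motion branch recited in claims 6 to 8 above, the following sketch (PyTorch) maintains motion-related second hidden state information across time and regresses a 6-DoF relative pose between adjacent frames; all names, layer choices, and shapes here are illustrative assumptions, not the claimed network.

```python
import torch
import torch.nn as nn

class RecurrentPosePredictor(nn.Module):
    """Sketch: predicts the relative pose between adjacent frames at time t,
    carrying motion-related hidden state from time t-1 (assumed design)."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(             # second feature map (motion-related)
            nn.Conv2d(6, feat_ch, 7, stride=2, padding=3), nn.ReLU())
        self.update = nn.Conv2d(feat_ch * 2, feat_ch, 3, padding=1)  # fuses h_{t-1}
        self.pose_head = nn.Linear(feat_ch, 6)    # 3 rotation + 3 translation params

    def forward(self, frame_pair, hidden_prev):
        # frame_pair: (B, 6, H, W) concatenation of two adjacent RGB frames.
        feat = self.encoder(frame_pair)
        if hidden_prev is None:                   # first pair of the sequence
            hidden_prev = torch.zeros_like(feat)
        hidden_t = torch.tanh(self.update(torch.cat([feat, hidden_prev], dim=1)))
        pooled = hidden_t.mean(dim=(2, 3))        # global average pooling
        return self.pose_head(pooled), hidden_t   # 6-DoF pose, state for t+1
```

Pairing this with the depth sketch given earlier and the loss sketch above reproduces, in outline and under the stated assumptions, the joint auxiliary-training scheme the claims describe.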
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020217036422A KR102397268B1 (ko) | 2020-04-28 | 2021-02-08 | 시나리오 깊이와 카메라 움직임 예측 방법 및 장치, 기기, 매체와 프로그램 |
JP2021565990A JP7178514B2 (ja) | 2020-04-28 | 2021-02-08 | 場面深度とカメラ運動を予測する方法及び装置、機器、媒体並びにプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010348872.2 | 2020-04-28 | ||
CN202010348872.2A CN111540000B (zh) | 2020-04-28 | 2020-04-28 | 场景深度和相机运动预测方法及装置、电子设备和介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021218282A1 true WO2021218282A1 (zh) | 2021-11-04 |
Family
ID=71977213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/076038 WO2021218282A1 (zh) | 2020-04-28 | 2021-02-08 | 场景深度和相机运动预测方法及装置、设备、介质和程序 |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP7178514B2 (zh) |
KR (1) | KR102397268B1 (zh) |
CN (2) | CN113822918B (zh) |
TW (1) | TWI767596B (zh) |
WO (1) | WO2021218282A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114339402A (zh) * | 2021-12-31 | 2022-04-12 | 北京字节跳动网络技术有限公司 | 视频播放完成率预测方法、装置、介质及电子设备 |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113822918B (zh) * | 2020-04-28 | 2024-07-12 | 深圳市商汤科技有限公司 | 场景深度和相机运动预测方法及装置、电子设备和介质 |
CN112492230B (zh) * | 2020-11-26 | 2023-03-24 | 北京字跳网络技术有限公司 | 视频处理方法、装置、可读介质及电子设备 |
CN112232322B (zh) * | 2020-12-14 | 2024-08-02 | 支付宝(杭州)信息技术有限公司 | 一种基于对象状态预测的图像生成方法及装置 |
CN112767481B (zh) * | 2021-01-21 | 2022-08-16 | 山东大学 | 一种基于视觉边缘特征的高精度定位及建图方法 |
KR102559936B1 (ko) * | 2022-01-28 | 2023-07-27 | 포티투닷 주식회사 | 단안 카메라를 이용하여 깊이 정보를 추정하는 방법 및 장치 |
WO2023155043A1 (zh) * | 2022-02-15 | 2023-08-24 | 中国科学院深圳先进技术研究院 | 一种基于历史信息的场景深度推理方法、装置及电子设备 |
CN114612510B (zh) * | 2022-03-01 | 2024-03-29 | 腾讯科技(深圳)有限公司 | 图像处理方法、装置、设备、存储介质及计算机程序产品 |
CN114998403A (zh) * | 2022-06-13 | 2022-09-02 | 北京百度网讯科技有限公司 | 深度预测方法、装置、电子设备、介质 |
TWI823491B (zh) * | 2022-07-22 | 2023-11-21 | 鴻海精密工業股份有限公司 | 深度估計模型的優化方法、裝置、電子設備及存儲介質 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019099684A1 (en) * | 2017-11-15 | 2019-05-23 | Google Llc | Unsupervised learning of image depth and ego-motion prediction neural networks |
CN110264526A (zh) * | 2019-06-19 | 2019-09-20 | 华东师范大学 | 一种基于深度学习的场景深度和摄像机位置姿势求解方法 |
CN110378250A (zh) * | 2019-06-28 | 2019-10-25 | 深圳先进技术研究院 | 用于场景认知的神经网络的训练方法、装置及终端设备 |
WO2020051270A1 (en) * | 2018-09-05 | 2020-03-12 | Google Llc | Unsupervised depth prediction neural networks |
CN111028282A (zh) * | 2019-11-29 | 2020-04-17 | 浙江省北大信息技术高等研究院 | 一种无监督位姿与深度计算方法及系统 |
CN111540000A (zh) * | 2020-04-28 | 2020-08-14 | 深圳市商汤科技有限公司 | 场景深度和相机运动预测方法及装置、电子设备和介质 |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018052875A1 (en) * | 2016-09-15 | 2018-03-22 | Google Llc | Image depth prediction neural networks |
CN106780543B (zh) * | 2017-01-13 | 2019-06-28 | 深圳市唯特视科技有限公司 | 一种基于卷积神经网络的双框架估计深度和运动方法 |
US10803546B2 (en) * | 2017-11-03 | 2020-10-13 | Baidu Usa Llc | Systems and methods for unsupervised learning of geometry from images using depth-normal consistency |
CN109087349B (zh) * | 2018-07-18 | 2021-01-26 | 亮风台(上海)信息科技有限公司 | 一种单目深度估计方法、装置、终端和存储介质 |
US10860873B2 (en) * | 2018-09-17 | 2020-12-08 | Honda Motor Co., Ltd. | Driver behavior recognition and prediction |
CN109978851B (zh) * | 2019-03-22 | 2021-01-15 | 北京航空航天大学 | 一种红外视频空中弱小运动目标检测跟踪方法 |
CN110060286B (zh) * | 2019-04-25 | 2023-05-23 | 东北大学 | 一种单目深度估计方法 |
CN110136185B (zh) * | 2019-05-23 | 2022-09-06 | 中国科学技术大学 | 一种单目深度估计方法及系统 |
CN110310317A (zh) * | 2019-06-28 | 2019-10-08 | 西北工业大学 | 一种基于深度学习的单目视觉场景深度估计的方法 |
CN110473254A (zh) * | 2019-08-20 | 2019-11-19 | 北京邮电大学 | 一种基于深度神经网络的位姿估计方法及装置 |
CN110503680B (zh) * | 2019-08-29 | 2023-08-18 | 大连海事大学 | 一种基于非监督的卷积神经网络单目场景深度估计方法 |
CN110942484B (zh) * | 2019-11-26 | 2022-07-12 | 福州大学 | 基于遮挡感知和特征金字塔匹配的相机自运动估计方法 |
- 2020-04-28 CN CN202111204857.1A patent/CN113822918B/zh active Active
- 2020-04-28 CN CN202010348872.2A patent/CN111540000B/zh active Active
- 2021-02-08 KR KR1020217036422A patent/KR102397268B1/ko active IP Right Grant
- 2021-02-08 WO PCT/CN2021/076038 patent/WO2021218282A1/zh active Application Filing
- 2021-02-08 JP JP2021565990A patent/JP7178514B2/ja active Active
- 2021-03-04 TW TW110107767A patent/TWI767596B/zh active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019099684A1 (en) * | 2017-11-15 | 2019-05-23 | Google Llc | Unsupervised learning of image depth and ego-motion prediction neural networks |
WO2020051270A1 (en) * | 2018-09-05 | 2020-03-12 | Google Llc | Unsupervised depth prediction neural networks |
CN110264526A (zh) * | 2019-06-19 | 2019-09-20 | 华东师范大学 | 一种基于深度学习的场景深度和摄像机位置姿势求解方法 |
CN110378250A (zh) * | 2019-06-28 | 2019-10-25 | 深圳先进技术研究院 | 用于场景认知的神经网络的训练方法、装置及终端设备 |
CN111028282A (zh) * | 2019-11-29 | 2020-04-17 | 浙江省北大信息技术高等研究院 | 一种无监督位姿与深度计算方法及系统 |
CN111540000A (zh) * | 2020-04-28 | 2020-08-14 | 深圳市商汤科技有限公司 | 场景深度和相机运动预测方法及装置、电子设备和介质 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114339402A (zh) * | 2021-12-31 | 2022-04-12 | 北京字节跳动网络技术有限公司 | 视频播放完成率预测方法、装置、介质及电子设备 |
Also Published As
Publication number | Publication date |
---|---|
KR20210138788A (ko) | 2021-11-19 |
JP7178514B2 (ja) | 2022-11-25 |
CN113822918A (zh) | 2021-12-21 |
KR102397268B1 (ko) | 2022-05-12 |
CN111540000B (zh) | 2021-11-05 |
JP2022528012A (ja) | 2022-06-07 |
CN113822918B (zh) | 2024-07-12 |
TWI767596B (zh) | 2022-06-11 |
CN111540000A (zh) | 2020-08-14 |
TW202141428A (zh) | 2021-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021218282A1 (zh) | 场景深度和相机运动预测方法及装置、设备、介质和程序 | |
TWI706379B (zh) | 圖像處理方法及裝置、電子設備和儲存介質 | |
TWI766286B (zh) | 圖像處理方法及圖像處理裝置、電子設備和電腦可讀儲存媒介 | |
CN111783986B (zh) | 网络训练方法及装置、姿态预测方法及装置 | |
JP7262659B2 (ja) | 目標対象物マッチング方法及び装置、電子機器並びに記憶媒体 | |
TW202107339A (zh) | 位姿確定方法、位姿確定裝置、電子設備和電腦可讀儲存媒介 | |
WO2021035833A1 (zh) | 姿态预测方法、模型训练方法及装置 | |
WO2021082241A1 (zh) | 图像处理方法及装置、电子设备和存储介质 | |
CN111401230B (zh) | 姿态估计方法及装置、电子设备和存储介质 | |
CN111680646B (zh) | 动作检测方法及装置、电子设备和存储介质 | |
WO2022151686A1 (zh) | 场景图像展示方法、装置、设备、存储介质、程序及产品 | |
WO2022193507A1 (zh) | 图像处理方法及装置、设备、存储介质、程序和程序产品 | |
WO2022134475A1 (zh) | 点云地图构建方法及装置、电子设备、存储介质和程序 | |
KR20220123218A (ko) | 타깃 포지셔닝 방법, 장치, 전자 기기, 저장 매체 및 프로그램 | |
CN112184787A (zh) | 图像配准方法及装置、电子设备和存储介质 | |
WO2022247091A1 (zh) | 人群定位方法及装置、电子设备和存储介质 | |
CN113052874B (zh) | 目标跟踪方法及装置、电子设备和存储介质 | |
WO2023155350A1 (zh) | 一种人群定位方法及装置、电子设备和存储介质 | |
JP7261889B2 (ja) | 共有地図に基づいた測位方法及び装置、電子機器並びに記憶媒体 | |
CN112330721A (zh) | 三维坐标的恢复方法及装置、电子设备和存储介质 | |
CN112308878A (zh) | 一种信息处理方法、装置、电子设备和存储介质 | |
CN112967311B (zh) | 三维线图构建方法及装置、电子设备和存储介质 | |
CN112837361B (zh) | 一种深度估计方法及装置、电子设备和存储介质 | |
CN114638817A (zh) | 一种图像分割方法及装置、电子设备和存储介质 | |
CN113297983A (zh) | 人群定位方法及装置、电子设备和存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2021565990 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20217036422 Country of ref document: KR Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21796670 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.02.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21796670 Country of ref document: EP Kind code of ref document: A1 |