CN112511644A - Multi-device pose sharing method and device - Google Patents

Multi-device pose sharing method and device

Info

Publication number
CN112511644A
Authority
CN
China
Prior art keywords
video frame
pose
pose estimation
information
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011475281.8A
Other languages
Chinese (zh)
Inventor
周宏伟 (Zhou Hongwei)
陈利敏 (Chen Limin)
乔秀全 (Qiao Xiuquan)
黄亚坤 (Huang Yakun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing National Speed Skating Hall Management Co ltd
Capinfo Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
Beijing National Speed Skating Hall Management Co ltd
Capinfo Co ltd
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing National Speed Skating Hall Management Co ltd, Capinfo Co ltd, Beijing University of Posts and Telecommunications
Priority to CN202011475281.8A
Publication of CN112511644A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268 Signal distribution or switching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The embodiment of the invention provides a multi-device pose sharing method and apparatus. The method includes: determining a video frame allocation scheme according to the device information of each terminal device; and allocating video frames to each terminal device according to the scheme, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices. By offloading the computation-heavy pose estimation task from the remote cloud to the terminal-device side, the overhead caused by frequent transmission of video frames is reduced, and the computational load on the remote cloud server is reduced as well.

Description

Multi-device pose sharing method and device
Technical Field
The invention relates to the technical field of information processing, in particular to a multi-device pose sharing method and device.
Background
The growing capability and popularity of mobile devices have given rise to a wide variety of multi-device applications, including device pose sharing, multi-screen display, and multi-player interaction, which extend real-time video-stream analysis tasks (e.g., object detection, pose estimation, semantic segmentation) across multiple devices for sharing and interaction. Existing work on multi-device interaction leverages the rich computing resources of the cloud (including edge clouds) to complete these video-stream tasks. Such methods transmit real-time video streams from the terminal devices to the cloud for intensive computation, consuming large amounts of network bandwidth and computing resources. Completing video-stream processing and sharing on the terminal-device side through multi-device cooperation has therefore become a promising direction. However, because of stringent latency requirements and intensive computation, video-stream analysis and sharing for multi-device interactive applications is still in its infancy, and many problems remain to be solved. Analyzing the streaming data of real-time Visual Odometry (VO) to realize multi-device pose sharing and interaction is thus a precondition for a wide range of multi-device applications.
Visual odometry is a fundamental technology in robot localization and autonomous driving: it continuously tracks the camera's ego-motion to produce relative poses between images, and integrates these relative poses into an absolute pose given an initial state. By the number of cameras used, visual odometry can be divided into monocular and binocular variants; because monocular visual odometry obtains the pose with only one camera, it is portable, lightweight, and cheap, and has been widely studied and applied. Conventional visual odometry techniques fall into feature-point methods and direct methods. The feature-point method estimates the camera pose by matching feature vectors between adjacent frames and comprises modules such as feature detection, feature matching, motion estimation, scale estimation, and back-end optimization. It works well in most conditions, but loses feature points in texture-less regions, causing matching failures, and feature extraction and matching are time-consuming. The direct method estimates the camera motion and the spatial position of pixels by minimizing the photometric error; it performs better in texture-less scenes such as corridors or smooth walls, but is only suitable when the motion amplitude is small and the overall brightness of the image changes little. Deep Learning (DL) can extract high-level features from images and provides an alternative approach to visual odometry. Deep-learning-based monocular visual odometry does not depend on any module of the conventional pipeline and obtains the camera pose end to end, without tuning system parameters.
Existing deep-learning visual odometry models mainly combine a convolutional neural network with a recurrent neural network to learn pose transformations directly from the raw image sequence; compared with conventional methods, they produce accurate inter-frame pose estimates and need no alignment with a ground-truth trajectory to obtain an absolute scale estimate. However, the features obtained from a FlowNet network are fed as a sequence to a long short-term memory (LSTM) sequence encoder for monocular visual odometry learning, so such models have many parameters, a large model size, and high computational complexity, making them difficult to apply in scenarios with strict real-time requirements.
Therefore, how to better realize pose calculation and pose sharing has become an urgent problem to be solved in the industry.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for multi-device pose sharing, so as to solve, or at least partially solve, the technical problems identified in the background above.
In a first aspect, an embodiment of the present invention provides a multi-device pose sharing method, including:
determining a video frame allocation scheme according to the device information of each terminal device;
and allocating video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
More specifically, the step of determining a video frame allocation scheme according to the device information of each terminal device specifically includes:
constructing a plurality of network maximum-flow models according to the device information of each terminal device and the video frames collected by each terminal device;
solving each network maximum-flow model to obtain an optimal maximum-flow matching path, so as to determine the video frame allocation scheme according to each optimal maximum-flow matching path;
wherein each network maximum-flow model is obtained by adding a virtual source node and a virtual destination node to a preset weighted bipartite graph;
and the preset weighted bipartite graph is a bipartite graph model obtained by modeling, as a weighted bipartite graph, the assignment of the video frame calculation tasks collected by each terminal device to other terminal devices.
More specifically, the step of solving each network maximum-flow model to obtain an optimal maximum-flow matching path specifically includes:
when the computation required by the Ford-Fulkerson algorithm to solve each network maximum-flow model is less than that required by the real-time recurrent neural network model, solving each network maximum-flow model with the Ford-Fulkerson algorithm to obtain a plurality of optimal maximum-flow matching paths.
In a second aspect, an embodiment of the present invention provides another multi-device pose sharing method, including:
a terminal device obtains the video frame allocation scheme sent by a cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal;
wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device;
the target terminal performs pose estimation on its own local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices;
wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
More specifically, before the step of the target terminal performing pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model, the method further includes:
constructing a pose estimation module;
acquiring initial sample video frames with real pose information, and preprocessing the initial sample video frames to obtain a plurality of consecutive sample video frames;
taking two sample video frames at consecutive times as a group of training samples, with the real pose information as the label, and inputting the labeled training samples into a neural network based on depthwise separable convolutions and a temporal shift module to obtain an estimated vector for each sample;
and training with a loss function constructed from the estimated vectors of the samples and the real pose labels, finishing the training when a preset training condition is met, to obtain the preset pose estimation model.
More specifically, the step of constructing the pose estimation module specifically includes:
building a backbone network of the pose estimation module with depthwise separable convolutions;
and inserting the temporal shift module into a residual branch of the backbone network to complete the construction of the pose estimation module.
More specifically, the step of preprocessing the initial sample video frame to obtain a plurality of consecutive sample video frames specifically includes:
cropping the initial sample video frame to a preset size to obtain a cropped sample video frame;
subtracting the average RGB value of the initial sample video frame set from the cropped sample video frame and dividing by a preset RGB variance to obtain a normalized sample video frame;
acquiring a plurality of consecutive sample video frames in a sliding-window manner;
wherein the preset size is specifically: a picture width resolution of 512, a picture height resolution of 256, and 3 color channels.
In a third aspect, an embodiment of the present invention provides a multi-device pose sharing apparatus, including:
an allocation module, configured to determine a video frame allocation scheme according to the device information of each terminal device;
and a sharing module, configured to allocate video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the multi-device pose sharing method according to the second aspect when executing the program.
In a fifth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the multi-device pose sharing method according to the second aspect.
According to the multi-device pose sharing method and apparatus provided by the embodiments of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a multi-device pose sharing method described in an embodiment of the present invention;
FIG. 2 is a diagram illustrating a video frame allocation method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a multi-device pose sharing method according to another embodiment of the present invention;
FIG. 4 is a diagram illustrating operation of the time shift module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a multi-device pose sharing apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a multi-device pose sharing method described in an embodiment of the present invention, as shown in fig. 1, including:
Step S1: determining a video frame allocation scheme according to the device information of each terminal device.
Specifically, the cloud server determines the video frame allocation scheme according to the device information of each terminal device.
The device information described in the embodiments of the present invention specifically refers to information such as the computing capability, available computing resources, and network bandwidth of each terminal device.
A plurality of network maximum-flow models are constructed according to the device information of each terminal device and the video frames acquired by each terminal device; each network maximum-flow model is solved to obtain an optimal maximum-flow matching path, and the video frame allocation scheme is determined according to the optimal maximum-flow matching paths.
Step S2: allocating video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
Specifically, before pose estimation is performed, the cloud server deploys the trained preset pose estimation model and the related services to each edge terminal device; meanwhile, Device-to-Device (D2D) technology is used between the terminal devices participating in pose sharing and the server to initialize communication and establish communication connections between the terminal devices.
The video frame allocation scheme described in the embodiment of the invention specifically instructs each terminal device to send the video frames it has acquired but cannot finish pose calculation for to a designated target terminal device, so that the target terminal device assists it in completing the pose calculation of those video frames.
The cloud server sends the video frame allocation scheme to each terminal device to help it determine the target terminal device to which video frames requiring assisted computation should be sent. After the video frames are redistributed, each terminal device holds the video frames for which it must compute poses; it runs the lightweight pose estimation model to calculate the pose information of its current video frames, cooperatively processes the video frame data received from other devices, and broadcasts the calculation results to all terminal devices (a minimal broadcast sketch is given below). In this way, the current pose estimation of each terminal device is completed cooperatively, current pose information is shared and exchanged in real time, and the requirement for mutual pose-information interaction between terminal devices is met.
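As a concrete illustration of the broadcast step, a minimal sketch in Python follows. The patent does not specify a transport, so the UDP socket, message format, peer list, and port number are all assumptions for illustration:

```python
import json
import socket

def broadcast_pose(pose, peers, port=50000):
    """Send a 6-DoF pose estimate to every participating terminal.

    `pose` is a list of 6 floats; `peers` is a list of terminal IP
    addresses. The UDP transport and the port are assumptions, not
    taken from the patent text.
    """
    msg = json.dumps({"pose": pose}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host in peers:
            sock.sendto(msg, (host, port))

# Example: share the T+1 pose estimate with two other terminals.
broadcast_pose([0.1, -0.2, 0.05, 0.0, 0.01, 0.0],
               peers=["192.168.1.12", "192.168.1.13"])
```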
Meanwhile, when a new terminal device applies to join the sharing interaction, the cloud server loads the lightweight preset pose estimation model and services onto the device, establishes communication connections between the device, the other terminal devices, and the edge server, registers the device in the device information table in real time, and activates it to participate in the multi-device shared computation at the next moment. When a terminal device fails or actively quits the sharing interaction, the cloud server first deletes the device's information from the device information table, deletes the records of connections to the device, and finally removes the device from the sharing interaction without affecting the other terminal devices currently participating in pose sharing. A small registry sketch of this join/leave handling follows.
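The sketch below pictures the device information table kept by the cloud server; the field names (`compute`, `resources`, `bandwidth`) are illustrative stand-ins for the device information the patent lists, not names taken from the text:

```python
from dataclasses import dataclass

@dataclass
class Device:
    """One entry in the cloud server's device information table."""
    dev_id: str
    compute: float    # estimated computing capability
    resources: float  # currently available computing resources
    bandwidth: float  # measured network bandwidth

class DeviceTable:
    def __init__(self):
        self.devices = {}

    def join(self, dev: Device):
        # Register the newcomer; it is activated and participates in
        # the shared computation from the next time slot.
        self.devices[dev.dev_id] = dev

    def leave(self, dev_id: str):
        # Delete the device's record and its connections without
        # affecting the other terminals currently sharing poses.
        self.devices.pop(dev_id, None)
```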
According to the embodiment of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
On the basis of the foregoing embodiment, the step of determining a video frame allocation scheme according to the device information of each terminal device specifically includes:
constructing a plurality of network maximum-flow models according to the device information of each terminal device and the video frames collected by each terminal device;
solving each network maximum-flow model to obtain an optimal maximum-flow matching path, so as to determine the video frame allocation scheme according to each optimal maximum-flow matching path;
wherein each network maximum-flow model is obtained by adding a virtual source node and a virtual destination node to a preset weighted bipartite graph;
and the preset weighted bipartite graph is a bipartite graph model obtained by modeling, as a weighted bipartite graph, the assignment of the video frame calculation tasks collected by each terminal device to other terminal devices.
Specifically, fig. 2 is a schematic diagram of the video frame allocation method described in an embodiment of the present invention. As shown in fig. 2, at an arbitrary time T, the video frame calculation tasks acquired by the terminal devices are distributed to the best-matched terminal devices and modeled as a weighted bipartite graph: as shown in (a) of fig. 2, each frame calculation task is matched to a terminal device (including the edge computing center) that completes its calculation. The real-time task scheduling module then adds virtual nodes to the constructed weighted bipartite graph to form a network maximum-flow problem, as shown in (b) of fig. 2, i.e., it searches for a maximum-flow matching path with maximum profit and minimum delay from the constructed virtual source node to the virtual destination node. Given the network maximum-flow model, the real-time task computing and scheduling module solves it in real time with a two-stage strategy chosen by the scale of the currently participating devices: when the problem scale is small, the classical Ford-Fulkerson algorithm is used; when many terminal devices participate in the shared computation, a real-time recurrent neural network model is used instead. The decision criterion is whether the theoretical computation of the Ford-Fulkerson algorithm at the current device scale exceeds that of the recurrent neural network model. The video frame allocation scheme is then obtained from the optimal task matching result at the current moment; a minimal sketch of the graph construction is given below.
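The following Python sketch builds the source/task/device/sink graph of fig. 2 and solves it with min-cost max-flow as a stand-in for the "maximum profit, minimum delay" matching. The edge capacities, the per-device slot counts, and the toy cost function are illustrative assumptions; the patent only fixes the graph shape:

```python
import networkx as nx

def build_allocation_graph(tasks, devices, cost):
    """Bipartite frame-allocation graph with virtual source/sink nodes.

    `tasks` are frame-computation task ids, `devices` maps a device id
    to the number of extra frames it can absorb, and `cost(t, d)` is an
    assumed delay weight for running task t on device d.
    """
    g = nx.DiGraph()
    for t in tasks:
        g.add_edge("src", ("task", t), capacity=1, weight=0)
        for d in devices:
            g.add_edge(("task", t), ("dev", d), capacity=1, weight=cost(t, d))
    for d, slots in devices.items():
        g.add_edge(("dev", d), "dst", capacity=slots, weight=0)
    return g

g = build_allocation_graph(
    tasks=[0, 1, 2],
    devices={"A": 2, "B": 1},
    cost=lambda t, d: abs(t - ord(d) % 3),  # toy delay model
)
flow = nx.max_flow_min_cost(g, "src", "dst")

# Read the allocation scheme off the saturated task -> device edges.
assignment = {}
for u, outs in flow.items():
    if isinstance(u, tuple) and u[0] == "task":
        for v, f in outs.items():
            if f > 0:
                assignment[u[1]] = v[1]
print(assignment)  # task -> device mapping, e.g. {0: 'B', 1: 'A', 2: 'A'}
```

On small instances, a hand-rolled Ford-Fulkerson over the same graph would serve equally well, matching the first stage of the two-stage strategy.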
Fig. 3 is a schematic diagram of a multi-device pose sharing method described in another embodiment of the present invention; as shown in fig. 3, the method includes:
Step S31: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal;
wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device.
Specifically, in the embodiment of the invention, a plurality of network maximum-flow models are constructed according to the device information of each terminal device and the video frames acquired by each terminal device; each model is solved to obtain an optimal maximum-flow matching path, and the video frame allocation scheme is determined according to these paths.
Step S32: the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices;
wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
Specifically, the pose estimation step using the preset pose estimation model in the embodiment of the present invention is as follows:
a plurality of consecutive video frames are used, and two video frames at adjacent times are combined in a sliding-window manner. The two adjacent frames, i.e., the pictures at times T and T+1, are input to the preset pose estimation model, which outputs the estimated poses at times T and T+1; only the trailing 6-dimensional vector is taken, as the estimate for time T+1, and used as the final output. This result is broadcast to all terminal devices, so that the estimation of each terminal device's current pose is completed cooperatively, the other terminal devices share and exchange the current pose information in real time, and the requirement for mutual pose-information interaction between terminal devices is met. A sketch of this pairwise inference is given below.
According to the embodiment of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
On the basis of the above embodiment, before the step of the target terminal performing pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model, the method further includes:
constructing a pose estimation module;
acquiring initial sample video frames with real pose information, and preprocessing the initial sample video frames to obtain a plurality of consecutive sample video frames;
taking two sample video frames at consecutive times as a group of training samples, with the real pose information as the label, and inputting the labeled training samples into a neural network based on depthwise separable convolutions and a temporal shift module to obtain an estimated vector for each sample;
and training with a loss function constructed from the estimated vectors of the samples and the real pose labels, finishing the training when a preset training condition is met, to obtain the preset pose estimation model.
The step of constructing the pose estimation module specifically includes:
building a backbone network of the pose estimation module with depthwise separable convolutions;
and inserting the temporal shift module into a residual branch of the backbone network to complete the construction of the pose estimation module.
Specifically, the lightweight pose estimation model builds its backbone network with depthwise separable convolutions and inserts a temporal shift module into it. Fig. 4 is an operation schematic diagram of the time shift module described in an embodiment of the present invention. As shown in fig. 4, the embodiment uses the lightweight model MobileNetV2 as the backbone of the pose estimation module, which learns the motion feature information of each image. The sample video frame sequence first passes through a conventional convolutional layer that extracts basic features; the output of that layer is fed into a network formed by the temporal shift module and 16 layers of depthwise separable convolutions, with the information exchanged by the temporal shift module fed into conventional convolutional layers; finally, a fully connected layer reduces the dimensionality to obtain two 6-dimensional pose vectors. The temporal shift module is inserted into a residual branch of the backbone network, so temporal information is fused while spatial information is preserved, at no extra computational cost, completing the construction of the pose estimation module. A sketch of one depthwise separable block is given below.
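For reference, one depthwise separable block of the kind used in a MobileNet-style backbone looks like the following PyTorch sketch; the layer sizes and activation choice are illustrative, not the patent's exact configuration:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv (one filter per channel) followed by a 1x1
    pointwise conv: the building block of MobileNet-style backbones."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))
```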
The convolutional neural network is applied to each normalized sample video frame, first extracting the basic features of each frame. In each residual block, the temporal shift module moves the first 1/8 of the channels of the feature map at time T to the corresponding positions of the feature map at time T+1 along the time dimension; the vacated channels of the feature map at time T are zero-filled, and the channels shifted out of the feature map at time T+1 are truncated. Because the exchanged feature map at time T+1 now contains information from time T, the pose estimate at time T+1 (the second pose vector) is taken as the relative pose between the video frames at times T and T+1. The neural network is trained with the mean squared error between the estimated and true poses of these frames as the objective, and the preset pose estimation model is obtained when the preset training condition is met; the shift operation is sketched below.
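The shift just described can be written directly as a tensor operation. In the sketch below, the 1/8 fold, the forward direction, the zero-fill, and the truncation follow the description above; the (N, T, C, H, W) layout is an assumption:

```python
import torch

def temporal_shift(x, fold_div=8):
    """Uni-directional temporal shift over a (N, T, C, H, W) tensor.

    The first C/fold_div channels of the feature map at time T are
    moved to the same channel positions at time T+1; the vacated
    channels at the first time step are zero-filled, and the channels
    pushed past the last time step are discarded (overwritten).
    """
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = x.clone()
    out[:, 1:, :fold] = x[:, :-1, :fold]  # shift 1/8 of channels forward in time
    out[:, 0, :fold] = 0                  # zero-fill the vacated slots
    return out

x = torch.randn(2, 4, 32, 16, 16)
y = temporal_shift(x)
assert torch.equal(y[:, 1, :4], x[:, 0, :4])  # T=0 info now lives at T=1
```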
The preset training condition described in the embodiment of the present invention may be that the loss value of the loss function is smaller than a preset threshold, or that a preset number of training iterations or a preset training time is reached; a training-loop sketch consistent with this description follows.
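In the PyTorch sketch below, `PoseNet`, the toy data, and the hyper-parameters are hypothetical stand-ins, since the description fixes only the MSE objective on the T+1 pose vector and a stopping condition:

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    """Hypothetical stand-in for the MobileNetV2 + temporal-shift backbone
    with a fully connected head; only the (B, 2, 6) output shape matters."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(2), nn.LazyLinear(64), nn.ReLU())
        self.head = nn.Linear(64, 6)

    def forward(self, pair):                    # pair: (B, 2, C, H, W)
        return self.head(self.features(pair))  # (B, 2, 6) pose vectors

# Toy stand-in data: downscaled frame pairs at times T and T+1, each
# labeled with the true 6-DoF relative pose of the T+1 frame.
frames = torch.randn(8, 2, 3, 32, 64)
gt_poses = torch.randn(8, 6)
train_loader = [(frames[i:i + 4], gt_poses[i:i + 4]) for i in (0, 4)]

model = PoseNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
loss_threshold, max_epochs = 1e-3, 50  # assumed "preset training condition"

for epoch in range(max_epochs):
    for pair, gt in train_loader:
        pred = model(pair)
        # Only the trailing vector is the T -> T+1 relative pose estimate.
        loss = criterion(pred[:, 1], gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if loss.item() < loss_threshold:   # stop once the condition is met
        break
```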
On the basis of the foregoing embodiment, the step of preprocessing the initial sample video frame to obtain a plurality of consecutive sample video frames specifically includes:
cropping the initial sample video frame to a preset size to obtain a cropped sample video frame;
subtracting the average RGB value of the initial sample video frame set from the cropped sample video frame and dividing by a preset RGB variance to obtain a normalized sample video frame;
acquiring a plurality of consecutive sample video frames in a sliding-window manner;
wherein the preset size is specifically: a picture width resolution of 512, a picture height resolution of 256, and 3 color channels.
Specifically, in the embodiment of the present invention, the initial sample video frame is cropped to 512 × 256 × 3, where 512 is the picture width resolution, 256 is the picture height resolution, and 3 is the number of color channels. The average RGB value of the data set is subtracted from each picture, which is then divided by the RGB variance to obtain a normalized sample video frame, as sketched below.
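A NumPy sketch of this preprocessing follows; the top-left crop and the per-channel statistics layout are assumptions, while the 512 × 256 × 3 size, the mean subtraction, the variance division, and the sliding window come from the description:

```python
import numpy as np

def preprocess(frames, mean_rgb, var_rgb, window=2):
    """Crop, normalize, and window a sequence of video frames.

    `frames` are H x W x 3 uint8 arrays; `mean_rgb` and `var_rgb` are
    per-channel statistics of the initial sample video frame set.
    Returns overlapping windows of `window` consecutive frames.
    """
    out = []
    for f in frames:
        f = f[:256, :512, :3].astype(np.float32)  # crop to 512 (W) x 256 (H) x 3
        f = (f - mean_rgb) / var_rgb              # subtract mean RGB, divide by variance
        out.append(f)
    # Sliding window over consecutive normalized frames (stride 1).
    return [out[i:i + window] for i in range(len(out) - window + 1)]
```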
Fig. 5 is a schematic diagram of a multi-device pose sharing apparatus according to an embodiment of the present invention; as shown in fig. 5, the apparatus includes an allocation module 510 and a sharing module 520. The allocation module 510 is configured to determine a video frame allocation scheme according to the device information of each terminal device; the sharing module 520 is configured to allocate video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
The apparatus provided in the embodiment of the present invention is used for executing the above method embodiments, and for details of the process and the details, reference is made to the above embodiments, which are not described herein again.
According to the embodiment of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform the following method: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal; wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device; the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices; wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions; when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal; wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device; the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices; wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing server instructions, where the server instructions cause a computer to execute the method provided in the foregoing embodiments, for example: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal; wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device; the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices; wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-device pose sharing method, characterized by comprising the following steps:
determining a video frame allocation scheme according to the device information of each terminal device;
and allocating video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
2. The multi-device pose sharing method according to claim 1, wherein the step of determining a video frame allocation scheme according to the device information of each terminal device specifically comprises:
constructing a plurality of network maximum-flow models according to the device information of each terminal device and the video frames collected by each terminal device;
solving each network maximum-flow model to obtain an optimal maximum-flow matching path, so as to determine the video frame allocation scheme according to each optimal maximum-flow matching path;
wherein each network maximum-flow model is obtained by adding a virtual source node and a virtual destination node to a preset weighted bipartite graph;
and the preset weighted bipartite graph is a bipartite graph model obtained by modeling, as a weighted bipartite graph, the assignment of the video frame calculation tasks collected by each terminal device to other terminal devices.
3. The multi-device pose sharing method according to claim 2, wherein the step of solving each network maximum-flow model to obtain an optimal maximum-flow matching path specifically comprises:
when the computation required by the Ford-Fulkerson algorithm to solve each network maximum-flow model is less than that required by the real-time recurrent neural network model, solving each network maximum-flow model with the Ford-Fulkerson algorithm to obtain a plurality of optimal maximum-flow matching paths.
4. A multi-device pose sharing method, characterized by comprising the following steps:
a terminal device obtains the video frame allocation scheme sent by a cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal;
wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device;
the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices;
wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
5. The multi-device pose sharing method according to claim 4, wherein before the step of the target terminal performing pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model, the method further comprises:
constructing a pose estimation module;
acquiring initial sample video frames with real pose information, and preprocessing the initial sample video frames to obtain a plurality of consecutive sample video frames;
taking two sample video frames at consecutive times as a group of training samples, with the real pose information as the label, and inputting the labeled training samples into a neural network based on depthwise separable convolutions and a temporal shift module to obtain an estimated vector for each sample;
and training with a loss function constructed from the estimated vectors of the samples and the real pose labels, finishing the training when a preset training condition is met, to obtain the preset pose estimation model.
6. The multi-device pose sharing method according to claim 5, wherein the step of constructing the pose estimation module specifically comprises:
building a backbone network of the pose estimation module with depthwise separable convolutions;
and inserting the temporal shift module into a residual branch of the backbone network to complete the construction of the pose estimation module.
7. The multi-device pose sharing method according to claim 5, wherein the step of preprocessing the initial sample video frame to obtain a plurality of consecutive sample video frames specifically comprises:
cropping the initial sample video frame to a preset size to obtain a cropped sample video frame;
subtracting the average RGB value of the initial sample video frame set from the cropped sample video frame and dividing by a preset RGB variance to obtain a normalized sample video frame;
acquiring a plurality of consecutive sample video frames in a sliding-window manner;
wherein the preset size is specifically: a picture width resolution of 512, a picture height resolution of 256, and 3 color channels.
8. A multi-device pose sharing apparatus, characterized by comprising:
an allocation module, configured to determine a video frame allocation scheme according to the device information of each terminal device;
and a sharing module, configured to allocate video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the multi-device pose sharing method according to any one of claims 4 to 7 when executing the program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the multi-device pose sharing method according to any one of claims 4 to 7.
CN202011475281.8A 2020-12-14 2020-12-14 Multi-device pose sharing method and device Pending CN112511644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011475281.8A CN112511644A (en) 2020-12-14 2020-12-14 Multi-device pose sharing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011475281.8A CN112511644A (en) 2020-12-14 2020-12-14 Multi-device pose sharing method and device

Publications (1)

Publication Number Publication Date
CN112511644A true CN112511644A (en) 2021-03-16

Family

ID=74973394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011475281.8A Pending CN112511644A (en) 2020-12-14 2020-12-14 Multi-device pose sharing method and device

Country Status (1)

Country Link
CN (1) CN112511644A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102387338A (en) * 2010-09-03 2012-03-21 中兴通讯股份有限公司 Distributed type video processing method and video session system
CN107766135A (en) * 2017-09-29 2018-03-06 东南大学 Method for allocating tasks based on population and simulated annealing optimization in mobile cloudlet
CN109379727A (en) * 2018-10-16 2019-02-22 重庆邮电大学 Task distribution formula unloading in car networking based on MEC carries into execution a plan with cooperating
CN109640068A (en) * 2018-10-31 2019-04-16 百度在线网络技术(北京)有限公司 Information forecasting method, device, equipment and the storage medium of video frame
CN109639833A (en) * 2019-01-25 2019-04-16 福建师范大学 A kind of method for scheduling task based on wireless MAN thin cloud load balancing
CN110347500A (en) * 2019-06-18 2019-10-18 东南大学 For the task discharging method towards deep learning application in edge calculations environment
CN110826502A (en) * 2019-11-08 2020-02-21 北京邮电大学 Three-dimensional attitude prediction method based on pseudo image sequence evolution
CN111741054A (en) * 2020-04-24 2020-10-02 浙江工业大学 Method for minimizing computation unloading delay of deep neural network of mobile user
CN113794756A (en) * 2021-08-26 2021-12-14 北京邮电大学 Multi-video-stream unloading method and system supporting mobile equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JI LIN et al.: "TSM: Temporal Shift Module for Efficient Video Understanding", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 February 2020 (2020-02-27), pages 1-11 *
CAO AO: "Research on an Efficient Scheduling Mechanism for Edge-Computing-Oriented Vehicular Networks", China Master's Theses Full-text Database, Engineering Science and Technology II, pages 34-48 *
HU HAIYANG; LIU RUNHUA; HU HUA: "A Multi-objective Optimization Method for Task Scheduling in Mobile Cloud Computing Environments", Journal of Computer Research and Development, no. 09 *

Similar Documents

Publication Publication Date Title
CN111190981B (en) Method and device for constructing three-dimensional semantic map, electronic equipment and storage medium
Mueggler et al. Lifetime estimation of events from dynamic vision sensors
CN113811920A (en) Distributed pose estimation
CN117893680A (en) Room layout estimation method and technique
US10679369B2 (en) System and method for object recognition using depth mapping
US10262464B2 (en) Dynamic, local augmented reality landmarks
Gargees et al. Incident-supporting visual cloud computing utilizing software-defined networking
JP7412847B2 (en) Image processing method, image processing device, server, and computer program
US8903130B1 (en) Virtual camera operator
US20210183161A1 (en) 3-d reconstruction using augmented reality frameworks
CN109126121B (en) AR terminal interconnection method, system, device and computer readable storage medium
CN108932725B (en) Scene flow estimation method based on convolutional neural network
KR101108786B1 (en) N-way multimedia collaboration system
CN110276768B (en) Image segmentation method, image segmentation device, image segmentation apparatus, and medium
JP2023512540A (en) Simultaneous real-time object detection and semantic segmentation system and method and non-transitory computer-readable medium
CN112732450A (en) Robot knowledge graph generation system and method under terminal-edge-cloud cooperative framework
CN109408234A (en) A kind of augmented reality data-optimized systems and method based on edge calculations
CN112308921B (en) Combined optimization dynamic SLAM method based on semantics and geometry
Schorghuber et al. SLAMANTIC-leveraging semantics to improve VSLAM in dynamic environments
CN110807379A (en) Semantic recognition method and device and computer storage medium
CN112990018A (en) Accelerated execution method of deep learning model in dynamic change network environment
Liu et al. EdgeSharing: Edge assisted real-time localization and object sharing in urban streets
WO2023164845A1 (en) Three-dimensional reconstruction method, device, system, and storage medium
JP7290271B2 (en) Self-motion estimation method and device
JP7400118B2 (en) Parking space detection method, apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination