CN112511644A - Multi-device pose sharing method and device - Google Patents

Multi-device pose sharing method and device

Info

Publication number
CN112511644A
Authority
CN
China
Prior art keywords
video frame
pose
pose estimation
information
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011475281.8A
Other languages
Chinese (zh)
Inventor
周宏伟 (Zhou Hongwei)
陈利敏 (Chen Limin)
乔秀全 (Qiao Xiuquan)
黄亚坤 (Huang Yakun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing National Speed Skating Hall Management Co ltd
Capinfo Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
Beijing National Speed Skating Hall Management Co ltd
Capinfo Co ltd
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing National Speed Skating Hall Management Co ltd, Capinfo Co ltd, Beijing University of Posts and Telecommunications
Priority to CN202011475281.8A
Publication of CN112511644A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268 Signal distribution or switching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The embodiment of the invention provides a multi-device pose sharing method and apparatus. The method includes: determining a video frame allocation scheme according to the device information of each terminal device; and allocating video frames to each terminal device according to the scheme, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices. By offloading the computation-heavy pose estimation task from the remote cloud to the terminal-device side, the overhead caused by frequent transmission of video frames is reduced, and the computational load on the remote cloud server is reduced as well.

Description

Multi-device pose sharing method and device
Technical Field
The invention relates to the technical field of information processing, in particular to a multi-device pose sharing method and device.
Background
The growing capability and popularity of mobile devices have given rise to a wide variety of multi-device applications, including device pose sharing, multi-screen display, and multi-player interaction, which extend real-time video-stream analysis tasks (e.g., object detection, pose estimation, semantic segmentation) across multiple devices for sharing and interaction. Existing work on multi-device interaction leverages the rich computing resources of the cloud (including edge clouds) to complete these video-stream tasks. Such methods transmit real-time video streams from the terminal devices to the cloud for intensive computation, consuming large amounts of network bandwidth and computing resources. Completing video-stream processing and sharing on the terminal-device side through multi-device cooperation has therefore become a promising direction. However, because of stringent latency requirements and intensive computation, video-stream analysis and sharing for multi-device interactive applications is still in its infancy, and many problems remain to be solved. Analyzing the streaming data of real-time Visual Odometry (VO) to realize multi-device pose sharing and interaction is thus a precondition for a wide range of multi-device applications.
Visual odometry is a fundamental technology in robot localization and autonomous driving: it continuously tracks the camera's ego-motion to produce relative poses between images, and integrates these relative poses into an absolute pose given an initial state. By the number of cameras used, visual odometry can be divided into monocular and binocular variants; because monocular visual odometry obtains the pose with only one camera, it is portable, lightweight, and cheap, and has been widely studied and applied. Conventional visual odometry techniques fall into feature-point methods and direct methods. The feature-point method estimates the camera pose by matching feature vectors between adjacent frames and comprises modules such as feature detection, feature matching, motion estimation, scale estimation, and back-end optimization. It works well in most conditions, but loses feature points in texture-less regions, causing matching failures, and feature extraction and matching are time-consuming. The direct method estimates the camera motion and the spatial position of pixels by minimizing the photometric error; it performs better in texture-less scenes such as corridors or smooth walls, but is only suitable when the motion amplitude is small and the overall brightness of the image changes little. Deep Learning (DL) can extract high-level features from images and provides an alternative approach to visual odometry. Deep-learning-based monocular visual odometry does not depend on any module of the conventional pipeline and obtains the camera pose end to end, without tuning system parameters.
Existing deep-learning visual odometry models mainly combine a convolutional neural network with a recurrent neural network to learn pose transformations directly from the raw image sequence; compared with conventional methods, they produce accurate inter-frame pose estimates and need no alignment with a ground-truth trajectory to obtain an absolute scale estimate. However, the features obtained from a FlowNet network are fed as a sequence to a long short-term memory (LSTM) sequence encoder for monocular visual odometry learning, so such models have many parameters, a large model size, and high computational complexity, making them difficult to apply in scenarios with strict real-time requirements.
Therefore, how to better realize pose calculation and pose sharing has become an urgent problem to be solved in the industry.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for multi-device pose sharing, so as to solve, or at least partially solve, the technical problems identified in the background above.
In a first aspect, an embodiment of the present invention provides a multi-device pose sharing method, including:
determining a video frame allocation scheme according to the device information of each terminal device;
and allocating video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
More specifically, the step of determining a video frame allocation scheme according to the device information of each terminal device specifically includes:
constructing a plurality of network maximum-flow models according to the device information of each terminal device and the video frames collected by each terminal device;
solving each network maximum-flow model to obtain an optimal maximum-flow matching path, so as to determine the video frame allocation scheme according to each optimal maximum-flow matching path;
wherein each network maximum-flow model is obtained by adding a virtual source node and a virtual destination node to a preset weighted bipartite graph;
and the preset weighted bipartite graph is a bipartite graph model obtained by modeling, as a weighted bipartite graph, the assignment of the video frame calculation tasks collected by each terminal device to other terminal devices.
More specifically, the step of solving each network maximum-flow model to obtain an optimal maximum-flow matching path specifically includes:
when the computation required by the Ford-Fulkerson algorithm to solve each network maximum-flow model is less than that required by the real-time recurrent neural network model, solving each network maximum-flow model with the Ford-Fulkerson algorithm to obtain a plurality of optimal maximum-flow matching paths.
In a second aspect, an embodiment of the present invention provides another multi-device pose sharing method, including:
a terminal device obtains the video frame allocation scheme sent by a cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal;
wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device;
the target terminal performs pose estimation on its own local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices;
wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
More specifically, before the step of the target terminal performing pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model, the method further includes:
constructing a pose estimation module;
acquiring initial sample video frames with real pose information, and preprocessing the initial sample video frames to obtain a plurality of consecutive sample video frames;
taking two sample video frames at consecutive times as a group of training samples, with the real pose information as the label, and inputting the labeled training samples into a neural network based on depthwise separable convolutions and a temporal shift module to obtain an estimated vector for each sample;
and training with a loss function constructed from the estimated vectors of the samples and the real pose labels, finishing the training when a preset training condition is met, to obtain the preset pose estimation model.
More specifically, the step of constructing the pose estimation module specifically includes:
building a backbone network of the pose estimation module with depthwise separable convolutions;
and inserting the temporal shift module into a residual branch of the backbone network to complete the construction of the pose estimation module.
More specifically, the step of preprocessing the initial sample video frame to obtain a plurality of consecutive sample video frames specifically includes:
cropping the initial sample video frame to a preset size to obtain a cropped sample video frame;
subtracting the average RGB value of the initial sample video frame set from the cropped sample video frame and dividing by a preset RGB variance to obtain a normalized sample video frame;
acquiring a plurality of consecutive sample video frames in a sliding-window manner;
wherein the preset size is specifically: a picture width resolution of 512, a picture height resolution of 256, and 3 color channels.
In a third aspect, an embodiment of the present invention provides a multi-device pose sharing apparatus, including:
an allocation module, configured to determine a video frame allocation scheme according to the device information of each terminal device;
and a sharing module, configured to allocate video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the multi-device pose sharing method according to the second aspect when executing the program.
In a fifth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the multi-device pose sharing method according to the second aspect.
According to the multi-device pose sharing method and apparatus provided by the embodiments of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a multi-device pose sharing method described in an embodiment of the present invention;
FIG. 2 is a diagram illustrating a video frame allocation method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a multi-device pose sharing method according to another embodiment of the present invention;
FIG. 4 is a diagram illustrating operation of the time shift module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a multi-device pose sharing apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a multi-device pose sharing method described in an embodiment of the present invention, as shown in fig. 1, including:
Step S1: determining a video frame allocation scheme according to the device information of each terminal device.
Specifically, the cloud server determines the video frame allocation scheme according to the device information of each terminal device.
The device information described in the embodiments of the present invention specifically refers to information such as the computing capability, available computing resources, and network bandwidth of each terminal device.
A plurality of network maximum-flow models are constructed according to the device information of each terminal device and the video frames acquired by each terminal device; each network maximum-flow model is solved to obtain an optimal maximum-flow matching path, and the video frame allocation scheme is determined according to the optimal maximum-flow matching paths.
Step S2: allocating video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
Specifically, before pose estimation is performed, the cloud server deploys the trained preset pose estimation model and the related services to each edge terminal device; meanwhile, Device-to-Device (D2D) technology is used between the terminal devices participating in pose sharing and the server to initialize communication and establish communication connections between the terminal devices.
The video frame allocation scheme described in the embodiment of the invention specifically instructs each terminal device to send the video frames it has acquired but cannot finish pose calculation for to a designated target terminal device, so that the target terminal device assists it in completing the pose calculation of those video frames.
The cloud server sends the video frame allocation scheme to each terminal device to help it determine the target terminal device to which video frames requiring assisted computation should be sent. After the video frames are redistributed, each terminal device holds the video frames for which it must compute poses; it runs the lightweight pose estimation model to calculate the pose information of its current video frames, cooperatively processes the video frame data received from other devices, and broadcasts the calculation results to all terminal devices (a minimal broadcast sketch is given below). In this way, the current pose estimation of each terminal device is completed cooperatively, current pose information is shared and exchanged in real time, and the requirement for mutual pose-information interaction between terminal devices is met.
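As a concrete illustration of the broadcast step, a minimal sketch in Python follows. The patent does not specify a transport, so the UDP socket, message format, peer list, and port number are all assumptions for illustration:

```python
import json
import socket

def broadcast_pose(pose, peers, port=50000):
    """Send a 6-DoF pose estimate to every participating terminal.

    `pose` is a list of 6 floats; `peers` is a list of terminal IP
    addresses. The UDP transport and the port are assumptions, not
    taken from the patent text.
    """
    msg = json.dumps({"pose": pose}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host in peers:
            sock.sendto(msg, (host, port))

# Example: share the T+1 pose estimate with two other terminals.
broadcast_pose([0.1, -0.2, 0.05, 0.0, 0.01, 0.0],
               peers=["192.168.1.12", "192.168.1.13"])
```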
Meanwhile, when a new terminal device applies to join the sharing interaction, the cloud server loads the lightweight preset pose estimation model and services onto the device, establishes communication connections between the device, the other terminal devices, and the edge server, registers the device in the device information table in real time, and activates it to participate in the multi-device shared computation at the next moment. When a terminal device fails or actively quits the sharing interaction, the cloud server first deletes the device's information from the device information table, deletes the records of connections to the device, and finally removes the device from the sharing interaction without affecting the other terminal devices currently participating in pose sharing. A small registry sketch of this join/leave handling follows.
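The sketch below pictures the device information table kept by the cloud server; the field names (`compute`, `resources`, `bandwidth`) are illustrative stand-ins for the device information the patent lists, not names taken from the text:

```python
from dataclasses import dataclass

@dataclass
class Device:
    """One entry in the cloud server's device information table."""
    dev_id: str
    compute: float    # estimated computing capability
    resources: float  # currently available computing resources
    bandwidth: float  # measured network bandwidth

class DeviceTable:
    def __init__(self):
        self.devices = {}

    def join(self, dev: Device):
        # Register the newcomer; it is activated and participates in
        # the shared computation from the next time slot.
        self.devices[dev.dev_id] = dev

    def leave(self, dev_id: str):
        # Delete the device's record and its connections without
        # affecting the other terminals currently sharing poses.
        self.devices.pop(dev_id, None)
```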
According to the embodiment of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
On the basis of the foregoing embodiment, the step of determining a video frame allocation scheme according to the device information of each terminal device specifically includes:
constructing a plurality of network maximum-flow models according to the device information of each terminal device and the video frames collected by each terminal device;
solving each network maximum-flow model to obtain an optimal maximum-flow matching path, so as to determine the video frame allocation scheme according to each optimal maximum-flow matching path;
wherein each network maximum-flow model is obtained by adding a virtual source node and a virtual destination node to a preset weighted bipartite graph;
and the preset weighted bipartite graph is a bipartite graph model obtained by modeling, as a weighted bipartite graph, the assignment of the video frame calculation tasks collected by each terminal device to other terminal devices.
Specifically, fig. 2 is a schematic diagram of the video frame allocation method described in an embodiment of the present invention. As shown in fig. 2, at an arbitrary time T, the video frame calculation tasks acquired by the terminal devices are distributed to the best-matched terminal devices and modeled as a weighted bipartite graph: as shown in (a) of fig. 2, each frame calculation task is matched to a terminal device (including the edge computing center) that completes its calculation. The real-time task scheduling module then adds virtual nodes to the constructed weighted bipartite graph to form a network maximum-flow problem, as shown in (b) of fig. 2, i.e., it searches for a maximum-flow matching path with maximum profit and minimum delay from the constructed virtual source node to the virtual destination node. Given the network maximum-flow model, the real-time task computing and scheduling module solves it in real time with a two-stage strategy chosen by the scale of the currently participating devices: when the problem scale is small, the classical Ford-Fulkerson algorithm is used; when many terminal devices participate in the shared computation, a real-time recurrent neural network model is used instead. The decision criterion is whether the theoretical computation of the Ford-Fulkerson algorithm at the current device scale exceeds that of the recurrent neural network model. The video frame allocation scheme is then obtained from the optimal task matching result at the current moment; a minimal sketch of the graph construction is given below.
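The following Python sketch builds the source/task/device/sink graph of fig. 2 and solves it with min-cost max-flow as a stand-in for the "maximum profit, minimum delay" matching. The edge capacities, the per-device slot counts, and the toy cost function are illustrative assumptions; the patent only fixes the graph shape:

```python
import networkx as nx

def build_allocation_graph(tasks, devices, cost):
    """Bipartite frame-allocation graph with virtual source/sink nodes.

    `tasks` are frame-computation task ids, `devices` maps a device id
    to the number of extra frames it can absorb, and `cost(t, d)` is an
    assumed delay weight for running task t on device d.
    """
    g = nx.DiGraph()
    for t in tasks:
        g.add_edge("src", ("task", t), capacity=1, weight=0)
        for d in devices:
            g.add_edge(("task", t), ("dev", d), capacity=1, weight=cost(t, d))
    for d, slots in devices.items():
        g.add_edge(("dev", d), "dst", capacity=slots, weight=0)
    return g

g = build_allocation_graph(
    tasks=[0, 1, 2],
    devices={"A": 2, "B": 1},
    cost=lambda t, d: abs(t - ord(d) % 3),  # toy delay model
)
flow = nx.max_flow_min_cost(g, "src", "dst")

# Read the allocation scheme off the saturated task -> device edges.
assignment = {}
for u, outs in flow.items():
    if isinstance(u, tuple) and u[0] == "task":
        for v, f in outs.items():
            if f > 0:
                assignment[u[1]] = v[1]
print(assignment)  # task -> device mapping, e.g. {0: 'B', 1: 'A', 2: 'A'}
```

On small instances, a hand-rolled Ford-Fulkerson over the same graph would serve equally well, matching the first stage of the two-stage strategy.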
Fig. 3 is a schematic diagram of a multi-device pose sharing method described in another embodiment of the present invention; as shown in fig. 3, the method includes:
Step S31: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal;
wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device.
Specifically, in the embodiment of the invention, a plurality of network maximum-flow models are constructed according to the device information of each terminal device and the video frames acquired by each terminal device; each model is solved to obtain an optimal maximum-flow matching path, and the video frame allocation scheme is determined according to these paths.
Step S32: the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices;
wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
Specifically, the pose estimation step using the preset pose estimation model in the embodiment of the present invention is as follows:
a plurality of consecutive video frames are used, and two video frames at adjacent times are combined in a sliding-window manner. The two adjacent frames, i.e., the pictures at times T and T+1, are input to the preset pose estimation model, which outputs the estimated poses at times T and T+1; only the trailing 6-dimensional vector is taken, as the estimate for time T+1, and used as the final output. This result is broadcast to all terminal devices, so that the estimation of each terminal device's current pose is completed cooperatively, the other terminal devices share and exchange the current pose information in real time, and the requirement for mutual pose-information interaction between terminal devices is met. A sketch of this pairwise inference is given below.
According to the embodiment of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
On the basis of the above embodiment, before the step of the target terminal performing pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model, the method further includes:
constructing a pose estimation module;
acquiring initial sample video frames with real pose information, and preprocessing the initial sample video frames to obtain a plurality of consecutive sample video frames;
taking two sample video frames at consecutive times as a group of training samples, with the real pose information as the label, and inputting the labeled training samples into a neural network based on depthwise separable convolutions and a temporal shift module to obtain an estimated vector for each sample;
and training with a loss function constructed from the estimated vectors of the samples and the real pose labels, finishing the training when a preset training condition is met, to obtain the preset pose estimation model.
The step of constructing the pose estimation module specifically includes:
building a backbone network of the pose estimation module with depthwise separable convolutions;
and inserting the temporal shift module into a residual branch of the backbone network to complete the construction of the pose estimation module.
Specifically, the lightweight pose estimation model builds its backbone network with depthwise separable convolutions and inserts a temporal shift module into it. Fig. 4 is an operation schematic diagram of the time shift module described in an embodiment of the present invention. As shown in fig. 4, the embodiment uses the lightweight model MobileNetV2 as the backbone of the pose estimation module, which learns the motion feature information of each image. The sample video frame sequence first passes through a conventional convolutional layer that extracts basic features; the output of that layer is fed into a network formed by the temporal shift module and 16 layers of depthwise separable convolutions, with the information exchanged by the temporal shift module fed into conventional convolutional layers; finally, a fully connected layer reduces the dimensionality to obtain two 6-dimensional pose vectors. The temporal shift module is inserted into a residual branch of the backbone network, so temporal information is fused while spatial information is preserved, at no extra computational cost, completing the construction of the pose estimation module. A sketch of one depthwise separable block is given below.
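For reference, one depthwise separable block of the kind used in a MobileNet-style backbone looks like the following PyTorch sketch; the layer sizes and activation choice are illustrative, not the patent's exact configuration:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv (one filter per channel) followed by a 1x1
    pointwise conv: the building block of MobileNet-style backbones."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))
```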
The convolutional neural network is applied to each normalized sample video frame, first extracting the basic features of each frame. In each residual block, the temporal shift module moves the first 1/8 of the channels of the feature map at time T to the corresponding positions of the feature map at time T+1 along the time dimension; the vacated channels of the feature map at time T are zero-filled, and the channels shifted out of the feature map at time T+1 are truncated. Because the exchanged feature map at time T+1 now contains information from time T, the pose estimate at time T+1 (the second pose vector) is taken as the relative pose between the video frames at times T and T+1. The neural network is trained with the mean squared error between the estimated and true poses of these frames as the objective, and the preset pose estimation model is obtained when the preset training condition is met; the shift operation is sketched below.
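The shift just described can be written directly as a tensor operation. In the sketch below, the 1/8 fold, the forward direction, the zero-fill, and the truncation follow the description above; the (N, T, C, H, W) layout is an assumption:

```python
import torch

def temporal_shift(x, fold_div=8):
    """Uni-directional temporal shift over a (N, T, C, H, W) tensor.

    The first C/fold_div channels of the feature map at time T are
    moved to the same channel positions at time T+1; the vacated
    channels at the first time step are zero-filled, and the channels
    pushed past the last time step are discarded (overwritten).
    """
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = x.clone()
    out[:, 1:, :fold] = x[:, :-1, :fold]  # shift 1/8 of channels forward in time
    out[:, 0, :fold] = 0                  # zero-fill the vacated slots
    return out

x = torch.randn(2, 4, 32, 16, 16)
y = temporal_shift(x)
assert torch.equal(y[:, 1, :4], x[:, 0, :4])  # T=0 info now lives at T=1
```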
The preset training condition described in the embodiment of the present invention may be that the loss value of the loss function is smaller than a preset threshold, or that a preset number of training iterations or a preset training time is reached; a training-loop sketch consistent with this description follows.
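In the PyTorch sketch below, `PoseNet`, the toy data, and the hyper-parameters are hypothetical stand-ins, since the description fixes only the MSE objective on the T+1 pose vector and a stopping condition:

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    """Hypothetical stand-in for the MobileNetV2 + temporal-shift backbone
    with a fully connected head; only the (B, 2, 6) output shape matters."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(2), nn.LazyLinear(64), nn.ReLU())
        self.head = nn.Linear(64, 6)

    def forward(self, pair):                    # pair: (B, 2, C, H, W)
        return self.head(self.features(pair))  # (B, 2, 6) pose vectors

# Toy stand-in data: downscaled frame pairs at times T and T+1, each
# labeled with the true 6-DoF relative pose of the T+1 frame.
frames = torch.randn(8, 2, 3, 32, 64)
gt_poses = torch.randn(8, 6)
train_loader = [(frames[i:i + 4], gt_poses[i:i + 4]) for i in (0, 4)]

model = PoseNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
loss_threshold, max_epochs = 1e-3, 50  # assumed "preset training condition"

for epoch in range(max_epochs):
    for pair, gt in train_loader:
        pred = model(pair)
        # Only the trailing vector is the T -> T+1 relative pose estimate.
        loss = criterion(pred[:, 1], gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if loss.item() < loss_threshold:   # stop once the condition is met
        break
```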
On the basis of the foregoing embodiment, the step of preprocessing the initial sample video frame to obtain a plurality of consecutive sample video frames specifically includes:
cropping the initial sample video frame to a preset size to obtain a cropped sample video frame;
subtracting the average RGB value of the initial sample video frame set from the cropped sample video frame and dividing by a preset RGB variance to obtain a normalized sample video frame;
acquiring a plurality of consecutive sample video frames in a sliding-window manner;
wherein the preset size is specifically: a picture width resolution of 512, a picture height resolution of 256, and 3 color channels.
Specifically, in the embodiment of the present invention, the initial sample video frame is cropped to 512 × 256 × 3, where 512 is the picture width resolution, 256 is the picture height resolution, and 3 is the number of color channels. The average RGB value of the data set is subtracted from each picture, which is then divided by the RGB variance to obtain a normalized sample video frame, as sketched below.
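A NumPy sketch of this preprocessing follows; the top-left crop and the per-channel statistics layout are assumptions, while the 512 × 256 × 3 size, the mean subtraction, the variance division, and the sliding window come from the description:

```python
import numpy as np

def preprocess(frames, mean_rgb, var_rgb, window=2):
    """Crop, normalize, and window a sequence of video frames.

    `frames` are H x W x 3 uint8 arrays; `mean_rgb` and `var_rgb` are
    per-channel statistics of the initial sample video frame set.
    Returns overlapping windows of `window` consecutive frames.
    """
    out = []
    for f in frames:
        f = f[:256, :512, :3].astype(np.float32)  # crop to 512 (W) x 256 (H) x 3
        f = (f - mean_rgb) / var_rgb              # subtract mean RGB, divide by variance
        out.append(f)
    # Sliding window over consecutive normalized frames (stride 1).
    return [out[i:i + window] for i in range(len(out) - window + 1)]
```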
Fig. 5 is a schematic diagram of a multi-device pose sharing apparatus according to an embodiment of the present invention; as shown in fig. 5, the apparatus includes an allocation module 510 and a sharing module 520. The allocation module 510 is configured to determine a video frame allocation scheme according to the device information of each terminal device; the sharing module 520 is configured to allocate video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
The apparatus provided in the embodiment of the present invention is used for executing the above method embodiments, and for details of the process and the details, reference is made to the above embodiments, which are not described herein again.
According to the embodiment of the invention, the preset pose estimation model for pose estimation is first deployed to each edge terminal device; the video frame allocation scheme is then determined from the device information of each terminal device; after the video frames are allocated according to the scheme, the pose calculations among the terminal devices are completed cooperatively by the edge terminal devices, and real-time multi-device pose sharing is achieved through broadcast interaction.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform the following method: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal; wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device; the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices; wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions; when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal; wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device; the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices; wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing server instructions, where the server instructions cause a computer to execute the method provided in the foregoing embodiments, for example: a terminal device obtains the video frame allocation scheme sent by the cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal; wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device; the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices; wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-device pose sharing method, characterized by comprising the following steps:
determining a video frame allocation scheme according to the device information of each terminal device;
and allocating video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
2. The multi-device pose sharing method according to claim 1, wherein the step of determining a video frame allocation scheme according to the device information of each terminal device specifically comprises:
constructing a plurality of network maximum-flow models according to the device information of each terminal device and the video frames collected by each terminal device;
solving each network maximum-flow model to obtain an optimal maximum-flow matching path, so as to determine the video frame allocation scheme according to each optimal maximum-flow matching path;
wherein each network maximum-flow model is obtained by adding a virtual source node and a virtual destination node to a preset weighted bipartite graph;
and the preset weighted bipartite graph is a bipartite graph model obtained by modeling, as a weighted bipartite graph, the assignment of the video frame calculation tasks collected by each terminal device to other terminal devices.
3. The multi-device pose sharing method according to claim 2, wherein the step of solving each network maximum-flow model to obtain an optimal maximum-flow matching path specifically comprises:
when the computation required by the Ford-Fulkerson algorithm to solve each network maximum-flow model is less than that required by the real-time recurrent neural network model, solving each network maximum-flow model with the Ford-Fulkerson algorithm to obtain a plurality of optimal maximum-flow matching paths.
4. A multi-device pose sharing method, characterized by comprising the following steps:
a terminal device obtains the video frame allocation scheme sent by a cloud server and, according to the scheme, sends the video frames to be coordinated among the video frames it has collected to a target terminal;
wherein the video frame allocation scheme is calculated according to the computing capability, available computing resources, and network bandwidth of each terminal device;
the target terminal performs pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model to obtain local pose estimation information and pose estimation information to be coordinated, and broadcasts both to all terminal devices;
wherein the preset pose estimation model is trained on a plurality of consecutive sample video frames labeled with real pose information.
5. The multi-device pose sharing method according to claim 4, wherein before the step of the target terminal performing pose estimation on its local video frames and the video frames to be coordinated through a preset pose estimation model, the method further comprises:
constructing a pose estimation module;
acquiring initial sample video frames with real pose information, and preprocessing the initial sample video frames to obtain a plurality of consecutive sample video frames;
taking two sample video frames at consecutive times as a group of training samples, with the real pose information as the label, and inputting the labeled training samples into a neural network based on depthwise separable convolutions and a temporal shift module to obtain an estimated vector for each sample;
and training with a loss function constructed from the estimated vectors of the samples and the real pose labels, finishing the training when a preset training condition is met, to obtain the preset pose estimation model.
6. The multi-device pose sharing method according to claim 5, wherein the step of constructing the pose estimation module specifically comprises:
building a backbone network of the pose estimation module with depthwise separable convolutions;
and inserting the temporal shift module into a residual branch of the backbone network to complete the construction of the pose estimation module.
7. The multi-device pose sharing method according to claim 5, wherein the step of preprocessing the initial sample video frame to obtain a plurality of consecutive sample video frames specifically comprises:
cropping the initial sample video frame to a preset size to obtain a cropped sample video frame;
subtracting the average RGB value of the initial sample video frame set from the cropped sample video frame and dividing by a preset RGB variance to obtain a normalized sample video frame;
acquiring a plurality of consecutive sample video frames in a sliding-window manner;
wherein the preset size is specifically: a picture width resolution of 512, a picture height resolution of 256, and 3 color channels.
8. A multi-device pose sharing apparatus, characterized by comprising:
an allocation module, configured to determine a video frame allocation scheme according to the device information of each terminal device;
and a sharing module, configured to allocate video frames to each terminal device according to the video frame allocation scheme to obtain the allocated video frames corresponding to each terminal device, so that each terminal device performs pose estimation on its allocated video frames, determines pose estimation information, and broadcasts the pose estimation information to all terminal devices.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the multi-device pose sharing method according to any one of claims 4 to 7 when executing the program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the multi-device pose sharing method according to any one of claims 4 to 7.
CN202011475281.8A 2020-12-14 2020-12-14 Multi-device pose sharing method and device Pending CN112511644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011475281.8A CN112511644A (en) 2020-12-14 2020-12-14 Multi-device pose sharing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011475281.8A CN112511644A (en) 2020-12-14 2020-12-14 Multi-device pose sharing method and device

Publications (1)

Publication Number Publication Date
CN112511644A true CN112511644A (en) 2021-03-16

Family

ID=74973394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011475281.8A Pending CN112511644A (en) 2020-12-14 2020-12-14 Multi-device pose sharing method and device

Country Status (1)

Country Link
CN (1) CN112511644A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102387338A (en) * 2010-09-03 2012-03-21 中兴通讯股份有限公司 Distributed type video processing method and video session system
CN107766135A (en) * 2017-09-29 2018-03-06 东南大学 Method for allocating tasks based on population and simulated annealing optimization in mobile cloudlet
CN109379727A (en) * 2018-10-16 2019-02-22 重庆邮电大学 Task distribution formula unloading in car networking based on MEC carries into execution a plan with cooperating
CN109640068A (en) * 2018-10-31 2019-04-16 百度在线网络技术(北京)有限公司 Information forecasting method, device, equipment and the storage medium of video frame
CN109639833A (en) * 2019-01-25 2019-04-16 福建师范大学 A kind of method for scheduling task based on wireless MAN thin cloud load balancing
CN110347500A (en) * 2019-06-18 2019-10-18 东南大学 For the task discharging method towards deep learning application in edge calculations environment
CN110826502A (en) * 2019-11-08 2020-02-21 北京邮电大学 Three-dimensional attitude prediction method based on pseudo image sequence evolution
CN111741054A (en) * 2020-04-24 2020-10-02 浙江工业大学 Method for minimizing computation unloading delay of deep neural network of mobile user
CN113794756A (en) * 2021-08-26 2021-12-14 北京邮电大学 Multi-video-stream unloading method and system supporting mobile equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JI LIN et al.: "TSM: Temporal Shift Module for Efficient Video Understanding", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 February 2020 (2020-02-27), pages 1-11 *
CAO AO: "Research on an Efficient Scheduling Mechanism for Edge-Computing-Oriented Vehicular Networks", China Master's Theses Full-text Database, Engineering Science and Technology II, pages 34-48 *
HU HAIYANG; LIU RUNHUA; HU HUA: "A Multi-objective Optimization Method for Task Scheduling in Mobile Cloud Computing Environments", Journal of Computer Research and Development, no. 09 *

Similar Documents

Publication Publication Date Title
CN111190981B (en) Method and device for constructing three-dimensional semantic map, electronic equipment and storage medium
Mueggler et al. Lifetime estimation of events from dynamic vision sensors
CN113811920A (en) Distributed pose estimation
CN117893680A (en) Room layout estimation method and technique
US10679369B2 (en) System and method for object recognition using depth mapping
US10262464B2 (en) Dynamic, local augmented reality landmarks
Gargees et al. Incident-supporting visual cloud computing utilizing software-defined networking
JP7412847B2 (en) Image processing method, image processing device, server, and computer program
US8903130B1 (en) Virtual camera operator
US20210183161A1 (en) 3-d reconstruction using augmented reality frameworks
CN109126121B (en) AR terminal interconnection method, system, device and computer readable storage medium
CN108932725B (en) Scene flow estimation method based on convolutional neural network
KR101108786B1 (en) N-way multimedia collaboration system
CN110276768B (en) Image segmentation method, image segmentation device, image segmentation apparatus, and medium
JP2023512540A (en) Simultaneous real-time object detection and semantic segmentation system and method and non-transitory computer-readable medium
CN112732450A (en) Robot knowledge graph generation system and method under terminal-edge-cloud cooperative framework
CN109408234A (en) A kind of augmented reality data-optimized systems and method based on edge calculations
CN112308921B (en) Combined optimization dynamic SLAM method based on semantics and geometry
Schorghuber et al. SLAMANTIC-leveraging semantics to improve VSLAM in dynamic environments
CN110807379A (en) Semantic recognition method and device and computer storage medium
CN112990018A (en) Accelerated execution method of deep learning model in dynamic change network environment
Liu et al. EdgeSharing: Edge assisted real-time localization and object sharing in urban streets
WO2023164845A1 (en) Three-dimensional reconstruction method, device, system, and storage medium
JP7290271B2 (en) Self-motion estimation method and device
JP7400118B2 (en) Parking space detection method, apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination