CN111417016A - Attitude estimation method, server and network equipment - Google Patents

Attitude estimation method, server and network equipment

Info

Publication number
CN111417016A
Authority
CN
China
Prior art keywords
vector
video
information
camera
attitude
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910010651.1A
Other languages
Chinese (zh)
Inventor
Zhou Jiajun (周佳俊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN201910010651.1A
Publication of CN111417016A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 - Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 - Server components or server architectures
    • H04N 21/218 - Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 - Live feed

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides an attitude estimation method, a server and a network device, and relates to the technical field of communications. The attitude estimation method comprises the following steps: acquiring at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station; performing video decoding according to the video information to obtain decoded video content; calculating the attitude vector of the network device according to the video content and the device information of the network device; and sending the attitude vector to the network device. By acquiring the video information through the MEC server of the base station, decoding it into video content, calculating the attitude vector of the network device from the video content and the device information, and sending the attitude vector back to the network device, the embodiment of the invention can ensure low time delay, high bandwidth and strong processing capability.

Description

Attitude estimation method, server and network equipment
Technical Field
The invention relates to the technical field of communications, and in particular to an attitude estimation method, a server and a network device.
Background
In the fields of AR and robotics, accurate and timely positioning and SLAM (Simultaneous Localization and Mapping) can be achieved by shooting pictures with a single camera or multiple cameras and inversely calculating the real-time position of the device itself. This is a general capability that almost all AR devices and robot devices must have.
In current research and applications, such computer-vision-based camera pose estimation methods fall roughly into two categories according to their processing-delay requirements. The first category comprises applications that have real-time requirements and a small motion range over a short time, so that the difference between two consecutive pictures is small, such as AR mobile phones, AR glasses, robots and autonomous vehicles, which move continuously and freely in space. These applications are extremely sensitive to the delay of attitude estimation and need to acquire their accurate position in real time. The second category comprises applications that are completely insensitive to delay and allow the captured pictures or video to be stored and later transmitted to a high-performance machine or cloud platform to compute the pose, such as the initial attitude calibration of a VR panoramic camera. The pictures and video contents provided by such applications often differ greatly, so the amount of computation is large and the requirement on computing capability is high.
The problem of local attitude estimation in the first category of applications is very obvious: current devices cannot provide long-duration, high-reliability service. Limited by mobility, the processors and other chips of devices such as mobile phones, AR glasses and robots have relatively low performance, and when the attitude of each frame is calculated in real time, frame-level desynchronization often occurs: the attitude parameters actually obtained belong to the device's position at some earlier moment, which introduces errors into upper-layer applications. In robot path planning and autonomous driving, such errors can cause deviations in physical position and possibly serious consequences. In addition, mobile devices are generally battery-powered; burdened by such computing tasks, current mobile-phone AR and AR-glasses applications can overheat the body and sharply increase power consumption within minutes, so they cannot actually support long-term normal use. In the second category of applications, a memory-card device must be attached to each lens of the VR panoramic camera to store the captured video, and after shooting the videos are imported into processing software on a PC or server for attitude estimation. On the one hand, this greatly reduces attitude estimation efficiency; on the other hand, because a PC or server must be used alongside the camera, the front-end processing cost rises and hardware utilization indirectly falls. Furthermore, many types of cameras may currently be involved in camera pose estimation, and the video format, bit rate and resolution output by each camera differ greatly, so both local and centralized processing face low efficiency caused by the large number of adaptation targets.
Therefore, there is a need for an attitude estimation method, a server and a network device that can ensure frame-level synchronization during attitude calculation, achieve low time delay, and improve attitude estimation efficiency.
Disclosure of Invention
The embodiment of the invention provides an attitude estimation method, a server and a network device, which are used to solve the problems of frame-level desynchronization when the attitude of each frame is calculated in real time, and of the low efficiency caused by the large number of adaptation targets in local or centralized processing.
In order to solve the above technical problem, an embodiment of the present invention provides an attitude estimation method, which is applied to a service processing server, and includes:
acquiring at least one piece of video information sent by a network device through a Mobile Edge Computing (MEC) server of a base station;
performing video decoding according to the video information to obtain decoded video content;
calculating the attitude vector of the network equipment according to the video content and the equipment information of the network equipment;
sending the pose vector to the network device.
Preferably, the video decoding according to the video information includes:
when the network device is a device comprising a single camera, decoding the video information of the single camera;
when the network device is a device comprising at least two cameras, decoding the video information of each camera separately.
Preferably, when the network device includes a single-camera device, the calculating a pose vector of the network device according to the video content and device information of the network device includes:
acquiring video content obtained after the video information is decoded and a plurality of single-frame images of the video content;
acquiring image characteristics in the single-frame image;
estimating an attitude estimation vector of the network equipment according to the image characteristics;
acquiring the attitude vector of the network equipment according to the attitude estimation vector and Inertial Measurement Unit (IMU) information in the equipment information;
wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.
Preferably, when the network device includes at least two camera devices, the calculating a pose vector of the network device according to the video content and device information of the network device includes:
acquiring video content obtained after decoding the video information of each camera;
acquiring a plurality of video frames at the same moment after the video contents are synchronized;
acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object shot by at least two cameras and the two-dimensional coordinates of the object in each image; the at least two cameras are cameras whose video contents shot at the same moment have partially overlapping video frames;
and calculating the attitude vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates and the physical parameter information in the device information.
Preferably, after the calculating the pose vector of the network device according to the video content and the device information of the network device, the method further includes:
calculating a relative attitude vector between at least two cameras according to the relative attitude relationship between the at least two cameras;
processing the video contents according to the relative attitude vector to obtain processed video contents;
sending the processed video content to a cloud server;
wherein the relative attitude relationship is:
R = R_B · R_A^(-1)
T = T_B - R · T_A
wherein R is the rotation matrix in the relative attitude vector;
T is the translation vector in the relative attitude vector;
R_A is the rotation matrix of one of the cameras;
T_A is the translation vector of that camera;
R_B is the rotation matrix of the other camera;
T_B is the translation vector of the other camera.
The embodiment of the invention also provides an attitude estimation method, which is applied to network equipment and comprises the following steps:
acquiring at least one piece of video information shot by the network equipment;
sending the video information to a service processing server through a mobile edge computing MEC server of a base station;
receiving an attitude vector sent by the service processing server, wherein the attitude vector is calculated by the service processing server according to the video content obtained by decoding the video information and the device information of the network device.
The embodiment of the invention also provides a server, which is a service processing server and comprises a processor and a transceiver, wherein the processor is used for:
acquiring at least one piece of video information sent by a network device through a Mobile Edge Computing (MEC) server of a base station;
performing video decoding according to the video information to obtain decoded video content;
calculating the attitude vector of the network equipment according to the video content and the equipment information of the network equipment;
sending the pose vector to the network device.
Preferably, the processor is specifically configured to:
when the network device is a device comprising a single camera, decoding the video information of the single camera;
when the network device is a device comprising at least two cameras, decoding the video information of each camera separately.
Preferably, when the network device includes a single-camera device, the processor is specifically configured to:
acquiring video content obtained after the video information is decoded and a plurality of single-frame images of the video content;
acquiring image characteristics in the single-frame image;
estimating an attitude estimation vector of the network equipment according to the image characteristics;
acquiring the attitude vector of the network equipment according to the attitude estimation vector and Inertial Measurement Unit (IMU) information in the equipment information;
wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.
Preferably, when the network device includes at least two camera devices, the processor is specifically configured to:
acquiring video content obtained after decoding the video information of each camera;
acquiring a plurality of video frames at the same moment after the video contents are synchronized;
acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object shot by at least two cameras and the two-dimensional coordinates of the object in each image; the at least two cameras are cameras whose video contents shot at the same moment have partially overlapping video frames;
and calculating the attitude vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates and the physical parameter information in the device information.
Preferably, the processor is further configured to:
calculating a relative attitude vector between at least two cameras according to the relative attitude relationship between the at least two cameras;
processing the video contents according to the relative attitude vector to obtain processed video contents;
sending the processed video content to a cloud server;
wherein the relative attitude relationship is:
R = R_B · R_A^(-1)
T = T_B - R · T_A
wherein R is the rotation matrix in the relative attitude vector;
T is the translation vector in the relative attitude vector;
R_A is the rotation matrix of one of the cameras;
T_A is the translation vector of that camera;
R_B is the rotation matrix of the other camera;
T_B is the translation vector of the other camera.
An embodiment of the present invention further provides a network device, including a processor and a transceiver, where the transceiver is configured to:
acquiring at least one piece of video information shot by the network equipment;
sending the video information to a service processing server through a mobile edge computing MEC server of a base station;
receiving an attitude vector sent by the service processing server, wherein the attitude vector is calculated by the service processing server according to the video content obtained by decoding the video information and the device information of the network device.
The embodiment of the present invention further provides a server, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the attitude estimation method described above is implemented.
The embodiment of the present invention further provides a network device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the posture estimation method described above is implemented.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the posture estimation method described above.
Compared with the prior art, the attitude estimation method, the server and the network equipment provided by the embodiment of the invention at least have the following beneficial effects:
the method comprises the steps of calculating at least one piece of video information sent by an MEC server through a mobile edge of a base station by obtaining network equipment, carrying out video decoding according to the video information to obtain decoded video content, calculating an attitude vector of the network equipment according to the video content and equipment information of the network equipment, and sending the attitude vector to the network equipment, so that low time delay, high bandwidth and high processing capacity can be guaranteed.
Drawings
FIG. 1 is a flow chart of a method for estimating an attitude according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for estimating pose provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of camera imaging provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of an implementation structure of a server according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an implementation structure of a network device according to an embodiment of the present invention;
fig. 6 is a schematic diagram of another implementation structure of the server according to the embodiment of the present invention;
fig. 7 is a schematic diagram of another implementation structure of a network device according to an embodiment of the present invention;
FIG. 8 is a specific flowchart of the attitude estimation method according to the embodiment of the present invention;
fig. 9 is another specific flowchart of the attitude estimation method according to the embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
As shown in fig. 1, an embodiment of the present invention provides an attitude estimation method, which is applied to a service processing server, and specifically includes the following steps:
step S11, acquiring at least one video information sent by the network device through the mobile edge computing MEC server of the base station.
As shown in fig. 8, the network device collects the video information and sends it over the uplink channel of a 5G network to a 5G base station; the 5G base station sends the video information to a 5G UPF (5G User Plane Function) connected to it by physical cable; the 5G UPF sends the video information to an MEC (Mobile Edge Computing) server connected to it by physical cable; and the MEC server sends the video information to the service processing server. The air-interface delay between the network device and the 5G base station (about 1 ms) plus the physical-connection delay from the 5G base station to the service processing server (generally over optical fiber) may total roughly 5 ms to 10 ms or less. Moreover, the MEC server offers low time delay, high bandwidth and strong processing capability.
And step S12, performing video decoding according to the video information to obtain decoded video content.
After receiving at least one piece of video information, the service processing server performs video decoding on the video information to recover the compressed content, i.e., the decoded video content.
Step S13, calculating a pose vector of the network device according to the video content and the device information of the network device.
The device information includes IMU (Inertial Measurement Unit) information and physical parameter information. The physical parameter information may include the focal length, distortion parameters, and gyroscope or accelerometer values.
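As a concrete illustration of what such a device-information message might carry, a minimal sketch follows; the class and field names are illustrative assumptions, not a format defined by the patent:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class IMUInfo:
    acc: Tuple[float, float, float]    # accelerometer vector Acc = (Accx, Accy, Accz)
    gyro: Tuple[float, float, float]   # gyroscope vector Gyro = (Gyrox, Gyroy, Gyroz)

@dataclass
class CameraParams:
    fx: float                          # transverse (horizontal) focal length
    fy: float                          # longitudinal (vertical) focal length
    u0: float                          # distortion-related parameter u0
    v0: float                          # distortion-related parameter v0
    dist_coeffs: List[float] = field(default_factory=list)  # optional lens distortion

@dataclass
class DeviceInfo:
    imu: IMUInfo                       # used in the single-camera branch (EKF fusion)
    cameras: List[CameraParams]        # one entry per camera on the device
```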
Step S14, sending the attitude vector to the network device.
The step S12 specifically includes:
when the network device is a device comprising a single camera, decoding the video information of the single camera; when the network device is a device comprising at least two cameras, decoding the video information of each camera separately.
After the receiving-end management module of the service processing server receives the video information and the device information sent by a network device, it judges whether the network device is a single-camera device or a multi-camera device.
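A minimal sketch of this decode step, assuming each piece of video information is exposed as a stream URL or file that OpenCV can open (the transport from the MEC server is abstracted away):

```python
import cv2

def decode_video(stream_sources):
    """Decode one stream per camera: a single-camera device supplies one
    source, a multi-camera device one source per camera, each decoded
    separately. Yields one decoded frame per camera per step."""
    captures = [cv2.VideoCapture(src) for src in stream_sources]
    try:
        while True:
            frames = []
            for cap in captures:
                ok, frame = cap.read()
                if not ok:          # a stream ended: stop decoding
                    return
                frames.append(frame)
            yield frames
    finally:
        for cap in captures:
            cap.release()
```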
In the foregoing embodiment of the present invention, when the network device includes a single-camera device, the step S13 specifically includes:
acquiring video content obtained after the video information is decoded and a plurality of single-frame images of the video content; acquiring image characteristics in the single-frame image; estimating an attitude estimation vector of the network equipment according to the image characteristics; and acquiring the attitude vector of the network equipment according to the attitude estimation vector and Inertial Measurement Unit (IMU) information in the equipment information.
In the above embodiments of the present invention, the image features include FAST (features from accelerated segment test) and SIFT (scale-invariant feature transform) features.
After the FAST and SIFT image features are extracted, the image features of two frames (which may be adjacent frames) may be passed to an attitude estimation module in the service processing server. The attitude estimation module uses the change of the same image features between the two frames to calculate a rough camera motion matrix, consisting of rotation parameters r_1~r_9 and translation parameters t_1~t_3, i.e., the attitude estimation vector. The attitude estimation vector and the IMU information in the device information are input into an EKF (Extended Kalman Filter) in the service processing server; the EKF performs extended Kalman prediction and update on the attitude estimation vector, the accelerometer vector and the gyroscope vector to obtain the attitude vector of the single camera, namely the attitude vector of the network device. The IMU information includes an accelerometer vector Acc = (Accx, Accy, Accz) and a gyroscope vector Gyro = (Gyrox, Gyroy, Gyroz).
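The two-frame estimation step can be sketched with standard OpenCV primitives. This is an illustrative reconstruction rather than the patent's exact algorithm: it uses SIFT alone (FAST supplies keypoints but no descriptors) and omits the EKF fusion with the IMU vectors:

```python
import cv2
import numpy as np

def estimate_motion(img1, img2, K):
    """Rough camera motion between two (possibly adjacent) frames from
    matched image features: returns the rotation matrix (r1~r9) and a
    unit-scale translation (t1~t3), i.e. the attitude estimation vector."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match the same image features across the two frames (Lowe ratio test)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Essential matrix from the image change of the shared features
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```

In the full pipeline described above, R and t would then be fed to the EKF together with the Acc and Gyro vectors for prediction and update.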
In the above embodiment of the present invention, when the network device includes at least two camera devices, the step S13 specifically includes:
acquiring the video content obtained after decoding the video information of each camera; acquiring a plurality of video frames at the same moment after the video contents are synchronized; acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object shot by at least two cameras and the two-dimensional coordinates of the object in each image, where the at least two cameras are cameras whose video contents shot at the same moment have partially overlapping video frames; and calculating the attitude vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates and the physical parameter information in the device information.
After the decoded video content is obtained, the video contents are synchronized to ensure that video frames captured by the different cameras at the same moment can be obtained, and the physical parameter information of each camera is read (including the camera's transverse focal length f_x, longitudinal focal length f_y, and the parameters u_0 and v_0 that cause image distortion). A stereo calibration algorithm may then be employed to calculate the attitude vector of each camera.
The following describes the above stereo calibration algorithm by an embodiment:
The physical parameter information f_Ax, f_Ay, u_A0 and v_A0 of camera A and f_Bx, f_By, u_B0 and v_B0 of camera B are known, and the video contents shot by the two cameras at the same moment partially overlap. The two cameras shoot the same calibration board; the calibration board may use a 7×9 checkerboard, or any other checkerboard that meets the requirements. As shown in FIG. 3, P is a point or an object on the calibration board, and P_A and P_B are the images of P in the focal planes of cameras A and B, respectively; P may be any point on the calibration board. The three-dimensional coordinates X, Y, Z of a number of corner points P can be obtained from the calibration board, and the two-dimensional coordinates x_A, y_A of the image of point P in camera A and x_B, y_B of the image of point P in camera B can be obtained from the respective images. According to the imaging formula, the rotation parameters R_A = r_A1~r_A9 and translation parameters T_A = t_A1~t_A3 of camera A relative to the origin of the coordinate system, i.e., the attitude vector of camera A, and the rotation parameters R_B = r_B1~r_B9 and translation parameters T_B = t_B1~t_B3 of camera B, i.e., the attitude vector of camera B, can be calculated. From a camera's attitude vector, its displacement and rotation in three-dimensional space relative to its original position at any moment can be known.
The imaging formula is as follows:
s · (x, y, 1)^T = K · [R | t] · (X, Y, Z, 1)^T,
where K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]] is the intrinsic matrix, R is the 3×3 matrix formed by r_1~r_9, t = (t_1, t_2, t_3)^T, and s is a scale factor;
wherein x and y are the two-dimensional coordinates of the object in the picture or video frame captured by the camera;
X, Y and Z are the three-dimensional coordinates of the corresponding object in real physical space;
f_x is the transverse focal length of the camera;
f_y is the longitudinal focal length of the camera;
u_0 and v_0 are the camera parameters that cause image distortion;
r_1~r_9 form the rotation matrix;
t_1~t_3 form the translation vector.
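Given the imaging formula, the per-camera attitude vector of the checkerboard embodiment can be recovered from the 3-D corner coordinates and their 2-D images. A minimal OpenCV sketch follows; the 7×9 board size and the square size are assumptions for illustration:

```python
import cv2
import numpy as np

def camera_attitude_from_board(image, K, dist, board=(9, 7), square=0.02):
    """Attitude vector (rotation r1~r9 and translation t1~t3) of one
    camera relative to the calibration board's coordinate origin."""
    found, corners = cv2.findChessboardCorners(image, board)
    if not found:
        return None
    # 3-D corner coordinates (X, Y, Z): the board plane is Z = 0,
    # spaced by the physical square size in metres
    cols, rows = board
    obj = np.zeros((cols * rows, 3), np.float32)
    obj[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square
    # Solve the imaging equation for the extrinsics given K (fx, fy, u0, v0)
    ok, rvec, tvec = cv2.solvePnP(obj, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)   # axis-angle -> 3x3 rotation matrix r1~r9
    return R, tvec               # the camera's attitude vector
```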
When the network device includes at least two camera devices, the step S13 is followed by:
calculating a relative attitude vector between at least two cameras according to the relative attitude relationship between the at least two cameras; processing the video contents according to the relative attitude vector to obtain processed video contents; and sending the processed video content to a cloud server.
In the above embodiments of the present invention, the relative attitude relationship is:
R = R_B · R_A^(-1)
T = T_B - R · T_A
wherein R is the rotation matrix in the relative attitude vector;
T is the translation vector in the relative attitude vector;
R_A is the rotation matrix of one camera, namely the rotation matrix of camera A;
T_A is the translation vector of that camera, namely the translation vector of camera A;
R_B is the rotation matrix of the other camera, namely the rotation matrix of camera B;
T_B is the translation vector of the other camera, namely the translation vector of camera B.
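A direct sketch of this relative-attitude relationship, assuming R_A, R_B are 3×3 rotation matrices and T_A, T_B are 3×1 translation vectors:

```python
import numpy as np

def relative_attitude(R_A, T_A, R_B, T_B):
    """Relative attitude vector between two cameras from their
    individual attitude vectors: R = R_B * R_A^-1, T = T_B - R * T_A."""
    R = R_B @ R_A.T        # rotation matrices are orthogonal: inverse == transpose
    T = T_B - R @ T_A
    return R, T
```

With this relation, a point p_A expressed in camera A's frame maps into camera B's frame as p_B = R · p_A + T, which is what the subsequent stitching or 3D-generation step needs.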
As shown in fig. 9, if there are multiple cameras, the relative attitude vectors between cameras whose video contents shot at the same moment partially overlap can be calculated in sequence according to the overlap relationships among the cameras; the plurality of video contents are then processed according to the relative attitude vectors (e.g., panorama stitching, 3D image generation, multi-view image generation) to obtain the processed video content; the processed video content is sent through the MEC server to the 5G UPF and then forwarded by the 5G UPF to the cloud server, where the cloud server may include a cloud service, a CDN (Content Delivery Network), or a user player.
As shown in fig. 2, an embodiment of the present invention further provides an attitude estimation method, which is applied to a network device, and specifically includes the following steps:
step S21, acquiring at least one piece of video information shot by the network equipment;
step S22, sending the video information to a service processing server through a mobile edge computing MEC server of a base station;
step S23, receiving the attitude vector sent by the service processing server, wherein the attitude vector is calculated by the service processing server according to the video content obtained by decoding the video information and the device information of the network device.
The network device may include a mobile phone, a camera, a robot, a vehicle, etc., all having camera hardware and a video capture function. After the attitude vector sent by the service processing server is received, it can be provided to the network device in real time, which suits scenarios such as mobile-phone AR, AR glasses, robot path planning and autonomous driving.
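On the network-device side, the capture-send-receive loop might look like the following sketch; the endpoint URL, the HTTP transport and the JSON response shape are all illustrative assumptions, since the patent only requires a 5G uplink to the MEC server and does not fix an application protocol:

```python
import cv2
import requests

POSE_ENDPOINT = "http://mec-service.example/pose"   # hypothetical endpoint

def run_device(camera_index=0, device_info_json="{}"):
    cap = cv2.VideoCapture(camera_index)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # A real device would send an H.264/H.265 stream; a JPEG per
        # frame stands in for the video information here
        _, jpeg = cv2.imencode(".jpg", frame)
        resp = requests.post(
            POSE_ENDPOINT,
            files={"video": jpeg.tobytes()},
            data={"device_info": device_info_json},
        )
        pose = resp.json()   # attitude vector computed by the server
        # hand the attitude vector to the AR / path-planning layer here
        print(pose)
    cap.release()
```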
As shown in fig. 4, an embodiment of the present invention further provides a server, where the server is a service processing server, and includes a processor 401 and a transceiver 402, where the processor 401 is configured to:
acquiring at least one piece of video information sent by a network device through a Mobile Edge Computing (MEC) server of a base station;
performing video decoding according to the video information to obtain decoded video content;
calculating the attitude vector of the network equipment according to the video content and the equipment information of the network equipment;
sending the pose vector to the network device.
In an embodiment of the present invention, the processor 401 is specifically configured to:
when the network device is a device comprising a single camera, decoding the video information of the single camera;
when the network device is a device comprising at least two cameras, decoding the video information of each camera separately.
In an embodiment of the present invention, when the network device includes a single-camera device, the processor 401 is specifically configured to:
acquiring video content obtained after the video information is decoded and a plurality of single-frame images of the video content;
acquiring image characteristics in the single-frame image;
estimating an attitude estimation vector of the network equipment according to the image characteristics;
acquiring the attitude vector of the network equipment according to the attitude estimation vector and Inertial Measurement Unit (IMU) information in the equipment information;
wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.
In an embodiment of the present invention, when the network device includes at least two camera devices, the processor 401 is specifically configured to:
acquiring video content obtained after decoding the video information of each camera;
acquiring a plurality of video frames at the same moment after the video contents are synchronized;
acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object shot by at least two cameras and the two-dimensional coordinates of the object in each image; the at least two cameras are cameras whose video contents shot at the same moment have partially overlapping video frames;
and calculating the attitude vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates and the physical parameter information in the device information.
In an embodiment of the present invention, the processor 401 is further configured to:
calculating a relative attitude vector between at least two cameras according to the relative attitude relationship between the at least two cameras;
processing the video contents according to the relative attitude vector to obtain processed video contents;
sending the processed video content to a cloud server;
wherein the relative attitude relationship is:
R = R_B · R_A^(-1)
T = T_B - R · T_A
wherein R is the rotation matrix in the relative attitude vector;
T is the translation vector in the relative attitude vector;
R_A is the rotation matrix of one of the cameras;
T_A is the translation vector of that camera;
R_B is the rotation matrix of the other camera;
T_B is the translation vector of the other camera.
As shown in fig. 5, an embodiment of the present invention further provides a network device, which includes a processor 501 and a transceiver 502, where the transceiver 502 is configured to:
acquiring at least one piece of video information shot by the network equipment;
sending the video information to a service processing server through a mobile edge computing MEC server of a base station;
receiving an attitude vector sent by the service processing server, wherein the attitude vector is calculated by the service processing server according to the video content obtained by decoding the video information and the device information of the network device.
As shown in fig. 6, an embodiment of the present invention further provides another server, which includes a transceiver 601, a memory 602, a processor 600, and a computer program stored on the memory 602 and executable on the processor 600; the processor 600 calls and executes programs and data stored in the memory 602.
The transceiver 601 receives and transmits data under the control of the processor 600; in particular, the processor 600, by reading the program in the memory 602, may perform the following processes:
the processor 600 is configured to:
acquiring at least one piece of video information sent by a network device through a Mobile Edge Computing (MEC) server of a base station;
performing video decoding according to the video information to obtain decoded video content;
calculating the attitude vector of the network equipment according to the video content and the equipment information of the network equipment;
sending the pose vector to the network device.
In an embodiment of the present invention, the processor 600 is specifically configured to:
when the network device is a device comprising a single camera, decoding the video information of the single camera;
when the network device is a device comprising at least two cameras, decoding the video information of each camera separately.
In an embodiment of the present invention, when the network device includes a single-camera device, the processor 600 is specifically configured to:
acquiring video content obtained after the video information is decoded and a plurality of single-frame images of the video content;
acquiring image characteristics in the single-frame image;
estimating an attitude estimation vector of the network equipment according to the image characteristics;
acquiring the attitude vector of the network equipment according to the attitude estimation vector and Inertial Measurement Unit (IMU) information in the equipment information;
wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.
In an embodiment of the present invention, when the network device includes at least two camera devices, the processor 600 is specifically configured to:
acquiring video content obtained after decoding the video information of each camera;
acquiring a plurality of video frames at the same moment after the video contents are synchronized;
acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object shot by at least two cameras and the two-dimensional coordinates of the object in each image; the at least two cameras are cameras whose video contents shot at the same moment have partially overlapping video frames;
and calculating the attitude vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates and the physical parameter information in the device information.
In an embodiment of the present invention, the processor 600 is further configured to:
calculating a relative attitude vector between at least two cameras according to the relative attitude relationship between the at least two cameras;
processing the video contents according to the relative attitude vector to obtain processed video contents;
sending the processed video content to a cloud server;
wherein the relative attitude relationship is:
R = R_B · R_A^(-1)
T = T_B - R · T_A
wherein R is the rotation matrix in the relative attitude vector;
T is the translation vector in the relative attitude vector;
R_A is the rotation matrix of one of the cameras;
T_A is the translation vector of that camera;
R_B is the rotation matrix of the other camera;
T_B is the translation vector of the other camera.
In fig. 6, the bus architecture may include any number of interconnected buses and bridges linking together various circuits, in particular one or more processors represented by processor 600 and memory represented by memory 602. The bus architecture may also link together various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 601 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 600 is responsible for managing the bus architecture and general processing, and the memory 602 may store data used by the processor 600 in performing operations.
Those skilled in the art will understand that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program includes instructions for executing part or all of the steps of the above methods; and the program may be stored in a readable storage medium, which may be any form of storage medium.
As shown in fig. 7, an embodiment of the present invention further provides another network device, including: a processor 701; and a memory 703 connected to the processor 701 through a bus interface 702, where the memory 703 is used to store programs and data used by the processor 701 in executing operations, and the processor 701 calls and executes the programs and data stored in the memory 703.
The transceiver 704 is connected to the bus interface 702, and is configured to receive and transmit data under the control of the processor 701, and specifically, the processor 701 is configured to read a program in the memory 703, and may perform the following processes:
the transceiver 704 is configured to:
acquiring at least one piece of video information shot by the network equipment;
sending the video information to a service processing server through a mobile edge computing MEC server of a base station;
receiving an attitude vector sent by the service processing server, wherein the attitude vector is calculated by the service processing server according to the video content obtained by decoding the video information and the device information of the network device.
It should be noted that in fig. 7, the bus architecture may include any number of interconnected buses and bridges, with one or more processors represented by processor 701 and various circuits of memory represented by memory 703 being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 704 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium. The user interface 705 may also be an interface capable of interfacing with a desired device for different terminals, including but not limited to a keypad, display, speaker, microphone, joystick, etc. The processor 701 is responsible for managing the bus architecture and general processing, and the memory 703 may store data used by the processor 701 in performing operations.
Those skilled in the art will understand that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program includes instructions for executing part or all of the steps of the above methods; and the program may be stored in a readable storage medium, which may be any form of storage medium.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements each process in the above-described attitude estimation method embodiment, and can achieve the same technical effect, and is not described herein again to avoid repetition. The computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk or an optical disk.
In the embodiment of the invention, the camera attitude estimation process originally executed on the network-device side is moved into the 5G network, and with the MEC server's low time delay, high bandwidth and strong processing capability, camera attitude estimation and video processing are realized on the service processing server. For a device with multiple cameras, several originally separate pieces of video information can be gathered at the service processing server and processed in real time, avoiding the use of multiple memory cards, storage hard disks and similar equipment. And because the most computation-intensive camera (array) attitude estimation is performed by the service processing server, the network device only needs video shooting and streaming capabilities and no longer requires an accompanying terminal-side processing server or PC, which saves the hardware cost of the network device, avoids long-duration, high-percentage processor occupation, improves the usage efficiency of the network device, and reduces the risk of high power consumption. Moreover, real-time VR panoramic video stream generation, 3D video stream generation, AR low-level positioning, robot path planning and autonomous driving capabilities can be realized; the time delay between the network and the network device is extremely low, and the processing sits closer to the CDN, so video content processing efficiency and smoothness can be improved.
Furthermore, it is to be noted that in the device and method of the invention, it is obvious that the individual components or steps can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the present invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof, which can be implemented by those skilled in the art using their basic programming skills after reading the description of the present invention.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device. The computing device may be a general purpose device as is well known. The object of the invention is thus also achieved solely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future. It is further noted that in the method of the invention, it is obvious that the steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (15)

1. An attitude estimation method, applied to a service processing server, characterized by comprising:
acquiring at least one piece of video information sent by a network device through a Mobile Edge Computing (MEC) server of a base station;
performing video decoding according to the video information to obtain decoded video content;
calculating the attitude vector of the network equipment according to the video content and the equipment information of the network equipment;
sending the pose vector to the network device.
2. The pose estimation method of claim 1, wherein said video decoding according to the video information comprises:
when the network device is a device comprising a single camera, decoding the video information of the single camera;
when the network device is a device comprising at least two cameras, decoding the video information of each camera separately.
3. The pose estimation method of claim 1, wherein when the network device is a device comprising a single camera, the computing the pose vector of the network device from the video content and device information of the network device comprises:
acquiring video content obtained after the video information is decoded and a plurality of single-frame images of the video content;
acquiring image characteristics in the single-frame image;
estimating an attitude estimation vector of the network equipment according to the image characteristics;
acquiring the attitude vector of the network equipment according to the attitude estimation vector and Inertial Measurement Unit (IMU) information in the equipment information;
wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.
4. The pose estimation method of claim 1, wherein when the network device is a device including at least two cameras, the computing the pose vector of the network device from the video content and device information of the network device comprises:
acquiring video content obtained after decoding the video information of each camera;
acquiring a plurality of video frames at the same moment after the video contents are synchronized;
acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object shot by at least two cameras and the two-dimensional coordinates of the object in each image; the at least two cameras are cameras whose video contents shot at the same moment have partially overlapping video frames;
and calculating the attitude vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates and the physical parameter information in the device information.
5. The pose estimation method of claim 4, wherein after calculating the pose vector of the network device based on the video content and the device information of the network device, further comprising:
calculating a relative attitude vector between at least two cameras according to the relative attitude relationship between the at least two cameras;
processing the video contents according to the relative attitude vector to obtain processed video contents;
sending the processed video content to a cloud server;
wherein the relative attitude relationship is:
R = R_B · R_A^(-1)
T = T_B - R · T_A
wherein R is the rotation matrix in the relative attitude vector;
T is the translation vector in the relative attitude vector;
R_A is the rotation matrix of one of the cameras;
T_A is the translation vector of that camera;
R_B is the rotation matrix of the other camera;
T_B is the translation vector of the other camera.
6. An attitude estimation method, applied to a network device, characterized by comprising:
acquiring at least one piece of video information shot by the network equipment;
sending the video information to a service processing server through a mobile edge computing MEC server of a base station;
receiving an attitude vector sent by the service processing server, wherein the attitude vector is calculated by the service processing server according to the video content obtained by decoding the video information and the device information of the network device.
7. A server, the server being a traffic processing server comprising a processor and a transceiver, wherein the processor is configured to:
acquiring at least one piece of video information sent by a network device through a Mobile Edge Computing (MEC) server of a base station;
performing video decoding according to the video information to obtain decoded video content;
calculating the attitude vector of the network equipment according to the video content and the equipment information of the network equipment;
sending the pose vector to the network device.
8. The server of claim 7, wherein the processor is specifically configured to:
when the network device is a device comprising a single camera, decoding the video information of the single camera;
when the network device is a device comprising at least two cameras, decoding the video information of each camera separately.
9. The server of claim 7, wherein when the network device is a device comprising a single camera, the processor is specifically configured to:
acquiring video content obtained after the video information is decoded and a plurality of single-frame images of the video content;
acquiring image characteristics in the single-frame image;
estimating an attitude estimation vector of the network equipment according to the image characteristics;
acquiring the attitude vector of the network equipment according to the attitude estimation vector and Inertial Measurement Unit (IMU) information in the equipment information;
wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.
10. The server of claim 7, wherein when the network device is a device comprising at least two cameras, the processor is specifically configured to:
acquiring video content obtained after decoding the video information of each camera;
acquiring a plurality of video frames at the same moment after the video contents are synchronized;
acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object shot by at least two cameras and the two-dimensional coordinates of the object in each image; the at least two cameras are cameras whose video contents shot at the same moment have partially overlapping video frames;
and calculating the attitude vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates and the physical parameter information in the device information.
11. The server of claim 10, wherein the processor is further configured to:
calculating a relative attitude vector between at least two cameras according to the relative attitude relationship between the at least two cameras;
processing the video contents according to the relative attitude vector to obtain processed video contents;
sending the processed video content to a cloud server;
wherein the relative attitude relationship is:
R = R_B · R_A^(-1)
T = T_B - R · T_A
wherein R is the rotation matrix in the relative attitude vector;
T is the translation vector in the relative attitude vector;
R_A is the rotation matrix of one of the cameras;
T_A is the translation vector of that camera;
R_B is the rotation matrix of the other camera;
T_B is the translation vector of the other camera.
12. A network device comprising a processor and a transceiver, wherein the transceiver is configured to:
acquiring at least one piece of video information shot by the network equipment;
sending the video information to a service processing server through a mobile edge computing MEC server of a base station;
receiving an attitude vector sent by the service processing server, wherein the attitude vector is calculated by the service processing server according to the video content obtained by decoding the video information and the device information of the network device.
13. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the attitude estimation method according to any one of claims 1 to 5 when executing the program.
14. A network device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the attitude estimation method according to claim 6 when executing the program.
15. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the attitude estimation method according to any one of claims 1 to 5 or claim 6.
CN201910010651.1A 2019-01-07 2019-01-07 Attitude estimation method, server and network equipment Pending CN111417016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910010651.1A CN111417016A (en) 2019-01-07 2019-01-07 Attitude estimation method, server and network equipment

Publications (1)

Publication Number Publication Date
CN111417016A (en) 2020-07-14

Family

ID=71494081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910010651.1A Pending CN111417016A (en) 2019-01-07 2019-01-07 Attitude estimation method, server and network equipment

Country Status (1)

Country Link
CN (1) CN111417016A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710932A (en) * 2009-12-21 2010-05-19 深圳华为通信技术有限公司 Image stitching method and device
CN102175261A (en) * 2011-01-10 2011-09-07 深圳大学 Visual measuring system based on self-adapting targets and calibrating method thereof
CN104501814A (en) * 2014-12-12 2015-04-08 浙江大学 Attitude and position estimation method based on vision and inertia information
US20180075614A1 (en) * 2016-09-12 2018-03-15 DunAn Precision, Inc. Method of Depth Estimation Using a Camera and Inertial Sensor
CN106846415A (en) * 2017-01-24 2017-06-13 长沙全度影像科技有限公司 Multichannel fisheye camera binocular calibration device and method
CN107728617A (en) * 2017-09-27 2018-02-23 速感科技(北京)有限公司 Multi-camera online calibration method, mobile robot and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220300A (en) * 2021-02-01 2022-03-22 黄华 Visual intelligent interactive teaching and examination system and method by utilizing augmented reality wearing equipment
CN117294832A (en) * 2023-11-22 2023-12-26 湖北星纪魅族集团有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN117294832B (en) * 2023-11-22 2024-03-26 湖北星纪魅族集团有限公司 Data processing method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-07-14