Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present invention are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the invention and should not be taken as limiting the invention with regard to other embodiments that are not detailed herein.
In the description that follows, embodiments of the invention are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. It will thus be appreciated that such steps and operations, which are at times referred to as being computer-executed, include the manipulation by a computer processing unit of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the computer's operation in a manner well known to those skilled in the art. The data is maintained in a data structure, that is, a physical location in memory that has particular characteristics defined by the format of the data. However, while the principles of the invention are described in the foregoing context, this is not meant to be limiting, since those skilled in the art will recognize that various steps and operations described below may also be implemented in hardware.
The video data transmission method and the video data transmission device based on video stream data can be deployed in any electronic device. They construct a topological relation among the triangular meshes in a triangular mesh set according to the boundary of a first reference triangular mesh and the positional relation between the first reference triangular mesh and the other triangular meshes, and then encode that relation to obtain target video stream data. Such electronic devices include, but are not limited to, wearable devices, head-mounted devices, medical health platforms, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The video data transmission device based on video stream data is preferably an image processing terminal or a server for transmitting the video data. It constructs the topological relation among the triangular meshes in the triangular mesh set based on the boundary of the first reference triangular mesh and its positional relation to the other triangular meshes, and encodes the topological relation and the vertex coordinates into character strings in a preset format to obtain the target video stream data. Constructing the topological relation among the triangular meshes in this way effectively reduces the data size occupied by the volumetric video.
Referring to fig. 1, fig. 1 is a flowchart illustrating a video data transmission method based on video stream data according to an embodiment of the invention. The video data transmission method based on video stream data of the present embodiment may be implemented using the electronic device, and the video data transmission method based on video stream data of the present embodiment includes:
step 101, acquiring three-dimensional data information of a target video;
step 102, determining a first reference triangular mesh in the triangular mesh set;
step 103, constructing a topological relation between triangular meshes in the triangular mesh set according to the boundary of the first reference triangular mesh and the position relation between the first reference triangular mesh and other triangular meshes;
step 104, encoding the topological relation and the vertex coordinates into character strings in a preset format, and storing the character strings obtained by encoding into video stream data of a target video to obtain target video stream data;
step 105, when receiving the data transmission operation triggered by the target video, sending the target video stream data to a data receiving end.
The video data transmission method based on video stream data of the present embodiment is explained in detail below.
In step 101, the video data transmission apparatus acquires the three-dimensional data information of the target video. The three-dimensional data information comprises a triangular mesh set and a vertex coordinate set of a three-dimensional model. The three-dimensional model is composed of a group of polygonal patches in three-dimensional space, each group of patches comprising a plurality of connected polygons, a polygon being a closed figure formed by three or more line segments connected sequentially end to end. Preferably, the polygons in the three-dimensional model are triangles. The three-dimensional model can represent real or imaginary objects, including but not limited to three-dimensional maps, three-dimensional devices, three-dimensional characters, three-dimensional games, and the like.
A triangular mesh is a kind of polygonal mesh, also called "Mesh", a data structure used in computer graphics for modeling various irregular objects. The surface of an object in the real world is intuitively formed by curved surfaces; in the computer world, only discrete structures can be used to approximate real, continuous things, so real-world surfaces are represented in a computer by numerous small polygonal patches. A computer-rendered curved surface looks very smooth to the eye, but internally the shape is built from a large number of small triangular pieces. The collection of such patches is called a Mesh. A Mesh can be composed of triangles, or of other planar shapes such as quadrangles and pentagons; since planar polygons can in practice be subdivided into triangles, it is common to represent the surface of an object with a triangular mesh (Triangle Mesh) composed entirely of triangles. A vertex is the junction of three or more faces in a polyhedron; in a three-dimensional model, the vertices of each polygon are vertices of the three-dimensional model, and the vertex coordinates are three-dimensional coordinates, such as (x, y, z). A texture is a picture in two-dimensional space, actually a two-dimensional array whose elements are color values; an individual color value is referred to as a texture element, or texel. Each texel has a unique address in the texture, i.e., its texture coordinates, which are two-dimensional coordinates and can be represented as (u, v). In a three-dimensional model a certain number of vertex coordinates are shared, i.e., one vertex coordinate may correspond to a plurality of texture coordinates; typically, at an inflection point of the three-dimensional model, multiple texture coordinates share one vertex coordinate. Thus, in a three-dimensional model, the number of texture coordinates is greater than the number of vertex coordinates.
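The three-dimensional data information described above — a vertex coordinate set plus a triangular mesh set, with optional texture coordinates — can be sketched as a minimal data structure. The field names here are illustrative assumptions, not part of any standard:

```python
from dataclasses import dataclass, field

@dataclass
class TriangleMesh:
    vertices: list        # [(x, y, z), ...] vertex coordinates
    triangles: list       # [(i, j, k), ...] indices into `vertices`
    tex_coords: list = field(default_factory=list)  # [(u, v), ...]

# A single-triangle example model
mesh = TriangleMesh(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    triangles=[(0, 1, 2)],
)
```

Indexing vertices from the triangle list is what allows several triangles (or texture coordinates) to share one vertex coordinate, as noted above.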
In the present application, the point coordinates, the normal coordinates and the texture coordinates of a triangular mesh are used to represent its points, normals and texture, for example the triangular mesh record "f 66/229/66 42/231/42 62/230/62", where 66, 42 and 62 are serial numbers of point coordinates and 229, 231 and 230 are serial numbers of normal coordinates; the corresponding point coordinates, normal coordinates and texture coordinates can be found by table lookup or similar means.
Specifically, the point coordinates, the normal coordinates and the texture coordinates in the three-dimensional model can be coded into a Draco buffer in the standard Draco manner. A point is represented as "v 0.026924158697600634 -0.015117654524685747 -0.21916531359023442", where v is an identifier meaning vertex point, and the following three numbers are the X, Y and Z coordinates in the world coordinate system. A normal is represented as "vn -0.95458675737250021 0.26521292533283047 0.13574323882042055", where vn is an identifier meaning vertex normal, and the following three numbers are the normal components in the X, Y and Z directions. A texture coordinate is represented as "vt 0.090416959507715916 0.75024171767928871", where vt is an identifier meaning vertex texture, and the following two numbers represent the X and Y coordinates of the texture; it should be noted that the X and Y texture coordinate values both lie in the range 0-1.
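The "v"/"vn"/"vt" records above follow a simple identifier-plus-floats layout, so parsing them can be sketched in a few lines. The helper name is illustrative, not part of Draco or any standard library:

```python
def parse_record(line):
    """Split an OBJ-style record into its identifier and float values.

    "v"  -> 3 world coordinates, "vn" -> 3 normal components,
    "vt" -> 2 texture coordinates in the range 0-1.
    """
    parts = line.split()
    return parts[0], [float(x) for x in parts[1:]]

tag, coords = parse_record(
    "v 0.026924158697600634 -0.015117654524685747 -0.21916531359023442")
assert tag == "v" and len(coords) == 3

tag, uv = parse_record("vt 0.090416959507715916 0.75024171767928871")
assert tag == "vt" and all(0.0 <= c <= 1.0 for c in uv)
```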
In step 102, the video data transmission apparatus determines a first reference triangular mesh in the set of triangular meshes.
Specifically, the video data transmission apparatus may randomly select a triangular mesh from the triangular mesh set and determine the selected triangular mesh as the first reference triangular mesh. Alternatively, the video data transmission apparatus may determine the first reference triangular mesh from the triangular mesh set according to a preset policy: for example, randomly select a triangular mesh from the outermost triangular meshes of the closed graph and determine it as the first reference triangular mesh; or determine a triangular mesh coordinate corresponding to the preset policy and determine the first reference triangular mesh in the triangular mesh set based on that coordinate. The choice depends on the actual situation and is not detailed again here.
In step 103, the video data transmission apparatus constructs a topological relation between the triangular meshes in the triangular mesh set according to the boundary of the first reference triangular mesh and the position relation between the first reference triangular mesh and other triangular meshes.
Referring to fig. 2, fig. 2 is a flowchart illustrating a step 103 of a video data transmission method based on video stream data according to an embodiment of the present invention. This step 103 comprises:
step 201, determining a triangular mesh associated with the boundary of a first reference triangular mesh in a triangular mesh set to obtain an associated triangular mesh;
step 202, according to the relative position between the boundary of the first reference triangular mesh and the associated triangular mesh, constructing a topological relation between the triangular meshes in the triangular mesh set.
In the invention, a triangular mesh lying in the same plane may be determined as a triangular mesh associated with the boundary of the first reference triangular mesh; the topological relation between the triangular meshes in the triangular mesh set is then constructed according to the relative position between the boundary of the first reference triangular mesh and the associated triangular meshes, such as an associated triangular mesh A on the right side of the boundary of the first reference triangular mesh and an associated triangular mesh B on the upper side of that boundary. Here, the topological relation refers to the mutual relation among all spatial data satisfying the principles of topological geometry.
It should be noted that, in order to avoid large deformation of the three-dimensional model during subsequent decoding, the triangular mesh associated with the boundary of the first reference triangular mesh may be determined in the triangular mesh set based on the included angle between each triangular mesh and the plane in which the first reference triangular mesh is located; specifically, a triangular mesh whose included angle is smaller than a preset value may be determined as a triangular mesh associated with the boundary of the first reference triangular mesh. The preset value may be set according to the actual situation and is not detailed again here.
Further, in step 202, the topological relation between the triangular meshes in the triangular mesh set may be constructed based on the region growing principle and the relative positions between the boundary of the first reference triangular mesh and the associated triangular meshes. Please refer to fig. 3, which is a flowchart of step 202 of an embodiment of the video data transmission method based on video stream data of the present invention. Step 202 includes:
Step 301, selecting a target boundary from the boundaries of the first reference triangular mesh;
step 302, determining an adjacent triangular mesh adjacent to the first reference triangular mesh by taking the target boundary as a reference;
step 303, recording the topological relation between the first reference triangular mesh and the adjacent triangular mesh by adopting a preset identifier, determining the adjacent triangular mesh as the updated first reference triangular mesh, and returning to the step of selecting a target boundary among the boundaries of the first reference triangular mesh until the topological relation between the triangular meshes in the triangular mesh set has been recorded by means of the identifiers.
Specifically, after all the triangular meshes are determined in step 101, the connection relationship between the triangular faces can be determined by using a preset coding mode, that is, the topological relation between the triangular meshes in the triangular mesh set is constructed. After the plurality of triangular faces are determined, the connection relationship between them is determined; the preset coding mode used here is as follows:
at initialization, all vertices and triangles are marked as not visited. As shown in FIG. 4, when traversing to triangle X, if item point v has not been visited, then symbol S is output and triangle X and vertex v are marked as visited, and then the triangle to the right of triangle X is traversed, as shown by the arrow in the figure. If vertex v has been visited, there are four possibilities 1, depending on the visited situation for the left triangle and the right triangle of triangle X when there is no triangle on the right of triangle X, the next triangle is visited from the left of the incoming edge of triangle X, at which point the symbol L is output, as shown. 2. When no triangle exists to the left of triangle X, the next triangle is visited from the right of the incoming edge of triangle X, at which point the symbol R is output. 3. If there is a triangle on both the left and right of triangle X, symbol D is output, and then traversal is performed starting with the triangle on the right of triangle X, at which point a recursive call is generated. 4. If both the left and right triangles have been visited, the symbol E is output and the recursive call is returned. In all four cases, triangle X is marked as visited.
Because the symbol E lies at the outer side of the structure, only one triangle is connected to it, whereas the R, L and D structures, each connected to two triangles, feed back comparatively little connection data. Therefore, when arranging the path structure, the condition S ≥ E > D should be satisfied as far as possible, so as to avoid generating more E symbols at ring structures in the model (in order to reduce the number of paths and increase the connection data between triangular faces). The E symbols of a ring structure can be reduced by arranging S or D, but the number of D structures cannot be larger than the number of S structures.
Further, please refer to fig. 5, which is a schematic diagram of constructing the topological relation in the video data transmission method based on video stream data of the present invention. S (Start) identifies the starting point; L (Left) identifies a triangle that can be entered on the left side with reference to the entering edge; R (Right) identifies a triangle that can be entered on the right side with reference to the entering edge; D (Double) identifies a triangle that can be entered on either side with reference to the entering edge, the right side being chosen; and E (End) indicates that the final triangle has been reached. The path design of the right diagram follows from that of the left diagram: the graph in fig. 5 may be encoded as {SLLLLRESLLLLE}, as shown in the left diagram of fig. 5, or as SLLLLRLDLLLLE, as shown in the right diagram of fig. 5.
In step 104, the video data transmission apparatus encodes the topological relation and the vertex coordinates into a character string in a preset format, and stores the encoded character string in video stream data of the target video to obtain target video stream data. Specifically, the topological relation and the vertex coordinates can be respectively encoded into a binary character string, and the binary character string obtained by encoding is converted into a character string in a Base64 format according to a preset mapping relation. For example, for a triangular mesh with a topological relation of SRDRLESRRRLE, the topological relation can be converted into a binary:
010000110101001001010011010100100100110001000101010000110101001001010010010100100100110001000101; the coordinate 0.026924158697600634 can likewise be converted into the binary 0011000000101110001100000011001000110110001110010011001000110100001100010011010100111000001101100011100100110111001101100011000000110000001101100011001100110100. The geometric information is then encoded as Base64: for example, the binary string "010000110101" becomes "Q1" after conversion to Base64 according to the table shown in fig. 6.
The video data transmission device encodes the topological relation and the vertex coordinates into character strings in a preset format, and then stores the encoded character strings into video stream data of a target video to obtain target video stream data.
It should be noted that, in video transmission, each frame of data is a Network Abstraction Layer Unit (NALU), and the NALU is used to store encoded video information and other additional information, that is, a character string obtained by encoding is stored in the Network Abstraction Layer Unit of video stream data, so as to obtain target video stream data.
It should be noted that the encoded video Sequence of h.264 includes a series of NALUs, where each NALU includes an Extended Byte Sequence Payload (EBSP) and a set of NALU Header information corresponding to video encoding, i.e., NALU Header + EBSP.
Here, compared with the raw data byte stream (Raw Byte Sequence Payload, RBSP — a sequence of bytes carrying syntax elements), the EBSP contains one additional byte used to prevent start-code emulation: 0x03.
Since the start code of a NALU is 0x000001 or 0x00000001, and h.264 specifies that detecting 0x000000 may indicate the end of the current NALU, h.264 provides an "emulation prevention" mechanism: after the encoder has encoded a NAL, it detects whether any of the sequences 0x000000, 0x000001, 0x000002 or 0x000003 appears inside the NALU. When such a sequence is detected, the encoder inserts a new byte immediately before its last byte: 0x03. Accordingly, after receiving the h.264 code stream, the decoder must detect whether the sequence 0x000003 appears in the EBSP and, if so, remove the 0x03, thereby obtaining the RBSP.
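The emulation prevention mechanism just described can be sketched as a pair of helpers — one for the encoder side (insert 0x03 before the last byte of any 0x000000–0x000003 sequence) and one for the decoder side (strip 0x03 from 0x000003). This is a simplified sketch of the h.264 rule, not a full bitstream implementation:

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    """Encoder side: RBSP -> EBSP."""
    out, zeros = bytearray(), 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)      # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """Decoder side: EBSP -> RBSP."""
    out, zeros = bytearray(), 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0             # drop the inserted 0x03
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, the RBSP bytes `00 00 01` become the EBSP bytes `00 00 03 01`, and removal restores the original sequence.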
The header of the NAL unit defines the type of the RBSP unit, and the rest of the NAL unit is the RBSP data. After decoding the RBSP data, an original data bit string (String Of Data Bits, SODB) can be extracted from it. Since the length of the SODB is not necessarily a multiple of 8, padding is required, that is, RBSP = SODB + RBSP trailing bits. The basic structure of the RBSP is thus the original encoded data followed by trailing bits, i.e., a bit "1" and several bits "0" appended at the end of the original encoded data.
Furthermore, the decoder can remove the RBSP trailing bits according to the corresponding h.264 syntax to obtain the SODB, and then parse the values of the syntax elements according to the syntax of the NALU of the corresponding type, thereby reconstructing the image from those syntax elements.
It should be noted that, when encoding the video data, the encoder may store information for assisting decoding and display in Supplemental Enhancement Information (SEI); for example, encoder parameters, video copyright information, and clipping events (events causing scene switching) in the content generation process may be stored in the SEI to generate self-defined SEI data, which is embedded into the video code stream, thereby identifying the video code stream, for instance by adding an invisible watermark.
The above encoding process is a picture-frame encoding process. To further improve data transmission efficiency, the present application determines whether the current picture frame is a key frame (geometric key frame) based on the texture change of the picture. The determination condition is: acquire the pixel-point texture change values of a plurality of picture pixel points of the current picture frame at a set pixel interval; when the number of picture pixel points whose texture change value is greater than a first set value exceeds a second set value (a large local change), or the average of the pixel-point texture change values is greater than a third set value (a large overall change), the current frame is determined to be a key frame; otherwise, the current frame is determined to be a non-key frame. That is, optionally, in some embodiments, the step of determining the key frames and non-key frames may specifically include: determining the current picture frame corresponding to the current picture in the target video; acquiring the texture change values of pixel points among a plurality of picture pixel points of the current picture frame; and marking the key frames and non-key frames of the target video according to the texture change values.
Before the key frames and non-key frames of the target video are marked, the texture coordinates can be unified by means of ICP (Iterative Closest Point) registration and non-edge tracking, that is, the texture coordinates are brought into the same coordinate system, so that the texture change values of pixel points among a plurality of picture pixel points of the current picture frame can subsequently be acquired. All video frames of the target video are then traversed, the corresponding key frames and non-key frames are determined according to the pixel-point texture change values, and the key frames and non-key frames of the target video are marked.
Specifically, it may first be detected whether the global texture change value is greater than the first set value; when it is, the local texture change value corresponding to a local picture in the current picture frame is acquired. When the local texture change value is detected to be greater than the second set value, the current picture frame is determined to be a key frame and marked as such; when the local texture change value is detected to be less than or equal to the second set value, the current picture frame is determined to be a non-key frame and marked as such.
For example, let the first set value be 5 and the second set value be 20. When the overall texture change value is 10, the local texture change value corresponding to a local picture in the current picture frame is detected; for example, the local texture change value corresponding to a local picture P composed of pixel points A1, A2, A3, A4 and A5 is acquired. That is, a plurality of pixel points are selected in the current picture frame and determined as target pixel points, and the texture change value between the target pixel points is determined as the local texture change value corresponding to the local picture in the current picture frame. When the local texture change value of the local picture P is 30, the current picture frame is determined to be a key frame and marked as such; when the local texture change value of the local picture P is 15, the current picture frame is determined to be a non-key frame and marked as such.
It should be noted that, if the picture frame of the current picture is a key frame, the current picture frame may be marked with its texture coordinates, for example "vt 0.090416959507715916 0.75024171767928871"; if the picture frame of the current picture is a non-key frame, "{ vt 1 }" may be written in its texture coordinates, so that the texture information of the picture frame can be compressed and simplified.
It should further be noted that, when the overall texture change value is detected to be greater than the third set value, the current picture frame is determined to be a key frame; when the texture change value of the target pixel points is detected to be less than or equal to the third set value, the current picture frame is determined to be a non-key frame. The first, second and third set values all decrease as the number of consecutive non-key frames increases: since a small change in the picture might otherwise cause the current frame to be classified as a non-key frame, all the set values are made dynamic so that a small change in the picture can be recognized within a short time. The degree of reduction may be a fixed value, e.g. 10%, or a variable value such as 10% × n (n being the number of consecutive non-key frames).
Moreover, when the number of consecutive non-key frames is greater than or equal to a fourth set value, the current frame is directly set as a key frame; that is, if the current picture frame is a non-key frame and the number of consecutive non-key frames corresponding to it is greater than the fourth set value, the current picture frame is determined to be a key frame. It should be noted that the geometric encoding process described above is performed after a key frame is determined.
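The key-frame decision of the preceding paragraphs can be sketched as follows. The threshold names, the example values (5 and 20 from the worked example), the 10%-per-frame threshold decay, and the forced key frame after too many consecutive non-key frames are assumptions drawn from the text, not a definitive implementation:

```python
def is_key_frame(global_change, local_change, n_nonkey,
                 first=5.0, second=20.0, fourth=30):
    """Classify the current picture frame as key / non-key.

    n_nonkey: number of consecutive non-key frames seen so far.
    """
    if n_nonkey >= fourth:
        return True                         # force a key frame
    # Dynamic thresholds: shrink by 10% per consecutive non-key frame
    decay = max(1.0 - 0.10 * n_nonkey, 0.0)
    if global_change > first * decay:
        # Global change is large enough; decide on the local change
        return local_change > second * decay
    return False
```

With the example values from the text, a global change of 10 and a local change of 30 yield a key frame, while a local change of 15 yields a non-key frame.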
In step 105, when receiving a data transmission operation triggered for the target video, the video data transmission apparatus sends the target video stream data to a data receiving end. For example, a user triggers a video playing operation for the target video on the display screen of a mobile phone; the mobile phone responds to the video playing operation, and for the data transmission operation triggered for the target video, the video data transmission apparatus sends the target video stream data to the mobile phone. The mobile phone then receives and decodes the target video stream data. The decoding process is as follows: the mobile phone obtains the SEI information in the raw byte sequence payload from the NAL unit and extracts the Base64-encoded information from the SEI. The first decoding pass converts the Base64 back into binary; for example, the Base64 "Q1" is converted into the binary "010000110101", and the binary character string is then converted into character information: SLLLLRESLLLLE and the vertex coordinates. SLLLLRESLLLLE is then decoded; this second decoding pass restores the compressed triangle information, that is, according to the sequence of SLLLLRESLLLLE, the coordinates of three points are taken in turn from the point coordinate list, each subsequent triangle reusing two points of the previous triangle, and the original geometric information is finally obtained by decoding, so that the target video can be played.
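The receiver-side decoding just described can be sketched as two passes: Base64 back to the symbol string, then triangles rebuilt by taking three fresh points for an S symbol and reusing two points of the previous triangle otherwise. The exact edge-reuse rule of the real decoder is simplified here and should be treated as an assumption:

```python
import base64

def decode_symbols(b64_text):
    """First pass: Base64 text back to the topology symbol string."""
    return base64.b64decode(b64_text).decode("ascii")

def rebuild_triangles(symbols, points):
    """Second pass: rebuild triangles from symbols and a point list."""
    triangles, idx, prev = [], 0, None
    for sym in symbols:
        if sym == "E":
            prev = None                  # end of a strip
            continue
        if sym == "S" or prev is None:
            tri = (points[idx], points[idx + 1], points[idx + 2])
            idx += 3                     # three fresh points
        else:                            # L / R / D symbols
            tri = (prev[1], prev[2], points[idx])
            idx += 1                     # one fresh point, two reused
        triangles.append(tri)
        prev = tri
    return triangles
```

For instance, rebuilding from the symbols "SLL" over the point list 0, 1, 2, … yields the triangles (0, 1, 2), (1, 2, 3), (2, 3, 4), each sharing two points with its predecessor.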
Similarly, the information of the video frame may be encoded in the above manner, and please refer to the foregoing embodiment specifically, which is not described herein again. This completes the video data transmission process of the video data transmission method based on the video stream data of the present embodiment.
The video data transmission method based on video stream data of the embodiment constructs a topological relation between triangular meshes in a triangular mesh set based on the boundary of a first reference triangular mesh and the position relation between the first reference triangular mesh and other triangular meshes, and codes the topological relation and vertex coordinates into character strings in a preset format to further obtain target video stream data.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a video data transmission apparatus according to the present invention, which can be implemented by using the video data transmission method according to the present invention. The video data transmission apparatus 60 based on video stream data of this embodiment includes an obtaining module 601, a determining module 602, a constructing module 603, an encoding module 604, a storing module 605 and a sending module 606, which are specifically as follows:
an obtaining module 601, configured to obtain three-dimensional data information of a target video, where the three-dimensional data information includes a triangular mesh set and a vertex coordinate set of a three-dimensional model;
a determining module 602 for determining a first reference triangular mesh in the set of triangular meshes;
a constructing module 603, configured to construct a topological relation between triangular meshes in a triangular mesh set according to a boundary of a first reference triangular mesh and a position relation between the first reference triangular mesh and other triangular meshes;
the encoding module 604 is configured to encode the topological relation and the vertex coordinates into a character string in a preset format;
the storage module 605 is configured to store the encoded character string into video stream data of the target video, so as to obtain target video stream data;
a sending module 606, configured to send target video stream data to a data receiving end when receiving a data transmission operation triggered for a target video.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a building module of an embodiment of an apparatus for transmitting video data based on video stream data according to the present invention, where the building module 603 includes a determining unit 6031 and a building unit 6032.
The determining unit 6031 is configured to determine a triangular mesh associated with a boundary of the first reference triangular mesh in the triangular mesh set, to obtain an associated triangular mesh; a constructing unit 6032, configured to construct a topological relation between the triangular meshes in the triangular mesh set according to the relative positions between the boundary of the first reference triangular mesh and the associated triangular mesh.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a constructing unit of an embodiment of a video data transmission apparatus based on video stream data according to the present invention. The construction unit 6032 includes a selection subunit 6032A, a determination subunit 6032B, and a recording subunit 6032C.
A selection subunit 6032A, configured to select a target boundary among the boundaries of the first reference triangular mesh; a determination subunit 6032B, configured to determine, with the target boundary as a reference, an adjacent triangular mesh adjacent to the first reference triangular mesh; and a recording subunit 6032C, configured to record a topological relation between the first reference triangular mesh and the adjacent triangular mesh by using a preset identifier, determine the adjacent triangular mesh as the updated first reference triangular mesh, and return to the step of selecting the target boundary among the boundaries of the first reference triangular mesh until the topological relation between the triangular meshes in the triangular mesh set is recorded by using the identifiers.
Further, the recording subunit 6032C may specifically be configured to: when it is detected that all three boundaries of a triangular mesh in the triangular mesh set are adjacent to other triangular meshes, determine that triangular mesh as a reference triangular mesh; select a second reference triangular mesh from the triangular meshes adjacent to the reference triangular mesh, and delete the topological relation between the reference triangular mesh and the second reference triangular mesh; and record the topological relation between the second reference triangular mesh and its adjacent triangular meshes by using a preset identifier, and record the topological relation between the reference triangular mesh and its adjacent triangular meshes by using a preset branch identifier.
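The boundary-by-boundary traversal described above resembles connectivity-compression schemes that replace explicit vertex-index triples with one traversal symbol per step. The following is a minimal illustrative sketch only; the identifier alphabet ('N' for a neighbour step, 'B' for stepping back at a branch), the adjacency representation, and the function name are all hypothetical and are not the claimed implementation:

```python
def record_topology(adjacency, start=0):
    """Walk the mesh from a first reference triangular mesh, recording
    one identifier per traversal step instead of three vertex indices
    per triangle.  adjacency[t] lists the triangles sharing a boundary
    (edge) with triangle t.  Identifiers are illustrative:
    'N' = advance to an unvisited adjacent triangle,
    'B' = step back because every boundary touches a recorded triangle."""
    visited = {start}
    stack = [start]           # triangles whose neighbours may remain
    identifiers = []
    while stack:
        current = stack[-1]
        unvisited = [t for t in adjacency[current] if t not in visited]
        if unvisited:
            nxt = unvisited[0]        # pick a target boundary
            visited.add(nxt)
            identifiers.append('N')   # record topological relation
            stack.append(nxt)         # nxt is the updated reference mesh
        else:
            identifiers.append('B')   # all boundaries already recorded
            stack.pop()
    return identifiers

# A strip of four triangles: 0-1, 1-2, 2-3 share edges.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ids = record_topology(adj)
```

Running this on the four-triangle strip yields three 'N' symbols (one per edge crossed) followed by 'B' symbols as the traversal unwinds, illustrating how connectivity can be captured without storing three vertex indices per triangle.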
Further, the encoding module 604 may be specifically configured to encode the topological relation and the vertex coordinates into binary character strings respectively, and convert the binary character strings obtained by encoding into character strings in Base64 format according to a preset mapping relation. The storage module 605 may be specifically configured to obtain the video stream data of the target video, and store the encoded character string in a network abstraction layer of the video stream data to obtain the target video stream data.
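The binary-to-Base64 step can be pictured with the standard `base64` and `struct` modules. The byte layout below (a length prefix, ASCII identifiers, then little-endian 32-bit floats) is purely illustrative, as the patent does not specify the binary format or the network-abstraction-layer syntax:

```python
import base64
import struct

def encode_payload(identifiers, vertices):
    """Pack topology identifiers and vertex coordinates into a binary
    string, then map it to Base64 text (hypothetical layout:
    4-byte identifier count, identifier bytes, then x/y/z floats)."""
    topo = ''.join(identifiers).encode('ascii')
    coords = b''.join(struct.pack('<3f', *v) for v in vertices)
    binary = struct.pack('<I', len(topo)) + topo + coords
    return base64.b64encode(binary).decode('ascii')

payload = encode_payload(['N', 'B'], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
# The resulting Base64 text is plain ASCII, so it could then be carried
# in a user-data unit of the stream's network abstraction layer.
```

Base64 is a natural choice here because the network abstraction layer expects byte-aligned payloads and Base64 text avoids emulating start-code sequences that raw binary might accidentally contain.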
This completes the video data transmission process of the video data transmission apparatus 60 based on the video stream data of the present embodiment.
The specific operation principle of the video data transmission apparatus based on video stream data of this embodiment is the same as or similar to that described in the above embodiment of the video data transmission method based on video stream data, and please refer to the detailed description in the above embodiment of the video data transmission method based on video stream data.
The video data transmission device based on video stream data of this embodiment constructs a topological relation between the triangular meshes in the triangular mesh set based on the boundary of the first reference triangular mesh and the position relation between the first reference triangular mesh and the other triangular meshes, and encodes the topological relation and the vertex coordinates into character strings in a preset format, thereby obtaining the target video stream data.
The following describes a specific working principle of the video data transmission method based on video stream data according to an embodiment of the present invention. Referring to fig. 10, fig. 10 is a flowchart of the video data transmission method based on video stream data according to an embodiment of the invention.
The video data transmission device based on video stream data of the embodiment is arranged in a data transmission terminal (hereinafter referred to as terminal), and the flow of video data transmission by the data transmission terminal comprises the following steps:
step 901, the terminal acquires three-dimensional data information of the target video.
Step 902, the terminal determines a first reference triangular mesh in the triangular mesh set;
step 903, the terminal determines a triangular mesh associated with the boundary of the first reference triangular mesh in the triangular mesh set to obtain an associated triangular mesh;
step 904, the terminal constructs a topological relation between triangular meshes in the triangular mesh set according to the relative position between the boundary of the first reference triangular mesh and the associated triangular mesh;
step 905, the terminal encodes the topological relation and the vertex coordinates into character strings in a preset format, and stores the character strings obtained by encoding into video stream data of a target video to obtain target video stream data;
step 906, when receiving the data transmission operation triggered by the target video, the terminal sends the target video stream data to the data receiving end.
This completes the flow of video data transmission by the data transmission terminal of the present embodiment.
According to the video data transmission method and the video data transmission device based on the video stream data, the topological relation among the triangular meshes in the triangular mesh set is constructed according to the boundary of the first reference triangular mesh and the position relation between the first reference triangular mesh and the other triangular meshes, so that the data corresponding to the three vertices of each triangular mesh does not need to be recorded, the information to be recorded is greatly compressed, and the data size occupied by the volume video is reduced; the technical problem that the data occupied by existing volume videos is large is thereby effectively solved.
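The compression claim can be made concrete with rough arithmetic. The figures below are assumptions for illustration only: a naive encoding stores three 32-bit vertex indices per triangle, while a traversal-based encoding stores roughly two one-byte identifiers per triangle (one advance and one backtrack symbol, as in the earlier sketch); actual ratios depend on the identifier scheme the patent leaves unspecified:

```python
def connectivity_bytes(num_triangles):
    """Compare connectivity storage under two hypothetical layouts:
    naive  = 3 vertex indices x 4 bytes each, per triangle;
    topo   = ~2 one-byte traversal identifiers per triangle."""
    naive = num_triangles * 3 * 4
    topo = num_triangles * 2 * 1
    return naive, topo

naive, topo = connectivity_bytes(100_000)
# For a 100,000-triangle frame this works out to 1,200,000 bytes of
# naive connectivity versus about 200,000 bytes of identifiers,
# a roughly 6x reduction under these assumptions.
```

Vertex coordinates still must be stored once per vertex in either scheme; the saving comes entirely from not repeating vertex indices for every triangle.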
As used herein, the terms "component," "module," "system," "interface," "process," and the like are generally intended to refer to a computer-related entity: hardware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Fig. 11 and the following discussion provide a brief, general description of an operating environment of an electronic device in which the video data transmission apparatus of the present invention may be implemented. The operating environment of FIG. 11 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the operating environment. Example electronic devices 1012 include, but are not limited to, wearable devices, head-mounted devices, medical health platforms, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Although not required, embodiments are described in the general context of "computer readable instructions" being executed by one or more electronic devices. Computer readable instructions may be distributed via computer readable media (discussed below). Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer readable instructions may be combined or distributed as desired in various environments.
Fig. 11 illustrates an example of an electronic device 1012 that includes one or more embodiments of the video stream data-based video data transmission apparatus of the present invention. In one configuration, electronic device 1012 includes at least one processing unit 1016 and memory 1018. Depending on the exact configuration and type of electronic device, memory 1018 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. This configuration is illustrated in fig. 11 by dashed line 1014.
In other embodiments, electronic device 1012 may include additional features and/or functionality. For example, device 1012 may also include additional storage (e.g., removable and/or non-removable) including, but not limited to, magnetic storage, optical storage, and the like. Such additional storage is illustrated in FIG. 11 by storage 1020. In one embodiment, computer readable instructions to implement one or more embodiments provided herein may be in storage 1020. Storage 1020 may also store other computer readable instructions to implement an operating system, an application program, and the like. Computer readable instructions may be loaded in memory 1018 for execution by processing unit 1016, for example.
The term "computer readable media" as used herein includes computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data. Memory 1018 and storage 1020 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by electronic device 1012. Any such computer storage media may be part of electronic device 1012.
Electronic device 1012 may also include communication connection(s) 1026 that allow electronic device 1012 to communicate with other devices. Communication connection(s) 1026 may include, but is not limited to, a modem, a Network Interface Card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interfaces for connecting electronic device 1012 to other electronic devices. The communication connection 1026 may comprise a wired connection or a wireless connection. Communication connection(s) 1026 may transmit and/or receive communication media.
The term "computer readable media" may include communication media. Communication media typically embodies computer readable instructions or other data in a "modulated data signal" such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" may include signals that: one or more of the signal characteristics may be set or changed in such a manner as to encode information in the signal.
Electronic device 1012 may include input device(s) 1024 such as keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device. Output device(s) 1022 such as one or more displays, speakers, printers, and/or any other output device may also be included in device 1012. Input device 1024 and output device 1022 may be connected to electronic device 1012 via a wired connection, wireless connection, or any combination thereof. In one embodiment, an input device or an output device from another electronic device may be used as input device 1024 or output device 1022 for electronic device 1012.
The components of electronic device 1012 may be connected by various interconnects, such as a bus. Such interconnects may include Peripheral Component Interconnect (PCI), such as PCI Express, Universal Serial Bus (USB), FireWire (IEEE 1394), optical bus structures, and so forth. In another embodiment, components of electronic device 1012 may be interconnected by a network. For example, memory 1018 may be comprised of multiple physical memory units located in different physical locations interconnected by a network.
Those skilled in the art will realize that storage devices utilized to store computer readable instructions may be distributed across a network. For example, electronic device 1030 accessible via network 1028 may store computer readable instructions to implement one or more embodiments of the present invention. Electronic device 1012 may access electronic device 1030 and download a part or all of the computer readable instructions for execution. Alternatively, electronic device 1012 may download pieces of the computer readable instructions, as needed, or some instructions may be executed at electronic device 1012 and some at electronic device 1030.
Various operations of embodiments are provided herein. In one embodiment, the one or more operations may constitute computer readable instructions stored on one or more computer readable media, which when executed by an electronic device, will cause the computing device to perform the operations. The order in which some or all of the operations are described should not be construed as to imply that these operations are necessarily order dependent. Those skilled in the art will appreciate alternative orderings having the benefit of this description. Moreover, it should be understood that not all operations are necessarily present in each embodiment provided herein.
Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The present disclosure includes all such modifications and alterations, and is limited only by the scope of the appended claims. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for a given or particular application. Furthermore, to the extent that the terms "includes," "has," "contains," or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising."
Each functional unit in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Each apparatus or system described above may perform the method in the corresponding method embodiment.
In summary, although the present invention has been disclosed in the foregoing embodiments, the serial numbers preceding the embodiments are used merely for convenience of description and do not limit the order of the embodiments of the present invention. Furthermore, the above embodiments are not intended to limit the present invention; those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of the present invention shall be defined by the appended claims.