CN109510975B - Video image extraction method, device and system

Info

Publication number: CN109510975B (grant of CN109510975A)
Application number: CN201910053769.2A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: rendering, data, image, equipment, viewport
Inventors: 孟宪民, 李小波
Assignee (original and current): Hengxin Shambala Culture Co ltd
Priority and filing date: 2019-01-21
Publication of application CN109510975A: 2019-03-22
Publication of granted patent CN109510975B: 2021-01-05
Legal status: Active

Landscapes

  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)

Abstract

The application discloses a method, device and system for extracting a video image, and relates to the field of image processing. The main technical scheme of the application is: creating left and right virtual cameras and acquiring viewport data through the left and right virtual cameras; creating left and right images from the viewport data acquired by the left and right virtual cameras; merging and rendering the left and right images to obtain texture image data; and sending the texture image data to a VR device. By building two virtual cameras in the video extraction device, the application can capture the video image on the video extraction device faster, so that the image displayed after transmission to the VR device is more accurate; no additional video extraction device is needed to render the left and right eyes of the VR device separately, which reduces cost; and the left and right eyes of the VR device can be updated synchronously, so that a stereoscopic image can be better simulated and presented.

Description

Video image extraction method, device and system
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, device, and system for extracting a video image.
Background
As is well known, the real world is a three-dimensional stereoscopic world, yet most existing display devices can only present two-dimensional information and cannot give people a sense of immersion. Many attempts have been made to give displayed scenes and objects an effect of depth, and research on 3D display technology has been developing for more than ten years with fruitful results.
The current 3D display technologies mainly include the following categories:
(1) Stereoscopic technology based on optical principles: this technology is mainly realized with optical lenses such as prisms, polarizers, perspective lenses or gratings. Through the filtering or polarization of the optical lenses, one image is turned into two different images that are presented to a person's left and right eyes respectively, forming a stereoscopic image. The technology is limited by the optical lenses and the environment and cannot clearly present the most realistic picture to the user.
(2) Virtual reality stereoscopic projection technology: the video signal outputs of two computers are connected to the video signal inputs of two projectors respectively. A shading box is mounted at the front of each projector, and a polarizer is mounted at the front of each shading box, with the polarization axes of the two polarizers perpendicular to each other. The two projectors thus correspond to a person's two eyes, and the output video content produces a visual difference between the left and right eyes through the polarizers, so that a stereoscopic image is formed in the brain. This technology requires two hosts to render to their respective associated devices, is costly, and cannot update synchronously.
Disclosure of Invention
The application provides a video image extraction method, which comprises the following steps: creating left and right virtual cameras, and acquiring viewport data through the left and right virtual cameras; creating left and right images from the viewport data acquired by the left and right virtual cameras; merging and rendering the left and right images to obtain texture image data; and sending the texture image data to a VR device.
As above, after the left and right virtual cameras are created, the camera distance of the left and right cameras is initialized to an average value of the binocular interpupillary distance; and, in response to the latest camera distance from the VR device, the camera distance of the left and right cameras is set to the latest camera distance.
As above, before the left and right virtual cameras are created, a collection viewport is created and its viewport data is initialized. Initializing the viewport data of the collection viewport specifically comprises creating a device, a context, a swap chain, a render target and the viewport, setting the render target to output to the screen using the context, and initializing the viewport data.
As above, obtaining viewport data through the left and right virtual cameras specifically includes the following sub-steps: loading a real-time image acquisition plug-in for each of the left and right virtual cameras and initializing the acquisition plug-ins; and calling a rendering hardware interface in the acquisition plug-in, acquiring the current page rendering data from the render target in real time by switching contexts and using the swap chain, and updating the viewport data with the current page rendering data.
As above, initializing the acquisition plug-in specifically includes the following sub-steps: acquiring a scene viewport, and obtaining the width and height of the current window and the required interface through the scene viewport; creating an application layer renderer and acquiring viewport resource data; acquiring the resources of the top-level window through the application layer renderer; and casting the acquired top-level window resources to a type recognizable by the rendering hardware interface.
As described above, updating the viewport data with the current page rendering data specifically means obtaining the RGB value of each pixel in the page row by row and column by column according to the height and width of the current page; the CPU inputs the RGB values to the GPU in a single thread, and the viewport data are updated with the RGB values.
The present application also provides a video extraction device, comprising: a creation module, configured to create left and right virtual cameras and to acquire viewport data through the left and right virtual cameras; a first rendering module, configured to create left and right images from the viewport data acquired by the left and right virtual cameras, and to merge and render the left and right images to obtain texture image data; and a first communication module, configured to send the texture image data to the VR device.
As above, the creating module is further configured to initialize a camera distance of the left and right cameras to be an average value of pupil distances of both eyes after the left and right virtual cameras are created; the video extraction device further comprises a setting module for setting the camera pitch of the left and right cameras to the latest camera pitch in response to the latest camera pitch of the VR device.
The present application further provides a system for extracting a video image, including: the above video extracting device; the server comprises a second communication module and is used for forwarding the texture image data of the video extraction equipment to VR equipment; a VR device including a third communication module and a second rendering module; the third communication module is used for receiving texture image data from the server, and the second rendering module is used for dividing the received texture image data into left and right eye scene images and rendering the left and right eye scene images to left and right cameras of the equipment respectively.
As above, the second rendering module specifically includes: rendering a submodule: the texture image rendering device is used for acquiring a left-eye scene image and a right-eye scene image according to received texture image data, rendering the left-eye scene image and the right-eye scene image to one texture image and obtaining a target texture image; an anti-distortion submodule: the method comprises the steps that a screen area visible to human eyes is determined through parameters of a device screen and parameters of lenses, an anti-distortion mesh is constructed based on the screen area visible to human eyes, mesh vertexes of the anti-distortion mesh are determined, and the mesh vertexes after anti-distortion are determined through the mesh vertexes of the anti-distortion mesh and a drawing view port of a screen of a target terminal; the rendering submodule is further used for determining an image subjected to inverse distortion processing through the mesh vertex subjected to inverse distortion and the target image, dividing the image subjected to inverse distortion processing into a left inverse distortion image and a right inverse distortion image, and rendering the left inverse distortion image and the right inverse distortion image to left and right screens of VR equipment respectively.
The beneficial effects realized by the present application are as follows: by building two virtual cameras in the video extraction device, the video image on the video extraction device can be captured faster, so that the image displayed after transmission to the VR device is more accurate; no additional video extraction device is needed to render the left and right eyes of the VR device separately, which reduces cost; and the left and right eyes of the VR device can be updated synchronously, so that a stereoscopic image can be better simulated and presented.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can obtain other drawings from these drawings.
Fig. 1 is a flowchart of a video image extraction method according to an embodiment of the present application;
fig. 2 is a flowchart illustrating specific operations of initializing after an application is started in a video extraction device according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a specific operation of initializing a collection plug-in for an application in a video extraction device according to an embodiment of the present disclosure;
fig. 4 is a specific operation flowchart of rendering, by VR equipment in a video extraction system, left and right eye scene images to left and right cameras of the equipment respectively according to an embodiment of the present application;
fig. 5 is a schematic diagram of a video image extraction system according to a second embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for extracting a video image is applicable to a system consisting of a video extraction device (a PC, a mobile device, or the like), a server and a VR device. An application program for extracting video images runs on the video extraction device; the server transmits the video images between the video extraction device and the VR device; and the VR device renders the video images received from the video extraction device to the left and right eyes of the device to complete the display of the images.
Example one
Referring to fig. 1, an embodiment of the present application provides a method for extracting a video image, which specifically includes:
step 110: starting an application program of the video extraction equipment, creating a collection viewport in the application program, and initializing viewport data of the collection viewport;
as shown in fig. 2, initializing after the application is started specifically includes the following sub-steps:
step 210: creating a device (device), a context (context), and a swap chain (swapchain);
the device is used for loading video resources in the loading process; the context is used for setting data of an incoming display card in a rendering process; the swap chain is used for describing an output window, a rendering frame rate and a rendering target, and provides a foreground cache and a background cache, wherein the foreground cache is used for rendering, and the background cache is used for drawing the latest image data.
Step 220: creating a rendering target;
the render target (render target) is the final destination of all drawing activities, i.e., the screen, from which the application retrieves the page rendering data when running under the editor.
Step 230: the rendering target is set to be output to the screen using a context.
Step 240: creating a viewport (viewport), initializing viewport data;
The viewport data include the height and width of the viewport and the RGB information of each pixel position (the RGB color model is an industry color standard in which various colors are obtained by varying and superimposing the three color channels red (R), green (G) and blue (B)). After the application is started, the viewport data are initialized to the set height and width, and the RGB information of each pixel is set to an initial color, such as white.
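Steps 210 to 240 map naturally onto a Direct3D 11-style initialization. The sketch below is illustrative only and assumes D3D11 on Windows; the patent does not name a specific graphics API, and the window handle, resolution and helper name are placeholders rather than part of the patented method.

```cpp
// Minimal sketch (assumption: Direct3D 11): create device, context, swap chain,
// render target and viewport, then bind the render target for output to the screen.
#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

bool InitD3D(HWND hWnd, UINT width, UINT height,
             ID3D11Device** device, ID3D11DeviceContext** context,
             IDXGISwapChain** swapChain, ID3D11RenderTargetView** rtv)
{
    DXGI_SWAP_CHAIN_DESC sd = {};           // describes the output window, format and buffering
    sd.BufferCount = 1;                     // one back buffer (front/back buffer pair)
    sd.BufferDesc.Width = width;
    sd.BufferDesc.Height = height;
    sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    sd.OutputWindow = hWnd;
    sd.SampleDesc.Count = 1;
    sd.Windowed = TRUE;

    // Step 210: create device, context and swap chain.
    if (FAILED(D3D11CreateDeviceAndSwapChain(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                             nullptr, 0, D3D11_SDK_VERSION,
                                             &sd, swapChain, device, nullptr, context)))
        return false;

    // Step 220: create the render target from the swap chain's back buffer.
    ID3D11Texture2D* backBuffer = nullptr;
    (*swapChain)->GetBuffer(0, __uuidof(ID3D11Texture2D), (void**)&backBuffer);
    (*device)->CreateRenderTargetView(backBuffer, nullptr, rtv);
    backBuffer->Release();

    // Step 230: set the render target to be output to the screen using the context.
    (*context)->OMSetRenderTargets(1, rtv, nullptr);

    // Step 240: create the viewport and initialize its data (width and height).
    D3D11_VIEWPORT vp = {};
    vp.Width = (FLOAT)width;
    vp.Height = (FLOAT)height;
    vp.MaxDepth = 1.0f;
    (*context)->RSSetViewports(1, &vp);
    return true;
}
```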
In this embodiment, after the application program of the video extraction device is started, it is further determined whether the video extraction device is connected to the server. A successful connection to the server prepares the application to connect to the VR device for data transmission; the application sends the acquired image data to the VR device through the server.
Referring back to fig. 1, step 120: the application program creates a left virtual camera and a right virtual camera, and the left virtual camera and the right virtual camera acquire viewport data;
The principle by which a VR device renders a stereoscopic effect is to simulate each human eye seeing a distinct view; the brain combines the two views into a 3D stereoscopic image, which is stereoscopic vision. Since the scene seen by the left eye differs from the scene seen by the right eye, a binocular disparity is formed. Therefore, when the application is built, two virtual cameras are created at the same time to simulate the user's left and right eyes and obtain plane images with a certain disparity;
the application program creates two virtual cameras, initializes the distance between the left virtual camera and the right virtual camera, the initial distance is set according to the interpupillary distance of two eyes of a person, the interpupillary distance IPD of the two eyes ranges from 52mm to 78mm, and preferably, the camera distance ICD of the left virtual camera and the right virtual camera is initialized to the average value of the interpupillary distance of 60 mm.
In this embodiment, the method for acquiring viewport data by an application program through left and right virtual cameras specifically includes the following sub-steps:
step 121: the application program loads real-time image acquisition plug-ins for the left virtual camera and the right virtual camera respectively and initializes the acquisition plug-ins;
In this embodiment, a new blank plug-in template is created under the window editor in which the application runs, a project file is then generated, and the image acquisition plug-in is loaded into the project file.
Referring to fig. 3, the application program initializes the acquisition plug-in, which specifically includes:
step 310: acquiring a scene viewport (scenewport), and acquiring the width and height of a current window and a required interface through the scene viewport;
the window type comprises an editor mode and a runtime mode, scene viewport data is obtained in the runtime mode, and a rendering hardware interface in the runtime mode is processed;
step 320: calling an interface function (FSlateRenderer) to create an application layer renderer and acquiring viewport resource data;
step 330: acquiring resources of a top-layer window through an application layer renderer;
specifically, the application layer renderer acquires the viewport component through the scene viewport, and converts the node of the viewport component into a window class, that is, acquires the resource of the top window.
Step 340: forcibly converting the acquired resources of the top window into a type which can be identified by a rendering hardware interface;
In this embodiment, the rendering hardware interface can be called to obtain rendering data only after the window resource has been cast to a type recognizable by the rendering hardware interface (RHI).
Further, after the acquisition plug-in has been initialized successfully, the method further comprises obtaining the current viewport, the resolution and the rendering command list interface.
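The plug-in initialization of steps 310-340, together with the read-back described in step 122 below, resembles what an Unreal-style engine exposes through its scene viewport, Slate renderer and RHI. The fragment below is only a rough sketch under that assumption; class and function availability varies between engine versions, so the exact calls should be treated as placeholders rather than the patented implementation.

```cpp
// Rough sketch (assumption: Unreal Engine 4.22+-style API, not verified against a
// specific engine version). Obtain the scene viewport, its size and its render target,
// then read the current page's pixels back through the rendering command list (RHI).
#include "Engine/Engine.h"
#include "Engine/GameViewportClient.h"
#include "Slate/SceneViewport.h"
#include "RHICommandList.h"

void CapturePluginTick()
{
    FSceneViewport* SceneViewport = GEngine->GameViewport->GetGameViewport();   // step 310
    if (!SceneViewport) return;

    const FIntPoint Size = SceneViewport->GetSizeXY();                          // width/height of the current window
    FTexture2DRHIRef RenderTarget = SceneViewport->GetRenderTargetTexture();    // window resource as an RHI-recognizable type (step 340)

    ENQUEUE_RENDER_COMMAND(ReadViewportPixels)(                                  // rendering command list interface
        [RenderTarget, Size](FRHICommandListImmediate& RHICmdList)
        {
            TArray<FColor> Pixels;                                               // RGB(A) of every pixel, row by row
            RHICmdList.ReadSurfaceData(RenderTarget,
                                       FIntRect(0, 0, Size.X, Size.Y),
                                       Pixels,
                                       FReadSurfaceDataFlags());
            // Pixels can now be used to update the collection viewport's data (step 122).
        });
}
```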
Step 122: calling the rendering hardware interface in the acquisition plug-in, acquiring the current page rendering data from the render target in real time by switching contexts and using the swap chain, and updating the viewport data with the current page rendering data;
The current page rendering data include the height and width of the current page and the RGB information of each pixel position. The RGB value of each pixel in the page is obtained row by row and column by column according to the height and width of the current page; the CPU inputs the RGB values to the GPU in a single thread, and the viewport data are updated with the RGB values;
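As an illustration of this per-pixel read-back and single-threaded upload (a sketch under the same D3D11 assumption as earlier; it presumes the render target was already copied into a CPU-readable staging texture, and all names are illustrative):

```cpp
// Sketch (assumption: D3D11): walk a CPU-readable copy of the current page row by row
// and column by column, pack the RGB values, and upload them to the viewport texture
// from a single CPU thread.
#include <d3d11.h>
#include <vector>

void UpdateViewportFromPage(ID3D11DeviceContext* ctx,
                            ID3D11Texture2D* stagingCopy,   // CPU-readable copy of the render target
                            ID3D11Texture2D* viewportTex,   // GPU texture holding the viewport data
                            UINT width, UINT height)
{
    std::vector<unsigned char> rgba(width * height * 4);

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (FAILED(ctx->Map(stagingCopy, 0, D3D11_MAP_READ, 0, &mapped)))
        return;
    for (UINT row = 0; row < height; ++row)                  // row by row
    {
        const unsigned char* src =
            static_cast<const unsigned char*>(mapped.pData) + row * mapped.RowPitch;
        for (UINT col = 0; col < width; ++col)               // column by column
            for (UINT c = 0; c < 4; ++c)                     // R, G, B, (A) of each pixel
                rgba[(row * width + col) * 4 + c] = src[col * 4 + c];
    }
    ctx->Unmap(stagingCopy, 0);

    // Single-threaded CPU -> GPU upload of the packed pixel data.
    ctx->UpdateSubresource(viewportTex, 0, nullptr, rgba.data(), width * 4, 0);
}
```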
Since user interface (UI) rendering is performed by the main thread of the application but is constrained by the main thread's workload and the CPU's performance, a page may stall during UI rendering when the main thread's tasks are heavy or the CPU performance is low. Therefore, for the UI rendering contained in a rendering task, when the main thread of the application executes a rendering operation on the CPU, the method uses asynchronous processing to move the operation of obtaining the back-buffer data needed for rendering to a sub-thread, namely the rendering thread. The main thread obtains the back-buffer data from the rendering thread by calling the rendering hardware interface (RHI) and continues to execute the rendering operation, which reduces the load on the main thread and reduces the user-interface stalls caused by the main thread being unable to process rendering tasks in time when heavily loaded;
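A minimal sketch of this kind of hand-off follows (generic C++ with assumed names; the actual split between the main thread and the rendering thread is engine-specific and not prescribed by the patent):

```cpp
// Sketch (assumed design): the main thread posts a "read back-buffer data" job to a
// rendering thread and keeps working; the job runs asynchronously on the worker.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class RenderThread {
public:
    RenderThread() : worker_([this] { Run(); }) {}
    ~RenderThread() {
        { std::lock_guard<std::mutex> l(m_); stop_ = true; }
        cv_.notify_all();
        worker_.join();
    }

    // Called from the main thread: enqueue the back-buffer read without blocking.
    void Enqueue(std::function<void()> job) {
        { std::lock_guard<std::mutex> l(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> l(m_);
                cv_.wait(l, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // e.g. the read-back sketched above
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread worker_;   // constructed last, after the members it uses
};
```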
referring back to fig. 1, step 130: the application program creates left and right images according to viewport data acquired by the left and right virtual cameras, merges and renders the left and right images to obtain texture image data, and sends the texture image data to VR equipment through a server;
Specifically, two images are created from the viewport data acquired by the left and right virtual cameras, the two images are then combined into one texture image, and the texture image data are sent to the VR device through the server;
Merging and rendering the left and right images to obtain texture image data specifically comprises creating a texture, obtaining the surface of the texture, rendering the scene to the surface of the texture, and rendering the texture.
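As a sketch of the create-texture / render-to-its-surface step (again assuming D3D11; the patent does not name a specific API, so the calls and names below are illustrative):

```cpp
// Sketch (assumption: D3D11): create a texture that can serve as a render target,
// obtain a render-target view onto its surface, render into it, then sample it later.
#include <d3d11.h>

bool CreateRenderTexture(ID3D11Device* device, UINT width, UINT height,
                         ID3D11Texture2D** tex, ID3D11RenderTargetView** rtv,
                         ID3D11ShaderResourceView** srv)
{
    D3D11_TEXTURE2D_DESC td = {};
    td.Width = width;                        // room for the merged left+right images
    td.Height = height;
    td.MipLevels = 1;
    td.ArraySize = 1;
    td.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    td.SampleDesc.Count = 1;
    td.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
    if (FAILED(device->CreateTexture2D(&td, nullptr, tex))) return false;
    if (FAILED(device->CreateRenderTargetView(*tex, nullptr, rtv))) return false;   // the texture's "surface"
    if (FAILED(device->CreateShaderResourceView(*tex, nullptr, srv))) return false; // for rendering the texture afterwards
    return true;
}
```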
Step 140: the VR equipment divides the received texture image data into left and right eye scene images, and renders the left and right eye scene images to left and right cameras of the equipment respectively;
In this embodiment, left and right cameras are also created in the VR device; the texture image data are divided into a left-eye scene image and a right-eye scene image, the left-eye scene image is then rendered to the left camera and the right-eye scene image to the right camera. Referring to fig. 4, the operation specifically includes the following sub-steps:
step 410: and acquiring a left-eye scene image and a right-eye scene image according to the received texture image data.
Specifically, after receiving the texture image data sent by the video extraction device, the VR device splits the texture image data by inverting the merge that the video extraction device performed on the texture image, obtaining the left-eye scene image and the right-eye scene image.
Step 420: rendering the left-eye scene image and the right-eye scene image onto a texture image to obtain a target texture image;
In this embodiment, in order to avoid the time consumed by transmitting two texture images, the left-eye scene image and the right-eye scene image are rendered into one texture image, so that the texture image is transmitted only once. Specifically, the left-eye scene image and the right-eye scene image are rendered to two non-overlapping regions on the texture image.
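One common way to place the two eye images in non-overlapping regions is to split the render target into left and right halves with two viewports; the sketch below illustrates this under the D3D11 assumption used earlier (the callback and names are placeholders):

```cpp
// Sketch (assumption: D3D11): render the left-eye view into the left half of the
// target texture and the right-eye view into the right half, so one texture carries both.
#include <d3d11.h>

void RenderStereoSideBySide(ID3D11DeviceContext* ctx, ID3D11RenderTargetView* rtv,
                            UINT width, UINT height,
                            void (*drawScene)(ID3D11DeviceContext*, bool leftEye))
{
    ctx->OMSetRenderTargets(1, &rtv, nullptr);

    D3D11_VIEWPORT left = {};
    left.Width = width / 2.0f;  left.Height = (FLOAT)height;  left.MaxDepth = 1.0f;
    ctx->RSSetViewports(1, &left);
    drawScene(ctx, /*leftEye=*/true);          // left-eye scene image -> left half

    D3D11_VIEWPORT right = left;
    right.TopLeftX = width / 2.0f;             // non-overlapping right half
    ctx->RSSetViewports(1, &right);
    drawScene(ctx, /*leftEye=*/false);         // right-eye scene image -> right half
}
```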
Step 430: determining a screen area visible to human eyes according to the parameters of the equipment screen and the parameters of the lens;
Parameters of the screen of the VR device may include the width and height of the screen, the size of the drawing viewport, and so on; parameters of the lens of the VR device include the field of view, refractive index and so on of the lens. Specifically, the width and height of the screen may be determined from the screen's DPI (pixels per inch), and the DPI of the screen may in turn be obtained from the target terminal through a system interface.
Step 440: constructing an inverse distortion mesh based on a screen area visible to human eyes, and determining mesh vertexes of the inverse distortion mesh;
In order to give the user a real sense of visual immersion, a VR device needs to cover the visual range of the human eyes as far as possible, so a spherically curved lens is arranged in the VR device; however, the image of such a lens is distorted when projected into the human eye, so the eye cannot accurately perceive the positioning of the virtual space.
The grid vertex of the inverse distortion grid is the position coordinate of each grid vertex in the inverse distortion grid.
Step 450: determining the mesh vertex after the inverse distortion through the mesh vertex of the inverse distortion mesh and a drawing viewport of a screen of a target terminal;
Specifically, the distance between each mesh vertex of the inverse-distortion mesh and the center of the drawing viewport of the target terminal's screen is calculated, and the inverse-distorted mesh vertices are determined based on the calculated distance.
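A common concrete choice for this per-vertex step (not specified in the patent; the polynomial model, coefficients and names below are assumptions) is a radial "barrel" model evaluated on the distance from the viewport center:

```cpp
// Sketch (assumed lens model): pre-distort each mesh vertex with a radial polynomial
// so that the lens distortion of the VR headset is cancelled out when viewed.
#include <cmath>

struct Vec2 { float x, y; };

// k1, k2 are lens coefficients (assumed; in practice they would come from the lens parameters).
Vec2 PreDistortVertex(Vec2 v, Vec2 viewportCenter, float k1, float k2)
{
    const float dx = v.x - viewportCenter.x;
    const float dy = v.y - viewportCenter.y;
    const float r2 = dx * dx + dy * dy;               // squared distance to the viewport center
    const float scale = 1.0f + k1 * r2 + k2 * r2 * r2;
    return { viewportCenter.x + dx * scale,           // inverse-distorted vertex position
             viewportCenter.y + dy * scale };
}
```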
Step 460: determining an image subjected to inverse distortion processing through the mesh vertex subjected to inverse distortion and the target image, dividing the image subjected to inverse distortion processing into a left inverse distortion image and a right inverse distortion image, and respectively rendering the left inverse distortion image and the right inverse distortion image to a left screen and a right screen of VR equipment;
In this embodiment, rendering the left and right anti-distortion images to the left and right screens of the VR device includes the following stages:
Local space: the modeling space, in which the triangles of an object are organized locally;
World space: objects in local space are transformed into world space through translation (D3DXMatrixTranslation), rotation (D3DXMatrixRotationX/Y/Z/Axis) and scaling (D3DXMatrixScaling), thereby organizing the scene;
View space: the camera is moved to the origin of world space and rotated so that its forward direction coincides with the Z direction of world space; when the camera moves or rotates, the world-space geometry changes with it, yielding the camera view matrix (D3DXMatrixLookAtLH);
Back-face culling: useless back-facing polygons are rejected according to the culling mode (g_pDevice->SetRenderState(D3DRS_CULLMODE, Value));
Lighting and clipping: lighting is applied in world space, and the part of the geometry that lies outside the view frustum is clipped;
Projection: the 3D scene is converted into a 2D image through the projection transformation matrix (D3DXMatrixPerspectiveFovLH) and mapped onto the projection window;
Viewport transformation: the projection window is transformed into a rectangular area on the screen (g_pDevice->SetViewport(&d3dViewport));
Rasterization: the pixel value of every point inside each triangle to be displayed is calculated, and the viewport-transformed images are displayed on the left and right screens of the VR device.
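Read together, these stages correspond to the classic fixed-function Direct3D 9 transform setup. The fragment below is only a sketch under that assumption; the eye position, field of view, clip planes and other values are placeholders, not values taken from the patent.

```cpp
// Sketch (assumption: legacy Direct3D 9 / D3DX): set up world, view and projection
// transforms, back-face culling and the viewport for one eye's half of the screen.
#include <d3d9.h>
#include <d3dx9.h>

void SetupEyeTransforms(IDirect3DDevice9* g_pDevice, UINT screenW, UINT screenH, bool leftEye)
{
    D3DXMATRIX world, view, proj;

    // World space: place the object (translation shown; rotation/scaling analogous).
    D3DXMatrixTranslation(&world, 0.0f, 0.0f, 5.0f);
    g_pDevice->SetTransform(D3DTS_WORLD, &world);

    // View space: camera at the eye position, looking down +Z.
    D3DXVECTOR3 eye(leftEye ? -0.03f : 0.03f, 0.0f, 0.0f);   // half the camera distance per eye
    D3DXVECTOR3 at(eye.x, 0.0f, 1.0f), up(0.0f, 1.0f, 0.0f);
    D3DXMatrixLookAtLH(&view, &eye, &at, &up);
    g_pDevice->SetTransform(D3DTS_VIEW, &view);

    // Back-face culling.
    g_pDevice->SetRenderState(D3DRS_CULLMODE, D3DCULL_CCW);

    // Projection: 3D scene -> 2D projection window.
    D3DXMatrixPerspectiveFovLH(&proj, D3DX_PI / 2, (screenW / 2.0f) / screenH, 0.1f, 100.0f);
    g_pDevice->SetTransform(D3DTS_PROJECTION, &proj);

    // Viewport transformation: map to the left or right half of the screen.
    D3DVIEWPORT9 vp = { leftEye ? 0u : screenW / 2, 0, screenW / 2, screenH, 0.0f, 1.0f };
    g_pDevice->SetViewport(&vp);
}
```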
After the VR device displays the video image, the method further comprises the user adjusting the distance between the two virtual cameras in the VR device through an adjustment knob on the VR device. The VR device then sends the adjusted camera distance to the video extraction device through the server, and the video extraction device simultaneously adjusts the camera distance of its two internal virtual cameras, collects the video image at this latest camera distance and sends it to the VR device for display. The user can thus adjust the camera distance in real time so that the picture content is reflected more realistically, improving the user experience.
Example two
As shown in fig. 5, a second embodiment of the present application provides a system for extracting a video image, where the system 5 for extracting a video image includes: a video extraction device 510, a server 520, and a VR device 530;
among them, the video extraction device 510 includes the following components:
a creating module 511, configured to create the left and right virtual cameras and obtain viewport data through the left and right virtual cameras;
a first rendering module 512, configured to merge and render viewport data acquired by the left and right virtual cameras to a texture to obtain texture image data;
a first communication module 513 is configured to send the texture image data to the VR device.
Specifically, the creating module 511 is further configured to initialize a camera distance of the left and right cameras to be an average value of pupil distances of both eyes after the left and right virtual cameras are created;
Further, the video extraction device 510 also includes a setting module 514 for setting the camera distance of the left and right cameras to the latest camera distance in response to the latest camera distance of the VR device.
The server 520 includes a second communication module 521 for forwarding the texture image data of the video extraction device to the VR device 530.
A VR device 530 comprising a third communication module 531 and a second rendering module 532; the third communication module 531 is configured to receive texture image data from the server, and the second rendering module 532 is configured to divide the received texture image data into left and right eye scene images, and render the left and right eye scene images to left and right cameras of the device respectively;
specifically, the second rendering module 532 specifically includes:
rendering submodule 5321: the texture image rendering device is used for acquiring a left-eye scene image and a right-eye scene image according to received texture image data, rendering the left-eye scene image and the right-eye scene image to one texture image and obtaining a target texture image;
anti-distortion sub-module 5322: the method comprises the steps that a screen area visible to human eyes is determined through parameters of a device screen and parameters of lenses, an anti-distortion mesh is constructed based on the screen area visible to human eyes, mesh vertexes of the anti-distortion mesh are determined, and the mesh vertexes after anti-distortion are determined through the mesh vertexes of the anti-distortion mesh and a drawing view port of a screen of a target terminal;
Further, the rendering sub-module 5321 is also configured to determine the inverse-distortion-processed image from the inverse-distorted mesh vertices and the target image, divide the inverse-distortion-processed image into left and right inverse-distortion images, and render them to the left and right screens of the VR device 530 respectively.
The beneficial effects realized by the present application are as follows:
(1) two virtual cameras are built in the video extraction device, so the video image on the video extraction device can be captured faster, and the image displayed after transmission to the VR device is more accurate;
(2) no additional video extraction device is needed to render the left and right eyes of the VR device separately, which reduces cost, and the left and right eyes of the VR device can be updated synchronously;
(3) the camera distance between the two virtual cameras created in the video extraction device and in the VR device respectively can be adjusted in real time, enabling better human-computer interaction and better simulation and presentation of a stereoscopic image.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (7)

1. A method for extracting a video image, comprising:
creating left and right virtual cameras in video extraction equipment, and acquiring viewport data through the left and right virtual cameras;
creating left and right graphs according to viewport data acquired by the left and right virtual cameras;
merging and rendering the left and right images to obtain texture image data;
sending the texture image data to a VR device;
creating a left virtual camera and a right virtual camera in VR equipment, dividing texture image data into left and right eye scene images, and rendering the left and right eye scene images to the left and right virtual cameras of the equipment respectively;
after the VR device displays the video image, the method further comprises: the user adjusting the distance between the two virtual cameras in the VR device by means of an adjustment knob on the VR device; the VR device then sending the adjusted camera distance to the video extraction device through a server; and the video extraction device simultaneously adjusting the camera distance of the two virtual cameras inside it, collecting the video image at the latest camera distance and sending it to the VR device for display;
before the left virtual camera and the right virtual camera are created in the video extraction equipment, a collection view port is created, and view port data of the collection view port is initialized; initializing viewport data of a collection viewport specifically comprises creating a device, a context, a swap chain, a rendering target and the viewport, setting the rendering target to be output to a screen by using the context, and initializing the viewport data;
obtaining viewport data through left and right virtual cameras, specifically comprising the following sub-steps:
loading real-time image acquisition plug-ins for the left virtual camera and the right virtual camera respectively and initializing the acquisition plug-ins;
calling a rendering hardware interface in the acquisition plug-in, acquiring current page rendering data from the rendering target in real time by switching contexts and using the swap chain, and updating the viewport data by using the current page rendering data;
initializing the acquisition plug-in, specifically comprising the following substeps:
acquiring a scene viewport, and acquiring the width and the height of a current window and a required interface through the scene viewport;
creating an application layer renderer and acquiring viewport resource data;
acquiring resources of a top-layer window through an application layer renderer;
forcibly converting the acquired resources of the top window into a type which can be identified by a rendering hardware interface;
when a main thread of an application program executes a rendering operation in a CPU, the method transfers the operation of obtaining the back-buffer data used for rendering to a sub-thread, namely the rendering thread, by adopting an asynchronous processing mode, and the main thread obtains the back-buffer data from the rendering thread by calling a rendering hardware interface RHI and continues to execute the rendering operation in the main thread.
2. The method for extracting video images according to claim 1, wherein after creating the left and right virtual cameras in the video extraction device, further comprising initializing a camera pitch of the left and right virtual cameras to be an average value of a pupil distance of both eyes; in response to a latest camera pitch of the VR device, the video extraction device sets the camera pitch of the left and right virtual cameras to the latest camera pitch.
3. The method for extracting video images according to claim 1, wherein the viewport data is updated by using the rendering data of the current page, specifically, according to the height and width of the current page, the RGB values of each pixel point in the page are obtained row by row and column by column, the CPU inputs the RGB values to the GPU in a single thread manner, and the viewport data is updated by using the RGB values.
4. A video extraction device characterized by performing the video image extraction method according to any one of claims 1 to 3, the video extraction device comprising:
the system comprises a creating module, a view port data acquiring module and a view port data acquiring module, wherein the creating module is used for creating left and right virtual cameras and acquiring view port data through the left and right virtual cameras;
the first rendering module is used for creating left and right images according to viewport data acquired by the left and right virtual cameras, and merging and rendering the left and right images to obtain texture image data;
and the first communication module is used for sending the texture image data to the VR equipment.
5. The video extraction device of claim 4, wherein the creation module is further configured to initialize a camera separation distance of the left and right cameras to an average value of a binocular pupillary distance after the left and right virtual cameras are created;
the video extraction device further comprises a setting module for setting the camera pitch of the left and right cameras to the latest camera pitch in response to the latest camera pitch of the VR device.
6. A video image extraction system, comprising:
the video extraction device of one of claims 4-5;
the server comprises a second communication module and is used for forwarding the texture image data of the video extraction equipment to VR equipment;
a VR device including a third communication module and a second rendering module; the third communication module is used for receiving texture image data from the server, and the second rendering module is used for dividing the received texture image data into left and right eye scene images and respectively rendering the left and right eye scene images to left and right virtual cameras of the VR equipment.
7. The video image extraction system of claim 6, wherein the second rendering module specifically comprises:
rendering a submodule: the texture image rendering device is used for acquiring a left-eye scene image and a right-eye scene image according to received texture image data, rendering the left-eye scene image and the right-eye scene image to one texture image and obtaining a target texture image;
an anti-distortion submodule: the method comprises the steps that a screen area visible to human eyes is determined through parameters of a device screen and parameters of lenses, an anti-distortion mesh is constructed based on the screen area visible to human eyes, mesh vertexes of the anti-distortion mesh are determined, and the mesh vertexes after anti-distortion are determined through the mesh vertexes of the anti-distortion mesh and a drawing view port of a screen of a target terminal;
the rendering submodule is further used for determining an image subjected to inverse distortion processing through the mesh vertex subjected to inverse distortion and the target image, dividing the image subjected to inverse distortion processing into a left inverse distortion image and a right inverse distortion image, and rendering the left inverse distortion image and the right inverse distortion image to left and right screens of VR equipment respectively.