CN116112716B - Virtual person live broadcast method, device and system based on single instruction stream and multiple data streams - Google Patents

Virtual person live broadcast method, device and system based on single instruction stream and multiple data streams

Info

Publication number
CN116112716B
Authority
CN
China
Prior art keywords
data blocks
data
live broadcast
candidate data
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310394871.5A
Other languages
Chinese (zh)
Other versions
CN116112716A (en)
Inventor
王英
陈若含
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4u Beijing Technology Co ltd
Original Assignee
4u Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4u Beijing Technology Co ltd filed Critical 4u Beijing Technology Co ltd
Priority to CN202310394871.5A priority Critical patent/CN116112716B/en
Publication of CN116112716A publication Critical patent/CN116112716A/en
Application granted granted Critical
Publication of CN116112716B publication Critical patent/CN116112716B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42201Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] biosensors, e.g. heat sensor for presence detection, EEG sensors or any limb activity sensors worn by the user
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44218Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8146Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Neurosurgery (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a virtual person live broadcast method, device and system based on single instruction stream and multiple data streams. The method includes: collecting, in real time through a motion capture device, motion capture data of a real anchor used to drive a virtual person for live broadcast; segmenting the motion capture data based on the single-instruction-stream, multiple-data-stream technique to obtain a plurality of sequentially arranged data blocks, and rendering the virtual person based on the plurality of data blocks to obtain a live broadcast data packet; and pushing the live broadcast data packet to a terminal device. The application solves the technical problem that live broadcast stutters because motion capture data are processed too slowly during virtual person live broadcast.

Description

Virtual person live broadcast method, device and system based on single instruction stream and multiple data streams
Technical Field
The application relates to the technical field of virtual person live broadcast, and in particular to a virtual person live broadcast method, device and system based on single instruction stream and multiple data streams.
Background
Virtual person live broadcast is a technique for live streaming with a computer-generated virtual person. The virtual character is usually created with three-dimensional modeling software, and its motion data are acquired in real time through motion capture, so that realistic movements can be reproduced. Virtual person live broadcast is commonly used in fields such as game streaming and virtual anchoring.
The core of virtual person live broadcast is motion capture. Sensors placed on key points of the human body acquire motion information in real time and transmit it to a computer, which converts the sensor data into three-dimensional animation through algorithms, accurately reproducing the human motion. Because it acquires motion data in real time and restores human motion accurately, motion capture is one of the key technologies of virtual person live broadcast.
Another key technology of virtual person live broadcast is real-time rendering, the process of converting a three-dimensional model into images. Virtual person live broadcast must render the virtual person's motions in real time so that the virtual person can be displayed in real time. Real-time rendering requires substantial computational resources and well-designed algorithms to achieve high-quality results.
Because virtual person live broadcast has to process a large amount of motion capture data, the live stream stutters when the data cannot be processed as fast as they are produced. Stuttering degrades the viewing experience and reduces the quality and appeal of the live broadcast.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the application provide a virtual person live broadcast method, device and system based on single instruction stream and multiple data streams, which at least solve the technical problem that live broadcast stutters because motion capture data are processed too slowly during virtual person live broadcast.
According to one aspect of the embodiments of the application, a virtual person live broadcast method based on single instruction stream and multiple data streams is provided, including: collecting, in real time through a motion capture device, motion capture data of a real anchor used to drive a virtual person for live broadcast; segmenting the motion capture data based on the single-instruction-stream, multiple-data-stream technique to obtain a plurality of sequentially arranged data blocks, and rendering the virtual person based on the plurality of data blocks to obtain a live broadcast data packet; and pushing the live broadcast data packet to a terminal device.
According to another aspect of the embodiments of the application, a virtual person live broadcast apparatus based on single instruction stream and multiple data streams is further provided, including: an acquisition module configured to collect, in real time through a motion capture device, motion capture data of a real anchor used to drive a virtual person for live broadcast; a segmentation module configured to segment the motion capture data based on the single-instruction-stream, multiple-data-stream technique to obtain a plurality of sequentially arranged data blocks; a rendering module configured to render the virtual person based on the plurality of data blocks to obtain a live broadcast data packet; and a stream-pushing module configured to push the live broadcast data packet to a terminal device.
According to another aspect of the embodiments of the application, a virtual person live broadcast system based on single instruction stream and multiple data streams is further provided, including the virtual person live broadcast apparatus described above, and a terminal device configured to present the live broadcast data packet pushed by the apparatus.
In the embodiments of the application, motion capture data of a real anchor used to drive a virtual person for live broadcast are collected in real time through a motion capture device; the motion capture data are segmented based on the single-instruction-stream, multiple-data-stream technique to obtain a plurality of sequentially arranged data blocks; the virtual person is rendered based on the plurality of data blocks to obtain a live broadcast data packet; and the live broadcast data packet is pushed to a terminal device. This solves the technical problem that live broadcast stutters because motion capture data are processed too slowly during virtual person live broadcast, avoiding stutter in virtual person live broadcast and improving the user experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a method of virtual live streaming based on single instruction stream multiple data streams in accordance with an embodiment of the present application;
FIG. 2 is a flow chart of another method of virtual live streaming based on single instruction stream multiple data streams in accordance with an embodiment of the present application;
FIG. 3 is a flow chart of a method of rendering cartoon virtual humans according to an embodiment of the present application;
FIG. 4 is a flow chart of a method of creating a view cone according to an embodiment of the present application;
FIG. 5 is a flow chart of a method of computing a minimum bounding box according to an embodiment of the present application;
FIG. 6 is a flow chart of a method of computing depth according to an embodiment of the present application;
FIG. 7 is a flow chart of a method of generating a shadow map according to an embodiment of the present application;
FIG. 8 is a schematic architecture diagram of a single instruction stream multiple data stream based virtual live system in accordance with an embodiment of the present application;
fig. 9 is a schematic structural diagram of a single instruction stream multiple data stream based virtual live device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure, in accordance with an embodiment of the present application.
Wherein the above figures include the following reference numerals:
1001. a CPU; 1002. a ROM; 1003. a RAM; 1004. a bus; 1005. an I/O interface; 1006. an input section; 1007. an output section; 1008. a storage section; 1009. a communication section; 1010. a driver; 1011. a removable medium; 100. a virtual person live broadcast system; 101. a first terminal device; 102. a second terminal device; 103. a third terminal device; 104. a network; 105. a server; 106. a motion capture device; 1062. a motion capture helmet; 1064. a motion capture suit; 1066. a motion capture glove; 92. an acquisition module; 94. a segmentation module; 96. a rendering module; 98. a stream-pushing module.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate. In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Example 1
The embodiment of the application provides a virtual person live broadcast method, which, as shown in fig. 1, includes the following steps:
step S102, dynamic capturing data of a real anchor for driving a virtual person to conduct live broadcast are collected in real time through a dynamic capturing device.
First, a motion capture device is prepared; it generally comprises a motion capture helmet, motion capture gloves, a motion capture suit and the like. These components are mounted on various parts of the real anchor's body in order to capture their movements. Before data collection starts, the motion capture device needs to be calibrated: for example, the real anchor performs a series of actions so that the system can determine the position and orientation of the motion capture device and associate it with the skeletal structure of the virtual person.
After calibration, data acquisition can begin. The real anchor performs various actions; the motion capture device captures the motion and transmits the motion capture data to a server. Typically, the motion capture data are converted into digital signals or skeletal animation before transmission. In some embodiments, to ensure accuracy, the motion capture data may also be pre-processed, for example by removing noise and filling in missing data.
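As an illustration of the pre-processing mentioned above, the following sketch smooths sensor noise and fills gaps in a capture stream. It assumes the stream is arranged as a frames-by-channels array with missing samples marked as NaN; the moving-average window and the linear interpolation are illustrative assumptions, not the method prescribed by this application.

```python
import numpy as np

def clean_capture_frames(frames: np.ndarray, window: int = 5) -> np.ndarray:
    """Fill missing samples and smooth noise in a (num_frames, num_channels) array.

    Missing samples are assumed to be NaN; they are filled by linear
    interpolation along the time axis, then a moving-average filter
    suppresses high-frequency sensor noise.
    """
    frames = frames.copy()
    t = np.arange(len(frames))
    for c in range(frames.shape[1]):
        col = frames[:, c]
        missing = np.isnan(col)
        if missing.any():
            col[missing] = np.interp(t[missing], t[~missing], col[~missing])
    # Moving-average smoothing, channel by channel.
    kernel = np.ones(window) / window
    smoothed = np.vstack([
        np.convolve(frames[:, c], kernel, mode="same") for c in range(frames.shape[1])
    ]).T
    return smoothed
```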
Step S104, the motion capture data are segmented based on the single-instruction-stream, multiple-data-stream technique to obtain a plurality of sequentially arranged data blocks.
First, the motion capture data are divided based on a preset time period to obtain a plurality of candidate data blocks.
Then, based on the single-instruction-stream, multiple-data-stream technique, adjacent candidate data blocks are processed in parallel by multiple threads or multiple processes to obtain the data blocks.
Specifically, two adjacent candidate data blocks among the plurality of candidate data blocks are assigned to an idle thread in the thread pool or to an idle process in the process pool.
The assigned thread or process then judges the continuity of the action between the two adjacent candidate data blocks. For example, key frames are extracted from each of the two adjacent candidate data blocks, and action-sequence features are extracted from those key frames to obtain a first action-sequence feature and a second action-sequence feature; the similarity of the key frames of the two adjacent candidate data blocks is calculated from the extracted action-sequence features; and based on the similarity, it is determined whether the actions of the two adjacent candidate data blocks are continuous.
For example, based on the first and second action-sequence features, a minimum-cost path is found between the key frames of the two adjacent candidate data blocks, the minimum-cost path being the path that maximizes the sum of the similarities of the sequence elements corresponding to each point on the path; the similarity of the key frames is then determined from this minimum-cost path. If the cost of the minimum-cost path is less than or equal to a preset threshold, the key frames of the two candidate data blocks are judged similar, i.e. the actions are continuous; otherwise they are judged dissimilar, i.e. the actions are not continuous.
Finally, the plurality of candidate data blocks are regrouped based on the continuity judgment to obtain the plurality of data blocks, as sketched below. For example, when the actions of the two adjacent candidate data blocks are judged continuous, the two blocks are merged into a single data block; when the actions are not continuous, each of the two adjacent candidate data blocks is kept as a separate data block.
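The following sketch illustrates the segmentation-and-regrouping flow just described: the capture stream is cut into fixed-period candidate blocks, adjacent pairs are examined in parallel by a thread pool, and continuous pairs are merged. The function names and the `is_continuous` callback are illustrative assumptions; a possible similarity test based on the minimum-cost path is sketched separately under Example 2 below.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def split_into_candidates(samples: Sequence, period: int) -> List[list]:
    """Cut the capture stream into fixed-length candidate data blocks."""
    return [list(samples[i:i + period]) for i in range(0, len(samples), period)]

def regroup_blocks(candidates: List[list],
                   is_continuous: Callable[[list, list], bool],
                   max_workers: int = 4) -> List[list]:
    """Merge adjacent candidate blocks whose actions are judged continuous.

    Each adjacent pair is examined by a worker thread; the verdicts are then
    applied in order so the resulting blocks stay sequentially arranged.
    """
    pairs = list(zip(candidates[:-1], candidates[1:]))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(lambda p: is_continuous(p[0], p[1]), pairs))

    blocks = [list(candidates[0])] if candidates else []
    for nxt, cont in zip(candidates[1:], verdicts):
        if cont:
            blocks[-1].extend(nxt)      # continuous action: merge into one data block
        else:
            blocks.append(list(nxt))    # not continuous: start a new data block
    return blocks
```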
In this embodiment, processing the candidate data blocks in parallel with multiple threads or processes makes effective use of computing resources and improves processing speed and efficiency. Judging action continuity from the similarity of adjacent candidate data blocks improves the precision and accuracy of the data processing. Splitting the raw capture data into multiple data blocks reduces the cost of transmission and storage and simplifies subsequent processing and management. By collecting the real anchor's motion capture data in real time and quickly segmenting it into data blocks, the live broadcast system meets users' real-time requirements. Merging continuous actions into a single data block makes the virtual anchor's movements more natural and fluid, improving the experience and satisfaction of viewers watching the live broadcast.
Step S106, the virtual person is rendered based on the plurality of data blocks to obtain a live broadcast data packet.
First, a virtual person model is created, including building the skeletal structure and adding muscle and skin elements, so that the model matches the dynamics of real human motion.
Then, animation rendering is performed using the virtual person model and the plurality of data blocks to create the virtual person's dynamic motion. This can be achieved with computer-graphics techniques such as skeletal animation, skinning and texture mapping.
Finally, the rendered animation is converted into the format of a live broadcast data packet for real-time transmission over the network. This can be done by using a video encoder to encode the rendered images of the virtual person model into a video stream and packetize it into data packets for real-time delivery.
In many cases the live video features a cartoon-style virtual person (also called a cartoon character), which involves toon shading. Toon shading aims to make three-dimensional computer graphics look like hand-drawn cartoon animation. Unlike conventional rendering techniques, it emphasizes outlines and simple shadows, and often uses bright, vivid colors and flat textures. Toon shading under direct illumination is described below with emphasis. For example, rendering may be performed as follows:
1) For each shadow point of the cartoon character to be rendered under direct illumination, determine the maximum value t1 and the minimum value t0 of that shadow point's gradient range.
For example, the angle between each shadow point and the direct-illumination light source is calculated from the position of the shadow point, and the maximum value t1 and minimum value t0 of the gradient range are determined from that angle.
By calculating the maximum value and the minimum value by the method, the contour line of the object can be enhanced. In cartoon rendering, contour lines are very important. By determining the gradual change range, the color gradual change of the shadow part is more natural, the contour line of the object is enhanced, and the cartoon image is clearer. In addition, the illumination effect can be enhanced. Cartoon rendering typically uses a simplified illumination model, whereas direct illumination is one of the most basic. By determining the gradual change range, the illumination effect can be enhanced, so that the cartoon image is more real. Finally, expressive force can also be enhanced. Cartoon characters are often required to express some specific emotion or meaning. By determining the gradient range, the cartoon images can be more abundant and expressive, and the audience can more easily understand the meaning to be expressed.
2) The current gradient position x is calculated based on the maximum value t1, the minimum value t0, the illumination direction L and the normal direction N.
In some examples, the illumination direction L and the normal direction N at each shadow point may be calculated from the position of the shadow point and the position of the direct-illumination light source; the current gradient position x is then calculated from the maximum value t1, the minimum value t0, the unit vector of the illumination direction L and the unit vector of the normal direction N. For example, x is calculated as: x = (L · N - t0) / (t1 - t0), where L and N are the unit vectors of the illumination direction and the normal direction.
The embodiment can generate the rendering effect similar to the cartoon through the simple and quick rendering method. In this embodiment, by calculating the illumination direction and the normal direction of each shadow point and then calculating the current position of the gradation by combining the maximum value, the minimum value, the illumination direction and the normal direction, the gradation effect of the shadow can be generated, thereby increasing the stereoscopic impression of the scene. Furthermore, this embodiment has the advantage of easy implementation and computation, since only simple vector computation and interpolation are involved, without the need for complex ray tracing or shadow mapping. This enables real-time rendering on mobile devices and low power consumption devices.
3) The color Pcolor of each shadow point is calculated based on the current gradient position x and the color of the illumination.
For example, the color Pcolor of each shadow point may be calculated based on the following formula:
Pcolor = x^2 * (3 - 2x) * (color of the illumination).
In this embodiment, by calculating the color of each shadow point using a quadratic interpolation function, a smooth color gradation can be generated, thereby generating a rendering effect similar to a cartoon or hand-drawn style, increasing the readability and artistic sense of the scene, and also increasing the stereoscopic sense and visual appeal. Furthermore, by the above method there is also the advantage of high computational efficiency, since it involves only simple mathematical calculations and color interpolation, without the need for complex texture mapping or shading techniques. This makes it suitable for real-time rendering on mobile devices and low power devices.
4) The shadow of the cartoon character to be rendered under direct illumination is rendered based on the color Pcolor of each shadow point.
In the embodiment, the shadow rendering is calculated based on the illumination color and the gradual change function, and is consistent with the illumination and the color of the cartoon image, so that a more real cartoon rendering effect can be generated, and the stereoscopic impression and the depth impression of the cartoon image are enhanced.
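A minimal sketch of the shading computation in steps 1) to 3) above. The clamping of x to [0, 1] is an added assumption, not something stated in the text; the numeric values in the usage line are purely illustrative.

```python
import numpy as np

def toon_shadow_color(light_dir, normal, light_color, t0: float, t1: float):
    """Toon-style shadow colour for one shaded point under direct lighting.

    x measures where the point falls inside the gradient range [t0, t1] of the
    lit/shadow transition; x**2 * (3 - 2x) is the smoothstep curve that gives
    the soft but band-like falloff typical of cartoon shading.
    """
    L = np.asarray(light_dir, dtype=float)
    N = np.asarray(normal, dtype=float)
    L /= np.linalg.norm(L)
    N /= np.linalg.norm(N)

    x = (np.dot(L, N) - t0) / (t1 - t0)
    x = np.clip(x, 0.0, 1.0)            # clamp: an assumption, keeps the curve in range
    weight = x * x * (3.0 - 2.0 * x)    # smoothstep interpolation
    return weight * np.asarray(light_color, dtype=float)

# Example: a point lit at a grazing angle is strongly dimmed.
print(toon_shadow_color([0.0, 1.0, 0.0], [0.3, 0.2, 0.93], [1.0, 0.9, 0.8], t0=0.0, t1=0.5))
```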
Step S108, pushing the live broadcast data packet to a terminal device.
Example 2
The embodiment of the application provides another virtual person live broadcast method based on single instruction stream and multiple data streams. As shown in fig. 2, the method includes the following steps:
Step S202, motion capture data acquired in real time are segmented based on the single-instruction-stream, multiple-data-stream technique to obtain a plurality of sequentially arranged data blocks.
When acquiring motion capture data, splitting one complete action across several blocks should be avoided as much as possible, since this leads to inaccurate data analysis and makes the data harder to process. The acquired motion capture data therefore need to be processed.
It is first necessary to determine how many parts the whole motion capture sequence is divided into. The time periods can be chosen according to actual requirements, for example by the stage of the action or by a fixed duration. Once the time periods are determined, the start and end point of each period must be located. The split points can be set automatically from data characteristics, for example by detecting specific action frames, or simply at fixed time intervals. After the start and end points of each time period are determined, the data sequence is divided into a plurality of candidate data blocks, each containing all the data within one time period.
Two adjacent candidate data blocks among the plurality of candidate data blocks are assigned to an idle thread in the thread pool or to an idle process in the process pool; the assigned thread or process judges the continuity of the action between the two adjacent candidate data blocks, and the candidate data blocks are then regrouped based on the continuity judgment to obtain the plurality of data blocks.
For example, key frames are respectively extracted from the two adjacent candidate data blocks, and action sequence features are respectively extracted from the key frames of the two adjacent candidate data blocks to obtain a first action sequence feature and a second action sequence feature; calculating the similarity of key frames of the two adjacent candidate data blocks based on the first action sequence feature and the second action sequence feature; based on the similarity, it is determined whether the actions of the two adjacent candidate data blocks are consecutive.
In this embodiment, a path with the smallest cost is found between two sequences, so that the sum of the similarity of sequence elements corresponding to each point on the path is the largest. Specifically, similarity algorithms are used to calculate the similarity of key frame sequences of two adjacent candidate data blocks and obtain their matching paths. And calculating whether actions of two adjacent candidate data blocks are consistent according to the result of the matching path. If the matching path costs of two key frame sequences are small, indicating that their actions are relatively similar, the actions between the two candidate data blocks can be considered to be coherent; if the matching paths are costly, indicating that their actions are dissimilar, the action between the two candidate data blocks may be considered to be disjoint.
Assume there are two key frame sequences A and B, of lengths N and M respectively. An N×M cost matrix D is used to compute the matching path of the two sequences. Element (i, j) of D is the distance between the i-th element of A and the j-th element of B. Different distance measures may be used, such as the Euclidean distance or the Manhattan distance.
By dynamic programming, all elements in the cost matrix D, as well as the minimum cost path from the start point (1, 1) to the end point (N, M), can be calculated. In calculating the minimum cost path, an accumulated cost matrix C is defined, which has the same size as the cost matrix D. The (i, j) th element in the cumulative cost matrix C represents the minimum cumulative cost from the start point (1, 1) to the point (i, j).
The calculation process of the minimum cost path can be divided into the following steps: the first row and first column of the cumulative cost matrix C are initialized. For the first row and first column elements they can only be obtained by moving right or down from the starting point (1, 1). Thus, the element values of the first row and the first column in the cumulative cost matrix C are equal to the cumulative sum of the element values in the corresponding cost matrix D. The values of the remaining elements in the cumulative cost matrix C are calculated. For each non-first row, non-first column element (i, j) in C, it can be calculated by the following formula:
C(i,j) = D(i,j) + min(C(i-1,j), C(i,j-1), C(i-1,j-1))
Wherein min represents the minimum of three numbers.
A minimum cost path is calculated from the start point (1, 1) to the end point (N, M). The minimum cost path may be obtained by back-tracking the elements in the cumulative cost matrix C. Specifically, starting from the end point (N, M), backtracking is performed along the path of least cumulative cost to the start point (1, 1) until the start point is reached. In the backtracking process, the path traversed and the corresponding cost value may be recorded.
The minimum path cost may be used to measure the similarity of two key frame sequences. The smaller the cost, the more similar the two sequences are; the larger the cost, the less similar they are.
When judging the action continuity of two adjacent candidate data blocks, the cost value can be compared with a threshold value to judge the similarity of two key frame sequences. If the cost value is less than or equal to the threshold, the actions of the two key frame sequences are considered to be similar, and the actions between the two candidate data blocks can be considered to be coherent; if the cost value is greater than the threshold, then the actions are not similar, and the action between the two candidate data blocks can be considered to be discontinuous.
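A sketch of the cumulative-cost recurrence described above, using the Euclidean distance between keyframe feature vectors; the feature representation itself and the helper names are assumptions of this sketch.

```python
import numpy as np

def dtw_min_cost(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """Minimum-cost matching between two keyframe feature sequences.

    seq_a is (N, d), seq_b is (M, d). D[i, j] is the Euclidean distance between
    element i of A and element j of B; C accumulates
    C[i, j] = D[i, j] + min(C[i-1, j], C[i, j-1], C[i-1, j-1]).
    """
    N, M = len(seq_a), len(seq_b)
    D = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=-1)

    C = np.full((N, M), np.inf)
    C[0, 0] = D[0, 0]
    for i in range(1, N):                      # first column: only downward moves
        C[i, 0] = D[i, 0] + C[i - 1, 0]
    for j in range(1, M):                      # first row: only rightward moves
        C[0, j] = D[0, j] + C[0, j - 1]
    for i in range(1, N):
        for j in range(1, M):
            C[i, j] = D[i, j] + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    return float(C[-1, -1])

def actions_are_continuous(seq_a, seq_b, threshold: float) -> bool:
    """Adjacent blocks are treated as one coherent action when the path cost is small."""
    return dtw_min_cost(np.asarray(seq_a, float), np.asarray(seq_b, float)) <= threshold
```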
Step S204, rendering the virtual person based on the plurality of data blocks, to obtain a live data packet.
The rendering of a cartoon virtual person under direct illumination from a light source is described in detail below. This embodiment divides the scene into a plurality of cascade layers and uses a different shadow map for each layer, improving the resolution and quality of the shadows and therefore the rendering efficiency and visual effect. As shown in fig. 3, the method for rendering the cartoon virtual person provided in this embodiment includes the following steps:
in step S302, a view cone is created.
A view cone (view frustum) is created that covers the camera's field of view, with a fixed distance between its far section and the light-source position. This view cone is referred to as a "cascade" and may contain multiple cascade layers. Specifically, the method for creating the view cone is shown in fig. 4 and includes the following steps:
in step S3021, the view cone of the camera is determined.
The view cone is the geometric volume containing the objects visible within the camera's viewing angle. It usually consists of six planes: the near, far, left, right, upper and lower planes. The near and far planes are defined by the near and far clipping planes of the camera frustum; the left, right, upper and lower planes are computed from the camera position and field of view. The intersections of these planes define the eight corner points of the view cone.
In step S3022, the light source position and direction are determined.
The light source may be a point light source or a directional light source. The position of the point light source is a fixed point in space. The directional light source has no position and only a direction.
Step S3023, the far-section position of each cascade is determined according to the number of cascades and their distances.
For each cascade, the distance to the far clipping plane of the camera view cone is fixed. Typically, this distance is calculated from the camera position and the scene size.
Step S3024, each cascade's view cone is determined according to its far-section position.
Each cascade's view cone is calculated from the camera position and orientation. Its near plane is the far plane of the previous cascade, and its far plane lies at the far-section position determined for this cascade.
In step S3025, for each cascade, an axis-aligned bounding box (AABB) enclosing the entire scene is calculated.
The axis-aligned bounding box contains all visible objects and is determined from the cascade's far-section position together with the scene AABB.
Step S3026, each cascade's view cone is saved.
Finally, each cascade's view cone is saved for use in later steps; it is used to clip the objects in the scene and to generate the corresponding shadow map.
In the embodiment, the scene is divided into a plurality of cascade layers, and each cascade layer uses a different shadow map, so that the resolution of the shadow map can be effectively reduced, and the rendering performance and quality are improved. In addition, the calculation of the far cross-sectional position and AABB of each cascade layer also helps to optimize the shadow rendering process, making it more efficient and accurate.
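As an illustration of how the far-section distance of each cascade might be chosen, the sketch below blends a uniform and a logarithmic split between the camera's near and far planes. This blend heuristic is common cascaded-shadow-map practice and an assumption of the sketch, not a formula given in this application.

```python
def cascade_split_distances(near: float, far: float, cascades: int, blend: float = 0.75):
    """Far-section distance of each cascade between the camera near and far planes.

    Blends a uniform split with a logarithmic split; blend=1.0 is fully
    logarithmic, 0.0 fully uniform.
    """
    splits = []
    for i in range(1, cascades + 1):
        f = i / cascades
        uniform = near + (far - near) * f
        logarithmic = near * (far / near) ** f
        splits.append(blend * logarithmic + (1.0 - blend) * uniform)
    return splits

# Example: four cascades covering a camera range of 0.1 .. 500 units.
print(cascade_split_distances(0.1, 500.0, 4))
```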
Step S304, calculating a minimum bounding box.
For each cascade, a minimum bounding box (AABB) is calculated that contains all objects in the scene. Specifically, as shown in fig. 5, the steps may be as follows:
in step S3042, the position and direction of the light source are determined.
The light source may be a point light source or a directional light source. The position of the point light source is a fixed point in space. The directional light source has no position and only a direction. The position and orientation of the light source is determined according to the type of light source and the specific scene. In the case of a point light source, the position of the point light source needs to be determined. If it is a directional light source, the direction of the directional light source needs to be determined.
In step S3043, an observation matrix is calculated.
The observation matrix describes the position and orientation of the scene relative to the light source. The observation matrix is calculated from the light source position and orientation. The observation matrix transforms the scene coordinate system into the light source space.
In step S3044, a projection matrix is calculated.
The projection matrix is used to project objects in the scene into the shadow map. It is calculated from the observation matrix and the width and height of the shadow map.
In step S3045, the view projection matrix of the light source is obtained by multiplying the observation matrix and the projection matrix.
The view projection matrix transforms the scene coordinate system into a shadow map coordinate system.
In step S3046, a minimum bounding box is calculated for each cascade that contains all objects in the scene.
The minimum bounding box for each cascade containing all objects in the scene is calculated using the view projection matrix of the light sources.
Through the steps, the position and the direction of the light source are determined by the cascade shadow mapping algorithm, and an observation matrix, a projection matrix and a view projection matrix of the light source are calculated. These matrices will be used in later steps to calculate the shadow map and project objects in the scene into the shadow map.
In this embodiment, the efficiency and accuracy of the algorithm can be improved by calculating the minimum bounding box (AABB) containing all objects in the scene in each cascade. The minimum bounding box can be used to determine the object for which shadows need to be calculated, reducing unnecessary computation. By calculation of the observation matrix and the projection matrix, the position of each pixel in the shadow map can be accurately determined, so that the shadow of the object can be calculated.
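A sketch of the matrices described in steps S3043 to S3046: a look-at observation (view) matrix for the light, and an orthographic projection fitted to the light-space AABB of a cascade's corner points. The handedness convention and the direct mapping of the AABB to the unit cube are assumptions of this sketch.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)) -> np.ndarray:
    """Observation (view) matrix placing the light at `eye`, looking at `target`."""
    eye, target, up = (np.asarray(v, float) for v in (eye, target, up))
    f = target - eye; f /= np.linalg.norm(f)
    r = np.cross(f, up); r /= np.linalg.norm(r)
    u = np.cross(r, f)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = r, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

def ortho_from_points(view: np.ndarray, corners: np.ndarray) -> np.ndarray:
    """Orthographic projection fitted to the light-space AABB of `corners` (N, 3)."""
    ones = np.ones((len(corners), 1))
    ls = (view @ np.hstack([corners, ones]).T).T[:, :3]      # corners in light space
    (l, b, n), (r, t, f) = ls.min(axis=0), ls.max(axis=0)
    proj = np.eye(4)
    # Maps the light-space AABB onto the [-1, 1] cube (no handedness flip; an assumption).
    proj[0, 0], proj[1, 1], proj[2, 2] = 2 / (r - l), 2 / (t - b), 2 / (f - n)
    proj[:3, 3] = [-(r + l) / (r - l), -(t + b) / (t - b), -(f + n) / (f - n)]
    return proj

def light_view_projection(light_pos, scene_center, cascade_corners):
    """View-projection matrix of the light for one cascade's frustum corners."""
    view = look_at(light_pos, scene_center)
    return ortho_from_points(view, np.asarray(cascade_corners, float)) @ view
```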
Step S306, determining a clipping plane.
The near plane of each cascade of view cones is taken as the clipping plane and this plane is used to clip all objects in the scene to ensure that only objects visible in the current cascade will generate shadows.
Step S308, rendering textures.
For each cascade, the depth of the scene within its view cone, as seen from the light source, is rendered onto a texture. This texture, i.e. the shadow map, stores the depth values of the objects in the current cascade as seen from the light-source perspective. The objects are then projected into the shadow map: objects in the scene are rendered into the shadow map, using each cascade's projection matrix to project them into the corresponding cascade. During this process a depth test is required, so that only the object nearest to the light is recorded in the shadow map.
In this embodiment, through the above steps, the rendering performance is improved. By projecting objects into the shadow map, the number of objects that need to be considered in scene rendering can be reduced. In rendering a scene, only objects projected into the current cascade need be considered, and other objects not in the cascade need not be considered. This may improve rendering performance. At the same time, the shadow effect is improved. The shadow map may store depth values of objects in the current cascade as seen from the light source perspective. In this way, the intensity and position of the shadow can be calculated from the depth values. The intensity and position of the shadow can be calculated from the depth value and the position and direction of the light source, thereby improving the shadow effect. In addition, shadow effects of distant objects can be supported. Since the distances of objects from the light source are different, different cascades need to be used at different distances to handle the shadow effect. Shadow effects of distant objects can be supported using multiple cascades. Finally, it is also possible to adapt the cascade size and position. The size and position of each cascade is adaptively calculated based on the distance and scene size. This ensures that each cascade contains only objects in the current scene, thereby improving rendering performance and shadow effect.
In step S310, a matrix is calculated.
For each cascade, the camera projection matrix and the projection matrix for the light-source view are calculated and stored in a transformation-matrix array.
First, one large shadow-map texture is created to store the shadow information of all cascades. Each cascade's shadow map is copied into this large texture, which can be done by drawing each cascade's shadow map into a different region of the large shadow-map texture.
Next, the offset for each cascade is calculated. Since the size and position of the shadow map of each cascade is different, the offset of each cascade relative to the large shadow map texture needs to be calculated in order to correctly acquire shadow information in the subsequent rendering. The offset for each cascade is then stored into a constant buffer for use in subsequent rendering.
Through the above steps, each cascade of shadow maps is merged into one large shadow map texture, and the offset of each cascade relative to the large shadow map texture is calculated. This information will be used in subsequent renderings to obtain the correct shadow information.
The present embodiment can form high shadow quality by the above steps. By separating the scene into different cascades, each cascade can have higher resolution and more accurate shadow information. Meanwhile, the shadow information can be more accurate through the camera projection matrix and the projection matrix of the light source visual angle. In addition, rendering performance can be improved. Combining multiple shadow maps into one large shadow map may reduce rendering calls, thereby improving rendering performance. In addition, according to the position and the size of each cascade, the projection matrix is calculated adaptively, so that the shadow mapping space can be prevented from being wasted in a place with less remote details, and the performance is further improved. Finally, it is also possible to adapt itself. By calculating the offset for each cascade and storing it in a constant buffer, the shading information can be adapted in a later rendering to accommodate changes in the position of the camera and objects in the scene. This makes the algorithm more flexible and adaptable to a variety of different scenarios.
In summary, the present embodiment can improve shadow quality and rendering performance by dividing a scene into a plurality of cascades, adaptively calculating a projection matrix and an offset, and merging shadow map textures, and the like, and adapt to requirements of different scenes.
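The sketch below illustrates one way the per-cascade offsets described above could be laid out and used: the cascades are packed as a horizontal strip inside one large shadow texture, and each cascade records an offset and scale for later lookups. The strip layout and the `CascadeRegion` structure are illustrative assumptions, not the layout mandated by this application.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CascadeRegion:
    offset_u: float   # left edge of this cascade inside the atlas, in [0, 1]
    offset_v: float
    scale_u: float    # fraction of the atlas width this cascade occupies
    scale_v: float

def pack_cascades(cascade_count: int) -> List[CascadeRegion]:
    """Lay the cascades out as a horizontal strip inside one large shadow texture."""
    scale_u = 1.0 / cascade_count
    return [CascadeRegion(offset_u=i * scale_u, offset_v=0.0,
                          scale_u=scale_u, scale_v=1.0)
            for i in range(cascade_count)]

def atlas_uv(region: CascadeRegion, u: float, v: float) -> tuple:
    """Convert a cascade-local shadow-map coordinate into atlas coordinates."""
    return (region.offset_u + u * region.scale_u,
            region.offset_v + v * region.scale_v)
```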
In step S312, the depth is calculated.
For each object, its depth at the light source perspective is calculated separately in each cascade and compared to the depths in the shadow map to determine whether the object is covered by a shadow. In this step, shadow information needs to be calculated using the merged shadow map texture and the offset for each cascade and applied to objects in the scene. As shown in fig. 6, the method comprises the steps of:
in step S3122, for each pixel, its coordinates in the large shadow map are calculated.
For each pixel, its coordinates in the large shadow map texture are calculated and converted to texture coordinates. This can be done by multiplying the coordinates of the pixels by a texture coordinate scaling factor and adding an offset to each cascade.
In step S3124, a depth value is calculated.
In the large shadow-map texture, the depth values of several surrounding pixels are acquired. These pixels are typically located around the current pixel, so a two-dimensional convolution filter may be used to obtain their depth values.
In step S314, a shadow map is generated.
For objects covered by shadows, shadow maps are generated in each cascade and combined together to form the final shadow map. The embodiment adopts a filtering algorithm, and the filtering algorithm is used for reducing jagged edges of shadows and enhancing smoothness of the shadows. As shown in fig. 7, the method includes the steps of:
in step S3142, an average value between the depth values is calculated.
An average value between the current pixel and the depth values of these pixels is calculated. This can be done by adding the depth values of all pixels and dividing the result by the number of pixels.
In step S3144, the distance between the current pixel and the light source is calculated.
The distance between the current pixel and the light source is calculated. This may be done by converting the coordinates of the current pixel into the light source space and calculating its distance to the light source position.
In step S3146, the deviation is calculated.
The deviation between the current pixel's depth and the depth value in the shadow map is calculated. If the current pixel's depth as seen from the light is greater than the depth value stored in the shadow map, the pixel is in shadow; otherwise it is in the lit area.
Step S3148, blurring is performed to achieve the shadow effect.
The shadow information is blurred according to the magnitude of the deviation and the configured filter radius. This may be achieved with a blur filter such as a Gaussian blur or a box (average) blur. The shadow information is then applied to the objects in the scene: for pixels in shadow, the color may be set to black, or a degree of shadow transparency may be used to achieve the shadow effect.
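A sketch of the depth comparison, bias and averaging described in steps S3142 to S3148, in the style of percentage-closer filtering. The depth convention (larger values are farther from the light) and the bias constant are assumptions of this sketch.

```python
import numpy as np

def pcf_shadow_factor(shadow_map: np.ndarray, px: int, py: int,
                      pixel_light_depth: float, radius: int = 1,
                      bias: float = 0.005) -> float:
    """Fraction of the filter window that shadows the pixel (0 = lit, 1 = fully shadowed).

    Samples the depth values around (px, py), compares each against the pixel's
    own depth as seen from the light (minus a small bias against shadow acne),
    and averages the results - the averaging is what softens jagged shadow edges.
    """
    h, w = shadow_map.shape
    occluded, taps = 0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x = min(max(px + dx, 0), w - 1)
            y = min(max(py + dy, 0), h - 1)
            taps += 1
            # Assumed convention: larger depth = farther from the light source.
            if pixel_light_depth - bias > shadow_map[y, x]:
                occluded += 1
    return occluded / taps

def apply_shadow(color, factor, shadow_strength=0.7):
    """Darken the pixel colour according to the shadow factor."""
    return tuple(c * (1.0 - shadow_strength * factor) for c in color)
```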
For example, for each shadow point of the cartoon character to be rendered under direct illumination, determining a maximum value t1 and a minimum value t0 of the gradual change range of each shadow point; calculating a gradual change current position x based on the maximum value t1, the minimum value t0, the illumination direction L and the normal direction N; the color Pcolor of each shadow point is calculated based on the current position x of the fade and the color of the illumination.
In some embodiments, the following more comprehensive factors may also be considered in order to make the shadow points more consistent with the characteristics of the cartoon animation when rendering the cartoon character. In calculating the shadow spot color, the effect of the color of the light source on the shadow spot may be considered and a factor may be used to adjust the effect of the color of the light source on the shadow spot color. The reflectivity K of the object surface also affects the color of the shadow spot, and a factor can be used to adjust the effect of the reflectivity of the object surface on the color of the shadow spot. The color and intensity of the ambient light also affects the color of the shadow spot, and a factor can be used to adjust the effect of the color and intensity of the ambient light on the color of the shadow spot. The shadows of the cartoon character are usually black or dark grey, but a factor can be used to adjust the shade of the shadow color to achieve different effects. The roughness R of the object surface also affects the color of the shadow spot, and a coefficient can be used to adjust the effect of the roughness of the object surface on the color of the shadow spot. In addition, the reflectance Ka of ambient light also affects the color of the shadow spot, and a factor can be used to adjust the effect of the reflectance of ambient light on the color of the shadow spot.
For example, the following formula may be employed to calculate the color of the shadow point:
Pcolor = (Il * Cl * K * (1 - x) + Ia *Ca * Ka) * (1 - R) * Cs
wherein, PColor is the color of the shadow point; il is the intensity of the light source; cl is the color of the light source; k is the reflectivity of the surface of the object; x is the current position of gradual change; ia is the intensity of ambient light; ca is the color of ambient light; ka is the reflectivity of ambient light; r is the roughness of the surface of the object; cs is the color of the shade.
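The formula above, transcribed directly into code with per-channel colors; the variable names follow the text and the numeric values in the usage line are purely illustrative.

```python
def shadow_point_color(Il, Cl, K, x, Ia, Ca, Ka, R, Cs):
    """Pcolor = (Il*Cl*K*(1 - x) + Ia*Ca*Ka) * (1 - R) * Cs, applied per channel.

    Il, Ia     - light / ambient intensities (scalars)
    Cl, Ca, Cs - light, ambient and shadow colours as (r, g, b) tuples
    K, Ka      - surface and ambient reflectivity, R - surface roughness
    x          - current gradient position from the toon-shading step
    """
    return tuple(
        (Il * cl * K * (1.0 - x) + Ia * ca * Ka) * (1.0 - R) * cs
        for cl, ca, cs in zip(Cl, Ca, Cs)
    )

# Illustrative values only.
print(shadow_point_color(Il=1.0, Cl=(1.0, 0.95, 0.9), K=0.6, x=0.4,
                         Ia=0.2, Ca=(0.5, 0.5, 0.6), Ka=0.3,
                         R=0.1, Cs=(0.15, 0.15, 0.2)))
```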
This approach reinforces the cartoon style: cartoon animation usually has a distinctive visual style, one element of which is dark shadow points, and the method above computes the shadow-point color in a way that better matches that style. It also brings the shadow points closer to the shadow behaviour of the cartoon world, increasing the realism of the picture. Finally, it strengthens the expressiveness of the picture: cartoon animation often carries strong emotional expression, and color is an important means of conveying it, so computing the shadow-point color in this way makes the picture more expressive.
Example 3
Fig. 8 illustrates the architecture of a virtual person live broadcast system based on single instruction stream and multiple data streams according to an embodiment of the application. As shown in fig. 8, the virtual person live broadcast system 100 may include a plurality of terminal devices (e.g., one or more of a first terminal device 101, a second terminal device 102 and a third terminal device 103), a network 104, a server 105 and a motion capture device 106.
The network 104 serves as the medium providing communication links between the terminal devices and the server 105, and between the motion capture device 106 and the server 105. The network 104 may include various connection types, such as wired links, wireless links or optical fiber cables. The terminal devices may be various electronic devices with a display screen, including but not limited to desktop computers, portable computers, smart phones and tablet computers. It should be understood that the numbers of terminal devices, networks, motion capture devices and servers in fig. 8 are merely illustrative; there may be any number of each, as required by the implementation. For example, the server 105 may be a server cluster composed of multiple servers.
The motion capture device 106 collects motion capture data in real time and sends them to the server 105 via the network 104. The motion capture device 106 may include one or more of a motion capture helmet 1062, a motion capture suit 1064 and motion capture gloves 1066.
The motion capture helmet 1062 is equipped with a camera that shoots at up to 60 frames per second and can capture rapid lip movements, blinks, and facial twitches and shakes. The helmet has an open structure so that air can circulate, allowing the wearer to work more comfortably. It can be connected with a dedicated data cable, which can be extended with a powered USB extension if it is not long enough.
The motion capture suit 1064 consists of inertial sensors, a control box, Lycra fabric and the like. The suit in this embodiment is fitted with 17 sensors and can track the movements of 23 body segments simultaneously, including the feet, lower legs, knees, abdomen, hands, elbows and shoulders. With this structure the suit meets the strict requirements of motion capture and animation design, and it is simple to use, comfortable to wear and delivers high data quality. In other embodiments, trackable markers may also be placed on the suit to capture the motion trajectory of the person or other object wearing it; for example, retroreflective markers may be attached and tracked by devices such as infrared cameras.
The motion capture gloves 1066 consist of inertial sensors, elastic fabric and a hand motion-capture system. In this embodiment, 12 high-performance nine-axis inertial sensors are arranged on the gloves; the pose update rate is 120 Hz, there are 12 acquisition nodes, the static accuracy is 0.02 degrees, the dynamic accuracy is 0.2 degrees, the solving frequency is about 1000 Hz, and the data latency is 30 ms.
After receiving the motion capture data, the server 105 executes the virtual person live broadcast method provided by the embodiments of the disclosure: it segments the motion capture data based on the single-instruction-stream, multiple-data-stream technique to obtain a plurality of sequentially arranged data blocks, renders the virtual person based on the plurality of data blocks to obtain a live broadcast data packet, and pushes the live broadcast data packet to the terminal devices.
The virtual person live broadcast method provided by the embodiments of the disclosure is generally performed by the server 105, and accordingly the virtual person live broadcast apparatus is generally deployed in the server 105. However, as those skilled in the art will readily understand, the method may also be executed by a terminal device to provide virtual person live broadcast services to other terminal devices, in which case the apparatus may likewise be deployed in that terminal device; this exemplary embodiment imposes no particular limitation in this respect.
In some exemplary embodiments, a user may enter a live broadcast room through an application program on a terminal device; the server 105 generates a live broadcast data packet through the virtual person live broadcast method provided by the embodiments of the present disclosure and transmits the live broadcast data packet to the terminal device.
Example 4
An embodiment of the present application provides a virtual person live broadcast device based on a single instruction stream and multiple data streams, which, as shown in fig. 9, comprises an acquisition module 92, a segmentation module 94, a rendering module 96, and a push module 98.
The acquisition module 92 is configured to acquire, in real time through a dynamic capture device, dynamic capture data of a real anchor used for driving a virtual person to perform live broadcast; the segmentation module 94 is configured to segment the dynamic capture data based on a single instruction stream multiple data streams technique to obtain a plurality of sequentially arranged data blocks; the rendering module 96 is configured to render the virtual person based on the plurality of data blocks to obtain a live broadcast data packet; and the push module 98 is configured to push the live broadcast data packet to the terminal device.
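One possible reading of how the segmentation module could recombine candidate blocks is sketched below: each pair of adjacent candidate blocks is handed to an idle worker, the worker judges whether the two blocks form one continuous action, and continuous pairs are merged while discontinuous ones stay separate. The `is_continuous` callable is a placeholder for that judgment, and the thread pool is only one of the "multithreading or multi-process" options; replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` gives a process-based variant at the cost of having to pass picklable data between workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def recombine_blocks(candidates: Sequence[list],
                     is_continuous: Callable[[list, list], bool],
                     max_workers: int = 4) -> List[list]:
    """Judge continuity of each adjacent pair in parallel, then merge or keep."""
    if len(candidates) < 2:
        return [list(b) for b in candidates]

    pairs = list(zip(candidates[:-1], candidates[1:]))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each idle worker receives one pair of adjacent candidate blocks.
        continuity = list(pool.map(lambda p: is_continuous(p[0], p[1]), pairs))

    blocks: List[list] = [list(candidates[0])]
    for nxt, cont in zip(candidates[1:], continuity):
        if cont:
            blocks[-1].extend(nxt)    # continuous action: merge into one block
        else:
            blocks.append(list(nxt))  # discontinuous: keep as a separate block
    return blocks
```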
It should be noted that the division of functional modules in the virtual person live broadcast device provided in the above embodiment is only used for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the virtual person live broadcast device provided in the above embodiment and the virtual person live broadcast method embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
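The `is_continuous` placeholder above can be filled in along the lines of the continuity test recited in claims 2 to 4 below: extract action sequence features from key frames of the two adjacent blocks, build a cost matrix of element-wise similarity scores, treat the best-scoring alignment path as the minimum cost path, and compare the resulting score with a threshold. In the sketch below, the cosine similarity, the length normalisation, and the 0.8 threshold are all assumptions; feature extraction from the raw blocks is left out, so `actions_are_continuous` would need a thin wrapper before it can serve as the `is_continuous` callable expected by `recombine_blocks`.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score between two key-frame feature vectors."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b)) + 1e-9
    return float(np.dot(a, b)) / denom

def sequence_similarity(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Best cumulative similarity over a monotone alignment of the two sequences."""
    n, m = len(feats_a), len(feats_b)
    # Cost matrix: similarity between every pair of key-frame features.
    cost = np.array([[cosine_sim(fa, fb) for fb in feats_b] for fa in feats_a])
    acc = np.full((n, m), -np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = max(
                acc[i - 1, j] if i > 0 else -np.inf,
                acc[i, j - 1] if j > 0 else -np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else -np.inf,
            )
            # The path with the largest similarity sum plays the role of the
            # minimum cost path between the two key frame sequences.
            acc[i, j] = cost[i, j] + best_prev
    return float(acc[n - 1, m - 1]) / (n + m)  # length normalisation (assumed)

def actions_are_continuous(feats_a: np.ndarray, feats_b: np.ndarray,
                           threshold: float = 0.8) -> bool:
    """Adjacent blocks are treated as one continuous action above the threshold."""
    return sequence_similarity(feats_a, feats_b) >= threshold
```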
Example 5
Fig. 10 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. It should be noted that the electronic device shown in fig. 10 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device includes a Central Processing Unit (CPU) 1001 that can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for system operation are also stored. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, it performs the various functions defined in the methods and apparatus of the present application. In some embodiments, the electronic device may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, fiber optic cable, RF, and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software or by means of hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation of the units themselves.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments. For example, the electronic device may implement the steps of the method embodiments described above.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above-described computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
In the foregoing embodiments of the present application, each embodiment has its own emphasis; for parts that are not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal device may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of the units is merely a logical function division, and other division manners are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed between components may be implemented through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the protection scope of the present application.

Claims (8)

1. A virtual person live broadcast method based on a single instruction stream and multiple data streams, characterized by comprising the following steps:
collecting, in real time through a dynamic capture device, dynamic capture data of a real anchor for driving a virtual person to perform live broadcast;
segmenting the dynamic capture data based on a single instruction stream multiple data streams technique to obtain a plurality of sequentially arranged data blocks;
rendering the virtual person based on the plurality of data blocks to obtain a live broadcast data packet;
pushing the live broadcast data packet to a terminal device;
wherein segmenting the dynamic capture data based on the single instruction stream multiple data streams technique to obtain the plurality of sequentially arranged data blocks comprises the following steps:
dividing the dynamic capture data based on a preset time period to obtain a plurality of candidate data blocks;
processing, based on the single instruction stream multiple data streams technique, two adjacent candidate data blocks among the plurality of candidate data blocks in parallel through multiple threads or multiple processes to obtain the plurality of data blocks; wherein processing the two adjacent candidate data blocks among the plurality of candidate data blocks in parallel through the multiple threads or the multiple processes to obtain the plurality of data blocks comprises:
assigning the two adjacent candidate data blocks among the plurality of candidate data blocks to one idle thread among the multiple threads or to one idle process among the multiple processes;
and judging the action continuity between the two adjacent candidate data blocks through the assigned thread or process, and recombining the plurality of candidate data blocks based on the judgment result of the action continuity to obtain the plurality of data blocks.
2. The method of claim 1, wherein judging the action continuity between the two adjacent candidate data blocks comprises:
respectively extracting key frames from the two adjacent candidate data blocks to obtain two key frame sequences, and respectively extracting action sequence features from the two key frame sequences to obtain a first action sequence feature and a second action sequence feature;
determining a similarity of the two key frame sequences based on the first action sequence feature and the second action sequence feature;
and determining, based on the similarity, whether the actions of the two adjacent candidate data blocks are continuous.
3. The method of claim 2, wherein determining the similarity of the two key frame sequences based on the first action sequence feature and the second action sequence feature comprises:
based on the first action sequence feature and the second action sequence feature, finding a minimum cost path between the two key frame sequences;
and determining the similarity of the two key frame sequences based on the minimum cost path.
4. The method of claim 3, wherein finding the minimum cost path between the two key frame sequences based on the first action sequence feature and the second action sequence feature comprises:
constructing a cost matrix based on the first action sequence feature and the second action sequence feature, wherein each element in the cost matrix represents a similarity score between elements at corresponding positions in the first action sequence feature and the second action sequence feature;
calculating, based on the cost matrix, the sum of the sequence element similarities corresponding to the points on each path between the two key frame sequences;
and taking the path with the maximum sum of similarities as the minimum cost path.
5. The method of any one of claims 2 to 3, wherein recombining the plurality of candidate data blocks based on the judgment result of the action continuity to obtain the plurality of data blocks comprises:
merging the two adjacent candidate data blocks into one data block of the plurality of data blocks when it is determined that the actions of the two adjacent candidate data blocks are continuous;
and taking each of the two adjacent candidate data blocks as one data block of the plurality of data blocks when it is determined that the actions of the two adjacent candidate data blocks are not continuous.
6. A virtual person live broadcast device based on a single instruction stream and multiple data streams, comprising:
an acquisition module configured to acquire, in real time through a dynamic capture device, dynamic capture data of a real anchor for driving a virtual person to perform live broadcast;
a segmentation module configured to segment the dynamic capture data based on a single instruction stream multiple data streams technique to obtain a plurality of sequentially arranged data blocks;
a rendering module configured to render the virtual person based on the plurality of data blocks to obtain a live broadcast data packet;
and a push module configured to push the live broadcast data packet to a terminal device;
wherein the segmentation module is further configured to:
divide the dynamic capture data based on a preset time period to obtain a plurality of candidate data blocks;
process, based on the single instruction stream multiple data streams technique, two adjacent candidate data blocks among the plurality of candidate data blocks in parallel through multiple threads or multiple processes to obtain the plurality of data blocks;
wherein the segmentation module is further configured to:
assign the two adjacent candidate data blocks among the plurality of candidate data blocks to one idle thread among the multiple threads or to one idle process among the multiple processes;
and judge the action continuity between the two adjacent candidate data blocks through the assigned thread or process, and recombine the plurality of candidate data blocks based on the judgment result of the action continuity to obtain the plurality of data blocks.
7. A virtual person live broadcast system based on a single instruction stream and multiple data streams, comprising:
the virtual person live broadcast device of claim 6;
and a terminal device configured to present the live broadcast data packet pushed by the virtual person live broadcast device.
8. A computer readable storage medium having a program stored thereon, wherein the program, when run, causes a computer to perform the method of any one of claims 1 to 5.
CN202310394871.5A 2023-04-14 2023-04-14 Virtual person live broadcast method, device and system based on single instruction stream and multiple data streams Active CN116112716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310394871.5A CN116112716B (en) 2023-04-14 2023-04-14 Virtual person live broadcast method, device and system based on single instruction stream and multiple data streams

Publications (2)

Publication Number Publication Date
CN116112716A CN116112716A (en) 2023-05-12
CN116112716B true CN116112716B (en) 2023-06-09

Family

ID=86262015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310394871.5A Active CN116112716B (en) 2023-04-14 2023-04-14 Virtual person live broadcast method, device and system based on single instruction stream and multiple data streams

Country Status (1)

Country Link
CN (1) CN116112716B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112333179B (en) * 2020-10-30 2023-11-10 腾讯科技(深圳)有限公司 Live broadcast method, device and equipment of virtual video and readable storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107438183A (en) * 2017-07-26 2017-12-05 北京暴风魔镜科技有限公司 Virtual character live broadcast method, apparatus and system
CN110502120A (en) * 2019-08-29 2019-11-26 广州创幻数码科技有限公司 Virtual newscaster system combining motion capture data and preset action data and implementation method thereof
CN114078280A (en) * 2020-08-11 2022-02-22 北京达佳互联信息技术有限公司 Motion capture method, motion capture device, electronic device and storage medium
CN113192164A (en) * 2021-05-12 2021-07-30 广州虎牙科技有限公司 Avatar follow-up control method and device, electronic equipment and readable storage medium
CN114120389A (en) * 2021-09-09 2022-03-01 广州虎牙科技有限公司 Network training and video frame processing method, device, equipment and storage medium
CN114051148A (en) * 2021-11-10 2022-02-15 拓胜(北京)科技发展有限公司 Virtual anchor generation method and device and electronic equipment
CN114610158A (en) * 2022-03-25 2022-06-10 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116112716A (en) 2023-05-12

Similar Documents

Publication Publication Date Title
US11756223B2 (en) Depth-aware photo editing
CN111641844B (en) Live broadcast interaction method and device, live broadcast system and electronic equipment
CN111294665B (en) Video generation method and device, electronic equipment and readable storage medium
Zollmann et al. Image-based ghostings for single layer occlusions in augmented reality
CN106447756B (en) Method and system for generating user-customized computer-generated animations
US20210158593A1 (en) Pose selection and animation of characters using video data and training techniques
CN111652791B (en) Face replacement display method, face replacement live broadcast device, electronic equipment and storage medium
CN111949112A (en) Object interaction method, device and system, computer readable medium and electronic equipment
CN111507997A (en) Image segmentation method, device, equipment and computer storage medium
CN107656611A (en) Somatic sensation television game implementation method and device, terminal device
CN112218107A (en) Live broadcast rendering method and device, electronic equipment and storage medium
KR20060131145A (en) Randering method of three dimension object using two dimension picture
CN110221689B (en) Space drawing method based on augmented reality
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
US10147218B2 (en) System to identify and use markers for motion capture
Won et al. Active 3D shape acquisition using smartphones
CN111652795A (en) Face shape adjusting method, face shape adjusting device, live broadcast method, live broadcast device, electronic equipment and storage medium
CN111652807B (en) Eye adjusting and live broadcasting method and device, electronic equipment and storage medium
CN110136238B (en) AR drawing method combined with physical illumination model
CN116112716B (en) Virtual person live broadcast method, device and system based on single instruction stream and multiple data streams
US20210158565A1 (en) Pose selection and animation of characters using video data and training techniques
CN111652023B (en) Mouth-type adjustment and live broadcast method and device, electronic equipment and storage medium
CN111652024B (en) Face display and live broadcast method and device, electronic equipment and storage medium
CN111243099B (en) Method and device for processing image and method and device for displaying image in AR (augmented reality) equipment
CN114066715A (en) Image style migration method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant