CN114979789A - Video display method and device and readable storage medium


Info

Publication number
CN114979789A
Authority
CN
China
Prior art keywords
user
video
virtual background
dynamic information
video communication
Prior art date
Legal status
Pending
Application number
CN202110206221.4A
Other languages
Chinese (zh)
Inventor
沙莎
许显杨
钱靖
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110206221.4A
Publication of CN114979789A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working

Abstract

The application discloses a video display method and apparatus and a readable storage medium. The video display method includes the following steps: a first terminal displays, in a social application, a video communication interface for providing a video communication function for a first user and a second user, and displays, in the video communication interface, a target virtual background in which the first user and the second user are located together; the first user is the user logged in to the social application on the first terminal, and the second user is the user performing video communication with the first user in the video communication interface. When user dynamic information of the first user satisfies a virtual background update condition, the target virtual background is updated to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is acquired from audio and video data of the first user during video communication. The updated target virtual background is then displayed in the video communication interface. Through the method and apparatus, the video display modes of video communication can be enriched.

Description

Video display method and device and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a video display method and apparatus, and a readable storage medium.
Background
With the continuous development of mobile communication technology, intelligent terminals such as mobile phones and tablet computers occupy an important position in people's daily lives. Through an intelligent terminal, people can now carry out real-time video communication anytime and anywhere, which reduces the cost of communication.
At present, when a user wants to show a virtual background during video communication on an intelligent terminal, the virtual background is usually a static picture or a simple looping sequence-frame animation, so the video display mode of video communication is monotonous. If the user wants to adjust the virtual background, the video communication must be interrupted for manual adjustment, and even then generally only a simple background switch can be achieved; as a result, normal operation and display of the video communication cannot be maintained while the virtual background is updated.
Disclosure of Invention
The embodiments of the present application provide a video display method and apparatus and a readable storage medium, which can enrich the video display modes of video communication and maintain normal operation and display of the video communication while the virtual background is updated.
An embodiment of the present application provides a video display method, including:
displaying, by a first terminal in a social application, a video communication interface for providing a video communication function for a first user and a second user, and displaying, in the video communication interface, a target virtual background in which the first user and the second user are located together, where the first user is the user logged in to the social application on the first terminal, and the second user is the user performing video communication with the first user in the video communication interface;
when user dynamic information of the first user satisfies a virtual background update condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information, where the user dynamic information is acquired from audio and video data of the first user during video communication;
and displaying the updated target virtual background in the video communication interface.
An aspect of an embodiment of the present application provides a video display apparatus, including:
a first display module, configured to display, by a first terminal in a social application, a video communication interface for providing a video communication function for a first user and a second user, and display, in the video communication interface, a target virtual background in which the first user and the second user are located together, where the first user is the user logged in to the social application on the first terminal, and the second user is the user performing video communication with the first user in the video communication interface;
a first updating module, configured to update the target virtual background when user dynamic information of the first user satisfies a virtual background update condition, to obtain an updated target virtual background matched with the user dynamic information, where the user dynamic information is acquired from audio and video data of the first user during video communication;
and a second display module, configured to display the updated target virtual background in the video communication interface.
The first display module is specifically configured to: display one or more virtual backgrounds in response to a trigger operation on a background switching control in the video communication interface; determine, in response to a selection operation on the one or more virtual backgrounds, the selected virtual background as the target virtual background; and switch, in the video communication interface, the original background in which the first user and the second user are located together to the target virtual background.
The video communication interface further comprises a first user picture and a second user picture;
The apparatus further includes:
a first fusion display module, configured to: shoot a key part of the first user during video communication; if the video virtual communication function of the first terminal is in an enabled state, display, in the video communication interface, a video picture used to cover the key part of the first user, and determine the video picture as the first user picture; display, in the video communication interface, a second user picture that does not overlap the first user picture, where the second user picture is used to display a key part of the second user; and display, in the video communication interface, the first user picture and the second user picture fused with the target virtual background.
The apparatus further includes:
a second fusion display module, configured to: shoot a key part of the first user during video communication; if the video virtual communication function of the first terminal is in a disabled state, display a camera video picture in the video communication interface and determine the camera video picture as the first user picture, where the camera video picture is a video picture obtained by shooting the key part of the first user; display, in the video communication interface, a second user picture that does not overlap the first user picture, where the second user picture is used to display a key part of the second user; and display, in the video communication interface, the first user picture and the second user picture fused with the target virtual background.
The first updating module includes:
a first position adjusting unit, configured to: when the user dynamic information of the first user satisfies the virtual background update condition, adjust the first user picture in the video communication interface to the position indicated by the user dynamic information; and synchronously adjust, according to the position-adjusted first user picture, the position in the video communication interface of the background element in the target virtual background associated with the first user picture, to obtain an updated target virtual background matched with the user dynamic information.
The first updating module includes:
a first animation display unit, configured to: when the user dynamic information of the first user satisfies the virtual background update condition, display a background animation associated with the user dynamic information in the video communication interface, and fuse the background animation with the target virtual background to obtain an updated target virtual background matched with the user dynamic information.
The first updating module includes:
a first size adjusting unit, configured to: when the user dynamic information of the first user satisfies the virtual background update condition, adjust, in the video communication interface, the size of the background element in the target virtual background associated with the first user picture according to the user dynamic information, to obtain an updated target virtual background matched with the user dynamic information.
The apparatus further includes:
a second updating module, configured to acquire user dynamic information of the second user and fuse the user dynamic information of the first user with the user dynamic information of the second user to obtain fused dynamic information, where the user dynamic information of the second user is acquired from audio and video data of the second user during video communication; and, when the fused dynamic information satisfies the virtual background update condition, update the target virtual background to obtain an updated target virtual background matched with the fused dynamic information.
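The patent does not specify a concrete fusion rule for the two users' dynamic information. The following Python sketch illustrates one plausible interpretation; the field names, the midpoint rule for positions, and the max rule for volume are all hypothetical assumptions, not the patented method.

```python
def fuse_dynamic_info(first_user: dict, second_user: dict) -> dict:
    """Fuse the two users' dynamic information; the merge rules are assumptions."""
    return {
        # midpoint of the two head positions could drive shared background elements
        "position": tuple((a + b) / 2 for a, b in
                          zip(first_user["position"], second_user["position"])),
        # the louder speaker dominates volume-driven effects
        "volume": max(first_user["volume"], second_user["volume"]),
        # both users smiling at once could trigger a joint background animation
        "both_smiling": (first_user["expression"] == "smile"
                         and second_user["expression"] == "smile"),
    }

fused = fuse_dynamic_info(
    {"position": (0.3, 0.5), "volume": 0.4, "expression": "smile"},
    {"position": (0.7, 0.5), "volume": 0.6, "expression": "smile"},
)
print(fused)  # fused info is then tested against the update condition
```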
The user dynamic information includes position information corresponding to a key part of the first user;
the apparatus further includes:
a pose detection module, configured to: perform pose detection on the first user during video communication and acquire three-dimensional position data corresponding to the key part of the first user; respectively generate a three-dimensional model matrix, a view matrix, and a projection matrix associated with the first user according to the geometric relationship between the key part of the first user and a world coordinate system, the positional relationship between the key part of the first user and the camera of the first terminal, and the size of the screen display area of the first terminal, where the world coordinate system is used to describe the positions of the key part of the first user and of the camera of the first terminal; and perform matrix transformation on the three-dimensional position data according to the three-dimensional model matrix, the view matrix, and the projection matrix to generate vertex position coordinates corresponding to the key part of the first user, and determine the vertex position coordinates as the position information corresponding to the key part of the first user;
and a first condition judgment module, configured to determine, if a spatial coordinate or a rotation angle in the three-dimensional position data changes, that the position information has undergone a position change, where the position information after the position change satisfies the virtual background update condition.
The pose detection module is specifically configured to: obtain the geometric relationship between the key part of the first user and the origin and coordinate axes of the world coordinate system, construct a first translation matrix and a first rotation matrix according to the geometric relationship, and generate the three-dimensional model matrix associated with the first user from the first translation matrix and the first rotation matrix; construct a second translation matrix and a second rotation matrix according to the positional relationship between the key part of the first user and the camera of the first terminal, and generate the view matrix associated with the first user from the second translation matrix and the second rotation matrix; and obtain the spatial parameters of the camera coordinate system according to the size of the screen display area of the first terminal, and construct the projection matrix associated with the first user from the spatial parameters.
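To make the model-view-projection pipeline above concrete, here is a minimal numpy sketch of how the three matrices could be built and applied to a 3D keypoint of the key part. The matrix layouts follow common OpenGL conventions, and every numeric value (field of view, aspect ratio, offsets, the head keypoint) is an illustrative assumption rather than a value from the patent.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotation_y(angle_rad):
    """4x4 rotation about the Y axis (e.g. a head turn)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    m = np.eye(4)
    m[0, 0], m[0, 2] = c, s
    m[2, 0], m[2, 2] = -s, c
    return m

def perspective(fov_y, aspect, near, far):
    """Projection matrix derived from the screen display area (aspect ratio)."""
    f = 1.0 / np.tan(fov_y / 2.0)
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = 2 * far * near / (near - far)
    m[3, 2] = -1.0
    return m

# Model matrix: geometric relation of the key part to the world origin/axes
# (first translation matrix combined with first rotation matrix).
model = translation(0.1, 0.0, -0.5) @ rotation_y(np.deg2rad(15))
# View matrix: position of the key part relative to the terminal's camera
# (second translation matrix combined with second rotation matrix).
view = rotation_y(0.0) @ translation(0.0, 0.0, -2.0)
# Projection matrix: spatial parameters from the terminal's screen display area.
proj = perspective(np.deg2rad(60), aspect=9 / 16, near=0.1, far=100.0)

head_point = np.array([0.0, 0.0, 0.0, 1.0])   # a 3D keypoint on the key part
clip = proj @ view @ model @ head_point       # the matrix transformation
vertex = clip[:3] / clip[3]                   # vertex position coordinates
print(vertex)
```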
The first updating module includes:
a second position adjusting unit, configured to: when the position information after the position change in the user dynamic information satisfies the virtual background update condition, determine a displacement distance and a deflection angle in the video communication interface according to the changed position information, and perform displacement and deflection processing on the first user picture according to the displacement distance and the deflection angle; and, in the video communication interface, perform synchronous displacement and deflection processing, according to the same displacement distance and deflection angle, on the background element in the target virtual background associated with the first user picture, to obtain an updated target virtual background matched with the user dynamic information.
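A minimal sketch of this synchronous displacement-and-deflection step, assuming 2D normalized screen coordinates and per-frame deltas derived from the changed position information; the coordinate convention and the sample values are assumptions.

```python
def displace_and_deflect(position, angle, dx, dy, da):
    """Apply a displacement distance (dx, dy) and a deflection angle da."""
    return (position[0] + dx, position[1] + dy), angle + da

# Deltas derived from the change in the head keypoint between two frames
# (normalized screen coordinates; the sample values are illustrative).
prev_xy, prev_angle = (0.30, 0.50), 0.00
curr_xy, curr_angle = (0.33, 0.50), 0.10      # head moved right and tilted
dx, dy = curr_xy[0] - prev_xy[0], curr_xy[1] - prev_xy[1]
da = curr_angle - prev_angle

# The first user picture and its associated background element (e.g. the
# flower in the "flower" background) receive the same displacement and
# deflection, so they stay visually fused in the interface.
picture_pos, picture_rot = displace_and_deflect(prev_xy, prev_angle, dx, dy, da)
flower_pos, flower_rot = displace_and_deflect((0.30, 0.62), 0.0, dx, dy, da)
print(picture_pos, picture_rot, flower_pos, flower_rot)
```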
The user dynamic information further includes the facial expression of the first user;
the apparatus further includes:
an expression detection module, configured to perform expression detection on the first user during video communication to acquire the facial expression of the first user;
and a second condition judgment module, configured to determine that the user dynamic information satisfies the virtual background update condition if the facial expression belongs to a target facial expression type.
The first updating module includes:
a second animation display unit, configured to: when the facial expression in the user dynamic information satisfies the virtual background update condition, traverse an expression-animation mapping table to acquire the background animation matched with the facial expression; and display the background animation in the video communication interface and fuse it with the target virtual background, to obtain an updated target virtual background matched with the user dynamic information.
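A minimal Python sketch of traversing an expression-animation mapping table; the table contents and expression labels are hypothetical, since the patent only specifies that a matching background animation is looked up.

```python
# Hypothetical expression-to-animation mapping table; the patent only states
# that the table is traversed for the animation matching the expression.
EXPRESSION_ANIMATION_MAP = {
    "smile": "meteor_streak",     # e.g. a meteor crosses the "star" background
    "mouth_open": "petal_burst",
    "wink": "firefly_glow",
}

def background_animation_for(facial_expression: str):
    """Traverse the mapping table for an animation matching the expression."""
    for expression, animation in EXPRESSION_ANIMATION_MAP.items():
        if expression == facial_expression:
            return animation
    return None  # no matching animation; update condition not met

animation = background_animation_for("smile")
if animation is not None:
    print(f"fuse '{animation}' into the target virtual background")
```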
The user dynamic information further includes volume data corresponding to the first user;
the apparatus further includes:
an audio detection module, configured to acquire audio data input by the first user and sample the audio data to obtain the volume data corresponding to the first user;
and a third condition judgment module, configured to determine that the user dynamic information satisfies the virtual background update condition if the volume data falls within a volume detection interval.
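A minimal sketch of sampling audio data into volume data and testing the volume detection interval; the RMS measure and the interval bounds are assumptions, as the patent does not fix a particular sampling method or interval.

```python
import numpy as np

VOLUME_DETECTION_INTERVAL = (0.05, 0.95)   # assumed normalized bounds

def sample_volume(audio_frame: np.ndarray) -> float:
    """Sample the audio data: RMS amplitude of one frame, roughly in [0, 1]."""
    return float(np.sqrt(np.mean(np.square(audio_frame))))

def satisfies_update_condition(volume: float) -> bool:
    """Update condition: the volume data falls within the detection interval."""
    low, high = VOLUME_DETECTION_INTERVAL
    return low <= volume <= high

frame = np.random.default_rng(0).uniform(-0.4, 0.4, 1024)  # stand-in mic frame
volume = sample_volume(frame)
print(volume, satisfies_update_condition(volume))
```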
The first updating module includes:
a second size adjusting unit, configured to: when the volume data in the user dynamic information satisfies the virtual background update condition, extract volume peak and valley values from the volume data and construct a scaling matrix associated with the first user according to the peak and valley values, where the scaling matrix is composed of at least two scaling coefficients in different stretching directions; and, in the video communication interface, adjust the size of the background element in the target virtual background associated with the first user picture according to the at least two scaling coefficients in the scaling matrix, to obtain an updated target virtual background matched with the user dynamic information.
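A sketch of building a scaling matrix from volume peak and valley values, assuming a diagonal matrix with distinct horizontal and vertical coefficients; the mapping from peak/valley to coefficients is an assumption, since the patent only requires at least two scaling coefficients in different stretching directions.

```python
import numpy as np

def scaling_matrix(volume_data: np.ndarray) -> np.ndarray:
    """Build a scaling matrix from the volume peak and valley values.

    The peak/valley-to-coefficient mapping below is an assumption; the
    patent only requires at least two coefficients in different
    stretching directions.
    """
    peak, valley = float(volume_data.max()), float(volume_data.min())
    sx = 1.0 + 0.5 * peak               # horizontal stretch follows the peak
    sy = 1.0 + 0.5 * (peak - valley)    # vertical stretch follows the swing
    return np.diag([sx, sy, 1.0])

volumes = np.array([0.2, 0.6, 0.1, 0.8, 0.3])
S = scaling_matrix(volumes)
element_size = np.array([120.0, 160.0, 1.0])   # background element (w, h, 1)
print((S @ element_size)[:2])                  # resized flower, for example
```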
The user dynamic information further includes live-action information of the environment in which the first user is located;
the apparatus further includes:
an environment detection module, configured to acquire video data of the environment in which the first user is located and extract, from the video data, the live-action information of that environment, where the live-action information includes one or more of the brightness, the color composition, or the key environment objects of the environment in which the first user is located;
and a fourth condition judgment module, configured to determine that the user dynamic information satisfies the virtual background update condition if the environment reflected by the live-action information changes.
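A minimal sketch of extracting live-action information (brightness and a coarse color composition) from a video frame and testing for an environment change; the tolerances and the per-frame statistics are assumptions, and key-environment-object detection is omitted for brevity.

```python
import numpy as np

def extract_live_action_info(frame_rgb: np.ndarray) -> dict:
    """Extract brightness and a coarse color composition from one frame."""
    brightness = float(frame_rgb.mean())
    color_composition = frame_rgb.reshape(-1, 3).mean(axis=0)  # mean R, G, B
    return {"brightness": brightness, "color": color_composition}

def environment_changed(prev: dict, curr: dict,
                        brightness_tol=20.0, color_tol=25.0) -> bool:
    """Update condition: the environment in the live-action info changed."""
    return (abs(curr["brightness"] - prev["brightness"]) > brightness_tol
            or np.linalg.norm(curr["color"] - prev["color"]) > color_tol)

day = np.full((720, 1280, 3), 180, dtype=np.float32)    # bright frame
night = np.full((720, 1280, 3), 40, dtype=np.float32)   # lights turned off
print(environment_changed(extract_live_action_info(day),
                          extract_live_action_info(night)))  # True
```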
The apparatus further includes:
a picture switching module, configured to: if the facial expression of the first user belongs to the target facial expression type and the display duration of the facial expression is greater than a first duration threshold, determine that the user dynamic information of the first user satisfies a user picture switching condition, and switch the first user picture from the video picture to the camera video picture in the video communication interface, where the camera video picture is a video picture obtained by shooting the key part of the first user; or, if the volume data corresponding to the first user falls within the volume detection interval and the duration for which the volume data stays within the volume detection interval is greater than a second duration threshold, determine that the user dynamic information of the first user satisfies the user picture switching condition, and switch the first user picture from the video picture to the camera video picture in the video communication interface.
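A minimal sketch of this two-branch picture switching condition; the threshold values are assumptions, as the patent leaves the first and second duration thresholds unspecified.

```python
FIRST_DURATION_THRESHOLD = 2.0    # seconds a target expression must persist (assumed)
SECOND_DURATION_THRESHOLD = 1.5   # seconds the volume must stay in the interval (assumed)

def should_switch_picture(expression_is_target: bool, expression_duration: float,
                          volume_in_interval: bool, volume_duration: float) -> bool:
    """User-picture switching condition: either branch alone is sufficient."""
    if expression_is_target and expression_duration > FIRST_DURATION_THRESHOLD:
        return True
    if volume_in_interval and volume_duration > SECOND_DURATION_THRESHOLD:
        return True
    return False

# A long smile (2.4 s) switches the avatar picture back to the camera picture.
print(should_switch_picture(True, 2.4, False, 0.0))   # True
```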
An aspect of an embodiment of the present application provides a computer device, including: a processor, a memory, a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program to execute the method in the embodiment of the present application.
An aspect of the embodiments of the present application provides a computer-readable storage medium in which a computer program is stored, the computer program being adapted to be loaded by a processor to execute the method in the embodiments of the present application.
An aspect of the embodiments of the present application provides a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, the computer instructions are stored in a computer-readable storage medium, and a processor of a computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method in the embodiments of the present application.
In the embodiments of the present application, when a first user and a second user perform video communication through the social application installed in their respective terminal devices, the first terminal held by the first user may show, in the video communication interface, a target virtual background in which the first user and the second user are located together. Further, during video communication, the first terminal may acquire user dynamic information of the first user from the first user's audio and video data; when it detects that the user dynamic information satisfies the virtual background update condition, it may update the target virtual background to obtain an updated target virtual background matched with the user dynamic information, and display the updated target virtual background in the video communication interface. Thus, during video communication, the terminal device can automatically acquire the current user's dynamic information and use it to update the virtual background in the current video communication interface in real time, without interrupting the video communication and without manual operation by the user; that is, the application can maintain normal operation and display of the video communication while updating the virtual background, improving video communication quality. In addition, by constructing a virtual, dynamic, interactive environment background, the degree of fusion between the user's image and the background can be improved, enriching the video display modes of video communication.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for a person skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIGS. 2a-2e are schematic diagrams of a video display scene according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a video display method according to an embodiment of the present application;
FIG. 4 is a schematic interface diagram of a video call scene according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of a video display method according to an embodiment of the present application;
FIGS. 6a-6d are schematic interface diagrams of a video display process according to an embodiment of the present application;
FIG. 7 is a schematic workflow diagram of a video display system according to an embodiment of the present application;
FIGS. 8a-8d are schematic diagrams of a coordinate transformation according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a video display apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and progress of artificial intelligence technology, it has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service.
Computer Vision (CV) technology is a science that studies how to make machines "see": it uses cameras and computers, instead of human eyes, to identify, track, and measure targets, and performs further image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include data processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Key technologies of Speech Technology include automatic speech recognition (ASR), text-to-speech synthesis (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and speech is expected to become one of the most promising human-computer interaction modes.
The solutions provided in the embodiments of the present application relate to artificial intelligence technologies such as computer vision and speech processing, and the specific processes are described in the following embodiments.
Please refer to FIG. 1, which is a schematic diagram of a system architecture according to an embodiment of the present application. The system architecture may include a server 100 and a terminal cluster, and the terminal cluster may include: terminal device 200a, terminal device 200b, terminal device 200c, ..., and terminal device 200n. Communication connections may exist within the terminal cluster; for example, there may be a communication connection between terminal device 200a and terminal device 200b, and between terminal device 200a and terminal device 200c. Meanwhile, any terminal device in the terminal cluster may have a communication connection with the server 100; for example, a communication connection exists between terminal device 200a and the server 100. The connection manner is not limited: it may be direct or indirect through wired communication, direct or indirect through wireless communication, or through other manners, which is not limited herein.
It should be understood that each terminal device in the terminal cluster shown in fig. 1 may be installed with an application client, and when the application client runs in each terminal device, data interaction may be performed with the server 100 shown in fig. 1. The application client can be an application client with a video communication function, such as a social application, an instant messaging application, a live application, a short video application, a music application, a shopping application, a game application, a novel application, a payment application, a browser and the like. The application client may be an independent client, or may be an embedded sub-client integrated in a certain client (e.g., an instant messaging client, a social client, a video client, etc.), which is not limited herein. Taking the social application as an example, the server 100 may include one or more servers such as a background server and a data processing server corresponding to the social application, so that each terminal device may perform data transmission with the server 100 through an application client corresponding to the social application, for example, each terminal device may perform video communication with other terminal devices through the server 100.
For example, each terminal device may display a virtual background in the video communication interface of a social application. It should be understood that, to improve the richness of the call background when users perform video communication in the social application, one or more virtual backgrounds are provided in the social application, and the two users in video communication may select any one of them, so that the original background in the video communication interface can be switched to the selected virtual background. It should also be understood that the server 100 in the present application may obtain service data through these applications; for example, the service data may be the virtual background selected by the user (for example, the virtual background "star", the background elements it contains, and the background animations associated with those elements). Taking terminal device 200a and terminal device 200b as an example, assume that user A selects the virtual background "star" through terminal device 200a. Terminal device 200a may send the selected virtual background "star" to the server 100, and may invoke the application client of the locally installed social application to draw the virtual background "star" in the video communication interface of terminal device 200a. After the server 100 acquires the service data related to the virtual background "star", it may further send that service data to terminal device 200b, and terminal device 200b may then, according to the service data, invoke the application client of the locally installed social application and draw the virtual background "star" in the video communication interface of terminal device 200b.
Subsequently, since both users may constantly generate dynamic changes, the server 100 may obtain and examine the user dynamic information corresponding to the two users. When the user dynamic information of either party satisfies the virtual background update condition, that user dynamic information may be sent to terminal device 200a and terminal device 200b, which may then each update the currently displayed virtual background in real time according to the user dynamic information, obtain the updated virtual background, and display it in their respective video communication interfaces. The user dynamic information refers to information acquired from the audio and video data of the two users during video communication, and includes, but is not limited to, position information corresponding to key parts of a user, the user's facial expression and volume data, and live-action information of the environment in which the user is located. For example, when the facial expression of user A is a smile, a meteor background animation in the virtual background "star" may be triggered, such as a special-effect animation in which user A sees a meteor streak across the virtual background "star".
Optionally, it may be understood that the system architecture may include a plurality of servers, one terminal device may be connected to one server, and each server may acquire service data (for example, a virtual background selected by a user, background elements included in the virtual background, and background animation associated with the background elements) in the terminal device connected to the server, and acquire and detect user dynamic information corresponding to the terminal device connected to the server, so as to update the current virtual background according to the user dynamic information.
Optionally, it may be understood that the terminal device may also obtain the service data and the user dynamic information, and detect whether the user dynamic information meets the virtual background update condition, so as to update the current virtual background according to the user dynamic information.
It is understood that the method provided by the embodiment of the present application can be executed by a computer device, including but not limited to a terminal device or a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud database, a cloud service, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, domain name service, security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a palm computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch, a smart bracelet, etc.), a smart computer, a smart car, or another smart terminal capable of running an instant messaging application or a social application. The terminal device and the server may be directly or indirectly connected in a wired or wireless manner, and the embodiment of the present application is not limited herein.
For ease of understanding, the following description takes as a specific example terminal device 200a and terminal device 200b performing video communication via the server 100.
Please refer to FIGS. 2a-2e, which are schematic diagrams of a video display scene according to an embodiment of the present application. The implementation process of the video display scene may be performed in the server 100 shown in FIG. 1, or in a terminal device (e.g., any one of terminal device 200a, terminal device 200b, terminal device 200c, or terminal device 200n shown in FIG. 1), or may be executed jointly by the terminal device and the server; this is not limited here, and this embodiment is described taking joint execution by terminal device 200a, terminal device 200b, and the server 100 as an example. As shown in FIG. 2a, the user having a binding relationship with terminal device 200a is user A, and the user having a binding relationship with terminal device 200b is user B. User A may initiate a video communication request to the server 100 through the social application in terminal device 200a. After receiving the video communication request, the server 100 may deliver it to terminal device 200b, so that user B sees the related invitation prompt information on the display interface of terminal device 200b. The server 100 may then obtain the invitation result fed back by terminal device 200b; if the result is that user B accepts the invitation, the server 100 may notify terminal device 200a and terminal device 200b to jointly enter the video communication state, that is, user A and user B may start a video call. As shown in FIG. 2b, terminal device 200a may display a video picture 301a corresponding to user B and a video picture 302a corresponding to user A in the video communication interface 300a; similarly, terminal device 200b may display a video picture 301b corresponding to user A and a video picture 302b corresponding to user B in the video communication interface 300b. It can be understood that video pictures 301a and 302b are both obtained by shooting user B in real time, and video pictures 302a and 301b are both obtained by shooting user A in real time. It should be noted that usually a key part of the user (for example, the user's head) is shot; optionally, the corresponding user's profile picture may instead be displayed directly at the corresponding position in the video communication interface, which is not limited in this application, though displaying the video picture obtained by real-time shooting is the preferred scheme. In the same video communication interface, the video picture corresponding to user A and the video picture corresponding to user B may be displayed in windows of different sizes and shapes, each window being smaller than the video communication interface; the two video pictures may not overlap each other, or may have a partially overlapping area. For example, referring again to the video communication interface 300a, the video picture 301a corresponding to user B is shown in the interface as a large window, the video picture 302a corresponding to user A is shown as a small window, and video picture 302a covers a small area of video picture 301a; user A may also place video picture 302a at a suitable position through a drag operation.
In addition, in video pictures 301a and 302b, the original background corresponding to user B (indicated by the horizontally shaded area in FIG. 2b) may be displayed, and in video pictures 302a and 301b, the original background corresponding to user A (indicated by the diagonally shaded area in FIG. 2b) may be displayed. The original background may be the background obtained by shooting the real environment in which user A or user B is located, or may be the default static background picture or sequence-frame dynamic background set for the video communication function by the social application.
Further, during the video communication between user A and user B, to make the video call more interesting, the application supports any user switching the current original background to a virtual background. For example, as shown in FIG. 2c, assuming user A wishes to perform background switching, terminal device 200a may display a virtual background list 301d in response to a trigger operation (e.g., a click) by user A on a background switching control 301c in the current video communication interface 300c. As shown in the video communication interface 300d, the virtual background list 301d may include one or more virtual backgrounds, such as the virtual backgrounds "street", "flower", "planet", and "park". The virtual background list 301d may be displayed in any area (for example, the bottom area) of the video communication interface 300d as a floating window, an overlay, or a semi-transparent layer, or in a retractable panel whose display size can be changed by a drag operation, the panel being smaller than the video communication interface 300d. Optionally, when the virtual background list 301d is displayed, the small-window video picture corresponding to user B or user A moves to an area that does not overlap the display area of the virtual background list; that is, the video picture corresponding to user B or user A is not covered by the display area of the virtual background list 301d. It can be understood that, because the display area of the virtual background list 301d is limited, if the list contains many virtual background options that cannot all be displayed at the same time, terminal device 200a may display only some of the options and hide the rest; user A may find the hidden options through operations such as sliding the list left and right or up and down, or dragging to change its display size. For the background switching process of terminal device 200b, reference may be made to the above process for terminal device 200a, which is not repeated here.
Assuming that user A selects the virtual background "flower", as shown in FIG. 2c, terminal device 200a may send a background switching request to the server 100 in response to a trigger operation (e.g., a click) by user A on the virtual background "flower", sending at the same time the service data related to the virtual background "flower", and thereby switch the original background shown in the video communication interface 300a in FIG. 2b to the virtual background "flower". After receiving the background switching request, the server 100 may send the service data to terminal device 200b, which may obtain the virtual background "flower" from the service data and then switch the original background shown in the video communication interface 300b in FIG. 2b to the virtual background "flower".
FIG. 2d shows the video communication interfaces after terminal device 200a and terminal device 200b switch the original background to the virtual background "flower": the video communication interface 300e is the interface of terminal device 200a after background switching, and the video communication interface 300f is the interface of terminal device 200b after background switching. As shown in the video communication interface 300e, the virtual background "flower" may include two flowers, so that the video picture 301e corresponding to user A may be displayed in a small window at the pistil of one flower, and the video picture 302e corresponding to user B may be displayed in a small window at the pistil of the other flower. It can be understood that video pictures 301e and 302e do not overlap each other, so as not to interfere with the quality of video communication. In addition, optionally, in the video communication interfaces of different terminal devices, the video pictures corresponding to the two users may be displayed at the same positions; for example, as shown in FIG. 2d, the video picture corresponding to user A is displayed on the left side of the video communication interface (both 300e and 300f), and the video picture corresponding to user B is displayed on the right side (both 300e and 300f). Optionally, the display positions may instead be set differently per terminal device; for example, the video picture of the user bound to the current terminal device may be displayed by default on the left (or right) side of the video communication interface, and the other user's video picture on the right (or left) side, which is not limited in this application.
Further, the virtual background may be updated in real time. As shown in FIG. 2e, terminal device 200a and terminal device 200b can monitor the dynamic changes of user A and user B at any time. For example, terminal device 200a may collect the audio/video data A (including video data and audio data) of user A during video communication, and similarly terminal device 200b may collect the audio/video data B (including video data and audio data) of user B. Terminal device 200a may send the collected audio/video data A to the server 100, and terminal device 200b may likewise send the collected audio/video data B. The server 100 may extract each user's dynamic information from the received audio/video data (audio/video data A and audio/video data B) and detect whether the user dynamic information satisfies the virtual background update condition. When the user dynamic information of user A or of user B satisfies the condition, the server 100 may send the qualifying user dynamic information to terminal device 200a and terminal device 200b, which may then perform, according to that information, any one or more update operations on the current virtual background, such as position adjustment, size adjustment, background animation display, and background switching. The user dynamic information includes, but is not limited to, position information corresponding to a key part of the user, the user's facial expression, volume data, and live-action information of the user's environment. For example, assuming the key part is the user's head, when detecting that the head of user A and the head of user B change position at the same time (e.g., swing), the server 100 may obtain the position information (e.g., position coordinates of key points) corresponding to each head from the video pictures of user A and user B respectively, and send both sets of position information to terminal device 200a and terminal device 200b. The two terminal devices may adjust the video pictures corresponding to user A and user B to the positions indicated by the position information, and at the same time synchronously adjust the positions of the two flowers in the virtual background "flower" according to the adjusted positions of the two video pictures. As shown in the video communication interface 300g, when user A's head swings, the flower on the left side of the virtual background "flower", which is associated with user A, swings along with it. In addition, the virtual background may also be updated according to the user's facial expression, volume data, and the live-action information of the user's environment; for specific implementations, refer to the embodiments corresponding to FIG. 3 and FIG. 5 below, which are not described here.
This method of changing the background during real-time video communication differs from the static background of traditional video communication. During video communication, the terminal device is supported in automatically acquiring the current user's dynamic information and updating the virtual background in the current video communication interface in real time based on that information; for example, the virtual background can change along with the user's body movements, facial expressions, voice, and so on, without the user interrupting the video communication or performing manual operations. That is, the method can maintain normal operation and display of the video communication while updating the virtual background, improving video communication quality. In addition, in a traditional video call the user's image and the background elements are disjoint and hard to fuse, and the background does little to make the call more engaging; by constructing a virtual, dynamic, interactive environment background, the application can improve the fusion between the user's image and the background, increase the interest of the video call and the interactivity of the call background, and enrich the video display modes of video communication.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a video display method according to an embodiment of the present application. The video display method may be executed by a computer device, which may include a terminal device or a server as described in FIG. 1. The video display method includes at least the following steps S101-S103:
Step S101: a first terminal displays, in a social application, a video communication interface for providing a video communication function for a first user and a second user, and displays, in the video communication interface, a target virtual background in which the first user and the second user are located together; the first user is the user logged in to the social application on the first terminal, and the second user is the user performing video communication with the first user in the video communication interface.
specifically, in order to enable the first user and the second user to perform video communication, the first user needs to install an application having a video communication function in the first terminal, for example, the application having the video communication function may be a social application, an instant messaging application, a live application, a short video application, a music application, a shopping application, a game application, and the like. Taking the social application as an example, when a first user logs in the social application in a first terminal and a second user logs in the social application in a second terminal, the first user and the second user may start video communication through respective terminal devices. After entering the video communication, the first terminal may display a video communication interface on the screen, where the video communication interface is used to provide a video communication function for the first user and the second user, and the specific form of the video communication interface may refer to the video communication interface 300a and the video communication interface 300b in fig. 2 b.
Further, to construct a virtual, dynamic, interactive environment background, a target virtual background in which the first user and the second user are located together may be displayed in the video communication interface. Taking execution by the first terminal as an example: the video communication interface may include a background switching control, and the first terminal may display one or more virtual backgrounds in response to a trigger operation (e.g., a click) on the background switching control; see, for example, the virtual background list 301d shown in the video communication interface 300d in FIG. 2c. The first terminal may then, in response to a selection operation on the one or more virtual backgrounds, determine the virtual background selected by the first user as the target virtual background, and switch the original background in which the first user and the second user are located together to the target virtual background in the video communication interface. For the specific interaction process and interface diagrams, refer to the descriptions in the embodiments corresponding to FIGS. 2c-2d, where terminal device 200a may be the first terminal, user A the first user, terminal device 200b the second terminal, user B the second user, and the virtual background "flower" the target virtual background selected by user A. The virtual background is a dynamic background and may include one or more background elements; for example, the two flower elements in the virtual background "flower" are background elements.
Step S102: when the user dynamic information of the first user satisfies a virtual background update condition, the target virtual background is updated to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is acquired from the audio and video data of the first user during video communication.
specifically, in the video communication process, the first terminal may collect audio and video data (including video data and audio data) of the first user in real time, for example, collect the video data through the camera, collect the audio data through the microphone, and further analyze the collected audio and video data, so as to obtain dynamic information of the user.
Further, after acquiring the user dynamic information of the first user, the first terminal may detect whether the user dynamic information satisfies a virtual background update condition, and only when it is determined that the user dynamic information satisfies the virtual background update condition, the target virtual background is updated, and a specific process of condition judgment may refer to step S202 in the embodiment corresponding to fig. 5 below.
For ease of subsequent description and understanding, the first user picture and the second user picture in the video communication interface need to be described first. The method provided by this application is suitable not only for real-camera video call scenes, but also for avatar video call scenes, as well as for mixed video call scenes combining an avatar with a real camera.
During video communication, the first terminal may shoot a key part of the first user through the camera while detecting the state of its video virtual communication function. If the function is detected to be disabled (i.e., the first terminal has not enabled the video virtual communication function), the first user is in a real-camera video call scene, so a camera video picture may be displayed in the video communication interface and determined as the first user picture; the camera video picture is a video picture obtained by shooting the key part of the first user. Conversely, if the video virtual communication function is detected to be enabled (i.e., the first terminal has enabled the function), the first user is in an avatar video call scene, so a video picture used to cover the key part of the first user may be displayed in the video communication interface and determined as the first user picture. The state of the video virtual communication function can be changed by triggering a video virtual communication control in the video communication interface.
It can be understood that the second user picture likewise displays a key part of the second user and may also be either a camera video picture or a video picture; which form it takes depends on the video call scenario the second user is currently in, so the second terminal performs the same determination process as the first terminal above. Optionally, if the video picture type of the first user picture differs from that of the second user picture (i.e., one is a camera video picture and the other is a video picture), the result is a mixed avatar-plus-real-camera video call scenario. The first user picture and the second user picture do not overlap each other. It should be noted that the video picture may take multiple forms: the avatar corresponding to the first user may be displayed on its own, or displayed in a certain area of a background (which may be a real background or a virtual background); the specific form may be determined by product requirements and is not limited in this application. A key part may refer to the eye part, lip part, nose part, eyebrow part, and the like of a user (e.g., the first user or the second user), and the key parts can represent the user's expression information, such as a smiling expression, a closed-mouth expression, an open-mouth expression, or an expression with eyes open and lips parted. After the video virtual communication function is enabled, the user can select an avatar for image conversion, and a corresponding video picture is then generated from the user's key parts.
Furthermore, the first user picture and the second user picture may be displayed in the video communication interface, fused with the target virtual background. For example, the scenario depicted in the embodiments corresponding to fig. 2a to fig. 2e is a real-camera video call scenario: referring to the video communication interface 300e in fig. 2d, during video communication the video picture 301e obtained by shooting a key part (e.g., the head) of user A is the first user picture, and the video picture 302e obtained by shooting a key part of user B is the second user picture; after acquiring them, the terminal device 200a may fuse the video picture 301e and the video picture 302e to the corresponding pistil positions of the flowers.
For another example, please refer to fig. 4, which is an interface diagram of a video call scenario provided in an embodiment of this application. As shown in the video communication interface 400a in fig. 4, in a scenario where both user A and user B have enabled the video virtual communication function, the video picture 401a is the first user picture and can display the avatar selected by user A, while the video picture 402a is the second user picture and can display the avatar selected by user B. Referring to the video communication interface 400b, in a scenario where user A has enabled the video virtual communication function and user B has not, the video picture 401b is the first user picture and can display the avatar selected by user A, while the camera video picture 402b is the second user picture and can display a key part of user B. It should be noted that each avatar may change along with the user's key part: for example, when user A smiles, the avatar in the video picture 401a also smiles; when user A tilts his head, the avatar also tilts its head. Optionally, in an avatar video call scenario, if the user's head leaves the shooting range of the camera, the currently selected avatar may be displayed in the user's video picture as a default avatar with a still front face, until the user's head reappears in the shooting range and the avatar again follows the user's movements.
Further, optionally, when the user dynamic information of the first user satisfies the virtual background update condition, the first terminal may adjust the first user picture to the position indicated by the user dynamic information in the video communication interface, and then, according to the position-adjusted first user picture, synchronously adjust the position of the background element associated with the first user picture in the target virtual background, thereby obtaining an updated target virtual background matched with the user dynamic information. For example, as shown in the video communication interface 300g in fig. 2e, the flower elements in the virtual background "flower" may be adjusted synchronously with the position of the user's head.
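A minimal sketch of this synchronized position adjustment, assuming the user picture and its associated background element are simple 2D nodes; the Node type and coordinate convention are illustrative, not from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Node:
    x: float
    y: float

def sync_position(user_picture: Node, element: Node,
                  target_x: float, target_y: float) -> None:
    # Move the first user picture to the position indicated by the
    # user dynamic information, then apply the same displacement to
    # the associated background element (e.g., a flower).
    dx, dy = target_x - user_picture.x, target_y - user_picture.y
    user_picture.x, user_picture.y = target_x, target_y
    element.x += dx
    element.y += dy
```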
Optionally, when the user dynamic information of the first user meets the virtual background update condition, the first terminal may display a background animation associated with the user dynamic information in the video communication interface, and further fuse the background animation with the target virtual background to obtain an updated target virtual background matched with the user dynamic information.
Optionally, when the user dynamic information of the first user satisfies the virtual background update condition, the first terminal may, according to the user dynamic information, resize the background element associated with the first user picture in the target virtual background in the video communication interface, so as to obtain an updated target virtual background matched with the user dynamic information. Resizing may include one or both of a lifting process and a zooming process; it can be understood that the lifting process changes the height of a background element, while the zooming process changes its size.
Optionally, when the user dynamic information of the first user meets the virtual background update condition, the first terminal may switch the target virtual background according to the user dynamic information in the video communication interface, so as to obtain an updated target virtual background matched with the user dynamic information.
It can be understood that the second terminal may likewise perform the above update processing operations of the first terminal in its own video communication interface, which is not repeated here. It should be noted that, because there are multiple types of user dynamic information, several kinds of user dynamic information may satisfy the virtual background update condition at the same time; in that case the update of the target virtual background may include the corresponding multiple update processing operations, that is, the superimposed effect of those operations may be displayed in the video communication interface. In addition, this application does not limit which update processing operations may be triggered by which types of user dynamic information; for example, when the target virtual background is the virtual background "flower" and the user's head sways, the flower elements in the target virtual background may sway along with it, or a background animation of falling petals may be displayed.
In addition, the first terminal may also obtain the user dynamic information of the second user in real time, and when that information satisfies the virtual background update condition, the target virtual background may likewise be updated; the update processing operations are the same as those performed according to the user dynamic information of the first user and are not repeated here. Optionally, the user dynamic information of the first user may be fused with that of the second user to obtain fused dynamic information, and when the fused dynamic information satisfies the virtual background update condition, the target virtual background may be updated to obtain an updated target virtual background matched with the fused dynamic information; again, the update processing operations are the same as above. For example, if the first user and the second user are detected laughing at the same time, an animated special effect of flowers blooming may be displayed in the target virtual background. As another example, when the first user and the second user make head movements at the same time and the flower elements on the two sides of the target virtual background intersect while swinging, an associated background animation may be triggered, such as an animation of falling petals. The user dynamic information of the second user is acquired from the audio and video data of the second user during video communication.
The output of the target virtual background and the video pictures to the terminal devices (including the first terminal and the second terminal) can be implemented by a real-time rendering component in the terminal device, i.e., a component with picture rendering capability. For example, the real-time rendering component may be a real-time three-dimensional (3D) engine, such as the Ace3D engine. The Ace engine is a lightweight cross-platform rendering engine that can be deployed in a camera application in the terminal device; its capability is reinforced on the basis of an open-source rendering library from Google, with improvements in loading speed, memory footprint, compatibility, and the like, so it loads quickly, occupies little memory, and is highly compatible, and it can be used for hair rendering, 3D animated expression rendering, custom nodes, and so on. Because this application draws the virtual background dynamically with a rendering engine, the achievable flexibility and interest of the virtual background far exceed those of conventional static pictures or sequence-frame animations: the background is dynamic and interactive rather than static or simply looping.
Step S103, displaying the updated target virtual background in the video communication interface.
Specifically, the first terminal may fuse the updated target virtual background with the first user picture and the second user picture, and display the fused result in the video communication interface.
During video communication, the user dynamic information of a user can be acquired from the user's audio and video data, and when the user dynamic information is detected to satisfy the virtual background update condition, the target virtual background can be updated to obtain an updated target virtual background matched with the user dynamic information. This application thus supports the terminal device automatically acquiring the current user's dynamic information and invoking the rendering engine to update the virtual background in the current video communication interface in real time: the virtual background can change with the user's body movements, facial expressions, voice, and so on, without the user interrupting video communication or manually updating the virtual background. In other words, the virtual background is updated while video communication continues to run and display normally, which improves video communication quality. In addition, constructing a virtual, dynamically interactive environment background improves the degree of fusion between the user's image and the background, makes video communication more engaging, and enriches the video display modes of video communication.
Please refer to fig. 5, which is a flowchart illustrating a video display method according to an embodiment of the present disclosure. The video display method may be executed by a computer device, and the computer device may include a terminal device or a server as described in fig. 1. As shown in fig. 5, the method may include the steps of:
step S201, a first terminal displays a video communication interface for providing a video communication function for a first user and a second user in a social application, and displays a target virtual background, a first user picture and a second user picture where the first user and the second user are located together in the video communication interface;
specifically, after the first user and the second user start video communication through the social application, the first terminal may select one virtual background as the target virtual background, and display the first user picture and the second user picture fused with the target virtual background in the video communication interface. Each of the first user picture and the second user picture may be either a camera video picture or a video picture. Please refer to fig. 6a to fig. 6d, which are interface diagrams of a video display process provided in an embodiment of this application. As shown in fig. 6a, assume that the first terminal enables the video virtual communication function in response to a trigger operation (e.g., a click operation) on the video virtual communication control 501a in the video communication interface 500a, and that the second terminal has also enabled the function; then, as shown in the video communication interface 500a, the video picture 502a is the first user picture and the video picture 503a is the second user picture, and both belong to video pictures at this time. The first terminal can display one or more virtual backgrounds in the video communication interface 500b in response to a trigger operation on the background switching control, determine the virtual background "flower" selected by the first user as the target virtual background in response to a selection operation, and then fuse the first user picture and the second user picture with the virtual background "flower" and display the result in the video communication interface 500c.
Step S202, audio and video data of a first user are obtained, user dynamic information of the first user is extracted from the audio and video data, and condition judgment is carried out on the user dynamic information;
specifically, during video communication the first terminal may obtain the audio and video data (including video data and audio data) of the first user in real time through the camera and the microphone, and then extract the user dynamic information of the first user from the audio and video data. As described above, the user dynamic information of the first user may include one or more of position information corresponding to a key part of the first user, the facial expression and volume data of the first user, or the real-scene information of the environment where the first user is located.
First, whether the user dynamic information satisfies the virtual background update condition can be judged. Optionally, the first terminal may perform gesture detection on the first user in the video data. In this embodiment, the head of the first user is used as the key part, so three-dimensional position data corresponding to the head of the first user may be obtained by detecting and tracking head key points; the three-dimensional position data may include the spatial coordinates and rotation angle of the head and is expressed in a local coordinate system (also called a model coordinate system). Further, the first terminal may generate a three-dimensional Model matrix, a View matrix, and a Projection matrix associated with the first user according to, respectively, the geometric relationship between the key part of the first user and the world coordinate system, the positional relationship between the key part of the first user and the camera of the first terminal, and the size of the screen display area of the first terminal. Specifically: the geometric relationship between the key part of the first user and the origin and coordinate axes of the world coordinate system is obtained, a first translation matrix and a first rotation matrix are constructed from that geometric relationship, and the three-dimensional model matrix associated with the first user is generated from the first translation matrix and the first rotation matrix; when a scaling transformation exists, a scaling matrix is also constructed to participate in the coordinate operation. The world coordinate system is used to describe the positions of the key part of the first user and the camera of the first terminal. Meanwhile, a second translation matrix and a second rotation matrix can be constructed from the positional relationship between the key part of the first user and the camera of the first terminal, and the view matrix associated with the first user can be generated from them. Likewise, the spatial parameters of the camera coordinate system can be obtained from the size of the screen display area of the first terminal, and the projection matrix associated with the first user can be constructed from those spatial parameters. Further, the three-dimensional position data may be left-multiplied by the three-dimensional Model matrix, View matrix, and Projection matrix to realize the matrix transformation of the three-dimensional position data, yielding the vertex position coordinates corresponding to the key part of the first user, which are determined as the position information corresponding to the key part of the first user. This realizes the conversion of three-dimensional position data in a local coordinate system into the standard view space, a process that may be called an MVP (Model-View-Projection) matrix transformation; the transformation uses the geometric translation, rotation, and scaling knowledge from graphics.
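The following numpy sketch illustrates the MVP matrix transformation just described: a Model matrix built from a translation and a rotation, a View matrix derived from the camera pose, and a standard perspective Projection matrix, left-multiplied onto a head key point. The concrete numbers, the single z-axis rotation, and the frustum parameters are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def translation(tx, ty, tz):
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotation_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def perspective(left, right, bottom, top, near, far):
    # Standard perspective frustum; the camera looks down the -z axis.
    return np.array([
        [2*near/(right-left), 0.0, (right+left)/(right-left), 0.0],
        [0.0, 2*near/(top-bottom), (top+bottom)/(top-bottom), 0.0],
        [0.0, 0.0, -(far+near)/(far-near), -2*far*near/(far-near)],
        [0.0, 0.0, -1.0, 0.0]])

# Model: local -> world, from the first translation and rotation matrices.
model = translation(0.1, 0.0, 0.0) @ rotation_z(np.radians(10))
# View: world -> camera; here the camera sits 2 units in front of the origin.
view = np.linalg.inv(translation(0.0, 0.0, 2.0))
# Projection: camera -> clip space, derived from the screen display area.
proj = perspective(-0.5, 0.5, -0.5, 0.5, 1.0, 10.0)

head = np.array([0.0, 0.0, 0.0, 1.0])   # head key point, local coordinates
clip = proj @ view @ model @ head        # left-multiply: P * V * M * point
ndc = clip[:3] / clip[3]                 # perspective divide
```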
It can be understood that when the spatial coordinates or the rotation angle in the three-dimensional position data undergo a parameter change, it may be determined that the position information has changed, and the changed position information may be determined to satisfy the virtual background update condition; that is, when the head of the first user deflects or moves, the position information corresponding to the head satisfies the virtual background update condition. It should be noted that this application supports three translational degrees of freedom along the x, y, and z coordinate axes, and three rotational degrees of freedom about those axes.
Optionally, the first terminal may perform expression detection on the first user in the obtained video data, specifically based on a deep neural network, to obtain the facial expression of the first user; if the facial expression of the first user is detected to belong to a target facial expression type, it may be determined that the facial expression satisfies the virtual background update condition. The target facial expression types include, but are not limited to, the happy, surprised, sad, angry, disgusted, and fearful types.
Optionally, the first terminal may perform audio detection on the acquired audio data, specifically by sampling the audio data to obtain volume data corresponding to the first user; it can be understood that the volume data include a plurality of sampled volume values. Further, when a volume value within the volume detection interval is detected in the volume data, it may be determined that the volume value satisfies the virtual background update condition. Preferably, the volume detection interval is 20-90 dB.
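A sketch of the volume check, assuming 16-bit PCM samples in a numpy array; mapping the raw amplitude into the 20-90 dB detection interval requires a device-specific calibration, represented here by an assumed offset.

```python
import numpy as np

VOLUME_DB_RANGE = (20.0, 90.0)  # the volume detection interval from the text

def meets_volume_condition(pcm: np.ndarray, ref: float = 32768.0) -> bool:
    # pcm: int16 samples; take the peak amplitude of this chunk.
    peak = np.max(np.abs(pcm.astype(np.float64)))
    if peak == 0:
        return False
    # dBFS plus an assumed calibration offset so full scale maps to 90 dB.
    db = 20.0 * np.log10(peak / ref) + 90.0
    lo, hi = VOLUME_DB_RANGE
    return lo <= db <= hi
```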
Optionally, the first terminal may perform environment detection on the environment where the first user is located in the video data, specifically by extracting from the video data the real-scene information of that environment. The real-scene information may include one or more of the brightness, the color composition, or a key environment object of the environment where the first user is located; a key environment object is an object occupying a special position in the shot environment. For example, if the first user sits on an outdoor lawn and the camera repeatedly captures a flower on the lawn, the flower may be determined as a key environment object. Further, if an environmental change is detected in the real-scene information, for example the brightness of the real environment changes, the dominant color proportion changes, or a key environment object changes, it may be determined that the changed real-scene information satisfies the virtual background update condition.
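A minimal sketch of the environment-change check, using frame brightness and mean color as the real-scene signature; the thresholds and summary statistics are assumptions, and detecting key environment objects (e.g., the flower) would need a separate object detector.

```python
import numpy as np

def scene_signature(frame: np.ndarray):
    # frame: HxWx3 RGB array; summarize brightness and mean color.
    brightness = float(frame.mean())
    mean_color = frame.reshape(-1, 3).mean(axis=0)
    return brightness, mean_color

def environment_changed(prev, cur,
                        brightness_thresh=15.0, color_thresh=25.0) -> bool:
    # Thresholds are illustrative assumptions, not values from the text.
    d_brightness = abs(cur[0] - prev[0])
    d_color = float(np.linalg.norm(cur[1] - prev[1]))
    return d_brightness > brightness_thresh or d_color > color_thresh
```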
It should be noted that the gesture detection, expression detection, audio detection, and environment detection in this application may be performed in parallel; any several pieces of information among the position information, facial expression, volume data, and real-scene information may be combined, and when every piece of information in the combination satisfies its corresponding virtual background update condition, the combination also satisfies the virtual background update condition.
Secondly, the first terminal can judge whether the user dynamic information satisfies the user picture switching condition. Specifically, if the facial expression of the first user belongs to a target facial expression type and the display duration of that facial expression is greater than a first duration threshold, it may be determined that the user dynamic information of the first user satisfies the user picture switching condition. Optionally, if the volume data corresponding to the first user lie within the volume detection interval and the duration for which they remain in the interval is greater than a second duration threshold, it may be determined that the user dynamic information satisfies the user picture switching condition. Optionally, the first terminal may also perform semantic recognition on the audio data of the first user, and when a user picture switching instruction is recognized in the audio data, it may be determined that the user dynamic information satisfies the user picture switching condition. The first duration threshold and the second duration threshold may be set according to actual needs, which is not limited in this application.
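The two duration-based switching conditions can be tracked as in the sketch below; the threshold values are assumed placeholders, since the text leaves them to actual needs, and semantic recognition of a spoken switching instruction is omitted.

```python
import time

FIRST_DURATION_THRESHOLD = 2.0    # seconds; assumed value
SECOND_DURATION_THRESHOLD = 1.5   # seconds; assumed value

class SwitchDetector:
    def __init__(self):
        self.expr_since = None   # when the target expression first appeared
        self.vol_since = None    # when the volume first entered the interval

    def update(self, is_target_expression: bool,
               volume_in_interval: bool, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        self.expr_since = (self.expr_since or now) if is_target_expression else None
        self.vol_since = (self.vol_since or now) if volume_in_interval else None
        expr_ok = (self.expr_since is not None
                   and now - self.expr_since > FIRST_DURATION_THRESHOLD)
        vol_ok = (self.vol_since is not None
                  and now - self.vol_since > SECOND_DURATION_THRESHOLD)
        return expr_ok or vol_ok   # user picture switching condition met
```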
Step S203, when the user dynamic information of the first user meets the virtual background updating condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information;
specifically, when the position-changed position information in step S202 satisfies the virtual background update condition, the first terminal may, in the video communication interface, determine a displacement distance and a deflection angle from the position-changed position information (obtainable through the MVP matrix transformation), perform displacement-deflection processing on the first user picture according to that distance and angle, and at the same time perform synchronous displacement-deflection processing on the background element associated with the first user picture in the target virtual background, thereby obtaining an updated target virtual background matched with the position-changed position information. For example, referring to fig. 6b, when the head position of user A in the left user picture changes, the user picture and the associated flower element are both displaced and deflected, moving from the position shown by area 501d in the original video communication interface 500d to the position shown by area 501e in the video communication interface 500e. Optionally, if the flower elements on the left and right sides are not intended to intersect, the swing range of each flower element may be limited in the video communication interface 500e; alternatively, the distance between the flower elements on the two sides may be detected, and when it is smaller than a distance threshold, the flower element that was not originally swinging may be triggered to swing in the same direction until the distance between the two sides' flower elements exceeds the threshold, as sketched below. The specific product form depends on actual product requirements.
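One possible realization of the optional distance-threshold behavior, modeling each side's flower sway as a scalar horizontal position; the resolution strategy follows the text, while the names and units are illustrative assumptions.

```python
def resolve_flower_overlap(left_x: float, right_x: float,
                           distance_threshold: float) -> float:
    # If the two sides' flower elements come closer than the threshold
    # while swinging, swing the passive (right) flower in the same
    # direction until the gap exceeds the threshold again.
    gap = right_x - left_x
    if gap < distance_threshold:
        right_x = left_x + distance_threshold
    return right_x
```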
Optionally, when the facial expression of the first user in step S202 satisfies the virtual background update condition, the first terminal may traverse the expression animation mapping table to obtain the background animation matched with that facial expression, display the background animation in the video communication interface, and fuse it with the target virtual background to obtain an updated target virtual background matched with the facial expression. The expression animation mapping table stores the background animations corresponding to each virtual background and establishes a mapping relationship between each facial expression and a background animation. For example, referring to fig. 6c, when user B in the right user picture is detected smiling (the corresponding facial expression type is the happy type), as shown in the video communication interface 500g in fig. 6c, a bee appears around user B's user picture and can move around it; if user B makes a head movement, the bee can follow it.
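A sketch of the expression animation mapping table as a plain lookup keyed by (virtual background, facial expression type); the entries are invented examples consistent with the bee animation described above, not contents of the actual table.

```python
# Hypothetical expression animation mapping table.
EXPRESSION_ANIMATIONS = {
    ("flower", "happy"): "bee_flying",        # bee circles the user picture
    ("flower", "surprised"): "petals_falling",
    # ... one entry per (virtual background, facial expression type) pair
}

def lookup_background_animation(background: str, expression_type: str):
    # Returns the animation id to fuse with the target virtual background,
    # or None if this expression triggers no animation for this background.
    return EXPRESSION_ANIMATIONS.get((background, expression_type))
```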
Optionally, when the volume data of the first user in step S202 satisfy the virtual background update condition, a volume peak-to-valley value (i.e., a volume peak and a volume valley) may be extracted from the volume data, and a scaling matrix associated with the first user may be constructed from it; the scaling matrix is formed by scaling coefficients in at least two different stretching directions, and the specific directions are not limited in this application. Further, in the video communication interface, the first terminal may resize the background element associated with the first user picture in the target virtual background according to the at least two scaling coefficients in the scaling matrix, obtaining an updated target virtual background matched with the user dynamic information; the resizing may include one or both of a lifting process and a zooming process. In addition, the scaling matrix may be inserted into the MVP matrix transformation of step S202 for dynamic operation. For example, referring to fig. 6d, the video communication interface 500h is the initial video communication interface corresponding to user A and user B. When the volume data generated by user B in the right user picture are detected within the volume detection interval and the volume peak grows, then in user B's video communication interface 500i the flower element in user B's virtual background rises, and user B's user picture rises with it; when the lifted height exceeds a height threshold, a cloud element may be displayed in the video communication interface 500i. It can be understood that, as shown in the video communication interface 500j of user A on the other side, the picture seen by user A also changes accordingly: the right-side flower element has risen beyond the area the video communication interface 500j can display, so at this moment user A can only see the stem of the right-side flower element. Of course, as user B speaks more quietly, the associated flower element returns to its original height.
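A sketch of the volume-driven resizing: the volume peak-to-valley swing is mapped to a y-axis stretch coefficient packed into a homogeneous scaling matrix, which can then participate in the MVP operation of step S202. The mapping coefficients are illustrative assumptions.

```python
import numpy as np

def scaling_matrix(sx: float, sy: float, sz: float) -> np.ndarray:
    # Homogeneous 4x4 scaling matrix, suitable for the MVP operation.
    return np.diag([sx, sy, sz, 1.0])

def scale_from_volume(peak: float, valley: float,
                      base: float = 1.0, gain: float = 0.02) -> np.ndarray:
    # Map the volume peak-to-valley swing to a vertical "lift" of the
    # flower element; x and z are left unchanged. `base` and `gain`
    # are assumed tuning constants.
    sy = base + gain * (peak - valley)
    return scaling_matrix(1.0, max(sy, 0.1), 1.0)
```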
Optionally, when the real-scene information of the environment where the first user is located in step S202 satisfies the virtual background update condition, the current target virtual background may be switched, in the video communication interface, to a virtual background associated with the real-scene information. For example, when the first terminal detects many fresh flowers in the first user's environment, it may generate the keyword "flower" and search for a virtual background matching that keyword as the updated target virtual background.
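A sketch of the keyword-driven background switch; the keyword list and background library mapping are hypothetical stand-ins for whatever matching service the product uses.

```python
def pick_background(key_objects, background_library):
    # key_objects: keywords extracted from the real-scene information,
    # e.g. ["flower", "lawn"]; background_library maps keyword -> background id.
    for keyword in key_objects:
        if keyword in background_library:
            return background_library[keyword]
    return None  # no match: keep the current target virtual background
```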
It should be noted that the background update methods above are only some of many possible embodiments; this application may also support other background update methods, and the correspondence between different types of user dynamic information and background update methods is not limited here and is not described further.
Step S204, when the user dynamic information of the first user meets the user picture switching condition, switching the first user picture;
specifically, when the user dynamic information of the first user satisfies the user picture switching condition, the first terminal may, in the video communication interface, switch the first user picture from a video picture to a camera video picture, or from a camera video picture to a video picture.
Step S205, in the video communication interface, fusing and displaying the switched first user picture and the switched second user picture with the updated target virtual background.
Specifically, in the video communication interface, the first terminal may perform fusion display on the switched first user picture, the switched second user picture, and the updated target virtual background.
Please refer to fig. 7, which is a schematic view of a workflow of a video display system according to an embodiment of the present application, where the video display system may be located in a terminal device or a server. As shown in fig. 7, the work flow of the video presentation system is as follows:
(1) After the video call is started, the system detects the user dynamic information in real time: gesture detection is performed with camera-side AI to obtain the spatial coordinates and rotation angle of the user's head, expression detection yields the facial expression type, and audio detection is performed on the audio data from the microphone to judge the volume.
(2) The real-time data (i.e., the user dynamic information) from step (1) are refreshed to the local system and sent to the system of the other party of the call, and both systems start drawing pictures.
(3) The head space coordinates and angles are mapped into the virtual scene.
a) The mapping of the spatial coordinates uses geometric translation from graphics: a translation matrix, constructed from the model's spatial distances from the origin along the x, y, and z axes, participates in the coordinate operation to complete the translation from the local coordinate system to the world coordinate system. The translation matrix has the form:
$$T = \begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Please refer to fig. 8a to fig. 8d, which are schematic diagrams of the coordinate transformation principle provided in an embodiment of this application. In the coordinate system shown in fig. 8a, after the translation operation, the original coordinates $(x, y)$ are transformed to the new coordinates $(x + T_x,\; y + T_y)$.
b) The angle mapping uses geometric rotation from graphics: rotation matrices, constructed from the sines and cosines of the model's Euler angles about the x, y, and z axes, participate in the coordinate operation to complete the rotation from the local coordinate system to the world coordinate system. For example, the rotation matrix about the z axis has the form:
$$R_z(\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 & 0 \\ \sin\alpha & \cos\alpha & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The rotation matrices about the x and y axes are analogous.
As shown in fig. 8b, in the coordinate system, after a rotation by angle $\alpha$, the original coordinates $(x, y)$ are transformed to the new coordinates $(x\cos\alpha - y\sin\alpha,\; x\sin\alpha + y\cos\alpha)$.
(4) The rendering engine displaces and deflects the virtual background corresponding to the user. After the 3D API (the direct interface between the graphics card and the application program) obtains the mapped coordinates, the conversion from the model's local coordinate system to the world coordinate system is completed; the View matrix and the Projection matrix are then determined from the camera placement and the screen display area. The Model matrix (i.e., the three-dimensional model matrix), the View matrix, and the Projection matrix are multiplied to obtain the final vertex position coordinates, which are passed into a shader; finally, the graphics of the virtual background (including the video picture, if in the video virtual communication state) are rasterized and rendered. The Model and View matrices are obtained from the rotation and translation matrices: the Model matrix converts the local coordinate system to the world coordinate system, and the View matrix converts the world coordinate system to the camera coordinate system. The Projection matrix involves trigonometric functions and similar triangles and has the form:
$$P = \begin{bmatrix} \dfrac{2 \cdot near}{right - left} & 0 & \dfrac{right + left}{right - left} & 0 \\ 0 & \dfrac{2 \cdot near}{top - bottom} & \dfrac{top + bottom}{top - bottom} & 0 \\ 0 & 0 & -\dfrac{far + near}{far - near} & -\dfrac{2 \cdot far \cdot near}{far - near} \\ 0 & 0 & -1 & 0 \end{bmatrix}$$
Referring to fig. 8c, the visible area of the camera is a geometric space extending from the screen along the viewing direction according to a certain rule. Since geometric objects far from the camera contribute little to the picture, the visible area is cut off after extending a certain distance. If the screen is a planar rectangle, the visible area is a geometric body enclosed by six planes: this body is the view volume of the camera, and its top, bottom, left, right, near, and far planes are denoted by the corresponding English words, which also name the plane coordinates in the Projection matrix. The near plane is also called the near clipping plane, and the far plane the far clipping plane; the shaded near clipping plane in fig. 8c is equivalent to the projection plane and can be understood as the screen display area, indicating the position and size of the screen, while the Center of Projection (COP) is the camera position. Projective transformation by the Projection matrix is the process of transforming view volumes of various shapes into the standard view volume.
(5) The rendering engine triggers the corresponding background animation according to the facial expression type.
(6) The rendering engine lifts or zooms the virtual background corresponding to the user according to the volume. The volume is determined from captured PCM (pulse code modulation) sample values, and the scaling of the corresponding background element model is adjusted according to the volume peak-to-valley value. Scaling likewise uses geometric scaling from graphics: a scaling matrix, built from the model's stretch scaling coefficients along the x, y, and z axes, dynamically participates in the MVP operation. The scaling matrix has the form:
$$S = \begin{bmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
As shown in fig. 8d, in the coordinate system, after the scaling operation, the x coordinate of a point on the original image is transformed from $x$ to $x \cdot S_x$, where $S_x$ is the stretch scaling coefficient in the x-axis direction.
During video communication, the user dynamic information of a user can be acquired from the user's audio and video data, and when it is detected to satisfy the virtual background update condition, the target virtual background can be updated to obtain an updated target virtual background matched with the user dynamic information. This application thus supports the terminal device automatically acquiring the current user's dynamic information and invoking the rendering engine to update the virtual background in the current video communication interface in real time: the virtual background can change with the user's body movements, facial expressions, voice, and so on, without the user interrupting video communication or performing manual operations. In other words, the virtual background is updated while video communication continues to run and display normally, improving video communication quality. In addition, by combining the rendering engine with the detection of human posture, expression, sound, and environment, this application constructs a virtual, dynamically interactive environment background, which improves the fusion between the user's image and the background and makes video communication more engaging.
Fig. 9 is a schematic structural diagram of a video display apparatus provided in an embodiment of this application. The video display apparatus may be a computer program (including program code) running on a computer device; for example, the video display apparatus is application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of this application. As shown in fig. 9, the video display apparatus 1 may include: a first display module 11, a first updating module 12, and a second display module 13;
the first display module 11 is configured to display, by the first terminal in the social application, a video communication interface for providing a video communication function for the first user and the second user, and display a target virtual background in which the first user and the second user are located together in the video communication interface; the first user is a user logging in the social application in the first terminal, and the second user is a user performing video communication with the first user in the video communication interface;
the first display module 11 is specifically configured to respond to a trigger operation for a background switching control in a video communication interface by a first terminal, and display one or more virtual backgrounds; in response to a selection operation for one or more virtual backgrounds, determining the selected virtual background as a target virtual background; in a video communication interface, switching an original background where a first user and a second user are located together into a target virtual background;
the first updating module 12 is configured to update the target virtual background when the user dynamic information of the first user meets the virtual background updating condition, so as to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is acquired from audio and video data of a first user in the video communication process;
and the second display module 13 is configured to display the updated target virtual background in the video communication interface.
For the specific functional implementation of the first display module 11, refer to step S101 in the embodiment corresponding to fig. 3 or step S201 in the embodiment corresponding to fig. 5; for the first updating module 12, refer to step S102 in the embodiment corresponding to fig. 3 or step S203 in the embodiment corresponding to fig. 5; for the second display module 13, refer to step S103 in the embodiment corresponding to fig. 3 or step S205 in the embodiment corresponding to fig. 5. Details are not repeated here.
The video communication interface further comprises a first user picture and a second user picture;
referring to fig. 9, the video display apparatus 1 may further include: a first fused display module 14;
the first fusion display module 14 is used for shooting key parts of a first user in the video communication process; if the video virtual communication function of the first terminal is in an open state, displaying a video picture for covering a key part of a first user in a video communication interface, and determining the video picture as a first user picture; displaying a second user picture which is not overlapped with the first user picture in the video communication interface; the second user picture is used for displaying the key part of the second user; and in the video communication interface, the first user picture and the second user picture are fused with the target virtual background for display.
The specific function implementation manner of the first fusion display module 14 may refer to step S102 in the embodiment corresponding to fig. 3, or may refer to step S201 in the embodiment corresponding to fig. 5, which is not described herein again.
Referring to fig. 9, the video display apparatus 1 may further include: a second fusion display module 15;
the second fusion display module 15 is used for shooting the key part of the first user in the video communication process; if the video virtual communication function of the first terminal is in a closed state, displaying a camera video picture in a video communication interface, and determining the camera video picture as a first user picture; the camera shooting video picture refers to a video picture for shooting a key part of a first user; displaying a second user picture which is not overlapped with the first user picture in the video communication interface; the second user picture is used for displaying the key part of the second user; and in the video communication interface, the first user picture and the second user picture are fused with the target virtual background for display.
For the specific functional implementation of the second fusion display module 15, refer to step S102 in the embodiment corresponding to fig. 3 or step S201 in the embodiment corresponding to fig. 5. The first fusion display module 14 and the second fusion display module 15 may be combined into one fusion display module. Details are not repeated here.
Referring to fig. 9, the video display apparatus 1 may further include: a second update module 16;
the second updating module 16 is configured to obtain user dynamic information of the second user, and fuse the user dynamic information of the first user and the user dynamic information of the second user to obtain fused dynamic information; the user dynamic information of the second user is acquired from the audio and video data of the second user in the video communication process; and when the fused dynamic information meets the virtual background updating condition, updating the target virtual background to obtain an updated target virtual background matched with the fused dynamic information.
The specific functional implementation manner of the second updating module 16 may refer to step S102 in the embodiment corresponding to fig. 3, which is not described herein again.
The user dynamic information comprises position information corresponding to a key part of a first user;
referring to fig. 9, the video display apparatus 1 may further include: a gesture detection module 17 and a first condition judgment module 18;
the gesture detection module 17 is configured to perform gesture detection on the first user in the video communication process, and acquire three-dimensional position data corresponding to a key part of the first user; respectively generating a three-dimensional model matrix, a view matrix and a projection matrix associated with a first user according to a geometric relation between a key part of the first user and a world coordinate system, a position relation between the key part of the first user and a camera of a first terminal and the size of a screen display area of the first terminal; the world coordinate system is used for describing the key part of the first user and the position of a camera of the first terminal; performing matrix transformation on the three-dimensional position data according to the three-dimensional model matrix, the view matrix and the projection matrix to generate vertex position coordinates corresponding to the key part of the first user, and determining the vertex position coordinates as position information corresponding to the key part of the first user;
the gesture detection module 17 is specifically configured to obtain a geometric relationship between a key part of the first user and an origin and a coordinate axis of a world coordinate system, construct a first translation matrix and a first rotation matrix according to the geometric relationship, and generate a three-dimensional model matrix associated with the first user according to the first translation matrix and the first rotation matrix; constructing a second translation matrix and a second rotation matrix according to the position relation between the key part of the first user and the camera of the first terminal, and generating a view matrix associated with the first user according to the second translation matrix and the second rotation matrix; obtaining a space parameter of a camera coordinate system according to the size of a screen display area of the first terminal, and constructing a projection matrix associated with the first user according to the space parameter;
and the first condition judgment module 18 is configured to determine that the position information has a position change if the spatial coordinate or the rotation angle in the three-dimensional position data has a parameter change, and determine that the position information after the position change meets the virtual background update condition.
The specific functional implementation manners of the gesture detection module 17 and the first condition determination module 18 may refer to step S202 in the embodiment corresponding to fig. 5, which is not described herein again.
Wherein the user dynamic information further comprises a facial expression of the first user;
referring to fig. 9, the video display apparatus 1 may further include: an expression detection module 19 and a second condition judgment module 20;
the expression detection module 19 is configured to perform expression detection on the first user in the video communication process to obtain a facial expression of the first user;
and the second condition judgment module 20 is configured to determine that the user dynamic information satisfies the virtual background update condition if the facial expression belongs to the target facial expression type.
The specific functional implementation manners of the expression detection module 19 and the second condition judgment module 20 may refer to step S202 in the embodiment corresponding to fig. 5, which is not described herein again.
The user dynamic information further comprises volume data corresponding to the first user;
referring to fig. 9, the video display apparatus 1 may further include: an audio detection module 21 and a third condition judgment module 22;
the audio detection module 21 is configured to acquire audio data input by a first user, and sample the audio data to obtain volume data corresponding to the first user;
and a third condition determining module 22, configured to determine that the user dynamic information satisfies the virtual background update condition if the volume data is located in the volume detection interval.
The specific functional implementation manners of the audio detection module 21 and the third condition judgment module 22 may refer to step S202 in the embodiment corresponding to fig. 5, which is not described herein again.
The user dynamic information also comprises the real scene information of the environment where the first user is located;
referring to fig. 9, the video display apparatus 1 may further include: an environment detection module 23 and a fourth condition judgment module 24;
the environment detection module 23 is configured to acquire video data of an environment where the first user is located, and extract real-scene information of the environment where the first user is located from the video data; the live-action information comprises one or more of the brightness, the color composition or the key environment object of the environment where the first user is located;
and a fourth condition determining module 24, configured to determine that the dynamic information of the user meets the virtual background updating condition if the environment of the live-action information changes.
The specific functional implementation manners of the environment detecting module 23 and the fourth condition determining module 24 may refer to step S202 in the embodiment corresponding to fig. 5, which is not described herein again.
Referring to fig. 9, the video display apparatus 1 may further include: a screen switching module 25;
the picture switching module 25 is configured to: if the facial expression of the first user belongs to a target facial expression type and the display duration of the facial expression is greater than a first duration threshold, determine that the user dynamic information of the first user satisfies the user picture switching condition, and switch the first user picture from a video picture to a camera video picture in the video communication interface, the camera video picture being a video picture obtained by shooting a key part of the first user; or, if the volume data corresponding to the first user lie within the volume detection interval and the duration for which they remain in the interval is greater than a second duration threshold, determine that the user dynamic information of the first user satisfies the user picture switching condition, and switch the first user picture from the video picture to the camera video picture in the video communication interface.
The specific function implementation manner of the screen switching module 25 may refer to step S202 and step S204 in the embodiment corresponding to fig. 5, which is not described herein again.
Referring to fig. 9, the first update module 12 may include: a first position adjusting unit 121, a first animation display unit 122, a first size adjusting unit 123, a second position adjusting unit 124, a second animation display unit 125, and a second size adjusting unit 126;
a first position adjusting unit 121, configured to adjust, in the video communication interface, the first user frame to a position indicated by the user dynamic information when the user dynamic information of the first user satisfies the virtual background update condition; according to the first user picture after the position adjustment, in a video communication interface, synchronously adjusting the position of a background element in a target virtual background, which is associated with the first user picture, to obtain an updated target virtual background matched with the user dynamic information;
the first animation display unit 122 is configured to display a background animation associated with the user dynamic information in the video communication interface when the user dynamic information of the first user meets the virtual background update condition, and fuse the background animation with the target virtual background to obtain an updated target virtual background matched with the user dynamic information;
a first size adjusting unit 123, configured to, when the user dynamic information of the first user meets the virtual background updating condition, perform size adjustment on a background element associated with the first user picture in the target virtual background in the video communication interface according to the user dynamic information, so as to obtain an updated target virtual background matched with the user dynamic information;
a second position adjusting unit 124, configured to determine, in the video communication interface, a displacement distance and a deflection angle according to the position information after the position change when the position information after the position change in the user dynamic information satisfies the virtual background update condition, and perform displacement deflection processing on the first user picture according to the displacement distance and the deflection angle; in a video communication interface, performing synchronous displacement deflection processing on background elements associated with a first user picture in a target virtual background according to a displacement distance and a deflection angle to obtain an updated target virtual background matched with user dynamic information;
a second animation display unit 125, configured to traverse through the expression animation mapping table when the facial expression in the user dynamic information satisfies the virtual background update condition, to obtain a background animation that matches the facial expression; displaying the background animation in the video communication interface, and fusing the background animation and the target virtual background to obtain an updated target virtual background matched with the dynamic information of the user;
a second resizing unit 126, configured to, when the volume data in the user dynamic information satisfies the virtual background update condition, extract a volume peak-to-valley value from the volume data, and construct a scaling matrix associated with the first user according to the volume peak-to-valley value; the scaling matrix is composed of at least two scaling coefficients in different stretching directions; in a video communication interface, the size of a background element associated with a first user picture in a target virtual background is adjusted according to at least two scaling coefficients in a scaling matrix, and an updated target virtual background matched with user dynamic information is obtained.
For the specific functional implementations of the first position adjusting unit 121, the first animation display unit 122, and the first size adjusting unit 123, refer to step S102 in the embodiment corresponding to fig. 3; for the second position adjusting unit 124, the second animation display unit 125, and the second size adjusting unit 126, refer to step S203 in the embodiment corresponding to fig. 5. The first position adjusting unit 121 and the second position adjusting unit 124 may be combined into one position adjusting unit, the first animation display unit 122 and the second animation display unit 125 into one animation display unit, and the first size adjusting unit 123 and the second size adjusting unit 126 into one size adjusting unit. Details are not repeated here.
In the embodiments of the present application, user dynamic information can be acquired from the audio and video data of a user during video communication, and when the user dynamic information is detected to satisfy the virtual background update condition, the target virtual background can be updated to obtain an updated target virtual background matched with that information. The terminal device therefore automatically acquires the current user's dynamic information and calls the rendering engine to update the virtual background in the current video communication interface in real time: the virtual background can change with the user's limb movements, facial expressions, voice, and the like (one such expression-driven lookup is sketched below), without the user interrupting video communication or manually updating the background. In other words, the virtual background is updated while video communication continues to run and display normally, which improves the quality of video communication. In addition, constructing a virtual, dynamically interactive environment background increases the degree of fusion between the user's image and the background, makes video communication more engaging, and enriches the video display modes of video communication.
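As an illustration of the expression-driven update performed by the animation display unit, the following sketch traverses an expression-animation mapping table to find a background animation matching a detected facial expression. The table entries and the render callback are hypothetical placeholders; the embodiments only specify that a matching background animation is obtained from such a table and fused with the target virtual background.

```python
# Hypothetical expression-animation mapping table; the embodiments do not
# enumerate concrete entries.
EXPRESSION_ANIMATION_TABLE = {
    "smile": "falling_petals",
    "surprise": "bursting_stars",
    "wink": "floating_hearts",
}

def update_background_for_expression(facial_expression, render_animation):
    """Traverse the mapping table for a background animation matching the
    detected facial expression; the caller fuses the returned animation
    with the target virtual background via the render callback."""
    for expression, animation in EXPRESSION_ANIMATION_TABLE.items():
        if expression == facial_expression:
            render_animation(animation)  # fuse with the target virtual background
            return animation
    return None  # facial expression does not satisfy the update condition

# Usage: a detected smile triggers the matching background animation.
update_background_for_expression("smile", lambda name: print("playing", name))
```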
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and optionally may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide a network communication function, the user interface 1003 mainly provides an input interface for the user, and the processor 1001 may be used to invoke the device control application program stored in the memory 1005 to implement:
the method comprises the steps that a first terminal displays a video communication interface used for providing a video communication function for a first user and a second user in a social application, and a target virtual background where the first user and the second user are located together is displayed in the video communication interface; the first user is a user logging in the social application in the first terminal, and the second user is a user performing video communication with the first user in the video communication interface;
when the user dynamic information of the first user meets the virtual background updating condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is acquired from audio and video data of a first user in the video communication process;
and displaying the updated target virtual background in the video communication interface.
It should be understood that the computer device 1000 described in this embodiment of the present application can perform the video display method described in the embodiments corresponding to fig. 3 and fig. 5, which is not repeated here. Likewise, the beneficial effects of the same method are not described again.
It is further noted that an embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium stores the computer program executed by the aforementioned video display apparatus 1, and the computer program includes program instructions. When a processor executes these program instructions, it can perform the video display method described in the embodiments corresponding to fig. 3 and fig. 5, which is therefore not repeated here; nor are the beneficial effects of the same method. For technical details not disclosed in the computer-readable storage medium embodiments of the present application, reference is made to the description of the method embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the video display apparatus provided in any of the foregoing embodiments or of the computer device, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
It is also noted that embodiments of the present application provide a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method provided in the embodiments corresponding to fig. 3 and fig. 5.
The terms "first," "second," and the like in the description and claims of embodiments of the present application and in the drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprises" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or apparatus that comprises a list of steps or elements is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed or inherent to such process, method, apparatus, product, or apparatus.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The above disclosure is only a preferred embodiment of the present application and is not intended to limit the scope of the present application; equivalent variations and modifications made according to the claims of the present application still fall within the scope of the present application.

Claims (20)

1. A method for video presentation, comprising:
the method comprises the steps that a first terminal displays a video communication interface used for providing a video communication function for a first user and a second user in a social application, and a target virtual background where the first user and the second user are located together is displayed in the video communication interface; the first user is a user logging in the social application in the first terminal, and the second user is a user performing video communication with the first user in the video communication interface;
when the user dynamic information of the first user meets a virtual background updating condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is acquired from the audio and video data of the first user in the video communication process;
and displaying the updated target virtual background in the video communication interface.
2. The method of claim 1, wherein the presenting a target virtual background in the video communication interface where the first user and the second user are commonly located comprises:
the first terminal responds to the triggering operation of a background switching control in the video communication interface and displays one or more virtual backgrounds;
determining the selected virtual background as a target virtual background in response to a selection operation for the one or more virtual backgrounds;
and in the video communication interface, switching the original background where the first user and the second user are located together into the target virtual background.
3. The method of claim 1, wherein the video communication interface further comprises a first user screen and a second user screen; the method further comprises the following steps:
shooting key parts of the first user in a video communication process;
if the video virtual communication function of the first terminal is in an open state, displaying a video picture for covering a key part of the first user in the video communication interface, and determining the video picture as a first user picture;
displaying a second user picture which is not overlapped with the first user picture in the video communication interface; the second user picture is used for displaying a key part of the second user;
and in the video communication interface, the first user picture and the second user picture are fused with the target virtual background for display.
4. The method of claim 1, wherein the video communication interface further comprises a first user screen and a second user screen; the method further comprises the following steps:
shooting key parts of the first user in the video communication process;
if the video virtual communication function of the first terminal is in a closed state, displaying a camera video picture in the video communication interface, and determining the camera video picture as a first user picture; the camera shooting video picture refers to a video picture for shooting a key part of the first user;
displaying a second user picture which is not overlapped with the first user picture in the video communication interface; the second user picture is used for displaying a key part of the second user;
and in the video communication interface, the first user picture and the second user picture are fused with the target virtual background for display.
5. The method according to claim 3 or 4, wherein the updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition to obtain an updated target virtual background matching the user dynamic information comprises:
when the user dynamic information of the first user meets the virtual background updating condition, adjusting the first user picture to a position indicated by the user dynamic information in the video communication interface;
according to the first user picture after the position adjustment, in the video communication interface, carrying out position synchronization adjustment on a background element in the target virtual background, which is associated with the first user picture, so as to obtain an updated target virtual background matched with the user dynamic information.
6. The method according to claim 1, wherein the updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition to obtain an updated target virtual background matching the user dynamic information comprises:
and when the user dynamic information of the first user meets the virtual background updating condition, displaying a background animation associated with the user dynamic information in the video communication interface, and fusing the background animation and the target virtual background to obtain an updated target virtual background matched with the user dynamic information.
7. The method according to claim 3 or 4, wherein the updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition to obtain an updated target virtual background matching the user dynamic information comprises:
when the user dynamic information of the first user meets the virtual background updating condition, in the video communication interface, according to the user dynamic information, the size of a background element in the target virtual background, which is associated with the first user picture, is adjusted, and an updated target virtual background matched with the user dynamic information is obtained.
8. The method of claim 1, further comprising:
acquiring user dynamic information of the second user, and fusing the user dynamic information of the first user and the user dynamic information of the second user to obtain fused dynamic information; the user dynamic information of the second user is acquired from the audio and video data of the second user in the video communication process;
and when the fusion dynamic information meets the virtual background updating condition, updating the target virtual background to obtain an updated target virtual background matched with the fusion dynamic information.
9. The method according to claim 3 or 4, wherein the user dynamic information comprises position information corresponding to a key part of the first user;
the method further comprises the following steps:
performing posture detection on the first user in a video communication process to acquire three-dimensional position data corresponding to a key part of the first user;
respectively generating a three-dimensional model matrix, a view matrix and a projection matrix associated with the first user according to the geometric relationship between the key part of the first user and a world coordinate system, the position relationship between the key part of the first user and a camera of the first terminal, and the size of a screen display area of the first terminal; the world coordinate system is used for describing the positions of the key part of the first user and the camera of the first terminal;
performing matrix transformation on the three-dimensional position data according to the three-dimensional model matrix, the view matrix and the projection matrix to generate vertex position coordinates corresponding to the key part of the first user, and determining the vertex position coordinates as position information corresponding to the key part of the first user;
and if the space coordinate or the rotation angle in the three-dimensional position data has parameter change, determining that the position information has position change, and determining that the position information after the position change meets the virtual background updating condition.
10. The method of claim 9, wherein the generating a three-dimensional model matrix, a view matrix and a projection matrix associated with the first user according to the geometric relationship between the key part of the first user and the world coordinate system, the position relationship between the key part of the first user and the camera of the first terminal and the size of the screen display area of the first terminal respectively comprises:
acquiring a geometric relation between a key part of the first user and an origin and coordinate axes of a world coordinate system, constructing a first translation matrix and a first rotation matrix according to the geometric relation, and generating a three-dimensional model matrix associated with the first user according to the first translation matrix and the first rotation matrix;
constructing a second translation matrix and a second rotation matrix according to the position relation between the key part of the first user and the camera of the first terminal, and generating a view matrix associated with the first user according to the second translation matrix and the second rotation matrix;
and obtaining a space parameter of a camera coordinate system according to the size of a screen display area of the first terminal, and constructing a projection matrix associated with the first user according to the space parameter.
11. The method according to claim 9, wherein the updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition to obtain an updated target virtual background matching the user dynamic information comprises:
when the position information after the position change in the user dynamic information meets a virtual background updating condition, determining a displacement distance and a deflection angle according to the position information after the position change in the video communication interface, and performing displacement deflection processing on the first user picture according to the displacement distance and the deflection angle;
and in the video communication interface, performing synchronous displacement deflection processing on background elements in the target virtual background, which are associated with the first user picture, according to the displacement distance and the deflection angle to obtain an updated target virtual background matched with the user dynamic information.
12. The method of claim 1, wherein the user dynamics information further includes a facial expression of the first user;
the method further comprises the following steps:
performing expression detection on the first user in a video communication process to acquire a facial expression of the first user;
and if the facial expression belongs to the target facial expression type, determining that the user dynamic information meets a virtual background updating condition.
13. The method according to claim 12, wherein the updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition to obtain an updated target virtual background matching the user dynamic information includes:
when the facial expression in the user dynamic information meets a virtual background updating condition, traversing an expression animation mapping table to obtain a background animation matched with the facial expression;
and displaying the background animation in the video communication interface, and fusing the background animation and the target virtual background to obtain an updated target virtual background matched with the user dynamic information.
14. The method of claim 3 or 4, wherein the user dynamic information further comprises volume data corresponding to the first user;
the method further comprises the following steps:
acquiring audio data input by the first user, and sampling the audio data to obtain volume data corresponding to the first user;
and if the volume data is positioned in a volume detection interval, determining that the user dynamic information meets a virtual background updating condition.
15. The method according to claim 14, wherein the updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition to obtain an updated target virtual background matching the user dynamic information includes:
when the volume data in the user dynamic information meet a virtual background updating condition, extracting volume peak-valley values from the volume data, and constructing a scaling matrix associated with the first user according to the volume peak-valley values; the scaling matrix is composed of scaling coefficients in at least two different stretching directions;
in the video communication interface, the size of a background element in the target virtual background, which is associated with the first user picture, is adjusted according to at least two scaling coefficients in the scaling matrix, so that an updated target virtual background matched with the user dynamic information is obtained.
16. The method of claim 1, wherein the user dynamic information further comprises real-world information of an environment in which the first user is located;
the method further comprises the following steps:
acquiring video data of the environment where the first user is located, and extracting real scene information of the environment where the first user is located from the video data; the live-action information comprises one or more of the brightness, the color composition or the key environment object of the environment where the first user is located;
and if the real-scene information has environmental change, determining that the user dynamic information meets the virtual background updating condition.
17. The method of claim 3, further comprising:
if the facial expression of the first user belongs to the target facial expression type and the display duration of the facial expression is greater than a first duration threshold, determining that the user dynamic information of the first user meets a user picture switching condition, and switching the first user picture from the video picture to a camera video picture in the video communication interface; the camera shooting video picture refers to a video picture for shooting a key part of the first user, or,
and if the volume data corresponding to the first user is located in a volume detection interval and the duration of the volume data in the volume detection interval is greater than a second duration threshold, determining that the user dynamic information of the first user meets a user picture switching condition, and switching the first user picture from the video picture to a camera shooting video picture in the video communication interface.
18. A video presentation apparatus, comprising:
the first display module is used for displaying a video communication interface for providing a video communication function for a first user and a second user by a first terminal in a social application, and displaying a target virtual background where the first user and the second user are located together in the video communication interface; the first user is a user logging in the social application in the first terminal, and the second user is a user performing video communication with the first user in the video communication interface;
the updating module is used for updating the target virtual background when the user dynamic information of the first user meets a virtual background updating condition to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is acquired from the audio and video data of the first user in the video communication process;
and the second display module is used for displaying the updated target virtual background in the video communication interface.
19. A computer device, comprising: a processor, a memory, and a network interface;
the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide data communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1-17.
20. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program is adapted to be loaded by a processor and to carry out the method of any one of claims 1-17.
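Claims 9 to 11 describe transforming the three-dimensional position data of the first user's key part through a three-dimensional model matrix, a view matrix, and a projection matrix into vertex position coordinates, and applying the resulting displacement distance and deflection angle to the background elements associated with the first user picture. The Python sketch below shows one conventional form of such a pipeline; the concrete translation, rotation, and perspective parameters are assumptions chosen only for this example and are not taken from the claims.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix (displacement distance)."""
    m = np.eye(4, dtype=np.float32)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotation_z(theta):
    """4x4 rotation about the z-axis (deflection angle, in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    m = np.eye(4, dtype=np.float32)
    m[0, 0], m[0, 1], m[1, 0], m[1, 1] = c, -s, s, c
    return m

def perspective(fov_y, aspect, near, far):
    """Projection matrix built from the screen display area (aspect
    ratio) and assumed camera-space parameters."""
    f = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ], dtype=np.float32)

# Model matrix: geometric relation of the key part to the world origin.
M = translation(0.1, 0.0, -2.0) @ rotation_z(np.radians(5.0))
# View matrix: position relation between the key part and the camera.
V = translation(0.0, 0.0, -1.0)
# Projection matrix: derived from the screen display area size.
P = perspective(np.radians(60.0), 16.0 / 9.0, 0.1, 100.0)

key_part = np.array([0.0, 0.0, 0.0, 1.0], dtype=np.float32)  # homogeneous
clip = P @ V @ M @ key_part
vertex_xy = clip[:2] / clip[3]  # vertex coordinates after perspective divide
print(vertex_xy)

# When the spatial coordinates or rotation angle change between frames,
# the same displacement distance and deflection angle are applied
# synchronously to the associated background elements (claim 11).
```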
CN202110206221.4A 2021-02-24 2021-02-24 Video display method and device and readable storage medium Pending CN114979789A (en)

Priority Applications (1)

Application Number: CN202110206221.4A; Priority Date: 2021-02-24; Filing Date: 2021-02-24; Title: Video display method and device and readable storage medium

Publications (1)

Publication Number: CN114979789A; Publication Date: 2022-08-30

Family ID: 82973690

Family Applications (1): CN202110206221.4A (pending), Video display method and device and readable storage medium; Country: CN

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code (country of ref document: HK; ref legal event code: DE; ref document number: 40074442)