WO2022089224A1 - Video communication method and apparatus, electronic device, computer-readable storage medium, and computer program product - Google Patents
Video communication method and apparatus, electronic device, computer-readable storage medium, and computer program product
- Publication number: WO2022089224A1 (PCT/CN2021/124089)
- Authority: WIPO (PCT)
- Prior art keywords: virtual, video, audio, user, virtual object
- Prior art date
Classifications
- H04N7/157—Conference systems defining a virtual conference space and using avatars or agents
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- G10L21/0208—Noise filtering
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
- H04L65/1089—In-session procedures by adding media; by removing media
- H04L65/80—Responding to QoS
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
- G10L21/003—Changing voice quality, e.g. pitch or formants
Definitions
- The present application relates to the field of computer technology, and in particular to a video communication method, apparatus, electronic device, computer-readable storage medium, and computer program product.
- Smart terminals such as mobile phones and tablet computers occupy a pivotal position in people's daily lives. With them, people can conduct real-time video communication anytime and anywhere, reducing the cost of communication.
- In the related art, however, the displayed video is usually the original audio and video data collected by the capture device. The video display mode in video communication is therefore monotonous, resulting in a poor video rendering effect.
- Embodiments of the present application provide a video communication method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can improve video rendering effects in video communication.
- Embodiments of the present application provide a video communication method, including:
- obtaining first virtual object information in response to a first virtual object selection operation on a first video communication interface, where the first video communication interface is an interface for video communication between the electronic device and the opposite end of the electronic device;
- displaying, in the first video communication interface, a first virtual video picture and a second virtual video picture, where the first virtual video picture is associated with the first virtual object information and a first key part of the user, the second virtual video picture is associated with second virtual object information and a second key part of the user, and the second virtual object information is obtained by the opposite end in response to a second virtual object selection operation on a second video communication interface;
- playing target virtual audio, where the target virtual audio includes one or both of a first virtual audio and a second virtual audio, the first virtual audio is associated with first voice data and the first virtual object information, and the second virtual audio is associated with second voice data and the second virtual object information.
- An embodiment of the present application provides a video communication device, including:
- a virtual object acquisition module, configured to obtain first virtual object information in response to a first virtual object selection operation on a first video communication interface, where the first video communication interface is an interface for video communication between the electronic device and the opposite end of the electronic device;
- a picture output module, configured to display a first virtual video picture and a second virtual video picture in the first video communication interface, where the first virtual video picture is associated with the first virtual object information and a first key part of the user, the second virtual video picture is associated with second virtual object information and a second key part of the user, and the second virtual object information is obtained by the opposite end in response to a second virtual object selection operation on a second video communication interface;
- an audio output module, configured to play target virtual audio, where the target virtual audio includes one or both of a first virtual audio and a second virtual audio, the first virtual audio is associated with first voice data and the first virtual object information, and the second virtual audio is associated with second voice data and the second virtual object information.
- An embodiment of the present application provides an electronic device for video communication, including a processor and a memory. The memory stores a computer program, and when the computer program is executed by the processor, the processor executes the video communication method in the embodiments of the present application.
- Embodiments of the present application provide a computer-readable storage medium, which stores a computer program; the computer program includes program instructions that, when executed by a processor, execute the video communication method in the embodiments of the present application.
- Embodiments of the present application provide a computer program product, where the computer program product includes computer instructions that, when executed by a processor of an electronic device for video communication, execute the video communication method provided in the embodiments of the present application.
- The embodiments of the present application provide at least the following beneficial effect: during video communication, communication can be carried out in the form of virtual audio and video through the selected virtual object information (that is, the first virtual object information and the second virtual object information). This diversifies the video display modes in video communication and can therefore improve the video rendering effect.
- FIG. 1 is a network architecture diagram provided by an embodiment of the present application.
- FIGS. 2 to 3 are schematic diagrams of a scenario of entering video virtual communication provided by an embodiment of the present application.
- FIG. 4 is a schematic flowchart of a video communication method provided by an embodiment of the present application.
- FIG. 5a is a first schematic diagram of generating a virtual video picture provided by an embodiment of the present application.
- FIG. 5b is a second schematic diagram of generating a virtual video picture provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of a module for converting voice data into virtual audio provided by an embodiment of the present application.
- FIG. 7 is a schematic flowchart of an interaction in video virtual communication provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of a scenario of interaction in video virtual communication provided by an embodiment of the present application.
- FIG. 9a is a schematic diagram of a scene of performing background material switching in video virtual communication according to an embodiment of the present application.
- FIG. 9b is an interaction schematic diagram of a video communication method provided by an embodiment of the present application.
- FIG. 10 is a system architecture diagram provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a video communication device provided by an embodiment of the present application.
- FIG. 12 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new type of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
- Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
- the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, and drones. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
- The video communication method provided by the embodiments of the present application belongs to speech processing technology (Speech Technology) in the field of artificial intelligence.
- Speech processing technology involves automatic speech recognition (ASR), text-to-speech synthesis (TTS), and voiceprint recognition. Enabling computers to hear, see, speak, and feel is the future direction of human-computer interaction, and voice interaction is one of its modes.
- In the related art, the displayed video is usually the original audio and video data collected by the acquisition device. The video display mode in video communication is therefore monotonous, resulting in a poor video rendering effect.
- Moreover, when a user wants to display virtual data (such as a special-effects animation), the video communication must be interrupted and the virtual data sent through the session page. Therefore, the normal operation and display of the video communication cannot be maintained while virtual data is displayed.
- In view of this, the embodiments of the present application provide a video communication method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can improve the video rendering effect in video communication and display virtual data during video communication while maintaining its normal operation and display, reducing interactive operations in the video communication process and thus the resource consumption those operations cause.
- FIG. 1 is a network architecture diagram provided by an embodiment of the present application.
- the network architecture is used to implement a video communication system.
- The video communication system 1-1 may include a service server 1000 and a user terminal cluster, and the user terminal cluster may include one or more user terminals; the number of user terminals is not limited here. As shown in FIG. 1, the multiple user terminals may include a user terminal 100a, a user terminal 100b, a user terminal 100c, ..., and a user terminal 100n.
- Each user terminal can be connected to the service server 1000 via a network and perform data interaction with the service server 1000 through the network connection, so as to generate a virtual video picture (that is, the first virtual video picture and/or the second virtual video picture) and virtual audio (that is, the target virtual audio) during video communication; further, based on the virtual video picture and the virtual audio, video communication between at least two user terminals is implemented.
- A target application can be installed on each user terminal shown in FIG. 1. When the target application runs on a user terminal, it can perform data interaction with the service server 1000 shown in FIG. 1, so that the service server 1000 can receive service data from each user terminal.
- the target application may include an application with a function of displaying data information such as text, image, audio, and video.
- the application may be an instant messaging application, which may be used for real-time communication between users.
- When the instant messaging application is a video communication application, the user can perform video communication through it.
- The embodiment of the present application provides one or more virtual objects in the instant messaging application for video communication. The two communicating users can each select any virtual object so as to enter video virtual communication. In the video communication system, each user logs in, through an account, to the video communication client running on the user terminal, and thereby conducts video communication with other users.
- the service server 1000 in this embodiment of the present application can obtain service data according to these applications.
- The service data can be the virtual object selected by the user (for example, a cartoon character), the user's voice data, the user's expression, and the like.
- The service server 1000 converts the acquired voice data of the user into virtual audio based on the selected virtual object, so that the virtual audio has the timbre configured for that virtual object. The service server 1000 can also fuse the acquired user's expression with the selected virtual object to generate a virtual object bearing the user's expression; subsequently, the service server can send the virtual audio and this virtual object to the user terminal, and the user terminal can output, in the video communication interface, the virtual video picture containing the virtual object with the user's expression together with the virtual audio.
- In the embodiments of the present application, one user terminal may be selected from the multiple user terminals as the target user terminal. The target user terminal may include, but is not limited to, smart terminals with data processing functions (for example, text display, video playback, and music playback functions), such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart TV, a smart speaker, a smart watch, and a vehicle-mounted device.
- The user terminal 100a shown in FIG. 1 may be used as the target user terminal, and the above-mentioned target application may be integrated in it; in this case, the target user terminal can exchange data with the service server 1000 through the target application.
- The target user terminal can display at least one virtual object in response to a trigger operation on the video virtual communication control; the user can then select any one of the at least one virtual object as the target virtual object. Subsequently, the service server can obtain the user expression of the user using the target user terminal and fuse it with the target virtual object to generate a target virtual object bearing the user expression (for example, if the user expression is a pursed-lips smile, a cartoon character with a pursed-lips smile can be generated). Meanwhile, the service server can obtain the audio processing model corresponding to the target virtual object (the audio processing model contains the timbre feature corresponding to the target virtual object), obtain the user's voice data, and convert the voice data into virtual audio with the timbre feature of the target virtual object.
- In some embodiments, the network architecture may include multiple service servers, and one user terminal may be connected to one service server. Each service server can obtain the service data in the connected user terminal (for example, the virtual object selected by the user, the user's voice data, and the user's expression), convert the user's voice data into virtual audio with the timbre characteristics of the selected virtual object, and fuse the user's expression with the virtual object to generate the virtual video picture corresponding to the virtual object bearing the user's expression.
- In some embodiments, the user terminal itself can also obtain the service data (such as the virtual object selected by the user, the user's voice data, and the user's expression), convert the user's voice data into virtual audio with the timbre characteristics of the virtual object, and fuse the user's expression with the virtual object to generate the virtual video picture corresponding to the virtual object bearing the user's expression.
- the video communication method provided by the embodiments of the present application may be performed by an electronic device used for video communication, where the electronic device includes but is not limited to a user terminal or a service server.
- The service server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
- the user terminal and the service server may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present application.
- FIGS. 2 and 3 are schematic diagrams of a scenario of entering video virtual communication provided by an embodiment of the present application.
- The service server shown in FIGS. 2 and 3 may be the service server 1000 shown in FIG. 1. The user terminal A shown in FIGS. 2 and 3 may be any user terminal selected from the user terminal cluster in FIG. 1, for example, the user terminal 100b; and the user terminal B shown in FIGS. 2 and 3 may be any user terminal in the user terminal cluster of the embodiment corresponding to FIG. 1 other than user terminal A, for example, the user terminal 100a.
- FIG. 2 illustrates the video communication performed on the side of the user terminal A corresponding to the user A.
- The first video communication interface 2-1 includes a video virtual communication control 2-11 (for example, displayed as an icon with the text "video virtual communication"), through which user A and user B can enter video virtual communication.
- After user A clicks the video virtual communication control 2-11, user terminal A can respond to this trigger operation and display a virtual object list 2-12 (that is, at least one virtual object) in the first video communication interface 2-1; user A can select a virtual object from the virtual object list 2-12 as the target virtual object.
- The virtual object list 2-12 can be displayed in the bottom area of the first video communication interface 2-1 in the form of a floating window, a masked layer, or a semi-transparent layer, or in a retractable sub-interface whose display size can be changed through a drag operation; the size of this interface is smaller than that of the first video communication interface 2-1.
- When the virtual object list 2-12 is displayed, the display area that shows user B or user A in the form of a small window moves to an area that does not overlap with the display area of the virtual object list 2-12; that is, the display area of user B or user A is not covered by the display area of the virtual object list 2-12. For example, the area M where user B is displayed is moved up in the first video communication interface 2-1, so that the area M does not overlap with the display area of the virtual object list.
- the virtual object list 2-12 includes a virtual object 20a, a virtual object 20b, and a virtual object 20c.
- After user A selects the virtual object 20a, he can click the "open video virtual communication" button 2-13. User terminal A then generates an opening request for video virtual communication and sends it to the service server 1000, and the service server 1000 can query whether user terminal B, to which user B who is in video communication with user A belongs, has enabled video virtual communication.
- If user terminal B has not enabled video virtual communication, the service server 1000 may return the query result to user terminal A, and user terminal A can display invitation prompt information 2-14 in the first video communication interface 2-1 for user A to view. The invitation prompt information 2-14 can be displayed in the form of a pop-up window, a masked layer, or a semi-transparent layer in any area of the first video communication interface 2-1. For example, as shown in FIG. 2, the invitation prompt information 2-14 is displayed in a pop-up window on the first video communication interface 2-1; the pop-up window contains the invitation prompt information 2-14 "The other party has not opened video virtual communication; invite the other party to open video virtual communication?", as well as a confirmation control 2-151 and a cancel control 2-152, for user A to choose whether to send user B an invitation to open video virtual communication. If user A clicks the confirmation control 2-151, user terminal A sends an opening invitation request to the service server 1000, and the service server 1000 forwards it to user terminal B; if user A clicks the cancel control 2-152, no opening invitation request is sent to user terminal B.
- Likewise, when the invitation prompt information 2-14 is displayed, the display area that shows user B or user A in the form of a small window moves to an area that does not overlap with the display area of the invitation prompt information 2-14; that is, the display area of user B or user A is not covered by the display area of the invitation prompt information 2-14. For example, the area M where user B is displayed is moved to the lower right corner of the first video communication interface 2-1, so that the area M does not overlap with the display area of the invitation prompt information.
- The service server 1000 forwards the opening invitation request to user terminal B.
- User B can view the opening invitation request on the second video communication interface 3-1 of user terminal B: the opening invitation prompt information 3-11 corresponding to the opening invitation request is displayed, together with a virtual object list 3-12.
- the service server 1000 can send a notification message to the user terminal A that the user terminal B has started the video virtual communication.
- User terminal A can then display time prompt information 3-14 in the first video communication interface 2-1 to remind user A that user B has opened video virtual communication and that, after a waiting time (for example, 3 s), user A and user B will enter video virtual communication.
- Similarly, user terminal B may display time prompt information 3-15 to remind user B that, after the waiting time (for example, 3 s), user B and user A will enter video virtual communication.
- the waiting period may be presented on the first video communication interface 2-1 and the second video communication interface 3-1 in a countdown manner.
- After entering video virtual communication, the service server 1000 can obtain the virtual object 20a selected by user A and the expression data of user A, and then fuse the expression data of user A with the virtual object 20a, so as to generate a first virtual video picture containing the virtual object 20a (the virtual object 20a bearing user A's expression). In the same way, the service server 1000 can obtain the virtual object 20b selected by user B and the expression data of user B, and fuse them to generate a second virtual video picture containing the virtual object 20b (the virtual object 20b bearing user B's expression).
- The service server can send the first virtual video picture and the second virtual video picture to user terminal A and user terminal B respectively, and both terminals can display the two pictures in their respective video communication interfaces. That is to say, user A sees, on the first video communication interface of user terminal A, two virtual objects (virtual object 20a and virtual object 20b) in video communication, and user B likewise sees, on the second video communication interface of user terminal B, the two virtual objects (virtual object 20b and virtual object 20a) in video communication.
- Understandably, the service server 1000 can also obtain the voice data of user A and user B respectively. The service server 1000 can perform voice conversion on the voice data of user A (referred to as the first voice data) to generate virtual audio a (referred to as the first virtual audio) with the timbre characteristics of the virtual object 20a, and send the virtual audio a to user terminal B; likewise, the service server 1000 can perform voice conversion on the voice data of user B (referred to as the second voice data) to generate virtual audio b (referred to as the second virtual audio) with the timbre characteristics of the virtual object 20b, and send the virtual audio b to user terminal A.
- User terminal A can output the virtual audio b in the first video communication interface; the picture user A sees is then the second virtual video picture of the virtual object 20b bearing user B's expression, and the voice of user B that user A hears has the timbre characteristics of the virtual object 20b. Similarly, user terminal B can output the virtual audio a in the second video communication interface; the picture user B sees is the first virtual video picture of the virtual object 20a bearing user A's expression, and the sound heard has the timbre characteristics of the virtual object 20a.
- It can be seen that, in the embodiment of the present application, an entrance for video virtual communication is added to the video communication interface: users can select virtual objects in the interface so that their chat image is converted into a virtual image, and the video communication continues to operate and display normally while the virtual image is shown. Moreover, during a virtual video chat, the voice of the other party that a user hears is not the other party's original sound but virtual audio obtained through voice conversion, bearing the timbre characteristics of the selected virtual object.
- In this way, the embodiment of the present application changes the communicating users in video communication along the two dimensions of sound and picture, which increases the interest of video communication, diversifies the video display modes, improves the quality of video communication, and thus greatly improves the user experience.
- FIG. 4 is a schematic flowchart of a video communication method provided by an embodiment of the present application.
- The video communication method may be executed by a user terminal (for example, a user terminal shown in FIG. 1 or FIG. 2), by a service server (for example, the service server 1000 shown in FIG. 1), or jointly by the user terminal and the service server.
- For ease of description, the embodiments of the present application take the case where the video communication method is executed by the user terminal as an example.
- the video communication method may include at least the following S101-S103, and each step will be described below.
- S101: Obtain first virtual object information in response to a first virtual object selection operation on the first video communication interface.
- The first video communication interface includes a video virtual communication control for triggering video virtual communication. A user (referred to as the first user, such as user A) can click the video virtual communication control so as to conduct video virtual communication with the communication counterpart (referred to as the second user, such as user B); when the first user clicks the video virtual communication control, the first terminal (that is, the electronic device for video communication) triggers the processing flow of video virtual communication.
- In some embodiments, the first terminal displays at least one virtual object in response to the first user's trigger operation on the video virtual communication control. A virtual object may refer to a virtual image different from a real person; for example, it can be a three-dimensional animated image, including an animated character image, an animated animal image, an animated plant image (for example, an animated apple tree), and so on.
- The first user may select a virtual object from the at least one virtual object as the virtual object for image transformation. When the first user selects a virtual object, the first terminal responds to this first virtual object selection operation on the at least one virtual object, obtains the virtual object selected by the first user, and determines the information corresponding to that virtual object as the first virtual object information. The information corresponding to the virtual object includes a virtual object model, a virtual audio model, and the like.
- The first video communication interface is the interface for video communication between the electronic device and the opposite end of the electronic device, where the first video communication interface is the video communication interface displayed on the electronic device side, and the opposite end of the electronic device is the second terminal used by the second user. The at least one virtual object may also be displayed when the electronic device receives a request for video virtual communication, which is not limited in this embodiment of the present application.
- In this way, at least one virtual object is added to the video communication, so that, based on the virtual object selected by the user, the character image in the raw data collected by the video capture device can be switched to the image of the selected virtual object; video communication is thus implemented in the form of video virtual communication, which improves the video rendering effect in video communication.
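- To make the data involved in S101 concrete, the following is a minimal Python sketch of what the first virtual object information might bundle. The names (`VirtualObjectInfo`, `select_virtual_object`) are assumptions for illustration and do not come from the patent; the description above only states that the information includes a virtual object model and a virtual audio model.

```python
from dataclasses import dataclass

@dataclass
class VirtualObjectInfo:
    """Bundle returned when a user selects a virtual object (hypothetical names).

    Per the description above, the information corresponding to a selected
    virtual object includes at least a virtual object model (for rendering)
    and an audio processing model carrying the object's timbre features.
    """
    object_id: str          # e.g. "20a" for virtual object 20a in FIG. 2
    object_model: bytes     # serialized 3D model with virtual key parts
    audio_model: bytes      # serialized audio processing model (timbre features)

def select_virtual_object(object_list: list[VirtualObjectInfo],
                          selected_id: str) -> VirtualObjectInfo:
    """Resolve a selection operation on the displayed virtual object list
    to the first virtual object information (a sketch of S101)."""
    for info in object_list:
        if info.object_id == selected_id:
            return info
    raise KeyError(f"virtual object {selected_id!r} not in list")
```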
- S102: Display a first virtual video picture and a second virtual video picture in the first video communication interface, where the first virtual video picture is associated with the first virtual object information and the key part of the first user, the second virtual video picture is associated with the second virtual object information and the key part of the second user, and the second virtual object information is obtained by the opposite end in response to a second virtual object selection operation on the second video communication interface.
- In some embodiments, the first terminal may send the second user an invitation to enable video virtual communication, so as to prompt the second user to enable the video virtual communication function. The first user can click to open video virtual communication, and the first terminal, in response to this opening operation on the video virtual communication control, can display invitation prompt information in the first video communication interface. The invitation prompt information concerns the opening invitation for the video virtual communication control, that is, a prompt for the first terminal to request the second terminal to open the video virtual communication control; in other words, the invitation prompt information is used to prompt the first user whether to send the second user the opening invitation. For this invitation prompt information, the first user can click the invitation confirmation control, and the first terminal, in response to the first user's confirmation operation on the invitation prompt information, sends the second terminal an opening request for the video virtual communication control.
- After receiving the opening request, the second terminal can display opening prompt information on the second video communication interface of the second terminal, and the second user can view it and enable video virtual communication. At this time, the second terminal receives the second user's operation of opening video virtual communication and, in response, returns to the first terminal confirmation information indicating that the second user has opened video virtual communication.
- The first terminal can receive the confirmation information returned by the second terminal for the opening request and output time prompt information; the time prompt information is used to prompt the first user about the waiting time before entering video virtual communication. When the time corresponding to the waiting time is reached, the first terminal displays the first virtual video picture and the second virtual video picture in the first video communication interface.
- That is to say, the first user and the second user both need to open video virtual communication before entering it, and before entry there is a preparation period (the waiting time); the waiting time can be 3 seconds, 1 minute, 1 hour, and so on, which is not enumerated here.
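- The handshake described above (confirmation of the invitation prompt, an opening request to the peer, the peer's confirmation, then a countdown before both sides enter virtual communication) can be summarized in the following Python sketch. The message names, the `channel` object, and the default waiting time are assumptions for illustration, not a protocol disclosed by the patent.

```python
import time
from enum import Enum, auto

class Msg(Enum):
    OPEN_REQUEST = auto()   # first terminal -> service server -> second terminal
    OPEN_CONFIRM = auto()   # second terminal -> service server -> first terminal

def enter_video_virtual_communication() -> None:
    print("entering video virtual communication")  # placeholder for S102/S103

def first_terminal_open_flow(channel, waiting_time_s: float = 3.0) -> None:
    """Sketch of the first terminal's side of the opening handshake."""
    # The first user confirms the invitation prompt, so the first terminal
    # sends an opening request for the video virtual communication control.
    channel.send(Msg.OPEN_REQUEST)
    # On receiving the second terminal's confirmation, show the time prompt
    # information (e.g. a 3 s countdown), then enter virtual communication.
    if channel.recv() is Msg.OPEN_CONFIRM:
        time.sleep(waiting_time_s)
        enter_video_virtual_communication()
```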
- After the waiting time, the first user and the second user enter video virtual communication, during which the first video communication interface presents the first virtual video picture covering the user key part of the first user and the second virtual video picture covering the user key part of the second user. The first virtual video picture is a virtual video picture generated from the user key part of the first user (referred to as the first user key part), and the second virtual video picture is a virtual video picture generated from the user key part of the second user (referred to as the second user key part). A user key part may refer to the user's eyes, lips, nose, eyebrows, and the like, and can be used to represent the user's expression information (for example, a smile, pursed lips, an open mouth, or wide eyes with parted lips).
- The first virtual object information includes a first virtual object model, and the second virtual object information includes a second virtual object model. The process of outputting the first virtual video picture and the second virtual video picture includes: the first terminal acquires the first virtual key part in the first virtual object model and acquires the user key part of the first user, where the first virtual key part and the user key part of the first user belong to the same part type (for example, if the user key part of the first user is the eye part, the first virtual key part in the first virtual object model should also be the eye part). According to the first virtual key part and the user key part of the first user, the first terminal can output, in the first video communication interface, a first virtual video picture containing the first virtual object model, with the first virtual key part of the model associated with the user key part of the first user. Similarly, the first terminal obtains the second virtual key part in the second virtual object model and the user key part of the second user, where the two belong to the same part type, and outputs, also in the first video communication interface, a second virtual video picture containing the second virtual object model, with the second virtual key part of the model associated with the user key part of the second user.
- Taking the first terminal outputting the first virtual video picture according to the user key part of the first user and the first virtual key part as an example, the process of outputting a virtual video picture according to the user key part and the virtual key part in the virtual object model is described below (the implementation of outputting the second virtual video picture according to the user key part of the second user and the second virtual key part can be the same).
- In some embodiments, the first terminal first obtains the virtual action state of the first virtual key part and the part action state corresponding to the user key part of the first user. The virtual action state corresponds to two states, a first critical state and a second critical state, where the first virtual key part in the first critical state corresponds to first model position coordinates, and the first virtual key part in the second critical state corresponds to second model position coordinates.
- The first terminal determines the relationship between state and position coordinates according to the first critical state with the first model position coordinates and the second critical state with the second model position coordinates, and then, based on this relationship, determines the target model position coordinates corresponding to the first virtual key part in the part action state. When the first terminal adjusts the first virtual key part in the first virtual object model to the target model position coordinates, the virtual action state of the first virtual key part is adjusted to an action state matching the part action state, and the first virtual object model is thereby converted into the first target virtual object model. Subsequently, in the first video communication interface, the first terminal outputs a first virtual video picture containing the first target virtual object model, in which the virtual key part is in the part action state.
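- Writing the part action state as an expression value e between 0 (the first critical state) and 1 (the second critical state), the relationship between state and position coordinates described above amounts to a linear interpolation between the two critical coordinates; the notation below is chosen for illustration and does not appear in the patent:

$$P_{\text{target}} = (1 - e)\,P_{\text{first}} + e\,P_{\text{second}}, \qquad e \in [0, 1]$$

where P_first and P_second are the first and second model position coordinates of the first virtual key part.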
- FIG. 5a is a first schematic diagram of generating a virtual video picture provided by an embodiment of the present application. The schematic diagram shown in FIG. 5a is described by taking both the virtual key part and the user key part as the lip part as an example.
- When the virtual action state of the virtual key part (for example, the virtual lip part) in the first virtual object model is the first critical state 5-11, the image may be image 50a, where the first critical state 5-11 may mean that the virtual lip part is in a lips-closed state (at this time, the expression value of the first virtual object model is 0); when the virtual action state of the virtual lip part is the second critical state 5-12, the image may be image 50b, where the second critical state 5-12 may refer to a state in which the lips are wide open (at this time, the expression value of the first virtual object model is 1).
- The first terminal can obtain the first model position coordinates 5-21 corresponding to the first virtual key part when the virtual lip part is in the lips-closed state (that is, the model position coordinates corresponding to the virtual lip part at expression value 0), and can also obtain the second model position coordinates 5-22 corresponding to the first virtual key part when the virtual lip part is in the wide-open state (that is, the model position coordinates corresponding to the virtual lip part at expression value 1).
- During video communication, the first terminal can also obtain the user picture captured by the camera; in image 50c of the user picture, the user's lip part (that is, the first user's lip part) can be obtained. The user's lip part of the first user is in a half-open smiling state, that is, the current part action state is a half-open-lips smile.
- The first terminal can determine the expression value for the half-open-lips smile state (for example, 0.3) and, according to the first model position coordinates 5-21 corresponding to expression value 0 and the second model position coordinates 5-22 corresponding to expression value 1, determine the target model position coordinates 5-3 corresponding to expression value 0.3. According to the target model position coordinates 5-3, the virtual action state of the virtual lip part in the virtual object model can be adjusted; that is, the virtual lip part in the virtual object model is also adjusted to the half-open-lips smile state, so that image 50d in the virtual video picture can be generated, in which the expression of the virtual object matches the user's expression.
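- Instantiating this on the FIG. 5a example (expression value 0.3 between the lips-closed coordinates at 0 and the wide-open coordinates at 1), a Python sketch might look as follows; the coordinate values and the function name are made up for illustration:

```python
import numpy as np

def blend_key_part(p_closed: np.ndarray, p_open: np.ndarray,
                   expression_value: float) -> np.ndarray:
    """Interpolate virtual-key-part coordinates between the two critical
    states of FIG. 5a (0 = lips closed, 1 = lips wide open)."""
    e = float(np.clip(expression_value, 0.0, 1.0))
    return (1.0 - e) * p_closed + e * p_open

# Made-up 3D coordinates for three lip keypoints of the virtual object model.
p_closed = np.array([[0.0, 0.00, 0.0], [0.5, 0.00, 0.0], [1.0, 0.00, 0.0]])
p_open = np.array([[0.0, -0.30, 0.0], [0.5, -0.45, 0.0], [1.0, -0.30, 0.0]])

# Detected part action state: half-open-lips smile, expression value 0.3.
target = blend_key_part(p_closed, p_open, 0.3)
print(target)  # the target model position coordinates 5-3
```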
- In some embodiments, the first terminal first obtains the two-dimensional key-point coordinates corresponding to the user key part of the first user; according to the two-dimensional key-point coordinates, it determines the three-dimensional key-point coordinates corresponding to the first virtual key part; according to the three-dimensional key-point coordinates, it determines the target virtual action state corresponding to the first virtual key part; and, in the first video communication interface, it outputs a first virtual video picture containing the second target virtual object model, in which the first virtual key part is in the target virtual action state.
- In some embodiments, the first terminal acquires the position mapping relationship between the first user key part and the first virtual key part and, in the first virtual object model, adjusts the state of the first virtual key part to the part action state of the first user key part according to this position mapping relationship, so as to obtain the second target virtual object model; in the first video communication interface, it then outputs a first virtual video picture containing the second target virtual object model. That is, the first terminal can directly map the key-point positions corresponding to the key part of the first user onto the virtual object model: by detecting key-point position changes and mapping them to model point changes of the first virtual object model, the expression of the first virtual object model changes with the user's expression.
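- One way to read this direct point-position mapping is as follows: detected 2D landmark displacements of the user's key part (relative to a neutral reference) are transferred, through a precomputed position mapping, onto the corresponding model points. The linear form of the mapping, the array shapes, and the function name are all assumptions for illustration:

```python
import numpy as np

def map_landmarks_to_model(user_landmarks: np.ndarray,
                           user_neutral: np.ndarray,
                           model_neutral: np.ndarray,
                           position_mapping: np.ndarray) -> np.ndarray:
    """Transfer user key-part landmark motion onto virtual-key-part points.

    user_landmarks / user_neutral: (N, 2) detected and neutral 2D keypoints.
    model_neutral: (M, 3) neutral model point positions of the virtual key part.
    position_mapping: (M*3, N*2) linear map from 2D landmark displacements
    to 3D model point displacements (one reading of the "position mapping
    relationship" above).
    """
    delta_2d = (user_landmarks - user_neutral).reshape(-1)     # (N*2,)
    delta_3d = (position_mapping @ delta_2d).reshape(-1, 3)    # (M, 3)
    return model_neutral + delta_3d  # model points follow the user's expression

# Tiny demo with N = 2 user keypoints and M = 2 model points.
user_neutral = np.zeros((2, 2))
user_landmarks = user_neutral + np.array([[0.0, 0.1], [0.0, 0.2]])
model_neutral = np.zeros((2, 3))
position_mapping = np.eye(6, 4)  # arbitrary illustrative linear map
print(map_landmarks_to_model(user_landmarks, user_neutral,
                             model_neutral, position_mapping))
```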
- FIG. 5b is a second schematic diagram of generating a virtual video picture provided by an embodiment of the present application.
- As shown in FIG. 5b, when the part action state corresponding to the key part of the first user is mapped onto image 5-41 corresponding to the first virtual key part, virtual video pictures such as those shown in images 5-42 to 5-44 are obtained.
- In the embodiments of the present application, the output of virtual video pictures by the user terminal may be implemented by a real-time rendering component in the user terminal. The real-time rendering component here may refer to a component with picture rendering capability; for example, it may be a real-time three-dimensional (3D) engine, such as an "Ace3D" engine. The "Ace3D" engine can be deployed in the camera application of the user terminal; it loads quickly, has a small memory footprint and high compatibility, and can be used for hair rendering, 3D animated expression rendering, and the like.
- S103: Play target virtual audio, where the target virtual audio includes one or both of the first virtual audio and the second virtual audio, the first virtual audio is associated with the first voice data and the first virtual object information, and the second virtual audio is associated with the second voice data and the second virtual object information.
- Here, the first voice data is the voice data of the first user, which the first terminal can convert into the first virtual audio; the second voice data is the voice data of the second user, and the first terminal may receive the second virtual audio converted by the second terminal based on the second voice data.
- In some embodiments, when the first user speaks, the target virtual audio includes the first virtual audio, and the first terminal plays the target virtual audio in synchronization with the first virtual video picture. When the second user speaks, the target virtual audio includes the second virtual audio, and the first terminal plays the target virtual audio in synchronization with the second virtual video picture. When the first user and the second user speak at the same time, the target virtual audio includes both the first virtual audio and the second virtual audio, and the first terminal plays the target virtual audio in synchronization with both the first virtual video picture and the second virtual video picture.
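- One way to read this case analysis is as a small selector that decides which virtual audio streams make up the target virtual audio for the current frame and pairs each with the virtual video picture it must stay synchronized with; the names below are invented for illustration:

```python
def compose_target_virtual_audio(first_speaking: bool, second_speaking: bool,
                                 first_virtual_audio, second_virtual_audio):
    """Return (virtual_audio, synced_picture_tag) pairs to play this frame."""
    out = []
    if first_speaking:   # play first virtual audio with the first picture
        out.append((first_virtual_audio, "first_virtual_video_picture"))
    if second_speaking:  # play second virtual audio with the second picture
        out.append((second_virtual_audio, "second_virtual_video_picture"))
    return out           # both entries when both users speak simultaneously
```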
- In some embodiments, the first terminal can collect the voice data of the first user to obtain the first voice data, and convert the first voice data into communication virtual audio associated with the first virtual object information selected by the first user, obtaining the first virtual audio. The first terminal can also collect the user key part data of the first user and send the first virtual audio and the user key part of the first user to the second terminal, so that the second terminal can generate the first virtual video picture according to the user key part and output the first virtual audio and the virtual video picture synchronously.
- similarly, the second terminal can collect the second user's voice data to obtain the second voice data, and convert the second voice data into the communication virtual audio associated with the second virtual object information selected by the second user, obtaining the second virtual audio; meanwhile, the second terminal can also collect the second user's user key part data and send the second virtual audio and the second user's user key part to the first terminal, so that the first terminal can generate the second virtual video picture according to the second user's user key part and output the second virtual audio synchronously with that virtual video picture.
- the following describes the process of converting a user's voice data into virtual audio, taking as an example the first terminal converting the first user's voice data into the first virtual audio associated with the first virtual object information.
- the process in which the first terminal converts the first voice data into the first virtual audio associated with the first virtual object information includes: the first terminal performs voice preprocessing on the voice data to obtain transition voice data, and inputs the transition voice data into an audio processing model, through which the audio features of the transition voice data can be extracted; the audio processing model here is the audio processing model included in the first virtual object information, and it contains the timbre features of the first virtual object model (the virtual object model included in the first virtual object information); for example, if the first virtual object model is a model of cartoon character A, the audio processing model contains the timbre features of the cartoon character A model.
- the first terminal can acquire the timbre features contained in the audio processing model, the timbre features being associated with the first virtual object model; subsequently, the first terminal can fuse the audio features with the timbre features to obtain fused audio features, and generate the first virtual audio according to the fused audio features.
- the audio processing model may refer to a model with voice-changing capability, for example a real-time AI voice-changing model.
- the real-time AI voice-changing model can extract the content information in the user's voice data (for example, the rhythm feature data and emotional feature data of the voice data), then combine it with the timbre features of the virtual object and perform the conversion, so that the virtual object's voice can be reproduced.
- through the real-time AI voice-changing model, not only can a realistic timbre conversion between the user's voice and the virtual object's voice be achieved, but characteristics such as the user's speech rate, pauses, emotion and manner of speaking can also be vividly conveyed through the virtual object.
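To make the fusion step concrete, here is a toy numerical sketch of "content features + timbre embedding -> fused features -> audio frames". The random matrices merely stand in for the trained audio processing model and vocoder; none of the names below come from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_content_features(frames):
    # Stand-in for the model's content extractor (e.g. per-frame phonetic
    # features); a random projection takes the place of a trained network.
    return frames @ rng.standard_normal((frames.shape[1], 64))

def fuse_and_generate(content, timbre_embedding):
    # Fuse per-frame content features with the virtual object's timbre
    # embedding, then "decode" the fused features into audio frames.
    fused = np.concatenate(
        [content, np.tile(timbre_embedding, (content.shape[0], 1))], axis=1)
    decoder = rng.standard_normal((fused.shape[1], 160))  # placeholder vocoder
    return fused @ decoder  # (num_frames, samples_per_frame)

speech_frames = rng.standard_normal((100, 80))   # preprocessed transition data
timbre = rng.standard_normal(16)                 # timbre of the selected model
virtual_audio = fuse_and_generate(extract_content_features(speech_frames), timbre)
print(virtual_audio.shape)                       # (100, 160)
```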
- FIG. 6 is a schematic diagram of a module for converting voice data into virtual audio provided by an embodiment of the present application.
- through the feature extraction module 6-1, the audio features of the user voice data 6-2 and the timbre features of the virtual object model 6-3 can be extracted; for example, the probability output unit 6-11 (an SI-DNN unit) in the feature extraction module 6-1 can output the phoneme posterior probabilities of the voice data, which serve as the audio features, while the prosody extraction unit 6-12 (a Pitch Extractor unit) in the feature extraction module 6-1 can be used to extract the prosodic feature (F0x value) corresponding to the user voice data 6-2 and the prosodic feature (F0y value) corresponding to the virtual object model 6-3.
- the features extracted by the feature extraction module 6-1 can then be input into the frame selection module 6-4, through which the target spectrum and the F0 path can be determined (for example, by a Viterbi search based on minimum dynamic difference).
- finally, the feature encoding module 6-5 can output the final virtual audio: the determined spectral trajectory and F0 trajectory can be sent to the neural speech encoder 6-52 based on the "LPC" network 6-51, and the neural speech encoder 6-52 can generate the virtual audio.
- it should be noted that the above is an exemplary description in which the audio processing model is a real-time AI voice-changing model and the voice data is converted into virtual audio through it; the audio processing model may also be any other model with voice conversion capability, which is not limited in this embodiment of the present application.
- the first terminal performing voice preprocessing on the first voice data to obtain the transition voice data may include: the first terminal determines the echo audio data in the first voice data and deletes the echo audio data from the first voice data to obtain echo transition voice data; then, the first terminal determines the noise audio data in the echo transition voice data and performs suppression processing on the noise audio data in the echo transition voice data to obtain noise transition voice data; then, the first terminal determines the mute audio data in the noise transition voice data and deletes the mute audio data from the noise transition voice data to obtain the transition voice data.
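A minimal sketch of this three-stage chain (echo deletion -> noise suppression -> silence deletion), assuming a far-end reference signal is available for the echo step; the thresholds, gains and function names are illustrative only.

```python
import numpy as np

def cancel_echo(near, far_reference, echo_gain=0.3, delay=40):
    """Delete a (simplistically modelled) echo of the far-end signal."""
    echo = np.zeros_like(near)
    echo[delay:] = echo_gain * far_reference[:-delay]
    return near - echo                      # echo transition voice data

def suppress_noise(samples, noise_floor=0.02):
    """Attenuate samples below an estimated noise floor."""
    return np.where(np.abs(samples) < noise_floor, samples * 0.1, samples)

def remove_silence(samples, frame=160, threshold=1e-3):
    """Drop frames whose mean energy falls below the silence threshold."""
    frames = samples[: len(samples) // frame * frame].reshape(-1, frame)
    keep = (frames ** 2).mean(axis=1) >= threshold
    return frames[keep].reshape(-1)         # transition voice data

def preprocess_voice(near, far_reference):
    return remove_silence(suppress_noise(cancel_echo(near, far_reference)))

t = np.linspace(0.0, 1.0, 1600)
far = np.sin(2 * np.pi * 5 * t)                             # far-end reference
near = np.sin(2 * np.pi * 3 * t) + 0.3 * np.roll(far, 40)   # speech + echo
print(preprocess_voice(near, far).shape)
```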
- the first terminal may acquire the second user's user key part sent by the second terminal and generate the second virtual video picture according to the second user's user key part.
- the first terminal may also obtain the second virtual audio sent by the second terminal.
- the first terminal may then output the second virtual audio synchronized with the second virtual video picture in the first video communication interface.
- the first terminal can obtain the second picture timestamp corresponding to the second virtual video picture and the second voice timestamp corresponding to the second virtual audio; then, among the second voice timestamps, it obtains a second target voice timestamp that has a time matching relationship with the second picture timestamp; in the second virtual audio, the second virtual audio to be output corresponding to that target voice timestamp can be obtained, and the second virtual video picture and the second virtual audio to be output are output synchronously.
- similarly, when the first terminal plays the first virtual audio and the first virtual video picture synchronously, the process includes: the first terminal can acquire the first picture timestamp corresponding to the first virtual video picture and the first voice timestamp corresponding to the first virtual audio; among the first voice timestamps, a first target voice timestamp that has a time matching relationship with the first picture timestamp can be obtained; in the first virtual audio, the first virtual audio to be output corresponding to that target voice timestamp can be obtained, and the first virtual video picture and the first virtual audio to be output are output synchronously.
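For illustration, the timestamp matching described above can be sketched as a nearest-neighbour search over the voice timestamps, with a tolerance window deciding whether a "time matching relationship" exists. The 40 ms tolerance and the function name are assumptions for the sketch, not values from the patent.

```python
import bisect

def find_target_timestamp(picture_ts, voice_ts, tolerance=40):
    """Return the voice timestamp (ms) with a time matching relationship to
    the picture timestamp, or None if nothing lies within the tolerance."""
    i = bisect.bisect_left(voice_ts, picture_ts)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(voice_ts)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(voice_ts[j] - picture_ts))
    return voice_ts[best] if abs(voice_ts[best] - picture_ts) <= tolerance else None

voice_timestamps = [0, 20, 40, 60, 80]               # audio chunk timestamps (ms)
print(find_target_timestamp(33, voice_timestamps))   # -> 40
print(find_target_timestamp(500, voice_timestamps))  # -> None
```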
- in this way, the embodiment of the present application enriches the user's image in the video communication application, enables users to conduct virtual video communication, and allows the operation and display of the video communication to be maintained normally while virtual data is displayed. Moreover, in video virtual communication, the user's voice data is converted into virtual audio for output, which can improve video communication quality. That is to say, the embodiments of the present application can enrich the video display modes and the interest of video communication, maintain the normal operation and display of video communication while displaying virtual data, and improve the quality of video communication.
- FIG. 7 is a schematic flowchart of an interaction in video virtual communication provided by an embodiment of the present application. The process shown in FIG. 7 is described by taking the terminal as the first terminal and the user as the first user as an example.
- in this embodiment of the present application, the virtual object model associated with the first virtual object information displayed in the first virtual video picture is a first partial display object, and the virtual object model associated with the second virtual object information displayed in the second virtual video picture is a second partial display object.
- the interactive actions may include dancing actions, head-touching actions, hugging actions, and so on, which are not enumerated one by one here.
- the first video communication interface includes an interactive control. When the first user triggers the interactive control through an operation such as clicking, the first terminal receives the interactive operation; in response to the interactive operation, the first terminal displays a virtual object interactive action list, so that an interactive action can be selected from the list for interacting with the second user. That is to say, in response to the first user's operation of triggering the interactive control, the first terminal may display the virtual object interactive action list in the first video communication interface for the first user to select from.
- further, the first terminal may respond to the first user's action selection operation by switching, in the first video communication interface, the first partial display object corresponding to the first user to a first overall display object.
- the partial display object here may refer to an object containing only a partial area of the virtual object.
- for example, when the virtual object is a virtual character, the partial display object may refer to an object in which only the head area, neck area, body area or foot area of the virtual object is displayed, such as the virtual object model presented in the first virtual video picture in the figure described above.
- the first overall display object may refer to an object containing the entire area of the virtual object; for example, the first overall display object can be understood as containing the head area, neck area, body area and foot area.
- further, based on the interactive action selected by the first user (referred to as the target interactive action), the first terminal generates a first virtual video picture in which the first overall display object performs that interactive action. For example, if the interactive action selected by the user is a dancing action, the first overall display object presented in the first virtual video picture will perform the dancing action, and the second overall display object presented in the second virtual video picture will also perform the dancing action; both the first user and the second user can view the virtual video pictures in which the first overall display object and the second overall display object are dancing.
- each interactive action corresponds to an action execution duration. When the time corresponding to the action execution duration is reached, the first terminal restores the virtual object model associated with the first virtual object information presented in the first virtual video picture from the first overall display object to the first partial display object, and restores the virtual object model associated with the second virtual object information presented in the second virtual video picture from the second overall display object to the second partial display object.
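A minimal sketch of this switch-perform-restore cycle: the view switches to the overall (full-body) display for the duration of the action, and a timer restores the partial display afterwards. The class and its fields are hypothetical stand-ins, not an API from the patent.

```python
import threading

class VirtualObjectView:
    def __init__(self):
        self.mode = "partial"   # partial: head/shoulders; overall: full body

    def play_action(self, action, duration_s):
        """Switch to the overall display, perform the action, and restore the
        partial display when the action execution duration elapses."""
        self.mode = "overall"
        print(f"performing {action} in {self.mode} mode")
        timer = threading.Timer(duration_s, self._restore)
        timer.start()
        return timer

    def _restore(self):
        self.mode = "partial"
        print("restored partial display")

view = VirtualObjectView()
view.play_action("dance", duration_s=3.0).join()  # wait for restore in this demo
```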
- based on the obtained target interactive action, the first terminal may generate the first virtual video picture in which the first overall display object performs the interactive action, or generate the second virtual video picture in which the second overall display object performs the interactive action.
- FIG. 8 is a schematic diagram of a scenario of interaction in video virtual communication provided by an embodiment of the present application.
- the user terminal A may be any user terminal in the user terminal cluster in FIG. 1 , for example, the user terminal A is the user terminal 100b.
- as shown in the first video communication interface 8-1 of user terminal A, user A and user B are conducting virtual video communication, and the virtual object model presented in the first video communication interface 8-1 is the first partial display object 8-14.
- the first video communication interface 8-1 includes an interactive control 8-11. After user A clicks the interactive control 8-11, user terminal A can display the interactive action list 8-12.
- as shown in FIG. 8, the interactive action list 8-12 includes the dancing interactive action, the head-touching interactive action, the hugging interactive action and the foot-stomping interactive action, and the interactive action clicked by user A is the dancing action.
- subsequently, user terminal A may display, in the first video communication interface 8-1, the first overall display object 8-13 of the virtual object corresponding to user A, and present the first virtual video picture in which the first overall display object 8-13 performs the dancing action; when the action execution duration is reached, the first overall display object 8-13 reverts to the first partial display object 8-14.
- it should be noted that, when the first terminal implements video virtual communication, the first video communication interface 8-1 also displays a control for closing the virtual communication, for example, a button for closing the virtual communication.
- in order to increase the richness and interest of video virtual communication, the first terminal may also switch the background of the first video communication interface while the user conducts video virtual communication.
- the following describes the process of background switching in video virtual communication, taking the first terminal switching the background of the first video communication interface as an example.
- the first video communication interface includes a material switching control.
- when the first user clicks the material switching control for the background, the first terminal receives the material switching operation triggering the material switching control; the first terminal may display the configuration material list in response to the material switching operation. Subsequently, the first user may select any configuration material as the target material, and the first terminal may, in response to the first user's material selection operation on the configuration material list, switch the material of the video communication interface to the target material; the target material is the material selected by the material selection operation, and the target material includes one or both of static material (e.g., a static background image) and dynamic material (e.g., a dynamic background image).
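The material switch itself can be pictured as replacing a background entry keyed by the user's selection, as in the small sketch below; the material table and asset file names are invented for illustration.

```python
MATERIALS = {
    "street": {"type": "static", "asset": "street.png"},
    "flower": {"type": "static", "asset": "flower.png"},
    "planet": {"type": "dynamic", "asset": "planet.webm"},
    "park": {"type": "static", "asset": "park.png"},
}

class FirstVideoCommunicationInterface:
    def __init__(self):
        self.background = MATERIALS["street"]  # default configuration material

    def switch_material(self, name):
        """Switch the background to the target material selected by the user."""
        if name not in MATERIALS:
            raise KeyError(f"unknown configuration material: {name}")
        self.background = MATERIALS[name]

ui = FirstVideoCommunicationInterface()
ui.switch_material("planet")
print(ui.background)  # {'type': 'dynamic', 'asset': 'planet.webm'}
```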
- FIG. 9a is a schematic diagram of a scene of performing background material switching in video virtual communication according to an embodiment of the present application.
- the user terminal A may be any user terminal in the user terminal cluster in Fig. 1, for example, the user terminal A is the user terminal 100b.
- as shown in the first video communication interface 9-1 of user terminal A, user A and user B are conducting virtual video communication.
- the first video communication interface 9-1 includes a material switching control 9-11. After user A clicks the material switching control 9-11, user terminal A can display the configuration material list 9-12.
- as shown in FIG. 9a, the configuration material list 9-12 includes street material, flower material, planet material and park material, and the configuration material clicked by user A is the planet material. Subsequently, user terminal A switches the background material of the first video communication interface 9-1 in user terminal A to the planet material.
- the first video communication interface 9-1 in user terminal A includes a closing control 9-13 for video virtual communication. When user A clicks the closing control 9-13, user terminal A can, in response to user A's closing operation, output time prompt information 9-14 in the first video communication interface 9-1 to remind user A that the video virtual communication mode will be closed after a waiting period (for example, 3 s), and the video virtual communication mode is turned off when the time corresponding to the waiting period is reached.
- in this embodiment of the present application, an interaction process between users is added to video virtual communication (for example, interactive actions can be selected for interaction, and backgrounds can be switched), thereby increasing the fun and interactivity of video communication and improving the quality of video communication.
- FIG. 9b is an interactive schematic diagram of a video communication method provided by an embodiment of the present application; as shown in FIG. 9b, the video communication method includes S301 to S310, which describe the process in which the first terminal and the second terminal interact with each other to implement video virtual communication.
- S301: the first terminal sends an opening request to the second terminal in response to the opening operation on the video virtual communication control.
- S302: the second terminal responds to the opening request and sends confirmation information to the first terminal through the service server.
- S303: the first terminal obtains the first virtual object information in response to the first virtual object selection operation on the first video communication interface.
- S304: the second terminal obtains the second virtual object information in response to the second virtual object selection operation on the second video communication interface.
- S305: the first terminal converts the first voice data into the first virtual audio associated with the first virtual object information, and sends the first virtual audio, the first user key part and the first virtual object information to the second terminal.
- S306: the second terminal converts the second voice data into the second virtual audio associated with the second virtual object information, and sends the second virtual audio, the second user key part and the second virtual object information to the first terminal.
- S307: the first terminal generates the second virtual video picture based on the second user key part and the second virtual object information, and generates the first virtual video picture based on the first user key part and the first virtual object information.
- S308: the second terminal generates the first virtual video picture based on the first user key part and the first virtual object information, and generates the second virtual video picture based on the second user key part and the second virtual object information.
- S309: the first terminal synchronously plays the first virtual video picture with the first virtual audio, and synchronously plays the second virtual video picture with the second virtual audio.
- S310: the second terminal synchronously plays the first virtual video picture with the first virtual audio, and synchronously plays the second virtual video picture with the second virtual audio.
- it should be noted that the above description of the execution subject that obtains the first virtual video picture and the second virtual video picture is exemplary; the execution subject may also be a server, which is not limited in this embodiment of the present application.
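The S301-S310 exchange can be pictured as two symmetric pipelines meeting in the middle. The following is a minimal, self-contained sketch of that exchange; the Terminal class and all of its method names are hypothetical stand-ins for the steps the patent describes, not an actual API.

```python
class Terminal:
    def __init__(self, name):
        self.name = name

    def request_open(self):                      # S301
        return f"{self.name}: open video virtual communication?"

    def confirm(self, request):                  # S302 (via the service server)
        print(f"{self.name} confirms: {request}")

    def select_virtual_object(self):             # S303 / S304
        return {"model": f"{self.name}-model", "timbre": f"{self.name}-timbre"}

    def capture_and_convert(self, info):         # S305 / S306
        # Collect voice + key parts, convert the voice to the model's timbre.
        return {"virtual_audio": f"audio({info['timbre']})",
                "key_parts": f"landmarks({self.name})",
                "object_info": info}

    def render(self, own_pkt, peer_pkt):         # S307-S310
        for pkt in (own_pkt, peer_pkt):
            print(f"{self.name} renders {pkt['object_info']['model']} "
                  f"with {pkt['key_parts']} in sync with {pkt['virtual_audio']}")

first, second = Terminal("first"), Terminal("second")
second.confirm(first.request_open())                            # S301-S302
a = first.capture_and_convert(first.select_virtual_object())    # S303, S305
b = second.capture_and_convert(second.select_virtual_object())  # S304, S306
first.render(a, b)                                              # S307, S309
second.render(b, a)                                             # S308, S310
```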
- FIG. 10 is a system architecture diagram provided by an embodiment of the present application.
- the system architecture diagram may include a sending terminal (corresponding to the first terminal) and a receiving terminal (corresponding to the second terminal).
- when video virtual communication is conducted, the system switches to the audio processing flow and the video processing flow shown in FIG. 10.
- in the sending terminal 10-1, the audio collection module 10-11 is used to collect user voice data.
- the audio preprocessing module 10-12 is used to perform audio preprocessing on the voice data; for example, it can perform Acoustic Echo Cancellation (AEC) processing, noise suppression (ANS) processing, automatic gain control (AGC) processing, and silence detection processing.
- the voice conversion module 10-13 is used to convert the timbre of the preprocessed audio.
- the audio encoding module 10-14 is used to encode the converted audio to obtain an encoded file.
- the audio encapsulation module 10-15 is configured to encapsulate the encoded file obtained by the audio encoding module 10-14 to obtain an audio data stream.
- the video collection module 10-16 is used to collect video data containing the user.
- the video preprocessing module 10-17 is used to preprocess the collected video data; for example, the video can be transcoded and its size adjusted.
- the key part extraction module 10-18 is used to extract the user's key part data from the video data, and can also track the user's expression in the video data.
- the key part data packaging module 10-19 is used to package and encapsulate the key part data extracted by the key part extraction module 10-18 to obtain a key-part-related data stream.
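As a rough, purely illustrative sketch of how modules 10-11 through 10-19 chain together on the sending side, the following Python stands in each module with a placeholder function; none of these names or data formats come from the patent.

```python
def preprocess_audio(samples):
    """Audio preprocessing module 10-12: AEC, ANS, AGC and silence detection,
    represented here by a no-op placeholder."""
    return samples

def convert_timbre(samples, timbre):
    """Voice conversion module 10-13: placeholder timbre conversion."""
    return [s * timbre for s in samples]

def encode(samples):
    """Audio encoding module 10-14 (placeholder)."""
    return bytes(int(abs(s)) % 256 for s in samples)

def packetize(payload):
    """Encapsulation / packaging modules 10-15 and 10-19 (placeholder)."""
    return {"len": len(payload), "payload": payload}

def extract_key_parts(frames):
    """Key part extraction module 10-18: placeholder landmark tracking."""
    return [{"frame": i, "landmarks": f"pts<{i}>"} for i, _ in enumerate(frames)]

def sender_pipeline(voice, frames, timbre=0.8):
    audio_stream = packetize(encode(convert_timbre(preprocess_audio(voice), timbre)))
    key_stream = packetize(extract_key_parts(frames))
    return audio_stream, key_stream

print(sender_pipeline([1.0, -2.0, 3.0], ["f0", "f1"]))
```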
- in the receiving terminal 10-2, the network packet receiving module 10-21 is used to receive, through the public network 10-3, the audio data stream and the key-part-related data stream sent by the sending terminal 10-1.
- the network decapsulation module 10-22 is used to decapsulate the audio data stream.
- the data unpacking module 10-23 is used to unpack the data stream corresponding to the key part.
- the audio decoding module 10-24 is used for decoding the decapsulated audio file.
- the audio rendering module 10-25 is used for rendering the data obtained after decoding.
- the picture rendering module 10-26 includes a 3D engine rendering unit 10-261, which is used for 3D rendering of the decoded key part data, and a video rendering unit 10-262, which is used to render the virtual picture of the virtual communication.
- the synchronization module 10-27 is used for synchronizing the rendered audio with the picture.
- the video collection module 10-28 is configured to collect video data of the user corresponding to the receiving terminal 10-2.
- the video preprocessing module 10-29 is used to preprocess the video data of the receiving terminal 10-2's user, for example by performing video transcoding and size adjustment. Then, key part data can be obtained from the preprocessed video data for local echo display, and the key part data can be input to the picture rendering module 10-26 for picture rendering.
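On the receiving side (modules 10-21 through 10-27), the two incoming streams are unpacked, decoded, rendered, and then aligned by the synchronization module. Below is a minimal sketch, again with placeholder functions and literal sample packets standing in for the real modules and media streams.

```python
def depacketize(packet):
    """Network decapsulation / data unpacking modules 10-22 and 10-23."""
    return packet["payload"]

def decode_audio(payload):
    """Audio decoding module 10-24 (placeholder)."""
    return list(payload)

def render_picture(key_parts):
    """Picture rendering module 10-26 (placeholder 3D rendering)."""
    return [f"rendered<{kp}>" for kp in key_parts]

def synchronize(audio, frames):
    """Synchronization module 10-27: pair audio chunks with frames by index;
    a real implementation matches timestamps as described earlier."""
    return list(zip(frames, audio))

audio_packet = {"len": 3, "payload": b"\x01\x02\x03"}   # from the sender
key_packet = {"len": 2, "payload": ["lips@t0", "lips@t1"]}

audio = decode_audio(depacketize(audio_packet))
frames = render_picture(depacketize(key_packet))
print(synchronize(audio, frames))
```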
- through the above system architecture, a virtual video communication experience can be brought to users, so that both parties can transform from their real character images into virtual images; increasing the interest of video virtual communication from the two dimensions of sound and picture can greatly improve the user's experience in video communication.
- FIG. 11 is a schematic structural diagram of a video communication apparatus provided by an embodiment of the present application.
- the video communication apparatus may be a computer program (including program code) running in a computer device, for example, the video communication apparatus is an application software; the video communication apparatus may be used to execute the method shown in FIG. 4 .
- the video communication apparatus 11-1 may include: a virtual object acquisition module 11-11, a picture output module 11-12, and an audio output module 11-13.
- the virtual object acquisition module 11-11 is configured to obtain first virtual object information in response to a first virtual object selection operation on the first video communication interface, the first video communication interface being the interface through which the electronic device conducts video communication with the electronic device's peer.
- the picture output module 11-12 is configured to display a first virtual video picture and a second virtual video picture in the first video communication interface, where the first virtual video picture is associated with the first virtual object information and the first user key part, the second virtual video picture is associated with the second virtual object information and the second user key part, and the second virtual object information is obtained by the peer in response to a second virtual object selection operation on the second video communication interface.
- the audio output module 11-13 is configured to play target virtual audio, the target virtual audio including one or both of the first virtual audio and the second virtual audio, where the first virtual audio is associated with the first voice data and the first virtual object information, and the second virtual audio is associated with the second voice data and the second virtual object information.
- the first video communication interface includes a video virtual communication control; in this case, the virtual object acquisition module 11-11 may include: an object display unit 11-111 and an object information acquisition unit 11-112.
- the object display unit 11-111 is configured to display at least one virtual object in response to a triggering operation of the first user on the video virtual communication control.
- the object information acquisition unit 11-112 is configured to acquire the selected first virtual object information in response to a virtual object selection operation on the at least one virtual object, where the at least one virtual object includes the virtual object corresponding to the first virtual object information.
- the video communication device 11-1 may further include: an invitation prompt output module 11-14, an opening request sending module 11-15, and a time prompt output module 11-16.
- the invitation prompt output module 11-14 is configured to display invitation prompt information in the first video communication interface in response to the opening operation on the video virtual communication control, the invitation prompt information being opening invitation information for the video virtual communication control, where the opening invitation information is a prompt message by which the electronic device requests the peer to open the video virtual communication control.
- the opening request sending module 11-15 is configured to send an opening request for the video virtual communication control to the peer in response to the confirmation operation on the invitation prompt information.
- the time prompt output module 11-16 is configured to receive the confirmation information returned by the peer for the opening request and display time prompt information, where the time prompt information indicates the waiting duration before entering video virtual communication.
- the picture output module 11-12 is further configured to display the first virtual video picture and the second virtual video picture in the first video communication interface when the time corresponding to the waiting duration is reached.
- the audio output module 11-13 may include: a timestamp obtaining unit 11-131, a timestamp matching unit 11-132, and a data synchronization output unit 11-133.
- the timestamp acquisition unit 11-131 is configured to acquire the first picture timestamp corresponding to the first virtual video picture and the first voice timestamp corresponding to the first virtual audio, and to acquire the second picture timestamp corresponding to the second virtual video picture and the second voice timestamp corresponding to the second virtual audio.
- the timestamp matching unit 11-132 is configured to obtain, among the first voice timestamps, a first target voice timestamp that has a time matching relationship with the first picture timestamp, and to obtain, among the second voice timestamps, a second target voice timestamp that has a time matching relationship with the second picture timestamp.
- the data synchronization output unit 11-133 is configured to obtain, in the first virtual audio, the first virtual audio to be played corresponding to the first target voice timestamp, to obtain, in the second virtual audio, the second virtual audio to be played corresponding to the second target voice timestamp, and to play the first virtual audio to be played and the second virtual audio to be played.
- in this embodiment of the present application, the virtual object model associated with the first virtual object information displayed in the first virtual video picture is the first partial display object, and the virtual object model associated with the second virtual object information displayed in the second virtual video picture is the second partial display object.
- the video communication device 11-1 may further include: an interactive action presentation module 11-18, an object presentation switching module 11-19, and an interactive action execution module 11-20.
- the interactive action display modules 11-18 are configured to display a virtual object interactive action list in response to the interactive operation on the first video communication interface.
- the object display switching module 11-19 is configured to, in response to an action selection operation on the virtual object interactive action list, switch the first partial display object to the first overall display object and switch the second partial display object to the second overall display object.
- the interactive action execution module 11-20 is configured to display a picture of the first overall display object executing the target interactive action and a picture of the second overall display object executing the target interactive action, the target interactive action being the interactive action selected by the action selection operation.
- the video communication apparatus 11-1 may further include: an execution time acquisition module 11-21 and an object presentation restoration module 11-22.
- the execution time obtaining module 11-21 is configured to obtain the action execution time of the target interactive action.
- the object display restoration module 11-22 is configured to, when the time corresponding to the action execution duration is reached, restore the first overall display object to the first partial display object and restore the second overall display object to the second partial display object.
- the video communication device 11-1 may further include: a material presentation module 11-23 and a material switching module 11-24.
- the material display module 11-23 is configured to display a list of configuration materials in response to a material switching operation for the first video communication interface.
- the material switching module 11-24 is configured to, in response to a material selection operation on the configuration material list, switch the material of the first video communication interface to a target material, the target material being the material selected by the material selection operation and including one or both of static material and dynamic material.
- in this embodiment of the present application, the first virtual object information includes a first virtual object model, and the second virtual object information includes a second virtual object model; the picture output module 11-12 may include: a key part acquisition unit 11-121 and a picture output unit 11-122.
- the key part acquisition unit 11-121 is configured to acquire the first virtual key part in the first virtual object model and to acquire the first user key part, the first virtual key part and the first user key part belonging to the same part type.
- the picture output units 11-122 are configured to display the first virtual video picture in the first video communication interface according to the first virtual key part and the first user key part.
- the key part acquisition unit 11-121 is further configured to acquire the second virtual key part in the second virtual object model and to acquire the second user key part, the second virtual key part and the second user key part belonging to the same part type.
- the picture output units 11-122 are further configured to display the second virtual video picture in the first video communication interface according to the second virtual key part and the second user key part.
- the picture output unit 11-122 may include: an action state acquisition subunit 11-1221, a model coordinate acquisition subunit 11-1222, a model coordinate determination subunit 11-1223, an action state adjustment subunit 11-1224, and a first output subunit 11-1225.
- the action state obtaining subunit 11-1221 is configured to obtain the action state of the part corresponding to the key part of the first user.
- the model coordinate acquisition subunit 11-1222 and the model coordinate determination subunit 11-1223 are configured to determine, based on the first model position coordinates of the first virtual key part in the first critical state and the second model position coordinates of the first virtual key part in the second critical state, the target model position coordinates of the first virtual key part in the part action state.
- the action state adjustment subunit 11-1224 is configured to adjust the first virtual key part to the target model position coordinates in the first virtual object model to obtain a first target virtual object model, where the first virtual key part in the first target virtual object model is used to cover the first user key part.
- the first output subunit 11-1225 is configured to display the first virtual video picture containing the first target virtual object model in the first video communication interface.
- for the implementation of the action state acquisition subunit 11-1221 and the other subunits above, please refer to the description corresponding to S102 in FIG. 4.
- the picture output unit 11-122 may include: a two-dimensional coordinate acquisition subunit 11-1226, a three-dimensional coordinate determination subunit 11-1227, and a target state determination subunit 11-1228.
- the two-dimensional coordinate acquisition subunit 11-1226 and the three-dimensional coordinate determination subunit 11-1227 are configured to acquire the position mapping relationship between the first user key part and the first virtual key part.
- the target state determination subunit 11-1228 is configured to, in the first virtual object model, adjust the state of the first virtual key part to the part action state of the first user key part according to the position mapping relationship, to obtain a second target virtual object model, and to output, in the first video communication interface, the first virtual video picture containing the second target virtual object model.
- the video communication device 11-1 may further include: a voice acquisition module 11-25, a voice conversion module 11-26, and a data transmission module 11-27.
- the voice acquisition module 11-25 is configured to acquire the first voice data and the key parts of the first user.
- the voice conversion module 11-26 is configured to convert the first voice data into the first virtual audio associated with the first virtual object information.
- the data sending module 11-27 is configured to send the first virtual audio and the first user key part to the peer, so that the peer displays, in the second video communication interface, the first virtual video picture associated with the first user key part and plays the first virtual audio.
- in this embodiment of the present application, the first virtual object information includes a first virtual object model and an audio processing model; the voice conversion module 11-26 may include: a preprocessing unit 11-251, a feature extraction unit 11-252, a timbre feature acquisition unit 11-253, and a feature fusion unit 11-254.
- the preprocessing unit 11-251 is configured to perform voice preprocessing on the first voice data to obtain transition voice data.
- the feature extraction unit 11-252 is configured to extract audio features of the transition speech data through the audio processing model.
- the timbre feature acquisition unit 11-253 is configured to acquire timbre features associated with the first virtual object model in the audio processing model.
- the feature fusion unit 11-254 is configured to fuse the audio feature and the timbre feature to obtain a fused audio feature; and generate the first virtual audio according to the fused audio feature.
- the preprocessing unit 11-251 may include: an echo processing subunit 11-2511, a noise suppression subunit 11-2512, and a silence detection subunit 11-2513.
- the echo processing subunit 11-2511 is configured to delete echo audio data from the first voice data to obtain echo transition voice data.
- the noise suppression subunit 11-2512 is configured to perform suppression processing on the noise audio data in the echo transition speech data to obtain noise transition speech data.
- the silence detection subunit 11-2513 is configured to delete mute audio data from the noise transition voice data to obtain the transition voice data.
- in this embodiment of the present application, a virtual object is added to the video communication application, and a user can select any virtual object so as to switch from the real image to the image of the selected virtual object; the images shown between users can thus be virtual objects.
- in the application, a user can convert his or her own image into a virtual image for communication and view the virtual image without interrupting the video communication; moreover, during video communication, the audio of the other party that is heard is virtual audio rather than the user's original voice data, which can improve the quality of video communication.
- it can be seen that the present application enriches the user's image in video communication applications, enables virtual video communication between users, and allows the operation and display of video communication to be maintained normally while virtual data is displayed; in addition, the user's voice data is converted into virtual audio for output, which can improve the quality of video communication. That is to say, the present application can enrich the video display modes and interest of video communication, maintain the normal operation and display of video communication while displaying virtual data, and improve the quality of video communication.
- FIG. 12 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- the electronic device for video communication may be implemented as a computer device.
- the video communication apparatus 11-1 in FIG. 11 may be applied to a computer device 12-00, and the computer device 12-00 may include: a processor 12-01, a network interface 12-04 and a memory 12-05; in addition, the computer device 12-00 also includes: a user interface 12-03 and at least one communication bus 12-02, where the communication bus 12-02 is used to realize connection and communication between these components.
- the user interface 12-03 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 12-03 may also include a standard wired interface and a wireless interface.
- the network interface 12-04 may include a standard wired interface and a wireless interface (eg, a WI-FI interface).
- the memory 12-05 may be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory.
- the memory 12-05 may optionally also be at least one storage device located remote from the aforementioned processor 12-01. As shown in FIG. 12, the memory 12-05 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
- in the computer device 12-00 shown in FIG. 12, the network interface 12-04 can provide network communication functions, the user interface 12-03 is mainly used to provide an input interface for the user, and the processor 12-01 can be used to call the device control application program stored in the memory 12-05, so as to implement the video communication method provided by the embodiments of the present application.
- it should be understood that the computer device 12-00 described in the embodiments of the present application may perform the video communication method described in FIG. 4, and may also perform the functions described for the video communication apparatus 11-1 in FIG. 11.
- embodiments of the present application further provide a computer-readable storage medium which stores the computer program executed by the computer device 12-00; the computer program includes program instructions, and when the processor executes the program instructions, the video communication method corresponding to FIG. 4 can be performed.
- for the computer-readable storage medium embodiments involved in the present application, please refer to the description of the method embodiments of the present application.
- the above computer-readable storage medium may be an internal storage unit of the video communication apparatus provided in the embodiments of the present application or of the above computer device, such as a hard disk or memory of the computer device.
- the computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
- the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
- the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
- the computer-readable storage medium can also be used to temporarily store data that has been or will be output.
- Embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the video communication method provided by the embodiment of the present application.
- each flow and/or block in the method flowcharts and/or structural schematic diagrams, and combinations of flows and/or blocks in the flowcharts and/or schematic diagrams, can be implemented by computer program instructions.
- these computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematic diagrams.
- these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematic diagrams.
- these computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematic diagrams.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Networks & Wireless Communication (AREA)
- Quality & Reliability (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present application provides a video communication method, apparatus, electronic device, computer-readable storage medium and computer program product, and belongs to the field of computer technology. The method includes: obtaining first virtual object information in response to a first virtual object selection operation on a first video communication interface; displaying, in the first video communication interface, a first virtual video picture and a second virtual video picture, where the first virtual video picture is associated with the first virtual object information and a first user key part, and the second virtual video picture is associated with second virtual object information and a second user key part; and playing target virtual audio, where the target virtual audio includes one or both of first virtual audio and second virtual audio, the first virtual audio is associated with first voice data and the first virtual object information, and the second virtual audio is associated with second voice data and the second virtual object information.
Description
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 202011156220.5, filed on October 26, 2020, the entire content of which is incorporated herein by reference.
The present application relates to the field of computer technology, and relates to a video communication method, apparatus, electronic device, computer-readable storage medium and computer program product.
With the continuous development of mobile communication technology, smart terminals such as mobile phones and tablet computers occupy a pivotal position in people's daily lives. Nowadays, through smart terminals, people can conduct real-time video communication anytime and anywhere, which reduces the cost of communication.
At present, in the process of video communication on a smart terminal, the displayed video is usually the original audio and video data collected by the capture device; the video display mode in video communication is therefore monotonous, resulting in a poor video rendering effect in video communication.
Summary of the invention
Embodiments of the present application provide a video communication method, apparatus, electronic device, computer-readable storage medium and computer program product, which can improve the video rendering effect in video communication.
An embodiment of the present application provides a video communication method, including:
obtaining first virtual object information in response to a first virtual object selection operation on a first video communication interface, the first video communication interface being the interface through which the electronic device conducts video communication with the peer of the electronic device;
displaying, in the first video communication interface, a first virtual video picture and a second virtual video picture, where the first virtual video picture is associated with the first virtual object information and a first user key part, the second virtual video picture is associated with second virtual object information and a second user key part, and the second virtual object information is obtained by the peer in response to a second virtual object selection operation on a second video communication interface;
playing target virtual audio, where the target virtual audio includes one or both of first virtual audio and second virtual audio, the first virtual audio is associated with first voice data and the first virtual object information, and the second virtual audio is associated with second voice data and the second virtual object information.
An embodiment of the present application provides a video communication apparatus, including:
a virtual object acquisition module, configured to obtain first virtual object information in response to a first virtual object selection operation on a first video communication interface, the first video communication interface being the interface through which the electronic device conducts video communication with the peer of the electronic device;
a picture output module, configured to display a first virtual video picture and a second virtual video picture in the first video communication interface, where the first virtual video picture is associated with the first virtual object information and a first user key part, the second virtual video picture is associated with second virtual object information and a second user key part, and the second virtual object information is obtained by the peer in response to a second virtual object selection operation on a second video communication interface;
an audio output module, configured to play target virtual audio, where the target virtual audio includes one or both of first virtual audio and second virtual audio, the first virtual audio is associated with first voice data and the first virtual object information, and the second virtual audio is associated with second voice data and the second virtual object information.
An embodiment of the present application provides an electronic device for video communication, including a processor and a memory; the memory stores a computer program which, when executed by the processor, causes the processor to perform the video communication method in the embodiments of the present application.
An embodiment of the present application provides a computer-readable storage medium storing a computer program; the computer program includes program instructions which, when executed by a processor, perform the video communication method in the embodiments of the present application.
An embodiment of the present application provides a computer program product including computer instructions which, when executed by a processor of an electronic device for video communication, perform the video communication method provided in the embodiments of the present application.
The embodiments of the present application have at least the following beneficial effects: since, in the process of video communication, video communication can be implemented in the form of virtual audio and video through the selected virtual object information (namely the first virtual object information and the second virtual object information), the video display modes in video communication are diversified, and the video rendering effect in video communication can therefore be improved.
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a network architecture diagram provided by an embodiment of the present application;
FIG. 2 and FIG. 3 are schematic diagrams of a scenario of entering video virtual communication provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a video communication method provided by an embodiment of the present application;
FIG. 5a is a first schematic diagram of generating a virtual video picture provided by an embodiment of the present application;
FIG. 5b is a second schematic diagram of generating a virtual video picture provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a module for converting voice data into virtual audio provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of interaction in video virtual communication provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a scenario of interaction in video virtual communication provided by an embodiment of the present application;
FIG. 9a is a schematic diagram of a scenario of switching background material in video virtual communication provided by an embodiment of the present application;
FIG. 9b is an interactive schematic diagram of a video communication method provided by an embodiment of the present application;
FIG. 10 is a system architecture diagram provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a video communication apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Here, the nouns and terms involved in the embodiments of the present application are explained; the following interpretations apply to them.
1) Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that seeks to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is thus the study of the design principles and implementation methods of various intelligent machines, giving machines the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. AI software technologies mainly include several major directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and progress of AI technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare and smart customer service. It is believed that, with the development of technology, AI will be applied in more fields and exert increasingly important value.
The video communication method provided by the embodiments of the present application belongs to speech technology under the field of artificial intelligence.
2) Speech technology involves automatic speech recognition (ASR), text-to-speech (TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the development direction of future human-computer interaction, in which voice interaction is one of the human-computer interaction modes.
Generally speaking, in the process of video communication on a smart terminal, the displayed video is usually the original audio and video data collected by the capture device; the video display mode in video communication is thus monotonous, resulting in a poor video rendering effect. In addition, during video communication, if a user wants to display some virtual data (such as special-effect animation), the video communication has to be interrupted and the virtual data sent through a conversation page; therefore, the normal operation and display of the video communication cannot be maintained while virtual data is being displayed.
On this basis, the embodiments of the present application provide a video communication method, apparatus, electronic device, computer-readable storage medium and computer program product, which can improve the video rendering effect in video communication, maintain the normal operation and display of video communication while virtual data is displayed, and reduce interactive operations during video communication, thereby reducing the resource consumption caused by such interactive operations.
Referring to FIG. 1, FIG. 1 is a network architecture diagram provided by an embodiment of the present application. As shown in FIG. 1, the network architecture implements a video communication system. The video communication system 1-1 may include a service server 1000 and a user terminal cluster, and the user terminal cluster may include one or more user terminals; the number of user terminals is not limited here. As shown in FIG. 1, the user terminals may include user terminal 100a, user terminal 100b, user terminal 100c, ..., user terminal 100n; each of them can establish a network connection with the service server 1000, so that each user terminal can exchange data with the service server 1000 through this network connection, in order to generate the virtual video pictures (referred to as the first virtual video picture and/or the second virtual video picture) and the virtual audio (referred to as the target virtual audio) in the video communication process; further, based on the virtual video pictures and the virtual audio, video communication between at least two user terminals is realized.
It can be understood that each user terminal shown in FIG. 1 may be installed with a target application; when the target application runs in each user terminal, it can exchange data with the service server 1000 shown in FIG. 1, so that the service server 1000 can receive service data from each user terminal. The target application may include an application with the function of displaying data such as text, images, audio and video. For example, the application may be an instant messaging application, which can be used for real-time communication between users; when the instant messaging application is a video communication application, users can conduct video communication through it.
It can be understood that, to enrich the user's image during video communication in an instant messaging application (for example, a video communication application), the embodiments of the present application provide one or more virtual objects in the instant messaging application; the two users in a video communication can each select any virtual object and thereby enter video virtual communication. In the video communication system, each user logs in, through an account, to the client for video communication running on a user terminal, so as to conduct video communication with other users.
It can be understood that the service server 1000 in the embodiments of the present application can obtain service data through these applications; for example, the service data may be the virtual object selected by a user (for example, a cartoon character), the user's voice data, and the user's expression. Here, for the obtained virtual object, the service server 1000 converts the obtained user voice data into virtual audio based on the selected virtual object, so that the virtual audio has the configured timbre corresponding to that virtual object; the service server 1000 can also fuse the obtained user expression with the selected virtual object to generate a virtual object with the user's expression; subsequently, the service server can send the virtual audio and the virtual object with the user's expression to the user terminal, and the user terminal can output, in the video communication interface, a virtual video picture containing the virtual object with the user's expression, together with the virtual audio.
In the embodiments of the present application, one user terminal may be selected from the multiple user terminals as a target user terminal, which may include smart terminals with data processing functions (for example, a text data display function, a video data playback function, a music data playback function) such as smart phones, tablet computers, notebook computers, desktop computers, smart televisions, smart speakers, smart watches and in-vehicle devices, but is not limited thereto. For example, the user terminal 100a shown in FIG. 1 may be used as the target user terminal, in which the above target application is integrated; the target user terminal can then exchange data with the service server 1000 through the target application.
For example, when a user uses the target application (such as a video communication application) in the target user terminal and clicks the video virtual communication control in the application, the target user terminal receives the trigger operation on the video virtual communication control and can display at least one virtual object accordingly; the user can then select any of these virtual objects as the target virtual object. Subsequently, the service server can obtain the user expression of the user using the target user terminal and fuse it with the target virtual object to generate a target virtual object with that expression (for example, if the user's expression is a closed-lip smile, a cartoon character with a closed-lip smile can be generated); meanwhile, the service server can obtain the audio processing model corresponding to the target virtual object (the audio processing model contains the timbre features corresponding to the target virtual object), obtain the user's voice data, and convert the voice data through the audio processing model into virtual audio with the timbre features of the target virtual object (for example, audio with the timbre of a cartoon character); the service server can then return the virtual audio and the target virtual object with the user's expression to the target user terminal, and the target user terminal can output, in the video communication interface, a virtual video picture containing the target virtual object with the user's expression, together with the virtual audio.
It can be understood that the network architecture may include multiple service servers; one user terminal may be connected to one service server, and each service server can obtain the service data (e.g., the virtual object selected by the user, the user's voice data, the user's expression) from the user terminal connected to it, convert the user's voice data into virtual audio with the timbre features of the selected virtual object, and fuse the user's expression with the virtual object to generate the virtual video picture corresponding to the virtual object with the user's expression.
It can be understood that the user terminal itself may also obtain the service data (e.g., the virtual object selected by the user, the user's voice data, the user's expression), convert the user's voice data into virtual audio with the timbre features of the virtual object, and fuse the user's expression with the virtual object to generate the virtual video picture corresponding to the virtual object with the user's expression.
It can be understood that the video communication method provided by the embodiments of the present application may be executed by an electronic device for video communication, which includes but is not limited to a user terminal or a service server. The service server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The user terminal and the service server may be connected directly or indirectly in a wired or wireless manner, which is not limited in the embodiments of the present application.
Referring to FIG. 2 and FIG. 3, FIG. 2 and FIG. 3 are schematic diagrams of a scenario of entering video virtual communication provided by an embodiment of the present application. The service server shown in FIG. 2 and FIG. 3 may be the service server 1000 shown in FIG. 1; the user terminal A shown in FIG. 2 and FIG. 3 may be any user terminal selected in FIG. 1, for example, user terminal 100b; and the user terminal B shown in FIG. 2 and FIG. 3 may be any user terminal other than user terminal A selected from the user terminal cluster in the embodiment corresponding to FIG. 1, for example, user terminal 100a.
User A and user B can conduct video communication through user terminal A and user terminal B. Here, FIG. 2 illustrates video communication from the side of user terminal A corresponding to user A. The first video communication interface 2-1 includes a video virtual communication control 2-11 (for example, displayed as an icon with the text "video virtual communication") through which user A and user B can enter video virtual communication. For example, as shown in FIG. 2, the control corresponding to video virtual communication, namely the video virtual communication control 2-11, is displayed at the bottom of the first video communication interface 2-1 of user terminal A, and user A can click it; after user A clicks the video virtual communication control 2-11, user terminal A can, in response to this trigger operation of user A, display the virtual object list 2-12 (referred to as at least one virtual object) in the first video communication interface 2-1, and user A can select one virtual object in the virtual object list 2-12 as the target virtual object. The virtual object list 2-12 can be presented in the bottom area of the first video communication interface 2-1 in the form of a floating window, a mask layer or a semi-transparent layer, or displayed in a collapsible interface whose display size can be changed by a drag operation, the size of this interface being smaller than that of the first video communication interface 2-1.
It can be understood that, when the virtual object list 2-12 is displayed, the area displaying user B or user A in a small window moves to an area that does not overlap the display area of the virtual object list 2-12; that is, the display area of user B or user A is not covered by the display area of the virtual object list 2-12. For example, as shown in FIG. 2, when the virtual object list 2-12 is displayed, the area M displaying user B moves upward in the first video communication interface 2-1, so that the area M does not overlap the display area of the virtual object list.
As shown in FIG. 2, the virtual object list 2-12 includes virtual object 20a, virtual object 20b and virtual object 20c. After selecting virtual object 20a, user A can click the button 2-13 for starting video virtual communication, and user terminal A can, in response to this opening operation of user A, generate an opening request for virtual communication and send it to the service server 1000; the service server 1000 can query whether user terminal B, to which the user B conducting video communication with user A belongs, has already opened video virtual communication. If user terminal B has not opened video virtual communication (that is, user B has not yet opened it), the service server 1000 can return this query result (user terminal B has not yet opened video virtual communication) to user terminal A, and user terminal A can display invitation prompt information 2-14 in the first video communication interface 2-1 for user A to view. The invitation prompt information 2-14 can be displayed in any area of the first video communication interface 2-1 in the form of a pop-up window, a mask layer or a semi-transparent layer; for example, as shown in FIG. 2, the invitation prompt information 2-14 is displayed in the first video communication interface 2-1 in the form of a pop-up window containing the message "The other party has not yet opened video virtual communication; invite the other party to open video virtual communication?". As shown in FIG. 2, the pop-up window containing the invitation prompt information 2-14 also includes a confirmation control 2-151 and a cancel control 2-152, for user A to choose whether to send user B an invitation to open video virtual communication: if user A clicks the confirmation control 2-151, user terminal A sends an opening invitation request corresponding to the confirmation result to the service server, so that the service server 1000 sends the opening invitation request to user terminal B; if user A clicks the cancel control 2-152, the service server 1000 does not send the opening invitation request (referred to as the opening request) to user terminal B. It can be understood that, when the invitation prompt information 2-14 is displayed, the area displaying user B or user A in a small window moves to an area that does not overlap the display area of the invitation prompt information 2-14; that is, the display area of user B or user A is not covered by the display area of the invitation prompt information 2-14. For example, as shown in FIG. 2, when the invitation prompt information 2-14 is displayed, the area M displaying user B moves to the lower right corner of the first video communication interface 2-1, so that the area M does not overlap the display area of the invitation prompt information.
As shown in FIG. 2, for the invitation prompt information 2-14, user A clicks the confirmation control 2-151 to invite user B to open video virtual communication; user terminal A can respond to this confirmation operation of user A by generating an opening invitation request and sending it to the service server 1000. After receiving the opening invitation request, the service server forwards it to user terminal B. As shown in FIG. 3, user B can view the opening invitation prompt information 3-11 corresponding to the opening invitation request in the second video communication interface 3-1 of user terminal B, where the virtual object list 3-12 is also displayed. After selecting a virtual object, user B can click the button 3-13 for starting virtual communication, and user terminal B can, in response to this opening operation of user B, generate an opening request for video virtual communication and send it to the service server 1000. Since user terminal A has already opened video virtual communication, the service server 1000 can send user terminal A a notification message that user terminal B has opened video virtual communication; after receiving this notification message, user terminal A can display time prompt information 3-14 in the first video communication interface 2-1, to inform user A that user B has now opened video virtual communication and that, after a waiting duration (for example, 3 s), user A and user B will enter video virtual communication.
Similarly, user terminal B can also display time prompt information 3-15 in the second video communication interface 3-1, to inform user B that after the waiting duration (for example, 3 s), user B and user A will enter video virtual communication. In the embodiments of the present application, the waiting duration can be presented on the first video communication interface 2-1 and the second video communication interface 3-1 as a countdown.
It can be understood that, in video virtual communication, the service server 1000 can obtain the virtual object 20a selected by user A and the expression data of user A, and then fuse the expression data of user A with virtual object 20a, thereby generating the first virtual video picture containing virtual object 20a (with user A's expression); similarly, the service server 1000 can obtain the virtual object 20b selected by user B and the expression data of user B, and fuse the expression data of user B with virtual object 20b, thereby generating the second virtual video picture containing virtual object 20b (with user B's expression). The service server can then send the first virtual video picture and the second virtual video picture to user terminal A and user terminal B respectively, and both terminals can display the two pictures in their respective video communication interfaces. That is to say, what user A sees in the first video communication interface of user terminal A is two virtual objects (virtual object 20a and virtual object 20b) conducting video communication, and what user B sees in the second video communication interface of user terminal B is likewise two virtual objects (virtual object 20b and virtual object 20a) conducting video communication.
It should be noted that, in video virtual communication, the service server 1000 can also obtain the voice data of user A and user B respectively. The service server 1000 can perform voice conversion on user A's voice data (referred to as the first voice data) to generate virtual audio a (referred to as the first virtual audio) with the timbre features of virtual object 20a, and send virtual audio a to user terminal B; similarly, the service server 1000 can perform voice conversion on user B's voice data (referred to as the second voice data) to generate virtual audio b (referred to as the second virtual audio) with the timbre features of virtual object 20b, and send virtual audio b to user terminal A. Here, user terminal A can output virtual audio b in the first video communication interface, so that the picture user A sees is the second virtual video picture of virtual object 20b with user B's expression, while the voice of user B that user A hears has the timbre features of virtual object 20b; similarly, user terminal B can output virtual audio a in the second video communication interface, so that the picture user B sees is the first virtual video picture of virtual object 20a with user A's expression, while the voice heard has the timbre features of virtual object 20a.
It can be understood that an entry for video virtual communication is added to the video communication interface; a user can select a virtual object in the interface so that his or her chat image is converted into a virtual image, which enriches the user's image in video communication, and the video communication can keep running and displaying normally while the virtual image is displayed. Moreover, during virtual video chat, the other party's voice that a user hears is virtual audio after voice conversion: not the user's original voice, but virtual audio with the timbre features of the virtual object. In addition, the embodiments of the present application change the communicating users in video communication in the two dimensions of sound and picture, which increases the interest and the video display modes of video communication, can well improve the quality of video communication, and can thus improve the user experience.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a video communication method provided by an embodiment of the present application. The video communication method may be executed by a user terminal (for example, the user terminals shown in FIG. 1 and FIG. 2), by a service server (for example, the service server 1000 shown in FIG. 1), or jointly by a user terminal and a service server. For ease of understanding, this embodiment of the present application is described by taking the method being executed by the above user terminal as an example. The video communication method may include at least the following S101-S103, which are described below.
S101: Obtain first virtual object information in response to a first virtual object selection operation on the first video communication interface.
In the embodiments of the present application, the first video communication interface includes a video virtual communication control for triggering video virtual communication; a user (referred to as the first user, for example user A) can click this control so as to conduct video virtual communication with the communicating user (referred to as the second user, for example user B). Here, when the user clicks the video virtual communication control, the first terminal (referred to as the electronic device for video communication) triggers the processing flow of video virtual communication.
Here, when the first user clicks the video virtual communication control, the first terminal responds to the first user's trigger operation on the control and displays at least one virtual object. The virtual object may refer to a virtual image different from a real person, for example a three-dimensional animated image (including animated character images (e.g., cartoon characters), animated animal images (e.g., cartoon animals), animated plant images (e.g., an animated apple tree), and so on). The first user can select one of the at least one virtual object as the virtual object for image conversion; when the first user selects a virtual object, the first terminal responds to the first user's first virtual object selection operation on the at least one virtual object, obtains the virtual object selected by the first user, and determines the information corresponding to that virtual object as the first virtual object information, where the information corresponding to a virtual object includes a virtual object model, a virtual audio model, and the like.
It should be noted that the first video communication interface is the interface through which the electronic device conducts video communication with the electronic device's peer; the first video communication interface is the video communication interface displayed on the electronic device side, and the peer of the electronic device is the second terminal used by the second user. In addition, the at least one virtual object may also be displayed when the electronic device receives a request for video virtual communication, which is not limited in the embodiments of the present application.
It can be understood that at least one virtual object is added to video communication; thus, based on the virtual object selected by the user, the person image in the original data collected by the video capture device can be switched to the image of the selected virtual object, so that the video communication is implemented in the form of video virtual communication, which improves the video rendering effect in video communication.
S102: Display, in the first video communication interface, the first virtual video picture and the second virtual video picture, where the first virtual video picture is associated with the first virtual object information and the first user key part, the second virtual video picture is associated with the second virtual object information and the second user key part, and the second virtual object information is obtained by the peer in response to a second virtual object selection operation on the second video communication interface.
In the embodiments of the present application, when the first user selects a virtual object and enables the video virtual communication function, if the second user conducting video communication with the first user has not yet enabled the function, the first user and the second user do not enter video virtual communication. Here, after enabling the function, the first terminal can send the second user invitation information for opening video virtual communication, to prompt the second user to enable it. That is to say, after selecting a virtual object, the first user can click to open video virtual communication, and the first terminal can, in response to the opening operation on the video virtual communication control, display invitation prompt information in the first video communication interface; the invitation prompt information is the opening invitation information for the video virtual communication control, and the opening invitation information is a prompt message by which the first terminal requests the second terminal to open the video virtual communication control; in other words, the invitation prompt information is used to prompt the first user to send the second user the opening invitation information for the video virtual communication control. For the invitation prompt information, the first user can click the confirmation invitation control, and the first terminal can then, in response to the first user's confirmation operation on the invitation prompt information, send the second terminal an opening request for the video virtual communication control.
It should be noted that, after receiving the opening request, the second terminal can display opening prompt information in its second video communication interface; the second user can view the prompt and open video virtual communication. The second terminal then receives the second user's operation of opening video virtual communication and, in response to this operation, returns the second user's confirmation information to the first terminal; the first terminal can receive the confirmation information returned by the second terminal for the opening request and output time prompt information, where the time prompt information is used to indicate the waiting duration before the first user enters video virtual communication. When the time corresponding to the waiting duration is reached, the first terminal outputs the first virtual video picture and the second virtual video picture in the first video communication interface.
It should also be noted that the first user and the second user enter video virtual communication only after both have opened it, and before entering there is a preparation time (the waiting duration), which may be 3 seconds, 1 minute, 1 hour, and so on; examples are not given one by one here. When the time corresponding to the waiting duration is reached, the first user and the second user enter video virtual communication, during which the first video communication interface presents the first virtual video picture used to cover the first user's user key part and the second virtual video picture used to cover the second user. The first virtual video picture is generated from the first user's user key part (referred to as the first user key part), and the second virtual video picture is generated from the second user's user key part (referred to as the second user key part). A user key part may refer to the user's eye part, lip part, nose part, eyebrow part, and so on; it can be used to represent the expression information of a user (for example, the first user and the second user), such as a smiling expression, a pursed-lip expression, an open-mouth expression, or an expression with wide-open eyes and parted lips.
In the embodiments of the present application, the first virtual object information includes the first virtual object model, and the second virtual object information includes the second virtual object model. The process of outputting the first virtual video picture and the second virtual video picture includes: the first terminal acquires the first virtual key part in the first virtual object model and acquires the first user's user key part, where the first virtual key part and the first user's user key part belong to the same part type; for example, if the first user's user key part is the eye part, the first virtual key part in the first virtual object model should also be the corresponding eye part. Then, according to the first virtual key part and the first user's user key part, the first terminal can output, in the first video communication interface, the first virtual video picture including the first virtual object model, with the first virtual key part of the first virtual object model associated with the first user's user key part. Similarly, the first terminal acquires the second virtual key part in the second virtual object model and the second user's user key part, where the second virtual key part and the second user's user key part belong to the same part type; according to the second virtual key part and the second user's user key part, it can also output, in the first video communication interface, the second virtual video picture including the second virtual object model, with the second virtual key part of the second virtual object model associated with the second user's user key part.
The following describes the process in which the first terminal outputs a virtual video picture according to the user key part and the virtual key part in the virtual object model. Here, the description takes the first terminal outputting the first virtual video picture according to the first user's user key part and the first virtual key part as an example (the implementation of outputting the second virtual video picture according to the second user's user key part and the second virtual key part can be the same).
In the embodiments of the present application, the first terminal first acquires the virtual action state of the first virtual key part and the part action state corresponding to the first user's user key part. The virtual action state corresponds to two states, a first critical state and a second critical state, where the first virtual key part corresponds to first model position coordinates in the first critical state and to second model position coordinates in the second critical state. Based on the first critical state with the first model position coordinates and the second critical state with the second model position coordinates, the first terminal determines the relationship between state and position coordinates, and then, based on this relationship, determines the position coordinates of the first virtual key part in the given part action state, thereby obtaining the target model position coordinates. When the first terminal adjusts the first virtual key part of the first virtual object model to the target model position coordinates, the virtual action state of the first virtual key part is adjusted to an action state matching the part action state, and the first virtual object model is thus transformed into the first target virtual object model. The first terminal then outputs, in the first video communication interface, the first virtual video picture containing the first target virtual object model, in which the virtual key part is in the part action state.
示例性地,参见图5a,图5a是本申请实施例提供的一种生成虚拟视频画面的示意图一。其中,图5a所示的示意图是以虚拟关键部位为嘴唇部位,用户关键部位为嘴唇部位为例进行的说明。
如图5a所示,当第一虚拟对象模型中的虚拟关键部位(比如,虚拟嘴唇部位)的虚拟动作状态为第一临界状态5-11时的图像可以为图像50a,其中,第一临界状态5-11可以是指虚拟嘴唇部位为嘴唇闭合状态(此时,第一虚拟对象模型的表情值为0);当第一虚拟对象模型中的虚拟嘴唇部位的虚拟动作状态为第二临界状态5-12时的图像可以为图像50b,其中,第二临界状态5-12可以是指虚拟嘴唇部位为嘴唇大张状态(此时,第一虚拟对象模型的表情值为1)。这里,第一终端可以获取虚拟嘴唇部位为嘴唇闭合状态时,第一虚拟对象模型中的第一虚拟关键部位对应的第一模型位置坐标5-21(即为表情值为0时虚拟嘴唇部位对应的模型位置坐标);第一终端也可以获取虚拟嘴唇部位为嘴唇大张状态时,第一虚拟对象模型中的第一虚拟关键部位对应的第二模型位置坐标5-22(即为表情值为1时虚拟嘴唇部位对应的模型位置坐标)。
这里,第一终端还可以获取到用户在视频通信时,通过摄像头所采集到的用户画面,在该用户画面的图像50c中,可以获取到第一用户的用户嘴唇部位(即为第一用户的用户关键部位)的部位动作状态;如图5a所示,第一用户的用户嘴唇部位为嘴唇半开微笑状态,也就是说,当前的部位动作状态为嘴唇半开微笑状态。以及,第一终端可以确定部位动作状态为嘴唇半开微笑状态时的表情值(例如,表情值为0.3),并根据表情值0对应的第一模型位置坐标5-21以及表情值1对应的第二模型位置坐标5-22,可以确定出该表情值0.3对应的目标模型位置坐标5-3;根据该目标模型位置坐标5-3,可以将虚拟对象模型中的虚拟嘴唇部位的虚拟动作状态进行调整,也就是说,可以将虚拟对象模型中的虚拟嘴唇部位也调整为嘴唇半开微笑状态,从而可以生成虚拟视频画面中的图像50d,并且,该图像50d中的虚拟对象的表情与用户的表情是相符合的。
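上述根据表情值在第一临界状态与第二临界状态之间确定目标模型位置坐标的过程,本质上可以理解为线性插值。下面给出一个最小化的示意实现(Python;其中的坐标数据均为虚构示例,仅用于说明计算方式):

```python
import numpy as np

def target_position(closed_pos, open_pos, expression_value):
    """根据表情值在两个临界状态的模型位置坐标之间做线性插值。

    closed_pos: 表情值为0(如嘴唇闭合)时虚拟关键部位的坐标,形状 (N, 3)
    open_pos:   表情值为1(如嘴唇大张)时虚拟关键部位的坐标,形状 (N, 3)
    expression_value: 由用户关键部位的部位动作状态估计出的表情值,取值 [0, 1]
    """
    t = float(np.clip(expression_value, 0.0, 1.0))
    return (1.0 - t) * closed_pos + t * open_pos

# 虚构的两组临界状态坐标:表情值为0.3时的结果即对应文中的"目标模型位置坐标"
closed = np.array([[0.0, -1.0, 0.0], [0.0, 1.0, 0.0]])  # 第一模型位置坐标(示例)
opened = np.array([[0.0, -3.0, 0.5], [0.0, 3.0, 0.5]])  # 第二模型位置坐标(示例)
print(target_position(closed, opened, 0.3))
```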
下面说明第一终端输出第一虚拟视频画面的另一种处理过程。第一终端先获取该第一用户的用户关键部位对应的关键点二维坐标;根据该关键点二维坐标,确定该第一虚拟关键部位对应的关键点三维坐标;根据该关键点三维坐标,确定该第一虚拟关键部位对应的目标虚拟动作状态;在该第一视频通信界面中,输出包含第二目标虚拟对象模型的第一虚拟视频画面;该第二目标虚拟对象模型中的第一虚拟关键部位处于该目标虚拟动作状态。也就是说,第一终端获取第一用户关键部位与第一虚拟关键部位之间的位置映射关系,并在第一虚拟对象模型中,根据位置映射关系,将第一虚拟关键部位的状态调整至第一用户关键部位的部位动作状态,得到第二目标虚拟对象模型,以及在第一视频通信界面中,输出包含第二目标虚拟对象模型的第一虚拟视频画面。如此,第一终端可以直接将第一用户关键部位对应的关键部位点位,映射到虚拟对象模型中,通过检测关键部位的点位变化映射到第一虚拟对象模型的模型点位变化,从而可以实现第一虚拟对象模型的表情随着用户表情变化。
示例性地,参见图5b,图5b是本申请实施例提供的一种生成虚拟视频画面的示意图二。如图5b所示,当将第一用户关键部位对应的部位动作状态映射至第一虚拟关键部位对应的图像5-41上时,得到如图像5-42至图像5-44所示的虚拟视频画面。
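对于这种按点位映射的驱动方式,可以理解为:先标定用户关键点与模型点位之间的位置映射关系,再把每帧检测到的二维点位变化换算为模型三维点位变化。下面是一个高度简化的示意(Python;其中的映射矩阵与点位数据均为虚构假设,真实系统需要通过标定或学习获得):

```python
import numpy as np

def drive_model_points(landmarks_2d, neutral_2d, neutral_3d, mapping):
    """把二维关键点相对中性表情的偏移,映射为模型三维点位的偏移。

    landmarks_2d: 当前帧检测到的用户关键部位二维坐标,形状 (N, 2)
    neutral_2d:   中性表情下的用户关键部位二维坐标,形状 (N, 2)
    neutral_3d:   模型中性表情下对应点位的三维坐标,形状 (N, 3)
    mapping:      预先标定的二维到三维线性映射,形状 (2, 3)(示意性假设)
    """
    delta_2d = landmarks_2d - neutral_2d  # 关键部位的点位变化
    delta_3d = delta_2d @ mapping         # 经位置映射关系换算到模型空间
    return neutral_3d + delta_3d          # 得到驱动后的模型点位

# 虚构数据演示:嘴角两点外移、下移,模型对应点位随之更新
cur = np.array([[0.42, 0.63], [0.58, 0.63]])
neu = np.array([[0.45, 0.60], [0.55, 0.60]])
n3d = np.zeros((2, 3))
M = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.3]])
print(drive_model_points(cur, neu, n3d, M))
```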
在本申请实施例中,对于用户终端(包括第一终端与第二终端)输出虚拟视频画面的实现方式可以通过用户终端中的实时渲染组件实现。这里的实时渲染组件可以是指具有画面渲染能力的组件。例如,该实时渲染组件可以为实时三维(3 Dimensions,3D)引擎,如,“Ace3D”引擎。“Ace3D”引擎可以被部署于用户终端中的摄像类应用中,具备加载速度快、内存占用小、兼容性高的性能,且可以用于进行毛发渲染、3D动画表情渲染等。
S103,播放目标虚拟音频。
需要说明的是,目标虚拟音频包括第一虚拟音频和第二虚拟音频中的一种或两种,第一虚拟音频与第一语音数据以及第一虚拟对象信息相关联,第二虚拟音频与第二语音数据以及第二虚拟对象信息相关联;其中,第一语音数据为第一用户的语音数据,第一终端可以将第一语音数据转换为第一虚拟音频;第二语音数据为第二用户的语音数据,第一终端可以接收第二终端基于第二语音数据转换成的第二虚拟音频。
在本申请实施例中,当第一用户说话的情况下,目标虚拟音频包括第一虚拟音频,此时,第一终端将目标虚拟音频与第一虚拟视频画面进行同步播放。当第二用户说话的情况下,目标虚拟音频包括第二虚拟音频,此时,第一终端将目标虚拟音频与第二虚拟视频画面进行同步播放。当第一用户和第二用户均说话的情况下,目标虚拟音频包括第一虚拟音频和第二虚拟音频,此时,第一终端将目标虚拟音频与第一虚拟视频画面进行同步播放,并将目标虚拟音频与第二虚拟视频画面进行同步播放。
本申请实施例中,第一用户与第二用户在进入视频虚拟通信后,第一终端可以采集第一用户的语音数据,得到第一语音数据;并将第一语音数据转换为与第一用户所选择的第一虚拟对象信息相关联的通信虚拟音频,得到第一虚拟音频。同时,第一终端也可以采集到第一用户的用户关键部位数据,将第一虚拟音频和第一用户的用户关键部位发送至第二终端,以使第二终端可以根据该第一用户的用户关键部位生成第一虚拟视频画面,并将该第一虚拟音频与该第一虚拟视频画面进行同步输出。
同理,第二终端也可以采集到第二用户的语音数据,得到第二语音数据;并将第二语音数据转换为与第二用户所选择的第二虚拟对象信息相关联的通信虚拟音频,得到第二虚拟音频;同时,第二终端也可以采集到第二用户的用户关键部位数据,第二终端也可以将第二虚拟音频和第二用户的用户关键部位发送至第一终端,以使第一终端可以根据该第二用户的用户关键部位生成第二虚拟视频画面,并将该第二虚拟音频与该第二虚拟视频画面进行同步输出。
下面,以第一终端将第一用户的语音数据转换为与第一虚拟对象信息相关联的第一虚拟音频为例,来说明将用户的语音数据转换为虚拟音频的过程。
在本申请实施例中,第一终端将第一语音数据转换为与第一虚拟对象信息相关联的第一虚拟音频的过程包括:第一终端将该第一语音数据进行语音预处理,得到过渡语音数据,并将该过渡语音数据输入至音频处理模型,通过该音频处理模型可以提取该过渡语音数据的音频特征;其中,该音频处理模型为第一虚拟对象信息中包括的音频处理模型,该音频处理模型中包含有该第一虚拟对象模型(即第一虚拟对象信息中所包括的虚拟对象模型)所具备的音色特征,例如,该第一虚拟对象模型为动漫人物A模型,则该音频处理模型中包括有动漫人物A模型的音色特征;第一终端可以获取该音频处理模型中包含的音色特征;该音色特征与该第一虚拟对象模型相关联;随后,第一终端可以将该音频特征与该音色特征进行融合,从而可以得到融合音频特征,根据该融合音频特征可以生成该第一虚拟音频。
在本申请实施例中,音频处理模型可以是指具有语音变声处理能力的模型。例如,实时AI变声模型。实时AI变声模型可以提取用户的语音数据中的内容信息(例如,语音数据的节奏特征数据和情感特征数据),随后可以结合虚拟对象的音色特征进行转换,从而可以复刻出虚拟对象的声音。通过该实时AI变声模型,不仅可以实现对用户声音与虚拟对象声音之间的逼真的音色转换,还可以将用户说话时的语速、停顿、情感和言语方式等特征通过虚拟对象逼真地体现出来。
参见图6,图6是本申请实施例提供的一种将语音数据转换为虚拟音频的模块示意图。如图6所示,通过特征提取模块6-1,可以提取用户语音数据6-2的音频特征与虚拟对象模型6-3的音色特征(例如,可以通过特征提取模块6-1中的概率输出单元6-11(SI-DNN单元)输出语音数据的音素后验概率,并将该音素后验概率作为音频特征);而特征提取模块6-1中的韵律提取单元6-12(Pitch Extractor单元)可以用于提取用户语音数据6-2对应的韵律特征(F0x值)与虚拟对象模型6-3对应的韵律特征(F0y值)。随后,可以将该特征提取模块6-1提取的特征输入至帧选择模块6-4,通过帧选择模块6-4,可以确定出目标频谱和F0路径(可以基于最小动态差异的维特比搜索来确定目标频谱和F0路径)。随后,通过特征编码模块6-5可以输出最终的虚拟音频(可以将最终确定的频谱轨迹与F0轨迹发送至基于“LPC”网络6-51的神经语音编码器6-52,通过该神经语音编码器6-52可以生成虚拟音频)。
需要说明的是,以上是以音频处理模型为实时AI变声模型为例,对通过实时AI变声模型将语音数据转换为虚拟音频的具体过程进行的示例性说明;音频处理模型还可以为其他具备语音转换能力的模型,本申请实施例对此不作限制。
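为便于理解图6中“特征提取、帧选择、特征编码”的数据流向,下面给出一个骨架式的示意(Python;各子模块均以占位实现代替真实的模型推理,帧移、特征维度等参数亦为虚构假设,仅体现各模块的输入输出关系):

```python
import numpy as np

def extract_ppg(speech):
    """概率输出单元(占位):返回逐帧音素后验概率,真实实现为SI-DNN推理。"""
    frames = max(len(speech) // 160, 1)  # 假设16kHz采样、10ms帧移
    return np.random.rand(frames, 40)    # 占位:40维后验概率

def extract_f0(speech):
    """韵律提取单元(占位):返回逐帧F0轨迹,真实实现为基频估计算法。"""
    frames = max(len(speech) // 160, 1)
    return 120.0 + 10.0 * np.sin(np.linspace(0, 3, frames))

def select_frames(ppg, f0_x, f0_y_mean=220.0):
    """帧选择(占位):这里仅把源F0平移到目标音色的基频区间以示意;
    真实实现为基于最小动态差异的维特比搜索,得到目标频谱与F0路径。"""
    f0_path = f0_x - f0_x.mean() + f0_y_mean
    return ppg, f0_path

def vocode(spectrum, f0_path):
    """特征编码(占位):由频谱轨迹与F0轨迹生成波形,真实实现为神经语音编码器。"""
    return np.zeros(len(f0_path) * 160, dtype=np.float32)

speech = np.random.randn(16000).astype(np.float32)  # 1秒虚构语音
spec, path = select_frames(extract_ppg(speech), extract_f0(speech))
virtual_audio = vocode(spec, path)                  # 具有目标音色特征的虚拟音频(示意)
print(virtual_audio.shape)
```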
在本申请实施例中,第一终端将第一语音数据转换为与第一虚拟对象信息相关联的第一虚拟音频的过程中,对第一语音数据进行语音预处理,得到过渡语音数据的过程,可以包括:第一终端在该第一语音数据中,确定回声音频数据,并在第一语音数据中将该回声音频数据进行删除,得到回声过渡语音数据;随后,第一终端在该回声过渡语音数据中,确定噪声音频数据,并在回声过渡语音数据中对该噪声音频数据进行抑制处理,得到噪声过渡语音数据;随后,第一终端在该噪声过渡语音数据中,确定静音音频数据,并在噪声过渡语音数据中将该静音音频数据进行删除,得到该过渡语音数据。其中,在语音处理的过程中,对于删除噪声音频数据、删除回声音频数据、删除静音音频数据的执行顺序,也可以为先删除噪声音频数据,再删除静音音频数据,最后删除回声音频数据,也可以一起执行,对于三者的执行顺序,本申请实施例对此不作限制。
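上述语音预处理的三个步骤可以串联为一个简单的处理链。下面是一个仅示意处理顺序的最小实现(Python;其中回声消除与噪声抑制以占位逻辑代替,静音删除用帧能量门限近似,采样率、帧长与各阈值均为虚构假设):

```python
import numpy as np

FRAME = 160  # 假设16kHz采样、10ms一帧

def remove_echo(x):
    # 占位:真实实现为声学回声消除(AEC),这里直接透传
    return x

def suppress_noise(x, floor=0.02):
    # 占位:对低幅值样本做简单衰减,近似噪声抑制处理
    return np.where(np.abs(x) < floor, x * 0.1, x)

def drop_silence(x, thresh=1e-4):
    # 按帧能量删除静音音频数据
    frames = [x[i:i + FRAME] for i in range(0, len(x) - FRAME + 1, FRAME)]
    kept = [f for f in frames if np.mean(f ** 2) > thresh]
    return np.concatenate(kept) if kept else np.zeros(0, dtype=x.dtype)

def preprocess(voice):
    # 依次执行回声消除、噪声抑制与静音删除,得到过渡语音数据
    return drop_silence(suppress_noise(remove_echo(voice)))

voice = np.concatenate([np.zeros(1600), 0.3 * np.random.randn(3200), np.zeros(1600)])
print(len(voice), "->", len(preprocess(voice.astype(np.float32))))
```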
在本申请实施例中,第一终端可以获取到第二终端发送的第二用户的用户关键部位,并根据该第二用户的用户关键部位生成第二虚拟视频画面,同时,第一终端也可以获取到第二终端发送的第二虚拟音频。随后,第一终端可以在第一视频通信界面中,输出与第二虚拟视频画面同步的第二虚拟音频。这里,第一终端可以获取第二虚拟视频画面对应的第二画面时间戳,以及第二虚拟音频对应的第二语音时间戳;随后,可以在第二语音时间戳中,获取与第二画面时间戳具有时间匹配关系的第二目标语音时间戳;在第二虚拟音频中,可以获取第二目标语音时间戳对应的第二待输出虚拟音频,并将第二虚拟视频画面与第二待输出虚拟音频进行同步输出。同理,第一终端同步播放第一虚拟音频和第一虚拟视频画面的过程包括:第一终端可以获取第一虚拟视频画面对应的第一画面时间戳,以及第一虚拟音频对应的第一语音时间戳;随后,可以在第一语音时间戳中,获取与第一画面时间戳具有时间匹配关系的第一目标语音时间戳;在第一虚拟音频中,可以获取第一目标语音时间戳对应的第一待输出虚拟音频,并将第一虚拟视频画面与第一待输出虚拟音频进行同步输出。
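上述时间戳匹配可以理解为:对每一帧虚拟视频画面,在升序的语音时间戳序列中查找时间上最接近且落在容差范围内的语音时间戳。下面是一个示意实现(Python;时间戳数据与容差取值均为虚构示例):

```python
import bisect

def match_audio_chunk(frame_ts, audio_timestamps, tolerance_ms=40):
    """在语音时间戳中查找与画面时间戳具有时间匹配关系的目标语音时间戳。

    frame_ts: 某一帧虚拟视频画面的画面时间戳(毫秒)
    audio_timestamps: 升序排列的语音时间戳列表(毫秒)
    返回匹配到的目标语音时间戳下标;超出容差则返回 None。
    """
    i = bisect.bisect_left(audio_timestamps, frame_ts)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(audio_timestamps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(audio_timestamps[j] - frame_ts))
    return best if abs(audio_timestamps[best] - frame_ts) <= tolerance_ms else None

audio_ts = [0, 20, 40, 60, 80, 100]  # 虚构的语音时间戳
for frame_ts in (33, 61, 300):        # 虚构的画面时间戳
    print(frame_ts, "->", match_audio_chunk(frame_ts, audio_ts))
```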
可以理解的是,本申请实施例通过在视频通信应用中增加虚拟对象,使得能够基于用户选择的虚拟对象,将视频采集设备采集到的视频数据转换为所选择的虚拟对象的形象对应的视频画面、以及所选择的虚拟对象对应的虚拟语音,实现了视频虚拟通信;也就是说,本申请实施例中在用户进行视频通信时,可以将原始视频数据中的用户形象转换为虚拟形象进行通信,无需中断视频通信用户就可实现虚拟形象的展示;且在用户进行视频通信时,所播放的音频也为虚拟音频,且是所选择的虚拟对象的音色,而非用户的原始语音数据,从而,能够提高视频通信质量。
还可以理解的是,本申请实施例丰富了视频通信应用中用户的形象,使得用户之间可以进行视频虚拟通信,且使得用户在展示虚拟数据时,可以正常维持视频通信的运行和显示;且在视频虚拟通信中,将用户的语音数据转换为了虚拟音频进行输出,能够提高视频通信质量。也就是说,本申请实施例可以丰富视频通信的视频展示方式和趣味性,且在展示虚拟数据的同时可以维持视频通信的正常运行和显示,提高视频通信质量。
在本申请实施例中,在用户之间进行视频虚拟通信时,第一终端和第二终端还可以选择互动动作来进行互动。参见图7,图7是本申请实施例提供的一种在视频虚拟通信中进行互动的流程示意图。其中,图7所示的流程以终端为第一终端、用户为第一用户为例进行说明,且以第一虚拟视频画面中所呈现的与第一虚拟对象信息相关联的虚拟对象模型为第一局部展示对象、第二虚拟视频画面中所展示的与第二虚拟对象信息相关联的虚拟对象模型为第二局部展示对象为例。
S201,响应针对第一视频通信界面的互动操作,展示虚拟对象互动动作列表。
本申请实施例中,互动动作可以包括跳舞动作、摸头动作和拥抱动作,等等,在此不再进行一一举例。第一视频通信界面中包括互动控件,当第一用户通过点击等操作触发了互动控件时,第一终端也就接收到了互动操作;此时,第一终端响应该互动操作,展示虚拟对象互动动作列表,以从虚拟对象互动动作列表选择一个互动动作与第二用户进行互动。也就是说,第一终端可以响应该第一用户触发互动控件的操作,在第一视频通信界面中展示虚拟对象互动动作列表以供第一用户选择。
S202,响应针对虚拟对象互动动作列表的动作选择操作,将第一局部展示对象切换为第一整体展示对象,以及将第二局部展示对象切换为第二整体展示对象。
本申请实施例中,在第一用户选择一个互动动作后,第一终端可以响应第一用户针对虚拟对象互动动作列表的动作选择操作,在该第一视频通信界面中,将第一用户对应的第一局部展示对象切换为第一整体展示对象。其中,这里的局部展示对象可以是指包括虚拟对象部分区域的对象,例如,当虚拟对象为虚拟人物形象时,局部展示对象可以是指仅展示了虚拟对象的头部区域或颈部区域或身体部位区域或脚部区域的对象,例如,如图3中的第一虚拟视频画面中所呈现的虚拟对象模型,该第一虚拟视频画面中所呈现的即为第一局部展示对象,仅展示了虚拟对象20a的头部区域。而第一整体展示对象则可以是指包括虚拟对象全部区域的对象,例如,当虚拟对象为虚拟人物形象时,第一整体展示对象即可理解为是包括头部区域、颈部区域、身体部位区域以及脚部区域的对象。
S203,展示第一整体展示对象执行目标互动动作的画面,以及展示第二整体展示对象执行目标互动动作的画面,目标互动动作为动作选择操作所选择的互动动作。
本申请实施例中,第一终端基于用户所选择的互动动作(称为目标互动动作),生成包括第一整体展示对象执行第一用户所选择的互动动作的第一虚拟视频画面;例如,第一用户所选择的互动动作为跳舞动作,则第一虚拟视频画面中所呈现的第一整体展示对象也会执行跳舞动作,第二虚拟视频画面中所呈现的第二整体展示对象也会执行跳舞动作;第一用户与第二用户均可以查看到该第一整体展示对象和第二整体展示对象正在跳舞的虚拟视频画面。
在本申请实施例中,每个互动动作对应一个动作执行时长,当到达互动动作的动作执行时长对应的时间时,第一终端将该第一虚拟视频画面中所呈现的与该第一虚拟对象信息相关联的虚拟对象模型,从第一整体展示对象恢复为该第一局部展示对象,并将该第二虚拟视频画面中所呈现的与该第二虚拟对象信息相关联的虚拟对象模型,从第二整体展示对象恢复为该第二局部展示对象。
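上述“切换为整体展示、到达动作执行时长后恢复局部展示”的逻辑,可以用一个简单的定时恢复来示意(Python;其中的动作名称、时长与展示状态均为示意性假设):

```python
import threading

class AvatarView:
    def __init__(self):
        self.mode = "partial"  # 局部展示对象(如仅展示头部区域)

    def play_interaction(self, action, duration_s):
        self.mode = "full"     # 切换为整体展示对象并执行互动动作
        print(f"整体展示对象执行互动动作: {action}")
        # 到达动作执行时长对应的时间时,恢复为局部展示对象
        threading.Timer(duration_s, self._restore).start()

    def _restore(self):
        self.mode = "partial"
        print("恢复为局部展示对象")

view = AvatarView()
view.play_interaction("跳舞", duration_s=10)  # 例如10秒的跳舞动作
```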
需要说明的是,第一终端在获得所选择的互动动作后,可以仅生成第一整体展示对象执行该互动动作的第一虚拟视频画面,也可以同时生成第一整体展示对象执行该互动动作的第一虚拟视频画面、以及第二整体展示对象执行该互动动作的第二虚拟视频画面。
参见图8,图8是本申请实施例提供的一种在视频虚拟通信中进行互动的场景示意图。如图8所示,用户终端A可以为图1中用户终端集群中的任一用户终端,例如,该用户终端A为用户终端100b。在用户终端A的第一视频通信界面8-1中,用户A与用户B在进行视频虚拟通信,且在该第一视频通信界面8-1中所呈现的虚拟对象模型为第一局部展示对象8-14。如图8所示,该第一视频通信界面8-1中包括有互动控件8-11,用户A在点击该互动控件8-11后,用户终端A可以展示互动动作列表8-12。如图8所示,该互动动作列表8-12中包括有跳舞互动动作、摸摸头互动动作、抱抱互动动作以及跺脚互动动作,用户A所点击的互动动作为跳舞动作。随后,用户终端A可以在该第一视频通信界面8-1中,展示该用户A对应的虚拟对象的第一整体展示对象8-13,且呈现该第一整体展示对象8-13执行跳舞动作的第一虚拟视频画面;随后,当到达跳舞动作的动作执行时长(例如,10s)对应的时间时,在第一视频通信界面8-1中,所呈现的虚拟对象模型会从第一整体展示对象8-13恢复为第一局部展示对象8-14。
继续参见图8,第一终端在实现视频虚拟通信的过程中,在第一视频通信界面8-1中还显示有关闭虚拟通信的控件,比如,关闭虚拟通信按钮。
本申请实施例中,为增加视频虚拟通信的丰富性与趣味性,第一终端还可以在用户进行视频虚拟通信时,对第一视频通信界面的背景进行切换。下面以第一终端切换第一视频通信界面的背景为例对视频虚拟通信中进行背景切换的过程进行说明。
需要说明的是,第一视频通信界面中包括素材切换控件,当第一用户点击针对背景的素材切换控件时,第一终端也就接收到了触发素材切换控件的素材切换操作;此时,第一终端可以响应该素材切换操作,展示配置素材列表;随后,第一用户可以选择任一配置素材作为目标素材,第一终端可以响应第一用户针对配置素材列表的素材选择操作,将视频通信界面的素材切换为目标素材;目标素材为素材选择操作所选择的素材,且目标素材包括静态素材(比如,静态背景图像)和动态素材(比如,动态背景图像)中的一种或两种。
参见图9a,图9a是本申请实施例提供的一种在视频虚拟通信中进行背景的素材切换的场景示意图。如图9a所示,用户终端A可以为图1中用户终端集群中的任一用户终端,例如,该用户终端A为用户终端100b。在用户终端A的第一视频通信界面9-1中,用户A与用户B在进行视频虚拟通信。并且,第一视频通信界面9-1中包括有素材切换控件9-11,用户A在点击该素材切换控件9-11后,用户终端A可以展示配置素材列表9-12。如图9a所示,该配置素材列表9-12中包括有街道素材、花花素材、星球素材以及公园素材,用户A所点击的配置素材为星球素材。随后,用户终端A将该用户终端A中的第一视频通信界面9-1的背景素材,切换为星球素材。
继续参见图9a,用户终端A中的第一视频通信界面9-1中包括视频虚拟通信的关闭控件9-13,当用户A点击该关闭控件9-13后,用户终端A可以响应该用户A的这一触发操作,在第一视频通信界面9-1输出时间提示信息9-14,以提示用户A视频虚拟通信模式在等待一定时长(例如,3s)后会关闭,且在到达等待时长对应的时间后,视频虚拟通信模式会关闭。
在本申请实施例中,在用户之间进行视频虚拟通信时,增加了用户之间的互动流程(例如,可以选择互动动作进行互动,可以将背景进行切换),从而可以增加视频通信的趣味性与互动性,提高视频通信质量。
参见图9b,图9b是本申请实施例提供的一种视频通信方法的交互示意图;如图9b所示,该视频通信方法包括S301至S310,描述了第一终端和第二终端通过交互实施视频虚拟通信的过程。
S301、第一终端响应针对视频虚拟通信控件的开启操作,向第二终端发送开启请求。
S302、第二终端响应开启请求,通过业务服务器向第一终端发送确认信息。
S303、第一终端响应针对第一视频通信界面的第一虚拟对象选择操作,获得第一虚拟对象信息。
S304、第二终端响应针对第二视频通信界面的第二虚拟对象选择操作,获得第二虚拟对象信息。
S305、第一终端将第一语音数据转换为与第一虚拟对象信息相关联的第一虚拟音频,并将第一虚拟音频、第一用户关键部位和第一虚拟对象信息发送至第二终端。
S306、第二终端将第二语音数据转换为与第二虚拟对象信息相关联的第二虚拟音频,并将第二虚拟音频、第二用户关键部位和第二虚拟对象信息发送至第一终端。
S307、第一终端基于第二用户关键部位和第二虚拟对象信息,生成第二虚拟视频画面,并基于第一用户关键部位和第一虚拟对象信息,生成第一虚拟视频画面。
S308、第二终端基于第一用户关键部位和第一虚拟对象信息,生成第一虚拟视频画面,并基于第二用户关键部位和第二虚拟对象信息,生成第二虚拟视频画面。
S309、第一终端同步播放第一虚拟视频画面和第一虚拟音频,并同步播放第二虚拟视频画面和第二虚拟音频。
S310、第二终端同步播放第一虚拟视频画面和第一虚拟音频,并同步播放第二虚拟视频画面和第二虚拟音频。
需要说明的是,上述生成第一虚拟视频画面和第二虚拟视频画面的执行主体仅为示例性说明,执行主体还可以是服务器,本申请实施例对此不作限定。
参见图10,图10是本申请实施例提供的一种系统架构图。如图10所示,该系统架构图中可以包括发送终端(对应于第一终端)与接收终端(对应于第二终端)。当发送终端用户(对应于第一用户)与接收终端用户(对应于第二用户)进入虚拟通信后,系统会切换为如图10所示的音频处理流程以及视频处理流程。
如图10所示,该发送终端10-1中,音频采集模块10-11,用于采集用户的语音数据。
音频预处理模块10-12,用于对语音数据进行音频预处理,例如,可以对音频进行声学回声消除(Acoustic Echo Cancellation,AEC)处理、噪音抑制(Automatic Noise Suppression,ANS)处理、自动增益控制(Automatic Gain Control,AGC)处理、以及静音检测处理,等等。
语音转换模块10-13,用于对预处理后的音频进行音色的转换。
音频编码模块10-14,用于对转换后的音频进行编码,得到编码文件。
音频打包模块10-15,用于对音频编码模块10-14得到的编码文件进行封装处理,得到音频数据流。
视频采集模块10-16,用于采集包含用户的视频数据。
视频预处理模块10-17,用于对采集到的视频数据进行预处理,例如,可以对视频进行转码,对视频的尺寸进行调整等处理。
关键部位提取模块10-18,用于对视频数据中用户的关键部位数据进行提取,也可以对视频数据中用户的表情进行追踪。
关键部位数据打包模块10-19,用于对关键部位提取模块10-18所提取到的关键部位数据进行打包封装,得到关键部位相关的数据流。
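由上述模块可见,发送终端侧实际上是“音频链路”与“关键部位链路”两条并行流水线。下面用一个极简骨架示意图10中发送侧的数据流向(Python;各模块均为占位实现,不代表真实的编码与封装格式):

```python
import json
import numpy as np

def capture_audio():        return np.random.randn(1600).astype(np.float32)  # 音频采集
def preprocess_audio(x):    return x            # 占位:AEC、噪声抑制、增益、静音检测
def convert_voice(x):       return x * 0.8      # 占位:音色转换(语音转换模块)
def encode_audio(x):        return x.tobytes()  # 占位:音频编码
def pack_audio(payload):    return {"type": "audio", "bytes": len(payload)}  # 音频打包

def capture_video():        return np.zeros((8, 8, 3))        # 视频采集(虚构小图)
def preprocess_video(img):  return img                         # 占位:转码、尺寸调整
def extract_keypoints(img): return [[0.45, 0.6], [0.55, 0.6]]  # 占位:关键部位提取
def pack_keypoints(kps):    return {"type": "keypoints", "data": kps}  # 数据打包

def sender_tick():
    """一次采集周期:产出发送到公网的两类数据流(以JSON字符串示意)。"""
    audio = pack_audio(encode_audio(convert_voice(preprocess_audio(capture_audio()))))
    keypoints = pack_keypoints(extract_keypoints(preprocess_video(capture_video())))
    return json.dumps(audio), json.dumps(keypoints)

print(sender_tick())
```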
如图10所示,该接收终端10-2中,网络收包模块10-21,用于接收发送终端10-1通过公网10-3所发送的音频数据流以及与关键部位相关联的数据流。
网络解包模块10-22,用于将音频数据流进行解封装。
数据解包模块10-23,用于将关键部位对应的数据流进行解封装。
音频解码模块10-24,用于将解封装后的音频文件进行解码处理。
音频渲染模块10-25,用于将解码后得到的数据进行渲染处理。
画面渲染模块10-26,包括3D引擎渲染单元10-261,用于将解码得到的关键部位数据进行3D渲染;还包括视频渲染单元10-262,用于渲染视频虚拟通信中的虚拟形象画面。
同步模块10-27,用于将渲染后的音频与画面进行同步处理。
视频采集模块10-28,用于采集接收终端10-2对应的用户的视频数据。
视频预处理模块10-29,用于对接收终端10-2的用户的视频数据进行预处理,例如,进行视频转码和尺寸调整等处理。随后,该视频预处理后的数据可以进行关键部位数据回显,并将关键部位数据输入至画面渲染模块10-26以进行画面渲染。
可以理解的是,本申请实施例通过增加视频虚拟通信的功能,可以为用户带来虚拟的视频通信体验,使得双方从人物真实形象转化为虚拟形象,从声音与画面两个维度来增加了用户在视频虚拟通信时的趣味性,可以很好地提高用户进行视频通信时的体验。
参见图11,图11是本申请实施例提供的一种视频通信装置的结构示意图。该视频通信装置可以是运行于计算机设备中的一个计算机程序(包括程序代码),例如该视频通信装置为一个应用软件;该视频通信装置可以用于执行图4所示的方法。如图11所示,视频通信装置11-1可以包括:虚拟对象获取模块11-11、画面输出模块11-12以及音频输出模块11-13。
虚拟对象获取模块11-11,配置为响应针对第一视频通信界面的第一虚拟对象选择操作,获得第一虚拟对象信息,所述第一视频通信界面为所述电子设备和所述电子设备的对端进行视频通信的界面;
画面输出模块11-12,配置为在所述第一视频通信界面中,展示第一虚拟视频画面以及第二虚拟视频画面,所述第一虚拟视频画面与所述第一虚拟对象信息以及第一用户关键部位相关联,所述第二虚拟视频画面与第二虚拟对象信息以及第二用户关键部位相关联,所述第二虚拟对象信息是所述对端响应针对第二视频通信界面的第二虚拟对象选择操作获得的;
音频输出模块11-13,配置为播放目标虚拟音频,所述目标虚拟音频包括第一虚拟音频和第二虚拟音频中的一种或两种,所述第一虚拟音频与第一语音数据以及所述第一虚拟对象信息相关联,所述第二虚拟音频与第二语音数据以及所述第二虚拟对象信息相关联。
需要说明的是,虚拟对象获取模块11-11、画面输出模块11-12以及音频输出模块11-13对应的实现方式,可以参见图4中S101-S103对应的描述。
在本申请实施例中,第一视频通信界面包括视频虚拟通信控件;此时,虚拟对象获取模块11-11可以包括:对象展示单元11-111以及对象信息获取单元11-112。
对象展示单元11-111,配置为响应第一用户针对视频虚拟通信控件的触发操作,展示至少一个虚拟对象。
对象信息获取单元11-112,配置为响应针对至少一个虚拟对象的第一虚拟对象选择操作,从至少一个虚拟对象中获取所选择的虚拟对象,并将所选择的虚拟对象对应的信息确定为第一虚拟对象信息。
需要说明的是,对象展示单元11-111以及对象信息获取单元11-112对应的实现方式,可以参见图4中S101对应的描述。
继续参见图11,该视频通信装置11-1还可以包括:邀请提示输出模块11-14、开启请求发送模块11-15以及时间提示输出模块11-16。
邀请提示输出模块11-14,配置为响应针对所述视频虚拟通信控件的开启操作,在所述第一视频通信界面中展示邀请提示信息,所述邀请提示信息为针对所述视频虚拟通信控件的开启邀请信息,所述开启邀请信息为所述电子设备请求所述对端开启所述视频虚拟通信控件的提示消息。
开启请求发送模块11-15,配置为响应针对所述邀请提示信息的确定操作,向所述对端发送针对所述视频虚拟通信控件的开启请求。
时间提示输出模块11-16,配置为接收所述对端针对所述开启请求返回的确认信息,展示时间提示信息,所述时间提示信息为进入视频虚拟通信的等待时长。
在本申请实施例中,所述画面输出模块11-12,还配置为当到达所述等待时长对应的时间时,在所述第一视频通信界面中,展示所述第一虚拟视频画面以及所述第二虚拟视频画面。
需要说明的是,邀请提示输出模块11-14、开启请求发送模块11-15以及时间提示输出模块11-16对应的实现方式,可以参见图4中S102对应的描述。
继续参见图11,音频输出模块11-13可以包括:时间戳获取单元11-131、时间戳匹配单元11-132以及数据同步输出单元11-133。
时间戳获取单元11-131,配置为获取所述第一虚拟视频画面对应的第一画面时间戳,以及所述第一虚拟音频对应的第一语音时间戳;获取所述第二虚拟视频画面对应的第二画面时间戳,以及所述第二虚拟音频对应的第二语音时间戳。
时间戳匹配单元11-132,配置为在所述第一语音时间戳中,获取与所述第一画面时间戳具有时间匹配关系的第一目标语音时间戳,以及在所述第二语音时间戳中,获取与所述第二画面时间戳具有时间匹配关系的第二目标语音时间戳。
数据同步输出单元11-133,配置为在所述第一虚拟音频中,获取所述第一目标语音时间戳对应的第一待播放虚拟音频,以及在所述第二虚拟音频中,获取所述第二目标语音时间戳对应的第二待播放虚拟音频;播放所述第一待播放虚拟音频和所述第二待播放虚拟音频。
需要说明的是,时间戳获取单元11-131、时间戳匹配单元11-132以及数据同步输出单元11-133对应的实现方式,可以参见图4中S103对应的描述。
在本申请实施例中,第一虚拟视频画面中所呈现的与第一虚拟对象信息相关联的虚拟对象模型为第一局部展示对象,所述第二虚拟视频画面中所展示的与所述第二虚拟对象相关联的虚拟对象模型为第二局部展示对象。
继续参见图11,该视频通信装置11-1还可以包括:互动动作展示模块11-18、对象展示切换模块11-19以及互动动作执行模块11-20。
互动动作展示模块11-18,配置为响应针对所述第一视频通信界面的互动操作,展示虚拟对象互动动作列表。
对象展示切换模块11-19,配置为响应针对所述虚拟对象互动动作列表的动作选择操作,将所述第一局部展示对象切换为第一整体展示对象,以及将所述第二局部展示对象切换为第二整体展示对象。
互动动作执行模块11-20,配置为展示所述第一整体展示对象执行目标互动动作的画面,以及展示所述第二整体展示对象执行所述目标互动动作的画面,所述目标互动动作为所述动作选择操作所选择的互动动作。
需要说明的是,互动动作展示模块11-18、对象展示切换模块11-19以及互动动作执行模块11-20对应的实现方式,可以参见图7中S201-S203对应的描述。
继续参见图11,该视频通信装置11-1还可以包括:执行时间获取模块11-21以及对象展示恢复模块11-22。
执行时间获取模块11-21,配置为获取所述目标互动动作的动作执行时长。
对象展示恢复模块11-22,还配置为当到达所述动作执行时长对应的时间时,将所述第一整体展示对象恢复为所述第一局部展示对象,以及将所述第二整体展示对象恢复为所述第二局部展示对象。
需要说明的是,执行时间获取模块11-21以及对象展示恢复模块11-22对应的实现方式,可以参见图7中S203对应的描述。
继续参见图11,该视频通信装置11-1还可以包括:素材展示模块11-23以及素材切换模块11-24。
素材展示模块11-23,配置为响应针对所述第一视频通信界面的素材切换操作,展示配置素材列表。
素材切换模块11-24,配置为响应针对所述配置素材列表的素材选择操作,将所述第一视频通信界面的素材切换为目标素材,所述目标素材为所述素材选择操作所选择的素材,所述目标素材包括静态素材和动态素材中的一种或两种。
需要说明的是,素材展示模块11-23以及素材切换模块11-24对应的实现方式,可以参见图7中S203对应的描述。
在本申请实施例中,第一虚拟对象信息包括第一虚拟对象模型;第二虚拟对象信息包括第二虚拟对象模型;继续参见图11,画面输出模块11-12可以包括:关键部位获取单元11-121和画面输出单元11-122。
关键部位获取单元11-121,配置为获取所述第一虚拟对象模型中的第一虚拟关键部位,以及获取所述第一用户关键部位,所述第一虚拟关键部位与所述第一用户关键部位属于相同的部位类型。
画面输出单元11-122,配置为根据所述第一虚拟关键部位以及所述第一用户关键部位,在所述第一视频通信界面中,展示所述第一虚拟视频画面。
关键部位获取单元11-121,还配置为获取所述第二虚拟对象模型中的第二虚拟关键部位,以及获取所述第二用户关键部位,所述第二虚拟关键部位与所述第二用户关键部位属于相同的部位类型。
画面输出单元11-122,还配置为根据所述第二虚拟关键部位以及所述第二用户关键部位,在所述第一视频通信界面中,展示所述第二虚拟视频画面。
需要说明的是,关键部位获取单元11-121、画面输出单元11-122对应的实现方式,可以参见图4中S102对应的描述。
继续参见图11,画面输出单元11-122可以包括:动作状态获取子单元11-1221、模型坐标获取子单元11-1222、模型坐标确定子单元11-1223、动作状态调整子单元11-1224以及第一输出子单元11-1225。
动作状态获取子单元11-1221,配置为获取所述第一用户关键部位对应的部位动作状态。
模型坐标获取子单元11-1222和模型坐标确定子单元11-1223,配置为根据所述第一虚拟关键部位处于第一临界状态时的第一模型位置坐标、以及所述第一虚拟关键部位处于第二临界状态时的第二模型位置坐标,确定所述第一虚拟关键部位处于所述部位动作状态时的目标模型位置坐标。
动作状态调整子单元11-1224,配置为在所述第一虚拟对象模型中,将所述第一虚拟关键部位调整至所述目标模型位置坐标处,得到第一目标虚拟对象模型,所述第一目标虚拟对象模型中的所述第一虚拟关键部位用于覆盖所述第一用户关键部位。
第一输出子单元11-1225,配置为在所述第一视频通信界面中,展示包含所述第一目标虚拟对象模型的第一虚拟视频画面。
需要说明的是,动作状态获取子单元11-1221、模型坐标获取子单元11-1222、模型坐标确定子单元11-1223、动作状态调整子单元11-1224以及第一输出子单元11-1225对应的实现方式,可以参见图4中S102对应的描述。
继续参见图11,画面输出单元11-122可以包括:二维坐标获取子单元11-1226、三维坐标确定子单元11-1227以及目标状态确定子单元11-1228。
二维坐标获取子单元11-1226和三维坐标确定子单元11-1227,配置为获取所述第一用户关键部位与所述第一虚拟关键部位之间的位置映射关系。
目标状态确定子单元11-1228,配置为在所述第一虚拟对象模型中,根据所述位置映射关系,将所述第一虚拟关键部位的状态调整至所述第一用户关键部位的部位动作状态,得到第二目标虚拟对象模型;以及在所述第一视频通信界面中,输出包含所述第二目标虚拟对象模型的所述第一虚拟视频画面。
需要说明的是,二维坐标获取子单元11-1226、三维坐标确定子单元11-1227以及目标状态确定子单元11-1228对应的实现方式,可以参见图4中S102对应的描述。
继续参见图11,该视频通信装置11-1还可以包括:语音获取模块11-25、语音转换模块11-26以及数据发送模块11-27。
语音获取模块11-25,配置为获取所述第一语音数据和所述第一用户关键部位。
语音转换模块11-26,配置为将所述第一语音数据转换为与所述第一虚拟对象信息相关联的所述第一虚拟音频。
数据发送模块11-27,配置为将所述第一虚拟音频与所述第一用户关键部位发送至所述对端,以使所述对端在所述第二视频通信界面中展示与第一用户关键部位相关联的所述第一虚拟视频画面,并播放所述第一虚拟音频。
需要说明的是,语音获取模块11-25、语音转换模块11-26以及数据发送模块11-27对应的实现方式,可以参见图4中S103对应的描述。
在本申请实施例中,第一虚拟对象信息包括第一虚拟对象模型与音频处理模型;
语音转换模块11-26可以包括:预处理单元11-251、特征提取单元11-252、音色特征获取单元11-253以及特征融合单元11-254。
预处理单元11-251,配置为将所述第一语音数据进行语音预处理,得到过渡语音数据。
特征提取单元11-252,配置为通过所述音频处理模型,提取所述过渡语音数据的音 频特征。
音色特征获取单元11-253,配置为获取所述音频处理模型中与所述第一虚拟对象模型相关联的音色特征。
特征融合单元11-254,配置为将所述音频特征与所述音色特征进行融合,得到融合音频特征;根据所述融合音频特征,生成所述第一虚拟音频。
需要说明的是,预处理单元11-251、特征提取单元11-252、音色特征获取单元11-253以及特征融合单元11-254对应的实现方式,可以参见图4中S103对应的描述。
继续参见图11,预处理单元11-251可以包括:回声处理子单元11-2511、噪声抑制子单元11-2512以及静音检测子单元11-2513。
回声处理子单元11-2511,配置为从所述第一语音数据中,删除回声音频数据,得到回声过渡语音数据。
噪声抑制子单元11-2512,配置为对所述回声过渡语音数据中的噪声音频数据进行抑制处理,得到噪声过渡语音数据。
静音检测子单元11-2513,配置为从所述噪声过渡语音数据中,删除静音音频数据,得到所述过渡语音数据。
需要说明的是,回声处理子单元11-2511、噪声抑制子单元11-2512以及静音检测子单元11-2513的具体实现方式,可以参见图4中S103对应的描述。
在本申请实施例中,在视频通信应用中增加了虚拟对象,用户可以选择任一虚拟对象,从而可以从真实形象切换为所选择的虚拟对象的形象,而用户之间可以在形象为该虚拟对象时,进行通信对话;也就是说,本申请在用户进行视频通信时,可以将自身形象转换为虚拟形象进行通信,无需中断视频通信用户就可查看到虚拟形象;且在用户进行视频通信时,所听到的对方的音频也为虚拟音频,而非用户的原始语音数据,可以提高视频通信质量。应当理解,本申请丰富了视频通信应用中用户的形象,使得用户之间可以进行视频虚拟通信,且使得用户在展示虚拟数据时,可以正常维持视频通信的运行和显示;且在视频虚拟通信中,将用户的语音数据转换为了虚拟音频进行输出,可以提高视频通信质量。也就是说,本申请可以丰富视频通信的视频展示方式和趣味性,且在展示虚拟数据的同时可以维持视频通信的正常运行和显示,提高视频通信质量。
参见图12,图12是本申请实施例提供的一种计算机设备的结构示意图。如图12所示,用于视频通信的电子设备可以实施为计算机设备,此时,图11中的视频通信装置11-1可以应用于计算机设备12-00,计算机设备12-00可以包括:处理器12-01,网络接口12-04和存储器12-05,此外,上述计算机设备12-00还包括:用户接口12-03,和至少一个通信总线12-02。其中,通信总线12-02用于实现这些组件之间的连接通信。其中,用户接口12-03可以包括显示屏(Display)、键盘(Keyboard),可选用户接口12-03还可以包括标准的有线接口、无线接口。网络接口12-04可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器12-05可以是高速RAM存储器,也可以是非易失性存储器(Non-Volatile Memory),例如至少一个磁盘存储器。存储器12-05可选的还可以是至少一个位于远离前述处理器12-01的存储装置。如图12所示,作为一种计算机可读存储介质的存储器12-05中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在图12所示的计算机设备12-00中,网络接口12-04可提供网络通信功能;而用户接口12-03主要用于为用户提供输入的接口;而处理器12-01可以用于调用存储器12-05中存储的设备控制应用程序,以实现本申请实施例提供的视频通信方法。
本申请实施例中所描述的计算机设备12-00可执行图4中对该视频通信方法的描述,也可执行图11中该视频通信装置11-1对应的描述。
本申请实施例还提供了一种计算机可读存储介质,且上述计算机可读存储介质中存储有计算机设备12-00所执行的计算机程序,且上述计算机程序包括程序指令,当上述处理器执行上述程序指令时,能够执行图4对应的视频通信方法的描述。对于本申请所涉及的计算机可读存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
上述计算机可读存储介质可以是本申请实施例提供的视频通信装置或者上述计算机设备的内部存储单元,例如计算机设备的硬盘或内存。该计算机可读存储介质也可以是该计算机设备的外部存储设备,例如该计算机设备上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,该计算机可读存储介质还可以既包括该计算机设备的内部存储单元也包括外部存储设备。该计算机可读存储介质用于存储该计算机程序以及该计算机设备所需的其他程序和数据。该计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行本申请实施例提供的视频通信方法。
本申请实施例的说明书和权利要求书及附图中的术语“第一”和“第二”等是用于区别不同对象,而非用于描述特定顺序。此外,术语“包括”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、装置、产品或设备没有限定于已列出的步骤或模块,而是可选地还包括没有列出的步骤或模块,或可选地还包括对于这些过程、方法、装置、产品或设备固有的其他步骤单元。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例提供的方法及相关装置是参照本申请实施例提供的方法流程图和/或结构示意图来描述的,具体可由计算机程序指令实现方法流程图和/或结构示意图的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。这些计算机程序指令可提供到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或结构示意图一个方框或多个方框中指定的功能的装置。这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或结构示意图一个方框或多个方框中指定的功能。这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或结构示意一个方框或多个方框中指定的功能的步骤。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。
Claims (17)
- 一种视频通信方法,所述方法由电子设备执行,包括:响应针对第一视频通信界面的第一虚拟对象选择操作,获得第一虚拟对象信息,所述第一视频通信界面为所述电子设备和所述电子设备的对端进行视频通信的界面;在所述第一视频通信界面中,展示第一虚拟视频画面以及第二虚拟视频画面,所述第一虚拟视频画面与所述第一虚拟对象信息以及第一用户关键部位相关联,所述第二虚拟视频画面与第二虚拟对象信息以及第二用户关键部位相关联,所述第二虚拟对象信息是所述对端响应针对第二视频通信界面的第二虚拟对象选择操作获得的;播放目标虚拟音频,所述目标虚拟音频包括第一虚拟音频和第二虚拟音频中的一种或两种,所述第一虚拟音频与第一语音数据以及所述第一虚拟对象信息相关联,所述第二虚拟音频与第二语音数据以及所述第二虚拟对象信息相关联。
- 根据权利要求1所述的方法,其中,所述第一视频通信界面包括视频虚拟通信控件;所述响应针对第一视频通信界面的第一虚拟对象选择操作,获得第一虚拟对象信息,包括:响应针对所述视频虚拟通信控件的触发操作,展示至少一个虚拟对象;响应针对所述至少一个虚拟对象的所述第一虚拟对象选择操作,从所述至少一个虚拟对象中获取所选择的虚拟对象,将所选择的虚拟对象对应的信息,确定为所述第一虚拟对象信息。
- 根据权利要求2所述的方法,其中,所述方法还包括:响应针对所述视频虚拟通信控件的开启操作,在所述第一视频通信界面中展示邀请提示信息,所述邀请提示信息为针对所述视频虚拟通信控件的开启邀请信息,所述开启邀请信息为所述电子设备请求所述对端开启所述视频虚拟通信控件的提示消息;响应针对所述邀请提示信息的确定操作,向所述对端发送针对所述视频虚拟通信控件的开启请求;接收所述对端针对所述开启请求返回的确认信息,展示时间提示信息,所述时间提示信息为进入视频虚拟通信的等待时长;所述在所述第一视频通信界面中,展示第一虚拟视频画面以及第二虚拟视频画面,包括:当到达所述等待时长对应的时间时,在所述第一视频通信界面中,展示所述第一虚拟视频画面以及所述第二虚拟视频画面。
- 根据权利要求1至3任一项所述的方法,其中,当所述目标虚拟音频包括所述第一虚拟音频和所述第二虚拟音频时,所述播放目标虚拟音频,包括:获取所述第一虚拟视频画面对应的第一画面时间戳,以及所述第一虚拟音频对应的第一语音时间戳;获取所述第二虚拟视频画面对应的第二画面时间戳,以及所述第二虚拟音频对应的第二语音时间戳;在所述第一语音时间戳中,获取与所述第一画面时间戳具有时间匹配关系的第一目标语音时间戳,以及在所述第二语音时间戳中,获取与所述第二画面时间戳具有时间匹配关系的第二目标语音时间戳;在所述第一虚拟音频中,获取所述第一目标语音时间戳对应的第一待播放虚拟音频,以及在所述第二虚拟音频中,获取所述第二目标语音时间戳对应的第二待播放虚拟音频;播放所述第一待播放虚拟音频和所述第二待播放虚拟音频。
- 根据权利要求1至3任一项所述的方法,其中,所述第一虚拟视频画面中所展示的与所述第一虚拟对象信息相关联的虚拟对象模型为第一局部展示对象,所述第二虚拟视频画面中所展示的与所述第二虚拟对象相关联的虚拟对象模型为第二局部展示对象;所述方法还包括:响应针对所述第一视频通信界面的互动操作,展示虚拟对象互动动作列表;响应针对所述虚拟对象互动动作列表的动作选择操作,将所述第一局部展示对象切换为第一整体展示对象,以及将所述第二局部展示对象切换为第二整体展示对象;展示所述第一整体展示对象执行目标互动动作的画面,以及展示所述第二整体展示对象执行所述目标互动动作的画面,所述目标互动动作为所述动作选择操作所选择的互动动作。
- 根据权利要求5所述的方法,其中,所述方法还包括:获取所述目标互动动作的动作执行时长;当到达所述动作执行时长对应的时间时,将所述第一整体展示对象恢复为所述第一局部展示对象,以及将所述第二整体展示对象恢复为所述第二局部展示对象。
- 根据权利要求1至3任一项所述的方法,其中,所述方法还包括:响应针对所述第一视频通信界面的素材切换操作,展示配置素材列表;响应针对所述配置素材列表的素材选择操作,将所述第一视频通信界面的素材切换为目标素材,所述目标素材为所述素材选择操作所选择的素材,所述目标素材包括静态素材和动态素材中的一种或两种。
- 根据权利要求1至3任一项所述的方法,其中,所述第一虚拟对象信息包括第一虚拟对象模型,所述第二虚拟对象信息包括第二虚拟对象模型;所述在所述第一视频通信界面中,展示第一虚拟视频画面以及第二虚拟视频画面,包括:获取所述第一虚拟对象模型中的第一虚拟关键部位,以及获取所述第一用户关键部位,所述第一虚拟关键部位与所述第一用户关键部位属于相同的部位类型;根据所述第一虚拟关键部位以及所述第一用户关键部位,在所述第一视频通信界面中,展示所述第一虚拟视频画面;获取所述第二虚拟对象模型中的第二虚拟关键部位,以及获取所述第二用户关键部位,所述第二虚拟关键部位与所述第二用户关键部位属于相同的部位类型;根据所述第二虚拟关键部位以及所述第二用户关键部位,在所述第一视频通信界面中,展示所述第二虚拟视频画面。
- 根据权利要求8所述的方法,其中,所述根据所述第一虚拟关键部位以及所述第一用户关键部位,在所述第一视频通信界面中,展示所述第一虚拟视频画面,包括:获取所述第一用户关键部位对应的部位动作状态;根据所述第一虚拟关键部位处于第一临界状态时的第一模型位置坐标、以及所述第一虚拟关键部位处于第二临界状态时的第二模型位置坐标,确定所述第一虚拟关键部位处于所述部位动作状态时的目标模型位置坐标;在所述第一虚拟对象模型中,将所述第一虚拟关键部位调整至所述目标模型位置坐标处,得到第一目标虚拟对象模型,所述第一目标虚拟对象模型中的所述第一虚拟关键部位用于覆盖所述第一用户关键部位;在所述第一视频通信界面中,展示包含所述第一目标虚拟对象模型的所述第一虚拟视频画面。
- 根据权利要求8所述的方法,其中,所述根据所述第一虚拟关键部位以及所述第一用户关键部位,在所述第一视频通信界面中,展示所述第一虚拟视频画面,包括:获取所述第一用户关键部位与所述第一虚拟关键部位之间的位置映射关系;在所述第一虚拟对象模型中,根据所述位置映射关系,将所述第一虚拟关键部位的状态调整至所述第一用户关键部位的部位动作状态,得到第二目标虚拟对象模型;在所述第一视频通信界面中,输出包含所述第二目标虚拟对象模型的所述第一虚拟视频画面。
- 根据权利要求1至3任一项所述的方法,其中,当所述目标虚拟音频包括所述第一虚拟音频时,所述方法还包括:获取所述第一语音数据和所述第一用户关键部位;将所述第一语音数据转换为与所述第一虚拟对象信息相关联的所述第一虚拟音频;将所述第一虚拟音频与所述第一用户关键部位发送至所述对端,以使所述对端在所述第二视频通信界面中展示与第一用户关键部位相关联的所述第一虚拟视频画面,并播放所述第一虚拟音频。
- 根据权利要求11所述的方法,其中,所述第一虚拟对象信息包括第一虚拟对象模型与音频处理模型;所述将所述第一语音数据转换为与所述第一虚拟对象信息相关联的所述第一虚拟音频,包括:将所述第一语音数据进行语音预处理,得到过渡语音数据;通过所述音频处理模型,提取所述过渡语音数据的音频特征;获取所述音频处理模型中与所述第一虚拟对象模型相关联的音色特征;将所述音频特征与所述音色特征进行融合,得到融合音频特征;根据所述融合音频特征,生成所述第一虚拟音频。
- 根据权利要求12所述的方法,其中,所述将所述第一语音数据进行语音预处理,得到过渡语音数据,包括:从所述第一语音数据中,删除回声音频数据,得到回声过渡语音数据;对所述回声过渡语音数据中的噪声音频数据进行抑制处理,得到噪声过渡语音数据;从所述噪声过渡语音数据中,删除静音音频数据,得到所述过渡语音数据。
- 一种用于视频通信的电子设备,包括:处理器、存储器以及网络接口;所述处理器与所述存储器、所述网络接口相连,其中,所述网络接口用于提供网络通信功能,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行权利要求1-13任一项所述的视频通信方法。
- 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时,执行权利要求1-13任一项所述的视频通信方法。
- 一种视频通信装置,包括:虚拟对象获取模块,配置为响应针对第一视频通信界面的第一虚拟对象选择操作,获得第一虚拟对象信息,所述第一视频通信界面为所述电子设备和所述电子设备的对端进行视频通信的界面;画面输出模块,配置为在所述第一视频通信界面中,展示第一虚拟视频画面以及第二虚拟视频画面,所述第一虚拟视频画面与所述第一虚拟对象信息以及第一用户关键部位相关联,所述第二虚拟视频画面与第二虚拟对象信息以及第二用户关键部位相关联,所述第二虚拟对象信息是所述对端响应针对第二视频通信界面的第二虚拟对象选择操作获得的;音频输出模块,配置为播放目标虚拟音频,所述目标虚拟音频包括第一虚拟音频和第二虚拟音频中的一种或两种,所述第一虚拟音频与第一语音数据以及所述第一虚拟对象信息相关联,所述第二虚拟音频与第二语音数据以及所述第二虚拟对象信息相关联。
- 一种计算机程序产品,所述计算机程序产品包括计算机指令,所述计算机指令被用于视频通信的电子设备的处理器执行时,执行权利要求1-13任一项所述的视频通信方法。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/973,410 US20230047858A1 (en) | 2020-10-26 | 2022-10-25 | Method, apparatus, electronic device, computer-readable storage medium, and computer program product for video communication |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011156220.5 | 2020-10-26 | ||
CN202011156220.5A CN113395597A (zh) | 2020-10-26 | 2020-10-26 | 一种视频通讯处理方法、设备及可读存储介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/973,410 Continuation US20230047858A1 (en) | 2020-10-26 | 2022-10-25 | Method, apparatus, electronic device, computer-readable storage medium, and computer program product for video communication |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022089224A1 true WO2022089224A1 (zh) | 2022-05-05 |
Family
ID=77616479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/124089 WO2022089224A1 (zh) | 2020-10-26 | 2021-10-15 | 一种视频通信方法、装置、电子设备、计算机可读存储介质及计算机程序产品 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230047858A1 (zh) |
CN (1) | CN113395597A (zh) |
WO (1) | WO2022089224A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113395597A (zh) * | 2020-10-26 | 2021-09-14 | 腾讯科技(深圳)有限公司 | 一种视频通讯处理方法、设备及可读存储介质 |
CN113938336A (zh) * | 2021-11-15 | 2022-01-14 | 网易(杭州)网络有限公司 | 会议控制的方法、装置和电子设备 |
CN115237248B (zh) * | 2022-06-20 | 2024-10-15 | 北京有竹居网络技术有限公司 | 虚拟对象的展示方法、装置、设备、存储介质及程序产品 |
CN116319636A (zh) * | 2023-02-17 | 2023-06-23 | 北京字跳网络技术有限公司 | 基于虚拟对象的互动方法、装置、设备、存储介质及产品 |
CN115996303B (zh) * | 2023-03-23 | 2023-07-25 | 科大讯飞股份有限公司 | 视频生成方法、装置、电子设备和存储介质 |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103856390A (zh) * | 2012-12-04 | 2014-06-11 | 腾讯科技(深圳)有限公司 | 即时通讯方法及系统、通讯信息处理方法、终端 |
CN106817349A (zh) * | 2015-11-30 | 2017-06-09 | 厦门幻世网络科技有限公司 | 一种在通信过程中使通信界面产生动画效果的方法及装置 |
CN108377356A (zh) * | 2018-01-18 | 2018-08-07 | 上海掌门科技有限公司 | 基于虚拟画像的视频通话的方法与设备 |
US20180268589A1 (en) * | 2017-03-16 | 2018-09-20 | Linden Research, Inc. | Virtual reality presentation of body postures of avatars |
CN108880975A (zh) * | 2017-05-16 | 2018-11-23 | 腾讯科技(深圳)有限公司 | 信息显示方法、装置及系统 |
CN109740476A (zh) * | 2018-12-25 | 2019-05-10 | 北京琳云信息科技有限责任公司 | 即时通讯方法、装置和服务器 |
CN110213521A (zh) * | 2019-05-22 | 2019-09-06 | 创易汇(北京)科技有限公司 | 一种虚拟即时通信方法 |
WO2020056694A1 (zh) * | 2018-09-20 | 2020-03-26 | 华为技术有限公司 | 增强现实的通信方法及电子设备 |
CN113395597A (zh) * | 2020-10-26 | 2021-09-14 | 腾讯科技(深圳)有限公司 | 一种视频通讯处理方法、设备及可读存储介质 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110278140B (zh) * | 2018-03-14 | 2022-05-24 | 阿里巴巴集团控股有限公司 | 通讯方法及装置 |
CN110677610A (zh) * | 2019-10-08 | 2020-01-10 | Oppo广东移动通信有限公司 | 一种视频流控制方法、视频流控制装置及电子设备 |
- 2020-10-26: CN application CN202011156220.5A filed, published as CN113395597A (status: Pending)
- 2021-10-15: PCT application PCT/CN2021/124089 filed, published as WO2022089224A1 (status: Application Filing)
- 2022-10-25: US continuation US17/973,410 filed, published as US20230047858A1 (status: Pending)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114928755A (zh) * | 2022-05-10 | 2022-08-19 | 咪咕文化科技有限公司 | 一种视频制作方法、电子设备及计算机可读存储介质 |
CN114928755B (zh) * | 2022-05-10 | 2023-10-20 | 咪咕文化科技有限公司 | 一种视频制作方法、电子设备及计算机可读存储介质 |
WO2024193412A1 (zh) * | 2023-03-20 | 2024-09-26 | 抖音视界有限公司 | 视频通话方法、装置、电子设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN113395597A (zh) | 2021-09-14 |
US20230047858A1 (en) | 2023-02-16 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21884953; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/09/2023)
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21884953; Country of ref document: EP; Kind code of ref document: A1