CN111954063B - Content display control method and device for video live broadcast room

Info

Publication number
CN111954063B
Authority
CN
China
Prior art keywords
video
audio stream
target
information
virtual object
Prior art date
Legal status
Active
Application number
CN202010857464.XA
Other languages
Chinese (zh)
Other versions
CN111954063A (en)
Inventor
李�浩
王聪
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010857464.XA
Publication of CN111954063A
Application granted
Publication of CN111954063B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4722 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration

Abstract

The disclosure relates to a content display control method and device for a live video room. The method includes the following steps: receiving a first audio stream sent by a viewer client of the live video room; obtaining first target text information corresponding to the first audio stream by performing speech recognition on the first audio stream; obtaining a first target action group corresponding to the first target text information according to a preset correspondence between text information and action groups; and generating first animation image information of a virtual object based on the first target action group, and sending the first animation image information to each viewer client of the live video room.

Description

Content display control method and device for video live broadcast room
Technical Field
The disclosure relates to the technical field of computers and the Internet, and in particular to a content display control method and device for a live video room.
Background
With the development of network technology and terminal technology, live webcasting has become widespread and is now a very common form of entertainment in daily life.
In existing video live broadcasting, if animated expressions are to be added to the live video, pre-stored animations must be used, and the live broadcaster cannot interact with the virtual object in real time. For example, if a broadcaster wants a virtual object to make a welcome gesture in the live room, the broadcaster can only submit a pre-made animation of the virtual object to the server, and cannot control the virtual object's actions in real time.
Therefore, in the related art, real-time interaction between a live broadcaster and a virtual object cannot be realized, which degrades both the broadcasting and the viewing experience.
Disclosure of Invention
The disclosure provides a content display control method and device for a live video room, which realize real-time interaction between live-broadcast participants and a virtual object, thereby improving the broadcasting and viewing experience. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, a content display control method for a live video room is provided, including: receiving a first audio stream sent by a viewer client of a live video room; obtaining first target text information corresponding to the first audio stream by performing speech recognition on the first audio stream; obtaining a first target action group corresponding to the first target text information according to a preset correspondence between text information and action groups; and generating first animation image information of a virtual object based on the first target action group, and sending the first animation image information to each viewer client of the live video room.
Optionally, the obtaining of the first target text information corresponding to the first audio stream by performing speech recognition on the first audio stream includes: performing digital signal processing on the first audio stream to obtain uncompressed waveform data; splitting the waveform data according to a preset granularity; obtaining acoustic feature information corresponding to each segment of split waveform data using a preset transform algorithm; and decoding the acoustic feature information corresponding to each segment using a preset decoding algorithm to obtain the first target text information corresponding to the first audio stream.
Optionally, the obtaining, according to the preset correspondence between text information and action groups, of the first target action group corresponding to the first target text information includes: searching for a target keyword in the first target text information, where the target keyword is one or more keywords in a preset keyword set and the preset keyword set comprises the text information in the correspondence; and obtaining the first target action group corresponding to the target keyword from the correspondence.
Optionally, the first target action group includes a group of animation pictures of the virtual object, and the generating of the first animation image information of the virtual object based on the first target action group and the sending of the first animation image information to each viewer client of the live video room include: generating the first animation image information of the virtual object based on the group of animation pictures included in the first target action group; synthesizing the first animation image information of the virtual object with the audio/video information currently to be sent; and sending the synthesized audio/video information to each viewer client of the live video room.
Optionally, before the receiving of the first audio stream sent by a viewer client of the live video room, the method further includes: sending image information of the virtual object to each viewer client of the live video room.
Optionally, a set of animated pictures of the virtual object is used to describe the motion of one or more key vertices of the virtual object.
Optionally, the method further comprises: receiving a second audio stream sent by the broadcaster client of the live video room; obtaining second target text information corresponding to the second audio stream by performing speech recognition on the second audio stream; obtaining a second target action group corresponding to the second target text information according to the correspondence; generating second animation image information of the virtual object based on the second target action group; and sending the second animation image information of the virtual object to each viewer client of the live video room.
According to a second aspect of the embodiments of the present disclosure, there is provided a content display control method for a live video room, including: receiving, by a viewer client of a live video room, an input first audio stream, where the first audio stream is used for controlling a virtual object on a playing interface of the live video room; sending the first audio stream to a server; receiving first animation image information returned by the server, where the first animation image information is generated by the server according to first target text information corresponding to the first audio stream; and playing the first animation image on the playing interface.
Optionally, receiving the first animation image information returned by the server includes: receiving synthesized audio/video information sent by the server, where the synthesized audio/video information is obtained after the server synthesizes the first animation image information with the audio/video information currently being sent to the viewer clients.
According to a third aspect of the embodiments of the present disclosure, there is provided a content display control apparatus for a live video room, including: a first receiving unit configured to receive a first audio stream sent by a viewer client of a live video room; a first obtaining unit configured to obtain first target text information corresponding to the first audio stream by performing speech recognition on the first audio stream; a second obtaining unit configured to obtain a first target action group corresponding to the first target text information according to a preset correspondence between text information and action groups; a generating unit configured to generate first animation image information of the virtual object based on the first target action group; and a first sending unit configured to send the first animation image information to each viewer client of the live video room.
Optionally, the obtaining by the first obtaining unit of the first target text information corresponding to the first audio stream by performing speech recognition on the first audio stream includes: performing digital signal processing on the first audio stream to obtain uncompressed waveform data; splitting the waveform data according to a preset granularity; obtaining acoustic feature information corresponding to each segment of split waveform data using a preset transform algorithm; and decoding the acoustic feature information corresponding to each segment using a preset decoding algorithm to obtain the first target text information corresponding to the first audio stream.
Optionally, the obtaining by the second obtaining unit of the first target action group corresponding to the first target text information according to the preset correspondence between text information and action groups includes: searching for a target keyword in the first target text information, where the target keyword is one or more keywords in a preset keyword set and the preset keyword set comprises the text information in the correspondence; and obtaining the first target action group corresponding to the target keyword from the correspondence.
Optionally, the first target action group includes a group of animation pictures of the virtual object, and the apparatus further comprises a synthesizing unit configured to synthesize the first animation image information of the virtual object with the audio/video information currently to be sent; the sending by the first sending unit to each viewer client of the live video room then includes: sending the synthesized audio/video information to each viewer client of the live video room.
Optionally, the first sending unit is further configured to send image information of the virtual object to each viewer client of the live video room before the first audio stream sent by a viewer client is received.
Optionally, the first receiving unit is further configured to receive a second audio stream sent by the broadcaster client of the live video room; the first obtaining unit is further configured to obtain second target text information corresponding to the second audio stream by performing speech recognition on it; the second obtaining unit is further configured to obtain a second target action group corresponding to the second target text information according to the correspondence; the generating unit is further configured to generate second animation image information of the virtual object based on the second target action group; and the first sending unit is further configured to send the second animation image information to each viewer client of the live video room.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a content display control apparatus for a live video room, including: a second receiving unit configured to receive an input first audio stream, where the first audio stream is used for controlling a virtual object on a playing interface of the live video room; a second sending unit configured to send the first audio stream to a server, the second receiving unit being further configured to receive first animation image information returned by the server, where the first animation image information is generated by the server according to first target text information corresponding to the first audio stream; and a playing unit configured to play the first animation image on the playing interface.
Optionally, the receiving by the second receiving unit of the first animation image information returned by the server includes: receiving synthesized audio/video information sent by the server, where the synthesized audio/video information is obtained after the server synthesizes the first animation image information with the audio/video information currently being sent to the viewer clients.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the content display control method for a live video room provided by the first aspect or the second aspect.
According to a sixth aspect of embodiments of the present disclosure, there is provided a storage medium, where instructions, when executed by a processor of an electronic device, enable the electronic device to execute the content display control method of the video live broadcast room provided in the first aspect or the second aspect.
According to a seventh aspect of the embodiments of the present disclosure, there is provided a computer program product, wherein when the instructions of the computer program product are executed by a processor of an electronic device, the electronic device is caused to execute any one of the above-mentioned content display control methods of the video live broadcast room.
The technical solution provided by the embodiments of the disclosure brings at least the following beneficial effects: when a first audio stream sent by a viewer client of a live video room is received, speech recognition is performed on the first audio stream to obtain the text information corresponding to the audio stream sent by the viewer client; a group of animation pictures corresponding to that text information is obtained according to a preset correspondence between text information and action groups; animation image information of a virtual object is generated based on the group of animation pictures; and the animation image information is sent to each viewer client of the live video room. The clients can thus display animation image information of the virtual object reflecting real-time interaction, so real-time interaction between live-broadcast participants and the virtual object is realized, improving the broadcasting and viewing experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method for content display control in a live video room in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating another method of controlling display of content in a live video room in accordance with an illustrative embodiment;
FIG. 3 is a flow diagram illustrating yet another method for controlling display of content in a live video room in accordance with an illustrative embodiment;
FIG. 4 is a block diagram illustrating a content display control apparatus of a live video room according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating another content display control apparatus of a live video room according to an exemplary embodiment;
FIG. 6 is a block diagram illustrating an apparatus in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating an apparatus in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a content display control method for a live video room according to an exemplary embodiment. The method may be used in a server and, as illustrated in Fig. 1, includes the following steps S11 to S15.
In step S11, a first audio stream sent by a viewer client of a live video room is received.
In practical applications, the server can establish audio connections with the broadcaster client and the viewer clients through the voice mic-linking function during the live broadcast. A live video room therefore carries two audio paths: broadcaster audio and viewer audio. Broadcaster audio is the broadcaster's voice collected by the broadcaster's user terminal, and viewer audio is a viewer's voice collected by that viewer's user terminal.
In the embodiments of the present disclosure, the user terminal may be a terminal device, a mobile terminal, mobile user equipment, or the like, and communicates with the server via a network (wired or wireless). Mobile terminals and mobile user equipment include, but are not limited to, mobile phones and mobile computing devices, for example portable, pocket-sized, hand-held, or vehicle-mounted devices; terminal devices include, but are not limited to, notebooks, desktop computers, laptops, and the like. This embodiment is not particularly limited in this respect.
In practical applications, viewer audio played out at the broadcaster's terminal may be picked up again by the broadcaster's microphone, causing echo. The broadcaster's user terminal may therefore filter the viewer audio out with a built-in hardware filter before sending its audio to the server, or the server may filter the broadcaster audio after receiving it; the embodiments of the present disclosure are not limited in this respect. Viewer audio may be processed similarly.
Therefore, in one possible implementation of this embodiment, after receiving in S11 the first audio stream (i.e., the viewer audio stream) sent by a viewer client, the server may filter the first audio stream to improve the accuracy of the subsequent speech recognition.
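The patent does not specify the filter; as a rough, minimal sketch, a simple speech band-pass filter in Python could stand in for this step, although a production system would more likely use adaptive acoustic echo cancellation:

```python
import numpy as np
from scipy.signal import butter, lfilter

def bandpass_speech(samples: np.ndarray, rate: int,
                    low_hz: float = 300.0, high_hz: float = 3400.0) -> np.ndarray:
    """Keep the typical speech band, attenuating rumble and playback bleed
    before the stream is handed to speech recognition. Illustrative only."""
    nyquist = rate / 2.0
    b, a = butter(4, [low_hz / nyquist, high_hz / nyquist], btype="band")
    return lfilter(b, a, samples)
```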
In step S12, first target text information corresponding to the first audio stream is obtained by performing speech recognition on the first audio stream.
In the embodiments of the present disclosure, because the server extracts the viewer audio stream and then recognizes the corresponding text information from the extracted stream, the amount of computation is reduced and real-time responsiveness is improved.
In one possible implementation, when performing speech recognition on the first audio stream, the first audio stream may be subjected to digital signal processing to obtain uncompressed waveform data; the waveform data is then split according to a preset granularity; acoustic feature information corresponding to each segment of split waveform data is obtained using a preset transform algorithm; and the acoustic feature information corresponding to each segment is decoded using a preset decoding algorithm to obtain the first target text information corresponding to the first audio stream. This possible implementation improves the accuracy of speech recognition.
In the above possible implementation, the preset granularity may be determined by the practical application and may, for example, be on the order of milliseconds. Because a split waveform segment is not directly describable in the time domain, a preset transform algorithm may be used to extract its acoustic features. The preset transform algorithm may include, but is not limited to, Mel-Frequency Cepstral Coefficients (MFCC), which converts each frame (i.e., each segment of split waveform data) into a multidimensional vector according to the physiological characteristics of human hearing; the vector contains the content information of that frame of speech. The first target text information corresponding to the first audio stream is then recognized by a decoding algorithm. The preset decoding algorithm may be an AI (artificial intelligence) neural-network-related algorithm, for example a hidden Markov model: a state network is constructed, the path that best matches the vectors is found in that network, and once the speech is understood with high probability it is converted into text and output.
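As a minimal sketch of this recognition pipeline, assuming the librosa library for MFCC extraction and a hypothetical `decoder` object standing in for the preset decoding algorithm (the patent names neither):

```python
import numpy as np
import librosa  # assumed here; the patent does not name a feature library

def audio_to_feature_vectors(pcm: np.ndarray, rate: int,
                             frame_ms: int = 25, hop_ms: int = 10) -> np.ndarray:
    """Split the uncompressed waveform at a preset millisecond granularity and
    convert each frame into a multidimensional acoustic vector via MFCC."""
    n_fft = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    mfcc = librosa.feature.mfcc(y=pcm.astype(np.float32), sr=rate,
                                n_mfcc=13, n_fft=n_fft, hop_length=hop)
    return mfcc.T  # shape (num_frames, 13): one vector per frame

def decode_to_text(frames: np.ndarray, decoder) -> str:
    """`decoder` is hypothetical: e.g. an HMM searching a state network for
    the path that best matches the frame vectors."""
    return decoder.best_path(frames)
```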
In step S13, a first target action group corresponding to the first target text information is obtained according to a preset correspondence between text information and action groups.
The virtual object may be any animated figure, for example a small pet or a Disney cartoon character; the 3D model information of the virtual object may be stored in the server in advance.
Optionally, the first target action group includes a group of animation pictures of a virtual object.
In the embodiments of the present disclosure, the server may store the correspondence between text information and action groups in advance; for example, a group of animation pictures of the virtual object is preset in the server for each piece of text information, thereby establishing the correspondence between text information and action groups.
For example, in the 3D modeling of a virtual object, a series of skeletal animations can be designed, i.e., movements of certain key vertices of the virtual object. The animation process is to create a skeleton and then map vertices onto its bones: a hand-joint movement, for instance, is the movement of the vertices on the hand joint, and a shoulder movement is the movement of the key vertices on the shoulder. For a 3D object there is one set of skin for rendering the model, but skeletal animation may require multiple sets of pictures. The skeletal animations corresponding to different pieces of text information can be preset in the server. Thus, in one possible implementation, a group of animation pictures of the virtual object is used to describe the motion of one or more key vertices of the virtual object.
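A minimal data-structure sketch of such an action group, with hypothetical names (the patent prescribes no concrete format):

```python
from dataclasses import dataclass, field

@dataclass
class Bone:
    name: str                # e.g. "hand_joint", "shoulder"
    key_vertices: list[int]  # indices of the mesh vertices mapped to this bone

@dataclass
class ActionGroup:
    """One preset action group: a named skeletal clip whose keyframes move
    the key vertices bound to each bone."""
    name: str
    bones: list[Bone]
    # one entry per animation picture: bone name -> (dx, dy, dz) offset
    keyframes: list[dict[str, tuple[float, float, float]]] = field(default_factory=list)
```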
In practical applications, to reduce storage space, the correspondence may record a state for each piece of text information and then record a group of animation pictures for each state. For example, "hello everyone" corresponds to state 1, and state 1 corresponds to a set of skeletal animation.
In the embodiments of the present disclosure, different pieces of text information may correspond to the same group of animation pictures; for example, "hello everyone" and "hello friends" may both correspond to the group of animation pictures used to show the virtual object making a welcome gesture. The content of the group of animation pictures corresponding to each piece of text information may be set according to the actual application; for example, the action may be a gesture whose meaning matches the text, or an expression corresponding to the text. The embodiments of the present disclosure are not limited in this respect.
In one possible implementation, the text information recognized from the first audio stream may not exactly match the text information in the correspondence; for example, the correspondence may record a short phrase while the recognized sentence contains that phrase embedded in longer text, so a group of animation pictures exactly matching the recognized text cannot be found. Therefore, in order to find a group of animation pictures corresponding to the recognized text information, in this possible implementation step S13 may first search for a target keyword in the first target text information, where the target keyword is one or more keywords in a preset keyword set and the preset keyword set comprises the text information in the correspondence, and then obtain the first target action group corresponding to the target keyword from the correspondence. Finding the target keyword in the first target text information and then looking up the corresponding target action group raises the success rate of matching an action group and reduces the number of text entries that must be stored in the correspondence; a minimal lookup sketch follows.
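The keywords, states, and clip names below are illustrative, not from the patent:

```python
from typing import Optional

# Hypothetical correspondence: keyword -> state id, state id -> action group.
KEYWORD_TO_STATE = {
    "hello everyone": 1,
    "hello friends": 1,   # different keywords may share one action group
    "thank you": 2,
}
STATE_TO_ACTION_GROUP = {1: "welcome_wave", 2: "bow"}

def find_target_action_group(recognized_text: str) -> Optional[str]:
    """Substring-match the preset keywords against the recognized text, so an
    exact match of the whole recognized sentence is not required."""
    for keyword, state in KEYWORD_TO_STATE.items():
        if keyword in recognized_text:
            return STATE_TO_ACTION_GROUP[state]
    return None  # no keyword hit: leave the virtual object unchanged
```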
In one possible implementation, the virtual object may be activated in the live room before step S11. In that case, before step S11, the method may further include: synthesizing the image information of the virtual object with the audio/video information currently to be sent, and sending the synthesized audio/video information to each viewer client of the live video room. In a specific application, the server may execute these steps after receiving an activation message sent by the broadcaster client or a viewer client, or may activate the virtual object when the live broadcast starts, i.e., display the virtual object in the live room from the outset; this embodiment is not limited in this respect.
In step S14, the first animation image information of the virtual object is generated based on the first target action group.
In this embodiment, for example, the server may add timing information to the group of animation pictures of the virtual object included in the first target action group to generate a piece of animation, i.e., the first animation image information.
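A minimal sketch of this step, assuming each animation picture is an encoded frame and a fixed frame rate (the patent specifies neither):

```python
def frames_to_animation(frames: list[bytes], fps: float = 24.0) -> list[tuple[float, bytes]]:
    """Attach a presentation timestamp to every picture in the action group,
    turning the static frame set into playable animation image information."""
    return [(index / fps, frame) for index, frame in enumerate(frames)]
```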
In step S15, the first animation image information of the virtual object is sent to each viewer client of the live video room.
In one possible implementation, when the first animation image information is sent, it may first be synthesized with the audio/video information currently to be sent, and the synthesized audio/video information is then sent to each viewer client of the live video room.
The audio/video information currently to be sent is the audio/video information currently being delivered to the live video room, i.e., the audio/video displayed at the clients, for example the broadcaster's video.
Here, a viewer client is a playback client of the live video. Sending the synthesized audio/video information to each viewer client of the live video room allows the animation of the virtual object to be displayed on the viewer side.
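A rough per-frame compositing sketch, assuming Pillow and RGBA frames (the patent does not name a compositor; a real system would composite inside the encoder pipeline):

```python
from PIL import Image

def composite_virtual_object(live_frame: Image.Image,
                             avatar_frame: Image.Image) -> Image.Image:
    """Paste one animation picture of the virtual object onto the live video
    frame at a fixed position, e.g. the lower-right corner."""
    out = live_frame.convert("RGBA")
    x = out.width - avatar_frame.width
    y = out.height - avatar_frame.height
    out.alpha_composite(avatar_frame.convert("RGBA"), (x, y))
    return out
```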
In one possible implementation, the method may further include: receiving a second audio stream sent by the broadcaster client of the live video room; obtaining second target text information corresponding to the second audio stream by performing speech recognition on it; obtaining, according to the correspondence, a second target action group corresponding to the second target text information, the second target action group including a group of animation pictures of the virtual object; generating second animation image information of the virtual object based on that group of animation pictures; synthesizing the second animation image information with the audio/video information currently to be sent; and sending the synthesized audio/video information to each viewer client of the live video room. That is, in this possible implementation the broadcaster can also control the actions of the virtual object by voice, so that the virtual object performs an action corresponding to the broadcaster's speech, for example a gesture matching the speech content. This lets viewers who cannot conveniently listen to the audio, or who have a hearing impairment, obtain the voice information output by the broadcaster.
In the foregoing possible implementation, the second audio stream may be processed in the same way as the viewer-side first audio stream described above; for the specific method, refer to the descriptions in the foregoing possible implementations, which are not repeated here.
In the technical solution provided by the embodiments of the disclosure, when a first audio stream sent by a viewer client of a live video room is received, speech recognition is performed on it to obtain the text information corresponding to the viewer's audio stream; a group of animation pictures corresponding to that text information is then obtained according to the preset correspondence between text information and action groups; animation image information of the virtual object is generated based on the group of animation pictures, synthesized with the audio/video information currently to be sent, and delivered to each viewer client of the live video room. The clients can therefore display audio/video in which the virtual object's real-time interactive animation is combined with the current live content, realizing real-time interaction between live-broadcast participants and the virtual object and improving the broadcasting and viewing experience.
The overall flow of real-time interaction between a viewer and the virtual object in a live video room according to the embodiments of the present disclosure is described below with reference to Fig. 2.
First, action groups corresponding to text information are preset in the server to establish the correspondence between text information and action groups. In the correspondence, each piece of text information may be one or more keywords, and each action group may include a group of animation pictures, which may describe a set of body movements or facial expressions of the virtual object.
When the virtual object has not yet been activated in the live video room, i.e., it is not displayed on the playing interface, in step S201 the broadcaster inputs an activation command for the virtual object through the broadcaster's user equipment (hereinafter, the broadcaster client), and the broadcaster client sends the activation command to the server.
For example, the live video interface may provide a menu for displaying the virtual object; when the broadcaster selects the menu, an activation command for the virtual object is sent to the server.
The virtual object may be any animated figure, such as a small pet or a cartoon character; an animation model of the virtual object is stored in the server in advance. In this embodiment, the virtual object may be a three-dimensional animated figure.
In one possible implementation, multiple virtual objects may be provided for users to choose from; for example, a selection button is provided in the live-video client app, and a user can select the virtual object of their liking.
In practical applications, a viewer in the live video room may also input the activation command for the virtual object through the viewer's user device (hereinafter, the viewer client); this embodiment is not limited in this respect.
In step S202, the server receives the virtual object activation command and generates animation image information of the virtual object.
In practical applications, the animation image information generated in step S202 may show the virtual object in a static pose or in a default animated state, for example a fist-clasping salute of the virtual object; this embodiment is not limited in this respect.
In step S203, the server synthesizes the animation image information of the virtual object with the audio/video information currently to be sent.
That is, the server merges the animation image information of the virtual object generated in step S202 into the live room's current audio/video information.
In one possible implementation, when synthesizing the animation image information of the virtual object with the audio/video information currently to be sent, the display position of the virtual object may be set to a fixed position, for example the lower-right corner of the playing interface; the specific position may be determined by the actual application.
In step S204, the server sends the synthesized audio/video information to the clients.
Here, a client is a playback client of the live video, for example a viewer client.
In step S205, the viewer client displays the synthesized audio/video information.
In step S206, if a viewer wants the virtual object to perform an action while watching the live video, the viewer may input audio through the viewer client (e.g., a user terminal with a microphone), and the viewer client sends the collected audio stream to the server.
In step S207, the server receives the audio stream sent by the viewer client.
In step S208, the server performs digital signal processing on the audio stream received from the viewer client to convert it into uncompressed waveform data.
In step S209, the server splits the waveform data at millisecond granularity. Because a split waveform segment has no descriptive capability in the time domain, acoustic features must be extracted from it: using a preset transform algorithm (e.g., MFCC), the server converts each frame into a multidimensional vector according to the physiological characteristics of human hearing, and the vector contains the content information of that frame of speech.
In step S210, decoding is performed. The server may construct a state network through an AI neural-network-related algorithm, such as a hidden Markov model, find the path in that network that best matches the vectors, and, once the speech is understood with high probability, convert it into text for output.
In step S211, the action group of the virtual object corresponding to the text obtained in step S210 is acquired from the preset correspondence.
In the 3D modeling of a virtual object, a series of skeletal animations (i.e., multiple groups of skeletal animations) may be designed. A skeletal animation abstracts the movements of certain key vertices, and the animation-creation process is to build a skeleton and then map vertices onto its bones: a hand-joint movement, for example, is the movement of the vertices on the hand joint, and a shoulder movement is the movement of the key vertices on the shoulder.
For a 3D figure, there is one set of skin for drawing the model, but skeletal animation may need multiple sets of pictures. For example, "hello everyone" corresponds to state 1, and state 1 corresponds to a set of skeletal animation, so each sentence spoken can be turned into a corresponding action of the 3D figure.
In step S212, animation image information of the virtual object is generated based on the action group obtained in step S211 and synthesized with the audio/video information currently to be sent.
In step S213, the audio/video information synthesized in step S212 is sent to each viewer client of the live video room.
In step S214, the viewer client displays the audio/video information obtained by synthesizing the animation image information of the virtual object with the audio/video information currently to be sent.
When the viewer side continues to control the virtual object in real time, steps S201 to S205 need not be repeated; the real-time control flow can start again from step S206.
Fig. 3 is a flowchart illustrating a content display control method for a live video room according to an exemplary embodiment. The method may be used in a client and, as illustrated in Fig. 3, includes the following steps S31 to S34.
In step S31, a viewer client of a live video room receives an input first audio stream, where the first audio stream is used to control a virtual object on the playing interface of the live video room.
For example, if a viewer of a live video room wants to control the actions of the virtual object on the playing interface, the viewer can input the first audio stream through an audio input device of the viewer client.
In step S32, the viewer client of the live video room sends the first audio stream to the server.
After receiving the first audio stream, the server may generate the first animation image information of the virtual object using the content display control method shown in Fig. 1 or Fig. 2 and send it to each viewer client of the live video room.
In step S33, the first animation image information returned by the server is received, where the first animation image information is generated by the server according to the first target text information corresponding to the first audio stream.
In one possible implementation, receiving the first animation image information returned by the server may include: receiving synthesized audio/video information sent by the server, where the synthesized audio/video information is obtained after the server synthesizes the first animation image information with the audio/video information currently being sent to the viewer clients. For details, refer to the related descriptions of the content display control methods shown in Fig. 1 and Fig. 2.
In step S34, the viewer client of the live video room plays the first animation image on the playing interface, so that viewers in the live video room can see the actions of the virtual object.
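A minimal client-side sketch of steps S31 to S34, assuming a plain HTTP upload with a placeholder endpoint (the patent specifies no transport; a real client would use the platform's streaming SDK):

```python
import requests  # assumed transport, not named by the patent

SERVER_URL = "https://live.example.com"  # placeholder endpoint

def control_virtual_object(room_id: str, pcm_bytes: bytes) -> bytes:
    """Viewer side: upload the captured first audio stream (S31-S32), then
    receive the synthesized audio/video carrying the animation (S33)."""
    resp = requests.post(
        f"{SERVER_URL}/rooms/{room_id}/viewer-audio",
        data=pcm_bytes,
        headers={"Content-Type": "application/octet-stream"},
    )
    resp.raise_for_status()
    return resp.content  # hand this to the player for step S34
```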
Fig. 4 is a block diagram illustrating a content display control apparatus of a video live room according to an exemplary embodiment. Referring to fig. 4, the apparatus 400 includes a first receiving unit 411, a first acquiring unit 412, a second acquiring unit 413, a generating unit 414, a synthesizing unit 415, and a first transmitting unit 416.
In this embodiment, the first receiving unit 411 is configured to receive a first audio stream sent by a viewer client of a live video room; the first obtaining unit 412 is configured to obtain first target text information corresponding to the first audio stream by performing speech recognition on the first audio stream; the second obtaining unit 413 is configured to obtain a first target action group corresponding to the first target text information according to a preset correspondence between text information and action groups; the generating unit 414 is configured to generate first animation image information of the virtual object based on the first target action group; and the first sending unit 416 is configured to send the first animation image information of the virtual object to each viewer client of the live video room.
In one possible implementation, the obtaining by the first obtaining unit 412 of the first target text information corresponding to the first audio stream by performing speech recognition on the first audio stream includes: performing digital signal processing on the first audio stream to obtain uncompressed waveform data; splitting the waveform data according to a preset granularity; obtaining acoustic feature information corresponding to each segment of split waveform data using a preset transform algorithm; and decoding the acoustic feature information corresponding to each segment using a preset decoding algorithm to obtain the first target text information corresponding to the first audio stream.
In one possible implementation, the obtaining by the second obtaining unit 413 of the first target action group corresponding to the first target text information according to the preset correspondence between text information and action groups includes: searching for a target keyword in the first target text information, where the target keyword is one or more keywords in a preset keyword set and the preset keyword set comprises the text information in the correspondence; and obtaining the first target action group corresponding to the target keyword from the correspondence.
In one possible implementation, the first target action group includes a group of animation pictures of the virtual object. As shown in Fig. 4, the apparatus may further include a synthesizing unit 415 configured to synthesize the first animation image information of the virtual object with the audio/video information currently to be sent; the sending by the first sending unit 416 to each viewer client of the live video room then includes: sending the synthesized audio/video information to each viewer client of the live video room.
In one possible implementation, the first sending unit 416 is further configured to send the image information of the virtual object to each viewer client of the live video room before the first audio stream sent by a viewer client is received.
In one possible implementation, the first receiving unit 411 is further configured to receive a second audio stream sent by the broadcaster client of the live video room; the first obtaining unit 412 is further configured to obtain second target text information corresponding to the second audio stream by performing speech recognition on it; the second obtaining unit 413 is further configured to obtain, according to the correspondence, a second target action group corresponding to the second target text information; the generating unit 414 is further configured to generate second animation image information of the virtual object based on the second target action group; the synthesizing unit 415 is further configured to synthesize the second animation image information with the audio/video information currently to be sent; and the first sending unit 416 is further configured to send the second animation image information to each viewer client of the live video room.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the corresponding method, and the apparatus has the same beneficial effects, which are not elaborated here.
Fig. 5 is a block diagram illustrating a content display control apparatus of a video live room according to an exemplary embodiment. Referring to fig. 5, the apparatus 500 includes a second receiving unit 511, a second transmitting unit 512, and a playing unit 513.
In this embodiment, the second receiving unit 511 is configured to receive an input first audio stream, where the first audio stream is used for controlling a virtual object on the playing interface of the live video room; the second sending unit 512 is configured to send the first audio stream to a server; the second receiving unit 511 is further configured to receive first animation image information returned by the server, where the first animation image information is generated by the server according to first target text information corresponding to the first audio stream; and the playing unit 513 is configured to play the first animation image on the playing interface.
In one possible implementation, the receiving by the second receiving unit 511 of the first animation image information returned by the server includes: receiving synthesized audio/video information sent by the server, where the synthesized audio/video information is obtained after the server synthesizes the first animation image information with the audio/video information currently being sent to the viewer clients.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the corresponding method, and the apparatus has the same beneficial effects, which are not elaborated here.
Fig. 6 illustrates a block diagram of an apparatus 600 for content display control for a video live room, according to an example embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an interface for input/output (I/O) 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A power supply component 606 provides power to the various components of the device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operating mode, such as a call mode, a record mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 may detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the apparatus 600; it may also detect a change in position of the apparatus 600 or of one of its components, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in its temperature. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described methods.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, such as a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 7 is a block diagram illustrating an apparatus 700 for content display control for a video live room, according to an example embodiment. For example, the apparatus 700 may be provided as a server. Referring to fig. 7, apparatus 700 includes a processing component 722 that further includes one or more processors and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. Further, the processing component 722 is configured to execute instructions to perform the above-described content display control method for the live video room.
The apparatus 700 may also include a power component 726 configured to perform power management of the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input/output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or a similar operating system.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the above embodiment of the content display control method for a live video room and achieves the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (17)

1. A content display control method for a video live broadcast room, characterized by comprising:
receiving a first audio stream sent by a viewer end of a video live broadcast room;
acquiring first target text information corresponding to the first audio stream by performing voice recognition on the first audio stream;
acquiring a first target action group corresponding to the first target text information according to a preset corresponding relation between text information and action groups;
generating first animation image information of a virtual object based on the first target action group, and sending the first animation image information to each audience of the video live broadcast room, wherein the virtual object is a three-dimensional animation image;
wherein the acquiring of the first target text information corresponding to the first audio stream by performing voice recognition on the first audio stream includes:
carrying out digital signal processing on the first audio stream to obtain uncompressed waveform data;
splitting the waveform data according to a preset granularity;
acquiring acoustic characteristic information corresponding to each section of split waveform data by adopting a preset conversion algorithm;
and decoding the acoustic characteristic information corresponding to each section of waveform data by adopting a preset decoding algorithm to obtain first target text information corresponding to the first audio stream.
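For illustration only, the recognition pipeline recited in claim 1 — digital signal processing into uncompressed waveform data, splitting by a preset granularity, a preset conversion algorithm for acoustic features, and a preset decoding algorithm — might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the 16 kHz PCM input, the frame and hop sizes standing in for the "preset granularity", the log-spectrum features, and the placeholder decoder are all assumptions.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed sample rate of the decoded audio stream
FRAME = 400            # hypothetical "preset granularity": 25 ms at 16 kHz
HOP = 160              # 10 ms hop between consecutive segments

def to_waveform(pcm_bytes: bytes) -> np.ndarray:
    """Digital signal processing step: turn the received audio stream
    into uncompressed waveform data (16-bit little-endian PCM assumed)."""
    return np.frombuffer(pcm_bytes, dtype="<i2").astype(np.float32) / 32768.0

def split(waveform: np.ndarray):
    """Split the waveform data according to the preset granularity."""
    for start in range(0, len(waveform) - FRAME + 1, HOP):
        yield waveform[start:start + FRAME]

def acoustic_features(segment: np.ndarray) -> np.ndarray:
    """Stand-in "preset conversion algorithm": a windowed log-magnitude
    spectrum (a real system might compute MFCCs or filter banks)."""
    spectrum = np.abs(np.fft.rfft(segment * np.hanning(len(segment))))
    return np.log(spectrum + 1e-8)

def decode_to_text(features: list) -> str:
    """Stand-in "preset decoding algorithm": a production system would
    run an acoustic model plus a language-model decoder here."""
    raise NotImplementedError("plug in a real ASR decoder")

def recognize(pcm_bytes: bytes) -> str:
    waveform = to_waveform(pcm_bytes)
    return decode_to_text([acoustic_features(s) for s in split(waveform)])
```

The claim leaves both the conversion and decoding algorithms open ("preset"), so any standard acoustic front end and decoder could fill these slots.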
2. The method according to claim 1, wherein the acquiring of the first target action group corresponding to the first target text information according to the preset corresponding relation between text information and action groups comprises:
searching target keywords in the first target text information, wherein the target keywords are one or more keywords in a preset keyword set, and the preset keyword set comprises text information in the corresponding relation;
and acquiring the first target action group corresponding to the target keyword in the corresponding relation.
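The keyword search of claim 2 reduces to a lookup against the preset correspondence. A minimal sketch, in which the keyword set, the action-group names, and the picture file names are all hypothetical:

```python
# Hypothetical preset correspondence between keywords (text information)
# and action groups; the entries are illustrative only.
ACTION_GROUPS = {
    "dance": ["dance_01.png", "dance_02.png", "dance_03.png"],
    "wave": ["wave_01.png", "wave_02.png"],
}

def find_target_action_group(target_text: str):
    """Search the recognized text for any keyword of the preset keyword
    set and return the corresponding action group, or None if no match."""
    for keyword, action_group in ACTION_GROUPS.items():
        if keyword in target_text:
            return action_group
    return None
```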
3. The method of claim 1, wherein the first target action group comprises a set of animated pictures of the virtual object;
generating first animation image information of the virtual object based on the first target action group, and sending the first animation image information to each audience of the video live broadcast room, wherein the first animation image information comprises:
generating first animation image information of the virtual object based on a group of animation pictures of the virtual object included in the first target action group;
synthesizing the first animation image information of the virtual object with the current audio and video information to be sent;
and sending the synthesized audio and video information to each audience terminal of the live video room.
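Claim 3 then composites the virtual object's animation frames into the audio and video information queued for delivery and fans the result out to every audience terminal. A rough sketch under assumptions: overlay is a placeholder compositor, the packet format is invented, and viewer.send stands in for whatever transport the live room actually uses:

```python
import numpy as np

def overlay(video_frame: np.ndarray, anim_frame: np.ndarray,
            alpha: float = 1.0) -> np.ndarray:
    """Placeholder compositor: blend an animation frame onto a video
    frame; a production mixer would honor the animation's alpha channel."""
    blended = (1 - alpha) * video_frame + alpha * anim_frame
    return blended.astype(video_frame.dtype)

def synthesize_and_send(anim_frames, video_frames, audio_bytes, viewer_ends):
    """Synthesize the animation image information with the audio and video
    information to be sent, then push the result to every audience terminal."""
    video_out = [overlay(v, a) for v, a in zip(video_frames, anim_frames)]
    packet = {"video": video_out, "audio": audio_bytes}
    for viewer in viewer_ends:   # each audience terminal of the live room
        viewer.send(packet)      # transport (e.g., an RTMP mux) omitted here
```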
4. The method of any one of claims 1 to 3, wherein prior to the receiving of the first audio stream sent by the viewer end of the live video room, the method further comprises:
and synthesizing the image information of the virtual object with the current audio and video information to be sent, and sending the synthesized audio and video information to each audience terminal of the live video room.
5. The method according to any one of claims 1 to 3, wherein a set of animated pictures of the virtual object is used to describe the motion of one or more key vertices of the virtual object.
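One way to picture claim 5's "motion of one or more key vertices" is as per-vertex position tracks, one sample per animated picture in the group. The structure below is purely illustrative; the vertex IDs and coordinates are invented:

```python
from dataclasses import dataclass

@dataclass
class KeyVertexTrack:
    """One key vertex of the virtual object and its position in each
    animated picture of the action group (illustrative (x, y, z) tuples)."""
    vertex_id: int
    positions: list  # one (x, y, z) tuple per animated picture

# A hypothetical "wave" action group described by two key-vertex tracks.
wave_action_group = [
    KeyVertexTrack(12, [(0.0, 1.0, 0.0), (0.1, 1.1, 0.0), (0.2, 1.2, 0.0)]),
    KeyVertexTrack(47, [(0.5, 0.9, 0.0), (0.5, 1.0, 0.0), (0.5, 1.1, 0.0)]),
]
```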
6. The method according to any one of claims 1 to 3, further comprising:
receiving a second audio stream sent by a main broadcasting end of the video live broadcasting room;
acquiring second target text information corresponding to the second audio stream by performing voice recognition on the second audio stream;
acquiring a second target action group corresponding to the second target text information according to the corresponding relation;
and generating second animation image information of the virtual object based on the second target action group, and sending the second animation image information to each audience of the video live broadcast room.
7. A content display control method for a video live broadcast room, characterized by comprising:
receiving, by a viewer end of a live video room, an input first audio stream, wherein the first audio stream is used for controlling a virtual object on a playing interface of the live video room, and the virtual object is a three-dimensional animation image;
sending the first audio stream to a server;
receiving first animation image information returned by the server, wherein the first animation image information is generated by the server according to first target text information corresponding to the first audio stream, the first target text information is obtained by the server performing digital signal processing on the first audio stream to obtain uncompressed waveform data, splitting the waveform data according to a preset granularity, obtaining acoustic feature information corresponding to each split segment of waveform data by adopting a preset conversion algorithm, and decoding the acoustic feature information corresponding to each segment of waveform data by adopting a preset decoding algorithm;
and playing the first animation image on the playing interface.
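Putting claim 7 together on the viewer end: capture the audio, upload it, receive the generated animation image information, and render it on the playing interface. A hypothetical sketch — the endpoint URL, the JSON response format, and the player object are assumptions, not part of the disclosure:

```python
import requests

UPLOAD_URL = "https://live.example.com/api/audio"  # invented endpoint

def control_virtual_object(room_id: str, pcm_bytes: bytes, player) -> None:
    """Send the input first audio stream to the server and play the
    returned animation on the live room's playing interface."""
    resp = requests.post(UPLOAD_URL, params={"room": room_id}, data=pcm_bytes)
    resp.raise_for_status()
    # The response is assumed to carry the first animation image
    # information generated from the recognized text.
    animation_info = resp.json()
    player.play_animation(animation_info)  # `player` is an assumed client object
```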
8. The method of claim 7, wherein receiving the first animation image information returned by the server comprises:
and receiving the synthesized audio and video information sent by the server, wherein the synthesized audio and video information is obtained by the server synthesizing the first animation image information with the audio and video information currently sent to the audience.
9. A content display control apparatus for a live video room, comprising:
a first receiving unit configured to receive a first audio stream sent by a viewer end of a video live broadcast room;
a first acquisition unit configured to perform voice recognition on the first audio stream to acquire first target text information corresponding to the first audio stream;
a second acquisition unit configured to acquire a first target action group corresponding to the first target text information according to a preset corresponding relation between text information and action groups;
a generating unit configured to generate first animation image information of a virtual object based on the first target action group, wherein the virtual object is a three-dimensional animation image;
a first sending unit configured to send the first animation image information to each audience of the video live broadcast room;
wherein the first acquisition unit acquiring the first target text information corresponding to the first audio stream by performing voice recognition on the first audio stream includes:
carrying out digital signal processing on the first audio stream to obtain uncompressed waveform data;
splitting the waveform data according to a preset granularity;
acquiring acoustic characteristic information corresponding to each section of split waveform data by adopting a preset conversion algorithm;
and decoding the acoustic characteristic information corresponding to each section of waveform data by adopting a preset decoding algorithm to obtain first target text information corresponding to the first audio stream.
10. The apparatus according to claim 9, wherein the second acquisition unit acquiring the first target action group corresponding to the first target text information according to the preset corresponding relation between text information and action groups includes:
searching target keywords in the first target text information, wherein the target keywords are one or more keywords in a preset keyword set, and the preset keyword set comprises text information in the corresponding relation;
and acquiring the first target action group corresponding to the target keyword in the corresponding relation.
11. The apparatus of claim 9, wherein the first target action group comprises a set of animated pictures of the virtual object;
the apparatus further comprises: a synthesis unit configured to synthesize the first animation image information of the virtual object with the current audio and video information to be sent;
and the first sending unit sending the first animation image information to each audience of the video live broadcast room comprises: sending the synthesized audio and video information to each audience terminal of the live video room.
12. The apparatus according to any of claims 9 to 11, wherein the first sending unit is further configured to perform sending the image information of the virtual object to each viewer end of the video live broadcast room before receiving the first audio stream sent by the viewer end of the video live broadcast room.
13. The apparatus according to any one of claims 9 to 11, wherein
the first receiving unit is further configured to receive a second audio stream sent by a main broadcasting end of the video live broadcast room;
the first acquisition unit is further configured to perform voice recognition on the second audio stream to acquire second target text information corresponding to the second audio stream;
the second acquisition unit is further configured to acquire a second target action group corresponding to the second target text information according to the corresponding relation;
the generating unit is further configured to generate second animation image information of the virtual object based on the second target action group;
and the first sending unit is further configured to send the second animation image information to each audience of the video live broadcast room.
14. A content display control apparatus for a live video room, comprising:
a second receiving unit configured to receive an input first audio stream, wherein the first audio stream is used for controlling a virtual object on a playing interface of the video live broadcast room, and the virtual object is a three-dimensional animation image;
a second transmitting unit configured to transmit the first audio stream to a server;
wherein the second receiving unit is further configured to receive first animation image information returned by the server, the first animation image information being generated by the server according to first target text information corresponding to the first audio stream, and the first target text information being obtained by the server performing digital signal processing on the first audio stream to obtain uncompressed waveform data, splitting the waveform data according to a preset granularity, acquiring acoustic feature information corresponding to each segment of the split waveform data by adopting a preset conversion algorithm, and decoding the acoustic feature information corresponding to each segment of the waveform data by adopting a preset decoding algorithm;
and a playing unit configured to play the first animation image information on the playing interface.
15. The apparatus of claim 14, wherein the second receiving unit receiving the first animation image information returned by the server comprises:
and receiving the synthesized audio and video information sent by the server, wherein the synthesized audio and video information is obtained by the server synthesizing the first animation image information with the audio and video information currently sent to the audience.
16. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the content display control method of a video live broadcast room according to any one of claims 1 to 6, or the content display control method of a video live broadcast room according to any one of claims 7 to 8.
17. A storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to execute the content display control method of a video live broadcast room according to any one of claims 1 to 6, or the content display control method of a video live broadcast room according to any one of claims 7 to 8.
CN202010857464.XA 2020-08-24 2020-08-24 Content display control method and device for video live broadcast room Active CN111954063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010857464.XA CN111954063B (en) 2020-08-24 2020-08-24 Content display control method and device for video live broadcast room

Publications (2)

Publication Number Publication Date
CN111954063A CN111954063A (en) 2020-11-17
CN111954063B (en) 2022-11-04

Family

ID=73360296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010857464.XA Active CN111954063B (en) 2020-08-24 2020-08-24 Content display control method and device for video live broadcast room

Country Status (1)

Country Link
CN (1) CN111954063B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114630135A (en) * 2020-12-11 2022-06-14 北京字跳网络技术有限公司 Live broadcast interaction method and device
CN112541959A (en) * 2020-12-21 2021-03-23 广州酷狗计算机科技有限公司 Virtual object display method, device, equipment and medium
CN113014935B (en) * 2021-02-20 2023-05-09 北京达佳互联信息技术有限公司 Interaction method and device of live broadcasting room, electronic equipment and storage medium
CN114071177B (en) * 2021-11-16 2023-09-26 网易(杭州)网络有限公司 Virtual gift sending method and device and terminal equipment
CN114415907B (en) * 2022-01-21 2023-08-18 腾讯科技(深圳)有限公司 Media resource display method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105959718A (en) * 2016-06-24 2016-09-21 乐视控股(北京)有限公司 Real-time interaction method and device in video live broadcasting
CN106162230A (en) * 2016-07-28 2016-11-23 北京小米移动软件有限公司 The processing method of live information, device, Zhu Boduan, server and system
CN106804007A (en) * 2017-03-20 2017-06-06 合网络技术(北京)有限公司 The method of Auto-matching special efficacy, system and equipment in a kind of network direct broadcasting
CN109618181A (en) * 2018-11-28 2019-04-12 网易(杭州)网络有限公司 Exchange method and device, electronic equipment, storage medium is broadcast live
CN110798696A (en) * 2019-11-18 2020-02-14 广州虎牙科技有限公司 Live broadcast interaction method and device, electronic equipment and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11736756B2 (en) * 2016-02-10 2023-08-22 Nitin Vats Producing realistic body movement using body images
CN106228436A (en) * 2016-08-26 2016-12-14 北京小米移动软件有限公司 Live platform virtual objects method to set up and device
CN106792246B (en) * 2016-12-09 2021-03-09 福建星网视易信息系统有限公司 Method and system for interaction of fusion type virtual scene
CN108076392A (en) * 2017-03-31 2018-05-25 北京市商汤科技开发有限公司 Living broadcast interactive method, apparatus and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant