CN118233665A - Live broadcast method and device, electronic equipment and storage medium

Live broadcast method and device, electronic equipment and storage medium

Info

Publication number
CN118233665A
Authority
CN
China
Prior art keywords
information
interaction
response
virtual image
live
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410190220.9A
Other languages
Chinese (zh)
Inventor
于鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN202410190220.9A
Publication of CN118233665A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/475End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/4756End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data for rating content, e.g. scoring a recommended movie
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4784Supplemental services, e.g. displaying phone caller identification, shopping application receiving rewards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • H04N2005/2726Means for inserting a foreground image in a background image, i.e. inlay, outlay for simulating a person's appearance, e.g. hair style, glasses, clothes

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a live broadcast method and device, an electronic device, and a storage medium, and belongs to the field of computer technology. The method comprises the following steps: acquiring interaction information of a live broadcast room, wherein the anchor in the live broadcast room is a virtual image (avatar) and the interaction information comprises at least one of interaction operation information and audience comment information; processing the interaction information through a large language model to obtain response information of the virtual image; and, in the live broadcast room, controlling the virtual image to respond according to the response information. Because the interaction information reflects the interaction situation of the live broadcast room and the response information matches the interaction information, the virtual image's response matches the interaction situation of the live broadcast room. This presents the effect of the virtual image adaptively interacting with the audience according to the interaction situation of the live broadcast room, improves the fidelity of the virtual image, and thereby improves the live broadcast effect.

Description

Live broadcast method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a live broadcast method, a live broadcast device, an electronic device, and a storage medium.
Background
In recent years, with the wide adoption of live broadcast functions and the development of artificial intelligence technology, performing live broadcasts with virtual images (avatars) in place of real people has become an emerging live broadcast mode.
In a live broadcast room, the avatar may speak, make an expression, or perform an action for the audience to watch. However, because the avatar's expressions and actions are set in advance, the avatar appears stiff, and the live broadcast effect of the room is poor.
Disclosure of Invention
The embodiments of the present application provide a live broadcast method and device, an electronic device, and a storage medium, which improve the fidelity of a virtual image and thereby improve the live broadcast effect. The technical scheme is as follows:
according to an aspect of the embodiment of the present application, there is provided a live broadcast method, including:
acquiring interaction information of a live broadcast room, wherein the anchor in the live broadcast room is a virtual image, the interaction information comprises at least one of interaction operation information and audience comment information, and the interaction operation information represents interaction operations occurring in the live broadcast room;
processing the interaction information through a large language model to obtain response information of the virtual image, wherein the response information is used for responding to the interaction information of the live broadcast room, and the response information comprises at least one of voice, expression, gesture, and body action;
and, in the live broadcast room, controlling the virtual image to respond according to the response information.
In one possible implementation manner, the acquiring the interaction information of the live broadcast room includes at least one of the following:
acquiring comment text sent by the audience in the live broadcast room;
acquiring comment voice sent by the audience in the live broadcast room, and converting the comment voice into comment text;
acquiring the voice of an anchor co-streaming with the virtual image, and converting the voice into text;
acquiring the co-streaming score of the virtual image in the live broadcast room;
acquiring the number of gifts given in the live broadcast room;
acquiring the number of likes of the live broadcast room;
and acquiring activity information corresponding to an ongoing activity in the live broadcast room.
In one possible implementation manner, the processing, through a large language model, the interaction information to obtain response information of the avatar includes at least one of the following:
processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and is matched with the character of the virtual image;
Processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and is matched with the emotion type of the interaction information;
And processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and meets response conditions, wherein the response conditions represent conditions which the response information of the avatar should meet.
In one possible implementation manner, the processing, by the large language model, the interactive information to obtain response information for responding to the interactive information and matching with the character of the avatar includes:
acquiring character features of the virtual image;
encoding the interaction information through the large language model to obtain interaction features;
processing the character features and the interaction features through the large language model to obtain response text which is used for responding to the interaction information and matches the character features;
and performing speech synthesis on the response text based on the character features to obtain response speech, wherein the response speech speaks the response text in a timbre matched with the character features.
In one possible implementation manner, the processing, by the large language model, the interactive information to obtain response information for responding to the interactive information and matching with the emotion type of the interactive information includes:
encoding the interactive information through the large language model to obtain interactive features, and carrying out emotion analysis on the interactive information to obtain emotion features, wherein the emotion features represent emotion types of the interactive information;
And processing the interaction characteristics and the emotion characteristics through the large language model to obtain response information which is used for responding to the interaction information and is matched with the emotion type of the interaction information.
In one possible implementation manner, the processing, through a large language model, the interaction information to obtain response information of the avatar includes:
And processing the interaction information and the corpus information through a large language model to obtain response information of the virtual image, wherein the corpus information is information for introducing the virtual image.
In one possible implementation manner, the processing, through a large language model, the interaction information to obtain response information of the avatar includes:
processing the interaction information through the large language model to obtain response information of the virtual image and sound effect types matched with the response information;
And in the live broadcasting room, controlling the avatar to respond according to the response information, wherein the method comprises the following steps:
and in the live broadcasting room, the virtual image is controlled to respond according to the response information, and meanwhile, the sound effect belonging to the sound effect type is played.
In one possible implementation, the method further includes:
acquiring sample data, wherein the sample data comprises sample interaction information and sample response information corresponding to the sample interaction information, and the sample response information represents a response required by an avatar in a live broadcast room containing the sample interaction information;
training the large language model based on the sample data.
According to another aspect of an embodiment of the present application, there is provided a live broadcast apparatus, including:
The information acquisition module is used for acquiring interaction information of a live broadcast room, wherein the anchor in the live broadcast room is a virtual image, the interaction information comprises at least one of interaction operation information and audience comment information, and the interaction operation information represents interaction operations occurring in the live broadcast room;
the processing module is used for processing the interaction information through a large language model to obtain response information of the virtual image, wherein the response information is used for responding to the interaction information of the live broadcast room, and the response information comprises at least one of voice, expression, gesture, and body action;
and the response module is used for controlling the virtual image to respond according to the response information in the live broadcast room.
In one possible implementation manner, the information acquisition module is configured to perform at least one of the following:
acquiring comment text sent by the audience in the live broadcast room;
acquiring comment voice sent by the audience in the live broadcast room, and converting the comment voice into comment text;
acquiring the voice of an anchor co-streaming with the virtual image, and converting the voice into text;
acquiring the co-streaming score of the virtual image in the live broadcast room;
acquiring the number of gifts given in the live broadcast room;
acquiring the number of likes of the live broadcast room;
and acquiring activity information corresponding to an ongoing activity in the live broadcast room.
In one possible implementation, the processing module includes at least one of:
The first processing unit is used for processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and is matched with characters of the virtual image;
the second processing unit is used for processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and is matched with the emotion type of the interaction information;
And the third processing unit is used for processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and meets response conditions, wherein the response conditions represent conditions which the response information of the avatar should meet.
In one possible implementation, the first processing unit is configured to:
acquiring character features of the virtual image; encoding the interaction information through the large language model to obtain interaction features; processing the character features and the interaction features through the large language model to obtain response text which is used for responding to the interaction information and matches the character features; and performing speech synthesis on the response text based on the character features to obtain response speech, wherein the response speech speaks the response text in a timbre matched with the character features.
In one possible implementation, the second processing unit is configured to:
Encoding the interactive information through the large language model to obtain interactive features, and carrying out emotion analysis on the interactive information to obtain emotion features, wherein the emotion features represent emotion types of the interactive information; and processing the interaction characteristics and the emotion characteristics through the large language model to obtain response information which is used for responding to the interaction information and is matched with the emotion type of the interaction information.
In one possible implementation, the processing module includes:
and the fourth processing unit is used for processing the interaction information and the corpus information through a large language model to obtain response information of the virtual image, wherein the corpus information is information for introducing the virtual image.
In one possible implementation, the processing module includes:
The fifth processing unit is used for processing the interaction information through the large language model to obtain response information of the virtual image and sound effect types matched with the response information;
The response module comprises:
And the response unit is used for controlling the virtual image to respond according to the response information and playing the sound effect belonging to the sound effect type in the live broadcast room.
In one possible implementation, the apparatus further includes:
The training module is used for acquiring sample data, wherein the sample data comprises sample interaction information and sample response information corresponding to the sample interaction information, and the sample response information represents a response required by an avatar in a live broadcast room containing the sample interaction information; training the large language model based on the sample data.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a processor and a memory, wherein the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so as to implement the live broadcast method in any one of the possible implementation manners.
According to another aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor to implement the live broadcast method described in any one of the possible implementations.
According to another aspect of the embodiments of the present application, there is provided a computer program product comprising computer program code, the computer program code being stored in a computer-readable storage medium, from which a processor reads the computer program code, the processor executing the computer program code to implement the live broadcast method described in any one of the possible implementations above.
According to the scheme provided by the embodiment of the present application, the interaction information of the live broadcast room is acquired, the interaction information is processed through the large language model to obtain the response information of the virtual image, and, in the live broadcast room, the virtual image is controlled to respond according to the response information. Because the interaction information reflects the interaction situation of the live broadcast room and the response information matches the interaction information, the virtual image's response matches the interaction situation of the live broadcast room. This presents the effect of the virtual image interacting with the audience of the live broadcast room, improves the fidelity of the virtual image, and thereby improves the live broadcast effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
Fig. 2 is a flowchart of a live broadcast method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a live broadcast device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a live broadcast server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
It is to be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another.
The terms "at least one", "a plurality", "each", "any" and the like as used herein, at least one includes one, two or more, a plurality includes two or more, each means each of the corresponding plurality, and any one means any of the plurality.
It should be noted that, the information (including, but not limited to, information for processing, stored information, presented information, etc.), the data (including, but not limited to, data for processing, stored data, presented data, etc.) related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use and processing of the related data are required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, the interactive information related to the application is obtained under the condition of full authorization.
Embodiments of the application are performed by an electronic device, which may comprise any type of device.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to Fig. 1, the implementation environment includes a terminal 101 and a live broadcast server 102, which are connected through a communication network.
Optionally, terminal 101 includes, but is not limited to, a smart phone, tablet, notebook, desktop, smart television, smart exercise device, and the like. The terminal 101 may access the live server 102 via the internet. Optionally, the terminal 101 has installed thereon a live application for accessing the live server 102. For example, the live application is a short video application, a shopping application, a blog application, or the like having a live function.
Optionally, the live server 102 is at least one of a live server, a live server cluster composed of a plurality of live servers, a cloud live server, a cloud computing platform, and a virtualization center, which is not limited in the embodiment of the present application.
The live broadcast server 102 creates an avatar and a live broadcast room corresponding to the avatar, controls the avatar to perform live broadcasting in the room, and transmits the avatar's live content to the terminal 101, which displays the live content in the live broadcast room so that the user of the terminal 101 can watch the avatar's live broadcast. During the live broadcast, the user of the terminal 101 may also interact with the avatar; for example, the user sends comment information in the live broadcast room and the avatar replies to it, or the user plays a game with the avatar in the live broadcast room.
Fig. 2 is a flowchart of a live broadcast method according to an embodiment of the present application. The execution body of the embodiment of the application is an electronic device, referring to fig. 2, the method includes the following steps:
201. The electronic device acquires interaction information of the live broadcast room, where the anchor in the live broadcast room is a virtual image.
The virtual image (avatar) is a virtual character or virtual object; it may be a realistic human image, a cartoon image, or another type of image, and may also be called a digital human, a virtual anchor, a virtual character, and the like. The avatar is displayed based on appearance information, which includes the avatar's hairstyle, clothing, makeup, worn accessories, and the like. The appearance information can be updated as required; for example, holiday-themed clothing is set for the avatar on a holiday. Different appearance information can be set for different avatars to highlight each avatar's individuality. Using computer graphics technology, the avatar is rendered in real time in the live broadcast room based on its appearance information, so that the avatar is displayed with that appearance and the display effect is realistic and natural.
The interaction information can reflect the current interaction condition of the live broadcasting room, the interaction information comprises at least one of interaction operation information and audience comment information, the interaction operation information represents interaction operation occurring in the live broadcasting room, and the audience comment information is comment information sent by an audience in the live broadcasting room and comprises comment voice or comment text and the like.
In one possible implementation manner, the electronic device includes a terminal and a live broadcast server; the terminal acquires the interaction information of the live broadcast room and sends it to the live broadcast server. In another possible implementation, the live broadcast server counts the interactive operations occurring in the live broadcast room, such as comment operations, like operations, co-streaming operations, and gift-giving operations, and obtains the interaction information based on the operations that occurred.
In another possible implementation, to enable the avatar to respond in time, the latest interaction information of the live broadcast room is acquired every target time period (for example, 5 seconds or 1 minute), so that the avatar's response information is determined based on the latest interaction information, as sketched below.
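A minimal sketch of this periodic acquisition, assuming hypothetical fetch_latest_interactions and generate_response helpers (the patent does not specify an API):

```python
import time
from dataclasses import dataclass, field

# Hypothetical container for one polling cycle's interaction information;
# the field set is illustrative, not from the patent.
@dataclass
class InteractionInfo:
    comments: list[str] = field(default_factory=list)
    gift_count: int = 0
    like_count: int = 0

def fetch_latest_interactions(room_id: str) -> InteractionInfo:
    """Placeholder: collect the interactions since the previous poll."""
    return InteractionInfo(comments=["Hello!"], gift_count=2, like_count=15)

def generate_response(info: InteractionInfo) -> str:
    """Placeholder for the large-language-model call of step 202."""
    return f"Thanks for the {info.gift_count} gifts and {info.like_count} likes!"

def poll_loop(room_id: str, target_period_s: float, max_cycles: int) -> None:
    # Acquire the latest interaction information every target time period
    # (e.g. 5 seconds or 1 minute) and derive the avatar's response from it.
    for _ in range(max_cycles):
        info = fetch_latest_interactions(room_id)
        print(generate_response(info))
        time.sleep(target_period_s)

poll_loop("room-001", target_period_s=5.0, max_cycles=3)
```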
202. The electronic equipment processes the interaction information through the large language model to obtain response information of the virtual image.
A large language model is a deep learning model trained on massive text data; it can not only generate natural-language text but also deeply understand text semantics, and it can handle various natural-language tasks such as summarization, question answering, and translation. The response information is used for responding to the interaction information of the live broadcast room and comprises at least one of voice, expression, gesture, and body action.
In one possible implementation, the large language model includes an encoder and a decoder: the interaction information is encoded by the encoder to obtain interaction features, and the interaction features are decoded by the decoder to obtain the response information. The large language model uses natural language processing technology to understand key information in the interaction information, such as intent, and generates the response information based on the interaction features.
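A minimal sketch of this encode-then-decode flow, with toy placeholder functions standing in for the trained encoder and decoder (the "intent" heuristic is illustrative, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class InteractionFeatures:
    intent: str  # key information the model extracts, e.g. the viewer's intent
    text: str

def encode(interaction_info: str) -> InteractionFeatures:
    # Placeholder encoder: a trained model would output a dense feature vector.
    intent = "greeting" if "hello" in interaction_info.lower() else "other"
    return InteractionFeatures(intent=intent, text=interaction_info)

def decode(features: InteractionFeatures) -> str:
    # Placeholder decoder: map the interaction features to response information.
    if features.intent == "greeting":
        return "Hello! Welcome to the live broadcast room!"
    return f"I saw your message: {features.text}"

print(decode(encode("Hello, avatar!")))
```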
In another possible implementation manner, different interaction information may be generated in the live broadcast room at different moments. Each time the electronic device acquires the interaction information of the current moment, it may also acquire the interaction information of the previous moment and process both together to obtain the response information for the current moment. Because the interaction information of the previous moment represents the historical interaction situation of the live broadcast room, it can serve as context for the interaction information of the current moment, and response information obtained by comprehensively considering the interaction information of multiple moments is more accurate.
For example, there is a contextual relation between the comment texts sent by the audience at different moments. Each time the avatar replies to the audience, it considers not only the comment text sent at the current moment but also the comment text sent at the previous moment and the avatar's previous reply text, that is, the context, and thereby generates the latest reply text, as in the sketch below.
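A minimal sketch of such context tracking, keeping a sliding window of (comment, reply) turns; the class name is invented for illustration and the model call is a placeholder:

```python
from collections import deque

class ContextualResponder:
    """Keeps a sliding window of (viewer comment, avatar reply) turns."""

    def __init__(self, max_turns: int = 10):
        self.history: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def build_prompt(self, current_comment: str) -> str:
        # Previous turns provide the context for the current comment.
        lines = [f"Viewer: {c}\nAvatar: {r}" for c, r in self.history]
        lines.append(f"Viewer: {current_comment}\nAvatar:")
        return "\n".join(lines)

    def reply(self, current_comment: str) -> str:
        prompt = self.build_prompt(current_comment)  # fed to the model in practice
        reply = f"(reply that considers {len(self.history)} earlier turns)"  # placeholder LLM call
        self.history.append((current_comment, reply))
        return reply

responder = ContextualResponder()
responder.reply("What's your favorite song?")
print(responder.reply("Can you sing it for us?"))  # uses the first turn as context
```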
In another possible implementation manner, the live broadcast room contains multiple audience members, and each audience member generates different interaction information. To realize personalized interaction between the avatar and each audience member, the interaction information of different audience members can be distinguished, and the interaction information corresponding to each audience member's account is processed through the large language model to obtain response information. For example, if two viewers send different comment texts in the live broadcast room, a different response text is generated for each comment text, and the avatar is subsequently controlled to speak the two response texts so as to reply to the two viewers respectively.
For example, to improve interactivity, the generated response information may further include the account of the viewer being addressed, so that the avatar can speak the account name when responding, giving the viewer the feeling of being noticed by the avatar.
For example, when the interaction information is stored, the corresponding account is stored in association with it, so that the interaction information of different accounts can be distinguished later. Alternatively, when the interaction information contains voices sent by audience members, the interaction information of different audience members can be identified by performing voiceprint extraction on the voice contained in each piece of interaction information.
In another possible implementation manner, the electronic device may also perform state management for each audience member, for example, recording the state of each audience member's account. The state includes an online state (watching the live broadcast in the live broadcast room) and an offline state (having exited the live broadcast room). The online state may be further divided into an interactive state, in which interactive operations are performed in the live broadcast room, and a non-interactive state, in which they are not. When processing the interaction information through the large language model, the state of the audience member's account can be obtained first, and the interaction information is processed to obtain response information only when the state is determined to be the online state, or the interactive state, as sketched below.
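A minimal sketch of this per-account state management and gating rule; the enum values and dictionary storage are illustrative:

```python
from enum import Enum, auto

class ViewerState(Enum):
    ONLINE_INTERACTIVE = auto()      # watching and has performed interactions
    ONLINE_NON_INTERACTIVE = auto()  # watching but not interacting
    OFFLINE = auto()                 # has left the live broadcast room

states: dict[str, ViewerState] = {}

def update_state(account: str, state: ViewerState) -> None:
    states[account] = state

def should_respond(account: str) -> bool:
    # Only process this account's interaction information when it is online;
    # a stricter policy could require the interactive state specifically.
    return states.get(account, ViewerState.OFFLINE) is not ViewerState.OFFLINE

update_state("viewer_42", ViewerState.ONLINE_INTERACTIVE)
update_state("viewer_7", ViewerState.OFFLINE)
print(should_respond("viewer_42"), should_respond("viewer_7"))  # True False
```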
It should be noted that, although the embodiments of the present application are described with respect to the avatar of one live broadcast room, in other embodiments the live broadcast server may create multiple avatars and perform live broadcasting through them in at least one live broadcast room. The live broadcast server may create a single large language model and process the interaction information of different live broadcast rooms through it to obtain the response information of the avatars in those rooms respectively. Alternatively, the live broadcast server creates a personalized large language model for each avatar, and each large language model processes the interaction information of its own live broadcast room to obtain the response information of its own avatar.
203. The electronic equipment controls the virtual image to respond according to the response information in the live broadcasting room.
Using computer graphics and real-time rendering technology, the avatar is controlled to make a response matched with the interaction information, presenting the effect that the avatar adaptively interacts with the audience according to the current interaction situation of the live broadcast room.
In one possible implementation manner, the electronic device includes a terminal and a live broadcast server. The live broadcast server obtains the avatar's response information through the large language model, controls the avatar to respond according to the response information so as to obtain live content in which the avatar responds accordingly, and publishes the live content to the live broadcast room so that the terminals in the room can output it.
The response information includes response voice and a response state, where the response state includes an expression, a gesture, a body action, or the like. Controlling the avatar to respond according to the response information to obtain the avatar's live content includes: obtaining the initial avatar; loading the response state onto the avatar to change its expression, gesture, body action, or the like, obtaining an updated avatar; creating a live broadcast picture that includes the updated avatar; and combining the live broadcast picture and the response voice into the live content, as in the sketch below.
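A minimal sketch of this composition step; rendering and speech synthesis are represented by strings, and all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ResponseInfo:
    voice_text: str            # text to synthesize as the response voice
    expression: str = "neutral"
    gesture: str = "none"
    body_action: str = "idle"

@dataclass
class LiveContent:
    frame: str   # stands in for a rendered live broadcast picture
    audio: str   # stands in for the synthesized response voice

def compose_live_content(resp: ResponseInfo) -> LiveContent:
    # 1. Load the response state onto the initial avatar -> updated avatar.
    updated_avatar = (f"avatar(expression={resp.expression}, "
                      f"gesture={resp.gesture}, action={resp.body_action})")
    # 2. Create a live broadcast picture containing the updated avatar.
    frame = f"frame[{updated_avatar}]"
    # 3. Combine the picture and the response voice into the live content.
    audio = f"tts('{resp.voice_text}')"
    return LiveContent(frame=frame, audio=audio)

print(compose_live_content(
    ResponseInfo("Thank you for the gift!", expression="happy", gesture="wave")))
```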
Publishing the live content to the live broadcast room includes: sending the live content to the terminals of the audience in the live broadcast room using low-latency transmission technology, so that the time difference between the avatar's response and the generation of the interaction information remains small and the avatar interacts with the audience in real time.
In one possible implementation, the response information includes at least one of:
1. Voice. If the response information comprises voice, the live content obtained by the live broadcast server comprises the voice, and the live content is published to the live broadcast room, so that the voice is played when a terminal in the room outputs the live content, presenting the effect that the avatar converses with the audience in the live broadcast room.
2. Expression. If the response information comprises an expression, the live broadcast server controls the avatar to make the expression; the obtained live content comprises the avatar making the expression, and the live content is published to the live broadcast room, so that the avatar making the expression is displayed when a terminal in the room outputs the live content.
3. Gesture. If the response information comprises a gesture, the live broadcast server controls the avatar to make the gesture; the obtained live content comprises the avatar making the gesture, and the live content is published to the live broadcast room, so that the avatar making the gesture is displayed when a terminal in the room outputs the live content.
4. Body action. If the response information comprises a body action, the live broadcast server controls the avatar to perform the body action; the obtained live content comprises the avatar performing the body action, and the live content is published to the live broadcast room, so that the avatar performing the body action is displayed when a terminal in the room outputs the live content.
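A minimal sketch of dispatching these four response types; the handlers are stubs for the controls described in items 1-4, and the dictionary format is an assumption:

```python
def play_voice(voice: str) -> None:
    print(f"[audio] playing response voice: {voice}")

def show_expression(expression: str) -> None:
    print(f"[render] avatar makes expression: {expression}")

def show_gesture(gesture: str) -> None:
    print(f"[render] avatar makes gesture: {gesture}")

def show_body_action(action: str) -> None:
    print(f"[render] avatar performs body action: {action}")

def apply_response(response: dict) -> None:
    # The response information includes at least one of voice, expression,
    # gesture, and body action; apply whichever parts are present.
    if "voice" in response:
        play_voice(response["voice"])
    if "expression" in response:
        show_expression(response["expression"])
    if "gesture" in response:
        show_gesture(response["gesture"])
    if "body_action" in response:
        show_body_action(response["body_action"])

apply_response({"voice": "Welcome!", "expression": "smile", "body_action": "bow"})
```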
According to the method provided by the embodiment of the present application, the interaction information of the live broadcast room is acquired, the interaction information is processed through the large language model to obtain the response information of the avatar, and, in the live broadcast room, the avatar is controlled to respond according to the response information. Because the interaction information reflects the interaction situation of the live broadcast room and the response information matches the interaction information, the avatar's response matches the interaction situation of the room. This presents the effect of the avatar adaptively interacting with the audience according to the interaction situation of the live broadcast room, improves the fidelity of the avatar, and improves the live broadcast effect.
In the related art, live broadcasting by a real-person anchor consumes a large amount of resources, including money, live broadcast equipment, venues, and support staff, so the live broadcast cost is high; moreover, a real-person anchor is limited by time and space and may not be able to meet the demand for long-duration live broadcasts. Real-person live broadcasting also depends heavily on the anchor's personal ability and state, which introduces uncertainty, and the live content may expose the anchor's personal information, causing information leakage. In the embodiment of the present application, live broadcasting is performed through the avatar, which saves live broadcast cost and easily meets the demand for long-duration broadcasts; the avatar responds according to the response information generated by the large language model, so its behavior is deterministic, the problem of an unstable state is avoided, and the problem of leaking a real anchor's personal information does not arise.
On the basis of the embodiment shown in fig. 2 described above, in one possible implementation, step 201 includes at least one of:
2011. Acquiring comment text sent by the audience in the live broadcast room.
For example, a viewer inputs comment text on the terminal, and the terminal sends the comment text to the live broadcast server. The live broadcast server publishes the comment text to the comment area of the live broadcast room, so that the comment text is displayed on the terminal of each viewer in the room, and the live broadcast server also uses the comment text as interaction information. The comment text represents what the viewer said to the avatar; the avatar is subsequently controlled to respond to it, realizing the effect of a dialogue between the avatar and the viewer.
2012. Acquiring comment voice sent by the audience in the live broadcast room, and converting the comment voice into comment text.
For example, a viewer inputs comment voice on the terminal, and the terminal sends the comment voice to the live broadcast server. The live broadcast server publishes the comment voice to the comment area of the live broadcast room, so that each viewer in the room can click it on their terminal to play it; the live broadcast server also converts the comment voice into comment text and uses the comment text as interaction information. The comment text represents what the viewer said to the avatar; the avatar is subsequently controlled to respond to it, realizing the effect of a dialogue between the avatar and the viewer.
2013. Acquiring the voice of an anchor co-streaming with the avatar, and converting the voice into text.
The avatar can co-stream (link microphones) with anchors of other live broadcast rooms and interact with them during the co-stream. The anchor's voice is therefore acquired and converted into text, and the text is used as interaction information. The text represents what the co-streaming anchor said; the avatar is subsequently controlled to respond to the text, realizing the effect of a dialogue between the avatar and the co-streaming anchor.
2014. Acquiring the co-streaming score of the avatar in the live broadcast room.
While the avatar co-streams with anchors of other live broadcast rooms, the co-streaming score of the avatar and the anchor is determined according to the gifts received by the avatar and the anchor, the number of audience members participating in the co-stream, the growth in audience numbers, the number of new viewers entering the live broadcast room, the popularity value of the room, and the like. The avatar's co-streaming score is acquired and used as interaction information. The score reflects the avatar's co-streaming situation, and the avatar is subsequently controlled to respond to it, for example, outputting speech about the score to call on the audience to actively participate in the co-streaming activity and raise the score, or making a happy or disappointed expression about the score to attract viewers, realizing the effect that the avatar responds according to the co-streaming situation.
2015. Acquiring the number of gifts given in the live broadcast room.
In the live broadcast room, the audience can give gifts to the avatar. The number of gifts is acquired and used as interaction information; it reflects the audience's gift-giving toward the avatar. The avatar is subsequently controlled to respond to the number of gifts, for example, outputting speech about it to call on the audience to actively give gifts, or making a happy or disappointed expression about it to attract viewers, realizing the effect that the avatar responds according to the audience's gift-giving situation.
2016. Acquiring the number of likes of the live broadcast room.
In the live broadcast room, the audience can perform like operations. The number of likes is acquired and used as interaction information; it reflects the audience's approval of the avatar's live content. The avatar is subsequently controlled to respond to the number of likes, for example, outputting speech about it to call on the audience to give likes, or making a happy or disappointed expression about it to attract viewers, realizing the effect that the avatar responds according to the like situation.
2017. Acquiring activity information corresponding to an ongoing activity in the live broadcast room.
The live broadcast room can hold activities such as interactive games, item promotions, and red envelope grabbing; the activity information represents the participation in or progress of the activity. By acquiring the activity information and using it as interaction information, the avatar can be controlled to respond to the activity. Activity information for an interactive game includes: the number of spectators participating in the game, the spectators' levels, the spectators' operations or remarks in the game, the spectators' game scores, the avatar's game score, and the like. Activity information for an item promotion includes: the accounts of viewers who purchased the item, the highest sales count of the item, the remaining duration of the activity, the item's discount information, and the like. Activity information for a red envelope grabbing activity includes: the total number of red envelopes, the number of remaining red envelopes, the remaining duration, the conditions a viewer must satisfy to be allowed to grab a red envelope, the number of resources contained in the red envelopes grabbed by the avatar, and the like. Based on the activity information, the avatar can be controlled to respond; for example, in an interactive game, the avatar performs the next game operation in response to a spectator's operation, or in a red envelope grabbing activity, the avatar grabs red envelopes together with the audience. The sketch below illustrates these structures.
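A minimal sketch of the three kinds of activity information as plain data structures; all field names are illustrative, not from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class InteractiveGameInfo:
    participant_count: int
    spectator_levels: list[int] = field(default_factory=list)
    spectator_scores: dict[str, int] = field(default_factory=dict)
    avatar_score: int = 0

@dataclass
class ItemPromotionInfo:
    buyer_accounts: list[str] = field(default_factory=list)
    top_sales_count: int = 0
    remaining_seconds: int = 0
    offer_description: str = ""

@dataclass
class RedEnvelopeInfo:
    total_envelopes: int = 0
    remaining_envelopes: int = 0
    remaining_seconds: int = 0
    eligibility_rule: str = "all viewers"
    avatar_grabbed_amount: int = 0

game = InteractiveGameInfo(participant_count=25, avatar_score=3)
print(game)
```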
In the embodiment of the present application, the electronic device can acquire any one, or at least two, of the above kinds of interaction information. Acquiring at least two kinds increases the amount of interaction data and reflects the current live broadcast situation of the room more accurately, so the avatar's response information determined from it is more accurate, which further improves the avatar's fidelity and the live broadcast effect.
On the basis of the above embodiment, in one possible implementation manner, the method further includes: acquiring sample data, where the sample data comprises sample interaction information and corresponding sample response information, and the sample response information represents the response an avatar should make in a live broadcast room containing the sample interaction information; and training the large language model based on the sample data.
The content of the sample interaction information is similar to that of the interaction information in the foregoing embodiment, and the content of the sample response information is similar to that of the response information, so they are not repeated here. The sample data may be set by a technician or collected from at least one live broadcast room. For example, a live broadcast room hosted by a real person serves as a sample live broadcast room; sample interaction information is collected in it at one or more moments, and the real anchor's response to that interaction information is collected as the corresponding sample response information. In this way, one or more pieces of sample data are collected, and the large language model is trained on them, as sketched below. The trained model has the ability to generate response information based on a live broadcast room's interaction information; because the sample data comes from real-person live broadcast rooms, the response information it generates is closer to a real person's response, so the avatar's responses to the room's interaction situation approach those of a real person, improving the avatar's fidelity.
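A minimal sketch of training on such (sample interaction, sample response) pairs; the lookup-table "model" is a toy stand-in for fine-tuning a large language model, which would really minimize a next-token prediction loss over the pairs:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    interaction: str   # e.g. a viewer comment observed in the sample room
    response: str      # what the real anchor said in reply

training_data = [
    Sample("How long have you been streaming?", "About two hours, welcome in!"),
    Sample("Nice song!", "Thank you! I will sing another one soon."),
]

class ToyModel:
    def __init__(self) -> None:
        self.memory: dict[str, str] = {}

    def train(self, samples: list[Sample]) -> None:
        # Stand-in for gradient-based fine-tuning on the sample pairs.
        for s in samples:
            self.memory[s.interaction] = s.response

    def respond(self, interaction: str) -> str:
        return self.memory.get(interaction, "Thanks for watching!")

model = ToyModel()
model.train(training_data)
print(model.respond("Nice song!"))
```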
In one possible implementation manner, the live broadcast server creates one large language model that is applicable to different avatars. In the sample data used to train this model, the sample response information is the response information of any avatar, and the avatars to which the sample response information of the different pieces of sample data belongs may be the same or different. Trained on these pieces of sample data, the large language model is applicable to different avatars and is thus general-purpose.
In another possible implementation, the live broadcast server creates a different large language model for each avatar, each applicable only to its corresponding avatar. In the sample data used to train such a model, the sample response information is the response information of the model's corresponding avatar, and the model is trained on one or more such pieces of sample data. This training method yields a personalized large language model and improves its accuracy.
Illustratively, a character is set for the avatar. In the sample data used to train the large language model, the sample response information is the response information of the model's corresponding avatar and represents the response that an avatar with that character should make in a live broadcast room containing the sample interaction information. Trained on one or more such pieces of sample data, the large language model generates response information that conforms to the avatar's character.
In the embodiment of the present application, the large language model is trained based on the sample data to improve its performance, so that it learns the association between interaction information and the avatar's response information. It thereby gains the ability to determine the avatar's response information from the interaction information of a live broadcast room, enabling interaction between the avatar and the audience.
On the basis of the above embodiment, in one possible implementation, step 202 includes at least one of the following:
2021. Processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and matches the character of the avatar.
In the embodiment of the present application, a character is set for the avatar. The character covers aspects such as language style, wording style, and sense of humor, and the avatar's speech, expressions, gestures, body actions, makeup, clothing, timbre, and the like all need to match the character. For example, a young, lively avatar has a brighter timbre, while a mature, older avatar has a deeper timbre. Therefore, when processing the interaction information through the large language model, the avatar's character must also be considered, so that the response information matches the character of the avatar.
For example, the response information includes voice, and the text contained in the voice matches the avatar's character, or the timbre of the voice matches the avatar's character. Alternatively, the response information includes at least one of an expression, a gesture, and a body action, and that expression, gesture, or body action matches the avatar's character.
In one possible implementation, step 2021 includes: acquiring character features of the avatar, where the character features represent the avatar's character; encoding the interaction information through the large language model to obtain interaction features; processing the character features and the interaction features through the large language model to obtain response text that responds to the interaction information and matches the character features; and performing speech synthesis on the response text based on the character features to obtain response speech, in which the response text is spoken in a timbre matched with the character features.
Illustratively, the large language model includes an encoder and a decoder. The electronic device obtains the avatar's character, encodes the interaction information through the encoder to obtain interaction features, obtains the character features corresponding to the character through the encoder, fuses the character features and the interaction features to obtain a fused feature, and decodes the fused feature through the decoder to obtain response text that responds to the interaction information and matches the character features, as in the sketch below.
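A minimal sketch of this fusion-by-concatenation step with toy encoders; the feature dimensions, character vectors, and decoding rule are all illustrative:

```python
CHARACTER_FEATURES = {
    # Toy character feature vectors; real features would be learned embeddings.
    "lively": [1.0, 0.8, 0.2, 0.1],
    "calm": [0.1, 0.2, 0.5, 0.4],
}

def encode_interaction(text: str, dim: int = 8) -> list[float]:
    # Placeholder encoder: a deterministic toy feature derived from the text.
    codes = [float(ord(c) % 97) / 96.0 for c in text[:dim]]
    return codes + [0.0] * (dim - len(codes))

def fuse(interaction_feat: list[float], character_feat: list[float]) -> list[float]:
    # Fusion by concatenation; a real model might use addition or attention.
    return interaction_feat + character_feat

def decode(fused: list[float]) -> str:
    # Placeholder decoder: pick a response style from the character portion.
    energy = sum(fused[-4:])
    return "Yay, let's keep the party going!" if energy > 1.5 else "Let us continue calmly."

fused = fuse(encode_interaction("hello avatar"), CHARACTER_FEATURES["lively"])
print(decode(fused))  # a lively character yields an energetic response text
```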
By acquiring the avatar's character features, the interaction information and the character features can be considered together when generating the response information, so that the response information matches both. This ensures that the avatar's response fits its character, gives the audience in the live broadcasting room the experience of interacting with an avatar that has a distinct character, and improves authenticity and user experience.
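As a rough illustration of the encode-fuse-decode flow just described, the sketch below uses toy stand-ins for the encoder, decoder, and speech synthesizer; every function here is a hypothetical placeholder showing only the data flow, not a real model.

```python
# Illustrative sketch only: fusing character features with interaction
# features before decoding (step 2031). A real system would use learned
# embeddings and a neural decoder; these toys only show the data flow.

def encode(text: str) -> list[float]:
    # Toy "encoder": character codes standing in for learned features.
    return [float(ord(c)) for c in text[:8]]

def fuse(character_feat: list[float], interaction_feat: list[float]) -> list[float]:
    # Element-wise fusion of the two feature vectors.
    n = min(len(character_feat), len(interaction_feat))
    return [character_feat[i] + interaction_feat[i] for i in range(n)]

def decode(fused: list[float]) -> str:
    # Placeholder decoder: a real model generates the reply here.
    return "Haha, great question -- let me show you!"

def synthesize_speech(text: str, timbre: str) -> bytes:
    # Placeholder TTS: render `text` in a timbre matched to the
    # avatar's character (e.g. "bright" for a young, lively avatar).
    return f"[{timbre}] {text}".encode()

character_feat = encode("young, lively, humorous")
interaction_feat = encode("Viewer: do you like this game?")
reply_text = decode(fuse(character_feat, interaction_feat))
reply_voice = synthesize_speech(reply_text, timbre="bright")
```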
In another possible implementation, the method further includes: displaying, in the live broadcasting room, an activity entry matching the avatar's character, so that audience members can participate in activities such as interactive games, item promotions, flash sales, and group chats. An activity entry matching the avatar's character means that the activity corresponding to the entry matches the character, i.e., it is an activity that a person with that character would take part in. For example, a cheerful, sunny avatar may play lively interactive games with the audience, while a calm avatar may organize group-chat activities for in-depth discussion and academic exchange.
In addition, other information matching the avatar's character may be displayed in the live broadcasting room, such as a matching pendant, an item purchase entry, or entries to other live broadcasting rooms that match the avatar's character.
2032. Processing the interaction information through the large language model to obtain response information that responds to the interaction information and matches the emotion type of the interaction information.
When the audience performs interactive operations in the live broadcasting room, the resulting comment information or interaction information carries the audience's emotion. To respond appropriately, response information matching the emotion type of the interaction information is generated through the large language model. The response information matches the emotion type in at least one of the following ways: the response information includes text matching the emotion type, the timbre of the voice in the response information matches the emotion type, or the tone of the voice matches the emotion type. For example, if the emotion type of a viewer's comment is negative, the avatar's response information includes text encouraging the viewer. Or, if a viewer gives a gift and the emotion type of the gift-giving operation is support for the live broadcast, the avatar's response information contains thank-you text, or the voice tone in the response is raised.
In one possible implementation, step 2032 includes: encoding the interaction information through the large language model to obtain interaction features, and performing emotion analysis on the interaction information to obtain emotion features, the emotion features representing the emotion type of the interaction information; and processing the interaction features and the emotion features through the large language model to obtain response information that responds to the interaction information and matches its emotion type.
The large language model includes an encoder and a decoder. The electronic device encodes the interaction information through the encoder to obtain interaction features, performs emotion analysis on the interaction information through the encoder to obtain emotion features representing its emotion type, fuses the emotion features with the interaction features to obtain fused features, decodes the fused features through the decoder to obtain a response text that responds to the interaction information and matches its emotion type, and performs speech synthesis on the response text to obtain response voice.
By considering the interaction information and its emotion type together when generating the response information, the response information matches both. The avatar's response thus matches the emotion of the interaction, presenting to the audience an avatar that perceives and responds to their feelings, which improves authenticity, narrows the distance between the avatar and the audience, and improves user experience.
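A minimal sketch of the emotion-matched response of step 2032 follows, using naive keyword matching in place of the model's emotion analysis; the word lists and replies are purely illustrative assumptions.

```python
# Illustrative sketch only: keyword-based emotion analysis standing in
# for the emotion features of step 2032.
NEGATIVE = {"sad", "tired", "upset", "bad"}
SUPPORTIVE = {"gift", "rocket", "love"}

def emotion_type(interaction: str) -> str:
    words = set(interaction.lower().split())
    if words & NEGATIVE:
        return "negative"
    if words & SUPPORTIVE:
        return "supportive"
    return "neutral"

def respond(interaction: str) -> str:
    # Match the reply text (and, in a full system, the voice tone)
    # to the detected emotion type.
    kind = emotion_type(interaction)
    if kind == "negative":
        return "Chin up -- tomorrow will be better!"
    if kind == "supportive":
        return "Thank you so much for the support!"
    return "Great to hear from you!"

print(respond("I feel sad today"))        # encouraging reply
print(respond("just sent you a gift"))    # thank-you reply
```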
2033. Processing the interaction information through the large language model to obtain response information that responds to the interaction information and satisfies a response condition, the response condition representing a condition that the avatar's response information should satisfy.
The response condition may be set by a technician on the electronic device and may restrict what the avatar says and the expressions, gestures, or body actions it makes. Requiring the generated response information to satisfy the response condition prevents the avatar from performing impermissible actions, ensures information security, and keeps the avatar's interactions compliant with regulations.
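The response condition might be enforced as a post-generation filter, as in the hedged sketch below; the banned terms, allowed actions, and fallback reply are examples only, not part of the disclosure.

```python
# Illustrative sketch only: enforcing response conditions (step 2033)
# after generation. Terms and actions are example placeholders set,
# as described above, by a technician.
BANNED_TERMS = {"forbidden_topic"}
ALLOWED_ACTIONS = {"wave", "nod", "smile"}

def satisfies_conditions(text: str, action: str) -> bool:
    has_banned = any(term in text.lower() for term in BANNED_TERMS)
    return not has_banned and action in ALLOWED_ACTIONS

def safe_response(text: str, action: str) -> tuple[str, str]:
    # Fall back to a neutral, compliant reply if the generated one
    # violates the response condition.
    if satisfies_conditions(text, action):
        return text, action
    return "Let's talk about something else!", "smile"

print(safe_response("Let me tell you about forbidden_topic", "wave"))
```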
2034. Processing the interaction information and corpus information through the large language model to obtain the avatar's response information.
To enrich the information contained in the response information, corpus information can be set for the avatar. The corpus information introduces the avatar and may include its profession, family background, hobbies and interests, life experiences, and the like; it may also include professional knowledge of specific fields, such as law or medicine, with different avatars belonging to different fields. Setting corpus information shapes the avatar into a personalized, well-rounded figure, giving the impression of a real person.
The large language model generates the avatar's response information by considering the interaction information and the corpus information together, which increases the information contained in the response and ensures that its content fits the avatar's background.
In one possible implementation, profile information of the avatar is displayed in the live broadcasting room. The profile information is extracted from the corpus information; it may be the avatar's background information, or a storyline contained in the corpus information that the avatar is currently experiencing. Displaying the profile information lets viewers entering the room quickly get to know the avatar. When a viewer asks the avatar a question, the avatar can obtain the answer from the corpus information and feed it back to the viewer as response information.
In one possible implementation, the corpus information or profile information of the avatar may be updated; for example, new storylines may be added from time to time, giving the avatar new experiences. The corpus information or profile information may be updated by a technician on the live broadcast server.
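For illustration, a naive keyword lookup over the corpus information is sketched below; a real implementation would let the large language model draw on the corpus, and the topics and answers here are invented.

```python
# Illustrative sketch only: answering a viewer question from the
# avatar's corpus information (step 2034). Entries are invented.
CORPUS = {
    "profession": "I'm a virtual singer and streamer.",
    "hobby": "I love stargazing and puzzle games.",
    "law": "For real legal advice, please consult a professional.",
}

def answer_from_corpus(question: str) -> str | None:
    q = question.lower()
    for topic, answer in CORPUS.items():
        if topic in q:
            return answer
    return None  # fall back to the language model when nothing matches

print(answer_from_corpus("What is your hobby?"))
```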
2035. Processing the interaction information through the large language model to obtain the avatar's response information and a sound effect type matching the response information.
Accordingly, controlling the avatar to respond according to the response information in the live broadcasting room includes: controlling the avatar to respond according to the response information while playing a sound effect belonging to the sound effect type.
In the live broadcasting room, sound effects may be played in addition to the avatar's response. To improve the live broadcast effect, the electronic device generates, through the large language model, not only the avatar's response information but also a matching sound effect type: for example, if the voice tone in the response is high, the sound effect type is a cheerful type; if the text contains professional terminology, the sound effect type is a solemn type. When the avatar responds, the sound effect played in the live broadcasting room therefore matches the avatar's response, avoiding a mismatch between the played effect and the avatar's manner of response and improving the audience's immersion.
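A hedged sketch of sound-effect selection matched to the response, as step 2035 describes; the pitch threshold, term list, and type names are assumptions, not disclosed values.

```python
# Illustrative sketch only: choosing a sound-effect type matched to
# the response (step 2035). Threshold and names are illustrative.
TECHNICAL_TERMS = {"statute", "diagnosis", "algorithm"}

def sound_effect_type(reply_text: str, voice_pitch: float) -> str:
    if any(t in reply_text.lower() for t in TECHNICAL_TERMS):
        return "solemn"       # subdued effect for professional content
    if voice_pitch > 0.7:     # normalized pitch; higher means brighter
        return "cheerful"
    return "neutral"

# The live broadcasting room would play an effect of this type while
# the avatar delivers the response.
print(sound_effect_type("Here comes the fun part!", voice_pitch=0.9))
```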
It should be noted that steps 2031-2035 above may be combined in any fashion to form alternative embodiments of the application. For example, the interaction information may be processed through the large language model to obtain response information that responds to the interaction information, matches both the avatar's character features and the emotion type of the interaction information, and also satisfies the response condition.
Fig. 3 is a schematic structural diagram of a live broadcast device according to an embodiment of the present application. Referring to fig. 3, the apparatus includes:
The information acquisition module 301 is configured to acquire interaction information of a live broadcasting room, wherein a host in the live broadcasting room is an avatar, the interaction information includes at least one of interaction operation information and audience comment information, and the interaction operation information represents an interaction operation occurring in the live broadcasting room;
The processing module 302 is configured to process the interaction information through the large language model to obtain response information of the avatar, where the response information is used to respond to the interaction information of the live broadcasting room and includes at least one of voice, expression, gesture, and body action;
And a response module 303, configured to control the avatar to respond according to the response information in the live broadcasting room.
According to the apparatus provided by the embodiment of the application, interaction information of the live broadcasting room is acquired, the interaction information is processed through the large language model to obtain the avatar's response information, and the avatar is controlled to respond according to that information in the live broadcasting room. Because the interaction information reflects the interaction situation of the live broadcasting room and the avatar's response information matches the interaction information, the avatar's response matches the room's interaction situation. This presents the effect of the avatar adaptively interacting with the audience according to the interaction situation of the live broadcasting room, improving the fidelity of the avatar and the live broadcast effect.
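Purely to make the module structure of Fig. 3 concrete, the following toy sketch wires the three modules together; the class names and the dictionary standing in for the live broadcasting room are hypothetical placeholders for the patented apparatus.

```python
# Illustrative sketch only: the three modules of Fig. 3 wired together.
# Class names and the dict-based "live room" are placeholders.
class InformationAcquisitionModule:
    def acquire(self, live_room: dict) -> str:
        return live_room.get("latest_comment", "")

class ProcessingModule:
    def process(self, interaction: str) -> str:
        # Stand-in for the large-language-model call.
        return f"Responding to: {interaction}"

class ResponseModule:
    def respond(self, live_room: dict, response: str) -> None:
        live_room["avatar_output"] = response

room = {"latest_comment": "Hello, avatar!"}
info = InformationAcquisitionModule().acquire(room)
ResponseModule().respond(room, ProcessingModule().process(info))
print(room["avatar_output"])  # Responding to: Hello, avatar!
```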
In one possible implementation, the information acquisition module 301 is configured to perform at least one of the following (aggregated in the illustrative sketch after this list):
obtaining comment text sent by the audience in the live broadcasting room;
obtaining comment voice sent by the audience in the live broadcasting room, and converting the comment voice into comment text;
acquiring the voice of the avatar anchor, and converting the voice into text;
acquiring the mic-link (co-streaming) score of the avatar in the live broadcasting room;
acquiring the number of gifts given in the live broadcasting room;
acquiring the number of likes of the live broadcasting room;
and acquiring activity information corresponding to an ongoing activity in the live broadcasting room.
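The signals enumerated above might be aggregated into a single structure before being handed to the processing module, as in this hedged sketch; all field names are invented for illustration.

```python
# Illustrative sketch only: aggregating the interaction signals listed
# above. Field names are invented placeholders.
from dataclasses import dataclass, field

@dataclass
class InteractionInfo:
    comment_texts: list[str] = field(default_factory=list)
    anchor_speech_text: str = ""
    mic_link_score: float = 0.0
    gift_count: int = 0
    like_count: int = 0
    activity_info: str = ""

info = InteractionInfo(comment_texts=["Sing another one!"],
                       gift_count=3, like_count=120)
```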
In one possible implementation, the processing module 302 includes at least one of:
The first processing unit is configured to process the interaction information through the large language model to obtain response information that responds to the interaction information and matches the character of the avatar;
The second processing unit is configured to process the interaction information through the large language model to obtain response information that responds to the interaction information and matches the emotion type of the interaction information;
And the third processing unit is configured to process the interaction information through the large language model to obtain response information that responds to the interaction information and satisfies a response condition, the response condition representing a condition that the avatar's response information should satisfy.
In one possible implementation, the first processing unit is configured to:
acquiring character features of the avatar; encoding the interaction information through the large language model to obtain interaction features; processing the character features and the interaction features through the large language model to obtain a response text that responds to the interaction information and matches the character features; and performing speech synthesis on the response text based on the character features to obtain response voice in which the response text is spoken in a timbre matching the character features.
In one possible implementation, the second processing unit is configured to:
encoding the interaction information through the large language model to obtain interaction features, and performing emotion analysis on the interaction information to obtain emotion features, the emotion features representing the emotion type of the interaction information; and processing the interaction features and the emotion features through the large language model to obtain response information that responds to the interaction information and matches its emotion type.
In one possible implementation, the processing module 302 includes:
and the fourth processing unit is used for processing the interaction information and the corpus information through the large language model to obtain response information of the virtual image, wherein the corpus information is information for introducing the virtual image.
In one possible implementation, the processing module 302 includes:
the fifth processing unit is used for processing the interaction information through the large language model to obtain response information of the virtual image and sound effect types matched with the response information;
the response module 303 includes:
and the response unit is used for playing the sound effect belonging to the sound effect type while controlling the virtual image to respond according to the response information in the live broadcast room.
In one possible implementation, the apparatus further includes:
the training module is used for acquiring sample data, wherein the sample data comprises sample interaction information and sample response information corresponding to the sample interaction information, and the sample response information represents a response required by an avatar in a live broadcast room containing the sample interaction information; a large language model is trained based on the sample data.
The embodiment of the application provides electronic equipment, which comprises a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to realize the live broadcast method in the embodiment.
Optionally, the electronic device is a terminal, and fig. 4 is a block diagram of a structure of a terminal 400 according to an embodiment of the present application. The terminal 400 may be a smart phone, tablet computer, notebook computer, desktop computer, smart television, smart exercise device, etc.
In general, the terminal 400 includes: a processor 401 and a memory 402.
Processor 401 may include one or more processing cores, such as a 4-core processor or an 8-core processor. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 402 may include one or more computer-readable storage media, which may be non-transitory. Memory 402 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 402 is used to store at least one computer program, which is executed by processor 401 to implement the live broadcast method provided by the method embodiments of the present application.
In some embodiments, the terminal 400 may further optionally include: a peripheral interface 403 and at least one peripheral. The processor 401, memory 402, and peripheral interface 403 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 403 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 404, a display screen 405, a camera assembly 406, an audio circuit 407, a positioning assembly 408, and a power supply 409.
Peripheral interface 403 may be used to connect at least one Input/Output (I/O) related peripheral to processor 401 and memory 402. In some embodiments, processor 401, memory 402, and peripheral interface 403 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 401, memory 402, and peripheral interface 403 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 404 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 404 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 404 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal.
The display screen 405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 405 is a touch display screen, the display screen 405 also has the ability to collect touch signals at or above the surface of the display screen 405. The touch signal may be input as a control signal to the processor 401 for processing. At this time, the display screen 405 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard.
The camera assembly 406 is used to capture images or video. Optionally, camera assembly 406 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. The audio circuit 407 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 401 for processing, or inputting the electric signals to the radio frequency circuit 404 for realizing voice communication.
The positioning component 408 is used to locate the current geographic location of the terminal 400 to enable navigation or LBS (Location Based Service). The positioning component 408 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 409 is used to power the various components in the terminal 400. The power supply 409 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
In some embodiments, the terminal 400 further includes one or more sensors 410. The one or more sensors 410 include, but are not limited to: acceleration sensor 411, gyro sensor 412, pressure sensor 413, optical sensor 414, and proximity sensor 415.
Those skilled in the art will appreciate that the structure shown in fig. 4 is not limiting of the terminal 400 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Optionally, the electronic device is a live broadcast server. Fig. 5 is a schematic structural diagram of a live broadcast server provided in the embodiment of the present application. The live broadcast server 500 may vary considerably in configuration or performance, and may include one or more processors 501 and one or more memories 502, where at least one program code is stored in the memories 502 and is loaded and executed by the processors 501 to implement the live broadcast method provided by each of the method embodiments described above. Of course, the live broadcast server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing device functions, which are not described in detail here.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the live broadcast method in the above-described embodiments. The computer readable storage medium may be a memory. For example, the computer readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory ), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage terminal, and the like.
In an exemplary embodiment, there is also provided a computer program product comprising computer program code stored in a computer-readable storage medium. A processor reads the computer program code from the computer-readable storage medium and executes it to implement the live broadcast method in the above embodiments.
In some embodiments, a computer program according to an embodiment of the present application may be deployed to be executed on one electronic device, or on a plurality of electronic devices located at one site, or on a plurality of electronic devices distributed at a plurality of sites and interconnected by a communication network, where the plurality of electronic devices distributed at the plurality of sites and interconnected by the communication network may constitute a blockchain system.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the above storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing is illustrative of the present application and is not to be construed as limiting thereof, but rather as various modifications, equivalent arrangements, improvements, etc., which fall within the spirit and principles of the present application.

Claims (12)

1. A live broadcast method, the method comprising:
acquiring interaction information of a live broadcasting room, wherein the anchor in the live broadcasting room is a virtual image, the interaction information comprises at least one of interaction operation information and audience comment information, and the interaction operation information represents an interaction operation occurring in the live broadcasting room;
Processing the interaction information through a large language model to obtain response information of the virtual image, wherein the response information is used for responding to the interaction information of the live broadcasting room, and the response information comprises at least one of voice, expression, gestures and body actions;
And in the live broadcasting room, the virtual image is controlled to respond according to the response information.
2. The method of claim 1, wherein the obtaining interaction information of the live room comprises at least one of:
obtaining comment text sent by the audience in the live broadcasting room;
obtaining comment voice sent by the audience in the live broadcasting room, and converting the comment voice into comment text;
acquiring the voice of the virtual image anchor, and converting the voice into text;
acquiring the mic-link score of the virtual image in the live broadcasting room;
acquiring the number of gifts given in the live broadcasting room;
acquiring the number of likes of the live broadcasting room;
and acquiring activity information corresponding to an ongoing activity in the live broadcasting room.
3. The method of claim 1, wherein the processing the interactive information through the large language model to obtain the response information of the avatar includes at least one of:
processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and is matched with the character of the virtual image;
Processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and is matched with the emotion type of the interaction information;
And processing the interaction information through the large language model to obtain response information which is used for responding to the interaction information and meets a response condition, wherein the response condition represents a condition that the response information of the virtual image should meet.
4. The method of claim 3, wherein the processing the interactive information through the large language model to obtain response information for responding to the interactive information and matching the character of the avatar comprises:
acquiring character characteristics of the virtual image;
Coding the interaction information through the large language model to obtain interaction characteristics;
Processing the character features and the interaction features through the large language model to obtain a response text which is used for responding to the interaction information and is matched with the character features;
and based on the character features, performing voice synthesis on the response text to obtain response voice, wherein the response voice speaks the response text in a tone color matched with the character features.
5. The method of claim 3, wherein the processing the interaction information through the large language model to obtain response information for responding to the interaction information and matching the emotion type of the interaction information comprises:
encoding the interaction information through the large language model to obtain interaction features, and performing emotion analysis on the interaction information to obtain emotion features, wherein the emotion features represent the emotion type of the interaction information;
And processing the interaction characteristics and the emotion characteristics through the large language model to obtain response information which is used for responding to the interaction information and is matched with the emotion type of the interaction information.
6. The method of claim 1, wherein the processing the interactive information through the large language model to obtain the response information of the avatar comprises:
And processing the interaction information and corpus information through the large language model to obtain the response information of the virtual image, wherein the corpus information is information for introducing the virtual image.
7. The method of claim 1, wherein the processing the interactive information through the large language model to obtain the response information of the avatar comprises:
processing the interaction information through the large language model to obtain response information of the virtual image and sound effect types matched with the response information;
And the controlling, in the live broadcasting room, the virtual image to respond according to the response information comprises:
controlling, in the live broadcasting room, the virtual image to respond according to the response information while playing a sound effect belonging to the sound effect type.
8. The method according to any one of claims 1-7, further comprising:
acquiring sample data, wherein the sample data comprises sample interaction information and sample response information corresponding to the sample interaction information, and the sample response information represents a response required by an avatar in a live broadcast room containing the sample interaction information;
training the large language model based on the sample data.
9. A live broadcast device, the device comprising:
The information acquisition module is configured to acquire interaction information of a live broadcasting room, wherein the anchor in the live broadcasting room is a virtual image, the interaction information comprises at least one of interaction operation information and audience comment information, and the interaction operation information represents an interaction operation occurring in the live broadcasting room;
the processing module is used for processing the interaction information through a large language model to obtain response information of the virtual image, wherein the response information is used for responding to the interaction information of the live broadcasting room, and the response information comprises at least one of voice, expression, gestures and body actions;
and the response module is used for controlling the avatar to respond according to the response information in the live broadcasting room.
10. An electronic device comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to implement the live broadcast method of any one of claims 1-8.
11. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor to implement the live broadcast method of any one of claims 1-8.
12. A computer program product, characterized in that the computer program product comprises computer program code stored in a computer-readable storage medium, wherein a processor reads the computer program code from the computer-readable storage medium and executes it to implement the live broadcast method of any one of claims 1-8.
CN202410190220.9A 2024-02-20 2024-02-20 Live broadcast method and device, electronic equipment and storage medium Pending CN118233665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410190220.9A CN118233665A (en) 2024-02-20 2024-02-20 Live broadcast method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118233665A 2024-06-21

Family

ID=91507871


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination