CN114765033A - Information processing method and device based on live broadcast room

Information processing method and device based on live broadcast room

Info

Publication number: CN114765033A
Application number: CN202110057957.XA
Authority: CN (China)
Prior art keywords: emotion, target, interactive, live broadcast, analysis result
Legal status: Pending (assumed status; not a legal conclusion)
Original language: Chinese (zh)
Inventors: 韩卫生, 万玉龙, 高杰
Original Assignee: Alibaba Group Holding Ltd
Current Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd; priority to CN202110057957.XA

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques for estimating an emotional state
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; operations thereof
    • H04N21/21: Server components or server architectures
    • H04N21/218: Source of audio or video content, e.g. local disk arrays
    • H04N21/2187: Live feed
    • H04N21/25: Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258: Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866: Management of end-user data
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N21/442: Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213: Monitoring of end-user related data
    • H04N21/45: Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508: Management of client data or end-user data

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Computer Graphics (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the present specification provides a live broadcast room-based information processing method and apparatus. A specific implementation of the method includes: in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; generating a first emotion analysis result according to the target emotion marks; and providing the first emotion analysis result to the anchor of the target live broadcast room.

Description

Information processing method and device based on live broadcast room
Technical Field
The embodiments of the present specification relate to the field of computer technology, and in particular to information processing methods and apparatuses based on a live broadcast room, an e-commerce live broadcast room, a government affairs live broadcast room, an education live broadcast room, and a conference live broadcast room.
Background
With the rapid development of the live broadcast industry, live broadcast platforms of ever more types keep appearing and more and more people are entering the industry, presenting a flourishing, "hundred flowers blooming" landscape. At present, an anchor mainly obtains audience feedback from the real-time subtitles of audience comments and adjusts the live broadcast accordingly, which imposes a heavy workload.
Therefore, a reasonable and reliable scheme is urgently needed that allows the anchor to grasp the live broadcast situation quickly and systematically, adjust the live broadcast accordingly, and reduce the anchor's workload.
Disclosure of Invention
The embodiments of the present specification provide information processing methods and apparatuses based on a live broadcast room, an e-commerce live broadcast room, a government affairs live broadcast room, an education live broadcast room, and a conference live broadcast room.
In a first aspect, an embodiment of the present specification provides an information processing method based on a live broadcast room, including: in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, inputting the interactive statements into a pre-trained emotion recognition model so that the emotion recognition model outputs target emotion marks respectively corresponding to the at least one audience user; generating a first emotion analysis result according to the target emotion marks; and providing the first emotion analysis result to the anchor of the target live broadcast room.
In some embodiments, the target emotion mark comprises any of the following: neutral, a positive emotion, or a negative emotion.
In some embodiments, the positive emotion comprises any of the following: happiness, excitement, or admiration; the negative emotion comprises any of the following: anger, sadness, disgust, or fear.
In some embodiments, generating a first emotion analysis result according to the target emotion marks comprises: counting the occurrence frequency corresponding to each distinct target emotion mark among the identified target emotion marks; and generating a first emotion analysis result that comprises the distinct target emotion marks together with at least one of the following: the occurrence frequency, and the ratio of the occurrence frequency to the total occurrence frequency of the distinct target emotion marks.
In some embodiments, generating a first emotion analysis result according to the target emotion marks comprises: acquiring a second emotion analysis result previously generated during the live broadcast; and updating the second emotion analysis result according to the target emotion marks and determining the updated second emotion analysis result as the first emotion analysis result.
In some embodiments, after the emotion recognition model outputs the target emotion marks respectively corresponding to the at least one audience user, the method further comprises: acquiring interactive voice data corresponding to the interactive statements according to the interactive statements and the target emotion marks; and providing the interactive voice data to the anchor.
In some embodiments, acquiring interactive voice data corresponding to the interactive statements according to the interactive statements and the target emotion marks includes: inputting the interactive statements and the target emotion marks into a target emotion speech synthesis model, so that the target emotion speech synthesis model outputs the interactive voice data.
In some embodiments, before inputting the interactive statement and the target emotion mark into a target emotion speech synthesis model, the method further comprises: determining the target emotion speech synthesis model among a plurality of pre-trained emotion speech synthesis models according to the target emotion mark.
In some embodiments, a preset first emotion mark group corresponds to at least one dialect, each first emotion mark belongs to the negative emotions, and the plurality of emotion speech synthesis models comprise an emotion speech synthesis model corresponding to each of the at least one dialect. Determining the target emotion speech synthesis model among the plurality of pre-trained emotion speech synthesis models according to the target emotion mark then comprises: if the target emotion mark is contained in the first emotion mark group, selecting one dialect from the at least one dialect and determining the emotion speech synthesis model corresponding to that dialect as the target emotion speech synthesis model.
In some embodiments, after providing the interactive voice data to the anchor, the method further comprises: in response to the target emotion mark being contained in a preset second emotion mark group, providing a target voice template to the anchor, wherein each second emotion mark belongs to the positive emotions and the target voice template expresses a positive emotion.
In some embodiments, the target emotion speech synthesis model corresponds to a sample utterance object, and the target voice template is obtained by recording the sound of the sample utterance object reading out a sample text template that expresses a positive emotion.
In a second aspect, an embodiment of the present specification provides an information processing method based on a live broadcast room, including: in response to acquiring interactive statements submitted by at least one audience user in a live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; generating a first emotion analysis result according to the target emotion marks; and providing the first emotion analysis result to the anchor of the live broadcast room.
In a third aspect, an embodiment of the present specification provides an information processing method based on a live broadcast room, including: in response to acquiring an interactive statement submitted by an audience user in a live broadcast room, identifying the target emotion mark corresponding to the audience user according to the interactive statement; acquiring interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark; and providing the interactive voice data to the anchor of the live broadcast room.
In a fourth aspect, an embodiment of the present specification provides an information processing method based on an e-commerce live broadcast room, including: in response to acquiring interactive statements submitted by at least one audience user in an e-commerce live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; generating a first emotion analysis result according to the target emotion marks; and providing the first emotion analysis result to the anchor of the e-commerce live broadcast room.
In a fifth aspect, an embodiment of the present specification provides an information processing method based on a government affairs live broadcast room, including: in response to acquiring interactive statements submitted by at least one audience user in a government affairs live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; generating a first emotion analysis result according to the target emotion marks; and providing the first emotion analysis result to the anchor of the government affairs live broadcast room.
In a sixth aspect, an embodiment of the present specification provides an information processing method based on an education live broadcast room, including: in response to acquiring interactive statements submitted by at least one audience user in an education live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; generating a first emotion analysis result according to the target emotion marks; and providing the first emotion analysis result to the anchor of the education live broadcast room.
In a seventh aspect, an embodiment of the present specification provides an information processing method based on a conference live broadcast room, including: in response to acquiring interactive statements submitted by at least one audience user in a conference live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; generating a first emotion analysis result according to the target emotion marks; and providing the first emotion analysis result to the anchor of the conference live broadcast room.
In an eighth aspect, an embodiment of the present specification provides a live broadcast room-based information processing apparatus, including: an emotion recognition unit configured to, in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, input the interactive statements into a pre-trained emotion recognition model so that the emotion recognition model outputs target emotion marks respectively corresponding to the at least one audience user; a generating unit configured to generate a first emotion analysis result according to the target emotion marks; and a providing unit configured to provide the first emotion analysis result to the anchor of the target live broadcast room.
In a ninth aspect, an embodiment of the present specification provides a live broadcast room-based information processing apparatus, including: an emotion recognition unit configured to, in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, identify target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; a generating unit configured to generate a first emotion analysis result according to the target emotion marks; and a providing unit configured to provide the first emotion analysis result to the anchor of the target live broadcast room.
In a tenth aspect, an embodiment of the present specification provides a live broadcast room-based information processing apparatus, including: an emotion recognition unit configured to, in response to acquiring an interactive statement submitted by an audience user in a target live broadcast room, identify the target emotion mark corresponding to the audience user according to the interactive statement; an obtaining unit configured to obtain interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark; and a providing unit configured to provide the interactive voice data to the anchor of the target live broadcast room.
In an eleventh aspect, an embodiment of the present specification provides an information processing apparatus based on an e-commerce live broadcast room, including: an emotion recognition unit configured to, in response to acquiring interactive statements submitted by at least one audience user in an e-commerce live broadcast room, identify target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; a generating unit configured to generate a first emotion analysis result according to the target emotion marks; and a providing unit configured to provide the first emotion analysis result to the anchor of the e-commerce live broadcast room.
In a twelfth aspect, an embodiment of the present specification provides an information processing apparatus based on a government affairs live broadcast room, including: an emotion recognition unit configured to, in response to acquiring interactive statements submitted by at least one audience user in a government affairs live broadcast room, identify target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; a generating unit configured to generate a first emotion analysis result according to the target emotion marks; and a providing unit configured to provide the first emotion analysis result to the anchor of the government affairs live broadcast room.
In a thirteenth aspect, an embodiment of the present specification provides an information processing apparatus based on an education live broadcast room, including: an emotion recognition unit configured to, in response to acquiring interactive statements submitted by at least one audience user in an education live broadcast room, identify target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; a generating unit configured to generate a first emotion analysis result according to the target emotion marks; and a providing unit configured to provide the first emotion analysis result to the anchor of the education live broadcast room.
In a fourteenth aspect, an embodiment of the present specification provides an information processing apparatus based on a conference live broadcast room, including: an emotion recognition unit configured to, in response to acquiring interactive statements submitted by at least one audience user in a conference live broadcast room, identify target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; a generating unit configured to generate a first emotion analysis result according to the target emotion marks; and a providing unit configured to provide the first emotion analysis result to the anchor of the conference live broadcast room.
In a fifteenth aspect, the present specification provides a computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to execute the method described in any one of the implementation manners of the first aspect to the seventh aspect.
In a sixteenth aspect, the present specification provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any one of the implementation manners of the first aspect to the seventh aspect.
In the live broadcast room-based information processing method and apparatus provided in the above embodiments of the present specification, in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, target emotion marks respectively corresponding to the at least one audience user are identified according to the interactive statements; a first emotion analysis result is then generated according to the target emotion marks and provided to the anchor of the target live broadcast room. This realizes targeted information generation and enriches the live broadcast functionality. Moreover, through the first emotion analysis result the anchor can grasp the live broadcast situation quickly and systematically and adjust the live broadcast in time, which effectively reduces the anchor's workload.
Drawings
To illustrate the technical solutions of the embodiments disclosed in this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below depict only some embodiments disclosed in this specification; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present description may be applied;
FIG. 2 is a flow diagram of one embodiment of a live-room based information processing method in accordance with the present description;
FIG. 3a is a diagram illustrating the effect of the first emotion analysis result;
FIG. 3b is another schematic diagram showing the effect of the first emotion analysis result;
fig. 3c is a schematic diagram of a personalized voice broadcast sub-process;
fig. 3d is another schematic diagram of a personalized voice announcement sub-process;
FIG. 4 is a flow diagram of one embodiment of a method for E-commerce live room-based information processing according to the present description;
FIG. 5 is a flow diagram of one embodiment of a government affairs live room based information processing method according to the present description;
FIG. 6 is a flow diagram of one embodiment of a method for information processing based on an educational direct broadcast room in accordance with the present description;
FIG. 7 is a flow diagram for one embodiment of a live conference room-based information processing method in accordance with the present description;
FIG. 8 is a flow diagram of one embodiment of a live-room based information processing method in accordance with the present description;
FIG. 9 is a flow diagram of one embodiment of a live room-based information processing method according to the present description;
fig. 10 is a schematic configuration diagram of a live room-based information processing apparatus according to the present specification;
fig. 11 is a schematic configuration diagram of a live broadcast room-based information processing apparatus according to the present specification;
fig. 12 is a schematic configuration diagram of a live broadcast room-based information processing apparatus according to the present specification;
fig. 13 is a schematic configuration diagram of an e-commerce live broadcast room-based information processing apparatus according to the present specification;
fig. 14 is a schematic configuration diagram of an information processing apparatus based on a government affairs live broadcast room according to the present specification;
fig. 15 is a schematic view of a configuration of an information processing apparatus based on an education live room according to the present specification;
fig. 16 is a schematic diagram of a configuration of a live conference room-based information processing apparatus according to the present specification.
Detailed Description
The present specification will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not limit it. The described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the scope of the present application.
It should be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings. The embodiments in this specification, and the features within them, may be combined with one another as long as they do not conflict. In addition, terms such as "first" and "second" in this specification are used only to distinguish information and do not impose any limitation.
As described above, at present an anchor mainly obtains audience feedback from the real-time subtitles of audience comments and adjusts the live broadcast accordingly, which imposes a heavy workload.
Based on this, some embodiments of the present specification provide a live broadcast room-based information processing method by which targeted information can be generated and the live broadcast functionality enriched. Moreover, the anchor can grasp the live broadcast situation quickly and systematically and adjust the live broadcast accordingly, which effectively reduces the anchor's workload. Fig. 1 illustrates an exemplary system architecture suitable for these embodiments.
Fig. 1 shows terminal devices 101, 102, 103, and 105 and a server 104. Terminal devices 101, 102, and 103 each have an audience version APP (application) installed, terminal device 105 has an anchor version APP installed, and server 104 is a backend server supporting both APPs.
It should be noted that the terminal device may be various electronic devices, which may include, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like, and is not limited in particular herein.
The audience version APP may be an APP for use by an audience user watching a live broadcast. Further, the APP category of the audience version APP may be, for example, a shopping APP, a social APP, a game APP, an education APP, a government APP, a conference APP, or a live APP, and is not limited herein. When the audience version APP belongs to the live broadcast type APP, the audience version APP can be called a live broadcast audience version APP.
The anchor version APP may be an APP used by the anchor, i.e., the user who conducts the live broadcast in a live broadcast room opened in the anchor version APP. The live broadcast room opened in the anchor version APP may be called the target live broadcast room. In practice, the application category of the anchor version APP may match that of the audience version APP and is not limited here. When the anchor version APP belongs to the live broadcast category, it may be called a live broadcast anchor version APP.
In different live scenes, the target live broadcast room may have different designations. For example, in an educational live scenario, the target live room may be referred to as an educational live room. In a conference live scenario, the target live room may be referred to as a conference live room. In the E-commerce live broadcast scene, the target live broadcast room can be called an E-commerce live broadcast room. In a government affair live broadcasting scene, the target live broadcasting room can be called as a government affair live broadcasting room.
Generally, audience users can enter the target live broadcast room using an audience version APP and submit interactive statements there. An interactive statement is a statement with which an audience user interacts with the anchor; interactive statements may include, but are not limited to, comment statements.
Take the interactive statement as a comment statement, and take live broadcast room A as the target live broadcast room opened in the anchor version APP installed on terminal device 105, as an example. The audience user User1 of terminal device 101 may open the audience version live interface of live broadcast room A in the audience version APP installed on terminal device 101, input comment statement 1 in the comment area of the interface, and trigger (e.g., click) the submit button, so that the audience version APP sends comment statement 1 to the server 104. The audience user User2 of terminal device 102 may proceed similarly, so that the audience version APP installed on terminal device 102 sends comment statement 2 to the server 104; likewise, the audience user User3 of terminal device 103 may have the audience version APP installed on terminal device 103 send comment statement 3 to the server 104. The audience version live interface is the live interface in the audience version APP. It should be understood that comment statements 1, 2, and 3 are only exemplary; this specification places no limit on their specific content.
In general, the server 104 may send the interactive statements submitted by audience users in live broadcast room A, such as comment statements 1, 2, and 3 above, to the anchor version APP installed on terminal device 105, so that the APP displays the received statements in an interactive statement display area (for example, a comment display area) on the anchor version live interface of live broadcast room A. The anchor version live interface is the live interface in the anchor version APP. Note that the sending of the interactive statements by the server 104 and their presentation by the anchor version APP are not shown in fig. 1.
It should be understood that, in addition to the interactive statements themselves, other related information may be displayed in the interactive statement display area, such as the nickname and avatar of the audience user who sent a statement and/or the time it was sent; this is not specifically limited here.
To generate targeted information, enrich the live broadcast functionality, enable the anchor to grasp the live broadcast situation quickly and systematically and adjust the live broadcast accordingly, and effectively reduce the anchor's workload, the server 104 can statistically analyze audience emotion according to the interactive statements submitted by audience users in the target live broadcast room and provide the emotion analysis result to the anchor of the target live broadcast room, so that the anchor adjusts the live broadcast content according to it.
Continuing with comment statements 1, 2, and 3 as the example, as shown in fig. 1, after the server 104 acquires the comment statements submitted in live broadcast room A by the audience users of terminal devices 101, 102, and 103, it can identify from the three statements the target emotion mark 1 corresponding to User1, the target emotion mark 2 corresponding to User2, and the target emotion mark 3 corresponding to User3. The server 104 can then generate a first emotion analysis result according to the three target emotion marks and send it to the anchor version APP installed on terminal device 105, so that the APP shows the result to the anchor, for example on the anchor version live interface of live broadcast room A.
The above briefly describes statistical analysis of audience emotion on the server side. Note that when the terminal device 105 has sufficient computing and storage capacity, this analysis may instead be performed on the terminal side, for example by the anchor version APP; this is not limited here. In addition, the statistical analysis can be performed on condition that the audience emotion analysis function has been enabled for the target live broadcast room.
It should be understood that the number of terminal devices and servers in fig. 1 is merely illustrative. There may be any number of terminal devices and servers, as desired for implementation.
The specific steps of the above method are described below with reference to specific examples. In order to distinguish the currently generated emotion analysis result from the previously generated emotion analysis result in the current live broadcast process, the currently generated emotion analysis result is referred to as a first emotion analysis result, and the previously generated emotion analysis result is referred to as a second emotion analysis result.
Referring to fig. 2, a flow 200 of one embodiment of a live broadcast room-based information processing method is shown. The execution subject of the method may be the server 104, the terminal device 105, or the anchor version APP installed on terminal device 105, as shown in fig. 1. The method comprises the following steps:
step 201, in response to acquiring an interactive statement submitted by at least one audience user in a target live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statement;
step 202, generating a first emotion analysis result according to the target emotion mark;
step 203, providing a first emotion analysis result to the anchor of the target live broadcast room.
The above steps are further explained below.
In step 201, interactive statements submitted by at least one audience user in a target live broadcast room may be obtained in real time, and target emotion marks corresponding to the at least one audience user are identified according to the obtained interactive statements. The interactive sentences may include, but are not limited to, comment sentences.
A single target emotion mark may, for example, be neutral, a positive emotion, or a negative emotion. Further, the positive emotions may include happiness, excitement, admiration, and the like, and the negative emotions may include anger, rage, sadness, disgust, fear, and the like. Optionally, the neutral category may further include surprise, boredom, weariness, or the like.
It should be noted that whether the execution subject is located at the terminal side or the server side, a local identification method may be adopted to identify the target emotion marks corresponding to the audience users. Optionally, when the execution subject is located at the terminal side, a remote identification method may also be used: for example, the interactive statements submitted by audience users in the target live broadcast room may be sent to a corresponding emotion recognition server, which identifies the target emotion marks corresponding to the audience users according to the interactive statements and returns them.
Further, the specific emotion mark recognition method may include, for example, a keyword recognition method. An emotion mark set can be preset, with each emotion mark in the set corresponding to a keyword set whose keywords represent the emotion indicated by that mark. For example, the keyword set corresponding to a "happy" emotion mark may include keywords such as "happy", "haha", "yay", and the like. For the interactive statements submitted by audience users in the target live broadcast room, the matching degree between each interactive statement and the keyword set corresponding to each emotion mark can be calculated, and the emotion mark whose keyword set matches the statement best is determined as the target emotion mark corresponding to that audience user.
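As an illustration only, a minimal Python sketch of this keyword-matching approach is given below. The emotion marks, keyword sets, and function names are hypothetical assumptions; the patent does not prescribe a concrete keyword configuration or scoring formula, and here the matching degree is simply the number of keywords hit.

```python
# Hypothetical sketch of keyword-based emotion mark recognition.
# The emotion marks and keyword sets below are illustrative assumptions.
EMOTION_KEYWORDS = {
    "happy":   {"happy", "haha", "yay", "love"},
    "angry":   {"angry", "terrible", "awful"},
    "neutral": {"ok", "fine", "noted"},
}

def matching_degree(statement: str, keywords: set) -> int:
    """Count how many keywords of one emotion mark occur in the statement."""
    text = statement.lower()
    return sum(1 for kw in keywords if kw in text)

def recognize_emotion_mark(statement: str) -> str:
    """Return the emotion mark whose keyword set matches the statement best."""
    scores = {mark: matching_degree(statement, kws)
              for mark, kws in EMOTION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"  # fall back when nothing matches

print(recognize_emotion_mark("haha, love this"))  # -> happy
```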
Alternatively, the specific emotion mark recognition method may, for example, automatically identify the target emotion mark corresponding to the audience user through NLP (Natural Language Processing) technology: the interactive statements are input into a pre-trained emotion recognition model so that the model outputs the target emotion marks. The emotion recognition model may, for example, be a classifier that classifies emotions. When performing emotion recognition with the model, one or more interactive statements may be input at a time. It should be understood that this specification does not specifically limit the emotion recognition model. On this basis, step 201 may further include: step 2011, in response to acquiring the interactive statements submitted by the at least one audience user in the target live broadcast room, inputting the interactive statements into the pre-trained emotion recognition model so that the model outputs the target emotion marks respectively corresponding to the at least one audience user. The method of the embodiment corresponding to fig. 2 can thus also be illustrated by fig. 8, a flowchart of an embodiment of a live broadcast room-based information processing method according to the present specification.
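A minimal sketch of step 2011 follows, assuming an already-trained classifier exposed through a predict() method. EmotionRecognitionModel is a hypothetical stand-in for whatever NLP model is actually used, since the patent does not fix a model architecture.

```python
from typing import List

class EmotionRecognitionModel:
    """Hypothetical stand-in for a pre-trained emotion classifier."""
    def predict(self, statements: List[str]) -> List[str]:
        # A real model would tokenize and classify here; this stub
        # labels every interactive statement as neutral.
        return ["neutral"] * len(statements)

def identify_target_emotion_marks(model: EmotionRecognitionModel,
                                  interactive_statements: List[str]) -> List[str]:
    """Step 2011: one target emotion mark per submitted statement;
    one or more statements can be input at a time."""
    return model.predict(interactive_statements)
```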
In step 202, a first emotion analysis result can be generated according to the target emotion mark. It should be noted that the first emotion analysis result may be an accumulated result in the live broadcast process, or may not be an accumulated result, and is not specifically limited herein.
When the first emotion analysis result is an accumulated result in the live broadcast process, the first emotion analysis result may be generated on the basis of an emotion analysis result (hereinafter, referred to as a second emotion analysis result) previously generated in the live broadcast process. When the first emotion analysis result is not the accumulated result, the first emotion analysis result is generated only according to the target emotion mark recognized in the current round.
As one implementation, if the first emotion analysis result is not an accumulated result, or it is an accumulated result but the process 200 is being executed for the first time in the current live broadcast, then in step 202 the occurrence frequency of each distinct target emotion mark among the identified target emotion marks may be counted. A first emotion analysis result can then be generated that includes the distinct target emotion marks and at least one of the following: the occurrence frequency, and the ratio of the occurrence frequency to the total occurrence frequency of the distinct target emotion marks. The ratio can be regarded as the proportion of the occurrence frequency.
It should be understood that the occurrence frequency of any distinct target emotion mark is the number of times that mark appears among all the identified target emotion marks.
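Under these definitions, a short sketch of the counting implementation might look as follows (function and field names are illustrative assumptions, not the patent's literal data layout):

```python
from collections import Counter

def build_first_emotion_analysis(target_marks: list) -> dict:
    """Count the occurrence frequency of each distinct target emotion mark
    and its proportion of the total occurrence frequency."""
    counts = Counter(target_marks)
    total = sum(counts.values())
    return {mark: {"frequency": freq, "ratio": freq / total}
            for mark, freq in counts.items()}

# Marks identified for five audience users:
print(build_first_emotion_analysis(
    ["happy", "happy", "happy", "angry", "neutral"]))
# {'happy': {'frequency': 3, 'ratio': 0.6},
#  'angry': {'frequency': 1, 'ratio': 0.2},
#  'neutral': {'frequency': 1, 'ratio': 0.2}}
```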
As another implementation manner, if the first emotion analysis result is an accumulated result and the process 200 is not executed for the first time in the live broadcast process, in step 202, a second emotion analysis result generated in the live broadcast process may be obtained, the second emotion analysis result is updated according to the target emotion mark, and the updated second emotion analysis result is determined as the first emotion analysis result.
It should be understood that the second emotion analysis result and the first emotion analysis result comprise the same fields, for example each comprising an emotion mark and at least one of the following: the occurrence frequency and the proportion of the occurrence frequency.
Assuming the second emotion analysis result generated during the live broadcast includes the emotion mark, occurrence frequency, and frequency-proportion fields: for any identified target emotion mark, if the second emotion analysis result already includes that mark, its occurrence frequency and frequency proportion can be adjusted in the second emotion analysis result; if not, the target emotion mark, its occurrence frequency, and its frequency proportion can be appended to the second emotion analysis result.
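The cumulative case can be sketched in the same illustrative terms: fold the marks of the current round into the previously generated second emotion analysis result and recompute the proportions (a sketch under the same assumed field names):

```python
def update_second_analysis(second_result: dict, new_marks: list) -> dict:
    """Adjust or supplement frequency entries with the current round's marks;
    the updated second result becomes the first emotion analysis result."""
    for mark in new_marks:
        entry = second_result.setdefault(mark, {"frequency": 0, "ratio": 0.0})
        entry["frequency"] += 1
    total = sum(e["frequency"] for e in second_result.values())
    for entry in second_result.values():  # recompute every proportion
        entry["ratio"] = entry["frequency"] / total
    return second_result
```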
In step 203, the execution subject may provide the first emotion analysis result to the anchor of the target live broadcast room. Specifically, when the execution subject is located at the terminal side, it may directly present the first emotion analysis result to the anchor. When located at the server side, it may send the first emotion analysis result to the anchor version APP hosting the target live broadcast room, so that the APP shows the result to the anchor.
As an example, assuming the first emotion analysis result includes the two fields of emotion mark and occurrence frequency, the first emotion analysis result generated in step 202 may include [happy: 3, angry: 1, neutral: 1], where the numbers 3 and 1 are occurrence frequencies. The display effect of this first emotion analysis result can be as shown in fig. 3a, a schematic diagram of the display effect of the first emotion analysis result.
As another example, assuming the first emotion analysis result includes the three fields of emotion mark, occurrence frequency, and proportion of occurrence frequency, the first emotion analysis result generated in step 202 may include [happy: <3, 60%>, angry: <1, 20%>, neutral: <1, 20%>], where the numbers 3 and 1 are occurrence frequencies and the percentages 60% and 20% are the proportions of occurrence frequency. The display effect of this first emotion analysis result can be as shown in fig. 3b, another schematic diagram of the display effect of the first emotion analysis result.
According to the scheme provided by this embodiment, in response to acquiring interactive statements submitted by audience users in the target live broadcast room, the target emotion marks corresponding to the audience users are identified according to the interactive statements, and a first emotion analysis result is then generated according to the target emotion marks and provided to the anchor of the target live broadcast room. This realizes targeted information generation and enriches the live broadcast functionality. Moreover, through the first emotion analysis result the anchor can grasp the live broadcast situation quickly and systematically and adjust the live broadcast in time, effectively reducing the anchor's workload. In addition, providing the first emotion analysis result to the anchor enables the anchor to communicate with audience users proactively: for example, when audience dissatisfaction (such as the frequency proportion of negative emotions) reaches a certain level, the anchor can pause the current live broadcast, connect with audience members by microphone ("lian mai" co-streaming), and listen to their opinions.
In practice, the information processing method based on the live broadcast room provided by the embodiment of the present specification may be applied to different live broadcast scenes, such as an e-commerce live broadcast scene, a government affair live broadcast scene, an education live broadcast scene, and/or a conference live broadcast scene.
For example, in an e-commerce live broadcast scenario, the flow of the information processing method based on the e-commerce live broadcast room may be as shown in fig. 4, a flowchart of an embodiment of an information processing method based on an e-commerce live broadcast room. The method comprises: step 401, in response to acquiring interactive statements submitted by at least one audience user in the e-commerce live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; step 402, generating a first emotion analysis result according to the target emotion marks; and step 403, providing the first emotion analysis result to the anchor of the e-commerce live broadcast room.
The e-commerce live broadcast room can be a live broadcast room in which the anchor sells goods. The goods may include physical goods, virtual goods, and the like, which is not specifically limited here. Audience users may submit any interactive statements within the e-commerce live broadcast room, such as statements related to the anchor or to the goods being sold.
The method described in the embodiment corresponding to fig. 4 realizes targeted information generation and enriches the live broadcast functionality in the e-commerce live broadcast scenario. Moreover, the anchor of the e-commerce live broadcast room can grasp the live broadcast situation quickly and systematically through the first emotion analysis result and adjust the live broadcast in time, effectively reducing the anchor's workload.
In a government affairs live broadcast scenario, the flow of the information processing method based on the government affairs live broadcast room may be as shown in fig. 5, a flowchart of an embodiment of an information processing method based on a government affairs live broadcast room. The method comprises: step 501, in response to acquiring interactive statements submitted by at least one audience user in the government affairs live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; step 502, generating a first emotion analysis result according to the target emotion marks; and step 503, providing the first emotion analysis result to the anchor of the government affairs live broadcast room.
The government affairs live broadcast room can be a live broadcast room for government affairs live broadcasts. Government affairs generally refer to the transactional work of the government, so a government affairs live broadcast may be any live broadcast relating to such work, including but not limited to live broadcasts of official statements, hearings, government information publication and the collection of opinion feedback, government meetings open to listeners, and the like; this embodiment does not limit the content of the government affairs live broadcast. Audience users may submit any interactive statements within the government affairs live broadcast room, such as statements related to the anchor or to the government affairs content being broadcast.
The method described in the embodiment corresponding to fig. 5 realizes targeted information generation and enriches the live broadcast functionality in the government affairs live broadcast scenario. Moreover, the anchor of the government affairs live broadcast room can grasp the live broadcast situation quickly and systematically through the first emotion analysis result and adjust the live broadcast in time, effectively reducing the anchor's workload.
In an education live broadcast scenario, the flow of the information processing method based on the education live broadcast room may be as shown in fig. 6, a flowchart of an embodiment of an information processing method based on an education live broadcast room. The method comprises: step 601, in response to acquiring interactive statements submitted by at least one audience user in the education live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; step 602, generating a first emotion analysis result according to the target emotion marks; and step 603, providing the first emotion analysis result to the anchor of the education live broadcast room.
The education live broadcast room may be a live broadcast room for education live broadcasts. In practice, freelance teachers as well as teachers in schools, training institutions, and the like can all use the live broadcast room to teach online. Audience users may submit any interactive statements within the education live broadcast room, such as statements related to the anchor or to the live lesson content.
The method described in the embodiment corresponding to fig. 6 realizes targeted information generation and enriches the live broadcast functionality in the education live broadcast scenario. Moreover, the anchor of the education live broadcast room can grasp the live broadcast situation quickly and systematically through the first emotion analysis result and adjust the live broadcast in time, effectively easing the anchor's workload.
In a conference live broadcast scenario, the flow of the information processing method based on the conference live broadcast room may be as shown in fig. 7, a flowchart of an embodiment of an information processing method based on a conference live broadcast room. The method comprises: step 701, in response to acquiring interactive statements submitted by at least one audience user in the conference live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; step 702, generating a first emotion analysis result according to the target emotion marks; and step 703, providing the first emotion analysis result to the anchor of the conference live broadcast room.
The conference live broadcast room can be a live broadcast room for conference live broadcasts, which may relate to conferences of various categories, including but not limited to business meetings, school educational meetings, meetings of state organs, social organization meetings, and the like. Audience users may submit any interactive statements within the conference live broadcast room, such as statements related to the anchor, the conference content, or the conference flow.
The method described in the embodiment corresponding to fig. 7 realizes targeted information generation and enriches the live broadcast functionality in the conference live broadcast scenario. Moreover, the anchor of the conference live broadcast room can grasp the live broadcast situation quickly and systematically through the first emotion analysis result and adjust the live broadcast in time, effectively reducing the anchor's workload.
Optionally, the anchor version APP may further have an interactive voice broadcast function, in which case the flows of the embodiments corresponding to fig. 2 and fig. 8 may further include a personalized voice broadcast sub-process. This sub-process can be executed when the anchor enables the interactive voice broadcast function for the target live broadcast room. Specifically, it is executed after step 201 or step 2011 and includes:
step 204, acquiring interactive voice data corresponding to the interactive sentences according to the interactive sentences and the target emotion marks;
step 205, providing interactive voice data to the anchor of the target live broadcast room.
In step 205, when the execution subject is located at the terminal side, it may directly play the interactive voice data to the anchor. When located at the server side, it may send the interactive voice data to the anchor version APP hosting the target live broadcast room, so that the APP plays it to the anchor. Note that the terminal device on which the anchor version APP runs includes a voice playback device, such as a speaker, which the anchor version APP can control to play the interactive voice data.
In addition, in step 204, whether the execution main body is located at the terminal side or the server side, the execution main body may acquire the interactive voice data corresponding to the interactive sentence in a local acquisition manner. When the interactive sentence is a comment sentence, the interactive voice data may be referred to as comment voice data.
Optionally, when the execution subject is located at the terminal side, a remote acquisition manner may also be adopted to acquire the interactive voice data corresponding to the interactive statement. For example, the interactive statement and the target emotion mark may be sent to a corresponding speech synthesis server, and the speech synthesis server obtains the interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark and returns it. The speech synthesis server and the aforementioned emotion recognition server may be the same server or different servers, which is not specifically limited here.
It should be noted that the interactive voice data in this specification carries the emotion indicated by the corresponding target emotion mark and is therefore personalized voice data.
Optionally, acquiring the interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark may further include: inputting the interactive statement and the target emotion mark into a target emotion speech synthesis model, so that the target emotion speech synthesis model outputs the interactive voice data corresponding to the interactive statement. By effectively utilizing the target emotion speech synthesis model, the efficiency and accuracy of acquiring the interactive voice data can be improved.
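As a concrete illustration of this step, the sketch below wraps a generic synthesis backend in Python; EmotionTTSModel and its synthesize() method are assumed names for this sketch, since the specification does not fix a programming interface.

class EmotionTTSModel:
    # Hypothetical wrapper around a pre-trained emotion speech synthesis
    # model; real acoustic inference is elided.

    def __init__(self, name: str):
        self.name = name

    def synthesize(self, statement: str, emotion_mark: str) -> bytes:
        # A real backend would condition waveform generation on the
        # emotion mark; this placeholder just returns tagged bytes.
        return f"{self.name}|{emotion_mark}|{statement}".encode("utf-8")

def acquire_interactive_voice_data(model: EmotionTTSModel,
                                   statement: str,
                                   emotion_mark: str) -> bytes:
    # Step 204: interactive statement plus target emotion mark in,
    # personalized interactive voice data out.
    return model.synthesize(statement, emotion_mark)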
The target emotion speech synthesis model can be trained in the following way: first information is taken as input, wherein the first information at least comprises text information and an emotion mark corresponding to the text information; voice data of a sample uttering object reading the text information is taken as the training label; and an initial speech synthesis model is trained accordingly. The initial speech synthesis model may be a pre-trained model or an untrained model, which is not limited here.
The emotion mark in the first information may be any of the emotion marks listed above. The sample uttering object is typically a natural person. The voice data used as training labels may be read by the sample uttering object in Mandarin or in a dialect. A dialect is typically a regional language and may include, for example, Sichuan dialect, Northeastern dialect, Henan dialect, Cantonese, Shandong dialect, and/or Shaanxi dialect. It should be understood that this specification does not specifically limit the kind of dialect.
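For concreteness, the following is a minimal training sketch under stated assumptions: a toy PyTorch encoder-decoder whose vocabulary size, embedding widths, mel dimension, and random tensors are all invented for illustration and are not the architecture of this specification.

import torch
import torch.nn as nn

EMOTION_MARKS = ["neutral", "happy", "excited", "anger", "sadness",
                 "disgust", "fear"]

class ToyEmotionTTS(nn.Module):
    # Illustrative initial speech synthesis model: text tokens plus an
    # emotion mark in, mel-spectrogram frames out.

    def __init__(self, vocab_size=4000, n_mels=80):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, 128)
        self.emo_emb = nn.Embedding(len(EMOTION_MARKS), 128)
        self.decoder = nn.GRU(128, 256, batch_first=True)
        self.to_mel = nn.Linear(256, n_mels)

    def forward(self, token_ids, emotion_id):
        # Condition every text position on the emotion mark embedding.
        x = self.text_emb(token_ids) + self.emo_emb(emotion_id).unsqueeze(1)
        h, _ = self.decoder(x)
        return self.to_mel(h)

model = ToyEmotionTTS()  # the initial model may be pre-trained or untrained
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One dummy step: first information (text tokens + emotion mark) as input,
# the sample uttering object's recorded speech (random mel frames here) as
# the training label.
tokens = torch.randint(0, 4000, (2, 16))
emotion = torch.tensor([EMOTION_MARKS.index("anger"),
                        EMOTION_MARKS.index("happy")])
target_mel = torch.randn(2, 16, 80)

opt.zero_grad()
loss = nn.functional.l1_loss(model(tokens, emotion), target_mel)
loss.backward()
opt.step()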
Optionally, before step 204, the method may further include: determining a target emotion speech synthesis model from a plurality of pre-trained emotion speech synthesis models according to the target emotion mark.
In one implementation, a preset first emotion mark group may correspond to at least one dialect, each first emotion mark belongs to a negative emotion, and the plurality of emotion speech synthesis models include emotion speech synthesis models respectively corresponding to the at least one dialect. If the target emotion mark is included in the first emotion mark group, one dialect may be selected from the at least one dialect, for example at random, and the emotion speech synthesis model corresponding to that dialect is determined as the target emotion speech synthesis model.
Any first emotion mark may include anger or rage, etc. Any one of the at least one dialect may be a dialect well suited to enlivening the atmosphere and carrying strong amusement value, for example, Northeastern dialect or Sichuan dialect. The training labels of the emotion speech synthesis model corresponding to such a dialect are voice data spoken in that dialect, so synthesizing emotion speech with that model yields personalized interactive voice data in the dialect.
Referring to fig. 3c, a schematic diagram of the personalized voice broadcast sub-flow is shown. Take the execution subject being the server and the interactive statement comprising a comment statement as an example, and assume in this sub-flow that the at least one dialect includes Sichuan dialect. As shown in fig. 3c, if the server recognizes an angry emotion mark corresponding to the audience user according to the audience user's comment statement, the server may select one dialect from the at least one dialect, for example Sichuan dialect, and determine the emotion speech synthesis model corresponding to Sichuan dialect as the target emotion speech synthesis model. The server can then input the comment statement and the angry emotion mark into the target emotion speech synthesis model, so that the model outputs synthesized comment voice data. Afterwards, the server can send the comment voice data to the anchor version APP where the target live broadcast room is located, and the anchor version APP may play the comment voice data to the anchor.
When the target emotion mark is a first emotion mark, for example anger or rage, the emotion speech synthesis model corresponding to one of the at least one dialect is selected as the target emotion speech synthesis model, and personalized interactive voice data can be synthesized from the interactive statement and the target emotion mark using that model, so that the interactive voice data sounds humorous and entertaining. By providing this interactive voice data to the anchor, the live broadcast atmosphere can be effectively adjusted.
In another implementation, the emotion speech synthesis models corresponding to the at least one dialect are referred to as first models. If the target emotion mark is not included in the first emotion mark group, the target emotion speech synthesis model may be determined among the second models, that is, the models other than the first models among the plurality of emotion speech synthesis models. For example, a second model may be selected at random, a second model corresponding to Mandarin may be selected, or a second model corresponding to a dialect may be selected as the target emotion speech synthesis model, which is not specifically limited here.
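A minimal sketch of this selection logic follows, assuming an invented first emotion mark group and invented model names; the real group contents and model inventory are implementation choices, not fixed by this specification.

import random

FIRST_EMOTION_GROUP = {"anger", "rage"}           # negative marks, per the text
DIALECT_MODELS = {                                 # the first models
    "sichuan": "tts-sichuan-v1",
    "northeast": "tts-northeast-v1",
}
SECOND_MODELS = {"mandarin": "tts-mandarin-v1"}    # the remaining models

def pick_target_model(target_emotion_mark: str) -> str:
    # In the first group: randomly pick a dialect and use its model.
    if target_emotion_mark in FIRST_EMOTION_GROUP:
        dialect = random.choice(list(DIALECT_MODELS))
        return DIALECT_MODELS[dialect]
    # Otherwise any second model works; the Mandarin one is shown here.
    return SECOND_MODELS["mandarin"]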
Optionally, in order to enrich the live broadcast functions and enliven the live broadcast atmosphere, the personalized voice broadcast sub-flow may further include a voice template providing step, which may be performed after step 205. Specifically, the voice template providing step may include: providing a target voice template to the anchor in response to the target emotion mark being included in a preset second emotion mark group. It should be noted that when the execution subject is located at the terminal side, it may directly play the target voice template to the anchor; when it is located at the server side, it may send the target voice template to the anchor version APP where the target live broadcast room is located, so that the anchor version APP plays the target voice template to the anchor.
Any second emotion mark belongs to a positive emotion and may include happiness or excitement, etc. In practice, the target voice template expresses a positive emotion and may be preset. The target voice template may be referred to as a target voice Easter egg, and its semantics may include, for example, "give you a great like" or "awesome", which is not specifically limited here.
Alternatively, the target emotion speech synthesis model may correspond to a sample uttering object and a target voice template. The target voice template may be obtained by recording the sound of the sample uttering object reading a sample text template, where the sample text template expresses a positive emotion. The sample text template may be referred to as a sample text Easter egg, and its text content may include, for example, "give you a great like" or "lollipop", etc. It should be understood that the sample text template may be set according to actual requirements, and this specification does not limit it.
On the basis of obtaining the interactive voice data by using the target emotion speech synthesis model, the target voice template provided to the anchor may be the target voice template corresponding to that target emotion speech synthesis model.
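The template check can then be sketched as follows, again with invented group contents and file names; the mapping from each model to its recorded voice template is an assumption of this sketch.

SECOND_EMOTION_GROUP = {"happy", "excited"}   # positive marks, per the text

# Hypothetical mapping from each emotion speech synthesis model to the
# voice template (Easter egg) recorded by its sample uttering object.
VOICE_TEMPLATES = {
    "tts-mandarin-v1": "templates/give_you_a_great_like.wav",
    "tts-sichuan-v1": "templates/lollipop.wav",
}

def maybe_voice_template(target_emotion_mark: str, model_name: str):
    # Return the template tied to the model used in step 204, but only
    # when the target emotion mark falls in the second emotion mark group.
    if target_emotion_mark in SECOND_EMOTION_GROUP:
        return VOICE_TEMPLATES.get(model_name)
    return None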
Referring to fig. 3d, another schematic diagram of the personalized voice broadcast sub-flow is shown. Take the execution subject being the server and the interactive statement comprising a comment statement as an example, and assume in this sub-flow that a happy emotion mark exists in the second emotion mark group. As shown in fig. 3d, if the server recognizes the happy emotion mark corresponding to the audience user according to the audience user's comment statement, the server may input the comment statement and the happy emotion mark into the target emotion speech synthesis model, so that the model outputs synthesized comment voice data. The target emotion speech synthesis model corresponds to a target voice Easter egg. The server may then send (e.g., together or separately) the comment voice data and the target voice Easter egg to the anchor version APP where the target live broadcast room is located, and the anchor version APP can play the comment voice data and the target voice Easter egg to the anchor in sequence.
With further reference to fig. 9, the present specification provides a flow 900 of an embodiment of an information processing method based on a live broadcast room. The execution subject of the method may be the server 104, the terminal device 105, or the anchor version APP installed on the terminal device 105 as shown in fig. 1. The method comprises the following steps:
step 901, in response to the acquisition of an interactive statement submitted by an audience user in a target live broadcast room, identifying a target emotion mark corresponding to the audience user according to the interactive statement;
step 902, acquiring interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark;
step 903, providing the interactive voice data to the anchor of the target live broadcast room.
In step 901, the number of audience users may be one or more, which is not limited here. In addition, for explanations of steps 901 to 903, reference may be made to the relevant descriptions above, and details are not repeated here.
In the information processing method based on the live broadcast room provided by this embodiment, in response to acquiring an interactive statement submitted by an audience user in the target live broadcast room, the target emotion mark corresponding to the audience user is identified according to the interactive statement; interactive voice data corresponding to the interactive statement is then acquired according to the interactive statement and the target emotion mark, so that personalized interactive voice data is provided to the anchor of the target live broadcast room. In this way, the live broadcast functions can be enriched and the live broadcast atmosphere effectively adjusted.
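Tying the steps together, here is an end-to-end sketch of flow 900 that reuses the hypothetical helpers from the earlier sketches (EmotionTTSModel, pick_target_model, maybe_voice_template); recognize_emotion and play_to_anchor are likewise stand-ins, not interfaces defined in this specification.

def recognize_emotion(statement: str) -> str:
    # Stand-in for step 901; a real system would call the pre-trained
    # emotion recognition model.
    return "happy" if "great" in statement else "neutral"

def handle_interactive_statement(statement: str, play_to_anchor) -> None:
    emotion_mark = recognize_emotion(statement)                 # step 901
    model = EmotionTTSModel(pick_target_model(emotion_mark))    # model choice
    voice_data = model.synthesize(statement, emotion_mark)      # step 902
    play_to_anchor(voice_data)                                  # step 903
    template = maybe_voice_template(emotion_mark, model.name)
    if template is not None:
        play_to_anchor(template)  # a positive mark also triggers the Easter egg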
With further reference to fig. 10, the present specification provides an embodiment of an information processing apparatus based on a live broadcast room, which may be applied to the server 104, the terminal device 105, or the anchor version APP installed on the terminal device 105 as shown in fig. 1.
As shown in fig. 10, the information processing apparatus 1000 based on the live broadcast room of this embodiment includes: an emotion recognition unit 1001, a generating unit 1002, and a providing unit 1003. The emotion recognition unit 1001 is configured to, in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; the generating unit 1002 is configured to generate a first emotion analysis result according to the target emotion mark; and the providing unit 1003 is configured to provide the first emotion analysis result to the anchor of the target live broadcast room.
Optionally, the emotion recognition unit 1001 may be further configured to: input the interactive statements into a pre-trained emotion recognition model, so that the emotion recognition model outputs the target emotion marks.
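One plausible realization of such a unit, sketched with a HuggingFace-style text-classification pipeline; the model id is a placeholder, and any sentence-level emotion classifier with the desired label set would fit.

from transformers import pipeline

# "some-org/emotion-classifier" is a placeholder model id, not a model
# named in this specification.
classifier = pipeline("text-classification",
                      model="some-org/emotion-classifier")

def recognize_target_emotion_mark(statement: str) -> str:
    # The pipeline returns e.g. [{"label": "happy", "score": 0.97}];
    # the label serves as the target emotion mark.
    return classifier(statement)[0]["label"]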
With further reference to fig. 11, the present specification provides another embodiment of an information processing apparatus based on a live broadcast room, which may be applied to the server 104, the terminal device 105, or the anchor version APP installed on the terminal device 105 as shown in fig. 1.
As shown in fig. 11, the information processing apparatus 1100 based on the live broadcast room of this embodiment includes: an emotion recognition unit 1101, a generating unit 1102, and a providing unit 1103. The emotion recognition unit 1101 is configured to, in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, input the interactive statements into a pre-trained emotion recognition model, so that the emotion recognition model outputs target emotion marks respectively corresponding to the at least one audience user; the generating unit 1102 is configured to generate a first emotion analysis result according to the target emotion mark; and the providing unit 1103 is configured to provide the first emotion analysis result to the anchor of the target live broadcast room.
Optionally, in the embodiments respectively corresponding to fig. 10 and fig. 11, the target emotion mark may include any one of the following: a neutral emotion, a positive emotion, a negative emotion. The positive emotion may comprise any one of: happiness, excitement, adoration, etc. The negative emotion may include any one of: anger, sadness, disgust, fear, etc.
Optionally, the generating unit 1002 and/or the generating unit 1102 may be further configured to: count the occurrence frequencies respectively corresponding to the mutually different target emotion marks; and generate the first emotion analysis result, wherein the first emotion analysis result includes the mutually different target emotion marks and at least one of the following: the occurrence frequencies of the mutually different target emotion marks, and the ratios of those occurrence frequencies to the total occurrence frequency.
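A minimal counting sketch of this statistic (function and field names invented for illustration):

from collections import Counter

def first_emotion_analysis(marks):
    # Count each distinct target emotion mark and compute its share of
    # the total occurrence frequency.
    counts = Counter(marks)
    total = sum(counts.values())
    return {mark: {"count": n, "ratio": n / total}
            for mark, n in counts.items()}

# first_emotion_analysis(["happy", "happy", "anger"]) yields counts of
# 2 and 1 with ratios of about 0.67 and 0.33.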
Optionally, the generating unit 1002 and/or the generating unit 1102 may be further configured to: acquire a second emotion analysis result generated during the live broadcast; update the second emotion analysis result according to the target emotion mark; and determine the updated second emotion analysis result as the first emotion analysis result.
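The incremental variant reduces to folding new marks into a running tally, for example:

from collections import Counter

def update_second_result(second_result: Counter, new_marks) -> Counter:
    # Fold the newly recognized target emotion marks into the tally
    # accumulated earlier in the live broadcast; the updated tally then
    # serves as the first emotion analysis result.
    second_result.update(new_marks)
    return second_result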
Optionally, the apparatus 1000 and/or the apparatus 1100 may further include: an acquiring unit (not shown in the figures) configured to acquire, after the emotion recognition unit recognizes the target emotion marks corresponding to the audience users, interactive voice data corresponding to the interactive statements according to the interactive statements and the target emotion marks; and a voice providing unit (not shown in the figures) configured to provide the interactive voice data to the anchor.
Optionally, the acquiring unit may be further configured to: input the interactive statement and the target emotion mark into a target emotion speech synthesis model, so that the target emotion speech synthesis model outputs the interactive voice data.
Optionally, the acquiring unit may be further configured to: determine the target emotion speech synthesis model from a plurality of pre-trained emotion speech synthesis models according to the target emotion mark.
Optionally, a preset first emotion mark group may correspond to at least one dialect, each first emotion mark belongs to a negative emotion, and the plurality of emotion speech synthesis models may include emotion speech synthesis models respectively corresponding to the at least one dialect. The acquiring unit may be further configured to: if the target emotion mark is contained in the first emotion mark group, select one dialect from the at least one dialect, and determine the emotion speech synthesis model corresponding to that dialect as the target emotion speech synthesis model. Any first emotion mark may include anger or rage, etc.
Optionally, the voice providing unit may be further configured to: after providing the interactive voice data to the anchor, provide a target voice template to the anchor in response to the target emotion mark being included in a preset second emotion mark group, wherein each second emotion mark belongs to a positive emotion and the target voice template expresses a positive emotion. For example, a second emotion mark may include happiness or excitement, etc.
Alternatively, the target emotion speech synthesis model may correspond to a sample uttering object and a target voice template, the target voice template being obtained by recording the sound of the sample uttering object reading a sample text template, and the sample text template expressing a positive emotion.
With further reference to fig. 12, the present specification provides an embodiment of an information processing apparatus based on a live broadcast room, which may be applied to the server 104, the terminal device 105, or the anchor version APP installed on the terminal device 105 as shown in fig. 1.
As shown in fig. 12, the information processing apparatus 1200 based on the live broadcast room of this embodiment includes: an emotion recognition unit 1201, an acquiring unit 1202, and a providing unit 1203. The emotion recognition unit 1201 is configured to, in response to acquiring an interactive statement submitted by an audience user in a target live broadcast room, recognize a target emotion mark corresponding to the audience user according to the interactive statement; the acquiring unit 1202 is configured to acquire interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark; and the providing unit 1203 is configured to provide the interactive voice data to the anchor of the target live broadcast room.
With further reference to fig. 13, the present specification provides an embodiment of an information processing apparatus based on an e-commerce live broadcast room, which may be applied, in an e-commerce live broadcast scene, to a server, an anchor version APP, or the terminal device where the anchor version APP is located.
As shown in fig. 13, the information processing apparatus 1300 based on the e-commerce live broadcast room of this embodiment includes: an emotion recognition unit 1301, a generating unit 1302, and a providing unit 1303. The emotion recognition unit 1301 is configured to, in response to acquiring interactive statements submitted by at least one audience user in an e-commerce live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; the generating unit 1302 is configured to generate a first emotion analysis result according to the target emotion mark; and the providing unit 1303 is configured to provide the first emotion analysis result to the anchor of the e-commerce live broadcast room.
With further reference to fig. 14, the present specification provides an embodiment of an information processing apparatus based on a government affair live broadcast room, which may be applied, in a government affair live broadcast scene, to a server, an anchor version APP, or the terminal device where the anchor version APP is located.
As shown in fig. 14, the information processing apparatus 1400 based on the government affair live broadcast room of this embodiment includes: an emotion recognition unit 1401, a generating unit 1402, and a providing unit 1403. The emotion recognition unit 1401 is configured to, in response to acquiring interactive statements submitted by at least one audience user in a government affair live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; the generating unit 1402 is configured to generate a first emotion analysis result according to the target emotion mark; and the providing unit 1403 is configured to provide the first emotion analysis result to the anchor of the government affair live broadcast room.
With further reference to fig. 15, the present specification provides an embodiment of an information processing apparatus based on an education live broadcast room, which may be applied to a server, an anchor version APP or a terminal device where the anchor version APP is located in an education live broadcast scene.
As shown in fig. 15, the information processing apparatus 1500 based on the education live broadcast room of this embodiment includes: an emotion recognition unit 1501, a generating unit 1502, and a providing unit 1503. The emotion recognition unit 1501 is configured to, in response to acquiring interactive statements submitted by at least one audience user in an education live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; the generating unit 1502 is configured to generate a first emotion analysis result according to the target emotion mark; and the providing unit 1503 is configured to provide the first emotion analysis result to the anchor of the education live broadcast room.
With further reference to fig. 16, the present specification provides an embodiment of an information processing apparatus based on a conference live broadcast room, which may be applied, in a conference live broadcast scene, to a server, an anchor version APP, or the terminal device where the anchor version APP is located.
As shown in fig. 16, the information processing apparatus 1600 based on the conference live broadcast room of this embodiment includes: an emotion recognition unit 1601, a generating unit 1602, and a providing unit 1603. The emotion recognition unit 1601 is configured to, in response to acquiring interactive statements submitted by at least one audience user in a conference live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements; the generating unit 1602 is configured to generate a first emotion analysis result according to the target emotion mark; and the providing unit 1603 is configured to provide the first emotion analysis result to the anchor of the conference live broadcast room.
For the apparatus embodiments corresponding to fig. 10 to fig. 16, the detailed processing of each unit and its technical effects can be found in the related descriptions of the corresponding method embodiments, and are not repeated here.
The present specification also provides a computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, it causes the computer to perform the methods respectively described in the above method embodiments.
The present specification further provides a computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the methods respectively described in the above method embodiments.
The embodiments of the present specification also provide a computer program product which, when executed on a data processing apparatus, causes the data processing apparatus to implement the methods respectively described in the above method embodiments.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments disclosed in this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium.
In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above embodiments further describe in detail the objects, technical solutions, and advantages of the embodiments disclosed in this specification. It should be understood that the above are only specific embodiments of the embodiments disclosed in this specification and are not intended to limit their protection scope; any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the embodiments disclosed in this specification shall fall within their protection scope.

Claims (26)

1. An information processing method based on a live broadcast room comprises the following steps:
in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, inputting the interactive statements into a pre-trained emotion recognition model, so that the emotion recognition model outputs target emotion marks respectively corresponding to the at least one audience user;
generating a first emotion analysis result according to the target emotion mark;
and providing the first emotion analysis result to the anchor of the target live broadcast room.
2. The method of claim 1, wherein the generating a first emotion analysis result according to the target emotion mark comprises:
counting the occurrence frequencies respectively corresponding to the mutually different target emotion marks;
generating the first emotion analysis result, wherein the first emotion analysis result comprises the mutually different target emotion marks and at least one of the following: the occurrence frequencies of the mutually different target emotion marks, and the ratios of the occurrence frequencies to the total occurrence frequency.
3. The method of claim 1, wherein the generating a first emotion analysis result according to the target emotion mark comprises:
acquiring a second emotion analysis result generated in the live broadcast process;
and updating the second emotion analysis result according to the target emotion mark, and determining the updated second emotion analysis result as the first emotion analysis result.
4. The method of claim 1, wherein after causing the emotion recognition model to output the target emotion marks respectively corresponding to the at least one audience user, the method further comprises:
acquiring interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark;
providing the interactive voice data to the anchor.
5. The method of claim 4, wherein the obtaining of the interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark comprises:
inputting the interactive statement and the target emotion mark into a target emotion speech synthesis model, so that the target emotion speech synthesis model outputs the interactive voice data.
6. The method of claim 5, wherein prior to said inputting said interactive statement and said target emotion mark into a target emotion speech synthesis model, the method further comprises:
determining the target emotion speech synthesis model from a plurality of pre-trained emotion speech synthesis models according to the target emotion mark.
7. The method of claim 6, wherein a preset first emotion mark group corresponds to at least one dialect, each first emotion mark belongs to a negative emotion, and the plurality of emotion speech synthesis models comprise emotion speech synthesis models respectively corresponding to the at least one dialect; and
the determining the target emotion speech synthesis model from a plurality of pre-trained emotion speech synthesis models according to the target emotion mark comprises:
if the target emotion mark is contained in the first emotion mark group, selecting one dialect from the at least one dialect, and determining the emotion speech synthesis model corresponding to the dialect as the target emotion speech synthesis model.
8. The method of any of claims 4-7, wherein after said providing said interactive voice data to said anchor, said method further comprises:
providing a target voice template to the anchor in response to the target emotion mark being contained in a preset second emotion mark group, wherein the second emotion mark belongs to the positive emotion, and the target voice template expresses the positive emotion.
9. The method of claim 5, wherein the target emotion speech synthesis model corresponds to a sample uttering object and a target voice template, the target voice template being obtained by recording the sound of the sample uttering object reading a sample text template, and the sample text template expressing a positive emotion.
10. The method of claim 1, wherein the target emotion mark comprises any one of the following: a neutral emotion, a positive emotion, a negative emotion.
11. The method of claim 10, wherein:
the positive emotion comprises any one of the following: happiness, excitement, adoration;
the negative emotion comprises any one of the following: anger, sadness, disgust, fear.
12. An information processing method based on a live broadcast room comprises the following steps:
in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
generating a first emotion analysis result according to the target emotion mark;
and providing the first emotion analysis result to the anchor of the target live broadcast room.
13. An information processing method based on a live broadcast room comprises the following steps:
in response to acquiring an interactive statement submitted by an audience user in a target live broadcast room, identifying a target emotion mark corresponding to the audience user according to the interactive statement;
acquiring interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark;
and providing the interactive voice data to the anchor of the target live broadcast room.
14. An information processing method based on an E-commerce live broadcast room comprises the following steps:
in response to acquiring interactive statements submitted by at least one audience user in an E-commerce live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
generating a first emotion analysis result according to the target emotion mark;
and providing the first emotion analysis result to the anchor of the E-commerce live broadcast room.
15. An information processing method based on a government affair live broadcast room comprises the following steps:
in response to acquiring interactive statements submitted by at least one audience user in a government affair live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
generating a first emotion analysis result according to the target emotion mark;
and providing the first emotion analysis result to the anchor of the government affair live broadcast room.
16. An information processing method based on an education live broadcast room comprises the following steps:
in response to acquiring interactive statements submitted by at least one audience user in an education live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
generating a first emotion analysis result according to the target emotion mark;
and providing the first emotion analysis result to the anchor of the education live broadcast room.
17. An information processing method based on a conference live room comprises the following steps:
in response to acquiring interactive statements submitted by at least one audience user in a conference live broadcast room, identifying target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
generating a first emotion analysis result according to the target emotion mark;
and providing the first emotion analysis result to the anchor of the conference live broadcast room.
18. An information processing apparatus based on a live broadcast room, comprising:
the emotion recognition unit is configured to, in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, input the interactive statements into a pre-trained emotion recognition model, so that the emotion recognition model outputs target emotion marks respectively corresponding to the at least one audience user;
a generating unit configured to generate a first emotion analysis result according to the target emotion mark;
a providing unit configured to provide the first emotion analysis result to the anchor of the target live broadcast room.
19. An information processing apparatus based on a live broadcast room, comprising:
the emotion recognition unit is configured to, in response to acquiring interactive statements submitted by at least one audience user in a target live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
a generating unit configured to generate a first emotion analysis result according to the target emotion mark;
a providing unit configured to provide the first emotion analysis result to the anchor of the target live broadcast room.
20. An information processing apparatus based on a live broadcast room, comprising:
the emotion recognition unit is configured to, in response to acquiring an interactive statement submitted by an audience user in a target live broadcast room, recognize a target emotion mark corresponding to the audience user according to the interactive statement;
an acquiring unit configured to acquire interactive voice data corresponding to the interactive statement according to the interactive statement and the target emotion mark;
a providing unit configured to provide the interactive voice data to the anchor of the target live broadcast room.
21. An information processing device based on an E-commerce live broadcast room comprises:
the emotion recognition unit is configured to, in response to acquiring interactive statements submitted by at least one audience user in an E-commerce live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
a generating unit configured to generate a first emotion analysis result according to the target emotion mark;
a providing unit configured to provide the first emotion analysis result to the anchor of the E-commerce live broadcast room.
22. An information processing device based on a government affair live broadcast room comprises:
the emotion recognition unit is configured to, in response to acquiring interactive statements submitted by at least one audience user in a government affair live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
a generating unit configured to generate a first emotion analysis result according to the target emotion mark;
a providing unit configured to provide the first emotion analysis result to the anchor of the government affair live broadcast room.
23. An information processing apparatus based on an education live room, comprising:
the emotion recognition unit is configured to, in response to acquiring interactive statements submitted by at least one audience user in an education live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
a generating unit configured to generate a first emotion analysis result according to the target emotion mark;
a providing unit configured to provide the first emotion analysis result to the anchor of the education live broadcast room.
24. An information processing device based on a conference live room comprises:
the emotion recognition unit is configured to, in response to acquiring interactive statements submitted by at least one audience user in a conference live broadcast room, recognize target emotion marks respectively corresponding to the at least one audience user according to the interactive statements;
a generating unit configured to generate a first emotion analysis result according to the target emotion mark;
a providing unit configured to provide the first emotion analysis result to the anchor of the conference live broadcast room.
25. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer program causes the computer to carry out the method of any one of claims 1-17.
26. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-17.
CN202110057957.XA 2021-01-15 2021-01-15 Information processing method and device based on live broadcast room Pending CN114765033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110057957.XA CN114765033A (en) 2021-01-15 2021-01-15 Information processing method and device based on live broadcast room

Publications (1)

Publication Number Publication Date
CN114765033A true CN114765033A (en) 2022-07-19

Family

ID=82364809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110057957.XA Pending CN114765033A (en) 2021-01-15 2021-01-15 Information processing method and device based on live broadcast room

Country Status (1)

Country Link
CN (1) CN114765033A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116884392A (en) * 2023-09-04 2023-10-13 浙江鑫淼通讯有限责任公司 Voice emotion recognition method based on data analysis
CN116884392B (en) * 2023-09-04 2023-11-21 浙江鑫淼通讯有限责任公司 Voice emotion recognition method based on data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination