CN116866646A - Information processing method, device and readable storage medium - Google Patents

Information processing method, device and readable storage medium

Info

Publication number
CN116866646A
Authority
CN
China
Prior art keywords
information
state
emotion
emotional
content producer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310882929.0A
Other languages
Chinese (zh)
Inventor
王峰
冯健峰
陆舜健
付方全
冯帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Video Technology Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202310882929.0A priority Critical patent/CN116866646A/en
Publication of CN116866646A publication Critical patent/CN116866646A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866Management of end-user data
    • H04N21/25891Management of end-user data being end-user preferences
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508Management of client data or end-user data
    • H04N21/4532Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4667Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application provides an information processing method, a device, and a readable storage medium. The method includes: acquiring state information of a professional content producer in a live video, where the state information is determined according to emotional features of the professional content producer; and sending the state information to a second device, and obtaining users' feedback information corresponding to the state information. The scheme of the present application realizes effective interaction between a presenter and users.

Description

Information processing method, device and readable storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to an information processing method, an information processing device, and a readable storage medium.
Background
Video live streaming uses the Internet and streaming media technology to broadcast in real time; its authentic and vivid transmission creates a strong sense of presence, attracts attention, and achieves a striking and lasting communicative effect.
However, in existing live broadcasts, especially live broadcasts of professionally generated content (PGC) with commentary, such as sports, professional presenters must meet broadcast-control requirements, so user feedback information has to be obtained through other channels before it can be used to interact with users during the commentary.
However, cross-platform communication between users and the presenter can satisfy neither users' need to communicate during the commentary nor the presenter's need to grasp user feedback during the commentary.
Disclosure of Invention
The present application aims to provide an information processing method, an information processing device, and a readable storage medium, so as to realize effective interaction between a presenter and users.
To achieve the above object, an embodiment of the present application provides an information processing method, including:
acquiring state information of a professional content producer in a live video, where the state information is determined according to emotional features of the professional content producer;
and sending the state information to a second device, and obtaining users' feedback information corresponding to the state information.
Optionally, the acquiring state information of the professional content producer in the live video includes:
obtaining, according to the emotional features, a fusion weight and discrimination scores respectively corresponding to the emotional features;
obtaining state prediction information of the professional content producer according to the fusion weight and the discrimination scores;
and determining the state information of the professional content producer according to the state prediction information.
Optionally, obtaining the discrimination scores respectively corresponding to the emotional features according to the emotional features includes:
inputting the emotional features into a discriminator to obtain the discrimination scores output by the discriminator;
wherein the discriminator is configured with a plurality of different emotional states, and the discrimination scores are used to represent the emotional states corresponding to the emotional features.
Optionally, obtaining the fusion weight according to the emotional features includes:
processing the emotional features to obtain a target feature;
and obtaining the fusion weight through the target feature and a context-aware weight function.
Optionally, the fusion weight is an N×M matrix, and an element W_{i,j} of the matrix represents the weight of the j-th emotional feature in the i-th emotional state; where M and N are integers greater than 0, N equals the number of emotional states, and M equals the number of emotional features.
Optionally, the state prediction information comprises prediction information associated with different ones of the emotional states;
and the obtaining state prediction information of the professional content producer according to the fusion weight and the discrimination scores includes:
for each emotional state, calculating weighted values from the discrimination scores and the corresponding weights in the fusion weight;
and adding up the weighted values corresponding to each emotional state to obtain the prediction information corresponding to that emotional state.
Optionally, the state information is used to instruct the second device to highlight an interaction identifier corresponding to the state information.
To achieve the above object, an embodiment of the present application provides an information processing apparatus including:
an acquisition module, configured to acquire state information of a professional content producer in a live video, where the state information is determined according to emotional features of the professional content producer;
and a processing module, configured to send the state information to a second device and obtain users' feedback information corresponding to the state information.
To achieve the above object, an embodiment of the present application provides an information processing apparatus including a transceiver, a processor, a memory, and a program or instructions stored on the memory and executable on the processor; the processor, when executing the program or instructions, implements the information processing method as described above.
To achieve the above object, an embodiment of the present application provides a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the steps in the information processing method as described above.
The technical scheme of the application has the following beneficial effects:
according to the method provided by the embodiment of the application, the state information of the professional content producer can be determined according to the emotion characteristics of the professional content producer in the live video, and the state information is informed to the second equipment, so that the feedback information of the user corresponding to the state information is obtained, the professional content producer can know the feedback of the user on the current emotion state of the user, and the effective interaction between the host and the user is realized.
Drawings
FIG. 1 is a flowchart of an information processing method according to an embodiment of the present application;
FIG. 2 is a second flowchart of an information processing method according to an embodiment of the application;
FIG. 3 is a schematic illustration of an embodiment of the present application;
FIG. 4 is a first display schematic diagram;
FIG. 5 is a second display schematic diagram;
FIG. 6 is a third display schematic diagram;
FIG. 7 is a fourth display schematic diagram;
fig. 8 is a block diagram of an information processing apparatus according to an embodiment of the present application;
fig. 9 is a structural diagram of an information processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical problems to be solved, the technical solutions, and the advantages of the present application more apparent, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present application, it should be understood that the sequence numbers of the following processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In addition, the terms "system" and "network" are often used interchangeably herein.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, and B may be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
As shown in fig. 1, an information processing method according to an embodiment of the present application is performed by a first device, and includes:
step 101, acquiring state information of a professional content producer in a live video, where the state information is determined according to emotional features of the professional content producer;
step 102, sending the state information to a second device, and obtaining users' feedback information corresponding to the state information.
Through the above steps, the first device can determine the state information of the professional content producer from the professional content producer's emotional features in the live video and notify the second device of that state information, thereby obtaining users' feedback information corresponding to the state information. The professional content producer can thus learn the users' feedback on his or her current emotional state, realizing effective interaction between the presenter and users.
The professional content producer may be a presenter, a guest, or a performer in the live video. The following takes a presenter as an example.
Optionally, in this embodiment, the first device may obtain the presenter's status information (BroadcastStatus) based on set time periods, obtaining the live presenter's BroadcastStatus once at the end of each set period: for example, BroadcastStatus1 at the end of T1, BroadcastStatus2 at the end of T2, and BroadcastStatus3 at the end of T3.
Here, the set time periods may be periodic.
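By way of rough illustration only (the text prescribes no particular scheduling mechanism, and the two callback names below are hypothetical), the periodic acquisition could be sketched as:

    import time

    PERIOD_SECONDS = 10  # assumed length of one set time period (T1, T2, ...)

    def run_status_loop(get_broadcast_status, send_to_second_device):
        # At the end of each set period, obtain the presenter's
        # BroadcastStatus once and push it to the second device.
        while True:
            time.sleep(PERIOD_SECONDS)
            status = get_broadcast_status()   # BroadcastStatus1, BroadcastStatus2, ...
            send_to_second_device(status)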
It should also be appreciated that this embodiment presets a plurality of emotional states, such as happiness, sadness, anger, fear, surprise, and aversion, and the obtained state information can indicate which specific emotional state the professional content producer is currently in.
Optionally, the emotional features include one or more of expression features, action features, and sound features. Of course, the emotional features are not limited to those exemplified above. Optionally, where the emotional features include one or more of expression features, action features, and sound features, the method further includes at least one of the following:
extracting expression features of the professional content producer in the live video through facial expression recognition;
extracting action features of the professional content producer in the live video through action detection;
and extracting sound features of the professional content producer in the live video through sound detection.
That is, for the current live video, the first device may perform facial expression recognition on the professional content producer to extract the expression feature x_f, perform action detection to extract the action feature x_m, and perform sound detection to extract the sound feature x_a, providing a basis for determining the professional content producer's state information.
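By way of illustration only, a minimal sketch of this per-modality feature extraction follows. The three extractor stubs are placeholders for whatever trained facial expression recognition, action detection, and sound detection models the first device actually uses, and the feature dimensionality of 128 is an assumption:

    import numpy as np

    rng = np.random.default_rng(0)
    D = 128  # assumed feature dimensionality

    # Placeholder extractors standing in for trained models.
    def expression_features(video_frame):
        return rng.standard_normal(D)   # x_f, via facial expression recognition

    def action_features(video_frame):
        return rng.standard_normal(D)   # x_m, via action detection

    def sound_features(audio_clip):
        return rng.standard_normal(D)   # x_a, via sound detection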
Optionally, as shown in FIG. 2, in this embodiment, the acquiring state information of the professional content producer in the live video includes:
step 201, obtaining, according to the emotional features, a fusion weight and discrimination scores respectively corresponding to the emotional features;
step 202, obtaining state prediction information of the professional content producer according to the fusion weight and the discrimination scores;
step 203, determining the state information of the professional content producer according to the state prediction information.
That is, based on the emotional features of the professional content producer in the live video, the first device can obtain the fusion weight W and the discrimination scores corresponding to the different emotional features through step 201. For example, where the emotional features include expression, action, and sound features, the discrimination scores include the score sc_f corresponding to the expression feature, the score sc_m corresponding to the action feature, and the score sc_a corresponding to the sound feature. The first device then uses W, sc_f, sc_m, and sc_a to obtain the state prediction information of the professional content producer (step 202), and finally determines the presenter's state information based on the state prediction information (step 203).
Optionally, in this embodiment, obtaining, according to the emotional features, the discrimination scores respectively corresponding to the emotional features includes:
inputting the emotional features into a discriminator to obtain the discrimination scores output by the discriminator;
wherein the discriminator is configured with a plurality of different emotional states, and the discrimination scores are used to represent the emotional states corresponding to the emotional features.
That is, with the discriminator configured with a plurality of different emotional states, the different emotional features, such as the expression, action, and sound features, are respectively input into the discriminator, so that the discrimination score sc_f corresponding to the expression feature, the score sc_m corresponding to the action feature, and the score sc_a corresponding to the sound feature can each be obtained.
The first device may be configured with one discriminator that obtains the discrimination scores of the emotional features sequentially, or with a plurality of discriminators that obtain the discrimination scores of the emotional features in parallel, so as to indicate the emotional states corresponding to the emotional features.
For example, as shown in FIG. 3, the first device is configured with a discriminator F_f() for the expression feature, a discriminator F_m() for the action feature, and a discriminator F_a() for the sound feature. Inputting the expression feature into F_f() yields sc_f, i.e., sc_f = F_f(x_f); inputting the action feature into F_m() yields sc_m, i.e., sc_m = F_m(x_m); and inputting the sound feature into F_a() yields sc_a, i.e., sc_a = F_a(x_a). The discriminators F_f(), F_m(), and F_a() can each be configured as a six-member discriminator with reference to the plurality of preset emotional states, such as happiness, sadness, anger, fear, surprise, and aversion. Thus sc_f can indicate which emotional state the expression feature corresponds to, sc_m which emotional state the action feature corresponds to, and sc_a which emotional state the sound feature corresponds to.
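By way of illustration, a minimal sketch of such six-member discriminators follows, assuming each F() is a simple linear scorer with a softmax head; the real discriminators would be trained classifiers:

    import numpy as np

    N, D = 6, 128   # six preset emotional states; assumed feature dimensionality
    rng = np.random.default_rng(0)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    class SixMemberDiscriminator:
        # A stand-in F(): linear scorer + softmax over the N emotional states.
        def __init__(self):
            self.weight = 0.01 * rng.standard_normal((N, D))
        def __call__(self, x):
            return softmax(self.weight @ x)   # discrimination score, length N

    F_f, F_m, F_a = (SixMemberDiscriminator() for _ in range(3))
    x_f, x_m, x_a = (rng.standard_normal(D) for _ in range(3))
    sc_f, sc_m, sc_a = F_f(x_f), F_m(x_m), F_a(x_a)   # sc_f = F_f(x_f), etc.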
Optionally, in this embodiment, obtaining the fusion weight according to the emotional features includes:
processing the emotional features to obtain a target feature;
and obtaining the fusion weight through the target feature and a context-aware weight function.
Here, processing the emotional features may mean concatenating the emotional features, such as the expression, action, and sound features, through a serializer, as shown in FIG. 3, to obtain the target feature x_f ⊕ x_m ⊕ x_a; the fusion weight W is then obtained through a context-aware weight function G().
Here, G() is a context-aware weight function that is learned on a validation set and then fixed during testing, with multiple output neurons (e.g., 18 if six emotional states are preset). Through context awareness, G() can learn the fusion weights from the video material itself; that is, the contribution of the different emotional features to the state prediction information sc_c depends heavily on the video context. For example, when a presenter whose commentary rarely involves body movement begins making large-amplitude movements, body movements (i.e., action features) should dominate in determining sc_c; when there is a large change in speech intonation after a stretch of smooth commentary with consistent intonation, speech (i.e., sound features) should dominate in determining sc_c.
Of course, an overall loss function can also be determined based on the discrimination scores of the different emotional features:
Loss = λ_c · L(sc_c) + λ_f · L(sc_f) + λ_m · L(sc_m) + λ_a · L(sc_a),
where L() is the cross-entropy loss function, and λ_c, λ_f, λ_m, and λ_a are the contribution factors in the overall loss, usually set to 1.
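A minimal sketch of this overall loss, assuming each sc is a softmax score vector over the emotional states and label is the index of the ground-truth emotional state:

    import numpy as np

    def cross_entropy(sc, label):
        # L(): cross-entropy of a score vector against the true state index.
        return -np.log(sc[label] + 1e-12)

    def overall_loss(sc_c, sc_f, sc_m, sc_a, label,
                     lam_c=1.0, lam_f=1.0, lam_m=1.0, lam_a=1.0):
        # Loss = λ_c·L(sc_c) + λ_f·L(sc_f) + λ_m·L(sc_m) + λ_a·L(sc_a),
        # with all contribution factors usually set to 1.
        return (lam_c * cross_entropy(sc_c, label)
                + lam_f * cross_entropy(sc_f, label)
                + lam_m * cross_entropy(sc_m, label)
                + lam_a * cross_entropy(sc_a, label))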
Here, W = G(x_f ⊕ x_m ⊕ x_a) = [ŵ_f, ŵ_m, ŵ_a]; that is, the context-aware weight function outputs the weights ŵ_f, ŵ_m, and ŵ_a for the different emotional features.
In this embodiment, the weight of each emotional feature further comprises weights corresponding to the different emotional states. Taking six preset emotional states (happiness, sadness, anger, fear, surprise, and aversion) as an example, ŵ_f, ŵ_m, and ŵ_a each comprise six weights.
Optionally, in this embodiment, the fusion weight is an N×M matrix, and an element W_{i,j} of the matrix represents the weight of the j-th emotional feature in the i-th emotional state, where M and N are integers greater than 0, N equals the number of emotional states, and M equals the number of emotional features.
That is, W is an N×M matrix whose element W_{i,j} is the weight of the j-th emotional feature in the i-th emotional state.
Optionally, in this embodiment, the output of G() is further normalized to obtain the fusion weight W. Specifically, each element W_{i,j} of the fusion weight is obtained by normalizing Ŵ_{i,j}, where Ŵ_{i,j} is the weight output by G() for the j-th emotional feature in the i-th emotional state.
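A minimal sketch of G() and the normalization step follows. The single linear map standing in for the learned context-aware weight function, and the row-wise softmax normalization (making each state's feature weights sum to 1), are both assumptions, since the exact normalization formula is not reproduced in the text:

    import numpy as np

    N, M, D = 6, 3, 128   # states, emotional features, assumed feature dim
    rng = np.random.default_rng(0)
    G_weight = 0.01 * rng.standard_normal((N * M, 3 * D))  # stand-in for learned G()

    def fusion_weight(x_f, x_m, x_a):
        target = np.concatenate([x_f, x_m, x_a])   # target feature x_f ⊕ x_m ⊕ x_a
        raw = (G_weight @ target).reshape(N, M)    # Ŵ: 18 output neurons -> 6 x 3
        # Assumed normalization: softmax across features for each state i,
        # so that sum_j W[i, j] = 1.
        e = np.exp(raw - raw.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)    # fusion weight W, N x M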
Optionally, the state prediction information includes prediction information associated with different ones of the emotional states;
and the obtaining state prediction information of the professional content producer according to the fusion weight and the discrimination scores includes:
for each emotional state, calculating weighted values from the discrimination scores and the corresponding weights in the fusion weight;
and adding up the weighted values corresponding to each emotional state to obtain the prediction information corresponding to that emotional state.
Through the above steps, the obtained state prediction information is a set of scores calculated by fusing the different emotional features according to the fusion weight W learned from the video context. Each score represents the probability that the professional content producer's actual emotional state is the emotional state to which that score corresponds.
For the i-th emotional state and the j-th emotional feature, the weighted value may be W_{i,j} · sc_j, where sc_j is the discrimination score corresponding to the j-th emotional feature. Correspondingly, for the i-th emotional state, the prediction information is W_{i,1} · sc_1 + W_{i,2} · sc_2 + … + W_{i,M} · sc_M. For example, where six emotional states (happiness, sadness, anger, fear, surprise, and aversion) are preset and the emotional features include expression, action, and sound features, 6 pieces of prediction information are calculated. Each is obtained by multiplying the expression score by its corresponding weight to obtain a first weighted value, multiplying the action score by its corresponding weight to obtain a second weighted value, multiplying the sound score by its corresponding weight to obtain a third weighted value, and adding the three weighted values. Assuming i=1 denotes the happy emotional state, its associated prediction information is given by W_{1,1} · sc_f + W_{1,2} · sc_m + W_{1,3} · sc_a, and this prediction information represents the probability that the presenter's actual emotional state is happy.
In calculating the weighted values from the discrimination scores and the corresponding weights, the calculation results may be quantized.
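Putting the pieces together, a minimal sketch of the weighted fusion and of the selection of the final state described next; EMOTION_STATES is a hypothetical ordered list of the six preset states:

    import numpy as np

    EMOTION_STATES = ["happiness", "sadness", "anger", "fear", "surprise", "aversion"]

    def state_prediction(W, sc_f, sc_m, sc_a):
        # scores[j, i] is the j-th feature's discrimination score for state i;
        # prediction for state i is sum_j W[i, j] * scores[j, i].
        scores = np.stack([sc_f, sc_m, sc_a])   # shape (M, N)
        return (W * scores.T).sum(axis=1)       # shape (N,): one value per state

    def select_state(pred):
        # The state information is the emotional state with the largest
        # prediction value (the ranking step described below).
        return EMOTION_STATES[int(np.argmax(pred))]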
Optionally, the determining the presenter's state information according to the state prediction information includes:
ranking the prediction information associated with the different emotional states;
and determining the presenter's state information according to the emotional state corresponding to the largest prediction information after ranking.
In this way, from the prediction information obtained in the above manner, the first device can select the emotional state corresponding to the maximum value as the presenter's state information for the current round.
Optionally, the state information is used to instruct the second device to highlight an interaction identifier corresponding to the state information.
Specifically, highlighting is performed according to a preset rule. In one embodiment, after receiving the state information, the second device (e.g., a user device) displays the interaction identifier corresponding to the state information in the live video according to a preset rule, so as to highlight the presenter's emotion. For example, with six preset emotional states (happiness, sadness, anger, fear, surprise, and aversion), six presenter-state tags are arranged horizontally on the progress bar. If the presenter's current state is happy, as shown in FIG. 4, the tag (interaction identifier) corresponding to happy is enlarged and clickable, and clicking it may trigger additional effects such as playing a preset animation. When the display interface is as shown in FIG. 5, i.e., the presenter's current state has changed from happy to sad, the happy tag shrinks and becomes unclickable, while the tag corresponding to sad is enlarged and clickable. A user's click on the interaction identifier constitutes that user's feedback.
In this embodiment, the first device generates feedback information based on the users' feedback and displays it to the professional content producer, who can then interact effectively with users by learning their feedback on his or her emotional state, for example by changing that emotional state. Here, the first device is an information processing device; as an optional implementation, the first device is a director device or a live-streaming device.
The users' feedback information may be a statistical result generated by the first device counting users' clicks after they click the interaction identifier. The first device may display the statistics directly, or send them to a studio device for display, so that the professional content producer sees in the studio the click rate corresponding to the emotional state and can interact with users accordingly; the studio device's display is shown in FIG. 6.
Of course, a director can trigger control information through the first device or the studio device, so that the first device sends the statistical results of the users' feedback information to the second device and the current live broadcast picture is switched to a picture of audience live-state statistics. Through this direct display, a user can learn the viewing experience of the audience watching the live broadcast together with him or her; the second device's display is shown in FIG. 7.
In summary, based on unified measurement and recognition of the live professional content producer's non-verbal information, namely emotional features (including actions, expressions, and voice tones (sounds)), corresponding interactive content and interaction modes are provided for users to interact with, so that the professional content producer and users can interact effectively, increasing users' interest and attention.
As shown in fig. 8, an information processing apparatus of an embodiment of the present application includes:
an acquisition module 810, configured to acquire state information of a professional content producer in a live video, where the state information is determined according to emotional features of the professional content producer;
and a processing module 820, configured to send the state information to a second device and obtain users' feedback information corresponding to the state information.
Optionally, the acquisition module includes:
a first processing unit, configured to obtain, according to the emotional features, a fusion weight and discrimination scores respectively corresponding to the emotional features;
a second processing unit, configured to obtain state prediction information of the professional content producer according to the fusion weight and the discrimination scores;
and a third processing unit, configured to determine the state information of the professional content producer according to the state prediction information.
Optionally, the first processing unit is further configured to:
input the emotional features into a discriminator to obtain the discrimination scores output by the discriminator;
wherein the discriminator is configured with a plurality of different emotional states, and the discrimination scores are used to represent the emotional states corresponding to the emotional features.
Optionally, the first processing unit is further configured to:
process the emotional features to obtain a target feature;
and obtain the fusion weight through the target feature and a context-aware weight function.
Optionally, the fusion weight is an N×M matrix, and an element W_{i,j} of the matrix represents the weight of the j-th emotional feature in the i-th emotional state; where M and N are integers greater than 0, N equals the number of emotional states, and M equals the number of emotional features.
Optionally, the state prediction information includes prediction information associated with different ones of the emotional states; the second processing unit is further configured to:
for each emotional state, calculate weighted values from the discrimination scores and the corresponding weights in the fusion weight;
and add up the weighted values corresponding to each emotional state to obtain the prediction information corresponding to that emotional state.
Optionally, the third processing unit is further configured to:
rank the prediction information associated with the different emotional states;
and determine the state information of the professional content producer according to the emotional state corresponding to the largest prediction information after ranking.
Optionally, the state information is used to instruct the second device to highlight an interaction identifier corresponding to the state information.
The apparatus can determine the state information of the professional content producer from the professional content producer's emotional features in the live video and notify the second device of that state information, thereby obtaining users' feedback information corresponding to the state information. The professional content producer can thus learn the users' feedback on his or her current emotional state, realizing effective interaction between the presenter and users.
It should be noted that this apparatus applies the information processing method described above, and the implementations in the embodiments of the information processing method are applicable to this apparatus, achieving the same technical effects.
As shown in fig. 9, an information processing apparatus of an embodiment of the present application includes a transceiver 910, a processor 900, a memory 920, and a program or instructions stored on the memory 920 and executable on the processor 900; the processor 900, when executing the program or instructions, implements the information processing method performed by the first device described above.
The transceiver 910 is configured to receive and transmit data under the control of the processor 900.
In FIG. 9, the bus architecture may comprise any number of interconnected buses and bridges, linking together various circuits including one or more processors, represented by the processor 900, and memories, represented by the memory 920. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 910 may comprise a number of elements, i.e., a transmitter and a receiver, providing means for communicating with various other apparatuses over a transmission medium. The processor 900 is responsible for managing the bus architecture and general processing, and the memory 920 may store data used by the processor 900 in performing operations.
The readable storage medium of the embodiments of the present application stores a program or instructions which, when executed by a processor, implement the steps in the information processing method described above and can achieve the same technical effects; to avoid repetition, details are not repeated here.
Here, the processor is the processor in the information processing device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It is further noted that the second devices described in this specification include, but are not limited to, smartphones, tablets, etc., and that many of the functional components described are referred to as modules in order to more particularly emphasize their implementation independence.
In an embodiment of the application, the modules may be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different bits which, when joined logically together, comprise the module and achieve the stated purpose for the module.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Likewise, operational data may be identified within modules and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices.
Where a module can be implemented in software, considering the level of existing hardware technology, one skilled in the art may, cost considerations aside, build corresponding hardware circuitry to achieve the corresponding functions, the hardware circuitry comprising conventional very-large-scale integration (VLSI) circuits or gate arrays and existing semiconductors such as logic chips and transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field-programmable gate arrays, programmable array logic, programmable logic devices, or the like.
The exemplary embodiments described above are described with reference to the drawings. Many different forms and embodiments are possible without departing from the spirit and teachings of the present application; therefore, the present application should not be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete and will convey the scope of the application to those skilled in the art. In the drawings, the sizes of elements and relative sizes may be exaggerated for clarity. The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Unless otherwise indicated, a range of values includes the upper and lower limits of the range and any subranges therebetween.
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.

Claims (10)

1. An information processing method, characterized by comprising:
acquiring state information of a professional content producer in a live video, wherein the state information is determined according to emotional features of the professional content producer;
and sending the state information to a second device, and obtaining users' feedback information corresponding to the state information.
2. The method of claim 1, wherein the acquiring state information of the professional content producer in the live video comprises:
obtaining, according to the emotional features, a fusion weight and discrimination scores respectively corresponding to the emotional features;
obtaining state prediction information of the professional content producer according to the fusion weight and the discrimination scores;
and determining the state information of the professional content producer according to the state prediction information.
3. The method of claim 2, wherein obtaining the discrimination scores respectively corresponding to the emotional features according to the emotional features comprises:
inputting the emotional features into a discriminator to obtain the discrimination scores output by the discriminator;
wherein the discriminator is configured with a plurality of different emotional states, and the discrimination scores are used to represent the emotional states corresponding to the emotional features.
4. The method of claim 2, wherein obtaining the fusion weight according to the emotional features comprises:
processing the emotional features to obtain a target feature;
and obtaining the fusion weight through the target feature and a context-aware weight function.
5. The method according to claim 2 or 4, wherein the fusion weight is an N×M matrix, and an element W_{i,j} of the matrix represents the weight of the j-th emotional feature in the i-th emotional state; wherein M and N are integers greater than 0, N equals the number of emotional states, and M equals the number of emotional features.
6. The method of claim 5, wherein the state prediction information comprises prediction information associated with different ones of the emotional states;
and the obtaining state prediction information of the professional content producer according to the fusion weight and the discrimination scores comprises:
for each emotional state, calculating weighted values from the discrimination scores and the corresponding weights in the fusion weight;
and adding up the weighted values corresponding to each emotional state to obtain the prediction information corresponding to that emotional state.
7. The method of claim 1, wherein the state information is used to instruct the second device to highlight an interaction identifier corresponding to the state information.
8. An information processing apparatus, characterized by comprising:
an acquisition module, configured to acquire state information of a professional content producer in a live video, wherein the state information is determined according to emotional features of the professional content producer;
and a processing module, configured to send the state information to a second device and obtain users' feedback information corresponding to the state information.
9. An information processing apparatus, comprising: a transceiver, a processor, a memory, and a program or instructions stored on the memory and executable on the processor; characterized in that the processor, when executing the program or instructions, implements the information processing method according to any one of claims 1 to 7.
10. A readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the steps in the information processing method according to any one of claims 1 to 7.
CN202310882929.0A 2023-07-18 2023-07-18 Information processing method, device and readable storage medium Pending CN116866646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310882929.0A CN116866646A (en) 2023-07-18 2023-07-18 Information processing method, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310882929.0A CN116866646A (en) 2023-07-18 2023-07-18 Information processing method, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN116866646A true CN116866646A (en) 2023-10-10

Family

ID=88235706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310882929.0A Pending CN116866646A (en) 2023-07-18 2023-07-18 Information processing method, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN116866646A (en)

Similar Documents

Publication Publication Date Title
US10516763B2 (en) Hierarchical temporal memory (HTM) system deployed as web service
Zhang et al. DeepQoE: A multimodal learning framework for video quality of experience (QoE) prediction
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN109923558A (en) Mixture of expert neural network
WO2021135701A1 (en) Information recommendation method and apparatus, electronic device, and storage medium
CN111754985A (en) Method and device for training voice recognition model and voice recognition
CN111432282B (en) Video recommendation method and device
JP7488871B2 (en) Dialogue recommendation method, device, electronic device, storage medium, and computer program
CN111626049A (en) Title correction method and device for multimedia information, electronic equipment and storage medium
JP7214770B2 (en) Method and device, electronic device, storage medium and computer program for determining focused learning content
CN115510313A (en) Information recommendation method and device, storage medium and computer equipment
CN112712068B (en) Key point detection method and device, electronic equipment and storage medium
CN114282047A (en) Small sample action recognition model training method and device, electronic equipment and storage medium
CN112149642A (en) Text image recognition method and device
CN112131361B (en) Answer content pushing method and device
CN112995719B (en) Bullet screen text-based problem set acquisition method and device and computer equipment
CN113660488A (en) Method and device for carrying out flow control on multimedia data and training flow control model
CN114116995A (en) Session recommendation method, system and medium based on enhanced graph neural network
KR102371487B1 (en) Method and apparatus for learning based on data including nominal data
CN116866646A (en) Information processing method, device and readable storage medium
CN116956183A (en) Multimedia resource recommendation method, model training method, device and storage medium
CN115878839A (en) Video recommendation method and device, computer equipment and computer program product
CN114970494A (en) Comment generation method and device, electronic equipment and storage medium
CN113886674A (en) Resource recommendation method and device, electronic equipment and storage medium
CN111858856A (en) Multi-round search type chatting method and display equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination