CN106612465B - Live broadcast interaction method and device - Google Patents


Info

Publication number
CN106612465B
CN106612465B · Application CN201611198395.6A
Authority
CN
China
Prior art keywords
speech
preset
message
live broadcast
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611198395.6A
Other languages
Chinese (zh)
Other versions
CN106612465A
Inventor
蔡毅 (Cai Yi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201911127043.5A priority Critical patent/CN110708607B/en
Priority to CN201611198395.6A priority patent/CN106612465B/en
Publication of CN106612465A publication Critical patent/CN106612465A/en
Application granted granted Critical
Publication of CN106612465B publication Critical patent/CN106612465B/en
Legal status: Active (current)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/475End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a live broadcast interaction method and device. The method comprises: during live broadcast, determining whether a user initiates a speech request; when the user initiates a speech request, displaying a plurality of preset speech messages in the live broadcast interface; and when a preset speech message is detected to be triggered, performing live broadcast interaction processing on that message. By intelligently recommending speech messages, the embodiments of the application allow the user to speak quickly, reduce the user's text-editing operations, shorten the user's input time during live interaction, and make live interaction processing more intelligent.

Description

Live broadcast interaction method and device
Technical Field
The application relates to the technical field of internet, in particular to a live broadcast interaction method and device.
Background
Webcast technology is an internet technology in which a server broadcasts the live video data of an anchor user to multiple audience users for watching. In the related art, the live broadcast service provider offers functions for audience users to interact with the anchor user. For example, a comment function is usually provided in the live broadcast interface: an input control is displayed, through which an audience user can edit text to enter a comment message during the broadcast; the client sends the comment message to the live broadcast server, which broadcasts it to each client, allowing the audience to interact with the anchor. This mode of interaction has a low level of intelligence and requires many operations from the user, which to some extent reduces audience users' enthusiasm for participating in the interaction.
Disclosure of Invention
In order to overcome the problems in the related art, the application provides a live broadcast interaction method and a live broadcast interaction device.
According to a first aspect of an embodiment of the present application, a live broadcast interaction method is provided, where the method includes:
in the live broadcast process, determining whether a user initiates a speaking request;
when a user is determined to initiate a speaking request, acquiring a plurality of preset speaking messages, and displaying the preset speaking messages in the live broadcast interface;
and when the preset speech message is detected to be triggered, carrying out live broadcast interactive processing on the preset speech message.
According to a second aspect of the embodiments of the present application, there is provided a live broadcast interaction apparatus, including:
a talk request determination module to: in the live broadcast process, determining whether a user initiates a speaking request;
a talk message presentation module to: when a user is determined to initiate a speaking request, acquiring a plurality of preset speaking messages, and displaying the preset speaking messages in the live broadcast interface;
an interactive processing module, configured to: and when the preset speech message is detected to be triggered, carrying out live broadcast interactive processing on the preset speech message.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the method and device, when it is determined during live broadcast that the user initiates a speech request, the client can display a plurality of preset speech messages for the user to select; the user can trigger the desired preset speech message, which is then sent to the live broadcast server as a live interaction message for live interaction processing. By intelligently recommending speech messages, the embodiments of the application allow the user to speak quickly, reduce the user's text-editing operations, shorten the user's input time during live interaction, and make live interaction processing more intelligent.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1A is a schematic view of an application scene of a live interaction according to an exemplary embodiment of the present application.
Fig. 1B is a schematic diagram of a live interface in the related art.
Fig. 2A is a flow chart illustrating a live interaction method according to an exemplary embodiment of the present application.
Fig. 2B is a diagram of a live interface shown in the present application according to an example embodiment.
Fig. 2C is a schematic diagram of 3 standard expression calibration points according to an exemplary embodiment of the present application.
Fig. 3 is a block diagram of a live interaction device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
The scheme of the embodiments of the application can be applied to any scene involving live broadcast interaction, such as webcasting. Fig. 1A is a schematic diagram of an application scene of live interaction according to an exemplary embodiment of the present application; it includes a server as the server device, and a smartphone, a tablet computer, and a personal computer as client devices. The client device may also be an intelligent device such as a PDA (Personal Digital Assistant), a multimedia player, or a wearable device.
The server in fig. 1A provides live broadcast services to each client. A user may install a live broadcast client on an intelligent device and obtain the live broadcast services through it, or install a browser client and obtain the services by logging in to a live broadcast page provided by the server. Typically, two types of users are involved in a live broadcast: anchor users and audience users. The client provides both a broadcasting function and a viewing function; the anchor user uses the broadcasting function to stream, and audience users use the viewing function to watch the anchor's live content.
In the related art, the live broadcast service provider offers many functions for audience users to interact with the anchor user, such as a speech function or a gift-giving function. For the speech function, fig. 1B shows a schematic diagram of a live broadcast interface in the related art. Typically, an input control is displayed in the live broadcast interface; an audience user can edit text through this control, and the edited text is sent by the client to the live broadcast server as an interaction message. The server broadcasts the interaction message to each client, so that the audience member's speech can be displayed on each client's screen, realizing interaction between the audience and the anchor. This mode of interaction has a low level of intelligence and requires many text-editing operations from the user, which to some extent reduces audience users' enthusiasm for participating in the interaction.
In the scheme provided by the embodiments of the application, when it is determined during live broadcast that the user initiates a speech request, a plurality of preset speech messages are displayed in the live broadcast interface for the user to select; the user can trigger the desired preset speech message, which is then sent to the live broadcast server as a live interaction message for live interaction processing. By intelligently recommending speech messages, the user can speak quickly, the user's text-editing operations are reduced, the user's input time during live interaction is shortened, and live interaction processing becomes more intelligent. The embodiments of the application are described in detail below.
As shown in fig. 2A, fig. 2A is a flowchart of a live interaction method according to an exemplary embodiment, where the method includes the following steps 201 to 203:
in step 201, during the live broadcast, it is determined whether the user initiates a talk request.
In step 202, when it is determined that a user initiates a speech request, a plurality of preset speech messages are obtained and displayed in the live broadcast interface.
In step 203, when a preset speech message is detected to be triggered, live broadcast interaction processing is performed on that message.
The method can be applied to client devices. During live broadcast, the client can determine whether the user has a need to speak; if the user initiates a speech request, the client can respond by displaying a plurality of preset speech messages, enabling the user to quickly and conveniently select the desired message and thus realize live interaction.
In practical applications, determining whether the user has a need to speak — that is, whether the user initiates a speech request — can be implemented in various ways: for example, by detecting that the user presses certain keys on the device, inputs a preset sliding track, inputs a preset gesture, or triggers a designated control or screen area in a preset manner. The client may provide one or more of these ways so that the user can conveniently initiate a speech request.
To better guide the user to speak and make initiating a speech request more convenient, determining whether the user initiates a speech request includes:
and displaying a speech trigger object in a live interface, and determining whether a user initiates a speech request by detecting whether the speech trigger object is triggered.
The speech trigger object is used to detect the user's need to speak; because it is displayed in the live interface, the user can trigger it, and thereby initiate a speech request, conveniently. In practical applications, the speech trigger object may be an icon, an option, a button, or the like displayed in the live interface. Fig. 2B is a schematic view of a live interface according to an exemplary embodiment of the present application; the speech trigger object in fig. 2B is an icon, which the user can trigger by clicking or pressing.
The client of this embodiment may intelligently recommend appropriate preset speech messages in the live broadcast interface. Their number and content may be configured in advance. A preset speech message may be a common greeting, such as "hello"; a phrase tailored to the live broadcast scene, such as "the anchor is so beautiful" or "anchor, welcome"; or a common speech message determined by collecting and analyzing messages previously sent by audience users. As an example, 6 preset speech messages are shown in fig. 2B.
In a live broadcast scene, different live content may be involved over the course of the broadcast, and the speech messages a user wishes to send differ with that content. For example, when a broadcast begins, users typically want to send greeting messages such as "hello, anchor" or "welcome, anchor". During the broadcast, if the anchor sings a song, a user may want to send a related message such as "this song sounds great"; if the anchor is angry, a user may wish to send a message related to "the anchor is angry"; if the anchor says "I love you", a user may wish to send a related message such as "anchor, I love you".
In this embodiment, the plurality of preset speech messages may be obtained according to live data; that is, the preset speech messages displayed by the client may change with the live data during the broadcast. To let the user speak as needed, the displayed preset speech messages can be adapted to the live content at the current stage, so that the client accurately recommends suitable speech messages, the user speaks more conveniently, and the intelligence of live interaction processing improves. To this end, the application provides the following embodiments:
the preset speaking message is obtained from a preset speaking database, and a plurality of speaking samples are stored in the speaking database.
Selecting a speech sample from the speech database as the preset speech message based on one or more of the following information:
voice information of the anchor user, speech messages sent by other audience users, and facial expression information of the anchor user.
In this embodiment, a speech database may be configured in advance, storing a plurality of speech samples. When selecting suitable preset speech messages from these samples, one or more factors may be consulted: the voice information of the anchor user, speech messages sent by other audience users, and the facial expression information of the anchor user. A specific selection approach is to determine keywords from this information and use a search algorithm to retrieve speech messages matching the keywords from the database. For specific search methods, reference may be made to the related art, which is not repeated here.
The voice information of the anchor user represents what the anchor is saying; selecting preset speech messages with this information as a reference factor better matches the characteristics of the live scene and yields suitable messages. Specifically, when it is detected that the user initiates a speech request, an audio file covering a preset time period — for example, the last 3, 5, or 20 seconds — may be acquired as the voice information. Keywords can then be obtained in either of two ways: the voice information may be compared with preset standard voices, each pre-labeled with corresponding keywords, and the keywords taken from the matching standard voice; or the voice information may be converted into text, keywords identified from the text, and suitable preset speech messages retrieved from the database using those keywords.
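The text-based path just described can be sketched as follows, assuming the speech-to-text step has already produced a transcript string. All names here (`KEYWORDS`, `extract_keywords`, `search_speech_samples`, and the sample data) are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch: identify pre-labeled keywords in the anchor's transcribed
# speech, then retrieve matching speech samples from the database.

KEYWORDS = {"song", "sing", "hello", "love"}  # keywords assumed pre-labeled

def extract_keywords(transcript: str) -> list[str]:
    """Identify known keywords in the transcribed anchor speech."""
    return [w for w in transcript.lower().split() if w in KEYWORDS]

def search_speech_samples(keywords: list[str], database: list[str]) -> list[str]:
    """Retrieve speech samples that match any extracted keyword."""
    return [sample for sample in database
            if any(k in sample.lower() for k in keywords)]

db = ["this song sounds great", "hello, anchor", "anchor, I love you"]
matches = search_speech_samples(extract_keywords("let me sing a song for you"), db)
print(matches)  # ['this song sounds great']
```

A production system would replace the substring match with the search algorithms the patent defers to the related art.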
During the live broadcast, the client can receive interaction messages sent by other client users and broadcast by the server, and display them on screen. Because these messages are actively sent by other audience users, using them as a reference factor for selecting preset speech messages better matches the current live content and allows suitable messages to be recommended accurately. Specifically, when it is detected that the user initiates a speech request, interaction messages sent by other users within a preset time are collected, keywords are extracted from them, and suitable preset speech messages are retrieved from the database using those keywords.
During the live broadcast, the facial expression information of the anchor user reflects the anchor's changing emotions, and related preset speech messages can be recommended accordingly. This embodiment can therefore acquire the video image of the anchor client during the broadcast, identify the anchor's facial expression information from it, determine corresponding keywords, and use those keywords to retrieve suitable preset speech messages from the database. Specifically, when the user initiates a speech request, a current frame is taken from the live video stream; face detection is performed on the frame using image recognition; facial key points are located with a face alignment algorithm; the located key points are matched against standard expression calibration points; and the matched standard calibration points determine the facial expression information of the frame. Fig. 2C is a schematic diagram of 3 standard expression calibration points according to an exemplary embodiment of the present application. For the specific processing, reference may be made to the related art, which is not repeated here.
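The final matching step — comparing located key points against standard expression calibration points — can be sketched as a nearest-neighbor match. This is a hedged illustration only: the landmark coordinates, the three standard expressions, and the distance metric are all assumptions; real face alignment uses many more points and the related-art algorithms the patent refers to.

```python
# Sketch: pick the standard expression whose calibration points lie closest
# to the located facial key points (summed Euclidean distance).
import math

STANDARD_EXPRESSIONS = {            # illustrative standard calibration points
    "happy":   [(0.3, 0.6),  (0.7, 0.6),  (0.5, 0.2)],
    "angry":   [(0.3, 0.7),  (0.7, 0.7),  (0.5, 0.5)],
    "neutral": [(0.3, 0.65), (0.7, 0.65), (0.5, 0.35)],
}

def match_expression(landmarks: list[tuple[float, float]]) -> str:
    """Return the standard expression nearest to the located key points."""
    def total_dist(a, b):
        return sum(math.dist(p, q) for p, q in zip(a, b))
    return min(STANDARD_EXPRESSIONS,
               key=lambda name: total_dist(landmarks, STANDARD_EXPRESSIONS[name]))

print(match_expression([(0.31, 0.61), (0.69, 0.59), (0.5, 0.22)]))  # happy
```

The matched expression name can then serve as the keyword for the database search described above.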
In the three processing approaches above, suitable speech samples are selected from the speech database based on the three types of information to serve as preset speech messages. In practical applications, speech samples may be configured in advance according to the characteristics of each type of information and stored in the speech database. At selection time, candidates may be drawn from all speech samples in the database, or a number of corresponding samples may be retrieved per type of information.
To improve selection efficiency when a great number of speech samples are preset for each type of information, in an optional implementation the speech database may be provided with a corresponding sub-database for each of the voice information, the speech messages, and the facial expression information; selecting a speech sample as a preset speech message based on voice information, speech messages, or facial expression information is then done from the corresponding sub-database.
In this embodiment, the speech database has a corresponding sub-database for each of the three types of information, and the keywords for each type are searched within the corresponding sub-database, which improves search efficiency and the performance of live interaction processing.
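One simple way to realize the sub-database layout is a mapping from information type to its own sample list, so a keyword search only scans the relevant subset. The type names and sample data below are illustrative assumptions.

```python
# Sketch: one sub-database per information type; searches stay within the
# sub-database that corresponds to the information that produced the keyword.

speech_database = {
    "voice":      ["this song sounds great", "what a lovely voice"],
    "messages":   ["hello, anchor", "welcome, anchor"],
    "expression": ["the anchor is angry", "the anchor looks happy"],
}

def select_from_sub_db(info_type: str, keyword: str) -> list[str]:
    """Search only the sub-database corresponding to the information type."""
    return [s for s in speech_database[info_type] if keyword in s]

print(select_from_sub_db("expression", "angry"))  # ['the anchor is angry']
```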
With the multiple reference factors above, the client may obtain many preset speech messages. The way they are displayed in the live broadcast interface can be configured flexibly: for example, they may be displayed in a list in random order, sorted by message text, and so on.
To let the user select a preset speech message more quickly, when a plurality of preset speech messages are displayed in the live broadcast interface, they can be displayed sorted by priority, with higher-priority messages ranked earlier. From high to low, the priority is:
preset speech messages determined from the voice information, then preset speech messages determined from the speech messages of other users, then preset speech messages determined from the anchor's facial expression information.
In this embodiment, since the voice information represents what the anchor is saying, preset speech messages determined from it fit the live content more closely than those determined from other users' speech messages, which in turn fit more closely than those determined from the anchor's facial expression information. Displaying the preset speech messages sorted by this priority therefore achieves a better recommendation effect.
After the preset speech messages are displayed in the live broadcast interface, if a message the user wishes to send is among them, the user can select it by clicking, long-pressing, or another trigger. When the client detects that a preset speech message has been triggered, it performs live interaction processing on that message. This processing can take various forms: for example, the client displays the message in its own live interface, or the client sends the message to the live broadcast server, which displays it on the public screen (that is, the server broadcasts the message to the other clients, each of which displays it in its live interface), and so on.
It can be understood that if the user triggers a preset speech message recommended by the client, the recommendation was accurate for that message; and if many other users send the same speech message, that message has greater recommendation value and is more likely to be selected. In view of this, the present embodiment provides the following scheme, which continuously improves the recommendation accuracy of preset speech messages during live interaction.
In this embodiment, each speech sample is configured with a corresponding score;
when a plurality of preset speech messages are displayed in the live broadcast interface, the preset speech messages are displayed sorted by the scores of their corresponding speech samples;
the score is determined by the number of times the speech sample, when displayed as a preset speech message, has been triggered by users; the more times, the higher the score.
For example, each speech sample may be preconfigured with the same score; if a speech sample, after being displayed as a preset speech message in the live broadcast interface, is selected by a user, its score is increased, so that the next time it is displayed as a preset speech message it is ranked further forward.
As an example, suppose every speech sample has a default score of 50: whenever a sample is displayed in the live interface and selected by a user, its score increases by 1, and higher-scored phrases are displayed first in the next ranking. The score may also have an upper limit, for example 100 points; once a speech sample reaches 100 points its recommendation value is already very high, so further scoring is unnecessary, which reduces the processing load on the client.
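The scoring rule above (default 50, +1 per selection, capped at 100) can be sketched directly. The function and variable names are illustrative, not from the patent.

```python
# Sketch of the scoring rule: default score 50, +1 each time the sample is
# selected by a user, with no further updates once the 100-point cap is hit.

DEFAULT_SCORE, MAX_SCORE = 50, 100
scores: dict[str, int] = {}

def record_selection(sample: str) -> int:
    """Increase a sample's score when a user selects it, up to the cap."""
    score = scores.get(sample, DEFAULT_SCORE)
    if score < MAX_SCORE:  # past the cap, no further scoring is needed
        score += 1
    scores[sample] = score
    return score

record_selection("hello, anchor")
print(record_selection("hello, anchor"))  # 52
```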
As can be seen from the above embodiments, the sorted display of preset speech messages involves two ordering reference factors:
One is ordering by the priority of the three types of information: preset speech messages determined from the voice information, then preset speech messages determined from the speech messages of other users, then preset speech messages determined from the anchor's facial expression information.
The other is the score of the speech sample.
In practical applications, either factor may be applied alone, or the two may be combined: when combined, the messages are first ordered by the priority of the three types of information, and within each type the preset speech messages are ordered by score.
For example, suppose there are six preset speech messages: two determined from the voice information, two from the speech messages, and two from the anchor's facial expression information. They may first be ordered by the priority of the three types of information, and the two messages within each type ordered by their scores.
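The six-message example can be expressed as a two-level sort: information-type priority first, score second. The message texts and scores below are made-up illustrations.

```python
# Sketch: sort by information-type priority (voice > messages > expression),
# then by score (descending) within each type.

PRIORITY = {"voice": 0, "messages": 1, "expression": 2}

presets = [  # (message, information type, score) — illustrative data
    ("the anchor looks happy",  "expression", 55),
    ("this song sounds great",  "voice",      60),
    ("hello, anchor",           "messages",   70),
    ("what a lovely voice",     "voice",      52),
    ("welcome, anchor",         "messages",   51),
    ("the anchor is angry",     "expression", 58),
]

ranked = sorted(presets, key=lambda m: (PRIORITY[m[1]], -m[2]))
print([m[0] for m in ranked])
# ['this song sounds great', 'what a lovely voice', 'hello, anchor',
#  'welcome, anchor', 'the anchor is angry', 'the anchor looks happy']
```

Note the voice-derived messages lead even though "hello, anchor" has the highest raw score: type priority dominates, as the embodiment describes.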
In the present embodiment, speech messages may be used to automatically update the speech database, enriching its speech samples. In an alternative implementation, the speech database updates its samples as follows:
and collecting the speaking messages of other clients sent by the server.
If a speech message is not yet stored in the speech database and its number of repetitions within a preset time period exceeds a preset count threshold, the message is added to the database as a speech sample, with its score determined by the repetition count: the more repetitions, the higher the score.
In this embodiment, the client may perform the above operation on a preset schedule to update the speech database, may update it in real time during the broadcast, may do so at the user's initiative, or may do so when instructed by an update command from the server, and so on.
When the database needs updating, the speech messages of other clients can be collected and analyzed: if many users send the same speech message, that message has recommendation value. If the message is not yet stored in the speech database, it can be inserted, with its score set according to how many users sent it: the more users, the higher the score.
As an example, assume the default score of a speech sample is 50 points and each repetition adds 1 point. If 10 users input "hello" within the current 3-second time period of the collection cycle, the message is inserted into the database as a speech sample, and the score of "hello" may be set to 50 + 10 = 60 points.
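A minimal sketch of this update rule, assuming the example's default score of 50 and +1 per repetition; the count threshold of 5 is a hypothetical preset value:

```python
DEFAULT_SCORE = 50     # default score of a new speech sample (from the example)
COUNT_THRESHOLD = 5    # hypothetical preset count threshold

def update_database(database, message, repetitions):
    """Insert a collected speech message as a new speech sample when it is
    not yet stored and was repeated more than the threshold within the
    preset time period; its score grows with the repetition count."""
    if message in database:
        return                      # already stored as a speech sample
    if repetitions > COUNT_THRESHOLD:
        database[message] = DEFAULT_SCORE + repetitions

db = {}
update_database(db, "hello", 10)    # 10 users sent "hello" in the window
```

With these inputs, "hello" is stored with a score of 60, while a message repeated only twice would not be added at all.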
In practical applications, the speech database of a client can be uploaded to the live broadcast server on a set schedule; after the live broadcast server synchronizes it, the database is pushed to the other clients, so that all clients share the speech databases and the update of new speech samples is completed.
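One way such sharing could be organized is a periodic upload and merge; the merge rule below (keep the higher score when both copies know a sample) is an assumption, as the embodiment does not specify how the server reconciles scores:

```python
def merge_databases(server_db, client_db):
    """Merge an uploaded client speech database into the server copy.
    When both sides store the same speech sample, keep the higher score
    (an assumed reconciliation rule, not specified by the embodiment)."""
    for sample, score in client_db.items():
        server_db[sample] = max(server_db.get(sample, 0), score)
    return server_db

# Server copy already knows "hi" and "yo"; a client uploads its database.
merged = merge_databases({"hi": 55, "yo": 40}, {"hi": 60, "new": 52})
```

The merged copy would then be distributed back to the other clients on the next synchronization cycle.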
Corresponding to the embodiment of the live broadcast interaction method, the application also provides an embodiment of a live broadcast interaction device.
As shown in fig. 3, fig. 3 is a block diagram of a live broadcast interaction device according to an exemplary embodiment of the present application, where the device includes:
a speech request determining module 31, configured to: determine, in the live broadcast process, whether a user initiates a speech request.
A speech message presentation module 32, configured to: when it is determined that a user initiates a speech request, acquire a plurality of preset speech messages and display the preset speech messages in the live broadcast interface;
an interaction processing module 33, configured to: perform live broadcast interaction processing on a preset speech message when it is detected that the preset speech message is triggered.
In an optional implementation manner, the speech request determining module 31 is further configured to:
display a speech trigger object in the live broadcast interface, and determine whether a user initiates a speech request by detecting whether the speech trigger object is triggered.
In an optional implementation manner, the preset speech message is obtained from a preset speech database, and a plurality of speech samples are stored in the speech database;
selecting a speech sample from the speech database as the preset speech message based on one or more of the following information:
voice information of the anchor user, speech messages sent by other audience users, and facial expression information of the anchor user.
In an optional implementation manner, the speech database is provided with corresponding sub-databases for the voice information, the speech message, and the facial expression information.
The speech message presentation module 32 is further configured to: when selecting a speech sample from the speech database as the preset speech message based on the voice information, the speech message, or the facial expression information, select the speech sample from the corresponding sub-database.
In an optional implementation manner, the speech message presentation module 32 is further configured to: when acquiring the preset speech message from the preset speech database, determine a keyword in one or more of the following ways, and use the keyword to search the speech database for a matching speech sample:
comparing the voice information with preset standard voices, and obtaining the keyword from the standard voice that matches the voice information, wherein each standard voice is labelled with a corresponding keyword in advance;
converting the voice information into text information, and identifying keywords from the text information;
extracting keywords from the speaking message;
acquiring a video image of the anchor client during the live broadcast, identifying the anchor facial expression information from the video image, and determining a corresponding keyword for the anchor facial expression information.
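The keyword-determination paths above can be sketched as a dispatch on the kind of live broadcast data; the toy extractor, stopword list, and expression-to-keyword table below are stand-ins for the speech recognition, keyword extraction, and expression recognition components a real implementation would use:

```python
# Toy stand-ins: a real system would use ASR for "voice" data and an
# expression recognizer for "expression" data before reaching this step.
STOPWORDS = {"the", "a", "is", "so"}
EXPRESSION_KEYWORDS = {"smile": "happy", "frown": "sad"}  # assumed labels

def extract_keyword(text):
    """Return the first non-stopword as the keyword (toy extractor)."""
    for word in text.lower().split():
        if word not in STOPWORDS:
            return word
    return None

def determine_keyword(live_data, kind):
    """Dispatch on the kind of live broadcast data, mirroring the paths above."""
    if kind == "voice":
        # Stand-in for: match against pre-labelled standard voices, or run
        # speech-to-text and extract a keyword from the transcript.
        return extract_keyword(live_data)
    if kind == "chat":
        # Keyword extracted directly from another viewer's speech message.
        return extract_keyword(live_data)
    if kind == "expression":
        # Recognized expression label mapped to its pre-assigned keyword.
        return EXPRESSION_KEYWORDS.get(live_data)
    raise ValueError(f"unknown kind of live data: {kind}")
```

The keyword returned here is what the module uses to search the speech database (or the relevant sub-database) for matching speech samples.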
In an optional implementation manner, the speech message presentation module is further configured to:
when a plurality of preset speech messages are displayed in the live broadcast interface, sort and display the preset speech messages according to priority, with higher-priority preset speech messages ranked earlier; from high to low, the priority is:
preset speech messages determined from the voice information, preset speech messages determined from the speech messages, and preset speech messages determined from the anchor facial expression information.
In an optional implementation manner, each speech sample is configured with a corresponding score, determined by the number of times the speech sample is triggered by users after being presented as a preset speech message; the more times it is triggered, the higher the score;
the speech message presentation module is further configured to:
when a plurality of preset speech messages are displayed in the live broadcast interface, sort and display the preset speech messages according to the scores of their corresponding speech samples.
In an alternative implementation, the speech database updates its speech samples as follows:
collecting speech messages of other clients forwarded by the server;
if a speech message is not stored in the speech database and its repetition count within a preset time period is greater than a preset count threshold, adding the speech message into the speech database as a speech sample, and determining the score of the speech sample according to the repetition count, wherein the more repetitions, the higher the score.
The implementation of the functions and effects of each module in the above device is described in detail in the implementation of the corresponding steps in the above method, and is not repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (11)

1. A live interaction method, comprising:
in a live broadcast process, determining whether a user initiates a speech request;
when it is determined that a user initiates a speech request, acquiring a plurality of preset speech messages and displaying the preset speech messages in the live broadcast interface, wherein the preset speech messages are obtained from a preset speech database in which a plurality of speech samples are stored, and when the preset speech messages are acquired from the preset speech database based on live broadcast data, a keyword is determined according to the live broadcast data and is used to search the speech database for a matching speech sample;
when the preset speech message is detected to be triggered, performing live broadcast interactive processing on the preset speech message;
wherein the live data comprises at least one of: voice information of the anchor user and facial expression information of the anchor user;
determining the keyword comprises at least one of the following: if the live broadcast data is the voice information of the anchor user, comparing the voice information with preset standard voices and obtaining the keyword from the standard voice that matches the voice information, wherein each standard voice is labelled with a corresponding keyword in advance, or converting the voice information into text information and identifying the keyword from the text information; and if the live broadcast data is the facial expression information of the anchor user, acquiring a video image of the anchor client during the live broadcast, identifying the anchor facial expression information from the video image, and determining a corresponding keyword for the anchor facial expression information.
2. The method of claim 1, wherein the determining whether the user initiates the speech request comprises:
displaying a speech trigger object in the live broadcast interface, and determining whether the user initiates the speech request by detecting whether the speech trigger object is triggered.
3. The method of claim 1, wherein the live data further comprises speech messages sent by other audience users;
the determining the keyword further comprises:
if the live broadcast data is a speech message sent by another audience user, extracting the keyword from the speech message.
4. The method of claim 3, wherein the speech database is provided with corresponding sub-databases for the voice information, speech message and facial expression information;
wherein when a speech sample is selected from the speech database as the preset speech message based on the voice information, the speech message, or the facial expression information, the speech sample is selected from the corresponding sub-database.
5. The method according to claim 3, wherein when a plurality of preset speech messages are displayed in the live broadcast interface, the preset speech messages are sorted and displayed according to priority, with higher-priority preset speech messages ranked earlier; when the preset speech messages include preset speech messages determined from the voice information, from the speech message, and from the anchor facial expression information, the priority from high to low is:
preset speech messages determined from the voice information, preset speech messages determined from the speech message, and preset speech messages determined from the anchor facial expression information.
6. The method of claim 1 or 4, wherein the utterance sample is configured with a corresponding score;
when a plurality of preset speech messages are displayed in the live broadcast interface, sequencing and displaying the preset speech messages according to the grade of the corresponding speech sample;
the score is determined by the number of times the speech sample is triggered by users after being presented as a preset speech message; the more times it is triggered, the higher the score.
7. The method of claim 6, wherein the utterance database updates the utterance sample by:
collecting speech messages of other clients forwarded by the server;
if the speech message is not stored in the speech database and its repetition count within a preset time period is greater than a preset count threshold, adding the speech message into the speech database as a speech sample; and determining the score of the speech sample according to the repetition count, wherein the more repetitions, the higher the score.
8. A live interaction device, the device comprising:
a speech request determining module, configured to: determine, in a live broadcast process, whether a user initiates a speech request;
a speech message presentation module, configured to: when it is determined that a user initiates a speech request, acquire a plurality of preset speech messages and display the preset speech messages in the live broadcast interface, wherein the preset speech messages are obtained from a preset speech database in which a plurality of speech samples are stored, and when the preset speech messages are acquired from the preset speech database based on live broadcast data, a keyword is determined according to the live broadcast data and is used to search the speech database for a matching speech sample;
an interactive processing module, configured to: when the preset speech message is detected to be triggered, performing live broadcast interactive processing on the preset speech message;
wherein the live data comprises at least one of: voice information of the anchor user and facial expression information of the anchor user;
determining the keyword includes at least one of the following: if the live broadcast data is the voice information of the anchor user, comparing the voice information with preset standard voices and obtaining the keyword from the standard voice that matches the voice information, wherein each standard voice is labelled with a corresponding keyword in advance, or converting the voice information into text information and identifying the keyword from the text information; and if the live broadcast data is the facial expression information of the anchor user, acquiring a video image of the anchor client during the live broadcast, identifying the anchor facial expression information from the video image, and determining a corresponding keyword for the anchor facial expression information.
9. The apparatus of claim 8, wherein the live data further comprises speech messages sent by other audience users;
the determining the keyword further comprises:
if the live broadcast data is a speech message sent by another audience user, extracting the keyword from the speech message.
10. The apparatus of claim 9, wherein the speech message presentation module is further configured to:
sort and display the preset speech messages according to priority when a plurality of preset speech messages are displayed in the live broadcast interface, with higher-priority preset speech messages ranked earlier; when the preset speech messages include preset speech messages determined from the voice information, from the speech message, and from the anchor facial expression information, the priority from high to low is:
preset speech messages determined from the voice information, preset speech messages determined from the speech message, and preset speech messages determined from the anchor facial expression information.
11. The apparatus of claim 9, wherein the speech sample is configured with a corresponding score, and the score is determined by a number of times the speech sample is triggered by a user after being used as a preset speech message, and the higher the number of times, the higher the score;
the speech message presentation module is further configured to:
when a plurality of preset speech messages are displayed in the live broadcast interface, sort and display the preset speech messages according to the scores of their corresponding speech samples.
CN201611198395.6A 2016-12-22 2016-12-22 Live broadcast interaction method and device Active CN106612465B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911127043.5A CN110708607B (en) 2016-12-22 2016-12-22 Live broadcast interaction method and device, electronic equipment and storage medium
CN201611198395.6A CN106612465B (en) 2016-12-22 2016-12-22 Live broadcast interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611198395.6A CN106612465B (en) 2016-12-22 2016-12-22 Live broadcast interaction method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201911127043.5A Division CN110708607B (en) 2016-12-22 2016-12-22 Live broadcast interaction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN106612465A CN106612465A (en) 2017-05-03
CN106612465B true CN106612465B (en) 2020-01-03

Family

ID=58636659

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201611198395.6A Active CN106612465B (en) 2016-12-22 2016-12-22 Live broadcast interaction method and device
CN201911127043.5A Active CN110708607B (en) 2016-12-22 2016-12-22 Live broadcast interaction method and device, electronic equipment and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201911127043.5A Active CN110708607B (en) 2016-12-22 2016-12-22 Live broadcast interaction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (2) CN106612465B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109150554B (en) * 2017-06-28 2020-10-16 武汉斗鱼网络科技有限公司 Multi-user voice method, storage medium, electronic equipment and system for live broadcast room
CN109388701A (en) * 2018-08-17 2019-02-26 深圳壹账通智能科技有限公司 Minutes generation method, device, equipment and computer storage medium
CN109413495A (en) * 2018-09-06 2019-03-01 广州虎牙信息科技有限公司 A kind of login method, device, system, electronic equipment and storage medium
CN110460872B (en) * 2019-09-05 2022-03-04 腾讯科技(深圳)有限公司 Information display method, device and equipment for live video and storage medium
CN111263175A (en) * 2020-01-16 2020-06-09 网易(杭州)网络有限公司 Interaction control method and device for live broadcast platform, storage medium and electronic equipment
CN113589977A (en) * 2020-04-30 2021-11-02 腾讯科技(深圳)有限公司 Message display method and device, electronic equipment and storage medium
CN113873282B (en) * 2021-09-30 2024-06-25 广州方硅信息技术有限公司 Live broadcasting room guiding speaking method, system, device, medium and computer equipment
CN113938697B (en) * 2021-10-13 2024-03-12 广州方硅信息技术有限公司 Virtual speaking method and device in live broadcasting room and computer equipment
CN115271891B (en) * 2022-09-29 2022-12-30 深圳市人马互动科技有限公司 Product recommendation method based on interactive novel and related device
CN117241055B (en) * 2023-10-31 2024-05-14 书行科技(北京)有限公司 Live broadcast interaction method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104881237A (en) * 2015-06-15 2015-09-02 广州华多网络科技有限公司 Internet interaction method and client
WO2016082692A1 (en) * 2014-11-24 2016-06-02 阿里巴巴集团控股有限公司 Information prompting method and device, and instant messaging system
WO2016094452A1 (en) * 2014-12-08 2016-06-16 Alibaba Group Holding Limited Method and system for providing conversation quick phrases
CN105808070A (en) * 2016-03-31 2016-07-27 广州酷狗计算机科技有限公司 Method and device for setting commenting showing effect

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US20070136776A1 (en) * 2005-12-09 2007-06-14 Michael Findlay Television viewers interation and voting method
US7860928B1 (en) * 2007-03-22 2010-12-28 Google Inc. Voting in chat system without topic-specific rooms
US8332752B2 (en) * 2010-06-18 2012-12-11 Microsoft Corporation Techniques to dynamically modify themes based on messaging
CN104538027B (en) * 2014-12-12 2018-07-20 复旦大学 The mood of voice social media propagates quantization method and system
US20160174889A1 (en) * 2014-12-20 2016-06-23 Ziv Yekutieli Smartphone text analyses
CN105159988B (en) * 2015-08-28 2018-08-21 广东小天才科技有限公司 A kind of method and device of browsing photo
CN105228013B (en) * 2015-09-28 2018-09-07 百度在线网络技术(北京)有限公司 Barrage information processing method, device and barrage video player
CN106021599A (en) * 2016-06-08 2016-10-12 维沃移动通信有限公司 Emotion icon recommending method and mobile terminal
CN106250553A (en) * 2016-08-15 2016-12-21 珠海市魅族科技有限公司 A kind of service recommendation method and terminal

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
WO2016082692A1 (en) * 2014-11-24 2016-06-02 阿里巴巴集团控股有限公司 Information prompting method and device, and instant messaging system
WO2016094452A1 (en) * 2014-12-08 2016-06-16 Alibaba Group Holding Limited Method and system for providing conversation quick phrases
CN104881237A (en) * 2015-06-15 2015-09-02 广州华多网络科技有限公司 Internet interaction method and client
CN105808070A (en) * 2016-03-31 2016-07-27 广州酷狗计算机科技有限公司 Method and device for setting commenting showing effect

Also Published As

Publication number Publication date
CN110708607A (en) 2020-01-17
CN110708607B (en) 2022-03-01
CN106612465A (en) 2017-05-03

Similar Documents

Publication Publication Date Title
CN106612465B (en) Live broadcast interaction method and device
CN112653902B (en) Speaker recognition method and device and electronic equipment
US11580290B2 (en) Text description generating method and device, mobile terminal and storage medium
US20130144891A1 (en) Server apparatus, information terminal, and program
CN111310019A (en) Information recommendation method, information processing method, system and equipment
US10613825B2 (en) Providing electronic text recommendations to a user based on what is discussed during a meeting
CN110232137B (en) Data processing method and device and electronic equipment
CN108073606B (en) News recommendation method and device for news recommendation
CN110602516A (en) Information interaction method and device based on live video and electronic equipment
CN110740389A (en) Video positioning method and device, computer readable medium and electronic equipment
CN110557659A (en) Video recommendation method and device, server and storage medium
CN114095749B (en) Recommendation and live interface display method, computer storage medium and program product
JP2007243253A (en) System and method for distribution information
CN109670020B (en) Voice interaction method, system and device
CN105335414A (en) Music recommendation method, device and terminal
WO2019047850A1 (en) Identifier displaying method and device, request responding method and device
CN109462765B (en) Method and device for issuing and displaying recommended page
CN110769312A (en) Method and device for recommending information in live broadcast application
CN113392273A (en) Video playing method and device, computer equipment and storage medium
US11971927B1 (en) System and method for contextual data selection from electronic media content
EP3358475A1 (en) Information processing device, information processing method and program
CN114707502A (en) Virtual space processing method and device, electronic equipment and computer storage medium
CN113596352A (en) Video processing method and device and electronic equipment
JP5258056B2 (en) Question sentence candidate presentation device
JPH11203295A (en) Information providing device and its method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210111

Address after: 511442 3108, 79 Wanbo 2nd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee after: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 511442 24 floors, B-1 Building, Wanda Commercial Square North District, Wanbo Business District, 79 Wanbo Second Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.