CN111586430A - Online interaction method, client, server and storage medium


Info

Publication number: CN111586430A
Application number: CN202010407299.8A
Authority: CN (China)
Prior art keywords: interaction, audio, category, request, users
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 张艳军, 郭晓彬
Current assignee: Tencent Technology (Shenzhen) Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority: CN202010407299.8A (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)

Classifications

    • H04N 21/2187: Live feed (selective content distribution; source of audio or video content)
    • G06N 20/00: Machine learning
    • G10L 19/0212: Coding or decoding of speech or audio signals using spectral analysis, using orthogonal transformation
    • G10L 25/03: Speech or voice analysis characterised by the type of extracted parameters
    • G10L 25/24: Speech or voice analysis where the extracted parameters are the cepstrum
    • G10L 25/30: Speech or voice analysis characterised by the analysis technique, using neural networks
    • G10L 25/51: Speech or voice analysis specially adapted for comparison or discrimination
    • H04N 21/233: Processing of audio elementary streams
    • H04N 21/2393: Interfacing the upstream path of the transmission network, involving handling client requests
    • H04N 21/25866: Management of end-user data
    • H04N 21/4312: Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, animations
    • H04N 21/44213: Monitoring of end-user related data
    • H04N 21/4508: Management of client data or end-user data
    • H04N 21/4784: Supplemental services receiving rewards
    • H04N 21/4788: Supplemental services communicating with other users, e.g. chatting
    • H04N 21/4882: Data services for displaying messages, e.g. warnings, reminders
    • H04N 21/4884: Data services for displaying subtitles

Abstract

The present disclosure provides an online interaction method, a client, a server, a computer device, and a storage medium. An online interaction method for a client comprises: receiving a notification that an online interaction mode has been started; obtaining input from a first-category user; sending, based on the input, an interaction request by which the first-category user requests to participate in the interaction to a cloud server; and, in response to receiving a notification from the cloud server that the participation request succeeded, displaying a message that the participation request succeeded; wherein the online interaction mode is started by a second-category user.

Description

Online interaction method, client, server and storage medium
Technical Field
The present invention relates to the field of network communication technologies, and in particular, to an online interaction method, a client, a server, a computer device, and a storage medium.
Background
The development of internet technology has given rise to many new forms of network interaction, among which webcast live streaming has become popular with a large number of users. Generally, a webcast is conducted via a live platform and involves anchors (i.e., presenters) and an audience (i.e., viewers). The audio and/or video data of an anchor may be captured, processed, and transmitted via the live platform for the audience to watch. In addition, to enrich the interactive experience, some live platforms also provide a multi-person live mode, in which multiple anchors can enter the same live room and broadcast simultaneously, and audience members entering that room can watch all of these anchors at the same time.
Disclosure of Invention
In the current live broadcast mode, interaction between the audience and the anchor is generally limited to leaving messages and sending gifts. The degree of interaction between the two is therefore low, which does little to improve audience engagement. Moreover, being exposed to the same kind of interaction for a long time easily fatigues the audience, who then lose interest.
In view of the above, there is a need to provide an online interaction method, client, server, computing device and storage medium that may mitigate or even eliminate the above-mentioned problems.
According to an aspect of the present invention, an online interaction method for a client is provided. The method comprises: receiving a notification that an online interaction mode has been started; obtaining input from a first-category user; sending, based on the input, an interaction request by which the first-category user requests to participate in the interaction to a cloud server; and, in response to receiving a notification from the cloud server that the participation request succeeded, displaying a message that the participation request succeeded; wherein the online interaction mode is started by a second-category user.
In some embodiments, the method further comprises: sending first audio of the first-category user to the cloud server; receiving, from the cloud server, an evaluation score for the first audio and an experience value of the first-category user, wherein the evaluation score is calculated based on audio features extracted from the first audio, and the experience value is calculated based on the evaluation score and a historical experience value of the first-category user; and displaying the evaluation score and the experience value.
In some embodiments, the method further includes sending first video of the first-category user to the cloud server.
In some embodiments, the evaluation score is derived by inputting the audio features extracted from the first audio into a trained machine learning model, the model having been pre-trained on a data set comprising a plurality of audio samples and evaluation scores corresponding to those samples.
In some embodiments, the audio features are extracted by: passing the audio signal of the first audio through a high-pass filter to obtain a high-pass-filtered audio signal; dividing the filtered audio signal into a plurality of audio signal frames of a preset length; windowing each of the audio signal frames to obtain a plurality of windowed audio signal frames; converting the windowed audio signal frames into an audio energy distribution by fast Fourier transform; and passing the audio energy distribution through a triangular filter bank and extracting the audio features from the energy output by the triangular filter bank.
In some embodiments, before receiving the notification that the online interaction mode has been started, the method further comprises initiating the establishment of an interactive room by: sending a matching request to the cloud server so that the cloud server matches the client with one or more other clients; and receiving a matching result from the cloud server, so that the first-category user of the client and the first-category users of the matched clients join the same interactive room.
In some embodiments, before receiving the notification that the online interaction mode has been started, the method further comprises initiating the establishment of an interactive room by: sending an invitation request to the cloud server so that the cloud server sends an invitation to the client corresponding to the invitation request; and receiving an invitation result from the cloud server, so that the first-category user of the client and the first-category users of the clients that accepted the invitation join the same interactive room.
In some embodiments, before receiving the notification that the online interaction mode has been started, the method further comprises initiating the establishment of an interactive room by: receiving an invitation request from the cloud server, wherein the invitation request invites the client to join an interactive room; and sending a message accepting the invitation request to the cloud server, so that the first-category user of the client and the first-category users of one or more other clients join the same interactive room.
In some embodiments, after displaying the message that the participation request succeeded, the method further includes: receiving from the cloud server a notification to play second audio, and playing the second audio, wherein the second audio is selected by the second-category user.
In some embodiments, before obtaining the input from the first-category user, the method further comprises: receiving from the cloud server and playing a portion of the second audio of a predetermined length.
In some embodiments, the method further comprises: in response to receiving a notification from the cloud server that the participation request failed, displaying a message that the participation request failed, and receiving and playing the audio of the first-category user, among at least one other first-category user, whose participation request succeeded.
According to another aspect of the invention, an online interaction method for a client is provided. The method comprises: obtaining input from a second-category user; sending, based on the input, an interaction request by which the second-category user requests to initiate an interaction to the cloud server; in response to receiving a notification from the cloud server that the interaction was initiated successfully, displaying a message that the interaction was initiated successfully; and, in response to receiving a notification that a first-category user's participation request succeeded, displaying a message that the first-category user's participation request succeeded.
In some embodiments, the method further comprises: playing first audio of the first-category user whose participation request succeeded; and playing second audio while playing the first audio, wherein the first audio and the second audio are combined into the same audio stream for playback.
In some embodiments, the method further includes playing first video of the first-category user whose participation request succeeded.
In some embodiments, before displaying the message that the interaction was initiated successfully, the method further includes: in response to receiving a notification that the interaction request has been added to a queuing list, displaying a message that the interaction request is in a queued state.
In some embodiments, the method further comprises: receiving, from the cloud server, an evaluation score for the first audio and an experience value of the first-category user, wherein the evaluation score is calculated based on audio features extracted from the first audio, and the experience value is calculated based on the evaluation score and a historical experience value of the first-category user; and displaying the evaluation score and the experience value.
According to another aspect of the present invention, an online interaction method for a cloud server is provided, comprising: receiving, from one of at least one second-category user client, a first interaction request to initiate an interaction in an interactive room; sending a notification that the online interaction mode has been started to at least two first-category user clients; receiving, from one or more of the at least two first-category user clients, one or more second interaction requests within a predetermined time period; determining, among the one or more second interaction requests, the second interaction request with the earliest receiving time; and sending a notification that the participation request succeeded to the first-category user client corresponding to that earliest second interaction request.
In some embodiments, the method further includes receiving first audio from the first-category user client whose participation request succeeded.
In some embodiments, the method further includes receiving first video from the first-category user client whose participation request succeeded.
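As a minimal sketch of the server-side selection step described above, the following Python fragment collects participation requests during a fixed window and picks the earliest one; the class name, the window length, and the timestamp handling are illustrative assumptions rather than details fixed by the disclosure:

```python
from dataclasses import dataclass

@dataclass
class InteractionRequest:
    client_id: str
    received_at: float  # server-side receive timestamp, in seconds

def pick_winner(requests, window_start: float, window_seconds: float = 5.0):
    """Keep requests received within the predetermined time period; pick the earliest."""
    in_window = [r for r in requests
                 if window_start <= r.received_at <= window_start + window_seconds]
    if not in_window:
        return None  # no client asked to participate in time
    return min(in_window, key=lambda r: r.received_at)
```

The winning client would then be sent the notification that its participation request succeeded, and every other requester the corresponding failure notification.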
According to still another aspect of the present invention, there is provided a client comprising: a receiving module configured to receive a notification that an online interaction mode has been started; an acquisition module configured to acquire input from a first-category user; a sending module configured to send, based on the input, an interaction request by which the first-category user requests to participate in the interaction to a cloud server; and a display module configured to display, in response to receiving a notification from the cloud server that the participation request succeeded, a message that the participation request succeeded; wherein the online interaction mode is started by a second-category user.
According to still another aspect of the present invention, there is provided a client comprising: an acquisition module configured to acquire input from a second-category user; a sending module configured to send, based on the input, an interaction request by which the second-category user requests to initiate an interaction to the cloud server; and a display module configured to display, in response to receiving a notification from the cloud server that the interaction was initiated successfully, a message that the interaction was initiated successfully, and to display, in response to receiving a notification that a first-category user's participation request succeeded, a message that the first-category user's participation request succeeded.
According to another aspect of the present invention, there is provided a cloud server comprising: a first receiving module configured to receive, from one of at least one second-category user client, a first interaction request to initiate an interaction in an interactive room; a first sending module configured to send a notification that the online interaction mode has been started to at least two first-category user clients; a second receiving module configured to receive, from one or more of the at least two first-category user clients, one or more second interaction requests within a predetermined time period; a determining module configured to determine, among the one or more second interaction requests, the second interaction request with the earliest receiving time; and a second sending module configured to send a notification that the participation request succeeded to the first-category user client corresponding to that earliest second interaction request.
According to a further aspect of the present invention, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the online interaction method provided according to the preceding aspect.
According to a further aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps of the online interaction method provided according to the preceding aspect.
The online interaction method according to embodiments of the present invention provides a new form of interaction involving two categories of users (such as anchors and audience members in a webcast). Multiple first-category users may choose to participate in an online interaction initiated by a second-category user, and the first-category user whose participation request succeeds sends audio according to the interaction requirements of the second-category user. This provides close interaction between the two categories of users, which enriches the variety of interactions and improves the sense of engagement of second-category users (such as viewers in a webcast). Moreover, since the audio content sent by first-category users is determined by the interaction requests of second-category users, it helps allocate network channels, server resources, and the like to higher-priority content (for example, content that users are more interested in), thereby optimizing the utilization of these resources. In addition, mechanisms such as scoring the audio can further increase the fun of the interaction and improve the interactive experience.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the accompanying drawings, in which:
FIG. 1 illustrates an example scenario in which the online interaction method provided by embodiments of the present invention may be applied;
FIG. 2A schematically illustrates an example flow diagram of an online interaction method for a client according to one embodiment of the present invention;
FIG. 2B schematically illustrates another example flow diagram of an online interaction method for a client according to one embodiment of the present invention;
FIG. 3 schematically shows an example flow diagram of an audio feature extraction method according to one embodiment of the present invention;
FIG. 4A schematically illustrates an example flow diagram of an online interaction method for another client according to one embodiment of the present invention;
FIG. 4B schematically illustrates another example flow diagram of an online interaction method for another client according to one embodiment of the present invention;
FIGS. 5A-5D schematically illustrate example interface diagrams according to one embodiment of the present invention;
FIG. 6 schematically illustrates an example flow diagram of an online interaction method for a server according to one embodiment of the present invention;
FIG. 7 schematically illustrates an example interaction flow diagram of an online interaction method according to one embodiment of the present invention;
FIG. 8 schematically illustrates an example block diagram of a client according to one embodiment of the present invention;
FIG. 9 schematically illustrates an example block diagram of another client according to one embodiment of the present invention;
FIG. 10 schematically illustrates an example block diagram of a cloud server according to one embodiment of the present invention; and
FIG. 11 schematically shows an example block diagram of a computing device according to one embodiment of the present invention.
Detailed Description
Before describing embodiments of the present invention in detail, some relevant concepts are explained first:
1. Live broadcast: a technology that collects data (one or more of audio, video, etc.) from a broadcasting party through some device, performs a series of processing steps on the data (e.g., video coding and compression) to form a media stream that can be viewed and transmitted, and outputs the media stream to a watching client. Live broadcast includes live audio broadcast and live video broadcast.
2. Multi-person live broadcast: a mode in which at least two broadcasters (e.g., two or more anchors) broadcast live in the same live room.
Fig. 1 illustrates an example scenario 100 in which the online interaction method provided by the embodiments of the present invention may be applied. As shown, the scenario 100 includes a plurality of first category users 110 and their terminal devices (i.e., first terminal devices) 120, a server 130, and a plurality of second category users 140 and their terminal devices (i.e., second terminal devices) 150. The first terminal device 120, the server 130, and the second terminal device 150 may have respective programs deployed thereon for performing the various methods provided by the present disclosure. The first category of users 110 may play an online interactive game with the second category of users 140.
According to an embodiment of the present invention, a plurality of first-category users 110 may enter the same interactive room to participate in an interactive game, and one or more second-category users 140 may enter the interactive room to play the interactive game with them. A second-category user 140 may provide input through his or her second terminal device 150 to initiate the interactive game. The second terminal device 150 may process the input of the second-category user 140 through a corresponding program deployed on it and transmit an interaction request requesting initiation of the interaction to the server 130 via the network. The server 130 may process the interaction request and send the first terminal devices 120 a notification that the online interaction mode has been started. A first-category user 110 may then request to participate in the interactive game by providing input through his or her first terminal device 120 in order to obtain the interaction opportunity. The first terminal device 120 may process the input of the first-category user 110 through a corresponding program deployed on it and transmit an interaction request requesting participation in the interactive game to the server 130 via the network. When the server 130 receives requests from a plurality of first terminal devices 120, it may determine the earliest received request and send a notification of successful participation to the corresponding first terminal device 120. The user of that first terminal device 120 thereby obtains the interaction opportunity and can send his or her audio through the device according to the interaction requirements. The audio may be sent to the server 130 via the network, and the server 130 may forward it to the second terminal devices 150 for viewing or listening by the corresponding second-category users 140. The server 130 may also score the audio through a corresponding algorithm deployed on it, and the scoring result may likewise be sent to the first terminal devices 120 and the second terminal devices 150 via the network.
Illustratively, the interaction may be a song-grabbing game during a multi-person live broadcast. In this case, the first-category users are anchors and the second-category users are audience members. For example, an audience member can request a song through his or her terminal device to initiate the song-grabbing interaction mode. The server processes the request so that multiple anchors can compete for the song through their respective terminal devices. The anchor who grabs the chance sings a segment of the requested song; after the segment is finished, the anchors enter the next round of grabbing, and whoever grabs that round sings the next segment. This cycle continues until all segments of the song have been sung. Whether a new song-grabbing interaction begins may then depend on whether an audience member requests a new song. Further, optionally, the song segments sung by the anchors may be scored by the server, and the scores may be accumulated or used to generate a ranking.
It should be noted that, in addition to the song-grabbing game, the various embodiments provided by the present disclosure may also be used for other interaction modes, such as movie dubbing games, improvisation games, voice imitation games, instrumental performances, and the like. Alternatively, in a live scenario, the first-category users may be the audience and the second-category user an anchor; that is, the anchor may be the initiator of the interactive game. For example, the anchor may issue a task, and audience members may compete for the right to perform it, such as joining a voice call with the anchor, singing a duet with the anchor, and so on.
For clarity, the first terminal device 120 and the second terminal device 150 are depicted in FIG. 1 as two different devices. In practice, however, a first terminal device 120 and a second terminal device 150 may be one and the same device, the plurality of first terminal devices 120 may be different devices, and likewise the plurality of second terminal devices 150 may be different devices. Examples of the first terminal device 120 and the second terminal device 150 include, but are not limited to, desktop computers, laptop computers, tablet computers, smartphones, smart watches, and the like. Further, the program deployed on the first terminal device 120 or the second terminal device 150 may be a standalone application, a function embedded in another application, or an applet or web program accessed via another application.
The server 130 may be a single server or a server cluster, or may be another computing device with computing, storage, and communication capabilities. Furthermore, although the server 130 is shown separately in FIG. 1, it may also be integrated with the first terminal device 120 or the second terminal device 150, or one part may be integrated with the first terminal device 120 and another part with the second terminal device 150. It should be understood that, in the integrated case, communication via the network may be replaced by internal communication within the device.
The specific structures of the first terminal device 120, the server 130, and the second terminal device 150 will be described in detail below. Further, it should be understood that "network" as described herein may include the internet, a local area network (LAN), a telephone network, an intranet, other public and/or proprietary networks, and combinations thereof, and may involve wireless networks (such as cellular networks, WiFi, Bluetooth, LiFi, ZigBee, etc.) as well as wired networks (such as cable, fiber optics, etc.). Accordingly, the first terminal device 120, the server 130, and the second terminal device 150 may have interfaces that support communication via the respective networks.
FIGS. 2A and 2B schematically illustrate example flow diagrams of online interaction methods 200A and 200B, respectively, for a client according to one embodiment of the invention. Both methods are applicable to a first terminal device, such as the first terminal device 120 of a first-category user 110 shown in FIG. 1.
In step 211, a notification that the online interaction mode has been started is received. The online interaction may be conducted in an interactive room. The interactive room may include a plurality of first-category users and at least one second-category user, each logged in at a respective client, and the online interaction may be initiated by one of the at least one second-category user (the initiation process is described in detail below). In some embodiments, the first terminal device 120 may receive from the server 130 a notification that an online interaction mode (such as the aforementioned song-grabbing game in a multi-person live broadcast) has been started; the notification may be displayed on the first terminal device 120 in various forms, such as text, pictures, animation effects, or audio, so that the first-category user 110 is informed through his or her terminal device.
Optionally, prior to step 211, there may be step 221 as shown in FIG. 2B. At step 221, the establishment of an interactive room is initiated. This step can be implemented in several ways: (1) sending a matching request to the cloud server so that the cloud server matches the current client with one or more other clients, then receiving a matching result from the cloud server, so that the first-category user of the current client and the first-category users of the matched clients join the same interactive room; (2) sending an invitation request to the cloud server so that the cloud server sends an invitation to the client corresponding to the invitation request, then receiving an invitation result from the cloud server, so that the first-category user of the current client and the first-category users of the clients that accepted the invitation join the same interactive room; (3) receiving an invitation request from the cloud server, wherein the invitation request invites the client to join an interactive room, then sending a message accepting the invitation to the cloud server, so that the first-category user of the current client and the first-category users of one or more other clients join the same interactive room. In short, in various embodiments, first-category users 110 may use their respective first terminal devices 120 to directly initiate, or indirectly participate in, the establishment of an interactive room: by initiating a matching request to take part in automatic matching, by actively sending an invitation request to invite other first-category users, or by passively accepting invitations from other first-category users.
Illustratively, in a live scenario, an anchor may enter a multi-person live interface through his or her terminal device. The interface may display options such as random matching and inviting friends, and may surface received multi-person live invitations, for example in a message center or through other alerts. If the anchor selects random matching, the terminal device may send a corresponding matching request to the server. The server may then match the anchors who are currently requesting a random match, either randomly or according to predetermined criteria such as anchor popularity, historical live content, gender, or age. The server may let the matched anchors enter the same live room, e.g., by granting them access to a live room with the same identifier, and may send the matching results to the terminal devices of the respective anchors. The matched anchors can thus enter the same live room through their terminal devices for a multi-person live broadcast. Alternatively, if the anchor selects the option to invite friends, the interface may jump to another screen so that the anchor can, for example, invite specific friends or randomly invite several friends, either through the messaging system within the live platform or by other means of communication. The terminal device may send the server an invitation request that includes, for example, the friends invited by the anchor; the server may send invitation information to the invited anchors, receive their feedback messages, let the inviting anchor and the anchors who accepted the invitation enter the same live room based on that feedback, and send the invitation result to the inviting anchor. Conversely, if an anchor receives invitation information from another anchor, he or she can choose whether to accept it; if accepted, the terminal device sends the server a message that the anchor accepts the invitation; if not, no message, or a message declining the invitation, is sent. These ways of establishing an interactive room may also be combined. For example, a live room may define a minimum number of anchors; an anchor may invite one or more anchors and fill the remaining slots by random matching, other anchors may be matched randomly when an invited anchor declines, or an anchor who accepted an invitation may continue inviting others into the same live room, and so on.
At step 212, input from the first-category user is obtained. The input indicates that the first-category user requests to participate in the interaction, in order to send first audio and, optionally, first video. In some embodiments, the first terminal device 120 may display an interface for receiving user input, such as a physical or virtual button, slider, or dial, or may accept input via a voice interface such as a microphone or a gesture interface such as a camera, so that the first-category user 110 can express the wish to participate in the current round of interaction. For example, in a multi-person live broadcast, one or more anchors may provide input via their terminal devices (such as tapping an on-screen button labeled "grab") in an attempt to get the opportunity to sing the current song, or the next segment of the current song.
Optionally, prior to step 212, there may be step 222 as shown in FIG. 2B. At step 222, a portion of the second audio of a predetermined length is received from the cloud server and played. In some embodiments, the first terminal device 120 may receive only that portion of the second audio from the server 130 and play it, or it may receive the entire second audio from the server 130 and play the portion of the predetermined length. For example, the second audio may be the accompaniment of the song to be performed, and a portion of it, such as the prelude, may be played.
In step 213, an interaction request by which the first-category user requests to participate in the interaction is sent to the cloud server based on the input. In some embodiments, the first terminal device 120 may process the input obtained from the first-category user 110 in step 212, generate the corresponding interaction request, and send it to the server 130 via the network.
In step 214, in response to receiving a notification from the cloud server that the participation request succeeded, a message that the participation request succeeded is displayed. In some embodiments, after receiving interaction requests from one or more first terminal devices 120, the server 130 may process the requests and send a feedback message to each of them, for example sending a notification of success to the first terminal device 120 whose request was received first. When the first terminal device 120 receives the notification that the participation request succeeded, it may display the notification, similarly to step 211, in one or more forms such as text, pictures, animation effects, or audio, for example a text message such as "grabbed it" together with a corresponding animation effect.
Optionally, after step 214, there may be step 223 as shown in FIG. 2B. In step 223, a notification to play a second audio selected by the second category of users is received from the cloud server and the second audio is played. In particular, the second audio is associated with an interaction initiated by one of the at least one second category of users. In some embodiments, after or simultaneously with receiving the message requesting successful participation in the interaction from the server 130, the first terminal device 120 may also receive a notification from the server 130 to play the second audio, and play the second audio according to the notification. For example, as described above, the second audio may be an accompaniment of a song to be performed, or may be a part of the accompaniment. For example, the second audio may be the accompaniment of a song that the viewer clicked on when the interaction mode was initiated.
Optionally, there is also step 226. In step 226, first audio, and optionally first video, of the first-category user is sent to the cloud server. In some embodiments, the first terminal device 120 that received the notification of successful participation may be allowed to capture audio from the corresponding first-category user 110, e.g., via a microphone, and send the audio, or an information stream comprising it, to the server 130. For example, in a song-grabbing interaction during a multi-person live broadcast, the anchor who grabbed the chance may sing the corresponding song or a segment of it, and the singing audio may be captured by the terminal device, optionally processed into a form suitable for transmission, and sent to the server 130. Alternatively, an anchor who grabbed the chance may choose to give up singing, in which case the anchor's singing audio may be replaced with the original audio of the song.
Optionally, there may also be steps 224 and 225 as shown in FIG. 2B. In step 224, in response to receiving a notification from the cloud server that the participation request failed, a message that the participation request failed is displayed. In some embodiments, after the first terminal device 120 sends the interaction request in step 213, the server 130, having processed the plurality of requests, may send a failure notification to each first terminal device 120 whose request was not the first received. When the first terminal device 120 receives this notification, it may display it, similarly to step 214, in one or more forms such as text, pictures, animation effects, or audio, e.g., a text message such as "didn't grab it" together with a corresponding animation effect.
In step 225, the audio of the first-category user, among the at least one other first-category user, whose participation request succeeded is received and played, and optionally that user's video is played as well. In some embodiments, the first terminal device 120 that received the failure notification may receive from the server 130 the audio, or an information stream comprising the audio, of the first-category user 110 whose request succeeded, and play it. For example, in a song-grabbing interaction during a multi-person live broadcast, an anchor who failed to grab the chance can listen through the terminal device to the anchor who succeeded. Optionally, while the successful anchor is singing, the anchors who failed may be prohibited from transmitting audio through their terminal devices.
It is to be understood that steps 214 and 226 and steps 224 and 225 may be performed on different first terminal devices 120 in the same round of interaction or may be performed on the same first terminal device 120 in different rounds of interaction.
And, optionally, after step 226, there may also be steps 227 and 228. In step 227, an evaluation score for the first audio and an experience value of the first-category user are received from the cloud server. The evaluation score may be calculated based on audio features extracted from the first audio, and the experience value may be calculated based on the evaluation score and the historical experience value of the first-category user. In some embodiments, after the first terminal device 120 sends the first audio of the corresponding first-category user 110 to the server 130, the server 130 may score the first audio to obtain the evaluation score. The score may be derived by inputting audio features extracted from the first audio into a trained machine learning model, pre-trained on a data set comprising a plurality of audio samples and their corresponding evaluation scores. For example, in a song-grabbing game during a multi-person live broadcast, audio features can be extracted from the singing audio of the anchor who grabbed the chance and input into the trained model to obtain an evaluation score for that performance. If the anchor gives up singing after successfully grabbing the chance, the evaluation score defaults to zero. The feature extraction and scoring process is described in further detail below with reference to FIG. 3.
Further, the evaluation scores may be accumulated. For example, a song may involve several rounds of grabbing, an anchor may sing more than one segment of the song, and the scores for the segments he or she sings may be accumulated as the anchor's cumulative score for that song. Likewise, multiple songs may be contested over several rounds within one live broadcast, an anchor may sing more than one song, and the scores for those songs may be accumulated as the anchor's cumulative score for the broadcast. In addition, the anchors in a live room may be divided into at least two camps, and the scores of anchors in the same camp may be accumulated as that camp's cumulative score for a given song or for the whole broadcast.
In some embodiments, the evaluation scores of a first-category user 110 from each round of interaction may be processed to form his or her experience value, for example by directly accumulating the scores or by assigning experience according to some rule based on the scores. For example, in a song-grabbing scenario, after all segments of a song, or all songs of a broadcast, have been sung, the anchors or camps may be ranked by cumulative score, and optionally the top-ranked anchor or camp may be rewarded, e.g., with extra experience or virtual props on the live platform. Further, an anchor's experience value may be calculated from the evaluation score or cumulative score: the score may be added directly on top of the anchor's historical experience value, or a bonus may be granted based on the anchor's ranking or the ranking of his or her camp, and so on. In some embodiments, a leaderboard may be generated from the anchors' experience values. Alternatively, experience values may be accumulated over a period (e.g., one week or one month) and a ranking generated; when the period ends, the experience values may be cleared and a new round of accumulation and ranking begun. In some embodiments, top-ranked anchors may be rewarded. This can stimulate anchors' enthusiasm for participating in the interaction, indirectly attract more audience members to take part, and increase the enthusiasm for interaction between audience and anchors.
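The exact accumulation rule is left open above; as one hedged illustration, the following sketch directly adds each round's evaluation score to a per-anchor experience value and supports a periodic reset (the class and method names are hypothetical):

```python
from collections import defaultdict

class ExperienceLedger:
    """Accumulates per-anchor evaluation scores into experience values."""

    def __init__(self):
        self.experience = defaultdict(float)  # anchor_id -> experience value

    def add_score(self, anchor_id: str, evaluation_score: float) -> None:
        # Simplest rule named in the text: add the score on top of history.
        self.experience[anchor_id] += evaluation_score

    def ranking(self):
        # Leaderboard: highest experience value first.
        return sorted(self.experience.items(), key=lambda kv: kv[1], reverse=True)

    def reset_period(self) -> None:
        # Clear at the end of a ranking period (e.g., weekly), per the text.
        self.experience.clear()
```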
At step 228, the evaluation score and experience value are displayed. In some embodiments, the first terminal device 120 may display the evaluation score and the experience value after receiving them from the server 130, for example as text, graphics, animation effects, or voice.
Fig. 3 schematically illustrates an example flow diagram of an audio feature extraction method 300 that may be used in step 227 shown in fig. 2B.
In step 301, the audio signal of the first audio is passed through a high-pass filter to obtain a high-pass-filtered audio signal. Illustratively, the signal may be passed through the high-pass filter H(z) = 1 − μz⁻¹, where μ takes a value between 0.9 and 1.0. This step pre-emphasizes the first audio to boost its high-frequency part so that the spectrum of the signal becomes flatter, allowing the spectrum to be computed with the same signal-to-noise ratio over the entire frequency band. It also compensates for the high-frequency part of the speech signal that is suppressed by the human sound-production system (e.g., lips and vocal cords), thereby emphasizing the formants of the high-frequency part.
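A minimal numpy sketch of this pre-emphasis step (the concrete coefficient 0.97 is one common choice inside the 0.9-1.0 range stated above, not a value fixed by the disclosure):

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, mu: float = 0.97) -> np.ndarray:
    """Apply H(z) = 1 - mu * z^-1, i.e. y[n] = x[n] - mu * x[n-1]."""
    return np.append(signal[0], signal[1:] - mu * signal[:-1])
```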
In step 302, the high-pass-filtered audio signal is divided into a plurality of audio signal frames of a preset length. Illustratively, every N sampling points are combined into one observation unit, called a frame. Typically N is 256 or 512, covering roughly 20-30 ms. To avoid excessive variation between two adjacent frames, adjacent frames overlap by M sampling points, where M is usually about 1/2 or 1/3 of N. The sampling frequency of speech signals used for speech recognition is generally 8 kHz or 16 kHz. At 8 kHz, for example, a frame of 256 sampling points corresponds to 256/8000 × 1000 = 32 ms.
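A sketch of this framing step, assuming N = 256 and a hop of N − M = 128 samples (i.e., 1/2-frame overlap); these concrete values follow the examples above:

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    if len(signal) < frame_len:  # pad short signals out to a single frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
```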
In step 303, each audio signal frame of the plurality of frames is windowed, resulting in a windowed plurality of audio signal frames. Illustratively, a Hamming Window (Hamming Window) is multiplied for each of the plurality of frames to increase the continuity of the left and right ends of the frame. Assuming that the signal after framing is S (N), N =0,1, …, N-1, N, then multiplied by the hamming window, S' (N) = S (N) × w (N), w (N) represents the hamming window, and its form is as follows:
W(n) = (1 − a) − a · cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1    (1)
where a is a predetermined parameter; different values of a result in different Hamming windows, and a typically has a value of 0.46.
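Equation (1) with a = 0.46 can be applied to the framed signal as follows (a sketch; numpy's built-in np.hamming produces the same window).

    import numpy as np

    def apply_hamming(frames, a=0.46):
        # W(n) = (1 - a) - a * cos(2*pi*n / (N - 1)), multiplied per frame.
        n = np.arange(frames.shape[1])
        window = (1 - a) - a * np.cos(2 * np.pi * n / (frames.shape[1] - 1))
        return frames * window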
In step 304, the plurality of audio signal frames are converted into an audio energy distribution by a fast Fourier transform. In general, since signal characteristics are difficult to observe in the time domain, an audio signal may be converted into its energy distribution on the frequency domain, where different energy distributions represent different audio characteristics. Thus, a fast Fourier transform may be performed on each audio signal frame to obtain its frequency spectrum, and the power spectrum of the audio signal, that is, the audio energy distribution, may be obtained by taking the squared modulus of the spectrum of each frame.
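Step 304 then amounts to the following sketch, producing the power spectrum (periodogram) of each windowed frame; the 1/N normalization is a common convention rather than something the text mandates.

    import numpy as np

    def power_spectrum(frames, n_fft=256):
        spectrum = np.fft.rfft(frames, n=n_fft, axis=1)  # FFT per frame
        return (np.abs(spectrum) ** 2) / n_fft           # squared modulus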
In step 305, the audio energy distribution is passed through a triangular filter bank, and audio features are extracted from the energy output by the triangular filter bank. Illustratively, the audio energy distribution may be passed through a set of M triangular filters, where M is close to the number of critical bands and is typically in the range of 22 to 26. The center frequency of each triangular filter may be f(m), m = 1, 2, …, M, where the spacing between adjacent f(m) increases as m increases. Further, the log energy of the output of each triangular filter may be calculated, and a discrete cosine transform (DCT) may be performed on the resulting log energies to obtain mel-frequency cepstral coefficients (MFCCs) of order L, where L is typically in the range of 12 to 16. The resulting mel-frequency cepstral coefficients may be used as the audio features extracted from the first audio.
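A sketch of step 305 follows: it builds mel-spaced triangular filters (so their center-frequency spacing grows with m, as stated), takes the log filter energies, and applies a DCT to obtain the MFCCs. scipy is assumed available, and M = 24, L = 13 sit within the stated ranges.

    import numpy as np
    from scipy.fftpack import dct

    def mfcc_from_power(power, sample_rate=8000, n_filters=24, n_ceps=13):
        n_fft = (power.shape[1] - 1) * 2
        # Filter edge frequencies evenly spaced on the mel scale.
        mel_max = 2595 * np.log10(1 + (sample_rate / 2) / 700)
        hz = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
        bins = np.floor((n_fft + 1) * hz / sample_rate).astype(int)

        fbank = np.zeros((n_filters, power.shape[1]))
        for m in range(1, n_filters + 1):
            left, center, right = bins[m - 1], bins[m], bins[m + 1]
            for k in range(left, center):
                fbank[m - 1, k] = (k - left) / max(center - left, 1)
            for k in range(center, right):
                fbank[m - 1, k] = (right - k) / max(right - center, 1)

        log_energy = np.log(np.maximum(power @ fbank.T, 1e-10))
        return dct(log_energy, type=2, axis=1, norm="ortho")[:, :n_ceps]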
Furthermore, the energy of an audio frame reflects the volume, which is also an important and easily computed feature of audio. Thus, the logarithmic energy of an audio frame may be included in the audio features; for example, it can be obtained by summing the squares of the signal within a frame, taking the base-10 logarithm, and multiplying by 10. In addition, since mel-frequency cepstral coefficients reflect only the static characteristics of the audio, dynamic difference parameters (including the first-order and second-order differences) may further be calculated to describe its dynamic characteristics. These dynamic difference parameters may also be incorporated into the extracted audio features so that the features characterize the audio more fully.
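The frame log energy and the difference features can be sketched as follows; a plain two-point difference is used here instead of the regression formula often used for MFCC deltas, so this is a simplification.

    import numpy as np

    def log_frame_energy(frames):
        # Sum of squares within the frame, base-10 log, multiplied by 10.
        return 10 * np.log10(np.maximum(np.sum(frames ** 2, axis=1), 1e-10))

    def delta(features):
        # First-order difference along the frame axis; applying it twice
        # yields the second-order difference.
        return np.diff(features, axis=0, prepend=features[:1])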
The extracted audio features may then be input to a trained machine learning model to derive an evaluation score for the corresponding audio. In some embodiments, the machine learning model that derives the evaluation score from the input audio features may be a ResNet (residual neural network) model, which adds the idea of residual learning to a conventional convolutional neural network, thereby helping to mitigate gradient vanishing and accuracy degradation in deep networks, so that accuracy can be guaranteed and speed controlled even when the network is deep. Alternatively, other machine learning models may be used, such as a conventional convolutional neural network (CNN) or a recurrent neural network (RNN). In some embodiments, the trained machine learning model may be deployed on the server 130. Alternatively, it may be deployed on the first terminal device 120 if the processing power of the first terminal device 120 is sufficient, or when part or all of the functionality of the server 130 is integrated into the first terminal device 120.
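Purely as a sketch of the residual-learning idea, not the patent's actual model, a toy PyTorch regressor mapping an MFCC feature map to a single score might look as follows; all layer sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # The input skips past two convolutions: x -> relu(x + F(x)).
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, x):
            return torch.relu(x + self.body(x))

    class ScoreNet(nn.Module):
        # Maps a (1 x frames x coefficients) MFCC "image" to one score.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
                ResidualBlock(16), ResidualBlock(16),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
            )

        def forward(self, x):
            return self.net(x).squeeze(-1)

    model = ScoreNet()
    features = torch.randn(4, 1, 200, 13)  # a batch of 4 MFCC feature maps
    print(model(features).shape)           # torch.Size([4])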
The machine learning model may be trained using pre-labeled training samples. The training samples may include audio segments and evaluation scores for those segments, which may be labeled manually. For example, in a multi-anchor live song-singing scenario, the audio segment in a training sample may be a sung segment of a song, and the evaluation score may be a single composite score or a multi-dimensional score comprising multiple sub-scores. For example, an annotator may label a composite evaluation score for the sung segment based on its intonation, tone, breath, volume, lyric errors, and the like, or may score each evaluation dimension separately to form a multi-dimensional evaluation score. It should be understood that, during training, the machine learning model should be combined with the same feature extraction module that will be used in actual deployment; alternatively, the audio segments of the training samples may be passed through that feature extraction module in advance, and the model trained on samples consisting of the extracted features and the evaluation scores. In this way, the trained machine learning model can output an evaluation score for an audio segment, whether a composite score or a multi-dimensional score, when the features of that segment are input. Furthermore, since the machine learning model may be trained to score audio features of segments of a predetermined length, a longer first audio may be cut in advance into a plurality of segments of that predetermined length before feature extraction and scoring are performed.
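Training on pre-extracted (features, score) pairs can then be sketched as a standard regression loop; `loader` is a hypothetical iterable of batches and `model` the ScoreNet sketch above, both assumptions for illustration.

    import torch
    import torch.nn as nn

    def train_epoch(model, loader, lr=1e-3):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()  # regress toward the labeled scores
        model.train()
        for features, score in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), score)
            loss.backward()
            optimizer.step()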
FIGS. 4A and 4B schematically illustrate example flow diagrams of online interaction methods 400A and 400B, respectively, for another client according to one embodiment of the present invention, and FIGS. 5A-5D schematically illustrate example interfaces 500A-500D that may appear while the online interaction method 400A or 400B is executed. The methods 400A and 400B are both applicable to the second terminal device 150 of the second category user 140 shown in fig. 1.
At step 411, input from a second category of users is obtained. The input indicates that the second category user requests to initiate an interaction for the interactive room. The interactive room may comprise at least one second category user and at least two first category users, each logged in at a respective client. In some embodiments, the second terminal device 150 may provide an interface for receiving user input, e.g., a physical or virtual button, slider, or dial, a voice input interface such as a microphone, or a gesture input interface such as a camera, through which the second category user 140 can express a desire to initiate an interaction. The second category user 140 may enter an established interactive room and initiate the interaction through such an interface. For example, in a multi-anchor live scene, a viewer can enter a live broadcast room and initiate a song-singing interaction there by requesting a song.
Fig. 5A shows an example interface 500A of a song-grabbing interaction process for multi-anchor live video, which may be displayed on the second terminal device 150, for example. As shown, the video frame of each anchor may be displayed in the upper part of the interface; in this example, four anchors in the live room are participating in a multi-person live broadcast. The viewer may send bullet-screen messages, give gifts, and so on through the function buttons in the lower part of the interface. The text message in the box at the bottom right corner of interface 500A indicates to the viewer that a song-grabbing interaction may currently be initiated. Illustratively, the viewer may order a song by clicking the song-ordering button, whereupon a song selection box may pop up at the bottom of interface 500B, as shown in FIG. 5B. The song selection box displays a plurality of songs, their difficulty, and the corresponding ordering conditions, such as the virtual currency or platform props to be paid. The viewer can select a desired song in the song selection box, click the "give away" button at the lower right corner, and pay the corresponding virtual currency, platform props, or the like as guided to complete the order, thereby initiating the song-singing interaction. If no song in the selection box suits the viewer, the viewer can also click "song list" to view more songs. In addition, optionally, the viewer may obtain songs presented by the live broadcast platform, for example through consecutive logins, and use them to initiate a song-singing interaction after entering the live broadcast room.
At step 412, an interaction request for initiating the interaction is sent to the cloud server based on the input. In some embodiments, the second terminal device 150 may process the input of the second category user 140, generate a corresponding interaction request for initiating the interaction, and then send the interaction request requested by the second category user 140 to the server 130 via the network.
In step 413, in response to receiving a notification of successful interaction initiation from the cloud server, a message that the interaction was successfully initiated is displayed. In some embodiments, after receiving interaction requests from one or more second terminal devices 150, the server 130 may process the requests and send a feedback message to each corresponding second terminal device 150, for example sending a notification of successful initiation to the second terminal device 150 whose request was received earliest. When the second terminal device 150 receives the notification, it may display the notification in one or more of text, pictures, animation effects, audio, and the like, similarly to steps 211 and 214, to inform the corresponding second category user 140 that the interaction has been successfully started.
Optionally, before step 413, there may also be a step 421 as shown in fig. 4B. In step 421, in response to receiving a notification that the interaction request has been added to a queuing list, a message that the interaction request is in a queued state is displayed. In some embodiments, after receiving interaction requests from one or more second terminal devices 150, the server 130 may maintain the queuing list by sorting the requests from earliest to latest receiving time, and send a notification that the interaction request has been added to the queuing list to each second terminal device 150 whose request is not at the head of the queue. When the second terminal device 150 receives such a notification, it may display it in one or more of text, picture, animation effect, audio, and the like, similarly to step 413, to inform the corresponding second category user 140 that the interaction has been queued and will be opened later. Furthermore, in some embodiments, multiple second category users 140 may initiate the same interaction within one interaction activity. In that case, the server 130 may open the interaction or add it to the queuing list only upon the first request for it, and ignore later requests for the same interaction; however, when calculating or accumulating the evaluation score, the weight of the score related to that interaction may be increased, for example multiplied by the number of requests for it. For example, in a song-grabbing scenario, when multiple viewers order the same song during one activity, the song may be played only once, but the evaluation score obtained for grabbing that song may be multiplied by a coefficient associated with the number of times the song was ordered during the activity.
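The queuing and duplicate-weighting behaviour described above can be sketched as follows; the data structures, and the rule of counting repeat requests as a score multiplier, are illustrative assumptions consistent with the text.

    from collections import OrderedDict

    class InteractionQueue:
        def __init__(self):
            self.queue = OrderedDict()  # song id -> number of requests

        def request(self, song_id):
            if song_id in self.queue:
                self.queue[song_id] += 1  # same song again: raise its weight only
            else:
                self.queue[song_id] = 1   # new song: join the queue

        def pop_next(self):
            # Earliest-queued song first; the count can multiply its score.
            return self.queue.popitem(last=False)

    q = InteractionQueue()
    for song in ["song_1", "song_2", "song_1"]:
        q.request(song)
    song_id, weight = q.pop_next()  # ("song_1", 2): played once, weight 2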
In step 414, in response to receiving a notification that a first category user has successfully requested to participate in the interaction, a message to that effect is displayed. In some embodiments, when a first category user 110 successfully requests to participate in the interaction to send the first audio (e.g., step 214 described with reference to fig. 2), the server 130 may also send a corresponding notification to the second terminal device 150 of the second category user 140. When the second terminal device 150 receives the notification, it may display it similarly to step 214.
Fig. 5C illustrates an example interface 500C for the song-grabbing interaction in a multi-anchor live broadcast. As shown, a message may pop up on the multi-person live broadcast interface to show which anchor successfully grabbed the song.
Optionally, in step 422, the first audio of the first category user who successfully requested to participate in the interaction is played, and optionally the first video of that user is also played. In some embodiments, the second terminal device 150 may receive, from the server 130, the audio (or an information stream including the audio) of the first category user 110 who successfully requested to participate in the interaction, and play it. For example, in a song-singing interaction in a multi-person live broadcast, viewers can listen through their terminal devices to the singing of the anchor who successfully grabbed the song.
Optionally, in synchronization with step 422, there may be a step 423 as shown in FIG. 4B. At step 423, a second audio is played; the first audio played at step 422 and the second audio may be combined into the same audio for playback. In some embodiments, the second terminal device 150 may receive the second audio, or an information stream including it, from the server 130 and play it. The server 130 may mix the first audio and the second audio into the same audio and transmit the mix to the second terminal device 150, which then plays it. Alternatively, the server 130 may send the first audio and the second audio separately, and the second terminal device may play the two in combination. For example, the first audio may be the anchor's singing and the second audio a song accompaniment; either the server 130 or the second terminal device 150 may combine the two appropriately so that the singing and the accompaniment are matched.
Further optionally, after step 422, there may also be steps 424 and 425 as shown in fig. 4B. At step 424, an evaluation score for the first audio and an experience value of the first category user are received from the cloud server. The evaluation score may be calculated based on audio features extracted from the first audio, and the experience value may be calculated based on the evaluation score and the historical experience value of the first category user. At step 425, the evaluation score and the experience value are displayed. Steps 424 and 425 are similar to steps 227 and 228 described with reference to fig. 2B and are not described in detail here.
It is worth noting that a special-effect animation may be played to show which anchor successfully grabbed the song. Specifically, the animation material is packaged in the mp4 format, so that the audio can be packaged together with it. The animation is encoded and decoded with H.264, so the file is small and the time taken to pull the material is reduced. Since H.264 does not support an alpha channel in video (the alpha channel is a computer-graphics term for the "achromatic" channel, mainly used for saving and editing selection areas), each frame image is divided into two parts, one half carrying the alpha data and the other half carrying the RGB channel data, and the alpha data is then pasted to the corresponding position in the source shader. In one embodiment, the process of displaying the special effect is as follows: first, the 3D animation system is requested to pull the animation effect material and decompress it to obtain the special-effect MP4 file; next, the structure of the special-effect MP4 file is parsed to obtain the H.264 elementary stream, which is hardware-decoded; finally, the decoded video stream is processed in a pipeline, the image data is texture-cut and combined with the alpha channel for rendering and shading. As will be appreciated by those skilled in the art, the message of a successful song grab can be displayed in other forms, and the special effect can be displayed by other suitable steps.
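As a sketch of recombining one decoded frame, assuming a side-by-side layout in which the left half carries the RGB data and the right half carries the alpha matte (the text only says each frame is divided into two parts, so the layout is an assumption):

    import numpy as np

    def merge_alpha(frame):
        # frame: H x W x 3 decoded image; returns an H x (W/2) x 4 RGBA image.
        height, width, _ = frame.shape
        rgb = frame[:, : width // 2, :]
        alpha = frame[:, width // 2 :, :1]  # any channel of the matte half
        return np.concatenate([rgb, alpha], axis=2)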
Fig. 5D illustrates an example interface 500D of the song-grabbing interaction process in a multi-anchor live broadcast. The interface shows an example ranking display; the viewer may enter interface 500D by clicking a "leaderboard" in interfaces 500A, 500B, or 500C. The leaderboards may include an anchor leaderboard ("song-grabbing board") and a viewer leaderboard ("song-ordering board"). The anchor leaderboard may be generated from the ranking-related content described above with reference to fig. 2B, and the viewer leaderboard may be generated similarly. For example, points may be added for a viewer based on the viewer's song-ordering behavior, such as a fixed number of points per ordered song, different numbers of points according to different ordering conditions, or corresponding points for the top few viewers ranked by the number of songs ordered in an activity. A ranking may then be generated based on the viewers' points. Alternatively, points may be accumulated over a period (e.g., one week, one month, etc.) and a ranking generated; after the period ends, the points may be cleared and a new round of point accumulation and ranking started. This can stimulate the viewers' enthusiasm for ordering songs and improve the interaction between viewers and anchors.
FIG. 6 schematically illustrates an example flow diagram of an online interaction method 600 for a server according to one embodiment of the present disclosure. The method 600 may be applied, for example, to the server 130 shown in fig. 1.
In step 611, a first interaction request for initiating an interaction with respect to an interactive room is received from one of at least one second category user client. The interactive room comprises at least two first category users and at least one second category user, who may log in at their respective clients. In some embodiments, the server 130 may receive the first interaction request requesting to initiate an interaction, such as a viewer's request for a song, from a second terminal device 150 of a second category user 140.
At step 612, a notification to open an online interaction mode is sent to at least two first category user clients. In some embodiments, the server 130 may send a notification to the first terminal devices 120 of the plurality of first category users 110 to turn on the online interaction mode, for example, to turn on a song-grabbing mode.
In step 613, one or more second interaction requests are received within a predetermined time period from one or more of the at least two first category user clients, each second interaction request indicating that the corresponding first category user requests to participate in the interaction to send the first audio. In some embodiments, the server 130 may receive second interaction requests from the first terminal devices 120 of the plurality of first category users 110 within the predetermined time period. For example, in a song-singing interaction of a multi-person live broadcast, after each round of singing starts, the server 130 may receive and process only the singing requests arriving from the first terminal devices 120 within the predetermined time period; requests arriving after that period may simply be ignored. The predetermined time period may be any duration, such as 30 seconds, 10 seconds, or 5 seconds.
At step 614, the second interaction request with the earliest reception time among the one or more second interaction requests is determined. In some embodiments, when receiving second interaction requests from a plurality of first terminal devices 120, the server 130 may determine the request whose reception time is earliest.
In step 615, a notification requesting successful participation in the interaction is sent to the first category user clients corresponding to the second interaction request with the earliest reception time. In some embodiments, after determining the second interaction request with the earliest reception time, the server 130 may send a notification requesting successful participation in the interaction, for example, a notification of successful singing, to the first terminal device 120 of the first class user 110 corresponding to the request. Optionally, the server 130 may also send a notification that the first category user 110 requests successful participation in the interaction to the other first terminal device 120 and the second terminal device 150.
Optionally, after step 615, a first audio is received from the first category user client that successfully requested to participate in the interaction. In some embodiments, the server 130 may allow the first audio, e.g., the singing audio of the corresponding anchor, to be received from that first terminal device 120 simultaneously with or after sending the notification of successful participation. Optionally, during this time, the server 130 may not receive audio from the other first terminal devices 120.
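Steps 613 to 615 amount to a simple earliest-wins arbitration within a time window, sketched below; `receive_requests` is a hypothetical generator of (anchor_id, receive_time) pairs introduced only for illustration.

    import time

    def arbitrate(receive_requests, window_seconds=10.0):
        deadline = time.monotonic() + window_seconds
        requests = []
        for anchor_id, received_at in receive_requests():
            if time.monotonic() > deadline:
                break                      # requests outside the window are ignored
            requests.append((received_at, anchor_id))
        if not requests:
            return None                    # nobody grabbed the song
        return min(requests)[1]            # earliest reception time wins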
Further details regarding the steps of the method 600 and the operations after receiving the first audio are described in detail with reference to fig. 2A to 5D, and are not repeated herein.
FIG. 7 schematically illustrates an example interaction flow diagram of an online interaction method 700 according to one embodiment of the present disclosure. Fig. 7 shows an interactive system according to an embodiment of the present invention, which includes a first category user client, a cloud server, and a second category user client. The first category user client may operate according to the online interaction methods 200A, 200B described with reference to figs. 2A and 2B, the second category user client may operate according to the online interaction methods 400A, 400B described with reference to figs. 4A and 4B, and the cloud server may operate according to the online interaction method 600 described with reference to fig. 6. The main steps of the interaction between them are shown schematically in fig. 7 to aid understanding of the steps of the online interaction methods described in this specification. The details of the steps shown in fig. 7 have been described above and are not repeated here.
It should be noted that the various steps described above and shown in the flowcharts or interaction flowcharts need not be performed in the order described or shown. Some steps may be performed in parallel, or in the reverse of the order described or shown, as the case may be.
Fig. 8 schematically shows an example block diagram of a client 800 according to one embodiment of the present invention. The client 800 may be deployed at the first terminal device 120 of a first category user 110 as shown in fig. 1. The client 800 may include a receiving module 801, an obtaining module 802, a sending module 803, and a display module 804.
The receiving module 801 may be configured to receive a notification to turn on the online interaction mode. The obtaining module 802 may be configured to obtain input from a first category of users. Optionally, the input indicates that the first category of users requested to engage in the interaction to send the first audio. The sending module 803 may be configured to send an interaction request of the first category of users requesting to participate in the interaction to the cloud server based on the input. The display module 804 may be configured to display a message requesting successful participation in the interaction in response to receiving a notification from the cloud server requesting successful participation in the interaction. The online interaction may be performed in an interactive room, the interactive room comprising the first category of users and at least one other first category of users and at least one second category of users, the first category of users and the at least one other first category of users and the at least one second category of users logging in at their respective clients, the interaction being initiated by one of the at least one second category of users.
Fig. 9 schematically shows an example block diagram of a client 900 according to one embodiment of the present invention. The client 900 may be deployed at the second terminal device 150 of a second category user 140 shown in fig. 1. The client 900 may include an obtaining module 901, a sending module 902, and a display module 903.
The obtaining module 901 may be configured to obtain input from a second category of users. Specifically, the input indicates that the second category user requests to initiate an interaction for an interactive room, where the interactive room includes at least one second category user and at least two first category users, each logged in at a respective client.
Fig. 10 schematically illustrates an example block diagram of a cloud server 1000 in accordance with one embodiment of this disclosure. The cloud server 1000 may be deployed in the server 130 shown in fig. 1. The cloud server 1000 may include a first receiving module 1001, a first sending module 1002, a second receiving module 1003, a determining module 1004, and a second sending module 1005.
The first receiving module 1001 may be configured to receive a first interaction request for initiating an interaction with an interactive room from one of at least one second category user client. The first sending module 1002 may be configured to send a notification to at least two first category user clients to open the online interaction mode. The second receiving module 1003 may be configured to receive one or more second interaction requests within a predetermined time period from one or more of the at least two first category user clients. Specifically, the second interaction request indicates that the first category user of the at least two first category user clients requests to participate in the interaction to send the first audio. The determining module 1004 may be configured to determine a second interaction request with an earliest reception time among the one or more second interaction requests. The second sending module 1005 may be configured to send a notification of successful participation request to the first category user clients corresponding to the second interaction request with the earliest reception time.
It should be noted that the various modules described above may be implemented in software or hardware or a combination of both. Several different modules may be implemented in the same software or hardware configuration, or one module may be implemented by several different software or hardware configurations.
FIG. 11 schematically shows an example block diagram of a computing device 1100 in accordance with one embodiment of this disclosure. Computing device 1100 may represent a device to implement various means or modules described herein and/or perform various methods described herein. Computing device 1100 can be, for example, a server, a desktop computer, a laptop computer, a tablet, a smartphone, a smartwatch, a wearable device, or any other suitable computing device or computing system, which can include various levels of devices from full resource devices with substantial storage and processing resources to low-resource devices with limited storage and/or processing resources. In some embodiments, the client and cloud server described above with respect to fig. 8-10 may be implemented in one or more computing devices 1100, respectively.
As shown, the example computing device 1100 includes a processing system 1101, one or more computer-readable media 1102, and one or more I/O interfaces 1103 communicatively coupled to each other. Although not shown, the computing device 1100 may also include a system bus or other data and command transfer system that couples the various components to one another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Control and data lines, for example, may also be included.
The processing system 1101 represents functionality to perform one or more operations using hardware. Accordingly, the processing system 1101 is illustrated as including hardware elements 1104 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. Hardware element 1104 is not limited by the material from which it is formed or the processing mechanisms employed therein. For example, a processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic Integrated Circuits (ICs)). In such a context, processor-executable instructions may be electronically-executable instructions.
The computer-readable medium 1102 is illustrated as including a memory/storage 1105. Memory/storage 1105 represents memory/storage associated with one or more computer-readable media. Memory/storage 1105 may include volatile media (such as random access memory (RAM)) and/or nonvolatile media (such as read-only memory (ROM), flash memory, optical disks, magnetic disks, and so forth). Memory/storage 1105 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) as well as removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). Illustratively, the memory/storage 1105 may be used to store the first audio of the first category users, the queuing list of requests, and the like mentioned in the above embodiments. The computer-readable medium 1102 may be configured in various other ways, as further described below.
One or more input/output interfaces 1103 represent functionality that allows a user to enter commands and information to computing device 1100, and that also allows information to be displayed to the user and/or sent to other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice input), a scanner, touch functionality (e.g., capacitive or other sensors configured to detect physical touch), a camera (e.g., motion that does not involve touch may be detected as gestures using visible or invisible wavelengths such as infrared frequencies), a network card, a receiver, and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a haptic response device, a network card, a transmitter, and so forth. Illustratively, in the above-described embodiments, the first category of users and the second category of users may input through input interfaces on their respective terminal devices to initiate requests and enter audio and/or video and the like, and may view various notifications and view video or listen to audio and the like through output interfaces.
Computing device 1100 also includes online interaction policy 1106. The online interaction policy 1106 may be stored as computer program instructions in the memory/storage 1105. The online interaction policy 1106 may implement all of the functions of the various modules of the client 800, the client 900, and the cloud server 1000 described with respect to fig. 8-10, along with the processing system 1101 and the like.
Various techniques may be described herein in the general context of software, hardware, elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and the like, as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can include a variety of media that can be accessed by computing device 1100. By way of example, and not limitation, computer-readable media may comprise "computer-readable storage media" and "computer-readable signal media".
"computer-readable storage medium" refers to a medium and/or device, and/or a tangible storage apparatus, capable of persistently storing information, as opposed to mere signal transmission, carrier wave, or signal per se. Accordingly, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of computer readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or an article of manufacture suitable for storing the desired information and accessible by a computer.
"computer-readable signal medium" refers to a signal-bearing medium configured to transmit instructions to the hardware of the computing device 1100, such as via a network. Signal media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave, data signal or other transport mechanism. Signal media also includes any information delivery media. By way of example, and not limitation, signal media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
As previously described, hardware elements 1104 and computer-readable media 1102 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware form that may be used in some embodiments to implement at least some aspects of the techniques described herein. Hardware elements may include integrated circuits or systems-on-chip, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and other implementations in silicon or components of other hardware devices. In this context, a hardware element may serve as a processing device that performs program tasks defined by instructions, modules, and/or logic embodied by the hardware element, as well as a hardware device for storing instructions for execution, such as the computer-readable storage medium described previously.
Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Thus, software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage medium and/or by one or more hardware elements 1104. The computing device 1100 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, implementing a module executable by the computing device 1100 as software may be achieved at least partially in hardware, for example using the computer-readable storage medium and/or hardware elements 1104 of the processing system. The instructions and/or functions may be executed/operated by, for example, one or more computing devices 1100 and/or processing systems 1101 to implement the techniques, modules, and examples described herein.
The techniques described herein may be supported by these various configurations of the computing device 1100 and are not limited to specific examples of the techniques described herein.
Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims (15)

1. An online interaction method for a client, comprising:
receiving a notification of starting an online interaction mode;
obtaining input from a first category of users;
sending, to a cloud server based on the input, an interaction request of the first category user requesting to participate in the interaction;
in response to receiving a notification from the cloud server that the participation request succeeded, displaying a message that the request to participate in the interaction succeeded;
wherein the online interaction mode is enabled by a second category of users.
2. The method of claim 1, further comprising:
sending a first audio from the first category of users to the cloud server;
receiving an evaluation score for the first audio and an experience value of the first class of users from the cloud server, wherein the evaluation score is calculated based on audio features extracted from the first audio, and the experience value is calculated based on the evaluation score and historical experience values of the first class of users; and
displaying the evaluation score and the experience value.
3. The method of claim 2, wherein the rating scores are derived by inputting audio features extracted from the first audio into a trained machine learning model, the machine learning model being pre-trained with a data set comprising a plurality of audios and rating scores corresponding to the plurality of audios.
4. The method of claim 2, wherein the audio features are extracted by:
passing the audio signal of the first audio through a high-pass filter to obtain a high-frequency-filtered audio signal;
dividing the high-frequency filtered audio signal into a plurality of audio signal frames of a preset length;
windowing each audio signal frame of the plurality of frames to obtain a plurality of windowed audio signal frames;
converting the plurality of frames of audio signals into an audio energy distribution by a fast Fourier transform;
and passing the audio energy distribution through a triangular filter bank, and extracting the audio features from the energy output by the triangular filter bank.
5. The method of any of claims 1-4, further comprising:
in response to receiving a notification from the cloud server that the request to participate in the interaction failed, displaying a message that the request to participate in the interaction failed, and receiving and playing the audio of the first category user, among at least one other first category user, who successfully requested to participate in the interaction.
6. An online interaction method for a client, comprising:
obtaining input from a second category of users;
sending, to a cloud server based on the input, an interaction request of the second category user for initiating an interaction;
in response to receiving a notification from the cloud server that the interaction was successfully initiated, displaying a message that the interaction was successfully initiated;
and in response to receiving a notification that a first category user successfully requested to participate in the interaction, displaying a message that the first category user successfully requested to participate in the interaction.
7. The method of claim 6, further comprising:
playing first audio of the first category user who successfully requested to participate in the interaction;
and playing a second audio while the first audio of the first category user who successfully participated in the interaction is played, wherein the first audio and the second audio are combined into the same audio for playing.
8. The method of claim 6, wherein, before the message of successful interaction initiation is displayed in response to the notification from the cloud server, the method further comprises:
in response to receiving a notification that the interaction request has been added to a queuing list, displaying a message that the interaction request is in a queued state.
9. The method of claim 6, further comprising:
receiving an evaluation score for the first audio and an experience value of the first class of users from the cloud server, wherein the evaluation score is calculated based on audio features extracted from the first audio, and the experience value is calculated based on the evaluation score and historical experience values of the first class of users;
displaying the evaluation score and the experience value.
10. An online interaction method for a cloud server comprises the following steps:
receiving a first interaction request for initiating an interaction with respect to an interaction room from one of at least one second category user client;
sending a notification to start an online interaction mode to at least two first category user clients;
receiving one or more second interaction requests within a predetermined time period from one or more of the at least two first category user clients;
determining a second interaction request with the earliest receiving time in the one or more second interaction requests;
and sending a notification of a successful participation request to the first category user client corresponding to the second interaction request with the earliest receiving time.
11. The method of claim 10, wherein said receiving a first interaction request from one of at least one second category user client to initiate an interaction with respect to an interactive room comprises:
receiving one or more first interaction requests for initiating interactions with an interactive room from the at least one second category user client;
and determining, among the one or more first interaction requests, the first interaction request for initiating the interaction with the earliest sending time.
12. A client, comprising:
a receiving module configured to receive a notification to start an online interaction mode;
an acquisition module configured to acquire input from a first category of users;
a sending module configured to send an interaction request for the first category of user to request to participate in an interaction to a cloud server based on the input;
a display module configured to display, in response to receiving a notification of a successful participation request from the cloud server, a message that the request to participate in the interaction succeeded;
wherein the online interaction mode is enabled by a second category of users.
13. A client, comprising:
an acquisition module configured to acquire input from a second category of users;
a sending module configured to send, to a cloud server based on the input, an interaction request of the second category user for initiating an interaction;
a display module configured to: display, in response to receiving a notification of successful interaction initiation from the cloud server, a message that the interaction was successfully initiated; and display, in response to receiving a notification that a first category user successfully requested to participate in the interaction, a message that the first category user successfully requested to participate in the interaction.
14. A cloud server, comprising:
a first receiving module configured to receive a first interaction request for initiating an interaction with an interactive room from one of at least one second category user client;
a first sending module configured to send a notification of opening an online interaction mode to at least two first category user clients;
a second receiving module configured to receive one or more second interaction requests within a predetermined time period from one or more of the at least two first category user clients;
a determining module configured to determine a second interaction request with an earliest reception time among the one or more second interaction requests;
and the second sending module is configured to send a notification of successful participation request to the first-class user client corresponding to the second interaction request with the earliest receiving time.
15. A computer device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of the method of any one of claims 1-11.