CN118098224A - Screen sharing control method, device, equipment, medium and program product


Info

Publication number
CN118098224A
CN118098224A
Authority
CN
China
Prior art keywords
user
sharer
screen
indication
screen sharing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211493442.5A
Other languages
Chinese (zh)
Inventor
卫万成
翟周伟
李嘉麒
赵宏
黄胜森
欧阳月令
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202211493442.5A
Priority to PCT/CN2023/132677 (WO2024109698A1)
Publication of CN118098224A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/451 - Execution arrangements for user interfaces
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present disclosure relates to a screen sharing control method, apparatus, device, medium, and program product. In an embodiment of the present disclosure, voice data generated by a first user of a plurality of users during an interaction among the plurality of users is obtained; a screen sharing intent and a sharer indication are identified based on the voice data; in response to identifying that the first user indicates the screen sharing intent, a screen sharer is determined from the plurality of users based on the first user and the identified sharer indication; and an instruction for the screen sharer to share a screen during the interaction is generated. According to embodiments of the present disclosure, unnecessary complicated operations can be omitted during multi-user interaction, screen sharing can be realized rapidly and accurately, user experience is effectively improved, and communication efficiency and processing efficiency are improved.

Description

Screen sharing control method, device, equipment, medium and program product
Technical Field
The present disclosure relates generally to the field of computer technology, and more particularly, to a screen sharing control method, a screen sharing control apparatus, a computing device, a computer-readable storage medium, and a computer program product.
Background
In recent years, with improvements in data processing capability and communication bandwidth, applications that enable user interaction via the Internet have been increasing. For example, in the intelligent office area, multiple people can communicate through voice conferences or video conferences to enable voice, image, and video streaming. In the field of online education, a lecturer can give real-time lectures to a plurality of students remotely, and multi-person interaction in aspects such as voice, video, and document presentation can be realized between the lecturer and the students as well as among the students. As another example, in the field of live games, players can communicate with each other by voice or video and transmit information or files, realizing real-time participation and transmission among multiple players during gameplay.
Disclosure of Invention
According to some embodiments of the present disclosure, there is provided a screen sharing control method, a screen sharing control apparatus, a computing device, a computer-readable storage medium, and a computer program product.
In a first aspect of the present disclosure, a screen sharing control method is provided. The method comprises the following steps: acquiring voice data generated by a first user of a plurality of users during an interaction among the plurality of users; identifying a screen sharing intent and a sharer indication based on the voice data; in response to identifying that the first user indicates the screen sharing intent, determining a screen sharer from the plurality of users based on the first user and the identified sharer indication; and generating an instruction for the screen sharer to share a screen during the interaction. According to the screen sharing control method of the first aspect of the present disclosure, unnecessary complicated operations can be omitted during multi-user interaction, screen sharing can be achieved rapidly and accurately, user experience is effectively improved, and communication efficiency and processing efficiency are improved.
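For orientation, a minimal sketch of these four steps in Python follows. Every helper here (recognize_text, classify_intent, resolve_sharer) is a hypothetical stub standing in for the components described later, not an implementation taken from the patent.

```python
def recognize_text(voice_data):               # stub for an ASR engine
    return voice_data                         # assume text is passed through

def classify_intent(text):                    # stub for rule/model recognition
    if "share" in text.lower():
        return "share_screen", "I"
    return "none", None

def resolve_sharer(speaker, hint, participants):
    return speaker if hint == "I" else participants[0]  # toy resolution

def control_screen_sharing(speaker, voice_data, participants):
    text = recognize_text(voice_data)             # voice data -> text
    intent, sharer_hint = classify_intent(text)   # intent + sharer indication
    if intent != "share_screen":
        return None                               # no sharing intent: no-op
    sharer = resolve_sharer(speaker, sharer_hint, participants)
    return {"action": "start_sharing", "sharer": sharer}  # the instruction

print(control_screen_sharing("Zhang San", "I'll share it",
                             ["Zhang San", "Li Si", "Julie", "Peter"]))
# -> {'action': 'start_sharing', 'sharer': 'Zhang San'}
```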
In some embodiments, determining the screen sharer includes: in response to the sharer indication representing information about a first-person pronoun, determining the first user as the screen sharer; in response to the sharer indication representing information about a second-person pronoun, determining the user who generated voice data prior to the first user as the screen sharer; or in response to the sharer indication representing information about a third-person pronoun or an interrogative pronoun, determining the user of the plurality of users that is semantically most relevant to the sharer indication as the screen sharer. Thus, the screen sharer intended by the speaker can be accurately identified with the speaker as a reference, so that the user does not need to further manually specify a participant or have the screen sharer manually initiate the sharing, which effectively improves user experience and communication efficiency.
In some embodiments, determining the screen sharer includes: determining, based on the user names respectively associated with the plurality of users, whether the sharer indication includes one of those user names; and in response to the sharer indication including a user name, determining the user associated with that user name as the screen sharer. Thus, the screen sharer intended by the speaker can be accurately identified from the plurality of participants, so that no participant needs to be manually designated and the screen sharer need not manually initiate the sharing, which effectively improves user experience and communication efficiency.
In some embodiments, determining the screen sharer includes: determining, based on the user names respectively associated with the plurality of users, whether the sharer indication includes a user abbreviation associated with a user name; in response to the sharer indication containing a user abbreviation, determining the user name with the highest correlation with that abbreviation; and determining the user associated with the determined user name as the screen sharer. Thus, the screen sharer intended by the speaker can be accurately identified from the plurality of participants, so that sharing can be initiated without the speaker having to pronounce the participant's exact name, which effectively improves user experience and communication efficiency.
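As a toy rendering of this abbreviation-to-name correlation step: the patent does not prescribe a correlation metric, so the sketch below uses Python's standard difflib similarity ratio with an assumed threshold.

```python
import difflib

def match_abbreviation(abbrev: str, user_names: list[str]) -> str | None:
    """Return the user name most correlated with a spoken abbreviation.

    Uses difflib's sequence similarity as a toy correlation measure;
    the 0.3 threshold is an assumption, not from the patent.
    """
    scored = [(difflib.SequenceMatcher(None, abbrev, name).ratio(), name)
              for name in user_names]
    score, best = max(scored)
    return best if score > 0.3 else None

print(match_abbreviation("P", ["Zhang San", "Li Si", "Julie", "Peter"]))
# -> "Peter"
```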
In some embodiments, determining the screen sharer includes: determining whether the sharer indication includes a user characteristic; determining, based on image data or video data respectively generated by the plurality of users, the user with the highest correlation with the user characteristic from the plurality of users; and determining that user as the screen sharer. Thus, the screen sharer intended by the speaker can be accurately identified from the plurality of participants even when the speaker is not familiar with the screen sharer's user name, which effectively improves user experience and communication efficiency.
In some embodiments, identifying the screen sharing intent and the sharer indication includes: converting the voice data generated by the first user into text data; determining, based on a rule set including at least one preset semantic rule, whether the text data matches a preset semantic rule in the rule set; and in response to the text data matching a preset semantic rule in the rule set, identifying that the first user indicates the screen sharing intent and identifying the sharer indication contained in the text data. In some embodiments, a preset semantic rule sequentially includes, in its semantic structure: an indication recognition field, a mood auxiliary word field, an intent recognition field, and a semantic augmentation field, where the indication recognition field contains information representing the sharer indication, the intent recognition field contains information for recognizing a screen sharing intent, the mood auxiliary word field contains information connecting the indication recognition field and the intent recognition field, and the semantic augmentation field contains information for fuzzy matching the text data against the preset semantic rule. In this way, whether the speaker indicates the screen sharing intent, and which user is intended, can be accurately identified, effectively improving user experience while improving communication efficiency and processing efficiency. A minimal regular-expression rendering of one such rule is sketched below.
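The sketch uses toy English field contents (the patent's examples are Chinese utterances), so all patterns here are assumptions; only the four-field structure comes from the text above.

```python
import re

# One preset semantic rule rendered as a regular expression, following the
# four-field structure: indication / mood auxiliary word / intent / augmentation.
FIRST_PERSON = r"(?:I|me|we)"          # indication recognition field
MOOD_AUX     = r"(?:'ll| will| to)?"   # mood auxiliary word field (optional)
INTENT       = r"\s*share"             # intent recognition field
AUGMENT      = r".*"                   # semantic augmentation (fuzzy match)

RULE_310_1 = re.compile(FIRST_PERSON + MOOD_AUX + INTENT + AUGMENT,
                        re.IGNORECASE)

def matches_rule(text: str) -> bool:
    return RULE_310_1.search(text) is not None

assert matches_rule("I'll share my screen")
assert not matches_rule("let's go home today")
```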
In some embodiments, identifying the screen sharing intent and the sharer indication includes: converting the voice data generated by the first user into text data; and identifying the screen sharing intent and the sharer indication using a neural network model based on the text data, where the neural network model is trained based on a mapping relationship between text data and screen sharing intents and sharer indications. In some embodiments, identifying the screen sharing intent and the sharer indication using the neural network model includes: using a classification algorithm, classifying the text data with respect to the screen sharing intent as either indicating or not indicating a screen sharing intent, and classifying it with respect to the sharer indication as at least one of: presence of a sharer indication related to a personal pronoun, presence of a sharer indication related to a user, presence of a sharer indication related to a user characteristic, and absence of a sharer indication; and in response to the text data being classified as indicating a screen sharing intent and as having at least one of a sharer indication related to a personal pronoun, a sharer indication related to a user, and a sharer indication related to a user characteristic, identifying that the first user indicates the screen sharing intent and identifying the sharer indication contained in the text data, so as to determine the screen sharer from the plurality of users based on the first user and the determined sharer indication. Here, the sharer indication related to a personal pronoun includes at least one of information related to a first-person pronoun, information related to a second-person pronoun, information related to a third-person pronoun, and information related to an interrogative pronoun, and the sharer indication related to a user includes at least one of information related to a user name and information related to a user abbreviation. Therefore, using a neural network model trained on prior big data, whether the speaker indicates a screen sharing intent, and which user is intended, can be accurately identified, effectively improving user experience while improving communication efficiency and processing efficiency.
In some embodiments, the screen sharing intent includes a positive screen sharing intent and a negative screen sharing intent, and identifying the screen sharing intent and the sharer indication includes: identifying, based on the voice data, whether the screen sharing intent is a positive or a negative screen sharing intent; in response to identifying that the first user indicates the positive screen sharing intent, determining a screen sharer from the plurality of users based on the first user and the identified sharer indication, and generating an instruction for the screen sharer to share a screen during the interaction; and in response to identifying that the first user indicates the negative screen sharing intent, determining, from the plurality of users based on the first user and the identified sharer indication, a sharer who is to stop sharing, and generating an instruction for that sharer to stop sharing the screen. In this way, the participant who is intended to stop sharing can also be identified in addition to the screen sharer intended by the user, so that the screen sharing forms an effective closed loop.
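A toy dispatch of this closed loop, mapping the recognized intent polarity to a start or stop instruction; the instruction format is an assumption.

```python
def make_instruction(polarity: str, sharer: str) -> dict:
    """Map positive/negative screen sharing intent to an instruction."""
    if polarity == "positive":
        return {"action": "start_sharing", "sharer": sharer}
    if polarity == "negative":
        return {"action": "stop_sharing", "sharer": sharer}
    raise ValueError(f"unknown intent polarity: {polarity}")

# e.g. a recognized negative intent aimed at Li Si:
print(make_instruction("negative", "Li Si"))
# -> {'action': 'stop_sharing', 'sharer': 'Li Si'}
```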
In some embodiments, the method further comprises: based on the instruction for the screen sharer to share a screen during the interaction, sending a request for initiating screen sharing to a user device of the screen sharer; and in response to the screen sharer initiating screen sharing, transmitting content presented on the screen of the screen sharer's user device to the other users of the plurality of users. Therefore, the content presented on the screen sharer's screen can be pushed to the other participants automatically, improving communication efficiency and processing efficiency.
In some embodiments, before sending the request for initiating screen sharing to the screen sharer, the method further includes: when a user different from the screen sharer is currently sharing a screen, sending, to the user with screen sharing control authority, a request to allow the screen sharer to initiate screen sharing; and in response to the user with screen sharing control authority allowing the screen sharer to initiate screen sharing, sending the request for initiating screen sharing to the screen sharer. Therefore, after confirmation by the host with conference control authority, the content presented on the screen sharer's screen can be rapidly pushed to the other participants, improving communication efficiency and processing efficiency while maintaining conference order.
In some embodiments, the method further comprises: stopping the screen sharing by the user different from the screen sharer in response to the user with screen sharing control authority allowing the screen sharer to initiate screen sharing, or in response to the screen sharer initiating screen sharing. In this way, conflicts between automatically initiated screen sharing and a participant currently sharing a screen can be effectively avoided, as in the sketch below.
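The three preceding paragraphs describe a request flow: ask the host for permission when someone else is already sharing, stop the current share, then request the new sharer's device to start. A minimal sketch under those assumptions, with every function name hypothetical:

```python
def stop_sharing(user):
    print(f"stop sharing: {user}")

def send_start_request(user):
    print(f"request {user} to start sharing")
    return {"action": "start_sharing", "sharer": user}

def request_sharing(sharer, current_sharer, host_approves):
    # If someone else is already sharing, ask the host first, then stop
    # the current share to avoid conflicting shares.
    if current_sharer is not None and current_sharer != sharer:
        if not host_approves(sharer):
            return None                      # host denied the request
        stop_sharing(current_sharer)
    return send_start_request(sharer)        # ask sharer's device to share

request_sharing("Peter", "Li Si", host_approves=lambda u: True)
```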
In some embodiments, the method further comprises: in response to the plurality of users participating in an online video conference, obtaining a user name associated with each of the plurality of users; in response to a particular user of the plurality of users speaking, obtaining the voice data generated by that user and associating that user's name with the voice data; and determining a screen sharer based on the voice data associated with the particular user and the user names associated with the plurality of users. Thus, screen sharing can be controlled centrally by deployment on a server, effectively improving processing efficiency.
In some embodiments, the method is performed by the user device of the first user, and the method further comprises: obtaining a user name associated with each of a plurality of users participating in an online video conference; acquiring the voice data generated by the first user and associating the first user's name with that voice data; and determining a screen sharer based on the voice data associated with the first user and the user names associated with the plurality of users. Therefore, the screen sharing intent of the local user can be identified by deployment on the user device, without the voice data having to be transmitted to a server, effectively improving processing efficiency.
In some embodiments, the screen sharing includes presenting all or part of the content displayed on the screen of the screen sharer's user device to at least one of: user devices of other users interacting with the screen sharer, other electronic devices connected to the screen sharer's user device, and applications associated with the screen sharer's user device. When presented to the user devices of other users interacting with the screen sharer, other participants who communicate or interact with the screen sharer can observe the screen sharer's screen content, effectively sharing information remotely. When presented on other electronic devices connected to the screen sharer's user device, cross-screen or multi-screen display such as device screen casting can be realized, enabling smooth multi-dimensional interaction of information. When presented in an application associated with the screen sharer's user device, data generated and displayed by one application can, for example, be synchronized to other applications, realizing multi-platform screen sharing.
According to a second aspect of the present disclosure, a screen sharing control device is provided. The device comprises: a data acquisition module that acquires voice data generated by a first user of a plurality of users during an interaction among the plurality of users; a sharing identification module that identifies a screen sharing intent and a sharer indication based on the voice data; a user determination module that determines, in response to identifying that the first user indicates the screen sharing intent, a screen sharer from the plurality of users based on the first user and the identified sharer indication; and an instruction generation module that generates an instruction for the screen sharer to share a screen during the interaction.
According to a third aspect of the present disclosure, a computing device is provided. The computing device includes: a processor; and a memory storing instructions that, when executed by the processor, cause the computing device to perform the method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when executed by a computing device, cause the computing device to perform the method according to the first aspect of the present disclosure. Computer-readable storage media include, but are not limited to, volatile memory (e.g., random access memory) and non-volatile memory (e.g., flash memory, a hard disk drive (Hard Disk Drive, HDD), a solid state drive (Solid State Drive, SSD), and the like).
According to a fifth aspect of the present disclosure, a computer program product is provided. The computer program product comprises instructions that, when executed by a computing device, cause the computing device to perform the method according to the first aspect of the present disclosure. In some embodiments, the program product may comprise one or more software installation packages that may be downloaded or copied and executed on the computing device when the method provided by the foregoing first aspect, or a possible variation thereof, needs to be used.
Drawings
FIG. 1A illustrates an example diagram of a screen sharing scenario according to an embodiment of the present disclosure;
FIG. 1B illustrates an example diagram of an interface for multi-person interaction according to an embodiment of the present disclosure;
FIG. 2A illustrates an example flowchart of a screen sharing control method according to an embodiment of the present disclosure;
FIGS. 2B-2D illustrate example diagrams of interfaces for multi-person interaction in different states according to embodiments of the present disclosure;
FIG. 3A illustrates a flowchart of an example of identification of a screen sharing intent and a sharer indication according to an embodiment of the present disclosure;
FIG. 3B shows a schematic diagram of an example rule set according to an embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of another example of identification of a screen sharing intent and a sharer indication according to an embodiment of the present disclosure;
FIG. 5A illustrates an example diagram of another screen sharing scenario according to an embodiment of the present disclosure;
FIG. 5B illustrates an example flowchart of another screen sharing control method according to an embodiment of the present disclosure;
FIG. 6A illustrates an example diagram of a specific implementation according to an embodiment of the present disclosure;
FIG. 6B illustrates an example diagram of another specific implementation according to an embodiment of the present disclosure;
FIG. 6C illustrates an example diagram of an interface in another specific implementation according to an embodiment of the present disclosure;
FIG. 7 illustrates a schematic block diagram of a screen sharing control device according to some embodiments of the present disclosure; and
FIG. 8 shows a schematic block diagram of an example device for implementing example implementations of the present disclosure.
Detailed Description
Preferred implementations of the present disclosure will be described in more detail below with reference to the accompanying drawings. While example embodiments of the present disclosure are illustrated in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The term "comprising" and variations thereof as used herein means open ended, i.e., "including but not limited to. The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "embodiment" and "some embodiments" each denote "at least some embodiments. Other explicit and implicit definitions are also possible below.
As described above, although numerous applications in various fields of multi-person interaction are developing rapidly, most of them focus on optimizing the efficiency or quality of communication transmission, and further intelligence and convenience in the processes surrounding multi-person interaction are still lacking. With growing numbers of users and diverse user demands, how to handle interactions among multiple users so as to meet user demands more accurately and communicate more efficiently has become a subject requiring attention in multi-user online interaction scenarios.
For multi-user online interaction scenarios, the inventors of the present disclosure noted that screen sharing operations are frequently used for presentation and communication among multiple users. For example, in an online video conference in which multiple users participate, all or part of the content presented on a user's device is transmitted through screen sharing to the other users participating in the conference, which can greatly enhance conference communication efficiency. In a first implementation of screen sharing, sharing is actively initiated by a user's manual click. For example, in a video conference with multiple participants, a user who wishes to share a screen manually clicks a sharing button displayed on the user interface of the video conference application on his or her user device. The user then temporarily obtains the screen sharing right in the video conference and can transmit the local content displayed on the screen of his or her device to the other users, who can view the transmitted content while the sharer explains it. In a second implementation, at the moment a user joins the online video conference, a confirmation window asking whether screen sharing is required may be popped up for the newly joined user by default. If the newly joined user clicks the button indicating that screen sharing is needed, the user temporarily obtains the screen sharing right in the video conference and can transmit the local content displayed on the screen of his or her device to the other users; if the user clicks the button rejecting screen sharing, the user does not obtain the screen sharing right. Later in the video conference, if the user needs screen sharing, it is actively initiated by a manual click in the manner of the first implementation described above.
However, the inventors of the present disclosure noted that both of the above implementations are cumbersome and inefficient in a multi-person interaction scenario. In the first implementation, the user who wants to share the screen must manually click the sharing button on the user interface of the video conference application; yet, to avoid accidental touches during an actual online video conference, the sharing button is often hidden or placed as a sub-function at a lower level among many functional modules, so it is difficult for the user to find the identifier or icon of the sharing button serving as the screen sharing entry, and screen sharing may remain unachieved for a long time. This difficulty in finding the sharing button is particularly acute for novice users unfamiliar with a specific video conference application. Moreover, even users familiar with a specific video conference application may momentarily fail to locate the screen sharing entry, for example because of nervousness during the online video conference. In the second implementation, a user newly joining the online conference usually has no screen sharing need at the beginning, so most users must immediately reject the sharing prompt, which makes the user experience cumbersome and adds useless processing load on the application and its backend server. Further, later in the video conference, the user still needs to find the screen sharing entry of the specific video conference application, so the same problem as in the first implementation remains. Therefore, according to the observation and study of the inventors of the present disclosure, the screen sharing process in multi-user online interaction scenarios mainly suffers from problems such as a hard-to-find screen sharing entry, cumbersome sharing operations, and a high error rate, which reduce user experience and communication efficiency while causing much useless processing and resource loss on the application and its server.
To this end, embodiments of the present disclosure provide a screen sharing control scheme. In this scheme, voice data generated by a first user of a plurality of users during an interaction among the plurality of users is acquired; a screen sharing intent and a sharer indication are identified based on the voice data; in response to identifying that the first user indicates the screen sharing intent, a screen sharer is determined from the plurality of users based on the first user and the identified sharer indication; and an instruction for the screen sharer to share a screen during the interaction is generated. With this scheme, unnecessary complicated operations can be omitted during multi-user interaction, screen sharing can be achieved rapidly and accurately, and user experience, communication efficiency, and processing efficiency are all effectively improved.
FIG. 1A illustrates an example diagram of a screen sharing scenario, and FIG. 1B illustrates an example diagram of an interface for multi-person interaction, according to an embodiment of the present disclosure. The example screen sharing scenario 100 includes multiple user devices 110-1, 110-2, 110-3, …, 110-N (hereinafter sometimes collectively referred to as user devices 110) participating in the same online video conference, with each user device associated with the user name of the user holding it. For example, as shown in FIGS. 1A and 1B, in the example screen sharing scenario 100, user device 110-1 is held by user 1 and is associated with that user's name "Zhang San"; user device 110-2 is held by user 2 and is associated with the user name "Li Si"; user device 110-3 is held by user 3 and is associated with the user name "Julie"; and user device 110-N is held by user N and is associated with the user name "Peter". In the example screen sharing scenario 100, each user device may further include a video conference module 111, a data acquisition module 112, a data transmission module 113, and a data output module 114. In some embodiments, the video conference module 111 is installed as a video conference application, through which a user may participate in an online video conference. In embodiments of the present disclosure, the video conference module 111 can acquire, through the data acquisition module 112, voice data generated when a user speaks, or voice data stored locally or generated internally by the user device 110, to be transmitted through the data transmission module 113 to the server 120 described later; it can also convert voice data received from the server 120 into sound and output it to the user through the data output module 114. In some embodiments, the video conference module 111 can likewise acquire, through the data acquisition module 112, image data or video data captured by a camera of the user device 110, or image or video data stored locally or generated internally by the user device 110, to be transmitted to the server 120 through the data transmission module 113, or display image or video data received from the server 120 to the user through the data output module 114. In embodiments of the present disclosure, the video conference module 111 can implement the functions and processes related to the video conference, including the screen sharing control process, without particular limitation. In embodiments of the present disclosure, the user device 110 may be any device capable of implementing part or all of the embodiments of the present disclosure, such as a desktop computer, a laptop computer, a tablet computer, a display device, a camera, a printer, or a smartphone.
In the example screen sharing scenario 100 shown in FIG. 1A, a server 120 for centralized control of the video conference is also included. In an embodiment of the present disclosure, the server 120 includes a speech recognition engine 121, an intent recognition engine 122, and a conference control system 123 for controlling the video conference process. In the embodiment of the present disclosure, the speech recognition engine 121 performs speech recognition processing on voice data acquired from the user devices 110, and the intent recognition engine 122 performs recognition of the screen sharing intent based on the speech recognition results of the speech recognition engine 121. In some embodiments, the intent recognition engine 122 may include at least one of a rule matching module 1221 and a model recognition module 1222, and further includes an object determination module 1223. In an embodiment of the present disclosure, the conference control system 123 includes a data acquisition module 1231, a data processing module 1232, a data transmission module 1233, and a data sharing module 1234. The conference control system 123 performs data processing based on the instruction about the screen sharer obtained from the intent recognition engine 122, so that the screen sharer realizes screen sharing.
FIG. 2A illustrates an example flowchart of a screen sharing control method according to an embodiment of the present disclosure, and FIGS. 2B to 2D illustrate example diagrams of interfaces for multi-person interaction in different states according to embodiments of the present disclosure. At block 201, the server 120 obtains a user name associated with each of a plurality of participants in the video conference. In some embodiments, the server 120 builds, from the user names of the participants of the same video conference, a user list that includes at least the user name of each participant in that online video conference. In the specific example of the screen sharing scenario shown in FIGS. 1A and 1B, the user list includes the user name "Zhang San" associated with user device 110-1, the user name "Li Si" associated with user device 110-2, the user name "Julie" associated with user device 110-3, …, and the user name "Peter" associated with user device 110-N. In some embodiments, the user list may also include other information related to the user and/or the user device for identifying a specific participant, such as an identifier of the user device, a user account identifier of the user holding the device, the user's gender, the user's age, and so on. It should be appreciated that the server 120 need not necessarily obtain the participant's user name, as long as the information can identify the specific participant. In some embodiments, the server 120 may acquire, through the data acquisition module 1231, the voice data generated by each participant and associate each participant's user name with the voice data that participant generates. In some embodiments, the server 120 may acquire, through the data acquisition module 1231, the voice data generated by a talking participant during the video conference and associate that participant's user name with the voice data, so as to determine a screen sharer based on the voice data associated with the talking participant and the user names associated with the participants. In embodiments of the present disclosure, the server 120 may be any computer, such as an application server or a file server, or in some cases a user device, as long as it can implement part or all of the embodiments of the present disclosure. In embodiments of the present disclosure, the server 120 may also identify a participant based on voice data or image or video data instead of acquiring information such as a user name.
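For illustration, the user list described at block 201 might take a shape like the following; the field names are assumptions, not taken from the patent.

```python
# A plausible shape for the user list built at block 201.
user_list = [
    {"device_id": "110-1", "user_name": "Zhang San", "account": "u001"},
    {"device_id": "110-2", "user_name": "Li Si",     "account": "u002"},
    {"device_id": "110-3", "user_name": "Julie",     "account": "u003"},
    {"device_id": "110-N", "user_name": "Peter",     "account": "u00N"},
]
```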
At block 203, the speech recognition engine 121 obtains the voice data generated by a talking participant among the plurality of participants during the video conference. In some embodiments, the server 120 obtains the participants' voice data in real time from their respective user devices 110. In some embodiments, in response to a user participating in the video conference, the video conference module 111 acquires, through the data acquisition module 112, the voice data generated when the user speaks, or voice data stored locally or generated internally by the user device 110, and transmits it through the data transmission module 113 to the server 120, so that the server 120 receives and acquires the participant's voice data. In some embodiments, the video conference module 111 may start acquiring voice data once the user joins the video conference, upon a control instruction from the conference control system 123 of the server 120, or upon an operation instruction from the user. In an example of the present disclosure, the server 120 may in this way acquire the voice data generated by each participant during the video conference and input it to the speech recognition engine 121 for speech recognition processing.
At block 205, the speech recognition engine 121 performs speech recognition processing on the voice data generated by the talking participant. In some embodiments, the speech recognition engine 121 uses automatic speech recognition (Automatic Speech Recognition, ASR) technology to convert the voice data into text data. In some embodiments, the automatic speech recognition technique may employ natural language processing (Natural Language Processing, NLP), hidden Markov models (Hidden Markov Model, HMM), N-gram models, artificial neural networks (Artificial Neural Network, ANN), and the like, without limitation, as long as the voice data can be converted into text data suitable for processing by the intent recognition engine 122. In a further embodiment, the speech recognition engine 121 associates the text data converted from the voice data with the participant who generated the voice data and sends the text data, so associated, to the intent recognition engine 122. In some embodiments, the speech recognition engine 121 associates the text data with the user name of the participant. Here, the speech recognition engine 121 may associate the text data with any information that distinguishes the participants of the video conference, for example at least one of an identifier of the user device, a user account identifier of the user holding the device, the user's gender, the user's age, and so on.
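As a concrete illustration of block 205, the sketch below converts a recorded utterance to text and tags it with the speaker's user name. It uses the third-party SpeechRecognition package with its Google Web Speech backend purely as one example ASR choice; the patent does not prescribe any particular engine, and the file path and return shape are assumptions.

```python
import speech_recognition as sr  # third-party "SpeechRecognition" package

def speech_to_text(wav_path: str, speaker_name: str) -> dict:
    """Convert one participant's voice data to text and associate the
    result with the speaker, as block 205 describes."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)        # read the whole recording
    text = recognizer.recognize_google(audio)    # example ASR backend
    return {"user_name": speaker_name, "text": text}

# e.g. speech_to_text("utterance.wav", "Zhang San")
# -> {"user_name": "Zhang San", "text": "I'll share it"}
```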
At block 207, the intent recognition engine 122 recognizes the screen sharing intent and the sharer indication based on the voice data generated by the talking participant. In some embodiments, the intent recognition engine 122 recognizes, from the text data converted by the speech recognition engine 121, whether a screen sharing intent and a sharer indication are contained in the text data. In embodiments of the present disclosure, the term "screen sharing intent" indicates that the talking participant intends to instruct a user to share a screen, and the term "sharer indication" denotes information from which the intent recognition engine 122 can determine which user the talking participant intends to have share a screen. It should be appreciated that in the embodiments of the present disclosure, the screen sharing intent and the sharer indication are recognized by converting the voice data into text data and operating on the converted text, but recognition may also be performed directly on the acoustic patterns of the voice data without conversion.
At block 209, the intent recognition engine 122, in response to identifying that the talking participant user is indicating a screen sharing intent, determines a screen sharer from the plurality of participant users based on the talking participant user and the identified sharer indication. In embodiments of the present disclosure, the term "screen sharer" refers to a participant user determined by the intent recognition engine 122 to be provided with screen sharing rights.
At block 211, the object determination module 1223 generates instructions for the screen sharer to share the screen during the video conference. In an embodiment of the present disclosure, the object determination module 1223 sends the generated instructions to the conference control system 123.
At block 213, the conference control system 123, based on the instruction generated by the object determination module 1223 for the screen sharer to share a screen during the video conference, determines the screen sharer indicated in the instruction and causes the screen of the screen sharer's user device to be shared.
Fig. 3A illustrates a flowchart of an example of identification of screen sharing intent and sharer indication according to an embodiment of the present disclosure. In an embodiment of the present disclosure, the intent recognition engine 122 further includes a rule matching module 1221. At block 301, the intent recognition engine 122 determines, based on a rule set including at least one preset semantic rule, whether text data converted by the speech recognition engine 121 matches a preset semantic rule in the rule set. At block 303, the intent recognition engine 122, in response to the text data matching a preset semantic rule in the rule set, recognizes that the speaking participant user indicates a screen sharing intent and recognizes a sharer indication contained in the text data. Otherwise, the intent recognition engine 122 determines not to initiate screen sharing in response to the text data not matching a preset semantic rule in the rule set.
FIG. 3B shows a schematic diagram of an example rule set according to an embodiment of the present disclosure. In an embodiment of the present disclosure, rule set 300 includes a plurality of preset semantic rules 310-1, 310-2, 310-3, …, 310-N, and each preset semantic rule sequentially comprises, in its semantic structure: an indication recognition field, a mood auxiliary word field, an intent recognition field, and a semantic augmentation field, following, for example, the structure "[indication recognition field][mood auxiliary word field][intent recognition field][semantic augmentation field]". In an embodiment of the present disclosure, the indication recognition field contains information representing a sharer indication: information about a first-person pronoun, such as "I", "me", and "we"; information about a second-person pronoun, such as "you" and "your"; information about a third-person pronoun, such as "he", "she", "it", "they", and "those"; information about an interrogative pronoun, such as "who" and "which"; user names respectively associated with the participants of the video conference, such as "Zhang San", "Li Si", and "Julie" shown in FIG. 1B; and user abbreviations associated with those user names, such as "Zhang Ge" and "San Ge" associated with "Zhang San", "Lao Li" and "Xiao Li" associated with "Li Si", "Xiao Zhu" associated with "Julie", and "P" associated with "Peter". In some embodiments, the indication recognition field may be a user dictionary of information indicating the participants, which may vary with the participants of the video conference. In embodiments of the present disclosure, the intent recognition field contains information for recognizing a screen sharing intent, such as "share" and "show". The mood auxiliary word field contains information for connecting the indication recognition field with the intent recognition field, such as "come", "go", "want", and "can"; in some embodiments, the mood auxiliary word field may be empty. The semantic augmentation field contains information for fuzzy matching the text data against the preset semantic rule; in some embodiments, the semantic augmentation field may be empty or may be implemented as a wildcard asterisk (*). In some embodiments, the preset semantic rules may follow the rules of regular expressions, and the indication recognition field, the mood auxiliary word field, the intent recognition field, and the semantic augmentation field may each be implemented as a character, a string, or a character cluster.
In the example of rule set 300 shown in FIG. 3B, preset semantic rule 310-1 is, for example, "[indication recognition field: first-person pronoun][mood auxiliary word field][intent recognition field: share][semantic augmentation field: *]"; preset semantic rule 310-2 is, for example, "[indication recognition field: second-person pronoun][mood auxiliary word field][intent recognition field: share screen][semantic augmentation field: *]"; and preset semantic rule 310-3 is, for example, "[indication recognition field: user name][mood auxiliary word field][intent recognition field: share][semantic augmentation field: ?]". As one example, the talking participant is the participant "Zhang San" shown in FIG. 1B, who speaks the phrase "I'll share it"; the voice data he produces is converted into text data by the speech recognition engine 121 and sent to the intent recognition engine 122. The intent recognition engine 122 determines, based on the rule set 300, whether the character string "I'll share it" contained in the converted text data matches a preset semantic rule in the rule set 300. In this case, the intent recognition engine 122 determines that the string matches preset semantic rule 310-1, "[indication recognition field: first-person pronoun][mood auxiliary word field][intent recognition field: share][semantic augmentation field: *]". In response to the text data matching the preset semantic rule 310-1 in the rule set 300, the intent recognition engine 122 determines that the participant "Zhang San" indicates a screen sharing intent (intent recognition field: "share") and determines the sharer indication contained in the text data (indication recognition field: first-person pronoun "I"). As another example, the talking participant "Zhang San" speaks the phrase "let's go home today"; the voice data is likewise converted into text data and sent to the intent recognition engine 122, which determines that the string "let's go home today" does not match any rule in the rule set 300. In response, the intent recognition engine 122 determines not to initiate screen sharing. According to the embodiments of the present disclosure, whether the speaker indicates a screen sharing intent, and which user is intended, can be accurately identified, effectively improving user experience while improving communication efficiency and processing efficiency. A small self-contained sketch of this rule matching loop follows.
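Below is a minimal sketch of the matching loop of blocks 301 and 303, with two toy English rules standing in for rules 310-1 and 310-2; the exact patterns and the captured indication group are illustrative assumptions.

```python
import re

# Toy preset semantic rules following the four-field structure above.
RULE_SET = [
    re.compile(r"(?P<ind>I|me)\s*(?:'ll|will)?\s*share.*", re.IGNORECASE),
    re.compile(r"(?P<ind>you)\s*(?:can|could)?\s*share.*", re.IGNORECASE),
]

def recognize_by_rules(text: str, speaker: str) -> dict:
    for rule in RULE_SET:
        m = rule.search(text)
        if m:
            # Matched: the speaker indicates a screen sharing intent, and the
            # indication recognition field yields the sharer indication.
            return {"intent": True, "speaker": speaker,
                    "sharer_indication": m.group("ind")}
    return {"intent": False}   # no rule matched: do not initiate sharing

print(recognize_by_rules("I'll share my screen", "Zhang San"))
# -> {'intent': True, 'speaker': 'Zhang San', 'sharer_indication': 'I'}
```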
FIG. 4 illustrates a flowchart of another example of identification of a screen sharing intent and a sharer indication according to an embodiment of the present disclosure. In an embodiment of the present disclosure, the intent recognition engine 122 also includes a model recognition module 1222. At block 401, the intent recognition engine 122 performs recognition using the neural network model of the model recognition module 1222 to recognize the screen sharing intent and the sharer indication based on the text data converted by the speech recognition engine 121. In an embodiment of the present disclosure, the neural network model of the model recognition module 1222 is trained based on the mapping relationship between text data and screen sharing intents and sharer indications. In some embodiments, the model recognition module 1222 of the intent recognition engine 122 classifies the text data with respect to the screen sharing intent and the sharer indication using a classification algorithm. In an embodiment of the present disclosure, the screen sharing intent is classified as either indicating or not indicating a screen sharing intent, and the sharer indication is classified as at least one of: presence of a sharer indication related to a personal pronoun, presence of a sharer indication related to a user name, presence of a sharer indication related to a user characteristic, and absence of a sharer indication. At block 403, the model recognition module 1222 identifies the screen sharing intent and the sharer indication based on the results of the classification. For example, in response to the text data being classified as indicating a screen sharing intent and as having at least one of a sharer indication related to a personal pronoun, a sharer indication related to a user, and a sharer indication related to a user characteristic, the model recognition module 1222 identifies that the talking participant indicates the screen sharing intent and identifies the sharer indication contained in the text data, so as to determine the screen sharer from the plurality of participants based on the talking participant and the determined sharer indication. In some embodiments, the sharer indication related to a personal pronoun includes at least one of information related to a first-person pronoun, information related to a second-person pronoun, information related to a third-person pronoun, and information related to an interrogative pronoun, and the sharer indication related to a user includes at least one of information related to a user name and information related to a user abbreviation. Otherwise, in response to the text data being classified as not indicating a screen sharing intent and/or as not having any of the above sharer indications, the model recognition module 1222 determines not to initiate screen sharing.
In an embodiment of the present disclosure, the neural network model of the model recognition module 1222 may be based on an algorithm as shown in formula (1) below. In formula (1), y1 represents the screen sharing intent, y2 represents the sharer indication, x1 represents the text data, and f1() represents the mapping relationship between the text data and the screen sharing intent and sharer indication. In some embodiments, f1() may be trained by annotating a large number of text data samples with screen sharing intent labels and sharer indication labels. In some embodiments, the neural network model may be implemented using a convolutional neural network (Convolutional Neural Network, CNN), for example a Visual Geometry Group network (Visual Geometry Group Network, VGGNet), a residual network (ResNet), or a feature pyramid network (Feature Pyramid Network, FPNet). In some embodiments, the neural network model may also be implemented using a recurrent neural network (Recurrent Neural Network, RNN), such as a long short-term memory (Long Short-Term Memory, LSTM) model, a bidirectional long short-term memory (Bidirectional Long Short-Term Memory, BiLSTM) model, or BiLSTM-CRF. It should be appreciated that the specific implementation of the neural network model is not limited, as long as it is a model suitable for classification based on the mapping relationship between text data and the screen sharing intent and sharer indication.
(y1, y2) = f1(x1)    Formula (1)
As one example, the talking participant is the participant "Zhang San" shown in FIG. 1B, who speaks the phrase "I'll share it"; the voice data he produces is converted into text data by the speech recognition engine 121 and sent to the intent recognition engine 122. The intent recognition engine 122 classifies the text data "I'll share it" using the neural network model of the model recognition module 1222 based on formula (1). For this text, the model recognition module 1222 classifies y1 as indicating a screen sharing intent and classifies y2 as having a sharer indication related to a personal pronoun (a first-person pronoun). In response, the intent recognition engine 122 determines that a screen sharing intent is indicated and determines the sharer indication y2 as the first-person pronoun "I". As another example, the talking participant "Zhang San" speaks the phrase "let's go home today"; the resulting text data is classified by the model recognition module 1222 as y1 not indicating a screen sharing intent and y2 having no sharer indication. In response, the intent recognition engine 122 determines not to initiate screen sharing. According to the embodiments of the present disclosure, using a neural network model trained on prior big data, whether the speaker indicates a screen sharing intent, and which user is intended, can be accurately identified, effectively improving user experience while improving communication efficiency and processing efficiency. A minimal sketch of such a two-head classification model is given below.
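The following renders formula (1) as a two-head text classifier: a shared BiLSTM encoder f1 with one head for y1 (screen sharing intent) and one head for y2 (sharer indication class). It is written in PyTorch; the architecture, vocabulary size, and class counts are illustrative assumptions, since the patent allows any suitable CNN or RNN variant.

```python
import torch
import torch.nn as nn

class IntentSharerModel(nn.Module):
    """Sketch of (y1, y2) = f1(x1): shared encoder, two classification heads."""
    def __init__(self, vocab_size=10000, embed_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, bidirectional=True,
                               batch_first=True)
        self.intent_head = nn.Linear(2 * hidden, 2)  # y1: share / not share
        self.sharer_head = nn.Linear(2 * hidden, 4)  # y2: pronoun / user name /
                                                     #     user characteristic / none

    def forward(self, token_ids):                    # x1: tokenized text
        h, _ = self.encoder(self.embed(token_ids))
        pooled = h.mean(dim=1)                       # simple mean pooling
        return self.intent_head(pooled), self.sharer_head(pooled)

model = IntentSharerModel()
y1_logits, y2_logits = model(torch.randint(0, 10000, (1, 6)))
```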
It should be appreciated that the intent recognition engine 122 may include one of the rule matching module 1221 and the model recognition module 1222, or may include both. Where both modules are included, only one of them may be enabled to determine the screen sharing intent and the sharer indication, one may be enabled to determine the screen sharing intent while the other determines the sharer indication, or both may be enabled for cross-validation.
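A minimal sketch of how the two modules might be combined is given below; the match() and classify() interfaces are hypothetical placeholders for the rule matching module 1221 and the model recognition module 1222, each assumed to return a (has_intent, sharer_indication) pair.

```python
def recognize(text, rule_module=None, model_module=None):
    """Combine whichever recognition modules are enabled (both interfaces
    are hypothetical stand-ins for modules 1221 and 1222)."""
    results = []
    if rule_module is not None:
        results.append(rule_module.match(text))
    if model_module is not None:
        results.append(model_module.classify(text))
    if not results:
        return False, None
    if len(results) == 2:
        # Cross-validation: require both modules to agree that an intent
        # is indicated; prefer the rule module's indication if present.
        (intent_a, ind_a), (intent_b, ind_b) = results
        return intent_a and intent_b, ind_a if ind_a is not None else ind_b
    return results[0]
```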
As described above, in embodiments of the present disclosure, the intent recognition engine 122, in response to recognizing that the speaking participant user indicates a screen sharing intent, determines a screen sharer from among the plurality of participant users based on the speaking participant user and the recognized sharer indication. For example, in an embodiment of the present disclosure, the intent recognition engine 122 further includes an object determination module 1223. In examples of the present disclosure, the object determination module 1223 of the intent recognition engine 122 may determine the screen sharer based on information related to a person pronoun. In some examples, the object determination module 1223 may determine the speaking participant user as the screen sharer in response to the sharer indication representing information related to the first person, including, for example, "I", "me", "we", and the like. As a specific example, for the utterance "let me share" spoken by the speaking participant user "Zhang San", the intent recognition engine 122 recognizes at block 207 that a screen sharing intent is indicated and recognizes the sharer indication as "me". In response, the intent recognition engine 122 determines the participant user "Zhang San" (i.e., the current-round speaker) as the screen sharer, based on the participant user "Zhang San" being the one speaking and the sharer indication "me" representing information related to the first person.
In some examples, the object determination module 1223 may determine, as the screen sharer, the participant user who generated voice data immediately before the speaking participant user, in response to the sharer indication representing information related to the second person, including, for example, "you", "your", and the like. As a specific example, the participant user "Li Si" speaks before the speaking participant user "Zhang San". For the utterance "you share it" spoken by the speaking participant user "Zhang San", the intent recognition engine 122 recognizes at block 207 that a screen sharing intent is indicated and recognizes the sharer indication as "you". In response, the intent recognition engine 122 determines the participant user "Li Si" (i.e., the previous-round speaker), who spoke before the participant user "Zhang San", as the screen sharer, based on the participant user "Zhang San" being the one speaking and the sharer indication "you" representing information related to the second person.
In some examples, the intent recognition engine 122 may determine, as the screen sharer, the user among the plurality of participant users who is semantically most relevant to the sharer indication, in response to the sharer indication representing information related to the third person, including, for example, "he", "she", "it", "they", "those", and the like. As a specific example, for the utterance "let Peter, he should share it" spoken by the speaking participant user "Zhang San", the intent recognition engine 122 recognizes at block 207 that a screen sharing intent is indicated and recognizes the sharer indication as "he". In response, the intent recognition engine 122 determines, as the screen sharer, the participant user "Peter" having the highest semantic relevance to the sharer indication "he" among the plurality of participant users, based on the participant user "Zhang San" being the one speaking and the sharer indication "he" representing information related to the third person. It should be appreciated that in embodiments of the present disclosure, the method of determining semantic relevance is not limited, whether by considering semantic probabilities or by considering semantic spatial relationships, as long as it can be adapted to determine a specific participant user.
In some examples, the intent recognition engine 122 may determine, as the screen sharer, the user among the plurality of participant users who is semantically most relevant to the sharer indication, in response to the sharer indication representing information related to an interrogative pronoun, including, for example, "who", "which one", and the like. As a specific example, the participant user "Li Si" speaks before the speaking participant user "Zhang San". For the utterance "who said they would share just now" spoken by the speaking participant user "Zhang San", the intent recognition engine 122 recognizes at block 207 that a screen sharing intent is indicated and recognizes the sharer indication as "who". In response, the intent recognition engine 122 determines, as the screen sharer, the participant user "Li Si" who is semantically most relevant to the sharer indication "who" among the plurality of participant users, based on the participant user "Zhang San" being the one speaking and the sharer indication "who" representing information related to an interrogative pronoun. It should be appreciated that in embodiments of the present disclosure, the method of determining semantic relevance is not limited, whether by considering semantic probabilities or by considering semantic spatial relationships, as long as it can be adapted to determine a specific participant user.
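The pronoun-based resolution described in the preceding paragraphs can be summarized by the following Python sketch; it is an illustration under assumptions, and the semantic_score helper is a hypothetical stand-in for whatever semantic relevance measure an implementation adopts.

```python
FIRST_PERSON = {"i", "me", "we"}
SECOND_PERSON = {"you", "your"}
THIRD_PERSON = {"he", "she", "it", "they", "those", "him", "her", "them"}
INTERROGATIVE = {"who", "which"}

def resolve_sharer(indication, speaker, previous_speaker,
                   participants, semantic_score):
    """Resolve a pronoun-type sharer indication to a participant user.

    semantic_score(indication, user) is a caller-supplied, hypothetical
    relevance measure (probability- or embedding-based)."""
    word = indication.lower()
    if word in FIRST_PERSON:
        return speaker              # current-round speaker shares
    if word in SECOND_PERSON:
        return previous_speaker     # previous-round speaker shares
    if word in THIRD_PERSON or word in INTERROGATIVE:
        # The most semantically relevant participant shares.
        return max(participants,
                   key=lambda user: semantic_score(indication, user))
    return None                     # not a pronoun-type indication
```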
According to embodiments of the present disclosure, the screen sharer intended by the speaker can be accurately identified with the speaker as the reference point, so that the user does not need to further manually designate a participant user and the screen sharer does not need to manually initiate the screen sharing, thereby effectively improving user experience and communication efficiency.
In examples of the present disclosure, the intent recognition engine 122 may determine the screen sharer based on information related to a user name. In some examples, the intent recognition engine 122 may determine, based on the user names respectively associated with the plurality of participant users, whether the sharer indication contains a user name; and the intent recognition engine 122 determines the participant user associated with that user name as the screen sharer in response to the sharer indication containing the user name. In the example of fig. 1, the user names respectively associated with the plurality of participant users in the present video conference include, for example, "Zhang San", "Li Si", "Julie", "Peter", and the like. As a specific example, for the utterance "Julie, share it" spoken by the speaking participant user "Zhang San", the intent recognition engine 122 recognizes at block 207 that a screen sharing intent is indicated and recognizes the sharer indication as "Julie". In response, the intent recognition engine 122 determines the participant user "Julie" associated with that user name as the screen sharer, based on the user name "Julie" contained in the sharer indication. According to embodiments of the present disclosure, the screen sharer intended by the speaker can be accurately identified from among the plurality of participant users, so that no participant user needs to be further manually designated and the screen sharer does not need to manually initiate the screen sharing, thereby effectively improving user experience and communication efficiency.
In some examples, the intent recognition engine 122 may determine, based on the user names respectively associated with the plurality of participant users, whether the sharer indication contains a user abbreviation associated with a user name; the intent recognition engine 122 determines, in response to the sharer indication containing a user abbreviation, the user name having the highest relevance to that abbreviation; and the intent recognition engine 122 determines the user associated with the determined user name as the screen sharer. In the example of fig. 1, the user abbreviations associated with the user names of the plurality of participant users include, for example, "Zhang Ge" and "Xiao San" associated with "Zhang San", "Lao Li" associated with "Li Si", "Xiao Zhu" associated with "Julie", "P" associated with "Peter", and the like. As a specific example, for the utterance "please have Lao Li share" spoken by the speaking participant user "Zhang San", the intent recognition engine 122 recognizes at block 207 that a screen sharing intent is indicated and recognizes the sharer indication as "Lao Li". In response, the intent recognition engine 122 determines, based on the user abbreviation "Lao Li" contained in the sharer indication, that the user name having the highest relevance to that abbreviation is "Li Si", and determines the participant user "Li Si" as the screen sharer. In embodiments of the present disclosure, the logic for determining the relevance (i.e., the match) between a user abbreviation and a user name may follow character-order matching. In some embodiments, for example, the first character is matched first; if the first character of the user abbreviation matches a user name as a unique result, the match succeeds; otherwise, the second character is matched next, and if adding the second character yields a unique result, the match succeeds; and so on until matching finishes, where at each order a unique result means the match succeeds and the matched user name is determined to have the highest relevance. It should be appreciated that in embodiments of the present disclosure, the method of determining the relevance of a user abbreviation to a user name is not limited, as long as it can be adapted to determine a specific participant user. According to embodiments of the present disclosure, the screen sharer intended by the speaker can be accurately identified from among the plurality of participant users, so that screen sharing can be initiated without the user having to speak the participant user's exact name, thereby effectively improving user experience and communication efficiency.
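As a hedged illustration of the character-order matching just described, the sketch below narrows the candidate names one character at a time; real implementations may instead weight character positions or use edit distance, which this disclosure leaves open.

```python
def match_abbreviation(abbreviation, user_names):
    """Narrow candidate user names character by character until unique."""
    candidates = list(user_names)
    for char in abbreviation:
        narrowed = [name for name in candidates if char in name]
        if len(narrowed) == 1:
            return narrowed[0]   # unique result: match succeeds
        if narrowed:
            candidates = narrowed
        # If no candidate contains this character, keep the previous set.
    return None                  # no unique match after all characters

# e.g. match_abbreviation("Lao Li", ["Zhang San", "Li Si", "Julie", "Peter"])
# returns "Li Si": "L" already narrows the candidates to a single name.
```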
In examples of the present disclosure, the intent recognition engine 122 may determine the screen sharer based on information related to a user characteristic. In some examples, the intent recognition engine 122 may determine whether the sharer indication contains a user characteristic. In response to the sharer indication containing a user characteristic, the intent recognition engine 122 determines, from among the plurality of participant users and based on the image data or video data respectively generated by them, the participant user having the highest relevance to the user characteristic contained in the sharer indication, and determines that participant user as the screen sharer. As a specific example, the participant user "Julie" is a woman wearing red clothing. For the utterance "please have the woman in red share her screen" spoken by the speaking participant user "Zhang San", the intent recognition engine 122 recognizes at block 207 that a screen sharing intent is indicated and recognizes the sharer indication as "the woman in red". In response, the intent recognition engine 122 determines, based on the user characteristic "woman in red" contained in the sharer indication and on the image data or video data respectively generated by the plurality of participant users, the participant user having the highest relevance to that characteristic, and determines the participant user "Julie" as the screen sharer. In examples of the present disclosure, the method of identifying user characteristics from image data or video data and the relevance determination method are not limited, as long as they can be adapted to determine a specific participant user. According to embodiments of the present disclosure, the screen sharer intended by the speaker can be accurately identified from among the plurality of participant users even when the speaker is not familiar with the screen sharer's user name, thereby effectively improving user experience and communication efficiency.
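A sketch of feature-based matching is shown below; the embed_text, embed_image, and similarity helpers are hypothetical (for example a CLIP-style text encoder, image encoder, and cosine similarity), since the disclosure does not prescribe a particular recognition or relevance method.

```python
def match_user_feature(feature_text, participant_frames,
                       embed_text, embed_image, similarity):
    """Pick the participant whose video frame best matches a feature phrase."""
    feature_vector = embed_text(feature_text)       # e.g. "woman in red"
    best_user, best_score = None, float("-inf")
    for user, frame in participant_frames.items():  # user name -> video frame
        score = similarity(feature_vector, embed_image(frame))
        if score > best_score:
            best_user, best_score = user, score
    return best_user
```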
As described above, in embodiments of the present disclosure, the object determination module 1223 generates an instruction for causing the screen sharer to share a screen during the video conference. In an embodiment of the present disclosure, the object determination module 1223 sends the generated instruction to the conference control system 123. As a specific example, where the object determination module 1223 determines that the screen sharer is the participant user "Zhang San", the object determination module 1223 generates an instruction for causing the participant user "Zhang San" to share the screen during the video conference and sends it to the conference control system 123.
As described above, in embodiments of the present disclosure, the conference control system 123 determines, based on the instruction generated by the object determination module 1223 for causing the screen sharer to share a screen during the video conference, the screen sharer indicated in the instruction, so as to share the screen of that user's device. In an embodiment of the present disclosure, the data sharing module 1214 of the conference control system 123 sends a request for initiating screen sharing to the user device of the determined screen sharer, and the screen sharer initiates screen sharing in response to that request. In some embodiments, the request sent by the conference control system 123 may cause the screen sharer's user device to automatically initiate screen sharing, so that the content presented on its screen is sent to the conference control system 123. In this way, the content presented on the screen sharer's screen can be automatically pushed to the other participant users, improving communication efficiency and processing efficiency. In some embodiments, the request sent by the conference control system 123 may be presented on the screen sharer's user device in the form of a pop-up window including "initiate" and "reject" options for screen sharing, as shown in fig. 2B; if the screen sharer confirms the "initiate" option, screen sharing is initiated from the screen sharer's user device and the content presented on its screen is sent to the conference control system 123. In this way, after the screen sharer's confirmation, the content presented on the screen sharer's screen can be pushed quickly to the other participant users, improving communication efficiency and processing efficiency while preserving conference order. In an embodiment of the present disclosure, in response to the screen sharer initiating screen sharing, the conference control system 123 receives through the data acquisition module 1231 the content presented on the screen of the screen sharer's user device, processes it through the data processing module 1212, and transmits it through the data transmission module 1213 to the participant users other than the screen sharer, so that the content presented on the screen of the screen sharer's user device is presented on the screens of the other participant users' devices as shown in fig. 2C.
In some embodiments, before the data sharing module 1214 of the conference control system 123 sends the request for initiating screen sharing to the screen sharer, if the data sharing module 1214 detects that a participant user other than the screen sharer is currently performing screen sharing, the data sharing module 1214 sends a request for allowing the screen sharer to initiate screen sharing to a participant user having control authority over screen sharing (e.g., the moderator of the video conference). In some embodiments, this request may be presented on that user's device in the form of a pop-up window including "allow" and "disallow" options for screen sharing, as shown in fig. 2D; if the participant user with screen sharing control authority confirms the "allow" option, a response allowing the screen sharer to initiate screen sharing is sent to the conference control system 123. In an embodiment of the present disclosure, in response to the user with screen sharing control authority allowing the screen sharer to initiate screen sharing, the conference control system 123 sends, through the data sharing module 1214, the request for initiating screen sharing to the screen sharer, so that the content presented on the screen of the screen sharer's user device is sent to the conference control system 123 as described above. In some embodiments, the conference control system 123 may stop the screen sharing currently being performed by the participant user other than the screen sharer, in response to the user with screen sharing control authority allowing the screen sharer to initiate screen sharing, or in response to the screen sharer initiating screen sharing. In this way, automatically initiated screen sharing can be effectively prevented from conflicting with a participant who is currently sharing a screen. In examples of the present disclosure, after the screen sharer is determined, the shared screen may be handled according to the specific situation, which the present disclosure does not particularly limit.
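The request and permission flow described above can be sketched as follows; the device-proxy methods confirm(), request_share(), and stop_sharing() are hypothetical placeholders for the messages exchanged between the conference control system 123 and the user devices.

```python
class ConferenceControlSketch:
    """Illustrative flow around an instruction to share a screen."""

    def __init__(self, devices, moderator, current_sharer=None):
        self.devices = devices              # user name -> device proxy
        self.moderator = moderator          # user with control authority
        self.current_sharer = current_sharer

    def handle_share_instruction(self, sharer):
        # Someone else is sharing: ask the moderator for permission first.
        if self.current_sharer is not None and self.current_sharer != sharer:
            allowed = self.devices[self.moderator].confirm(
                "Allow " + sharer + " to start screen sharing?")
            if not allowed:
                return
            self.devices[self.current_sharer].stop_sharing()
        # Send the request; the device may auto-initiate or show a pop-up.
        self.devices[sharer].request_share()
        self.current_sharer = sharer
```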
In some embodiments, the screen sharing intent includes a positive screen sharing intent and a negative screen sharing intent. In some embodiments, the intent recognition engine 122 recognizes, based on the voice data generated by the speaking participant user, whether the screen sharing intent is positive or negative. In some embodiments, in response to recognizing that the speaking participant user indicates a positive screen sharing intent, the intent recognition engine 122 determines a screen sharer from among the plurality of participant users based on the speaking participant user and the recognized sharer indication, and generates an instruction for causing the screen sharer to share the screen during the video conference. In other embodiments, in response to recognizing that the speaking participant user indicates a negative screen sharing intent, the intent recognition engine 122 determines a stop-sharer from among the plurality of participant users based on the speaking participant user and the recognized sharer indication, and generates an instruction for causing the stop-sharer to stop sharing the screen. In embodiments of the present disclosure, the term "positive screen sharing intent" refers to, for example, intent information indicating that the speaker wants a specific participant user to initiate screen sharing, and the term "negative screen sharing intent" refers to, for example, intent information indicating that the speaker wants a specific participant user to stop screen sharing. In embodiments of the present disclosure, the recognition of negative screen sharing intent may employ methods similar to the recognition of positive screen sharing intent. For example, for the recognition of negative screen sharing intent, the rule matching module 1221 of the intent recognition engine 122 may likewise employ preset semantic rules that follow the semantic structure "[indication recognition field][mood auxiliary word field][intention recognition field][semantic expansion field]". The intention recognition field may include, for example, information such as "share" and "show" to represent a positive screen sharing intent, and may include information such as "stop sharing", "don't share", and "end sharing" to represent a negative screen sharing intent. In other embodiments of the present disclosure, the model recognition module 1222 of the intent recognition engine 122 may, on the basis of classifying the text data as indicating or not indicating a screen sharing intent, further classify an indicated screen sharing intent as indicating a positive screen sharing intent or indicating a negative screen sharing intent. It should be appreciated that, based on the embodiments described above, a person skilled in the art may apply and adapt the specific implementations to positive and negative screen sharing intents, so that a specific participant user initiates screen sharing when a positive screen sharing intent is indicated and stops screen sharing when a negative screen sharing intent is indicated. According to embodiments of the present disclosure, on the basis of identifying the screen sharer intended by the user, the participant user whom the user intends to stop sharing can further be identified, so that screen sharing forms an effective closed loop.
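As a minimal, assumption-laden sketch of the positive/negative distinction, the following keyword check over the intention recognition field tests negative phrases first (they contain the word "share"); the English keyword lists are illustrative stand-ins for the fields described above.

```python
NEGATIVE_KEYWORDS = ("stop sharing", "don't share", "end sharing")
POSITIVE_KEYWORDS = ("share", "show", "present")

def classify_polarity(text):
    """Classify an utterance as positive or negative screen sharing intent."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in NEGATIVE_KEYWORDS):
        return "negative"   # e.g. "you can stop sharing now"
    if any(keyword in lowered for keyword in POSITIVE_KEYWORDS):
        return "positive"   # e.g. "let me share"
    return None             # no screen sharing intent recognized
```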
Fig. 5A illustrates an example diagram of another screen sharing scenario according to an embodiment of the present disclosure. In an embodiment of the present disclosure, the exemplary screen sharing scenario 500 illustrated in fig. 5A differs from the exemplary screen sharing scenario 100 illustrated in fig. 1B in that part or all of the structure of the speech recognition engine 121 and/or part or all of the structure of the intent recognition engine 122 of the server 120 is configured on the user devices. For example, the exemplary screen sharing scenario 500 shown in fig. 5A includes a plurality of user devices 110-1, 110-2, 110-3, …, 110-N (hereinafter sometimes collectively referred to as user devices 110) participating in the same online video conference, where each user device is associated with the user name of the user holding it. In addition to the video conference module 111, the data acquisition module 112, the data transmission module 113, and the data output module 114, each user device includes a speech recognition engine 115 and an intent recognition engine 116, where the intent recognition engine 116 may include at least one of a rule matching module 1161 and a model recognition module 1162, and further includes an object determination module 1163.
Fig. 5B illustrates an example flowchart of another screen sharing control method according to an embodiment of the present disclosure. At block 501, the user device 110, in response to a plurality of participant users joining a video conference, obtains from the server 120 the user name associated with each of the plurality of participant users. At block 503, the user device 110, in response to detecting that the participant user holding the user device 110 (hereinafter simply the "local user") is speaking, obtains the voice data generated by the local user via the data acquisition module 112. In some embodiments, the user device 110 may associate the local user's user name with the locally generated voice data, so that the speech recognition engine 115 and the intent recognition engine 116 of the user device 110 determine the screen sharer based on the voice data associated with the local user and the user names obtained from the server 120. At block 505, the speech recognition engine 115 performs speech recognition processing on the voice data generated by the local user. At block 507, the intent recognition engine 116 recognizes the screen sharing intent and the sharer indication based on the voice data generated by the local user. At block 509, in response to the local user indicating a screen sharing intent, the intent recognition engine 116 determines the screen sharer indicated by the local user from among the plurality of participant users based on the local user and the recognized sharer indication. At block 511, the intent recognition engine 116 generates an instruction for requesting the screen sharer to share the screen during the video conference and sends it to the server 120. At block 513, the conference control system 123 of the server 120 determines, based on the instruction generated by the user device 110, the screen sharer indicated in the instruction, so as to cause the screen sharer to initiate screen sharing. In some embodiments, if the intent recognition engine 116 of the user device 110 determines that the screen sharer is the local user of the user device 110, screen sharing is automatically initiated from the user device 110, so that the content presented on the screen of the local user's user device 110 is sent to the conference control system 123 of the server 120.
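Blocks 503 to 511 on the user device side can be sketched as the pipeline below; every interface used here (capture_audio, transcribe, recognize, determine_sharer, and so on) is a hypothetical placeholder for the modules of fig. 5A.

```python
def on_local_user_speaking(device, server, asr, intent_engine):
    """Client-side pipeline sketch for blocks 503 to 511."""
    audio = device.capture_audio()                           # block 503
    text = asr.transcribe(audio)                             # block 505
    has_intent, indication = intent_engine.recognize(text)   # block 507
    if not has_intent:
        return
    sharer = intent_engine.determine_sharer(                 # block 509
        device.local_user, indication, server.participant_names)
    if sharer == device.local_user:
        device.start_sharing()     # auto-initiate local screen sharing
    else:
        server.send_share_instruction(sharer)                # block 511
```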
It should be appreciated that those skilled in the art may directly apply, or appropriately adapt, the specific embodiments of figs. 1 to 4 described above to the structural configuration of the screen sharing scenario 500 shown in fig. 5A and the example flow shown in fig. 5B. It should also be appreciated that the structure or processing of the speech recognition engine and the intent recognition engine need not be configured entirely on the server or entirely on the user device. In some embodiments, one of the speech recognition engine and the intent recognition engine may be configured on the server and the other on the user device. In some embodiments, both engines may be distributed across the server and the user device; that is, any structure or processing of the speech recognition engine may be distributed across the server and the user device, and likewise for the intent recognition engine. For example, the speech recognition engine and at least one of the rule matching module and the model recognition module of the intent recognition engine may be configured on the user device, while the object determination module of the intent recognition engine is configured on the server.
In some embodiments, the speech recognition engine and the intent recognition engine may exist separately or as an integrated whole, and may further be integrated into the video conference module in the user device, the conference control system, or both. In some embodiments, any structure or processing of the speech recognition engine and the intent recognition engine may, in any form of existence, be integrated or cooperate with any structure or processing of the video conference module and the conference control system in the user device. It should be appreciated that where the speech recognition engine, the intent recognition engine, and the conference control system are configured on a server, there may be one or more servers, and any structure or processing of the speech recognition engine, the intent recognition engine, and the conference control system may be distributed across any part of the one or more servers.
Fig. 6A shows an example diagram of a specific implementation according to an embodiment of the present disclosure, fig. 6B shows an example diagram of another specific implementation according to an embodiment of the present disclosure, and fig. 6C shows an example diagram of an interface in another specific implementation according to an embodiment of the present disclosure. In some embodiments, the speech recognition engine and the intent recognition engine of the present disclosure may, when facing the user, exist as an independent function, for example in the form of a "conference assistant". In some embodiments, the conference assistant may be enabled or disabled according to the user's selection. For example, where any part of the speech recognition engine and the intent recognition engine in the exemplary screen sharing scenario 100 is configured in the server as shown in fig. 6A, the "conference assistant" may be integrated with the conference control system. Being deployed on the server, it can thus control screen sharing in a centralized manner, effectively improving processing efficiency. For example, where any part of the speech recognition engine and the intent recognition engine in the exemplary screen sharing scenario 500 is configured in the user device as shown in fig. 6B, the "conference assistant" may be provided in the user device in the form of a plug-in working with the video conference module, and may be presented in the video conference interface of the user device as an icon 610 representing the "conference assistant", as shown in fig. 6C. Being deployed on the user device, it can thus identify the local user's screen sharing intent without transmission to a server, effectively improving processing efficiency.
In some embodiments, the screen sharing described above includes presenting all or part of the content displayed on the screen of the screen sharer's user device to at least one of: user devices of other users interacting with the screen sharer, other electronic devices connected to the screen sharer's user device, and applications associated with the screen sharer's user device. In this way, when presented to the user devices of other users interacting with the screen sharer, other participant users who communicate or interact with the screen sharer can observe the screen sharer's screen content, thereby effectively sharing information remotely. When presented on other electronic devices connected to the screen sharer's user device, cross-screen or multi-screen display such as device screen casting can be realized, enabling multidimensional, smooth interaction of information. When presented in applications associated with the screen sharer's user device, data generated and displayed by one application can, for example, be synchronized to other applications, realizing multi-platform screen sharing.
It should be appreciated that, although embodiments of the present disclosure are described using the multi-person interaction scenario of a video conference as an example, embodiments of the present disclosure may be applied to any scenario or architecture capable of screen sharing, such as multi-person interaction scenarios of online education, remote presentation, or game live-streaming, to which the present disclosure is not particularly limited. For example, in embodiments of the present disclosure, in the video conference scenario, the multi-user interaction period refers to, for example, the period described above during which the plurality of participant users participate in the video conference and interact using screen sharing. In a multi-person online education scenario, the multi-user interaction period refers to the period during which a plurality of users, including the teacher and students, carry out interactions such as lessons, lectures, questions, and presentations using screen sharing. In a multi-person remote presentation scenario, the multi-user interaction period refers to, for example, the period during which a plurality of users, including the presenter and viewers, carry out interactions such as recording and erasing using screen sharing. In a game live-streaming scenario, the multi-user interaction period refers to the period during which users such as game players or hosts carry out interactions such as live-casting the game operation interface using screen sharing.
According to the screen sharing control method of embodiments of the present disclosure, voice data generated by a first user of a plurality of users during multi-user interaction is acquired; a screen sharing intent and a sharer indication are identified based on the voice data; in response to identifying that the first user indicates the screen sharing intent, a screen sharer is determined from the plurality of users based on the first user and the identified sharer indication; and an instruction is generated for causing the screen sharer to share a screen during the interaction. In this way, unnecessary and cumbersome operations are avoided during multi-user interaction, screen sharing is realized quickly and accurately, and communication efficiency and processing efficiency are improved while user experience is effectively improved.
Fig. 7 illustrates a schematic block diagram of a screen sharing control device according to some embodiments of the present disclosure. As shown in fig. 7, the screen sharing control device 700 includes: a data acquisition module 710 that acquires voice data generated by a first user of a plurality of users during multi-user interaction; a sharing recognition module 720 that recognizes a screen sharing intent and a sharer indication based on the voice data; a user determination module 730 that, in response to recognizing that the first user indicates the screen sharing intent, determines a screen sharer from the plurality of users based on the first user and the recognized sharer indication; and an instruction generation module 740 that generates an instruction for causing the screen sharer to share a screen during the interaction.
In some embodiments, the user determination module 730 may determine the first user as the screen sharer in response to the sharer indication representing information related to the first person. In some embodiments, the user determination module 730 may determine the user who generated voice data before the first user as the screen sharer in response to the sharer indication representing information related to the second person. In some embodiments, the user determination module 730 may determine, as the screen sharer, the user of the plurality of users that is semantically most relevant to the sharer indication, in response to the sharer indication representing information related to the third person or to an interrogative pronoun. In this way, the screen sharer intended by the speaker can be accurately identified with the speaker as the reference point, so that the user does not need to further manually designate a participant user and the screen sharer does not need to manually initiate the screen sharing, effectively improving user experience and communication efficiency.
In some embodiments, the user determination module 730 may determine, based on the user names respectively associated with the plurality of users, whether the sharer indication contains a user name. In some embodiments, the user determination module 730 may determine the user associated with that user name as the screen sharer in response to the sharer indication containing the user name. In this way, the screen sharer intended by the speaker can be accurately identified from among the plurality of participant users, so that no participant user needs to be further manually designated and the screen sharer does not need to manually initiate the screen sharing, effectively improving user experience and communication efficiency.
In some embodiments, the user determination module 730 may determine, based on the user names respectively associated with the plurality of users, whether the sharer indication contains a user abbreviation associated with a user name. In some embodiments, the user determination module 730 may determine, in response to the sharer indication containing a user abbreviation, the user name having the highest relevance to that abbreviation. In some embodiments, the user determination module 730 may determine the user associated with the determined user name as the screen sharer. In this way, the screen sharer intended by the speaker can be accurately identified from among the plurality of participant users, so that screen sharing can be initiated without the user having to speak the participant user's exact name, effectively improving user experience and communication efficiency.
In some embodiments, the user determination module 730 may determine whether the sharer indication contains a user characteristic; determine, from among the plurality of users and based on the image data or video data respectively generated by them, the user having the highest relevance to the user characteristic; and determine that user as the screen sharer. In this way, the screen sharer intended by the speaker can be accurately identified from among the plurality of participant users even when the speaker is not familiar with the screen sharer's user name, effectively improving user experience and communication efficiency.
In some embodiments, the sharing recognition module 720 may convert the voice data generated by the first user into text data; determine, based on a rule set including at least one preset semantic rule, whether the text data matches a preset semantic rule in the rule set; and, in response to the text data matching a preset semantic rule in the rule set, recognize that the first user indicates the screen sharing intent and recognize the sharer indication contained in the text data. In some embodiments, the preset semantic rule sequentially includes, in its semantic structure: an indication recognition field, a mood auxiliary word field, an intention recognition field, and a semantic expansion field, wherein the indication recognition field contains information representing the sharer indication, the intention recognition field contains information for recognizing the screen sharing intent, the mood auxiliary word field contains information connecting the indication recognition field and the intention recognition field, and the semantic expansion field contains information for fuzzy-matching the text data against the preset semantic rule. In this way, whether the speaker indicates a screen sharing intent, and which user the speaker intends, can be accurately identified, effectively improving user experience as well as communication efficiency and processing efficiency.
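As a hedged illustration of one preset semantic rule, the Python sketch below encodes the four-field structure as a regular expression; the English keyword lists and the match_preset_rule helper are illustrative assumptions, not the rule set specified by this disclosure.

```python
import re

# [indication recognition field][mood auxiliary word field]
# [intention recognition field][semantic expansion field]
PRESET_RULE = re.compile(
    r"(?P<indication>\b(?:i|me|you|he|she|who|\w+)\b)"  # indication field
    r"[\s,]*(?:will|can|should|come to)?[\s,]*"         # mood auxiliary words
    r"(?P<intent>share|present|show)\b"                 # intention field
    r".*",                                              # semantic expansion
    re.IGNORECASE,
)

def match_preset_rule(text):
    """Return (True, sharer_indication) if the text matches the rule."""
    match = PRESET_RULE.search(text)
    if match:
        return True, match.group("indication")
    return False, None

# e.g. match_preset_rule("Julie, share your screen") -> (True, "Julie")
```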
In some embodiments, the sharing recognition module 720 may convert the voice data generated by the first user into text data; and identify the screen sharing intent and the sharer indication using a neural network model based on the text data, wherein the neural network model is trained based on the mapping relationship between text data and the screen sharing intent and sharer indication. In some embodiments, the sharing recognition module 720 may use a classification algorithm to classify the text data, with respect to the screen sharing intent, as indicating or not indicating a screen sharing intent, and, with respect to the sharer indication, as at least one of having a sharer indication related to a person pronoun, having a sharer indication related to a user, having a sharer indication related to a user characteristic, and having no sharer indication. In response to the text data being classified as indicating a screen sharing intent and as having at least one of a sharer indication related to a person pronoun, a sharer indication related to a user, and a sharer indication related to a user characteristic, the sharing recognition module 720 recognizes that the first user indicates the screen sharing intent and recognizes the sharer indication contained in the text data, so that the screen sharer is determined from the plurality of users based on the first user and the determined sharer indication. Here, the sharer indication related to a person pronoun includes at least one of information related to the first person, information related to the second person, information related to the third person, and information related to an interrogative pronoun, and the sharer indication related to a user includes at least one of information related to a user name and information related to a user abbreviation. In this way, a neural network model trained on large amounts of prior data can accurately identify whether the speaker indicates a screen sharing intent and which user the speaker intends, effectively improving user experience as well as communication efficiency and processing efficiency.
In some embodiments, the screen sharing intent includes a positive screen sharing intent and a negative screen sharing intent. In some embodiments, the sharing recognition module 720 may recognize, based on the voice data, whether the screen sharing intent is positive or negative. In some embodiments, in response to recognizing that the first user indicates the positive screen sharing intent, a screen sharer is determined from the plurality of users based on the first user and the recognized sharer indication, and an instruction is generated for causing the screen sharer to share a screen during the interaction. In some embodiments, in response to recognizing that the first user indicates the negative screen sharing intent, a stop-sharer is determined from the plurality of users based on the first user and the recognized sharer indication, and an instruction is generated for causing the stop-sharer to stop sharing the screen. In this way, on the basis of identifying the screen sharer intended by the user, the participant user whom the user intends to stop sharing can further be identified, so that screen sharing forms an effective closed loop.
In some embodiments, the screen sharing control device 700 further includes a conference control module 750. The conference control module 750 sends, based on the instruction for causing the screen sharer to share a screen during the interaction, a request for initiating screen sharing to the user device of the screen sharer; and, in response to the screen sharer initiating screen sharing, transmits the content presented on the screen of the screen sharer's user device to the users of the plurality of users other than the screen sharer. In this way, the content presented on the screen sharer's screen can be automatically pushed to the other participant users, improving communication efficiency and processing efficiency.
In some embodiments, before the conference control module 750 sends the request for initiating screen sharing to the screen sharer, if a user other than the screen sharer is performing screen sharing, the conference control module 750 sends a request for allowing the screen sharer to initiate screen sharing to a user having screen sharing control authority; and, in response to the user with screen sharing control authority allowing the screen sharer to initiate screen sharing, sends the request for initiating screen sharing to the screen sharer. In this way, after confirmation by the moderator with conference control authority, the content presented on the screen sharer's screen can be pushed quickly to the other participant users, improving communication efficiency and processing efficiency while preserving conference order.
In some embodiments, the screen sharing control device 700 is implemented by a server. In some embodiments, the conference control module 750 may stop the screen sharing being performed by the user other than the screen sharer, in response to the user with screen sharing control authority allowing the screen sharer to initiate screen sharing, or in response to the screen sharer initiating screen sharing. In this way, automatically initiated screen sharing can be effectively prevented from conflicting with a participant who is currently sharing a screen.
In some embodiments, the data acquisition module 710 may acquire, in response to a plurality of users joining an online video conference, the user name associated with each of the plurality of users; and, in response to a particular user of the plurality of users speaking, acquire the voice data generated by that user and associate that user's user name with the voice data. In some embodiments, the user determination module 730 may determine the screen sharer based on the voice data associated with the particular user and the user names associated with the plurality of users. Being deployed on the server, the device can thus control screen sharing in a centralized manner, effectively improving processing efficiency.
In some embodiments, the screen sharing control device 700 is implemented by the user device of the first user. In some embodiments, the data acquisition module 710 may acquire the user name associated with each of a plurality of users participating in an online video conference, acquire the voice data generated by the first user, and associate the first user's user name with that voice data. In some embodiments, the user determination module 730 may determine the screen sharer based on the voice data associated with the first user and the user names associated with the plurality of users. Being deployed on the user device, the device can thus identify the local user's screen sharing intent without transmission to a server, effectively improving processing efficiency.
In some embodiments, the screen sharing includes presenting all or part of the content displayed on the screen of the screen sharer's user device to at least one of: user devices of other users interacting with the screen sharer, other electronic devices connected to the screen sharer's user device, and applications associated with the screen sharer's user device. In this way, when presented to the user devices of other users interacting with the screen sharer, other participant users who communicate or interact with the screen sharer can observe the screen sharer's screen content, effectively sharing information remotely. When presented on other electronic devices connected to the screen sharer's user device, cross-screen or multi-screen display such as device screen casting can be realized, enabling multidimensional, smooth interaction of information. When presented in applications associated with the screen sharer's user device, data generated and displayed by one application can, for example, be synchronized to other applications, realizing multi-platform screen sharing.
The screen sharing control device according to embodiments of the present disclosure can be implemented by a server, by a user device, or in a distributed manner across both. In this way, unnecessary and cumbersome operations are avoided during multi-user interaction, screen sharing is realized quickly and accurately, and communication efficiency and processing efficiency are improved while user experience is effectively improved.
Fig. 8 illustrates a schematic block diagram of an example device for implementing an example implementation of the present disclosure, which may be used to implement at least a portion of the methods or processes of embodiments of the present disclosure. As shown, the device 800 includes a Central Processing Unit (CPU) 801 that can perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 802 or loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The various methods and/or processes described above may be performed by the processing unit 801. For example, in some example implementations, the method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some example implementations, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into RAM 803 and executed by CPU 801, one or more actions of the methods described above may be performed.
According to an exemplary implementation of the present disclosure, there is provided an electronic device including: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions, when executed by the at least one processor, causing the device to perform the methods described above.
The present disclosure may be methods, apparatus, systems, and/or computer program products. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure can be assembly instructions, instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language and conventional procedural programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some example implementations, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which may execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, devices, and computer program products according to example implementations of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various exemplary implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of embodiments of the present disclosure is illustrative rather than exhaustive and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. A screen sharing control method, characterized by comprising:
acquiring voice data generated by a first user of a plurality of users during an interaction among the plurality of users;
identifying a screen sharing intent and a sharer indication based on the voice data;
in response to identifying that the first user indicates the screen sharing intent, determining a screen sharer from the plurality of users based on the first user and the identified sharer indication; and
generating an instruction for the screen sharer to share a screen during the interaction.
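For illustration only (not part of the claimed subject matter): the following is a minimal, hypothetical Python sketch of the claim 1 pipeline, starting from already-transcribed text. The function names, the keyword heuristics, and the instruction format are assumptions; the claims do not prescribe any particular implementation.

```python
from dataclasses import dataclass

@dataclass
class Recognition:
    has_share_intent: bool
    sharer_indication: str | None   # e.g. "I", "he", or a user name

def recognize(text: str) -> Recognition:
    # Toy stand-in for the rule-based or neural recognizer of claims 6-9.
    if "share" in text.lower() and "screen" in text.lower():
        return Recognition(True, text.split()[0])  # naive: first token
    return Recognition(False, None)

def resolve_sharer(speaker: str, indication: str, users: list[str]) -> str:
    # Toy stand-in for the pronoun/name resolution of claims 2-5.
    if indication.lower() in {"i", "let"}:          # first person -> speaker
        return speaker
    for user in users:                              # otherwise try a user name
        if user.lower() in indication.lower():
            return user
    return speaker

def control(speaker: str, utterance: str, users: list[str]) -> dict | None:
    rec = recognize(utterance)                      # intent + indication
    if not rec.has_share_intent:
        return None
    sharer = resolve_sharer(speaker, rec.sharer_indication, users)
    return {"action": "start_screen_share", "sharer": sharer}

print(control("Alice", "Let me share my screen", ["Alice", "Bob"]))
# -> {'action': 'start_screen_share', 'sharer': 'Alice'}
```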
2. The method of claim 1, wherein determining a screen sharer comprises:
determining the first user as the screen sharer in response to the sharer indication representing first-person information;
determining a user who generated voice data prior to the first user as the screen sharer in response to the sharer indication representing second-person information; or
in response to the sharer indication representing third-person information or information related to an interrogative pronoun, determining a user of the plurality of users that is semantically most relevant to the sharer indication as the screen sharer.
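For illustration (not part of the claims): claim 2's three branches map grammatical person onto participants roughly as below. The pronoun tables, parameter names, and the stub relevance scorer are assumptions.

```python
FIRST_PERSON = {"i", "me", "my"}
SECOND_PERSON = {"you", "your"}

def resolve_by_person(indication: str, speaker: str, previous_speaker: str,
                      users: list[str], relevance) -> str:
    word = indication.lower()
    if word in FIRST_PERSON:
        return speaker                 # first person -> the current speaker
    if word in SECOND_PERSON:
        return previous_speaker        # second person -> the prior speaker
    # Third person or interrogative pronoun: fall back to semantic relevance,
    # with `relevance(user, word)` an assumed similarity scorer.
    return max(users, key=lambda user: relevance(user, word))

sharer = resolve_by_person("you", "Alice", "Bob", ["Alice", "Bob", "Carol"],
                           relevance=lambda user, word: 0.0)  # stub scorer
# -> "Bob"
```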
3. The method of claim 1, wherein determining a screen sharer comprises:
determining, based on user names respectively associated with the plurality of users, whether the sharer indication includes a user name; and
in response to the sharer indication including the user name, determining the user associated with the user name as the screen sharer.
4. The method of claim 3, wherein determining a screen sharer comprises:
determining, based on the user names respectively associated with the plurality of users, whether the sharer indication includes a user abbreviation associated with a user name;
in response to the sharer indication including the user abbreviation, determining the user name with the highest correlation to the user abbreviation; and
determining the user associated with the determined user name as the screen sharer.
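For illustration: claims 3-4 describe an exact name match with an abbreviation fallback. In the hypothetical sketch below, difflib stands in for whichever correlation measure an implementation might use, and the names are invented examples.

```python
import difflib

def match_user_name(indication: str, user_names: list[str]) -> str | None:
    # Claim 3: exact (case-insensitive) user-name match first.
    for name in user_names:
        if name.lower() in indication.lower():
            return name
    # Claim 4: otherwise treat the indication as a possible abbreviation and
    # pick the registered user name with the highest similarity.
    hits = difflib.get_close_matches(indication, user_names, n=1, cutoff=0.4)
    return hits[0] if hits else None

print(match_user_name("Rob", ["Robert Chen", "Alice Wu"]))  # -> "Robert Chen"
```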
5. The method of claim 1, wherein determining a screen sharer comprises:
determining whether the sharer indication includes a user characteristic;
determining, from the plurality of users, the user with the highest correlation to the user characteristic, based on image data or video data respectively generated by the plurality of users; and
determining the user with the highest correlation to the user characteristic as the screen sharer.
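For illustration: claim 5 resolves indications such as "the person in the red shirt" by scoring each user's image or video data against the spoken characteristic. The scorer below is an assumed black box (any image-text similarity model could fill that role), and all names are illustrative.

```python
def resolve_by_feature(feature: str, frame_by_user: dict, score) -> str:
    # `score(frame, feature)` is assumed to return a similarity value.
    return max(frame_by_user,
               key=lambda user: score(frame_by_user[user], feature))

frames = {"Alice": "frame_alice.png", "Bob": "frame_bob.png"}
best = resolve_by_feature("red shirt", frames,
                          score=lambda f, feat: 1.0 if "bob" in f else 0.0)
print(best)  # -> "Bob" under this stub scorer
```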
6. The method of claim 1, wherein identifying a screen sharing intent and a sharer indication comprises:
converting the voice data generated by the first user into text data;
determining, based on a rule set comprising at least one preset semantic rule, whether the text data matches a preset semantic rule in the rule set; and
in response to the text data matching a preset semantic rule in the rule set, identifying that the first user indicates the screen sharing intent, and identifying a sharer indication contained in the text data.
7. The method of claim 6, wherein the preset semantic rule comprises, in its semantic structure and in order: an indication recognition field, a mood-assist-word field, an intention recognition field, and a semantic expansion field, wherein the indication recognition field contains information representing the sharer indication, the intention recognition field contains information for recognizing the screen sharing intent, the mood-assist-word field contains information for connecting the indication recognition field and the intention recognition field, and the semantic expansion field contains information for fuzzy-matching the text data against the preset semantic rule.
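For illustration: the four-field rule of claim 7 can be read as a pattern template. The hypothetical regex below renders it in English, with modal verbs standing in for the mood-assist words (语气助词) of the original Chinese; the concrete pattern and vocabulary are assumptions.

```python
import re

RULE = re.compile(
    r"^(?P<indication>I|you|he|she|\w+)"        # indication recognition field
    r"\s*(?:will|can|may)?\s*"                  # mood-assist-word field
    r"(?P<intent>share (?:my |the )?screen)"    # intention recognition field
    r".*$"                                      # semantic expansion (fuzzy tail)
)

m = RULE.match("I will share my screen now, one moment")
if m:
    print(m.group("indication"), "->", m.group("intent"))
# I -> share my screen
```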
8. The method of claim 1, wherein identifying a screen sharing intent and a sharer indication comprises:
converting the voice data generated by the first user into text data; and
identifying, based on the text data, a screen sharing intent and a sharer indication using a neural network model,
wherein the neural network model is trained based on a mapping relationship between text data and screen sharing intents and sharer indications.
9. The method of claim 8, wherein identifying screen sharing intents and sharer indications using a neural network model comprises:
classifying, using a classification algorithm, the text data with respect to the screen sharing intent as either indicating or not indicating a screen sharing intent, and with respect to the sharer indication as at least one of: presence of a sharer indication related to a grammatical person, presence of a sharer indication related to a user, presence of a sharer indication related to a user characteristic, and absence of a sharer indication; and
in response to the text data being classified as indicating a screen sharing intent and as containing at least one of a sharer indication related to a grammatical person, a sharer indication related to a user, and a sharer indication related to a user characteristic, identifying that the first user indicates the screen sharing intent, and identifying the sharer indication contained in the text data, so as to determine a screen sharer from the plurality of users based on the first user and the identified sharer indication,
wherein the sharer indication related to a grammatical person includes at least one of first-person information, second-person information, third-person information, and information related to an interrogative pronoun, and the sharer indication related to a user includes at least one of information related to a user name, information related to a user abbreviation, and information related to a user characteristic.
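Claims 8-9 leave the model architecture open; for illustration, a two-head classifier (shared text encoder, one head for the intent decision, one for the indication category) could look like the PyTorch sketch below. The label set, dimensions, and encoder choice are assumptions.

```python
import torch
import torch.nn as nn

INDICATION_LABELS = ["grammatical_person", "user_name", "user_feature", "none"]

class IntentAndIndicationModel(nn.Module):
    def __init__(self, vocab_size: int = 10000, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)   # bag-of-tokens encoder
        self.intent_head = nn.Linear(dim, 2)            # share intent: yes/no
        self.indication_head = nn.Linear(dim, len(INDICATION_LABELS))

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor):
        h = self.embed(token_ids, offsets)
        return self.intent_head(h), self.indication_head(h)

model = IntentAndIndicationModel()
tokens = torch.tensor([1, 42, 7])    # a toy tokenized utterance
offsets = torch.tensor([0])          # one sequence in the batch
intent_logits, indication_logits = model(tokens, offsets)
```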
10. The method of any of claims 1 to 9, wherein the screen sharing intent comprises a positive screen sharing intent and a negative screen sharing intent, and identifying a screen sharing intent and a sharer indication comprises:
identifying, based on the voice data, whether the screen sharing intent is a positive screen sharing intent or a negative screen sharing intent;
in response to identifying that the first user indicates the positive screen sharing intent, determining a screen sharer from the plurality of users based on the first user and the identified sharer indication, and generating an instruction for the screen sharer to share a screen during the interaction; and
in response to identifying that the first user indicates the negative screen sharing intent, determining, from the plurality of users, a sharer to be stopped based on the first user and the identified sharer indication, and generating an instruction causing that sharer to stop sharing a screen.
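The branch in claim 10 amounts to a small dispatch on the recognized intent polarity; a hypothetical rendering (the "positive"/"negative" output format is an assumption):

```python
def dispatch(intent: str, sharer: str) -> dict | None:
    if intent == "positive":     # e.g. "let me share my screen"
        return {"action": "start_screen_share", "sharer": sharer}
    if intent == "negative":     # e.g. "stop sharing the screen"
        return {"action": "stop_screen_share", "sharer": sharer}
    return None
```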
11. The method as recited in claim 1, further comprising:
based on the instruction for the screen sharer to share a screen during the interaction, sending, to a user device of the screen sharer, a request for initiating screen sharing; and
in response to the screen sharer initiating screen sharing, transmitting content presented on the screen of the user device of the screen sharer to users of the plurality of users other than the screen sharer.
12. The method of claim 11, further comprising, prior to sending the request to the screen sharer to initiate screen sharing:
when a user different from the screen sharer is performing screen sharing, sending, to a user having screen sharing control authority, a request for allowing the screen sharer to initiate screen sharing; and
in response to the user having the screen sharing control authority allowing the screen sharer to initiate screen sharing, sending the request for initiating screen sharing to the screen sharer.
13. The method as recited in claim 12, further comprising:
stopping the screen sharing by the user different from the screen sharer, in response to the user having the screen sharing control authority allowing the screen sharer to initiate screen sharing, or in response to the screen sharer initiating screen sharing.
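Claims 11-13 together describe a handover protocol: request, permission check when someone else is already sharing, stop of the existing share, then start of the new one. For illustration, a hypothetical sketch with stubbed messaging functions:

```python
def request_share(requested_sharer: str, current_sharer: str | None,
                  authority_user: str, ask, stop_share, start_share) -> bool:
    if current_sharer is not None and current_sharer != requested_sharer:
        # Claim 12: someone else is sharing, so ask the user who holds
        # screen-sharing control authority before proceeding.
        if not ask(authority_user, f"Allow {requested_sharer} to share?"):
            return False
        stop_share(current_sharer)   # claim 13: stop the existing share
    start_share(requested_sharer)    # claim 11: request the new share
    return True

ok = request_share("Bob", "Alice", "Host",
                   ask=lambda user, msg: True,    # stubbed approval channel
                   stop_share=lambda user: None,  # stubbed transport
                   start_share=lambda user: None)
```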
14. The method as recited in claim 1, further comprising:
in response to the plurality of users participating in an online video conference, obtaining a user name associated with each of the plurality of users;
in response to a particular user of the plurality of users speaking, obtaining voice data generated by the particular user, and associating the user name of the particular user with that voice data; and
determining a screen sharer based on the voice data associated with the particular user and the user names associated with the plurality of users.
15. The method of claim 1, wherein the method is performed by a user device of the first user, further comprising:
obtaining a user name associated with each of a plurality of users participating in an online video conference;
acquiring the voice data generated by the first user, and associating the user name of the first user with that voice data; and
determining a screen sharer based on the voice data associated with the first user and the user names associated with the plurality of users.
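Claims 14-15 add the conference bookkeeping that the earlier resolution steps rely on: each utterance is tagged with the registered user name of its speaker. For illustration, a hypothetical sketch (class and method names invented):

```python
from collections import deque

class Meeting:
    def __init__(self, user_names: list[str]):
        self.user_names = user_names
        self.utterances = deque(maxlen=50)     # (speaker, voice_data) pairs

    def on_speech(self, speaker: str, voice_data: bytes) -> None:
        assert speaker in self.user_names
        self.utterances.append((speaker, voice_data))

    def previous_speaker(self, current: str) -> str | None:
        # Supports second-person indications ("you share your screen"),
        # which resolve to the user who spoke before the current speaker.
        for speaker, _ in reversed(self.utterances):
            if speaker != current:
                return speaker
        return None

m = Meeting(["Alice", "Bob"])
m.on_speech("Bob", b"...")
m.on_speech("Alice", b"you share your screen")
print(m.previous_speaker("Alice"))  # -> "Bob"
```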
16. The method of any of claims 1 to 9, wherein the screen sharing includes presenting all or a portion of the content displayed on a screen of a user device of the screen sharer to at least one of: user devices of other users interacting with the screen sharer, other electronic devices connected to the user device of the screen sharer, and applications associated with the user device of the screen sharer.
17. A screen sharing control device, characterized by comprising:
a data acquisition module configured to acquire voice data generated by a first user of a plurality of users during an interaction among the plurality of users;
a sharing identification module configured to identify a screen sharing intent and a sharer indication based on the voice data;
a user determination module configured to determine, in response to identifying that the first user indicates the screen sharing intent, a screen sharer from the plurality of users based on the first user and the identified sharer indication; and
an instruction generation module configured to generate an instruction for the screen sharer to share a screen during the interaction.
18. A computing device, comprising:
A processor; and
A memory storing instructions that, when executed by the processor, cause the computing device to perform the method of any one of claims 1 to 16.
19. A computer-readable storage medium storing instructions that, when executed by a computing device, cause the computing device to perform the method of any one of claims 1 to 16.
20. A computer program product, characterized in that it comprises instructions that, when executed by a computing device, cause the computing device to perform the method according to any of claims 1 to 16.
CN202211493442.5A 2022-11-25 2022-11-25 Screen sharing control method, device, equipment, medium and program product Pending CN118098224A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211493442.5A CN118098224A (en) 2022-11-25 2022-11-25 Screen sharing control method, device, equipment, medium and program product
PCT/CN2023/132677 WO2024109698A1 (en) 2022-11-25 2023-11-20 Screen sharing control method and apparatus, device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211493442.5A CN118098224A (en) 2022-11-25 2022-11-25 Screen sharing control method, device, equipment, medium and program product

Publications (1)

Publication Number Publication Date
CN118098224A true CN118098224A (en) 2024-05-28

Family

ID=91151378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211493442.5A Pending CN118098224A (en) 2022-11-25 2022-11-25 Screen sharing control method, device, equipment, medium and program product

Country Status (2)

Country Link
CN (1) CN118098224A (en)
WO (1) WO2024109698A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133707B (en) * 2017-11-30 2021-08-17 百度在线网络技术(北京)有限公司 Content sharing method and system
CN109542378A (en) * 2018-11-19 2019-03-29 上海闻泰信息技术有限公司 Screen sharing method, device, electronic equipment and readable storage medium storing program for executing
CN113728381A (en) * 2019-04-26 2021-11-30 索尼集团公司 Information processing apparatus, information processing method, and program
CN113015138B (en) * 2019-12-04 2024-06-25 博泰车联网科技(上海)股份有限公司 Method for information sharing, electronic device, and computer-readable storage medium
CN111736787A (en) * 2020-06-24 2020-10-02 北京云族佳科技有限公司 Screen sharing method and device, storage medium and electronic equipment
US11146601B1 (en) * 2020-09-17 2021-10-12 International Business Machines Corporation Unintended screen share prevention
CN112702557A (en) * 2020-12-23 2021-04-23 百果园技术(新加坡)有限公司 Screen sharing method, device, equipment and storage medium based on call
CN114845084B (en) * 2022-07-05 2022-11-11 广州朗国电子科技股份有限公司 Multi-user screen management method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2024109698A1 (en) 2024-05-30

Similar Documents

Publication Publication Date Title
US10586541B2 (en) Communicating metadata that identifies a current speaker
WO2018135892A1 (en) Device and method for adaptively providing meeting
US10741172B2 (en) Conference system, conference system control method, and program
WO2019033663A1 (en) Video teaching interaction method and apparatus, device, and storage medium
US20150154960A1 (en) System and associated methodology for selecting meeting users based on speech
CN112653902B (en) Speaker recognition method and device and electronic equipment
CN113170076A (en) Dynamic curation of sequence events for a communication session
CN111128183B (en) Speech recognition method, apparatus and medium
CN108073572B (en) Information processing method and device, simultaneous interpretation system
CN111556279A (en) Monitoring method and communication method of instant session
US20160294892A1 (en) Storage Medium Storing Program, Server Apparatus, and Method of Controlling Server Apparatus
KR102104294B1 (en) Sign language video chatbot application stored on computer-readable storage media
US10388325B1 (en) Non-disruptive NUI command
KR102412823B1 (en) System for online meeting with translation
US20240171418A1 (en) Information processing device and information processing method
CN118098224A (en) Screen sharing control method, device, equipment, medium and program product
US20230061210A1 (en) Method and system of automated question generation for speech assistance
CN112820265B (en) Speech synthesis model training method and related device
US20180081352A1 (en) Real-time analysis of events for microphone delivery
CN111178086B (en) Data processing method, device and medium
CN114155849A (en) Virtual object processing method, device and medium
CN113901832A (en) Man-machine conversation method, device, storage medium and electronic equipment
CN113312928A (en) Text translation method and device, electronic equipment and storage medium
US11568866B2 (en) Audio processing system, conferencing system, and audio processing method
US20240015262A1 (en) Facilitating avatar modifications for learning and other videotelephony sessions in advanced networks

Legal Events

Date Code Title Description
PB01 Publication