CN115396404A - Synchronous screen projection method and related device for speaker explanation position in cloud conference scene

Info

Publication number: CN115396404A
Application number: CN202210945934.7A
Authority: CN (China)
Prior art keywords: speaker, content, page, explanation, conference
Legal status: Granted; active
Inventor: 唐串串
Current assignee: Shenzhen Happycast Technology Co Ltd
Original assignee: Shenzhen Happycast Technology Co Ltd
Priority application: CN202210945934.7A
Other languages: Chinese (zh)
Other versions: CN115396404B (en)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/40 Support for services or applications
    • H04L 65/403 Arrangements for multi-party communication, e.g. for conferences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14 Digital output to display device; cooperation and interconnection of the display device with other functional units
    • G06F 3/1454 Digital output to display device involving copying of the display data of a local workstation or window to a remote workstation or window so that an actual copy of the data is displayed simultaneously on two or more displays, e.g. teledisplay

Abstract

The embodiment of the application discloses a synchronous screen projection method and a related device for the explanation position of a speaker in a cloud conference scene, wherein the method comprises the following steps: acquiring audio information of a speaker in a cloud conference; acquiring an explanation page corresponding to the audio information of the speaker; determining the explanation position of the speaker according to the audio information of the speaker and the page content of the explanation page; updating the page content of the explanation page according to the explanation position of the speaker; and sending the updated explanation page to the local devices participating in the cloud conference. The method and the device can synchronously project the explanation position of the speaker onto local devices in a cloud conference scene, improving the user experience.

Description

Synchronous screen projection method and related device for speaker explanation position in cloud conference scene
Technical Field
The application relates to the technical field of data processing, in particular to a synchronous screen projection method and a related device for a speaker explanation position in a cloud conference scene.
Background
In existing applications with a cloud conference function, the speaker can indicate the position being explained only to participants in the same physical space, by using gestures or physical tools such as a laser pointer. Participants who are not in the same space as the speaker cannot determine which position on the explanation page the explained content corresponds to, so the user experience is poor.
Disclosure of Invention
The embodiment of the application provides a synchronous screen projection method and a related device for an explanation position of a speaker in a cloud conference scene, so that the explanation position of the speaker is synchronously projected to local equipment in the cloud conference scene, and user experience is improved.
In a first aspect, an embodiment of the present application provides a synchronous screen projection method for a lecture position of a speaker in a cloud conference scene, which is applied to a server, and the method includes:
acquiring audio information of a speaker in a cloud conference, wherein the cloud conference refers to a conference group created by a conference creator at a cloud end through a first local device, and the speaker refers to a participant who acquires control authority of a conference desktop of the cloud conference through a second local device;
acquiring an explanation page corresponding to the audio information of the speaker, wherein the page content of the explanation page is content information included in shared content, and the shared content refers to content information uploaded to a cloud space of the cloud conference by participants of the cloud conference through third local equipment;
determining the explanation position of the speaker according to the audio information of the speaker and the page content of the explanation page;
updating the page content of the explanation page according to the explanation position of the speaker;
and sending the updated explanation page to the local equipment participating in the cloud conference.
In a second aspect, an embodiment of the present application provides a synchronous screen projection method for the explanation position of a speaker in a cloud conference scene, which is applied to a local device, where the method includes:
receiving audio information of a speaker from a server;
receiving an updated explanation page from a server, wherein the updated explanation page refers to an explanation page after page content of the explanation page is updated according to an explanation position of a speaker, and the explanation position of the speaker refers to a position, corresponding to audio information of the speaker, in the page content of the explanation page;
displaying the updated explanation page;
and playing the audio information of the speaker.
In a third aspect, an embodiment of the present application provides a synchronous screen projection device for a lecture position of a speaker in a cloud conference scene, where the device includes:
a sending unit, configured to send the updated explanation page to the local device participating in the cloud conference;
a receiving unit, configured to receive audio information of a speaker and shared content in a cloud conference, wherein the cloud conference refers to a conference group created by a conference creator at a cloud end through a first local device, the speaker refers to a participant who obtains control authority over the conference desktop of the cloud conference through a second local device, and the shared content refers to content information uploaded to a cloud space of the cloud conference by participants of the cloud conference through a third local device;
the processing unit is used for acquiring an explanation page corresponding to the audio information of the speaker, wherein the page content of the explanation page is content information included in shared content; the processing unit is also used for determining the explaining position of the speaker according to the audio information of the speaker and the page content of the explaining page; and updating the page content of the explanation page according to the explanation position of the speaker.
In a fourth aspect, embodiments of the present application provide a server comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the first aspect of embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer storage medium storing a computer program for electronic data exchange, where the computer program causes a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application.
It can be seen that the synchronous screen projection method and related device for the explanation position of a speaker in a cloud conference scene described in this embodiment can acquire the audio information of the speaker in the cloud conference; acquire the explanation page corresponding to the audio information of the speaker; determine the explanation position of the speaker according to the audio information and the page content of the explanation page; update the page content of the explanation page according to the explanation position; and send the updated explanation page to the local devices participating in the cloud conference. In this way, the explanation page updated according to the determined explanation position can show the speaker's explanation position to the participants of the cloud conference, so that the participants can quickly locate the explained content on the explanation page, improving their conference experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a synchronous screen projection method for explaining positions of a speaker in a cloud conference scene, applied to a server according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an explanation page provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an updated explanation page corresponding to the explanation page of FIG. 4;
fig. 6 is a functional unit composition block diagram of a synchronous screen projection device for explaining a speaker position in a cloud conference scene according to an embodiment of the present application;
fig. 7 is a block diagram illustrating functional units of a synchronous screen projection apparatus for explaining a location of a speaker in another cloud conference scenario according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The key concepts and terms referred to in this application include, but are not limited to, the following:
(1) The shared content refers to content information uploaded to a cloud space of the cloud conference by participants of the cloud conference through the third local device, and the content information includes information such as files, real-time screen recording pictures and web pages. The files include, for example, document files (txt, doc, docx, ppt, pptx, pdf, etc.), CAD files, audio files, picture files, video files, etc., and the real-time screen recording pictures include, for example, split-screen mirror images, full-screen pictures, etc. of the local device, which is not limited herein.
(2) The explanation page is a page created and generated according to shared content selected by a speaker of the cloud conference. At least one explanation page may be created based on one shared content. The page content of the explanation page is at least part of content information included in the shared content.
(3) The cloud conference refers to a conference group created in a cloud space by a conference creator through a first local device. The conference creator of the cloud conference can participate in the conference as one of the participants of the cloud conference, or does not participate in the conference, or exits the cloud conference in the conference process, which is not limited herein. If the conference creator is one of the conference participants of the cloud conference, the third local device, through which the conference participants upload the shared content, and the first local device, through which the conference creator creates the conference group, may be the same local device.
(4) The speaker is a participant who obtains the control authority of the conference desktop of the cloud conference through the second local device. The conference participant who obtains the control authority of the conference desktop of the cloud conference can be any conference participant who participates in the cloud conference.
(5) The cloud space refers to a resource space used by the cloud end to run the cloud conference and store its data. Under a cloud technical architecture, the cloud end may correspond to a server cluster or a single server, and is used to support users in creating conference groups on the cloud server and to provide cloud conference services.
(6) The local equipment comprises terminal equipment and conference equipment.
The terminal device is a device connected to the server and capable of performing information interaction with the server, for example, sending information to the server and receiving information pushed by the server. The terminal device may be a smart phone, a portable computer, a desktop computer, a smart television, or a smart watch, a smart bracelet, or other devices capable of performing information interaction with the server, which is not limited herein. Herein, the first local device for creating the cloud conference and the third local device for uploading the shared content are both terminal devices.
The conference device is a device connected to the server and capable of performing information interaction with the server, and for example, transmits information to the server and receives information pushed by the server. The conference device may be a large screen device, a projection device, etc., and is not limited herein.
The information sent by the conference device to the server is used for accessing the conference device into the cloud conference, and the information sent by the terminal device to the server can be used for not only accessing the terminal device into the cloud conference, but also feeding back, to the server, the operation performed by the user on the shared content in the cloud space of the cloud conference and/or the conference desktop.
In applications with a cloud conference function, the speaker can indicate the position being explained only to participants in the same physical space, by using gestures or physical tools such as a laser pointer. Participants who are not in the same space as the speaker cannot determine which position on the explanation page the explained content corresponds to, so the user experience is poor.
In order to solve the above problems, the present application provides a synchronous screen-casting method and a related device for a speaker explaining a position in a cloud conference scene, and the following describes the present application in detail with reference to the accompanying drawings.
Please refer to fig. 1, which is a schematic structural diagram of a network architecture according to an embodiment of the present application. As shown in fig. 1, the network architecture may include a server 100 and local devices 200, and the local devices 200 may include one or more terminal devices 200a and conference devices 200b; the numbers of terminal devices 200a and conference devices 200b are not limited herein. Each terminal device 200a and each conference device 200b shown in fig. 1 may be connected to the server 100 through a network, so that each of them can perform data interaction with the server 100 through the network connection.
The server 100 shown in fig. 1 may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communications, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms. The server includes, but is not limited to, a server running an iOS, Android, Microsoft, or other operating system.
The composition structure of the server 100 in the present application may be as shown in fig. 2, where fig. 2 is a schematic structural diagram of a server provided in this embodiment of the present application. The server 100 may comprise a processor 110, a memory 120, a communication interface 130, and one or more programs 121, wherein the one or more programs 121 are stored in the memory 120 and configured to be executed by the processor 110, and wherein the one or more programs 121 comprise instructions for performing any of the steps of the method embodiments described below.
The communication interface 130 is used to support communication between the server 100 and other devices. The processor 110 may be, for example, a Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, units, and circuits described in connection with the disclosure of the embodiments of the application. The processor may also be a combination of devices implementing computing functions, for example a combination of one or more microprocessors, or of a DSP and a microprocessor.
The memory 120 may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In one implementation, the processor 110 is configured to perform any of the steps performed by the server in the method embodiments described below, and when performing a data transfer operation, such as sending an explanation page, the processor may select to invoke the communication interface 130 to complete the corresponding operation.
It should be noted that the schematic structural diagram of the server is merely an example, and more or fewer components may be specifically included, which is not limited herein.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for synchronously screen-casting an explanation location of a presenter in a cloud conference scene according to an embodiment of the present application, where the method may be executed by a server, for example, the method may be applied to the server 100 shown in fig. 1 or fig. 2, and as shown in fig. 3, the method for synchronously screen-casting an explanation location of a presenter in a cloud conference scene includes:
step S101, audio information of a speaker in the cloud conference is obtained.
The cloud conference and the speaker can refer to the foregoing description, and details are not repeated herein.
The server can acquire, from the cloud space, the audio information uploaded by the audio acquisition device of the speaker, where the audio acquisition device is used to collect the audio information of the speaker. The audio acquisition device may be the same device as the speaker's second local device, or a different device.
Step S102, acquiring an explanation page corresponding to the audio information of the speaker.
The shared content and the explanation page can refer to the foregoing description, and are not described herein again.
The acquired explanation page corresponding to the audio information of the speaker may be the explanation page currently used by the speaker, that is, the explanation page on which the display interface of the second local device stays while the speaker controls the cloud conference through that device. The acquired explanation page may also be a plurality of explanation pages corresponding to the shared content being explained by the speaker. The specific number of acquired explanation pages corresponding to the audio information of the speaker is not further limited here and can be set according to actual requirements.
In a specific implementation, if it is detected that the second local device of the speaker has sent a page change instruction, the acquired explanation page is the page shown after the change instruction is executed; in this case there is exactly one explanation page corresponding to the audio information of the speaker. Alternatively, when the shared content has a plurality of topics, at least one explanation page corresponding to the next topic may be acquired before the explanation moves on to that topic; in this case the acquired explanation pages include at least one page. Alternatively, the explanation pages corresponding to the audio information of the speaker may be acquired periodically, for example once after each preset time period, where the duration of the period can be set as required; in this case the acquired explanation pages also include at least one page. The scheme does not further limit the specific form of acquisition, which can be set according to actual requirements. A sketch of these three strategies follows.
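The following minimal Python sketch illustrates the three page-acquisition strategies described above. All names here (ExplanationPage, pages_for_audio, the mode strings) are illustrative assumptions rather than structures defined by this application:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExplanationPage:
    page_id: int
    contents: List[str]  # the "first contents" of the page (paragraphs, images, ...)

def pages_for_audio(pages: List[ExplanationPage], current: int,
                    mode: str = "current") -> List[ExplanationPage]:
    """Select the explanation page(s) to compare against the speaker's audio."""
    if mode == "current":
        # Exactly one page: the page the speaker's device stays on
        # (e.g. after a page change instruction has been executed).
        return [pages[current]]
    if mode == "next_topic":
        # At least one page: also prefetch the page(s) of the next topic.
        return pages[current:min(current + 2, len(pages))]
    # "periodic": re-acquire the full candidate set once per preset period.
    return list(pages)
```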
Step S103, determining the explanation position of the speaker according to the audio information of the speaker and the page content of the explanation page.
In a specific implementation, the server can periodically acquire, from the cloud space, the audio information uploaded by the audio acquisition device of the speaker. Because the audio information used to determine the explanation position is acquired periodically, the explanation position of the speaker can be continuously updated, ensuring the accuracy of the determined position. The duration of the audio acquired each time equals the period duration, so that none of the speaker's lecture content is missed, ensuring the reliability of the method.
Step S104, updating the page content of the explanation page according to the explanation position of the speaker.
The operation of updating the content of the explanation page makes the content corresponding to the determined explanation position of the speaker highlighted, so that it is easy to recognize by eye.
In a specific implementation, if the content corresponding to the explanation position of the speaker is on the currently used explanation page, the update operation performed is: highlight the content corresponding to the explanation position of the speaker on the currently used explanation page. If a plurality of explanation pages corresponding to the audio information of the speaker were acquired and the content corresponding to the explanation position is not on the currently used page, the update operation performed is: highlight the content corresponding to the explanation position of the speaker on the corresponding explanation page.
Step S105, sending the updated explanation page to the local equipment participating in the cloud conference.
The server can send the updated explanation page to the local device except the local device of the speaker, so that other conference participants who are not in the same space with the speaker can quickly locate the explanation position of the speaker through the updated explanation page displayed in the local device display interface. Or the server can send the updated explanation pages to the local devices of all the participants participating in the cloud conference, so that other people in the same space with the speaker can quickly locate the explanation position of the speaker through the updated explanation pages when the speaker does not indicate the explanation position.
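As a synthesis of steps S101 to S105, the following Python sketch shows one plausible server-side loop. The ConferenceServer interface and all of its method names are assumptions made for illustration, not an API defined by this application:

```python
import time
from typing import List, Optional, Protocol, Tuple

class ConferenceServer(Protocol):
    def fetch_speaker_audio(self, seconds: float) -> bytes: ...
    def fetch_explanation_pages(self, audio: bytes) -> List[object]: ...
    def locate_position(self, audio: bytes,
                        pages: List[object]) -> Optional[Tuple[int, int]]: ...
    def mark_position(self, page_index: int, content_index: int) -> object: ...
    def push_to_participants(self, page: object) -> None: ...

def sync_loop(server: ConferenceServer, period_s: float = 2.0) -> None:
    """Repeat steps S101-S105 once per preset period; the audio window equals
    the period so that no lecture content is missed."""
    while True:
        audio = server.fetch_speaker_audio(period_s)       # S101
        pages = server.fetch_explanation_pages(audio)      # S102
        position = server.locate_position(audio, pages)    # S103
        if position is not None:                           # unrelated speech: no update
            updated = server.mark_position(*position)      # S104
            server.push_to_participants(updated)           # S105
        time.sleep(period_s)
```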
It can be seen that the synchronous screen projection method and related device for the explanation position of a speaker in a cloud conference scene described in this embodiment can acquire the audio information of the speaker in the cloud conference; acquire the explanation page corresponding to that audio information; determine the explanation position of the speaker according to the audio information and the page content of the explanation page; update the page content of the explanation page according to the explanation position; and send the updated explanation page to the local devices participating in the cloud conference. Therefore, the explanation page updated according to the determined explanation position can show the speaker's explanation position to the participants of the cloud conference, so that they can quickly locate the explained content on the page, improving their conference experience. In addition, operations such as creating the explanation page are completed on the server; compared with having the local device collect and upload video frames of the displayed content in real time, the amount of data the local device must process is small, and the configuration requirements on it are low. Therefore, the scheme provided by the embodiment of the application has a wider application range.
In one possible example, the determining the lecture position of the speaker according to the audio information of the speaker and the page content of the lecture page includes: comparing the audio information of the speaker with first content in the page content of the explanation page to obtain a first comparison result; wherein, the explanation content of one explanation page comprises at least one first content; and determining the explaining position of the speaker according to the first comparison result.
The first content may be text information or image information. When the first content is text information, one first content can correspond to the content of a segment of text, the content of a sentence of text, the content of a line of text and the like; or, when the text information in the explanation page is distributed to form at least one area, one first content can also correspond to the content of the text in one area. When the first content is image information, a first content may correspond to an independent image, or a first content may correspond to a content of a certain area in an image or a certain feature in an image. The specific form of the first content is not further limited here, and may be specifically set according to requirements. It is understood that the explanation contents in other forms such as tables can be decomposed into corresponding text information or image information.
At least one first content included in the explanation content of one explanation page can be character information; or, the at least one first content included in the explanation content of one explanation page may be image information; or, when the explained contents of one explained page include at least two first contents, the at least two first contents may include both text information and image information.
In a specific implementation, the audio information of the speaker may be compared with each first content in sequence. Alternatively, the audio information may be compared with all first contents simultaneously, which improves efficiency and shortens the time needed to determine the explanation position of the speaker.
In this example, to make the speaker's explanation position easy to highlight, the number of first contents included in the first comparison result is smaller than the number of first contents included in the explanation page. If the two numbers are equal, the explanation page is not updated.
In this example, if the explanation page corresponding to the audio information of the speaker includes at least two first contents, the audio information of the speaker may be compared with each of the first contents, so as to obtain a first comparison result. The first comparison result may be that the lecture position of the presenter is one first content in the lecture page, or, when a plurality of first contents are included in the lecture page, the first comparison result may be that the lecture position of the presenter is at least two first contents in the lecture page. For example, if the explanation page corresponding to the audio information of the speaker includes at least two pieces of text information, one piece of text information may correspond to one first content. At this time, the audio information of the speaker can be compared with each segment of text information respectively, so as to obtain a first comparison result. The first comparison result can be that the speaker explanation position is the text information of a certain section in the explanation page; or, if the explanation page includes a plurality of sections of text information, the first comparison result may be that the explanation position of the speaker is at least two sections of text information in the explanation page. For example, if the explanation page includes content a, content B, and content C, the first comparison result may be that the content corresponding to the explanation position of the speaker is content a; alternatively, the first comparison result may be that the contents corresponding to the lecture position of the presenter are the a content and the C content. It can be understood that the content corresponding to the position of the speaker may be the whole text message of the first content corresponding to the first comparison result, or may be a certain sentence of text message or a certain phrase in the text message.
Therefore, by comparing the audio information of the speaker with each first content on the explanation page, the first content corresponding to the audio information can be located quickly and the explanation position of the speaker determined. With this embodiment, when the page content is large it can be divided into a plurality of first contents, reducing the workload of a single comparison on the server and relieving its processing pressure. A sketch of this first comparison follows.
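A minimal sketch of the first comparison, assuming the audio has already been transcribed to text (the parsing step is described later in this document); the word-overlap metric and all names are illustrative assumptions, since the disclosure leaves the exact comparison open:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

def word_overlap(asr_text: str, content: str) -> int:
    # Naive shared-word count; the disclosure does not fix the metric.
    return len(set(asr_text.split()) & set(content.split()))

def first_comparison(asr_text: str, first_contents: List[str]) -> Optional[int]:
    """Compare the transcribed audio with every first content simultaneously
    and return the index of the best-matching first content."""
    if not first_contents:
        return None
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda c: word_overlap(asr_text, c), first_contents))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > 0 else None  # no match: leave the page unchanged
```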
In one possible example, the page content of the explanation page includes at least two first contents; the comparing the audio information of the speaker with the first content in the page content of the explanation page includes: determining whether the at least two first contents comprise first contents with explained identifications, wherein the first contents with the explained identifications are first contents corresponding to historical explanation positions of a speaker; and if so, comparing the audio information of the speaker with the first content without the explained identification.
The first content with the explained identifier is the first content once confirmed as the explaining position of the speaker in the process of explaining the shared content by the speaker, that is, the first content confirmed as the explaining position of the speaker in the process of confirming the explaining position of the speaker historically, and can also be understood as the content already explained by the speaker.
In specific implementation, the server may set the explained identifier for the first content corresponding to the first comparison result after obtaining the first comparison result each time, so as to identify and classify when performing subsequent steps.
In a specific implementation, the server may determine whether there is a first content with an already explained identifier in the explained content of the explained page, and if not, may compare the audio information of the speaker with each first content. And if so, comparing the audio information of the speaker with each first content which is not set with the explained identification. At this time, if the comparison is successful, the speaker explaining position is a position corresponding to the first content which is successfully compared; and if the comparison fails, comparing the audio information of the speaker with the first content provided with the explained identifier.
It can be understood that a speaker typically moves on to the next content only after finishing the previous one. Therefore, comparing the audio information of the speaker first with the first contents not yet explained improves the efficiency of obtaining the first comparison result, shortens the time difference between acquiring the audio information and obtaining the result, and so improves the efficiency of determining the explanation position. A sketch of this ordering follows.
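A sketch of the explained-identifier optimization, under the same word-overlap assumption as above; representing the explained identifiers as a set of indices is an illustrative choice:

```python
from typing import List, Optional, Set

def locate_with_history(asr_text: str, contents: List[str],
                        explained: Set[int]) -> Optional[int]:
    """Compare against contents without the explained identifier first;
    fall back to already-explained contents only if that fails."""
    def best_of(indices: List[int]) -> Optional[int]:
        scored = [(len(set(asr_text.split()) & set(contents[i].split())), i)
                  for i in indices]
        scored = [pair for pair in scored if pair[0] > 0]
        return max(scored)[1] if scored else None

    hit = best_of([i for i in range(len(contents)) if i not in explained])
    if hit is None:                       # not among the unexplained contents:
        hit = best_of(sorted(explained))  # compare with the explained ones
    if hit is not None:
        explained.add(hit)                # becomes history for the next round
    return hit
```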
In one possible example, if the page content of the explanation page includes at least two first contents, the audio information of the speaker and the first contents adjacent to the explanation position of the speaker determined last time may be compared first, so as to shorten a time difference between obtaining the audio information of the speaker and obtaining the first comparison result, and improve accuracy of the explanation position of the determined speaker.
In one possible example, the determining the lecture position of the speaker according to the first comparison result includes: determining first content corresponding to the explanation position of the speaker according to the first comparison result; judging whether the number of third contents is larger than a preset number, wherein the third contents are contents corresponding to the first contents, and the first contents comprise at least one third content; if so, comparing the audio information of the speaker with second content to obtain a second comparison result, wherein the second content is content corresponding to the first content, and the first content comprises at least one second content; and determining the explaining position of the speaker according to the second comparison result.
If the first content only comprises text information, the second content is the text information; if the first content only comprises image information, the second content is the image information; and if the first content comprises character information and image information, the second content is the character information or the image information. Illustratively, when the first content includes a plurality of text segments, a second content may correspond to the content of a text segment, the content of a sentence of text, the content of a line of text, or the like. When the first content includes only one piece of text, one second content may correspond to the content of one sentence of text, the content of one line of text, or the like. When the first content comprises at least two images, one second content may correspond to an independent image, or one second content may correspond to the content of a certain area in an image or a certain feature in an image.
When the first content is text information, the number of the third content may be the number of paragraphs, the number of sentences, the number of texts, and the like. When the first content is image information, the third content may be a feature amount or the like. For example, if the first content is a text segment, the second content is a sentence, and the number of the third content may be the number of sentences in the text segment or the number of words in the text segment.
The preset number may be one, two or more, and may be specifically set according to actual requirements, which is not specifically limited herein. For example, when the first content is a segment of words, the third content may be defined as a number of words, and the number of the third content is the number of words included in the first content, and the preset number may be N words.
In a specific implementation, if the number of the third contents is greater than the preset number, the audio information of the speaker needs to be further compared with each of the second contents. And if the number of the third contents is smaller than the preset number, taking the first contents corresponding to the first comparison result as the finally determined explaining position of the speaker.
It is to be understood that when the audio information of the speaker is compared with the respective second contents, the second contents may be treated as first contents, and the steps of the above embodiments may be applied to them.
Therefore, the scheme provided by this embodiment can improve the accuracy of the explanation position of the speaker and avoid the situation where participants cannot accurately find the explained content because a single comparison returns too much content. It also reduces the processing pressure that a single comparison places on the server, lowering the performance required per comparison. A sketch of this coarse-to-fine refinement follows.
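A sketch of the coarse-to-fine refinement, assuming text first contents whose third contents are words and whose second contents are sentences; the preset number of 50 is an arbitrary illustrative threshold, not a value taken from the disclosure:

```python
PRESET_COUNT = 50  # assumed threshold for the number of third contents (words)

def refine_position(asr_text: str, first_content: str) -> str:
    """If a matched first content holds more than the preset number of third
    contents, split it into sentence-level second contents and compare again."""
    if len(first_content.split()) <= PRESET_COUNT:
        return first_content  # the coarse first-content result is precise enough
    sentences = [s.strip() for s in first_content.split(".") if s.strip()]
    return max(sentences,
               key=lambda s: len(set(asr_text.split()) & set(s.split())))
```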
In one possible example, the comparing the audio information of the speaker with the first content in the page content of the explanation page to obtain a first comparison result includes: analyzing the audio information of the speaker to obtain text data; if the first content is character information, matching the text data with the character information to obtain matching similarity; if the first content is image information, matching the text data with a preset feature tag of the image information to obtain matching similarity; comparing the matching similarity of all the first contents to obtain the first content with the highest matching similarity; wherein the first content with the highest matching similarity is the speaker explanation position.
The method for analyzing the audio information of the speaker may be a method of converting voice into text or a method of semantic analysis, and is not further limited herein. Correspondingly, the text data can be a text obtained by directly converting the audio information of the speaker; or the text data may be an analysis text obtained by performing semantic analysis on the audio information of the speaker, and an analysis result included in the analysis text may be a sentence or a phrase, and the like.
The preset feature tag is a character tag which is correspondingly set by the server according to the image content. The preset feature tag can be a character tag which is set by the server before the speaker explains the corresponding shared content, so that the high efficiency of the speaker explaining position determination is ensured.
In a specific implementation, if the first content is text information, the text data may be matched with the text information as follows: match the converted text against the first content, using the number of identical characters in the two as the matching similarity, where more identical characters means higher similarity. Alternatively: acquire the character features corresponding to the first content, where a character feature may be key text content or a common expression preset by the server according to the shared content, and compare the analysis text with each character feature, using the number of character features of the first content successfully matched by the analysis text as the matching similarity, where more matched features means higher similarity. Or, to further improve the accuracy of the located position, the two may be combined: weight the first matching similarity, obtained by comparing the first content with the converted text, and the second matching similarity, obtained by comparing the character features of the first content with the analysis text, to obtain the matching similarity of the first content; the weight ratio of the two can be set as required and is not further limited here.
In a specific implementation, if the first content is image information, the matching may be performed as follows: match the analysis text against the preset feature tags of the image information, using the number of successfully matched preset feature tags as the matching similarity, where more matched tags means higher similarity. Specifically, the preset feature tags may be compared with the sentences included in the analysis text, or with the phrases included in the analysis text.
In a specific implementation, to find the highest matching similarity, the matching similarities of all the first contents may be sorted in descending order, with the first entry having the highest similarity.
Therefore, by calculating the matching similarity of each first content and taking the first content with the highest similarity as the content corresponding to the explanation position, the accuracy of the determined explanation position of the speaker can be ensured. In addition, determining the first comparison result by converting the audio information of the speaker into text data and by presetting feature tags for images is more efficient than first retrieving an image corresponding to the audio information and then comparing that image with the first content.
Further, in an embodiment, before the matching similarities of all the first contents are compared with each other, they may first be compared with a preset similarity. If all matching similarities are smaller than the preset similarity, the matching fails and the explanation page is not updated; in this case the speech in the speaker's audio information is unrelated to the conference content. If exactly one matching similarity is greater than or equal to the preset similarity, the position of the explanation content corresponding to that similarity is determined as the explanation position of the speaker. If at least two matching similarities are greater than or equal to the preset similarity, the following step is executed: compare the matching similarities of all the first contents and determine the first content with the highest similarity as the explanation position of the speaker. In this way, the accuracy of the method for determining the explanation position can be further improved. A sketch combining the threshold with the text and image cases follows.
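A sketch combining the preset-similarity threshold with both content types; the 0.2 threshold, the tag-list representation of images, and the normalized overlap metric are all illustrative assumptions:

```python
from typing import List, Optional, Union

PRESET_SIMILARITY = 0.2  # assumed threshold separating conference-related speech

def match_similarity(asr_text: str, content: Union[str, List[str]]) -> float:
    """Text content is matched directly; image content is matched through its
    preset feature tags (a word list set in advance by the server)."""
    words = set(asr_text.split())
    target = set(content.split()) if isinstance(content, str) else set(content)
    return len(words & target) / max(len(target), 1)

def locate(asr_text: str, contents: List[Union[str, List[str]]]) -> Optional[int]:
    scores = [match_similarity(asr_text, c) for c in contents]
    hits = [i for i, s in enumerate(scores) if s >= PRESET_SIMILARITY]
    if not hits:
        return None  # all below the threshold: speech unrelated, no update
    return max(hits, key=scores.__getitem__)  # highest similarity wins
```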
In one possible example, the updating the page content of the explanation page according to the explanation position of the speaker includes: and setting an explanation position mark for content information corresponding to the explanation position of the speaker in the page content of the explanation page to obtain the updated explanation page.
The explaining position mark can be a directional icon such as a cursor or an icon in other forms, which is arranged corresponding to the explaining position of the speaker; or, the explanation position mark can also be used for marking the content corresponding to the explanation position of the speaker in a form obviously different from other page contents of the explanation page, such as setting a background color or setting a mark frame and the like. The specific form of the explanation position mark is not further limited herein, and may be specifically set according to the requirement.
Participants can view the explanation page in full-screen or non-full-screen mode. Taking full-screen mode as an example, refer to fig. 4 and fig. 5: fig. 4 is a schematic structural diagram of an explanation page provided in an embodiment of the present application, and fig. 5 is a schematic diagram of the corresponding updated explanation page. The explanation page may include an area for displaying the page content as well as other functional components. The functional components may include: a duration component for recording the conference duration, a component for exiting full screen, a component for displaying the current speaker, a microphone component for speaking, and so on, which can be set according to actual requirements. In the example of fig. 4 and fig. 5, the first contents on the explanation page are all text information, and the explanation position finally determined by the method is one sentence of text. As shown in fig. 5, to highlight the explanation position of the speaker, the explanation position mark sets a background color at the position corresponding to the explanation position.
It can thus be seen that by setting an explanation position mark at the explanation position of the speaker on the explanation page, the updated explanation page is obtained. The updated explanation page obtained in this embodiment not only makes it easy for participants to quickly identify the explanation position of the speaker, but also lets them view the other explanation contents on the page, helping them understand the explained content in context.
In other embodiments, the method for updating the page content according to the explanation position of the speaker may also be: highlight the content corresponding to the explanation position on the explanation page in a popup window. Or, only the content corresponding to the explanation position may be used as the content of the updated explanation page. In this way, the content corresponding to the explanation position of the speaker can be shown to the participants even more intuitively. A sketch of the marking step follows.
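A minimal sketch of the marking operation of step S104 on text contents; the HTML span and the pointer glyph are illustrative assumptions about how a mark might be rendered:

```python
from typing import List

def mark_explained_position(contents: List[str], index: int,
                            style: str = "background") -> List[str]:
    """Return updated page contents with an explanation position mark set on
    the located first content."""
    marked = list(contents)
    if style == "background":
        # Background-color mark, as in the fig. 5 example.
        marked[index] = f'<span style="background:#fff3b0">{marked[index]}</span>'
    else:
        # Pointer-icon mark placed before the content (cursor-style mark).
        marked[index] = "\u261b " + marked[index]
    return marked
```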
Correspondingly to the embodiment shown in fig. 3, an embodiment of the present application further provides a synchronous screen-casting method for an explanation position of a speaker in a cloud conference scene, where the method is applied to a local device, the local device is connected to a server, and the synchronous screen-casting method for the explanation position of the speaker in the cloud conference scene includes: receiving audio information of a speaker from a server; receiving an updated explanation page from a server; displaying the updated explanation page; and playing the audio information of the speaker.
The local device may be the second local device of the speaker, or a local device of another participant. The local device may receive a cloud conference interface, such as the conference desktop or the explanation interface, from the server and display it in its own display interface.
The updated explanation page refers to an explanation page after page content of the explanation page is updated according to an explanation position of the speaker, and the explanation position of the speaker refers to a position corresponding to audio information of the speaker in the page content of the explanation page. For details, reference may be made to the above description, which is not repeated herein.
It can be seen that through this local device, participants can directly identify the position the speaker is explaining, which improves the effectiveness of conference communication and the user experience. In addition, operations such as creating the explanation page are completed on the server; compared with having the local device collect and upload displayed video frames in real time, the amount of data the local device needs to process is small, and the configuration requirements on it are low. Therefore, the scheme provided by the embodiment of the application has a wider application range. A sketch of the device-side behavior follows.
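A minimal device-side sketch of the second-aspect method (receive, display, play); the queue transport and the print placeholders are illustrative assumptions, since the application does not fix the transport or rendering details:

```python
import queue
from typing import Optional, Tuple

def client_loop(inbox: "queue.Queue[Optional[Tuple[str, bytes]]]") -> None:
    """Handle server pushes on a local device: display updated explanation
    pages and play the speaker's audio as they arrive."""
    while True:
        item = inbox.get()
        if item is None:           # sentinel: the cloud conference has ended
            break
        kind, payload = item
        if kind == "page":
            print("display updated explanation page:", payload.decode())
        elif kind == "audio":
            print("play", len(payload), "bytes of speaker audio")
```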
In the present application, the server may be divided into functional units according to the above method examples. For example, each functional unit may be divided to correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiment of the present application is schematic and is only a logical function division; other division manners are possible in actual implementation.
Fig. 6 is a block diagram illustrating functional units of a synchronous screen projection device for explaining positions of a speaker in a cloud conference scene according to an embodiment of the present application. The synchronous screen projection device 300 for the lecture position of the speaker in the cloud conference scene can be applied to the server 100 in the network architecture shown in fig. 1, and the synchronous screen projection device 300 for the lecture position of the speaker in the cloud conference scene includes:
a sending unit 310, configured to send the updated explanation page to the local device participating in the cloud conference;
a receiving unit 320, configured to receive audio information and shared content of a speaker in a cloud conference, where the cloud conference refers to a conference group created at a cloud end by a conference creator through a first local device, the speaker refers to a participant who obtains a control authority of a conference desktop of the cloud conference through a second local device, and the shared content refers to content information uploaded to a cloud space of the cloud conference by the participant of the cloud conference through a third local device;
the processing unit 330 is configured to obtain an explanation page corresponding to the audio information of the speaker, where page content of the explanation page is content information included in shared content; the processing unit is also used for determining the explaining position of the speaker according to the audio information of the speaker and the page content of the explaining page; and updating the page content of the explanation page according to the explanation position of the speaker.
In one possible example, in terms of determining the lecture position of the speaker based on the audio information of the speaker and the page content of the lecture page, the processing unit is specifically configured to: comparing the audio information of the speaker with first content in the page content of the explanation page to obtain a first comparison result; wherein, the explanation content of one explanation page comprises at least one first content; and determining the explaining position of the speaker according to the first comparison result.
In one possible example, the page content of the explanation page includes at least two first contents; in the aspect of the comparison between the audio information of the speaker and the first content of the page content of the explanation page, the processing unit is specifically configured to: determining whether the at least two first contents comprise first contents with explained identifications, wherein the first contents with the explained identifications are first contents corresponding to the historical explanation positions of the speaker; and if so, comparing the audio information of the speaker with the first content without the explained identification.
In one possible example, in determining the explanation position of the speaker according to the first comparison result, the processing unit is specifically configured to: determine the first content corresponding to the explanation position of the speaker according to the first comparison result; judge whether the number of third contents is larger than a preset number, where the third contents are contents contained in that first content, and the first content comprises at least one third content; if so, compare the audio information of the speaker with second content to obtain a second comparison result, where the second content is likewise content contained in the first content, and the first content comprises at least one second content; and determine the explanation position of the speaker according to the second comparison result.

In one possible example, in comparing the audio information of the speaker with the first content in the page content of the explanation page to obtain a first comparison result, the processing unit is specifically configured to: parse the audio information of the speaker to obtain text data; if the first content is text information, match the text data with the text information to obtain a matching similarity; if the first content is image information, match the text data with a preset feature tag of the image information to obtain a matching similarity; and compare the matching similarities of all the first contents to obtain the first content with the highest matching similarity, where the first content with the highest matching similarity indicates the explanation position of the speaker.
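The following sketch shows one way this two-level matching could be realized in Python. It is a minimal illustration under stated assumptions: speech recognition is abstracted behind a placeholder transcribe function, text similarity uses the standard-library difflib ratio, and the preset number as well as the content fields ("kind", "text", "feature_tags", "third_contents", "second_contents") are invented for the example; the patent does not prescribe any of them.

    import difflib

    def transcribe(audio):
        # Placeholder for any speech-to-text engine; the patent only
        # requires that the speaker's audio be parsed into text data.
        raise NotImplementedError

    def similarity(text_data, content):
        # Text content: match the transcript against the text information.
        if content["kind"] == "text":
            return difflib.SequenceMatcher(None, text_data,
                                           content["text"]).ratio()
        # Image content: match the transcript against preset feature tags.
        tags = " ".join(content["feature_tags"])
        return difflib.SequenceMatcher(None, text_data, tags).ratio()

    def match_position(audio, page, preset_number=3):
        text_data = transcribe(audio)
        # First comparison: rank the not-yet-explained first contents by
        # matching similarity and take the best one.
        best = max(candidate_contents(page),
                   key=lambda c: similarity(text_data, c))
        # If the matched first content holds more third contents than the
        # preset number, refine the position with a second comparison
        # against its second contents.
        if len(best.get("third_contents", [])) > preset_number:
            best = max(best.get("second_contents", [best]),
                       key=lambda c: similarity(text_data, c))
        return best

In practice the difflib ratio would likely be replaced by a stronger semantic matching model, but the control flow of the two comparisons stays the same.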
In one possible example, in updating the page content of the explanation page according to the explanation position of the speaker, the processing unit is specifically configured to: set an explanation position mark on the content information corresponding to the explanation position of the speaker in the page content of the explanation page, so as to obtain the updated explanation page.
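Continuing the same hypothetical data model, the marking step might look as follows; the "marked" field and the re-use of the "explained" flag are illustrative assumptions:

    def update_page(page, matched):
        # Clear any previous explanation position mark, then mark the
        # content the speaker is currently explaining; the matched content
        # also receives the explained identifier so that later comparisons
        # can skip it (see the filtering sketch above).
        for content in page.contents:
            content["marked"] = False
        matched["explained"] = True
        matched["marked"] = True  # local devices highlight this position
        return page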
Fig. 7 is a block diagram of the functional units of the synchronous screen projection apparatus 300 for the explanation position of the speaker in the cloud conference scene according to an embodiment of the present application. In fig. 7, the apparatus includes a processing module 350 and a communication module 340. The processing module 350 is configured to control and manage the actions of the apparatus 300, for example, the steps performed by the receiving unit 320, the processing unit 330, and the sending unit 310, and/or other processes of the techniques described herein. The communication module 340 is configured to support interaction between the apparatus 300 and other devices. As shown in fig. 7, the apparatus 300 may further include a storage module 360, and the storage module 360 is configured to store the program code and data of the apparatus 300.
The processing module 350 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 340 may be a transceiver, a radio frequency (RF) circuit, a communication interface, or the like. The storage module 360 may be a memory.
For all relevant details of each scenario involved in the above method embodiment, reference may be made to the functional descriptions of the corresponding functional modules; they are not repeated here. The synchronous screen projection apparatus 300 can perform the steps performed by the server in the synchronous screen projection method for the explanation position of the speaker in the cloud conference scene shown in fig. 3.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute some or all of the steps of any of the methods described in the above method embodiments; the computer includes a server.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of acts or combinations of acts; however, those skilled in the art will recognize that the present application is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the present application. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only one kind of logical function division, and other divisions may be used in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be implemented through certain interfaces, and the indirect couplings or communication connections between devices or units may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or some of the steps in the methods of the above embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable memory, which may include a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The foregoing detailed description of the embodiments of the present application illustrates the principles and implementations of the present application; the above description of the embodiments is provided only to help understand the method and core concept of the present application. Meanwhile, for those skilled in the art, there may be variations in the specific implementation and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A synchronous screen projection method for an explanation position of a speaker in a cloud conference scene, characterized in that the method is applied to a server and comprises the following steps:
acquiring audio information of a speaker in a cloud conference, wherein the cloud conference refers to a conference group created by a conference creator at a cloud end through a first local device, and the speaker refers to a participant who acquires control authority of a conference desktop of the cloud conference through a second local device;
acquiring an explanation page corresponding to the audio information of the speaker, wherein the page content of the explanation page is content information included in shared content, and the shared content refers to content information uploaded to a cloud space of the cloud conference by participants of the cloud conference through third local equipment;
determining the explaining position of the speaker according to the audio information of the speaker and the page content of the explaining page;
updating the page content of the explanation page according to the explanation position of the speaker;
and sending the updated explanation page to local equipment participating in the cloud conference.
2. The method of claim 1, wherein said determining the explanation position of the speaker according to the audio information of the speaker and the page content of the explanation page comprises:
comparing the audio information of the speaker with first content in the page content of the explanation page to obtain a first comparison result, wherein the page content of one explanation page comprises at least one first content;
and determining the explanation position of the speaker according to the first comparison result.
3. The method of claim 2, wherein the page content of the explanation page includes at least two first contents;
the comparing the audio information of the speaker with the first content in the page content of the explanation page includes:
determining whether the at least two first contents comprise first content carrying an explained identifier, wherein the first content carrying the explained identifier is first content corresponding to a historical explanation position of the speaker;
and if so, comparing the audio information of the speaker with the first content that does not carry the explained identifier.
4. The method of claim 2, wherein said determining the explanation position of the speaker according to the first comparison result comprises:
determining first content corresponding to the explanation position of the speaker according to the first comparison result;
judging whether the number of third contents is larger than a preset number, wherein the third contents are contents contained in the first content, and the first content comprises at least one third content;
if so, comparing the audio information of the speaker with second content to obtain a second comparison result, wherein the second content is likewise content contained in the first content, and the first content comprises at least one second content;
and determining the explaining position of the speaker according to the second comparison result.
5. The method of claim 2, wherein said comparing the audio information of the speaker with the first content of the page content of the explanation page to obtain a first comparison result comprises:
analyzing the audio information of the speaker to obtain text data;
if the first content is text information, matching the text data with the text information to obtain a matching similarity;
if the first content is image information, matching the text data with a preset feature tag of the image information to obtain a matching similarity;
comparing the matching similarities of all the first contents to obtain the first content with the highest matching similarity, wherein the first content with the highest matching similarity indicates the explanation position of the speaker.
6. The method of claim 1, wherein said updating the page contents of said explanation page according to said lecture position of said speaker comprises:
and setting an explanation position mark for content information corresponding to the explanation position of the speaker in the page content of the explanation page to obtain the updated explanation page.
7. A synchronous screen projection method for an explanation position of a speaker in a cloud conference scene, characterized in that the method is applied to a local device and comprises the following steps:
receiving audio information of a speaker from a server;
receiving an updated explanation page from a server, wherein the updated explanation page refers to an explanation page after page content of the explanation page is updated according to an explanation position of a speaker, and the explanation position of the speaker refers to a position, corresponding to audio information of the speaker, in the page content of the explanation page;
displaying the updated explanation page;
and playing the audio information of the speaker.
8. A synchronous screen projection apparatus for an explanation position of a speaker in a cloud conference scene, characterized in that the apparatus is applied to a server and comprises:
a sending unit, configured to send the updated explanation page to the local device participating in the cloud conference;
the system comprises a receiving unit, a sharing unit and a processing unit, wherein the receiving unit is used for receiving audio information and shared content of a speaker in a cloud conference, the cloud conference refers to a conference group created by a conference creator at a cloud end through first local equipment, the speaker refers to a participant who obtains control authority of a conference desktop of the cloud conference through second local equipment, and the shared content refers to content information uploaded to a cloud space of the cloud conference by the participant of the cloud conference through third local equipment;
a processing unit, configured to obtain an explanation page corresponding to the audio information of the speaker, wherein the page content of the explanation page is content information included in the shared content; the processing unit is further configured to determine the explanation position of the speaker according to the audio information of the speaker and the page content of the explanation page, and to update the page content of the explanation page according to the explanation position of the speaker.
9. A server, comprising a processor, memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps in the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the steps in the method according to any of claims 1-7.
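For completeness, the following is a minimal sketch of the local-device flow of claim 7, assuming a hypothetical conference client in which the server connection and the user interface expose recv_page, recv_audio, display, and play helpers; none of these names come from the patent.

    def local_device_loop(server, ui):
        # Claim 7, sketched: receive the speaker's audio and the updated
        # explanation page from the server, then render both in sync.
        while True:
            page = server.recv_page()    # page with the explanation mark set
            audio = server.recv_audio()  # the speaker's audio for this page
            ui.display(page)             # show the marked explanation page
            ui.play(audio)               # play the speaker's audio alongside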
CN202210945934.7A 2022-08-08 2022-08-08 Synchronous screen projection method and related device for explanation position of speaker in cloud conference scene Active CN115396404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210945934.7A CN115396404B (en) 2022-08-08 2022-08-08 Synchronous screen projection method and related device for explanation position of speaker in cloud conference scene


Publications (2)

Publication Number Publication Date
CN115396404A (en) 2022-11-25
CN115396404B CN115396404B (en) 2023-09-05

Family

ID=84119044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210945934.7A Active CN115396404B (en) 2022-08-08 2022-08-08 Synchronous screen projection method and related device for explanation position of speaker in cloud conference scene

Country Status (1)

Country Link
CN (1) CN115396404B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102209080A (en) * 2010-03-30 2011-10-05 刘盛举 Terminal system for synchronous teaching or conferences and control method thereof
WO2018120821A1 (en) * 2016-12-26 2018-07-05 北京奇虎科技有限公司 Method and device for producing presentation
US20180205797A1 (en) * 2017-01-15 2018-07-19 Microsoft Technology Licensing, Llc Generating an activity sequence for a teleconference session
US20190052473A1 (en) * 2017-08-09 2019-02-14 Adobe Systems Incorporated Synchronized Accessibility for Client Devices in an Online Conference Collaboration
CN107553505A (en) * 2017-10-13 2018-01-09 刘杜 Autonomous introduction system platform robot and explanation method
CN111933131A (en) * 2020-05-14 2020-11-13 联想(北京)有限公司 Voice recognition method and device
CN112988315A (en) * 2021-05-19 2021-06-18 全时云商务服务股份有限公司 Method, system and readable storage medium for personalized viewing of shared desktop
CN114679437A (en) * 2022-03-11 2022-06-28 阿里巴巴(中国)有限公司 Teleconference method, data interaction method, device, and computer storage medium
CN114827102A (en) * 2022-06-30 2022-07-29 深圳乐播科技有限公司 Information security control method based on cloud conference and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUA Kan: "Design and Application of a VC-Based Remote Simultaneous Interpretation System in International Conferences of the Beijing Winter Olympics Organising Committee", Audio Engineering (电声技术), no. 06 *

Also Published As

Publication number Publication date
CN115396404B (en) 2023-09-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant