CN113468319B - Internet-based multi-application-scene conference interaction system and method

Info

Publication number
CN113468319B
CN113468319B
Authority
CN
China
Prior art keywords
area
conference
paragraph
character information
terminal
Prior art date
Legal status
Active
Application number
CN202110823507.7A
Other languages
Chinese (zh)
Other versions
CN113468319A (en)
Inventor
何文龙
李永红
刘军涛
Current Assignee
Shenzhen Electron Technology Co ltd
Original Assignee
Shenzhen Electron Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Electron Technology Co ltd
Priority to CN202110823507.7A
Publication of CN113468319A
Application granted
Publication of CN113468319B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/106 Display of layout of documents; Previewing

Abstract

The invention provides an internet-based conference interaction system and method for multiple application scenes, and relates to the field of the internet. The invention avoids the privacy leakage that a shared screen may cause by using a voice conference instead: during the conference, the first terminal performs voice and gaze recognition on the user to capture what the user says and looks at while speaking; based on the display content information and the character information of the user's speech, the corresponding position of the speech content within the conference lecture manuscript is determined as a prompt position and pushed to the second terminal, so that once a user of the second terminal opens the conference lecture manuscript, it can be annotated in real time and displayed to that user. The invention thus ensures that participants other than the speaker can quickly confirm the position of the speaker's current content within the lecture manuscript, preserving communication efficiency in the teleconference while avoiding privacy leakage.

Description

Internet-based multi-application-scene conference interaction system and method
Technical Field
The invention relates to the technical field of Internet, in particular to a conference interaction system and method based on multiple application scenes of the Internet.
Background
With the growing demand for working from home, existing internet conference technology generally adopts a teleconference mode to maintain the working efficiency of employees at home.
In order to improve communication efficiency during a conference, a shared screen is usually used to display the lecture content while a user speaks. However, screen sharing easily exposes the user's non-conference files as well, creating a privacy-leakage risk; a voice conference avoids this situation.
In a voice conference, however, when presenting a lecture manuscript with a large amount of text, the speaker usually reads from the manuscript while presenting, and the other participants cannot easily confirm, from voice alone, the position of the speaker's current content within the manuscript, which reduces efficiency.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the defects of the prior art, the invention provides an internet-based conference interaction system and method for multiple application scenes, solving the problem that, in existing voice conferences, participants other than the speaker cannot reliably confirm the position of the speaker's current content within the lecture manuscript from voice alone.
(II) Technical scheme
To achieve the above purpose, the invention is realized by the following technical scheme:
In a first aspect, an internet-based conference interaction system for multiple application scenes is provided, which includes: a first terminal, a second terminal, and a network server;
the first terminal includes: a conference lecture manuscript uploading module, a voice recognition module, a watching area recognition module, and a display content information acquisition module;
the second terminal includes: a conference lecture manuscript acquisition module and a real-time prompt module;
the network server includes: a shared database, a text recognition module, and a prompt area determination module;
the conference lecture manuscript uploading module is used for uploading the conference lecture manuscript selected by the user to a shared database of the network server;
the text recognition module is used for acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
the conference lecture manuscript acquisition module is used for acquiring a conference lecture manuscript uploaded from a first terminal from a shared database of a network server;
the voice recognition module is used for recognizing voices in each preset time interval during speaking of the user in real time as character information of the user speaking;
the watching area identification module is used for acquiring a watching area of a user of the first terminal when the user opens a conference lecture manuscript and speaks;
the display content information acquisition module is used for acquiring display content information on the first terminal after the watching area of the user is identified; the display content information comprises a display area of the conference lecture manuscript and a watching area of a user of the first terminal;
the prompting area determining module is used for determining a corresponding position of the speaking content of the user of the first terminal in the conference lecture draft as a prompting position based on the acquired display content information on the first terminal and the character information of the user during speaking, and then pushing the prompting position to the second terminal;
and the real-time prompting module is used for carrying out real-time marking on the conference lecture draft and displaying the conference lecture draft to a user of the second terminal after the conference lecture draft is opened at the second terminal based on the prompting position.
Further, the determining, based on the collected display content information on the first terminal and the character information of the user during speaking, a corresponding position of the speaking content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal includes:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is greater than the judgment threshold, taking the target paragraph and the target line as the prompt position.
Further, the expanding the gazing area to obtain a gazing paragraph area includes:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
Further, the calculating a first matching degree between the character information of the gazing paragraph area and the character information of each paragraph in the conference lecture draft includes:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
Further, the calculating a second matching degree between the character information of the gazing area and the character information of the target paragraph includes:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
In a second aspect, a conference interaction method for multiple application scenarios based on the internet is provided, and the method includes:
T1, acquiring a conference lecture manuscript from the first terminal;
T2, acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
T3, sharing the conference lecture manuscript to all second terminals;
T4, acquiring the character information of the user's speech from the first terminal in real time; the character information of the user's speech is recognized from the voice in each preset time interval while the user speaks;
T5, acquiring, from the first terminal in real time, the display content information while the user has the conference lecture manuscript open and is speaking; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal;
and T6, determining, based on the display content information and the character information of the user's speech, the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal for real-time marking on the conference lecture manuscript and display to the user of the second terminal after the conference lecture manuscript is opened at the second terminal.
Further, the determining, based on the display content information and the character information of the user during speaking, a corresponding position of the speaking content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal includes:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is greater than the judgment threshold, taking the target paragraph and the target line as the prompt position.
Further, the expanding the gazing area to obtain a gazing paragraph area includes:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
Further, the calculating a first matching degree between the character information of the gazing paragraph area and the character information of each paragraph in the conference lecture draft includes:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
Further, the calculating a second matching degree between the character information of the gazing area and the character information of the target paragraph includes:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
(III) Advantageous effects
The invention avoids the privacy leakage that a shared screen may cause by using a voice conference instead. During the conference, the first terminal performs voice and gaze recognition on the user to capture what the user says and looks at while speaking; based on the display content information and the character information of the user's speech, the corresponding position of the speech content within the conference lecture manuscript is determined as a prompt position and pushed to the second terminal, so that once a user of the second terminal opens the conference lecture manuscript, it can be annotated in real time and displayed to that user. The invention can therefore be used in ordinary voice conferences and is particularly optimized for speakers presenting large amounts of text. Participants other than the speaker can quickly confirm the position of the speaker's current content within the lecture manuscript, which preserves communication efficiency in the teleconference while avoiding privacy leakage.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a system block diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a display area, a gaze area, a bounding box, and a gaze segment area of an embodiment of the present invention;
FIG. 3 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
By providing an internet-based conference interaction method and system for multiple application scenes, the embodiments of the present application solve the problem that, in existing voice conferences, participants other than the speaker cannot reliably confirm the position of the speaker's current content within the lecture manuscript from voice alone.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
the embodiment of the invention provides an internet-based conference interaction system with multiple application scenes, as shown in fig. 1, comprising: the system comprises a first terminal, a second terminal and a network server.
The first terminal and the second terminal are intelligent devices equipped with conference software, such as personal computers; the first terminal is the device of the presenter of the lecture manuscript, and the second terminals are the devices of the other participants.
The first terminal includes: a conference lecture manuscript uploading module, a voice recognition module, a watching area recognition module, and a display content information acquisition module;
the second terminal includes: a conference lecture manuscript acquisition module and a real-time prompt module;
the network server includes: a shared database, a text recognition module, and a prompt area determination module;
the conference lecture manuscript uploading module is used for uploading the conference lecture manuscript selected by the user to a shared database of the network server;
the text recognition module is used for acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
the conference lecture manuscript acquisition module is used for acquiring a conference lecture manuscript uploaded from a first terminal from a shared database of a network server;
the voice recognition module is used for recognizing voices in each preset time interval during speaking of the user in real time as character information of the user speaking;
the watching area identification module is used for acquiring a watching area of a user of the first terminal when the user opens a conference lecture manuscript and speaks;
the display content information acquisition module is used for acquiring display content information on the first terminal after the watching area of the user is identified; the display content information comprises a display area of the conference lecture manuscript and a watching area of a user of the first terminal;
the prompting area determining module is used for determining a corresponding position of the speaking content of the user of the first terminal in the conference lecture draft as a prompting position based on the acquired display content information on the first terminal and the character information of the user during speaking, and then pushing the prompting position to the second terminal;
and the real-time prompting module is used for carrying out real-time marking on the conference lecture draft and displaying the conference lecture draft to a user of the second terminal after the conference lecture draft is opened at the second terminal based on the prompting position.
The embodiment of the invention has the beneficial effects that:
the embodiment of the invention solves the privacy leakage problem possibly generated by a shared screen through a voice conference, and performs voice and sight recognition on the user in the conference process through the first terminal to obtain the content spoken and seen by the user during speaking, and then determines the corresponding position of the speaking content of the user at the first terminal in the conference lecture draft as a prompt position based on the display content information and the character information during speaking of the user, and then pushes the prompt position to the second terminal, so that the user at the second terminal can perform real-time marking on the conference lecture draft and display the conference lecture draft to the user after opening the conference lecture draft. Therefore, the invention can be used in common voice conference, and also can be optimized for the speaker needing large text amount in the voice conference. The method and the system enable other people participating in the conference except the speaker to quickly confirm the position of the speaking content of the speaker in the lecture manuscript, ensure the communication efficiency in the teleconference and avoid the problem of privacy leakage.
The following describes a detailed implementation process of this embodiment:
The conference lecture manuscript selected by the user is uploaded to the shared database of the network server through the conference lecture manuscript uploading module of the first terminal.
Specifically, the conference lecture manuscript adopts a unified standard, for example, the paragraph spacing is larger than the line spacing. The conference lecture manuscript is uploaded to the network server and stored in the shared database, and all participants have permission to download it.
A text recognition module of the network server performs text recognition on the conference lecture manuscript to acquire text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information.
Specifically, after the conference lecture manuscript is processed by existing text recognition technology, the paragraph number of each paragraph and the total number of paragraphs, the line number of each line and the total number of lines in each paragraph, and the character information (i.e., the text content of each line and its order) can be obtained.
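By way of illustration only, this text information (paragraph numbers, line numbers, character content) could be held in a structure like the following minimal Python sketch; the class and the blank-line paragraph rule are assumptions for the sketch, not the patent's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ParagraphInfo:
    """Text information for one paragraph of the lecture manuscript."""
    paragraph_number: int                           # 1-based paragraph number
    lines: list[str] = field(default_factory=list)  # character content of each line, in order

    @property
    def line_count(self) -> int:
        return len(self.lines)

def extract_text_info(document: str) -> list[ParagraphInfo]:
    """Split a plain-text manuscript into numbered paragraphs and lines.

    A real system would run text recognition on the rendered document;
    here blank lines are assumed to separate paragraphs.
    """
    paragraphs = []
    for number, block in enumerate(document.split("\n\n"), start=1):
        lines = [line for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(ParagraphInfo(number, lines))
    return paragraphs
```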
The conference lecture manuscript acquisition module of the second terminal acquires the conference lecture manuscript uploaded from the first terminal from the shared database of the network server, so that the user of the second terminal can view the original file of the conference lecture manuscript.
During the conference, the voice recognition module of the first terminal recognizes, in real time and through an existing speech-to-text algorithm, the voice in each preset time interval while the user speaks, as the character information of the user's speech; the preset time interval can be set manually in advance as needed.
The watching area identification module of the first terminal acquires the watching area of the user of the first terminal, using an existing eye-tracking algorithm, while the user has the conference lecture manuscript open and is speaking.
Meanwhile, after the watching area of the user is identified, the display content information acquisition module of the first terminal collects the display content information on the first terminal; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal. Specifically, the display content information can be collected by recording the screen of the first terminal.
After the display content information of the first terminal has been collected, it must be analyzed to determine the position, within the lecture manuscript, of the text corresponding to the user's speech. The prompt area determining module of the network server therefore determines the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as the prompt position, based on the collected display content information on the first terminal and the character information of the user's speech, and then pushes the prompt position to the second terminal.
Specifically, the method comprises the following steps:
S1, acquiring the display area and the watching area in the display content information;
In the example given in fig. 2, the outer rectangular area is the display area corresponding to the conference lecture manuscript, and the irregular area inside it is the user's watching area.
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
As shown in fig. 2, this specifically includes the following steps:
S2.1, calculating a bounding box of the watching area, i.e., the dashed rectangular box circumscribing the watching area in fig. 2;
S2.2, acquiring the line spacing (the height of the blank area between successive lines of characters within the same paragraph) and the word spacing (the width of the blank area between adjacent characters in a line) of the paragraph as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards, stopping once the line spacing and word spacing at the edges of the box are larger than the standard spacings, to obtain the watching paragraph area, namely the rectangular frame formed by dotted lines in fig. 2. Preferably, the outward expansion is performed first in the left-right direction and then in the up-down direction, as sketched below.
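The expansion loop of steps S2.1 to S2.3 might look like the following sketch. It assumes the watching area is given as a pixel rectangle and that a hypothetical callback measure_gap(bbox, side) returns the width of the blank run just outside a given edge; neither is specified by the patent.

```python
def expand_to_paragraph_area(bbox, std_line_gap, std_word_gap, measure_gap, step=1):
    """Grow the gaze bounding box until the blank gaps at its edges exceed
    the paragraph's standard line/word spacing (steps S2.1-S2.3).

    bbox        -- (left, top, right, bottom) in pixels
    measure_gap -- hypothetical callback: measure_gap(bbox, side) -> width in
                   pixels of the blank run just outside the given side
                   ("left", "right", "top", or "bottom")
    """
    left, top, right, bottom = bbox
    # Left-right first, as the description prefers: a gap wider than the
    # standard word spacing means the paragraph margin has been reached.
    while measure_gap((left, top, right, bottom), "left") <= std_word_gap:
        left -= step
    while measure_gap((left, top, right, bottom), "right") <= std_word_gap:
        right += step
    # Then up-down: a gap taller than the standard line spacing means the
    # paragraph's first/last line has been passed.
    while measure_gap((left, top, right, bottom), "top") <= std_line_gap:
        top -= step
    while measure_gap((left, top, right, bottom), "bottom") <= std_line_gap:
        bottom += step
    return (left, top, right, bottom)
```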
S3, identifying the character information of the watching paragraph area;
S4, calculating a first matching degree between the character information of the watching paragraph area and the character information of each paragraph in the conference lecture manuscript; the first matching degree represents the similarity between the paragraph the user is watching and each paragraph of the conference lecture manuscript, with a larger value indicating higher similarity. The specific calculation method comprises the following steps:
S4.1, selecting paragraphs with the same line count from the conference lecture manuscript;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
As a worked example:
assume the watching paragraph area has 10 lines; paragraphs with a line count of 10 are first screened out of the conference lecture manuscript;
then the first 5 characters are extracted from each of 4 rows (rows 1, 3, 6, and 10) randomly selected from the watching paragraph area, giving a first sample matrix, which can be written as:
$$T = \begin{pmatrix} t_{11} & t_{12} & t_{13} & t_{14} & t_{15} \\ t_{21} & t_{22} & t_{23} & t_{24} & t_{25} \\ t_{31} & t_{32} & t_{33} & t_{34} & t_{35} \\ t_{41} & t_{42} & t_{43} & t_{44} & t_{45} \end{pmatrix}$$

where $t_{ij}$ denotes the $j$-th character of the $i$-th selected row; if a row contains fewer characters than the set number, it is padded with null characters.
After the sample matrix is determined, a corresponding first comparison matrix can be obtained from each screened paragraph, written as:
$$C = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} & c_{15} \\ c_{21} & c_{22} & c_{23} & c_{24} & c_{25} \\ c_{31} & c_{32} & c_{33} & c_{34} & c_{35} \\ c_{41} & c_{42} & c_{43} & c_{44} & c_{45} \end{pmatrix}$$

where $c_{ij}$ is the character of the candidate paragraph at the position corresponding to $t_{ij}$.
The total number of characters in each matrix is therefore 20; if a first comparison matrix differs from the first sample matrix in only 2 characters, the first matching degree is calculated as (20 - 2) / 20 = 0.9.
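In code, the first matching degree of steps S4.1 to S4.4 and the worked example above might be sketched as follows, assuming the watching paragraph area and the candidate paragraphs have already been reduced to lists of text lines (function and parameter names are illustrative):

```python
import random

def first_matching_degree(sample_lines, candidate_lines, row_indices, m, pad="\0"):
    """Ratio of identical characters between the first sample matrix and one
    first comparison matrix, over the sample matrix's character count.

    row_indices -- 0-based rows selected at random from the watching
                   paragraph area; rows shorter than m characters are padded
                   with null characters, as in the description.
    """
    def cell(lines, i, j):
        row = lines[i]
        return row[j] if j < len(row) else pad

    total = len(row_indices) * m           # e.g. 4 rows x 5 chars = 20
    same = sum(1
               for i in row_indices
               for j in range(m)
               if cell(sample_lines, i, j) == cell(candidate_lines, i, j))
    return same / total                    # 2 mismatches out of 20 -> 0.9

def best_paragraph(sample_lines, paragraphs, n=4, m=5):
    """Step S5: among paragraphs with the same line count, return the one
    with the highest first matching degree."""
    rows = sorted(random.sample(range(len(sample_lines)), n))
    candidates = [p for p in paragraphs if len(p) == len(sample_lines)]
    if not candidates:
        return None  # no paragraph with a matching line count
    return max(candidates,
               key=lambda p: first_matching_degree(sample_lines, p, rows, m))
```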
S5, acquiring the paragraph with the highest first matching degree in the conference lecture manuscript; this paragraph is the most similar to the content the speaker's line of sight fell on and is therefore taken as the target paragraph, whose paragraph number can then be determined.
S6, identifying the character information of the target paragraph, which can be written as:
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1M} \\ p_{21} & p_{22} & \cdots & p_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NM} \end{pmatrix}$$

where $N$ is the total number of lines of the target paragraph and $M$ is the number of characters per line;
S7, calculating a second matching degree between the character information of the watching area and the character information of the target paragraph; specifically, the method comprises the following steps:
S7.1, taking the character information of the watching area as a second sample matrix, which can be written as:
$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1M'} \\ \vdots & \vdots & \ddots & \vdots \\ s_{N'1} & s_{N'2} & \cdots & s_{N'M'} \end{pmatrix}$$

where $N'$ is the total number of rows of the watching area and $M'$ is the number of characters per row.
S7.2, splitting the target paragraph into a plurality of second comparison matrixes according to the line number of the watching area;
As a worked example: assume the watching area has 2 lines and the target paragraph has 5 lines. Rows 1 to 2 of the target paragraph then form the 1st second comparison matrix, rows 2 to 3 the 2nd, rows 3 to 4 the 3rd, and rows 4 to 5 the 4th; each second comparison matrix thus has 2 rows.
And S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree, as sketched below.
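A sketch of steps S7.1 to S7.3 and S8 under the same assumptions (plain lists of non-empty text lines; illustrative names):

```python
def second_matching_degree(gaze_lines, window_lines, pad="\0"):
    """Cell-by-cell comparison of the second sample matrix (watching area)
    with one second comparison matrix (a window of the target paragraph);
    the denominator is the sample matrix's total character count."""
    n_rows = len(gaze_lines)
    n_cols = max(len(row) for row in gaze_lines)

    def cell(lines, i, j):
        return lines[i][j] if j < len(lines[i]) else pad

    total = n_rows * n_cols
    same = sum(1
               for i in range(n_rows)
               for j in range(n_cols)
               if cell(gaze_lines, i, j) == cell(window_lines, i, j))
    return same / total

def best_target_line(gaze_lines, target_paragraph):
    """Step S8: slide a window the height of the watching area over the
    target paragraph (rows 1-2, 2-3, ... as in the example) and return the
    0-based starting line of the best-matching window."""
    h = len(gaze_lines)
    windows = [target_paragraph[i:i + h]
               for i in range(len(target_paragraph) - h + 1)]
    scores = [second_matching_degree(gaze_lines, w) for w in windows]
    return scores.index(max(scores))
```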
And S8, acquiring the line with the highest second matching degree in the target paragraph, wherein the line is most similar to the line of the gazing area and is used as the target line.
After the target paragraph and the target line are determined, it must further be checked whether the content the user is speaking matches the content being viewed, so another matching degree is calculated.
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
if the third matching degree is larger than the judgment threshold, the spoken content is consistent with the viewed content, and the target paragraph and the target line are taken as the prompt position; if it is not larger than the judgment threshold, the spoken content and the viewed content are inconsistent, and the prompt position cannot be calculated.
The judgment threshold is a preset value and can be manually set in advance according to actual conditions.
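The final check of S9 might be sketched as below. The description does not say whether the shared-character count is taken positionally or as a multiset; the sketch uses a multiset intersection, and the 0.5 threshold is only a placeholder for the manually configured judgment threshold.

```python
from collections import Counter

def third_matching_degree(speech_text: str, target_line: str) -> float:
    """Ratio of characters shared by the recognized speech and the target
    line to the total character count of the target line."""
    if not target_line:
        return 0.0
    shared = Counter(speech_text) & Counter(target_line)  # multiset intersection
    return sum(shared.values()) / len(target_line)

def prompt_position(speech_text, paragraph_no, line_no, target_line, threshold=0.5):
    """Return (paragraph, line) as the prompt position when the spoken and
    viewed content agree, else None (no prompt can be calculated)."""
    if third_matching_degree(speech_text, target_line) > threshold:
        return (paragraph_no, line_no)
    return None
```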
Each time a prompt position is calculated, the real-time prompt module of the second terminal, based on the prompt position calculated by the network server, marks the conference lecture manuscript in real time and displays it to the user of the second terminal once the conference lecture manuscript is opened at the second terminal.
Specifically, when the real-time annotation is performed according to the target paragraph and the target line, the user's attention may be guided by one or more of enlarging the font, changing the character color, and highlighting the annotated text.
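The prompt pushed to the second terminal can then be as small as a position plus styling hints; the following payload sketch is illustrative (none of the field names come from the patent):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromptAnnotation:
    paragraph: int                 # target paragraph number
    line: int                      # target line number within the paragraph
    enlarge_font: bool = True      # one or more guidance styles may be combined
    highlight: bool = True
    text_color: Optional[str] = None  # e.g. "#d32f2f" to recolor the target line
```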
Example 2:
the embodiment of the invention provides a conference interaction method based on multiple application scenes of the Internet, and referring to fig. 3, the method comprises the following steps:
T1, acquiring a conference lecture manuscript from the first terminal;
T2, acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
T3, sharing the conference lecture manuscript to all second terminals;
T4, acquiring the character information of the user's speech from the first terminal in real time; the character information of the user's speech is recognized from the voice in each preset time interval while the user speaks;
T5, acquiring, from the first terminal in real time, the display content information while the user has the conference lecture manuscript open and is speaking; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal;
and T6, determining, based on the display content information and the character information of the user's speech, the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal for real-time marking on the conference lecture manuscript and display to the user of the second terminal after the conference lecture manuscript is opened at the second terminal.
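Steps T1 to T6 compose into a single server-side loop; the sketch below is schematic, with the terminal and server objects standing in for the modules of Example 1 (all method names are hypothetical):

```python
def run_conference(first_terminal, second_terminals, server):
    lecture = first_terminal.upload_lecture()            # T1: upload manuscript
    text_info = server.recognize_text(lecture)           # T2: paragraphs/lines/characters
    for terminal in second_terminals:                    # T3: share to participants
        terminal.receive_lecture(lecture)

    # T4-T6 repeat once per preset speech interval.
    for interval in first_terminal.speech_intervals():
        speech_text = first_terminal.speech_to_text(interval)   # T4
        display_info = first_terminal.capture_display()         # T5: display + watching area
        position = server.locate_prompt(display_info, speech_text, text_info)  # T6
        if position is not None:                         # no prompt if speech and gaze disagree
            for terminal in second_terminals:
                terminal.annotate(position)              # real-time marking on the manuscript
```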
It can be understood that the internet-based multi-application-scenario conference interaction method provided in the embodiment of the present invention corresponds to the internet-based multi-application-scenario conference interaction system, and the explanation, examples, and beneficial effects of relevant contents thereof may refer to corresponding contents in the internet-based multi-application-scenario conference interaction system, which are not described herein again.
In summary, compared with the prior art, the method has the following beneficial effects:
the method and the device solve the problem of privacy leakage possibly generated by a shared screen through a voice conference, recognize voice and sight of a user through a first terminal in the conference process to obtain the content spoken and seen by the user during speaking, determine the corresponding position of the speaking content of the user at the first terminal in the conference lecture draft as a prompt position based on the display content information and the character information during speaking of the user, and push the prompt position to a second terminal, so that after the user at the second terminal opens the conference lecture draft, the real-time annotation can be performed on the conference lecture draft and the presentation can be performed to the user. Therefore, the invention can be used in common voice conference, and also can be optimized for the speaker needing large text amount in the voice conference. The method and the system enable other people participating in the conference except the speaker to quickly confirm the position of the speaking content of the speaker in the lecture manuscript, ensure the communication efficiency in the teleconference and avoid the problem of privacy leakage.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. An Internet-based conference interaction system with multiple application scenes, comprising: a first terminal, a second terminal, and a network server;
the first terminal includes: a conference lecture manuscript uploading module, a voice recognition module, a watching area recognition module, and a display content information acquisition module;
the second terminal includes: a conference lecture manuscript acquisition module and a real-time prompt module;
the network server includes: a shared database, a text recognition module, and a prompt area determination module;
the conference lecture manuscript uploading module is used for uploading the conference lecture manuscript selected by the user to a shared database of the network server;
the text recognition module is used for acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
the conference lecture manuscript acquisition module is used for acquiring a conference lecture manuscript uploaded from a first terminal from a shared database of a network server;
the voice recognition module is used for recognizing voices in each preset time interval during speaking of the user in real time as character information of the user speaking;
the watching area identification module is used for acquiring a watching area of a user of the first terminal when the user opens a conference lecture manuscript and speaks;
the display content information acquisition module is used for acquiring display content information on the first terminal after the watching area of the user is identified; the display content information comprises a display area of the conference lecture manuscript and a watching area of a user of the first terminal;
the prompting area determining module is used for determining a corresponding position of the speaking content of the user of the first terminal in the conference lecture draft as a prompting position based on the acquired display content information on the first terminal and the character information of the user during speaking, and then pushing the prompting position to the second terminal; the method specifically comprises the following steps:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is larger than the judgment threshold, taking the target paragraph and the target line as the prompt position;
and the real-time prompting module is used for carrying out real-time marking on the conference lecture draft and displaying the conference lecture draft to a user of the second terminal after the conference lecture draft is opened at the second terminal based on the prompting position.
2. The internet-based multi-application-scene conference interaction system of claim 1, wherein said expanding the watching area to obtain a watching paragraph area comprises:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
3. The internet-based multi-application-scene conference interaction system of claim 1, wherein the calculating of the first matching degree between the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft comprises:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
4. The internet-based multi-application-scene conference interaction system of claim 1, wherein the calculating of the second matching degree between the character information of the watching area and the character information of the target paragraph comprises:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
5. An Internet-based conference interaction method for multiple application scenes is characterized by comprising the following steps:
T1, acquiring a conference lecture manuscript from the first terminal;
T2, acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
T3, sharing the conference lecture manuscript to all second terminals;
T4, acquiring the character information of the user's speech from the first terminal in real time; the character information of the user's speech is recognized from the voice in each preset time interval while the user speaks;
T5, acquiring, from the first terminal in real time, the display content information while the user has the conference lecture manuscript open and is speaking; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal;
and T6, determining, based on the display content information and the character information of the user's speech, the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal for real-time marking on the conference lecture manuscript and display to the user of the second terminal after the conference lecture manuscript is opened at the second terminal, which specifically includes:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is greater than the judgment threshold, taking the target paragraph and the target line as the prompt position.
6. The internet-based conference interaction method for multiple application scenes of claim 5, wherein the expanding of the watching area to obtain a watching paragraph area comprises:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
7. The method as claimed in claim 5, wherein the calculating of the first matching degree between the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft comprises:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
8. The method as claimed in claim 5, wherein the calculating of the second matching degree between the character information of the watching area and the character information of the target paragraph comprises:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
CN202110823507.7A 2021-07-21 2021-07-21 Internet-based multi-application-scene conference interaction system and method Active CN113468319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110823507.7A CN113468319B (en) 2021-07-21 2021-07-21 Internet-based multi-application-scene conference interaction system and method

Publications (2)

Publication Number Publication Date
CN113468319A CN113468319A (en) 2021-10-01
CN113468319B (en) 2022-01-14

Family

ID=77881577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110823507.7A Active CN113468319B (en) 2021-07-21 2021-07-21 Internet-based multi-application-scene conference interaction system and method

Country Status (1)

Country Link
CN (1) CN113468319B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100856403B1 (en) * 2006-03-03 2008-09-04 Samsung Electronics Co Ltd Video conference recording method and video conference terminal for the same
WO2016095361A1 (en) * 2014-12-14 2016-06-23 SZ DJI Technology Co., Ltd. Methods and systems of video processing
WO2020050822A1 (en) * 2018-09-04 2020-03-12 Google Llc Detection of story reader progress for pre-caching special effects
CN213745928U (en) * 2020-10-29 2021-07-20 深圳市悟腾科技有限公司 Multimedia player for meeting room
CN113011244A (en) * 2021-01-25 2021-06-22 北京科技大学 Eye tracker-based method and system for identifying high-quality user-generated content

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104796584A (en) * 2015-04-23 2015-07-22 南京信息工程大学 Prompt device with voice recognition function
CN107832269A (en) * 2017-09-30 2018-03-23 重庆工商职业学院 Desktop Conferencing System and its control method
CN208240291U (en) * 2018-06-22 2018-12-14 中国五冶集团有限公司 A kind of speech notice board carrying out recognition of face
CN112740327A (en) * 2018-08-27 2021-04-30 谷歌有限责任公司 Algorithmic determination of story reader reading interruption
CN112825551A (en) * 2019-11-21 2021-05-21 中国科学院沈阳计算技术研究所有限公司 Method and system for prompting important contents of video conference and transferring and storing important contents
CN112202580A (en) * 2020-09-21 2021-01-08 北京字跳网络技术有限公司 Teleconferencing control method, teleconferencing control device, teleconferencing equipment and storage medium

Also Published As

Publication number Publication date
CN113468319A (en) 2021-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant