CN113468319B - Internet-based multi-application-scene conference interaction system and method

Info

Publication number
CN113468319B
CN113468319B
Authority
CN
China
Prior art keywords
area
conference
paragraph
character information
terminal
Prior art date
Legal status
Active
Application number
CN202110823507.7A
Other languages
Chinese (zh)
Other versions
CN113468319A (en)
Inventor
何文龙
李永红
刘军涛
Current Assignee
Shenzhen Electron Technology Co ltd
Original Assignee
Shenzhen Electron Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Electron Technology Co ltd
Priority to CN202110823507.7A
Publication of CN113468319A
Application granted
Publication of CN113468319B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/106 Display of layout of documents; Previewing

Abstract

The invention provides an internet-based conference interaction system and method for multiple application scenes, and relates to the field of the internet. The invention avoids the privacy leakage that a shared screen may cause by using a voice conference instead: during the conference, the first terminal performs voice and gaze recognition on the user to capture what the user says and looks at while speaking; based on the display content information and the character information of the user's speech, the corresponding position of the speech content within the conference lecture manuscript is determined as a prompt position and pushed to the second terminal, so that once a user of the second terminal opens the conference lecture manuscript, it can be annotated in real time and displayed to that user. The invention thus ensures that participants other than the speaker can quickly confirm the position of the speaker's current content within the lecture manuscript, preserving communication efficiency in the teleconference while avoiding privacy leakage.

Description

Internet-based multi-application-scene conference interaction system and method
Technical Field
The invention relates to the technical field of Internet, in particular to a conference interaction system and method based on multiple application scenes of the Internet.
Background
With the growing demand for working from home, existing internet conference technology generally adopts a teleconference mode to maintain the working efficiency of employees at home.
In order to improve communication efficiency during a conference, a shared screen is usually used to display the lecture content while a user speaks. However, screen sharing easily exposes the user's non-conference files as well, creating a privacy-leakage risk; a voice conference avoids this situation.
In a voice conference, however, when presenting a lecture manuscript with a large amount of text, the speaker usually reads from the manuscript while presenting, and the other participants cannot easily confirm, from voice alone, the position of the speaker's current content within the manuscript, which reduces efficiency.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the defects of the prior art, the invention provides an internet-based conference interaction system and method for multiple application scenes, solving the problem that, in existing voice conferences, participants other than the speaker cannot reliably confirm the position of the speaker's current content within the lecture manuscript from voice alone.
(II) Technical scheme
To achieve the above purpose, the invention is realized by the following technical scheme:
In a first aspect, an internet-based conference interaction system for multiple application scenes is provided, which includes: a first terminal, a second terminal, and a network server;
the first terminal includes: a conference lecture manuscript uploading module, a voice recognition module, a watching area recognition module, and a display content information acquisition module;
the second terminal includes: a conference lecture manuscript acquisition module and a real-time prompt module;
the network server includes: a shared database, a text recognition module, and a prompt area determination module;
the conference lecture manuscript uploading module is used for uploading the conference lecture manuscript selected by the user to a shared database of the network server;
the text recognition module is used for acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
the conference lecture manuscript acquisition module is used for acquiring a conference lecture manuscript uploaded from a first terminal from a shared database of a network server;
the voice recognition module is used for recognizing voices in each preset time interval during speaking of the user in real time as character information of the user speaking;
the watching area identification module is used for acquiring a watching area of a user of the first terminal when the user opens a conference lecture manuscript and speaks;
the display content information acquisition module is used for acquiring display content information on the first terminal after the watching area of the user is identified; the display content information comprises a display area of the conference lecture manuscript and a watching area of a user of the first terminal;
the prompting area determining module is used for determining a corresponding position of the speaking content of the user of the first terminal in the conference lecture draft as a prompting position based on the acquired display content information on the first terminal and the character information of the user during speaking, and then pushing the prompting position to the second terminal;
and the real-time prompting module is used for carrying out real-time marking on the conference lecture draft and displaying the conference lecture draft to a user of the second terminal after the conference lecture draft is opened at the second terminal based on the prompting position.
Further, the determining, based on the collected display content information on the first terminal and the character information of the user during speaking, a corresponding position of the speaking content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal includes:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is greater than the judgment threshold, taking the target paragraph and the target line as the prompt position.
Further, the expanding the gazing area to obtain a gazing paragraph area includes:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
Further, the calculating a first matching degree between the character information of the gazing paragraph area and the character information of each paragraph in the conference lecture draft includes:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
Further, the calculating a second matching degree between the character information of the gazing area and the character information of the target paragraph includes:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
In a second aspect, a conference interaction method for multiple application scenarios based on the internet is provided, and the method includes:
T1, acquiring a conference lecture manuscript from the first terminal;
T2, acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
T3, sharing the conference lecture manuscript to all second terminals;
T4, acquiring the character information of the user's speech from the first terminal in real time; the character information of the user's speech is recognized from the voice in each preset time interval while the user speaks;
T5, acquiring, from the first terminal in real time, the display content information while the user has the conference lecture manuscript open and is speaking; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal;
and T6, determining, based on the display content information and the character information of the user's speech, the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal for real-time marking on the conference lecture manuscript and display to the user of the second terminal after the conference lecture manuscript is opened at the second terminal.
Further, the determining, based on the display content information and the character information of the user during speaking, a corresponding position of the speaking content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal includes:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is greater than the judgment threshold, taking the target paragraph and the target line as the prompt position.
Further, the expanding the gazing area to obtain a gazing paragraph area includes:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
Further, the calculating a first matching degree between the character information of the gazing paragraph area and the character information of each paragraph in the conference lecture draft includes:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
Further, the calculating a second matching degree between the character information of the gazing area and the character information of the target paragraph includes:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
(III) Advantageous effects
The invention avoids the privacy leakage that a shared screen may cause by using a voice conference instead. During the conference, the first terminal performs voice and gaze recognition on the user to capture what the user says and looks at while speaking; based on the display content information and the character information of the user's speech, the corresponding position of the speech content within the conference lecture manuscript is determined as a prompt position and pushed to the second terminal, so that once a user of the second terminal opens the conference lecture manuscript, it can be annotated in real time and displayed to that user. The invention can therefore be used in ordinary voice conferences and is particularly optimized for speakers presenting large amounts of text. Participants other than the speaker can quickly confirm the position of the speaker's current content within the lecture manuscript, which preserves communication efficiency in the teleconference while avoiding privacy leakage.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a system block diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a display area, a gaze area, a bounding box, and a gaze segment area of an embodiment of the present invention;
FIG. 3 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
By providing an internet-based conference interaction method and system for multiple application scenes, the embodiments of the present application solve the problem that, in existing voice conferences, participants other than the speaker cannot reliably confirm the position of the speaker's current content within the lecture manuscript from voice alone.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
the embodiment of the invention provides an internet-based conference interaction system with multiple application scenes, as shown in fig. 1, comprising: the system comprises a first terminal, a second terminal and a network server.
The first terminal and the second terminal are intelligent devices equipped with conference software, such as personal computers; the first terminal is the device of the presenter of the lecture manuscript, and the second terminals are the devices of the other participants.
The first terminal includes: a conference lecture manuscript uploading module, a voice recognition module, a watching area recognition module, and a display content information acquisition module;
the second terminal includes: a conference lecture manuscript acquisition module and a real-time prompt module;
the network server includes: a shared database, a text recognition module, and a prompt area determination module;
the conference lecture manuscript uploading module is used for uploading the conference lecture manuscript selected by the user to a shared database of the network server;
the text recognition module is used for acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
the conference lecture manuscript acquisition module is used for acquiring a conference lecture manuscript uploaded from a first terminal from a shared database of a network server;
the voice recognition module is used for recognizing voices in each preset time interval during speaking of the user in real time as character information of the user speaking;
the watching area identification module is used for acquiring a watching area of a user of the first terminal when the user opens a conference lecture manuscript and speaks;
the display content information acquisition module is used for acquiring display content information on the first terminal after the watching area of the user is identified; the display content information comprises a display area of the conference lecture manuscript and a watching area of a user of the first terminal;
the prompting area determining module is used for determining a corresponding position of the speaking content of the user of the first terminal in the conference lecture draft as a prompting position based on the acquired display content information on the first terminal and the character information of the user during speaking, and then pushing the prompting position to the second terminal;
and the real-time prompting module is used for carrying out real-time marking on the conference lecture draft and displaying the conference lecture draft to a user of the second terminal after the conference lecture draft is opened at the second terminal based on the prompting position.
The embodiment of the invention has the beneficial effects that:
the embodiment of the invention solves the privacy leakage problem possibly generated by a shared screen through a voice conference, and performs voice and sight recognition on the user in the conference process through the first terminal to obtain the content spoken and seen by the user during speaking, and then determines the corresponding position of the speaking content of the user at the first terminal in the conference lecture draft as a prompt position based on the display content information and the character information during speaking of the user, and then pushes the prompt position to the second terminal, so that the user at the second terminal can perform real-time marking on the conference lecture draft and display the conference lecture draft to the user after opening the conference lecture draft. Therefore, the invention can be used in common voice conference, and also can be optimized for the speaker needing large text amount in the voice conference. The method and the system enable other people participating in the conference except the speaker to quickly confirm the position of the speaking content of the speaker in the lecture manuscript, ensure the communication efficiency in the teleconference and avoid the problem of privacy leakage.
The following describes a detailed implementation process of this embodiment:
The conference lecture manuscript selected by the user is uploaded to the shared database of the network server through the conference lecture manuscript uploading module of the first terminal.
Specifically, the conference lecture manuscript adopts a unified standard, for example, the paragraph spacing is larger than the line spacing. The conference lecture manuscript is uploaded to the network server and stored in the shared database, and all participants have permission to download it.
A text recognition module of the network server performs text recognition on the conference lecture manuscript to acquire text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information.
Specifically, after the conference lecture manuscript is processed by existing text recognition technology, the paragraph number of each paragraph and the total number of paragraphs, the line number of each line and the total number of lines in each paragraph, and the character information (i.e., the text content of each line and its order) can be obtained.
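By way of illustration only, this text information (paragraph numbers, line numbers, character content) could be held in a structure like the following minimal Python sketch; the class and the blank-line paragraph rule are assumptions for the sketch, not the patent's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ParagraphInfo:
    """Text information for one paragraph of the lecture manuscript."""
    paragraph_number: int                           # 1-based paragraph number
    lines: list[str] = field(default_factory=list)  # character content of each line, in order

    @property
    def line_count(self) -> int:
        return len(self.lines)

def extract_text_info(document: str) -> list[ParagraphInfo]:
    """Split a plain-text manuscript into numbered paragraphs and lines.

    A real system would run text recognition on the rendered document;
    here blank lines are assumed to separate paragraphs.
    """
    paragraphs = []
    for number, block in enumerate(document.split("\n\n"), start=1):
        lines = [line for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(ParagraphInfo(number, lines))
    return paragraphs
```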
The conference lecture manuscript acquisition module of the second terminal acquires the conference lecture manuscript uploaded from the first terminal from the shared database of the network server, so that the user of the second terminal can view the original file of the conference lecture manuscript.
During the conference, the voice recognition module of the first terminal recognizes, in real time and through an existing speech-to-text algorithm, the voice in each preset time interval while the user speaks, as the character information of the user's speech; the preset time interval can be set manually in advance as needed.
The watching area identification module of the first terminal acquires the watching area of the user of the first terminal, using an existing eye-tracking algorithm, while the user has the conference lecture manuscript open and is speaking.
Meanwhile, after the watching area of the user is identified, the display content information acquisition module of the first terminal collects the display content information on the first terminal; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal. Specifically, the display content information can be collected by recording the screen of the first terminal.
After the display content information of the first terminal has been collected, it must be analyzed to determine the position, within the lecture manuscript, of the text corresponding to the user's speech. The prompt area determining module of the network server therefore determines the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as the prompt position, based on the collected display content information on the first terminal and the character information of the user's speech, and then pushes the prompt position to the second terminal.
Specifically, the method comprises the following steps:
S1, acquiring the display area and the watching area in the display content information;
In the example given in fig. 2, the outer rectangular area is the display area corresponding to the conference lecture manuscript, and the irregular area inside it is the user's watching area.
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
As shown in fig. 2, this specifically includes the following steps:
S2.1, calculating a bounding box of the watching area, i.e., the dashed rectangular box circumscribing the watching area in fig. 2;
S2.2, acquiring the line spacing (the height of the blank area between successive lines of characters within the same paragraph) and the word spacing (the width of the blank area between adjacent characters in a line) of the paragraph as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards, stopping once the line spacing and word spacing at the edges of the box are larger than the standard spacings, to obtain the watching paragraph area, namely the rectangular frame formed by dotted lines in fig. 2. Preferably, the outward expansion is performed first in the left-right direction and then in the up-down direction, as sketched below.
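The expansion loop of steps S2.1 to S2.3 might look like the following sketch. It assumes the watching area is given as a pixel rectangle and that a hypothetical callback measure_gap(bbox, side) returns the width of the blank run just outside a given edge; neither is specified by the patent.

```python
def expand_to_paragraph_area(bbox, std_line_gap, std_word_gap, measure_gap, step=1):
    """Grow the gaze bounding box until the blank gaps at its edges exceed
    the paragraph's standard line/word spacing (steps S2.1-S2.3).

    bbox        -- (left, top, right, bottom) in pixels
    measure_gap -- hypothetical callback: measure_gap(bbox, side) -> width in
                   pixels of the blank run just outside the given side
                   ("left", "right", "top", or "bottom")
    """
    left, top, right, bottom = bbox
    # Left-right first, as the description prefers: a gap wider than the
    # standard word spacing means the paragraph margin has been reached.
    while measure_gap((left, top, right, bottom), "left") <= std_word_gap:
        left -= step
    while measure_gap((left, top, right, bottom), "right") <= std_word_gap:
        right += step
    # Then up-down: a gap taller than the standard line spacing means the
    # paragraph's first/last line has been passed.
    while measure_gap((left, top, right, bottom), "top") <= std_line_gap:
        top -= step
    while measure_gap((left, top, right, bottom), "bottom") <= std_line_gap:
        bottom += step
    return (left, top, right, bottom)
```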
S3, identifying the character information of the watching paragraph area;
S4, calculating a first matching degree between the character information of the watching paragraph area and the character information of each paragraph in the conference lecture manuscript; the first matching degree represents the similarity between the paragraph the user is watching and each paragraph of the conference lecture manuscript, with a larger value indicating higher similarity. The specific calculation method comprises the following steps:
S4.1, selecting paragraphs with the same line count from the conference lecture manuscript;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
As a worked example:
assume the watching paragraph area has 10 lines; paragraphs with a line count of 10 are first screened out of the conference lecture manuscript;
then the first 5 characters are extracted from each of 4 rows (rows 1, 3, 6, and 10) randomly selected from the watching paragraph area, giving a first sample matrix, which can be written as:
$$T = \begin{pmatrix} t_{11} & t_{12} & t_{13} & t_{14} & t_{15} \\ t_{21} & t_{22} & t_{23} & t_{24} & t_{25} \\ t_{31} & t_{32} & t_{33} & t_{34} & t_{35} \\ t_{41} & t_{42} & t_{43} & t_{44} & t_{45} \end{pmatrix}$$

where $t_{ij}$ denotes the $j$-th character of the $i$-th selected row; if a row contains fewer characters than the set number, it is padded with null characters.
After the sample matrix is determined, a corresponding first comparison matrix can be obtained from each screened paragraph, written as:
$$C = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} & c_{15} \\ c_{21} & c_{22} & c_{23} & c_{24} & c_{25} \\ c_{31} & c_{32} & c_{33} & c_{34} & c_{35} \\ c_{41} & c_{42} & c_{43} & c_{44} & c_{45} \end{pmatrix}$$

where $c_{ij}$ is the character of the candidate paragraph at the position corresponding to $t_{ij}$.
The total number of characters in each matrix is therefore 20; if a first comparison matrix differs from the first sample matrix in only 2 characters, the first matching degree is calculated as (20 - 2) / 20 = 0.9.
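In code, the first matching degree of steps S4.1 to S4.4 and the worked example above might be sketched as follows, assuming the watching paragraph area and the candidate paragraphs have already been reduced to lists of text lines (function and parameter names are illustrative):

```python
import random

def first_matching_degree(sample_lines, candidate_lines, row_indices, m, pad="\0"):
    """Ratio of identical characters between the first sample matrix and one
    first comparison matrix, over the sample matrix's character count.

    row_indices -- 0-based rows selected at random from the watching
                   paragraph area; rows shorter than m characters are padded
                   with null characters, as in the description.
    """
    def cell(lines, i, j):
        row = lines[i]
        return row[j] if j < len(row) else pad

    total = len(row_indices) * m           # e.g. 4 rows x 5 chars = 20
    same = sum(1
               for i in row_indices
               for j in range(m)
               if cell(sample_lines, i, j) == cell(candidate_lines, i, j))
    return same / total                    # 2 mismatches out of 20 -> 0.9

def best_paragraph(sample_lines, paragraphs, n=4, m=5):
    """Step S5: among paragraphs with the same line count, return the one
    with the highest first matching degree."""
    rows = sorted(random.sample(range(len(sample_lines)), n))
    candidates = [p for p in paragraphs if len(p) == len(sample_lines)]
    if not candidates:
        return None  # no paragraph with a matching line count
    return max(candidates,
               key=lambda p: first_matching_degree(sample_lines, p, rows, m))
```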
S5, acquiring the paragraph with the highest first matching degree in the conference lecture manuscript; this paragraph is the most similar to the content the speaker's line of sight fell on and is therefore taken as the target paragraph, whose paragraph number can then be determined.
S6, identifying the character information of the target paragraph, which can be written as:
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1M} \\ p_{21} & p_{22} & \cdots & p_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NM} \end{pmatrix}$$

where $N$ is the total number of lines of the target paragraph and $M$ is the number of characters per line;
S7, calculating a second matching degree between the character information of the watching area and the character information of the target paragraph; specifically, the method comprises the following steps:
S7.1, taking the character information of the watching area as a second sample matrix, which can be written as:
$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1M'} \\ \vdots & \vdots & \ddots & \vdots \\ s_{N'1} & s_{N'2} & \cdots & s_{N'M'} \end{pmatrix}$$

where $N'$ is the total number of rows of the watching area and $M'$ is the number of characters per row.
S7.2, splitting the target paragraph into a plurality of second comparison matrixes according to the line number of the watching area;
As a worked example: assume the watching area has 2 lines and the target paragraph has 5 lines. Rows 1 to 2 of the target paragraph then form the 1st second comparison matrix, rows 2 to 3 the 2nd, rows 3 to 4 the 3rd, and rows 4 to 5 the 4th; each second comparison matrix thus has 2 rows.
And S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree, as sketched below.
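A sketch of steps S7.1 to S7.3 and S8 under the same assumptions (plain lists of non-empty text lines; illustrative names):

```python
def second_matching_degree(gaze_lines, window_lines, pad="\0"):
    """Cell-by-cell comparison of the second sample matrix (watching area)
    with one second comparison matrix (a window of the target paragraph);
    the denominator is the sample matrix's total character count."""
    n_rows = len(gaze_lines)
    n_cols = max(len(row) for row in gaze_lines)

    def cell(lines, i, j):
        return lines[i][j] if j < len(lines[i]) else pad

    total = n_rows * n_cols
    same = sum(1
               for i in range(n_rows)
               for j in range(n_cols)
               if cell(gaze_lines, i, j) == cell(window_lines, i, j))
    return same / total

def best_target_line(gaze_lines, target_paragraph):
    """Step S8: slide a window the height of the watching area over the
    target paragraph (rows 1-2, 2-3, ... as in the example) and return the
    0-based starting line of the best-matching window."""
    h = len(gaze_lines)
    windows = [target_paragraph[i:i + h]
               for i in range(len(target_paragraph) - h + 1)]
    scores = [second_matching_degree(gaze_lines, w) for w in windows]
    return scores.index(max(scores))
```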
And S8, acquiring the line with the highest second matching degree in the target paragraph, wherein the line is most similar to the line of the gazing area and is used as the target line.
After the target paragraph and the target line are determined, it must further be checked whether the content the user is speaking matches the content being viewed, so another matching degree is calculated.
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
if the third matching degree is larger than the judgment threshold, the spoken content is consistent with the viewed content, and the target paragraph and the target line are taken as the prompt position; if it is not larger than the judgment threshold, the spoken content and the viewed content are inconsistent, and the prompt position cannot be calculated.
The judgment threshold is a preset value and can be manually set in advance according to actual conditions.
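The final check of S9 might be sketched as below. The description does not say whether the shared-character count is taken positionally or as a multiset; the sketch uses a multiset intersection, and the 0.5 threshold is only a placeholder for the manually configured judgment threshold.

```python
from collections import Counter

def third_matching_degree(speech_text: str, target_line: str) -> float:
    """Ratio of characters shared by the recognized speech and the target
    line to the total character count of the target line."""
    if not target_line:
        return 0.0
    shared = Counter(speech_text) & Counter(target_line)  # multiset intersection
    return sum(shared.values()) / len(target_line)

def prompt_position(speech_text, paragraph_no, line_no, target_line, threshold=0.5):
    """Return (paragraph, line) as the prompt position when the spoken and
    viewed content agree, else None (no prompt can be calculated)."""
    if third_matching_degree(speech_text, target_line) > threshold:
        return (paragraph_no, line_no)
    return None
```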
Each time a prompt position is calculated, the real-time prompt module of the second terminal, based on the prompt position calculated by the network server, marks the conference lecture manuscript in real time and displays it to the user of the second terminal once the conference lecture manuscript is opened at the second terminal.
Specifically, when the real-time annotation is performed according to the target paragraph and the target line, the user's attention may be guided by one or more of enlarging the font, changing the character color, and highlighting the annotated text.
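The prompt pushed to the second terminal can then be as small as a position plus styling hints; the following payload sketch is illustrative (none of the field names come from the patent):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromptAnnotation:
    paragraph: int                 # target paragraph number
    line: int                      # target line number within the paragraph
    enlarge_font: bool = True      # one or more guidance styles may be combined
    highlight: bool = True
    text_color: Optional[str] = None  # e.g. "#d32f2f" to recolor the target line
```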
Example 2:
the embodiment of the invention provides a conference interaction method based on multiple application scenes of the Internet, and referring to fig. 3, the method comprises the following steps:
T1, acquiring a conference lecture manuscript from the first terminal;
T2, acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
T3, sharing the conference lecture manuscript to all second terminals;
T4, acquiring the character information of the user's speech from the first terminal in real time; the character information of the user's speech is recognized from the voice in each preset time interval while the user speaks;
T5, acquiring, from the first terminal in real time, the display content information while the user has the conference lecture manuscript open and is speaking; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal;
and T6, determining, based on the display content information and the character information of the user's speech, the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal for real-time marking on the conference lecture manuscript and display to the user of the second terminal after the conference lecture manuscript is opened at the second terminal.
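Steps T1 to T6 compose into a single server-side loop; the sketch below is schematic, with the terminal and server objects standing in for the modules of Example 1 (all method names are hypothetical):

```python
def run_conference(first_terminal, second_terminals, server):
    lecture = first_terminal.upload_lecture()            # T1: upload manuscript
    text_info = server.recognize_text(lecture)           # T2: paragraphs/lines/characters
    for terminal in second_terminals:                    # T3: share to participants
        terminal.receive_lecture(lecture)

    # T4-T6 repeat once per preset speech interval.
    for interval in first_terminal.speech_intervals():
        speech_text = first_terminal.speech_to_text(interval)   # T4
        display_info = first_terminal.capture_display()         # T5: display + watching area
        position = server.locate_prompt(display_info, speech_text, text_info)  # T6
        if position is not None:                         # no prompt if speech and gaze disagree
            for terminal in second_terminals:
                terminal.annotate(position)              # real-time marking on the manuscript
```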
It can be understood that the internet-based multi-application-scenario conference interaction method provided in the embodiment of the present invention corresponds to the internet-based multi-application-scenario conference interaction system, and the explanation, examples, and beneficial effects of relevant contents thereof may refer to corresponding contents in the internet-based multi-application-scenario conference interaction system, which are not described herein again.
In summary, compared with the prior art, the method has the following beneficial effects:
the method and the device solve the problem of privacy leakage possibly generated by a shared screen through a voice conference, recognize voice and sight of a user through a first terminal in the conference process to obtain the content spoken and seen by the user during speaking, determine the corresponding position of the speaking content of the user at the first terminal in the conference lecture draft as a prompt position based on the display content information and the character information during speaking of the user, and push the prompt position to a second terminal, so that after the user at the second terminal opens the conference lecture draft, the real-time annotation can be performed on the conference lecture draft and the presentation can be performed to the user. Therefore, the invention can be used in common voice conference, and also can be optimized for the speaker needing large text amount in the voice conference. The method and the system enable other people participating in the conference except the speaker to quickly confirm the position of the speaking content of the speaker in the lecture manuscript, ensure the communication efficiency in the teleconference and avoid the problem of privacy leakage.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. An Internet-based conference interaction system with multiple application scenes, comprising: a first terminal, a second terminal, and a network server;
the first terminal includes: a conference lecture manuscript uploading module, a voice recognition module, a watching area recognition module, and a display content information acquisition module;
the second terminal includes: a conference lecture manuscript acquisition module and a real-time prompt module;
the network server includes: a shared database, a text recognition module, and a prompt area determination module;
the conference lecture manuscript uploading module is used for uploading the conference lecture manuscript selected by the user to a shared database of the network server;
the text recognition module is used for acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
the conference lecture manuscript acquisition module is used for acquiring a conference lecture manuscript uploaded from a first terminal from a shared database of a network server;
the voice recognition module is used for recognizing voices in each preset time interval during speaking of the user in real time as character information of the user speaking;
the watching area identification module is used for acquiring a watching area of a user of the first terminal when the user opens a conference lecture manuscript and speaks;
the display content information acquisition module is used for acquiring display content information on the first terminal after the watching area of the user is identified; the display content information comprises a display area of the conference lecture manuscript and a watching area of a user of the first terminal;
the prompting area determining module is used for determining a corresponding position of the speaking content of the user of the first terminal in the conference lecture draft as a prompting position based on the acquired display content information on the first terminal and the character information of the user during speaking, and then pushing the prompting position to the second terminal; the method specifically comprises the following steps:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is larger than the judgment threshold, taking the target paragraph and the target line as the prompt position;
and the real-time prompting module is used for carrying out real-time marking on the conference lecture draft and displaying the conference lecture draft to a user of the second terminal after the conference lecture draft is opened at the second terminal based on the prompting position.
2. The internet-based multi-application-scene conference interaction system of claim 1, wherein said expanding the watching area to obtain a watching paragraph area comprises:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
3. The internet-based multi-application-scene conference interaction system of claim 1, wherein the calculating of the first matching degree between the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft comprises:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
4. The internet-based multi-application-scene conference interaction system of claim 1, wherein the calculating of the second matching degree between the character information of the watching area and the character information of the target paragraph comprises:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
5. An Internet-based conference interaction method for multiple application scenes is characterized by comprising the following steps:
T1, acquiring a conference lecture manuscript from the first terminal;
T2, acquiring text information of the conference lecture manuscript; the text information includes: paragraph number, line number, and character information;
T3, sharing the conference lecture manuscript to all second terminals;
T4, acquiring the character information of the user's speech from the first terminal in real time; the character information of the user's speech is recognized from the voice in each preset time interval while the user speaks;
T5, acquiring, from the first terminal in real time, the display content information while the user has the conference lecture manuscript open and is speaking; the display content information comprises the display area of the conference lecture manuscript and the watching area of the user of the first terminal;
and T6, determining, based on the display content information and the character information of the user's speech, the corresponding position of the speech content of the user of the first terminal in the conference lecture manuscript as a prompt position, and then pushing the prompt position to the second terminal for real-time marking on the conference lecture manuscript and display to the user of the second terminal after the conference lecture manuscript is opened at the second terminal, which specifically includes:
S1, acquiring a display area and a watching area in the display content information;
S2, performing outward expansion on the watching area to obtain a watching paragraph area;
S3, identifying character information of the watching paragraph area;
S4, calculating a first matching degree of the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft;
S5, acquiring the paragraph with the highest first matching degree in the conference lecture draft as a target paragraph;
S6, identifying character information of the target paragraph;
S7, calculating a second matching degree of the character information of the watching area and the character information of the target paragraph;
S8, acquiring the line with the highest second matching degree in the target paragraph as a target line;
S9, calculating the ratio of the number of identical characters between the character information of the user's speech and the character information of the target line to the total number of characters in the target line as a third matching degree;
and if the third matching degree is greater than the judgment threshold, taking the target paragraph and the target line as the prompt position.
6. The internet-based conference interaction method for multiple application scenes of claim 5, wherein the expanding of the watching area to obtain a watching paragraph area comprises:
S2.1, calculating a bounding box of the watching area;
S2.2, acquiring the line spacing and word spacing of paragraphs as the standard spacings;
and S2.3, expanding the periphery of the bounding box outwards until the line spacing and word spacing at the edge of the bounding box are larger than the standard spacings, then stopping, to obtain the watching paragraph area.
7. The method as claimed in claim 5, wherein the calculating of the first matching degree between the character information of the watching paragraph area and the character information of each paragraph in the conference lecture draft comprises:
S4.1, selecting paragraphs with the same line count from the conference lecture draft;
S4.2, randomly selecting n rows from the watching paragraph area and taking the first m characters of each to form a first sample matrix;
S4.3, selecting the characters at the corresponding positions from each paragraph with the same line count to form a first comparison matrix;
and S4.4, calculating the ratio of the number of identical characters between the first sample matrix and each first comparison matrix to the total number of characters in the first sample matrix as the first matching degree.
8. The method as claimed in claim 5, wherein the calculating of the second matching degree between the character information of the watching area and the character information of the target paragraph comprises:
S7.1, taking the character information of the watching area as a second sample matrix;
S7.2, splitting the target paragraph into a plurality of second comparison matrices according to the line count of the watching area;
and S7.3, calculating the ratio of the number of identical characters between the second sample matrix and each second comparison matrix to the total number of characters in the second sample matrix as the second matching degree.
CN202110823507.7A 2021-07-21 2021-07-21 Internet-based multi-application-scene conference interaction system and method Active CN113468319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110823507.7A CN113468319B (en) 2021-07-21 2021-07-21 Internet-based multi-application-scene conference interaction system and method

Publications (2)

Publication Number Publication Date
CN113468319A CN113468319A (en) 2021-10-01
CN113468319B (en) 2022-01-14

Family

ID=77881577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110823507.7A Active CN113468319B (en) 2021-07-21 2021-07-21 Internet-based multi-application-scene conference interaction system and method

Country Status (1)

Country Link
CN (1) CN113468319B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100856403B1 (en) * 2006-03-03 2008-09-04 Samsung Electronics Co Ltd Video conference recording method and video conference terminal for the same
WO2016095361A1 (en) * 2014-12-14 2016-06-23 SZ DJI Technology Co., Ltd. Methods and systems of video processing
WO2020050822A1 (en) * 2018-09-04 2020-03-12 Google Llc Detection of story reader progress for pre-caching special effects
CN213745928U (en) * 2020-10-29 2021-07-20 深圳市悟腾科技有限公司 Multimedia player for meeting room
CN113011244A (en) * 2021-01-25 2021-06-22 北京科技大学 Eye tracker-based method and system for identifying high-quality user-generated content

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104796584A (en) * 2015-04-23 2015-07-22 南京信息工程大学 Prompt device with voice recognition function
CN107832269A (en) * 2017-09-30 2018-03-23 重庆工商职业学院 Desktop Conferencing System and its control method
CN208240291U (en) * 2018-06-22 2018-12-14 中国五冶集团有限公司 A kind of speech notice board carrying out recognition of face
CN112740327A (en) * 2018-08-27 2021-04-30 谷歌有限责任公司 Algorithmic determination of story reader reading interruption
CN112825551A (en) * 2019-11-21 2021-05-21 中国科学院沈阳计算技术研究所有限公司 Method and system for prompting important contents of video conference and transferring and storing important contents
CN112202580A (en) * 2020-09-21 2021-01-08 北京字跳网络技术有限公司 Teleconferencing control method, teleconferencing control device, teleconferencing equipment and storage medium

Also Published As

Publication number Publication date
CN113468319A (en) 2021-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant