CN113918114B - Document control method, device, computer equipment and storage medium - Google Patents


Info

Publication number: CN113918114B (granted publication of application published as CN113918114A)
Authority: CN (China)
Application number: CN202111203894.0A
Original language: Chinese (zh)
Inventor: 邢起源
Assignee (original and current): Tencent Technology (Shenzhen) Co., Ltd.
Legal status: Active (granted)
Prior art keywords: page, document, content, sentence, latest
Events: application filed by Tencent Technology (Shenzhen) Co., Ltd.; publication of CN113918114A; application granted; publication of CN113918114B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/16 — Sound input; sound output
    • G06F3/167 — Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F3/0481 — Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment
    • G06F3/0488 — Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F40/30 — Semantic analysis (handling natural language data)


Abstract

The application relates to a document control method and apparatus, a computer device, and a storage medium. The method comprises the following steps: when a document is presented, displaying a following voice page-turning trigger control in a presentation interface of the document; entering a following voice page-turning mode in response to a trigger operation on the following voice page-turning trigger control; and, in the following voice page-turning mode, turning pages following the voice content of the presenter to a target page of the document, the text content of the target page matching the semantics of the voice content. The method can be applied to online collaborative documents and turns the document's pages automatically as the presenter speaks, so the presenter neither needs an additional presentation pen to control page turning nor has to issue control commands beyond the presentation content. Page turning is therefore very convenient, the presenter's train of thought stays coherent throughout the presentation, and the user experience of document presentation is improved.

Description

Document control method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a document control method, a document control device, a computer device, and a storage medium.
Background
Document presentation is widely used in work reports, corporate publicity, product introductions, education and training, and lectures of all kinds, helping presenters communicate a document's content and topics directly and vividly. In current document presentations, however, turning a page generally requires the presenter either to click "previous page" or "next page" manually or to control the page turn through an additional presentation pen. Such operation is cumbersome and frequently interrupts the presenter's train of thought, making the whole presentation incoherent.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide a document control method, apparatus, computer device, and storage medium.
A document control method, the method comprising:
displaying, when a document is presented, a following voice page-turning trigger control in a presentation interface of the document;
entering a following voice page-turning mode in response to a trigger operation on the following voice page-turning trigger control; and
in the following voice page-turning mode, turning pages following voice content of a presenter to a target page of the document, text content in the target page matching the semantics of the voice content of the presenter.
A document control apparatus, the apparatus comprising:
the demonstration interface demonstration module is used for demonstrating a trigger control which follows voice page turning in a demonstration interface of a document when the document is demonstrated;
the response module is used for responding to the triggering operation of the following voice page turning triggering control and entering a following voice page turning mode;
and the following page turning module is used for turning pages to target pages of the document according to the voice content of the presenter in the following voice page turning mode, and the text content in the target pages is matched with the semantics of the voice content of the presenter.
In one embodiment, the document control apparatus further includes:
the editing interface display module is used for displaying a follow voice page turning initialization control in an editing interface of a document when the document is edited; responding to the triggering operation of the follow-up voice page turning initialization control, and displaying a follow-up page turning prompt area in the editing interface; and displaying text contents corresponding to each page in the document in the following page turning prompt area.
In one embodiment, the editing interface presentation module is further configured to enter a document editing mode with respect to the document in response to a triggering operation to edit the document; after entering the document editing mode, displaying an editing interface of the document; and displaying the following voice page turning initialization control in the editing interface.
In one embodiment, the document control apparatus further includes:
the text content editing module is used for responding to text editing operation in the following page turning prompt area and displaying edited text content in the following page turning prompt area; and updating the text content corresponding to each page in the document according to the edited text content.
In one embodiment, the presentation interface presentation module is further configured to cancel presentation of the following voice page turning initialization control and the following page turning prompt area in a presentation interface of the document when the document is presented.
In one embodiment, the presentation interface presentation module is further configured to enter a document presentation mode with respect to the document in response to a trigger operation to present the document; after entering the document demonstration mode, displaying a demonstration interface of the document; and displaying the following voice page turning trigger control in the demonstration interface.
In one embodiment, the response module is further configured to, in the following voice page-turning mode, exit the following voice page-turning mode in response to a triggering operation of the following voice page-turning trigger control.
In one embodiment, the apparatus further comprises:
before entering the follow-up voice page-turning mode, when the document is in the document editing mode, and when the trigger operation of the follow-up voice page-turning initialization control displayed in the editing interface is determined not to occur, extracting text contents corresponding to each page in the document, and storing page numbers of each page and corresponding text contents.
In one embodiment, the document control apparatus further includes:
the follow voice page turning initialization module is used for extracting text contents corresponding to each page in the document, wherein the text contents comprise at least one of page remark contents and page title contents; and storing page numbers of the pages corresponding to the corresponding text contents.
In one embodiment, the following voice page-turning initialization module is further configured to extract, for each page in the document, the original text content of that page; when the character count of the original text content is less than a preset number, the original text content is retained as the text content extracted from the page; and when the character count exceeds the preset number, reading continues from the first character of the original text content past the preset number of characters until a separator is first encountered, and the content read so far is retained as the text content extracted from the page.
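The extraction rule can be sketched as follows; the preset length of 50 characters and the separator set are illustrative assumptions, since the patent does not fix either value.

```python
def extract_page_text(original: str, preset: int = 50,
                      separators: str = "。，！？；.,!?;") -> str:
    """Extract the matching text for one page (a sketch of the rule above).

    If the content is shorter than `preset` characters, keep all of it;
    otherwise read `preset` characters and continue until the first
    separator is encountered, keeping everything read so far.
    """
    if len(original) < preset:
        return original
    for i in range(preset, len(original)):
        if original[i] in separators:
            return original[:i + 1]  # include the separator just read
    return original  # no separator found after the preset length
```

Stopping at a sentence boundary rather than cutting at exactly `preset` characters keeps the stored text semantically complete, which helps the later matching step.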
In one embodiment, the document control apparatus further includes:
the acquisition module is used for acquiring voice data of a demonstrator in the following voice page-turning mode;
the sentence conversion module is used for converting the voice data of the demonstrator into sentences of the demonstrator;
and the matching module is used for carrying out semantic matching on the latest sentences and the text content corresponding to each page in the document successively, and determining a target page according to the text content successfully matched.
In one embodiment, the matching module is further configured to perform semantic matching on the latest sentence and page remark content corresponding to each page in the document successively, and use a page corresponding to the page remark content successfully matched first as a target page; when the latest sentence is not successfully matched with the page remark content corresponding to each page in the document, the latest sentence is successively semantically matched with the page title content corresponding to each page in the document, and the page corresponding to the page title content successfully matched first is used as a target page; and when the latest statement is not successfully matched with the page title content corresponding to each page in the document, indicating the acquisition module to continuously acquire the voice data of the demonstrator in the following voice page turning mode.
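The two-stage priority described above (page remark content first, then page title content, otherwise keep listening) can be sketched as follows; the page-record field names and the `match` predicate are hypothetical, standing in for whatever semantic-matching routine is used.

```python
def find_target_page(latest_sentence, pages, match):
    """Return the page number to turn to, or None to keep collecting speech.

    `pages` is an ordered list of {'page_no', 'remark', 'title'} records;
    `match(sentence, text)` is any semantic-matching predicate.
    """
    # Stage 1: match against page remark content, in page order;
    # the first page whose remarks match wins.
    for page in pages:
        if page.get("remark") and match(latest_sentence, page["remark"]):
            return page["page_no"]
    # Stage 2: fall back to page title content, again in page order.
    for page in pages:
        if page.get("title") and match(latest_sentence, page["title"]):
            return page["page_no"]
    return None  # no match: continue acquiring the presenter's voice data
```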
In one embodiment, the matching module is further configured to compare a length of the page remark content corresponding to the page being matched with the latest sentence in a sequential matching process that sequentially performs semantic matching on the latest sentence and the page remark content corresponding to each page in the document; when the length of the latest sentence is greater than the length of the page remark content, according to the length of the page remark content, intercepting the latest sentence from the last character to the first character direction to obtain an intercepted sentence, and carrying out semantic matching on the intercepted sentence and the page remark content; when the length of the latest sentence is smaller than the length of the page remark content, according to the length of the latest sentence, intercepting the page remark content from the last character to the first character to obtain intercepted content, and carrying out semantic matching on the latest sentence and the intercepted content.
In one embodiment, the matching module is further configured to compare the length of the page title content corresponding to the page being matched with the latest sentence in a sequential matching process that sequentially performs semantic matching on the page title content corresponding to each page in the document with the latest sentence; when the length of the latest sentence is greater than the length of the page title content, according to the length of the page title content, intercepting the latest sentence from the last character to the first character direction to obtain an intercepted sentence, and carrying out semantic matching on the intercepted sentence and the page title content; when the length of the latest sentence is smaller than the length of the page title content, according to the length of the latest sentence, intercepting the page title content from the last character to the first character to obtain intercepted content, and carrying out semantic matching on the latest sentence and the intercepted content.
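Both truncation rules above (for page remark content and for page title content) reduce to the same tail-alignment step: the longer string is truncated from its last character backwards so the two compared strings have equal length. A minimal sketch, assuming both strings are non-empty:

```python
def align_tails(sentence: str, content: str):
    """Truncate the longer of the two non-empty strings from its last
    character backwards so both have equal length, as described above."""
    if len(sentence) > len(content):
        sentence = sentence[-len(content):]  # keep the sentence's tail
    elif len(sentence) < len(content):
        content = content[-len(sentence):]   # keep the content's tail
    return sentence, content
```

Keeping the tail (rather than the head) of the latest sentence favors the words the presenter spoke most recently, which are the ones most likely to announce the next page.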
In one embodiment, the matching module is further configured to convert the latest sentence into a corresponding word sequence and convert the text content corresponding to the page being matched into a corresponding word sequence in a successive matching process that successively performs semantic matching on the latest sentence and the text content corresponding to each page in the document; generating a sentence vector corresponding to the latest sentence based on word vectors corresponding to the word segments in the word sequence corresponding to the latest sentence, and generating a text vector corresponding to the text content based on word vectors corresponding to the word segments in the word sequence corresponding to the text content; and when the similarity between the sentence vector and the text vector is larger than a preset threshold value, judging that the latest sentence is successfully matched with the text content.
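A minimal sketch of this embodiment, assuming pre-segmented token lists and a toy word-vector table; averaging word vectors into a sentence vector and using cosine similarity with a threshold of 0.8 are common choices, but the patent fixes neither the aggregation method nor the threshold value.

```python
import numpy as np

def sentence_vector(tokens, word_vectors):
    """Aggregate the word vectors of segmented tokens into one vector (mean)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def sentences_match(sent_tokens, text_tokens, word_vectors, threshold=0.8):
    """Match succeeds when cosine similarity exceeds the preset threshold."""
    v1 = sentence_vector(sent_tokens, word_vectors)
    v2 = sentence_vector(text_tokens, word_vectors)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom == 0:
        return False  # one side has no known words
    return float(np.dot(v1, v2) / denom) > threshold
```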
In one embodiment, the document control apparatus further includes:
the page turning module is used for generating a page turning instruction carrying the page number according to the page number of the target page; and based on the page turning instruction, after the document is turned from the current page to the target page, continuously indicating the acquisition module to acquire voice data of a demonstrator in the following voice page turning mode.
In one embodiment, the document is an online collaborative document, and the document control apparatus further includes:
the invitation module is used for initiating document collaboration invitation according to the access address of the online collaboration document;
and the synchronization module is used for synchronizing the document control information generated in the demonstration process of the online collaboration document to the terminal responding to the document collaboration invitation, so that the terminal synchronizes the control state of the online collaboration document according to the document control information.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
when a document is demonstrated, a trigger control which follows voice page turning is demonstrated in a demonstration interface of the document;
responding to the triggering operation of the following voice page turning triggering control, and entering a following voice page turning mode;
and in the following voice page turning mode, page turning is carried out on voice content of a presenter to a target page of the document, and text content in the target page is matched with the semantics of the voice content of the presenter.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
When a document is demonstrated, a trigger control which follows voice page turning is demonstrated in a demonstration interface of the document;
responding to the triggering operation of the following voice page turning triggering control, and entering a following voice page turning mode;
and in the following voice page turning mode, page turning is carried out on voice content of a presenter to a target page of the document, and text content in the target page is matched with the semantics of the voice content of the presenter.
A computer program comprising computer instructions stored in a computer readable storage medium, the computer instructions being read from the computer readable storage medium by a processor of a computer device, the computer instructions being executed by the processor to cause the computer device to perform the steps of the document control method described above.
A computer program product comprising a computer program which when executed by a processor implements the steps of the document control method described above.
According to the document control method, apparatus, computer device, and storage medium above, when a document is presented, a following voice page-turning trigger control is displayed in the presentation interface of the document, and when a trigger operation on that control occurs, a following voice page-turning mode is started for the document. In this mode the document turns pages following the presenter's voice content, and the text content of the target page turned to matches the semantics of that voice content, so the document turns pages automatically as the presenter speaks. The presenter needs neither an additional presentation pen to control page turning nor control commands beyond the presentation content; page turning is very convenient, the presenter's train of thought stays coherent throughout, and the user experience of document presentation is improved.
Drawings
FIG. 1 is a diagram of an application environment for a document control method in one embodiment;
FIG. 2 is a flow chart of a document control method in one embodiment;
FIG. 3 is an interface schematic diagram of a presentation interface in one embodiment;
FIG. 4 is a schematic diagram of an interface for automatically turning pages of a document following the voice content of a presenter in one embodiment;
FIG. 5 is a flowchart of the steps for initializing a document for following voice page turning in one embodiment;
FIG. 6 is an interface diagram of an editing interface in one embodiment;
FIG. 7 is a schematic diagram of an interface showing a following page-turning prompt area in an editing interface in one embodiment;
FIG. 8 is a simplified flow diagram of a document control method in one embodiment;
FIG. 9 is a flow chart of a document control method in one embodiment;
FIG. 10 is a block diagram showing the structure of a document control apparatus in one embodiment;
FIG. 11 is a block diagram showing the structure of a document control apparatus in another embodiment;
FIG. 12 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The document control method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network over a wired or wireless connection. The document control method can be applied to various scenarios, such as conferences, work reports, corporate publicity, product introductions, education and training, and lectures of all kinds.
When a document is presented, the terminal 102 may display a following voice page-turning trigger control in a presentation interface of the document, and enter a following voice page-turning mode in response to a trigger operation of the following voice page-turning trigger control, where the terminal 102 turns a page to a target page of the document following voice content of a presenter, and text content in the target page matches with semantics of the voice content of the presenter.
The terminal 102 may also display a follow-up voice page-turning initialization control in an editing interface of the document when the document is edited, and extract text content corresponding to each page in the document in response to a trigger operation of the follow-up voice page-turning initialization control, where the terminal 102 may store the page number of each page and the corresponding text content to the server 104.
Accordingly, in the following voice page-turning mode, the terminal 102 may send the voice content of the presenter to the server 104, perform semantic matching on the voice content of the presenter and the text content corresponding to each page in the stored document through the server 104, generate a page-turning instruction according to the page number of the target page that is successfully matched, and return the page-turning instruction to the terminal 102, so as to instruct the terminal 102 to turn the page to the target page according to the page number in the page-turning instruction.
The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, intelligent voice interaction devices, intelligent home appliances, vehicle-mounted terminals, etc., and in one embodiment, the terminal 102 may have an instant messaging application running thereon, which may support browsing, demonstration, editing, etc. of the presentation document. In another embodiment, the terminal 102 may have an online collaborative document application running thereon, which may enable online browsing, presentation, and editing of a document, and may also support multiple people to simultaneously browse and edit the document online. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers. The server 104 may be a stand-alone application server for providing corresponding computing services and storage services for instant messaging applications or online collaborative document applications running on the terminal 102. As shown in fig. 1, the server 104 may be a server cluster including a plurality of servers that can communicate with each other, such as a document server 1041, a speech recognition server 1042, and a semantic matching server 1043.
In the related art, two page-turning approaches are used when presenting a document. In the first, the presenter controls page turning through a presentation pen or keyboard; this requires an additional control device, adds cost, and demands manual operation that interrupts the presenter's train of thought during the presentation, affecting the presentation experience. In the second, voice-controlled page turning, the presenter issues page-turning commands to a voice assistant, such as "turn to page 2" or "turn to chapter 3"; after receiving such a command, the voice assistant parses the page number and instructs the presentation device to turn the document to that page. The presenter must therefore utter control commands beyond the lecture content during the presentation, and such extra commands are stilted and redundant and likewise interrupt the presenter's train of thought.
According to the document control method, when a document is demonstrated, the following voice page turning trigger control is demonstrated in the demonstration interface of the document, and when triggering operation of the following voice page turning trigger control occurs, a following voice page turning mode is started for the document. In the mode, the document turns pages along with the voice content of the presenter, and the text content in the target page turned pages is matched with the voice content of the presenter in terms of semantics, so that the document can automatically turn pages along with the voice content of the presenter, the presenter does not need to control the document to turn pages through an additional presentation pen or send control instructions beyond the presentation content to control the document to turn pages, the page turning is very convenient, the presentation thought of the presenter in the whole presentation process is more coherent, and the user experience of document presentation is improved.
In one embodiment, as shown in fig. 2, a document control method is provided, and the method is applied to the terminal 102 in fig. 1 for illustration, and includes the following steps:
step 202, when a document is presented, a trigger control which follows voice page turning is presented in a presentation interface of the document.
The document may be a local document, such as a PPT document, a Word document, an Excel document, or the like. The document can also be an online collaboration document supporting multi-person collaboration, the online collaboration document can support multi-person simultaneous browsing and editing, and the online collaboration document can be an online slide, an online document, an online table or the like. The online collaboration document can be stored in the cloud, and the terminal opens and accesses the content of the online collaboration document stored by the cloud server through the access address. The terminal can display the document page of the online collaboration document through the browser, and can also display the document page of the online collaboration document in the instant messaging application.
The presentation interface is an interface presented when the document is in the document presentation mode. Accordingly, the editing interface is an interface presented when the document is in the document editing mode. The terminal may switch the document between the document presentation mode and the document editing mode in response to a switch instruction between the document presentation mode and the document editing mode. Specifically, when a document is in a document editing mode, the terminal may enter a document presentation mode from the document editing mode in response to a trigger operation to present the document, presenting a presentation interface of the document. When the document is in the document demonstration mode, the terminal can respond to the triggering operation of exiting the document demonstration mode, close the demonstration of the document, switch from the document demonstration mode to the document editing mode and display the editing interface of the document.
The following voice page turning trigger control is an icon or a control for triggering entering a following voice page turning mode.
In one embodiment, step 202 includes: in response to a triggering operation for demonstrating a document, entering a document demonstrating mode about the document; after entering a document demonstration mode, displaying a demonstration interface of the document; and displaying a trigger control which follows the voice page turning in the demonstration interface.
Specifically, when the document needs to be demonstrated, the terminal can respond to the triggering operation of the demonstrated document, enter a document demonstration mode and demonstrate the following voice page turning triggering control in the demonstration interface of the document. That is, the following voice page-turning trigger control is displayed in the presentation interface of the document, and when the document is in the document presentation mode, the terminal may further respond to the triggering operation of the following voice page-turning trigger control displayed in the presentation interface, and enter the following voice page-turning mode.
In one embodiment, the document presentation mode described above may further include a following voice page-turning mode and a general presentation mode. In the following voice page-turning mode, the terminal can automatically turn the pages of the document according to the voice content of the presenter. In one embodiment, in the general presentation mode, the terminal may turn pages according to a page-turning instruction manually triggered by the presenter via a presentation pen or keyboard. It will be appreciated that the following voice page-turning mode is one particular document presentation mode.
In one embodiment, when the document is in the document editing mode, the terminal can display a follow-up voice page turning trigger control in an editing interface of the document, and when the document needs to be displayed and the follow-up voice page turning is displayed, the terminal can directly enter the follow-up voice page turning mode from the document editing mode in response to the trigger operation of the follow-up voice page turning trigger control displayed in the editing interface. That is, the following voice page-turning trigger control is displayed in the editing interface of the document, and when the document is in the document editing mode, the terminal can directly enter the following voice page-turning mode in response to the triggering operation of the following voice page-turning trigger control.
Optionally, in this case, after entering the following voice page-turning mode, the document is still in the document presentation mode, the terminal displays a presentation interface of the document, so the terminal may display a control for exiting the following voice page-turning mode in the presentation interface, for example, the terminal may continue to display a following voice page-turning trigger control in the presentation interface, so in the following voice page-turning mode, the terminal may exit the following voice page-turning mode in response to a trigger operation of the following voice page-turning trigger control.
In this embodiment of the present application, the triggering operation is a preset operation acting on the control or the icon, and the preset operation may be, for example, a touch operation, a click operation, or a key operation. The touch operation may be, for example, a touch operation of a control or an icon displayed on the terminal through a screen, the click operation may be, for example, a cursor click operation of the control or the icon displayed on the terminal through a mouse or a presentation pen, and the key operation may be, for example, an operation controlled through a specific key in the presentation pen or a keyboard.
Step 204: entering the follow-voice page-turning mode in response to a trigger operation on the follow-voice page-turning trigger control.
The follow-voice page-turning mode is used to trigger the terminal to start executing the logic related to follow-voice page turning, including step 206 and the like. As the name suggests, follow-voice page turning means that the terminal controls the document to turn pages along with the presenter's voice content, so the document turns pages automatically as the presenter speaks. The presenter neither needs to control page turning through an additional presentation pen nor needs to issue control instructions outside the presentation content, such as "turn to page 2" or "open chapter 3". For example, when the presenter says "next, chapter 3 describes the appearance … of the product", the terminal can directly turn the document, following the speech, to the page where chapter 3 is located. Page turning thus becomes very convenient, the presenter's train of thought stays coherent throughout the presentation, and the user experience of document presentation is improved.
A trigger operation on the follow-voice page-turning trigger control is a preset operation acting on that control in the presentation interface, such as a touch operation, a click operation, or a key operation.
In one embodiment, the method further comprises: in the follow-voice page-turning mode, exiting the follow-voice page-turning mode in response to a trigger operation on the follow-voice page-turning trigger control.
Specifically, during document presentation, when the document is in the follow-voice page-turning mode, the terminal can exit the current follow-voice page-turning mode in response to a trigger operation on the follow-voice page-turning trigger control shown in the presentation interface, after which the terminal stops executing the logic for automatically turning pages following the presenter's speech.
Further, after the terminal controls the document to exit the follow-voice page-turning mode, the document can be in the general presentation mode, or directly in the document editing mode. It can be understood that being in the general presentation mode after exiting means the document is still in the document presentation mode, and the presenter can turn pages by manually triggering page-turning instructions through a presentation pen or keyboard. Being directly in the document editing mode after exiting means the document is no longer in the document presentation mode and is in an editable state.
FIG. 3 is a diagram of a presentation interface in one embodiment. Referring to FIG. 3, when a document is presented, a follow-voice page-turning trigger control 302, labeled "follow voice page turning" in FIG. 3, is provided in a presentation interface 300. The presenter can turn the follow-voice page-turning function on or off through this control. After the function is turned on, the document automatically switches to the page corresponding to the presenter's speech; the switched-to page may be the next page, the previous page, or a non-adjacent page reached by turning across pages. Manual interaction by the user is thereby reduced, and the user's presentation experience is improved. Optionally, after the follow-voice page-turning function is turned on, the terminal may turn it off in response to a trigger operation by the presenter on the control, thereby exiting the follow-voice page-turning mode.
Step 206: in the follow-voice page-turning mode, turning the document, following the presenter's voice content, to a target page of the document, where the text content in the target page semantically matches the presenter's voice content.
Here, the presenter is the object whose speech is recorded while the document is presented, typically the person presenting the document, who explains its specific content during the presentation. The presenter's voice content is what the presenter says during the presentation, e.g. "Hello everyone, thank you for attending my graduation defense presentation", "Below is my directory structure", "Next, chapter 3 describes the appearance … of the product", and so on. The presenter's voice content typically describes the specific content of the document without breaking the presenter's train of thought during the presentation.
The target page is the page in the document that matches the presenter's voice content. It can be understood that, as the presentation proceeds, the presenter's voice content keeps changing; the target page is the page matching the latest voice content, and the text content in the target page is semantically associated with that latest voice content. As the presenter's voice content keeps changing, the latest voice content changes with it, and the target page is updated accordingly, achieving the effect of automatically turning pages following the presenter's speech.
It will be appreciated that, in the follow-voice page-turning mode, the target page follows the changes in the presenter's voice content, so the target page may be the previous page, the next page, or a non-adjacent page reached by turning across pages from the current page.
That the text content in the target page semantically matches the presenter's voice content means that what the presenter is currently saying describes or introduces the specific content of the target page. Using the text content in the page as the basis for page turning makes page turning more accurate during document presentation.
The text content in a page may be at least one of the page remark content and the page title content, or at least a part of either. The page remark content is the presenter's remark information on the document and can guide the presenter through the presentation process. During presentation, the terminal can display the document in split-screen mode; that is, in the document presentation mode, the page remark content can be set to be displayed on the terminal used by the presenter but not on the terminals watched by viewers. The page title content is the title of each page and may include a main title, a subtitle, and so on. It will be appreciated that some pages in a document may have no page remark content or no page title content.
In some cases, a document page displays only a title together with pictures, animations, or videos that illustrate the presentation content, without any specific textual description. Using the page title content or page remark content outside the page body as the basis for page turning means the presenter's voice content is matched against the page remark content or page title content. It will be appreciated that, if a document page consists only of images, matching the presenter's voice content against each image would in many cases fail to find the corresponding page, so the page could not be turned.
FIG. 4 is a schematic diagram of an interface for automatically turning the pages of a document with the presenter's speech in one embodiment. Referring to FIG. 4, page 1 is the page presented when the presenter says "Hello everyone, thank you for attending my graduation defense presentation". When the presenter continues speaking about their studies … under the teacher's guidance, the document remains on page 1. When the presenter says "Below is my directory structure", the document is turned from page 1 to page 2. When the presenter says "Chapter 4 next gives the relevant experimental data …", the document is turned to page 5. When the presenter says "The earlier directory structure also introduced …", the document is turned across pages from page 5 back to page 2.
In the above document control method, when a document is presented, a follow-voice page-turning trigger control is displayed in the presentation interface of the document, and when a trigger operation on that control occurs, the follow-voice page-turning mode is entered for the document. In this mode the document turns pages along with the presenter's voice content, and the text content in the target page semantically matches that voice content, so the document turns pages automatically as the presenter speaks. The presenter neither needs to control page turning through an additional presentation pen nor needs to issue control instructions beyond the presentation content. Page turning is very convenient, the presenter's train of thought stays coherent throughout the presentation, and the user experience of document presentation is improved.
In one embodiment, as shown in FIG. 5, before the document is presented, the method further includes a step of initializing follow-voice page turning for the document, including:
Step 502: when the document is edited, displaying a follow-voice page-turning initialization control in the editing interface of the document.
The follow-voice page-turning initialization control is an icon or control used to trigger follow-voice page-turning initialization for the document. Follow-voice page-turning initialization refers to the process of extracting the text content corresponding to each page in the document.
In one embodiment, displaying the follow-voice page-turning initialization control in the editing interface when the document is edited includes: entering a document editing mode for the document in response to a trigger operation for editing the document; displaying the editing interface of the document after entering the document editing mode; and displaying the follow-voice page-turning initialization control in the editing interface.
Specifically, when the document needs to be edited, the terminal can, in response to a trigger operation for editing the document, enter the document editing mode and display the follow-voice page-turning initialization control in the editing interface of the document. That is, the follow-voice page-turning initialization control is displayed in the editing interface of the document.
Step 504: displaying a follow-page-turning prompt area in the editing interface in response to a trigger operation on the follow-voice page-turning initialization control.
The follow-page-turning prompt area is the area of the editing interface in which the extracted text content corresponding to each page is displayed. It can be displayed in the editing interface as a popup window, a floating window, or the like; it can be shown at a fixed position in the editing interface, or be moved within the editing interface in response to a move, drag, or slide operation.
A trigger operation on the follow-voice page-turning initialization control is a preset operation acting on that control in the editing interface, such as a touch operation, a click operation, or a key operation.
Specifically, for the follow-voice page-turning initialization control displayed in the editing interface, when a trigger operation on the control occurs, a follow-page-turning prompt area can pop up in the editing interface, and the text content corresponding to each page is displayed in that area.
Step 506: displaying, in the follow-page-turning prompt area, the text content corresponding to each page in the document.
In this embodiment, when a document is edited, a follow-voice page-turning initialization control is provided in the editing interface of the document. The control serves as the entry point for extracting the text content corresponding to each page that follow-voice page turning requires; triggering it extracts that text content, which serves as the key description information for automatic follow-voice page turning and is displayed to the user in a small window. The text content can be browsed or edited by the presenter, which helps the presenter confirm or change it according to their own presentation needs, for example changing it to the content they will actually say during the presentation, thereby improving page-matching accuracy and page-turning accuracy.
In one embodiment, the text content corresponding to each page displayed in the follow-page-turning prompt area is limited; for example, the terminal may limit the text content corresponding to each page to one sentence, or limit its length to within 10 words, and so on.
FIG. 6 is a diagram of an editing interface in one embodiment. Referring to FIG. 6, when a document is edited, a follow-voice page-turning initialization control 602 is provided in an editing interface 600. Through this control, the presenter can trigger initialization of the follow-voice page-turning function with one click, which triggers extraction of the text content corresponding to each page in the document. In addition, the editing interface 600 includes a menu bar area 601, a document page area 603, a page navigation area 604, and a document remark area 605. The follow-voice page-turning initialization control 602 is located in the menu bar area 601; the title, subtitle, and body are located in the document page area 603; and the page remark content is located in the document remark area 605.
FIG. 7 is a schematic diagram of an interface showing a follow-page-turning prompt area in an editing interface in one embodiment. Referring to part (a) of FIG. 7, for the follow-voice page-turning initialization control 702 provided in an editing interface 700, the presenter can pop up a follow-page-turning prompt area 706 in the editing interface 700 through a trigger operation 704 on the control. As shown in part (b) of FIG. 7, the page remark content and page title content corresponding to each page are displayed in the follow-page-turning prompt area 706. For example, 1-1 represents the remark of page 1: "Happy to have you attend my graduation defense presentation", and 1-2 represents the title of page 1: "Graduation Defense". Similarly, 2-1 represents the remark of page 2, 2-2 the title of page 2, 3-1 the remark of page 3, 3-2 the title of page 3, and so on.
In one embodiment, the method further comprises:
displaying edited text content in the follow-page-turning prompt area in response to a text editing operation in the follow-page-turning prompt area; and updating the text content corresponding to each page in the document according to the edited text content.
In this embodiment, the follow-page-turning prompt area supports text editing by the presenter. The user can confirm the content that the terminal automatically extracted and displayed in the area and, if needed, edit and modify it there according to the presentation requirements, which increases the probability that the presenter's voice content is successfully matched to a page of the document in the subsequent document presentation.
In one embodiment, the method further comprises: if, when the document was in the document editing mode, no trigger operation on the follow-voice page-turning initialization control displayed in the editing interface occurred, extracting the text content corresponding to each page in the document, and storing the page number of each page in correspondence with its text content.
Specifically, if the document was not initialized via the follow-voice page-turning initialization control during editing, so that no text content corresponding to each page was obtained, then when the document is presented, the text content corresponding to each page is extracted in response to a trigger operation on the follow-voice page-turning trigger control, and the page number of each page is stored together with the corresponding text content.
That is, when a document is presented, after entering the document presentation mode, the terminal may determine whether initialization of the follow-voice page-turning function has been completed for the document. For example, the terminal may check whether text content corresponding to each page of the document exists, thereby determining whether a trigger operation on the follow-voice page-turning initialization control displayed in the editing interface occurred while the document was in the document editing mode. If so, the follow-voice page-turning mode can be entered as described above, and pages turned following the presenter's voice content.
If not, the terminal can, in response to a trigger operation on the follow-voice page-turning trigger control displayed in the presentation interface, trigger one initialization of the follow-voice page-turning function, extract the text content corresponding to each page in the document, and then execute the logic for entering the follow-voice page-turning mode and turning pages following the presenter's voice content. It can be understood that the document is in the document presentation mode at this point; after extracting the text content corresponding to each page, the terminal only caches it and does not need to display it in the presentation interface through a follow-page-turning prompt area.
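The lazy-initialization decision above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; `extract_fn` stands in for whatever per-page text-extraction routine is used.

```python
def ensure_initialized(page_text_cache, extract_fn, pages):
    """Lazy follow-voice page-turning initialization: if no per-page text
    content has been cached (i.e. no initialization happened while editing),
    extract and cache it once before entering the follow-voice page-turning
    mode; otherwise reuse the existing cache."""
    if not page_text_cache:
        for n, page in enumerate(pages, start=1):
            page_text_cache[n] = extract_fn(page)
    return page_text_cache
```

A cache that already holds content is left untouched, matching the behavior where initialization performed in the editing mode is reused at presentation time.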
In one embodiment, in the document editing mode, even after the terminal has extracted the text content corresponding to each page based on a trigger operation on the follow-voice page-turning initialization control shown in the editing interface, the user may edit the document again, for example modifying the document's titles or page remark content. To ensure page-turning accuracy, the terminal can automatically update the extracted text content corresponding to each page based on the user's editing operations. In addition, the terminal can update the text content displayed in the follow-page-turning prompt area according to the automatically updated text content. Alternatively, the text content corresponding to each page may not be automatically updated as the user edits the document again, in which case the content displayed in the follow-page-turning prompt area prevails.
In one embodiment, the method further comprises:
when the document is presented, canceling the display of the follow-voice page-turning initialization control and the follow-page-turning prompt area in the presentation interface of the document.
It should be noted that, after entering the document presentation mode, the follow-voice page-turning initialization control and the follow-page-turning prompt area do not need to be displayed in the presentation interface. As described above, if the document has not been initialized, initialization can be performed once via the follow-voice page-turning trigger control displayed in the presentation interface; if initialization has already been completed, neither the initialization control nor the prompt area needs to be displayed in the presentation interface.
In one embodiment, the method further comprises: extracting the text content corresponding to each page in the document, the text content including at least one of page remark content and page title content; and storing the page number of each page in correspondence with its text content.
The extracted text content may be the original text extracted from each page of the document, i.e. the original text content. When the original text content is long, directly matching it against the presenter's voice content can hurt accuracy, because each utterance of the presenter is not necessarily long, and a large length difference between the two affects the matching result. For this reason, the extracted text content may instead be a part retained from the original text content, for example the first sentence, the first 10 words, and so on.
Specifically, the terminal may extract and store, by page number, the text content corresponding to every page of the document, such as the page remark content and the page title content. For example, the terminal marks the page remark content extracted from each page as Rn with length L(Rn), and the page title content extracted from each page as Tn with length L(Tn), where n = 1, 2, 3, … is the page number. The terminal may store the page numbers and contents in the cache as key-value pairs: the page remark content of page 1 is R1 and that of page 2 is R2; the page title content of page 1 is T1 and that of page 2 is T2.
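A minimal sketch of the key-value cache described above might look like the following; the function and field names are illustrative, not taken from the patent.

```python
def build_page_cache(pages):
    """Cache, keyed by page number n, the page remark content Rn and the
    page title content Tn together with their lengths L(Rn) and L(Tn).
    `pages` is a list of (remark, title) pairs in page order."""
    cache = {}
    for n, (remark, title) in enumerate(pages, start=1):
        cache[n] = {"R": remark, "T": title,
                    "L_R": len(remark), "L_T": len(title)}
    return cache
```

For example, `build_page_cache([("Happy to have you attend my graduation defense", "Graduation Defense")])` yields a cache where entry 1 holds R1, T1, and their lengths.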
In one embodiment, extracting the text content corresponding to each page in the document includes: extracting the original text content corresponding to each page of the document; when the original text content contains no more than a preset number of characters, retaining the whole original text content as the text content extracted from that page; and when it contains more than the preset number of characters, reading from the first character of the original text content, continuing past the preset number of characters until a separator is read for the first time, and retaining the content read so far as the text content extracted from that page.
The original text content can be the original remark content of the page or the original title content of the page. Specifically, for each page of the document, the terminal determines whether the original text content of the page contains more than a preset number of characters, for example more than 10. If not, the original text content of the page is directly retained as the text content corresponding to the page. If so, the terminal further determines whether the character after the preset number of characters is a separator, such as a comma, enumeration comma, period, or exclamation mark. If it is, the preset number of characters is retained as the text content corresponding to the page; if not, reading continues until the first separator is encountered, and the characters read so far are taken as the text content corresponding to the page.
Taking page remark content as an example, suppose the original page remark content of page 1 is "Hello everyone, thank you for attending my graduation defense presentation, it is a great honor …", and the 10th character is not followed by a separator; reading then continues until the first separator is reached, and the extracted text content corresponding to page 1 is the first clause up to that separator. The original page remark content of page 2 is "Below is my directory structure"; since it contains no more than 10 characters in total, the whole remark content is retained. The original page remark content of page 3 is "First, the graduation design overview is introduced, this part mainly …"; the 10th character is followed by a comma, so reading does not continue and the first clause is directly taken as the text content corresponding to page 3.
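The truncation rule above can be sketched as follows. The separator set and the default of 10 characters are assumptions based on the examples in this embodiment.

```python
# Separators at which truncation stops; this set is an assumption covering
# the comma, enumeration comma, period, and exclamation mark named above.
SEPARATORS = set(",.!?;，、。！？；")

def extract_page_text(original, preset=10):
    """Retain the whole text if it has at most `preset` characters;
    otherwise read `preset` characters and keep reading until the first
    separator, returning everything read so far (separator excluded)."""
    if len(original) <= preset:
        return original
    end = preset
    while end < len(original) and original[end] not in SEPARATORS:
        end += 1
    return original[:end]
```

If the character right after the first `preset` characters is already a separator, exactly the first `preset` characters are kept, matching the comma case of page 3 above.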
When the extracted text content is a part retained from the original text content, the text content displayed in the follow-page-turning prompt area can be the original text content; alternatively, as the key description information for follow-voice page turning, it can be consistent with the extracted text content, i.e. a part of the original text content.
In this embodiment, retaining a part of the original text content as the text content corresponding to each page not only reduces the amount of cached data, but also improves matching accuracy when the retained part is later matched against the presenter's voice content.
In one embodiment, the method further comprises: collecting the presenter's voice data in the follow-voice page-turning mode; converting the presenter's voice data into sentences; and semantically matching the latest sentence, one by one, against the text content corresponding to each page in the document, and determining the target page according to the text content that matches successfully.
After entering the follow-voice page-turning mode, the terminal can collect the presenter's voice content in real time. Alternatively, the voice content can be collected in real time by a designated recording device and transmitted to the terminal, which then performs the matching. The terminal continuously obtains the captured voice content, which serves as the basis for follow-voice page turning.
The terminal can use a neural-network-based speech recognition network to convert the input voice content into corresponding sentences, i.e. text; it can also segment continuous voice content into sentences and thereby translate the presenter's voice content in real time. The speech recognition network may be built from an acoustic model, a language model, and a dictionary.
In addition, the terminal may successively generate and cache the latest translated sentence based on the continuously input voice content. For example, if the presenter's voice content is "Respected teachers, hello everyone" and sentence segmentation yields "Respected teachers / hello everyone", then the 1st latest sentence translated is "Respected teachers" and the 2nd is "hello everyone". As the presenter's voice content is continuously input, each newly translated sentence becomes the latest sentence in turn.
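The successive caching of translated sentences can be sketched as a small buffer. This is illustrative only: real sentence segmentation is performed by the speech recognition network, and this sketch merely splits on "/" markers to stand in for sentence breaks.

```python
class SentenceBuffer:
    """Caches sentences translated from the presenter's continuous speech;
    the most recently finalized sentence is the one matched against pages."""

    def __init__(self):
        self.sentences = []

    def feed(self, segmented_text):
        # "/" stands in for the sentence breaks produced by the recognizer.
        for s in segmented_text.split("/"):
            s = s.strip()
            if s:
                self.sentences.append(s)

    @property
    def latest(self):
        return self.sentences[-1] if self.sentences else None
```

Feeding "Respected teachers / hello everyone" makes "Respected teachers" the latest sentence first, then "hello everyone", mirroring the example above.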
When semantically matching the latest sentence against the text content corresponding to each page of the document, the terminal can use an unsupervised Euclidean-distance method: it vectorizes the latest sentence and the page's text content with a word vector model, then measures the distance between the two vectors, i.e. the content similarity, as their degree of semantic match.
Optionally, when semantically matching the latest sentence against the text content corresponding to each page, the terminal may start from page 1. That is, regardless of which page of the document the current presentation interface displays, the terminal matches the latest sentence against the text content of each page in order from page 1 until the first successful match, which determines the target page. For example, if matching starts from page 1 and first succeeds at page 3, page 3 is determined to be the target page.
Optionally, the terminal may instead match the latest sentence against the text content of all pages in the document and select the page with the highest matching degree as the target page. For example, if the document has 5 pages in total and, after all pages are matched, page 2 has the highest matching degree, page 2 is selected as the target page.
Optionally, since in most cases a presentation proceeds in page order, i.e. the presenter speaks in the order of the pages in the document, the terminal may reduce the number of matches by determining the page number of the document page displayed in the current presentation interface and matching from the pages after that page, taking the first successfully matched page as the target page. For example, if the document has 5 pages and the current presentation interface displays page 2, matching starts from page 3; if the first successful match occurs at page 4, page 4 is the target page. The target page is thus determined after matching only the pages after the current page, which is far fewer comparisons than matching from page 1, improving the terminal's response efficiency and thereby the presenter's presentation experience.
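One way to combine the forward-first strategy with a fallback scan of the remaining pages is sketched below. This combination is an assumption for illustration; the patent describes the strategies separately, and `match_fn` stands in for the semantic matcher.

```python
def find_target_page(latest_sentence, page_texts, current_page, match_fn):
    """Try the pages after the current page first (presentations usually
    move forward), then fall back to the earlier pages from page 1.
    `page_texts` maps page number -> text content; `match_fn(sentence,
    text)` returns True on a successful semantic match."""
    n_pages = len(page_texts)
    forward = list(range(current_page + 1, n_pages + 1))
    backward = list(range(1, current_page + 1))
    for page in forward + backward:
        if match_fn(latest_sentence, page_texts[page]):
            return page
    return None  # no match: keep displaying the current page
```

The fallback keeps cross-page jumps backwards (as in FIG. 4, page 5 back to page 2) possible while still matching the common forward case with few comparisons.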
In one embodiment, successively semantically matching the latest sentence with the text content corresponding to each page in the document comprises: during the successive matching, converting the latest sentence into a corresponding word sequence and converting the text content of the page being matched into a corresponding word sequence; generating a sentence vector for the latest sentence from the word vectors of the words in its word sequence, and generating a text vector for the text content from the word vectors of the words in its word sequence; and when the similarity between the sentence vector and the text vector is greater than a preset threshold, judging that the latest sentence matches the text content successfully.
Specifically, the computer device performs word segmentation on the latest sentence and on the text content corresponding to each page in the document, thereby converting the latest sentence into a corresponding word sequence and converting the text content of the page being matched into a corresponding word sequence.
In one embodiment, a word vector for each word in a word sequence may be obtained from a word vector model. A word vector model, trained on a large corpus, maps each word to a high-dimensional vector; the similarity between two words can then be judged from the distance between their vectors. The word vector model may be a neural network model based on the CBOW or Skip-Gram algorithm.
After the word vector for each word in a word sequence is obtained, the sentence vector for the sentence represented by that word sequence can be computed, for example as the plain average of the word vectors, a TF-IDF weighted average, or an SIF weighted average of the word vectors.
The similarity between the sentence vector and the text vector can be represented by the cosine distance between the two vectors: the smaller the cosine distance, the higher the similarity. When the similarity is greater than a preset threshold, the latest sentence is judged to match the text content successfully. For example, the preset threshold may be 80%. Note that when the similarity exactly equals the threshold, either a successful or a failed match may be judged; this application does not limit that case.
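As an illustration of this embodiment, the following sketch builds sentence vectors by plain averaging of word vectors and compares them by cosine similarity against the 80% example threshold. The toy two-dimensional word vectors are invented for the example; a real system would load vectors from a trained CBOW or Skip-Gram (word2vec-style) model:

```python
import math

def sentence_vector(words, word_vectors):
    """Average the word vectors of a segmented sentence (one simple option
    alongside TF-IDF or SIF weighting)."""
    dims = len(next(iter(word_vectors.values())))
    vec = [0.0] * dims
    for w in words:
        for i, v in enumerate(word_vectors.get(w, [0.0] * dims)):
            vec[i] += v
    return [v / len(words) for v in vec]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy 2-dimensional "word vectors", purely illustrative.
wv = {"directory": [1.0, 0.2], "structure": [0.9, 0.3], "graduation": [0.1, 1.0]}

s_vec = sentence_vector(["directory", "structure"], wv)
t_vec = sentence_vector(["structure", "directory"], wv)
THRESHOLD = 0.8  # the 80% example threshold from the text
matched = cosine_similarity(s_vec, t_vec) > THRESHOLD
# matched == True: averaging is order-independent, so the vectors coincide
```

Note that because averaging ignores word order, two sentences with the same words always match under this scheme; the weighted variants mentioned above mitigate that only partially.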
In one embodiment, successively semantically matching the latest sentence with the text content corresponding to each page in the document comprises: successively matching the latest sentence with the page remark content of each page in the document, and taking the page whose remark content matches first as the target page; when the latest sentence matches no page remark content in the document, successively matching the latest sentence with the page title content of each page, and taking the page whose title content matches first as the target page; and when the latest sentence matches no page title content in the document either, continuing to collect the presenter's voice data in the following voice page turning mode.
In this embodiment, when determining the target page matching the presenter's current voice content, the translated latest sentence is preferentially matched successively against each page's remark content, and the first successfully matched page is taken as the target page. If no remark content matches, the sentence is matched successively against each page's title content, and the first successfully matched page is taken as the target page. If that also fails, the terminal stays on the currently displayed document page and continues to collect the presenter's voice data.
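The two-tier fallback described here might look like the following sketch (the function names and list-based page representation are assumptions; the semantic-matching predicate is again passed in):

```python
def choose_target_page(latest_sentence, remarks, titles, matches):
    """Two-tier matching: try page remark content first; only if no remark
    matches, fall back to page title content. Returns a 1-based page number,
    or None, meaning stay on the current page and keep collecting speech."""
    for tier in (remarks, titles):
        for page_no, content in enumerate(tier, start=1):
            if content and matches(latest_sentence, content):
                return page_no  # first successful match in this tier wins
    return None

remarks = ["welcome remarks", "", "key points"]
titles = ["Defense", "Catalog", "Key Technology"]
# No remark matches "Catalog", so the title tier decides: page 2.
page = choose_target_page("Catalog", remarks, titles, lambda s, t: s == t)
# page == 2
```

Pages with no remark content (empty strings above) are simply skipped in the first tier, which is one plausible reading of the text.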
It should be noted that the target page matching the presenter's current voice content may also be the document page currently being presented.
Fig. 8 is a simplified flow chart of a document control method in one embodiment. Referring to fig. 8, the method mainly comprises the following steps:
step 802, initializing the following voice page turning function for a document: extracting the text content corresponding to each page, including page remark content and page title content.
step 804, starting the following voice page turning function.
step 806, capturing the presenter's real-time voice content.
step 808, translating the presenter's speech into sentences.
step 810, successively semantically matching the translated latest sentence with the page remark content corresponding to each page in the document;
step 812, judging whether the page remark content of a certain page matches successfully; if yes, going to step 818; if not, going to step 814;
step 814, successively semantically matching the translated latest sentence with the page title content corresponding to each page in the document;
step 816, judging whether the page title content of a certain page matches successfully; if yes, going to step 818; if not, returning to step 806;
step 818, turning from the current page to the successfully matched target page, and then returning to step 806.
Since the presenter's voice content and the text content corresponding to a page (the original text content or part of it) may differ in length, matching them semantically as-is may hurt the accuracy of the result. Therefore, when successively semantically matching the latest sentence with the text content of each page, the terminal may first truncate the latest sentence or the page's text content and then match, so as to improve the accuracy of matching the presenter's voice content against the page's text content.
In one embodiment, successively semantically matching the latest sentence with the page remark content corresponding to each page in the document comprises: during the successive matching, comparing the length of the latest sentence with the length of the page remark content of the page being matched; when the latest sentence is longer than the page remark content, truncating the latest sentence from its last character toward its first character to the length of the page remark content, obtaining a truncated sentence, and semantically matching the truncated sentence with the page remark content; and when the latest sentence is shorter than the page remark content, truncating the page remark content from its last character toward its first character to the length of the latest sentence, obtaining truncated content, and semantically matching the latest sentence with the truncated content.
Specifically, when matching the translated latest sentence with page remark content, the terminal can first equalize the lengths of the two, which improves matching accuracy.
The terminal records the translated latest sentence as S, with length L(S), and successively matches S against the page remark content Rn of the nth page, whose length is denoted L(Rn). In Chinese expression, people generally tend to place the key content in the second half of a sentence, so the terminal semantically matches the last x characters of S against the page remark content of each page in the document; that is, the terminal truncates S or Rn according to their lengths, denoting the truncated sentence as S' and the truncated content as R'n.
Specifically, the terminal compares L(S) with L(Rn). When L(S) ≤ L(Rn), the whole of S is kept, i.e. x = L(S), and Rn is truncated to its last x characters to obtain R'n; S is then semantically matched with R'n. When L(S) > L(Rn), the whole of Rn is kept, and the last L(Rn) characters of S are taken, i.e. x = L(Rn), obtaining S'; S' is then semantically matched with Rn. The terminal takes the page whose remark content matches first as the target page. The matching method itself is the same as the method described above for matching a translated sentence with a page's text content, so the description is not repeated.
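The tail-truncation step can be sketched as follows (assuming Python string slicing over the characters of the sentence and the page remark content; in the patent's worked examples the lengths count Chinese characters):

```python
def align_tails(sentence, page_text):
    """Truncate the longer string from its tail so both have equal length,
    keeping the last x characters, where, as the text notes for Chinese,
    the key content tends to sit. Returns the pair to feed into
    semantic matching."""
    x = min(len(sentence), len(page_text))
    return sentence[-x:], page_text[-x:]

s, r = align_tails("see my directory structure below", "directory structure")
# len(s) == len(r); the shorter string is kept whole
```

The same helper applies unchanged to page title content, with y in place of x.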
For example, the translated latest sentence S is "see my directory structure below", with length 10 (the lengths here count the characters of the original Chinese text). Matching starts from the page remark content R1 of page 1, a greeting welcoming the audience to the presenter's graduation defense presentation, with length 18, so R1 must be truncated to the length of S; the truncated R'1 is its last 10 characters. The similarity between the two texts is computed to be about 10%, below the preset threshold of 80%, so page 1 fails to match. Page 2, page 3, and so on therefore have to be compared in turn. When page 2 is compared, its remark content R2 is "the following directory structure", with length 9, so S must be truncated to the length of R2; the truncated S' is its last 9 characters. The similarity between the two texts is computed to be about 98%, above the preset threshold of 80%, so page 2 matches successfully: the terminal turns directly to page 2 and then continues to capture the presenter's voice content.
If, after S has been matched with the page remark content of every page in the document, every similarity is below the preset threshold, S is next matched with the page title content of every page in the document.
In one embodiment, successively semantically matching the latest sentence with the page title content corresponding to each page in the document comprises: during the successive matching, comparing the length of the latest sentence with the length of the page title content of the page being matched; when the latest sentence is longer than the page title content, truncating the latest sentence from its last character toward its first character to the length of the page title content, obtaining a truncated sentence, and semantically matching the truncated sentence with the page title content; and when the latest sentence is shorter than the page title content, truncating the page title content from its last character toward its first character to the length of the latest sentence, obtaining truncated content, and semantically matching the latest sentence with the truncated content.
The terminal records the translated latest sentence as S, with length L(S), and successively matches S against the page title content Tn of the nth page, whose length is denoted L(Tn). Similarly, the terminal may semantically match the last y characters of S against the page title content of each page in the document; that is, the terminal truncates S or Tn according to their lengths, denoting the truncated sentence as S' and the truncated content as T'n.
Specifically, the terminal compares L(S) with L(Tn). When L(S) ≤ L(Tn), the whole of S is kept, i.e. y = L(S), and Tn is truncated to its last y characters to obtain T'n; S is then semantically matched with T'n. When L(S) > L(Tn), the whole of Tn is kept, and the last L(Tn) characters of S are taken, i.e. y = L(Tn), obtaining S'; S' is then semantically matched with Tn. The terminal takes the page whose title content matches first as the target page, turns to that target page, and then continues to capture the presenter's voice content.
For example, the translated latest sentence S is "next, the key technology of this subject is described", with length 13. When no page remark content Rn in the document matches, comparison continues with the page title content Tn of each page. First, the page title content T1 of page 1 is "Graduation Defense", with length 4, so S must be truncated to the length of T1; the truncated S' is its last 4 characters, "key technology". The similarity between the two texts is computed to be about 0.5%, below the preset threshold of 80%, so page 1 fails to match. Comparison continues with the page title content T2 of page 2, "Catalog", with length 2, so S must be truncated to the length of T2; the similarity between the two texts is computed to be about 1.5%, below the preset threshold of 80%, so page 2 fails to match. Matching continues with the page title content of page 3, page 4, and so on; at page 5, T5 is "Key Technology", with length 4, so S must be truncated to the length of T5, again giving S' = "key technology". The similarity between the two texts is 100%, above the preset threshold of 80%, so page 5 matches successfully: the terminal turns directly to page 5 and then continues to capture the presenter's voice content.
If, after S has been matched with the page title content of every page in the document, every similarity is still below the preset threshold, the terminal simply continues to capture the presenter's voice content. That is, when the translated latest sentence S matches neither the page remark content Rn nor the page title content Tn of any page in the document, the terminal continues to capture the presenter's voice content.
It should be noted that in some cases the presenter speaks quickly: if the currently translated latest sentence has not finished matching against the text content of the pages when a new sentence has already been produced, the unfinished matching process is abandoned, and matching of the newly translated latest sentence against the text content of each page starts directly.
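One way to realize this abandon-and-restart behavior is a generation counter that invalidates an in-progress match as soon as a newer sentence arrives. The sketch below is single-threaded for clarity, whereas a real terminal would run matching concurrently with speech capture; all names here are assumptions:

```python
import itertools

class FollowPager:
    """Drop an in-progress match as soon as a newer sentence supersedes it."""

    def __init__(self, page_texts, matches):
        self.page_texts = page_texts
        self.matches = matches
        self._counter = itertools.count()
        self.latest_gen = next(self._counter)

    def on_new_sentence(self, sentence):
        # A new translated sentence bumps the generation; any matching loop
        # still running with an older generation will bail out.
        self.latest_gen = next(self._counter)
        return self._match(sentence, self.latest_gen)

    def _match(self, sentence, gen):
        for page_no, text in enumerate(self.page_texts, start=1):
            if gen != self.latest_gen:  # a newer sentence arrived
                return None             # abandon the unfinished match
            if self.matches(sentence, text):
                return page_no
        return None
```

In a threaded implementation the generation check would sit inside the per-page loop exactly as above, so stale work stops after at most one more page comparison.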
In one embodiment, the method further comprises: generating a page turning instruction carrying the page number of the target page; and, based on the page turning instruction, after the document has turned from the current page to the target page, continuing to collect the presenter's voice data in the following voice page turning mode.
Specifically, after determining the target page matching the translated latest sentence, the terminal generates a page turning instruction according to the target page's page number and jumps directly to the target page of the document, for example by directly displaying the target page. After the target page is displayed, the terminal continues to collect the presenter's voice data. It can be appreciated that when the translated latest sentence fails to match any page, the terminal stays on the currently presented page.
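A minimal sketch of such a page turning instruction, with field names that are purely illustrative rather than taken from the patent:

```python
def make_page_turn_instruction(target_page):
    """Build the instruction carrying the target page number; the same
    payload can later be forwarded to a collaboration terminal so it
    turns pages synchronously."""
    return {"action": "turn_page", "page_no": target_page}

instr = make_page_turn_instruction(5)
# instr carries page number 5
```

The collaboration embodiment described below reuses exactly this kind of payload as the "document control information" sent to the peer terminal.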
In one embodiment, the document is an online collaboration document, the method further comprising: initiating a document collaboration invitation according to the access address of the online collaboration document; and synchronizing document control information generated in the demonstration process of the online collaboration document to the terminal responding to the document collaboration invitation, so that the terminal synchronizes the control state of the online collaboration document according to the document control information.
In one embodiment, the document control method is applied in an online collaboration document application and the document is an online collaboration document. The terminal can start the online collaboration document application, open a document catalog in it, and, after a target document is selected from the catalog, obtain the access address of the selected target document. The terminal may send the access address to a collaboration terminal that needs to edit and modify the target document simultaneously. The collaboration terminal can open the target document from the access address through the online collaboration document application or a browser, or through another instant messaging application.
In addition, when the terminal and the collaboration terminal have the target document open at the same time, the document control information generated on the document while the terminal presents it, such as page turns and triggering operations on the document, is synchronously transmitted to the collaboration terminal, which controls the target document accordingly. Different users at the two ends can thus see each other's operations on the document, realizing multi-user online collaboration on the target document. Of course, the collaboration terminal may likewise synchronize document control information it generates for the target document back to the terminal that initiated the document collaboration invitation.
Fig. 9 is a flow chart of a document control method in a specific embodiment. Referring to fig. 9, the method comprises the following steps:
step 902, in response to a triggering operation of editing a document, entering a document editing mode for the document;
step 904, after entering a document editing mode, displaying an editing interface of a document;
step 906, in the editing interface, displaying an initialization control for following the voice page turning;
step 908, in response to the triggering operation of the following voice page turning initialization control, extracting for each page in the document the original text content corresponding to that page, the text content comprising at least one of page remark content and page title content;
step 910, when the number of words of the original text content is less than the preset number, retaining the original text content as the text content extracted from the corresponding page;
step 912, when the number of words of the original text content is greater than the preset number, reading the preset number of characters from the first character of the original text content, continuing to read until a separator is read for the first time, and retaining the read content as the text content extracted from the corresponding page;
step 914, storing page numbers of the pages in correspondence with the corresponding text contents;
step 916, displaying the following page turning prompt area in the editing interface;
step 918, displaying, in the following page turning prompt area, the text content corresponding to each page in the document;
step 920, in response to a text editing operation in the following page turning prompt area, displaying the edited text content in the following page turning prompt area;
step 922, according to the edited text content, updating the text content corresponding to each page in the document;
step 924, initiating a document collaboration invitation according to the access address of the document;
step 926, synchronizing the document control information generated in the document demonstration process to a terminal responding to the document collaboration invitation, so that the terminal synchronizes the control state of the document according to the document control information;
step 928, in response to a triggering operation of document presentation, entering a document presentation mode for the document;
step 930, after entering a document demonstration mode, displaying a demonstration interface of the document;
step 932, displaying a trigger control following voice page turning in the demonstration interface;
step 934, responding to the triggering operation of the following voice page turning triggering control, and entering a following voice page turning mode;
step 936, collecting voice data of a demonstrator in a following voice page-turning mode;
step 940, converting the voice data of the presenter into a sentence of the presenter;
step 942, during the successive semantic matching of the latest sentence with the page remark content corresponding to each page in the document, comparing the length of the latest sentence with the length of the page remark content of the page being matched;
step 944, when the latest sentence is longer than the page remark content, truncating the latest sentence from its last character toward its first character to the length of the page remark content, obtaining a truncated sentence, and semantically matching the truncated sentence with the page remark content;
step 946, when the latest sentence is shorter than the page remark content, truncating the page remark content from its last character toward its first character to the length of the latest sentence, obtaining truncated content, and semantically matching the latest sentence with the truncated content;
step 948, judging whether the page remark content of a certain page matches successfully; if yes, executing step 958; if not, executing step 950;
step 950, during the successive semantic matching of the latest sentence with the page title content corresponding to each page in the document, comparing the length of the latest sentence with the length of the page title content of the page being matched;
step 952, when the latest sentence is longer than the page title content, truncating the latest sentence from its last character toward its first character to the length of the page title content, obtaining a truncated sentence, and semantically matching the truncated sentence with the page title content;
step 954, when the latest sentence is shorter than the page title content, truncating the page title content from its last character toward its first character to the length of the latest sentence, obtaining truncated content, and semantically matching the latest sentence with the truncated content;
step 956, judging whether the page title content of a certain page matches successfully; if yes, executing step 958; if not, returning to step 936;
step 958, controlling the document to turn to the matched target page;
step 960, in the following voice page turning mode, exiting the following voice page turning mode in response to a triggering operation of the following voice page turning trigger control.
According to the document control method above, when a document is presented, a following voice page turning trigger control is displayed in the document's presentation interface, and when a triggering operation occurs on that control, the following voice page turning mode is started for the document. In this mode the document turns pages following the presenter's voice content, and the text content in the target page turned to matches the semantics of that voice content. The document can therefore turn pages automatically as the presenter speaks: the presenter needs neither an extra presentation pen nor control commands outside the presentation content to turn pages. Page turning becomes very convenient, the presenter's train of thought stays more coherent throughout the presentation, and the user experience of document presentation is improved.
In addition, page remark content or page title content outside the document page body is used as the basis for page turning, and the presenter's voice content is matched with that remark or title content. Compared with the related-art approach of rendering each document page as an image and matching the presenter's voice content against each image, the page remark content and page title content guide the presentation process, that is, they guide what the presenter says, which largely ensures that the presenter's voice content matches the page's text content. The voice content is thereby associated with the text content of the document page, and page turning accuracy is improved.
Moreover, the following page turning prompt area is an area that supports text editing by the presenter: its content can be automatically extracted and displayed by the terminal for the user to confirm, and if the user needs to adjust it for the presentation, it can be edited and modified in the area. This helps raise the probability that the presenter's voice content is successfully matched with a document page during the later presentation.
Also, when successively semantically matching the latest sentence with the text content of each page, the terminal can truncate the latest sentence or the page's text content before matching, improving the accuracy of matching the presenter's voice content against the page's text content.
The application further provides an application scenario that uses the document control method above. Specifically, the document control method is applied in this scenario as follows:
User A starts an online collaboration document application through user terminal 1 and creates and edits an online collaboration document in it; the document is then in the document editing mode. After finishing editing, user A clicks the following voice page turning initialization control displayed in the document's editing interface, triggering user terminal 1 to pop up the following page turning prompt area in the editing interface and to display in it the text content corresponding to each page of the document, including the retained part of at least one of the page remark content and the page title content.
When extracting text content, the terminal distinguishes two cases for each page's remark content and title content: if the content has fewer than N characters (for example, N = 10), all of it is retained; if it exceeds N characters, the terminal checks whether the character immediately after the Nth character is a separator such as a comma or period. If so, the first N characters are retained; if not, reading continues until the first separator is encountered.
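Under these rules, the extraction step might be sketched as follows (the separator set, the treatment of content exactly N characters long, and whether the terminating separator itself is kept are details the text leaves open; this sketch keeps exactly-N content whole and drops the separator):

```python
# Both fullwidth (Chinese) and ASCII separators, an assumed set.
SEPARATORS = set("，。；,.;")

def extract_snippet(original, n=10):
    """Initialization-time extraction rule: keep the whole text if it has
    at most n characters; otherwise keep the first n characters plus any
    further characters up to (but not including) the first separator."""
    if len(original) <= n:
        return original
    end = n
    while end < len(original) and original[end] not in SEPARATORS:
        end += 1
    return original[:end]

snippet = extract_snippet("abcdefghijkl, mno")
# "abcdefghijkl": reading past the 10th character stops at the comma
```

This is the same rule steps 910 and 912 of Fig. 9 describe, applied to both remark and title content of each page.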
After confirming that the text content displayed in the following page turning prompt area is correct, user A saves the document, generates the document's access address through the online collaboration document application, and then sends the access address from user terminal 1 to user terminal 2, which needs to collaborate on the document online.
When user A opens the document via user terminal 1 and user B opens it via user terminal 2, user A may begin to present the document; the document is then in the document presentation mode, and throughout the presentation the control state of the document is synchronized by user terminal 1 to user terminal 2.
User A clicks the following voice page turning trigger control displayed in the document's presentation interface, starting the following voice page turning function. During the presentation, user terminal 1 continuously captures user A's speech, translates it into sentences in turn, successively semantically matches the latest sentence with the text content corresponding to each page of the document, and determines the target page from the successfully matched text content. User terminal 1 turns the document to the target page, generates a page turning instruction carrying the target page's page number, and transmits it to user terminal 2 so that user terminal 2 turns the document's pages synchronously.
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order and may be performed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with at least part of the other steps or sub-steps.
In one embodiment, as shown in fig. 10, a document control apparatus 1000 is provided, which may be implemented as software modules, hardware modules, or a combination of both within a computer device. The apparatus specifically comprises: a presentation interface presentation module 1002, a response module 1004, and a following page-turning module 1006, wherein:
The presentation interface presentation module 1002 is configured to display a following voice page-turning trigger control in the presentation interface of a document when the document is presented;
A response module 1004, configured to enter a following voice page-turning mode in response to a trigger operation of the following voice page-turning trigger control;
The following page-turning module 1006 is configured to, in the following voice page-turning mode, turn the document to a target page following the presenter's voice content, where the text content in the target page semantically matches the presenter's voice content.
In one embodiment, as shown in fig. 11, the document control apparatus 1000 further includes:
The editing interface display module 1001 is configured to display a following voice page-turning initialization control in the editing interface of a document when the document is edited; to display a follow-up page-turning prompt area in the editing interface in response to a trigger operation on the initialization control; and to display, in the follow-up page-turning prompt area, the text content corresponding to each page in the document.
In one embodiment, the editing interface display module 1001 is further configured to enter a document editing mode for the document in response to a trigger operation to edit the document; to display an editing interface of the document after entering the document editing mode; and to display the following voice page-turning initialization control in the editing interface.
In one embodiment, referring to fig. 11, the document control apparatus 1000 further includes:
The text content editing module 1008 is configured to display edited text content in the follow-up page-turning prompt area in response to a text editing operation in that area, and to update the text content corresponding to each page in the document according to the edited text content.
In one embodiment, the presentation interface presentation module 1002 is further configured to hide the following voice page-turning initialization control and the follow-up page-turning prompt area in the presentation interface of the document when the document is presented.
In one embodiment, the presentation interface presentation module 1002 is further configured to enter a document presentation mode with respect to a document in response to a triggering operation to present the document; after entering a document demonstration mode, displaying a demonstration interface of the document; and displaying a trigger control which follows the voice page turning in the demonstration interface.
In one embodiment, the response module 1004 is further configured to, in the following voice page-turning mode, exit the following voice page-turning mode in response to a triggering operation of the following voice page-turning triggering control.
In one embodiment, the document control apparatus 1000 further includes:
Before entering the following voice page-turning mode, when the document is in the document editing mode and no trigger operation has occurred on the following voice page-turning initialization control displayed in the editing interface, the text content corresponding to each page in the document is extracted, and the page number of each page is stored in correspondence with its text content.
In one embodiment, as shown in fig. 11, the document control apparatus 1000 further includes:
The following voice page-turning initialization module 1010 is configured to extract the text content corresponding to each page in the document, where the text content includes at least one of page remark content and page title content, and to store the page number of each page in correspondence with its text content.
In one embodiment, the following voice page-turning initialization module 1010 is further configured to extract, for each page in the document, the original text content of that page; when the number of characters in the original text content is less than a preset number, to retain all of it as the text content extracted from the page; and when the number of characters exceeds the preset number, to continue reading past the first preset number of characters until a separator is read for the first time, retaining the content read so far as the text content extracted from the page.
In one embodiment, the document control apparatus 1000 further includes:
The acquisition module is configured to acquire the presenter's voice data in the following voice page-turning mode;
The sentence conversion module is configured to convert the presenter's voice data into the presenter's sentences;
The matching module is configured to successively perform semantic matching between the latest sentence and the text content corresponding to each page in the document, and to determine the target page from the successfully matched text content.
In one embodiment, the matching module is further configured to successively perform semantic matching between the latest sentence and the page remark content of each page in the document, taking the page whose remark content matches first as the target page; when the latest sentence matches none of the page remark content, to successively perform semantic matching between the latest sentence and the page title content of each page, taking the page whose title content matches first as the target page; and when the latest sentence matches none of the page title content either, to instruct the acquisition module to continue acquiring the presenter's voice data in the following voice page-turning mode.
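A hedged sketch of this two-stage cascade (remark content first, then title content); the substring matcher and all names are illustrative assumptions:

```python
# Illustrative two-stage cascade: try page remark content first,
# then page title content (assumed names throughout).
def match_stage(sentence, contents, matcher):
    """Return the first page number whose content the matcher accepts."""
    for page_no, content in contents:
        if matcher(sentence, content):
            return page_no
    return None

def find_target(sentence, remarks, titles, matcher):
    target = match_stage(sentence, remarks, matcher)
    if target is None:
        target = match_stage(sentence, titles, matcher)
    return target  # None means: keep collecting voice data

# toy matcher: substring containment stands in for semantic matching
matcher = lambda s, c: c in s
remarks = [(1, "opening remarks"), (2, "sales figures")]
titles = [(1, "intro"), (2, "sales"), (3, "summary")]
target = find_target("now the summary of today", remarks, titles, matcher)
```

No remark content matches this sentence, so the title stage runs and page 3 ("summary") is selected.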
In one embodiment, the matching module is further configured to, while successively matching the latest sentence against the page remark content of each page in the document, compare the length of the latest sentence with the length of the remark content of the page being matched; when the latest sentence is longer, to truncate the latest sentence from its last character toward its first character to the length of the remark content, and semantically match the truncated sentence against the remark content; and when the latest sentence is shorter, to truncate the page remark content from its last character toward its first character to the length of the latest sentence, and semantically match the latest sentence against the truncated content.
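The length-equalizing truncation described here amounts to keeping the tail of the longer string; a minimal sketch, with assumed names:

```python
def tail_align(sentence: str, content: str):
    """Truncate the longer of the two strings from its last character
    toward its first, so both end up with the shorter one's length."""
    n = min(len(sentence), len(content))
    if n == 0:  # guard: an empty string would make s[-0:] return all
        return "", ""
    return sentence[-n:], content[-n:]
```

Both truncated strings are then handed to the semantic matcher; the same step applies unchanged when matching against page title content.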
In one embodiment, the matching module is further configured to, while successively matching the latest sentence against the page title content of each page in the document, compare the length of the latest sentence with the length of the title content of the page being matched; when the latest sentence is longer, to truncate the latest sentence from its last character toward its first character to the length of the title content, and semantically match the truncated sentence against the title content; and when the latest sentence is shorter, to truncate the page title content from its last character toward its first character to the length of the latest sentence, and semantically match the latest sentence against the truncated content.
In one embodiment, the matching module is further configured to, while successively matching the latest sentence against the text content of each page in the document, convert the latest sentence into a word sequence and convert the text content of the page being matched into a word sequence; to generate a sentence vector for the latest sentence from the word vectors of the segmented words in its word sequence, and a text vector for the text content from the word vectors of the segmented words in its word sequence; and to determine that the latest sentence successfully matches the text content when the similarity between the sentence vector and the text vector exceeds a preset threshold.
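The vector-matching step can be illustrated with toy two-dimensional word vectors; a real system would use trained embeddings and a tuned threshold, and every name here is an assumption:

```python
# Toy illustration of vector-based matching: average the word
# vectors of the segmented words, then compare by cosine similarity.
import math

WORD_VECS = {  # hypothetical 2-d word vectors
    "revenue": (1.0, 0.0), "results": (0.9, 0.1),
    "roadmap": (0.0, 1.0), "steps":   (0.1, 0.9),
}

def text_vector(words):
    """Average the word vectors of the segmented words."""
    vecs = [WORD_VECS[w] for w in words if w in WORD_VECS]
    if not vecs:
        return (0.0, 0.0)
    return (sum(v[0] for v in vecs) / len(vecs),
            sum(v[1] for v in vecs) / len(vecs))

def cosine(a, b):
    dot = a[0] * b[0] + a[1] * b[1]
    na, nb = math.hypot(*a), math.hypot(*b)
    return dot / (na * nb) if na and nb else 0.0

sentence_vec = text_vector(["revenue", "results"])
text_vec = text_vector(["revenue"])
matched = cosine(sentence_vec, text_vec) > 0.9  # preset threshold
```

The sentence and the page text share the "revenue" direction, so their cosine similarity clears the threshold and the match succeeds.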
In one embodiment, the document control apparatus 1000 further includes:
The page-turning module is configured to generate a page-turning instruction carrying the page number of the target page and, after the document has turned from the current page to the target page based on that instruction, to instruct the acquisition module to continue acquiring the presenter's voice data in the following voice page-turning mode.
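The page-turning instruction and its replay on a collaborating terminal can be sketched as follows; all names are illustrative assumptions:

```python
# Illustrative sketch: a page-turning instruction carrying the target
# page number is applied locally and replayed on another terminal.
def make_turn_instruction(target_page: int) -> dict:
    return {"action": "turn_page", "page": target_page}

class DocumentView:
    """Toy stand-in for a terminal's view of the shared document."""
    def __init__(self):
        self.current_page = 1

    def apply(self, instruction: dict):
        if instruction.get("action") == "turn_page":
            self.current_page = instruction["page"]

presenter_view, collaborator_view = DocumentView(), DocumentView()
instr = make_turn_instruction(5)
presenter_view.apply(instr)
collaborator_view.apply(instr)  # same instruction keeps both in sync
```

Because both terminals apply the identical instruction, their control states stay synchronized without transmitting the document content itself.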
In one embodiment, the document is an online collaborative document, and the document control apparatus 1000 further includes:
The invitation module is configured to initiate a document collaboration invitation according to the access address of the online collaboration document;
The synchronization module is configured to synchronize the document control information generated during the presentation of the online collaboration document to the terminal that responds to the invitation, so that the terminal synchronizes its control state of the online collaboration document according to that information.
When a document is presented, the document control apparatus 1000 described above displays a following voice page-turning trigger control in the presentation interface of the document, and starts the following voice page-turning mode for the document when a trigger operation occurs on that control. In this mode the document turns pages following the presenter's voice content, and the text content of the target page semantically matches that voice content, so the document turns pages automatically as the presenter speaks. The presenter needs neither a separate presentation pen nor control commands outside the presentation content; page turning is therefore very convenient, the presenter's train of thought stays coherent throughout the presentation, and the user experience of document presentation is improved.
For specific limitations of the document control apparatus 1000, reference may be made to the limitations of the document control method above, which are not repeated here. Each module in the document control apparatus 1000 may be implemented in whole or in part by software, hardware, or a combination of both. The modules may be embedded in hardware form in, or be independent of, a processor of the computer device, or be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be the terminal 102 shown in fig. 1; its internal structure may be as shown in fig. 12. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment in which they run. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be implemented through Wi-Fi, an operator network, NFC (near-field communication), or other technologies. The computer program, when executed by the processor, implements a document control method. The display screen of the computer device may be a liquid-crystal display or an electronic-ink display, and the input device may be a touch layer covering the display screen, keys, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 12 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above-described method embodiments.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may perform the steps of the method embodiments above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory, among others. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments merely represent several implementations of the present application; although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and all such modifications and improvements fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (27)

1. A document control method, the method comprising:
when a document is edited, displaying a follow voice page turning initialization control in an editing interface of the document;
responding to the triggering operation of the follow-up voice page turning initialization control, extracting text contents corresponding to all pages in the document, storing page numbers of all pages and corresponding text contents correspondingly, and displaying a follow-up page turning prompt area in the editing interface, wherein the follow-up page turning prompt area comprises the text contents corresponding to all pages in the document;
responding to the text editing operation in the following page turning prompt area, displaying edited text content in the following page turning prompt area, and updating text content corresponding to each page in the document according to the edited text content;
When a document is demonstrated, a trigger control which follows voice page turning is demonstrated in a demonstration interface of the document;
responding to the triggering operation of the following voice page turning triggering control, and entering a following voice page turning mode;
and in the following voice page-turning mode, collecting voice data of a presenter, converting the voice data into sentences of the presenter, successively performing semantic matching between the latest sentence and the stored text content corresponding to each page, and turning to a target page of the document according to the page number of the successfully matched text content.
2. The method of claim 1, wherein the presenting, in an editing interface of the document, a follow-speech page-turning initialization control when editing the document comprises:
entering a document editing mode with respect to the document in response to a triggering operation to edit the document;
after entering the document editing mode, displaying an editing interface of the document;
and displaying the following voice page turning initialization control in the editing interface.
3. The method according to claim 1, wherein the presenting, in the presentation interface of the document, a follow-by-speech page-turning trigger control when presenting the document comprises:
In response to a trigger operation to demonstrate the document, entering a document demonstration mode with respect to the document;
after entering the document demonstration mode, displaying a demonstration interface of the document;
and displaying the following voice page turning trigger control in the demonstration interface.
4. The method according to claim 1, wherein the method further comprises:
and under the following voice page turning mode, responding to the triggering operation of the following voice page turning triggering control, and exiting the following voice page turning mode.
5. The method of claim 1, wherein the text content comprises at least one of page remark content and page title content.
6. The method of claim 1, wherein the extracting text content corresponding to each page in the document comprises:
extracting original text content corresponding to each page of the document;
when the word number of the original text content is less than the preset number, the original text content is reserved and used as the text content extracted from the corresponding page;
when the number of words of the original text content is larger than the preset number, reading is continued after the preset number of characters are read from the first character of the original text content until the separator is read for the first time, and the read content is reserved as the text content extracted from the corresponding page.
7. The method according to claim 1, wherein the semantic matching of the latest sentence with the text content corresponding to each stored page successively comprises:
carrying out semantic matching on the latest sentences successively with page remark contents corresponding to all pages in the document, and taking a page corresponding to the page remark contents successfully matched for the first time as a target page;
when the latest sentence is not successfully matched with the page remark content corresponding to each page in the document, the latest sentence is successively semantically matched with the page title content corresponding to each page in the document, and the page corresponding to the page title content successfully matched first is used as a target page;
and when the latest sentence is not successfully matched with the page title content corresponding to any page in the document, continuing to collect voice data of the presenter in the following voice page-turning mode.
8. The method of claim 7, wherein the sequentially semantically matching the latest sentence with the page remark content corresponding to each page in the document comprises:
in the successive matching process of successively carrying out semantic matching on the latest statement and the page remark content corresponding to each page in the document, comparing the lengths of the page remark content corresponding to the latest statement and the page being matched;
When the length of the latest sentence is greater than the length of the page remark content, according to the length of the page remark content, intercepting the latest sentence from the last character to the first character direction to obtain an intercepted sentence, and carrying out semantic matching on the intercepted sentence and the page remark content;
when the length of the latest sentence is smaller than the length of the page remark content, according to the length of the latest sentence, intercepting the page remark content from the last character to the first character to obtain intercepted content, and carrying out semantic matching on the latest sentence and the intercepted content.
9. The method of claim 7, wherein the sequentially semantically matching the latest sentence with the page title content corresponding to each page in the document comprises:
in the successive matching process of successively carrying out semantic matching on the latest statement and the page title content corresponding to each page in the document, comparing the lengths of the latest statement and the page title content corresponding to the page being matched;
when the length of the latest sentence is greater than the length of the page title content, according to the length of the page title content, intercepting the latest sentence from the last character to the first character direction to obtain an intercepted sentence, and carrying out semantic matching on the intercepted sentence and the page title content;
When the length of the latest sentence is smaller than the length of the page title content, according to the length of the latest sentence, intercepting the page title content from the last character to the first character to obtain intercepted content, and carrying out semantic matching on the latest sentence and the intercepted content.
10. The method according to claim 1, wherein the semantic matching of the latest sentence with the text content corresponding to each stored page successively comprises:
in the successive matching process of successively carrying out semantic matching on the latest sentences and the text contents corresponding to each page in the document, converting the latest sentences into corresponding word sequences, and converting the text contents corresponding to the matched pages into corresponding word sequences;
generating a sentence vector corresponding to the latest sentence based on word vectors corresponding to the word segments in the word sequence corresponding to the latest sentence, and generating a text vector corresponding to the text content based on word vectors corresponding to the word segments in the word sequence corresponding to the text content;
and when the similarity between the sentence vector and the text vector is larger than a preset threshold value, judging that the latest sentence is successfully matched with the text content.
11. The method according to claim 1, wherein the method further comprises:
generating a page turning instruction carrying the page number according to the page number of the target page;
and based on the page turning instruction, after the document is turned from the current page to the target page, continuing to collect voice data of a presenter in the following voice page-turning mode.
12. The method of any one of claims 1 to 11, wherein the document is an online collaboration document, the method further comprising:
initiating a document collaboration invitation according to the access address of the online collaboration document;
and synchronizing the document control information generated in the demonstration process of the online collaboration document to a terminal responding to the document collaboration invitation, so that the terminal synchronizes the control state of the online collaboration document according to the document control information.
13. A document control apparatus, the apparatus comprising:
the editing interface display module is used for displaying a follow voice page turning initialization control in an editing interface of a document when the document is edited; responding to the triggering operation of the follow-up voice page turning initialization control, extracting text contents corresponding to all pages in the document, storing page numbers of all pages and corresponding text contents correspondingly, and displaying a follow-up page turning prompt area in the editing interface, wherein the follow-up page turning prompt area comprises the text contents corresponding to all pages in the document;
The text content editing module is used for responding to text editing operation in the follow-up page turning prompt area, displaying edited text content in the follow-up page turning prompt area, and updating text content corresponding to each page in the document according to the edited text content;
the demonstration interface demonstration module is used for demonstrating a trigger control which follows voice page turning in a demonstration interface of a document when the document is demonstrated;
the response module is used for responding to the triggering operation of the following voice page turning triggering control and entering a following voice page turning mode;
the following page turning module is used for collecting voice data of a presenter in the following voice page-turning mode, converting the voice data into sentences of the presenter, successively performing semantic matching between the latest sentence and the stored text content corresponding to each page, and turning to a target page of the document according to the page number of the successfully matched text content.
14. The apparatus of claim 13, wherein the editing interface presentation module is further configured to enter a document editing mode with respect to the document in response to a triggering operation to edit the document; after entering the document editing mode, displaying an editing interface of the document; and displaying the following voice page turning initialization control in the editing interface.
15. The apparatus of claim 13, wherein the presentation interface presentation module is further configured to enter a document presentation mode with respect to the document in response to a triggering operation to present the document; after entering the document demonstration mode, displaying a demonstration interface of the document; and displaying the following voice page turning trigger control in the demonstration interface.
16. The apparatus of claim 13, wherein the response module is further configured to, in the follow voice page-flip mode, exit the follow voice page-flip mode in response to a trigger operation of the follow voice page-flip trigger control.
17. The apparatus of claim 13, wherein the text content comprises at least one of page remark content and page title content.
18. The apparatus of claim 13, wherein the follow-up voice page-turning initialization module is further configured to extract, for each page in the document, original text content corresponding to each page; when the word number of the original text content is less than the preset number, the original text content is reserved and used as the text content extracted from the corresponding page; when the number of words of the original text content is larger than the preset number, reading is continued after the preset number of characters are read from the first character of the original text content until the separator is read for the first time, and the read content is reserved as the text content extracted from the corresponding page.
19. The apparatus of claim 13, wherein the apparatus further comprises:
the matching module is used for successively performing semantic matching between the latest sentence and the page remark content corresponding to each page in the document, and taking the page corresponding to the first successfully matched page remark content as a target page; when the latest sentence is not successfully matched with the page remark content corresponding to any page in the document, successively performing semantic matching between the latest sentence and the page title content corresponding to each page in the document, and taking the page corresponding to the first successfully matched page title content as the target page; and when the latest sentence is not successfully matched with the page title content corresponding to any page in the document, continuing to collect voice data of the presenter in the following voice page-turning mode.
20. The apparatus of claim 19, wherein the matching module is configured to compare a length of a page remark content corresponding to a page being matched with a latest sentence in a successive matching process that successively performs semantic matching on the latest sentence and the page remark content corresponding to each page in the document; when the length of the latest sentence is greater than the length of the page remark content, according to the length of the page remark content, intercepting the latest sentence from the last character to the first character direction to obtain an intercepted sentence, and carrying out semantic matching on the intercepted sentence and the page remark content; when the length of the latest sentence is smaller than the length of the page remark content, according to the length of the latest sentence, intercepting the page remark content from the last character to the first character to obtain intercepted content, and carrying out semantic matching on the latest sentence and the intercepted content.
21. The apparatus of claim 19, wherein the matching module is further configured to, in the course of successively matching the latest sentence against the page title content of each page in the document, compare the length of the title content of the page being matched with the length of the latest sentence; when the latest sentence is longer than the title content, truncate the latest sentence from its last character toward its first character to the length of the title content, and semantically match the truncated sentence against the title content; and when the latest sentence is shorter than the title content, truncate the title content from its last character toward its first character to the length of the latest sentence, and semantically match the latest sentence against the truncated content.
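Claims 20 and 21 apply the same length-alignment rule to remarks and titles respectively: the longer of the two strings is cut down to the length of the shorter one, keeping its trailing characters (the interception runs from the last character toward the first). A minimal sketch, assuming both strings are non-empty:

```python
def align_tail(sentence: str, content: str):
    # Claims 20/21: before matching, truncate the longer string to the
    # length of the shorter one, keeping its trailing characters.
    # Assumes both inputs are non-empty.
    if len(sentence) > len(content):
        return sentence[-len(content):], content
    if len(sentence) < len(content):
        return sentence, content[-len(sentence):]
    return sentence, content
```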
22. The apparatus of claim 13, wherein the apparatus further comprises:
the matching module is configured to, in the course of successively matching the latest sentence against the text content of each page in the document, convert the latest sentence into a corresponding word-segment sequence and convert the text content of the page being matched into a corresponding word-segment sequence; generate a sentence vector for the latest sentence from the word vectors of the segments in its sequence, and generate a text vector for the text content from the word vectors of the segments in its sequence; and when the similarity between the sentence vector and the text vector is greater than a preset threshold, determine that the latest sentence successfully matches the text content.
23. The apparatus of claim 13, wherein the document control apparatus further comprises:
the page-turning module is configured to generate a page-turning instruction carrying the page number of the target page; and, after the document is turned from the current page to the target page according to the instruction, continue collecting the presenter's voice data in the follow-up voice page-turning mode.
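The instruction of claim 23 simply carries the target page number; the dictionary shape below is an illustrative assumption, not a format fixed by the claim:

```python
def make_turn_instruction(target_page_number: int) -> dict:
    # Claim 23: a page-turning instruction carrying the target page
    # number; the key names here are illustrative assumptions.
    return {"action": "turn_page", "page": target_page_number}
```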
24. The apparatus of any one of claims 13 to 23, wherein the document is an online collaboration document, the apparatus further comprising:
the invitation module is configured to initiate a document collaboration invitation based on the access address of the online collaboration document;
and the synchronization module is configured to synchronize document control information generated during presentation of the online collaboration document to terminals that respond to the document collaboration invitation, so that each terminal updates its control state of the online collaboration document according to the document control information.
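The invitation-and-sync flow of claims 24 and 25 can be sketched as a session that broadcasts control information to every terminal that accepted the invitation. The transport is abstracted as a list of callbacks; the class and method names are assumptions for illustration:

```python
# Illustrative sketch of claims 24-25: terminals join via the document's
# access address, then receive every piece of document control info.
class CollabSession:
    def __init__(self, access_address: str):
        self.access_address = access_address  # basis of the invitation
        self.terminals = []  # callbacks registered by invited terminals

    def accept_invitation(self, on_control):
        # A terminal responding to the invitation registers a handler.
        self.terminals.append(on_control)

    def broadcast(self, control_info: dict):
        # Synchronize control info so terminals can update their state.
        for notify in self.terminals:
            notify(control_info)
```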
25. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 12.
26. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 12.
27. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 12.
CN202111203894.0A 2021-10-15 2021-10-15 Document control method, device, computer equipment and storage medium Active CN113918114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111203894.0A CN113918114B (en) 2021-10-15 2021-10-15 Document control method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113918114A CN113918114A (en) 2022-01-11
CN113918114B true CN113918114B (en) 2024-02-13

Family

ID=79240961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111203894.0A Active CN113918114B (en) 2021-10-15 2021-10-15 Document control method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113918114B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339193A (en) * 2010-07-21 2012-02-01 TCL Corp. Voice control conference speech method and system
CN109976617A (en) * 2019-04-03 2019-07-05 腾讯科技(深圳)有限公司 Document display method and apparatus
CN111954864A (en) * 2018-04-11 2020-11-17 微软技术许可有限责任公司 Automated presentation control
CN113035189A (en) * 2021-02-24 2021-06-25 北京小米移动软件有限公司 Document demonstration control method, device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102484257B1 (en) * 2017-02-22 2023-01-04 Samsung Electronics Co., Ltd. Electronic apparatus, document displaying method thereof and non-transitory computer readable recording medium

Similar Documents

Publication Publication Date Title
US20240031688A1 (en) Enhancing tangible content on physical activity surface
US20180130496A1 (en) Method and system for auto-generation of sketch notes-based visual summary of multimedia content
US11640503B2 (en) Input method, input device and apparatus for input
JP6361351B2 (en) Method, program and computing system for ranking spoken words
CN103369122A (en) Voice input method and system
CN115082602B (en) Method for generating digital person, training method, training device, training equipment and training medium for model
US20200327189A1 (en) Targeted rewrites
US20220374585A1 (en) User interfaces and tools for facilitating interactions with video content
CN111898388A (en) Video subtitle translation editing method and device, electronic equipment and storage medium
US20130332859A1 (en) Method and user interface for creating an animated communication
CN104182381A (en) character input method and system
CN112437353A (en) Video processing method, video processing apparatus, electronic device, and readable storage medium
US20200026767A1 (en) System and method for generating titles for summarizing conversational documents
US20230109852A1 (en) Data processing method and apparatus, device, and medium
CN114154459A (en) Speech recognition text processing method and device, electronic equipment and storage medium
US20210233427A1 (en) System and methods for facilitating language learning
CN113918114B (en) Document control method, device, computer equipment and storage medium
CN112163513A (en) Information selection method, system, device, electronic equipment and storage medium
US20220383907A1 (en) Method for processing video, method for playing video, and electronic device
CN112017487A (en) Flat Flash learning system based on artificial intelligence
TWI717627B (en) E-book apparatus with audible narration and method using the same
CN117707412A (en) Note generation method and device
CN117971078A (en) Content display method for extended learning and related products
CN117436416A (en) Presentation generation method and device, electronic equipment and storage medium
CN114915824A (en) Word teaching resource playing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant