CN109726367B

CN109726367B - Comment display method and related device

Info

Publication number: CN109726367B
Application number: CN201711022730.1A
Authority: CN
Inventors: 熊飞; 任旻
Original assignee: Tencent Technology Beijing Co Ltd
Current assignee: Tencent Technology Beijing Co Ltd
Priority date: 2017-10-27
Filing date: 2017-10-27
Publication date: 2022-06-10
Anticipated expiration: 2037-10-27
Also published as: CN109726367A; WO2019080873A1

Abstract

The invention discloses a comment display method applied to an instant messaging application program. The method comprises the following steps: a first terminal device receives an annotation input instruction set through the instant messaging application program, wherein the annotation input instruction set comprises at least one instruction for annotating a target document, and each instruction corresponds to a moment; the first terminal device determines annotation information corresponding to the target document according to the annotation input instruction set; the first terminal device synthesizes an annotation video according to the annotation information and the moment corresponding to each instruction; and the first terminal device sends the annotation video to a second terminal device, which receives and displays the annotation video through the instant messaging application program. The invention also provides a terminal device. The invention allows multiple places in a document to be annotated directly, which improves the execution efficiency of the scheme, and allows the document to be annotated and discussed within the same instant messaging application program, which makes the scheme more flexible.

Description

Comment display method and related device

Technical Field

The invention relates to the technical field of internet, in particular to a comment display method and a related device.

Background

With the continuous development of internet technology, more and more people rely on instant messaging applications for communication. In daily work and life, in order to facilitate communication, one user often needs to transmit a document to other users, and people discuss the content in the same document.

At present, when users discuss a problem in a document, they generally either take screenshots of the document content or modify the content directly, send the screenshots or the modified content to the other users, and then discuss the document content with them.

However, when many changes are needed, modifying the document directly takes considerable time, which reduces the practicability of the scheme. In addition, if the document is long, taking screenshots of it also consumes considerable time and effort, which reduces the feasibility of the scheme.

Disclosure of Invention

The embodiment of the invention provides an annotation display method and a related device, which can be used for annotating a plurality of places of a document directly without screenshot or modification of the document, thereby improving the execution efficiency of a scheme, and can be used for annotating and communicating the document in an instant messaging application program simultaneously, so that the scheme has stronger flexibility.

In view of this, a first aspect of the present invention provides a method for displaying comments, where the method is applied to an instant messaging application, and the method includes:

a first terminal device receives an annotation input instruction set through an instant messaging application program, wherein the annotation input instruction set comprises at least one instruction for annotating a target document, and each instruction corresponds to a moment;

the first terminal equipment determines annotation information corresponding to the target document according to the annotation input instruction set;

the first terminal equipment synthesizes annotation videos according to the annotation information and the time corresponding to each instruction;

and the first terminal equipment sends the annotation video to second terminal equipment, wherein the second terminal equipment is used for receiving and displaying the annotation video through the instant messaging application program.

A second aspect of the present invention provides a terminal device, including:

the receiving module is used for receiving an annotation input instruction set through the instant messaging application program, wherein the annotation input instruction set comprises at least one instruction for annotating a target document, and each instruction corresponds to a moment;

the determining module is used for determining annotation information corresponding to the target document according to the annotation input instruction set received by the receiving module;

the synthesis module is used for synthesizing annotation videos according to the annotation information determined by the determination module and the time corresponding to each instruction;

and the sending module is used for sending the annotation video synthesized by the synthesizing module to a second terminal device, wherein the second terminal device is used for receiving and displaying the annotation video through the instant messaging application program.

A third aspect of the present invention provides a terminal device, including: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is used for executing the program in the memory, which includes the following steps:

receiving an annotation input instruction set through the instant messaging application program, wherein the annotation input instruction set comprises at least one instruction for annotating a target document, and each instruction corresponds to a moment;

determining annotation information corresponding to the target document according to the annotation input instruction set;

synthesizing annotation videos according to the annotation information and the time corresponding to each instruction;

sending the annotation video to a second terminal device, wherein the second terminal device is used for receiving and displaying the annotation video through the instant messaging application program;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.

According to the technical scheme, the embodiment of the invention has the following advantages:

The embodiment of the invention provides a comment display method applied to an instant messaging application program. A first terminal device receives an annotation input instruction set through the instant messaging application program, wherein the annotation input instruction set comprises at least one instruction for annotating a target document and each instruction corresponds to a moment. The first terminal device determines the annotation information corresponding to the target document according to the annotation input instruction set, then synthesizes an annotation video according to the annotation information and the moment corresponding to each instruction, and finally sends the annotation video to a second terminal device, which receives and displays the annotation video through the instant messaging application program. In this way, on the one hand, multiple places in the document can be annotated directly without taking screenshots of or modifying the document, which improves the execution efficiency of the scheme; on the other hand, the document can be annotated and discussed within the instant messaging application program, which makes the scheme more flexible.

Drawings

FIG. 1 is a diagram illustrating a relationship between a hierarchy and a display hierarchy according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating another relationship between a hierarchy relationship and a display hierarchy in an embodiment of the present invention;

FIG. 3 is a schematic diagram of an embodiment of a comment displaying method in the embodiment of the present invention;

FIG. 4 is a schematic diagram of an interface for starting a voice annotation function according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an interface for confirming a voice annotation in an embodiment of the present invention;

FIG. 6 is a schematic diagram of an interface of target document annotation in an embodiment of the present invention;

FIG. 7 is a schematic diagram of an interface for synthesizing and sending annotation videos according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an interface for displaying subtitles in annotation video according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of an interface for confirming voice annotations and video annotations in the embodiment of the present invention;

FIG. 10 is a schematic diagram of an interface for previewing a target document using system plug-ins in an application scenario of the present invention;

FIG. 11 is a schematic view of an interface for viewing a target document using cloud preview in an application scenario of the present invention;

FIG. 12 is a schematic diagram of an embodiment of a terminal device in the embodiment of the present invention;

FIG. 13 is a schematic diagram of another embodiment of the terminal device in the embodiment of the present invention;

FIG. 14 is a schematic diagram of another embodiment of the terminal device in the embodiment of the present invention;

FIG. 15 is a schematic structural diagram of a terminal device in the embodiment of the present invention.

Detailed Description

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that the present invention is primarily applicable to Instant Messaging (IM) applications (APPs). IM APPs commonly used on the Internet at present include Tencent QQ, WeChat, Yixin, DingTalk, Baidu HI, Feixin, Aliwang, Jingdongdong, Feiyi, yy, Skype, Google Talk, icq, FastMsg, parox and the like. Most instant messaging services provide presence information, that is, they display a contact list, whether each contact is online, and whether the user can talk with that contact. Generally, when someone on the user's contact list (similar to a phone book) connects to the IM service, the service sends a message to notify the user, so that the user can start real-time communication with that person over the Internet. In addition to text, most IM services also provide video communication, given sufficient bandwidth. The biggest difference between instant messaging and e-mail is that the user does not need to wait: as long as two people are online at the same time, they can send text, files, sound and images to each other as if using a multimedia telephone, and as long as a network connection is available, the distance between them does not matter.

The method can open a document preview directly in the IM APP, using the IM function, to display the document content; the document can then be annotated and the annotation process recorded. During recording, the size of the recording frame cannot be changed, and only page turning of the document is possible. The recording may include page-turning actions, annotation actions, and mouse actions. If the user chooses to turn the microphone on, the audio track retains the microphone content captured during recording.

For ease of understanding, please refer to FIG. 1, which is a schematic diagram of the relationship between the hierarchy and the display hierarchy in an embodiment of the present invention. As shown in the figure, if a user needs to use an annotation tool, an annotation view is superimposed on the document preview view, and all annotation content corresponds one to one with the document. The document can be scrolled in a ScrollView container, and annotation actions can be undone and deleted on the annotation view. All page-turning actions, annotation actions, and mouse actions are recorded. After the annotation is finished, the microphone audio track, the document operation video, and the annotation operation video are combined into one video, which is displayed in a preview window; finally, the synthesized video is shared with other users in the IM APP.

Referring to FIG. 2, which is another schematic diagram of the relationship between the hierarchy and the display hierarchy according to the embodiment of the present invention, an "annotation window" is opened after the user clicks the "voice annotation" button. The annotation window contains a document preview view, which shows the document content. The toolbar is used for adding annotation elements such as rectangles, circles, arrows, text, labels, and handwriting, and also for undoing the previous operation, controlling the microphone switch, displaying the recording time, and the like. The annotation view displays the annotation content.

The ScrollView container internally contains a document preview view and an annotation view, and when the view size is larger than the window size, a scroll bar is displayed. When the user slides the scroll bar, the added annotations remain fixed in position relative to the document content. The annotation view and the document preview view are the same size and are all sub-views of ScrollView. When the user slides the scroll bar of the ScrollView, the annotation view and the document preview view move simultaneously and keep the relative positions unchanged. This ensures that the annotation and document content are not misaligned. When the user zooms in and out of the preview view, the added annotations are fixed in position relative to the document content. When the document preview view is zoomed, the size of the document preview view is changed, and at the moment, the size of the annotation view is correspondingly adjusted to be always the same as the size of the document preview view and the relative position of the annotation view is unchanged.
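The fixed relative position described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that annotations are stored in unscaled document coordinates and that the annotation view and the document preview view apply one shared scroll offset and zoom factor; the names `Viewport` and `to_window` are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Viewport:
    """Scroll offset (in view pixels) and zoom factor shared by both sub-views."""
    scroll_x: float = 0.0
    scroll_y: float = 0.0
    zoom: float = 1.0


def to_window(doc_x: float, doc_y: float, vp: Viewport) -> Tuple[float, float]:
    """Map a point stored in unscaled document coordinates to window coordinates."""
    return doc_x * vp.zoom - vp.scroll_x, doc_y * vp.zoom - vp.scroll_y


# A word in the document and the annotation drawn on it share one document
# position, so any scroll or zoom moves both to the same window position.
word_position = (100.0, 200.0)
annotation_position = (100.0, 200.0)
```

Because both views pass through the same transform, scrolling or zooming cannot misalign an annotation from the content it marks.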

Referring to fig. 3, a comment displaying method according to the present invention is described, where the comment displaying method is applied to an instant messaging application, and an embodiment of the comment displaying method according to the present invention includes:

101. A first terminal device receives an annotation input instruction set through an instant messaging application program, wherein the annotation input instruction set comprises at least one instruction for annotating a target document, and each instruction corresponds to a moment;

In this embodiment, the first terminal device first receives, through the IM APP, an annotation input instruction set triggered by a user. The annotation input instruction set includes at least one instruction for annotating a target document, for example an instruction for adding a rectangular frame, a circular frame, an arrow, text, a label, or handwriting. The instructions for annotating the target document may of course also include a cancel instruction, a delete instruction, an instruction for displaying the recording time, an instruction for recording video and audio, and the like.

It is understood that the target document may be any document supported by the IM APP, such as a picture, a word processor document (Microsoft Office Word), or a Portable Document Format (PDF) file, and is not limited herein.

In addition, each instruction corresponds to a moment; for example, text is input at 10 minutes 25 seconds, a rectangular box is added at 12 minutes 37 seconds, and so on.

102. The first terminal equipment determines annotation information corresponding to the target document according to the annotation input instruction set;

In this embodiment, the first terminal device may determine, according to the received annotation input instruction set, the annotation information corresponding to the target document. An example of the annotation information of the target document is shown in Table 1 below.

TABLE 1

Annotation moment	Annotation instruction	Annotation information
0 min 01 s	Add handwriting instruction	Handwrite "NO"
0 min 16 s	Add arrow instruction	Draw a right arrow under "training"
0 min 55 s	Add handwriting instruction	Handwrite "GOOD"
1 min 03 s	Add text instruction	Input the two characters "sample"
1 min 17 s	Add circular frame instruction	Circle the two characters "Wenxi"
1 min 44 s	Add label instruction	Add a "first draft" label
2 min 00 s	Cancel instruction	Revoke the added "first draft" label

The annotation information in table 1 is only an illustration and should not be construed as a limitation of the present invention.
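As one possible illustration of how annotation information might be derived from a timestamped instruction set, the following sketch replays the rows of Table 1. The `Instruction` record, the `annotations_at` helper, and the rule that a cancel instruction revokes the most recently added annotation are assumptions made for this example, not details given by the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instruction:
    t: int             # seconds since recording started
    kind: str          # e.g. "handwriting", "arrow", "text", "circle", "label", "cancel"
    payload: str = ""


def annotations_at(instructions: List[Instruction], t: int) -> List[str]:
    """Replay instructions with timestamp <= t and return the visible annotations."""
    visible: List[str] = []
    for ins in sorted(instructions, key=lambda i: i.t):
        if ins.t > t:
            break
        if ins.kind == "cancel":
            if visible:
                visible.pop()  # assumed: cancel revokes the latest annotation
        else:
            visible.append(f"{ins.kind}:{ins.payload}")
    return visible


# The rows of Table 1, with moments converted to seconds.
TABLE_1 = [
    Instruction(1, "handwriting", "NO"),
    Instruction(16, "arrow", "right"),
    Instruction(55, "handwriting", "GOOD"),
    Instruction(63, "text", "sample"),
    Instruction(77, "circle", "Wenxi"),
    Instruction(104, "label", "first draft"),
    Instruction(120, "cancel"),
]
```

Under these assumptions, the "first draft" label is visible at 1 min 50 s and no longer visible from 2 min 00 s onward.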

103. The first terminal equipment synthesizes an annotation video according to the annotation information and the time corresponding to each instruction;

In this embodiment, the first terminal device synthesizes an annotation video according to the annotation information and the moment corresponding to each instruction, where the annotation video is a video recording of the annotation process.
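The embodiment does not specify how the moments are turned into video frames. One way to picture it, assuming a fixed frame rate and millisecond timestamps (both assumptions of this sketch, as are the function names), is to compute the first frame in which each annotation appears:

```python
from typing import Dict, List, Tuple


def first_visible_frame(t_ms: int, fps: int) -> int:
    """Index of the first frame whose timestamp is at or after t_ms."""
    return -(-t_ms * fps // 1000)  # integer ceiling of t_ms * fps / 1000


def frame_schedule(events: List[Tuple[int, str]], fps: int) -> Dict[int, List[str]]:
    """Group annotation events by the frame at which they first appear."""
    schedule: Dict[int, List[str]] = {}
    for t_ms, label in sorted(events):
        schedule.setdefault(first_visible_frame(t_ms, fps), []).append(label)
    return schedule
```

A renderer could then walk the frames in order and draw each scheduled annotation from its frame onward; how frames are actually encoded is outside the scope of this sketch.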

104. The first terminal device sends the annotation video to the second terminal device, wherein the second terminal device is used for receiving and displaying the annotation video through the instant messaging application program.

In this embodiment, after the first terminal device synthesizes the annotation video, it may send the annotation video to at least one second terminal device through the IM APP. It should be noted that steps 101 to 104 are all performed in the same IM APP, and the user does not need to leave the IM APP to record the annotation video; that is, after receiving the target document in the IM APP, the user can directly start annotating and record the corresponding annotation video.

After the second terminal device receives the annotation video sent by the first terminal device through the IM APP, the whole annotation process can be seen by directly starting the annotation video through the IM APP.

The embodiment of the invention provides a comment display method applied to an instant messaging application program. In this way, on the one hand, multiple places in the document can be annotated directly without taking screenshots of or modifying the document, which improves the execution efficiency of the scheme; on the other hand, the document can be annotated and discussed within the instant messaging application program, which makes the scheme more flexible.

Optionally, on the basis of the embodiment corresponding to fig. 3, in a first optional embodiment of the annotation display method provided in the embodiment of the present invention, before the first terminal device synthesizes the annotation video according to the annotation information and the time corresponding to each instruction, the method may further include:

the first terminal device receives an audio data stream, wherein the audio data stream carries a time identifier;

the first terminal device synthesizing the annotation video according to the annotation information and the time corresponding to each instruction may include:

the first terminal device synthesizes the annotation video according to the annotation information, the time corresponding to each instruction, and the audio data stream, wherein the time corresponding to each instruction and the time identifier of the audio data stream have a corresponding relation.

This embodiment describes how to add a spoken explanation during the annotation process. Referring to FIG. 4, which is a schematic diagram of an interface for starting the voice annotation function according to an embodiment of the present invention, a user first sends a target document in the IM APP. If the target document is a WORD document, a "voice annotation" entry for the voice annotation function may be added beside the message bubble of the target document. After the user clicks "voice annotation", the file is opened for browsing and an entrance for starting annotation is provided.

Next, referring to FIG. 5, which is a schematic diagram of an interface for confirming the voice annotation according to the embodiment of the present invention, the user can choose to turn on the microphone and then click "start voice annotation" to enter the voice annotation stage. Referring to FIG. 6, which is a schematic diagram of an interface of target document annotation in the embodiment of the present invention, the user can annotate the document page with the tools while explaining by voice, which helps the listener better understand the annotations.

After the recording is finished, the whole annotation process is stored in video form. Because the result is a video recording, the time corresponding to each instruction and the time identifier of the audio data stream both serve as important reference values for synthesizing the annotation video, which prevents the picture and the sound from falling out of sync. After the annotation video is synthesized, referring to FIG. 7, which is a schematic diagram of an interface for synthesizing and sending annotation videos according to the embodiment of the present invention, the user may choose to save the annotation video locally or share it with other users as a short video.
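The correspondence between instruction times and audio time identifiers can be sketched as a lookup on a shared clock. The sketch assumes that the audio data stream arrives as fixed-length chunks, each tagged with its start time in milliseconds; the chunking and the function name `chunk_for_instruction` are illustrative assumptions rather than part of the embodiment.

```python
from bisect import bisect_right
from typing import List, Optional


def chunk_for_instruction(chunk_starts_ms: List[int], chunk_ms: int, t_ms: int) -> Optional[int]:
    """Return the index of the audio chunk whose time span covers t_ms, if any.

    chunk_starts_ms must be sorted in ascending order.
    """
    i = bisect_right(chunk_starts_ms, t_ms) - 1
    if i >= 0 and t_ms < chunk_starts_ms[i] + chunk_ms:
        return i
    return None  # the instruction falls outside the recorded audio
```

With such a lookup, an annotation drawn at a given moment can be placed alongside the audio that was being spoken at that moment, which is what keeps picture and sound aligned.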

Secondly, in the embodiment of the present invention, the first terminal device may receive the audio data stream while receiving the annotation input instruction set, that is, the user may record while annotating, and the finally synthesized annotation video includes the audio data stream. By the aid of the method, annotation experience of the document can be improved, and the method of combining voice with annotation is beneficial to increasing annotation and expression efficiency.

Optionally, on the basis of the first embodiment corresponding to fig. 3, in a second optional embodiment of the annotation display method provided in the embodiment of the present invention, after the first terminal device receives the audio data stream, the method may further include:

the first terminal device processes the audio data stream through a voice recognition model to obtain subtitle information corresponding to the audio data stream;

the first terminal device synthesizing the annotation video according to the annotation information, the time corresponding to each instruction, and the audio data stream may include:

and the first terminal equipment synthesizes the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream and the subtitle information.

In this embodiment, the first terminal device may further process the audio data stream through the voice recognition model, obtain subtitle information corresponding to the audio data stream, and finally display a subtitle corresponding to the audio data stream in the annotation video.

Referring to FIG. 8, which is a schematic diagram of an interface for displaying subtitles in an annotation video according to an embodiment of the present invention, when the annotation video is played, subtitles corresponding to the audio data stream may be displayed in addition to the current playing progress. It should be noted that the subtitle position at the bottom of FIG. 8 is only illustrative; in practical applications, the subtitle position may be adjusted according to the user's habits.

It is understood that speech recognition models include, but are not limited to, acoustic models and language models. The language model represents the probability of a word sequence, and generally uses the chain rule to decompose the probability of a sentence into the product of the conditional probabilities of each word given the words before it. The task of the acoustic model is to give the probability of the observed speech given a word sequence.
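The chain-rule decomposition mentioned for the language model can be written out as a short sketch. The conditional probability table and its values are invented for illustration only; a real language model would estimate such probabilities from data.

```python
import math
from typing import Dict, Sequence, Tuple

# Maps (history, word) to P(word | history); a toy stand-in for a trained model.
CondTable = Dict[Tuple[Tuple[str, ...], str], float]


def sentence_log_prob(words: Sequence[str], cond: CondTable) -> float:
    """Chain rule: log P(w1..wn) = sum over i of log P(w_i | w_1..w_{i-1})."""
    total = 0.0
    for i, word in enumerate(words):
        total += math.log(cond[(tuple(words[:i]), word)])
    return total


# Hypothetical probabilities for a two-word sentence.
toy_model: CondTable = {
    ((), "add"): 0.5,
    (("add",), "arrow"): 0.2,
}
```

Summing log probabilities rather than multiplying raw probabilities is a common choice that avoids numerical underflow on long sentences.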

It should be noted that the subtitle information may be displayed below the annotation video, may also be displayed above the annotation video, or may be set according to a user requirement, which is not limited herein.

Thirdly, in the embodiment of the present invention, the terminal device processes the audio data stream through the speech recognition model to obtain the subtitle information corresponding to the audio data stream, and then synthesizes the annotation video by combining the annotation information, the time corresponding to each instruction, the audio data stream, and the subtitle information. In this way, users who have poor hearing or who cannot listen to sound in their current environment can still understand the content of the annotation video. In addition, because many words are homophones, combining subtitle text with audio conveys the content of the annotation video more clearly, which improves the practicability and feasibility of the scheme.
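How subtitle information might be attached to playback time can be sketched as follows, assuming the recognition step yields cues of the form (start, end, text) in milliseconds. The cue format and the helper name `subtitle_at` are assumptions of this example; the embodiment does not define the structure of the subtitle information.

```python
from typing import List, Optional, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text), an assumed cue format


def subtitle_at(cues: List[Cue], t_ms: int) -> Optional[str]:
    """Return the caption text active at playback time t_ms, or None."""
    for start, end, text in cues:
        if start <= t_ms < end:
            return text
    return None


# Hypothetical cues for a short spoken explanation.
example_cues: List[Cue] = [
    (0, 1500, "This part is wrong"),
    (1500, 4000, "see the arrow"),
]
```

During synthesis or playback, the text returned for the current time would be drawn at the chosen subtitle position.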

Optionally, on the basis of the embodiment corresponding to FIG. 3, or of the first or second optional embodiment corresponding to FIG. 3, in a third optional embodiment of the annotation display method provided in the embodiment of the present invention, before the first terminal device synthesizes the annotation video according to the annotation information and the time corresponding to each instruction, the method may further include:

the first terminal device receives a video data stream, wherein the video data stream carries a time identifier;

and the first terminal device synthesizes the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream, and the video data stream, wherein the time corresponding to each instruction, the time identifier of the audio data stream, and the time identifier of the video data stream all have corresponding relations.

In this embodiment, the first terminal device may receive a video data stream in addition to the audio data stream before synthesizing the annotation video according to the annotation information and the time corresponding to each instruction. The video data stream is captured by a camera. For example, when the user starts recording voice, video recording is started as well, so that the user's expressions and actions during annotation are recorded; the resulting video is then combined with the annotation information and the audio data stream to form the annotation video.

Therefore, the time corresponding to each instruction, the time mark of the audio data stream and the time mark of the video data stream are used as important reference values of the synthesized annotation video, and the problem of picture and sound asynchronism can be prevented.
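A minimal sketch of using the three sets of time identifiers as a common reference is given below. It assumes each stream is already sorted by time and that all three share one clock in milliseconds; the `Event` tuple layout and `merge_streams` are illustrative names, not part of the embodiment.

```python
import heapq
from typing import Iterable, List, Tuple

Event = Tuple[int, str, str]  # (time identifier in ms, stream name, payload)


def merge_streams(*streams: Iterable[Event]) -> List[Event]:
    """Interleave already time-sorted streams into one timeline ordered by time."""
    return list(heapq.merge(*streams, key=lambda e: e[0]))


# Hypothetical events from the annotation, audio, and camera streams.
annotations = [(1000, "annotation", "NO"), (16000, "annotation", "arrow")]
audio = [(0, "audio", "a0"), (2000, "audio", "a1")]
video = [(500, "video", "v0")]
timeline = merge_streams(annotations, audio, video)
```

A synthesizer could consume such a timeline in order, so that each annotation, audio segment, and camera frame lands at its own moment in the output.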

Referring to FIG. 9, which is a schematic diagram of an interface for confirming voice annotations and video annotations in the embodiment of the present invention, the user may also select "camera" when a video needs to be recorded, so that video recording is performed. It should be noted that the video display position at the upper right of FIG. 9 is only illustrative; in practical applications, the display position may be adjusted according to the user's habits.

Further, in the embodiment of the present invention, when receiving the annotation input instruction set, the first terminal device may receive a video data stream in addition to the audio data stream; that is, the user may record both audio and video while annotating, and the finally synthesized annotation video includes the audio data stream and the video data stream. In this way, the annotation experience of the document is further improved, and combining voice and video with annotation increases the efficiency of annotation and expression.

Optionally, on the basis of the embodiment corresponding to fig. 3, in a fourth optional embodiment of the annotation display method provided in the embodiment of the present invention, before the first terminal device receives the annotation input instruction set through the instant messaging application program, the method may further include:

the first terminal device obtains a document type of the target document;

the first terminal device judges whether the document type of the target document belongs to a preset document type;

if the document type of the target document belongs to the preset document type, the first terminal device displays the target document on a display interface of the instant messaging application program;

and if the document type of the target document does not belong to the preset document type, the first terminal equipment displays the target document by calling the system plug-in.

In this embodiment, before the first terminal device receives the annotation input instruction set through the instant messaging application program, the document type of the target document needs to be obtained first, and if the document type is the preset document type, the target document content can be directly displayed in the document preview view through the IM APP. The preset document type may be a text file or a picture file, etc. If the document type does not belong to the preset document type, calling a system plug-in to display the target document.
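The type check described above can be sketched as a simple dispatch. The set of preset types, the use of the file extension to determine the document type, and the return labels are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical preset types that the IM APP is assumed to render itself
# (text files and picture files, following the examples in the text).
PRESET_TYPES = {".txt", ".png", ".jpg", ".jpeg"}


def choose_renderer(filename: str) -> str:
    """Return 'in_app' for preset document types, otherwise 'system_plugin'."""
    suffix = Path(filename).suffix.lower()
    return "in_app" if suffix in PRESET_TYPES else "system_plugin"
```

For instance, under these assumptions a file named `report.docx` would be routed to the system plug-in, while `notes.txt` would be shown directly in the document preview view.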

A system plug-in is a program written against an application program interface that follows a given specification. A system plug-in runs only on the system platform specified for it (possibly supporting multiple platforms at the same time) and cannot run independently of that platform, because the plug-in needs to call function libraries or data provided by the host system. Many IM APPs have plug-ins. In the invention, the first terminal device can display the target document by calling a system plug-in in the IM APP, and can also display the target document by calling a system plug-in in the operating system.

Secondly, in the embodiment of the invention, the terminal device can also obtain the type of the document. If the type belongs to the preset document type, the terminal device displays the document directly in the instant messaging application program; otherwise, the terminal device calls a system plug-in and displays the document through it. In this way, even if the instant messaging application program does not support a certain document type, the system plug-in can be called to display a document of that type, which improves the feasibility and operability of the scheme and makes it suitable for many different types of documents.
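The type check described above can be sketched as follows. This is only an illustrative sketch: the set `PRESET_TYPES`, the function name `choose_viewer`, and the use of the file extension as the "document type" are assumptions, since the embodiment only states that the preset type may be a text file, a picture file, and so on.

```python
from pathlib import Path

# Hypothetical preset types; the embodiment only says "text file or picture file, etc."
PRESET_TYPES = {".txt", ".png", ".jpg"}


def choose_viewer(filename: str) -> str:
    """Decide which component displays the target document."""
    # Assumption: the document type is derived from the file extension.
    doc_type = Path(filename).suffix.lower()
    if doc_type in PRESET_TYPES:
        # Preset type: show it directly in the IM application's preview view.
        return "im_app"
    # Any other type: fall back to a system plug-in.
    return "system_plugin"
```

Under these assumptions, `choose_viewer("notes.txt")` selects the IM application itself, while `choose_viewer("slides.ppt")` selects the system plug-in.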

Optionally, on the basis of the fourth embodiment corresponding to fig. 3, in a fifth optional embodiment of the annotation display method provided in the embodiment of the present invention, after the first terminal device displays the target document by calling a system plug-in, the method may further include:

the first terminal device sends a document browsing instruction to a server, so that the server generates preview pictures corresponding to the target document according to the document browsing instruction, wherein the document browsing instruction carries an identifier of the target document;

the first terminal device receives the preview pictures sent by the server;

the displaying, by the first terminal device, of the target document by calling a system plug-in may include:

the first terminal device displays the preview pictures corresponding to the target document in sequence by calling the system plug-in.

In this embodiment, after the first terminal device displays the target document by calling the system plug-in, the first terminal device may further send a document browsing instruction to the server, that is, start a "cloud browsing" function. The server retrieves the target document from its storage according to the identifier carried in the document browsing instruction and sends the target document to the first terminal device in the form of preview pictures. The first terminal device displays the preview pictures corresponding to the target document in order, either from front to back or from back to front. The user may annotate each preview picture. For example, if the target document has ten preview pictures, the synthesized annotation video also includes the annotations made on those ten preview pictures.

It can be understood that the server retrieves the target document in the background by looking it up through its identifier. Each document corresponds to one identifier, so the identifiers are unique. The identifier of the target document may be a Message-Digest Algorithm 5 (MD5) value or a Secure Hash Algorithm (SHA) value, or may be another type of identifier, which is not limited herein.
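As a minimal sketch of how such an identifier could be derived, the following uses Python's standard `hashlib` module to hash the document bytes. The function name `document_identifier` is hypothetical, and hashing the full file content is an assumption; the embodiment does not specify what exactly is hashed.

```python
import hashlib


def document_identifier(data: bytes, algorithm: str = "md5") -> str:
    """Return a hex digest of the document bytes, usable as a lookup key."""
    # hashlib.new accepts algorithm names such as "md5", "sha1", or "sha256".
    return hashlib.new(algorithm, data).hexdigest()
```

Two copies of the same file yield the same digest, which is what lets a server recognize a document it has already cached.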

In the embodiment of the invention, how to display the document by calling the system plug-in is described, and the document can be displayed in the form of pictures according to a certain sequence. Through the mode, when the user records the annotation video, the document can be annotated according to a reasonable sequence, so that the reasonability and the feasibility of the scheme are improved.

Optionally, on the basis of the fifth embodiment corresponding to fig. 3, in a sixth optional embodiment of the annotation display method provided in the embodiment of the present invention, the receiving, by the first terminal device, the annotation input instruction set by the instant messaging application program may include:

the first terminal device receives, through the instant messaging application program, a first annotation input instruction subset corresponding to a first preview picture, wherein the first preview picture belongs to the target document, and the first annotation input instruction subset belongs to the annotation input instruction set;

the first terminal device receives, through the instant messaging application program, a second annotation input instruction subset corresponding to a second preview picture, wherein the second preview picture belongs to the target document, and the second annotation input instruction subset belongs to the annotation input instruction set;

the first terminal device establishes an annotation data array according to the first preview picture, the first annotation input instruction subset, the second preview picture and the second annotation input instruction subset, wherein the annotation data array comprises the correspondence between preview pictures and annotation input instruction subsets;

the determining, by the first terminal device, the annotation information corresponding to the target document according to the annotation input instruction set may include:

and the first terminal equipment determines annotation information corresponding to the target document according to the annotation input instruction set, the preview picture corresponding to the target document and the annotation data array.

In this embodiment, for a target document containing multiple pages of pictures, the annotation content added by the user must stay associated with the corresponding preview page when the user turns pages. For example, suppose the target document comprises two pages of pictures, namely a first preview picture and a second preview picture. The user first annotates the first preview picture, so the first preview picture corresponds to the first annotation input instruction subset; the user then annotates the second preview picture, so the second preview picture corresponds to the second annotation input instruction subset. The first terminal device maintains an annotation data array as shown in table 2.

TABLE 2

Preview picture	Annotation input instruction subset
First preview picture	First annotation input instruction subset
Second preview picture	Second annotation input instruction subset

It should be noted that the annotation data array may further include more correspondences between preview pictures and annotation input instruction subsets; table 2 is only an illustration and should not be construed as a limitation of the present invention. The number of elements in the annotation data array is the same as the number of pages of the document. When the user adds an annotation, the page number of the current preview picture is used as the index, and the annotation input instruction subset is stored in the array at that index. The user can switch pages using the page-turn buttons or the preview pictures. When a page turn starts, the annotation view is emptied. After the page turn is finished, the annotation input instruction subset for the current page number is taken out of the annotation data array and drawn on the annotation view.

Further, in the embodiment of the present invention, if the document includes multiple pages, the user can annotate each page. Each page is a preview picture, the annotations made on a preview picture form an annotation input instruction subset, and the terminal device stores the correspondence between preview pictures and annotation input instruction subsets in the form of an array. In this way, the terminal device can look up the correspondence between annotations and pages in the array when synthesizing the annotation video, which effectively improves the accuracy of the synthesized annotation video for multi-page documents and avoids annotations being misaligned with pages.
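The annotation data array and the page-turn behaviour described above can be sketched as follows. The class and attribute names (`AnnotationStore`, `view`, `turn_to`) are hypothetical, instructions are represented as plain `(time_ms, stroke)` tuples for illustration, and zero-based page indices are an assumption.

```python
class AnnotationStore:
    """Keeps one annotation input instruction subset per preview picture."""

    def __init__(self, page_count: int):
        # One array element per page of the document.
        self.pages = [[] for _ in range(page_count)]
        self.current_page = 0
        # Stand-in for the annotation view: the instructions currently drawn.
        self.view = []

    def add(self, time_ms: int, stroke: str) -> None:
        """Store an instruction under the current page number and draw it."""
        instruction = (time_ms, stroke)
        self.pages[self.current_page].append(instruction)
        self.view.append(instruction)

    def turn_to(self, page: int) -> None:
        """Switch pages: empty the view, then redraw the stored subset."""
        self.view = []                       # page turn starts
        self.current_page = page
        self.view = list(self.pages[page])   # page turn finished
```

Because the page number is the array index, returning to an earlier page restores exactly the annotations that were made on it.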

For ease of understanding, the preview method using the system plug-in in the present invention is described in detail below in a specific application scenario:

Assume that the IM APP is QQ, developed by Tencent, and that user A wants to open a presentation (PPT) in QQ. QQ cannot open the PPT directly, so it calls a system plug-in to display the content of the PPT, as shown in fig. 10. Fig. 10 is an interface diagram of a target document previewed through the system plug-in in an application scenario of the present invention.

Since the system plug-in cannot always display the file content perfectly, the server is asked whether it supports cloud preview for this type of file. If cloud preview is supported for the file, a "cloud preview" button is displayed in the preview view.

Cloud preview of a PPT file works as follows: the server has software installed that can open the PPT format, such as Microsoft Office. The server opens the PPT file using Microsoft Office and stores each page of the PPT as a picture file. It then sends all the picture files to the client for viewing, in the page order of the PPT. Referring to fig. 11, fig. 11 is an interface diagram of a target document viewed through cloud preview in an application scenario of the present invention. The server manages and caches the generated preview pictures using the MD5 value of the PPT file as the index.

If the user is not satisfied with the PPT as displayed by the system plug-in, for example because the fonts in the PPT are wrong or the content is misplaced, the user can click the "cloud preview" button. The preview window first asks the cloud preview server whether the PPT file needs to be uploaded. The cloud preview server then checks whether the cloud holds a cache of picture files for the preview content of this file. If a user has previewed the file before, the server has such a cache; in that case the server informs the client that the PPT file does not need to be uploaded and that the preview pictures are ready.

If the server has no cache of the picture files, it checks whether the cloud holds a cache of the PPT file itself, again indexed by MD5. If a user has previously stored the file on a cloud disk or sent it as a QQ offline file, the cloud holds a cache of the file. The server then opens the file and generates the preview pictures, informs the client that the PPT file does not need to be uploaded, and informs the client that the preview pictures are ready. Otherwise, the server notifies the client to upload the PPT file.

After the client uploads the PPT file, the server opens the PPT file and generates a preview picture. The server informs the client that the preview picture is ready. And after receiving the notification that the preview picture is ready, the client applies for the preview picture from the server. The server informs the client of the total number of preview pictures. The client end sequentially displays each preview image in the preview window.
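The server-side decision in the cloud preview flow above can be sketched as follows. This is a simplified in-memory sketch, not the actual QQ service: the class `PreviewServer`, the `Reply` values, and the `render` callable that stands in for opening the file and exporting each page as a picture are all assumptions.

```python
from enum import Enum


class Reply(Enum):
    PREVIEW_READY = "no upload needed, preview pictures are ready"
    UPLOAD_REQUIRED = "client must upload the file"


class PreviewServer:
    def __init__(self, render):
        # Hypothetical renderer: takes file bytes, returns a list of page pictures.
        self.render = render
        self.picture_cache = {}   # MD5 -> list of preview pictures
        self.file_cache = {}      # MD5 -> file bytes (e.g. from earlier transfers)

    def query(self, md5: str) -> Reply:
        """Answer the client's question: must the file be uploaded?"""
        if md5 in self.picture_cache:
            return Reply.PREVIEW_READY
        if md5 in self.file_cache:
            # File is already in the cloud: generate the pictures from it.
            self.picture_cache[md5] = self.render(self.file_cache[md5])
            return Reply.PREVIEW_READY
        return Reply.UPLOAD_REQUIRED

    def upload(self, md5: str, data: bytes) -> Reply:
        """Accept an uploaded file and generate its preview pictures."""
        self.file_cache[md5] = data
        self.picture_cache[md5] = self.render(data)
        return Reply.PREVIEW_READY

    def page_count(self, md5: str) -> int:
        """Total number of preview pictures, reported to the client."""
        return len(self.picture_cache[md5])

    def page(self, md5: str, index: int):
        """Return one preview picture by page index."""
        return self.picture_cache[md5][index]
```

A client following the described flow would call `query` first, call `upload` only when told to, then read `page_count` and fetch each page in order.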

Referring to fig. 12, fig. 12 is a schematic view of an embodiment of a terminal device in an embodiment of the present invention, where the terminal device 20 includes:

a receiving module 201, configured to receive an annotation input instruction set through the instant messaging application program, where the annotation input instruction set includes at least one instruction for annotating a target document, and each instruction corresponds to a time;

a determining module 202, configured to determine, according to the annotation input instruction set received by the receiving module 201, annotation information corresponding to the target document;

a synthesizing module 203, configured to synthesize an annotation video according to the annotation information determined by the determining module 202 and the time corresponding to each instruction;

a sending module 204, configured to send the annotation video synthesized by the synthesizing module 203 to a second terminal device, where the second terminal device is configured to receive and display the annotation video through the instant messaging application.

In this embodiment, the receiving module 201 receives an annotation input instruction set through the instant messaging application program, where the annotation input instruction set includes at least one instruction for annotating a target document and each instruction corresponds to a time. The determining module 202 determines annotation information corresponding to the target document according to the annotation input instruction set received by the receiving module 201. The synthesizing module 203 synthesizes an annotation video according to the annotation information determined by the determining module 202 and the time corresponding to each instruction. The sending module 204 sends the annotation video synthesized by the synthesizing module 203 to a second terminal device, where the second terminal device is configured to receive and display the annotation video through the instant messaging application program.

The embodiment of the invention provides a terminal device, which comprises a first terminal device and a second terminal device, wherein the first terminal device receives an annotation input instruction set through an instant messaging application program, the annotation input instruction set comprises at least one instruction for annotating a target document, each instruction corresponds to a moment, annotation information corresponding to the target document can be determined according to the annotation input instruction set, then the first terminal device synthesizes an annotation video according to the annotation information and the moment corresponding to each instruction, and finally the first terminal device sends the annotation video to the second terminal device, wherein the second terminal device is used for receiving and displaying the annotation video through the instant messaging application program. Through the mode, on one hand, the method can directly annotate a plurality of places of the document without screenshot or modification of the document, so that the execution efficiency of the scheme is improved, on the other hand, the document can be annotated and communicated in the instant messaging application program, and the scheme has stronger flexibility.

Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the terminal device provided in the embodiment of the present invention, the terminal device 20 further includes:

The receiving module 201 is further configured to receive an audio data stream before the synthesizing module 203 synthesizes the annotation video according to the annotation information and the time corresponding to each instruction, where the audio data stream carries a time identifier;

the synthesizing module 203 is specifically configured to synthesize the annotation video according to the annotation information, the time corresponding to each instruction, and the audio data stream, where the time corresponding to each instruction and the time identifier of the audio data stream have a corresponding relationship.
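One way to picture the correspondence between instruction times and the audio time identifiers is to place both on a shared clock and decide, for each video frame, which instructions have already been issued. The sketch below does only that bookkeeping; the function name, the millisecond unit, the shared starting point of the clock, and the frame rate are assumptions, and actual video encoding is out of scope.

```python
from bisect import bisect_right


def instructions_per_frame(instruction_times_ms, duration_ms, fps=10):
    """For each frame, count the instructions issued at or before its timestamp.

    Assumes instruction times and the audio time identifiers share one clock
    starting at 0, so frame i lines up with the audio at i * (1000 / fps) ms.
    """
    times = sorted(instruction_times_ms)
    frame_interval = 1000 / fps
    frame_count = int(duration_ms / frame_interval) + 1
    # bisect_right gives the number of instruction times <= the frame timestamp.
    return [bisect_right(times, i * frame_interval) for i in range(frame_count)]
```

A frame whose count is n would be drawn with the first n instructions applied, so strokes appear in the video at the moment the matching speech is heard.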

Optionally, on the basis of the embodiment corresponding to fig. 12, referring to fig. 13, in another embodiment of the terminal device provided in the embodiment of the present invention, the terminal device 20 further includes an obtaining module 205;

The obtaining module 205 is configured to, after the receiving module 201 receives an audio data stream, process the audio data stream through a speech recognition model to obtain subtitle information corresponding to the audio data stream;

the synthesizing module 203 is specifically configured to synthesize the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream, and the subtitle information.

Thirdly, in the embodiment of the present invention, the terminal device processes the audio data stream through the speech recognition model to obtain the subtitle information corresponding to the audio data stream, and then synthesizes the annotation video by combining the annotation information, the time corresponding to each instruction, the audio data stream, and the subtitle information. In this way, it is possible to help users who have poor hearing ability or cannot hear sound in the current environment understand the program content. In addition, because many words are homophonic, the content in the video can be more clearly annotated only by watching through the combination of caption characters and audio, thereby improving the practicability and feasibility of the scheme.

Optionally, on the basis of the embodiment corresponding to fig. 12 or fig. 13, in another embodiment of the terminal device provided in the embodiment of the present invention, the terminal device 20 further includes:

The receiving module 201 is further configured to receive a video data stream before the synthesizing module 203 synthesizes the annotation video according to the annotation information and the time corresponding to each instruction, where the video data stream carries a time identifier;

the synthesizing module 203 is specifically configured to synthesize the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream, and the video data stream, where the time corresponding to each instruction, the time identifier of the audio data stream, and the time identifier of the video data stream all have a corresponding relationship.

Optionally, on the basis of the embodiment corresponding to fig. 12, please refer to fig. 14, in another embodiment of the terminal device provided in the embodiment of the present invention, the terminal device 20 further includes a determining module 206 and a displaying module 207;

the obtaining module 205 is further configured to obtain a document type of the target document before the receiving module 201 receives the annotation input instruction set through the instant messaging application program;

the determining module 206 is configured to determine whether the document type of the target document acquired by the acquiring module 205 belongs to a preset document type;

the displaying module 207 is configured to, if the determining module 206 determines that the document type of the target document belongs to the preset document type, display the target document on a display interface of the instant messaging application by the first terminal device;

the displaying module 207 is configured to, if the determining module 206 determines that the document type of the target document does not belong to the preset document type, display the target document by the first terminal device by calling a system plug-in.

Optionally, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the terminal device provided in the embodiment of the present invention, the terminal device 20 further includes:

the sending module 204 is configured to send a document browsing instruction to a server after the displaying module 207 displays the target document by calling a system plug-in, so that the server generates a preview picture corresponding to the target document according to the document browsing instruction, where the document browsing instruction carries an identifier of the target document;

the receiving module 201 is configured to receive the preview picture sent by the server;

the display module 207 is specifically configured to sequentially display the preview pictures corresponding to the target document according to a sequence by calling system plug-ins.

The receiving module 201 is specifically configured to receive, through the instant messaging application program, a first annotation input instruction subset corresponding to a first preview picture, where the first preview picture belongs to the target document, and the first annotation input instruction subset belongs to the annotation input instruction set;

receiving, through the instant messaging application program, a second annotation input instruction subset corresponding to a second preview picture, where the second preview picture belongs to the target document, and the second annotation input instruction subset belongs to the annotation input instruction set;

establishing an annotation data array according to the first preview picture, the first annotation input instruction subset, the second preview picture and the second annotation input instruction subset, where the annotation data array includes the correspondence between preview pictures and annotation input instruction subsets;

the determining module 202 is specifically configured to determine the annotation information corresponding to the target document according to the annotation input instruction set, the preview picture corresponding to the target document, and the annotation data array.

As shown in fig. 15, for convenience of description, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiments of the present invention. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like. The following takes a mobile phone as an example:

Fig. 15 is a block diagram showing a partial structure of a mobile phone related to the terminal provided by an embodiment of the present invention. Referring to fig. 15, the mobile phone includes: a radio frequency (RF) circuit 310, a memory 320, an input unit 330, a display unit 340, a sensor 350, an audio circuit 360, a wireless fidelity (WiFi) module 370, a processor 380, and a power supply 390. Those skilled in the art will appreciate that the mobile phone structure shown in fig. 15 is not limiting: the mobile phone may include more or fewer components than those shown, may combine some components, or may arrange the components differently.

The following specifically describes each constituent component of the mobile phone with reference to fig. 15:

The RF circuit 310 may be used for receiving and transmitting signals during information transmission and reception or during a call. In particular, it receives downlink information from a base station and passes it to the processor 380 for processing; in addition, it transmits uplink data to the base station. In general, the RF circuit 310 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, RF circuit 310 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), etc.

The memory 320 may be used to store software programs and modules, and the processor 380 executes various functional applications and data processing of the mobile phone by operating the software programs and modules stored in the memory 320. The memory 320 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 320 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

The input unit 330 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 330 may include a touch panel 331 and other input devices 332. The touch panel 331, also called a touch screen, can collect touch operations performed by a user on or near it (e.g., operations performed on or near the touch panel 331 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 331 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 380, and can receive and execute commands sent by the processor 380. In addition, the touch panel 331 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 330 may include other input devices 332 in addition to the touch panel 331. In particular, other input devices 332 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 340 may be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 340 may include a display panel 341, and optionally, the display panel 341 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 331 can cover the display panel 341. When the touch panel 331 detects a touch operation on or near it, the operation is transmitted to the processor 380 to determine the type of the touch event, and the processor 380 then provides a corresponding visual output on the display panel 341 according to the type of the touch event. Although in fig. 15 the touch panel 331 and the display panel 341 are two separate components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 331 and the display panel 341 may be integrated to implement the input and output functions of the mobile phone.

The handset may also include at least one sensor 350, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 341 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 341 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.

The audio circuit 360, a speaker 361, and a microphone 362 may provide an audio interface between the user and the mobile phone. The audio circuit 360 may transmit the electrical signal converted from received audio data to the speaker 361, which converts it into a sound signal for output; conversely, the microphone 362 converts collected sound signals into electrical signals, which the audio circuit 360 receives and converts into audio data. The audio data is then output to the processor 380 for processing and subsequently sent, for example, to another mobile phone via the RF circuit 310, or output to the memory 320 for further processing.

WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 370, and provides wireless broadband internet access for the user. Although fig. 15 shows the WiFi module 370, it is understood that it does not belong to the essential constitution of the handset, and may be omitted entirely as needed within the scope not changing the essence of the invention.

The processor 380 is a control center of the mobile phone, connects various parts of the whole mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory 320, thereby integrally monitoring the mobile phone. Optionally, processor 380 may include one or more processing units; optionally, processor 380 may integrate an application processor, which primarily handles operating systems, user interfaces, application programs, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 380.

The handset also includes a power supply 390 (e.g., a battery) for powering the various components, optionally, the power supply may be logically connected to the processor 380 through a power management system, so that the power management system may be used to manage charging, discharging, and power consumption.

Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.

In the embodiment of the present invention, the processor 380 included in the terminal further has the following functions:

synthesizing an annotation video according to the annotation information and the time corresponding to each instruction;

and sending the annotation video to a second terminal device, wherein the second terminal device is used for receiving and displaying the annotation video through the instant messaging application program.

Optionally, the processor 380 is further configured to perform the following steps:

receiving an audio data stream, wherein the audio data stream carries a time identifier;

The processor 380 is specifically configured to perform the following steps:

and synthesizing the annotation video according to the annotation information, the time corresponding to each instruction and the audio data stream, wherein the time corresponding to each instruction has a corresponding relation with the time identifier of the audio data stream.

Optionally, the processor 380 is specifically configured to perform the following steps:

processing the audio data stream through a voice recognition model to obtain subtitle information corresponding to the audio data stream;

and synthesizing the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream and the subtitle information.

receiving a video data stream, wherein the video data stream carries a time identifier;

The processor 380 is specifically configured to perform the following steps:

and synthesizing the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream and the video data stream, wherein the time corresponding to each instruction, the time identifier of the audio data stream and the time identifier of the video data stream all have corresponding relations.
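One possible way to exploit the corresponding relations among the three time bases is to merge the annotation instructions, the audio data and the video data into a single timeline ordered by time. The sketch below is an assumption made for illustration only: the `(time_ms, payload)` representation and the function `merge_by_time` do not appear in the embodiments.

```python
import heapq


def merge_by_time(instructions, audio, video):
    """Merge three time-sorted streams into one timeline.

    Each argument is a list of (time_ms, payload) tuples already sorted by
    time_ms. The result is a list of (time_ms, source, payload) tuples in
    ascending time order, where source names the originating stream.
    """
    def tag(source, stream):
        return [(t, source, payload) for t, payload in stream]

    return list(
        heapq.merge(
            tag("annotation", instructions),
            tag("audio", audio),
            tag("video", video),
            key=lambda entry: entry[0],
        )
    )
```

A synthesizer could walk this timeline once, updating the annotation view, the audio track and the video picture as each entry is reached.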

acquiring the document type of the target document;

judging whether the document type of the target document belongs to a preset document type;

if the document type of the target document belongs to the preset document type, displaying the target document on a display interface of the instant messaging application program;

and if the document type of the target document does not belong to the preset document type, displaying the target document by calling a system plug-in.
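The judgment on the document type amounts to a simple branch, sketched below. The embodiments do not state which types are preset, so the set `PRESET_TYPES`, the use of the file extension as the document type and the function `choose_viewer` are all hypothetical.

```python
from pathlib import PurePath

# Assumed example set; the preset document types are not listed in the text.
PRESET_TYPES = frozenset({"pdf", "docx", "pptx"})


def choose_viewer(filename, preset_types=PRESET_TYPES):
    """Return "in_app" for a preset document type, else "system_plugin"."""
    # Treat the lower-cased file extension as the document type.
    doc_type = PurePath(filename).suffix.lower().lstrip(".")
    if doc_type in preset_types:
        return "in_app"          # display on the application's own interface
    return "system_plugin"       # display by calling a system plug-in
```

A file without an extension falls through to the system plug-in branch in this sketch, which is a design choice of the example rather than something stated in the embodiments.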

sending a document browsing instruction to a server to enable the server to generate a preview picture corresponding to the target document according to the document browsing instruction, wherein the document browsing instruction carries an identifier of the target document;

receiving the preview picture sent by the server;

The processor 380 is specifically configured to perform the following steps:

and displaying the preview pictures corresponding to the target document in sequence by calling a system plug-in.

receiving a first annotation input instruction subset corresponding to a first preview picture through the instant messaging application program, wherein the first preview picture belongs to the target document, and the first annotation input instruction subset belongs to the annotation input instruction set;

receiving a second annotation input instruction subset corresponding to a second preview picture through the instant messaging application program, wherein the second preview picture belongs to the target document, and the second annotation input instruction subset belongs to the annotation input instruction set;

and determining the annotation information corresponding to the target document according to the annotation input instruction set, the preview picture corresponding to the target document and the annotation data array.
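The relation between preview pictures and annotation input instruction subsets can be pictured as a small data structure. In the sketch below, which is only an assumed representation, the annotation data array is a list of entries that each pair a preview picture identifier with its instruction subset, and the annotation information is obtained by walking the preview pictures in document order. The names `build_annotation_data_array` and `annotation_info` and the `(time_ms, stroke)` instruction format are hypothetical.

```python
def build_annotation_data_array(pairs):
    """Build the array from (picture_id, instruction_subset) pairs."""
    return [
        {"picture": picture_id, "instructions": list(subset)}
        for picture_id, subset in pairs
    ]


def annotation_info(preview_pictures, data_array):
    """Return [(picture_id, instructions sorted by time)] in document order.

    Instructions are (time_ms, stroke) tuples; a preview picture without
    any annotation yields an empty list.
    """
    by_picture = {}
    for entry in data_array:
        by_picture.setdefault(entry["picture"], []).extend(entry["instructions"])
    return [(pic, sorted(by_picture.get(pic, []))) for pic in preview_pictures]
```

Keeping the subsets keyed by preview picture lets annotations made on different pages be replayed on the correct page regardless of the order in which they were entered.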

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A comment display method, applied to annotating a document through an instant messaging application program, the method comprising the following steps:

the first terminal device receives an audio data stream, wherein the audio data stream carries a moment identifier;

the first terminal device receives a video data stream, wherein the video data stream carries a moment identifier;

the first terminal device synthesizes an annotation video according to the annotation information and the time corresponding to each instruction, wherein the annotation information is displayed on an annotation view and remains at a fixed position relative to the document content, and wherein the first terminal device synthesizing the annotation video according to the annotation information and the time corresponding to each instruction comprises: the first terminal device synthesizes the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream and the video data stream, wherein the time corresponding to each instruction, the moment identifier of the audio data stream and the moment identifier of the video data stream all have corresponding relations;

2. The method of claim 1, wherein after the first terminal device receives the audio data stream, the method further comprises:

the first terminal device processes the audio data stream through a voice recognition model to acquire subtitle information corresponding to the audio data stream;

the first terminal device synthesizes the annotation video according to the annotation information, the time corresponding to each instruction and the audio data stream, and includes:

3. The method of claim 1, wherein before the first terminal device receives a set of annotation input instructions via the instant messaging application, the method further comprises:

the first terminal device obtains the document type of the target document;

the first terminal device judges whether the document type of the target document belongs to a preset document type;

and if the document type of the target document does not belong to the preset document type, the first terminal device displays the target document by calling a system plug-in.

4. The method of claim 3, wherein after the first terminal device displays the target document by calling a system plug-in, the method further comprises:

the first terminal device sends a document browsing instruction to a server so that the server generates a preview picture corresponding to the target document according to the document browsing instruction, wherein the document browsing instruction carries an identifier of the target document;

the first terminal device receives the preview picture sent by the server;

the first terminal device displaying the target document by calling a system plug-in comprises:

and the first terminal device displays the preview pictures corresponding to the target document in sequence by calling a system plug-in.

5. The method of claim 4, wherein the first terminal device receiving a set of annotation input instructions via the instant messaging application comprises:

the first terminal device receives a first annotation input instruction subset corresponding to a first preview picture through the instant messaging application program, wherein the first preview picture belongs to the target document, and the first annotation input instruction subset belongs to the annotation input instruction set;

the first terminal device establishes an annotation data array according to the first preview picture, the first annotation input instruction subset, the second preview picture and the second annotation input instruction subset, wherein the annotation data array comprises a corresponding relation between the preview picture and the annotation input instruction subset;

the first terminal device determining annotation information corresponding to the target document according to the annotation input instruction set comprises:

and the first terminal device determines the annotation information corresponding to the target document according to the annotation input instruction set, the preview picture corresponding to the target document and the annotation data array.

6. A terminal device, which is applied to an instant messaging application program to annotate a document, comprises:

the receiving module is further configured to receive an audio data stream before the synthesizing module synthesizes the annotation video according to the annotation information and the time corresponding to each instruction, where the audio data stream carries a time identifier;

the receiving module is further configured to receive a video data stream before the synthesizing module synthesizes the annotation video according to the annotation information and the time corresponding to each instruction, where the video data stream carries a time identifier;

a synthesizing module, configured to synthesize an annotation video according to the annotation information determined by the determining module and the time corresponding to each instruction, wherein the annotation information is displayed on an annotation view and remains at a fixed position relative to the document content, and wherein the synthesizing module synthesizing the annotation video according to the annotation information and the time corresponding to each instruction comprises: synthesizing the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream and the video data stream, wherein the time corresponding to each instruction, the time identifier of the audio data stream and the time identifier of the video data stream all have corresponding relations;

7. The terminal device according to claim 6, wherein the terminal device further comprises an obtaining module;

the obtaining module is configured to process the audio data stream through a voice recognition model after the receiving module receives the audio data stream, so as to obtain the subtitle information corresponding to the audio data stream;

the synthesis module is specifically configured to synthesize the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream, and the subtitle information.

8. A terminal device, comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is configured to execute the program in the memory, the program being applied to annotating a document through an instant messaging application program, and to perform the following steps:

receiving a video data stream, wherein the video data stream carries a moment identifier;

synthesizing an annotation video according to the annotation information and the time corresponding to each instruction, wherein the annotation information is displayed on an annotation view and remains at a fixed position relative to the document content, and wherein the synthesizing of the annotation video according to the annotation information and the time corresponding to each instruction comprises: synthesizing the annotation video according to the annotation information, the time corresponding to each instruction, the audio data stream and the video data stream, wherein the time corresponding to each instruction, the time identifier of the audio data stream and the time identifier of the video data stream all have corresponding relations;

9. The terminal device of claim 8, wherein before receiving the set of annotation input instructions via the instant messaging application, the processor is further configured to perform the steps of:

acquiring the document type of the target document;

10. The terminal device of claim 9, wherein after the target document is displayed by calling a system plug-in, the processor is further configured to perform the following steps:

receiving the preview picture sent by the server;

the processor is specifically configured to perform the following steps:

and displaying the preview pictures corresponding to the target document in sequence by calling a system plug-in.

11. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1-5.