CN114564915A

CN114564915A - Text typesetting method, electronic equipment and storage medium

Info

Publication number: CN114564915A
Application number: CN202210186943.2A
Authority: CN
Inventors: 张恒
Original assignee: Zhangyue Technology Co Ltd
Current assignee: Zhangyue Technology Co Ltd
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2022-05-31
Also published as: WO2023160164A1

Abstract

The disclosure relates to a text typesetting method, an electronic device and a storage medium. According to the method, text data of an original text is obtained by parsing the original text contained in a layout document, the text data comprising a plurality of text lines of the original text and the line position of each text line in the layout document; the plurality of text lines are divided, based on the line positions, into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated; a character string to be annotated is determined in a line to be annotated, and the annotation character string corresponding to it is determined in the annotation line corresponding to that line; and a target streaming document corresponding to the layout document is generated based on the non-annotation lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated in the lines to be annotated. This effectively avoids disordered layout in streaming documents converted from layout documents containing annotation content, and improves the user's electronic reading experience.

Description

Text typesetting method, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a text typesetting method, an electronic device, and a storage medium.

Background

With the rapid development of the internet and the continuous improvement of hardware, electronic documents are gradually replacing traditional books and paper documents. Meanwhile, people's reading habits are no longer limited to traditional paper publications, and the proportion of electronic reading is steadily increasing.

At present, to carry the traditional layout-based reading experience over to electronic reading, a traditional layout document needs to be converted into a streaming document that is free of typesetting obstacles for electronic reading. However, layout documents often contain a large amount of annotation content, which may cause the layout of the converted streaming document to become disordered, degrading the user's electronic reading experience.

Disclosure of Invention

To solve the above technical problem, the present disclosure provides a text typesetting method, an electronic device, and a storage medium.

A first aspect of the embodiments of the present disclosure provides a text typesetting method, including:

parsing an original text contained in a layout document to obtain text data of the original text, wherein the text data comprises a plurality of text lines of the original text and the line position of each text line in the layout document;

dividing, based on the line positions, the plurality of text lines into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated;

determining a character string to be annotated in a line to be annotated and determining an annotation character string corresponding to the character string to be annotated in an annotation line corresponding to the line to be annotated;

and generating a target streaming document corresponding to the layout document based on the non-annotation lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated in the lines to be annotated.

A second aspect of embodiments of the present disclosure provides an electronic device comprising a processor and a memory, the memory to store executable instructions that cause the processor to:

analyzing an original text contained in the layout document to obtain text data of the original text, wherein the text data comprises a plurality of text lines of the original text and the line position of each text line in the layout document;

dividing, based on the line positions, the plurality of text lines into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated;

A third aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the text typesetting method of the first aspect.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:

In the embodiments of the present disclosure, text data of the original text is obtained by parsing the original text contained in the layout document, the text data comprising a plurality of text lines of the original text and the line position of each text line in the layout document; the plurality of text lines are divided, based on the line positions, into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated; a character string to be annotated is determined in a line to be annotated, and the corresponding annotation character string is determined in the annotation line corresponding to that line; and a target streaming document corresponding to the layout document is generated based on the non-annotation lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated in the lines to be annotated. In this way, the text lines containing annotation content in the layout document can be parsed and classified, and the annotation character strings in the annotation lines can be associated with the character strings to be annotated in the lines to be annotated before the streaming document is generated for display. This effectively solves the problem of disordered layout in streaming documents converted from layout documents containing annotation content, and improves the user's electronic reading experience.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the embodiments or technical solutions in the prior art description will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.

FIG. 1 is a flowchart of a text typesetting method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an original text parsing interface of a layout document provided by an embodiment of the present disclosure;

FIG. 3 is a flow chart of another text typesetting method provided by the embodiments of the present disclosure;

FIG. 4A is a schematic diagram of a document display interface provided by an embodiment of the disclosure;

FIG. 4B is a schematic diagram of another document display interface provided by embodiments of the present disclosure;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.

The text typesetting method provided by the embodiments of the present disclosure may be performed by an electronic device, which can be understood as any device with processing and computing capabilities. The electronic device may include, but is not limited to, mobile terminals such as smartphones, notebook computers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), vehicle-mounted terminals (e.g., car navigation terminals), and wearable devices, as well as fixed electronic devices such as digital TVs, desktop computers, and smart home devices.

In the embodiments of the present disclosure, a layout document can be understood as a fixed-layout document: it is not editable and not easy to modify, its layout is fixed, and the original layout is always preserved during reading.

In the embodiments of the present disclosure, a streaming document can be understood as a document whose typesetting can be adjusted freely and whose content display can be changed. An example is HyperText Markup Language (HTML), in which tags are added to a text file to tell a browser how to display its content (e.g., how to handle text and how to arrange and display images).

Layout documents often contain a large amount of annotation content. Although the annotation content and the content being annotated are not on the same text line, they are close together and strongly related. Annotation content includes, for example, pinyin, phonetic symbols, translations, parts of formulas (such as the fraction bar and numerator of a fraction), and symbols; the content being annotated includes, for example, text and parts of formulas (such as the denominator of a fraction). After a layout document is converted into a streaming document, if the annotation content is separated from the content it annotates, the annotation layout becomes disordered and the user cannot get a good electronic reading experience.

When converting a layout document into a streaming document, the related art automatically separates the original text from its annotation content: for example, the original text and the annotation content end up in two separate text lines far apart from each other, they end up at different text positions within those two lines, or the annotation content is recognized as a separate paragraph. As a result, after conversion into a streaming document, the annotated original text and its annotation content are displayed apart and the layout is disordered. In the related art, the annotated original text and its annotation content can also be manually captured as a single picture that replaces the annotated original text in the streaming document; however, the picture may display unclearly due to low resolution, and when the text format is changed in the streaming document, the text in the picture does not change accordingly.

To address these shortcomings of text typesetting in the related art when converting layout documents into streaming documents, the embodiments of the present disclosure provide a text typesetting method, an electronic device, and a storage medium, which effectively avoid disordered layout in streaming documents converted from layout documents containing annotation content and improve the user's electronic reading experience.

In order to better understand the inventive concept of the embodiments of the present disclosure, the following describes technical solutions of the embodiments of the present disclosure with reference to exemplary embodiments.

Fig. 1 is a flowchart of a text typesetting method provided in the embodiment of the present disclosure, and as shown in fig. 1, the method provided in the embodiment includes the following steps:

Step 101: parse the original text contained in a layout document to obtain text data of the original text, wherein the text data comprises a plurality of text lines of the original text and the line position of each text line in the layout document.

A text line in the embodiments of the present disclosure refers to a line of text in the original text.

In the embodiments of the present disclosure, the line position may include, but is not limited to, the coordinate information, in the original text, of each boundary of the minimum bounding rectangle of the text line. This coordinate information may consist of the coordinates of the four vertices of the minimum bounding rectangle, or of any point on each of its boundaries. This provides richer position information about the text lines and makes it easier to calculate the relative positional relationship between two text lines later.
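A minimal sketch of how a parsed text line and its line position might be held in memory. The hypothetical `TextLine` class stores the four vertices of the line's minimum bounding rectangle, assuming page coordinates in which x grows to the right and y grows downward, as is common for raster and OCR output; the class and method names are illustrative, not part of the disclosure.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class TextLine:
    """A parsed text line plus the four vertices of its minimum bounding rectangle.

    Hypothetical structure: vertices are page coordinates with y growing
    downward, ordered top-left, top-right, bottom-right, bottom-left.
    """

    text: str
    top_left: Point
    top_right: Point
    bottom_right: Point
    bottom_left: Point

    @classmethod
    def from_box(cls, text: str, x0: float, y0: float, x1: float, y1: float) -> TextLine:
        """Build an axis-aligned line from its left, top, right and bottom coordinates."""
        return cls(text, (x0, y0), (x1, y0), (x1, y1), (x0, y1))

    def side_lengths(self) -> Tuple[float, float, float, float]:
        """Lengths of the top, right, bottom and left sides of the rectangle."""
        v = (self.top_left, self.top_right, self.bottom_right, self.bottom_left)
        top, right, bottom, left = (math.dist(v[i], v[(i + 1) % 4]) for i in range(4))
        return top, right, bottom, left
```

For instance, `TextLine.from_box("title", 0, 10, 100, 30).side_lengths()` gives `(100.0, 20.0, 100.0, 20.0)`: two long sides and two short sides, so the rectangle is not a square.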

In the embodiments of the present disclosure, the electronic device can acquire the layout document in various ways. Typical examples are described below, but the present disclosure is not limited thereto.

In some embodiments, when a user wants to parse original text contained in a layout document, the user may input an import operation on at least one layout document to the electronic device, and the electronic device may obtain the at least one layout document in response to the import operation on the at least one layout document.

Specifically, the import operation on the at least one layout document may include an editing operation on the import option performed by touch or with a mouse and keyboard, a voice control operation, or an expression control operation, which is not limited herein.

For example, FIG. 2 is a schematic diagram of an original text parsing interface of a layout document provided by an embodiment of the present disclosure. As shown in FIG. 2, when the user clicks the import option 211, the electronic device displays, in response, an import panel that includes at least one first-level folder option and/or at least one layout document option, together with a confirm control. When the user clicks one of the layout document options and triggers the confirm control, the electronic device acquires the layout document corresponding to that option. When the user double-clicks one of the first-level folder options, the electronic device displays at least one second-level folder option and/or at least one layout document option contained in the corresponding first-level folder. The user may then click a layout document option and trigger the confirm control, whereupon the electronic device acquires the corresponding layout document; or double-click a second-level folder option, whereupon the electronic device displays at least one third-level folder option and/or at least one layout document option contained in that second-level folder; and so on, until the user clicks a layout document option and triggers the confirm control, whereupon the electronic device acquires the layout document corresponding to that option.

In other embodiments, the electronic device may obtain a plurality of layout documents in response to an import operation on at least one layout document. For example, as shown in FIG. 2, when the user clicks the import option 211, the electronic device displays, in response, an import panel that includes at least one first-level folder option and/or at least one layout document option, together with a confirm control. When the user clicks several layout document options in succession and triggers the confirm control, the electronic device acquires the plurality of layout documents 200 corresponding to those options. When the user clicks at least one first-level folder option and triggers the confirm control, the electronic device acquires the plurality of layout documents 200 corresponding to the layout document options contained in each selected first-level folder.

When the user double-clicks one of the first-level folder options, the electronic device displays at least one second-level folder option and/or at least one layout document option contained in that folder. Likewise, the user may click several layout document options in succession and trigger the confirm control to acquire the corresponding layout documents 200, or double-click a second-level folder option to display the third-level folder options and/or layout document options it contains, and so on. Finally, either the user clicks several layout document options in succession and triggers the confirm control, and the electronic device acquires the corresponding layout documents 200; or the user clicks at least one Z-level folder option (Z being an integer greater than or equal to 2) and triggers the confirm control, and the electronic device acquires the plurality of layout documents 200 corresponding to all layout document options contained in the at least one selected Z-level folder.

In still other embodiments, a user may use another electronic device to send a document acquisition request carrying at least one layout document to the electronic device in the embodiments of the present disclosure, which thereby obtains the layout document whose original text is to be parsed.

In the embodiment of the disclosure, after the electronic device obtains the layout document, the electronic device may analyze an original text included in the layout document to obtain text data of the original text, where the text data includes a plurality of text lines of the original text and a line position of each text line in the layout document, so as to subsequently classify and associate the text lines.

In some embodiments, the electronic device may obtain a layout document in the manner described above. At this time, the electronic device may parse the original text contained in the layout document.

In other embodiments, the electronic device may obtain a plurality of layout documents in the above manner. At this time, the electronic device may parse the original text contained in the plurality of layout documents.

In some embodiments, the electronic device may further parse original text contained in the layout document corresponding to the layout document option in the selected state. The present disclosure is not limited thereto.

Specifically, the operation of selecting the layout document option may include operations such as clicking, double-clicking, long-pressing, voice control operation, expression control operation, and the like on any layout document option by a touch manner or by operating a mouse, which is not limited herein.

In the embodiments of the present disclosure, there are various specific implementations for parsing the original text contained in the layout document, and the following description is made with reference to typical examples, but the present disclosure is not limited thereto.

In some embodiments, the electronic device can automatically parse the original text contained in the layout document. Specifically, the electronic device may analyze the original text in the layout document using Optical Character Recognition (OCR) to obtain a plurality of text lines and the line position of each text line in the layout document, but is not limited thereto.

For example, as shown in FIG. 2, after the electronic device automatically parses the original text in the layout document, four text lines 201 to 204 and their line positions in the layout document can be obtained. The four text lines are: the pinyin annotation "wen fangfa"; "Text typesetting method, electronic device and storage medium"; a further annotation line; and "The text data of the original text is obtained by parsing the original text contained in the layout document."

In other embodiments, if some original text in the layout document is still unparsed after the electronic device parses it automatically, the user may manually frame-select the region of the layout document that contains the unparsed text, so that the electronic device parses the original text in that region again.

Optionally, after the text data of the original text is obtained, the text typesetting method further includes: displaying the text data in the second display area. In this case, parsing the original text contained in the layout document to obtain the text data of the original text includes: in response to a region selection operation on the original text in the layout document, performing text parsing on the selected original text region to obtain a plurality of text lines of the original text and the line position of each text line in the layout document.

Specifically, the second display area may be any display area within the original text parsing interface of the layout document displayed by the electronic device, for example, but not limited to, the second display area may be located in a main display area of the original text parsing interface of the layout document.

Specifically, the operation of selecting the area of the original text in the layout document may include a frame selection operation on any area in the layout document by a touch manner or by operating a mouse, which is not limited herein.

Step 102: divide, based on the line positions, the plurality of text lines into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated.

In the embodiments of the present disclosure, a non-annotation line can be understood as a text line whose content is not annotated, a line to be annotated as a text line whose content is annotated, and an annotation line as a text line containing the annotation content.

In the embodiments of the present disclosure, dividing the plurality of text lines, based on the line positions, into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated may include steps S1021 to S1022:

Step S1021: calculate the relative positional relationship between every two text lines according to the line positions.

In the embodiments of the present disclosure, after parsing the text lines of the original text in the layout document and the line position of each text line in the layout document, the electronic device can calculate the relative positional relationship between every two text lines, and then classify and associate the text lines according to that relationship.

In some embodiments, the relative positional relationship may include the line spacing between every two text lines, which can be calculated from the line positions. The line spacing is the distance between two text lines in the direction perpendicular to the typesetting direction, where the typesetting direction of a text line is the direction in which its characters are arranged, that is, the extending direction of the text line's minimum bounding rectangle.

Specifically, when the minimum bounding rectangle of a text line has two long sides and two short sides, that is, when it is a non-square rectangle, the two long sides may be referred to as the first long side and the second long side. When the angle α between the extending direction of the long sides and the horizontal direction satisfies -90° < α < 90°, the first long side may lie above the second long side in the layout document; in this case, the line spacing of two text lines is the distance between the second long side of the upper text line and the first long side of the lower text line in the layout document.

For example, when the long sides of the minimum bounding rectangle extend horizontally, the typesetting direction of the text line is horizontal. The two endpoints of the first long side are then the top-left and top-right vertices of the minimum bounding rectangle, and the two endpoints of the second long side are the bottom-left and bottom-right vertices. The line spacing between two text lines can then be calculated from the ordinate of the bottom-left (or bottom-right) vertex of the upper text line's minimum bounding rectangle and the ordinate of the top-left (or top-right) vertex of the lower text line's minimum bounding rectangle.
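For the horizontal case, the line spacing reduces to a difference of ordinates. The sketch below assumes y-downward page coordinates and a hypothetical `HorizontalLine` record holding only the top and bottom ordinates of each line's minimum bounding rectangle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HorizontalLine:
    """Hypothetical horizontal text line; y grows downward, so top < bottom."""

    text: str
    top: float     # ordinate of the top-left / top-right vertex
    bottom: float  # ordinate of the bottom-left / bottom-right vertex


def horizontal_line_spacing(a: HorizontalLine, b: HorizontalLine) -> float:
    """Line spacing between two horizontally typeset lines.

    Uses the bottom side of whichever line is higher on the page and the top
    side of the other; a negative value means the rectangles overlap vertically.
    """
    upper, lower = (a, b) if a.top <= b.top else (b, a)
    return lower.top - upper.bottom
```

Passing the two lines in either order gives the same result, since the function first decides which line is the upper one.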

For example, when the long sides of the minimum bounding rectangle extend obliquely, the typesetting direction of the text line is oblique. Its degree of inclination can be represented by the angle α between the extending direction of the long sides and the horizontal direction: when the typesetting direction slopes upward, 0° < α < 90°, and when it slopes downward, -90° < α < 0°. The endpoints of the first long side are again the top-left and top-right vertices of the minimum bounding rectangle, and the endpoints of the second long side are the bottom-left and bottom-right vertices. The line spacing between two text lines can then be calculated from the coordinates (abscissa and ordinate) of the bottom-left (or bottom-right) vertex of the upper text line's minimum bounding rectangle and the coordinates (abscissa and ordinate) of the top-left (or top-right) vertex of the lower text line's minimum bounding rectangle.
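For the oblique case, one way to turn the two vertices into a distance perpendicular to the typesetting direction is to project their difference onto the unit normal of the upper line's second long side. This projection is an assumption of this sketch rather than a formula stated in the disclosure, and it further assumes the two lines are parallel and that y grows downward.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def oblique_line_spacing(
    upper_bottom_left: Point,
    upper_bottom_right: Point,
    lower_top_left: Point,
) -> float:
    """Line spacing between two parallel, obliquely typeset text lines.

    The typesetting direction is taken from the upper line's second (bottom)
    long side. The spacing is the component, along the unit normal of that
    direction, of the vector from the upper line's bottom-left vertex to the
    lower line's top-left vertex. With y growing downward, the normal
    (-dy, dx) points toward the following line.
    """
    dx = upper_bottom_right[0] - upper_bottom_left[0]
    dy = upper_bottom_right[1] - upper_bottom_left[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("degenerate bounding rectangle")
    nx, ny = -dy / length, dx / length  # unit normal to the typesetting direction
    vx = lower_top_left[0] - upper_bottom_left[0]
    vy = lower_top_left[1] - upper_bottom_left[1]
    return vx * nx + vy * ny
```

With a horizontal upper line whose bottom side runs from (0, 8) to (100, 8) and a lower line whose top-left vertex is (0, 10), the function returns 2.0, matching the ordinate difference used in the horizontal case.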

Specifically, when the minimum bounding rectangle of the text lines is a square, the line spacing between two text lines is calculated in the same way as when it is a non-square rectangle, and details are not repeated here.

In other embodiments, the typesetting direction of each text line can be determined according to the line position, and then the line spacing between every two text lines is calculated according to the typesetting direction.

Specifically, determining the typesetting direction of each text line according to its line position may include: calculating the lengths of the four sides of the minimum bounding rectangle from the coordinate information of each of its boundaries in the target image; when the minimum bounding rectangle has two long sides and two short sides, taking the extending direction of the long sides as the typesetting direction of the corresponding text line; and when all four sides have the same length, determining the typesetting direction of the corresponding text line from the extending directions of two adjacent sides together with the typesetting directions of other text lines.
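The direction rule can be sketched as follows. The function returns the angle α used in the line spacing discussion, positive when the line rises to the right. The square-case tie-break (pick the adjacent side whose direction is closest to a reference direction taken from other text lines) is a hypothetical reading of the rule, and the vertex order and y-downward coordinates are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def typesetting_angle(
    vertices: Tuple[Point, Point, Point, Point],
    reference_angle: float = 0.0,
    rel_tol: float = 1e-6,
) -> float:
    """Angle alpha in degrees, in (-90, 90], between a line's typesetting
    direction and the horizontal, derived from its minimum bounding rectangle.

    vertices: corners ordered top-left, top-right, bottom-right, bottom-left,
    in page coordinates with y growing downward (assumption).
    reference_angle: typesetting direction of other text lines, consulted only
    when the rectangle is a square (hypothetical tie-break rule).
    """
    v0, v1, v2, _ = vertices
    edge_a = (v1[0] - v0[0], v1[1] - v0[1])  # first side
    edge_b = (v2[0] - v1[0], v2[1] - v1[1])  # adjacent (connected) side
    len_a, len_b = math.hypot(*edge_a), math.hypot(*edge_b)

    def angle_of(edge: Point) -> float:
        dx, dy = edge
        # Flip so the direction points rightward (or straight up): alpha in (-90, 90].
        if dx < 0 or (dx == 0 and dy > 0):
            dx, dy = -dx, -dy
        # Negate dy so that a line rising to the right gets a positive alpha.
        return math.degrees(math.atan2(-dy, dx))

    if math.isclose(len_a, len_b, rel_tol=rel_tol):
        # Square: choose the side whose direction is closest to the reference.
        def gap(edge: Point) -> float:
            d = abs(angle_of(edge) - reference_angle) % 180
            return min(d, 180 - d)

        return angle_of(min(edge_a, edge_b, key=gap))
    # Non-square rectangle: the long side gives the typesetting direction.
    return angle_of(edge_a if len_a > len_b else edge_b)
```

A rectangle from (0, 0) to (10, 2) gives 0.0 (horizontal), while a tall rectangle whose long sides are vertical gives 90.0, i.e. vertical typesetting.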

Step S1022: according to the relative positional relationship, divide the plurality of text lines into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated.

In the embodiments of the present disclosure, dividing the plurality of text lines, according to the relative positional relationship, into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated may include steps S102201 to S102203:

Step S102201: take two adjacent text lines whose line spacing is less than or equal to a preset distance threshold as a text line group with a top-bottom structure.

Optionally, the preset distance threshold is either of the following: the mean line spacing of the plurality of text lines, or the product of that mean line spacing and a preset ratio.

In some embodiments, the preset distance threshold may be the mean line spacing of a plurality of adjacent text lines. Specifically, the line spacings between a preset number of adjacent text lines may be selected, and the distance threshold may be determined as the mean of these line spacings. Those skilled in the art can set the specific value of the preset number according to actual conditions, which is not limited herein.

In other embodiments, the preset distance threshold may be the product of the mean line spacing of a plurality of adjacent text lines and a preset ratio. Specifically, the mean line spacing of the preset number of adjacent text lines is multiplied by the preset ratio, and the product is used as the distance threshold. The specific value of the preset ratio can be set by a person skilled in the art according to the actual situation and is not limited herein.

For example, N text lines are parsed from the original text, the line spacing between every two adjacent text lines is calculated, and the mean of these spacings is computed; either the mean itself or the product of the mean and a preset ratio is then used as the distance threshold.
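The threshold computation in this example can be sketched as below. It is a minimal sketch under stated assumptions: each text line is a hypothetical `(x0, y0, x1, y1)` box with y growing downward, lines are already sorted top to bottom, and line spacing is taken as the gap between the bottom of one line and the top of the next.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def line_spacings(boxes: Sequence[Box]) -> List[float]:
    """Gap between the bottom of each line and the top of the next line."""
    return [max(0.0, nxt[1] - cur[3]) for cur, nxt in zip(boxes, boxes[1:])]


def distance_threshold(boxes: Sequence[Box], ratio: float = 1.0,
                       preset_number: Optional[int] = None) -> float:
    """Mean line spacing (optionally over the first preset_number gaps) times ratio."""
    spacings = line_spacings(boxes)
    if preset_number is not None:
        spacings = spacings[:preset_number]
    if not spacings:
        raise ValueError("at least two text lines are needed")
    return float(mean(spacings)) * ratio
```

With `ratio=1.0` the mean itself is the threshold; a ratio below 1 makes grouping stricter.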

In the embodiment of the present disclosure, two adjacent text lines whose line spacing is smaller than or equal to the preset distance threshold may be used as a text line group having an upper and lower structure.

In some embodiments, for a given text line, an adjacent text line whose line spacing from it is less than or equal to the preset distance threshold is searched for. If such a line is found, the text line and that adjacent text line are taken as a text line group having a top-bottom structure; otherwise, they are not grouped. This search is performed line by line over the plurality of text lines of the original text until all text line groups having a top-bottom structure in the whole original text have been found.
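The line-by-line search can be sketched as a single top-to-bottom scan. This is a minimal sketch, assuming hypothetical `(x0, y0, x1, y1)` boxes sorted top to bottom and assuming that a line joins at most one text line group; the function name `find_groups` is illustrative only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), lines sorted top to bottom


def find_groups(boxes: Sequence[Box], threshold: float
                ) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Pair adjacent lines whose spacing is <= threshold.

    Returns (groups, ungrouped) where each group is (upper_index, lower_index).
    Assumption: a line joins at most one group, scanning from the top.
    """
    groups: List[Tuple[int, int]] = []
    i = 0
    while i < len(boxes) - 1:
        spacing = boxes[i + 1][1] - boxes[i][3]
        if spacing <= threshold:
            groups.append((i, i + 1))
            i += 2  # both lines are now taken
        else:
            i += 1
    grouped = {k for pair in groups for k in pair}
    ungrouped = [k for k in range(len(boxes)) if k not in grouped]
    return groups, ungrouped
```

The `ungrouped` indices are the candidates for non-annotated lines.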

Step S102202, determining, according to a preset annotation structure, the line to be annotated and the annotation line corresponding to it within the text line group.

In some embodiments of the present disclosure, the preset annotation structure may be one in which the line to be annotated is located at the lower part of the text line group and the corresponding annotation line is located at the upper part. As shown in fig. 2, text lines 204 and 201 form a text line group having an upper and lower structure: text line 204 is the annotation line, located at the upper part of the group, and text line 201 is the line to be annotated, located at the lower part. The "wen" in the upper annotation line gives the pinyin for "text" in the lower line to be annotated.

In other embodiments, the preset annotation structure may be one in which the line to be annotated is located at the upper part of the text line group and the corresponding annotation line is located at the lower part. For example, the line to be annotated, whose text content is "Happy new year", is located at the upper part of a text line group having an upper and lower structure, and the annotation line, whose text content gives the Chinese meaning of "Happy new year", is located at the lower part of the group.

Step S102203, taking the text lines other than those in text line groups as the non-annotated lines.

In the embodiment of the present disclosure, one text line group includes one line to be annotated and the annotation line corresponding to it; accordingly, any of the plurality of text lines that is not assigned to a text line group may be regarded as a non-annotated line.
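Steps S102202 and S102203 can be sketched as a role assignment over already-found groups. This is a minimal sketch: groups are assumed to be hypothetical `(upper_index, lower_index)` pairs, and the `annotation_on_top` flag stands in for the choice between the two preset annotation structures described above.

```python
from typing import Dict, List, Sequence, Tuple


def assign_roles(groups: Sequence[Tuple[int, int]], n_lines: int,
                 annotation_on_top: bool = True
                 ) -> Tuple[Dict[int, str], List[Tuple[int, int]]]:
    """Label every line and return (roles, pairs).

    groups holds (upper_index, lower_index) pairs; pairs holds
    (line_to_annotate, annotation_line). annotation_on_top selects the preset
    structure: True for pinyin above the text (as in fig. 2), False for an
    annotation placed below the line it annotates.
    """
    roles = {i: "non-annotated" for i in range(n_lines)}
    pairs = []
    for upper, lower in groups:
        target, note = (lower, upper) if annotation_on_top else (upper, lower)
        roles[target] = "to be annotated"
        roles[note] = "annotation"
        pairs.append((target, note))
    return roles, pairs
```

Every line starts as non-annotated, so lines outside any group keep that role without a separate pass.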

In other embodiments of the present disclosure, dividing the plurality of text lines into non-annotated lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated according to the relative position relationship may instead include steps S102204 to S102207:

Step S102204, taking two adjacent text lines whose line spacing is less than or equal to a preset distance threshold as a text line group having an upper and lower structure.

In the embodiment of the present disclosure, reference may be specifically made to the above steps S102201 to S102203, which are not described herein again.

Step S102205, detecting the text length of each text line in the text line group.

In the embodiment of the present disclosure, the text length of a text line may be understood as the distance from the left boundary to the right boundary of the minimum bounding rectangle of the text line. For example, it may be measured from the abscissa of the lower-left (or upper-left) vertex of the minimum bounding rectangle to the abscissa of the lower-right (or upper-right) vertex, or from the abscissa of the lower-left (or upper-left) vertex to the abscissa of the upper-right (or lower-right) vertex.

In the embodiment of the present disclosure, after a text line group having a top-bottom structure is obtained, a text length of each text line in the text line group is detected.

Step S102206, determining, according to the text lengths, the line to be annotated and the annotation line corresponding to it within the text line group.

In the embodiment of the present disclosure, according to the text length of each text line, a text line with a longer text length in the text line group may be determined as a line to be annotated, and a text line with a shorter text length in the text line group may be determined as an annotation line.
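The length rule of steps S102205 and S102206 can be sketched as follows. This is a minimal sketch assuming hypothetical `(x0, y0, x1, y1)` boxes; the source does not say what happens when both lines have equal length, so this sketch assumes the lower line is the line to be annotated in that case.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def text_length(box: Box) -> float:
    """Distance from the left boundary to the right boundary of the box."""
    return abs(box[2] - box[0])


def split_by_length(group: Tuple[int, int], boxes: Sequence[Box]) -> Tuple[int, int]:
    """Return (line_to_annotate, annotation_line) for one text line group."""
    upper, lower = group
    if text_length(boxes[upper]) > text_length(boxes[lower]):
        return upper, lower
    # Lower line is longer. Equal lengths are not covered by the text; this
    # sketch assumes the lower line is annotated in that case.
    return lower, upper
```

For a short pinyin line above a long text line, as in fig. 2, the lower line is returned as the line to be annotated.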

Step S102207, taking the text lines other than those in text line groups as the non-annotated lines.

In some embodiments of the present disclosure, before the plurality of text lines are divided into non-annotated lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated according to the line positions, the text lines of the original text contained in the layout document may first be divided into a plurality of paragraphs. Then, for each paragraph, the text lines in that paragraph are divided into non-annotated lines, lines to be annotated, and corresponding annotation lines using the method of steps S1021 to S1022 described above.

Specifically, the manner of dividing the text lines into paragraphs may include the following three ways:

In some embodiments, the plurality of text lines may be divided into a plurality of paragraphs according to line position. The line spacing between each pair of adjacent text lines is calculated from the line positions, and adjacent text lines whose line spacing is less than or equal to a preset threshold are aggregated into one set, giving a plurality of text line sets; the text lines in each set are then merged according to their line positions to obtain the text paragraph corresponding to that set. The specific value of the preset threshold may be set by a person skilled in the art according to the actual situation and is not limited herein. In this way, the text lines of the original text are merged line by line and divided into a plurality of paragraphs. It should be understood that this is merely an illustrative, not exclusive, description of how text lines are divided into paragraphs based on line position.
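The spacing-based aggregation can be sketched as one pass over the lines. This is a minimal sketch under assumptions: each line is a hypothetical `(text, (x0, y0, x1, y1))` pair sorted top to bottom with y growing downward, and the `sep` parameter (empty for Chinese text, a space for space-separated languages) is an illustrative addition.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def merge_into_paragraphs(lines: Sequence[Tuple[str, Box]], threshold: float,
                          sep: str = "") -> List[str]:
    """Aggregate adjacent lines (sorted top to bottom) into paragraphs."""
    paragraphs: List[str] = []
    current: List[str] = []
    prev_box = None
    for text, box in lines:
        if prev_box is not None and box[1] - prev_box[3] > threshold:
            paragraphs.append(sep.join(current))  # gap too large: close paragraph
            current = []
        current.append(text)
        prev_box = box
    if current:
        paragraphs.append(sep.join(current))
    return paragraphs
```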

In some embodiments, the plurality of text lines may be divided into a plurality of paragraphs according to the text features of the respective text lines. The text features may include at least one of margin, font, font size, and color. Here, the margin refers to the distance between the minimum bounding rectangle of the text line and the left or right edge of the layout document, for example, the distance between the upper-left (or lower-left) vertex of the minimum bounding rectangle of the text line and the left edge of the layout document.

Specifically, when the text features include several of margin, font, font size, and color, text lines that share all of those elements may be placed in the same text line group. For example, when the text features include font, font size, and margin, text lines having the same font, font size, and margin may be placed in the same text line group. Other combinations of at least one of margin, font, font size, and color are handled in the same way and are not described herein again.

In some embodiments, the plurality of text lines may be divided into a plurality of paragraphs according to both line position and the text features of the respective text lines. Specifically, the text lines may first be divided into a plurality of text line groups according to whether their text features are the same; then, within each text line group, the line spacing between adjacent text lines is calculated and the text lines are aggregated, dividing them into paragraphs. This improves the efficiency of merging text lines into paragraphs.
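The two-stage approach (group by text features, then aggregate by spacing inside each group) can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the `TextLine` record is hypothetical, all four features are used as the grouping key, and the margin is approximated by the left x coordinate on the assumption that the page edge is at x = 0.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class TextLine:
    text: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
    font: str
    size: float
    color: str


def paragraphs_by_features(lines: Sequence[TextLine], threshold: float) -> List[List[str]]:
    # 1) Bucket lines whose margin, font, font size and color all match.
    #    Margin is taken as the left x coordinate (assumes the page edge is x = 0).
    buckets: Dict[tuple, List[TextLine]] = {}
    for ln in lines:
        key = (round(ln.box[0]), ln.font, ln.size, ln.color)
        buckets.setdefault(key, []).append(ln)
    # 2) Inside each bucket, merge vertically adjacent lines by line spacing.
    paragraphs: List[List[str]] = []
    for bucket in buckets.values():
        bucket.sort(key=lambda item: item.box[1])
        current = [bucket[0]]
        for prev, ln in zip(bucket, bucket[1:]):
            if ln.box[1] - prev.box[3] <= threshold:
                current.append(ln)
            else:
                paragraphs.append([item.text for item in current])
                current = [ln]
        paragraphs.append([item.text for item in current])
    return paragraphs
```

Spacing is only compared between lines that already share the same features, so a heading that sits close to body text is not merged into it.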

It should be understood that this is merely an illustrative, not exclusive, description of how text lines are divided into paragraphs based on line positions and the text features of the individual text lines.

Step 103, determining a character string to be annotated in the line to be annotated and determining an annotation character string corresponding to the character string to be annotated in the annotation line corresponding to the line to be annotated.

In some embodiments of the present disclosure, the line to be annotated includes at least one first character string, the annotation line corresponding to the line to be annotated includes at least one second character string, and the text data further includes a character string position of each first character string and a character string position of each second character string.

The position of the character string may include, but is not limited to, coordinate information of each boundary of the minimum bounding rectangle of the character string in the original text. The coordinate information of each boundary of the minimum bounding rectangle in the original text may include coordinates of four vertices of the minimum bounding rectangle, and may also include coordinate information of any point on each boundary of the minimum bounding rectangle in the original text.

Specifically, determining a character string to be annotated in the line to be annotated and determining an annotation character string corresponding to the character string to be annotated in the annotation line corresponding to the line to be annotated may include steps S1031 to S1033:

Step S1031, for each second character string, searching the at least one first character string for a first target character string corresponding to the second character string, according to the character string position of the second character string and the character string position of each first character string, where the central axes of the first target character string and the second character string intersect.

Step S1032, the searched first target character string is used as the character string to be annotated.

Step S1033, the found second character string corresponding to the first target character string is used as an annotation character string corresponding to the character string to be annotated.

In the embodiment of the present disclosure, the central axis of a character string may be understood as the straight line that divides the minimum bounding rectangle of the character string into two rectangles of equal size along the text layout direction.

In the embodiment of the disclosure, the electronic device first determines the positions and central axes of the first character strings and the second character strings. Then, for each second character string, it searches the at least one first character string for the first target character string whose central axis intersects that of the second character string, according to the character string position of the second character string and the character string position of each first character string.
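The central-axis matching of steps S1031 to S1033 can be sketched for horizontally typeset text. The source does not spell out the geometry of "intersecting central axes"; this sketch reads it as: the vertical central axis of a second character string, extended, passes through the minimum bounding rectangle of a first character string. Boxes are hypothetical `(x0, y0, x1, y1)` tuples.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Placed = Tuple[str, Box]                 # (character string, minimum bounding rectangle)


def axis_x(box: Box) -> float:
    """Vertical central axis of a horizontally typeset string."""
    return (box[0] + box[2]) / 2


def match_by_central_axis(first: Sequence[Placed], second: Sequence[Placed]
                          ) -> List[Tuple[str, str]]:
    """Return (string_to_annotate, annotation_string) pairs."""
    pairs = []
    for note, note_box in second:
        x = axis_x(note_box)
        # Interpretation: the annotation's axis must cross the base string's box.
        hits = [(text, box) for text, box in first if box[0] <= x <= box[2]]
        if hits:
            # If the axis sits on a shared edge, prefer the closest centre.
            text, _ = min(hits, key=lambda tb: abs(axis_x(tb[1]) - x))
            pairs.append((text, note))
    return pairs
```

Each returned pair gives a character string to be annotated (S1032) and its annotation character string (S1033).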

In an embodiment of the present disclosure, fig. 2 is a schematic diagram of an original text parsing interface of a layout document. As shown in fig. 2, 200 is the text data of the original text; text line 201 is the line to be annotated, and the character strings in it are first character strings; text line 204 is the annotation line, and the character strings in it are second character strings. 205 "text" and 206 "method" are first target character strings. 209 is the central axis along which the first target character string 205 "text" and the second character string 207 "wen" intersect; that is, 205 "text" is used as a character string to be annotated, and 207 "wen" is used as the annotation character string corresponding to it.

Similarly, 210 is the central axis along which the first target character string 206 "method" and the second character string 208 "fangfa" intersect; that is, 206 "method" is used as a character string to be annotated, and 208 "fangfa" is used as the annotation character string corresponding to it.

In other embodiments of the present disclosure, the line to be annotated includes at least one first character string, the annotation line corresponding to the line to be annotated includes at least one second character string, and the text data further includes an editing order of the at least one first character string and the at least one second character string.

The editing order can be understood as the original arrangement order of the character strings in the text, and it can be obtained by parsing the original text contained in the layout document.

Specifically, determining a character string to be annotated in the line to be annotated and determining an annotation character string corresponding to the character string to be annotated in the annotation line corresponding to the line to be annotated may include steps S1034 to S1036:

Step S1034, for each second character string, searching the at least one first character string, according to the editing order, for a second target character string corresponding to the second character string, where the second target character string is the first character string that immediately precedes the second character string in the editing order.

Step S1035, taking the found second target character string as a character string to be annotated.

Step S1036, taking the second character string corresponding to the found second target character string as the annotation character string corresponding to the character string to be annotated.
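The editing-order rule of steps S1034 to S1036 can be sketched as a scan over adjacent items. This is a minimal sketch assuming the parsed editing order is available as a hypothetical list of `(string, kind)` items, with `kind` set to `"first"` for strings of the line to be annotated and `"second"` for strings of the annotation line.

```python
from typing import List, Sequence, Tuple


def match_by_edit_order(sequence: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """sequence: (string, kind) items in editing order, kind 'first' or 'second'.

    Each second string is paired with the first string immediately before it.
    """
    pairs = []
    for (prev_text, prev_kind), (text, kind) in zip(sequence, sequence[1:]):
        if kind == "second" and prev_kind == "first":
            pairs.append((prev_text, text))
    return pairs
```

Because only the immediately preceding item is checked, a second string that directly follows another second string is left unmatched, which follows the "adjacent" wording literally.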

In one embodiment, as shown in fig. 2, the editing order of the text in text lines 204 and 201 is "A text wen typesetting method fangfa, electronic equipment and storage medium". For the second character string "wen", the first character strings "A text typesetting method, electronic equipment and storage medium" are searched according to the editing order for the corresponding second target character string, which is the first character string "text" that immediately precedes "wen" in the editing order.

Step 104, generating a target streaming document corresponding to the layout document based on the non-annotated lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated in the lines to be annotated.

In the embodiment of the disclosure, the electronic device generates the target streaming document corresponding to the layout document from the non-annotated lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated, all taken from the text data parsed from the layout document, thereby converting the layout document into a streaming document for display.

In the embodiment of the present disclosure, generating the target streaming document corresponding to the layout document based on the non-annotated lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated in the lines to be annotated may include steps S1041 to S1042:

Step S1041, generating an initial streaming document corresponding to the layout document based on basic typesetting labels, the non-annotated lines, and the lines to be annotated.

In the embodiment of the present disclosure, generating the initial streaming document corresponding to the layout document based on the basic typesetting labels, the non-annotated lines, and the lines to be annotated may include steps S104101 to S104102:

Step S104101, performing basic typesetting on the non-annotated lines and the lines to be annotated according to the line positions to obtain a basic text sequence, where the basic text sequence includes at least one basic text segment.

Specifically, the basic text sequence includes at least one basic text segment, one text line may form one basic text segment, and a plurality of text lines may also form one basic text segment.

In this way, basic typesetting is performed on the non-annotated lines and the lines to be annotated of the whole original text, and a plurality of basic text sequences composed of basic text segments can be obtained.

Step S104102, adding a basic typesetting label to each basic text segment.

The basic typesetting label in the embodiment of the present disclosure may be understood as a label representing the basic typesetting style of a text line, and may include labels such as a paragraph label, a font label, and a font color label.

And adding a basic typesetting label to each basic text segment in the whole original text to generate an initial streaming document of the original text.

For example, the addition of a basic typesetting label is illustrated using the paragraph tag, i.e. the p tag, of HyperText Markup Language (HTML). As shown in fig. 2, for the line to be annotated 201 and the annotation line 204, the initial streaming document generated based on the basic typesetting labels, the non-annotated lines, and the lines to be annotated may be represented as:

"A text wen typesetting method fangfa, electronic equipment and storage medium. The original text contained in the layout document is parsed to obtain the text data of the original text."
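Wrapping each basic text segment in a paragraph tag can be sketched as below. This is a minimal sketch: the function name is hypothetical, only the p tag is shown (font and font color labels are omitted), and the segment text is HTML-escaped with Python's standard `html.escape` so that characters such as `&` do not break the markup.

```python
import html
from typing import Sequence


def initial_streaming_document(segments: Sequence[str]) -> str:
    """Wrap each basic text segment in an HTML paragraph (p) tag."""
    return "\n".join(f"<p>{html.escape(seg)}</p>" for seg in segments)
```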

Step S1042, correcting the initial streaming document based on the upper and lower structure typesetting labels, the character strings to be annotated in the lines to be annotated, and the annotation character strings corresponding to them, to obtain the target streaming document.

The upper and lower structure typesetting labels in the embodiment of the disclosure may include the ruby tag of HyperText Markup Language (HTML), which serves as an annotation tag and may contain rp and rt tags: <ruby> marks up the annotated content, <rp> specifies what browsers that do not support the ruby element should display around the annotation (typically parentheses), and <rt> marks up the annotation text of the ruby element.
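A ruby element of the kind described above can be built with a small helper. This is a minimal sketch with a hypothetical function name; the optional `<rp>` parentheses follow the fallback behaviour described in the paragraph above, and both strings are escaped with `html.escape`.

```python
import html


def ruby(base: str, annotation: str, with_rp: bool = True) -> str:
    """Build a ruby element annotating base with annotation."""
    b, a = html.escape(base), html.escape(annotation)
    if with_rp:
        # <rp> content is shown only by browsers without ruby support,
        # so they render "base(annotation)".
        return f"<ruby>{b}<rp>(</rp><rt>{a}</rt><rp>)</rp></ruby>"
    return f"<ruby>{b}<rt>{a}</rt></ruby>"
```

Browsers place `<rt>` text above horizontal text by default. For the structure where the annotation line sits below the line to be annotated, one possible option is the CSS `ruby-position: under` property; the source does not state which approach it uses.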

In the embodiment of the present disclosure, the correcting the initial streaming document based on the upper and lower structure typesetting labels, the character strings to be annotated in the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated to obtain the target streaming document may include steps S104201 to S104202:

step S104201, according to the annotation structure corresponding to the line to be annotated, generating a first annotation label structure corresponding to the character string to be annotated based on the upper and lower structure typesetting labels, the character string to be annotated in the line to be annotated, and the annotation character string corresponding to the character string to be annotated.

In the embodiment of the disclosure, according to an annotation structure corresponding to a line to be annotated, the arrangement positions of the upper and lower structure type-setting labels, the character string to be annotated in the line to be annotated, and the annotation character string corresponding to the character string to be annotated are determined, and then the upper and lower structure type-setting labels, the character string to be annotated in the line to be annotated, and the annotation character string corresponding to the character string to be annotated are placed at the determined arrangement positions, so as to generate a first annotation label structure corresponding to the character string to be annotated.

In some embodiments, if, in the annotation structure corresponding to the line to be annotated, the line to be annotated is located at the lower part of the text line group and its annotation line at the upper part, as with the line to be annotated 201 and the annotation line 204 in fig. 2, the generated first annotation label structures corresponding to the character strings to be annotated may be:

"<ruby>text<rt>wen</rt></ruby>";

"<ruby>method<rt>fangfa</rt></ruby>".

If, in the annotation structure corresponding to the line to be annotated, the line to be annotated is located at the upper part of the text line group and its annotation line at the lower part (for example, the line to be annotated, with text content "Happy new year", is at the upper part of a text line group having an upper and lower structure, and the annotation line, whose text gives the Chinese meaning of "Happy new year", is at the lower part), the generated first annotation label structure corresponding to the character string to be annotated may be:

"<ruby><rt>Happy new year</rt>".

In other embodiments, the annotation structure is determined according to text length: within the text line group, the text line with the longer text length is determined as the line to be annotated, and the text line with the shorter text length as the annotation line. For example, as shown in fig. 2, text lines 201 and 204 form a text line group having a top-bottom structure; text line 201 is determined to be the line to be annotated because its text length is longer, and text line 204 is determined to be the annotation line because its text length is shorter. The generated first annotation label structures corresponding to the character strings to be annotated may then be:

"<ruby>text<rt>wen</rt></ruby>";

"<ruby>method<rt>fangfa</rt></ruby>".

Step S104202, in the initial streaming document, replacing the character string to be annotated in the line to be annotated with the first annotation tag structure, so as to obtain a target streaming document.
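The replacement in step S104202 can be sketched on the plain text of one line to be annotated. This is a minimal sketch under assumptions: the replacement is done before the line is wrapped in its paragraph tag (the source performs it on the initial streaming document), and each character string to be annotated is located by a left-to-right text search rather than by its recorded character string position, which is a simplification. The function name is hypothetical.

```python
import html
from typing import Sequence, Tuple


def annotate_line(line_text: str, pairs: Sequence[Tuple[str, str]]) -> str:
    """Replace each string to be annotated (left to right) with a ruby structure."""
    out = []
    pos = 0
    for base, note in pairs:
        idx = line_text.find(base, pos)
        if idx < 0:
            continue  # not found after the previous match: leave it unannotated
        out.append(html.escape(line_text[pos:idx]))
        out.append(f"<ruby>{html.escape(base)}<rt>{html.escape(note)}</rt></ruby>")
        pos = idx + len(base)
    out.append(html.escape(line_text[pos:]))
    return "".join(out)
```

Searching from the end of the previous match keeps an earlier duplicate of a later string from being annotated by mistake.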

In the embodiment of the present disclosure, the initial streaming document is corrected based on the top-bottom structure typesetting labels, the character strings to be annotated in the line to be annotated, and the corresponding annotation character strings to obtain the target streaming document. For example, as shown in fig. 2, the line to be annotated 201 and the annotation line 204 form a text line group having a top-bottom structure; the annotation character string corresponding to the character string to be annotated 205 "text" is 207 "wen", and the annotation character string corresponding to the character string to be annotated 206 "method" is 208 "fangfa". After the correction, the target streaming document may be represented as:

"A <ruby>text<rt>wen</rt></ruby> typesetting <ruby>method<rt>fangfa</rt></ruby>, electronic equipment and storage medium. The original text contained in the layout document is parsed to obtain the text data of the original text."

Based on the embodiment of the disclosure, the original text contained in the layout document is parsed to obtain its text data, which includes a plurality of text lines of the original text and the line position of each text line in the layout document; the plurality of text lines are divided, based on the line positions, into non-annotated lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated; the character strings to be annotated in each line to be annotated, and the annotation character strings corresponding to them in the corresponding annotation line, are determined; and a target streaming document corresponding to the layout document is generated based on the non-annotated lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated. This effectively avoids disordered layout in streaming documents converted from layout documents that contain annotation content, and improves the user's electronic reading experience.

Fig. 3 is a flowchart of another text typesetting method provided in the embodiment of the present disclosure, and as shown in fig. 3, the method provided in the embodiment includes the following steps:

Step 301, parsing the original text contained in the layout document to obtain the text data of the original text, where the text data includes a plurality of text lines of the original text and the line position of each text line in the layout document.

Step 302, dividing the plurality of text lines into non-annotated lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated based on the line positions.

Step 303, determining a character string to be annotated in the line to be annotated and determining an annotation character string corresponding to the character string to be annotated in the annotation line corresponding to the line to be annotated.

Step 304, generating a target streaming document corresponding to the layout document based on the non-annotated lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated in the lines to be annotated.

Step 301 to step 304 are similar to step 101 to step 104 in the embodiment shown in fig. 1, and are not described herein again.

Step 305, displaying the layout document and the target streaming document.

In some embodiments of the present disclosure, one part of the display interface of the electronic device displays the layout document and another part displays the target streaming document. Specifically, the electronic device may display the layout document on the left or upper side of the display interface and the target streaming document on the right or lower side, but is not limited thereto. For example, fig. 4A is a schematic diagram of a document display interface, in which 400 is the layout document interface displayed on the upper portion of the display interface of the electronic device, and 408 is the target streaming document interface displayed on the lower portion.

Step 306, in response to an annotation typesetting triggering operation on a third character string and a fourth character string in the layout document, determining a third target character string and a fourth target character string among the third character string and the fourth character string according to the annotation structure corresponding to the line to be annotated.

In the embodiment of the present disclosure, if a text line group having a top-bottom structure is not displayed as a top-bottom structure in the displayed target streaming document, but the line to be annotated and the annotation line are displayed as two separate text lines or in some other form, the user may trigger annotation typesetting of the third character string and the fourth character string in the layout document, for example, by clicking the third character string and the fourth character string in turn, or by box-selecting them.

In this embodiment of the disclosure, the triggering operation may include a box selection of any area of character strings in the layout document and a click on a related button, performed by touch or with a mouse, which is not limited herein.

In response to an annotation typesetting triggering operation on a third character string and a fourth character string in the layout document, the electronic device determines the third target character string and the fourth target character string among them according to the annotation structure corresponding to the line to be annotated, which may include steps 306101 to 306103:

Step 306101, in response to the annotation typesetting triggering operation on the third character string and the fourth character string in the layout document, determining the first text line to which the third character string belongs and the second text line to which the fourth character string belongs.

In the embodiment of the disclosure, in response to an annotation typesetting triggering operation on a third character string and a fourth character string in a layout document, the electronic device determines the range of the triggering operation and, based on the positions of the character strings, determines the character strings that fall within that range, namely the third character string and the fourth character string. Since the line position of a character string in the layout document is the same as the line position of the text line to which it belongs, the first text line to which the third character string belongs and the second text line to which the fourth character string belongs can be determined from the line positions of the third and fourth character strings in the layout document.
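Determining which character strings fall within the range of the triggering operation, and which text lines they belong to, can be sketched as a rectangle-overlap test. This is a minimal sketch with hypothetical names: each string carries its `(x0, y0, x1, y1)` box and the index of its text line, a box selection is a rectangle, and a click is treated as a zero-size rectangle.

```python
from typing import List, NamedTuple, Sequence, Tuple


class PlacedString(NamedTuple):
    text: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1)
    line: int  # index of the text line the string belongs to


def hit_test(strings: Sequence[PlacedString],
             selection: Tuple[float, float, float, float]) -> List[PlacedString]:
    """Strings whose box overlaps the selection rectangle (a click is a zero-size box)."""
    sx0, sy0, sx1, sy1 = selection
    return [s for s in strings
            if s.box[0] <= sx1 and sx0 <= s.box[2]
            and s.box[1] <= sy1 and sy0 <= s.box[3]]


def lines_of(hits: Sequence[PlacedString]) -> List[int]:
    """Text lines that the selected strings belong to."""
    return sorted({s.line for s in hits})
```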

Step 306102, determining, according to the annotation structure corresponding to the line to be annotated, which of the first text line to which the third character string belongs and the second text line to which the fourth character string belongs is the line to be annotated and which is the annotation line.

In some embodiments of the present disclosure, the annotation structure corresponding to the line to be annotated may be a preset annotation structure. In some embodiments, the preset annotation structure is one in which the line to be annotated is located at the lower part of the text line group and its annotation line at the upper part; for example, if the first text line to which the third character string belongs is located at the lower part of the text line group and the second text line to which the fourth character string belongs is located at the upper part, the first text line is the line to be annotated and the second text line is the annotation line. In other embodiments, the preset annotation structure is one in which the line to be annotated is located at the upper part of the text line group and its annotation line at the lower part; in that case, if the first text line to which the third character string belongs is located at the lower part of the text line group and the second text line to which the fourth character string belongs is located at the upper part, the second text line is the line to be annotated and the first text line is the annotation line.

In other embodiments, the line to be annotated and its corresponding annotation line may be determined within the text line group according to the text lengths of the text lines. The text line with the longer text length is determined as the line to be annotated, and the text line with the shorter text length is determined as the annotation line. For example, if the first text line, to which the third character string belongs, is longer and the second text line, to which the fourth character string belongs, is shorter, the first text line is determined as the line to be annotated and the second text line as the annotation line.
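The length-based alternative can be sketched in the same hedged way. Here the text length is assumed to be the line's extent along the typesetting direction. The text above does not address ties, so the sketch returns `None` in that case, which would let a caller fall back to the position-based rule. Both choices, and the names used, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MeasuredLine:
    line_id: int
    left: float   # start of the line along the (assumed horizontal) typesetting direction
    right: float  # end of the line along the same direction

    @property
    def length(self) -> float:
        # Assumption: "text length" is read as the line's extent on the page;
        # a character count would be another possible reading.
        return self.right - self.left


def assign_roles_by_length(a: MeasuredLine,
                           b: MeasuredLine) -> Optional[Tuple[MeasuredLine, MeasuredLine]]:
    """Return (line_to_be_annotated, annotation_line): the longer line is annotated."""
    if a.length == b.length:
        return None  # hypothetical tie rule: leave the decision to another criterion
    return (a, b) if a.length > b.length else (b, a)


pair = assign_roles_by_length(MeasuredLine(401, 10, 40), MeasuredLine(402, 0, 300))
print(pair[0].line_id, pair[1].line_id)  # 402 401
```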

Step 306103: determine a third target character string and a fourth target character string from the third character string and the fourth character string.

In the embodiment of the disclosure, the line to be annotated and the annotation line are first determined from the first text line, to which the third character string belongs, and the second text line, to which the fourth character string belongs. The selected character string in the line to be annotated may then be determined as the character string to be annotated, and the selected character string in the annotation line as the annotation character string. These are taken as the third target character string and the fourth target character string, respectively.

For example, fig. 4A is a schematic diagram of a document display interface. As shown in fig. 4A:

- 400 is the interface of the layout document displayed by the electronic device.
- 408 is the interface of the target streaming document displayed by the electronic device.
- 401 to 404 are separate text lines.
- 405 "text" is the third character string, and 406 "wen" is the fourth character string.
- 407 "typeset up and down" is a button that triggers upper and lower structure typesetting.

The user clicks 405 "text" and 406 "wen", or draws a selection box around them (as shown in fig. 4B), and then clicks the 407 "typeset up and down" button. From the range of the triggering operation, the electronic device determines that the first text line, to which 405 "text" belongs, is text line 402, and that the second text line, to which 406 "wen" belongs, is text line 401. Suppose the annotation structure places the line to be annotated in the lower part of the text line group and its annotation line in the upper part. Text line 402 is then the line to be annotated, and text line 401 is the annotation line. The third character string 405 "text" in the line to be annotated is determined as the third target character string, and the fourth character string 406 "wen" in the annotation line is determined as the fourth target character string.
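Once the two roles are known, step 306103 reduces to sorting the selected strings by the line they belong to. For simplicity, the following sketch assumes that exactly one selected character string lies in each line. The function name `split_targets` is hypothetical.

```python
from typing import List, Tuple


def split_targets(selected: List[Tuple[str, int]], line_to_be_annotated: int,
                  annotation_line: int) -> Tuple[str, str]:
    """Return (third_target, fourth_target) from (string, line_id) pairs.

    Simplifying assumption: exactly one selected string lies in each of the two lines.
    """
    base = [text for text, line_id in selected if line_id == line_to_be_annotated]
    notes = [text for text, line_id in selected if line_id == annotation_line]
    if len(base) != 1 or len(notes) != 1:
        raise ValueError("expected exactly one selected string per line")
    return base[0], notes[0]


# Fig. 4A: "text" lies in line 402 (to be annotated), "wen" in line 401 (annotation line).
print(split_targets([("wen", 401), ("text", 402)], line_to_be_annotated=402, annotation_line=401))
# ('text', 'wen')
```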

Step 307: generate a second annotation tag structure corresponding to the third target character string. The structure is built according to the annotation structure corresponding to the line to be annotated, from the upper and lower structure typesetting label, the third target character string and the fourth target character string.

In the embodiment of the disclosure, the electronic device generates the second annotation tag structure for use in the target streaming document. It builds this structure according to the annotation structure corresponding to the line to be annotated, from the upper and lower structure typesetting label, the third target character string and the fourth target character string.

For example, the target streaming document initially generated for the layout document 400 in fig. 4A is:

“<p>wen</p>

<p>a text typesetting <ruby>method<rt>fangfa</rt></ruby>, an electronic device and a storage medium. The original text contained in the layout document is parsed to obtain the text data of the original text.</p>”

The second annotation tag structure corresponding to the third target character string "text" in the target streaming document may be expressed as: <ruby>text<rt>wen</rt></ruby>.
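The second annotation tag structure has the shape of an HTML `<ruby>` element with an `<rt>` child. A minimal sketch of building it in Python follows. The function name is hypothetical. Using the CSS `ruby-position: under` declaration for an annotation placed below the base text is an assumption not stated in the disclosure.

```python
from html import escape


def ruby_tag(base: str, annotation: str, annotation_above: bool = True) -> str:
    """Build an HTML <ruby> element pairing base text with its annotation."""
    # Assumption: an annotation that belongs below the base text is expressed
    # with the CSS ruby-position property; the example above shows only the default form.
    style = "" if annotation_above else ' style="ruby-position: under"'
    # Escape both parts so characters such as "<" or "&" cannot break the markup.
    return f"<ruby{style}>{escape(base)}<rt>{escape(annotation)}</rt></ruby>"


print(ruby_tag("text", "wen"))  # <ruby>text<rt>wen</rt></ruby>
```

In the fig. 4A example the annotation sits above the base text, so the default call applies.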

Step 308: in the target streaming document, replace the third target character string with the second annotation tag structure, and delete the fourth target character string from the text line to which it belongs.

For example, the third target character string "text" from step 307 is replaced with the second annotation tag structure "<ruby>text<rt>wen</rt></ruby>". The fourth target character string "wen" is deleted from the text line to which it belongs. The target streaming document may then be displayed as:

“<p></p>

<p>a <ruby>text<rt>wen</rt></ruby> typesetting <ruby>method<rt>fangfa</rt></ruby>, an electronic device and a storage medium. The original text contained in the layout document is parsed to obtain the text data of the original text.</p>”
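Step 308 can be sketched on a simplified model of the target streaming document in which each text line is a string of already-serialized markup. Positions are passed as character offsets rather than searched by value, because a word such as "text" may occur several times in the same line. The function name and the offset-based interface are hypothetical.

```python
from typing import List


def apply_manual_annotation(lines: List[str],
                            base_line: int, base_start: int, base_text: str,
                            note_line: int, note_start: int, note_text: str) -> List[str]:
    """Sketch of step 308 on a list of text-line contents.

    Wraps the third target string (base_text) in a ruby structure and deletes the
    fourth target string (note_text) from its own line. Line contents are treated
    as already-serialized markup, so both strings are inserted verbatim.
    """
    if base_line == note_line:
        raise ValueError("the two target strings must lie on different text lines")
    out = list(lines)  # work on a copy; the caller's list is left unchanged
    base = out[base_line]
    if base[base_start:base_start + len(base_text)] != base_text:
        raise ValueError("third target string not found at the given offset")
    tag = f"<ruby>{base_text}<rt>{note_text}</rt></ruby>"
    out[base_line] = base[:base_start] + tag + base[base_start + len(base_text):]
    note = out[note_line]
    if note[note_start:note_start + len(note_text)] != note_text:
        raise ValueError("fourth target string not found at the given offset")
    out[note_line] = note[:note_start] + note[note_start + len(note_text):]
    return out


doc = ["wen", "a text typesetting method"]
print(apply_manual_annotation(doc, 1, 2, "text", 0, 0, "wen"))
# ['', 'a <ruby>text<rt>wen</rt></ruby> typesetting method']
```

The empty first entry corresponds to the text line whose only string was "wen". Steps 309 and 310 handle that line.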

Step 309: detect the number of character strings remaining in the text line to which the fourth target character string belongs.

Step 310: if the number of remaining character strings is zero, delete from the target streaming document the text line to which the fourth target character string belongs, together with its corresponding typesetting label.

For example, in the target streaming document obtained in step 308, no character strings remain in the text line to which the fourth target character string "wen" belongs. That text line and its corresponding typesetting label "<p></p>" are therefore deleted from the target streaming document, which may then be displayed as follows:

“<p>a <ruby>text<rt>wen</rt></ruby> typesetting <ruby>method<rt>fangfa</rt></ruby>, an electronic device and a storage medium. The original text contained in the layout document is parsed to obtain the text data of the original text.</p>”

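Steps 309 and 310 can be sketched with a line-level model in which each text line keeps its own list of character strings and a single block-level tag. The `StreamLine` class and the assumption that every line is wrapped in `<p>` are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamLine:
    strings: List[str]  # character strings of one text line in the streaming document
    tag: str = "p"      # assumption: each text line is wrapped in one block-level tag

    def render(self) -> str:
        return f"<{self.tag}>{''.join(self.strings)}</{self.tag}>"


def delete_string(lines: List[StreamLine], line_index: int, string_index: int) -> None:
    """Delete one character string, then apply steps 309 and 310 to its line."""
    line = lines[line_index]
    del line.strings[string_index]
    # Step 309: count the strings left on the line.
    # Step 310: if none remain, drop the line together with its tag, so no
    # empty <p></p> is left behind in the streaming document.
    if len(line.strings) == 0:
        del lines[line_index]


def render(lines: List[StreamLine]) -> str:
    return "".join(line.render() for line in lines)


doc = [StreamLine(["wen"]),
       StreamLine(["a ", "<ruby>text<rt>wen</rt></ruby>", " typesetting"])]
delete_string(doc, 0, 0)
print(render(doc))  # <p>a <ruby>text<rt>wen</rt></ruby> typesetting</p>
```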
Based on the embodiment of the disclosure, the method proceeds as follows:

- The original text contained in the layout document is parsed to obtain its text data. The text data comprises a plurality of text lines of the original text and the line position of each text line in the layout document.
- Based on the line positions, the text lines are divided into non-annotation lines, lines to be annotated, and annotation lines corresponding to the lines to be annotated.
- Character strings to be annotated are determined in the lines to be annotated, and the corresponding annotation character strings are determined in the corresponding annotation lines.
- A target streaming document corresponding to the layout document is generated from the non-annotation lines, the lines to be annotated, and the annotation character strings corresponding to the character strings to be annotated.
- The layout document and the target streaming document are displayed.
- In response to an annotation typesetting triggering operation on a third character string and a fourth character string in the layout document, a third target character string and a fourth target character string are determined from them according to the annotation structure corresponding to the line to be annotated.
- A second annotation tag structure corresponding to the third target character string is generated from the upper and lower structure typesetting label, the third target character string and the fourth target character string.
- In the target streaming document, the third target character string is replaced with the second annotation tag structure, and the fourth target character string is deleted from its text line.
- The number of character strings remaining in that text line is detected. If it is zero, the text line and its typesetting label are deleted from the target streaming document.

This avoids disordered layout in streaming documents converted from layout documents that contain annotation content, ensures that the annotation content is displayed normally in the streaming document, and improves the user's electronic reading experience.

Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. It should be noted that the electronic device 500 shown in fig. 5 is only an example. It should not be construed as limiting the functions or the scope of application of the embodiments of the present invention.

The electronic device 500 conventionally includes a processor 510 and a computer program product or computer-readable medium in the form of a memory 520. The memory 520 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 520 has a storage space 521 for executable instructions (or program code) 5211 that perform any of the method steps of the text typesetting method described above. For example, the storage space 521 may include executable instructions 5211 that respectively implement the various steps of the text typesetting method. The executable instructions may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card or a floppy disk, and are typically portable or fixed storage units. Such a storage unit may have memory segments or storage space arranged similarly to the memory 520 in the electronic device of fig. 5. The executable instructions may, for example, be compressed in a suitable form. Generally, the storage unit comprises executable instructions for executing the steps of the text typesetting method according to the invention, i.e. code readable by a processor such as the processor 510. When run by the electronic device, this code causes the device to perform the steps of the text typesetting method described above in a similar manner and with similar beneficial effects. These are not described again here.

Of course, for simplicity, fig. 5 shows only the components of the electronic device 500 that relate to the present invention. Components such as a bus, an input/output interface, an input device and an output device are omitted. In addition, the electronic device 500 may include any other suitable components depending on the particular application.

The embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it causes the processor to implement the text typesetting method provided in the foregoing embodiments, with a similar execution manner and similar beneficial effects. These are not described again here.

The computer-readable storage medium described above may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer programs described above may be written in any combination of one or more programming languages for performing the operations of embodiments of the present disclosure. These include object-oriented programming languages such as Java or C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

The invention discloses:

A1, a text typesetting method, comprising:

analyzing an original text contained in a layout document to obtain text data of the original text, wherein the text data comprises a plurality of text lines of the original text and line positions of each text line in the layout document;

dividing the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated based on the line positions;

determining a character string to be annotated in the line to be annotated and determining an annotation character string corresponding to the character string to be annotated in an annotation line corresponding to the line to be annotated;

and generating a target streaming document corresponding to the layout document based on the non-annotation line, the line to be annotated, and the annotation character string corresponding to the character string to be annotated in the line to be annotated.

A2, the method of A1, wherein the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated based on the line positions comprises:

calculating the relative positional relationship between every two text lines according to the line positions;

and dividing the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the relative positional relationship.

A3, the method of A1, further comprising, before the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the line positions:

dividing the text lines into a plurality of paragraphs according to the line positions and/or the text characteristics of the text lines;

wherein the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the line positions comprises:

for each paragraph, dividing the text lines in the paragraph into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated.

A4, the method of A2, wherein the relative positional relationship includes the line spacing between every two text lines;

wherein the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the relative positional relationship comprises:

taking two adjacent text lines whose line spacing is smaller than or equal to a preset distance threshold as a text line group having an upper and lower structure;

according to a preset annotation structure, determining the line to be annotated and the annotation line corresponding to the line to be annotated in the text line group;

and taking other text lines in the plurality of text lines except the text line group as the non-annotation line.

A5, the method of A2, wherein the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the relative positional relationship comprises:

detecting the text length of each text line in the text line group;

according to the text length, determining the line to be annotated and the annotation line corresponding to the line to be annotated in the text line group;

A6, the method of A4 or A5, further comprising, before the calculating of the relative positional relationship between every two text lines according to the line positions:

determining the typesetting direction of each text line according to the line position;

wherein the line spacing is the distance in the direction perpendicular to the typesetting direction.

A7, the method according to any one of A4-A6, the preset distance threshold comprising any one of:

the average line spacing of the plurality of text lines; and the product of the average line spacing of the plurality of text lines and a preset ratio.

A8, the method according to any one of A1-A7, wherein the line to be annotated comprises at least one first character string, the annotation line corresponding to the line to be annotated comprises at least one second character string, and the text data further comprises a character string position of each first character string and a character string position of each second character string;

wherein, the determining a character string to be annotated in the line to be annotated and determining an annotation character string corresponding to the character string to be annotated in the annotation line corresponding to the line to be annotated comprise:

for each second character string, searching the at least one first character string for a first target character string corresponding to the second character string, according to the character string position of the second character string and the character string position of each first character string, wherein the central axis of the first target character string intersects the central axis of the second character string;

taking the searched first target character string as the character string to be annotated;

and taking the second character string corresponding to the searched first target character string as the annotation character string corresponding to the character string to be annotated.

A9, according to the method of any one of A1-A7, wherein the line to be annotated comprises at least one first character string, the annotation line corresponding to the line to be annotated comprises at least one second character string, and the text data further comprises an editing sequence of the at least one first character string and the at least one second character string;

for each second character string, searching the at least one first character string for a second target character string corresponding to the second character string according to the editing sequence, wherein the second target character string is the first character string immediately preceding the second character string in the editing sequence;

taking the searched second target character string as the character string to be annotated;

and taking the second character string corresponding to the searched second target character string as an annotation character string corresponding to the character string to be annotated.

A10, the method of any one of A1-A9, wherein the generating of a target streaming document corresponding to the layout document based on the non-annotation line, the line to be annotated, and the annotation character string corresponding to the character string to be annotated in the line to be annotated comprises:

generating an initial streaming document corresponding to the layout document based on a basic typesetting label, the non-annotation line and the line to be annotated;

and correcting the initial streaming document based on the upper and lower structure typesetting labels, the character strings to be annotated in the lines to be annotated and the annotation character strings corresponding to the character strings to be annotated to obtain the target streaming document.

A11, the method of A10, wherein the generating of an initial streaming document corresponding to the layout document based on the basic typesetting label, the non-annotation line and the line to be annotated comprises:

performing basic typesetting on the non-annotated line and the line to be annotated according to the line position to obtain a basic text sequence, wherein the basic text sequence comprises at least one basic text segment;

and adding a basic typesetting label to each basic text segment to generate the initial streaming document.

A12, the method of A10 or A11, wherein the correcting of the initial streaming document based on the upper and lower structure typesetting label, the character string to be annotated in the line to be annotated, and the annotation character string corresponding to the character string to be annotated to obtain the target streaming document comprises:

generating, according to the annotation structure corresponding to the line to be annotated, a first annotation tag structure corresponding to the character string to be annotated based on the upper and lower structure typesetting label, the character string to be annotated in the line to be annotated, and the annotation character string corresponding to the character string to be annotated;

in the initial streaming document, replacing the character string to be annotated in the line to be annotated with the first annotation tag structure to obtain the target streaming document.

A13, the method of any one of A1-A12, further comprising, after the generating of the target streaming document corresponding to the layout document:

displaying the layout document and the target streaming document;

in response to an annotation typesetting triggering operation on a third character string and a fourth character string in the layout document, determining a third target character string and a fourth target character string from the third character string and the fourth character string according to the annotation structure corresponding to the line to be annotated;

generating a second annotation label structure corresponding to the third target character string based on the annotation structure corresponding to the line to be annotated and based on the upper and lower structure typesetting labels, the third target character string and the fourth target character string;

in the target streaming document, replacing the third target character string with the second annotation tag structure and deleting the fourth target character string from the text line to which it belongs.

A14, the method of A13, further comprising, after deleting the fourth target character string from the text line to which it belongs:

detecting the number of the remaining character strings of the text line to which the fourth target character string belongs;

and if the number of remaining character strings is zero, deleting from the target streaming document the text line to which the fourth target character string belongs and its corresponding typesetting label.

A15, an electronic device comprising a processor and a memory, the memory for storing executable instructions that cause the processor to:

determining a character string to be annotated in the line to be annotated and determining an annotation character string corresponding to the character string to be annotated in the annotation line corresponding to the line to be annotated;

A16, the electronic device of A15, wherein, when the processor performs the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated based on the line positions, the executable instructions cause the processor to perform:

A17, the electronic device of A15, wherein the executable instructions further cause the processor to perform the following, before the processor performs the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the line positions:

dividing the plurality of text lines into a plurality of paragraphs according to the line positions and/or the text characteristics of the text lines;

A18, the electronic device of A16, the relative positional relationship including a line spacing between each two of the text lines;

wherein, when the processor performs the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the relative positional relationship, the executable instructions cause the processor to perform:

A19, the electronic device of A16, wherein, when the processor performs the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the relative positional relationship, the executable instructions cause the processor to perform:

taking two adjacent text lines whose line spacing is smaller than or equal to a preset distance threshold as a text line group having an upper and lower structure;

detecting the text length of each text line in the text line group;

A20, the electronic device of A18 or A19, wherein the executable instructions further cause the processor to perform the following, before the processor performs the calculating of the relative positional relationship between every two text lines according to the line positions:

A21, the electronic device of any of A18-A20, the preset distance threshold comprising any of:

A22, the electronic device according to any one of A15-A21, wherein the line to be annotated comprises at least one first character string, the annotation line corresponding to the line to be annotated comprises at least one second character string, and the text data further comprises a character string position of each first character string and a character string position of each second character string;

wherein, when the processor executes the determining of the character string to be annotated in the line to be annotated and the determining of the annotation character string corresponding to the character string to be annotated in the annotation line corresponding to the line to be annotated, the executable instructions cause the processor to execute:

and taking the second character string corresponding to the searched first target character string as an annotation character string corresponding to the character string to be annotated.

A23, the electronic device according to any one of A15-A21, wherein the line to be annotated comprises at least one first character string, the annotation line corresponding to the line to be annotated comprises at least one second character string, and the text data further comprises an editing sequence of the at least one first character string and the at least one second character string;

and taking the second character string corresponding to the searched second target character string as the annotation character string corresponding to the character string to be annotated.

A24, the electronic device of any one of A15-A23, wherein, when the processor performs the generating of the target streaming document corresponding to the layout document based on the non-annotation line, the line to be annotated, and the annotation character string corresponding to the character string to be annotated in the line to be annotated, the executable instructions cause the processor to perform:

A25, the electronic device of A24, wherein, when the processor performs the generating of the initial streaming document corresponding to the layout document based on the basic typesetting label, the non-annotation line and the line to be annotated, the executable instructions cause the processor to perform:

and adding a basic typesetting label to each basic text segment to generate the initial streaming document.

A26, the electronic device of A24 or A25, wherein, when the processor performs the correcting of the initial streaming document to obtain the target streaming document, the executable instructions cause the processor to perform the following. The correcting is based on the upper and lower structure typesetting label, the character string to be annotated in the line to be annotated, and the annotation character string corresponding to the character string to be annotated.

generating, according to the annotation structure corresponding to the line to be annotated, a first annotation tag structure corresponding to the character string to be annotated based on the upper and lower structure typesetting label, the character string to be annotated in the line to be annotated, and the annotation character string corresponding to the character string to be annotated;

A27, the electronic device of any one of A15-A26, wherein, after the processor performs the generating of the target streaming document corresponding to the layout document, the executable instructions further cause the processor to perform:

displaying the layout document and the target streaming document;

A28, the electronic device of A27, wherein, after the processor performs the deleting of the fourth target character string from the text line to which it belongs, the executable instructions further cause the processor to perform:

A29, a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the text typesetting method of any one of A1-A14.
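The grouping described in A4, A6 and A7 can be illustrated with the following Python sketch, which is not presented as the disclosed implementation. It assumes horizontal typesetting, so line spacing is measured along the y axis. It groups lines greedily from top to bottom and exposes the preset ratio of A7 as the hypothetical parameter `ratio`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass(frozen=True)
class Line:
    line_id: int
    top: float     # horizontal typesetting assumed, so spacing is measured along y
    bottom: float  # y grows downward


def group_lines(lines: List[Line], ratio: float = 1.0) -> Tuple[List[Tuple[Line, Line]], List[Line]]:
    """Return (text_line_groups, non_annotation_lines).

    Two adjacent lines form a text line group when their spacing is <= threshold,
    where threshold = mean spacing * ratio (ratio=1.0 is the plain average of A7).
    """
    ordered = sorted(lines, key=lambda line: line.top)
    # Spacing between consecutive lines, perpendicular to the typesetting direction.
    gaps = [b.top - a.bottom for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return [], ordered
    threshold = mean(gaps) * ratio
    groups: List[Tuple[Line, Line]] = []
    singles: List[Line] = []
    i = 0
    while i < len(ordered):
        if i < len(gaps) and gaps[i] <= threshold:
            groups.append((ordered[i], ordered[i + 1]))
            i += 2  # both lines are consumed by the group
        else:
            singles.append(ordered[i])
            i += 1
    return groups, singles


page = [Line(401, 0, 8), Line(402, 10, 24), Line(403, 40, 54), Line(404, 70, 84)]
groups, singles = group_lines(page, ratio=0.5)
print([(a.line_id, b.line_id) for a, b in groups], [s.line_id for s in singles])
# [(401, 402)] [403, 404]
```

With `ratio=1.0`, a page whose lines are all evenly spaced has every spacing equal to the average, so every adjacent pair would be grouped. A ratio below 1, as in the usage line, avoids that case.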

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A text typesetting method, characterized by comprising the following steps:

2. The method of claim 1, wherein the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated based on the line positions comprises:

3. The method of claim 1, further comprising, before the dividing of the plurality of text lines into a non-annotation line, a line to be annotated and an annotation line corresponding to the line to be annotated according to the line positions:

4. The method according to claim 1, wherein the line to be annotated comprises at least one first character string, the annotation line corresponding to the line to be annotated comprises at least one second character string, and the text data further comprises a character string position of each first character string and a character string position of each second character string;

5. The method according to claim 1, wherein the line to be annotated comprises at least one first character string, the annotation line corresponding to the line to be annotated comprises at least one second character string, and the text data further comprises an editing sequence of the at least one first character string and the at least one second character string;

for each second character string, searching the at least one first character string for a second target character string corresponding to the second character string according to the editing sequence, wherein the second target character string is the first character string immediately preceding the second character string in the editing sequence;

6. The method according to claim 1, wherein the generating of a target streaming document corresponding to the layout document based on the non-annotation line, the line to be annotated, and the annotation character strings corresponding to the character strings to be annotated in the line to be annotated comprises:

7. The method according to any one of claims 1-6, wherein after the generating a target streaming document corresponding to the layout document, the method further comprises:

displaying the layout document and the target streaming document;

in response to an annotation typesetting triggering operation on a third character string and a fourth character string in the layout document, determining a third target character string and a fourth target character string from the third character string and the fourth character string according to the annotation structure corresponding to the line to be annotated;

8. The method according to claim 7, wherein after deleting the fourth target character string in the text line to which the fourth target character string belongs, the method further comprises:

9. An electronic device comprising a processor and a memory, the memory to store executable instructions that cause the processor to:

10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, causes the processor to implement the text typesetting method of any one of claims 1 to 8.
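The two matching rules of claims 4 and 5 (A8 and A9) can be compared side by side in the sketch below. As an assumption, the central-axis condition is read as "the annotation string's vertical centre line passes through the base string's horizontal extent". The editing sequence is assumed to be available as one list that interleaves first and second character strings. All names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Str:
    text: str
    left: float           # horizontal extent of the string (horizontal text assumed)
    right: float
    is_base: bool = True  # True: first character string (line to be annotated)


def match_by_axis(bases: List[Str], notes: List[Str]) -> List[Tuple[Str, Str]]:
    """Assumed reading of claim 4: pair each annotation string with the base string
    whose horizontal extent contains the annotation string's vertical centre line."""
    pairs = []
    for note in notes:
        axis = (note.left + note.right) / 2
        hit = next((b for b in bases if b.left <= axis <= b.right), None)
        if hit is not None:
            pairs.append((hit, note))
    return pairs


def match_by_edit_order(sequence: List[Str]) -> List[Tuple[Str, Str]]:
    """Claim 5: pair each annotation string with the base string immediately
    preceding it in the editing sequence."""
    return [(prev, cur) for prev, cur in zip(sequence, sequence[1:])
            if prev.is_base and not cur.is_base]


text = Str("text", 10, 30)
typesetting = Str("typesetting", 32, 90)
method = Str("method", 92, 130)
wen = Str("wen", 12, 28, is_base=False)
fangfa = Str("fangfa", 95, 127, is_base=False)

print([(b.text, n.text) for b, n in match_by_axis([text, typesetting, method], [wen, fangfa])])
print([(b.text, n.text) for b, n in match_by_edit_order([text, wen, typesetting, method, fangfa])])
# both print [('text', 'wen'), ('method', 'fangfa')]
```

The axis rule depends only on geometry. The edit-order rule depends only on how the layout document was authored. This is why the claims present them as alternatives.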