WO2023029116A1 - Text image typesetting method and apparatus, electronic device, and storage medium

Info

Publication number: WO2023029116A1
Application number: PCT/CN2021/119226
Authority: WO
Inventors: 华杰
Original assignee: 广东艾檬电子科技有限公司
Priority date: 2021-08-30
Filing date: 2021-09-18
Publication date: 2023-03-09
Also published as: CN115730563A

Abstract

A text image typesetting method and apparatus, an electronic device, and a storage medium. The method comprises: performing text line detection on first text region images respectively to determine line information corresponding to text image blocks comprised in each of the first text region images, wherein the line information comprises line height, line head position coordinates, and line tail position coordinates (110); and matching the text image blocks according to the line information corresponding to the text image blocks comprised in the first text image to generate at least one text line to obtain a line list corresponding to the first text image, wherein a matching value between two adjacent text image blocks in each text line satisfies a threshold condition (120). According to the method, the accuracy of the text typesetting of skewed or curved distorted text images can be improved.

Description

Text image typesetting method, device, electronic device and storage medium

Technical field

The present application relates to the technical field of graphic typesetting, and in particular to a text image typesetting method, apparatus, electronic device and storage medium.

Background

Text typesetting can improve the user's reading experience. Current text typesetting methods mainly target text images with regular text content: the text image is segmented into multiple sub-regions containing text lines, and the sub-regions are then sorted by their image coordinates from top to bottom and from left to right. In a text image whose text content is slanted or curved, however, it is difficult to sort the text lines accurately in this way.

Summary of the invention

The embodiments of the present application disclose a text image typesetting method, apparatus, electronic device and storage medium, capable of accurately sorting the text lines in a distorted text image whose text content is slanted or curved.

The first aspect of the embodiment of the present application discloses a text image typesetting method, the method comprising:

Performing text line detection on a first text image to determine line information corresponding to each text image block contained in the first text image, wherein the line information includes line height, line head position coordinates, and line end position coordinates;

Matching the text image blocks according to the line information corresponding to each text image block contained in the first text image to generate at least one text line, so as to obtain a line list corresponding to the first text image, wherein the matching value between two adjacent text image blocks in each text line satisfies a threshold condition.

As an optional implementation, in the first aspect of the embodiments of the present application, matching the text image blocks according to the line information corresponding to each text image block contained in the first text image to generate at least one text line includes:

Determining, according to the line information corresponding to each text image block contained in the first text image, the matching value between a first text image block and each other text image block, wherein the first text image block is any text image block in the first text image, and the other text image blocks are the text image blocks in the first text image other than the first text image block;

Determining the maximum matching value among these matching values, and appending the first text image block to the other text image block corresponding to the maximum matching value, so as to generate at least one text line.

As an optional implementation, in the first aspect of the embodiments of the present application, after performing text line detection on the first text image and determining the line information corresponding to each text image block contained in the first text image, the method further includes:

Pre-sorting the text image blocks contained in the first text image in ascending order of the abscissa of their line head position coordinates, to obtain a first image block sequence corresponding to the first text image.

As an optional implementation, in the first aspect of the embodiments of the present application, matching the text image blocks according to the line information corresponding to each text image block contained in the first text image to generate at least one text line, so as to obtain a line list corresponding to the first text image, includes:

Establishing a line list corresponding to the first text image, and creating a new text line in the line list from the text image block ranked first in the first image block sequence corresponding to the first text image;

Determining a current text image block from the first image block sequence, and matching the line information of the current text image block against the line information of the text image block at the end of each text line in the line list;

If the current text image block is successfully matched with the text image block at the end of a target text line, adding the current text image block to the end of the target text line, so as to update the text image block at the end of the target text line;

If the current text image block is not successfully matched with the text image block at the end of any text line, creating a new text line in the line list from the current text image block;

Taking the next text image block in the first image block sequence as the new current text image block, and continuing to perform the step of matching the line information of the current text image block against the line information of the text image block at the end of each text line in the line list, until the current text image block is the last text image block in the first image block sequence.

As an optional implementation, in the first aspect of the embodiments of the present application, determining the current text image block from the first image block sequence and matching the line information of the current text image block against the line information of the text image block at the end of each text line in the line list includes:

Determining, according to the line information of the current text image block and the line information of the text image block at the end of a first text line in the line list, the matching value between the current text image block and the text image block at the end of the first text line, wherein the first text line is any text line in the line list;

If the matching value is greater than a matching threshold, the current text image block is successfully matched with the text image block at the end of the first text line, and the first text line is taken as the target text line;

If the matching value is not greater than the matching threshold, the current text image block is not successfully matched with the text image block at the end of the first text line.

As an optional implementation, in the first aspect of the embodiments of the present application, adding the current text image block to the end of the target text line if the current text image block is successfully matched with the text image block at the end of the target text line, so as to update the text image block at the end of the target text line, includes:

If the matching value between the current text image block and the text image block at the end of a single target text line satisfies the threshold condition, adding the current text image block to the end of that target text line, so as to update the text image block at the end of the target text line;

If the matching values between the current text image block and the text image blocks at the end of at least two target text lines all satisfy the threshold condition, determining the maximum matching value among those matching values, and adding the current text image block to the end of the target text line corresponding to the maximum matching value, so as to update the text image block at the end of that target text line.

Matching the line information of the current text image block against the line information of the text image block at the end of each text line in turn, following the order of the text lines in the line list, wherein the text lines in the line list are ordered by creation time from earliest to latest;

And, adding the current text image block to the end of the target text line if the current text image block is successfully matched with the text image block at the end of the target text line, so as to update the text image block at the end of the target text line, includes:

When it is detected that the current text image block is successfully matched with the text image block at the end of the target text line, adding the current text image block to the end of the target text line, so as to update the text image block at the end of the target text line, and stopping any further matching of the current text image block.

As an optional implementation, in the first aspect of the embodiments of the present application, before performing text line detection on the first text image and determining the line information corresponding to each text image block contained in the first text image, the method further includes:

Performing region segmentation on a full text image to obtain at least one first text image, wherein the first text image is any one of the at least one first text image.

The second aspect of the embodiment of the present application discloses a text image typesetting device, the device includes:

A text detection module, configured to perform text line detection on the first text image and determine line information corresponding to each text image block contained in the first text image, wherein the line information includes line height, line head position coordinates, and line end position coordinates;

A text typesetting module, configured to match the text image blocks according to the line information corresponding to each text image block contained in the first text image and generate at least one text line, so as to obtain a line list corresponding to the first text image, wherein the matching value between two adjacent text image blocks in each text line satisfies the threshold condition.

The third aspect of the embodiments of the present application discloses an electronic device, including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor implements the text image typesetting method disclosed in the first aspect of the embodiments of the present application.

The fourth aspect of the embodiment of the present application discloses a computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the method for typesetting a text image disclosed in the first aspect of the embodiment of the present application is implemented.

Compared with related technologies, the embodiments of the present application have the following beneficial effects:

Text line detection is performed on the first text image to determine the line information of each text image block contained in the first text image, where the line information may include line height, line head position coordinates, and line end position coordinates. The text image blocks are then matched according to the line information corresponding to each text image block contained in the first text image to generate at least one text line, yielding the line list corresponding to the first text image, in which the matching value between two adjacent text image blocks in each text line satisfies the threshold condition. In the embodiments of the present application, each text image block is matched with the other text image blocks in the first text image according to its line information to generate at least one text line, and the text lines in the first text image constitute the line list corresponding to the first text image. The text lines of a skewed or curved distorted text image can thus be typeset, and because the matching is based on line information such as line height, line head position coordinates, and line end position coordinates, the accuracy of text typesetting for skewed or curved distorted text images can be improved.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the accompanying drawings used in the embodiments are briefly introduced below. Obviously, the accompanying drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow diagram of a text image typesetting method disclosed in an embodiment of the present application;

Fig. 2 is a schematic flow chart of a text image typesetting method disclosed in the embodiment of the present application;

Fig. 3 is a flowchart of the line list construction disclosed in the embodiment of the present application;

FIG. 4 is a structural diagram of the constructed image block sequence and line list disclosed in an embodiment of the present application;

Fig. 5 is a flow chart of matching the current text image block in the embodiment of the present application;

Fig. 6 is a schematic structural diagram of a text image typesetting device disclosed in an embodiment of the present application;

Fig. 7 is a schematic structural diagram of another text image typesetting device disclosed in the embodiment of the present application;

Fig. 8 is a schematic structural diagram of another text image typesetting device disclosed in the embodiment of the present application;

Fig. 9 is a schematic structural diagram of an electronic device disclosed by an embodiment.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.

It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present application and the drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally further includes other steps or units inherent in such a process, method, product or device.

The embodiments of the present application disclose a text image typesetting method, apparatus, electronic device and storage medium, capable of accurately sorting the text lines in a distorted text image whose text content is slanted or curved. A detailed description is given below with reference to the accompanying drawings.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of a text image typesetting method disclosed in an embodiment of the present application. The text image typesetting method can be applied to a terminal device, which may include, but is not limited to, a smart watch, a smart phone, a smart bracelet, a tablet computer, and the like. The method may include the following steps:

110. Perform text line detection on the first text image, and determine line information corresponding to each text image block included in the first text image, where the line information includes line height, line head position coordinates, and line end position coordinates.

In the embodiment of the present application, the terminal device performs text line detection on the first text image. Text line detection is used to determine each text image block contained in the first text image and the line information corresponding to each text image block. It can be implemented with the PSENet (Progressive Scale Expansion Network) algorithm, the CTPN (Connectionist Text Proposal Network) algorithm, the CRAFT (Character Region Awareness for Text Detection) algorithm, the LOMO (Look More Than Once) algorithm, or the like. Any deep learning algorithm that supports curved text line detection can be used, which is not specifically limited here.

In the embodiment of the present application, the first text image is an image containing text, and the length of the text contained in the image is not limited. The first text image may be generated by inputting text with a scanning device such as a scanning pen.

In some embodiments, text line detection may first preprocess the first text image, for example by grayscale conversion, binarization, and smoothing, to unify the specifications of the first text image. One or more characters in the first text image are then marked in the form of one or more bounding boxes, where each bounding box is a text image block. Finally, the line height, line head position coordinates, and line end position coordinates of each bounding box are extracted as the line information of the corresponding text image block. In this way, the text image blocks contained in each first text image and the line information corresponding to those text image blocks can be determined conveniently.

In some embodiments, the terminal device performs text line detection on the first text image and determines each text image block in the first text image; establishes a coordinate system corresponding to the first text image; determines the four corner points of each text image block; and determines, according to the four corner points, the line information of each text image block, such as line height, line head position coordinates, and line end position coordinates. Specifically, each text image block may include four corner points. In the coordinate system corresponding to the first text image, the coordinates of the midpoint of the line connecting the two left corner points and the coordinates of the midpoint of the line connecting the two right corner points are determined. From the four corner points and the two midpoints, the line height in the line information can be determined as the difference between the ordinates of the two left corner points, the line head position coordinates can be the coordinates of the left midpoint, and the line end position coordinates can be the coordinates of the right midpoint. By using the four corner points rather than the text image block itself to determine the line information, the standard for the line information of each text image block can be unified.
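The corner-point derivation described above can be illustrated with a minimal Python sketch. The names `LineInfo` and `line_info_from_corners` and the corner ordering (top-left, top-right, bottom-right, bottom-left) are assumptions made for this illustration only; the application does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class LineInfo:
    """Line information of one text image block (hypothetical container)."""
    height: float  # difference between the ordinates of the two left corners
    head: Point    # midpoint of the left edge (line head position)
    tail: Point    # midpoint of the right edge (line end position)


def midpoint(a: Point, b: Point) -> Point:
    """Return the midpoint of the segment joining a and b."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def line_info_from_corners(corners: Sequence[Point]) -> LineInfo:
    """Derive line information from four corner points.

    Assumed order: top-left, top-right, bottom-right, bottom-left.
    """
    if len(corners) != 4:
        raise ValueError("a text image block needs exactly four corner points")
    top_left, top_right, bottom_right, bottom_left = corners
    height = abs(top_left[1] - bottom_left[1])
    return LineInfo(
        height=height,
        head=midpoint(top_left, bottom_left),
        tail=midpoint(top_right, bottom_right),
    )


# A slightly skewed block: the right edge sits 4 units lower than the left edge.
info = line_info_from_corners(
    [(10.0, 20.0), (110.0, 24.0), (110.0, 44.0), (10.0, 40.0)]
)
print(info)
```

For the skewed block in the usage lines, the line head and line end midpoints have different ordinates, which is the information the later matching step relies on.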

As an optional embodiment, before performing text line detection on the first text image in step 110 and determining the line information corresponding to each text image block contained in the first text image, the method further includes:

Performing region segmentation on a full text image to obtain at least one first text image, where the first text image is any one of the at least one first text image.

In the embodiment of the present application, the terminal device first obtains the full text image and then uses a region segmentation algorithm to perform region segmentation on it to obtain at least one first text image. The full text image is an image containing multiple passages of text, such as an image of a paper containing multiple paragraphs, or an image of a student assignment containing a main question with three sub-questions. The terminal device may obtain the full text image by photographing or scanning the text with a camera installed on the terminal device, or by accessing a server or another terminal device. Each first text image is a sub-region obtained by region segmentation of the full text image. For example, when the full text image contains a main question and its three sub-questions, the first text images may be the images of the regions occupied by the three sub-questions. The region segmentation algorithm may be an edge-based image segmentation algorithm, a region growing algorithm, a region splitting and merging algorithm, a level set algorithm, or the like, which is not specifically limited here. When the full text image contains a large amount of content, region segmentation yields smaller first text images that are convenient for text line detection and typesetting, which improves the typesetting effect for the full text image.

120. According to the line information corresponding to each text image block included in the first text image, match the text image blocks to generate at least one text line, so as to obtain a line list corresponding to the first text image, where the matching value between two adjacent text image blocks in each text line satisfies the threshold condition.

In the embodiment of the present application, the terminal device matches the text image blocks according to the line information of each text image block included in the first text image, that is, according to the line height, line head position coordinates, and line end position coordinates of each text image block. If one text image block is successfully matched with another text image block, it is added to that other text image block to generate a text line. As for what counts as a successful match: if the line heights of two text image blocks are equal, and the line head position coordinates of one text image block are the same as the line end position coordinates of the other, the match is considered successful. Alternatively, a custom algorithm can combine line information such as line height, line head position coordinates, and line end position coordinates to calculate a matching value between two text image blocks, and the matching value is compared with the threshold condition; if the threshold condition is satisfied, the match is considered successful. The threshold condition and the custom algorithm are not specifically limited. Adding one text image block to another may be done by connecting the line head of the one text image block to the line end of the other.
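Since the application leaves the custom algorithm and the threshold condition open, the following Python sketch shows only one possible matching value, not the method of the application. The scoring formula, the constants `MAX_GAP_IN_HEIGHTS` and `MATCH_THRESHOLD`, and the names `Block`, `matching_value` and `is_match` are all assumptions introduced for illustration. The score is built so that two blocks of equal line height whose line end and line head coincide receive the highest value, mirroring the exact-match case described above.

```python
from typing import NamedTuple, Tuple


class Block(NamedTuple):
    """Line information of one text image block (hypothetical container)."""
    height: float
    head: Tuple[float, float]  # line head position coordinates
    tail: Tuple[float, float]  # line end position coordinates


# Assumed tuning constants; the application does not specify any values.
MAX_GAP_IN_HEIGHTS = 3.0   # horizontal gap (in line heights) at which the score hits 0
MATCH_THRESHOLD = 60.0     # a match succeeds when the value is greater than this


def matching_value(prev: Block, cur: Block) -> float:
    """Hypothetical score in [0, 100] for 'cur continues the line ending at prev'."""
    taller = max(prev.height, cur.height)
    if taller <= 0:
        return 0.0
    # Similar line heights score close to 1.
    height_similarity = min(prev.height, cur.height) / taller
    # Offsets between prev's line end and cur's line head, in units of line height.
    dy = abs(cur.head[1] - prev.tail[1]) / taller
    dx = abs(cur.head[0] - prev.tail[0]) / taller
    vertical = max(0.0, 1.0 - dy)
    horizontal = max(0.0, 1.0 - dx / MAX_GAP_IN_HEIGHTS)
    return 100.0 * height_similarity * vertical * horizontal


def is_match(prev: Block, cur: Block, threshold: float = MATCH_THRESHOLD) -> bool:
    """Threshold condition assumed here: matching value strictly greater than threshold."""
    return matching_value(prev, cur) > threshold


a = Block(20.0, (0.0, 30.0), (100.0, 32.0))
b = Block(20.0, (104.0, 33.0), (200.0, 36.0))   # continues a with a slight drift
c = Block(20.0, (104.0, 80.0), (200.0, 82.0))   # belongs to a different line
print(matching_value(a, b), is_match(a, b))
print(matching_value(a, c), is_match(a, c))
```

Normalizing the offsets by line height is a design choice of this sketch; it keeps the score independent of image resolution, which the application neither requires nor excludes.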

The terminal device forms at least one text line from the matched text image blocks, and forms the line list corresponding to the first text image from the at least one text line formed in the first text image, where the line list is the list of typeset text in the first text image.

In the embodiment of the present application, the terminal device matches each text image block with the other text image blocks in the first text image according to the line information of each text image block contained in the first text image, so as to generate at least one text line. The text lines in the first text image constitute the line list corresponding to the first text image, so that the text lines of a skewed or curved distorted text image can be typeset. Because the matching is based on line information such as line height, line head position coordinates, and line end position coordinates, the accuracy of text typesetting for skewed or curved distorted text images can be improved.

As an optional embodiment, in step 120, according to the line information corresponding to each text image block contained in the first text image, matching each text image block to generate at least one text line may include the following steps:

According to the line information corresponding to each text image block contained in the first text image, determine the matching value between a first text image block and each other text image block, where the first text image block is any text image block in the first text image, and the other text image blocks are the text image blocks in the first text image other than the first text image block;

Determine the maximum matching value among these matching values, and add the first text image block to the other text image block corresponding to the maximum matching value, so as to generate at least one text line.

In the embodiment of the present application, the terminal device can determine the matching values between a text image block and the other text image blocks in the first text image according to the line information corresponding to each text image block included in the first text image. For each text image block, the maximum matching value is selected from the matching values between that text image block and the other text image blocks. For example, if the matching values between the first text image block and the other three text image blocks in the first text image are 50, 60 and 90 respectively, the maximum matching value of 90 is selected, and the first text image block is added to the text image block whose matching value is 90.

In this embodiment of the present application, for each text image block, the terminal device can also select, from the matching values between that text image block and the other text image blocks, the matching value that best satisfies the threshold condition, that is, the value that lies within the range of the threshold condition and is closest to its limit. For example, if the threshold condition is "within 100" and the matching values between the first text image block and the other three text image blocks in the first text image are 80, 90 and 110 respectively, the matching value that best satisfies the threshold condition is 90, and the first text image block is added to the text image block whose matching value is 90.

In this way, each text image block in the first text image is matched with the other text image blocks in the first text image to determine its best-matching text image block, and is spliced with that block to generate at least one text line. The terminal device then forms the line list corresponding to the first text image from the at least one text line generated in the first text image. Matching each text image block against the others, finding the best match, and appending the block to it generates the text lines and the line list corresponding to the first text image, which effectively realizes the typesetting of the first text image. Adding the first text image block to another text image block specifically means connecting the line head of the first text image block to the line end of the other text image block.
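The two selection rules in this optional embodiment can be sketched in Python with the numbers from the examples above. The function names `pick_by_maximum` and `pick_closest_within_limit`, the dictionary representation of candidates, and the reading of "within 100" as "less than or equal to 100" are assumptions made for this illustration.

```python
from typing import Dict, Optional


def pick_by_maximum(values: Dict[str, float]) -> Optional[str]:
    """Return the candidate block with the largest matching value, or None if empty."""
    if not values:
        return None
    return max(values, key=lambda name: values[name])


def pick_closest_within_limit(values: Dict[str, float], limit: float) -> Optional[str]:
    """Return the candidate whose value lies within the limit and is closest to it.

    Assumption: 'within the limit' is read as value <= limit.
    """
    eligible = {name: v for name, v in values.items() if v <= limit}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name])


# Rule 1: matching values 50, 60 and 90; the block scoring 90 is chosen.
print(pick_by_maximum({"block_a": 50, "block_b": 60, "block_c": 90}))

# Rule 2: threshold condition "within 100"; 110 is excluded, so 90 is chosen.
print(pick_closest_within_limit({"block_a": 80, "block_b": 90, "block_c": 110}, 100))
```

Both helpers return `None` when no candidate qualifies, which a caller could treat as the signal to start a new text line; that handling is likewise an assumption of the sketch.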

Please refer to FIG. 2. FIG. 2 is a schematic flowchart of a text image typesetting method disclosed in an embodiment of the present application. The text image typesetting method can be applied to the terminal device, and may include the following steps:

210. Perform text line detection on the first text image, and determine line information corresponding to each text image block included in the first text image, where the line information includes line height, line head position coordinates, and line end position coordinates.

In the embodiment of the present application, the process of the terminal device performing this step is the same as that of the above-mentioned embodiments, and will not be repeated here.

220. Pre-sort the text image blocks included in the first text image in ascending order of the abscissa of their line head position coordinates, to obtain an image block sequence corresponding to the first text image.

In the embodiment of the present application, after determining the line information corresponding to each text image block contained in the first text image, the terminal device pre-sorts the text image blocks in the first text image according to the abscissa of the line head position coordinates in the line information of each text image block. The pre-sorting arranges the text image blocks in ascending order of the abscissa of their line head position coordinates and yields the image block sequence corresponding to the first text image, that is, the pre-sorted text image blocks of the first text image. This ensures that the text image blocks follow the regular left-to-right text order in the first text image, which effectively improves the accuracy of the subsequently generated text lines and line list.
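The pre-sorting of step 220 amounts to an ascending sort on one coordinate. A minimal Python sketch follows; the `(name, line_head)` tuple layout and the function name `presort_blocks` are assumptions made for illustration.

```python
from typing import List, Tuple

# Each block is represented as (name, (head_x, head_y)); this layout is hypothetical.
BlockEntry = Tuple[str, Tuple[float, float]]


def presort_blocks(blocks: List[BlockEntry]) -> List[BlockEntry]:
    """Sort blocks in ascending order of the abscissa of the line head position."""
    return sorted(blocks, key=lambda block: block[1][0])


detected = [
    ("block_c", (120.0, 18.0)),
    ("block_a", (4.0, 20.0)),
    ("block_d", (125.0, 52.0)),
    ("block_b", (6.0, 55.0)),
]
sequence = presort_blocks(detected)
print([name for name, _ in sequence])
```

Python's `sorted` is stable, so blocks with identical abscissas keep their detection order in this sketch.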

230. According to the line information corresponding to each text image block included in the first text image, match the text image blocks to generate at least one text line, so as to obtain a line list corresponding to the first text image, where the matching value between two adjacent text image blocks in each text line satisfies the threshold condition.

In the embodiment of the present application, the terminal device matches the text image blocks with one another according to the line information corresponding to each text image block included in the first text image. If the matching value between two text image blocks satisfies the threshold condition, the splicing relationship between the two text image blocks can be determined from their order in the image block sequence. For example, if the matching value between text image block 1 and text image block 2 satisfies the threshold condition, and text image block 1 precedes text image block 2 in the image block sequence, then the line head of text image block 2 is connected to the line end of text image block 1. The splicing relationships between all text image blocks are determined in this way, so that at least one text line is generated and the line list corresponding to the first text image is obtained.

As an optional embodiment, please refer to Fig. 3, which is a flowchart of the line list construction disclosed in the embodiment of the present application. In step 230, matching the text image blocks according to the line information corresponding to each text image block contained in the first text image to generate at least one text line, so as to obtain the line list corresponding to the first text image, may include the following steps:

310. Establish a line list corresponding to the first text image, and create a new text line in the line list from the text image block ranked first in the first image block sequence corresponding to the first text image.

In the embodiment of the present application, please refer to FIG. 4, which is a structural diagram of the constructed image block sequence and line list disclosed in the embodiment of the present application. After determining the image block sequence 410 corresponding to the first text image, the terminal device constructs a line list 420 for the first text image. The terminal device creates a new text line in the line list from the text image block ranked first in the image block sequence corresponding to the first text image, that is, the text image block with the smallest abscissa in the first text image. In other words, the text image block with the smallest abscissa first forms a text line of its own in the line list.

In some embodiments, if a plurality of text image blocks are tied for first place in the image block sequence, a secondary sort is performed on these text image blocks in descending order of the ordinate of their line head position coordinates, and the text image block ranked first after the secondary sort is selected to create a new text line in the line list.

In this embodiment of the application, if multiple text image blocks in the image block sequence corresponding to the first text image have the same line head abscissa, the terminal device can re-sort these text image blocks according to the ordinates of their line head position coordinates, so that the text image block with the largest ordinate is selected to create a new text line in the line list. This ensures that the text image blocks follow the regular left-to-right, top-to-bottom text order of the first text image, which further improves the accuracy of the subsequently generated text lines and line list.
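As a non-limiting illustration, the pre-sorting and tie-breaking described above might be sketched as follows. The `TextBlock` record, its field names, and the function `presort` are assumptions introduced only for this example; the sketch simply sorts by line head abscissa in ascending order and breaks ties by line head ordinate in descending order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical container for the line information of one text image block."""
    block_id: int
    line_height: float
    head_x: float  # abscissa of the line head position
    head_y: float  # ordinate of the line head position
    tail_x: float  # abscissa of the line tail position
    tail_y: float  # ordinate of the line tail position


def presort(blocks: List[TextBlock]) -> List[TextBlock]:
    """Ascending head abscissa; ties broken by descending head ordinate."""
    return sorted(blocks, key=lambda b: (b.head_x, -b.head_y))


blocks = [
    TextBlock(1, 20, head_x=40, head_y=100, tail_x=90, tail_y=100),
    TextBlock(2, 20, head_x=10, head_y=60, tail_x=35, tail_y=62),
    TextBlock(3, 20, head_x=10, head_y=100, tail_x=35, tail_y=101),
]
sequence = presort(blocks)
first_block = sequence[0]  # this block would seed the first text line
```

Blocks 2 and 3 share the smallest abscissa, so the secondary sort places block 3, which has the larger ordinate, first.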

320. Determine the current text image block from the first image block sequence, and match the line information of the current text image block with the line information of the text image block at the end of each text line in the line list.

In the embodiment of the present application, the terminal device determines the current text image block from the first image block sequence. Please refer to FIG. 4 again. The text image blocks in the first image block sequence corresponding to the first text image are, in order, text image block 1, text image block 2, text image block 3, text image block 4, and so on. After the terminal device creates a new text line in the line list from the text image block ranked first in the image block sequence, namely text image block 1, the current text image block it determines from the image block sequence is text image block 2, and the terminal device matches the line information of text image block 2 with the line information of the text image block at the end of each text line in the line list. At this point the line list contains only the one text line created from text image block 1, so the text image block at the end of that text line is text image block 1. In other words, the terminal device matches the line information of text image block 2 with that of text image block 1.

330. If the current text image block matches the text image block at the end of the target text line successfully, add the current text image block to the end of the target text line, so as to update the text image block at the end of the target text line.

In the embodiment of the present application, if the current text image block is successfully matched with the text image block at the end of the target text line, the terminal device adds the current text image block to the end of the target text line, so that the current text image block becomes the new text image block at the end of the target text line. Here, the target text line is the text line whose end text image block successfully matches the current text image block. Please refer to FIG. 4 again. If the current text image block, namely text image block 2, is successfully matched with text image block 1, the terminal device adds text image block 2 to the end of the text line where text image block 1 is located, so that text image block 2 replaces text image block 1 as the new text image block at the end of that text line.

340. If the current text image block does not match successfully with the text image block at the end of each text line, create a new text line in the line list according to the current text image block.

In the embodiment of the present application, if the current text image block is not successfully matched with the text image block at the end of any text line, the terminal device creates a new text line in the line list according to the current text image block. Please refer to FIG. 4 again. When the current text image block is text image block 2, the terminal device only needs to match text image block 2 with text image block 1. If text image block 2 is not successfully matched with text image block 1, the terminal device creates a new text line in the line list according to text image block 2, that is, it opens another text line, which for the time being contains only text image block 2.

350. Use the next text image block in the first image block sequence as the new current text image block, and continue to execute the step of matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list, until the current text image block is the last text image block of the first image block sequence.

In the embodiment of the present application, after the terminal device has matched the current text image block with the text image block at the end of each text line, it takes the next text image block in the first image block sequence as the new current text image block and matches it with the line information of the text image block at the end of each text line in the line list. This cycle continues until every text image block in the first image block sequence has been matched with the line information of the text image blocks at the ends of the text lines.

Please refer to FIG. 4 again. For example, the terminal device matches text image block 2 with text image block 1, which is at the end of the only text line in the line list at that moment. The matching is unsuccessful, so the terminal device creates a new text line in the line list according to text image block 2. The terminal device then takes the next text image block in the image block sequence 410, namely text image block 3, as the new current text image block, and matches text image block 3 with the text image block at the end of each text line in the line list 420. At this point the line list contains two text lines, whose end text image blocks are text image block 1 and text image block 2 respectively, so the terminal device matches text image block 3 with text image block 1 and with text image block 2. Text image block 3 is successfully matched with text image block 1 but not with text image block 2, so the terminal device adds the current text image block, namely text image block 3, to the end of the target text line, which is the text line where text image block 1 is located. Text image block 3 thereby replaces text image block 1 as the text image block at the end of the target text line.

After that, the terminal device takes text image block 4 as the new current text image block according to the image block sequence 410, and matches text image block 4 with the text image block at the end of each text line in the line list 420. The line list still contains two text lines, whose end text image blocks have been updated to text image block 3 and text image block 2, so the terminal device matches text image block 4 with text image block 3 and with text image block 2. Text image block 4 is not successfully matched with either of them, so the terminal device creates a new text line in the line list according to text image block 4, that is, it opens yet another text line, which for the time being contains only text image block 4. The line list now contains three text lines.

The process continues in the same way until all the text image blocks in the image block sequence have been added to the line list in the order of the image block sequence. The image block sequence 410 in FIG. 4 contains 14 text image blocks, so the above process ends once the terminal device has either added text image block 14 to a text line in the line list 420 or created a new text line according to text image block 14. The number of text image blocks contained in the first text image and in the image block sequence is not limited.

In the embodiment of the present application, each text image block is matched, in the order of the first image block sequence, with the text image block at the end of each existing text line in the line list, and is then either added to an existing text line or used to create a new text line. This follows the conventional left-to-right text order of the first text image, which further improves the accuracy of the subsequently generated text lines and line list.
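For illustration only, steps 310 to 350 might be sketched as the loop below. The function name `build_line_list` and the `matches` predicate are assumptions made for this example, and the sketch adopts the optional policy of adding a block to the first text line whose end block matches. The usage replays the FIG. 4 walk-through for text image blocks 1 to 4, where only the pair (block 1, block 3) is taken to match.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def build_line_list(sequence: List[T],
                    matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Greedy line-list construction over a pre-sorted image block sequence.

    matches(end_block, current_block) is a hypothetical predicate that
    reports whether the current block may follow the block at a line's end.
    """
    line_list: List[List[T]] = []
    for block in sequence:  # steps 320 and 350: walk the sequence in order
        for line in line_list:  # lines are kept in creation order
            if matches(line[-1], block):
                line.append(block)  # step 330: block becomes the new line end
                break
        else:
            # Steps 310 and 340: no line end matched (or the list is empty),
            # so the block opens a new text line.
            line_list.append([block])
    return line_list


# FIG. 4 walk-through, blocks identified by number only.
successful_pairs = {(1, 3)}
lines = build_line_list(
    [1, 2, 3, 4],
    lambda end, cur: (end, cur) in successful_pairs,
)
print(lines)  # [[1, 3], [2], [4]]
```

Because only the block at the end of each text line is compared, the number of comparisons per block is bounded by the current number of text lines rather than by the number of blocks already placed.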

As an optional embodiment, please refer to FIG. 5, which is a flow chart of matching the current text image block in the embodiment of the present application. In step 320, determining the current text image block from the image block sequence and matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list may include the following steps:

510. According to the line information of the current text image block and the line information of the text image block at the end of a first text line in the line list, determine the matching value between the current text image block and the text image block at the end of the first text line, where the first text line is any text line in the line list.

In the embodiment of the present application, after the terminal device determines the current text image block from the first image block sequence, it calculates the matching value between the current text image block and the text image block at the end of the first text line in the line list, and compares the matching value with a preset matching threshold. The matching value represents the degree of matching between two text image blocks: the larger the matching value, the better the two text image blocks match, and the smaller the matching value, the worse they match. Here, the first text line is any text line in the line list.

520. If the matching value is greater than the matching threshold, the current text image block is successfully matched with the text image block at the end of the first text line, and the first text line is used as the target text line.

In the embodiment of the present application, if the matching value between two text image blocks is greater than the matching threshold, the terminal device can determine that the two text image blocks are successfully matched, and take the first text line as the target text line. Here, the target text line is the first text line whose end text image block successfully matches the current text image block.

530. If the matching value is not greater than the matching threshold, the current text image block is not successfully matched with the text image block at the end of the first text line.

In the embodiment of the present application, corresponding to step 520, if the matching value between the two text image blocks is not greater than the matching threshold, the terminal device may determine that the two text image blocks are not successfully matched.

By calculating the matching value and comparing the matching value with the threshold value, it is possible to more quickly determine whether any two text image blocks are successfully matched, thereby improving the typesetting efficiency of the text image.
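The comparison in steps 510 to 530 might be sketched as follows. This passage does not give the formula for the matching value, so `match_value` below is purely a hypothetical stand-in: it scores the distance from the line tail of the end block to the line head of the current block, normalised by the mean line height. Only the strict "greater than the matching threshold" test in `is_match` is taken from the text.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def match_value(end_tail: Point, end_height: float,
                cur_head: Point, cur_height: float) -> float:
    """Hypothetical score in (0, 1]; larger means a better match.

    Assumption for illustration: the closer the current block's line head
    lies to the previous block's line tail, relative to the mean line
    height, the higher the score.
    """
    gap = math.dist(end_tail, cur_head)
    scale = (end_height + cur_height) / 2
    return 1.0 / (1.0 + gap / scale)


def is_match(value: float, threshold: float) -> bool:
    """Steps 520 and 530: success only when the value exceeds the threshold."""
    return value > threshold


touching = match_value((100.0, 50.0), 20.0, (100.0, 50.0), 20.0)
one_height_apart = match_value((100.0, 50.0), 20.0, (120.0, 50.0), 20.0)
```

With these inputs `touching` evaluates to 1.0 and `one_height_apart` to 0.5, so a threshold of 0.5 accepts the first pair and rejects the second, since a value equal to the threshold is not greater than it.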

As an optional embodiment, in step 320, determining the current text image block from the first image block sequence and matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list may include the following steps:

According to the arrangement order of the text lines in the line list, the current text image block is matched in turn with the line information of the text image block at the end of each text line, wherein the text lines in the line list are arranged by creation time from earliest to latest.

And, in step 330, if the current text image block is successfully matched with the text image block at the end of the target text line, adding the current text image block to the end of the target text line to update the text image block at the end of the target text line may include the following steps:

When it is detected that the current text image block successfully matches the text image block at the end of the target text line, add the current text image block to the end of the target text line to update the text image block at the end of the target text line, and stop matching the current text image block.

In the embodiment of the present application, after the terminal device determines the current text image block from the first image block sequence, it matches the current text image block with the line information of the text image block at the end of each text line in a certain order, Wherein, the certain order is the arrangement order of each text line in the line list, and the arrangement order of each text line is arranged from first to last according to the creation time of each text line.

When the terminal device detects that the current text image block successfully matches the text image block at the end of the target text line, the terminal device adds the current text image block to the end of the target text line, so that the current text image block becomes the new text image block at the end of the target text line, and stops matching the current text image block. Please refer to FIG. 4 again. For example, suppose the current text image block determined by the terminal device is text image block 5. According to the above embodiment, the line list currently contains three text lines. Arranged by creation time from earliest to latest, the text line containing text image block 1 and text image block 3 is first, the text line containing text image block 2 is second, and the text line containing text image block 4 is third. The terminal device therefore matches text image block 5 with the line information of the end text image blocks of the three text lines in that order, that is, first with text image block 3, then with text image block 2, and finally with text image block 4. If text image block 5 is successfully matched with text image block 3, the terminal device adds text image block 5 to the end of the text line where text image block 3 is located, so that text image block 5 replaces text image block 3 as the new text image block at the end of that text line, and the terminal device does not go on to match text image block 5 with text image block 2 or text image block 4.

If text image block 5 is not successfully matched with text image block 3, the terminal device matches text image block 5 with text image block 2. If that matching is successful, no matching with text image block 4 is performed. If it is not successful, text image block 5 is finally matched with text image block 4. If that matching is successful, text image block 5 is added to the text line where text image block 4 is located and becomes the text image block at the end of that text line. If it is still not successful, a new text line is created in the line list according to text image block 5.

In the embodiment of the present application, the current text image block is matched in turn with the text image block at the end of each text line and is added to the text line of the first successfully matched text image block, so that the amount of calculation can be reduced and the efficiency of text image typesetting improved.
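The early-stop behaviour described above might be sketched as follows. The names `first_matching_line` and `score`, and the numeric scores, are assumptions for this example only. The usage replays the text image block 5 case from FIG. 4 and records which comparisons are actually made.

```python
from typing import Callable, List, Optional


def first_matching_line(line_list: List[List[int]], block: int,
                        score: Callable[[int, int], float],
                        threshold: float) -> Optional[int]:
    """Return the index of the first line (in creation order) whose end
    block matches, or None; later lines are not compared at all."""
    for index, line in enumerate(line_list):
        if score(line[-1], block) > threshold:
            return index
    return None


comparisons = []


def score(end_block: int, current_block: int) -> float:
    # Hypothetical scores: only block 3 followed by block 5 scores highly.
    comparisons.append((end_block, current_block))
    return 0.9 if (end_block, current_block) == (3, 5) else 0.1


lines = [[1, 3], [2], [4]]  # line list after blocks 1 to 4 in FIG. 4
target = first_matching_line(lines, 5, score, threshold=0.5)
if target is None:
    lines.append([5])       # no match: open a new text line
else:
    lines[target].append(5) # match: block 5 becomes the new line end
```

In this run only the pair (3, 5) is compared, and block 5 joins the first text line; the comparisons against blocks 2 and 4 are skipped.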

As an optional embodiment, in step 330, if the current text image block is successfully matched with the text image block at the end of the target text line, adding the current text image block to the end of the target text line to update the text image block at the end of the target text line may include the following steps:

If the matching value between the current text image block and the text image block at the end of only one target text line satisfies the threshold condition, the current text image block is added to the end of that target text line, so as to update the text image block at the end of the target text line;

If the matching values between the current text image block and the text image blocks at the ends of at least two target text lines satisfy the threshold condition, determine the maximum matching value among the matching values for those end text image blocks, and add the current text image block to the end of the target text line corresponding to the maximum matching value, so as to update the text image block at the end of that target text line.

In the embodiment of the present application, when the terminal device matches the current text image block with the text image block at the end of each text line, if only one of those end text image blocks has a matching value with the current text image block that satisfies the threshold condition, the terminal device adds the current text image block to the end of the target text line where that text image block is located, so as to update the text image block at the end of the target text line.

If the matching values between the current text image block and the text image blocks at the ends of multiple target text lines all satisfy the threshold condition, the terminal device can compare these matching values and select the maximum, add the current text image block to the target text line where the text image block corresponding to the maximum value is located, and take the current text image block as the new text image block at the end of that target text line.

In the embodiment of the present application, when the matching values between the current text image block and multiple text image blocks all satisfy the threshold condition, the text image block with the largest matching value is selected, and the current text image block is added to the target text line where that text image block is located, which effectively guarantees the typesetting accuracy of the text image.
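By way of example, the maximum-value selection might be sketched as below. The function `best_matching_line`, the score table, and the choice of "greater than the threshold" as the threshold condition are assumptions for illustration. When several line ends score equally, this sketch keeps the earliest-created line, a tie-break the text does not specify.

```python
from typing import Callable, List, Optional


def best_matching_line(line_list: List[List[int]], block: int,
                       score: Callable[[int, int], float],
                       threshold: float) -> Optional[int]:
    """Return the index of the line whose end block has the largest matching
    value above the threshold, or None if no line end qualifies."""
    best_index: Optional[int] = None
    best_value = threshold
    for index, line in enumerate(line_list):
        value = score(line[-1], block)
        if value > best_value:  # strict: equal scores keep the earlier line
            best_index, best_value = index, value
    return best_index


# Hypothetical matching values for block 5 against the three line ends.
table = {(3, 5): 0.6, (2, 5): 0.8, (4, 5): 0.2}
lines = [[1, 3], [2], [4]]
target = best_matching_line(lines, 5, lambda e, c: table[(e, c)], 0.5)
if target is None:
    lines.append([5])
else:
    lines[target].append(5)
```

Here both block 3 and block 2 satisfy the threshold condition, and block 5 is added after block 2 because 0.8 is the larger matching value. Unlike the early-stop variant, every line end is compared before a decision is made.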

As an optional embodiment, in step 120, according to the line information corresponding to each text image block included in the first text image, each text image block is matched to generate at least one text line, so as to obtain the first text image corresponding After the list of lines, the following steps can also be performed:

Combine the first text images to obtain a typeset text image, wherein the first text image is any one of at least one first text image, and the at least one first text image may be obtained by performing region segmentation on a full text image.

In the embodiment of the present application, if the terminal device obtains a full text image and performs region segmentation on it to obtain multiple first text images, the terminal device can typeset the text image blocks in all of the first text images at the same time, or typeset the text image blocks in each first text image one by one. After the typesetting of the text image blocks of every first text image is completed, the terminal device combines the first text images to obtain a complete typeset text image, thereby typesetting the entire text image.
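The choice between typesetting the regions simultaneously or one by one might be sketched as follows. Here `typeset_regions` and the `typeset_one` callback are hypothetical names: `typeset_one` stands in for the whole per-region pipeline, and how the typeset regions are finally merged into one image is not specified in this passage, so the sketch only returns the per-region results in region order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")
L = TypeVar("L")


def typeset_regions(regions: Sequence[R],
                    typeset_one: Callable[[R], L],
                    parallel: bool = True) -> List[L]:
    """Typeset every first text image and return results in region order."""
    if parallel:
        # Executor.map yields results in the order of its inputs, so the
        # combined result does not depend on which region finishes first.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(typeset_one, regions))
    return [typeset_one(region) for region in regions]


# Stand-in pipeline for the demo: each "region" is a list of block numbers
# and "typesetting" merely sorts them.
regions = [[3, 1, 2], [5, 4]]
combined_parallel = typeset_regions(regions, sorted, parallel=True)
combined_serial = typeset_regions(regions, sorted, parallel=False)
```

Both modes produce the same combined result, so the choice affects only throughput.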

Please refer to FIG. 6 . FIG. 6 is a schematic structural diagram of a text image typesetting device disclosed in an embodiment of the present application. As shown in FIG. 6 , the text image typesetting device includes: a text detection module 610 and a text typesetting module 620 .

The text detection module 610 is configured to perform text line detection on the first text image, and determine the line information corresponding to each text image block contained in the first text image, wherein the line information includes line height, line head position coordinates, and line end position coordinates;

The text typesetting module 620 is configured to match each text image block according to the line information corresponding to each text image block included in the first text image, and generate at least one text line to obtain a line list corresponding to the first text image, wherein , the matching value between two adjacent text image blocks in each text line satisfies the threshold condition.

As an optional embodiment, the text typesetting module 620 is also used for:

Please refer to FIG. 7 . FIG. 7 is a schematic structural diagram of another text image typesetting device disclosed in an embodiment of the present application. Wherein, the text image typesetting device shown in FIG. 7 is further optimized from the text image typesetting device shown in FIG. 6 . Compared with the text image typesetting device shown in FIG. 6, the text image typesetting device 600 shown in FIG. 7 may further include:

The text sorting module 630 is configured to pre-sort the text image blocks in ascending order of the abscissa of the line head position coordinates of each text image block contained in the first text image, so as to obtain the image block sequence corresponding to the first text image.

As an optional embodiment, the text typesetting module 620 is also used for:

Establish a line list corresponding to the first text image, and create a new text line in the line list according to the text image block ranked first in the first image block sequence corresponding to the first text image;

Determine the current text image block from the first image block sequence, and match the line information of the current text image block with the line information of the text image block at the end of each text line in the line list;

If the current text image block matches successfully with the text image block at the end of the target text line, then the current text image block is added to the end of the target text line to update the text image block at the end of the target text line;

If the current text image block does not match the text image block at the end of each text line, a new text line is created in the line list according to the current text image block;

Use the next text image block in the first image block sequence as the new current text image block, and continue to execute the step of matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list, until the current text image block is the last text image block of the first image block sequence.

As an optional embodiment, the text typesetting module 620 is also used for:

According to the line information of the current text image block and the line information of the text image block at the end of a first text line in the line list, determine the matching value between the current text image block and the text image block at the end of the first text line, where the first text line is any text line in the line list;

As an optional embodiment, the text typesetting module 620 is also used for:

According to the arrangement order of each text line in the line list, match the current text image block with the line information of the text image block at the end of each text line in sequence, wherein each text line in the line list is based on the creation time of each text line Arranged from first to last;

As an optional embodiment, the text typesetting module 620 is also used for:

If the matching values between the current text image block and the text image blocks at the ends of at least two target text lines all satisfy the threshold condition, determine the maximum matching value among the matching values for those end text image blocks, and add the current text image block to the end of the target text line corresponding to the maximum matching value, so as to update the text image block at the end of that target text line.

Please refer to FIG. 8 . FIG. 8 is a schematic structural diagram of another text image typesetting device disclosed in an embodiment of the present application. Wherein, the text image typesetting device shown in FIG. 8 is further optimized from the text image typesetting device shown in FIG. 6 . Compared with the text image typesetting device shown in FIG. 6, the text image typesetting device 600 shown in FIG. 8 may further include:

The text segmentation module 640 is configured to perform region segmentation on the full text image to obtain at least one first text image, wherein the first text image is any first text image in the at least one first text image.

Please refer to FIG. 9 . FIG. 9 is a schematic structural diagram of an electronic device disclosed by an embodiment. As shown in FIG. 9, the electronic device 900 may include:

a memory 910 storing executable program code;

a processor 920 coupled to the memory 910;

Wherein, the processor 920 invokes the executable program code stored in the memory 910 to execute any text image typesetting method disclosed in the embodiments of the present application.

It should be noted that the electronic device shown in FIG. 9 may also include components not shown, such as a power supply, an input button, a camera, a speaker, a screen, an RF circuit, a Wi-Fi module, a Bluetooth module, and a sensor, which will not be described in detail in this embodiment.

The embodiment of the present application discloses a computer-readable storage medium, which stores a computer program, wherein the computer program causes a computer to execute any typesetting method of a text image disclosed in the embodiment of the present application.

The embodiment of the present application discloses a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute any text image typesetting method disclosed in the embodiments of the present application.

It should be understood that reference throughout the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Thus, appearances of "in one embodiment" or "in an embodiment" in various places throughout the specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by this application.

In the various embodiments of the present application, it should be understood that the sequence numbers of the above-mentioned processes do not necessarily indicate the order of execution, and do not constitute any limitation on the implementation of the embodiments.

The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, located in one place, or distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

If the above-mentioned integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and specifically a processor in the computer device) to execute some or all of the steps of the above-mentioned methods in the various embodiments of the present application.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by a program instructing related hardware, and the program can be stored in a computer-readable storage medium. The storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, tape storage, or any other computer-readable medium that can be used to carry or store data.

The text image typesetting method, device, electronic device, and storage medium disclosed in the embodiments of the present application have been described above in detail. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application. At the same time, those skilled in the art may, based on the idea of the present application, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

A method for typesetting text images, characterized in that the method comprises:

Perform text line detection on the first text image, and determine line information corresponding to each text image block contained in the first text image, wherein the line information includes line height, line head position coordinates, and line end position coordinates;

According to the line information corresponding to each text image block included in the first text image, match each text image block to generate at least one text line, so as to obtain a line list corresponding to the first text image, wherein the matching value between two adjacent text image blocks in each text line satisfies a threshold condition.
The method according to claim 1, characterized in that, according to the line information corresponding to each text image block included in the first text image, matching each text image block to generate at least one text line comprises:

According to the line information corresponding to each text image block contained in the first text image, determine the matching values between a first text image block and each of the other text image blocks, wherein the first text image block is any text image block in the first text image, and the other text image blocks are the text image blocks in the first text image other than the first text image block;

Determine the maximum matching value among the matching values, and add the first text image block to the other text image block corresponding to the maximum matching value, so as to generate at least one text line.
The method according to claim 1, characterized in that, after performing text line detection on the first text image and determining the line information corresponding to each text image block contained in the first text image, the method further comprises:

According to the abscissa of the line head position coordinates of each text image block included in the first text image, pre-sort the text image blocks in ascending order of the abscissa to obtain a first image block sequence corresponding to the first text image.
The method according to claim 3, characterized in that, according to the line information corresponding to each text image block included in the first text image, matching each text image block to generate at least one text line, so as to obtain the line list corresponding to the first text image, comprises:

Establishing a line list corresponding to the first text image, and creating a new text line in the line list according to the text image block arranged first in the first image block sequence corresponding to the first text image;

Determining the current text image block from the first image block sequence, and matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list;

If the current text image block is successfully matched with the text image block at the end of a target text line, then the current text image block is added to the end of the target text line to update the text image block at the end of the target text line;

If the current text image block and the text image blocks at the end of each text line are not matched successfully, then create a new text line in the line list according to the current text image block;

Taking the next text image block in the first image block sequence as a new current text image block, and continuing to perform the step of matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list, until the current text image block is the last text image block of the first image block sequence.
The method according to claim 4, characterized in that determining the current text image block from the first image block sequence, and matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list, comprises:

According to the line information of the current text image block and the line information of the text image block at the end of a first text line in the line list, determine the matching value between the current text image block and the text image block at the end of the first text line, wherein the first text line is any text line in the line list;

If the matching value is greater than the matching threshold, the current text image block is successfully matched with the text image block at the end of the first text line, and the first text line is used as the target text line;

If the matching value is not greater than the matching threshold, the current text image block is not successfully matched with the text image block at the end of the first text line.
The method according to claim 5, wherein, if the current text image block is successfully matched with the text image block at the end of the target text line, adding the current text image block to the end of the target text line to update the text image block at the end of the target text line comprises:

If the matching value between the current text image block and the text image block at the end of one target text line satisfies the threshold condition, then the current text image block is added to the end of that target text line to update the text image block at the end of the target text line;

If the matching values between the current text image block and the text image blocks at the end of at least two target text lines all satisfy the threshold condition, then determine the maximum matching value among those matching values, and add the current text image block to the end of the target text line corresponding to the maximum matching value, so as to update the text image block at the end of that target text line.
The method according to claim 4, characterized in that determining the current text image block from the first image block sequence, and matching the line information of the current text image block with the line information of the text image block at the end of each text line in the line list, comprises:

According to the arrangement order of the text lines in the line list, the line information of the current text image block is matched in turn with the line information of the text image block at the end of each text line, wherein the text lines in the line list are arranged from earliest to latest according to the creation time of each text line;

And, if the current text image block is successfully matched with the text image block at the end of the target text line, adding the current text image block to the end of the target text line to update the text image block at the end of the target text line comprises:

When it is detected that the current text image block is successfully matched with the text image block at the end of the target text line, the current text image block is added to the end of the target text line to update the text image block at the end of the target text line, and matching of the current text image block is stopped.
The method according to any one of claims 1-7, characterized in that, before performing text line detection on the first text image and determining the line information corresponding to each text image block included in the first text image, the method further comprises:

Perform region segmentation on the full text image to obtain at least one first text image, wherein the first text image is any first text image in the at least one first text image.
A typesetting device for text images, characterized in that the device includes:

A text detection module, configured to perform text line detection on the first text image, and determine line information corresponding to each text image block contained in the first text image, wherein the line information includes line height, line head position coordinates, and line end position coordinates;

A text typesetting module, configured to match each text image block according to the line information corresponding to each text image block contained in the first text image, and generate at least one text line, so as to obtain a line list corresponding to the first text image, wherein the matching value between two adjacent text image blocks in each text line satisfies a threshold condition.
An electronic device, characterized by comprising a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor implements the method according to any one of claims 1 to 8.
A computer-readable storage medium on which a computer program is stored, wherein the computer program implements the method according to any one of claims 1 to 8 when executed by a processor.
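
As a non-limiting illustration, the pre-sorting and end-of-line matching steps recited in claims 3 to 6 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: the names Block, match_value, and build_line_list, the threshold of 0.5, and in particular the scoring formula are all hypothetical, since this section does not define how the matching value is computed. The sketch assumes a score in [0, 1] derived from the vertical gap between the line end of one block and the line head of the next, relative to the line height.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Block:
    """Line information of one text image block, as listed in claim 1."""
    height: float  # line height
    head: Point    # (x, y) line head position coordinates
    tail: Point    # (x, y) line end position coordinates


def match_value(prev: Block, cur: Block) -> float:
    """Hypothetical matching value in [0, 1] (an assumption, not from the source).

    Scores how well `cur` continues `prev`, using the vertical gap between
    the line end of `prev` and the line head of `cur`, scaled by line height.
    """
    scale = max(prev.height, cur.height)
    if scale <= 0:
        return 0.0
    gap = abs(cur.head[1] - prev.tail[1])
    return max(0.0, 1.0 - gap / scale)


def build_line_list(
    blocks: List[Block],
    threshold: float = 0.5,  # assumed matching threshold
    score: Callable[[Block, Block], float] = match_value,
) -> List[List[Block]]:
    """Group blocks into text lines by matching against each line's last block."""
    # Pre-sort by the abscissa of the line head, smallest first (claim 3).
    ordered = sorted(blocks, key=lambda b: b.head[0])
    lines: List[List[Block]] = []  # line list, kept in creation order
    for cur in ordered:
        best_line: Optional[List[Block]] = None
        best_value = threshold
        for line in lines:
            value = score(line[-1], cur)  # compare with the block at the line end
            # A match must exceed the threshold (claim 5); when several lines
            # match, the one with the maximum value is kept (claim 6).
            if value > best_value:
                best_line, best_value = line, value
        if best_line is None:
            lines.append([cur])    # no match: create a new text line (claim 4)
        else:
            best_line.append(cur)  # match: cur becomes the new end of that line
    return lines
```

Under these assumptions, four blocks taken from two slightly skewed lines are grouped into two text lines of two blocks each, even though the pre-sorted sequence interleaves them. The first-match variant of claim 7 could be obtained by leaving the inner loop as soon as a value exceeds the threshold instead of searching for the maximum.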