CN115392188A - Method and device for generating editable document based on non-editable image-text images - Google Patents

Method and device for generating editable document based on non-editable image-text images

Info

Publication number
CN115392188A
CN115392188A (application number CN202211036598.0A)
Authority
CN
China
Prior art keywords
text
editable
image
contour
features
Prior art date
Legal status
Pending
Application number
CN202211036598.0A
Other languages
Chinese (zh)
Inventor
李青
郑悦闻
王飞
李鹏飞
Current Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority application: CN202211036598.0A
Publication: CN115392188A
Related PCT application: PCT/CN2023/092757 (WO2024041032A1)
Legal status: Pending


Classifications

    • G06F40/103: Handling natural language data; Text processing; Formatting, i.e. changing of presentation of documents
    • G06F40/151: Handling natural language data; Text processing; Use of codes for handling textual entities; Transformation
    • G06F40/166: Handling natural language data; Text processing; Editing, e.g. inserting or deleting
    • G06V30/19147: Character recognition; Recognition using electronic means; Design or setup of recognition systems or techniques; Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V30/19173: Character recognition; Recognition using electronic means; Design or setup of recognition systems or techniques; Classification techniques
    • G06V30/414: Document-oriented image-based pattern recognition; Analysis of document content; Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Abstract

The invention relates to a method and a device for generating an editable document from a non-editable image-text image. The method comprises: acquiring a non-editable image-text image; extracting contour features and text features from the non-editable image-text image; generating initial structured data from the contour features and the text features; determining the relationship between every two elements in the non-editable image-text image based on a pre-trained element relationship classification model, the contour features and the text features, wherein the pre-trained element relationship classification model is obtained by training on a data set composed of the contour features and/or text features of two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationships between elements to obtain final structured data; and generating an editable document corresponding to the non-editable image-text image based on the final structured data. With this method, non-editable image-text data can be efficiently and accurately converted into an editable document.

Description

Method and device for generating editable document based on non-editable image-text images
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for generating an editable document based on non-editable image-text images.
Background
Because image-text images cannot be edited, the image-text data they contain cannot be quickly modified on the basis of the existing elements. If non-editable image-text data needs to be saved as an editable document, the document still has to be produced manually from scratch, so the utilization rate of the image-text data is low and the manual work is time-consuming and labor-intensive. How to efficiently and accurately convert a non-editable image-text image into an editable document is therefore a technical problem to be solved.
Disclosure of Invention
The invention provides a method and a device for generating an editable document based on a non-editable image-text image, which are used to solve the problems in the prior art that saving non-editable image-text data as an editable document is time-consuming and inefficient, and to achieve efficient and accurate conversion of non-editable image-text images into editable documents.
A method of generating an editable document based on a non-editable image-text image, the method comprising: acquiring a non-editable image-text image; extracting contour features from the non-editable image-text image; extracting text features from the non-editable image-text image; wherein the contour features comprise the shape, position and size of a contour and the color within the contour, and the text features comprise text box coordinates, text content, text color and font size; generating initial structured data from the contour features and the text features; determining the relationship between two elements in the non-editable image-text image based on a pre-trained element relationship classification model, the contour features and the text features, wherein the pre-trained element relationship classification model is obtained by training on a data set composed of the contour features and/or text features of two elements and the relationship labels of the two elements; supplementing the initial structured data with the relationship between the two elements to obtain final structured data; and generating an editable document corresponding to the non-editable image-text image based on the final structured data.
In one embodiment, the pre-trained element relationship classification model includes a plurality of pre-trained binary classification models. Accordingly, determining the relationship between two elements in the non-editable image-text image based on the pre-trained element relationship classification model, the contour features and the text features includes: determining a classification result for the relationship between every two elements in the non-editable image-text image based on each pre-trained binary classification model, the contour features and the text features; and determining the final classification result for the relationship between every two elements in the non-editable image-text image as the classification result with the largest probability value among the determined classification results.
In one embodiment, the process of determining the pre-trained element relationship classification model by training on a data set composed of the contour features and/or text features of two elements and the relationship labels of the two elements includes: acquiring contour features and text features of a plurality of non-editable image-text images; determining a data set based on the contour features and/or text features corresponding to every two elements in each non-editable image-text image; determining a relationship label for every two elements based on the relationship between the two elements; determining the samples of the data set as positive samples and negative samples based on the relationship labels of every two elements and the binary classification model corresponding to each relationship label; and training the corresponding binary classification model based on the positive samples and the negative samples to obtain the pre-trained binary classification models.
In one embodiment, determining the text features in the non-editable image-text image based on a text detection and text recognition method includes: determining the text boxes included in the non-editable image-text image and their coordinates based on a preset text box detection algorithm; determining the text content included in each text box based on a preset text recognition algorithm; determining the text color in each text box according to the coordinates of each text box and the pixel histogram within the text box; and determining the size of the font within each text box based on the coordinates of the text box.
In one embodiment, determining the contour features in the non-editable image-text image based on a contour detection and shape recognition method includes: determining at least one contour included in the non-editable image-text image based on a preset contour detection algorithm; identifying the shape of each of the at least one contour based on a shape recognition model built on a pre-trained residual neural network; determining the relative size of each shape based on the size of the minimum circumscribed rectangle of the shape of the corresponding contour, and determining the position of each contour according to the coordinates of a preset position of its shape; and determining the color of each contour based on the color corresponding to the centroid coordinates of the contour.
In one embodiment, determining at least one contour included in the non-editable image-text image based on a preset contour detection algorithm includes: determining a set of contours contained in the non-editable image-text image based on the preset contour detection algorithm; filtering out contours that coincide with text boxes according to the coordinates of each contour and of the text boxes; and determining the at least one contour from the remaining contours.
In one embodiment, generating an editable document corresponding to the non-editable image-text image based on the final structured data includes: acquiring and displaying the final structured data; and generating, at the corresponding positions of a canvas, the image corresponding to the contour features and the text corresponding to the text features based on the final structured data, thereby determining an initial editable document corresponding to the non-editable image-text image.
In one embodiment, after generating the image corresponding to the contour features and the text corresponding to the text features at the corresponding positions of the canvas based on the final structured data, the method further comprises: in response to a user operation, adding to, modifying or deleting content of the initial editable document.
The invention also provides a device for generating an editable document based on a non-editable image-text image, the device comprising: an acquisition module for acquiring the non-editable image-text image; a determining module for extracting contour features from the non-editable image-text image and extracting text features from the non-editable image-text image, wherein the contour features comprise the shape, position and size of a contour and the color within the contour, and the text features comprise text box coordinates, text content, text color and font size; a first generation module for generating structured data from the contour features and the text features; and a second generation module for generating an editable document corresponding to the non-editable image-text image based on the final structured data.
The invention also provides a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to carry out the steps of the above method of generating an editable document based on a non-editable image-text image.
The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to carry out the steps of the above method of generating an editable document based on a non-editable image-text image.
According to the method and the device for generating an editable document based on a non-editable image-text image, the elements contained in the non-editable image-text image and their corresponding attributes can be determined from the contour features and text features of the image, and initial structured data is generated from these elements and attributes. The relationship between every two elements is then determined with a pre-trained element relationship classification model and supplemented into the initial structured data to obtain final structured data, from which the editable document corresponding to the non-editable image-text image is generated. The non-editable image-text image therefore no longer needs to be converted into an editable document manually, and the conversion is carried out efficiently and accurately.
Drawings
Fig. 1 is a first schematic flowchart of the method for generating an editable document based on a non-editable image-text image provided by the present invention;
fig. 2 is a second schematic flowchart of the method for generating an editable document based on a non-editable image-text image provided by the present invention;
fig. 3 is a third schematic flowchart of the method for generating an editable document based on a non-editable image-text image provided by the present invention;
fig. 4 is a fourth schematic flowchart of the method for generating an editable document based on a non-editable image-text image provided by the present invention;
fig. 5 is a fifth schematic flowchart of the method for generating an editable document based on a non-editable image-text image provided by the present invention;
fig. 6 is a sixth schematic flowchart of the method for generating an editable document based on a non-editable image-text image provided by the present invention;
fig. 7 is a schematic diagram of an image-text image provided by the present invention;
fig. 8 is a schematic diagram of the initial structured data provided by the present invention;
fig. 9 is a schematic diagram of the final structured data provided by the present invention;
fig. 10 is a schematic diagram of a display interface of the image-text editor provided by the present invention;
fig. 11 is a block diagram of the apparatus for generating an editable document based on a non-editable image-text image provided by the present invention;
fig. 12 is a schematic diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by one having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
For convenience of understanding, technical terms related to the present application are explained.
(1) Connectionist Text Proposal Network (CTPN)
CTPN is a text detection algorithm proposed at ECCV 2016. It is a deep neural network combining a convolutional neural network (CNN) with a long short-term memory network (LSTM), and can effectively detect horizontally distributed text in complex scenes.
In the invention, CTPN is used for detecting text boxes in non-editable image-text type images.
(2) Non-maximum suppression (NMS)
Non-maximum suppression means suppressing elements that are not local maxima and searching for local maxima. In common object detection algorithms (e.g., R-CNN, SPP-Net, Fast R-CNN), many rectangular boxes that may contain objects are first found in a picture, and a classification probability is computed for each rectangular box; NMS is then used to screen out a subset of these rectangular boxes. In the invention, NMS is used to filter the text boxes detected in the non-editable image-text image, so that the text boxes obtained by CTPN detection are closer to the text boxes in the original non-editable image-text image.
(3) Text line construction algorithm
The text line construction algorithm is used to connect the text boxes retained after NMS filtering so as to form complete text detection boxes.
(4) OpenCV
OpenCV is a cross-platform computer vision and machine learning software library released under the Apache 2.0 license (open source), which can run on the Linux, Windows, Android and macOS operating systems. OpenCV is lightweight and efficient: it consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby and MATLAB, and implements many general-purpose algorithms in image processing and computer vision.
(5) Convolutional Recurrent Neural Network (CRNN)
CRNN is mainly used for end-to-end recognition of text sequences of indefinite length. It converts text recognition into a time-series-dependent sequence learning problem, i.e. sequence recognition based on the image, without cutting out individual characters.
(6) Connectionist Temporal Classification (CTC)
CTC is used to solve the alignment problem between the input data and a given label and can be used for end-to-end training, outputting a sequence result of indefinite length. It can be understood that, because there are intervals between characters in a natural-scene text image and the image may be deformed, the same character can have different appearances and may appear repeatedly in the recognition result; the CTC model can therefore be used to remove interval characters and repeated characters from the character recognition result.
(7) Structured data
Structured data, also called row data, is data logically represented and implemented by a two-dimensional table structure, strictly following the data format and length specifications, and mainly stored and managed by a relational database.
The method for generating an editable document based on a non-editable image-text image provided by the invention solves the problem that image-text data can only be stored as a non-editable image, as well as the problem that a user who tries to edit such image-text data in a document previously had to recreate it from scratch; it is used to efficiently convert non-editable image-text data into editable image-text data.
The following describes a method and an apparatus for generating an editable document based on a non-editable image-text based on the present invention with reference to the drawings.
Fig. 1 is a schematic flowchart of the method for generating an editable document based on a non-editable image-text image provided by the present invention. It will be appreciated that the method may be performed by an apparatus for generating an editable document based on a non-editable image-text image, and the apparatus may be a computer device.
As shown in fig. 1, in an embodiment, a method for generating an editable document based on a non-editable image-text image is provided, which may specifically include the following steps:
and step 110, obtaining the non-editable image-text images.
An image-text image includes images and text, and the images and text have associated attributes. For example, an image has attributes such as the shape, position and size of its contour and the color within the contour, while text has text box coordinates, text content, text color and font size. A specific image-text image is shown in fig. 7 below and is not described in detail here.
A non-editable image-text image is an image in which the shape, size and color of the images or graphics it contains cannot be adjusted directly, such as a picture in PNG or JPG format.
Step 120, extracting contour features from the non-editable image-text images; text features are extracted from the non-editable image-text type image.
The contour features comprise the shape, position and size of the contour and the color within the contour; the text features comprise text box coordinates, text content, text color and font size.
The contour detection and shape recognition method is used to detect the relevant attribute features of the contours included in the non-editable image-text image, such as the shape, position, size and the color within the contour. The text detection and recognition method is used to detect the relevant attribute features of the text included in the non-editable image-text image, such as the text box coordinates, text content, text color and font size.
It will be appreciated that by determining the contour features and the text features, the elements and corresponding attributes contained in the image-text image can be determined, and the initial structured data can be generated from these elements and attributes.
At step 130, initial structured data is generated based on the outline features and the text features.
It will be appreciated that generating the initial structured data from the outline features and the text features facilitates subsequent determination of the final structured data from the initial structured data.
Specifically, the contour features and the text features may be stored in the form of a dictionary. For example, the contour dictionary S = {contour C1: (position coordinates, color attribute value, shape size), contour C2: (position coordinates, color attribute value, shape size), ..., contour Ck: (position coordinates, color attribute value, shape size)}. As another example, the image-class shapes J = {image 1: (image path, shape position coordinates, shape size), image 2: (image path, shape position coordinates, shape size), ..., image z: (image path, shape position coordinates, shape size)}. Each of contour C1, contour C2, ..., contour Ck and image 1, image 2, ..., image z is an element.
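As a purely illustrative sketch (the field names and the helper function below are hypothetical and not part of the claimed method), the initial structured data could be assembled from the extracted features in Python roughly as follows:

```python
# Illustrative sketch only: field names ("shapes", "images", "texts") and the
# expected keys of the feature dictionaries are assumptions, not part of the method.

def build_initial_structured_data(contour_features, image_features, text_features):
    """Merge contour, image and text features into one structured-data dictionary."""
    data = {"shapes": [], "images": [], "texts": []}
    for i, c in enumerate(contour_features):
        data["shapes"].append({
            "id": f"element {i + 1}",
            "type": c["shape"],          # e.g. an arrow or basic shape class
            "color": c["color"],         # (r, g, b)
            "size": c["size"],           # (height, width)
            "position": c["position"],   # ((x1, y1), (x2, y2))
        })
    offset = len(contour_features)
    for j, im in enumerate(image_features):
        data["images"].append({
            "id": f"element {offset + j + 1}",
            "path": im["path"],          # saving path of the image, e.g. "fig1.jpg"
            "size": im["size"],
            "position": im["position"],
        })
    offset += len(image_features)
    for k, t in enumerate(text_features):
        data["texts"].append({
            "id": f"element {offset + k + 1}",
            "content": t["content"],     # recognized text content
            "color": t["color"],
            "font_size": t["font_size"],
            "position": t["box"],        # text box diagonal coordinates
        })
    return data
```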
And step 140, determining the relationship between the two elements in the non-editable image-text images based on the pre-trained element relationship classification model and the outline characteristics and text characteristics.
The pre-trained element relationship classification model is obtained by training on a data set composed of the contour features and/or text features of two elements and the relationship labels of the two elements. An element is a contour or a piece of text included in the non-editable image-text image, such as a graphic, an image, a text box or text content.
Further, the relationship tags of two elements are used to identify a relationship between the two elements, which may be, for example, surrounding, containing, associated, or independent. Where surrounding is understood to mean that one element is around another element, e.g. the text is above the arrow. Inclusion may be understood as one element including another element, such as text within an outline. Association is understood to mean that there is contact between two elements, such as arrows connected to text boxes, arrows connected to other shapes, etc. Independent is understood to mean relationships other than the three types of relationships described above.
It can be understood that the initial structured data includes features such as the shape, position and size of each contour, the color within the contour, text box coordinates, text content, text color and font size. In order to represent the relationships between the elements in the non-editable image-text image more comprehensively and accurately, the relationship between every two elements can be further determined based on a pre-trained element relationship classification model and supplemented into the initial structured data to obtain the final structured data.
And 150, supplementing the initial structured data based on the relationship between the two elements to obtain final structured data.
The final structured data can be referred to fig. 9 later, and will not be described in detail here.
And step 160, generating an editable document corresponding to the non-editable image-text images based on the final structured data.
The editable document corresponding to the non-editable image-text image is a document in which the size and color of the shapes and the fonts can be adjusted directly, such as a document in the Visio format.
With the method for generating an editable document based on a non-editable image-text image, the contour features and text features of the non-editable image-text image are determined, the elements contained in the image and their attributes are thereby obtained, and initial structured data is generated from these elements and attributes; the relationship between every two elements is then determined with a pre-trained element relationship classification model and supplemented into the initial structured data to obtain final structured data, from which the editable document corresponding to the non-editable image-text image is generated. The non-editable image-text image therefore no longer needs to be converted into an editable document manually, and the conversion is carried out efficiently and accurately.
In one embodiment, the pre-trained element relationship classification model includes a plurality of binary classification models, and accordingly, as shown in fig. 2, the method for determining the relationship between two elements in the non-editable image-text image based on the pre-trained element relationship classification model, the contour feature and the text feature includes the following steps:
and step 210, determining a classification result of the relationship between the two elements in the non-editable image-text images based on each pre-trained two-classification model and the outline characteristics and the text characteristics.
Each binary classification model is used to determine whether the relationship between two elements is the relationship corresponding to a preset relationship label; for example, one model may be used to determine whether the relationship between two elements is an inclusion relationship. The binary classification model may be a support vector machine (SVM).
The classification result may be, for example, that the probability that the relationship between two elements is an inclusion relationship is 90%, and the probability that the relationship between two elements is not an inclusion relationship is 10%.
In addition, assuming that a non-editable image-text image has t contours and c texts, the classification results of the relationships corresponding to (t + c) × (t + c - 1) / 2 element combinations are computed.
And step 220, determining a final classification result of the relationship between every two elements in the non-editable image-text images based on the classification result with the maximum probability value in the plurality of determined classification results.
It will be appreciated that each relationship label may have its own binary classification model, so a plurality of relationship labels corresponds to a plurality of binary classification models. In order to determine the relationship between two elements, the classification result with the highest probability value among the classification results of the binary classification models can be taken as the final classification result. For example, if the classification results of the binary classification models for the same pair of elements are: probability of an inclusion relationship 90%, probability of an association relationship 20%, probability of an independent relationship 15% and probability of a surrounding relationship 10%, then the classification result corresponding to 90% is taken as the final classification result, i.e. the relationship between the two elements is determined to be an inclusion relationship.
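A minimal sketch of this maximum-probability fusion is shown below; it assumes scikit-learn SVM classifiers trained with probability=True and that column 1 of predict_proba is the positive ("has this relation") class, both of which are assumptions made only for illustration:

```python
# Sketch of maximum-probability fusion over several binary classifiers.
# Assumes scikit-learn SVC models trained with probability=True and that
# column 1 of predict_proba is the "has this relation" class.
import numpy as np

def classify_relation(pair_feature, binary_models):
    """binary_models: dict mapping a relation label ('surround', 'contain',
    'associate', 'independent') to a trained binary classifier."""
    x = np.asarray(pair_feature, dtype=float).reshape(1, -1)
    best_label, best_prob = None, -1.0
    for label, model in binary_models.items():
        prob = model.predict_proba(x)[0, 1]   # probability of the positive class
        if prob > best_prob:
            best_label, best_prob = label, prob
    return best_label, best_prob
```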
In one embodiment, as shown in fig. 3, the process of determining a pre-trained element relationship classification model after training based on a data set composed of outline features and/or text features of two elements and relationship labels of the two elements includes the following steps:
in step 310, contour features and text features of a plurality of non-editable image-text images are obtained.
For the process of acquiring the outline features and text features of the non-editable image-text images, reference may be made to the related description in the foregoing, and for brevity, details are not described here again.
In addition, after the contour features and the text features are obtained, the feature values in the contour features and text features can be normalized, and the feature values can be encoded using one-hot encoding.
Step 320, determining a data set based on the corresponding outline features and/or text features of every two elements in each non-editable image-text type image; and determining a relationship label for each two elements based on the relationship of each two elements.
It can be understood that, in order to determine the relationship between every two elements in each non-editable image-text class image, the outline feature and/or the text feature corresponding to every two elements contained in each non-editable image-text class image may be used as a sample, and the relationship between every two elements may be used as a label of the corresponding sample, so as to determine a data set for training the binary classification model.
In addition, after the relationship label of every two elements is determined, one-hot encoding can be used to encode the relationship labels for subsequent processing.
And step 330, determining the samples of the data set as positive samples and negative samples based on the relationship labels of every two elements and the binary classification models corresponding to the relationship labels.
It will be appreciated that each relationship label may have its own binary classification model. The corresponding binary classification models may be, for example: a model for judging whether the relationship between two elements is a surrounding relationship, a model for judging whether it is an inclusion relationship, a model for judging whether it is an association relationship, and a model for judging whether it is an independent relationship.
Specifically, the samples whose relationship label matches the label corresponding to a binary classification model may be taken as its positive samples, and the samples with other relationship labels as its negative samples. For example, for the binary classification model that judges whether the relationship between two elements is an association relationship, samples in which the relationship is an association relationship are determined as positive samples, and samples in which the relationship is a surrounding, inclusion or independent relationship are determined as negative samples.
It can also be understood that, since each relationship label corresponds to its own binary classification model, the positive and negative samples differ from one binary classification model to another.
And step 340, training the corresponding binary classification models based on the positive samples and the negative samples to obtain the pre-trained binary classification models.
Specifically, the positive samples and negative samples can be divided into a training set, a validation set and a test set according to a preset ratio, and the corresponding binary classification model is then trained on the training set to obtain the pre-trained binary classification model.
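A possible sketch of this per-label training loop is given below, assuming scikit-learn and an illustrative 80/20 split; the patent itself does not fix the library or the ratio:

```python
# Sketch of the per-label training loop: each relation label gets its own binary
# SVM, with that label's pairs as positives and all other pairs as negatives.
# scikit-learn and the 80/20 split are illustrative choices.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_relation_classifiers(pair_features, pair_labels, relation_labels):
    """pair_features: per-pair feature vectors; pair_labels: relation label of each pair."""
    models = {}
    for relation in relation_labels:
        y = [1 if lab == relation else 0 for lab in pair_labels]   # positives vs negatives
        X_train, X_test, y_train, y_test = train_test_split(
            pair_features, y, test_size=0.2, random_state=0)
        clf = SVC(probability=True)        # probability=True enables predict_proba later
        clf.fit(X_train, y_train)
        print(relation, "held-out accuracy:", clf.score(X_test, y_test))
        models[relation] = clf
    return models
```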
It can be understood that, with reference to the related description of fig. 2, after the pre-trained binary classification models are obtained, the relationship between every two elements in the non-editable image-text image can be determined based on the pre-trained binary classification models.
In one embodiment, as shown in fig. 4, determining the text features in the non-editable image-text image based on a text detection and text recognition method includes the following steps:
and step 410, determining a text box and coordinates thereof included in the non-editable image-text images based on a preset text box detection algorithm.
Specifically, a CTPN text detection network can be used as the preset text box detection algorithm to perform text detection on the non-editable image-text image and obtain initial text boxes; an NMS algorithm is then used to filter redundant boxes out of the initial text boxes; and finally a text line construction algorithm is used to connect text boxes belonging to the same text sequence to obtain the connected text boxes.
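For illustration, a minimal non-maximum suppression routine over candidate boxes might look as follows; the (x1, y1, x2, y2, score) box format and the 0.5 IoU threshold are assumptions, not values prescribed by the method:

```python
# Minimal non-maximum suppression sketch. Each candidate box is
# (x1, y1, x2, y2, score); the IoU threshold 0.5 is illustrative.
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, iou_threshold=0.5):
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)   # highest score first
    kept = []
    for box in boxes:
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```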
It can be understood that because there may be no text in the non-editable image, if the corresponding initial text box is not detected based on the predetermined text box detection algorithm, the subsequent related processing on the initial text box is not required.
It can be further understood that, before the CTPN text detection network is used to perform text detection on the non-editable image-text image, the image can be preprocessed to facilitate the subsequent detection of text boxes and characters. For example, graying and binarization can be applied to the non-editable image-text image; graying can be implemented with the OpenCV function cv2.cvtColor, and binarization with the OpenCV function cv2.threshold.
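A short OpenCV sketch of this preprocessing step (the file name and the threshold value are placeholders):

```python
# Preprocessing sketch: graying followed by binarization with OpenCV.
# The file name and the threshold value 127 are placeholders.
import cv2

img = cv2.imread("diagram.png")                                # non-editable image-text image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)                   # image graying
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)   # binarization
```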
In addition, after the text boxes are obtained, in order to subsequently generate text boxes that match the character inclination angle and the text box area, the connected text boxes need to be corrected, finally yielding a text box coordinate set T = {T1, T2, ..., Tt} for the t text boxes contained in the non-editable image-text image. It will be appreciated that a connected text box may be represented by the coordinates of its diagonal; thus an element Ti in the coordinate set T is recorded as (x1, y1, x2, y2), where x1 and x2 are the abscissas of the two endpoints of the diagonal corresponding to the text box, and y1 and y2 are the corresponding ordinates.
Step 420, determining the text content included in each text box based on a preset text recognition algorithm.
Specifically, a CRNN text recognition algorithm may be adopted as the preset text recognition algorithm to recognize the text content included in each text box. First, a MobileNetV3 network can be used in the CRNN algorithm to extract features from the image regions corresponding to the input text detection box set T, yielding a feature map; the input image height can be 32 and the width any number larger than 0, and the MobileNetV3 network reduces the feature map height to 1. Second, an Im2Seq layer can be used to convert the feature map into a feature sequence suitable for the subsequent sequence model. The feature sequence is then fed into a BiLSTM model, which learns from the sequence, and a fully connected layer produces the predicted label distribution. Finally, the predicted label distribution is fed into a CTC layer and decoded to obtain the text content recognized for the text box coordinate set T.
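The recognition head can be sketched in PyTorch as follows; this is a simplified illustration in which the MobileNetV3 backbone is omitted, greedy decoding stands in for the CTC layer, and all dimensions are assumed values:

```python
# Simplified CRNN recognition head: the backbone feature map (height 1) is
# reshaped into a sequence, passed through a BiLSTM and a fully connected
# layer, then greedily decoded (collapse repeats, drop the blank symbol).
import torch.nn as nn

class CRNNHead(nn.Module):
    def __init__(self, feat_channels=512, hidden=256, num_classes=100):
        super().__init__()
        self.rnn = nn.LSTM(feat_channels, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)    # num_classes includes the blank at index 0

    def forward(self, feature_map):                     # (N, C, 1, W) from the backbone
        seq = feature_map.squeeze(2).permute(0, 2, 1)   # Im2Seq: (N, W, C)
        seq, _ = self.rnn(seq)                          # BiLSTM over the width dimension
        return self.fc(seq)                             # (N, W, num_classes) label distribution

def greedy_ctc_decode(logits, blank=0):
    best = logits.argmax(dim=-1)[0].tolist()            # best class per time step, first sample
    decoded, prev = [], None
    for idx in best:
        if idx != blank and idx != prev:
            decoded.append(idx)
        prev = idx
    return decoded                                       # character indices to map back to text
```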
Step 430, determining the text color in each text box according to the coordinates of each text box and the pixel histogram in each text box.
Specifically, using the text box coordinate set T, the pixel histogram within each text box in the non-editable image-text image can be computed from the coordinates of that text box, and the text color value obtained from it.
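An illustrative sketch of estimating the text color from the pixels inside one box; the assumption that the most frequent color is the background and the second most frequent is the text is a simplification made only for illustration:

```python
# Sketch: estimate the text color inside one text box from its pixel statistics.
# Assumes the most frequent color is the background and the next one is the text.
import numpy as np

def text_color(image, box):
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2].reshape(-1, 3)                  # pixels inside the text box
    colors, counts = np.unique(region, axis=0, return_counts=True)
    order = np.argsort(counts)[::-1]                             # most frequent first
    pick = order[1] if len(order) > 1 else order[0]
    return tuple(int(v) for v in colors[pick])
```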
Step 440, determining the size of the font within each text box based on the coordinates of the text box.
It will be appreciated that the height of each text box may be determined from the coordinates of each text box, and thus the relative size of the font within the text box may be determined from the height of each text box.
In one embodiment, as shown in fig. 5, determining the contour features in the non-editable image-text image based on a contour detection and shape recognition method includes the following steps:
step 510, determining at least one contour included in the non-editable image-text images based on a preset contour detection algorithm.
Specifically, step 510 may include steps 5101 to 5103, which are described below.
Step 5101, based on a preset contour detection algorithm, determining a set of contours included in the non-editable image-text images.
Specifically, the OpenCV function cv2.findContours may be used to perform contour detection and obtain the set of contours included in the non-editable image-text image. The set of contours may be, for example, C = {C1, C2, ..., Cc}, where C1, C2, ..., Cc each represent a contour.
It can be understood that, before contour detection is performed, the non-editable image-text image can be preprocessed to facilitate the subsequent contour detection; for example, graying and binarization can be applied to the image.
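A minimal OpenCV sketch of this contour extraction step (the retrieval mode, approximation mode and threshold are illustrative choices):

```python
# Contour extraction sketch with OpenCV after graying and binarization.
# RETR_LIST / CHAIN_APPROX_SIMPLE and the threshold are illustrative choices.
import cv2

gray = cv2.cvtColor(cv2.imread("diagram.png"), cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
print("number of detected contours:", len(contours))
```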
In step 5102, contours coincident with the text box are filtered according to the coordinates of each contour and the text box.
It can be understood that the outlines obtained based on the preset outline detection algorithm include not only the outlines of some images but also the outlines of the text boxes, so that the outlines of the text boxes need to be removed, and the outlines of the images in the non-editable image-text images are obtained.
Specifically, the set of contours C = {C1, C2, ..., Cc} may be traversed, with the currently traversed contour denoted Cx. For the current contour Cx, it is checked whether the degree of overlap between Cx and any text box is larger than a preset threshold; if so, the contour Cx is deleted, yielding an updated set of contours C.
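The overlap filtering can be sketched as follows; the bounding-rectangle overlap measure and the 0.8 threshold are assumptions used only for illustration:

```python
# Sketch of removing contours that coincide with text boxes: a contour is dropped
# when the overlap between its bounding rectangle and any text box exceeds a
# threshold. The overlap measure and the 0.8 threshold are illustrative.
import cv2

def overlap_ratio(a, b):
    """Intersection area divided by the area of box a; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    return inter / max((a[2] - a[0]) * (a[3] - a[1]), 1)

def filter_text_contours(contours, text_boxes, threshold=0.8):
    kept = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        contour_box = (x, y, x + w, y + h)
        if all(overlap_ratio(contour_box, tb) <= threshold for tb in text_boxes):
            kept.append(cnt)
    return kept
```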
Step 5103, determining at least one contour based on the remaining contours.
Specifically, the contours in the updated set C are connected; the OpenCV function cv2.approxPolyDP may be used to perform contour approximation.
Step 520, identifying a shape of each of the at least one contour based on a shape recognition model of the pre-trained residual neural network.
The residual neural network may be, for example, ResNet-50. Specifically, the data set may be randomly divided into a training set, a validation set and a test set; the ResNet-50 model is trained on the training set, and the validation set and test set are used respectively to tune the parameters of the model and to test its performance. The shape recognition model based on the residual neural network is trained using the cross-entropy loss for multi-class classification, yielding the pre-trained residual neural network shape recognition model.
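A possible fine-tuning sketch using torchvision's ResNet-50; the number of shape classes, the optimizer settings and the data loader are placeholders, and a recent torchvision (0.13 or later) is assumed for the weights argument:

```python
# Fine-tuning sketch for the ResNet-50 shape classifier with a cross-entropy loss.
# NUM_SHAPE_CLASSES, the optimizer settings and the data loader are placeholders.
import torch
import torch.nn as nn
from torchvision import models

NUM_SHAPE_CLASSES = 15   # arrow shapes + basic shapes + image shape (illustrative count)
model = models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_SHAPE_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:       # contour crops and their shape labels
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```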
It can be understood that the classification categories of the pre-trained residual neural network shape recognition model may be determined in advance according to common shapes, and shapes of the relevant categories are provided in the subsequent visual editing platform to facilitate generating editable shapes. Specifically, shapes can be divided into three categories: arrow shapes, basic shapes and image shapes. The arrow shapes include: up arrow, down arrow, straight arrow, curved arrow, double arrow, straight line and curved line. The basic shapes include: circle, triangle, square, sector, ellipse, parallelogram and diamond. The image shapes include non-arrow, non-basic images presented within the image-text document. The categories defined for the shape recognition model are consistent with the shape types provided by the image-text editor of the invention.
It will also be appreciated that each contour may be preprocessed before its shape is identified; for example, data augmentation such as scaling down and up and color transformation may be applied.
At step 530, the relative size of the shape is determined based on the size of the smallest bounding rectangle of the shape of each contour, and the position of each contour is determined from the coordinates of the preset position of the shape of each contour.
The coordinates of the preset position of the shape of each contour may be the coordinates of the upper-left and lower-right corners of the shape, or the coordinates of other identifiable positions on the shape.
It is understood that step 530 provides one possible method for determining the size and position of the outline, and other methods may be used for determining the size and position, which is not limited by the present invention.
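One such possibility can be sketched with OpenCV as follows, using the upright bounding box corners as the preset position, which is only one of the options mentioned above:

```python
# Sketch: contour size from the minimum circumscribed rectangle and position from
# the top-left / bottom-right corners of the upright bounding box.
import cv2

def contour_size_and_position(contour):
    (_, _), (w, h), _ = cv2.minAreaRect(contour)    # minimum circumscribed rectangle
    x, y, bw, bh = cv2.boundingRect(contour)        # upright bounding box
    size = (h, w)
    position = ((x, y), (x + bw, y + bh))           # preset positions: two corners
    return size, position
```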
At step 540, the coordinates of the centroid of each of the at least one contour are determined.
Specifically, the updated set of contours C = {C1, C2, ..., Cc} may be traversed, and the centroid coordinates corresponding to each contour in the updated set C are computed as the centroid attribute value of that contour. The centroid position of a given contour can be obtained by computing the first-order geometric moments with the OpenCV function cv2.moments.
And 550, determining the color of each contour based on the color corresponding to the centroid coordinate of each contour.
Specifically, the color value at the centroid coordinates of each contour is taken as the color attribute value of that contour.
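A short sketch combining steps 540 and 550, computing the centroid from the first-order moments and reading the color at that point:

```python
# Sketch of steps 540 and 550: centroid from first-order geometric moments and
# the contour color read from the pixel at the centroid.
import cv2

def contour_centroid_and_color(image, contour):
    m = cv2.moments(contour)
    cx = int(m["m10"] / (m["m00"] + 1e-9))   # guard against zero-area contours
    cy = int(m["m01"] / (m["m00"] + 1e-9))
    b, g, r = image[cy, cx]                  # OpenCV indexes (row, col) and stores BGR
    return (cx, cy), (r, g, b)
```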
In one embodiment, as shown in fig. 6, generating an editable document corresponding to the non-editable image-text based on the final structured data includes the following steps:
step 610, acquiring and displaying the final structured data.
It will be appreciated that this step may be implemented by an image-text editor; the final structured data may be, for example, as shown in fig. 9 below and is not described in detail here. The image-text editor can be a self-built image-text editor based on Electron.
And step 620, generating an image corresponding to the outline feature and a text corresponding to the text feature at a corresponding position of the canvas based on the final structured data, and determining an initial editable document corresponding to the non-editable image-text type image.
Specifically, when the editor renders the visualization interface, a shape is first created according to the contour shape in the contour features, its position on the canvas is determined from the position coordinates of the contour, and the shape color is then filled and the shape size adjusted. After the contours have been created, text boxes are created according to the text box coordinates in the structured data, the recognized text content is generated inside each text box, the color and size of the characters are adjusted, and the generated content is saved, thereby producing the initial editable document corresponding to the non-editable image-text image.
In one embodiment, after creating the image corresponding to the outline feature and the text corresponding to the text feature at the respective locations of the canvas, the method further comprises:
and responding to the operation of the user, and adding, modifying or deleting the initial editable document.
It can be understood that after the editing of the image-text editor is completed, the initial editable document can be stored into a format editable by other image-text editors, and further editing such as adding, modifying, deleting and the like is performed in other image-text editors, so that compatibility among the editors is realized. In one possible embodiment, the other editor may be Visio.
Fig. 7 is a schematic diagram of an image-text image. As shown in fig. 7, the image includes a CT scan image, arrows, and a plurality of text contents together with the background graphics corresponding to the text contents.
Fig. 8 is a schematic diagram of the initial structured data corresponding to the image-text image of fig. 7. As shown in fig. 8, it includes the graphics-related information (maps), the image-related information (configurations) and the text features (texts) of the image-text image shown in fig. 7.
The graphics-related information (maps) and the image-related information (configurations) may correspond to the contour features described above. The configurations entry corresponds to the CT scan image in fig. 7: [{id: element 21, path: "fig1.jpg", size: (a21, w21), position: ((x41, y41), (x42, y42))}], where path: "fig1.jpg" is the saving path of the image, position: ((x41, y41), (x42, y42)) is the relative position of the image within the non-editable image-text image, and size: (a21, w21) is the size of the image.
Further, in {id: element 1, type: rightdirectionconnector, color: (r1, g1, b1), size: (a1, w1), position: ((x1, y1), (x2, y2))}, type: rightdirectionconnector is the shape of the contour, color: (r1, g1, b1) is the color of the contour, size: (a1, w1) is the size of the contour, and position: ((x1, y1), (x2, y2)) is the position of the contour.
It is to be understood that other items in the maps are similar to the foregoing case where the id is an item of element 1, and are not described herein again for brevity.
Similarly, in {id: element 13, content: "image encoder", color: (r13, g13, b13), size: (a13, w13), position: ((x25, y25), (x26, y26))}, content: "image encoder" is the text content, color: (r13, g13, b13) is the text color, size: (a13, w13) is the font size, and position: ((x25, y25), (x26, y26)) is the position coordinates of the text box. The other items in texts are similar to the item whose id is element 13 and are not described again here for brevity.
Fig. 9 is a schematic diagram of the final structured data corresponding to the image-text image of fig. 7. As shown in fig. 9, in addition to the graphics-related information, the image-related information and the text features shown in fig. 8, it includes the relationships between pairs of elements. Taking element 1 as an example, the corresponding final structured data is: self attribute: [type: rightdirectionconnector, color: (r1, g1, b1), size: (a1, w1), position: ((x1, y1), (x2, y2))], relationships: [[combination object: element 2, combination relationship: association], [combination object: element 19, combination relationship: surround], [combination object: element 21, combination relationship: association]]. Compared with fig. 8, the final structured data in fig. 9 additionally contains the relationships between element 1 and the other elements: relationships: [[combination object: element 2, combination relationship: association], [combination object: element 19, combination relationship: surround], [combination object: element 21, combination relationship: association]].
Fig. 10 is a schematic diagram of a display interface of the image-text editor provided by the invention. As shown in fig. 10, the editable document corresponding to the non-editable image-text image can be opened and displayed, and simple adjustments can be made to its content, such as adjusting the size, color and position of fonts, selecting line thickness, adjusting the layout of the whole image-text content in the editable document, and adding or modifying shapes.
The following describes an apparatus for generating an editable document based on a non-editable image-text image according to the present invention, and the apparatus for generating an editable document based on a non-editable image-text image described below and the method for generating an editable document based on a non-editable image-text image described above may be referred to in correspondence with each other.
As shown in fig. 11, in an embodiment, an apparatus for generating an editable document based on a non-editable graphics-text image is provided, and the apparatus for generating the editable document based on the non-editable graphics-text image may include:
an obtaining module 1110, configured to obtain a non-editable image-text image;
a first determining module 1120, configured to extract contour features from the non-editable image-text images; extracting text features from the non-editable image-text images; wherein the contour features comprise the shape, position, size and color in the contour; the text features comprise text box coordinates, text content, text color and font size;
a first generating module 1130, configured to generate initial structured data according to the outline feature and the text feature;
a second determining module 1140, configured to determine the relationship between two elements in the non-editable image-text image based on the pre-trained element relationship classification model, the contour features and the text features; wherein the pre-trained element relationship classification model is obtained by training on a data set composed of the contour features and/or text features of two elements and the relationship labels of the two elements;
a supplementing module 1150, configured to supplement the initial structured data based on the relationship between the two elements, so as to obtain the final structured data;
a second generating module 1160, configured to generate an editable document corresponding to the non-editable image-text based on the final structured data.
With the device for generating an editable document based on a non-editable image-text image, the contour features and text features of the non-editable image-text image are determined, the elements contained in the image and their attributes are thereby obtained, and initial structured data is generated from these elements and attributes; the relationship between every two elements is then determined with a pre-trained element relationship classification model and supplemented into the initial structured data to obtain final structured data, from which the editable document corresponding to the non-editable image-text image is generated. The non-editable image-text image therefore no longer needs to be converted into an editable document manually, and the conversion is carried out efficiently and accurately.
In one embodiment, the second determining module 1140 comprises:
a first determining unit, configured to determine a classification result for the relationship between every two elements in the non-editable image-text image based on each pre-trained binary classification model, the contour features and the text features;
and the second determining unit is used for determining a final classification result of the relationship between every two elements in the non-editable image-text images based on the classification result with the maximum probability value in the plurality of determined classification results.
In one embodiment, the second determining module 1140 further comprises:
the acquisition unit is used for acquiring the outline characteristics and the text characteristics of a plurality of non-editable image-text images;
a third determining unit, configured to determine a data set based on the outline features and/or text features corresponding to every two elements in each non-editable image-text type image; determining a relationship label of every two elements based on the relationship of every two elements;
a fourth determining unit, configured to determine samples of the data set as positive samples and negative samples based on the relationship labels of every two elements and the binary classification models corresponding to the relationship labels;
and the training unit is used for training the corresponding binary classification models based on the positive samples and the negative samples to obtain the pre-trained binary classification models.
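The positive/negative split performed by the fourth determining unit could look like the following sketch. The feature vectors, labels and the "contains" classifier are assumptions; the concrete classifier type and training routine are not specified here.

# Labelled element pairs: each sample pairs a feature vector (built from the
# contour and/or text features of two elements) with a relationship label.
dataset = [
    {"features": [0.1, 0.8, 0.3], "label": "contains"},
    {"features": [0.7, 0.2, 0.9], "label": "connects"},
    {"features": [0.4, 0.6, 0.5], "label": "contains"},
]

def split_for_classifier(dataset, relation_label):
    # Samples carrying the target label become positives for that binary
    # classifier; all remaining samples become negatives.
    positives = [d["features"] for d in dataset if d["label"] == relation_label]
    negatives = [d["features"] for d in dataset if d["label"] != relation_label]
    return positives, negatives

pos, neg = split_for_classifier(dataset, "contains")
# pos and neg would then be fed to the training routine of the
# "contains" binary classification model.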
In one embodiment, the first determining module 1120 includes:
a fifth determining unit, configured to determine, based on a preset text box detection algorithm, a text box and coordinates thereof included in the non-editable image-text image;
a sixth determining unit, configured to determine text content included in each text box based on a preset text recognition algorithm;
a seventh determining unit, configured to determine a text color in each text box according to the coordinates of each text box and the pixel histogram in each text box;
an eighth determining unit configured to determine a size of a font in the text box based on the coordinates of each text box.
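The colour and font-size steps of these units could be sketched as follows, assuming the text-box coordinates have already been produced by a detection algorithm. The use of NumPy to build the colour histogram inside the box, and the choice of the second most frequent colour as the text colour, are illustrative assumptions rather than the method mandated above.

import numpy as np

def text_color_and_font_size(image_bgr, box):
    # box is assumed to be (x1, y1, x2, y2) from a text-box detection step.
    x1, y1, x2, y2 = box
    roi = image_bgr[y1:y2, x1:x2].reshape(-1, 3)

    # Colour histogram over the pixels inside the text box. The most frequent
    # colour is usually the background, so the second most frequent colour is
    # taken as a rough estimate of the text colour.
    colors, counts = np.unique(roi, axis=0, return_counts=True)
    order = np.argsort(counts)[::-1]
    text_color = colors[order[1]] if len(order) > 1 else colors[order[0]]

    # Font size approximated from the height of the text box.
    font_size = y2 - y1
    return tuple(int(c) for c in text_color), font_size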
In one embodiment, the first determining module 1120 further comprises:
a ninth determining unit, configured to determine at least one contour included in the non-editable image-text image based on a preset contour detection algorithm;
a recognition unit for recognizing a shape of each of the at least one contour based on a pre-trained shape recognition model of a residual neural network;
a tenth determining unit, configured to determine a relative size of the shape based on a size of a minimum bounding rectangle of the shape of each contour, and determine a position of each contour according to coordinates of a preset position of the shape of each contour;
an eleventh determining unit, configured to determine a color of each contour based on a color corresponding to the centroid coordinate of each contour.
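A possible OpenCV-based sketch of these contour units is given below; the thresholding choice, the axis-aligned bounding rectangle and the stubbed-out shape classification are assumptions made for illustration, not the concrete algorithms of the embodiment.

import cv2

def extract_contour_features(image_bgr):
    # Simple binarisation followed by contour detection; Otsu thresholding is
    # an illustrative choice, not the preset algorithm referred to above.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x signature

    features = []
    for contour in contours:
        # Axis-aligned bounding rectangle as a stand-in for the minimum
        # circumscribed rectangle used to judge the relative size.
        x, y, w, h = cv2.boundingRect(contour)

        # Centroid from image moments; the pixel colour at the centroid is
        # used as the contour colour.
        m = cv2.moments(contour)
        if m["m00"] == 0:
            continue
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        b, g, r = image_bgr[cy, cx]

        # Shape classification (a pre-trained residual-network shape
        # recognition model in the embodiment above) is omitted here.
        features.append({"shape": "unknown", "position": (x, y),
                         "size": (w, h), "color": (int(r), int(g), int(b))})
    return features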
In one embodiment, the ninth determining unit includes:
a twelfth determining unit, configured to determine, based on a preset contour detection algorithm, a set of contours included in the non-editable image-text images;
the filtering unit is used for filtering out the contours overlapping with the text boxes according to the coordinates of each contour and each text box;
a thirteenth determining unit for determining at least one contour based on the remaining contours.
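The filtering step can be illustrated with a plain rectangle-intersection test; the box format and the strictness of the overlap check are assumptions of this sketch.

def overlaps(box_a, box_b):
    # Boxes are assumed to be (x1, y1, x2, y2); returns True when the two
    # rectangles intersect.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return not (ax2 <= bx1 or bx2 <= ax1 or ay2 <= by1 or by2 <= ay1)

def filter_text_contours(contour_boxes, text_boxes):
    # Keep only the contours whose bounding boxes do not coincide with any
    # detected text box; the remainder yields the "at least one contour".
    return [box for box in contour_boxes
            if not any(overlaps(box, text_box) for text_box in text_boxes)]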
In one embodiment, the second generation module 1160 includes:
the acquisition and display unit is used for acquiring and displaying the final structured data;
and the generating unit is used for generating an image corresponding to the outline characteristic and a text corresponding to the text characteristic at a corresponding position of the canvas based on the final structured data, and determining an initial editable document corresponding to the non-editable image-text type image.
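Placing each element of the final structured data at its recorded position can be sketched with Pillow as follows; the schema matches the hypothetical example shown earlier and is an assumption, and in a real system the canvas would be written out in an editable document format rather than left as a raster image.

from PIL import Image, ImageDraw

def render_initial_document(structured, canvas_size=(800, 600)):
    # Draw each element of the final structured data at its recorded position.
    canvas = Image.new("RGB", canvas_size, "white")
    draw = ImageDraw.Draw(canvas)
    for element in structured["elements"]:
        if element["type"] == "contour":
            x, y = element["position"]
            w, h = element["size"]
            draw.rectangle([x, y, x + w, y + h], outline=element["color"])
        elif element["type"] == "text":
            x1, y1 = element["box"][0], element["box"][1]
            draw.text((x1, y1), element["content"], fill=element["color"])
    return canvas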
Fig. 12 illustrates the physical structure of an electronic device. As shown in fig. 12, the electronic device may include: a processor (processor) 1210, a communication interface (communications interface) 1220, a memory (memory) 1230 and a communication bus 1240, wherein the processor 1210, the communication interface 1220 and the memory 1230 communicate with each other via the communication bus 1240. The processor 1210 may invoke logic instructions in the memory 1230 to perform a method of generating an editable document based on a non-editable image-text image, the method comprising: acquiring a non-editable image-text image; extracting contour features from the non-editable image-text images; extracting text features from the non-editable image-text images; wherein the contour features comprise the shape, position, size and color of the contour; the text features comprise text box coordinates, text content, text color and font size; generating initial structured data according to the contour features and the text features; determining the relationship between two elements in the non-editable image-text images based on a pre-trained element relation classification model, the contour features and the text features; the pre-trained element relation classification model is determined after training based on a data set consisting of contour features and/or text features of two elements and relation labels of the two elements; supplementing the initial structured data based on the relationship between the two elements to obtain final structured data; and generating an editable document corresponding to the non-editable image-text images based on the final structured data.
In addition, the logic instructions in the memory 1230 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, or the part thereof which substantially contributes to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method for generating an editable document based on a non-editable image-text image according to the present invention, the method comprising: acquiring a non-editable image-text image; extracting contour features from the non-editable image-text images; extracting text features from the non-editable image-text images; wherein the contour features comprise the shape, position, size and color of the contour; the text features comprise text box coordinates, text content, text color and font size; generating initial structured data according to the contour features and the text features; determining the relationship between two elements in the non-editable image-text images based on a pre-trained element relation classification model, the contour features and the text features; the pre-trained element relation classification model is determined after training based on a data set consisting of contour features and/or text features of two elements and relation labels of the two elements; supplementing the initial structured data based on the relationship between the two elements to obtain final structured data; and generating an editable document corresponding to the non-editable image-text images based on the final structured data.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the method for generating an editable document based on a non-editable image-text image according to the present invention, the method comprising: acquiring a non-editable image-text image; extracting contour features from the non-editable image-text images; extracting text features from the non-editable image-text images; wherein the contour features comprise the shape, position, size and color of the contour; the text features comprise text box coordinates, text content, text color and font size; generating initial structured data according to the contour features and the text features; determining the relationship between two elements in the non-editable image-text images based on a pre-trained element relation classification model, the contour features and the text features; the pre-trained element relation classification model is determined after training based on a data set consisting of contour features and/or text features of two elements and relation labels of the two elements; supplementing the initial structured data based on the relationship between the two elements to obtain final structured data; and generating an editable document corresponding to the non-editable image-text images based on the final structured data.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
It should be understood that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of generating an editable document based on a non-editable image-text image, the method comprising:
acquiring a non-editable image-text image;
extracting contour features from the non-editable image-text images; extracting text features from the non-editable image-text images; wherein the contour features comprise the shape, position, size and color of the contour; the text features comprise text box coordinates, text content, text color and font size;
generating initial structured data according to the contour features and the text features;
determining the relationship between two elements in the non-editable image-text images based on a pre-trained element relation classification model, the contour features and the text features; the pre-trained element relation classification model is determined after training based on a data set consisting of contour features and/or text features of two elements and relation labels of the two elements;
supplementing the initial structured data based on the relationship between the two elements to obtain final structured data;
and generating an editable document corresponding to the non-editable image-text images based on the final structured data.
2. The method of claim 1, wherein the pre-trained element relation classification model comprises a plurality of pre-trained binary classification models, and accordingly, determining the relationship between two elements in the non-editable image-text images based on the pre-trained element relation classification model, the contour features and the text features comprises:
determining a classification result of the relationship between every two elements in the non-editable image-text images based on each pre-trained binary classification model, the contour features and the text features;
and determining a final classification result of the relationship between every two elements in the non-editable image-text images based on the classification result with the maximum probability value among the plurality of determined classification results.
3. The method for generating an editable document based on the non-editable image-text image as claimed in claim 2, wherein the process of determining the pre-trained element relation classification model after training based on the data set consisting of the contour features and/or text features of the two elements and the relation labels of the two elements comprises the following steps:
acquiring contour features and text features of a plurality of non-editable image-text images;
determining a data set based on the corresponding contour features and/or text features of every two elements in each non-editable image-text image; determining a relationship label of every two elements based on the relationship of every two elements;
determining samples of the data set as positive samples and negative samples based on the relationship labels of every two elements and the binary classification models corresponding to the relationship labels;
and training the corresponding binary classification models based on the positive samples and the negative samples to obtain the pre-trained binary classification models.
4. The method for generating an editable document based on a non-editable image-text image according to claim 1, wherein determining the text features in the non-editable image-text images based on a text detection and text recognition method comprises:
determining a text box and coordinates thereof included in the non-editable image-text images based on a preset text box detection algorithm;
determining text content included in each text box based on a preset text recognition algorithm;
determining the text color in each text box according to the coordinates of each text box and the pixel histogram in each text box;
the size of the font within the text box is determined based on the coordinates of each text box.
5. The method for generating an editable document based on the non-editable image-text image according to claim 4, wherein determining the contour features in the non-editable image-text images based on a contour detection and shape recognition method comprises:
determining at least one contour included in the non-editable image-text images based on a preset contour detection algorithm;
identifying a shape of each of the at least one contour based on a shape recognition model of a pre-trained residual neural network;
determining the relative size of the shape based on the size of the minimum circumscribed rectangle of the shape of each contour, and determining the position of each contour according to the coordinates of the preset position of the shape of each contour;
and determining the color of each contour based on the color corresponding to the centroid coordinate of each contour.
6. The method for generating an editable document based on a non-editable image-text image according to claim 5, wherein determining the at least one contour included in the non-editable image-text images based on the preset contour detection algorithm comprises:
determining a set of outlines contained in the non-editable image-text images based on a preset outline detection algorithm;
filtering out contours overlapping with the text boxes according to the coordinates of each contour and each text box;
at least one contour is determined based on the remaining contours.
7. The method for generating an editable document based on the non-editable image-text image as set forth in claim 1, wherein generating the editable document corresponding to the non-editable image-text images based on the final structured data comprises:
acquiring and displaying the final structured data;
and generating an image corresponding to the outline feature and a text corresponding to the text feature at a corresponding position of the canvas based on the final structured data, and determining an initial editable document corresponding to the non-editable image-text type image.
8. An apparatus for generating an editable document based on a non-editable image-text image, the apparatus comprising:
the acquisition module is used for acquiring the non-editable image-text images;
the first determining module is used for extracting contour features from the non-editable image-text images and extracting text features from the non-editable image-text images; wherein the contour features comprise the shape, position, size and color of the contour; the text features comprise text box coordinates, text content, text color and font size;
a first generation module, configured to generate initial structured data according to the contour features and the text features;
the second determining module is used for determining the relationship between two elements in the non-editable image-text images based on a pre-trained element relation classification model, the contour features and the text features; the pre-trained element relation classification model is determined after training based on a data set consisting of contour features and/or text features of two elements and relation labels of the two elements;
a supplement module for supplementing the initial structured data based on the relationship between the two elements to obtain final structured data;
and the second generation module is used for generating an editable document corresponding to the non-editable image-text images based on the final structured data.
9. A computer device comprising a memory and a processor, wherein computer readable instructions are stored in the memory, which computer readable instructions, when executed by the processor, cause the processor to carry out the steps of the method of generating an editable document based on a non-editable image-text image according to any one of claims 1 to 7.
10. A storage medium having computer readable instructions stored thereon which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of generating an editable document based on a non-editable image-text image according to any one of claims 1 to 7.
CN202211036598.0A 2022-08-23 2022-08-23 Method and device for generating editable document based on non-editable image-text images Pending CN115392188A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211036598.0A CN115392188A (en) 2022-08-23 2022-08-23 Method and device for generating editable document based on non-editable image-text images
PCT/CN2023/092757 WO2024041032A1 (en) 2022-08-23 2023-05-08 Method and device for generating editable document based on non-editable graphics-text image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211036598.0A CN115392188A (en) 2022-08-23 2022-08-23 Method and device for generating editable document based on non-editable image-text images

Publications (1)

Publication Number Publication Date
CN115392188A 2022-11-25

Family

ID=84121969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211036598.0A Pending CN115392188A (en) 2022-08-23 2022-08-23 Method and device for generating editable document based on non-editable image-text images

Country Status (2)

Country Link
CN (1) CN115392188A (en)
WO (1) WO2024041032A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640788A (en) * 2022-12-23 2023-01-24 北京左医科技有限公司 Method and device for structuring non-editable document
WO2024041032A1 (en) * 2022-08-23 2024-02-29 杭州未名信科科技有限公司 Method and device for generating editable document based on non-editable graphics-text image

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416377B (en) * 2018-02-26 2021-12-10 阿博茨德(北京)科技有限公司 Information extraction method and device in histogram
CN108399386B (en) * 2018-02-26 2022-02-08 阿博茨德(北京)科技有限公司 Method and device for extracting information in pie chart
US20200104586A1 (en) * 2018-09-28 2020-04-02 Konica Minolta Laboratory U.S.A., Inc. Method and system for manual editing of character recognition results
CN113221711A (en) * 2021-04-30 2021-08-06 北京金山数字娱乐科技有限公司 Information extraction method and device
CN115392188A (en) * 2022-08-23 2022-11-25 杭州未名信科科技有限公司 Method and device for generating editable document based on non-editable image-text images

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024041032A1 (en) * 2022-08-23 2024-02-29 杭州未名信科科技有限公司 Method and device for generating editable document based on non-editable graphics-text image
CN115640788A (en) * 2022-12-23 2023-01-24 北京左医科技有限公司 Method and device for structuring non-editable document
CN115640788B (en) * 2022-12-23 2023-03-21 北京左医科技有限公司 Method and device for structuring non-editable document

Also Published As

Publication number Publication date
WO2024041032A1 (en) 2024-02-29

Similar Documents

Publication Publication Date Title
CN109902622B (en) Character detection and identification method for boarding check information verification
CN110322495B (en) Scene text segmentation method based on weak supervised deep learning
CN115392188A (en) Method and device for generating editable document based on non-editable image-text images
CN105528614B (en) A kind of recognition methods of the cartoon image space of a whole page and automatic recognition system
CN112418216A (en) Method for detecting characters in complex natural scene image
CN112990205B (en) Method and device for generating handwritten character sample, electronic equipment and storage medium
CN113723330B (en) Method and system for understanding chart document information
CN113158977B (en) Image character editing method for improving FANnet generation network
CN112949455B (en) Value-added tax invoice recognition system and method
CN112686265A (en) Hierarchic contour extraction-based pictograph segmentation method
CN115995086A (en) Identification method, equipment and storage medium for terminal strip drawing short-link primitive
CN113392819B (en) Batch academic image automatic segmentation and labeling device and method
CN117437647B (en) Oracle character detection method based on deep learning and computer vision
CN109409211B (en) Processing method, processing device and storage medium for Chinese character skeleton stroke segments
CN114463767A (en) Credit card identification method, device, computer equipment and storage medium
Park et al. A method for automatically translating print books into electronic Braille books
KR102598210B1 (en) Drawing information recognition method of engineering drawings, drawing information recognition system, computer program therefor
CN112200789A (en) Image identification method and device, electronic equipment and storage medium
CN116030472A (en) Text coordinate determining method and device
CN115311666A (en) Image-text recognition method and device, computer equipment and storage medium
CN114494678A (en) Character recognition method and electronic equipment
US20230094787A1 (en) Utilizing machine-learning based object detection to improve optical character recognition
CN112580452A (en) Method and device for processing fault tree, computer readable storage medium and processor
CN109325483B (en) Method and device for processing internal short pen section
CN115050086B (en) Sample image generation method, model training method, image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination