WO2021172699A1 - System and method for determining importance of text block extracted from image - Google Patents


Info

Publication number
WO2021172699A1
Authority
WO
WIPO (PCT)
Prior art keywords
text block
text
extracted
block
image
Prior art date
Application number
PCT/KR2020/015822
Other languages
French (fr)
Korean (ko)
Inventor
박지혁
한예지
장민성
Original Assignee
주식회사 와들
Priority date
Filing date
Publication date
Application filed by 주식회사 와들 filed Critical 주식회사 와들
Publication of WO2021172699A1 publication Critical patent/WO2021172699A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image

Definitions

  • the present invention relates to a system and method for determining the importance of a text block extracted from an image, and more particularly, to a technique for selecting and outputting only texts that match a calculated importance among texts recognized from an image.
  • OCR (Optical Character Reader): a device that reads characters using light, converting printed or handwritten letters, numbers, and other symbols into electrical signals encoded suitably for a digital computer.
  • An object of the present invention is to extract features from text blocks generated by recognizing the texts included in an image and to perform binary classification of the output target text by calculating the importance of the extracted features, so that only the texts that match the calculated importance among the texts included in the image are selected and output.
  • An object of the present invention is, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, to provide only noise-free product information as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information is guided by voice without the user having to check the screen.
  • An object of the present invention is to significantly shorten the computation time for selecting output target text blocks by selecting the output target text blocks according to the importance derived through supervised learning.
  • An embodiment of the present invention for solving these technical problems is a system for determining the importance of a text block extracted from an image, comprising: a character recognition unit that extracts text from an input image; a character block unit that generates text blocks by dividing the extracted text into sentence units; and an operation unit that extracts features from each text block to designate a characteristic for the text block and, when the designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
  • the characteristic includes at least one attribute value among 'size, width, length, character reliability, or slope' of the text block.
  • the operation unit extracts the features included in each text block received from the character block unit and designates a characteristic for each text block; when the text block characteristic value exceeds a preset threshold value, the block is determined to be an output target text block and labeled '1', and when the text block characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
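The labeling rule above can be sketched in code. This is a minimal illustration, not the patent's implementation: the feature names and the preferred per-characteristic threshold values ('500, 35, 35, 0.75, and 0.05') are taken from the embodiment described later in this document, while the dictionary representation and the choice to require every characteristic to exceed its threshold are assumptions.

```python
# Preset thresholds per characteristic, using the preferred values from
# the embodiment ('500, 35, 35, 0.75, and 0.05').
THRESHOLDS = {
    "size": 500,
    "width": 35,
    "length": 35,
    "char_reliability": 0.75,
    "slope": 0.05,
}

def label_block(features):
    """Return 1 (output target) when every characteristic value exceeds
    its preset threshold, otherwise 0 (filtered out)."""
    if all(features[name] > t for name, t in THRESHOLDS.items()):
        return 1
    return 0
```

Blocks labeled '0' here would, per the embodiment, be passed on to the learning unit for re-labeling rather than discarded outright.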
  • the operation unit applies the characteristic values of the text blocks labeled '0' to the learning unit, and the learning unit feeds these characteristic values into an artificial neural network to re-label the text blocks; the text blocks labeled '1' are then delivered to the output unit.
  • the method for determining the importance of a text block extracted from an image includes: (a) extracting, by the character recognition unit, text from a received image; (b) generating, by the character block unit, text blocks by dividing the extracted text into sentence units; (c) extracting, by the operation unit, features from each text block and designating a characteristic for the text block; (d) determining, by the operation unit, whether the text block characteristic value exceeds a preset threshold value; (e) when, as a result of the determination in step (d), the text block characteristic value exceeds the preset threshold value, determining that the text block is an output target text block and labeling it '1'; and (f) outputting, by the operation unit, the text blocks labeled '1' and removing the other text blocks.
  • when, as a result of the determination in step (d), the text block characteristic value does not exceed the preset threshold value, the method further comprises a step (g) in which the learning unit applies the text blocks re-labeled '1' to the operation unit and the procedure proceeds to step (f).
  • according to the present invention, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, only noise-free product information is provided as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information can be guided by voice without checking the screen.
  • the computation time for selecting the output target text blocks can be significantly shortened.
  • FIGS. 1A and 1B are exemplary diagrams comparing a text block extracted through conventional OCR with a text block extracted by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating a detailed configuration of a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIGS. 3A to 3E are exemplary diagrams illustrating the flow from character recognition by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention to the final result of selecting output target text blocks.
  • FIG. 4 is a block diagram illustrating a learning unit and an output unit added to the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIGS. 5A and 5B are exemplary diagrams illustrating the filtering of features included in a text block through the operation unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention, and the training of the neural network model.
  • FIG. 6 is an exemplary diagram illustrating classification of a text block filtered by the neural network model of the learning unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention as '0'.
  • FIG. 7 is a flowchart illustrating a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIG. 8 is a flowchart illustrating the process after step S708 of a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIGS. 1A and 1B are exemplary diagrams illustrating a text block extracted through conventional OCR and a text block extracted by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • the following examines the structural characteristics by which the system (S) for determining the importance of a text block extracted from an image selects only text blocks of high importance as output target text blocks:
  • the system S for determining the importance of a text block extracted from an image includes a character recognition unit 100, a character block unit 200, and an operation unit 300.
  • the character recognition unit 100 extracts text from the received image; the character block unit 200 divides the extracted text into sentence units to generate text blocks; and the operation unit 300 extracts features from each text block to designate a characteristic for the corresponding text block, classifying the block as an output target text block if the designated characteristic value exceeds a preset threshold value.
  • the feature extracted from the text block by the operation unit 300 includes at least one attribute value among 'size, width, length, character reliability, or slope', and other attribute values included in the text block may be substituted or added.
  • the character reliability is preferably understood as a numerical value indicating how well the recognized text matches the known text.
  • the system (S) for determining the importance of a text block extracted from an image is built into any one device among a PC, notebook, tablet, or smartphone capable of communicating with a server connected through an information and communication network, and is driven by an application distributed and installed online.
  • the character recognition unit 100 individually extracts each text included in the input image, sequentially recognizes it, and applies the recognized text to the character block unit 200 .
  • the character block unit 200 forms a text block by tying sentence units together so that the block includes the coordinates of the start point of the text recognized by the character recognition unit 100 and the coordinates of the end point of the recognized text.
  • the endpoint coordinates of the text may be specified as the coordinates at which a period is positioned in the recognized text, or as the endpoint coordinates when the recognized text is the last text of a sentence; however, embodiments of the present invention are not limited thereto.
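The sentence-unit blocking described above can be sketched as follows, assuming recognized text arrives as fragments in reading order, each carrying start and end coordinates. The tuple representation, the helper name, and the use of a trailing period as the sentence delimiter (per the paragraph above) are illustrative assumptions, not the patent's API.

```python
def build_text_blocks(fragments):
    """fragments: list of (text, start_xy, end_xy) in reading order.
    Ties fragments into sentence-unit blocks, each keeping the start
    coordinates of its first fragment and the end coordinates of the
    fragment where the sentence ends (a period, or the last text)."""
    blocks, current = [], []
    for text, start, end in fragments:
        current.append((text, start, end))
        if text.rstrip().endswith("."):        # period marks the sentence end
            blocks.append({
                "text": " ".join(t for t, _, _ in current),
                "start": current[0][1],         # start point of first fragment
                "end": current[-1][2],          # end point of last fragment
            })
            current = []
    if current:                                 # last text of a sentence without a period
        blocks.append({
            "text": " ".join(t for t, _, _ in current),
            "start": current[0][1],
            "end": current[-1][2],
        })
    return blocks
```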
  • the operation unit 300 extracts the features included in each text block received from the character block unit 200 and designates a characteristic for each text block; when the text block characteristic value exceeds a preset threshold value, the block is determined to be an output target text block and labeled '1', and when the text block characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
  • the operation unit 300 performs labeling through binary classification in which text blocks that are 'important' or 'not important' in the image are set to '1' and '0', respectively.
  • the evaluation indices of this heuristic model consist of precision, recall, and accuracy.
  • precision is the ratio of blocks that are actually 1 among the text blocks classified as 1 by the heuristic model.
  • recall is the ratio of blocks classified as 1 by the heuristic model among the text blocks that are actually 1.
  • accuracy is the ratio, among all text blocks, of blocks whose label assigned by the heuristic model matches the actual label.
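Stated as code, the three evaluation indices are as follows (a straightforward sketch; the parallel 0/1 label lists are an assumed representation):

```python
def precision(predicted, actual):
    """Of the blocks the model classified as 1, the fraction actually 1."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    classified_1 = sum(predicted)
    return tp / classified_1 if classified_1 else 0.0

def recall(predicted, actual):
    """Of the blocks actually 1, the fraction the model classified as 1."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    actually_1 = sum(actual)
    return tp / actually_1 if actually_1 else 0.0

def accuracy(predicted, actual):
    """Of all blocks, the fraction whose predicted label matches the actual label."""
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)
```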
  • the threshold value is set based on the score for each evaluation index of the heuristic model.
  • the calculation unit 300 determines the threshold value by labeling test values and performing verification on the precision, recall, and accuracy of the test values labeled '1'.
  • the initial threshold value may be increased or decreased by a preset value, repeating as many times as the number of inputs.
  • the verification counts the number of times a test value labeled '1' falls within the range (spectrum) of the previously verified values (correct answer values), and it is preferably understood that, among the initial threshold values, the one with the highest count is extracted as the threshold value.
  • the threshold value serving as the labeling standard for each characteristic of the heuristic model ('size, width, length, character reliability, and slope') may be set to values of '400 to 600, 30 to 40, 30 to 40, 0.5 to 1, and 0.02 to 0.1', respectively.
  • the threshold values for each characteristic of the above-described heuristic model are preferably set to '500, 35, 35, 0.75, and 0.05', respectively, but may be changed by the aforementioned test and are not limited to specific numerical values.
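The threshold search described above might be sketched as follows. This is a hedged reconstruction: the document only says that candidate thresholds are stepped from an initial value and scored by how often values labeled '1' fall within the verified correct range, so the additional credit given here for values correctly labeled '0' outside the range (without which the lowest candidate would always win) is an assumption, as are all names.

```python
def select_threshold(values, correct_range, initial, step, n_steps):
    """Pick, among stepped candidate thresholds, the one whose labeling
    best agrees with the verified correct-answer range (spectrum)."""
    low, high = correct_range
    best_threshold, best_count = initial, -1
    for i in range(n_steps):
        candidate = initial + i * step   # initial threshold stepped by a preset value
        # A value is counted when its label agrees with the range:
        # labeled '1' (> candidate) and inside [low, high], or
        # labeled '0' and outside the range.
        count = sum(1 for v in values if (v > candidate) == (low <= v <= high))
        if count > best_count:
            best_threshold, best_count = candidate, count
    return best_threshold
```

For instance, searching over candidates 400, 450, ..., 600 against a verified range of 500 to 600 would keep the candidate that filters out the low outliers while retaining the in-range values.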
  • as illustrated in FIGS. 3A to 3E, the character recognition unit 100 receives an image (photo) as shown in FIG. 3A; text is extracted from the image as shown in FIG. 3B; the character block unit 200 generates text blocks from the extracted characters as shown in FIG. 3C; the operation unit 300 determines the importance level as shown in FIG. 3D, in which noise is marked with a double-line box; and the operation unit 300 derives the output target text blocks as the final result as shown in FIG. 3E.
  • the importance determination system (S) for a text block extracted from an image classifies output target text blocks through the heuristic model of the operation unit 300, and further includes a learning unit 400 that classifies output target text blocks through a deep-learning neural network model and an output unit 500 that outputs each text included in the output target text blocks as a pre-recorded voice through a speaker.
  • the learning unit 400 receives, at its input layer, the text block characteristic values labeled '0' and filtered by the operation unit 300, applies them to a hidden layer, and, through supervised learning, assigns a value between '0' and '1' to each text block characteristic value at the output layer; the label with the larger output value is set as the label of the text block. The closer the value is to '1', the more important the text block is understood to be.
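A minimal forward pass of such a network can be sketched as follows, assuming one hidden layer and a sigmoid output in (0, 1). The weights in the usage example are placeholders, since in the described system they would be obtained through supervised learning.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relabel(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: characteristic values -> hidden layer -> output in (0, 1).
    The block is re-labeled '1' when the output is closer to 1 than to 0."""
    hidden = [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    score = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return score, (1 if score >= 0.5 else 0)
```

With trained weights, blocks whose score approaches '1' would be re-labeled as output targets and forwarded to the output unit; the rest remain filtered out as '0'.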
  • the learning unit 400 applies the text blocks re-labeled '1' through supervised learning to the operation unit 300, and the blocks so applied are classified by the operation unit 300 as output target text blocks.
  • the calculation unit 300 extracts the features included in each text block ('size, width, length, character reliability, and slope'), and the neural network model is trained on the text blocks other than those labeled '1'; accordingly, the importance of each feature value of a text block is determined so as to remove noise.
  • the text block not labeled with '1' as shown in FIG. 5B is labeled with '0' and filtered as shown in FIG. 6 .
  • the output target text block selection result according to an embodiment of the present invention is described below with reference to the images shown in FIGS. 3A to 3E, 4, and 6.
  • from the image, only two sentences expressing the performance of the tablet PC are classified as output targets: the phrase “Pairing just by attaching it” and the sentence “It can be done and it can be recharged.”
  • 'messenger conversation contents' are classified as unrelated text blocks and excluded from output. This series of procedures is performed by the character recognition unit 100, the character block unit 200, and the operation unit 300; through the filtering function of the additionally configured learning unit 400, the classification speed of the output target text blocks performed by the calculating unit 300 can also be improved.
  • the text recognition unit 100 extracts text from the input image (S702).
  • the text block unit 200 divides the extracted text into sentence units to generate a text block (S704).
  • the operation unit 300 extracts features from each text block and designates a characteristic for the text block (S706).
  • the operation unit 300 determines whether the text block characteristic value exceeds a preset threshold value (S708).
  • step S708 when the text block characteristic value exceeds a preset threshold value, the operation unit 300 determines that the text block is an output target text block and labels the text block as '1' (S710).
  • the operation unit 300 outputs a text block labeled '1', and removes other text blocks (S712).
  • the process after step S708 of the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described as follows.
  • when, as a result of the determination in step S708, the text block characteristic value does not exceed the preset threshold value, the learning unit 400 labels the text block '1' according to the importance derived through supervised learning (S802).
  • thereafter, the procedure proceeds to step S712 (S804).
  • according to the embodiments described above, features are extracted from text blocks generated by recognizing the texts included in an image, and binary classification of the output target text is performed by calculating the importance of the extracted features; thus, only the texts that match the calculated importance among the texts included in the image can be selected and output, and only the information necessary for online shopping can be provided by voice to the visually impaired through linkage with a screen reader.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention relates to a system and a method for determining the importance of a text block extracted from an image, the system comprising: a character recognition unit for extracting text from an input image; a character block unit for generating a text block by dividing the extracted text into sentence units; and a calculation unit for extracting a feature from each text block, designating a characteristic for the corresponding text block, and, if the value of the designated characteristic exceeds a preconfigured threshold value, classifying the text block as a text block to be output. According to the present invention: features may be extracted from text blocks generated by recognizing the texts included in an image; by performing binary classification on the text to be output, via importance calculation for the extracted features, only the text that matches the calculated importance from among the texts included in the image may be selected and output; and, via linkage with a screen reader, only the information necessary for online shopping may be provided by voice for a visually impaired person.

Description

System and method for determining the importance of text blocks extracted from images
The present invention relates to a system and method for determining the importance of a text block extracted from an image and, more particularly, to a technique for selecting and outputting only the texts that match a calculated importance among the texts recognized from an image.
An Optical Character Reader (OCR) is a device that reads characters using light, converting information in the form of letters, numbers, or other symbols, printed on paper or handwritten, into electrical signals encoded suitably for a digital computer.
A number of companies and research institutes are developing various OCR models. These models are developed with a focus on precisely recognizing all text in an image, and in recent years it has become possible to recognize even small and blurry text.
However, because conventional OCR recognizes even text irrelevant to the important content, such as small text printed in the background, there is the inconvenience of having to filter out unnecessary text.
In addition, even when text recognized through OCR is output through a TTS (Text To Speech) function, all of the text is read aloud, conveying unnecessary text as well, which makes it difficult to deliver the correct meaning to the listener.
[Prior art literature]
[Patent Document] Republic of Korea Patent Publication No. 10-2017-0010843 (published on February 1, 2017)
An object of the present invention is to extract features from text blocks generated by recognizing the texts included in an image and to perform binary classification of the output target text by calculating the importance of the extracted features, so that only the texts that match the calculated importance among the texts included in the image are selected and output.
An object of the present invention is, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, to provide only noise-free product information as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information is guided by voice without the user having to check the screen.
An object of the present invention is to significantly shorten the computation time for selecting output target text blocks by selecting the output target text blocks according to the importance derived through supervised learning.
An embodiment of the present invention for solving these technical problems is a system for determining the importance of a text block extracted from an image, comprising: a character recognition unit that extracts text from an input image; a character block unit that generates text blocks by dividing the extracted text into sentence units; and an operation unit that extracts features from each text block to designate a characteristic for the text block and, when the designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
Preferably, the characteristic includes at least one attribute value among 'size, width, length, character reliability, or slope' of the text block.
The operation unit extracts the features included in each text block received from the character block unit and designates a characteristic for each text block; when the text block characteristic value exceeds a preset threshold value, the block is determined to be an output target text block and labeled '1', and when the text block characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
The system preferably further comprises a learning unit that assigns a value between '0' and '1' to each text block characteristic value through supervised learning.
At this time, the operation unit applies the characteristic values of the text blocks labeled '0' to the learning unit, and the learning unit feeds these characteristic values into an artificial neural network to re-label the text blocks; the text blocks labeled '1' are then delivered to the output unit.
Based on the system described above, the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention includes: (a) extracting, by the character recognition unit, text from a received image; (b) generating, by the character block unit, text blocks by dividing the extracted text into sentence units; (c) extracting, by the operation unit, features from each text block and designating a characteristic for the text block; (d) determining, by the operation unit, whether the text block characteristic value exceeds a preset threshold value; (e) when, as a result of the determination in step (d), the text block characteristic value exceeds the preset threshold value, determining that the text block is an output target text block and labeling it '1'; and (f) outputting, by the operation unit, the text blocks labeled '1' and removing the other text blocks.
Preferably, when, as a result of the determination in step (d), the text block characteristic value does not exceed the preset threshold value, the method further comprises a step (g) in which the learning unit applies the text blocks re-labeled '1' to the operation unit and the procedure proceeds to step (f).
According to an embodiment of the present invention as described above, features are extracted from text blocks generated by recognizing the texts included in an image, and binary classification of the output target text is performed by calculating the importance of the extracted features, which has the effect of selecting only the texts that match the calculated importance among the texts included in the image.
According to the present invention, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, only noise-free product information is provided as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information can be guided by voice without checking the screen.
According to the present invention, by selecting output target text blocks according to the importance derived through supervised learning, the computation time for selecting the output target text blocks can be significantly shortened.
FIGS. 1A and 1B are exemplary diagrams comparing a text block extracted through conventional OCR with a text block extracted by the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the detailed configuration of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIGS. 3A to 3E are exemplary diagrams illustrating the flow from character recognition by the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention to the final result of selecting output target text blocks.
FIG. 4 is a block diagram illustrating a learning unit and an output unit added to the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIGS. 5A and 5B are exemplary diagrams illustrating the filtering of features included in a text block through the operation unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention, and the training of the neural network model.
FIG. 6 is an exemplary diagram illustrating the classification as '0' of a text block filtered by the neural network model of the learning unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating the process after step S708 of the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
The specific features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Prior to this, the terms and words used in this specification and the claims should be interpreted with meanings and concepts that conform to the technical spirit of the present invention, based on the principle that an inventor may appropriately define concepts in order to describe his or her own invention in the best way. In addition, it should be noted that where a detailed description of well-known functions and configurations related to the present invention is judged liable to unnecessarily obscure the gist of the present invention, that detailed description has been omitted.
FIGS. 1A and 1B are exemplary diagrams illustrating a text block extracted through conventional OCR and a text block extracted by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
As shown in FIG. 1A, conventional OCR recognizes all text included in an image and extracts every text area as a block. In contrast, the OCR engine according to an embodiment of the present invention, shown in FIG. 1B, selects and filters out text blocks of low importance (indicated by double lines) and selects only text blocks of high importance as output target text blocks.
Hereinafter, referring to FIGS. 2 to 6, the structural features by which the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention selects only text blocks of high importance as output target text blocks are described below.
Referring to FIG. 2, the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention comprises a character recognition unit 100, a character block unit 200, and an operation unit 300.
The character recognition unit 100 extracts text from an input image; the character block unit 200 divides the extracted text into sentence units to generate text blocks; and the operation unit 300 extracts features from each text block to designate characteristics for that text block and, when a designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
Here, the features that the operation unit 300 extracts from a text block include at least one attribute value among size, width, length, character confidence, and slope; such attribute values may be changed to, or supplemented with, other attribute values contained in the text block.
Among the elements included in the features, character confidence is preferably understood as a numerical value indicating how closely the recognized text matches known text.
Although a detailed description thereof is omitted below, the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention is embedded in any one of a PC, laptop, tablet, or smartphone capable of communicating with a server connected through an information and communication network, and is driven by an application distributed and installed online.
Hereinafter, the detailed configuration of the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described below.
Specifically, the character recognition unit 100 individually extracts and sequentially recognizes each piece of text included in the input image, and passes the recognized text to the character block unit 200.
The character block unit 200 sets a text block that groups sentence-unit blocks so as to include the start-point coordinates and the end-point coordinates of the text recognized by the character recognition unit 100. Here, the end-point coordinates of the text may be specified as the coordinates at which a period is located in the recognized text, or as the end-point coordinates of the last text of a sentence; however, embodiments of the present invention are not limited thereto.
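Although the embodiment does not prescribe an implementation, the sentence-unit grouping described above can be sketched as follows. The word-record layout (`text`, `start`, `end` fields) and the trailing-period heuristic are illustrative assumptions, not the patent's actual data structures.

```python
# Illustrative sketch: group recognized words into sentence-unit text blocks.
# A block spans from the first word's start point to the last word's end point,
# and closes when a word ends with a period (one reading of the description).

def group_into_blocks(words):
    """words: list of dicts with 'text', 'start' (x, y), 'end' (x, y)."""
    blocks, current = [], []
    for word in words:
        current.append(word)
        if word["text"].endswith("."):   # period marks the end of a sentence
            blocks.append(_close_block(current))
            current = []
    if current:                          # last sentence may lack a period
        blocks.append(_close_block(current))
    return blocks


def _close_block(words):
    # The block keeps the first word's start coordinates and the last word's
    # end coordinates, as the description requires.
    return {
        "text": " ".join(w["text"] for w in words),
        "start": words[0]["start"],
        "end": words[-1]["end"],
    }
```

In practice the coordinates would come from the OCR engine's word bounding boxes; here they are placeholder tuples.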
The operation unit 300 extracts the features included in the text blocks received from the character block unit 200 and designates characteristics for each text block. When a text block characteristic value exceeds the preset threshold value, the block is judged to be an output target text block and labeled '1'; when the characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
At this time, the operation unit 300 performs labeling through binary classification that sets 'important' and 'not important' for the image corresponding to each text block to '1' and '0', respectively. The evaluation metrics of the heuristic model serving as the labeling criterion comprise precision, recall, and accuracy.
Here, precision is the proportion of text blocks that are actually 1 among those the heuristic model classified as 1; recall is the proportion of blocks the heuristic model classified as 1 among the text blocks that are actually 1; and accuracy is the proportion of text blocks, out of all text blocks, whose labels the heuristic model predicted correctly.
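The three metrics as defined above can be computed directly from predicted and ground-truth labels; the following sketch assumes labels encoded as 1 (important) and 0 (not important).

```python
# Evaluation metrics for the heuristic model's binary labels,
# following the definitions in the description.

def precision(pred, true):
    # Of the blocks the model labeled 1, how many are actually 1.
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, true))
    predicted_pos = sum(p == 1 for p in pred)
    return tp / predicted_pos if predicted_pos else 0.0


def recall(pred, true):
    # Of the blocks that are actually 1, how many the model labeled 1.
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, true))
    actual_pos = sum(t == 1 for t in true)
    return tp / actual_pos if actual_pos else 0.0


def accuracy(pred, true):
    # Of all blocks, how many the model labeled correctly.
    return sum(p == t for p, t in zip(pred, true)) / len(true)
```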
The threshold value is set based on the score of each evaluation metric of the heuristic model. When a number of test values specified by the operation unit 300 exceed an arbitrarily set initial threshold value, those test values are labeled '1', and the threshold is determined by verifying the precision, recall, and accuracy of the test values labeled '1'. In each verification round, the initial threshold value may be increased or decreased by a preset amount, and the process may be repeated as many times as inputs are received.
Here, verification is preferably understood as counting how many times a test value labeled '1' falls within the range (spectrum) of previously verified values (ground-truth values), and taking, as the threshold value, the initial threshold value with the highest count among the candidate initial threshold values.
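The threshold search just described can be sketched as a sweep over candidate initial thresholds; the candidate list and the verified range below are illustrative assumptions, not values from the patent.

```python
# Hedged sketch of the threshold-selection sweep: each candidate initial
# threshold labels the test values ('1' if the value exceeds it), the
# '1'-labeled values falling inside the verified (ground-truth) range are
# counted, and the candidate with the highest count becomes the threshold.

def choose_threshold(test_values, candidates, verified_range):
    lo, hi = verified_range
    best, best_count = None, -1
    for threshold in candidates:
        labeled_one = [v for v in test_values if v > threshold]
        count = sum(lo <= v <= hi for v in labeled_one)
        if count > best_count:           # keep the candidate with most hits
            best, best_count = threshold, count
    return best
```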
In addition, in the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention, the threshold values that serve as the labeling criteria for the respective characteristics may be set, for the evaluation metrics of the heuristic model (size, width, length, character confidence, and slope), to values of 400 to 600, 30 to 40, 30 to 40, 0.5 to 1, and 0.02 to 0.1, respectively.
The evaluation metrics of the above-described heuristic model are preferably set to 500, 35, 35, 0.75, and 0.05, respectively; however, these may be changed by the tests mentioned above and are therefore not limited to specific numerical values.
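Using the preferred values quoted above, per-feature labeling can be sketched as follows. The description does not fix how the five per-feature decisions are combined; requiring every characteristic to clear its threshold is one possible reading, adopted here purely for illustration.

```python
# Illustrative per-feature labeling with the preferred thresholds from the
# description (size 500, width 35, length 35, character confidence 0.75,
# slope 0.05). The all-features-must-pass combination rule is an assumption.

THRESHOLDS = {
    "size": 500,
    "width": 35,
    "length": 35,
    "confidence": 0.75,
    "slope": 0.05,
}


def label_by_features(block):
    # '1' (output target) only when every characteristic value exceeds
    # its threshold; otherwise '0' (filtered out).
    return int(all(block[name] > limit for name, limit in THRESHOLDS.items()))
```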
As described above, in the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention, the character recognition unit 100 receives an image (photograph) as shown in FIG. 3A; text is extracted from the image as shown in FIG. 3B; the character block unit 200 generates text blocks from the extracted characters as shown in FIG. 3C; the operation unit 300 filters out text blocks of low importance to remove noise (double-line boxes) as shown in FIG. 3D; and the operation unit 300 derives the output target text blocks as the final result as shown in FIG. 3E.
Meanwhile, as shown in FIG. 4, in addition to classifying output target text blocks through the heuristic model of the operation unit 300, the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention further includes a learning unit 400 that classifies output target text blocks using a deep-learning neural network model, and an output unit 500 that outputs each text item included in the output target text blocks as pre-recorded speech through a speaker.
Specifically, in the learning unit 400, the input layer receives the characteristic values of the text blocks labeled '0' and filtered out by the operation unit 300 and passes them to the hidden layer; through supervised learning, the output layer assigns each text block characteristic value a value between '0' and '1', and the larger assigned output value is set as the label of the text block. The closer the value is to '1', the more important the text block is understood to be.
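As a minimal sketch of this rescoring pass, a block's feature vector can be mapped to a score in (0, 1) by a single sigmoid unit; blocks scoring near 1 are recovered as important. The weights and bias below are illustrative placeholders, not trained values, and a real learning unit would use a trained multi-layer network.

```python
import math

# Hedged sketch of the learning unit's scoring of '0'-labeled blocks:
# a feature vector (size, width, length, confidence, slope) is mapped
# through a sigmoid to a score in (0, 1). Weights/bias are placeholders.

WEIGHTS = [0.004, 0.02, 0.02, 1.5, -2.0]  # one weight per feature (assumed)
BIAS = -4.0


def importance_score(features):
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))     # sigmoid keeps the score in (0, 1)


def relabel(features, cutoff=0.5):
    # Scores at or above the cutoff recover the block as important ('1').
    return 1 if importance_score(features) >= cutoff else 0
```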
In addition, the learning unit 400 passes the text blocks labeled '1' through supervised learning to the operation unit 300, which then classifies them as output target text blocks.
With the learning unit 400 so configured, in an embodiment of the present invention the operation unit 300 extracts the features included in a text block (size, width, length, character confidence, and slope), as shown in FIG. 5A.
Then, as shown in FIG. 5B, the neural network model is trained on the text blocks excluding those labeled '1', and the importance of each feature value of the text blocks is judged accordingly to remove noise. At this time, the text blocks not labeled '1' in FIG. 5B are labeled '0' and filtered out, as shown in FIG. 6.
Meanwhile, the result of selecting output target text blocks according to an embodiment of the present invention as described above is examined below with reference to the images shown in FIGS. 3A to 3E, 4, and 6.
Rather than extracting all text blocks from the images shown in FIGS. 3A to 3E, 4, and 6 and selecting them as output targets as in the prior art, only the two sentences describing the performance of the tablet PC in the image, "Simply attach it, and it pairs and charges." and "In a word, snap it on and it just clicks.", are classified as output targets.
In addition, the 'messenger conversation contents' are classified as unrelated text blocks and excluded from the output. This series of procedures is performed by the character recognition unit 100, the character block unit 200, and the operation unit 300, and the filtering function of the additionally configured learning unit 400 can also improve the speed at which the operation unit 300 classifies output target text blocks.
Hereinafter, a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described with reference to FIG. 7.
First, the character recognition unit 100 extracts text from an input image (S702).
Next, the character block unit 200 divides the extracted text into sentence units to generate text blocks (S704).
Subsequently, the operation unit 300 extracts features from each text block and designates characteristics for that text block (S706).
Next, the operation unit 300 determines whether a text block characteristic value exceeds the preset threshold value (S708).
If, as a result of the determination in step S708, the text block characteristic value exceeds the preset threshold value, the operation unit 300 judges the block to be an output target text block and labels it '1' (S710).
Then, the operation unit 300 outputs the text blocks labeled '1' and removes the remaining text blocks (S712).
Hereinafter, the process following step S708 of the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described with reference to FIG. 8.
If, as a result of the determination in step S708, the text block characteristic value does not exceed the preset threshold value, the learning unit 400 labels the text block '1' according to the importance derived through supervised learning (S802).
Then, the learning unit 400 passes the text blocks labeled '1' to the operation unit 300, and the procedure proceeds to step S712 (S804).
As described above, according to an embodiment of the present invention, features are extracted from the text blocks generated by recognizing the texts included in an image, and binary classification of the output target text is performed by calculating the importance of the extracted features. Accordingly, only the texts that match the calculated importance among the texts included in the image can be selected and output, and, through linkage with a screen reader, only the information a visually impaired person needs during online shopping can be provided by voice.
Although the foregoing has described and illustrated preferred embodiments exemplifying the technical idea of the present invention, the present invention is not limited to the configurations and operations as shown and described, and those skilled in the art will readily understand that numerous changes and modifications can be made to the present invention without departing from the scope of the technical idea. Accordingly, all such appropriate changes, modifications, and equivalents should be regarded as falling within the scope of the present invention.
[Description of Reference Signs]
S: system for determining the importance of a text block extracted from an image
100: character recognition unit
200: character block unit
300: operation unit
400: learning unit
500: output unit

Claims (6)

  1. A system for determining the importance of a text block extracted from an image, the system comprising:
    a character recognition unit that extracts text from an input image;
    a character block unit that generates text blocks by dividing the extracted text into sentence units; and
    an operation unit that extracts features from each text block to designate characteristics for that text block and, when a designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
  2. The system of claim 1, wherein the features include at least one attribute value among size, width, length, character confidence, and slope of the text block.
  3. The system of claim 1, wherein the operation unit extracts the features included in the text blocks received from the character block unit and designates characteristics for each text block, and
    wherein, when a text block characteristic value exceeds the preset threshold value, the operation unit judges the block to be an output target text block and labels it '1', and, when the text block characteristic value is less than or equal to the preset threshold value, labels the block '0' and filters it out.
  4. The system of claim 1, further comprising a learning unit that assigns a value between '0' and '1' to each of the text block characteristic values through supervised learning.
  5. A method for determining the importance of a text block extracted from an image, the method comprising:
    (a) extracting, by a character recognition unit, text from an input image;
    (b) generating, by a character block unit, text blocks by dividing the extracted text into sentence units;
    (c) extracting, by an operation unit, features from each text block to designate characteristics for that text block;
    (d) determining, by the operation unit, whether a text block characteristic value exceeds a preset threshold value;
    (e) when the determination in step (d) shows that the text block characteristic value exceeds the preset threshold value, judging, by the operation unit, the block to be an output target text block and labeling it '1'; and
    (f) outputting, by the operation unit, the text blocks labeled '1' and removing the remaining text blocks.
  6. The method of claim 5, further comprising:
    (g) when the determination in step (d) shows that the text block characteristic value does not exceed the preset threshold value, applying, by a learning unit, the text block labeled '1' to the operation unit and proceeding to step (f).
PCT/KR2020/015822 2020-02-27 2020-11-11 System and method for determining importance of text block extracted from image WO2021172699A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0024023 2020-02-27
KR1020200024023A KR102374281B1 (en) 2020-02-27 2020-02-27 Importance Determination System of Text Block Extracted from Image and Its Method

Publications (1)

Publication Number Publication Date
WO2021172699A1 true WO2021172699A1 (en) 2021-09-02

Family

ID=77491881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/015822 WO2021172699A1 (en) 2020-02-27 2020-11-11 System and method for determining importance of text block extracted from image

Country Status (2)

Country Link
KR (1) KR102374281B1 (en)
WO (1) WO2021172699A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000293521A (en) * 1999-04-09 2000-10-20 Canon Inc Image processing method, its device and storage medium
KR20130080745A (en) * 2012-01-05 2013-07-15 주식회사 인프라웨어 Method of generating electronic documents using camera module of smart phones and ocr engine of remote server, and terminal device using the same
KR20160105215A (en) * 2015-02-27 2016-09-06 삼성전자주식회사 Apparatus and method for processing text
KR102001375B1 (en) * 2019-02-19 2019-07-18 미래에셋대우 주식회사 Apparatus and Method for DistinguishingSpam in Financial News
KR20200002141A (en) * 2018-06-29 2020-01-08 김종진 Providing Method Of Language Learning Contents Based On Image And System Thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100652010B1 (en) * 2005-12-02 2006-12-01 한국전자통신연구원 Apparatus and method for constituting character by teeth-clenching
US9436682B2 (en) 2014-06-24 2016-09-06 Google Inc. Techniques for machine language translation of text from an image based on non-textual context information from the image

Also Published As

Publication number Publication date
KR102374281B1 (en) 2022-03-16
KR20210109146A (en) 2021-09-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20921491

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20921491

Country of ref document: EP

Kind code of ref document: A1