WO2021172699A1 - System and method for determining importance of text block extracted from image - Google Patents


Info

Publication number
WO2021172699A1
Authority
WO
WIPO (PCT)
Prior art keywords
text block
text
extracted
block
image
Prior art date
Application number
PCT/KR2020/015822
Other languages
French (fr)
Korean (ko)
Inventor
박지혁
한예지
장민성
Original Assignee
주식회사 와들
Priority date
Filing date
Publication date
Application filed by 주식회사 와들 filed Critical 주식회사 와들
Publication of WO2021172699A1 publication Critical patent/WO2021172699A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image

Definitions

  • the present invention relates to a system and method for determining the importance of a text block extracted from an image, and more particularly, to a technique for selecting and outputting only texts that match a calculated importance among texts recognized from an image.
  • OCR (Optical Character Reader): a device that reads characters using light, converting printed or handwritten letters, numbers, and other symbols into electrical signals encoded suitably for a digital computer.
  • An object of the present invention is to extract features from text blocks generated by recognizing the texts included in an image and to perform binary classification of the output target text by calculating the importance of the extracted features, so that only the texts that match the calculated importance among the texts included in the image are selected and output.
  • An object of the present invention is, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, to provide only noise-free product information as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information is guided by voice without the user having to check the screen.
  • An object of the present invention is to significantly shorten the computation time for selecting output target text blocks by selecting the output target text blocks according to the importance derived through supervised learning.
  • An embodiment of the present invention for solving these technical problems is a system for determining the importance of a text block extracted from an image, comprising: a character recognition unit that extracts text from an input image; a character block unit that generates text blocks by dividing the extracted text into sentence units; and an operation unit that extracts features from each text block to designate a characteristic for the text block and, when the designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
  • the characteristic includes at least one attribute value among 'size, width, length, character reliability, or slope' of the text block.
  • the operation unit extracts the features included in each text block received from the character block unit and designates a characteristic for each text block; when the text block characteristic value exceeds a preset threshold value, the block is determined to be an output target text block and labeled '1', and when the text block characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
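The labeling rule above can be sketched in code. This is a minimal illustration, not the patent's implementation: the feature names and the preferred per-characteristic threshold values ('500, 35, 35, 0.75, and 0.05') are taken from the embodiment described later in this document, while the dictionary representation and the choice to require every characteristic to exceed its threshold are assumptions.

```python
# Preset thresholds per characteristic, using the preferred values from
# the embodiment ('500, 35, 35, 0.75, and 0.05').
THRESHOLDS = {
    "size": 500,
    "width": 35,
    "length": 35,
    "char_reliability": 0.75,
    "slope": 0.05,
}

def label_block(features):
    """Return 1 (output target) when every characteristic value exceeds
    its preset threshold, otherwise 0 (filtered out)."""
    if all(features[name] > t for name, t in THRESHOLDS.items()):
        return 1
    return 0
```

Blocks labeled '0' here would, per the embodiment, be passed on to the learning unit for re-labeling rather than discarded outright.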
  • the operation unit applies the characteristic values of the text blocks labeled '0' to the learning unit, and the learning unit feeds these characteristic values into an artificial neural network to re-label the text blocks; the text blocks labeled '1' are then delivered to the output unit.
  • the method for determining the importance of a text block extracted from an image includes: (a) extracting, by the character recognition unit, text from a received image; (b) generating, by the character block unit, text blocks by dividing the extracted text into sentence units; (c) extracting, by the operation unit, features from each text block and designating a characteristic for the text block; (d) determining, by the operation unit, whether the text block characteristic value exceeds a preset threshold value; (e) when, as a result of the determination in step (d), the text block characteristic value exceeds the preset threshold value, determining that the text block is an output target text block and labeling it '1'; and (f) outputting, by the operation unit, the text blocks labeled '1' and removing the other text blocks.
  • when, as a result of the determination in step (d), the text block characteristic value does not exceed the preset threshold value, the method further comprises a step (g) in which the learning unit applies the text blocks re-labeled '1' to the operation unit and the procedure proceeds to step (f).
  • according to the present invention, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, only noise-free product information is provided as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information can be guided by voice without checking the screen.
  • the computation time for selecting the output target text blocks can be significantly shortened.
  • FIGS. 1A and 1B are exemplary diagrams comparing a text block extracted through conventional OCR with a text block extracted by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating a detailed configuration of a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIGS. 3A to 3E are exemplary diagrams illustrating the flow from character recognition by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention to the final result of selecting output target text blocks.
  • FIG. 4 is a block diagram illustrating a learning unit and an output unit added to the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIGS. 5A and 5B are exemplary diagrams illustrating the filtering of features included in a text block through the operation unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention, and the training of the neural network model.
  • FIG. 6 is an exemplary diagram illustrating classification of a text block filtered by the neural network model of the learning unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention as '0'.
  • FIG. 7 is a flowchart illustrating a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIG. 8 is a flowchart illustrating the process after step S708 of a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • FIGS. 1A and 1B are exemplary diagrams illustrating a text block extracted through conventional OCR and a text block extracted by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
  • the following examines the structural characteristics by which the system (S) for determining the importance of a text block extracted from an image selects only text blocks of high importance as output target text blocks:
  • the system S for determining the importance of a text block extracted from an image includes a character recognition unit 100, a character block unit 200, and an operation unit 300.
  • the character recognition unit 100 extracts text from the received image; the character block unit 200 divides the extracted text into sentence units to generate text blocks; and the operation unit 300 extracts features from each text block to designate a characteristic for the corresponding text block, classifying the block as an output target text block if the designated characteristic value exceeds a preset threshold value.
  • the feature extracted from the text block by the operation unit 300 includes at least one attribute value among 'size, width, length, character reliability, or slope', and other attribute values included in the text block may be substituted or added.
  • the character reliability is preferably understood as a numerical value indicating how well the recognized text matches the known text.
  • the system (S) for determining the importance of a text block extracted from an image is built into any one device among a PC, notebook, tablet, or smartphone capable of communicating with a server connected through an information and communication network, and is driven by an application distributed and installed online.
  • the character recognition unit 100 individually extracts each text included in the input image, sequentially recognizes it, and applies the recognized text to the character block unit 200 .
  • the character block unit 200 forms a text block by tying sentence units together so that the block includes the coordinates of the start point of the text recognized by the character recognition unit 100 and the coordinates of the end point of the recognized text.
  • the endpoint coordinates of the text may be specified as the coordinates at which a period is positioned in the recognized text, or as the endpoint coordinates when the recognized text is the last text of a sentence; however, embodiments of the present invention are not limited thereto.
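The sentence-unit blocking described above can be sketched as follows, assuming recognized text arrives as fragments in reading order, each carrying start and end coordinates. The tuple representation, the helper name, and the use of a trailing period as the sentence delimiter (per the paragraph above) are illustrative assumptions, not the patent's API.

```python
def build_text_blocks(fragments):
    """fragments: list of (text, start_xy, end_xy) in reading order.
    Ties fragments into sentence-unit blocks, each keeping the start
    coordinates of its first fragment and the end coordinates of the
    fragment where the sentence ends (a period, or the last text)."""
    blocks, current = [], []
    for text, start, end in fragments:
        current.append((text, start, end))
        if text.rstrip().endswith("."):        # period marks the sentence end
            blocks.append({
                "text": " ".join(t for t, _, _ in current),
                "start": current[0][1],         # start point of first fragment
                "end": current[-1][2],          # end point of last fragment
            })
            current = []
    if current:                                 # last text of a sentence without a period
        blocks.append({
            "text": " ".join(t for t, _, _ in current),
            "start": current[0][1],
            "end": current[-1][2],
        })
    return blocks
```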
  • the operation unit 300 extracts the features included in each text block received from the character block unit 200 and designates a characteristic for each text block; when the text block characteristic value exceeds a preset threshold value, the block is determined to be an output target text block and labeled '1', and when the text block characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
  • the operation unit 300 performs labeling through binary classification in which text blocks that are 'important' or 'not important' in the image are set to '1' and '0', respectively.
  • the evaluation indices of this heuristic model consist of precision, recall, and accuracy.
  • precision is the ratio of blocks that are actually 1 among the text blocks classified as 1 by the heuristic model.
  • recall is the ratio of blocks classified as 1 by the heuristic model among the text blocks that are actually 1.
  • accuracy is the ratio, among all text blocks, of blocks whose label assigned by the heuristic model matches the actual label.
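Stated as code, the three evaluation indices are as follows (a straightforward sketch; the parallel 0/1 label lists are an assumed representation):

```python
def precision(predicted, actual):
    """Of the blocks the model classified as 1, the fraction actually 1."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    classified_1 = sum(predicted)
    return tp / classified_1 if classified_1 else 0.0

def recall(predicted, actual):
    """Of the blocks actually 1, the fraction the model classified as 1."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    actually_1 = sum(actual)
    return tp / actually_1 if actually_1 else 0.0

def accuracy(predicted, actual):
    """Of all blocks, the fraction whose predicted label matches the actual label."""
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)
```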
  • the threshold value is set based on the score for each evaluation index of the heuristic model.
  • the calculation unit 300 determines the threshold value by labeling test values and performing verification on the precision, recall, and accuracy of the test values labeled '1'.
  • the initial threshold value may be increased or decreased by a preset value, repeating as many times as the number of inputs.
  • the verification counts the number of times a test value labeled '1' falls within the range (spectrum) of the previously verified values (correct answer values), and it is preferably understood that, among the initial threshold values, the one with the highest count is extracted as the threshold value.
  • the threshold value serving as the labeling standard for each characteristic of the heuristic model ('size, width, length, character reliability, and slope') may be set to values of '400 to 600, 30 to 40, 30 to 40, 0.5 to 1, and 0.02 to 0.1', respectively.
  • the threshold values for each characteristic of the above-described heuristic model are preferably set to '500, 35, 35, 0.75, and 0.05', respectively, but may be changed by the aforementioned test and are not limited to specific numerical values.
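The threshold search described above might be sketched as follows. This is a hedged reconstruction: the document only says that candidate thresholds are stepped from an initial value and scored by how often values labeled '1' fall within the verified correct range, so the additional credit given here for values correctly labeled '0' outside the range (without which the lowest candidate would always win) is an assumption, as are all names.

```python
def select_threshold(values, correct_range, initial, step, n_steps):
    """Pick, among stepped candidate thresholds, the one whose labeling
    best agrees with the verified correct-answer range (spectrum)."""
    low, high = correct_range
    best_threshold, best_count = initial, -1
    for i in range(n_steps):
        candidate = initial + i * step   # initial threshold stepped by a preset value
        # A value is counted when its label agrees with the range:
        # labeled '1' (> candidate) and inside [low, high], or
        # labeled '0' and outside the range.
        count = sum(1 for v in values if (v > candidate) == (low <= v <= high))
        if count > best_count:
            best_threshold, best_count = candidate, count
    return best_threshold
```

For instance, searching over candidates 400, 450, ..., 600 against a verified range of 500 to 600 would keep the candidate that filters out the low outliers while retaining the in-range values.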
  • as illustrated in FIGS. 3A to 3E, the character recognition unit 100 receives an image (photo) as shown in FIG. 3A; text is extracted from the image as shown in FIG. 3B; the character block unit 200 generates text blocks from the extracted characters as shown in FIG. 3C; the operation unit 300 determines the importance level as shown in FIG. 3D, in which noise is marked with a double-line box; and the operation unit 300 derives the output target text blocks as the final result as shown in FIG. 3E.
  • the importance determination system (S) for a text block extracted from an image classifies output target text blocks through the heuristic model of the operation unit 300, and further includes a learning unit 400 that classifies output target text blocks through a deep-learning neural network model and an output unit 500 that outputs each text included in the output target text blocks as a pre-recorded voice through a speaker.
  • the learning unit 400 receives, at its input layer, the text block characteristic values labeled '0' and filtered by the operation unit 300, applies them to a hidden layer, and, through supervised learning, assigns a value between '0' and '1' to each text block characteristic value at the output layer; the label with the larger output value is set as the label of the text block. The closer the value is to '1', the more important the text block is understood to be.
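A minimal forward pass of such a network can be sketched as follows, assuming one hidden layer and a sigmoid output in (0, 1). The weights in the usage example are placeholders, since in the described system they would be obtained through supervised learning.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relabel(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: characteristic values -> hidden layer -> output in (0, 1).
    The block is re-labeled '1' when the output is closer to 1 than to 0."""
    hidden = [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    score = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return score, (1 if score >= 0.5 else 0)
```

With trained weights, blocks whose score approaches '1' would be re-labeled as output targets and forwarded to the output unit; the rest remain filtered out as '0'.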
  • the learning unit 400 applies the text blocks re-labeled '1' through supervised learning to the operation unit 300, and the blocks so applied are classified by the operation unit 300 as output target text blocks.
  • the calculation unit 300 extracts the features included in each text block ('size, width, length, character reliability, and slope'), and the neural network model is trained on the text blocks other than those labeled '1'; accordingly, the importance of each feature value of a text block is determined so as to remove noise.
  • the text block not labeled with '1' as shown in FIG. 5B is labeled with '0' and filtered as shown in FIG. 6 .
  • the output target text block selection result according to an embodiment of the present invention is described below with reference to the images shown in FIGS. 3A to 3E, 4, and 6.
  • from the image, only two sentences expressing the performance of the tablet PC are classified as output targets: the phrase “Pairing just by attaching it” and the sentence “It can be done and it can be recharged.”
  • 'messenger conversation contents' are classified as unrelated text blocks and excluded from output. This series of procedures is performed by the character recognition unit 100, the character block unit 200, and the operation unit 300; through the filtering function of the additionally configured learning unit 400, the classification speed of the output target text blocks performed by the calculating unit 300 can also be improved.
  • the text recognition unit 100 extracts text from the input image (S702).
  • the text block unit 200 divides the extracted text into sentence units to generate a text block (S704).
  • the operation unit 300 extracts features from each text block and designates a characteristic for the text block (S706).
  • the operation unit 300 determines whether the text block characteristic value exceeds a preset threshold value (S708).
  • step S708 when the text block characteristic value exceeds a preset threshold value, the operation unit 300 determines that the text block is an output target text block and labels the text block as '1' (S710).
  • the operation unit 300 outputs a text block labeled '1', and removes other text blocks (S712).
  • the process after step S708 of the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described as follows.
  • when, as a result of the determination in step S708, the text block characteristic value does not exceed the preset threshold value, the learning unit 400 labels the text block '1' according to the importance derived through supervised learning (S802).
  • thereafter, the procedure proceeds to step S712 (S804).
  • according to the embodiments described above, features are extracted from text blocks generated by recognizing the texts included in an image, and binary classification of the output target text is performed by calculating the importance of the extracted features; thus, only the texts that match the calculated importance among the texts included in the image can be selected and output, and only the information necessary for online shopping can be provided by voice to the visually impaired through linkage with a screen reader.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention relates to a system and a method for determining the importance of a text block extracted from an image, the system comprising: a character recognition unit for extracting text from an input image; a character block unit for generating a text block by dividing the extracted text into sentence units; and a calculation unit for extracting a feature from each text block, designating a characteristic for the corresponding text block, and, if the value of the designated characteristic exceeds a preconfigured threshold value, classifying the text block as a text block to be output. According to the present invention: features may be extracted from text blocks generated by recognizing the texts included in an image; by performing binary classification on the text to be output, via importance calculation for the extracted features, only the text that matches the calculated importance from among the texts included in the image may be selected and output; and, via linkage with a screen reader, only the information necessary for online shopping may be provided by voice for a visually impaired person.

Description

System and method for determining the importance of text blocks extracted from images
The present invention relates to a system and method for determining the importance of a text block extracted from an image and, more particularly, to a technique for selecting and outputting only the texts that match a calculated importance among the texts recognized from an image.
An Optical Character Reader (OCR) is a device that reads characters using light, converting information in the form of letters, numbers, or other symbols, printed on paper or handwritten, into electrical signals encoded suitably for a digital computer.
A number of companies and research institutes are developing various OCR models. These models are developed with a focus on precisely recognizing all text in an image, and in recent years it has become possible to recognize even small and blurry text.
However, because conventional OCR recognizes even text irrelevant to the important content, such as small text printed in the background, there is the inconvenience of having to filter out unnecessary text.
In addition, even when text recognized through OCR is output through a TTS (Text To Speech) function, all of the text is read aloud, conveying unnecessary text as well, which makes it difficult to deliver the correct meaning to the listener.
[Prior art literature]
[Patent Document] Republic of Korea Patent Publication No. 10-2017-0010843 (published on February 1, 2017)
An object of the present invention is to extract features from text blocks generated by recognizing the texts included in an image and to perform binary classification of the output target text by calculating the importance of the extracted features, so that only the texts that match the calculated importance among the texts included in the image are selected and output.
An object of the present invention is, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, to provide only noise-free product information as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information is guided by voice without the user having to check the screen.
An object of the present invention is to significantly shorten the computation time for selecting output target text blocks by selecting the output target text blocks according to the importance derived through supervised learning.
An embodiment of the present invention for solving these technical problems is a system for determining the importance of a text block extracted from an image, comprising: a character recognition unit that extracts text from an input image; a character block unit that generates text blocks by dividing the extracted text into sentence units; and an operation unit that extracts features from each text block to designate a characteristic for the text block and, when the designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
Preferably, the characteristic includes at least one attribute value among 'size, width, length, character reliability, or slope' of the text block.
The operation unit extracts the features included in each text block received from the character block unit and designates a characteristic for each text block; when the text block characteristic value exceeds a preset threshold value, the block is determined to be an output target text block and labeled '1', and when the text block characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
The system preferably further comprises a learning unit that assigns a value between '0' and '1' to each text block characteristic value through supervised learning.
At this time, the operation unit applies the characteristic values of the text blocks labeled '0' to the learning unit, and the learning unit feeds these characteristic values into an artificial neural network to re-label the text blocks; the text blocks labeled '1' are then delivered to the output unit.
Based on the system described above, the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention includes: (a) extracting, by the character recognition unit, text from a received image; (b) generating, by the character block unit, text blocks by dividing the extracted text into sentence units; (c) extracting, by the operation unit, features from each text block and designating a characteristic for the text block; (d) determining, by the operation unit, whether the text block characteristic value exceeds a preset threshold value; (e) when, as a result of the determination in step (d), the text block characteristic value exceeds the preset threshold value, determining that the text block is an output target text block and labeling it '1'; and (f) outputting, by the operation unit, the text blocks labeled '1' and removing the other text blocks.
Preferably, when, as a result of the determination in step (d), the text block characteristic value does not exceed the preset threshold value, the method further comprises a step (g) in which the learning unit applies the text blocks re-labeled '1' to the operation unit and the procedure proceeds to step (f).
According to an embodiment of the present invention as described above, features are extracted from text blocks generated by recognizing the texts included in an image, and binary classification of the output target text is performed by calculating the importance of the extracted features, which has the effect of selecting only the texts that match the calculated importance among the texts included in the image.
According to the present invention, by selecting and outputting only the text that matches the calculated importance among the texts included in an image, only noise-free product information is provided as text when applied to a shopping mall image containing product information, so that, in conjunction with a screen reader, only the necessary information can be guided by voice without checking the screen.
According to the present invention, by selecting output target text blocks according to the importance derived through supervised learning, the computation time for selecting the output target text blocks can be significantly shortened.
FIGS. 1A and 1B are exemplary diagrams comparing a text block extracted through conventional OCR with a text block extracted by the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the detailed configuration of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIGS. 3A to 3E are exemplary diagrams illustrating the flow from character recognition by the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention to the final result of selecting output target text blocks.
FIG. 4 is a block diagram illustrating a learning unit and an output unit added to the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIGS. 5A and 5B are exemplary diagrams illustrating the filtering of features included in a text block through the operation unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention, and the training of the neural network model.
FIG. 6 is an exemplary diagram illustrating the classification as '0' of a text block filtered by the neural network model of the learning unit of the system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating the process after step S708 of the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
The specific features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Prior to this, the terms and words used in this specification and the claims should be interpreted with meanings and concepts that conform to the technical spirit of the present invention, based on the principle that an inventor may appropriately define concepts in order to describe his or her own invention in the best way. In addition, it should be noted that where a detailed description of well-known functions and configurations related to the present invention is judged liable to unnecessarily obscure the gist of the present invention, that detailed description has been omitted.
FIGS. 1A and 1B are exemplary diagrams illustrating a text block extracted through conventional OCR and a text block extracted by a system for determining the importance of a text block extracted from an image according to an embodiment of the present invention.
As shown in FIG. 1A, conventional OCR recognizes all text included in an image and extracts every text area as a block. In contrast, the OCR engine according to an embodiment of the present invention, shown in FIG. 1B, selects and filters out text blocks of low importance (indicated by double lines) and selects only text blocks of high importance as output target text blocks.
Hereinafter, referring to FIGS. 2 to 6, the structural features by which the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention selects only text blocks of high importance as output target text blocks are described below.
Referring to FIG. 2, the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention comprises a character recognition unit 100, a character block unit 200, and an operation unit 300.
The character recognition unit 100 extracts text from an input image; the character block unit 200 divides the extracted text into sentence units to generate text blocks; and the operation unit 300 extracts features from each text block to designate characteristics for that text block and, when a designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
Here, the features that the operation unit 300 extracts from a text block include at least one attribute value among size, width, length, character confidence, and slope; such attribute values may be changed to, or supplemented with, other attribute values contained in the text block.
Among the elements included in the features, character confidence is preferably understood as a numerical value indicating how closely the recognized text matches known text.
Although a detailed description thereof is omitted below, the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention is embedded in any one of a PC, laptop, tablet, or smartphone capable of communicating with a server connected through an information and communication network, and is driven by an application distributed and installed online.
Hereinafter, the detailed configuration of the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described below.
Specifically, the character recognition unit 100 individually extracts and sequentially recognizes each piece of text included in the input image, and passes the recognized text to the character block unit 200.
The character block unit 200 sets a text block that groups sentence-unit blocks so as to include the start-point coordinates and the end-point coordinates of the text recognized by the character recognition unit 100. Here, the end-point coordinates of the text may be specified as the coordinates at which a period is located in the recognized text, or as the end-point coordinates of the last text of a sentence; however, embodiments of the present invention are not limited thereto.
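Although the embodiment does not prescribe an implementation, the sentence-unit grouping described above can be sketched as follows. The word-record layout (`text`, `start`, `end` fields) and the trailing-period heuristic are illustrative assumptions, not the patent's actual data structures.

```python
# Illustrative sketch: group recognized words into sentence-unit text blocks.
# A block spans from the first word's start point to the last word's end point,
# and closes when a word ends with a period (one reading of the description).

def group_into_blocks(words):
    """words: list of dicts with 'text', 'start' (x, y), 'end' (x, y)."""
    blocks, current = [], []
    for word in words:
        current.append(word)
        if word["text"].endswith("."):   # period marks the end of a sentence
            blocks.append(_close_block(current))
            current = []
    if current:                          # last sentence may lack a period
        blocks.append(_close_block(current))
    return blocks


def _close_block(words):
    # The block keeps the first word's start coordinates and the last word's
    # end coordinates, as the description requires.
    return {
        "text": " ".join(w["text"] for w in words),
        "start": words[0]["start"],
        "end": words[-1]["end"],
    }
```

In practice the coordinates would come from the OCR engine's word bounding boxes; here they are placeholder tuples.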
The operation unit 300 extracts the features included in the text blocks received from the character block unit 200 and designates characteristics for each text block. When a text block characteristic value exceeds the preset threshold value, the block is judged to be an output target text block and labeled '1'; when the characteristic value is less than or equal to the preset threshold value, the block is labeled '0' and filtered out.
At this time, the operation unit 300 performs labeling through binary classification that sets 'important' and 'not important' for the image corresponding to each text block to '1' and '0', respectively. The evaluation metrics of the heuristic model serving as the labeling criterion comprise precision, recall, and accuracy.
Here, precision is the proportion of text blocks that are actually 1 among those the heuristic model classified as 1; recall is the proportion of blocks the heuristic model classified as 1 among the text blocks that are actually 1; and accuracy is the proportion of text blocks, out of all text blocks, whose labels the heuristic model predicted correctly.
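The three metrics as defined above can be computed directly from predicted and ground-truth labels; the following sketch assumes labels encoded as 1 (important) and 0 (not important).

```python
# Evaluation metrics for the heuristic model's binary labels,
# following the definitions in the description.

def precision(pred, true):
    # Of the blocks the model labeled 1, how many are actually 1.
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, true))
    predicted_pos = sum(p == 1 for p in pred)
    return tp / predicted_pos if predicted_pos else 0.0


def recall(pred, true):
    # Of the blocks that are actually 1, how many the model labeled 1.
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, true))
    actual_pos = sum(t == 1 for t in true)
    return tp / actual_pos if actual_pos else 0.0


def accuracy(pred, true):
    # Of all blocks, how many the model labeled correctly.
    return sum(p == t for p, t in zip(pred, true)) / len(true)
```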
The threshold value is set based on the score of each evaluation metric of the heuristic model. When a number of test values specified by the operation unit 300 exceed an arbitrarily set initial threshold value, those test values are labeled '1', and the threshold is determined by verifying the precision, recall, and accuracy of the test values labeled '1'. In each verification round, the initial threshold value may be increased or decreased by a preset amount, and the process may be repeated as many times as inputs are received.
Here, verification is preferably understood as counting how many times a test value labeled '1' falls within the range (spectrum) of previously verified values (ground-truth values), and taking, as the threshold value, the initial threshold value with the highest count among the candidate initial threshold values.
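The threshold search just described can be sketched as a sweep over candidate initial thresholds; the candidate list and the verified range below are illustrative assumptions, not values from the patent.

```python
# Hedged sketch of the threshold-selection sweep: each candidate initial
# threshold labels the test values ('1' if the value exceeds it), the
# '1'-labeled values falling inside the verified (ground-truth) range are
# counted, and the candidate with the highest count becomes the threshold.

def choose_threshold(test_values, candidates, verified_range):
    lo, hi = verified_range
    best, best_count = None, -1
    for threshold in candidates:
        labeled_one = [v for v in test_values if v > threshold]
        count = sum(lo <= v <= hi for v in labeled_one)
        if count > best_count:           # keep the candidate with most hits
            best, best_count = threshold, count
    return best
```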
In addition, in the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention, the threshold values that serve as the labeling criteria for the respective characteristics may be set, for the evaluation metrics of the heuristic model (size, width, length, character confidence, and slope), to values of 400 to 600, 30 to 40, 30 to 40, 0.5 to 1, and 0.02 to 0.1, respectively.
The evaluation metrics of the above-described heuristic model are preferably set to 500, 35, 35, 0.75, and 0.05, respectively; however, these may be changed by the tests mentioned above and are therefore not limited to specific numerical values.
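Using the preferred values quoted above, per-feature labeling can be sketched as follows. The description does not fix how the five per-feature decisions are combined; requiring every characteristic to clear its threshold is one possible reading, adopted here purely for illustration.

```python
# Illustrative per-feature labeling with the preferred thresholds from the
# description (size 500, width 35, length 35, character confidence 0.75,
# slope 0.05). The all-features-must-pass combination rule is an assumption.

THRESHOLDS = {
    "size": 500,
    "width": 35,
    "length": 35,
    "confidence": 0.75,
    "slope": 0.05,
}


def label_by_features(block):
    # '1' (output target) only when every characteristic value exceeds
    # its threshold; otherwise '0' (filtered out).
    return int(all(block[name] > limit for name, limit in THRESHOLDS.items()))
```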
As described above, in the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention, the character recognition unit 100 receives an image (photograph) as shown in FIG. 3A; text is extracted from the image as shown in FIG. 3B; the character block unit 200 generates text blocks from the extracted characters as shown in FIG. 3C; the operation unit 300 filters out text blocks of low importance to remove noise (double-line boxes) as shown in FIG. 3D; and the operation unit 300 derives the output target text blocks as the final result as shown in FIG. 3E.
Meanwhile, as shown in FIG. 4, in addition to classifying output target text blocks through the heuristic model of the operation unit 300, the system S for determining the importance of a text block extracted from an image according to an embodiment of the present invention further includes a learning unit 400 that classifies output target text blocks using a deep-learning neural network model, and an output unit 500 that outputs each text item included in the output target text blocks as pre-recorded speech through a speaker.
Specifically, in the learning unit 400, the input layer receives the characteristic values of the text blocks labeled '0' and filtered out by the operation unit 300 and passes them to the hidden layer; through supervised learning, the output layer assigns each text block characteristic value a value between '0' and '1', and the larger assigned output value is set as the label of the text block. The closer the value is to '1', the more important the text block is understood to be.
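As a minimal sketch of this rescoring pass, a block's feature vector can be mapped to a score in (0, 1) by a single sigmoid unit; blocks scoring near 1 are recovered as important. The weights and bias below are illustrative placeholders, not trained values, and a real learning unit would use a trained multi-layer network.

```python
import math

# Hedged sketch of the learning unit's scoring of '0'-labeled blocks:
# a feature vector (size, width, length, confidence, slope) is mapped
# through a sigmoid to a score in (0, 1). Weights/bias are placeholders.

WEIGHTS = [0.004, 0.02, 0.02, 1.5, -2.0]  # one weight per feature (assumed)
BIAS = -4.0


def importance_score(features):
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))     # sigmoid keeps the score in (0, 1)


def relabel(features, cutoff=0.5):
    # Scores at or above the cutoff recover the block as important ('1').
    return 1 if importance_score(features) >= cutoff else 0
```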
In addition, the learning unit 400 passes the text blocks labeled '1' through supervised learning to the operation unit 300, which then classifies them as output target text blocks.
With the learning unit 400 so configured, in an embodiment of the present invention the operation unit 300 extracts the features included in a text block (size, width, length, character confidence, and slope), as shown in FIG. 5A.
Then, as shown in FIG. 5B, the neural network model is trained on the text blocks excluding those labeled '1', and the importance of each feature value of the text blocks is judged accordingly to remove noise. At this time, the text blocks not labeled '1' in FIG. 5B are labeled '0' and filtered out, as shown in FIG. 6.
Meanwhile, the result of selecting output target text blocks according to an embodiment of the present invention as described above is examined below with reference to the images shown in FIGS. 3A to 3E, 4, and 6.
Rather than extracting all text blocks from the images shown in FIGS. 3A to 3E, 4, and 6 and selecting them as output targets as in the prior art, only the two sentences describing the performance of the tablet PC in the image, "Simply attach it, and it pairs and charges." and "In a word, snap it on and it just clicks.", are classified as output targets.
In addition, the 'messenger conversation contents' are classified as unrelated text blocks and excluded from the output. This series of procedures is performed by the character recognition unit 100, the character block unit 200, and the operation unit 300, and the filtering function of the additionally configured learning unit 400 can also improve the speed at which the operation unit 300 classifies output target text blocks.
Hereinafter, a method for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described with reference to FIG. 7.
First, the character recognition unit 100 extracts text from an input image (S702).
Next, the character block unit 200 divides the extracted text into sentence units to generate text blocks (S704).
Subsequently, the operation unit 300 extracts features from each text block and designates characteristics for that text block (S706).
Next, the operation unit 300 determines whether a text block characteristic value exceeds the preset threshold value (S708).
If, as a result of the determination in step S708, the text block characteristic value exceeds the preset threshold value, the operation unit 300 judges the block to be an output target text block and labels it '1' (S710).
Then, the operation unit 300 outputs the text blocks labeled '1' and removes the remaining text blocks (S712).
Hereinafter, the process following step S708 of the method for determining the importance of a text block extracted from an image according to an embodiment of the present invention is described with reference to FIG. 8.
If, as a result of the determination in step S708, the text block characteristic value does not exceed the preset threshold value, the learning unit 400 labels the text block '1' according to the importance derived through supervised learning (S802).
Then, the learning unit 400 passes the text blocks labeled '1' to the operation unit 300, and the procedure proceeds to step S712 (S804).
As described above, according to an embodiment of the present invention, features are extracted from the text blocks generated by recognizing the texts included in an image, and binary classification of the output target text is performed by calculating the importance of the extracted features. Accordingly, only the texts that match the calculated importance among the texts included in the image can be selected and output, and, through linkage with a screen reader, only the information a visually impaired person needs during online shopping can be provided by voice.
Although the foregoing has described and illustrated preferred embodiments exemplifying the technical idea of the present invention, the present invention is not limited to the configurations and operations as shown and described, and those skilled in the art will readily understand that numerous changes and modifications can be made to the present invention without departing from the scope of the technical idea. Accordingly, all such appropriate changes, modifications, and equivalents should be regarded as falling within the scope of the present invention.
[Description of Reference Signs]
S: system for determining the importance of a text block extracted from an image
100: character recognition unit
200: character block unit
300: operation unit
400: learning unit
500: output unit

Claims (6)

  1. A system for determining the importance of a text block extracted from an image, the system comprising:
    a character recognition unit that extracts text from an input image;
    a character block unit that generates text blocks by dividing the extracted text into sentence units; and
    an operation unit that extracts features from each text block to designate characteristics for that text block and, when a designated text block characteristic value exceeds a preset threshold value, classifies the block as an output target text block.
  2. The system of claim 1, wherein the features include at least one attribute value among size, width, length, character confidence, and slope of the text block.
  3. The system of claim 1, wherein the operation unit extracts the features included in the text blocks received from the character block unit and designates characteristics for each text block, and
    wherein, when a text block characteristic value exceeds the preset threshold value, the operation unit judges the block to be an output target text block and labels it '1', and, when the text block characteristic value is less than or equal to the preset threshold value, labels the block '0' and filters it out.
  4. The system of claim 1, further comprising a learning unit that assigns a value between '0' and '1' to each of the text block characteristic values through supervised learning.
  5. A method for determining the importance of a text block extracted from an image, the method comprising:
    (a) extracting, by a character recognition unit, text from an input image;
    (b) generating, by a character block unit, text blocks by dividing the extracted text into sentence units;
    (c) extracting, by an operation unit, features from each text block to designate characteristics for that text block;
    (d) determining, by the operation unit, whether a text block characteristic value exceeds a preset threshold value;
    (e) when the determination in step (d) shows that the text block characteristic value exceeds the preset threshold value, judging, by the operation unit, the block to be an output target text block and labeling it '1'; and
    (f) outputting, by the operation unit, the text blocks labeled '1' and removing the remaining text blocks.
  6. The method of claim 5, further comprising:
    (g) when the determination in step (d) shows that the text block characteristic value does not exceed the preset threshold value, applying, by a learning unit, the text block labeled '1' to the operation unit and proceeding to step (f).
PCT/KR2020/015822 2020-02-27 2020-11-11 System and method for determining importance of text block extracted from image WO2021172699A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0024023 2020-02-27
KR1020200024023A KR102374281B1 (en) 2020-02-27 2020-02-27 Importance Determination System of Text Block Extracted from Image and Its Method

Publications (1)

Publication Number Publication Date
WO2021172699A1 true WO2021172699A1 (en) 2021-09-02

Family

ID=77491881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/015822 WO2021172699A1 (en) 2020-02-27 2020-11-11 System and method for determining importance of text block extracted from image

Country Status (2)

Country Link
KR (1) KR102374281B1 (en)
WO (1) WO2021172699A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000293521A (en) * 1999-04-09 2000-10-20 Canon Inc Image processing method, its device and storage medium
KR20130080745A (en) * 2012-01-05 2013-07-15 주식회사 인프라웨어 Method of generating electronic documents using camera module of smart phones and ocr engine of remote server, and terminal device using the same
KR20160105215A (en) * 2015-02-27 2016-09-06 삼성전자주식회사 Apparatus and method for processing text
KR102001375B1 (en) * 2019-02-19 2019-07-18 미래에셋대우 주식회사 Apparatus and Method for DistinguishingSpam in Financial News
KR20200002141A (en) * 2018-06-29 2020-01-08 김종진 Providing Method Of Language Learning Contents Based On Image And System Thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100652010B1 (en) * 2005-12-02 2006-12-01 한국전자통신연구원 Apparatus and method for constituting character by teeth-clenching
US9436682B2 (en) 2014-06-24 2016-09-06 Google Inc. Techniques for machine language translation of text from an image based on non-textual context information from the image

Also Published As

Publication number Publication date
KR102374281B1 (en) 2022-03-16
KR20210109146A (en) 2021-09-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20921491

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20921491

Country of ref document: EP

Kind code of ref document: A1