TW202328970A

TW202328970A - Electronic device and method for detecting and recognizing character

Info

Publication number: TW202328970A
Application number: TW111100039A
Authority: TW
Inventors: 黃穎竹; 洪筠緯; 田永平; 高薇雅
Original assignee: Industrial Technology Research Institute (財團法人工業技術研究院)
Priority date: 2022-01-03
Filing date: 2022-01-03
Publication date: 2023-07-16
Also published as: TWI799050B

Abstract

An electronic device and a method for detecting and recognizing characters. The method includes: receiving a text image; detecting the text image to generate a plurality of bounding boxes respectively corresponding to a plurality of characters, and calculating a tilt angle of the text image according to the plurality of bounding boxes; generating a plurality of recognition results respectively corresponding to the plurality of characters according to the plurality of bounding boxes; generating a plurality of orders respectively corresponding to the plurality of characters according to the tilt angle and the plurality of bounding boxes; matching the plurality of recognition results with the plurality of orders to generate a character string; and outputting the character string.

Description

Electronic device and method for detecting and recognizing text

The invention relates to an electronic device and a method for detecting and recognizing characters.

Amid the wave of digital transformation in enterprises, digitizing physical data is a basic and crucial step. Physical data may include the text of large numbers of paper documents, contracts, forms, or signed slips. Users can perform character recognition on images of the physical data to generate digital data. However, the physical data of different users may differ significantly in format depending on their industry. In addition, an image of physical data may be produced by scanning a document with a scanner or by photographing the document with a camera. These differences in format and source make character recognition considerably more difficult.

The invention provides an electronic device and a method for detecting and recognizing characters that can accurately detect and recognize the characters on a document even when the document in the text image is tilted.

An electronic device for detecting and recognizing characters according to the invention includes a processor, a storage medium, and a transceiver. The transceiver receives a text image. The storage medium stores a plurality of modules. The processor is coupled to the storage medium and the transceiver, and accesses and executes the modules, which include a character detection module, a character recognition module, a text sorting module, a matching module, and an output module. The character detection module detects the text image to generate a plurality of bounding boxes respectively corresponding to a plurality of characters, and calculates a tilt angle of the text image according to the bounding boxes. The character recognition module generates a plurality of recognition results respectively corresponding to the characters according to the bounding boxes. The text sorting module generates a plurality of orders respectively corresponding to the bounding boxes according to the tilt angle and the bounding boxes. The matching module matches the recognition results with the orders to generate a character string. The output module outputs the character string through the transceiver.

In an embodiment of the invention, the bounding boxes include a first bounding box. In response to a second bounding box among the bounding boxes being closest to the first bounding box, the character detection module calculates a first tilt angle according to the first bounding box and the second bounding box, and calculates the tilt angle according to the first tilt angle.

In an embodiment of the invention, the character detection module calculates the tilt angle according to a first center point of the first bounding box and a second center point of the second bounding box.

In an embodiment of the invention, the character detection module performs a coordinate transformation, according to the tilt angle, on a plurality of coordinate parameters respectively corresponding to the bounding boxes to update the coordinate parameters, and the text sorting module generates the orders according to the updated coordinate parameters.

In an embodiment of the invention, the text sorting module is configured to: select, according to the coordinate parameters, a first bounding box corresponding to the smallest ordinate from the bounding boxes; select at least one bounding box from the bounding boxes according to a first coordinate parameter of the first bounding box; select a second bounding box corresponding to the smallest abscissa from the at least one bounding box; and assign an order to the second bounding box, so as to generate the orders.

In an embodiment of the invention, in response to assigning the order to the second bounding box, the text sorting module removes a second coordinate parameter of the second bounding box from the coordinate parameters, thereby updating the coordinate parameters, and generates the orders according to the updated coordinate parameters.

In an embodiment of the invention, the first coordinate parameter includes a first abscissa, a first ordinate corresponding to the first abscissa, a second abscissa, and a second ordinate corresponding to the second abscissa, where the first abscissa is smaller than the second abscissa. The text sorting module is further configured to: set the first abscissa of the first coordinate parameter to zero to generate a reference bounding box; and select the at least one bounding box from the bounding boxes in response to an overlapping area between the at least one bounding box and the reference bounding box being greater than a threshold.

In an embodiment of the invention, the character detection module generates the bounding boxes according to a machine learning model.

In an embodiment of the invention, the character recognition module generates the recognition results according to an optical character recognition model.

In an embodiment of the invention, the tilt angle is greater than or equal to -45 degrees and less than or equal to 45 degrees.

In an embodiment of the invention, the text image corresponds to a horizontal writing format.

A method for detecting and recognizing characters according to the invention includes: receiving a text image; detecting the text image to generate a plurality of bounding boxes respectively corresponding to a plurality of characters, and calculating a tilt angle of the text image according to the bounding boxes; generating a plurality of recognition results respectively corresponding to the characters according to the bounding boxes; generating a plurality of orders respectively corresponding to the characters according to the tilt angle and the bounding boxes; matching the recognition results with the orders to generate a character string; and outputting the character string.

Based on the above, the electronic device of the invention can accurately detect the positions of the characters on a document even when the document in the text image is tilted, and can thereby digitize the content of the document according to those positions.

To make the content of the invention easier to understand, the following embodiments are given as examples by which the invention can be implemented. In addition, wherever possible, elements, components, or steps with the same reference numerals in the drawings and embodiments denote the same or similar parts.

FIG. 1 is a schematic diagram of an electronic device 100 for detecting and recognizing characters according to an embodiment of the invention. The electronic device 100 may include a processor 110, a storage medium 120, and a transceiver 130.

The processor 110 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field programmable gate array (FPGA), or other similar element, or a combination of the above. The processor 110 may be coupled to the storage medium 120 and the transceiver 130, and may access and execute the modules and various application programs stored in the storage medium 120.

The storage medium 120 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), or similar element, or a combination of the above, and is used to store the modules or application programs executable by the processor 110. In this embodiment, the storage medium 120 may store a plurality of modules including a character detection module 121, a character recognition module 122, a text sorting module 123, a matching module 124, and an output module 125, whose functions are described below.

The transceiver 130 transmits and receives signals wirelessly or by wire. The transceiver 130 may also perform operations such as low-noise amplification, impedance matching, frequency mixing, up- or down-conversion, filtering, and amplification.

FIG. 2 is a schematic diagram of the modules of the electronic device 100 according to an embodiment of the invention. First, the character detection module 121 may receive a text image S1 through the transceiver 130. The text image S1 is, for example, an RGB image. In an embodiment, the text image S1 corresponds to a horizontal writing format. The text image S1 may contain a tilted document. For example, a user may photograph a document with a camera to generate a text image S1 containing the document. Owing to factors such as the shooting environment or poor shooting technique, the document in the text image S1 may be tilted, which makes character recognition more difficult.

FIG. 3 is a schematic diagram of the text image S1 according to an embodiment of the invention. The text image S1 may contain a tilted document 30. Because the document 30 is tilted in the text image S1, the characters on the document 30 are also tilted in the text image S1, which makes them difficult for conventional character recognition techniques to recognize.

Returning to FIG. 2, the character detection module 121 may detect the text image S1 to generate a plurality of bounding boxes respectively corresponding to a plurality of characters. In an embodiment, the character detection module 121 may input the text image S1 into a machine learning model to generate the bounding boxes. For example, the character detection module 121 may input the text image S1 into an object detection model to generate the bounding boxes respectively corresponding to the characters.

The character detection module 121 may send a message S2 associated with the bounding boxes to the character recognition module 122, where each bounding box encloses one character. The message S2 may include the serial number of each bounding box (or character) and the coordinate parameters of each bounding box. Table 1 lists example serial numbers of the bounding boxes in FIG. 3.

Table 1
Bounding box | Serial number
31 | #1
32 | #2
33 | #3
34 | #4
35 | #5

The coordinate parameters of a bounding box may include the coordinates of two of its vertices: the vertex with the smallest abscissa and smallest ordinate, and the vertex with the largest abscissa and largest ordinate. Taking bounding box 31 in FIG. 3 as an example, let the horizontal axis be the X axis and the vertical axis be the Y axis. Bounding box 31 has four vertices P1, P2, P3, and P4. Among them, vertex P1 has the smallest X and Y coordinates, and vertex P4 has the largest X and Y coordinates. The coordinate parameters of bounding box 31 may therefore include the coordinates of vertex P1 and the coordinates of vertex P4.

In an embodiment, the character detection module 121 may convert normalized coordinate parameters (that is, the coordinate parameters produced by the machine learning model) into actual pixel coordinate parameters, and then send the actual coordinate parameters to the character recognition module 122 through the message S2. The character recognition module 122 may perform character recognition according to the actual coordinate parameters. Taking bounding box 31 in FIG. 3 as an example, assume the length of the text image S1 (its projected length on the X axis) is L pixels and its width is W pixels. The character detection module 121 may then convert the coordinate parameter $F = (x_{\min}^{i}, y_{\min}^{i}, x_{\max}^{i}, y_{\max}^{i})$ of bounding box 31 into the actual coordinate parameter $F'$ according to equation (1), where $x_{\min}^{i}$ and $y_{\min}^{i}$ are the minimum X and Y coordinates of bounding box i, and $x_{\max}^{i}$ and $y_{\max}^{i}$ are its maximum X and Y coordinates. For example, $x_{\min}^{i}$ and $y_{\min}^{i}$ may be the X and Y coordinates of vertex P1 of bounding box 31, and $x_{\max}^{i}$ and $y_{\max}^{i}$ may be the X and Y coordinates of vertex P4 of bounding box 31.

$$F' = \left(L \cdot x_{\min}^{i},\; W \cdot y_{\min}^{i},\; L \cdot x_{\max}^{i},\; W \cdot y_{\max}^{i}\right) \qquad (1)$$
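A minimal Python sketch of the conversion in equation (1), assuming the model emits normalized boxes with values in [0, 1] laid out as (x_min, y_min, x_max, y_max). The function name `to_pixel_coords` and the tuple layout are illustrative assumptions, not taken from the source.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def to_pixel_coords(box: Box, length_px: int, width_px: int) -> Box:
    """Scale a normalized box to pixel coordinates.

    X values are scaled by the image length L (extent along the X axis)
    and Y values by the image width W, mirroring equation (1).
    """
    x_min, y_min, x_max, y_max = box
    return (x_min * length_px, y_min * width_px,
            x_max * length_px, y_max * width_px)
```

For an 800 x 600 image, the normalized box (0.25, 0.5, 0.75, 1.0) maps to (200.0, 300.0, 600.0, 600.0).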

The character recognition module 122 may obtain from the message S2 the bounding boxes respectively corresponding to the characters on the document 30, and may then generate a plurality of recognition results respectively corresponding to the characters according to the bounding boxes. Specifically, the character recognition module 122 may input the text image S1 containing the bounding boxes into an optical character recognition (OCR) model to generate the recognition results respectively corresponding to the bounding boxes. FIG. 4 is a schematic diagram of performing character recognition on the text image S1 containing the bounding boxes according to an embodiment of the invention. Using the OCR model, the character recognition module 122 may determine that the characters in bounding boxes 31, 32, 33, 34, and 35 are "A", "B", "C", "D", and "E", respectively.

The character recognition module 122 may send the recognition results to the matching module 124 through a message S4. The message S4 may include the serial number of each bounding box and the recognition result of the character in that bounding box. Table 2 lists example serial numbers and recognition results for the bounding boxes in FIG. 4.

Table 2
Bounding box | Serial number | Recognition result
31 | #1 | A
32 | #2 | B
33 | #3 | C
34 | #4 | D
35 | #5 | E
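How the matching module 124 combines the two inputs is not detailed in this section. As an illustrative assumption only, if the orders produced by the text sorting module are keyed by the same serial numbers as Table 2, matching could amount to sorting the recognition results by order and concatenating them. The name `match_to_string` is hypothetical.

```python
from typing import Dict


def match_to_string(recognized: Dict[str, str], orders: Dict[str, int]) -> str:
    """Join recognized characters in ascending reading order.

    recognized: serial number -> recognized character (as in Table 2)
    orders:     serial number -> 1-based reading order (assumed format)
    """
    serials = sorted(orders, key=orders.get)
    return "".join(recognized[s] for s in serials)
```

With the recognition results of Table 2 and orders #1 to #5 equal to 1 to 5, this sketch returns "ABCDE".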

Referring to FIG. 2 and FIG. 3, the character detection module 121 may calculate the angle of inclination (tilt angle) of the document 30 in the text image S1 according to the bounding boxes in FIG. 3, and may send a message S3 associated with the bounding boxes and the tilt angle to the text sorting module 123. Specifically, the character detection module 121 may perform a coordinate transformation, according to the tilt angle, on the coordinate parameters respectively corresponding to the bounding boxes to update them, and then send the updated coordinate parameters to the text sorting module 123 through the message S3. The message S3 may include the serial number of each bounding box and its updated coordinate parameters, and can be used to generate the bounding boxes shown in FIG. 7A. Table 3 lists example serial numbers and the corresponding updated (transformed) bounding boxes.

Table 3
Bounding box | Updated bounding box | Serial number | Coordinate parameters
31 | 71 | #1 |
32 | 72 | #2 |
33 | 73 | #3 |
34 | 74 | #4 |
35 | 75 | #5 |

FIG. 5 is a flowchart of performing the coordinate transformation according to an embodiment of the invention. After the character detection module 121 generates the bounding boxes of the text image S1, in step S501, the character detection module 121 may obtain the center point of each bounding box according to its coordinate parameters. Taking FIG. 3 as an example, the coordinate parameters of bounding box 31 may include the coordinates of vertex P1 and vertex P4. The character detection module 121 may average the coordinates of vertex P1 and vertex P4 to determine the center point C1 of bounding box 31. In the same way, the character detection module 121 may obtain the center point C2 of bounding box 32, the center point C3 of bounding box 33, the center point C4 of bounding box 34, and the center point C5 of bounding box 35.
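Step S501 as a small Python sketch, assuming each box is stored as (x_min, y_min, x_max, y_max), that is, vertex P1 followed by vertex P4. The name `box_center` is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_center(box: Box) -> Tuple[float, float]:
    """Average the two stored vertices (P1 and P4) to get the center point."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)
```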

In step S502, the character detection module 121 may calculate the tilt angle between nearest-neighboring bounding boxes. For example, in response to bounding box 32 being the bounding box closest to bounding box 31, the character detection module 121 may calculate a tilt angle θ1 according to bounding boxes 31 and 32. In an embodiment, the character detection module 121 may calculate the tilt angle θ1 according to the center point C1 of bounding box 31 and the center point C2 of bounding box 32. Specifically, the character detection module 121 may calculate the tilt angle θ1 according to equation (2), where $x_{C1}$ and $y_{C1}$ are the X and Y coordinates of center point C1, and $x_{C2}$ and $y_{C2}$ are the X and Y coordinates of center point C2.

$$\theta_1 = \tan^{-1}\left(\frac{y_{C2} - y_{C1}}{x_{C2} - x_{C1}}\right) \qquad (2)$$
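A sketch of step S502 under stated assumptions: the angle of equation (2) is computed with `math.atan2` and folded into (-90, 90] degrees, which equals the arctangent of the slope while also tolerating a vertical pair; each center is paired with its Euclidean nearest neighbor, and duplicate unordered pairs are dropped. The pairing rule and the names `pair_angle` and `nearest_neighbor_angles` are illustrative, not the patent's exact procedure.

```python
import math
from typing import List, Set, FrozenSet, Tuple

Point = Tuple[float, float]


def pair_angle(c1: Point, c2: Point) -> float:
    """Angle in degrees of the line through two centers, folded into (-90, 90]."""
    angle = math.degrees(math.atan2(c2[1] - c1[1], c2[0] - c1[0]))
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle


def nearest_neighbor_angles(centers: List[Point]) -> List[float]:
    """One angle per unordered pair (i, j) where j is the nearest center to i.

    Requires at least two centers.
    """
    seen: Set[FrozenSet[int]] = set()
    angles: List[float] = []
    for i, ci in enumerate(centers):
        j = min((k for k in range(len(centers)) if k != i),
                key=lambda k: math.dist(ci, centers[k]))
        pair = frozenset((i, j))
        if pair not in seen:
            seen.add(pair)
            angles.append(pair_angle(ci, centers[j]))
    return angles
```

With centers laid out as a row of three above a row of two, the deduplication yields three pair angles, analogous to θ1 to θ3 in the text.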

Using the same method as equation (2), the character detection module 121 may generate a plurality of tilt angles: a tilt angle θ2 from the center point C2 of bounding box 32 and the center point C3 of bounding box 33, and a tilt angle θ3 from the center point C4 of bounding box 34 and the center point C5 of bounding box 35.

In step S503, the character detection module 121 may calculate the average of the tilt angles and set this average as the tilt angle of the document 30. The tilt angle may be greater than or equal to -45 degrees and less than or equal to 45 degrees, so that the text sorting module 123 can correctly determine the order of each character.
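Step S503 sketched in Python. The text does not say what happens when the average falls outside [-45, 45] degrees; raising `ValueError` here is an assumption made for illustration, and `document_tilt` is a hypothetical name.

```python
from statistics import mean
from typing import List


def document_tilt(angles: List[float]) -> float:
    """Average the pairwise tilt angles (degrees) into one document tilt.

    Assumption: results outside [-45, 45] are rejected rather than clamped.
    """
    tilt = mean(angles)
    if not -45.0 <= tilt <= 45.0:
        raise ValueError(f"tilt {tilt:.1f} deg is outside [-45, 45]")
    return tilt
```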

In step S504, the character detection module 121 may perform a coordinate transformation on the coordinate parameters of the bounding boxes according to the tilt angle to update the coordinate parameters. For example, the character detection module 121 may multiply the coordinate parameters by the cosine of the tilt angle. After generating the updated coordinate parameters of the bounding boxes, the character detection module 121 may send them to the text sorting module 123 through the message S3.
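The text's own example multiplies the coordinates by cos θ. The sketch below swaps in a different, commonly used deskewing transform: each box's four corners are rotated by -θ about the origin and the axis-aligned bounds of the result are taken, so that centers lying on a line of slope tan θ end up at a common ordinate. This is an assumption for illustration, not the patent's stated formula, and `deskew_box` is a hypothetical name.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def deskew_box(box: Box, tilt_deg: float) -> Box:
    """Rotate a box's corners by -tilt and return their axis-aligned bounds."""
    t = math.radians(tilt_deg)
    c, s = math.cos(t), math.sin(t)
    x_min, y_min, x_max, y_max = box
    corners = [(x_min, y_min), (x_max, y_min), (x_min, y_max), (x_max, y_max)]
    # Rotation by -tilt: a point (x, x * tan(tilt)) maps to y' = 0.
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in corners]
    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return (min(xs), min(ys), max(xs), max(ys))
```

Because every box is rotated about the same pivot, relative positions are preserved; only the slope of the text lines is removed.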

The text sorting module 123 may obtain from the message S3 the updated coordinate parameters respectively corresponding to the characters, and may generate the orders respectively corresponding to the bounding boxes according to the updated coordinate parameters.

FIG. 6 is a flowchart of generating the orders of the characters according to an embodiment of the invention. In step S601, the text sorting module 123 may select, according to the coordinate parameters, the first bounding box corresponding to the smallest ordinate from the bounding boxes. In other words, if the ordinate of a reference point (for example, a vertex or the center point) of the first bounding box is smaller than the ordinates of the reference points of all other bounding boxes, the text sorting module 123 selects the first bounding box.
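Step S601 in Python, assuming deskewed boxes keyed by serial number and using the top-left vertex as the reference point (one of the options the text mentions). The name `select_topmost` is hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def select_topmost(boxes: Dict[str, Box]) -> str:
    """Return the key of the box whose top-left vertex has the smallest ordinate."""
    return min(boxes, key=lambda k: boxes[k][1])
```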

FIGS. 7A and 7B are schematic diagrams of bounding boxes generated according to the message S3 according to an embodiment of the invention. Referring to FIG. 7A, after the coordinate transformation is complete, bounding boxes 31, 32, 33, 34, and 35 in FIG. 3 are transformed into bounding boxes 71, 72, 73, 74, and 75 in FIG. 7A, respectively. The text sorting module 123 may select bounding box 72, which corresponds to the smallest Y coordinate, from the bounding boxes in FIG. 7A.

In step S602, the text sorting module 123 may select at least one bounding box from the bounding boxes according to the first coordinate parameter of the first bounding box. Specifically, the first coordinate parameter may include a first abscissa, a first ordinate corresponding to the first abscissa, a second abscissa, and a second ordinate corresponding to the second abscissa, where the first abscissa is smaller than the second abscissa and the first ordinate is smaller than the second ordinate. The text sorting module 123 may set the first abscissa to zero to generate a reference bounding box. The text sorting module 123 may then select the at least one bounding box from the bounding boxes in response to the overlapping area between the at least one bounding box and the reference bounding box being greater than a threshold.
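A sketch of step S602 using the same box layout. Following the FIG. 7A example, the threshold is taken as a fraction (default 0.5) of each candidate box's own area. The names `overlap_area` and `select_row_candidates` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def select_row_candidates(boxes: Dict[str, Box], first_key: str,
                          threshold: float = 0.5) -> List[str]:
    """Keys of boxes overlapping the reference box by more than threshold * own area."""
    _, y_min, x_max, y_max = boxes[first_key]
    reference: Box = (0.0, y_min, x_max, y_max)  # first abscissa set to zero
    selected: List[str] = []
    for key, box in boxes.items():
        area = (box[2] - box[0]) * (box[3] - box[1])
        if area > 0 and overlap_area(box, reference) > threshold * area:
            selected.append(key)
    return selected
```

Because the reference box stretches from x = 0 to the right edge of the topmost box, it captures the boxes of the same row that lie to its left, much as reference bounding box 81 captures boxes 71 and 72; boxes further right are picked up in later iterations.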

Taking bounding box 72 in FIG. 7A as an example, assume that bounding box 72 is selected as the first bounding box in step S601 and that the threshold is 50% of the area of the bounding box being tested. The text sorting module 123 may select bounding boxes 71 and 72 according to the coordinate parameter of bounding box 72. Specifically, the text sorting module 123 may set the first abscissa of bounding box 72 (its smallest X coordinate) to zero, keeping the other three values unchanged, to generate a reference bounding box 81. The text sorting module 123 may then select bounding box 71 in response to the overlapping area between bounding box 71 and reference bounding box 81 being greater than 50% of bounding box 71. Likewise, the text sorting module 123 may select bounding box 72 in response to the overlapping area between bounding box 72 and reference bounding box 81 being greater than 50% of bounding box 72.

In step S603, the text sorting module 123 may select the second bounding box corresponding to the smallest abscissa from the at least one bounding box. In other words, if the abscissa of a reference point (for example, a vertex or the center point) of the second bounding box is smaller than the abscissas of the reference points of the other bounding boxes in the at least one bounding box, the text sorting module 123 selects the second bounding box.
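Step S603, assuming the reference point is the vertex with the smallest abscissa (as in the FIG. 7A example), so the comparison uses each box's x_min. The name `select_leftmost` is hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def select_leftmost(boxes: Dict[str, Box], candidates: List[str]) -> str:
    """Among the candidate keys, return the one whose box has the smallest x_min."""
    return min(candidates, key=lambda k: boxes[k][0])
```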

Taking FIG. 7A as an example, assume that the reference point of a bounding box is its vertex with the smallest abscissa, so that the X coordinate of the reference point of each of bounding boxes 71 and 72 is the first abscissa in its coordinate parameter. After bounding boxes 71 and 72 are selected according to reference bounding box 81, the text sorting module 123 may select bounding box 71 from the two as the second bounding box in response to the X coordinate of bounding box 71 being smaller than the X coordinate of bounding box 72.

In step S604, the text sorting module 123 can assign an order to the second bounding box. In response to assigning the order to the second bounding box, the text sorting module 123 can remove the second coordinate parameter of the second bounding box from the plurality of coordinate parameters, thereby updating the plurality of coordinate parameters.

For example, if bounding box 71 is the first bounding box to be selected among all the bounding boxes in FIG. 7A (that is, bounding box 71 is selected as the second bounding box the first time step S603 is executed), the text sorting module 123 can assign the first order to bounding box 71. After assigning the first order to bounding box 71, the text sorting module 123 can remove the coordinate parameter of bounding box 71 from the plurality of coordinate parameters, thereby updating the plurality of coordinate parameters. The updated coordinate parameters can be used to generate the bounding boxes shown in FIG. 7B.

In step S605, the text sorting module 123 can determine whether any bounding box has not yet been assigned an order. If such a bounding box remains, step S601 is executed again. The second iteration of the flow of FIG. 6 is described below with reference to FIG. 7B. In step S601, the text sorting module 123 can select bounding box 72, which corresponds to the smallest ordinate among the plurality of bounding boxes, as the first bounding box. In step S602, the text sorting module 123 can generate reference bounding box 82 according to bounding box 72, and select bounding box 72, which overlaps reference bounding box 82, from the plurality of bounding boxes as the at least one bounding box. In step S603, the text sorting module 123 can select bounding box 72, which corresponds to the smallest abscissa, from the at least one bounding box as the second bounding box. In step S604, the text sorting module 123 can assign the second order to bounding box 72 and remove the coordinate parameter of bounding box 72 from the plurality of coordinate parameters. In step S605, the text sorting module 123 again determines whether any bounding box has not been assigned an order. By repeatedly executing the flow of FIG. 6, the text sorting module 123 can assign a corresponding order to every bounding box, that is, to every character, thereby generating a plurality of orders respectively corresponding to the plurality of bounding boxes.
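Putting steps S601 to S605 together, the iterative ordering of FIG. 6 might look like the following Python sketch. The box format, the function name `reading_order`, and the fallback that keeps the loop progressing when a degenerate (zero-area) box matches nothing are assumptions made for illustration; the coordinates are invented so that the result mirrors the orders of Table 4 for bounding boxes 71 to 75, plus one box on a second line.

```python
# Hypothetical box format (not from the patent): (x_min, y_min, x_max, y_max),
# deskewed image coordinates with the y axis pointing down.
Box = tuple[float, float, float, float]


def _area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def _overlap(a: Box, b: Box) -> float:
    """Intersection area of two axis-aligned boxes (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def reading_order(boxes: dict[str, Box], ratio: float = 0.5) -> dict[str, int]:
    """Assign 1-based orders by repeating steps S601-S605 until no box is left."""
    remaining = dict(boxes)  # coordinate parameters not yet assigned an order
    orders: dict[str, int] = {}
    while remaining:  # S605: loop while some box has no order
        # S601: first bounding box = smallest ordinate (top-most)
        first_key = min(remaining, key=lambda k: remaining[k][1])
        first = remaining[first_key]
        # S602: reference box = first box with its smaller abscissa set to zero
        ref = (0.0, first[1], first[2], first[3])
        candidates = [k for k, b in remaining.items()
                      if _overlap(b, ref) > ratio * _area(b)] or [first_key]
        # S603: second bounding box = smallest abscissa (left-most vertex)
        second = min(candidates, key=lambda k: remaining[k][0])
        # S604: assign the next order and drop its coordinate parameter
        orders[second] = len(orders) + 1
        del remaining[second]
    return orders


# Made-up coordinates: one slightly uneven line of five boxes plus a second line.
boxes = {"71": (10, 22, 30, 42), "72": (35, 20, 55, 40), "73": (60, 24, 80, 44),
         "74": (85, 23, 105, 43), "75": (110, 25, 130, 45), "76": (10, 60, 30, 80)}
print(reading_order(boxes))
# {'71': 1, '72': 2, '73': 3, '74': 4, '75': 5, '76': 6}
```

Since the reference box only reaches the right edge of the top-most box, boxes further right on the same line are deferred to later iterations; this is why box "73" is ordered only after "71" and "72" even though "74" sits slightly higher.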

In step S605, if the text sorting module 123 determines that every bounding box has been assigned an order, the text sorting module 123 can send the plurality of orders respectively corresponding to the plurality of bounding boxes to the matching module 124 through message S5. Message S5 may include the serial numbers of the bounding boxes and the orders of the bounding boxes. Table 4 is an example of the serial numbers and orders of the bounding boxes.

Table 4
Bounding box    Updated bounding box    Serial number    Order
31              71                      #1               First
32              72                      #2               Second
33              73                      #3               Third
34              74                      #4               Fourth
35              75                      #5               Fifth

Returning to FIG. 2, the matching module 124 can receive message S4 from the character recognition module 122 and message S5 from the text sorting module 123. From message S4, the matching module 124 can obtain the serial numbers and recognition results respectively corresponding to the plurality of bounding boxes; from message S5, it can obtain the serial numbers and orders respectively corresponding to the plurality of bounding boxes. The matching module 124 can match the plurality of recognition results with the plurality of orders to generate a character string, and send the character string to the output module 125 through message S6. The output module 125 can output the character string through the transceiver 130. For example, the matching module 124 can fuse the information in Table 2 and Table 4 to generate the fused information shown in Table 5, and then arrange the recognition results by order to generate the character string "ABCDE".

Table 5
Bounding box    Serial number    Recognition result    Order
31              #1               A                     First
32              #2               B                     Second
33              #3               C                     Third
34              #4               D                     Fourth
35              #5               E                     Fifth
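The matching step is essentially a join on the serial number followed by a sort on the order. A minimal sketch, assuming messages S4 and S5 are represented as hypothetical dictionaries keyed by serial number; raising an error on an unmatched serial number is a design choice of this sketch, not something the description specifies.

```python
def match(recognitions: dict[str, str], orders: dict[str, int]) -> str:
    """Join recognition results (message S4) and orders (message S5) on the shared
    serial number, then concatenate the characters by ascending order."""
    # Serial numbers present in only one of the two messages.
    unmatched = recognitions.keys() ^ orders.keys()
    if unmatched:
        raise ValueError(f"unmatched serial numbers: {sorted(unmatched)}")
    return "".join(recognitions[s] for s in sorted(orders, key=orders.get))


# Serial numbers, recognition results and orders as in Table 5.
recognitions = {"#1": "A", "#2": "B", "#3": "C", "#4": "D", "#5": "E"}
orders = {"#1": 1, "#2": 2, "#3": 3, "#4": 4, "#5": 5}
print(match(recognitions, orders))  # ABCDE
```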

FIG. 8 is a flowchart of a method for detecting and recognizing characters according to an embodiment of the present invention, where the method can be implemented by the electronic device 100 shown in FIG. 1. In step S801, a text image is received. In step S802, the text image is detected to generate a plurality of bounding boxes respectively corresponding to a plurality of characters, and a tilt angle corresponding to the text image is calculated according to the plurality of bounding boxes. In step S803, a plurality of recognition results respectively corresponding to the plurality of characters are generated according to the plurality of bounding boxes. In step S804, a plurality of orders respectively corresponding to the plurality of characters are generated according to the tilt angle and the plurality of bounding boxes. In step S805, the plurality of recognition results and the plurality of orders are matched to generate a character string. In step S806, the character string is output.
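Steps S802 and S804 depend on estimating the tilt angle and transforming the bounding-box coordinates before ordering. The Python sketch below follows the idea in claims 2 to 4 and claim 10 (an angle from the center points of each bounding box and its nearest neighbor, kept within about ±45 degrees, followed by a coordinate transformation), but the modulo-90 folding rule, the use of the median to combine the per-pair angles, the rotation pivot, and all function names are assumptions; the exact aggregation used in the embodiment is not given in this section.

```python
import math
from statistics import median

# Hypothetical point format (not from the patent): (x, y) in image coordinates.
Point = tuple[float, float]


def pair_angle(c1: Point, c2: Point) -> float:
    """Angle (degrees) of the segment c1 -> c2, folded into [-45, 45).

    Folding modulo 90 (an assumption) maps left/right neighbours (0 or 180 deg)
    and above/below neighbours (+-90 deg) of an upright text line to the same tilt.
    """
    deg = math.degrees(math.atan2(c2[1] - c1[1], c2[0] - c1[0]))
    return (deg + 45.0) % 90.0 - 45.0


def estimate_tilt(centres: list[Point]) -> float:
    """Pair each centre with its nearest neighbour and take the median angle."""
    if len(centres) < 2:
        return 0.0
    angles = []
    for i, c in enumerate(centres):
        others = [o for j, o in enumerate(centres) if j != i]
        nearest = min(others, key=lambda o: math.dist(c, o))
        angles.append(pair_angle(c, nearest))
    return median(angles)


def rotate(points: list[Point], tilt_deg: float, pivot: Point = (0.0, 0.0)) -> list[Point]:
    """Rotate points by -tilt_deg about `pivot` so a line at tilt_deg becomes horizontal."""
    t = math.radians(-tilt_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [(pivot[0] + (x - pivot[0]) * cos_t - (y - pivot[1]) * sin_t,
             pivot[1] + (x - pivot[0]) * sin_t + (y - pivot[1]) * cos_t)
            for x, y in points]


def deskew_box(vertices: list[Point], tilt_deg: float,
               pivot: Point = (0.0, 0.0)) -> tuple[float, float, float, float]:
    """Rotate a box's vertices and return the axis-aligned (x_min, y_min, x_max, y_max)."""
    xs, ys = zip(*rotate(vertices, tilt_deg, pivot))
    return (min(xs), min(ys), max(xs), max(ys))


# Five character centres 30 px apart along a line tilted by 10 degrees.
a = math.radians(10)
centres = [(100 + 30 * k * math.cos(a), 50 + 30 * k * math.sin(a)) for k in range(5)]
tilt = estimate_tilt(centres)
print(round(tilt, 6))  # 10.0
flat = rotate(centres, tilt, pivot=centres[0])
print(max(y for _, y in flat) - min(y for _, y in flat) < 1e-9)  # True
```

The median is used here only because it tolerates a few outlier pairs (for example, a neighbour found on an adjacent line at an odd offset); a mean or a histogram peak would be equally plausible choices.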

In summary, the electronic device of the present invention can acquire a text image and recognize the characters of a document in the text image. In addition, the electronic device can determine the tilt angle of the document from the characters on the document and correct the position of each character's bounding box according to the tilt angle. Based on the corrected bounding boxes, the electronic device can determine the order of each character and arrange the recognized characters into a character string. Accordingly, even if the document in the text image is skewed, the electronic device can still successfully digitize its content.

100: Electronic device
110: Processor
120: Storage medium
121: Character detection module
122: Character recognition module
123: Text sorting module
124: Matching module
125: Output module
130: Transceiver
30: Document
31, 32, 33, 34, 35, 71, 72, 73, 74, 75: Bounding box
81, 82: Reference bounding box
C1, C2, C3, C4, C5: Center point
L: Length of the text image
P1, P2, P3, P4: Vertex
S1: Text image
S2, S3, S4, S5, S6: Message
S501, S502, S503, S504, S601, S602, S603, S604, S605, S801, S802, S803, S804, S805, S806: Step
W: Width of the text image
θ1, θ2, θ3: Tilt angle

FIG. 1 is a schematic diagram of an electronic device for detecting and recognizing characters according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a plurality of modules of the electronic device according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a text image according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a text image containing a plurality of bounding boxes according to an embodiment of the present invention.
FIG. 5 is a flowchart of performing coordinate transformation according to an embodiment of the present invention.
FIG. 6 is a flowchart of generating the orders of the characters according to an embodiment of the present invention.
FIGS. 7A and 7B are schematic diagrams of bounding boxes generated according to message S3 according to an embodiment of the present invention.
FIG. 8 is a flowchart of a method for detecting and recognizing characters according to an embodiment of the present invention.

S801, S802, S803, S804, S805, S806: Step

Claims

An electronic device for detecting and recognizing characters, comprising: a transceiver, receiving a text image; a storage medium, storing a plurality of modules; and a processor, coupled to the storage medium and the transceiver, and accessing and executing the plurality of modules, wherein the plurality of modules include: a character detection module, detecting the text image to generate a plurality of bounding boxes respectively corresponding to a plurality of characters, and calculating a tilt angle corresponding to the text image according to the plurality of bounding boxes; a character recognition module, generating a plurality of recognition results respectively corresponding to the plurality of characters according to the plurality of bounding boxes; a text sorting module, generating a plurality of orders respectively corresponding to the plurality of bounding boxes according to the tilt angle and the plurality of bounding boxes; a matching module, matching the plurality of recognition results and the plurality of orders to generate a character string; and an output module, outputting the character string through the transceiver.

The electronic device as claimed in claim 1, wherein the plurality of bounding boxes includes a first bounding box, wherein the character detection module, in response to a second bounding box of the plurality of bounding boxes being closest to the first bounding box, calculates a first tilt angle according to the first bounding box and the second bounding box, and calculates the tilt angle according to the first tilt angle.

The electronic device according to claim 2, wherein the character detection module calculates the tilt angle according to a first center point of the first bounding box and a second center point of the second bounding box.

The electronic device according to claim 1, wherein the character detection module performs, according to the tilt angle, a coordinate transformation on a plurality of coordinate parameters respectively corresponding to the plurality of bounding boxes to update the plurality of coordinate parameters, wherein the text sorting module generates the plurality of orders according to the updated coordinate parameters.

The electronic device as described in claim 4, wherein the text sorting module is configured to perform: selecting, according to the plurality of coordinate parameters, a first bounding box corresponding to a minimum ordinate from the plurality of bounding boxes; selecting at least one bounding box from the plurality of bounding boxes according to a first coordinate parameter of the first bounding box; selecting a second bounding box corresponding to a minimum abscissa from the at least one bounding box; and assigning an order to the second bounding box to generate the plurality of orders.

The electronic device according to claim 5, wherein the text sorting module, in response to assigning the order to the second bounding box, removes a second coordinate parameter of the second bounding box from the plurality of coordinate parameters to update the plurality of coordinate parameters, wherein the text sorting module generates the plurality of orders according to the updated coordinate parameters.

The electronic device according to claim 5, wherein the first coordinate parameter includes a first abscissa, a first ordinate corresponding to the first abscissa, a second abscissa, and a second ordinate corresponding to the second abscissa, wherein the first abscissa is smaller than the second abscissa, wherein the text sorting module is further configured to perform: setting the first abscissa of the first coordinate parameter to zero to generate a reference bounding box; and selecting the at least one bounding box from the plurality of bounding boxes in response to an overlapping area between the at least one bounding box and the reference bounding box being greater than a threshold.

The electronic device according to claim 1, wherein the character detection module generates the plurality of bounding boxes according to a machine learning model.

The electronic device as claimed in claim 1, wherein the character recognition module generates the plurality of recognition results according to an optical character recognition model.

The electronic device according to claim 1, wherein the tilt angle is greater than or equal to negative forty-five degrees and less than or equal to forty-five degrees.

The electronic device as claimed in claim 1, wherein the text image corresponds to a horizontal writing format.

A method of detecting and recognizing characters, comprising: receiving a text image; detecting the text image to generate a plurality of bounding boxes respectively corresponding to a plurality of characters, and calculating a tilt angle corresponding to the text image according to the plurality of bounding boxes; generating a plurality of recognition results respectively corresponding to the plurality of characters according to the plurality of bounding boxes; generating a plurality of orders respectively corresponding to the plurality of characters according to the tilt angle and the plurality of bounding boxes; matching the plurality of recognition results and the plurality of orders to generate a character string; and outputting the character string.