TWI706336B - Image processing device and method for detecting and filtering text object - Google Patents


Info

Publication number
TWI706336B
Authority
TW
Taiwan
Prior art keywords
side area
gray
candidate bounding
bounding box
value
Prior art date
Application number
TW107141052A
Other languages
Chinese (zh)
Other versions
TW202020740A (en)
Inventor
林志榮
柳恆崧
黃少鵬
周揚賀
謝君偉
吳宇鴻
Original Assignee
中華電信股份有限公司
Priority date
Filing date
Publication date
Application filed by 中華電信股份有限公司 filed Critical 中華電信股份有限公司
Priority to TW107141052A priority Critical patent/TWI706336B/en
Publication of TW202020740A publication Critical patent/TW202020740A/en
Application granted granted Critical
Publication of TWI706336B publication Critical patent/TWI706336B/en

Landscapes

  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing device and a method for detecting and filtering text objects. The method includes: retrieving an image, wherein the image includes a text object; analyzing the image to find a plurality of candidate bounding boxes in the image, wherein the candidate bounding boxes partially overlap one another; calculating a score for each of the candidate bounding boxes; and finding, from the candidate bounding boxes, a specific candidate bounding box that includes the text object according to the score of each candidate bounding box.

Description

Image processing device and method for detecting and filtering text objects

The present invention relates to an image processing device and a method for detecting and filtering text objects, and more particularly to an image processing device capable of finding a bounding box that includes a text object, and to its method for detecting and filtering text objects.

In the field of text recognition, clearly and correctly extracting objects that contain text from an image and recognizing them is a crucial issue. If objects containing text can be located accurately in an image, the recognition rate of the text can be improved effectively. However, images often contain many interfering elements, such as varying lighting or complex backgrounds, and these elements frequently make it harder to identify text objects.

In view of this, the present invention provides an image processing device and a method for detecting and filtering text objects, which can solve the above technical problems.

The present invention provides a method for detecting and filtering text objects, including: obtaining an image, wherein the image includes a text object; analyzing the image to find a plurality of candidate bounding boxes in the image, wherein the candidate bounding boxes partially overlap one another, and each candidate bounding box includes a middle area, a first side area, a second side area opposite the first side area, a third side area, and a fourth side area opposite the third side area; calculating a score for each candidate bounding box; and finding, from the candidate bounding boxes, a specific candidate bounding box that includes the text object according to the score of each candidate bounding box. The step of calculating the score of each candidate bounding box includes: for a first candidate bounding box among the candidate bounding boxes, calculating an edge density ratio between the edge density of the first side area and the edge density of the second side area of the first candidate bounding box; calculating a first gray-level difference between the gray level of the first side area and the gray level of the second side area of the first candidate bounding box, and calculating a first gray-level ratio of the first gray-level difference relative to the gray levels of the middle area, the third side area, and the fourth side area; calculating a second gray-level difference between the gray level of the third side area and the gray level of the fourth side area of the first candidate bounding box, and normalizing the second gray-level difference to obtain a normalized gray-level difference; and multiplying the edge density ratio, the first gray-level ratio, and the normalized gray-level difference by a first weight value, a second weight value, and a third weight value, respectively, and summing the products to obtain the score of the first candidate bounding box.

The present invention provides an image processing device, which includes a storage circuit and a processor. The storage circuit stores a plurality of modules. The processor is coupled to the storage circuit and accesses the modules to perform the following steps: obtaining an image, wherein the image includes a text object; analyzing the image to find a plurality of candidate bounding boxes in the image, wherein the candidate bounding boxes partially overlap one another, and each candidate bounding box includes a middle area, a first side area, a second side area opposite the first side area, a third side area, and a fourth side area opposite the third side area; calculating a score for each candidate bounding box; and finding, from the candidate bounding boxes, a specific candidate bounding box that includes the text object according to the score of each candidate bounding box. The step of calculating the score of each candidate bounding box includes: for a first candidate bounding box among the candidate bounding boxes, calculating an edge density ratio between the edge density of the first side area and the edge density of the second side area of the first candidate bounding box; calculating a first gray-level difference between the gray level of the first side area and the gray level of the second side area of the first candidate bounding box, and calculating a first gray-level ratio of the first gray-level difference relative to the gray levels of the middle area, the third side area, and the fourth side area; calculating a second gray-level difference between the gray level of the third side area and the gray level of the fourth side area of the first candidate bounding box, and normalizing the second gray-level difference to obtain a normalized gray-level difference; and multiplying the edge density ratio, the first gray-level ratio, and the normalized gray-level difference by a first weight value, a second weight value, and a third weight value, respectively, and summing the products to obtain the score of the first candidate bounding box.

Based on the above, after finding a plurality of candidate bounding boxes that partially overlap one another in an image, the image processing device and the method for detecting and filtering text objects proposed by the present invention can calculate a score for each candidate bounding box and determine the candidate bounding box with the highest score as the specific candidate bounding box that includes the text object. In this way, the efficiency and accuracy of locating text objects can be improved.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

100: image processing device

102: storage circuit

104: processor

300, 600: images

310, 610: text objects

311a, 311b, 311c, 311d, 311e, 611a: candidate bounding boxes

411: first side area

412: second side area

413: third side area

414: fourth side area

415: middle area

S210~S240: steps

FIG. 1 is a schematic diagram of an image processing device according to an embodiment of the invention.

FIG. 2 is a flowchart of a method for detecting and filtering text objects according to an embodiment of the invention.

FIG. 3 is a schematic diagram of an image according to an embodiment of the invention.

FIG. 4 is a schematic diagram of the candidate bounding box shown in FIG. 3.

FIG. 5 is a schematic diagram of marking a specific candidate bounding box in the image of FIG. 3.

FIG. 6 is a schematic diagram of text recognition according to an embodiment of the invention.

Please refer to FIG. 1, which is a schematic diagram of an image processing device according to an embodiment of the invention. In this embodiment, the image processing device 100 may be a server, a mobile phone, a smartphone, a personal computer (PC), a notebook PC, a netbook PC, a tablet PC, or the like, but is not limited thereto.

As shown in FIG. 1, the image processing device 100 includes a storage circuit 102 and a processor 104. The storage circuit 102 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk, another similar device, or a combination of these devices, and can be used to record a plurality of program codes or modules.

The processor 104 is coupled to the storage circuit 102 and may be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors combined with a digital signal processor core, a controller, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), any other kind of integrated circuit, a state machine, a processor based on an Advanced RISC Machine (ARM), or the like.

In the embodiments of the present invention, the processor 104 can load the program codes or modules recorded in the storage circuit 102 to execute the method for detecting and filtering text objects proposed by the present invention, which is further described below.

Please refer to FIG. 2, which is a flowchart of a method for detecting and filtering text objects according to an embodiment of the invention. The method of this embodiment can be executed by the image processing device 100 of FIG. 1; the details of each step in FIG. 2 are described below with reference to the components shown in FIG. 1.

First, in step S210, the processor 104 obtains an image. In different embodiments, the image may be any image and may include a text object such as a signboard or a street sign, but is not limited thereto.

For ease of understanding, FIG. 3 is used below for a concrete description, but it is not intended to limit the possible embodiments of the present invention.

Please refer to FIG. 3, which is a schematic diagram of an image according to an embodiment of the invention. In FIG. 3, the image 300 includes a text object 310 (for example, a street sign reading "W. CHESTNUT ST.").

Based on FIG. 3, the processor 104 can then perform step S220 to analyze the image 300 and find a plurality of candidate bounding boxes 311a, 311b, 311c, 311d, and 311e in the image 300. In different embodiments, the processor 104 can find the candidate bounding boxes 311a~311e in the image 300 based on any known edge detection algorithm. For example, the processor 104 can first calculate the pixel gradient values between adjacent pixels in the image 300. A high pixel gradient value indicates that an edge exists at that pixel, so the processor 104 can define the candidate bounding boxes 311a~311e shown in FIG. 3 accordingly, but the present invention is not limited thereto.
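As a rough illustration of the gradient computation described above (and not the patent's prescribed implementation), the sketch below builds a per-pixel gradient-magnitude map with NumPy; the function name, the use of np.gradient, and the threshold value are assumptions made only for this example. How the resulting edge pixels are grouped into candidate boxes is left open here, since any known edge detection or box-proposal step may be used.

    import numpy as np

    def edge_map(gray, threshold=40.0):
        """Per-pixel gradient magnitude and a binary edge map for a gray image.

        gray: 2-D array of gray-level values (H x W). Gradients between
        adjacent pixels are approximated with finite differences; pixels whose
        gradient magnitude exceeds `threshold` are treated as edge pixels.
        """
        gray = gray.astype(np.float64)
        gy, gx = np.gradient(gray)        # vertical / horizontal differences
        magnitude = np.hypot(gx, gy)      # gradient magnitude per pixel
        return magnitude, magnitude > threshold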

In an embodiment, the processor 104 may also first calculate an integral image of the image 300 and find the candidate bounding boxes 311a~311e in this integral image to improve execution efficiency.
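An integral image is a standard structure in which each entry stores the sum of all pixel values above and to the left of it, so the sum over any axis-aligned rectangle costs only four lookups. A minimal sketch with illustrative function names follows; applied to the gray image and to a gradient-magnitude map, it makes the per-region edge densities and mean gray levels used below constant-time per candidate box.

    import numpy as np

    def integral_image(values):
        """Zero-padded integral image: ii[y, x] = sum of values[:y, :x]."""
        ii = np.zeros((values.shape[0] + 1, values.shape[1] + 1), dtype=np.float64)
        ii[1:, 1:] = np.cumsum(np.cumsum(values, axis=0), axis=1)
        return ii

    def box_sum(ii, top, left, height, width):
        """Sum of the values inside a rectangle, using four lookups."""
        bottom, right = top + height, left + width
        return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]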

As shown in FIG. 3, the candidate bounding boxes 311a~311e partially overlap one another. Accordingly, the processor 104 can filter out the one or more boxes that do not properly enclose the text object 310, so as to find the specific candidate bounding box that does include the text object 310.

Roughly speaking, the processor 104 performs step S230 to calculate the score of each of the candidate bounding boxes 311a~311e, and performs step S240 to find, from the candidate bounding boxes 311a~311e, the specific candidate bounding box that includes the text object 310 according to the score of each candidate bounding box. The details are described below.

Please refer to FIG. 4, which is a schematic diagram of the candidate bounding box shown in FIG. 3. In this embodiment, the candidate bounding box 311a of FIG. 3 is taken as an example to explain the score calculation mechanism; a person of ordinary skill in the art should be able to derive the details of calculating the scores of the candidate bounding boxes 311b~311e accordingly.

As shown in FIG. 4, the candidate bounding box 311a of this embodiment includes a first side area 411, a second side area 412, a third side area 413, a fourth side area 414, and a middle area 415. In this embodiment, the second side area 412 is located opposite the first side area 411; the second side area 412 and the first side area 411 may have the same length as the candidate bounding box 311a, and the second side area 412 and the first side area 411 may have the same width. In addition, the third side area 413 is located opposite the fourth side area 414, and the fourth side area 414 and the third side area 413 may have the same length and width. After the first side area 411, the second side area 412, the third side area 413, and the fourth side area 414 are determined, the remaining area of the candidate bounding box 311a can be defined as the middle area 415.
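A minimal sketch of this five-way partition, assuming a box given as (top, left, height, width), that areas 411/412 are the full-height left/right strips and 413/414 the top/bottom strips of the remaining band (one reading of FIG. 4), and that each strip takes a fixed fraction of the box; the fraction side_frac and the helper names are assumptions, since the patent leaves the exact strip sizes to the designer.

    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class BoxRegions:
        """Pixel arrays for the five regions of one candidate bounding box."""
        first: np.ndarray    # left strip   (411)
        second: np.ndarray   # right strip  (412)
        third: np.ndarray    # top strip    (413)
        fourth: np.ndarray   # bottom strip (414)
        middle: np.ndarray   # remainder    (415)

    def split_box(values, top, left, height, width, side_frac=0.15):
        """Split one candidate box of a 2-D array into the five regions."""
        box = values[top:top + height, left:left + width]
        sw = max(1, int(round(width * side_frac)))    # width of each side strip
        sh = max(1, int(round(height * side_frac)))   # height of top/bottom strips
        first = box[:, :sw]                   # full-height left strip
        second = box[:, width - sw:]          # full-height right strip
        inner = box[:, sw:width - sw]         # band between the two side strips
        third = inner[:sh, :]                 # top strip of the band
        fourth = inner[height - sh:, :]       # bottom strip of the band
        middle = inner[sh:height - sh, :]     # what remains in the centre
        return BoxRegions(first, second, third, fourth, middle)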

As can be seen from FIG. 4, the first side area 411, the second side area 412, the third side area 413, the fourth side area 414, and the middle area 415 may not overlap one another, and the middle area 415, the third side area 413, and the fourth side area 414 lie between the first side area 411 and the second side area 412, but the present invention is not limited thereto. In other embodiments, the designer may adjust the length and width of each block in FIG. 4 as needed.

Afterwards, the processor 104 can calculate an edge density ratio between the edge density of the first side area 411 and the edge density of the second side area 412. In this embodiment, the edge density of the first side area 411 is, for example, the sum of the pixel gradient values in the first side area 411. Likewise, the edge density of the second side area 412 is, for example, the sum of the pixel gradient values in the second side area 412.

In an embodiment, after the processor 104 obtains the edge density of the first side area 411 and the edge density of the second side area 412, the smaller of the two edge densities can be divided by the larger to calculate the edge density ratio. In this embodiment, the edge density ratio characterizes the symmetry between the edge density of the first side area 411 and the edge density of the second side area 412.
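Read literally, the two edge densities are the summed gradient magnitudes of the side strips and the ratio is min/max, so it lies in [0, 1] and reaches 1 for perfectly symmetric strips. A sketch under that reading (the epsilon guard against an all-zero strip is an addition of this example):

    import numpy as np

    def edge_density_ratio(grad_first, grad_second, eps=1e-9):
        """min/max ratio of the summed gradient magnitudes of the two side strips."""
        d1 = float(np.sum(grad_first))   # edge density of the first side area
        d2 = float(np.sum(grad_second))  # edge density of the second side area
        return min(d1, d2) / max(max(d1, d2), eps)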

Next, the processor 104 can calculate a first gray-level difference between the gray level of the first side area 411 and the gray level of the second side area 412, and calculate a first gray-level ratio of the first gray-level difference relative to the gray levels of the middle area 415, the third side area 413, and the fourth side area 414.

Specifically, the processor 104 may be configured to: calculate a first average gray value of the first side area 411; calculate a second average gray value of the second side area 412; calculate a first absolute difference between the first average gray value and the second average gray value; calculate a middle average gray value of the middle area 415, the third side area 413, and the fourth side area 414; divide the first absolute difference by the middle average gray value to produce a second gray-level ratio; and subtract the second gray-level ratio from 1 to produce the first gray-level ratio. In this embodiment, the middle average gray value is, for example, the average gray value of the middle area 415, the third side area 413, and the fourth side area 414 taken together.
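A sketch of the first gray-level ratio as just described; pooling all pixels of the middle, third, and fourth areas into one mean (rather than averaging three regional means) and the epsilon guard are assumptions of this example.

    import numpy as np

    def first_gray_ratio(first, second, third, fourth, middle, eps=1e-9):
        """1 - |mean(first) - mean(second)| / mean(middle + third + fourth)."""
        diff = abs(float(np.mean(first)) - float(np.mean(second)))
        central = np.concatenate([middle.ravel(), third.ravel(), fourth.ravel()])
        return 1.0 - diff / max(float(np.mean(central)), eps)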

In this embodiment, the first gray-level ratio characterizes a first gray-level symmetry of the gray level of the first side area 411 and the gray level of the second side area 412 relative to the gray levels of the middle area 415, the third side area 413, and the fourth side area 414.

Afterwards, the processor 104 can calculate a second gray-level difference between the gray level of the third side area 413 and the gray level of the fourth side area 414, and normalize the second gray-level difference to obtain a normalized gray-level difference.

Specifically, the processor 104 may be configured to: calculate a third average gray value of the third side area 413; calculate a fourth average gray value of the fourth side area 414; calculate a second absolute difference between the third average gray value and the fourth average gray value; divide the second absolute difference by the larger of the third average gray value and the fourth average gray value to produce a third gray-level ratio; and subtract the third gray-level ratio from 1 to produce the normalized gray-level difference.
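A corresponding sketch of the normalized gray-level difference; again, only the epsilon guard is an addition of this example.

    import numpy as np

    def normalized_gray_difference(third, fourth, eps=1e-9):
        """1 - |mean(third) - mean(fourth)| / max(mean(third), mean(fourth))."""
        m3 = float(np.mean(third))
        m4 = float(np.mean(fourth))
        return 1.0 - abs(m3 - m4) / max(m3, m4, eps)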

After obtaining the edge density ratio, the first gray-level ratio, and the normalized gray-level difference, the processor 104 can multiply the edge density ratio, the first gray-level ratio, and the normalized gray-level difference by a first weight value, a second weight value, and a third weight value, respectively, and sum the products to obtain the score of the candidate bounding box 311a.
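Putting the three features together, a hypothetical box_score helper (reusing split_box, edge_density_ratio, first_gray_ratio, and normalized_gray_difference from the sketches above; the equal default weights are only placeholders for the trained first, second, and third weight values) could look like this:

    import numpy as np

    def box_score(gray, grad, top, left, height, width,
                  weights=(1 / 3, 1 / 3, 1 / 3), side_frac=0.15):
        """Weighted sum of the three symmetry features for one candidate box."""
        g = split_box(gray, top, left, height, width, side_frac)   # gray levels
        e = split_box(grad, top, left, height, width, side_frac)   # gradient magnitudes
        features = np.array([
            edge_density_ratio(e.first, e.second),
            first_gray_ratio(g.first, g.second, g.third, g.fourth, g.middle),
            normalized_gray_difference(g.third, g.fourth),
        ])
        return float(np.dot(np.asarray(weights, dtype=np.float64), features))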

In an embodiment, the first weight value, the second weight value, and the third weight value are determined by a recursive correction process. Specifically, in this process, the processor 104 obtains a preset image and initializes the first weight value, the second weight value, and the third weight value, wherein the preset image includes a preset text object.

Afterwards, the processor 104 can find a plurality of candidate bounding boxes in the preset image, wherein a specific bounding box among these candidate bounding boxes includes the preset text object. In an embodiment, the processor 104 can find these candidate bounding boxes in the preset image and calculate the score of each candidate bounding box in the manner taught in the previous embodiments.

Next, the processor 104 can recursively correct the first weight value, the second weight value, and the third weight value until the score of the specific bounding box (which includes the preset text object) approaches 1 and the scores of the other bounding boxes approach 0, with the sum of the first weight value, the second weight value, and the third weight value being 1.
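The patent does not spell out the update rule behind this recursive correction, so the sketch below substitutes a coarse grid search over the weight simplex (w1 + w2 + w3 = 1) that minimizes the squared error between the box scores and targets of 1 for the box containing the preset text object and 0 for every other box; treat it purely as one possible stand-in for the iterative procedure.

    import itertools

    import numpy as np

    def fit_weights(features, target_index, step=0.05):
        """Pick (w1, w2, w3) on the simplex that best separates the target box.

        features: (n_boxes, 3) array of [edge density ratio, first gray ratio,
        normalized gray difference] computed on the preset image.
        target_index: index of the candidate box that contains the preset text.
        """
        features = np.asarray(features, dtype=np.float64)
        targets = np.zeros(len(features))
        targets[target_index] = 1.0
        best_w, best_err = None, np.inf
        grid = np.arange(0.0, 1.0 + 1e-9, step)
        for w1, w2 in itertools.product(grid, grid):
            if w1 + w2 > 1.0 + 1e-9:
                continue                      # keep the weights on the simplex
            w = np.array([w1, w2, 1.0 - w1 - w2])
            err = float(np.sum((features @ w - targets) ** 2))
            if err < best_err:
                best_w, best_err = w, err
        return best_w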

Based on the above teaching, the processor 104 can calculate the scores of the candidate bounding boxes 311a~311e in FIG. 3 accordingly. Afterwards, the processor 104 can select, from the mutually overlapping candidate bounding boxes 311a~311e, the one with the highest score as the specific candidate bounding box (for example, the candidate bounding box 311e).
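A small end-to-end sketch of this selection step, reusing the hypothetical edge_map and box_score helpers from the earlier examples; boxes is assumed to be a list of (top, left, height, width) tuples produced by whatever proposal step is used.

    import numpy as np

    def select_text_box(gray, boxes, weights):
        """Score every candidate box and keep the highest-scoring one."""
        grad, _ = edge_map(gray)
        scores = [box_score(gray, grad, *box, weights=weights) for box in boxes]
        best = int(np.argmax(scores))
        return boxes[best], scores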

Please refer to FIG. 5, which is a schematic diagram of marking a specific candidate bounding box in the image of FIG. 3. As described in the previous embodiments, since the score of the candidate bounding box 311e is higher than those of the candidate bounding boxes 311a~311d, the processor 104 can filter out the lower-scoring candidate bounding boxes 311a~311d accordingly, so that only the candidate bounding box 311e, which includes the text object 310, is marked in FIG. 5.

In an embodiment, the processor 104 can further perform text recognition on the candidate bounding box 311e to specifically recognize the text in the text object 310 (for example, the words "W. CHESTNUT ST.").
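The patent does not name an OCR engine for this final step; as one possible illustration only, the crop defined by the selected box could be handed to the open-source Tesseract engine through the pytesseract wrapper.

    import numpy as np
    import pytesseract              # third-party OCR wrapper, chosen only for illustration
    from PIL import Image

    def recognize_text(gray, box):
        """Crop the selected bounding box and run it through an OCR engine."""
        top, left, height, width = box
        crop = np.asarray(gray, dtype=np.uint8)[top:top + height, left:left + width]
        return pytesseract.image_to_string(Image.fromarray(crop)).strip()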

Please refer to FIG. 6, which is a schematic diagram of text recognition according to an embodiment of the invention. In this embodiment, a plurality of candidate bounding boxes can be found in the image 600 according to the teaching of the previous embodiments, and the score of each candidate bounding box can be calculated. Afterwards, the candidate bounding box with the highest score can be identified as the specific candidate bounding box that includes the text object, for example, the candidate bounding box 611a that includes the text object 610. Furthermore, text recognition can be performed on the candidate bounding box 611a that includes the text object 610 to specifically recognize the text in the text object 610 (for example, the characters "AB-123").

In summary, after finding a plurality of candidate bounding boxes that partially overlap one another in an image, the image processing device and the method for detecting and filtering text objects proposed by the present invention calculate the score of each candidate bounding box based on its edge density ratio, first gray-level ratio, and normalized gray-level difference. Afterwards, the candidate bounding box with the highest score can be determined as the specific candidate bounding box that includes the text object. In this way, the efficiency and accuracy of locating text objects can be improved, which in turn helps improve the performance of recognition mechanisms for street signs, signboards, and the like.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the art may make some changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the appended claims.

S210~S240: steps

Claims (11)

1. A method for detecting and filtering text objects, comprising: obtaining an image, wherein the image includes a text object; analyzing the image to find a plurality of candidate bounding boxes in the image, wherein the candidate bounding boxes partially overlap one another, and each candidate bounding box includes a middle area, a first side area, a second side area opposite the first side area, a third side area, and a fourth side area opposite the third side area; calculating a score of each candidate bounding box; and finding, from the candidate bounding boxes, a specific candidate bounding box that includes the text object according to the score of each candidate bounding box, wherein the step of calculating the score of each candidate bounding box comprises: for a first candidate bounding box among the candidate bounding boxes, calculating an edge density ratio between the edge density of the first side area and the edge density of the second side area of the first candidate bounding box, wherein the edge density of the first side area comprises a sum of a plurality of pixel gradient values in the first side area, and the edge density of the second side area comprises a sum of a plurality of pixel gradient values in the second side area; calculating a first gray-level difference between the gray level of the first side area and the gray level of the second side area of the first candidate bounding box, calculating an average gray value of the middle area, the third side area, and the fourth side area, and calculating a first gray-level ratio based on the first gray-level difference and the average gray value; calculating a second gray-level difference between the gray level of the third side area and the gray level of the fourth side area of the first candidate bounding box, and normalizing the second gray-level difference to obtain a normalized gray-level difference; and multiplying the edge density ratio, the first gray-level ratio, and the normalized gray-level difference by a first weight value, a second weight value, and a third weight value, respectively, and summing the products to obtain the score of the first candidate bounding box.

2. The method according to claim 1, wherein the edge density ratio characterizes the symmetry between the edge density of the first side area and the edge density of the second side area of the first candidate bounding box, the first gray-level ratio characterizes a first gray-level symmetry of the gray level of the first side area and the gray level of the second side area of the first candidate bounding box relative to the gray levels of the middle area, the third side area, and the fourth side area, and the normalized gray-level difference characterizes a second gray-level symmetry between the gray level of the third side area and the gray level of the fourth side area.

3. The method according to claim 1, wherein the edge density of the first side area is a sum of a plurality of pixel gradient values included in the first side area.

4. The method according to claim 1, wherein the step of calculating the first gray-level difference between the gray level of the first side area and the gray level of the second side area of the first candidate bounding box, calculating the average gray value of the middle area, the third side area, and the fourth side area, and calculating the first gray-level ratio based on the first gray-level difference and the average gray value comprises: calculating a first average gray value of the first side area; calculating a second average gray value of the second side area, wherein the first side area and the second side area have equal length and width; calculating a first absolute difference between the first average gray value and the second average gray value; calculating a middle average gray value of the middle area, the third side area, and the fourth side area as the average gray value; dividing the first absolute difference by the middle average gray value to produce a second gray-level ratio; and subtracting the second gray-level ratio from 1 to produce the first gray-level ratio.

5. The method according to claim 1, wherein the step of calculating the second gray-level difference between the gray level of the third side area and the gray level of the fourth side area of the first candidate bounding box, and normalizing the second gray-level difference to obtain the normalized gray-level difference comprises: calculating a third average gray value of the third side area; calculating a fourth average gray value of the fourth side area, wherein the third side area and the fourth side area have equal length and width; calculating a second absolute difference between the third average gray value and the fourth average gray value; dividing the second absolute difference by the larger of the third average gray value and the fourth average gray value to produce a third gray-level ratio; and subtracting the third gray-level ratio from 1 to produce the normalized gray-level difference.

6. The method according to claim 1, wherein the middle area, the first side area, the second side area, the third side area, and the fourth side area do not overlap one another, and the middle area, the third side area, and the fourth side area are located between the first side area and the second side area.

7. The method according to claim 1, wherein the first weight value, the second weight value, and the third weight value are determined by a recursive correction process, and the process comprises: obtaining a preset image and initializing the first weight value, the second weight value, and the third weight value, wherein the preset image includes a preset text object; finding a plurality of candidate bounding boxes in the preset image, wherein a specific bounding box among the candidate bounding boxes includes the preset text object; calculating a score of each of the candidate bounding boxes; and recursively correcting the first weight value, the second weight value, and the third weight value until the score of the specific bounding box approaches 1 and the score of each of the other candidate bounding boxes approaches 0.

8. The method according to claim 1, wherein the sum of the first weight value, the second weight value, and the third weight value is 1.

9. The method according to claim 1, wherein the step of finding, from the candidate bounding boxes, the specific candidate bounding box that includes the text object according to the score of each candidate bounding box comprises: selecting, from the overlapping candidate bounding boxes, the one with the highest score as the specific candidate bounding box.

10. The method according to claim 1, further comprising: performing text recognition on the specific candidate bounding box that includes the text object.

11. An image processing device, comprising: a storage circuit storing a plurality of modules; and a processor coupled to the storage circuit and accessing the modules to perform the following steps: obtaining an image, wherein the image includes a text object; analyzing the image to find a plurality of candidate bounding boxes in the image, wherein each candidate bounding box includes a middle area, a first side area, a second side area opposite the first side area, a third side area, and a fourth side area opposite the third side area; calculating a score of each candidate bounding box; and finding, from the candidate bounding boxes, a specific candidate bounding box that includes the text object according to the score of each candidate bounding box, wherein the step of calculating the score of each candidate bounding box comprises: for a first candidate bounding box among the candidate bounding boxes, calculating an edge density ratio between the edge density of the first side area and the edge density of the second side area of the first candidate bounding box, wherein the edge density of the first side area comprises a sum of a plurality of pixel gradient values in the first side area, and the edge density of the second side area comprises a sum of a plurality of pixel gradient values in the second side area; calculating a first gray-level difference between the gray level of the first side area and the gray level of the second side area of the first candidate bounding box, calculating an average gray value of the middle area, the third side area, and the fourth side area, and calculating a first gray-level ratio based on the first gray-level difference and the average gray value; calculating a second gray-level difference between the gray level of the third side area and the gray level of the fourth side area of the first candidate bounding box, and normalizing the second gray-level difference to obtain a normalized gray-level difference; and multiplying the edge density ratio, the first gray-level ratio, and the normalized gray-level difference by a first weight value, a second weight value, and a third weight value, respectively, and summing the products to obtain the score of the first candidate bounding box.
TW107141052A 2018-11-19 2018-11-19 Image processing device and method for detecting and filtering text object TWI706336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107141052A TWI706336B (en) 2018-11-19 2018-11-19 Image processing device and method for detecting and filtering text object


Publications (2)

Publication Number Publication Date
TW202020740A TW202020740A (en) 2020-06-01
TWI706336B true TWI706336B (en) 2020-10-01

Family

ID=72175692

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107141052A TWI706336B (en) 2018-11-19 2018-11-19 Image processing device and method for detecting and filtering text object

Country Status (1)

Country Link
TW (1) TWI706336B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI408612B (en) * 2009-09-25 2013-09-11 Ind Tech Res Inst Method and system for dynamically and simultaneously determining the relative relation between moving objects
CN104954605A (en) * 2014-03-31 2015-09-30 京瓷办公信息系统株式会社 Image forming apparatus, image forming system, and image forming method
US9202127B2 (en) * 2011-07-08 2015-12-01 Qualcomm Incorporated Parallel processing method and apparatus for determining text information from an image
CN108304825A (en) * 2018-02-28 2018-07-20 北京奇艺世纪科技有限公司 A kind of Method for text detection and device
CN108764228A (en) * 2018-05-28 2018-11-06 嘉兴善索智能科技有限公司 Word object detection method in a kind of image


Also Published As

Publication number Publication date
TW202020740A (en) 2020-06-01

Similar Documents

Publication Publication Date Title
TWI744283B (en) Method and device for word segmentation
US11055571B2 (en) Information processing device, recording medium recording information processing program, and information processing method
CN111368788B (en) Training method and device for image recognition model and electronic equipment
US20200074665A1 (en) Object detection method, device, apparatus and computer-readable storage medium
US20190122367A1 (en) Automated tattoo recognition techniques
US10423827B1 (en) Image text recognition
WO2020253508A1 (en) Abnormal cell detection method and apparatus, and computer readable storage medium
US9576210B1 (en) Sharpness-based frame selection for OCR
US10169673B2 (en) Region-of-interest detection apparatus, region-of-interest detection method, and recording medium
JP6317725B2 (en) System and method for determining clutter in acquired images
US20170228890A1 (en) Object detection method and computer device
CN111325717A (en) Mobile phone defect position identification method and equipment
CN112101386A (en) Text detection method and device, computer equipment and storage medium
US11961327B2 (en) Image processing method and device, classifier training method, and readable storage medium
CN108875504A (en) Image detecting method and image detection device neural network based
US9830363B2 (en) Image evaluation apparatus, image evaluation method, and computer program product
Banerjee et al. Iris segmentation using geodesic active contours and grabcut
CN111079480A (en) Identification method and device of identity card information and terminal equipment
CN111247517B (en) Image processing method, device and system
Papandreou et al. Slant estimation and core-region detection for handwritten Latin words
TWI706336B (en) Image processing device and method for detecting and filtering text object
JP6686460B2 (en) Method and apparatus for removing marks in document image
CN111524171B (en) Image processing method and device and electronic equipment
US20110099137A1 (en) Graphical user interface component classification
JP5020920B2 (en) Pedestrian detection device and program