WO2020107866A1 - Text region obtaining method and apparatus, storage medium and terminal device - Google Patents


Info

Publication number
WO2020107866A1
WO2020107866A1 (PCT/CN2019/091526)
Authority
WO
WIPO (PCT)
Prior art keywords
text
text area
image
area
position information
Prior art date
Application number
PCT/CN2019/091526
Other languages
French (fr)
Chinese (zh)
Inventor
黄泽浩 (Huang Zehao)
王满 (Wang Man)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020107866A1 publication Critical patent/WO2020107866A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/62 — Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/34 — Smoothing or thinning of the pattern; morphological operations; skeletonisation
    • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V30/153 — Segmentation of character regions using recognition of characters or words
    • G06V30/287 — Character recognition specially adapted to the type of the alphabet, e.g. of Kanji, Hiragana or Katakana characters

Definitions

  • the present application relates to the field of image processing technology, and in particular, to a method, device, storage medium, and terminal device for acquiring a text area.
  • OCR technology can automatically recognize the text information in an image, and the quality of that recognition depends on how accurately the text areas are acquired. In existing OCR technology, however, complex image backgrounds and similar factors often make the accuracy of text area acquisition low and the acquisition inefficient.
  • Embodiments of the present application provide a method and apparatus for acquiring a text area, a computer-readable storage medium, and a terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
  • a first aspect of an embodiment of the present application provides a method for acquiring a text area, including:
  • the text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  • a second aspect of the embodiments of the present application provides an apparatus for acquiring a text area, including:
  • the background removal module is used to obtain a preset image containing text, and uses a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
  • the grayscale processing module is used to perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image;
  • a sharpening processing module configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
  • a position acquisition module used to extract each text area of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtain the position information of each text area;
  • the area acquisition module is used to classify text areas based on the position information of each text area, and merge text areas of the same type to obtain a final text area.
  • a third aspect of the embodiments of the present application provides a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • the text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  • a fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor executes the The computer-readable instructions implement the following steps:
  • the text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  • in the embodiments of the present application, the mean shift algorithm and the bilateral filtering algorithm may first be used to perform background removal on the preset image, improving the background removal effect and reducing background interference during text area acquisition. The preset image with the background removed may then be grayscale-processed to obtain a grayscale image, and the grayscale image may be sharpened to obtain an enhanced image in which the text areas are more prominent and obvious. This facilitates the extraction of each text area from the enhanced image by the maximally stable extremal regions (MSER) algorithm and improves the accuracy of text area extraction. After the text areas are extracted, the position information of each text area may be obtained, the text areas may be classified according to that position information, and text areas of the same type may be merged to obtain the final text areas, thereby reducing the number of text areas and improving the speed and efficiency of text area acquisition.
  • FIG. 1 is a flowchart of an embodiment of a method for acquiring a text area in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a text area extracted by using a MSER algorithm in an application scenario in a text area acquisition method in an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of classifying text areas in an application scenario of the text area acquisition method in an embodiment of the present application;
  • FIG. 4 is a schematic flow chart of a method for acquiring a text area in an application scenario according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of a text area acquisition method in an embodiment of the present application after performing expansion processing in an application scenario
  • FIG. 6 is a structural diagram of an embodiment of an apparatus for acquiring a text area in an embodiment of the present application
  • FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present application.
  • Embodiments of the present application provide a method and apparatus for acquiring a text area, a computer-readable storage medium, and a terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
  • an embodiment of the present application provides a method for acquiring a text area.
  • the method for acquiring a text area includes:
  • Step S101 Acquire a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
  • the preset image may be acquired by photographing or by scanning. For example, when the ID card information of a user needs to be obtained, the preset image of the ID card may be obtained by photographing or scanning; similarly, when the invoice information on an invoice needs to be obtained, the preset image of the invoice may be obtained by photographing or scanning.
  • a mean shift algorithm and a bilateral filtering algorithm may be used to perform background removal on the preset image to remove the image background in the preset image and reduce the image The background interferes with the text area acquisition.
  • the order of adopting the mean shift algorithm and the bilateral filtering algorithm is not limited.
  • the mean shift algorithm may be used first to separate the foreground portion of the preset image from the image background and obtain the separated foreground portion, after which the bilateral filtering algorithm further removes the residual background from the separated foreground portion; alternatively, the bilateral filtering algorithm may remove the image background first, and the mean shift algorithm may then further separate the foreground portion of the preset image. Jointly using the two algorithms improves the removal of the image background, thereby reducing background interference during text area acquisition and improving the accuracy of text area acquisition.
  • Step S102 Perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image
  • any existing grayscale processing method may be used to perform grayscale processing on the preset image; the embodiment of the present application places no limitation on the grayscale processing method, as long as it yields the grayscale image of the preset image.
  • Step S103 Perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image
  • a sharpening operation is performed on the grayscale image to make the pixels of the text portion more prominent, so that the text areas in the resulting enhanced image are more prominent and obvious, improving the accuracy of text area acquisition.
  • the sharpening the grayscale image may include:
  • the 3*3 convolution kernel is:
  • convolution processing may be performed on the grayscale image of the preset image using the above 3*3 convolution kernel, quickly adjusting the contrast or sharpness of specific parts of the grayscale image and making the pixels of the text portion more prominent and obvious.
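The kernel values themselves are not reproduced in the text above, so the sketch below substitutes a common Laplacian-style sharpening kernel as an assumption; the 3x3 convolution is spelled out in plain NumPy.

```python
import numpy as np

# Assumed 3x3 sharpening kernel (the patent's exact values are not shown
# in the text): centre-weighted Laplacian enhancement, weights sum to 1.
KERNEL = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float64)

def sharpen(gray):
    """Convolve a uint8 grayscale image with KERNEL (zero padding)."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1)
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):          # accumulate the nine shifted copies
        for dx in range(3):
            out += KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(out, 0, 255).astype(np.uint8)

# Because the weights sum to 1, a flat region is unchanged while
# intensity edges (text strokes) are amplified.
flat = np.full((5, 5), 100, dtype=np.uint8)
print(sharpen(flat)[2, 2])  # → 100
```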
  • Step S104 Use the maximally stable extremal regions (MSER) algorithm to extract each text area of the enhanced image, and obtain the position information of each text area;
  • the maximally stable extremal regions (MSER) algorithm may be used to extract the text areas in the enhanced image. For example, in a specific application scenario, the text areas extracted by the MSER algorithm are shown in FIG. 2, where each irregular polygon extracted by the MSER algorithm represents one text area.
  • Step S105 Classify the text area according to the position information of each text area, and merge text areas of the same type to obtain a final text area.
  • extraction by the MSER algorithm often yields many text areas, since one text character can correspond to one text area. Therefore, in order to improve the acquisition speed and efficiency of the text areas, in the embodiments of the present application the text areas may be further clustered or classified according to the position information of each text area, and the text areas belonging to the same class may be merged according to the clustering or classification result to obtain the final text areas. For example, text characters located on the same line may be merged into the same text area, which reduces the number of acquired text areas and thus improves the speed and efficiency of text area acquisition.
  • the classification of the text area according to the position information of each of the text areas may include:
  • Step S301 Determine the center point of each text area according to the position information of each text area, and obtain the center point coordinates of each center point;
  • Step S302 Determine the center points that satisfy the first preset condition between the coordinates of the center points as the same type, and obtain a classification result of the center points;
  • Step S303 Classify each text area according to the classification result of the center point.
  • the center point of each text area may be determined according to the coordinate information of the points in that text area, and the center point coordinates, i.e. the horizontal and vertical coordinates of each center point, may be obtained; the center points may then be classified according to their horizontal and vertical coordinates, and each text area classified according to the classification result of its center point.
  • the first preset condition may be that the difference between the ordinates meets a preset threshold, and the preset threshold may be set to zero.
  • when the preset threshold is zero, center points with the same ordinate are divided into one category; that is, the center points located on the same line are determined to be of the same type. For example, in a specific application scenario, suppose center point A and center point B have the same ordinate, center points C, D and E have the same ordinate, and center points F, G, H and I have the same ordinate. This means that center points A and B belong to one line, center points C, D and E belong to a second line, and center points F, G, H and I belong to a third line. Center points A and B may then be divided into one class, such as class A; center points C, D and E into another class, such as class B; and center points F, G, H and I into a third class, such as class C.
  • accordingly, the text areas corresponding to the center points in class A may be divided into the first category, the text areas corresponding to the center points in class B into the second category, and the text areas corresponding to the center points in class C into the third category. That is, text area A corresponding to center point A and text area B corresponding to center point B are divided into the first category; text areas C, D and E corresponding to center points C, D and E are divided into the second category; and text areas F, G, H and I corresponding to center points F, G, H and I are divided into the third category.
  • the preset threshold is zero for illustrative explanation only, and should not be construed as a limitation to the embodiment of the present application.
  • the preset threshold may of course be other values, such as 0.5 or 1, etc., when the preset threshold is 0.5, it means that the center points with the difference between the ordinates less than or equal to 0.5 can be classified into the same category.
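Steps S301 to S303 above can be sketched as a one-pass grouping on the centre-point ordinates; the threshold semantics follow the description above, and the coordinates used in the example are illustrative.

```python
def classify_by_ordinate(centers, threshold=0.0):
    """Group (x, y) centre points whose ordinates differ by <= threshold."""
    groups = []
    for pt in sorted(centers, key=lambda p: p[1]):  # scan top to bottom
        if groups and abs(pt[1] - groups[-1][-1][1]) <= threshold:
            groups[-1].append(pt)   # same line as the previous point
        else:
            groups.append([pt])     # start a new line/class
    return groups

# A and B share one line; C, D and E share another (threshold zero).
centers = [(10, 5), (30, 5), (12, 20), (25, 20), (40, 20)]
groups = classify_by_ordinate(centers)
print([len(g) for g in groups])  # → [2, 3]
```

With `threshold=0.5` or `threshold=1`, as the text allows, points whose ordinates differ by at most that amount would also fall into the same class.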
  • alternatively, the first preset condition may be that the difference between the ordinates meets a first preset threshold and the difference between the abscissas meets a second preset threshold; the case where the difference between the abscissas meets the second preset threshold is similar to the case, described above, where the difference between the ordinates meets the preset threshold, and the basic principle is the same, so for brevity it is not repeated here.
  • the text areas may also be classified according to the position information of other points in the text areas. For example, the ordinates of the uppermost and lowermost points of each text area and the abscissa of its center point may be obtained, and text areas whose center points have the same abscissa, whose uppermost-point ordinates satisfy a third preset condition, and whose lowermost-point ordinates satisfy a fourth preset condition may be divided into one type, where the third preset condition and the fourth preset condition may each be that the difference between the ordinates is within a preset value.
  • the classification of the text areas according to the position information of each text area, and the merging of text areas of the same type to obtain the final text area, may include:
  • Step S401 Construct a blank canvas of the same size as the enhanced image
  • after the maximally stable extremal regions (MSER) algorithm is used to extract each text area of the enhanced image, a blank canvas of the same size as the enhanced image may first be constructed.
  • Step S402 Import each extracted text area into the blank canvas according to the arrangement position in the enhanced image
  • each text area extracted by the MSER algorithm may be imported into the blank canvas. When importing, each text area must be placed according to its arrangement position in the enhanced image, so that the image formed after the text areas are imported into the blank canvas is the same as the enhanced image.
  • Step S403 Perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
  • the text areas extracted by the MSER algorithm are often irregular polygons, while what is needed in text area acquisition is line text; that is, the polygons on the same line need to be fitted together. Fitting the irregular polygons directly is troublesome, so, as shown in FIG. 5, in the embodiment of the present application the polygons may be expanded before fitting; that is, each text area in the blank canvas may undergo expansion processing so that the text areas are connected together.
  • each text area may also be eroded, so that through the operation of expanding first and then eroding, the text areas are connected and their boundaries are smoothed.
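The expand-then-erode operation (morphological closing) can be sketched in plain NumPy over a binary canvas; the 3x3 structuring element is an assumption.

```python
import numpy as np

def dilate(mask):
    """3x3 binary dilation: a pixel becomes 1 if any 8-neighbour is 1."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask):
    """3x3 binary erosion: a pixel stays 1 only if all 8-neighbours are 1."""
    p = np.pad(mask, 1)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

# Two nearby "characters" separated by a one-pixel gap at column 3.
canvas = np.zeros((5, 7), dtype=np.uint8)
canvas[1:4, 1:3] = 1
canvas[1:4, 4:6] = 1
closed = erode(dilate(canvas))  # expand first, then erode: gap is bridged
```

After closing, the two blobs form a single connected region, which is exactly what the subsequent edge-detection step relies on.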
  • Step S404 Perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
  • Step S405 Acquire position information of the smallest circumscribed rectangle of each of the connected areas
  • Step S406 Classify each of the connected areas according to the position information of the smallest circumscribed rectangles, and merge the connected areas of the same type to obtain a final text area.
  • edge detection may be performed on each of the first text areas, for example through OpenCV's findContours() function, and the connected first text areas may be determined according to the detection result and merged into connected areas. At the same time, the smallest circumscribed rectangle of each connected area, i.e. the smallest rectangle containing the connected first text areas, may be detected and its position information obtained, so that the connected areas can be classified according to the position information of the smallest circumscribed rectangles and the connected areas of the same type merged to obtain the final text areas.
  • the detection result may include the distance between adjacent first text areas, and a distance threshold may be set to determine whether adjacent first text areas are connected. For example, in a specific application scenario, the distance threshold may be set to 1 cm; when detection determines that the distance between the first text area and the second text area is 0.6 cm and the distance between the second text area and the third text area is 0.7 cm, it can be determined that the first and second text areas are connected and that the second and third text areas are connected, so the first, second and third text areas can be merged into one connected area.
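The distance-threshold merge in the example above can be sketched as chaining of one-dimensional extents: adjacent first text areas whose gap is below the threshold join the same connected region. The `(left, right)` spans and centimetre units are illustrative.

```python
def merge_connected(extents, threshold=1.0):
    """Merge sorted (left, right) spans whose gap is below threshold."""
    merged = []
    for left, right in sorted(extents):
        if merged and left - merged[-1][1] < threshold:
            # Gap below threshold: extend the current connected region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged

# Gaps of 0.6 cm and 0.7 cm are below the 1 cm threshold, so all three
# areas chain into one connected region.
areas = [(0.0, 2.0), (2.6, 4.0), (4.7, 6.0)]
print(merge_connected(areas))  # → [(0.0, 6.0)]
```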
  • determining the connected first text areas by setting a distance threshold is for illustrative explanation only and should not be construed as a limitation on the embodiment of the present application; the connected first text areas may also be determined by any other method capable of determining whether text areas are connected.
  • classifying each of the connected areas according to the position information of each of the smallest circumscribed rectangles may include:
  • Step a Obtain the diagonal coordinates of each minimum circumscribed rectangle
  • Step b Classify each of the connected areas according to each of the diagonal coordinates.
  • the position information of each smallest circumscribed rectangle in the embodiment of the present application may be the coordinate information of the diagonal points of that rectangle, for example the coordinates of its upper-left point and lower-right point, and the connected areas may be classified according to these coordinates. For instance, all connected areas whose upper-left points have the same ordinate and whose lower-right points have the same ordinate may be divided into one type: when the ordinate of the upper-left point of smallest circumscribed rectangle A is the same as that of smallest circumscribed rectangle B, the ordinate of the lower-right point of rectangle A is the same as that of rectangle B, the ordinate of the upper-left point of rectangle C is the same as that of rectangle B, and the ordinate of the lower-right point of rectangle C is the same as that of rectangle B, then the connected area A corresponding to rectangle A, the connected area B corresponding to rectangle B, and the connected area C corresponding to rectangle C may be classified into the same type.
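Steps a and b can be sketched as grouping rectangles by the ordinates of their diagonal points; the `(x1, y1, x2, y2)` representation and the tolerance parameter are illustrative.

```python
def classify_rects(rects, tol=0):
    """Group (x1, y1, x2, y2) rectangles sharing top and bottom ordinates."""
    groups = []
    for r in rects:
        for g in groups:
            # Same upper-left ordinate (y1) and lower-right ordinate (y2)
            # within tolerance means the same text line.
            if abs(r[1] - g[0][1]) <= tol and abs(r[3] - g[0][3]) <= tol:
                g.append(r)
                break
        else:
            groups.append([r])
    return groups

rects = [(0, 10, 20, 30),   # smallest circumscribed rectangle A
         (25, 10, 50, 30),  # rectangle B: same line as A
         (55, 10, 70, 30),  # rectangle C: same line as A and B
         (0, 40, 30, 60)]   # a rectangle on a second line
print(len(classify_rects(rects)))  # → 2
```

A nonzero `tol` corresponds to the fourth and fifth preset conditions mentioned in the text, where the ordinate differences only need to fall within a preset value.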
  • classifying the connected areas according to identical upper-left-point ordinates and identical lower-right-point ordinates is for schematic explanation only and should not be construed as a limitation on the embodiments of the present application. A fourth preset condition and a fifth preset condition may also be used to classify the connected areas, where the fourth preset condition and the fifth preset condition may each be that the difference between the ordinates is within a preset value; the connected areas may likewise be classified according to the ordinate of the lower-left point and the ordinate of the upper-right point.
  • in addition, filtering and screening operations may be performed on the connected areas within each cluster: connected areas whose distance from the other connected areas in the cluster is greater than a preset distance threshold may be filtered out and removed from the corresponding cluster; connected areas whose area is greater than a preset area threshold may be filtered out and removed from the corresponding cluster; and a connected area contained within another connected area may be removed, etc. This prevents acquiring areas that are not text and prevents repeatedly acquiring text areas, thereby improving classification accuracy and the efficiency and accuracy of text area acquisition.
  • the method may further include:
  • Step c Collect RGB values of each pixel in the preset image
  • Step d Extract pixel points whose RGB values meet the second preset condition, and delete the extracted pixel points in the preset image.
  • in the embodiment of the present application, after the preset image of an invoice is acquired, color separation technology may first be used to extract the interference areas in the preset image, such as the border and the seal of the invoice, and the pixels of these interference areas may be deleted from the preset image; the mean shift algorithm and bilateral filtering algorithm are then used to perform background removal on the image with the interference pixels deleted, followed by the subsequent steps, to obtain the text areas.
  • the interference area may be determined according to the RGB value of the pixel, and the second preset condition may be set according to the specific color of the interference area to be removed.
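One possible realisation of the colour-separation step (steps c and d): the second preset condition below, a "strongly red" rule intended to match seal pixels, is purely an assumption for illustration, since the patent leaves the condition to be set per interference colour.

```python
import numpy as np

def remove_interference(img, min_red=150, max_other=100):
    """Delete pixels matching an assumed red-seal condition (paint white)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # Second preset condition (assumed): high red, low green and blue.
    mask = (r >= min_red) & (g <= max_other) & (b <= max_other)
    cleaned = img.copy()
    cleaned[mask] = 255  # deleting = replacing with the paper colour
    return cleaned, mask

img = np.full((4, 4, 3), 200, dtype=np.uint8)  # plain background
img[1, 1] = (220, 30, 30)                      # one "seal" pixel
cleaned, mask = remove_interference(img)
print(int(mask.sum()), "interference pixels removed")  # → 1
```

Background removal and the later steps would then run on `cleaned` rather than the original image.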
  • in the embodiments of the present application, the mean shift algorithm and the bilateral filtering algorithm may first be used to perform background removal on the preset image, improving the background removal effect and reducing background interference during text area acquisition; the preset image with the background removed may then be grayscale-processed to obtain a grayscale image, and the grayscale image may be sharpened to obtain an enhanced image in which the text areas are more prominent and obvious, which improves the accuracy of text area extraction. After each text area is extracted, its position information may be obtained, the text areas may be classified according to that position information, and text areas of the same type may be merged to obtain the final text areas, reducing the number of text areas and improving the speed and efficiency of text area acquisition.
  • the above mainly describes a text area acquisition method, and a text area acquisition device will be described in detail below.
  • FIG. 6 shows a structural diagram of an embodiment of a device for acquiring a text area in an embodiment of the present application.
  • the text area acquisition device includes:
  • the background removal module 601 is used to obtain a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
  • the grayscale processing module 602 is configured to perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image;
  • a sharpening processing module 603, configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image
  • the position obtaining module 604 is used to extract each text area of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtain the position information of each text area;
  • the area acquisition module 605 is configured to classify text areas based on the position information of each text area, and merge text areas of the same type to obtain a final text area.
  • the sharpening processing module 603 is specifically configured to perform a convolution process on the grayscale image using a 3*3 convolution kernel to perform sharpening operation on the grayscale image;
  • the 3*3 convolution kernel is:
  • the area acquisition module 605 may include:
  • a center point determining unit configured to determine the center point of each text area based on the position information of each text area, and obtain the center point coordinates of each center point;
  • a center point classification unit configured to determine center points that satisfy the first preset condition between the center point coordinates as the same type, and obtain a classification result of the center points
  • the text area classification unit is used to classify each text area according to the classification result of the center point.
  • the area acquisition module 605 may include:
  • a blank canvas construction unit used to build a blank canvas of the same size as the enhanced image
  • a text area importing unit for importing each extracted text area into the blank canvas according to the arrangement position in the enhanced image
  • An expansion processing unit configured to perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area
  • An edge detection unit configured to perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
  • a location information acquiring unit configured to acquire the location information of the smallest circumscribed rectangle of each of the connected areas
  • the connected area merging unit is used to classify the connected areas according to the position information of the smallest circumscribed rectangles, and combine the connected areas of the same type to obtain the final text area.
  • the connected area merging unit may include:
  • a diagonal coordinate obtaining subunit, used to obtain the diagonal coordinates of each of the smallest circumscribed rectangles;
  • a connected area merging subunit, used to classify the connected areas according to the diagonal coordinates.
  • the text area acquisition device may further include:
  • an RGB value collection module configured to collect the RGB value of each pixel in the preset image;
  • a pixel deletion module configured to extract the pixels whose RGB values meet a second preset condition, and delete the extracted pixels from the preset image.
  • the terminal device 7 of this embodiment includes: a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70, such as a text area acquisition program.
  • When the processor 70 executes the computer-readable instructions 72, the steps in the above embodiments of the text area acquisition method are implemented, for example, steps S101 to S105 shown in FIG. 1. Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of each module/unit in the foregoing apparatus embodiments are realized, for example, the functions of modules 601 to 605 shown in FIG. 6.
  • Exemplarily, the computer-readable instructions 72 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 71 and executed by the processor 70 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 72 in the terminal device 7.
  • the terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server.
  • the terminal device may include, but is not limited to, a processor 70 and a memory 71.
  • FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7, which may include more or fewer components than those illustrated, a combination of certain components, or different components.
  • the terminal device may further include an input and output device, a network access device, a bus, and the like.
  • the processor 70 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7.
  • the memory 71 may also be an external storage device of the terminal device 7, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 7.
  • the memory 71 may also include both an internal storage unit of the terminal device 7 and an external storage device.
  • the memory 71 is used to store the computer-readable instructions and other programs and data required by the terminal device.
  • the memory 71 can also be used to temporarily store data that has been or will be output.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
  • When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

A text region obtaining method and apparatus, a storage medium, and a terminal device. The text region obtaining method comprises: obtaining a preset image comprising text, and performing background removal on the preset image by using a mean shift algorithm and a bilateral filtering algorithm (S101); performing grayscale processing on the preset image subjected to the background removal to obtain a grayscale image of the preset image (S102); sharpening the grayscale image to obtain an enhanced image of the grayscale image (S103); extracting each text region of the enhanced image by using a maximally stable extremal region (MSER) algorithm, and obtaining position information of each text region (S104); and classifying the text regions according to the position information of each text region, and merging the text regions of the same type to obtain final text regions (S105). The joint use of the mean shift algorithm and the bilateral filtering algorithm improves the background removal effect and reduces the background interference, and the merging of the text regions reduces the number of the text regions and improves the text region obtaining speed and efficiency.

Description

Text region obtaining method and apparatus, storage medium and terminal device
This application claims priority to the Chinese patent application filed with the China Patent Office on November 30, 2018, with application number 201811451778.9 and entitled "Text region obtaining method and apparatus, storage medium and terminal device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing technology, and in particular to a text area acquisition method, apparatus, storage medium, and terminal device.
Background
In many existing scenarios, the text information in an image needs to be entered, for example, entering the name, ID number, and address from an ID card, or entering the financial information on an invoice into a company's financial system. Entering the text information in an image manually not only consumes considerable manpower and financial resources, but is also inefficient and provides a poor user experience. To improve the entry efficiency of text information in images such as ID cards and invoices, automatic OCR text recognition technology emerged: OCR can automatically recognize the text information in an image, and the recognition quality depends on the accuracy of text area acquisition. In existing OCR technology, however, complex image backgrounds and other factors often result in low accuracy of text area acquisition, and the acquisition efficiency is not high either.
Technical Problem
Embodiments of the present application provide a text area acquisition method, apparatus, computer-readable storage medium, and terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
Technical Solution
A first aspect of the embodiments of the present application provides a text area acquisition method, including:
acquiring a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
performing grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
extracting each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquiring the position information of each text area; and
classifying the text areas according to the position information of each text area, and merging text areas of the same type to obtain the final text areas.
A second aspect of the embodiments of the present application provides a text area acquisition apparatus, including:
a background removal module configured to acquire a preset image containing text and perform background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
a grayscale processing module configured to perform grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
a sharpening processing module configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
a position acquisition module configured to extract each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm and acquire the position information of each text area; and
an area acquisition module configured to classify the text areas according to the position information of each text area and merge text areas of the same type to obtain the final text areas.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
acquiring a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
performing grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
extracting each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquiring the position information of each text area; and
classifying the text areas according to the position information of each text area, and merging text areas of the same type to obtain the final text areas.
A fourth aspect of the embodiments of the present application provides a terminal device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
acquiring a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
performing grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
extracting each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquiring the position information of each text area; and
classifying the text areas according to the position information of each text area, and merging text areas of the same type to obtain the final text areas.
Beneficial Effects
In the embodiments of the present application, when a preset image containing text is acquired, background removal may first be performed on the preset image by jointly using a mean shift algorithm and a bilateral filtering algorithm, which improves the background removal effect and reduces background interference during text area acquisition. Then, grayscale processing may be performed on the preset image after background removal to obtain a grayscale image of the preset image, and a sharpening operation may be performed on the grayscale image to obtain an enhanced image in which the text areas are more prominent and obvious. This facilitates the extraction of each text area of the enhanced image by the maximally stable extremal region (MSER) algorithm and improves the accuracy of text area extraction. After the text areas are extracted, the position information of each text area may further be acquired, the text areas may be classified according to their position information, and text areas of the same type may be merged to obtain the final text areas, thereby reducing the number of text areas and improving the speed and efficiency of text area acquisition.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flowchart of an embodiment of a text area acquisition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the text areas extracted by the MSER algorithm in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of classifying text areas in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of acquiring text areas in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 5 is a schematic diagram after dilation processing in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 6 is a structural diagram of an embodiment of a text area acquisition apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present application.
Embodiments of the Invention
Embodiments of the present application provide a text area acquisition method, apparatus, computer-readable storage medium, and terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
To make the purpose, features, and advantages of the present application more obvious and understandable, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the embodiments described below are only some, but not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, an embodiment of the present application provides a text area acquisition method, including:
Step S101: acquire a preset image containing text, and perform background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm.
It can be understood that the preset image may be acquired by photographing or by scanning. For example, when text information such as the name, ID number, and address on an ID card needs to be acquired, a preset image of the ID card may first be obtained by photographing or scanning; similarly, when the invoice information on an invoice needs to be acquired, a preset image of the invoice may also be obtained by photographing or scanning.
In this embodiment of the present application, after the preset image is acquired, a mean shift algorithm and a bilateral filtering algorithm may be jointly used to remove the image background of the preset image, so as to reduce the interference of the image background with text area acquisition. The order in which the mean shift algorithm and the bilateral filtering algorithm are applied is not limited here. For example, the mean shift algorithm may first be used to separate the foreground part of the preset image from the image background and obtain the separated foreground part, and the bilateral filtering algorithm may then be used to further remove the background from the separated foreground part; alternatively, the bilateral filtering algorithm may first be used to remove the image background of the preset image, and the mean shift algorithm may then be used to further separate the foreground part of the preset image. Jointly using the two algorithms improves the background removal effect, thereby reducing the interference of the image background during text area acquisition and improving the accuracy of text area acquisition.
Step S102: perform grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image.
It can be understood that, to facilitate subsequent image processing, in this embodiment of the present application, after the preset image with the background removed is obtained, that is, after the foreground part of the preset image is obtained, grayscale processing may further be performed on the preset image to obtain its grayscale image. Any existing grayscale processing method may be used here; the embodiments of the present application impose no limitation on the grayscale processing method, as long as the grayscale image of the preset image can be obtained.
Step S103: perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image.
Here, to avoid the problem of poor text area acquisition caused by insignificant pixel changes due to uneven lighting during photographing and the like, in this embodiment of the present application, after the grayscale image of the preset image is obtained, a sharpening operation may further be performed on the grayscale image to make the pixels of the text portion more prominent, so that the text areas in the resulting enhanced image are more prominent and obvious, improving the accuracy of text area acquisition.
Further, in this embodiment of the present application, performing a sharpening operation on the grayscale image may include:
performing convolution processing on the grayscale image with a 3×3 convolution kernel to sharpen the grayscale image;
where the 3×3 convolution kernel is:
Figure PCTCN2019091526-appb-000001
It can be understood that, in this embodiment of the present application, the above 3×3 convolution kernel may be convolved with the grayscale image of the preset image to quickly adjust the contrast or sharpness of specific parts of the grayscale image, so that the pixels of the text portion in the grayscale image become more prominent and obvious.
Step S104: extract each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquire the position information of each text area.
In this embodiment of the present application, after the enhanced image of the grayscale image is obtained, that is, after an image in which the pixels of the text portion are more prominent and obvious is obtained, the maximally stable extremal region (MSER) algorithm may be used to extract the text areas in the enhanced image. For example, in a specific application scenario, the text areas extracted by the MSER algorithm are shown in FIG. 2, where each irregular polygon extracted by the MSER algorithm represents one text area. After the text areas extracted by the MSER algorithm are obtained, the position information of each text area, that is, the coordinate information of each point in each text area, may be acquired immediately.
Step S105: classify the text areas according to the position information of each text area, and merge text areas of the same type to obtain the final text areas.
As shown in FIG. 2, the MSER algorithm often extracts many text areas; typically, one text character corresponds to one text area. Therefore, to improve the speed and efficiency of text area acquisition, in this embodiment of the present application, after the position information of each text area is acquired, the text areas may further be clustered or classified according to their position information, and text areas belonging to the same class may be merged according to the clustering or classification result to obtain the final text areas. For example, text characters located on the same line may be merged into one text area, reducing the number of acquired text areas and thereby improving the speed and efficiency of text area acquisition.
Preferably, as shown in FIG. 3, classifying the text areas according to the position information of each text area may include:
Step S301: determine the center point of each text area according to the position information of each text area, and acquire the center point coordinates of each center point.
Step S302: determine the center points whose coordinates satisfy a first preset condition as the same class, to obtain a classification result of the center points.
Step S303: classify each text area according to the classification result of the center points.
For the above steps S301 to S303, it can be understood that after the position information of each text area is acquired, for example, after the coordinate information of each point in each text area is acquired, the center point of each text area may be determined from the coordinate information of its points, and the center point coordinates, that is, the abscissa and ordinate of each center point, may be acquired. The center points may then be classified according to their abscissas and ordinates, and each text area may be classified according to the classification result of the center points.
The first preset condition may be that the difference between the ordinates satisfies a preset threshold, and the preset threshold may be set to zero. When the preset threshold is zero, center points with the same ordinate are classified into one class, that is, center points located on the same line are determined as one class. For example, in a specific application scenario, center point A and center point B have the same ordinate, center point C, center point D, and center point E have the same ordinate, and center point F, center point G, center point H, and center point I have the same ordinate, indicating that center points A and B belong to the same line, center points C, D, and E belong to the same line, and center points F, G, H, and I belong to the same line. Then center points A and B may be classified into one class, for example class A; center points C, D, and E into another class, for example class B; and center points F, G, H, and I into a further class, for example class C.
Here, after the classification result of the center points is obtained, for example, after the above classes A, B, and C are obtained, the text areas corresponding to the center points in class A may be classified as the first class, the text areas corresponding to the center points in class B as the second class, and the text areas corresponding to the center points in class C as the third class. That is, text area A corresponding to center point A and text area B corresponding to center point B are classified as the first class; text area C corresponding to center point C, text area D corresponding to center point D, and text area E corresponding to center point E are classified as the second class; and text area F corresponding to center point F, text area G corresponding to center point G, text area H corresponding to center point H, and text area I corresponding to center point I are classified as the third class.
It should be noted that a preset threshold of zero is only an illustrative explanation and should not be construed as a limitation on the embodiments of the present application. In the embodiments of the present application, the preset threshold may of course also take other values, such as 0.5 or 1. When the preset threshold is 0.5, center points whose ordinates differ by no more than 0.5 are classified into the same class. In addition, in the embodiments of the present application, the first preset condition may be that the difference between the ordinates satisfies a first preset threshold while the difference between the abscissas satisfies a second preset threshold, where the case in which the difference between the abscissas satisfies the second preset threshold is similar to the case described above in which the difference between the ordinates satisfies the preset threshold; the basic principle is the same and, for brevity, is not repeated here.
Further, in the embodiments of the present application, the text areas may of course also be classified according to the position information of other points in the text areas. For example, the ordinates of the uppermost and lowermost points of each text area and the abscissa of its center point may be acquired, and text areas with the same center point abscissa, whose uppermost point ordinates satisfy a third preset condition and whose lowermost point ordinates satisfy a fourth preset condition, may be classified into one class. The third and fourth preset conditions may be that the difference between the ordinates is within a preset value.
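Steps S301 to S303 can be sketched in plain Python. The helper name and the choice of comparing each new center point against the first member of each class are assumptions made for illustration:

```python
def classify_by_center_row(boxes, y_threshold=0.5):
    """Classify text areas whose center-point ordinates differ by at most
    y_threshold (the first preset condition); boxes are (x, y, w, h)."""
    # S301: determine the center point coordinates of each text area.
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    classes = []  # each class holds the indices of same-line text areas
    for i, (_, cy) in enumerate(centers):
        # S302: a center point joins the first class whose representative
        # ordinate differs by no more than the threshold.
        for cls in classes:
            if abs(centers[cls[0]][1] - cy) <= y_threshold:
                cls.append(i)
                break
        else:
            classes.append([i])
    # S303: the classification of the center points is the classification
    # of their text areas.
    return classes

# Four text areas: two on one line, two on another.
boxes = [(0, 10, 8, 10), (12, 10, 8, 10), (0, 40, 8, 10), (12, 40, 8, 10)]
print(classify_by_center_row(boxes))  # → [[0, 1], [2, 3]]
```

With the threshold at 0 the grouping reduces to "same ordinate means same line", matching the zero-threshold example discussed above.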
可选地，如图4所示，本申请实施例中，所述根据各所述文字区域的位置信息进行文字区域的分类，并对同一类的文字区域进行合并，得到最终文字区域，可以包括：Optionally, as shown in FIG. 4, in the embodiments of the present application, classifying the text regions according to the position information of each text region and merging text regions of the same category to obtain the final text region may include:
步骤S401、构建与所述增强图像的大小相同的空白画布;Step S401: Construct a blank canvas of the same size as the enhanced image;
需要说明的是，为防止过滤不干净的图像背景对文字区域获取的干扰，以进一步提高文字区域获取的准确性，本申请实施例中，在使用最稳定极值区域MSER算法提取出所述增强图像的各文字区域后，可首先构建一与所述增强图像大小相同的空白画布。It should be noted that, in order to prevent interference from an incompletely filtered image background with text region acquisition, and to further improve the accuracy of text region acquisition, in the embodiments of the present application, after the maximally stable extremal regions (MSER) algorithm is used to extract the text regions of the enhanced image, a blank canvas of the same size as the enhanced image may first be constructed.
步骤S402、将所提取的各文字区域按照在所述增强图像中的排布位置,导入所述空白画布中;Step S402: Import each extracted text area into the blank canvas according to the arrangement position in the enhanced image;
在构建出所述空白画布后，可将MSER算法提取出的各文字区域导入所述空白画布中，其中，在将各文字区域导入所述空白画布时，需按照文字区域在所述增强图像中的排布位置进行导入，以使得在所述空白画布中导入各文字区域后所形成的图像与所述增强图像相同。After the blank canvas is constructed, the text regions extracted by the MSER algorithm may be imported into the blank canvas. When importing the text regions into the blank canvas, they must be placed according to their arrangement positions in the enhanced image, so that the image formed after importing the text regions into the blank canvas is identical to the enhanced image.
步骤S403、对位于所述空白画布中的各文字区域进行膨胀处理,得到膨胀后的各第一文字区域;Step S403: Perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
在此，MSER算法所提取出的文字区域往往为不规则的多边形，而文字区域获取中所需要的是行文本，即需要对同一行上的多边形进行拟合，若直接拟合不规则的多边形则较麻烦，因而，如图5所示，本申请实施例中，在对多边形进行拟合之前，可先对多边形进行膨胀处理，即可先对所述空白画布中的各文字区域进行膨胀处理，以使得文字区域联通在一起。在此，在对各文字区域进行膨胀处理后，还可以对各文字区域进行腐蚀处理，以通过先膨胀后腐蚀的操作来达到联通文字区域和平滑边界的作用。Here, the text regions extracted by the MSER algorithm are often irregular polygons, while what text region acquisition needs is line text; that is, the polygons on the same line need to be fitted, and directly fitting irregular polygons is cumbersome. Therefore, as shown in FIG. 5, in the embodiments of the present application, before the polygons are fitted, they may first be dilated; that is, each text region in the blank canvas may be dilated so that the text regions are connected together. Here, after the text regions are dilated, they may also be eroded, so that the dilate-then-erode operation both connects the text regions and smooths their boundaries.
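The dilate-then-erode ("closing") operation described above can be illustrated with a minimal binary-morphology sketch in Python/NumPy. This is a simplified stand-in for a morphology library such as OpenCV's dilate/erode; a 3*3 square structuring element and a simple border-padding convention are assumed:

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element."""
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1)            # outside the image counts as background
        acc = np.zeros_like(out)
        h, w = out.shape
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                acc |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        out = acc
    return out

def erode(mask, iterations=1):
    """Binary erosion via the complement of dilating the complement
    (a lenient convention at the image border)."""
    return ~dilate(~mask.astype(bool), iterations)

def close_regions(mask, iterations=1):
    """Dilate first, then erode, so that nearby text blobs become
    connected and their boundaries are smoothed, as described above."""
    return erode(dilate(mask, iterations), iterations)
```

A one-pixel gap between two text blobs is bridged by the closing operation, which is exactly the effect the dilation step is meant to achieve before line fitting.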
步骤S404、对各所述第一文字区域进行边缘检测,确定相联通的第一文字区域,并将相联通的第一文字区域合并成联通区域;Step S404: Perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
步骤S405、获取各所述联通区域的最小外接矩形的位置信息;Step S405: Acquire position information of the smallest circumscribed rectangle of each of the connected areas;
步骤S406、根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,并对同一类的联通区域进行合并,得到最终文字区域。Step S406: Classify each of the connected areas according to the position information of the smallest circumscribed rectangles, and merge the connected areas of the same type to obtain a final text area.
对于上述步骤S404至步骤S406，可以理解的是，本申请实施例中，在得到膨胀处理后的各所述第一文字区域后，可对各所述第一文字区域进行边缘检测，如可通过OpenCV中的findContours()函数对各所述第一文字区域进行边缘检测，以根据检测结果确定相联通的第一文字区域，并可将相联通的第一文字区域合并成联通区域，同时检测得到各联通区域的最小外接矩形，所述最小外接矩形为包含相联通的各第一文字区域的最小的矩形，并获取各最小外接矩形的位置信息，从而可根据最小外接矩形的位置信息对各所述联通区域进行分类，并可对同一类的联通区域进行合并，得到最终文字区域。With respect to steps S404 to S406 above, it can be understood that, in the embodiments of the present application, after the dilated first text regions are obtained, edge detection may be performed on each first text region, for example using the findContours() function in OpenCV, so as to determine the connected first text regions according to the detection result and merge them into connected regions. At the same time, the minimum circumscribed rectangle of each connected region is detected, the minimum circumscribed rectangle being the smallest rectangle containing the connected first text regions, and the position information of each minimum circumscribed rectangle is obtained, so that the connected regions can be classified according to this position information and connected regions of the same category can be merged to obtain the final text region.
在此，所述检测结果可包括相邻的第一文字区域之间的距离，本申请实施例中，可通过设置距离阈值来确定相邻的第一文字区域之间是否相联通，如在某一具体应用场景中，可将所述距离阈值设置为1cm，因而，当检测确定第一文字区域与第二文字区域之间的距离为0.6cm，而第二文字区域与第三文字区域之间的距离为0.7cm时，则可确定第一文字区域与第二文字区域相联通，第二文字区域与第三文字区域相联通，即可将第一文字区域、第二文字区域以及第三文字区域合并成联通区域。Here, the detection result may include the distance between adjacent first text regions. In the embodiments of the present application, a distance threshold may be set to determine whether adjacent first text regions are connected. For example, in a specific application scenario, the distance threshold may be set to 1 cm; thus, when detection determines that the distance between the first text region and the second text region is 0.6 cm and the distance between the second text region and the third text region is 0.7 cm, it can be determined that the first and second text regions are connected and that the second and third text regions are connected, so the first, second, and third text regions can be merged into one connected region.
需要说明的是，本申请实施例中，通过距离阈值的设置来确定相联通的第一文字区域仅作示意性解释，不应理解为对本申请实施例的限制，本申请实施例中，当然也可以采用其他任何可确定文字区域之间联通与否的方式来确定相联通的第一文字区域。It should be noted that, in the embodiments of the present application, determining the connected first text regions by setting a distance threshold is only an illustrative example and should not be construed as limiting the embodiments of the present application; of course, any other method capable of determining whether text regions are connected may also be used to determine the connected first text regions.
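As an illustrative stand-in for the findContours()-based edge detection and minimum-rectangle extraction of steps S404 and S405, the following Python sketch labels 4-connected foreground regions and returns an axis-aligned bounding rectangle for each. This is a simplification (the true minimum circumscribed rectangle may be rotated), and the function name is an assumption:

```python
import numpy as np

def connected_regions(mask):
    """Label 4-connected foreground regions of a binary mask and return the
    bounding rectangle (x0, y0, x1, y1) of each, in scan order."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                stack, xs, ys = [(sy, sx)], [], []
                seen[sy, sx] = True
                while stack:                      # flood fill one region
                    y, x = stack.pop()
                    xs.append(x)
                    ys.append(y)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

The rectangle coordinates returned here play the role of the "position information of the minimum circumscribed rectangle" used for classification in step S406.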
其中,所述根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,可以包括:Wherein, classifying each of the connected areas according to the position information of each of the smallest circumscribed rectangles may include:
步骤a、获取各所述最小外接矩形的对角坐标;Step a: Obtain the diagonal coordinates of each minimum circumscribed rectangle;
步骤b、根据各所述对角坐标,对各所述联通区域进行分类。Step b: Classify each of the connected areas according to each of the diagonal coordinates.
对于上述步骤a和步骤b，可以理解的是，本申请实施例中的获取各最小外接矩形的位置信息，可以是获取各最小外接矩形中对角点的坐标信息，如获取各最小外接矩形的左上点坐标和右下点坐标，以根据所述左上点坐标和所述右下点坐标对所有联通区域进行分类，如可将所有左上点纵坐标相同和右下点纵坐标相同的联通区域划分为一类，例如，当最小外接矩形A的左上点纵坐标与最小外接矩形B的左上点纵坐标相同，且最小外接矩形A的右下点纵坐标与最小外接矩形B的右下点纵坐标相同，同时最小外接矩形C的左上点纵坐标与最小外接矩形B的左上点纵坐标相同，且最小外接矩形C的右下点纵坐标与最小外接矩形B的右下点纵坐标相同时，则可将最小外接矩形A对应的联通区域A、最小外接矩形B对应的联通区域B以及最小外接矩形C对应的联通区域C划分为同一类。With respect to steps a and b above, it can be understood that obtaining the position information of each minimum circumscribed rectangle in the embodiments of the present application may mean obtaining the coordinate information of the diagonal corner points of each minimum circumscribed rectangle, for example the upper-left and lower-right corner coordinates, so that all connected regions can be classified according to these coordinates; for instance, all connected regions whose upper-left ordinates are the same and whose lower-right ordinates are the same may be classified into one category. For example, when the upper-left ordinate of minimum circumscribed rectangle A is the same as that of minimum circumscribed rectangle B, the lower-right ordinate of rectangle A is the same as that of rectangle B, the upper-left ordinate of minimum circumscribed rectangle C is the same as that of rectangle B, and the lower-right ordinate of rectangle C is the same as that of rectangle B, then connected region A corresponding to rectangle A, connected region B corresponding to rectangle B, and connected region C corresponding to rectangle C can be classified into the same category.
需要说明的是，本申请实施例中，根据左上点纵坐标相同和右下点纵坐标相同来进行联通区域的分类仅作示意性解释，不应理解为对本申请实施例的限制，本申请实施例中，当然也可以设置左上点纵坐标之间的差值需满足的第四预设条件和右下点纵坐标之间的差值需满足的第五预设条件，以根据第四预设条件和第五预设条件来进行联通区域的分类。其中，第四预设条件和第五预设条件可以是纵坐标之间的差值在预设值之内。当然，本申请实施例中，还可以根据左下点纵坐标和右上点纵坐标来进行联通区域的分类。It should be noted that, in the embodiments of the present application, classifying the connected regions according to identical upper-left ordinates and identical lower-right ordinates is only an illustrative example and should not be construed as limiting the embodiments of the present application. In the embodiments of the present application, it is of course also possible to set a fourth preset condition that the difference between upper-left ordinates must satisfy and a fifth preset condition that the difference between lower-right ordinates must satisfy, and to classify the connected regions according to the fourth and fifth preset conditions. The fourth and fifth preset conditions may be that the difference between the ordinates is within a preset value. Of course, in the embodiments of the present application, the connected regions may also be classified according to the lower-left and upper-right ordinates.
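A minimal predicate for the fourth and fifth preset conditions described above might look as follows; the tolerance value and function name are assumptions for illustration:

```python
def same_line(rect_a, rect_b, tol=2):
    """Rectangles are (x0, y0, x1, y1) = (upper-left, lower-right) corners.
    Two connected regions fall into the same class when both the upper-left
    and the lower-right ordinate differences are within tol (standing in
    for the fourth and fifth preset conditions; tol is an assumed value)."""
    return (abs(rect_a[1] - rect_b[1]) <= tol
            and abs(rect_a[3] - rect_b[3]) <= tol)
```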
进一步地，本申请实施例中，在根据最小外接矩形的位置信息对所有联通区域进行分类，得到多个类簇后，还可对各类簇内的联通区域执行筛选、过滤等操作，如可在各类簇中筛选出与该类簇中其他联通区域的距离大于预设距离阈值的联通区域，并从该类簇中过滤掉所筛选出的联通区域，即从对应的类簇中去除该联通区域；又或者在各类簇中筛选出区域面积大于预设面积阈值的联通区域，并从对应的类簇中过滤掉所筛选出的联通区域；再或者在各类簇中筛选出位于某一联通区域内的联通区域等等，以防止获取到不是文字的区域，或者防止文字区域的重复获取，从而提高分类准确性，提高文字区域获取效率和准确性。Further, in the embodiments of the present application, after all connected regions are classified according to the position information of the minimum circumscribed rectangles to obtain multiple clusters, screening and filtering operations may also be performed on the connected regions within each cluster. For example, connected regions whose distance from the other connected regions in a cluster exceeds a preset distance threshold may be screened out and filtered from that cluster, that is, removed from the corresponding cluster; or connected regions whose area exceeds a preset area threshold may be screened out of each cluster and filtered from the corresponding cluster; or connected regions located inside another connected region may be screened out of each cluster, and so on, so as to prevent acquiring regions that are not text or acquiring a text region repeatedly, thereby improving classification accuracy and the efficiency and accuracy of text region acquisition.
优选地,在采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除之前,还可以包括:Preferably, before the mean shift algorithm and the bilateral filtering algorithm are used to remove the background of the preset image, the method may further include:
步骤c、采集所述预设图像中各像素点的RGB值;Step c: Collect RGB values of each pixel in the preset image;
步骤d、提取RGB值满足第二预设条件的像素点,并在所述预设图像中删除所提取的像素点。Step d: Extract pixel points whose RGB values meet the second preset condition, and delete the extracted pixel points in the preset image.
对于上述步骤c和步骤d，可以理解的是，在进行有明显颜色区分的预设图像中的文字区域获取时，如在获取发票中的文字区域时，本申请实施例在获取发票的预设图像之后，可先采用颜色分离技术提取出该发票的预设图像中的干扰区域，如提取出该发票中的边框和印章等干扰区域，并在该发票的预设图像中删除该干扰区域的像素点，然后再采用均值漂移算法和双边滤波算法对删除干扰区域的像素点后的预设图像进行背景去除以及后续的步骤，以此进行文字区域的获取。在此，干扰区域可根据像素点的RGB值进行确定，而所述第二预设条件则可根据需要去除的干扰区域的具体颜色进行设置。With respect to steps c and d above, it can be understood that when acquiring text regions from a preset image with clear color distinctions, for example when acquiring text regions in an invoice, the embodiments of the present application may, after acquiring the preset image of the invoice, first use color separation to extract the interference regions in the preset image, such as the frame and seal of the invoice, and delete the pixels of those interference regions from the preset image; the mean shift algorithm and the bilateral filtering algorithm are then applied to the preset image with the interference pixels removed for background removal and the subsequent steps, so as to acquire the text regions. Here, the interference regions may be determined according to the RGB values of the pixels, and the second preset condition may be set according to the specific color of the interference regions to be removed.
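Steps c and d can be sketched as a color-separation mask in Python/NumPy. The "red" channel thresholds below are illustrative assumptions for removing a red seal or frame, not values given in the application, and RGB channel order is assumed:

```python
import numpy as np

def remove_red_interference(img):
    """Delete (set to white) pixels whose RGB values look 'red', such as an
    invoice seal or frame. Thresholds are assumed, not from the application."""
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    # The "second preset condition": a strong red channel with weak green/blue.
    red_mask = (r > 150) & (g < 100) & (b < 100)
    out = img.copy()
    out[red_mask] = 255          # deleting a pixel here means painting it white
    return out
```

Background removal and the subsequent steps would then run on the returned image rather than on the original.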
本申请实施例中，在获取到包含文字的预设图像时，可首先联合采用均值漂移算法和双边滤波算法对预设图像进行背景去除，以提高背景去除效果，降低文字区域获取过程中的背景干扰；然后，可对去除背景后的预设图像进行灰度处理，得到灰度图像，并可对灰度图像进行锐化操作得到增强图像，以使得增强图像中的文字区域更加突出和明显，从而方便最稳定极值区域MSER算法进行增强图像中各文字区域的提取，提高了文字区域提取的准确性，而在提取出各文字区域后，可进一步获取各文字区域的位置信息，并可根据各文字区域的位置信息进行文字区域的分类，且对同一类的文字区域进行合并，得到最终文字区域，以减少文字区域的数量，提高文字区域的获取速度和获取效率。In the embodiments of the present application, when a preset image containing text is acquired, the mean shift algorithm and the bilateral filtering algorithm may first be jointly applied to remove the background of the preset image, improving the background removal effect and reducing background interference during text region acquisition. The background-removed preset image may then be converted to grayscale to obtain a grayscale image, which may be sharpened to obtain an enhanced image, making the text regions in the enhanced image more prominent and distinct; this facilitates extraction of the text regions of the enhanced image by the maximally stable extremal regions (MSER) algorithm and improves the accuracy of text region extraction. After the text regions are extracted, their position information may further be obtained, the text regions may be classified according to this position information, and text regions of the same category may be merged to obtain the final text region, thereby reducing the number of text regions and improving the speed and efficiency of text region acquisition.
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the order of execution, and the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
上面主要描述了一种文字区域获取方法,下面将对一种文字区域获取装置进行详细描述。The above mainly describes a text area acquisition method, and a text area acquisition device will be described in detail below.
图6示出了本申请实施例中一种文字区域获取装置的一个实施例结构图。如图6所示, 所述文字区域获取装置,包括:FIG. 6 shows a structural diagram of an embodiment of a device for acquiring a text area in an embodiment of the present application. As shown in FIG. 6, the text area acquisition device includes:
背景去除模块601,用于获取包含文字的预设图像,并采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除;The background removal module 601 is used to obtain a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
灰度处理模块602,用于对去除背景后的预设图像进行灰度处理,得到所述预设图像的灰度图像;The grayscale processing module 602 is configured to perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image;
锐化处理模块603,用于对所述灰度图像进行锐化操作,得到所述灰度图像的增强图像;A sharpening processing module 603, configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
位置获取模块604,用于使用最稳定极值区域MSER算法提取所述增强图像的各文字区域,并获取各所述文字区域的位置信息;The position obtaining module 604 is used to extract each text area of the enhanced image using the most stable extreme value area MSER algorithm and obtain position information of each text area;
区域获取模块605,用于根据各所述文字区域的位置信息进行文字区域的分类,并对同一类的文字区域进行合并,得到最终文字区域。The area acquisition module 605 is configured to classify text areas based on the position information of each text area, and merge text areas of the same type to obtain a final text area.
进一步地,所述锐化处理模块603,具体用于采用3*3的卷积核对所述灰度图像进行卷积处理,以对所述灰度图像进行锐化操作;Further, the sharpening processing module 603 is specifically configured to perform a convolution process on the grayscale image using a 3*3 convolution kernel to perform sharpening operation on the grayscale image;
其中,所述3*3的卷积核为:
Figure PCTCN2019091526-appb-000002
Wherein, the 3*3 convolution kernel is:
Figure PCTCN2019091526-appb-000002
优选地,所述区域获取模块605,可以包括:Preferably, the area acquisition module 605 may include:
中心点确定单元,用于根据各所述文字区域的位置信息,确定各所述文字区域的中心点,并获取各所述中心点的中心点坐标;A center point determining unit, configured to determine the center point of each text area based on the position information of each text area, and obtain the center point coordinates of each center point;
中心点分类单元,用于将各所述中心点坐标之间满足第一预设条件的中心点确定为同一类,得到所述中心点的分类结果;A center point classification unit, configured to determine center points that satisfy the first preset condition between the center point coordinates as the same type, and obtain a classification result of the center points;
文字区域分类单元,用于根据所述中心点的分类结果对各所述文字区域进行分类。The text area classification unit is used to classify each text area according to the classification result of the center point.
可选地,所述区域获取模块605,可以包括:Optionally, the area acquisition module 605 may include:
空白画布构建单元,用于构建与所述增强图像的大小相同的空白画布;A blank canvas construction unit, used to build a blank canvas of the same size as the enhanced image;
文字区域导入单元,用于将所提取的各文字区域按照在所述增强图像中的排布位置,导入所述空白画布中;A text area importing unit for importing each extracted text area into the blank canvas according to the arrangement position in the enhanced image;
膨胀处理单元,用于对位于所述空白画布中的各文字区域进行膨胀处理,得到膨胀后的各第一文字区域;An expansion processing unit, configured to perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
边缘检测单元,用于对各所述第一文字区域进行边缘检测,确定相联通的第一文字区域,并将相联通的第一文字区域合并成联通区域;An edge detection unit, configured to perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
位置信息获取单元,用于获取各所述联通区域的最小外接矩形的位置信息;A location information acquiring unit, configured to acquire the location information of the smallest circumscribed rectangle of each of the connected areas;
联通区域合并单元,用于根据各所述最小外接矩形的位置信息对各所述联通区域进行 分类,并对同一类的联通区域进行合并,得到最终文字区域。The connected area merging unit is used to classify the connected areas according to the position information of the smallest circumscribed rectangles, and combine the connected areas of the same type to obtain the final text area.
进一步地，所述联通区域合并单元，可以包括：Further, the connected-region merging unit may include:
对角坐标获取子单元,用于获取各所述最小外接矩形的对角坐标;A diagonal coordinate obtaining subunit, used to obtain the diagonal coordinates of each of the smallest circumscribed rectangles;
联通区域合并子单元,用于根据各所述对角坐标,对各所述联通区域进行分类。The connectivity area merging subunit is used to classify the connectivity areas according to the diagonal coordinates.
优选地,所述文字区域获取装置,还可以包括:Preferably, the text area acquisition device may further include:
RGB值采集模块,用于采集所述预设图像中各像素点的RGB值;RGB value collection module, used to collect the RGB value of each pixel in the preset image;
像素点删除模块,用于提取RGB值满足第二预设条件的像素点,并在所述预设图像中删除所提取的像素点。The pixel deletion module is used to extract pixels whose RGB values meet the second preset condition, and delete the extracted pixels in the preset image.
图7是本申请一实施例提供的终端设备的示意图。如图7所示,该实施例的终端设备7包括:处理器70、存储器71以及存储在所述存储器71中并可在所述处理器70上运行的计算机可读指令72,例如文字区域获取程序。所述处理器70执行所述计算机可读指令72时实现上述各个文字区域获取方法实施例中的步骤,例如图1所示的步骤S101至步骤S105。或者,所述处理器70执行所述计算机可读指令72时实现上述各装置实施例中各模块/单元的功能,例如图6所示的模块601至模块605的功能。7 is a schematic diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70, such as text area acquisition program. When the processor 70 executes the computer-readable instruction 72, the steps in the above embodiments of the method for acquiring a text area are implemented, for example, steps S101 to S105 shown in FIG. 1. Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of each module/unit in the foregoing device embodiments are realized, for example, the functions of the modules 601 to 605 shown in FIG. 6.
示例性的,所述计算机可读指令72可以被分割成一个或多个模块/单元,所述一个或者多个模块/单元被存储在所述存储器71中,并由所述处理器70执行,以完成本申请。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段,该指令段用于描述所述计算机可读指令72在所述终端设备7中的执行过程。Exemplarily, the computer-readable instructions 72 may be divided into one or more modules/units, the one or more modules/units are stored in the memory 71 and executed by the processor 70, To complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions. The instruction segments are used to describe the execution process of the computer-readable instructions 72 in the terminal device 7.
所述终端设备7可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述终端设备可包括,但不仅限于,处理器70、存储器71。本领域技术人员可以理解,图7仅仅是终端设备7的示例,并不构成对终端设备7的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述终端设备还可以包括输入输出设备、网络接入设备、总线等。The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer and a cloud server. The terminal device may include, but is not limited to, a processor 70 and a memory 71. Those skilled in the art may understand that FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7, and may include more or less components than those illustrated, or a combination of certain components, or different components. For example, the terminal device may further include an input and output device, a network access device, a bus, and the like.
所述处理器70可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 70 may be a central processing unit (Central Processing Unit, CPU), or may be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
所述存储器71可以是所述终端设备7的内部存储单元，例如终端设备7的硬盘或内存。所述存储器71也可以是所述终端设备7的外部存储设备，例如所述终端设备7上配备的插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(Secure Digital,SD)卡，闪存卡(Flash Card)等。进一步地，所述存储器71还可以既包括所述终端设备7的内部存储单元也包括外部存储设备。所述存储器71用于存储所述计算机可读指令以及所述终端设备所需的其他程序和数据。所述存储器71还可以用于暂时地存储已经输出或者将要输出的数据。The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer-readable instructions and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or will be output.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or software functional unit.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially or part of the contribution to the existing technology or all or part of the technical solution can be embodied in the form of a software product, the computer software product is stored in a storage medium , Including several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program code .
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they can still The technical solutions described in the embodiments are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. 一种文字区域获取方法,其特征在于,包括:A method for acquiring a text area, characterized in that it includes:
    获取包含文字的预设图像,并采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除;Obtain a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to remove the background from the preset image;
    对去除背景后的预设图像进行灰度处理,得到所述预设图像的灰度图像;Grayscale processing the preset image after removing the background to obtain a grayscale image of the preset image;
    对所述灰度图像进行锐化操作,得到所述灰度图像的增强图像;Performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
    使用最稳定极值区域MSER算法提取所述增强图像的各文字区域,并获取各所述文字区域的位置信息;Using the most stable extreme value area MSER algorithm to extract each text area of the enhanced image and obtain position information of each text area;
    根据各所述文字区域的位置信息进行文字区域的分类,并对同一类的文字区域进行合并,得到最终文字区域。The text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  2. 根据权利要求1所述的文字区域获取方法,其特征在于,所述对所述灰度图像进行锐化操作,包括:The method for acquiring a text area according to claim 1, wherein the sharpening the grayscale image includes:
    采用3*3的卷积核对所述灰度图像进行卷积处理,以对所述灰度图像进行锐化操作;Adopting a 3*3 convolution kernel to perform convolution processing on the grayscale image to sharpen the grayscale image;
    其中,所述3*3的卷积核为:
    Figure PCTCN2019091526-appb-100001
    Wherein, the 3*3 convolution kernel is:
    Figure PCTCN2019091526-appb-100001
  3. 根据权利要求1所述的文字区域获取方法,其特征在于,所述根据各所述文字区域的位置信息进行文字区域的分类,包括:The method for acquiring a text area according to claim 1, wherein the classifying the text area according to the position information of each text area includes:
    根据各所述文字区域的位置信息,确定各所述文字区域的中心点,并获取各所述中心点的中心点坐标;Determine the center point of each text area according to the position information of each text area, and obtain the center point coordinates of each center point;
    将各所述中心点坐标之间满足第一预设条件的中心点确定为同一类,得到所述中心点的分类结果;Determine the center points that satisfy the first preset condition between the coordinates of the center points as the same type, and obtain a classification result of the center points;
    根据所述中心点的分类结果对各所述文字区域进行分类。Classify each of the character regions according to the classification result of the center point.
  4. 根据权利要求1所述的文字区域获取方法,其特征在于,所述根据各所述文字区域的位置信息进行文字区域的分类,并对同一类的文字区域进行合并,得到最终文字区域,包括:The method for acquiring a text area according to claim 1, wherein the classifying the text area according to the position information of each text area and merging the text areas of the same type to obtain the final text area includes:
    构建与所述增强图像的大小相同的空白画布;Construct a blank canvas of the same size as the enhanced image;
    将所提取的各文字区域按照在所述增强图像中的排布位置,导入所述空白画布中;Import the extracted text areas into the blank canvas according to the arrangement position in the enhanced image;
    对位于所述空白画布中的各文字区域进行膨胀处理,得到膨胀后的各第一文字区域;Performing expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
    对各所述第一文字区域进行边缘检测,确定相联通的第一文字区域,并将相联通的第一文字区域合并成联通区域;Performing edge detection on each of the first text areas, determining the connected first text areas, and merging the connected first text areas into a connected area;
    获取各所述联通区域的最小外接矩形的位置信息;Acquiring the position information of the smallest circumscribed rectangle of each of the connected areas;
    根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,并对同一类的联通区域进行合并,得到最终文字区域。According to the position information of each minimum circumscribed rectangle, each of the connected areas is classified, and the connected areas of the same type are combined to obtain a final text area.
  5. 根据权利要求4所述的文字区域获取方法,其特征在于,所述根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,包括:The method for acquiring a text area according to claim 4, wherein the classifying each of the connected areas according to the position information of each of the smallest circumscribed rectangles includes:
    获取各所述最小外接矩形的对角坐标;Obtain the diagonal coordinates of each of the smallest circumscribed rectangles;
    根据各所述对角坐标,对各所述联通区域进行分类。According to each of the diagonal coordinates, each of the connected areas is classified.
  6. 根据权利要求1至5中任一项所述的文字区域获取方法,其特征在于,在采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除之前,还包括:The method for acquiring a text area according to any one of claims 1 to 5, wherein before the background removal is performed on the preset image by using a mean shift algorithm and a bilateral filtering algorithm, the method further includes:
    采集所述预设图像中各像素点的RGB值;Collect the RGB values of each pixel in the preset image;
    提取RGB值满足第二预设条件的像素点,并在所述预设图像中删除所提取的像素点。Extract pixel points whose RGB values meet the second preset condition, and delete the extracted pixel points in the preset image.
  7. A text region obtaining apparatus, comprising:
    a background removal module, configured to obtain a preset image containing text, and perform background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
    a grayscale processing module, configured to perform grayscale processing on the preset image after background removal, to obtain a grayscale image of the preset image;
    a sharpening processing module, configured to perform a sharpening operation on the grayscale image, to obtain an enhanced image of the grayscale image;
    a position obtaining module, configured to extract each text region of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtain position information of each text region;
    a region obtaining module, configured to classify the text regions according to the position information of each text region, and merge text regions of the same class to obtain a final text region.
  8. The text region obtaining apparatus according to claim 7, wherein the sharpening processing module is configured to perform convolution processing on the grayscale image using a 3×3 convolution kernel, so as to sharpen the grayscale image;
    wherein the 3×3 convolution kernel is:
    Figure PCTCN2019091526-appb-100002
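The kernel itself appears only as the figure referenced above. As an assumption, a common Laplacian-style sharpening kernel (centre 5, cross of −1) is used below to show the claimed 3×3 convolution written out explicitly:

```python
import numpy as np

# The patent gives its 3x3 kernel only as a figure; this Laplacian-style
# sharpening kernel is a common stand-in, not the claimed values.
SHARPEN = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float64)

def convolve3x3(gray, kernel=SHARPEN):
    """Apply a 3x3 kernel in 'valid' mode (no padding), clipping the
    result back to the uint8 range."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because the kernel weights sum to 1, flat regions pass through unchanged while intensity edges are amplified, which is what makes the subsequent MSER extraction more stable.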
  9. The text region obtaining apparatus according to claim 7, wherein the region obtaining module comprises:
    a center point determining unit, configured to determine a center point of each text region according to the position information of each text region, and obtain center point coordinates of each center point;
    a center point classification unit, configured to determine center points whose coordinates satisfy a first preset condition as the same class, to obtain a classification result of the center points;
    a text region classification unit, configured to classify the text regions according to the classification result of the center points.
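The "first preset condition" on centre-point coordinates is not spelled out in the claim; one plausible reading is a Euclidean distance threshold between centres. A sketch under that assumption, grouping regions with a small union-find (the threshold `max_dist` is a hypothetical parameter):

```python
from math import hypot

def classify_by_centers(boxes, max_dist=40.0):
    """Group text regions whose centre points lie close together.
    Boxes are (x, y, w, h) tuples."""
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    parent = list(range(len(boxes)))

    def find(i):
        # Path-halving find for the union-find forest.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if hypot(centers[i][0] - centers[j][0],
                     centers[i][1] - centers[j][1]) <= max_dist:
                parent[find(i)] = find(j)

    labels = [find(i) for i in range(len(boxes))]
    # Renumber class labels 0..k-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(l, len(remap)) for l in labels]
```

Regions sharing a label would then be merged into one final text region by the downstream merging step.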
  10. The text region obtaining apparatus according to claim 7, wherein the region obtaining module comprises:
    a blank canvas construction unit, configured to construct a blank canvas of the same size as the enhanced image;
    a text region importing unit, configured to import the extracted text regions into the blank canvas according to their arrangement positions in the enhanced image;
    a dilation processing unit, configured to perform dilation processing on each text region in the blank canvas, to obtain dilated first text regions;
    an edge detection unit, configured to perform edge detection on each first text region, determine first text regions that are connected, and merge the connected first text regions into a connected region;
    a position information obtaining unit, configured to obtain position information of a minimum circumscribed rectangle of each connected region;
    a connected region merging unit, configured to classify the connected regions according to the position information of each minimum circumscribed rectangle, and merge connected regions of the same class to obtain a final text region.
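The canvas/dilation/merge flow of claim 10 can be sketched without any imaging library. Everything below is an assumed concrete realisation: a boolean canvas, a 3×3 binary dilation (the iteration count is a hypothetical parameter), a 4-connected component search, and the minimum bounding rectangle of each resulting blob (note the rectangles bound the *dilated* blobs, so they are slightly larger than the raw regions):

```python
from collections import deque
import numpy as np

def merge_regions(boxes, canvas_shape, dilate_iters=3):
    """Paint (x, y, w, h) regions onto a blank canvas, dilate them so
    nearby regions touch, and return one bounding rectangle per blob
    as diagonal coordinates (x0, y0, x1, y1)."""
    canvas = np.zeros(canvas_shape, dtype=bool)
    for x, y, w, h in boxes:                     # import regions at their positions
        canvas[y:y + h, x:x + w] = True

    for _ in range(dilate_iters):                # 3x3 binary dilation via shifts
        p = np.pad(canvas, 1)
        canvas = (p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
                  | p[1:-1, 1:-1] | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])

    seen = np.zeros_like(canvas)
    rects = []
    h_, w_ = canvas.shape
    for sy in range(h_):                         # BFS over connected components
        for sx in range(w_):
            if canvas[sy, sx] and not seen[sy, sx]:
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                x0 = x1 = sx
                y0 = y1 = sy
                while q:
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h_ and 0 <= nx < w_ \
                                and canvas[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                rects.append((x0, y0, x1, y1))   # diagonal corners of the blob
    return rects
```

Dilation is what lets adjacent characters of one line fuse into a single connected region before the bounding rectangles are taken.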
  11. The text region obtaining apparatus according to claim 10, wherein the connected region merging unit comprises:
    a diagonal coordinate obtaining subunit, configured to obtain diagonal coordinates of each minimum circumscribed rectangle;
    a connected region merging subunit, configured to classify the connected regions according to the diagonal coordinates.
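Classifying connected regions by the diagonal coordinates of their minimum circumscribed rectangles is, under one plausible reading, an overlap test between rectangles. A sketch under that assumption, where each class is collapsed to one enclosing rectangle:

```python
def rects_overlap(a, b):
    """Rectangles as diagonal coordinates (x0, y0, x1, y1); assumed
    'same class' test: axis-aligned overlap (touching counts)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def merge_overlapping(rects):
    """Greedily merge rectangles whose diagonal coordinates overlap,
    returning one enclosing rectangle per class."""
    merged = []
    for r in rects:
        r = list(r)
        changed = True
        while changed:             # keep absorbing until r is disjoint
            changed = False
            for m in merged:
                if rects_overlap(r, m):
                    merged.remove(m)
                    r = [min(r[0], m[0]), min(r[1], m[1]),
                         max(r[2], m[2]), max(r[3], m[3])]
                    changed = True
                    break
        merged.append(r)
    return [tuple(m) for m in merged]
```

The survivors of this merge would be the final text regions produced by the apparatus.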
  12. The text region obtaining apparatus according to any one of claims 7 to 11, further comprising:
    an RGB value collection module, configured to collect an RGB value of each pixel in the preset image;
    a pixel deletion module, configured to extract pixels whose RGB values satisfy a second preset condition, and delete the extracted pixels from the preset image.
  13. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    obtaining a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
    performing grayscale processing on the preset image after background removal, to obtain a grayscale image of the preset image;
    performing a sharpening operation on the grayscale image, to obtain an enhanced image of the grayscale image;
    extracting each text region of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtaining position information of each text region;
    classifying the text regions according to the position information of each text region, and merging text regions of the same class to obtain a final text region.
  14. The computer-readable storage medium according to claim 13, wherein the sharpening operation on the grayscale image comprises:
    performing convolution processing on the grayscale image using a 3×3 convolution kernel, so as to sharpen the grayscale image;
    wherein the 3×3 convolution kernel is:
    Figure PCTCN2019091526-appb-100003
  15. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
    performing grayscale processing on the preset image after background removal, to obtain a grayscale image of the preset image;
    performing a sharpening operation on the grayscale image, to obtain an enhanced image of the grayscale image;
    extracting each text region of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtaining position information of each text region;
    classifying the text regions according to the position information of each text region, and merging text regions of the same class to obtain a final text region.
  16. The terminal device according to claim 15, wherein the sharpening operation on the grayscale image comprises:
    performing convolution processing on the grayscale image using a 3×3 convolution kernel, so as to sharpen the grayscale image;
    wherein the 3×3 convolution kernel is:
    Figure PCTCN2019091526-appb-100004
  17. The terminal device according to claim 15, wherein the classifying the text regions according to the position information of each text region comprises:
    determining a center point of each text region according to the position information of each text region, and obtaining center point coordinates of each center point;
    determining center points whose coordinates satisfy a first preset condition as the same class, to obtain a classification result of the center points;
    classifying the text regions according to the classification result of the center points.
  18. The terminal device according to claim 15, wherein the classifying the text regions according to the position information of each text region and merging text regions of the same class to obtain the final text region comprises:
    constructing a blank canvas of the same size as the enhanced image;
    importing the extracted text regions into the blank canvas according to their arrangement positions in the enhanced image;
    performing dilation processing on each text region in the blank canvas, to obtain dilated first text regions;
    performing edge detection on each first text region, determining first text regions that are connected, and merging the connected first text regions into a connected region;
    obtaining position information of a minimum circumscribed rectangle of each connected region;
    classifying the connected regions according to the position information of each minimum circumscribed rectangle, and merging connected regions of the same class to obtain the final text region.
  19. The terminal device according to claim 18, wherein the classifying the connected regions according to the position information of each minimum circumscribed rectangle comprises:
    obtaining diagonal coordinates of each minimum circumscribed rectangle;
    classifying the connected regions according to the diagonal coordinates.
  20. The terminal device according to any one of claims 15 to 19, wherein before performing background removal on the preset image using the mean shift algorithm and the bilateral filtering algorithm, the steps further comprise:
    collecting an RGB value of each pixel in the preset image;
    extracting pixels whose RGB values satisfy a second preset condition, and deleting the extracted pixels from the preset image.
PCT/CN2019/091526 2018-11-30 2019-06-17 Text region obtaining method and apparatus, storage medium and terminal device WO2020107866A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811451778.9 2018-11-30
CN201811451778.9A CN109670500B (en) 2018-11-30 2018-11-30 Text region acquisition method and device, storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
WO2020107866A1 true WO2020107866A1 (en) 2020-06-04

Family

ID=66143422

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091526 WO2020107866A1 (en) 2018-11-30 2019-06-17 Text region obtaining method and apparatus, storage medium and terminal device

Country Status (2)

Country Link
CN (1) CN109670500B (en)
WO (1) WO2020107866A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814785A (en) * 2020-06-11 2020-10-23 浙江大华技术股份有限公司 Invoice recognition method, training method of related model, related equipment and device
CN112132807A (en) * 2020-09-23 2020-12-25 泉州装备制造研究所 Weld joint region extraction method and device based on color similarity segmentation
CN112330553A (en) * 2020-10-30 2021-02-05 武汉理工大学 Crack image denoising method, device and storage medium
CN112651399A (en) * 2020-12-30 2021-04-13 中国平安人寿保险股份有限公司 Method for detecting same-line characters in oblique image and related equipment thereof
CN113033540A (en) * 2021-04-14 2021-06-25 易视腾科技股份有限公司 Contour fitting and correcting method for scene characters, electronic device and storage medium
CN114898409A (en) * 2022-07-14 2022-08-12 深圳市海清视讯科技有限公司 Data processing method and device
CN115588202A (en) * 2022-10-28 2023-01-10 南京云阶电力科技有限公司 Contour detection-based method and system for extracting characters in electrical design drawing

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN109670500B (en) * 2018-11-30 2024-06-28 平安科技(深圳)有限公司 Text region acquisition method and device, storage medium and terminal equipment
CN110956739A (en) * 2019-05-09 2020-04-03 杭州睿琪软件有限公司 Bill identification method and device
CN110472623B (en) * 2019-06-29 2022-08-09 华为技术有限公司 Image detection method, device and system
CN110717489B (en) * 2019-09-19 2023-09-15 平安科技(深圳)有限公司 Method, device and storage medium for identifying text region of OSD (on Screen display)
CN110852229A (en) * 2019-11-04 2020-02-28 泰康保险集团股份有限公司 Method, device and equipment for determining position of text area in image and storage medium
CN112862694A (en) * 2019-11-12 2021-05-28 合肥欣奕华智能机器有限公司 Screen position correction method and device, computing equipment and storage medium
CN110929738A (en) * 2019-11-19 2020-03-27 上海眼控科技股份有限公司 Certificate card edge detection method, device, equipment and readable storage medium
CN110992353B (en) * 2019-12-13 2021-04-06 哈尔滨工业大学 Chip coating film quality detection method based on intelligent sensing
CN112287933B (en) * 2019-12-20 2022-09-06 中北大学 Method and system for removing character interference of X-ray image of automobile hub
CN112418204A (en) * 2020-11-18 2021-02-26 杭州未名信科科技有限公司 Text recognition method, system and computer medium based on paper document
CN113096099B (en) * 2021-04-14 2023-08-25 重庆交通大学 Color channel combination-based permeable asphalt mixture communication gap identification method
CN113920295A (en) * 2021-10-30 2022-01-11 平安科技(深圳)有限公司 Character detection and recognition method and device, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
EP1327955A2 (en) * 2002-01-11 2003-07-16 Hewlett-Packard Company Text extraction from a compound document
CN107977658A (en) * 2017-12-27 2018-05-01 深圳Tcl新技术有限公司 Recognition methods, television set and the readable storage medium storing program for executing in pictograph region
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN101593277A (en) * 2008-05-30 2009-12-02 电子科技大学 A kind of complicated color image Chinese version zone automatic positioning method and device
CN101901344B (en) * 2010-08-13 2012-04-25 上海交通大学 Method for detecting character image local feature based on corrosion method and DoG operator
CN102136064A (en) * 2011-03-24 2011-07-27 成都四方信息技术有限公司 System for recognizing characters from image
CN104182722B (en) * 2013-05-24 2018-05-18 佳能株式会社 Method for text detection and device and text message extracting method and system
CN108038481A (en) * 2017-12-11 2018-05-15 江苏科技大学 A kind of combination maximum extreme value stability region and the text positioning method of stroke width change

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
EP1327955A2 (en) * 2002-01-11 2003-07-16 Hewlett-Packard Company Text extraction from a compound document
CN107977658A (en) * 2017-12-27 2018-05-01 深圳Tcl新技术有限公司 Recognition methods, television set and the readable storage medium storing program for executing in pictograph region
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device

Non-Patent Citations (2)

Title
CUI, XUAN ET AL: "A Method for Small Infrared Target Detection Based on the Technique of Image Restoration", INFRARED TECHNOLOGY, vol. 36, no. 7, 31 July 2014 (2014-07-31), XP009521453, ISSN: 1001-8891 *
YANG LIU: "Research of Text Localization Algorithm Based on Radon Tilt Correction and MSER in Complex Scenes", MICROCOMPUTER & ITS APPLICATIONS, vol. 35, no. 21, 31 December 2016 (2016-12-31), pages 42 - 44,48, XP009521452, DOI: 10.19358/j.issn.1674-7720.2016.21.013 *

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN111814785A (en) * 2020-06-11 2020-10-23 浙江大华技术股份有限公司 Invoice recognition method, training method of related model, related equipment and device
CN111814785B (en) * 2020-06-11 2024-03-29 浙江大华技术股份有限公司 Invoice recognition method, training method of relevant model, relevant equipment and device
CN112132807A (en) * 2020-09-23 2020-12-25 泉州装备制造研究所 Weld joint region extraction method and device based on color similarity segmentation
CN112132807B (en) * 2020-09-23 2024-02-23 泉州装备制造研究所 Weld joint region extraction method and device based on color similarity segmentation
CN112330553A (en) * 2020-10-30 2021-02-05 武汉理工大学 Crack image denoising method, device and storage medium
CN112330553B (en) * 2020-10-30 2022-07-01 武汉理工大学 Crack image denoising method, device and storage medium
CN112651399A (en) * 2020-12-30 2021-04-13 中国平安人寿保险股份有限公司 Method for detecting same-line characters in oblique image and related equipment thereof
CN112651399B (en) * 2020-12-30 2024-05-14 中国平安人寿保险股份有限公司 Method for detecting same-line characters in inclined image and related equipment thereof
CN113033540A (en) * 2021-04-14 2021-06-25 易视腾科技股份有限公司 Contour fitting and correcting method for scene characters, electronic device and storage medium
CN114898409A (en) * 2022-07-14 2022-08-12 深圳市海清视讯科技有限公司 Data processing method and device
CN115588202A (en) * 2022-10-28 2023-01-10 南京云阶电力科技有限公司 Contour detection-based method and system for extracting characters in electrical design drawing
CN115588202B (en) * 2022-10-28 2023-08-15 南京云阶电力科技有限公司 Contour detection-based method and system for extracting characters in electrical design drawing

Also Published As

Publication number Publication date
CN109670500A (en) 2019-04-23
CN109670500B (en) 2024-06-28

Similar Documents

Publication Publication Date Title
WO2020107866A1 (en) Text region obtaining method and apparatus, storage medium and terminal device
CN109086714B (en) Form recognition method, recognition system and computer device
US10896349B2 (en) Text detection method and apparatus, and storage medium
US11164027B2 (en) Deep learning based license plate identification method, device, equipment, and storage medium
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
US9047529B2 (en) Form recognition method and device
US20190340460A1 (en) Text line detecting method and text line detecting device
WO2018145470A1 (en) Image detection method and device
CN110378313B (en) Cell cluster identification method and device and electronic equipment
WO2014026483A1 (en) Character identification method and relevant device
CN104182750A (en) Extremum connected domain based Chinese character detection method in natural scene image
CN110647882A (en) Image correction method, device, equipment and storage medium
CN110276279B (en) Method for detecting arbitrary-shape scene text based on image segmentation
CN110378351B (en) Seal identification method and device
CN109447117B (en) Double-layer license plate recognition method and device, computer equipment and storage medium
CN110942435B (en) Document image processing method and device
CN110807457A (en) OSD character recognition method, device and storage device
CN109271882B (en) Method for extracting color-distinguished handwritten Chinese characters
CN113221778B (en) Method and device for detecting and identifying handwritten form
CN116434071B (en) Determination method, determination device, equipment and medium for normalized building mask
CN112329641B (en) Form identification method, device, equipment and readable storage medium
CN110610163B (en) Table extraction method and system based on ellipse fitting in natural scene
CN112101323A (en) Method, system, electronic device and storage medium for identifying title list
JP4967045B2 (en) Background discriminating apparatus, method and program
CN113033562A (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889222

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889222

Country of ref document: EP

Kind code of ref document: A1