WO2020107866A1 - Text region obtaining method and apparatus, storage medium and terminal device - Google Patents


Info

Publication number
WO2020107866A1
WO2020107866A1 (PCT/CN2019/091526)
Authority
WO
WIPO (PCT)
Prior art keywords
text
text area
image
area
position information
Prior art date
Application number
PCT/CN2019/091526
Other languages
French (fr)
Chinese (zh)
Inventor
黄泽浩 (Huang Zehao)
王满 (Wang Man)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020107866A1 publication Critical patent/WO2020107866A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/62 — Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/34 — Smoothing or thinning of the pattern; morphological operations; skeletonisation
    • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V30/153 — Segmentation of character regions using recognition of characters or words
    • G06V30/287 — Character recognition specially adapted to the type of the alphabet, e.g. of Kanji, Hiragana or Katakana characters

Definitions

  • the present application relates to the field of image processing technology, and in particular, to a method, device, storage medium, and terminal device for acquiring a text area.
  • OCR technology can automatically recognize the text information in an image, and the quality of that recognition depends on how accurately the text areas are acquired. In existing OCR technology, however, complex image backgrounds and similar factors often make the accuracy of text area acquisition low and the acquisition inefficient.
  • Embodiments of the present application provide a method and apparatus for acquiring a text area, a computer-readable storage medium, and a terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
  • a first aspect of an embodiment of the present application provides a method for acquiring a text area, including:
  • the text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  • a second aspect of the embodiments of the present application provides an apparatus for acquiring a text area, including:
  • the background removal module is used to obtain a preset image containing text, and uses a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
  • the grayscale processing module is used to perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image;
  • a sharpening processing module configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
  • a position acquisition module used to extract each text area of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtain the position information of each text area;
  • the area acquisition module is used to classify text areas based on the position information of each text area, and merge text areas of the same type to obtain a final text area.
  • a third aspect of the embodiments of the present application provides a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • the text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  • a fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor executes the The computer-readable instructions implement the following steps:
  • the text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  • in the embodiments of the present application, the mean shift algorithm and the bilateral filtering algorithm may first be used to perform background removal on the preset image, improving the background removal effect and reducing background interference during text area acquisition. The preset image with the background removed may then be grayscale-processed to obtain a grayscale image, and the grayscale image may be sharpened to obtain an enhanced image in which the text areas are more prominent and obvious. This facilitates the extraction of each text area from the enhanced image by the maximally stable extremal regions (MSER) algorithm and improves the accuracy of text area extraction. After the text areas are extracted, the position information of each text area may be obtained, the text areas may be classified according to that position information, and text areas of the same type may be merged to obtain the final text areas, thereby reducing the number of text areas and improving the speed and efficiency of text area acquisition.
  • FIG. 1 is a flowchart of an embodiment of a method for acquiring a text area in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a text area extracted by using a MSER algorithm in an application scenario in a text area acquisition method in an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of classifying text areas in an application scenario of the text area acquisition method in an embodiment of the present application;
  • FIG. 4 is a schematic flow chart of a method for acquiring a text area in an application scenario according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of a text area acquisition method in an embodiment of the present application after performing expansion processing in an application scenario
  • FIG. 6 is a structural diagram of an embodiment of an apparatus for acquiring a text area in an embodiment of the present application
  • FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present application.
  • Embodiments of the present application provide a method and apparatus for acquiring a text area, a computer-readable storage medium, and a terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
  • an embodiment of the present application provides a method for acquiring a text area.
  • the method for acquiring a text area includes:
  • Step S101 Acquire a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
  • the preset image may be acquired by photographing or by scanning. For example, when the ID card information of a user needs to be obtained, the preset image of the ID card may be obtained by photographing or scanning; similarly, when the invoice information on an invoice needs to be obtained, the preset image of the invoice may be obtained by photographing or scanning.
  • a mean shift algorithm and a bilateral filtering algorithm may be used to perform background removal on the preset image to remove the image background in the preset image and reduce the image The background interferes with the text area acquisition.
  • the order of adopting the mean shift algorithm and the bilateral filtering algorithm is not limited.
  • the mean shift algorithm may be used first to separate the foreground portion of the preset image from the image background and obtain the separated foreground portion, after which the bilateral filtering algorithm further removes the residual background from the separated foreground portion; alternatively, the bilateral filtering algorithm may remove the image background first, and the mean shift algorithm may then further separate the foreground portion of the preset image. Jointly using the two algorithms improves the removal of the image background, thereby reducing background interference during text area acquisition and improving the accuracy of text area acquisition.
  • Step S102 Perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image
  • any existing grayscale processing method may be used to perform grayscale processing on the preset image; the embodiment of the present application places no limitation on the grayscale processing method, as long as it yields the grayscale image of the preset image.
  • Step S103 Perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image
  • a sharpening operation is performed on the grayscale image to make the pixels of the text portion more prominent, so that the text areas in the resulting enhanced image are more prominent and obvious, improving the accuracy of text area acquisition.
  • the sharpening the grayscale image may include:
  • the 3*3 convolution kernel is:
  • convolution processing may be performed on the grayscale image of the preset image using the above 3*3 convolution kernel, quickly adjusting the contrast or sharpness of specific parts of the grayscale image and making the pixels of the text portion more prominent and obvious.
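The kernel values themselves are not reproduced in the text above, so the sketch below substitutes a common Laplacian-style sharpening kernel as an assumption; the 3x3 convolution is spelled out in plain NumPy.

```python
import numpy as np

# Assumed 3x3 sharpening kernel (the patent's exact values are not shown
# in the text): centre-weighted Laplacian enhancement, weights sum to 1.
KERNEL = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float64)

def sharpen(gray):
    """Convolve a uint8 grayscale image with KERNEL (zero padding)."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1)
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):          # accumulate the nine shifted copies
        for dx in range(3):
            out += KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(out, 0, 255).astype(np.uint8)

# Because the weights sum to 1, a flat region is unchanged while
# intensity edges (text strokes) are amplified.
flat = np.full((5, 5), 100, dtype=np.uint8)
print(sharpen(flat)[2, 2])  # → 100
```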
  • Step S104 Use the maximally stable extremal regions (MSER) algorithm to extract each text area of the enhanced image, and obtain the position information of each text area;
  • the maximally stable extremal regions (MSER) algorithm may be used to extract the text areas in the enhanced image. For example, in a specific application scenario, the text areas extracted by the MSER algorithm are shown in FIG. 2, where each irregular polygon extracted by the MSER algorithm represents one text area.
  • Step S105 Classify the text area according to the position information of each text area, and merge text areas of the same type to obtain a final text area.
  • extraction by the MSER algorithm often yields many text areas, since one text character can correspond to one text area. Therefore, in order to improve the acquisition speed and efficiency of the text areas, in the embodiments of the present application the text areas may be further clustered or classified according to the position information of each text area, and the text areas belonging to the same class may be merged according to the clustering or classification result to obtain the final text areas. For example, text characters located on the same line may be merged into the same text area, which reduces the number of acquired text areas and thus improves the speed and efficiency of text area acquisition.
  • the classification of the text area according to the position information of each of the text areas may include:
  • Step S301 Determine the center point of each text area according to the position information of each text area, and obtain the center point coordinates of each center point;
  • Step S302 Determine the center points that satisfy the first preset condition between the coordinates of the center points as the same type, and obtain a classification result of the center points;
  • Step S303 Classify each text area according to the classification result of the center point.
  • the center point of each text area may be determined according to the coordinate information of the points in that text area, and the center point coordinates, i.e. the horizontal and vertical coordinates of each center point, may be obtained; the center points may then be classified according to their horizontal and vertical coordinates, and each text area classified according to the classification result of its center point.
  • the first preset condition may be that the difference between the ordinates meets a preset threshold, and the preset threshold may be set to zero.
  • when the preset threshold is zero, center points with the same ordinate are divided into one category; that is, the center points located on the same line are determined to be of the same type. For example, in a specific application scenario, suppose center point A and center point B have the same ordinate, center points C, D and E have the same ordinate, and center points F, G, H and I have the same ordinate. This means that center points A and B belong to one line, center points C, D and E belong to a second line, and center points F, G, H and I belong to a third line. Center points A and B may then be divided into one class, such as class A; center points C, D and E into another class, such as class B; and center points F, G, H and I into a third class, such as class C.
  • accordingly, the text areas corresponding to the center points in class A may be divided into the first category, the text areas corresponding to the center points in class B into the second category, and the text areas corresponding to the center points in class C into the third category. That is, text area A corresponding to center point A and text area B corresponding to center point B are divided into the first category; text areas C, D and E corresponding to center points C, D and E are divided into the second category; and text areas F, G, H and I corresponding to center points F, G, H and I are divided into the third category.
  • the preset threshold is zero for illustrative explanation only, and should not be construed as a limitation to the embodiment of the present application.
  • the preset threshold may of course be other values, such as 0.5 or 1, etc., when the preset threshold is 0.5, it means that the center points with the difference between the ordinates less than or equal to 0.5 can be classified into the same category.
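Steps S301 to S303 above can be sketched as a one-pass grouping on the centre-point ordinates; the threshold semantics follow the description above, and the coordinates used in the example are illustrative.

```python
def classify_by_ordinate(centers, threshold=0.0):
    """Group (x, y) centre points whose ordinates differ by <= threshold."""
    groups = []
    for pt in sorted(centers, key=lambda p: p[1]):  # scan top to bottom
        if groups and abs(pt[1] - groups[-1][-1][1]) <= threshold:
            groups[-1].append(pt)   # same line as the previous point
        else:
            groups.append([pt])     # start a new line/class
    return groups

# A and B share one line; C, D and E share another (threshold zero).
centers = [(10, 5), (30, 5), (12, 20), (25, 20), (40, 20)]
groups = classify_by_ordinate(centers)
print([len(g) for g in groups])  # → [2, 3]
```

With `threshold=0.5` or `threshold=1`, as the text allows, points whose ordinates differ by at most that amount would also fall into the same class.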
  • alternatively, the first preset condition may be that the difference between the ordinates meets a first preset threshold and the difference between the abscissas meets a second preset threshold; the case where the difference between the abscissas meets the second preset threshold is similar to the case, described above, where the difference between the ordinates meets the preset threshold, and the basic principle is the same, so for brevity it is not repeated here.
  • the text areas may also be classified according to the position information of other points in the text areas. For example, the ordinates of the uppermost and lowermost points of each text area and the abscissa of its center point may be obtained, and text areas whose center points have the same abscissa, whose uppermost-point ordinates satisfy a third preset condition, and whose lowermost-point ordinates satisfy a fourth preset condition may be divided into one type, where the third preset condition and the fourth preset condition may each be that the difference between the ordinates is within a preset value.
  • the classification of the text areas according to the position information of each text area, and the merging of text areas of the same type to obtain the final text area, may include:
  • Step S401 Construct a blank canvas of the same size as the enhanced image
  • after the maximally stable extremal regions (MSER) algorithm is used to extract each text area of the enhanced image, a blank canvas of the same size as the enhanced image may first be constructed.
  • Step S402 Import each extracted text area into the blank canvas according to the arrangement position in the enhanced image
  • each text area extracted by the MSER algorithm may be imported into the blank canvas. When importing, each text area must be placed according to its arrangement position in the enhanced image, so that the image formed after the text areas are imported into the blank canvas is the same as the enhanced image.
  • Step S403 Perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
  • the text areas extracted by the MSER algorithm are often irregular polygons, while what is needed in text area acquisition is line text; that is, the polygons on the same line need to be fitted together. Fitting the irregular polygons directly is troublesome, so, as shown in FIG. 5, in the embodiment of the present application the polygons may be expanded before fitting; that is, each text area in the blank canvas may undergo expansion processing so that the text areas are connected together.
  • each text area may also be eroded, so that through the operation of expanding first and then eroding, the text areas are connected and their boundaries are smoothed.
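The expand-then-erode operation (morphological closing) can be sketched in plain NumPy over a binary canvas; the 3x3 structuring element is an assumption.

```python
import numpy as np

def dilate(mask):
    """3x3 binary dilation: a pixel becomes 1 if any 8-neighbour is 1."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask):
    """3x3 binary erosion: a pixel stays 1 only if all 8-neighbours are 1."""
    p = np.pad(mask, 1)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

# Two nearby "characters" separated by a one-pixel gap at column 3.
canvas = np.zeros((5, 7), dtype=np.uint8)
canvas[1:4, 1:3] = 1
canvas[1:4, 4:6] = 1
closed = erode(dilate(canvas))  # expand first, then erode: gap is bridged
```

After closing, the two blobs form a single connected region, which is exactly what the subsequent edge-detection step relies on.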
  • Step S404 Perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
  • Step S405 Acquire position information of the smallest circumscribed rectangle of each of the connected areas
  • Step S406 Classify each of the connected areas according to the position information of the smallest circumscribed rectangles, and merge the connected areas of the same type to obtain a final text area.
  • edge detection may be performed on each of the first text areas, for example through OpenCV's findContours() function, and the connected first text areas may be determined according to the detection result and merged into connected areas. At the same time, the smallest circumscribed rectangle of each connected area, i.e. the smallest rectangle containing the connected first text areas, may be detected and its position information obtained, so that the connected areas can be classified according to the position information of the smallest circumscribed rectangles and the connected areas of the same type merged to obtain the final text areas.
  • the detection result may include the distance between adjacent first text areas, and a distance threshold may be set to determine whether adjacent first text areas are connected. For example, in a specific application scenario, the distance threshold may be set to 1 cm; when detection determines that the distance between the first text area and the second text area is 0.6 cm and the distance between the second text area and the third text area is 0.7 cm, it can be determined that the first and second text areas are connected and that the second and third text areas are connected, so the first, second and third text areas can be merged into one connected area.
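The distance-threshold merge in the example above can be sketched as chaining of one-dimensional extents: adjacent first text areas whose gap is below the threshold join the same connected region. The `(left, right)` spans and centimetre units are illustrative.

```python
def merge_connected(extents, threshold=1.0):
    """Merge sorted (left, right) spans whose gap is below threshold."""
    merged = []
    for left, right in sorted(extents):
        if merged and left - merged[-1][1] < threshold:
            # Gap below threshold: extend the current connected region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged

# Gaps of 0.6 cm and 0.7 cm are below the 1 cm threshold, so all three
# areas chain into one connected region.
areas = [(0.0, 2.0), (2.6, 4.0), (4.7, 6.0)]
print(merge_connected(areas))  # → [(0.0, 6.0)]
```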
  • determining the connected first text areas by setting a distance threshold is for illustrative explanation only and should not be construed as a limitation on the embodiment of the present application; the connected first text areas may also be determined by any other method capable of determining whether text areas are connected.
  • classifying each of the connected areas according to the position information of each of the smallest circumscribed rectangles may include:
  • Step a Obtain the diagonal coordinates of each minimum circumscribed rectangle
  • Step b Classify each of the connected areas according to each of the diagonal coordinates.
  • the position information of each smallest circumscribed rectangle in the embodiment of the present application may be the coordinate information of the diagonal points of that rectangle, for example the coordinates of its upper-left point and lower-right point, and the connected areas may be classified according to these coordinates. For instance, all connected areas whose upper-left points have the same ordinate and whose lower-right points have the same ordinate may be divided into one type: when the ordinate of the upper-left point of smallest circumscribed rectangle A is the same as that of smallest circumscribed rectangle B, the ordinate of the lower-right point of rectangle A is the same as that of rectangle B, the ordinate of the upper-left point of rectangle C is the same as that of rectangle B, and the ordinate of the lower-right point of rectangle C is the same as that of rectangle B, then the connected area A corresponding to rectangle A, the connected area B corresponding to rectangle B, and the connected area C corresponding to rectangle C may be classified into the same type.
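Steps a and b can be sketched as grouping rectangles by the ordinates of their diagonal points; the `(x1, y1, x2, y2)` representation and the tolerance parameter are illustrative.

```python
def classify_rects(rects, tol=0):
    """Group (x1, y1, x2, y2) rectangles sharing top and bottom ordinates."""
    groups = []
    for r in rects:
        for g in groups:
            # Same upper-left ordinate (y1) and lower-right ordinate (y2)
            # within tolerance means the same text line.
            if abs(r[1] - g[0][1]) <= tol and abs(r[3] - g[0][3]) <= tol:
                g.append(r)
                break
        else:
            groups.append([r])
    return groups

rects = [(0, 10, 20, 30),   # smallest circumscribed rectangle A
         (25, 10, 50, 30),  # rectangle B: same line as A
         (55, 10, 70, 30),  # rectangle C: same line as A and B
         (0, 40, 30, 60)]   # a rectangle on a second line
print(len(classify_rects(rects)))  # → 2
```

A nonzero `tol` corresponds to the fourth and fifth preset conditions mentioned in the text, where the ordinate differences only need to fall within a preset value.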
  • classifying the connected areas according to identical upper-left-point ordinates and identical lower-right-point ordinates is for schematic explanation only and should not be construed as a limitation on the embodiments of the present application. A fourth preset condition and a fifth preset condition may also be used to classify the connected areas, where the fourth preset condition and the fifth preset condition may each be that the difference between the ordinates is within a preset value; the connected areas may likewise be classified according to the ordinate of the lower-left point and the ordinate of the upper-right point.
  • in addition, filtering and screening operations may be performed on the connected areas within each cluster: connected areas whose distance from the other connected areas in the cluster is greater than a preset distance threshold may be filtered out and removed from the corresponding cluster; connected areas whose area is greater than a preset area threshold may be filtered out and removed from the corresponding cluster; and a connected area contained within another connected area may be removed, etc. This prevents acquiring areas that are not text and prevents repeatedly acquiring text areas, thereby improving classification accuracy and the efficiency and accuracy of text area acquisition.
  • the method may further include:
  • Step c Collect RGB values of each pixel in the preset image
  • Step d Extract pixel points whose RGB values meet the second preset condition, and delete the extracted pixel points in the preset image.
  • in the embodiment of the present application, after the preset image of an invoice is acquired, color separation technology may first be used to extract the interference areas in the preset image, such as the border and the seal of the invoice, and the pixels of these interference areas may be deleted from the preset image; the mean shift algorithm and bilateral filtering algorithm are then used to perform background removal on the image with the interference pixels deleted, followed by the subsequent steps, to obtain the text areas.
  • the interference area may be determined according to the RGB value of the pixel, and the second preset condition may be set according to the specific color of the interference area to be removed.
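One possible realisation of the colour-separation step (steps c and d): the second preset condition below, a "strongly red" rule intended to match seal pixels, is purely an assumption for illustration, since the patent leaves the condition to be set per interference colour.

```python
import numpy as np

def remove_interference(img, min_red=150, max_other=100):
    """Delete pixels matching an assumed red-seal condition (paint white)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # Second preset condition (assumed): high red, low green and blue.
    mask = (r >= min_red) & (g <= max_other) & (b <= max_other)
    cleaned = img.copy()
    cleaned[mask] = 255  # deleting = replacing with the paper colour
    return cleaned, mask

img = np.full((4, 4, 3), 200, dtype=np.uint8)  # plain background
img[1, 1] = (220, 30, 30)                      # one "seal" pixel
cleaned, mask = remove_interference(img)
print(int(mask.sum()), "interference pixels removed")  # → 1
```

Background removal and the later steps would then run on `cleaned` rather than the original image.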
  • in the embodiments of the present application, the mean shift algorithm and the bilateral filtering algorithm may first be used to perform background removal on the preset image, improving the background removal effect and reducing background interference during text area acquisition; the preset image with the background removed may then be grayscale-processed to obtain a grayscale image, and the grayscale image may be sharpened to obtain an enhanced image in which the text areas are more prominent and obvious, which improves the accuracy of text area extraction. After each text area is extracted, its position information may be obtained, the text areas may be classified according to that position information, and text areas of the same type may be merged to obtain the final text areas, reducing the number of text areas and improving the speed and efficiency of text area acquisition.
  • the above mainly describes a text area acquisition method, and a text area acquisition device will be described in detail below.
  • FIG. 6 shows a structural diagram of an embodiment of a device for acquiring a text area in an embodiment of the present application.
  • the text area acquisition device includes:
  • the background removal module 601 is used to obtain a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
  • the grayscale processing module 602 is configured to perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image;
  • a sharpening processing module 603, configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image
  • the position obtaining module 604 is used to extract each text area of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtain the position information of each text area;
  • the area acquisition module 605 is configured to classify text areas based on the position information of each text area, and merge text areas of the same type to obtain a final text area.
  • the sharpening processing module 603 is specifically configured to perform a convolution process on the grayscale image using a 3*3 convolution kernel to perform sharpening operation on the grayscale image;
  • the 3*3 convolution kernel is:
  • the area acquisition module 605 may include:
  • a center point determining unit configured to determine the center point of each text area based on the position information of each text area, and obtain the center point coordinates of each center point;
  • a center point classification unit configured to determine center points that satisfy the first preset condition between the center point coordinates as the same type, and obtain a classification result of the center points
  • the text area classification unit is used to classify each text area according to the classification result of the center point.
  • the area acquisition module 605 may include:
  • a blank canvas construction unit used to build a blank canvas of the same size as the enhanced image
  • a text area importing unit for importing each extracted text area into the blank canvas according to the arrangement position in the enhanced image
  • An expansion processing unit configured to perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area
  • An edge detection unit configured to perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
  • a location information acquiring unit configured to acquire the location information of the smallest circumscribed rectangle of each of the connected areas
  • the connected area merging unit is used to classify the connected areas according to the position information of the smallest circumscribed rectangles, and combine the connected areas of the same type to obtain the final text area.
  • the connected area merging unit may include:
  • a diagonal coordinate obtaining subunit, used to obtain the diagonal coordinates of each of the smallest circumscribed rectangles;
  • a connected area merging subunit, used to classify the connected areas according to the diagonal coordinates.
  • the text area acquisition device may further include:
  • an RGB value collection module configured to collect the RGB value of each pixel in the preset image;
  • a pixel deletion module configured to extract the pixels whose RGB values meet a second preset condition, and delete the extracted pixels from the preset image.
  • the terminal device 7 of this embodiment includes: a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70, such as a text area acquisition program.
  • When the processor 70 executes the computer-readable instructions 72, the steps in the above embodiments of the text area acquisition method are implemented, for example, steps S101 to S105 shown in FIG. 1. Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of each module/unit in the foregoing apparatus embodiments are realized, for example, the functions of modules 601 to 605 shown in FIG. 6.
  • Exemplarily, the computer-readable instructions 72 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 71 and executed by the processor 70 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 72 in the terminal device 7.
  • the terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server.
  • the terminal device may include, but is not limited to, a processor 70 and a memory 71.
  • FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7, which may include more or fewer components than those illustrated, a combination of certain components, or different components.
  • the terminal device may further include an input and output device, a network access device, a bus, and the like.
  • the processor 70 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7.
  • the memory 71 may also be an external storage device of the terminal device 7, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 7.
  • the memory 71 may also include both an internal storage unit of the terminal device 7 and an external storage device.
  • the memory 71 is used to store the computer-readable instructions and other programs and data required by the terminal device.
  • the memory 71 can also be used to temporarily store data that has been or will be output.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
  • When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

A text region obtaining method and apparatus, a storage medium, and a terminal device. The text region obtaining method comprises: obtaining a preset image comprising text, and performing background removal on the preset image by using a mean shift algorithm and a bilateral filtering algorithm (S101); performing grayscale processing on the preset image subjected to the background removal to obtain a grayscale image of the preset image (S102); sharpening the grayscale image to obtain an enhanced image of the grayscale image (S103); extracting each text region of the enhanced image by using a maximally stable extremal region (MSER) algorithm, and obtaining position information of each text region (S104); and classifying the text regions according to the position information of each text region, and merging the text regions of the same type to obtain final text regions (S105). The joint use of the mean shift algorithm and the bilateral filtering algorithm improves the background removal effect and reduces the background interference, and the merging of the text regions reduces the number of the text regions and improves the text region obtaining speed and efficiency.

Description

Text region obtaining method and apparatus, storage medium and terminal device
This application claims priority to the Chinese patent application filed with the China Patent Office on November 30, 2018, with application number 201811451778.9 and entitled "Text region obtaining method and apparatus, storage medium and terminal device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing technology, and in particular to a text area acquisition method, apparatus, storage medium, and terminal device.
Background
In many existing scenarios, the text information in an image needs to be entered, for example, entering the name, ID number, and address from an ID card, or entering the financial information on an invoice into a company's financial system. Entering the text information in an image manually not only consumes considerable manpower and financial resources, but is also inefficient and provides a poor user experience. To improve the entry efficiency of text information in images such as ID cards and invoices, automatic OCR text recognition technology emerged: OCR can automatically recognize the text information in an image, and the recognition quality depends on the accuracy of text area acquisition. In existing OCR technology, however, complex image backgrounds and other factors often result in low accuracy of text area acquisition, and the acquisition efficiency is not high either.
Technical Problem
Embodiments of the present application provide a text area acquisition method, apparatus, computer-readable storage medium, and terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
Technical Solution
A first aspect of the embodiments of the present application provides a text area acquisition method, including:
acquiring a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
performing grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
extracting each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquiring the position information of each text area; and
classifying the text areas according to the position information of each text area, and merging text areas of the same type to obtain the final text areas.
A second aspect of the embodiments of the present application provides a text area acquisition apparatus, including:
a background removal module configured to acquire a preset image containing text and perform background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
a grayscale processing module configured to perform grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
a sharpening processing module configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
a position acquisition module configured to extract each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm and acquire the position information of each text area; and
an area acquisition module configured to classify the text areas according to the position information of each text area and merge text areas of the same type to obtain the final text areas.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
acquiring a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
performing grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
extracting each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquiring the position information of each text area; and
classifying the text areas according to the position information of each text area, and merging text areas of the same type to obtain the final text areas.
A fourth aspect of the embodiments of the present application provides a terminal device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
acquiring a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
performing grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image;
performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
extracting each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquiring the position information of each text area; and
classifying the text areas according to the position information of each text area, and merging text areas of the same type to obtain the final text areas.
Beneficial Effects
In the embodiments of the present application, when a preset image containing text is acquired, background removal may first be performed on the preset image by jointly using a mean shift algorithm and a bilateral filtering algorithm, which improves the background removal effect and reduces background interference during text area acquisition. Then, grayscale processing may be performed on the preset image after background removal to obtain a grayscale image of the preset image, and a sharpening operation may be performed on the grayscale image to obtain an enhanced image in which the text areas are more prominent and obvious. This facilitates the extraction of each text area of the enhanced image by the maximally stable extremal region (MSER) algorithm and improves the accuracy of text area extraction. After the text areas are extracted, the position information of each text area may further be acquired, the text areas may be classified according to their position information, and text areas of the same type may be merged to obtain the final text areas, thereby reducing the number of text areas and improving the speed and efficiency of text area acquisition.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flowchart of an embodiment of a text area acquisition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the text areas extracted by the MSER algorithm in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of classifying text areas in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of acquiring text areas in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 5 is a schematic diagram after dilation processing in one application scenario of the text area acquisition method according to an embodiment of the present application;
FIG. 6 is a structural diagram of an embodiment of a text area acquisition apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present application.
Embodiments of the Invention
Embodiments of the present application provide a text area acquisition method, apparatus, computer-readable storage medium, and terminal device, which can accurately acquire the text areas in an image, improve the accuracy and speed of text area acquisition, and greatly improve the efficiency of text area acquisition.
To make the purpose, features, and advantages of the present application more obvious and understandable, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the embodiments described below are only some, but not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, an embodiment of the present application provides a text area acquisition method, including:
Step S101: acquire a preset image containing text, and perform background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm.
It can be understood that the preset image may be acquired by photographing or by scanning. For example, when text information such as the name, ID number, and address on an ID card needs to be acquired, a preset image of the ID card may first be obtained by photographing or scanning; similarly, when the invoice information on an invoice needs to be acquired, a preset image of the invoice may also be obtained by photographing or scanning.
In this embodiment of the present application, after the preset image is acquired, a mean shift algorithm and a bilateral filtering algorithm may be jointly used to remove the image background of the preset image, so as to reduce the interference of the image background with text area acquisition. The order in which the mean shift algorithm and the bilateral filtering algorithm are applied is not limited here. For example, the mean shift algorithm may first be used to separate the foreground part of the preset image from the image background and obtain the separated foreground part, and the bilateral filtering algorithm may then be used to further remove the background from the separated foreground part; alternatively, the bilateral filtering algorithm may first be used to remove the image background of the preset image, and the mean shift algorithm may then be used to further separate the foreground part of the preset image. Jointly using the two algorithms improves the background removal effect, thereby reducing the interference of the image background during text area acquisition and improving the accuracy of text area acquisition.
Step S102: perform grayscale processing on the preset image after background removal to obtain a grayscale image of the preset image.
It can be understood that, to facilitate subsequent image processing, in this embodiment of the present application, after the preset image with the background removed is obtained, that is, after the foreground part of the preset image is obtained, grayscale processing may further be performed on the preset image to obtain its grayscale image. Any existing grayscale processing method may be used here; the embodiments of the present application impose no limitation on the grayscale processing method, as long as the grayscale image of the preset image can be obtained.
Step S103: perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image.
Here, to avoid the problem of poor text area acquisition caused by insignificant pixel changes due to uneven lighting during photographing and the like, in this embodiment of the present application, after the grayscale image of the preset image is obtained, a sharpening operation may further be performed on the grayscale image to make the pixels of the text portion more prominent, so that the text areas in the resulting enhanced image are more prominent and obvious, improving the accuracy of text area acquisition.
Further, in this embodiment of the present application, performing a sharpening operation on the grayscale image may include:
performing convolution processing on the grayscale image with a 3×3 convolution kernel to sharpen the grayscale image;
where the 3×3 convolution kernel is:
Figure PCTCN2019091526-appb-000001
It can be understood that, in this embodiment of the present application, the above 3×3 convolution kernel may be convolved with the grayscale image of the preset image to quickly adjust the contrast or sharpness of specific parts of the grayscale image, so that the pixels of the text portion in the grayscale image become more prominent and obvious.
Step S104: extract each text area of the enhanced image using the maximally stable extremal region (MSER) algorithm, and acquire the position information of each text area.
In this embodiment of the present application, after the enhanced image of the grayscale image is obtained, that is, after an image in which the pixels of the text portion are more prominent and obvious is obtained, the maximally stable extremal region (MSER) algorithm may be used to extract the text areas in the enhanced image. For example, in a specific application scenario, the text areas extracted by the MSER algorithm are shown in FIG. 2, where each irregular polygon extracted by the MSER algorithm represents one text area. After the text areas extracted by the MSER algorithm are obtained, the position information of each text area, that is, the coordinate information of each point in each text area, may be acquired immediately.
Step S105: classify the text areas according to the position information of each text area, and merge text areas of the same type to obtain the final text areas.
As shown in FIG. 2, the MSER algorithm often extracts many text areas; typically, one text character corresponds to one text area. Therefore, to improve the speed and efficiency of text area acquisition, in this embodiment of the present application, after the position information of each text area is acquired, the text areas may further be clustered or classified according to their position information, and text areas belonging to the same class may be merged according to the clustering or classification result to obtain the final text areas. For example, text characters located on the same line may be merged into one text area, reducing the number of acquired text areas and thereby improving the speed and efficiency of text area acquisition.
Preferably, as shown in FIG. 3, classifying the text areas according to the position information of each text area may include:
Step S301: determine the center point of each text area according to the position information of each text area, and acquire the center point coordinates of each center point.
Step S302: determine the center points whose coordinates satisfy a first preset condition as the same class, to obtain a classification result of the center points.
Step S303: classify each text area according to the classification result of the center points.
For the above steps S301 to S303, it can be understood that after the position information of each text area is acquired, for example, after the coordinate information of each point in each text area is acquired, the center point of each text area may be determined from the coordinate information of its points, and the center point coordinates, that is, the abscissa and ordinate of each center point, may be acquired. The center points may then be classified according to their abscissas and ordinates, and each text area may be classified according to the classification result of the center points.
The first preset condition may be that the difference between the ordinates satisfies a preset threshold, and the preset threshold may be set to zero. When the preset threshold is zero, center points with the same ordinate are classified into one class, that is, center points located on the same line are determined as one class. For example, in a specific application scenario, center point A and center point B have the same ordinate, center point C, center point D, and center point E have the same ordinate, and center point F, center point G, center point H, and center point I have the same ordinate, indicating that center points A and B belong to the same line, center points C, D, and E belong to the same line, and center points F, G, H, and I belong to the same line. Then center points A and B may be classified into one class, for example class A; center points C, D, and E into another class, for example class B; and center points F, G, H, and I into a further class, for example class C.
Here, after the classification result of the center points is obtained, for example, after the above classes A, B, and C are obtained, the text areas corresponding to the center points in class A may be classified as the first class, the text areas corresponding to the center points in class B as the second class, and the text areas corresponding to the center points in class C as the third class. That is, text area A corresponding to center point A and text area B corresponding to center point B are classified as the first class; text area C corresponding to center point C, text area D corresponding to center point D, and text area E corresponding to center point E are classified as the second class; and text area F corresponding to center point F, text area G corresponding to center point G, text area H corresponding to center point H, and text area I corresponding to center point I are classified as the third class.
It should be noted that a preset threshold of zero is only an illustrative explanation and should not be construed as a limitation on the embodiments of the present application. In the embodiments of the present application, the preset threshold may of course also take other values, such as 0.5 or 1. When the preset threshold is 0.5, center points whose ordinates differ by no more than 0.5 are classified into the same class. In addition, in the embodiments of the present application, the first preset condition may be that the difference between the ordinates satisfies a first preset threshold while the difference between the abscissas satisfies a second preset threshold, where the case in which the difference between the abscissas satisfies the second preset threshold is similar to the case described above in which the difference between the ordinates satisfies the preset threshold; the basic principle is the same and, for brevity, is not repeated here.
Further, in the embodiments of the present application, the text areas may of course also be classified according to the position information of other points in the text areas. For example, the ordinates of the uppermost and lowermost points of each text area and the abscissa of its center point may be acquired, and text areas with the same center point abscissa, whose uppermost point ordinates satisfy a third preset condition and whose lowermost point ordinates satisfy a fourth preset condition, may be classified into one class. The third and fourth preset conditions may be that the difference between the ordinates is within a preset value.
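Steps S301 to S303 can be sketched in plain Python. The helper name and the choice of comparing each new center point against the first member of each class are assumptions made for illustration:

```python
def classify_by_center_row(boxes, y_threshold=0.5):
    """Classify text areas whose center-point ordinates differ by at most
    y_threshold (the first preset condition); boxes are (x, y, w, h)."""
    # S301: determine the center point coordinates of each text area.
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    classes = []  # each class holds the indices of same-line text areas
    for i, (_, cy) in enumerate(centers):
        # S302: a center point joins the first class whose representative
        # ordinate differs by no more than the threshold.
        for cls in classes:
            if abs(centers[cls[0]][1] - cy) <= y_threshold:
                cls.append(i)
                break
        else:
            classes.append([i])
    # S303: the classification of the center points is the classification
    # of their text areas.
    return classes

# Four text areas: two on one line, two on another.
boxes = [(0, 10, 8, 10), (12, 10, 8, 10), (0, 40, 8, 10), (12, 40, 8, 10)]
print(classify_by_center_row(boxes))  # → [[0, 1], [2, 3]]
```

With the threshold at 0 the grouping reduces to "same ordinate means same line", matching the zero-threshold example discussed above.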
可选地，如图4所示，本申请实施例中，所述根据各所述文字区域的位置信息进行文字区域的分类，并对同一类的文字区域进行合并，得到最终文字区域，可以包括：Optionally, as shown in FIG. 4, in the embodiments of the present application, classifying the text regions according to the position information of each text region and merging text regions of the same category to obtain the final text region may include:
步骤S401、构建与所述增强图像的大小相同的空白画布;Step S401: Construct a blank canvas of the same size as the enhanced image;
需要说明的是，为防止过滤不干净的图像背景对文字区域获取的干扰，以进一步提高文字区域获取的准确性，本申请实施例中，在使用最稳定极值区域MSER算法提取出所述增强图像的各文字区域后，可首先构建一与所述增强图像大小相同的空白画布。It should be noted that, in order to prevent interference from an incompletely filtered image background with text region acquisition, and to further improve the accuracy of text region acquisition, in the embodiments of the present application, after the maximally stable extremal regions (MSER) algorithm is used to extract the text regions of the enhanced image, a blank canvas of the same size as the enhanced image may first be constructed.
步骤S402、将所提取的各文字区域按照在所述增强图像中的排布位置,导入所述空白画布中;Step S402: Import each extracted text area into the blank canvas according to the arrangement position in the enhanced image;
在构建出所述空白画布后，可将MSER算法提取出的各文字区域导入所述空白画布中，其中，在将各文字区域导入所述空白画布时，需按照文字区域在所述增强图像中的排布位置进行导入，以使得在所述空白画布中导入各文字区域后所形成的图像与所述增强图像相同。After the blank canvas is constructed, the text regions extracted by the MSER algorithm may be imported into the blank canvas. When importing the text regions into the blank canvas, they must be placed according to their arrangement positions in the enhanced image, so that the image formed after importing the text regions into the blank canvas is identical to the enhanced image.
步骤S403、对位于所述空白画布中的各文字区域进行膨胀处理,得到膨胀后的各第一文字区域;Step S403: Perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
在此，MSER算法所提取出的文字区域往往为不规则的多边形，而文字区域获取中所需要的是行文本，即需要对同一行上的多边形进行拟合，若直接拟合不规则的多边形则较麻烦，因而，如图5所示，本申请实施例中，在对多边形进行拟合之前，可先对多边形进行膨胀处理，即可先对所述空白画布中的各文字区域进行膨胀处理，以使得文字区域联通在一起。在此，在对各文字区域进行膨胀处理后，还可以对各文字区域进行腐蚀处理，以通过先膨胀后腐蚀的操作来达到联通文字区域和平滑边界的作用。Here, the text regions extracted by the MSER algorithm are often irregular polygons, while what text region acquisition needs is line text; that is, the polygons on the same line need to be fitted, and directly fitting irregular polygons is cumbersome. Therefore, as shown in FIG. 5, in the embodiments of the present application, before the polygons are fitted, they may first be dilated; that is, each text region in the blank canvas may be dilated so that the text regions are connected together. Here, after the text regions are dilated, they may also be eroded, so that the dilate-then-erode operation both connects the text regions and smooths their boundaries.
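The dilate-then-erode ("closing") operation described above can be illustrated with a minimal binary-morphology sketch in Python/NumPy. This is a simplified stand-in for a morphology library such as OpenCV's dilate/erode; a 3*3 square structuring element and a simple border-padding convention are assumed:

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element."""
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1)            # outside the image counts as background
        acc = np.zeros_like(out)
        h, w = out.shape
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                acc |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        out = acc
    return out

def erode(mask, iterations=1):
    """Binary erosion via the complement of dilating the complement
    (a lenient convention at the image border)."""
    return ~dilate(~mask.astype(bool), iterations)

def close_regions(mask, iterations=1):
    """Dilate first, then erode, so that nearby text blobs become
    connected and their boundaries are smoothed, as described above."""
    return erode(dilate(mask, iterations), iterations)
```

A one-pixel gap between two text blobs is bridged by the closing operation, which is exactly the effect the dilation step is meant to achieve before line fitting.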
步骤S404、对各所述第一文字区域进行边缘检测,确定相联通的第一文字区域,并将相联通的第一文字区域合并成联通区域;Step S404: Perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
步骤S405、获取各所述联通区域的最小外接矩形的位置信息;Step S405: Acquire position information of the smallest circumscribed rectangle of each of the connected areas;
步骤S406、根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,并对同一类的联通区域进行合并,得到最终文字区域。Step S406: Classify each of the connected areas according to the position information of the smallest circumscribed rectangles, and merge the connected areas of the same type to obtain a final text area.
对于上述步骤S404至步骤S406，可以理解的是，本申请实施例中，在得到膨胀处理后的各所述第一文字区域后，可对各所述第一文字区域进行边缘检测，如可通过OpenCV中的findContours()函数对各所述第一文字区域进行边缘检测，以根据检测结果确定相联通的第一文字区域，并可将相联通的第一文字区域合并成联通区域，同时检测得到各联通区域的最小外接矩形，所述最小外接矩形为包含相联通的各第一文字区域的最小的矩形，并获取各最小外接矩形的位置信息，从而可根据最小外接矩形的位置信息对各所述联通区域进行分类，并可对同一类的联通区域进行合并，得到最终文字区域。With respect to steps S404 to S406 above, it can be understood that, in the embodiments of the present application, after the dilated first text regions are obtained, edge detection may be performed on each first text region, for example using the findContours() function in OpenCV, so as to determine the connected first text regions according to the detection result and merge them into connected regions. At the same time, the minimum circumscribed rectangle of each connected region is detected, the minimum circumscribed rectangle being the smallest rectangle containing the connected first text regions, and the position information of each minimum circumscribed rectangle is obtained, so that the connected regions can be classified according to this position information and connected regions of the same category can be merged to obtain the final text region.
在此，所述检测结果可包括相邻的第一文字区域之间的距离，本申请实施例中，可通过设置距离阈值来确定相邻的第一文字区域之间是否相联通，如在某一具体应用场景中，可将所述距离阈值设置为1cm，因而，当检测确定第一文字区域与第二文字区域之间的距离为0.6cm，而第二文字区域与第三文字区域之间的距离为0.7cm时，则可确定第一文字区域与第二文字区域相联通，第二文字区域与第三文字区域相联通，即可将第一文字区域、第二文字区域以及第三文字区域合并成联通区域。Here, the detection result may include the distance between adjacent first text regions. In the embodiments of the present application, a distance threshold may be set to determine whether adjacent first text regions are connected. For example, in a specific application scenario, the distance threshold may be set to 1 cm; thus, when detection determines that the distance between the first text region and the second text region is 0.6 cm and the distance between the second text region and the third text region is 0.7 cm, it can be determined that the first and second text regions are connected and that the second and third text regions are connected, so the first, second, and third text regions can be merged into one connected region.
需要说明的是，本申请实施例中，通过距离阈值的设置来确定相联通的第一文字区域仅作示意性解释，不应理解为对本申请实施例的限制，本申请实施例中，当然也可以采用其他任何可确定文字区域之间联通与否的方式来确定相联通的第一文字区域。It should be noted that, in the embodiments of the present application, determining the connected first text regions by setting a distance threshold is only an illustrative example and should not be construed as limiting the embodiments of the present application; of course, any other method capable of determining whether text regions are connected may also be used to determine the connected first text regions.
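As an illustrative stand-in for the findContours()-based edge detection and minimum-rectangle extraction of steps S404 and S405, the following Python sketch labels 4-connected foreground regions and returns an axis-aligned bounding rectangle for each. This is a simplification (the true minimum circumscribed rectangle may be rotated), and the function name is an assumption:

```python
import numpy as np

def connected_regions(mask):
    """Label 4-connected foreground regions of a binary mask and return the
    bounding rectangle (x0, y0, x1, y1) of each, in scan order."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                stack, xs, ys = [(sy, sx)], [], []
                seen[sy, sx] = True
                while stack:                      # flood fill one region
                    y, x = stack.pop()
                    xs.append(x)
                    ys.append(y)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

The rectangle coordinates returned here play the role of the "position information of the minimum circumscribed rectangle" used for classification in step S406.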
其中,所述根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,可以包括:Wherein, classifying each of the connected areas according to the position information of each of the smallest circumscribed rectangles may include:
步骤a、获取各所述最小外接矩形的对角坐标;Step a: Obtain the diagonal coordinates of each minimum circumscribed rectangle;
步骤b、根据各所述对角坐标,对各所述联通区域进行分类。Step b: Classify each of the connected areas according to each of the diagonal coordinates.
对于上述步骤a和步骤b，可以理解的是，本申请实施例中的获取各最小外接矩形的位置信息，可以是获取各最小外接矩形中对角点的坐标信息，如获取各最小外接矩形的左上点坐标和右下点坐标，以根据所述左上点坐标和所述右下点坐标对所有联通区域进行分类，如可将所有左上点纵坐标相同和右下点纵坐标相同的联通区域划分为一类，例如，当最小外接矩形A的左上点纵坐标与最小外接矩形B的左上点纵坐标相同，且最小外接矩形A的右下点纵坐标与最小外接矩形B的右下点纵坐标相同，同时最小外接矩形C的左上点纵坐标与最小外接矩形B的左上点纵坐标相同，且最小外接矩形C的右下点纵坐标与最小外接矩形B的右下点纵坐标相同时，则可将最小外接矩形A对应的联通区域A、最小外接矩形B对应的联通区域B以及最小外接矩形C对应的联通区域C划分为同一类。With respect to steps a and b above, it can be understood that obtaining the position information of each minimum circumscribed rectangle in the embodiments of the present application may mean obtaining the coordinate information of the diagonal corner points of each minimum circumscribed rectangle, for example the upper-left and lower-right corner coordinates, so that all connected regions can be classified according to these coordinates; for instance, all connected regions whose upper-left ordinates are the same and whose lower-right ordinates are the same may be classified into one category. For example, when the upper-left ordinate of minimum circumscribed rectangle A is the same as that of minimum circumscribed rectangle B, the lower-right ordinate of rectangle A is the same as that of rectangle B, the upper-left ordinate of minimum circumscribed rectangle C is the same as that of rectangle B, and the lower-right ordinate of rectangle C is the same as that of rectangle B, then connected region A corresponding to rectangle A, connected region B corresponding to rectangle B, and connected region C corresponding to rectangle C can be classified into the same category.
需要说明的是，本申请实施例中，根据左上点纵坐标相同和右下点纵坐标相同来进行联通区域的分类仅作示意性解释，不应理解为对本申请实施例的限制，本申请实施例中，当然也可以设置左上点纵坐标之间的差值需满足的第四预设条件和右下点纵坐标之间的差值需满足的第五预设条件，以根据第四预设条件和第五预设条件来进行联通区域的分类。其中，第四预设条件和第五预设条件可以是纵坐标之间的差值在预设值之内。当然，本申请实施例中，还可以根据左下点纵坐标和右上点纵坐标来进行联通区域的分类。It should be noted that, in the embodiments of the present application, classifying the connected regions according to identical upper-left ordinates and identical lower-right ordinates is only an illustrative example and should not be construed as limiting the embodiments of the present application. In the embodiments of the present application, it is of course also possible to set a fourth preset condition that the difference between upper-left ordinates must satisfy and a fifth preset condition that the difference between lower-right ordinates must satisfy, and to classify the connected regions according to the fourth and fifth preset conditions. The fourth and fifth preset conditions may be that the difference between the ordinates is within a preset value. Of course, in the embodiments of the present application, the connected regions may also be classified according to the lower-left and upper-right ordinates.
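A minimal predicate for the fourth and fifth preset conditions described above might look as follows; the tolerance value and function name are assumptions for illustration:

```python
def same_line(rect_a, rect_b, tol=2):
    """Rectangles are (x0, y0, x1, y1) = (upper-left, lower-right) corners.
    Two connected regions fall into the same class when both the upper-left
    and the lower-right ordinate differences are within tol (standing in
    for the fourth and fifth preset conditions; tol is an assumed value)."""
    return (abs(rect_a[1] - rect_b[1]) <= tol
            and abs(rect_a[3] - rect_b[3]) <= tol)
```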
进一步地，本申请实施例中，在根据最小外接矩形的位置信息对所有联通区域进行分类，得到多个类簇后，还可对各类簇内的联通区域执行筛选、过滤等操作，如可在各类簇中筛选出与该类簇中其他联通区域的距离大于预设距离阈值的联通区域，并从该类簇中过滤掉所筛选出的联通区域，即从对应的类簇中去除该联通区域；又或者在各类簇中筛选出区域面积大于预设面积阈值的联通区域，并从对应的类簇中过滤掉所筛选出的联通区域；再或者在各类簇中筛选出位于某一联通区域内的联通区域等等，以防止获取到不是文字的区域，或者防止文字区域的重复获取，从而提高分类准确性，提高文字区域获取效率和准确性。Further, in the embodiments of the present application, after all connected regions are classified according to the position information of the minimum circumscribed rectangles to obtain multiple clusters, screening and filtering operations may also be performed on the connected regions within each cluster. For example, connected regions whose distance from the other connected regions in a cluster exceeds a preset distance threshold may be screened out and filtered from that cluster, that is, removed from the corresponding cluster; or connected regions whose area exceeds a preset area threshold may be screened out of each cluster and filtered from the corresponding cluster; or connected regions located inside another connected region may be screened out of each cluster, and so on, so as to prevent acquiring regions that are not text or acquiring a text region repeatedly, thereby improving classification accuracy and the efficiency and accuracy of text region acquisition.
优选地,在采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除之前,还可以包括:Preferably, before the mean shift algorithm and the bilateral filtering algorithm are used to remove the background of the preset image, the method may further include:
步骤c、采集所述预设图像中各像素点的RGB值;Step c: Collect RGB values of each pixel in the preset image;
步骤d、提取RGB值满足第二预设条件的像素点,并在所述预设图像中删除所提取的像素点。Step d: Extract pixel points whose RGB values meet the second preset condition, and delete the extracted pixel points in the preset image.
对于上述步骤c和步骤d，可以理解的是，在进行有明显颜色区分的预设图像中的文字区域获取时，如在获取发票中的文字区域时，本申请实施例在获取发票的预设图像之后，可先采用颜色分离技术提取出该发票的预设图像中的干扰区域，如提取出该发票中的边框和印章等干扰区域，并在该发票的预设图像中删除该干扰区域的像素点，然后再采用均值漂移算法和双边滤波算法对删除干扰区域的像素点后的预设图像进行背景去除以及后续的步骤，以此进行文字区域的获取。在此，干扰区域可根据像素点的RGB值进行确定，而所述第二预设条件则可根据需要去除的干扰区域的具体颜色进行设置。With respect to steps c and d above, it can be understood that when acquiring text regions from a preset image with clear color distinctions, for example when acquiring text regions in an invoice, the embodiments of the present application may, after acquiring the preset image of the invoice, first use color separation to extract the interference regions in the preset image, such as the frame and seal of the invoice, and delete the pixels of those interference regions from the preset image; the mean shift algorithm and the bilateral filtering algorithm are then applied to the preset image with the interference pixels removed for background removal and the subsequent steps, so as to acquire the text regions. Here, the interference regions may be determined according to the RGB values of the pixels, and the second preset condition may be set according to the specific color of the interference regions to be removed.
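Steps c and d can be sketched as a color-separation mask in Python/NumPy. The "red" channel thresholds below are illustrative assumptions for removing a red seal or frame, not values given in the application, and RGB channel order is assumed:

```python
import numpy as np

def remove_red_interference(img):
    """Delete (set to white) pixels whose RGB values look 'red', such as an
    invoice seal or frame. Thresholds are assumed, not from the application."""
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    # The "second preset condition": a strong red channel with weak green/blue.
    red_mask = (r > 150) & (g < 100) & (b < 100)
    out = img.copy()
    out[red_mask] = 255          # deleting a pixel here means painting it white
    return out
```

Background removal and the subsequent steps would then run on the returned image rather than on the original.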
本申请实施例中，在获取到包含文字的预设图像时，可首先联合采用均值漂移算法和双边滤波算法对预设图像进行背景去除，以提高背景去除效果，降低文字区域获取过程中的背景干扰；然后，可对去除背景后的预设图像进行灰度处理，得到灰度图像，并可对灰度图像进行锐化操作得到增强图像，以使得增强图像中的文字区域更加突出和明显，从而方便最稳定极值区域MSER算法进行增强图像中各文字区域的提取，提高了文字区域提取的准确性，而在提取出各文字区域后，可进一步获取各文字区域的位置信息，并可根据各文字区域的位置信息进行文字区域的分类，且对同一类的文字区域进行合并，得到最终文字区域，以减少文字区域的数量，提高文字区域的获取速度和获取效率。In the embodiments of the present application, when a preset image containing text is acquired, the mean shift algorithm and the bilateral filtering algorithm may first be jointly applied to remove the background of the preset image, improving the background removal effect and reducing background interference during text region acquisition. The background-removed preset image may then be converted to grayscale to obtain a grayscale image, which may be sharpened to obtain an enhanced image, making the text regions in the enhanced image more prominent and distinct; this facilitates extraction of the text regions of the enhanced image by the maximally stable extremal regions (MSER) algorithm and improves the accuracy of text region extraction. After the text regions are extracted, their position information may further be obtained, the text regions may be classified according to this position information, and text regions of the same category may be merged to obtain the final text region, thereby reducing the number of text regions and improving the speed and efficiency of text region acquisition.
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the order of execution, and the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
上面主要描述了一种文字区域获取方法,下面将对一种文字区域获取装置进行详细描述。The above mainly describes a text area acquisition method, and a text area acquisition device will be described in detail below.
图6示出了本申请实施例中一种文字区域获取装置的一个实施例结构图。如图6所示, 所述文字区域获取装置,包括:FIG. 6 shows a structural diagram of an embodiment of a device for acquiring a text area in an embodiment of the present application. As shown in FIG. 6, the text area acquisition device includes:
背景去除模块601,用于获取包含文字的预设图像,并采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除;The background removal module 601 is used to obtain a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to perform background removal on the preset image;
灰度处理模块602,用于对去除背景后的预设图像进行灰度处理,得到所述预设图像的灰度图像;The grayscale processing module 602 is configured to perform grayscale processing on the preset image after removing the background to obtain a grayscale image of the preset image;
锐化处理模块603,用于对所述灰度图像进行锐化操作,得到所述灰度图像的增强图像;A sharpening processing module 603, configured to perform a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
位置获取模块604,用于使用最稳定极值区域MSER算法提取所述增强图像的各文字区域,并获取各所述文字区域的位置信息;The position obtaining module 604 is used to extract each text area of the enhanced image using the most stable extreme value area MSER algorithm and obtain position information of each text area;
区域获取模块605,用于根据各所述文字区域的位置信息进行文字区域的分类,并对同一类的文字区域进行合并,得到最终文字区域。The area acquisition module 605 is configured to classify text areas based on the position information of each text area, and merge text areas of the same type to obtain a final text area.
进一步地,所述锐化处理模块603,具体用于采用3*3的卷积核对所述灰度图像进行卷积处理,以对所述灰度图像进行锐化操作;Further, the sharpening processing module 603 is specifically configured to perform a convolution process on the grayscale image using a 3*3 convolution kernel to perform sharpening operation on the grayscale image;
其中,所述3*3的卷积核为:
Figure PCTCN2019091526-appb-000002
Wherein, the 3*3 convolution kernel is:
Figure PCTCN2019091526-appb-000002
优选地,所述区域获取模块605,可以包括:Preferably, the area acquisition module 605 may include:
中心点确定单元,用于根据各所述文字区域的位置信息,确定各所述文字区域的中心点,并获取各所述中心点的中心点坐标;A center point determining unit, configured to determine the center point of each text area based on the position information of each text area, and obtain the center point coordinates of each center point;
中心点分类单元,用于将各所述中心点坐标之间满足第一预设条件的中心点确定为同一类,得到所述中心点的分类结果;A center point classification unit, configured to determine center points that satisfy the first preset condition between the center point coordinates as the same type, and obtain a classification result of the center points;
文字区域分类单元,用于根据所述中心点的分类结果对各所述文字区域进行分类。The text area classification unit is used to classify each text area according to the classification result of the center point.
可选地,所述区域获取模块605,可以包括:Optionally, the area acquisition module 605 may include:
空白画布构建单元,用于构建与所述增强图像的大小相同的空白画布;A blank canvas construction unit, used to build a blank canvas of the same size as the enhanced image;
文字区域导入单元,用于将所提取的各文字区域按照在所述增强图像中的排布位置,导入所述空白画布中;A text area importing unit for importing each extracted text area into the blank canvas according to the arrangement position in the enhanced image;
膨胀处理单元,用于对位于所述空白画布中的各文字区域进行膨胀处理,得到膨胀后的各第一文字区域;An expansion processing unit, configured to perform expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
边缘检测单元,用于对各所述第一文字区域进行边缘检测,确定相联通的第一文字区域,并将相联通的第一文字区域合并成联通区域;An edge detection unit, configured to perform edge detection on each of the first text regions, determine the connected first text regions, and merge the connected first text regions into a connected region;
位置信息获取单元,用于获取各所述联通区域的最小外接矩形的位置信息;A location information acquiring unit, configured to acquire the location information of the smallest circumscribed rectangle of each of the connected areas;
联通区域合并单元,用于根据各所述最小外接矩形的位置信息对各所述联通区域进行 分类,并对同一类的联通区域进行合并,得到最终文字区域。The connected area merging unit is used to classify the connected areas according to the position information of the smallest circumscribed rectangles, and combine the connected areas of the same type to obtain the final text area.
进一步地，所述联通区域合并单元，可以包括：Further, the connected-region merging unit may include:
对角坐标获取子单元,用于获取各所述最小外接矩形的对角坐标;A diagonal coordinate obtaining subunit, used to obtain the diagonal coordinates of each of the smallest circumscribed rectangles;
联通区域合并子单元,用于根据各所述对角坐标,对各所述联通区域进行分类。The connectivity area merging subunit is used to classify the connectivity areas according to the diagonal coordinates.
优选地,所述文字区域获取装置,还可以包括:Preferably, the text area acquisition device may further include:
RGB值采集模块,用于采集所述预设图像中各像素点的RGB值;RGB value collection module, used to collect the RGB value of each pixel in the preset image;
像素点删除模块,用于提取RGB值满足第二预设条件的像素点,并在所述预设图像中删除所提取的像素点。The pixel deletion module is used to extract pixels whose RGB values meet the second preset condition, and delete the extracted pixels in the preset image.
图7是本申请一实施例提供的终端设备的示意图。如图7所示,该实施例的终端设备7包括:处理器70、存储器71以及存储在所述存储器71中并可在所述处理器70上运行的计算机可读指令72,例如文字区域获取程序。所述处理器70执行所述计算机可读指令72时实现上述各个文字区域获取方法实施例中的步骤,例如图1所示的步骤S101至步骤S105。或者,所述处理器70执行所述计算机可读指令72时实现上述各装置实施例中各模块/单元的功能,例如图6所示的模块601至模块605的功能。7 is a schematic diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70, such as text area acquisition program. When the processor 70 executes the computer-readable instruction 72, the steps in the above embodiments of the method for acquiring a text area are implemented, for example, steps S101 to S105 shown in FIG. 1. Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of each module/unit in the foregoing device embodiments are realized, for example, the functions of the modules 601 to 605 shown in FIG. 6.
示例性的,所述计算机可读指令72可以被分割成一个或多个模块/单元,所述一个或者多个模块/单元被存储在所述存储器71中,并由所述处理器70执行,以完成本申请。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段,该指令段用于描述所述计算机可读指令72在所述终端设备7中的执行过程。Exemplarily, the computer-readable instructions 72 may be divided into one or more modules/units, the one or more modules/units are stored in the memory 71 and executed by the processor 70, To complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions. The instruction segments are used to describe the execution process of the computer-readable instructions 72 in the terminal device 7.
所述终端设备7可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述终端设备可包括,但不仅限于,处理器70、存储器71。本领域技术人员可以理解,图7仅仅是终端设备7的示例,并不构成对终端设备7的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述终端设备还可以包括输入输出设备、网络接入设备、总线等。The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer and a cloud server. The terminal device may include, but is not limited to, a processor 70 and a memory 71. Those skilled in the art may understand that FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7, and may include more or less components than those illustrated, or a combination of certain components, or different components. For example, the terminal device may further include an input and output device, a network access device, a bus, and the like.
所述处理器70可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 70 may be a central processing unit (Central Processing Unit, CPU), or may be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
所述存储器71可以是所述终端设备7的内部存储单元，例如终端设备7的硬盘或内存。所述存储器71也可以是所述终端设备7的外部存储设备，例如所述终端设备7上配备的插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(Secure Digital,SD)卡，闪存卡(Flash Card)等。进一步地，所述存储器71还可以既包括所述终端设备7的内部存储单元也包括外部存储设备。所述存储器71用于存储所述计算机可读指令以及所述终端设备所需的其他程序和数据。所述存储器71还可以用于暂时地存储已经输出或者将要输出的数据。The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer-readable instructions and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or will be output.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or software functional unit.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially or part of the contribution to the existing technology or all or part of the technical solution can be embodied in the form of a software product, the computer software product is stored in a storage medium , Including several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program code .
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they can still The technical solutions described in the embodiments are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. 一种文字区域获取方法,其特征在于,包括:A method for acquiring a text area, characterized in that it includes:
    获取包含文字的预设图像,并采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除;Obtain a preset image containing text, and use a mean shift algorithm and a bilateral filtering algorithm to remove the background from the preset image;
    对去除背景后的预设图像进行灰度处理,得到所述预设图像的灰度图像;Grayscale processing the preset image after removing the background to obtain a grayscale image of the preset image;
    对所述灰度图像进行锐化操作,得到所述灰度图像的增强图像;Performing a sharpening operation on the grayscale image to obtain an enhanced image of the grayscale image;
    使用最稳定极值区域MSER算法提取所述增强图像的各文字区域,并获取各所述文字区域的位置信息;Using the most stable extreme value area MSER algorithm to extract each text area of the enhanced image and obtain position information of each text area;
    根据各所述文字区域的位置信息进行文字区域的分类,并对同一类的文字区域进行合并,得到最终文字区域。The text regions are classified according to the position information of each text region, and the text regions of the same type are combined to obtain the final text region.
  2. 根据权利要求1所述的文字区域获取方法,其特征在于,所述对所述灰度图像进行锐化操作,包括:The method for acquiring a text area according to claim 1, wherein the sharpening the grayscale image includes:
    采用3*3的卷积核对所述灰度图像进行卷积处理,以对所述灰度图像进行锐化操作;Adopting a 3*3 convolution kernel to perform convolution processing on the grayscale image to sharpen the grayscale image;
    其中,所述3*3的卷积核为:
    Figure PCTCN2019091526-appb-100001
    Wherein, the 3*3 convolution kernel is:
    Figure PCTCN2019091526-appb-100001
  3. 根据权利要求1所述的文字区域获取方法,其特征在于,所述根据各所述文字区域的位置信息进行文字区域的分类,包括:The method for acquiring a text area according to claim 1, wherein the classifying the text area according to the position information of each text area includes:
    根据各所述文字区域的位置信息,确定各所述文字区域的中心点,并获取各所述中心点的中心点坐标;Determine the center point of each text area according to the position information of each text area, and obtain the center point coordinates of each center point;
    将各所述中心点坐标之间满足第一预设条件的中心点确定为同一类,得到所述中心点的分类结果;Determine the center points that satisfy the first preset condition between the coordinates of the center points as the same type, and obtain a classification result of the center points;
    根据所述中心点的分类结果对各所述文字区域进行分类。Classify each of the character regions according to the classification result of the center point.
  4. 根据权利要求1所述的文字区域获取方法,其特征在于,所述根据各所述文字区域的位置信息进行文字区域的分类,并对同一类的文字区域进行合并,得到最终文字区域,包括:The method for acquiring a text area according to claim 1, wherein the classifying the text area according to the position information of each text area and merging the text areas of the same type to obtain the final text area includes:
    构建与所述增强图像的大小相同的空白画布;Construct a blank canvas of the same size as the enhanced image;
    将所提取的各文字区域按照在所述增强图像中的排布位置,导入所述空白画布中;Import the extracted text areas into the blank canvas according to the arrangement position in the enhanced image;
    对位于所述空白画布中的各文字区域进行膨胀处理,得到膨胀后的各第一文字区域;Performing expansion processing on each text area located in the blank canvas to obtain each expanded first text area;
    对各所述第一文字区域进行边缘检测,确定相联通的第一文字区域,并将相联通的第一文字区域合并成联通区域;Performing edge detection on each of the first text areas, determining the connected first text areas, and merging the connected first text areas into a connected area;
    获取各所述联通区域的最小外接矩形的位置信息;Acquiring the position information of the smallest circumscribed rectangle of each of the connected areas;
    根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,并对同一类的联通区域进行合并,得到最终文字区域。According to the position information of each minimum circumscribed rectangle, each of the connected areas is classified, and the connected areas of the same type are combined to obtain a final text area.
  5. 根据权利要求4所述的文字区域获取方法,其特征在于,所述根据各所述最小外接矩形的位置信息对各所述联通区域进行分类,包括:The method for acquiring a text area according to claim 4, wherein the classifying each of the connected areas according to the position information of each of the smallest circumscribed rectangles includes:
    获取各所述最小外接矩形的对角坐标;Obtain the diagonal coordinates of each of the smallest circumscribed rectangles;
    根据各所述对角坐标,对各所述联通区域进行分类。According to each of the diagonal coordinates, each of the connected areas is classified.
  6. 根据权利要求1至5中任一项所述的文字区域获取方法,其特征在于,在采用均值漂移算法和双边滤波算法对所述预设图像进行背景去除之前,还包括:The method for acquiring a text area according to any one of claims 1 to 5, wherein before the background removal is performed on the preset image by using a mean shift algorithm and a bilateral filtering algorithm, the method further includes:
    采集所述预设图像中各像素点的RGB值;Collect the RGB values of each pixel in the preset image;
    提取RGB值满足第二预设条件的像素点,并在所述预设图像中删除所提取的像素点。Extract pixel points whose RGB values meet the second preset condition, and delete the extracted pixel points in the preset image.
  7. A text region obtaining apparatus, comprising:
    a background removal module, configured to obtain a preset image containing text, and perform background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
    a grayscale processing module, configured to perform grayscale processing on the preset image after background removal, to obtain a grayscale image of the preset image;
    a sharpening processing module, configured to perform a sharpening operation on the grayscale image, to obtain an enhanced image of the grayscale image;
    a position obtaining module, configured to extract each text region of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtain position information of each text region;
    a region obtaining module, configured to classify the text regions according to the position information of each text region, and merge text regions of the same class to obtain a final text region.
  8. The text region obtaining apparatus according to claim 7, wherein the sharpening processing module is configured to perform convolution processing on the grayscale image using a 3×3 convolution kernel, so as to sharpen the grayscale image;
    wherein the 3×3 convolution kernel is:
    Figure PCTCN2019091526-appb-100002
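The kernel itself appears only as the figure referenced above. As an assumption, a common Laplacian-style sharpening kernel (centre 5, cross of −1) is used below to show the claimed 3×3 convolution written out explicitly:

```python
import numpy as np

# The patent gives its 3x3 kernel only as a figure; this Laplacian-style
# sharpening kernel is a common stand-in, not the claimed values.
SHARPEN = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float64)

def convolve3x3(gray, kernel=SHARPEN):
    """Apply a 3x3 kernel in 'valid' mode (no padding), clipping the
    result back to the uint8 range."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because the kernel weights sum to 1, flat regions pass through unchanged while intensity edges are amplified, which is what makes the subsequent MSER extraction more stable.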
  9. The text region obtaining apparatus according to claim 7, wherein the region obtaining module comprises:
    a center point determining unit, configured to determine a center point of each text region according to the position information of each text region, and obtain center point coordinates of each center point;
    a center point classification unit, configured to determine center points whose coordinates satisfy a first preset condition as the same class, to obtain a classification result of the center points;
    a text region classification unit, configured to classify the text regions according to the classification result of the center points.
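The "first preset condition" on centre-point coordinates is not spelled out in the claim; one plausible reading is a Euclidean distance threshold between centres. A sketch under that assumption, grouping regions with a small union-find (the threshold `max_dist` is a hypothetical parameter):

```python
from math import hypot

def classify_by_centers(boxes, max_dist=40.0):
    """Group text regions whose centre points lie close together.
    Boxes are (x, y, w, h) tuples."""
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    parent = list(range(len(boxes)))

    def find(i):
        # Path-halving find for the union-find forest.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if hypot(centers[i][0] - centers[j][0],
                     centers[i][1] - centers[j][1]) <= max_dist:
                parent[find(i)] = find(j)

    labels = [find(i) for i in range(len(boxes))]
    # Renumber class labels 0..k-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(l, len(remap)) for l in labels]
```

Regions sharing a label would then be merged into one final text region by the downstream merging step.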
  10. The text region obtaining apparatus according to claim 7, wherein the region obtaining module comprises:
    a blank canvas construction unit, configured to construct a blank canvas of the same size as the enhanced image;
    a text region importing unit, configured to import the extracted text regions into the blank canvas according to their arrangement positions in the enhanced image;
    a dilation processing unit, configured to perform dilation processing on each text region in the blank canvas, to obtain dilated first text regions;
    an edge detection unit, configured to perform edge detection on each first text region, determine first text regions that are connected, and merge the connected first text regions into a connected region;
    a position information obtaining unit, configured to obtain position information of a minimum circumscribed rectangle of each connected region;
    a connected region merging unit, configured to classify the connected regions according to the position information of each minimum circumscribed rectangle, and merge connected regions of the same class to obtain a final text region.
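The canvas/dilation/merge flow of claim 10 can be sketched without any imaging library. Everything below is an assumed concrete realisation: a boolean canvas, a 3×3 binary dilation (the iteration count is a hypothetical parameter), a 4-connected component search, and the minimum bounding rectangle of each resulting blob (note the rectangles bound the *dilated* blobs, so they are slightly larger than the raw regions):

```python
from collections import deque
import numpy as np

def merge_regions(boxes, canvas_shape, dilate_iters=3):
    """Paint (x, y, w, h) regions onto a blank canvas, dilate them so
    nearby regions touch, and return one bounding rectangle per blob
    as diagonal coordinates (x0, y0, x1, y1)."""
    canvas = np.zeros(canvas_shape, dtype=bool)
    for x, y, w, h in boxes:                     # import regions at their positions
        canvas[y:y + h, x:x + w] = True

    for _ in range(dilate_iters):                # 3x3 binary dilation via shifts
        p = np.pad(canvas, 1)
        canvas = (p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
                  | p[1:-1, 1:-1] | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])

    seen = np.zeros_like(canvas)
    rects = []
    h_, w_ = canvas.shape
    for sy in range(h_):                         # BFS over connected components
        for sx in range(w_):
            if canvas[sy, sx] and not seen[sy, sx]:
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                x0 = x1 = sx
                y0 = y1 = sy
                while q:
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h_ and 0 <= nx < w_ \
                                and canvas[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                rects.append((x0, y0, x1, y1))   # diagonal corners of the blob
    return rects
```

Dilation is what lets adjacent characters of one line fuse into a single connected region before the bounding rectangles are taken.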
  11. The text region obtaining apparatus according to claim 10, wherein the connected region merging unit comprises:
    a diagonal coordinate obtaining subunit, configured to obtain diagonal coordinates of each minimum circumscribed rectangle;
    a connected region merging subunit, configured to classify the connected regions according to the diagonal coordinates.
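Classifying connected regions by the diagonal coordinates of their minimum circumscribed rectangles is, under one plausible reading, an overlap test between rectangles. A sketch under that assumption, where each class is collapsed to one enclosing rectangle:

```python
def rects_overlap(a, b):
    """Rectangles as diagonal coordinates (x0, y0, x1, y1); assumed
    'same class' test: axis-aligned overlap (touching counts)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def merge_overlapping(rects):
    """Greedily merge rectangles whose diagonal coordinates overlap,
    returning one enclosing rectangle per class."""
    merged = []
    for r in rects:
        r = list(r)
        changed = True
        while changed:             # keep absorbing until r is disjoint
            changed = False
            for m in merged:
                if rects_overlap(r, m):
                    merged.remove(m)
                    r = [min(r[0], m[0]), min(r[1], m[1]),
                         max(r[2], m[2]), max(r[3], m[3])]
                    changed = True
                    break
        merged.append(r)
    return [tuple(m) for m in merged]
```

The survivors of this merge would be the final text regions produced by the apparatus.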
  12. The text region obtaining apparatus according to any one of claims 7 to 11, further comprising:
    an RGB value collection module, configured to collect an RGB value of each pixel in the preset image;
    a pixel deletion module, configured to extract pixels whose RGB values satisfy a second preset condition, and delete the extracted pixels from the preset image.
  13. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    obtaining a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
    performing grayscale processing on the preset image after background removal, to obtain a grayscale image of the preset image;
    performing a sharpening operation on the grayscale image, to obtain an enhanced image of the grayscale image;
    extracting each text region of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtaining position information of each text region;
    classifying the text regions according to the position information of each text region, and merging text regions of the same class to obtain a final text region.
  14. The computer-readable storage medium according to claim 13, wherein the sharpening operation on the grayscale image comprises:
    performing convolution processing on the grayscale image using a 3×3 convolution kernel, so as to sharpen the grayscale image;
    wherein the 3×3 convolution kernel is:
    Figure PCTCN2019091526-appb-100003
  15. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining a preset image containing text, and performing background removal on the preset image using a mean shift algorithm and a bilateral filtering algorithm;
    performing grayscale processing on the preset image after background removal, to obtain a grayscale image of the preset image;
    performing a sharpening operation on the grayscale image, to obtain an enhanced image of the grayscale image;
    extracting each text region of the enhanced image using the maximally stable extremal regions (MSER) algorithm, and obtaining position information of each text region;
    classifying the text regions according to the position information of each text region, and merging text regions of the same class to obtain a final text region.
  16. The terminal device according to claim 15, wherein the sharpening operation on the grayscale image comprises:
    performing convolution processing on the grayscale image using a 3×3 convolution kernel, so as to sharpen the grayscale image;
    wherein the 3×3 convolution kernel is:
    Figure PCTCN2019091526-appb-100004
  17. The terminal device according to claim 15, wherein the classifying the text regions according to the position information of each text region comprises:
    determining a center point of each text region according to the position information of each text region, and obtaining center point coordinates of each center point;
    determining center points whose coordinates satisfy a first preset condition as the same class, to obtain a classification result of the center points;
    classifying the text regions according to the classification result of the center points.
  18. The terminal device according to claim 15, wherein the classifying the text regions according to the position information of each text region and merging text regions of the same class to obtain the final text region comprises:
    constructing a blank canvas of the same size as the enhanced image;
    importing the extracted text regions into the blank canvas according to their arrangement positions in the enhanced image;
    performing dilation processing on each text region in the blank canvas, to obtain dilated first text regions;
    performing edge detection on each first text region, determining first text regions that are connected, and merging the connected first text regions into a connected region;
    obtaining position information of a minimum circumscribed rectangle of each connected region;
    classifying the connected regions according to the position information of each minimum circumscribed rectangle, and merging connected regions of the same class to obtain the final text region.
  19. The terminal device according to claim 18, wherein the classifying the connected regions according to the position information of each minimum circumscribed rectangle comprises:
    obtaining diagonal coordinates of each minimum circumscribed rectangle;
    classifying the connected regions according to the diagonal coordinates.
  20. The terminal device according to any one of claims 15 to 19, wherein before performing background removal on the preset image using the mean shift algorithm and the bilateral filtering algorithm, the steps further comprise:
    collecting an RGB value of each pixel in the preset image;
    extracting pixels whose RGB values satisfy a second preset condition, and deleting the extracted pixels from the preset image.
PCT/CN2019/091526 2018-11-30 2019-06-17 Text region obtaining method and apparatus, storage medium and terminal device WO2020107866A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811451778.9 2018-11-30
CN201811451778.9A CN109670500B (en) 2018-11-30 2018-11-30 Text region acquisition method and device, storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
WO2020107866A1 true WO2020107866A1 (en) 2020-06-04

Family

ID=66143422

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091526 WO2020107866A1 (en) 2018-11-30 2019-06-17 Text region obtaining method and apparatus, storage medium and terminal device

Country Status (2)

Country Link
CN (1) CN109670500B (en)
WO (1) WO2020107866A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814785A (en) * 2020-06-11 2020-10-23 浙江大华技术股份有限公司 Invoice recognition method, training method of related model, related equipment and device
CN112132807A (en) * 2020-09-23 2020-12-25 泉州装备制造研究所 Weld joint region extraction method and device based on color similarity segmentation
CN112330553A (en) * 2020-10-30 2021-02-05 武汉理工大学 Crack image denoising method, device and storage medium
CN112651399A (en) * 2020-12-30 2021-04-13 中国平安人寿保险股份有限公司 Method for detecting same-line characters in oblique image and related equipment thereof
CN113033540A (en) * 2021-04-14 2021-06-25 易视腾科技股份有限公司 Contour fitting and correcting method for scene characters, electronic device and storage medium
CN114898409A (en) * 2022-07-14 2022-08-12 深圳市海清视讯科技有限公司 Data processing method and device
CN115588202A (en) * 2022-10-28 2023-01-10 南京云阶电力科技有限公司 Contour detection-based method and system for extracting characters in electrical design drawing

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN109670500B (en) * 2018-11-30 2024-06-28 平安科技(深圳)有限公司 Text region acquisition method and device, storage medium and terminal equipment
CN110956739A (en) * 2019-05-09 2020-04-03 杭州睿琪软件有限公司 Bill identification method and device
CN110472623B (en) * 2019-06-29 2022-08-09 华为技术有限公司 Image detection method, device and system
CN110717489B (en) * 2019-09-19 2023-09-15 平安科技(深圳)有限公司 Method, device and storage medium for identifying text region of OSD (on Screen display)
CN110852229A (en) * 2019-11-04 2020-02-28 泰康保险集团股份有限公司 Method, device and equipment for determining position of text area in image and storage medium
CN112862694A (en) * 2019-11-12 2021-05-28 合肥欣奕华智能机器有限公司 Screen position correction method and device, computing equipment and storage medium
CN110929738A (en) * 2019-11-19 2020-03-27 上海眼控科技股份有限公司 Certificate card edge detection method, device, equipment and readable storage medium
CN110992353B (en) * 2019-12-13 2021-04-06 哈尔滨工业大学 Chip coating film quality detection method based on intelligent sensing
CN112287933B (en) * 2019-12-20 2022-09-06 中北大学 Method and system for removing character interference of X-ray image of automobile hub
CN112418204A (en) * 2020-11-18 2021-02-26 杭州未名信科科技有限公司 Text recognition method, system and computer medium based on paper document
CN113096099B (en) * 2021-04-14 2023-08-25 重庆交通大学 Color channel combination-based permeable asphalt mixture communication gap identification method
CN113920295A (en) * 2021-10-30 2022-01-11 平安科技(深圳)有限公司 Character detection and recognition method and device, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
EP1327955A2 (en) * 2002-01-11 2003-07-16 Hewlett-Packard Company Text extraction from a compound document
CN107977658A (en) * 2017-12-27 2018-05-01 深圳Tcl新技术有限公司 Recognition methods, television set and the readable storage medium storing program for executing in pictograph region
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN101593277A (en) * 2008-05-30 2009-12-02 电子科技大学 A kind of complicated color image Chinese version zone automatic positioning method and device
CN101901344B (en) * 2010-08-13 2012-04-25 上海交通大学 Method for detecting character image local feature based on corrosion method and DoG operator
CN102136064A (en) * 2011-03-24 2011-07-27 成都四方信息技术有限公司 System for recognizing characters from image
CN104182722B (en) * 2013-05-24 2018-05-18 佳能株式会社 Method for text detection and device and text message extracting method and system
CN108038481A (en) * 2017-12-11 2018-05-15 江苏科技大学 A kind of combination maximum extreme value stability region and the text positioning method of stroke width change

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
EP1327955A2 (en) * 2002-01-11 2003-07-16 Hewlett-Packard Company Text extraction from a compound document
CN107977658A (en) * 2017-12-27 2018-05-01 深圳Tcl新技术有限公司 Recognition methods, television set and the readable storage medium storing program for executing in pictograph region
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device

Non-Patent Citations (2)

Title
CUI, XUAN ET AL: "A Method for Small Infrared Target Detection Based on the Technique of Image Restoration", INFRARED TECHNOLOGY, vol. 36, no. 7, 31 July 2014 (2014-07-31), XP009521453, ISSN: 1001-8891 *
YANG LIU: "Research of Text Localization Algorithm Based on Radon Tilt Correction and MSER in Complex Scenes", MICROCOMPUTER & ITS APPLICATIONS, vol. 35, no. 21, 31 December 2016 (2016-12-31), pages 42 - 44,48, XP009521452, DOI: 10.19358/j.issn.1674-7720.2016.21.013 *

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN111814785A (en) * 2020-06-11 2020-10-23 浙江大华技术股份有限公司 Invoice recognition method, training method of related model, related equipment and device
CN111814785B (en) * 2020-06-11 2024-03-29 浙江大华技术股份有限公司 Invoice recognition method, training method of relevant model, relevant equipment and device
CN112132807A (en) * 2020-09-23 2020-12-25 泉州装备制造研究所 Weld joint region extraction method and device based on color similarity segmentation
CN112132807B (en) * 2020-09-23 2024-02-23 泉州装备制造研究所 Weld joint region extraction method and device based on color similarity segmentation
CN112330553A (en) * 2020-10-30 2021-02-05 武汉理工大学 Crack image denoising method, device and storage medium
CN112330553B (en) * 2020-10-30 2022-07-01 武汉理工大学 Crack image denoising method, device and storage medium
CN112651399A (en) * 2020-12-30 2021-04-13 中国平安人寿保险股份有限公司 Method for detecting same-line characters in oblique image and related equipment thereof
CN112651399B (en) * 2020-12-30 2024-05-14 中国平安人寿保险股份有限公司 Method for detecting same-line characters in inclined image and related equipment thereof
CN113033540A (en) * 2021-04-14 2021-06-25 易视腾科技股份有限公司 Contour fitting and correcting method for scene characters, electronic device and storage medium
CN114898409A (en) * 2022-07-14 2022-08-12 深圳市海清视讯科技有限公司 Data processing method and device
CN115588202A (en) * 2022-10-28 2023-01-10 南京云阶电力科技有限公司 Contour detection-based method and system for extracting characters in electrical design drawing
CN115588202B (en) * 2022-10-28 2023-08-15 南京云阶电力科技有限公司 Contour detection-based method and system for extracting characters in electrical design drawing

Also Published As

Publication number Publication date
CN109670500A (en) 2019-04-23
CN109670500B (en) 2024-06-28

Similar Documents

Publication Publication Date Title
WO2020107866A1 (en) Text region obtaining method and apparatus, storage medium and terminal device
CN109086714B (en) Form recognition method, recognition system and computer device
US10896349B2 (en) Text detection method and apparatus, and storage medium
US11164027B2 (en) Deep learning based license plate identification method, device, equipment, and storage medium
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
US9047529B2 (en) Form recognition method and device
US20190340460A1 (en) Text line detecting method and text line detecting device
WO2018145470A1 (en) Image detection method and device
CN110378313B (en) Cell cluster identification method and device and electronic equipment
WO2014026483A1 (en) Character identification method and relevant device
CN104182750A (en) Extremum connected domain based Chinese character detection method in natural scene image
CN110647882A (en) Image correction method, device, equipment and storage medium
CN110276279B (en) Method for detecting arbitrary-shape scene text based on image segmentation
CN110378351B (en) Seal identification method and device
CN109447117B (en) Double-layer license plate recognition method and device, computer equipment and storage medium
CN110942435B (en) Document image processing method and device
CN110807457A (en) OSD character recognition method, device and storage device
CN109271882B (en) Method for extracting color-distinguished handwritten Chinese characters
CN113221778B (en) Method and device for detecting and identifying handwritten form
CN116434071B (en) Determination method, determination device, equipment and medium for normalized building mask
CN112329641B (en) Form identification method, device, equipment and readable storage medium
CN110610163B (en) Table extraction method and system based on ellipse fitting in natural scene
CN112101323A (en) Method, system, electronic device and storage medium for identifying title list
JP4967045B2 (en) Background discriminating apparatus, method and program
CN113033562A (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889222

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889222

Country of ref document: EP

Kind code of ref document: A1