CN113762234A - Method and device for determining text line region - Google Patents

Method and device for determining text line region

Info

Publication number
CN113762234A
Authority
CN
China
Prior art keywords
text
line
text line
loss function
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110104724.0A
Other languages
Chinese (zh)
Inventor
杨学行
赖荣凤
梅涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110104724.0A priority Critical patent/CN113762234A/en
Publication of CN113762234A publication Critical patent/CN113762234A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method and a device for determining a text line region, and relates to the field of computer technology. One embodiment of the method comprises: acquiring a text image in which text is presented in the form of text lines; detecting centerline information of each text line using a centerline-based text line detection model; and obtaining the text line outline from the centerline information so as to determine the text line region. This embodiment effectively addresses the problem of text adhesion: even when text lines are very close together or their regions overlap, extracting the centerline of each text line allows the lines to be segmented accurately, which improves the accuracy of text line segmentation and, in turn, of subsequent text recognition. Moreover, because the text line contour (region) is derived by first determining the centerline and then expanding it using the centerline information, the method is particularly well suited to detecting long text lines.

Description

Method and device for determining text line region
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for determining a text line region.
Background
Text detection is an important component of OCR (optical character recognition) technology: it mainly uses computer vision techniques to locate text regions in an image for subsequent operations such as text recognition. Current text detection methods can be broadly divided into three types: methods based on candidate box regression, methods based on image segmentation, and methods combining segmentation with regression.
In the process of implementing the invention, the inventors found at least the following problem in the prior art: existing segmentation-based methods mainly segment a text confidence map. Although a shrink strategy can reduce text adhesion as far as possible, existing schemes usually cannot separate text lines well when the lines are very close together or their regions overlap, which causes text detection errors and in turn degrades the accuracy of subsequent text recognition.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining a text line region that can effectively address the problem of text adhesion. In particular, when text lines are very close together or their regions overlap, extracting the centerline of each text line allows the lines to be segmented accurately, which improves the accuracy of text line segmentation and, in turn, of subsequent text recognition. Moreover, because the text line contour (region) is derived by first determining the centerline and then expanding it using the centerline information, the method is particularly well suited to detecting long text lines.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of determining a text line region, including: acquiring a text image in which text is presented in the form of text lines; detecting centerline information of each text line using a centerline-based text line detection model; and obtaining a text line outline from the centerline information so as to determine the text line region.
Optionally, in the method for determining a text line region, the centerline information comprises: coordinate information of the points on the centerline, the angle of the centerline relative to a specified direction, and the vertical distances from the points on the centerline to the text line outline to be determined.
Optionally, in the method for determining a text line region, obtaining the text line outline from the centerline information includes: extracting the centerline based on connected domains, and obtaining the text line contour from the angle and the vertical distances.
Optionally, in the method for determining a text line region, the text line detection model is obtained as follows: acquiring a training sample, wherein the training sample comprises the text image and position information of text lines in the text image; and training a convolutional neural network with the training sample to obtain the text line detection model.
Optionally, the method for determining the text line region includes: during the training process, determining a loss function of the text line detection model based on one or more of: a first loss function based on a predicted value and a true value of the coordinate information of each point on the center line, a second loss function based on a predicted value and a true value of the vertical distance, and a third loss function based on a predicted value and a true value of an angle corresponding to each point on the center line.
Optionally, the method for determining the text line region further includes: determining the loss function from one or more of: a first modified loss function based on the first loss function and a first weight value, a second modified loss function based on the second loss function and a second weight value, and a third modified loss function based on the third loss function and a third weight value.
Optionally, in the method for determining a text line region, the convolutional neural network is one of: ResNet, VGG, MobileNet, PVANet, DenseNet.
To achieve the above object, according to a second aspect of embodiments of the present invention, there is provided an apparatus for determining a text line region, including: the image acquisition module is used for acquiring a text image, wherein the text in the text image is presented in a text line form; the center line detection module is used for detecting and obtaining center line information of the text line by using a text line detection model based on a center line; and the text line outline determining module is used for obtaining a text line outline according to the central line information so as to determine the text line area.
Optionally, the apparatus for determining a text line region further includes: the model training module is used for acquiring training samples, and the training samples comprise the text images and the position information of the text lines in the text images; and training a convolutional neural network by using the training sample to obtain the text line detection model.
To achieve the above object, according to a third aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program, wherein the program, when executed by a processor, implements any one of the methods for determining a text line region described above.
One embodiment of the above invention has the following advantages or benefits: because the text line outline is determined by detecting information about the text line centerline, the problem of text adhesion can be effectively addressed. In particular, when text lines are very close together or their regions overlap, extracting the text line centerline allows each line to be segmented accurately, which improves the accuracy of text line segmentation and, in turn, of subsequent text recognition. Moreover, because the text line contour (region) is derived by first determining the centerline and then expanding it using the centerline information, the method is particularly well suited to detecting long text lines.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIGS. 1A and 1B illustrate examples of text line adhesion in a text image;
FIG. 2 shows a flow of a method of determining text line regions according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a main flow of determining text line regions according to an embodiment of the invention;
FIG. 4 illustrates a flow of a method of training a text line detection model according to an embodiment of the invention;
FIG. 5 is a schematic diagram of a process for training a text line detection model according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the main modules of an apparatus for determining a text line region according to an embodiment of the present invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 8 is a schematic structural diagram of a computer system of a terminal device or a server suitable for implementing the embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Text detection is an important component of OCR (optical character recognition) technology: it mainly uses computer vision techniques to locate text regions in an image for subsequent operations such as text recognition. In practice, however, it is difficult to avoid cases where adjacent text lines in a text image are too close together or where their regions overlap. FIGS. 1A and 1B illustrate examples of text line adhesion in a text image; the broken line in each figure indicates the centerline of the text line it belongs to. As shown in FIG. 1A, the horizontal text line "December 1, clear weather, wind force 4, temperature between -3°C and -7°C" and the slanted text line "December 24, Christmas, the xx mall will run a promotion" overlap, which interferes with determining each text line region. As shown in FIG. 1B, in the text line "December 1, clear weather, wind force 4, the temperature drops below zero", the words "December 1" are set in a large font, causing that line to overlap the horizontal text line "December 24, Christmas, the xx mall will run a promotion"; this likewise interferes with determining the respective text line regions. It should be noted that the text boxes drawn with solid lines in FIGS. 1A and 1B are merely illustrative; in practice, the boxes may, for example, fit the text more tightly.
Fig. 2 shows a flow of a method of determining a text line region according to an embodiment of the present invention.
As shown in FIG. 2, in S201, a text image is acquired, wherein the text in the text image is presented in the form of text lines. The acquired text image may be the image shown in FIG. 1A, or a text image in which adjacent text lines are neither too close together nor overlapping. In addition, a text line as described in the present disclosure refers to a text line image.
In S203, centerline information of each text line is detected using a centerline-based text line detection model, wherein the centerline information comprises: coordinate information of the points on the centerline, the angle of the centerline relative to a specified direction, and the vertical distances from the points on the centerline to the text line outline to be determined. The training process of the text line detection model is described later.
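For concreteness, the per-point centerline information could be represented as follows. This is a minimal sketch in Python; the type and field names are illustrative, not taken from the patent.

```python
# Minimal sketch of the per-point centerline information described
# above; the class and field names are illustrative, not from the patent.
from dataclasses import dataclass


@dataclass
class CenterlinePoint:
    x: float       # coordinate information of the point on the centerline
    y: float
    angle: float   # angle of the centerline relative to a specified
                   # direction (e.g. horizontal), in radians
    h_up: float    # vertical distance to the upper text line contour
    h_down: float  # vertical distance to the lower text line contour
```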
Because the centerline is made up of points, obtaining the coordinate information of those points yields the position of the centerline in the image. Taking the ResNet50 model as an example, the coordinates of points on the centerline can be obtained through the model's shrink strategy, from which the position of the centerline follows. In one embodiment, when the text line outline is a special quadrilateral such as a rectangle or a parallelogram, the centerline can be obtained directly by connecting the midpoints of the sides that make up the outline.
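A minimal sketch of that special-case construction, assuming the four corners are given in top-left, top-right, bottom-right, bottom-left order (an assumed convention, not one fixed by the patent):

```python
import numpy as np


def midline_of_quad(corners):
    """Centerline of a rectangle/parallelogram text box.

    corners: 4x2 array ordered top-left, top-right, bottom-right,
    bottom-left (an assumed convention). Returns the two endpoints
    of the centerline.
    """
    tl, tr, br, bl = np.asarray(corners, dtype=float)
    left_mid = (tl + bl) / 2.0   # midpoint of the left side
    right_mid = (tr + br) / 2.0  # midpoint of the right side
    return left_mid, right_mid


# Example: an axis-aligned 100x20 box gives a horizontal centerline.
print(midline_of_quad([[0, 0], [100, 0], [100, 20], [0, 20]]))
# -> (array([ 0., 10.]), array([100., 10.]))
```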
In the embodiment of the invention, each predicted point on the text line centerline has a corresponding angle, from which the direction of the text line can be obtained. The angle corresponding to a point on the centerline can be understood as the angle between the "direction" of the text line and a specified direction such as the horizontal direction; it will be appreciated that the specified direction may also be another direction. For example, where text lines (columns) run from top to bottom, the vertical direction is preferably used as the specified direction. When a text line follows a curve, the "direction" of the text line at a point appears as the tangent direction of the centerline at that point.
In this context, the vertical distance from a point on the centerline to the text line outline to be determined is also referred to as the height of that point. FIG. 1B shows an example of the upper and lower vertical distances "h1" and "h2": the upper vertical distance (upper height) is the vertical distance from a point on the centerline to the upper contour, and the lower vertical distance (lower height) is the vertical distance from that point to the lower contour. Providing the upper and lower vertical distances separately helps reduce errors in describing the text region, especially when the points on the centerline cannot be placed exactly at the midpoints between the upper and lower contours.
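As an illustration of how a centerline point, its angle, and its two heights determine contour points, a sketch follows; the image-coordinate convention (y grows downward) and the function name are assumptions made for this example.

```python
import numpy as np


def contour_points(x, y, angle, h_up, h_down):
    """Map one centerline point to its upper/lower contour points.

    Assumes image coordinates (y grows downward) and `angle` in
    radians relative to the horizontal; both are conventions chosen
    for this sketch, not fixed by the patent.
    """
    # Unit normal to the centerline direction that points "up" on
    # screen (toward smaller y) when the line is horizontal.
    n = np.array([np.sin(angle), -np.cos(angle)])
    p = np.array([x, y], dtype=float)
    upper = p + h_up * n    # point on the upper contour
    lower = p - h_down * n  # point on the lower contour
    return upper, lower


# Example: a horizontal-line point at (50, 40) with h1 = h2 = 10.
print(contour_points(50, 40, 0.0, 10, 10))
# -> (array([50., 30.]), array([50., 50.]))
```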
In S204, a text line outline is obtained from the centerline information to determine the text line region. Specifically, for the predicted centerline points, the related points are gathered in a connected-domain manner (e.g., 4-connectivity or 8-connectivity) and the centerline is extracted; a point is related to a predicted centerline point if it has the same pixel value or gray value, or if the difference falls within a predefined threshold range. Then, for each point on the finally determined centerline, the text line contour is obtained from the angle and the vertical distances. In summary, obtaining the text line outline from the centerline information may specifically include: extracting the centerline based on connected domains, and obtaining the text line contour from the angle and the vertical distances. In addition, in a preferred embodiment, during connected-domain centerline extraction, the width of the sliding window used for the connectivity check can be set to 1/8 of the height, and the window is slid along the direction of the text line centerline to fuse the results, so that a broken text line is joined back together.
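A sketch of the connected-domain extraction step using OpenCV's connected-component labelling; the 0.5 binarization threshold and the choice of 8-connectivity are illustrative assumptions.

```python
import cv2
import numpy as np


def extract_centerlines(centerline_prob, thresh=0.5):
    """Group predicted centerline pixels into per-line point sets.

    centerline_prob: HxW map of per-pixel centerline confidence.
    The 0.5 threshold and 8-connectivity are illustrative choices.
    """
    binary = (centerline_prob > thresh).astype(np.uint8)
    num, labels = cv2.connectedComponents(binary, connectivity=8)
    # Label 0 is the background; each remaining label is one centerline.
    return [np.column_stack(np.where(labels == k)) for k in range(1, num)]
```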
FIG. 3 is a schematic diagram of the main flow of determining a text line region according to an embodiment of the present invention. As shown in FIG. 3, a text image whose text line outlines are to be determined is input to the text line detection model; the model predicts the centerline information of each text line in the image, comprising the centerline position (the coordinate information of the points on the centerline), the centerline angle (the angle of the centerline relative to a specified direction), and the centerline height (the upper and lower vertical distances from the points on the centerline to the text line outline to be determined); finally, the text line outline is obtained from the centerline information, so that the text line region can be determined, which in turn facilitates subsequent text recognition.
FIG. 4 shows a flow of a method of training a text line detection model according to an embodiment of the invention.
As shown in fig. 4, in S401, a training sample is acquired, where the training sample includes the text image and position information of a text line in the text image. Specifically, the text image is used as input data of the model, and the position information of the text line marked in advance is used as a label of the training sample.
In S402, a convolutional neural network is trained using the training samples to obtain the text line detection model. With this model, after the position information of the text lines in an input text image is detected, the centerline information can be further derived from that position information. It can be understood that the points on a centerline are spatially continuous, and their attribute information (i.e., the coordinates, angles and heights of the points on the centerline) is likewise consistent or varies slowly and continuously. A constraint is therefore introduced requiring the regression results to be as smooth as possible. Specifically, the loss function used during training is defined as:
Loss = L_1 + L_2 + L_3 + L_sim    (1)
L_1 = BCE(P_centerline, T_centerline)    (2)
L_2 = cos(P_angle - T_angle)    (3)
L_3 = smooth-L1(P_height - T_height)    (4)
L_sim = ||sobel(P - T)||_1    (5)
wherein L_sim is the regularization part of the loss function; BCE, cos and smooth-L1 are all common loss functions. It will be appreciated that L_1, L_2, L_3 and L_sim can also be weighted with different weights to obtain the final loss function Loss. Specifically, ||·||_1 denotes the L1 norm taken over each point on the predicted centerline, i.e., the sum of the absolute values of the differences between the true and predicted values of the point attributes on the centerline. P and T denote the predicted and true values of a point attribute on the centerline, e.g., the predicted coordinates of a point versus the true coordinates given in the sample, the predicted angle versus the true angle given in the sample, or the predicted height versus the true height given in the sample.

In addition, the sobel operator is a discrete differentiation operator, generally used to compute an approximate gradient of the image gray scale; the larger the gradient, the more likely a pixel is to lie on an edge. Here the sobel operator is used to evaluate the difference between the predicted and true values of the points on the centerline, that is, the degree of smoothness. In other embodiments, the sobel operator can be replaced by other gradient-related operators. By introducing this loss term and penalizing unevenness, model iteration keeps the centerline as continuous as possible and its information varying smoothly.
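A sketch of equations (1)-(5) in PyTorch follows. Two points are assumptions rather than statements from the patent: the angle term is implemented as 1 - cos(P_angle - T_angle) so that it is minimized at zero angle error (equation (3) as printed reads cos(P - T)), and the smoothness term is shown applied to the height map as one instance of the generic P, T.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for x and y image gradients, shaped for conv2d.
_SOBEL_X = torch.tensor([[-1., 0., 1.],
                         [-2., 0., 2.],
                         [-1., 0., 1.]]).view(1, 1, 3, 3)
_SOBEL_Y = _SOBEL_X.transpose(2, 3)


def sobel_l1(p, t):
    """L1 norm of the Sobel gradient of the prediction error.

    p, t: (N, 1, H, W) maps of one centerline point attribute.
    Large values mean the error varies sharply, so minimizing this
    term penalizes non-smooth predictions along the centerline.
    """
    d = p - t
    gx = F.conv2d(d, _SOBEL_X.to(d), padding=1)
    gy = F.conv2d(d, _SOBEL_Y.to(d), padding=1)
    return gx.abs().sum() + gy.abs().sum()


def total_loss(p_center, t_center, p_angle, t_angle, p_h, t_h):
    """Sketch of equations (1)-(5); see the assumptions above."""
    l1 = F.binary_cross_entropy(p_center, t_center)   # eq. (2)
    # eq. (3): implemented as 1 - cos(.) (an assumption) so the
    # term is minimized when the predicted angle matches the truth.
    l2 = (1.0 - torch.cos(p_angle - t_angle)).mean()
    l3 = F.smooth_l1_loss(p_h, t_h)                   # eq. (4)
    l_sim = sobel_l1(p_h, t_h)                        # eq. (5)
    return l1 + l2 + l3 + l_sim                       # eq. (1)
```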
Further, the loss function may be computed by combining the differences between the true and predicted values of the different attributes of each point on the centerline. Specifically, the loss function of the text line detection model is determined based on one or more of the following: a first loss function based on the predicted and true values of the coordinate information of each point on the centerline, a second loss function based on the predicted and true values of the vertical distances of each point on the centerline, and a third loss function based on the predicted and true values of the angle corresponding to each point on the centerline.
L_sim = w_1 × ||sobel(P_1 - T_1)||_1 + w_2 × ||sobel(P_2 - T_2)||_1 + w_3 × ||sobel(P_3 - T_3)||_1    (6)
wherein P_1, T_1 respectively denote the predicted and true values of the coordinate information of each point on the centerline; ||sobel(P_1 - T_1)||_1 denotes the first loss function, and w_1 is the first weight value of the first loss function; in the special case where the first loss function is not counted in the final loss function L_sim, w_1 = 0.

Similarly, P_2, T_2 respectively denote the predicted and true values of the upper and lower vertical distances corresponding to each point on the centerline; ||sobel(P_2 - T_2)||_1 denotes the second loss function, and w_2 is the second weight value of the second loss function; in the special case where the second loss function is not counted in the final loss function L_sim, w_2 = 0. P_3, T_3 respectively denote the predicted and true values of the angle corresponding to each point on the centerline; ||sobel(P_3 - T_3)||_1 denotes the third loss function, and w_3 is the third weight value; in the special case where the third loss function is not counted in the final loss function L_sim, w_3 = 0.
In one embodiment, if the loss functions associated with the individual attributes (i.e., the first, second and third loss functions) contribute equally to the final loss function L_sim, then w_1 = w_2 = w_3 = 1 or w_1 = w_2 = w_3 = 1/3. It is to be understood that the present disclosure does not limit the specific values of w_1, w_2, w_3.
Thus, in summary, the loss function is determined based on one or more of the following: a first modified loss function based on the first loss function and the first weight value (i.e., w_1 × ||sobel(P_1 - T_1)||_1), a second modified loss function based on the second loss function and the second weight value (i.e., w_2 × ||sobel(P_2 - T_2)||_1), and a third modified loss function based on the third loss function and the third weight value (i.e., w_3 × ||sobel(P_3 - T_3)||_1).
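Reusing sobel_l1 from the previous sketch, equation (6) could be written as follows; the attribute ordering and default weights are illustrative.

```python
def weighted_l_sim(preds, targets, weights=(1.0, 1.0, 1.0)):
    """Equation (6): weighted per-attribute smoothness terms.

    preds/targets: (coords, heights, angles) maps, each (N, 1, H, W).
    A weight of 0 drops that attribute's term from L_sim, as the text
    describes; w_1 = w_2 = w_3 = 1 (or 1/3) treats all terms alike.
    """
    return sum(w * sobel_l1(p, t)
               for w, p, t in zip(weights, preds, targets))
```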
Further, the convolutional neural network is one of: ResNet, VGG, MobileNet, PVANet, DenseNet.
FIG. 5 is a diagram illustrating a process of training a text line detection model according to an embodiment of the invention. In FIG. 5, taking ResNet50 as an example, the features of stage 5, stage 4, stage 3 and stage 2 are extracted and fused layer by layer. Finally, the predicted centerline information is output, namely the predicted values of: the coordinate information of the points on the centerline, the angle of the centerline relative to a specified direction, and the vertical distances from the points on the centerline to the text line outline to be determined.
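A sketch of this backbone arrangement with torchvision follows. The mapping of stages 2-5 to ResNet's layer1-layer4 and the concatenation-based top-down fusion are assumptions for illustration, since the patent does not spell out the fusion operator.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Pull the stage-2..5 feature maps out of a ResNet50 backbone
# (layer1..layer4 in torchvision naming).
backbone = create_feature_extractor(
    resnet50(weights=None),
    return_nodes={"layer1": "c2", "layer2": "c3",
                  "layer3": "c4", "layer4": "c5"})


def fuse(features):
    """Layer-by-layer top-down fusion, FPN style (an assumption)."""
    c2, c3, c4, c5 = (features[k] for k in ("c2", "c3", "c4", "c5"))
    p = c5
    for c in (c4, c3, c2):
        # Upsample the coarser map and fuse it with the finer one.
        p = F.interpolate(p, size=c.shape[-2:], mode="bilinear",
                          align_corners=False)
        p = torch.cat([p, c], dim=1)
    return p  # fed to heads predicting centerline, angle, height maps


feats = backbone(torch.randn(1, 3, 224, 224))
print(fuse(feats).shape)  # channels grow with each concat in this sketch
```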
FIG. 6 is a schematic diagram of the main modules of an apparatus for determining a text line region according to an embodiment of the present invention. The apparatus comprises: an image acquisition module for acquiring a text image in which text is presented in the form of text lines; a centerline detection module for detecting centerline information of each text line using a centerline-based text line detection model; and a text line outline determination module for obtaining a text line outline from the centerline information so as to determine the text line region.
The apparatus further comprises a model training module for acquiring training samples, the training samples comprising the text images and the position information of the text lines in the text images, and for training a convolutional neural network with the training samples to obtain the text line detection model.
Fig. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. Network 704 is used to provide a medium for communication links between terminal devices 701, 702, 703 and server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 701, 702, 703 to interact with a server 705 over a network 704, to receive or send messages or the like. The terminal devices 701, 702, 703 may be terminals for sending image text to the server.
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server providing various services, such as a background server (for example only) that analyzes text images provided by the terminal devices 701, 702, 703 to detect text line centerlines and thereby determine text line outlines.
It should be noted that the method for determining the text line region provided by the embodiment of the present invention is generally performed by the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device or server implementing an embodiment of the present invention. The terminal device or server shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores the various programs and data necessary for the operation of the system 800. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units (or "modules") mentioned in the embodiments of the present invention may be implemented by software, or may be implemented by hardware. The described units (or "modules") may also be provided in a processor, and may be described, for example, as: a processor includes an image acquisition unit (or "module"), a centerline detection unit, and a text line outline determination unit. The names of these units do not in some cases constitute a limitation on the units themselves, and for example, the image acquisition unit may also be described as a "unit that acquires a text image from information presented by a client".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may be separate and not incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to: acquire a text image, wherein text in the text image is presented in the form of text lines; detect centerline information of the text lines using a text line detection model; and obtain a text line outline according to the centerline information so as to determine the text line region.
According to the technical scheme of the embodiments of the invention, because the text line outline is determined by detecting information about the text line centerline, the problem of text adhesion can be effectively addressed. In particular, when text lines are very close together or their regions overlap, extracting the text line centerline allows each line to be segmented accurately, which improves the accuracy of text line segmentation and, in turn, of subsequent text recognition. Moreover, because the text line contour (region) is derived by first determining the centerline and then expanding it using the centerline information, the method is particularly well suited to detecting long text lines.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method of determining text line regions, comprising:
acquiring a text image, wherein texts in the text image are presented in a text line form;
detecting to obtain the center line information of the text line by using a text line detection model based on the center line;
and obtaining a text line outline according to the center line information so as to determine the text line area.
2. The method of claim 1, wherein the centerline information comprises: coordinate information of points on the centerline, an angle of the centerline relative to a specified direction, and vertical distances from the points on the centerline to a text line outline to be determined.
3. The method of claim 2, wherein obtaining a text line outline from the centerline information comprises:
extracting a centerline based on connected domains, and obtaining the text line contour according to the angle and the vertical distances.
4. The method of claim 1, wherein the text line detection model is obtained according to:
acquiring a training sample, wherein the training sample comprises the text image and position information of a text line in the text image;
and training a convolutional neural network by using the training sample to obtain the text line detection model.
5. The method of claim 4, comprising:
during the training process, determining a loss function of the text line detection model according to one or more of the following: a first loss function based on the predicted value and the true value of the coordinate information of each point on the centerline, a second loss function based on the predicted value and the true value of the vertical distances of each point on the centerline, and a third loss function based on the predicted value and the true value of the angle corresponding to each point on the centerline.
6. The method of claim 5, comprising:
determining the loss function from one or more of: a first modified loss function based on the first loss function and a first weight value, a second modified loss function based on the second loss function and a second weight value, and a third modified loss function based on the third loss function and a third weight value.
7. The method of claim 4, wherein the convolutional neural network is one of: ResNet, VGG, MobileNet, PVANet, DenseNet.
8. An apparatus for determining text line regions, comprising:
the image acquisition module is used for acquiring a text image, wherein the text in the text image is presented in a text line form;
the center line detection module is used for detecting and obtaining center line information of the text line by using a text line detection model based on a center line;
and the text line outline determining module is used for obtaining a text line outline according to the central line information so as to determine the text line area.
9. The apparatus of claim 8, further comprising:
the model training module is used for acquiring training samples, and the training samples comprise the text images and the position information of the text lines in the text images; and training a convolutional neural network by using the training sample to obtain the text line detection model.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202110104724.0A 2021-01-26 2021-01-26 Method and device for determining text line region Pending CN113762234A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104724.0A CN113762234A (en) 2021-01-26 2021-01-26 Method and device for determining text line region

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110104724.0A CN113762234A (en) 2021-01-26 2021-01-26 Method and device for determining text line region

Publications (1)

Publication Number Publication Date
CN113762234A true CN113762234A (en) 2021-12-07

Family

ID=78786445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110104724.0A Pending CN113762234A (en) 2021-01-26 2021-01-26 Method and device for determining text line region

Country Status (1)

Country Link
CN (1) CN113762234A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704507A (en) * 2023-05-23 2023-09-05 读书郎教育科技有限公司 Accurate recognition method for dictionary pen area content


Similar Documents

Publication Publication Date Title
CN108038880B (en) Method and apparatus for processing image
CN108229485B (en) Method and apparatus for testing user interface
CN112560862B (en) Text recognition method and device and electronic equipment
CN113674287A (en) High-precision map drawing method, device, equipment and storage medium
CN108182457B (en) Method and apparatus for generating information
CN113205041A (en) Structured information extraction method, device, equipment and storage medium
CN113378696A (en) Image processing method, device, equipment and storage medium
CN109960959B (en) Method and apparatus for processing image
CN111738252B (en) Text line detection method, device and computer system in image
CN110472673B (en) Parameter adjustment method, fundus image processing device, fundus image processing medium and fundus image processing apparatus
CN113553428B (en) Document classification method and device and electronic equipment
CN113762234A (en) Method and device for determining text line region
CN112967248B (en) Method, apparatus, medium and program product for generating defect image samples
CN113780294B (en) Text character segmentation method and device
CN113420727B (en) Training method and device of form detection model and form detection method and device
CN115719444A (en) Image quality determination method, device, electronic equipment and medium
CN115359502A (en) Image processing method, device, equipment and storage medium
CN115239700A (en) Spine Cobb angle measurement method, device, equipment and storage medium
CN111291758B (en) Method and device for recognizing seal characters
CN114511862A (en) Form identification method and device and electronic equipment
CN114612971A (en) Face detection method, model training method, electronic device, and program product
CN113887394A (en) Image processing method, device, equipment and storage medium
CN111383193A (en) Image restoration method and device
CN113378836A (en) Image recognition method, apparatus, device, medium, and program product
CN113361371A (en) Road extraction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination