CN114663873A - Text region determination method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN114663873A CN202210320579.4A
- Authority
- CN
- China
- Prior art keywords
- text
- region
- determining
- candidate
- rectangular reference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Input (AREA)
- Image Analysis (AREA)
Abstract
The disclosure relates to the technical field of text detection, in particular to a text content determination method and device, a computer readable storage medium and an electronic device, wherein the method comprises the following steps: acquiring an initial image comprising a text, and determining a rectangular reference region comprising the text in the initial image, wherein one edge of the rectangular reference region is parallel to a reference direction in the initial image; determining a candidate region according to the rectangular reference region; determining text distribution information of each candidate area in the candidate areas; a target text region is determined in the candidate region based on the text distribution information. According to the technical scheme of the embodiment of the disclosure, under the condition of ensuring the precision, the calculation amount is reduced, and the calculation speed is increased.
Description
Technical Field
The disclosure relates to the technical field of information display, and in particular to a text region determination method and device, a computer-readable storage medium and an electronic device.
Background
With the development of image recognition technology, the technology for determining text regions in an image to obtain corresponding texts is becoming more and more widespread.
In the prior art, methods for determining a text region in an image to obtain the corresponding text suffer from either low calculation speed or insufficient precision; that is, precision and calculation speed cannot both be achieved.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The purpose of the present disclosure is to provide a text region determining method, a text region determining apparatus, a computer readable medium, and an electronic device, so that under the condition of ensuring accuracy, the amount of calculation is reduced, and the calculation speed is increased.
According to a first aspect of the present disclosure, there is provided a text region determination method including: acquiring an initial image comprising the text, and determining a rectangular reference area comprising the text in the initial image, wherein one edge of the rectangular reference area is parallel to a reference direction in the initial image; determining a candidate region according to the rectangular reference region; determining text distribution information of each candidate area in the candidate areas; and determining a target text region in the candidate region based on the text distribution information.
According to a second aspect of the present disclosure, there is provided a text region determination apparatus including: a target detection module, configured to acquire an initial image comprising the text and determine a rectangular reference region comprising the text in the initial image, wherein one edge of the rectangular reference region is parallel to a reference direction in the initial image; a first determining module, configured to determine a candidate region according to the rectangular reference region; an information extraction module, configured to determine text distribution information of each candidate region among the candidate regions; and a second determining module, configured to determine a target text region in the candidate region based on the text distribution information.
According to a third aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the above-mentioned method.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus, comprising: one or more processors; and memory storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the above-described method.
In the text region determination method provided by one embodiment of the present disclosure, an initial image including a text is acquired, and a rectangular reference region including the text is determined in the initial image, wherein one edge of the rectangular reference region is parallel to a reference direction in the initial image; a candidate region is determined according to the rectangular reference region; text distribution information of each candidate region is determined; and a target text region is determined in the candidate region based on the text distribution information. Compared with the prior art, on the one hand, an approximate rectangular reference region is determined in the initial image and the candidate region is then determined in the rectangular reference region based on the geometric relationship, so target detection can start without correcting the image, which reduces the amount of calculation. Because one edge of the rectangular reference region is parallel to the reference direction in the initial image, the detection frames share the same angle during detection, which further reduces the amount of calculation and increases the calculation speed. On the other hand, after the candidate region is determined, the target text region is determined based on the text distribution information of the candidate region, which ensures the accuracy of the determined text region. The accuracy of text region determination is thus ensured while the amount of calculation is reduced and the determination speed is increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
fig. 1 shows a schematic view of a bank card image without a rotation angle;
fig. 2 shows a schematic view of a bank card image with a rotation angle;
FIG. 3 illustrates a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure may be applied;
FIG. 4 schematically illustrates a flow chart of a text region determination method in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart for obtaining a rectangular reference region in an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram schematically illustrating a rectangular reference area in an exemplary embodiment of the present disclosure;
FIG. 7 is a diagram schematically illustrating a geometric relationship between an inscribed rectangle and a rectangular reference area in an exemplary embodiment of the present disclosure;
FIG. 8 schematically illustrates a geometric relationship diagram of another inscribed rectangle and a rectangular reference region in an exemplary embodiment of the present disclosure;
FIG. 9 schematically illustrates a target image corresponding to a rectangular reference region in an exemplary embodiment of the disclosure;
fig. 10 schematically illustrates a position diagram of an inscribed rectangle abcd in an initial image in an exemplary embodiment of the present disclosure;
fig. 11 schematically illustrates a target image corresponding to an inscribed rectangle abcd in an exemplary embodiment of the present disclosure;
FIG. 12 schematically illustrates a position diagram of an inscribed rectangle efgj in an initial image in an exemplary embodiment of the present disclosure;
FIG. 13 schematically illustrates a target image corresponding to an inscribed rectangle efgj in an exemplary embodiment of the present disclosure;
FIG. 14 schematically illustrates a location graph of keypoints in an exemplary embodiment of the disclosure;
FIG. 15 schematically illustrates a location graph of a straight line fitted from keypoints in an exemplary embodiment of the disclosure;
FIG. 16 schematically illustrates a location diagram of a minimum bounding rectangle in an exemplary embodiment of the present disclosure;
FIG. 17 is a flow chart of a preferred embodiment of the text region determination method of the present disclosure;
fig. 18 schematically illustrates a composition diagram of the text region determining apparatus in an exemplary embodiment of the present disclosure;
fig. 19 shows a schematic diagram of an electronic device to which an embodiment of the present disclosure may be applied.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, with the popularity of portable electronic devices such as mobile phones and the large-scale construction of communication infrastructures such as 4G and 5G, mobile payments represented by mobile phone payments are more and more popular, and partial characteristics of the "cashless" society are realized to a certain extent.
The first step of mobile phone payment is for the user to log in to a payment account on the mobile phone and bind a real-name bank card. An important part of this step is that the user enters identity information and then enters the bank card number for verification. However, a bank card number generally has many digits, and its font differs greatly from conventional printed and handwritten characters, so the probability of input errors is high; the user may need to re-enter and confirm the number repeatedly, or the account may even be temporarily frozen, which seriously degrades the user experience. Research on automatic bank card number recognition based on image recognition therefore has significant application value.
Currently, mainstream bank card number region detection schemes fall into two categories. The first is based on traditional computer vision methods such as filters and morphological processing, and locates the card number region through edge detection; its detection precision is poor. The second is based on target-detection CNNs (Convolutional Neural Networks): a model is trained on large amounts of data, acquires an automatic labeling capability, and locates the target region directly. This approach achieves very high performance, but it is difficult to train and can generally locate only horizontal rectangular frames, so it performs poorly on rotated images. As shown in fig. 1 and 2, when the image has a rotation angle, detection accuracy can be guaranteed only by correcting the image, which requires a large amount of calculation; if the image is not corrected, the horizontal detection frame output by the CNN-based target detection scheme contains a large amount of irrelevant background, which degrades the recognition module.
Fig. 3 shows a schematic diagram of a system architecture. The system architecture 300 may include a terminal 310 and a server 320. The terminal 310 may be a terminal device such as a smart phone, a tablet computer, a desktop computer, or a notebook computer. The server 320 generally refers to a background system that provides services related to text region determination in the exemplary embodiment, and may be a single server or a cluster formed by multiple servers. The terminal 310 and the server 320 may be connected through a wired or wireless communication link to exchange data.
In one embodiment, the text region determination method described above may be performed by the terminal 310. For example, after the user takes an image using the terminal 310 or the user selects an image in an album of the terminal 310, the terminal 310 determines a text region of the image and outputs a target text region.
In one embodiment, the text region determination method described above may be performed by the server 320. For example, after the user uses the terminal 310 to photograph an image or the user selects an image in an album of the terminal 310, the terminal 310 uploads the image to the server 320, the server 320 determines a text region of the image, and returns a target text region to the terminal 310.
As can be seen from the above, the main body of execution of the text region determining method in the present exemplary embodiment may be the terminal 310 or the server 320, which is not limited by the present disclosure.
The text region determining method in the present exemplary embodiment is described below with reference to fig. 4, where fig. 4 shows an exemplary flow of the text region determining method, and may include:
step S410, obtaining an initial image comprising the text, and determining a rectangular reference area comprising the text in the initial image, wherein one edge of the rectangular reference area is parallel to a reference direction in the initial image;
step S420, determining a candidate area according to the geometric relation of the rectangular reference area;
step S430, determining text distribution information of each candidate area in the candidate areas;
step S440, determining a target text region in the candidate region based on the text distribution information.
Based on this method, on the one hand, a rectangular reference region is determined in the initial image and the candidate region is then determined in the rectangular reference region based on the geometric relationship, so target detection can start without correcting the image, which reduces the amount of calculation. Because one edge of the rectangular reference region is parallel to the reference direction in the initial image, the detection frames share the same angle during detection, which reduces the amount of calculation in the detection process and increases the calculation speed. On the other hand, after the candidate regions are determined, the target text region is determined based on the text distribution information of the candidate regions, which ensures the accuracy of the determined text region. The accuracy of text region determination is thus ensured while the amount of calculation is reduced and the determination speed is increased.
Each step in fig. 4 is explained in detail below.
Referring to fig. 4, in step S410, an initial image including the text is acquired, and a rectangular reference area including the text is determined in the initial image, wherein one edge of the rectangular reference area is parallel to a reference direction in the initial image.
In this exemplary embodiment, the initial image includes a text, and the initial image may be an image corresponding to a card such as a bank card or an identification card, or may be a document image, and may also be customized according to a user requirement, which is not specifically limited in this exemplary embodiment.
When the initial image is an image corresponding to a card such as a bank card, an identity card and the like, the text may include a bank card number, an identity card number and the like.
In the present exemplary embodiment, the initial image may be a rectangle, a triangle, a circle, or the like, and the shape of the initial image is not particularly limited in the present exemplary embodiment. The initial image may be a color image of 3 channels, or a grayscale image of 1 channel, and may also be customized according to a user requirement, which is not specifically limited in this exemplary embodiment.
For example, if the initial image is a rectangle, the reference direction may be parallel to one of the edges of the initial image.
In the present exemplary embodiment, when the rectangular reference region is acquired, steps S510 to S530 may be included, and the steps S510 to S530 are described in detail below.
In step S510, a plurality of intermediate text regions are obtained by performing object detection on the initial image.
In step S520, the accuracy of each of the intermediate text regions and the confidence that each of the intermediate text regions includes a preset type of text are determined.
Specifically, as shown in fig. 6, a rectangular coordinate system may first be established and the initial image placed in it, and the reference direction may be a direction parallel to either coordinate axis. A plurality of intermediate text regions may be obtained by performing target detection on the initial image with rectangular detection frames, each of which has an edge parallel to the reference direction. Because the direction of the rectangular detection frames is fixed, detection frames in other directions are not required, which reduces the number of rectangular detection frames and the amount of calculation in the detection process.
Taking the initial image as an image including a bank card and the preset text type as a bank card number as an example, referring to fig. 6, the processor may output n 6-dimensional vectors after performing a plurality of cascaded operations such as convolution and pooling on the initial image. Here n depends on the specific network structure; n may be any positive integer greater than or equal to 1000 and less than or equal to 99999, such as 10000 or 20000, and may also be customized according to requirements, which is not specifically limited in this exemplary embodiment. A single prediction vector has the form (x, y, w, h, c1, c2), where x, y, w and h are positive numbers that respectively represent the horizontal coordinate of the center point of the intermediate text region, the vertical coordinate of the center point, the width of the intermediate text region and the height of the intermediate text region. A plurality of intermediate text regions is thus obtained, and each 6-dimensional vector represents one intermediate text region.
In the present exemplary embodiment, c1 and c2 are positive numbers between 0 and 1 that respectively represent the confidence that the center point coordinates are accurate and the confidence that the region inside the frame contains a bank card number.
In the present exemplary embodiment, the product of c1 and c2 may be used as the confidence that the intermediate text region includes the preset type of text.
In step S530, the rectangular reference region is determined in the plurality of intermediate text regions according to the accuracy and the confidence.
In the present exemplary embodiment, after n intermediate text regions are obtained, the one with the largest product of c1 and c2, i.e., the intermediate text region with the highest confidence of including the preset type of text, is taken as the rectangular reference region. As shown in fig. 6, the prediction result is (X, Y, W, H, C1, C2), and the rectangular reference region is the rectangle ABCD.
In an example embodiment of the present disclosure, a preset threshold may be set first, where the preset threshold may be 0.5, or 0.4, 0.6, and the like, and may also be customized according to a user requirement, and is not limited in this example embodiment.
In the present exemplary embodiment, take as an example the case where the upper left corner of the initial image is the origin of coordinates, the AB direction is the positive X-axis direction, the AD direction is the positive Y-axis direction, and the preset threshold is 0.5. If C1 × C2 is smaller than the preset threshold, the initial image does not include a card number region, and the subsequent operations need not be performed. If C1 × C2 is greater than or equal to the preset threshold, the coordinates of the center point P of the rectangular reference region ABCD are determined as (X, Y), and the coordinates of points A, B, C and D are (X - W/2, Y - H/2), (X + W/2, Y - H/2), (X + W/2, Y + H/2) and (X - W/2, Y + H/2), respectively. Here W denotes the length of the long side of the rectangular reference region and H denotes the length of the short side; in this disclosure, "length" refers to the long side and "width" refers to the short side. Determining the rectangular reference region in this way ensures accuracy while reducing the amount of calculation of the target detection algorithm.
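The selection and thresholding steps described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `select_reference_region` and the list-of-tuples input format are assumptions made for the example, and the network that produces the prediction vectors is not modeled.

```python
from typing import List, Optional, Tuple

# One prediction vector: (x, y, w, h, c1, c2)
Prediction = Tuple[float, float, float, float, float, float]
Point = Tuple[float, float]


def select_reference_region(
    predictions: List[Prediction], threshold: float = 0.5
) -> Optional[List[Point]]:
    """Return corners A, B, C, D of the best box, or None if below threshold.

    Hypothetical helper: the name and signature are not from the disclosure.
    """
    if not predictions:
        return None
    # The confidence of a box is the product c1 * c2; keep the highest one.
    X, Y, W, H, C1, C2 = max(predictions, key=lambda p: p[4] * p[5])
    if C1 * C2 < threshold:
        return None  # the image is treated as containing no card number region
    # Origin at the top-left corner, AB along +X, AD along +Y.
    return [
        (X - W / 2, Y - H / 2),  # A
        (X + W / 2, Y - H / 2),  # B
        (X + W / 2, Y + H / 2),  # C
        (X - W / 2, Y + H / 2),  # D
    ]
```

Returning `None` below the threshold mirrors the statement that the subsequent operations need not be performed when no card number region is found.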
After the rectangular reference area is obtained, step S420 may be performed.
In step S420, a candidate region is determined according to the rectangular reference region.
In this exemplary embodiment, the candidate region is a rectangular region that may include all texts in the rectangular reference region and has an area smaller than or equal to the rectangular reference region, specifically, the candidate region may be determined according to a geometric relationship of the rectangular reference region, and the geometric relationship may include a length, a width, an aspect ratio, an inscribed image, an circumscribed image, and the like of the rectangular reference region, which is not specifically limited in this exemplary embodiment.
In one embodiment, after the rectangular reference region is determined, the inscribed rectangles of the rectangular reference region are first determined, and the inscribed rectangles together with the rectangular reference region itself may be used as the candidate regions. That is, there may be three candidate regions: the rectangular reference region itself and its two inscribed rectangles.
In one embodiment, if W and H of the rectangular reference region are not equal (i.e., the rectangular reference region is not a square), the rectangular reference region has only two inscribed rectangles. It should be noted that the four corners of an inscribed rectangle in this exemplary embodiment lie on the four sides of the rectangular reference region, respectively. Referring to fig. 7 and 8, the inscribed rectangles of the rectangular reference region ABCD may include abcd and efgj, which are equal in size and deflected in opposite directions. The coordinates of each point of abcd and efgj may be fitted directly by computer to obtain the two inscribed rectangles, and the inscribed rectangle abcd, the inscribed rectangle efgj and the rectangular reference region itself may be used as the candidate regions. Using the inscribed rectangles as candidate regions allows the position of the text to be located accurately; in addition, because the number of inscribed rectangles is fixed, the number of candidate regions is directly limited, which further reduces the amount of calculation.
In an example embodiment of the disclosure, referring to fig. 7, the deflection angle ∠α of the inscribed rectangle relative to the rectangular reference region may be determined first, where the deflection angle ∠α and ∠BDC have a one-to-one correspondence. For example, when ∠BDC = 32°, ∠α = 22°; when ∠BDC = 20°, ∠α = 15°. The correspondence between ∠BDC and the deflection angle for different rectangular reference regions can be determined in advance; after ∠BDC is obtained, the deflection angle is determined from this correspondence, and the coordinate values of the points of the inscribed rectangle are then calculated based on the deflection angle.
In the exemplary embodiment, the correspondence between ∠BDC and the deflection angle may be accurate to 1 degree, or to 0.5 degree, 0.1 degree, or the like, which is not particularly limited in the exemplary embodiment.
In an exemplary embodiment, a preset aspect ratio of the text may be determined first, and the inscribed rectangle is then determined based on the preset aspect ratio and the deflection angle. Taking the inscribed rectangle abcd as an example, let its long side be w and its short side be h (in the present disclosure, w represents the length of the long side and h represents the length of the short side), and assume that the deflection angle ∠α is m times ∠BDC. The following geometric relationships can then be obtained:
$\angle\alpha = m\,\angle BDC$
$h\sin\alpha + w\cos\alpha = W$
Meanwhile, the preset aspect ratio of the text may be the preset aspect ratio of the text region, for example w = kh, where k represents the preset aspect ratio; the values of h and w can then be solved from the simultaneous equations, and the coordinates of the four points a, b, c and d calculated. For example, assuming that the coordinates of point A are (x, y), the coordinates of point a are (x, y + H - h cos α), the coordinates of point b are (x + W - h sin α, y), the coordinates of point c are (x + W, y + h cos α), and the coordinates of point d are (x + h sin α, y + H).
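The simultaneous equations above reduce to a closed form once w = kh is substituted, which the following sketch illustrates. It is a minimal example under stated assumptions: the function name `inscribed_rectangle` is hypothetical, the factor m is passed in directly instead of being looked up from a correspondence table, and ∠BDC is computed from tan ∠BDC = H/W, which follows from the right triangle BCD of the rectangle ABCD.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def inscribed_rectangle(
    x: float, y: float, W: float, H: float, k: float, m: float = 1.0
) -> Tuple[float, float, List[Point]]:
    """Solve w, h and the corners a, b, c, d of the inscribed rectangle abcd.

    (x, y) is corner A of the reference region, W and H are its long and
    short sides, k is the preset aspect ratio (w = k * h), and m is the
    factor in alpha = m * BDC. Hypothetical helper, not from the disclosure.
    """
    angle_bdc = math.atan2(H, W)  # tan(BDC) = BC / DC = H / W
    alpha = m * angle_bdc
    # Substitute w = k * h into h*sin(alpha) + w*cos(alpha) = W.
    h = W / (math.sin(alpha) + k * math.cos(alpha))
    w = k * h
    corners = [
        (x, y + H - h * math.cos(alpha)),  # a, on side AD
        (x + W - h * math.sin(alpha), y),  # b, on side AB
        (x + W, y + h * math.cos(alpha)),  # c, on side BC
        (x + h * math.sin(alpha), y + H),  # d, on side DC
    ]
    return w, h, corners
```

By construction the sides ad and bc both have length h, since each spans h sin α horizontally and h cos α vertically.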
In an exemplary embodiment of the present disclosure, if W and H of the rectangular reference region are equal (that is, the rectangular reference region is a square), the inscribed rectangle, and hence the candidate region, may likewise be determined based on the preset aspect ratio and the deflection angle. When the rectangular reference region is a square, ∠α = ∠BDC = 45°, so the relationship h sin α + w cos α = W becomes $\frac{\sqrt{2}}{2}(h + w) = W$. Then, based on w = kh, the values of h and w may be calculated, and the coordinates of the four points a, b, c and d obtained. For example, assuming that the coordinates of point A are (x, y), the coordinates of point a are $(x,\ y + H - \frac{\sqrt{2}}{2}h)$, the coordinates of point b are $(x + W - \frac{\sqrt{2}}{2}h,\ y)$, the coordinates of point c are $(x + W,\ y + \frac{\sqrt{2}}{2}h)$, and the coordinates of point d are $(x + \frac{\sqrt{2}}{2}h,\ y + H)$.
In still another exemplary embodiment, when the inscribed rectangle is determined based on a preset aspect ratio and a deflection angle, if the initial image corresponds to a bank card, it can be approximated that ∠α = ∠BDC.
In this case, the following geometric relationships can be obtained:
$\sin\alpha = H/(W^2+H^2)^{1/2}$
$\cos\alpha = W/(W^2+H^2)^{1/2}$
$h\sin\alpha + w\cos\alpha = W$
Meanwhile, by substituting w = kh into these geometric relationships, the values of h and w can be solved from the simultaneous equations, and the coordinates of the four points a, b, c and d calculated. For example, assuming that the coordinates of point A are (x, y), the coordinates of point a are (x, y + H - h cos α), the coordinates of point b are (x + W - h sin α, y), the coordinates of point c are (x + W, y + h cos α), and the coordinates of point d are (x + h sin α, y + H).
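As a worked form of the simultaneous equations for the case ∠α = ∠BDC, the substitution can be carried through explicitly. The numeric values in the last line (W = 4, H = 3, k = 2) are illustrative assumptions chosen for easy arithmetic and do not come from the disclosure.

```latex
\begin{aligned}
h\sin\alpha + kh\cos\alpha &= W
  \;\Longrightarrow\;
  h = \frac{W}{\sin\alpha + k\cos\alpha}, \\
\sin\alpha + k\cos\alpha &= \frac{H + kW}{\sqrt{W^2 + H^2}}
  \;\Longrightarrow\;
  h = \frac{W\sqrt{W^2 + H^2}}{H + kW},
  \qquad
  w = kh = \frac{kW\sqrt{W^2 + H^2}}{H + kW}, \\
W = 4,\ H = 3,\ k = 2 &:\quad
  h = \frac{4 \cdot 5}{3 + 8} = \frac{20}{11} \approx 1.82,
  \qquad
  w = \frac{40}{11} \approx 3.64 .
\end{aligned}
```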
In this exemplary embodiment, the preset aspect ratio is determined according to the initial image and the text in the initial image. For example, if the initial image is a bank card image and the preset type of text is a bank card number, the preset aspect ratio k may be 14. The preset aspect ratio is not specifically limited in this exemplary embodiment.
The calculation process based on the inscribed rectangle efgj may refer to the calculation process of the inscribed rectangle abcd, and is not described herein again.
After obtaining the two inscribed rectangles, the inscribed rectangles and the rectangular reference region themselves are used as the candidate regions, and then step S430 is performed.
In step S430, text distribution information of each candidate region is determined in the candidate regions.
The text distribution information is information for indicating the text distribution position, and may include a text distribution coefficient, a distance between each character in the text, and the like, which is not specifically limited in this exemplary embodiment.
In the present exemplary embodiment, as shown in fig. 9 and 10, after the plurality of candidate regions are obtained, a target image corresponding to each candidate region may be obtained; that is, the image of the candidate region is extracted from the initial image, with the edges of the candidate region again parallel to the reference direction. Specifically, referring to fig. 10 and taking the inscribed rectangle abcd as an example, assume that the four corner points of the candidate region in the initial image are P1, P2, P3 and P4 (where P1 is the point closest to the upper left corner of the image, and P1, P2, P3, P4 are in clockwise order), that their coordinate set is PS = {(x1, y1), (x2, y2), (x3, y3), (x4, y4)}, and that the width and height of the candidate region are w1 and h1, respectively. The affine transformation matrix mapping PS to the point set PD = {(0, 0), (w1, 0), (w1, h1), (0, h1)} is solved, and the corresponding affine transformation is then applied to the initial image. Fig. 9 shows the target image obtained when the rectangular reference region is used as a candidate region, fig. 10 and 11 show the target image determined when the inscribed rectangle abcd is used as a candidate region, and fig. 12 and 13 show the target image determined when the inscribed rectangle efgj is used as a candidate region.
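Solving for the affine matrix that maps PS onto PD can be sketched in plain Python as below. This is an illustrative sketch, not the disclosed implementation: the helper names `affine_to_upright` and `apply_affine` are assumptions, only the point mapping is shown, and resampling the image pixels is left to an image library.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]  # 2x3 affine matrix


def affine_to_upright(ps: List[Point], w1: float, h1: float) -> Matrix:
    """Affine matrix mapping PS = [P1, P2, P3, P4] (clockwise from top-left)
    onto PD = [(0, 0), (w1, 0), (w1, h1), (0, h1)].

    Three correspondences (P1, P2, P4) determine the matrix; P3 lands on
    (w1, h1) whenever PS is a parallelogram, which holds for a rectangle.
    """
    (x1, y1), (x2, y2), _, (x4, y4) = ps
    e1x, e1y = x2 - x1, y2 - y1  # edge P1 -> P2, mapped to (w1, 0)
    e2x, e2y = x4 - x1, y4 - y1  # edge P1 -> P4, mapped to (0, h1)
    det = e1x * e2y - e2x * e1y
    if det == 0:
        raise ValueError("degenerate candidate region")
    a, b = w1 * e2y / det, -w1 * e2x / det
    c, d = -h1 * e1y / det, h1 * e1x / det
    # The translation column sends P1 to the origin.
    return [[a, b, -(a * x1 + b * y1)],
            [c, d, -(c * x1 + d * y1)]]


def apply_affine(m: Matrix, p: Point) -> Point:
    """Apply a 2x3 affine matrix to a single point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Using three of the four corners is sufficient because an affine map has six unknowns and each point pair supplies two equations.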
In this case, the rotation angle of the text in the target image may be determined, specifically, referring to fig. 14, 15 and 16, taking the candidate image as the rectangular reference region itself as an example, the processor may perform corner point detection on the candidate region by using the Shi Tomasi algorithm on the target image to obtain a plurality of key points, then obtain a minimum circumscribed rectangle corresponding to the text according to the key points, take the included angle between the minimum circumscribed rectangle S and the candidate region as the rotation angle, and denote the rotation angle as V. In this exemplary embodiment, the maximum number of corner points detected may be set to be 100, the quality level is 0.005, and the minimum distance is 2, so that the corner points may be concentrated in the region where the text is located, and the specific parameters of the corner point detection may also be customized according to the user's requirements, which is not specifically limited in this exemplary embodiment.
In the present exemplary embodiment, the text distribution information includes a text distribution coefficient. As shown in fig. 15, after the plurality of key points are obtained, a least squares method may be used to fit a straight line to the key points and obtain the slope K of the fitted line; the text distribution coefficient R may then be calculated from the slope and the rotation angle. Specifically, R = VK.
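The line fit and the coefficient can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the names `fit_slope` and `text_distribution_coefficient` are hypothetical, R = VK is read as the plain product of the rotation angle V and the slope K, and the unit of V (degrees here) is not specified in the text.

```python
def fit_slope(points):
    """Ordinary least-squares slope K of the line y = K*x + b through the key points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two key points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("key points are vertically aligned; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def text_distribution_coefficient(rotation_angle, points):
    """R = V * K (assumption: the formula in the text denotes a product)."""
    return rotation_angle * fit_slope(points)


# Key points lying on y = 0.5 * x + 1, with an assumed rotation angle of 10.
key_points = [(0, 1), (2, 2), (4, 3)]
print(text_distribution_coefficient(10.0, key_points))
```

When the text is close to horizontal, both V and K are near zero, so their product is small in magnitude; this is consistent with later steps preferring the candidate with the smallest coefficient.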
After the text distribution information of the candidate regions is calculated, step S440 may be performed.
In step S440, a target text region is determined in the candidate region based on the text distribution information.
The target text region may be determined according to the text distribution information in several ways: the rotation angle of the text may be determined from the text distribution information, and the candidate region whose text is closest to horizontal in the target image taken as the target text region; or the proportion of the area occupied by the text in each candidate region may be determined from the text distribution information, and the candidate region that the text most nearly fills taken as the target text region.
In an example embodiment of the present disclosure, the candidate region having the smallest text distribution coefficient may be used as the target text region. The text in the candidate region with the smallest text distribution coefficient is closest to horizontal, that is, the resulting target text region is the most accurate, so this approach can improve the accuracy of determining the target text region.
In another example embodiment of the present disclosure, the processor may perform, for each candidate region, an operation of dividing a target image corresponding to the candidate region into a plurality of sub-regions, then may determine text density information of each of the sub-regions according to the text distribution coefficient, and then calculate a standard deviation of the text density information of each of the sub-regions in each of the candidate regions.
After the standard deviations of the plurality of candidate regions are obtained, the candidate region corresponding to the target image with the smallest standard deviation may be used as the target text region. The smaller the standard deviation of the text density of each region in the candidate regions is, the more uniform the text distribution is, and the accuracy of the determined target text region can be improved by selecting the candidate region with the most uniform text distribution as the target text region.
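The uniformity criterion can be sketched as below. The disclosure does not spell out how the text density of a sub-region is computed, so this sketch makes a labeled assumption: density is the number of key points falling in a grid cell divided by the cell area. The names `subregion_densities`, `density_std` and `most_uniform_candidate`, as well as the default 2 x 4 grid, are hypothetical.

```python
from statistics import pstdev


def subregion_densities(points, width, height, rows, cols):
    """Split a width x height target image into rows x cols cells and return
    the key-point density of each cell (assumed definition of text density).
    Points are assumed to lie inside the image."""
    cell_w = width / cols
    cell_h = height / rows
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        c = min(int(x / cell_w), cols - 1)  # clamp points on the right edge
        r = min(int(y / cell_h), rows - 1)  # clamp points on the bottom edge
        counts[r][c] += 1
    area = cell_w * cell_h
    return [count / area for row in counts for count in row]


def density_std(points, width, height, rows=2, cols=4):
    """Population standard deviation of the sub-region densities."""
    return pstdev(subregion_densities(points, width, height, rows, cols))


def most_uniform_candidate(candidates):
    """candidates maps a region name to (key_points, width, height);
    returns the name whose densities have the smallest standard deviation."""
    return min(candidates, key=lambda name: density_std(*candidates[name]))


uniform = [(c + 0.5, r + 0.5) for r in range(2) for c in range(4)]
clustered = [(0.5, 0.5)] * 8
print(most_uniform_candidate({"abcd": (clustered, 4, 2), "efgj": (uniform, 4, 2)}))
```

Dividing by the cell area keeps densities comparable between candidates of different sizes, although other normalizations would be equally defensible.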
In yet another example embodiment of the present disclosure, the processor may also determine the target text region in the candidate region directly using OCR (Optical Character Recognition).
It should be noted that, the manner of determining the target text region in the candidate region may include multiple manners, which are described above as an exemplary illustration, and the disclosure does not specifically limit the manner.
Further, referring to fig. 17, the text region determination method is described through a specific example embodiment. Step S1710 may be executed first: acquiring an initial image including the text, and determining a rectangular reference region including the text in the initial image. Step S1720 is then executed: using the rectangular reference region and the inscribed rectangle as the candidate regions. Specifically, when step S1720 is executed, step S1721 may be executed first to determine whether the rectangular reference region is a non-square rectangle or a square. If it is a non-square rectangle, step S1722 is executed to determine a deflection angle according to the geometric relationship, and step S1723 is executed to determine the inscribed rectangle in the rectangular reference region based on the deflection angle. If the rectangular reference region is a square, step S1724 is executed to obtain a preset length-width ratio of the text and determine a deflection angle according to the geometric relationship, and step S1725 is executed to determine the inscribed rectangle in the rectangular reference region based on the preset length-width ratio and the deflection angle.
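One way to obtain an inscribed rectangle from a deflection angle is sketched below. The geometric relationship used by the disclosure is defined with reference to figures not reproduced here, so this is only an assumed construction: the inscribed rectangle has one corner on each side of the reference region, which gives W = a·cos θ + b·sin θ and H = a·sin θ + b·cos θ for side lengths a and b. The function name `inscribed_rectangle` and the restriction to angles between 0 and 45 degrees are choices made for this sketch.

```python
import math


def inscribed_rectangle(width, height, angle_deg):
    """Return (a, b, vertices) for a rectangle rotated by angle_deg whose four
    corners touch the four sides of a width x height reference region with its
    top-left corner at the origin (y axis pointing down)."""
    if not 0 < angle_deg < 45:
        raise ValueError("this sketch only handles 0 < angle < 45 degrees")
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    cos2 = c * c - s * s
    # Solve W = a*c + b*s and H = a*s + b*c for the side lengths a and b.
    a = (width * c - height * s) / cos2
    b = (height * c - width * s) / cos2
    if a <= 0 or b <= 0:
        raise ValueError("no inscribed rectangle exists at this angle")
    vertices = [
        (b * s, 0.0),     # corner on the top side
        (width, a * s),   # corner on the right side
        (a * c, height),  # corner on the bottom side
        (0.0, b * c),     # corner on the left side
    ]
    return a, b, vertices


a, b, corners = inscribed_rectangle(4.464101615137754, 3.732050807568877, 30)
print(round(a, 6), round(b, 6))
```

Because the vertices follow from closed-form expressions, no search or image processing is needed to place the candidate region, which matches the stated aim of keeping the computation light.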
After the inscribed rectangle is obtained, step S1730 may be executed to determine the text distribution coefficient in the text distribution information, beginning with obtaining the target image corresponding to each candidate region; then, step S1740 is executed to perform corner detection on each target image to obtain a plurality of key points; then, step S1750 is executed to determine the rotation angle of the text in the target image; then, step S1760 is executed to perform straight line fitting based on the key points and determine the slope of the straight line; finally, step S1770 is executed to determine the text distribution coefficient based on the rotation angle and the slope.
After the text distribution coefficients are obtained, step S1780 may be executed to determine the candidate region corresponding to the target image with the minimum absolute value of the text distribution coefficient as the target text region, thereby completing the determination of the target text region.
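The final selection step reduces to a minimum over absolute values, as in this minimal sketch. The function name `select_target_region`, the region labels and the coefficient values are illustrative assumptions only.

```python
def select_target_region(coefficients):
    """coefficients maps a candidate region name to its text distribution
    coefficient R; returns the name with the smallest |R|."""
    if not coefficients:
        raise ValueError("no candidate regions were provided")
    return min(coefficients, key=lambda name: abs(coefficients[name]))


# Illustrative values for the reference region and two inscribed rectangles.
scores = {"reference": 3.2, "abcd": -0.4, "efgj": 1.1}
print(select_target_region(scores))
```

Taking the absolute value matters because the coefficient can be negative when the text tilts the opposite way; a plain minimum would then favor the most negative value rather than the most nearly horizontal text.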
In summary, compared with the prior art, on the one hand, an approximate rectangular reference region is determined in the initial image, and the candidate region is then determined in the rectangular reference region based on the geometric relationship, so target detection can begin without first correcting the image, which reduces the amount of calculation. Meanwhile, an edge of the rectangular reference region is parallel to the reference direction in the initial image, that is, the angles of the detection boxes can be kept consistent during detection, which further reduces the amount of calculation in the detection process and increases the calculation speed. On the other hand, the rectangular reference region and its inscribed rectangle are used as candidate regions, and the coordinates of all vertices of the candidate regions are obtained from the relevant geometric relationships, so that the computer does not need to perform complex operations while precision is guaranteed, further reducing the amount of calculation. In addition, after the candidate regions are determined, the target text region is determined based on the text distribution information of the candidate regions; specifically, the text distribution coefficient in the text distribution information is obtained from the rotation angle of the text and the slope of the straight line fitted to the key points, and the target text region is determined using the text distribution coefficient, which ensures the accuracy of the determined text region. The accuracy of determining the text region is thus guaranteed while the amount of calculation is reduced and the speed of determining the text region is increased.
It is noted that the above-mentioned figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to fig. 18, the embodiment of the present example also provides a text region determining apparatus 1800, which includes a target detection module 1810, a first determining module 1820, an information extraction module 1830, and an image generation module 1840. Wherein:
The target detection module 1810 may be configured to acquire an initial image including a text and determine a rectangular reference region including the text in the initial image, where one edge of the rectangular reference region is parallel to a reference direction in the initial image. Specifically, the module may perform target detection on the initial image using a rectangular detection box to obtain a plurality of intermediate text regions; determine the accuracy of each intermediate text region and the confidence that each intermediate text region includes text of a preset type; and determine the rectangular reference region among the plurality of intermediate text regions based on the accuracy and the confidence.
The first determining module 1820 may be configured to determine a candidate region according to a rectangular reference region, and specifically, may first determine an inscribed rectangle of the rectangular reference region; and taking the rectangular reference region as the candidate region, and taking the inscribed rectangle as the candidate region.
In an example embodiment, when determining the inscribed rectangle of the rectangular reference region, the first determination module 1820 may first determine a deflection angle of the inscribed rectangle with respect to the rectangular reference region according to a geometric relationship, and then determine the inscribed rectangle in the rectangular reference region based on the deflection angle.
In the present exemplary embodiment, when determining the inscribed rectangle in the rectangular reference region based on the deflection angle, the first determination module 1820 may first acquire a preset length-width ratio of the text, and then determine the inscribed rectangle in the rectangular reference region based on the preset length-width ratio and the deflection angle.
The information extraction module 1830 may be used to determine text distribution information of each candidate region among the candidate regions. Specifically, the text distribution information includes a text distribution coefficient, and when determining the text distribution information of each candidate region, the information extraction module 1830 may first acquire a target image corresponding to each candidate region; then determine the rotation angle of the text in the target image relative to the candidate region; and finally determine the text distribution coefficient according to the rotation angle.
In this example embodiment, when determining the text distribution coefficient according to the rotation angle, the information extraction module 1830 may first perform corner detection on each target image to obtain a plurality of key points; then perform straight line fitting based on the key points and determine the slope of the straight line; and finally determine the text distribution coefficient based on the rotation angle and the slope.
The image generation module 1840 may be used to determine a target text region in a candidate region based on text distribution information.
In an example embodiment, the image generation module 1840 is configured to determine a candidate region corresponding to a target image with a smallest absolute value of the text distribution coefficient as the target text region.
In another example embodiment, the image generation module 1840 may be configured to first obtain a target image corresponding to each candidate region; dividing each target image into a plurality of sub-regions; secondly, determining text density information in each subregion according to the text distribution information; then, calculating the standard deviation of the text density information of each sub-area in each candidate area; and finally, determining the candidate region corresponding to the target image with the minimum standard deviation as the target text region.
The specific details of each module in the above apparatus have been described in detail in the method section, and details that are not disclosed may refer to the method section, and thus are not described again.
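How the four modules hand data to one another can be sketched as a small composition. This is a hypothetical wiring, not the disclosed apparatus: the class name `TextRegionDeterminer`, its field names and the stub callables in the usage example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class TextRegionDeterminer:
    """Hypothetical composition of modules 1810, 1820, 1830 and 1840."""
    detect_reference: Callable[[Any], Any]           # image -> rectangular reference region
    propose_candidates: Callable[[Any], List[Any]]   # reference region -> candidate regions
    extract_info: Callable[[Any, Any], float]        # (image, candidate) -> distribution info
    choose_target: Callable[[List[Tuple[Any, float]]], Any]  # scored candidates -> target

    def run(self, image: Any) -> Any:
        reference = self.detect_reference(image)
        candidates = self.propose_candidates(reference)
        scored = [(c, self.extract_info(image, c)) for c in candidates]
        return self.choose_target(scored)


# Stub callables standing in for the real detection and scoring logic.
determiner = TextRegionDeterminer(
    detect_reference=lambda image: "reference",
    propose_candidates=lambda ref: [ref, "abcd", "efgj"],
    extract_info=lambda image, c: {"reference": 3.0, "abcd": -0.5, "efgj": 1.5}[c],
    choose_target=lambda scored: min(scored, key=lambda pair: abs(pair[1]))[0],
)
print(determiner.run(image=None))
```

Passing each stage in as a callable makes it straightforward to swap the selection rule, for example replacing the minimum-coefficient rule with the standard-deviation rule, without touching the other stages.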
Exemplary embodiments of the present disclosure also provide an electronic device for performing the text region determining method, which may be the terminal 310 or the server 320. In general, the electronic device may include a processor and a memory for storing executable instructions of the processor, the processor being configured to execute the image text region determination method described above via execution of the executable instructions.
The structure of the electronic device will be described below by way of example using the mobile terminal 1900 in fig. 19. It will be appreciated by those skilled in the art that the configuration of figure 19 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes.
As shown in fig. 19, the mobile terminal 1900 may specifically include: a processor 1901, a memory 1902, a bus 1903, a mobile communication module 1904, an antenna 1, a wireless communication module 1905, an antenna 2, a display 1906, a camera module 1907, an audio module 1908, a power module 1909, and a sensor module 1910.
The processor 1901 may include one or more processing units. For example, the processor 1901 may include an AP (Application Processor), a modem processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband processor, and/or an NPU (Neural-Network Processing Unit), etc. The text region determination method in the present exemplary embodiment may be performed by the AP, the GPU, or the DSP, and when the method involves neural network related processing, may be performed by the NPU.
The processor 1901 may be coupled to the memory 1902 or other components via the bus 1903.
The communication function of the mobile terminal 1900 can be realized by the mobile communication module 1904, the antenna 1, the wireless communication module 1905, the antenna 2, the modem processor, the baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 1904 may provide a mobile communication solution of 2G, 3G, 4G, 5G, etc. applied to the mobile terminal 1900. The wireless communication module 1905 may provide wireless communication solutions for wireless local area network, Bluetooth, near field communication, etc. applied to the mobile terminal 1900.
The sensor module 1910 may include a depth sensor 19101, a pressure sensor 19102, a gyroscope sensor 19103, an air pressure sensor 19104, etc. to implement a corresponding inductive detection function.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure as described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Furthermore, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.
Claims (12)
1. A text region determining method, comprising:
acquiring an initial image comprising a text, and determining a rectangular reference area comprising the text in the initial image, wherein one edge of the rectangular reference area is parallel to a reference direction in the initial image;
determining a candidate region according to the rectangular reference region;
determining text distribution information of each candidate area in the candidate areas;
determining a target text region based on the text distribution information.
2. The method of claim 1, wherein determining the rectangular reference region comprising the text in the initial image comprises:
carrying out target detection on the initial image to obtain a plurality of intermediate text regions;
determining the accuracy of each intermediate text region and the confidence of each intermediate text region including a preset type text;
determining the rectangular reference region in the plurality of intermediate text regions according to the accuracy and the confidence.
3. The method of claim 1, wherein determining the candidate region from the rectangular reference region comprises:
determining an inscribed rectangle of the rectangular reference area;
and taking the rectangular reference region as the candidate region, and taking the inscribed rectangle as the candidate region.
4. The method of claim 3, the determining an inscribed rectangle of the rectangular reference region, comprising:
determining the deflection angle of the inscribed rectangle relative to the rectangular reference area according to the geometric relationship;
determining the inscribed rectangle in the rectangle reference region based on the deflection angle.
5. The method of claim 4, the determining the inscribed rectangle in the rectangle reference region based on the deflection angle comprising:
acquiring a preset length-width ratio of the text;
and determining the inscribed rectangle in the rectangular reference area based on the preset length-width ratio and the deflection angle.
6. The method of claim 1, wherein the text distribution information comprises text distribution coefficients, and wherein determining the text distribution information of each candidate region in the candidate regions comprises:
acquiring a target image corresponding to each candidate area;
determining a rotation angle of text in the target image relative to the candidate region;
and determining the text distribution coefficient according to the rotation angle.
7. The method of claim 6, wherein the determining the text distribution coefficient according to the rotation angle comprises:
carrying out corner detection on each target image to obtain a plurality of key points;
performing straight line fitting based on the key points, and determining the slope of the straight line;
determining the text distribution coefficient based on the rotation angle and the slope.
8. The method of claim 6, wherein the determining a target text region in the candidate region based on the text distribution information comprises:
and determining a candidate region corresponding to the target image with the minimum absolute value of the text distribution coefficient as the target text region.
9. The method of claim 1, wherein the determining a target text region in the candidate region based on the text distribution information comprises:
acquiring a target image corresponding to each candidate region;
dividing each target image into a plurality of sub-regions;
determining text density information in each sub-region according to the text distribution information;
calculating the standard deviation of the text density information of each sub-region in each candidate region;
and determining the candidate region corresponding to the target image with the minimum standard deviation as the target text region.
10. A text region determination apparatus, comprising:
the target detection module is used for acquiring an initial image comprising the text and determining a rectangular reference area comprising the text in the initial image, wherein one edge of the rectangular reference area is parallel to a reference direction in the initial image;
the first determining module is used for determining a candidate region according to the geometric relation of the rectangular reference region;
the information extraction module is used for determining text distribution information of each candidate region in the candidate regions;
a second determining module, configured to determine a target text region in the candidate region based on the text distribution information.
11. A computer-readable storage medium on which a computer program is stored, which program, when being executed by a processor, is characterized by carrying out the text region determination method according to any one of claims 1 to 9.
12. An electronic device, comprising:
one or more processors; and
memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the text region determination method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210320579.4A CN114663873A (en) | 2022-03-29 | 2022-03-29 | Text region determination method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210320579.4A CN114663873A (en) | 2022-03-29 | 2022-03-29 | Text region determination method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114663873A (en) | 2022-06-24 |
Family
ID=82034089
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210320579.4A Pending CN114663873A (en) | 2022-03-29 | 2022-03-29 | Text region determination method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114663873A (en) |
-
2022
- 2022-03-29 CN CN202210320579.4A patent/CN114663873A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108681729B (en) | Text image correction method, device, storage medium and equipment | |
EP3576017A1 (en) | Method, apparatus, and device for determining pose of object in image, and storage medium | |
CN113673519B (en) | Character recognition method based on character detection model and related equipment thereof | |
JP4738469B2 (en) | Image processing apparatus, image processing program, and image processing method | |
US11551027B2 (en) | Object detection based on a feature map of a convolutional neural network | |
CN110619656B (en) | Face detection tracking method and device based on binocular camera and electronic equipment | |
WO2020125062A1 (en) | Image fusion method and related device | |
CN112597940B (en) | Certificate image recognition method and device and storage medium | |
CN113763249A (en) | Text image super-resolution reconstruction method and related equipment thereof | |
CN110827301B (en) | Method and apparatus for processing image | |
WO2019080702A1 (en) | Image processing method and apparatus | |
CN113989604B (en) | Tire DOT information identification method based on end-to-end deep learning | |
JP2023119593A (en) | Method and apparatus for recognizing document image, storage medium, and electronic device | |
CN112488095A (en) | Seal image identification method and device and electronic equipment | |
US20220207917A1 (en) | Facial expression image processing method and apparatus, and electronic device | |
Liu et al. | SLPR: A deep learning based Chinese ship license plate recognition framework | |
WO2024174726A1 (en) | Handwritten and printed text detection method and device based on deep learning | |
WO2022095318A1 (en) | Character detection method and apparatus, electronic device, storage medium, and program | |
CN116802683A (en) | Image processing method and system | |
CN114723640B (en) | Obstacle information generation method and device, electronic equipment and computer readable medium | |
CN114663873A (en) | Text region determination method and device, storage medium and electronic equipment | |
CN113392820B (en) | Dynamic gesture recognition method and device, electronic equipment and readable storage medium | |
CN112434591B (en) | Lane line determination method and device | |
CN115359502A (en) | Image processing method, device, equipment and storage medium | |
CN113033256B (en) | Training method and device for fingertip detection model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||