WO2021190155A1 - Method, apparatus, electronic device, and storage medium for identifying spaces in a text line - Google Patents

Method, apparatus, electronic device, and storage medium for identifying spaces in a text line

Info

Publication number
WO2021190155A1
WO2021190155A1 PCT/CN2021/074886 CN2021074886W WO2021190155A1 WO 2021190155 A1 WO2021190155 A1 WO 2021190155A1 CN 2021074886 W CN2021074886 W CN 2021074886W WO 2021190155 A1 WO2021190155 A1 WO 2021190155A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
pixel value
pixel
grayscale image
interval
Prior art date
Application number
PCT/CN2021/074886
Inventor
尚太章
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2021190155A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of image processing technology, and more specifically, to a method, device, electronic device, and storage medium for identifying spaces in text lines.
  • This application proposes a method, device, electronic device, and storage medium for identifying spaces in text lines.
  • An embodiment of the present application provides a method for identifying spaces in a text line. The method includes: obtaining a text grayscale image that includes only a single line of text; calculating the sum of the pixel values of each row of pixels in a preset direction in the text grayscale image, where the preset direction is perpendicular to the arrangement direction of the characters in the single line of text; and taking the connected domain formed by the pixels corresponding to sums of pixel values that lie in a first pixel value interval as a space in the single line of text, where the first pixel value interval is the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
  • An embodiment of the present application provides a device for identifying spaces in a text line.
  • The device includes: an image acquisition module, configured to acquire a text grayscale image that includes only a single line of text; a pixel value acquisition module, configured to calculate the sum of the pixel values of each row of pixels in a preset direction in the text grayscale image, where the preset direction is perpendicular to the arrangement direction of the characters in the single line of text; and a space determination module, configured to take the connected domain formed by the pixels corresponding to sums of pixel values that lie in the first pixel value interval as a space in the single line of text, where the first pixel value interval is the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
  • An embodiment of the present application provides an electronic device, including: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors to perform the above-mentioned method.
  • An embodiment of the present application provides a computer-readable storage medium that stores program code, and the program code can be invoked by a processor to perform the above-mentioned method.
  • FIG. 1 shows a flowchart of a method for identifying spaces in a text line provided by an embodiment of the present application.
  • FIG. 2 shows a schematic diagram of a pixel arrangement provided by an embodiment of the present application.
  • FIG. 3 shows a flowchart of a method for identifying spaces in a text line provided by another embodiment of the present application.
  • FIG. 4 shows a schematic text grayscale image provided by an embodiment of the present application.
  • FIG. 5 shows a fitted curve of the number of pixels against the pixel value in a text grayscale image provided by an embodiment of the present application.
  • FIG. 6 shows a schematic diagram after the pixel values of the background part are unified, provided by an embodiment of the present application.
  • FIG. 7 shows a schematic text grayscale image after a closing operation is performed on the text grayscale image shown in FIG. 6, provided by an embodiment of the present application.
  • FIG. 8 shows a schematic diagram after the colors of FIG. 7 are inverted, provided by an embodiment of the present application.
  • FIG. 9 shows a graph of the statistical results after calculating the sum of pixel values for each column of pixels in FIG. 8.
  • FIG. 10 shows a functional module diagram of a device for identifying spaces in a text line provided by an embodiment of the present application.
  • FIG. 11 shows a structural block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 12 shows a storage unit for storing or carrying program code that implements the method for identifying spaces in a text line according to an embodiment of the present application.
  • In some texts, each character can stand on its own as text. Regardless of whether spaces are recognized, the characters can be arranged at a certain spacing to form text that carries real textual information.
  • In other texts, each word is composed of alphabetic characters, and the letters of different words are drawn from the same alphabet.
  • For such text, it is extremely important to extract the spaces between words. If the spaces cannot be extracted, each line of text becomes an unbroken run of letters in which the individual words cannot be distinguished, which makes subsequent machine processing and human reading difficult.
  • The embodiments of the present application provide a method, device, electronic device, and storage medium for identifying spaces in a text line, which determine the spaces of the single line of text in a text grayscale image according to whether each calculated sum of pixel values lies in the interval of the sums of pixel values corresponding to spaces in the text grayscale image.
  • The method, device, electronic device, and storage medium for identifying spaces in text lines provided in the embodiments of the present application are described in detail below through specific embodiments.
  • FIG. 1 shows a method for identifying spaces in a text line provided by an embodiment of the present application. Specifically, the method includes:
  • Step S110: Obtain a text grayscale image, where the text grayscale image includes only a single line of text.
  • The text grayscale image is a grayscale image that includes only one line of text, and this line is the single line of text in the text grayscale image. Identifying the spaces in the text line in the text grayscale image means identifying the spaces in this single line of text.
  • The arrangement direction of the single line of text in the text grayscale image is not limited. It may be horizontal, vertical, or any other direction.
  • The embodiments of the present application take the horizontal arrangement as an example for description.
  • It may first be determined whether the text in the text grayscale image is arranged horizontally or vertically.
  • In one approach, the image is processed as horizontally arranged by default, or as vertically arranged by default.
  • In another approach, the text arrangement direction is determined from the two mutually perpendicular sides of the text grayscale image: the direction in which the longer side extends is taken as the text arrangement direction.
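  • The longer-side rule can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the function names and the tie-breaking choice for square images are assumptions.

```python
def text_arrangement_direction(width: int, height: int) -> str:
    """Return the direction of the longer side of the image."""
    # Assumption: a square image is treated as horizontally arranged.
    return "horizontal" if width >= height else "vertical"


def preset_direction(width: int, height: int) -> str:
    """Return the direction perpendicular to the text arrangement direction."""
    if text_arrangement_direction(width, height) == "horizontal":
        return "vertical"
    return "horizontal"
```

  • For a 300 x 40 pixel image, the sketch treats the text as horizontal and sums pixel values along the vertical direction.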
  • Step S120: Calculate the sum of the pixel values of each row of pixels in a preset direction in the text grayscale image, where the preset direction is perpendicular to the arrangement direction of the characters in the single line of text.
  • The direction in which the characters of the single line of text are arranged is defined as the text arrangement direction, and the direction perpendicular to it is the preset direction.
  • For example, if the horizontal direction is the text arrangement direction, the vertical direction is the preset direction.
  • The sum of the pixel values of each row of pixels in the preset direction in the text grayscale image can then be calculated.
  • A row of pixels in the preset direction is a row of pixels whose arrangement direction is the preset direction.
  • For horizontally arranged text, a row of pixels in the vertical direction is a column of pixels, so the sum of the pixel values of each column of pixels in the text grayscale image is calculated.
  • FIG. 2 shows a schematic diagram of the pixels in a text grayscale image with horizontally arranged text.
  • The pixels in column I1 include (I1, J1), (I1, J2), and (I1, J3); the pixels in column I2 include (I2, J1), (I2, J2), and (I2, J3); the pixels in column I3 include (I3, J1), (I3, J2), and (I3, J3); and so on.
  • The preset direction is the vertical direction. Calculating the sum of the pixel values of each row of pixels in the preset direction means calculating the sum of the gray values of each column of pixels, which yields 7 sums of pixel values corresponding to columns I1 to I7.
  • Because each space may span multiple rows of pixels, the sum of the pixel values of one or more rows of pixels in the preset direction may instead be calculated at intervals of one or more rows.
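  • As a minimal sketch of step S120 for horizontally arranged text, the per-column sums can be computed as below. The 3 x 7 image and its pixel values are hypothetical and only mirror the I1 to I7, J1 to J3 layout described above; the function name is an assumption.

```python
from typing import List


def column_sums(gray: List[List[int]]) -> List[int]:
    """Sum the pixel values of each column of a row-major grayscale image."""
    if not gray:
        return []
    # zip(*gray) transposes the image so that each tuple is one column.
    return [sum(column) for column in zip(*gray)]


# Hypothetical image: 3 rows (J1..J3) by 7 columns (I1..I7),
# with 255 for white background and 0 for black text.
image = [
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
    [255, 0, 0, 255, 255, 0, 255],
]
sums = column_sums(image)  # one sum per column, 7 in total
```

  • Columns that contain only background produce the largest sums in this example, which is what lets a sum interval separate spaces from text.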
  • Step S130: Take the connected domain formed by the pixels corresponding to sums of pixel values that lie in the first pixel value interval as a space in the single line of text, where the first pixel value interval is the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
  • In a single-line text picture, the colors of the text are the same or close, the colors of the background are the same or close, and the color difference between the text and the background is relatively large. That is, the pixel values of the pixels forming the text are the same or close, the pixel values of the pixels forming the background are the same or close, and the difference between the two is greater than a preset pixel difference value.
  • The background is the part of the single-line text picture other than the text, including the spaces and the areas above, below, left, and right of the text line. In a single-line text picture, the pixel values of the pixels forming the text are therefore quite different from the pixel values of the pixels forming the spaces.
  • When the pixels in the spaces are used to calculate the sums of pixel values in the foregoing step, the resulting sums fall within a range of values, which is defined as the first pixel value interval.
  • The first pixel value interval is specific to spaces and differs from the interval in which the sums calculated from pixels of the text may fall.
  • The connected domain formed by the pixels corresponding to sums of pixel values in the first pixel value interval may be recognized as a space in the single line of text. That is, the area formed by those pixels is determined to be a space.
  • If the sum of pixel values is calculated for every row of pixels in the preset direction, or for every two adjacent rows or every group of adjacent rows of pixels in the preset direction, the pixels corresponding to a sum of pixel values can be all the pixels used to calculate that sum.
  • If the sum of pixel values is calculated for one or more rows of pixels in the preset direction at intervals of one or more rows, the pixels corresponding to a sum of pixel values can include both the pixels used to calculate that sum and the skipped pixels that were not used to calculate any sum. For example, in a single line of text arranged horizontally, the sums of the pixel values of the first column, the third column, and each subsequent odd-numbered column are calculated in turn.
  • The pixels corresponding to the sum of each odd-numbered column may then include the pixels of that odd-numbered column and the pixels of the even-numbered column that it skips. In this example, the even-numbered column skipped by an odd-numbered column is the even-numbered column whose number is one smaller than that odd number.
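  • The odd-column example can be sketched as follows, using 1-based column numbers to match the description. The function names and the sample image are assumptions made for illustration.

```python
from typing import Dict, List


def odd_column_sums(gray: List[List[int]]) -> Dict[int, int]:
    """Map each odd 1-based column number to the sum of its pixel values."""
    width = len(gray[0]) if gray else 0
    return {
        col: sum(row[col - 1] for row in gray)
        for col in range(1, width + 1, 2)
    }


def columns_represented(odd_col: int) -> List[int]:
    """Return the columns whose pixels are attributed to an odd column's sum."""
    # The skipped even column is the one numbered one less than the odd
    # column; column 1 has no such neighbor.
    return [odd_col] if odd_col == 1 else [odd_col - 1, odd_col]


# Hypothetical 2 x 5 image: 255 is background, 0 is text.
sample = [
    [255, 0, 0, 255, 255],
    [255, 0, 255, 255, 255],
]
sampled = odd_column_sums(sample)
```

  • Sampling every other column halves the number of sums, at the cost of attributing each skipped column to its neighbor.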
  • In this embodiment, the direction perpendicular to the text arrangement direction in the single line of text is used as the preset direction, and the sum of the pixel values of each row of pixels in the preset direction is calculated. Then, based on the first pixel value interval in which the sums of pixel values of pixels corresponding to spaces in the text grayscale image may lie, it is determined which sums of pixel values lie in the first pixel value interval, and the connected domain formed by the pixels corresponding to those sums is taken as a space in the single line of text, so that the spaces in the single line of text can be recognized more accurately.
  • The method for identifying spaces in a text line may also include unifying the color of the background part. This makes the pixel values of the space part more uniform, so that the sums of pixel values calculated there differ less and are more concentrated, which makes it easier to set an accurate first pixel value interval. See FIG. 3.
  • The method includes:
  • Step S210: Obtain a text grayscale image, where the text grayscale image includes only a single line of text.
  • The specific method for obtaining the text grayscale image is not limited.
  • The size of the text, the ratio between the text height and the picture height, and the ratio between the text width and the picture width are not limited either.
  • The text grayscale image may be obtained by extracting a single line of text from a text image to obtain a single-line text image that includes only that line, and then converting the single-line text image into the text grayscale image through image preprocessing.
  • The method of extracting a single line of text is not limited in the embodiments of the present application. For example, it may be extracted by a deep learning algorithm, such as the TextBoxes series, the EAST series, or SegLink.
  • The text grayscale image may also be obtained by performing image preprocessing on a single-line text picture that itself includes only one line of text.
  • Image preprocessing may include one or more of the following:
  • If the single-line text image is not itself a grayscale image, for example an RGB three-channel image, it can be converted into a grayscale image, which is used as the text grayscale image;
  • The processing methods may be applied in the order described above: denoising is performed after grayscale conversion, and equalization is performed after denoising, which reduces the difficulty of each processing step and improves its effectiveness.
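  • The source does not say how the grayscale conversion is performed. As one possible sketch, an RGB image can be reduced to gray values with the commonly used luma weights 0.299, 0.587, and 0.114; these weights and the function name are assumptions, not part of the application.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_grayscale(rgb: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image to gray values in 0..255."""
    # Assumption: weighted sum with the 0.299/0.587/0.114 luma weights,
    # done in integer arithmetic with rounding to the nearest value.
    return [
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for r, g, b in row]
        for row in rgb
    ]
```

  • Denoising and equalization, which the description applies afterwards, are left out of this sketch.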
  • Step S220: Obtain the segmented pixel value in the text grayscale image.
  • The space part may not have a single pure pixel value. A segmented pixel value can be used to determine which pixel values are more likely to belong to spaces, so that they can be unified.
  • The segmented pixel value in the text grayscale image distinguishes spaces from text more accurately.
  • A pixel value on one side of the segmented pixel value is more likely to be the pixel value of a space than of text, and a pixel value on the other side is more likely to be the pixel value of text than of a space.
  • The two sides of the segmented pixel value are opposite sides: the pixel values larger than the segmented pixel value and the pixel values smaller than it.
  • Which side is more likely to be spaces and which is more likely to be text depends on the actual pixel values of the spaces and the text in the text grayscale image.
  • The segmented pixel value in the text grayscale image can be obtained, and the pixel values on the side more likely to be spaces can be unified into one pixel value, so that the sums of pixel values calculated in the spaces are more concentrated.
  • The unified pixel value is different from the pixel values on the side more likely to be text.
  • To obtain the segmented pixel value, a coordinate system can be established with the pixel value as the abscissa and the number of pixels as the ordinate.
  • The segmented pixel value can be a pixel value that relatively few pixels have. A minimum adjacent to the maximum can be obtained, and the pixel value corresponding to that minimum can be used as the segmented pixel value.
  • A minimum indicates that few pixels have that pixel value, and its being adjacent to the maximum indicates that the pixel value probably lies between the text pixel values and the space pixel values.
  • The adjacent minimum may lie to the left or to the right of the maximum. A minimum to the left of the maximum is one whose pixel value is smaller than the pixel value corresponding to the maximum, and a minimum to the right of the maximum is one whose pixel value is larger than the pixel value corresponding to the maximum.
  • The pixel value distribution is more balanced. If the color distinction between the background and the text is large, the pixel values of the background differ clearly from the pixel values of the text.
  • The background includes the spaces.
  • The pixel values of the background therefore represent the pixel values of the spaces, and processing the background pixel values processes the pixel values of the spaces. If the background is closer to white, the text is closer to black and the segmented pixel value should be smaller than the pixel value corresponding to the maximum, so the minimum adjacent to and to the left of the maximum is selected. If the background is closer to black, the text is closer to white and the segmented pixel value should be greater than the pixel value corresponding to the maximum, so the minimum adjacent to and to the right of the maximum is selected.
  • Whether the background is closer to white or to black may be set by default, and processing is then performed directly in the way that corresponds to the default color.
  • For example, if the background is white or closer to white by default, the pixel value corresponding to the minimum adjacent to and to the left of the maximum is selected as the segmented pixel value.
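  • The histogram-based selection can be sketched as below. This is a simplified illustration that works on raw counts, whereas the application refers to a fitted curve (FIG. 5); the function names, the `side` parameter, and the toy counts are assumptions.

```python
from typing import List


def histogram(gray: List[List[int]]) -> List[int]:
    """Count how many pixels have each pixel value from 0 to 255."""
    counts = [0] * 256
    for row in gray:
        for value in row:
            counts[value] += 1
    return counts


def segmented_pixel_value(counts: List[int], side: str = "left") -> int:
    """Return the pixel value at the minimum adjacent to the maximum."""
    # The maximum is the pixel value shared by the most pixels.
    peak = max(range(len(counts)), key=counts.__getitem__)
    step = -1 if side == "left" else 1
    value = peak
    # Walk away from the peak while the count keeps strictly decreasing;
    # the walk stops at the first local minimum on that side.
    while 0 <= value + step < len(counts) and counts[value + step] < counts[value]:
        value += step
    return value


# Toy counts: dark text near 20, bright background peak at 250.
counts = [0] * 256
counts[20:23] = [30, 12, 4]
counts[246:253] = [5, 3, 10, 40, 100, 20, 0]
left_threshold = segmented_pixel_value(counts, "left")
right_threshold = segmented_pixel_value(counts, "right")
```

  • With a white background, `side="left"` matches the rule above; with a black background, `side="right"` would be used. Real histograms are noisy, so smoothing or curve fitting before the walk is likely needed in practice.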
  • Alternatively, the colors of the background and of the text in the text grayscale image can be converted to the default colors through color inversion.
  • For example, if the default background is white or closer to white but the background in the actual text grayscale image is black or closer to black, the pixel value of each pixel can be converted to 255 minus its current pixel value, which flips black and white.
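  • The inversion described above amounts to one subtraction per pixel. A minimal sketch, with an assumed function name:

```python
from typing import List


def invert(gray: List[List[int]]) -> List[List[int]]:
    """Replace every pixel value v with 255 - v."""
    return [[255 - value for value in row] for row in gray]
```

  • Applying the function twice returns the original image, so the step is lossless.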
  • Whether the background is closer to white or to black can be decided with a preset pixel value.
  • If the pixel value of the background is greater than the preset pixel value, the background is determined to be closer to white; if the pixel value of the background is less than or equal to the preset pixel value, the background is determined to be closer to black.
  • The specific value of the preset pixel value is not limited and can be set according to actual needs. For example, it can be set to a middle gray value such as 127 or 128.
  • The pixel value of the background may be represented by the pixel value corresponding to the maximum, that is, the pixel value corresponding to the maximum is used as the background pixel value.
  • Alternatively, because the corners of the text grayscale image are usually part of the background, one of the following may be used to represent the pixel value of the background: the average pixel value of the four corners of the text grayscale image, the average pixel value of one or more of the four corners, the pixel value shared by the most pixels in the four corners, or the pixel value shared by the most pixels in one or more of the four corners.
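  • Two of the corner-based options, together with the white-or-black decision, might look like the sketch below. The corner patch size, the function names, and the default preset value of 127 are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List


def corner_values(gray: List[List[int]], size: int = 1) -> List[int]:
    """Collect the pixel values in the four size x size corner patches."""
    height, width = len(gray), len(gray[0])
    # Sets avoid counting a pixel twice when patches overlap in tiny images.
    rows = sorted(set(range(min(size, height))) | set(range(max(height - size, 0), height)))
    cols = sorted(set(range(min(size, width))) | set(range(max(width - size, 0), width)))
    return [gray[r][c] for r in rows for c in cols]


def background_by_corner_mean(gray: List[List[int]], size: int = 1) -> float:
    """Average pixel value of the four corners."""
    values = corner_values(gray, size)
    return sum(values) / len(values)


def background_by_corner_mode(gray: List[List[int]], size: int = 1) -> int:
    """Pixel value shared by the most pixels in the four corners."""
    return Counter(corner_values(gray, size)).most_common(1)[0][0]


def background_is_white(background_value: float, preset: int = 127) -> bool:
    """Closer to white if above the preset value, otherwise closer to black."""
    return background_value > preset
```

  • If `background_is_white` disagrees with the default assumed by later steps, the image can be inverted before continuing.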
  • The segmented pixel value can also be obtained as follows: in the coordinate system established with the pixel value as the abscissa and the number of pixels as the ordinate, obtain the pixel value that the largest number of pixels have, and define the number of pixels corresponding to that pixel value as the maximum. Then obtain a minimum adjacent to the maximum, and use the pixel value corresponding to that minimum as the segmented pixel value.
  • Step S230: If the pixel value of the background in the text grayscale image is greater than the segmented pixel value, set the pixels whose pixel values are greater than the segmented pixel value to a first pixel value, where the first pixel value is greater than or equal to the pixel value of the background. If the pixel value of the background in the text grayscale image is less than the segmented pixel value, set the pixels whose pixel values are less than the segmented pixel value to a second pixel value, where the second pixel value is less than or equal to the pixel value of the background.
  • If the pixel value of the background is greater than the segmented pixel value, the pixel values of the background can be converted to one and the same pixel value.
  • Specifically, the pixels whose pixel values are greater than the segmented pixel value may be set to the first pixel value, and the first pixel value is greater than or equal to the pixel value of the background.
  • For example, the background may be uniformly converted to white, that is, the first pixel value is 255.
  • If the pixel value of the background is less than the segmented pixel value, the pixel values of the background can likewise be converted to one and the same pixel value.
  • Specifically, the pixels whose pixel values are less than the segmented pixel value may be set to the second pixel value, and the second pixel value is less than or equal to the pixel value of the background.
  • For example, the background may be uniformly converted to black, that is, the second pixel value is 0.
  • The pixel value of the background in the text grayscale image may be obtained to determine whether the background pixels should be set to the first pixel value or to the second pixel value.
  • Alternatively, the pixel values in the text grayscale image can be inverted. For example, if the background is assumed by default to be closer to white but the background in the actual text grayscale image is closer to black, the pixel value of each pixel is converted to 255 minus its current pixel value, which flips black and white.
  • Because the background includes the spaces, converting the pixel values of the background into one and the same pixel value also converts the pixel values of the spaces into that pixel value.
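  • Step S230 can be sketched as a single pass over the image. The function and parameter names are assumptions; the defaults of 255 and 0 follow the white and black examples given above.

```python
from typing import List


def unify_background(
    gray: List[List[int]],
    segmented: int,
    background_value: int,
    first_value: int = 255,
    second_value: int = 0,
) -> List[List[int]]:
    """Set all pixels on the background side of the segmented value to one value."""
    if background_value > segmented:
        # Light background: everything brighter than the threshold is unified.
        return [[first_value if v > segmented else v for v in row] for row in gray]
    if background_value < segmented:
        # Dark background: everything darker than the threshold is unified.
        return [[second_value if v < segmented else v for v in row] for row in gray]
    # Background equals the threshold: leave the image unchanged (assumption).
    return [row[:] for row in gray]
```

  • Text pixels on the other side of the threshold keep their original values, so the contrast between text and spaces is preserved.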
  • Step S240: Calculate the sum of the pixel values of each row of pixels in a preset direction in the text grayscale image, where the preset direction is perpendicular to the arrangement direction of the characters in the single line of text.
  • Because a closing operation removes narrow discontinuities and long, thin gaps, eliminates small holes, and fills breaks in contour lines, the text grayscale image may also be subjected to a closing operation. This further reduces the noise in the space part and makes its pixel values purer, so that the sums of pixel values calculated in the space part suffer less interference and are more concentrated.
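  • A grayscale closing is a dilation followed by an erosion. The sketch below uses a 3 x 3 square neighborhood and ignores out-of-bounds neighbors; both choices, like the function names, are assumptions, since the application does not specify a structuring element. It assumes a light background with dark noise.

```python
from typing import Callable, List

Image = List[List[int]]


def _filter3x3(gray: Image, pick: Callable[[List[int]], int]) -> Image:
    """Apply pick (max or min) over each pixel's in-bounds 3 x 3 neighborhood."""
    height, width = len(gray), len(gray[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                gray[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, height))
                for cc in range(max(c - 1, 0), min(c + 2, width))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(gray: Image) -> Image:
    """Each pixel becomes the brightest value in its neighborhood."""
    return _filter3x3(gray, max)


def erode(gray: Image) -> Image:
    """Each pixel becomes the darkest value in its neighborhood."""
    return _filter3x3(gray, min)


def closing(gray: Image) -> Image:
    """Dilation followed by erosion: removes small dark specks."""
    return erode(dilate(gray))
```

  • An isolated dark pixel on a white background disappears, while a dark block at least as large as the neighborhood survives unchanged.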
  • Step S250: Take the connected domain formed by the pixels corresponding to sums of pixel values that lie in the first pixel value interval as a space in the single line of text, where the first pixel value interval is the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
  • In the text grayscale image, the sum of the pixel values of each row of pixels in the direction perpendicular to the text arrangement direction is calculated. Because the pixel values of the spaces have been unified, the sums obtained in the spaces are concentrated in one interval. Whether a calculated sum of pixel values belongs to pixels of the space part can therefore be determined by whether it lies in the interval of the sums of pixel values corresponding to spaces.
  • The connected domain formed by the pixels corresponding to sums of pixel values in the first pixel value interval is taken as a space in the single line of text.
  • The connected domains formed by the pixels corresponding to sums of pixel values in the first pixel value interval can be determined by starting from one end of the text grayscale image in the text arrangement direction and checking, in sequence toward the other end, whether each sum of pixel values lies within the first pixel value interval.
  • When a sum of pixel values within the first pixel value interval is detected, it is taken as the start of a connected domain, and detection continues in the same direction to check whether each following sum lies within the first pixel value interval.
  • When a sum of pixel values outside the first pixel value interval is detected, the previous sum of pixel values is taken as the end of the connected domain, which determines one connected domain.
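  • The sequential scan described above can be sketched as a single loop over the sums. The function name and the inclusive `(start, end)` index representation are assumptions.

```python
from typing import List, Tuple


def connected_domains(sums: List[int], low: int, high: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs with low <= sum <= high."""
    domains = []
    start = None
    for index, value in enumerate(sums):
        inside = low <= value <= high
        if inside and start is None:
            start = index  # first sum inside the interval opens a domain
        elif not inside and start is not None:
            domains.append((start, index - 1))  # previous sum closes it
            start = None
    if start is not None:
        # A domain still open at the last sum ends at the image edge.
        domains.append((start, len(sums) - 1))
    return domains
```

  • For the sums `[765, 0, 255, 765, 765, 0, 765]` and a hypothetical first pixel value interval of 700 to 765, the scan finds three connected domains, one of which spans two columns.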
  • The width of spaces can also be constrained. A width interval for spaces can be set, and among the connected domains formed by the pixels corresponding to sums of pixel values in the first pixel value interval, those whose width lies within the width interval are taken as spaces in the single line of text. The width of a space is its extent in the text arrangement direction. For example, in a text grayscale image in which the text is arranged horizontally, the width of a space is its length in the horizontal direction.
  • The width interval of the space can be set according to the character width.
  • The character width in the text grayscale image can be obtained, where the character width is the width in the text arrangement direction, and the width interval of the space is then set according to the character width.
  • The obtained character widths may include multiple values.
  • One character width can be determined from the obtained widths of multiple characters and used to set the width interval of the space.
  • For example, the width of each character in the text grayscale image may be obtained, and the median of all the obtained widths is used as the character width for setting the width interval of the space.
  • Alternatively, the width of each character in the text grayscale image may be obtained, and the average of all the obtained widths is used as the character width for setting the width interval of the space.
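  • Both options reduce a list of measured widths to one representative value. A minimal sketch using the standard library; the function name and the `method` parameter are assumptions.

```python
from statistics import mean, median
from typing import Sequence


def representative_character_width(widths: Sequence[int], method: str = "median") -> float:
    """Reduce the measured character widths to a single character width."""
    if not widths:
        raise ValueError("at least one character width is required")
    return median(widths) if method == "median" else mean(widths)
```

  • The median is less affected by an outlier such as two characters that were merged into one wide connected domain, whereas the average shifts toward it.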
  • The width of a character can also be determined from the interval in which the sums of pixel values lie.
  • The width of each connected domain formed by the pixels corresponding to sums of pixel values in a second pixel value interval may be used as the width of a single character. That is, each such connected domain corresponds to one character, and the width of the connected domain is used as the width of that character.
  • The second pixel value interval is the interval in which the sums of pixel values corresponding to characters in the text grayscale image lie; that is, in a region that includes characters and no spaces, it is the interval in which the sums of the pixel values of each row of pixels in the preset direction lie. A sum of pixel values corresponding to a character is the sum of the pixel values of a row of pixels that contains pixels of the character.
  • For example, the sum of pixel values in column I2 is a sum of pixel values corresponding to a character in the text grayscale image.
  • Because the pixel values of the text and of the spaces are different, the second pixel value interval is different from the first pixel value interval.
  • When the width interval of the space is set from the character width, the width of a space is usually greater than one ratio of the character width and less than another ratio of the character width. The character width can therefore be multiplied by a first ratio to obtain a first value and by a second ratio to obtain a second value, and the interval formed by the first value and the second value is used as the width interval of the space. For example, if the first ratio is one-third and the second ratio is one-half, the width interval of the space runs from one-third of the character width to one-half of the character width.
  • The specific ratios are not limited in the embodiments of the present application and can be set according to actual conditions.
  • text of different font sizes has different space widths, and text of different font sizes also has different character widths.
  • space width intervals corresponding to different character widths can be set. After the character width in the text grayscale image is determined, the width interval of the space is determined according to this correspondence.
  • since a space is wider than the gaps inside the text itself and narrower than the text-free area that may form at the end of the text, in this embodiment it is also possible to obtain, among the connected domains formed by the pixels corresponding to the sums of pixel values in the first pixel value interval, the connected domain whose width is the median.
  • a certain ratio of the width of this median-width connected domain is added and subtracted to form the space width interval. For example, if the width of the median-width connected domain is 6 and the ratio added and subtracted is one-third, that is, 2, the resulting space width interval is 4 to 8.
  • the width interval of the space may also be preset according to the general space width.
  • among the connected domains formed by the pixels corresponding to the sums of pixel values in the first pixel value interval, a connected domain whose width is within the width interval of the space is determined to be a space in the single line of text, and a connected domain whose width is outside the width interval is not regarded as a space in the text grayscale image.
  • a connected domain with a width greater than a width interval may also be determined as a blank area after the end of a single line of text.
  • depending on whether the unified background is closer to white or closer to black, the first pixel value interval may be set differently.
  • if the background is unified to black, the pixel values of the background pixels, and therefore of the spaces, are all 0, and the sum of any number of pixels whose value is 0 is still 0.
  • for fault tolerance, the first pixel value interval may be set to a low range, namely an interval within the pixel values below 127, such as 5 to 40.
  • the pixel value of the pixel of the text is greater than 0, and the summed maximum value is uncertain.
  • the second pixel value interval is an infinite interval at the right end, that is, an interval greater than a minimum pixel value.
  • the first pixel value interval may intersect with the second pixel value interval, and the minimum pixel value of the first pixel value interval is smaller than the minimum pixel value of the second pixel value interval.
  • the first pixel value interval is set to 5-40, and the second pixel value interval is set to be greater than 10.
  • if the background pixels are unified to a nonzero pixel value, the sum of the pixels in a space depends on the height of the text grayscale image in the preset direction, so the minimum sum for a space varies and the maximum sum that may be reached is also uncertain. The range of the first pixel value interval is therefore not fixed, and the first pixel value interval can be an interval that is unbounded on the right, that is, an interval greater than a minimum pixel value.
  • the first pixel value interval may correspond to the height of the text grayscale image, that is, different first pixel value intervals may be set corresponding to different heights. Select the corresponding first pixel value interval according to the current actual height of the text grayscale image.
  • a default first pixel value interval may be set, and the default first pixel value interval corresponds to a default height.
  • the original text grayscale image can be proportionally transformed to the default height, and then the text grayscale image under the default height can be used as the text grayscale image for determining the position of the space. After determining the position of the space in the text grayscale image at the default height, the position of the space in the original text grayscale image is determined according to the proportional relationship between the text grayscale image and the original text grayscale image at the default height.
  • the sum of the pixel values of each row of pixels in the preset direction is obtained, and the connected domain formed by the pixels corresponding to the sum of the pixel values in the first pixel value interval is obtained, Determine the space based on the obtained connected domain.
  • the second pixel value interval may be a pixel value interval between one pixel value and another pixel value.
  • since the text part also includes some background-colored pixels, text grayscale images of different heights contain different numbers of pixels, and the range of the second pixel value interval differs accordingly.
  • different second pixel value intervals may be set corresponding to different heights. According to the current actual height of the text grayscale image, the corresponding second pixel value interval is selected.
  • a default second pixel value interval may be set, and the default second pixel value interval corresponds to a default height.
  • the original text grayscale image can be proportionally transformed to the default height, and then the character width is calculated according to the text grayscale image under the default height and the second pixel value interval, and the width interval of the space is determined according to the character width.
  • the first pixel value interval and the second pixel value interval may intersect, and the minimum pixel value of the first pixel value interval is greater than the minimum pixel value of the second pixel value interval.
  • each processing step may assume by default that the background is closer to white, or that it is closer to black. If the current color of the background in the text grayscale image does not match the default, the colors of the text grayscale image can be inverted before processing.
  • the pixels of the background are identified by means of the segmentation pixel value and unified in color. Since all spaces belong to the background, the pixel values of the spaces are in theory uniform, so even under the influence of noise the sums of pixel values obtained in the spaces are relatively concentrated, and the spaces can be selected with the first pixel value interval, which determines the spaces of the single line of text in the text grayscale image.
  • the embodiment of the present application describes the method for recognizing spaces in the text line through a specific usage scenario.
  • the obtained text grayscale image is shown in Figure 4, the background part is closer to white, and the text part is close to black, but includes a lot of noise.
  • image preprocessing such as median blurring and equalization can be performed on the text grayscale image, so that the text grayscale image after image preprocessing can be used as the text grayscale image for subsequent processing.
  • the largest of the local maxima of the curve is the extreme point m1. Since the background of this text grayscale image is closer to white, the adjacent local minimum to the left of the maximum is chosen, that is, the first local minimum to the left of m1, shown as m2 in FIG. 5.
  • the pixel value corresponding to m2 is used as the segmentation pixel value. In the embodiment of the present application, the pixel value corresponding to the minimum point m2 in FIG. 5 is taken to be 213 as an example.
  • in this text grayscale image, the background is closer to white.
  • if, for example, the pixel value at the maximum point is taken to represent the pixel value of the background, it can likewise be determined that the background is closer to white.
  • the pixels in the text grayscale image whose values are greater than the segmentation pixel value are set to the first pixel value.
  • 255 is used as the first pixel value, and the pixel points with a pixel value greater than 213 in the text grayscale image are set to 255, and the obtained text grayscale image is shown in FIG. 6.
  • the text grayscale image shown in FIG. 6 can be closed, and the obtained text grayscale image is shown in FIG. 7. It can be seen from Figure 7 that the text grayscale image after the closing operation has less noise and the blank part is more pure.
  • for ease of calculation, the first pixel value interval and the second pixel value interval may be set for a black background, that is, a background pixel value of 0. Therefore, the colors of the text grayscale image shown in FIG. 7 can be inverted, that is, the pixel value of each pixel is set to 255 minus its current value, so that every white part with a pixel value of 255 becomes a black part with a pixel value of 0. The resulting text grayscale image is shown in FIG. 8.
  • the text grayscale image is a text grayscale image in which the text is arranged horizontally, and the sum of the pixel values of each column is counted.
  • the statistical results obtained can be shown in Figure 9.
  • the ordinate represents the sum of pixel values;
  • the abscissa represents the position of each pixel in the text arrangement direction, or the number of columns.
  • the ordinate value corresponding to the abscissa 100 in FIG. 9 may represent the sum of the pixel values of the pixels in the 100th column.
  • it is determined whether each sum of pixel values is within the first pixel value interval.
  • the embodiment of the present application takes 5-40 as the first pixel value interval as an example.
  • the determination of the character width can refer to the foregoing embodiment.
  • the connected domains within the width interval of the space are determined to be spaces, and the connected domains that are not within the width interval of the spaces are determined to be non-spaces. Therefore, the space in the text line can be effectively determined by the method of identifying the space in the text line.
  • the space recognition method can be used in optical character recognition (OCR). After text is detected, single lines of text are extracted. The method is then used to locate the spaces, the words separated by the spaces are sent to the OCR recognition module to be recognized one by one, and the recognized words are joined with spaces to obtain the complete single line of text with spaces, which avoids producing a run of words with no spaces that a human cannot read.
  • An embodiment of the present application also provides a device 300 for identifying spaces in a text line.
  • the pixel value acquisition module 320 is used to calculate the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, where the preset direction is perpendicular to the direction in which the characters of the single line of text are arranged.
  • the space determination module 330 is configured to use each connected domain formed by the pixels corresponding to the sums of pixel values in the first pixel value interval as a space in the single line of text, where the first pixel value interval is the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
  • the device may further include a segmentation module, including a segmentation pixel value determination unit for obtaining the segmentation pixel value in the text grayscale image, and a pixel value setting unit. If the pixel value of the background in the text grayscale image is greater than the segmentation pixel value, the pixel value setting unit sets the pixels whose values are greater than the segmentation pixel value to a first pixel value, which is greater than or equal to the pixel value of the background; if the pixel value of the background is smaller than the segmentation pixel value, it sets the pixels whose values are smaller than the segmentation pixel value to a second pixel value, which is less than or equal to the pixel value of the background.
  • the segmentation pixel value determination unit is configured to obtain, with the pixel value as the abscissa and the number of pixels as the ordinate, a fitted curve between the pixel values and the number of pixels corresponding to each pixel value; obtain the largest of all the local maxima of the fitted curve; obtain a local minimum adjacent to that maximum; and use the pixel value corresponding to the obtained local minimum as the segmentation pixel value.
  • the device may also include a denoising module, which is used to perform a closing operation on the text grayscale image.
  • the space determination module 330 may be used to obtain the character width in the text grayscale image; set the space width interval according to the character width; and set the pixel corresponding to the sum of the pixel values in the first pixel value interval Among the connected domains formed by dots, the connected domains whose width is within the width interval are determined as spaces in the single-line text.
  • the space determination module 330 may be used to obtain the width of each character in the text grayscale image; the median of the obtained widths of all characters is used as the character width.
  • the space determining module 330 may be used to take the width of each connected domain formed by the pixel points corresponding to the sum of the pixel values in the second pixel value interval as the width of a single character, and the second pixel value interval Different from the first pixel value interval, the second pixel value interval is an interval where the sum of the pixel values corresponding to the characters in the text grayscale image is located.
  • if the pixel value of the background in the text grayscale image is greater than a preset pixel value, the first pixel value interval and the second pixel value interval intersect, and the minimum pixel value of the first pixel value interval is greater than that of the second pixel value interval; if the pixel value of the background is less than or equal to the preset pixel value, the two intervals intersect, and the minimum pixel value of the first pixel value interval is smaller than that of the second pixel value interval.
  • the device may further include an equalization module, which is used to perform equalization processing on the text grayscale image.
  • the method and device for recognizing spaces in text lines provided by the embodiments of the present application can effectively extract the positions of spaces in the text by intelligently searching for the best segmentation point.
  • the coupling between the modules may be electrical, mechanical or other forms of coupling.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules.
  • Each module can be configured in different electronic devices, and can also be configured in the same electronic device, which is not limited in the embodiment of the present application.
  • FIG. 11 shows a structural block diagram of an electronic device 500 provided by an embodiment of the present application.
  • the electronic device may include one or more processors 510 (only one is shown in the figure), a memory 520, and one or more programs.
  • the one or more programs are stored in the memory 520 and configured to be executed by the one or more processors 510.
  • the one or more programs are executed by the processor to execute the methods described in the foregoing embodiments.
  • the processor 510 may include one or more processing cores.
  • the processor 510 uses various interfaces and lines to connect the parts of the electronic device 500, and performs the functions of the electronic device 500 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 520 and calling the data stored in the memory 520.
  • the processor 510 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA).
  • the processor 510 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is used for rendering and drawing of display content; the modem is used for processing wireless communication. It is understandable that the above-mentioned modem may not be integrated into the processor 510, but may be implemented by a communication chip alone.
  • the memory 520 may include random access memory (RAM) or read-only memory (Read-Only Memory).
  • the memory 520 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 520 may include a storage program area and a storage data area, where the storage program area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing each of the foregoing method embodiments, and the like.
  • the data storage area can also store data created by the electronic device in use.
  • FIG. 12 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer-readable storage medium 700 stores program code, and the program code can be invoked by a processor to execute the method described in the foregoing method embodiment.
  • the computer-readable storage medium 700 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 700 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 700 has a storage space for the program code 710 for executing any method steps in the above-mentioned methods. These program codes can be read from or written into one or more computer program products.
  • the program code 710 may be compressed in a suitable form, for example.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

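This application discloses a method and device for recognizing spaces in a text line, an electronic device, and a storage medium, and relates to the technical field of image processing. The method includes: obtaining a text grayscale image that contains only a single line of text; computing the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single line of text are arranged; and taking each connected domain formed by the pixels whose sums of pixel values fall within a first pixel value interval as a space in the single line of text, the first pixel value interval being the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie. This technical solution makes it possible to identify the spaces in a single line of text.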

Description

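Method and device for recognizing spaces in a text line, electronic device, and storage medium
Cross-reference to related applications
This application claims priority to Chinese application No. 202010231850.8, filed on March 23, 2020, the entire contents of which are incorporated herein by reference for all purposes.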
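Technical field
This application relates to the technical field of image processing and, more specifically, to a method and device for recognizing spaces in a text line, an electronic device, and a storage medium.
Background
When an image contains a line of characters, the spaces in it need to be extracted to determine which characters are separated by spaces, so that the real text information, including its spaces, can be obtained.
Summary
In view of the above problem, this application proposes a method and device for recognizing spaces in a text line, an electronic device, and a storage medium.
In a first aspect, an embodiment of this application provides a method for recognizing spaces in a text line, including: obtaining a text grayscale image that contains only a single line of text; computing the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single line of text are arranged; and taking each connected domain formed by the pixels whose sums of pixel values fall within a first pixel value interval as a space in the single line of text, the first pixel value interval being the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
In a second aspect, an embodiment of this application provides a device for recognizing spaces in a text line, including: an image acquisition module for obtaining a text grayscale image that contains only a single line of text; a pixel value acquisition module for computing the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single line of text are arranged; and a space determination module for taking each connected domain formed by the pixels whose sums of pixel values fall within a first pixel value interval as a space in the single line of text, the first pixel value interval being the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
In a third aspect, an embodiment of this application provides an electronic device, including: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs, when executed by the processors, perform the method described above.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing program code that can be invoked by a processor to perform the method described above.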
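Brief description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application, and a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for recognizing spaces in a text line provided by an embodiment of this application.
FIG. 2 is a schematic diagram of a pixel arrangement provided by an embodiment of this application.
FIG. 3 is a flowchart of a method for recognizing spaces in a text line provided by another embodiment of this application.
FIG. 4 shows an illustrative text grayscale image provided by an embodiment of this application.
FIG. 5 shows the fitted curve between pixel values and numbers of pixels in the text grayscale image provided by an embodiment of this application.
FIG. 6 shows the image after the pixel values of the background have been unified, provided by an embodiment of this application.
FIG. 7 shows an illustrative text grayscale image obtained by applying a closing operation to the text grayscale image of FIG. 6, provided by an embodiment of this application.
FIG. 8 shows the image obtained by inverting the colors of FIG. 7, provided by an embodiment of this application.
FIG. 9 shows the statistics obtained by computing the sum of pixel values of each column of pixels in FIG. 8.
FIG. 10 is a functional block diagram of a device for recognizing spaces in a text line provided by an embodiment of this application.
FIG. 11 is a structural block diagram of an electronic device provided by an embodiment of this application.
FIG. 12 shows a storage unit of an embodiment of this application for storing or carrying program code that implements the method for recognizing spaces in a text line according to an embodiment of this application.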
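Detailed description
To help a person skilled in the art better understand the solutions of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings.
Text in an image cannot be directly edited, copied, or cut, so it usually has to be recognized and converted into text form before such text processing operations can be applied to it.
For Chinese characters in an image, each character can stand as an independent word, so whether or not spaces are recognized, the characters can be laid out at a fixed spacing to form text carrying the real text information. For other languages, however, such as English, each word is made up of letters, and the letters of different words are drawn from the same alphabet. When a continuous line of characters appears, extracting the spaces between words becomes extremely important: if the spaces cannot be extracted, each line of text becomes one unbroken string of letters in which individual words cannot be told apart, which makes later machine processing and human reading difficult.
Therefore, the embodiments of this application provide a method and device for recognizing spaces in a text line, an electronic device, and a storage medium, which determine the spaces of the single line of text in a text grayscale image according to whether the computed sums of pixel values fall within the interval in which the sums corresponding to spaces lie. These are described in detail below through specific embodiments.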
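Referring to FIG. 1, a method for recognizing spaces in a text line provided by an embodiment of this application is shown. Specifically, the method includes:
Step S110: obtain a text grayscale image that contains only a single line of text.
The text grayscale image is a grayscale image containing only one line of text, which is the single line of text in that image. Recognizing the spaces in the text line of the text grayscale image means recognizing the spaces in this single line of text.
In the embodiments of this application, the arrangement direction of the single line of text is not limited: it may be horizontal, vertical, or any other direction. The embodiments are described using horizontal arrangement as an example.
Optionally, how to identify whether the text is arranged horizontally or vertically is not limited either. For example, horizontal arrangement may be processed by default, or vertical arrangement may be processed by default. As another example, the arrangement direction may be determined from two mutually perpendicular edges of the text grayscale image, taking the direction of the longer edge as the text arrangement direction.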
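Step S120: compute the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single line of text are arranged.
In the text grayscale image, the direction in which the characters of the single line of text are arranged is defined as the text arrangement direction, and the direction perpendicular to it as the preset direction. For example, for horizontally arranged text, the horizontal direction is the text arrangement direction and the vertical direction is the preset direction.
In the embodiments of this application, the sum of the pixel values of each line of pixels in the preset direction can be computed. A line of pixels in the preset direction is a line whose pixels are arranged along the preset direction. For example, for horizontally arranged text, a line of pixels in the vertical direction is a column, and the sum of pixel values is computed for each column of the text grayscale image. FIG. 2 shows the pixels of a text grayscale image with horizontally arranged text: column I1 contains the pixels (I1,J1), (I1,J2), (I1,J3); column I2 contains (I2,J1), (I2,J2), (I2,J3); column I3 contains (I3,J1), (I3,J2), (I3,J3); and so on. The preset direction is vertical, so computing the sum for each line of pixels in the preset direction means computing the sum of the gray values of each column, which gives 7 sums for columns I1 to I7.
Optionally, since pixels are densely packed and each space may span several lines of pixels, the sum may instead be computed over every two or more adjacent lines of pixels in the preset direction to speed up computation.
Optionally, for the same reason, the sums may be computed for one or more lines of pixels in the preset direction while skipping one or more lines in between.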
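The per-line summation of step S120 can be sketched as follows. This is a minimal illustration, assuming the text grayscale image is held as a list of rows of integers in 0 to 255; the function name `line_sums` and the `step` parameter (for the optional line-skipping speed-up) are illustrative names, not taken from the application.

```python
from typing import List


def line_sums(gray: List[List[int]], horizontal_text: bool = True, step: int = 1) -> List[int]:
    """Sum pixel values along the preset direction.

    For horizontally arranged text the preset direction is vertical,
    so one sum is produced per column. With step > 1 only every
    step-th line is summed (the optional speed-up).
    """
    if horizontal_text:
        width = len(gray[0])
        return [sum(row[x] for row in gray) for x in range(0, width, step)]
    # Vertically arranged text: the preset direction is horizontal, one sum per row.
    return [sum(gray[y]) for y in range(0, len(gray), step)]


# A 3x4 toy image with a black (0) background and one bright text column.
image = [
    [0, 9, 0, 0],
    [0, 7, 0, 3],
    [0, 8, 0, 0],
]
column_sums = line_sums(image)
```

With `step=2` the function sums only columns 0 and 2, which mirrors the variant that skips every other line.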
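Step S130: take each connected domain formed by the pixels whose sums of pixel values fall within a first pixel value interval as a space in the single line of text, the first pixel value interval being the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
To make text easy to recognize, in a single-line text image the characters usually share the same or similar colors, the background has the same or similar color, and the colors of text and background differ markedly. That is, the pixels forming the text have the same or similar values, the pixels forming the background have the same or similar values, and the difference between the two is large, for example larger than some preset pixel difference. The background is everything in the single-line text image other than the text, including the spaces and the areas above, below, left, and right of the text line. In a single-line text image, therefore, the pixel values of the text differ markedly from the pixel values of the spaces.
Therefore, when the pixels of a space are summed in the manner described in the preceding step, the sums are likely to fall within one range of values, defined as the first pixel value interval. This interval is distinctive: it differs from the range in which the sums fall when the pixels of the text are summed in the same manner.
The connected domain formed by the pixels corresponding to the sums of pixel values within the first pixel value interval can be recognized as a space in the single line of text. That is, the region made up of the pixels whose sums fall within the first pixel value interval is determined to be a space.
When the sum is computed for each line of pixels in the preset direction, or for every two or more adjacent lines, the pixels corresponding to a sum may be all the pixels used to compute that sum.
When lines are skipped, that is, sums are computed for one or more lines while one or more lines in between are skipped, the pixels corresponding to a sum may include all the pixels used to compute that sum as well as the pixels of the skipped lines. For example, for horizontally arranged text, suppose the sums are computed for the first column, the third column, and so on for every odd column. The pixels corresponding to the sum of each odd column may then include the pixels of that odd column and the pixels of the even column it skipped; in this example, the even column skipped by an odd column is the one whose index is one less than that of the odd column.
In the embodiments of this application, for the text grayscale image containing the single line of text, the direction perpendicular to the text arrangement direction is taken as the preset direction, and the sum of the pixel values of each line of pixels in the preset direction is computed. Then, based on the first pixel value interval in which the sums for spaces are expected to lie, the sums that fall within this interval are identified, and each connected domain formed by the corresponding pixels is taken as a space in the single line of text, so that the spaces in the single line of text are recognized fairly accurately.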
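The method for recognizing spaces in a text line provided by another embodiment of this application may further unify the color of the background so that the pixel values of the spaces are more uniform and the sums computed for them differ less and are more concentrated, which makes it easier to set an accurate first pixel value interval covering the range of those sums. Referring to FIG. 3, the method includes:
Step S210: obtain a text grayscale image that contains only a single line of text.
In the embodiments of this application, the way the text grayscale image is obtained is not limited. Nor are the size of the text, the ratio of text height to image height, or the ratio of text width to image width.
Optionally, the text grayscale image may be obtained by extracting a single line of text from a text image to produce a single-line text image containing only that line, and then converting the single-line text image into the text grayscale image through image preprocessing. The way the single line is extracted is not limited; for example, deep learning algorithms such as the TextBoxes family, the EAST family, or SegLink may be used.
Optionally, the text grayscale image may also be obtained by preprocessing a single-line text image that itself contains only one line of text.
In the embodiments of this application, image preprocessing may include one or more of the following:
if the single-line text image is not itself a grayscale image, for example if it is a three-channel RGB image, converting it to grayscale to obtain the text grayscale image;
denoising the text grayscale image, for example with a median blur;
equalizing the text grayscale image so that its pixel values are more evenly distributed and not excessively skewed.
Optionally, when image preprocessing includes two or more of these operations, they may be applied in the order listed above: denoising after grayscale conversion, and equalization after denoising. This reduces the difficulty of each step and makes the processing more effective.
In addition, the image obtained may optionally already be a grayscale text image, in which case preprocessing operations such as denoising and equalization are applied after it is obtained.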
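Step S220: obtain the segmentation pixel value of the text grayscale image.
Because the text grayscale image contains noise and is not binarized, the spaces do not have a single pure pixel value. A segmentation pixel value can be used to identify the pixel values that are more likely to belong to spaces and to unify them.
In other words, the segmentation pixel value separates spaces from text fairly accurately: pixel values on one side of it are more likely to be space pixel values than text pixel values, and pixel values on the other side are more likely to be text pixel values than space pixel values.
The two sides of the segmentation pixel value are the pixel values greater than it and the pixel values smaller than it. Which side is more likely space and which is more likely text depends on the actual pixel values of the spaces and the text in the text grayscale image.
Therefore, the segmentation pixel value can be obtained and the pixel values on the side more likely to be space unified to a single value, so that the sums computed in spaces are more concentrated. To keep spaces and text distinguishable, the unified value differs from the pixel values on the side more likely to be text.
Since the background, which includes the spaces, usually differs markedly in color from the text and contains more pixels than the text, a coordinate system can optionally be set up with the pixel value as the abscissa and the number of pixels as the ordinate to show how the number of pixels varies with the pixel value. In this coordinate system, a fitted curve between the pixel values and the number of pixels at each pixel value can be obtained, along with the largest of all its local maxima. The pixel value at this maximum is most likely the pixel value of the background and hence of the spaces.
Since one side of the segmentation pixel value should more likely be text and the other more likely be space, few pixels should have the segmentation pixel value itself, and it should most likely lie between the text pixel values and the space pixel values. The segmentation pixel value can therefore be a pixel value with few corresponding pixels: a local minimum adjacent to the maximum is obtained, and the pixel value at that local minimum is used as the segmentation pixel value. The local minimum indicates a small number of pixels, and its adjacency to the maximum suggests that its pixel value lies between the text and space pixel values.
In addition, since the segmentation pixel value lies between the space and text pixel values, either the adjacent local minimum to the left of the maximum or the one to its right can be chosen according to the actual text grayscale image. A local minimum to the left of the maximum is one whose pixel value is smaller than that of the maximum; a local minimum to the right is one whose pixel value is larger.
After equalization, the pixel values of the text grayscale image are more evenly distributed. The background and the text are well separated in color, so their pixel values differ markedly. Because the background includes the spaces, the background pixel value represents the space pixel value, and processing the background pixel values processes the space pixel values. If the background is closer to white, the text is closer to black and the segmentation pixel value should be smaller than the pixel value at the maximum, so the adjacent local minimum to the left of the maximum is chosen; if the background is closer to black, the text is closer to white and the segmentation pixel value should be larger, so the adjacent local minimum to the right is chosen.
In one implementation, whether the background is closer to white or to black can be a default, and processing follows the default directly. For example, the background may be assumed to be white or closer to white, in which case the pixel value at the local minimum to the left of the maximum is chosen as the segmentation pixel value.
Optionally, in this implementation, if the actual background color differs from the default, color inversion can convert the background and text colors to the default. For example, if the default background is white or closer to white but the actual background is black or closer to black, the pixel value of each pixel can be replaced by 255 minus its current value, which swaps black and white. A pixel with a value of 214, for instance, becomes 255 - 214 = 41.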
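The color inversion described above is a single arithmetic step per pixel. A minimal sketch, assuming an 8-bit grayscale image stored as a list of rows (the function name `invert` is illustrative):

```python
from typing import List


def invert(gray: List[List[int]]) -> List[List[int]]:
    """Replace every pixel value v with 255 - v, swapping black and white."""
    return [[255 - v for v in row] for row in gray]


# The value 214 from the text maps to 41; 0 and 255 swap.
inverted = invert([[214, 0], [255, 41]])
```

Applying `invert` twice returns the original image, so the step is safe to use as a normalization before any processing that assumes a default background color.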
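In another implementation, a preset pixel value can be used to separate white from black and so decide whether the background is closer to white or to black. If the pixel value of the background is greater than the preset pixel value, the background is determined to be closer to white; if it is less than or equal to the preset pixel value, the background is determined to be closer to black. The preset pixel value is not specifically limited and can be set as needed, for example to a mid-range gray value such as 127 or 128.
Optionally, in this implementation, since the background has the most pixels, its pixel value can be represented by the pixel value at the maximum.
Optionally, since the corners of a text grayscale image are usually part of the background, the background pixel value can be represented by one of the following: the average pixel value of the four corners, the average pixel value of one or more of the four corners, the most frequent pixel value in the four corners, or the most frequent pixel value in one or more of the four corners.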
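One of the options above, estimating the background from the four corners and comparing it with a preset pixel value, might look like the sketch below. It assumes a single pixel per corner and a preset value of 127; both choices, and the function name `corner_background`, are illustrative assumptions rather than details fixed by the application.

```python
from typing import List, Tuple


def corner_background(gray: List[List[int]], preset: int = 127) -> Tuple[float, bool]:
    """Return (mean of the four corner pixels, whether the background is closer to white)."""
    h, w = len(gray), len(gray[0])
    corners = [gray[0][0], gray[0][w - 1], gray[h - 1][0], gray[h - 1][w - 1]]
    mean = sum(corners) / 4
    # Greater than the preset value: closer to white; otherwise closer to black.
    return mean, mean > preset


light_mean, light_flag = corner_background([[250, 10, 240], [245, 0, 255]])
dark_mean, dark_flag = corner_background([[3, 200, 5], [0, 180, 4]])
```

In practice a small patch around each corner would likely be more robust to noise than a single pixel.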
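In addition, the segmentation pixel value can optionally be obtained as follows: in the coordinate system with the pixel value as the abscissa and the number of pixels as the ordinate, find the pixel value with the largest number of pixels and define that number of pixels as the maximum. Then obtain a local minimum adjacent to this maximum and use its pixel value as the segmentation pixel value. The adjacent local minimum is obtained as described above and is not repeated here.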
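The search for the segmentation pixel value can be sketched as a histogram, a smoothed curve, and a walk downhill from the highest peak. The application does not say how the fitted curve is computed, so the moving average used here is only a stand-in, and the names `histogram`, `smooth`, and `segmentation_value` are illustrative.

```python
from typing import List, Sequence


def histogram(gray: List[List[int]]) -> List[int]:
    """Count how many pixels take each value from 0 to 255."""
    counts = [0] * 256
    for row in gray:
        for v in row:
            counts[v] += 1
    return counts


def smooth(counts: Sequence[float], radius: int = 2) -> List[float]:
    """Moving average, used here as an assumed stand-in for the fitted curve."""
    n = len(counts)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(counts[lo:hi]) / (hi - lo))
    return out


def segmentation_value(curve: Sequence[float], light_background: bool = True) -> int:
    """Walk from the highest peak to the adjacent local minimum.

    Light background: walk left (smaller pixel values).
    Dark background: walk right (larger pixel values).
    """
    peak = max(range(len(curve)), key=curve.__getitem__)
    step = -1 if light_background else 1
    i = peak
    # Keep descending while the next point on the curve is strictly lower.
    while 0 <= i + step < len(curve) and curve[i + step] < curve[i]:
        i += step
    return i


# Toy curves: the index plays the role of the pixel value.
light_curve = [1, 5, 9, 4, 2, 3, 8, 20, 12]
dark_curve = [12, 20, 8, 3, 2, 4, 9, 5, 1]
```

On a real image the call would be `segmentation_value(smooth(histogram(gray)), light_background)`; how much smoothing is needed depends on the noise in the histogram.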
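Step S230: if the pixel value of the background in the text grayscale image is greater than the segmentation pixel value, set the pixels whose values are greater than the segmentation pixel value to a first pixel value, the first pixel value being greater than or equal to the pixel value of the background. If the pixel value of the background is smaller than the segmentation pixel value, set the pixels whose values are smaller than the segmentation pixel value to a second pixel value, the second pixel value being less than or equal to the pixel value of the background.
If the background is closer to white, its pixel value is greater than the segmentation pixel value, and the background pixel values can be converted to a single value. Specifically, the pixels whose values are greater than the segmentation pixel value can be set to a first pixel value that is greater than or equal to the pixel value of the background. Optionally, the background can be converted uniformly to white, that is, the first pixel value is 255.
If the background is closer to black, its pixel value is smaller than the segmentation pixel value, and the background pixel values can likewise be converted to a single value. Specifically, the pixels whose values are smaller than the segmentation pixel value can be set to a second pixel value that is less than or equal to the pixel value of the background. Optionally, the background can be converted uniformly to black, that is, the second pixel value is 0.
Optionally, the pixel value of the background in the text grayscale image can be obtained to decide whether the background pixels should be set to the first pixel value or the second pixel value.
Optionally, a default can instead specify which color the background is closer to, and the background is set to the pixel value of that color. For example, if the default is that the background is closer to white, the pixels whose values are greater than the segmentation pixel value are set directly to the first pixel value.
Optionally, if the actual background color does not match the default, the pixel values of the text grayscale image can be inverted. For example, if the default is a background closer to white but the actual background is closer to black, the pixel value of each pixel is replaced by 255 minus its current value, which swaps black and white.
In the embodiments of this application, the background includes the spaces, so converting the background pixel values to a single value converts the space pixel values to a single value.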
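Step S230 amounts to a one-sided threshold. The sketch below assumes the first pixel value is 255 and the second is 0, as in the optional choices above; the function name `unify_background` is illustrative.

```python
from typing import List


def unify_background(gray: List[List[int]], split: int, background: int) -> List[List[int]]:
    """Set pixels on the background side of the segmentation pixel value to one value.

    Light background (background > split): values above split become 255.
    Dark background (background < split): values below split become 0.
    """
    if background > split:
        return [[255 if v > split else v for v in row] for row in gray]
    if background < split:
        return [[0 if v < split else v for v in row] for row in gray]
    return [row[:] for row in gray]


# Segmentation pixel value 213 with a light background, as in the worked example.
light = unify_background([[250, 214, 213, 30]], split=213, background=240)
dark = unify_background([[5, 40, 41, 200]], split=41, background=10)
```

Pixels on the text side keep their original values, so the image stays a grayscale image rather than becoming binary.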
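Step S240: compute the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single line of text are arranged.
Optionally, because a closing operation bridges narrow breaks and long thin gaps, eliminates small holes, and fills breaks in contour lines, a closing operation can be applied to the text grayscale image to further reduce noise in the spaces. The pixel values of the spaces become purer, and the sums computed for the spaces are less disturbed and more concentrated.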
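A grayscale closing is a dilation (local maximum) followed by an erosion (local minimum). The sketch below uses a square window clipped at the image border; the window size and the helper names are assumptions, since the application does not specify the structuring element.

```python
from typing import Callable, Iterable, List


def _window_filter(gray: List[List[int]], pick: Callable[[Iterable[int]], int], radius: int) -> List[List[int]]:
    """Apply pick (max or min) over a (2*radius+1)-sized square window, clipped at the borders."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def closing(gray: List[List[int]], radius: int = 1) -> List[List[int]]:
    """Dilation followed by erosion: removes dark specks smaller than the window."""
    return _window_filter(_window_filter(gray, max, radius), min, radius)


# A single dark noise pixel on a white background disappears.
speck = [[255] * 5 for _ in range(5)]
speck[2][2] = 0

# A 3x3 dark block (a stand-in for a text stroke) survives.
block = [[255] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        block[y][x] = 0
```

On a white background this removes isolated dark noise while leaving strokes at least as large as the window intact, which matches the effect described for FIG. 7.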
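Step S250: take each connected domain formed by the pixels whose sums of pixel values fall within a first pixel value interval as a space in the single line of text, the first pixel value interval being the interval in which the sums of pixel values corresponding to spaces in the text grayscale image lie.
In the text grayscale image, the pixel values of each line of pixels perpendicular to the text arrangement direction are summed. Because the pixel values of the spaces have been unified, the sums obtained in the spaces are concentrated within one interval. Whether a given sum comes from the pixels of a space can therefore be decided by whether it falls within the interval in which the sums for spaces lie. Each connected domain formed by the pixels whose sums fall within the first pixel value interval is taken as a space in the single line of text.
Optionally, the connected domains can be determined as follows: starting from one end of the text grayscale image in the text arrangement direction and moving toward the other end, check each sum in turn against the first pixel value interval. The first sum found within the interval marks the start of a connected domain, and checking continues in the same direction. When a sum outside the interval is found, the previous sum marks the end of the connected domain, which determines one connected domain.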
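The scan described above is a run-length search over the sequence of sums. A minimal sketch, assuming a closed interval `[low, high]` and inclusive `(start, end)` column indices for each connected domain (the function name `find_runs` is illustrative):

```python
from typing import List, Sequence, Tuple


def find_runs(sums: Sequence[float], low: float, high: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of consecutive sums within [low, high]."""
    runs = []
    start = None
    for i, s in enumerate(sums):
        inside = low <= s <= high
        if inside and start is None:
            start = i  # first sum inside the interval opens a connected domain
        elif not inside and start is not None:
            runs.append((start, i - 1))  # the previous sum closes it
            start = None
    if start is not None:
        runs.append((start, len(sums) - 1))  # a run that reaches the end of the line
    return runs


# Sums in 5..40 are treated as belonging to spaces, as in the example interval.
example_sums = [300, 20, 10, 400, 500, 6, 7, 8, 900, 30]
runs = find_runs(example_sums, 5, 40)
```

The width of each connected domain is `end - start + 1`, which is what the width interval in the following paragraphs is compared against.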
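Optionally, since text itself also contains gaps, such as the gaps between the letters of a single word, the width of a space can be constrained to reduce false detections: a width interval for spaces is set, and among the connected domains formed by the pixels whose sums fall within the first pixel value interval, those whose width lies within the width interval are taken as spaces in the single line of text. The width of a space is its width in the text arrangement direction. For example, in a text grayscale image with horizontally arranged text, the width of a space is its horizontal length.
In one implementation, the width interval of the space can be set according to the character width.
In this implementation, the character width in the text grayscale image can be obtained, where the character width may be the width in the text arrangement direction. The width interval of the space is then set according to the character width.
In this implementation, since a single-line text image contains multiple characters, multiple character widths can be obtained. One character width can be derived from them and used to set the width interval of the space.
Optionally, the width of each character in the text grayscale image can be obtained, and the median of all the obtained widths is used as the character width for setting the width interval of the space.
Optionally, the width of each character in the text grayscale image can be obtained, and the average of all the obtained widths is used as the character width for setting the width interval of the space.
The width of a character can also be determined from the interval in which the sums of pixel values lie. Specifically, the width of each connected domain formed by the pixels whose sums fall within a second pixel value interval can be taken as the width of a single character. That is, each such connected domain corresponds to one character, and its width is the width of that character.
The second pixel value interval is the interval in which the sums of pixel values corresponding to characters in the text grayscale image lie; in other words, within a region that includes characters and excludes spaces, it is the interval in which the sum of the pixel values of each line of pixels in the preset direction lies. The sum of pixel values corresponding to a character is the sum of the pixel values of the line of pixels on which a pixel of the character lies. For example, in the pixel arrangement of FIG. 2, if the pixel (I2,J2) is a pixel of a character, the sum of the pixel values in column I2 is a sum of pixel values corresponding to a character. Since the pixel values of text and spaces differ, the second pixel value interval differs from the first pixel value interval.
Optionally, when the width interval of the space is set according to the character width, since the width of a space is usually greater than one ratio of the character width and less than another, the character width can be multiplied by a first ratio to obtain a first value and by a second ratio to obtain a second value, and the interval formed by the two values is used as the width interval of the space. For example, if the first ratio is one-third and the second ratio is one-half, the width interval of the space runs from one-third of the character width to one-half of the character width. The specific ratios are not limited in the embodiments of this application and can be set according to actual conditions.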
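Deriving the width interval from the character widths and filtering the connected domains can be sketched as below. The median is used as the representative character width and the ratios default to one-third and one-half, following the examples in the text; `Fraction` keeps the arithmetic exact, and the function names are illustrative.

```python
from fractions import Fraction
from statistics import median
from typing import List, Sequence, Tuple


def space_width_interval(char_widths: Sequence[int],
                         first_ratio: Fraction = Fraction(1, 3),
                         second_ratio: Fraction = Fraction(1, 2)) -> Tuple[Fraction, Fraction]:
    """Multiply the median character width by the two ratios to get the width interval."""
    width = median(char_widths)
    return width * first_ratio, width * second_ratio


def keep_spaces(runs: Sequence[Tuple[int, int]],
                interval: Tuple[Fraction, Fraction]) -> List[Tuple[int, int]]:
    """Keep only connected domains (inclusive start, end) whose width is within the interval."""
    low, high = interval
    return [(s, e) for s, e in runs if low <= e - s + 1 <= high]


interval = space_width_interval([10, 12, 14])          # median width 12
spaces = keep_spaces([(0, 0), (5, 9), (20, 40)], interval)  # widths 1, 5, 21
```

Here the one-column run is rejected as an intra-word gap and the 21-column run as trailing blank area, leaving only the 5-column run as a space.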
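Optionally, text of different font sizes has different space widths and different character widths. In this implementation, space width intervals corresponding to different character widths can be set. After the character width in the text grayscale image is determined, the width interval of the space is determined according to this correspondence.
In one implementation, since a space is wider than the gaps inside the text itself and narrower than the text-free area that may form at the end of the text, it is also possible to obtain, among the connected domains formed by the pixels whose sums fall within the first pixel value interval, the connected domain whose width is the median. A certain ratio of the width of this median-width connected domain is added and subtracted to form the space width interval. For example, if the width of the median-width connected domain is 6 and the ratio added and subtracted is one-third, that is, 2, the resulting space width interval is 4 to 8.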
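The median-width variant can be checked against the numbers in the text. A minimal sketch, assuming the median is taken as the middle element of the sorted widths (the function name `interval_from_runs` is illustrative):

```python
from fractions import Fraction
from typing import Sequence, Tuple


def interval_from_runs(run_widths: Sequence[int],
                       ratio: Fraction = Fraction(1, 3)) -> Tuple[Fraction, Fraction]:
    """Take the middle width of the sorted run widths and add/subtract ratio * width."""
    ordered = sorted(run_widths)
    middle = ordered[len(ordered) // 2]
    delta = middle * ratio
    return middle - delta, middle + delta


# Intra-word gaps (1, 2), spaces (6, 7) and a trailing blank area (30).
low, high = interval_from_runs([1, 2, 6, 7, 30])
```

With a middle width of 6 and a ratio of one-third, the interval is 4 to 8, as in the example above.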
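In one implementation, the width interval of the space can also be preset according to common space widths.
In the embodiments of this application, once the width interval of the space has been determined, the connected domains whose width lies within the width interval, among those formed by the pixels whose sums fall within the first pixel value interval, are determined to be spaces in the single line of text. That is, a connected domain whose width is within the width interval of the space is determined to be a space in the text grayscale image, and a connected domain whose width is outside the width interval is not regarded as a space.
Optionally, a connected domain wider than the width interval can also be determined to be the blank area after the end of the single line of text.
In the embodiments of this application, the first pixel value interval may be set differently depending on whether the unified background is closer to white, closer to black, or black.
In one implementation, if the background is unified to black, the pixel values of the background pixels, and therefore of the spaces, are all 0, and the sum of any number of pixels whose value is 0 is still 0. Even if a space contains noise, little noise remains after processing, and since noise itself appears as gray values, the sums of pixel values in the spaces are concentrated in a low range that is largely unaffected by factors such as the height of the text grayscale image or the font size. In this implementation, for fault tolerance, the first pixel value interval may be set to a low range, namely an interval within the pixel values below 127, such as 5 to 40.
In this implementation, the pixel values of the text pixels are greater than 0 and the maximum of their sums is uncertain, so the second pixel value interval is an interval that is unbounded on the right, that is, an interval greater than a minimum pixel value. For fault tolerance, the first pixel value interval and the second pixel value interval may intersect, with the minimum pixel value of the first interval smaller than that of the second. For example, the first pixel value interval is set to 5 to 40 and the second pixel value interval to greater than 10.
In one implementation, if the background pixels are unified to a nonzero pixel value, for example one greater than the preset pixel value, the sum of the pixels in a space depends on the height of the text grayscale image in the preset direction, so the minimum sum for a space varies and the maximum sum that may be reached is also uncertain. The range of the first pixel value interval is therefore not fixed, and the first pixel value interval can be an interval that is unbounded on the right, that is, an interval greater than a minimum pixel value.
Optionally, in this implementation, the first pixel value interval can correspond to the height of the text grayscale image, that is, different first pixel value intervals can be set for different heights. The first pixel value interval is selected according to the actual height of the text grayscale image.
Optionally, in this implementation, a default first pixel value interval corresponding to a default height can be set. The original text grayscale image can be scaled proportionally to the default height, and the scaled image is used to locate the spaces. After the positions of the spaces in the scaled image are determined, the positions of the spaces in the original text grayscale image are determined from the scale between the two images. That is, the sum of the pixel values of each line of pixels in the preset direction is obtained for the text grayscale image at the default height, the connected domains formed by the pixels whose sums fall within the first pixel value interval are obtained, and the spaces are determined from these connected domains.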
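Scaling to a default height and mapping the detected columns back to the original image might look like the sketch below. Nearest-neighbor sampling is used only because it needs no library; the application does not prescribe a resampling method, and the names `resize_to_height` and `map_back` are illustrative.

```python
from typing import List, Sequence, Tuple


def resize_to_height(gray: List[List[int]], target_height: int) -> Tuple[List[List[int]], float]:
    """Scale the image proportionally to target_height (nearest neighbor); return image and scale."""
    h, w = len(gray), len(gray[0])
    scale = target_height / h
    new_w = max(1, round(w * scale))
    resized = [
        [gray[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))] for x in range(new_w)]
        for y in range(target_height)
    ]
    return resized, scale


def map_back(runs: Sequence[Tuple[int, int]], scale: float, original_width: int) -> List[Tuple[int, int]]:
    """Convert inclusive (start, end) columns in the scaled image to original columns."""
    mapped = []
    for start, end in runs:
        first = min(original_width - 1, int(start / scale))
        last = min(original_width - 1, int((end + 1) / scale) - 1)
        mapped.append((first, last))
    return mapped


original = [[1, 2, 3, 4, 5, 6, 7, 8] for _ in range(4)]  # 4 rows, 8 columns
scaled, scale = resize_to_height(original, 2)            # default height of 2 (assumed)
back = map_back([(1, 2)], scale, original_width=8)
```

After scaling, column sums no longer depend on the original image height, so a single default interval can be applied before the positions are mapped back.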
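Correspondingly, in this implementation, since the text is close to black and its pixel values are small, the second pixel value interval can be an interval between one pixel value and another. However, since the text part also includes some background-colored pixels, text grayscale images of different heights contain different numbers of pixels, and the range of the second pixel value interval differs accordingly.
Therefore, optionally, different second pixel value intervals can be set for different heights in this implementation. The second pixel value interval is selected according to the actual height of the text grayscale image.
Optionally, in this implementation, a default second pixel value interval corresponding to a default height can be set. The original text grayscale image can be scaled proportionally to the default height, the character width is computed from the scaled image and the second pixel value interval, and the width interval of the space is determined from the character width.
In addition, in this implementation, for fault tolerance, the first pixel value interval and the second pixel value interval may intersect, with the minimum pixel value of the first interval greater than that of the second.
In the embodiments of this application, each processing step applied to the text grayscale image may assume by default that the background is closer to white, or that it is closer to black. If the current color of the background does not match the default, the colors of the text grayscale image can be inverted before processing.
In the embodiments of this application, the pixels of the background are identified by means of the segmentation pixel value and unified in color. Since all spaces belong to the background, the pixel values of the spaces are in theory uniform, so even under the influence of noise the sums of pixel values obtained in the spaces are relatively concentrated, and the spaces can be selected with the first pixel value interval, which determines the spaces of the single line of text in the text grayscale image.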
本申请实施例通过一种具体的使用场景对该文本行中的空格识别方法进行说明。
获取到的文本灰度图如图4所示,背景部分更接近白色,文字部分接近黑色,但是包括很多噪音。可选的,可以对该文本灰度图可以进行中值模糊以及均衡化等图像预处理,以进行图像预处理后的文本灰度图,作为后续处理的文本灰度图。
在文本灰度图中,可以以像素值为横坐标,像素点数量为纵坐标,获取像素值以及每个像素值对应的像素点数量之间的拟合曲线,获取到的拟合曲线如图5中的曲线L所示。
在该曲线L中,可以确定极大值中的最大值为极值点m1。由于在该文本灰度图中,背景部分更偏向白色,选择做大值左侧的相邻极小值,即选择m1左侧第一个极小值点,如图5中的极小值点m2。以m2对应的像素值作为分割像素值。本申请实施例以如图5中极小值点m2对应的像素值为213进行举例。
在该文本灰度图中,背景更偏向白色,例如以最大值点对应的像素点的像素值代表背景的像素值,也可以确定背景更偏向白色。将文本灰度图中大于分割像素值的像素点设置为第一像素值。在本申请实施例中,以255作为第一像素值,将文本灰度图中像素值大于213的像素点设置为255,获得的文本灰度图如图6所示。
为了进一步去除噪点,可以对图6所示的文本灰度图进行闭操作,获得的文本灰度图如图7所示。从图7可以看出,闭操作后的文本灰度图中,噪点更少,空白部分更加纯净。
在本申请实施例中,为了更方便计算,设置的第一像素值区间以及第二像素值区间可以是针对背景为黑色,即背景的像素值为0的。因此,可以将如图7所示文本灰度图进行颜色翻转,即将每个像素点的像素值设置为255减去当前像素值,实现该文本灰度图中所有的像素值为255的白色部分转换为像素值为0的黑色部分,获得的文本灰度图如图8所示。
对图8进行每排像素值之和的统计。该文本灰度图为文字横向排列的文本 灰度图,统计每一列的像素值之和。获得的统计结果可以如图9所示。在如图9所示的统计结果中,纵坐标表示像素值之和;横坐标表示文字排列方向上的每一个像素点位置,或者说表示第几列。例如,图9中的横坐标100对应的纵坐标值可以表示第100列像素点的像素值之和。
确定各个像素值之和是否在第一像素值区间内。本申请实施例以5至40作为第一像素值区间为例。可以对图9所示的统计结果中,从左至右依次确认各个纵坐标值是否在5至40的范围内。确定在该范围内的连续的纵坐标对应的横坐标,将确定的横坐标对应的像素列连起来,作为在第一像素值区间内的像素值之和对应的像素点形成的连通域。
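In addition, a width interval for spaces may be set. For example, the range from one third of the character width to one half of the character width is used as the width interval of a space. The character width can be determined as described in the preceding embodiments.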
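Based on the width interval of a space, among the connected regions determined from the first pixel value interval, those whose width falls within the width interval are determined to be spaces, and those whose width does not are determined to be non-spaces. The spaces in the text line can thus be determined effectively by this method.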
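The width filter can be illustrated as follows, assuming connected regions are given as inclusive (start, end) column ranges and using the example ratios of one third and one half of the character width. The name `select_spaces` and its parameters are hypothetical.

```python
def select_spaces(runs, char_width, low_ratio=1 / 3, high_ratio=1 / 2):
    """Keep the runs whose width lies within [char_width*low_ratio, char_width*high_ratio]."""
    lo, hi = char_width * low_ratio, char_width * high_ratio
    return [(s, e) for s, e in runs if lo <= e - s + 1 <= hi]


# With a character width of 12 the accepted widths are 4 to 6 columns.
candidates = [(10, 14), (30, 31), (50, 58)]   # widths 5, 2, 9
print(select_spaces(candidates, char_width=12))
```

Narrow gaps between letters of the same word (width 2 here) and wide blank margins (width 9) are both rejected.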
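The space recognition method can be used in optical character recognition (OCR): after text is detected, single lines of text are extracted. The method is then used to locate the spaces, the separated words are fed one by one into the OCR recognition module, and the recognized words are joined with spaces. This yields the final, complete single-line text with spaces and avoids output in which all the words run together without spaces and cannot be read by a human.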
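The word-by-word flow can be sketched in Python as below. The sketch assumes spaces are inclusive (start, end) column ranges and that some recognizer maps a column range to a string; `recognize` is a stand-in for the OCR recognition module, whose real interface is not described here. The names `word_spans` and `read_line` are hypothetical.

```python
def word_spans(width, spaces):
    """Return the inclusive column ranges lying between the given space ranges."""
    spans, cursor = [], 0
    for start, end in sorted(spaces):
        if start > cursor:
            spans.append((cursor, start - 1))
        cursor = end + 1
    if cursor < width:
        spans.append((cursor, width - 1))
    return spans


def read_line(width, spaces, recognize):
    """Recognize each word span separately and join the results with spaces."""
    return " ".join(recognize(span) for span in word_spans(width, spaces))


# A dictionary lookup plays the role of the recognizer in this toy example.
fake_ocr = {(0, 39): "hello", (46, 99): "world"}
print(read_line(100, [(40, 45)], fake_ocr.get))
```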
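An embodiment of this application further provides an apparatus 300 for recognizing spaces in a text line. As shown in FIG. 10, the apparatus 300 includes: an image acquisition module 310, configured to acquire a text grayscale image that contains only a single line of text; a pixel value acquisition module 320, configured to calculate the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single-line text are arranged; and a space determination module 330, configured to take the connected region formed by the pixels whose pixel value sums fall within a first pixel value interval as a space in the single-line text, the first pixel value interval being the interval in which the pixel value sums corresponding to spaces in the text grayscale image lie.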
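Optionally, the apparatus may further include a segmentation module, which includes: a segmentation pixel value determination unit, configured to obtain the segmentation pixel value of the text grayscale image; and a pixel value setting unit, configured to, if the background pixel value of the text grayscale image is greater than the segmentation pixel value, set pixels whose values are greater than the segmentation pixel value to a first pixel value that is greater than or equal to the background pixel value, and, if the background pixel value is less than the segmentation pixel value, set pixels whose values are less than the segmentation pixel value to a second pixel value that is less than or equal to the background pixel value.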
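Optionally, the segmentation pixel value determination unit is configured to: obtain a fitted curve relating each pixel value to the number of pixels having that value, with pixel value on the horizontal axis and pixel count on the vertical axis; obtain the largest of all local maxima of the fitted curve; obtain a local minimum adjacent to that maximum; and use the pixel value corresponding to the obtained local minimum as the segmentation pixel value.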
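Optionally, the apparatus may further include a denoising module, configured to apply a closing operation to the text grayscale image.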
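Optionally, the space determination module 330 may be configured to: obtain the character width in the text grayscale image; set the width interval of a space according to the character width; and, among the connected regions formed by the pixels whose pixel value sums fall within the first pixel value interval, determine those whose width falls within the width interval to be spaces in the single-line text.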
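Optionally, the space determination module 330 may be configured to obtain the width of each character in the text grayscale image and use the median of all obtained character widths as the character width.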
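Optionally, the space determination module 330 may be configured to take the width of each connected region formed by the pixels whose pixel value sums fall within a second pixel value interval as the width of a single character. The second pixel value interval differs from the first pixel value interval and is the interval in which the pixel value sums corresponding to characters in the text grayscale image lie.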
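Taking the median of the per-character widths can be sketched as follows. The Python below assumes the character regions have already been found as inclusive (start, end) column ranges, for example as runs of column sums lying in the second pixel value interval. The name `character_width` is hypothetical.

```python
from statistics import median


def character_width(char_runs):
    """Median width of the inclusive (start, end) ranges, one per character."""
    return median(end - start + 1 for start, end in char_runs)


# Three character regions with widths 10, 12 and 11.
print(character_width([(0, 9), (15, 26), (30, 40)]))
```

The median is less sensitive than the mean to a few merged or broken characters, which may be one reason to prefer it when the segmentation is noisy.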
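Optionally, if the background pixel value of the text grayscale image is greater than the preset pixel value, the first pixel value interval and the second pixel value interval overlap, and the minimum pixel value of the first pixel value interval is greater than the minimum pixel value of the second pixel value interval; if the background pixel value of the text grayscale image is less than or equal to the preset pixel value, the first pixel value interval and the second pixel value interval overlap, and the minimum pixel value of the first pixel value interval is less than the minimum pixel value of the second pixel value interval.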
Optionally, the apparatus may further include an equalization module, configured to equalize the text grayscale image.
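The method and apparatus for recognizing spaces in a text line provided by the embodiments of this application can effectively extract the positions of spaces in text by intelligently finding the optimal segmentation point.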
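Those skilled in the art will clearly understand that, for convenience and brevity of description, the method embodiments above may be read with reference to one another, and that for the specific working processes of the apparatus and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not repeated here.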
In the several embodiments provided in this application, the coupling between modules may be electrical, mechanical, or of another form.
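In addition, the functional modules in the embodiments of this application may be integrated into one processing module, may each exist physically on their own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The modules may be deployed in different electronic devices or in the same electronic device; the embodiments of this application impose no limitation in this respect.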
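Refer to FIG. 11, which shows a structural block diagram of an electronic device 500 provided by an embodiment of this application. The electronic device may include one or more processors 510 (only one is shown in the figure), a memory 520, and one or more programs. The one or more programs are stored in the memory 520 and configured to be executed by the one or more processors 510, and when executed by the processor they perform the methods described in the foregoing embodiments.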
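The processor 510 may include one or more processing cores. The processor 510 uses various interfaces and lines to connect the parts of the electronic device 500, and performs the functions of the electronic device 500 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 520 and by calling data stored in the memory 520. Optionally, the processor 510 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 510 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also be implemented by a separate communication chip rather than being integrated into the processor 510.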
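The memory 520 may include random access memory (RAM) and may also include read-only memory (ROM). The memory 520 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 520 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the method embodiments above, and so on. The data storage area may store data created by the electronic device during use, and so on.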
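Refer to FIG. 12, which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of this application. The computer-readable storage medium 700 stores program code that can be called by a processor to execute the methods described in the method embodiments above.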
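The computer-readable storage medium 700 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 700 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 700 has storage space for program code 710 that performs any of the method steps above. The program code can be read from or written to one or more computer program products. The program code 710 may, for example, be compressed in a suitable form.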
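Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.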

Claims (20)

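  1. A method for recognizing spaces in a text line, characterized in that the method comprises:
    acquiring a text grayscale image, the text grayscale image containing only a single line of text;
    calculating the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single-line text are arranged; and
    taking the connected region formed by the pixels whose pixel value sums fall within a first pixel value interval as a space in the single-line text, the first pixel value interval being the interval in which the pixel value sums corresponding to spaces in the text grayscale image lie.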
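  2. The method according to claim 1, characterized in that, before the calculating of the sum of the pixel values of each line of pixels in the preset direction in the text grayscale image, the method further comprises:
    obtaining a segmentation pixel value of the text grayscale image;
    if the background pixel value of the text grayscale image is greater than the segmentation pixel value, setting pixels whose values are greater than the segmentation pixel value to a first pixel value, the first pixel value being greater than or equal to the background pixel value; and
    if the background pixel value of the text grayscale image is less than the segmentation pixel value, setting pixels whose values are less than the segmentation pixel value to a second pixel value, the second pixel value being less than or equal to the background pixel value.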
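  3. The method according to claim 2, characterized in that the obtaining of the segmentation pixel value of the text grayscale image comprises:
    obtaining a fitted curve relating each pixel value to the number of pixels having that value, with pixel value on the horizontal axis and pixel count on the vertical axis;
    obtaining the largest of all local maxima of the fitted curve;
    obtaining a local minimum adjacent to the maximum; and
    using the pixel value corresponding to the obtained local minimum as the segmentation pixel value.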
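  4. The method according to claim 3, characterized in that the obtaining of a local minimum adjacent to the maximum comprises:
    if the background pixel value of the text grayscale image is greater than a preset pixel value, obtaining the local minimum that is adjacent to and on the left of the maximum; and
    if the background pixel value of the text grayscale image is less than or equal to the preset pixel value, obtaining the local minimum that is adjacent to and on the right of the maximum.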
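  5. The method according to any one of claims 2 to 4, characterized in that the background pixel value is any one of: the average pixel value of the four corners of the text grayscale image; the average pixel value of one or more of the four corners; the pixel value shared by the most pixels in the four corners; and the pixel value shared by the most pixels in one or more of the four corners.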
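  6. The method according to any one of claims 1 to 5, characterized in that, before the calculating of the sum of the pixel values of each line of pixels in the preset direction in the text grayscale image, the method further comprises:
    applying a closing operation to the text grayscale image.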
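  7. The method according to any one of claims 1 to 6, characterized in that the taking of the connected region formed by the pixels whose pixel value sums fall within the first pixel value interval as a space in the single-line text comprises:
    obtaining the character width in the text grayscale image;
    setting a width interval of a space according to the character width; and
    among the connected regions formed by the pixels whose pixel value sums fall within the first pixel value interval, determining those whose width falls within the width interval to be spaces in the single-line text.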
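  8. The method according to claim 7, characterized in that the obtaining of the character width in the text grayscale image comprises:
    obtaining the width of each character in the text grayscale image; and
    using the median of all obtained character widths as the character width.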
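  9. The method according to claim 7, characterized in that the obtaining of the character width in the text grayscale image comprises:
    obtaining the width of each character in the text grayscale image; and
    using the mean of all obtained character widths as the character width.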
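  10. The method according to claim 8 or 9, characterized in that the obtaining of the width of each character in the text grayscale image comprises:
    taking the width of each connected region formed by the pixels whose pixel value sums fall within a second pixel value interval as the width of a single character, the second pixel value interval being different from the first pixel value interval and being the interval in which the pixel value sums corresponding to characters in the text grayscale image lie.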
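  11. The method according to claim 10, characterized in that, if the background pixel value of the text grayscale image is greater than a preset pixel value, the first pixel value interval and the second pixel value interval overlap, and the minimum pixel value of the first pixel value interval is greater than the minimum pixel value of the second pixel value interval; and
    if the background pixel value of the text grayscale image is less than or equal to the preset pixel value, the first pixel value interval and the second pixel value interval overlap, and the minimum pixel value of the first pixel value interval is less than the minimum pixel value of the second pixel value interval.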
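  12. The method according to any one of claims 7 to 11, characterized in that the setting of the width interval of a space according to the character width comprises:
    multiplying the character width by a first ratio to obtain a first value;
    multiplying the character width by a second ratio to obtain a second value; and
    using the interval formed by the first value and the second value as the width interval of a space.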
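  13. The method according to any one of claims 1 to 12, characterized in that, before the calculating of the sum of the pixel values of each line of pixels in the preset direction in the text grayscale image, the method further comprises:
    equalizing the text grayscale image.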
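  14. The method according to any one of claims 1 to 13, characterized in that, before the calculating of the sum of the pixel values of each line of pixels in the preset direction in the text grayscale image, the method further comprises:
    denoising the text grayscale image.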
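  15. The method according to any one of claims 1 to 14, characterized in that the acquiring of the text grayscale image comprises:
    extracting a single line of text from a text image to obtain a single-line text image containing only that single line of text; and
    converting the single-line text image to grayscale to obtain the text grayscale image.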
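  16. The method according to any one of claims 1 to 15, characterized in that the first pixel value interval corresponds to the height of the text grayscale image.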
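  17. The method according to any one of claims 1 to 16, characterized in that the connected region formed by the pixels whose pixel value sums fall within the first pixel value interval is obtained by:
    when a pixel value sum within the first pixel value interval is detected, taking it as the start of a connected region, and continuing in the same direction to check whether subsequent pixel value sums fall within the first pixel value interval; and
    when a pixel value sum outside the first pixel value interval is detected, determining that the preceding pixel value sum corresponds to the end of the connected region.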
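  18. An apparatus for recognizing spaces in a text line, characterized in that the apparatus comprises:
    an image acquisition module, configured to acquire a text grayscale image, the text grayscale image containing only a single line of text;
    a pixel value acquisition module, configured to calculate the sum of the pixel values of each line of pixels in a preset direction in the text grayscale image, the preset direction being perpendicular to the direction in which the characters of the single-line text are arranged; and
    a space determination module, configured to take the connected region formed by the pixels whose pixel value sums fall within a first pixel value interval as a space in the single-line text, the first pixel value interval being the interval in which the pixel value sums corresponding to spaces in the text grayscale image lie.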
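  19. An electronic device, characterized by comprising:
    one or more processors;
    a memory; and
    one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs, when executed by the processors, perform the method according to any one of claims 1 to 17.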
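  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code, and the program code can be called by a processor to execute the method according to any one of claims 1 to 17.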
PCT/CN2021/074886 2020-03-23 2021-02-02 文本行中的空格识别方法、装置、电子设备及存储介质 WO2021190155A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010231850.8 2020-03-23
CN202010231850.8A CN111461126B (zh) 2020-03-23 2020-03-23 文本行中的空格识别方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021190155A1 true WO2021190155A1 (zh) 2021-09-30

Family

ID=71683306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074886 WO2021190155A1 (zh) 2020-03-23 2021-02-02 文本行中的空格识别方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN111461126B (zh)
WO (1) WO2021190155A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926839A (zh) * 2022-07-22 2022-08-19 富璟科技(深圳)有限公司 基于rpa和ai的图像识别方法及电子设备
CN115497109A (zh) * 2022-11-17 2022-12-20 山东思玛特教育科技有限公司 基于智能翻译的文字图像预处理方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461126B (zh) * 2020-03-23 2023-08-18 Oppo广东移动通信有限公司 文本行中的空格识别方法、装置、电子设备及存储介质
CN113780265B (zh) * 2021-09-16 2023-12-15 平安科技(深圳)有限公司 英文单词的空格识别方法、装置、存储介质及计算机设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050249391A1 (en) * 2004-05-10 2005-11-10 Mediguide Ltd. Method for segmentation of IVUS image sequences
CN104298982A (zh) * 2013-07-16 2015-01-21 深圳市腾讯计算机系统有限公司 一种文字识别方法及装置
CN106295630A (zh) * 2016-07-21 2017-01-04 北京小米移动软件有限公司 字符识别方法及装置
CN109543770A (zh) * 2018-11-30 2019-03-29 合肥泰禾光电科技股份有限公司 点阵字符识别方法及装置
CN109726722A (zh) * 2018-12-20 2019-05-07 上海众源网络有限公司 一种字符分割方法及装置
CN111461126A (zh) * 2020-03-23 2020-07-28 Oppo广东移动通信有限公司 文本行中的空格识别方法、装置、电子设备及存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940799B (zh) * 2016-01-05 2020-07-24 腾讯科技(深圳)有限公司 文本图像处理方法和装置
CN106940800B (zh) * 2016-01-05 2021-01-05 深圳友讯达科技股份有限公司 计量仪表读数识别方法及装置

Also Published As

Publication number Publication date
CN111461126A (zh) 2020-07-28
CN111461126B (zh) 2023-08-18

Similar Documents

Publication Publication Date Title
WO2021190155A1 (zh) 文本行中的空格识别方法、装置、电子设备及存储介质
US10896349B2 (en) Text detection method and apparatus, and storage medium
US10817741B2 (en) Word segmentation system, method and device
US20180157927A1 (en) Character Segmentation Method, Apparatus and Electronic Device
WO2020140698A1 (zh) 表格数据的获取方法、装置和服务器
EP3117369B1 (en) Detecting and extracting image document components to create flow document
US9275030B1 (en) Horizontal and vertical line detection and removal for document images
WO2023284502A1 (zh) 图像处理方法、装置、设备和存储介质
CN105469027A (zh) 针对文档图像的水平和垂直线检测和移除
KR102472821B1 (ko) 혼합 조판 문자를 인식하는 방법, 기기, 칩 회로 및 컴퓨터 프로그램 제품
Al Abodi et al. An effective approach to offline Arabic handwriting recognition
CN114863492B (zh) 一种低质量指纹图像的修复方法及修复装置
WO2013136546A1 (ja) 画像処理装置、及び画像処理方法
CN108520263B (zh) 一种全景图像的识别方法、系统及计算机存储介质
CN112101386A (zh) 文本检测方法、装置、计算机设备和存储介质
JP4565396B2 (ja) 画像処理装置および画像処理プログラム
RU2453919C1 (ru) Способ выявления спама в растровом изображении
CN113449726A (zh) 文字比对及识别方法、装置
CN114120305B (zh) 文本分类模型的训练方法、文本内容的识别方法及装置
JP6883199B2 (ja) 画像処理装置、画像読み取り装置、および、プログラム
CN113343866A (zh) 表格信息的识别方法及装置、电子设备
CN114648751A (zh) 一种处理视频字幕的方法、装置、终端及存储介质
JP4244692B2 (ja) 文字認識装置及び文字認識プログラム
JP2020119291A (ja) 情報処理装置及びプログラム
US20220406083A1 (en) Image processing apparatus, control method thereof, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21775871

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21775871

Country of ref document: EP

Kind code of ref document: A1