WO2020063314A1 - Character segmentation recognition method, apparatus, electronic device, and storage medium - Google Patents

Character segmentation recognition method, apparatus, electronic device, and storage medium

Info

Publication number
WO2020063314A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
image
pixel
segmentation
characters
Prior art date
Application number
PCT/CN2019/104931
Other languages
English (en)
French (fr)
Inventor
蔡小龙
刘永强
桂晨光
邓超
王超
Original Assignee
京东数字科技控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东数字科技控股有限公司
Publication of WO2020063314A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the present disclosure relates to the technical field of computer applications, and in particular, to a method, a device, an electronic device, and a storage medium for character segmentation recognition.
  • OCR: Optical Character Recognition
  • Character segmentation algorithms commonly used today include character segmentation based on connected domains and character segmentation based on fixed character widths.
  • Character recognition algorithms commonly used today include character recognition algorithms based on statistical machine learning.
  • the present disclosure provides a character segmentation recognition method, device, electronic device, and storage medium that, at least to some extent, overcome one or more problems caused by the limitations and defects of related technologies.
  • a character segmentation recognition method including:
  • characters in the image to be recognized are identified.
  • the pixel value is a preset pixel value
  • character segmentation of the to-be-recognized image including at least one line of characters includes:
  • performing line segmentation on the image to be identified, according to a comparison between the number of pixels whose pixel value is the preset pixel value on each pixel row of the pixel array of the image to be identified and a first preset threshold, to obtain at least one character line;
  • performing character segmentation on each character line.
  • performing line segmentation on the image to be identified to obtain at least one character line includes:
  • when the number of pixels whose pixel value is the preset pixel value on a pixel row of the pixel array of the image to be identified is less than or equal to the first preset threshold, marking the pixel row as a quasi-slicable row;
  • a quasi-slicable row for which at most one of its two adjacent pixel rows is also a quasi-slicable row is marked as a slicable row;
  • character segmentation for each character line includes:
  • the pixel column is marked as a quasi-slicable column.
  • correcting the character segmentation of the image to be recognized according to the width of different types of characters includes:
  • traversing the quasi-slicable columns to determine whether, among the four pixel columns located at the first width and the second width from the quasi-slicable column on both sides along the pixel-row direction, the number of quasi-slicable columns is greater than or equal to two.
  • the first width and the second width depend on the width of different types of characters;
  • correcting the character segmentation of the image to be recognized according to the width of different types of characters further includes:
  • the second preset threshold is increased, and the character segmentation and the correction of the character segmentation are performed again.
  • before character segmentation is performed on the image to be identified that contains at least one line of characters, according to the number of pixels whose pixel value is a preset pixel value, the method also includes:
  • the binary image is used to count, for each pixel row and each pixel column of the pixel array, the number of pixels whose pixel value is the preset pixel value, and to determine the positions at which characters are segmented.
  • the grayscale image is used to perform segmentation according to the position where the character is segmented to obtain a plurality of segmented character images.
  • the sample image is augmented with data by the following steps: one or more of a font, a font size, and a character gray value of the character are randomly set.
  • sample image is further augmented with data by the following steps:
  • One or more of a rotation angle, a radiation amplitude, a perspective angle, an interference line, and a filtering type of a sample image having characters are randomly set.
  • the classifier is a character classifier based on a convolutional neural network.
  • a character segmentation and recognition apparatus including:
  • the segmentation module is configured to perform character segmentation on an image to be identified that includes at least one line of characters according to the number of pixels in each pixel row and each pixel column of the pixel array of the image to be identified;
  • the segmentation correction module is configured to correct the character segmentation of the image to be recognized according to the width of different types of characters
  • a classification module configured to input the segmented to-be-recognized image into a classifier trained on a character sample set, the character sample set including a data augmented sample image;
  • the recognition module is configured to recognize characters in the image to be recognized according to an output of the classifier.
  • an electronic device including: a processor; and a storage medium having a computer program stored thereon, wherein the steps described above are performed when the computer program is executed by the processor.
  • a storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps described above are performed.
  • the character segmentation of the image to be recognized is modified by the width of different types of characters so that the character segmentation recognition method provided by the present disclosure can accurately segment different types of characters (such as Chinese characters, English characters, numeric characters, etc.);
  • the present disclosure trains a classifier on a character sample set in which each character has multiple sample images with different attributes, thereby increasing the number of characters recognizable by the classifier and improving the recognition accuracy for complex characters.
  • the present disclosure is particularly applicable to the recognition of printed characters, and the recognition accuracy of printed characters is particularly improved.
  • FIG. 1 is a flowchart of a character segmentation recognition method according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of character segmentation according to an exemplary embodiment of the present disclosure.
  • FIG. 3 is a flowchart of row segmentation according to an exemplary embodiment of the present disclosure.
  • FIG. 4 is a flowchart of character segmentation and modification according to an exemplary embodiment of the present disclosure.
  • FIG. 5 is a flowchart of automatically generating a character sample set according to an exemplary embodiment of the present disclosure.
  • FIG. 6 is a block diagram of a character segmentation recognition apparatus according to an embodiment of the present disclosure.
  • FIG. 7 is a block diagram of a character segmentation recognition apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of an electronic device in an exemplary embodiment of the present disclosure.
  • FIG. 1 is a flowchart of a character segmentation recognition method according to an embodiment of the present disclosure.
  • the character segmentation recognition method includes the following steps:
  • Step S110: Perform character segmentation on the image to be identified that includes at least one line of characters according to the number of pixels in each pixel row and each pixel column of the pixel array of the image to be identified;
  • Step S120: Correct the character segmentation of the image to be recognized according to the width of different types of characters;
  • Step S130: Input the segmented to-be-recognized image into a classifier trained on a character sample set, where the character sample set includes data augmented sample images;
  • Step S140: Recognize characters in the image to be recognized according to the output of the classifier.
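
The four steps S110 to S140 can be read as a small pipeline. The following is a minimal sketch only, not the implementation of the disclosure: the function name `recognize_image`, the box layout, and the idea of passing each stage in as a callable are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # hypothetical layout: (top, bottom, left, right), end-exclusive


def recognize_image(
    image: Image,
    segment: Callable[[Image], List[Box]],             # stands in for step S110
    correct: Callable[[Image, List[Box]], List[Box]],  # stands in for step S120
    classify: Callable[[Image], Sequence[float]],      # stands in for step S130
    charset: Sequence[str],
) -> str:
    """Run segmentation, correction, classification, and recognition in order."""
    boxes = correct(image, segment(image))
    text = []
    for top, bottom, left, right in boxes:
        crop = [row[left:right] for row in image[top:bottom]]
        confidences = classify(crop)
        # Step S140: pick the category with the highest confidence.
        best = max(range(len(confidences)), key=lambda i: confidences[i])
        text.append(charset[best])
    return "".join(text)
```

Passing the stages as parameters keeps the sketch independent of any particular segmentation or classifier code; any functions with matching signatures can be plugged in.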
  • on the one hand, the character segmentation of the image to be recognized is corrected according to the width of different types of characters, so that the character segmentation recognition method provided by the present disclosure can accurately segment different types of characters (such as Chinese characters, English characters, numeric characters, etc.); on the other hand, the present disclosure trains a classifier on a character sample set in which each character has sample images with multiple different attributes, thereby increasing the number of characters the classifier can recognize and improving the recognition accuracy for complex characters.
  • the present disclosure is particularly applicable to the recognition of printed characters, and the recognition accuracy of printed characters is particularly improved.
  • before step S110, the method further includes a step of preprocessing the image to be identified.
  • the image preprocessing may include one or more of the following steps: performing grayscale processing on the image to be identified; performing Gaussian filtering on the image to be identified; and performing local adaptive binarization on the image to be identified.
  • the gray value calculation is performed on each pixel to obtain a gray image.
  • a Gaussian filter with a kernel size of 3 * 3 (3 pixels * 3 pixels) is used for image smoothing and noise reduction.
  • performing Gaussian filtering on an image to be identified is actually performing Gaussian filtering on a grayscale image.
  • the pixel neighborhood size is set to 9 * 9 (9 pixels * 9 pixels), and the threshold for each pixel is calculated using the Gaussian weighting method; pixel values greater than or equal to the threshold are unified to one value (displayed as black), and pixel values less than the threshold are unified to another value (displayed as white), thereby obtaining a binary image.
  • performing local adaptive binarization on the image to be identified is actually performing local adaptive binarization on the filtered gray image.
  • preprocessing thus yields a grayscale image (a Gaussian filtered grayscale image) and a binary image with clear separation between text and background, where the binary image is used in step S110 and step S120 to count pixels and to determine segmentation positions.
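
The preprocessing chain (grayscale conversion, 3 * 3 Gaussian filtering, 9 * 9 locally adaptive binarization) might be sketched as follows in plain Python. Several details are assumptions that the text does not specify: the BT.601 luma weights, the use of binomial weights as a discrete Gaussian, replicated borders, the `offset` parameter, and the polarity (the sketch assumes dark text on a light background and marks pixels darker than their local weighted mean as foreground `1`).

```python
from math import comb


def binomial_kernel(size):
    """1-D binomial weights, a common discrete approximation of a Gaussian."""
    weights = [comb(size - 1, k) for k in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, size):
    """Separable Gaussian-like blur with replicated borders (assumed)."""
    kernel = binomial_kernel(size)
    half = size // 2
    height, width = len(image), len(image[0])

    def clamp(i, upper):
        return max(0, min(i, upper - 1))

    horizontal = [[sum(kernel[k] * row[clamp(x + k - half, width)] for k in range(size))
                   for x in range(width)] for row in image]
    return [[sum(kernel[k] * horizontal[clamp(y + k - half, height)][x] for k in range(size))
             for x in range(width)] for y in range(height)]


def to_gray(rgb_image):
    """Per-pixel gray value; the BT.601 weights are an assumption."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def preprocess(rgb_image, offset=5.0):
    """Return (filtered gray image, binary image with 1 = foreground)."""
    gray = blur(to_gray(rgb_image), 3)   # 3 * 3 smoothing of the gray image
    local_mean = blur(gray, 9)           # 9 * 9 Gaussian-weighted neighborhood
    binary = [[1 if gray[y][x] < local_mean[y][x] - offset else 0
               for x in range(len(gray[0]))] for y in range(len(gray))]
    return gray, binary
```

The binary image produced this way is the one on which pixel counts would be taken, while the gray image is kept for cutting out the character images later.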
  • FIG. 2 is a flowchart of character segmentation (ie, step S110 in FIG. 1) according to an exemplary embodiment of the present disclosure.
  • the pixel value is a preset pixel value.
  • Character segmentation includes the following steps:
  • Step S210: Perform line segmentation on the image to be identified, according to a comparison between the number of pixels whose pixel value is a preset pixel value on each pixel row of the pixel array of the image to be identified (for example, the number of pixels displayed as black, where the preset pixel value causes a pixel to be displayed as black) and a first preset threshold, to obtain at least one character line;
  • Step S220: For each segmented character line, perform character segmentation according to a comparison between the number of pixels whose pixel value is the preset pixel value in each pixel column of the pixel array of the character line and a second preset threshold.
  • the image to be identified may actually be a pre-processed binary image.
  • FIG. 3 is a flowchart of line segmentation (step S210 in FIG. 2) according to an exemplary embodiment of the present disclosure.
  • in step S210, performing line segmentation on the image to be identified, according to a comparison between the number of pixels whose pixel value is a preset pixel value on each pixel row of the pixel array of the image to be identified and a first preset threshold, to obtain at least one character line includes the following steps:
  • Step S310: When the number of pixels whose pixel value is the preset pixel value on a pixel row of the pixel array of the image to be identified is less than or equal to the first preset threshold, mark the pixel row as a quasi-slicable row;
  • Step S320: For each quasi-slicable row, mark it as a slicable row if at most one of its two adjacent pixel rows is also a quasi-slicable row;
  • Step S330: Perform line segmentation on the to-be-recognized image according to the slicable rows to obtain at least one character line.
  • the first preset threshold may be n × the number of pixels in a pixel row, where n can take 0.02 or another constant greater than 0 and close to 0.
  • step S320 takes into account that there are usually several adjacent quasi-slicable rows between character lines: by checking the attributes of the pixel rows adjacent to each quasi-slicable row, the quasi-slicable rows that border the text are identified.
  • these quasi-slicable rows are marked as slicable rows for line segmentation. Blank lines and character lines are obtained through line segmentation.
  • the above steps may also include removing blank lines and retaining character lines. Thus, through line segmentation, the (binarized) character lines and the height of each character line can be obtained.
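
Steps S310 to S330 can be sketched as follows on a binary image stored as a list of rows, with `1` for pixels that have the preset value. This is a sketch under assumptions: the function name `segment_rows`, treating rows outside the image as not quasi-slicable, and returning end-exclusive `(top, bottom)` pairs are illustrative choices, not details from the disclosure.

```python
def segment_rows(binary, n=0.02):
    """Return (top, bottom) pairs, end-exclusive, one per character line."""
    height, width = len(binary), len(binary[0])
    threshold = n * width                      # first preset threshold

    # S310: rows with at most `threshold` foreground pixels are quasi-slicable.
    quasi = [sum(row) <= threshold for row in binary]

    def is_quasi(y):
        # Assumption: rows outside the image count as not quasi-slicable.
        return 0 <= y < height and quasi[y]

    # S320: keep quasi-slicable rows with at most one quasi-slicable neighbor,
    # i.e. the rows of a blank band that border the text.
    slicable = [quasi[y] and (is_quasi(y - 1) + is_quasi(y + 1)) <= 1
                for y in range(height)]

    # S330: cut at slicable rows, then drop the runs that are entirely blank.
    lines, run_start = [], None
    for y in range(height + 1):
        is_cut = y == height or slicable[y]
        if not is_cut and run_start is None:
            run_start = y
        elif is_cut and run_start is not None:
            if not all(quasi[run_start:y]):
                lines.append((run_start, y))
            run_start = None
    return lines
```

The height of each character line is then `bottom - top`, which is the quantity the column-level correction relies on.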
  • FIG. 4 is a flowchart of character segmentation and modification (step S220 in FIG. 2 and step S120 in FIG. 1) according to an exemplary embodiment of the present disclosure.
  • step S220, performing character segmentation for each character line according to a comparison between the number of pixels whose pixel value is the preset pixel value in each pixel column of the pixel array of the character line and a second preset threshold, includes:
  • Step S410: When the number of pixels whose pixel value is the preset pixel value on a pixel column of the pixel array of the character line is less than the second preset threshold, the pixel column is marked as a quasi-slicable column.
  • the step S120 of correcting the character segmentation of the image to be recognized according to the width of different types of characters includes:
  • Step S420: Determine whether, among the four pixel columns located at the first width and the second width from the quasi-slicable column on both sides along the pixel-row direction, the number of quasi-slicable columns is greater than or equal to two.
  • the first width and the second width depend on the width of different types of characters.
  • if so, step S430 is executed to retain the mark of the quasi-slicable column.
  • if not, step S440 is executed to delete the mark of the quasi-slicable column.
  • Step S450: Determine whether all quasi-slicable columns have been traversed.
  • if not, steps S420 to S440 are performed on the next quasi-slicable column.
  • if so, step S460 is executed: traverse the filtered quasi-slicable columns to determine whether the distance between adjacent quasi-slicable columns is less than or equal to s times the height of the character line, where s is a constant greater than 1 and less than 2;
  • if so, step S470 is executed: mark each quasi-slicable column as a slicable column, and slice each character line according to the slicable columns;
  • if not, step S480 is executed: the second preset threshold is increased, and the character segmentation and the character segmentation correction in steps S410 to S470 are performed again.
  • the second preset threshold may be, for example, m × the number of pixels in a pixel column of a character line, where m may be a constant greater than or equal to 0.01 and less than or equal to 0.2, and m may initially be taken as 0.01.
  • the widths of different types of characters follow their own conventions. For example, the width and height of Chinese characters are approximately the same, while the width of English and numeric characters is approximately one-half of their height.
  • the quasi-slicable columns are traversed to determine whether the pixel columns located, on both sides of the quasi-slicable column, at the width of one English (numeric) character and at the width of one Chinese character, four pixel columns in total, are quasi-slicable columns. If at least two of the four pixel columns are quasi-slicable columns, the mark of the quasi-slicable column is retained; otherwise the mark is deleted (the column becomes an ordinary pixel column). Then the quasi-slicable columns are traversed again in step S460.
  • if the distance between adjacent quasi-slicable columns meets the condition of step S460, the quasi-slicable columns are used as slicable columns to split the characters; otherwise, the second preset threshold is increased and the character segmentation step and the segmentation correction step are performed again.
  • the quasi-slicable columns are initially screened according to the screening steps used for the quasi-slicable rows.
  • the character segmentation positions are thereby determined, and the corresponding positions on the grayscale image are cut to obtain single-character grayscale images.
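
The column-level procedure (steps S410 to S480) can be sketched as below for one binarized character line. This is only an illustration under stated assumptions: the function name `segment_columns`, taking the two widths as half the line height and the full line height, treating probe positions outside the image as blank margin, the step size `m_step`, accepting the result once `m` reaches its upper bound, and dropping runs without any foreground pixel are all choices made for the sketch.

```python
def segment_columns(line, s=1.5, m_start=0.01, m_max=0.2, m_step=0.01):
    """Return (left, right) pairs, end-exclusive, one per character."""
    height, width = len(line), len(line[0])
    counts = [sum(row[x] for row in line) for x in range(width)]
    narrow, wide = max(1, height // 2), height   # English/digit vs. Chinese width

    def filtered_quasi_columns(m):
        threshold = m * height                   # second preset threshold
        quasi = [count < threshold for count in counts]          # S410

        def is_quasi(x):
            # Assumption: positions outside the image act as blank margin.
            return x < 0 or x >= width or quasi[x]

        kept = []
        for x in range(width):
            if not quasi[x]:
                continue
            probes = (x - wide, x - narrow, x + narrow, x + wide)
            if sum(is_quasi(p) for p in probes) >= 2:            # S420 to S440
                kept.append(x)
        return kept

    m = m_start
    while True:
        kept = filtered_quasi_columns(m)
        gaps_ok = all(b - a <= s * height for a, b in zip(kept, kept[1:]))  # S460
        if gaps_ok or m >= m_max:
            break
        m = min(m_max, m + m_step)                               # S480

    # S470: cut at the retained columns; skip runs with no foreground pixel.
    cuts, chars, start = set(kept), [], None
    for x in range(width + 1):
        is_cut = x == width or x in cuts
        if not is_cut and start is None:
            start = x
        elif is_cut and start is not None:
            if any(counts[start:x]):
                chars.append((start, x))
            start = None
    return chars
```

Each returned pair, combined with the row range of the character line, gives the rectangle to cut from the grayscale image.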
  • the sample image is augmented with data by the following steps: one or more of a font, a font size, and a character gray value of the character are randomly set.
  • the sample image is further augmented with data by randomly setting one or more of a rotation angle, a radiation amplitude, a perspective angle, an interference line, and a filtering type of the sample image with characters. See FIG. 5, which is a flowchart of automatically generating a character sample set according to an exemplary embodiment of the present disclosure.
  • a large number of gray image samples can be automatically generated for training the classifier.
  • multiple commonly used Chinese, English, and numeric characters are selected (for example, 6793 characters).
  • multiple samples are made for each character (for example, 1000 samples per character; as another example, more sample images may be generated for complex Chinese characters and relatively fewer for simple Chinese, English, and numeric characters, to reduce the system load).
  • the generation steps of each sample are shown in Figure 5:
  • Step S510: Create a pure white image.
  • Step S520: Randomly select a commonly used font (such as Song style, imitation Song style, bold style, etc., which is not limited in this disclosure).
  • Step S530: Randomly select a font size (for example, 24-48, which is not limited in this disclosure).
  • Step S540: Randomly select a character gray value.
  • Step S550: Write the character in the pure white image.
  • Step S560: Randomly select rotation, radiation, and perspective angles.
  • Step S570: Trim the edges of the character, leaving only the rectangular character area.
  • Step S580: Randomly add interference lines.
  • Step S590: Randomly perform mean filtering and Gaussian filtering.
  • the prepared single character gray image is used to generate character sample sets for training character classifiers.
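
The random choices made in steps S520 to S590 can be sketched as a parameter sampler. Rendering the character image itself is left out, and apart from the font-size range 24-48 and the three font styles named in the text, every range and name here (`SampleSpec`, the gray-value range, the angle ranges, the number of interference lines) is a hypothetical value chosen for illustration.

```python
import random
from dataclasses import dataclass

FONTS = ("Song", "imitation Song", "bold")   # styles named in the text
FILTERS = ("none", "mean", "gaussian")


@dataclass(frozen=True)
class SampleSpec:
    """Randomized attributes of one sample image (rendering not shown)."""
    character: str
    font: str
    font_size: int
    gray_value: int
    rotation_deg: float
    perspective_deg: float
    interference_lines: int
    filter_type: str


def random_spec(character, rng):
    return SampleSpec(
        character=character,
        font=rng.choice(FONTS),                    # S520
        font_size=rng.randint(24, 48),             # S530, range from the text
        gray_value=rng.randint(0, 128),            # S540, assumed range
        rotation_deg=rng.uniform(-5.0, 5.0),       # S560, assumed range
        perspective_deg=rng.uniform(-5.0, 5.0),    # S560, assumed range
        interference_lines=rng.randint(0, 2),      # S580, assumed range
        filter_type=rng.choice(FILTERS),           # S590
    )


def sample_specs(characters, per_character, seed=0):
    """One spec per sample; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [random_spec(ch, rng) for ch in characters for _ in range(per_character)]
```

For example, `sample_specs("京A8", 1000)` yields 3000 specifications, 1000 per character, which a separate renderer could turn into gray sample images.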
  • the classifier described in the present disclosure is a character classifier based on a convolutional neural network.
  • the input size of the character classifier is normalized to 32 * 32 (32 pixels * 32 pixels), the normalization method is bicubic interpolation, and the output is the confidence of each character category (In this exemplary embodiment, there are 6793 characters in total, and the confidence value ranges from 0 to 1).
  • the loss function of the character classifier based on the convolutional neural network is set as the cross-entropy loss.
  • the optimizer of the character classifier based on the convolutional neural network is set to the Adam optimizer, and the initial learning rate is set to 0.001.
  • the training data is data generated by a character sample generator, and each character includes, for example, 1000 samples.
  • the batch size (Batch Size) of the batch training can be set to 32. Use Early Stopping technology to stop training. In actual tests, using the above training methods to train a character classifier, the accuracy of the training set after training stops can reach 99.2%, and the accuracy of real samples can reach 97.6%.
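
How per-category confidences between 0 and 1 and a cross-entropy loss relate can be shown with a few lines of plain Python. This is a sketch, not the network of the disclosure: applying a softmax to raw scores is an assumption (the text only states that the output is a confidence per character category), and the function names are illustrative.

```python
import math


def softmax(logits):
    """Turn raw scores into confidences in [0, 1] that sum to 1."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(confidences, target_index):
    """Cross-entropy loss for one sample with a single true category."""
    return -math.log(confidences[target_index])


def recognize(confidences, charset):
    """Pick the character whose category has the highest confidence."""
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return charset[best], confidences[best]
```

With three categories and scores `[2.0, 0.5, -1.0]`, the first category receives the highest confidence, so its loss is the smallest when it is the true label.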
  • FIG. 6 is a block diagram of a character segmentation recognition apparatus according to an embodiment of the present disclosure.
  • the character segmentation and recognition apparatus 600 includes a segmentation module 610, a segmentation correction module 620, a classification module 630, and a recognition module 640.
  • the segmentation module 610 is configured to perform character segmentation on an image to be identified including at least one line of characters according to the number of pixels in each pixel row and each pixel column of the pixel array of the image to be identified, the pixel value being a preset pixel value;
  • the segmentation correction module 620 is configured to correct the segmentation of characters of the image to be recognized according to the width of different types of characters;
  • the classification module 630 is configured to input the segmented to-be-recognized image into a classifier trained on a character sample set, where the character sample set includes data-enhanced sample images;
  • the recognition module 640 is configured to recognize characters in the image to be recognized according to an output of the classifier.
  • on the one hand, the character segmentation of the image to be recognized is corrected according to the width of different types of characters, so that the character segmentation recognition apparatus provided by the present disclosure can accurately segment different types of characters (such as Chinese characters, English characters, numeric characters, etc.); on the other hand, the present disclosure trains a classifier on a character sample set in which each character has sample images with multiple different attributes, thereby increasing the number of characters the classifier can recognize and improving the recognition accuracy for complex characters.
  • the present disclosure is particularly applicable to the recognition of printed characters, and the recognition accuracy of printed characters is particularly improved.
  • FIG. 7 is a block diagram of a character segmentation recognition apparatus according to an exemplary embodiment of the present disclosure.
  • the character segmentation and recognition apparatus 700 includes a preprocessing module 710, a segmentation module 720, a segmentation correction module 730, a character sample set generation module 740, a classification module 750, and a recognition module 760.
  • the functions of the segmentation module 720, the segmentation correction module 730, the classification module 750, and the recognition module 760 are the same as those of the segmentation module 610, the segmentation correction module 620, the classification module 630, and the recognition module 640 shown in FIG. 6. Unlike FIG. 6, the apparatus 700 further includes the preprocessing module 710 and the character sample set generation module 740.
  • the preprocessing module 710 is configured to preprocess the image to be identified to obtain a grayscale image and a binarized image, where the binarized image is used to count, for each pixel row and each pixel column of the pixel array, the number of pixels whose pixel value is the preset pixel value and to determine the positions at which characters are segmented.
  • the grayscale image is cut at the character segmentation positions to obtain the segmented character images.
  • the character sample set generation module 740 is configured to randomly set various attributes of the character for each character to automatically generate a sample image of the character
  • FIG. 7 only schematically illustrates an embodiment of the present disclosure, and the present disclosure is not limited thereto.
  • a computer-readable storage medium on which a computer program is stored; when the program is executed by, for example, a processor, the steps of the character segmentation recognition method described in any one of the foregoing embodiments can be implemented.
  • aspects of the present disclosure may also be implemented in the form of a program product, which includes program code.
  • when the program product runs on a terminal device, the program code is used to make the terminal device execute the steps according to the various exemplary embodiments of the present disclosure described in the above-mentioned character segmentation recognition method section of this specification.
  • a program product 800 for implementing the above method according to an embodiment of the present disclosure is described, which may adopt a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • the computer-readable signal medium may include a data signal in baseband or propagated as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • the program code contained on the readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, which include object-oriented programming languages such as Java, C++, and the like, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code can be executed entirely on the user computing device, partly on the user device, as a standalone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device may be connected to the user computing device via any kind of network, including a local area network (LAN) or wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
  • an electronic device which may include a processor, and a memory for storing executable instructions of the processor.
  • the processor is configured to execute the steps of the character segmentation recognition method in any one of the foregoing embodiments by executing the executable instructions.
  • FIG. 9 An electronic device 900 according to such an embodiment of the present disclosure is described below with reference to FIG. 9.
  • the electronic device 900 shown in FIG. 9 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
  • the electronic device 900 is expressed in the form of a general-purpose computing device.
  • the components of the electronic device 900 may include, but are not limited to, at least one processing unit 910, at least one storage unit 920, a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910), a display unit 940, and the like.
  • the storage unit stores program code that can be executed by the processing unit 910, so that the processing unit 910 executes the steps of the various exemplary embodiments of the present disclosure described in the character segmentation and recognition method section of this specification.
  • the processing unit 910 may perform the steps shown in FIG. 1.
  • the storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 9201 and/or a cache storage unit 9202, and may further include a read-only storage unit (ROM) 9203.
  • the storage unit 920 may further include a program/utility tool 9204 having a set of (at least one) program modules 9205.
  • such program modules 9205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • the bus 930 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
  • the electronic device 900 may also communicate with one or more external devices 1000 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a tenant to interact with the electronic device 900, and/or with any device (e.g., a router or a modem) that enables the electronic device 900 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 950.
  • the electronic device 900 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and / or a public network, such as the Internet) through the network adapter 960.
  • the network adapter 960 may communicate with other modules of the electronic device 900 through the bus 930.
  • the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and which includes several instructions that enable a computing device (such as a personal computer, a server, or a network device) to execute the character segmentation and recognition method according to the embodiments of the present disclosure.
  • the character segmentation of the image to be recognized is corrected according to the widths of different types of characters, so that the character segmentation and recognition method provided by the present disclosure can accurately segment different types of characters (such as Chinese characters, English characters, and numeric characters);
  • the present disclosure trains the classifier on a character sample set in which each character has multiple sample images with different attributes, thereby increasing the number of characters the classifier can recognize and improving the recognition accuracy for complex characters.
  • the present disclosure is particularly applicable to the recognition of printed characters, for which the improvement in recognition accuracy is especially marked.
  • the embodiments of the present disclosure correct the character segmentation of the image to be recognized according to the widths of different types of characters, so that the character segmentation and recognition method provided by the present disclosure can accurately segment different types of characters (e.g., Chinese characters, English characters, and numeric characters).
  • the embodiments of the present disclosure train the classifier on a character sample set in which each character has multiple sample images with different attributes, thereby increasing the number of characters the classifier can recognize and improving the recognition accuracy for complex characters.

Abstract

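A character segmentation and recognition method, apparatus, electronic device, and storage medium. The method includes: performing character segmentation on an image to be recognized that contains at least one line of characters, according to the number of pixels whose value equals a preset pixel value in each pixel row and each pixel column of the image's pixel array; correcting the character segmentation according to the widths of different types of characters; inputting the segmented image into a classifier trained on a character sample set, the character sample set including sample images produced by data augmentation; and recognizing the characters in the image according to the output of the classifier. The present disclosure achieves accurate segmentation of different character types and improves recognition accuracy by means of the character sample set and the classifier. It is particularly suited to recognizing printed characters, for which the gain in recognition accuracy is especially marked.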

Description

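Character segmentation and recognition method, apparatus, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of computer application technology, and in particular to a character segmentation and recognition method, apparatus, electronic device, and storage medium.
Background
OCR (Optical Character Recognition) is the process by which an electronic device (for example, a scanner or a digital camera) examines characters printed on paper and then uses a character recognition method to translate their shapes into computer text. The process is generally divided into two stages: character segmentation and character recognition. Commonly used segmentation algorithms include segmentation based on connected components and segmentation based on a fixed character width. Commonly used recognition algorithms include those based on statistical machine learning.
However, the related character segmentation and recognition algorithms have the following defects:
1) The supported character set is small. Because of the choice and specific design of the classifier, often only tens or hundreds of characters can be recognized.
2) Mixed Chinese and English text is not supported. Because Chinese and English characters differ in width, wrong cuts and missed cuts often occur when segmenting text that mixes the two.
3) Complex Chinese characters cannot be recognized. Because the features of complex Chinese characters differ greatly from those of simple ones, the classifier can often recognize only simple Chinese characters.
4) The overall recognition rate is low. Segmentation and recognition both play important roles, and a problem in either one greatly affects the recognition rate.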
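Summary
To overcome the above defects of the related art, the present disclosure provides a character segmentation and recognition method, apparatus, electronic device, and storage medium, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the present disclosure, a character segmentation and recognition method is provided, including:
performing character segmentation on an image to be recognized that contains at least one line of characters, according to the number of pixels whose value equals a preset pixel value in each pixel row and each pixel column of the pixel array of the image;
correcting the character segmentation of the image to be recognized according to the widths of different types of characters;
inputting the segmented image to be recognized into a classifier trained on a character sample set, the character sample set including sample images produced by data augmentation; and
recognizing the characters in the image to be recognized according to the output of the classifier.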
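Optionally, performing character segmentation on the image to be recognized according to the number of pixels with the preset pixel value in each pixel row and each pixel column includes:
performing line segmentation on the image to be recognized by comparing, for each pixel row of its pixel array, the number of pixels whose value equals the preset pixel value with a first preset threshold, to obtain at least one character line;
for each character line so obtained, performing character segmentation on that line by comparing, for each pixel column of the line's pixel array, the number of pixels whose value equals the preset pixel value with a second preset threshold.
Optionally, the line segmentation based on the comparison with the first preset threshold includes:
when the number of pixels with the preset pixel value in a pixel row of the image's pixel array is less than or equal to the first preset threshold, marking that pixel row as a candidate cut row;
among the candidate cut rows, marking as a cut row each candidate cut row for which at most one of its two adjacent pixel rows is also a candidate cut row;
segmenting the image to be recognized along the cut rows to obtain at least one character line.
Optionally, the character segmentation of each character line based on the comparison with the second preset threshold includes:
when the number of pixels with the preset pixel value in a pixel column of the character line's pixel array is less than the second preset threshold, marking that pixel column as a candidate cut column.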
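Optionally, correcting the character segmentation of the image to be recognized according to the widths of different types of characters includes:
traversing the candidate cut columns and, for each one, determining whether at least two of the four pixel columns located at a first width and at a second width from it, on either side along the pixel-row direction, are candidate cut columns, the first width and the second width being determined by the widths of different types of characters;
if so, keeping the mark of that candidate cut column; and
if not, deleting the mark of that candidate cut column.
Optionally, correcting the character segmentation of the image to be recognized according to the widths of different types of characters further includes:
traversing the candidate cut columns that remain after this screening and determining whether the distances between adjacent candidate cut columns are all less than or equal to s times the height of the character line, s being a constant greater than 1 and less than or equal to 2;
if so, marking each candidate cut column as a cut column and segmenting each character line along the cut columns;
if not, increasing the second preset threshold and performing the character segmentation and its correction again.
Optionally, before the character segmentation of the image to be recognized, the method further includes:
preprocessing the image to be recognized to obtain a grayscale image and a binarized image, the binarized image being used to count the pixels with the preset pixel value in each pixel row and each pixel column of the pixel array and to determine the positions of the character cuts, and the grayscale image being cut at those positions to obtain a plurality of segmented character images.
Optionally, the sample images are augmented by randomly setting one or more of the font, the font size, and the gray value of the character.
Optionally, the sample images are further augmented by:
randomly setting one or more of the rotation angle, the affine transformation magnitude, the perspective angle, the interference lines, and the filter type of the sample image containing the character.
Optionally, the classifier is a character classifier based on a convolutional neural network.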
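According to another aspect of the present disclosure, a character segmentation and recognition apparatus is also provided, including:
a segmentation module, configured to perform character segmentation on an image to be recognized that contains at least one line of characters, according to the number of pixels whose value equals a preset pixel value in each pixel row and each pixel column of the pixel array of the image;
a segmentation correction module, configured to correct the character segmentation of the image to be recognized according to the widths of different types of characters;
a classification module, configured to input the segmented image to be recognized into a classifier trained on a character sample set, the character sample set including sample images produced by data augmentation; and
a recognition module, configured to recognize the characters in the image to be recognized according to the output of the classifier.
According to another aspect of the present disclosure, an electronic device is also provided, including: a processor; and a storage medium storing a computer program which, when run by the processor, performs the steps described above.
According to another aspect of the present disclosure, a storage medium is also provided, storing a computer program which, when run by a processor, performs the steps described above.
Compared with the related art, the advantages of the present disclosure are as follows:
On the one hand, the character segmentation of the image to be recognized is corrected according to the widths of different types of characters, so that the character segmentation and recognition method provided by the present disclosure can accurately segment different types of characters (for example, Chinese characters, English characters, and numeric characters). On the other hand, the present disclosure trains the classifier on a character sample set in which each character has multiple sample images with different attributes, which increases the number of characters the classifier can recognize and improves the recognition accuracy for complex characters. The present disclosure is particularly suited to recognizing printed characters, for which the gain in recognition accuracy is especially marked.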
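Brief Description of the Drawings
The above and other features and advantages of the present disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
FIG. 1 is a flowchart of a character segmentation and recognition method according to an embodiment of the present disclosure.
FIG. 2 is a flowchart of character segmentation according to an exemplary embodiment of the present disclosure.
FIG. 3 is a flowchart of line segmentation according to an exemplary embodiment of the present disclosure.
FIG. 4 is a flowchart of character segmentation and its correction according to an exemplary embodiment of the present disclosure.
FIG. 5 is a flowchart of the automatic generation of a character sample set according to an exemplary embodiment of the present disclosure.
FIG. 6 is a block diagram of a character segmentation and recognition apparatus according to an embodiment of the present disclosure.
FIG. 7 is a block diagram of a character segmentation and recognition apparatus according to an exemplary embodiment of the present disclosure.
FIG. 8 is a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the present disclosure.
FIG. 9 is a schematic diagram of an electronic device in an exemplary embodiment of the present disclosure.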
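Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative and need not include all of the steps. For example, some steps may be decomposed, and others may be merged or partly merged, so the actual order of execution may change according to the actual situation.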
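FIG. 1 is a flowchart of a character segmentation and recognition method according to an embodiment of the present disclosure. Referring to FIG. 1, the method includes the following steps:
Step S110: performing character segmentation on an image to be recognized that contains at least one line of characters, according to the number of pixels whose value equals a preset pixel value in each pixel row and each pixel column of the pixel array of the image;
Step S120: correcting the character segmentation of the image to be recognized according to the widths of different types of characters;
Step S130: inputting the segmented image to be recognized into a classifier trained on a character sample set, the character sample set including sample images produced by data augmentation; and
Step S140: recognizing the characters in the image to be recognized according to the output of the classifier.
In the character segmentation and recognition method of the exemplary embodiments of the present disclosure, on the one hand, the character segmentation of the image to be recognized is corrected according to the widths of different types of characters, so that the method can accurately segment different types of characters (for example, Chinese characters, English characters, and numeric characters). On the other hand, the present disclosure trains the classifier on a character sample set in which each character has multiple sample images with different attributes, which increases the number of characters the classifier can recognize and improves the recognition accuracy for complex characters. The present disclosure is particularly suited to recognizing printed characters, for which the gain in recognition accuracy is especially marked.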
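In some embodiments of the present disclosure, a step of preprocessing the image to be recognized is performed before step S110.
For example, the image preprocessing may include one or more of the following steps: converting the image to be recognized to grayscale; applying Gaussian filtering to the image to be recognized; and applying locally adaptive binarization to the image to be recognized.
For example, in the grayscale conversion step, the gray value of each pixel can be calculated from its RGB value, for example, gray value = 0.299*R + 0.587*G + 0.114*B. Calculating the gray value of every pixel in this way yields a grayscale image. In the Gaussian filtering step, a Gaussian filter with a 3*3 kernel (3 pixels * 3 pixels) is preferably used to smooth the image and reduce noise. In this embodiment, applying Gaussian filtering to the image to be recognized in practice means applying it to the grayscale image. In the locally adaptive binarization step, the pixel neighborhood size is set to 9*9 (9 pixels * 9 pixels) and a Gaussian-weighted method is used to calculate a threshold for each pixel (values greater than or equal to the threshold are unified to one value, displayed as black, and values below the threshold are unified to another value, displayed as white), which yields a binary image. In this embodiment, applying locally adaptive binarization to the image to be recognized in practice means applying it to the filtered grayscale image.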
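The grayscale formula in the preprocessing step can be illustrated with a minimal pure-Python sketch. The function name `to_gray` and the nested-list image layout are assumptions made for this illustration; rounding to the nearest integer is also an assumption, since the text does not say how fractional gray values are handled.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of integer gray values
    using gray = 0.299*R + 0.587*G + 0.114*B."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


# A 2x2 test image: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_gray(image)
```

Green carries the largest weight and blue the smallest, so pure red maps to a brighter gray than pure blue.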
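After the above preprocessing steps, we obtain a grayscale image (the Gaussian-filtered grayscale image) and a binarized image in which text and background are clearly separated. The binarized image is the one used in steps S110 and S120 to count pixels and to determine the cut positions.
Referring now to FIG. 2, FIG. 2 is a flowchart of character segmentation (that is, step S110 in FIG. 1) according to an exemplary embodiment of the present disclosure.
As shown in FIG. 2, performing character segmentation on the image to be recognized in step S110, according to the number of pixels with the preset pixel value in each pixel row and each pixel column of its pixel array, includes the following steps:
Step S210: performing line segmentation on the image to be recognized by comparing, for each pixel row of its pixel array, the number of pixels whose value equals the preset pixel value (for example, the number of pixels displayed as black, the preset pixel value being the one that makes a pixel display as black) with a first preset threshold, to obtain at least one character line;
Step S220: for each character line so obtained, performing character segmentation on that line by comparing, for each pixel column of the line's pixel array, the number of pixels whose value equals the preset pixel value with a second preset threshold.
For example, in the steps shown in FIG. 2, the image to be recognized may in practice be the binarized image obtained by preprocessing.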
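Both segmentation steps rest on counting, per pixel row and per pixel column, the pixels that carry the preset value. A minimal sketch of that counting is given here; the function names and the convention that the preset value `ink` is 1 in a binarized image stored as nested lists are assumptions for illustration only.

```python
def row_counts(binary, ink=1):
    """Number of pixels equal to `ink` in each pixel row."""
    return [sum(1 for p in row if p == ink) for row in binary]


def column_counts(binary, ink=1):
    """Number of pixels equal to `ink` in each pixel column."""
    if not binary:
        return []
    return [sum(1 for row in binary if row[x] == ink)
            for x in range(len(binary[0]))]


binary = [[0, 1, 1],
          [0, 0, 1]]
rows = row_counts(binary)
cols = column_counts(binary)
```

The row counts are compared with the first preset threshold and the column counts with the second.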
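Referring now to FIG. 3, FIG. 3 is a flowchart of line segmentation (step S210 in FIG. 2) according to an exemplary embodiment of the present disclosure.
As shown in FIG. 3, the line segmentation of step S210, based on comparing the number of pixels with the preset pixel value in each pixel row with the first preset threshold, includes the following steps:
Step S310: when the number of pixels with the preset pixel value in a pixel row of the image's pixel array is less than or equal to the first preset threshold, marking that pixel row as a candidate cut row;
Step S320: among the candidate cut rows, marking as a cut row each candidate cut row for which at most one of its two adjacent pixel rows is also a candidate cut row;
Step S330: segmenting the image to be recognized along the cut rows to obtain at least one character line.
For example, the first preset threshold may be n × the number of pixels in one pixel row, where n may be 0.02 or another constant greater than 0 and close to 0; the present disclosure is not limited thereto. In one embodiment, step S320 takes into account that there are usually several adjacent candidate cut rows between character lines: by examining the neighboring pixel rows of each candidate cut row, the candidate cut rows adjacent to text are marked as cut rows, along which the lines are then segmented. Line segmentation produces both blank lines and character lines, so the above steps may also include a step of discarding the blank lines and keeping the character lines. The line segmentation step thus yields the (binarized) character lines together with information such as the height of each character line.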
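Steps S310 to S330 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the name `split_lines`, the nested-list image, the half-open `(start, end)` row ranges, and the treatment of rows outside the image as non-candidates are all choices made for this example.

```python
def split_lines(binary, n=0.02, ink=1):
    """Return (cut_rows, lines); each line is a half-open (start, end) row range."""
    height, width = len(binary), len(binary[0])
    threshold = n * width  # first preset threshold: n x pixels per row
    counts = [sum(1 for p in row if p == ink) for row in binary]
    candidate = [c <= threshold for c in counts]  # step S310

    def is_candidate(y):
        # Assumption: rows outside the image are treated as non-candidates.
        return 0 <= y < height and candidate[y]

    # Step S320: keep candidates with at most one candidate neighbor,
    # i.e. the blank rows that directly border text.
    cut_rows = [y for y in range(height)
                if candidate[y]
                and not (is_candidate(y - 1) and is_candidate(y + 1))]

    # Step S330: cut along the cut rows and drop segments that are blank.
    bounds = [-1] + cut_rows + [height]
    lines = []
    for top, bottom in zip(bounds, bounds[1:]):
        start, end = top + 1, bottom
        if any(not candidate[y] for y in range(start, end)):
            lines.append((start, end))
    return cut_rows, lines


text, blank = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0], [0] * 10
page = [blank, blank, text, text, blank, blank, blank, text]
cut_rows, lines = split_lines(page)
```

In the sample page, row 5 lies in the middle of a three-row blank band and is therefore not a cut row, while rows 4 and 6, which border text, are.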
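Referring now to FIG. 4, FIG. 4 is a flowchart of character segmentation and its correction (step S220 in FIG. 2 and step S120 in FIG. 1) according to an exemplary embodiment of the present disclosure.
As shown in FIG. 4, the character segmentation of each character line in step S220, based on comparing the number of pixels with the preset pixel value in each pixel column of the line with the second preset threshold, includes:
Step S410: when the number of pixels with the preset pixel value in a pixel column of the character line's pixel array is less than the second preset threshold, marking that pixel column as a candidate cut column.
Correcting the character segmentation of the image to be recognized according to the widths of different types of characters in step S120 includes:
Step S420: determining whether at least two of the four pixel columns located at a first width and at a second width from the candidate cut column, on either side along the pixel-row direction, are candidate cut columns, the first width and the second width being determined by the widths of different types of characters;
if so, performing step S430 to keep the mark of that candidate cut column.
If not, performing step S440 to delete the mark of that candidate cut column.
Step S450: determining whether all candidate cut columns have been traversed.
If not, performing steps S420 to S440 for the next candidate cut column.
If so, performing step S460: traversing the candidate cut columns that remain after this screening and determining whether the distances between adjacent candidate cut columns are all less than or equal to s times the height of the character line, s being a constant greater than 1 and less than or equal to 2;
if so, performing step S470: marking each candidate cut column as a cut column and segmenting each character line along the cut columns;
if not, performing step S480: increasing the second preset threshold and performing the character segmentation and its correction of steps S410 to S470 again.
For example, in this embodiment, the second preset threshold may be m × the number of pixels in one pixel column of a character line, where m may be a constant greater than or equal to 0.01 and less than or equal to 0.2, with an initial value of 0.01. Character segmentation is prone to wrong cuts and missed cuts. Since the height of the character line is already known and each type of character has its own characteristic width (for example, the width of a Chinese character is approximately equal to its height, while the width of an English character or a digit is approximately half its height), steps S420 to S450 traverse the candidate cut columns and check, for each one, whether the four pixel columns located one English-character (digit) width and one Chinese-character width away on either side are candidate cut columns. If at least two of these four pixel columns are candidate cut columns, the mark of the candidate cut column is kept; otherwise the mark is deleted and the column becomes an ordinary pixel column. Step S460 then traverses the candidate cut columns again: if no character is wider than s × the height of the character line (s being a constant greater than 1), the candidate cut columns are used as cut columns to segment the characters; otherwise the second preset threshold is raised and the character segmentation and correction steps are performed again.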
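The width-based correction and the retry loop of steps S410 to S480 can be sketched as below. Several details are assumptions made for this illustration and are not stated in the text: positions outside the line count as cuttable, the four neighbors are tested against the unfiltered candidate set, m is raised in steps of 0.01 and the loop gives up once m would exceed 0.2, and the preliminary screening of candidate columns is omitted. All function names are hypothetical.

```python
def column_ink(line, ink=1):
    """Count ink pixels in every pixel column of one binarized character line."""
    return [sum(1 for row in line if row[x] == ink) for x in range(len(line[0]))]


def width_filter(candidate, w1, w2):
    """Steps S420 to S440: keep a candidate only if at least two of the four
    columns at distances w1 and w2 on either side are candidates as well."""
    n = len(candidate)

    def marked(x):
        # Assumption: positions outside the line count as cuttable.
        return x < 0 or x >= n or candidate[x]

    return [
        candidate[x] and sum(marked(x + d) for d in (-w2, -w1, w1, w2)) >= 2
        for x in range(n)
    ]


def segment_columns(line, s=1.5, m=0.01, m_max=0.2, m_step=0.01):
    """Return the cut columns of one character line."""
    height = len(line)
    counts = column_ink(line)
    w1, w2 = max(1, height // 2), height  # half-width and full-width characters
    while True:
        candidate = [c < m * height for c in counts]  # step S410
        kept = width_filter(candidate, w1, w2)
        cols = [x for x, keep in enumerate(kept) if keep]
        gaps = [b - a for a, b in zip(cols, cols[1:])]
        # Step S460: accept when no gap exceeds s x line height
        # (or, by assumption, when m cannot be raised any further).
        if all(g <= s * height for g in gaps) or m + m_step > m_max:
            return cols  # step S470
        m += m_step  # step S480


blank = {0, 8, 11, 16, 24}  # column 11 is a stray gap inside a character
line = [[0 if x in blank else 1 for x in range(25)] for _ in range(8)]
cuts = segment_columns(line)
```

In the sample line, the stray blank column 11 has no candidate neighbor at a distance of 4 or 8 pixels, so its mark is deleted, whereas the true boundaries at 0, 8, 16, and 24 support one another.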
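For example, before steps S420 to S480 are performed, the candidate cut columns undergo a preliminary screening in the same way as the candidate cut rows.
Once the cut positions of the characters have been determined by the above steps, the grayscale image described earlier is cut at the corresponding positions to obtain single-character grayscale images.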
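Cutting the grayscale line at the cut columns found on the binarized image amounts to slicing every row between consecutive cut positions. A minimal sketch follows; the function name and the decision to exclude the cut columns themselves from the crops are assumptions for this illustration.

```python
def crop_characters(gray_line, cut_columns):
    """Slice one grayscale character line into single-character images.

    The cut columns themselves are excluded, and adjacent cut columns
    with nothing between them produce no image.
    """
    chars = []
    for left, right in zip(cut_columns, cut_columns[1:]):
        if right - left > 1:
            chars.append([row[left + 1:right] for row in gray_line])
    return chars


gray_line = [[255, 10, 20, 255, 30, 255]]
chars = crop_characters(gray_line, [0, 3, 5])
```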
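For example, in the character sample set of step S130, the sample images are augmented by randomly setting one or more of the font, the font size, and the gray value of the character. In one embodiment, the sample images are further augmented by randomly setting one or more of the rotation angle, the affine transformation magnitude, the perspective angle, the interference lines, and the filter type of the sample image containing the character. See FIG. 5, which is a flowchart of the automatic generation of a character sample set according to an exemplary embodiment of the present disclosure.
In this embodiment, a large number of grayscale image samples modeled on the single-character grayscale images described above can be generated automatically for training the classifier. A set of commonly used Chinese characters, English letters, and digits is selected (for example, 6793 characters), and multiple samples are produced for each character (for example, 1000 samples per character; as another example, more sample images are generated for complex Chinese characters and relatively fewer for simple Chinese characters, English letters, and digits, to reduce system load). Each sample is generated by the steps shown in FIG. 5:
Step S510: create a pure white image.
Step S520: randomly select a commonly used font (for example, Song, Fangsong, or Hei; the present disclosure is not limited thereto).
Step S530: randomly select a font size (for example, 24 to 48; the present disclosure is not limited thereto).
Step S540: randomly select the gray value of the character.
Step S550: write the character into the pure white image.
Step S560: randomly select the rotation, affine, and perspective angles.
Step S570: crop around the edges of the character, keeping only the rectangular character region.
Step S580: randomly add interference lines (for example, stray line strokes).
Step S590: randomly apply mean filtering or Gaussian filtering.
The finished single-character grayscale images form the character sample set, which is used to train the character classifier.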
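The random choices made in steps S520 to S590 can be sketched as a parameter sampler. Rendering the character itself would need an imaging library and is left out. Only the font list and the 24 to 48 size range come from the text; the other ranges, the key names, the `"none"` filter option, and the function names are hypothetical values chosen for this illustration.

```python
import random

FONTS = ["SimSun", "FangSong", "SimHei"]  # Song, Fangsong, Hei


def sample_params(char, rng):
    """Draw one random set of augmentation parameters for a character."""
    return {
        "char": char,
        "font": rng.choice(FONTS),                      # step S520
        "font_size": rng.randint(24, 48),               # step S530
        "gray": rng.randint(0, 127),                    # step S540 (assumed range)
        "rotation_deg": rng.uniform(-5.0, 5.0),         # step S560 (assumed range)
        "affine_shear": rng.uniform(-0.1, 0.1),         # step S560 (assumed range)
        "perspective": rng.uniform(0.0, 0.05),          # step S560 (assumed range)
        "noise_lines": rng.randint(0, 2),               # step S580 (assumed range)
        "filter": rng.choice(["none", "mean", "gaussian"]),  # step S590
    }


def build_plan(chars, per_char, seed=0):
    """One parameter set per sample, `per_char` samples for every character."""
    rng = random.Random(seed)
    return [sample_params(c, rng) for c in chars for _ in range(per_char)]


plan = build_plan("字A7", per_char=4)
```

Seeding the generator keeps a sample set reproducible, and `per_char` could be raised for complex characters, as the text suggests.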
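For example, the classifier in the present disclosure is a character classifier based on a convolutional neural network. In one exemplary embodiment, the input of the character classifier is normalized to 32*32 (32 pixels * 32 pixels) using bicubic interpolation, and the output is a confidence for each character class (in this exemplary embodiment there are 6793 characters, and each confidence lies between 0 and 1). The loss function of the classifier is set to cross-entropy loss. The optimizer is set to the Adam optimizer with an initial learning rate of 0.001. The training data are generated by the character sample generator, with, for example, 1000 samples per character. The batch size for training may be set to 32. Early stopping is used to end training. In actual tests, a character classifier trained in this way reached a training-set accuracy of up to 99.2% and an accuracy on real samples of up to 97.6% once training stopped.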
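Three ingredients of this training setup (per-class confidences, cross-entropy loss, and early stopping) can be illustrated in pure Python without a deep-learning framework. The text does not say how the confidences are computed or which stopping rule is used, so the softmax output and the patience-based rule below are assumptions, as are all function names.

```python
import math


def softmax(logits):
    """Turn raw class scores into confidences in [0, 1] that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(confidences, target_index):
    """Cross-entropy loss for a single sample with a one-hot target."""
    return -math.log(confidences[target_index])


def should_stop(val_losses, patience=3):
    """True when the last `patience` epochs brought no new best validation loss."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


confidences = softmax([2.0, 1.0, 0.1])
predicted = confidences.index(max(confidences))
```

In the described embodiment the confidence vector would have 6793 entries, one per character class, and the recognized character is the class with the highest confidence.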
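The above merely illustrates several embodiments of the present disclosure schematically. Provided the concept of the present disclosure is not departed from, variations such as merging or splitting steps, executing them in parallel, or changing their order all fall within the scope of protection of the present disclosure.
The character segmentation and recognition apparatus provided by the present disclosure is described below with reference to FIG. 6. FIG. 6 is a block diagram of a character segmentation and recognition apparatus according to an embodiment of the present disclosure. The character segmentation and recognition apparatus 600 includes a segmentation module 610, a segmentation correction module 620, a classification module 630, and a recognition module 640.
The segmentation module 610 is configured to perform character segmentation on an image to be recognized that contains at least one line of characters, according to the number of pixels whose value equals a preset pixel value in each pixel row and each pixel column of the pixel array of the image;
the segmentation correction module 620 is configured to correct the character segmentation of the image to be recognized according to the widths of different types of characters;
the classification module 630 is configured to input the segmented image to be recognized into a classifier trained on a character sample set, the character sample set including sample images produced by data augmentation; and
the recognition module 640 is configured to recognize the characters in the image to be recognized according to the output of the classifier.
In the character segmentation and recognition apparatus of the exemplary embodiments of the present disclosure, on the one hand, the character segmentation of the image to be recognized is corrected according to the widths of different types of characters, so that the character segmentation and recognition method provided by the present disclosure can accurately segment different types of characters (for example, Chinese characters, English characters, and numeric characters). On the other hand, the present disclosure trains the classifier on a character sample set in which each character has multiple sample images with different attributes, which increases the number of characters the classifier can recognize and improves the recognition accuracy for complex characters. The present disclosure is particularly suited to recognizing printed characters, for which the gain in recognition accuracy is especially marked.
In one embodiment, refer to FIG. 7, which is a block diagram of a character segmentation and recognition apparatus according to an exemplary embodiment of the present disclosure. The character segmentation and recognition apparatus 700 includes a preprocessing module 710, a segmentation module 720, a segmentation correction module 730, a character sample set generation module 740, a classification module 750, and a recognition module 760. The segmentation module 720, the segmentation correction module 730, the classification module 750, and the recognition module 760 function in the same way as the segmentation module 610, the segmentation correction module 620, the classification module 630, and the recognition module 640 shown in FIG. 6. Unlike in FIG. 6, the preprocessing module 710 is configured to preprocess the image to be recognized to obtain a grayscale image and a binarized image, the binarized image being used to count the pixels with the preset pixel value in each pixel row and each pixel column of the pixel array and to determine the positions of the character cuts, and the grayscale image being cut at those positions to obtain a plurality of segmented character images. The character sample set generation module 740 is configured to randomly set, for each character, the attributes of that character so as to generate its sample images automatically.
FIG. 7 merely illustrates an embodiment of the present disclosure schematically, and the present disclosure is not limited thereto.
The above merely illustrates several embodiments of the present disclosure schematically. Provided the concept of the present disclosure is not departed from, variations such as merging or splitting modules all fall within the scope of protection of the present disclosure.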
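In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, storing a computer program which, when executed by, for example, a processor, can implement the steps of the character segmentation and recognition method described in any one of the above embodiments. In some possible implementations, aspects of the present disclosure may also be implemented in the form of a program product including program code which, when the program product runs on a terminal device, causes the terminal device to perform the steps of the various exemplary embodiments of the present disclosure described in the character segmentation and recognition method section of this specification.
Referring to FIG. 8, a program product 800 for implementing the above method according to an embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. The program product of the present disclosure is not limited to this, however. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium, and that medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar languages. The program code may execute entirely on the tenant computing device, partly on the tenant device, as a standalone software package, partly on the tenant computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the tenant computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).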
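In an exemplary embodiment of the present disclosure, an electronic device is also provided, which may include a processor and a memory for storing instructions executable by the processor. The processor is configured to perform, by executing the executable instructions, the steps of the character segmentation and recognition method described in any one of the above embodiments.
Those skilled in the art will understand that aspects of the present disclosure may be implemented as a system, a method, or a program product. Accordingly, aspects of the present disclosure may take the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, and so on), or an implementation combining hardware and software aspects, which may be referred to collectively here as a "circuit", "module", or "system".
An electronic device 900 according to this embodiment of the present disclosure is described below with reference to FIG. 9. The electronic device 900 shown in FIG. 9 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 9, the electronic device 900 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, a bus 930 connecting the different system components (including the storage unit 920 and the processing unit 910), a display unit 940, and so on.
The storage unit stores program code that can be executed by the processing unit 910, so that the processing unit 910 performs the steps of the various exemplary embodiments of the present disclosure described in the character segmentation and recognition method section of this specification. For example, the processing unit 910 may perform the steps shown in FIG. 1.
The storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 9201 and/or a cache storage unit 9202, and may further include a read-only storage unit (ROM) 9203.
The storage unit 920 may also include a program/utility tool 9204 having a set of (at least one) program modules 9205. Such program modules 9205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 930 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 900 may also communicate with one or more external devices 1000 (for example, a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a tenant to interact with the electronic device 900, and/or with any device (for example, a router or a modem) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 950. The electronic device 900 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 960. The network adapter 960 may communicate with the other modules of the electronic device 900 through the bus 930. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure may therefore be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, or a network device) to perform the above character segmentation and recognition method according to the embodiments of the present disclosure.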
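Compared with the related art, the advantages of the present disclosure are as follows:
On the one hand, the character segmentation of the image to be recognized is corrected according to the widths of different types of characters, so that the character segmentation and recognition method provided by the present disclosure can accurately segment different types of characters (for example, Chinese characters, English characters, and numeric characters). On the other hand, the present disclosure trains the classifier on a character sample set in which each character has multiple sample images with different attributes, which increases the number of characters the classifier can recognize and improves the recognition accuracy for complex characters. The present disclosure is particularly suited to recognizing printed characters, for which the gain in recognition accuracy is especially marked.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the appended claims.
Industrial Applicability
The embodiments of the present disclosure correct the character segmentation of the image to be recognized according to the widths of different types of characters, so that the character segmentation and recognition method provided by the present disclosure can accurately segment different types of characters (for example, Chinese characters, English characters, and numeric characters). In addition, the embodiments of the present disclosure train the classifier on a character sample set in which each character has multiple sample images with different attributes, which increases the number of characters the classifier can recognize and improves the recognition accuracy for complex characters. The present disclosure is particularly suited to recognizing printed characters, for which the gain in recognition accuracy is especially marked.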

Claims (13)

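  1. A character segmentation and recognition method, comprising:
    performing character segmentation on an image to be recognized that contains at least one line of characters, according to the number of pixels whose value equals a preset pixel value in each pixel row and each pixel column of the pixel array of the image;
    correcting the character segmentation of the image to be recognized according to the widths of different types of characters;
    inputting the segmented image to be recognized into a classifier trained on a character sample set, the character sample set including sample images produced by data augmentation; and
    recognizing the characters in the image to be recognized according to the output of the classifier.
  2. The character segmentation and recognition method according to claim 1, wherein performing character segmentation on the image to be recognized according to the number of pixels with the preset pixel value in each pixel row and each pixel column comprises:
    performing line segmentation on the image to be recognized by comparing, for each pixel row of its pixel array, the number of pixels whose value equals the preset pixel value with a first preset threshold, to obtain at least one character line;
    for each character line so obtained, performing character segmentation on that line by comparing, for each pixel column of the line's pixel array, the number of pixels whose value equals the preset pixel value with a second preset threshold.
  3. The character segmentation and recognition method according to claim 2, wherein the line segmentation based on the comparison with the first preset threshold comprises:
    when the number of pixels with the preset pixel value in a pixel row of the image's pixel array is less than or equal to the first preset threshold, marking that pixel row as a candidate cut row;
    among the candidate cut rows, marking as a cut row each candidate cut row for which at most one of its two adjacent pixel rows is also a candidate cut row;
    segmenting the image to be recognized along the cut rows to obtain at least one character line.
  4. The character segmentation and recognition method according to claim 2, wherein the character segmentation of each character line based on the comparison with the second preset threshold comprises:
    when the number of pixels with the preset pixel value in a pixel column of the character line's pixel array is less than the second preset threshold, marking that pixel column as a candidate cut column.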
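  5. The character segmentation and recognition method according to claim 4, wherein correcting the character segmentation of the image to be recognized according to the widths of different types of characters comprises:
    traversing the candidate cut columns and, for each one, determining whether at least two of the four pixel columns located at a first width and at a second width from it, on either side along the pixel-row direction, are candidate cut columns, the first width and the second width being determined by the widths of different types of characters;
    if so, keeping the mark of that candidate cut column; and
    if not, deleting the mark of that candidate cut column.
  6. The character segmentation and recognition method according to claim 5, wherein correcting the character segmentation of the image to be recognized according to the widths of different types of characters further comprises:
    traversing the candidate cut columns that remain after this screening and determining whether the distances between adjacent candidate cut columns are all less than or equal to s times the height of the character line, s being a constant greater than 1 and less than or equal to 2;
    if so, marking each candidate cut column as a cut column and segmenting each character line along the cut columns;
    if not, increasing the second preset threshold and performing the character segmentation and its correction again.
  7. The character segmentation and recognition method according to any one of claims 1 to 6, further comprising, before the character segmentation of the image to be recognized:
    preprocessing the image to be recognized to obtain a grayscale image and a binarized image, the binarized image being used to count the pixels with the preset pixel value in each pixel row and each pixel column of the pixel array and to determine the positions of the character cuts, and the grayscale image being cut at those positions to obtain a plurality of segmented character images.
  8. The character segmentation and recognition method according to any one of claims 1 to 6, wherein the sample images are augmented by:
    randomly setting one or more of the font, the font size, and the gray value of the character.
  9. The character segmentation and recognition method according to claim 8, wherein the sample images are further augmented by:
    randomly setting one or more of the rotation angle, the affine transformation magnitude, the perspective angle, the interference lines, and the filter type of the sample image containing the character.
  10. The character segmentation and recognition method according to any one of claims 1 to 6, wherein the classifier is a character classifier based on a convolutional neural network.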
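  11. A character segmentation and recognition apparatus, comprising:
    a segmentation module, configured to perform character segmentation on an image to be recognized that contains at least one line of characters, according to the number of pixels whose value equals a preset pixel value in each pixel row and each pixel column of the pixel array of the image;
    a segmentation correction module, configured to correct the character segmentation of the image to be recognized according to the widths of different types of characters;
    a classification module, configured to input the segmented image to be recognized into a classifier trained on a character sample set, the character sample set including sample images produced by data augmentation; and
    a recognition module, configured to recognize the characters in the image to be recognized according to the output of the classifier.
  12. An electronic device, comprising:
    a processor;
    a memory storing a computer program which, when run by the processor, performs the character segmentation and recognition method according to any one of claims 1 to 10.
  13. A storage medium storing a computer program which, when run by a processor, performs the character segmentation and recognition method according to any one of claims 1 to 10.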
PCT/CN2019/104931 2018-09-25 2019-09-09 字符切分识别方法、装置、电子设备、存储介质 WO2020063314A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811121021.3 2018-09-25
CN201811121021.3A CN110942074B (zh) 2018-09-25 2018-09-25 字符切分识别方法、装置、电子设备、存储介质

Publications (1)

Publication Number Publication Date
WO2020063314A1 true WO2020063314A1 (zh) 2020-04-02

Family

ID=69905425

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/104931 WO2020063314A1 (zh) 2018-09-25 2019-09-09 字符切分识别方法、装置、电子设备、存储介质

Country Status (2)

Country Link
CN (1) CN110942074B (zh)
WO (1) WO2020063314A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539406A (zh) * 2020-04-21 2020-08-14 招商局金融科技有限公司 证件复印件信息识别方法、服务器及存储介质
CN111553336A (zh) * 2020-04-27 2020-08-18 西安电子科技大学 基于连体段的印刷体维吾尔文文档图像识别系统及方法
CN111783781A (zh) * 2020-05-22 2020-10-16 平安国际智慧城市科技股份有限公司 基于产品协议字符识别的恶意条款识别方法、装置、设备
CN112529004A (zh) * 2020-12-08 2021-03-19 平安科技(深圳)有限公司 智能图像识别方法、装置、计算机设备及存储介质
CN112699886A (zh) * 2020-12-30 2021-04-23 广东德诚大数据科技有限公司 一种字符识别方法、装置及电子设备
CN112784835A (zh) * 2021-01-21 2021-05-11 恒安嘉新(北京)科技股份公司 圆形印章的真实性识别方法、装置、电子设备及存储介质
CN113723410A (zh) * 2020-05-21 2021-11-30 安徽小眯当家信息技术有限公司 一种数码管数字识别方法及装置
CN114332888A (zh) * 2022-03-16 2022-04-12 中央民族大学 一种东巴文的文字切分方法、装置、存储介质及电子设备
CN115588204A (zh) * 2022-09-23 2023-01-10 神州数码系统集成服务有限公司 一种基于ds证据理论的单一字符图像匹配识别方法
CN115880300A (zh) * 2023-03-03 2023-03-31 北京网智易通科技有限公司 图像模糊检测方法、装置、电子设备和存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523541A (zh) * 2020-04-21 2020-08-11 上海云从汇临人工智能科技有限公司 一种基于ocr的数据生成方法、系统、设备及介质
CN113160222A (zh) * 2021-05-14 2021-07-23 电子科技大学 一种针对工业信息图像的生产数据识别方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110249897A1 (en) * 2010-04-08 2011-10-13 University Of Calcutta Character recognition
CN104616009A (zh) * 2015-02-13 2015-05-13 广州广电运通金融电子股份有限公司 一种字符切割识别方法
CN105631486A (zh) * 2014-10-27 2016-06-01 深圳Tcl数字技术有限公司 图像文字识别方法及装置
CN106407976A (zh) * 2016-08-30 2017-02-15 百度在线网络技术(北京)有限公司 图像字符识别模型生成和竖列字符图像识别方法和装置
CN106611175A (zh) * 2016-12-29 2017-05-03 成都数联铭品科技有限公司 用于图像文字识别的字符图片自动切分系统
CN106874909A (zh) * 2017-01-18 2017-06-20 深圳怡化电脑股份有限公司 一种图像字符的识别方法及其装置
CN107305630A (zh) * 2016-04-25 2017-10-31 腾讯科技(深圳)有限公司 文本序列识别方法和装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013126286A1 (en) * 2012-02-21 2013-08-29 General Electric Company System and method for segmenting image data to identify a character-of-interest
CN104008384B (zh) * 2013-02-26 2017-11-14 山东新北洋信息技术股份有限公司 字符识别方法和字符识别装置
CN106446896B (zh) * 2015-08-04 2020-02-18 阿里巴巴集团控股有限公司 一种字符分割方法、装置及电子设备
CN105760891A (zh) * 2016-03-02 2016-07-13 上海源庐加佳信息科技有限公司 一种中文字符验证码的识别方法
CN106682671A (zh) * 2016-12-29 2017-05-17 成都数联铭品科技有限公司 图像文字识别系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110249897A1 (en) * 2010-04-08 2011-10-13 University Of Calcutta Character recognition
CN105631486A (zh) * 2014-10-27 2016-06-01 深圳Tcl数字技术有限公司 图像文字识别方法及装置
CN104616009A (zh) * 2015-02-13 2015-05-13 广州广电运通金融电子股份有限公司 一种字符切割识别方法
CN107305630A (zh) * 2016-04-25 2017-10-31 腾讯科技(深圳)有限公司 文本序列识别方法和装置
CN106407976A (zh) * 2016-08-30 2017-02-15 百度在线网络技术(北京)有限公司 图像字符识别模型生成和竖列字符图像识别方法和装置
CN106611175A (zh) * 2016-12-29 2017-05-03 成都数联铭品科技有限公司 用于图像文字识别的字符图片自动切分系统
CN106874909A (zh) * 2017-01-18 2017-06-20 深圳怡化电脑股份有限公司 一种图像字符的识别方法及其装置

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539406A (zh) * 2020-04-21 2020-08-14 招商局金融科技有限公司 证件复印件信息识别方法、服务器及存储介质
CN111539406B (zh) * 2020-04-21 2023-04-18 招商局金融科技有限公司 证件复印件信息识别方法、服务器及存储介质
CN111553336A (zh) * 2020-04-27 2020-08-18 西安电子科技大学 基于连体段的印刷体维吾尔文文档图像识别系统及方法
CN111553336B (zh) * 2020-04-27 2023-03-24 西安电子科技大学 基于连体段的印刷体维吾尔文文档图像识别系统及方法
CN113723410A (zh) * 2020-05-21 2021-11-30 安徽小眯当家信息技术有限公司 一种数码管数字识别方法及装置
CN111783781A (zh) * 2020-05-22 2020-10-16 平安国际智慧城市科技股份有限公司 基于产品协议字符识别的恶意条款识别方法、装置、设备
CN111783781B (zh) * 2020-05-22 2024-04-05 深圳赛安特技术服务有限公司 基于产品协议字符识别的恶意条款识别方法、装置、设备
CN112529004A (zh) * 2020-12-08 2021-03-19 平安科技(深圳)有限公司 智能图像识别方法、装置、计算机设备及存储介质
CN112699886A (zh) * 2020-12-30 2021-04-23 广东德诚大数据科技有限公司 一种字符识别方法、装置及电子设备
CN112784835A (zh) * 2021-01-21 2021-05-11 恒安嘉新(北京)科技股份公司 圆形印章的真实性识别方法、装置、电子设备及存储介质
CN112784835B (zh) * 2021-01-21 2024-04-12 恒安嘉新(北京)科技股份公司 圆形印章的真实性识别方法、装置、电子设备及存储介质
CN114332888A (zh) * 2022-03-16 2022-04-12 中央民族大学 一种东巴文的文字切分方法、装置、存储介质及电子设备
CN115588204A (zh) * 2022-09-23 2023-01-10 神州数码系统集成服务有限公司 一种基于ds证据理论的单一字符图像匹配识别方法
CN115880300A (zh) * 2023-03-03 2023-03-31 北京网智易通科技有限公司 图像模糊检测方法、装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN110942074B (zh) 2024-04-09
CN110942074A (zh) 2020-03-31

Similar Documents

Publication Publication Date Title
WO2020063314A1 (zh) 字符切分识别方法、装置、电子设备、存储介质
US11468225B2 (en) Determining functional and descriptive elements of application images for intelligent screen automation
US10896349B2 (en) Text detection method and apparatus, and storage medium
EP3117369B1 (en) Detecting and extracting image document components to create flow document
US8644561B2 (en) License plate optical character recognition method and system
CN110598686B (zh) 发票的识别方法、系统、电子设备和介质
EP1854051B1 (en) Intelligent importation of information from foreign application user interface using artificial intelligence
KR102435365B1 (ko) 증명서 인식 방법 및 장치, 전자 기기, 컴퓨터 판독 가능한 저장 매체
US20210295114A1 (en) Method and apparatus for extracting structured data from image, and device
CN111488826A (zh) 一种文本识别方法、装置、电子设备和存储介质
CN102822846B (zh) 用于对来自文本行图像的单词进行分割的方法和设备
CN110942004A (zh) 基于神经网络模型的手写识别方法、装置及电子设备
CN113785305A (zh) 一种检测倾斜文字的方法、装置及设备
US11893765B2 (en) Method and apparatus for recognizing imaged information-bearing medium, computer device and medium
CN111444986A (zh) 建筑图纸构件分类方法、装置、电子设备及存储介质
CN112818852A (zh) 印章校验方法、装置、设备及存储介质
CN111241897B (zh) 通过推断视觉关系的工业检验单数字化的系统和实现方法
CN112784737B (zh) 结合像素分割和线段锚的文本检测方法、系统及装置
Nasiri et al. A new binarization method for high accuracy handwritten digit recognition of slabs in steel companies
CN114187445A (zh) 识别图像中文本的方法、装置、电子设备及存储介质
CN114120305A (zh) 文本分类模型的训练方法、文本内容的识别方法及装置
CN112712080B (zh) 一种用于走字屏采集图像的文字识别处理方法
CN114359536A (zh) 字符识别模型的训练方法及装置、存储介质及电子设备
CN115719488A (zh) 文本识别方法、装置、电子设备以及存储介质
CN114743030A (zh) 图像识别方法、装置、存储介质和计算机设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19865216

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19865216

Country of ref document: EP

Kind code of ref document: A1