CN117037165A

CN117037165A - Chinese character recognition method, Chinese character recognition device, computer equipment and storage medium

Info

Publication number: CN117037165A
Application number: CN202310429785.3A
Authority: CN
Inventors: 王婷婷; 邵允学
Original assignee: Nanjing Tech University
Current assignee: Nanjing Tech University
Priority date: 2023-04-20
Filing date: 2023-04-20
Publication date: 2023-11-10

Abstract

The embodiment of the application relates to the technical field of artificial intelligence, in particular to a Chinese character recognition method, a Chinese character recognition device, computer equipment and a storage medium. The method comprises the following steps: acquiring a Chinese character image to be identified; sending the Chinese character image to be identified into a preset image segmentation model, and obtaining a feature vector group of a character area in the Chinese character image to be identified; according to a preset rule, converting non-zero elements in the feature vector group into 0 or 1 to obtain a new feature vector group; for each feature vector in the new feature vector group, determining a target code with the highest similarity to the feature vector from the codes corresponding to each Chinese character in the Chinese character information base; and determining the Chinese character corresponding to the target code as a target Chinese character. The scheme improves the efficiency of Chinese character recognition and can also save storage space.

Description

Chinese character recognition method, Chinese character recognition device, computer equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of artificial intelligence, in particular to a Chinese character recognition method, a Chinese character recognition device, computer equipment and a storage medium.

Background

With the rapid development of deep learning technology, deep learning methods are widely adopted in the field of Chinese character recognition. Chinese characters are formed by strokes, and the strokes are complex and varied, so recognizing Chinese characters is more difficult than recognizing images formed by English letters. As Chinese character image recognition becomes increasingly important in office automation, enterprises urgently need handwritten Chinese character recognition algorithms with high accuracy and high speed.

At present, mainstream Chinese character recognition models consist of two stages, detection and recognition. Detection solves the problem of whether Chinese characters are present and what range of the image they occupy. Recognition operates on the located Chinese character areas, mainly solving the problem of what each Chinese character is, and converts the Chinese character areas in the image into character information. Existing detection and recognition models have the advantages of high calculation speed and low time consumption, but the defect that the completeness of the analysis of the Chinese character image content cannot be guaranteed.

Disclosure of Invention

In order to overcome the above-mentioned drawbacks of the prior art, embodiments of the present application provide a Chinese character recognition method, a Chinese character recognition device, a computer device, and a storage medium. In order to achieve the above purpose, the technical scheme adopted by the application is as follows:

In a first aspect, an embodiment of the present application provides a Chinese character recognition method, where the method includes:

setting the number of Chinese character categories identified by Chinese characters, constructing a Chinese character information base containing all Chinese characters, and carrying out Chinese character coding on each Chinese character in the Chinese character information base;

acquiring a Chinese character image to be identified, then sending the Chinese character image to be identified into a preset image segmentation model to obtain a Chinese character content area, computing connected regions of the Chinese character content area to obtain single Chinese character content areas, and obtaining a feature vector group of each single Chinese character content area, wherein the dimension of each feature vector in the feature vector group is equal to the length of the Chinese character codes in the Chinese character information base, and each element of the feature vector is the probability value at the corresponding position in the code; according to a preset rule, converting non-zero elements in the feature vector group into 0 or 1 to obtain a new feature vector group; and, for each feature vector in the new feature vector group, determining a target code with the highest similarity to the feature vector from the codes corresponding to all Chinese characters in the Chinese character information base, and determining the Chinese character corresponding to the target code as a target Chinese character.

In a second aspect, an embodiment of the present application provides a Chinese character recognition apparatus, including: an acquisition unit, used for acquiring the Chinese character image to be identified; an extraction unit, used for sending the Chinese character image to be identified into a preset image segmentation model to obtain a Chinese character content area, computing connected regions of the Chinese character content area to obtain single Chinese character content areas, and extracting a feature vector group of each single Chinese character content area, wherein the dimension of each feature vector in the feature vector group is equal to the length of the Chinese character codes in the Chinese character information base, and each element of the feature vector is the probability value at the corresponding position in the code; a determining unit, used for converting, for each feature vector group extracted by the extraction unit, non-zero elements in the feature vector group into 0 or 1 according to a preset rule to obtain a new feature vector group, and determining, for each feature vector in the new feature vector group, a target code with the highest similarity to the feature vector from the codes corresponding to each Chinese character in the Chinese character information base; and a recognition unit, used for determining the Chinese character corresponding to the target code determined by the determining unit as a Chinese character recognition result.

In a third aspect, an embodiment of the present application further provides a computing device, including a memory and a processor, where the memory stores a computer program, and the processor implements a method according to any embodiment of the present specification when executing the computer program.

In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform a method according to any of the embodiments of the present specification.

In the technical scheme, after a Chinese character image to be identified is obtained, firstly, a feature vector group of a character area in the Chinese character image to be identified is obtained; next, for each feature vector in the feature vector group, determining a target code with highest similarity with the feature vector from codes corresponding to each Chinese character in the Chinese character information base; then, the Chinese character corresponding to the target code is determined as a Chinese character recognition result. Because the encoding length of all Chinese characters in the Chinese character information base is far smaller than the total number of Chinese characters in the Chinese character information base, the dimension of each feature vector is relatively low. Therefore, the calculation workload in determining the feature vector group can be reduced, and the target code can be obtained more quickly, so that the Chinese character recognition efficiency is improved. In addition, the purpose of saving the storage space can be achieved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for recognizing Chinese characters according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a network model for Unet image segmentation provided by an embodiment of the present application;

FIG. 3 is a flowchart of a method for obtaining a preset image segmentation network model according to an embodiment of the present application;

FIG. 4 is a diagram showing a structure of a Chinese character image recognition apparatus according to an embodiment of the present application;

FIG. 5 is a block diagram of a model training apparatus according to an embodiment of the present application;

FIG. 6 is a hardware architecture diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The embodiment of the application provides a Chinese character recognition method, which can solve the problem of Chinese character image recognition and comprises the following implementation steps, as shown in FIG. 1:

Step 100: acquiring a Chinese character image to be identified.

It should be noted that the Chinese character image to be identified in this step may be acquired by an online image capturing device, or may be a saved scene image, depending on the actual application scenario, and is not limited herein.

Step 101: sending the Chinese character image to be identified into a preset image segmentation model to obtain a Chinese character content area, computing connected regions of the Chinese character content area to obtain single Chinese character content areas, and obtaining a feature vector group of each single Chinese character content area.

In this embodiment, for example, the Chinese character image to be identified has size W×H (W is the image width, H is the image height) and 3 channels. The image is first input into the preset image segmentation model shown in FIG. 2, and the model outputs a feature tensor of size W×H×C corresponding to the image, where the number of channels C is equal to the length of the Chinese character code. Connected regions are computed for the text area on each channel using four-neighborhood connectivity; if the overlap between connected regions on two adjacent channels is greater than a preset threshold, the two connected regions are merged. Finally, a one-dimensional feature vector of dimension (C, 1) is obtained at each position of the text area, and the feature vectors within each connected region form a feature vector group.

The preset image segmentation model may be the Unet image segmentation model used in the present application, or may be another image segmentation model, which is not particularly limited in the present application. In addition, the preset image segmentation network model may be acquired through steps 301 to 303 shown in FIG. 3.
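The grouping described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions rather than the method of the application: it labels a single foreground mask with four-neighborhood connectivity and does not perform the per-channel computation and cross-channel merging; the names `four_connected_regions` and `feature_vector_groups` and the nested-list tensor layout are hypothetical.

```python
from collections import deque


def four_connected_regions(mask):
    """Group foreground pixels of a 2-D 0/1 mask using 4-neighbour connectivity.

    Returns a list of regions; each region is a list of (row, col) positions.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                # Only up, down, left and right neighbours; diagonals do not connect.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def feature_vector_groups(tensor, mask):
    """Collect, for each connected region, the C-dimensional vector at every pixel.

    tensor[r][c] is assumed to be a list of C per-channel probabilities.
    """
    return [[tensor[y][x] for (y, x) in region]
            for region in four_connected_regions(mask)]
```

Each inner list returned by `feature_vector_groups` plays the role of one feature vector group, that is, the per-position vectors of one single Chinese character content area.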

In step 301, a plurality of training sample images are acquired.

In embodiments of the present application, the training sample images may be acquired in a variety of ways. In one embodiment, a plurality of training sample images may be captured by shooting or similar means, and the characters included in each training sample image may then be marked manually. However, because training typically requires a large number of training sample images, manual marking tends to be inefficient and labor-intensive. Thus, to increase efficiency and reduce labor costs, in another embodiment, one or more Chinese character images may be extracted from the Chinese character information base and a training sample image generated from them, wherein each Chinese character image may include one or more Chinese characters.

In step 302, the plurality of training sample images are input into an initial image segmentation network model to obtain an output result of the initial image segmentation network model.

In the embodiment of the present application, the number of convolution kernels of at least one convolution layer in the initial image segmentation network model is determined according to the length of the code, the length of the code is equal to the dimension of the feature vector, and the elements of the feature vector correspond one-to-one to the bits in the code. The number of layers of the initial image segmentation network model, the node structure in the layers, and the convolution kernels used for performing the convolution operation may be constructed so as to be suitable for the encoding. For example, parameters such as the number of nodes and the number of convolution kernels of the last convolution layer may be set with reference to the length of the code, so that the number of nodes of the last convolution layer of the initial image segmentation network model corresponds to the length of the code.

After the plurality of training sample images are obtained in step 301, they may be used as training data of the initial image segmentation network model, and the correctly recognized feature vector groups corresponding to the single text regions in the training sample images may be used as label data, so as to train the initial image segmentation network model. The training of the initial image segmentation network model serves to obtain the relevant parameters of the model, such as the size of the convolution kernel, the stride of the convolution kernel, and the like.

In step 303, the initial image segmentation network model is trained according to the comparison result of the output result and the label data, to obtain the preset image segmentation network model.

After the output result of the initial image segmentation network model is obtained in step 302, the output result and the label data may be compared; for example, the similarity between the output result and the label data is measured by a cosine distance or a Euclidean distance. The difference between the output result and the label data measures the degree of network convergence: while the difference is greater than or equal to a preset difference threshold, the model continues to be trained, and training stops once the difference is less than the preset difference threshold, thereby obtaining the preset image segmentation network model. The preset difference threshold may be a value set by a user, or may be a default empirical value, which is not particularly limited in the present application.
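As an illustration of the comparison and stopping rule described above, the following sketch computes both distance measures and checks a difference threshold. Averaging the distance over samples and the function names are assumptions made for the example; the application does not specify how the difference is aggregated.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def should_stop(outputs, labels, threshold, distance=euclidean_distance):
    """Stop training once the mean output-to-label distance drops below threshold."""
    mean_difference = sum(distance(o, l) for o, l in zip(outputs, labels)) / len(outputs)
    return mean_difference < threshold
```

A training loop would call `should_stop` after each pass and continue while it returns `False`.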

Returning to FIG. 1, step 102: according to a preset rule, the non-zero elements in the feature vector group are converted into 0 or 1 to obtain a new feature vector group. For each feature vector in the new feature vector group, the similarity between the feature vector and the code corresponding to each Chinese character in the Chinese character information base is calculated according to the dimension of the feature vector and the length of the code, and the code with the highest similarity to the feature vector is determined as the target code of that feature vector. The target code with the highest number of votes within the connected region is then selected by voting. For example, the similarity between the feature vector and the code corresponding to each Chinese character in the Chinese character information base can be measured by the Euclidean distance or the cosine distance.
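The matching and voting in step 102 can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance as the similarity measure and a dictionary `code_book` that maps each character to its code; the three 4-bit codes shown are toy values invented for the example, not codes from the application.

```python
from collections import Counter


def nearest_character(vector, code_book):
    """Return the character whose code is closest to vector (squared Euclidean)."""
    def squared_distance(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(code_book, key=lambda ch: squared_distance(code_book[ch]))


def recognize_region(vector_group, code_book):
    """Let every vector in a connected region vote; return the majority character."""
    votes = Counter(nearest_character(v, code_book) for v in vector_group)
    return votes.most_common(1)[0][0]


# Toy code book with hypothetical 4-bit codes.
code_book = {"人": [1, 0, 0, 1], "大": [0, 1, 1, 0], "天": [1, 1, 1, 1]}
group = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 1, 0]]
print(recognize_region(group, code_book))  # two of three vectors vote for "人"
```

For 0/1 vectors the squared Euclidean distance equals the number of differing bits, so the closest code is the one that disagrees with the binarized feature vector in the fewest positions.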

Specifically, in one embodiment, the conversion of the non-zero elements in the feature vector group may be implemented according to a comparison between each non-zero element and the average value of all the non-zero elements in the feature vector group. For each non-zero element in the feature vector group, if the non-zero element is greater than the average value, it may be set to 1; if the non-zero element is less than or equal to the average value, it may be set to 0.

In another embodiment, the conversion of the non-zero elements in the feature vector group may be implemented according to a comparison between each non-zero element and a preset threshold. For each non-zero element in the feature vector group, if the non-zero element is greater than the preset threshold, it may be set to 1; if the non-zero element is less than or equal to the preset threshold, it may be set to 0. It should be noted that the preset threshold may be a user-set value or a default empirical value (for example, 0.7), which is not specifically limited in the present application.

In one embodiment, for each Chinese character in the Chinese character information base, a Chinese character font image corresponding to the Chinese character is acquired and binarized, and the binarized Chinese character font image is directly flattened to obtain the Chinese character code; doing this for all Chinese characters in the Chinese character information base yields all the codes. For example, if the size of the Chinese character font image is 32 x 32, the flattened code length is 1024.

In another embodiment, each Chinese character in the Chinese character information base is encoded with one-hot (one_hot) encoding, and the code length is the total number of Chinese character categories in the Chinese character information base.

In another embodiment, an autoencoder network is trained on the Chinese characters in the Chinese character information base, and the output of the middle layer of the autoencoder is taken as the feature. Features are extracted from each Chinese character font image to obtain a feature matrix, the Chinese character features are then clustered with the k-means algorithm, and after clustering is finished a binary Chinese character code is obtained for every Chinese character in the Chinese character information base.

Step 103: determining the Chinese character corresponding to the target code as the target Chinese character.
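The two conversion rules above can be sketched side by side. The function names are hypothetical, and the mean is taken over all non-zero elements of the whole group, following the wording of the first embodiment.

```python
def binarize_group_by_mean(group):
    """Set each non-zero element to 1 if it exceeds the mean of all non-zero
    elements in the group, otherwise to 0. Zero elements stay 0."""
    nonzero = [v for vector in group for v in vector if v != 0]
    if not nonzero:
        return [[0] * len(vector) for vector in group]
    mean = sum(nonzero) / len(nonzero)
    return [[1 if v != 0 and v > mean else 0 for v in vector] for vector in group]


def binarize_group_by_threshold(group, threshold=0.7):
    """Set each non-zero element to 1 if it exceeds a fixed threshold, else 0."""
    return [[1 if v != 0 and v > threshold else 0 for v in vector] for vector in group]


group = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.6]]
print(binarize_group_by_mean(group))       # mean of non-zero elements is 0.52
print(binarize_group_by_threshold(group))  # default threshold 0.7
```

With this input the element 0.6 becomes 1 under the mean rule and 0 under the fixed threshold of 0.7, which shows how the choice of rule changes the resulting code.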
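A minimal sketch of this glyph-image encoding follows, assuming the font image is given as rows of grayscale values from 0 to 255 and that dark pixels are ink; the binarization threshold of 128 and the name `glyph_to_code` are assumptions for illustration.

```python
def glyph_to_code(gray_image, threshold=128):
    """Binarize a grayscale glyph image and flatten it row by row into a 0/1 code.

    Pixels darker than threshold are treated as ink and mapped to 1.
    """
    return [1 if pixel < threshold else 0 for row in gray_image for pixel in row]


# A blank 32 x 32 image with a single dark pixel yields a 1024-element code.
image = [[255] * 32 for _ in range(32)]
image[3][5] = 0
code = glyph_to_code(image)
print(len(code))  # 32 * 32 positions
```

Row-major flattening places the pixel at row `r`, column `c` of a 32 x 32 image at index `r * 32 + c` of the code.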
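For comparison, one-hot encoding can be sketched in a few lines; the helper name and the three sample characters are illustrative only.

```python
def one_hot_codes(characters):
    """Map each character to a code with a single 1 at its own index."""
    count = len(characters)
    return {ch: [1 if i == j else 0 for j in range(count)]
            for i, ch in enumerate(characters)}


codes = one_hot_codes(["人", "大", "天"])
print(codes["大"])  # the code length equals the number of character categories
```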

In the present application, the code corresponding to each Chinese character is stored in the Chinese character information base, so after the target code is obtained in step 102, the Chinese character corresponding to the target code, that is, the target Chinese character, can be found by accessing the corresponding storage module in the Chinese character information base. In this way, the plurality of target Chinese characters in the image to be identified can be obtained.

In the technical scheme, after an image to be identified is obtained, firstly, determining a characteristic vector group of a single Chinese character content area in the Chinese character image to be identified; next, for each feature vector in the feature vector group, determining a target code with highest similarity with the feature vector from codes corresponding to each Chinese character in the Chinese character information base; then, the Chinese character corresponding to the target code is determined as the target Chinese character. The Chinese character encoding method provided by the present disclosure makes the encoding length of all Chinese characters far smaller than the total number of Chinese characters in the Chinese character information base, so that the dimension of each feature vector is relatively low, the calculation workload when determining the feature vector group can be reduced, and the acquisition of the target encoding is faster, thereby improving the efficiency of Chinese character recognition. In addition, the purpose of saving the storage space can be achieved.
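The code-length argument can be made concrete with a short calculation. The base size of 3755 characters is a hypothetical figure chosen only for illustration, and `min_binary_code_length` is a helper introduced for this example.

```python
def min_binary_code_length(num_characters):
    """Smallest number of bits that gives every character a distinct 0/1 code."""
    if num_characters < 1:
        raise ValueError("num_characters must be positive")
    return max(1, (num_characters - 1).bit_length())


num_characters = 3755             # hypothetical size of the information base
one_hot_length = num_characters   # one-hot: one position per character
flattened_glyph_length = 32 * 32  # flattened 32 x 32 binarized font image
binary_lower_bound = min_binary_code_length(num_characters)
print(one_hot_length, flattened_glyph_length, binary_lower_bound)
```

Under these assumptions a one-hot code needs 3755 positions and a flattened 32 x 32 glyph needs 1024, while a distinct binary code needs at least 12 bits, so the saving in feature vector dimension applies to the compact codes rather than to the one-hot embodiment.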

As shown in FIG. 4, the embodiment of the application provides a Chinese character recognition device. The apparatus 400 includes: an obtaining unit 401, configured to obtain a Chinese character image to be identified; an extracting unit 402, configured to send the Chinese character image to be identified obtained by the obtaining unit 401 into a preset image segmentation model to obtain a Chinese character content area, compute connected regions of the Chinese character content area to obtain single Chinese character content areas, and extract a feature vector group of each single Chinese character content area, where the dimension of each feature vector in the feature vector group is equal to the length of the Chinese character codes in the Chinese character information base, and each element of the feature vector is the probability value at the corresponding position in the code; a determining unit 403, configured to convert, for each feature vector group extracted by the extracting unit 402, the non-zero elements in the feature vector group into 0 or 1 according to a preset rule to obtain a new feature vector group, and to determine, for each feature vector in the new feature vector group, a target code with the highest similarity to the feature vector from the codes corresponding to each Chinese character in the Chinese character information base; and a recognition unit 404, configured to determine the Chinese character corresponding to the target code determined by the determining unit 403 as a Chinese character recognition result.

As shown in FIG. 5, the embodiment of the application provides a model training device. The apparatus 500 includes: an acquisition unit 501 for acquiring a plurality of training sample images; an extracting unit 502, configured to input the plurality of training sample images acquired by the acquisition unit 501 into an initial image segmentation network model, and obtain an output result of the initial image segmentation network model, where the number of convolution kernels of at least one layer in the initial image segmentation network model is determined according to the length of the code, the at least one layer includes the last layer, and the length of the code is equal to the dimension of the feature vector; and a training unit 503, configured to train the initial image segmentation network model according to the comparison result of the output result obtained by the extracting unit 502 and the label data, so as to obtain a preset image segmentation network model, where the label data is the correctly recognized feature vector groups respectively corresponding to the plurality of training sample images.

In addition, the model training device may be independent of the Chinese character recognition device, or may be integrated into the Chinese character recognition device, which is not particularly limited in the present application.

It will be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the Chinese character recognition device. In other embodiments of the present application, a Chinese character recognition device may include more or fewer parts than shown, or certain parts may be combined, or certain parts may be split, or different parts may be arranged. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The Chinese character recognition method provided by the embodiment of the application can be applied to the computer device shown in FIG. 6. The computer device comprises a processor and a memory connected through a system bus, and a computer program is stored in the memory; when executing the computer program, the processor can execute the steps of the method embodiments described above. Optionally, the computer device may further comprise a network interface, a display screen and an input means. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. Optionally, the computer device may be a server, a personal computer, a personal digital assistant, another terminal device such as a tablet computer or a mobile phone, or a cloud or remote server; the embodiment of the present application does not limit the specific form of the computer device.

The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium is stored with a computer program, and the computer program when being executed by a processor causes the processor to execute the Chinese character recognition method in any embodiment of the application.

Specifically, a system or apparatus may be provided with a storage medium on which software program code realizing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may be caused to read out and execute the program code stored in the storage medium.

In this case, the program code itself read from the storage medium may realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code form part of the present application.

Examples of the storage medium for providing the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer over a communication network.

Further, it should be apparent that the functions of any of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform part or all of the actual operations based on the instructions of the program code.

Further, it is understood that the program code read out from the storage medium may be written into a memory provided in an expansion board inserted into a computer or into a memory provided in an expansion module connected to the computer, and a CPU or the like mounted on the expansion board or the expansion module may then be caused to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above embodiments.

It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one …" does not exclude the presence of additional identical elements in a process, method, article or apparatus that comprises the element.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: various media in which program code may be stored, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A Chinese character recognition method is characterized by comprising the following steps:

acquiring a Chinese character image to be identified, then sending the Chinese character image to be identified into a preset image segmentation model to obtain a Chinese character content area, computing connected regions of the Chinese character content area to obtain single Chinese character content areas, and obtaining a feature vector group of each single Chinese character content area, wherein the dimension of each feature vector in the feature vector group is equal to the length of the Chinese character codes in a Chinese character information base, and each element of the feature vector is the probability value at the corresponding position in the code;

according to a preset rule, converting non-zero elements in the feature vector group into 0 or 1 to obtain a new feature vector group; for each feature vector in the new feature vector group, determining a target code with the highest similarity to the feature vector from the codes corresponding to all Chinese characters in the Chinese character information base; and determining the Chinese character corresponding to the target code as a target Chinese character.

2. The method of claim 1, wherein the obtaining the feature vector group of the single Chinese character content area comprises:

inputting the Chinese character image to be identified into a preset image segmentation network model to obtain a Chinese character content area, computing connected regions of the Chinese character content area to obtain single Chinese character content areas, and obtaining a feature vector group of each single Chinese character content area, wherein the preset image segmentation network model is constructed according to the codes corresponding to all Chinese characters in the Chinese character information base.

3. The method of claim 2, wherein the preset image segmentation network model is constructed by:

acquiring a plurality of training sample images;

inputting the training sample images into an initial image segmentation network model to obtain an output result of the initial image segmentation network model, wherein the number of convolution kernels of at least one convolution layer in the image segmentation network model is determined according to the length of the code, the length of the code is equal to the dimension of the feature vector, and the elements of the feature vector correspond one-to-one to the bits of the code;

training the initial image segmentation network model according to a result of comparing the output result with marking data to obtain the preset image segmentation network model, wherein the marking data are the correctly identified feature vector groups corresponding respectively to the training sample images.
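The relationship between code length and kernel count, and the comparison against the marking data, can be sketched at the level of a single pixel. Everything below is an assumption made for illustration: the claims do not name the layer type, the activation, or the loss, so the 1×1 convolution, the sigmoid, the binary cross-entropy, and all function names and weights are hypothetical.

```python
import math
from typing import List, Sequence


def pointwise_conv(features: Sequence[float],
                   kernels: Sequence[Sequence[float]]) -> List[float]:
    """1x1 convolution at one pixel followed by a sigmoid: one output per kernel."""
    outputs = []
    for kernel in kernels:
        z = sum(w * f for w, f in zip(kernel, features))
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def binary_cross_entropy(predicted: Sequence[float],
                         target: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Assumed comparison between a predicted vector and the marked code."""
    total = 0.0
    for p, t in zip(predicted, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(target)


CODE_LENGTH = 4  # hypothetical code length in the information base

# The last layer holds CODE_LENGTH kernels, so each pixel yields a
# CODE_LENGTH-dimensional vector whose elements map one-to-one to code bits.
kernels = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
    [-2.0, 0.0, 0.0],
]
assert len(kernels) == CODE_LENGTH

pixel_features = [1.0, -1.0, 0.5]   # hypothetical 3-channel input at one pixel
marked_code = [1, 0, 1, 0]          # hypothetical marking data for that pixel

output = pointwise_conv(pixel_features, kernels)
loss = binary_cross_entropy(output, marked_code)
print(len(output), loss)
```

Tying the output width to the code length rather than to the number of character classes keeps the last layer small when the information base contains many characters.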

4. The method according to any one of claims 1-3, wherein the code corresponding to each Chinese character in the Chinese character information base is determined by:

for each Chinese character in the Chinese character information base, acquiring the Chinese character font image corresponding to that Chinese character, binarizing the Chinese character font image, and directly flattening the binarized Chinese character font image to obtain the Chinese character codes of all Chinese characters in the Chinese character information base;

for each Chinese character in the Chinese character information base, encoding the Chinese character using one-hot encoding, wherein the code length is the total number of Chinese character categories in the Chinese character information base;

and, for the Chinese characters in the Chinese character information base, training an autoencoder network, taking the output of the intermediate layer of the autoencoder as the feature, extracting the feature from each Chinese character font image to obtain a feature matrix, clustering the Chinese character features using the k-means algorithm, and, after clustering is complete, obtaining a binary Chinese character code for every Chinese character in the Chinese character information base.
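The first two encoding steps listed in claim 4 can be sketched directly; the autoencoder and k-means step is not reproduced here. The sketch makes several assumptions: font images are grayscale nested lists, pixels at or above a threshold of 128 map to 1, and the function names and the tiny 2×2 glyph are hypothetical.

```python
from typing import Dict, List, Sequence


def flatten_glyph_code(glyph: Sequence[Sequence[int]],
                       threshold: int = 128) -> List[int]:
    """Binarize a grayscale glyph image and flatten it row by row into a code."""
    return [1 if pixel >= threshold else 0 for row in glyph for pixel in row]


def one_hot_codes(characters: Sequence[str]) -> Dict[str, List[int]]:
    """Give each character a code whose length equals the number of categories."""
    count = len(characters)
    return {
        ch: [1 if i == j else 0 for j in range(count)]
        for i, ch in enumerate(characters)
    }


# Hypothetical 2x2 grayscale glyph; real font images would be far larger.
glyph = [
    [0, 255],
    [200, 10],
]
glyph_code = flatten_glyph_code(glyph)

codes = one_hot_codes(["一", "二", "三"])
print(glyph_code, codes["二"])
```

The flattened code grows with the glyph resolution, whereas the one-hot code grows with the number of characters, which is the trade-off that a shorter clustered code is meant to address.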

5. A Chinese character recognition apparatus, comprising:

an acquisition unit, configured to acquire a Chinese character image to be identified;

an extraction unit, configured to send the Chinese character image to be identified into a preset image segmentation model to obtain a Chinese character content area, compute the connected regions of the Chinese character content area to obtain single Chinese character content areas, and extract a feature vector group of each single Chinese character content area, wherein the dimension of each feature vector in the feature vector group is equal to the length of the Chinese character codes in a Chinese character information base, and each element of a feature vector is the probability value for the corresponding position in the code;

a determining unit, configured to, for each feature vector group extracted by the extraction unit, convert the non-zero elements in the feature vector group into 0 or 1 according to a preset rule to obtain a new feature vector group, and, for each feature vector in the new feature vector group, determine, from the codes corresponding to the Chinese characters in the Chinese character information base, a target code having the highest similarity to that feature vector;

and a recognition unit, configured to determine the Chinese character corresponding to the target code determined by the determining unit as the Chinese character recognition result.

6. The apparatus of claim 5, wherein the preset image segmentation model is constructed by a model training apparatus according to the codes corresponding to the Chinese characters in the Chinese character information base, and wherein the model training apparatus comprises:

an acquisition unit, configured to acquire a plurality of training sample images;

an extraction unit, configured to input the plurality of training sample images acquired by the acquisition unit into an initial image segmentation network model to obtain an output result of the initial image segmentation network model, wherein the number of convolution kernels of at least one layer in the initial image segmentation network model is determined according to the length of the code, the at least one layer includes the last layer, and the length of the code is equal to the dimension of the feature vector;

and a training unit, configured to train the initial image segmentation network model according to a result of comparing the output result obtained by the extraction unit with marking data to obtain the preset image segmentation network model, wherein the marking data are the correctly identified feature vector groups corresponding respectively to the training sample images.

7. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processing device, carries out the steps of the method according to any one of claims 1-6.

8. An electronic device, comprising:

a storage device having a computer program stored thereon;

and a processing device for executing the computer program in the storage device to carry out the steps of the method according to any one of claims 1-6.