CN112434699A - Automatic extraction and intelligent scoring system for handwritten Chinese characters or components and strokes - Google Patents


Info

Publication number
CN112434699A
CN112434699A (application CN202011336351.1A)
Authority
CN
China
Prior art keywords
image
chinese character
characters
unit
chinese characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011336351.1A
Other languages
Chinese (zh)
Inventor
吕福成
张月霞
王凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Liupin Cultural Creativity Co ltd
Original Assignee
Hangzhou Liupin Cultural Creativity Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Liupin Cultural Creativity Co., Ltd.
Priority to CN202011336351.1A
Publication of CN112434699A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration by the use of local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/155 Segmentation; Edge detection involving morphological operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image

Abstract

The invention discloses an automatic extraction and intelligent scoring system for handwritten Chinese characters, or their components and strokes, comprising four modules: automatic extraction of the practice-book image, automatic extraction of single Chinese characters, Chinese character recognition, and Chinese character scoring. The practice-book extraction module preprocesses the uploaded photo of handwritten characters, then captures, rectifies, error-checks, and layout-classifies the image; this complete workflow improves the accuracy and degree of automation of image extraction. The single-character extraction module includes processing designed specifically for pencil writing and extracts pencil-character images reliably. The deep-learning recognition model quickly recognizes handwritten Chinese characters, including blank cells. Scoring combines structure and content: the structure term is based on the width and height of the character, and the content term on cosine similarity. The system can automatically extract and score casually uploaded photos of handwriting, making it suitable for young students just learning Chinese characters and helpful for the daily practice of calligraphy enthusiasts.

Description

Automatic extraction and intelligent scoring system for handwritten Chinese characters or components and strokes
Technical Field
The invention relates to an automatic extraction and intelligent scoring system for handwritten Chinese characters or their components and strokes, in particular one based on ordinary calligraphy practice books, and belongs to the technical field of artificial-intelligence character image processing.
Background
Writing Chinese characters is a tradition of Chinese culture with a long history, and effectively judging the correctness of written characters is a difficult problem in fields such as Chinese character proficiency testing and writing instruction. On one hand, education authorities, teachers, and parents pay ever more attention to calligraphy education: writing Chinese characters well is a basic literacy skill of every Chinese person and carries on an excellent cultural tradition. On the other hand, with the spread of electronic devices, more and more people find it hard to write a beautiful, standard character. When practicing calligraphy, students usually imitate copybooks and improve gradually; this conventional method achieves reasonable results, but it still has problems: 1. the writing is evaluated by teachers, and different teachers apply different standards (even the same teacher may judge differently at different times), so the evaluation is highly subjective; 2. quantitative and visual evaluation methods are lacking: writers have only a subjective impression of what they have written, find it hard to assess their own level over time, easily lose the motivation to keep practicing, and gradually lose interest.
Existing patents on Chinese character writing-practice systems generally have the following defects:
1. Existing patents avoid or oversimplify the automatic extraction of the user's practice-book image and of single Chinese characters. First, the automatic image extraction in some patents is weak: if the uploaded practice-book photo has a varying background, an odd shooting angle, interfering objects, or strong or weak lighting, effective automatic extraction is difficult. "An artificial intelligence scoring system and method for copybook calligraphy practice" (201910427401.8) captures the practice-book image by simple straight-line detection and requires the photo background to be a non-interfering solid color clearly distinct from the practice book; "A handwriting scoring system and scoring method thereof" (201510565555.5) places three anchor points on the user's camera interface that must be aimed at fixed points on the practice book, which makes the system harder to use. Second, some patents replace automatic extraction of the practice-book image with positioning by external hardware, e.g. "An intelligent evaluation method for Chinese character writing quality" (202010433699.6), "A digital writing exercise method and system" (201710762777.5), "An intelligent writing scoring system and method" (201710032399.5), "A writing exercise system and method" (201710404401.7), and "A writing exercise scoring method and writing exercise device" (201610957317.3); this raises equipment cost and noticeably degrades the user experience. Third, a single fixed extraction mode also strongly restricts the choice of paper and layout of the practice book, again preventing a good user experience.
Fourth, the automatic single-character extraction in most patents is simplistic, aimed mainly at characters written with a water-writing brush; extraction of pencil characters is rarely mentioned. When a user practices with a pencil, the lack of a good automatic character-extraction unit means the grid frame cannot be removed cleanly, the display quality is poor, and subsequent processing is disrupted; meanwhile, shadows blur the binarized character, thinning or erasing some strokes, which greatly challenges the stability of the scoring result.
2. Chinese character scoring in existing patents is not intelligent enough: a recognition module based on currently popular artificial-intelligence techniques, i.e. one that recognizes handwritten characters, components, and strokes with high accuracy, is missing. Some patents require the user to state manually which character is to be scored; for example, "A character evaluation method and device" (201310488191.6) requires the user to input an evaluation instruction identifying the character. Other patents have no character recognition at all and can only score a specific practice book; for example, in "An artificial intelligence scoring system and method for copybook calligraphy practice" (201910427401.8) the practice-book layout is fixed, and the first character of each line serves as the standard against which the following handwritten characters are compared and scored.
3. The scoring modules of earlier patents mostly evaluate characters by simple overlap with the standard character. This approach has not been validated against professional calligraphy assessment, lacks scientific grounding, and cannot reflect how the overall character shape affects perceived quality. Patents such as "A handwriting scoring system and scoring method thereof" (201510565555.5) and "An artificial intelligence scoring system and method for copybook calligraphy practice" (201910427401.8) judge quality only by the degree of coincidence between the handwriting and the standard character; the simple area-overlap measure adopted in published patents CN201510565555, CN201911199560, and CN201910427401 cannot capture the similarity of the overall character shapes.
Disclosure of Invention
Addressing the defects above, the invention provides an automatic extraction and intelligent scoring system for handwritten Chinese characters or their components and strokes, comprising: an automatic practice-book image extraction module, an automatic single-character extraction module, a Chinese character recognition module, and a Chinese character scoring module. When the user uploads a practice-book photo taken casually, the system automatically weakens the influence of lighting, paper differences, and printing on extraction of the inner-frame edges and extracts the practice-book image quickly. Automatic, high-accuracy inner-frame extraction relies mainly on the multiple image-preprocessing methods included in the invention: driven by feedback from the error-reporting unit, the methods are tried one by one until one extracts the practice book successfully; if none succeeds, the user is informed that the uploaded photo cannot be recognized and is shown common causes and solutions, so that the photo can be adjusted and re-uploaded. The single-character extraction module includes units for shadow removal, a special mask, and dynamic binarization, which extract single characters effectively and stably, making the subsequent recognition and scoring more robust. The deep-learning Chinese character recognition module lets the system recognize the characters, components, and strokes covered by the teaching material, guaranteeing that any character from the material written in a practice cell can be scored.
The structure-plus-content scoring is consistent with the assessments of professional calligraphy teachers, which supports the scientific validity of the system's evaluation. The system not only scores pencil writing intelligently but also displays the writing quality visually, improving the user experience and practicality.
The technical scheme of the invention is as follows. The automatic extraction and intelligent scoring system for handwritten Chinese characters, components, and strokes includes: an automatic practice-book image extraction module, an automatic single-character extraction module, a Chinese character recognition module, and a Chinese character scoring module. The practice-book extraction module preprocesses the uploaded photo of handwritten characters, captures and rectifies the image, and performs error reporting and layout classification. The single-character extraction module obtains clear handwritten characters from the extracted image and converts each character cut out with its Mi grid into a binary character cropped to its minimal bounding rectangle. The recognition module recognizes handwritten characters, components, strokes, and blank cells; when the written character is not in the standard character library, a "character not recognized" message is produced. The scoring module compares each handwritten character obtained by the recognition module with the corresponding standard character in the standard library and produces an assessment.
The main features and implementation of each module are as follows:
1. The automatic practice-book image extraction module includes: a photo preprocessing unit, an image capture unit, a rectification unit, an error-reporting unit, and a layout-classification unit.
(1) The photo preprocessing unit weakens the influence of lighting, paper differences, and print-quality differences on extraction of the inner-frame edges; driven by the error-reporting mechanism in the layout-classification module, the preprocessing modes are tried one by one. The modes are:
direct grayscale conversion;
grayscale conversion plus grayscale enhancement;
use of the photo's blue color channel;
use of the blue color channel plus grayscale enhancement;
color extraction with an HSV color model;
color extraction with an HSV color model plus frame enhancement by the Hough transform.
(2) The image capture unit captures and describes the inner frame correctly using Canny edge detection, contour extraction, quadrilateral fitting, and outer-border filtering.
(3) The rectification unit applies a perspective transform to the captured practice-book image when the user's photo is taken at an angle. Specifically, the four vertices of the fitted quadrilateral output by the image capture unit are used as the input of the perspective transform, which rectifies the image: the user may photograph at any angle, and the image is finally corrected to a front view.
(4) The error-reporting unit drives the extraction module to try the different photo-preprocessing methods and, when the user takes a non-standard photo or photographs something other than the specified practice book, returns feedback explaining the upload error so that the user can consult common solutions.
The checks performed by the error-reporting unit are: whether the aspect ratio of the rectified image is normal, whether the extracted area is of normal size, and whether the expected feature exists at a fixed position. The rules are set as follows:
an aspect ratio in the range 1.1-1.8 is normal;
a ratio of extracted area to total photo area greater than 0.25 is normal;
taking the area of a particular character in the layout title as the reference feature, the photo is normal if a region of similar size appears at the specified position.
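The three checks can be expressed as a small validation routine. The 1.1-1.8 and 0.25 thresholds come from the text; the 50% feature tolerance and the function signature are illustrative assumptions.

```python
def check_rectified(rect_shape, photo_shape, feature_area, feature_ref_area,
                    tol=0.5):
    """Run the three error-report checks; return a list of human-readable
    problems (an empty list means the photo is accepted)."""
    errors = []
    h, w = rect_shape[:2]
    aspect = max(h, w) / min(h, w)
    if not 1.1 <= aspect <= 1.8:
        errors.append("aspect ratio %.2f outside 1.1-1.8" % aspect)
    area_ratio = (h * w) / (photo_shape[0] * photo_shape[1])
    if area_ratio <= 0.25:
        errors.append("extracted region is only %.0f%% of the photo"
                      % (100 * area_ratio))
    # feature check: compare the measured area at the fixed position with
    # the reference area of the title character (tolerance is assumed)
    if abs(feature_area - feature_ref_area) > tol * feature_ref_area:
        errors.append("title-character feature missing at fixed position")
    return errors
```

When the list is non-empty, the extraction module would move on to the next preprocessing mode, or report the errors to the user once all modes are exhausted.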
(5) The layout-classification unit determines which layout of worksheet the user uploaded and then calls the corresponding image-cutting unit.
The layout classification works as follows: at least three layouts are predefined, numbered layout 1, layout 2, and layout 3, together with an algorithm distinguishing layout 3 from the other layouts and an algorithm distinguishing layout 1 from layout 2. When the user uploads a photo of any layout, the system first tests whether it is layout 3; if so, the next stage begins, and if not, it further decides between layout 1 and layout 2. The two algorithms are implemented as follows:
Algorithm 1 distinguishes layout 3 from the other layouts. The frame-capture algorithm for layouts 1 and 2 is run in series with the frame-capture algorithm for layout 3, and the layout is decided by how many frames are extracted:
if the uploaded page contains two frames, the first algorithm detects it;
if the uploaded page contains one frame, the first algorithm reports an error and the second algorithm detects it.
Algorithm 2 distinguishes layout 1 from layout 2 by detecting tiny line segments with the Hough transform. The Hough implementation in the open-source image library commonly used in engineering (OpenCV) cannot detect such tiny segments, so the method splices the segments until they reach a length the existing method can detect, thereby classifying the layout. Specifically, two strip regions about 40 pixels wide are cut from fixed positions in the rectified image that differ between the layouts; each strip is tiled 5 times into a 200-pixel line, and the Hough transform then tests whether a straight line exists in the spliced image. This contrasts with the conventional idea of distinguishing the layouts by directly comparing the total line lengths at the two positions.
2. The automatic single-character extraction module includes: a pre-cut preprocessing unit, an image-cutting unit, a mask unit, a dynamic binarization unit, and a minimal-bounding-rectangle unit.
(1) The pre-cut preprocessing unit removes the shadow in the practice-book image and enhances its contrast. The basic idea of shadow removal is to isolate the image background by morphological dilation followed by median filtering, and then take the difference between the original image and the background image to obtain an image with the background (including the shadow) removed; because of the difference operation, the background becomes black and the characters white. The stripped image is then subtracted from the 255 gray level, turning the originally black background white and the originally white characters black again, so that the colors match the original image. Finally, the image is normalized to the 0-255 range: the maximum pixel value of the processed image is set to 255 and the minimum to 0, and the pixel distribution is stretched across 0-255, which enhances the contrast.
(2) Because the image-extraction module has already classified the layout, every type of practice book can easily be cut into individual character cells according to that layout's typesetting. In practice the rectified image is scaled to a fixed size before cutting, and the cutting stride can be adjusted to the achieved rectification quality to reduce the cutting error.
(3) The mask unit handles single-character extraction so that the grid frame is removed while erasing as few character strokes as possible. The method is as follows: a special mask image is constructed by drawing a circle centred on the cell, then a square whose side equals the circle's radius at the circle's lower right, with the square's top-left corner coincident with the circle's centre; the cell image inside this circle-plus-square mask is kept unchanged, and everything outside the mask is set to white. This removes the influence of the Mi-grid frame while ensuring that, even when the cell is cut from the practice-book image with some offset, the lower-right part of the cell is preserved (the lower right is kept because the final press-down stroke of a character often extends long in that direction).
(4) The dynamic binarization unit extracts the handwritten character inside the cell. The basic idea of binarization is to find an optimal threshold, exploiting the fact that the handwritten character in the cell image is black and the background white, and to set pixels below the threshold to 0 and pixels above it to 255, thereby binarizing the image and separating the character from the background. Threshold selection is usually static, i.e. a fixed value is used; because image quality inevitably fluctuates, this easily yields poor extraction for some characters. Dynamic threshold segmentation instead computes a different optimal threshold from the brightness distribution of each cell image, extracting clearer characters. Specifically, the invention uses dynamic threshold segmentation with a mask: the mask covers the useless image region, the dynamic threshold is computed only from the brightness distribution of the useful region, and the threshold itself is computed with the classical Otsu method.
(5) The minimal-bounding-rectangle unit applies the Canny operator to the threshold-segmented binary image for edge and contour extraction, filters out noise contours by contour area and contour point count, and finally selects coordinates from the remaining contour points: in the plane of the extracted contours, (x_min, y_min) and (x_max, y_max) are taken as the two diagonal corners of the minimal bounding rectangle, where x_min, y_min are the x and y coordinates of its lower-left corner and x_max, y_max those of its upper-right corner.
3. The Chinese character recognition module includes: a pre-recognition image preprocessing unit, a trained convolutional-neural-network model unit, and a prediction-result processing unit.
(1) The image preprocessing unit uniformly preprocesses the input image for the convolutional neural network: the image is scaled to 299 x 299, and the pixel values in the 0-255 range are normalized to 0-1 for convenient computation.
(2) The trained convolutional-neural-network model unit holds the trained model parameters; it uses the InceptionV3 network structure proposed by Google together with the model parameters trained by the invention. The model training process is as follows:
preparation of data set: at least 5 ten thousand handwritten Chinese character data sets are produced, at least 5 thousand test sets are produced, a classification network is adopted for training, the Chinese characters are classified into 473 types, the number of the Chinese characters matched with the teaching material is consistent with that of the components, and the Chinese characters contain blank spaces. It should be noted that the data set will gradually increase with the development of the calligraphy course, and will eventually cover most of the commonly used chinese characters.
Model structure comparison: the effect of three network structures, VGG, InceptionV3, and DenseNet, on handwritten-character recognition was compared; under the same conditions the VGG network had the lowest recognition accuracy, DenseNet the second highest, and InceptionV3 the highest.
In the study of data augmentation during training, 5 augmentation modes were tried:
no augmentation;
brightness change, flipping, and random cropping;
brightness change and random cropping;
brightness change, small-angle rotation, and random cropping;
brightness change, small-angle rotation, and enlargement/reduction.
Of these 5 modes, the preferred one is brightness change plus random cropping; its beneficial effect is that the model trained with this augmentation attains the best accuracy and generalization.
(3) The prediction-result processing unit ensures that only recognized characters are passed to the scoring module. Because blank cells were included in training, the recognition model can also recognize the various blank grid types such as the Mi grid and the Tian (field) grid. In actual use three results are output: unrecognized character, blank, and recognized character, where "unrecognized" means the model's highest predicted probability is below 35%.
At the same time it is specified that only recognized characters enter the scoring module; recognized blanks and unrecognized characters do not.
4. Driven by the recognition result, the Chinese character scoring module takes as input the prepared minimal-bounding-rectangle binary standard character and the minimal-bounding-rectangle binary handwritten character. It comprises: a similarity calculation unit, an evaluation unit, a judgment unit, and an evaluation-result adjustment unit.
(1) The similarity calculation unit includes: a structural similarity mechanism, which evaluates the structure of the handwritten character as the product of the ratio of the widths of the handwritten and standard characters' bounding rectangles and the ratio of their heights; and a content similarity mechanism, which evaluates the content similarity between the handwritten and standard characters after the structural evaluation.
The specific implementation scheme is as follows: scaling the handwritten Chinese characters to a size similar to the standard Chinese characters in equal proportion, namely: the height or width between the handwritten Chinese character and the standard Chinese character is the same as that of the standard Chinese character, then the two pictures are respectively added into a background frame with the same size, and the handwritten Chinese character and the standard Chinese character are respectively arranged in the center of the background frame. The cosine similarity is adopted to calculate the content similarity, and the cosine similarity method is to expand a two-dimensional matrix of the picture into one-dimensional vectors and then calculate cosine values of the two vectors. And finally, multiplying the cosine value by the numerical value of the structural evaluation to obtain the comprehensive similarity score of the Chinese character.
It should be noted that the final comprehensive-similarity calculation method was settled on after a series of selections and comparisons, which ruled out alternatives such as weighted combinations or further processing of the structure evaluation index. The selection criterion was agreement with the evaluations of professional calligraphy teachers.
The selection procedure is: a professional calligraphy teacher sorts a batch of samples into three grades ("good", "medium", "poor"); a candidate comprehensive-similarity method is then applied to every character. If any "poor" character scores higher than a "good" character, the calculation method is adjusted, and this repeats until a method is found whose results are consistent with the teacher's evaluation.
(2) The evaluation unit produces the final three-grade result: "good", "medium" or "poor". It contains a computed threshold table with two thresholds per character: a handwritten character whose comprehensive similarity is above the larger threshold is graded "good"; one below the smaller threshold is graded "poor"; anything in between is graded "medium". The threshold table is derived from the comprehensive similarity values calculated on the "good", "medium" and "poor" samples.
(3) The judgment unit counts the proportions of "good", "medium" and "poor" characters in a picture uploaded by the user; when the proportion of good or poor characters is abnormally low or high, the evaluation result is adjusted.
(4) The evaluation result adjustment unit fine-tunes the similarity thresholds of the standard samples so that some "poor" characters become "medium" and some "medium" characters become "good", encouraging students in their calligraphy practice. To avoid distorting the overall picture of a student's writing, only a single fine adjustment is made, and a "poor" character is never promoted to "good". Along with the score, transparent images of the handwritten character and the standard character are output and overlaid, so the user can see directly how well each part was written and practise the parts that fall short.
The key points of the technical scheme of the invention are as follows:
1. Automatic extraction of the calligraphy practice book image. The image extraction module integrates several techniques, including RGB colour-channel selection, grayscale enhancement, colour extraction in HSV space, and the combination of colour extraction with straight-line detection; together with the invention's error-reporting feedback mechanism, it automatically tries different frame-capture algorithms until the best capture is found. Compared with patent CN201510565555, where the user must align three fixed points on the photographing interface with fixed points on the practice book to obtain an image, this method is more automatic; compared with patent CN201910427401, which captures the practice book image by simple line detection alone, the diversified choice of algorithms here is more accurate, captures images more intelligently, and places less strict demands on the photographing environment and phone performance.
2. Prior patents do not address the problem of layout discrimination; this method discriminates the different layouts well through line-segment splicing and algorithm chaining, giving it a wider range of application.
3. To deal with the light colour of pencil strokes and with shadows, residual grid frames and unclear characters during binarization, the method adds a shadow-removal algorithm, removes the grid frame by applying a special mask before binarization, and then extracts the characters cleanly with masked dynamic binarization.
4. The invention builds an intelligent recognition model that effectively recognizes Chinese characters and components. It uses Inception V3 as the convolutional neural network structure, applies random cropping and brightness changes for image enhancement during training, and its data set contains at least 50,000 pictures.
5. The similarity score follows a comprehensive structure-plus-content scheme: the structure term is based on the character's width and height, the content term is given by cosine similarity, and the formula found to work best in practice is:
$$\text{comprehensive similarity} = \frac{W_h}{W_s}\times\frac{H_h}{H_s}\times\cos\theta$$
In the formula, $W_h$ and $W_s$ are the widths of the handwritten-character image and the standard-character image under their circumscribed rectangles, and $H_h$ and $H_s$ are the corresponding heights. For the cosine similarity, the matrices of the two images are flattened into two one-dimensional vectors $\vec{A}$ and $\vec{B}$. For example, two 100 × 100 images each become 1 × 10000 vectors after flattening; the cosine of the angle $\theta$ between the two vectors is then

$$\cos\theta=\frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}=\frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}}$$
It should be noted that cosine similarity is defined only when the two vectors have the same dimension, i.e. the two images must be the same size, whereas a handwritten character and the standard character generally differ in width and height under their circumscribed rectangles. The measure adopted is: first scale the handwritten character to a size similar to the standard character (so that its height or width matches the standard character), then place the scaled handwritten image and the standard image each at the centre of a background image of identical size, producing two new images of equal size; the cosine similarity of these two new images then represents the content similarity of the handwritten and standard characters. Scaling preserves the overall glyph features (the content features mentioned above), and placing both characters on equal backgrounds lets cosine similarity evaluate the difference in content well. The final comprehensive similarity is the product of the structural similarity computed first and this cosine similarity.
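The whole scale-centre-cosine pipeline can be sketched in NumPy. This is an illustrative stand-in rather than the patent's implementation: it uses nearest-neighbour scaling to the standard character's exact size (the patent matches only one dimension, preserving aspect ratio), and the canvas size and sample images are invented.

```python
import numpy as np

def scale_to(img, h, w):
    """Nearest-neighbour resize (stand-in for a proper resampler)."""
    ys = np.arange(h) * img.shape[0] // h
    xs = np.arange(w) * img.shape[1] // w
    return img[ys][:, xs]

def center_on_canvas(img, size):
    """Place a white-background character image at the centre of a white canvas."""
    canvas = np.full((size, size), 255, dtype=np.uint8)
    h, w = img.shape
    y, x = (size - h) // 2, (size - w) // 2
    canvas[y:y + h, x:x + w] = img
    return canvas

def comprehensive_similarity(hand, std, canvas=128):
    """(Wh/Ws) * (Hh/Hs) * cos(theta), per the formula above."""
    structure = (hand.shape[1] / std.shape[1]) * (hand.shape[0] / std.shape[0])
    scaled = scale_to(hand, *std.shape)
    a = center_on_canvas(scaled, canvas).astype(float).ravel()
    b = center_on_canvas(std, canvas).astype(float).ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return structure * cos

# A 40x40 "standard character": a black block on a white background
std = np.full((40, 40), 255, dtype=np.uint8)
std[10:30, 10:30] = 0
```

An identical pair scores 1.0; a half-size handwritten copy is capped at 0.25 by the structure term, matching the formula's behaviour.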
Compared with the prior art, the invention has the advantages and beneficial effects that:
1. Achieves automatic extraction from photos the user uploads at will.
2. The Chinese character recognition module applies currently popular deep learning techniques.
3. The technical scheme not only replaces the overlap-ratio algorithm with cosine similarity but also accounts for the influence of structure on the score, and selects the best comprehensive scoring method, making the score more accurate and more consistent with professional calligraphy teachers' evaluations.
4. The Chinese character scoring algorithm is optimized to match calligraphic evaluation standards, while giving the user an intuitive view of how well each character is written.
5. The system scores not only whole Chinese characters but also written components.
Drawings
FIG. 1 is a block diagram of the structure of an embodiment of an automatic extraction and intelligent scoring system for handwritten Chinese characters or radicals and strokes of the present invention;
FIG. 2 is a detailed flow chart of the automatic extraction of the exercise book image;
FIG. 3 is an effect diagram of an automatic extraction module for three types of layout exercise book images;
FIG. 4 is a flow chart of the automatic extraction module for a single Chinese character, comprising: preprocessing before cutting, image cutting, adding a mask, threshold segmentation, and obtaining the minimum circumscribed rectangle;
fig. 5 is an effect diagram after an automatic extraction module for a single chinese character, which sequentially shows from left to right: the Chinese characters after cutting, the Chinese characters after adding the mask plate, the Chinese characters after threshold value segmentation, and the minimum external rectangular Chinese characters;
FIG. 6 is a flow chart of the Chinese character recognition module in the process of recognizing Chinese characters, and only recognized Chinese characters will be scored;
FIG. 7 is a flow chart of the Chinese character scoring module, showing from left to right: similarity calculation, evaluation, judgment, and evaluation result adjustment;
fig. 8 shows the overall evaluation effect presented in an actual evaluation by an embodiment of the system for automatic extraction and intelligent scoring of handwritten Chinese characters or components and strokes. The leftmost photo contains 96 Chinese characters uploaded by a user; every character or stroke is extracted in full to the phone screen on the right. A character graded "good" (well-written structure and strokes) receives a crown mark and points; "medium" and "poor" characters receive no mark or points, and the comment shown after tapping each character differs accordingly. Of the four small images at the corners of the large right-hand image, the two on the left show the comments and overlay effect for two "good" characters, and the two on the right show the comments and overlay effect for two "medium" characters.
Detailed Description
The embodiments of the invention are further described below with reference to the accompanying drawings. The embodiments are illustrative rather than restrictive; any modification, equivalent or improvement made within the spirit and principles of the invention falls within the scope of the claims, and technical solutions not described in detail are known in the art.
Referring to fig. 1-8, the automatic extraction and intelligent scoring system for handwritten Chinese characters or components and strokes comprises: an automatic calligraphy-practice-book image extraction module, an automatic single-character extraction module, a Chinese character recognition module and a Chinese character scoring module. The practice-book image extraction module preprocesses the uploaded photo of handwritten characters, captures the image, corrects it, reports errors and judges the layout. The single-character extraction module cuts the image and enhances contrast, processes each cut character cell (with its mi-grid) into a minimum-circumscribed-rectangle binary character, and feeds it to the recognition module. The recognition module recognizes handwritten characters, components, strokes and blanks; when a character written by the user is not in the standard character library, it prompts that the character cannot be recognized; otherwise it retrieves the standard character corresponding to the handwritten one and passes both to the scoring module. The scoring module compares the handwritten character obtained from the recognition module with the same standard character in the standard library and produces the evaluation result.
The main characteristics and the realization method of each module of the system are as follows:
1. The automatic calligraphy-practice-book image extraction module includes: a picture preprocessing unit, an image capturing unit, a correction unit, an error-reporting mechanism unit and a layout judgment unit.
(1) The picture preprocessing unit weakens the influence of lighting, paper differences and print-quality differences on extracting the frame edges in the practice-book image. Driven by the error-reporting mechanism in the layout judgment module, the preprocessing modes are tried one by one:
direct graying;
graying plus grayscale enhancement;
use of the photo's blue colour channel;
use of the photo's blue colour channel plus grayscale enhancement;
colour extraction with the HSV colour model;
colour extraction with the HSV colour model plus frame enhancement via the Hough transform.
(2) The image capturing unit captures and describes the inner frame correctly through Canny edge detection, contour extraction, quadrilateral fitting and outer-frame filtering.
(3) The correction unit applies a perspective transformation to the captured practice-book image when the user's photo was taken at an angle. Specifically, the four vertices of the fitted quadrilateral output by the image capturing unit are used as the input of the perspective transformation, rectifying the image. The user can therefore shoot from any angle, and the picture is finally corrected to a front view.
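As a minimal sketch, the perspective transformation can be computed from the four fitted vertices by solving the standard eight-unknown homography system. This NumPy illustration shows what a library routine such as OpenCV's `getPerspectiveTransform` computes internally; the corner coordinates below are invented.

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 homography H that maps 4 source points to 4
    destination points (the core of a perspective transformation)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply the homography to one point (homogeneous coordinates)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Four corners of a skewed quadrilateral from the image capturing unit,
# mapped to an upright 300 x 200 rectangle (the front view).
src = [(10, 12), (310, 30), (300, 215), (5, 200)]
dst = [(0, 0), (300, 0), (300, 200), (0, 200)]
H = perspective_matrix(src, dst)
```

Warping every pixel with `H` (what `cv2.warpPerspective` does) then yields the rectified front view.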
(4) The error-reporting mechanism unit drives the extraction module to call the different photo-preprocessing methods in turn, and also gives the user feedback that the uploaded photo is wrong when the photo is non-standard or was not taken of the specified exercise book, so the user can consult common error solutions.
The error content output by the error-reporting mechanism unit covers: whether the aspect ratio of the corrected image is normal, whether the size of the extracted area is normal, and whether the fixed-position feature exists. The specific checks are:
an aspect ratio within the range 1.1-1.8 is normal;
a ratio of extracted area to whole-picture area above 0.25 is normal;
the region of the character for "day" (日) in the layout title is extracted as a feature, and the extracted user image is normal when an area of similar size exists at the specified position.
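The three checks above can be folded into one small validation helper. A sketch; the function name and message strings are illustrative, and only the numeric criteria come from the text above.

```python
def check_extraction(width, height, region_area, image_area, has_title_feature):
    """Run the three error-reporting checks on a corrected capture.
    Returns a list of error messages; an empty list means 'normal'."""
    errors = []
    aspect = max(width, height) / min(width, height)
    if not (1.1 <= aspect <= 1.8):
        errors.append("aspect ratio outside the normal 1.1-1.8 range")
    if region_area / image_area <= 0.25:
        errors.append("extracted region is too small a share of the photo")
    if not has_title_feature:
        errors.append("fixed-position title feature not found")
    return errors
```

A non-empty return would trigger the next preprocessing mode or, after all modes fail, the user-facing error prompt.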
(5) The layout judgment unit determines which layout of worksheet the user uploaded and then calls the corresponding image cutting unit.
As shown in fig. 3, the three layouts are numbered layout 1, layout 2 and layout 3 in sequence. The basic idea is to design one algorithm that distinguishes layout 3 from the other two, and a second algorithm that distinguishes layout 1 from layout 2. Whatever layout the user uploads, first judge whether it is layout 3; if so, move to the next step; if not, further judge whether it is layout 1 or layout 2. The two algorithms are implemented as follows:
Algorithm 1 distinguishes layout 3 from the others. The image-capturing algorithm for layouts 1 and 2 is chained in series with the capturing algorithm for layout 3, and the layout is judged from the number of frames extracted:
if the uploaded layout contains two frames, the first algorithm detects it;
if the uploaded layout contains one frame, the first algorithm reports an error and the second algorithm succeeds.
Algorithm 2 distinguishes layout 1 from layout 2 by detecting tiny line segments with the Hough transform. The Hough implementation in the open-source image library commonly used in engineering (OpenCV) does not support detecting such tiny segments, so the method splices them until they reach a length the existing routine can detect, thereby judging the layout. Specifically, two straight-line regions about 40 pixels wide are cut from fixed positions of the corrected image that differ between the two layouts; each region is spliced five times into a 200-pixel line, and a Hough transform then checks whether a straight line exists in the spliced image. This contrasts with the conventional idea of distinguishing layouts by directly comparing the total length of the lines in the two layouts.
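The splicing trick itself is tiny. Here is a simplified one-dimensional NumPy sketch: a strip cut at a fixed position is tiled five times, so a roughly 40-pixel segment becomes a 200-pixel line that a standard Hough routine (e.g. OpenCV's `HoughLinesP`) can then detect. Position and sizes are illustrative.

```python
import numpy as np

def splice_strip(image, row, col, strip_w=40, repeats=5):
    """Cut a strip of width strip_w at a fixed position and tile it
    `repeats` times, turning a tiny segment into a long straight line."""
    strip = image[row, col:col + strip_w]
    return np.tile(strip, repeats)

img = np.zeros((100, 100), dtype=np.uint8)
img[50, 30:70] = 255              # a short 40-pixel horizontal segment
line = splice_strip(img, 50, 30)  # now 200 pixels long
```

If the expected segment is absent at that position (the other layout), the spliced line is empty and the Hough check fails, which is exactly the discrimination signal.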
2. The automatic single-character extraction module includes: a pre-cutting preprocessing unit, an image cutting unit, a mask-adding unit, a threshold segmentation unit and a minimum-circumscribed-rectangle unit.
(1) The pre-cutting preprocessing unit removes shadows from the practice-book image and enhances its contrast. The basic idea of shadow removal is to separate the image background by morphological dilation and median filtering, then subtract the background image from the original to obtain a background-free image (shadows belong to the background); after this difference operation the background becomes black and the characters white. The background-stripped image is then subtracted from gray level 255, turning the background back to white and the characters back to black, so the colours match the original. Finally the image is normalized to the 0-255 range: the maximum pixel value of the processed image is set to 255 and the minimum to 0, stretching the pixel distribution over the full range and thereby enhancing contrast.
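The background-subtraction idea can be sketched as follows. This is a rough stand-in, not the patent's code: the background estimate uses a naive local-maximum filter in place of proper morphological dilation plus median filtering, and the sample image is invented.

```python
import numpy as np

def estimate_background(gray, k=7):
    """Rough background estimate: a local-maximum filter over a k x k
    window (stand-in for dilation + median filtering), edge-padded."""
    p = k // 2
    padded = np.pad(gray, p, mode="edge")
    out = np.zeros_like(gray)
    h, w = gray.shape
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def remove_shadow(gray):
    """Background subtraction as described: diff against the background
    (strokes become white on black), invert back to dark strokes on a
    white page, then stretch the result over the full 0-255 range."""
    bg = estimate_background(gray)
    diff = bg.astype(int) - gray.astype(int)   # strokes bright, background ~0
    inv = 255 - diff                           # strokes dark, background ~255
    lo, hi = inv.min(), inv.max()
    return ((inv - lo) * 255 / max(hi - lo, 1)).astype(np.uint8)

# A cell with a shadowed left half and one dark stroke crossing both halves
gray = np.full((20, 20), 200, dtype=np.uint8)
gray[:, :10] = 150
gray[10, 5:15] = 30
out = remove_shadow(gray)
```

After processing, the stroke ends up near black and both background halves near white, regardless of the original shadow.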
(2) Because the practice-book image extraction module has already judged the layout, every type of worksheet can easily be cut into individual character cells according to that layout's typesetting. In practice the corrected image is scaled to a uniform size before cutting, and the cutting stride can be adjusted to the achieved correction quality to reduce cutting error.
(3) The mask-adding unit handles single-character extraction, removing the mi-grid frame while erasing as few character strokes as possible. The specific method: construct a special mask consisting of a circle drawn with the centre of the cell as its centre, plus a square at the circle's lower right whose side equals the radius and whose top-left corner coincides with the circle's centre. The part of the cell image inside the mask is kept unchanged, and the part outside turns white. This removes the influence of the mi-grid frame while ensuring that, even when the square cut from the practice-book image deviates a little, the lower-right part of the cell is preserved (it is kept because the final right-falling stroke of a character is often long).
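The circle-plus-square mask described above can be built directly from a coordinate grid. A minimal sketch; the cell size is illustrative.

```python
import numpy as np

def make_mask(size):
    """Build the special mask: a circle centred on the cell, plus a square
    at the circle's lower right with side equal to the radius and top-left
    corner at the centre, so long right-falling strokes survive."""
    r = size // 2
    yy, xx = np.mgrid[0:size, 0:size]
    circle = (xx - r) ** 2 + (yy - r) ** 2 <= r ** 2
    square = (xx >= r) & (yy >= r)       # side r, top-left at the centre
    return circle | square

def apply_mask(cell, mask):
    """Keep the cell image inside the mask; everything outside turns white."""
    out = cell.copy()
    out[~mask] = 255
    return out

mask = make_mask(100)
```

The mi-grid frame lies along the cell's edges and corners, which fall outside both the circle and the square, so it is wiped to white while central strokes and the lower-right tail are untouched.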
(4) The dynamic binarization unit extracts the character from the cell. Compared with static threshold segmentation, which sets one fixed threshold and turns pixels below it to 0 and above it to 255, dynamic threshold segmentation selects the best segmentation threshold for each specific cell image, so the character comes out clearly. Specifically, the method uses masked dynamic threshold segmentation: the mask covers the useless image area so that only the useful area enters the computation, and the dynamic threshold is computed by an operator based on the brightness distribution of the actual cell image.
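The patent does not name the threshold operator exactly, so this sketch substitutes Otsu's between-class-variance criterion, restricted to the unmasked pixels, as a plausible stand-in for a masked dynamic threshold.

```python
import numpy as np

def masked_dynamic_threshold(gray, mask):
    """Dynamic threshold computed only over the unmasked (useful) pixels,
    using Otsu's criterion as a stand-in for the patent's operator."""
    vals = gray[mask]
    best_t, best_var = 0, -1.0
    for t in range(1, 255):
        lo, hi = vals[vals < t], vals[vals >= t]
        if lo.size == 0 or hi.size == 0:
            continue
        w0, w1 = lo.size / vals.size, hi.size / vals.size
        var = w0 * w1 * (lo.mean() - hi.mean()) ** 2
        if var > best_var:
            best_var, best_t = var, t
    binary = np.where(gray < best_t, 0, 255).astype(np.uint8)
    binary[~mask] = 255                  # masked-out area stays white
    return best_t, binary

# A bright cell with one dark stroke; the mask here keeps everything
gray = np.full((10, 10), 220, dtype=np.uint8)
gray[5, 2:8] = 40
mask = np.ones((10, 10), dtype=bool)
t, binary = masked_dynamic_threshold(gray, mask)
```

Because the threshold is recomputed per cell, a faint pencil stroke on bright paper and a dark ink stroke on grey paper each get a suitable cut-off.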
(5) The minimum-circumscribed-rectangle unit works on the binary image obtained from threshold segmentation. A Canny operator performs edge extraction and contour extraction on the cut binary image, noise contours are filtered out by contour area and number of contour points, and contour-point coordinates are then selected: on the plane of the extracted contour, x_min, y_min and x_max, y_max are taken as the coordinates of the two diagonal corners of the minimum circumscribed rectangle, where x_min, y_min are the x and y coordinates of the rectangle's lower-left corner and x_max, y_max those of its upper-right corner.
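Once the binary image is in hand, the rectangle reduces to coordinate extremes. This sketch skips the Canny/contour-filtering step and takes the extremes of the foreground pixels directly; note that with image row indices growing downward, "lower-left"/"upper-right" follow the patent's plane convention rather than array order.

```python
import numpy as np

def min_bounding_rect(binary):
    """Return (x_min, y_min, x_max, y_max), the two diagonal corners of the
    smallest axis-aligned rectangle around the foreground (black) pixels
    of a binarised character image (0 = stroke, 255 = background)."""
    ys, xs = np.nonzero(binary == 0)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

img = np.full((50, 50), 255, dtype=np.uint8)
img[10:30, 5:40] = 0                 # a block standing in for a character
```

Cropping to this rectangle yields the minimum-circumscribed-rectangle binary character passed to the recognition and scoring modules.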
3. The Chinese character recognition module comprises: the device comprises an image preprocessing unit, a trained convolutional neural network model unit and a prediction result processing unit.
(1) The image preprocessing unit uniformly preprocesses the input images for the convolutional neural network model: the picture sizes are unified, and pixel values in the 0-255 range are normalized to 0-1 for convenient computation.
(2) The trained convolutional neural network model unit holds the trained model parameters and comprises: the Inception V3 network structure proposed by Google, and the trained model parameters provided by the invention. The specific model training process is as follows:
Data set preparation: a handwritten Chinese character data set of at least 50,000 images and a test set of at least 5,000 images were produced. Training uses a classification network with 473 classes, consistent with the number of characters and components matching the teaching material, and including blanks. It should be noted that the data set will grow gradually as the calligraphy course develops and will eventually cover most commonly used Chinese characters.
Model structure comparison: the influence of three network structures, VGG, Inception V3 and DenseNet, on handwritten-character recognition was compared. Under the same conditions, the VGG network had the lowest recognition accuracy, DenseNet the second highest, and Inception V3 the highest;
in the study of data enhancement during training, five enhancement modes were tried:
no enhancement;
brightness change, flipping and random cropping;
brightness change and random cropping;
brightness change, small-angle rotation and random cropping;
brightness change, small-angle rotation, and enlargement and reduction;
Among the above five modes, the preferred combination is brightness change plus random cropping; the beneficial effect is that this data enhancement mode yields the best accuracy and generalization for the trained model.
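The preferred pair, brightness change plus random cropping, can be sketched as follows; the brightness range, crop size and random seed are illustrative, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, out_size=96):
    """Brightness change followed by a random crop, the preferred
    enhancement pair; sizes and ranges are illustrative."""
    # brightness: scale pixel values by a random factor in [0.7, 1.3]
    factor = rng.uniform(0.7, 1.3)
    bright = np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)
    # random crop: take a random out_size x out_size window
    h, w = bright.shape
    y = rng.integers(0, h - out_size + 1)
    x = rng.integers(0, w - out_size + 1)
    return bright[y:y + out_size, x:x + out_size]

sample = rng.integers(0, 256, (112, 112), dtype=np.uint8)
patch = augment(sample)
```

Each training epoch thus sees a slightly brighter or darker, slightly shifted view of every character, which is what drives the improved generalization.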
(3) The prediction result processing unit ensures that only recognized Chinese characters are passed to the Chinese character scoring module. Because blank cells were included in model training, the recognition model can also recognize various blank types, such as empty mi-grid and tian-grid cells. In actual use, three results are output: unrecognized Chinese characters, blanks, and recognized Chinese characters, where an unrecognized Chinese character is one for which the model's predicted probability is below 35%.
It is further set that only recognized Chinese characters enter the Chinese character scoring module; recognized blanks and unrecognized Chinese characters do not.
4. According to the recognition result, the Chinese character scoring module takes as input the prepared minimum-circumscribed-rectangle binary standard character and the minimum-circumscribed-rectangle binary handwritten character. It comprises: a similarity calculation unit, an evaluation unit, a judgment unit and an evaluation result adjustment unit.
(1) The similarity calculation unit includes: a structure evaluation mechanism, which evaluates the structure of the handwritten character as the product of (width of the handwritten character's circumscribed rectangle / width of the standard character) and (height of the handwritten character's circumscribed rectangle / height of the standard character); and a content evaluation mechanism, which evaluates the content similarity between the handwritten character and the standard character once the structure evaluation is complete.
The specific implementation is: scale the handwritten character proportionally to a size similar to the standard character, i.e. until its height or width equals that of the standard character; then place each of the two images at the centre of a background frame of identical size. Content similarity is computed with cosine similarity: the two-dimensional image matrices are flattened into one-dimensional vectors and the cosine of the angle between them is calculated. Finally, this cosine value is multiplied by the structure evaluation value to obtain the character's comprehensive similarity score.
It should be noted that the final comprehensive-similarity calculation method was settled on after a series of selections and comparisons, which ruled out alternatives such as weighted combinations or further processing of the structure evaluation index. The selection criterion was agreement with the evaluations of professional calligraphy teachers.
The selection procedure is: a professional calligraphy teacher sorts a batch of samples into three grades ("good", "medium", "poor"); a candidate comprehensive-similarity method is then applied to every character. If any "poor" character scores higher than a "good" character, the calculation method is adjusted, and this repeats until a method is found whose results are consistent with the teacher's evaluation.
(2) The evaluation unit produces the final three-grade result: "good", "medium" or "poor". It contains a computed threshold table with two thresholds per character: a handwritten character whose comprehensive similarity is above the larger threshold is graded "good"; one below the smaller threshold is graded "poor"; anything in between is graded "medium". The threshold table is derived from the comprehensive similarity values calculated on the "good", "medium" and "poor" samples.
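The two-threshold grading reduces to a few lines; the character key and threshold values below are hypothetical, standing in for the computed per-character table.

```python
def grade(score, thresholds):
    """Map a comprehensive similarity score to 'good'/'medium'/'poor'
    using a character's (low, high) threshold pair."""
    low, high = thresholds
    if score > high:
        return "good"
    if score < low:
        return "poor"
    return "medium"

# Hypothetical threshold table keyed by character; the values are illustrative.
table = {"永": (0.45, 0.72)}
```

Per-character thresholds matter because structurally simple and complex characters reach very different raw similarity values for the same writing quality.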
(3) The judging unit is used for counting the proportions of "good", "medium" and "poor" characters in a picture uploaded by the user; when the proportion of good or poor characters is too low or too high, the evaluation result is adjusted.
(4) The evaluation result adjusting unit is used for turning some poor characters into medium characters and some medium characters into good characters by slightly adjusting the similarity values of the standard samples, so as to encourage students in their calligraphy practice. To avoid distorting the overall picture of a student's calligraphy, only a single fine adjustment is applied, and a poor character is never promoted directly to a good one. Along with the scoring result, a transparent image of the evaluated character and a transparent image of the standard character are output and overlaid on each other, so that the user can see the writing quality directly and practice the parts that are written poorly.
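For grayscale images, the transparent-image overlay can be approximated by a simple alpha blend; this numpy sketch is an assumption about the compositing, which the patent does not spell out:

```python
import numpy as np

def overlay(evaluated, standard, alpha=0.5):
    """Blend the evaluated character image with the standard character
    image so differing strokes show up where the two pictures disagree."""
    blend = alpha * evaluated.astype(np.float32) \
          + (1.0 - alpha) * standard.astype(np.float32)
    return blend.astype(np.uint8)
```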
The core points of the technical scheme of the invention are as follows:
1. Automatic extraction of the calligraphy practice book image. The image extraction module of the invention integrates several techniques, including use of RGB color channels, grayscale enhancement, color extraction in HSV space and its combination with straight-line detection; together with the error-reporting feedback mechanism of the invention, it can automatically try different frame-capture algorithms in succession until the best capture is found. Compared with patent CN201510565555, in which the user must align three fixed points on the photographing interface with fixed points on the practice book to obtain an image, the present method is more automatic; compared with patent CN201910427401, which captures the practice book image through simple line detection, the present method offers a wider choice of algorithms, higher accuracy, more intelligent image capture and less strict requirements on the photographing environment and phone performance.
2. Prior patents do not address the problem of layout discrimination; the present method discriminates between different layouts by line-segment splicing and cascading of algorithms, giving it a wider range of application.
3. To address the light color of pencil strokes, the influence of shadows, residual frame lines and unclear characters during binarized extraction of Chinese characters, the method adds a shadow-removal algorithm, removes the grid frame by applying a special mask plate before binarization, and then extracts the characters cleanly with a masked dynamic binarization method.
4. The invention establishes an intelligent recognition model that can effectively recognize Chinese characters and radicals. It uses InceptionV3 as the convolutional neural network structure, applies random cropping and brightness variation for image augmentation during training, and is trained on a data set of at least 50,000 pictures.
5. The similarity score of the invention follows a comprehensive "structure + content" scoring idea: the structure term is based on the width and height of the characters, the content term is determined by cosine similarity, and the formula found to work best in practice is the following comprehensive similarity:
$$\text{comprehensive similarity}=\frac{W_{\text{handwritten}}}{W_{\text{standard}}}\times\frac{H_{\text{handwritten}}}{H_{\text{standard}}}\times\cos\theta$$
In the formula, the widths of the handwritten character and the standard character refer to the widths of the handwritten and standard character images under their circumscribed rectangles; likewise, the heights refer to the heights of those images under their circumscribed rectangles. For the cosine similarity, the matrices of the two images are flattened into two one-dimensional vectors $\vec{A}$ and $\vec{B}$. For example: if each of the two pictures is 100 × 100 in size, it becomes 1 × 10000 after being flattened into a vector; the cosine of the angle θ between the two vectors is then calculated by the formula:
$$\cos\theta=\frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}=\frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}}$$
It should be noted that cosine similarity can only be computed when the two images have the same size, since only then do the two vectors have the same number of dimensions, whereas the circumscribed rectangles of the handwritten character and the standard character generally differ in width and height. The measure adopted here is to first scale the handwritten character to a size similar to the standard character (the height or the width of the handwritten character image equals that of the standard character), and then place the scaled handwritten image and the standard image at the center of two background images of identical size, producing two new images of the same size; the content similarity of the handwritten and standard characters can then be represented by the cosine similarity of these two new images. The scaling operation preserves the font characteristics (i.e. the content features mentioned above) of the whole character, and placing both characters in background images of the same size lets the cosine similarity evaluate the difference in content well. The final comprehensive similarity is the product of the structural similarity calculated at the beginning and the cosine similarity calculated after this processing.
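The content score and its product with the structure score can be written out directly; a plain-Python sketch with assumed function names:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) between two flattened image vectors of equal length."""
    assert len(a) == len(b), "flatten only images of identical size"
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def comprehensive_similarity(hw_w, hw_h, std_w, std_h, hw_vec, std_vec):
    """Structure term (width ratio x height ratio) times content term."""
    structure = (hw_w / std_w) * (hw_h / std_h)
    return structure * cosine_similarity(hw_vec, std_vec)
```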
Compared with the published patents CN201510565555, CN201911199560 and CN201910427401, which score Chinese characters by degree of overlap, a simple area-overlap algorithm cannot reflect how similar the overall font of the handwritten character is to the standard character. The invention not only replaces the overlap algorithm with cosine similarity but also takes the influence of structure on the score into account, and selects the best comprehensive scoring method, so that the scores are more accurate and more consistent with the evaluations of professional calligraphy teachers.
The automatic extraction and intelligent scoring system for handwritten Chinese characters or radicals and strokes further solves the problem of automatic extraction from photos uploaded casually by the user; it applies current mainstream deep learning technology in the Chinese character recognition module; it optimizes the Chinese character scoring algorithm so that it matches evaluation standards from the perspective of calligraphy and can give the user an intuitive view of the written characters; and it can score not only whole Chinese characters but also written radicals.

Claims (7)

1. An automatic extraction and intelligent scoring system for handwritten Chinese characters, radicals and strokes, comprising: a calligraphy practice book image automatic extraction module, a single Chinese character automatic extraction module, a Chinese character recognition module and a Chinese character scoring module; the practice book image automatic extraction module is used for preprocessing the uploaded handwritten Chinese character image, capturing and correcting the image, and performing error reporting and layout judgment; the single Chinese character automatic extraction module is used for cutting the image, enhancing its contrast, and further processing the cut character cells containing the "米"-shaped practice grid into binarized characters with a minimum circumscribed rectangle; the Chinese character recognition module is used for recognizing handwritten Chinese characters or radicals, strokes and blanks, and reports that a character cannot be recognized when the character written by the user is not in the standard character library; the Chinese character scoring module is used for comparing the written characters obtained from the recognition module with the corresponding standard characters in the standard character library and producing an evaluation result; characterized in that:
the calligraphy practice book image automatic extraction module comprises: a photo preprocessing unit, an image capturing unit, a correction unit, an error reporting mechanism unit and a layout judging unit;
the single Chinese character automatic extraction module comprises: an image cutting unit, a mask plate adding unit, a threshold segmentation unit and a minimum circumscribed rectangle obtaining unit;
the Chinese character recognition module comprises: an image preprocessing unit, a trained convolutional neural network model unit and a prediction result processing unit;
the Chinese character scoring module comprises: a similarity calculation unit, an evaluation unit, a judging unit and an evaluation result adjusting unit; according to the result of the recognition model, the Chinese character scoring module takes as input the prepared binarized standard character with minimum circumscribed rectangle and the binarized handwritten character with minimum circumscribed rectangle.
2. The system for automatic extraction and intelligent scoring of handwritten Chinese characters or radicals and strokes as claimed in claim 1, wherein:
the photo preprocessing unit is used for weakening the influence of light intensity, paper differences and print quality differences on extraction of the frame edges in the practice book image, the photos being processed one by one under the error reporting mechanism of the layout judging module; the photo preprocessing modes include:
direct graying;
graying + grayscale enhancement;
use of the photo's blue color channel;
use of the photo's blue color channel + grayscale enhancement;
color extraction with an HSV color model;
color extraction with an HSV color model + frame enhancement by Hough transform;
the image capturing unit is used for realizing correct capture and description of the inner frame through Canny edge detection, contour extraction, quadrilateral fitting and outer frame filtering;
the correction unit is used for applying a perspective transformation to the captured practice book image when the user's photos are taken at different angles; specifically, the four points of the fitted quadrilateral output by the image capturing unit are used as the input of the perspective transformation to rectify the image, so that the picture is corrected into a front view no matter at what angle the user took it;
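The perspective correction is the standard four-point homography; the sketch below reproduces in plain numpy the linear system that a library routine such as OpenCV's `getPerspectiveTransform` solves (function names here are assumptions for the sketch):

```python
import numpy as np

def perspective_matrix(src, dst):
    """3x3 homography mapping the 4 fitted quadrilateral corners (src)
    to the 4 corners of an upright rectangle (dst)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(M, x, y):
    """Apply the homography to one point (homogeneous divide included)."""
    p = M @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

Warping every pixel of the captured image through `M` (or sampling backwards through its inverse) produces the front view regardless of the camera angle.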
the error reporting mechanism unit is used by the automatic image extraction module to drive the successive photo preprocessing methods and, when the user takes a non-standard photo or photographs something other than the specified practice book, to prompt the user with feedback that the uploaded photo is erroneous, so that the user can consult common error solutions;
the error reporting content output by the error reporting mechanism unit includes: whether the aspect ratio of the corrected image is normal, whether the size of the extracted area is normal, and whether the feature at a fixed position is present; the specific algorithm settings are:
the aspect ratio is normal within the range 1.1-1.8;
the ratio of the extracted area to the area of the whole picture is normal when greater than 0.25;
taking the region of a particular printed character in the layout title as a feature, the user image is normal when a similar region appears at the specified position;
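The numeric checks above can be combined into one validation routine; a small sketch with assumed names, the fixed-position feature check reduced to a boolean input:

```python
def check_extraction(width, height, extracted_area, photo_area,
                     feature_found=True):
    """Return the list of error reports for one corrected/extracted image,
    using the thresholds stated in the claim."""
    errors = []
    ratio = max(width, height) / min(width, height)
    if not (1.1 <= ratio <= 1.8):
        errors.append("abnormal aspect ratio")
    if extracted_area / photo_area <= 0.25:
        errors.append("extracted area too small")
    if not feature_found:
        errors.append("fixed-position feature missing")
    return errors
```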
the layout judging unit is used for judging which layout of worksheet the user has uploaded and then calling the corresponding image cutting unit;
the layout judging content includes: at least three preset layouts, numbered layout 1, layout 2 and layout 3; a discrimination algorithm between layout 3 and non-layout-3 and a discrimination algorithm between layout 1 and layout 2 are set; when the user uploads a photo of any layout, it is first judged whether it is layout 3, and if so the next process is entered; if not, it is further judged whether it is layout 1 or layout 2;
the specific implementation of the two algorithms is as follows:
algorithm 1: used for distinguishing layout 3 from non-layout-3; specifically, the image capturing algorithm for layouts 1 and 2 is connected in series with the image capturing algorithm for layout 3, and the layout is judged from the number of frames extracted:
if the uploaded layout contains two borders, it is detected by the first algorithm;
if the uploaded layout contains one border, the first algorithm reports an error and the second algorithm detects it successfully;
algorithm 2: used for distinguishing layout 1 from layout 2, by using the Hough transform to detect tiny line segments; the tiny segments are spliced together so that they fall within the detection range of the existing method, realizing the layout judgment; the specific method is: from the corrected image, intercept two strip regions about 40 pixels wide at fixed positions that differ between the layouts, splice each strip 5 times into a straight strip 200 pixels long, and use the Hough transform to detect whether the spliced image contains a straight line, instead of the conventional approach of directly comparing the total length of straight lines in the two layouts;
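The splice-then-detect idea can be illustrated with numpy; in this sketch the Hough transform itself is replaced by a simple run-length check for a long dark row, which is only a stand-in for the detection step described in the claim:

```python
import numpy as np

def splice_strip(strip, times=5):
    """Tile a ~40-pixel-wide strip end-to-end so that a tiny line segment
    becomes one long straight line (40 px x 5 = 200 px)."""
    return np.concatenate([strip] * times, axis=1)

def has_long_line(img, min_len=200, dark=128):
    """Stand-in for the Hough check: is there a row with at least
    `min_len` consecutive dark pixels?"""
    for row in img:
        run = 0
        for px in row:
            run = run + 1 if px < dark else 0
            if run >= min_len:
                return True
    return False
```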
3. The system for automatic extraction and intelligent scoring of handwritten Chinese characters or radicals and strokes as claimed in claim 1, wherein:
the pre-cutting preprocessing unit is used for removing shadows in the practice book image and enhancing its contrast: the image background is separated by morphological dilation and median filtering, and the original image is subtracted from the background image to obtain a background-free image in which the background becomes black and the characters white; this image is then subtracted from a gray level of 255, turning the background white and the characters black; finally the image is normalized to the range 0-255, specifically by setting the maximum pixel value of the processed image to 255 and the minimum to 0 and stretching the pixel distribution over 0-255 to enhance the contrast;
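The background-subtraction pipeline can be sketched with naive numpy filters; the kernel size and these small loop-based filters are illustrative assumptions (a real implementation would use a library's dilation and median filter):

```python
import numpy as np

def local_filter(img, k, func):
    """Apply func (np.max for dilation, np.median for median filtering)
    over a k x k neighbourhood with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = func(padded[y:y + k, x:x + k])
    return out

def remove_shadow(img, k=5):
    """Estimate the background by dilation + median filtering, subtract the
    original from it, re-invert so characters are dark on white, and stretch
    the result to the full 0-255 range."""
    background = local_filter(local_filter(img, k, np.max), k, np.median)
    # background - original: background -> 0 (black), strokes -> bright
    diff = np.clip(background.astype(np.int16) - img.astype(np.int16),
                   0, 255).astype(np.uint8)
    inverted = 255 - diff                      # background white, strokes black
    lo, hi = int(inverted.min()), int(inverted.max())
    if hi == lo:
        return inverted
    return ((inverted.astype(np.float32) - lo) * 255.0 / (hi - lo)).astype(np.uint8)
```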
the image cutting unit is used for cutting the automatically extracted image according to the specification of each layout to obtain the square character cells; when cutting, the corrected images are scaled to the same size, and the cutting step is adjusted according to the achieved correction effect to reduce cutting error;
the mask plate adding unit is used in single character extraction to remove the frame of the character grid while erasing as few character strokes as possible; the specific processing method is: first construct a mask plate figure by drawing a circle centered at the center of the square cell, then drawing at the lower right of the circle a square whose side length equals the circle's radius and whose top-left corner coincides with the circle center, so that the mask plate is the union of the circle and the square; the grid image inside the mask plate is kept unchanged, and the grid image outside the mask plate is turned white;
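The circle-plus-square mask plate can be generated directly; a numpy sketch in which the radius is assumed to be half the cell side (the patent does not state the radius):

```python
import numpy as np

def mask_plate(size):
    """Boolean mask: a circle centred on the cell, plus a square whose side
    equals the radius and whose top-left corner is the circle centre."""
    c = size // 2                       # cell centre
    r = size // 2                       # assumed radius: half the cell side
    yy, xx = np.mgrid[0:size, 0:size]
    circle = (xx - c) ** 2 + (yy - c) ** 2 <= r ** 2
    square = (xx >= c) & (xx < c + r) & (yy >= c) & (yy < c + r)
    return circle | square

def apply_mask_plate(cell, mask):
    """Keep the grid image inside the mask plate; outside turns white."""
    out = cell.copy()
    out[~mask] = 255
    return out
```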
the dynamic binarization unit is used for extracting the character inside the square cell; it is a masked dynamic threshold segmentation: a threshold is determined, pixels below the threshold become 0 and pixels above it become 255; the mask covers the useless image area so that only the useful area participates in computing the dynamic threshold, which is calculated with the Otsu operator from the brightness distribution of the actual grid cell image;
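Masked Otsu thresholding computes the threshold from the in-mask pixels only; a numpy sketch of the classic between-class variance search (function names assumed):

```python
import numpy as np

def otsu_threshold(pixels):
    """Otsu's method on a 1-D array of masked-in pixel values (0-255)."""
    hist = np.bincount(pixels, minlength=256).astype(np.float64)
    total = hist.sum()
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = cum[t] / total
        w1 = 1.0 - w0
        if w0 == 0.0 or w1 == 0.0:
            continue
        m0 = cum_mean[t] / cum[t]
        m1 = (cum_mean[-1] - cum_mean[t]) / (cum[-1] - cum[t])
        var = w0 * w1 * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def masked_binarize(cell, mask):
    """Dynamic threshold from masked-in pixels; below -> 0, above -> 255,
    out-of-mask pixels forced to white."""
    t = otsu_threshold(cell[mask].ravel())
    out = np.where(cell <= t, 0, 255).astype(np.uint8)
    out[~mask] = 255
    return out
```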
the minimum circumscribed rectangle obtaining unit is used for performing Canny edge extraction and contour extraction on the threshold-segmented binary image, filtering out noise contours by contour area and number of contour points, and finally selecting from the coordinates of the remaining contour points the values xmin, ymin, xmax and ymax in the plane of the extracted contour as the two diagonal corner points of the minimum circumscribed rectangle, where xmin, ymin are the x and y coordinates of its lower-left corner and xmax, ymax those of its upper-right corner.
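Once the character pixels are isolated, the two diagonal corner points follow directly from the coordinate extremes; this sketch skips the Canny/contour step and reads the extremes straight off the black pixels, which yields the same rectangle for a clean binary image:

```python
import numpy as np

def min_bounding_rect(binary):
    """(xmin, ymin, xmax, ymax): diagonal corners of the minimum
    circumscribed rectangle of the character (black = 0) pixels."""
    ys, xs = np.where(binary == 0)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```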
4. The system for automatic extraction and intelligent scoring of handwritten Chinese characters or radicals and strokes as claimed in claim 1, wherein:
the image preprocessing unit is used for uniformly preprocessing the input images before they enter the convolutional neural network model; the specific processing unifies the picture sizes and normalizes the pixel values from the 0-255 range to 0-1;
the trained convolutional neural network model unit holds the trained model parameters, and comprises: the InceptionV3 network structure proposed by Google, together with the trained model parameters provided by the invention;
the prediction result processing unit is used to ensure that only recognized Chinese characters are passed to the Chinese character scoring module; three results occur in actual use: unrecognized characters, blanks, and recognized characters, where a character counts as unrecognized when the model's predicted probability is below 35%;
it is set that only recognized Chinese characters enter the Chinese character scoring module; recognized blanks and unrecognized characters do not.
5. The system for automatic extraction and intelligent scoring of handwritten Chinese characters or radicals and strokes as recited in claim 4, wherein the training process of the model parameters is as follows:
preparation of the data set: a handwritten Chinese character data set of at least 50,000 images and a test set of at least 5,000 images are made; a classification network is trained with 473 classes, matching in number the Chinese characters and radicals of the accompanying textbook, blanks included;
comparison of model structures: the influence of the VGG, InceptionV3 and DenseNet network structures on handwritten character recognition is compared, and under identical conditions the InceptionV3 structure is selected;
the data augmentation mode during model training is selected as: brightness variation + random cropping.
6. The automatic extraction and intelligent scoring system for handwritten Chinese characters or radicals and strokes as claimed in claim 1, wherein, for the Chinese character scoring:
the similarity calculation unit includes: a similarity structure evaluation mechanism, used for evaluating the structure of the handwritten character and expressed as the product of the ratio of the width of the handwritten character's circumscribed rectangle to that of the standard character and the ratio of its height to that of the standard character; and a similarity content evaluation mechanism, used for evaluating the content similarity between the handwritten and standard characters after the structure evaluation;
the specific evaluation method comprises the following steps:
scaling the handwritten character proportionally to a size similar to the standard character, i.e. making the height or the width of the handwritten character image equal to that of the standard character, then placing each of the two pictures at the center of a background frame of the same size, which yields two pictures of identical size, one containing the handwritten character and the other the standard character, both centered;
calculating the content similarity by cosine similarity, where the two-dimensional matrix of each picture is flattened into a one-dimensional vector and the cosine of the angle between the two vectors is computed;
multiplying the cosine value by the value of the structure evaluation to give the comprehensive similarity score of the Chinese character;
the comprehensive similarity calculation method is based on the criterion of agreeing with the evaluations of professional calligraphy teachers;
the specific method comprises the following steps:
firstly, a professional calligraphy teacher selects samples of the three grades "good", "medium" and "poor" from a larger sample set, and a comprehensive similarity calculation method is applied to obtain the comprehensive similarity value of each character; a larger similarity value is defined to mean a better handwritten character, and when the comprehensive similarity value of a poor character is higher than that of a good character, the calculation method is adjusted until its results agree with the calligraphy teacher's evaluation;
the evaluation unit is used for obtaining the final evaluation results at the three grades "good", "medium" and "poor"; the evaluation unit contains a pre-computed threshold table with two thresholds per character: when the comprehensive similarity value of the user's handwritten character is above the larger threshold it is evaluated as a good character, when it is below the smaller threshold it is evaluated as a poor character, and characters in between are evaluated as medium; the threshold table is obtained from the comprehensive similarity values calculated on the "good", "medium" and "poor" samples;
the judging unit is used for counting the proportions of good, medium and poor characters in the picture uploaded by the user, and adjusting the evaluation result when the proportion of good or poor characters is too low or too high;
the evaluation result adjusting unit is used for turning some poor characters into medium characters and some medium characters into good characters by fine-tuning the similarity values of the standard samples; the adjustment is applied only once, and a poor character is never adjusted into a good one; along with the scoring result, a transparent image of the evaluated character and a transparent image of the standard character are output and overlaid, so that the user can see the writing quality directly.
7. The system for automatic extraction and intelligent scoring of handwritten Chinese characters or radicals and strokes as recited in claim 6, wherein the similarity score adopts a comprehensive "structure + content" scoring method: the structure is based on the width and height of the characters, the content is determined by cosine similarity, and the comprehensive similarity calculation formula is set as:
$$\text{comprehensive similarity}=\frac{W_{\text{handwritten}}}{W_{\text{standard}}}\times\frac{H_{\text{handwritten}}}{H_{\text{standard}}}\times\cos\theta$$
in the formula, the widths of the handwritten character and the standard character refer to the widths of the handwritten and standard character images under their circumscribed rectangles; likewise, the heights refer to the heights of those images under their circumscribed rectangles; for the cosine similarity, the matrices of the two images are flattened into two one-dimensional vectors, and when the one-dimensional vectors of the handwritten and standard character images are $\vec{A}$ and $\vec{B}$ respectively, the cosine of the angle θ between them is calculated by:
$$\cos\theta=\frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}=\frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}}$$
CN202011336351.1A 2020-11-25 2020-11-25 Automatic extraction and intelligent scoring system for handwritten Chinese characters or components and strokes Pending CN112434699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011336351.1A CN112434699A (en) 2020-11-25 2020-11-25 Automatic extraction and intelligent scoring system for handwritten Chinese characters or components and strokes


Publications (1)

Publication Number Publication Date
CN112434699A true CN112434699A (en) 2021-03-02




Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003233294A (en) * 2002-02-07 2003-08-22 Univ Waseda Instruction system for learning kanji
JP2008081853A (en) * 2006-09-26 2008-04-10 Toray Ind Inc Fabric of nanofiber and method for producing the same
US20140376060A1 (en) * 2013-06-19 2014-12-25 Abbyy Development Llc Automatic capturing of documents having preliminarily specified geometric proportions
CN107491730A (en) * 2017-07-14 2017-12-19 浙江大学 A kind of laboratory test report recognition methods based on image procossing
CN107679531A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Licence plate recognition method, device, equipment and storage medium based on deep learning
CN109063706A (en) * 2018-06-04 2018-12-21 平安科技(深圳)有限公司 Verbal model training method, character recognition method, device, equipment and medium
CN109086652A (en) * 2018-06-04 2018-12-25 平安科技(深圳)有限公司 Handwritten word model training method, Chinese characters recognition method, device, equipment and medium
CN109543777A (en) * 2018-11-28 2019-03-29 中国科学院自动化研究所 Handwritten Chinese character writing quality evaluation method and system
CN110570720A (en) * 2019-09-17 2019-12-13 安徽博文风雅文化科技有限公司 calligraphy teaching system and teaching method
US20200167596A1 (en) * 2018-11-22 2020-05-28 Boe Technology Group Co., Ltd. Method and device for determining handwriting similarity
CN111507346A (en) * 2020-04-09 2020-08-07 磐度科技有限公司 Method for recognizing Chinese writing error


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhang Xiaoming, Zhou Xiuguang: "Research on feature extraction methods for printed Chinese characters input by handheld scanner", Computer and Digital Engineering, vol. 23, no. 02, pages 55 - 58 *
Bai Xiaodong; Jiang Jie; Deng Hongjing; Li Yi: "Experimental study of a similarity-based stroke identification method for handwritten Chinese characters", Research and Exploration in Laboratory, vol. 34, no. 12, pages 132 - 136 *
Deng Xuexiong; Li Jingtao; Li Mu: "Computer evaluation of brush calligraphy copying", Journal of Graphics, vol. 35, no. 06, pages 899 - 904 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461019A (en) * 2020-04-01 2020-07-28 黑龙江文茁教育科技有限公司 Method, system and equipment for evaluating Chinese character writing quality
CN111461019B (en) * 2020-04-01 2023-04-07 黑龙江文茁教育科技有限公司 Method, system and equipment for evaluating Chinese character writing quality
CN113191309A (en) * 2021-05-19 2021-07-30 杭州点望科技有限公司 Method and system for recognizing, scoring and correcting handwritten Chinese characters
CN113240059A (en) * 2021-05-21 2021-08-10 哈尔滨理工大学 Handwritten Chinese character quality evaluation method based on deep learning
CN113642573A (en) * 2021-07-20 2021-11-12 南京红松信息技术有限公司 Picture separation method based on grids
CN113642573B (en) * 2021-07-20 2023-10-13 南京红松信息技术有限公司 Picture separation method based on grids
CN115690806A (en) * 2022-10-11 2023-02-03 杭州瑞成信息技术股份有限公司 Unstructured document format identification method based on image data processing
CN115984875A (en) * 2023-03-21 2023-04-18 南京信息工程大学 Stroke similarity evaluation method and system for hard-tipped pen regular script copy work
CN115984875B (en) * 2023-03-21 2023-06-06 南京信息工程大学 Stroke similarity evaluation method and system for hard-tipped pen regular script copy work

Similar Documents

Publication Publication Date Title
CN112434699A (en) Automatic extraction and intelligent scoring system for handwritten Chinese characters or components and strokes
CN110008944B (en) Template-matching-based OCR method, device and storage medium
CN110705534B (en) Wrong problem book generation method suitable for electronic typoscope
US8755595B1 (en) Automatic extraction of character ground truth data from images
WO2021147631A1 (en) Handwritten content removing method and device and storage medium
CN105095892A (en) Student document management system based on image processing
CN109858480A (en) Digital instrument recognition method
CN110414563A (en) Total marks of the examination statistical method, system and computer readable storage medium
US7564587B2 (en) Method of scoring a printed form having targets to be marked
CN109948566A (en) Two-stream face anti-fraud detection method based on weight fusion and feature selection
CN111626292B (en) Text recognition method of building indication mark based on deep learning technology
CN108133216A (en) Machine-vision-based digital tube reading recognition method capable of reading decimal points
CN108564079A (en) Portable character recognition device and method
CN113901952A (en) Print form and handwritten form separated character recognition method based on deep learning
Sun et al. A visual attention based approach to text extraction
CN113223025A (en) Image processing method and device, and neural network training method and device
CN107992483A (en) The method, apparatus and electronic equipment of translation are given directions for gesture
CN113392819B (en) Batch academic image automatic segmentation and labeling device and method
CN114463770A (en) Intelligent question segmentation method for general test paper questions
CN107145888A (en) Real-time video caption translation method
US20060194187A1 (en) Material processing apparatus, material processing method, and program product
CN110766001B (en) Bank card number positioning and end-to-end identification method based on CNN and RNN
CN110298236B (en) Automatic Braille image identification method and system based on deep learning
CN110619331A (en) Color distance-based color image field positioning method
Ovodov Optical Braille recognition using object detection CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination