CN112183538A - Manchu recognition method and system - Google Patents


Info

Publication number
CN112183538A
Authority
CN
China
Prior art keywords
image
value
point
ith
manchu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011370960.9A
Other languages
Chinese (zh)
Other versions
CN112183538B (en)
Inventor
张殿典
张永康
Current Assignee
South China Normal University
Original Assignee
South China Normal University
Priority date
Filing date
Publication date
Application filed by South China Normal University
Priority to CN202011370960.9A
Publication of CN112183538A
Application granted
Publication of CN112183538B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/60 — Type of objects
    • G06V20/62 — Text, e.g. of license plates, overlay texts or captions on TV images
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/40 — Extraction of image or video features
    • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 — Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 — Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a Manchu recognition method and system. The method quickly recognizes some of the letters in a character image and, from the recognized letters, quickly and accurately locates the corresponding Manchu words in a database storing words formed from those letters, outputting the characters and numbers of the letters corresponding to all letter standard images marked as associated letters. All candidate letters on the Manchu image are recognized in turn through a sliding window: letters are recognized locally within the window, and the local image is adjusted and compared against the letter standard images. This reduces computational complexity, improves recognition precision, indexes partial letter regions with higher accuracy, ensures the reliability of letter recognition through local recognition, and reduces the probability of false detection.

Description

Manchu recognition method and system
Technical Field
The invention relates to the technical field of character recognition, in particular to a Manchu recognition method and system.
Background
Manchu is a script used by minority nationalities in China such as the Manchu and the Sibe. Although Manchu is an alphabetic (phonetic) script, its letters are written cursively and undergo positional deformation, following writing rules completely different from modern Chinese: like classical Chinese, it is read and written from top to bottom, with columns running from left to right. Because Manchu is phonetic, its letters are divided into simple vowels, compound vowels, consonants, and special letters, and the stroke components of different letters vary in length. When translating and interpreting Manchu books, the cursive joining and deformation of letters often make it difficult to identify the corresponding characters quickly and accurately, frequently leading to misrecognition; scratches, cracks, and other storage damage in scanned book images greatly increase the false-detection rate of existing Manchu recognition methods, and similar problems arise with handwritten Manchu, making handwritten input inconvenient. Existing Manchu recognition methods therefore usually first segment the text into basic units (such as letters) and then recognize them, so improvements have typically focused on segmentation precision alone, which does not actually solve the low recognition accuracy caused by cursive joining and deformation. In addition, because a Manchu word is formed by joining one or more letters along a vertical central axis, there is no gap between the letters of a word and the joints lie on the central axis of the word image, so traditional segmentation methods struggle to quickly and accurately recognize and locate Manchu characters in images or handwriting.
Disclosure of Invention
The present invention is directed to a Manchu recognition method and system that solve one or more problems of the prior art and provide at least the advantages described below.
All candidate letters on the Manchu image are recognized in turn through a sliding window, so that some of the letters in a character image can be recognized quickly. From the recognized letters, the corresponding Manchu words can then be quickly and accurately located in a database storing words formed from those letters, and the characters and numbers of the letters corresponding to all letter standard images marked as associated letters can be output. Recognizing letters locally within the sliding window and adjusting and comparing the local image against the letter standard images reduces computational complexity, improves recognition precision, and avoids the inaccurate letter recognition caused by the absence of gaps between letters within a word.
In order to achieve the above object, according to an aspect of the present invention, there is provided a Manchu recognition method, the method including:
s100, reading each letter standard image;
s200, collecting a Manchu image and carrying out binarization to obtain a binarized image;
S300, filtering the binarized image, extracting a salient region after filtering, and performing edge detection on the salient region to obtain the image to be recognized;
s400, setting an initial value of a variable i as 1, setting a numeric area of i as [1, N ], wherein i is a natural number, and the value N is a ratio of the size of the image to be recognized to the size of the sliding window; setting an initial value of a variable j as 1, setting a value range of j as [1, M ], wherein M is the total number of letter standard images, and the width of a sliding window is W;
S500, searching the image to be recognized by the sliding window method; finding the longest line segment within the sliding window through the Hough Transform and taking it as the ith line segment; calculating the clockwise angle between the ith line segment and the ordinate axis (y axis) of the image matrix as the ith angle; when the length of the ith line segment is greater than or equal to K, rotating the image in the sliding window counterclockwise by the ith angle and taking the result as the ith comparison image (when the length of the ith line segment is less than K, there are no Manchu letters to be recognized in the sliding window area), where K = 0.2W;
S600, filtering the binarized image of the jth letter standard image, extracting a salient region after filtering, performing edge detection on the salient region to obtain the image to be recognized of the jth letter standard image, and finding the longest line segment in that image through the Hough transform as the standard line segment;
s700, zooming the ith comparison image according to the proportion between the standard line segment and the ith line segment, and respectively extracting the vector contour lines of the ith comparison image and the jth letter standard image;
S800, calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image; when the connection strength is greater than the strength threshold, marking the jth letter standard image as an associated letter and going to step S900; otherwise increasing the value of j by 1, then going to step S900 when j is greater than M and going to step S600 when j is less than or equal to M;
s900, increasing the value of i by 1, setting the value of j to be 1, sliding the sliding window by the distance of W, and turning to the step S500 when the value of i is less than or equal to N, and turning to the step S1000 when the value of i is greater than N;
s1000, outputting all letter standard images marked as associated letters;
Further, the characters and numbers of the letters corresponding to all letter standard images marked as associated letters are output from the database.
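The control flow of steps S400–S1000 reduces to a nested loop over the N window positions and the M letter standard images. The sketch below stands in for steps S600–S800 with a hypothetical `match` function (the patent gives no code; names and sample values are illustrative):

```python
def recognise(windows, standards, match, threshold):
    """Skeleton of the S400-S1000 control flow: for each of the N window
    images, compare against the M letter standard images in order; when
    the connection strength exceeds the threshold, mark the standard as
    an associated letter and move on to the next window (S800 -> S900)."""
    associated = []
    for window in windows:                              # S500/S900: slide
        for j, std in enumerate(standards, start=1):    # S600-S800
            if match(window, std) > threshold:
                associated.append(j)                    # mark as associated
                break                                   # next window position
    return associated                                   # S1000: output marks

# Toy run: the "strength" is just w + s; only the second window matches.
result = recognise([0, 2], [1, 2], lambda w, s: w + s, 2)
```

The early `break` mirrors the patent's jump to S900 as soon as one standard image is marked for the current window.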
Further, in S100, the letter standard images are images of the Manchu alphabet pre-stored in a database. The letter standard images are constructed with reference to the Manchu alphabet and the national standard of the People's Republic of China "Information technology — Universal multiple-octet coded character set — Sibe and Manchu character forms". Each Manchu letter generally has 4 positional forms: isolated, initial, medial, and final. A Manchu word consists of one or more letters, and the letter standard images are images of the letters in their different forms, 114 images in total. The database also stores the characters and numbers of the letters corresponding to the letter standard images, as well as the component numbers of the constituent letters of each Manchu word, and can output the corresponding Manchu characters from one or more letters.
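A hypothetical miniature of the database just described — letter numbers mapped to their characters, and words stored as sequences of constituent-letter numbers, so that one or more recognized letters index the candidate Manchu words. All names and sample values here are illustrative, not from the patent:

```python
# Illustrative stand-ins for the 114-entry letter table and the word table.
LETTERS = {1: "a-isolated", 2: "a-initial", 3: "n-medial"}
WORDS = {"word-1": [2, 3], "word-2": [1]}  # word -> component letter numbers

def words_containing(letter_numbers):
    """Return the stored words whose constituent-letter numbers include
    every recognized letter number (the lookup sketched in S1000)."""
    wanted = set(letter_numbers)
    return sorted(w for w, comps in WORDS.items() if wanted <= set(comps))

hits = words_containing([3])
```

In the patent the same idea lets a few reliably recognized letters narrow the search to a handful of candidate words.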
Further, in S200, the Manchu image is acquired by scanning the Manchu book with a line camera, handwriting input, or a scanner.
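A minimal sketch of the S200 binarization step. The threshold rule is an assumption, since the patent does not specify one (Otsu's method would be a common alternative); dark ink pixels map to 1 and background to 0:

```python
import numpy as np

def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Binarize a grayscale image: ink (dark) pixels -> 1, background -> 0.
    The fixed threshold is an assumed choice, not taken from the patent."""
    return (gray < threshold).astype(np.uint8)

# A tiny 3x3 "scan": a dark stroke down the middle column.
img = np.array([[250, 10, 250],
                [250, 20, 250],
                [250, 15, 250]], dtype=np.uint8)
binary = binarize(img)
```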
Further, in S300, the filtering method is any one of mean filtering, median filtering, or Gaussian filtering; the method for extracting the salient region is any one of the AC algorithm, HC algorithm, LC algorithm, or FT algorithm;
wherein the AC algorithm, HC algorithm, LC algorithm or FT algorithm is implemented according to the following references:
AC algorithm: Achanta R, Estrada F, Wils P, et al. Salient Region Detection and Segmentation [C]// Proceedings of the 6th International Conference on Computer Vision Systems. Springer, Berlin, Heidelberg, 2008.
HC algorithm: Cheng M M, Mitra N J, Huang X, et al. Global Contrast Based Salient Region Detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
LC algorithm: Yun Zhai, Mubarak Shah. Visual Attention Detection in Video Sequences Using Spatiotemporal Cues. ACM Multimedia 2006, pp. 4-5.
FT algorithm: R. Achanta, S. Hemami, F. Estrada and S. Süsstrunk. Frequency-tuned Salient Region Detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp. 1597-1604, 2009.
further, the edge detection of the salient region is the edge detection of a pixel region with Manchu characters in the image.
Further, in S500, the sliding window method searches the image to be recognized through a sliding window, and the step size of each slide is the size of the sliding window. The sliding window is square with size W × W, where W is between [0.01, 1] times the width of the image to be recognized (equivalently, the window size is between [0.01, 1] times the size of the image to be recognized), adjusted according to the number of Manchu characters in the image. The window slides by its width W each time; after sliding across one row of the image matrix, it automatically jumps down by its height to the next row (scanning transversely from the pixel region of the image matrix not yet covered by the window) and continues sliding. Note: height and width (also called length and width) are in units of pixels.
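The sliding path just described and the S500 length test can be sketched as follows. The helper names are hypothetical, and the clockwise sign convention for the angle is an assumption, since the patent only names the rotation direction:

```python
import math

def window_origins(img_h, img_w, win):
    """Row-major sliding path (Fig. 2): step right by the window width,
    then drop down one window height at the end of each row."""
    return [(y, x) for y in range(0, img_h - win + 1, win)
                   for x in range(0, img_w - win + 1, win)]

def segment_angle_cw_from_y_axis(p1, p2):
    """Clockwise angle in degrees between segment p1->p2 and the image
    y axis (the deskew angle of S500)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def window_has_letter(segment_length, window_width):
    """S500: a window is kept only if its longest Hough segment is at
    least K = 0.2 * W long."""
    return segment_length >= 0.2 * window_width
```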
Further, in S700, when the ith comparison image is scaled according to the ratio between the standard line segment and the ith line segment, a margin of at least 8 pixels is left at the edges of the ith comparison image, yielding an image of size 80 × 80.
Further, in S700, the method for extracting the vector contour line of the ith comparison image is as follows: in the ith comparison image, starting from the end point of the ith line segment closest to the ordinate axis, calculate the curvature value at each edge point on the ith line segment; calculate the average of all curvature values; take all edge points whose curvature value is greater than the average as corner points, forming the large-curvature point set; and sequentially connect the corner points in the large-curvature point set.
Further, in S700, the method for extracting the vector contour line of the jth letter standard image is as follows: starting from the end point of the standard line segment closest to the ordinate axis, calculate the curvature value at each edge point on the standard line segment; calculate the average of all curvature values; take all edge points whose curvature value is greater than the average as corner points, forming the large-curvature point set; and sequentially connect the corner points in the large-curvature point set.
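A minimal sketch of this corner extraction, under the assumption that curvature is estimated as the discrete turning angle at each interior edge point (the patent does not fix a curvature formula):

```python
import math

def corner_points(contour):
    """Return the edge points whose estimated curvature exceeds the mean
    curvature, as in the large-curvature point set of S700.  Curvature is
    approximated by the turning angle between the two incident segments;
    this discrete estimate is an assumption."""
    def turn_angle(a, b, c):
        ang1 = math.atan2(b[1] - a[1], b[0] - a[0])
        ang2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(ang2 - ang1)
        return min(d, 2 * math.pi - d)

    curv = [turn_angle(contour[k - 1], contour[k], contour[k + 1])
            for k in range(1, len(contour) - 1)]
    mean = sum(curv) / len(curv)
    # contour[k + 1] is the point whose curvature curv[k] was computed.
    return [contour[k + 1] for k, c in enumerate(curv) if c > mean]
```

On an L-shaped polyline only the elbow survives, which is the intended behaviour: straight runs have near-zero curvature and fall below the mean.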
Further, the method for sequentially connecting the corner points in the large-curvature point set comprises the following steps:
S701, setting the coordinates of the edge point in the large-curvature point set with the minimum abscissa (x-axis value) as (Xmin, Ymin) and the coordinates of the edge point with the maximum abscissa as (Xmax, Ymax); taking span as the width, in pixels, of each interval along the abscissa axis, span being an integer between 10 and 100; setting the initial value of variable h to 0 and the initial value of variable r to 1, h and r being natural numbers;
s702, sequencing each corner point in the large-curvature point set from small to large according to the value of an abscissa; except for the corner points with the maximum and minimum horizontal coordinates, each corner point in the large curvature point set is correspondingly provided with a connecting mark Linkmark and an array mark ArrayMark, the Linkmark and the ArrayMark are both set to be 1, the connecting mark Linkmark of the corner point with the minimum horizontal coordinate is set to be 2 (namely the connecting mark Linkmark =2 of the first corner point in the large curvature point set), and the connecting mark Linkmark of the corner point with the maximum horizontal coordinate is set to be 3 (namely the connecting mark Linkmark =3 of the last corner point in the large curvature point set);
s703, making the corner point in the interval from Xmin + (h-1) span to Xmin + h span of the abscissa axis value range of the image matrix be the r-th layer connected interval, and making the interval from Xmin + h span to Xmin + (h +1) span of the abscissa axis value range be the r + 1-th layer interval to be connected;
s704, if there is a corner point of Linkmark =2 in the r-th layer connected interval (i.e. there is a first corner point in the large curvature point set, and there is a leftmost boundary point in the image), connecting the corner point of Linkmark =2 with all corner points of Linkmark =1 of the r + 1-th layer to-be-connected interval by vector lines, and setting connecting marks Linkmark of all corner points of the r-th layer connected interval and the r + 1-th layer to-be-connected interval to 0 and proceeding to step S705 (the above steps are performed only at the first corner point); otherwise, if there is no corner point Linkmark =2 in the r-th layer connected section, go to step S705;
s705, if there is a corner point of Linkmark =3 in the r +1 th to-be-connected interval (i.e. there is the last corner point in the large curvature point set, and the rightmost boundary point in the image), connecting all corner points marked as Linkmark =0 in the r-th connected interval to the corner point of Linkmark =3 by using vector lines; and sets the linking mark Linkmark of the corner point of Linkmark =3 to 0 (the above steps are performed only at the last corner point), and goes to step S711 (i.e., the linking process is ended);
if no edge point with Linkmark =3 exists in the r + 1-th layer to-be-connected interval, judging whether the connection marks Linkmark of all edge points in the large-curvature point set are all equal to 0 (namely the connection process is ended), if so, turning to the step S711, and if not, turning to the step S706;
s706, inputting corner points of Linkmark =0 in the r-th layer connected interval into an array as a connected array; inputting corner points of Linkmark =1 in the r + 1-th section to be connected into another array as an array to be connected; sequencing the corner points in the connected array and the array to be connected from small to large according to the value of the ordinate;
s707, regarding the corner point with the largest vertical coordinate value among the corner points of the array mark ArrayMark =1 in the connected array as the first starting point,
taking the corner point with the largest longitudinal coordinate value in all corner points of array mark ArrayMark =1 in the array to be connected as a first end point,
connecting the first starting point and the first ending point by a vector line, and setting an array mark ArrayMark of the first starting point and the first ending point to 0; (this step is to connect the corner points with the highest (upper) y-axis value in turn);
s708, using the corner point with the smallest longitudinal coordinate value among the corner points of array mark ArrayMark =1 in the connected array as the second starting point,
taking the corner point with the smallest longitudinal coordinate value in all corner points of the array mark ArrayMark =1 in the array to be connected as a second end point,
connecting the second starting point and the second ending point by a vector line, and setting an array mark ArrayMark of the second starting point and the second ending point to be 0; (this step is to connect the corner points with the lowest (lower) y-axis value in turn);
s710, when the array mark ArrayMark of all corner points in the connected array or any one of the arrays to be connected is equal to 0 (namely, the connection of all corner points in any one range in the two coordinate axis distance ranges is completed), increasing the variables h and r by 1, setting the array mark ArrayMark of all corner points in the array to be connected to 1 and setting the Linkmark of all corner points in the array to be connected and the connected array to 0 and going to the step S703 (namely, the connection of the next coordinate axis distance range is performed),
otherwise, go to step S706 (continue to connect the corner points in the connected array and the array to be connected);
S711, outputting the vector contour line composed of all corner points connected by vector lines (the frame of the glyph formed by connecting the corner points).
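The layered connection of S701–S711 is intricate; the simplified sketch below (a hypothetical helper, not the patent's exact Linkmark/ArrayMark bookkeeping) captures the core idea: bucket the corner points into x-axis strips of width span, then join consecutive strips by linking their top-most corners (cf. S707) and their bottom-most corners (cf. S708), approximating the upper and lower outline of the glyph:

```python
def connect_strips(corners, span=10):
    """Simplified S701-S711: group corner points into vertical strips of
    width `span` along the x axis, then link each strip to the next by
    joining their highest and lowest corners with vector lines."""
    xmin = min(x for x, _ in corners)
    strips = {}
    for x, y in corners:
        strips.setdefault((x - xmin) // span, []).append((x, y))
    keys = sorted(strips)
    edges = []
    for a, b in zip(keys, keys[1:]):
        top_a = max(strips[a], key=lambda p: p[1])
        top_b = max(strips[b], key=lambda p: p[1])
        bot_a = min(strips[a], key=lambda p: p[1])
        bot_b = min(strips[b], key=lambda p: p[1])
        edges.append((top_a, top_b))          # link highest points (S707)
        if (bot_a, bot_b) != (top_a, top_b):
            edges.append((bot_a, bot_b))      # link lowest points (S708)
    return edges

demo = connect_strips([(0, 0), (0, 5), (12, 1), (12, 4)], span=10)
```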
Further, in S800, the method of calculating the connection strength between the vector contour lines of the ith contrast image and the jth letter standard image includes the steps of:
S801, letting the vector contour line of the ith comparison image be P and the vector contour line of the jth letter standard image be Q; superposing P and Q by aligning their centers of gravity;
wherein P = {p1, p2, …, pk | k > 0}, k being the number of corner points on the vector contour line of the ith comparison image, and Q = {q1, q2, …, qn | n > 0}, n being the number of corner points on the vector contour line of the jth letter standard image; pk and qn are corner points on the contour lines; p1 and q1 are the corner points of P and Q closest to the ordinate axis; in the two sequences p1, p2, …, pk and q1, q2, …, qn, the distance of each corner point from the ordinate axis increases with the subscript;
S802, sequentially connecting by vector lines each pair of nearest corner points of P and Q; the resulting set of connection edges is V = {(pFC, qFC)}, where FC is the index of the edge points, with value range [1, PNum]; PNum is a constant equal to the smaller of k and n;
S803, calculating the distance from the endpoint pFC of each connection edge in V to p1 (i.e., the distances from the corner points p1, p2, …, pPNum to p1) and taking each calculated distance in turn as the first distance set; calculating the distance from the endpoint qFC of each connection edge to q1 (i.e., the distances from the corner points q1, q2, …, qPNum to q1) and taking each calculated distance in turn as the second distance set; calculating in turn the difference between each distance element of the first distance set and the corresponding distance element of the second distance set; counting how many of the differences are positive and how many are negative; when the number of positives is greater than the number of negatives, going to step S804, otherwise going to step S805; the distance elements are the distance values in the sets;
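The S803 branch decision can be sketched as follows; the Euclidean distance and the pairing of corners by index are assumptions consistent with the description:

```python
import math

def pick_branch(P, Q):
    """S803 sketch: pair the first PNum corners of the two contours,
    measure each corner's distance to its contour's first corner, and
    count the signs of the pairwise differences to decide whether the
    S804 or S805 strength formula applies."""
    pnum = min(len(P), len(Q))
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    d1 = [dist(P[k], P[0]) for k in range(pnum)]   # first distance set
    d2 = [dist(Q[k], Q[0]) for k in range(pnum)]   # second distance set
    diffs = [a - b for a, b in zip(d1, d2)]
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    return "S804" if pos > neg else "S805"
```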
S804, calculating the connection strength S of the connection edge set V and going to step S806,
wherein S = [formula shown only as an image in the original publication], with the similarity function [also shown only as an image] depending on |pFC − qFC|x, i.e., the difference between the abscissa (x-coordinate) value of point pFC and that of point qFC;
S805, calculating the connection strength S of the connection edge set V and going to step S806,
wherein S = [formula shown only as an image in the original publication], with the similarity function [also shown only as an image] depending on |qFC − pFC|y, i.e., the difference between the ordinate (y-coordinate) value of point qFC and that of point pFC;
and S806, outputting the connection strength S.
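Since the S804/S805 formulas survive only as images in the text, the sketch below assumes a plausible form: the connection strength is a sum over paired corners of a similarity that decays with the coordinate difference on the chosen axis (x for S804, y for S805). Each term is at most 1, so identical contours score S = PNum, which is consistent with the PNum-based threshold that follows:

```python
def connection_strength(P, Q, axis=0):
    """Assumed S804/S805 form (the patent's exact formula is not
    reproduced in the text): sum of 1 / (1 + |coordinate difference|)
    over the first PNum paired corners; axis=0 compares abscissas
    (S804), axis=1 compares ordinates (S805)."""
    pnum = min(len(P), len(Q))
    return sum(1.0 / (1.0 + abs(P[k][axis] - Q[k][axis]))
               for k in range(pnum))
```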
Further, in S800, the value of the intensity threshold is 0.5 to 0.8 times of PNum or 0.5 to 0.8 times of the number of corner points on the contour line in the vector contour line of the jth letter standard image.
Further, in S1000, the Manchu words comprising one or more of the letters corresponding to the letter standard images marked as associated letters are output from the database through the output characters and their numbers.
The invention also provides a Manchu recognition system, which comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the computer program to run in the units of the following system:
an alphabetical image reading unit for reading each alphabetical standard image;
the Manchu image acquisition unit is used for acquiring the Manchu image and carrying out binarization to obtain a binary image;
the image preprocessing unit is used for filtering the binary image, extracting a salient region after filtering and carrying out edge detection on the salient region to obtain an image to be identified;
the parameter initialization unit is used for setting the initial value of a variable i to be 1, the value range of i is [1, N ], i is a natural number, and the value of N is the ratio of the size of the image to be recognized to the size of the sliding window; setting an initial value of a variable j as 1, setting a value range of j as [1, M ], wherein M is the total number of letter standard images, and the width of a sliding window is W;
the sliding image extraction unit is used for searching an image to be identified by a sliding window method, searching the longest line segment in the sliding window through Hough transform to serve as the ith line segment, calculating the angle of an included angle between the ith line segment and the ordinate axis of the image matrix along the clockwise direction to serve as the ith angle, rotating the image in the sliding window by the ith angle in the anticlockwise direction to serve as the ith comparison image when the length of the ith line segment is larger than or equal to K, and enabling K = 0.2W;
the standard line segment extraction unit is used for filtering the binarized image of the jth letter standard image, extracting a salient region after filtering, performing edge detection on the salient region to obtain the image to be recognized of the jth letter standard image, and finding the longest line segment in that image through the Hough transform as the standard line segment;
the contour line extracting unit is used for scaling the ith comparison image according to the proportion between the standard line segment and the ith line segment and respectively extracting the vector contour lines of the ith comparison image and the jth letter standard image;
the connection strength output unit is used for calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image; when the connection strength is larger than the strength threshold value, marking the jth letter standard image as an associated letter and transferring to a window sliding unit; when the connection strength is less than or equal to the strength threshold value, increasing the value of j by 1, turning to a window sliding unit when the value of j is greater than M, and turning to a standard line segment extracting unit when the value of j is less than or equal to M;
the window sliding unit is used for increasing the value of i by 1, setting the value of j to be 1, sliding the sliding window by the distance of W, switching to the sliding image extracting unit when the value of i is less than or equal to N, and switching to the result output unit when the value of i is greater than N;
and the result output unit is used for outputting the characters and the numbers of the letters corresponding to the letter standard images marked as the associated letters in the database.
The invention has the following beneficial effects: the invention provides a Manchu recognition method and system that recognize letters locally and adjust the image locally through a sliding window, solving the low recognition accuracy caused by cursive joining and deformation of letters. The method can quickly and accurately recognize and locate Manchu characters in images or handwriting, reduces computational complexity, improves recognition precision, and can quickly and accurately output the corresponding characters. It avoids the inaccurate letter recognition caused by the absence of gaps between letters within a word, indexes partial letter regions with higher accuracy, ensures the reliability of letter recognition through local recognition, and reduces the probability of false detection.
Drawings
The above and other features of the present invention will become more apparent by describing in detail embodiments thereof with reference to the attached drawings in which like reference numerals designate the same or similar elements, it being apparent that the drawings in the following description are merely exemplary of the present invention and other drawings can be obtained by those skilled in the art without inventive effort, wherein:
FIG. 1 is a flow chart of a Manchu recognition method;
FIG. 2 shows the sliding path of the sliding window over a Manchu text image;
fig. 3 is a block diagram of a Manchu recognition system.
Detailed Description
The conception, the specific structure and the technical effects of the present invention will be clearly and completely described in conjunction with the embodiments and the accompanying drawings to fully understand the objects, the schemes and the effects of the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Fig. 1 is a flow chart of a Manchu recognition method according to the present invention, and a Manchu recognition method according to an embodiment of the present invention is described below with reference to fig. 1.
The invention provides a Manchu recognition method, which specifically comprises the following steps:
s100, reading each letter standard image;
s200, collecting a Manchu image and carrying out binarization to obtain a binarized image;
S300, filtering the binarized image, extracting a salient region after filtering, and performing edge detection on the salient region to obtain the image to be recognized;
s400, setting an initial value of a variable i as 1, setting a numeric area of i as [1, N ], wherein i is a natural number, and the value N is a ratio of the size of the image to be recognized to the size of the sliding window; setting an initial value of a variable j as 1, setting a value range of j as [1, M ], wherein M is the total number of letter standard images, and the width of a sliding window is W;
S500, searching the image to be recognized by the sliding window method; finding the longest line segment within the sliding window through the Hough Transform and taking it as the ith line segment; calculating the clockwise angle between the ith line segment and the ordinate axis (y axis) of the image matrix as the ith angle; when the length of the ith line segment is greater than or equal to K, rotating the image in the sliding window counterclockwise by the ith angle and taking the result as the ith comparison image (when the length of the ith line segment is less than K, there are no Manchu letters to be recognized in the sliding window area), where K = 0.2W;
s600, filtering the binary image of the jth letter standard image, extracting a significant region after filtering, carrying out edge detection on the significant region to obtain an image to be recognized of the jth letter standard image, and searching the longest line segment in the image to be recognized of the jth letter standard image as a standard line segment through Hough transformation;
s700, zooming the ith comparison image according to the proportion between the standard line segment and the ith line segment, and respectively extracting the vector contour lines of the ith comparison image and the jth letter standard image;
s800, calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image; when the connection strength is greater than the strength threshold, marking the jth letter standard image as an associated letter and going to step S900; when the connection strength is less than or equal to the strength threshold, increasing the value of j by 1, going to step S900 when j is greater than M, and going to step S600 when j is less than or equal to M;
s900, increasing the value of i by 1, setting the value of j to be 1, sliding the sliding window by the distance of W, and turning to the step S500 when the value of i is less than or equal to N, and turning to the step S1000 when the value of i is greater than N;
s1000, outputting all letter standard images marked as associated letters;
further, the characters and numbers of the letters corresponding to all the letter standard images marked as associated letters in the database are output.
Further, in S100, the letter standard images are images of the Manchu alphabet pre-stored in a database. They are constructed with reference to the Manchu alphabet and the national standard of the People's Republic of China "Information technology — Universal multiple-octet coded character set — Sibe and Manchu characters". A Manchu letter has 4 positional forms: isolated, initial, medial, and final; a Manchu word consists of one or more letters, and the letter standard images are images of the Manchu letters in their different forms, 114 in total. The database also stores the character and number of the letter corresponding to each letter standard image, and the component numbers of the constituent letters of each Manchu word, so that the corresponding Manchu character can be output from one or more letters.
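As a toy illustration of this lookup (all names and data below are hypothetical, not the patent's database schema), mapping a sequence of recognized letter numbers to the word whose stored component numbers match might look like:

```python
def lookup_word(letter_numbers, components):
    """Hedged sketch of the database lookup implied by S100/S1000: find the
    Manchu word whose stored component-number list equals the sequence of
    recognized letter numbers. The toy dict format is illustrative only.
    """
    for word, comp in components.items():
        if comp == letter_numbers:
            return word
    return None  # no word with exactly these constituent letters
```

For example, with a miniature database `{"wordA": [1, 2], "wordB": [3]}`, the sequence `[1, 2]` resolves to `"wordA"`.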
Further, in S200, the Manchu image is acquired by scanning a Manchu book with a line-scan camera or a scanner, or by handwriting input.
Further, in S300, the filtering method is any one of mean filtering, median filtering, or Gaussian filtering; the method for extracting the significant region is any one of the AC, HC, LC, or FT algorithms;
wherein the AC algorithm, HC algorithm, LC algorithm or FT algorithm is implemented according to the following references:
AC algorithm: Achanta R, Estrada F, Wils P, et al. Salient Region Detection and Segmentation [C]// Proceedings of the 6th International Conference on Computer Vision Systems. Springer, Berlin, Heidelberg, 2008.
HC algorithm: Cheng M M, Mitra N J, Huang X, et al. Global Contrast Based Salient Region Detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
LC algorithm: Zhai Y, Shah M. Visual Attention Detection in Video Sequences Using Spatiotemporal Cues [C]// ACM Multimedia, 2006.
FT algorithm: Achanta R, Hemami S, Estrada F, Süsstrunk S. Frequency-tuned Salient Region Detection [C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp. 1597-1604, 2009.
Further, edge detection of the salient region means edge detection of the pixel region of the image that contains Manchu characters.
Further, in S500, the sliding window method searches the image to be recognized through a sliding window. As shown in Fig. 2, which shows the sliding path of the window on a Manchu image, the rectangular frame is the sliding window, and the numbers 1 to 20 give the order of the slides; the Manchu image portion captured in the window after each slide is processed into the ith comparison image, and the step of each slide is the size of the sliding window. The size of the sliding window is between 0.01 and 1 times the size of the image to be recognized; equivalently, the sliding window is W × W in height and width, where W is 0.01 to 1 times the width of the image to be recognized and is adjusted according to the number of Manchu characters in the image. Each slide moves by the window width; after one row of the image matrix has been traversed, the window automatically jumps down by the window height to the next row of the image matrix (scanning continues transversely from the pixel area not yet covered by the window) and sliding continues. Note: height and width (also called length and width) are in units of pixels.
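The row-major traversal of Fig. 2 (slide right by W across a row, then jump down by the window height) reduces to a simple generator. A sketch under the assumption of a square W × W window:

```python
def sliding_windows(img_h, img_w, w):
    """Yield (top, left) corners of W-by-W windows in the row-major order of
    Fig. 2: left to right across a row, then jump down one window height."""
    for top in range(0, img_h - w + 1, w):
        for left in range(0, img_w - w + 1, w):
            yield (top, left)
```

On a 4 × 4 image with W = 2 this visits the four quadrants in reading order.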
Further, in S700, when scaling the ith comparison image according to the ratio between the standard line segment and the ith line segment, a margin of at least 8 pixels is left at the edge of the ith comparison image, so as to obtain an image of size 80 × 80.
Further, in S700, the method for extracting the vector contour line of the ith comparison image is as follows: in the ith comparison image, starting from the end point of the ith line segment closest to the ordinate axis, calculating the curvature value at each edge point on the ith line segment; calculating the average of all curvature values, and taking all edge points whose curvature value is larger than the average as corner points, forming the large-curvature point set; then sequentially connecting the corner points in the large-curvature point set.
Further, in S700, the method for extracting the vector contour line of the jth letter standard image is as follows: starting from the end point of the standard line segment closest to the ordinate axis, calculating the curvature value at each edge point on the standard line segment; calculating the average of all curvature values, and taking all edge points whose curvature value is larger than the average as corner points, forming the large-curvature point set; then sequentially connecting the corner points in the large-curvature point set.
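The corner extraction described above (curvature at each edge point, thresholded by the mean) can be sketched with a discrete turning-angle proxy for curvature. The patent does not specify a curvature estimator, so the three-point angle used below is an assumption:

```python
import math

def turn_angle(prev, cur, nxt):
    """Discrete curvature proxy at `cur`: the absolute turning angle between
    the incoming and outgoing directions along the edge chain."""
    a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def large_curvature_points(chain):
    """Keep the edge points whose curvature exceeds the mean curvature,
    i.e. the 'large-curvature point set' of S700. `chain` is an ordered
    list of (x, y) edge points starting from the segment end point."""
    if len(chain) < 3:
        return []
    curv = [turn_angle(chain[i - 1], chain[i], chain[i + 1])
            for i in range(1, len(chain) - 1)]
    mean = sum(curv) / len(curv)
    return [chain[i + 1] for i, c in enumerate(curv) if c > mean]
```

On an L-shaped chain, only the bend survives the mean threshold, which is the intended behaviour: straight runs contribute no corner points.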
Further, the method for sequentially connecting the corner points in the large-curvature point set whose curvature value is larger than the average comprises the following steps:
s701, setting the coordinates of the corner point with the minimum abscissa (x-axis value) in the large-curvature point set as (Xmin, Ymin) and the coordinates of the corner point with the maximum abscissa as (Xmax, Ymax); setting the width of each interval along the abscissa axis to span pixels, where span is an integer between 10 and 100; setting the initial value of variable h to 0 and of variable r to 1, with h and r natural numbers;
s702, sorting the corner points in the large-curvature point set by abscissa from small to large; except for the corner points with the maximum and minimum abscissa, each corner point in the large-curvature point set is given a connection mark Linkmark and an array mark ArrayMark, both set to 1; the connection mark of the corner point with the minimum abscissa is set to 2 (i.e. Linkmark = 2 for the first corner point of the large-curvature point set), and the connection mark of the corner point with the maximum abscissa is set to 3 (i.e. Linkmark = 3 for the last corner point of the large-curvature point set);
s703, letting the corner points whose abscissa lies in the interval from Xmin + (h-1)·span to Xmin + h·span be the r-th layer connected interval, and letting the interval from Xmin + h·span to Xmin + (h+1)·span of the abscissa range be the (r+1)-th layer interval to be connected;
s704, if a corner point with Linkmark = 2 exists in the r-th layer connected interval (i.e. the first corner point of the large-curvature point set, the leftmost boundary point of the image), connecting it by vector lines to all corner points with Linkmark = 1 in the (r+1)-th layer interval to be connected, setting the connection marks Linkmark of all corner points of both intervals to 0, and going to step S705 (this branch is taken only at the first corner point); otherwise, if no corner point with Linkmark = 2 exists in the r-th layer connected interval, going to step S705;
s705, if a corner point with Linkmark = 3 exists in the (r+1)-th layer interval to be connected (i.e. the last corner point of the large-curvature point set, the rightmost boundary point of the image), connecting all corner points with Linkmark = 0 in the r-th connected interval to it by vector lines, setting its connection mark Linkmark to 0 (this branch is taken only at the last corner point), and going to step S711 (i.e. the connection process ends);
if no corner point with Linkmark = 3 exists in the (r+1)-th layer interval to be connected, judging whether the connection marks Linkmark of all corner points in the large-curvature point set equal 0 (i.e. the connection process has ended); if so, going to step S711, otherwise going to step S706;
s706, putting the corner points with Linkmark = 0 in the r-th layer connected interval into an array, the connected array; putting the corner points with Linkmark = 1 in the (r+1)-th layer interval to be connected into another array, the array to be connected; sorting the corner points in the connected array and the array to be connected by ordinate from small to large;
s707, taking the corner point with the largest ordinate among the corner points with ArrayMark = 1 in the connected array as the first starting point, and the corner point with the largest ordinate among the corner points with ArrayMark = 1 in the array to be connected as the first end point; connecting the first starting point and the first end point by a vector line, and setting their ArrayMark to 0 (this step connects, in turn, the corner points highest on the y axis);
s708, taking the corner point with the smallest ordinate among the corner points with ArrayMark = 1 in the connected array as the second starting point, and the corner point with the smallest ordinate among the corner points with ArrayMark = 1 in the array to be connected as the second end point; connecting the second starting point and the second end point by a vector line, and setting their ArrayMark to 0 (this step connects, in turn, the corner points lowest on the y axis);
s710, when the array marks ArrayMark of all corner points in either the connected array or the array to be connected equal 0 (i.e. all corner points in one of the two abscissa ranges have been connected), increasing the variables h and r by 1, setting the ArrayMark of all corner points in the array to be connected to 1 (in preparation for the next connection: after h is increased, the array to be connected becomes the connected array), setting the Linkmark of all corner points in both arrays to 0, and going to S703 (i.e. connecting the next abscissa range); otherwise (i.e. neither array is fully connected; ArrayMark = 0 for all corner points of an array means its connections are complete), going to step S706 (continuing to connect corner points in the connected array and the array to be connected);
s711, outputting the vector contour line composed of all corner points connected by vector lines (the frame connecting the corner points of the glyph).
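The layered connection of S701–S711 hinges on grouping the corner points into span-wide abscissa intervals (S703). A minimal sketch of that bucketing step, with hypothetical names:

```python
def bucket_by_span(corners, span):
    """Group corner points into the span-wide abscissa intervals of S703.

    `corners` is a list of (x, y) tuples; the return value maps a 0-based
    layer index r to the corner points whose x falls in
    [Xmin + r*span, Xmin + (r+1)*span).
    """
    xmin = min(x for x, _ in corners)
    layers = {}
    for x, y in corners:
        layers.setdefault((x - xmin) // span, []).append((x, y))
    return layers
```

Successive layer indices then play the roles of the "connected interval" and the "interval to be connected" as h and r advance.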
Further, in S800, the method of calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image comprises the following steps:
s801, letting the vector contour line of the ith comparison image be P and the vector contour line of the jth letter standard image be Q; superimposing P and Q with their centers of gravity aligned;
wherein P = {p1, p2, …, pk | k > 0}, k being the number of corner points on the vector contour line of the ith comparison image, and Q = {q1, q2, …, qn | n > 0}, n being the number of corner points on the vector contour line of the jth letter standard image; pk and qn are corner points on the contour lines; p1 and q1 are the corner points of P and Q closest to the ordinate axis, and in the two sequences p1, p2, …, pk and q1, q2, …, qn the distance of each corner point from the ordinate axis increases with the subscript;
s802, sequentially connecting, by vector lines, each pair of mutually nearest corner points of P and Q; the resulting set of edges is the connection edge set V = {(pFC, qFC)}, where FC is the serial number of the edge, with value range [1, PNum]; PNum is a constant equal to the smaller of k and n;
s803, calculating the distances from the end points pFC of all connection edges in the connection edge set to p1, taking the computed distances in order as the first distance set; calculating the distances from the end points qFC of all connection edges to q1, taking the computed distances in order as the second distance set; computing in order the difference between each distance element of the first set and the corresponding distance element of the second set; counting how many of the differences are positive and how many are negative; when the positives outnumber the negatives, going to step S804, otherwise going to step S805; the distance elements are the distance values in the sets;
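The positive/negative counting of S803, which selects between the x-based strength of S804 and the y-based strength of S805, can be sketched directly (helper name hypothetical):

```python
def choose_axis(first_dists, second_dists):
    """S803: compare paired distance elements; if the differences are mostly
    positive, the x-based strength (S804) is used, otherwise the y-based
    one (S805)."""
    diffs = [a - b for a, b in zip(first_dists, second_dists)]
    positives = sum(1 for d in diffs if d > 0)
    negatives = sum(1 for d in diffs if d < 0)
    return "x" if positives > negatives else "y"
```

Ties fall through to the y branch, matching the "otherwise go to S805" wording.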
s804, calculating the connection strength S of the connection edge set V and going to the step S806,
wherein the connection strength S is given by a formula rendered as an image in the original publication, together with its similarity function (also an image); neither is reproduced in this text. Here |pFC − qFC|x denotes the absolute difference between the abscissa (x-axis) values of points pFC and qFC;
s805, calculating the connection strength S of the connection edge set V and going to step S806,
wherein the formula and its similarity function are likewise rendered as images in the original publication and are not reproduced in this text; |qFC − pFC|y denotes the absolute difference between the ordinate (y-axis) values of points qFC and pFC;
And S806, outputting the connection strength S.
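The exact connection-strength formulas of S804/S805 are printed as images in the original publication and are not recoverable from the text. The sketch below is therefore only a stand-in with the same inputs: it sums a hypothetical inverse-distance similarity over the PNum paired corner points, using the coordinate difference along the chosen axis:

```python
def connection_strength(P, Q, axis=0):
    """Hedged sketch of S804/S805: sum a similarity over paired corner
    points of two contour lines. The original similarity function is not
    reproduced in the text; 1/(1+d) on the chosen coordinate difference
    (axis=0 for x as in S804, axis=1 for y as in S805) is a stand-in
    with the same inputs, not the patented formula.
    """
    pnum = min(len(P), len(Q))  # PNum: the smaller corner count (S802)
    return sum(1.0 / (1.0 + abs(P[i][axis] - Q[i][axis]))
               for i in range(pnum))
```

With this stand-in, identical contours score exactly PNum, consistent with the threshold in the next paragraph being a fraction of PNum.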
Further, in S800, the strength threshold is 0.5 to 0.8 times PNum, or 0.5 to 0.8 times the number of corner points on the vector contour line of the jth letter standard image.
Further, in S1000, the Manchu characters composed of one or more letters corresponding to the letter standard images marked as associated letters are output from the database by means of the output characters and their numbers.
A Manchu recognition system provided by an embodiment of the present invention is shown in the structure diagram of Fig. 3. The Manchu recognition system of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the Manchu recognition method embodiment described above when executing the computer program.
The system comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the computer program to run in the units of the following system:
an alphabetical image reading unit for reading each alphabetical standard image;
the Manchu image acquisition unit is used for acquiring the Manchu image and carrying out binarization to obtain a binary image;
the image preprocessing unit is used for filtering the binary image, extracting a salient region after filtering and carrying out edge detection on the salient region to obtain an image to be identified;
the parameter initialization unit is used for setting the initial value of a variable i to be 1, the value range of i is [1, N ], i is a natural number, and the value of N is the ratio of the size of the image to be recognized to the size of the sliding window; setting an initial value of a variable j as 1, setting a value range of j as [1, M ], wherein M is the total number of letter standard images, and the width of a sliding window is W;
the sliding image extraction unit is used for searching an image to be identified by a sliding window method, searching the longest line segment in the sliding window through Hough transform to serve as the ith line segment, calculating the angle of an included angle between the ith line segment and the ordinate axis of the image matrix along the clockwise direction to serve as the ith angle, rotating the image in the sliding window by the ith angle in the anticlockwise direction to serve as the ith comparison image when the length of the ith line segment is larger than or equal to K, and enabling K = 0.2W;
the standard line segment extraction unit is used for filtering the binary image of the jth letter standard image, extracting a significant region after filtering, carrying out edge detection on the significant region to obtain an image to be recognized of the jth letter standard image, and searching the longest line segment in the image to be recognized of the jth letter standard image as a standard line segment through Hough transformation;
the contour line extracting unit is used for scaling the ith comparison image according to the proportion between the standard line segment and the ith line segment and respectively extracting the vector contour lines of the ith comparison image and the jth letter standard image;
the connection strength output unit is used for calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image; when the connection strength is larger than the strength threshold value, marking the jth letter standard image as an associated letter and transferring to a window sliding unit; when the connection strength is less than or equal to the strength threshold value, increasing the value of j by 1, turning to a window sliding unit when the value of j is greater than M, and turning to a standard line segment extracting unit when the value of j is less than or equal to M;
the window sliding unit is used for increasing the value of i by 1, setting the value of j to be 1, sliding the sliding window by the distance of W, switching to the sliding image extracting unit when the value of i is less than or equal to N, and switching to the result output unit when the value of i is greater than N;
and the result output unit is used for outputting the characters and the numbers of the letters corresponding to the letter standard images marked as the associated letters in the database.
The Manchu recognition system can be operated on computing equipment such as desktop computers, notebooks, palm computers, and cloud servers. The operating system may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the above is merely an example of a Manchu recognition system and does not limit it; the system may include more or fewer components than listed, or combine some components, or use different components, such as an input-output device, a network access device, a bus, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor being the control center of the operating system of the Manchu recognition system, and various interfaces and lines connecting the various parts of the entire operating system of the Manchu recognition system.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the Manchu recognition system by running or executing the computer programs and/or modules stored in the memory and by invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function, an image playing function, etc.), and the data storage area may store data created according to use of the device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid-state storage device.
Although the present invention has been described in considerable detail and with reference to certain illustrated embodiments, it is not intended to be limited to any such details or embodiments or any particular embodiment, so as to effectively encompass the intended scope of the invention. Furthermore, the foregoing describes the invention in terms of embodiments foreseen by the inventor for which an enabling description was available, notwithstanding that insubstantial modifications of the invention, not presently foreseen, may nonetheless represent equivalent modifications thereto.

Claims (10)

1. A Manchu recognition method, comprising:
s100, reading each letter standard image;
s200, collecting a Manchu image and carrying out binarization to obtain a binarized image;
s300, filtering the binary image, extracting a significant region after filtering, and performing edge detection on the significant region to obtain an image to be identified;
s400, setting an initial value of a variable i as 1, setting a numeric area of i as [1, N ], wherein i is a natural number, and the value N is a ratio of the size of the image to be recognized to the size of the sliding window; setting an initial value of a variable j as 1, setting a value range of j as [1, M ], wherein M is the total number of letter standard images, and the width of a sliding window is W;
s500, searching an image to be identified by a sliding window method, searching the longest line segment in the sliding window as an ith line segment through Hough transformation, calculating an included angle between the ith line segment and the ordinate axis of the image matrix along the clockwise direction as an ith angle, and when the length of the ith line segment is greater than or equal to K, rotating the image in the sliding window counterclockwise by the ith angle to obtain the image in the sliding window as an ith comparison image, wherein K = 0.2W;
s600, filtering the binary image of the jth letter standard image, extracting a significant region after filtering, carrying out edge detection on the significant region to obtain an image to be recognized of the jth letter standard image, and searching the longest line segment in the image to be recognized of the jth letter standard image as a standard line segment through Hough transformation;
s700, zooming the ith comparison image according to the proportion between the standard line segment and the ith line segment, and respectively extracting the vector contour lines of the ith comparison image and the jth letter standard image;
s800, calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image; when the connection strength is greater than the strength threshold value, marking the jth letter standard image as an associated letter and turning to the step S900; when the connection strength is less than or equal to the strength threshold value, increasing the value of j by 1, turning to the step S900 when the value of j is greater than M, and turning to the step S600 when the value of j is less than or equal to M;
s900, increasing the value of i by 1, setting the value of j to be 1, sliding the sliding window by the distance of W, and turning to the step S500 when the value of i is less than or equal to N, and turning to the step S1000 when the value of i is greater than N;
and S1000, outputting all the letter standard images marked as the associated letters.
2. The Manchu recognition method according to claim 1, wherein in S500, the sliding window method searches the image to be recognized through a sliding window, and the step of each slide is the size of the sliding window; the size of the sliding window is between 0.01 and 1 times the size of the image to be recognized, or the sliding window is W × W in height and width, where W is 0.01 to 1 times the width of the image to be recognized and is adjusted according to the number of Manchu characters in the image to be recognized; each slide moves by the window width, and after one row of the image matrix has been traversed, the window automatically jumps down by the window height to the next row of the image matrix and continues sliding.
3. The Manchu recognition method of claim 1, wherein in S700, when scaling the ith comparison image according to a ratio between the standard line segment and the ith line segment, the image is obtained after at least 8 pixels of the edge of the ith comparison image are left blank.
4. The Manchu recognition method of claim 3, wherein in S700, the method for extracting the vector contour line of the ith comparison image comprises: in the ith comparison image, starting from the end point of the ith line segment closest to the ordinate axis, calculating the curvature value at each edge point on the ith line segment; calculating the average of all curvature values, and taking all edge points whose curvature value is larger than the average as corner points, forming the large-curvature point set; then sequentially connecting the corner points in the large-curvature point set.
5. The Manchu recognition method of claim 4, wherein in S700, the method for extracting the vector contour line of the jth letter standard image comprises: starting from the end point of the standard line segment closest to the ordinate axis, calculating the curvature value at each edge point on the standard line segment; calculating the average of all curvature values, and taking all edge points whose curvature value is larger than the average as corner points, forming the large-curvature point set; then sequentially connecting the corner points in the large-curvature point set.
6. A Manchu recognition method according to claim 4 or claim 5, wherein the method of sequentially connecting edge points having a curvature value larger than the average value in the large-curvature point set comprises the steps of:
s701, setting the coordinates of the corner point with the minimum abscissa in the large-curvature point set as (Xmin, Ymin) and the coordinates of the corner point with the maximum abscissa as (Xmax, Ymax); setting the width of each interval along the abscissa axis to span pixels, where span is an integer between 10 and 100; setting the initial value of variable h to 0 and of variable r to 1, with h and r natural numbers;
s702, sorting the corner points in the large-curvature point set by abscissa from small to large; except for the corner points with the maximum and minimum abscissa, each corner point in the large-curvature point set is given a connection mark Linkmark and an array mark ArrayMark, both set to 1; the connection mark Linkmark of the corner point with the minimum abscissa is set to 2, and the connection mark Linkmark of the corner point with the maximum abscissa is set to 3;
s703, making the corner point in the interval from Xmin + (h-1) span to Xmin + h span of the abscissa axis value range of the image matrix be the r-th layer connected interval, and making the interval from Xmin + h span to Xmin + (h +1) span of the abscissa axis value range be the r + 1-th layer interval to be connected;
s704, if there is a corner point Linkmark =2 in the r-th layer connected section, connecting the corner point Linkmark =2 with the corner point of all Linkmark =1 of the r + 1-th layer section to be connected by a vector line, setting the connection mark Linkmark of all the corner points of the r-th layer connected section and the r + 1-th layer section to be connected to 0, and going to step S705, if there is no corner point Linkmark =2 in the r-th layer connected section, going to step S705;
s705, if corner points of Linkmark =3 exist in the r + 1-th section to be connected, connecting all corner points marked with Linkmark =0 in the r-th section to be connected to the corner points of Linkmark =3 by vector lines; setting a link mark Linkmark of the corner point where Linkmark =3 to 0, and proceeding to step S711;
if no edge point with Linkmark =3 exists in the r + 1-th layer to-be-connected interval, judging whether the connection marks Linkmark of all edge points in the large-curvature point set are all equal to 0, if so, turning to the step S711, and if not, turning to the step S706;
s706, inputting corner points of Linkmark =0 in the r-th layer connected interval into an array as a connected array; inputting corner points of Linkmark =1 in the r + 1-th section to be connected into another array as an array to be connected; sequencing the corner points in the connected array and the array to be connected from small to large according to the value of the ordinate;
S707, taking the corner point with the largest ordinate value among the corner points with array mark ArrayMark = 1 in the connected array as the first starting point, taking the corner point with the largest ordinate value among the corner points with ArrayMark = 1 in the array to be connected as the first end point, connecting the first starting point and the first end point by a vector line, and setting the ArrayMark of the first starting point and the first end point to 0;
S708, taking the corner point with the smallest ordinate value among the corner points with ArrayMark = 1 in the connected array as the second starting point, taking the corner point with the smallest ordinate value among the corner points with ArrayMark = 1 in the array to be connected as the second end point, connecting the second starting point and the second end point by a vector line, and setting the ArrayMark of the second starting point and the second end point to 0;
S710, when the ArrayMark of every corner point in either the connected array or the array to be connected equals 0, increasing the variables h and r by 1, setting the ArrayMark of all corner points in the array to be connected to 1, setting the Linkmark of all corner points in the array to be connected and the connected array to 0, and going to step S703; otherwise, going to step S706;
S711, outputting the vector contour line formed by all the corner points connected by vector lines.
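As a minimal illustration of the bookkeeping in steps S702–S703, the Python sketch below sorts a corner set, assigns the Linkmark/ArrayMark values, and selects one abscissa layer. The sample points and the span value are hypothetical; the claim does not prescribe an implementation.

```python
# Hypothetical sketch of S702-S703: sort corners by abscissa, assign the two
# marks, and bucket corners into abscissa layers. Names (span, Xmin, Linkmark,
# ArrayMark) follow the claim text; the data is invented for illustration.

def init_marks(corners):
    """S702: sort corners by abscissa and assign Linkmark/ArrayMark.

    corners: list of (x, y) tuples from the large-curvature point set.
    """
    corners = sorted(corners, key=lambda p: p[0])
    marked = [{"pt": p, "Linkmark": 1, "ArrayMark": 1} for p in corners]
    marked[0]["Linkmark"] = 2    # corner with the minimum abscissa
    marked[-1]["Linkmark"] = 3   # corner with the maximum abscissa
    return marked

def layer(marked, xmin, span, h):
    """S703: corners whose abscissa lies in [xmin+(h-1)*span, xmin+h*span)."""
    lo, hi = xmin + (h - 1) * span, xmin + h * span
    return [c for c in marked if lo <= c["pt"][0] < hi]

pts = [(0, 5), (1, 2), (3, 7), (4, 1), (6, 3)]
m = init_marks(pts)
print(len(layer(m, 0, 2, 1)))  # prints 2: corners with x in [0, 2)
```

The layer width `span` would in practice be derived from the image width and the desired number of layers; here it is fixed at 2 only to make the example concrete.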
7. The Manchu recognition method of claim 6, wherein in S800, the method for calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image comprises the following steps:
S801, letting the vector contour line of the ith comparison image be P and the vector contour line of the jth letter standard image be Q; superposing P and Q with their centers of gravity as the common center;
wherein P = {p1, p2, …, pk | k > 0}, k being the number of corner points on the contour lines in the vector contour line of the ith comparison image, and Q = {q1, q2, …, qn | n > 0}, n being the number of corner points on the contour lines in the vector contour line of the jth letter standard image; pk and qn are corner points on the contour lines; p1 and q1 are the corner points of P and Q, respectively, with the minimum distance to the ordinate axis; in the two sequences p1, p2, …, pk and q1, q2, …, qn, the distance of each corner point to the ordinate axis increases with the subscript index;
S802, connecting, in turn by vector lines, each pair of mutually nearest corner points of P and Q, the resulting edges forming the connection edge set V = {(pFC, qFC)}, where FC is the serial number of the connection edge, with value range [1, PNum]; PNum is a constant equal to the smaller of k and n;
S803, calculating the distances from the end points pFC of all connection edges in the connection edge set to p1, taking the calculated distances in order as the first distance set; calculating the distances from the end points qFC of all connection edges in the connection edge set to q1, taking the calculated distances in order as the second distance set; calculating in turn the difference between each distance element of the first distance set and the corresponding distance element of the second distance set; counting how many of the differences are positive and how many are negative; when the number of positive differences is larger than the number of negative differences, going to step S804, otherwise going to step S805; the distance elements are the distance values in the sets;
S804, calculating the connection strength S of the connection edge set V and going to step S806,

where

[equation image in the original publication, not reproduced here]

with the similarity function

[equation image in the original publication, not reproduced here]

and |pFC − qFC|x denoting the difference between the abscissa value of point pFC and the abscissa value of point qFC;
S805, calculating the connection strength S of the connection edge set V and going to step S806,

where

[equation image in the original publication, not reproduced here]

with the similarity function

[equation image in the original publication, not reproduced here]

and |qFC − pFC|y denoting the difference between the ordinate value of point qFC and the ordinate value of point pFC;
and S806, outputting the connection strength S.
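The pairing and sign-counting of steps S801–S803 can be sketched as below. The similarity functions of S804/S805 survive only as equation images in the original publication, so the sketch stops at the sign test; pairing corners by shared index is a simplification assumed for illustration.

```python
# Hypothetical sketch of S801-S803: build the connection edge set V, compare
# the distance profiles back to p1 / q1, and count the sign of the differences.
# The actual similarity function of S804/S805 is NOT reproduced here.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def pair_and_count(P, Q):
    """Return the edge set V plus the positive/negative difference counts
    that select between the S804 and S805 branches."""
    pnum = min(len(P), len(Q))               # PNum in the claim
    V = [(P[i], Q[i]) for i in range(pnum)]  # index pairing, simplified
    d1 = [dist(p, P[0]) for p, _ in V]       # first distance set: pFC -> p1
    d2 = [dist(q, Q[0]) for _, q in V]       # second distance set: qFC -> q1
    diffs = [a - b for a, b in zip(d1, d2)]
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    return V, pos, neg                       # pos > neg selects the S804 branch

P = [(0, 0), (1, 0), (3, 0)]
Q = [(0, 0), (2, 0), (3, 1)]
V, pos, neg = pair_and_count(P, Q)
print(pos, neg)  # prints: 0 2 -> the S805 branch would be taken
```

In the claimed method the edges connect mutually nearest corner points rather than equal indices; the counting logic is otherwise the same.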
8. The method according to claim 7, wherein in S800, the strength threshold is 0.5 to 0.8 times PNum, or 0.5 to 0.8 times the number of corner points on the contour lines in the vector contour line of the jth letter standard image.
9. A Manchu recognition method according to claim 1, wherein in S1000, the Manchu characters containing one or more of the letters corresponding to the letter standard images marked as associated letters are output from the database, together with their numbers.
10. A Manchu recognition system, the system comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor, when executing the computer program, operating the units of the following system:
a letter image reading unit, used for reading each letter standard image;
the Manchu image acquisition unit is used for acquiring the Manchu image and carrying out binarization to obtain a binary image;
the image preprocessing unit is used for filtering the binary image, extracting a salient region after filtering and carrying out edge detection on the salient region to obtain an image to be identified;
the parameter initialization unit is used for setting the initial value of a variable i to be 1, the value range of i is [1, N ], i is a natural number, and the value of N is the ratio of the size of the image to be recognized to the size of the sliding window; setting an initial value of a variable j as 1, setting a value range of j as [1, M ], wherein M is the total number of letter standard images, and the width of a sliding window is W;
the sliding image extraction unit, used for searching the image to be identified by a sliding window method, searching for the longest line segment in the sliding window through Hough transform as the ith line segment, calculating the clockwise angle between the ith line segment and the ordinate axis of the image matrix as the ith angle, and, when the length of the ith line segment is greater than or equal to K, rotating the image in the sliding window counterclockwise by the ith angle to obtain the ith comparison image, where K = 0.2W;
the standard line segment extraction unit is used for filtering the binary image of the jth letter standard image, extracting a significant region after filtering, carrying out edge detection on the significant region to obtain an image to be recognized of the jth letter standard image, and searching the longest line segment in the image to be recognized of the jth letter standard image as a standard line segment through Hough transformation;
the contour line extracting unit is used for scaling the ith comparison image according to the proportion between the standard line segment and the ith line segment and respectively extracting the vector contour lines of the ith comparison image and the jth letter standard image;
the connection strength output unit is used for calculating the connection strength between the vector contour lines of the ith comparison image and the jth letter standard image; when the connection strength is larger than the strength threshold value, marking the jth letter standard image as an associated letter and transferring to a window sliding unit; when the connection strength is less than or equal to the strength threshold value, increasing the value of j by 1, turning to a window sliding unit when the value of j is greater than M, and turning to a standard line segment extracting unit when the value of j is less than or equal to M;
the window sliding unit is used for increasing the value of i by 1, setting the value of j to be 1, sliding the sliding window by the distance of W, switching to the sliding image extracting unit when the value of i is less than or equal to N, and switching to the result output unit when the value of i is greater than N;
and the result output unit is used for outputting the characters and the numbers of the letters corresponding to the letter standard images marked as the associated letters in the database.
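The decision rule of the sliding image extraction unit (longest segment, clockwise angle to the ordinate axis, acceptance when length ≥ K = 0.2W) can be sketched as follows. In the claimed system the segments come from a Hough transform of the window contents; here segment endpoints are supplied directly, and all names are illustrative.

```python
# Hypothetical sketch of the sliding-window acceptance test: pick the longest
# candidate segment, measure its angle to the vertical axis, and accept the
# window only when the segment is at least K = 0.2 * W long.
import math

def segment_angle_to_y_axis(x1, y1, x2, y2):
    """Angle in degrees between the segment and the ordinate (vertical) axis,
    folded into [0, 180)."""
    ang = math.degrees(math.atan2(x2 - x1, y2 - y1))
    return ang % 180

def qualifies(segments, window_width):
    """Return (segment, angle) for the longest segment when it passes the
    length test, else None. segments: (x1, y1, x2, y2) tuples."""
    K = 0.2 * window_width
    best = max(segments, key=lambda s: math.dist(s[:2], s[2:]))
    length = math.dist(best[:2], best[2:])
    return (best, segment_angle_to_y_axis(*best)) if length >= K else None

segs = [(0, 0, 0, 10), (0, 0, 3, 4)]   # candidate segments, e.g. from Hough
res = qualifies(segs, 40)              # W = 40, so K = 8
print(res)  # prints ((0, 0, 0, 10), 0.0): a vertical segment, angle 0
```

With a real Hough stage, `segs` would be the endpoints returned by a probabilistic Hough transform (e.g. OpenCV's `HoughLinesP`) applied to the edge-detected window.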
CN202011370960.9A 2020-11-30 2020-11-30 Manchu recognition method and system Active CN112183538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011370960.9A CN112183538B (en) 2020-11-30 2020-11-30 Manchu recognition method and system

Publications (2)

Publication Number Publication Date
CN112183538A true CN112183538A (en) 2021-01-05
CN112183538B CN112183538B (en) 2021-03-02

Family

ID=73918208


Country Status (1)

Country Link
CN (1) CN112183538B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258198A (en) * 2013-04-26 2013-08-21 四川大学 Extraction method for characters in form document image
US9104941B1 (en) * 2011-12-02 2015-08-11 Marvell International Ltd. Method and apparatus for reducing noise in a scanned image while minimizing loss of detail in the scanned image
CN109359550A (en) * 2018-09-20 2019-02-19 大连民族大学 Language of the Manchus document seal Abstraction and minimizing technology based on depth learning technology
US20190279018A1 (en) * 2018-03-08 2019-09-12 I.R.I.S. Image Processing Apparatus
CN110399845A (en) * 2019-07-29 2019-11-01 上海海事大学 Continuously at section text detection and recognition methods in a kind of image
US20200160051A1 (en) * 2018-11-19 2020-05-21 Prosper Funding LLC Optical character recognition parsing


Non-Patent Citations (3)

Title
F. Y. Omee et al.: "A Complete Workflow for Development of Bangla OCR", International Journal of Computer Applications *
Sofien Touj et al.: "Generalized Hough Transform for Arabic Printed", The International Arab Journal of Information Technology *
Mao Rui: "Segmentation of printed Todo Mongolian characters" (印刷体托忒蒙文文字切分), China Master's Theses Full-text Database *


Similar Documents

Publication Publication Date Title
Sahare et al. Multilingual character segmentation and recognition schemes for Indian document images
Roy et al. HMM-based Indic handwritten word recognition using zone segmentation
Boukharouba A new algorithm for skew correction and baseline detection based on the randomized Hough Transform
Namboodiri et al. Document structure and layout analysis
US20200065601A1 (en) Method and system for transforming handwritten text to digital ink
Namboodiri et al. Online handwritten script recognition
WO2010092952A1 (en) Pattern recognition device
Saba et al. Online versus offline Arabic script classification
Sahare et al. Robust character segmentation and recognition schemes for multilingual Indian document images
JP4704601B2 (en) Character recognition method, program, and recording medium
Alghyaline Arabic Optical Character Recognition: A Review.
Tawde et al. An overview of feature extraction techniques in ocr for indian scripts focused on offline handwriting
Lawgali et al. A Frame Work For Arabic Handwritten Recognition Based on Segmentation
Ayesh et al. A robust line segmentation algorithm for Arabic printed text with diacritics
CN110147785B (en) Image recognition method, related device and equipment
Mohammad et al. Contour-based character segmentation for printed Arabic text with diacritics
Lawgali et al. Automatic segmentation for Arabic characters in handwriting documents
Ahmed et al. Printed Arabic text recognition
Verma et al. A novel approach for structural feature extraction: contour vs. direction
Shah et al. Devnagari handwritten character recognition (DHCR) for ancient documents: a review
CN112183538B (en) Manchu recognition method and system
Höhn Detecting arbitrarily oriented text labels in early maps
Guru et al. A review on offline handwritten script identification
CN110737364B (en) Control method for touch writing acceleration under android system
Bharathi et al. Improvement of Telugu OCR by segmentation of Touching Characters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant