WO2020155484A1 - Support vector machine-based text recognition method, device, and computer equipment - Google Patents
- Publication number
- WO2020155484A1 (PCT/CN2019/089057)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- picture
- classified
- support vector
- intersection
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
Definitions
- This application relates to the field of computers, and in particular to a text recognition method, device, computer equipment and storage medium based on a support vector machine.
- Picture recognition technology is widely used and has an important position in various fields.
- Picture recognition and classification technology can be used for tasks such as head portrait recognition and the recognition and subsequent classification of real estate certificates, so that pictures are recognized automatically.
- At present, to recognize and classify pictures of real estate certificates, the designated picture is generally scanned to obtain all of its pixels, the pixels are compared in turn with those of a standard template, and the designated picture is classified according to the comparison result.
- This method is time-consuming and labor-intensive, error-prone, and of low accuracy, and it misjudges designated pictures that are stretched or tilted.
- Moreover, the traditional technology has to recognize the entire designated picture, which consumes a lot of computing power and gives low recognition efficiency. Therefore, the prior-art solutions for picture recognition and classification are time-consuming and laborious, and cannot recognize and classify designated pictures that are stretched or tilted.
- The main purpose of this application is to provide a support vector machine-based text recognition method, device, computer equipment and storage medium, aiming to reduce wasted computing power, improve classification and recognition efficiency, and solve the technical problem that the prior art cannot classify designated pictures that are stretched or tilted.
- To achieve the above purpose, this application proposes a text recognition method based on a support vector machine, including the following steps:
- the n-dimensional vector (G1, G2, ... Gn) is input into a plurality of preset, trained support vector machines for operation, where the k-th support vector machine can classify the designated picture into the k-th class or into the classes other than the k-th class;
- according to the category of the designated picture, a preset correspondence between categories and text recognition modes is used to obtain the text recognition mode corresponding to the designated picture, wherein the text recognition mode specifies a text recognition area;
- the text recognition area is recognized as text, and the text is stored.
- This application provides a text recognition device based on a support vector machine, including:
- a designated picture acquiring unit, configured to acquire a designated picture to be classified, where the designated picture to be classified has a closed table border;
- an n-dimensional vector acquiring unit, used according to the formula:
- a support vector machine operation unit, configured to input the n-dimensional vector (G1, G2, ... Gn) into a plurality of preset, trained support vector machines for operation, wherein the k-th support vector machine can classify the designated picture into the k-th class or into the classes other than the k-th class;
- a preliminary classification result acquiring unit, configured to obtain the plurality of preliminary classification results respectively output by the plurality of support vector machines and the output values corresponding to the plurality of preliminary classification results;
- a category marking unit, configured to record the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results);
- a text recognition mode acquiring unit, configured to obtain, according to the category of the designated picture, the text recognition mode corresponding to the designated picture by using a preset correspondence between categories and text recognition modes, wherein the text recognition mode specifies a text recognition area;
- a text recognition unit, configured to use a preset text recognition technology to recognize the text recognition area as text, and to store the text.
- The present application provides a computer device including a memory and a processor, where the memory stores a computer program and the processor implements the steps of any one of the foregoing methods when executing the computer program.
- The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the foregoing methods are implemented.
- The support vector machine-based text recognition method, device, computer equipment and storage medium of this application obtain the designated picture to be classified, calculate its normalized vectors Gi to obtain the n-dimensional vector (G1, G2, ... Gn), input the n-dimensional vector into a plurality of preset, trained support vector machines for operation, and record the category of the designated picture as the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This realizes automatic, rapid and accurate recognition of designated pictures and is applicable to designated pictures that are stretched or tilted.
- FIG. 1 is a schematic flowchart of a text recognition method based on a support vector machine according to an embodiment of the application
- FIG. 2 is a schematic block diagram of the structure of a text recognition device based on a support vector machine according to an embodiment of the application;
- FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
- An embodiment of the present application provides a text recognition method based on a support vector machine, which includes the following steps:
- A designated picture to be classified is obtained. The designated picture to be classified has a closed table border; a real estate certificate picture is one example.
- The designated picture to be classified is, for example, a picture of a real estate certificate. Real estate certificates come in multiple types, such as commercial, residential, and mixed commercial and residential, and can be further divided into categories by region, period, and use. Manual classification is tedious and error-prone, so this application uses a machine learning technique, the support vector machine, to classify them automatically. The acquired designated picture to be classified has a closed table border.
- The table border has n+1 intersections, which yields the n-dimensional vector (G1, G2, ... Gn). Because different types of designated pictures use different tables, their table borders differ, and so do the intersections of those borders. The designated pictures can therefore be classified according to the intersections of the table border. The intersection at the upper-left corner of the table border is taken as the origin, and according to the formula
- The n-dimensional vector (G1, G2, ... Gn) is input into a plurality of preset, trained support vector machines for operation, wherein the k-th support vector machine can classify the designated picture into the k-th class or into the classes other than the k-th class.
- A support vector machine (SVM) is a binary classification model in machine learning. Its purpose is to find a hyperplane that separates the samples. The separation principle is to maximize the margin, and the task is finally transformed into a convex quadratic programming problem to be solved. That is, the support vector machine maps the samples into a high-dimensional space and finds a hyperplane such that each side of the hyperplane is one category, thereby realizing binary classification.
- Commonly used kernel functions include linear kernel functions, polynomial kernel functions, Gaussian kernel functions, Laplace kernel functions, and so on.
- This application may use any feasible kernel function, preferably a Gaussian kernel function.
- The n-dimensional vector (G1, G2, ... Gn) is input into the plurality of preset, trained support vector machines for operation.
- The k-th support vector machine classifies the designated picture into the k-th category or into the categories other than the k-th category. It therefore outputs the k-th classification result and a corresponding output value. The output value is essentially the distance to the hyperplane from the point to which the sample (formed from the feature vector of the real estate certificate) is mapped in the high-dimensional space, where the hyperplane separates real estate certificates into a positive class and a negative class in that space.
- When the point is in the positive class the distance value is positive, and when the point is in the negative class the distance value is negative; the positive class corresponds to the k-th class, and the negative class corresponds to the classes other than the k-th class. In this way, multiple classification results (as many as there are support vector machines) and the corresponding output values (values reflecting the accuracy of the classification results) are obtained for the designated picture to be classified.
- In step S4, the plurality of preliminary classification results respectively output by the plurality of support vector machines and the output values corresponding to those preliminary classification results are obtained.
- The output value is essentially the distance to the hyperplane from the point to which the sample formed from the feature vector of the real estate certificate is mapped in the high-dimensional space.
- The category of the designated picture to be classified is recorded as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results).
- The return value of max(the output values corresponding to the plurality of preliminary classification results) is the largest of those output values, and the largest value indicates the most accurate classification result.
- The preliminary classification result corresponding to that return value is therefore used as the classification result of the designated picture to be classified.
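The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation; the function name `pick_category`, the category labels, and the score values are all hypothetical.

```python
from typing import Dict


def pick_category(output_values: Dict[str, float]) -> str:
    """Return the category whose one-vs-rest SVM gave the largest output value."""
    if not output_values:
        raise ValueError("at least one SVM output value is required")
    # max() over the keys, ranked by the output value of each SVM.
    return max(output_values, key=output_values.get)


# Hypothetical output values of three one-vs-rest SVMs for a single picture.
scores = {"commercial": -0.8, "residential": 1.3, "mixed": 0.2}
category = pick_category(scores)
```

Because every support vector machine returns a signed value, the rule still yields a category when all values are negative: the least negative one is chosen.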
- According to the category of the designated picture, the preset correspondence between categories and text recognition modes is used to obtain the text recognition mode corresponding to the designated picture, wherein the text recognition mode specifies a text recognition area.
- The area of the text recognition area is smaller than the total area of the designated picture, so only the text content of a small, required area is recognized (for example, it is sufficient to recognize only the householder, the issuing authority, the type of residence, etc.), which reduces computing power consumption and improves the efficiency of picture recognition.
- The text recognition mode can be any mode, but it must specify a text recognition area.
- The text recognition area is recognized as text, and the text is stored. Since the text recognition area contains the text information that this application needs most, a preset text recognition technology is used to recognize the text recognition area as text and store it. The preset text recognition technology can be any technology, such as OCR (Optical Character Recognition). Since text recognition technology is mature, it is not described further here. The recognized text can be retrieved by any instruction, for example by an information verification instruction.
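The correspondence between category and text recognition mode can be pictured as a lookup table from category to named regions. The sketch below assumes a picture represented as a list of pixel rows and an OCR engine passed in as a callable; `RECOGNITION_MODES`, the field names, and the box coordinates are hypothetical and do not come from the application.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive
Image = List[List[int]]          # rows of pixel values

# Hypothetical correspondence: category -> recognition mode (field name -> region).
RECOGNITION_MODES: Dict[str, Dict[str, Box]] = {
    "residential": {"owner": (0, 0, 2, 1), "issuer": (2, 1, 4, 2)},
}


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region `box` out of `image`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def recognize_fields(image: Image, category: str,
                     ocr: Callable[[Image], str]) -> Dict[str, str]:
    """Run OCR only on the regions that the category's recognition mode specifies."""
    mode = RECOGNITION_MODES[category]
    return {field: ocr(crop(image, box)) for field, box in mode.items()}


# Stand-in "OCR" that sums the pixels, so the example runs without an OCR engine.
fake_ocr = lambda region: str(sum(sum(row) for row in region))
picture = [[1, 2, 3, 4], [5, 6, 7, 8]]
fields = recognize_fields(picture, "residential", fake_ocr)
```

Only the cropped regions reach the OCR callable, which is where the saving in computing power described above comes from.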
- The method for obtaining the support vector machines includes:
- The k-th support vector machine is obtained.
- This embodiment divides the designated pictures of different categories into two groups: one group contains the designated pictures of the k-th category (their n-dimensional vectors form the positive set of the training set), and the other group contains the designated pictures of all other categories (their n-dimensional vectors form the negative set of the training set). The support vector machine obtained by training can therefore classify designated pictures of different categories into the k-th class or into the classes other than the k-th class.
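The construction of the positive and negative sets can be sketched as follows. The helper names and the `+1`/`-1` label encoding are assumptions made for illustration; the application only states that the k-th category forms the positive set and all other categories form the negative set.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, str]  # (n-dimensional vector, category label)


def build_training_set(samples: List[Sample], k: str) -> Tuple[List[Vector], List[int]]:
    """Label vectors of category k as +1 (positive set) and all others as -1."""
    vectors = [vec for vec, _ in samples]
    labels = [1 if label == k else -1 for _, label in samples]
    if 1 not in labels or -1 not in labels:
        raise ValueError("both a positive and a negative set are required")
    return vectors, labels


def build_all_training_sets(samples: List[Sample]) -> Dict[str, Tuple[List[Vector], List[int]]]:
    """One training set per category, i.e. one per support vector machine."""
    categories = sorted({label for _, label in samples})
    return {k: build_training_set(samples, k) for k in categories}


# Hypothetical vectors for three categories A, B and C.
samples = [([0.1, 0.2], "A"), ([0.3, 0.4], "B"), ([0.5, 0.6], "C")]
training_sets = build_all_training_sets(samples)
```

Each entry of `training_sets` could then be passed to any SVM trainer to obtain the corresponding k-th support vector machine.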
- The step S1 of acquiring the designated picture to be classified, where the designated picture to be classified has a closed table border, includes:
- S102: Detect the designated picture to be classified, and determine the position of the closed table border in the designated picture to be classified;
- This yields a designated picture to be classified that includes only the table border, which makes it easier to subsequently detect the intersections of the table border in the picture.
- The designated picture contains not only the table but also other printed words and corner patterns. When support vector machines are used to classify the designated picture, these other printed words and corner patterns serve no function; on the contrary, they interfere with the acquisition of the vector. Therefore, in this embodiment, the position of the closed table border in the designated picture to be classified is determined first, and then the part outside the table border is removed to obtain a designated picture to be classified that contains only the table border.
- Step S2 includes:
- The table border corresponding to the smallest of the four first distances is obtained first, and then the normalized vectors are calculated. Since the designated picture may have been rotated, for example by 90, 180 or 270 degrees, comparing and classifying a rotated picture against unrotated pictures would inevitably cause classification errors. It is therefore necessary to unify the initial rotation angle of the picture (that is, to determine the standard picture). Specifically, for each of the four orientations, the first distance between the second intersection and the first intersection of the first row of the table border is calculated, and the table border corresponding to the smallest of the four first distances is taken as the table border of the standard picture, which makes the classification more accurate.
- Correspondingly, the training data of the training set of the support vector machine is also obtained from designated pictures whose table border corresponds to the smallest of the four first distances.
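The orientation step can be sketched on the intersection points alone. The sketch assumes image coordinates (y grows downward, so the first row is the one with the smallest y), takes the first and second intersections of that row to be the two leftmost points, and uses a hypothetical tolerance `tol` for grouping points into a row; none of these details are fixed by the text above.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rotate90(points: List[Point]) -> List[Point]:
    """Rotate every point a quarter turn about the origin: (x, y) -> (-y, x)."""
    return [(-y, x) for x, y in points]


def first_distance(points: List[Point], tol: float = 1e-6) -> float:
    """Distance between the first and second intersections of the top row."""
    top = min(y for _, y in points)
    row = sorted(x for x, y in points if abs(y - top) <= tol)
    if len(row) < 2:
        raise ValueError("the first row needs at least two intersections")
    return row[1] - row[0]


def standard_orientation(points: List[Point]) -> List[Point]:
    """Try the four quarter-turn orientations and keep the smallest first distance."""
    candidates = []
    current = list(points)
    for step in range(4):
        candidates.append((first_distance(current), current))
        if step < 3:  # three rotations give four orientations in total
            current = rotate90(current)
    return min(candidates, key=lambda item: item[0])[1]


# Hypothetical table: columns at x = 0, 1, 5 and rows at y = 0, 2, 6.
grid = [(x, y) for y in (0, 2, 6) for x in (0, 1, 5)]
rotated_input = rotate90(grid)
normalized = standard_orientation(rotated_input)
```

Applying the same function to training pictures and to pictures being classified keeps both in the same orientation, which is the purpose of the step.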
- The step S5 of recording the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max includes:
- S501: Use text recognition technology to obtain the text information in the designated picture to be classified;
- S503: According to the specific text, obtain the estimated category of the designated picture to be classified by using the preset correspondence between specific texts and designated picture categories;
- The support vector machines have already been used to classify the designated picture to be classified.
- This embodiment additionally uses the estimated category to further improve the classification accuracy.
- The text recognition technology can be any feasible technology, such as OCR (Optical Character Recognition). Since text recognition technology is mature, it is not described further here.
- Extracting a specific text from the text information, wherein the specific text is pre-stored in a specific text table, includes: determining whether a specific text in the specific text table exists in the text information and, if it exists, extracting it. After the estimated category is obtained, it is judged whether the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results) obtained by the support vector machines is the same as the estimated category; if it is the same, the classification by the support vector machines is accurate.
- The step S502 of extracting a specific text from the text information, wherein the specific text is pre-stored in a specific text table, includes:
- S5021: Determine whether a specific text pre-stored in the specific text table exists in the text information;
- If so, the specific text is extracted from the text information, the specific text being pre-stored in the specific text table.
- The specific texts pre-stored in the specific text table reflect the category of the designated picture. If the text information contains a specific text pre-stored in the specific text table, the category of the designated picture can be estimated from that specific text. Therefore, it is determined whether the text information contains a specific text pre-stored in the specific text table, and if it does, the specific text is extracted from the text information, thereby obtaining the specific text.
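The lookup in the specific text table and the comparison with the support vector machine result can be sketched as below. The table contents and function names are hypothetical, and returning `None` when the two results disagree is an assumption: the text above only states what happens when they agree.

```python
from typing import Dict, Optional

# Hypothetical specific text table: specific text -> designated picture category.
SPECIFIC_TEXT_TABLE: Dict[str, str] = {
    "commercial use": "commercial",
    "residential use": "residential",
}


def estimate_category(text_info: str,
                      table: Dict[str, str] = SPECIFIC_TEXT_TABLE) -> Optional[str]:
    """Return the category of the first specific text found in the OCR output."""
    for specific_text, category in table.items():
        if specific_text in text_info:
            return category
    return None  # no specific text present, so no estimate is possible


def confirm_category(svm_category: str, text_info: str) -> Optional[str]:
    """Keep the SVM result only when the text-based estimate agrees with it."""
    estimated = estimate_category(text_info)
    return svm_category if estimated == svm_category else None


ocr_output = "Purpose: residential use"
confirmed = confirm_category("residential", ocr_output)
```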
- The support vector machine adopts a Gaussian kernel function.
- x_i is the n-dimensional vector (G1, G2, ... Gn).
- x_j is the center of the kernel function.
- σ is the width parameter of the function.
- The kernel function is thus set.
- The kernel function and the support vector machine have a one-to-one correspondence. Once the kernel function K(x_i, x_j) is determined, the support vector machine is implicitly determined.
- The use of a kernel function gives the support vector machine powerful nonlinear processing capability and avoids complex calculations in the high-dimensional feature space, effectively overcoming the curse of dimensionality.
- This embodiment adopts a Gaussian kernel function, and the expression is:
- The Gaussian kernel function is a radial basis function (RBF), and this embodiment uses it to construct the support vector machine.
- The RBF kernel has fewer hyperparameters and is relatively simple. Its values lie between 0 and 1, whereas the values of a polynomial kernel may range from 0 to infinity, so the numerical computation burden is much lower. Therefore, this embodiment adopts a Gaussian kernel function.
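As an illustration of the kernel and of the signed output value, the sketch below uses one widely used form of the Gaussian kernel, K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2)). This form is an assumption, since the expression used in the application is not reproduced in this text and the scaling of σ varies between references. The support vectors, coefficients, and bias are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernel(x_i: Sequence[float], x_j: Sequence[float], sigma: float) -> float:
    """Assumed form: exp(-||x_i - x_j||^2 / (2 * sigma^2)), a value in (0, 1]."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_value(x: Sequence[float],
                   support: List[Tuple[Sequence[float], float]],
                   bias: float, sigma: float) -> float:
    """Signed SVM output: sum of coefficient * K(support vector, x) plus the bias.

    Each coefficient is the product alpha * y of a support vector; a positive
    result places x in the positive class (class k), a negative one in the rest.
    """
    return sum(coef * gaussian_kernel(sv, x, sigma) for sv, coef in support) + bias


# Hypothetical trained machine: one positive and one negative support vector.
support_vectors = [([0.0, 0.0], 1.0), ([3.0, 3.0], -1.0)]
near_positive = decision_value([0.1, 0.1], support_vectors, bias=0.0, sigma=1.0)
near_negative = decision_value([2.9, 2.9], support_vectors, bias=0.0, sigma=1.0)
```

The bounded kernel values keep every term of the sum within the magnitude of its coefficient, which is the numerical advantage over the polynomial kernel noted above.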
- The text recognition method based on the support vector machine of the present application obtains the designated picture to be classified, calculates its normalized vectors Gi to obtain the n-dimensional vector (G1, G2, ... Gn), inputs the n-dimensional vector into a plurality of preset, trained support vector machines for operation, and records the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This realizes automatic, rapid and accurate recognition of designated pictures and is applicable to designated pictures that are stretched or tilted.
- An embodiment of the present application provides a text recognition device based on a support vector machine, including:
- the designated picture acquiring unit 10, configured to acquire a designated picture to be classified, where the designated picture to be classified has a closed table border;
- the n-dimensional vector acquiring unit 20, used according to the formula:
- the support vector machine operation unit 30, configured to input the n-dimensional vector (G1, G2, ... Gn) into a plurality of preset, trained support vector machines for operation, wherein the k-th support vector machine can classify the designated picture into the k-th class or into the classes other than the k-th class;
- the preliminary classification result acquiring unit 40, configured to obtain the plurality of preliminary classification results respectively output by the plurality of support vector machines and the output values corresponding to the plurality of preliminary classification results;
- the category marking unit 50, configured to record the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results);
- the text recognition mode acquiring unit 60, configured to obtain, according to the category of the designated picture, the text recognition mode corresponding to the designated picture by using the preset correspondence between categories and text recognition modes, wherein the text recognition mode specifies a text recognition area;
- the text recognition unit 70, configured to use a preset text recognition technology to recognize the text recognition area as text, and to store the text.
- The device includes a support vector machine acquiring unit, and the support vector machine acquiring unit includes:
- a designated picture acquiring subunit, configured to acquire designated pictures of different categories;
- a normalized vector Gi acquiring subunit, used according to the formula:
- a training set acquiring subunit, configured to take the n-dimensional vectors of the designated pictures of the k-th category as the positive set and the n-dimensional vectors of the designated pictures of all other categories as the negative set, thereby forming the training set of the k-th support vector machine;
- a training subunit, configured to input the sample data of the training set of the k-th support vector machine into a support vector machine for training, to obtain the k-th support vector machine.
- The designated picture acquiring unit 10 includes:
- a table border position determining subunit, configured to detect the designated picture to be classified and determine the position of the closed table border in the designated picture to be classified;
- a removing subunit, configured to remove the part outside the table border in the designated picture to be classified, to obtain a designated picture to be classified that includes only the table border.
- The n-dimensional vector acquiring unit 20 includes:
- a rotation subunit, configured to rotate the table border clockwise or counterclockwise by 90 degrees three times and, before each rotation and after the third rotation, calculate the first distance between the second intersection and the first intersection of the first row of the table border, thereby obtaining four first distances;
- a table border acquiring subunit, configured to obtain the table border corresponding to the smallest of the four first distances;
- an n-dimensional vector acquiring subunit, used, for the table border corresponding to the smallest of the four first distances, according to the formula:
- The category marking unit 50 includes:
- a text recognition subunit, configured to obtain the text information in the designated picture to be classified by using text recognition technology;
- an estimated category subunit, configured to obtain, according to the specific text, the estimated category of the designated picture to be classified by using the preset correspondence between specific texts and designated picture categories;
- a category labeling subunit, configured to record the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results) if the estimated category is the same as that preliminary classification result.
- The specific text extracting subunit includes:
- a specific text judgment module, configured to determine whether a specific text pre-stored in the specific text table exists in the text information;
- a specific text extraction module, configured to extract the specific text from the text information if a specific text pre-stored in the specific text table exists in the text information.
- The support vector machine acquiring unit includes:
- The support vector machine-based text recognition device of the present application obtains the designated picture to be classified, calculates its normalized vectors Gi to obtain the n-dimensional vector (G1, G2, ... Gn), inputs the n-dimensional vector into a plurality of preset, trained support vector machines for operation, and records the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This realizes automatic, rapid and accurate recognition of designated pictures and is applicable to designated pictures that are stretched or tilted.
- An embodiment of the present application also provides a computer device.
- The computer device may be a server, and its internal structure may be as shown in the figure.
- The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides calculation and control capabilities.
- The memory of the computer device includes a non-volatile storage medium and an internal memory.
- The non-volatile storage medium stores an operating system, a computer program, and a database.
- The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
- The database of the computer device stores the data used by the support vector machine-based text recognition method.
- The network interface of the computer device communicates with an external terminal through a network connection.
- The computer program, when executed by the processor, implements a support vector machine-based text recognition method.
- The processor executes the above support vector machine-based text recognition method, whose steps correspond one-to-one to the steps of the support vector machine-based text recognition method of the foregoing embodiment and are not repeated here.
- The computer device of the present application obtains the designated picture to be classified, calculates its normalized vectors Gi to obtain the n-dimensional vector (G1, G2, ... Gn), inputs the n-dimensional vector into a plurality of preset, trained support vector machines for operation, and records the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This realizes automatic, rapid and accurate recognition of designated pictures and is applicable to designated pictures that are stretched or tilted.
- An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
- When the computer program is executed by a processor, a support vector machine-based text recognition method is implemented, whose steps correspond one-to-one to the steps of the support vector machine-based text recognition method of the foregoing embodiment and are not repeated here.
- The computer-readable storage medium of the present application obtains the designated picture to be classified, calculates its normalized vectors Gi to obtain the n-dimensional vector (G1, G2, ... Gn), inputs the n-dimensional vector into a plurality of preset, trained support vector machines for operation, and records the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This realizes automatic, fast and accurate recognition of designated pictures and is applicable to designated pictures that are stretched or tilted.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
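A text recognition method, device, computer equipment and storage medium based on a support vector machine, comprising: acquiring a designated picture to be classified; calculating the normalized vectors Gi of the designated picture to obtain an n-dimensional vector (G1, G2, … Gn); inputting the n-dimensional vector (G1, G2, … Gn) into a plurality of preset, trained support vector machines for operation; recording the category of the designated picture to be classified as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results); obtaining the text recognition mode corresponding to the designated picture; and recognizing the text recognition area as text and storing the text. This reduces wasted computing power, improves classification and recognition efficiency, and accommodates designated pictures that are stretched or tilted.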
Description
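This application claims priority to Chinese patent application No. 201910100425.2, filed with the China Patent Office on January 31, 2019 and entitled "Support vector machine-based character recognition method and apparatus, and computer device", the entire contents of which are incorporated herein by reference.
This application relates to the field of computers, and in particular to a support vector machine-based character recognition method and apparatus, a computer device and a storage medium.
Picture recognition technology is widely used and plays an important role in many fields. Picture recognition and classification can be used, for example, for portrait recognition or for classifying real estate certificates after recognition, so that pictures are recognized automatically. At present, real estate certificate pictures are generally classified by scanning the specified picture, obtaining all of its pixels, comparing those pixels one by one with the pixels of a standard template, and classifying the picture according to the comparison result. This approach is time-consuming and laborious, error-prone and of low accuracy, and it misjudges specified pictures that are stretched or tilted. Moreover, conventional techniques must recognize the entire specified picture, which consumes considerable computing power and makes recognition inefficient. Existing picture recognition and classification schemes are therefore time-consuming and laborious, and cannot recognize and classify specified pictures that are stretched or tilted.
The main purpose of this application is to provide a support vector machine-based character recognition method and apparatus, a computer device and a storage medium, aiming to reduce wasted computing power, improve classification and recognition efficiency, and solve the technical problem that the prior art cannot classify specified pictures that are stretched or tilted.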
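To achieve the above purpose, this application proposes a support vector machine-based character recognition method, comprising the following steps:
obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines;
according to the formula:
calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn);
inputting the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category;
obtaining a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results;
recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results);
obtaining, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region;
recognizing the character recognition region as text using a preset character recognition technique, and storing the text.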
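This application provides a support vector machine-based character recognition apparatus, comprising:
a specified picture obtaining unit, configured to obtain a specified picture to be classified, the specified picture to be classified having closed table frame lines;
an n-dimensional vector obtaining unit, configured to, according to the formula:
calculate the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn);
a support vector machine computation unit, configured to input the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category;
a preliminary classification result obtaining unit, configured to obtain a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results;
a category marking unit, configured to record, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results);
a character recognition mode obtaining unit, configured to obtain, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region;
a text recognition unit, configured to recognize the character recognition region as text using a preset character recognition technique, and to store the text.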
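This application provides a computer device comprising a memory and a processor, the memory storing a computer program, where the processor implements the steps of any of the above methods when executing the computer program.
This application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods.
The support vector machine-based character recognition method and apparatus, computer device and storage medium of this application obtain a specified picture to be classified, calculate the normalized vector Gi of the specified picture to obtain the n-dimensional vector (G1, G2, …, Gn), input the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, and record, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This achieves automatic, fast and accurate recognition of specified pictures, including pictures that are stretched or tilted.
FIG. 1 is a schematic flowchart of a support vector machine-based character recognition method according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of a support vector machine-based character recognition apparatus according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
The realization of the purpose, the functional features and the advantages of this application are further described below with reference to the embodiments and the accompanying drawings.
To make the purpose, technical solutions and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and not to limit it.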
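Referring to FIG. 1, an embodiment of this application provides a support vector machine-based character recognition method, comprising the following steps:
S1. obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines;
S2. according to the formula:
calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn);
S3. inputting the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category;
S4. obtaining a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results;
S5. recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results);
S6. obtaining, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region;
S7. recognizing the character recognition region as text using a preset character recognition technique, and storing the text.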
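As described in step S1, a specified picture to be classified is obtained; it has closed table frame lines and may be, for example, a real estate certificate picture. Real estate certificates such as property ownership certificates come in many kinds, for example commercial, residential, and mixed commercial-residential, and can be divided into many categories by region, period and use. Classifying them entirely by hand is tedious and error-prone, so this application uses support vector machines, a machine learning technique, to classify them automatically. The obtained specified picture to be classified has closed table frame lines.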
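As described in step S2, according to the formula:
the normalized vector Gi of the specified picture is calculated, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn). Because specified pictures of different categories use different tables, their table frame lines differ, and so do the intersections of those lines; specified pictures can therefore be classified by the intersections of their table frame lines. With the intersection at the top-left corner of the table frame lines taken as the origin, according to the formula
the normalized vector Gi of the specified picture is calculated. Compared with directly using the vector gi from the origin to the i-th intersection of the table frame lines, this avoids misclassification when the picture is stretched or tilted.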
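The effect of the normalization can be illustrated with a small sketch. The application's formula for Gi is not reproduced in this text, so the code below assumes, purely for illustration, that Gi is the length of gi divided by the longest such length; `normalized_vector` and the sample grids are hypothetical names and data, not part of the application.

```python
import math

def normalized_vector(intersections):
    """Return (G1, ..., Gn) for a list of n+1 table-line intersections.

    intersections[0] is the top-left intersection, used as the origin g0.
    ASSUMPTION: Gi = |gi| / max_j |gj|. The patent's own formula is not
    available in the text, so this normalization is only illustrative.
    """
    if len(intersections) < 2:
        raise ValueError("need the origin and at least one more intersection")
    x0, y0 = intersections[0]
    # Length of each vector gi from the origin to the i-th intersection.
    lengths = [math.hypot(x - x0, y - y0) for x, y in intersections[1:]]
    longest = max(lengths)
    if longest == 0:
        raise ValueError("all intersections coincide with the origin")
    return [length / longest for length in lengths]

# A small grid of intersections, and the same grid scaled by 3 and shifted.
grid = [(0, 0), (10, 0), (20, 0), (0, 5), (10, 5), (20, 5)]
scaled = [(3 * x + 7, 3 * y + 2) for x, y in grid]

print(normalized_vector(grid))
print(normalized_vector(scaled))  # same values up to rounding
```

Under this assumed normalization, uniformly scaling or shifting the picture leaves (G1, …, Gn) unchanged, and because only lengths measured from the origin are used, a tilt of the whole picture does not change them either.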
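As described in step S3, the n-dimensional vector (G1, G2, …, Gn) is input into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category. A support vector machine (SVM) is a binary classification model in machine learning. Its goal is to find a hyperplane that separates the samples according to the principle of maximizing the margin, which ultimately reduces to solving a convex quadratic programming problem. In other words, the support vector machine maps the samples into a high-dimensional space and finds a hyperplane such that each side of the hyperplane holds one category, thereby achieving binary classification. The search for the hyperplane relies on a kernel function. Commonly used kernel functions include the linear kernel, the polynomial kernel, the Gaussian kernel, the Laplacian kernel and so on. This application may use any feasible kernel function, preferably the Gaussian kernel, whose mathematical expression is $K(x_i, x_j) = \exp\{-\|x_i - x_j\|^2 / (2\sigma^2)\}$, where $x_i$ is the n-dimensional vector (G1, G2, …, Gn), $x_j$ is the kernel function center, and $\sigma$ is the width parameter of the function. To classify samples of multiple categories with binary support vector machines, this application inputs the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category. The k-th support vector machine thus outputs the k-th classification result and a corresponding output value. The output value is essentially the distance from the hyperplane to the point obtained by mapping the sample, formed from the feature vector of the real estate certificate, into the high-dimensional space. The hyperplane divides the certificates in the high-dimensional space into a positive class and a negative class; the distance is positive when the point lies in the positive class and negative when it lies in the negative class, where the positive class corresponds to the k-th category and the negative class to the categories other than the k-th. In this way, a plurality of classification results for the specified picture to be classified (as many as there are support vector machines) and the corresponding output values (values reflecting the accuracy of the classification results) are obtained.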
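As a minimal sketch of how one such binary support vector machine produces its signed output, the code below evaluates a Gaussian kernel (in the 2σ² form) and the usual kernel decision function. The names `gaussian_kernel` and `decision_value`, and all numeric parameters, are hypothetical and hand-picked; they are not trained values and not taken from the application. Note that the text treats the output as a signed distance to the hyperplane; in the usual SVM formulation it is the signed decision value, which has the same sign.

```python
import math

def gaussian_kernel(x_i, x_j, sigma=1.0):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def decision_value(x, support_vectors, weights, bias, sigma=1.0):
    """Signed output of one trained binary SVM for the input x.

    weights[m] stands for alpha_m * y_m of the m-th support vector.
    A positive value puts x on the k-th-category side of the hyperplane,
    a negative value on the "all other categories" side.
    """
    total = sum(w * gaussian_kernel(sv, x, sigma)
                for sv, w in zip(support_vectors, weights))
    return total + bias

# Hypothetical, hand-picked parameters (not the result of real training).
support_vectors = [[0.2, 0.4, 1.0], [0.9, 0.9, 1.0]]
weights = [1.0, -1.0]
value = decision_value([0.2, 0.4, 1.0], support_vectors, weights,
                       bias=0.0, sigma=0.5)
print(value)  # positive: the input sits on the positive-class side
```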
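As described in step S4, the plurality of preliminary classification results respectively output by the plurality of support vector machines, and the output values corresponding to them, are obtained. As explained above, after the n-dimensional vector (G1, G2, …, Gn) is input into the plurality of support vector machines, a plurality of classification results for the specified picture to be classified (as many as there are support vector machines) and the corresponding output values (values reflecting the accuracy of the classification results) are obtained. The output value is essentially the distance from the hyperplane to the point obtained by mapping the sample, formed from the feature vector of the real estate certificate, into the high-dimensional space.
As described in step S5, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results) is recorded as the category of the specified picture to be classified. The return value of max(the output values corresponding to the plurality of preliminary classification results) is the largest of those output values; the largest value indicates the most accurate classification result, so the preliminary classification result corresponding to that return value should be taken as the classification result of the specified picture to be classified.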
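The selection in step S5 amounts to an argmax over the one-vs-rest outputs. A minimal sketch follows; the function name `pick_category`, the category labels and the output values are hypothetical illustration data.

```python
def pick_category(output_values):
    """Return (category, value) for the largest one-vs-rest SVM output.

    output_values maps each category k to the output value of the k-th SVM.
    """
    if not output_values:
        raise ValueError("at least one SVM output is required")
    category = max(output_values, key=output_values.get)
    return category, output_values[category]

# Hypothetical outputs of three one-vs-rest SVMs for one picture.
outputs = {"residential": -0.8, "commercial": 1.3, "mixed": 0.2}
category, value = pick_category(outputs)
print(category, value)
```

The largest value is chosen even when every output is negative, in which case the picture is assigned to the category whose hyperplane it is least far behind.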
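As described in step S6, the character recognition mode corresponding to the specified picture is obtained according to the category of the specified picture and a preset correspondence between categories and character recognition modes, where the character recognition mode specifies a character recognition region. The character recognition region is a part of the whole specified picture, and its area is smaller than the total area of the specified picture, so only the text in a smaller region needs to be recognized (that is, only the text in the regions of interest, for example only the regions holding the householder, the issuing authority and the type of residence). This reduces computing power consumption and improves recognition efficiency. Because specified pictures of different categories have different layouts, different regions carry different text; only after the category of the specified picture has been obtained as described above can the character recognition region be located accurately and character recognition be performed to obtain accurate text information. The character recognition mode may be any mode, but it must specify a character recognition region.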
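One way to picture the correspondence between categories and character recognition modes is a lookup table of named regions. The sketch below is an assumption-laden illustration: `RECOGNITION_MODES`, the field names and the box coordinates are all hypothetical, and the picture is a toy 2D list rather than a real image.

```python
# Hypothetical correspondence: category -> {field name: (left, top, right, bottom)}.
# Boxes are half-open: columns left..right-1 and rows top..bottom-1.
RECOGNITION_MODES = {
    "residential": {"owner": (0, 0, 3, 1), "issuer": (3, 2, 6, 4)},
    "commercial": {"owner": (1, 1, 4, 2)},
}

def crop(image, box):
    """Cut the rectangle described by box out of a row-major 2D list."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def recognition_regions(image, category):
    """Return only the regions that the category's recognition mode specifies."""
    mode = RECOGNITION_MODES[category]
    return {field: crop(image, box) for field, box in mode.items()}

# Toy 4x6 "picture" whose pixel value encodes its own row and column.
image = [[10 * r + c for c in range(6)] for r in range(4)]
regions = recognition_regions(image, "residential")
print(regions["owner"])
print(regions["issuer"])
```

Only the cropped regions would then be handed to the character recognition step, which is where the saving in computing power comes from.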
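As described in step S7, the character recognition region is recognized as text using a preset character recognition technique, and the text is stored. Because the character recognition region holds the text information that this application needs most, it is this region that is recognized and stored. The preset character recognition technique may be any technique, for example OCR (optical character recognition); since character recognition technology is mature, it is not described further here. The recognized text can be retrieved by any instruction, for example an information verification instruction.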
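In one embodiment, the method for obtaining the support vector machines comprises:
S301. obtaining specified pictures of different categories;
S302. according to the formula:
calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn);
S303. taking the n-dimensional vectors of the specified pictures of the k-th category among the specified pictures of different categories as the positive set, and the n-dimensional vectors of the specified pictures other than those of the k-th category as the negative set, thereby forming the training set of the k-th support vector machine;
S304. inputting the sample data of the training set of the k-th support vector machine into a support vector machine for training, to obtain the k-th support vector machine.
As described above, the k-th support vector machine is obtained. This embodiment divides the specified pictures of multiple categories into two groups: the specified pictures of the k-th category (whose n-dimensional vectors form the positive set of the training set) and all other specified pictures (whose n-dimensional vectors form the negative set), so that the trained support vector machine can classify a specified picture as either the k-th category or a category other than the k-th.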
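The construction of the positive and negative sets in S303 can be sketched as a relabeling of one shared sample list. The function name `one_vs_rest_training_set`, the category labels and the feature vectors below are hypothetical illustration data; the actual training in S304 would be done by whichever SVM implementation is chosen.

```python
def one_vs_rest_training_set(samples, k):
    """Build the training set of the k-th SVM.

    samples is a list of (n_dimensional_vector, category) pairs.
    Vectors of category k get label +1 (positive set), all others -1.
    """
    vectors = [vector for vector, _ in samples]
    labels = [1 if category == k else -1 for _, category in samples]
    return vectors, labels

# Hypothetical normalized vectors for pictures of three categories.
samples = [
    ([0.5, 1.0, 0.25], "residential"),
    ([0.4, 1.0, 0.30], "residential"),
    ([1.0, 0.2, 0.70], "commercial"),
    ([0.6, 0.6, 1.00], "mixed"),
]

# One training set per category, i.e. one per support vector machine.
training_sets = {k: one_vs_rest_training_set(samples, k)
                 for k in sorted({category for _, category in samples})}
print(training_sets["commercial"][1])
```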
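In one embodiment, step S1 of obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines, comprises:
S101. obtaining the specified picture to be classified;
S102. inspecting the specified picture to be classified to determine the position of the closed table frame lines in it;
S103. removing the part of the specified picture to be classified that lies outside the table frame lines, to obtain a specified picture to be classified that contains only the table frame lines.
As described above, a specified picture to be classified that contains only the table frame lines is obtained, which makes it easier to detect the intersections of the table frame lines later. Besides the table, the specified picture contains other printed text, decorative patterns in the corners and so on. These play no part when the specified picture is classified with support vector machines; on the contrary, they interfere with obtaining the vectors. In this embodiment, therefore, the position of the closed table frame lines in the specified picture to be classified is determined first, and the part outside the table frame lines is then removed, giving a specified picture to be classified that contains only the table frame lines.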
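The application does not say how the frame position is detected. As one possible sketch, the code below assumes a binarized picture (1 for dark pixels) and treats rows and columns that are mostly dark as table lines, a simple projection-profile heuristic chosen here for illustration only; `find_table_frame`, `keep_only_table` and the `min_fill` threshold are hypothetical.

```python
def find_table_frame(binary, min_fill=0.6):
    """Locate the outer table frame in a binarized picture (list of 0/1 rows).

    ASSUMPTION: a row or column whose share of dark pixels is at least
    min_fill is a table line. Returns (left, top, right, bottom), inclusive.
    """
    height, width = len(binary), len(binary[0])
    line_rows = [r for r in range(height)
                 if sum(binary[r]) >= min_fill * width]
    line_cols = [c for c in range(width)
                 if sum(binary[r][c] for r in range(height)) >= min_fill * height]
    if len(line_rows) < 2 or len(line_cols) < 2:
        raise ValueError("no closed table frame found")
    return line_cols[0], line_rows[0], line_cols[-1], line_rows[-1]

def keep_only_table(binary, frame):
    """Drop everything outside the frame (S103)."""
    left, top, right, bottom = frame
    return [row[left:right + 1] for row in binary[top:bottom + 1]]

# Toy 7x8 picture: a table frame plus two stray decoration pixels outside it.
image = [[0] * 8 for _ in range(7)]
for r in (1, 3, 5):
    for c in range(1, 7):
        image[r][c] = 1
for c in (1, 6):
    for r in range(1, 6):
        image[r][c] = 1
image[0][0] = 1
image[6][7] = 1

frame = find_table_frame(image)
table = keep_only_table(image, frame)
print(frame)
```

This heuristic only works when the table spans most of the picture; a real implementation would likely need a more robust line detector.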
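In one embodiment, step S2 of, according to the formula:
calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn), comprises:
S201. rotating the table frame lines by 90 degrees three times in succession, clockwise or counterclockwise, and, before each rotation and after the third rotation, calculating a first distance between the second intersection in the first row of the table frame lines and the first intersection in the first row of the table frame lines, thereby obtaining four first distances;
S202. obtaining the table frame lines corresponding to the smallest of the four first distances;
S203. based on the table frame lines corresponding to the smallest of the four first distances, according to the formula:
calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn).
As described above, the table frame lines corresponding to the smallest of the four first distances are obtained first, and the normalized vector is then calculated. The specified picture may have been rotated, for example by 90, 180 or 270 degrees; comparing and classifying a rotated picture against unrotated pictures would inevitably cause misclassification. The initial rotation angle of the pictures therefore needs to be unified (that is, a standard picture orientation must be fixed). Specifically, the first distance between the second intersection and the first intersection in the first row of the table frame lines is calculated for each orientation, and the table frame lines corresponding to the smallest of the four first distances are taken as the table frame lines of the standard picture, which makes classification more accurate. The training data of the support vector machine training sets are likewise obtained from specified pictures whose table frame lines correspond to the smallest of the four first distances.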
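Steps S201 to S203 can be sketched on the intersection points alone. The code below is an illustration under stated assumptions: the table is represented only by integer intersection coordinates, the "first row" is taken to be the intersections with the smallest y, and the function names are hypothetical.

```python
def rotate90(points):
    """Rotate points by 90 degrees and shift them back to non-negative coordinates."""
    rotated = [(-y, x) for x, y in points]
    min_x = min(x for x, _ in rotated)
    min_y = min(y for _, y in rotated)
    return [(x - min_x, y - min_y) for x, y in rotated]

def first_distance(points):
    """Distance between the first and second intersections of the first (top) row."""
    top = min(y for _, y in points)
    row = sorted(x for x, y in points if y == top)
    return row[1] - row[0]

def canonical_orientation(points):
    """Measure before each of 3 rotations and after the third; keep the smallest."""
    candidates = []
    current = list(points)
    for step in range(4):
        candidates.append((first_distance(current), current))
        if step < 3:
            current = rotate90(current)
    return min(candidates, key=lambda item: item[0])[1]

# Intersections of a small table, and the same table turned by 180 degrees.
grid = [(0, 0), (2, 0), (10, 0), (0, 6), (2, 6), (10, 6)]
upside_down = rotate90(rotate90(grid))

print(sorted(canonical_orientation(upside_down)) == sorted(grid))
```

If two orientations tie for the smallest first distance, this sketch keeps the first one encountered; the application does not describe a tie-breaking rule.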
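In one embodiment, step S5 of recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results), comprises:
S501. obtaining the text information in the specified picture to be classified using a character recognition technique;
S502. extracting specific words from the text information, the specific words being pre-stored in a specific-word table;
S503. obtaining an estimated category of the specified picture to be classified according to the specific words and a preset correspondence between specific words and categories of specified pictures;
S504. if the estimated category is the same as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results), recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results).
As described above, classification accuracy is further improved. The specified picture to be classified has already been classified by support vector machines as described earlier. To guard against misjudgment by the machine learning model, this embodiment additionally uses an estimated category. Specifically, different specified pictures contain distinctive text: for example, the specified picture of a residence contains the word "住宅" (residential), and the real estate certificate picture of commercial land contains the word "商业" (commercial). Extracting these specific words gives a rough estimate of the category of the specified picture. The character recognition technique may be any feasible one, for example OCR (optical character recognition); since character recognition technology is mature, it is not described further here. Extracting specific words from the text information, the specific words being pre-stored in a specific-word table, comprises: determining whether any specific word from the specific-word table is present in the text information and, if so, extracting it. After the estimated category is obtained, it is determined whether the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results), obtained from the support vector machines, is the same as the estimated category; if so, the support vector machine classification is deemed accurate.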
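The cross-check between the keyword-based estimate and the SVM result might look like the sketch below. The table `KEYWORD_TO_CATEGORY`, the category labels and the sample text are hypothetical. The application only states what happens when the two results agree; returning `None` on disagreement is an assumption made here so that the caller can decide how to handle it.

```python
# Hypothetical specific-word table: specific word -> category.
KEYWORD_TO_CATEGORY = {"住宅": "residential", "商业": "commercial"}

def estimate_category(text):
    """Return the category of the first specific word found in text, else None."""
    for keyword, category in KEYWORD_TO_CATEGORY.items():
        if keyword in text:
            return category
    return None

def confirm_category(svm_category, text):
    """Keep the SVM result only if the keyword-based estimate agrees with it.

    ASSUMPTION: None is returned when the two disagree or no keyword is found.
    """
    estimated = estimate_category(text)
    return svm_category if estimated == svm_category else None

recognized_text = "房屋用途:住宅"  # hypothetical OCR output
print(confirm_category("residential", recognized_text))
print(confirm_category("commercial", recognized_text))
```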
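In one embodiment, step S502 of extracting specific words from the text information, the specific words being pre-stored in a specific-word table, comprises:
S5021. determining whether the text information contains a specific word pre-stored in the specific-word table;
S5022. if the text information contains a specific word pre-stored in the specific-word table, extracting the specific word from the text information.
As described above, specific words pre-stored in the specific-word table are extracted from the text information. The specific words pre-stored in the specific-word table reflect the category of a specified picture. If the text information contains such a specific word, the category of the specified picture can be estimated from it. The method therefore determines whether the text information contains a specific word pre-stored in the specific-word table and, if it does, extracts that specific word from the text information.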
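In one embodiment, the support vector machine uses a Gaussian kernel function, whose expression is $K(x_i, x_j) = \exp\{-\|x_i - x_j\|^2 / (2\sigma^2)\}$, where $x_i$ is the n-dimensional vector (G1, G2, …, Gn), $x_j$ is the kernel function center, and $\sigma$ is the width parameter of the function.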
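As described above, the kernel function is set. Kernel functions and support vector machines correspond one to one: determining the kernel function $K(x_i, x_j)$ implicitly determines the support vector machine. Using a kernel function gives the support vector machine powerful nonlinear processing capability, avoids complex computation in the high-dimensional feature space, and effectively overcomes the curse of dimensionality. This embodiment uses the Gaussian kernel function, whose expression is $K(x_i, x_j) = \exp\{-\|x_i - x_j\|^2 / (2\sigma^2)\}$, where $x_i$ is the n-dimensional vector (G1, G2, …, Gn), $x_j$ is the kernel function center, and $\sigma$ is the width parameter of the function. The Gaussian kernel is a radial basis function (RBF), and the support vector machine is built with it. Compared with the polynomial kernel, the RBF kernel has fewer hyperparameters and is simpler; moreover, whereas polynomial kernel values may range from 0 to infinity, the RBF kernel places far less pressure on numerical computation. This embodiment therefore uses the Gaussian kernel function.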
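The numerical remark about the two kernels can be made concrete with a short worked bound. It uses the $2\sigma^2$ form of the Gaussian kernel and a generic polynomial kernel whose parameters $c \ge 0$ and $d \ge 1$ are notation assumed here rather than taken from the application:

```latex
\[
\|x_i - x_j\|^2 \ge 0
\;\Longrightarrow\;
0 < K_{\mathrm{RBF}}(x_i, x_j)
  = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \le 1,
\qquad
K_{\mathrm{RBF}}(x_i, x_i) = 1,
\]
\[
K_{\mathrm{poly}}(x, x) = \left(\|x\|^2 + c\right)^{d}
\;\longrightarrow\; \infty
\quad\text{as } \|x\| \to \infty .
\]
```

The Gaussian kernel therefore always stays in the interval (0, 1] and is governed by the single hyperparameter σ, while this polynomial form has two hyperparameters (c and d) and values that grow without bound.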
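The support vector machine-based character recognition method of this application obtains a specified picture to be classified, calculates the normalized vector Gi of the specified picture to obtain the n-dimensional vector (G1, G2, …, Gn), inputs the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, and records, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This achieves automatic, fast and accurate recognition of specified pictures, including pictures that are stretched or tilted.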
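Referring to FIG. 2, an embodiment of this application provides a support vector machine-based character recognition apparatus, comprising:
a specified picture obtaining unit 10, configured to obtain a specified picture to be classified, the specified picture to be classified having closed table frame lines;
an n-dimensional vector obtaining unit 20, configured to, according to the formula:
calculate the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn);
a support vector machine computation unit 30, configured to input the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category;
a preliminary classification result obtaining unit 40, configured to obtain a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results;
a category marking unit 50, configured to record, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results);
a character recognition mode obtaining unit 60, configured to obtain, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region;
a text recognition unit 70, configured to recognize the character recognition region as text using a preset character recognition technique, and to store the text.
The operations performed by the above units correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.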
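In one embodiment, the apparatus comprises a support vector machine obtaining unit, which comprises:
a specified picture obtaining subunit, configured to obtain specified pictures of different categories;
a normalized vector Gi obtaining subunit, configured to, according to the formula:
calculate the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn);
a training set obtaining subunit, configured to take the n-dimensional vectors of the specified pictures of the k-th category among the specified pictures of different categories as the positive set, and the n-dimensional vectors of the specified pictures other than those of the k-th category as the negative set, thereby forming the training set of the k-th support vector machine;
a training subunit, configured to input the sample data of the training set of the k-th support vector machine into a support vector machine for training, to obtain the k-th support vector machine.
The operations performed by the above subunits correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.
In one embodiment, the specified picture obtaining unit 10 comprises:
a to-be-classified specified picture obtaining subunit, configured to obtain the specified picture to be classified;
a table frame line position determining subunit, configured to inspect the specified picture to be classified and determine the position of the closed table frame lines in it;
a removal subunit, configured to remove the part of the specified picture to be classified that lies outside the table frame lines, to obtain a specified picture to be classified that contains only the table frame lines.
The operations performed by the above subunits correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.
In one embodiment, the n-dimensional vector obtaining unit 20 comprises:
a rotation subunit, configured to rotate the table frame lines by 90 degrees three times in succession, clockwise or counterclockwise, and, before each rotation and after the third rotation, calculate a first distance between the second intersection in the first row of the table frame lines and the first intersection in the first row of the table frame lines, thereby obtaining four first distances;
a table frame line obtaining subunit, configured to obtain the table frame lines corresponding to the smallest of the four first distances;
an n-dimensional vector obtaining subunit, configured to, based on the table frame lines corresponding to the smallest of the four first distances, according to the formula:
calculate the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn).
The operations performed by the above subunits correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.
In one embodiment, the category marking unit 50 comprises:
a character recognition subunit, configured to obtain the text information in the specified picture to be classified using a character recognition technique;
a specific word extraction subunit, configured to extract specific words from the text information, the specific words being pre-stored in a specific-word table;
an estimated category subunit, configured to obtain an estimated category of the specified picture to be classified according to the specific words and a preset correspondence between specific words and categories of specified pictures;
a category marking subunit, configured to, if the estimated category is the same as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results), record, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results).
The operations performed by the above subunits correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.
In one embodiment, the specific word extraction subunit comprises:
a specific word determination module, configured to determine whether the text information contains a specific word pre-stored in the specific-word table;
a specific word extraction module, configured to, if the text information contains a specific word pre-stored in the specific-word table, extract the specific word from the text information.
The operations performed by the above modules correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.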
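In one embodiment, the support vector machine obtaining unit comprises:
a kernel function setting subunit, configured to set the kernel function of the support vector machine to a Gaussian kernel function, whose expression is $K(x_i, x_j) = \exp\{-\|x_i - x_j\|^2 / (2\sigma^2)\}$, where $x_i$ is the n-dimensional vector (G1, G2, …, Gn), $x_j$ is the kernel function center, and $\sigma$ is the width parameter of the function.
The operations performed by the above subunit correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.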
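The support vector machine-based character recognition apparatus of this application obtains a specified picture to be classified, calculates the normalized vector Gi of the specified picture to obtain the n-dimensional vector (G1, G2, …, Gn), inputs the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, and records, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This achieves automatic, fast and accurate recognition of specified pictures, including pictures that are stretched or tilted.
Referring to FIG. 3, an embodiment of this application further provides a computer device, which may be a server and whose internal structure may be as shown in the figure. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data used by the support vector machine-based character recognition method. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a support vector machine-based character recognition method.
The processor executes the above support vector machine-based character recognition method, the steps of which correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.
Those skilled in the art will understand that the structure shown in the figure is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied.
The computer device of this application obtains a specified picture to be classified, calculates the normalized vector Gi of the specified picture to obtain the n-dimensional vector (G1, G2, …, Gn), inputs the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, and records, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This achieves automatic, fast and accurate recognition of specified pictures, including pictures that are stretched or tilted.
An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a support vector machine-based character recognition method, the steps of which correspond one-to-one to the steps of the support vector machine-based character recognition method of the foregoing embodiments and are not repeated here.
The computer-readable storage medium of this application obtains a specified picture to be classified, calculates the normalized vector Gi of the specified picture to obtain the n-dimensional vector (G1, G2, …, Gn), inputs the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, and records, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values of the plurality of support vector machines). This achieves automatic, fast and accurate recognition of specified pictures, including pictures that are stretched or tilted.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.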
Claims (20)
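- A support vector machine-based character recognition method, characterized by comprising: obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines; according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn); inputting the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category; obtaining a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results; recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results); obtaining, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region; and recognizing the character recognition region as text using a preset character recognition technique, and storing the text.
- The support vector machine-based character recognition method according to claim 1, characterized in that the step of obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines, comprises: obtaining the specified picture to be classified; inspecting the specified picture to be classified to determine the position of the closed table frame lines in it; and removing the part of the specified picture to be classified that lies outside the table frame lines, to obtain a specified picture to be classified that contains only the table frame lines.
- The support vector machine-based character recognition method according to claim 1, characterized in that the step of, according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn), comprises: rotating the table frame lines by 90 degrees three times in succession, clockwise or counterclockwise, and, before each rotation and after the third rotation, calculating a first distance between the second intersection in the first row of the table frame lines and the first intersection in the first row of the table frame lines, thereby obtaining four first distances; obtaining the table frame lines corresponding to the smallest of the four first distances; and, based on the table frame lines corresponding to the smallest of the four first distances, according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn).
- The support vector machine-based character recognition method according to claim 1, characterized in that the step of recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results), comprises: obtaining the text information in the specified picture to be classified using a character recognition technique; extracting specific words from the text information, the specific words being pre-stored in a specific-word table; obtaining an estimated category of the specified picture to be classified according to the specific words and a preset correspondence between specific words and categories of specified pictures; and, if the estimated category is the same as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results), recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results).
- The support vector machine-based character recognition method according to claim 5, characterized in that the step of extracting specific words from the text information, the specific words being pre-stored in a specific-word table, comprises: determining whether the text information contains a specific word pre-stored in the specific-word table; and, if the text information contains a specific word pre-stored in the specific-word table, extracting the specific word from the text information.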
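- A support vector machine-based character recognition apparatus, characterized by comprising: a specified picture obtaining unit, configured to obtain a specified picture to be classified, the specified picture to be classified having closed table frame lines; an n-dimensional vector obtaining unit, configured to, according to the formula: calculate the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn); a support vector machine computation unit, configured to input the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category; a preliminary classification result obtaining unit, configured to obtain a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results; a category marking unit, configured to record, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results); a character recognition mode obtaining unit, configured to obtain, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region; and a text recognition unit, configured to recognize the character recognition region as text using a preset character recognition technique, and to store the text.
- The support vector machine-based character recognition method according to claim 7, characterized in that the apparatus comprises a support vector machine obtaining unit, which comprises: a specified picture obtaining subunit, configured to obtain specified pictures of different categories; a normalized vector Gi obtaining subunit, configured to, according to the formula: calculate the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn); a training set obtaining subunit, configured to take the n-dimensional vectors of the specified pictures of the k-th category among the specified pictures of different categories as the positive set, and the n-dimensional vectors of the specified pictures other than those of the k-th category as the negative set, thereby forming the training set of the k-th support vector machine; and a training subunit, configured to input the sample data of the training set of the k-th support vector machine into a support vector machine for training, to obtain the k-th support vector machine.
- The support vector machine-based character recognition method according to claim 7, characterized in that the specified picture obtaining unit comprises: a to-be-classified specified picture obtaining subunit, configured to obtain the specified picture to be classified; a table frame line position determining subunit, configured to inspect the specified picture to be classified and determine the position of the closed table frame lines in it; and a removal subunit, configured to remove the part of the specified picture to be classified that lies outside the table frame lines, to obtain a specified picture to be classified that contains only the table frame lines.
- The support vector machine-based character recognition method according to claim 7, characterized in that the n-dimensional vector obtaining unit comprises: a rotation subunit, configured to rotate the table frame lines by 90 degrees three times in succession, clockwise or counterclockwise, and, before each rotation and after the third rotation, calculate a first distance between the second intersection in the first row of the table frame lines and the first intersection in the first row of the table frame lines, thereby obtaining four first distances; a table frame line obtaining subunit, configured to obtain the table frame lines corresponding to the smallest of the four first distances; and an n-dimensional vector obtaining subunit, configured to, based on the table frame lines corresponding to the smallest of the four first distances, according to the formula: calculate the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn).
- The support vector machine-based character recognition method according to claim 7, characterized in that the category marking unit comprises: a character recognition subunit, configured to obtain the text information in the specified picture to be classified using a character recognition technique; a specific word extraction subunit, configured to extract specific words from the text information, the specific words being pre-stored in a specific-word table; an estimated category subunit, configured to obtain an estimated category of the specified picture to be classified according to the specific words and a preset correspondence between specific words and categories of specified pictures; and a category marking subunit, configured to, if the estimated category is the same as the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results), record, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results).
- The support vector machine-based character recognition method according to claim 11, characterized in that the specific word extraction subunit comprises: a specific word determination module, configured to determine whether the text information contains a specific word pre-stored in the specific-word table; and a specific word extraction module, configured to, if the text information contains a specific word pre-stored in the specific-word table, extract the specific word from the text information.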
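- A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, characterized in that the processor, when executing the computer-readable instructions, implements a support vector machine-based character recognition method, the method comprising: obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines; according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn); inputting the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category; obtaining a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results; recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results); obtaining, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region; and recognizing the character recognition region as text using a preset character recognition technique, and storing the text.
- The computer device according to claim 13, characterized in that the step of obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines, comprises: obtaining the specified picture to be classified; inspecting the specified picture to be classified to determine the position of the closed table frame lines in it; and removing the part of the specified picture to be classified that lies outside the table frame lines, to obtain a specified picture to be classified that contains only the table frame lines.
- The computer device according to claim 13, characterized in that the step of, according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn), comprises: rotating the table frame lines by 90 degrees three times in succession, clockwise or counterclockwise, and, before each rotation and after the third rotation, calculating a first distance between the second intersection in the first row of the table frame lines and the first intersection in the first row of the table frame lines, thereby obtaining four first distances; obtaining the table frame lines corresponding to the smallest of the four first distances; and, based on the table frame lines corresponding to the smallest of the four first distances, according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn).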
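- A computer non-volatile readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement a support vector machine-based character recognition method, the method comprising: obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines; according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn); inputting the n-dimensional vector (G1, G2, …, Gn) into a plurality of preset, trained support vector machines for computation, where the k-th support vector machine is able to classify a specified picture as either the k-th category or a category other than the k-th category; obtaining a plurality of preliminary classification results respectively output by the plurality of support vector machines, and output values corresponding to the plurality of preliminary classification results; recording, as the category of the specified picture to be classified, the preliminary classification result corresponding to the return value of max(the output values corresponding to the plurality of preliminary classification results); obtaining, according to the category of the specified picture and a preset correspondence between categories and character recognition modes, the character recognition mode corresponding to the specified picture, where the character recognition mode specifies a character recognition region; and recognizing the character recognition region as text using a preset character recognition technique, and storing the text.
- The computer non-volatile readable storage medium according to claim 17, characterized in that the step of obtaining a specified picture to be classified, the specified picture to be classified having closed table frame lines, comprises: obtaining the specified picture to be classified; inspecting the specified picture to be classified to determine the position of the closed table frame lines in it; and removing the part of the specified picture to be classified that lies outside the table frame lines, to obtain a specified picture to be classified that contains only the table frame lines.
- The computer non-volatile readable storage medium according to claim 17, characterized in that the step of, according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn), comprises: rotating the table frame lines by 90 degrees three times in succession, clockwise or counterclockwise, and, before each rotation and after the third rotation, calculating a first distance between the second intersection in the first row of the table frame lines and the first intersection in the first row of the table frame lines, thereby obtaining four first distances; obtaining the table frame lines corresponding to the smallest of the four first distances; and, based on the table frame lines corresponding to the smallest of the four first distances, according to the formula: calculating the normalized vector Gi of the specified picture, where the intersection at the top-left corner of the table frame lines is the origin g0, gi is the vector from the origin to the i-th intersection of the table frame lines, i is an integer with 1 ≤ i ≤ n, and the table frame lines have n+1 intersections, thereby obtaining the n-dimensional vector (G1, G2, …, Gn).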
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910100425.2 | 2019-01-31 | ||
CN201910100425.2A CN109902724B (zh) | 2019-01-31 | 2019-01-31 | Support vector machine-based character recognition method and apparatus, and computer device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020155484A1 (zh) | 2020-08-06 |
Family
ID=66944661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/089057 WO2020155484A1 (zh) | Support vector machine-based character recognition method and apparatus, and computer device | 2019-01-31 | 2019-05-29 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109902724B (zh) |
WO (1) | WO2020155484A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111611990B (zh) * | 2020-05-22 | 2023-10-31 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and apparatus for recognizing tables in images |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
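CN105320961A (zh) * | 2015-10-16 | 2016-02-10 | Chongqing University of Posts and Telecommunications | Handwritten digit recognition method based on convolutional neural network and support vector machine |
CN107239786A (zh) * | 2016-03-29 | 2017-10-10 | Alibaba Group Holding Limited | Character recognition method and apparatus |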
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
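CN102982343B (zh) * | 2012-11-12 | 2015-03-25 | Xinyang Normal University | Incremental fuzzy support vector machine method for handwritten digit recognition |
CN104517112B (zh) * | 2013-09-29 | 2017-11-28 | Peking University Founder Group Co., Ltd. | Table recognition method and system |
CN107688829A (zh) * | 2017-08-29 | 2018-02-13 | Hunan University of Finance and Economics | Support vector machine-based recognition system and recognition method |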
-
2019
- 2019-01-31 CN CN201910100425.2A patent/CN109902724B/zh active Active
- 2019-05-29 WO PCT/CN2019/089057 patent/WO2020155484A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN109902724B (zh) | 2023-09-01 |
CN109902724A (zh) | 2019-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
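CN109492643B | | OCR-based certificate recognition method and apparatus, computer device and storage medium |
WO2021120752A1 | | Domain-adaptive model training and image detection methods, apparatus, device and medium |
WO2020155518A1 | | Object detection method and apparatus, computer device and storage medium |
WO2019128646A1 | | Face detection method, convolutional neural network parameter training method, apparatus and medium |
WO2019232862A1 | | Mouth model training method, mouth recognition method, apparatus, device and medium |
US20190279045A1 | | Methods and apparatuses for identifying object category, and electronic devices |
WO2019232866A1 | | Human eye model training method, human eye recognition method, apparatus, device and medium |
Jiang et al. | | Robust feature matching for remote sensing image registration via linear adaptive filtering |
WO2019232853A1 | | Chinese model training method, Chinese image recognition method, apparatus, device and medium |
Escalera et al. | | Blurred shape model for binary and grey-level symbol recognition |
WO2021136027A1 | | Similar image detection method and apparatus, device and storage medium |
EP3690700A1 | | Image similarity calculation method and device, and storage medium |
WO2020220575A1 | | Certificate recognition method and apparatus, electronic device, and computer-readable storage medium |
JP2021193610A | | Information processing method, information processing apparatus, electronic device and storage medium |
US12112522B2 | | Defect detecting method based on dimensionality reduction of data, electronic device, and storage medium |
CN112396047B | | Training sample generation method and apparatus, computer device and storage medium |
JP6170860B2 | | Character recognition device and discriminant function generation method |
US11893773B2 | | Finger vein comparison method, computer equipment, and storage medium |
CN111523537A | | Character recognition method, storage medium and system |
CN111199558A | | Deep learning-based image matching method |
CN111985469A | | Method and apparatus for recognizing text in an image, and electronic device |
CN114357200A | | Cross-modal hash retrieval method based on supervised graph embedding |
Zhang et al. | | Graph fusion network for multi-oriented object detection |
WO2020155484A1 | | Support vector machine-based character recognition method and apparatus, and computer device |
CN109034279B | | Handwriting model training method, handwritten character recognition method, apparatus, device and medium |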
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19913496; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19913496; Country of ref document: EP; Kind code of ref document: A1 |