WO2022142611A1 - Character recognition method and apparatus, storage medium, and computer device
- Publication number: WO2022142611A1 (PCT/CN2021/125181)
- Authority: WIPO (PCT)
- Prior art keywords: text, image, underlying, feature, data set
- Prior art date
Classifications
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Pattern recognition: fusion techniques of extracted features
- G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/56 — Extraction of image or video features relating to colour
- G06V20/62 — Scenes; scene-specific elements: text, e.g. of license plates, overlay texts or captions on TV images
- G06V30/10 — Character recognition
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the technical field of character recognition, and in particular, to a character recognition method and device, storage medium, and computer equipment.
- Character recognition is a key step in Optical Character Recognition (OCR), and its applications in the financial field include bank card recognition, ID card recognition, and bill recognition.
- the present application provides a character recognition method and device, a storage medium, and a computer device.
- a method for character recognition, comprising:
- acquiring a text image;
- extracting underlying features from the text image, fusing the obtained underlying color features and underlying texture features, and determining a feature vector of a preset text region in the text image;
- recognizing the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and a second training data set;
- outputting the text data.
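As a loose illustration of how these four steps fit together, the flow could be sketched as follows; every name below is a hypothetical placeholder, not an identifier taken from this application.

```python
"""Illustrative wiring of the four claimed steps; all names are placeholders."""
import numpy as np

def extract_feature_vectors(image: np.ndarray) -> np.ndarray:
    # Placeholder for step 102: fuse underlying color and texture features of the
    # preset text region (concrete sketches of those features appear further below).
    return image.reshape(-1, image.shape[-1]).astype(np.float32)

class TextRecognitionModel:
    """Placeholder for the pre-trained CNN configured with multi-size kernels."""
    def predict(self, features: np.ndarray) -> str:
        return "<recognized text>"

def recognize_text(image: np.ndarray, model: TextRecognitionModel) -> str:
    features = extract_feature_vectors(image)   # step 102: feature extraction and fusion
    text = model.predict(features)               # step 103: recognition
    return text                                  # step 104: output

print(recognize_text(np.zeros((32, 100, 3), dtype=np.uint8), TextRecognitionModel()))
```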
- a character recognition device comprising:
- the acquisition unit is used to acquire text images
- an extraction unit configured to extract the underlying features of the text image, perform fusion processing on the obtained underlying color features and underlying texture features, and determine the feature vector of the preset text area in the text image;
- the recognition unit is configured to recognize the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using the pre-constructed first training data set and second training data set;
- An output unit for outputting the text data.
- the present application can realize character recognition without labeling data, thereby improving the character recognition efficiency.
- FIG. 1 shows a flowchart of a method for character recognition provided by an embodiment of the present application
- FIG. 2 shows a block diagram of the composition of a character recognition device provided by an embodiment of the present application
- FIG. 3 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
- the embodiment of the present application provides a method for character recognition, as shown in FIG. 1 , the method includes:
- the text image may specifically be an image containing text in different languages.
- In recent years, with the digital transformation of finance in Southeast Asian countries, the demand for text recognition in low-resource languages such as Thai has grown rapidly, and text recognition technology for such languages has emerged accordingly. A large number of text images in these languages also exist on the Internet, which broadens the application scope of the embodiments of the present application and makes it possible to obtain training and test samples quickly.
- the specific process may include: reading the text region of the text image; extracting the underlying color features and underlying texture features from the text region; fusing the underlying color features and underlying texture features to obtain underlying local features; extracting the label-layer global feature of the text region; and fusing the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region.
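A minimal sketch of this fusion step, assuming concatenation is used to realize the described superposition of features; the application itself does not fix the exact fusion operator or the feature dimensions used below.

```python
import numpy as np

def fuse_region_features(color_feats: np.ndarray,
                         texture_feats: np.ndarray,
                         global_feat: np.ndarray) -> np.ndarray:
    """color_feats, texture_feats: per-pixel underlying local features of the text region;
    global_feat: label-layer global feature of the whole region.
    Returns one feature vector per pixel of the region."""
    local = np.concatenate([color_feats, texture_feats], axis=1)   # fuse the underlying local features
    tiled = np.tile(global_feat, (local.shape[0], 1))              # attach the global feature to every pixel
    return np.concatenate([local, tiled], axis=1)

# toy usage: 200 pixels, 6-dim color, 6-dim texture, 16-dim global feature -> (200, 28)
print(fuse_region_features(np.zeros((200, 6)), np.zeros((200, 6)), np.zeros(16)).shape)
```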
- the character recognition model is obtained by training using a convolutional neural network model configured with convolution kernels of various sizes and a pre-constructed first training data set and a second training data set.
- the text recognition model can be a pre-trained convolutional neural network model, with a large number of unlabeled text images collected from the Internet in advance serving as training sample data and test sample data. For example, 100,000 unlabeled low-resource-language text images can be collected, of which 90,000 are used as training samples and 10,000 as test samples.
- Feature extraction is performed on the training sample data and the test sample data to obtain feature vectors, and text data can be obtained by recognizing the feature vectors through a pre-trained text recognition model.
- the text data can be output, and in practical application scenarios, the output text data can be saved to a storage node of a pre-established blockchain network in order to improve the security of text data storage; especially for privacy-related information, this effectively prevents information leakage.
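The application does not describe the node interface, so the following is only a rough sketch of the underlying integrity idea (a hash-chained record of recognition results); the actual submission to a blockchain node would go through whatever project-specific API that node exposes.

```python
import hashlib
import json
import time

def make_record(text_data: str, prev_hash: str) -> dict:
    """Build a tamper-evident, hash-chained record for one recognition result (illustrative only)."""
    record = {"timestamp": time.time(), "text": text_data, "prev_hash": prev_hash}
    record["hash"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()
    return record

genesis = make_record("", "0" * 64)
entry = make_record("recognized Thai text ...", genesis["hash"])
print(entry["hash"])
```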
- the present application provides a text recognition method that acquires a text image; performs underlying feature extraction on the text image and fuses the obtained underlying color features and underlying texture features to determine the feature vector of a preset text region in the text image; recognizes the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set; and outputs the text data. This addresses the technical problems that existing labeled data are scarce, that manual labeling requires strong understanding of the language, and that labeling is difficult; text recognition can be achieved without labeled data, and recognition efficiency is improved.
- the embodiment of the present application provides several optional embodiments, but is not limited thereto, as follows:
- the method further includes: reading the text region of the text image; extracting underlying color features and underlying texture features from the text region; fusing the underlying color features and underlying texture features to obtain underlying local features; extracting the label-layer global feature of the text region; and fusing the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region.
- texture is an important underlying feature for describing an image; it is a global feature and a salient property of an object's surface or of a described region in the image.
- the texture feature of an image reflects the grayscale variation pattern of the image, as well as the structural information and spatial distribution information of the image.
- when analyzing an image, the grayscale variation pattern can be digitized and texture features can be extracted.
- the texture features are scale-invariant and rotation-invariant.
- the underlying texture feature may be obtained by a statistical analysis method, a structural analysis method, or a spectrum analysis method, etc., which is not specified in the embodiment of the present application.
- Color is another important underlying feature of an image. It describes the visual characteristics of an image or of image regions, and is widely used in color image processing.
- the extraction process can specifically include: selecting an appropriate color space to describe the color features, and quantizing the color features with a suitable method. Commonly used color spaces include RGB, CMY, etc., which are not specified in the embodiments of the present application.
- the method further includes: extracting the underlying color feature of each pixel in the text region block in the RGB color space; converting the text region into a grayscale image; extracting Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel; and fusing the underlying color features and the underlying texture features to obtain underlying local features.
- the specific process of extracting the underlying color feature of each pixel in the text region block in the RGB color space may include: first decomposing the image into the three color channels R, G, and B, and for each pixel extracting its R, G, and B color components together with the means of the R, G, and B components over its 8-neighborhood.
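A sketch of this per-pixel color feature under the stated definition (three channel values plus the mean of each channel over the eight neighbors), assuming an H×W×3 RGB array as input:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def color_features(rgb: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) image of the text region block.
    Returns (H, W, 6): R, G, B of each pixel plus the 8-neighbor mean of each channel."""
    rgb = rgb.astype(np.float32)
    feats = [rgb]
    for c in range(3):
        channel = rgb[..., c]
        neighbor_mean = (uniform_filter(channel, size=3) * 9.0 - channel) / 8.0  # mean of the 8 neighbors
        feats.append(neighbor_mean[..., None])
    return np.concatenate(feats, axis=-1)

print(color_features(np.zeros((32, 100, 3), dtype=np.uint8)).shape)  # (32, 100, 6)
```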
- the process of extracting Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel may specifically include: filtering the grayscale image of the text region block with a bank of Gabor filters of the same frequency but different directions and different scales, and then merging the filtered images per scale according to the formula, where i denotes the scale, j denotes the direction, gabor_i denotes the merged Gabor-filtered image at scale i, and gabor_{i,j} denotes the Gabor-filtered image at scale i and direction j; the corresponding texture features are then extracted from the merged filtered image, taking the mean and variance of the Gabor coefficients in the 3×3 neighborhood of each pixel as that pixel's texture feature.
- the color image can be converted into a grayscale image first; Gabor filters of the same frequency are applied to the grayscale image in 8 directions (0°, 30°, 45°, 60°, 90°, 120°, 135°, 150°) and at 3 scales; the output images after Gabor filtering at each scale are merged to obtain a merged filtered image, so that for each image one merged Gabor-filtered image is obtained per scale.
- the corresponding texture features are extracted from the 3 ⁇ 3 neighborhood of each pixel on the combined Gabor filtered image; the mean and variance of the Gabor coefficients are extracted from the 3 ⁇ 3 neighborhood of each pixel as the texture feature of each pixel.
- 2-dimensional Gabor features can be extracted on each scale, so a total of 6-dimensional Gabor features are extracted on 3 scales.
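A sketch of this texture feature using OpenCV's Gabor kernels; the kernel sizes, wavelength, and the use of summation as the per-scale merge are assumptions made here for illustration, since the text only fixes the 8 directions, 3 scales, and the 3×3-neighborhood mean and variance.

```python
import cv2
import numpy as np
from scipy.ndimage import uniform_filter

def gabor_texture_features(gray: np.ndarray) -> np.ndarray:
    """gray: (H, W) grayscale text region. Returns (H, W, 6): mean and variance of the
    merged Gabor response in each pixel's 3x3 neighborhood, for each of 3 scales."""
    thetas = np.deg2rad([0, 30, 45, 60, 90, 120, 135, 150])      # 8 directions
    ksizes = [7, 11, 15]                                           # 3 scales (assumed kernel sizes)
    gray = gray.astype(np.float32)
    feats = []
    for ksize in ksizes:
        responses = [cv2.filter2D(gray, cv2.CV_32F,
                                  cv2.getGaborKernel((ksize, ksize), sigma=ksize / 3.0,
                                                     theta=t, lambd=8.0, gamma=0.5, psi=0))
                     for t in thetas]                              # same frequency (lambd) for all filters
        merged = np.sum(responses, axis=0)                         # merge the 8 directions at this scale
        mean = uniform_filter(merged, size=3)                      # 3x3-neighborhood mean
        var = uniform_filter(merged ** 2, size=3) - mean ** 2      # 3x3-neighborhood variance
        feats += [mean, var]
    return np.stack(feats, axis=-1)

print(gabor_texture_features(np.zeros((32, 100), dtype=np.uint8)).shape)  # (32, 100, 6)
```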
- the method further includes: using the constructed first training data set to train a preset convolutional neural network model to obtain a pre-training model; using the constructed second training data set to train the pre-training The model is trained to obtain a text recognition model.
- the first training data set may be unlabeled text images.
- for an unlabeled text image, the text is usually arranged horizontally along the long side, so the image can be split along its long side into several equal sub-blocks; the characters in each sub-block are recognized and then spliced back into a complete sentence. Because the classification result is a single character, the image must be segmented.
- the number of sub-blocks that the image is divided into can be set as required, which is not specified in this embodiment of the present application.
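A small sketch of splitting a text-line image into equal sub-blocks along its long side; the number of sub-blocks is left as a parameter, as the text notes.

```python
import numpy as np

def split_into_subblocks(image: np.ndarray, num_blocks: int) -> list:
    """image: (H, W, C) text-line image with the text running along the long (W) side.
    Returns num_blocks sub-blocks of equal width (the last one absorbs any remainder)."""
    width = image.shape[1]
    step = width // num_blocks
    blocks = [image[:, i * step:(i + 1) * step] for i in range(num_blocks - 1)]
    blocks.append(image[:, (num_blocks - 1) * step:])   # last block keeps the remainder
    return blocks

blocks = split_into_subblocks(np.zeros((32, 256, 3), dtype=np.uint8), 8)
print([b.shape[1] for b in blocks])  # eight sub-blocks of width 32
```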
- the convolutional neural network model can compute the cross-entropy loss for two binary classification tasks, namely: 1. whether the order of the current image's sub-blocks is correct; 2. whether the current sequence of sub-blocks contains sub-blocks from other images.
- the overall optimization goal is to minimize the sum of the cross-entropy losses of the two binary classification tasks, so that semantic information can be learned from a large amount of unlabeled image data.
- cross-entropy can be used as the loss function.
- cross-entropy is commonly used in classification problems, especially when neural networks are used for classification.
- because cross-entropy involves computing the probability of each category, it almost always appears together with a sigmoid (or softmax) function.
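A sketch of the two self-supervised pretext labels and the summed loss; the 50% shuffling/replacement probability follows the example given later in the description, and the recognition network itself is left abstract here.

```python
import random
import torch
import torch.nn.functional as F

def make_pretext_sample(blocks, foreign_blocks, p_shuffle=0.5, p_replace=0.5):
    """blocks: sub-blocks of one image; foreign_blocks: pool of sub-blocks from other images.
    Returns (possibly corrupted blocks, order-correct label, no-foreign-block label)."""
    blocks, order_ok, no_foreign = list(blocks), 1.0, 1.0
    if random.random() < p_shuffle:                    # task 1: is the sub-block order correct?
        random.shuffle(blocks)
        order_ok = 0.0
    if random.random() < p_replace:                    # task 2: are there sub-blocks from other images?
        blocks[random.randrange(len(blocks))] = random.choice(foreign_blocks)
        no_foreign = 0.0
    return blocks, order_ok, no_foreign

def pretext_loss(order_logit, foreign_logit, order_ok, no_foreign):
    """Sum of the two binary cross-entropy losses: the overall pre-training objective."""
    return (F.binary_cross_entropy_with_logits(order_logit, torch.tensor([order_ok]))
            + F.binary_cross_entropy_with_logits(foreign_logit, torch.tensor([no_foreign])))

print(pretext_loss(torch.tensor([0.3]), torch.tensor([-0.2]), 1.0, 0.0))
```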
- the method further includes: presetting a convolutional neural network model with preconfigured convolution kernels of multiple sizes; dividing the acquired unlabeled images into multiple sub-blocks and randomly shuffling or replacing the sub-blocks with a preset probability to construct a first training data set; training the convolutional neural network model with the first training data set to obtain a pre-training model; dividing the acquired labeled images into multiple sub-blocks and randomly shuffling or replacing the sub-blocks with a preset probability to construct a second training data set; and training the pre-training model with the second training data set to obtain the text recognition model.
- the image can be convolved with a pre-designed multi-scale sliding-window convolutional neural network.
- for example, if the size of the input image is W×H, convolution kernels of three sizes, 2×H, 3×H, and 4×H, can be used to learn the context between 2 pixels, 3 pixels, and 4 pixels respectively; each kernel slides over the image with a stride of 1 pixel, so that contexts of different lengths are captured.
- the number and size of the pre-configured convolution kernels can be set according to requirements, and this application does not specify them.
- for example, if the input image contains the four characters a, b, c, and d, each occupying one pixel, then the 2×H convolution kernel can extract the information of "ab", the 3×H convolution kernel can extract the information of "abc", and the 4×H convolution kernel can extract the information of "abcd".
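A sketch of this multi-scale idea in PyTorch, with kernels 2, 3, and 4 pixels wide spanning the full image height and sliding with stride 1; the channel count and the absence of padding are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MultiScaleTextConv(nn.Module):
    """Parallel convolutions with 2xH, 3xH and 4xH kernels over a text-line image of height H."""
    def __init__(self, height: int, channels: int = 64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(1, channels, kernel_size=(height, k), stride=1)   # context of k pixel columns
            for k in (2, 3, 4)
        ])

    def forward(self, x):                                   # x: (B, 1, H, W)
        return [branch(x).squeeze(2) for branch in self.branches]   # each: (B, C, W - k + 1)

model = MultiScaleTextConv(height=32)
outputs = model(torch.zeros(1, 1, 32, 100))
print([o.shape for o in outputs])
```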
- the method further includes: determining the minimum size of the segmented regions according to the acquired unlabeled images; performing superpixel segmentation on the unlabeled images according to the minimum size of the segmented regions to obtain segmented images; determining an image fusion threshold based on the segmented images; performing region fusion on the segmented images according to the image fusion threshold to obtain fused images; and labeling the local areas of the fused images that each contain only one target image, to obtain labeled images.
- superpixel segmentation is an image segmentation technology proposed and developed by Xiaofeng Ren in 2003, which refers to irregular pixel blocks with certain visual significance composed of adjacent pixels with similar texture, color, brightness and other characteristics. It uses the similarity of features between pixels to group pixels, and replaces a large number of pixels with a small number of superpixels to express image features, which greatly reduces the complexity of image post-processing, so it is usually used as a preprocessing step in segmentation algorithms.
- a target detection model may be used to determine the coordinate information of the target image in the to-be-labeled image, and the minimum size of the segmented regions is determined based on the coordinate information. The second average color value of the pixels in the unlabeled area of the current segmented image is obtained, and the current threshold is determined based on the second average color value. The first average color value of the pixels in each segmented region of the segmented image is obtained; according to the current threshold and the first average color values, region fusion is applied to merge any two segmented regions whose first average color values differ by less than the current threshold, yielding the current fused image. According to the category of the target image, the local areas that contain only one target image are labeled.
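A rough sketch of the segmentation-and-fusion part of this auto-labeling idea, using SLIC superpixels from scikit-image and a single fixed color-difference threshold; the adaptive threshold, minimum region size, and the final category labeling described above are simplified away here.

```python
import numpy as np
from skimage.segmentation import slic

def fuse_regions(image: np.ndarray, n_segments: int = 200, threshold: float = 10.0) -> np.ndarray:
    """Superpixel-segment the image, then merge regions whose mean colors differ
    by less than `threshold` (a simplified stand-in for the described region fusion)."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    means = np.array([image[labels == lab].mean(axis=0) for lab in range(labels.max() + 1)])
    mapping = np.arange(len(means))
    for i in range(len(means)):                 # naive pass: relabel to the first earlier region with a close mean
        for j in range(i):
            if np.linalg.norm(means[i] - means[j]) < threshold:
                mapping[i] = mapping[j]
                break
    return mapping[labels]

fused = fuse_regions(np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8))
print(len(np.unique(fused)), "regions after fusion")
```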
- the method may further include: saving the output text data in a storage node of a pre-established blockchain network.
- the present application provides a text recognition method that acquires a text image; performs underlying feature extraction on the text image and fuses the obtained underlying color features and underlying texture features to determine the feature vector of a preset text region in the text image; recognizes the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set; and outputs the text data. This addresses the technical problems that existing labeled data are scarce, that manual labeling requires strong understanding of the language, and that labeling is difficult; text recognition can be achieved without labeled data, and recognition efficiency is improved.
- an embodiment of the present application provides a character recognition device. As shown in FIG. 2, the device includes:
- an acquisition unit 21 which can be used to acquire text images
- the extraction unit 22 can be used to extract the underlying features of the text image, perform fusion processing on the obtained underlying color features and underlying texture features, and determine the feature vector of the preset text area in the text image;
- the recognition unit 23 can be used to recognize the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using the pre-constructed first training data set and second training data set;
- the output unit 24 can be used to output the text data.
- the extraction unit 22 includes:
- the reading module 221 can be used to read the text area of the text image
- the extraction module 222 can be used to extract the underlying color feature and the underlying texture feature according to the text area;
- the fusion module 223 can be used to fuse the underlying color features and underlying texture features to obtain underlying local features;
- the extraction module 222 can also be specifically used to extract the label layer global feature of the text region;
- the fusion module 223 can also be specifically configured to fuse the underlying local features of the text region with the label layer global features of the text region to obtain feature vectors of all pixels in the text region.
- the extraction module 222 can specifically be used to extract the underlying color feature of each pixel in the text region block in the RGB color space; convert the text region into a grayscale image; extract Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel; and fuse the underlying color features and the underlying texture features to obtain underlying local features.
- the device also includes:
- the first training unit 25 can be used to train a preset convolutional neural network model by using the constructed first training data set to obtain a pre-training model;
- the second training unit 26 may be configured to use the constructed second training data set to train the pre-training model to obtain a character recognition model.
- the device also includes:
- the configuration unit 27 can be used to preset a convolutional neural network model by using preconfigured convolution kernels of various sizes;
- the first construction unit 28 can be used to divide the acquired unlabeled image into multiple sub-blocks, and randomly scramble or replace the multiple sub-blocks according to a preset probability to construct a first training data set;
- the first training unit 25 may be specifically configured to use the first training data set to train the convolutional neural network model to obtain a pre-training model
- the second construction unit 29 can be used to divide the acquired marked image into a plurality of sub-blocks, and randomly scramble or replace the plurality of sub-blocks according to a preset probability to construct a second training data set;
- the second training unit 26 may be specifically configured to use the second training data set to train the pre-training model to obtain a character recognition model.
- the device also includes:
- the determining unit 210 may be configured to determine the minimum size of the segmented area according to the acquired unlabeled image
- the segmentation unit 211 can be configured to perform superpixel segmentation on the unlabeled image according to the minimum size of the segmented area to obtain a segmented image;
- the determining unit 210 may be specifically configured to determine an image fusion threshold based on the segmented image
- the fusion unit 212 may be configured to perform regional fusion on the segmented image according to the image fusion threshold to obtain a fusion image;
- the labeling unit 213 may be configured to label the local areas of the fused image that each contain only one target image, to obtain a labeled image.
- the device also includes:
- the saving unit 214 can be used to save the output text data in the storage node of the pre-established blockchain network.
- a storage medium stores at least one executable instruction, and the executable instruction can cause the character recognition method in any of the foregoing method embodiments to be performed.
- Fig. 3 shows a schematic structural diagram of a computer device provided according to an embodiment of the present application, and the specific embodiment of the present application does not limit the specific implementation of the computer device.
- the computer device may include: a processor (processor) 302 , a communication interface (Communications Interface) 304 , a memory (memory) 306 , and a communication bus 308 .
- the processor 302 , the communication interface 304 , and the memory 306 communicate with each other through the communication bus 308 .
- the communication interface 304 is used for communicating with network elements of other devices such as clients or other servers.
- the processor 302 is configured to execute the program 310, and specifically may execute the relevant steps in the above embodiments of the character recognition method.
- the program 310 may include program code including computer operation instructions.
- the processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
- the one or more processors included in the computer equipment may be the same type of processors, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
- the memory 306 is used to store the program 310 .
- Memory 306 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
- the memory can be non-volatile or volatile.
- the program 310 can specifically be used to cause the processor 302 to perform the following operations:
- acquiring a text image;
- extracting underlying features from the text image, fusing the obtained underlying color features and underlying texture features, and determining a feature vector of a preset text region in the text image;
- recognizing the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and a second training data set;
- outputting the text data.
- the modules or steps of the present application can be implemented with a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Alternatively, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in a different order than described here, or they may be made into individual integrated circuit modules separately, or multiple of the modules or steps may be made into a single integrated circuit module.
- the present application is not limited to any particular combination of hardware and software.
Abstract
The present application discloses a character recognition method and apparatus, a storage medium, and a computer device, relating to the technical field of character recognition. The main purpose is to realize character recognition without labeled data and to improve recognition efficiency, while using blockchain network nodes to store the recognition results and improve the storage security of those results. The method includes: acquiring a text image; performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining the feature vector of a preset text region in the text image; recognizing the feature vector with a pre-trained text recognition model to obtain text data; and outputting the text data. The present application is applicable to character recognition.
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on December 28, 2020, with application number CN202011576748.8 and entitled "Character recognition method and apparatus, storage medium, and computer device", the entire contents of which are incorporated herein by reference.
The present application relates to the technical field of character recognition, and in particular to a character recognition method and apparatus, a storage medium, and a computer device.
As computer technology is applied more and more widely, character recognition is gradually being used in different fields. Character recognition is a key step in Optical Character Recognition (OCR); its applications in finance include bank card recognition, ID card recognition, and bill recognition. In recent years, with the digital transformation of finance in Southeast Asian countries, the demand for recognizing text in low-resource languages such as Thai has grown rapidly, and text recognition technology for these languages has emerged accordingly.
At present, traditional character recognition methods are usually based on deep learning models and use the CTC loss function to measure the error between predictions and ground-truth labels. However, this approach requires a large amount of labeled data, and text recognition for low-resource languages faces a serious challenge: existing labeled data for these languages are scarce, manual labeling requires a strong understanding of the language, and labeling is difficult.
Summary of the Invention
In view of this, the present application provides a character recognition method and apparatus, a storage medium, and a computer device.
According to one aspect of the present application, a character recognition method is provided, comprising:
acquiring a text image;
performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining a feature vector of a preset text region in the text image;
recognizing the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set;
outputting the text data.
According to another aspect of the present application, a character recognition apparatus is provided, comprising:
an acquisition unit, configured to acquire a text image;
an extraction unit, configured to perform underlying feature extraction on the text image, fuse the obtained underlying color features and underlying texture features, and determine a feature vector of a preset text region in the text image;
a recognition unit, configured to recognize the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set;
an output unit, configured to output the text data.
By virtue of the above technical solutions, the technical solutions provided in the embodiments of the present application have at least the following advantages:
The present application can realize character recognition without labeled data, improving the efficiency of character recognition.
Various other advantages and benefits will become clear to those of ordinary skill in the art from the detailed description of the preferred embodiments below. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flowchart of a character recognition method provided by an embodiment of the present application;
FIG. 2 shows a block diagram of a character recognition apparatus provided by an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly, and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
An embodiment of the present application provides a character recognition method. As shown in FIG. 1, the method includes:
101. Acquire a text image.
The text image may specifically be an image containing text in different languages. In recent years, with the digital transformation of finance in Southeast Asian countries, the demand for recognizing text in low-resource languages such as Thai has grown rapidly and text recognition technology for these languages has emerged accordingly; a huge number of text images in such languages also exist on the Internet, which broadens the application scope of the embodiments of the present application and makes it possible to obtain training and test samples quickly.
102. Perform underlying feature extraction on the text image, fuse the obtained underlying color features and underlying texture features, and determine the feature vector of the preset text region in the text image.
The specific process may include: reading the text region of the text image; extracting underlying color features and underlying texture features from the text region; fusing the underlying color features and underlying texture features to obtain underlying local features; extracting the label-layer global feature of the text region; and fusing the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region. Specifically, only the region of the text image that contains low-resource-language text needs to be read out; the underlying color features and underlying texture features of that text region are extracted and superimposed to obtain the fused underlying local features. By extracting the label-layer global feature of the text region and directly superimposing and fusing it with the underlying local features, the feature vector of every pixel in the text region is obtained, so that the feature vectors can subsequently be used for character recognition, improving recognition efficiency and accuracy.
103. Recognize the feature vector with the pre-trained text recognition model to obtain text data.
The text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set. The text recognition model may specifically be a pre-trained convolutional neural network model, with a large number of unlabeled text images collected from the Internet in advance serving as training sample data and test sample data; for example, 100,000 unlabeled low-resource-language text images can be collected, of which 90,000 are used as training samples and 10,000 as test samples. Feature extraction is performed on the training and test samples to obtain feature vectors, and the feature vectors are recognized by the pre-trained text recognition model to obtain text data.
104. Output the text data.
Specifically, after the feature vector has been recognized by the pre-trained text recognition model and text data has been obtained, the text data can be output. In practical application scenarios, the output text data can be saved to a node of a pre-established blockchain network to improve the security of text data storage; especially for privacy-related information, this effectively prevents information leakage.
The present application provides a text recognition method that acquires a text image; performs underlying feature extraction on the text image, fuses the obtained underlying color features and underlying texture features, and determines the feature vector of the preset text region in the text image; recognizes the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set; and outputs the text data. This addresses the technical problems that existing labeled data are scarce, that manual labeling requires strong understanding of the language, and that labeling is difficult; text recognition is achieved without labeled data, and recognition efficiency is improved.
Further, in order to better explain the process of the above character recognition method, and as refinements and extensions of the above embodiment, the embodiments of the present application provide several optional embodiments, without being limited thereto, as follows:
For the embodiments of the present application, the method further includes: reading the text region of the text image; extracting underlying color features and underlying texture features from the text region; fusing the underlying color features and underlying texture features to obtain underlying local features; extracting the label-layer global feature of the text region; and fusing the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region.
Texture is an important underlying feature for describing an image; it is a global feature and a salient property of an object's surface or of a described region in the image. The texture feature of an image reflects the grayscale variation pattern of the image, as well as the structural information and spatial distribution information of the image. When analyzing an image, the grayscale variation pattern can be digitized and texture features extracted; these texture features are scale-invariant and rotation-invariant. For the embodiments of the present application, the underlying texture features may specifically be obtained by statistical analysis, structural analysis, spectral analysis, or other methods, which the embodiments of the present application do not specify. Color is another important underlying feature of an image; it describes the visual characteristics of an image or of image regions and is widely used in color image processing. Its extraction process may specifically include: selecting an appropriate color space to describe the color features, and quantizing the color features with a suitable method. Commonly used color spaces include RGB, CMY, etc., which the embodiments of the present application do not specify.
For the embodiments of the present application, the method further includes: extracting the underlying color feature of each pixel in the text region block in the RGB color space; converting the text region into a grayscale image; extracting Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel; and fusing the underlying color features and the underlying texture features to obtain underlying local features.
The specific process of extracting the underlying color feature of each pixel in the text region block in the RGB color space may include: first decomposing the image into the three color channels R, G, and B, and for each pixel extracting its R, G, and B color components together with the means of the R, G, and B components over its 8-neighborhood. The process of extracting Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel may specifically include: filtering the grayscale image of the text region block with a bank of Gabor filters of the same frequency but different directions and different scales, and then merging the filtered images according to the formula, where i denotes the scale, j denotes the direction, gabor_i denotes the merged Gabor-filtered image at scale i, and gabor_{i,j} denotes the Gabor-filtered image at scale i and direction j; the corresponding texture features are then extracted from the merged filtered image, taking the mean and variance of the Gabor coefficients in the 3×3 neighborhood of each pixel as that pixel's texture feature. Specifically, the color image can first be converted into a grayscale image; Gabor filters of the same frequency are applied to the grayscale image in 8 directions (0°, 30°, 45°, 60°, 90°, 120°, 135°, 150°) and at 3 scales; the output images after Gabor filtering at each scale are merged to obtain a merged filtered image, so that for each image one merged Gabor-filtered image is obtained per scale. On the merged Gabor-filtered image, the corresponding texture features are extracted from the 3×3 neighborhood of each pixel: the mean and variance of the Gabor coefficients in the 3×3 neighborhood serve as that pixel's texture feature. Two-dimensional Gabor features can be extracted at each scale, so a total of 6-dimensional Gabor features are extracted over the 3 scales.
For the embodiments of the present application, the method further includes: training a preset convolutional neural network model with the constructed first training data set to obtain a pre-training model; and training the pre-training model with the constructed second training data set to obtain the text recognition model.
The first training data set may consist of unlabeled text images. For an unlabeled text image, the text is usually arranged horizontally along the long side, so the image can be split along its long side into several equal sub-blocks so that the characters in each sub-block can be recognized and then spliced back into a complete sentence; because the classification result is a single character, the image must be segmented. For the embodiments of the present application, the number of sub-blocks into which the image is divided can be set as required and is not specified here. The sub-blocks are randomly shuffled with a preset probability, or sub-blocks of the current image are replaced with several sub-blocks from other images with a preset probability; for example, the sub-blocks can be shuffled with a probability of 50%, or 3 sub-blocks of the current image can be replaced with sub-blocks from other images with a probability of 50%, which yields the first training data set used for pre-training the model. It should be noted that, since this step belongs to self-supervised learning and requires no manual labeling, it can be implemented in code in practical application scenarios, improving the efficiency and accuracy of training data construction.
The convolutional neural network model can compute the cross-entropy loss for two binary classification tasks, namely: 1. whether the order of the current image's sub-blocks is correct; 2. whether the current sequence of image sub-blocks contains sub-blocks from other images. The overall optimization objective is to minimize the sum of the cross-entropy losses of the two binary classification tasks, so that semantic information can be learned from a large amount of unlabeled image data. Specifically, cross-entropy can be used as the loss function. Cross-entropy is commonly used in classification problems, especially when neural networks are used for classification; because cross-entropy involves computing the probability of each category, it almost always appears together with a sigmoid (or softmax) function. Specifically, in the binary case the model only needs to predict one of two outcomes, and for each category the predicted probabilities are p and 1-p. The expression is then the binary cross-entropy loss L = -Σ_i [y_i·log(p_i) + (1-y_i)·log(1-p_i)], where p_i denotes the probability that sample i is predicted as positive and y_i is its label.
For the embodiments of the present application, the method further includes: presetting a convolutional neural network model with preconfigured convolution kernels of multiple sizes; dividing the acquired unlabeled images into multiple sub-blocks and randomly shuffling or replacing the sub-blocks with a preset probability to construct a first training data set; training the convolutional neural network model with the first training data set to obtain a pre-training model; dividing the acquired labeled images into multiple sub-blocks and randomly shuffling or replacing the sub-blocks with a preset probability to construct a second training data set; and training the pre-training model with the second training data set to obtain the text recognition model.
The image can be convolved with a pre-designed multi-scale sliding-window convolutional neural network. For example, if the input image has size W×H, convolution kernels of three sizes, 2×H, 3×H, and 4×H, can be used to learn the context between 2 pixels, 3 pixels, and 4 pixels respectively; each kernel slides over the image with a stride of 1 pixel, so that contexts of different lengths are captured. For the present application, the number and sizes of the preconfigured convolution kernels can be set as required and are not specified. Specifically, if the input image contains the four characters a, b, c, and d, each occupying one pixel, then the 2×H convolution kernel can extract the information of "ab", the 3×H kernel the information of "abc", and the 4×H kernel the information of "abcd". In real scenarios, the sentences in the acquired images have different lengths and phrases with specific meanings have different lengths, so convolution kernels of different sizes are needed to better recognize the meaning of the text in different sentences.
For the embodiments of the present application, the method further includes: determining the minimum size of the segmented regions according to the acquired unlabeled images; performing superpixel segmentation on the unlabeled images according to the minimum size of the segmented regions to obtain segmented images; determining an image fusion threshold based on the segmented images; performing region fusion on the segmented images according to the image fusion threshold to obtain fused images; and labeling the local areas of the fused images that each contain only one target image, to obtain labeled images.
Superpixel segmentation is an image segmentation technique proposed and developed by Xiaofeng Ren in 2003. It refers to irregular pixel blocks of a certain visual significance composed of adjacent pixels with similar texture, color, brightness, and other characteristics. It groups pixels based on the similarity of features between them, and uses a small number of superpixels instead of a large number of pixels to express image features, greatly reducing the complexity of subsequent image processing; it is therefore usually used as a preprocessing step in segmentation algorithms.
Specifically, a target detection model can be used to determine the coordinate information of the target image in the to-be-labeled image, and the minimum size of the segmented regions is determined based on the coordinate information. The second average color value of the pixels in the unlabeled area of the current segmented image is obtained, and the current threshold is determined based on the second average color value. The first average color value of the pixels in each segmented region of the segmented image is obtained; according to the current threshold and the first average color values, region fusion is applied to merge any two segmented regions whose first average color values differ by less than the current threshold, yielding the current fused image. According to the category of the target image, the local areas that contain only one target image are labeled.
For the embodiments of the present application, in order to further improve the storage security of the recognition results and prevent information leakage, the method may further include: saving the output text data in a storage node of a pre-established blockchain network.
The present application provides a text recognition method that acquires a text image; performs underlying feature extraction on the text image, fuses the obtained underlying color features and underlying texture features, and determines the feature vector of the preset text region in the text image; recognizes the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set; and outputs the text data. This addresses the technical problems that existing labeled data are scarce, that manual labeling requires strong understanding of the language, and that labeling is difficult; text recognition is achieved without labeled data, and recognition efficiency is improved.
Further, as an implementation of the method shown in FIG. 1 above, an embodiment of the present application provides a character recognition apparatus. As shown in FIG. 2, the apparatus includes:
an acquisition unit 21, which can be used to acquire a text image;
an extraction unit 22, which can be used to perform underlying feature extraction on the text image, fuse the obtained underlying color features and underlying texture features, and determine the feature vector of the preset text region in the text image;
a recognition unit 23, which can be used to recognize the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set;
an output unit 24, which can be used to output the text data.
Further, the extraction unit 22 includes:
a reading module 221, which can be used to read the text region of the text image;
an extraction module 222, which can be used to extract underlying color features and underlying texture features from the text region;
a fusion module 223, which can be used to fuse the underlying color features and underlying texture features to obtain underlying local features;
the extraction module 222 can also specifically be used to extract the label-layer global feature of the text region;
the fusion module 223 can also specifically be used to fuse the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region.
Further, the extraction module 222 can specifically be used to extract the underlying color feature of each pixel in the text region block in the RGB color space; convert the text region into a grayscale image; extract Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel; and fuse the underlying color features and the underlying texture features to obtain underlying local features.
Further, the apparatus also includes:
a first training unit 25, which can be used to train a preset convolutional neural network model with the constructed first training data set to obtain a pre-training model;
a second training unit 26, which can be used to train the pre-training model with the constructed second training data set to obtain the text recognition model.
Further, the apparatus also includes:
a configuration unit 27, which can be used to preset a convolutional neural network model with preconfigured convolution kernels of multiple sizes;
a first construction unit 28, which can be used to divide the acquired unlabeled images into multiple sub-blocks and randomly shuffle or replace the sub-blocks with a preset probability to construct the first training data set;
the first training unit 25 can specifically be used to train the convolutional neural network model with the first training data set to obtain the pre-training model;
a second construction unit 29, which can be used to divide the acquired labeled images into multiple sub-blocks and randomly shuffle or replace the sub-blocks with a preset probability to construct the second training data set;
the second training unit 26 can specifically be used to train the pre-training model with the second training data set to obtain the text recognition model.
Further, the apparatus also includes:
a determination unit 210, which can be used to determine the minimum size of the segmented regions according to the acquired unlabeled images;
a segmentation unit 211, which can be used to perform superpixel segmentation on the unlabeled images according to the minimum size of the segmented regions to obtain segmented images;
the determination unit 210 can specifically be used to determine an image fusion threshold based on the segmented images;
a fusion unit 212, which can be used to perform region fusion on the segmented images according to the image fusion threshold to obtain fused images;
a labeling unit 213, which can be used to label the local areas of the fused images that each contain only one target image, to obtain labeled images.
Further, the apparatus also includes:
a saving unit 214, which can be used to save the output text data in a storage node of a pre-established blockchain network.
According to an embodiment of the present application, a storage medium is provided. The storage medium stores at least one executable instruction, and the executable instruction can cause the character recognition method in any of the above method embodiments to be performed.
FIG. 3 shows a schematic structural diagram of a computer device provided according to an embodiment of the present application; the specific embodiments of the present application do not limit the specific implementation of the computer device.
As shown in FIG. 3, the computer device may include: a processor 302, a communications interface 304, a memory 306, and a communication bus 308.
The processor 302, the communications interface 304, and the memory 306 communicate with each other through the communication bus 308.
The communications interface 304 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute the program 310, and may specifically execute the relevant steps in the above embodiments of the character recognition method.
Specifically, the program 310 may include program code, and the program code includes computer operation instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the computer device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is used to store the program 310. The memory 306 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory. The memory may be non-volatile or volatile.
The program 310 can specifically be used to cause the processor 302 to perform the following operations:
acquiring a text image;
performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining the feature vector of the preset text region in the text image;
recognizing the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set;
outputting the text data.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above can be implemented with a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in a different order than here, or they may be made into individual integrated circuit modules separately, or multiple of the modules or steps may be made into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present application and are not intended to limit the present application; for those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the protection scope of the present application.
Claims (20)
- A character recognition method, comprising: acquiring a text image; performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining a feature vector of a preset text region in the text image; recognizing the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training with a first training data set constructed from unlabeled text images and a second training data set constructed from labeled text images; and outputting the text data.
- The method according to claim 1, wherein performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining the feature vector of the preset text region in the text image comprises: reading the text region of the text image; extracting underlying color features and underlying texture features from the text region; fusing the underlying color features and the underlying texture features to obtain underlying local features; extracting the label-layer global feature of the text region; and fusing the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region.
- The method according to claim 2, wherein extracting underlying color features and underlying texture features from the text region comprises: extracting the underlying color feature of each pixel in the text region block in the RGB color space; and wherein fusing the underlying color features and the underlying texture features to obtain underlying local features comprises: converting the text region into a grayscale image; extracting Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel; and fusing the underlying color features and the underlying texture features to obtain the underlying local features.
- The method according to claim 1, wherein before recognizing the feature vector with the pre-trained text recognition model to obtain the text data, the method further comprises: training a preset convolutional neural network model with the constructed first training data set to obtain a pre-training model; and training the pre-training model with the constructed second training data set to obtain the text recognition model.
- The method according to claim 4, wherein training the preset convolutional neural network model with the constructed first training data set to obtain the pre-training model comprises: presetting a convolutional neural network model with preconfigured convolution kernels of multiple sizes; dividing the acquired unlabeled images into multiple sub-blocks, and randomly shuffling or replacing the multiple sub-blocks with a preset probability, to construct the first training data set; and training the convolutional neural network model with the first training data set to obtain the pre-training model; and wherein training the pre-training model with the constructed second training data set to obtain the text recognition model comprises: dividing the acquired labeled images into multiple sub-blocks, and randomly shuffling or replacing the multiple sub-blocks with a preset probability, to construct the second training data set; and training the pre-training model with the second training data set to obtain the text recognition model.
- The method according to claim 5, wherein before dividing the acquired labeled images into multiple sub-blocks and randomly shuffling or replacing the multiple sub-blocks with a preset probability to construct the second training data set, the method further comprises: determining the minimum size of the segmented regions according to the acquired unlabeled images; performing superpixel segmentation on the unlabeled images according to the minimum size of the segmented regions to obtain segmented images; determining an image fusion threshold based on the segmented images; performing region fusion on the segmented images according to the image fusion threshold to obtain fused images; and labeling the local areas of the fused images that each contain only one target image, to obtain labeled images.
- The method according to claim 1, wherein after outputting the text data, the method further comprises: saving the output text data in a storage node of a pre-established blockchain network.
- A character recognition apparatus, comprising: an acquisition unit, configured to acquire a text image; an extraction unit, configured to perform underlying feature extraction on the text image, fuse the obtained underlying color features and underlying texture features, and determine a feature vector of a preset text region in the text image; a recognition unit, configured to recognize the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training a convolutional neural network model configured with convolution kernels of multiple sizes using a pre-constructed first training data set and second training data set; and an output unit, configured to output the text data.
- A computer-readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement a character recognition method comprising: acquiring a text image; performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining a feature vector of a preset text region in the text image; recognizing the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training with a first training data set constructed from unlabeled text images and a second training data set constructed from labeled text images; and outputting the text data.
- The computer-readable storage medium according to claim 9, wherein the computer-readable instructions, when executed by the processor, implement performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining the feature vector of the preset text region in the text image by: reading the text region of the text image; extracting underlying color features and underlying texture features from the text region; fusing the underlying color features and the underlying texture features to obtain underlying local features; extracting the label-layer global feature of the text region; and fusing the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region.
- The computer-readable storage medium according to claim 10, wherein the computer-readable instructions, when executed by the processor, implement extracting underlying color features and underlying texture features from the text region by: extracting the underlying color feature of each pixel in the text region block in the RGB color space; and implement fusing the underlying color features and the underlying texture features to obtain underlying local features by: converting the text region into a grayscale image; extracting Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel; and fusing the underlying color features and the underlying texture features to obtain the underlying local features.
- The computer-readable storage medium according to claim 10, wherein the computer-readable instructions, when executed by the processor, implement that, before recognizing the feature vector with the pre-trained text recognition model to obtain the text data, the method further comprises: training a preset convolutional neural network model with the constructed first training data set to obtain a pre-training model; and training the pre-training model with the constructed second training data set to obtain the text recognition model.
- The computer-readable storage medium according to claim 12, wherein the computer-readable instructions, when executed by the processor, implement training the preset convolutional neural network model with the constructed first training data set to obtain the pre-training model by: presetting a convolutional neural network model with preconfigured convolution kernels of multiple sizes; dividing the acquired unlabeled images into multiple sub-blocks, and randomly shuffling or replacing the multiple sub-blocks with a preset probability, to construct the first training data set; and training the convolutional neural network model with the first training data set to obtain the pre-training model; and implement training the pre-training model with the constructed second training data set to obtain the text recognition model by: dividing the acquired labeled images into multiple sub-blocks, and randomly shuffling or replacing the multiple sub-blocks with a preset probability, to construct the second training data set; and training the pre-training model with the second training data set to obtain the text recognition model.
- The computer-readable storage medium according to claim 13, wherein the computer-readable instructions, when executed by the processor, implement that, before dividing the acquired labeled images into multiple sub-blocks and randomly shuffling or replacing the multiple sub-blocks with a preset probability to construct the second training data set, the method further comprises: determining the minimum size of the segmented regions according to the acquired unlabeled images; performing superpixel segmentation on the unlabeled images according to the minimum size of the segmented regions to obtain segmented images; determining an image fusion threshold based on the segmented images; performing region fusion on the segmented images according to the image fusion threshold to obtain fused images; and labeling the local areas of the fused images that each contain only one target image, to obtain labeled images.
- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement a character recognition method comprising: acquiring a text image; performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining a feature vector of a preset text region in the text image; recognizing the feature vector with a pre-trained text recognition model to obtain text data, wherein the text recognition model is obtained by training with a first training data set constructed from unlabeled text images and a second training data set constructed from labeled text images; and outputting the text data.
- The computer device according to claim 15, wherein the computer-readable instructions, when executed by the processor, implement performing underlying feature extraction on the text image, fusing the obtained underlying color features and underlying texture features, and determining the feature vector of the preset text region in the text image by: reading the text region of the text image; extracting underlying color features and underlying texture features from the text region; fusing the underlying color features and the underlying texture features to obtain underlying local features; extracting the label-layer global feature of the text region; and fusing the underlying local features of the text region with the label-layer global feature of the text region to obtain the feature vectors of all pixels in the text region.
- The computer device according to claim 16, wherein the computer-readable instructions, when executed by the processor, implement extracting underlying color features and underlying texture features from the text region by: extracting the underlying color feature of each pixel in the text region block in the RGB color space; and implement fusing the underlying color features and the underlying texture features to obtain underlying local features by: converting the text region into a grayscale image; extracting Gabor texture features from the grayscale image to obtain the underlying texture feature of each pixel; and fusing the underlying color features and the underlying texture features to obtain the underlying local features.
- The computer device according to claim 15, wherein the computer-readable instructions, when executed by the processor, implement that, before recognizing the feature vector with the pre-trained text recognition model to obtain the text data, the method further comprises: training a preset convolutional neural network model with the constructed first training data set to obtain a pre-training model; and training the pre-training model with the constructed second training data set to obtain the text recognition model.
- The computer device according to claim 18, wherein the computer-readable instructions, when executed by the processor, implement training the preset convolutional neural network model with the constructed first training data set to obtain the pre-training model by: presetting a convolutional neural network model with preconfigured convolution kernels of multiple sizes; dividing the acquired unlabeled images into multiple sub-blocks, and randomly shuffling or replacing the multiple sub-blocks with a preset probability, to construct the first training data set; and training the convolutional neural network model with the first training data set to obtain the pre-training model; and implement training the pre-training model with the constructed second training data set to obtain the text recognition model by: dividing the acquired labeled images into multiple sub-blocks, and randomly shuffling or replacing the multiple sub-blocks with a preset probability, to construct the second training data set; and training the pre-training model with the second training data set to obtain the text recognition model.
- The computer device according to claim 19, wherein the computer-readable instructions, when executed by the processor, implement that, before dividing the acquired labeled images into multiple sub-blocks and randomly shuffling or replacing the multiple sub-blocks with a preset probability to construct the second training data set, the method further comprises: determining the minimum size of the segmented regions according to the acquired unlabeled images; performing superpixel segmentation on the unlabeled images according to the minimum size of the segmented regions to obtain segmented images; determining an image fusion threshold based on the segmented images; performing region fusion on the segmented images according to the image fusion threshold to obtain fused images; and labeling the local areas of the fused images that each contain only one target image, to obtain labeled images.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011576748.8A (CN112613502A) | 2020-12-28 | 2020-12-28 | Character recognition method and apparatus, storage medium, and computer device
CN202011576748.8 | 2020-12-28 | |
Publications (1)
Publication Number | Publication Date
---|---
WO2022142611A1 | 2022-07-07
Family
ID=75248299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2021/125181 (WO2022142611A1) | Character recognition method and apparatus, storage medium, and computer device | 2020-12-28 | 2021-10-21
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112613502A (zh) |
WO (1) | WO2022142611A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116051686A * | 2023-01-13 | 2023-05-02 | 中国科学技术大学 | Method, system, device and storage medium for erasing text in images
CN116939292A * | 2023-09-15 | 2023-10-24 | 天津市北海通信技术有限公司 | Video text content monitoring method and system for rail transit environments
CN117727037A * | 2023-10-09 | 2024-03-19 | 书行科技(北京)有限公司 | Text recognition method, apparatus, computer device, storage medium and product
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN112613502A | 2020-12-28 | 2021-04-06 | 深圳壹账通智能科技有限公司 | Character recognition method and apparatus, storage medium, and computer device
CN113033465B * | 2021-04-13 | 2023-11-14 | 北京百度网讯科技有限公司 | Liveness detection model training method, apparatus, device and storage medium
CN113129298B * | 2021-05-06 | 2024-01-12 | 北京思图场景数据科技服务有限公司 | Method for recognizing the sharpness of text images
CN113159223A * | 2021-05-17 | 2021-07-23 | 湖北工业大学 | Carotid ultrasound image recognition method based on self-supervised learning
CN113449725B * | 2021-06-30 | 2024-02-02 | 平安科技(深圳)有限公司 | Object classification method, apparatus, device and storage medium
CN113420766B * | 2021-07-05 | 2022-09-16 | 北京理工大学 | Low-resource language OCR method fusing linguistic information
CN113822275B * | 2021-09-27 | 2024-09-27 | 北京有竹居网络技术有限公司 | Image language identification method and related device
CN115273184B * | 2022-07-15 | 2023-05-05 | 北京百度网讯科技有限公司 | Face liveness detection model training method and apparatus
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN102968637A * | 2012-12-20 | 2013-03-13 | 山东科技大学 | Text segmentation method for images with complex backgrounds
CN106599051A * | 2016-11-15 | 2017-04-26 | 北京航空航天大学 | Automatic image annotation method based on a generated image annotation library
CN112613502A * | 2020-12-28 | 2021-04-06 | 深圳壹账通智能科技有限公司 | Character recognition method and apparatus, storage medium, and computer device
- 2020-12-28: Chinese application CN202011576748.8A filed, published as CN112613502A (status: pending)
- 2021-10-21: PCT application PCT/CN2021/125181 filed, published as WO2022142611A1 (status: application filing)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN102968637A * | 2012-12-20 | 2013-03-13 | 山东科技大学 | Text segmentation method for images with complex backgrounds
CN106599051A * | 2016-11-15 | 2017-04-26 | 北京航空航天大学 | Automatic image annotation method based on a generated image annotation library
CN112613502A * | 2020-12-28 | 2021-04-06 | 深圳壹账通智能科技有限公司 | Character recognition method and apparatus, storage medium, and computer device
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116051686A * | 2023-01-13 | 2023-05-02 | 中国科学技术大学 | Method, system, device and storage medium for erasing text in images
CN116051686B * | 2023-01-13 | 2023-08-01 | 中国科学技术大学 | Method, system, device and storage medium for erasing text in images
CN116939292A * | 2023-09-15 | 2023-10-24 | 天津市北海通信技术有限公司 | Video text content monitoring method and system for rail transit environments
CN116939292B * | 2023-09-15 | 2023-11-24 | 天津市北海通信技术有限公司 | Video text content monitoring method and system for rail transit environments
CN117727037A * | 2023-10-09 | 2024-03-19 | 书行科技(北京)有限公司 | Text recognition method, apparatus, computer device, storage medium and product
Also Published As
Publication number | Publication date |
---|---|
CN112613502A (zh) | 2021-04-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022142611A1 (zh) | 文字识别方法及装置、存储介质、计算机设备 | |
He et al. | Multi-scale multi-task fcn for semantic page segmentation and table detection | |
TWI744283B (zh) | 一種單詞的分割方法和裝置 | |
CN111985464B (zh) | 面向法院判决文书的多尺度学习的文字识别方法及系统 | |
EP3660733A1 (en) | Method and system for information extraction from document images using conversational interface and database querying | |
CN113111871B (zh) | 文本识别模型的训练方法及装置、文本识别方法及装置 | |
Karatzas et al. | ICDAR 2011 robust reading competition-challenge 1: reading text in born-digital images (web and email) | |
RU2631168C2 (ru) | Способы и устройства, которые преобразуют изображения документов в электронные документы с использованием trie-структуры данных, содержащей непараметризованные символы для определения слов и морфем на изображении документа | |
RU2643465C2 (ru) | Устройства и способы, которые используют иерархически упорядоченную структуру данных, содержащую непараметризованные символы, для преобразования изображений документов в электронные документы | |
Demilew et al. | Ancient Geez script recognition using deep learning | |
CN112069900A (zh) | 基于卷积神经网络的票据文字识别方法及系统 | |
Wu et al. | Text Detection and Recognition for Natural Scene Images Using Deep Convolutional Neural Networks. | |
CN110796145A (zh) | 基于智能决策的多证件分割关联方法及相关设备 | |
Chen et al. | Page segmentation for historical handwritten document images using conditional random fields | |
Akanksh et al. | Automated invoice data extraction using image processing | |
Al Ghamdi | A novel approach to printed Arabic optical character recognition | |
Vidhyalakshmi et al. | Text detection in natural images with hybrid stroke feature transform and high performance deep Convnet computing | |
Devi et al. | Brahmi script recognition system using deep learning techniques | |
CN114708591B (zh) | 基于单字连接的文档图像中文字符检测方法 | |
CN116030472A (zh) | 文字坐标确定方法及装置 | |
Bhatt et al. | Text Extraction & Recognition from Visiting Cards | |
CN114332493A (zh) | 一种跨维度交互式显著检测模型及其检测方法 | |
Jian et al. | Research on born-digital image text extraction based on conditional random field | |
Gatos et al. | An efficient segmentation-free approach to assist old Greek handwritten manuscript OCR | |
Rani et al. | Object Detection in Natural Scene Images Using Thresholding Techniques |
Legal Events
Date | Code | Title | Description
---|---|---|---
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 21.08.2023)
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21913356; Country of ref document: EP; Kind code of ref document: A1