CN116704519A - Character recognition method, character recognition device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116704519A
Authority
CN
China
Prior art keywords
layer
character recognition
recognition model
character
probability matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310673162.0A
Other languages
Chinese (zh)
Inventor
崔雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Haoxueduo Intelligent Technology Co ltd
Original Assignee
Shenzhen Rubu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Rubu Technology Co ltd
Priority to CN202310673162.0A
Publication of CN116704519A
Legal status: Pending

Classifications

    • G06V 30/19173 — Classification techniques (character recognition; recognition using electronic means)
    • G06V 30/1801 — Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a character recognition method and device, an electronic device, and a storage medium. The character recognition method comprises: obtaining a character recognition model, where the model comprises a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from an input image; the recurrent layer is used to determine a prediction result for the input image from the feature sequence and to compute the deviation between the prediction and the ground truth, obtaining a probability matrix; the transcription layer is used to output the recognition result of the input image according to the probability matrix; and inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result. This technical solution computes the probability matrix with a lightweight character recognition model and outputs the recognition result from that matrix, improving the real-time performance of inference and the efficiency of character recognition.

Description

Character recognition method and device, electronic device, and storage medium

Technical Field

Embodiments of the present invention relate to the field of image processing, and in particular to a character recognition method and device, an electronic device, and a storage medium.

Background Art

Traditional optical character recognition (OCR) technology often requires multiple processing passes over an image, such as denoising, binarization, and character segmentation. These steps are not only time-consuming but also easily affected by factors such as lighting and noise, which degrades recognition accuracy. As the computing power of mobile chips continues to grow, it has become possible to deploy relatively complex, compute-intensive deep learning models on mobile devices, and the demand for on-device deep learning inference is rising accordingly.

The models used for real-time character recognition have large parameter counts and heavy computation, typically containing millions to tens of millions of parameters, with model sizes ranging from tens to hundreds of megabytes; they therefore occupy substantial memory and computing resources during inference. Such models suit cloud-service inference backed by graphics processing units (GPUs), which provide stronger computing power. On mobile devices with limited computing resources, however, they risk long response times or system crashes; inference is slow and character recognition inefficient, making it difficult to meet the real-time response requirements of mobile deployment.

Summary of the Invention

The present invention provides a character recognition method and device, an electronic device, and a storage medium, so as to improve the real-time performance of inference and the efficiency of character recognition.

In a first aspect, an embodiment of the present invention provides a character recognition method, comprising:

obtaining a character recognition model, where the character recognition model comprises a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from an input image; the recurrent layer is used to determine a prediction result for the input image according to the feature sequence and to compute the deviation between the prediction result and the ground truth, obtaining a probability matrix; and the transcription layer is used to output the recognition result of the input image according to the probability matrix; and

inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result.

Optionally, each depthwise separable convolution block contains a shortcut connection;

if the depthwise separable convolution block is located in a downsampling layer, the shortcut connection is a 1×1 convolution;

otherwise, the shortcut connection is the feature map input from the previous layer.

Optionally, the character recognition model is configured to use a greedy search algorithm: for each slice, the character with the highest probability in the slice's probability matrix is taken as the recognition result for that slice.

Optionally, inputting the image containing the characters to be recognized into the character recognition model comprises:

detecting the image containing the characters to be recognized segment by segment, and each time an image of a set length is detected, inputting the currently detected image of the set length into the character recognition model.

Optionally, the character recognition model is further configured to: if the outputs of the character recognition model for images of different set lengths contain an overlapping region, compute a weighted average of the corresponding positions in the respective probability matrices for the overlapping region to obtain the probability matrix of the overlapping region.

Optionally, obtaining the character recognition model comprises:

building a character recognition model and training it;

converting the character recognition model into an open standard file format and merging some layers with a simplification tool; and

converting the recognition model into an inference framework format using a conversion tool.

Optionally, the method further comprises: scaling the image containing the characters to be recognized to a preset size while preserving its aspect ratio.

In a second aspect, an embodiment of the present invention further provides a character recognition device, comprising:

an acquisition module for obtaining a character recognition model, where the character recognition model comprises a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from an input image; the recurrent layer is used to determine a prediction result for the input image according to the feature sequence and to compute the deviation between the prediction result and the ground truth, obtaining a probability matrix; and the transcription layer is used to output the recognition result of the input image according to the probability matrix; and

a recognition module for inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result.

In a third aspect, an embodiment of the present invention provides an electronic device, comprising:

one or more processors; and

a storage device for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the character recognition method described in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the character recognition method described in the first aspect.

An embodiment of the present invention provides a character recognition method and device, an electronic device, and a storage medium. The character recognition method includes: obtaining a character recognition model, where the character recognition model comprises a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from an input image; the recurrent layer is used to determine a prediction result for the input image according to the feature sequence and to compute the deviation between the prediction result and the ground truth, obtaining a probability matrix; and the transcription layer is used to output the recognition result of the input image according to the probability matrix; and inputting an image containing the characters to be recognized into the character recognition model to obtain a character recognition result. This technical solution computes the probability matrix with a lightweight character recognition model and outputs the recognition result from that matrix, improving the real-time performance of inference and the efficiency of character recognition.

Brief Description of the Drawings

The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale.

FIG. 1 is a flowchart of a character recognition method provided by Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of a shortcut connection provided by an embodiment;

FIG. 3 is a flowchart of a character recognition method provided by Embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of outputs containing an overlapping region provided by an embodiment;

FIG. 5 is a schematic diagram of determining the probability matrix of an overlapping region provided by an embodiment;

FIG. 6 is a schematic structural diagram of a character recognition device provided by Embodiment 3 of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and do not limit it. In addition, where there is no conflict, the embodiments of the present invention and the features in them can be combined with one another. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the complete structures.

Before the exemplary embodiments are discussed in more detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps can be performed in parallel, concurrently, or simultaneously. Furthermore, the order of the steps can be rearranged. A process may be terminated when its operations are complete, but it may also have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.

Embodiment 1

FIG. 1 is a flowchart of a character recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to character recognition scenarios; for example, it can be applied to mobile devices that collect data automatically, such as scanning pens. Specifically, the character recognition method can be executed by a character recognition device, which can be implemented in software and/or hardware and integrated into an electronic device. Further, the electronic device may be a desktop computer, a server, a notebook computer, a scanning device, or the like.

As shown in FIG. 1, the method specifically includes the following steps:

S110. Obtain a character recognition model.

Here, the character recognition model mainly refers to a pre-trained deep learning model comprising a convolutional layer, a recurrent layer, and a transcription layer. The convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from the input image; the recurrent layer is used to determine a prediction result for the input image according to the feature sequence and to compute the deviation between the prediction and the ground truth, obtaining a probability matrix; and the transcription layer is used to output the recognition result of the input image according to the probability matrix.

In this embodiment, depthwise separable convolution is used to build a lightweight character recognition model. Depthwise separable convolution is a lightweight convolution operation that reduces the number of model parameters and the amount of computation. It consists mainly of a depthwise convolution followed by a pointwise convolution, and it can significantly reduce the use of computing and storage resources while maintaining model performance, making it suitable for mobile devices with limited computing and storage resources.
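As a rough illustration of why this factorization is lighter, the weight counts of a standard k×k convolution and a depthwise-plus-pointwise replacement can be compared directly (a simplified sketch that ignores bias and batch-norm parameters; the channel sizes below are made up for illustration and are not from the patent):

```python
def standard_conv_params(k, c_in, c_out):
    # weights of a standard k x k convolution (bias omitted)
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise: one k x k filter per input channel;
    # pointwise: a 1 x 1 convolution mixing channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 256)        # 294912 weights
sep = depthwise_separable_params(3, 128, 256)  # 33920 weights
print(std, sep, round(std / sep, 1))  # → 294912 33920 8.7
```

For a 3×3 kernel with 128 input and 256 output channels, the separable version uses roughly 1/9 of the weights, which is the source of the savings in both parameters and computation.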

The backbone of the character recognition model can be built by stacking depthwise separable convolution blocks for feature extraction. By way of example, for an input image of (width, height, channels) = (100, 32, 3), downsampling in the convolutional layer reduces the width and height to 1/8 and 1/32 of their original values, and serializing the resulting feature map yields a feature sequence of shape (w/8, Batch, 512), where w is the width of the original image, Batch is the batch size, and 512 is the feature dimension. The feature sequence is fed into the recurrent layer, which finally outputs a result of shape (w/4, batch, class_num). A loss function (such as the Connectionist Temporal Classification loss, CTC loss) aligns the recognition result with the ground truth and computes the difference between the prediction and the ground truth by forward and backward derivations; the resulting loss can then be used to train the model's parameters via backpropagation.
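The shape bookkeeping above can be sketched as plain arithmetic (a hypothetical `feature_sequence_shape` helper; the strides of 8 in width and 32 in height follow the embodiment's 1/8 and 1/32 downsampling, while the example width of 96 is made up so the division is exact):

```python
def feature_sequence_shape(img_w, img_h, batch, channels=512,
                           w_stride=8, h_stride=32):
    # width shrinks by w_stride; the height dimension collapses
    # (img_h // h_stride == 1 for img_h == 32), so each remaining
    # width position becomes one time step of the sequence
    seq_len = img_w // w_stride
    return (seq_len, batch, channels)

print(feature_sequence_shape(96, 32, 1))  # → (12, 1, 512)
```

Each of the `seq_len` time steps is one "slice" of the image that the recurrent and transcription layers later turn into a character prediction.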

In this embodiment, the character recognition model can be implemented on the TNN inference framework. TNN is a lightweight, cross-platform deep learning inference framework. It uses a highly optimized computation engine to achieve efficient model inference; it supports multiple hardware platforms, including central processing units (CPUs), graphics processing units (GPUs), and artificial intelligence (AI) accelerators; and it has a low memory footprint and low computing resource consumption. In addition, TNN supports a variety of deep learning frameworks and model formats, enabling rapid model deployment and rapid development and management of services; its small code and binary size suits resource-constrained environments such as mobile devices and embedded systems; and it supports multiple operating systems and programming languages, so model inference can run on different hardware platforms and in different system environments.

S120. Input an image containing the characters to be recognized into the character recognition model to obtain a character recognition result.

Specifically, after training and testing, the character recognition model can recognize characters reliably. On this basis, an image containing the characters to be recognized is input into the character recognition model to obtain a character recognition result.

Optionally, the method further includes: scaling the image containing the characters to be recognized to a preset size while preserving its aspect ratio. By way of example, the width and height of the detected image can be stretched proportionally to scale the image to a size of (32, w, 3).
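The aspect-preserving scaling step can be sketched as follows (a hypothetical `target_size` helper; the patent only fixes the target height of 32 while the width scales by the same factor):

```python
def target_size(orig_w, orig_h, target_h=32):
    # scale the width by the same factor used to bring
    # the height to target_h, preserving the aspect ratio
    scale = target_h / orig_h
    return max(1, round(orig_w * scale)), target_h

print(target_size(200, 50))  # → (128, 32)
```

The resized image then has shape (target_h, new_w, 3), matching the (32, w, 3) input expected by the model.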

In practical applications, the character recognition model can be deployed in hardware with a scanning function. The final recognition function can be accomplished jointly by multiple models, which may include a detection model; the detection model can be built on a similar backbone and, combined with three lightweight output heads, can effectively speed up model inference.

The character recognition method in this embodiment improves real-time inference speed from the perspective of model optimization, and further optimizes inference speed with an inference framework suited to mobile devices, achieving real-time inference. Traditional OCR technology often requires multiple processing passes over an image, such as denoising, binarization, and character segmentation; these steps are time-consuming and easily affected by factors such as lighting and noise, which degrades recognition accuracy. The character recognition model in this embodiment, by contrast, uses deep learning to learn image features automatically and outputs recognition results directly, avoiding the many steps of traditional OCR processing. It achieves higher recognition accuracy at higher speed, improving the efficiency and accuracy of character recognition while meeting the real-time response requirements of mobile deployment. In addition, real-time character recognition based on deep vision can support functions such as real-time recognition and multi-language recognition, providing users with a better experience.

Optionally, each depthwise separable convolution block contains a shortcut connection; if the block is located in a downsampling layer, the shortcut connection is a 1×1 convolution; otherwise, the shortcut connection is the feature map input from the previous layer.

Specifically, each block contains not only a depthwise separable convolution module but also a shortcut connection, whose form can depend on whether the block is a downsampling layer. By way of example, FIG. 2 is a schematic diagram of a shortcut connection provided by an embodiment. As shown on the right of FIG. 2, when the block's layer is a downsampling layer, the shortcut connection is a 1×1 (also written 1*1) convolution; as shown on the left of FIG. 2, otherwise the shortcut connection is the feature map input from the previous layer. Adding a shortcut branch to each block both mitigates the vanishing- and exploding-gradient problems that deep networks may suffer and compensates for the limited feature-extraction capacity of depthwise separable convolution.
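The choice of shortcut branch can be sketched in plain Python (an illustrative sketch with feature maps as nested `[H][W][C]` lists; the `shortcut_branch` name and the stride-2 subsampling used to match the downsampled main path are assumptions, since the patent only states that the branch is a 1×1 convolution for downsampling blocks and the identity otherwise):

```python
def pointwise_conv(fmap, weights, stride=1):
    # fmap: [H][W][C_in] nested lists; weights: [C_out][C_in];
    # the stride subsamples rows and columns, as a strided 1x1 conv does
    return [[[sum(w[c] * px[c] for c in range(len(px))) for w in weights]
             for px in row[::stride]] for row in fmap[::stride]]

def shortcut_branch(fmap, downsample, weights=None):
    # ordinary block: pass the incoming feature map through unchanged;
    # downsampling block: project it with a strided 1x1 convolution so
    # the branch matches the main path's spatial size and channel count
    if downsample:
        return pointwise_conv(fmap, weights, stride=2)
    return fmap
```

The branch output is then added element-wise to the main path's output, as in a standard residual block.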

Optionally, the character recognition model uses a greedy search algorithm: for each slice, the character with the highest probability in the slice's probability matrix is taken as the recognition result for that slice. In this embodiment, the output predicted by the character recognition model during recognition is a probability matrix; unlike in training, this stage can use greedy search, taking the most probable class of each slice as the prediction result.
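The greedy per-slice decoding can be sketched as follows (the collapsing of repeated characters and removal of the blank symbol are the usual post-processing for a CTC-trained model and are assumed here; the small probability matrix and character set are made up for illustration):

```python
def greedy_ctc_decode(prob_matrix, charset, blank=0):
    # prob_matrix: one row of class probabilities per time slice
    # 1) greedy search: take the argmax character of every slice
    best = [max(range(len(row)), key=row.__getitem__) for row in prob_matrix]
    # 2) collapse consecutive repeats, then drop the blank symbol
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

probs = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.7, 0.2],    # 'a' again (collapsed as a repeat)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # 'b'
]
print(greedy_ctc_decode(probs, ["-", "a", "b"]))  # → "ab"
```

Greedy search is much cheaper than beam search, which fits the real-time, mobile-deployment goal of the embodiment.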

Embodiment 2

FIG. 3 is a flowchart of a character recognition method provided by Embodiment 2 of the present invention. Building on the above embodiment, this embodiment details the character recognition process. For technical details not exhaustively described here, refer to any of the above embodiments.

Specifically, as shown in FIG. 3, the method includes the following steps:

S210、构建文字识别模型并对所述文字识别模型进行训练。S210. Construct a character recognition model and train the character recognition model.

S220、将所述文字识别模型转化为开放标准文件格式并通过简化工具合并部分层。S220, converting the character recognition model into an open standard file format and merging partial layers through a simplified tool.

本实施例中,可以使用Python机器学习库(如Pytorch)将训练好的文字识别模型导出为开放式标准文件格式(Open Neural Network Exchange,ONNX);此外,可以使用简化的工具合并一些层,也可以理解为去掉冗余层,具体的,文字识别模型可以是按照类似CRNN的模型结构搭建的,冗余层例如为中间的3*3的卷积层,这些层在一定程度上会增加模型的参数量,影响模型的推理速度。经过测试,去掉这些层不会明显降低模型的精度。In this embodiment, the trained text recognition model can be exported as an open standard file format (Open Neural Network Exchange, ONNX) using the Python machine learning library (such as Pytorch); in addition, some layers can be merged using simplified tools, and It can be understood as removing redundant layers. Specifically, the text recognition model can be built according to a model structure similar to CRNN. The redundant layer is, for example, the middle 3*3 convolutional layer. These layers will increase the model’s performance to a certain extent. The amount of parameters affects the inference speed of the model. After testing, removing these layers will not significantly reduce the accuracy of the model.

S230. Convert the recognition model into an inference framework format using a conversion tool.

Specifically, the character recognition model in ONNX format can be converted into the TNN inference framework format using the TNN conversion tool, which allows the inference framework to load the model itself and perform further quantization.

S240. Detect the image containing the text to be recognized segment by segment.

S250. Has an image of the set length been detected? If so, execute S260; otherwise, return to S240.

S260. Input the currently detected image of the set length into the character recognition model to obtain the corresponding character recognition result.

S270. Do the outputs of the character recognition model for images of different set lengths contain an overlapping region? If so, execute S280; otherwise, return to S240.

S280. For the overlapping region, compute a weighted average of the corresponding positions in the respective probability matrices to obtain the probability matrix of the overlapping region.

In this embodiment, to display recognition results in real time, the whole process can proceed by real-time stitching: the image input to the character recognition model is detected segment by segment, and each time a certain length has been accumulated, the segment is sent to the character recognition model for recognition. The length of each image segment sent to the model may be the same or different. On this basis, the detection and recognition processes can run simultaneously. Moreover, the outputs obtained by the character recognition model for successive input images may contain overlapping regions. For an overlapping region, a weighted average of the corresponding positions in the respective probability matrices yields the final probability matrix of that region.
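The weighted averaging of overlapping probability matrices can be sketched as follows (the function name, the default equal weights, and the row renormalization are illustrative assumptions; the patent specifies only a weighted average over corresponding positions):

```python
import numpy as np

def merge_overlap(prev_probs, next_probs, w_prev=0.5):
    """Merge two probability matrices predicted for the SAME overlapping
    image region: prev_probs from the earlier segment, next_probs from the
    later one, both shaped (overlap_slices, num_classes)."""
    merged = w_prev * prev_probs + (1.0 - w_prev) * next_probs
    # renormalise each slice so rows remain probability distributions
    return merged / merged.sum(axis=1, keepdims=True)

# two 2-slice, 2-class matrices predicted for the same overlapping region
p_prev = np.array([[0.8, 0.2], [0.3, 0.7]])
p_next = np.array([[0.6, 0.4], [0.5, 0.5]])
merged = merge_overlap(p_prev, p_next)  # equal weights -> plain average
```

The merged matrix can then be decoded exactly like any other slice output, so the stitching step needs no special decoding logic.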

S290. Obtain the recognition result of the overlapping region according to the probability matrix.

Specifically, according to the final probability matrix, the character with the highest probability can be taken as the recognition result of the corresponding slice.

FIG. 4 is a schematic diagram of outputs containing an overlapping region provided by an embodiment. Because a character may be split into two parts, the image sent to the character recognition model may contain the latter half of the previously submitted image, so after recognition some characters may be repeated between two adjacent results. In this embodiment, the redundant part can be removed by merging the character strings. As shown in FIG. 4, each small box represents the predicted probability matrix of one slice after each image is recognized; a slice corresponds to a region of the original input image 8 pixels wide and 32 pixels high. The upper row is the prediction result of the previous image and the lower row is that of the next image; it can be seen that the two adjacent images overlap. The overlap problem is solved by weighted averaging of the corresponding positions of the probability matrices in the overlapping region.

FIG. 5 is a schematic diagram of determining the probability matrix of an overlapping region provided by an embodiment. As shown in FIG. 5, the last slice of img1 lies at the center of img2, and the first slice of img2 lies at the center of img1. These two edge positions are the most prone to prediction errors, whereas a slice located at the center of the other image carries a much lower risk of error. Therefore, the first slice of img2 takes the prediction result of the corresponding centered slice of img1, and the last slice of img1 takes the prediction result of the centered slice in the lower row. In the actual stitching process, the choice can be made according to the stitching effect.
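The center-over-edge preference above can be sketched as a simple selection rule: for each shared slice, trust the segment in which that slice lies farther from the segment's boundary. The function name `stitch_overlap` and the per-slice string predictions are hypothetical, introduced only for illustration:

```python
def stitch_overlap(prev_preds, next_preds, overlap):
    """prev_preds / next_preds: per-slice predictions of two consecutive
    segments; the last `overlap` slices of prev and the first `overlap`
    slices of next cover the same image region. For each shared slice,
    pick the prediction from the segment where it is farther from the edge."""
    shared = []
    for k in range(overlap):
        dist_prev = overlap - 1 - k  # distance from prev's right edge
        dist_next = k                # distance from next's left edge
        if dist_prev > dist_next:
            shared.append(prev_preds[len(prev_preds) - overlap + k])
        else:
            shared.append(next_preds[k])
    return prev_preds[:-overlap] + shared + next_preds[overlap:]

prev_preds = ["A", "B", "X"]  # "X": error-prone last slice of img1
next_preds = ["Y", "C", "D"]  # "Y": error-prone first slice of img2
merged = stitch_overlap(prev_preds, next_preds, overlap=2)
```

Here the overlap covers the characters "B" and "C": the rule keeps img1's confident "B" and img2's confident "C", discarding the error-prone edge slices "X" and "Y".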

The character recognition method provided by Embodiment 2 of the present invention builds a character recognition model with a lightweight framework and can recognize up to 10,000 characters (including Chinese and English). Through model conversion and deployment with the mobile inference framework TNN, precise character recognition can be accomplished with no loss of model accuracy. The model can be deployed on resource-limited mobile devices and used offline without the assistance of a cloud server, achieving real-time recognition. Combined with the inference framework's memory and operator optimizations, it can be adapted to various low-compute development boards; even a CPU with limited resources can return results in real time, and real-time display is possible even without quantization. To improve the user experience, the mobile device can perform recognition inference on image segments or blocks, feed back recognition results promptly, and merge them efficiently and accurately, effectively improving the correctness of the stitched recognition results. In addition, the character recognition model supports real-time recognition, multi-language recognition, and other functions.
The method adopts a character recognition model designed on the Torch framework. Instead of searching for the maximum-probability transcription path during CTC loss decoding, the prediction results are processed by greedy search: for each slice's probability matrix, only the character with the highest probability is taken as that slice's recognition result, effectively improving retrieval efficiency and inference speed. In addition, the method can promptly recognize images of variable length, stitch the results, and feed them back to the display in time, which both reduces the model's inference error rate and improves the user experience.

Embodiment Three

FIG. 6 is a schematic structural diagram of a character recognition device provided by Embodiment 3 of the present invention. The character recognition device provided by this embodiment includes:

An acquisition module 310, configured to acquire a character recognition model, where the character recognition model includes a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from the input image; the recurrent layer is used to determine a prediction result of the input image according to the feature sequence and to calculate the deviation between the prediction result and the ground truth, obtaining a probability matrix; and the transcription layer is used to output the recognition result of the input image according to the probability matrix.

A recognition module 320, configured to input an image containing text to be recognized into the character recognition model to obtain a character recognition result.

In the character recognition device provided by Embodiment 3 of the present invention, the acquisition module acquires a character recognition model that includes a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and extracts a feature sequence from the input image; the recurrent layer determines a prediction result of the input image according to the feature sequence and calculates the deviation between the prediction result and the ground truth, obtaining a probability matrix; and the transcription layer outputs the recognition result of the input image according to the probability matrix. The recognition module inputs an image containing text to be recognized into the character recognition model to obtain the recognition result. On this basis, a lightweight character recognition model computes the probability matrix and outputs the recognition result accordingly, improving the real-time performance of inference and the efficiency of character recognition.

On the basis of the above embodiments, each depthwise separable convolution block contains a shortcut connection;

If the depthwise separable convolution block is located in a downsampling layer, the shortcut connection is a 1×1 convolution;

Otherwise, the shortcut connection is the feature map input from the previous layer.
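A minimal PyTorch sketch of such a block, under stated assumptions: the BatchNorm/ReLU placement and the channel-change case are illustrative, since the patent specifies only a depthwise separable block whose shortcut is a 1×1 convolution when downsampling and the input feature map otherwise:

```python
import torch
import torch.nn as nn

class DSBlock(nn.Module):
    """Depthwise separable conv block with a shortcut connection.
    When the block downsamples (stride > 1) or changes channel count,
    the shortcut is a 1x1 convolution; otherwise the input feature map
    itself is added back (identity shortcut)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            # depthwise 3x3: one filter per input channel (groups=in_ch)
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            # pointwise 1x1: mixes channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        if stride > 1 or in_ch != out_ch:
            self.shortcut = nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
        else:
            self.shortcut = nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.shortcut(x))

x = torch.randn(1, 8, 32, 64)          # N, C, H, W
y = DSBlock(8, 16, stride=2)(x)        # downsampling: 1x1-conv shortcut
z = DSBlock(8, 8)(x)                   # same shape: identity shortcut
```

Compared with a standard 3×3 convolution, the depthwise plus pointwise factorization cuts parameters and multiply-adds sharply, which is what makes the stacked blocks suitable for the low-compute boards discussed above.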

On the basis of the above embodiments, the character recognition model applies a greedy search algorithm: for each slice, the character with the highest probability in the slice's probability matrix is taken as that slice's recognition result.

On the basis of the above embodiments, the recognition module includes:

An input unit, configured to detect the image containing the text to be recognized segment by segment, and to input the currently detected image of the set length into the character recognition model whenever an image of the set length is detected.

On the basis of the above embodiments, the character recognition model is further configured to: if the outputs of the character recognition model for images of different set lengths contain an overlapping region, compute a weighted average of the corresponding positions in the respective probability matrices for the overlapping region to obtain the probability matrix of the overlapping region.

On the basis of the above embodiments, the acquisition module 310 includes:

A construction unit, configured to construct a character recognition model and train the character recognition model;

A simplification unit, configured to convert the character recognition model into an open standard file format and merge some layers using a simplification tool;

A format conversion unit, configured to convert the recognition model into an inference framework format using a conversion tool.

On the basis of the above embodiments, the device further includes:

A scaling module, configured to scale the image containing the text to be recognized to a preset size while preserving the aspect ratio.
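The proportional scaling can be sketched as follows; the target height of 32 px matches the slice height described earlier, but the function name and the fixed-height convention are assumptions introduced for illustration:

```python
def scaled_size(w, h, target_h=32):
    """Proportional scaling: fix the height at target_h (the model's assumed
    input height) and scale the width by the same factor, so the aspect
    ratio of the text image is preserved."""
    scale = target_h / h
    return max(1, round(w * scale)), target_h

# a 128x64 crop becomes 64x32; a crop already 32 px high keeps its width
size_a = scaled_size(128, 64)
size_b = scaled_size(100, 32)
```

An image library (e.g. OpenCV or Pillow) would then resize the crop to the returned size before it is fed to the recognizer.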

The character recognition device provided by Embodiment 3 of the present invention can be used to execute the character recognition method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.

Embodiment Four

FIG. 7 shows a schematic structural diagram of an electronic device that can be used to implement an embodiment of the present invention.

As shown in FIG. 7, the electronic device provided by this embodiment includes a processor 410 and a storage device 420. There may be one or more processors in the electronic device; one processor 410 is taken as an example in FIG. 7. The processor 410 and the storage device 420 in the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 7.

The one or more programs are executed by the one or more processors 410, so that the one or more processors implement the character recognition method described in any of the above embodiments.

The storage device 420 in the electronic device, as a computer-readable storage medium, can be used to store one or more programs, which may be software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the character recognition method in the embodiments of the present invention (for example, the modules in the character recognition device shown in FIG. 6, including the acquisition module 310 and the recognition module 320). By running the software programs, instructions, and modules stored in the storage device 420, the processor 410 executes the various functional applications and data processing of the electronic device, that is, implements the character recognition method in the above method embodiments.

The storage device 420 mainly includes a program storage area and a data storage area. The program storage area can store an operating system and the application required by at least one function; the data storage area can store data created according to the use of the electronic device (such as the feature sequences, probability matrices, and character recognition results in the above embodiments). In addition, the storage device 420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices. In some examples, the storage device 420 may further include memories remotely located relative to the processor 410, which may be connected to a server through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Moreover, when the one or more programs included in the above electronic device are executed by the one or more processors 410, the programs perform the following operations: acquiring a character recognition model, where the character recognition model includes a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from the input image; the recurrent layer is used to determine a prediction result of the input image according to the feature sequence and to calculate the deviation between the prediction result and the ground truth, obtaining a probability matrix; the transcription layer is used to output the recognition result of the input image according to the probability matrix; and inputting an image containing text to be recognized into the character recognition model to obtain a character recognition result.

The electronic device further includes a communication device 430, an input device 440, and an output device 450.

The processor 410, the storage device 420, the communication device 430, the input device 440, and the output device 450 in the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 7.

The input device 440 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 450 may include a display device such as a display screen.

The communication device 430 may include a receiver and a transmitter. The communication device 430 is configured to send and receive information under the control of the processor 410.

The electronic device proposed in this embodiment and the character recognition method proposed in the above embodiments belong to the same inventive concept. For technical details not described in detail in this embodiment, reference may be made to any of the above embodiments, and this embodiment has the same beneficial effects as executing the character recognition method.

On the basis of the above embodiments, this embodiment further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the character recognition method in any of the above embodiments of the present invention. The method includes: acquiring a character recognition model, where the character recognition model includes a convolutional layer, a recurrent layer, and a transcription layer; the convolutional layer is formed by stacking depthwise separable convolution blocks and is used to extract a feature sequence from the input image; the recurrent layer is used to determine a prediction result of the input image according to the feature sequence and to calculate the deviation between the prediction result and the ground truth, obtaining a probability matrix; the transcription layer is used to output the recognition result of the input image according to the probability matrix; and inputting an image containing text to be recognized into the character recognition model to obtain a character recognition result.

Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the operations of the character recognition method described above, and can also execute related operations in the character recognition method provided by any embodiment of the present invention, with the corresponding functions and beneficial effects.

From the above description of the implementations, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disc, and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute the character recognition method described in the embodiments of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, the present invention is not limited to the above embodiments and may also include more other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A character recognition method, comprising:
acquiring a character recognition model, wherein the character recognition model comprises a convolutional layer, a recurrent layer, and a transcription layer, the convolutional layer is formed by stacking depthwise separable convolution blocks and is used for extracting a feature sequence of an input image; the recurrent layer is used for determining a prediction result of the input image according to the feature sequence and calculating a deviation between the prediction result and a true value to obtain a probability matrix; and the transcription layer is used for outputting a recognition result of the input image according to the probability matrix; and
inputting an image containing text to be recognized into the character recognition model to obtain a character recognition result.
2. The method of claim 1, wherein each of the depthwise separable convolution blocks comprises a shortcut connection;
if the depthwise separable convolution block is located in a downsampling layer, the shortcut connection is a 1×1 convolution;
otherwise, the shortcut connection is the feature map input from the previous layer.
3. The method of claim 1, wherein the character recognition model is configured to, based on a greedy search algorithm, take for each slice the character corresponding to the maximum probability in the probability matrix of the slice as the recognition result of the slice.
4. The method of claim 1, wherein inputting an image containing text to be recognized into the character recognition model comprises:
detecting the image containing the text to be recognized segment by segment, and inputting the currently detected image of a set length into the character recognition model whenever an image of the set length is detected.
5. The method of claim 4, wherein the character recognition model is further configured to:
if the outputs of the character recognition model for images of different set lengths comprise an overlapping region, compute a weighted average of the corresponding positions in the respective probability matrices for the overlapping region to obtain the probability matrix of the overlapping region.
6. The method of claim 1, wherein acquiring the character recognition model comprises:
constructing a character recognition model and training the character recognition model;
converting the character recognition model into an open standard file format and merging some layers using a simplification tool; and
converting the recognition model into an inference framework format using a conversion tool.
7. The method of claim 1, further comprising:
scaling the image containing the text to be recognized to a preset size while preserving the aspect ratio.
8. A character recognition device, comprising:
an acquisition module, configured to acquire a character recognition model, wherein the character recognition model comprises a convolutional layer, a recurrent layer, and a transcription layer, the convolutional layer is formed by stacking depthwise separable convolution blocks and is used for extracting a feature sequence of an input image; the recurrent layer is used for determining a prediction result of the input image according to the feature sequence and calculating a deviation between the prediction result and a true value to obtain a probability matrix; and the transcription layer is used for outputting a recognition result of the input image according to the probability matrix; and
a recognition module, configured to input an image containing text to be recognized into the character recognition model to obtain a character recognition result.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the character recognition method of any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the character recognition method of any one of claims 1-7.
CN202310673162.0A 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium Pending CN116704519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310673162.0A CN116704519A (en) 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310673162.0A CN116704519A (en) 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116704519A true CN116704519A (en) 2023-09-05

Family

ID=87830723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310673162.0A Pending CN116704519A (en) 2023-06-07 2023-06-07 Character recognition method, character recognition device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116704519A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912856A (en) * 2023-09-14 2023-10-20 深圳市贝铂智能科技有限公司 Image identification method and device of intelligent scanning pen and intelligent scanning pen
CN117058689A (en) * 2023-10-09 2023-11-14 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production
CN117058689B (en) * 2023-10-09 2024-02-20 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production

Similar Documents

Publication Publication Date Title
EP4414890A1 (en) Model training and scene recognition method and apparatus, device, and medium
CN108304882B (en) Image classification method and device, server, user terminal and storage medium
WO2019100723A1 (en) Method and device for training multi-label classification model
CN112434721A (en) Image classification method, system, storage medium and terminal based on small sample learning
CN112232293A (en) Image processing model training method, image processing method and related equipment
CN116704519A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN113792851B (en) Font generation model training method, font library building method, font generation model training device and font library building equipment
CN111666931B (en) Mixed convolution text image recognition method, device, equipment and storage medium
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111027576A (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
WO2023173552A1 (en) Establishment method for target detection model, application method for target detection model, and device, apparatus and medium
CN111932577B (en) Text detection method, electronic device and computer readable medium
KR20240144139A (en) Facial pose estimation method, apparatus, electronic device and storage medium
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
CN115457365A (en) Model interpretation method and device, electronic equipment and storage medium
CN116563840B (en) Scene text detection and recognition method based on weak supervision cross-mode contrast learning
CN117744759A (en) Text information identification method and device, storage medium and electronic equipment
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN117036852A (en) Training data set determining method and device based on AIGC, storage medium and terminal
CN116205883A (en) PCB surface defect detection method, system, electronic equipment and medium
CN112200772A (en) Pox check out test set
CN115512360A (en) Text recognition method, device, equipment and storage medium
CN113361567B (en) Image processing method, device, electronic equipment and storage medium
CN119128108A (en) Reply generation method, device, electronic device and medium
CN118247617A (en) Information processing method, device, equipment and medium based on multi-mode large model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240518

Address after: Room 303, Building B, Fu'an Technology Building, No. 013 Gaoxin South 1st Road, High tech Zone Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province 518000

Applicant after: Shenzhen haoxueduo Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 3rd Floor, Building 9A, Taihang Wutong Industrial Park community, Bao'an District, Shenzhen City, Guangdong Province 518000

Applicant before: Shenzhen Rubu Technology Co.,Ltd.

Country or region before: China

CB02 Change of applicant information

Country or region after: China

Address after: Room 408, Building B, Fu'an Technology Building, No. 013 Gaoxin South 1st Road, High tech Zone Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province 518000

Applicant after: Shenzhen Tingling Intelligent Technology Co.,Ltd.

Address before: Room 303, Building B, Fu'an Technology Building, No. 013 Gaoxin South 1st Road, High tech Zone Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen haoxueduo Intelligent Technology Co.,Ltd.

Country or region before: China