WO2020258491A1 - Universal character recognition method and apparatus, computer device, and storage medium - Google Patents

Universal character recognition method and apparatus, computer device, and storage medium

Info

Publication number
WO2020258491A1
WO2020258491A1, PCT/CN2019/102942, CN2019102942W
Authority
WO
WIPO (PCT)
Prior art keywords
recognized
character
image
character image
feature
Prior art date
Application number
PCT/CN2019/102942
Other languages
English (en)
French (fr)
Inventor
王健宗
闫旭
王威
韩茂琨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910574434.5A external-priority patent/CN110414520B/zh
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020258491A1 publication Critical patent/WO2020258491A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Definitions

  • This application relates to a universal character recognition method and apparatus, a computer device, and a storage medium.
  • Universal characters refer to commonly encountered text characters, including words, numbers, letters, and special characters.
  • In existing approaches, deep neural networks connected layer by layer are usually used to recognize universal characters.
  • In universal character recognition, in order to handle a large number of recognizable characters, achieve high recognition accuracy, and cope with demanding conditions such as complex text scenes, a deeper neural network is often needed to extract more complex feature patterns.
  • However, the inventors realized that existing deep neural networks suffer from a degradation problem: as the number of layers of the deep neural network increases, the accuracy of the network saturates and may even decrease, which lowers recognition accuracy.
  • a universal character recognition method, device, computer equipment, and storage medium are provided.
  • a universal character recognition method including:
  • the character in the character image to be recognized is determined according to the feature matrix of the character image to be recognized.
  • a universal character recognition device including:
  • a receiving module, configured to receive an image to be recognized, perform text detection on the image to be recognized, and obtain a character image to be recognized;
  • a conversion module, configured to perform image digitization processing on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
  • an extraction module, configured to input the three-dimensional matrix into a preset densely connected network to extract image features of the character image to be recognized by using the densely connected network, to obtain a feature matrix of the character image to be recognized;
  • a determining module, configured to determine the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
  • a computer device, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • the character in the character image to be recognized is determined according to the feature matrix of the character image to be recognized.
  • One or more non-volatile storage media storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the character in the character image to be recognized is determined according to the feature matrix of the character image to be recognized.
  • Fig. 1 is an application scenario diagram of a universal character recognition method according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a universal character recognition method according to one or more embodiments.
  • Fig. 3 is a schematic flowchart of the steps of performing feature extraction on a three-dimensional matrix using a densely connected network to obtain a feature matrix of a character image to be recognized according to one or more embodiments.
  • Fig. 4 is a structural diagram of a densely connected network according to one or more embodiments.
  • Fig. 5 is a structural diagram of a character recognition model according to one or more embodiments.
  • Fig. 6 is a block diagram of a universal character recognition device according to one or more embodiments.
  • Fig. 7 is a block diagram of a computer device according to one or more embodiments.
  • the universal character recognition method provided in this application can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 and the server 104 communicate through the network.
  • the server 104 receives the image to be recognized sent by the terminal 102, and the server 104 performs text detection on the image to be recognized to obtain the character image to be recognized.
  • the server 104 performs image digitization processing on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized.
  • the server 104 inputs the three-dimensional matrix into a preset dense connection network, so as to extract image features of the character image to be recognized by using the dense connection network to obtain a feature matrix of the character image to be recognized.
  • the server 104 determines the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a universal character recognition method is provided. Taking the method applied to the server in FIG. 1 as an example, the method includes the following steps:
  • Step S202 Receive an image to be recognized, perform text detection on the image to be recognized, and obtain a character image to be recognized.
  • An image is the most commonly used information carrier and contains information about the object being described.
  • the image to be recognized refers to an image containing characters to be recognized, such as an invoice image, a list image, etc.
  • the character image to be recognized refers to an image that only includes characters, that is, an image that does not include blank image areas and other information except characters.
  • the user can issue a recognition instruction by operating the terminal.
  • After receiving the instruction issued by the user, the terminal sends the recognition instruction and the corresponding image to be recognized to the server.
  • the server performs cropping and other processing on the image to be recognized to obtain a character image to be recognized that contains only characters.
  • For example, when a user collects a large number of invoices offline and needs to enter the reimbursement unit, reimbursement amount, and other information from the invoices into the corresponding reimbursement system, the corresponding invoice images can be obtained by photocopying or scanning the invoices.
  • The user then uploads the invoice images to the system by operating the corresponding reimbursement system on the terminal and issues a character recognition instruction in the reimbursement system.
  • After the reimbursement system on the terminal receives the invoice images and the character recognition instruction uploaded by the user, it sends a character recognition request to the server corresponding to the reimbursement system and sends the invoice images to the server together with the request.
  • The server can then perform text detection on the invoice image. Text detection locates the regions of the invoice image that contain only the characters to be recognized, such as the reimbursement unit and reimbursement amount, and a cropping tool is then called to crop the to-be-recognized character regions from the invoice image, obtaining the character image to be recognized of the invoice image.
  • Step S204 Perform image digitization processing on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized.
  • each pixel is regarded as a dot or a small grid cell. From this point of view, the image is a standard rectangle with a certain width and height. It can be understood that the resolution usually refers to the width and height of this rectangle. For example, if the resolution of an image is 1280*720, the image has 720 rows and 1280 columns of pixels.
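As a concrete illustration (a NumPy sketch, not part of the patent), an image with resolution 1280*720 loads as an array whose first axis holds the 720 pixel rows and whose second axis holds the 1280 pixel columns:

```python
import numpy as np

# A hypothetical RGB image with resolution 1280*720 (width * height):
# the array has 720 rows, 1280 columns, and 3 colour channels.
image = np.zeros((720, 1280, 3), dtype=np.uint8)

rows, cols, channels = image.shape
print(rows, cols, channels)  # 720 1280 3
```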
  • the matrix has rows and columns, and the operation of the matrix is relatively common and mature in mathematics and computer processing. In other words, when the image needs to be processed, the computer usually converts the operation on the image to the operation on the matrix.
  • when the character image to be recognized is processed, it is converted into the corresponding image matrix, so that operations on the image become matrix operations, which is convenient for the server to process.
  • the dimensions of the matrix are also different depending on the image type.
  • the processed image is a color image, and the color image includes three components of RGB, that is, each component corresponds to a matrix, and the image matrix corresponding to the color image is a three-dimensional matrix. That is to say, since each pixel in the image includes three channels of data, when the image is converted into a matrix for operation, the image is converted into a three-dimensional matrix with a three-dimensional data structure.
  • the size of the three-dimensional matrix is consistent with the size and channel number of the character image to be recognized, that is, the height*width*depth of the three-dimensional matrix is equal to the height*width*channel number of the character image to be recognized.
  • In some embodiments, performing image digitization processing on the character image to be recognized to obtain the three-dimensional matrix corresponding to the character image to be recognized includes: obtaining the RGB values corresponding to all pixels in the character image to be recognized, and converting the RGB values corresponding to each pixel into the three-dimensional matrix.
  • each pixel in the character image to be recognized includes three channels of data
  • the three channels of data are RGB values.
  • the pixel distribution in the image matches the element distribution in the matrix; that is, the RGB values of all pixels are normalized to values between 0 and 1 and then stored in the corresponding elements of the three-dimensional matrix.
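The digitization step can be sketched as follows. This is an illustrative NumPy sketch: dividing each RGB value by 255 to map it into [0, 1] is an assumed normalization, since the patent does not fix an exact formula, and the `digitize` helper and toy pixel values are hypothetical.

```python
import numpy as np

def digitize(image):
    """Convert an H x W x 3 uint8 RGB image into a three-dimensional
    float matrix with every channel value scaled from [0, 255] to [0, 1]."""
    assert image.ndim == 3 and image.shape[2] == 3, "expected an RGB image"
    return image.astype(np.float64) / 255.0

# Toy 2x2 RGB character image (made-up pixel values).
img = np.array([[[0, 128, 255], [64, 64, 64]],
                [[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)
matrix = digitize(img)
print(matrix.shape)  # (2, 2, 3): height x width x channels
```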
  • step S206 the three-dimensional matrix is input to a preset densely connected network to extract image features of the character image to be recognized by using the densely connected network to obtain a feature matrix of the character image to be recognized.
  • the densely connected network is one of the network layers in the character recognition model, and the character recognition model is the model used in this application for universal character recognition.
  • A densely connected network is a neural network that takes short-circuit (skip) connections to the extreme; that is, each layer of the densely connected network is connected not only to the next layer but also to all subsequent layers. In other words, the input of each layer is not only the output of the immediately preceding layer but the outputs of all preceding network layers.
  • In some embodiments, step S206, in which the three-dimensional matrix is input into the preset densely connected network to extract image features of the character image to be recognized by using the densely connected network to obtain the feature matrix of the character image to be recognized, includes the following steps:
  • Step S302 Use the two-dimensional convolutional layer of the densely connected network to extract the image features of the three-dimensional matrix to obtain the basic features of the character image to be recognized.
  • the convolutional layer refers to the network layer used to extract the characteristics of the input data.
  • the convolutional layer generally includes multiple convolution kernels.
  • the convolution kernel is a function for weighted average.
  • the two-dimensional convolutional layer refers to a 2D convolutional layer. Specifically, after the densely connected network receives the three-dimensional matrix of the character image to be recognized, the convolution kernels in the two-dimensional convolutional layer are used to compute weighted averages over the three-dimensional matrix, and the resulting matrix constitutes the basic features of the character image to be recognized.
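The weighted-average role of a convolution kernel can be illustrated with a minimal single-channel example. This is a naive NumPy sketch, not the patent's implementation; a real 2D convolutional layer applies many learned kernels across all input channels.

```python
import numpy as np

def conv2d_single(channel, kernel):
    """Naive 'valid' 2-D convolution of one channel with one kernel:
    each output value is a weighted average of a kernel-sized window."""
    kh, kw = kernel.shape
    h, w = channel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 averaging kernel applied to a toy 4x4 channel.
kernel = np.full((3, 3), 1.0 / 9.0)
channel = np.arange(16, dtype=np.float64).reshape(4, 4)
basic = conv2d_single(channel, kernel)
print(basic.shape)  # (2, 2)
```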
  • Step S304 using the dense convolutional layer of the densely connected network to extract the depth features of the character image to be recognized from the basic features.
  • the dense convolutional layer refers to a network layer that short-circuits all convolutional layers.
  • the densely connected network includes a 2D convolutional layer and a dense convolutional layer.
  • the dense convolutional layer includes four convolutional layers, and the output of each convolutional layer is connected through short-circuit connections to all subsequent convolutional layers. That is, the basic features output by the 2D convolutional layer are convolved again by the four short-circuit-connected convolutional layers in the dense convolutional layer to extract the depth features of the character image to be recognized.
  • the dense convolutional layer includes convolutional layer H1, convolutional layer H2, convolutional layer H3, and convolutional layer H4.
  • the outputs of the four convolutional layers are X1, X2, X3, and X4, respectively, and X0 is the input carried by the short-circuit connections.
  • the input X0 to the four convolutional layers from the preceding network is the basic feature of the character image to be recognized output by the 2D convolutional layer.
  • X0 is used not only as the input of the convolutional layer H1, but also as an input of the convolutional layer H2, the convolutional layer H3, and the convolutional layer H4.
  • the output X1, obtained by the convolution operation of the convolutional layer H1 on X0, is input not only to the convolutional layer H2, but also to the convolutional layer H3 and the convolutional layer H4.
  • the convolutional layer H2 performs a convolution operation based on X0 and X1, and its output X2 is likewise input to the convolutional layer H3 and the convolutional layer H4, respectively.
  • the convolutional layer H3 and the convolutional layer H4 work in the same way, which will not be repeated here. In other words, the information passed on by each convolutional layer includes not only its own output but also its own inputs.
  • Step S306 Add the basic feature and the depth feature to obtain a feature matrix of the character image to be recognized.
  • That is, the basic features and the depth features are added, and the result of this feature fusion is the feature matrix of the character image to be recognized.
  • the characters in the character image to be recognized can be determined.
  • Adding the basic features and the depth features means that the feature element values in the same row and column of the basic features and the depth features are added directly, and the resulting new matrix is the feature matrix of the character image to be recognized.
  • the output result of the last convolutional layer H4 is X4
  • X4 is the depth feature.
  • X0, X1, X2, and X3 are the corresponding basic features. Therefore, X4 is added to X0, X1, X2, and X3, and the result is used as the input of the next network layer.
  • In this way, the densely connected network can not only fully extract features but also, through the short-circuit connections, fully reuse the features extracted by each network layer, achieving higher-accuracy character recognition.
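The wiring of Fig. 4 can be sketched in a few lines. This is an illustrative NumPy sketch, not the patent's implementation: the "layers" are stand-in callables rather than trained convolutions, and combining the inputs to each layer by element-wise summation (so shapes stay compatible with the final addition of step S306) is an assumption, since the patent only specifies that every layer receives the outputs of all preceding layers.

```python
import numpy as np

def dense_block(x0, layers):
    """Sketch of the short-circuit wiring of Fig. 4: every layer Hi
    receives X0 and all earlier outputs, and the final depth feature
    X4 is added element-wise to X0..X3 (step S306)."""
    outputs = [x0]                # X0 reaches every layer via short circuits
    for layer in layers:          # H1, H2, H3, H4
        xi = layer(sum(outputs))  # assumed combination rule: element-wise sum
        outputs.append(xi)
    return sum(outputs)           # X4 added to X0, X1, X2, X3

# Four stand-in "convolutional layers" that simply halve their input.
layers = [lambda x: 0.5 * x] * 4
x0 = np.ones((2, 2))              # toy basic features from the 2D conv layer
feature = dense_block(x0, layers)
print(feature[0, 0])  # 5.0625
```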
  • Step S208 Determine the character in the character image to be recognized according to the feature matrix of the character image to be recognized.
  • the character recognition model also includes a transition layer and a long short-term memory network.
  • the long short-term memory (LSTM) layer is a type of recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in time series.
  • step S208 determining the characters in the character image to be recognized according to the feature matrix of the character image to be recognized, specifically includes: pooling the feature matrix to obtain the pooled feature matrix.
  • the long short-term memory network is then used to obtain the association information between the features in the pooled feature matrix, and the characters in the character image to be recognized are determined according to the association information.
  • Specifically, the feature matrix is pooled by using the transition layer; that is, after nonlinear processing by the activation function in the transition layer, the feature matrix is pooled by the pooling layer.
  • Pooling is generally divided into maximum pooling and average pooling: maximum pooling selects the maximum value in the pooling region, while average pooling computes the average of all feature values in the pooling region.
  • the pooling area is preferably 2*2.
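The two pooling variants over a 2*2 pooling region can be sketched as follows (an illustrative NumPy sketch; `pool2x2` is a hypothetical helper):

```python
import numpy as np

def pool2x2(m, mode="max"):
    """2x2 pooling: take the maximum (or the average) of each
    non-overlapping 2x2 region of a 2-D feature matrix."""
    h, w = m.shape
    blocks = m[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

m = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.]])
print(pool2x2(m, "max"))   # [[4. 8.]]
print(pool2x2(m, "mean"))  # [[2.5 6.5]]
```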
  • the resulting pooled matrix is input into the long short-term memory network.
  • the long short-term memory network can determine the association information between the features in the feature matrix output by the transition layer and determine the characters in the character image to be recognized based on the relationships between the features.
  • In the above universal character recognition method, the character image to be recognized is obtained by performing text detection on the image to be recognized, which ensures that blank areas in the image to be recognized are removed and only the areas containing characters remain. The character image to be recognized is then subjected to image digitization processing to obtain the corresponding three-dimensional matrix, which is convenient for subsequent recognition by the densely connected network. The three-dimensional matrix corresponding to the character image to be recognized is input into the preset densely connected network, and the densely connected network, whose number of layers can be increased, is used to extract the image features of the character image to be recognized. After the feature matrix of the character image to be recognized is obtained, the characters in the character image to be recognized are determined according to it, thereby ensuring the reuse of features and realizing higher-accuracy character recognition.
  • determining the characters in the character image to be recognized according to the association information specifically includes: mapping a recognition result from a preset dictionary based on the association information and a preset mapping relationship, the recognition result including characters and corresponding recognition probabilities; and selecting characters according to the recognition probabilities and using the selected characters as the characters in the character image to be recognized.
  • the association information between features means that, based on it, the identified features can be combined to obtain the corresponding characters.
  • the preset dictionary refers to a library that includes many different types of characters, such as text, numbers, and letters.
  • Specifically, the features in the feature matrix that can be combined are determined according to the association information obtained by the long short-term memory network. After the combinable features are combined, the corresponding characters are looked up in the preset dictionary and the corresponding recognition probabilities are obtained, and the characters are then determined according to the recognition probabilities. For example, suppose the combinable features in the feature matrix include feature 1 and feature 2, where feature 1 corresponds to "一" (one) and feature 2 corresponds to "十" (ten). Combining feature 1 and feature 2 and mapping the combination in the preset dictionary
  • yields candidate characters including "土" (earth), each with a corresponding recognition probability. According to the recognition probabilities, one character is selected from the candidates as the finally recognized character; that is, the candidate with the highest recognition probability is the selected character.
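The probability-based selection reduces to picking the highest-scoring candidate. A minimal sketch, with hypothetical candidate characters and made-up recognition probabilities:

```python
def select_character(candidates):
    """Pick the candidate character with the highest recognition probability."""
    return max(candidates, key=candidates.get)

# Hypothetical dictionary-mapping result for a combined feature;
# the candidate characters and probabilities are made-up values.
candidates = {"土": 0.87, "士": 0.13}
print(select_character(candidates))  # 土
```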
  • the step of receiving the image to be recognized and performing text detection on the image to be recognized to obtain the character image to be recognized specifically includes: performing text detection on the image to be recognized to obtain the character candidate region and the corresponding confidence level.
  • when there is a confidence not less than the preset confidence, the character candidate region corresponding to that confidence is scaled to obtain a scaled character candidate region.
  • the scaled character candidate region is used as the character image to be recognized.
  • the text detection can use a preset text detection model to perform text detection on the image to be recognized, and the text detection model can be any existing text detection neural network model.
  • the character candidate area is an image that may contain the character to be recognized. Confidence is the detection probability given when the text detection model performs text detection.
  • the text detection is performed on the image to be recognized through a preset text detection model to obtain a character candidate region that may contain the character to be recognized, and the confidence that the region is a character candidate region is determined.
  • the character candidate region corresponding to the confidence level greater than the preset confidence level is selected.
  • the character candidate area is cropped from the image to be recognized, and the character candidate area is scaled and used as the character image to be recognized.
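The confidence filtering step can be sketched as follows (illustrative; the bounding-box tuples, confidences, and the 0.5 threshold are made-up values, since the patent leaves the preset confidence unspecified):

```python
def filter_candidates(regions, threshold=0.5):
    """Keep candidate regions whose detection confidence is not less
    than the preset confidence threshold."""
    return [box for box, conf in regions if conf >= threshold]

# Hypothetical text-detector output: (bounding box, confidence) pairs.
regions = [((0, 0, 40, 16), 0.92), ((50, 0, 80, 16), 0.31)]
print(filter_candidates(regions))  # [(0, 0, 40, 16)]
```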
  • the step of scaling the character candidate region corresponding to the confidence and using it as the character image to be recognized includes: adjusting the height of the character candidate region corresponding to the confidence according to a preset height value to obtain a height-adjusted character candidate region.
  • the width of the height-adjusted character candidate region is zero-padded until it reaches the preset maximum width value.
  • the character candidate region meeting the preset maximum width value is used as the character image to be recognized.
  • the character candidate regions are scaled and unified to the same preset height value; that is, character candidate regions taller than the preset height value are reduced in height, and those shorter than the preset height value are enlarged in height.
  • the width is then adjusted according to the preset maximum width value so that the character candidate region containing the characters fits within the specified image size. If, when the width of the character candidate region is adjusted, it does not reach the preset maximum width value, the region is padded with zero pixel values until its width equals the preset maximum width value. Since the preset maximum width value is usually greater than the width of the image, no character candidate region has a width exceeding the preset maximum width value.
  • the character candidate area is scaled to the size specified by the character recognition model, and the character candidate area meeting the specified size of the model is used as the character image to be recognized.
  • the character candidate region is adjusted to ensure that the model's input requirements are met, preventing cases where recognition fails because those requirements are not met.
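The height adjustment and zero-padding described above can be sketched as follows. This is an illustrative NumPy sketch: the preset height of 32 and maximum width of 256 are assumed values, and nearest-neighbour resampling stands in for whatever scaling method an implementation would actually use.

```python
import numpy as np

def normalize_region(region, height=32, max_width=256):
    """Scale a candidate region to a preset height (nearest-neighbour
    resampling, keeping the aspect ratio), then zero-pad its width up
    to a preset maximum width."""
    h, w = region.shape[:2]
    new_w = max(1, round(w * height / h))
    rows = (np.arange(height) * h / height).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    scaled = region[rows][:, cols]
    padded = np.zeros((height, max_width) + region.shape[2:], region.dtype)
    padded[:, :min(new_w, max_width)] = scaled[:, :max_width]
    return padded

region = np.ones((16, 64), dtype=np.float32)  # toy 16x64 single-channel crop
out = normalize_region(region)
print(out.shape)  # (32, 256)
```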
  • the character recognition model includes a densely connected network, a transition layer (Transition block), a long short-term memory network layer (LSTM layer), and a fully connected layer (Fully Connected layer).
  • the densely connected network includes a 2D convolutional layer (Convolutional 2D layer) and a dense convolutional layer (Dense block). It can be understood that dense convolutional layers (Dense blocks) can be added according to the feature depth actually required, and whenever a dense convolutional layer (Dense block) is added, a corresponding transition layer (Transition block) needs to be added.
  • the character recognition model includes a 2D convolutional layer (Convolutional 2D layer), two dense convolutional layers (Dense block), two transition layers (Transition block), a long short-term memory network layer (LSTM layer), and a fully connected layer (Fully Connected layer).
  • the image to be recognized is received, and after preprocessing the image to be recognized to obtain the character image to be recognized, a three-dimensional matrix is obtained by converting the RGB value of each pixel in the character image to be recognized.
  • the three-dimensional matrix is input into the densely connected network in the character recognition model, and the Convolutional 2D layer in the densely connected network is used for feature extraction to obtain the basic features of the character image to be recognized.
  • use the first dense block in the densely connected network to perform further feature extraction on the basic features to obtain the first depth feature of the character image to be recognized.
  • the first feature matrix is obtained by adding the basic feature and the first depth feature.
  • a universal character recognition device, including a receiving module 602, a conversion module 604, an extraction module 606, and a determining module 608.
  • the receiving module 602 is configured to receive an image to be recognized, perform text detection on the image to be recognized, and obtain a character image to be recognized.
  • the conversion module 604 is configured to perform image digitization processing on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized.
  • the extraction module 606 is configured to input the three-dimensional matrix into a preset densely connected network to extract image features of the character image to be recognized by using the densely connected network to obtain a feature matrix of the character image to be recognized.
  • the determining module 608 is configured to determine the character in the character image to be recognized according to the feature matrix of the character image to be recognized.
  • the extraction module 606 is further configured to extract image features of the three-dimensional matrix using the two-dimensional convolutional layer of the densely connected network to obtain the basic features of the character image to be recognized; extract the depth features of the character image to be recognized from the basic features using the dense convolutional layer of the densely connected network; and add the basic features and the depth features to obtain the feature matrix of the character image to be recognized.
  • the determining module 608 is further configured to pool the feature matrix to obtain a pooled feature matrix; use the long short-term memory network to obtain the association information between the features in the pooled feature matrix; and determine the characters in the character image to be recognized according to the association information.
  • the determining module 608 is further configured to map a recognition result from the preset dictionary based on the association information and the preset mapping relationship, the recognition result including characters and corresponding recognition probabilities; and select characters according to the recognition probabilities and use the selected characters as the characters in the character image to be recognized.
  • the receiving module 602 is further configured to perform text detection on the image to be recognized to obtain character candidate regions and corresponding confidences; when there is a confidence not less than the preset confidence, scale the character candidate region corresponding to that confidence to obtain a scaled character candidate region; and use the scaled character candidate region as the character image to be recognized.
  • the receiving module 602 is further configured to adjust the height of the character candidate region corresponding to the confidence according to the preset height value to obtain a height-adjusted character candidate region; zero-pad the width of the height-adjusted character candidate region until it reaches the preset maximum width value; and use the character candidate region meeting the preset maximum width value as the character image to be recognized.
  • the conversion module 604 is also used to obtain the RGB values corresponding to all pixels in the character image to be recognized; to convert the RGB values corresponding to each pixel into a three-dimensional matrix.
  • Each module in the above-mentioned universal character recognition device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the computer equipment database is used to store data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize a universal character recognition method.
  • FIG. 7 is merely a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • A computer device includes a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed, the one or more processors perform the following steps:
  • the characters in the character image to be recognized are determined according to the feature matrix of the character image to be recognized.
  • One or more non-volatile storage media store computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the characters in the character image to be recognized are determined according to the feature matrix of the character image to be recognized.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

A universal character recognition method, comprising: receiving an image to be recognized and performing text detection on the image to be recognized to obtain a character image to be recognized; performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized; inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and determining the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.

Description

Universal character recognition method and apparatus, computer device, and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 201910574434.5, entitled "Universal character recognition method and apparatus, computer device, and storage medium" and filed with the China National Intellectual Property Administration on June 28, 2019, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to a universal character recognition method, an apparatus, a computer device, and a storage medium.
BACKGROUND
Universal characters refer to commonly seen visible text characters, including Chinese characters, digits, letters, special characters, and the like. With the development of neural networks, deep neural networks whose layers are connected one after another are now commonly used to recognize universal characters. However, for universal character recognition, in order to meet demanding requirements such as a large number of recognizable characters, high recognition accuracy, and the ability to handle complex text scenarios, deeper neural networks are often needed so that more complex feature patterns can be extracted.
However, the inventors realized that the deep neural networks currently in use suffer from a degradation problem: as the number of layers of a deep neural network increases, its accuracy saturates and may even decline, which in turn reduces recognition accuracy.
SUMMARY
According to various embodiments disclosed in this application, a universal character recognition method and apparatus, a computer device, and a storage medium are provided.
A universal character recognition method includes:
receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
determining the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
A universal character recognition apparatus includes:
a receiving module configured to receive an image to be recognized and perform text detection on the image to be recognized to obtain a character image to be recognized;
a conversion module configured to perform image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
an extraction module configured to input the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
a determination module configured to determine the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
determining the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
determining the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the drawings required in the embodiments. Apparently, the drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a diagram of an application scenario of a universal character recognition method according to one or more embodiments.
FIG. 2 is a schematic flowchart of a universal character recognition method according to one or more embodiments.
FIG. 3 is a schematic flowchart of the step of extracting features from the three-dimensional matrix by means of the densely connected network to obtain the feature matrix of the character image to be recognized according to one or more embodiments.
FIG. 4 is a structural diagram of a densely connected network according to one or more embodiments.
FIG. 5 is a structural diagram of a character recognition model according to one or more embodiments.
FIG. 6 is a block diagram of a universal character recognition apparatus according to one or more embodiments.
FIG. 7 is a block diagram of a computer device according to one or more embodiments.
DETAILED DESCRIPTION
To make the technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain this application and are not intended to limit it.
The universal character recognition method provided in this application can be applied in the application environment shown in FIG. 1. A terminal 102 communicates with a server 104 over a network. The server 104 receives an image to be recognized sent by the terminal 102 and performs text detection on the image to obtain a character image to be recognized. The server 104 performs image digitization on the character image to be recognized to obtain a corresponding three-dimensional matrix. The server 104 inputs the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain the feature matrix of the character image. The server 104 then determines the characters in the character image to be recognized according to the feature matrix. The terminal 102 may be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet, or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In some embodiments, as shown in FIG. 2, a universal character recognition method is provided. Taking the method being applied to the server in FIG. 1 as an example, the method includes the following steps:
Step S202: receive an image to be recognized, and perform text detection on the image to be recognized to obtain a character image to be recognized.
An image is the most commonly used information carrier and contains information about the objects it depicts. In this embodiment, the image to be recognized is an image containing characters to be recognized, such as an invoice image or a list image. The character image to be recognized is an image that contains only characters, i.e., an image without blank regions or any information other than characters.
Specifically, when character recognition of an image is required, a user can issue a recognition instruction by operating a terminal. After receiving the instruction, the terminal sends the recognition instruction together with the corresponding image to be recognized to the server. In response, the server crops the image to be recognized to obtain a character image containing only characters. Taking invoices as an example: when a user has collected a large number of paper invoices offline and needs to enter information such as the reimbursement entity and the reimbursement amount into a reimbursement system, the user can photocopy or scan the invoices to obtain invoice images, upload those images to the reimbursement system by operating it on the terminal, and issue a character recognition instruction in the system. After receiving the invoice images and the recognition instruction, the terminal's reimbursement system sends a character recognition request, together with the invoice images, to the server corresponding to the system. The server then performs text detection on each invoice image, locates the character regions to be recognized that contain only characters such as the reimbursement entity and the reimbursement amount, and invokes a cropping tool to cut those regions out of the invoice image, obtaining the character image to be recognized.
Step S204: perform image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized.
An image is composed of many pixels, each regarded as a dot or a small cell; an image is therefore a standard rectangle with a certain width and height. What is commonly called resolution refers to the width and height of this rectangle: for example, if the resolution of an image is 1280*720, then 1280 and 720 are the numbers of columns and rows of the image, respectively. A matrix likewise has rows and columns, and matrix operations are common and mature in both mathematics and computing. In other words, when an image needs to be processed, a computer usually converts operations on the image into operations on a matrix. Therefore, when processing the character image to be recognized, the image is converted into a corresponding image matrix, so that operations on the image become matrix operations that the server can conveniently perform. The dimensionality of the matrix depends on the image type. In this embodiment, the processed image is a color image, and a color image has three components, R, G, and B, each corresponding to one matrix, so the image matrix corresponding to a color image is a three-dimensional matrix. That is, since every pixel in the image carries data for three channels, converting the image into a matrix means converting it into a data structure of dimension three. Accordingly, the size of the three-dimensional matrix matches the size and channel count of the character image to be recognized, i.e., the length * width * height of the matrix equal the length * width * number of channels of the character image.
In some embodiments, performing image digitization on the character image to be recognized to obtain the corresponding three-dimensional matrix specifically includes: obtaining the RGB values corresponding to all pixels in the character image to be recognized, and converting the RGB values corresponding to each pixel into a three-dimensional matrix.
Specifically, each pixel in the character image to be recognized carries data for three channels, and these three channels constitute the pixel's RGB value. The color of each pixel is determined by its RGB components, each of which can take a value from 0 to 255. Converting the character image into a three-dimensional matrix therefore means storing the RGB values of all pixels in the image as matrix elements. Moreover, since the size of the matrix is determined by the size and channel count of the image, the distribution of pixels in the image matches the distribution of elements in the matrix: the RGB values of all pixels are scaled to the range 0-1 and stored in the matrix elements at the corresponding positions.
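As an illustrative sketch only (not code from the patent), the digitization step above can be expressed as follows, with a hypothetical 2x2 image standing in for the character image:

```python
# Sketch of image digitization: store each pixel's RGB value, scaled from
# 0-255 down to the 0-1 range, at the corresponding position of a
# (height x width x 3) nested-list matrix.

def image_to_matrix(pixels):
    """pixels: list of rows, each row a list of (R, G, B) tuples (0-255)."""
    return [[[channel / 255.0 for channel in rgb] for rgb in row]
            for row in pixels]

# hypothetical 2x2 color image
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
matrix = image_to_matrix(image)  # 2 rows x 2 columns x 3 channels
```

The element layout mirrors the pixel layout, as the text describes; a production implementation would normally use an array library rather than nested lists.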
Step S206: input the three-dimensional matrix into the preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain the feature matrix of the character image to be recognized.
The densely connected network is one layer of the character recognition model, which is the model this application uses for universal character recognition. A densely connected network is a neural network that takes shortcut connections to the extreme: each layer in the network is connected not only to the next layer but to all subsequent layers. In other words, the input of each layer is not merely the output of the previous layer but the outputs of all preceding layers.
In some embodiments, as shown in FIG. 3, step S206 of inputting the three-dimensional matrix into the preset densely connected network to extract image features of the character image to be recognized and obtain its feature matrix includes the following steps:
Step S302: extract image features from the three-dimensional matrix using the two-dimensional convolutional layer of the densely connected network to obtain the basic features of the character image to be recognized.
A convolutional layer is a network layer used to extract features from input data and generally contains multiple convolution kernels, where a kernel is a function used for weighted averaging; a two-dimensional convolutional layer is a 2D convolutional layer. Specifically, after the densely connected network receives the three-dimensional matrix of the character image to be recognized, the convolution kernels of the two-dimensional convolutional layer compute weighted averages over the matrix, and the resulting matrix constitutes the basic features of the character image to be recognized.
Step S304: extract the deep features of the character image to be recognized from the basic features using the dense convolutional block of the densely connected network.
A dense convolutional block is a network layer in which all convolutional layers are shortcut-connected. In this embodiment, the densely connected network includes one 2D convolutional layer and one dense convolutional block, and the dense block includes four convolutional layers, the output of each of which is shortcut-connected to all subsequent convolutional layers. That is, the four shortcut-connected convolutional layers of the dense block perform convolution again on the basic features output by the 2D convolutional layer, extracting the deep features of the character image to be recognized.
Referring to FIG. 4, the dense convolutional block includes convolutional layers H1, H2, H3, and H4 with respective outputs X1, X2, X3, and X4; X0 is the output of the layer preceding the four shortcut-connected layers, i.e., the basic features of the character image output by the 2D convolutional layer. As shown in FIG. 4, X0 serves as input not only to layer H1 but also to layers H2, H3, and H4. Similarly, after X0 is input to H1, the output X1 obtained by H1's convolution is fed not only to H2 but also to H3 and H4. Likewise, X2, obtained by H2's convolution on X0 and X1, is fed to both H3 and H4. H3 and H4 behave in the same way and are not described again here. In other words, what each convolutional layer passes on includes not only its own output but also its own inputs.
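The connection pattern of FIG. 4 can be sketched as follows (illustrative only: the real layers H1-H4 are convolutions, replaced here by a toy transform so that the shortcut wiring is visible):

```python
# Each layer Hi receives the outputs of ALL preceding layers (X0..Xi-1)
# through shortcut connections, not just the output of the layer before it.

def toy_layer(inputs):
    # stand-in for a convolutional layer: just sums its incoming features
    return sum(inputs)

def dense_block(x0, num_layers=4):
    outputs = [x0]               # X0: basic features from the 2D conv layer
    for _ in range(num_layers):
        xi = toy_layer(outputs)  # Hi sees every earlier output
        outputs.append(xi)
    return outputs               # [X0, X1, X2, X3, X4]

features = dense_block(1)
```

In a real dense block the layer inputs are concatenated feature maps rather than summed scalars; the point of the sketch is only the fan-out of every output to all later layers.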
Step S306: add the basic features and the deep features to obtain the feature matrix of the character image to be recognized.
After the basic features and the deep features of the characters to be recognized are obtained, the two are added, i.e., feature fusion yields the feature matrix of the character image to be recognized, from which the characters in the image can be determined. Adding the basic features and the deep features means directly adding the feature element values at the same rows and columns of the two, and the resulting new feature matrix is the feature matrix of the characters to be recognized. Taking the four-layer dense convolutional block of FIG. 4 as an example, the last convolutional layer H4 outputs X4, which is the deep feature. For X4, the corresponding basic features are X0, X1, X2, and X3, so X4 is added to X0, X1, X2, and X3 and the sum is used as the input to the next network layer.
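The feature fusion described above, element-wise addition of same-position entries, can be sketched with toy 2x2 matrices (the values are hypothetical, not real activations):

```python
# Add the basic-feature matrix and the deep-feature matrix entry by entry;
# the result is the fused feature matrix passed to the next layer.

def add_features(base, deep):
    return [[b + d for b, d in zip(base_row, deep_row)]
            for base_row, deep_row in zip(base, deep)]

basic_features = [[1, 2], [3, 4]]
deep_features = [[5, 6], [7, 8]]
feature_matrix = add_features(basic_features, deep_features)
```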
In this embodiment, the densely connected network not only extracts features thoroughly but also, by means of shortcut connections, fully reuses the features extracted by each layer, achieving higher-accuracy character recognition.
Step S208: determine the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
The character recognition model further includes a transition layer and a long short-term memory network. The transition layer includes an activation function and a pooling layer, and it reduces the size of the feature matrix, i.e., it discards pixel information and retains only the important information. In other words, the transition layer serves to compress the model: assuming the feature maps corresponding to the feature matrix produced by the preceding densely connected network have m channels, the convolution of the transition layer produces [nm] features, where n is the compression rate with a value range of (0, 1]. When n = 1, the number of features is unchanged by the transition layer, i.e., there is no compression; when the compression rate n is less than 1, the number of features is compressed. The long short-term memory layer (LSTM layer) is a recurrent neural network suited to processing and predicting important events with relatively long intervals and delays in a time series.
In some embodiments, step S208 of determining the characters in the character image to be recognized according to its feature matrix specifically includes: pooling the feature matrix to obtain a pooled feature matrix; using the long short-term memory network to obtain the association information among the features in the pooled feature matrix; and determining the characters in the character image to be recognized according to the association information.
Specifically, the transition layer pools the feature matrix: after non-linear processing by the activation function of the transition layer, the pooling layer pools the feature matrix. Pooling generally falls into max pooling and average pooling: max pooling selects the maximum value within a pooling region, while average pooling computes the mean of all feature values within the region; the pooling region is preferably 2*2. After the transition layer pools the feature matrix, the pooled matrix is input into the long short-term memory network, which determines the association information among the features of the matrix that has passed through the transition layer, and the characters in the character image to be recognized are determined based on the relationships among the features.
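The 2*2 max pooling performed by the transition layer can be sketched as follows (illustrative; average pooling, the alternative the text mentions, would take the mean of each 2*2 region instead):

```python
# Halve each spatial dimension by keeping the maximum of every 2x2 region.

def max_pool_2x2(matrix):
    """matrix: 2-D list with even row and column counts."""
    pooled_rows = []
    for r in range(0, len(matrix), 2):
        row = []
        for c in range(0, len(matrix[0]), 2):
            row.append(max(matrix[r][c], matrix[r][c + 1],
                           matrix[r + 1][c], matrix[r + 1][c + 1]))
        pooled_rows.append(row)
    return pooled_rows

feature_values = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 3, 2],
    [2, 6, 0, 1],
]
pooled = max_pool_2x2(feature_values)
```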
In the universal character recognition method above, text detection is performed on the image to be recognized to obtain the character image to be recognized, which ensures that the blank regions of the image are removed and only the regions containing characters are kept. The character image is then digitized into the corresponding three-dimensional matrix, which facilitates subsequent recognition by the densely connected network. The three-dimensional matrix corresponding to the character image is input into the preset densely connected network, and the densely connected network, which can increase the number of network layers, extracts image features from the character image to obtain the feature matrix of the characters to be recognized, after which the characters in the character image are determined according to the feature matrix. Feature reuse is thereby guaranteed and higher-accuracy character recognition is achieved.
In some embodiments, determining the characters in the character image to be recognized according to the association information specifically includes: mapping from a preset dictionary based on the association information and a preset mapping relationship to obtain a recognition result, the recognition result including characters and corresponding recognition probabilities; and selecting characters according to the recognition probabilities, the selected characters being taken as the characters in the character image to be recognized.
The association information among features indicates how the recognized features can be combined into corresponding characters. The preset dictionary is a library that includes many different types of characters, such as Chinese characters, digits, and letters.
Specifically, the association information obtained by the long short-term memory network determines which features in the feature matrix can be combined. The combinable features are combined, the corresponding characters and their recognition probabilities are looked up in the preset dictionary, and the character is then determined according to the recognition probability. For example, suppose the combinable features in the feature matrix include feature 1 and feature 2, where feature 1 is "一" and feature 2 is "十". Combining feature 1 "一" and feature 2 "十" and mapping through the preset dictionary yields the characters "土" and "士", each with a corresponding recognition probability; one of "土" and "士" is selected as the finally recognized character according to these probabilities, i.e., the character with the higher recognition probability is the one chosen.
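The final selection step can be sketched as follows, mirroring the "土" vs. "士" example above; the probabilities are hypothetical, not values from the patent:

```python
# Pick, among the candidate characters mapped from the preset dictionary,
# the one with the highest recognition probability.

def select_character(candidates):
    """candidates: dict mapping candidate character -> probability."""
    return max(candidates, key=candidates.get)

# hypothetical probabilities for the example in the text
chosen = select_character({"土": 0.82, "士": 0.18})
```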
In some embodiments, the step of receiving an image to be recognized and performing text detection on it to obtain a character image to be recognized specifically includes: performing text detection on the image to be recognized to obtain character candidate regions and corresponding confidences; when a confidence not less than a preset confidence exists, scaling the character candidate region corresponding to that confidence to obtain a scaled character candidate region; and taking the scaled character candidate region as the character image to be recognized.
Text detection may be performed on the image to be recognized using a preset text detection model, which may be any existing text detection neural network model. A character candidate region is an image region that may contain characters to be recognized, and the confidence is the detection probability given by the text detection model.
Specifically, the preset text detection model performs text detection on the image to be recognized, yielding character candidate regions that may contain characters to be recognized, along with the confidence that each region is a character candidate region. When a detected confidence is greater than the preset confidence, the character candidate region corresponding to that confidence is selected, cropped out of the image to be recognized, scaled, and used as the character image to be recognized. In this embodiment, text detection on the image to be recognized preliminarily excludes image regions that contain no characters, reducing the workload of subsequent character recognition and ensuring fast recognition.
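A minimal sketch of the confidence check above; the region identifiers, scores, and threshold are made-up illustrations, not values from the patent:

```python
# Keep only candidate regions whose detection confidence is not below
# the preset confidence threshold.

def filter_candidates(regions, preset_confidence=0.8):
    """regions: list of (region_id, confidence) pairs from the detector."""
    return [rid for rid, conf in regions if conf >= preset_confidence]

kept = filter_candidates([("r1", 0.95), ("r2", 0.40), ("r3", 0.81)])
```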
In some embodiments, the step of scaling the character candidate region corresponding to the confidence and using it as the character image to be recognized specifically includes: adjusting the height of the character candidate region corresponding to the confidence according to a preset height value to obtain a height-adjusted character candidate region; zero-padding the width of the height-adjusted character candidate region until a preset maximum width value is met; and taking the character candidate region meeting the preset maximum width value as the character image to be recognized.
Specifically, the character candidate regions are scaled to a uniform preset height: regions taller than the preset height are shrunk and regions shorter than it are enlarged. After height scaling, the width is adjusted according to the preset maximum width value so that the character-containing candidate region fits within the prescribed image size. If, when the width is adjusted, the width of a candidate region does not reach the preset maximum width, the region is padded by filling zeros into its pixel values until the width equals the preset maximum width. Since the preset maximum width is usually greater than the image width, there are no candidate regions whose width exceeds it. After the above scaling, the character candidate region fits within the size prescribed by the character recognition model, and a candidate region meeting the prescribed size is used as the character image to be recognized. In this embodiment, adjusting the character candidate regions ensures that the model's requirements are met, preventing cases in which recognition fails because the requirements are not satisfied.
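A minimal sketch of the size normalization above, assuming the width is scaled proportionally when the height is adjusted (the text does not spell this out) and using hypothetical preset values of 32 and 128:

```python
PRESET_HEIGHT = 32   # hypothetical preset height value
MAX_WIDTH = 128      # hypothetical preset maximum width value

def normalized_size(width, height):
    """Return (height, scaled width, zero-padding columns) for a region."""
    # proportional width when the height is forced to PRESET_HEIGHT
    scaled_width = min(round(width * PRESET_HEIGHT / height), MAX_WIDTH)
    padding = MAX_WIDTH - scaled_width  # columns of zeros appended
    return PRESET_HEIGHT, scaled_width, padding

h, w, pad = normalized_size(width=100, height=50)
```

A real implementation would resample the pixel values themselves; the sketch only tracks the resulting geometry.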
In some embodiments, the character recognition model includes the densely connected network, a transition layer (Transition block), a long short-term memory layer (LSTM layer), and a fully connected layer (Fully Connected layer). The densely connected network includes a 2D convolutional layer (Convolutional 2D layer) and a dense convolutional block (Dense block). It will be appreciated that Dense blocks can be added according to the feature depth actually required, and each additional Dense block requires a corresponding additional Transition block.
As shown in FIG. 5, this embodiment provides a character recognition model structure that includes one Convolutional 2D layer, two Dense blocks, two Transition blocks, one LSTM layer, and one Fully Connected layer.
Specifically, the image to be recognized is received and preprocessed to obtain the character image to be recognized, and the RGB values of the pixels in the character image are converted into a three-dimensional matrix. The three-dimensional matrix is input into the densely connected network of the character recognition model, and features are extracted by the Convolutional 2D layer of the network to obtain the basic features of the character image, from which a basic feature map is generated. The first Dense block of the network then performs further feature extraction on the basic features to obtain the first deep features of the character image. The basic features and the first deep features are added to obtain a first feature matrix, which is input into the first Transition block, where its size is reduced. The first feature matrix, having passed through the first Transition block, is input into the second Dense block, which performs deep feature extraction on it again to obtain second deep features. The second deep features are added to the first feature matrix to obtain a second feature matrix, which is input into the second Transition block; after its size is reduced there, the long short-term memory module determines the association information among the features of the second feature matrix that has passed through the second Transition block. Finally, the fully connected layer obtains the characters and their recognition probabilities based on the association information and the preset dictionary, and determines the characters by probability-based selection.
It should be understood that although the steps in the flowcharts of FIGS. 2-3 are displayed sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In some embodiments, as shown in FIG. 6, a universal character recognition apparatus is provided, including a receiving module 602, a conversion module 604, an extraction module 606, and a determination module 608:
the receiving module 602 is configured to receive an image to be recognized and perform text detection on it to obtain a character image to be recognized;
the conversion module 604 is configured to perform image digitization on the character image to be recognized to obtain the corresponding three-dimensional matrix;
the extraction module 606 is configured to input the three-dimensional matrix into the preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain the feature matrix of the character image; and
the determination module 608 is configured to determine the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
In some embodiments, the extraction module 606 is further configured to extract image features from the three-dimensional matrix using the two-dimensional convolutional layer of the densely connected network to obtain the basic features of the character image to be recognized; extract the deep features of the character image from the basic features using the dense convolutional block of the network; and add the basic features and the deep features to obtain the feature matrix of the character image to be recognized.
In some embodiments, the determination module 608 is further configured to pool the feature matrix to obtain a pooled feature matrix; obtain the association information among the features of the pooled matrix using the long short-term memory network; and determine the characters in the character image to be recognized according to the association information.
In some embodiments, the determination module 608 is further configured to map from the preset dictionary based on the association information and the preset mapping relationship to obtain a recognition result, the recognition result including characters and corresponding recognition probabilities, and to select characters according to the recognition probabilities, the selected characters being taken as the characters in the character image to be recognized.
In some embodiments, the receiving module 602 is further configured to perform text detection on the image to be recognized to obtain character candidate regions and corresponding confidences; when a confidence not less than the preset confidence exists, scale the character candidate region corresponding to that confidence to obtain a scaled character candidate region; and take the scaled character candidate region as the character image to be recognized.
In some embodiments, the receiving module 602 is further configured to adjust the height of the character candidate region corresponding to the confidence according to the preset height value to obtain a height-adjusted character candidate region; zero-pad the width of the height-adjusted candidate region until the preset maximum width value is met; and take the candidate region meeting the preset maximum width value as the character image to be recognized.
In some embodiments, the conversion module 604 is further configured to obtain the RGB values corresponding to all pixels in the character image to be recognized, and to convert the RGB values corresponding to each pixel into the three-dimensional matrix.
For the specific limitations of the universal character recognition apparatus, refer to the limitations of the universal character recognition method above, which are not repeated here. Each module in the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a universal character recognition method.
A person skilled in the art will understand that the structure shown in FIG. 7 is merely a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the processors, cause the one or more processors to perform the following steps:
receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
inputting the three-dimensional matrix into the preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
determining the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
inputting the three-dimensional matrix into the preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
determining the characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
In some embodiments, the computer-readable instructions, when executed by the processor, further implement the following steps:
A person of ordinary skill in the art will understand that all or part of the procedures of the methods in the above embodiments may be completed by computer-readable instructions instructing relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and when executed, may include the procedures of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as no contradiction exists in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application and are described specifically and in detail, but they are not to be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this application patent shall be subject to the appended claims.

Claims (20)

  1. A universal character recognition method, comprising:
    receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
    performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
    inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
    determining characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
  2. The method according to claim 1, wherein the inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized, comprises:
    extracting image features from the three-dimensional matrix using a two-dimensional convolutional layer of the densely connected network to obtain basic features of the character image to be recognized;
    extracting deep features of the character image to be recognized from the basic features using a dense convolutional layer of the densely connected network; and
    adding the basic features and the deep features to obtain the feature matrix of the character image to be recognized.
  3. The method according to claim 1, wherein the determining characters in the character image to be recognized according to the feature matrix of the character image to be recognized comprises:
    pooling the feature matrix to obtain a pooled feature matrix;
    obtaining association information among features in the pooled feature matrix using a long short-term memory network; and
    determining the characters in the character image to be recognized according to the association information.
  4. The method according to claim 3, wherein the determining the characters in the character image to be recognized according to the association information comprises:
    mapping from a preset dictionary based on the association information and a preset mapping relationship to obtain a recognition result, the recognition result comprising characters and corresponding recognition probabilities; and
    selecting characters according to the recognition probabilities, the selected characters being taken as the characters in the character image to be recognized.
  5. The method according to claim 1, wherein the receiving an image to be recognized and performing text detection on the image to be recognized to obtain a character image to be recognized comprises:
    performing text detection on the image to be recognized to obtain a character candidate region and a corresponding confidence; and
    when a confidence not less than a preset confidence exists, scaling the character candidate region corresponding to the confidence to obtain a scaled character candidate region, and taking the scaled character candidate region as the character image to be recognized.
  6. The method according to claim 5, wherein the scaling the character candidate region corresponding to the confidence to obtain a scaled character candidate region, and taking the scaled character candidate region as the character image to be recognized, comprises:
    adjusting a height of the character candidate region corresponding to the confidence according to a preset height value to obtain a height-adjusted character candidate region;
    zero-padding a width of the height-adjusted character candidate region until a preset maximum width value is met; and
    taking the character candidate region meeting the preset maximum width value as the character image to be recognized.
  7. The method according to claim 1, wherein the performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized comprises:
    obtaining RGB values corresponding to all pixels in the character image to be recognized; and
    converting the RGB values corresponding to each pixel into the three-dimensional matrix.
  8. A universal character recognition apparatus, comprising:
    a receiving module configured to receive an image to be recognized and perform text detection on the image to be recognized to obtain a character image to be recognized;
    a conversion module configured to perform image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
    an extraction module configured to input the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
    a determination module configured to determine characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
  9. The apparatus according to claim 8, wherein the extraction module is further configured to:
    extract image features from the three-dimensional matrix using a two-dimensional convolutional layer of the densely connected network to obtain basic features of the character image to be recognized;
    extract deep features of the character image to be recognized from the basic features using a dense convolutional layer of the densely connected network; and
    add the basic features and the deep features to obtain the feature matrix of the character image to be recognized.
  10. The apparatus according to claim 8, wherein the determination module is further configured to:
    pool the feature matrix to obtain a pooled feature matrix;
    obtain association information among features in the pooled feature matrix using a long short-term memory network; and
    determine the characters in the character image to be recognized according to the association information.
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
    performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
    inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
    determining characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
  12. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    extracting image features from the three-dimensional matrix using a two-dimensional convolutional layer of the densely connected network to obtain basic features of the character image to be recognized;
    extracting deep features of the character image to be recognized from the basic features using a dense convolutional layer of the densely connected network; and
    adding the basic features and the deep features to obtain the feature matrix of the character image to be recognized.
  13. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    pooling the feature matrix to obtain a pooled feature matrix;
    obtaining association information among features in the pooled feature matrix using a long short-term memory network; and
    determining the characters in the character image to be recognized according to the association information.
  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    mapping from a preset dictionary based on the association information and a preset mapping relationship to obtain a recognition result, the recognition result comprising characters and corresponding recognition probabilities; and
    selecting characters according to the recognition probabilities, the selected characters being taken as the characters in the character image to be recognized.
  15. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    performing text detection on the image to be recognized to obtain a character candidate region and a corresponding confidence; and
    when a confidence not less than a preset confidence exists, scaling the character candidate region corresponding to the confidence to obtain a scaled character candidate region, and taking the scaled character candidate region as the character image to be recognized.
  16. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    receiving an image to be recognized, and performing text detection on the image to be recognized to obtain a character image to be recognized;
    performing image digitization on the character image to be recognized to obtain a three-dimensional matrix corresponding to the character image to be recognized;
    inputting the three-dimensional matrix into a preset densely connected network, so as to extract image features of the character image to be recognized by means of the densely connected network and obtain a feature matrix of the character image to be recognized; and
    determining characters in the character image to be recognized according to the feature matrix of the character image to be recognized.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    extracting image features from the three-dimensional matrix using a two-dimensional convolutional layer of the densely connected network to obtain basic features of the character image to be recognized;
    extracting deep features of the character image to be recognized from the basic features using a dense convolutional layer of the densely connected network; and
    adding the basic features and the deep features to obtain the feature matrix of the character image to be recognized.
  18. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    pooling the feature matrix to obtain a pooled feature matrix;
    obtaining association information among features in the pooled feature matrix using a long short-term memory network; and
    determining the characters in the character image to be recognized according to the association information.
  19. The storage medium according to claim 18, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    mapping from a preset dictionary based on the association information and a preset mapping relationship to obtain a recognition result, the recognition result comprising characters and corresponding recognition probabilities; and
    selecting characters according to the recognition probabilities, the selected characters being taken as the characters in the character image to be recognized.
  20. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    performing text detection on the image to be recognized to obtain a character candidate region and a corresponding confidence; and
    when a confidence not less than a preset confidence exists, scaling the character candidate region corresponding to the confidence to obtain a scaled character candidate region, and taking the scaled character candidate region as the character image to be recognized.
PCT/CN2019/102942 2019-06-28 2019-08-28 Universal character recognition method and apparatus, computer device, and storage medium WO2020258491A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910574434.5 2019-06-28
CN201910574434.5A CN110414520B (zh) 2019-06-28 Universal character recognition method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020258491A1 true WO2020258491A1 (zh) 2020-12-30

Family

ID=68358498

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102942 WO2020258491A1 (zh) 2019-06-28 2019-08-28 Universal character recognition method and apparatus, computer device, and storage medium

Country Status (1)

Country Link
WO (1) WO2020258491A1 (zh)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875722A (zh) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and recognition-model training method, apparatus, system, and storage medium
CN108875787A (zh) * 2018-05-23 2018-11-23 北京市商汤科技开发有限公司 Image recognition method and apparatus, computer device, and storage medium
CN109685100A (zh) * 2018-11-12 2019-04-26 平安科技(深圳)有限公司 Character recognition method, server, and computer-readable storage medium
CN109815946A (zh) * 2018-12-03 2019-05-28 东南大学 Multi-threaded business license locating and recognition method based on a densely connected network


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592717A (zh) * 2021-08-11 2021-11-02 浙江大华技术股份有限公司 Video image character overlay method and apparatus, storage medium, and electronic apparatus
CN113722434A (zh) * 2021-08-30 2021-11-30 平安科技(深圳)有限公司 Text data processing method and apparatus, computer device, and storage medium
CN113722434B (zh) 2024-05-03 Text data processing method and apparatus, computer device, and storage medium
CN115620299A (zh) * 2022-12-14 2023-01-17 深圳思谋信息科技有限公司 Image recognition method and apparatus, computer device, and storage medium
CN115620299B (zh) 2023-03-21 Image recognition method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN110414520A (zh) 2019-11-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19935065

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19935065

Country of ref document: EP

Kind code of ref document: A1