WO2021218706A1 - Text identification method and apparatus, device, and storage medium


Info

Publication number
WO2021218706A1
WO2021218706A1 PCT/CN2021/088389 CN2021088389W WO2021218706A1 WO 2021218706 A1 WO2021218706 A1 WO 2021218706A1 CN 2021088389 W CN2021088389 W CN 2021088389W WO 2021218706 A1 WO2021218706 A1 WO 2021218706A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
text
training
neural network
text image
Prior art date
Application number
PCT/CN2021/088389
Other languages
French (fr)
Chinese (zh)
Inventor
王文佳
刘学博
谢恩泽
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2022520075A (publication JP2022550195A)
Publication of WO2021218706A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

Embodiments of the present disclosure provide a text recognition method and apparatus, a device, and a storage medium. A feature map of a first text image is acquired, and the first text image is processed according to at least one feature sequence included in the feature map to obtain a second text image whose resolution is greater than that of the first text image. Because the image blocks in the first text image are correlated with one another, this approach effectively exploits the correlations between pieces of text to restore the lower-resolution first text image into the higher-resolution second text image; text recognition is then performed on the second text image to recognize the text content of the first text image.

Description

Text recognition method, apparatus, device, and storage medium
Cross-reference to related applications
The present disclosure claims priority to Chinese patent application No. 202010362519X, filed on April 30, 2020 and entitled "Text recognition method, apparatus, device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of computer vision technology, and in particular to a text recognition method, apparatus, device, and storage medium.
Background
Low-resolution text images are very common in daily life. For example, text images captured by terminal devices equipped with image acquisition devices, such as mobile phones, may have low resolution. Such images lose detailed content information, which leads to low accuracy when recognizing the text they contain. Traditional text recognition approaches generally first reconstruct the texture of the image and then perform text recognition on the reconstructed image. However, the recognition accuracy of this approach is low.
Summary
The present disclosure provides a text recognition method, apparatus, device, and storage medium.
According to a first aspect of the embodiments of the present disclosure, a text recognition method is provided. The method includes: acquiring a feature map of a first text image, where the feature map includes at least one feature sequence, and the feature sequence is used to represent the correlation between at least two image blocks in the first text image; processing the first text image according to the at least one feature sequence to obtain a second text image, where the resolution of the second text image is greater than the resolution of the first text image; and performing text recognition on the second text image.
In some embodiments, acquiring the feature map of the first text image includes: acquiring a plurality of channel maps of the first text image and a binary image corresponding to the first text image; and performing feature extraction on the plurality of channel maps and the binary image to obtain the feature map of the first text image.
In some embodiments, acquiring the feature map of the first text image includes: inputting the first text image into a pre-trained neural network, and acquiring the feature map output by the neural network.
In some embodiments, the neural network acquires the feature map in the following manner: generating an intermediate image according to the first text image, where the number of channels of the intermediate image is greater than the number of channels of the first text image; and performing feature extraction on the intermediate image to obtain the feature map.
In some embodiments, the neural network includes at least one convolutional neural network and a bidirectional long short-term memory network, where the output of the at least one convolutional neural network is connected to the input of the bidirectional long short-term memory network. Acquiring the feature map of the first text image includes: inputting the first text image into the at least one convolutional neural network and acquiring an intermediate image output by the at least one convolutional neural network; and inputting the intermediate image into the bidirectional long short-term memory network and acquiring the feature map output by the bidirectional long short-term memory network.
In some embodiments, the neural network includes a plurality of sub-networks connected in sequence. Inputting the first text image into the pre-trained neural network and acquiring the feature map output by the neural network includes: inputting the i-th output image output by the i-th sub-network of the plurality of sub-networks into the (i+1)-th sub-network of the plurality of sub-networks, so that the (i+1)-th sub-network generates an (i+1)-th intermediate image and performs feature extraction on the (i+1)-th intermediate image to obtain an (i+1)-th output image, where the number of channels of the (i+1)-th intermediate image is greater than the number of channels of the i-th output image; and determining the N-th output image as the feature map. Here, i and N are positive integers, N is the total number of sub-networks, 1 ≤ i ≤ N-1, and N ≥ 2. The first output image is obtained as follows: the first sub-network generates a first intermediate image according to the first text image, and performs feature extraction on the first intermediate image to obtain the first output image.
In some embodiments, the method further includes: before processing the first text image according to the at least one feature sequence, processing the first text image so that the number of channels of the first text image is the same as the number of channels of the feature map.
In some embodiments, the method further includes: after the second text image is obtained, processing the second text image so that the number of channels of the second text image is the same as the number of channels of the first text image. Performing text recognition on the second text image includes: performing text recognition on the processed second text image.
In some embodiments, the method further includes: training the neural network based on at least one set of training images, where each set of training images includes a first training image and a second training image, and the first training image and the second training image include the same text. The resolution of the first training image is less than a first resolution threshold, the resolution of the second training image is greater than a second resolution threshold, and the first resolution threshold is less than or equal to the second resolution threshold.
In some embodiments, training the neural network based on the at least one set of training images includes: inputting the first training image into the neural network and acquiring an output image of the neural network; determining a loss function based on the second training image corresponding to the first training image and the output image; and performing supervised training on the neural network based on the loss function.
In some embodiments, the loss function includes at least one of a first loss function and a second loss function. The first loss function is determined based on the mean square error of each pair of corresponding pixels in the first training image and the second training image; and/or the second loss function is determined based on the difference between the gradient fields of each pair of corresponding pixels in the first training image and the second training image.
In some embodiments, the method further includes: aligning the first training image and the second training image before training the neural network based on the at least one set of training images.
In some embodiments, aligning the first training image and the second training image includes: processing the first training image through a pre-trained spatial transformation network so as to align the text in the first training image with the text in the second training image.
In some embodiments, the first training image is obtained by a first image acquisition device configured with a first focal length photographing a subject at a first position; the second training image is obtained by a second image acquisition device configured with a second focal length photographing the subject at the first position; and the first focal length is smaller than the second focal length.
According to a second aspect of the embodiments of the present disclosure, a text recognition apparatus is provided. The apparatus includes: an acquisition module configured to acquire a feature map of a first text image, where the feature map includes at least one feature sequence, and the feature sequence is used to represent the correlation between at least two image blocks in the first text image; a first processing module configured to process the first text image according to the at least one feature sequence to obtain a second text image, where the resolution of the second text image is greater than the resolution of the first text image; and a text recognition module configured to perform text recognition on the second text image.
In some embodiments, the acquisition module includes: a first acquisition unit configured to acquire a plurality of channel maps of the first text image and a binary image corresponding to the first text image; and a feature extraction unit configured to perform feature extraction on the plurality of channel maps and the binary image to obtain the feature map of the first text image.
In some embodiments, the acquisition module is configured to: input the first text image into a pre-trained neural network, and acquire the feature map output by the neural network.
In some embodiments, the neural network acquires the feature map in the following manner: generating an intermediate image according to the first text image, where the number of channels of the intermediate image is greater than the number of channels of the first text image; and performing feature extraction on the intermediate image to obtain the feature map.
In some embodiments, the neural network includes at least one convolutional neural network and a bidirectional long short-term memory network, where the output of the at least one convolutional neural network is connected to the input of the bidirectional long short-term memory network. The acquisition module includes: a second acquisition unit configured to input the first text image into the at least one convolutional neural network and acquire an intermediate image output by the at least one convolutional neural network; and a third acquisition unit configured to input the intermediate image into the bidirectional long short-term memory network and acquire the feature map output by the bidirectional long short-term memory network.
In some embodiments, the neural network includes a plurality of sub-networks connected in sequence. The acquisition module is configured to: input the i-th output image output by the i-th sub-network of the plurality of sub-networks into the (i+1)-th sub-network of the plurality of sub-networks, so that the (i+1)-th sub-network generates an (i+1)-th intermediate image and performs feature extraction on the (i+1)-th intermediate image to obtain an (i+1)-th output image, where the number of channels of the (i+1)-th intermediate image is greater than the number of channels of the i-th output image; and determine the N-th output image as the feature map. Here, i and N are positive integers, N is the total number of sub-networks, 1 ≤ i ≤ N-1, and N ≥ 2. The first output image is obtained as follows: the first sub-network generates a first intermediate image according to the first text image, and performs feature extraction on the first intermediate image to obtain the first output image.
In some embodiments, the apparatus further includes: a second processing module configured to, before the first text image is processed according to the at least one feature sequence, process the first text image so that the number of channels of the first text image is the same as the number of channels of the feature map.
In some embodiments, the apparatus further includes: a third processing module configured to, after the second text image is obtained, process the second text image so that the number of channels of the second text image is the same as the number of channels of the first text image. The text recognition module is configured to: perform text recognition on the processed second text image.
In some embodiments, the apparatus further includes: a training module configured to train the neural network based on at least one set of training images, where each set of training images includes a first training image and a second training image, and the first training image and the second training image include the same text. The resolution of the first training image is less than a first resolution threshold, the resolution of the second training image is greater than a second resolution threshold, and the first resolution threshold is less than or equal to the second resolution threshold.
In some embodiments, the training module includes: an input unit configured to input the first training image into the neural network and acquire an output image of the neural network; a determining unit configured to determine a loss function based on the second training image corresponding to the first training image and the output image; and a training unit configured to perform supervised training on the neural network based on the loss function.
In some embodiments, the loss function includes at least one of a first loss function and a second loss function. The first loss function is determined based on the mean square error of each pair of corresponding pixels in the first training image and the second training image; and/or the second loss function is determined based on the difference between the gradient fields of each pair of corresponding pixels in the first training image and the second training image.
In some embodiments, the apparatus further includes: an alignment module configured to align the first training image and the second training image before the neural network is trained based on the at least one set of training images.
In some embodiments, the alignment module is configured to: process the first training image through a pre-trained spatial transformation network so as to align the text in the first training image with the text in the second training image.
In some embodiments, the first training image is obtained by a first image acquisition device configured with a first focal length photographing a subject at a first position; the second training image is obtained by a second image acquisition device configured with a second focal length photographing the subject at the first position; and the first focal length is smaller than the second focal length.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the embodiments.
According to a fourth aspect of the embodiments of the present disclosure, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the method described in any of the embodiments.
According to a fifth aspect of the embodiments of the present disclosure, a computer program is provided, where the computer program, when executed by a processor, implements the method described in any of the embodiments.
In the embodiments of the present disclosure, a feature map of a first text image is acquired, and the first text image is processed according to at least one feature sequence included in the feature map to obtain a second text image whose resolution is greater than that of the first text image. Because the image blocks in the first text image are correlated with one another, the above approach effectively exploits the correlations between pieces of text to restore the lower-resolution first text image into the higher-resolution second text image; text recognition is then performed on the second text image to recognize the text content in the first text image, which improves the accuracy of text recognition.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
附图说明Description of the drawings
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。The drawings herein are incorporated into the specification and constitute a part of the specification. These drawings illustrate embodiments that conform to the present disclosure, and are used together with the specification to explain the technical solutions of the present disclosure.
图1A是本公开实施例的文本图像的示意图一。Fig. 1A is a first schematic diagram of a text image according to an embodiment of the present disclosure.
图1B是本公开实施例的文本图像的示意图二。Fig. 1B is a second schematic diagram of a text image according to an embodiment of the present disclosure.
图1C是本公开实施例的文本图像的示意图三。Fig. 1C is a third schematic diagram of a text image according to an embodiment of the present disclosure.
图2是本公开实施例的文本识别方法的流程图。Fig. 2 is a flowchart of a text recognition method according to an embodiment of the present disclosure.
图3是本公开实施例的图像之间的不对齐现象的示意图。FIG. 3 is a schematic diagram of the misalignment phenomenon between images in an embodiment of the present disclosure.
图4是本公开实施例的文本识别方法的整体流程的示意图。FIG. 4 is a schematic diagram of the overall flow of a text recognition method according to an embodiment of the present disclosure.
图5是本公开实施例的文本识别装置的框图。Fig. 5 is a block diagram of a text recognition device according to an embodiment of the present disclosure.
图6是本公开实施例的计算机设备的结构示意图。Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Detailed description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure. The singular forms "a", "said", and "the" used in the present disclosure and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality of items, or any combination of at least two of a plurality of items.
It should be understood that although the present disclosure may use the terms first, second, third, and so on to describe various pieces of information, such information should not be limited to these terms. These terms are only used to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the present disclosure, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining".
In order to enable those skilled in the art to better understand the embodiments of the present disclosure, and to make the above objectives, features, and advantages of the embodiments of the present disclosure more apparent and easier to understand, the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
In daily life, it is often necessary to recognize text information from text images, that is, to perform text recognition. Some text images (for example, text images captured by terminal devices equipped with image acquisition devices, such as mobile phones) have relatively low resolution. Such low-resolution images lose detailed content information, which leads to low accuracy when recognizing the text they contain. This problem is particularly serious for scene text images (Scene Text Image, STI). A scene text image is an image containing text information captured in a natural scene. The text information in a scene text image may include, but is not limited to, at least one of an ID card number, a ticket, a billboard, a license plate, and the like. Examples of text information are shown in Figs. 1A to 1C. Because the characteristics of the text in different scene text images differ greatly (for example, the text size, font, color, brightness, and/or degree of distortion may differ), text recognition on scene text images is much more difficult than recognizing text in scanned document images, so the recognition accuracy for scene text images is lower than that for printed text images.
Traditional text recognition approaches generally first exploit the color similarity of adjacent pixels in a text image, interpolating between the colors of adjacent pixels in a predefined way to reconstruct the texture of the text image, and then perform text recognition based on the reconstructed text image. This approach achieves relatively high recognition accuracy on clear text images, but its accuracy drops sharply on low-resolution text images. Based on this, embodiments of the present disclosure provide a text recognition method. As shown in Fig. 2, the method may include steps 201 to 203.
Step 201: acquire a feature map of a first text image, where the feature map includes at least one feature sequence, and the feature sequence is used to represent the correlation between at least two image blocks in the first text image.
Step 202: process the first text image according to the at least one feature sequence to obtain a second text image, where the resolution of the second text image is greater than the resolution of the first text image.
Step 203: perform text recognition on the second text image.
In step 201, the text in the first text image may include at least one of characters, symbols, and numbers. In some embodiments, the first text image may be an image captured in a natural scene, and the text in the first text image may be any of various types of text found in natural scenes. For example, the first text image may be an image of an ID card, and the text in the first text image is the ID number and name on the ID card. For another example, the first text image may be an image of a billboard, and the text in the first text image is a slogan on the billboard. In other embodiments, the first text image may also be an image containing printed text. In practical applications, the first text image may be a text image whose resolution is so low that the text recognition accuracy falls below a preset accuracy threshold.
The individual characters that make up a word or phrase, or the individual letters that make up a word, are not combined at random. For example, for the text "打*鼠", since "打地鼠" ("whack-a-mole") is a frequently occurring phrase, there is a high probability that "*" is "地". Inferring text content from context in this way exploits the correlation between pieces of text, and such correlation is often strong. Therefore, feature extraction may be performed on the first text image to obtain the feature map of the first text image. Specifically, feature extraction may be performed on the first text image in the horizontal direction and/or the vertical direction to obtain at least one feature sequence of the first text image. Each feature sequence may represent the correlation between at least two image blocks in the first text image.
In some embodiments, each pixel may be treated as an image block, and each element in the feature sequence may represent the correlation between adjacent pixels in the first text image. In other embodiments, several adjacent pixels may together be treated as one image block, and each element in the feature sequence may represent the correlation between adjacent image blocks in the first text image.
In many cases, the background of the first text image is a single color, and the background color generally differs from the text color. Therefore, the approximate position of the text in the first text image can be determined according to the binary image corresponding to the first text image. When the background color differs substantially from the text color, determining the text position through the binary image yields fairly accurate results. In addition, the color of the text in the first text image can be determined according to the channel maps of the first text image. Therefore, in some embodiments, a plurality of channel maps of the first text image and the binary image corresponding to the first text image may be acquired, and feature extraction may be performed on the plurality of channel maps and the binary image to obtain the feature map of the first text image.
The binary image may be obtained according to the average gray value of the first text image. Specifically, the average gray value over all pixels in the first text image may be computed; the gray value of pixels whose value is greater than the average gray value is set to a first gray value, and the gray value of pixels whose value is less than or equal to the average gray value is set to a second gray value, where the first gray value is greater than the second gray value. In some embodiments, the difference between the first gray value and the second gray value may be greater than a preset pixel value. For example, the first gray value may be 255 and the second gray value may be 0, so that each pixel in the binary image is either a black pixel or a white pixel. This increases the difference between the pixel values of background pixels and text pixels, making the localization of the text more accurate. The channel maps may be the channel maps of the R, G, and B channels of an RGB (Red Green Blue) image, or channel maps of other channels used to characterize the color of the image.
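As an illustration of this binarization step, the following is a minimal sketch and not the patent's reference implementation; it assumes the input is a grayscale array derived from the RGB image, and it uses the example gray values 255 and 0 mentioned above.

```python
import numpy as np

def binarize_by_mean(gray: np.ndarray, high: int = 255, low: int = 0) -> np.ndarray:
    """Threshold a grayscale image at its mean gray value.

    Pixels brighter than the mean become `high`, the rest become `low`,
    which roughly separates a single-color background from text pixels.
    """
    mean_gray = gray.mean()
    return np.where(gray > mean_gray, high, low).astype(np.uint8)

def to_four_channels(rgb: np.ndarray) -> np.ndarray:
    """Stack the R, G, B channel maps with the binary image (H x W x 4)."""
    gray = rgb.mean(axis=2)          # simple luminance proxy (assumption)
    mask = binarize_by_mean(gray)
    return np.dstack([rgb.astype(np.uint8), mask])
```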
In some embodiments, the first text image may be input into a pre-trained neural network, and the feature map output by the neural network may be acquired. The neural network may be a convolutional neural network (Convolutional Neural Network, CNN), a long short-term memory network (Long Short-Term Memory, LSTM), or another type of neural network, or a network composed of several kinds of neural networks. In some embodiments, a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BLSTM) may be used to acquire the feature map, performing feature extraction on the first text image in the horizontal and vertical directions simultaneously, so as to improve the robustness of the reconstructed second text image.
The neural network may first generate an intermediate image according to the first text image, where the number of channels of the intermediate image is greater than the number of channels of the first text image, and then perform feature extraction on the intermediate image to obtain the feature map. Generating an intermediate image with more channels than the first text image enriches the features of the first text image, thereby improving the resolution of the reconstructed second text image. In practical applications, the neural network may include at least one convolutional neural network and one bidirectional long short-term memory network, where the convolutional neural networks are connected in sequence and the bidirectional long short-term memory network is connected to the last of the at least one convolutional neural network. The intermediate image may be generated by the at least one convolutional neural network, and feature extraction may be performed by the bidirectional long short-term memory network.
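The following PyTorch sketch illustrates one possible block of this kind: convolutions expand the channel count to produce the "intermediate image", and a BLSTM then scans the feature rows as sequences. The layer sizes and the row-wise scanning direction are assumptions made for illustration, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class ConvBLSTMBlock(nn.Module):
    """Conv layers expand channels (the 'intermediate image'); a bidirectional
    LSTM then models correlations between positions along the image width."""

    def __init__(self, in_channels: int, hidden_channels: int):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden_channels),
            nn.PReLU(),
            nn.Conv2d(hidden_channels, hidden_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden_channels),
        )
        self.blstm = nn.LSTM(hidden_channels, hidden_channels // 2,
                             bidirectional=True, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.convs(x)                                  # B x C x H x W
        b, c, h, w = feat.shape
        seq = feat.permute(0, 2, 3, 1).reshape(b * h, w, c)   # one sequence per row
        seq, _ = self.blstm(seq)                              # feature sequences
        out = seq.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return out + x if x.shape == out.shape else out       # residual when shapes match
```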
Further, the neural network may include a plurality of sub-networks connected in sequence, where the structure of each sub-network may be the same as that of the single neural network in the above embodiments, which will not be repeated here. Calling the sub-network in the i-th position from front to back the i-th sub-network, the i-th output image output by the i-th sub-network of the plurality of sub-networks may be input into the (i+1)-th sub-network of the plurality of sub-networks, so that the (i+1)-th sub-network generates an (i+1)-th intermediate image and performs feature extraction on the (i+1)-th intermediate image to obtain an (i+1)-th output image, where the number of channels of the (i+1)-th intermediate image is greater than the number of channels of the i-th output image; the N-th output image is determined as the feature map. Here, i and N are positive integers, N is the total number of sub-networks, 1 ≤ i ≤ N-1, and N ≥ 2. The first output image is obtained as follows: the first sub-network generates a first intermediate image according to the first text image, and performs feature extraction on the first intermediate image to obtain the first output image.
That is, the first sub-network generates a first intermediate image from the first text image, performs feature extraction on the first intermediate image to obtain a first output image, and inputs the first output image to the second sub-network, where the number of channels of the first intermediate image is greater than the number of channels of the first text image. The second sub-network generates a second intermediate image from the first output image, performs feature extraction on the second intermediate image to obtain a second output image, and inputs the second output image to the third sub-network, where the number of channels of the second intermediate image is greater than the number of channels of the first output image; and so on. Through multiple cascaded sub-networks, the features in the first text image can be fully extracted, further improving the resolution of the reconstructed second text image.
In step 202, based on the feature sequences, the first text image may be up-sampled using an up-sampling method such as pixel shuffle to obtain the second text image corresponding to the first text image. Further, if the number of channels of the feature map generated in step 201 is greater than the number of channels of the first text image, then in step 202, before the first text image is processed according to the at least one feature sequence, the first text image may also be processed so that its number of channels is the same as the number of channels of the feature map. The processed first text image is then processed according to the feature sequences in the feature map to obtain the second text image. In this step, the processing of the first text image to increase its number of channels may be implemented with a convolutional neural network; a sketch of this step is shown below.
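As a sketch of this step (shapes and fusion by addition are assumptions, not the patent's exact layers), a 1x1 convolution can first match the image's channel count to the feature map, and PixelShuffle can then trade channels for spatial resolution:

```python
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Fuse the feature map with the channel-expanded image and upscale by `scale`."""

    def __init__(self, image_channels: int, feature_channels: int, scale: int = 2):
        super().__init__()
        # match the image's channel count to the feature map before fusion
        self.expand = nn.Conv2d(image_channels, feature_channels, kernel_size=1)
        # produce scale^2 times the channels, then rearrange them into space
        self.conv = nn.Conv2d(feature_channels, feature_channels * scale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, image: torch.Tensor, feature_map: torch.Tensor) -> torch.Tensor:
        fused = self.expand(image) + feature_map    # simple additive fusion (assumption)
        return self.shuffle(self.conv(fused))       # B x C x (scale*H) x (scale*W)
```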
On this basis, after the second text image is obtained, the second text image may also be processed so that its number of channels is the same as the number of channels of the first text image, that is, the second text image is restored to four channels. This process may also be implemented by a convolutional neural network.
In some embodiments, the neural network used in step 201 may be trained on multiple sets of training images, where each set of training images includes a first training image and a second training image containing the same text. The resolution of the first training image is less than a preset first resolution threshold, the resolution of the second training image is greater than a preset second resolution threshold, and the first resolution threshold is less than or equal to the second resolution threshold. The first training image may be called a low-resolution (Low Resolution, LR) image, and the second training image may be called a high-resolution (High Resolution, HR) image.
A text image data set may be established in advance. The text image data set may include a plurality of text image pairs, and each text image pair includes a low-resolution text image and a corresponding high-resolution text image. The text in the text image pairs may be text from various natural scenes, and the natural scenes may include, but are not limited to, at least one of streets, libraries, shops, vehicle interiors, and the like.
In other embodiments, the following networks may also be treated as one overall neural network and trained directly on the first training images and second training images: the neural network used for feature extraction to obtain the feature map, the convolutional neural network used before feature extraction to process the first text image and increase its number of channels, and the convolutional neural network used after the second text image is obtained to restore its number of channels.
Specifically, the first training image may be input into the neural network and the output image of the neural network acquired; a loss function is determined based on the second training image corresponding to the first training image and the output image; and supervised training is performed on the neural network based on the loss function.
The loss function may be any of various types of loss functions, or a combination of two or more loss functions. In some embodiments, the loss function includes at least one of a first loss function and a second loss function. The first loss function may be determined based on the mean square error of corresponding pixels in the first training image and the second training image; for example, it may be an L2 loss function. In other embodiments, the second loss function may be determined based on the difference between the gradient fields of corresponding pixels in the first training image and the second training image; for example, it may be a gradient profile loss (Gradient Profile Loss, GPL) function.
The gradient profile loss function $L_{GP}$ is defined as follows:
$$L_{GP} = \mathbb{E}_{x_0 \le x \le x_1}\left\lVert \nabla I_{HR}(x) - \nabla I_{SR}(x) \right\rVert_1$$
where $\nabla I_{HR}(x)$ denotes the gradient field of the HR image at pixel $x$, $\nabla I_{SR}(x)$ denotes the gradient field of the super-resolution image corresponding to the HR image (for example, the output image in Fig. 4) at pixel $x$, $x_0$ denotes the lower pixel bound, $x_1$ denotes the upper pixel bound, $\mathbb{E}$ denotes the computed energy, and the subscript 1 indicates that the L1 loss is computed.
The gradient field vividly reveals the text features and background features of a text image. In addition, an LR image always has a wider gradient field curve, while the gradient field curve of an HR image is narrower. After the gradient field of the HR image is obtained, the gradient field curve can be compressed to be narrower without complicated mathematical operations. Therefore, by adopting the gradient profile loss function, sharp boundaries between text features and background features can be reconstructed, which helps better distinguish text from background and produces clearer shapes, making the trained neural network more reliable.
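A minimal PyTorch sketch of a gradient profile loss of this kind is shown below; the finite-difference gradient operator is an assumption made for illustration, since the text above only specifies that the L1 distance between the two gradient fields is computed.

```python
import torch

def image_gradient(img: torch.Tensor) -> torch.Tensor:
    """Finite-difference gradient field (dx, dy) of a B x C x H x W image."""
    dx = img[..., :, 1:] - img[..., :, :-1]   # horizontal differences
    dy = img[..., 1:, :] - img[..., :-1, :]   # vertical differences
    # pad so both components keep the original spatial size
    dx = torch.nn.functional.pad(dx, (0, 1, 0, 0))
    dy = torch.nn.functional.pad(dy, (0, 0, 0, 1))
    return torch.cat([dx, dy], dim=1)

def gradient_profile_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Mean L1 distance between the gradient fields of the SR output and the HR target."""
    return torch.mean(torch.abs(image_gradient(hr) - image_gradient(sr)))
```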
In traditional model training, low-resolution images are generally generated artificially by down-sampling high-resolution images (low-resolution images generated in this way are called synthetic low-resolution images), and the model is then trained on these synthetic low-resolution images. However, compared with such synthetic low-resolution images, real low-resolution images (low-resolution images caused by factors such as the shooting focal length) tend to have even lower resolution and to be more diverse. In addition, in many cases the text in text images has varied shapes, scattered lighting, and different backgrounds. Therefore, a model trained on synthetic low-resolution images cannot acquire the feature map of a text image well, resulting in low text recognition accuracy.
To solve the above problem, the first training images and second training images used in the embodiments of the present disclosure are both real images, that is, images captured at different focal lengths. The first training image is obtained by a first image acquisition device configured with a first focal length photographing a subject at a first position, and the second training image is obtained by a second image acquisition device configured with a second focal length photographing the subject at the same first position, where the first focal length is smaller than the second focal length. The first image acquisition device and the second image acquisition device may be the same image acquisition device or different image acquisition devices. In some embodiments, the first focal length may take a value between 24 mm and 120 mm, for example 70 mm. In other embodiments, the second focal length may take a value between 120 mm and 240 mm, for example 150 mm. Further, there may be multiple first focal lengths and multiple second focal lengths, and each of the multiple first focal lengths is smaller than the smallest of the multiple second focal lengths. For example, the first focal lengths may include 35 mm, 50 mm, and 70 mm, and the second focal lengths may include 150 mm, 170 mm, and 190 mm.
When the text image pairs in the text image data set are used for model training, the regions containing text are generally first cropped from the text images in each pair: the image region cropped from the low-resolution text image of the pair serves as the first training image, and the image region cropped from the high-resolution text image of the pair serves as the second training image. The cropped first training image and second training image have the same size.
Because the text in the two images of a text image pair is the same, to improve processing efficiency one image of the pair is generally used as a reference image: the position of the region to be cropped is obtained in the reference image, and the other image is then cropped according to that position. For example, the high-resolution image of the pair may be used as the reference image, and the low-resolution image may be cropped according to the position of the text in the high-resolution image. However, because the camera may move during shooting, the center point of each image may differ, so the positions of the text in the first training image and the second training image obtained by cropping in this way will differ. This phenomenon is called misalignment, as shown in Fig. 3. Misalignment causes the model to wrongly match the background part of one image with the text part of the other image, so it learns incorrect pixel correspondences and produces ghosting.
Therefore, to solve the above problem, in some embodiments, before the neural network is trained on the first training image and the second training image containing the same text, the first training image and the second training image may also be aligned. Specifically, the first training image may be processed by a pre-trained model so that the first training image is aligned with the second training image. The model may interpolate and translate the first training image to align it with the second training image. The pre-trained model may be a spatial transformation network (Spatial Transformation Networks, STN). Image alignment effectively reduces the ghosting problem and improves the accuracy of the trained neural network.
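For illustration only, the sketch below shows the usual shape of such a spatial transformation module in PyTorch: a small localization network predicts an affine (translation-capable) transform, and grid sampling interpolates the LR image accordingly. The localization architecture is an assumption, not the patent's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignSTN(nn.Module):
    """Predict an affine transform for the LR image so its text lines up with the HR image."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )
        # initialize the predicted transform to the identity
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        theta = self.localization(lr).view(-1, 2, 3)
        grid = F.affine_grid(theta, lr.size(), align_corners=False)
        return F.grid_sample(lr, grid, align_corners=False)  # interpolated, shifted LR image
```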
The number of first training images and second training images in each set of training images is one. To recognize images better, all images may be rotated to the horizontal direction, and the neural network may then be trained on the rotated first training images and second training images.
At least one of the first training image and the second training image may also be scaled so that the sizes of the first training image and the second training image reach preset values. Specifically, a first training image whose pixel size is smaller than a first size may be up-sampled so that the first training image reaches the first size, and a second training image whose pixel size is smaller than a second size may be up-sampled so that the second training image reaches the second size, where the first size is smaller than the second size. In practice it has been found that, once the pixel height of a text image reaches 16, reconstructing the text image greatly improves the text recognition result, whereas if the pixel height is too small the recognition result is unsatisfactory even after reconstruction; therefore, a pixel height of 16 may be chosen for the first size. Further, the first size may be set to a pixel size of 64×16. On the other hand, once the pixel height exceeds 32, increasing the pixel size further does little to improve text recognition; therefore, a pixel height of 32 may be chosen for the second size. Further, the second size may be set to a pixel size of 128×32.
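As a small illustration of this preprocessing (the interpolation mode and the B x C x H x W tensor layout are assumptions):

```python
import torch.nn.functional as F

def resize_pair(lr, hr):
    """Bring an LR/HR training pair to the fixed sizes discussed above (W x H: 64x16 and 128x32)."""
    lr = F.interpolate(lr, size=(16, 64), mode="bicubic", align_corners=False)
    hr = F.interpolate(hr, size=(32, 128), mode="bicubic", align_corners=False)
    return lr, hr
```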
A portion of the image pairs may also be selected from the text image data set as a test set for evaluating the performance of the trained neural network. According to the resolution of the low-resolution image in each pair, the test set may be divided into three subsets: the resolution of the low-resolution images in the first subset is smaller than a preset third resolution threshold, the resolution of the low-resolution images in the second subset is greater than the third resolution threshold and smaller than a preset fourth resolution threshold, and the resolution of the low-resolution images in the third subset is greater than the fourth resolution threshold, the third resolution threshold being smaller than the fourth resolution threshold. In some embodiments, the third and fourth resolution thresholds may be set according to the resolution range of the low-resolution images in the test set. The performance of the neural network can be tested on each of the three subsets, and its overall performance determined from the three corresponding test results.
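A sketch of this three-way split is given below (assuming, for simplicity, that the resolution of a low-resolution image is compared through its pixel height, and that the two thresholds are chosen from the resolution range of the test set as suggested above; all names are illustrative):

```python
def split_test_set(image_pairs, third_threshold, fourth_threshold):
    """Split (lr_image, hr_image) pairs into three subsets according to the
    resolution (here: pixel height) of the low-resolution image."""
    assert third_threshold < fourth_threshold
    subsets = {"first": [], "second": [], "third": []}
    for lr_image, hr_image in image_pairs:
        height = lr_image.height          # PIL-style attribute
        if height < third_threshold:
            subsets["first"].append((lr_image, hr_image))
        elif height < fourth_threshold:
            subsets["second"].append((lr_image, hr_image))
        else:
            subsets["third"].append((lr_image, hr_image))
    return subsets
```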
Fig. 4 shows the overall flow of the text recognition method according to an embodiment of the present disclosure. First, the overall neural network is trained. The first training image is input into the neural network, where the neural network includes a neural network for feature extraction and a neural network for increasing and decreasing the number of channels of the first text image, for example a convolutional neural network, and may further include a neural network for aligning the training images, for example a spatial transformation network. Each neural network used here for feature extraction may be referred to as a Sequential Residual Block (SRB), and each SRB may include two convolutional neural networks and one bidirectional long short-term memory network (BLSTM). The first training image is first aligned with the second training image; the aligned first training image is then processed by a convolutional neural network so that its number of channels increases, and the result is fed into a plurality of cascaded sequential residual blocks for feature extraction, yielding the feature map of the first training image. The feature map is then up-sampled by an up-sampling module, and the number of channels of the up-sampled image is restored to the original number of channels by a convolutional neural network, yielding the output image corresponding to the first training image. An L2 loss function and a gradient profile loss function are computed from the output image and the second training image corresponding to the first training image, and these two loss functions supervise the training of the overall neural network to obtain its parameters. After the overall neural network has been trained, the first text image to be processed is input into the overall neural network, and the output image of the overall neural network is the second text image. Text recognition is then performed on the second text image to obtain the text recognition result.
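For illustration, the sequential residual block and the supervision described above might look as follows (a minimal PyTorch-style sketch; the channel width, the way feature-map rows are fed to the BLSTM, and the simple image-gradient term standing in for the gradient profile loss are assumptions of the sketch, not the exact design of this disclosure):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRB(nn.Module):
    """Sequential residual block: two convolutions followed by a bidirectional
    LSTM run along the width of each feature-map row, with a residual link."""

    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.blstm = nn.LSTM(channels, channels // 2,
                             bidirectional=True, batch_first=True)

    def forward(self, x):
        n, c, h, w = x.shape
        y = F.relu(self.conv2(F.relu(self.conv1(x))))
        seq = y.permute(0, 2, 3, 1).reshape(n * h, w, c)   # one sequence per row
        seq, _ = self.blstm(seq)
        y = seq.reshape(n, h, w, c).permute(0, 3, 1, 2)
        return x + y                                       # residual connection

def training_step(model, lr_image, hr_image):
    """One supervised step: L2 loss plus a simple image-gradient term used
    here as a stand-in for the gradient profile loss."""
    sr_image = model(lr_image)                  # output has the size of hr_image
    l2_loss = F.mse_loss(sr_image, hr_image)
    def grads(t):
        return t[..., :, 1:] - t[..., :, :-1], t[..., 1:, :] - t[..., :-1, :]
    (sx, sy), (hx, hy) = grads(sr_image), grads(hr_image)
    grad_loss = F.l1_loss(sx, hx) + F.l1_loss(sy, hy)
    return l2_loss + grad_loss
```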
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
As shown in Fig. 5, the present disclosure further provides an image processing apparatus, the apparatus including:
an acquisition module 501, configured to acquire a feature map of a first text image, the feature map including at least one feature sequence, the feature sequence being used to represent a correlation between at least two image blocks in the first text image;
a first processing module 502, configured to process the first text image according to the at least one feature sequence to obtain a second text image, a resolution of the second text image being greater than a resolution of the first text image; and
a text recognition module 503, configured to perform text recognition on the second text image.
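Read together, the three modules amount to the following inference pipeline (a schematic sketch only; the class and call signatures are illustrative and assume each module is available as a callable):

```python
class TextRecognizer:
    """Glue object mirroring the acquisition module 501, the first processing
    module 502 and the text recognition module 503 described above."""

    def __init__(self, acquire_features, super_resolve, recognize_text):
        self.acquire_features = acquire_features    # module 501
        self.super_resolve = super_resolve          # module 502
        self.recognize_text = recognize_text        # module 503

    def __call__(self, first_text_image):
        feature_map = self.acquire_features(first_text_image)
        second_text_image = self.super_resolve(first_text_image, feature_map)
        return self.recognize_text(second_text_image)
```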
In some embodiments, the acquisition module includes: a first acquisition unit, configured to acquire multiple channel images of the first text image and a binary image corresponding to the first text image; and a feature extraction unit, configured to perform feature extraction on the multiple channel images and the binary image to obtain the feature map of the first text image.
In some embodiments, the acquisition module is configured to: input the first text image into a pre-trained neural network, and acquire the feature map output by the neural network.
In some embodiments, the neural network obtains the feature map in the following manner: generating an intermediate image according to the first text image, the number of channels of the intermediate image being greater than the number of channels of the first text image; and performing feature extraction on the intermediate image to obtain the feature map.
In some embodiments, the neural network includes at least one convolutional neural network and a bidirectional long short-term memory network, an output end of the at least one convolutional neural network being connected to an input end of the bidirectional long short-term memory network; and the acquisition module includes: a second acquisition unit, configured to input the first text image into the at least one convolutional neural network and acquire an intermediate image output by the at least one convolutional neural network; and a third acquisition unit, configured to input the intermediate image into the bidirectional long short-term memory network and acquire the feature map output by the bidirectional long short-term memory network.
In some embodiments, the neural network includes a plurality of sub-networks connected in sequence, and the acquisition module is configured to: input an i-th output image output by an i-th sub-network of the plurality of sub-networks into an (i+1)-th sub-network of the plurality of sub-networks, so as to generate an (i+1)-th intermediate image through the (i+1)-th sub-network and perform feature extraction on the (i+1)-th intermediate image to obtain an (i+1)-th output image, the number of channels of the (i+1)-th intermediate image being greater than the number of channels of the i-th output image; and determine an N-th output image as the feature map, where i and N are positive integers, N is the total number of sub-networks, 1≤i≤N-1, and N≥2, and where the first output image is obtained in the following manner: the first sub-network generates a first intermediate image according to the first text image, and performs feature extraction on the first intermediate image to obtain the first output image.
In some embodiments, the apparatus further includes: a second processing module, configured to, before the first text image is processed according to the at least one feature sequence, process the first text image so that the number of channels of the first text image is the same as the number of channels of the feature map.
In some embodiments, the apparatus further includes: a third processing module, configured to, after the second text image is obtained, process the second text image so that the number of channels of the second text image is the same as the number of channels of the first text image; and the text recognition module is configured to perform text recognition on the processed second text image.
In some embodiments, the apparatus further includes: a training module, configured to train the neural network based on at least one group of training images, each group of training images including a first training image and a second training image, the first training image and the second training image including the same text, where the resolution of the first training image is smaller than a first resolution threshold, the resolution of the second training image is greater than a second resolution threshold, and the first resolution threshold is smaller than or equal to the second resolution threshold.
In some embodiments, the training module includes: an input unit, configured to input the first training image into the neural network and acquire an output image of the neural network; a determining unit, configured to determine a loss function based on the output image and the second training image corresponding to the first training image; and a training unit, configured to perform supervised training on the neural network based on the loss function.
In some embodiments, the loss function includes at least one of a first loss function and a second loss function; the first loss function is determined based on the mean square error of corresponding pixels in the first training image and the second training image; and/or the second loss function is determined based on the difference between the gradient fields of corresponding pixels in the first training image and the second training image.
In some embodiments, the apparatus further includes an alignment module, configured to align the first training image and the second training image before the neural network is trained based on the at least one group of training images.
In some embodiments, the alignment module is configured to process the first training image through a pre-trained spatial transformation network, so as to align the text in the first training image with the text in the second training image.
In some embodiments, the first training image is obtained by photographing a subject at a first position with a first image acquisition device provided with a first focal length; the second training image is obtained by photographing the subject at the first position with a second image acquisition device provided with a second focal length; and the first focal length is smaller than the second focal length.
In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for specific implementations, reference may be made to the descriptions of the above method embodiments, which are not repeated here for brevity.
The embodiments of this specification further provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described in any of the foregoing embodiments when executing the program.
The embodiments of the present disclosure further include a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described in any of the embodiments when executing the program.
Fig. 6 shows a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 601, a memory 602, an input/output interface 603, a communication interface 604, and a bus 605, where the processor 601, the memory 602, the input/output interface 603, and the communication interface 604 are communicatively connected to one another within the device through the bus 605.
The processor 601 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs so as to implement the technical solutions provided by the embodiments of this specification.
The memory 602 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 602 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 602 and invoked and executed by the processor 601.
The input/output interface 603 is used to connect an input/output module to realize information input and output. The input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide corresponding functions. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like, and the output devices may include a display, a speaker, a vibrator, an indicator light, and the like.
The communication interface 604 is used to connect a communication module (not shown in the figure) to realize communication and interaction between this device and other devices. The communication module may communicate by wired means (for example, USB or a network cable) or by wireless means (for example, a mobile network, WIFI, or Bluetooth).
The bus 605 includes a path for transmitting information between the components of the device (for example, the processor 601, the memory 602, the input/output interface 603, and the communication interface 604).
It should be noted that, although the above device only shows the processor 601, the memory 602, the input/output interface 603, the communication interface 604, and the bus 605, in a specific implementation process the device may further include other components necessary for normal operation. In addition, those skilled in the art can understand that the above device may also include only the components necessary to implement the solutions of the embodiments of this specification, and need not include all the components shown in the figure.
The embodiments of this specification further provide a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the foregoing embodiments.
The embodiments of this specification further provide a computer program, wherein the computer program, when executed by a processor, implements the method described in any of the foregoing embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments of this specification or in certain parts of the embodiments.
The systems, apparatuses, modules, or units illustrated in the above embodiments may be specifically implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are basically similar to the method embodiments, their description is relatively brief, and for relevant details reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative, and the modules described as separate components may or may not be physically separated; when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative work.

Claims (18)

  1. A text recognition method, the method comprising:
    acquiring a feature map of a first text image, the feature map including at least one feature sequence, the feature sequence being used to represent a correlation between at least two image blocks in the first text image;
    processing the first text image according to the at least one feature sequence to obtain a second text image, a resolution of the second text image being greater than a resolution of the first text image; and
    performing text recognition on the second text image.
  2. The method according to claim 1, wherein the acquiring a feature map of a first text image comprises:
    acquiring multiple channel images of the first text image and a binary image corresponding to the first text image; and
    performing feature extraction on the multiple channel images and the binary image to obtain the feature map of the first text image.
  3. The method according to claim 1 or 2, wherein the acquiring a feature map of a first text image comprises:
    inputting the first text image into a pre-trained neural network, and acquiring the feature map output by the neural network.
  4. The method according to claim 3, wherein the neural network obtains the feature map in the following manner:
    generating an intermediate image according to the first text image, a number of channels of the intermediate image being greater than a number of channels of the first text image; and
    performing feature extraction on the intermediate image to obtain the feature map.
  5. The method according to claim 3 or 4, wherein the neural network comprises at least one convolutional neural network and a bidirectional long short-term memory network, an output end of the at least one convolutional neural network being connected to an input end of the bidirectional long short-term memory network; and
    the acquiring a feature map of the first text image comprises:
    inputting the first text image into the at least one convolutional neural network, and acquiring an intermediate image output by the at least one convolutional neural network; and
    inputting the intermediate image into the bidirectional long short-term memory network, and acquiring the feature map output by the bidirectional long short-term memory network.
  6. The method according to any one of claims 3 to 5, wherein the neural network comprises a plurality of sub-networks connected in sequence; and
    the inputting the first text image into a pre-trained neural network and acquiring the feature map output by the neural network comprises:
    inputting an i-th output image output by an i-th sub-network of the plurality of sub-networks into an (i+1)-th sub-network of the plurality of sub-networks, so as to generate an (i+1)-th intermediate image through the (i+1)-th sub-network and perform feature extraction on the (i+1)-th intermediate image to obtain an (i+1)-th output image, a number of channels of the (i+1)-th intermediate image being greater than a number of channels of the i-th output image; and
    determining an N-th output image as the feature map;
    wherein i and N are positive integers, N is the total number of sub-networks, 1≤i≤N-1, and N≥2; and
    wherein a first output image is obtained in the following manner: a first sub-network generates a first intermediate image according to the first text image, and performs feature extraction on the first intermediate image to obtain the first output image.
  7. The method according to any one of claims 1 to 6, further comprising:
    before processing the first text image according to the at least one feature sequence, processing the first text image so that a number of channels of the first text image is the same as a number of channels of the feature map.
  8. The method according to claim 7, further comprising:
    after obtaining the second text image, processing the second text image so that a number of channels of the second text image is the same as the number of channels of the first text image;
    wherein the performing text recognition on the second text image comprises:
    performing text recognition on the processed second text image.
  9. The method according to any one of claims 3 to 8, further comprising:
    training the neural network based on at least one group of training images, each group of training images including a first training image and a second training image, the first training image and the second training image including the same text;
    wherein a resolution of the first training image is smaller than a first resolution threshold, a resolution of the second training image is greater than a second resolution threshold, and the first resolution threshold is smaller than or equal to the second resolution threshold.
  10. The method according to claim 9, wherein the training the neural network based on at least one group of training images comprises:
    inputting the first training image into the neural network, and acquiring an output image of the neural network;
    determining a loss function based on the output image and the second training image corresponding to the first training image; and
    performing supervised training on the neural network based on the loss function.
  11. The method according to claim 10, wherein the loss function comprises at least one of a first loss function and a second loss function;
    the first loss function is determined based on a mean square error of corresponding pixels in the first training image and the second training image; and/or
    the second loss function is determined based on a difference between gradient fields of corresponding pixels in the first training image and the second training image.
  12. The method according to any one of claims 9 to 11, further comprising:
    before training the neural network based on the at least one group of training images, aligning the first training image and the second training image.
  13. The method according to claim 12, wherein the aligning the first training image and the second training image comprises:
    processing the first training image through a pre-trained spatial transformation network, so as to align text in the first training image with text in the second training image.
  14. The method according to any one of claims 9 to 13, wherein the first training image is obtained by photographing a subject at a first position with a first image acquisition device provided with a first focal length;
    the second training image is obtained by photographing the subject at the first position with a second image acquisition device provided with a second focal length; and
    the first focal length is smaller than the second focal length.
  15. A text recognition apparatus, the apparatus comprising:
    an acquisition module, configured to acquire a feature map of a first text image, the feature map including at least one feature sequence, the feature sequence being used to represent a correlation between at least two image blocks in the first text image;
    a first processing module, configured to process the first text image according to the at least one feature sequence to obtain a second text image, a resolution of the second text image being greater than a resolution of the first text image; and
    a text recognition module, configured to perform text recognition on the second text image.
  16. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 14.
  17. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 14 when executing the program.
  18. A computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 14.
PCT/CN2021/088389 2020-04-30 2021-04-20 Text identification method and apparatus, device, and storage medium WO2021218706A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022520075A JP2022550195A (en) 2020-04-30 2021-04-20 Text recognition method, device, equipment, storage medium and computer program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010362519.XA CN111553290A (en) 2020-04-30 2020-04-30 Text recognition method, device, equipment and storage medium
CN202010362519.X 2020-04-30

Publications (1)

Publication Number Publication Date
WO2021218706A1 true WO2021218706A1 (en) 2021-11-04

Family

ID=72000292

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088389 WO2021218706A1 (en) 2020-04-30 2021-04-20 Text identification method and apparatus, device, and storage medium

Country Status (3)

Country Link
JP (1) JP2022550195A (en)
CN (1) CN111553290A (en)
WO (1) WO2021218706A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553290A (en) * 2020-04-30 2020-08-18 北京市商汤科技开发有限公司 Text recognition method, device, equipment and storage medium
CN112419159A (en) * 2020-12-07 2021-02-26 上海互联网软件集团有限公司 Character image super-resolution reconstruction system and method
CN112633429A (en) * 2020-12-21 2021-04-09 安徽七天教育科技有限公司 Method for recognizing handwriting choice questions of students
CN117037136B (en) * 2023-10-10 2024-02-23 中国科学技术大学 Scene text recognition method, system, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389091A (en) * 2018-10-22 2019-02-26 重庆邮电大学 The character identification system and method combined based on neural network and attention mechanism
CN110033000A (en) * 2019-03-21 2019-07-19 华中科技大学 A kind of text detection and recognition methods of bill images
CN110084172A (en) * 2019-04-23 2019-08-02 北京字节跳动网络技术有限公司 Character recognition method, device and electronic equipment
CN110168573A (en) * 2016-11-18 2019-08-23 易享信息技术有限公司 Spatial attention model for image labeling
CN111553290A (en) * 2020-04-30 2020-08-18 北京市商汤科技开发有限公司 Text recognition method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10043231B2 (en) * 2015-06-30 2018-08-07 Oath Inc. Methods and systems for detecting and recognizing text from images
CN107368831B (en) * 2017-07-19 2019-08-02 中国人民解放军国防科学技术大学 English words and digit recognition method in a kind of natural scene image
CN109800749A (en) * 2019-01-17 2019-05-24 湖南师范大学 A kind of character recognition method and device
CN110443239A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 The recognition methods of character image and its device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110168573A (en) * 2016-11-18 2019-08-23 易享信息技术有限公司 Spatial attention model for image labeling
CN109389091A (en) * 2018-10-22 2019-02-26 重庆邮电大学 The character identification system and method combined based on neural network and attention mechanism
CN110033000A (en) * 2019-03-21 2019-07-19 华中科技大学 A kind of text detection and recognition methods of bill images
CN110084172A (en) * 2019-04-23 2019-08-02 北京字节跳动网络技术有限公司 Character recognition method, device and electronic equipment
CN111553290A (en) * 2020-04-30 2020-08-18 北京市商汤科技开发有限公司 Text recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111553290A (en) 2020-08-18
JP2022550195A (en) 2022-11-30

Similar Documents

Publication Publication Date Title
WO2021218706A1 (en) Text identification method and apparatus, device, and storage medium
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
US20200364478A1 (en) Method and apparatus for liveness detection, device, and storage medium
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
US10062195B2 (en) Method and device for processing a picture
Lu et al. Robust blur kernel estimation for license plate images from fast moving vehicles
WO2014014678A1 (en) Feature extraction and use with a probability density function and divergence|metric
US20210256657A1 (en) Method, system, and computer-readable medium for improving quality of low-light images
US20200279166A1 (en) Information processing device
JP2013541119A (en) System and method for improving feature generation in object recognition
CN108876716B (en) Super-resolution reconstruction method and device
US8873839B2 (en) Apparatus of learning recognition dictionary, and method of learning recognition dictionary
Li et al. CG-DIQA: No-reference document image quality assessment based on character gradient
CN103198299A (en) Face recognition method based on combination of multi-direction dimensions and Gabor phase projection characteristics
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
US11481919B2 (en) Information processing device
CN112348008A (en) Certificate information identification method and device, terminal equipment and storage medium
CN111753714A (en) Multidirectional natural scene text detection method based on character segmentation
CN116469172A (en) Bone behavior recognition video frame extraction method and system under multiple time scales
CN113112531B (en) Image matching method and device
WO2022034678A1 (en) Image augmentation apparatus, control method, and non-transitory computer-readable storage medium
CN111402281B (en) Book edge detection method and device
CN114170589A (en) Rock lithology identification method based on NAS, terminal equipment and storage medium
Tran et al. Super-resolution in music score images by instance normalization
CN112036342A (en) Document snapshot method, device and computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21795694

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022520075

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21795694

Country of ref document: EP

Kind code of ref document: A1