WO2022147965A1 - Arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN - Google Patents

Arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN Download PDF

Info

Publication number
WO2022147965A1
WO2022147965A1 · PCT/CN2021/099935 · CN2021099935W
Authority
WO
WIPO (PCT)
Prior art keywords
arithmetic
yolov3
network
neural network
mixnet
Prior art date
Application number
PCT/CN2021/099935
Other languages
English (en)
French (fr)
Inventor
刘天亮
梁聪聪
桂冠
戴修斌
Original Assignee
江苏拓邮信息智能技术研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江苏拓邮信息智能技术研究院有限公司
Publication of WO2022147965A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the invention relates to an arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN, belonging to the field of text detection and recognition at the intersection of computer vision and natural language processing.
  • a typical OCR pipeline comprises three parts: image preprocessing, text detection, and text recognition.
  • image preprocessing usually corrects imaging problems; common preprocessing steps include geometric transformation, distortion correction, deblurring, image enhancement, and illumination correction.
  • text detection locates the position and extent of text and its layout, usually including layout analysis and text-line detection.
  • text recognition builds on text detection to recognize the content, converting the text in an image into machine-readable text; the main question text recognition answers is what each character is.
  • the present invention proposes an end-to-end arithmetic question grading system.
  • the system covers two branches: detection and recognition.
  • the YOLOv3 algorithm detects the boundary of each question, with weight assignment making horizontal boundaries easier for the network to learn, and the feature-extraction network is replaced with the lighter MixNet network without loss of accuracy; the recognition part uses the convolutional recurrent neural network CRNN, which balances accuracy and efficiency.
  • the combination of a convolutional neural network CNN (Convolutional Neural Networks) and a long short-term memory network LSTM (Long Short-Term Memory) lets the network accurately learn the semantic content of each question; finally, arithmetic logic judges right or wrong and returns the correct answer.
  • the bounding boxes of the images are first generated with the LabelImg annotation tool, and the original dataset is then expanded with data augmentations such as translation, rotation, and cropping to improve generalization and robustness.
  • from the box annotations, a total of 4 prior boxes across 2 scales are obtained with the K-means clustering algorithm and used to train the detection network; a sketch of such IoU-based anchor clustering follows below.
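As a concrete illustration of this step, the following minimal Python sketch clusters annotated box sizes into prior boxes using the 1 − IoU distance customary for YOLO anchors; the function names, and the assumption that (w, h) pairs have already been parsed from the LabelImg annotations, are ours rather than the patent's.

```python
import numpy as np

def iou_wh(boxes, anchors):
    # IoU on widths/heights only, treating all boxes as centered at the origin
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=4, iters=100, seed=0):
    # boxes: (N, 2) array of annotated (w, h) pairs
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted: 2 small + 2 large
```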
  • the step (2) uses the MixNet network to extract multi-scale image features.
  • replacing the depthwise separable convolution structure with a mixed depthwise convolution structure of different kernel sizes greatly reduces the number of network parameters while strengthening the network's ability to fuse multi-scale semantic and localization features, so that more systematic and comprehensive features are extracted.
  • the MixNet network is built from mixed depthwise separable convolution modules (MDConv), which fuse different kernel sizes into a single convolution operation so that patterns at multiple resolutions can easily be captured. The MDConv operation involves several design choices:
  • Group size g: the number of different kernel types applied to a single input tensor. In the extreme case g = 1, MDConv reduces to ordinary depthwise convolution; for MobileNets, g = 4 improves both model accuracy and efficiency.
  • Kernel size of each group: in theory, each group could use a kernel of any size, but two groups with the same kernel size are equivalent to a single merged group, so each group must use a different size. Further, because small kernels usually have fewer parameters and FLOPS, kernel sizes are restricted to start at 3x3 and increase monotonically by 2 per group; in other words, group i uses a (2i+1)x(2i+1) kernel. For example, a 4-group MDConv typically uses kernel sizes {3x3, 5x5, 7x7, 9x9}. Under this constraint, the kernel size of each group is predefined by the group size g, which simplifies the design process.
  • Channel size of each group: two channel divisions are mainly considered: (a) equal division, where every group has the same number of channels; and (b) exponential division, where the i-th group receives a fraction 2^(-i) of the total channels. For example, for a 4-group MDConv with 32 channels in total, equal division splits the channels as (8, 8, 8, 8), while exponential division splits them as (16, 8, 4, 4). A PyTorch-style sketch of an MDConv block follows below.
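The following PyTorch-style sketch shows one way to realize an MDConv block with equal channel division (any remainder is absorbed by the first group); the class name and layer configuration are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class MDConv(nn.Module):
    """Mixed depthwise convolution sketch: split the channels into groups
    and give each group its own depthwise kernel (3x3, 5x5, 7x7, ...)."""
    def __init__(self, channels, num_groups=4):
        super().__init__()
        splits = [channels // num_groups] * num_groups
        splits[0] += channels - sum(splits)          # absorb any remainder
        self.splits = splits
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, kernel_size=2 * i + 3,   # group i gets a (2i+3)x(2i+3) kernel
                      padding=i + 1, groups=c, bias=False)  # groups=c -> depthwise
            for i, c in enumerate(splits))

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)  # per-group channel slices
        return torch.cat([conv(ch) for conv, ch in zip(self.convs, chunks)], dim=1)

# MDConv(32)(torch.randn(1, 32, 56, 56)).shape -> torch.Size([1, 32, 56, 56])
```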
  • the step (3) fuses features of different scales through a feature pyramid network FPN (Feature Pyramid Networks), combining localization and semantic information by upsampling and channel concatenation, and outputs 8x and 16x downsampled feature maps that are sent to the prediction module of YOLOv3.
  • the prediction module converts the relative box parameters (t_x, t_y, t_w, t_h), together with the prior box dimensions (p_w, p_h), into the absolute position (b_x, b_y, b_w, b_h) using the following formulas, which makes it convenient to compare the intersection-over-union of the predicted box with the prior boxes and to predict from the best prior box:

    b_x = σ(t_x) + c_x
    b_y = σ(t_y) + c_y
    b_w = p_w e^{t_w}
    b_h = p_h e^{t_h}
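A small sketch of this conversion, assuming the raw predictions, grid-cell offsets, and anchor sizes are given as NumPy arrays; the stride factor mapping grid units back to input pixels is our addition.

```python
import numpy as np

def decode_boxes(t, priors, cells, stride):
    """t: (N, 4) raw predictions (tx, ty, tw, th); priors: (N, 2) anchor
    sizes (pw, ph) in pixels; cells: (N, 2) grid-cell offsets (cx, cy)."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = (sig(t[:, 0]) + cells[:, 0]) * stride   # center x in pixels
    by = (sig(t[:, 1]) + cells[:, 1]) * stride   # center y in pixels
    bw = priors[:, 0] * np.exp(t[:, 2])          # width  = pw * e^tw
    bh = priors[:, 1] * np.exp(t[:, 3])          # height = ph * e^th
    return np.stack([bx, by, bw, bh], axis=1)
```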
  • the number of input feature-map channels of the prediction module is B × (5 + C), where B is the number of bounding boxes each cell can predict (2 here) and each bounding box has 5 + C attributes, describing its coordinates and size, its confidence, and the probabilities of the C classes. If the center of an object falls in the receptive field of a cell (the receptive field being the region of the input image visible to that cell), that cell is responsible for predicting the object.
  • the loss function of the prediction module consists of a coordinate loss, a confidence loss, and a classification loss:

    L = λ_center Σ_i Σ_j 1[obj]_ij [(x_i − x̂_i)² + (y_i − ŷ_i)²]
      + λ_coord Σ_i Σ_j 1[obj]_ij [(w_i − ŵ_i)² + (h_i − ĥ_i)²]
      − Σ_i Σ_j 1[obj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
      − λ_noobj Σ_i Σ_j 1[noobj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
      − Σ_i 1[obj]_i Σ_{c ∈ classes} [p̂_i(c) log p_i(c) + (1 − p̂_i(c)) log(1 − p_i(c))]

  • the sums run over the S² grid cells i at each scale and the B bounding boxes j predicted per cell (2 here), and 1[obj]_ij indicates whether the j-th box of the i-th cell is responsible for detecting the object.
  • the first two terms are the coordinate loss and use the mean squared error.
  • λ_center and λ_coord control the weights of the center regression and the width-height regression and are generally set to 1 and 2.
  • the third and fourth terms are the confidence loss and use the cross-entropy; since boxes not responsible for any object dominate, setting λ_noobj = 2 accelerates convergence of the confidence. The last term is the class loss, a cross-entropy over the class probabilities of every cell responsible for detection.
  • each arithmetic question is cropped out according to the box coordinates predicted by the detection module, and the annotated text serves as the label for training the recognition module.
  • the CRNN model is used to extract the semantic content of each arithmetic question.
  • the CRNN model is commonly used for end-to-end recognition of variable-length text sequences: it does not segment individual characters first, but casts text recognition as a temporally dependent sequence-learning problem.
  • the workflow is: given a single-channel grayscale input image, the convolutional neural network CNN first extracts features to obtain a feature map, which is converted into a sequence and fed into a bidirectional long short-term memory network LSTM to obtain sequence features; finally, connectionist temporal classification CTC transcription yields the final label sequence.
  • the last two pooling layers of the CNN are changed from 2x2 to 1x2: since most text produced by the detection module is short in height but long in width, a 1x2 pooling window helps avoid losing information along the width. A sketch of such a CRNN follows below.
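A minimal PyTorch sketch of such a CRNN; the channel counts and the fixed input height of 32 are illustrative assumptions, while the height-only pooling in the last two stages mirrors the 1x2 pooling window described above.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """CNN feature extractor -> bidirectional LSTM -> per-time-step logits
    for CTC. Input: (N, 1, 32, W) grayscale crops of arithmetic questions."""
    def __init__(self, num_classes):                 # num_classes includes blank
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),                      # H: 32 -> 16
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),                      # H: 16 -> 8
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),            # halve height only, keep width
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),            # H: 4 -> 2, width preserved
        )
        self.rnn = nn.LSTM(256 * 2, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        f = self.cnn(x)                              # (N, 256, 2, W')
        f = f.permute(0, 3, 1, 2).flatten(2)         # (N, W', 512): width as time
        seq, _ = self.rnn(f)
        return self.fc(seq)                          # (N, T, num_classes)
```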
  • for the input probability distribution x = (x_1, x_2, ..., x_T) output by the recurrent network, where T is the sequence length, the probability of mapping to the label text l is

    p(l|x) = Σ_{π ∈ B⁻¹(l)} p(π|x) = Σ_{π ∈ B⁻¹(l)} Π_{t=1..T} y^t_{π_t}

    where B⁻¹(l) is the set of all paths that the sequence-to-sequence mapping function B (which merges repeated characters and removes the blank symbol) collapses to l, π is one such path, and the probability of each path is the product of the corresponding character probabilities at each time step.
  • this probability is maximized by training the network, and the loss function is defined as the negative log-likelihood of the probability.
  • in the testing phase, only the character with the highest probability at each time step is taken, and applying the blank mechanism above yields the final prediction; a greedy-decoding sketch follows below.
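A greedy CTC decoding sketch in Python; the character set shown is an assumption about the recognizer's vocabulary.

```python
def ctc_greedy_decode(logits, charset, blank=0):
    """Take the argmax character at each time step, collapse consecutive
    repeats, then drop blanks. logits: (T, num_classes) NumPy array;
    charset maps non-blank indices to characters."""
    best = logits.argmax(axis=1)          # best class index per time step
    out, prev = [], blank
    for idx in best:
        if idx != prev and idx != blank:  # skip repeats and blanks
            out.append(charset[idx - 1])  # shift by 1 because index 0 is blank
        prev = idx
    return "".join(out)

# e.g. charset = "0123456789+-x/="; a (T, 16) logit matrix decodes to "3+5=8"
```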
  • arithmetic logic operations are then used to determine whether each arithmetic question is answered correctly, and the correct answer is returned for questions answered wrongly; a sketch of this final step follows below.
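A sketch of the grading step, assuming the recognizer emits strings like '3+5=8' with 'x' for multiplication and integer arithmetic only; the helper name is ours.

```python
def grade(expr):
    """Return (is_correct, correct_answer) for a recognized question string."""
    lhs, _, rhs = expr.partition("=")
    try:
        value = eval(lhs.replace("x", "*").replace("/", "//"))  # charset is trusted
    except (SyntaxError, ZeroDivisionError, NameError):
        return False, None                 # unparseable recognition counts as wrong
    correct = rhs.strip().isdigit() and int(rhs) == value
    return correct, value                  # wrong answers get the correction back

# grade("3+5=9") -> (False, 8): judged wrong, 8 returned as the correct answer
```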
  • the arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN proposed by the present invention can automatically recognize the meaning of every arithmetic question on a test paper and judge it within a very short time, reducing the labor and time cost of traditional manual marking and improving teaching efficiency.
  • Figure 1 is the flow chart of the arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN.
  • Figure 2 is the network structure diagram of MixNet-YOLOv3.
  • Figure 3 is a network structure diagram of the convolutional recurrent neural network CRNN.
  • the present invention discloses an arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN.
  • the system is mainly composed of a detection module and a recognition module.
  • the detection module consists of three parts: image preprocessing, the MixNet feature network, and the YOLOv3 head prediction network.
  • the preprocessed image passes through the MixNet-YOLOv3 network, which fuses multi-scale semantic and localization features, to obtain the bounding box and class information of each arithmetic question.
  • the recognition module consists of three parts: the CRNN feature network, CTC decoding, and arithmetic logic discrimination.
  • after the CRNN network extracts text features, the connectionist-temporal-classification-based CTC decoding mechanism recovers the true semantic content of each question, and arithmetic logic finally judges whether each question is correct.
  • Step A: preprocess the original input image dataset for detection.
  • the bounding boxes of the images are generated with the LabelImg annotation tool, and the original dataset is expanded with data augmentations such as translation, rotation, and cropping to improve generalization and robustness.
  • from the box annotations in the dataset, a total of 4 prior boxes across 2 scales are obtained with the K-means clustering algorithm and used to train the detection network.
  • Step B: the entire input image is fed into the MixNet network model for localization and semantic feature extraction, yielding multi-scale features that represent the global information of the image.
  • the MixNet network replaces the depthwise separable convolution structure with a mixed depthwise convolution structure of different kernel sizes, which greatly reduces the number of network parameters while strengthening the network's ability to fuse multi-scale semantic and localization features, so that more systematic and comprehensive features are extracted.
  • the MixNet network is built from mixed depthwise separable convolution modules (MDConv). Traditional depthwise separable convolution groups the input channels and applies the same kernel size to every group, whereas mixed depthwise separable convolution applies kernels of different sizes to the groups; by fusing different kernel sizes into a single convolution operation, it can capture patterns at multiple resolutions.
  • Step C: use feature pyramid network FPN feature fusion to fuse features of different scales and feed them into the corresponding YOLOv3 prediction modules.
  • the two scale features produced by the MixNet feature network are fused by concatenation and sent to the detection part of YOLOv3, where a series of convolutions yields the predicted box positions and class information; the output feature maps are then evaluated against the two groups of previously clustered prior boxes, prediction is based on the best prior box, the loss function is computed from the predicted coordinates, confidence, and class information together with the labels, and iterative training produces a more accurate detection model.
  • the feature pyramid network FPN fuses features of different scales, combines localization and semantic information by upsampling and channel concatenation, and outputs 8x and 16x downsampled feature maps that are sent to the prediction module of YOLOv3; a fusion sketch follows below.
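An illustrative PyTorch sketch of this two-scale fusion; the channel counts are assumptions, since the patent does not list them here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleFPN(nn.Module):
    """Upsample the deeper (16x-downsampled) map and concatenate it with
    the shallower (8x-downsampled) map along the channel axis."""
    def __init__(self, c8=40, c16=112, out=96):
        super().__init__()
        self.lateral = nn.Conv2d(c16, c8, 1)            # align channels before fusion
        self.smooth8 = nn.Conv2d(2 * c8, out, 3, padding=1)
        self.head16 = nn.Conv2d(c16, out, 3, padding=1)

    def forward(self, f8, f16):
        up = F.interpolate(self.lateral(f16), scale_factor=2, mode="nearest")
        fused8 = self.smooth8(torch.cat([f8, up], dim=1))   # 8x branch, fused
        return fused8, self.head16(f16)                     # both go to YOLOv3 heads
```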
  • the prediction module converts the relative box parameters (t_x, t_y, t_w, t_h), together with the prior box dimensions (p_w, p_h), into the absolute position (b_x, b_y, b_w, b_h) using the formulas b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w e^{t_w}, and b_h = p_h e^{t_h}, which makes it convenient to compare the intersection-over-union of the predicted box with the prior boxes and to predict from the best prior box.
  • the number of input feature-map channels of the prediction module is B × (5 + C), where B is the number of bounding boxes each cell can predict (2 here) and each bounding box has 5 + C attributes, describing its coordinates and size, its confidence, and the probabilities of the C classes. If the center of an object falls in the receptive field of a cell (the receptive field being the region of the input image visible to that cell), that cell is responsible for predicting the object.
  • the loss function of the prediction module consists of a coordinate loss, a confidence loss, and a classification loss:

    L = λ_center Σ_i Σ_j 1[obj]_ij [(x_i − x̂_i)² + (y_i − ŷ_i)²]
      + λ_coord Σ_i Σ_j 1[obj]_ij [(w_i − ŵ_i)² + (h_i − ĥ_i)²]
      − Σ_i Σ_j 1[obj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
      − λ_noobj Σ_i Σ_j 1[noobj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
      − Σ_i 1[obj]_i Σ_{c ∈ classes} [p̂_i(c) log p_i(c) + (1 − p̂_i(c)) log(1 − p_i(c))]

  • the sums run over the S² grid cells i at each scale and the B boxes j predicted per cell (2 here), and 1[obj]_ij indicates whether the j-th box of the i-th grid cell is responsible for detecting the object.
  • the first two terms are the coordinate loss and use the mean squared error.
  • λ_center and λ_coord control the weights of the center regression and the width-height regression and are generally set to 1 and 2.
  • the third and fourth terms are the confidence loss and use the cross-entropy; since boxes not responsible for any object dominate, setting λ_noobj = 2 accelerates convergence of the confidence.
  • the last term is the class loss, a cross-entropy over the class probabilities of every cell responsible for detection.
  • Step D: using the arithmetic question boxes output by the detection network, the cropped questions and the annotated text are combined to form the dataset of the recognition module.
  • the cropped arithmetic question images are converted to grayscale and fed into the convolutional recurrent neural network CRNN: features are first extracted by the CNN (3x3 convolutions and pooling) to obtain a feature map, which is converted into a sequence and fed into the bidirectional long short-term memory network BLSTM to obtain sequence features; the final semantic content is then obtained through connectionist temporal classification CTC transcription.
  • the last two pooling layers of the CNN are changed from 2x2 to 1x2: since most text produced by the detection module is short in height but long in width, a 1x2 pooling window helps avoid losing information along the width.
  • for the input probability distribution x = (x_1, x_2, ..., x_T), the probability of mapping to the label text l is p(l|x) = Σ_{π ∈ B⁻¹(l)} Π_{t=1..T} y^t_{π_t}, where B⁻¹(l) is the set of all paths that the sequence-to-sequence mapping function B collapses to l, π is one such path, and the probability of each path is the product of the corresponding character probabilities at each time step.
  • this probability is maximized through training, and the loss function is defined as the negative log-likelihood of the probability.
  • in the testing phase, only the character with the highest probability at each time step is taken, and applying the blank mechanism above yields the final prediction; a sketch of CTC training with the standard PyTorch loss follows below.
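A sketch of the training side using PyTorch's built-in CTC loss; the tensor shapes follow the CRNN sketch above, class 0 is reserved for blank by assumption, and `targets` is the usual concatenated 1-D tensor of label indices.

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def ctc_step(model, images, targets, target_lengths):
    logits = model(images)                              # (N, T, C) from the CRNN
    log_probs = logits.log_softmax(2).permute(1, 0, 2)  # CTCLoss expects (T, N, C)
    T, N = log_probs.shape[0], log_probs.shape[1]
    input_lengths = torch.full((N,), T, dtype=torch.long)
    return ctc(log_probs, targets, input_lengths, target_lengths)
```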
  • Step F: arithmetic logic operations are performed on the semantic content produced by the recognition module to judge whether each question is correct; the correct answer is given for questions answered wrongly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

An intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN, comprising two modules: a detection module and a recognition module. Given the dense distribution and varied fonts of arithmetic questions and the demand for a lightweight network, the detection module uses a MixNet-YOLOv3 network that fuses multi-scale semantic and localization features to extract the bounding-box information of each question; the recognition module passes the arithmetic question images extracted by the previous module through a convolutional recurrent neural network CRNN based on the connectionist temporal classification CTC decoding mechanism to obtain the semantic content of each question; finally, arithmetic logic operations judge whether each question is answered correctly.

Description

[Corrected under Rule 26, 01.09.2021] Intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN
Technical Field
The invention relates to an arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN, belonging to the field of text detection and recognition at the intersection of computer vision and natural language processing.
Background Art
With advances in technology and changes in the education industry, the low efficiency and the unguaranteed quality and fairness of traditional test marking have become increasingly prominent, hindering to some extent the improvement of overall teaching standards. At the same time, the rapid development of digital image processing and computer vision has turned attention toward intelligent marking, which can greatly improve marking efficiency, lighten teachers' workload, save parents' time, and optimize the allocation of educational resources.
Taking the arithmetic exercises of primary and middle school students as an example, such exercises generally combine printed and handwritten text; publishers' print styles and students' handwriting styles vary endlessly, and a certain amount of erasure and correction is present, so traditional optical character recognition OCR (Optical Character Recognition) technology based on image processing and machine learning cannot meet the detection task in such complex scenes.
A typical OCR pipeline comprises three parts: image preprocessing, text detection, and text recognition; the technical bottlenecks affecting recognition accuracy are text detection and text recognition, which are also the core of OCR technology. In traditional OCR, image preprocessing usually corrects imaging problems; common preprocessing steps include geometric transformation, distortion correction, deblurring, image enhancement, and illumination correction. Text detection locates the position and extent of text and its layout, usually including layout analysis and text-line detection; the main questions it answers are where the text is and how large its extent is. Text recognition builds on text detection to recognize the content, converting the text in an image into machine-readable text; the main question it answers is what each character is.
Summary of the Invention
Purpose of the invention: to solve the above problems, the present invention proposes an end-to-end arithmetic question grading system. The system covers two branches, detection and recognition. First, the YOLOv3 algorithm detects the boundary of each question, with weight assignment making horizontal boundaries easier for the network to learn, and the feature-extraction network is replaced with the lighter MixNet network without loss of accuracy. The recognition part uses the convolutional recurrent neural network CRNN, which balances accuracy and efficiency: the combination of a convolutional neural network CNN (Convolutional Neural Networks) and a long short-term memory network LSTM (Long Short-Term Memory) lets the network accurately learn the semantic content of each question; finally, arithmetic logic judges right or wrong and returns the correct answer.
Technical solution: the technical solution of the present invention comprises the following steps:
(1) Preprocess the original input image dataset for detection, expand the generalization of the samples through data augmentation, and use the K-means clustering algorithm to generate prior boxes suited to this scene for training the detection network;
(2) Feed the entire input image into the lightweight MixNet network model to extract localization and semantic features, obtaining multi-scale features that represent the global information of the image;
(3) Use feature pyramid network FPN (Feature Pyramid Networks) feature fusion to fuse features of different scales and feed them into the corresponding YOLOv3 prediction modules; evaluate the resulting feature maps against the two groups of prior boxes, predict from the best prior box, compute the loss function from the predicted coordinates, confidence, and class information together with the labels, and obtain a more accurate detection model through iterative training;
(4) Using the arithmetic question boxes output by the detection network, combine the cropped questions and the annotated text to form the dataset of the recognition module;
(5) Convert the cropped arithmetic question images to grayscale and feed them into the convolutional recurrent neural network CRNN: features are first extracted by the convolutional neural network CNN to obtain a feature map, which is converted into a sequence and fed into the bidirectional long short-term memory network LSTM to obtain sequence features; the final semantic content is then obtained through connectionist temporal classification CTC transcription;
(6) Perform arithmetic logic operations on the semantic content produced by the recognition module to judge whether each question is correct; the correct answer is given for questions answered wrongly.
Preferably, step (1) first generates the bounding boxes of the images with the LabelImg annotation tool, and then expands the original dataset with data augmentations such as translation, rotation, and cropping to improve generalization and robustness. From the box annotations in the dataset, a total of 4 prior boxes across 2 scales are obtained with the K-means clustering algorithm and used to train the detection network.
Preferably, step (2) uses the MixNet network to extract multi-scale image features. Replacing the depthwise separable convolution structure with a mixed depthwise convolution structure of different kernel sizes greatly reduces the number of network parameters while strengthening the network's ability to fuse multi-scale semantic and localization features, so that more systematic and comprehensive features are extracted. The MixNet network is built from mixed depthwise separable convolution modules (MDConv), which fuse different kernel sizes into a single convolution operation so that patterns at multiple resolutions can easily be captured. The MDConv operation involves several design choices:
(2.1) Group size g: determines the number of different kernel types applied to a single input tensor. In the extreme case g = 1, MDConv is equivalent to ordinary depthwise convolution. For MobileNets, g = 4 allows MDConv to improve both model accuracy and efficiency.
(2.2) Kernel size of each group: in theory, each group can use a kernel of any size, but if two groups use the same kernel size this is equivalent to merging them into a single group, so each group must use a different size. Further, because small kernels usually have fewer parameters and FLOPS, kernel sizes are restricted to start at 3x3 and increase monotonically by 2 per group; in other words, group i uses a (2i+1)x(2i+1) kernel. For example, a 4-group MDConv typically uses kernel sizes {3x3, 5x5, 7x7, 9x9}. Under this constraint, the kernel size of each group is predefined by the group size g, which simplifies the design process.
(2.3) Channel size of each group: two channel divisions are mainly considered: (a) equal division, where every group has the same number of channels; and (b) exponential division, where the i-th group receives a fraction 2^(-i) of the total channels. For example, for a 4-group MDConv with 32 channels in total, equal division splits the channels as (8, 8, 8, 8), while exponential division splits them as (16, 8, 4, 4).
Preferably, step (3) fuses features of different scales through the feature pyramid network FPN (Feature Pyramid Networks), combining localization and semantic information by upsampling and channel concatenation, and outputs 8x and 16x downsampled feature maps that are sent to the prediction module of YOLOv3. The prediction module converts the relative box parameters (t_x, t_y, t_w, t_h), together with the prior box dimensions (p_w, p_h), into the absolute position (b_x, b_y, b_w, b_h) using the following formulas, which makes it convenient to compare the intersection-over-union of the predicted box with the prior boxes and to predict from the best prior box.
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w e^{t_w}
b_h = p_h e^{t_h}
The number of input feature-map channels of the prediction module is B × (5 + C), where B is the number of bounding boxes each cell can predict (2 here) and each bounding box has 5 + C attributes, describing its coordinates and size, its confidence, and the probabilities of the C classes. If the center of an object falls in the receptive field of a cell (the receptive field being the region of the input image visible to that cell), that cell is responsible for predicting the object. The loss function of the prediction module consists of a coordinate loss, a confidence loss, and a classification loss:

L = λ_center Σ_i Σ_j 1[obj]_ij [(x_i − x̂_i)² + (y_i − ŷ_i)²]
  + λ_coord Σ_i Σ_j 1[obj]_ij [(w_i − ŵ_i)² + (h_i − ĥ_i)²]
  − Σ_i Σ_j 1[obj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
  − λ_noobj Σ_i Σ_j 1[noobj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
  − Σ_i 1[obj]_i Σ_{c ∈ classes} [p̂_i(c) log p_i(c) + (1 − p̂_i(c)) log(1 − p_i(c))]

where the sums run over the S² grid cells i at each scale and the B boxes j predicted per cell (2), and 1[obj]_ij indicates whether the j-th box of the i-th grid cell is responsible for detecting the object. The first two terms are the coordinate loss and use the mean squared error; λ_center and λ_coord control the weights of the center regression and the width-height regression and are generally set to 1 and 2. The third and fourth terms are the confidence loss and use the cross-entropy; since boxes not responsible for any object dominate, setting λ_noobj = 2 accelerates convergence of the confidence. The last term is the class loss, a cross-entropy over the class probabilities of every cell responsible for detection.
Preferably, step (4) crops out each arithmetic question according to the box coordinates predicted by the detection module; the annotated text serves as the label for training the recognition module.
Preferably, step (5) uses the CRNN model to extract the semantic content of the arithmetic questions. The CRNN model is commonly used for end-to-end recognition of variable-length text sequences: it does not segment individual characters first, but casts text recognition as a temporally dependent sequence-learning problem. Its workflow is: given a single-channel grayscale input image, the convolutional neural network CNN first extracts features to obtain a feature map, which is converted into a sequence and fed into the bidirectional long short-term memory network LSTM to obtain sequence features; finally, connectionist temporal classification CTC transcription yields the final label sequence. The last two pooling layers of the CNN are changed from 2x2 to 1x2: since most text produced by the detection module is short in height but long in width, a 1x2 pooling window helps avoid losing information along the width.
Many redundancies appear when the sequence output by the recurrent neural network RNN (Recurrent Neural Network) is translated into the final result, for example a letter recognized twice in a row. The blank mechanism resolves such redundancy: a '-' (representing blank) is inserted between repeated characters, and identical consecutive characters are merged (except those separated by a blank), which solves the repeated-character problem.
For a given input probability distribution matrix x = (x_1, x_2, ..., x_T) from the RNN, where T is the sequence length, the probability of finally mapping to the label text l is:
p(l|x) = Σ_{π ∈ B⁻¹(l)} p(π|x) = Σ_{π ∈ B⁻¹(l)} Π_{t=1..T} y^t_{π_t}
其中B -1(l)表示从序列到序列的映射函数B变换后是l的所有路径集合,而π则是其中的一条路径,每条路径的概率为各个时间步中对应字符分布概率的乘积。通过训练网络使这个概率值最大化,而损失函数定义为概率的负最大似然函数,而在测试阶段,只需将每个时间步概率最大的字符进行拼接,再根据上述的blank空白机制即可得到最终的预测结果。
作为优选,所述的步骤(6)根据识别模块预测的语义信息,通过算术逻辑运算判别每道算术题的正确与否,对于做错的题目将给出正确答案。
有益效果:本发明所提出的基于MixNet-YOLOv3和卷积递归神经网络CRNN的算术题批阅系统,能够在极短的时间内自动识别试卷中每道算术题的含义并做出判断,减轻了传统手工批阅试卷带来的人力和时间成本,提高了教学效率。
Brief Description of the Drawings
Figure 1 is the flow chart of the arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN.
Figure 2 is the network structure diagram of MixNet-YOLOv3.
Figure 3 is the network structure diagram of the convolutional recurrent neural network CRNN.
Detailed Description of the Embodiments
The technical solution of the present invention is described in detail below with reference to the drawings:
As shown in Figure 1, the present invention discloses an arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN. The system consists of two main modules, a detection module and a recognition module. The detection module comprises three parts: image preprocessing, the MixNet feature network, and the YOLOv3 head prediction network. A preprocessed image passes through the MixNet-YOLOv3 network, which fuses multi-scale semantic and localization features, to obtain the bounding box and class information of each arithmetic question; the cropped and annotated questions are then sent to the recognition module. The recognition module comprises three parts: the CRNN feature network, CTC decoding, and arithmetic logic discrimination. After the CRNN network extracts text features from a question, the connectionist-temporal-classification-based CTC decoding mechanism recovers the true semantic content of the question, and arithmetic logic finally judges whether each question is correct. The invention is further described below with reference to a specific implementation, which mainly comprises the following steps:
Step A: preprocess the original input image dataset for detection. The bounding boxes of the images are first generated with the LabelImg annotation tool, and the original dataset is then expanded with data augmentations such as translation, rotation, and cropping to improve generalization and robustness; a sketch of such augmentations follows below. From the box annotations in the dataset, a total of 4 prior boxes across 2 scales are obtained with the K-means clustering algorithm and used to train the detection network.
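An OpenCV sketch of the augmentations named here; the parameter ranges are illustrative, and in practice the LabelImg boxes must be transformed with the same parameters.

```python
import random
import cv2
import numpy as np

def augment(img):
    """Apply a random translation, small rotation, and border crop."""
    h, w = img.shape[:2]
    # random translation of up to 5% of each dimension
    tx = random.uniform(-0.05, 0.05) * w
    ty = random.uniform(-0.05, 0.05) * h
    m = np.float32([[1, 0, tx], [0, 1, ty]])
    img = cv2.warpAffine(img, m, (w, h), borderValue=(255, 255, 255))
    # small random rotation around the image center
    r = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-3, 3), 1.0)
    img = cv2.warpAffine(img, r, (w, h), borderValue=(255, 255, 255))
    # random crop of up to 2% from each border
    dx = int(0.02 * w * random.random())
    dy = int(0.02 * h * random.random())
    return img[dy:h - dy, dx:w - dx]
```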
Step B: the entire input image is fed into the MixNet network model for localization and semantic feature extraction, yielding multi-scale features that represent the global information of the image. The MixNet network replaces the depthwise separable convolution structure with a mixed depthwise convolution structure of different kernel sizes, which greatly reduces the number of network parameters while strengthening the network's ability to fuse multi-scale semantic and localization features, so that more systematic and comprehensive features are extracted.
As shown in Figure 2, the MixNet network is built from mixed depthwise separable convolution modules (MDConv). Traditional depthwise separable convolution groups the input channels and applies the same kernel size to every group, whereas mixed depthwise separable convolution applies kernels of different sizes to the groups; by fusing different kernel sizes into a single convolution operation, it can capture patterns at multiple resolutions.
Step C: use feature pyramid network FPN feature fusion to fuse features of different scales and feed them into the corresponding YOLOv3 prediction modules. As shown in Figure 2, the two scale features produced by the MixNet feature network are fused by concatenation and sent to the detection part of YOLOv3, where a series of convolutions yields the predicted box positions and class information; the output feature maps are then evaluated against the two groups of previously clustered prior boxes, prediction is based on the best prior box, the loss function is computed from the predicted coordinates, confidence, and class information together with the labels, and iterative training produces a more accurate detection model. The feature pyramid network FPN fuses features of different scales, combines localization and semantic information by upsampling and channel concatenation, and outputs 8x and 16x downsampled feature maps that are sent to the prediction module of YOLOv3. The prediction module converts the relative box parameters (t_x, t_y, t_w, t_h), together with the prior box dimensions (p_w, p_h), into the absolute position (b_x, b_y, b_w, b_h) using the following formulas, which makes it convenient to compare the intersection-over-union of the predicted box with the prior boxes and to predict from the best prior box.
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w e^{t_w}
b_h = p_h e^{t_h}
The number of input feature-map channels of the prediction module is B × (5 + C), where B is the number of bounding boxes each cell can predict (2 here) and each bounding box has 5 + C attributes, describing its coordinates and size, its confidence, and the probabilities of the C classes. If the center of an object falls in the receptive field of a cell (the receptive field being the region of the input image visible to that cell), that cell is responsible for predicting the object. The loss function of the prediction module consists of a coordinate loss, a confidence loss, and a classification loss:

L = λ_center Σ_i Σ_j 1[obj]_ij [(x_i − x̂_i)² + (y_i − ŷ_i)²]
  + λ_coord Σ_i Σ_j 1[obj]_ij [(w_i − ŵ_i)² + (h_i − ĥ_i)²]
  − Σ_i Σ_j 1[obj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
  − λ_noobj Σ_i Σ_j 1[noobj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
  − Σ_i 1[obj]_i Σ_{c ∈ classes} [p̂_i(c) log p_i(c) + (1 − p̂_i(c)) log(1 − p_i(c))]

where the sums run over the S² grid cells i at each scale and the B boxes j predicted per cell (2), and 1[obj]_ij indicates whether the j-th box of the i-th grid cell is responsible for detecting the object. The first two terms are the coordinate loss and use the mean squared error; λ_center and λ_coord control the weights of the center regression and the width-height regression and are generally set to 1 and 2. The third and fourth terms are the confidence loss and use the cross-entropy; since boxes not responsible for any object dominate, setting λ_noobj = 2 accelerates convergence of the confidence. The last term is the class loss, a cross-entropy over the class probabilities of every cell responsible for detection.
Step D: using the arithmetic question boxes output by the detection network, the cropped questions and the annotated text are combined to form the dataset of the recognition module. As shown in Figure 3, the cropped arithmetic question images are converted to grayscale and fed into the convolutional recurrent neural network CRNN: features are first extracted by the CNN (3x3 convolutions and pooling) to obtain a feature map, which is converted into a sequence and fed into the bidirectional long short-term memory network BLSTM to obtain sequence features; the final semantic content is then obtained through connectionist temporal classification CTC transcription. The last two pooling layers of the CNN are changed from 2x2 to 1x2: since most text produced by the detection module is short in height but long in width, a 1x2 pooling window helps avoid losing information along the width.
Many redundancies appear when the sequence output by the recurrent neural network RNN is translated into the final result, for example a letter recognized twice in a row. The blank mechanism resolves such redundancy: a '-' (representing blank) is inserted between repeated characters, and identical consecutive characters are merged (except those separated by a blank), which solves the repeated-character problem. For a given input probability distribution matrix x = (x_1, x_2, ..., x_T) from the RNN, where T is the sequence length, the probability of finally mapping to the label text l is:
p(l|x) = Σ_{π ∈ B⁻¹(l)} p(π|x) = Σ_{π ∈ B⁻¹(l)} Π_{t=1..T} y^t_{π_t}
其中B -1(l)表示从序列到序列的映射函数B变换后是l的所有路径集合,而π则是其中的一条路径,每条路径的概率为各个时间步中对应字符分布概率的乘积。通过训练使这个概率值最大化,而损失函数定义为概率的负最大似然函数,而在测试阶段,只需将每个时间步概率最大的字符进行拼接,再根据上述的blank机制即可得到最终的预测结果。
步骤F,根据识别模块得到的语义信息,进行算术逻辑的运算,从而判断出每道题目的正确与否,对于做错的题目会给出正确答案。
以上实施例仅为说明本发明的技术思想,不能以此限定本发明的保护范围,凡是按照本发明提出的技术思想,在技术方案基础上所做的任何改动,均落入本发明保护范围之内。

Claims (7)

  1. An intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN, characterized by
    comprising the following steps:
    (1) preprocessing the original input image dataset for detection, enhancing the generalization of the samples through data augmentation, and using the K-means clustering algorithm to generate two groups of prior boxes suited to the scene for training the detection network;
    (2) feeding the entire input image into the lightweight MixNet network model to extract localization and semantic features, obtaining multi-scale features that represent the global information of the image;
    (3) using feature pyramid network FPN feature fusion to fuse features of different scales, feeding them into the corresponding YOLOv3 prediction modules, evaluating the resulting feature maps against the two groups of prior boxes, predicting from the best prior box, computing the loss function from the predicted coordinates, confidence, and class information together with the labels, and obtaining a more accurate detection model through iterative training;
    (4) using the arithmetic question boxes output by the detection network, combining the cropped questions and the annotated text to form the dataset of the recognition module;
    (5) converting the cropped arithmetic question images to grayscale and feeding them into the convolutional recurrent neural network CRNN, where features are first extracted by the convolutional neural network CNN to obtain a feature map, which is converted into a sequence and fed into the bidirectional long short-term memory network LSTM to obtain sequence features, and the final semantic content is obtained through connectionist temporal classification CTC transcription;
    (6) performing arithmetic logic operations on the semantic content produced by the recognition module to judge whether each arithmetic question is correct, the correct answer being given for questions answered wrongly.
  2. [Corrected under Rule 26, 14.07.2021]
    The intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN according to claim 1, characterized in that
    step (1) first generates the bounding boxes of the images with the LabelImg annotation tool and then expands the original dataset with data augmentation to improve generalization and robustness; from the box annotations in the dataset, a total of 4 prior boxes across 2 scales are obtained with the K-means clustering algorithm and used to train the detection network.
  3. [Corrected under Rule 26, 14.07.2021]
    The intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN according to claim 1, characterized in that
    step (2) uses the lightweight MixNet network to extract multi-scale image features.
  4. [Corrected under Rule 26, 14.07.2021]
    The intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN according to claim 1, characterized in that
    step (3) fuses features of different scales through the feature pyramid FPN network, combining localization and semantic information by upsampling and channel concatenation, and outputs 8x and 16x downsampled feature maps that are sent to the prediction module of YOLOv3; the prediction module converts the relative box parameters (t_x, t_y, t_w, t_h), together with the prior box dimensions (p_w, p_h), into the absolute position (b_x, b_y, b_w, b_h) using the following formulas, which makes it convenient to compare the intersection-over-union of the predicted box with the prior boxes and to predict from the best prior box:
    b_x = σ(t_x) + c_x
    b_y = σ(t_y) + c_y
    b_w = p_w e^{t_w}
    b_h = p_h e^{t_h}
    the number of input feature-map channels of the prediction module is B × (5 + C), where B is the number of bounding boxes each cell can predict (2 here) and each bounding box has 5 + C attributes, describing its coordinates and size, its confidence, and the probabilities of the C classes; if the center of an object falls in the receptive field of a cell, that cell is responsible for predicting the object, the receptive field being the region of the input image visible to that cell; the loss function of the prediction module consists of a coordinate loss, a confidence loss, and a classification loss:
    L = λ_center Σ_i Σ_j 1[obj]_ij [(x_i − x̂_i)² + (y_i − ŷ_i)²]
      + λ_coord Σ_i Σ_j 1[obj]_ij [(w_i − ŵ_i)² + (h_i − ĥ_i)²]
      − Σ_i Σ_j 1[obj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
      − λ_noobj Σ_i Σ_j 1[noobj]_ij [Ĉ_i log C_i + (1 − Ĉ_i) log(1 − C_i)]
      − Σ_i 1[obj]_i Σ_{c ∈ classes} [p̂_i(c) log p_i(c) + (1 − p̂_i(c)) log(1 − p_i(c))]
    where the sums run over the S² grid cells i at each scale and the B boxes j predicted per cell (2), and 1[obj]_ij indicates whether the j-th box of the i-th grid cell is responsible for detecting the object; the first two terms are the coordinate loss and use the mean squared error, with λ_center and λ_coord controlling the weights of the center regression and the width-height regression, generally set to 1 and 2; the third and fourth terms are the confidence loss and use the cross-entropy, and since boxes not responsible for any object dominate, setting λ_noobj = 2 accelerates convergence of the confidence; the last term is the class loss, a cross-entropy over the class probabilities of every cell responsible for detection.
  5. [Corrected under Rule 26, 14.07.2021]
    The intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN according to claim 1, characterized in that
    step (4) crops out each arithmetic question according to the box coordinates predicted by the detection module, with the annotated text serving as the label for training the recognition module.
  6. [Corrected under Rule 26, 14.07.2021]
    The intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN according to claim 1, characterized in that
    step (5) uses the convolutional recurrent neural network CRNN model to extract the semantic content of the arithmetic questions.
  7. [Corrected under Rule 26, 14.07.2021]
    The intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN according to any one of claims 1 to 6, characterized in that
    step (6) uses arithmetic logic operations on the semantic content predicted by the recognition module to judge whether each arithmetic question is correct, the correct answer being given for questions answered wrongly.
PCT/CN2021/099935 2021-01-09 2021-06-15 Arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN WO2022147965A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110026927.2 2021-01-09
CN202110026927.2A CN112528963A (zh) 2021-01-09 Intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN

Publications (1)

Publication Number Publication Date
WO2022147965A1 true WO2022147965A1 (zh) 2022-07-14

Family

ID=74977418

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099935 WO2022147965A1 (zh) 2021-01-09 2021-06-15 Arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN

Country Status (3)

Country Link
CN (1) CN112528963A (zh)
LU (1) LU502472B1 (zh)
WO (1) WO2022147965A1 (zh)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528963A (zh) * 2021-01-09 2021-03-19 江苏拓邮信息智能技术研究院有限公司 基于MixNet-YOLOv3和卷积递归神经网络CRNN的算术题智能批阅系统
CN113435441A (zh) * 2021-07-22 2021-09-24 广州华腾教育科技股份有限公司 基于Bi-LSTM机制的四则运算算式图像智能批改方法
CN113344145B (zh) * 2021-08-02 2021-11-19 智道网联科技(北京)有限公司 字符识别方法、装置、电子设备和存储介质
CN113469147B (zh) * 2021-09-02 2021-12-17 北京世纪好未来教育科技有限公司 答题卡识别方法、装置、电子设备以及存储介质
CN113901879A (zh) * 2021-09-13 2022-01-07 昆明理工大学 融合多尺度语义特征图的缅甸语图像文本识别方法及装置
CN113837157B (zh) * 2021-11-26 2022-02-15 北京世纪好未来教育科技有限公司 题目类型识别方法、系统和存储介质
CN114694133B (zh) * 2022-05-30 2022-09-16 南京华苏科技有限公司 一种基于图像处理与深度学习相结合的文本识别方法
CN115147642A (zh) * 2022-06-02 2022-10-04 盛视科技股份有限公司 基于视觉的渣土车检测方法、装置、计算机及存储介质
CN116128458B (zh) * 2023-04-12 2024-02-20 华中科技大学同济医学院附属同济医院 用于医院经费卡报账的智能自动审核系统

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858414A (zh) * 2019-01-21 2019-06-07 南京邮电大学 Invoice block detection method
CN110147807A (zh) * 2019-01-04 2019-08-20 上海海事大学 Intelligent ship recognition and tracking method
CN110399845A (zh) * 2019-07-29 2019-11-01 上海海事大学 Method for detecting and recognizing continuous paragraph text in images
CN110969052A (zh) * 2018-09-29 2020-04-07 杭州萤石软件有限公司 Homework correction method and device
CN111046886A (zh) * 2019-12-12 2020-04-21 吉林大学 Automatic number plate recognition method, apparatus, device, and computer-readable storage medium
CN111310773A (zh) * 2020-03-27 2020-06-19 西安电子科技大学 Efficient license plate localization method using a convolutional neural network
CN111310861A (zh) * 2020-03-27 2020-06-19 西安电子科技大学 License plate recognition and localization method based on a deep neural network
CN111368828A (zh) * 2020-02-27 2020-07-03 大象慧云信息技术有限公司 Multi-invoice recognition method and apparatus
CN111401371A (zh) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and recognition method, system, and computer device
CN111553201A (zh) * 2020-04-08 2020-08-18 东南大学 Traffic light detection method based on an optimized YOLOv3 algorithm
CN111898699A (zh) * 2020-08-11 2020-11-06 海之韵(苏州)科技有限公司 Automatic hull target detection and recognition method
CN112101433A (zh) * 2020-09-04 2020-12-18 东南大学 Lane-by-lane automatic vehicle counting method based on YOLO V4 and DeepSORT
CN112528963A (zh) * 2021-01-09 2021-03-19 江苏拓邮信息智能技术研究院有限公司 Intelligent arithmetic question grading system based on MixNet-YOLOv3 and the convolutional recurrent neural network CRNN


Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170883B (zh) * 2022-07-19 2023-03-14 哈尔滨市科佳通用机电股份有限公司 Method for detecting the loss of the cotter pin of a brake cylinder piston push rod
CN115170883A (zh) * 2022-07-19 2022-10-11 哈尔滨市科佳通用机电股份有限公司 Method for detecting the loss of the cotter pin of a brake cylinder piston push rod
CN115578719A (zh) * 2022-10-13 2023-01-06 中国矿业大学 Fatigue state detection method based on ym_ssh lightweight object detection
CN115578719B (zh) * 2022-10-13 2024-05-17 中国矿业大学 Fatigue state detection method based on ym_ssh lightweight object detection
CN115830302A (zh) * 2023-02-24 2023-03-21 国网江西省电力有限公司电力科学研究院 Distribution network equipment localization and recognition method with multi-scale feature extraction and fusion
CN115830302B (zh) * 2023-02-24 2023-07-04 国网江西省电力有限公司电力科学研究院 Distribution network equipment localization and recognition method with multi-scale feature extraction and fusion
CN116630755B (zh) * 2023-04-10 2024-04-02 雄安创新研究院 Method, system, and storage medium for detecting text positions in scene images
CN116630755A (zh) * 2023-04-10 2023-08-22 雄安创新研究院 Method, system, and storage medium for detecting text positions in scene images
CN116704487A (zh) * 2023-06-12 2023-09-05 三峡大学 License plate detection and recognition method based on the Yolov5s network and CRNN
CN116704487B (zh) * 2023-06-12 2024-06-11 三峡大学 License plate detection and recognition method based on the Yolov5s network and CRNN
CN116933114A (zh) * 2023-06-12 2023-10-24 浙江大学 CNN-LSTM-based DC microgrid detection method and apparatus
CN116978052A (zh) * 2023-07-21 2023-10-31 安徽省交通规划设计研究总院股份有限公司 Sub-drawing layout recognition method for bridge design drawings based on improved YOLOv5
CN116978052B (zh) * 2023-07-21 2024-04-09 安徽省交通规划设计研究总院股份有限公司 Sub-drawing layout recognition method for bridge design drawings based on improved YOLOv5
CN116626166B (zh) * 2023-07-26 2023-10-31 中兴海陆工程有限公司 Metal weld defect detection method based on improved YOLOv5
CN116626166A (zh) * 2023-07-26 2023-08-22 中兴海陆工程有限公司 Metal weld defect detection method based on improved YOLOv5
CN116958713A (zh) * 2023-09-20 2023-10-27 中航西安飞机工业集团股份有限公司 Method and system for rapid recognition and counting of fasteners on aviation component surfaces
CN116958713B (zh) * 2023-09-20 2023-12-15 中航西安飞机工业集团股份有限公司 Method and system for rapid recognition and counting of fasteners on aviation component surfaces
CN117058493A (zh) * 2023-10-13 2023-11-14 之江实验室 Security defense method, apparatus, and computer device for image recognition
CN117058493B (zh) * 2023-10-13 2024-02-13 之江实验室 Security defense method, apparatus, and computer device for image recognition
CN117523428B (zh) * 2023-11-08 2024-03-29 中国人民解放军军事科学院系统工程研究院 Ground target detection method and apparatus based on an aircraft platform
CN117523428A (zh) * 2023-11-08 2024-02-06 中国人民解放军军事科学院系统工程研究院 Ground target detection method and apparatus based on an aircraft platform
CN117313791B (zh) * 2023-11-30 2024-03-22 青岛科技大学 GCL-Peephole-based intelligent wireless sensing algorithm for the Internet of Vehicles
CN117313791A (zh) * 2023-11-30 2023-12-29 青岛科技大学 GCL-Peephole-based intelligent wireless sensing algorithm for the Internet of Vehicles
CN117523205B (zh) * 2024-01-03 2024-03-29 广州锟元方青医疗科技有限公司 Few-shot segmentation and recognition method for Ki67 multi-class cell nuclei
CN117523205A (zh) * 2024-01-03 2024-02-06 广州锟元方青医疗科技有限公司 Few-shot segmentation and recognition method for Ki67 multi-class cell nuclei
CN117809318A (zh) * 2024-03-01 2024-04-02 微山同在电子信息科技有限公司 Machine-vision-based oracle bone script recognition method and system
CN117809318B (zh) * 2024-03-01 2024-05-28 微山同在电子信息科技有限公司 Machine-vision-based oracle bone script recognition method and system
CN117830788A (zh) * 2024-03-06 2024-04-05 潍坊科技学院 Image object detection method with multi-source information fusion
CN117830788B (zh) * 2024-03-06 2024-05-10 潍坊科技学院 Image object detection method with multi-source information fusion

Also Published As

Publication number Publication date
LU502472B1 (en) 2022-11-18
CN112528963A (zh) 2021-03-19

Similar Documents

Publication Publication Date Title
WO2022147965A1 (zh) 基于MixNet-YOLOv3和卷积递归神经网络CRNN的算术题批阅系统
CN110334705B (zh) 一种结合全局和局部信息的场景文本图像的语种识别方法
CN111325203B (zh) 一种基于图像校正的美式车牌识别方法及系统
CN111401410B (zh) 一种基于改进级联神经网络的交通标志检测方法
CN112966684A (zh) 一种注意力机制下的协同学习文字识别方法
CN111061904B (zh) 一种基于图像内容识别的本地图片快速检测方法
CN110502655B (zh) 一种嵌入场景文字信息的图像自然描述语句生成方法
CN109002834A (zh) 基于多模态表征的细粒度图像分类方法
CN107169485A (zh) 一种数学公式识别方法和装置
CN112818951A (zh) 一种票证识别的方法
CN112036447A (zh) 零样本目标检测系统及可学习语义和固定语义融合方法
CN113762269B (zh) 基于神经网络的中文字符ocr识别方法、系统及介质
CN111062277A (zh) 基于单目视觉的手语-唇语转化方法
CN112069900A (zh) 基于卷积神经网络的票据文字识别方法及系统
CN106227836B (zh) 基于图像与文字的无监督联合视觉概念学习系统及方法
CN110334709A (zh) 基于端到端多任务深度学习的车牌检测方法
CN116311310A (zh) 一种结合语义分割和序列预测的通用表格识别方法和装置
CN113780059A (zh) 一种基于多特征点的连续手语识别方法
He Research on text detection and recognition based on OCR recognition technology
CN116258990A (zh) 一种基于跨模态亲和力的小样本参考视频目标分割方法
CN117437647B (zh) 基于深度学习和计算机视觉的甲骨文字检测方法
Zhang et al. All-content text recognition method for financial ticket images
CN111507348A (zh) 基于ctc深度神经网络的文字分割和识别的方法
CN110929013A (zh) 一种基于bottom-up attention和定位信息融合的图片问答实现方法
CN116912872A (zh) 图纸识别方法、装置、设备及可读存储介质

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21917012

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21917012

Country of ref document: EP

Kind code of ref document: A1