CN113033249A - Character recognition method, device, terminal and computer storage medium thereof - Google Patents


Info

Publication number
CN113033249A
CN113033249A (application CN201911253120.1A)
Authority
CN
China
Prior art keywords
attention
character
picture
inputting
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911253120.1A
Other languages
Chinese (zh)
Inventor
Bai Xiang (白翔)
Wang Bofei (王勃飞)
Xu Qingquan (徐清泉)
Xu Yongchao (许永超)
Liu Shaoli (刘少丽)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Huazhong University of Science and Technology
Original Assignee
ZTE Corp
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp, Huazhong University of Science and Technology filed Critical ZTE Corp
Priority to CN201911253120.1A priority Critical patent/CN113033249A/en
Priority to PCT/CN2020/133116 priority patent/WO2021115159A1/en
Publication of CN113033249A publication Critical patent/CN113033249A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a character recognition method, a character recognition device, a terminal and a computer storage medium. Features are extracted from an input picture by a convolutional neural network and fed into an attention mechanism module with multiple channels to obtain an attention weight for each channel; each channel of the depth feature map is rescaled to obtain multiple attention feature maps, which are then input into fully connected layers and fused to produce a character class prediction result. During model training, a loss function is designed from the character class labels of the input pictures and the character class prediction results, and the attention weights are optimized, which improves character recognition accuracy and makes recognition of difficult samples more robust.

Description

Character recognition method, device, terminal and computer storage medium thereof
Technical Field
The embodiment of the application relates to the technical field of computer vision, in particular to a character recognition method, a character recognition device, a terminal and a computer storage medium thereof.
Background
Handwritten Chinese Character Recognition (HCCR) has been a very active and challenging research direction in computer vision since the 1960s, and great progress has been made; many real-life applications are closely related to it, such as mail sorting, bank check reading, and the transcription of books and handwritten notes. Despite much research, handwritten Chinese character recognition remains very challenging: on one hand, there are a large number of Chinese character categories and many near-form characters that are easily confused; on the other hand, writing styles differ greatly between people, so even characters of the same class show obvious visual differences, which makes handwritten Chinese character recognition very difficult.
Most existing deep-learning-based methods use convolutional neural networks to classify handwritten Chinese characters by learning global semantic features from the whole image, but this is insufficient for recognizing visually similar characters, because confusable characters often differ only in subtle details. In particular, the global attention these methods provide may locate whole characters well, but the attention regions of different character classes overlap heavily and lack distinctiveness, which leads to high recognition error rates for near-form characters and for characters with large intra-class differences.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
In a first aspect, embodiments of the present application provide a method for training a character recognition network model, a method for character recognition, an apparatus, a terminal, and a computer storage medium thereof, which can improve accuracy of visually confusable character recognition.
In a second aspect, an embodiment of the present application provides a method for training a character recognition network model, including the following steps:
standardizing each picture in the original data set, and carrying out character type labeling on each picture to obtain a standard training data set with the character type labels;
inputting each picture in the standard training data set into a convolutional neural network, extracting the convolutional characteristic of the picture, and obtaining a depth characteristic map containing the convolutional characteristic;
inputting the depth feature map into an attention mechanism module with a plurality of channels to obtain an attention weight of each channel, and rescaling each channel of the depth feature map by using the attention weight to obtain a plurality of attention feature maps;
inputting each attention feature map into a full-connection layer respectively to obtain a plurality of attention feature vectors;
performing feature fusion on the attention feature vectors, and inputting the attention feature vectors into a character full-connection layer to perform character type prediction;
and designing a target loss function according to the character type prediction result and the character type label, performing iteration by using a back propagation algorithm, minimizing the target loss function, and optimizing the attention weight.
In a third aspect, an embodiment of the present application provides a character recognition method, including:
normalizing the picture to be tested and scaling it to a preset height H and a preset width W;
inputting a picture to be tested into a convolutional neural network, extracting the convolutional characteristic of the picture to be tested, and obtaining a depth characteristic graph containing the convolutional characteristic;
inputting the depth feature map into an attention mechanism module with a plurality of channels to obtain an attention weight of each channel, and rescaling each channel of the depth feature map by using the attention weight to obtain a plurality of attention feature maps;
inputting each attention feature map into a full-connection layer respectively to obtain a plurality of attention feature vectors;
and performing feature fusion on the attention feature vectors, and inputting the attention feature vectors into a character full-connection layer to perform character type prediction.
In a fourth aspect, an embodiment of the present application provides a device for training a character recognition network model, including: a memory, a processor and a computer program stored on the memory and executable on the processor; the processor implements the character recognition network model training method according to the embodiment of the second aspect when executing the computer program.
In a fifth aspect, an embodiment of the present application provides a character recognition apparatus, including: a memory, a processor and a computer program stored on the memory and executable on the processor; the processor implements the character recognition method according to the embodiment of the third aspect when executing the computer program.
In a sixth aspect, an embodiment of the present application provides a terminal, including the character recognition network model training apparatus according to the fourth aspect or including the character recognition apparatus according to the fifth aspect.
In a seventh aspect, an embodiment of the present application provides a computer storage medium storing computer-executable instructions for performing the character recognition network model training method according to the embodiment of the second aspect or the character recognition method according to the embodiment of the third aspect.
According to the scheme provided by the embodiments of the application, features of the input picture are extracted by a convolutional neural network, distinctive attention features are obtained through an attention mechanism module, and a character class prediction result is obtained after feature fusion; during model training, a loss function is designed from the character class labels of the input pictures and the character class prediction results, and the attention weights are optimized, so that the accuracy of character recognition is improved and recognition of difficult samples is more robust.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
FIG. 1 is a schematic flow chart of a method for training a character recognition network model and a method for character recognition according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for training a character recognition network model according to an embodiment of the present application;
fig. 3 is a network structure diagram of a character recognition network model provided in an embodiment of the present application, where "CA" denotes a Channel Attention mechanism (Channel Attention);
FIG. 4 is a diagram of a convolutional neural network architecture provided in an embodiment of the present application;
FIG. 5 is a block diagram of an attention module provided in an embodiment of the present disclosure;
FIG. 6 is a flowchart of a character recognition method according to another embodiment of the present application;
FIG. 7 is a block diagram of a device for training a character recognition network model according to another embodiment of the present application;
fig. 8 is a structural diagram of a character recognition apparatus according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart.
Handwritten Chinese Character Recognition (HCCR) has been a very active and challenging research direction in computer vision since the 1960s, and great progress has been made; many real-life applications are closely related to it, such as mail sorting, bank check reading, and the transcription of books and handwritten notes. Despite much research, handwritten Chinese character recognition remains very challenging: on one hand, there are a large number of Chinese character categories and many near-form characters that are easily confused; on the other hand, writing styles differ greatly between people, so even characters of the same class show obvious visual differences, which makes handwritten Chinese character recognition very difficult.
Most existing deep-learning-based methods use convolutional neural networks to classify handwritten Chinese characters by learning global semantic features from the whole image, but this is insufficient for recognizing visually similar characters, because confusable characters often differ only in subtle details. In particular, the global attention these methods provide may locate whole characters well, but the attention regions of different character classes overlap heavily and lack distinctiveness, which leads to high recognition error rates for near-form characters and for characters with large intra-class differences.
According to everyday experience, when a person identifies a specific character among several confusable Chinese characters, he or she usually determines the character category by observing detailed features of the candidate characters and then comparing their similarities and differences. For example, "鸟" (bird) and "乌" (Wu) are two visually confusable Chinese characters, but we can distinguish them by observing whether there is a dot stroke in their upper half; similarly, for "漫" (diffuse) and "慢" (disrespectful), we can judge by the radical in their left half.
Recently, a handwritten Chinese character recognition method based on a Recurrent Neural Network (RNN) and an attention mechanism was proposed; it uses a residual convolutional neural network as the backbone and corrects character predictions by iteratively updating the attention distribution with the RNN. This method can use attention to locate local character regions and thereby identify visually similar Chinese characters. However, it has two major disadvantages: first, because the iterative update of the attention distribution depends on the prediction of the previous iteration, initial errors may accumulate and the gain in recognition accuracy is limited; second, the multiple RNN iterations make training slow and the process complex, the RNN's sequential nature prevents it from fully exploiting GPU parallelism, and problems such as vanishing and exploding gradients easily occur during back propagation.
Under such a background, it is necessary to design a simple and effective text recognition method capable of mining locally distinctive features.
Based on the above, the present application provides a character recognition network model training method, a character recognition method, an apparatus, a terminal and a computer storage medium. Features of the input picture are extracted by a convolutional neural network, distinctive attention features are then obtained through an attention mechanism module, and a character class prediction result is produced after feature fusion; during model training, a loss function is designed from the character class labels of the input pictures and the character class prediction results, and the attention weights are optimized, so that the accuracy of character recognition is improved and recognition of difficult samples is more robust.
The embodiments of the present application will be further explained with reference to the drawings.
As shown in fig. 1, fig. 1 is a schematic flowchart of a character recognition network model training method and a character recognition method provided in an embodiment of the present application, where solid arrows represent training steps, and dashed arrows represent recognition steps.
The character recognition network model comprises a deep convolution neural network, a multi-channel attention mechanism module, a comparative attention feature learning branch and a multi-attention feature fusion module.
Deep convolutional neural network: a neural network useful for classification, consisting mainly of convolutional and pooling layers. The convolutional layers extract picture features; the pooling layers reduce the dimensionality of the feature maps output by the convolutional layers and reduce overfitting. The parameters of the network can be updated by the back propagation algorithm. In the embodiment of the application, the deep convolutional neural network consists of 14 convolutional layers and 4 pooling layers.
Attention mechanism module: a way of simulating human observation. Generally speaking, when people look at a picture, besides grasping the image as a whole, they pay more attention to certain local information, such as the position of a table or the type of goods. In computer vision, the essence of the attention mechanism is to select the information that deserves more attention from the input and extract features from the key parts. Introducing an attention mechanism can, on the one hand, increase the expressive capacity of the model while hardly increasing its complexity; on the other hand, because only the input information important to the model is selected and processed, it can improve the efficiency of the neural network.
Comparative attention feature learning branch: extracting global image features classifies general objects well, but the fine-grained classification of handwritten Chinese characters requires attention to locally distinctive features. The purpose of comparative attention feature learning is to let the multi-channel attention mechanism module locate several local regions of an input sample and, trained under the supervision of a contrast loss function and a region center loss function, obtain dispersed attention regions, so that the model is more likely to locate the distinctive features of characters and the recognition error rate for visually similar characters is reduced.
Referring to fig. 2 and 3, an embodiment of the present application provides a method for training a character recognition network model, including the following steps:
step S100: standardizing each picture in the original data set, and carrying out character type labeling on each picture to obtain a standard training data set with the character type labels;
step S200: inputting each picture in the standard training data set into a convolutional neural network, extracting the convolution characteristics of the pictures, and obtaining a depth characteristic map containing the convolution characteristics;
step S300: inputting the depth feature map into an attention mechanism module with a plurality of channels to obtain the attention weight of each channel, and rescaling each channel of the depth feature map by using the attention weight to obtain a plurality of attention feature maps;
step S400: inputting each attention feature map into a full-connection layer respectively to obtain a plurality of attention feature vectors;
step S500: performing feature fusion on the plurality of attention feature vectors, and inputting the feature vectors into a character full-connection layer to perform character type prediction;
step S600: and designing a target loss function according to the character type prediction result and the character type label, and performing iteration by using a back propagation algorithm to minimize the target loss function and optimize the attention weight.
In an embodiment, step S100 specifically includes: for each picture I_i (i = 1, …, N) in the original data set, compute its mean and variance and normalize it accordingly, and scale its height and width to a preset height H and a preset width W (by default both H and W are 96), where N is the number of pictures in the original data set; then label each picture I_i with its character class to obtain a standard training data set with character class labels.
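To make this step concrete, below is a minimal preprocessing sketch in Python; the use of OpenCV (cv2), the grayscale assumption, and the epsilon guard against zero variance are illustrative choices, while H = W = 96 follows the stated defaults.

```python
import cv2  # hypothetical choice of image library
import numpy as np

def normalize_picture(img: np.ndarray, H: int = 96, W: int = 96) -> np.ndarray:
    """Scale a grayscale picture to H x W and standardize it by its own
    mean and standard deviation, as described for step S100."""
    img = cv2.resize(img, (W, H)).astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-6)
```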
In an embodiment, referring to fig. 4, step S200 specifically includes: the convolutional neural network comprises 2 convolutional layers (conv1, conv2) and 4 convolution modules. The normalized pictures I_i (i = 1, …, N) are input into the 2 convolutional layers (conv1, conv2), each followed by a Batch Normalization (BN) layer and the nonlinear activation function ReLU, producing a feature map of size 96 × 96 × 64; the feature map is then downsampled by a max pooling layer with stride 2, giving a 48 × 48 × 64 feature map, and passed through the 4 convolution modules (Conv-Block). Each convolution module consists of 3 convolutional layers with 3 × 3 kernels and 3 Batch Normalization layers, one following each convolutional layer; the convolution module is a "bottleneck" structure, the middle of the 3 convolutional layers having fewer channels than the layers before and after it. Each convolution module is connected with a max pooling layer with stride 2 that halves the resolution of the feature map, and after the 4 convolution modules a depth feature map X_i of size 6 × 6 × 448 is output; these depth feature maps X_i contain high-level semantic information obtained through the 14 convolutional layers.
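As a concrete illustration of this backbone, here is a sketch in PyTorch. Only the totals come from the description (14 convolutional layers, 4 pooling layers, 64 stem channels, 96 × 96 input, 6 × 6 × 448 output); the per-block channel widths and the grayscale input are hypothetical, and the pooling layers are placed so that the four stated pools map a 96 × 96 input to 6 × 6.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin: int, cout: int) -> nn.Sequential:
    """3x3 convolution followed by Batch Normalization and ReLU."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class ConvBlock(nn.Module):
    """Bottleneck Conv-Block: 3 conv3x3 + BN, narrower middle layer."""
    def __init__(self, cin: int, cmid: int, cout: int):
        super().__init__()
        self.body = nn.Sequential(
            conv_bn_relu(cin, cout),
            conv_bn_relu(cout, cmid),  # "bottleneck" middle layer
            conv_bn_relu(cmid, cout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

class Backbone(nn.Module):
    """conv1, conv2, then 4 Conv-Blocks; widths other than 64/448 are assumed."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(conv_bn_relu(1, 64), conv_bn_relu(64, 64))
        self.blocks = nn.Sequential(
            nn.MaxPool2d(2), ConvBlock(64, 96, 128),    # 96 -> 48
            nn.MaxPool2d(2), ConvBlock(128, 160, 256),  # 48 -> 24
            nn.MaxPool2d(2), ConvBlock(256, 224, 448),  # 24 -> 12
            nn.MaxPool2d(2), ConvBlock(448, 256, 448),  # 12 -> 6
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, 1, 96, 96)
        return self.blocks(self.stem(x))                 # (N, 448, 6, 6)
```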
In an embodiment, referring to fig. 5, step S300 specifically includes: the depth feature map X_i of size 6 × 6 × 448 output by the last convolution module (Conv-Block) is taken as input and sent to the attention mechanism modules with multiple channels to compute the attention feature maps X̃_i^s (s = 1, …, S); in this embodiment, S = 2. The attention mechanism module borrows the channel attention mechanism introduced by the SENet method. First, global average pooling aggregates the input depth feature map X_i over the H × W spatial dimensions to generate a channel descriptor z^s = [z_1, …, z_C], whose c-th element z_c is computed as:

z_c = (1 / (H × W)) Σ_{h=1}^{H} Σ_{w=1}^{W} X_i(h, w, c)

where s = 1, …, S, S being the number of attention mechanism modules, and c = 1, …, C, C being the number of channels. On top of z^s, the channel descriptor is processed using a gating mechanism with Sigmoid activation to obtain the attention weights of each attention mechanism module:

a^s = σ(W_2^s · δ(W_1^s · z^s))

where σ is the Sigmoid function, δ is the ReLU function, W_1^s ∈ ℝ^{(C/r) × C}, W_2^s ∈ ℝ^{C × (C/r)}, and r is the channel compression ratio. Each attention mechanism module then rescales the channels of the depth feature map X_i using its attention weights to obtain the attention feature maps X̃_i^s:

X̃_{i,c}^s = a_c^s · X_{i,c}

where X̃_{i,c}^s, the c-th channel of the attention feature map corresponding to the normalized picture I_i, is the product of the channel X_{i,c} and the scalar a_c^s.
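A sketch of one such channel attention module in PyTorch follows; it mirrors the SENet-style squeeze-and-excitation gating described above, with the compression ratio r left as a free hyper-parameter since its value is not fixed in the text.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SENet-style channel attention: global average pooling, a two-layer
    gate with ReLU and Sigmoid, then per-channel rescaling."""
    def __init__(self, channels: int = 448, r: int = 16):  # r is assumed
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // r),  # W1: C -> C/r
            nn.ReLU(inplace=True),               # delta
            nn.Linear(channels // r, channels),  # W2: C/r -> C
            nn.Sigmoid(),                        # sigma
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        z = x.mean(dim=(2, 3))          # channel descriptor z^s
        a = self.gate(z)                # attention weights a^s
        return x * a[:, :, None, None]  # attention feature map X~_i^s
```

In the embodiment, S = 2 independent copies of this module would be applied to the same depth feature map X_i to produce the two attention feature maps.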
In an embodiment, step S400 specifically includes: inputting the plurality of attention feature maps obtained in step S300 into the comparative attention feature learning branch to extract attention features of locally distinctive regions; that is, each attention feature map X̃_i^s is input into a fully connected layer containing 768 neurons:

f_i^s = W^s · F_flatt(X̃_i^s)

where the operator F_flatt(·) flattens a matrix into a 1-dimensional vector.
In an embodiment, step S500 specifically includes: the attention feature vectors f_i^s (s = 1, …, S) are concatenated and input into a fully connected layer containing 3755 neurons for character class prediction:

Y_i = softmax(W · [f_i^1, …, f_i^S])

where [·] denotes the concatenation operation and Y_i represents the scores of picture I_i over the 3755 Chinese character classes; the class with the highest score is the character class prediction result ŷ_i.
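The branch and fusion layers of steps S400 and S500 can be sketched as follows; the module returns raw logits, to which the softmax of the formula above is applied to obtain Y_i. Dimensions follow the embodiment (768-d branch features, 3755 classes, 6 × 6 × 448 attention maps).

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """One 768-neuron fully connected layer per attention branch (f_i^s),
    then concatenation and the 3755-way character classification layer."""
    def __init__(self, S: int = 2, c: int = 448, h: int = 6, w: int = 6,
                 num_classes: int = 3755):
        super().__init__()
        self.branch_fc = nn.ModuleList(
            nn.Linear(c * h * w, 768) for _ in range(S))
        self.cls_fc = nn.Linear(768 * S, num_classes)

    def forward(self, att_maps):                     # list of S (N, C, H, W)
        feats = [fc(m.flatten(1))                    # F_flatt, then W^s
                 for fc, m in zip(self.branch_fc, att_maps)]
        return self.cls_fc(torch.cat(feats, dim=1))  # logits; softmax gives Y_i
```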
In an embodiment, step S600 specifically includes: the character class label gt is taken as the expected output of the network model and the prediction ŷ_i as its predicted output, and a target loss function between the two is designed; minimizing the cross entropy loss function L_cls during training ensures that each attention feature map X̃_i^s can locate regions that are important for character classification. For the comparative attention feature learning branch, the attention features obtained in step S300 are taken as input and supervised with metric learning loss functions, namely a contrast loss function and a region center loss function, so that the attention feature maps of the network model focus on different, distinctive regions of the input picture; in particular, the contrast loss function is applied to the attention features to capture separable attention regions.

The target loss function is defined as:

L_total = L_cls + λ (L_center + L_contra)

where L_cls is the cross entropy loss function, L_center is the region center loss function used to reduce the distance between corresponding attention features of characters of the same class, L_contra is the contrast loss function used to push apart the attention feature vectors f_i^s of a picture I_i in the high-dimensional feature space, and λ is a hyper-parameter controlling the weight of the two metric loss functions.

The contrast loss function is defined as:

L_contra = max(0, m - D(I_i))

where D(I_i) is defined as the minimum distance between any two of the attention feature vectors:

D(I_i) = min_{1 ≤ s < t ≤ S} ‖f_i^s - f_i^t‖_2

and m is a preset threshold. The contrast loss function constrains the attention feature vectors f_i^s of an input picture I_i so that the distance between every two of them in the high-dimensional space is larger than the preset threshold m (set to 40 in this embodiment); the local character features located by the different attention feature maps therefore differ from one another, which makes the character recognition network model more likely to mine the distinctive features of characters.

The region center loss function is defined as:

L_center = (1/2) Σ_{s=1}^{S} ‖f_i^s - c_{y_i}^s‖_2^2

where c_{y_i}^s ∈ ℝ^d is the center of the s-th attention feature of class y_i and d is the dimension of the feature. The region center loss function reduces the distance between corresponding attention features of characters of the same class, so that the attention features learned for one class are mutually similar and each attention feature map X̃_i^s is activated at the same character part. The attention feature centers c_{y_i}^s are initialized from a Gaussian distribution with mean 0 and variance 1 and then updated during training according to the region center loss function algorithm.
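Under the reconstruction above, the target loss can be sketched as follows. The minimum-pairwise-distance form of D(I_i) and the value λ = 0.1 are assumptions; m = 40 and the N(0, 1) center initialization come from the embodiment, and for simplicity the centers are treated as learnable parameters rather than updated by the separate center-loss rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetLoss(nn.Module):
    """L_total = L_cls + lambda * (L_center + L_contra), as reconstructed."""
    def __init__(self, num_classes: int = 3755, S: int = 2, d: int = 768,
                 m: float = 40.0, lam: float = 0.1):  # lam is assumed
        super().__init__()
        self.m, self.lam = m, lam
        # class centers c_y^s, initialized from N(0, 1) as in the embodiment
        self.centers = nn.Parameter(torch.randn(num_classes, S, d))

    def forward(self, logits, feats, labels):  # feats: (N, S, d)
        l_cls = F.cross_entropy(logits, labels)
        # region center loss: pull each f_i^s toward its class center c_y^s
        l_center = 0.5 * (feats - self.centers[labels]).pow(2).sum(-1).mean()
        # contrast loss: hinge on the minimum pairwise distance D(I_i)
        dists = torch.cdist(feats, feats)  # (N, S, S) pairwise L2 distances
        eye = torch.eye(feats.size(1), dtype=torch.bool, device=feats.device)
        d_min = dists.masked_fill(eye, float("inf")).flatten(1).min(1).values
        l_contra = F.relu(self.m - d_min).mean()
        return l_cls + self.lam * (l_center + l_contra)
```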
According to the designed target loss function, iteration is performed with the back propagation algorithm, minimizing the cross entropy loss function during training to obtain an optimal network model. For the offline handwritten Chinese character recognition task, the original data set is used for iterative training to obtain the parameters of the network model.
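A single training iteration might then look like the sketch below, reusing the hypothetical Backbone, ChannelAttention, FusionHead and TargetLoss modules; the optimizer and batch handling are illustrative, not taken from the text.

```python
import torch

def train_step(pictures, labels, backbone, attentions, head, criterion, opt):
    """One back-propagation iteration over a batch (sketch)."""
    x = backbone(pictures)                      # depth feature maps X_i
    att_maps = [att(x) for att in attentions]   # S attention feature maps
    feats = torch.stack(
        [fc(m.flatten(1)) for fc, m in zip(head.branch_fc, att_maps)],
        dim=1)                                  # (N, S, 768) branch features
    logits = head.cls_fc(feats.flatten(1))      # fused class scores
    loss = criterion(logits, feats, labels)     # L_total
    opt.zero_grad()
    loss.backward()                             # back propagation
    opt.step()
    return loss.item()
```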
Referring to fig. 6, an embodiment of the present application provides a character recognition method that recognizes handwritten Chinese character images with the character recognition network model trained in the above embodiment, including the following steps:
Step A100: normalize the picture I_i to be tested and scale it to the preset height H and preset width W;
Step A200: input the picture I_i to be tested into the convolutional neural network, extract its convolutional features, and obtain the depth feature map X_i containing the convolutional features;
Step A300: input the depth feature map X_i into the attention mechanism module with multiple channels, obtain the attention weight of each channel, and rescale each channel of X_i with the attention weights to obtain multiple attention feature maps X̃_i^s;
Step A400: input each attention feature map X̃_i^s into a fully connected layer to obtain multiple attention feature vectors f_i^s;
Step A500: perform feature fusion on the attention feature vectors f_i^s and input them into the character fully connected layer for character class prediction.
In an embodiment, step A200 specifically includes: the convolutional neural network comprises 2 convolutional layers (conv1, conv2) and 4 convolution modules. The picture I_i to be tested is input into the 2 convolutional layers (conv1, conv2), each followed by a Batch Normalization (BN) layer and the nonlinear activation function ReLU, producing a feature map of size 96 × 96 × 64; the feature map is then downsampled by a max pooling layer with stride 2, giving a 48 × 48 × 64 feature map, and passed through the 4 convolution modules (Conv-Block). Each convolution module consists of 3 convolutional layers with 3 × 3 kernels and 3 Batch Normalization layers, one following each convolutional layer; the convolution module is a "bottleneck" structure, the middle of the 3 convolutional layers having fewer channels than the layers before and after it. Each convolution module is connected with a max pooling layer with stride 2 that halves the resolution of the feature map, and after the 4 convolution modules a depth feature map X_i of size 6 × 6 × 448 is output; the depth feature map X_i contains high-level semantic information obtained through the 14 convolutional layers.
In an embodiment, step A300 specifically includes: the depth feature map X_i of size 6 × 6 × 448 output by the last convolution module (Conv-Block) is taken as input and sent to the attention mechanism modules with multiple channels to compute the attention feature maps X̃_i^s (s = 1, …, S); in this embodiment, S = 2. The attention mechanism module borrows the channel attention mechanism introduced by the SENet method. First, global average pooling aggregates the input depth feature map X_i over the H × W spatial dimensions to generate a channel descriptor z^s = [z_1, …, z_C], whose c-th element z_c is computed as:

z_c = (1 / (H × W)) Σ_{h=1}^{H} Σ_{w=1}^{W} X_i(h, w, c)

where s = 1, …, S, S being the number of attention mechanism modules, and c = 1, …, C, C being the number of channels. On top of z^s, the channel descriptor is processed using a gating mechanism with Sigmoid activation to obtain the attention weights of each attention mechanism module:

a^s = σ(W_2^s · δ(W_1^s · z^s))

where σ is the Sigmoid function, δ is the ReLU function, W_1^s ∈ ℝ^{(C/r) × C}, W_2^s ∈ ℝ^{C × (C/r)}, and r is the channel compression ratio. Each attention mechanism module then rescales the channels of the depth feature map X_i using its attention weights to obtain the attention feature maps X̃_i^s:

X̃_{i,c}^s = a_c^s · X_{i,c}

where X̃_{i,c}^s, the c-th channel of the attention feature map corresponding to the normalized picture I_i, is the product of the channel X_{i,c} and the scalar a_c^s.
In an embodiment, step A400 specifically includes: inputting the plurality of attention feature maps obtained in step A300 into the comparative attention feature learning branch to extract attention features of locally distinctive regions; that is, each attention feature map X̃_i^s is input into a fully connected layer containing 768 neurons:

f_i^s = W^s · F_flatt(X̃_i^s)

where the operator F_flatt(·) flattens a matrix into a 1-dimensional vector.
In an embodiment, step A500 specifically includes: the attention feature vectors f_i^s (s = 1, …, S) are concatenated and input into a fully connected layer containing 3755 neurons for character class prediction:

Y_i = softmax(W · [f_i^1, …, f_i^S])

where [·] denotes the concatenation operation and Y_i represents the scores of the picture I_i to be tested over the 3755 Chinese character classes; the class with the highest score is the character class prediction result ŷ_i.
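Putting the sketched pieces together, steps A100 to A500 amount to the following inference routine, reusing the hypothetical normalize_picture, Backbone, ChannelAttention and FusionHead from the training section:

```python
import torch

@torch.no_grad()
def recognize(img, backbone, attentions, head) -> int:
    """Steps A100-A500: normalize, extract features, attend, fuse, predict."""
    x = torch.from_numpy(normalize_picture(img))[None, None]  # (1, 1, 96, 96)
    feat = backbone(x)                            # A200: depth feature map X_i
    att_maps = [att(feat) for att in attentions]  # A300: attention feature maps
    scores = head(att_maps).softmax(dim=1)        # A400-A500: Y_i class scores
    return scores.argmax(dim=1).item()            # index of predicted class
```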
Compared with the prior art, the technical scheme conceived by the present application has the following technical effects:
(1) High accuracy: aiming at the low recognition accuracy caused by the many near-form handwritten Chinese characters and the large differences in writing style, the method innovatively uses a multiple contrastive attention mechanism to extract the distinctive features of Chinese characters and recognizes handwritten Chinese characters more accurately.
(2) High speed: the proposed character recognition network model trains quickly while maintaining recognition accuracy.
(3) Strong universality: the method can accurately recognize near-form Chinese characters, supports complete end-to-end training, has few model parameters, is simple and effective, and is easy to bring into products.
(4) Strong robustness: the method can overcome the shape variations of handwritten Chinese characters caused by the writing styles of different individuals, and achieves the highest recognition accuracy on standard handwritten Chinese character test sets.
Referring to fig. 7, an embodiment of the present application provides a character recognition network model training apparatus 100, including: a memory 101, a processor 102 and a computer program stored on the memory and executable on the processor; the processor implements the character recognition network model training method of the above embodiments when executing the computer program, for example, performing the method steps S100 to S600 of fig. 2 described above. The processor 102 and the memory 101 may be connected by a bus or in another manner; fig. 7 takes a bus connection as an example.
Referring to fig. 8, an embodiment of the present application provides a character recognition apparatus 200, including: a memory 201, a processor 202 and a computer program stored on the memory and executable on the processor; the processor implements the character recognition method of the above embodiments when executing the computer program, for example, performing the method steps A100 to A500 of fig. 6 described above. The processor 202 and the memory 201 may be connected by a bus or in another manner; fig. 8 takes a bus connection as an example.
An embodiment of the present application further provides a terminal, including the character recognition network model training apparatus 100 described in the foregoing embodiment or including the character recognition apparatus 200 described in the foregoing embodiment. The terminal may be any type of smart terminal, such as a smart phone, a tablet computer, a laptop computer, or a desktop computer.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller, for example by the processor 102 in fig. 7, cause the processor 102 to perform the character recognition network model training method of the above embodiments, for example the method steps S100 to S600 of fig. 2 described above; or, when executed by the processor 202 in fig. 8, cause the processor 202 to perform the character recognition method of the above embodiments, for example the method steps A100 to A500 of fig. 6 described above.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media, as is known to those skilled in the art.
While the preferred embodiments of the present invention have been described, the present invention is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included in the scope of the present invention defined by the claims.

Claims (18)

1. A character recognition network model training method is characterized by comprising the following steps:
standardizing each picture in the original data set, and carrying out character type labeling on each picture to obtain a standard training data set with the character type labels;
inputting each picture in the standard training data set into a convolutional neural network, extracting the convolutional characteristic of the picture, and obtaining a depth characteristic map containing the convolutional characteristic;
inputting the depth feature map into an attention mechanism module with a plurality of channels to obtain an attention weight of each channel, and rescaling each channel of the depth feature map by using the attention weight to obtain a plurality of attention feature maps;
inputting each attention feature map into a full-connection layer respectively to obtain a plurality of attention feature vectors;
performing feature fusion on the attention feature vectors, and inputting the attention feature vectors into a character full-connection layer to perform character type prediction;
and designing a target loss function according to the character type prediction result and the character type label, performing iteration by using a back propagation algorithm, minimizing the target loss function, and optimizing the attention weight.
2. The method of claim 1, wherein normalizing each image in the raw data set comprises:
computing the statistics of each picture I_i (i = 1, …, N) in the original data set, and scaling the height and width of each picture to a preset height H and a preset width W, where N is the number of pictures in the original data set.
3. The method of claim 2, wherein the convolutional neural network comprises convolutional layers and convolutional modules;
inputting each picture in the standard training data set into a convolutional neural network, extracting the convolutional characteristic of the picture, and obtaining a depth characteristic map containing the convolutional characteristic, wherein the method comprises the following steps:
inputting the normalized pictures I_i (i = 1, …, N) into the plurality of convolutional layers, each convolutional layer being followed by a batch normalization layer and the nonlinear activation function ReLU; then inputting the result into a max pooling layer for downsampling; and then inputting it into the plurality of convolution modules, each convolution module being composed of an equal number of convolutional layers and batch normalization layers, each batch normalization layer following one convolutional layer, each convolution module being connected with a max pooling layer, and the last convolution module outputting the depth feature map X_i containing the convolutional features.
4. The method for training a character recognition network model according to claim 1 or 3, wherein the attention weight is obtained by the following steps:
the attention mechanism module aggregates the input depth feature maps in spatial dimensions using global average pooling to generate channel descriptors, which are processed using a gating mechanism with Sigmoid activation to derive an attention weight for each channel.
5. The method of claim 3, wherein the inputting the depth feature map into an attention mechanism module having a plurality of channels, obtaining an attention weight for each channel, and rescaling each channel of the depth feature map using the attention weight to obtain a plurality of attention feature maps comprises:
the attention mechanism module uses a global flattening pool to assemble the input depth feature map X in the spatial dimension H X WiTo generate a channel descriptor zs=[z1,…,zC]Wherein z issThe c element of (a)cThe calculation method comprises the following steps:
Figure FDA0002309583230000021
wherein S is 1, S is the number of attention mechanism modules;
wherein C is 1, C, C is the number of channels;
at zsThe channel descriptors are processed using a gating mechanism with Sigmoid activation to obtain the attention weight of each attention mechanism module:
Figure FDA0002309583230000022
where σ is Sigmoid function, δ is ReLU function,
Figure FDA0002309583230000023
r is the channel compression ratio;
each attention mechanism module re-aligns the depth feature map X using the attention weightsiIs scaled to obtain a plurality of attention feature maps
Figure FDA0002309583230000024
Figure FDA0002309583230000025
Wherein
Figure FDA0002309583230000026
Picture I representing normalizationiCorresponding c channel of the attention feature map
Figure FDA0002309583230000027
And scalar quantity
Figure FDA0002309583230000028
The product between them.
6. The method as claimed in claim 5, wherein said inputting each said attention feature map into a full-connected layer to obtain a plurality of attention feature vectors comprises:
inputting each of the plurality of attention feature maps X̃_i^s into the fully connected layer:

f_i^s = W^s · F_flatt(X̃_i^s)

wherein the operator F_flatt(·) flattens a matrix into a 1-dimensional vector.
7. The method as claimed in claim 6, wherein said performing feature fusion on a plurality of attention feature vectors, and inputting the feature fusion into a character class fully-connected layer for character class prediction comprises:
concatenating the plurality of attention feature vectors f_i^s (s = 1, …, S) and inputting them into the character fully connected layer for character class prediction:

Y_i = softmax(W · [f_i^1, …, f_i^S])

wherein [·] denotes the concatenation operation and Y_i represents the scores of the picture I_i over the character classes, the class with the highest score being the character class prediction result.
8. The method of claim 7, wherein the designing a target loss function according to the character class prediction result and the character class label, and performing iteration by using a back propagation algorithm to minimize the target loss function and optimize the attention weight, comprises:

defining the target loss function as:

L_total = L_cls + λ (L_center + L_contra)

wherein L_cls is a cross entropy loss function, L_center is a region center loss function for reducing the distance between corresponding attention features of characters of the same class, L_contra is a contrast loss function for pushing apart the attention feature vectors f_i^s of a picture I_i in the high-dimensional feature space, and λ is a hyper-parameter controlling the weight of the two loss functions;

the contrast loss function is defined as:

L_contra = max(0, m - D(I_i))

wherein D(I_i) is defined as:

D(I_i) = min_{1 ≤ s < t ≤ S} ‖f_i^s - f_i^t‖_2

and m is a preset threshold;

the region center loss function is defined as:

L_center = (1/2) Σ_{s=1}^{S} ‖f_i^s - c_{y_i}^s‖_2^2

wherein c_{y_i}^s ∈ ℝ^d is the center of the s-th attention feature of class y_i and d is the dimension of the feature; the attention feature centers c_{y_i}^s are initialized from a Gaussian distribution with mean 0 and variance 1 and then updated according to the region center loss function algorithm;

and according to the target loss function, iterating with the back propagation algorithm, minimizing the cross entropy loss function, and optimizing the attention weight.
9. A method for recognizing a character, comprising:
normalizing the picture to be tested and scaling it to a preset height H and a preset width W;
inputting the picture to be tested into a convolutional neural network, extracting the convolutional characteristic of the picture to be tested, and obtaining a depth characteristic graph containing the convolutional characteristic;
inputting the depth feature map into an attention mechanism module with a plurality of channels to obtain an attention weight of each channel, and rescaling each channel of the depth feature map by using the attention weight to obtain a plurality of attention feature maps;
inputting each attention feature map into a full-connection layer respectively to obtain a plurality of attention feature vectors;
and performing feature fusion on the attention feature vectors, and inputting the attention feature vectors into a character full-connection layer to perform character type prediction.
10. The method of claim 9, wherein the convolutional neural network comprises convolutional layers and convolutional modules;
inputting the picture to be tested into a convolutional neural network, extracting the convolutional characteristic of the picture to be tested, and obtaining a depth characteristic graph containing the convolutional characteristic, wherein the method comprises the following steps:
inputting the picture I_i to be tested into the plurality of convolutional layers, each convolutional layer being followed by a batch normalization layer and the nonlinear activation function ReLU; then inputting the result into a max pooling layer for downsampling; and then inputting it into the plurality of convolution modules, each convolution module being composed of an equal number of convolutional layers and batch normalization layers, each batch normalization layer following one convolutional layer, and the last convolution module outputting the depth feature map X_i containing the convolutional features.
11. The character recognition method of claim 9 or 10, wherein the attention weight is obtained by:
the attention mechanism module aggregates the input depth feature maps in spatial dimensions using global average pooling to generate channel descriptors, which are processed using a gating mechanism with Sigmoid activation to derive an attention weight for each channel.
12. The method of claim 10, wherein the inputting the depth feature map into an attention mechanism module having a plurality of channels, obtaining an attention weight for each channel, and rescaling each channel of the depth feature map using the attention weight to obtain a plurality of attention feature maps comprises:
the attention mechanism module uses a global flattening pool to assemble the input depth feature map X in the spatial dimension H X WiTo generate a channel descriptor zs=[z1,…,zC]Wherein z issThe c element of (a)cThe calculation method comprises the following steps:
Figure FDA0002309583230000041
wherein S is 1, S is the number of attention mechanism modules;
wherein C is 1, C, C is the number of channels;
at zsThe channel descriptors are processed using a gating mechanism with Sigmoid activation to obtain the attention weight of each attention mechanism module:
Figure FDA0002309583230000042
where σ is Sigmoid function, δ is ReLU function,
Figure FDA0002309583230000043
r is the channel compression ratio;
each attention mechanism module re-aligns the depth feature map X using the attention weightsiIs scaled to obtain a plurality of attention feature maps
Figure FDA0002309583230000044
Figure FDA0002309583230000045
Wherein
Figure FDA0002309583230000046
Picture I representing normalizationiCorresponding c channel of the attention feature map
Figure FDA0002309583230000047
And scalar quantity
Figure FDA0002309583230000048
The product between them.
13. The method of claim 12, wherein said inputting each said attention feature map into a full-concatenation layer to obtain a plurality of attention feature vectors comprises:
inputting each of the plurality of attention feature maps X̃_i^s into the fully connected layer:

f_i^s = W^s · F_flatt(X̃_i^s)

wherein the operator F_flatt(·) flattens a matrix into a 1-dimensional vector.
14. The method of claim 13, wherein said feature fusing a plurality of said attention feature vectors and inputting the fused attention feature vectors into a character class full-link layer for character class prediction comprises:
concatenating the plurality of attention feature vectors f_i^s (s = 1, …, S) and inputting them into the character fully connected layer for character class prediction:

Y_i = softmax(W · [f_i^1, …, f_i^S])

wherein [·] denotes the concatenation operation and Y_i represents the scores of the picture I_i to be tested over the character classes, the class with the highest score being the character class prediction result.
15. A character recognition network model training device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the character recognition network model training method according to any one of claims 1 to 8 when executing the computer program.
16. A character recognition apparatus comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the character recognition method according to any one of claims 9 to 14 when executing the computer program.
17. A terminal comprising the character recognition network model training apparatus of claim 15 or comprising the character recognition apparatus of claim 16.
18. A computer storage medium storing computer-executable instructions for performing the method of any of claims 1 to 8 or for performing the method of any of claims 9 to 14.
CN201911253120.1A 2019-12-09 2019-12-09 Character recognition method, device, terminal and computer storage medium thereof Pending CN113033249A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911253120.1A CN113033249A (en) 2019-12-09 2019-12-09 Character recognition method, device, terminal and computer storage medium thereof
PCT/CN2020/133116 WO2021115159A1 (en) 2019-12-09 2020-12-01 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911253120.1A CN113033249A (en) 2019-12-09 2019-12-09 Character recognition method, device, terminal and computer storage medium thereof

Publications (1)

Publication Number Publication Date
CN113033249A true CN113033249A (en) 2021-06-25

Family

ID=76329519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911253120.1A Pending CN113033249A (en) 2019-12-09 2019-12-09 Character recognition method, device, terminal and computer storage medium thereof

Country Status (2)

Country Link
CN (1) CN113033249A (en)
WO (1) WO2021115159A1 (en)

Families Citing this family (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469335B (en) * 2021-06-29 2024-05-10 杭州中葳数字科技有限公司 Method for distributing weights for features by utilizing relation among features of different convolution layers
CN113487013B (en) * 2021-06-29 2024-05-07 杭州中葳数字科技有限公司 Attention mechanism-based sorting grouping convolution method
CN113421318B (en) * 2021-06-30 2022-10-28 合肥高维数据技术有限公司 Font style migration method and system based on multitask generation countermeasure network
CN113705344A (en) * 2021-07-21 2021-11-26 西安交通大学 Palm print recognition method and device based on full palm, terminal equipment and storage medium
CN113569727B (en) * 2021-07-27 2022-10-21 广东电网有限责任公司 Method, system, terminal and medium for identifying construction site in remote sensing image
CN113627590B (en) * 2021-07-29 2024-07-12 中汽创智科技有限公司 Attention module, attention mechanism and convolutional neural network of convolutional neural network
CN113793627B (en) * 2021-08-11 2023-12-29 华南师范大学 Attention-based multi-scale convolution voice emotion recognition method and device
CN113688830B (en) * 2021-08-13 2024-04-26 湖北工业大学 Deep learning target detection method based on center point regression
CN113762357B (en) * 2021-08-18 2024-05-14 江苏大学 Intelligent pharmacy prescription checking method based on deep learning
CN113673451A (en) * 2021-08-25 Graph convolution module for extracting image features of tissue cytopathology slides
CN113763965B (en) * 2021-08-26 2023-12-19 江苏大学 Speaker identification method with multiple attention feature fusion
CN113763412B (en) * 2021-09-08 2024-07-16 理光软件研究所(北京)有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN113780170A (en) * 2021-09-10 2021-12-10 昭通亮风台信息科技有限公司 SSD deep learning network-based fire detection and identification method, system and fire alarm method
CN113963352B (en) * 2021-09-22 2022-08-02 支付宝(杭州)信息技术有限公司 Method and device for recognizing picture and training neural network
CN113989541B (en) * 2021-09-23 2024-08-20 神思电子技术股份有限公司 Dressing classification method and system based on feature aggregation
CN113705733A (en) * 2021-09-29 2021-11-26 平安医疗健康管理股份有限公司 Medical bill image processing method and device, electronic device and storage medium
CN113850741B (en) * 2021-10-10 2023-04-07 杭州知存智能科技有限公司 Image noise reduction method and device, electronic equipment and storage medium
CN114037600A (en) * 2021-10-11 2022-02-11 长沙理工大学 New cycleGAN style migration network based on new attention mechanism
CN114140873A (en) * 2021-11-09 2022-03-04 武汉众智数字技术有限公司 Gait recognition method based on convolutional neural network multi-level features
CN114140685A (en) * 2021-11-11 2022-03-04 国网福建省电力有限公司 Environment-adaptive substation instrument reading identification method, equipment and medium
CN114119997A (en) * 2021-11-26 2022-03-01 腾讯科技(深圳)有限公司 Training method and device for image feature extraction model, server and storage medium
CN113836850A (en) * 2021-11-26 2021-12-24 成都数之联科技有限公司 Model obtaining method, system and device, medium and product defect detection method
CN114118415B (en) * 2021-11-29 2024-06-28 暨南大学 Deep learning method of lightweight bottleneck attention mechanism
CN114140357B (en) * 2021-12-02 2024-04-19 哈尔滨工程大学 Multi-temporal remote sensing image cloud zone reconstruction method based on cooperative attention mechanism
CN114119979A (en) * 2021-12-06 2022-03-01 西安电子科技大学 Fine-grained image classification method based on segmentation mask and self-attention neural network
CN114220012B (en) * 2021-12-16 2024-05-31 池明旻 Textile cotton and hemp identification method based on deep self-attention network
CN114973222B (en) * 2021-12-20 2024-05-10 西北工业大学宁波研究院 Scene text recognition method based on explicit supervision attention mechanism
CN114266938A (en) * 2021-12-23 2022-04-01 南京邮电大学 Scene recognition method based on multi-mode information and global attention mechanism
CN114530210A (en) * 2022-01-06 2022-05-24 山东师范大学 Drug molecule screening method and system
CN114049634B (en) * 2022-01-12 2022-05-13 深圳思谋信息科技有限公司 Image recognition method and device, computer equipment and storage medium
CN114445299A (en) * 2022-01-28 2022-05-06 南京邮电大学 Double-residual denoising method based on attention allocation mechanism
CN114694211B (en) * 2022-02-24 2024-04-19 合肥工业大学 Synchronous detection method and system for non-contact type multiple physiological parameters
CN114566216B (en) * 2022-02-25 2024-04-02 桂林电子科技大学 Attention mechanism-based splice site prediction and interpretation method
CN114639169B (en) * 2022-03-28 2024-02-20 合肥工业大学 Human motion recognition system based on attention mechanism feature fusion and irrelevant to position
CN114724219B (en) * 2022-04-11 2024-05-31 辽宁师范大学 Expression recognition method based on attention shielding mechanism
CN116994266A (en) * 2022-04-18 2023-11-03 北京字跳网络技术有限公司 Word processing method, word processing device, electronic equipment and storage medium
CN115034256B (en) * 2022-05-05 2024-08-23 上海大学 Near-ground target acoustic shock signal classification and identification system and method based on deep learning
CN114612791B (en) * 2022-05-11 2022-07-29 西南民族大学 Target detection method and device based on improved attention mechanism
CN114998482B (en) * 2022-06-13 2024-09-03 厦门大学 Intelligent generation method of character artistic pattern
CN114881011B (en) * 2022-07-12 2022-09-23 中国人民解放军国防科技大学 Multichannel Chinese text correction method, device, computer equipment and storage medium
CN115251948A (en) * 2022-07-14 2022-11-01 深圳未来脑律科技有限公司 Classification and identification method and system for bimodal motor imagery and storage medium
CN117523226A (en) * 2022-07-28 2024-02-06 杭州堃博生物科技有限公司 Image registration method, device and storage medium
CN115439849B (en) * 2022-09-30 2023-09-08 杭州电子科技大学 Instrument digital identification method and system based on dynamic multi-strategy GAN network
CN115568860B (en) * 2022-09-30 2024-07-02 厦门大学 Automatic classification method of twelve-lead electrocardiosignals based on double-attention mechanism
CN115471851B (en) * 2022-10-11 2023-07-28 小语智能信息科技(云南)有限公司 Burmese image text recognition method and device integrating dual attention mechanisms
CN116246331B (en) * 2022-12-05 2024-08-16 苏州大学 Automatic keratoconus grading method, device and storage medium
CN115993365B (en) * 2023-03-23 2023-06-13 山东省科学院激光研究所 Belt defect detection method and system based on deep learning
CN116052154B (en) * 2023-04-03 2023-06-16 中科南京软件技术研究院 Scene text recognition method based on semantic enhancement and graph reasoning
CN116563615B (en) * 2023-04-21 2023-11-07 南京讯思雅信息科技有限公司 Bad picture classification method based on improved multi-scale attention mechanism
CN116405310B (en) * 2023-04-28 2024-03-15 北京宏博知微科技有限公司 Network data security monitoring method and system
CN116259067B (en) * 2023-05-15 2023-09-12 济南大学 Method for high-precision identification of PID drawing symbols
CN116993679B (en) * 2023-06-30 2024-04-30 芜湖合德传动科技有限公司 Method for detecting belt abrasion of telescopic machine based on target detection
CN116597258B (en) * 2023-07-18 2023-09-26 华东交通大学 Ore sorting model training method and system based on multi-scale feature fusion
CN116934733B (en) * 2023-08-04 2024-04-09 湖南恩智测控技术有限公司 Reliability test method and system for chip
CN117036891B (en) * 2023-08-22 2024-03-29 睿尔曼智能科技(北京)有限公司 Cross-modal feature fusion-based image recognition method and system
CN117173716B (en) * 2023-09-01 2024-03-26 湖南天桥嘉成智能科技有限公司 Deep learning-based high-temperature slab ID character recognition method and system
CN117079295B (en) * 2023-09-19 2024-05-03 中航西安飞机工业集团股份有限公司 Pointer identification and reading method and system for aviation cable tensiometer
CN117037173B (en) * 2023-09-22 2024-02-27 武汉纺织大学 Two-stage English character detection and recognition method and system
CN117523685B (en) * 2023-11-15 2024-07-09 中国矿业大学 Dual-mode biological feature recognition method and system based on asymmetric comparison fusion
CN117809314B (en) * 2023-11-21 2024-09-17 中化现代农业有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN117573810B (en) * 2024-01-15 2024-04-09 腾讯烟台新工科研究院 Multi-language product package instruction text recognition query method and system
CN117593610B (en) * 2024-01-17 2024-04-26 上海秋葵扩视仪器有限公司 Image recognition network training and deployment and recognition methods, devices, equipment and media
CN118279679B (en) * 2024-06-04 2024-08-02 深圳大学 Image classification method, image classification device and medium based on deep learning model
CN118429733A (en) * 2024-07-05 2024-08-02 湖南大学 Multi-head attention-driven kitchen garbage multi-label classification method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368831B (en) * 2017-07-19 2019-08-02 中国人民解放军国防科学技术大学 English words and digit recognition method in a kind of natural scene image
US10846854B2 (en) * 2017-10-13 2020-11-24 Shenzhen Keya Medical Technology Corporation Systems and methods for detecting cancer metastasis using a neural network
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110334705B (en) * 2019-06-25 2021-08-03 华中科技大学 Language identification method of scene text image combining global and local information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU Qingquan: "Research on Chinese Recognition Algorithms Based on Attention Mechanism" (in Chinese), Wanfang Full-text Database, 4 December 2019 (2019-12-04), pages 17-22 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364860B (en) * 2020-11-05 2024-06-25 北京字跳网络技术有限公司 Training method and device of character recognition model and electronic equipment
CN112364860A (en) * 2020-11-05 2021-02-12 北京字跳网络技术有限公司 Training method and device of character recognition model and electronic equipment
CN113326833A (en) * 2021-08-04 2021-08-31 浩鲸云计算科技股份有限公司 Character recognition improved training method based on center loss
CN113610164A (en) * 2021-08-10 2021-11-05 北京邮电大学 Fine-grained image recognition method and system based on attention balance
CN113610164B (en) * 2021-08-10 2023-12-22 北京邮电大学 Fine granularity image recognition method and system based on attention balance
CN113610045A (en) * 2021-08-20 2021-11-05 大连理工大学 Remote sensing image target identification generalization method for depth feature integrated learning
CN113657534A (en) * 2021-08-24 2021-11-16 北京经纬恒润科技股份有限公司 Classification method and device based on attention mechanism
CN113705568A (en) * 2021-08-27 2021-11-26 深圳市商汤科技有限公司 Character recognition network training method and device, computer equipment and storage medium
CN113741528B (en) * 2021-09-13 2023-05-23 中国人民解放军国防科技大学 Deep reinforcement learning training acceleration method for collision avoidance of multiple unmanned aerial vehicles
CN113741528A (en) * 2021-09-13 2021-12-03 中国人民解放军国防科技大学 Deep reinforcement learning training acceleration method for collision avoidance of multiple unmanned aerial vehicles
CN113869426A (en) * 2021-09-29 2021-12-31 北京搜狗科技发展有限公司 Formula identification method and device
CN114898345A (en) * 2021-12-13 2022-08-12 华东师范大学 Arabic text recognition method and system
CN114429633B (en) * 2022-01-28 2023-10-27 北京百度网讯科技有限公司 Text recognition method, training method and device of model, electronic equipment and medium
CN114429633A (en) * 2022-01-28 2022-05-03 北京百度网讯科技有限公司 Text recognition method, model training method, device, electronic equipment and medium
CN114677661A (en) * 2022-03-24 2022-06-28 智道网联科技(北京)有限公司 Roadside identifier identification method and device and electronic equipment
CN114743206B (en) * 2022-05-17 2023-10-27 北京百度网讯科技有限公司 Text detection method, model training method, device and electronic equipment
CN114743206A (en) * 2022-05-17 2022-07-12 北京百度网讯科技有限公司 Text detection method, model training method, device and electronic equipment
CN116432521A (en) * 2023-03-21 2023-07-14 浙江大学 Handwritten Chinese character recognition and retrieval method based on multi-modal reconstruction constraint
CN116432521B (en) * 2023-03-21 2023-11-03 浙江大学 Handwritten Chinese character recognition and retrieval method based on multi-modal reconstruction constraint
CN118072973A (en) * 2024-04-15 2024-05-24 慧医谷中医药科技(天津)股份有限公司 Intelligent inquiry method and system based on medical knowledge base

Also Published As

Publication number Publication date
WO2021115159A1 (en) 2021-06-17

Similar Documents

Publication Publication Date Title
CN113033249A (en) Character recognition method, device, terminal and computer storage medium thereof
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
Chherawala et al. Feature set evaluation for offline handwriting recognition systems: application to the recurrent neural network model
Obozinski et al. Multi-task feature selection
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
Dekhtyar et al. Re data challenge: Requirements identification with word2vec and tensorflow
CN112100346B (en) Visual question-answering method based on fusion of fine-grained image features and external knowledge
US11720789B2 (en) Fast nearest neighbor search for output generation of convolutional neural networks
Khémiri et al. Bayesian versus convolutional networks for Arabic handwriting recognition
WO2020108808A1 (en) Method and system for classification of data
WO2015087148A1 (en) Classifying test data based on a maximum margin classifier
CN111582057B (en) Face verification method based on local receptive field
Salamah et al. Towards the machine reading of arabic calligraphy: a letters dataset and corresponding corpus of text
Chooi et al. Handwritten character recognition using convolutional neural network
Dsouza et al. Real Time Facial Emotion Recognition Using CNN
Kumar et al. Bayesian background models for keyword spotting in handwritten documents
Liu et al. Multi-digit recognition with convolutional neural network and long short-term memory
CN116541707A (en) Image-text matching model training method, device, equipment and storage medium
CN111242114A (en) Character recognition method and device
CN111144469A (en) End-to-end multi-sequence text recognition method based on multi-dimensional correlation time sequence classification neural network
Bappi et al. BNVGLENET: Hypercomplex Bangla handwriting character recognition with hierarchical class expansion using Convolutional Neural Networks
Alamsyah et al. Handwriting analysis for personality trait features identification using CNN
Saha et al. Real time Bangla Digit Recognition through Hand Gestures on Air Using Deep Learning and OpenCV
Küçükşahin Design of an offline Ottoman character recognition system for translating printed documents to modern Turkish
Sudholt Learning attribute representations with deep convolutional neural networks for word spotting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination