WO2021208666A1 - Character recognition method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021208666A1
WO2021208666A1 (PCT/CN2021/081759, CN2021081759W)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
target image
encoding
character
image
Prior art date
Application number
PCT/CN2021/081759
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
岳晓宇
旷章辉
蔺琛皓
孙红斌
张伟
Original Assignee
深圳市商汤科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Priority to JP2021567034A (publication JP2022533065A)
Priority to KR1020227000935A (publication KR20220011783A)
Publication of WO2021208666A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/86 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/18133 Extraction of features or characteristics of the image regional/local feature not essentially salient, e.g. local binary pattern
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/182 Extraction of features or characteristics of the image by coding the contour of the pattern
    • G06V30/1823 Extraction of features or characteristics of the image by coding the contour of the pattern using vector-coding

Definitions

  • the present disclosure relates to the field of electronic technology, and in particular to a character recognition method and device, electronic equipment and storage medium.
  • a computer can be used to automatically recognize characters to improve the efficiency of manual processing.
  • character recognition can recognize regular characters, such as parsing documents.
  • Character recognition can also recognize irregular characters, for example, recognize irregular characters in natural scenes such as traffic signs and storefront signs.
  • the present disclosure proposes a technical solution for character recognition.
  • a character recognition method, including: obtaining a target image to be recognized; obtaining the character feature of the target image based on a determined position vector and a first image feature of the target image, wherein the position vector is determined based on the position feature of characters in a preset information sequence; and recognizing characters in the target image based on the character feature to obtain a character recognition result of the target image.
  • the obtaining the character feature of the target image based on the determined position vector and the first image feature of the target image includes: encoding the first image feature of the target image to obtain an encoding result of the first image feature; determining a second image feature of the target image according to the encoding result of the first image feature; and obtaining the character feature of the target image based on the determined position vector, the first image feature, and the second image feature.
  • the second image feature has a stronger positional feature, so the character feature of the target image obtained from it also has a stronger positional feature; the character recognition result obtained from the character feature is therefore more accurate and less affected by semantics.
  • the encoding the first image feature of the target image to obtain the encoding result of the first image feature includes: sequentially performing at least one level of first encoding processing on multiple first-dimensional feature vectors of the first image feature to obtain the encoding result of the first image feature.
  • by sequentially performing one or more levels of first encoding processing on the multiple first-dimensional feature vectors of the first image feature, the position feature included in the first image feature can be enhanced, so that the resulting encoding result exhibits more obvious positional characteristics between characters.
  • the step of sequentially performing at least one level of first encoding processing on the multiple first-dimensional feature vectors of the first image feature to obtain the encoding result of the first image feature includes: in the first level of the at least one level of first encoding processing, using N first encoding nodes to sequentially encode the input information of each first encoding node to obtain the output results of the N first encoding nodes, where, in the case of 1 < i ≤ N, the input information of the i-th first encoding node includes the output result of the (i-1)-th first encoding node, and N and i are positive integers; and obtaining the encoding result of the first image feature according to the output results of the N first encoding nodes. In this way, the input information of the first first encoding node can be passed along to the last first encoding node, so that it is memorized over the long term and the output result obtained is more accurate.
  • the input information of the first encoding node further includes the first-dimensional feature vector of the first image feature or the output result of the first encoding process of the previous stage.
  • each level of first encoding processing can pass the first-dimensional feature vector of the first image feature, or the output result of the previous level of first encoding processing, through its first encoding nodes to the last first encoding node, so that the output result of that level of first encoding processing is more accurate.
  • the obtaining the character feature of the target image based on the determined position vector, the first image feature, and the second image feature includes: determining an attention weight according to the position vector and the second image feature; and using the attention weight to perform feature weighting on the first image feature to obtain the character feature of the target image.
  • the attention weight can be used to further enhance the features that need attention in the first image feature, so that the character feature obtained by weighting the first image feature with the attention weight more accurately reflects the more important parts of the first image feature.
  • the method further includes: obtaining a preset information sequence including at least one piece of first preset information; and sequentially performing at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector. Because the at least one piece of first preset information is encoded sequentially during the second encoding processing, the generated position vector is related to the order of the first preset information, so that it can represent the positional characteristics between characters.
  • the step of sequentially performing at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector includes: in the first level of second encoding processing, using M second encoding nodes to sequentially encode the input information of each second encoding node to obtain the output result of the M-th second encoding node, where, in the case of 1 < j ≤ M, the input information of the j-th second encoding node includes the output result of the (j-1)-th second encoding node, and M and j are positive integers; and obtaining the position vector according to the output result of the M-th second encoding node. In this way, the input information of the first second encoding node can be passed along to the last second encoding node, so that it is memorized over the long term and the obtained position vector is more accurate.
  • the input information of the second encoding node further includes the first preset information or the output result of the second encoding process of the previous stage.
  • each level of second encoding processing can pass the first preset information, or the output result of the previous level of second encoding processing, through its second encoding nodes to the last second encoding node, so that the output result of that level of second encoding processing is more accurate.
  • the recognizing characters in the target image based on the character feature to obtain a character recognition result of the target image includes: extracting the semantic feature of the target image; and obtaining the character recognition result of the target image based on the semantic feature of the target image and the character feature.
  • the semantic feature and the character feature can be combined to improve the accuracy of the character recognition result.
  • the extracting the semantic feature of the target image includes: sequentially determining the semantic feature of the target image at each of at least one time step based on acquired second preset information; and the obtaining the character recognition result of the target image based on the semantic feature of the target image and the character feature includes: obtaining the character recognition result of the target image at the at least one time step based on the semantic feature of the target image at each time step and the character feature. When there are multiple characters in the target image, the character recognition results can thus be obtained sequentially according to the positions (character features) and semantics (semantic features) of the characters, which improves the accuracy of the character recognition results.
  • the sequentially determining the semantic feature of the target image at each of the at least one time step based on the acquired second preset information includes: performing at least one level of third encoding processing on the second preset information to obtain the semantic feature of the first time step of the at least one time step; and performing at least one level of third encoding processing on the character recognition result of the target image at the (k-1)-th time step to obtain the semantic feature of the target image at the k-th time step, where k is an integer greater than 1.
  • in this way, the input information of an earlier third coding node can be passed to later third coding nodes, so that it is memorized over the long term and the obtained semantic features are more accurate.
  • a character recognition device including:
  • the obtaining part is configured to obtain the target image to be recognized
  • the determining part is configured to obtain the character feature of the target image based on the determined position vector and the first image feature of the target image, wherein the position vector is determined based on the position feature of characters in the preset information sequence;
  • the recognition part is configured to recognize characters in the target image based on the character characteristics, and obtain a character recognition result of the target image.
  • the determining part is further configured to encode the first image feature of the target image to obtain the encoding result of the first image feature; determine the second image feature of the target image according to the encoding result of the first image feature; and obtain the character feature of the target image based on the determined position vector, the first image feature, and the second image feature.
  • the determining part is further configured to sequentially perform at least one level of first encoding processing on the multiple first-dimensional feature vectors of the first image feature to obtain the encoding result of the first image feature.
  • the determining part is further configured to, in the first level of the at least one level of first encoding processing, use N first encoding nodes to sequentially encode the input information of each first encoding node to obtain the output results of the N first encoding nodes, where, in the case of 1 < i ≤ N, the input information of the i-th first encoding node includes the output result of the (i-1)-th first encoding node, and N and i are positive integers; and obtain the encoding result of the first image feature according to the output results of the N first encoding nodes.
  • the input information of the first encoding node further includes the first-dimensional feature vector of the first image feature or the output result of the first encoding process of the previous stage.
  • the determining part is further configured to determine an attention weight according to the position vector and the second image feature, and use the attention weight to perform feature weighting on the first image feature to obtain the character feature of the target image.
  • the device further includes an encoding part configured to obtain a preset information sequence including at least one piece of first preset information, and sequentially perform at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector.
  • the encoding part is further configured to, in the first level of the at least one level of second encoding processing, use M second encoding nodes to sequentially encode the input information of each second encoding node to obtain the output result of the M-th second encoding node, where, in the case of 1 < j ≤ M, the input information of the j-th second encoding node includes the output result of the (j-1)-th second encoding node, and M and j are positive integers; and obtain the position vector according to the output result of the M-th second encoding node.
  • the input information of the second encoding node further includes the first preset information or the output result of the second encoding process of the previous stage.
  • the recognition part is further configured to extract the semantic feature of the target image; and obtain the character recognition result of the target image based on the semantic feature of the target image and the character feature.
  • the recognition part is further configured to sequentially determine the semantic feature of the target image at each of at least one time step based on the acquired second preset information, and obtain the character recognition result of the target image at the at least one time step based on the semantic feature of the target image at each time step and the character feature.
  • the identification part is further configured to perform at least one level of third encoding processing on the second preset information to obtain the semantic feature of the first time step of the at least one time step, and perform at least one level of third encoding processing on the character recognition result of the target image at the (k-1)-th time step to obtain the semantic feature of the target image at the k-th time step, where k is an integer greater than 1.
  • an electronic device including:
  • a memory for storing processor executable instructions
  • the processor is configured to call instructions stored in the memory to execute the above-mentioned character recognition method.
  • a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the above-mentioned character recognition method is realized.
  • a computer program including computer readable code, and when the computer readable code is executed in an electronic device, a processor in the electronic device implements the above-mentioned character recognition method.
  • the target image to be recognized can be obtained, and then based on the determined position vector and the first image feature of the target image, the character feature of the target image is obtained, and then the characters in the target image are recognized based on the character feature.
  • the position vector is determined based on the position characteristics of the characters in the preset information sequence and can represent the positional characteristics between characters, so that in the process of character recognition the influence of these positional characteristics on the recognition result can be increased and the accuracy of character recognition improved; for example, better recognition results can be obtained for irregular characters and non-semantic characters.
  • Fig. 1 shows a flowchart of a character recognition method according to an embodiment of the present disclosure.
  • Fig. 2 shows a block diagram of an example of determining a second image feature of a target image according to an embodiment of the present disclosure.
  • Fig. 3 shows a block diagram of an example of obtaining a character recognition result by using a neural network according to an embodiment of the present disclosure.
  • Fig. 4 shows a block diagram of an example of a character recognition device according to an embodiment of the present disclosure.
  • Fig. 5 shows a block diagram of an example of a character recognition device according to an embodiment of the present disclosure.
  • Fig. 6 shows a block diagram of an example of an electronic device according to an embodiment of the present disclosure.
  • the character recognition solution provided by the embodiments of the present disclosure can obtain the target image to be recognized, obtain the character feature of the target image based on the determined position vector and the first image feature of the target image, and then recognize the characters in the target image based on the character feature to obtain the character recognition result of the target image.
  • the position vector is determined based on the position characteristics of the characters in the preset information sequence and can be used to represent the positional characteristics of the characters, so the positional characteristics between characters can be enhanced during recognition, making the obtained character recognition result more accurate.
  • in the related art, character sequences are usually recognized through the semantic features between characters, but in some character sequences, such as license plate numbers and room numbers, the characters have little semantic association, so recognizing such sequences through semantic features works poorly.
  • the character recognition scheme provided by the embodiments of the present disclosure can enhance the influence of character position characteristics on recognition and reduce the dependence of the recognition process on semantic characteristics, yielding a better recognition effect for characters with little semantic association and for irregular characters.
  • Fig. 1 shows a flowchart of a character recognition method according to an embodiment of the present disclosure.
  • the character recognition method can be executed by a terminal device, a server, or other types of electronic devices.
  • the terminal device can be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc.
  • the character recognition method can be implemented by a processor calling computer-readable instructions stored in a memory. The following describes the character recognition method of the embodiment of the present disclosure by taking the electronic device as the execution subject as an example.
  • Step S11 Obtain a target image to be recognized.
  • the electronic device may have an image capture function, and may capture the target image to be recognized.
  • the electronic device may obtain the target image to be recognized from other devices.
  • the electronic device may obtain the target image to be recognized from a camera device, a monitoring device, or other such devices.
  • the target image to be recognized may be an image waiting for character recognition.
  • the target image can carry characters, and the characters can be a single character or a character string.
  • the characters in the target image may be regular characters, for example, text written in a standard font may be regular characters. Regular characters can have the characteristics of neat arrangement, uniform size, no deformation, no occlusion, etc.
  • the characters in the target image may also be irregular characters, for example, some artistic text on shop signs and advertisement covers. Irregular characters can have features such as irregular arrangement, varying sizes, deformation, or occlusion.
  • Step S12 Obtain the character feature of the target image based on the determined position vector and the first image feature of the target image; wherein the position vector is determined based on the position feature of the character in the preset information sequence.
  • the position vector used to represent the position feature of a character can be determined based on the position feature of characters in the preset information sequence; for example, a preset information sequence of a certain length can be obtained, and the position features of the characters in it can then be extracted.
  • the position vector is related to the position of the character; for example, if a character to be recognized occupies the third character position in the character sequence, the position vector can represent this relative position in the character sequence, that is, the third character position.
  • the characters in the preset information sequence may be the same.
  • each character in the preset information sequence may also be set to carry no semantic information, thereby further reducing the correlation between the position vector and the semantics of the characters.
  • the position vector is less related to the semantics of the character, so for different target images, the position vector can be the same or different.
  • the first image feature of the target image may be obtained by image extraction of the target image.
  • a neural network may be used to perform at least one convolution operation on the target image to obtain the first image feature of the target image.
  • the character characteristic of the target image can be determined.
  • the determined position vector and the first image characteristic of the target image are fused to obtain the character characteristic of the target image.
  • because the character feature is obtained based on the position vector and the first image feature, it is less affected by the semantics of the characters.
  • Step S13 Recognizing characters in the target image based on the character characteristics, and obtaining a character recognition result of the target image.
  • the neural network can be used to process the character feature, for example by applying an activation to it or feeding it into a fully connected layer of the network to perform a fully connected operation, to obtain the character recognition result of the target image.
  • the character recognition result may be the recognition result of the characters in the target image. In the case where one character is included in the target image, the character recognition result may be one character. In the case where the target image includes a character sequence, the character recognition result may be a character sequence, and the sequence of each character in the character recognition result is the same as the sequence of the corresponding characters in the target image.
  • character recognition results obtained through character features are less affected by the semantics of the characters, so character sequences whose characters have little semantic relation can also be recognized well; for example, character recognition can be performed on semantically unrelated character sequences such as license plates.
  • the character feature of the target image can be obtained based on the determined position vector and the first image feature of the target image, thereby reducing the influence of semantics on the character feature.
  • the following provides an implementation method for obtaining the character features of the target image.
  • the first image feature of the target image can be encoded to obtain the encoding result of the first image feature; the second image feature of the target image is then determined according to that encoding result; and the character feature of the target image is obtained based on the determined position vector, the first image feature, and the second image feature.
  • the neural network can be used to encode the first image feature of the target image.
  • the first image feature can be coded row by row or column by column, so that the location feature included in the first image feature can be coded.
  • the second image feature of the target image can be obtained.
  • the first image feature and the encoding result can be fused to obtain the second image feature of the target image, which has a stronger location feature.
  • the character feature of the target image can be obtained.
  • the determined position vector, the first image feature, and the second image feature are fused to obtain the character feature of the target image.
  • the second image feature has a stronger location feature, and the obtained character feature of the target image also has a stronger location feature, so that the character recognition result obtained from the character feature is more accurate, and the character recognition result is less affected by semantics.
  • the first image feature of the target image may be encoded, so that the position feature included in the first image feature is enhanced.
  • the process of obtaining the encoding result of the first image feature is described below through an example.
  • At least one level of first encoding processing may be performed on the multiple first-dimensional feature vectors of the first image feature in sequence to obtain the encoding result of the first image feature.
  • the first image feature may include multiple first-dimensional feature vectors.
  • the first image feature may include features in multiple dimensions.
  • the first image feature may include multiple dimensions such as length, width, and depth.
  • the feature dimensions in different dimensions can be different.
  • the first-dimensional feature vector may be a feature of the first image feature in one dimension, for example, the first-dimensional feature vector may be a feature in a length dimension or a width dimension.
  • the first encoding process may be encoding for the first image feature.
  • the neural network may include at least one first encoding layer, and the encoding process corresponding to the first encoding layer may be the first encoding process.
  • the neural network can be used to sequentially perform one-level or multi-level first encoding processing on multiple first-dimensional feature vectors to obtain processing results of multiple first-dimensional feature vectors.
  • One first-dimensional feature vector can correspond to one processing result.
  • multiple processing results of multiple first-dimensional features can be combined to form a coding result of the first image feature.
  • N first encoding nodes may be used to sequentially encode the input information of each first encoding node to obtain the output results of the N first encoding nodes, where, in the case of 1 < i ≤ N, the input information of the i-th first encoding node includes the output result of the (i-1)-th first encoding node, and N and i are positive integers. The encoding result of the first image feature is obtained according to the output results of the N first encoding nodes.
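  • Purely for illustration, the chained encoding just described can be written as one unrolled recurrent layer: node i consumes the i-th first-dimensional feature vector together with the output of node i-1. The following is a minimal sketch, assuming PyTorch; the sizes are arbitrary placeholders, not values from the patent.

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=64, hidden_size=64)  # one "first encoding node" step
vectors = torch.randn(8, 1, 64)  # N=8 first-dimensional feature vectors (batch 1)

state = None
outputs = []
for i in range(vectors.shape[0]):
    # Input of node i: feature vector i; the recurrent state carries the
    # output of node i-1, so early inputs are memorized over the long term.
    state = cell(vectors[i], state)
    outputs.append(state[0])
encoding_result = torch.stack(outputs)  # output results of the N nodes
```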
  • a neural network may be used to perform at least one level of first encoding processing on the first image feature to obtain the encoding result of the first image feature.
  • the neural network may include at least one level of a first coding layer, the first coding layer can perform a first coding process, and each level of the first coding process is implemented by a plurality of coding nodes. In the case where the first encoding process is multiple stages, the operations performed by the first encoding process at each stage may be the same.
  • N first encoding nodes may be used to encode the input information of the first-level encoding process sequentially, and one first encoding node may correspond to one input information, The input information of different first coding nodes may be different.
  • a first coding node can get an output result.
  • the input information of the first encoding node in the first level of the first encoding process may be the first dimension feature vector of the first image feature.
  • the output result of the first encoding node in the first level of the first encoding process can be used as the input information of the first encoding node in the same order in the second level of the first encoding process, and so on, until the last level of the first encoding process.
  • the output result of the first coding node in the first coding process of the last stage may be the processing result of the first dimension feature vector described above.
  • the first level of first encoding processing may include N first encoding nodes. In the case of 1 < i ≤ N, that is, when the first encoding node is not the first node of the current level of first encoding processing, its input information may also include the output result of the previous first encoding node of the same level, so that the input information of the first first encoding node can be passed along to the last first encoding node, memorized over the long term, and reflected in a more accurate output result.
  • Fig. 2 shows a block diagram of an example of determining a second image feature of a target image according to an embodiment of the present disclosure.
  • the first image feature may be encoded by a neural network such as a Long Short-Term Memory (LSTM) network.
  • the neural network may include two first coding layers, and each first coding layer may include multiple first coding nodes (corresponding to the coding nodes in FIG. 2).
  • the first image feature F of the target image can be input into the first first coding layer of the neural network, and the multiple first coding nodes of that layer are used to encode the multiple first-dimensional feature vectors (the feature vectors along the width dimension), obtaining the output result of each first coding node.
  • the input information of the first first coding node is the first first-dimensional feature vector; the input information of the second first coding node is the output result of the first first coding node together with the second first-dimensional feature vector; and so on, until the output result of the last first coding node is obtained.
  • the output results of the multiple first coding nodes are then input into the second first coding layer, whose processing is similar to that of the first first coding layer and is not repeated here.
  • the encoding result F_2 of the first image feature can be obtained.
  • the first image feature F and the encoding result F_2 can then be feature-fused, for example by adding or concatenating the features, to obtain the second image feature of the target image; that is, the second image feature is obtained from the first image feature F and the encoding result F_2.
  • here, f_{i,j} may be the feature vector (first-dimensional feature vector) of the first image feature F at position (i, j); h^1_{i,j} can represent the feature vector of the output result F_1 of the first first coding layer at position (i, j), and h^1_{i,j-1} the feature vector of F_1 at position (i, j-1); h^2_{i,j} can represent the feature vector of the encoding result F_2 at position (i, j), and h^2_{i,j-1} the feature vector of F_2 at position (i, j-1); F' can represent the obtained second image feature; ⊕ can represent vector addition; i and j are both natural numbers. The original formula images are not reproduced in this text; consistent with these definitions, the two coding layers and the fusion can be written as

  h^1_{i,j} = \mathrm{LSTM}(f_{i,j}, h^1_{i,j-1}), \quad h^2_{i,j} = \mathrm{LSTM}(h^1_{i,j}, h^2_{i,j-1}), \quad F' = F ⊕ F_2.
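  • As a concreteness aid only, the following is a minimal PyTorch sketch of this position information enhancement, treating the width dimension of the first image feature as the left-to-right sequence axis; the class name and tensor layout are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class PositionEnhancer(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two stacked LSTM layers play the role of the two first coding layers.
        self.lstm = nn.LSTM(channels, channels, num_layers=2, batch_first=True)

    def forward(self, F: torch.Tensor) -> torch.Tensor:
        # F: (batch, height, width, channels), the first image feature.
        b, h, w, c = F.shape
        rows = F.reshape(b * h, w, c)   # encode each row left to right
        F2, _ = self.lstm(rows)         # encoding result F_2
        F2 = F2.reshape(b, h, w, c)
        return F + F2                   # residual add: second image feature F'
```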
  • the character feature of the target image can be obtained based on the determined position vector, the first image feature, and the second image feature.
  • the following provides an example to describe the process of obtaining the character feature of the target image.
  • the attention weight may be determined according to the determined position vector and the second image feature, and then the attention weight may be used to weight the first image feature to obtain the character feature of the target image.
  • the attention weight can be determined according to the position vector and the second image feature; for example, the correlation between the position vector and the second image feature can first be determined, and the attention weight is then determined according to this correlation.
  • the correlation between the position vector and the second image feature can be obtained by the dot product of the position vector and the second image feature.
  • feature weighting can be performed on the first image feature. For example, the attention weight can be multiplied by the first image feature and then summed to obtain the character feature of the target image.
  • the attention weight can be used to further enhance the features that need attention in the first image feature, so that the character feature obtained by weighting the first image feature with the attention weight more accurately reflects the more important feature parts of the first image feature.
  • the attention weight can be determined by formula (4), in which softmax represents the activation function. The original formula image is not reproduced in this text; consistent with the surrounding description, formula (4) can be written as

  \alpha_{t,i,j} = \mathrm{softmax}(h_t \cdot \tilde f_{i,j}), \quad (4)

  where h_t represents the position vector at the t-th time step and \tilde f_{i,j} represents the feature vector of the second image feature at position (i, j).
  • the character feature can be determined by formula (5), reconstructed on the same basis as

  g_t = \sum_{i,j} \alpha_{t,i,j} \, f_{i,j}, \quad (5)

  where g_t represents the character feature, \alpha_{t,i,j} represents the attention weight, and f_{i,j} represents the feature vector of the first image feature F at feature location (i, j). Using formula (5), the character feature can be obtained from the attention weight and the first image feature.
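  • A hedged sketch of formulas (4) and (5) in PyTorch; the shapes and the choice to normalize over all spatial positions are assumptions consistent with the description, not details fixed by the text.

```python
import torch

def character_feature(F: torch.Tensor, F_enh: torch.Tensor,
                      h_t: torch.Tensor) -> torch.Tensor:
    # F, F_enh: (height, width, channels); h_t: (channels,) position vector.
    scores = torch.einsum("hwc,c->hw", F_enh, h_t)  # dot-product correlation
    alpha = torch.softmax(scores.flatten(), dim=0).view_as(scores)  # formula (4)
    return torch.einsum("hw,hwc->c", alpha, F)      # formula (5): g_t
```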
  • the attention weight can be determined according to the determined position vector and the second image feature.
  • the position vector can represent the position characteristics of the characters, that is, it can represent the relative positions between the characters. The following describes the process of determining the position vector through an implementation manner.
  • a preset information sequence including at least one first preset information may be obtained, and then at least one first preset information is sequentially subjected to at least one level of second encoding processing to obtain a position vector.
  • the preset information sequence may include one or more first preset information.
  • the first preset information may be information set according to an actual scene, and may not have a specific meaning.
  • the first preset information may be a counting instruction.
  • a neural network may be used to sequentially perform one or more levels of second encoding processing on the at least one piece of first preset information to obtain the position vector. Since the pieces of first preset information are identical and have no specific meaning, the semantic association among them is small, and the position vector obtained by sequentially performing one or more levels of second encoding processing on them has a low degree of semantic correlation.
  • because the at least one piece of first preset information is encoded sequentially, the generated position vector is related to the order of the first preset information, which can be understood as being related to the positions among the pieces of first preset information, so that the position vector can represent the positional features between characters.
  • M second encoding nodes may be used to sequentially encode the input information of each second encoding node to obtain the output result of the M-th second encoding node, where, in the case of 1 < j ≤ M, the input information of the j-th second encoding node includes the output result of the (j-1)-th second encoding node, and M and j are positive integers. The position vector is obtained according to the output result of the M-th second encoding node.
  • a neural network may be used to sequentially perform one or more levels of second encoding processing on at least one first preset information to obtain a position vector.
  • the operations performed by the second encoding process at each stage may be the same.
  • M second encoding nodes may be used to sequentially encode the input information of the second encoding process, and one second encoding node may correspond to one input information, The input information of different second coding nodes may be different.
  • a second encoding node can get an output result.
  • the input information of a second coding node in the first-level second coding process may be a piece of first preset information.
  • the output result of a second coding node in the first level of second coding processing can be used as the input information of the second coding node of the same order in the second level of second coding processing, and so on, until the last level of second coding processing.
  • the output result of the last second coding node in the last level of second coding processing can be used as the position vector, or it can be further processed by convolution, pooling, and the like to obtain the position vector.
  • the first level of second encoding processing may include M second encoding nodes. In the case of 1 < j ≤ M, that is, when the second encoding node is not the first node of the current level of second encoding processing, its input information may also include the output result of the previous second encoding node of the same level, so that the input information of the first second encoding node can be passed along to the last second encoding node, memorized over the long term, and reflected in a more accurate position vector.
  • taking the case where the first preset information is the constant "<next>" and the second encoding processing is a two-level LSTM as an example, formulas (6) and (7) can be used to determine the position vector h_t. The original formula images are not reproduced in this text; consistent with the surrounding definitions (writing e_next for the input representation of "<next>", a symbol introduced here), they can be written as

  h'_t = \mathrm{LSTM}(e_{next}, h'_{t-1}), \quad (6)
  h_t = \mathrm{LSTM}(h'_t, h_{t-1}), \quad (7)

  where h'_t can represent the output result of the t-th second coding node in the first-level second encoding processing, h'_{t-1} the output result of the (t-1)-th second coding node in the first-level second encoding processing, h_t the output result of the t-th second coding node in the second-level second encoding processing (that is, the position vector), and h_{t-1} the output result of the (t-1)-th second coding node in the second-level second encoding processing; t is a natural number.
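  • A minimal sketch of formulas (6) and (7), assuming PyTorch: the same learned "<next>" embedding is fed at every time step through a two-layer LSTM, so h_t depends only on the step index t and not on image content. The class name is illustrative.

```python
import torch
import torch.nn as nn

class PositionEncoder(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.next_embed = nn.Parameter(torch.randn(dim))  # constant "<next>" token
        self.lstm = nn.LSTM(dim, dim, num_layers=2, batch_first=True)

    def forward(self, num_steps: int) -> torch.Tensor:
        # Repeat the "<next>" embedding num_steps times: (1, T, dim).
        seq = self.next_embed.expand(1, num_steps, -1)
        h, _ = self.lstm(seq)
        return h.squeeze(0)  # h[t] is the position vector for time step t
```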
  • the process of obtaining the position vector from the at least one piece of first preset information can be implemented with a neural network similar to the one shown in Fig. 2, the difference being that the position vector is taken from the output result of the last second coding node rather than being formed by combining the output results of multiple second coding nodes.
  • the characters in the target image can be recognized based on the character characteristics, and the character recognition result of the target image can be obtained.
  • the semantic features of the characters in the target image can also be considered in the process of recognizing the characters in the target image. The following describes the process of obtaining the character recognition result of the target image through an implementation manner.
  • the semantic features of the target image can be extracted, and then based on the semantic features and character features of the target image, the character recognition result of the target image can be obtained.
  • the semantic features of the target image can be extracted.
  • for example, the semantic features of the target image can be extracted using a scene semantic extraction model, and the semantic features and character features of the target image can then be fused to obtain a fusion result.
  • the semantic feature and the character feature can be concatenated, or feature weighting can be performed after concatenation, to obtain the fusion result.
  • the weight of the feature weighting may be preset, or it may be calculated based on the semantic feature and the character feature. Then, according to the fusion result, the character recognition result of the target image can be obtained.
  • the fusion result can be subjected to at least one convolution operation, full connection operation, etc., to obtain the character recognition result of the target image.
  • in this way, the semantic feature and the character feature can be combined to improve the accuracy of the character recognition result.
  • denoting the semantic feature as c_t and the character feature as g_t, formulas (8) and (9) can be used to obtain the fusion result of the two and the recognition output. The original formula images are not reproduced in this text; consistent with the surrounding definitions (the activation \sigma and the exact form of the weighted combination are reconstructions, not fixed by the text), they can be written as

  w_t = \sigma(W_f [c_t; g_t] + b_f), \quad (8)
  y_t = W (w_{t,1} c_t + w_{t,2} g_t) + b, \quad (9)

  where w_t can represent the weight used for feature weighting of the semantic feature c_t and the character feature g_t; W_f can represent the first mapping matrix, which maps the concatenation [c_t; g_t] of the semantic feature and the character feature to a two-dimensional vector space; b_f can represent the first bias term; y_t can represent the character recognition result; W can represent the second mapping matrix, which applies a linear transformation to the fusion result; and b can represent the second bias term.
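  • A hedged PyTorch sketch of this dynamic fusion, assuming a sigmoid gate over the concatenated features as one plausible reading of formula (8); the class name and the choice of activation are assumptions.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        # W_f maps the concatenated [c_t; g_t] to a two-dimensional weight
        # vector, matching the stated role of the first mapping matrix.
        self.W_f = nn.Linear(2 * dim, 2)
        self.W = nn.Linear(dim, num_classes)  # second mapping matrix plus bias b

    def forward(self, c_t: torch.Tensor, g_t: torch.Tensor) -> torch.Tensor:
        w_t = torch.sigmoid(self.W_f(torch.cat([c_t, g_t], dim=-1)))  # formula (8)
        fused = w_t[..., :1] * c_t + w_t[..., 1:] * g_t  # weighted fusion
        return self.W(fused)                             # formula (9): logits y_t
```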
  • the semantic feature of the target image at each of at least one time step may be determined sequentially based on the acquired second preset information, and the character recognition result of the target image at the at least one time step is then obtained based on the semantic feature of the target image at each time step and the character feature.
  • the acquired second preset information may be selected according to the actual scene, and the second preset information may not have a specific meaning.
  • the second preset information may be a start instruction.
  • the step length of the time step can be set according to actual application requirements.
  • Each time step can determine a semantic feature, and the semantic features obtained at different time steps can be different.
  • a neural network can be used to encode the second preset information to sequentially obtain the semantic feature of each time step; then, according to the semantic feature of the target image at each time step and the character feature of the same time step, the character recognition result of the target image at the at least one time step can be obtained.
  • the semantic feature of a time step and the character feature of the same time step correspond to the character recognition result of that time step. That is to say, when there are multiple characters in the target image, the character recognition results can be obtained sequentially according to the positions (character features) and semantics (semantic features) of the characters, which improves the accuracy of the character recognition results.
  • at least one level of third encoding processing can be performed on the second preset information to obtain the semantic feature of the first time step of the at least one time step; at least one level of third encoding processing is then performed on the character recognition result of the target image at the (k-1)-th time step to obtain the semantic feature of the target image at the k-th time step, where k is an integer greater than 1.
  • the second preset information can be used as input information of at least one level of third coding processing in the neural network.
  • Each level of the third encoding process may include multiple third encoding nodes, and each third encoding node may correspond to input information of one time step.
  • the input information of different third coding nodes may be different.
  • a third encoding node can get an output result.
  • the input information of the first third coding node in the first level of the third coding process may be the second preset information.
  • the output result of a third encoding node in the first level of third encoding processing can be used as the input information of the third encoding node of the same order in the second level of third encoding processing, and so on, until the last level of third encoding processing.
  • the second preset information may be subjected to at least one level of third encoding processing to obtain the output result of the first third encoding node in the last level of third encoding processing, and this output result may be the semantic feature of the first time step of the at least one time step.
  • the character recognition result of the first time step can be obtained according to the semantic feature of the first time step and the character feature of the same time step.
  • the input information of the second third coding node in the first level of third coding processing can be the character recognition result of the first time step. At least one level of third encoding processing can then be performed on the character recognition result of the first time step to obtain the semantic feature of the second time step; further, the character recognition result of the second time step can be obtained according to the semantic feature of the second time step and the character feature of the same time step, and so on, until the last level of third encoding processing, in which the output result of the last third encoding node may be the semantic feature of the last time step.
  • the semantic feature of the target image at the kth time step can be obtained.
  • k is an integer greater than 1
  • in the case where the third coding node is not the first third coding node of the current level of third coding processing, its input information can also include the output result of the previous third coding node of the same level, so that the input information of earlier third coding nodes can be passed to later ones, memorized over the long term, and reflected in more accurate semantic features.
  • the process of determining the semantic feature from the second preset information can be implemented with a neural network similar to the one shown in Fig. 2, the semantic feature of each time step being taken from the output result of the corresponding third encoding node.
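  • A minimal sketch of this semantic module's autoregressive third encoding, assuming two stacked LSTM cells; the class name, the shared embedding table, and the extra "<start>" slot are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemanticModule(nn.Module):
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.embed = nn.Embedding(num_classes + 1, dim)  # +1 slot for "<start>"
        self.cell1 = nn.LSTMCell(dim, dim)  # first level of third encoding
        self.cell2 = nn.LSTMCell(dim, dim)  # second level of third encoding

    def step(self, token: torch.Tensor, s1=None, s2=None):
        # token: previous character recognition result ("<start>" at step 1).
        x = self.embed(token)        # (batch, dim)
        s1 = self.cell1(x, s1)
        s2 = self.cell2(s1[0], s2)
        return s2[0], s1, s2         # semantic feature c_t plus carried states
```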
  • a neural network may be used to obtain the character recognition result of the target image.
  • the following uses an example to illustrate the process of obtaining the character recognition result of the target image by using the neural network.
  • Fig. 3 shows a block diagram of an example of obtaining a character recognition result by using a neural network according to an embodiment of the present disclosure.
  • the neural network may include an encoder and a decoder.
  • the target image can be input into the encoder of the neural network, and the image feature of the target image can be extracted by the encoder to obtain the first image feature F of the target image.
  • a 31-layer residual neural network (ResNet) architecture can be used to perform image feature extraction on the target image.
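  • Illustrative feature extraction only: the text names a 31-layer ResNet, for which torchvision has no drop-in model, so a standard resnet18 is used below purely as a placeholder backbone.

```python
import torch
import torchvision.models as models

# Keep the convolutional stages, drop the classifier head.
backbone = torch.nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
img = torch.randn(1, 3, 32, 128)   # a text-line image (height 32, width 128)
F = backbone(img)                  # first image feature: (1, 512, 1, 4)
F = F.permute(0, 2, 3, 1)          # to (batch, height, width, channels)
```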
  • the encoder may include a position information enhancement module, and the position information enhancement module may be used to enhance the position information in the first image feature to obtain the second image feature of the target image
  • the network architecture of the location information enhancement module can be shown in Figure 2.
  • the second image feature can then be input into the attention module of the decoder, which performs matrix multiplication and activation operations on the second image feature and the position vector h_t to obtain the attention weight; the attention weight is then used to perform feature weighting on the first image feature F, that is, matrix multiplication of the attention weight and the first image feature, to obtain the character feature of the target image.
  • the decoder also includes a dynamic fusion module, which can be used to fuse character features and semantic features, and then input the fusion result into the fully connected layer to obtain the character recognition result.
  • the decoder also includes a position encoding module, and multiple constants "<next>" (first preset information) can be sequentially input into the position encoding module, that is, one constant "<next>" is input for each time step.
  • the position encoding module may include two encoding layers (corresponding to the second encoding processing), and may encode the input "<next>" tokens to obtain the position vector h_t at the t-th time step.
  • the position coding module may include a two-layer coding layer.
  • the decoder also includes a semantic module. A special token "<start>" (second preset information) can be input into the semantic module as the input information of the first time step to obtain the semantic feature of the first time step output by the semantic module. The character recognition result y_0 of the first time step can then be used as the input of the second time step of the semantic module to obtain the semantic feature of the second time step, and so on; in general, the semantic module outputs the semantic feature c_t at the t-th time step.
  • the semantic module may include a two-layer coding layer.
  • the network architecture of the position coding module and the semantic module can be similar to the network architecture in FIG. 2, and will not be repeated here.
  • the encoder includes a position information enhancement module, and the decoder includes a position encoding module, an attention module, a semantic module, and a dynamic fusion module. The position information enhancement module includes a two-layer LSTM (refer to Fig. 2): the two-layer LSTM encodes the first image feature of the target image from left to right to obtain the encoding result of the first image feature, and adds this encoding result to the first image feature to obtain the second image feature of the target image, which serves as the output of the position information enhancement module. The position encoding module includes a two-layer LSTM; each of its inputs is a specific token, so it is essentially a character-length counter, and it can be used to perform two levels of second encoding processing on the at least one piece of preset information to obtain the position vector. The position vector and the second image feature are input into the attention module, which performs matrix multiplication and activation operations on them to obtain the attention weight; a weighted average of the first image feature is then taken according to the attention weight to obtain the character feature of the target image. The second preset information is input into the semantic module to obtain the semantic features of the target image, and the dynamic fusion module fuses the character features and the semantic features, the fusion result being input into the fully connected layer to obtain the character recognition result.
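  • To show how these illustrative pieces could fit together per time step, the following sketch chains the modules defined in the examples above; the interfaces, the assumption that the channel size equals the module dimension, and the greedy argmax decoding are all assumptions, not details from the patent.

```python
import torch

def recognize(F, F_enh, pos_encoder, semantic, fusion, start_id, max_len=25):
    # F, F_enh: first and second image features, shape (H, W, C); the channel
    # size C is assumed equal to the module dimension used by the sketches.
    h = pos_encoder(max_len)           # position vectors h_1 .. h_T
    token = torch.tensor([start_id])   # "<start>" seeds the first time step
    s1 = s2 = None
    result = []
    for t in range(max_len):
        c_t, s1, s2 = semantic.step(token, s1, s2)            # semantic feature
        g_t = character_feature(F, F_enh, h[t]).unsqueeze(0)  # character feature
        y_t = fusion(c_t, g_t)                                # fused logits
        token = y_t.argmax(dim=-1)                            # greedy decoding
        result.append(int(token))
    return result
```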
  • the character encoding scheme adopted by the embodiments of the present disclosure enhances the position information between characters, reduces the dependence of the character recognition result on the semantics, and makes the character recognition more accurate.
  • The character encoding scheme provided by the present disclosure can be applied to more complex character recognition scenarios, for example, the recognition of irregular characters and the recognition of non-semantic characters, and can also be applied to scenarios such as image review and image analysis.
  • the present disclosure also provides devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any character recognition method provided in the present disclosure.
  • For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section; details are not repeated here.
  • The writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
  • Fig. 4 shows a block diagram of a character recognition device according to an embodiment of the present disclosure. As shown in Fig. 4, the device includes:
  • the obtaining part 41 is configured to obtain a target image to be recognized
  • the determining part 42 is configured to obtain the character feature of the target image based on the determined position vector and the first image feature of the target image, wherein the position vector is determined based on the position features of the characters in the preset information sequence;
  • the recognition part 43 is configured to recognize characters in the target image based on the character characteristics, and obtain a character recognition result of the target image.
  • the determining part 42 is further configured to encode a first image feature of the target image to obtain an encoding result of the first image feature; determine the second image feature of the target image according to the encoding result of the first image feature; and obtain the character feature of the target image based on the determined position vector, the first image feature, and the second image feature.
  • the determining part 42 is further configured to sequentially perform at least one level of first encoding processing on the multiple first-dimensional feature vectors of the first image feature to obtain the encoding result of the first image feature.
  • the determining part 42 is further configured to, for the first-level first encoding process in the at least one level of first encoding processing, use N first encoding nodes to sequentially encode the input information of the encoding nodes to obtain the output results of the N first encoding nodes; where, in the case of 1 < i ≤ N, the input information of the i-th first encoding node includes the output result of the (i-1)-th first encoding node, and N and i are positive integers; the encoding result of the first image feature is obtained according to the output results of the N first encoding nodes.
  • the input information of the first encoding node further includes the first-dimensional feature vector of the first image feature or the output result of the first encoding process of the previous stage.
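The chain of N encoding nodes, in which node i consumes both its own feature vector and node (i-1)'s output, is exactly the recurrence an LSTM cell unrolls; a sketch under that assumption (function name and shapes are illustrative):

```python
import torch
import torch.nn as nn

def run_encoding_level(cell: nn.LSTMCell, feature_vectors: torch.Tensor):
    # feature_vectors: (N, C) -- the N first-dimensional feature vectors.
    # Node i receives the i-th feature vector plus node (i-1)'s output,
    # carried here as the recurrent state.
    state = None
    outputs = []
    for i in range(feature_vectors.size(0)):
        state = cell(feature_vectors[i].unsqueeze(0), state)
        outputs.append(state[0])          # output result of node i
    return torch.cat(outputs, dim=0)      # (N, C) encoding results
```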
  • the determining part 42 is further configured to determine an attention weight based on the position vector and the second image feature, and use the attention weight to perform feature weighting on the first image feature to obtain the character feature of the target image.
  • the device further includes:
  • the encoding part is configured to obtain a preset information sequence including at least one first preset information; sequentially perform at least one level of second encoding processing on the at least one first preset information to obtain the position vector.
  • the encoding part is further configured to, for the first-level second encoding process in the at least one level of second encoding processing, use M second encoding nodes to sequentially encode the input information of the encoding nodes to obtain the output result of the M-th second encoding node; where, in the case of 1 < j ≤ M, the input information of the j-th second encoding node includes the output result of the (j-1)-th second encoding node, and M and j are positive integers; the position vector is obtained according to the output result of the M-th second encoding node.
  • the input information of the second encoding node further includes the first preset information or the output result of the second encoding process of the previous stage.
  • the recognition part 43 is further configured to extract the semantic feature of the target image, and obtain the character recognition result of the target image based on the semantic feature of the target image and the character feature.
  • the recognition part 43 is further configured to sequentially determine the semantic feature of the target image at each of at least one time step based on the acquired second preset information, and obtain the character recognition result of the target image at the at least one time step based on the semantic feature of each time step and the character feature.
  • the recognition part 43 is further configured to perform at least one level of third encoding processing on the second preset information to obtain the semantic feature of the first time step in the at least one time step, and to perform at least one level of third encoding processing on the character recognition result of the target image at the (k-1)-th time step to obtain the semantic feature of the target image at the k-th time step, where k is an integer greater than 1.
  • In embodiments of the present disclosure and other embodiments, a "part" may be a part of a circuit, a part of a processor, a part of a program or software, and the like; it may also be a unit, a module, or non-modular.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • Fig. 5 is a block diagram showing a character recognition device 800 according to an exemplary embodiment.
  • the device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.
  • the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
  • the processing component 802 generally controls the overall operations of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
  • the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
  • the memory 804 is configured to store various types of data to support operations in the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 806 provides power to various components of the device 800.
  • the power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the device 800.
  • the multimedia component 808 includes a screen that provides an output interface between the device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC), and when the device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
  • the audio component 810 further includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
  • the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 814 includes one or more sensors for providing the device 800 with various aspects of status assessment.
  • the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components (for example, the display and keypad of the device 800); the sensor component 814 can also detect a position change of the device 800 or a component of the device 800, the presence or absence of contact between the user and the device 800, the orientation or acceleration/deceleration of the device 800, and temperature changes of the device 800.
  • the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
  • the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices.
  • the device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the apparatus 800 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
  • a computer-readable storage medium such as a memory 804 including computer program instructions, which can be executed by the processor 820 of the device 800 to complete the foregoing method.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory 804 to execute the above method.
  • the electronic device can be provided as a terminal, server or other form of device.
  • Fig. 6 is a block diagram showing an electronic device 1900 according to an exemplary embodiment.
  • the electronic device 1900 may be provided as a server, as shown in Fig. 6.
  • the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by the memory 1932, for storing instructions executable by the processing component 1922, such as application programs.
  • the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described methods.
  • the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
  • a computer-readable storage medium such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
  • the present disclosure may be a system, method and/or computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
  • the computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.
  • the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to realize various aspects of the present disclosure.
  • These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that when the instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause computers, programmable data processing apparatuses, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for realizing the specified logical function. In some alternative implementations, the functions marked in the blocks may also occur in an order different from the order marked in the drawings; for example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the target image to be recognized can be obtained; then, based on the determined position vector and the first image feature of the target image, the character feature of the target image is obtained; and then the characters in the target image are recognized based on the character feature to obtain the character recognition result of the target image.
  • The position vector is determined based on the position features of the characters in the preset information sequence and can represent the position features between characters, so that in the character recognition process the influence of the position features between characters on the character recognition result is increased and the reliance of the character recognition process on semantic features is reduced, thereby improving the accuracy of character recognition.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)
PCT/CN2021/081759 2020-04-16 2021-03-19 字符识别方法及装置、电子设备和存储介质 WO2021208666A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021567034A JP2022533065A (ja) 2020-04-16 2021-03-19 文字認識方法及び装置、電子機器並びに記憶媒体
KR1020227000935A KR20220011783A (ko) 2020-04-16 2021-03-19 심볼 식별 방법 및 장치, 전자 기기 및 저장 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010301340.3A CN111539410B (zh) 2020-04-16 2020-04-16 字符识别方法及装置、电子设备和存储介质
CN202010301340.3 2020-04-16

Publications (1)

Publication Number Publication Date
WO2021208666A1 true WO2021208666A1 (zh) 2021-10-21

Family

ID=71974957

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081759 WO2021208666A1 (zh) 2020-04-16 2021-03-19 字符识别方法及装置、电子设备和存储介质

Country Status (5)

Country Link
JP (1) JP2022533065A (ko)
KR (1) KR20220011783A (ko)
CN (1) CN111539410B (ko)
TW (1) TW202141352A (ko)
WO (1) WO2021208666A1 (ko)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546810A (zh) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 图像元素类别的识别方法及装置
WO2024027349A1 (zh) * 2022-08-05 2024-02-08 中南大学 一种印刷体数学公式识别方法、装置及存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539410B (zh) * 2020-04-16 2022-09-06 深圳市商汤科技有限公司 字符识别方法及装置、电子设备和存储介质
CN113516146A (zh) * 2020-12-21 2021-10-19 腾讯科技(深圳)有限公司 一种数据分类方法、计算机及可读存储介质
CN113052156B (zh) * 2021-03-12 2023-08-04 北京百度网讯科技有限公司 光学字符识别方法、装置、电子设备和存储介质
CN113610081A (zh) * 2021-08-12 2021-11-05 北京有竹居网络技术有限公司 一种字符识别方法及其相关设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615006A (zh) * 2018-12-10 2019-04-12 北京市商汤科技开发有限公司 文字识别方法及装置、电子设备和存储介质
CN110569846A (zh) * 2019-09-16 2019-12-13 北京百度网讯科技有限公司 图像文字识别方法、装置、设备及存储介质
US20200097718A1 (en) * 2018-09-26 2020-03-26 Leverton Holding Llc Named entity recognition with convolutional networks
CN111539410A (zh) * 2020-04-16 2020-08-14 深圳市商汤科技有限公司 字符识别方法及装置、电子设备和存储介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100555308C (zh) * 2005-07-29 2009-10-28 富士通株式会社 地址识别装置和方法
JP5417113B2 (ja) * 2009-10-02 2014-02-12 シャープ株式会社 情報処理装置、情報処理方法、プログラムおよび記録媒体
US10354168B2 (en) * 2016-04-11 2019-07-16 A2Ia S.A.S. Systems and methods for recognizing characters in digitized documents
RU2691214C1 (ru) * 2017-12-13 2019-06-11 Общество с ограниченной ответственностью "Аби Продакшн" Распознавание текста с использованием искусственного интеллекта
CN108062290B (zh) * 2017-12-14 2021-12-21 北京三快在线科技有限公司 消息文本处理方法及装置、电子设备、存储介质
CN110321755A (zh) * 2018-03-28 2019-10-11 中移(苏州)软件技术有限公司 一种识别方法及装置
JP2019215647A (ja) * 2018-06-12 2019-12-19 キヤノンマーケティングジャパン株式会社 情報処理装置、その制御方法及びプログラム。
CN110619325B (zh) * 2018-06-20 2024-03-08 北京搜狗科技发展有限公司 一种文本识别方法及装置
CN109492679A (zh) * 2018-10-24 2019-03-19 杭州电子科技大学 基于注意力机制与联结时间分类损失的文字识别方法
CN109919174A (zh) * 2019-01-16 2019-06-21 北京大学 一种基于门控级联注意力机制的文字识别方法
CN110659640B (zh) * 2019-09-27 2021-11-30 深圳市商汤科技有限公司 文本序列的识别方法及装置、电子设备和存储介质
CN110991560B (zh) * 2019-12-19 2023-07-07 深圳大学 一种结合上下文信息的目标检测方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200097718A1 (en) * 2018-09-26 2020-03-26 Leverton Holding Llc Named entity recognition with convolutional networks
CN109615006A (zh) * 2018-12-10 2019-04-12 北京市商汤科技开发有限公司 文字识别方法及装置、电子设备和存储介质
CN110569846A (zh) * 2019-09-16 2019-12-13 北京百度网讯科技有限公司 图像文字识别方法、装置、设备及存储介质
CN111539410A (zh) * 2020-04-16 2020-08-14 深圳市商汤科技有限公司 字符识别方法及装置、电子设备和存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024027349A1 (zh) * 2022-08-05 2024-02-08 中南大学 一种印刷体数学公式识别方法、装置及存储介质
CN115546810A (zh) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 图像元素类别的识别方法及装置
CN115546810B (zh) * 2022-11-29 2023-04-11 支付宝(杭州)信息技术有限公司 图像元素类别的识别方法及装置

Also Published As

Publication number Publication date
CN111539410A (zh) 2020-08-14
KR20220011783A (ko) 2022-01-28
CN111539410B (zh) 2022-09-06
TW202141352A (zh) 2021-11-01
JP2022533065A (ja) 2022-07-21

Similar Documents

Publication Publication Date Title
WO2021208666A1 (zh) 字符识别方法及装置、电子设备和存储介质
JP6916970B2 (ja) ビデオ処理方法及び装置、電子機器並びに記憶媒体
TWI732338B (zh) 文本序列的識別方法、電子設備和電腦可讀存儲介質
TWI740309B (zh) 圖像處理方法及裝置、電子設備和電腦可讀儲存介質
CN111310616B (zh) 图像处理方法及装置、电子设备和存储介质
WO2021051650A1 (zh) 人脸和人手关联检测方法及装置、电子设备和存储介质
CN111612070B (zh) 基于场景图的图像描述生成方法及装置
WO2021208667A1 (zh) 图像处理方法及装置、电子设备和存储介质
US20220292265A1 (en) Method for determining text similarity, storage medium and electronic device
WO2021139120A1 (zh) 网络训练方法及装置、图像生成方法及装置
CN109615006B (zh) 文字识别方法及装置、电子设备和存储介质
CN109145150B (zh) 目标匹配方法及装置、电子设备和存储介质
CN110781813B (zh) 图像识别方法及装置、电子设备和存储介质
CN111242303B (zh) 网络训练方法及装置、图像处理方法及装置
WO2019165832A1 (zh) 文字信息处理方法、装置及终端
WO2020220807A1 (zh) 图像生成方法及装置、电子设备及存储介质
WO2019205605A1 (zh) 人脸特征点的定位方法及装置
CN110633470A (zh) 命名实体识别方法、装置及存储介质
WO2022141969A1 (zh) 图像分割方法及装置、电子设备、存储介质和程序
TWI770531B (zh) 人臉識別方法、電子設備和儲存介質
WO2023024439A1 (zh) 一种行为识别方法及装置、电子设备和存储介质
WO2023092975A1 (zh) 图像处理方法及装置、电子设备、存储介质及计算机程序产品
CN113157923B (zh) 实体分类方法、装置及可读存储介质
CN114842404A (zh) 时序动作提名的生成方法及装置、电子设备和存储介质
CN115422932A (zh) 一种词向量训练方法及装置、电子设备和存储介质

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021567034

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21789539

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20227000935

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21789539

Country of ref document: EP

Kind code of ref document: A1