CN110858307B - Character recognition model training method and device and character recognition method and device



Publication number
CN110858307B
Authority
CN
China
Prior art keywords
content
sample
character recognition
model
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810973521.3A
Other languages
Chinese (zh)
Other versions
CN110858307A (en)
Inventor
江建军
郑凯
段立新
李建丽
Current Assignee
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd
Priority to CN201810973521.3A
Publication of CN110858307A
Application granted
Publication of CN110858307B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a character recognition model training method and device and a character recognition method and device. The training method comprises the following steps: acquiring a sample image, wherein the sample image comprises a plurality of sample contents; determining content feature information of each sample content and a relationship probability matrix between the sample characters contained in that sample content; and training the character recognition model by taking the content feature information of each sample content and the relationship probability matrix between the sample characters contained in the sample content as input features of the character recognition model to be trained, and taking the sample characters contained in the sample content as its output results. With the method and device, the target characters in a target image can be recognized by utilizing the relationship probability matrix between the target characters contained in the target content, with high recognition accuracy and efficiency.

Description

Character recognition model training method and device and character recognition method and device
Technical Field
The application relates to the technical field of image-text processing, in particular to a character recognition model training method and device and a character recognition method and device.
Background
Optical character recognition (OCR) is a common image-based character recognition technology that recognizes optical characters in pictures and translates them into computer text through image processing and pattern recognition techniques.
In the related art, after a picture to be recognized is acquired, it can be recognized through an OCR recognition model, and the OCR recognition result of the acquired image is directly used as the final recognition result. However, learning character relationships directly from images is difficult, which results in low recognition accuracy and recognition efficiency for related character recognition schemes based on an OCR recognition model and, to a certain extent, limits the wide application of character recognition.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method and an apparatus for training a character recognition model, and a method and an apparatus for recognizing characters, so as to improve accuracy and efficiency of character recognition.
In a first aspect, an embodiment of the present application provides a method for training a character recognition model, including:
acquiring a sample image; wherein the sample image comprises a plurality of sample contents;
confirming content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content;
and taking the content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content as input characteristics of the character recognition model to be trained, taking the sample characters contained in the sample content as output results of the character recognition model to be trained, and training to obtain the character recognition model.
With reference to the first aspect, the present application provides a first possible implementation manner of the first aspect, where the training of the character recognition model by using the content feature information of each sample content and the relationship probability matrix between the sample characters included in the sample content as the input features of the character recognition model to be trained and using the sample characters included in the sample content as the output result of the character recognition model to be trained includes:
sequentially taking the determined content characteristic information of each sample content as the input characteristic of a first character recognition submodel, taking the corresponding sample content as the output result of the first character recognition submodel, and training the first character recognition submodel to obtain an initial training first character recognition submodel;
for each sample content in the sample image, taking content feature information of the sample content as input of the initial training first character recognition submodel to obtain recognition content, taking the recognition content as input feature of a second character recognition submodel, taking a relation probability matrix between recognition characters contained in the recognition content as an output result of the second character recognition submodel, and training the second character recognition submodel;
for each sample content in the sample image, taking the content characteristic information of the sample content and a relation probability matrix obtained by identifying the identification content corresponding to the sample content by the trained second character identification submodel as the input characteristic of the initially trained first character identification submodel, and retraining the initially trained first character identification submodel again; the character recognition model comprises a retrained initial training first character recognition submodel and a trained second character recognition submodel.
With reference to the first possible implementation manner of the first aspect, the present application provides a second possible implementation manner of the first aspect, where the taking the recognition content as an input feature of a second character recognition submodel and taking a relationship probability matrix between recognition characters included in the recognition content as an output result of the second character recognition submodel includes:
for each sample content in the sample image, extracting a character coding matrix of a recognition character contained in the recognition content from the recognition content obtained by recognizing the sample content by the initial training first character recognition sub-model;
and taking the character encoding matrix of the extracted recognition character as the input characteristic of a second character recognition submodel, and taking the relation probability matrix between the recognition characters contained in the recognition content as the output result of the second character recognition submodel.
With reference to the second possible implementation manner of the first aspect, the present application provides a third possible implementation manner of the first aspect, where after training the second character recognition submodel and before retraining the initially trained first character recognition submodel, the method further includes:
for each sample content in the sample image, determining an image area of the sample image corresponding to the sample content;
based on the size of the determined image area, expanding a relation probability matrix between identification characters contained in identification content corresponding to the sample content to obtain an expanded relation probability matrix;
retraining the initially trained first character recognition submodel, including:
and, for each sample content in the sample image, retraining the initially trained first character recognition submodel by taking, as its input features, the content feature information of the sample content and the expanded relationship probability matrix obtained by expanding the relationship probability matrix produced by the trained second character recognition submodel for the recognition content corresponding to the sample content.
In a second aspect, the present application further provides a method for recognizing a character based on a character recognition model trained in any one of the first aspect and the first possible implementation manner to the third possible implementation manner of the first aspect, including:
acquiring a target image; wherein the target image comprises a plurality of target contents;
confirming content characteristic information of each target content and a relation probability matrix between target characters contained in the target content;
and inputting the content characteristic information of each target content and a relation probability matrix between target characters contained in the target content into the character recognition model, and recognizing to obtain the target characters contained in the target content.
In a third aspect, the present application further provides a character recognition model training apparatus, including:
the image acquisition module is used for acquiring a sample image; wherein the sample image comprises a plurality of sample contents;
the information confirming module is used for confirming the content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content;
and the model training module is used for taking the content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content as input characteristics of the character recognition model to be trained, taking the sample characters contained in the sample content as output results of the character recognition model to be trained, and training to obtain the character recognition model.
With reference to the third aspect, the present application provides a first possible implementation manner of the third aspect, wherein the model training module includes:
the first sub-model training unit is used for sequentially taking the determined content characteristic information of each sample content as the input characteristic of a first character recognition sub-model, taking the corresponding sample content as the output result of the first character recognition sub-model, and training the first character recognition sub-model to obtain an initial training first character recognition sub-model;
a second sub-model training unit, configured to, for each sample content in the sample image, use content feature information of the sample content as an input of the initial training first character recognition sub-model to obtain recognition content, use the recognition content as an input feature of a second character recognition sub-model, use a relationship probability matrix between recognition characters included in the recognition content as an output result of the second character recognition sub-model, and train the second character recognition sub-model;
the first sub-model training unit is further used for retraining the initial training first character recognition sub-model by taking the content feature information of the sample content and a relation probability matrix obtained by recognizing the recognition content corresponding to the sample content by the trained second character recognition sub-model as the input feature of the initial training first character recognition sub-model aiming at each sample content in the sample image; the character recognition model comprises a retrained initial training first character recognition submodel and a trained second character recognition submodel.
With reference to the first possible implementation manner of the third aspect, the present application provides a second possible implementation manner of the third aspect, wherein the second submodel training unit is specifically configured to:
for each sample content in the sample image, extracting a character coding matrix of a recognition character contained in the recognition content from the recognition content obtained by recognizing the sample content by the initial training first character recognition sub-model;
and taking the character coding matrix of the extracted recognition characters as the input characteristic of a second character recognition submodel, and taking a relation probability matrix between the recognition characters contained in the recognition content as the output result of the second character recognition submodel.
With reference to the second possible implementation manner of the third aspect, the present application provides a third possible implementation manner of the third aspect, where the method further includes:
the matrix expansion module is used for, for each sample content in the sample image, determining the image area of the sample image corresponding to the sample content, and expanding, based on the size of the determined image area, the relationship probability matrix between the recognition characters contained in the recognition content corresponding to the sample content, to obtain an expanded relationship probability matrix;
the first sub-model training unit is specifically configured to, for each sample content in the sample image, retrain the initially trained first character recognition submodel by taking, as its input features, the content feature information of the sample content and the expanded relationship probability matrix obtained by expanding the relationship probability matrix produced by the trained second character recognition submodel for the recognition content corresponding to the sample content.
In a fourth aspect, an embodiment of the present application further provides an apparatus for recognizing a character based on a character recognition model trained by any one of the third aspect and the first possible implementation manner to the third possible implementation manner of the third aspect, including:
the image acquisition module is used for acquiring a target image; wherein the target image comprises a plurality of target contents;
the information confirmation module is used for confirming the content characteristic information of each target content and a relation probability matrix between target characters contained in the target content;
and the character recognition module is used for inputting the content characteristic information of each target content and a relation probability matrix between target characters contained in the target content into the character recognition model, and recognizing to obtain the target characters contained in the target content.
In the above scheme provided by the embodiment of the application, the content feature information of the sample content included in the sample image and the relationship probability matrix between the sample characters included in the sample content are used as the input features of the character recognition model to be trained, and the sample characters included in the sample content are used as the output results of the character recognition model to be trained, so as to train and obtain the character recognition model. The character recognition model trained by the scheme can recognize the target characters in the target image by utilizing the relation probability matrix between the target characters contained in the target content, and the recognition accuracy and efficiency are high.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flow chart illustrating a method for training a character recognition model according to an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating another method for training a character recognition model provided by an embodiment of the present application;
FIG. 3 is a flow chart illustrating a method for recognizing characters provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram illustrating a training apparatus for a character recognition model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram illustrating an apparatus for recognizing characters according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present application;
fig. 7 shows a schematic structural diagram of another computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Because learning character relationships from images is difficult, related technical schemes performing character recognition with an OCR recognition model have low recognition accuracy and recognition efficiency. In view of this, an embodiment of the present application provides a character recognition model training method to improve the accuracy and efficiency of character recognition, as described in the following embodiments.
As shown in fig. 1, a flowchart of a character recognition model training method provided in an embodiment of the present application is provided, where an execution subject of the method may be a computer device, and the training method includes the following steps:
s101, obtaining a sample image; wherein, the sample image comprises a plurality of sample contents.
Here, a sample image needs to be acquired in advance. The sample image may be in a format such as JPG, PNG, GIF, BMP, or DOC, and may include a plurality of sample contents. A sample content may be a word (such as a character or a phrase), a number, a mathematical formula, or the like, which is not limited in the embodiment of the present application.
S102, confirming content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content.
Before identifying the sample content in the sample image, the sample content (for example, the area where the text content is located) may first be found in the sample image, and the corresponding text area separated from it, so that feature extraction can be performed on the image area containing only the sample content to obtain the corresponding content feature information. Considering that a related technical scheme performing character recognition with an OCR recognition model is limited by the difficulty of learning character relationships from images, the embodiment of the present application performs recognition by combining the content feature information with a previously determined relationship probability matrix between the sample characters contained in the sample content. That is, the character recognition model training method provided by the embodiment of the present application learns the character relationships through the relationship probability matrix corresponding to the sample content, avoiding learning them directly from the image, and thereby improves recognition efficiency while ensuring recognition accuracy.
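As an illustrative sketch of the idea above (the bigram estimator and the alphabet are assumptions, since the application does not specify how the relationship probability matrix is computed), such a matrix can be estimated from labelled sample text:

```python
def relation_probability_matrix(text, alphabet):
    """Estimate P(next char | current char) from a labelled sample string.

    A hypothetical stand-in for the relationship probability matrix;
    bigram frequencies are used purely for illustration.
    """
    idx = {ch: i for i, ch in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(text, text[1:]):
        if a in idx and b in idx:
            counts[idx[a]][idx[b]] += 1.0
    # Row-normalise; fall back to a uniform row for unseen characters.
    for row in counts:
        total = sum(row)
        row[:] = [v / total for v in row] if total else [1.0 / n] * n
    return counts

matrix = relation_probability_matrix("learning", "aegilnr")
```

Each row of the result is a probability distribution over the character that follows, which is the form of pairwise association the training step below consumes.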
S103, taking the content characteristic information of each sample content and the relation probability matrix between sample characters contained in the sample content as input characteristics of the character recognition model to be trained, taking the sample characters contained in the sample content as output results of the character recognition model to be trained, and training to obtain the character recognition model.
In the character recognition model training stage, the content feature information of each sample content confirmed in S102 and the relationship probability matrix between the sample characters included in the sample content are used as the input features of the character recognition model to be trained, and the sample characters included in the sample content are used as the output results of the character recognition model to be trained, so as to obtain the parameter information of the character recognition model through training, that is, obtain the trained character recognition model. Thus, the target characters in the target image can be recognized through the trained character recognition model.
In a specific implementation, the character recognition model may be implemented by a combination of a first character recognition submodel and a second character recognition submodel. As shown in fig. 2, the training process of the character recognition model specifically includes the following steps:
s201, sequentially taking the determined content characteristic information of each sample content as an input characteristic of a first character recognition submodel, taking the corresponding sample content as an output result of the first character recognition submodel, and training the first character recognition submodel to obtain an initial training first character recognition submodel;
s202, aiming at each sample content in a sample image, taking content characteristic information of the sample content as input of an initial training first character recognition submodel to obtain recognition content, taking the recognition content as input characteristics of a second character recognition submodel, taking a relation probability matrix between recognition characters contained in the recognition content as an output result of the second character recognition submodel, and training the second character recognition submodel;
s203, aiming at each sample content in the sample image, taking the content characteristic information of the sample content and a relation probability matrix obtained by identifying the identification content corresponding to the sample content by the trained second character identification submodel as the input characteristic of the initially trained first character identification submodel, and retraining the initially trained first character identification submodel again; the character recognition model comprises a retrained initial training first character recognition submodel and a trained second character recognition submodel.
Here, the first character recognition submodel is configured to map content feature information of each sample content to corresponding sample content, the second character recognition submodel is configured to map recognition content output by the first character recognition submodel on the sample content to a relationship probability matrix between recognition characters included in the recognition content, and the relationship probability matrix recognized by the second character recognition submodel may be combined with the content feature information of the sample content to serve as an input feature of the first character recognition submodel to train the first character recognition submodel again, so as to improve accuracy of recognition of sample characters included in the sample content.
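The three phases S201 to S203 (initial training, relationship-matrix training, retraining with augmented features) can be orchestrated as in the following sketch. `MemorisingModel` is a deliberately trivial stand-in for the OCR and LSTM submodels, and the sample structure and `fit`/`predict` interface are assumptions, not part of the application:

```python
class MemorisingModel:
    """Trivial stand-in for a submodel: memorises its training pairs.
    Real submodels would be OCR / LSTM networks."""
    def fit(self, inputs, outputs):
        self.table = {repr(i): o for i, o in zip(inputs, outputs)}

    def predict(self, x):
        return self.table[repr(x)]


def train_character_recognition_model(samples, first_model, second_model):
    """Orchestrates training phases S201-S203 (sketch)."""
    # S201: initial training of the first submodel on content features.
    first_model.fit([s["features"] for s in samples],
                    [s["content"] for s in samples])
    # S202: train the second submodel to map recognised content to a
    # relationship probability matrix.
    recognised = [first_model.predict(s["features"]) for s in samples]
    second_model.fit(recognised, [s["relation_matrix"] for s in samples])
    # S203: retrain the first submodel with the relationship probability
    # matrix appended to the content feature information.
    for s, r in zip(samples, recognised):
        s["augmented"] = list(s["features"]) + list(second_model.predict(r))
    first_model.fit([s["augmented"] for s in samples],
                    [s["content"] for s in samples])
    return first_model, second_model


samples = [{"features": [0.1, 0.9], "content": "hi",
            "relation_matrix": [0.5, 0.5]}]
first, second = train_character_recognition_model(
    samples, MemorisingModel(), MemorisingModel())
```

The point of the sketch is the data flow: the second submodel's output re-enters the first submodel's input features in the final phase.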
For the initial training of the first character recognition submodel, an existing optical character recognition (OCR) model, such as a random forest, a support vector machine (SVM), or a neural network (NN) model, may be used. In this way, based on the trained first character recognition submodel, the recognition content corresponding to each sample content (i.e., each image region) in the sample image can be obtained.
For training the second character recognition submodel, it mainly relies on the above-mentioned recognition content to directly train a relationship probability matrix between the recognition characters contained in the recognition content. The relationship probability matrix characterizes the degree of association between characters: for example, for a character sequence such as "lea?ning", the second character recognition submodel of the embodiment of the present application should, as far as possible, recognize the uncertain character '?' as 'r' rather than 'a' or other letters.
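A minimal sketch of this disambiguation step (the alphabet, the probability values, and the helper function are illustrative assumptions): given a row-stochastic relationship probability matrix, an uncertain character can be resolved from its neighbour by taking the most probable successor.

```python
def most_likely_next(matrix, alphabet, prev_char):
    """Return the character most strongly associated with prev_char
    according to a row-stochastic relationship probability matrix."""
    row = matrix[alphabet.index(prev_char)]
    return alphabet[row.index(max(row))]

alphabet = "aeglnr"
# Illustrative P(next | prev) rows, as might be estimated from text
# such as "learning"; only the 'a' row matters for this example.
matrix = [
    [0.05, 0.05, 0.05, 0.05, 0.10, 0.70],  # 'a' -> most likely 'r'
    [0.80, 0.05, 0.05, 0.05, 0.03, 0.02],  # 'e' -> most likely 'a'
    [0.20, 0.20, 0.15, 0.15, 0.15, 0.15],
    [0.20, 0.30, 0.10, 0.10, 0.20, 0.10],
    [0.10, 0.60, 0.10, 0.05, 0.05, 0.10],
    [0.10, 0.10, 0.10, 0.10, 0.60, 0.00],  # 'r' -> most likely 'n'
]
guess = most_likely_next(matrix, alphabet, "a")  # resolves "lea?ning" to 'r'
```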
The embodiment of the application aims to directly attach the relation probability matrix of the recognition characters included in the recognition content to the input characteristics of the first character recognition submodel for retraining, so that the problems of low efficiency and strong limitation caused by directly learning the image-character relation are avoided as much as possible, and the recognition accuracy and efficiency are high.
In a specific implementation, the second character recognition submodel maps an input sequence (i.e., the recognition content obtained by the initially trained first character recognition submodel from the sample content) to an output matrix (i.e., a relationship probability matrix between the recognition characters contained in the recognition content). The embodiment of the present application can adopt a special type of recurrent neural network (RNN), the long short-term memory (LSTM) network, for model training. That is, in the embodiment of the present application, the LSTM network gradually masters the required knowledge through repeated iterative learning and finally learns how to generate a relationship probability matrix meeting the requirements from the recognition content.
Here, the LSTM network described above can be described by the following equations (1) to (5):

$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + w_{ci} \odot c_{t-1} + b_i)$ (1)

$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + w_{cf} \odot c_{t-1} + b_f)$ (2)

$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$ (3)

$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + w_{co} \odot c_{t-1} + b_o)$ (4)

$h_t = o_t \odot \tanh(c_t)$ (5)

where $W_{xi}, W_{hi}, w_{ci}, b_i$ are the input gate parameters; $W_{xo}, W_{ho}, w_{co}, b_o$ are the output gate parameters; $W_{xf}, W_{hf}, w_{cf}, b_f$ are the forget gate parameters; and $W_{xc}, W_{hc}, b_c$ are the parameters associated with the input and the cell state, which can directly modify the memory cell. The symbol $\odot$ denotes element-wise multiplication. The gating units are implemented by multipliers, so their range is $[0, 1]$, corresponding to sigmoid nonlinear functions. We define $p_{lstm} = \{W_{xi}, W_{hi}, w_{ci}, b_i, W_{xo}, W_{ho}, w_{co}, b_o, W_{xf}, W_{hf}, w_{cf}, b_f, W_{xc}, W_{hc}, b_c\}$ as the merged parameter set of the LSTM network.
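Equations (1) to (5) above can be sketched directly in code; the hidden/input sizes and zero initialisation below are illustrative only, not prescribed by the application:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the peephole LSTM of equations (1)-(5); `p` holds
    the parameter set p_lstm. Shapes are illustrative."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["w_ci"] * c_prev + p["b_i"])  # (1)
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["w_cf"] * c_prev + p["b_f"])  # (2)
    c_t = f_t * c_prev + i_t * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])  # (3)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["w_co"] * c_prev + p["b_o"])  # (4)
    h_t = o_t * np.tanh(c_t)                                                             # (5)
    return h_t, c_t

# Zero-initialised parameters for hidden size 2 and input size 3.
d_h, d_x = 2, 3
p_lstm = {}
for gate in "ifco":
    p_lstm[f"W_x{gate}"] = np.zeros((d_h, d_x))
    p_lstm[f"W_h{gate}"] = np.zeros((d_h, d_h))
    p_lstm[f"b_{gate}"] = np.zeros(d_h)
for gate in "ifo":
    p_lstm[f"w_c{gate}"] = np.zeros(d_h)

h, c = lstm_step(np.ones(d_x), np.zeros(d_h), np.zeros(d_h), p_lstm)
```

With all parameters zero, the gates evaluate to 0.5 and the candidate cell input to zero, so both the new cell state and hidden state stay zero.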
In addition, the LSTM network may further include a fully connected layer, which adjusts the relationship probability matrix. The transformation can be computed as

y_i = α(W h_i + b)

where W ∈ R^{n×d} is a weight matrix, b ∈ R^{n×d} is the offset, and α is the activation function (softmax). The output is expressed as:

p_{i,j} = exp(x_{i,j}) / Σ_j exp(x_{i,j})

where x_{i,j} is the element at index (i, j) of the matrix produced by the linear transformation, and the output of the current layer can be viewed as a probability matrix over the input sentence.
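The fully connected layer followed by a row-wise softmax can be sketched as below; shapes and inputs are illustrative assumptions. Each row of the result is a probability distribution, matching the description of the output as a probability matrix.

```python
import numpy as np

def fully_connected_softmax(H, W, b):
    """Linear transformation followed by a row-wise softmax: each row of
    the LSTM output H becomes a probability distribution (one row of the
    relationship probability matrix). Shapes are illustrative assumptions."""
    X = H @ W + b                            # linear transformation
    X = X - X.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    E = np.exp(X)
    return E / E.sum(axis=1, keepdims=True)  # softmax per row

rng = np.random.default_rng(1)
P = fully_connected_softmax(rng.standard_normal((5, 8)),
                            rng.standard_normal((8, 10)),
                            np.zeros(10))
```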
In addition, after the recognition content is obtained by recognizing the sample content with the initially trained first character recognition submodel, a word-representation method such as word2vec can be used to convert the recognition content, as natural language, into digital information in vector form for machine recognition; this process is called encoding (Encoder). That is, a semantic vector is used to represent each word, and the semantic vector is then used as an input feature of the second character recognition submodel. The semantic vectors can be obtained by using a One-hot Representation word representation model. That is, in the embodiment of the present application, a very long vector may be used to represent a word: the length of the vector is the vocabulary size N of the dictionary, each vector has exactly one dimension equal to 1 with the remaining dimensions all 0, and the position of the 1 represents the position of the word in the dictionary. In other words, the word representation model stores word information in a sparse manner (each word is assigned a numeric identifier), and the representation form is relatively simple. Thus, a character encoding matrix can be associated with each sample content. According to the embodiment of the application, the character encoding matrix of the extracted recognition characters can be used as the input feature of the second character recognition submodel, and the relationship probability matrix between the recognition characters contained in the recognition content can be used as the output result of the second character recognition submodel, to train the second character recognition submodel.
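The one-hot character encoding matrix described above can be sketched as follows; the four-character dictionary is a toy assumption for illustration.

```python
def one_hot_encode(text, dictionary):
    """Build a character encoding matrix: one row per character, with a
    single 1 at the character's position in the dictionary and 0 elsewhere.
    The dictionary here is a toy example, not one defined by the patent."""
    index = {ch: i for i, ch in enumerate(dictionary)}
    n = len(dictionary)
    matrix = []
    for ch in text:
        row = [0] * n          # all dimensions 0 ...
        row[index[ch]] = 1     # ... except the character's dictionary position
        matrix.append(row)
    return matrix

m = one_hot_encode("but", "abtu")
```

With the dictionary "abtu", the character 'b' maps to row [0, 1, 0, 0], 'u' to [0, 0, 0, 1], and 't' to [0, 0, 1, 0]: a sparse representation in which each row sums to 1.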
According to the embodiment of the application, after the second character recognition submodel is trained and before the initially trained first character recognition submodel is retrained, the relationship probability matrix obtained by recognizing the recognition content corresponding to the sample content with the trained second character recognition submodel can be expanded to obtain an expanded relationship probability matrix. Both the expanded relationship probability matrix and the content feature information of the sample content are then used as input features of the initially trained first character recognition submodel, and the initially trained first character recognition submodel is retrained, which ensures the robustness of model training.
When the relationship probability matrix corresponding to any sample content is expanded, the expansion can be performed based on the size of the image area of the sample image corresponding to that sample content. In the embodiment of the application, small, meaningless labels can be inserted into the relationship probability matrices until they are expanded to the same width (in pixels) as the image area; the meaningless labels can later be filtered out, which weakens the length difference between the input and the output. For example, the sequence '-bb-u-tt' would be converted to 'but', where '-' is the meaningless label.
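Both directions of this process, expanding a label sequence to the pixel width of the image area and filtering the meaningless labels back out, can be sketched as below. The interleaving-and-padding rule is a simplified assumption; the collapse step reproduces the '-bb-u-tt' to 'but' example from the text.

```python
BLANK = "-"  # the meaningless (blank) label from the text

def expand_to_width(labels, width):
    """Pad a label sequence with blank labels until its length matches the
    pixel width of the image area. Interleaving blanks between labels first
    is a simplified sketch, not the patent's exact expansion rule."""
    out = []
    for ch in labels:
        out.extend([BLANK, ch])
    out.append(BLANK)
    while len(out) < width:
        out.append(BLANK)
    return "".join(out[:width])

def collapse(sequence):
    """Filter the meaningless labels back out: merge consecutive repeats,
    then drop the blanks, e.g. '-bb-u-tt' -> 'but'."""
    result = []
    prev = None
    for ch in sequence:
        if ch != prev and ch != BLANK:
            result.append(ch)
        prev = ch
    return "".join(result)
```

Collapsing is the inverse of expansion here: expanding 'but' to any width of at least 7 and collapsing the result recovers 'but'.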
Based on the character recognition model obtained by training in the above embodiment, an embodiment of the present application further provides a method for recognizing a character, as shown in fig. 3, which is a flowchart of the method for recognizing a character provided in the embodiment of the present application, and is applied to a computer device, where the method for recognizing a character includes the following steps:
s301, acquiring a target image; wherein the target image comprises a plurality of target contents;
s302, confirming content characteristic information of each target content and a relation probability matrix between target characters contained in the target content;
and S303, inputting the content characteristic information of each target content and the relation probability matrix between the target characters contained in the target content into a character recognition model, and recognizing to obtain the target characters contained in the target content.
Here, in the embodiment of the present application, the content feature information of each target content and the relationship probability matrix between the target characters included in the target content are input to the trained character recognition model, so that the target characters included in the target content can be recognized and obtained. The process of using the character recognition model is similar to the training process, the first character recognition submodel can be used for generating the relation probability matrix, and then the second character recognition submodel is used for recognizing the characters, so that the recognition accuracy and efficiency are ensured.
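Steps S301 to S303 can be sketched end to end as follows. Every function argument is a hypothetical callable standing in for a component the patent does not define in code (content localization, feature extraction, and the two trained submodels); this is an illustrative pipeline, not the patent's implementation.

```python
def recognize(target_image, first_submodel, second_submodel,
              extract_features, locate_contents):
    """Hypothetical sketch of steps S301-S303: all arguments are assumed
    callables, not APIs defined by the patent."""
    results = []
    for content in locate_contents(target_image):   # S301: target contents in the image
        features = extract_features(content)        # S302: content feature information
        draft = first_submodel(features)            # first-pass recognition content
        relation_matrix = second_submodel(draft)    # S302: relationship probability matrix
        # S303: final recognition conditioned on both input features
        results.append(first_submodel(features, relation_matrix))
    return results
```

With stub callables (e.g. a first submodel that ignores the optional relation matrix), the pipeline simply threads each located content through both submodels and returns one result per content.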
Based on the same inventive concept, the embodiment of the present application further provides a character recognition model training device corresponding to the character recognition model training method. Because the principle by which the device solves the problem is similar to that of the character recognition model training method in the embodiment of the present application, the implementation of the device can refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 4, which is a schematic structural diagram of a character recognition model training apparatus provided in an embodiment of the present application, the character recognition model training apparatus includes:
an image acquisition module 401, configured to acquire a sample image; wherein, the sample image comprises a plurality of sample contents;
an information confirming module 402, configured to confirm content feature information of each sample content and a relationship probability matrix between sample characters included in the sample content;
the model training module 403 is configured to use the content feature information of each sample content and the relationship probability matrix between the sample characters included in the sample content as input features of the character recognition model to be trained, and use the sample characters included in the sample content as output results of the character recognition model to be trained, so as to obtain the character recognition model through training.
In one embodiment, model training module 403 includes:
the first sub-model training unit is used for sequentially taking the determined content characteristic information of each sample content as the input characteristic of the first character recognition sub-model, taking the corresponding sample content as the output result of the first character recognition sub-model, and training the first character recognition sub-model to obtain an initial training first character recognition sub-model;
the second submodel training unit is used for taking the content characteristic information of the sample content as the input of an initial training first character recognition submodel to obtain recognition content aiming at each sample content in a sample image, taking the recognition content as the input characteristic of a second character recognition submodel, taking a relation probability matrix between recognition characters contained in the recognition content as the output result of the second character recognition submodel, and training the second character recognition submodel;
the first sub-model training unit is further used for retraining the initially trained first character recognition sub-model by taking the content characteristic information of the sample content and a relation probability matrix obtained by recognizing the recognition content corresponding to the sample content by the trained second character recognition sub-model as the input characteristic of the initially trained first character recognition sub-model aiming at each sample content in the sample image; the character recognition model comprises a retrained initial training first character recognition submodel and a trained second character recognition submodel.
In another embodiment, the second sub-model training unit is specifically configured to:
aiming at each sample content in the sample image, extracting a character coding matrix of a recognition character contained in the recognition content from the recognition content obtained by recognizing the sample content through an initial training first character recognition sub-model;
and taking the character coding matrix of the extracted recognition characters as the input characteristic of the second character recognition submodel, and taking the relation probability matrix between the recognition characters contained in the recognition content as the output result of the second character recognition submodel.
In another embodiment, the character recognition model training apparatus further includes:
a matrix expansion module 404, configured to determine, for each sample content in the sample image, an image area of the sample image corresponding to the sample content; based on the size of the determined image area, expanding a relation probability matrix between identification characters contained in identification content corresponding to the sample content to obtain an expanded relation probability matrix;
and the first sub-model training unit is specifically configured to retrain the initially trained first character recognition sub-model by taking, for each sample content in the sample image, the content feature information of the sample content and the expanded relationship probability matrix, obtained by expanding the relationship probability matrix produced when the trained second character recognition sub-model recognizes the recognition content corresponding to the sample content, as the input features of the initially trained first character recognition sub-model.
Based on the same application concept, the embodiment of the present application further provides a device for recognizing characters corresponding to the method for recognizing characters. Because the principle by which the device solves the problem is similar to that of the method for recognizing characters in the embodiment of the present application, the implementation of the device can refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 5, which is a schematic structural diagram of an apparatus for recognizing characters provided in an embodiment of the present application, the apparatus for recognizing characters includes:
an image obtaining module 501, configured to obtain a target image; the target image comprises a plurality of target contents;
an information confirming module 502, configured to confirm content feature information of each target content and a relationship probability matrix between target characters included in the target content;
the character recognition module 503 is configured to input the content feature information of each target content and the relationship probability matrix between the target characters included in the target content into the character recognition model, and recognize to obtain the target characters included in the target content.
As shown in fig. 6, a schematic structural diagram of a computer device provided in an embodiment of the present application is shown, where the computer device includes: a processor 601, a memory 602 and a bus 603, the memory 602 storing machine-readable instructions executable by the processor 601, the processor 601 and the memory 602 communicating via the bus 603 when the computer device is running, the machine-readable instructions when executed by the processor 601 performing the following:
acquiring a sample image; wherein, the sample image comprises a plurality of sample contents;
confirming content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content;
and taking the content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content as input characteristics of the character recognition model to be trained, taking the sample characters contained in the sample content as output results of the character recognition model to be trained, and training to obtain the character recognition model.
In one embodiment, in the processing executed by the processor 601, taking the content feature information of each sample content and the relationship probability matrix between the sample characters included in the sample content as the input features of the character recognition model to be trained, and taking the sample characters included in the sample content as the output result of the character recognition model to be trained, training the character recognition model to obtain a character recognition model, including:
sequentially taking the determined content characteristic information of each sample content as the input characteristic of the first character recognition submodel, taking the corresponding sample content as the output result of the first character recognition submodel, and training the first character recognition submodel to obtain an initial training first character recognition submodel;
aiming at each sample content in the sample image, taking the content characteristic information of the sample content as the input of an initial training first character recognition submodel to obtain recognition content, taking the recognition content as the input characteristic of a second character recognition submodel, taking a relation probability matrix between recognition characters contained in the recognition content as the output result of the second character recognition submodel, and training the second character recognition submodel;
for each sample content in the sample image, taking a relation probability matrix obtained by identifying the identification content corresponding to the sample content by the content characteristic information of the sample content and the trained second character identification submodel as the input characteristic of the initially trained first character identification submodel, and retraining the initially trained first character identification submodel again; the character recognition model comprises a retrained initial training first character recognition submodel and a trained second character recognition submodel.
In another embodiment, the processing executed by the processor 601, taking the recognition content as the input feature of the second character recognition submodel, and taking the relationship probability matrix between the recognition characters contained in the recognition content as the output result of the second character recognition submodel, includes:
aiming at each sample content in the sample image, extracting a character coding matrix of a recognition character contained in the recognition content from the recognition content obtained by recognizing the sample content through an initial training first character recognition sub-model;
and taking the character coding matrix of the extracted recognition characters as the input characteristic of the second character recognition submodel, and taking the relation probability matrix between the recognition characters contained in the recognition content as the output result of the second character recognition submodel.
In another embodiment, the processing executed by the processor 601, after the training of the second character recognition submodel, and before the retraining of the initially trained first character recognition submodel, further includes:
for each sample content in the sample image, determining an image area of the sample image corresponding to the sample content;
based on the size of the determined image area, expanding a relation probability matrix between identification characters contained in identification content corresponding to the sample content to obtain an expanded relation probability matrix;
the above-mentioned processing executed by the processor 601 is to train the initially trained first character recognition submodel again, and includes:
and for each sample content in the sample image, taking the content feature information of the sample content, together with the expanded relationship probability matrix obtained by expanding the relationship probability matrix produced when the trained second character recognition submodel recognizes the recognition content corresponding to the sample content, as the input features of the initially trained first character recognition submodel, and retraining the initially trained first character recognition submodel.
Fig. 7 is a schematic structural diagram of another computer device provided in an embodiment of the present application, where the computer device includes: a processor 701, a memory 702 and a bus 703, the memory 702 storing machine-readable instructions executable by the processor 701, the processor 701 and the memory 702 communicating via the bus 703 when the computer device is operating, the machine-readable instructions when executed by the processor 701 performing the following:
acquiring a target image; wherein the target image comprises a plurality of target contents;
confirming content characteristic information of each target content and a relation probability matrix between target characters contained in the target content;
and inputting the content characteristic information of each target content and a relation probability matrix between target characters contained in the target content into a character recognition model, and recognizing to obtain the target characters contained in the target content.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the processor 601, the steps of the character recognition model training method are performed.
Specifically, the storage medium can be a general storage medium, such as a mobile disk, a hard disk, and the like, and when a computer program on the storage medium is run, the character recognition model training method can be executed, so that the problem that the recognition accuracy and the recognition efficiency of the related technical scheme for performing character recognition by adopting an OCR recognition model are low is solved, and the effect of improving the accuracy and the efficiency of character recognition is achieved.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the processor 701, the steps of the method for recognizing a character are performed.
Specifically, the storage medium can be a general storage medium, such as a mobile disk, a hard disk, and the like, and when a computer program on the storage medium is run, the method for recognizing characters can be executed, so that the problem that the recognition accuracy and the recognition efficiency of the related technical scheme for performing character recognition by using an OCR recognition model are low is solved, and the effect of improving the accuracy and the efficiency of character recognition is achieved.
The computer program product of the character recognition model training method and the character recognition method provided in the embodiments of the present application includes a computer readable storage medium storing a program code, and instructions included in the program code may be used to execute the methods in the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A character recognition model training method is characterized by comprising the following steps:
acquiring a sample image; wherein the sample image comprises a plurality of sample contents;
confirming content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content; the content feature information of the sample content is obtained by feature extraction of an image area of the sample content, and a relationship probability matrix between sample characters contained in the sample content is used for representing the degree of association between the sample characters contained in the sample content;
and taking the content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content as input characteristics of the character recognition model to be trained, taking the sample characters contained in the sample content as output results of the character recognition model to be trained, and training to obtain the character recognition model.
2. The method according to claim 1, wherein the training of the character recognition model by using the content feature information of each sample content and the relationship probability matrix between the sample characters contained in the sample content as the input features of the character recognition model to be trained and using the sample characters contained in the sample content as the output result of the character recognition model to be trained comprises:
sequentially taking the determined content characteristic information of each sample content as the input characteristic of a first character recognition submodel, taking the corresponding sample content as the output result of the first character recognition submodel, and training the first character recognition submodel to obtain an initial training first character recognition submodel;
for each sample content in the sample image, taking content feature information of the sample content as input of the initial training first character recognition submodel to obtain recognition content, taking the recognition content as input feature of a second character recognition submodel, taking a relation probability matrix between recognition characters contained in the recognition content as an output result of the second character recognition submodel, and training the second character recognition submodel;
for each sample content in the sample image, taking the content characteristic information of the sample content and a relation probability matrix obtained by identifying the identification content corresponding to the sample content by the trained second character identification submodel as the input characteristic of the initially trained first character identification submodel, and retraining the initially trained first character identification submodel again; the character recognition model comprises a retrained initial training first character recognition submodel and a trained second character recognition submodel.
3. The method according to claim 2, wherein the using the recognition content as an input feature of a second character recognition submodel and using a relationship probability matrix between recognition characters included in the recognition content as an output result of the second character recognition submodel comprises:
for each sample content in the sample image, extracting a character coding matrix of a recognition character contained in the recognition content from the recognition content obtained by recognizing the sample content by the initial training first character recognition sub-model;
and taking the character coding matrix of the extracted recognition characters as the input characteristic of a second character recognition submodel, and taking a relation probability matrix between the recognition characters contained in the recognition content as the output result of the second character recognition submodel.
4. The method of claim 3, wherein after training the second character recognition submodel and before retraining the initially trained first character recognition submodel, further comprising:
for each sample content in the sample image, determining an image area of the sample image corresponding to the sample content;
based on the size of the determined image area, expanding a relation probability matrix between identification characters contained in identification content corresponding to the sample content to obtain an expanded relation probability matrix;
retraining the initially trained first character recognition submodel, including:
and for each sample content in the sample image, taking the content feature information of the sample content, together with the expanded relationship probability matrix obtained by expanding the relationship probability matrix produced when the trained second character recognition submodel recognizes the recognition content corresponding to the sample content, as the input features of the initially trained first character recognition submodel, and re-training the initially trained first character recognition submodel.
5. A method for recognizing characters based on a character recognition model trained according to any one of claims 1 to 4, comprising:
acquiring a target image; wherein the target image comprises a plurality of target contents;
confirming content characteristic information of each target content and a relation probability matrix between target characters contained in the target content; the content feature information of the target content is obtained by feature extraction of an image area of the target content, and a relation probability matrix between target characters contained in the target content is used for representing the degree of association between the target characters contained in the target content;
and inputting the content characteristic information of each target content and a relation probability matrix between target characters contained in the target content into the character recognition model, and recognizing to obtain the target characters contained in the target content.
6. A character recognition model training apparatus, comprising:
the image acquisition module is used for acquiring a sample image; wherein the sample image comprises a plurality of sample contents;
the information confirming module is used for confirming the content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content; the content feature information of the sample content is obtained by performing feature extraction on an image area of the sample content, and a relationship probability matrix between sample characters contained in the sample content is used for representing the degree of association between the sample characters contained in the sample content;
and the model training module is used for taking the content characteristic information of each sample content and a relation probability matrix between sample characters contained in the sample content as input characteristics of the character recognition model to be trained, taking the sample characters contained in the sample content as output results of the character recognition model to be trained, and training to obtain the character recognition model.
7. The apparatus of claim 6, wherein the model training module comprises:
the first sub-model training unit is used for sequentially taking the determined content characteristic information of each sample content as the input characteristic of a first character recognition sub-model, taking the corresponding sample content as the output result of the first character recognition sub-model, and training the first character recognition sub-model to obtain an initial training first character recognition sub-model;
a second sub-model training unit, configured to, for each sample content in the sample image, use content feature information of the sample content as an input of the initial training first character recognition sub-model to obtain recognition content, use the recognition content as an input feature of a second character recognition sub-model, use a relationship probability matrix between recognition characters included in the recognition content as an output result of the second character recognition sub-model, and train the second character recognition sub-model;
the first sub-model training unit is also used for retraining the initial training first character recognition sub-model by taking the content characteristic information of the sample content and a relation probability matrix obtained by recognizing the recognition content corresponding to the sample content by the trained second character recognition sub-model as the input characteristic of the initial training first character recognition sub-model aiming at each sample content in the sample image; the character recognition model comprises a retrained initial training first character recognition submodel and a trained second character recognition submodel.
8. The apparatus of claim 7, wherein the second submodel training unit is specifically configured to:
for each sample content in the sample image, extracting a character coding matrix of a recognition character contained in the recognition content from the recognition content obtained by recognizing the sample content by the initial training first character recognition sub-model;
and taking the character coding matrix of the extracted recognition characters as the input characteristic of a second character recognition submodel, and taking a relation probability matrix between the recognition characters contained in the recognition content as the output result of the second character recognition submodel.
9. The apparatus of claim 8, further comprising:
the matrix expansion module is used for determining, for each sample content in the sample image, the image area of the sample image corresponding to the sample content; and expanding, based on the size of the determined image area, a relationship probability matrix between recognition characters contained in the recognition content corresponding to the sample content to obtain an expanded relationship probability matrix;
the first sub-model training unit is specifically configured to retrain, for each sample content in the sample image, the initial training first character recognition sub-model again by using content feature information of the sample content and an extended relationship probability matrix obtained by extending an extended relationship probability matrix obtained by identifying, by the trained second character recognition sub-model, the recognition content corresponding to the sample content as input features of the initial training first character recognition sub-model.
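Claim 9 leaves the expansion rule unspecified; one plausible reading, used here purely as an illustration, is to repeat entries of the n-by-n relation matrix until it matches the height and width of the sample content's image area, so it can be stacked with image-sized features:

```python
import numpy as np

def expand_relation_matrix(rel, region_h, region_w):
    """Expand an n-by-n relation probability matrix to the size of the
    sample content's image area by nearest-neighbour repetition
    (an assumed rule; the patent only requires matching the area size)."""
    n = rel.shape[0]
    rows = np.repeat(np.arange(n), -(-region_h // n))[:region_h]  # ceil-div tiling
    cols = np.repeat(np.arange(n), -(-region_w // n))[:region_w]
    return rel[np.ix_(rows, cols)]

rel = np.array([[0.7, 0.3],
                [0.4, 0.6]])
expanded = expand_relation_matrix(rel, region_h=4, region_w=6)
```

With a 2-by-2 input and a 4-by-6 target area, each entry is duplicated into a 2-by-3 tile, giving an expanded matrix the same size as the image area.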
10. An apparatus for recognizing characters based on a character recognition model trained by the apparatus of any one of claims 6 to 9, comprising:
an image acquisition module, configured to acquire a target image, wherein the target image comprises a plurality of target contents;
an information determination module, configured to determine, for each target content, the content feature information of the target content and a relation probability matrix between the target characters contained in the target content; wherein the content feature information of the target content is obtained by performing feature extraction on the image area of the target content, and the relation probability matrix between the target characters contained in the target content represents the degree of association between those target characters;
and a character recognition module, configured to input the content feature information of each target content and the relation probability matrix between the target characters contained in the target content into the character recognition model, and recognize the target characters contained in the target content.
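The inference pipeline of claim 10 (acquire image, derive per-content features and a relation matrix, feed both to the trained model) can be sketched as follows. The mean-pooled features, uniform relation matrix, and bucketing "model" are all hypothetical placeholders standing in for the trained components:

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_features(region):
    """Content feature information: here simply column-wise mean pooling
    of the target content's image area (a real system would use a CNN)."""
    return region.mean(axis=0)

def relation_matrix(n_chars):
    """Placeholder relation probability matrix between target characters;
    uniform association, since the true estimator is the trained model's."""
    return np.full((n_chars, n_chars), 1.0 / n_chars)

def recognize(region, n_chars, model):
    """Run the claim-10 pipeline for one target content."""
    feats = extract_features(region)
    rel = relation_matrix(n_chars)
    return model(feats, rel)

# Toy "character recognition model": buckets feature values into class ids.
toy_model = lambda feats, rel: (feats[:rel.shape[0]] * rel.shape[0]).astype(int)

region = rng.uniform(size=(32, 8))   # one target content's image area
chars = recognize(region, n_chars=4, model=toy_model)
```

The structure mirrors the three claimed modules: `extract_features` plays the information determination role, and `recognize` hands both inputs to the character recognition model.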
CN201810973521.3A 2018-08-24 2018-08-24 Character recognition model training method and device and character recognition method and device Active CN110858307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810973521.3A CN110858307B (en) 2018-08-24 2018-08-24 Character recognition model training method and device and character recognition method and device

Publications (2)

Publication Number Publication Date
CN110858307A CN110858307A (en) 2020-03-03
CN110858307B true CN110858307B (en) 2022-09-13

Family

ID=69636220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810973521.3A Active CN110858307B (en) 2018-08-24 2018-08-24 Character recognition model training method and device and character recognition method and device

Country Status (1)

Country Link
CN (1) CN110858307B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496227A (en) * 2020-04-08 2021-10-12 顺丰科技有限公司 Training method and device of character recognition model, server and storage medium
CN111667066B (en) * 2020-04-23 2024-06-11 北京旷视科技有限公司 Training method and device of network model, character recognition method and device and electronic equipment
CN113885711A (en) * 2021-09-28 2022-01-04 济南大学 Character input method and device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2007291884A1 (en) * 2006-09-01 2008-03-06 Sensen Networks Group Pty Ltd Method and system of identifying one or more features represented in a plurality of sensor acquired data sets
CN102982330A (en) * 2012-11-21 2013-03-20 新浪网技术(中国)有限公司 Method and device recognizing characters in character images
CN103077389A (en) * 2013-01-07 2013-05-01 华中科技大学 Text detection and recognition method combining character level classification and character string level classification
CN105430021A (en) * 2015-12-31 2016-03-23 中国人民解放军国防科学技术大学 Encrypted traffic identification method based on load adjacent probability model
CN106778887A (en) * 2016-12-27 2017-05-31 努比亚技术有限公司 The terminal and method of sentence flag sequence are determined based on condition random field
CN106960206A (en) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 Character identifying method and character recognition system
CN106980856A (en) * 2016-01-15 2017-07-25 上海谦问万答吧云计算科技有限公司 Formula identification method and system and symbolic reasoning computational methods and system
WO2018024243A1 (en) * 2016-08-05 2018-02-08 腾讯科技(深圳)有限公司 Method and device for verifying recognition result in character recognition
CN108182437A (en) * 2017-12-29 2018-06-19 北京金堤科技有限公司 One kind clicks method for recognizing verification code, device and user terminal
CN108200034A (en) * 2017-12-27 2018-06-22 新华三信息安全技术有限公司 A kind of method and device for identifying domain name
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
CN108345880A (en) * 2018-01-26 2018-07-31 金蝶软件(中国)有限公司 Invoice recognition methods, device, computer equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Recognition-Based Approach of Numeral Extraction in Handwritten Chemistry Documents Using Contextual Knowledge; Nabil Ghanmi et al.; 2016 12th IAPR Workshop on Document Analysis Systems (DAS); 2016-06-13; pp. 251-256 *
Statistical Structure Modeling and Optimal Combined Strategy Based Chinese Components Recognition; Bowen Yu et al.; 2012 Eighth International Conference on Signal Image Technology and Internet Based Systems; 2013-01-11; pp. 238-245 *
Research on Text Recognition Technology for Natural Scenes; Huang Shuxiao; China Masters' Theses Full-text Database, Information Science and Technology; 2018-04-15; vol. 2018, no. 4; I138-2502 *
Research on Segmentation and Recognition Algorithms for Bank Card Number Characters; Tu Yafei; China Masters' Theses Full-text Database, Information Science and Technology; 2018-01-15; vol. 2018, no. 1; I138-1694 *

Also Published As

Publication number Publication date
CN110858307A (en) 2020-03-03

Similar Documents

Publication Publication Date Title
CN112232149B (en) Document multimode information and relation extraction method and system
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN110858307B (en) Character recognition model training method and device and character recognition method and device
CN110532381A (en) A kind of text vector acquisition methods, device, computer equipment and storage medium
CN113705313A (en) Text recognition method, device, equipment and medium
CN113255331B (en) Text error correction method, device and storage medium
CN111680684B (en) Spine text recognition method, device and storage medium based on deep learning
CN110738238A (en) certificate information classification positioning method and device
US20200279079A1 (en) Predicting probability of occurrence of a string using sequence of vectors
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN114897060A (en) Training method and device of sample classification model, and sample classification method and device
CN112434686B (en) End-to-end misplaced text classification identifier for OCR (optical character) pictures
CN112861864A (en) Topic entry method, topic entry device, electronic device and computer-readable storage medium
CN114092931B (en) Scene character recognition method and device, electronic equipment and storage medium
CN114492661A (en) Text data classification method and device, computer equipment and storage medium
CN110889276B (en) Method, system and computer medium for extracting pointer type extraction triplet information by complex fusion characteristics
CN112307749A (en) Text error detection method and device, computer equipment and storage medium
CN111126059A (en) Method and device for generating short text and readable storage medium
CN114003708B (en) Automatic question-answering method and device based on artificial intelligence, storage medium and server
CN115796141A (en) Text data enhancement method and device, electronic equipment and storage medium
CN116226450A (en) Video representation method and device based on unsupervised pre-training model
CN115017906A (en) Method, device and storage medium for identifying entities in text
CN115130475A (en) Extensible universal end-to-end named entity identification method
CN114782958A (en) Text error detection model training method, text error detection method and text error detection device
Idziak et al. Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing

Applicant after: Guoxin Youyi Data Co.,Ltd.

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.

GR01 Patent grant