CN114399766A - Optical character recognition model training method, device, equipment and medium - Google Patents
- Publication number
- CN114399766A CN114399766A CN202210056338.3A CN202210056338A CN114399766A CN 114399766 A CN114399766 A CN 114399766A CN 202210056338 A CN202210056338 A CN 202210056338A CN 114399766 A CN114399766 A CN 114399766A
- Authority
- CN
- China
- Prior art keywords
- data set
- recognition model
- character
- data
- character recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to the field of artificial intelligence and discloses an optical character recognition model training method, which comprises the following steps: screening error data from an original picture set and an original data set collected in actual production by using a search engine, forming a negative sample data set from the error data and a positive sample data set from the non-error data; recognizing a predicted character set of the positive sample data set, the negative sample data set and the original picture set by using an optical character recognition model; and calculating loss values from the predicted character set, the real character labeling set and the error character labeling set, and, if the loss values do not meet preset conditions, adjusting the parameters of the model until the loss values meet the preset conditions, thereby obtaining the trained optical character recognition model. The invention also relates to blockchain technology: the trained optical character recognition model can be stored in a blockchain node. The invention further provides an optical character recognition model training apparatus, device and medium. The invention can improve the efficiency and accuracy of optical character recognition model training.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to an optical character recognition model training method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of artificial intelligence technology, increasingly high requirements are placed on the recognition accuracy of optical character recognition models (such as OCR deep learning recognition models). Because a mature OCR deep learning recognition model needs dozens or even hundreds of iterations, some technology enterprises invest a large amount of manpower and material resources to obtain a mature OCR deep learning recognition model, so as to realize rapid development and iteration of the model and meet the demands of business growth.
However, when a traditional optical character recognition model is trained, the distribution of the training data in the development environment differs from that of the data in the production environment, so a model that recognizes well in the development environment cannot always achieve equally good recognition in the production environment, and its accuracy is low. When the recognition effect is poor, test data must be repeatedly constructed for testing, so the training efficiency of the optical character recognition model is low and the accuracy cannot be improved.
Disclosure of Invention
The invention provides an optical character recognition model training method and apparatus, an electronic device, and a computer-readable storage medium, with the main aim of improving the efficiency and accuracy of optical character recognition model training.
In order to achieve the above object, the present invention provides a training method for an optical character recognition model, comprising:
acquiring an original picture set and an original data set corresponding to the original picture set in actual production, and storing the original picture set and the original data set into a preset message queue channel;
when a preset search engine is idle, acquiring an original data set corresponding to the original picture set from the message queue channel by using the search engine, screening error data of the original data set, and determining that the screened error data form a negative sample data set and non-error data except the error data form a positive sample data set;
acquiring a real character label set corresponding to the positive sample data set and an error character label set corresponding to the negative sample data set, wherein the error character label set is dynamically updated in real time;
inputting the positive sample data set, the negative sample data set and the original picture set as training data sets into a preset optical character recognition model, and recognizing a predicted character set of the training data sets by using the optical character recognition model;
and obtaining loss values of the predicted character set, the real character label set and the error character label set through calculation, if the loss values do not meet preset conditions, adjusting parameters of the optical character recognition model until the loss values meet the preset conditions, and obtaining the trained optical character recognition model.
Optionally, the performing error data screening on the original data set, determining that the screened error data constitutes a negative sample data set, and non-error data other than the error data constitutes a positive sample data set, includes:
acquiring the sequence length of the original data in the original data set by using a preset search engine, and setting a sequence length index by using a preset screening statement in the preset search engine;
and comparing the sequence length with the sequence length index, forming a negative sample data set from the original data whose sequence length is inconsistent with the sequence length index, and forming a positive sample data set from the original data whose sequence length is consistent with the sequence length index.
Optionally, after determining that the screened error data constitutes a negative sample data set and non-error data other than the error data constitutes a positive sample data set, the method further includes:
acquiring data fields of the positive sample data set and the negative sample data set, and identifying sensitive fields in the data fields;
and carrying out desensitization operation on the sensitive field by using a preset desensitization function.
Optionally, the recognizing the predicted character set of the training data set by using a preset optical character recognition model includes:
extracting a feature sequence of the training data set by using a convolutional layer in a preset optical character recognition model to obtain a character vector set;
predicting a character label set of the character vector set by using a recurrent layer in the optical character recognition model;
and integrating the character label set by utilizing a transcription layer in the optical character recognition model to obtain a predicted character set.
Optionally, the predicting a character label set of the character vector set by using a recurrent layer in the optical character recognition model includes:
calculating a state value of the character vector set by using an input gate in the recurrent layer;
calculating an activation value of the character vector set by using a forget gate in the recurrent layer;
calculating a state update value of the character vector set according to the state value and the activation value;
and calculating the character label set of the state update value by using an output gate in the recurrent layer, so as to obtain the character label set of the character vector set.
Optionally, the integrating the character tag set by using a transcription layer in the optical character recognition model to obtain a predicted character set includes:
acquiring all path probabilities of the character label set by utilizing the transcription layer, and searching the maximum path probability corresponding to each character label from the path probabilities;
and combining the maximum path probabilities to obtain a predicted character set of the character label set.
Optionally, before storing the original picture set and the original data set in a preset message queue channel, the method further includes:
establishing a link between the message middleware and the original picture set and original data set, and forming a message queue channel through the link;
storing the original picture set and the original data set through the message queue channel.
In order to solve the above problem, the present invention further provides an optical character recognition model training apparatus, including:
the data set acquisition module is used for acquiring an original picture set and an original data set corresponding to the original picture set in actual production and storing the original picture set and the original data set into a preset message queue channel;
the data set screening module is used for acquiring an original data set corresponding to the original picture set from the message queue channel by using a preset search engine when the search engine is idle, screening error data of the original data set, and determining that screened error data form a negative sample data set and non-error data except the error data form a positive sample data set;
the data set marking module is used for acquiring a real character marking set corresponding to the positive sample data set and an error character marking set corresponding to the negative sample data set, wherein the error character marking set is dynamically updated in real time;
a training data set identification module, configured to input the positive sample data set, the negative sample data set, and the original image set as training data sets to a preset optical character recognition model, and identify a predicted character set of the training data set by using the optical character recognition model;
and the model training module is used for obtaining loss values of the predicted character set, the real character label set and the error character label set through calculation, if the loss values do not meet preset conditions, adjusting parameters of the optical character recognition model until the loss values meet the preset conditions, and obtaining the trained optical character recognition model.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one computer program; and
a processor that executes the computer program stored in the memory to implement the optical character recognition model training method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is executed by a processor in an electronic device to implement the optical character recognition model training method described above.
In the embodiment of the invention, the original picture set in actual production and the original data set corresponding to the original picture set are obtained first, so that differences in data distribution between the development environment and the production environment are avoided and the accuracy of subsequent model training is improved. Secondly, when a preset search engine is idle, the search engine is used to obtain the original data set corresponding to the original picture set from the message queue channel and to identify error data in the original data set; because the search engine, rather than manual inspection, screens the error data directly, manpower and time are saved, the iteration cycle of subsequent model training is shortened, and training efficiency is improved. Furthermore, labeling the real characters corresponding to the positive sample data and the error characters corresponding to the negative sample data facilitates subsequent model training. Finally, the optical character recognition model recognizes the predicted characters of the training data, loss values are calculated from the predicted characters, the real characters and the error characters, and if the loss values do not meet preset conditions, the parameters of the optical character recognition model are adjusted, further improving the accuracy of the model, until the loss values meet the preset conditions and the trained optical character recognition model is obtained. Therefore, the optical character recognition model training method and apparatus, electronic device and storage medium provided by the embodiments of the invention can improve the efficiency and accuracy of optical character recognition model training.
Drawings
FIG. 1 is a schematic flow chart illustrating a training method for an optical character recognition model according to an embodiment of the present invention;
FIG. 2 is a block diagram of an optical character recognition model training apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing the optical character recognition model training method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides an optical character recognition model training method. The execution subject of the method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to fig. 1, which is a schematic flow diagram of an optical character recognition model training method according to an embodiment of the present invention, in an embodiment of the present invention, the optical character recognition model training method includes:
in detail, the training method of the optical character recognition model comprises the following steps:
s1, acquiring an original picture set in actual production and an original data set corresponding to the original picture set, and storing the original picture set and the original data set into a preset message queue channel.
In the embodiment of the invention, the original picture set is unstructured data acquired from the actual production environment by using a preset optical character recognition interface, and the original data set is the character information extracted from the original picture set by the optical character recognition interface; the character information is structured data.
Structured data is row data that can be stored in a database and expressed with a two-dimensional logical table; unstructured data is data that cannot be expressed with a two-dimensional logical table, such as text, pictures, XML, HTML, audio and video.
Specifically, in this embodiment, a preset optical character recognition interface is used to send the recognized structured data and unstructured data to a message queue channel, and then the data is processed according to the user requirement.
In detail, before storing the original picture set and the original data set in a preset message queue channel, the method further includes:
establishing a link between the message middleware and the original picture set and original data set, and forming a message queue channel through the link;
storing the original picture set and the original data set through the message queue channel.
In the embodiment of the invention, the message queue channel is a channel which is formed by linking original data and message middleware and can receive, store and send information.
Preferably, the link may be a TCP link.
Preferably, the message middleware may be kafka.
In another embodiment of the present invention, the identified structured data may be sent to the message queue channel by using a preset optical character recognition interface, the unstructured data is stored in the NAS disk, and then the data is processed according to the user requirement.
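The asynchronous "receive, store, and send" behavior of the message queue channel can be sketched in plain Python. This is a minimal in-process stand-in: in production the patent's channel is backed by message middleware such as Kafka over a TCP link, and the record fields (`picture_id`, `text`) are illustrative assumptions.

```python
import json
import queue

class MessageQueueChannel:
    """In-process stand-in for the message queue channel: receives, stores,
    and hands out picture/data records asynchronously."""

    def __init__(self):
        self._queue = queue.Queue()

    def store(self, picture_id: str, recognized_text: str) -> None:
        # Serialize the structured OCR output alongside a reference to the
        # unstructured picture (the picture itself could live on a NAS disk).
        message = json.dumps({"picture_id": picture_id, "text": recognized_text})
        self._queue.put(message)

    def fetch_all(self):
        # Called later, e.g. once the downstream search engine is idle.
        records = []
        while not self._queue.empty():
            records.append(json.loads(self._queue.get()))
        return records

channel = MessageQueueChannel()
channel.store("img_001", "XA12345")
channel.store("img_002", "XB999")
records = channel.fetch_all()
```

The key property mirrored here is decoupling: producers store records immediately, while consumers decide when to process them, so screening can wait until the search engine is idle.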
And S2, when a preset search engine is idle, acquiring an original data set corresponding to the original picture set from the message queue channel by using the search engine, screening error data of the original data set, and determining that the screened error data form a negative sample data set and non-error data except the error data form a positive sample data set.
In the embodiment of the invention, the original picture set and the original data set can be stored asynchronously through the preset message queue channel: when they are transmitted to the message queue channel, the channel may leave them unprocessed at first, and the time at which they are processed is determined according to the requirements of users. The original data set processed in the message queue channel is transmitted to a preset search engine, which obtains the original data set corresponding to the original picture set in actual production. A preset screening statement of the search engine can then be used to screen error data, i.e. erroneous character information, from the original data set corresponding to the original picture set in the message queue channel; the error data forms a negative sample data set, and the remaining non-error data forms a positive sample data set.
Preferably, the predetermined search engine may be an ElasticSearch.
In detail, the screening of the error data from the original data set, determining that the screened error data constitutes a negative sample data set, and that non-error data other than the error data constitutes a positive sample data set, includes:
acquiring the sequence length of the original data in the original data set by using a preset search engine, and setting a sequence length index by using a preset screening statement in the preset search engine;
and comparing the sequence length with the sequence length index, forming a negative sample data set from the original data whose sequence length is inconsistent with the sequence length index, and forming a positive sample data set from the original data whose sequence length is consistent with the sequence length index.
In the embodiment of the present invention, the sequence length may be the character length of the original data, and the sequence length index set by the preset screening statement is an index set for the fixed character length expected of the original data.
For example, if a certain original picture is a car license plate picture and the sequence length of the original data corresponding to the picture is seven characters (that is, the license plate contains seven characters), the screening statement sets the sequence length index to 7. If the sequence length of a piece of original data is identified as 7, the original data is determined to be positive sample data; if not, it is determined to be negative sample data.
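The length screening above can be sketched as a simple split. In the patent this screening is performed with a filtering statement in a search engine such as Elasticsearch; plain Python is used here for illustration, and the fixed length of 7 follows the license plate example.

```python
SEQUENCE_LENGTH_INDEX = 7  # expected character length for this picture type

def screen_by_length(original_data: list[str]):
    """Split original data into positive (length matches the index) and
    negative (length does not match) sample sets."""
    positive, negative = [], []
    for text in original_data:
        # Data whose sequence length equals the index is non-error data;
        # everything else is treated as error data.
        (positive if len(text) == SEQUENCE_LENGTH_INDEX else negative).append(text)
    return positive, negative

pos, neg = screen_by_length(["XA12345", "XB999", "XC67890"])
```

Both outputs are kept: the mismatching records are not discarded but become the negative sample data set used later in training.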
Further, after determining that the screened error data constitutes a negative sample data set and non-error data other than the error data constitutes a positive sample data set, the method further includes:
acquiring data fields of the positive sample data set and the negative sample data set, and identifying sensitive fields in the data fields;
and carrying out desensitization operation on the sensitive field by using a preset desensitization function.
In the embodiment of the invention, fields involving personal privacy information, such as a person's name, identity card information and mobile phone number, are sensitive fields. After a sensitive field is identified, data replacement or mask shielding can be performed on it to realize desensitization; for example, the middle four digits of a mobile phone number are replaced to obtain 13800001248, or masked to obtain 138****1248.
For example, desensitization is performed on individual identification information, cell phone numbers, bank card information, etc. collected by institutions and businesses.
In the embodiment of the invention, the privacy information in the positive sample data and the negative sample data can be shielded or hidden by desensitizing the sensitive field by the desensitizing function, so that the safety of the privacy data of the user in the real production environment is protected.
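A minimal desensitization function along these lines can mask the middle four digits of an 11-digit mobile number. Treat this as an illustrative sketch, not the patent's exact preset desensitization function; other sensitive fields (names, identity card numbers) would need their own rules.

```python
import re

def desensitize_phone(value: str) -> str:
    """Mask the middle four digits of an 11-digit mobile number;
    leave any other field value untouched."""
    if not re.fullmatch(r"\d{11}", value):
        return value  # not a mobile number: no masking rule applies here
    return value[:3] + "****" + value[7:]

masked = desensitize_phone("13812341248")
```

Masking (rather than deleting) keeps the record usable for training while hiding the private digits.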
For example, because the embodiment of the invention trains the model on data from the actual production process, confidential enterprise data may be involved; a preset data use permission can therefore be applied, which restricts use of the data so that developers cannot view or download it.
In one embodiment of the invention, the preset data use permission can also ensure that developers can only use the data, but cannot view or download it, when performing model iteration and training, thereby avoiding the risk of data leakage.
S3, acquiring a real character labeling set corresponding to the positive sample data set and an error character labeling set corresponding to the negative sample data set, wherein the error character labeling set is dynamically updated in real time.
In the embodiment of the present invention, the positive sample data set and the negative sample data set may be transmitted to a preset labeling platform. The labeling platform labels the real characters of the positive sample data set and the corresponding position information of the positive sample data set in the original pictures (i.e. the positions of the real characters in the original pictures), and the real character labels and position information are combined into the real character labeling set. Similarly, the labeling platform labels the characters of the negative sample data set and their corresponding position information in the original pictures (i.e. the positions of the error characters in the original pictures), and these labels and position information are combined into the error character labeling set.
For example, for a seven-character license plate number XA · xxxxxx of positive sample data, the labeling platform interface can be called so that the labeling platform labels the real characters of the positive sample data XA · xxxxxx and the position of each labeled real character within the license plate number, and combines these into the real character labeling set. For a license plate number XB XXX of negative sample data, the labeling platform interface can be called so that the labeling platform labels the characters of the negative sample data XB XXX and the position of each labeled character within the license plate number, and combines these into the error character labeling set.
In the embodiment of the invention, the real-time dynamic updating of the error character labeling set means that after the labeling platform finds an error character label, the negative sample data corresponding to the error character can be updated in real time and input into the subsequent model as part of the updated training data set, thereby improving the accuracy of subsequent model training.
And S4, inputting the positive sample data set, the negative sample data set and the original picture set as training data sets into a preset optical character recognition model, and recognizing a predicted character set of the training data sets by using the optical character recognition model.
In an embodiment of the present invention, the preset optical character recognition model may be a deep learning model with a CRNN structure, i.e. CNN + LSTM + CTC, and the optical character recognition model comprises: a convolutional layer (CNN), a recurrent layer (LSTM), a transcription layer (CTC), and a loss function.
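The CNN + LSTM + CTC structure can be sketched in PyTorch as follows. The layer sizes, class count, and input resolution are illustrative assumptions, not the patent's configuration; the output is shaped as the input expected by `nn.CTCLoss`.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Convolutional layer: extracts a feature sequence from the picture.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height to 1
        )
        # Recurrent layer: a bidirectional LSTM predicts per-step labels.
        self.rnn = nn.LSTM(256, 256, bidirectional=True)
        # Per-step class scores; index 0 is reserved for the CTC blank.
        self.fc = nn.Linear(512, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features = self.cnn(images)                      # (B, 256, 1, W/4)
        features = features.squeeze(2).permute(2, 0, 1)  # (T, B, 256)
        outputs, _ = self.rnn(features)                  # (T, B, 512)
        return self.fc(outputs).log_softmax(2)           # for nn.CTCLoss

model = CRNN(num_classes=37)  # e.g. 26 letters + 10 digits + CTC blank
logits = model(torch.randn(2, 1, 32, 100))  # two grayscale 32x100 crops
```

For two 32x100 crops this yields 25 time steps of class scores, which the transcription layer then integrates into the predicted character set.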
In detail, the recognizing the predicted characters of the training data set by using a preset optical character recognition model includes:
extracting a feature sequence of the training data set by using a convolutional layer in a preset optical character recognition model to obtain a character vector set;
predicting a character label set of the character vector set by using a recurrent layer in the optical character recognition model;
and integrating the character label set by utilizing a transcription layer in the optical character recognition model to obtain a predicted character set.
In one embodiment of the invention, the convolutional layer comprises a convolutional sublayer and a pooling layer; feature extraction can be performed on the training data set through the convolutional sublayer to obtain a feature map, and the feature sequence vectors in the feature map are extracted by using the pooling layer to obtain a character vector set.
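The convolve-then-pool step above can be illustrated with a minimal pure-Python sketch; the kernel values and toy input row are hypothetical stand-ins, not parameters from the patent.

```python
# Illustrative only: a toy 1-D convolution followed by max pooling,
# standing in for the convolutional sublayer and pooling layer.

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation) over a list of floats."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(features, window=2):
    """Non-overlapping max pooling, shortening the feature sequence."""
    return [max(features[i:i + window])
            for i in range(0, len(features) - window + 1, window)]

# A toy "image row" is reduced to a shorter feature sequence,
# analogous to the character vector set fed to the recurrent layer.
row = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0]
feature_map = conv1d(row, kernel=[0.5, 0.5])   # local feature extraction
feature_seq = max_pool(feature_map, window=2)  # downsampled feature sequence
```

In a real CRNN these operations run as 2-D convolutions over image tensors; the sketch only shows how convolution builds a feature map and pooling compresses it into a sequence.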
In another embodiment of the present invention, the loop layer is mainly composed of the LSTM variant of the RNN; because the RNN suffers from vanishing gradients and cannot capture longer-range context, the LSTM replaces the RNN to better extract context information, wherein the loop layer includes an input gate, a forgetting gate and an output gate.
In detail, the predicting the character tag set of the character vector set using a loop layer in the optical character recognition model includes:
calculating a state value of the character vector set by using an input gate in the circulation layer;
calculating an activation value of the character vector set by using a forgetting gate in the circulation layer;
calculating a state update value of the character vector set according to the state value and the activation value;
and calculating the character label set of the state updating value by using an output gate in the circulation layer to obtain the character label set of the character vector set.
In the embodiment of the invention, the input gate controls how much of the character vector set enters the cell; the forgetting gate controls how much information from the previous moment flows into the current moment; the state update value is the part of the character vector set retained after passing through the forgetting gate, that is, the part the forgetting gate does not select to forget; and the output gate outputs the character tag set of the character vector set.
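The gate computations described above can be sketched as a single scalar LSTM step; the weight values are hypothetical, and biases are omitted for brevity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step with input, forgetting and output gates.
    w is a dict of hypothetical weights (not the patent's parameters)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev)    # input gate: how much new info enters
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev)    # forgetting gate: how much old state is kept
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev)    # output gate: how much state is exposed
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev)  # candidate (activation) value
    c = f * c_prev + i * g                         # state update value
    h = o * math.tanh(c)                           # output toward the label prediction
    return h, c

w = {k: 0.5 for k in ("wi", "ui", "wf", "uf", "wo", "uo", "wg", "ug")}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 2.0):   # a toy character-vector sequence
    h, c = lstm_step(x, h, c, w)
```

In the actual recurrent layer these scalars are vectors and the step runs once per element of the character vector sequence, usually bidirectionally.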
In the embodiment of the present invention, the transcription layer is mainly composed of CTC (Connectionist Temporal Classification), whose main function is to convert the character tag set predicted by the LSTM into a labeled predicted character set.
Further, the integrating the character tag set by using the transcription layer in the optical character recognition model to obtain a predicted character set includes:
acquiring all path probabilities of the character label set by utilizing the transcription layer, and searching the maximum path probability corresponding to each character label from the path probabilities;
and combining the maximum path probabilities to obtain the predicted character of the character label set.
In the embodiment of the invention, the predicted character can be obtained by the following formula:

y = B(argmax_π P(π|x))

wherein P(π|x) is the probability of a label path π given the input x, B is the mapping that collapses repeated characters and removes blanks from a path, argmax_π P(π|x) is the path with the maximum probability, and y is the predicted character corresponding to the character tags.
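A greedy (best-path) sketch of this decoding rule: pick the most likely label at each time step, collapse consecutive repeats, then drop blanks. The probability table and alphabet below are hypothetical.

```python
def ctc_best_path_decode(probs, alphabet, blank=0):
    """Greedy CTC decoding: argmax per time step, then apply the
    collapsing map B (merge repeats, remove blanks).
    probs is a list of per-timestep probability rows."""
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)

# Hypothetical per-timestep probabilities over (blank, 'A', 'B'):
probs = [
    [0.1, 0.8, 0.1],   # 'A'
    [0.1, 0.7, 0.2],   # 'A' again (repeat, collapsed by B)
    [0.8, 0.1, 0.1],   # blank (separator, removed by B)
    [0.1, 0.1, 0.8],   # 'B'
]
decoded = ctc_best_path_decode(probs, alphabet=["-", "A", "B"])
```

Best-path decoding approximates the formula by taking the single most probable path; exact decoding would sum probabilities over all paths mapping to the same output.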
S5, obtaining loss values of the predicted character set, the real character label set and the error character label set through calculation, if the loss values do not meet preset conditions, adjusting parameters of the optical character recognition model until the loss values meet the preset conditions, and obtaining the trained optical character recognition model.
In an embodiment of the present invention, a first loss value between the predicted character set and the real character label set is calculated, and a second loss value between the predicted character set and the error character label set is calculated; the first loss value and the second loss value are fused to obtain the loss value. If the loss value does not satisfy the preset condition, the parameters of the optical character recognition model are adjusted until the loss value satisfies the preset condition, so as to obtain the trained optical character recognition model.
For example, the preset condition may be that the loss value is less than a preset threshold of 0.1; when the loss value is greater than or equal to 0.1, the model parameters are adjusted until the loss value is less than 0.1, so as to obtain the trained optical character recognition model.
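The fuse-losses-then-train-to-threshold procedure can be sketched as follows. The weighted-sum fusion rule and the simulated parameter update are assumptions; the patent only states that the two losses are fused and that parameters are adjusted until the condition is met.

```python
def fused_loss(loss_true, loss_error, w_true=0.5, w_error=0.5):
    """Fuse the first loss (vs. real character labels) and the second loss
    (vs. error character labels). Weighted sum is an assumed fusion rule."""
    return w_true * loss_true + w_error * loss_error

def train_until_threshold(loss, threshold=0.1, max_steps=100):
    """Keep adjusting parameters while the loss fails the preset condition.
    'Adjusting' is simulated by halving the loss each step."""
    steps = 0
    while loss >= threshold and steps < max_steps:
        loss *= 0.5          # stand-in for a real optimizer update
        steps += 1
    return loss, steps

loss0 = fused_loss(1.2, 0.8)            # hypothetical first/second losses
final_loss, steps = train_until_threshold(loss0)
```

In a real pipeline the inner update would be a gradient step on the CRNN parameters against a CTC loss rather than a direct shrink of the loss value.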
In the embodiment of the invention, the original picture set in actual production and the original data set corresponding to the original picture set are obtained firstly, so that the data distribution difference in a development environment and a production environment can be avoided, and the accuracy of subsequent model training is improved; secondly, when a preset search engine is idle, the search engine is utilized to obtain an original data set corresponding to the original picture set from the message queue channel, the search engine is utilized to carry out error data identification on the original data set, error data can be directly screened by the search engine instead of manual error data screening, manpower and time are saved, the iteration cycle of subsequent model training is shortened, and the training efficiency of the subsequent model is improved; furthermore, by marking the real characters corresponding to the positive sample data and marking the error characters corresponding to the negative sample data, the subsequent model can be conveniently trained; and finally, recognizing the predicted characters of the training data by using the optical character recognition model, respectively calculating loss values of the predicted characters, the real characters and the error characters, if the loss values do not meet preset conditions, adjusting parameters of the optical character recognition model, further improving the accuracy of the model until the loss values meet the preset conditions, and obtaining the trained optical character recognition model. Therefore, the training method of the optical character recognition model provided by the embodiment of the invention can improve the efficiency and accuracy of training the optical character recognition model.
FIG. 2 is a functional block diagram of the training apparatus for OCR model according to the present invention.
The optical character recognition model training apparatus 100 according to the present invention can be installed in an electronic device. According to the implemented functions, the optical character recognition model training apparatus may include a data set obtaining module 101, a data set screening module 102, a data set labeling module 103, a training data set recognition module 104, and a model training module 105, which may also be referred to as units; a module/unit refers to a series of computer program segments that can be executed by a processor of an electronic device to perform fixed functions, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data set obtaining module 101 is configured to obtain an original picture set and an original data set corresponding to the original picture set in actual production, and store the original picture set and the original data set in a preset message queue channel.
In the embodiment of the invention, the original picture set is unstructured data acquired from an actual production environment process by using a preset optical character recognition interface, the original data set is character information extracted from the original picture set by using the optical character recognition interface, and the character information is structured data.
The structured data is row data that can be stored in a database and expressed with a two-dimensional logical table; the unstructured data refers to data that cannot be expressed with a two-dimensional logical table, such as text, pictures, XML, HTML, audio, video, and the like.
Specifically, in this embodiment, a preset optical character recognition interface is used to send the recognized structured data and unstructured data to a message queue channel, and then the data is processed according to the user requirement.
The data set obtaining module 101 may further be configured to:
establishing a link between the original data set and the original picture set and the message middleware, and forming a message queue channel through the link;
storing the original picture set and the original data set through the message queue channel.
In the embodiment of the invention, the message queue channel is a channel which is formed by linking original data and message middleware and can receive, store and send information.
Preferably, the link may be a TCP link.
Preferably, the message middleware may be kafka.
In another embodiment of the present invention, the identified structured data may be sent to the message queue channel by using a preset optical character recognition interface, the unstructured data is stored in the NAS disk, and then the data is processed according to the user requirement.
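The store-then-consume-on-demand behaviour of the message queue channel can be mimicked with an in-memory stand-in; the patent suggests a TCP link to Kafka middleware, so the queue, field names and IDs below are illustrative assumptions only.

```python
import queue

# In-memory stand-in for the message queue channel: it receives, stores,
# and later hands out (original picture, original data) pairs.
message_queue_channel = queue.Queue()

def store(picture_id, ocr_text):
    """Producer side: the OCR interface pushes a picture/data pair."""
    message_queue_channel.put({"picture": picture_id, "data": ocr_text})

def consume_when_idle():
    """Consumer side: drain the channel when the search engine is idle,
    leaving earlier messages untouched until then."""
    batch = []
    while not message_queue_channel.empty():
        batch.append(message_queue_channel.get())
    return batch

store("img_001.jpg", "XA12345")   # hypothetical picture IDs and plate text
store("img_002.jpg", "XB999")
batch = consume_when_idle()
```

With real Kafka middleware the producer and consumer would run in separate processes and the broker would provide the durable, asynchronous storage that `queue.Queue` only simulates here.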
The data set screening module 102 is configured to, when a preset search engine is idle, acquire an original data set corresponding to the original picture set from the message queue channel by using the search engine, perform error data screening on the original data set, determine that screened error data forms a negative sample data set, and determine that non-error data other than the error data forms a positive sample data set.
In the embodiment of the invention, the original picture set and the original data set can be stored asynchronously through the preset message queue channel: when they are transmitted to the message queue channel, the channel may leave them unprocessed at first, and the time for processing them is determined according to user requirements. The original data set processed in the message queue channel is transmitted to the preset search engine, so that the original data set corresponding to the original picture set in actual production is obtained; the preset screening statement of the search engine can then be used to screen error data in the original data set corresponding to the original picture set in the message queue channel, where the error data is erroneous character information in that original data set. The error data forms the negative sample data set, and the non-error data other than the error data forms the positive sample data set.
Preferably, the predetermined search engine may be an ElasticSearch.
In detail, the data set filtering module 102 performs error data filtering on the original data set by performing the following operations, and determines that the filtered error data constitutes a negative sample data set and non-error data other than the error data constitutes a positive sample data set, including:
acquiring the sequence length of the original data in the original data set by using a preset search engine, and setting a sequence length index by using a preset screening statement in the preset search engine;
and comparing the sequence length with the sequence length index, forming a negative sample data set from the original data whose sequence length is inconsistent with the sequence length index, and forming a positive sample data set from the original data whose sequence length is consistent with the sequence length index.
In the embodiment of the present invention, the sequence length may be a character length corresponding to the original data, and the sequence length index set by using the preset filtering statement is an index set for a fixed length of a character in the original data.
For example, if a certain original picture is a car license plate picture, and the sequence length included in the original data corresponding to the original picture is seven digits (that is, the length of the characters included in the original picture is seven digits), the screening statement sets the sequence length index to be length 7, and if the sequence length in the original data is identified to be 7, the original data is determined to be positive sample data; and if the sequence length in the original data is not identified to be 7, determining that the original data is negative sample data.
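The length-based screening of the license plate example can be sketched as a simple split; the sample plate strings are hypothetical OCR outputs.

```python
def screen_by_sequence_length(records, length_index=7):
    """Split original data into positive / negative sample sets by comparing
    each record's character length against the sequence length index
    (length == 7 for the seven-character license plate example)."""
    positives, negatives = [], []
    for text in records:
        (positives if len(text) == length_index else negatives).append(text)
    return positives, negatives

# Hypothetical OCR outputs: two well-formed plates and one truncated read.
records = ["XA12345", "XB999", "XC67890"]
pos, neg = screen_by_sequence_length(records)
```

In the patent this comparison runs inside the search engine via a preset screening statement rather than in application code, but the positive/negative split is the same.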
The data set screening module 102 may further be configured to:
acquiring data fields of the positive sample data set and the negative sample data set, and identifying sensitive fields in the data fields;
and carrying out desensitization operation on the sensitive field by using a preset desensitization function.
In the embodiment of the invention, fields relating to personal privacy information, such as a person's name, identity card information and mobile phone number, are sensitive fields; after the sensitive fields are identified, data replacement or mask shielding can be performed on them to realize desensitization. For example, the middle four digits of a mobile phone number may be replaced with other digits to obtain 13800001248, or masked to obtain 138****1248.
For example, desensitization is performed on individual identification information, cell phone numbers, bank card information, etc. collected by institutions and businesses.
In the embodiment of the invention, the privacy information in the positive sample data and the negative sample data can be shielded or hidden by desensitizing the sensitive field by the desensitizing function, so that the safety of the privacy data of the user in the real production environment is protected.
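A minimal desensitization function matching the 138****1248 masking example might look like the sketch below; the field names and the exact digit positions masked are assumptions, not specified by the patent.

```python
def mask_mobile_number(number, start=3, masked=4):
    """Mask the middle four digits of an 11-digit mobile number,
    e.g. 13800001248 -> 138****1248. Positions are an assumption."""
    return number[:start] + "*" * masked + number[start + masked:]

def desensitize(record, sensitive_fields=("name", "id_card", "mobile")):
    """Apply a simple desensitization function to the sensitive fields
    of a sample record. Field names here are hypothetical stand-ins."""
    out = dict(record)
    for field in sensitive_fields:
        if field in out:
            out[field] = (mask_mobile_number(out[field]) if field == "mobile"
                          else "*" * len(out[field]))
    return out

masked = desensitize({"mobile": "13800001248", "plate": "XA12345"})
```

Non-sensitive fields (such as the plate text used for training) pass through unchanged, so the training data keeps its character labels while the privacy fields are shielded.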
For example, because the embodiment of the invention uses data from the actual production process to train the model, confidential data of an enterprise may be involved; a preset data use permission may therefore be configured, and use limitations imposed through it, so that developers cannot view or download the data.
In one embodiment of the invention, the preset data use permission can also ensure that developers can only use data but cannot check or download the data when performing model iteration and training, thereby avoiding the risk of data leakage.
The data set labeling module 103 is configured to obtain a real character labeling set corresponding to the positive sample data set and an error character labeling set corresponding to the negative sample data set, where the error character labeling set is dynamically updated in real time.
In the embodiment of the present invention, the positive sample data set and the negative sample data set may be transmitted to a preset labeling platform; the labeling platform is used to label the real characters of the positive sample data set and the corresponding position information of the positive sample data set in the original pictures (i.e. the position information of the real characters in the original pictures), and the real character labels of the positive sample data set and the position information of the real characters in the original pictures are combined into a real character labeling set; similarly, the labeling platform is used to label the characters of the negative sample data set and the corresponding position information of the negative sample data set in the original pictures (i.e. the position information of the error characters in the original pictures), and the character labels of the negative sample data set and the position information of the characters in the original pictures are combined into an error character labeling set.
For example, if the license plate number of positive sample data is the seven-character string XA·xxxxxx, the labeling platform interface can be called so that the labeling platform labels the real characters of the positive sample data XA·xxxxxx, with each labeled real character corresponding to one digit of the license plate number, and the real character labels of the positive sample data XA·xxxxxx together with their corresponding digit positions are combined into a real character labeling set; if the license plate number of negative sample data is XB·XXX, the labeling platform interface can be called so that the labeling platform labels the characters of the negative sample data XB·XXX, with each labeled character corresponding to one digit of the license plate number, and the character labels of the negative sample data XB·XXX together with their corresponding digit positions are combined into an error character labeling set.
In the embodiment of the invention, real-time dynamic updating of the error character labeling set means that after the labeling platform identifies error character labels, the negative sample data set corresponding to those error character labels can be updated in real time and input into the subsequent model as an updated training data set, thereby improving the accuracy of subsequent model training.
The training data set identification module 104 is configured to input the positive sample data set, the negative sample data set, and the original image set as training data sets to a preset optical character recognition model, and identify a predicted character set of the training data set by using the optical character recognition model.
In an embodiment of the present invention, the preset optical character recognition model may be a deep learning model with a CRNN structure, where the CRNN structure is CNN + LSTM + CTC, and the optical character recognition model comprises: a convolutional layer (CNN), a recurrent layer (LSTM), a transcription layer (CTC), and a loss function.
In detail, the training data set recognition module 104 recognizes the predicted characters of the training data set by using a preset optical character recognition model by performing the following operations, including:
extracting a characteristic sequence of the training data set by using a convolutional layer in a preset optical character recognition model to obtain a character vector set;
predicting a character tag set of the character vector set using a loop layer in the optical character recognition model;
and integrating the character label set by utilizing a transcription layer in the optical character recognition model to obtain a predicted character set.
In one embodiment of the invention, the convolutional layer comprises a convolutional sublayer and a pooling layer; feature extraction can be performed on the training data set through the convolutional sublayer to obtain a feature map, and the feature sequence vectors in the feature map are extracted by using the pooling layer to obtain a character vector set.
In another embodiment of the present invention, the loop layer is mainly composed of the LSTM variant of the RNN; because the RNN suffers from vanishing gradients and cannot capture longer-range context, the LSTM replaces the RNN to better extract context information, wherein the loop layer includes an input gate, a forgetting gate and an output gate.
In detail, the predicting the character tag set of the character vector set using a loop layer in the optical character recognition model includes:
calculating a state value of the character vector set by using an input gate in the circulation layer;
calculating an activation value of the character vector set by using a forgetting gate in the circulation layer;
calculating a state update value of the character vector set according to the state value and the activation value;
and calculating the character label set of the state updating value by using an output gate in the circulation layer to obtain the character label set of the character vector set.
In the embodiment of the invention, the input gate controls how much of the character vector set enters the cell; the forgetting gate controls how much information from the previous moment flows into the current moment; the state update value is the part of the character vector set retained after passing through the forgetting gate, that is, the part the forgetting gate does not select to forget; and the output gate outputs the character tag set of the character vector set.
In the embodiment of the present invention, the transcription layer is mainly composed of CTC (Connectionist Temporal Classification), whose main function is to convert the character tag set predicted by the LSTM into a labeled predicted character set.
Further, the integrating the character tag set by using the transcription layer in the optical character recognition model to obtain a predicted character set includes:
acquiring all path probabilities of the character label set by utilizing the transcription layer, and searching the maximum path probability corresponding to each character label from the path probabilities;
and combining the maximum path probabilities to obtain the predicted character of the character label set.
In the embodiment of the invention, the predicted character can be obtained by the following formula:

y = B(argmax_π P(π|x))

wherein P(π|x) is the probability of a label path π given the input x, B is the mapping that collapses repeated characters and removes blanks from a path, argmax_π P(π|x) is the path with the maximum probability, and y is the predicted character corresponding to the character tags.
The model training module 105 is configured to obtain loss values of the predicted character set, the real character tagging set, and the error character tagging set through calculation, and adjust parameters of the optical character recognition model if the loss values do not satisfy preset conditions until the loss values satisfy the preset conditions, so as to obtain a trained optical character recognition model.
In an embodiment of the present invention, a first loss value between the predicted character set and the real character label set is calculated, and a second loss value between the predicted character set and the error character label set is calculated; the first loss value and the second loss value are fused to obtain the loss value. If the loss value does not satisfy the preset condition, the parameters of the optical character recognition model are adjusted until the loss value satisfies the preset condition, so as to obtain the trained optical character recognition model.
For example, the preset condition may be that the loss value is less than a preset threshold of 0.1; when the loss value is greater than or equal to 0.1, the model parameters are adjusted until the loss value is less than 0.1, so as to obtain the trained optical character recognition model.
In the embodiment of the invention, the original picture set in actual production and the original data set corresponding to the original picture set are obtained firstly, so that the data distribution difference in a development environment and a production environment can be avoided, and the accuracy of subsequent model training is improved; secondly, when a preset search engine is idle, the search engine is utilized to obtain an original data set corresponding to the original picture set from the message queue channel, the search engine is utilized to carry out error data identification on the original data set, error data can be directly screened by the search engine instead of manual error data screening, manpower and time are saved, the iteration cycle of subsequent model training is shortened, and the training efficiency of the subsequent model is improved; furthermore, by marking the real characters corresponding to the positive sample data and marking the error characters corresponding to the negative sample data, the subsequent model can be conveniently trained; and finally, recognizing the predicted characters of the training data by using the optical character recognition model, respectively calculating loss values of the predicted characters, the real characters and the error characters, if the loss values do not meet preset conditions, adjusting parameters of the optical character recognition model, further improving the accuracy of the model until the loss values meet the preset conditions, and obtaining the trained optical character recognition model. Therefore, the optical character recognition model training device provided by the embodiment of the invention can improve the efficiency and accuracy of optical character recognition model training.
Fig. 3 is a schematic structural diagram of an electronic device implementing the optical character recognition model training method according to the present invention.
The electronic device may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13, and may further include a computer program, such as an optical character recognition model training program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as the code of the optical character recognition model training program, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules (e.g., an optical character recognition model training program, etc.) stored in the memory 11 and calling data stored in the memory 11.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The communication bus 12 is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
Fig. 3 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 3 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Optionally, the communication interface 13 may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which is generally used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the communication interface 13 may further include a user interface, which may be a Display (Display), an input unit (such as a Keyboard (Keyboard)), and optionally, a standard wired interface, or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The optical character recognition model training program stored in the memory 11 of the electronic device is a combination of a plurality of computer programs, and when running in the processor 10, can realize:
acquiring an original picture set and an original data set corresponding to the original picture set in actual production, and storing the original picture set and the original data set into a preset message queue channel;
when a preset search engine is idle, acquiring an original data set corresponding to the original picture set from the message queue channel by using the search engine, screening error data of the original data set, and determining that the screened error data form a negative sample data set and non-error data except the error data form a positive sample data set;
acquiring a real character label set corresponding to the positive sample data set and an error character label set corresponding to the negative sample data set, wherein the error character label set is dynamically updated in real time;
inputting the positive sample data set, the negative sample data set and the original picture set as training data sets into a preset optical character recognition model, and recognizing a predicted character set of the training data sets by using the optical character recognition model;
and obtaining loss values of the predicted character set, the real character label set and the error character label set through calculation, if the loss values do not meet preset conditions, adjusting parameters of the optical character recognition model until the loss values meet the preset conditions, and obtaining the trained optical character recognition model.
Specifically, the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer program, which is not described herein again.
Further, the electronic device integrated module/unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable medium. The computer readable medium may be non-volatile or volatile. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
Embodiments of the present invention may also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor of an electronic device, the computer program may implement:
acquiring an original picture set and an original data set corresponding to the original picture set in actual production, and storing the original picture set and the original data set into a preset message queue channel;
when a preset search engine is idle, acquiring the original data set corresponding to the original picture set from the message queue channel by using the search engine, screening the original data set for error data, and determining that the screened-out error data forms a negative sample data set and that the remaining non-error data forms a positive sample data set;
acquiring a real character label set corresponding to the positive sample data set and an error character label set corresponding to the negative sample data set, wherein the error character label set is dynamically updated in real time;
inputting the positive sample data set, the negative sample data set and the original picture set as training data sets into a preset optical character recognition model, and recognizing a predicted character set of the training data sets by using the optical character recognition model;
and calculating loss values of the predicted character set against the real character label set and the error character label set; if the loss values do not meet a preset condition, adjusting parameters of the optical character recognition model until the loss values meet the preset condition, thereby obtaining a trained optical character recognition model.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the embodiments provided by the present invention, it should be understood that the disclosed media, devices, apparatuses and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A method for training an optical character recognition model, the method comprising:
acquiring an original picture set and an original data set corresponding to the original picture set in actual production, and storing the original picture set and the original data set into a preset message queue channel;
when a preset search engine is idle, acquiring the original data set corresponding to the original picture set from the message queue channel by using the search engine, screening the original data set for error data, and determining that the screened-out error data forms a negative sample data set and that the remaining non-error data forms a positive sample data set;
acquiring a real character label set corresponding to the positive sample data set and an error character label set corresponding to the negative sample data set, wherein the error character label set is dynamically updated in real time;
inputting the positive sample data set, the negative sample data set and the original picture set as training data sets into a preset optical character recognition model, and recognizing a predicted character set of the training data sets by using the optical character recognition model;
and calculating loss values of the predicted character set against the real character label set and the error character label set; if the loss values do not meet a preset condition, adjusting parameters of the optical character recognition model until the loss values meet the preset condition, thereby obtaining a trained optical character recognition model.
2. The method for training an optical character recognition model according to claim 1, wherein the performing error data filtering on the original data set, determining that the filtered error data constitutes a negative sample data set, and non-error data other than the error data constitutes a positive sample data set, comprises:
acquiring the sequence length of the original data in the original data set by using a preset search engine, and setting a sequence length index by using a preset screening statement in the preset search engine;
and comparing the sequence length with the sequence length index, forming a negative sample data set from the original data whose sequence length is inconsistent with the sequence length index, and forming a positive sample data set from the original data whose sequence length is consistent with the sequence length index.
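As a rough sketch of this screening step, the comparison described in claim 2 might look like the following, where `length_index` stands in for the per-field sequence length index configured via the search engine's screening statement (the field and record names are assumptions, not the patent's schema):

```python
# Split OCR records into positive/negative sample sets by comparing each
# record's sequence length with the expected length index for its field.
def screen_by_length(records, length_index):
    positive, negative = [], []
    for rec in records:
        expected = length_index[rec["field"]]
        if len(rec["text"]) == expected:
            positive.append(rec)   # length consistent  -> positive sample
        else:
            negative.append(rec)   # length inconsistent -> negative sample
    return positive, negative
```

For instance, a license-plate field expected to be 7 characters long would route a 5-character OCR result into the negative sample set.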
3. The method for training an optical character recognition model according to claim 1, wherein after determining that the filtered erroneous data constitutes a negative sample data set and non-erroneous data other than the erroneous data constitutes a positive sample data set, the method further comprises:
acquiring data fields of the positive sample data set and the negative sample data set, and identifying sensitive fields in the data fields;
and carrying out desensitization operation on the sensitive field by using a preset desensitization function.
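A minimal sketch of such a desensitization step, assuming a simple keep-ends/mask-middle rule and an illustrative list of sensitive field names (the patent specifies neither):

```python
SENSITIVE_FIELDS = {"id_number", "phone"}  # assumed sensitive field names

def mask(value):
    # Keep the first and last characters, mask everything in between.
    if len(value) <= 2:
        return value
    return value[0] + "*" * (len(value) - 2) + value[-1]

def desensitize_record(record):
    # Apply the preset desensitization function only to sensitive fields.
    return {k: mask(v) if k in SENSITIVE_FIELDS else v
            for k, v in record.items()}
```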
4. The method for training an optical character recognition model according to claim 1, wherein the recognizing the predicted character set of the training data set by using a preset optical character recognition model comprises:
extracting a feature sequence of the training data set by using a convolutional layer in a preset optical character recognition model to obtain a character vector set;
predicting a character label set of the character vector set by using a recurrent layer in the optical character recognition model;
and integrating the character label set by using a transcription layer in the optical character recognition model to obtain a predicted character set.
5. The method of training an optical character recognition model of claim 4, wherein predicting the character label set of the character vector set by using the recurrent layer in the optical character recognition model comprises:
calculating a state value of the character vector set by using an input gate in the recurrent layer;
calculating an activation value of the character vector set by using a forget gate in the recurrent layer;
calculating a state update value of the character vector set according to the state value and the activation value;
and calculating the character label set of the state update value by using an output gate in the recurrent layer to obtain the character label set of the character vector set.
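The gate computations in claim 5 follow the shape of a standard LSTM cell. A single-unit, scalar-weight sketch (the weights are illustrative constants rather than trained values, and the mapping of the patent's terms onto the standard gates is a reading, not the patent's definition):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w=0.5, u=0.3, b=0.0):
    pre = w * x + u * h_prev + b
    i = sigmoid(pre)          # input gate   -> "state value"
    f = sigmoid(pre)          # forget gate  -> "activation value"
    g = math.tanh(pre)        # candidate cell state
    c = f * c_prev + i * g    # "state update value"
    o = sigmoid(pre)          # output gate
    h = o * math.tanh(c)      # per-step output fed to label prediction
    return h, c
```

In a real recurrent layer each gate has its own weight matrices; sharing the pre-activation `pre` here only keeps the sketch short.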
6. The method for training an optical character recognition model according to claim 4, wherein the integrating the character label set by using a transcription layer in the optical character recognition model to obtain a predicted character set comprises:
acquiring all path probabilities of the character label set by utilizing the transcription layer, and searching the maximum path probability corresponding to each character label from the path probabilities;
and combining the maximum path probabilities to obtain a predicted character set of the character label set.
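Claim 6 describes best-path decoding over the transcription layer's per-timestep probabilities: pick the most probable label at each step, then (in the usual CTC formulation) collapse consecutive repeats and drop blanks. A sketch under that reading, with an assumed blank-symbol representation:

```python
BLANK = "-"  # CTC blank symbol (representation assumed)

def greedy_ctc_decode(prob_matrix, alphabet):
    """prob_matrix: one probability row per timestep, over `alphabet`."""
    # Step 1: maximum-probability label at each timestep (the best path).
    best = [alphabet[max(range(len(row)), key=row.__getitem__)]
            for row in prob_matrix]
    # Step 2: collapse consecutive repeats, then remove blanks.
    out, prev = [], None
    for ch in best:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)
```

For example, with alphabet `["-", "a", "b"]`, a best path of `-aa-b` decodes to `ab`.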
7. The method for training an optical character recognition model according to claim 1, wherein before storing the original picture set and the original data set in a preset message queue channel, the method further comprises:
establishing a link between the original data set and original picture set and the message middleware, and forming the message queue channel through the link;
and storing the original picture set and the original data set through the message queue channel.
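How the channel in claim 7 might behave can be sketched with an in-process queue standing in for real message middleware (the patent names no specific broker; a production system would presumably use something like Kafka or RabbitMQ, which is an assumption here):

```python
from collections import deque

class MessageQueueChannel:
    """Toy stand-in for a middleware channel linking raw data to OCR training."""

    def __init__(self):
        self._queue = deque()

    def publish(self, picture, data):
        # Store a picture together with its corresponding raw data.
        self._queue.append({"picture": picture, "data": data})

    def drain(self):
        # Called when the downstream search engine is idle: hand over
        # everything buffered so far, emptying the channel.
        items = list(self._queue)
        self._queue.clear()
        return items
```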
8. An optical character recognition model training apparatus, characterized in that the apparatus comprises:
the data set acquisition module is used for acquiring an original picture set and an original data set corresponding to the original picture set in actual production and storing the original picture set and the original data set into a preset message queue channel;
the data set screening module is used for acquiring, when a preset search engine is idle, the original data set corresponding to the original picture set from the message queue channel by using the search engine, screening the original data set for error data, and determining that the screened-out error data forms a negative sample data set and that the remaining non-error data forms a positive sample data set;
the data set marking module is used for acquiring a real character marking set corresponding to the positive sample data set and an error character marking set corresponding to the negative sample data set, wherein the error character marking set is dynamically updated in real time;
a training data set identification module, configured to input the positive sample data set, the negative sample data set, and the original picture set as training data sets into a preset optical character recognition model, and recognize a predicted character set of the training data sets by using the optical character recognition model;
and the model training module is used for calculating loss values of the predicted character set against the real character label set and the error character label set; if the loss values do not meet a preset condition, adjusting parameters of the optical character recognition model until the loss values meet the preset condition, thereby obtaining a trained optical character recognition model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to cause the at least one processor to perform the method of training an optical character recognition model according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method for training an optical character recognition model according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210056338.3A CN114399766B (en) | 2022-01-18 | 2022-01-18 | Optical character recognition model training method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114399766A true CN114399766A (en) | 2022-04-26 |
CN114399766B CN114399766B (en) | 2024-05-10 |
Family
ID=81230820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210056338.3A Active CN114399766B (en) | 2022-01-18 | 2022-01-18 | Optical character recognition model training method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114399766B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160140451A1 (en) * | 2014-11-17 | 2016-05-19 | Yahoo! Inc. | System and method for large-scale multi-label learning using incomplete label assignments |
CN107679531A (en) * | 2017-06-23 | 2018-02-09 | 平安科技(深圳)有限公司 | Licence plate recognition method, device, equipment and storage medium based on deep learning |
CN108875722A (en) * | 2017-12-27 | 2018-11-23 | 北京旷视科技有限公司 | Character recognition and identification model training method, device and system and storage medium |
US20190251369A1 (en) * | 2018-02-11 | 2019-08-15 | Ilya Popov | License plate detection and recognition system |
CN110866524A (en) * | 2019-11-15 | 2020-03-06 | 北京字节跳动网络技术有限公司 | License plate detection method, device, equipment and storage medium |
CN112052850A (en) * | 2020-09-03 | 2020-12-08 | 平安科技(深圳)有限公司 | License plate recognition method and device, electronic equipment and storage medium |
CN112560453A (en) * | 2020-12-18 | 2021-03-26 | 平安银行股份有限公司 | Voice information verification method and device, electronic equipment and medium |
WO2021073266A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Image detection-based test question checking method and related device |
CN112733911A (en) * | 2020-12-31 | 2021-04-30 | 平安科技(深圳)有限公司 | Entity recognition model training method, device, equipment and storage medium |
US20210142093A1 (en) * | 2019-11-08 | 2021-05-13 | Tricentis Gmbh | Method and system for single pass optical character recognition |
CN112836748A (en) * | 2021-02-02 | 2021-05-25 | 太原科技大学 | Casting identification character recognition method based on CRNN-CTC |
WO2021139342A1 (en) * | 2020-07-27 | 2021-07-15 | 平安科技(深圳)有限公司 | Training method and apparatus for ocr recognition model, and computer device |
WO2021174839A1 (en) * | 2020-03-06 | 2021-09-10 | 平安科技(深圳)有限公司 | Data compression method and apparatus, and computer-readable storage medium |
CN113392814A (en) * | 2021-08-16 | 2021-09-14 | 冠传网络科技(南京)有限公司 | Method and device for updating character recognition model and storage medium |
CN113743415A (en) * | 2021-08-05 | 2021-12-03 | 杭州远传新业科技有限公司 | Method, system, electronic device and medium for identifying and correcting image text |
CN113822428A (en) * | 2021-08-06 | 2021-12-21 | 中国工商银行股份有限公司 | Neural network training method and device and image segmentation method |
Non-Patent Citations (2)
Title |
---|
LIU Zhengqiong et al., "Chinese Character Recognition Based on Character Encoding and Convolutional Neural Network", Journal of Electronic Measurement and Instrumentation, vol. 34, no. 02, 29 February 2020 (2020-02-29), pages 143-149 *
WANG Yiming et al., "Character Recognition Method for Scanning Electron Microscope Images Based on Neural Network Model", Manufacturing Automation, vol. 42, no. 07, 31 July 2020 (2020-07-31), pages 18-20 *
Also Published As
Publication number | Publication date |
---|---|
CN114399766B (en) | 2024-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112052850A (en) | License plate recognition method and device, electronic equipment and storage medium | |
CN112396005A (en) | Biological characteristic image recognition method and device, electronic equipment and readable storage medium | |
CN113157927A (en) | Text classification method and device, electronic equipment and readable storage medium | |
CN113961473A (en) | Data testing method and device, electronic equipment and computer readable storage medium | |
CN113298159A (en) | Target detection method and device, electronic equipment and storage medium | |
CN114491047A (en) | Multi-label text classification method and device, electronic equipment and storage medium | |
CN113268665A (en) | Information recommendation method, device and equipment based on random forest and storage medium | |
CN115205225A (en) | Training method, device and equipment of medical image recognition model and storage medium | |
CN112541688B (en) | Service data verification method and device, electronic equipment and computer storage medium | |
CN113627160A (en) | Text error correction method and device, electronic equipment and storage medium | |
CN113658002A (en) | Decision tree-based transaction result generation method and device, electronic equipment and medium | |
CN113434542A (en) | Data relation identification method and device, electronic equipment and storage medium | |
CN113313211A (en) | Text classification method and device, electronic equipment and storage medium | |
CN112269875A (en) | Text classification method and device, electronic equipment and storage medium | |
CN112733551A (en) | Text analysis method and device, electronic equipment and readable storage medium | |
CN112486957A (en) | Database migration detection method, device, equipment and storage medium | |
CN115496166A (en) | Multitasking method and device, electronic equipment and storage medium | |
CN113221888B (en) | License plate number management system test method and device, electronic equipment and storage medium | |
CN114996386A (en) | Business role identification method, device, equipment and storage medium | |
CN112580505B (en) | Method and device for identifying network point switch door state, electronic equipment and storage medium | |
CN115544566A (en) | Log desensitization method, device, equipment and storage medium | |
CN115203364A (en) | Software fault feedback processing method, device, equipment and readable storage medium | |
CN115146064A (en) | Intention recognition model optimization method, device, equipment and storage medium | |
CN114399766B (en) | Optical character recognition model training method, device, equipment and medium | |
CN113515591A (en) | Text bad information identification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |