CN111639527A - English handwritten text recognition method and device, electronic equipment and storage medium - Google Patents
English handwritten text recognition method and device, electronic equipment and storage medium
- Publication number
- CN111639527A (application number CN202010329360.1A)
- Authority
- CN
- China
- Prior art keywords
- picture
- english
- recognition model
- pictures
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/243—Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
A method of English handwritten text recognition, the method comprising: acquiring an English handwritten text line picture set; proportionally scaling all pictures in the English handwritten text line picture set according to a preset width threshold to obtain a plurality of scaled pictures; determining a first standard picture and a picture with a length to be compensated from the plurality of scaled pictures; adding a blank area to the picture with the length to be compensated according to a preset length threshold to obtain a second standard picture; randomly adjusting the first standard picture and the second standard picture to obtain training pictures; training an initial recognition model according to a back propagation algorithm and the training pictures to obtain a trained recognition model; acquiring a picture to be recognized; and inputting the picture to be recognized into the trained recognition model to obtain a recognition result. The invention also provides an English handwritten text recognition device, electronic equipment and a storage medium. The invention can recognize a whole line of English text.
Description
Technical Field
The invention relates to the technical field of picture recognition, in particular to an English handwritten text recognition method and device, electronic equipment and a storage medium.
Background
At present, a computer can recognize characters in text images, such as individual English letters and single words. In practice, however, the characters in some text images are handwritten by users, and the written characters vary in form because of differing personal writing habits; moreover, a whole line of text contains spaces between words and punctuation marks, and its length is not fixed, so existing methods cannot recognize a whole line of English text.
Therefore, how to recognize the whole line of english text is a technical problem that needs to be solved urgently.
Disclosure of Invention
In view of the above, it is desirable to provide an english handwritten text recognition method, apparatus, electronic device and storage medium, which can recognize an entire line of english text.
The first aspect of the present invention provides an english handwritten text recognition method, including:
acquiring an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks;
according to a preset width threshold value, carrying out equal-scale scaling on all pictures in the English handwritten text line picture set to obtain a plurality of scaling pictures;
determining a first standard picture and a picture with length to be compensated from the multiple zoom pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with length to be compensated is smaller than the preset length threshold;
adding a blank area to the picture with the length to be compensated according to the preset length threshold value to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value;
randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
acquiring a picture to be identified;
and inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English, blank spaces and punctuation marks in the picture to be recognized.
In a possible implementation manner, the randomly adjusting the first standard picture and the second standard picture to obtain a training picture includes:
acquiring a preset scaling factor interval;
performing proportional random scaling on the first standard picture and the second standard picture according to the preset scaling factor interval to obtain random scaled pictures;
mapping the random scaled pictures onto a canvas of a preset size to obtain target pictures of a consistent size;
randomly adjusting the brightness, contrast and saturation of the target pictures to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
In a possible implementation manner, the training an initial recognition model according to a back propagation algorithm and the training picture, and obtaining a trained recognition model includes:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into a recurrent layer of the initial recognition model to obtain image time-sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
In a possible implementation manner, the updating the network parameters of the initial recognition model according to the back propagation algorithm and the loss value to obtain a trained recognition model includes:
according to a back propagation algorithm and the loss value, adjusting network parameters of the initial identification model to minimize the loss value to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested passing the test;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
In one possible implementation, the method further includes:
if the accuracy is smaller than or equal to a preset accuracy threshold, determining that the model to be tested is an untrained recognition model;
and retraining the untrained recognition model.
In a possible implementation manner, after the training an initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model, the method further includes:
according to a Hough transform algorithm, performing tilt correction on the picture to be recognized to obtain a corrected picture;
inputting the picture to be recognized into the trained recognition model, and obtaining a recognition result comprises:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
In one possible implementation, the initial recognition model includes a convolutional layer, a recurrent layer, and a transcription layer.
A second aspect of the present invention provides an apparatus for recognizing handwritten english text, the apparatus comprising:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an English handwritten text line picture set, and pictures of the English handwritten text line picture set comprise English letters, spaces and punctuations;
the zooming module is used for carrying out equal-scale zooming on all pictures in the English handwritten text line picture set according to a preset width threshold value to obtain a plurality of zoomed pictures;
a determining module, configured to determine a first standard picture and a length-to-be-compensated picture from the multiple zoom pictures, where a length of the first standard picture is equal to a preset length threshold, and a length of the length-to-be-compensated picture is smaller than the preset length threshold;
an adding module, configured to add a blank region to the picture with the length to be compensated according to the preset length threshold, to obtain a second standard picture, where the length of the second standard picture is equal to the preset length threshold;
the adjusting module is used for randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
the training module is used for training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
the acquisition module is also used for acquiring a picture to be identified;
and the input module is used for inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English, blank spaces and punctuation marks in the picture to be recognized.
A third aspect of the present invention provides an electronic device, which includes a processor and a memory, wherein the processor is configured to implement the method for recognizing handwritten english text when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the method for recognizing handwritten english text.
By the technical scheme, the recognition model can be trained by using a large number of English handwritten text line picture sets to recognize the whole line of English texts, wherein the pictures for training are scaled in equal proportion, so that the characters in the pictures are ensured not to deform, the brightness, the contrast, the saturation and the noise of the pictures are randomly adjusted, the picture types generated under different scenes are simulated, the precision of the recognition model can be improved, and the English text lines in various pictures can be recognized. Meanwhile, after the pictures for training are subjected to equal-scale scaling, the length of the pictures with insufficient length is supplemented, and the length consistency and the width consistency of all the pictures are ensured, so that a large number of pictures can be used for training at the same time, and the speed of training the recognition model is improved.
Drawings
Fig. 1 is a flowchart illustrating a method for recognizing handwritten english text according to a preferred embodiment of the present invention.
Fig. 2 is a functional block diagram of an apparatus for recognizing handwritten english text according to a preferred embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device implementing a method for recognizing handwritten english text according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The English handwritten text recognition method of the invention is applied to an electronic device; it can also be applied to a hardware environment formed by an electronic device and a server connected to the electronic device through a network, in which case the method is executed jointly by the server and the electronic device. Networks include, but are not limited to: a wide area network, a metropolitan area network, or a local area network.
The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network device, a server group consisting of a plurality of network devices, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network devices, wherein the Cloud Computing is one of distributed Computing, and is a super virtual computer consisting of a group of loosely coupled computers. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for recognizing handwritten english text according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted.
S11, the electronic equipment obtains an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks.
The English handwritten text line picture set can be obtained from the public IAM Handwriting Database, which contains unconstrained English handwritten text scanned at a resolution of 300 dpi and stored as PNG images with 256 gray levels.
And S12, the electronic equipment performs equal-scale scaling on all pictures in the English handwritten text line picture set according to a preset width threshold value to obtain a plurality of scaling pictures.
The width of the zooming picture is a preset width, and the lengths of the zooming pictures may be different.
In the embodiment of the invention, proportional scaling prevents the English letters in the pictures from deforming. The pictures can be scaled so that their width equals the preset width; because the aspect ratio of each picture is preserved during scaling, pictures with different original aspect ratios end up with the same width but different lengths.
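The proportional scaling can be sketched as a pure dimension computation; in the patent's terms, the width is fixed to the preset threshold and the length scales by the same factor (the function name and example values below are illustrative, not from the patent):

```python
def scale_to_width(length, width, target_width):
    """Proportionally scale a (length, width) picture so its width equals
    target_width; the length scales by the same factor, so pictures with
    different aspect ratios end up with different lengths."""
    if length <= 0 or width <= 0:
        raise ValueError("picture dimensions must be positive")
    ratio = target_width / width
    return round(length * ratio), target_width
```

A picture of length 200 and width 50, scaled to a preset width of 25, becomes 100 by 25 — same width as every other scaled picture, but its own length.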
S13, the electronic device determines a first standard picture and a picture with length to be compensated from the multiple zoom pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with length to be compensated is smaller than the preset length threshold.
In the embodiment of the invention, the pictures with the length larger than the preset length can be deleted.
And S14, adding a blank area to the picture with the length to be compensated according to the preset length threshold value by the electronic equipment to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value.
In the embodiment of the invention, a blank area is added at the left end or the right end of the picture with the length to be compensated to obtain a second standard picture, so that the sizes of the pictures are kept consistent. The neural network used in the training has certain requirements on the input pictures (length and width), and the pictures which meet the requirements and have consistent picture length and consistent picture width can be simultaneously input into the neural network to be trained, so that the training time is saved.
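The length compensation can be sketched on a grayscale array, padding with white (255) on the right or left; the array layout (rows = width, columns = length) is an assumption for illustration:

```python
import numpy as np

def pad_to_length(img, target_length, pad_value=255, side="right"):
    """Pad a grayscale line image (rows = width, columns = length) with a
    blank (white) region on the left or right so its length equals
    target_length, making all pictures the same size for batched training."""
    width, length = img.shape
    if length > target_length:
        raise ValueError("picture longer than the preset length threshold")
    blank = np.full((width, target_length - length), pad_value, dtype=img.dtype)
    if side == "right":
        return np.concatenate([img, blank], axis=1)
    return np.concatenate([blank, img], axis=1)
```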
S15, the electronic equipment randomly adjusts the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted objects comprise picture brightness, picture contrast, picture saturation, noise and picture font size.
In the embodiment of the invention, the brightness, the contrast, the saturation, the noise and the font size of the picture can be adjusted, English text pictures shot in different environments can be simulated, the diversity of training samples can be increased, and the training effect is improved.
Specifically, the randomly adjusting the first standard picture and the second standard picture to obtain the training picture includes:
acquiring a preset scaling factor interval;
performing proportional random scaling on the first standard picture and the second standard picture according to the preset scaling factor interval to obtain random scaled pictures;
mapping the random scaled pictures onto a canvas of a preset size to obtain target pictures of a consistent size;
randomly adjusting the brightness, contrast and saturation of the target pictures to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
The preset scaling factor interval may be [0.6, 1.0], so that the length and width of a scaled picture do not exceed the original length and width (that is, the scaled picture does not spill outside the canvas), and the scaled picture can therefore be mapped onto a canvas of the preset size.
In this optional embodiment, the zoom factor may be randomly obtained from the preset zoom factor interval to zoom the image, so as to simulate a situation that different people write words with different font sizes. The random adjustment of the brightness, the contrast and the saturation of the picture is carried out in order to simulate the pictures with different effects caused by different picture backgrounds and different shooting light rays in a real scene. The noise was added randomly in order to simulate different quality pictures. Through the training pictures adjusted randomly, the recognition model with higher accuracy and wider applicability can be trained.
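A minimal sketch of these random adjustments on a grayscale array (the IAM scans are grayscale, so saturation is omitted here; all parameter ranges are illustrative assumptions, not values from the patent):

```python
import numpy as np

def augment(img, rng):
    """Randomly adjust brightness and contrast and add Gaussian noise to a
    grayscale image with values in [0, 255], simulating pictures captured
    under different lighting and of different quality."""
    img = img.astype(np.float32)
    img = img + rng.uniform(-30, 30)                       # random brightness shift
    img = (img - 127.5) * rng.uniform(0.7, 1.3) + 127.5    # random contrast
    img = img + rng.normal(0, 5, size=img.shape)           # random Gaussian noise
    return np.clip(img, 0, 255).astype(np.uint8)
```

Each call produces a differently perturbed copy of the same source picture, which increases the diversity of the training samples.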
And S16, the electronic equipment trains the initial recognition model according to a back propagation algorithm and the training picture to obtain the trained recognition model.
The neural network in the initial recognition model can have a loss function, which measures the distance between the data currently output by the neural network and the ideal data; the back propagation algorithm updates each parameter in the neural network so that the loss value calculated by the loss function decreases continuously, that is, so that the output of the neural network gets progressively closer to the ideal data.
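The relationship between the loss function and the back-propagation updates can be illustrated with a one-parameter toy model (this is a generic gradient-descent sketch, not the patent's network): each step moves the parameter against the gradient, so the loss shrinks and the output approaches the ideal data.

```python
def train_step(w, x, y_ideal, lr=0.1):
    """One gradient-descent update on a one-parameter model y = w * x."""
    y = w * x                        # current model output
    loss = (y - y_ideal) ** 2        # distance to the ideal data
    grad = 2 * (y - y_ideal) * x     # derivative of the loss w.r.t. w
    return w - lr * grad, loss

w = 0.0
losses = []
for _ in range(20):
    w, loss = train_step(w, x=1.0, y_ideal=3.0)
    losses.append(loss)
# losses decrease at every step and w approaches the ideal value 3.0
```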
Wherein the initial recognition model comprises a convolutional layer, a recurrent layer and a transcription layer.
Here, the convolutional layer may be a CNN (Convolutional Neural Network), the recurrent layer may be an RNN (Recurrent Neural Network), and the transcription layer may be CTC (Connectionist Temporal Classification).
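How the picture's length becomes the recurrent layer's time axis can be sketched as follows; the three 2× pooling stages are an assumed CRNN-style configuration, not specified by the patent:

```python
def num_time_steps(length, pool_factors=(2, 2, 2)):
    """Number of time steps the recurrent layer receives from the
    convolutional layer: each pooling stage divides the picture length
    (the horizontal axis of the text line) by its factor."""
    for f in pool_factors:
        length //= f
    return length
```

With this configuration, an 800-pixel-long line picture yields 100 time steps, and because all training pictures share one preset length, every sample in a batch yields the same number of time steps.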
Specifically, the training the initial recognition model according to the back propagation algorithm and the training picture, and obtaining the trained recognition model includes:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into the recurrent layer of the initial recognition model to obtain image time-sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
The label sequence is identified English text which comprises English letters, punctuation marks and spaces.
In this alternative embodiment, the pixel features of the picture can be extracted by the convolutional layer; the pixel features are then input into the recurrent layer to obtain image time-sequence features, and finally the transcription layer maps the image time-sequence features into a label sequence. For example: if the English letters "ab" appear in the input picture, the image time-sequence features may be a group of vectors (t1, t2, t3, t4, t5), and the label sequence output by the transcription layer may be "ab".
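The collapsing performed by the transcription layer can be sketched with a greedy CTC decode — collapse consecutive repeats, then drop blanks. This is the standard simplification of CTC decoding, not necessarily the patent's exact procedure; the blank symbol '-' is an assumed notation:

```python
BLANK = "-"  # CTC blank symbol (assumed notation)

def ctc_greedy_decode(per_step_labels):
    """Map a per-time-step label sequence to a final label sequence by
    collapsing consecutive repeats and removing blanks, so that e.g.
    the five steps a, a, -, b, b become the text 'ab'."""
    out = []
    prev = None
    for lab in per_step_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)
```

The blank lets CTC distinguish a repeated letter from a long stroke: "a-ab" decodes to "aab", while "aaab" decodes to "ab".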
As an optional implementation manner, the updating the network parameters of the initial recognition model according to the back propagation algorithm and the loss value to obtain a trained recognition model includes:
according to a back propagation algorithm and the loss value, adjusting network parameters of the initial identification model to minimize the loss value to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested passing the test;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
Wherein, the test set can be English text pictures used for testing.
In this optional implementation, while the parameters of the model are continuously updated by the back propagation algorithm, the model can be tested with the test set to obtain its recognition accuracy; if the recognition accuracy meets the preset requirement (i.e., it is greater than the preset accuracy threshold), the training of the model can be considered complete.
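The pass/fail decision against the preset accuracy threshold can be sketched as follows (the 0.95 default is an assumed value; the patent does not fix a threshold):

```python
def passes_test(predictions, labels, accuracy_threshold=0.95):
    """Compute recognition accuracy over a test set of (predicted text,
    ground-truth text) pairs; the model passes when the accuracy exceeds
    the preset threshold."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    accuracy = correct / len(labels)
    return accuracy, accuracy > accuracy_threshold
```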
As an optional implementation, the method further comprises:
if the accuracy is smaller than or equal to a preset accuracy threshold, determining that the model to be tested is an untrained recognition model;
and retraining the untrained recognition model.
In this optional embodiment, if the recognition accuracy of the model is less than or equal to the preset accuracy threshold, the recognition effect of the model has not reached the expected level, and training may be continued or the model may be retrained.
And S17, the electronic equipment acquires the picture to be identified.
The picture to be recognized can be a picture carrying English letters.
S18, the electronic equipment inputs the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
In the embodiment of the invention, the trained recognition model can recognize the whole line of English text in the picture.
As an optional implementation manner, after the initial recognition model is trained according to a back propagation algorithm and the training picture, and a trained recognition model is obtained, the method further includes:
according to a Hough transform algorithm, performing tilt correction on the picture to be recognized to obtain a corrected picture;
inputting the picture to be recognized into the trained recognition model, and obtaining a recognition result comprises:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
In this alternative embodiment, the Hough transform can map the letter image into a parameter space and calculate its tilt angle; the letter image is then rotated by that angle to obtain a horizontal letter image. This prevents the poor recognition caused by letter images that are tilted because of personal writing habits or the way the photograph was taken.
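A sketch of the correction step, assuming a text baseline has already been detected (e.g., two points on a line returned by a Hough line detector); the helper names are illustrative:

```python
import math

def skew_angle(x1, y1, x2, y2):
    """Tilt angle in degrees of a detected text baseline,
    from two points on the line."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def rotate_point(x, y, angle_deg):
    """Rotate a point about the origin by -angle_deg, undoing the tilt
    (applying this to every pixel coordinate levels the line)."""
    a = math.radians(-angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))
```

A baseline through (0, 0) and (10, 10) has a 45-degree tilt; rotating its far endpoint by the negative of that angle brings it back onto the horizontal axis.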
In the method flow described in fig. 1, the recognition model can be trained by using a large number of english handwritten text line image sets to recognize the whole line of english text, wherein the images for training are scaled in equal proportion, so as to ensure that the characters in the images are not deformed, and the brightness, contrast, saturation and noise of the images are randomly adjusted to simulate the image types generated in different scenes, so that the accuracy of the recognition model can be improved, and the english text lines in various images can be recognized. Meanwhile, after the pictures for training are subjected to equal-scale scaling, the length of the pictures with insufficient length is supplemented, and the length consistency and the width consistency of all the pictures are ensured, so that a large number of pictures can be used for training at the same time, and the speed of training the recognition model is improved.
Referring to fig. 2, fig. 2 is a functional block diagram of a preferred embodiment of an english handwritten text recognition apparatus according to the present invention.
In some embodiments, the English handwritten text recognition apparatus runs in an electronic device. The apparatus may include a plurality of functional modules made up of program code segments. The program code of the various segments of the English handwritten text recognition apparatus may be stored in a memory and executed by at least one processor to perform some or all of the steps of the English handwritten text recognition method described in fig. 1.
In this embodiment, the english handwritten text recognition apparatus may be divided into a plurality of functional modules according to the functions executed by the apparatus. The functional module may include: an acquisition module 201, a scaling module 202, a determination module 203, an addition module 204, an adjustment module 205, a training module 206, and an input module 207. The module referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in memory.
The obtaining module 201 is configured to obtain an english handwritten text line picture set, where the pictures of the english handwritten text line picture set include english letters, spaces, and punctuation marks.
The English handwritten text line picture set can be obtained from the public IAM Handwriting Database, which contains unconstrained English handwritten text scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels.
And the zooming module 202 is configured to perform equal-scale zooming on all the pictures in the english handwritten text line picture set according to a preset width threshold, so as to obtain a plurality of zoomed pictures.
The width of the zooming picture is a preset width, and the lengths of the zooming pictures may be different.
In the embodiment of the invention, equal-proportion scaling prevents the English letters in the pictures from being deformed. All pictures can be scaled so that their width equals the preset width; because the aspect ratio of each picture is kept fixed during scaling, pictures whose original aspect ratios differ end up with the same width but different lengths.
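The equal-proportion scaling step can be sketched as follows, assuming grayscale line images stored as NumPy arrays, with the fixed dimension called "width" as in the patent text (for a text line this is its height in pixels). The nearest-neighbor sampling and the function name `scale_to_width` are illustrative choices, not part of the disclosed method.

```python
import numpy as np

def scale_to_width(img: np.ndarray, preset_width: int) -> np.ndarray:
    """Proportionally scale a grayscale line image (rows x cols) so that its
    fixed dimension equals preset_width; the other dimension (the "length")
    scales by the same factor, so the characters are not deformed."""
    h, w = img.shape
    scale = preset_width / h
    new_w = max(1, round(w * scale))
    # nearest-neighbor sampling: pick the source row/column for each output pixel
    rows = (np.arange(preset_width) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]
```

Because the scale factor is shared by both axes, pictures with different original aspect ratios come out with the same width but different lengths, exactly the situation the determination module then handles.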
A determining module 203, configured to determine a first standard picture and a length-to-be-compensated picture from the multiple zoom pictures, where a length of the first standard picture is equal to a preset length threshold, and a length of the length-to-be-compensated picture is smaller than the preset length threshold.
In the embodiment of the invention, the pictures with the length larger than the preset length can be deleted.
An adding module 204, configured to add a blank area to the picture with the length to be compensated according to the preset length threshold, to obtain a second standard picture, where the length of the second standard picture is equal to the preset length threshold.
In the embodiment of the invention, a blank area is added at the left end or the right end of the picture whose length is to be compensated to obtain a second standard picture, so that the sizes of the pictures are kept consistent. The neural network used in training imposes requirements on the size (length and width) of its input pictures; pictures that meet these requirements and share a consistent length and width can be fed into the neural network in a single batch, which saves training time.
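A minimal sketch of the length compensation step, assuming white (255) as the blank background value and padding appended on the right; the patent permits either end, and the helper name `pad_to_length` is hypothetical.

```python
import numpy as np

def pad_to_length(img: np.ndarray, preset_length: int, pad_value: int = 255) -> np.ndarray:
    """Append a blank area on the right so the line image reaches
    preset_length columns; pictures already at the preset length pass
    through unchanged."""
    h, w = img.shape
    if w > preset_length:
        # the patent deletes pictures longer than the preset length
        raise ValueError("picture longer than preset length")
    pad = np.full((h, preset_length - w), pad_value, dtype=img.dtype)
    return np.concatenate([img, pad], axis=1)
```

After this step, every picture in the set is a "standard picture" of identical length and width and can be batched together.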
An adjusting module 205, configured to randomly adjust the first standard picture and the second standard picture to obtain a training picture, where an object of the random adjustment includes picture brightness, picture contrast, picture saturation, noise, and picture font size.
In the embodiment of the invention, the brightness, the contrast, the saturation, the noise and the font size of the picture can be adjusted, English text pictures shot in different environments can be simulated, the diversity of training samples can be increased, and the training effect is improved.
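The random adjustment might be illustrated as below. The jitter ranges and noise standard deviation are invented for the sketch, since the patent does not fix concrete values, and saturation is omitted because the sketch works on single-channel grayscale arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_adjust(img: np.ndarray) -> np.ndarray:
    """Randomly jitter brightness and contrast and add Gaussian noise to
    simulate pictures taken in different environments."""
    out = img.astype(np.float32)
    out = out * rng.uniform(0.8, 1.2)                    # brightness jitter
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.8, 1.2) + mean    # contrast jitter
    out = out + rng.normal(0.0, 5.0, out.shape)          # additive noise
    return np.clip(out, 0, 255).astype(np.uint8)
```

Each call produces a differently perturbed copy of the same line, so one source picture yields many training samples.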
And the training module 206 is configured to train the initial recognition model according to a back propagation algorithm and the training picture, so as to obtain a trained recognition model.
The neural network in the initial recognition model can have a loss function, which measures the distance between the data output by the current neural network model and the ideal data; the back propagation algorithm updates each parameter in the neural network so that the loss value calculated by the loss function keeps decreasing, that is, so that the output of the neural network model keeps approaching the ideal data.
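The interplay of loss value and parameter update can be seen in a one-parameter toy example; a real network computes the gradient through back propagation, but the update rule that keeps the loss decreasing is the same. The loss, learning rate and step count here are arbitrary illustrations.

```python
def train_toy(w: float = 5.0, lr: float = 0.1, steps: int = 50) -> float:
    """Minimize the toy loss (w - 3)^2 by gradient descent: each update
    moves w against the gradient, so the loss value keeps decreasing and
    w approaches the ideal value 3."""
    for _ in range(steps):
        grad = 2 * (w - 3.0)   # analytic derivative of (w - 3)^2
        w -= lr * grad
    return w
```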
Wherein the initial recognition model comprises a convolutional layer, a cyclic layer and a transcription layer.
Among them, the convolutional layer may be a CNN (Convolutional Neural Network), the cyclic layer may be an RNN (Recurrent Neural Network), and the transcription layer may be CTC (Connectionist Temporal Classification).
The obtaining module 201 is further configured to obtain a picture to be identified;
the picture to be recognized can be a picture carrying English letters.
And the input module 207 is configured to input the picture to be recognized into the trained recognition model to obtain a recognition result, where the recognition result includes an english letter, a space and a punctuation mark in the picture to be recognized.
In the embodiment of the invention, the trained recognition model can recognize the whole line of English text in the picture.
As an optional implementation manner, the adjusting module 205 randomly adjusts the first standard picture and the second standard picture to obtain the training picture specifically:
acquiring a preset zooming multiple interval;
according to the preset zooming multiple interval, carrying out equal-proportion random zooming on the first standard image and the second standard image to obtain a random zooming image;
mapping the random zooming picture on a canvas with a preset size to obtain a target picture with a consistent size;
respectively randomly adjusting the brightness, the contrast and the saturation of the target picture to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
The preset scaling multiple interval may be [0.6, 1.0], so that the length and the width of the scaled picture do not exceed the original length and width (that is, the picture is only ever shrunk, never enlarged) and the scaled picture can be mapped onto a canvas of a preset size.
In this optional embodiment, the zoom factor may be randomly obtained from the preset zoom factor interval to zoom the image, so as to simulate a situation that different people write words with different font sizes. The random adjustment of the brightness, the contrast and the saturation of the picture is carried out in order to simulate the pictures with different effects caused by different picture backgrounds and different shooting light rays in a real scene. The noise was added randomly in order to simulate different quality pictures. Through the training pictures adjusted randomly, the recognition model with higher accuracy and wider applicability can be trained.
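The optional augmentation path (a random equal-proportion zoom drawn from the preset interval, followed by mapping onto a fixed-size canvas) might be sketched as follows; placing the shrunken picture at the top-left corner and filling the rest with white are assumptions, as the patent does not specify the placement.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_zoom_to_canvas(img: np.ndarray, canvas_h: int, canvas_w: int,
                          low: float = 0.6, high: float = 1.0,
                          bg: int = 255) -> np.ndarray:
    """Shrink the picture by a random factor from [low, high] using
    nearest-neighbor sampling, then paste it onto a fixed-size canvas so
    all target pictures have a consistent size."""
    f = rng.uniform(low, high)
    h, w = img.shape
    nh, nw = max(1, int(h * f)), max(1, int(w * f))
    rows = (np.arange(nh) / f).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / f).astype(int).clip(0, w - 1)
    small = img[rows][:, cols]
    canvas = np.full((canvas_h, canvas_w), bg, dtype=img.dtype)
    canvas[:nh, :nw] = small   # placement policy is an assumption
    return canvas
```

Because the factor never exceeds 1.0, the shrunken picture always fits on a canvas of the original size, which simulates different handwriting font sizes within a uniform input shape.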
As an optional implementation manner, the training module 206 trains the initial recognition model according to a back propagation algorithm and the training picture, and the manner of obtaining the trained recognition model specifically includes:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into a cyclic layer of the initial recognition model to obtain image time sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
The label sequence is identified English text which comprises English letters, punctuation marks and spaces.
In this alternative embodiment, the pixel features of the picture may be extracted by the convolutional layer; the pixel features are then input into the cyclic layer to obtain image time sequence features, and finally the transcription layer maps the image time sequence features into a tag sequence. For example: if the English letters "ab" exist in the input picture, the obtained image time sequence features may be a group of vectors (t1, t2, t3, t4, t5), and the tag sequence output by the transcription layer may be "ab".
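The mapping performed by the CTC transcription layer can be illustrated by greedy decoding of per-timestep predictions: repeated labels are collapsed and blanks dropped, so a five-frame sequence covering "ab" yields the two-character tag sequence "ab". The blank index 0 and the charset layout are conventions assumed for this sketch, not details from the disclosure.

```python
BLANK = 0  # assumed index of the CTC blank symbol

def ctc_greedy_decode(frame_labels, charset):
    """Collapse repeated per-timestep labels and drop blanks, mirroring
    how the transcription layer turns time sequence features into a tag
    sequence of English letters, spaces and punctuation."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != BLANK and lab != prev:
            out.append(charset[lab - 1])  # charset[0] corresponds to label 1
        prev = lab
    return "".join(out)
```

A blank between two identical labels keeps them distinct, which is how CTC can output doubled letters such as "ll".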
As an optional implementation manner, the training module 206 updates the network parameters of the initial recognition model according to a back propagation algorithm and the loss value, and the manner of obtaining the trained recognition model specifically includes:
according to a back propagation algorithm and the loss value, adjusting network parameters of the initial identification model to minimize the loss value to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested passing the test;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
Wherein, the test set can be English text pictures used for testing.
In this optional implementation, while the parameters of the model are being continuously updated by the back propagation algorithm, the model may be tested on the test set to obtain its recognition accuracy; if the recognition accuracy meets the preset requirement (i.e., it is greater than the preset accuracy threshold), the model may be considered trained.
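The test procedure reduces to comparing model outputs against labels on the held-out test set; `model_fn`, the test-set layout, and the 0.95 threshold below are placeholders rather than values from the disclosure.

```python
def evaluate(model_fn, test_set, threshold: float = 0.95):
    """Run the model to be tested over the test set and decide whether it
    passes: accuracy above the preset threshold means training is done."""
    correct = sum(1 for picture, label in test_set if model_fn(picture) == label)
    accuracy = correct / len(test_set)
    return accuracy, accuracy > threshold
```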
As an optional implementation manner, the determining module 203 is further configured to determine that the model to be tested is an untrained recognition model if the accuracy is less than or equal to a preset accuracy threshold;
the training module 206 is further configured to retrain the untrained recognition model.
In this optional embodiment, if the recognition accuracy of the model is less than or equal to the preset accuracy threshold, the recognition effect of the model has not reached the expected level, and training may be continued or the model may be retrained from scratch.
As an optional implementation manner, the english handwritten text recognition apparatus may further include:
and the correction module is used for training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model, and then performing inclination correction on the picture to be recognized according to a Hough transform algorithm to obtain a corrected picture.
The input module 207 inputs the picture to be recognized into the trained recognition model, and the mode of obtaining the recognition result specifically includes:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
In this alternative embodiment, the Hough transform may map the letter image into a parameter space, calculate the tilt angle of the letter image, and then rotate the letter image by that angle to obtain a horizontal letter image. This prevents poor recognition caused by letter images tilted due to personal writing habits or the photographing angle.
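As a stand-in for the Hough step, the tilt angle of a text line can be estimated by fitting a line through its ink pixels; a real Hough transform would instead accumulate votes in (rho, theta) parameter space, so this least-squares variant is only an illustration of recovering a tilt angle before rotation.

```python
import numpy as np

def estimate_tilt_deg(binary_img: np.ndarray) -> float:
    """Estimate the tilt of a text line by least-squares fitting a straight
    line through its foreground (ink) pixels; note that image row indices
    grow downward, so a positive angle means the line slopes down."""
    ys, xs = np.nonzero(binary_img)
    slope = np.polyfit(xs, ys, 1)[0]   # fit ys = slope * xs + intercept
    return float(np.degrees(np.arctan(slope)))
```

Rotating the picture by the negative of this angle yields the horizontal line that is then fed into the trained recognition model.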
In the apparatus for recognizing handwritten English text depicted in fig. 2, the recognition model can be trained on a large set of English handwritten text line pictures so that it recognizes whole lines of English text. The pictures used for training are scaled in equal proportion, which ensures that the characters in the pictures are not deformed, and their brightness, contrast, saturation and noise are randomly adjusted to simulate the kinds of pictures produced in different scenes; this improves the accuracy of the recognition model and lets it recognize English text lines in a wide variety of pictures. Meanwhile, after the equal-proportion scaling, pictures of insufficient length are padded so that all pictures share a consistent length and width; a large number of pictures can therefore be used for training at the same time, which increases the speed of training the recognition model.
As shown in fig. 3, fig. 3 is a schematic structural diagram of an electronic device for implementing a method for recognizing handwritten english text according to a preferred embodiment of the present invention. The electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, which may include more or fewer components than those shown, combine some components, or have different components; for example, the electronic device 3 may further include an input/output device, a network access device, and the like.
The electronic device 3 may also include, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like. The Network where the electronic device 3 is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
The at least one Processor 32 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a transistor logic device, a discrete hardware component, etc. The processor 32 may be a microprocessor or the processor 32 may be any conventional processor or the like, and the processor 32 is a control center of the electronic device 3 and connects various parts of the whole electronic device 3 by various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the module/unit, and the processor 32 may implement various functions of the electronic device 3 by running or executing the computer program and/or the module/unit stored in the memory 31 and calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data) created according to the use of the electronic device 3, and the like. In addition, the memory 31 may include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash memory Card (FlashCard), at least one disk storage device, a flash memory device, and the like.
With reference to fig. 1, the memory 31 of the electronic device 3 stores a plurality of instructions to implement an english handwritten text recognition method, and the processor 32 executes the plurality of instructions to implement:
acquiring an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks;
according to a preset width threshold value, carrying out equal-scale scaling on all pictures in the English handwritten text line picture set to obtain a plurality of scaling pictures;
determining a first standard picture and a picture with length to be compensated from the multiple zoom pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with length to be compensated is smaller than the preset length threshold;
adding a blank area to the picture with the length to be compensated according to the preset length threshold value to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value;
randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
acquiring a picture to be identified;
and inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
Specifically, the processor 32 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.
In the electronic device 3 depicted in fig. 3, the recognition model can be trained on a large set of English handwritten text line pictures so that it recognizes whole lines of English text. The pictures used for training are scaled in equal proportion, which ensures that the characters in the pictures are not deformed, and their brightness, contrast, saturation and noise are randomly adjusted to simulate the kinds of pictures produced in different scenes; this improves the accuracy of the recognition model and lets it recognize English text lines in a wide variety of pictures. Meanwhile, after the equal-proportion scaling, pictures of insufficient length are padded so that all pictures share a consistent length and width; a large number of pictures can therefore be used for training at the same time, which increases the speed of training the recognition model.
The integrated modules/units of the electronic device 3 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program code may be in source code form, object code form, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second denote names only and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. An English handwritten text recognition method, characterized in that the English handwritten text recognition method comprises:
acquiring an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks;
according to a preset width threshold value, carrying out equal-scale scaling on all pictures in the English handwritten text line picture set to obtain a plurality of scaling pictures;
determining a first standard picture and a picture with length to be compensated from the multiple zoom pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with length to be compensated is smaller than the preset length threshold;
adding a blank area to the picture with the length to be compensated according to the preset length threshold value to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value;
randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
acquiring a picture to be identified;
and inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
2. The method for recognizing handwritten English text according to claim 1, wherein the randomly adjusting the first standard picture and the second standard picture to obtain the training picture comprises:
acquiring a preset zooming multiple interval;
according to the preset zooming multiple interval, carrying out equal-proportion random zooming on the first standard image and the second standard image to obtain a random zooming image;
mapping the random zooming picture on a canvas with a preset size to obtain a target picture with a consistent size;
respectively randomly adjusting the brightness, the contrast and the saturation of the target picture to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
3. The method for recognizing handwritten English text according to claim 1, wherein the training an initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model comprises:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into a cyclic layer of the initial recognition model to obtain image time sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
4. The method of claim 3, wherein the updating the network parameters of the initial recognition model according to the back propagation algorithm and the loss value to obtain the trained recognition model comprises:
according to a back propagation algorithm and the loss value, adjusting network parameters of the initial identification model to minimize the loss value to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested passing the test;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
5. The english handwritten text recognition method according to claim 4, further comprising:
if the accuracy is smaller than or equal to a preset accuracy threshold, determining that the model to be tested is an untrained recognition model;
and retraining the untrained recognition model.
6. The method for recognizing handwritten English text according to claim 1, wherein after the initial recognition model is trained according to a back propagation algorithm and the training picture, and the trained recognition model is obtained, the method for recognizing handwritten English text further comprises:
according to a Hough transform algorithm, performing tilt correction on the picture to be recognized to obtain a corrected picture;
inputting the picture to be recognized into the trained recognition model, and obtaining a recognition result comprises:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
7. The english handwritten text recognition method according to any one of claims 1 to 6, wherein said initial recognition model includes a convolutional layer, a cyclic layer, and a transcription layer.
8. An english handwritten text recognition apparatus, characterized in that said english handwritten text recognition apparatus comprises:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an English handwritten text line picture set, and pictures of the English handwritten text line picture set comprise English letters, spaces and punctuations;
the zooming module is used for carrying out equal-scale zooming on all pictures in the English handwritten text line picture set according to a preset width threshold value to obtain a plurality of zoomed pictures;
a determining module, configured to determine a first standard picture and a length-to-be-compensated picture from the multiple zoom pictures, where a length of the first standard picture is equal to a preset length threshold, and a length of the length-to-be-compensated picture is smaller than the preset length threshold;
an adding module, configured to add a blank region to the picture with the length to be compensated according to the preset length threshold, to obtain a second standard picture, where the length of the second standard picture is equal to the preset length threshold;
the adjusting module is used for randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
the training module is used for training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
the acquisition module is also used for acquiring a picture to be identified;
and the input module is used for inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the english handwritten text recognition method according to any of claims 1 to 7.
10. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the english handwritten text recognition method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010329360.1A CN111639527A (en) | 2020-04-23 | 2020-04-23 | English handwritten text recognition method and device, electronic equipment and storage medium |
PCT/CN2020/098237 WO2021212652A1 (en) | 2020-04-23 | 2020-06-24 | Handwritten english text recognition method and device, electronic apparatus, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111639527A true CN111639527A (en) | 2020-09-08 |
Family
ID=72328702
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010329360.1A Pending CN111639527A (en) | 2020-04-23 | 2020-04-23 | English handwritten text recognition method and device, electronic equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111639527A (en) |
WO (1) | WO2021212652A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113887546A (en) * | 2021-12-08 | 2022-01-04 | 军事科学院系统工程研究院网络信息研究所 | Method and system for improving image recognition accuracy |
CN114065868A (en) * | 2021-11-24 | 2022-02-18 | 马上消费金融股份有限公司 | Training method of text detection model, text detection method and device |
CN115546614A (en) * | 2022-12-02 | 2022-12-30 | 天津城建大学 | Safety helmet wearing detection method based on improved YOLOV5 model |
WO2023173617A1 (en) * | 2022-03-18 | 2023-09-21 | 北京百度网讯科技有限公司 | Image processing method and apparatus, device, and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114550158A (en) * | 2022-02-23 | 2022-05-27 | 厦门大学 | Scene character recognition method and system |
WO2024103292A1 (en) * | 2022-11-16 | 2024-05-23 | 京东方科技集团股份有限公司 | Handwritten form recognition method, and handwritten form recognition model training method and device |
CN116798052B (en) * | 2023-08-28 | 2023-12-08 | 腾讯科技(深圳)有限公司 | Training method and device of text recognition model, storage medium and electronic equipment |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10936862B2 (en) * | 2016-11-14 | 2021-03-02 | Kodak Alaris Inc. | System and method of character recognition using fully convolutional neural networks |
EP3598339A1 (en) * | 2018-07-19 | 2020-01-22 | Tata Consultancy Services Limited | Systems and methods for end-to-end handwritten text recognition using neural networks |
CN109376658B (en) * | 2018-10-26 | 2022-03-08 | Sunyard Technology Co., Ltd. | OCR method based on deep learning |
CN109598290A (en) * | 2018-11-22 | 2019-04-09 | Shanghai Jiao Tong University | Image small-target detection method based on combined hierarchical detection |
CN110298338B (en) * | 2019-06-20 | 2021-08-24 | Beijing Yidao Boshi Technology Co., Ltd. | Document image classification method and device |
CN110298343A (en) * | 2019-07-02 | 2019-10-01 | Harbin University of Science and Technology | Handwritten blackboard-writing recognition method |
CN110619326B (en) * | 2019-07-02 | 2023-04-18 | Anhui Qitian Network Technology Co., Ltd. | Scanning-based English test paper composition detection and recognition system and method |
CN110781885A (en) * | 2019-10-24 | 2020-02-11 | Taikang Insurance Group Co., Ltd. | Text detection method, device, medium and electronic equipment based on image processing |
CN110765966B (en) * | 2019-10-30 | 2022-03-25 | Harbin Institute of Technology | One-stage automatic recognition and translation method for handwritten characters |
CN111008624A (en) * | 2019-12-05 | 2020-04-14 | Jiaxing Taimei Medical Technology Co., Ltd. | Optical character recognition method and method for generating training sample for optical character recognition |
2020
- 2020-04-23 CN CN202010329360.1A patent/CN111639527A/en active Pending
- 2020-06-24 WO PCT/CN2020/098237 patent/WO2021212652A1/en active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114065868A (en) * | 2021-11-24 | 2022-02-18 | Mashang Consumer Finance Co., Ltd. | Training method of text detection model, text detection method and device |
CN114065868B (en) * | 2021-11-24 | 2022-09-02 | Mashang Consumer Finance Co., Ltd. | Training method of text detection model, text detection method and device |
CN113887546A (en) * | 2021-12-08 | 2022-01-04 | Institute of Network Information, Academy of Systems Engineering, Academy of Military Sciences | Method and system for improving image recognition accuracy |
CN113887546B (en) * | 2021-12-08 | 2022-03-11 | Institute of Network Information, Academy of Systems Engineering, Academy of Military Sciences | Method and system for improving image recognition accuracy |
WO2023173617A1 (en) * | 2022-03-18 | 2023-09-21 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Image processing method and apparatus, device, and storage medium |
CN115546614A (en) * | 2022-12-02 | 2022-12-30 | Tianjin Chengjian University | Safety helmet wearing detection method based on improved YOLOv5 model |
CN115546614B (en) * | 2022-12-02 | 2023-04-18 | Tianjin Chengjian University | Safety helmet wearing detection method based on improved YOLOv5 model |
Also Published As
Publication number | Publication date |
---|---|
WO2021212652A1 (en) | 2021-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021212652A1 (en) | Handwritten English text recognition method and device, electronic apparatus, and storage medium | |
CN108121986B (en) | Object detection method and device, computer device and computer readable storage medium | |
CN107977633B (en) | Age recognition method, apparatus and storage medium for facial images | |
CN111754596B (en) | Editing model generation method, device, equipment and medium for editing face image | |
US10902283B2 (en) | Method and device for determining handwriting similarity | |
CN110276342B (en) | License plate identification method and system | |
WO2022156640A1 (en) | Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product | |
CN109345553B (en) | Palm and key point detection method and device thereof, and terminal equipment | |
CN110443140B (en) | Text positioning method, device, computer equipment and storage medium | |
CN111291629A (en) | Method and device for recognizing text in image, computer equipment and computer storage medium | |
CN109583509B (en) | Data generation method and device and electronic equipment | |
US20210383199A1 (en) | Object-Centric Learning with Slot Attention | |
CN111507330B (en) | Problem recognition method and device, electronic equipment and storage medium | |
US20230027412A1 (en) | Method and apparatus for recognizing subtitle region, device, and storage medium | |
CN110598703B (en) | OCR (optical character recognition) method and device based on deep neural network | |
CN112464798A (en) | Text recognition method and device, electronic equipment and storage medium | |
WO2022126917A1 (en) | Deep learning-based face image evaluation method and apparatus, device, and medium | |
CN112949649B (en) | Text image identification method and device and computing equipment | |
CN113516697A (en) | Image registration method and device, electronic equipment and computer-readable storage medium | |
WO2020244076A1 (en) | Face recognition method and apparatus, and electronic device and storage medium | |
CN112990134B (en) | Image simulation method and device, electronic equipment and storage medium | |
CN112836467B (en) | Image processing method and device | |
US11900258B2 (en) | Learning device, image generating device, learning method, image generating method, and program | |
CN114140802B (en) | Text recognition method and device, electronic equipment and storage medium | |
US20240193980A1 (en) | Method for recognizing human body area in image, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2021-01-28
Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Applicant after: Shenzhen Saiante Technology Service Co., Ltd.
Address before: 518000 1st-34th floor, Qianhai free trade building, 3048 Mawan Xinghai Avenue, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong
Applicant before: Ping An International Smart City Technology Co., Ltd.
SE01 | Entry into force of request for substantive examination | ||