CN111639527A - English handwritten text recognition method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111639527A
CN111639527A
Authority
CN
China
Prior art keywords
picture
english
recognition model
pictures
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010329360.1A
Other languages
Chinese (zh)
Inventor
赵振兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiante Technology Service Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202010329360.1A priority Critical patent/CN111639527A/en
Priority to PCT/CN2020/098237 priority patent/WO2021212652A1/en
Publication of CN111639527A publication Critical patent/CN111639527A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

A method of English handwritten text recognition, the method comprising: acquiring an English handwritten text line picture set; performing equal-scale scaling on all pictures in the English handwritten text line picture set according to a preset width threshold to obtain a plurality of scaled pictures; determining a first standard picture and a picture with a length to be compensated from the plurality of scaled pictures; adding a blank area to the picture with the length to be compensated according to a preset length threshold to obtain a second standard picture; randomly adjusting the first standard picture and the second standard picture to obtain a training picture; training an initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model; acquiring a picture to be recognized; and inputting the picture to be recognized into the trained recognition model to obtain a recognition result. The invention also provides an English handwritten text recognition apparatus, an electronic device and a storage medium. The invention can recognize an entire line of English text.

Description

English handwritten text recognition method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of picture recognition, in particular to an English handwritten text recognition method and device, electronic equipment and a storage medium.
Background
At present, a computer can recognize characters in character images, such as individual English letters and single words. In practice, however, the characters in some images are handwritten, and handwritten characters vary in form with each person's writing habits. Moreover, a whole line of text contains spaces between words and punctuation marks, and its length is not fixed, so existing methods cannot recognize an entire line of English text.
Therefore, how to recognize an entire line of English text is a technical problem that urgently needs to be solved.
Disclosure of Invention
In view of the above, it is desirable to provide an english handwritten text recognition method, apparatus, electronic device and storage medium, which can recognize an entire line of english text.
The first aspect of the present invention provides an english handwritten text recognition method, including:
acquiring an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks;
according to a preset width threshold, performing equal-scale scaling on all pictures in the English handwritten text line picture set to obtain a plurality of scaled pictures;
determining a first standard picture and a picture with a length to be compensated from the plurality of scaled pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with the length to be compensated is smaller than the preset length threshold;
adding a blank area to the picture with the length to be compensated according to the preset length threshold value to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value;
randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
acquiring a picture to be identified;
and inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English, blank spaces and punctuation marks in the picture to be recognized.
In a possible implementation manner, randomly adjusting the first standard picture and the second standard picture to obtain a training picture includes:
acquiring a preset zooming multiple interval;
according to the preset zooming multiple interval, carrying out equal-proportion random zooming on the first standard image and the second standard image to obtain a random zooming image;
mapping the random zooming picture on a canvas with a preset size to obtain a target picture with a consistent size;
respectively randomly adjusting the brightness, the contrast and the saturation of the target picture to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
In a possible implementation manner, training an initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model includes:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into a recurrent layer of the initial recognition model to obtain image time sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
In a possible implementation manner, updating the network parameters of the initial recognition model according to the back propagation algorithm and the loss value to obtain a trained recognition model includes:
according to a back propagation algorithm and the loss value, adjusting the network parameters of the initial recognition model to minimize the loss value, so as to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested on the test set;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
In one possible implementation, the method further includes:
if the accuracy is smaller than or equal to a preset accuracy threshold, determining that the model to be tested is an untrained recognition model;
and retraining the untrained recognition model.
In a possible implementation manner, after training an initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model, the method further includes:
according to a Hough transform algorithm, performing tilt correction on the picture to be recognized to obtain a corrected picture;
inputting the picture to be recognized into the trained recognition model, and obtaining a recognition result comprises:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
In one possible implementation, the initial recognition model includes a convolutional layer, a recurrent layer, and a transcription layer.
A second aspect of the present invention provides an apparatus for recognizing handwritten english text, the apparatus comprising:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an English handwritten text line picture set, and pictures of the English handwritten text line picture set comprise English letters, spaces and punctuations;
the zooming module is used for carrying out equal-scale zooming on all pictures in the English handwritten text line picture set according to a preset width threshold value to obtain a plurality of zoomed pictures;
a determining module, configured to determine a first standard picture and a length-to-be-compensated picture from the multiple zoom pictures, where a length of the first standard picture is equal to a preset length threshold, and a length of the length-to-be-compensated picture is smaller than the preset length threshold;
an adding module, configured to add a blank region to the picture with the length to be compensated according to the preset length threshold, to obtain a second standard picture, where the length of the second standard picture is equal to the preset length threshold;
the adjusting module is used for randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
the training module is used for training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
the acquisition module is also used for acquiring a picture to be identified;
and the input module is used for inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English, blank spaces and punctuation marks in the picture to be recognized.
A third aspect of the present invention provides an electronic device, which includes a processor and a memory, wherein the processor is configured to implement the method for recognizing handwritten english text when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the method for recognizing handwritten english text.
By the above technical solution, a recognition model can be trained with a large set of English handwritten text line pictures to recognize entire lines of English text. The training pictures are scaled in equal proportion so that the characters in them are not deformed, and their brightness, contrast, saturation and noise are randomly adjusted to simulate pictures produced in different scenes, which improves the precision of the recognition model and enables it to recognize English text lines in a wide variety of pictures. Meanwhile, after equal-proportion scaling, pictures of insufficient length are padded so that all pictures have a consistent length and a consistent width; a large number of pictures can therefore be used for training simultaneously, which increases the speed of training the recognition model.
Drawings
Fig. 1 is a flowchart illustrating a method for recognizing handwritten english text according to a preferred embodiment of the present invention.
Fig. 2 is a functional block diagram of an apparatus for recognizing handwritten english text according to a preferred embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device implementing a method for recognizing handwritten english text according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The English handwritten text recognition method of the invention is applied to an electronic device, and can also be applied to a hardware environment formed by an electronic device and a server connected to it through a network, where the method is executed jointly by the server and the electronic device. Networks include, but are not limited to: a wide area network, a metropolitan area network, or a local area network.
The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network device, a server group consisting of a plurality of network devices, or a cloud based on cloud computing and consisting of a large number of hosts or network devices, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for recognizing handwritten english text according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted.
S11, the electronic equipment obtains an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks.
The English handwritten text line picture set can be obtained from the public IAM Handwriting Database, which contains unconstrained English handwritten text scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels.
And S12, the electronic equipment performs equal-scale scaling on all pictures in the English handwritten text line picture set according to a preset width threshold value to obtain a plurality of scaling pictures.
The width of each scaled picture is the preset width, while the lengths of the scaled pictures may differ.
In the embodiment of the invention, equal-proportion scaling prevents the English letters in the pictures from deforming. The pictures can be scaled so that their widths equal the preset width; because the aspect ratio of each picture is preserved during scaling, pictures whose original aspect ratios differ will have the same width but different lengths after scaling.
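The equal-proportion scaling step described above can be sketched as follows. The function name and the example width threshold are illustrative assumptions for this sketch, not values taken from the disclosure:

```python
def scale_to_width(length, width, width_threshold):
    """Proportionally scale a (length, width) text-line picture so its width
    equals width_threshold; the length follows the original aspect ratio,
    so pictures with different aspect ratios end up with the same width
    but different lengths, as the description notes."""
    if width <= 0:
        raise ValueError("width must be positive")
    factor = width_threshold / width
    return round(length * factor), width_threshold
```

For example, a 600x60 line picture and a 200x40 line picture scaled to a width of 32 would come out with lengths 320 and 160 respectively.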
S13, the electronic device determines a first standard picture and a picture with length to be compensated from the multiple zoom pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with length to be compensated is smaller than the preset length threshold.
In the embodiment of the invention, the pictures with the length larger than the preset length can be deleted.
And S14, adding a blank area to the picture with the length to be compensated according to the preset length threshold value by the electronic equipment to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value.
In the embodiment of the invention, a blank area is added at the left end or the right end of the picture with the length to be compensated to obtain a second standard picture, so that the sizes of the pictures are kept consistent. The neural network used in the training has certain requirements on the input pictures (length and width), and the pictures which meet the requirements and have consistent picture length and consistent picture width can be simultaneously input into the neural network to be trained, so that the training time is saved.
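A minimal sketch of the blank-area padding step, assuming grayscale pictures stored as numpy arrays with white (255) as the blank value; the function name and the choice to pad on the right are illustrative (the description permits either end):

```python
import numpy as np

def pad_to_length(img, length_threshold, blank=255):
    """Append a blank (white) region on the right so the picture's length
    reaches length_threshold. img is a 2-D grayscale array shaped
    (width, length); pictures already at the threshold pass through."""
    width, length = img.shape
    if length > length_threshold:
        raise ValueError("picture longer than threshold; such pictures are deleted")
    pad = length_threshold - length
    return np.pad(img, ((0, 0), (0, pad)), constant_values=blank)
```

After this step every picture shares the same length and width, so a batch of them can be fed into the network together.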
S15, the electronic equipment randomly adjusts the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted objects comprise picture brightness, picture contrast, picture saturation, noise and picture font size.
In the embodiment of the invention, the brightness, the contrast, the saturation, the noise and the font size of the picture can be adjusted, English text pictures shot in different environments can be simulated, the diversity of training samples can be increased, and the training effect is improved.
Specifically, randomly adjusting the first standard picture and the second standard picture to obtain the training picture includes:
acquiring a preset zooming multiple interval;
according to the preset zooming multiple interval, carrying out equal-proportion random zooming on the first standard image and the second standard image to obtain a random zooming image;
mapping the random zooming picture on a canvas with a preset size to obtain a target picture with a consistent size;
respectively randomly adjusting the brightness, the contrast and the saturation of the target picture to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
The preset scaling multiple interval may be [0.6, 1.0], so that the length and the width of the scaled picture do not exceed the original length and original width (that is, the scaled picture does not exceed the canvas), and the scaled picture can therefore be mapped onto a canvas of the preset size.
In this optional embodiment, the zoom factor may be randomly obtained from the preset zoom factor interval to zoom the image, so as to simulate a situation that different people write words with different font sizes. The random adjustment of the brightness, the contrast and the saturation of the picture is carried out in order to simulate the pictures with different effects caused by different picture backgrounds and different shooting light rays in a real scene. The noise was added randomly in order to simulate different quality pictures. Through the training pictures adjusted randomly, the recognition model with higher accuracy and wider applicability can be trained.
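The augmentation sequence above (random equal-proportion scaling, mapping onto a fixed canvas, then brightness/contrast jitter and additive noise) might be sketched as below. This toy works on grayscale arrays, so the saturation adjustment is omitted; all jitter ranges, names, and the nearest-neighbour resize are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded here only for reproducibility

def augment(img, canvas_shape=(32, 256), scale_interval=(0.6, 1.0)):
    """Random augmentation sketch: equal-proportion random scaling within
    the preset interval, paste onto a blank canvas of preset size, then
    random contrast/brightness jitter and additive Gaussian noise."""
    h, w = img.shape
    s = rng.uniform(*scale_interval)
    nh, nw = max(1, int(h * s)), max(1, int(w * s))
    # nearest-neighbour resize (stand-in for a library resize call)
    rows = (np.arange(nh) * h / nh).astype(int)
    cols = (np.arange(nw) * w / nw).astype(int)
    small = img[np.ix_(rows, cols)].astype(np.float64)
    canvas = np.full(canvas_shape, 255.0)          # blank (white) canvas
    canvas[:nh, :nw] = small                       # map onto preset canvas
    canvas = (canvas - 128) * rng.uniform(0.8, 1.2) + 128  # random contrast
    canvas += rng.uniform(-20, 20)                 # random brightness
    canvas += rng.normal(0, 5, canvas.shape)       # random noise
    return np.clip(canvas, 0, 255).astype(np.uint8)
```

Because the scale factor never exceeds 1.0, the scaled picture always fits on the canvas, matching the reasoning in the paragraph above.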
And S16, the electronic equipment trains the initial recognition model according to a back propagation algorithm and the training picture to obtain the trained recognition model.
The neural network in the initial recognition model has a loss function, which measures the distance between the data currently output by the network and the ideal data; the back propagation algorithm updates each parameter in the network so that the loss value calculated by the loss function keeps decreasing, that is, the data output by the network keeps approaching the ideal data.
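The loss-and-backpropagation idea described above can be illustrated with a deliberately tiny example: fitting y = w*x by gradient descent on a squared-error loss. The real model uses a CTC loss over a deep network; this sketch (all names are illustrative) only shows the principle that updating parameters along the negative gradient shrinks the loss:

```python
def train_toy(xs, ys, lr=0.01, steps=100):
    """Fit y = w*x by gradient descent on mean squared error, recording
    the loss at every step so the downward trend can be observed."""
    w = 0.0
    def loss(w):
        return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    history = [loss(w)]
    for _ in range(steps):
        # gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad            # back-propagated parameter update
        history.append(loss(w))
    return w, history
```

On data drawn from y = 2x, the recorded loss shrinks toward zero and w approaches 2, mirroring how the recognition model's output approaches the ideal data.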
Wherein the initial recognition model comprises a convolutional layer, a recurrent layer and a transcription layer.
Among them, the convolutional layer may be a CNN (Convolutional Neural Network), the recurrent layer may be an RNN (Recurrent Neural Network), and the transcription layer may be a CTC (Connectionist Temporal Classification) layer.
Specifically, training the initial recognition model according to the back propagation algorithm and the training picture to obtain the trained recognition model includes:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into a recurrent layer of the initial recognition model to obtain image time sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
The label sequence is identified English text which comprises English letters, punctuation marks and spaces.
In this alternative embodiment, the pixel features of the picture may be extracted by the convolutional layer; the pixel features are then input into the recurrent layer to obtain image time sequence features, and finally the transcription layer maps the image time sequence features into a label sequence. For example: if the English letters "ab" appear in the input picture, the image time sequence features may be a group of vectors (t1, t2, t3, t4, t5), and the label sequence output by the final transcription layer may be "ab".
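The mapping from per-timestep outputs to a label sequence can be illustrated with a greedy CTC-style decode, which is the common reading of a CTC transcription layer: merge consecutive repeats, then drop the blank symbol. This is a hypothetical stand-in for the transcription layer, not the disclosure's exact procedure:

```python
def ctc_greedy_decode(timestep_labels, blank="-"):
    """Greedy CTC-style collapse: merge consecutive repeated labels,
    then remove blanks, turning per-timestep outputs into text."""
    out = []
    prev = None
    for lab in timestep_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)
```

With five timestep outputs (t1..t5) such as ["a", "a", "-", "b", "b"], the decode yields "ab", matching the example above; a blank between two identical labels keeps them distinct, so ["h", "-", "h", "i", "-"] yields "hhi".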
As an optional implementation manner, updating the network parameters of the initial recognition model according to the back propagation algorithm and the loss value to obtain a trained recognition model includes:
according to a back propagation algorithm and the loss value, adjusting the network parameters of the initial recognition model to minimize the loss value, so as to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested on the test set;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
Wherein, the test set can be English text pictures used for testing.
In this optional implementation, while the parameters of the model are being continuously updated by the back propagation algorithm, the model may be tested with the test set to obtain its recognition accuracy; if the recognition accuracy meets the preset requirement (i.e., is greater than the preset accuracy threshold), the model may be considered trained and the training completed.
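The accuracy gate described above amounts to a simple comparison, which might look like the sketch below. The 0.95 threshold is an illustrative assumption; the disclosure only says the threshold is preset:

```python
def evaluate(predictions, labels, accuracy_threshold=0.95):
    """Compute line-level accuracy on a test set and report whether the
    model counts as trained (accuracy above the preset threshold)."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    accuracy = correct / len(labels)
    return accuracy, accuracy > accuracy_threshold
```

A model below the threshold would be treated as untrained and retrained, as the next paragraphs describe.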
As an optional implementation, the method further comprises:
if the accuracy is smaller than or equal to a preset accuracy threshold, determining that the model to be tested is an untrained recognition model;
and retraining the untrained recognition model.
In this optional embodiment, if the recognition accuracy of the model is less than or equal to the preset accuracy threshold, it indicates that the recognition effect of the model has not reached the expected recognition effect, and the training may be continued or may be retrained.
And S17, the electronic equipment acquires the picture to be identified.
The picture to be recognized can be a picture carrying English letters.
S18, the electronic equipment inputs the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
In the embodiment of the invention, the trained recognition model can recognize the whole line of English text in the picture.
As an optional implementation manner, after the initial recognition model is trained according to a back propagation algorithm and the training picture, and a trained recognition model is obtained, the method further includes:
according to a Hough transform algorithm, performing tilt correction on the picture to be recognized to obtain a corrected picture;
inputting the picture to be recognized into the trained recognition model, and obtaining a recognition result comprises:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
In this alternative embodiment, the Hough transform maps the letter image into a parameter space to calculate its tilt angle, and the image is then rotated by that angle to obtain a horizontal letter image. This prevents the recognition effect from degrading when the letter image is tilted as a result of personal writing habits or photographing.
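The tilt-angle estimation can be sketched as a coarse Hough-style vote over candidate skew angles: each ink pixel votes, via rho = x*cos(theta) + y*sin(theta), for the lines it could lie on, and the angle whose best rho bucket collects the most votes wins. This is a simplified stand-in for the Hough transform step, with all names and parameter ranges assumed:

```python
import math

def hough_skew_angle(points, max_angle=10.0, step=0.5):
    """Estimate text-line skew (degrees from horizontal) by voting:
    for each candidate angle, project ink pixels onto the line normal
    and score how many share the same rounded rho value. The winning
    angle would then drive the rotation back to horizontal."""
    best_angle, best_score = 0.0, -1
    n_steps = int(2 * max_angle / step) + 1
    for i in range(n_steps):
        a = -max_angle + i * step
        theta = math.radians(a + 90.0)  # normal of a line tilted by a degrees
        buckets = {}
        for x, y in points:
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            buckets[rho] = buckets.get(rho, 0) + 1
        score = max(buckets.values())
        if score > best_score:
            best_score, best_angle = score, a
    return best_angle
```

For ink pixels lying along a line tilted 3 degrees, the vote recovers 3.0 (to the 0.5-degree grid), and the image would then be rotated by that amount.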
In the method flow described in fig. 1, a recognition model can be trained with a large set of English handwritten text line pictures to recognize entire lines of English text. The training pictures are scaled in equal proportion so that the characters in them are not deformed, and their brightness, contrast, saturation and noise are randomly adjusted to simulate pictures produced in different scenes, which improves the accuracy of the recognition model and enables it to recognize English text lines in a wide variety of pictures. Meanwhile, after equal-proportion scaling, pictures of insufficient length are padded so that all pictures have a consistent length and a consistent width; a large number of pictures can therefore be used for training simultaneously, which increases the speed of training the recognition model.
Referring to fig. 2, fig. 2 is a functional block diagram of a preferred embodiment of an english handwritten text recognition apparatus according to the present invention.
In some embodiments, the English handwritten text recognition apparatus runs in an electronic device. The apparatus may include a plurality of functional modules composed of program code segments. The program code of the various segments of the English handwritten text recognition apparatus may be stored in a memory and executed by at least one processor to perform some or all of the steps of the English handwritten text recognition method described in fig. 1.
In this embodiment, the english handwritten text recognition apparatus may be divided into a plurality of functional modules according to the functions executed by the apparatus. The functional module may include: an acquisition module 201, a scaling module 202, a determination module 203, an addition module 204, an adjustment module 205, a training module 206, and an input module 207. The module referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in memory.
The obtaining module 201 is configured to obtain an english handwritten text line picture set, where the pictures of the english handwritten text line picture set include english letters, spaces, and punctuation marks.
The English handwritten text line picture set can be obtained from the public IAM Handwriting Database, which contains unconstrained English handwritten text scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels.
And the zooming module 202 is configured to perform equal-scale zooming on all the pictures in the english handwritten text line picture set according to a preset width threshold, so as to obtain a plurality of zoomed pictures.
The width of each scaled picture equals the preset width, while the lengths of the scaled pictures may differ.
In the embodiment of the invention, proportional scaling prevents the English letters in the pictures from being deformed. All pictures can be scaled to the preset width; because each picture's aspect ratio is preserved during scaling, pictures whose original aspect ratios differ end up with the same width but different lengths.
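A minimal sketch of this proportional scaling, assuming grayscale pictures stored as numpy arrays with axis 0 as the document's "width" (fixed after scaling) and axis 1 as its "length" (varying); nearest-neighbour sampling is used only to keep the example dependency-free:

```python
import numpy as np

def scale_to_width(img: np.ndarray, preset_width: int) -> np.ndarray:
    # Equal-proportion zoom: both axes are resized by the same factor, so
    # the letters in the picture are not deformed. Axis 0 is the document's
    # "width" (equal to preset_width afterwards), axis 1 its "length".
    w, l = img.shape
    factor = preset_width / w
    new_l = max(1, round(l * factor))
    rows = np.minimum((np.arange(preset_width) / factor).astype(int), w - 1)
    cols = np.minimum((np.arange(new_l) / factor).astype(int), l - 1)
    return img[np.ix_(rows, cols)]
```

Two pictures with different aspect ratios come out with the same width but different lengths, exactly the situation the padding step below then resolves.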
A determining module 203, configured to determine a first standard picture and a length-to-be-compensated picture from the multiple zoom pictures, where a length of the first standard picture is equal to a preset length threshold, and a length of the length-to-be-compensated picture is smaller than the preset length threshold.
In the embodiment of the invention, pictures whose length exceeds the preset length threshold can be deleted.
An adding module 204, configured to add a blank area to the picture with the length to be compensated according to the preset length threshold, to obtain a second standard picture, where the length of the second standard picture is equal to the preset length threshold.
In the embodiment of the invention, a blank area is added at the left or right end of each picture whose length is to be compensated, producing a second standard picture, so that all pictures end up the same size. The neural network used for training places fixed requirements on the size (length and width) of its input; only pictures that meet those requirements, with consistent length and consistent width, can be fed into the neural network in the same batch, which saves training time.
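The padding step can be sketched as follows (an illustration, not the patented implementation; the pad value 255 assumes white-background grayscale pictures, and the blank area is appended at the right end):

```python
import numpy as np

def pad_to_length(img: np.ndarray, preset_length: int, blank: int = 255) -> np.ndarray:
    # Append a blank (white) area so the picture's length reaches
    # preset_length; pictures already long enough are returned as-is.
    w, l = img.shape
    if l >= preset_length:
        return img
    pad = np.full((w, preset_length - l), blank, dtype=img.dtype)
    return np.concatenate([img, pad], axis=1)
```

After this step every picture has the same width (from scaling) and the same length (from padding), so whole batches can be stacked into one tensor for training.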
An adjusting module 205, configured to randomly adjust the first standard picture and the second standard picture to obtain a training picture, where an object of the random adjustment includes picture brightness, picture contrast, picture saturation, noise, and picture font size.
In the embodiment of the invention, adjusting the brightness, contrast, saturation, noise and font size of the pictures simulates English text pictures shot in different environments, increases the diversity of the training samples, and improves the training effect.
And the training module 206 is configured to train the initial recognition model according to a back propagation algorithm and the training picture, so as to obtain a trained recognition model.
The neural network in the initial recognition model has a loss function that measures the distance between the data currently output by the network and the ideal data. The back propagation algorithm updates each parameter in the network so that the loss value computed by the loss function keeps decreasing, that is, so that the network's output moves ever closer to the ideal data.
Wherein the initial recognition model comprises a convolutional layer, a cyclic layer and a transcription layer.
Here, the convolutional layer may be a CNN (Convolutional Neural Network), the cyclic layer may be an RNN (Recurrent Neural Network), and the transcription layer may be CTC (Connectionist Temporal Classification).
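The data flow through the three layers can be sketched shape-by-shape in plain numpy (a real CNN/RNN/CTC model would be built in a deep-learning framework; the layer sizes and class count below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: picture width W, picture length L (the time axis),
# C feature channels, and a class set covering letters, digits,
# punctuation, space and the CTC blank.
W, L, C, n_classes = 32, 128, 64, 80

img = rng.random((W, L))

# "Convolutional layer" (stand-in): collapse the width axis into C feature
# channels, one feature vector per column -> image pixel features.
conv_w = rng.standard_normal((C, W)) * 0.1
features = conv_w @ img                      # shape (C, L)

# "Cyclic layer": a minimal recurrence over the L time steps
# -> image time sequence features.
rnn_w = rng.standard_normal((C, C)) * 0.01
rnn_u = rng.standard_normal((C, C)) * 0.01
h, seq = np.zeros(C), []
for t in range(L):
    h = np.tanh(rnn_w @ features[:, t] + rnn_u @ h)
    seq.append(h)
seq = np.stack(seq)                          # shape (L, C)

# "Transcription layer": per-time-step class scores; CTC maps these score
# sequences to a tag sequence during training and decoding.
out_w = rng.standard_normal((n_classes, C)) * 0.1
scores = seq @ out_w.T                       # shape (L, n_classes)
```

The key point is that the picture's length axis becomes the time axis, which is what lets the model read an entire text line rather than isolated characters.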
The obtaining module 201 is further configured to obtain a picture to be recognized.
The picture to be recognized can be a picture carrying English letters.
And the input module 207 is configured to input the picture to be recognized into the trained recognition model to obtain a recognition result, where the recognition result includes an english letter, a space and a punctuation mark in the picture to be recognized.
In the embodiment of the invention, the trained recognition model can recognize the whole line of English text in the picture.
As an optional implementation manner, the adjusting module 205 randomly adjusts the first standard picture and the second standard picture to obtain the training picture specifically:
acquiring a preset zooming multiple interval;
according to the preset zooming multiple interval, carrying out equal-proportion random zooming on the first standard image and the second standard image to obtain a random zooming image;
mapping the random zooming picture on a canvas with a preset size to obtain a target picture with a consistent size;
respectively randomly adjusting the brightness, the contrast and the saturation of the target picture to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
The preset zooming multiple interval may be [0.6, 1.0], so that the length and width of the scaled picture never exceed the original length and width (that is, the scaled picture never exceeds the canvas), and the scaled picture can be mapped onto a canvas of the preset size.
In this optional embodiment, a zoom factor can be drawn at random from the preset zooming multiple interval to scale the picture, simulating different people writing in different font sizes. Brightness, contrast and saturation are adjusted at random to simulate the differing picture backgrounds and shooting light of real scenes, and random noise is added to simulate pictures of different quality. Training on these randomly adjusted pictures yields a recognition model with higher accuracy and wider applicability.
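The whole random-adjustment pipeline above can be sketched for grayscale pictures (so the saturation step is omitted; all parameter ranges other than the [0.6, 1.0] zoom interval are illustrative assumptions, not values from the patent):

```python
import numpy as np

def random_adjust(img: np.ndarray, canvas_shape=(32, 128), rng=None) -> np.ndarray:
    if rng is None:
        rng = np.random.default_rng()
    w, l = img.shape
    # 1. equal-proportion random zoom with a factor from [0.6, 1.0], so the
    #    result never exceeds the original size (and hence fits the canvas)
    f = rng.uniform(0.6, 1.0)
    nw = min(canvas_shape[0], max(1, round(w * f)))
    nl = min(canvas_shape[1], max(1, round(l * f)))
    rows = np.minimum((np.arange(nw) / f).astype(int), w - 1)
    cols = np.minimum((np.arange(nl) / f).astype(int), l - 1)
    small = img[np.ix_(rows, cols)].astype(float)
    # 2. map onto a blank canvas of the preset size
    canvas = np.full(canvas_shape, 255.0)
    canvas[:nw, :nl] = small
    # 3. random contrast (multiplicative) and brightness (additive)
    canvas = (canvas - 128.0) * rng.uniform(0.8, 1.2) + 128.0
    canvas = canvas + rng.uniform(-20.0, 20.0)
    # 4. random Gaussian noise
    canvas = canvas + rng.normal(0.0, 5.0, size=canvas_shape)
    return np.clip(canvas, 0.0, 255.0)
```

Because every output lands on the same fixed-size canvas, the augmented pictures remain batchable while still varying in effective font size, tone and noise.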
As an optional implementation manner, the training module 206 trains the initial recognition model according to a back propagation algorithm and the training picture, and the manner of obtaining the trained recognition model specifically includes:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into a circulation layer of the initial recognition model to obtain image time sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
The label sequence is identified English text which comprises English letters, punctuation marks and spaces.
In this alternative embodiment, the convolutional layer extracts the pixel features of the picture; the pixel features are then input into the cyclic layer to obtain image time sequence features, and finally the transcription layer maps the time sequence features into a tag sequence. For example, if the English letters "ab" appear in the input picture, the image time sequence features might be a group of vectors (t1, t2, t3, t4, t5), and the tag sequence output by the transcription layer would be "ab".
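The many-time-steps-to-few-tags mapping in the example above is what CTC decoding performs; a greedy sketch of that collapse rule (merge repeats, then drop blanks) is:

```python
def ctc_greedy_decode(timestep_labels, blank="-"):
    # Collapse a per-time-step best-label sequence into a tag sequence the
    # CTC way: merge consecutive repeats, then drop the blank symbol.
    out, prev = [], None
    for lab in timestep_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)

# e.g. five time steps (t1..t5) for the letters "ab":
print(ctc_greedy_decode(["a", "a", "-", "b", "b"]))  # prints "ab"
```

The blank symbol also lets CTC represent genuinely repeated letters ("aa" needs a blank between the two runs of "a"), which is why repeats are merged before blanks are removed.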
As an optional implementation manner, the training module 206 updates the network parameters of the initial recognition model according to a back propagation algorithm and the loss value, and the manner of obtaining the trained recognition model specifically includes:
according to a back propagation algorithm and the loss value, adjusting network parameters of the initial identification model to minimize the loss value to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested passing the test;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
Wherein, the test set can be English text pictures used for testing.
In this optional implementation, while the back propagation algorithm keeps updating the model's parameters, the model can be tested against the test set to obtain its recognition accuracy; once the recognition accuracy meets the preset requirement (that is, it exceeds the preset accuracy threshold), training may be considered complete.
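The accuracy-gated training loop described above can be sketched as follows (`train_step` and `evaluate` are hypothetical caller-supplied callables, not names from the patent):

```python
def train_until_accurate(train_step, evaluate, threshold=0.95, max_rounds=1000):
    # Alternate parameter updates (back propagation inside train_step) with
    # evaluation on the preset test set; stop as soon as the test accuracy
    # exceeds the preset accuracy threshold.
    for rounds in range(1, max_rounds + 1):
        train_step()
        if evaluate() > threshold:
            return True, rounds      # trained recognition model obtained
    return False, max_rounds         # still an untrained recognition model
```

The `False` branch corresponds to the case below where the model is deemed untrained and must continue training or be retrained.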
As an optional implementation manner, the determining module 203 is further configured to determine that the model to be tested is an untrained recognition model if the accuracy is less than or equal to a preset accuracy threshold;
the training module 206 is further configured to retrain the untrained recognition model.
In this optional embodiment, if the recognition accuracy of the model is less than or equal to the preset accuracy threshold, the model has not yet reached the expected recognition effect, and training may be continued or restarted from scratch.
As an optional implementation manner, the english handwritten text recognition apparatus may further include:
and the correction module is used for training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model, and then performing inclination correction on the picture to be recognized according to a Hough transform algorithm to obtain a corrected picture.
The input module 207 inputs the picture to be recognized into the trained recognition model, and the mode of obtaining the recognition result specifically includes:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
In this alternative embodiment, the Hough transform maps the letter picture into a parameter space, from which the tilt angle of the picture is calculated; the picture is then rotated by that angle to obtain a horizontal letter picture. This prevents the poor recognition that results when a letter picture is tilted by personal writing habits or by the photographing angle.
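A simplified Hough-style vote for the tilt angle can be sketched in numpy (restricted to near-horizontal lines; a production system would use an optimized Hough transform such as OpenCV's HoughLines, then rotate the picture by the returned angle):

```python
import numpy as np

def estimate_tilt_deg(binary: np.ndarray, tilts=np.arange(-30.0, 30.5, 0.5)) -> float:
    # Vote in parameter space: each candidate tilt angle shears the
    # foreground pixels back toward horizontal, and the angle whose
    # strongest single row collects the most pixels wins.
    ys, xs = np.nonzero(binary)
    best_tilt, best_votes = 0.0, -1
    for t in tilts:
        proj = np.round(ys - xs * np.tan(np.deg2rad(t))).astype(int)
        votes = np.bincount(proj - proj.min()).max()
        if votes > best_votes:
            best_votes, best_tilt = votes, float(t)
    return best_tilt
```

The [-30, 30] degree search range and 0.5-degree step are assumptions; handwriting photographed at a steeper angle would need a wider range.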
In the apparatus for recognizing handwritten English text depicted in fig. 2, a recognition model can be trained on a large set of English handwritten text line pictures so that it recognizes whole lines of English text. The training pictures are scaled proportionally, which ensures that the characters in them are not deformed, and their brightness, contrast, saturation and noise are randomly adjusted to simulate the kinds of pictures produced in different scenes; this improves the accuracy of the recognition model and lets it recognize English text lines in a wide variety of pictures. In addition, after the proportional scaling, pictures that fall short of the target length are padded so that all pictures share the same length and the same width; a large batch of pictures can therefore be used for training at the same time, which speeds up training of the recognition model.
As shown in fig. 3, fig. 3 is a schematic structural diagram of an electronic device for implementing a method for recognizing handwritten english text according to a preferred embodiment of the present invention. The electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 3 may further include input/output devices, network access devices, and the like.
The electronic device 3 may be, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote controller, touch panel or voice control device, for example a personal computer, tablet computer, smartphone, Personal Digital Assistant (PDA), game machine, Internet Protocol Television (IPTV) or intelligent wearable device. The network where the electronic device 3 is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.
The at least one Processor 32 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a transistor logic device, a discrete hardware component, etc. The processor 32 may be a microprocessor or the processor 32 may be any conventional processor or the like, and the processor 32 is a control center of the electronic device 3 and connects various parts of the whole electronic device 3 by various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the modules/units, and the processor 32 implements the various functions of the electronic device 3 by running or executing the computer program and/or modules/units stored in the memory 31 and by calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), while the data storage area may store data (such as audio data) created through the use of the electronic device 3. In addition, the memory 31 may include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, or a flash memory device.
With reference to fig. 1, the memory 31 of the electronic device 3 stores a plurality of instructions to implement an english handwritten text recognition method, and the processor 32 executes the plurality of instructions to implement:
acquiring an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks;
according to a preset width threshold value, carrying out equal-scale scaling on all pictures in the English handwritten text line picture set to obtain a plurality of scaling pictures;
determining a first standard picture and a picture with length to be compensated from the multiple zoom pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with length to be compensated is smaller than the preset length threshold;
adding a blank area to the picture with the length to be compensated according to the preset length threshold value to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value;
randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
acquiring a picture to be identified;
and inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
Specifically, the processor 32 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.
In the electronic device 3 depicted in fig. 3, a recognition model can be trained on a large set of English handwritten text line pictures so that it recognizes whole lines of English text. The training pictures are scaled proportionally, which ensures that the characters in them are not deformed, and their brightness, contrast, saturation and noise are randomly adjusted to simulate the kinds of pictures produced in different scenes; this improves the accuracy of the recognition model and lets it recognize English text lines in a wide variety of pictures. In addition, after the proportional scaling, pictures that fall short of the target length are padded so that all pictures share the same length and the same width; a large batch of pictures can therefore be used for training at the same time, which speeds up training of the recognition model.
The integrated modules/units of the electronic device 3 may be stored in a computer-readable storage medium if they are implemented as software functional units and sold or used as separate products. Based on this understanding, all or part of the flow of the method in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, carries out the steps of the method embodiments. The computer program code may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. An English handwritten text recognition method, characterized in that the English handwritten text recognition method comprises:
acquiring an English handwritten text line picture set, wherein pictures of the English handwritten text line picture set comprise English letters, spaces and punctuation marks;
according to a preset width threshold value, carrying out equal-scale scaling on all pictures in the English handwritten text line picture set to obtain a plurality of scaling pictures;
determining a first standard picture and a picture with length to be compensated from the multiple zoom pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with length to be compensated is smaller than the preset length threshold;
adding a blank area to the picture with the length to be compensated according to the preset length threshold value to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold value;
randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
acquiring a picture to be identified;
and inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
2. The method for recognizing handwritten English text according to claim 1, wherein the randomly adjusting the first standard picture and the second standard picture to obtain the training picture comprises:
acquiring a preset zooming multiple interval;
according to the preset zooming multiple interval, carrying out equal-proportion random zooming on the first standard image and the second standard image to obtain a random zooming image;
mapping the random zooming picture on a canvas with a preset size to obtain a target picture with a consistent size;
respectively randomly adjusting the brightness, the contrast and the saturation of the target picture to obtain pictures with random brightness, random contrast and random saturation;
and adding random noise to the pictures with random brightness, random contrast and random saturation to obtain training pictures.
3. The method for recognizing handwritten English text according to claim 1, wherein the training an initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model comprises:
inputting the training picture into a convolution layer of the initial recognition model to obtain image pixel characteristics;
inputting the image pixel characteristics into a circulation layer of the initial recognition model to obtain image time sequence characteristics;
inputting the image time sequence characteristics into a transcription layer of the initial recognition model to obtain a tag sequence;
calculating a loss value corresponding to the label sequence by using a loss function;
and updating the network parameters of the initial recognition model according to a back propagation algorithm and the loss value to obtain a trained recognition model.
4. The method of claim 3, wherein the updating the network parameters of the initial recognition model according to the back propagation algorithm and the loss value to obtain the trained recognition model comprises:
according to a back propagation algorithm and the loss value, adjusting network parameters of the initial identification model to minimize the loss value to obtain a model to be tested;
acquiring a preset test set;
testing the model to be tested by using the test set, and determining the accuracy rate of the model to be tested passing the test;
and if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is a trained recognition model.
5. The english handwritten text recognition method according to claim 4, further comprising:
if the accuracy is smaller than or equal to a preset accuracy threshold, determining that the model to be tested is an untrained recognition model;
and retraining the untrained recognition model.
6. The method for recognizing handwritten English text according to claim 1, wherein after the initial recognition model is trained according to a back propagation algorithm and the training picture, and the trained recognition model is obtained, the method for recognizing handwritten English text further comprises:
according to a Hough transform algorithm, performing tilt correction on the picture to be recognized to obtain a corrected picture;
inputting the picture to be recognized into the trained recognition model, and obtaining a recognition result comprises:
and inputting the correction picture into the trained recognition model to obtain a recognition result.
7. The english handwritten text recognition method according to any one of claims 1 to 6, wherein said initial recognition model includes a convolutional layer, a cyclic layer, and a transcription layer.
8. An english handwritten text recognition apparatus, characterized in that said english handwritten text recognition apparatus comprises:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an English handwritten text line picture set, and pictures of the English handwritten text line picture set comprise English letters, spaces and punctuations;
the zooming module is used for carrying out equal-scale zooming on all pictures in the English handwritten text line picture set according to a preset width threshold value to obtain a plurality of zoomed pictures;
a determining module, configured to determine a first standard picture and a length-to-be-compensated picture from the multiple zoom pictures, where a length of the first standard picture is equal to a preset length threshold, and a length of the length-to-be-compensated picture is smaller than the preset length threshold;
an adding module, configured to add a blank region to the picture with the length to be compensated according to the preset length threshold, to obtain a second standard picture, where the length of the second standard picture is equal to the preset length threshold;
the adjusting module is used for randomly adjusting the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted object comprises picture brightness, picture contrast, picture saturation, noise and picture font size;
the training module is used for training the initial recognition model according to a back propagation algorithm and the training picture to obtain a trained recognition model;
the acquisition module is also used for acquiring a picture to be identified;
and the input module is used for inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result comprises English letters, spaces and punctuation marks in the picture to be recognized.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the english handwritten text recognition method according to any of claims 1 to 7.
10. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the english handwritten text recognition method according to any one of claims 1 to 7.
CN202010329360.1A 2020-04-23 2020-04-23 English handwritten text recognition method and device, electronic equipment and storage medium Pending CN111639527A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010329360.1A CN111639527A (en) 2020-04-23 2020-04-23 English handwritten text recognition method and device, electronic equipment and storage medium
PCT/CN2020/098237 WO2021212652A1 (en) 2020-04-23 2020-06-24 Handwritten english text recognition method and device, electronic apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010329360.1A CN111639527A (en) 2020-04-23 2020-04-23 English handwritten text recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111639527A true CN111639527A (en) 2020-09-08

Family

ID=72328702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010329360.1A Pending CN111639527A (en) 2020-04-23 2020-04-23 English handwritten text recognition method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111639527A (en)
WO (1) WO2021212652A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887546A (en) * 2021-12-08 2022-01-04 军事科学院系统工程研究院网络信息研究所 Method and system for improving image recognition accuracy
CN114065868A (en) * 2021-11-24 2022-02-18 马上消费金融股份有限公司 Training method of text detection model, text detection method and device
CN115546614A (en) * 2022-12-02 2022-12-30 天津城建大学 Safety helmet wearing detection method based on improved YOLOV5 model
WO2023173617A1 (en) * 2022-03-18 2023-09-21 北京百度网讯科技有限公司 Image processing method and apparatus, device, and storage medium

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN114550158A (en) * 2022-02-23 2022-05-27 厦门大学 Scene character recognition method and system
WO2024103292A1 (en) * 2022-11-16 2024-05-23 京东方科技集团股份有限公司 Handwritten form recognition method, and handwritten form recognition model training method and device
CN116798052B (en) * 2023-08-28 2023-12-08 腾讯科技(深圳)有限公司 Training method and device of text recognition model, storage medium and electronic equipment

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US10936862B2 (en) * 2016-11-14 2021-03-02 Kodak Alaris Inc. System and method of character recognition using fully convolutional neural networks
EP3598339A1 (en) * 2018-07-19 2020-01-22 Tata Consultancy Services Limited Systems and methods for end-to-end handwritten text recognition using neural networks
CN109376658B (en) * 2018-10-26 2022-03-08 信雅达科技股份有限公司 OCR method based on deep learning
CN109598290A (en) * 2018-11-22 2019-04-09 上海交通大学 A small-target image detection method based on combined hierarchical detection
CN110298338B (en) * 2019-06-20 2021-08-24 北京易道博识科技有限公司 Document image classification method and device
CN110298343A (en) * 2019-07-02 2019-10-01 哈尔滨理工大学 A handwritten blackboard-writing recognition method
CN110619326B (en) * 2019-07-02 2023-04-18 安徽七天网络科技有限公司 English test paper composition detection and identification system and method based on scanning
CN110781885A (en) * 2019-10-24 2020-02-11 泰康保险集团股份有限公司 Text detection method, device, medium and electronic equipment based on image processing
CN110765966B (en) * 2019-10-30 2022-03-25 哈尔滨工业大学 One-stage automatic recognition and translation method for handwritten characters
CN111008624A (en) * 2019-12-05 2020-04-14 嘉兴太美医疗科技有限公司 Optical character recognition method and method for generating training sample for optical character recognition

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN114065868A (en) * 2021-11-24 2022-02-18 马上消费金融股份有限公司 Training method of text detection model, text detection method and device
CN114065868B (en) * 2021-11-24 2022-09-02 马上消费金融股份有限公司 Training method of text detection model, text detection method and device
CN113887546A (en) * 2021-12-08 2022-01-04 军事科学院系统工程研究院网络信息研究所 Method and system for improving image recognition accuracy
CN113887546B (en) * 2021-12-08 2022-03-11 军事科学院系统工程研究院网络信息研究所 Method and system for improving image recognition accuracy
WO2023173617A1 (en) * 2022-03-18 2023-09-21 北京百度网讯科技有限公司 Image processing method and apparatus, device, and storage medium
CN115546614A (en) * 2022-12-02 2022-12-30 天津城建大学 Safety helmet wearing detection method based on improved YOLOV5 model
CN115546614B (en) * 2022-12-02 2023-04-18 天津城建大学 Safety helmet wearing detection method based on improved YOLOV5 model

Also Published As

Publication number Publication date
WO2021212652A1 (en) 2021-10-28

Similar Documents

Publication Publication Date Title
WO2021212652A1 (en) Handwritten English text recognition method and device, electronic apparatus, and storage medium
CN108121986B (en) Object detection method and device, computer device and computer readable storage medium
CN107977633B (en) Age recognition method and device for facial images, and storage medium
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
US10902283B2 (en) Method and device for determining handwriting similarity
CN110276342B (en) License plate identification method and system
WO2022156640A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
CN109345553B (en) Palm and key point detection method and device thereof, and terminal equipment
CN110443140B (en) Text positioning method, device, computer equipment and storage medium
CN111291629A (en) Method and device for recognizing text in image, computer equipment and computer storage medium
CN109583509B (en) Data generation method and device and electronic equipment
US20210383199A1 (en) Object-Centric Learning with Slot Attention
CN111507330B (en) Problem recognition method and device, electronic equipment and storage medium
US20230027412A1 (en) Method and apparatus for recognizing subtitle region, device, and storage medium
CN110598703B (en) OCR (optical character recognition) method and device based on deep neural network
CN112464798A (en) Text recognition method and device, electronic equipment and storage medium
WO2022126917A1 (en) Deep learning-based face image evaluation method and apparatus, device, and medium
CN112949649B (en) Text image identification method and device and computing equipment
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
WO2020244076A1 (en) Face recognition method and apparatus, and electronic device and storage medium
CN112990134B (en) Image simulation method and device, electronic equipment and storage medium
CN112836467B (en) Image processing method and device
US11900258B2 (en) Learning device, image generating device, learning method, image generating method, and program
CN114140802B (en) Text recognition method and device, electronic equipment and storage medium
US20240193980A1 (en) Method for recognizing human body area in image, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20210128

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen saiante Technology Service Co.,Ltd.

Address before: 518000 1st-34th floor, Qianhai free trade building, 3048 Mawan Xinghai Avenue, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong

Applicant before: Ping An International Smart City Technology Co.,Ltd.

SE01 Entry into force of request for substantive examination