WO2021212652A1 - Handwritten english text recognition method and device, electronic apparatus, and storage medium - Google Patents
- Publication number
- WO2021212652A1 (application PCT/CN2020/098237)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- picture
- recognition model
- pictures
- preset
- training
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/243—Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to an English handwritten text recognition method, device, electronic equipment, and storage medium.
- Artificial intelligence can be used to identify text in text images, such as English letters and individual words. However, the inventor realized that some text in text images is handwritten by users, and that handwritten text differs in shape because personal writing habits differ.
- In addition, an entire line of text contains spaces between words and punctuation marks, and its length is not fixed, so the entire line of English text cannot be recognized.
- the first aspect of this application provides a method for recognizing English handwritten text.
- the method includes:
- The picture to be recognized is input into the trained recognition model to obtain a recognition result, where the recognition result includes English letters, spaces, and punctuation in the picture to be recognized.
- a second aspect of the present application provides an electronic device including a processor and a memory, and the processor is configured to execute computer-readable instructions stored in the memory to implement the following steps:
- the picture to be recognized is input into the trained recognition model to obtain a recognition result, where the recognition result includes English letters, spaces, and punctuation in the picture to be recognized.
- a third aspect of the present application provides a computer-readable storage medium having at least one computer-readable instruction stored thereon, and the at least one computer-readable instruction is executed by a processor to implement the following steps:
- the picture to be recognized is input into the trained recognition model to obtain a recognition result, where the recognition result includes English letters, spaces, and punctuation in the picture to be recognized.
- a fourth aspect of the present application provides an English handwritten text recognition device, the device includes:
- An acquisition module, used to acquire an English handwritten text line picture set, wherein the pictures in the set include English letters, spaces, and punctuation marks;
- A zoom module, used to scale all the pictures in the English handwritten text line picture set in equal proportions according to a preset width threshold to obtain multiple zoomed pictures;
- A determining module, used to determine a first standard picture and a picture with a length to be supplemented from the multiple zoomed pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with a length to be supplemented is less than the preset length threshold;
- An adding module, used to add a blank area to the picture with a length to be supplemented according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
- An adjustment module, used to randomly adjust the first standard picture and the second standard picture to obtain training pictures, wherein the randomly adjusted objects include picture brightness, picture contrast, picture saturation, noise, and picture font size;
- A training module, used to train the initial recognition model according to the backpropagation algorithm and the training pictures to obtain a trained recognition model;
- The acquisition module is also used to obtain the picture to be recognized;
- An input module, used to input the picture to be recognized into the trained recognition model to obtain a recognition result, where the recognition result includes English letters, spaces, and punctuation in the picture to be recognized.
- A recognition model can be trained to recognize an entire line of English text by using a large set of pictures of English handwritten text lines. The training pictures are scaled in equal proportions so that the text in the pictures is not deformed, and the brightness, contrast, saturation, and noise of the pictures are randomly adjusted to simulate the kinds of pictures produced in different scenes, which improves the accuracy of the recognition model and allows it to recognize English text lines in a variety of pictures.
- the insufficient length of the images is supplemented to ensure that all the images have the same length and width, so that a large number of images can be used for training at the same time, which improves the speed of training the recognition model.
- Fig. 1 is a flowchart of a preferred embodiment of a method for recognizing English handwritten text disclosed in the present application.
- Fig. 2 is a functional block diagram of a preferred embodiment of an English handwritten text recognition device disclosed in the present application.
- FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the method for recognizing English handwritten text according to the present application.
- the English handwritten text recognition method of the embodiment of the present application is applied to an electronic device, and can also be applied to a hardware environment composed of an electronic device and a server connected to the electronic device through a network, and is executed by the server and the electronic device.
- Networks include, but are not limited to: wide area networks, metropolitan area networks, or local area networks.
- FIG. 1 is a flowchart of a preferred embodiment of an English handwritten text recognition method disclosed in the present application. Depending on requirements, the order of the steps in the flowchart can be changed, and some steps can be omitted.
- the electronic device obtains an English handwritten text line picture collection, where the pictures in the English handwritten text line picture collection include English letters, spaces, and punctuation marks.
- The English handwritten text line picture collection can be obtained from the public IAM Handwriting Database.
- The IAM Handwriting Database contains unconstrained English handwritten text. The handwritten forms were scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels.
- the electronic device performs equal scaling on all pictures in the set of pictures of the English handwritten text line according to the preset width threshold to obtain multiple scaled pictures.
- the width of the zoomed picture is a preset width, and the length of the zoomed picture may be different.
- the proportional zoom can prevent the English letters in the picture from being deformed.
- Because the aspect ratio of each picture is kept fixed during scaling, every picture can be scaled to the preset width. If the original aspect ratios of the pictures differ, the scaled pictures have the same width but different lengths.
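- The proportional scaling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `scaled_size` and the example dimensions are assumptions, and only the resulting dimensions are computed rather than resampling any pixels.

```python
def scaled_size(width, length, preset_width):
    """Return the (width, length) of a picture after proportional scaling.

    The same factor is applied to both dimensions, so the aspect ratio
    (and therefore the shape of the letters) is preserved.
    """
    if width <= 0 or length <= 0 or preset_width <= 0:
        raise ValueError("dimensions must be positive")
    factor = preset_width / width
    return preset_width, max(1, round(length * factor))


# Two pictures with different aspect ratios end up with the same width
# but different lengths.
sizes = [scaled_size(64, 800, 32), scaled_size(48, 300, 32)]
```

Both example pictures come out with a width of 32, while their lengths still differ, which is why the later padding step is needed.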
- The electronic device determines a first standard picture and a picture with a length to be supplemented from the multiple zoomed pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with a length to be supplemented is less than the preset length threshold.
- pictures with a length greater than a preset length can be deleted.
- The electronic device adds a blank area to the picture with a length to be supplemented according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold.
- A blank area is added at the left or right end of the picture with a length to be supplemented to obtain the second standard picture, so that all pictures have the same size. The neural network used in training places requirements on the size (length and width) of the input pictures, and pictures that have the same length and the same width can be input into the neural network at the same time for training, which saves training time.
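- The selection and padding steps can be sketched as below. The sketch assumes a picture is a list of pixel rows in 8-bit grayscale and that "blank" means white (255); the names `pad_to_length` and `standardize` are hypothetical, and padding is done on the right end only.

```python
BLANK = 255  # assumed: white background in an 8-bit grayscale picture


def pad_to_length(picture, preset_length, blank=BLANK):
    """Append blank columns on the right so every row has preset_length pixels."""
    return [row + [blank] * (preset_length - len(row)) for row in picture]


def standardize(pictures, preset_length):
    """Keep full-length pictures, pad shorter ones, drop longer ones."""
    standard = []
    for picture in pictures:
        length = len(picture[0])
        if length == preset_length:    # first standard picture
            standard.append(picture)
        elif length < preset_length:   # picture with a length to be supplemented
            standard.append(pad_to_length(picture, preset_length))
        # pictures longer than the preset length are discarded
    return standard


short = [[0, 0], [0, 0]]
exact = [[0, 0, 0, 0], [0, 0, 0, 0]]
too_long = [[0] * 5, [0] * 5]
batch = standardize([short, exact, too_long], preset_length=4)
```

After this step every picture in `batch` has the same length, so the pictures can be stacked and fed to the network together.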
- The electronic device randomly adjusts the first standard picture and the second standard picture to obtain training pictures, wherein the randomly adjusted objects include picture brightness, picture contrast, picture saturation, noise, and picture font size.
- the brightness, contrast, saturation, noise, and font size of the picture can be adjusted to simulate English text pictures taken in different environments, which can increase the diversity of training samples and improve the training effect.
- the random adjustment of the first standard picture and the second standard picture to obtain the training picture includes:
- Random noise is added to the pictures with random brightness, random contrast, and random saturation to obtain training pictures.
- The preset zoom factor interval may be [0.6, 1.0], which ensures that the length of the zoomed picture does not exceed the original length and its width does not exceed the original width, so that the zoomed picture can be mapped onto a canvas of the preset size.
- the zoom factor can be randomly obtained from the preset zoom factor interval to zoom the picture, simulating the situation where there is a font size difference in the writing of different people.
- the purpose of randomly adjusting the brightness, contrast, and saturation of the picture is to simulate pictures with different effects in real scenes due to different picture backgrounds and different shooting light. Random noise is added to simulate pictures of different quality.
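- A rough sketch of such random adjustment on a grayscale picture follows. Only the zoom interval [0.6, 1.0] comes from the text; the brightness, contrast, and noise ranges, the nearest-neighbour zoom, and the function names are assumptions made for illustration. Saturation is left out because it only applies to colour pictures.

```python
import random


def clamp(value):
    """Keep a pixel value inside the 8-bit range."""
    return max(0, min(255, int(round(value))))


def zoom_onto_canvas(picture, factor, blank=255):
    """Nearest-neighbour zoom by factor (<= 1), pasted on a canvas of the original size."""
    height, length = len(picture), len(picture[0])
    new_h = max(1, int(height * factor))
    new_l = max(1, int(length * factor))
    canvas = [[blank] * length for _ in range(height)]
    for y in range(new_h):
        src_y = min(height - 1, int(y / factor))
        for x in range(new_l):
            src_x = min(length - 1, int(x / factor))
            canvas[y][x] = picture[src_y][src_x]
    return canvas


def random_adjust(picture, rng):
    """Apply a random zoom, brightness shift, contrast change, and Gaussian noise."""
    factor = rng.uniform(0.6, 1.0)      # interval given in the text
    brightness = rng.uniform(-30, 30)   # assumed range
    contrast = rng.uniform(0.8, 1.2)    # assumed range
    noise_sigma = rng.uniform(0, 10)    # assumed range
    zoomed = zoom_onto_canvas(picture, factor)
    return [
        [clamp((p - 128) * contrast + 128 + brightness + rng.gauss(0, noise_sigma))
         for p in row]
        for row in zoomed
    ]


rng = random.Random(0)
picture = [[0 if (x + y) % 2 else 255 for x in range(20)] for y in range(8)]
training_picture = random_adjust(picture, rng)
```

Because the zoomed content is pasted onto a canvas of the original size, the training picture keeps the dimensions that the network expects.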
- a recognition model with higher accuracy and wider applicability can be trained.
- the electronic device trains the initial recognition model according to the backpropagation algorithm and the training picture to obtain a trained recognition model.
- the neural network in the initial recognition model can have a loss function.
- the loss function is used to calculate the distance between the data output by the current neural network modeling and the ideal data.
- The backpropagation algorithm can update each parameter in the neural network so that the loss value calculated by the loss function is continuously reduced, that is, so that the data output by the neural network steadily approaches the ideal data.
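- To make the relationship between the loss function and backpropagation concrete, here is a toy sketch with a two-weight network and a squared-error loss. The network shape, learning rate, and all names are assumptions chosen for illustration; the patent's model is far larger and uses a different loss.

```python
import math


def forward(w1, w2, x):
    """Tiny network: output = w2 * tanh(w1 * x)."""
    hidden = math.tanh(w1 * x)
    return hidden, w2 * hidden


def loss_fn(output, target):
    """Squared distance between the network output and the ideal output."""
    return (output - target) ** 2


def backprop_step(w1, w2, x, target, lr=0.1):
    """One gradient-descent update, with gradients obtained by the chain rule."""
    hidden, output = forward(w1, w2, x)
    d_output = 2 * (output - target)         # dL/d(output)
    d_w2 = d_output * hidden                 # dL/dw2
    d_hidden = d_output * w2                 # dL/d(hidden)
    d_w1 = d_hidden * (1 - hidden ** 2) * x  # dL/dw1, since tanh'(z) = 1 - tanh(z)^2
    return w1 - lr * d_w1, w2 - lr * d_w2


w1, w2 = 0.5, 0.5
x, target = 1.0, 0.8
losses = []
for _ in range(50):
    losses.append(loss_fn(forward(w1, w2, x)[1], target))
    w1, w2 = backprop_step(w1, w2, x, target)
```

The recorded `losses` shrink as the weights are updated, which is the behaviour the paragraph above describes.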
- the initial recognition model includes a convolutional layer, a recurrent layer, and a transcription layer.
- The convolutional layer may be a CNN (Convolutional Neural Network).
- The recurrent layer may be an RNN (Recurrent Neural Network).
- the transcription layer may be CTC (Connectionist Temporal Classification).
- the training of the initial recognition model according to the backpropagation algorithm and the training pictures, and obtaining the trained recognition model includes:
- the network parameters of the initial recognition model are updated to obtain a trained recognition model.
- The label sequence is the recognized English text, including English letters, punctuation marks, and spaces.
- The pixel features of the picture can be extracted through the convolutional layer; the pixel features are then input into the recurrent layer to obtain the image timing features; finally, the transcription layer maps the image timing features to a label sequence. For example:
- The obtained image timing features can be a set of vectors (t1, t2, t3, t4, t5), and the label sequence output by the final transcription layer can be "ab".
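- The text does not spell out how five time steps become the two-character sequence "ab". One common way a CTC transcription layer does this is best-path decoding: take the most likely label per time step, merge consecutive repeats, then drop the blank label. The sketch below assumes that rule; the per-step labels and the blank symbol "-" are hypothetical.

```python
BLANK = "-"  # assumed symbol for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


# Hypothetical best labels for the time steps t1..t5.
text = ctc_collapse(["a", "a", "-", "b", "b"])
```

The blank label is what lets the model emit genuine double letters: `["a", "-", "a"]` decodes to "aa", whereas `["a", "a"]` decodes to "a".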
- the updating the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain a trained recognition model includes:
- A test set is used to test the model to be tested, and the accuracy rate at which the model to be tested passes the test is determined;
- If the accuracy rate is greater than a preset accuracy rate threshold, the model to be tested is determined to be a trained recognition model.
- The test set may be a number of English text pictures used for testing.
- While the backpropagation algorithm is used to continuously update the parameters of the model, the model can be tested using the test set to obtain its recognition accuracy. If the recognition accuracy meets the preset requirement (that is, the recognition accuracy is greater than the preset accuracy threshold), training of the model can be considered complete.
- the method further includes:
- If the recognition accuracy of the model is less than or equal to the preset accuracy threshold, the recognition effect of the model has not yet reached the expected level, and training can be continued or the model can be retrained.
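- The acceptance check can be sketched as follows. The text does not say how accuracy is measured, so this sketch assumes a prediction counts as correct only when the whole line matches its label; the stand-in model, test data, and function names are all hypothetical.

```python
def recognition_accuracy(model, test_set):
    """Fraction of test pictures whose whole-line prediction matches the label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for picture, label in test_set if model(picture) == label)
    return correct / len(test_set)


def is_trained(model, test_set, accuracy_threshold):
    """True only if accuracy is strictly greater than the preset threshold."""
    return recognition_accuracy(model, test_set) > accuracy_threshold


# Stand-in "model": a lookup table instead of a neural network.
fake_predictions = {"img1": "Hello, world", "img2": "a b", "img3": "oops"}
test_set = [("img1", "Hello, world"), ("img2", "a b"), ("img3", "ok."), ("img4", "x")]
accuracy = recognition_accuracy(fake_predictions.get, test_set)
```

With a strict comparison, an accuracy exactly equal to the threshold still means "continue training or retrain", which matches the "less than or equal to" wording above.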
- the electronic device obtains the picture to be recognized.
- the picture to be recognized may be a picture carrying English letters.
- the electronic device inputs the picture to be recognized into the trained recognition model to obtain a recognition result, where the recognition result includes English letters, spaces, and punctuation in the picture to be recognized.
- the trained recognition model can recognize the entire line of English text in the picture.
- the initial recognition model is trained according to the backpropagation algorithm and the training pictures, and after obtaining the trained recognition model, the method further includes:
- the inputting the picture to be recognized into the trained recognition model to obtain a recognition result includes:
- The corrected picture is input into the trained recognition model to obtain a recognition result.
- The Hough transform maps the letter image to a parameter space, from which the tilt angle of the letter image is calculated; the letter image is then rotated according to that tilt angle to obtain a horizontal letter image. This prevents poor recognition caused by letter images that are tilted because of personal writing habits or the way the picture was shot.
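- A simplified sketch of Hough-based tilt estimation follows. It assumes the input is a list of (x, y) foreground points with the y axis pointing up (in image coordinates, where y points down, the sign of the angle flips), searches whole-degree tilts only, and rotates points rather than resampling an image. The function names and the synthetic baseline are hypothetical.

```python
import math
from collections import Counter


def estimate_tilt_degrees(points, max_tilt=45):
    """Estimate the tilt of a roughly straight text baseline by Hough voting.

    Each point (x, y) votes for every candidate line through it. A line is
    described by rho = x*cos(theta) + y*sin(theta), where theta is the angle
    of the line's normal; a line tilted by `tilt` degrees has theta = 90 + tilt.
    """
    votes = Counter()
    for tilt in range(-max_tilt, max_tilt + 1):
        theta = math.radians(90 + tilt)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round(x * cos_t + y * sin_t)
            votes[(tilt, rho)] += 1
    (best_tilt, _), _ = votes.most_common(1)[0]
    return best_tilt


def rotate_point(x, y, degrees):
    """Rotate a point about the origin by `degrees` (counter-clockwise)."""
    a = math.radians(degrees)
    return x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)


# Synthetic baseline tilted by 10 degrees.
baseline = [(x, x * math.tan(math.radians(10))) for x in range(51)]
tilt = estimate_tilt_degrees(baseline)
corrected = [rotate_point(x, y, -tilt) for x, y in baseline]
```

Rotating by the negative of the estimated tilt brings the synthetic baseline back onto the horizontal axis; a real implementation would rotate the whole picture and would typically use an image library's Hough routine.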
- the recognition model can be trained to recognize the entire line of English text by using a large number of English handwritten text line picture sets.
- The training pictures are scaled in equal proportions so that the text in the pictures is not deformed, and the brightness, contrast, saturation, and noise of the pictures are randomly adjusted to simulate the kinds of pictures produced in different scenes, which improves the accuracy of the recognition model and allows it to recognize English text lines in a variety of pictures.
- the insufficient length of the images is supplemented to ensure that all the images have the same length and width, so that a large number of images can be used for training at the same time, which improves the speed of training the recognition model.
- FIG. 2 is a functional module diagram of a preferred embodiment of an English handwritten text recognition device disclosed in the present application.
- the English handwritten text recognition device runs in an electronic device.
- the English handwritten text recognition device may include multiple functional modules composed of program code segments, and the program is a series of computer-readable instruction codes.
- The program code of each program segment in the English handwritten text recognition device can be stored in a memory and executed by at least one processor to perform some or all of the steps of the English handwritten text recognition method described in FIG. 1. For details, refer to the related description of the method in FIG. 1, which is not repeated here.
- the device for recognizing English handwritten text can be divided into multiple functional modules according to the functions it performs.
- the functional modules may include: an acquisition module 201, a zoom module 202, a determination module 203, an addition module 204, an adjustment module 205, a training module 206, and an input module 207.
- the module referred to in this application refers to a series of computer-readable instruction segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory.
- The acquisition module 201 is configured to acquire an English handwritten text line picture set, wherein the pictures in the set include English letters, spaces, and punctuation marks.
- The English handwritten text line picture set can be obtained from the public IAM Handwriting Database, which contains unconstrained English handwritten text; the handwritten forms were scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels.
- the zoom module 202 is configured to perform equal-scale zooming of all the pictures in the set of pictures of the English handwritten text line according to a preset width threshold to obtain multiple zoomed pictures.
- the width of the zoomed picture is a preset width, and the length of the zoomed picture may be different.
- the proportional zoom can prevent the English letters in the picture from being deformed.
- Because the aspect ratio of each picture is kept fixed during scaling, every picture can be scaled to the preset width. If the original aspect ratios of the pictures differ, the scaled pictures have the same width but different lengths.
- The determining module 203 is configured to determine a first standard picture and a picture with a length to be supplemented from the multiple zoomed pictures, wherein the length of the first standard picture is equal to a preset length threshold, and the length of the picture with a length to be supplemented is less than the preset length threshold.
- pictures with a length greater than a preset length can be deleted.
- The adding module 204 is configured to add a blank area to the picture with a length to be supplemented according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold.
- A blank area is added at the left or right end of the picture with a length to be supplemented to obtain the second standard picture, so that all pictures have the same size. The neural network used in training places requirements on the size (length and width) of the input pictures, and pictures that have the same length and the same width can be input into the neural network at the same time for training, which saves training time.
- the adjustment module 205 is configured to perform random adjustments on the first standard picture and the second standard picture to obtain a training picture, wherein the randomly adjusted objects include picture brightness, picture contrast, picture saturation, noise, and picture font size.
- the brightness, contrast, saturation, noise, and font size of the picture can be adjusted to simulate English text pictures taken in different environments, which can increase the diversity of training samples and improve the training effect.
- the training module 206 is used to train the initial recognition model according to the backpropagation algorithm and the training picture to obtain a trained recognition model.
- the neural network in the initial recognition model can have a loss function.
- the loss function is used to calculate the distance between the data output by the current neural network modeling and the ideal data.
- The backpropagation algorithm can update each parameter in the neural network so that the loss value calculated by the loss function is continuously reduced, that is, so that the data output by the neural network steadily approaches the ideal data.
- the initial recognition model includes a convolutional layer, a recurrent layer, and a transcription layer.
- The convolutional layer may be a CNN (Convolutional Neural Network).
- The recurrent layer may be an RNN (Recurrent Neural Network).
- the transcription layer may be CTC (Connectionist Temporal Classification).
- The acquisition module 201 is also used to obtain a picture to be recognized.
- the picture to be recognized may be a picture carrying English letters.
- the input module 207 is configured to input the picture to be recognized into the trained recognition model to obtain a recognition result, where the recognition result includes English letters, spaces, and punctuation marks in the picture to be recognized.
- the trained recognition model can recognize the entire line of English text in the picture.
- the adjustment module 205 performs random adjustment on the first standard picture and the second standard picture, and the specific method for obtaining the training picture is:
- Random noise is added to the pictures with random brightness, random contrast, and random saturation to obtain training pictures.
- The preset zoom factor interval may be [0.6, 1.0], which ensures that the length of the zoomed picture does not exceed the original length and its width does not exceed the original width, so that the zoomed picture can be mapped onto a canvas of the preset size.
- the zoom factor can be randomly obtained from the preset zoom factor interval to zoom the picture, simulating the situation where there is a font size difference in the writing of different people.
- the purpose of randomly adjusting the brightness, contrast, and saturation of the picture is to simulate pictures with different effects in real scenes due to different picture backgrounds and different shooting light. Random noise is added to simulate pictures of different quality.
- a recognition model with higher accuracy and wider applicability can be trained.
- the training module 206 trains the initial recognition model according to the backpropagation algorithm and the training pictures, and the specific method for obtaining the trained recognition model is as follows:
- the network parameters of the initial recognition model are updated to obtain a trained recognition model.
- The label sequence is the recognized English text, including English letters, punctuation marks, and spaces.
- The pixel features of the picture can be extracted through the convolutional layer; the pixel features are then input into the recurrent layer to obtain the image timing features; finally, the transcription layer maps the image timing features to a label sequence. For example:
- The obtained image timing features can be a set of vectors (t1, t2, t3, t4, t5), and the label sequence output by the final transcription layer can be "ab".
- the training module 206 updates the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value, and the specific method for obtaining the trained recognition model is as follows:
- A test set is used to test the model to be tested, and the accuracy rate at which the model to be tested passes the test is determined;
- If the accuracy rate is greater than a preset accuracy rate threshold, the model to be tested is determined to be a trained recognition model.
- The test set may be a number of English text pictures used for testing.
- While the backpropagation algorithm is used to continuously update the parameters of the model, the model can be tested using the test set to obtain its recognition accuracy. If the recognition accuracy meets the preset requirement (that is, the recognition accuracy is greater than the preset accuracy threshold), training of the model can be considered complete.
- the determining module 203 is further configured to determine that the model to be tested is an untrained recognition model if the accuracy rate is less than or equal to a preset accuracy rate threshold;
- the training module 206 is also used to retrain the untrained recognition model.
- If the recognition accuracy of the model is less than or equal to the preset accuracy threshold, the recognition effect of the model has not yet reached the expected level, and training can be continued or the model can be retrained.
- the device for recognizing English handwritten text may further include:
- A correction module, used to perform tilt correction on the picture to be recognized according to the Hough transform algorithm to obtain a corrected picture, after the initial recognition model has been trained according to the backpropagation algorithm and the training pictures and the trained recognition model has been obtained.
- the input module 207 inputs the image to be recognized into the trained recognition model, and the specific method for obtaining the recognition result is as follows:
- The corrected picture is input into the trained recognition model to obtain a recognition result.
- The Hough transform maps the letter image to a parameter space, from which the tilt angle of the letter image is calculated; the letter image is then rotated according to that tilt angle to obtain a horizontal letter image. This prevents poor recognition caused by letter images that are tilted because of personal writing habits or the way the picture was shot.
- the recognition model can be trained by using a large number of English handwritten text line picture sets to recognize the entire line of English text.
- The training pictures are scaled in equal proportions so that the text in the pictures is not deformed, and the brightness, contrast, saturation, and noise of the pictures are randomly adjusted to simulate the kinds of pictures produced in different scenes, which improves the accuracy of the recognition model and allows it to recognize English text lines in a variety of pictures.
- the insufficient length of the images is supplemented to ensure that all the images have the same length and width, so that a large number of images can be used for training at the same time, which improves the speed of training the recognition model.
- FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the method for recognizing English handwritten text according to the present application.
- the electronic device 3 includes a memory 31, at least one processor 32, computer readable instructions 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
- FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on it. The electronic device 3 may include more or fewer components than those shown in the figure, combine certain components, or use different components; for example, the electronic device 3 may also include input and output devices, network access devices, and so on.
- The electronic device 3 also includes, but is not limited to, any electronic product that can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device, for example, a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an interactive network television (Internet Protocol Television, IPTV), a smart wearable device, etc.
- The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the processor 32 can be a microprocessor, or the processor 32 can also be any conventional processor, etc.
- The processor 32 is the control center of the electronic device 3 and connects the various parts of the entire electronic device 3 through various interfaces and lines.
- The memory 31 may be used to store the computer-readable instructions 33 and/or modules/units. The processor 32 realizes the various functions of the electronic device 3 by running or executing the computer-readable instructions and/or modules/units stored in the memory 31 and by calling the data stored in the memory 31.
- the memory 31 may mainly include a storage program area and a storage data area.
- the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the storage data area may store data and the like created in accordance with the use of the electronic device 3.
- the memory 31 may include volatile memory, such as high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the memory 31 in the electronic device 3 stores a plurality of instructions to implement an English handwritten text recognition method, and the processor 32 can execute the plurality of instructions to achieve:
- the picture to be recognized is input into the trained recognition model to obtain a recognition result, where the recognition result includes English letters, spaces, and punctuation in the picture to be recognized.
- a recognition model can be trained by using a large set of pictures of English handwritten text lines to recognize the entire line of English text.
- the training pictures are scaled proportionally so that the text is not deformed, and the brightness, contrast, saturation, and noise of the pictures are randomly adjusted to simulate the types of pictures generated in different scenes, which can improve the accuracy of the recognition model and lets it recognize English text lines in various pictures.
- pictures of insufficient length are padded so that all pictures have the same length and the same width; a large number of pictures can then be used for training at the same time, which improves the speed of training the recognition model.
- the integrated module/unit of the electronic device 3 may be stored in a computer-readable storage medium, which may be a non-volatile storage medium or a volatile storage medium.
- the computer-readable instruction includes computer-readable instruction code
- the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate form.
- the computer-readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), or a random access memory (RAM).
- modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
A handwritten English text recognition method and device, an electronic apparatus, and a storage medium, relating to the technical field of artificial intelligence. The method comprises: an electronic apparatus acquiring a handwritten English text line image set (S11); the electronic apparatus performing uniform scaling on all images in the handwritten English text line image set according to a preconfigured width threshold so as to obtain multiple scaled images (S12); the electronic apparatus determining, from the multiple scaled images, a first standard image and an image having a length to be extended (S13); the electronic apparatus adding, according to a preconfigured length threshold, a blank region to the image having the length to be extended so as to obtain a second standard image (S14); the electronic apparatus randomly adjusting the first standard image and the second standard image so as to obtain a training image (S15); the electronic apparatus training an initial recognition model according to a backpropagation algorithm and the training image so as to obtain a trained recognition model (S16); the electronic apparatus acquiring an image to undergo recognition (S17); and the electronic apparatus inputting the image to undergo recognition into the trained recognition model, and acquiring a recognition result (S18). The method allows recognition to be performed on an entire line of English text.
Description
This application claims priority to Chinese patent application No. 202010329360.1, filed with the Chinese Patent Office on April 23, 2020 and entitled "English handwritten text recognition method, device, electronic equipment and storage medium", the entire content of which is incorporated herein by reference.
This application relates to the field of artificial intelligence technology, and in particular to an English handwritten text recognition method, device, electronic equipment, and storage medium.
At present, artificial intelligence can be used to recognize text in text images, such as English letters and individual words. However, the inventor realized that some text in text images is handwritten by users, and because personal writing habits differ, the written text varies in form. Moreover, in a whole line of text there are spaces between words and punctuation marks, and the text length is not fixed, so a whole line of English text cannot be recognized.
Therefore, how to recognize a whole line of English text is a technical problem that urgently needs to be solved.
Summary of the invention
In view of the above, it is necessary to provide an English handwritten text recognition method, device, electronic equipment, and storage medium that can recognize a whole line of English text.
A first aspect of this application provides an English handwritten text recognition method, the method including:
acquiring an English handwritten text line picture set, wherein the pictures in the set include English letters, spaces, and punctuation marks;
scaling all pictures in the English handwritten text line picture set proportionally according to a preset width threshold to obtain multiple scaled pictures;
determining, from the multiple scaled pictures, a first standard picture and a picture to be extended, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be extended is less than the preset length threshold;
adding a blank area to the picture to be extended according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
randomly adjusting the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
training an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
acquiring a picture to be recognized;
inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
A second aspect of this application provides an electronic device, the electronic device including a processor and a memory, the processor being configured to execute computer-readable instructions stored in the memory to implement the following steps:
acquiring an English handwritten text line picture set, wherein the pictures in the set include English letters, spaces, and punctuation marks;
scaling all pictures in the English handwritten text line picture set proportionally according to a preset width threshold to obtain multiple scaled pictures;
determining, from the multiple scaled pictures, a first standard picture and a picture to be extended, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be extended is less than the preset length threshold;
adding a blank area to the picture to be extended according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
randomly adjusting the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
training an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
acquiring a picture to be recognized;
inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
A third aspect of this application provides a computer-readable storage medium storing at least one computer-readable instruction, the at least one computer-readable instruction being executed by a processor to implement the following steps:
acquiring an English handwritten text line picture set, wherein the pictures in the set include English letters, spaces, and punctuation marks;
scaling all pictures in the English handwritten text line picture set proportionally according to a preset width threshold to obtain multiple scaled pictures;
determining, from the multiple scaled pictures, a first standard picture and a picture to be extended, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be extended is less than the preset length threshold;
adding a blank area to the picture to be extended according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
randomly adjusting the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
training an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
acquiring a picture to be recognized;
inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
A fourth aspect of this application provides an English handwritten text recognition device, the device including:
an acquisition module, configured to acquire an English handwritten text line picture set, wherein the pictures in the set include English letters, spaces, and punctuation marks;
a scaling module, configured to scale all pictures in the English handwritten text line picture set proportionally according to a preset width threshold to obtain multiple scaled pictures;
a determination module, configured to determine, from the multiple scaled pictures, a first standard picture and a picture to be extended, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be extended is less than the preset length threshold;
an adding module, configured to add a blank area to the picture to be extended according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
an adjustment module, configured to randomly adjust the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
a training module, configured to train an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
the acquisition module being further configured to acquire a picture to be recognized;
an input module, configured to input the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
With the above technical solutions, this application can train a recognition model on a large set of English handwritten text line pictures in order to recognize a whole line of English text. The training pictures are scaled proportionally, which ensures that the text in the pictures is not deformed, and the brightness, contrast, saturation, and noise of the pictures are randomly adjusted to simulate the kinds of pictures produced in different scenes; this can improve the accuracy of the recognition model and allows it to recognize English text lines in a variety of pictures. In addition, after the training pictures are scaled proportionally, pictures that are too short are padded so that all pictures have the same length and the same width; a large number of pictures can therefore be used for training at the same time, which speeds up training of the recognition model.
FIG. 1 is a flowchart of a preferred embodiment of an English handwritten text recognition method disclosed in this application.
FIG. 2 is a functional block diagram of a preferred embodiment of an English handwritten text recognition device disclosed in this application.
FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the English handwritten text recognition method of this application.
The English handwritten text recognition method of the embodiments of this application is applied in an electronic device. It can also be applied in a hardware environment formed by an electronic device and a server connected to the electronic device through a network, in which case it is executed jointly by the server and the electronic device. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network.
Please refer to FIG. 1, which is a flowchart of a preferred embodiment of an English handwritten text recognition method disclosed in this application. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S11. The electronic device acquires an English handwritten text line picture set, wherein the pictures in the set include English letters, spaces, and punctuation marks.
The English handwritten text line picture set can be obtained from the public IAM Handwriting Database, which contains unconstrained English handwritten text; the handwritten text was scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels.
S12. The electronic device scales all pictures in the English handwritten text line picture set proportionally according to a preset width threshold to obtain multiple scaled pictures.
The width of each scaled picture is the preset width, while the lengths of the scaled pictures may differ.
In the embodiments of this application, proportional scaling prevents the English letters in the picture from being deformed. Each picture can be scaled proportionally until its width equals the preset width. Because proportional scaling preserves each picture's aspect ratio, if the original aspect ratios of the pictures differ, the scaled pictures have the same width but different lengths.
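The arithmetic behind step S12 can be sketched as follows. This is only an illustration: the function name `scaled_size` and the sample sizes are hypothetical, and "width" follows the text's usage (the dimension fixed by the preset threshold), with "length" as the dimension that is allowed to vary.

```python
from typing import List, Tuple


def scaled_size(length: int, width: int, preset_width: int) -> Tuple[int, int]:
    """Return (new_length, new_width) after proportional scaling to preset_width."""
    if length <= 0 or width <= 0 or preset_width <= 0:
        raise ValueError("all dimensions must be positive")
    factor = preset_width / width  # the same factor is applied to both sides
    new_length = max(1, round(length * factor))
    return new_length, preset_width


# Hypothetical (length, width) pairs with different aspect ratios.
sizes: List[Tuple[int, int]] = [(1200, 100), (600, 64), (300, 32)]
scaled = [scaled_size(l, w, preset_width=32) for l, w in sizes]
print(scaled)
```

All results share the preset width, but their lengths differ whenever the original aspect ratios differ, which is why the later padding step is needed.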
S13. The electronic device determines, from the multiple scaled pictures, a first standard picture and a picture to be extended, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be extended is less than the preset length threshold.
In the embodiments of this application, pictures whose length is greater than the preset length can be deleted.
S14. The electronic device adds a blank area to the picture to be extended according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold.
In the embodiments of this application, a blank area is added at the left end or the right end of the picture to be extended to obtain the second standard picture, so that all pictures have the same size. This is because the neural network used in training places certain requirements on the size (length and width) of its input pictures; moreover, pictures that meet these requirements and have the same length and the same width can be input into the neural network together and trained at the same time, which saves training time.
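Steps S13 and S14 can be sketched together as follows. The image representation (a list of pixel rows), the white fill value 255, and the name `pad_to_length` are assumptions made for this illustration; the source only states that a blank area is added at the left or right end.

```python
from typing import List, Optional

Image = List[List[int]]  # rows of grayscale pixels; 255 is assumed to be blank


def pad_to_length(img: Image, preset_length: int,
                  blank: int = 255, side: str = "right") -> Optional[Image]:
    """Classify a scaled picture by length and pad it if it is too short."""
    length = len(img[0])
    if length > preset_length:
        return None  # longer than the threshold: dropped from the set
    if length == preset_length:
        return [row[:] for row in img]  # already a "first standard picture"
    pad = [blank] * (preset_length - length)
    if side == "right":
        return [row + pad for row in img]  # "second standard picture"
    if side == "left":
        return [pad + row for row in img]
    raise ValueError("side must be 'left' or 'right'")


short = [[0, 0], [0, 0]]
print(pad_to_length(short, 4))
```

After this step every surviving picture has the same length and width, so the pictures can be stacked into one batch.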
S15. The electronic device randomly adjusts the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size.
In the embodiments of this application, the brightness, contrast, saturation, noise, and font size of the pictures can be adjusted to simulate English text pictures taken in different environments, which increases the diversity of the training samples and thereby improves the training effect.
Specifically, randomly adjusting the first standard picture and the second standard picture to obtain training pictures includes:
acquiring a preset scaling factor interval;
scaling the first standard picture and the second standard picture proportionally by a random factor according to the preset scaling factor interval to obtain randomly scaled pictures;
mapping the randomly scaled pictures onto a canvas of a preset size to obtain target pictures of the same size;
randomly adjusting the brightness, contrast, and saturation of the target pictures, respectively, to obtain pictures with random brightness, random contrast, and random saturation;
adding random noise to the pictures with random brightness, random contrast, and random saturation to obtain the training pictures.
The preset scaling factor interval may be [0.6, 1.0], which ensures that the length and width of a scaled picture do not exceed the original length and width (that is, the scaled picture will not), so that it can be mapped onto the canvas of the preset size.
In this optional implementation, a scaling factor can be drawn at random from the preset scaling factor interval and used to scale the picture, simulating the differences in font size between different people's handwriting. The brightness, contrast, and saturation of the pictures are randomly adjusted to simulate the different effects that different backgrounds and lighting produce in real scenes. Noise is added at random to simulate pictures of different quality. Training on randomly adjusted pictures can yield a recognition model with higher accuracy and wider applicability.
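A minimal sketch of the random adjustment in S15 follows, assuming grayscale pictures stored as lists of rows. The ranges for brightness, contrast, and noise are invented for illustration (the source gives only the scaling interval [0.6, 1.0]), saturation is omitted because it has no meaning for a grayscale picture, and nearest-neighbour resampling is an arbitrary choice.

```python
import random
from typing import List

Image = List[List[int]]


def clamp(v: float) -> int:
    """Keep a pixel value inside the 0..255 range."""
    return max(0, min(255, int(round(v))))


def random_adjust(img: Image, rng: random.Random,
                  scale_range=(0.6, 1.0), blank: int = 255) -> Image:
    rows, cols = len(img), len(img[0])
    # 1. Proportional random scaling (one factor for both dimensions).
    s = rng.uniform(*scale_range)
    new_rows, new_cols = max(1, int(rows * s)), max(1, int(cols * s))
    small = [[img[min(rows - 1, int(r / s))][min(cols - 1, int(c / s))]
              for c in range(new_cols)] for r in range(new_rows)]
    # 2. Map the scaled picture onto a blank canvas of the preset size.
    canvas = [[blank] * cols for _ in range(rows)]
    for r in range(new_rows):
        canvas[r][:new_cols] = small[r]
    # 3. Random brightness and contrast, then 4. random Gaussian noise.
    brightness = rng.uniform(-30, 30)   # assumed range
    contrast = rng.uniform(0.8, 1.2)    # assumed range
    sigma = rng.uniform(0, 10)          # assumed noise strength
    return [[clamp((p - 128) * contrast + 128 + brightness + rng.gauss(0, sigma))
             for p in row] for row in canvas]


rng = random.Random(0)
picture = [[0] * 8 for _ in range(4)]
training_picture = random_adjust(picture, rng)
```

Because the scaling factor never exceeds 1.0, the scaled picture always fits on the canvas, so every training picture keeps the preset size.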
S16. The electronic device trains an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model.
Each neural network in the initial recognition model can have a loss function, which is used to compute the distance between the output of the current neural network and the ideal data. The backpropagation algorithm can update the parameters of the neural network so that the loss value computed by the loss function keeps decreasing, that is, so that the output of the neural network keeps approaching the ideal data.
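The relationship between a loss function and gradient-based parameter updates can be illustrated with a deliberately tiny example. This is a toy one-parameter model with a mean squared error loss, not the recognition model of the application; the function name `train`, the learning rate, and the data are all assumptions.

```python
def train(xs, ys, lr=0.05, steps=50):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    losses = []
    for _ in range(steps):
        preds = [w * x for x in xs]
        # Loss: distance between the model output and the ideal data.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        # Gradient of the loss with respect to w (chain rule).
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad  # update step that lowers the loss
        losses.append(loss)
    return w, losses


w, losses = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w, losses[0], losses[-1])
```

In a multi-layer network, backpropagation computes the same kind of gradient for every parameter by applying the chain rule layer by layer.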
The initial recognition model includes a convolutional layer, a recurrent layer, and a transcription layer.
The convolutional layer may be a CNN (Convolutional Neural Network), the recurrent layer may be an RNN (Recurrent Neural Network), and the transcription layer may be CTC (Connectionist Temporal Classification).
Specifically, training the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model includes:
inputting the training pictures into the convolutional layer of the initial recognition model to obtain image pixel features;
inputting the image pixel features into the recurrent layer of the initial recognition model to obtain image sequence features;
inputting the image sequence features into the transcription layer of the initial recognition model to obtain a label sequence;
computing, with a loss function, the loss value corresponding to the label sequence;
updating the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model.
The label sequence is the recognized English text, including English letters, punctuation marks, and spaces.
In this optional implementation, the convolutional layer extracts the pixel features of the picture; the pixel features are then input into the recurrent layer to obtain the image sequence features; finally, the transcription layer maps the image sequence features to a label sequence. For example, if the input picture contains the English letters "ab", the obtained image sequence features may be a group of vectors (t1, t2, t3, t4, t5), and the label sequence output by the transcription layer may be "ab".
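How five time steps can collapse to the two-letter label sequence "ab" can be sketched with greedy CTC decoding. The source names CTC as a possible transcription layer but does not describe its decoding rule, so the rule below (merge repeated labels, then drop blanks) is the standard CTC convention rather than something stated in the application; the blank symbol "-" and the per-step labels are assumptions.

```python
BLANK = "-"  # assumed symbol for the CTC blank label


def ctc_greedy_decode(step_labels):
    """Collapse per-time-step labels: merge repeats, then remove blanks."""
    out = []
    prev = None
    for label in step_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


# Hypothetical best label for each of the time steps t1..t5.
print(ctc_greedy_decode(["a", "a", "-", "b", "b"]))
```

A blank between two identical labels keeps them separate, which is how a doubled letter survives the collapse.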
As an optional implementation, updating the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model includes:
adjusting the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value so as to minimize the loss value, obtaining a model to be tested;
acquiring a preset test set;
testing the model to be tested with the test set and determining the accuracy rate with which the model to be tested passes the test;
if the accuracy rate is greater than a preset accuracy threshold, determining that the model to be tested is the trained recognition model.
The test set may be a number of English text pictures used for testing.
In this optional implementation, while the backpropagation algorithm keeps updating the model parameters, the model can be tested with the test set to obtain its recognition accuracy. If the recognition accuracy meets the preset requirement (that is, it is greater than the preset accuracy threshold), the model can be considered to have finished training.
As an optional implementation, the method further includes:
if the accuracy rate is less than or equal to the preset accuracy threshold, determining that the model to be tested is an insufficiently trained recognition model;
retraining the insufficiently trained recognition model.
In this optional implementation, if the recognition accuracy of the model is less than or equal to the preset accuracy threshold, the model has not yet reached the expected recognition effect; training can be continued, or the model can be retrained.
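The acceptance test described above can be sketched as follows. The source does not say how accuracy is measured, so exact match of the whole recognized line is an assumption, as are the name `evaluate`, the stand-in model, and the sample data.

```python
def evaluate(model, test_set, threshold):
    """Return (accuracy, passed) for a model on (picture, text) pairs."""
    correct = sum(1 for picture, text in test_set if model(picture) == text)
    accuracy = correct / len(test_set)
    # Strictly greater than the threshold means training is complete.
    return accuracy, accuracy > threshold


# Stand-in "model": a lookup table keyed by hypothetical picture ids.
predictions = {"p1": "Hello, world.", "p2": "Good day", "p3": "a b c"}
test_set = [("p1", "Hello, world."), ("p2", "Good day!"), ("p3", "a b c")]

accuracy, passed = evaluate(predictions.get, test_set, threshold=0.9)
print(accuracy, passed)
```

When `passed` is false, the model is treated as insufficiently trained and training continues or restarts.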
S17. The electronic device obtains a picture to be recognized.
The picture to be recognized may be a picture containing English letters.
S18. The electronic device inputs the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
In the embodiments of the present application, the trained recognition model can recognize an entire line of English text in a picture.
As an optional embodiment, after the initial recognition model is trained according to the backpropagation algorithm and the training pictures to obtain the trained recognition model, the method further includes:
performing tilt correction on the picture to be recognized according to the Hough transform algorithm to obtain a corrected picture;
the inputting of the picture to be recognized into the trained recognition model to obtain a recognition result includes:
inputting the corrected picture into the trained recognition model to obtain the recognition result.
In this optional embodiment, the Hough transform maps the letter image into a parameter space and computes the angle by which the letter image is tilted; the letter image is then rotated by that angle to obtain a horizontal letter image. This prevents poor recognition caused by letter images that are tilted because of individual handwriting or the way the picture was taken.
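The tilt-correction idea can be illustrated with a small Hough-style voting sketch. It is not the application's implementation: the application does not say how the parameter space is discretized, so the angle range, the angle step, the integer binning of the offset, and the function names `estimate_skew_degrees` and `rotate_points` are all assumptions made for illustration. The input is a list of foreground pixel coordinates rather than a real image.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def estimate_skew_degrees(
    points: Sequence[Point], max_angle: float = 15.0, step: float = 0.5
) -> float:
    """Return the candidate tilt angle whose (angle, offset) cell gets the most votes.

    For a line tilted by angle a, every point on it has the same offset
    rho = y*cos(a) - x*sin(a), so the correct angle produces one dominant bin.
    """
    best_angle, best_votes = 0.0, -1
    for i in range(int(round(2 * max_angle / step)) + 1):
        angle = -max_angle + i * step
        a = math.radians(angle)
        votes = Counter(round(y * math.cos(a) - x * math.sin(a)) for x, y in points)
        peak = max(votes.values())
        if peak > best_votes:
            best_angle, best_votes = angle, peak
    return best_angle


def rotate_points(points: Sequence[Point], degrees: float) -> List[Point]:
    """Rotate points about the origin by -degrees to undo the estimated tilt."""
    a = math.radians(degrees)
    return [
        (x * math.cos(a) + y * math.sin(a), -x * math.sin(a) + y * math.cos(a))
        for x, y in points
    ]


# A synthetic baseline tilted by 5 degrees.
tilted = [(float(x), x * math.tan(math.radians(5.0))) for x in range(200)]
angle = estimate_skew_degrees(tilted)
levelled = rotate_points(tilted, angle)
```

In practice a library routine would normally be used on the binarized picture; the sketch only shows why voting in the parameter space recovers the tilt angle.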
In the method flow described in FIG. 1, a recognition model that recognizes an entire line of English text can be trained on a large picture set of English handwritten text lines. The training pictures are scaled proportionally, which ensures that the text in them is not deformed, and their brightness, contrast, saturation, and noise are randomly adjusted to simulate the kinds of pictures produced in different scenes. This can improve the accuracy of the recognition model and allows it to recognize English text lines in a wide variety of pictures. In addition, after the training pictures are scaled proportionally, pictures that are too short are padded so that all pictures have the same length and the same width. A large number of pictures can therefore be used for training at the same time, which speeds up training of the recognition model.
The above are only specific embodiments of the present application, but the scope of protection of the present application is not limited to them. Those of ordinary skill in the art can make improvements without departing from the inventive concept of the present application, and all such improvements fall within the scope of protection of the present application.
Please refer to FIG. 2, which is a functional module diagram of a preferred embodiment of an English handwritten text recognition device disclosed in the present application.
In some embodiments, the English handwritten text recognition device runs in an electronic device. The English handwritten text recognition device may include multiple functional modules composed of program code segments, where the program is a series of computer-readable instruction code. The program code of each program segment in the English handwritten text recognition device can be stored in a memory and executed by at least one processor to perform some or all of the steps of the English handwritten text recognition method described in FIG. 1. For details, refer to the description of the method of FIG. 1, which is not repeated here.
In this embodiment, the English handwritten text recognition device can be divided into multiple functional modules according to the functions it performs. The functional modules may include an acquisition module 201, a scaling module 202, a determination module 203, an adding module 204, an adjustment module 205, a training module 206, and an input module 207. A module as referred to in the present application is a series of computer-readable instruction segments that can be executed by at least one processor, that perform a fixed function, and that are stored in a memory.
The acquisition module 201 is configured to acquire a picture set of English handwritten text lines, wherein the pictures in the picture set of English handwritten text lines include English letters, spaces, and punctuation marks.
The picture set of English handwritten text lines can be obtained from the public IAM Handwriting Database, which contains unconstrained English handwritten text that was scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels.
The scaling module 202 is configured to scale all pictures in the picture set of English handwritten text lines proportionally according to a preset width threshold to obtain multiple scaled pictures.
The width of each scaled picture is the preset width, while the lengths of the scaled pictures may differ.
In the embodiments of the present application, proportional scaling prevents the English letters in a picture from being deformed. Each picture is scaled proportionally until its width equals the preset width. Because proportional scaling preserves the aspect ratio of each picture, pictures whose original aspect ratios differ end up with the same width but different lengths.
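The arithmetic behind proportional scaling can be shown with a short sketch. The function name `scale_to_preset_width` and the sample sizes are hypothetical. The sketch also assumes that "width" denotes the dimension fixed by the preset threshold and "length" the dimension along the text line, which is how the padding step at the left or right end suggests the terms are used.

```python
from typing import Tuple


def scale_to_preset_width(length: int, width: int, preset_width: int) -> Tuple[int, int]:
    """Return the (length, width) of a picture after proportional scaling.

    Both dimensions are multiplied by the same factor preset_width / width,
    so the aspect ratio, and therefore the letter shapes, are preserved.
    """
    if width <= 0 or preset_width <= 0:
        raise ValueError("widths must be positive")
    new_length = max(1, round(length * preset_width / width))
    return new_length, preset_width


# Two pictures with different aspect ratios end up with the same width
# but different lengths.
first = scale_to_preset_width(length=1200, width=96, preset_width=32)
second = scale_to_preset_width(length=900, width=64, preset_width=32)
```

Here `first` keeps the 12.5:1 aspect ratio of the original picture, while `second` comes out longer because its original was proportionally longer.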
The determination module 203 is configured to determine, from the multiple scaled pictures, a first standard picture and a picture to be padded, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be padded is less than the preset length threshold.
In the embodiments of the present application, pictures whose length is greater than the preset length can be deleted.
The adding module 204 is configured to add a blank area to the picture to be padded according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold.
In the embodiments of the present application, a blank area is added at the left end or the right end of the picture to be padded to obtain the second standard picture, so that all pictures have the same size. The neural network used in training places certain requirements on the length and width of its input pictures. Moreover, pictures that meet those requirements and share the same length and the same width can be fed into the neural network together and trained at the same time, which saves training time.
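A minimal sketch of the sorting and padding steps follows. It assumes a grayscale picture stored as a list of rows, white (255) as the blank value, and padding on the right end; the names `classify_by_length` and `pad_to_length` are hypothetical, and the application leaves these details open.

```python
from typing import List

Picture = List[List[int]]  # rows of gray values, 0 = black, 255 = white


def classify_by_length(picture: Picture, preset_length: int) -> str:
    """Sort a scaled picture into one of the three cases described above."""
    length = len(picture[0])
    if length == preset_length:
        return "first standard picture"
    if length < preset_length:
        return "picture to be padded"
    return "deleted"  # longer than the preset length


def pad_to_length(picture: Picture, preset_length: int, blank: int = 255) -> Picture:
    """Append blank columns on the right so every row has preset_length pixels."""
    length = len(picture[0])
    if length > preset_length:
        raise ValueError("picture is longer than the preset length")
    return [row + [blank] * (preset_length - length) for row in picture]


short_picture = [[0, 0], [0, 0]]
second_standard = pad_to_length(short_picture, preset_length=4)
```

Once every picture has the same length and width, the pictures can be stacked into a single batch for training.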
The adjustment module 205 is configured to randomly adjust the first standard picture and the second standard picture to obtain training pictures, wherein the randomly adjusted properties include picture brightness, picture contrast, picture saturation, noise, and picture font size.
In the embodiments of the present application, the brightness, contrast, saturation, noise, and font size of a picture can be adjusted to simulate English text pictures taken in different environments. This increases the diversity of the training samples and thereby improves the training result.
The training module 206 is configured to train an initial recognition model according to the backpropagation algorithm and the training pictures to obtain a trained recognition model.
Each neural network in the initial recognition model can have a loss function, which measures the distance between the data currently output by the neural network and the ideal data. The backpropagation algorithm updates the parameters of the neural network so that the loss value computed by the loss function keeps decreasing, that is, so that the output of the neural network keeps moving closer to the ideal data.
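The relationship between the loss function and the parameter updates can be illustrated with a deliberately tiny example. It is a sketch only: a one-parameter model `y = w * x` with a mean squared-error loss stands in for the recognition network and its loss, and the learning rate, step count, and function name `train` are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def train(
    samples: Sequence[Tuple[float, float]],
    w: float = 0.0,
    learning_rate: float = 0.1,
    steps: int = 20,
) -> Tuple[float, List[float]]:
    """Fit y = w * x by gradient descent and record the loss before each update."""
    losses = []
    n = len(samples)
    for _ in range(steps):
        # Loss: mean squared distance between the model output and the ideal data.
        loss = sum((w * x - y) ** 2 for x, y in samples) / n
        # Gradient of the loss with respect to w, obtained with the chain rule.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / n
        losses.append(loss)
        # Move the parameter against the gradient so that the loss decreases.
        w -= learning_rate * grad
    return w, losses


# The ideal data follows y = 2 * x, so w should approach 2.
w_final, loss_history = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

In a multi-layer network, backpropagation applies the same chain-rule step layer by layer to obtain the gradient for every parameter.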
The initial recognition model includes a convolutional layer, a recurrent layer, and a transcription layer.
The convolutional layer may be a CNN (Convolutional Neural Network), the recurrent layer may be an RNN (Recurrent Neural Network), and the transcription layer may be CTC (Connectionist Temporal Classification).
The acquisition module 201 is further configured to acquire a picture to be recognized.
The picture to be recognized may be a picture containing English letters.
The input module 207 is configured to input the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
In the embodiments of the present application, the trained recognition model can recognize an entire line of English text in a picture.
As an optional embodiment, the adjustment module 205 randomly adjusts the first standard picture and the second standard picture to obtain the training pictures specifically by:
acquiring a preset scaling factor interval;
randomly scaling the first standard picture and the second standard picture proportionally according to the preset scaling factor interval to obtain randomly scaled pictures;
mapping the randomly scaled pictures onto a canvas of a preset size to obtain target pictures of uniform size;
randomly adjusting the brightness, contrast, and saturation of the target pictures to obtain pictures with random brightness, random contrast, and random saturation;
adding random noise to the pictures with random brightness, random contrast, and random saturation to obtain the training pictures.
The preset scaling factor interval may be [0.6, 1.0], which ensures that the length and the width of a scaled picture do not exceed the original length and width, so that the scaled picture can be mapped onto the canvas of the preset size.
In this optional embodiment, a scaling factor can be drawn at random from the preset scaling factor interval and used to scale the picture, which simulates the differences in handwriting size between writers. The brightness, contrast, and saturation of the picture are randomly adjusted to simulate the different appearances that pictures take on in real scenes because of different backgrounds and different lighting. Random noise is added to simulate pictures of different quality. Training on randomly adjusted pictures can yield a recognition model that is more accurate and more widely applicable.
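The steps above can be sketched on a grayscale picture stored as a list of rows. Only the interval [0.6, 1.0] comes from the text; the nearest-neighbour resizing, the placement at the top-left corner of the canvas, the brightness, contrast, and noise ranges, and all function names are assumptions. Saturation is omitted because it has no meaning for a single gray channel.

```python
import random
from typing import List

Picture = List[List[int]]  # rows of gray values in 0..255


def resize_nearest(picture: Picture, factor: float) -> Picture:
    """Proportionally resize a picture with nearest-neighbour sampling."""
    rows, cols = len(picture), len(picture[0])
    new_rows, new_cols = max(1, round(rows * factor)), max(1, round(cols * factor))
    return [
        [picture[min(rows - 1, int(r / factor))][min(cols - 1, int(c / factor))]
         for c in range(new_cols)]
        for r in range(new_rows)
    ]


def paste_on_canvas(picture: Picture, rows: int, cols: int, blank: int = 255) -> Picture:
    """Place the picture at the top-left corner of a blank canvas of fixed size."""
    canvas = [[blank] * cols for _ in range(rows)]
    for r, row in enumerate(picture):
        canvas[r][:len(row)] = row
    return canvas


def clamp(value: float) -> int:
    """Round to an integer gray value and keep it inside 0..255."""
    return max(0, min(255, int(round(value))))


def augment(picture: Picture, rng: random.Random) -> Picture:
    """Random scale, then canvas, then brightness, contrast, and noise."""
    rows, cols = len(picture), len(picture[0])
    factor = rng.uniform(0.6, 1.0)       # interval taken from the text
    target = paste_on_canvas(resize_nearest(picture, factor), rows, cols)
    brightness = rng.uniform(-30, 30)    # assumed range
    contrast = rng.uniform(0.8, 1.2)     # assumed range
    sigma = rng.uniform(0.0, 8.0)        # assumed noise strength
    return [
        [clamp((p - 128) * contrast + 128 + brightness + rng.gauss(0, sigma))
         for p in row]
        for row in target
    ]


source = [[0 if (r + c) % 3 == 0 else 255 for c in range(12)] for r in range(4)]
training_picture = augment(source, random.Random(0))
```

Because the scaling factor never exceeds 1.0, the scaled picture always fits on a canvas with the size of the original picture, so every training picture keeps the same dimensions.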
As an optional embodiment, the training module 206 trains the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model specifically by:
inputting the training pictures into the convolutional layer of the initial recognition model to obtain image pixel features;
inputting the image pixel features into the recurrent layer of the initial recognition model to obtain image time-series features;
inputting the image time-series features into the transcription layer of the initial recognition model to obtain a label sequence;
computing the loss value corresponding to the label sequence using a loss function;
updating the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model.
The label sequence is the recognized English text, including English letters, punctuation marks, and spaces.
In this optional embodiment, the convolutional layer extracts the pixel features of the picture; the pixel features are then input into the recurrent layer to obtain the image time-series features; finally, the transcription layer maps the image time-series features to a label sequence. For example, if the input picture contains the English letters "ab", the image time-series features obtained may be a group of vectors (t1, t2, t3, t4, t5), and the label sequence finally output by the transcription layer may be "ab".
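How five time steps can collapse to the two-letter label sequence "ab" is illustrated below with greedy CTC decoding, which picks the most likely symbol at each time step, merges consecutive repeats, and then removes the blank symbol. The application only states that the transcription layer may be CTC; the use of greedy decoding, the blank symbol "-", the three-symbol alphabet, and the probability values are assumptions made for this sketch.

```python
from typing import List, Sequence

BLANK = "-"  # assumed symbol for the CTC blank label


def ctc_collapse(path: Sequence[str]) -> str:
    """Merge consecutive repeated labels, then drop blanks."""
    output = []
    previous = None
    for label in path:
        if label != previous and label != BLANK:
            output.append(label)
        previous = label
    return "".join(output)


def greedy_decode(time_steps: Sequence[Sequence[float]], alphabet: Sequence[str]) -> str:
    """Pick the most probable label at each time step and collapse the path."""
    path: List[str] = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in time_steps
    ]
    return ctc_collapse(path)


alphabet = [BLANK, "a", "b"]
# One probability vector per time step t1..t5, ordered as in `alphabet`.
time_steps = [
    [0.1, 0.8, 0.1],  # t1 -> "a"
    [0.2, 0.7, 0.1],  # t2 -> "a"
    [0.9, 0.05, 0.05],  # t3 -> blank
    [0.1, 0.1, 0.8],  # t4 -> "b"
    [0.2, 0.1, 0.7],  # t5 -> "b"
]
decoded = greedy_decode(time_steps, alphabet)
```

The blank label is what lets CTC represent genuinely doubled letters: the path "a", blank, "a" collapses to "aa", whereas "a", "a" collapses to a single "a".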
As an optional embodiment, the training module 206 updates the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model specifically by:
adjusting the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value so as to minimize the loss value, obtaining a model to be tested;
acquiring a preset test set;
testing the model to be tested on the test set and determining the accuracy with which the model to be tested passes the test;
if the accuracy is greater than a preset accuracy threshold, determining that the model to be tested is the trained recognition model.
The test set may be a set of English text pictures used for testing.
In this optional embodiment, while the backpropagation algorithm is repeatedly updating the parameters of the model, the model can be tested on the test set to obtain its recognition accuracy. If the recognition accuracy meets the preset requirement (that is, it is greater than the preset accuracy threshold), the model can be considered fully trained.
As an optional embodiment, the determination module 203 is further configured to determine, if the accuracy is less than or equal to the preset accuracy threshold, that the model to be tested is an insufficiently trained recognition model;
the training module 206 is further configured to retrain the insufficiently trained recognition model.
In this optional embodiment, if the recognition accuracy of the model is less than or equal to the preset accuracy threshold, the model has not yet reached the expected recognition performance; training can either be continued or started again.
As an optional embodiment, the English handwritten text recognition device may further include:
a correction module, configured to perform tilt correction on the picture to be recognized according to the Hough transform algorithm to obtain a corrected picture, after the initial recognition model has been trained according to the backpropagation algorithm and the training pictures to obtain the trained recognition model.
The input module 207 inputs the picture to be recognized into the trained recognition model to obtain the recognition result specifically by:
inputting the corrected picture into the trained recognition model to obtain the recognition result.
In this optional embodiment, the Hough transform maps the letter image into a parameter space and computes the angle by which the letter image is tilted; the letter image is then rotated by that angle to obtain a horizontal letter image. This prevents poor recognition caused by letter images that are tilted because of individual handwriting or the way the picture was taken.
In the English handwritten text recognition device described in FIG. 2, a recognition model that recognizes an entire line of English text can be trained on a large picture set of English handwritten text lines. The training pictures are scaled proportionally, which ensures that the text in them is not deformed, and their brightness, contrast, saturation, and noise are randomly adjusted to simulate the kinds of pictures produced in different scenes. This can improve the accuracy of the recognition model and allows it to recognize English text lines in a wide variety of pictures. In addition, after the training pictures are scaled proportionally, pictures that are too short are padded so that all pictures have the same length and the same width. A large number of pictures can therefore be used for training at the same time, which speeds up training of the recognition model.
FIG. 3 is a schematic structural diagram of an electronic device of a preferred embodiment of the present application that implements the English handwritten text recognition method. The electronic device 3 includes a memory 31, at least one processor 32, computer-readable instructions 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
Those skilled in the art will understand that the schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not limit it. The electronic device 3 may include more or fewer components than shown, may combine certain components, or may use different components; for example, the electronic device 3 may also include input/output devices, network access devices, and so on.
The electronic device 3 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like, for example a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), or a smart wearable device.
The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The processor 32 may be a microprocessor or any conventional processor. The processor 32 is the control center of the electronic device 3 and connects the various parts of the entire electronic device 3 through various interfaces and lines.
The memory 31 may be used to store the computer-readable instructions 33 and/or modules/units. The processor 32 implements the various functions of the electronic device 3 by running or executing the computer-readable instructions and/or modules/units stored in the memory 31 and by calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the electronic device 3. In addition, the memory 31 may include volatile memory such as high-speed random access memory, and may also include non-volatile memory such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
With reference to FIG. 1, the memory 31 in the electronic device 3 stores multiple instructions to implement an English handwritten text recognition method, and the processor 32 can execute the multiple instructions to implement:
acquiring a picture set of English handwritten text lines, wherein the pictures in the picture set of English handwritten text lines include English letters, spaces, and punctuation marks;
scaling all pictures in the picture set of English handwritten text lines proportionally according to a preset width threshold to obtain multiple scaled pictures;
determining, from the multiple scaled pictures, a first standard picture and a picture to be padded, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be padded is less than the preset length threshold;
adding a blank area to the picture to be padded according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
randomly adjusting the first standard picture and the second standard picture to obtain training pictures, wherein the randomly adjusted properties include picture brightness, picture contrast, picture saturation, noise, and picture font size;
training an initial recognition model according to the backpropagation algorithm and the training pictures to obtain a trained recognition model;
acquiring a picture to be recognized;
inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
Specifically, for how the processor 32 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
In the electronic device 3 described in FIG. 3, a recognition model that recognizes an entire line of English text can be trained on a large picture set of English handwritten text lines. The training pictures are scaled proportionally, which ensures that the text in them is not deformed, and their brightness, contrast, saturation, and noise are randomly adjusted to simulate the kinds of pictures produced in different scenes. This can improve the accuracy of the recognition model and allows it to recognize English text lines in a wide variety of pictures. In addition, after the training pictures are scaled proportionally, pictures that are too short are padded so that all pictures have the same length and the same width. A large number of pictures can therefore be used for training at the same time, which speeds up training of the recognition model.
If the modules/units integrated in the electronic device 3 are implemented as software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium, which may be a non-volatile storage medium or a volatile storage medium. Based on this understanding, all or part of the processes of the method embodiments above can also be carried out by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed by a processor, implement the steps of the method embodiments above. The computer-readable instructions include computer-readable instruction code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), or a random access memory (RAM).
In the several embodiments provided in the present application, it should be understood that the disclosed system, device, and method can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or in the form of hardware plus software functional modules.
It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments above and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-limiting. The scope of the present application is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the present application. Any reference signs in the claims should not be regarded as limiting the claims concerned. Furthermore, the word "including" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application can be modified or replaced with equivalents without departing from the spirit and scope of those technical solutions.
Claims (20)
- A method for recognizing handwritten English text, wherein the method comprises:
  acquiring a set of pictures of handwritten English text lines, wherein the pictures in the set include English letters, spaces, and punctuation marks;
  scaling all pictures in the set proportionally according to a preset width threshold to obtain a plurality of scaled pictures;
  determining, from the plurality of scaled pictures, a first standard picture and a picture to be padded, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be padded is less than the preset length threshold;
  adding a blank area to the picture to be padded according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
  randomly adjusting the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
  training an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
  acquiring a picture to be recognized; and
  inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
- The method for recognizing handwritten English text according to claim 1, wherein randomly adjusting the first standard picture and the second standard picture to obtain training pictures comprises:
  acquiring a preset scaling-factor interval;
  randomly scaling the first standard picture and the second standard picture proportionally according to the preset scaling-factor interval to obtain randomly scaled pictures;
  mapping the randomly scaled pictures onto a canvas of a preset size to obtain target pictures of a uniform size;
  randomly adjusting the brightness, contrast, and saturation of the target pictures respectively to obtain pictures with random brightness, random contrast, and random saturation; and
  adding random noise to the pictures with random brightness, random contrast, and random saturation to obtain the training pictures.
- The method for recognizing handwritten English text according to claim 1, wherein training the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model comprises:
  inputting the training pictures into a convolutional layer of the initial recognition model to obtain image pixel features;
  inputting the image pixel features into a recurrent layer of the initial recognition model to obtain image temporal features;
  inputting the image temporal features into a transcription layer of the initial recognition model to obtain a label sequence;
  calculating, using a loss function, a loss value corresponding to the label sequence; and
  updating network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model.
- The method for recognizing handwritten English text according to claim 3, wherein updating the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model comprises:
  adjusting the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value so as to minimize the loss value, to obtain a model to be tested;
  acquiring a preset test set;
  testing the model to be tested with the test set and determining the accuracy rate with which the model to be tested passes the test; and
  if the accuracy rate is greater than a preset accuracy threshold, determining that the model to be tested is the trained recognition model.
- The method for recognizing handwritten English text according to claim 4, wherein the method further comprises:
  if the accuracy rate is less than or equal to the preset accuracy threshold, determining that the model to be tested is an insufficiently trained recognition model; and
  retraining the insufficiently trained recognition model.
- The method for recognizing handwritten English text according to claim 1, wherein, after training the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model, the method further comprises:
  performing tilt correction on the picture to be recognized according to a Hough transform algorithm to obtain a corrected picture;
  and wherein inputting the picture to be recognized into the trained recognition model to obtain the recognition result comprises:
  inputting the corrected picture into the trained recognition model to obtain the recognition result.
- The method for recognizing handwritten English text according to any one of claims 1 to 6, wherein the initial recognition model includes a convolutional layer, a recurrent layer, and a transcription layer.
- An electronic device, wherein the electronic device comprises a processor and a memory, and the processor is configured to execute at least one computer-readable instruction stored in the memory to implement the following steps:
  acquiring a set of pictures of handwritten English text lines, wherein the pictures in the set include English letters, spaces, and punctuation marks;
  scaling all pictures in the set proportionally according to a preset width threshold to obtain a plurality of scaled pictures;
  determining, from the plurality of scaled pictures, a first standard picture and a picture to be padded, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be padded is less than the preset length threshold;
  adding a blank area to the picture to be padded according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
  randomly adjusting the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
  training an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
  acquiring a picture to be recognized; and
  inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
- The electronic device according to claim 8, wherein, when randomly adjusting the first standard picture and the second standard picture to obtain training pictures, the processor executes the at least one computer-readable instruction to implement the following steps:
  acquiring a preset scaling-factor interval;
  randomly scaling the first standard picture and the second standard picture proportionally according to the preset scaling-factor interval to obtain randomly scaled pictures;
  mapping the randomly scaled pictures onto a canvas of a preset size to obtain target pictures of a uniform size;
  randomly adjusting the brightness, contrast, and saturation of the target pictures respectively to obtain pictures with random brightness, random contrast, and random saturation; and
  adding random noise to the pictures with random brightness, random contrast, and random saturation to obtain the training pictures.
- The electronic device according to claim 8, wherein, when training the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model, the processor executes the at least one computer-readable instruction to implement the following steps:
  inputting the training pictures into a convolutional layer of the initial recognition model to obtain image pixel features;
  inputting the image pixel features into a recurrent layer of the initial recognition model to obtain image temporal features;
  inputting the image temporal features into a transcription layer of the initial recognition model to obtain a label sequence;
  calculating, using a loss function, a loss value corresponding to the label sequence; and
  updating network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model.
- The electronic device according to claim 10, wherein, when updating the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model, the processor executes the at least one computer-readable instruction to implement the following steps:
  adjusting the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value so as to minimize the loss value, to obtain a model to be tested;
  acquiring a preset test set;
  testing the model to be tested with the test set and determining the accuracy rate with which the model to be tested passes the test; and
  if the accuracy rate is greater than a preset accuracy threshold, determining that the model to be tested is the trained recognition model.
- The electronic device according to claim 11, wherein the processor executes the at least one computer-readable instruction to implement the following steps:
  if the accuracy rate is less than or equal to the preset accuracy threshold, determining that the model to be tested is an insufficiently trained recognition model; and
  retraining the insufficiently trained recognition model.
- The electronic device according to claim 8, wherein, after training the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model, the processor executes the at least one computer-readable instruction to implement the following steps:
  performing tilt correction on the picture to be recognized according to a Hough transform algorithm to obtain a corrected picture;
  and wherein inputting the picture to be recognized into the trained recognition model to obtain the recognition result comprises:
  inputting the corrected picture into the trained recognition model to obtain the recognition result.
- A computer-readable storage medium, wherein the computer-readable storage medium stores at least one computer-readable instruction, and the at least one computer-readable instruction, when executed by a processor, implements the following steps:
  acquiring a set of pictures of handwritten English text lines, wherein the pictures in the set include English letters, spaces, and punctuation marks;
  scaling all pictures in the set proportionally according to a preset width threshold to obtain a plurality of scaled pictures;
  determining, from the plurality of scaled pictures, a first standard picture and a picture to be padded, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be padded is less than the preset length threshold;
  adding a blank area to the picture to be padded according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
  randomly adjusting the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
  training an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
  acquiring a picture to be recognized; and
  inputting the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
- The storage medium according to claim 14, wherein, when randomly adjusting the first standard picture and the second standard picture to obtain training pictures, the at least one computer-readable instruction is executed by the processor to implement the following steps:
  acquiring a preset scaling-factor interval;
  randomly scaling the first standard picture and the second standard picture proportionally according to the preset scaling-factor interval to obtain randomly scaled pictures;
  mapping the randomly scaled pictures onto a canvas of a preset size to obtain target pictures of a uniform size;
  randomly adjusting the brightness, contrast, and saturation of the target pictures respectively to obtain pictures with random brightness, random contrast, and random saturation; and
  adding random noise to the pictures with random brightness, random contrast, and random saturation to obtain the training pictures.
- The storage medium according to claim 14, wherein, when training the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model, the at least one computer-readable instruction is executed by the processor to implement the following steps:
  inputting the training pictures into a convolutional layer of the initial recognition model to obtain image pixel features;
  inputting the image pixel features into a recurrent layer of the initial recognition model to obtain image temporal features;
  inputting the image temporal features into a transcription layer of the initial recognition model to obtain a label sequence;
  calculating, using a loss function, a loss value corresponding to the label sequence; and
  updating network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model.
- The storage medium according to claim 16, wherein, when updating the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value to obtain the trained recognition model, the at least one computer-readable instruction is executed by the processor to implement the following steps:
  adjusting the network parameters of the initial recognition model according to the backpropagation algorithm and the loss value so as to minimize the loss value, to obtain a model to be tested;
  acquiring a preset test set;
  testing the model to be tested with the test set and determining the accuracy rate with which the model to be tested passes the test; and
  if the accuracy rate is greater than a preset accuracy threshold, determining that the model to be tested is the trained recognition model.
- The storage medium according to claim 17, wherein the at least one computer-readable instruction, when executed by the processor, further implements the following steps:
  if the accuracy rate is less than or equal to the preset accuracy threshold, determining that the model to be tested is an insufficiently trained recognition model; and
  retraining the insufficiently trained recognition model.
- The storage medium according to claim 14, wherein, after training the initial recognition model according to the backpropagation algorithm and the training pictures to obtain the trained recognition model, the at least one computer-readable instruction, when executed by the processor, further implements the following steps:
  performing tilt correction on the picture to be recognized according to a Hough transform algorithm to obtain a corrected picture;
  and wherein inputting the picture to be recognized into the trained recognition model to obtain the recognition result comprises:
  inputting the corrected picture into the trained recognition model to obtain the recognition result.
- An apparatus for recognizing handwritten English text, wherein the apparatus comprises:
  an acquiring module, configured to acquire a set of pictures of handwritten English text lines, wherein the pictures in the set include English letters, spaces, and punctuation marks;
  a scaling module, configured to scale all pictures in the set proportionally according to a preset width threshold to obtain a plurality of scaled pictures;
  a determining module, configured to determine, from the plurality of scaled pictures, a first standard picture and a picture to be padded, wherein the length of the first standard picture is equal to a preset length threshold and the length of the picture to be padded is less than the preset length threshold;
  an adding module, configured to add a blank area to the picture to be padded according to the preset length threshold to obtain a second standard picture, wherein the length of the second standard picture is equal to the preset length threshold;
  an adjusting module, configured to randomly adjust the first standard picture and the second standard picture to obtain training pictures, wherein the objects of the random adjustment include picture brightness, picture contrast, picture saturation, noise, and picture font size;
  a training module, configured to train an initial recognition model according to a backpropagation algorithm and the training pictures to obtain a trained recognition model;
  the acquiring module being further configured to acquire a picture to be recognized; and
  an input module, configured to input the picture to be recognized into the trained recognition model to obtain a recognition result, wherein the recognition result includes the English letters, spaces, and punctuation marks in the picture to be recognized.
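By way of non-limiting illustration, the picture preparation recited in the independent claims (proportional scaling, padding with a blank area up to the preset length, and random adjustment) might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: pictures are taken to be grayscale lists of pixel rows, the "preset width threshold" is read as the text-line height, and every name and numeric value (`PRESET_HEIGHT`, `PRESET_LENGTH`, `BLANK`, the adjustment ranges) is hypothetical. Saturation and font-size adjustment are omitted because the sketch uses single-channel pictures.

```python
import random

# Hypothetical presets: the claims only state that these thresholds are "preset".
PRESET_HEIGHT = 4    # assumed reading of the "preset width threshold" (text-line height)
PRESET_LENGTH = 12   # assumed value of the "preset length threshold"
BLANK = 255          # assumed white background for the added blank area


def scale_proportionally(img, height=PRESET_HEIGHT):
    """Nearest-neighbour resize of a grayscale picture (list of rows), keeping the aspect ratio."""
    h, w = len(img), len(img[0])
    new_w = max(1, round(w * height / h))
    return [
        [img[y * h // height][x * w // new_w] for x in range(new_w)]
        for y in range(height)
    ]


def pad_to_length(img, length=PRESET_LENGTH, blank=BLANK):
    """Append blank columns on the right until every row has exactly `length` pixels."""
    w = len(img[0])
    if w > length:
        # Not covered by the claims; rejecting over-long pictures is an assumption.
        raise ValueError("picture is longer than the preset length threshold")
    return [list(row) + [blank] * (length - w) for row in img]


def random_adjust(img, rng, shift_range=(-20.0, 20.0), gain_range=(0.8, 1.2), noise_sigma=5.0):
    """Apply one random brightness shift and contrast gain per picture, plus per-pixel Gaussian noise."""
    shift = rng.uniform(*shift_range)
    gain = rng.uniform(*gain_range)

    def clamp(value):
        return max(0, min(255, int(round(value))))

    return [
        [clamp((p - 128) * gain + 128 + shift + rng.gauss(0.0, noise_sigma)) for p in row]
        for row in img
    ]


# Usage: a 2 x 3 picture is scaled to 4 x 6, padded to 4 x 12, then randomly adjusted.
picture = [[0, 255, 0], [255, 0, 255]]
standard = pad_to_length(scale_proportionally(picture))
training = random_adjust(standard, random.Random(0))
print(len(training), len(training[0]))
```

Padding on the right only and using a white blank area are design choices of this sketch; the claims require only that the padded picture reach the preset length.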
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010329360.1 | 2020-04-23 | ||
CN202010329360.1A CN111639527A (en) | 2020-04-23 | 2020-04-23 | English handwritten text recognition method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021212652A1 true WO2021212652A1 (en) | 2021-10-28 |
Family
ID=72328702
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/098237 WO2021212652A1 (en) | 2020-04-23 | 2020-06-24 | Handwritten english text recognition method and device, electronic apparatus, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111639527A (en) |
WO (1) | WO2021212652A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114550158A (en) * | 2022-02-23 | 2022-05-27 | 厦门大学 | Scene character recognition method and system |
CN115082936A (en) * | 2022-05-27 | 2022-09-20 | 深圳市航盛电子股份有限公司 | English word handwriting recognition method and terminal device |
CN116798052A (en) * | 2023-08-28 | 2023-09-22 | 腾讯科技(深圳)有限公司 | Training method and device of text recognition model, storage medium and electronic equipment |
WO2024103292A1 (en) * | 2022-11-16 | 2024-05-23 | 京东方科技集团股份有限公司 | Handwritten form recognition method, and handwritten form recognition model training method and device |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114065868B (en) * | 2021-11-24 | 2022-09-02 | 马上消费金融股份有限公司 | Training method of text detection model, text detection method and device |
CN113887546B (en) * | 2021-12-08 | 2022-03-11 | 军事科学院系统工程研究院网络信息研究所 | Method and system for improving image recognition accuracy |
CN114612725B (en) * | 2022-03-18 | 2023-04-25 | 北京百度网讯科技有限公司 | Image processing method, device, equipment and storage medium |
CN115546614B (en) * | 2022-12-02 | 2023-04-18 | 天津城建大学 | Safety helmet wearing detection method based on improved YOLOV5 model |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180137349A1 (en) * | 2016-11-14 | 2018-05-17 | Kodak Alaris Inc. | System and method of character recognition using fully convolutional neural networks |
CN109376658A (en) * | 2018-10-26 | 2019-02-22 | 信雅达系统工程股份有限公司 | A kind of OCR method based on deep learning |
CN109598290A (en) * | 2018-11-22 | 2019-04-09 | 上海交通大学 | A kind of image small target detecting method combined based on hierarchical detection |
CN110298343A (en) * | 2019-07-02 | 2019-10-01 | 哈尔滨理工大学 | A kind of hand-written blackboard writing on the blackboard recognition methods |
CN110298338A (en) * | 2019-06-20 | 2019-10-01 | 北京易道博识科技有限公司 | A kind of file and picture classification method and device |
CN110619326A (en) * | 2019-07-02 | 2019-12-27 | 安徽七天教育科技有限公司 | English test paper composition detection and identification system and method based on scanning |
US20200026951A1 (en) * | 2018-07-19 | 2020-01-23 | Tata Consultancy Services Limited | Systems and methods for end-to-end handwritten text recognition using neural networks |
CN110765966A (en) * | 2019-10-30 | 2020-02-07 | 哈尔滨工业大学 | One-stage automatic recognition and translation method for handwritten characters |
CN110781885A (en) * | 2019-10-24 | 2020-02-11 | 泰康保险集团股份有限公司 | Text detection method, device, medium and electronic equipment based on image processing |
CN111008624A (en) * | 2019-12-05 | 2020-04-14 | 嘉兴太美医疗科技有限公司 | Optical character recognition method and method for generating training sample for optical character recognition |
- 2020
  - 2020-04-23 CN CN202010329360.1A patent/CN111639527A/en active Pending
  - 2020-06-24 WO PCT/CN2020/098237 patent/WO2021212652A1/en active Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114550158A (en) * | 2022-02-23 | 2022-05-27 | 厦门大学 | Scene character recognition method and system |
CN114550158B (en) * | 2022-02-23 | 2024-09-06 | 厦门大学 | Scene character recognition method and system |
CN115082936A (en) * | 2022-05-27 | 2022-09-20 | 深圳市航盛电子股份有限公司 | English word handwriting recognition method and terminal device |
WO2024103292A1 (en) * | 2022-11-16 | 2024-05-23 | 京东方科技集团股份有限公司 | Handwritten form recognition method, and handwritten form recognition model training method and device |
CN116798052A (en) * | 2023-08-28 | 2023-09-22 | 腾讯科技(深圳)有限公司 | Training method and device of text recognition model, storage medium and electronic equipment |
CN116798052B (en) * | 2023-08-28 | 2023-12-08 | 腾讯科技(深圳)有限公司 | Training method and device of text recognition model, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111639527A (en) | 2020-09-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20931889; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.02.2023) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20931889; Country of ref document: EP; Kind code of ref document: A1 |