WO2020098250A1 - Character recognition method, server, and computer readable storage medium - Google Patents


Info

Publication number
WO2020098250A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
character recognition
image
processing
picture
Prior art date
Application number
PCT/CN2019/088638
Other languages
French (fr)
Chinese (zh)
Inventor
许洋
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2020098250A1 publication Critical patent/WO2020098250A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/24 Character recognition characterised by the processing or recognition method
    • G06V 30/248 Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Definitions

  • the present application relates to the field of character recognition, and in particular, to a character recognition method, server, and computer-readable storage medium.
  • OCR: Optical Character Recognition.
  • when the identification content of a field is not within a very limited set, and can even be regarded as an infinite set (such as the name on an ID card or the vehicle owner on a driving license), recognition is easily limited by the amount of labeled data, and the accuracy rate is also affected to a certain extent.
  • this application proposes a character recognition method, which can increase the character recognition range and increase the accuracy of character recognition.
  • a first aspect of the present application provides a server. The server includes a memory and a processor, and the memory stores a character recognition system operable on the processor; when the character recognition system is executed by the processor, the following steps are implemented: character data is acquired, and each acquired character data is image-synthesized with a preset background picture to obtain a character image corresponding to each character data; random perturbation processing is performed on the synthesized character images to obtain different types of character images; the different types of character images are input into a deep learning network for training to generate a character recognition model; and the character image to be recognized is input into the character recognition model, and the recognition result of the character image to be recognized is output.
  • a second aspect of the present application provides a character recognition method applied to a server, the method including: acquiring character data, and performing image synthesis on each acquired character data with a preset background picture to obtain a character image corresponding to each character data; performing random perturbation processing on the synthesized character images to obtain different types of character images; inputting the different types of character images into a deep learning network for training to generate a character recognition model; and inputting the character image to be recognized into the character recognition model and outputting the recognition result of the character image to be recognized.
  • a third aspect of the present application further provides a computer-readable storage medium storing a character recognition system. The character recognition system may be executed by at least one processor, causing the at least one processor to perform the steps of the character recognition method described in any one of the above.
  • the character recognition method, server, and computer-readable storage medium proposed in this application acquire character data, and perform image synthesis on each acquired character data with a preset background picture to obtain a character image corresponding to each character data; perform random perturbation processing on the synthesized character images to obtain different types of character images; input the different types of character images into a deep learning network for training to generate a character recognition model; and input the character image to be recognized into the character recognition model and output the recognition result of the character image to be recognized.
  • in this way, a variety of training sample data can be generated as needed, solving the prior-art problem of a small character recognition range and low accuracy caused by the uneven distribution of real training-sample data, thereby increasing the character recognition range and improving character recognition accuracy.
  • FIG. 1 is a schematic diagram of an optional hardware architecture of the server of this application.
  • FIG. 2 is a schematic diagram of the program modules of the first embodiment of the character recognition system of the present application.
  • FIG. 3 is a schematic diagram of a program module of the second embodiment of the character recognition system of the present application.
  • FIG. 4 is a schematic diagram of an implementation process of the first embodiment of the character recognition method of the present application.
  • FIG. 5 is a schematic diagram of an implementation process of a second embodiment of a character recognition method of the present application.
  • Reference numerals: memory 11; processor 12; network interface 13; character recognition system 100; acquisition module 101; processing module 102; generation module 103; output module 104; test module 105; adjustment module 106.
  • FIG. 1 is a schematic diagram of an optional hardware architecture of the application server 2 of the present application.
  • the application server 2 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other through a system bus. It should be noted that FIG. 1 only shows the application server 2 with components 11-13, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the application server 2 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server.
  • the application server 2 may be an independent server or a server cluster composed of multiple servers.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 11 may be an internal storage unit of the application server 2, such as a hard disk or memory of the application server 2.
  • the memory 11 may also be an external storage device of the application server 2, such as a plug-in hard disk equipped on the application server 2, a smart memory card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, flash card (Flash Card), etc.
  • the memory 11 may also include both the internal storage unit of the application server 2 and its external storage device.
  • the memory 11 is generally used to store an operating system installed in the application server 2 and various application software, such as program codes of the character recognition system 100.
  • the memory 11 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is generally used to control the overall operation of the application server 2.
  • the processor 12 is used to run the program code or process data stored in the memory 11, for example, to run the character recognition system 100.
  • the network interface 13 may include a wireless network interface or a wired network interface.
  • the network interface 13 is generally used to establish a communication connection between the application server 2 and other electronic devices.
  • the present application proposes a character recognition system 100.
  • FIG. 2 is a program module diagram of the first embodiment of the character recognition system 100 of the present application.
  • the character recognition system 100 includes a series of computer program instructions stored on the memory 11, and when the computer program instructions are executed by the processor 12, the character recognition operations of the embodiments of the present application can be implemented.
  • the character recognition system 100 may be divided into one or more modules based on the specific operations implemented by the various portions of the computer program instructions. For example, in FIG. 2, the character recognition system 100 may be divided into an acquisition module 101, a processing module 102, a generation module 103, and an output module 104. Wherein:
  • the acquisition module 101 is used to acquire character data, and perform image synthesis on each acquired character data with a preset background picture to obtain a character image corresponding to each character data.
  • the character data may be English letters, symbols, numbers, Chinese characters, etc.
  • the character data includes at least one character.
  • the character data can be captured from the network and then stored in a preset file; when users need the character data, they can obtain it directly from the preset file. The character data can also be provided by the business party and stored in the preset file, from which it can likewise be obtained directly when needed.
  • the preset file is a file in TXT format. A person skilled in the art may obtain the character data in any manner, which will not be repeated here.
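  • as an illustrative sketch (not part of the original disclosure), obtaining character data from a preset TXT file may look as follows; the function name and one-string-per-line layout are assumptions:

```python
def load_character_data(path):
    """Return the non-empty character strings stored in the preset TXT file.

    Assumes one character datum (letters, symbols, numbers, or Chinese
    characters) per line; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```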
  • the preset background picture is a picture determined by the user according to actual needs.
  • the preset background picture is preferably a picture grabbed from the Internet with a keyword of "paper".
  • there is at least one such picture; of course, the picture can also be obtained by the user photographing various papers with a camera.
  • the preset background image may also be a picture of another style, such as a license plate number picture, an ID card picture, and the like.
  • for example, with four preset background pictures, each character data may be image-synthesized with each background picture separately, so that each character data can synthesize 4 character images, and 5 character data can synthesize 20 character images. During image synthesis, however, it is not necessary for each character data to be synthesized with every background picture; this is not limited in this embodiment.
  • the character data can be combined with multiple background pictures for image synthesis to increase the diversity of character images.
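  • the pairing described above (every character datum with every background picture) can be sketched as follows; the sample strings are hypothetical and only the 5 x 4 = 20 count comes from the description above:

```python
from itertools import product

# Hypothetical sample data: 5 character data and 4 background pictures.
characters = ["sun", "moon", "star", "rain", "snow"]
backgrounds = ["paper1", "paper2", "plate", "id_card"]

# Each (character, background) pair yields one synthesized character image,
# so 5 character data and 4 backgrounds give 20 character images.
pairs = list(product(characters, backgrounds))
```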
  • any existing image synthesis technology may be used to achieve image synthesis.
  • during image synthesis, the length of the character data, the style of the character data, and the font size of the character data may be set as needed.
  • the pixel superposition method may be used directly; that is, each pixel corresponding to the character data is superposed with the corresponding pixel of the target pixel area, and the superimposed pixel value is used as the pixel value of each pixel in that area.
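  • a minimal sketch of the pixel superposition described above, assuming grayscale 8-bit images; the clipping to the 0-255 range and the placement coordinates are implementation assumptions not specified in the text:

```python
import numpy as np

def superpose(background, glyph, top, left):
    """Superpose a character glyph onto a pixel region of the background.

    Each glyph pixel is added to the corresponding background pixel, and the
    summed value (clipped to the valid 0-255 range) becomes the new pixel
    value of that region, as described above.
    """
    h, w = glyph.shape[:2]
    region = background[top:top + h, left:left + w].astype(np.int32)
    background[top:top + h, left:left + w] = np.clip(
        region + glyph, 0, 255
    ).astype(np.uint8)
    return background
```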
  • the processing module 102 is used to perform random disturbance processing on the synthesized character image to obtain different types of character images.
  • the random disturbance processing includes Gaussian blur processing, Gaussian noise processing, small-scale rotation processing of the picture, and contrast processing and color change processing of the picture.
  • the Gaussian blur processing of the picture refers to Gaussian filtering of the picture with a certain mean and variance, that is, filtering the picture; the Gaussian noise processing of the picture refers to adding Gaussian noise to the three color channels of the picture.
  • small-scale rotation of the picture refers to determining the center point of rotation according to the field frame, or directly taking the center of the picture as the center point of rotation (this can be chosen according to actual business needs), and then rotating the picture by an angle about that center point;
  • the contrast processing of the picture refers to randomly changing the S (saturation) and V (value, lightness) of the picture in the HSV color space; the color change processing of the picture refers to randomly changing the H (hue) of the picture in the HSV color space.
  • different types of character images can be obtained by applying at least one of the above perturbation processing methods to the synthesized image; for example, a rotated character image, a noisy character image, an inclined character image, and so on.
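  • three of the perturbations described above (Gaussian blur, Gaussian noise, small-scale rotation about the picture center) can be sketched as follows, assuming grayscale images; the HSV contrast and hue changes are omitted here, and all parameter values are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def gaussian_blur(img, sigma=1.0):
    # Gaussian filtering of the picture with a given standard deviation.
    return ndimage.gaussian_filter(img.astype(np.float32), sigma=sigma)

def gaussian_noise(img, std=10.0):
    # Add Gaussian noise to the picture (per channel for color images).
    noisy = img.astype(np.float32) + rng.normal(0.0, std, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def small_rotation(img, max_deg=5.0):
    # Small-scale rotation about the picture center (rotation about a field
    # frame, the other option described above, is not shown here).
    angle = rng.uniform(-max_deg, max_deg)
    return ndimage.rotate(img, angle, reshape=False, mode="nearest")
```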
  • the generating module 103 is used to input the different types of character images into a deep learning network for training to generate a character recognition model.
  • before training, the character images need to be pre-processed to convert each character image into the required feature vector, and the feature vectors are then input into the deep learning network for training.
  • the deep learning network is preferably a CRNN model.
  • the CRNN model is a joint model of a convolutional neural network and a recurrent neural network.
  • the CRNN model is an end-to-end trainable model with the following advantages: 1) the input can be of any length (arbitrary image width, arbitrary word length); 2) the training set does not require character-level calibration; 3) it can be used both with and without a dictionary (lexicon); 4) it performs well, and the model is small (fewer parameters).
  • the CRNN model includes a VGG16 layer, two long short-term memory (LSTM) layers, and two fully connected (FC) layers. The VGG16 layer, composed of 13 convolutional layers and 3 fully connected layers, extracts the spatial features of character images; the two LSTM layers extract the temporal features of character images to obtain the contextual relationships of the text to be trained and recognized; and the two fully connected FC layers classify the extracted spatial and temporal features.
  • the CRNN model in this embodiment adds a fully connected FC layer to speed up the convergence of training.
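  • a shape-level sketch (not a trainable implementation) of how the pipeline described above passes a character image through convolutional feature extraction, a recurrent stage, and final classification; the stand-in functions and all dimensions (feature size, hidden size, number of classes, width downsampling by 4) are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 32, 128                        # input character image (width arbitrary)
FEAT, HIDDEN, CLASSES = 64, 128, 37   # illustrative sizes only

def conv_features(img):
    # Stand-in for the VGG16 stage: collapse the height and downsample the
    # width by 4, giving one FEAT-dimensional spatial feature per time step.
    steps = img.shape[1] // 4
    return rng.standard_normal((steps, FEAT))

def rnn(seq, hidden=HIDDEN):
    # Stand-in for the two LSTM layers: a simple tanh recurrence that turns
    # the spatial feature sequence into temporal (contextual) features.
    Wx = rng.standard_normal((seq.shape[1], hidden)) * 0.01
    Wh = rng.standard_normal((hidden, hidden)) * 0.01
    h, out = np.zeros(hidden), []
    for x in seq:
        h = np.tanh(x @ Wx + h @ Wh)
        out.append(h)
    return np.stack(out)

def classify(seq):
    # Stand-in for the fully connected FC layers: per-step class scores.
    Wf = rng.standard_normal((seq.shape[1], CLASSES)) * 0.01
    return seq @ Wf

img = rng.integers(0, 256, size=(H, W))
scores = classify(rnn(conv_features(img)))  # (W // 4, CLASSES) class scores
```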
  • the output module 104 is used to input the character image to be recognized into the character recognition model, and output the recognition result of the character image to be recognized.
  • the character recognition model may be stored in a local character recognition terminal or may be stored in a server, which is specifically selected according to the actual needs of the user, and is not limited in this embodiment.
  • the character recognition system 100 acquires character data and synthesizes each acquired character data with a preset background picture to obtain a character image corresponding to each character data; performs random perturbation processing on the synthesized character images to obtain different types of character images; inputs the different types of character images into a deep learning network for training to generate a character recognition model; and inputs the character image to be recognized into the character recognition model, outputting the recognition result of the character image to be recognized.
  • in this way, a variety of training sample data can be generated as needed, solving the prior-art problem of a small character recognition range and low accuracy caused by the uneven distribution of real training-sample data, thereby increasing the character recognition range and improving character recognition accuracy.
  • the character recognition system 100 includes a series of computer program instructions stored on the memory 11, and when the computer program instructions are executed by the processor 12, the character recognition operations of the embodiments of the present application can be implemented.
  • the character recognition system 100 may be divided into one or more modules based on the specific operations implemented by the various parts of the computer program instructions.
  • in this embodiment, the character recognition system 100 may be divided into an acquisition module 101, a processing module 102, a generation module 103, an output module 104, a test module 105, and an adjustment module 106. The program modules 101-104 are the same as in the first embodiment of the character recognition system 100 of the present application; on this basis, a test module 105 and an adjustment module 106 are added. Wherein:
  • the testing module 105 is used to test the character recognition accuracy of the character recognition model.
  • the recognition accuracy of the character recognition model on real character image data needs to be tested.
  • specifically, the user inputs character images of several real characters into the character recognition model, the model outputs the recognition result corresponding to each real character, and the accuracy rate of character recognition is then calculated from the output recognition results. It can be understood that, in order to obtain an accurate character recognition rate, as much real character data as possible should be input into the character recognition model.
  • specifically, the recognition result output by the character recognition model can be compared with the pre-stored character data to determine whether the character recognition model recognized the real character data correctly. If the recognition is correct, a counter is incremented by 1; after all character recognition is completed, the accumulated value is divided by the number of characters input into the character recognition model to obtain the recognition accuracy rate of the character recognition model on real character image data.
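  • the accuracy computation described above can be sketched as follows (the function name is an assumption; the counting and division match the description):

```python
def recognition_accuracy(predictions, ground_truth):
    """Accuracy as described above: accumulate 1 per correct recognition,
    then divide by the number of characters input into the model."""
    correct = sum(1 for p, t in zip(predictions, ground_truth) if p == t)
    return correct / len(ground_truth)
```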
  • the adjustment module 106 is used to adjust the character recognition model if the recognition accuracy rate is lower than a preset threshold.
  • specifically, after the character recognition accuracy rate of the character recognition model is obtained, it is compared with a preset threshold; if the character recognition accuracy rate is lower than the preset threshold, the character recognition model is adjusted.
  • the preset threshold is the lowest value of the character recognition accuracy rate set in advance, for example, the preset threshold is 90%.
  • the preset threshold can be set according to the actual needs of the user, and the preset threshold after the setting can be further modified according to the actual needs.
  • when the character recognition model is adjusted in this embodiment, it is only fine-tuned and does not need to be adjusted substantially.
  • the step of adjusting the character recognition model includes:
  • Step A: Freeze the parameters of the VGG16 layer.
  • that is, the parameters of the VGG16 layer are not changed; they are frozen to prevent the parameters of the VGG16 layer from being adjusted under the stimulation of the training sample data when the character recognition model is adjusted.
  • Step B: Adjust the parameters of the two long short-term memory (LSTM) layers and the two fully connected (FC) layers.
  • specifically, the parameters of the two LSTM layers and the two fully connected FC layers are adjusted by unfreezing them, and the learning rate is set to decay every several epochs until it decays to a boundary value.
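  • the learning-rate schedule described above (decay every several epochs until a boundary value is reached) can be sketched as follows; the base rate, decay factor, step, and floor are illustrative assumptions not given in the disclosure:

```python
def decayed_lr(epoch, base_lr=1e-3, factor=0.5, step=5, floor=1e-5):
    """Stepwise learning-rate decay: multiply by `factor` every `step`
    epochs, never dropping below the boundary value `floor`."""
    lr = base_lr * (factor ** (epoch // step))
    return max(lr, floor)
```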
  • Step C: Train the adjusted character recognition model using real character image data.
  • specifically, the real character images are input into the character recognition model with the adjusted parameters, and the model is further trained to obtain the adjusted character recognition model. The test module 105 is then used to test the recognition accuracy of the adjusted model: if the test result meets the requirements, the character recognition model training is complete; otherwise, steps A to C are repeated until the recognition accuracy of the character recognition model meets the requirements.
  • in summary, the character recognition system 100 acquires character data and synthesizes each acquired character data with a preset background picture to obtain a character image corresponding to each character data; performs random perturbation processing on the synthesized character images to obtain different types of character images; inputs the different types of character images into a deep learning network for training to generate a character recognition model; inputs the character image to be recognized into the character recognition model and outputs the recognition result; tests the character recognition accuracy rate of the character recognition model; and, if the recognition accuracy rate is lower than a preset threshold, adjusts the character recognition model. In this way, by fine-tuning the character recognition model when it does not reach the preset recognition accuracy, the accuracy of character recognition is improved.
  • this application also proposes a character recognition method.
  • FIG. 4 is a schematic diagram of the implementation process of the first embodiment of the character recognition method of the present application.
  • the execution order of the steps in the flowchart shown in FIG. 4 may be changed, and some steps may be omitted.
  • Step S500: Character data is acquired, and each acquired character data is image-synthesized with a preset background picture to obtain a character image corresponding to each character data.
  • the character data, the preset background picture, and the image synthesis process (including pixel superposition) are as described above for the acquisition module 101 and are not repeated here.
  • Step S502: Perform random perturbation processing on the synthesized character images to obtain character images of different types.
  • the random disturbance processing includes Gaussian blur processing, Gaussian noise processing, small-scale rotation processing of the picture, and contrast processing and color change processing of the picture.
  • Gaussian blur processing refers to filtering the picture with a Gaussian kernel of a certain mean and variance.
  • Gaussian noise processing refers to adding Gaussian noise to the three color channels of the picture; unlike Gaussian blur, which filters the picture, the noise is superimposed directly on the pixel values.
  • Small-angle rotation of the picture refers to determining the rotation center from the field bounding box, or directly taking the center of the picture as the rotation center (adjustable according to actual business needs), and then rotating the picture by an angle about that center.
  • Contrast processing refers to randomly changing the S (saturation) and V (value/lightness) of the picture in HSV color space.
  • Color change processing refers to randomly changing the H (hue) of the picture in HSV color space.
  • Applying at least one of the above perturbation methods to the synthesized images yields character images of different types; for example, rotated character images, noisy character images, and tilted character images can be obtained.
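Two of the perturbations above — channel-wise Gaussian noise (added directly to pixel values) and Gaussian blur (filtering with a separable Gaussian kernel) — can be sketched in NumPy as follows. The parameter defaults are illustrative assumptions; the patent does not fix the mean, variance, or kernel size:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, mean=0.0, sigma=10.0):
    """Superimpose Gaussian noise directly on the three color channels."""
    noise = rng.normal(mean, sigma, img.shape)
    return np.clip(img.astype(float) + noise, 0, 255).astype(np.uint8)

def gaussian_kernel(sigma=1.0, radius=2):
    """Normalized 1-D Gaussian kernel used for separable filtering."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma=1.0, radius=2):
    """Filter each channel along rows, then columns, with the 1-D kernel."""
    k = gaussian_kernel(sigma, radius)
    out = img.astype(float)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Noise perturbs values in place while blur redistributes them, matching the distinction drawn in the text.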
  • Step S504: Input the character images of different types into the deep learning network for training to generate a character recognition model.
  • Before different types of character images are input into the deep learning network, each character image needs to be preprocessed to convert it into the required feature vector, which is then input into the deep learning network for training.
  • the deep learning network is preferably a CRNN model.
  • the CRNN model is a joint model of a convolutional neural network and a recurrent neural network.
  • The CRNN model is end-to-end trainable and has the following advantages: 1) the input can be of any length (arbitrary image width, arbitrary word length); 2) the training set requires no character-level annotation; 3) it can be used both with and without a dictionary (lexicon); 4) it performs well and the model is small (few parameters).
  • The CRNN model includes a VGG16 layer, two long short-term memory (LSTM) layers, and two fully connected (FC) layers. The VGG16 layer, composed of 13 convolutional layers and 3 fully connected layers, extracts the spatial features of character images; the two LSTM layers extract the temporal features of character images to capture the contextual relationships of the text to be trained and recognized; and the two FC layers classify the extracted spatial and temporal features.
  • The CRNN model in this embodiment adds a fully connected FC layer to speed up the convergence of training.
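The patent provides no code for this architecture. The PyTorch sketch below shows the overall arrangement — a VGG-style convolutional backbone, two LSTM layers, and two FC layers — but the layer sizes, the reduced conv stack (standing in for the full 13-layer VGG16), the bidirectionality, and the 37-class output are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Illustrative CRNN: conv backbone -> 2 LSTM layers -> 2 FC layers."""

    def __init__(self, num_classes=37):  # 37 classes is an assumption
        super().__init__()
        # Simplified VGG-style backbone; pools collapse a height of 32 to 1
        # while the later pools keep the width (the sequence axis) intact.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 1), (4, 1)),
        )
        # Two stacked LSTM layers extract temporal (contextual) features.
        self.rnn = nn.LSTM(256, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        # Two fully connected layers classify each timestep.
        self.fc1 = nn.Linear(512, 256)
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):                    # x: (N, 1, 32, W)
        f = self.cnn(x)                      # (N, 256, 1, W/4)
        f = f.squeeze(2).permute(0, 2, 1)    # (N, W/4, 256) as a sequence
        seq, _ = self.rnn(f)                 # (N, W/4, 512)
        return self.fc2(torch.relu(self.fc1(seq)))  # per-timestep scores
```

The width axis survives as the sequence length, which is what lets the input be of arbitrary width as claimed in the text.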
  • Step S506: Input the character image to be recognized into the character recognition model, and output the recognition result of the character image to be recognized.
  • the character recognition model may be stored in a local character recognition terminal or may be stored in a server, which is specifically selected according to the actual needs of the user, and is not limited in this embodiment.
  • In summary, the character recognition method proposed in this application acquires character data; synthesizes each acquired character data item with a preset background picture to obtain a character image corresponding to each character data item; performs random perturbation processing on the synthesized character images to obtain character images of different types; inputs the character images of different types into a deep learning network for training to generate a character recognition model; and inputs the character image to be recognized into the character recognition model, outputting its recognition result.
  • In this way, diverse training sample data can be generated as needed, solving the prior-art problem that unevenly distributed real training data leads to a small character recognition range and low accuracy, thereby enlarging the character recognition range and improving character recognition accuracy.
  • Referring to FIG. 5, which is a schematic flowchart of the second embodiment of the character recognition method of the present application.
  • the execution order of the steps in the flowchart shown in FIG. 5 may be changed, and some steps may be omitted.
  • Step S600: Acquire character data, and synthesize each acquired character data item with a preset background picture to obtain a character image corresponding to each character data item.
  • Step S602: Perform random perturbation processing on the synthesized character images to obtain character images of different types.
  • Step S604: Input the character images of different types into the deep learning network for training to generate a character recognition model.
  • Step S606: Input the character image to be recognized into the character recognition model, and output the recognition result of the character image to be recognized.
  • steps S600-S606 are similar to the steps S500-S506, and will not be repeated in this embodiment.
  • Step S608: Test the character recognition accuracy of the character recognition model.
  • the recognition accuracy of the character recognition model on real character image data needs to be tested.
  • Specifically, the user inputs character images of a number of real characters into the character recognition model, the model outputs the corresponding recognition results, and the character recognition accuracy is then calculated from those results. It can be understood that, to obtain an accurate calculation of the recognition rate, the amount of real character data input into the character recognition model should be as large as possible.
  • Specifically, the recognition result output by the character recognition model can be compared with pre-stored ground-truth character data to determine whether each character was recognized correctly. Each correct recognition increments a counter by 1; after all characters have been recognized, the accumulated count is divided by the number of characters input into the model, yielding the model's recognition accuracy on real character image data.
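The count-and-divide procedure above reduces to a few lines. A minimal sketch, assuming the model's outputs and the pre-stored ground truth are given as parallel lists:

```python
def recognition_accuracy(model_outputs, ground_truth):
    """Accuracy as described: +1 per correct recognition, divided by the
    number of characters input into the model."""
    assert len(model_outputs) == len(ground_truth)
    correct = sum(1 for pred, true in zip(model_outputs, ground_truth)
                  if pred == true)
    return correct / len(ground_truth)
```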
  • Step S610: If the recognition accuracy rate is lower than a preset threshold, adjust the character recognition model.
  • After the character recognition accuracy rate of the model is obtained, it is compared with a preset threshold; if the accuracy rate is lower than the preset threshold, the character recognition model is adjusted.
  • the preset threshold is the lowest value of the character recognition accuracy rate set in advance, for example, the preset threshold is 90%.
  • the preset threshold can be set according to the actual needs of the user, and the preset threshold after the setting can be further modified according to the actual needs.
  • When the character recognition model is adjusted in this embodiment, it is only fine-tuned; large adjustments are unnecessary.
  • the step of adjusting the character recognition model includes:
  • Step A: Freeze the parameters of the VGG16 layer.
  • That is, the parameters of the VGG16 layer are kept unchanged (frozen) to prevent them from being updated under the stimulus of the training sample data while the character recognition model is adjusted.
  • Step B: Adjust the parameters of the two long short-term memory (LSTM) layers and the two fully connected (FC) layers.
  • Specifically, the parameters of the two LSTM layers and the two FC layers are released (unfrozen), and the learning rate is set to decay every several epochs until it decays to a boundary value.
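The decay schedule in Step B is not fully specified in the patent. A minimal sketch, assuming exponential decay by a fixed factor every few epochs with a floor (boundary) value — the defaults are invented for illustration:

```python
def decayed_learning_rate(base_lr, epoch, decay_every=3,
                          factor=0.5, floor=1e-5):
    """Learning rate that decays every `decay_every` epochs by `factor`
    until it reaches the boundary value `floor`."""
    return max(base_lr * factor ** (epoch // decay_every), floor)
```

Under this schedule the rate steps down in plateaus and never drops below the boundary, matching the "decay every several epochs until a boundary value" behavior described above.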
  • Step C: Train the adjusted character recognition model using real character image data.
  • That is, the real character images are input into the parameter-adjusted character recognition model, which is trained further to obtain the adjusted character recognition model.
  • The test module 105 is then used to test the recognition accuracy of the model. If the test result meets the requirements, training of the character recognition model is complete; if not, steps A to C are repeated until the recognition accuracy of the obtained character recognition model meets the requirements.
  • In summary, the character recognition method proposed in this application acquires character data; synthesizes each acquired character data item with a preset background picture to obtain a character image corresponding to each character data item; performs random perturbation processing on the synthesized character images to obtain character images of different types; inputs the character images of different types into the deep learning network for training to generate a character recognition model; inputs the character image to be recognized into the character recognition model and outputs its recognition result; tests the character recognition accuracy of the model; and, if the accuracy is lower than a preset threshold, adjusts the model. By fine-tuning the character recognition model when it does not reach the preset recognition accuracy, the accuracy of character recognition is improved.

Abstract

The present application relates to artificial intelligence. Disclosed is a character recognition method, comprising: obtaining character data and performing image synthesis on each of the obtained character data and a preset background picture to obtain character images corresponding to each of the character data; performing random disturbance processing on the synthesized character images to obtain different types of character images; inputting the different types of character images to a deep learning network for training to generate a character recognition model; and inputting a character image to be recognized to the character recognition model to output a recognition result of said character image. The present application also provides a server and a computer readable storage medium. The character recognition method, server, and computer readable storage medium provided by the present application implement an OCR function on the basis of a deep learning algorithm, and can increase the range of character recognition and improve the accuracy of character recognition.

Description

Character recognition method, server and computer-readable storage medium
Priority Declaration
This application claims priority under the Paris Convention to Chinese patent application No. CN201811341729.X, entitled "Character recognition method, server and computer-readable storage medium", filed on November 12, 2018, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of character recognition, and in particular, to a character recognition method, a server, and a computer-readable storage medium.
Background
In OCR (Optical Character Recognition) business, certain fields in a specific scene are usually recognized according to the needs of the business party. This generally requires the business party to provide real picture data for that scene, the data to be manually annotated, and the annotated pictures to be used for deep-learning training of detection and recognition models. When the content of these fields lies in a small finite set (for example, the gender on an ID card, or the vehicle type and nature of use on a driving license), the recognition accuracy is usually relatively high. But when the content lies in an extremely large finite set, or can even be regarded as an infinite set (for example, the name on an ID card, or the owner of a driving license), recognition is easily limited by the amount of annotated data, and the accuracy rate is affected to a certain extent.
Summary of the Invention
In view of this, this application proposes a character recognition method that can enlarge the character recognition range and improve character recognition accuracy.
First, to achieve the above object, a first aspect of this application provides a server. The server includes a memory and a processor; the memory stores a character recognition system that can run on the processor, and when the character recognition system is executed by the processor, the following steps are implemented:
acquiring character data, and synthesizing each acquired character data item with a preset background picture to obtain a character image corresponding to each character data item;
performing random perturbation processing on the synthesized character images to obtain character images of different types;
inputting the character images of different types into a deep learning network for training to generate a character recognition model; and
inputting a character image to be recognized into the character recognition model, and outputting a recognition result of the character image to be recognized.
In addition, to achieve the above object, a second aspect of this application provides a character recognition method applied to a server, the method including:
acquiring character data, and synthesizing each acquired character data item with a preset background picture to obtain a character image corresponding to each character data item;
performing random perturbation processing on the synthesized character images to obtain character images of different types;
inputting the character images of different types into a deep learning network for training to generate a character recognition model; and
inputting a character image to be recognized into the character recognition model, and outputting a recognition result of the character image to be recognized.
Further, to achieve the above object, a third aspect of this application provides a computer-readable storage medium storing a character recognition system, where the character recognition system is executable by at least one processor to cause the at least one processor to perform the steps of the character recognition method according to any one of the above.
Compared with the prior art, the character recognition method, server, and computer-readable storage medium proposed in this application acquire character data; synthesize each acquired character data item with a preset background picture to obtain a character image corresponding to each character data item; perform random perturbation processing on the synthesized character images to obtain character images of different types; input the character images of different types into a deep learning network for training to generate a character recognition model; and input a character image to be recognized into the character recognition model, outputting its recognition result. In this way, diverse training sample data can be generated as needed, solving the prior-art problem that unevenly distributed real training data leads to a small character recognition range and low accuracy, thereby enlarging the character recognition range and improving character recognition accuracy.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional hardware architecture of the server of this application;
FIG. 2 is a schematic diagram of the program modules of the first embodiment of the character recognition system of this application;
FIG. 3 is a schematic diagram of the program modules of the second embodiment of the character recognition system of this application;
FIG. 4 is a schematic flowchart of the first embodiment of the character recognition method of this application;
FIG. 5 is a schematic flowchart of the second embodiment of the character recognition method of this application.
Reference marks:
Server 2
Network 3
Memory 11
Processor 12
Network interface 13
Character recognition system 100
Acquisition module 101
Processing module 102
Generation module 103
Output module 104
Test module 105
Adjustment module 106
The implementation, functional characteristics, and advantages of the objectives of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application, not to limit it. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of this application.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with one another, but only on the basis that a person of ordinary skill in the art can realize the combination; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that such a combination does not exist and is not within the protection scope claimed by this application.
Referring to FIG. 1, which is a schematic diagram of an optional hardware architecture of the application server 2 of this application.
In this embodiment, the application server 2 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can be communicatively connected to one another through a system bus. It should be noted that FIG. 1 only shows the application server 2 with components 11-13, but it should be understood that not all illustrated components are required to be implemented; more or fewer components may be implemented instead.
The application server 2 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server; it may be an independent server or a server cluster composed of multiple servers.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 11 may be an internal storage unit of the application server 2, such as its hard disk or internal memory. In other embodiments, the memory 11 may also be an external storage device of the application server 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the application server 2. Of course, the memory 11 may include both the internal storage unit of the application server 2 and its external storage device. In this embodiment, the memory 11 is generally used to store the operating system and various application software installed on the application server 2, such as the program code of the character recognition system 100. In addition, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the application server 2. In this embodiment, the processor 12 is used to run the program code or process the data stored in the memory 11, for example, to run the character recognition system 100.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the application server 2 and other electronic devices.
So far, the hardware structure and functions of the devices related to this application have been introduced in detail. Below, the various embodiments of this application are presented based on the above introduction.
First, this application proposes a character recognition system 100.
Referring to FIG. 2, which is a program module diagram of the first embodiment of the character recognition system 100 of this application.
In this embodiment, the character recognition system 100 includes a series of computer program instructions stored in the memory 11; when these computer program instructions are executed by the processor 12, the character recognition operations of the embodiments of this application can be implemented. In some embodiments, the character recognition system 100 may be divided into one or more modules based on the specific operations implemented by the parts of the computer program instructions. For example, in FIG. 2, the character recognition system 100 may be divided into an acquisition module 101, a processing module 102, a generation module 103, and an output module 104. Among them:
The acquisition module 101 is used to acquire character data and synthesize each acquired character data item with a preset background picture to obtain a character image corresponding to each character data item.
Specifically, the character data may be English letters, symbols, digits, Chinese characters, and so on; in this embodiment, the character data includes at least one character. The character data can be scraped from the network and stored in a preset file, from which it is obtained directly when the user needs it; the character data can also be provided by the business party and likewise stored in the preset file for direct retrieval. Preferably, the preset file is a file in TXT format. A person skilled in the art may obtain the character data in any manner, which will not be elaborated here.
The preset background picture is a picture determined by the user according to actual needs. In this embodiment, it is preferably one or more pictures retrieved from the Internet using the keyword "paper"; of course, the pictures can also be obtained by the user photographing various papers with a camera. It can be understood that, in other embodiments of this application, the preset background picture may also be a picture of another style, such as a picture of a license plate number or an ID card.
For example, when there are 5 character data items and 4 preset background pictures, then during image synthesis each character data item may preferably be synthesized with each background image, so that each character data item yields 4 character images and the 5 character data items yield 20 character images. Of course, it is not necessary for every character data item to be synthesized with every background image to obtain a character image, which is not limited in this embodiment. In this embodiment, synthesizing the character data with multiple background pictures increases the diversity of the character images.
In this embodiment, any existing image synthesis technique may be used. For example, during image synthesis, the length, style, and font size of the character data may first be used to determine the width and height of the pixel space the characters occupy; after that, a corresponding pixel region is selected from the pixels of the background picture so that the pixels of the character data can be inserted into that region, replacing the pixels originally there. It can be understood that, in other embodiments, pixel superposition may be used directly instead of pixel replacement: each pixel of the character data is superimposed on the corresponding pixel of the region, and the superimposed value is used as the pixel value of each pixel in that region.
所述处理模块102用于对合成后的字符图像进行随机扰动处理以得到不同类型的字符图像。The processing module 102 is used to perform random disturbance processing on the synthesized character image to obtain different types of character images.
具体地,所述随机扰动处理包括高斯模糊处理、高斯噪声处理、图片的小幅度旋转处理、以及图片的对比度处理和颜色变化处理等。其中,对图片进行高斯模糊处理指的是对图片进行一定均值和方差的高斯滤波;对图片进行高斯噪声处理指的是在图片的三个颜色通道上添加高斯噪点,与高斯模糊不同的是,这是直接在值上的叠加,而高斯模糊是对图片进行滤波;对图片小幅度旋转处理指的是根据字段框确定要旋转的中心点,也可以直接取图片的中心作为旋转的中心点,这可以根据实际的业务需要进行调整,然后按中心点旋转一个角度;图片的对比度处理指的是对图片在HSV色彩空间上的S(Saturation饱和度)和V(Value明度)进行随机变化;图片的颜色变化处理指的是对图片在HSV色彩空间上的H(Hue色调)进行随机变化。Specifically, the random disturbance processing includes Gaussian blur processing, Gaussian noise processing, small-scale rotation processing of the picture, and contrast processing and color change processing of the picture. Among them, the Gaussian blur processing of the picture refers to the Gaussian filtering of the picture with a certain mean and variance; the Gaussian noise processing of the picture refers to adding Gaussian noise to the three color channels of the picture. Unlike Gaussian blur, This is directly superimposed on the value, and Gaussian blur is to filter the picture; small-scale rotation of the picture refers to determining the center point to be rotated according to the field frame, or you can directly take the center of the picture as the center point of rotation, This can be adjusted according to the actual business needs, and then rotated by an angle according to the center point; the contrast processing of the picture refers to the random change of the S (Saturation) and V (Value lightness) of the picture in the HSV color space; the picture The color change process refers to randomly changing the H (Hue hue) of the picture in the HSV color space.
In this embodiment, different types of character images can be obtained by applying at least one of the above perturbation methods to the synthesized images, for example rotated character images, noisy character images, and tilted character images. Perturbing the synthesized images increases the diversity of the character images and enriches the training sample data, so that the character recognition model trained on these samples can achieve higher recognition accuracy.
The generating module 103 is configured to input the character images of different types into a deep learning network for training to generate a character recognition model.
Specifically, before the character images of different types are input into the deep learning network, each character image is preprocessed to convert it into a required feature vector, and the feature vector is then input into the deep learning network for training.
In this embodiment, the deep learning network is preferably a CRNN model, which is a joint model of a convolutional neural network and a recurrent neural network. The CRNN model is trainable end to end and has the following advantages: 1) the input data may be of any length (arbitrary image width, arbitrary word length); 2) the training set requires no character-level annotation; 3) lexicons (samples) with or without a dictionary can both be used; 4) it performs well while remaining small (few parameters).
In a specific implementation, the CRNN model includes one VGG16 stage, two long short-term memory (LSTM) layers, and two fully connected (FC) layers. The VGG16 stage consists of 13 convolutional layers and 3 fully connected layers and extracts the spatial features of the character images; the two LSTM layers extract the temporal features of the character images to capture the context of the text to be recognized; and the two FC layers classify the extracted spatial and temporal features. Compared with the existing CRNN model, the CRNN model in this embodiment adds one fully connected FC layer to accelerate training convergence.
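An illustrative shape trace of such a pipeline is given below (the downsampling factors, channel count, and hidden size are assumptions for the sketch, not the patent's exact configuration):

```python
def crnn_shapes(height, width, n_classes, channels=512, hidden=256):
    """Trace tensor shapes through a CRNN-style pipeline: the VGG16-like
    convolutional stack collapses the image into a feature map, each
    feature-map column becomes one time step for the two LSTM layers,
    and the FC layers classify every time step."""
    fh, fw = height // 16, width // 4       # assumed conv downsampling
    seq_len = fw                            # one time step per column
    feat = fh * channels                    # features per time step
    return {
        "conv_out": (channels, fh, fw),
        "lstm_in": (seq_len, feat),
        "lstm_out": (seq_len, 2 * hidden),  # assuming bidirectional LSTMs
        "logits": (seq_len, n_classes),
    }
```

Note how the sequence length depends only on the image width, which is why input images of arbitrary width can be handled.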
The output module 104 is configured to input a character image to be recognized into the character recognition model and output the recognition result of the character image to be recognized.
In this embodiment, when a user needs to recognize characters, the user only needs to collect a character image of the characters to be recognized and input it into the character recognition model, which then recognizes the characters corresponding to the image. The character recognition model may be stored in a local character recognition terminal or on a server, as selected according to the user's actual needs, which is not limited in this embodiment.
Through the above program modules 101-104, the character recognition system 100 proposed in this application acquires character data and synthesizes each acquired character datum with a preset background picture to obtain the corresponding character image; performs random perturbation processing on the synthesized character images to obtain character images of different types; inputs the different types of character images into a deep learning network for training to generate a character recognition model; and inputs a character image to be recognized into the character recognition model and outputs its recognition result. In this way, diverse training sample data can be generated as needed, solving the prior-art problems of a narrow character recognition range and low accuracy caused by the uneven distribution of real training data, thereby enlarging the recognition range and improving recognition accuracy.
Referring to FIG. 3, which is a program module diagram of a second embodiment of the character recognition system 100 of the present application. In this embodiment, the character recognition system 100 includes a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the character recognition operations of the embodiments of the present application can be implemented. In some embodiments, the character recognition system 100 may be divided into one or more modules based on the specific operations implemented by the respective parts of the computer program instructions. For example, in FIG. 3, the character recognition system 100 may be divided into an acquisition module 101, a processing module 102, a generation module 103, an output module 104, a test module 105, and an adjustment module 106. The program modules 101-104 are the same as in the first embodiment of the character recognition system 100 of the present application, with the test module 105 and the adjustment module 106 added on that basis. Specifically:
The acquisition module 101 is configured to acquire character data and synthesize each acquired character datum with a preset background picture to obtain the corresponding character image.
Specifically, the character data may be English letters, symbols, digits, Chinese characters, and the like; in this embodiment, the character data includes at least one character. The character data may be crawled from the network and stored in a preset file, from which it is obtained directly when needed; it may also be character data provided by a business party and likewise stored in a preset file for direct retrieval. Preferably, the preset file is a TXT file. Those skilled in the art may obtain the character data in any manner, which is not repeated here.
The preset background picture is a picture determined by the user according to actual needs. In this embodiment, the preset background picture is preferably one or more pictures crawled from the Internet using the keyword "paper"; of course, the pictures may also be obtained by the user photographing various sheets of paper with a camera. It can be understood that, in other embodiments of the present application, the preset background picture may also be a picture of another style, such as a picture of a license plate number or of an identity card.
For example, when the acquired character data comprises 5 character data and there are 4 preset background pictures, each character datum may preferably be synthesized with each background picture, so that each character datum yields 4 character images and the 5 character data yield 20 character images. Of course, it is not necessary for every character datum to be synthesized with every background picture, which is not limited in this embodiment. In this embodiment, synthesizing the character data with multiple background pictures increases the diversity of the character images.
In this embodiment, any existing image synthesis technique may be used. For example, when performing image synthesis, the length and width of the pixel space occupied by the character data may first be determined from the length of the character data, its style, and its font size; a corresponding pixel region is then selected from the pixels of the background picture so that the pixels corresponding to the character data can be inserted into that region, replacing the pixels originally located there. It can be understood that, in other embodiments, pixel superposition may be used instead of pixel replacement: each pixel corresponding to the character data is superimposed on the corresponding pixel in the region, and the superimposed value is taken as the pixel value of each pixel in that region.
The processing module 102 is configured to perform random perturbation processing on the synthesized character images to obtain character images of different types.
Specifically, the random perturbation processing includes Gaussian blur processing, Gaussian noise processing, small-angle rotation of the picture, and contrast processing and color change processing of the picture. Gaussian blur processing refers to filtering the picture with a Gaussian kernel of a given mean and variance. Gaussian noise processing refers to adding Gaussian noise to the three color channels of the picture; unlike Gaussian blur, this is a direct superposition on the pixel values, whereas Gaussian blur filters the picture. Small-angle rotation refers to determining the center point of rotation from the field box, or taking the center of the picture directly as the center of rotation, which may be adjusted according to actual business needs, and then rotating the picture by an angle about that center. Contrast processing refers to randomly varying the S (saturation) and V (value, i.e. lightness) of the picture in the HSV color space; color change processing refers to randomly varying the H (hue) of the picture in the HSV color space.
In this embodiment, different types of character images can be obtained by applying at least one of the above perturbation methods to the synthesized images, for example rotated character images, noisy character images, and tilted character images. Perturbing the synthesized images increases the diversity of the character images and enriches the training sample data, so that the character recognition model trained on these samples can achieve higher recognition accuracy.
The generating module 103 is configured to input the character images of different types into a deep learning network for training to generate a character recognition model.
Specifically, before the character images of different types are input into the deep learning network, each character image is preprocessed to convert it into a required feature vector, and the feature vector is then input into the deep learning network for training.
In this embodiment, the deep learning network is preferably a CRNN model, which is a joint model of a convolutional neural network and a recurrent neural network. The CRNN model is trainable end to end and has the following advantages: 1) the input data may be of any length (arbitrary image width, arbitrary word length); 2) the training set requires no character-level annotation; 3) lexicons (samples) with or without a dictionary can both be used; 4) it performs well while remaining small (few parameters).
In a specific implementation, the CRNN model includes one VGG16 stage, two long short-term memory (LSTM) layers, and two fully connected (FC) layers. The VGG16 stage consists of 13 convolutional layers and 3 fully connected layers and extracts the spatial features of the character images; the two LSTM layers extract the temporal features of the character images to capture the context of the text to be recognized; and the two FC layers classify the extracted spatial and temporal features. Compared with the existing CRNN model, the CRNN model in this embodiment adds one fully connected FC layer to accelerate training convergence.
The output module 104 is configured to input a character image to be recognized into the character recognition model and output the recognition result of the character image to be recognized.
In this embodiment, when a user needs to recognize characters, the user only needs to collect a character image of the characters to be recognized and input it into the character recognition model, which then recognizes the characters corresponding to the image. The character recognition model may be stored in a local character recognition terminal or on a server, as selected according to the user's actual needs.
The test module 105 is configured to test the character recognition accuracy of the character recognition model.
Specifically, after the character recognition model is generated, its recognition accuracy on real character image data needs to be tested.
In one embodiment, the user inputs character images of several real characters into the character recognition model, which outputs the recognition results corresponding to those characters; the character recognition accuracy is then calculated from the output results. It can be understood that, to obtain an accurate measurement of the recognition rate, the amount of real character data input into the model should be as large as possible.
When calculating the recognition accuracy, the recognition result output by the character recognition model is compared with the pre-stored character data to determine whether the model recognized each character correctly. If a character datum is recognized correctly, a counter is incremented by 1; after all characters have been recognized, the accumulated count is divided by the number of characters input into the model to obtain the model's recognition accuracy on real character image data.
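The counting procedure above amounts to the following (a straightforward sketch; the function and variable names are ours):

```python
def recognition_accuracy(predictions, ground_truth):
    """Compare each model output with the stored character data,
    incrementing a counter for every correct recognition, then divide
    the accumulated count by the number of characters tested."""
    if len(predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("predictions and ground truth must align and be non-empty")
    correct = sum(1 for p, t in zip(predictions, ground_truth) if p == t)
    return correct / len(ground_truth)
```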
The adjustment module 106 is configured to adjust the character recognition model if the recognition accuracy is lower than a preset threshold.
Specifically, after the character recognition accuracy of the model is obtained, it is compared with a preset threshold; if the accuracy is lower than the threshold, the character recognition model is adjusted. In this embodiment, the preset threshold is a preset minimum character recognition accuracy, for example 90%. The threshold may be set according to the user's actual needs, and the threshold thus set may be further modified as those needs change.
It should be noted that, when the character recognition model is adjusted in this embodiment, it is merely fine-tuned; no substantial adjustment is required.
Specifically, the step of adjusting the character recognition model includes:
Step A: freeze the parameters of the VGG16 stage.
In this embodiment, when the character recognition model is adjusted, the parameters of the VGG16 stage are left unchanged, i.e. frozen, to prevent them from being updated under the stimulus of the training sample data during the adjustment.
Step B: adjust the parameters of the two LSTM layers and the two fully connected FC layers.
In this embodiment, when the character recognition model is adjusted, the parameters of the two LSTM layers and the two fully connected FC layers are adjusted. Specifically, the parameters of these layers are unfrozen, and the learning rate is set to decay every several epochs until it reaches a boundary value.
Step C: train the adjusted character recognition model with real character image data.
In this embodiment, while the parameters of the two LSTM layers and the two fully connected FC layers are being adjusted, real character images are input into the model with the adjusted parameters for further training, yielding the adjusted character recognition model. The test module 105 then tests the recognition accuracy of the adjusted model: if the test result meets the requirement, training of the character recognition model is complete; if the test result still does not meet the requirement, steps A to C are repeated until the recognition accuracy of the resulting model meets the requirement.
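Steps A and B above can be sketched as follows (illustrative only; the layer-name prefix, decay factor, step interval, and boundary value are assumptions):

```python
def freeze_plan(layer_names):
    """Step A/B: freeze every VGG16 parameter and leave the LSTM and
    FC parameters trainable (True = trainable)."""
    return {name: not name.startswith("vgg") for name in layer_names}

def decayed_lr(base_lr, epoch, step_every=3, gamma=0.5, floor=1e-5):
    """Step B: cut the learning rate every `step_every` epochs until it
    reaches the boundary value `floor`."""
    return max(base_lr * gamma ** (epoch // step_every), floor)
```

In a framework such as PyTorch the same plan would be realized by setting `requires_grad` on the backbone parameters and using a step-decay scheduler; the sketch keeps only the arithmetic.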
Through the above program modules 101-106, the character recognition system 100 proposed in this application acquires character data and synthesizes each acquired character datum with a preset background picture to obtain the corresponding character image; performs random perturbation processing on the synthesized character images to obtain character images of different types; inputs the different types of character images into a deep learning network for training to generate a character recognition model; inputs a character image to be recognized into the character recognition model and outputs its recognition result; tests the character recognition accuracy of the model; and adjusts the model if the accuracy is lower than a preset threshold. In this way, the character recognition model is fine-tuned whenever it fails to reach the preset recognition accuracy, thereby improving the accuracy of character recognition.
In addition, the present application also proposes a character recognition method.
Referring to FIG. 4, which is a schematic flowchart of a first embodiment of the character recognition method of the present application. In this embodiment, depending on different requirements, the execution order of the steps in the flowchart shown in FIG. 4 may be changed, and some steps may be omitted.
Step S500: acquire character data, and synthesize each acquired character datum with a preset background picture to obtain the corresponding character image.
Specifically, the character data may be English letters, symbols, digits, Chinese characters, and the like; in this embodiment, the character data includes at least one character. The character data may be crawled from the network and stored in a preset file, from which it is obtained directly when needed; it may also be character data provided by a business party and likewise stored in a preset file for direct retrieval. Preferably, the preset file is a TXT file. Those skilled in the art may obtain the character data in any manner, which is not repeated here.
The preset background picture is a picture determined by the user according to actual needs. In this embodiment, the preset background picture is preferably one or more pictures crawled from the Internet using the keyword "paper"; of course, the pictures may also be obtained by the user photographing various sheets of paper with a camera. It can be understood that, in other embodiments of the present application, the preset background picture may also be a picture of another style, such as a picture of a license plate number or of an identity card.
For example, when the acquired character data comprises 5 character data and there are 4 preset background pictures, each character datum may preferably be synthesized with each background picture, so that each character datum yields 4 character images and the 5 character data yield 20 character images. Of course, it is not necessary for every character datum to be synthesized with every background picture, which is not limited in this embodiment. In this embodiment, synthesizing the character data with multiple background pictures increases the diversity of the character images.
In this embodiment, any existing image synthesis technique may be used. For example, when performing image synthesis, the length and width of the pixel space occupied by the character data may first be determined from the length of the character data, its style, and its font size; a corresponding pixel region is then selected from the pixels of the background picture so that the pixels corresponding to the character data can be inserted into that region, replacing the pixels originally located there. It can be understood that, in other embodiments, pixel superposition may be used instead of pixel replacement: each pixel corresponding to the character data is superimposed on the corresponding pixel in the region, and the superimposed value is taken as the pixel value of each pixel in that region.
Step S502: perform random perturbation processing on the synthesized character images to obtain character images of different types.
Specifically, the random perturbation processing includes Gaussian blur processing, Gaussian noise processing, small-angle rotation of the picture, and contrast processing and color change processing of the picture. Gaussian blur processing refers to filtering the picture with a Gaussian kernel of a given mean and variance. Gaussian noise processing refers to adding Gaussian noise to the three color channels of the picture; unlike Gaussian blur, this is a direct superposition on the pixel values, whereas Gaussian blur filters the picture. Small-angle rotation refers to determining the center point of rotation from the field box, or taking the center of the picture directly as the center of rotation, which may be adjusted according to actual business needs, and then rotating the picture by an angle about that center. Contrast processing refers to randomly varying the S (saturation) and V (value, i.e. lightness) of the picture in the HSV color space; color change processing refers to randomly varying the H (hue) of the picture in the HSV color space.
In this embodiment, different types of character images can be obtained by applying at least one of the above perturbation methods to the synthesized images, for example rotated character images, noisy character images, and tilted character images. Perturbing the synthesized images increases the diversity of the character images and enriches the training sample data, so that the character recognition model trained on these samples can achieve higher recognition accuracy.
Step S504: input the character images of different types into a deep learning network for training to generate a character recognition model.
Specifically, before the character images of different types are input into the deep learning network, each character image is preprocessed to convert it into a required feature vector, and the feature vector is then input into the deep learning network for training.
In this embodiment, the deep learning network is preferably a CRNN model, which is a joint model of a convolutional neural network and a recurrent neural network. The CRNN model is trainable end to end and has the following advantages: 1) the input data may be of any length (arbitrary image width, arbitrary word length); 2) the training set requires no character-level annotation; 3) lexicons (samples) with or without a dictionary can both be used; 4) it performs well while remaining small (few parameters).
In a specific implementation, the CRNN model includes one VGG16 stage, two long short-term memory (LSTM) layers, and two fully connected (FC) layers. The VGG16 stage consists of 13 convolutional layers and 3 fully connected layers and extracts the spatial features of the character images; the two LSTM layers extract the temporal features of the character images to capture the context of the text to be recognized; and the two FC layers classify the extracted spatial and temporal features. Compared with the existing CRNN model, the CRNN model in this embodiment adds one fully connected FC layer to accelerate training convergence.
Step S506: input the character image to be recognized into the character recognition model, and output the recognition result of the character image to be recognized.
In this embodiment, when a user needs to recognize a character, the user only needs to capture an image of the character to be recognized and input it into the character recognition model, which then recognizes the character corresponding to the image. The character recognition model may be stored on a local character recognition terminal or on a server, as selected according to the user's actual needs; this is not limited in this embodiment.
Through the above steps S500-S506, the character recognition method proposed in this application acquires character data and composites each item of character data with a preset background picture to obtain a corresponding character image; applies random perturbation processing to the composited character images to obtain character images of different types; inputs the different types of character images into a deep learning network for training to generate a character recognition model; and inputs a character image to be recognized into the character recognition model, outputting its recognition result. In this way, diverse training sample data can be generated on demand, which addresses the narrow recognition range and low accuracy caused by the uneven distribution of real training data in the prior art, thereby broadening the character recognition range and improving recognition accuracy.
Referring to FIG. 5, it is a schematic flowchart of the second embodiment of the character recognition method of the present application. In this embodiment, depending on requirements, the execution order of the steps in the flowchart shown in FIG. 5 may be changed, and some steps may be omitted.
Step S600: acquire character data, and composite each item of acquired character data with a preset background picture to obtain a character image corresponding to each item of character data.
Step S602: apply random perturbation processing to the composited character images to obtain character images of different types.
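The application elsewhere enumerates Gaussian blur, Gaussian noise, small-angle rotation, contrast variation, and colour variation as candidate perturbations. A minimal NumPy sketch of two of them is given below; the branch probability and parameter ranges are assumptions for illustration only:

```python
import numpy as np

def perturb(img, rng):
    """Apply one randomly chosen perturbation to a uint8 grayscale image.

    Only Gaussian noise and contrast variation are sketched here; Gaussian
    blur, small-angle rotation and colour change would be added in the same
    style. The noise level and contrast range are illustrative assumptions.
    """
    out = img.astype(np.float64)
    if rng.random() < 0.5:                              # Gaussian noise
        out += rng.normal(0.0, 8.0, size=out.shape)
    else:                                               # contrast variation
        alpha = rng.uniform(0.7, 1.3)
        mean = out.mean()
        out = mean + alpha * (out - mean)
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = np.tile(np.arange(100, dtype=np.uint8), (32, 1))  # synthetic 32x100 strip
variants = [perturb(img, rng) for _ in range(4)]        # "different types" of images
```

Each call yields a differently perturbed copy of the composited image, which is how one synthetic sample is turned into several training samples.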
Step S604: input the character images of different types into the deep learning network for training to generate a character recognition model.
Step S606: input the character image to be recognized into the character recognition model, and output the recognition result of the character image to be recognized.
Steps S600-S606 above are similar to steps S500-S506 and are not repeated in this embodiment.
Step S608: test the character recognition accuracy of the character recognition model.
Specifically, after the character recognition model is generated, its recognition accuracy on real character image data needs to be tested.
In one embodiment, the user inputs character images of a number of real characters into the character recognition model, obtains the recognition results for those characters, and then computes the character recognition accuracy from the output. Understandably, to obtain an accurate measurement of the recognition rate, the amount of real character data input into the model should be as large as possible.
When computing the recognition accuracy, the recognition result output by the model can be compared with the pre-stored character data to determine whether each character was recognized correctly. If a given character is recognized correctly, a counter is incremented by 1; after all characters have been processed, the accumulated count is divided by the number of characters input into the model to obtain the model's recognition accuracy on real character image data.
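The counting procedure above reduces to a few lines of code; a minimal sketch, where the sample characters are made up for illustration:

```python
def recognition_accuracy(predictions, ground_truth):
    """Accuracy as described: count correct recognitions, divide by the total."""
    assert predictions and len(predictions) == len(ground_truth)
    correct = 0
    for predicted, stored in zip(predictions, ground_truth):
        if predicted == stored:   # compare the output with the pre-stored character
            correct += 1          # increment the counter by 1
    return correct / len(predictions)

# Toy evaluation set: three of five characters recognized correctly.
predicted = ["平", "安", "技", "O", "1"]
stored    = ["平", "安", "科", "0", "1"]
acc = recognition_accuracy(predicted, stored)
print(acc)  # 0.6
```

Note the "O" versus "0" mismatch: confusions between visually similar characters are exactly what a large real test set is meant to expose.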
Step S610: if the recognition accuracy is lower than a preset threshold, adjust the character recognition model.
Specifically, after the character recognition accuracy of the model is obtained, it is compared with a preset threshold; if the accuracy is lower than the threshold, the character recognition model is adjusted. In this embodiment, the preset threshold is a pre-set minimum acceptable recognition accuracy, for example 90%. The threshold may be set according to the user's actual needs and may be further modified afterwards as requirements change.
It should be noted that when the character recognition model is adjusted in this embodiment, only fine-tuning is performed; no substantial changes are required.
Specifically, the step of adjusting the character recognition model includes:
Step A: freeze the parameters of the VGG16 layer.
In this embodiment, when the character recognition model is adjusted, the parameters of the VGG16 layer are left unchanged, that is, they are frozen, so that the stimulus of the training sample data does not cause them to be updated during the adjustment.
Step B: adjust the parameters of the two long short-term memory (LSTM) layers and the two fully connected (FC) layers.
In this embodiment, when the character recognition model is adjusted, the parameters of the two LSTM layers and the two FC layers are updated. Specifically, the parameters of these layers are unfrozen, and the learning rate is set to decay every few epochs until it reaches a boundary value.
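A sketch of such a stepped schedule; the decay interval, decay factor, and lower bound are illustrative assumptions, since the application only states that the rate decays every few epochs down to a boundary value:

```python
def stepped_lr(initial_lr, epoch, decay_every=10, gamma=0.5, floor=1e-5):
    """Learning rate decayed by `gamma` every `decay_every` epochs,
    clamped at a boundary value `floor` (all three values are assumptions)."""
    lr = initial_lr * gamma ** (epoch // decay_every)
    return max(lr, floor)

# First six decade marks of the schedule starting from 1e-3.
schedule = [stepped_lr(1e-3, e) for e in range(0, 60, 10)]
print(schedule)  # [0.001, 0.0005, 0.00025, 0.000125, 6.25e-05, 3.125e-05]
```

Keeping the VGG16 layer frozen while the LSTM/FC layers train at this small, decaying rate is what makes the adjustment a fine-tune rather than a full retrain.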
Step C: train the adjusted character recognition model with real character image data.
In this embodiment, while the parameters of the two LSTM layers and the two FC layers are being adjusted, real character images are fed into the model with the adjusted parameters for further training, yielding the adjusted character recognition model. After the adjusted model is obtained, the test module 105 tests its recognition accuracy; if the test result meets the requirement, training of the character recognition model is complete. If the test result obtained by the test module 105 still fails to meet the requirement, steps A to C are repeated until the recognition accuracy of the resulting character recognition model reaches the target.
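The retrain-and-retest loop can be sketched as follows; `evaluate` stands in for the test performed by test module 105, and the `max_rounds` safety cap is an assumption added here (the embodiment simply repeats steps A to C until the requirement is met):

```python
def finetune_until(evaluate, adjust, threshold=0.90, max_rounds=5):
    """Repeat the adjustment (steps A-C) until accuracy reaches `threshold`."""
    for _ in range(max_rounds):
        adjust()                      # freeze VGG16, tune LSTM/FC, retrain on real data
        if evaluate() >= threshold:
            return True               # training of the model is complete
    return False                      # cap reached (assumption, not in the text)

# Toy demo: accuracy improves from 0.85 to 0.93 over two adjustment rounds.
history = iter([0.85, 0.93])
rounds = {"n": 0}
done = finetune_until(lambda: next(history),
                      lambda: rounds.update(n=rounds["n"] + 1))
print(done, rounds["n"])  # True 2
```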
Through the above steps S600-S610, the character recognition method proposed in this application acquires character data and composites each item of character data with a preset background picture to obtain a corresponding character image; applies random perturbation processing to the composited character images to obtain character images of different types; inputs the different types of character images into a deep learning network for training to generate a character recognition model; inputs a character image to be recognized into the character recognition model and outputs its recognition result; tests the character recognition accuracy of the model; and, if the accuracy is below a preset threshold, adjusts the model. By fine-tuning the character recognition model whenever it falls short of the preset recognition accuracy, the accuracy of character recognition is improved.
The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or, of course, by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a server (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application thereof in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A character recognition method applied to a server, wherein the method comprises:
    acquiring character data, and compositing each item of acquired character data with a preset background picture to obtain a character image corresponding to each item of character data;
    applying random perturbation processing to the composited character images to obtain character images of different types;
    inputting the character images of different types into a deep learning network for training to generate a character recognition model; and
    inputting a character image to be recognized into the character recognition model, and outputting a recognition result of the character image to be recognized.
  2. The character recognition method according to claim 1, wherein the deep learning network is a CRNN model, and the CRNN model comprises one VGG16 layer, two long short-term memory (LSTM) layers, and two fully connected (FC) layers, wherein the VGG16 layer is configured to extract spatial features of a character image, the two LSTM layers are configured to extract temporal features of the character image, and the two FC layers are configured to classify the extracted spatial and temporal features.
  3. The character recognition method according to claim 2, wherein after the step of inputting the character images of different types into the deep learning network for training to generate the character recognition model, the method further comprises:
    testing the character recognition accuracy of the character recognition model; and
    if the recognition accuracy is lower than a preset threshold, adjusting the character recognition model.
  4. The character recognition method according to claim 3, wherein the step of adjusting the character recognition model comprises:
    freezing the parameters of the VGG16 layer;
    adjusting the parameters of the two LSTM layers and the two fully connected FC layers; and
    training the adjusted character recognition model with real character image data.
  5. The character recognition method according to claim 1, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  6. The character recognition method according to claim 2, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  7. The character recognition method according to claim 3, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  8. The character recognition method according to claim 4, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  9. A server, wherein the server comprises a memory and a processor, the memory stores a character recognition system runnable on the processor, and the character recognition system, when executed by the processor, implements the following steps:
    acquiring character data, and compositing each item of acquired character data with a preset background picture to obtain a character image corresponding to each item of character data;
    applying random perturbation processing to the composited character images to obtain character images of different types;
    inputting the character images of different types into a deep learning network for training to generate a character recognition model; and
    inputting a character image to be recognized into the character recognition model, and outputting a recognition result of the character image to be recognized.
  10. The server according to claim 9, wherein the deep learning network is a CRNN model, and the CRNN model comprises one VGG16 layer, two long short-term memory (LSTM) layers, and two fully connected (FC) layers, wherein the VGG16 layer is configured to extract spatial features of a character image, the two LSTM layers are configured to extract temporal features of the character image, and the two FC layers are configured to classify the extracted spatial and temporal features.
  11. The server according to claim 10, wherein the character recognition system, when executed by the processor, further implements the following steps:
    testing the character recognition accuracy of the character recognition model; and
    if the recognition accuracy is lower than a preset threshold, adjusting the character recognition model.
  12. The server according to claim 11, wherein the step of adjusting the character recognition model comprises:
    freezing the parameters of the VGG16 layer;
    adjusting the parameters of the two LSTM layers and the two fully connected FC layers; and
    training the adjusted character recognition model with real character image data.
  13. The server according to claim 9, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  14. The server according to claim 10, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  15. The server according to claim 11, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  16. The server according to claim 12, wherein the random perturbation processing comprises at least one of: Gaussian blur processing, Gaussian noise processing, small-angle rotation processing of the picture, contrast variation processing of the picture, and color variation processing of the picture.
  17. A computer-readable storage medium storing a character recognition system, wherein the character recognition system is executable by at least one processor to cause the at least one processor to perform the following steps:
    acquiring character data, and compositing each item of acquired character data with a preset background picture to obtain a character image corresponding to each item of character data;
    applying random perturbation processing to the composited character images to obtain character images of different types;
    inputting the character images of different types into a deep learning network for training to generate a character recognition model; and
    inputting a character image to be recognized into the character recognition model, and outputting a recognition result of the character image to be recognized.
  18. The computer-readable storage medium according to claim 17, wherein the deep learning network is a CRNN model, and the CRNN model comprises one VGG16 layer, two long short-term memory (LSTM) layers, and two fully connected (FC) layers, wherein the VGG16 layer is configured to extract spatial features of a character image, the two LSTM layers are configured to extract temporal features of the character image, and the two FC layers are configured to classify the extracted spatial and temporal features.
  19. The computer-readable storage medium according to claim 18, wherein after the step of inputting the character images of different types into the deep learning network for training to generate the character recognition model, the steps further comprise:
    testing the character recognition accuracy of the character recognition model; and
    if the recognition accuracy is lower than a preset threshold, adjusting the character recognition model.
  20. The computer-readable storage medium according to claim 19, wherein the step of adjusting the character recognition model comprises:
    freezing the parameters of the VGG16 layer;
    adjusting the parameters of the two LSTM layers and the two fully connected FC layers; and
    training the adjusted character recognition model with real character image data.
PCT/CN2019/088638 2018-11-12 2019-05-27 Character recognition method, server, and computer readable storage medium WO2020098250A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811341729.X 2018-11-12
CN201811341729.XA CN109685100A (en) 2018-11-12 2018-11-12 Character identifying method, server and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2020098250A1 true WO2020098250A1 (en) 2020-05-22

Family

ID=66185317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088638 WO2020098250A1 (en) 2018-11-12 2019-05-27 Character recognition method, server, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN109685100A (en)
WO (1) WO2020098250A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287934A (en) * 2020-08-12 2021-01-29 北京京东尚科信息技术有限公司 Method and device for recognizing characters and obtaining character image feature extraction model
CN112287936A (en) * 2020-09-24 2021-01-29 深圳市智影医疗科技有限公司 Optical character recognition test method and device, readable storage medium and terminal equipment
CN112613572A (en) * 2020-12-30 2021-04-06 北京奇艺世纪科技有限公司 Sample data obtaining method and device, electronic equipment and storage medium
CN112906693A (en) * 2021-03-05 2021-06-04 杭州费尔斯通科技有限公司 Method for identifying subscript character and subscript character
CN113971806A (en) * 2021-10-26 2022-01-25 北京百度网讯科技有限公司 Model training method, character recognition method, device, equipment and storage medium
CN114495106A (en) * 2022-04-18 2022-05-13 电子科技大学 MOCR (metal-oxide-semiconductor resistor) deep learning method applied to DFB (distributed feedback) laser chip
CN114758339A (en) * 2022-06-15 2022-07-15 深圳思谋信息科技有限公司 Method and device for acquiring character recognition model, computer equipment and storage medium

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685100A (en) * 2018-11-12 2019-04-26 平安科技(深圳)有限公司 Character identifying method, server and computer readable storage medium
CN110135413B (en) * 2019-05-08 2021-08-17 达闼机器人有限公司 Method for generating character recognition image, electronic equipment and readable storage medium
CN110222693B (en) * 2019-06-03 2022-03-08 第四范式(北京)技术有限公司 Method and device for constructing character recognition model and recognizing characters
CN110348436A (en) * 2019-06-19 2019-10-18 平安普惠企业管理有限公司 Text information in image is carried out to know method for distinguishing and relevant device
CN110458184B (en) * 2019-06-26 2023-06-30 平安科技(深圳)有限公司 Optical character recognition assistance method, device, computer equipment and storage medium
CN110414520A (en) * 2019-06-28 2019-11-05 平安科技(深圳)有限公司 Universal character recognition methods, device, computer equipment and storage medium
CN110363290B (en) * 2019-07-19 2023-07-25 广东工业大学 Image recognition method, device and equipment based on hybrid neural network model
CN112287932A (en) * 2019-07-23 2021-01-29 上海高德威智能交通系统有限公司 Method, device and equipment for determining image quality and storage medium
CN110765442A (en) * 2019-09-30 2020-02-07 奇安信科技集团股份有限公司 Method and device for identifying verification code in verification picture and electronic equipment
US10990876B1 (en) 2019-10-08 2021-04-27 UiPath, Inc. Detecting user interface elements in robotic process automation using convolutional neural networks
US11157783B2 (en) 2019-12-02 2021-10-26 UiPath, Inc. Training optical character detection and recognition models for robotic process automation
CN113221601A (en) 2020-01-21 2021-08-06 深圳富泰宏精密工业有限公司 Character recognition method, device and computer readable storage medium
CN113378118B (en) * 2020-03-10 2023-08-22 百度在线网络技术(北京)有限公司 Method, apparatus, electronic device and computer storage medium for processing image data
CN111414844B (en) * 2020-03-17 2023-08-29 北京航天自动控制研究所 Container number identification method based on convolutional neural network
CN112052852B (en) * 2020-09-09 2023-12-29 国家气象信息中心 Character recognition method of handwriting meteorological archive data based on deep learning
CN112215221A (en) * 2020-09-22 2021-01-12 国交空间信息技术(北京)有限公司 Automatic vehicle frame number identification method
CN113012265A (en) * 2021-04-22 2021-06-22 中国平安人寿保险股份有限公司 Needle printing character image generation method and device, computer equipment and medium
CN113239854B (en) * 2021-05-27 2023-12-19 北京环境特性研究所 Ship identity recognition method and system based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100054599A1 (en) * 2008-08-26 2010-03-04 Fuji Xerox Co., Ltd. Document processing apparatus, document processing method, and computer readable medium
CN107392221A (en) * 2017-06-05 2017-11-24 天方创新(北京)信息技术有限公司 The method and device of the training method of disaggregated model, OCR recognition results of classifying
CN108564103A (en) * 2018-01-09 2018-09-21 众安信息技术服务有限公司 Data processing method and device
CN108596180A (en) * 2018-04-09 2018-09-28 深圳市腾讯网络信息技术有限公司 Parameter identification, the training method of parameter identification model and device in image
CN109685100A (en) * 2018-11-12 2019-04-26 平安科技(深圳)有限公司 Character identifying method, server and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4189506B2 (en) * 2000-06-09 2008-12-03 コニカミノルタビジネステクノロジーズ株式会社 Apparatus, method and recording medium for image processing
JP6078953B2 (en) * 2012-02-17 2017-02-15 オムロン株式会社 Character recognition method, and character recognition apparatus and program using this method
CN107273896A (en) * 2017-06-15 2017-10-20 浙江南自智能科技股份有限公司 A kind of car plate detection recognition methods based on image recognition
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium


Also Published As

Publication number Publication date
CN109685100A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
WO2020098250A1 (en) Character recognition method, server, and computer readable storage medium
CN107977633B (en) Age recognition method, device, and storage medium for facial images
WO2022161286A1 (en) Image detection method, model training method, device, medium, and program product
CN108229591B (en) Neural network adaptive training method and apparatus, device, program, and storage medium
CN106778928B (en) Image processing method and device
CN112052781A (en) Feature extraction model training method, face recognition device, face recognition equipment and medium
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN109871845B (en) Certificate image extraction method and terminal equipment
CN109413510B (en) Video abstract generation method and device, electronic equipment and computer storage medium
CN111046879B (en) Certificate image classification method, device, computer equipment and readable storage medium
Lu et al. Robust blur kernel estimation for license plate images from fast moving vehicles
CN112101359B (en) Text formula positioning method, model training method and related device
CN111079816A (en) Image auditing method and device and server
CN109377494A (en) Semantic segmentation method and apparatus for images
CN110874574A (en) Pedestrian re-identification method and device, computer equipment and readable storage medium
CN112633221A (en) Face direction detection method and related device
CN110826534A (en) Face key point detection method and system based on local principal component analysis
CN114663726A (en) Training method of target type detection model, target detection method and electronic equipment
CN110163910B (en) Object positioning method, device, computer equipment and storage medium
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN113838076A (en) Method and device for labeling object contour in target image and storage medium
CN113177543B (en) Certificate identification method, device, equipment and storage medium
CN113255700B (en) Image feature map processing method and device, storage medium and terminal
CN113158773B (en) Training method and training device for living body detection model
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19884229

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 20.08.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19884229

Country of ref document: EP

Kind code of ref document: A1