WO2023173546A1 - Method and apparatus for training text recognition model, and computer device and storage medium - Google Patents

Method and apparatus for training text recognition model, and computer device and storage medium

Info

Publication number
WO2023173546A1
WO2023173546A1 · PCT/CN2022/090160 · CN2022090160W
Authority
WO
WIPO (PCT)
Prior art keywords
training
image
text
neural network
images
Prior art date
Application number
PCT/CN2022/090160
Other languages
French (fr)
Chinese (zh)
Inventor
郑喜民
朱翌
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023173546A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • This application relates to the technical field of natural language processing of artificial intelligence technology. Specifically, this application relates to a training method, device, computer equipment and storage medium for a text recognition model.
  • Text recognition requires certain image processing to identify the text content in the image.
  • Text recognition can be used in many fields, such as sorting letters and packages, editing and proofreading manuscripts, summarizing and analyzing large numbers of statistical reports and cards, processing bank checks, statistically summarizing commodity invoices, identifying commodity codes, managing commodity warehouses, document retrieval, recognizing various certificates, and office automation for financial bill processing, making it convenient for users to enter information quickly and improving work efficiency in all walks of life.
  • CRNN Convolutional Recurrent Neural Network
  • This model first uses a Convolutional Neural Network (CNN) to extract a feature sequence from the input image, then uses a Recurrent Neural Network (RNN) to predict the label distribution of the feature sequence obtained from the convolutional layer, and finally introduces Connectionist Temporal Classification (CTC) to convert the label distribution obtained from the recurrent layer into the final recognition result through operations such as deduplication and integration (a minimal sketch of such a pipeline is given below).
  • CNN Convolutional Neural Networks
  • RNN Recurrent Neural Networks
  • CTC Connectionist Temporal Classification
  • The performance of the convolutional neural network, however, is highly dependent on the training data: the more diverse the training data and the larger its volume, the better the trained model tends to perform, whereas when the amount of training data is small, the recognition accuracy of the trained text recognition model is low.
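  • As an illustration only, a minimal CRNN-style pipeline of the kind described above might be sketched as follows (a sketch assuming PyTorch; the layer sizes and structure are assumptions, not the specific network of this application):

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Minimal CRNN sketch: CNN feature extractor -> BiLSTM -> per-timestep logits for CTC."""
    def __init__(self, num_classes: int, img_height: int = 32):
        super().__init__()
        self.cnn = nn.Sequential(                      # extracts a feature sequence from the image
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_height // 4                       # height after two 2x2 poolings
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)          # num_classes includes the CTC "blank" symbol

    def forward(self, x):                              # x: (batch, 1, H, W)
        f = self.cnn(x)                                # (batch, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h) # one feature vector per horizontal step
        seq, _ = self.rnn(f)                           # label distribution over time steps
        return self.fc(seq)                            # decoded downstream with CTC (deduplication/merging)

# CTC loss ties the per-timestep label distribution to the target text during training
ctc_loss = nn.CTCLoss(blank=0)
```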
  • the main purpose of this application is to provide a training method, device, computer equipment and storage medium for a text recognition model, so as to increase the amount of training data and thereby improve the recognition accuracy of the text recognition model.
  • this application provides a training method for a text recognition model, which includes:
  • This application also provides a training device for a text recognition model, which includes:
  • the acquisition module is used to acquire the first image containing text information
  • An amplification processing module used to perform random amplification processing on the first image to obtain multiple second images
  • a marking module used to mark the first image and the plurality of second images as reference images
  • a calculation module used to obtain the text features of the text information in each of the reference images, and calculate the similarity of the text features of each two of the reference images
  • An input module configured to use the two reference images with a similarity greater than a preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
  • a judgment module used to obtain the training results after the neural network model is trained, and judge whether the training results meet the requirements
  • a determination module configured to use the trained neural network model as a text recognition model when it is determined that the training results meet the requirements.
  • This application also provides a computer device, including a memory and a processor.
  • the memory stores a computer program.
  • When the processor executes the computer program, it implements a training method for a text recognition model, wherein the method includes the following steps:
  • This application also provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • When the computer program is executed by a processor, it implements a training method for a text recognition model, wherein the method includes the following steps:
  • the training method, device, computer equipment and storage medium of a text recognition model provided by this application can improve the recognition accuracy of the text recognition model.
  • Figure 1 is a schematic flowchart of a training method for a text recognition model according to an embodiment of the present application
  • Figure 2 is a schematic structural block diagram of a text recognition model training device according to an embodiment of the present application
  • FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • This application proposes a training method for a text recognition model.
  • the embodiments of this application can acquire and process relevant data based on artificial intelligence technology.
  • Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, etc.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the text recognition model training method proposed in this application uses a server as the execution subject.
  • The server can be an independent server, or it can be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.
  • CDN Content Delivery Network
  • the training method of the text recognition model includes:
  • the object recognized by the text recognition model of the present application is an image containing text information, and the text information in the image is recognized to realize the text recognition function of the image.
  • the first image with text information obtained in this embodiment can be an image uploaded by the user.
  • For example, the user can obtain it by scanning a paper document or other media containing text information, or it can be a screenshot of mobile phone screen content, etc.
  • After acquiring the first image containing text information, the first image can also be pre-processed, for example by adjusting image parameters such as its size, brightness and sharpness.
  • In addition, the first image is usually in color with multiple colors, and the characters of the text information are mostly colors with relatively dark brightness values.
  • To make it easier to extract each character of the text information, the first image can be binarized using a brightness value as the standard and converted into a black-and-white image, highlighting the text information and avoiding color interference in the first image.
  • Specifically, the server obtains the color brightness values in the first image and compares them with a preset color brightness value to obtain a comparison result, which indicates whether each color brightness value in the first image is greater than, equal to, or less than the preset value.
  • According to the comparison result, the parts of the first image whose color brightness value is greater than the preset color brightness value are converted to white and the rest to black, which makes it easier to extract each character of the text information in the first image.
  • the preset color brightness value can be adjusted as needed.
  • The server in this embodiment can also judge the background color of the first image and convert a first image with a black background and white text information into one with a white background and black text information; that is, an image with white text on a black background is converted into an image with black text on a white background.
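  • A minimal sketch of this brightness-based binarization and black-background inversion (assuming OpenCV and NumPy; the preset brightness value of 128 is only an illustrative choice):

```python
import cv2
import numpy as np

def binarize_for_text(img_bgr: np.ndarray, preset_brightness: int = 128) -> np.ndarray:
    """Convert a colour image to black text on a white background using a brightness threshold."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)            # brightness value per pixel
    # pixels brighter than the preset value become white (255), the rest black (0)
    _, bw = cv2.threshold(gray, preset_brightness, 255, cv2.THRESH_BINARY)
    # if the result is mostly black, the background is black (white text on black): invert it
    if np.mean(bw) < 127:
        bw = cv2.bitwise_not(bw)
    return bw
```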
  • Random amplification processing is a method of expanding data: it can increase the number of samples in the training set, effectively alleviate model overfitting, and give the model stronger generalization ability.
  • The purpose of random amplification processing is to make the training data as close as possible to the test data, thereby improving prediction accuracy.
  • random amplification processing can force the network to learn more robust features, thereby giving the model stronger generalization capabilities.
  • This embodiment performs random amplification processing on the first image, for example enlarging, reducing, cropping, adjusting brightness or adjusting saturation; a single random amplification method can be used, or multiple methods can be combined, finally yielding multiple second images.
  • The image amplification technology of this embodiment has a positive effect on target detection in deep learning: it can increase the amount of data in each category, keep the categories balanced, avoid the overfitting problems caused by sample imbalance, and also reduce, to a certain extent, the amount of data that must be collected in the early sampling stage.
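  • A minimal sketch of generating several second images from one first image by randomly combining such operations (assuming torchvision; the particular transforms and parameter ranges are illustrative assumptions):

```python
from PIL import Image
from torchvision import transforms

# each second image applies a random combination of cropping/scaling and brightness/saturation changes
augment = transforms.Compose([
    transforms.RandomApply([transforms.RandomResizedCrop(size=(32, 128), scale=(0.8, 1.0))], p=0.7),
    transforms.ColorJitter(brightness=0.3, saturation=0.3),
    transforms.RandomHorizontalFlip(p=0.2),
])

def make_second_images(first_image: Image.Image, n: int = 8) -> list:
    """Randomly amplify one first image into n second images."""
    return [augment(first_image) for _ in range(n)]
```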
  • This embodiment marks the first image and the multiple second images as reference images, generates a data set including all reference images, and then obtains the text features of the text information in each reference image from the data set and calculates the similarity of the text features of every two reference images.
  • Specifically, the text position information of the text information can be identified in the reference image, the reference image can be corrected according to the text position information to obtain a corrected reference image, and the encoding network of the recognition model can be used to perform feature extraction on the text information of the corrected reference image to obtain the text features.
  • Then, based on the word features contained in the text features of every two reference images, a vector space model for calculating the similarity between their text features is constructed; the word features of each reference image are represented as word vectors, the cosine of the angle between the word vectors of the two reference images is calculated according to the cosine distance algorithm, and this cosine value is taken as the similarity of the text features of the two reference images.
  • The text position information may be the position information of a text box containing the text information in the reference image. For example, a text area containing text information is identified in the reference image and the position information of that text area is used as the text position information; alternatively, a text area containing content is identified in the reference image, the position of the corresponding virtual text box within the whole reference image is calculated, and that position information is used as the text position information of the text information.
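  • Written out, the cosine similarity used here between two word vectors a and b is the standard formula:

```latex
\mathrm{sim}(\mathbf{a}, \mathbf{b}) = \cos\theta
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}} \; \sqrt{\sum_{i=1}^{n} b_i^{2}}}
```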
  • According to the calculated similarity of the text features of every two reference images, the two reference images whose similarity is greater than the preset similarity threshold are used as a reference image pair; the reference image pair is used as training data and input into the neural network model for training, so that the trained text recognition model can exploit the correlation between training data and improve the recognition accuracy of the text recognition model.
  • the preset similarity threshold can be customized, for example, set to 0.9.
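  • A minimal sketch of forming reference image pairs from the pairwise similarities (plain Python; the 0.9 threshold follows the example above and the data structures are assumptions for illustration):

```python
from itertools import combinations

def build_reference_pairs(text_features: dict, similarity, threshold: float = 0.9) -> list:
    """Return all (image_a, image_b) pairs whose text-feature similarity exceeds the threshold.

    text_features maps a reference-image id to its text feature; similarity(f1, f2) -> float.
    """
    pairs = []
    for (id_a, feat_a), (id_b, feat_b) in combinations(text_features.items(), 2):
        if similarity(feat_a, feat_b) > threshold:
            pairs.append((id_a, id_b))   # each qualifying pair becomes one training sample
    return pairs
```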
  • This application also considers introducing a blockchain structure and making full use of its characteristics (for example, that data on the blockchain cannot be tampered with): the training data is uploaded to the blockchain for certificate storage before training, and the associated data generated during training is likewise uploaded to the blockchain, so that, if needed later, a triggered supervision server can retrieve the relevant data saved on the blockchain and trace it back to reconstruct the training process, and then detect whether any risky behaviour occurred during training based on the reconstructed process, protecting the data security of the data provider and improving the security and credibility of the training process.
  • This embodiment can set iteration conditions for the neural network model.
  • The iteration conditions include the number of training iterations or the training duration, etc.
  • When the above iteration conditions are reached, the training ends and the training results of the trained neural network model are obtained.
  • These training results are used to judge whether the training results meet the requirements.
  • If they do, the trained neural network model is used as a text recognition model to identify text information in images.
  • The training results may include the recognized text information of each reference image in the reference image pair, which is marked as the target text information of each reference image in the pair.
  • This embodiment can calculate the similarity of the target text information of the two reference images in the reference image pair to obtain a predicted similarity, and judge whether the predicted similarity is consistent with the similarity of the corresponding text features; if so, the trained neural network model serves as a text recognition model that can accurately identify text information in images.
  • The training method of a text recognition model obtains a first image containing text information, performs random amplification processing on the first image to obtain multiple second images, marks the first image and the multiple second images as reference images, obtains the text features of the text information in each reference image, calculates the similarity of the text features of every two reference images, and uses the two reference images whose similarity is greater than the preset similarity threshold as a reference image pair.
  • The reference image pair is input into the neural network model for training, the training results after training are obtained, and it is judged whether the training results meet the requirements.
  • If they do, the trained neural network model is used as the text recognition model. Through this data amplification processing the amount of training data is increased, thereby improving the recognition accuracy of the text recognition model; and because the neural network model is trained with two reference images of high similarity, the trained text recognition model can exploit the correlation between training data, further improving its recognition accuracy.
  • determining whether the training results meet the requirements may specifically include:
  • A preset cross-entropy loss function can be used to calculate the loss value of the neural network model after each round of training; when the loss value meets the preset threshold or is less than the preset loss value, the training results of the neural network model meet the requirements, which means the neural network model satisfies the training requirements and training of the text recognition model is complete, improving the text recognition accuracy of the model.
  • the cross-entropy loss function is used to evaluate the degree to which the predicted value of the text recognition model is different from the true value. The better the loss function, the better the performance of the text recognition model.
  • The cross-entropy loss function is often used in classification problems, especially when neural networks perform classification.
  • Since cross entropy involves calculating the probability of each category, it almost always appears together with the sigmoid (or softmax) function.
  • the loss function in this embodiment is not specifically limited. For example, it can be a mean square error function, a covariance function, etc.
  • The preset loss value in this embodiment can be determined according to the actual situation, and it differs from the loss threshold that applies when training of the text recognition model finally ends; generally, the preset loss value here is greater than that final loss threshold. For example, if the loss threshold when the text recognition model is finally trained is 0.002, the preset loss value here should be larger than 0.002, for example 0.005.
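  • A small illustration of this check (assuming PyTorch cross-entropy; the 0.005 preset loss value follows the example above and is not a required setting):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()          # combines log-softmax and negative log-likelihood

def training_round_ok(logits: torch.Tensor, targets: torch.Tensor, preset_loss: float = 0.005) -> bool:
    """Return True when this round's loss is below the preset loss value, i.e. the result meets the requirement."""
    loss = criterion(logits, targets)      # logits: (batch, num_classes), targets: (batch,) class indices
    return loss.item() < preset_loss
```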
  • the method further includes:
  • If the loss value of the text recognition model is not less than the preset loss value, forward propagation can be carried out in the neural network structure of the text recognition model according to the loss value, and the relevant parameters of the text recognition model can be adjusted.
  • The reference image pair is then input again, and the text recognition model with the reset parameters is retrained until the loss value of the text recognition model is less than the preset loss value.
  • At that point, training of the text recognition model ends, and a text recognition model whose training results meet the requirements, i.e. a trained text recognition model, is obtained.
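  • A minimal sketch of this retraining loop (assuming PyTorch; the optimizer, learning rate and preset loss value are illustrative assumptions):

```python
import torch

def train_until_converged(model, pair_loader, criterion, preset_loss=0.005, max_epochs=100):
    """Keep adjusting the model parameters and re-feeding the reference image pairs
    until the loss value drops below the preset loss value."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for images, targets in pair_loader:            # each batch comes from reference image pairs
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()                             # propagate the loss through the network
            optimizer.step()                            # adjust the relevant parameters
            epoch_loss += loss.item()
        if epoch_loss / max(len(pair_loader), 1) < preset_loss:
            break                                       # training ends: requirement met
    return model
```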
  • the method may further include:
  • the target image is input into the text recognition model to obtain text information of the target image.
  • This embodiment obtains the target image to be recognized, inputs the target image into the text recognition model, and obtains the text information of the target image output by the text recognition model.
  • the target image to be recognized may be a text image uploaded by the user, or may be a text image collected directly through a camera by an electronic device that performs the text recognition method.
  • The acquisition method of the target image to be recognized is not limited here. Since the text recognition model of this application does not require sample labeling, the text recognition model can be obtained at a lower cost, and the cost of directly using it for text recognition is also low. In addition, because no sample labeling is needed during training, the recognition accuracy is no longer affected by the labeling method and is no longer limited by the number of labeled training samples; a model trained with a large number of training samples has higher recognition accuracy and reliability, so the text recognition model trained by this application can accurately identify the text information of the target image.
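  • A minimal sketch of this inference step (the character set and greedy CTC-style decoding below are assumptions for illustration; they are not prescribed by this application):

```python
import torch

CHARSET = "-0123456789abcdefghijklmnopqrstuvwxyz"   # assumed character set; index 0 is the blank symbol

def recognize_text(model, target_image: torch.Tensor) -> str:
    """Feed the target image to the trained text recognition model and return the recognized text."""
    model.eval()
    with torch.no_grad():
        logits = model(target_image.unsqueeze(0))    # add a batch dimension
    indices = logits.argmax(dim=-1).squeeze(0).tolist()   # greedy choice per time step
    # collapse repeated symbols and drop blanks, in the spirit of CTC decoding
    out, prev = [], None
    for i in indices:
        if i != prev and i != 0:
            out.append(CHARSET[i])
        prev = i
    return "".join(out)
```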
  • calculating the similarity of text features of each two reference images may specifically include:
  • a common method is to calculate the cosine distance between text features.
  • Cosine distance can reflect the difference between two vectors in space; similar semantic relationships are aggregated pairwise until all semantic relationships have been aggregated, and the most aggregated semantic relationships are filtered out as the semantic recognition result of the text features. For example, when most semantic relations are gathered in area A, the semantic relation closest to the centre of area A is selected from area A as the semantic recognition result.
  • Specifically, the Word2Vec word vector model can be used to convert the text features of each reference image into word vectors to obtain the text vector of each reference image; the cosine distance between the text vectors of every two reference images is then calculated and used as the similarity.
  • the Word2Vec word vector model is a model that learns semantic knowledge from a large amount of text in an unsupervised manner. It trains a large amount of text and represents the words in the text in the form of vectors. This vector is called a word vector. We can calculate the distance between the word vectors of two words to learn the connection between the two words.
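  • A minimal sketch of this Word2Vec-based similarity (assuming gensim 4.x and NumPy; tokenization and averaging word vectors into a text vector are illustrative assumptions):

```python
import numpy as np
from gensim.models import Word2Vec

def text_vector(model: Word2Vec, tokens: list) -> np.ndarray:
    """Average the word vectors of the tokens to get one text vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# train word vectors in an unsupervised way on the (tokenized) text of the reference images
corpus = [["invoice", "total", "amount"], ["invoice", "total", "sum"]]   # toy tokenized texts
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)
sim = cosine_similarity(text_vector(w2v, corpus[0]), text_vector(w2v, corpus[1]))
```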
  • training the input neural network model with the reference image may specifically include:
  • Determining whether the training results meet the requirements includes:
  • the trained neural network model is verified according to the verification image. If the verification result does not meet the preset iteration stop conditions, it is determined that the training result does not meet the requirements.
  • one reference image can be randomly selected from the reference image pair as the training image, and the other reference image in the reference image pair can be used as the verification image.
  • the training image can be used to train the neural network model.
  • The verification result may be that the predicted similarity is the same as, or different from, the similarity of the corresponding text features. For example, the similarity between the text information of the training image output by the neural network model and the text information of the verification image output by the neural network model may be calculated to obtain the predicted similarity, and it is then judged whether the predicted similarity is consistent with the similarity of the corresponding text features; if so, the trained neural network model is used as a text recognition model that can accurately identify text information in images.
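  • A minimal sketch of this verification check (plain Python; recognize and text_similarity stand in for the model's recognition output and the text-feature similarity, and the tolerance criterion is an assumption):

```python
def verify_pair(recognize, text_similarity, train_img, verify_img,
                reference_similarity: float, tolerance: float = 0.05) -> bool:
    """Compare the predicted similarity of the recognized texts against the reference similarity."""
    predicted = text_similarity(recognize(train_img), recognize(verify_img))
    # "consistent" is taken here as agreement within a small tolerance (an assumed criterion)
    return abs(predicted - reference_similarity) <= tolerance
```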
  • the random amplification process on the first image to obtain multiple second images may specifically include:
  • the random amplification processing method includes but is not limited to flipping, translating, scaling the image, adjusting the weight of each RGB channel of the image, and rotating the image.
  • the first image can be flipped, and then the flipped first image can be enlarged to obtain a second image.
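  • For instance, a flip followed by an enlargement could be sketched as follows (assuming Pillow; the 2x scale factor is only an example):

```python
from PIL import Image, ImageOps

def flip_then_enlarge(first_image: Image.Image, scale: float = 2.0) -> Image.Image:
    """Flip the first image horizontally, then enlarge it to obtain one second image."""
    flipped = ImageOps.mirror(first_image)                       # horizontal flip
    new_size = (int(flipped.width * scale), int(flipped.height * scale))
    return flipped.resize(new_size)                              # enlarge by the scale factor
```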
  • an embodiment of the present application also provides a training device for a text recognition model, including:
  • Acquisition module 11 used to acquire the first image containing text information
  • the amplification processing module 12 is used to perform random amplification processing on the first image to obtain multiple second images;
  • Marking module 13 used to mark the first image and multiple second images as reference images
  • the calculation module 14 is used to obtain the text features of the text information in each of the reference images, and calculate the similarity of the text features of each two of the reference images;
  • the input module 15 is used to use the two reference images whose similarity is greater than the preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
  • the judgment module 16 is used to obtain the training results after training the neural network model and judge whether the training results meet the requirements
  • the determination module 17 is configured to use the trained neural network model as a text recognition model when it is determined that the training results meet the requirements.
  • the object recognized by the text recognition model of this application is an image containing text information, and the text information in the image is recognized to realize the text recognition function of the image.
  • the first image with text information obtained in this embodiment can be an image uploaded by the user.
  • For example, the user can obtain it by scanning a paper document or other media containing text information, or it can be a screenshot of mobile phone screen content, etc.
  • After acquiring the first image containing text information, the first image can also be pre-processed, for example by adjusting image parameters such as its size, brightness and sharpness.
  • In addition, the first image is usually in color with multiple colors, and the characters of the text information are mostly colors with relatively dark brightness values.
  • To make it easier to extract each character of the text information, the first image can be binarized using a brightness value as the standard and converted into a black-and-white image, highlighting the text information and avoiding color interference in the first image.
  • Specifically, the server obtains the color brightness values in the first image and compares them with a preset color brightness value to obtain a comparison result, which indicates whether each color brightness value in the first image is greater than, equal to, or less than the preset value.
  • According to the comparison result, the parts of the first image whose color brightness value is greater than the preset color brightness value are converted to white and the rest to black, which makes it easier to extract each character of the text information in the first image.
  • the preset color brightness value can be adjusted as needed.
  • The server in this embodiment can also judge the background color of the first image and convert a first image with a black background and white text information into one with a white background and black text information; that is, an image with white text on a black background is converted into an image with black text on a white background.
  • Random amplification processing is a method of expanding data: it can increase the number of samples in the training set, effectively alleviate model overfitting, and give the model stronger generalization ability.
  • the purpose of random amplification processing is to make the training data as close as possible to the test data, thereby improving the prediction accuracy.
  • random amplification processing can force the network to learn more robust features, thereby giving the model stronger generalization capabilities.
  • This embodiment performs random amplification processing on the first image, for example enlarging, reducing, cropping, adjusting brightness or adjusting saturation; a single random amplification method can be used, or multiple methods can be combined, finally yielding multiple second images.
  • The image amplification technology of this embodiment has a positive effect on target detection in deep learning: it can increase the amount of data in each category, keep the categories balanced, avoid the overfitting problems caused by sample imbalance, and also reduce, to a certain extent, the amount of data that must be collected in the early sampling stage.
  • This embodiment marks the first image and the multiple second images as reference images, generates a data set including all reference images, and then obtains the text features of the text information in each reference image from the data set and calculates the similarity of the text features of every two reference images. Specifically, the text position information of the text information can be identified in the reference image, the reference image can be corrected according to the text position information to obtain a corrected reference image, and the encoding network of the recognition model can be used to perform feature extraction on the text information of the corrected reference image to obtain the text features. Then, based on the word features contained in the text features of every two reference images, a vector space model for calculating the similarity between their text features is constructed; the word features of each reference image are represented as word vectors, the cosine of the angle between the word vectors of the two reference images is calculated according to the cosine distance algorithm, and this cosine value is taken as the similarity of the text features of the two reference images.
  • The text position information may be the position information of a text box containing the text information in the reference image. For example, a text area containing text information is identified in the reference image and the position information of that text area is used as the text position information; alternatively, a text area containing content is identified in the reference image, the position of the corresponding virtual text box within the whole reference image is calculated, and that position information is used as the text position information of the text information.
  • The two reference images whose similarity is greater than the preset similarity threshold are used as a reference image pair; the reference image pair is used as training data and input into the neural network model for training, so that the trained text recognition model can exploit the correlation between training data and improve the recognition accuracy of the text recognition model.
  • the preset similarity threshold can be customized, for example, set to 0.9.
  • This application can also introduce a blockchain structure and make full use of its characteristics (for example, that data on the blockchain cannot be tampered with).
  • Before training, the training data can be uploaded to the blockchain for certificate storage.
  • During training, the associated data generated in the training process is uploaded to the blockchain for certificate storage, so that, if needed later, a triggered supervision server can retrieve the relevant data stored on the blockchain and trace it back to reconstruct the training process, and then detect whether any risky behaviour occurred during training based on the reconstructed process, protecting the data security of the data provider and improving the security and credibility of the training process.
  • This embodiment can set the iteration conditions of the neural network model.
  • the iteration conditions include the number of training times or the training duration, etc.
  • When the iteration conditions are reached, the training is ended.
  • the training results after training of the neural network model are obtained.
  • If the training results meet the requirements, the trained neural network model is used as a text recognition model to identify text information in images.
  • The training results may include the recognized text information of each reference image in the reference image pair, which is marked as the target text information of each reference image in the pair.
  • This embodiment can calculate the similarity of the target text information of the two reference images in the reference image pair to obtain a predicted similarity, and judge whether the predicted similarity is consistent with the similarity of the corresponding text features; if so, the trained neural network model serves as a text recognition model that can accurately identify text information in images.
  • each component of the text recognition model training device proposed in this application can implement the functions of any of the above text recognition model training methods, and the specific structure will not be described again.
  • an embodiment of the present application also provides a computer device, the internal structure of which can be shown in Figure 3.
  • the computer device includes a processor, memory, network interface, and database connected through a system bus.
  • The processor of the computer device is used to provide computing and control capabilities.
  • The memory of the computer device includes a storage medium and an internal memory.
  • The storage medium stores an operating system, a computer program and a database; the internal memory provides an environment for the operating system and computer program in the storage medium to run.
  • the database of the computer device is used to store data related to the training method of the text recognition model.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • The computer program, when executed by the processor, implements a method for training a text recognition model.
  • the above-mentioned processor executes the above-mentioned text recognition model training method, including the following steps:
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • A computer program is stored thereon; when the computer program is executed by a processor, it implements a training method for a text recognition model, which includes the following steps:
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (SSRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • SRAM static RAM
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • SSRSDRAM double data rate SDRAM
  • ESDRAM expanded SDRAM
  • SLDRAM Synchlink DRAM
  • Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM
  • RDRAM memory bus dynamic RAM
  • This application provides a training method, device, computer equipment and storage medium for a text recognition model, which acquire a first image containing text information, perform random amplification processing on the first image to obtain multiple second images, mark the first image and the multiple second images as reference images, obtain the text features of the text information in each reference image, calculate the similarity of the text features of every two reference images, use the two reference images whose similarity is greater than the preset similarity threshold as a reference image pair, input the reference image pair into the neural network model for training, obtain the training results after training, and judge whether the training results meet the requirements.
  • If they do, the trained neural network model is used as a text recognition model.
  • The amount of training data is increased through data amplification, and the neural network model is trained with two reference images of high similarity, so that the trained text recognition model can exploit the correlation between training data, thereby improving the recognition accuracy of the text recognition model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The present application relates to the technical field of natural language processing of artificial intelligence technology. Provided in the present application are a method and apparatus for training a text recognition model, and a computer device and a storage medium. The method comprises: performing random augmentation processing on a first image, so as to obtain a plurality of second images; marking the first image and the plurality of second images as reference images; acquiring a text feature of text information in each reference image, and calculating the similarity of the text features of every two reference images; taking two reference images, the similarity between which is greater than a preset similarity threshold, as a reference image pair, and inputting the reference image pair into a neural network model for training; acquiring a training result after the neural network model is trained, and determining whether the training result meets a requirement; and if so, taking the trained neural network model as a text recognition model. In this way, the data volume of training data is increased by means of a data augmentation processing mode, such that the recognition accuracy of a text recognition model is improved.

Description

文本识别模型的训练方法、装置、计算机设备及存储介质Training method, device, computer equipment and storage medium for text recognition model
本申请要求于2022年3月15日提交中国专利局、申请号为202210253870.4,发明名称为“文本识别模型的训练方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application submitted to the China Patent Office on March 15, 2022, with application number 202210253870.4, and the invention name is "Training method, device, computer equipment and storage medium for text recognition model", and its entire content incorporated herein by reference.
技术领域Technical field
本申请涉及人工智能技术的自然语言处理技术领域,具体而言,本申请涉及一种文本识别模型的训练方法、装置、计算机设备及存储介质。This application relates to the technical field of natural language processing of artificial intelligence technology. Specifically, this application relates to a training method, device, computer equipment and storage medium for a text recognition model.
背景技术 Background Art
文本识别任务要求通过一定的图像处理来识别图像中的文本内容。文本识别可应用于许多领域,如信件和包裹的分拣、稿件的编辑和校对、大量统计报表和卡片的汇总与分析、银行支票的处理、商品发票的统计汇总、商品编码的识别、商品仓库的管理、文档检索、各类证件识别和财务票据处理的办公自动化等,方便用户快速录入信息,提高各行各业的工作效率。The text recognition task requires certain image processing to identify the text content in the image. Text recognition can be used in many fields, such as sorting letters and packages, editing and proofreading of manuscripts, summarizing and analyzing a large number of statistical reports and cards, processing bank checks, statistical summarization of commodity invoices, identification of commodity codes, commodity warehouses Management, document retrieval, identification of various documents and office automation of financial bill processing, etc., facilitate users to quickly enter information and improve work efficiency in all walks of life.
发明人发现,目前的文本识别方法,常用深度学习方式,进行不分割地端到端处理,目前效果较好且比较常用的算法模型是CRNN(ConvolutionalRecurrentNeural Network,卷积循环神经网络),该模型首先使用卷积神经网络(Convolutional NeuralNetworks,CNN)从输入图像中提取出特征序列,然后使用循环神经网络(Recurrent Neural Networks,RNN)预测从卷积层获取的特征序列的标签分布,最后引入联结主义时序分类(Connectionist temporal classification,CTC)把从循环层获取的标签分布通过去重、整合等操作转换成最终的识别结果,而卷积神经网络的性能对训练数据的依赖性很高,当训练数据多样性越多,数据量越大时,训练得到的模型性能往往更好,但是当训练数据的数据量较少时,则训练得到的文本识别模型的识别准确率较低。The inventor found that the current text recognition method commonly uses deep learning methods to perform end-to-end processing without segmentation. The algorithm model that currently has better results and is more commonly used is CRNN (ConvolutionalRecurrentNeural Network). This model first Use Convolutional Neural Networks (CNN) to extract feature sequences from the input image, then use Recurrent Neural Networks (RNN) to predict the label distribution of the feature sequences obtained from the convolutional layer, and finally introduce connectionist time series Classification (Connectionist temporal classification, CTC) converts the label distribution obtained from the loop layer into the final recognition result through operations such as deduplication and integration. The performance of the convolutional neural network is highly dependent on the training data. When the training data is diverse The more characteristics and the larger the amount of data, the better the performance of the trained model will be. However, when the amount of training data is smaller, the recognition accuracy of the trained text recognition model will be lower.
技术问题technical problem
本申请的主要目的为提供一种文本识别模型的训练方法、装置、计算机设备及存储介质,以提高训练数据的数据量,进而提高文本识别模型的识别准确率。The main purpose of this application is to provide a training method, device, computer equipment and storage medium for a text recognition model, so as to increase the amount of training data and thereby improve the recognition accuracy of the text recognition model.
技术解决方案Technical solutions
为了实现上述发明目的,本申请提供一种文本识别模型的训练方法,其包括:In order to achieve the above-mentioned object of the invention, this application provides a training method for a text recognition model, which includes:
获取含有文本信息的第一图像;Obtain the first image containing text information;
对所述第一图像进行随机扩增处理,得到多张第二图像;Perform random amplification processing on the first image to obtain multiple second images;
将所述第一图像和多张第二图像标记为参考图像;Mark the first image and the plurality of second images as reference images;
获取每张所述参考图像中文本信息的文本特征,计算每两张所述参考图像的文本特征的相似度;Obtain the text features of the text information in each of the reference images, and calculate the similarity of the text features of each two of the reference images;
将相似度大于预设相似度阈值的两张所述参考图像作为参考图像对,将所述参考图像对输入神经网络模型进行训练;Use the two reference images whose similarity is greater than the preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
获取所述神经网络模型训练后的训练结果,判断所述训练结果是否满足要求;Obtain the training results after training the neural network model, and determine whether the training results meet the requirements;
若是,将训练后的所述神经网络模型作为文本识别模型。If so, use the trained neural network model as a text recognition model.
本申请还提供一种文本识别模型的训练装置,其包括:This application also provides a training device for a text recognition model, which includes:
获取模块,用于获取含有文本信息的第一图像;The acquisition module is used to acquire the first image containing text information;
扩增处理模块,用于对所述第一图像进行随机扩增处理,得到多张第二图像;An amplification processing module, used to perform random amplification processing on the first image to obtain multiple second images;
标记模块,用于将所述第一图像和多张第二图像标记为参考图像;A marking module, used to mark the first image and the plurality of second images as reference images;
计算模块,用于获取每张所述参考图像中文本信息的文本特征,计算每两张所述参考图像的文本特征的相似度;A calculation module, used to obtain the text features of the text information in each of the reference images, and calculate the similarity of the text features of each two of the reference images;
输入模块,用于将相似度大于预设相似度阈值的两张所述参考图像作为参考图像对,将所述参考图像对输入神经网络模型进行训练;An input module, configured to use the two reference images with a similarity greater than a preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
判断模块,用于获取所述神经网络模型训练后的训练结果,判断所述训练结果是否满足要求;A judgment module, used to obtain the training results after the neural network model is trained, and judge whether the training results meet the requirements;
判定模块,用于在判定所述训练结果满足要求时,将训练后的所述神经网络模型作为文本识别模型。A determination module, configured to use the trained neural network model as a text recognition model when it is determined that the training results meet the requirements.
本申请还提供一种计算机设备，包括存储器和处理器，所述存储器存储有计算机程序，所述处理器执行所述计算机程序时实现一种文本识别模型的训练方法，其中，所述方法包括以下步骤：This application also provides a computer device, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements a training method for a text recognition model, wherein the method includes the following steps:
获取含有文本信息的第一图像;Obtain the first image containing text information;
对所述第一图像进行随机扩增处理,得到多张第二图像;Perform random amplification processing on the first image to obtain multiple second images;
将所述第一图像和多张第二图像标记为参考图像;Mark the first image and the plurality of second images as reference images;
获取每张所述参考图像中文本信息的文本特征,计算每两张所述参考图像的文本特征的相似度;Obtain the text features of the text information in each of the reference images, and calculate the similarity of the text features of each two of the reference images;
将相似度大于预设相似度阈值的两张所述参考图像作为参考图像对,将所述参考图像对输入神经网络模型进行训练;Use the two reference images whose similarity is greater than the preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
获取所述神经网络模型训练后的训练结果,判断所述训练结果是否满足要求;Obtain the training results after training the neural network model, and determine whether the training results meet the requirements;
若是,将训练后的所述神经网络模型作为文本识别模型。If so, use the trained neural network model as a text recognition model.
本申请还提供一种计算机可读存储介质，所述计算机可读存储介质上存储有计算机程序，该计算机程序被处理器执行时实现一种文本识别模型的训练方法，其中，所述方法包括以下步骤：This application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements a training method for a text recognition model, wherein the method includes the following steps:
获取含有文本信息的第一图像;Obtain the first image containing text information;
对所述第一图像进行随机扩增处理,得到多张第二图像;Perform random amplification processing on the first image to obtain multiple second images;
将所述第一图像和多张第二图像标记为参考图像;Mark the first image and the plurality of second images as reference images;
获取每张所述参考图像中文本信息的文本特征,计算每两张所述参考图像的文本特征的相似度;Obtain the text features of the text information in each of the reference images, and calculate the similarity of the text features of each two of the reference images;
将相似度大于预设相似度阈值的两张所述参考图像作为参考图像对,将所述参考图像对输入神经网络模型进行训练;Use the two reference images whose similarity is greater than the preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
获取所述神经网络模型训练后的训练结果,判断所述训练结果是否满足要求;Obtain the training results after training the neural network model, and determine whether the training results meet the requirements;
若是,将训练后的所述神经网络模型作为文本识别模型。If so, use the trained neural network model as a text recognition model.
有益效果beneficial effects
本申请所提供的一种文本识别模型的训练方法、装置、计算机设备及存储介质,可提高文本识别模型的识别准确率。The training method, device, computer equipment and storage medium of a text recognition model provided by this application can improve the recognition accuracy of the text recognition model.
附图说明Description of the drawings
图1为本申请一实施例的文本识别模型的训练方法的流程示意图;Figure 1 is a schematic flowchart of a training method for a text recognition model according to an embodiment of the present application;
图2为本申请一实施例的文本识别模型的训练装置的结构示意框图;Figure 2 is a schematic structural block diagram of a text recognition model training device according to an embodiment of the present application;
图3为本申请一实施例的计算机设备的结构示意框图。FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
本发明的最佳实施方式Best Mode of Carrying Out the Invention
本申请提出一种文本识别模型的训练方法,本申请实施例可以基于人工智能技术对相关的数据进行获取和处理。其中,人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。This application proposes a training method for a text recognition model. The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Among them, artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers or digital computer-controlled machines to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. .
人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统等技术。人工智能软件技术主要包括计算机视觉技术、机器人技术、生物识别技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等方向。Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, etc. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
本申请提出的一种文本识别模型的训练方法,以服务器为执行主体,服务器可以是独立的服务器,也可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。The text recognition model training method proposed in this application uses a server as the execution subject. The server can be an independent server, or it can provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, and cloud communications. , middleware services, domain name services, security services, content delivery network (Content Delivery Network, CDN), and cloud servers for basic cloud computing services such as big data and artificial intelligence platforms.
该文本识别模型的训练方法用于解决在训练数据的数据量较少时,则训练得到的文本识别模型的识别准确率较低的技术问题。参考图1,其中一个实施例中,该文本识别模型的训练方法包括:This text recognition model training method is used to solve the technical problem that when the amount of training data is small, the recognition accuracy of the trained text recognition model is low. Referring to Figure 1, in one embodiment, the training method of the text recognition model includes:
S11、获取含有文本信息的第一图像;S11. Obtain the first image containing text information;
S12、对所述第一图像进行随机扩增处理,得到多张第二图像;S12. Perform random amplification processing on the first image to obtain multiple second images;
S13、将所述第一图像和多张第二图像标记为参考图像;S13. Mark the first image and multiple second images as reference images;
S14、获取每张所述参考图像中文本信息的文本特征,计算每两张所述参考图像的文本特征的相似度;S14. Obtain the text features of the text information in each reference image, and calculate the similarity of the text features of each two reference images;
S15、将相似度大于预设相似度阈值的两张所述参考图像作为参考图像对,将所述参考图像对输入神经网络模型进行训练;S15. Use the two reference images whose similarity is greater than the preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
S16、获取所述神经网络模型训练后的训练结果,判断所述训练结果是否满足要求;S16. Obtain the training results after training the neural network model, and determine whether the training results meet the requirements;
S17、若是,将训练后的所述神经网络模型作为文本识别模型。S17. If yes, use the trained neural network model as a text recognition model.
如上述步骤S11所述,本申请的文本识别模型所识别的对象为含有文本信息的图像,对图像中的文本信息进行识别,实现图像的文本识别功能。本实施例获取的具有文本信息的第一图像可以是用户上传的图像,如用户可通过对具有文本信息的纸质或者其他介质文档进行扫描获得,也可以为截取手机屏幕内容的截屏图像等等。As described in step S11 above, the object recognized by the text recognition model of the present application is an image containing text information, and the text information in the image is recognized to realize the text recognition function of the image. The first image with text information obtained in this embodiment can be an image uploaded by the user. For example, the user can obtain it by scanning a paper or other media document with text information, or it can also be a screenshot of the mobile phone screen content, etc. .
在一实施例中,在获取到含有文本信息的第一图像后,还可对第一图像进行预处理,如调整第一图像的图像尺寸、亮度、清晰度等等图像参数。此外,通常的第一图像为彩色,具有多种颜色,文本信息的字符颜色多为亮度值比较暗的颜色,为利于将第一图像中的文本信息的每个字符提取出来,还可以设定亮度值为标准对第一图像进行二值化处理,将第一图像转换为黑白图像,以凸显第一图像中的文本信息,避免第一图像中的颜色干扰。In one embodiment, after acquiring the first image containing text information, the first image can also be pre-processed, such as adjusting image parameters such as image size, brightness, and sharpness of the first image. In addition, the first image is usually in color and has multiple colors. The character color of the text information is mostly a color with a relatively dark brightness value. In order to facilitate the extraction of each character of the text information in the first image, it is also possible to set The first image is binarized using the brightness value as the standard, and the first image is converted into a black and white image to highlight the text information in the first image and avoid color interference in the first image.
具体的,服务器获取第一图像中的颜色亮度值,将第一图像中的颜色亮度值与预设颜色亮度值进行比对,得到比对结果,该比对结果中包含第一图像中的颜色亮度值大于、等于或小于预设颜色亮度值;根据比对结果,将第一图像中的颜色亮度值大于预设颜色亮度值的第一图像转换为白色,反之则转换为黑色,以利于将第一图像中的文本信息的每个字符提取出来。其中,预设颜色亮度值可根据需要可进行调整。Specifically, the server obtains the color brightness value in the first image, compares the color brightness value in the first image with the preset color brightness value, and obtains a comparison result, which includes the color in the first image. The brightness value is greater than, equal to or less than the preset color brightness value; according to the comparison result, the first image in the first image whose color brightness value is greater than the preset color brightness value is converted to white, and vice versa is converted to black to facilitate the conversion. Each character of the text information in the first image is extracted. Among them, the preset color brightness value can be adjusted as needed.
在一实施例中,当检测到第一图像的背景为黑色、文本信息为白色时, 即黑底白字的情况。为避免影响文本信息的识别,本实施例的服务器还可对第一图像的背景颜色进行判断,将背景颜色为黑色、文本信息为白色的第一图像转换为背景颜色为白色、文本信息为黑色的图像,即将黑底白字的图像转换为白底黑字的图像。In one embodiment, when it is detected that the background of the first image is black and the text information is white, it is a situation of white text on a black background. In order to avoid affecting the recognition of text information, the server in this embodiment can also determine the background color of the first image, and convert the first image with a background color of black and text information of white into a first image with a background color of white and text information of black image, that is, convert an image with white text on a black background into an image with black text on a white background.
As described in step S12 above, random amplification processing is a method of expanding data. It can increase the number of samples in the training set, effectively alleviate model over-fitting, and give the model stronger generalization ability. The purpose of random amplification processing is to make the training data as close as possible to the test data, thereby improving prediction accuracy. In addition, random amplification processing can force the network to learn more robust features, so that the model generalizes better.
In this embodiment, random amplification processing is performed on the first image, for example enlarging, reducing, cropping, brightness adjustment or saturation adjustment; a single random amplification method may be used, or several random amplification methods may be combined, finally yielding multiple second images. The image amplification technique of this embodiment has a positive effect on object detection in deep learning: it can increase the amount of data in each category and keep the categories balanced, avoiding the over-fitting problems caused by sample imbalance, and it can also reduce, to a certain extent, the amount of data that has to be collected in the earlier sampling stage.
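A minimal sketch of such a random amplification step, assuming Python with Pillow; the parameter ranges and the number of second images are illustrative choices rather than values specified by this application:

    import random
    from PIL import Image, ImageEnhance

    def random_amplify(first_image: Image.Image, num_copies: int = 8) -> list:
        """Produce `num_copies` second images from one first image."""
        second_images = []
        for _ in range(num_copies):
            img = first_image.copy()
            # Randomly enlarge or reduce.
            scale = random.uniform(0.8, 1.2)
            img = img.resize((max(1, int(img.width * scale)), max(1, int(img.height * scale))))
            # Random crop covering roughly 90% of each side.
            left = random.randint(0, max(1, img.width // 10))
            top = random.randint(0, max(1, img.height // 10))
            img = img.crop((left, top, left + int(img.width * 0.9), top + int(img.height * 0.9)))
            # Random brightness and saturation adjustment.
            img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
            img = ImageEnhance.Color(img).enhance(random.uniform(0.7, 1.3))
            second_images.append(img)
        return second_images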
As described in steps S13-S14 above, this embodiment marks the first image and the multiple second images as reference images, generates a data set containing all reference images, then obtains the text features of the text information in each reference image from the data set, and calculates the similarity of the text features of every two reference images. Specifically, the text position information of the text information can be identified in a reference image; the reference image is corrected according to the text position information to obtain a corrected reference image; the encoding network of the recognition model performs feature extraction on the text information of the corrected reference image to obtain text features; then, based on the word features contained in the text features of every two reference images, a vector space model for calculating the similarity between the text features of the two reference images is constructed; according to the vector space model, the word features of the two reference images are represented as word vectors; and, following the cosine distance algorithm, the cosine of the angle between the word vectors of the two reference images is calculated and taken as the similarity of the text features of the two reference images.
The text position information may be the position information, within the reference image, of a text box containing the text information. For example, a text area containing text information is identified in the reference image, and the position information of that text area is taken as the text position information of the text information; for instance, after a text area containing content is identified in the reference image, the position information of the corresponding virtual text box within the whole reference image is calculated and used as the text position information of the text information.
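The vector space model and cosine computation described above could be sketched as follows; whitespace tokenization and simple term-frequency word vectors are assumptions, and are only one possible realization of the word features mentioned in this embodiment:

    import math
    from collections import Counter

    def cosine_similarity(text_a: str, text_b: str) -> float:
        """Cosine of the angle between the word vectors of two recognized texts."""
        vec_a, vec_b = Counter(text_a.split()), Counter(text_b.split())
        vocab = set(vec_a) | set(vec_b)
        dot = sum(vec_a[w] * vec_b[w] for w in vocab)
        norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
        norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0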
As described in step S15 above, based on the calculated similarity of the text features of every two reference images, this embodiment takes two reference images whose similarity is greater than a preset similarity threshold as a reference image pair, uses the reference image pairs as training data, and inputs the reference image pairs into the neural network model for training, so that the trained text recognition model can exploit the correlation between training data, improving the recognition accuracy of the text recognition model. The preset similarity threshold can be set as desired, for example to 0.9.
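Pair construction can be sketched as below; `reference_images`, `text_features` and `similarity_fn` are hypothetical placeholders for the data set and the similarity calculation described above:

    from itertools import combinations

    SIMILARITY_THRESHOLD = 0.9  # example preset similarity threshold

    def build_reference_pairs(reference_images, text_features, similarity_fn):
        """Return all pairs of reference images whose text features are similar enough."""
        pairs = []
        for i, j in combinations(range(len(reference_images)), 2):
            if similarity_fn(text_features[i], text_features[j]) > SIMILARITY_THRESHOLD:
                pairs.append((reference_images[i], reference_images[j]))
        return pairs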
At the same time, this application also considers introducing a blockchain structure and making full use of the relevant properties of the blockchain (for example, that data on the blockchain cannot be tampered with). Before training, the training data are uploaded to the blockchain for evidence storage; during training, the data associated with the training process are uploaded to the blockchain for evidence storage, so that later, if necessary, a triggered supervision server can retrieve the relevant data stored on the blockchain and trace back through them to reconstruct the training process, and then detect from the reconstructed training process whether any risky behavior occurred during training, thereby protecting the data security of the data provider and improving the security and credibility of the training process.
As described in steps S16-S17 above, this embodiment can set iteration conditions for the neural network model, such as the number of training iterations or the training duration. When the neural network model satisfies the iteration conditions, training ends; at this point the training result of the trained neural network model is obtained and it is judged whether the training result meets the requirements. When the training result is judged to meet the requirements, the trained neural network model is used as the text recognition model for recognizing text information in images.
The training result may include the recognized text information of each reference image in a reference image pair, marked as the target text information of that reference image. This embodiment can calculate the similarity of the target text information of the two reference images in a reference image pair to obtain a predicted similarity, and judge whether the predicted similarity is consistent with the similarity of the corresponding text features; if so, the trained neural network model is used as the text recognition model to accurately recognize text information in images.
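One way to read this consistency check, reusing the cosine_similarity helper sketched earlier, is the following; treating "consistent" as agreement within a small tolerance is an assumption, since the application does not fix an exact criterion:

    def training_result_meets_requirement(predicted_text_a: str, predicted_text_b: str,
                                          feature_similarity: float,
                                          tolerance: float = 0.05) -> bool:
        # Similarity of the target text information recognized for the two images of the pair.
        predicted_similarity = cosine_similarity(predicted_text_a, predicted_text_b)
        # The predicted similarity is taken as consistent with the text-feature
        # similarity when the two values agree within the tolerance (assumption).
        return abs(predicted_similarity - feature_similarity) <= tolerance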
In the training method of a text recognition model provided by this application, a first image containing text information is obtained; random amplification processing is performed on the first image to obtain multiple second images; the first image and the multiple second images are marked as reference images; the text features of the text information in each reference image are obtained; the similarity of the text features of every two reference images is calculated; two reference images whose similarity is greater than a preset similarity threshold are taken as a reference image pair; the reference image pairs are input into a neural network model for training; the training result of the trained neural network model is obtained and it is judged whether the training result meets the requirements; and, when the training result is judged to meet the requirements, the trained neural network model is used as the text recognition model. In this way, data amplification increases the amount of training data and thus improves the recognition accuracy of the text recognition model, and training the neural network model with pairs of highly similar reference images allows the trained text recognition model to exploit the correlation between training data, further improving its recognition accuracy.
In one embodiment, judging whether the training result meets the requirements may specifically include:
calculating the loss value of the trained neural network model according to the training result and a preset loss function;
judging whether the loss value is lower than a preset loss value;
if so, judging that the training result meets the requirements;
if not, judging that the training result does not meet the requirements.
In this embodiment, after each training run of the neural network model, a preset cross-entropy loss function can be used to calculate the loss value of the neural network model after that training run. When the loss value satisfies a preset threshold, or is less than the preset loss value, the training result of the neural network model meets the requirements, which indicates that the neural network model has reached the training requirement; training of the text recognition model is then complete, which improves the text recognition accuracy of the text recognition model.
The cross-entropy loss function is used to evaluate how far the predicted values of the text recognition model deviate from the true values; the better the loss function, the better the performance of the text recognition model usually is. Cross-entropy is frequently used in classification problems, especially when neural networks perform classification; since cross-entropy involves calculating the probability of each class, it almost always appears together with the sigmoid (or softmax) function. The loss function of this embodiment is not specifically limited and may, for example, be a mean squared error function, a covariance function, and so on.
In addition, the preset loss value of this embodiment can be determined according to the actual situation, and it is different from the loss threshold corresponding to the finally trained text recognition model; generally, the preset loss value here is larger than that final loss threshold. For example, if the loss threshold corresponding to the finally trained text recognition model is 0.002, the preset loss value here should be larger than 0.002, for example 0.005.
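A PyTorch sketch of this loss check, assuming character-level classification logits; the tensor shapes and the 0.005 figure are illustrative only and follow the example above:

    import torch
    import torch.nn as nn

    PRESET_LOSS = 0.005  # example preset loss value (looser than the final threshold, e.g. 0.002)
    criterion = nn.CrossEntropyLoss()

    def result_meets_requirement(logits: torch.Tensor, labels: torch.Tensor) -> bool:
        """logits: (batch, num_classes); labels: (batch,) character class indices."""
        loss = criterion(logits, labels)
        return loss.item() < PRESET_LOSS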
In one embodiment, after it is judged that the training result does not meet the requirements, the method further includes:
updating the parameters of the neural network model based on the loss value, training the neural network model with the updated parameters on the reference image pairs again until the training result meets the requirements, and outputting the trained text recognition model.
When the loss value of the text recognition model is not less than the preset loss value, a forward pass can be carried out in the neural network structure of the text recognition model according to the loss value and the relevant parameters of the text recognition model adjusted; the reference image pairs are then input into the text recognition model with the re-set parameters for retraining until the loss value of the text recognition model is less than the preset loss value. At this point the training of the text recognition model ends, and a text recognition model whose training result meets the requirements, i.e. a trained text recognition model, is obtained.
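A hedged sketch of this retraining loop in PyTorch; `model`, `criterion` and `reference_pair_loader` are assumed to be defined elsewhere, and the optimizer, learning rate and epoch cap are illustrative choices rather than part of this application:

    import torch

    def train_until_converged(model, reference_pair_loader, criterion,
                              preset_loss=0.005, max_epochs=1000):
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        for _ in range(max_epochs):
            last_loss = float("inf")
            for images, labels in reference_pair_loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)  # forward pass on a batch of reference pairs
                loss.backward()                          # propagate the loss through the network
                optimizer.step()                         # adjust the relevant parameters
                last_loss = loss.item()
            if last_loss < preset_loss:                  # training result meets the requirement
                break
        return model                                     # trained text recognition model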
In one embodiment, after using the trained neural network model as the text recognition model, the method may further include:
obtaining a target image to be recognized;
inputting the target image into the text recognition model to obtain the text information of the target image.
In this embodiment, the target image to be recognized is obtained and input into the text recognition model, and the text information of the target image is obtained from the output of the text recognition model. The target image to be recognized may be a text image uploaded by a user, or a text image collected directly through a camera by the electronic device executing the text recognition method; the way the target image to be recognized is acquired is not limited here. Since the text recognition model of this application does not require sample labeling, the model can be obtained at a relatively low cost, and the cost of using it directly for text recognition is also low. Moreover, because the text recognition model does not need sample labeling during training, its recognition accuracy is no longer affected by the labeling method and is no longer limited by the number of training samples; a model trained with a large number of training samples has higher recognition accuracy and reliability. Therefore, the text recognition model trained by this application can accurately recognize the text information of the target image.
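Inference on a target image could then look like the following sketch, reusing the binarization helper above; the tensor layout and the assumption that the model decodes directly to a text string are illustrative, since the actual decoding depends on the model architecture:

    import numpy as np
    import torch

    def recognize_text(model, target_image_path: str) -> str:
        """Run the trained text recognition model on one target image."""
        image = binarize_first_image(target_image_path)       # preprocessing sketched earlier
        tensor = torch.from_numpy(np.asarray(image)).float()
        tensor = tensor.unsqueeze(0).unsqueeze(0) / 255.0     # shape (1, 1, H, W)
        with torch.no_grad():
            text_info = model(tensor)                         # assumed to return the decoded text
        return text_info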
In one embodiment, calculating the similarity of the text features of every two reference images may specifically include:
converting the text features of each reference image into vector form to obtain the text vector of each reference image;
calculating the cosine distance between the text vectors of every two reference images to obtain the similarity of the text features of the two reference images.
In this embodiment, a common way to measure the similarity between text features is to calculate the cosine distance between them. The cosine distance reflects the difference between two vectors in space. Similar semantic relationships are clustered together until all semantic relationships have been clustered, and the most clustered semantic relationship is then selected as the semantic recognition result of the text features; for example, when most semantic relationships are clustered in region A, the semantic relationship closest to the center of region A is selected as the semantic recognition result.
In this embodiment, the Word2Vec word-vector model can be used to convert the text features of each reference image into word vectors, obtaining the text vector of each reference image; the cosine distance between the text vectors of every two reference images is then calculated and taken as the similarity.
The Word2Vec word-vector model is a model that learns semantic knowledge from a large amount of text in an unsupervised manner. By training on a large amount of text, it represents the words in the text as vectors, which we call word vectors; the relationship between two words can then be learned by calculating the distance between their word vectors.
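A sketch of this step with gensim's Word2Vec (gensim 4.x API assumed); `corpus_texts` is a hypothetical collection of recognized texts used to train the word vectors, and averaging word vectors into a single text vector is one simple choice among several:

    import numpy as np
    from gensim.models import Word2Vec

    sentences = [text.split() for text in corpus_texts]   # `corpus_texts` assumed available
    w2v = Word2Vec(sentences, vector_size=100, min_count=1)

    def text_vector(text: str) -> np.ndarray:
        vectors = [w2v.wv[w] for w in text.split() if w in w2v.wv]
        return np.mean(vectors, axis=0) if vectors else np.zeros(w2v.vector_size)

    def word_vector_similarity(text_a: str, text_b: str) -> float:
        a, b = text_vector(text_a), text_vector(text_b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))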
In one embodiment, inputting the reference image pair into the neural network model for training may specifically include:
randomly selecting one reference image from the reference image pair as a training image, and using the other reference image of the pair as a verification image;
inputting the training image into the neural network model for training;
and judging whether the training result meets the requirements includes:
verifying the trained neural network model according to the verification image, and judging that the training result does not meet the requirements if the verification result does not satisfy a preset iteration stop condition.
In this embodiment, one reference image can be randomly selected from a reference image pair as the training image, and the other reference image of the pair used as the verification image; the neural network model is trained with the training image and verified with the verification image after each training run. If the verification result does not satisfy the preset iteration stop condition, the training result is judged not to meet the requirements. The verification result may include whether the predicted similarity is the same as or different from the similarity of the corresponding text features; for example, the similarity between the text information of the training image output by the neural network model and the text information of the verification image output by the neural network model can be calculated to obtain a predicted similarity, and it is judged whether the predicted similarity is consistent with the similarity of the corresponding text features; if so, the trained neural network model is used as the text recognition model to accurately recognize text information in images.
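The pair-based train/verify split can be sketched as follows; `train_step` and `validate` are hypothetical helpers standing in for one training update and the iteration-stop check, which this application does not spell out:

    import random

    def train_with_pair_validation(model, reference_pairs, train_step, validate, max_epochs=100):
        """One image of each pair trains the model, the other verifies it."""
        split = [(a, b) if random.random() < 0.5 else (b, a) for a, b in reference_pairs]
        train_images = [t for t, _ in split]
        verification_images = [v for _, v in split]
        for _ in range(max_epochs):
            for image in train_images:
                train_step(model, image)                 # one training update (assumed helper)
            if validate(model, verification_images):     # preset iteration stop condition satisfied
                break
        return model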
In one embodiment, performing random amplification processing on the first image to obtain multiple second images may specifically include:
performing, on the first image, at least one random amplification method among flipping, translating, scaling, rotating and adjusting the weights of the RGB channels of the image, to obtain multiple second images.
In this embodiment, the random amplification methods include, but are not limited to, flipping, translating and scaling the image, adjusting the weights of the RGB channels of the image, and rotating the image. For example, the first image can be flipped and the flipped first image then enlarged to obtain one second image.
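A sketch of these amplification options with Pillow and NumPy; the ranges are illustrative assumptions, and each call applies a single randomly chosen transformation:

    import random
    import numpy as np
    from PIL import Image

    def amplify_once(first_image: Image.Image) -> Image.Image:
        """Apply one randomly chosen amplification to produce a second image."""
        choice = random.choice(["flip", "translate", "scale", "rotate", "rgb"])
        if choice == "flip":
            return first_image.transpose(Image.FLIP_LEFT_RIGHT)
        if choice == "translate":
            dx, dy = random.randint(-10, 10), random.randint(-10, 10)
            return first_image.transform(first_image.size, Image.AFFINE, (1, 0, dx, 0, 1, dy))
        if choice == "scale":
            s = random.uniform(0.8, 1.2)
            return first_image.resize((max(1, int(first_image.width * s)),
                                       max(1, int(first_image.height * s))))
        if choice == "rotate":
            return first_image.rotate(random.uniform(-10, 10), expand=True)
        # Re-weight the R, G and B channels.
        arr = np.asarray(first_image.convert("RGB")).astype(np.float32)
        weights = np.array([random.uniform(0.8, 1.2) for _ in range(3)])
        return Image.fromarray(np.clip(arr * weights, 0, 255).astype(np.uint8))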
Referring to Figure 2, an embodiment of the present application further provides a training device for a text recognition model, including:
an acquisition module 11, configured to acquire a first image containing text information;
an amplification processing module 12, configured to perform random amplification processing on the first image to obtain multiple second images;
a marking module 13, configured to mark the first image and the multiple second images as reference images;
a calculation module 14, configured to obtain the text features of the text information in each reference image and calculate the similarity of the text features of every two reference images;
an input module 15, configured to take two reference images whose similarity is greater than a preset similarity threshold as a reference image pair and input the reference image pair into a neural network model for training;
a judgment module 16, configured to obtain the training result of the trained neural network model and judge whether the training result meets the requirements;
a determination module 17, configured to use the trained neural network model as the text recognition model when it is determined that the training result meets the requirements.
The object recognized by the text recognition model of this application is an image containing text information; the text information in the image is recognized so as to realize the image's text recognition function. The first image with text information obtained in this embodiment may be an image uploaded by a user, for example one obtained by scanning a paper document or a document on another medium that carries text information, or it may be a screenshot of the content of a mobile phone screen, and so on.
In one embodiment, after the first image containing text information is acquired, the first image may also be pre-processed, for example by adjusting image parameters such as its size, brightness and sharpness. In addition, the first image is usually in color and contains many colors, while the characters of the text information are mostly in colors with relatively dark brightness values; to make it easier to extract each character of the text information from the first image, the first image may also be binarized against a preset brightness value and converted into a black-and-white image, so as to highlight the text information in the first image and avoid color interference.
Specifically, the server obtains the color brightness values in the first image and compares them with a preset color brightness value to obtain a comparison result, which indicates whether a color brightness value in the first image is greater than, equal to or less than the preset color brightness value; according to the comparison result, the parts of the first image whose color brightness value is greater than the preset color brightness value are converted to white, and the others are converted to black, which makes it easier to extract each character of the text information from the first image. The preset color brightness value can be adjusted as needed.
In one embodiment, when it is detected that the background of the first image is black and the text information is white, i.e. white text on a black background, the server of this embodiment may also judge the background color of the first image and, to avoid affecting the recognition of the text information, convert the first image whose background is black and whose text information is white into an image whose background is white and whose text information is black; that is, an image with white text on a black background is converted into an image with black text on a white background.
In this embodiment, random amplification processing is a method of expanding data; it can increase the number of samples in the training set, effectively alleviate model over-fitting, and give the model stronger generalization ability. The purpose of random amplification processing is to make the training data as close as possible to the test data, thereby improving prediction accuracy. In addition, random amplification processing can force the network to learn more robust features, so that the model generalizes better.
In this embodiment, random amplification processing is performed on the first image, for example enlarging, reducing, cropping, brightness adjustment or saturation adjustment; a single random amplification method may be used, or several methods may be combined, finally yielding multiple second images. The image amplification technique of this embodiment has a positive effect on object detection in deep learning: it can increase the amount of data in each category and keep the categories balanced, avoiding the over-fitting problems caused by sample imbalance, and it can also reduce, to a certain extent, the amount of data that has to be collected in the earlier sampling stage.
This embodiment marks the first image and the multiple second images as reference images, generates a data set containing all reference images, then obtains the text features of the text information in each reference image from the data set, and calculates the similarity of the text features of every two reference images. Specifically, the text position information of the text information can be identified in a reference image; the reference image is corrected according to the text position information to obtain a corrected reference image; the encoding network of the recognition model performs feature extraction on the text information of the corrected reference image to obtain text features; then, based on the word features contained in the text features of every two reference images, a vector space model for calculating the similarity between the text features of the two reference images is constructed; according to the vector space model, the word features of the two reference images are represented as word vectors; and, following the cosine distance algorithm, the cosine of the angle between the word vectors of the two reference images is calculated and taken as the similarity of the text features of the two reference images.
The text position information may be the position information, within the reference image, of a text box containing the text information. For example, a text area containing text information is identified in the reference image, and the position information of that text area is taken as the text position information of the text information; for instance, after a text area containing content is identified in the reference image, the position information of the corresponding virtual text box within the whole reference image is calculated and used as the text position information of the text information.
Based on the calculated similarity of the text features of every two reference images, this embodiment takes two reference images whose similarity is greater than a preset similarity threshold as a reference image pair, uses the reference image pairs as training data, and inputs the reference image pairs into the neural network model for training, so that the trained text recognition model can exploit the correlation between training data, improving the recognition accuracy of the text recognition model. The preset similarity threshold can be set as desired, for example to 0.9.
At the same time, this application can also introduce a blockchain structure and make full use of the relevant properties of the blockchain (for example, that data on the blockchain cannot be tampered with). Before training, the training data are uploaded to the blockchain for evidence storage; during training, the data associated with the training process are uploaded to the blockchain for evidence storage, so that later, if necessary, a triggered supervision server can retrieve the relevant data stored on the blockchain and trace back through them to reconstruct the training process, and then detect from the reconstructed training process whether any risky behavior occurred during training, thereby protecting the data security of the data provider and improving the security and credibility of the training process.
This embodiment can set iteration conditions for the neural network model, such as the number of training iterations or the training duration. When the neural network model satisfies the iteration conditions, training ends; at this point the training result of the trained neural network model is obtained and it is judged whether the training result meets the requirements. When the training result is judged to meet the requirements, the trained neural network model is used as the text recognition model for recognizing text information in images.
The training result may include the recognized text information of each reference image in a reference image pair, marked as the target text information of that reference image. This embodiment can calculate the similarity of the target text information of the two reference images in a reference image pair to obtain a predicted similarity, and judge whether the predicted similarity is consistent with the similarity of the corresponding text features; if so, the trained neural network model is used as the text recognition model to accurately recognize text information in images.
As described above, it can be understood that each component of the training device for a text recognition model proposed in this application can implement the functions of any of the above training methods for a text recognition model; the specific structure will not be described again.
Referring to Figure 3, an embodiment of the present application further provides a computer device, whose internal structure may be as shown in Figure 3. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a storage medium and an internal memory. The storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the storage medium. The database of the computer device is used to store data related to the training method of the text recognition model. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a training method for a text recognition model.
The processor executes the above training method for a text recognition model, which includes the following steps:
obtaining a first image containing text information;
performing random amplification processing on the first image to obtain multiple second images;
marking the first image and the multiple second images as reference images;
obtaining the text features of the text information in each reference image, and calculating the similarity of the text features of every two reference images;
taking two reference images whose similarity is greater than a preset similarity threshold as a reference image pair, and inputting the reference image pair into a neural network model for training;
obtaining the training result of the trained neural network model, and judging whether the training result meets the requirements;
if so, using the trained neural network model as the text recognition model.
An embodiment of the present application further provides a computer-readable storage medium, which may be non-volatile or volatile, and on which a computer program is stored; when executed by a processor, the computer program implements a training method for a text recognition model, which includes the following steps:
obtaining a first image containing text information;
performing random amplification processing on the first image to obtain multiple second images;
marking the first image and the multiple second images as reference images;
obtaining the text features of the text information in each reference image, and calculating the similarity of the text features of every two reference images;
taking two reference images whose similarity is greater than a preset similarity threshold as a reference image pair, and inputting the reference image pair into a neural network model for training;
obtaining the training result of the trained neural network model, and judging whether the training result meets the requirements;
if so, using the trained neural network model as the text recognition model.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through a computer program, which can be stored in a computer-readable storage medium; when executed, the computer program may include the processes of the above method embodiments. Any reference to memory, storage, database or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
In summary, this application provides a training method, device, computer device and storage medium for a text recognition model: a first image containing text information is obtained; random amplification processing is performed on the first image to obtain multiple second images; the first image and the multiple second images are marked as reference images; the text features of the text information in each reference image are obtained; the similarity of the text features of every two reference images is calculated; two reference images whose similarity is greater than a preset similarity threshold are taken as a reference image pair; the reference image pairs are input into a neural network model for training; the training result of the trained neural network model is obtained and it is judged whether the training result meets the requirements; and, when the training result is judged to meet the requirements, the trained neural network model is used as the text recognition model. Data amplification thus increases the amount of training data, and training the neural network model with pairs of highly similar reference images allows the trained text recognition model to exploit the correlation between the training data, thereby improving the recognition accuracy of the text recognition model.

Claims (20)

  1. A training method for a text recognition model, wherein the training method comprises:
    obtaining a first image containing text information;
    performing random amplification processing on the first image to obtain multiple second images;
    marking the first image and the multiple second images as reference images;
    obtaining text features of the text information in each of the reference images, and calculating a similarity of the text features of every two of the reference images;
    taking two of the reference images whose similarity is greater than a preset similarity threshold as a reference image pair, and inputting the reference image pair into a neural network model for training;
    obtaining a training result of the trained neural network model, and judging whether the training result meets requirements;
    if so, using the trained neural network model as the text recognition model.
  2. The method according to claim 1, wherein the judging whether the training result meets the requirements comprises:
    calculating a loss value of the trained neural network model according to the training result and a preset loss function;
    judging whether the loss value is lower than a preset loss value;
    if so, judging that the training result meets the requirements;
    if not, judging that the training result does not meet the requirements.
  3. The method according to claim 2, wherein after the judging that the training result does not meet the requirements, the method further comprises:
    updating parameters of the neural network model based on the loss value, training the neural network model with the updated parameters on the reference image pair again until the training result meets the requirements, and outputting the trained text recognition model.
  4. The method according to claim 1, wherein after the using the trained neural network model as the text recognition model, the method further comprises:
    obtaining a target image to be recognized;
    inputting the target image into the text recognition model to obtain text information of the target image.
  5. The method according to claim 1, wherein the calculating the similarity of the text features of every two of the reference images comprises:
    converting the text features of each of the reference images into vector form to obtain a text vector of each of the reference images;
    calculating a cosine distance between the text vectors of every two of the reference images to obtain the similarity of the text features of the two reference images.
  6. The method according to claim 1, wherein the inputting the reference image pair into the neural network model for training comprises:
    randomly selecting one reference image from the reference image pair as a training image, and using the other reference image of the reference image pair as a verification image;
    inputting the training image into the neural network model for training;
    and the judging whether the training result meets the requirements comprises:
    verifying the trained neural network model according to the verification image, and judging that the training result does not meet the requirements if a verification result does not satisfy a preset iteration stop condition.
  7. The method according to claim 1, wherein the performing random amplification processing on the first image to obtain multiple second images comprises:
    performing, on the first image, at least one random amplification method among flipping, translating, scaling, rotating and adjusting weights of RGB channels of the image, to obtain multiple second images.
  8. A training device for a text recognition model, wherein the training device comprises:
    an acquisition module, configured to acquire a first image containing text information;
    an amplification processing module, configured to perform random amplification processing on the first image to obtain multiple second images;
    a marking module, configured to mark the first image and the multiple second images as reference images;
    a calculation module, configured to obtain text features of the text information in each of the reference images and calculate a similarity of the text features of every two of the reference images;
    an input module, configured to take two of the reference images whose similarity is greater than a preset similarity threshold as a reference image pair and input the reference image pair into a neural network model for training;
    a judgment module, configured to obtain a training result of the trained neural network model and judge whether the training result meets requirements;
    a determination module, configured to use the trained neural network model as the text recognition model when it is determined that the training result meets the requirements.
  9. A computer device, wherein the computer device comprises:
    a processor;
    a memory;
    wherein the memory stores a computer program, and when executing the computer program the processor implements a training method for a text recognition model, wherein the method comprises the following steps:
    obtaining a first image containing text information;
    performing random amplification processing on the first image to obtain multiple second images;
    marking the first image and the multiple second images as reference images;
    obtaining text features of the text information in each of the reference images, and calculating a similarity of the text features of every two of the reference images;
    taking two of the reference images whose similarity is greater than a preset similarity threshold as a reference image pair, and inputting the reference image pair into a neural network model for training;
    obtaining a training result of the trained neural network model, and judging whether the training result meets requirements;
    if so, using the trained neural network model as the text recognition model.
  10. The computer device according to claim 9, wherein the judging whether the training result meets the requirements comprises:
    calculating a loss value of the trained neural network model according to the training result and a preset loss function;
    judging whether the loss value is lower than a preset loss value;
    if so, judging that the training result meets the requirements;
    if not, judging that the training result does not meet the requirements.
  11. The computer device according to claim 10, wherein after the judging that the training result does not meet the requirements, the method further comprises:
    updating parameters of the neural network model based on the loss value, training the neural network model with the updated parameters on the reference image pair again until the training result meets the requirements, and outputting the trained text recognition model.
  12. The computer device according to claim 9, wherein after the using the trained neural network model as the text recognition model, the method further comprises:
    obtaining a target image to be recognized;
    inputting the target image into the text recognition model to obtain text information of the target image.
  13. The computer device according to claim 9, wherein the calculating the similarity of the text features of every two of the reference images comprises:
    converting the text features of each of the reference images into vector form to obtain a text vector of each of the reference images;
    calculating a cosine distance between the text vectors of every two of the reference images to obtain the similarity of the text features of the two reference images.
  14. The computer device according to claim 9, wherein the inputting the reference image pair into the neural network model for training comprises:
    randomly selecting one reference image from the reference image pair as a training image, and using the other reference image of the reference image pair as a verification image;
    inputting the training image into the neural network model for training;
    and the judging whether the training result meets the requirements comprises:
    verifying the trained neural network model according to the verification image, and judging that the training result does not meet the requirements if a verification result does not satisfy a preset iteration stop condition.
  15. The computer device according to claim 9, wherein the performing random amplification processing on the first image to obtain multiple second images comprises:
    performing, on the first image, at least one random amplification method among flipping, translating, scaling, rotating and adjusting weights of RGB channels of the image, to obtain multiple second images.
  16. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor the computer program implements a training method for a text recognition model, wherein the method comprises the following steps:
    obtaining a first image containing text information;
    performing random amplification processing on the first image to obtain multiple second images;
    marking the first image and the multiple second images as reference images;
    obtaining text features of the text information in each of the reference images, and calculating a similarity of the text features of every two of the reference images;
    taking two of the reference images whose similarity is greater than a preset similarity threshold as a reference image pair, and inputting the reference image pair into a neural network model for training;
    obtaining a training result of the trained neural network model, and judging whether the training result meets requirements;
    if so, using the trained neural network model as the text recognition model.
  17. The computer-readable storage medium according to claim 16, wherein the judging whether the training result meets the requirements comprises:
    calculating a loss value of the trained neural network model according to the training result and a preset loss function;
    judging whether the loss value is lower than a preset loss value;
    if so, judging that the training result meets the requirements;
    if not, judging that the training result does not meet the requirements.
  18. The computer-readable storage medium according to claim 17, wherein after the judging that the training result does not meet the requirements, the method further comprises:
    updating parameters of the neural network model based on the loss value, training the neural network model with the updated parameters on the reference image pair again until the training result meets the requirements, and outputting the trained text recognition model.
  19. The computer-readable storage medium according to claim 16, wherein after the using the trained neural network model as the text recognition model, the method further comprises:
    obtaining a target image to be recognized;
    inputting the target image into the text recognition model to obtain text information of the target image.
  20. The computer-readable storage medium according to claim 16, wherein the calculating the similarity of the text features of every two of the reference images comprises:
    converting the text features of each of the reference images into vector form to obtain a text vector of each of the reference images;
    calculating a cosine distance between the text vectors of every two of the reference images to obtain the similarity of the text features of the two reference images.
PCT/CN2022/090160 2022-03-15 2022-04-29 Method and apparatus for training text recognition model, and computer device and storage medium WO2023173546A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210253870.4 2022-03-15
CN202210253870.4A CN114724162A (en) 2022-03-15 2022-03-15 Training method and device of text recognition model, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023173546A1 true WO2023173546A1 (en) 2023-09-21

Family

ID=82238595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090160 WO2023173546A1 (en) 2022-03-15 2022-04-29 Method and apparatus for training text recognition model, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN114724162A (en)
WO (1) WO2023173546A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117457032A (en) * 2023-12-25 2024-01-26 山东万里红信息技术有限公司 Storage medium destroying method based on volume identification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376352A (en) * 2018-08-28 2019-02-22 中山大学 A kind of patent text modeling method based on word2vec and semantic similarity
CN111401375A (en) * 2020-03-09 2020-07-10 苏宁云计算有限公司 Text recognition model training method, text recognition device and text recognition equipment
US20210295162A1 (en) * 2019-01-04 2021-09-23 Ping An Technology(Shenzhen)Co.,Ltd. Neural network model training method and apparatus, computer device, and storage medium
CN114005012A (en) * 2021-11-05 2022-02-01 北京市商汤科技开发有限公司 Training method, device, equipment and storage medium of multi-mode pre-training model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104510B (en) * 2019-11-15 2023-05-09 南京中新赛克科技有限责任公司 Text classification training sample expansion method based on word embedding
CN112818975A (en) * 2021-01-27 2021-05-18 北京金山数字娱乐科技有限公司 Text detection model training method and device and text detection method and device
CN114036907A (en) * 2021-11-18 2022-02-11 国网江苏省电力有限公司电力科学研究院 Text data amplification method based on domain features

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376352A (en) * 2018-08-28 2019-02-22 中山大学 A kind of patent text modeling method based on word2vec and semantic similarity
US20210295162A1 (en) * 2019-01-04 2021-09-23 Ping An Technology(Shenzhen)Co.,Ltd. Neural network model training method and apparatus, computer device, and storage medium
CN111401375A (en) * 2020-03-09 2020-07-10 苏宁云计算有限公司 Text recognition model training method, text recognition device and text recognition equipment
CN114005012A (en) * 2021-11-05 2022-02-01 北京市商汤科技开发有限公司 Training method, device, equipment and storage medium of multi-mode pre-training model

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117457032A (en) * 2023-12-25 2024-01-26 山东万里红信息技术有限公司 Storage medium destroying method based on volume identification
CN117457032B (en) * 2023-12-25 2024-03-22 山东万里红信息技术有限公司 Storage medium destroying method based on volume identification

Also Published As

Publication number Publication date
CN114724162A (en) 2022-07-08

Similar Documents

Publication Publication Date Title
WO2020098074A1 (en) Face sample picture marking method and apparatus, computer device, and storage medium
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
US20230119593A1 (en) Method and apparatus for training facial feature extraction model, method and apparatus for extracting facial features, device, and storage medium
CN110866530A (en) Character image recognition method and device and electronic equipment
CN111079841A (en) Training method and device for target recognition, computer equipment and storage medium
CN111476268A (en) Method, device, equipment and medium for training reproduction recognition model and image recognition
CN111191695A (en) Website picture tampering detection method based on deep learning
WO2021164481A1 (en) Neural network model-based automatic handwritten signature verification method and device
US20170185913A1 (en) System and method for comparing training data with test data
US11893773B2 (en) Finger vein comparison method, computer equipment, and storage medium
US20230215125A1 (en) Data identification method and apparatus
CN113723070A (en) Text similarity model training method, text similarity detection method and text similarity detection device
CN113806613B (en) Training image set generation method, training image set generation device, computer equipment and storage medium
WO2023173546A1 (en) Method and apparatus for training text recognition model, and computer device and storage medium
CN117593752B (en) PDF document input method, PDF document input system, storage medium and electronic equipment
CN113283388B (en) Training method, device, equipment and storage medium of living body face detection model
CN111898544B (en) Text image matching method, device and equipment and computer storage medium
CN114328942A (en) Relationship extraction method, apparatus, device, storage medium and computer program product
Machado et al. Improving face detection
CN114519416A (en) Model distillation method and device and electronic equipment
CN113516148A (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN114283429A (en) Material work order data processing method, device, equipment and storage medium
Li et al. Unsupervised steganalysis over social networks based on multi-reference sub-image sets
US20240176951A1 (en) Electronic document validation
US11631267B1 (en) Systems and methods for utilizing a tiered processing scheme

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931573

Country of ref document: EP

Kind code of ref document: A1