WO2023173546A1 - Method and apparatus for training a text recognition model, and computer device and storage medium - Google Patents

Method and apparatus for training a text recognition model, and computer device and storage medium

Info

Publication number
WO2023173546A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
image
text
neural network
images
Application number
PCT/CN2022/090160
Inventor
郑喜民
朱翌
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2023173546A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • This application relates to the natural language processing field of artificial intelligence technology. It specifically concerns a training method, apparatus, computer device and storage medium for a text recognition model.
  • Text recognition uses image processing to identify the text content in an image.
  • Text recognition is used in many fields. Examples include sorting letters and parcels, editing and proofreading manuscripts, and summarizing and analyzing large numbers of statistical reports and cards. It is also used for processing bank checks, compiling statistics on commodity invoices, identifying commodity codes, commodity warehouse management, document retrieval, identifying various documents, and office automation for financial bill processing. It helps users enter information quickly and improves work efficiency across many industries.
  • CRNN: Convolutional Recurrent Neural Network
  • This model works in three stages. A Convolutional Neural Network (CNN) first extracts a feature sequence from the input image. A Recurrent Neural Network (RNN) then predicts the label distribution of the feature sequence produced by the convolutional layers. Finally, Connectionist Temporal Classification (CTC) converts the label distribution output by the recurrent layers into the final recognition result through operations such as de-duplication and merging.
  • CNN: Convolutional Neural Network
  • RNN: Recurrent Neural Network
  • CTC: Connectionist Temporal Classification
  • However, the performance of a convolutional neural network depends heavily on the training data. The more diverse the training data and the larger its volume, the better the trained model performs. Conversely, the smaller the amount of training data, the lower the recognition accuracy of the trained text recognition model.
  • The main purpose of this application is to provide a training method, apparatus, computer device and storage medium for a text recognition model. The aim is to increase the amount of training data and thereby improve the recognition accuracy of the text recognition model.
  • This application provides a training method for a text recognition model, which includes:
  • This application also provides a training apparatus for a text recognition model, which includes:
  • an acquisition module, configured to acquire a first image containing text information;
  • an amplification processing module, configured to perform random amplification processing on the first image to obtain multiple second images;
  • a marking module, configured to mark the first image and the multiple second images as reference images;
  • a calculation module, configured to obtain the text features of the text information in each reference image and calculate the similarity between the text features of every two reference images;
  • an input module, configured to take two reference images whose similarity is greater than a preset similarity threshold as a reference image pair, and input the reference image pair into a neural network model for training;
  • a judgment module, configured to obtain the training results after the neural network model is trained and judge whether the training results meet the requirements;
  • a determination module, configured to use the trained neural network model as the text recognition model when it is determined that the training results meet the requirements.
  • This application also provides a computer device, including a memory and a processor.
  • The memory stores a computer program.
  • When the processor executes the computer program, it implements a training method for a text recognition model, wherein the method includes the following steps:
  • This application also provides a computer-readable storage medium.
  • A computer program is stored on the computer-readable storage medium.
  • When the computer program is executed by a processor, it implements a training method for a text recognition model, wherein the method includes the following steps:
  • The training method, apparatus, computer device and storage medium for a text recognition model provided by this application can improve the recognition accuracy of the text recognition model.
  • FIG. 1 is a schematic flowchart of a training method for a text recognition model according to an embodiment of this application;
  • FIG. 2 is a schematic structural block diagram of a training apparatus for a text recognition model according to an embodiment of this application;
  • FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
  • This application proposes a training method for a text recognition model.
  • The embodiments of this application can acquire and process relevant data based on artificial intelligence technology.
  • Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence. Such systems perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, and operation/interaction systems.
  • AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
  • The text recognition model training method proposed in this application uses a server as the execution subject.
  • The server may be an independent server. Alternatively, it may be a cloud server that provides basic cloud computing services, such as:
    cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services and cloud communications;
    middleware services, domain name services, security services and content delivery network (CDN);
    big data and artificial intelligence platforms.
  • CDN: Content Delivery Network
  • The training method of the text recognition model includes:
  • The object recognized by the text recognition model of this application is an image containing text information. The model recognizes the text information in the image to provide image text recognition.
  • The first image containing text information obtained in this embodiment may be an image uploaded by a user.
  • The user may obtain it by scanning a paper document, or a document on another medium, that contains text information. It may also be a screenshot of mobile phone screen content, etc.
  • After the first image containing text information is acquired, it may also be preprocessed, for example by adjusting image parameters such as its size, brightness and sharpness.
  • The first image is usually a color image containing multiple colors.
  • The characters of the text information are mostly in colors with relatively low brightness values.
  • The first image is therefore binarized using brightness as the criterion. This converts it into a black-and-white image that highlights the text information and avoids interference from the colors in the first image.
  • Specifically, the server obtains the color brightness values in the first image and compares them with a preset color brightness value. The comparison result states whether each brightness value in the first image is greater than, equal to, or less than the preset color brightness value.
  • According to the comparison result, the parts of the first image whose brightness value is greater than the preset color brightness value are converted to white, and the remaining parts are converted to black. This makes it easier to extract each character of the text information in the first image.
  • The preset color brightness value can be adjusted as needed.
  • The server in this embodiment can also determine the background color of the first image. A first image with a black background and white text is converted into one with a white background and black text, that is, white-on-black text images become black-on-white text images.
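The binarization and background-normalization steps above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, none of which come from the source:

- pixels are (R, G, B) tuples in 0-255;
- "brightness" is ITU-R BT.601 luma;
- the preset brightness value defaults to an arbitrary 128;
- the background is taken to be whichever of black or white covers most pixels.

```python
# Brightness-threshold binarization plus background normalization.
# Assumptions (not from the source): RGB tuples in 0-255, BT.601 luma as
# "brightness", an arbitrary default preset value of 128, and "background"
# meaning whichever of black/white covers the majority of pixels.

WHITE, BLACK = 255, 0

def luma(pixel):
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def binarize(image, preset_brightness=128):
    """Brighter than the preset value -> white; equal or darker -> black."""
    return [[WHITE if luma(p) > preset_brightness else BLACK for p in row]
            for row in image]

def normalize_background(binary):
    """Invert a white-text-on-black image so the text becomes black on white."""
    pixels = [v for row in binary for v in row]
    if sum(v == BLACK for v in pixels) > len(pixels) / 2:
        return [[WHITE if v == BLACK else BLACK for v in row] for row in binary]
    return binary

# Mostly dark background with one bright "text" pixel.
dark_doc = [[(10, 10, 10), (12, 8, 9)],
            [(11, 11, 11), (250, 250, 250)]]
print(normalize_background(binarize(dark_doc)))  # [[255, 255], [255, 0]]
```

Pixels exactly equal to the preset value become black, following the "greater than becomes white, otherwise black" rule in the text.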
  • Random amplification is a data expansion (data augmentation) method. It increases the number of samples in the training set, which effectively alleviates model overfitting and also gives the model stronger generalization ability.
  • The purpose of random amplification processing is to make the training data as close as possible to the test data, thereby improving prediction accuracy.
  • Random amplification processing can also force the network to learn more robust features, giving the model stronger generalization capability.
  • This embodiment performs random amplification processing on the first image using methods such as enlarging, shrinking, cropping, brightness adjustment and saturation adjustment.
  • A single random amplification method may be used, or several methods may be combined, to finally obtain multiple second images.
  • The image amplification technique of this embodiment has a positive effect on object detection in deep learning. It increases the amount of data in each category, balances the categories, and avoids overfitting caused by sample imbalance. It can also reduce, to some extent, the amount of sample data that has to be collected up front.
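The random amplification step above can be pictured as applying a randomly chosen chain of simple operations to the first image, once per desired second image. This is a hypothetical sketch, not the patent's procedure. The following are all assumptions:

- the operation set (enlarge, shrink, crop, brightness);
- the parameter ranges;
- the grayscale list-of-lists image format.

Saturation adjustment is omitted because it needs a color image.

```python
import random

# Hypothetical sketch of random amplification: each second image is produced
# by a random chain of 1-3 operations. Images are grayscale lists of rows
# with values 0-255; the operations and ranges are illustrative assumptions.

def resize_nearest(img, factor):
    """Nearest-neighbour resize by a scale factor (enlarge if > 1, shrink if < 1)."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(nw)] for y in range(nh)]

def random_crop(img, rng, keep=0.8):
    """Crop a random window covering `keep` of each dimension."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * keep)), max(1, int(w * keep))
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]

def adjust_brightness(img, delta):
    """Add delta to every pixel, clamped to 0-255."""
    return [[min(255, max(0, v + delta)) for v in row] for row in img]

def make_ops(rng):
    return [
        lambda im: resize_nearest(im, rng.uniform(1.1, 1.5)),    # enlarge
        lambda im: resize_nearest(im, rng.uniform(0.6, 0.9)),    # shrink
        lambda im: random_crop(im, rng),                         # crop
        lambda im: adjust_brightness(im, rng.randint(-40, 40)),  # brightness
    ]

def random_amplify(first_image, n, seed=0):
    """Return n second images, each from a random chain of 1-3 operations."""
    rng = random.Random(seed)
    ops = make_ops(rng)
    second_images = []
    for _ in range(n):
        img = first_image
        for op in rng.sample(ops, rng.randint(1, 3)):
            img = op(img)
        second_images.append(img)
    return second_images

first = [[(x * 16 + y) % 256 for x in range(12)] for y in range(8)]
seconds = random_amplify(first, n=5, seed=42)
print([(len(im), len(im[0])) for im in seconds])
```

Using a seeded `random.Random` makes each amplification run reproducible, which helps when the resulting data set has to be regenerated or audited.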
  • This embodiment marks the first image and the multiple second images as reference images and generates a data set containing all the reference images. It then obtains, from the data set, the text features of the text information in each reference image.
  • The similarity between the text features of every two reference images is then calculated. Specifically, the text position information of the text information can be identified in each reference image. The reference image is then corrected according to the text position information to obtain a corrected reference image.
  • The encoding network of the recognition model performs feature extraction on the text information of the corrected reference image to obtain text features. A vector space model for calculating the similarity between the text features of every two reference images is then built from the word features contained in those text features.
  • The vector space model represents the word features of each of the two reference images as word vectors.
  • Using the cosine distance algorithm, the cosine of the angle between the word vectors of the two reference images is calculated. This cosine value is used as the similarity of their text features.
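A minimal vector space model sketch of the similarity calculation above follows. It assumes the "word features" of each reference image are already available as lists of word tokens, and it uses raw term counts over the two texts' shared vocabulary as the vectors. The encoding network that would produce those features is not modeled here.

```python
import math
from collections import Counter

# Vector space model + cosine similarity. Assumption: each reference image's
# word features are a list of word tokens; vectors are raw term counts.

def to_vectors(words_a, words_b):
    """Build term-frequency vectors for two texts over their shared vocabulary."""
    vocab = sorted(set(words_a) | set(words_b))
    ca, cb = Counter(words_a), Counter(words_b)
    return [ca[w] for w in vocab], [cb[w] for w in vocab]

def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def text_similarity(words_a, words_b):
    return cosine(*to_vectors(words_a, words_b))

print(text_similarity("total amount due".split(), "amount due total".split()))  # ≈ 1.0
print(text_similarity(["invoice", "date"], ["invoice", "total"]))             # ≈ 0.5
```

Term counts ignore word order, so two augmented copies of the same text score close to 1.0 even if cropping reordered nothing but the layout.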
  • The text position information may be the position of a text box containing the text information in the reference image. For example, a text area containing the text information is identified in the reference image, and the position of that text area is used as the text position information.
  • Alternatively, a text area containing content is identified in the reference image, and the position of the corresponding virtual text box within the whole reference image is calculated. That position is used as the text position information of the text information.
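One simple way to obtain a virtual text box from a binarized image is to take the smallest rectangle enclosing all text pixels. Cropping to that rectangle then gives a very basic form of correction. This is a hypothetical simplification: the source does not say how the box is found or what the correction involves (deskewing, for instance, is not shown). The sketch assumes black (0) text on a white background.

```python
# Virtual text box = smallest rectangle enclosing all text pixels in a
# binarized image (assumed black text, value 0, on white, value 255).

def text_bbox(binary, text_value=0):
    """Return (top, left, bottom, right) of the text pixels, with exclusive
    bottom/right, or None if the image contains no text pixels."""
    rows = [y for y, row in enumerate(binary) if text_value in row]
    if not rows:
        return None
    cols = [x for row in binary for x, v in enumerate(row) if v == text_value]
    return rows[0], min(cols), rows[-1] + 1, max(cols) + 1

def crop_to_text(binary, text_value=0):
    """Crop the image to its virtual text box (unchanged if no text is found)."""
    box = text_bbox(binary, text_value)
    if box is None:
        return binary
    top, left, bottom, right = box
    return [row[left:right] for row in binary[top:bottom]]

img = [[255, 255, 255, 255],
       [255,   0, 255, 255],
       [255,   0,   0, 255],
       [255, 255, 255, 255]]
print(text_bbox(img))     # (1, 1, 3, 3)
print(crop_to_text(img))  # [[0, 255], [0, 0]]
```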
  • Two reference images whose similarity is greater than the preset similarity threshold are taken as a reference image pair. Each reference image pair serves as one item of training data, and the pairs are input into the neural network model for training. This lets the trained text recognition model exploit the correlation between training data, which improves its recognition accuracy.
  • The preset similarity threshold can be customized, for example set to 0.9.
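Selecting reference image pairs amounts to checking every unordered pair of reference images against the threshold. The minimal sketch below assumes each image's text features are already numeric vectors and uses cosine similarity. The value 0.9 is the example threshold from the text, and the comparison is strictly "greater than", as stated there.

```python
import math
from itertools import combinations

# Pair selection sketch: features are assumed numeric vectors per reference image.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_reference_pairs(features, similarity=cosine, threshold=0.9):
    """Indices (i, j), i < j, of reference images whose similarity exceeds the threshold."""
    return [(i, j) for i, j in combinations(range(len(features)), 2)
            if similarity(features[i], features[j]) > threshold]

features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(build_reference_pairs(features))  # [(0, 1)]
```

The number of comparisons grows quadratically with the number of reference images. That is fine for the handful of second images produced from one first image, but would need indexing or batching for large data sets.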
  • In addition, this application considers introducing a blockchain structure to make full use of blockchain properties (for example, data on the blockchain cannot be tampered with). Before training, the training data is uploaded to the blockchain for evidence storage. During training, the data associated with the training process is also uploaded to the blockchain for evidence storage. If needed later, a triggered supervision server can retrieve the relevant data saved on the blockchain and trace it back to reconstruct the training process. It can then detect, from the reconstructed process, whether any risky behavior occurred during training. This protects the data security of the data provider and improves the security and credibility of the training process.
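The tamper-evidence property that this step relies on can be illustrated with a toy append-only hash chain. It uses only Python's standard `hashlib` and `json`. This is a hypothetical illustration, not a blockchain, since it has no distribution, consensus or network. It is also not the API of any particular blockchain platform, and the record fields are made up. Each block stores the hash of the previous block, so editing any earlier record breaks verification.

```python
import hashlib
import json

GENESIS = "0" * 64

class HashChainLedger:
    """Toy append-only hash chain illustrating tamper evidence.
    Single process only: no consensus, no network, no persistence."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _digest(prev_hash, payload):
        body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, payload):
        """Record a JSON-serializable payload and return its block hash."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block = {"prev": prev_hash, "payload": payload,
                 "hash": self._digest(prev_hash, payload)}
        self.blocks.append(block)
        return block["hash"]

    def verify(self):
        """Recompute every hash; any edited payload or broken link returns False."""
        prev_hash = GENESIS
        for block in self.blocks:
            if block["prev"] != prev_hash:
                return False
            if block["hash"] != self._digest(prev_hash, block["payload"]):
                return False
            prev_hash = block["hash"]
        return True

ledger = HashChainLedger()
ledger.append({"stage": "before-training", "dataset_digest": "example-digest"})
ledger.append({"stage": "training", "epoch": 1, "loss": 0.12})
print(ledger.verify())  # True
```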
  • This embodiment can set iteration conditions for the neural network model.
  • The iteration conditions include the number of training iterations, the training duration, etc.
  • When the iteration conditions are met, training ends and the training results of the neural network model are obtained.
  • These training results are used to judge whether the requirements are met.
  • If they are, the trained neural network model is used as the text recognition model to identify text information in images.
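The iteration conditions above (a maximum number of training rounds and/or a maximum training duration) can be checked once per round. This is a minimal sketch. The parameter names `max_epochs` and `max_seconds` are hypothetical, and `step_fn` stands in for one round of training.

```python
import time

def iteration_condition_met(epoch, start_time, max_epochs=None, max_seconds=None, now=None):
    """True once either configured limit (rounds or elapsed time) is reached."""
    now = time.monotonic() if now is None else now
    if max_epochs is not None and epoch >= max_epochs:
        return True
    if max_seconds is not None and now - start_time >= max_seconds:
        return True
    return False

def train(step_fn, max_epochs=None, max_seconds=None):
    """Run step_fn(epoch) until an iteration condition is met; collect its results."""
    if max_epochs is None and max_seconds is None:
        raise ValueError("set at least one iteration condition")
    start = time.monotonic()
    epoch = 0
    results = []
    while not iteration_condition_met(epoch, start, max_epochs, max_seconds):
        results.append(step_fn(epoch))
        epoch += 1
    return results

print(train(lambda epoch: epoch * epoch, max_epochs=4))  # [0, 1, 4, 9]
```

`time.monotonic()` is used rather than wall-clock time so that system clock adjustments cannot end or extend training unexpectedly.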
  • The training results may include the recognized text information of each reference image in a reference image pair. This is marked as the target text information of that reference image.
  • This embodiment can calculate the similarity between the target text information of the two reference images in the pair to obtain a predicted similarity. It then determines whether the predicted similarity is consistent with the similarity of the corresponding text features. If it is, the trained neural network model is used as the text recognition model to accurately identify text information in images.
  • The training method for a text recognition model provided by this embodiment works as follows. It acquires a first image containing text information and performs random amplification processing on it to obtain multiple second images. It marks the first image and the multiple second images as reference images and obtains the text features of the text information in each reference image. It then calculates the similarity between the text features of every two reference images and takes two reference images whose similarity is greater than the preset similarity threshold as a reference image pair.
  • The reference image pair is input into the neural network model for training. After training, the training results are obtained and it is judged whether they meet the requirements.
  • If they do, the trained neural network model is used as the text recognition model. Data amplification processing increases the amount of training data, which improves the recognition accuracy of the text recognition model. In addition, training the neural network model with pairs of highly similar reference images lets the trained text recognition model exploit the correlation between training data, which further improves its recognition accuracy.
  • Determining whether the training results meet the requirements may specifically include:
  • A preset cross-entropy loss function can be used to calculate the loss value of the neural network model after each round of training. When the loss value reaches or falls below the preset loss value, the training results of the neural network model meet the requirements. This means the model meets the training requirements, training of the text recognition model is complete, and its text recognition accuracy is improved.
  • The cross-entropy loss function measures how far the predicted values of the text recognition model are from the true values. The smaller the loss, the better the text recognition model performs.
  • The cross-entropy loss function is commonly used in classification problems, especially when neural networks are used for classification.
  • Because cross-entropy involves computing the probability of each category, it almost always appears together with the sigmoid (or softmax) function.
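The pairing of softmax with cross-entropy, and the check against a preset loss value, can be sketched as follows. This is a minimal single-label illustration, not the model's actual loss code. The batch format of (logits, target index) tuples is an assumption, and 0.005 is the example preset loss value given later in the text. The log-sum-exp form avoids overflow for large logits.

```python
import math

def softmax(logits):
    """Class probabilities; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """-log softmax(logits)[target], computed via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]

def mean_cross_entropy(batch):
    """Average loss over (logits, target_index) samples."""
    return sum(cross_entropy(logits, t) for logits, t in batch) / len(batch)

def meets_requirements(batch, preset_loss=0.005):
    """Training result acceptable once the mean loss reaches or falls below the preset value."""
    return mean_cross_entropy(batch) <= preset_loss

print(round(cross_entropy([0.0, 0.0], 0), 4))  # 0.6931 (= ln 2, a 50/50 guess)
```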
  • The loss function in this embodiment is not specifically limited. For example, it may also be a mean squared error function, a covariance function, etc.
  • The preset loss value in this embodiment can be set according to the actual situation. It differs from the loss threshold used when the text recognition model is finally trained, and is generally greater than that final threshold. For example, if the loss threshold for the final training of the text recognition model is 0.002, the preset loss value here should be greater than 0.002, for example 0.005.
  • The method further includes:
  • when the loss value of the text recognition model is not less than the preset loss value,
  • propagating the loss value through the neural network structure of the text recognition model;
  • adjusting the relevant parameters of the text recognition model;
  • inputting the reference image pairs again into the text recognition model with the adjusted parameters;
  • retraining the text recognition model until its loss value is less than the preset loss value.
  • Training of the text recognition model then ends, which yields a trained text recognition model whose training results meet the requirements.
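The adjust-and-retrain loop above can be sketched with a toy model standing in for the neural network. Here a single-feature logistic model is trained by gradient descent on binary cross-entropy. Rounds of parameter adjustment continue while the loss is not less than the preset loss value (0.005, the example from the text). The data, learning rate and `max_rounds` safety cap are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_loss(w, b, data):
    """Mean binary cross-entropy of the toy model p = sigmoid(w * x + b)."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train_until(data, preset_loss=0.005, lr=0.5, max_rounds=100_000):
    """Adjust parameters and retrain while loss >= preset_loss (or until the cap)."""
    w = b = 0.0
    loss = bce_loss(w, b, data)
    rounds = 0
    while loss >= preset_loss and rounds < max_rounds:
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # d(loss)/dz for sigmoid + cross-entropy
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
        loss = bce_loss(w, b, data)
        rounds += 1
    return w, b, loss, rounds

toy_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, loss, rounds = train_until(toy_data)
print(rounds, round(loss, 4))
```

The `max_rounds` cap is there so the loop still terminates if a model can never reach the preset value. In that case the caller can tell from the returned loss that the requirement was not met.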
  • The method may further include:
  • inputting the target image into the text recognition model to obtain the text information of the target image.
  • This embodiment obtains the target image to be recognized and inputs it into the text recognition model. The text information of the target image is then obtained from the model's output.
  • The target image to be recognized may be a text image uploaded by the user. It may also be a text image captured directly by the camera of the electronic device that performs the text recognition method.
  • The way the target image is acquired is not limited here. Because the text recognition model of this application does not require sample labeling, it can be obtained at a lower cost, and using it directly for text recognition is also inexpensive. Moreover, since no sample labeling is needed during training, recognition accuracy is no longer affected by the labeling method or limited by the number of training samples. A model trained on a large number of training samples has higher recognition accuracy and reliability. The text recognition model trained according to this application can therefore accurately identify the text information of the target image.
  • Calculating the similarity between the text features of every two reference images may specifically include:
  • A common method is to calculate the cosine distance between text features.
  • Cosine distance reflects how different two vectors are in space. Similar semantic relationships can be aggregated pairwise until all semantic relationships have been aggregated. The most densely aggregated semantic relationships are then selected as the semantic recognition result of the text features. For example, when most semantic relationships cluster in region A, the semantic relationship closest to the center of region A is selected as the semantic recognition result.
  • Specifically, the Word2Vec word vector model can be used to convert the text features of each reference image into word vectors, which gives a text vector for each reference image. The cosine distance between the text vectors of every two reference images is then calculated and used as their similarity.
  • Word2Vec is a model that learns semantic knowledge from large amounts of text in an unsupervised manner. Trained on a large corpus, it represents each word as a vector, called a word vector. The distance between the word vectors of two words reflects how closely the two words are related.
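The Word2Vec route above can be sketched by averaging the word vectors of each reference image's words into a text vector. The cosine of the angle between two text vectors then serves as the similarity. This is a hypothetical sketch: the 3-dimensional vectors below are made up, whereas a real system would look them up in a trained Word2Vec model. Averaging is one common way to turn word vectors into a text vector, assumed here because the source does not specify how they are combined.

```python
import math

# Made-up 3-d "word vectors" for illustration only; a trained Word2Vec model
# would supply real (much higher-dimensional) vectors.
TOY_WORD_VECTORS = {
    "invoice": [0.9, 0.1, 0.0],
    "bill":    [0.8, 0.2, 0.1],
    "total":   [0.1, 0.9, 0.2],
    "sum":     [0.2, 0.8, 0.3],
    "cat":     [0.0, 0.1, 0.9],
}

def text_vector(words, table=TOY_WORD_VECTORS):
    """Average the vectors of known words; None if no word is in the table."""
    vecs = [table[w] for w in words if w in table]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0

a = text_vector(["invoice", "total"])
print(cosine_similarity(a, text_vector(["bill", "sum"])))  # high: related words
print(cosine_similarity(a, text_vector(["cat"])))          # low: unrelated word
```

Unlike raw term counts, word vectors let texts with different but related words ("invoice"/"bill") still score as similar.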
  • Inputting the reference image pair into the neural network model for training may specifically include:
  • Determining whether the training results meet the requirements includes:
  • verifying the trained neural network model with the verification image. If the verification result does not meet the preset iteration stop condition, it is determined that the training results do not meet the requirements.
  • One reference image can be randomly selected from the reference image pair as the training image, and the other reference image in the pair is used as the verification image.
  • The training image is used to train the neural network model.
  • The verification result may indicate whether the predicted similarity is the same as, or different from, the similarity of the corresponding text features. For example, the model's output text for the training image can be compared with its output text for the verification image to obtain the predicted similarity. It is then determined whether the predicted similarity is consistent with the similarity of the corresponding text features. If it is, the trained neural network model is used as the text recognition model to accurately identify text information in images.
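The split-and-verify procedure above can be sketched as follows. One image of each pair is picked at random for training and the other for verification. The predicted similarity between the two recognized texts is then compared with the text-feature similarity. Two details are stand-ins, because the source specifies neither: `difflib.SequenceMatcher.ratio` is used as the text similarity measure, and "consistent" is taken to mean within an assumed tolerance of 0.05.

```python
import random
from difflib import SequenceMatcher

def split_pair(pair, rng):
    """Randomly choose one image of the pair for training, the other for verification."""
    a, b = pair
    return (a, b) if rng.random() < 0.5 else (b, a)

def predicted_similarity(text_a, text_b):
    # Stand-in measure: difflib's matching-character ratio in [0, 1].
    return SequenceMatcher(None, text_a, text_b).ratio()

def verification_passes(recognized_train, recognized_verify, feature_similarity, tolerance=0.05):
    """Consistent if the predicted similarity is within `tolerance` of the feature similarity."""
    pred = predicted_similarity(recognized_train, recognized_verify)
    return abs(pred - feature_similarity) <= tolerance

rng = random.Random(7)
train_img, verify_img = split_pair(("ref_0.png", "ref_1.png"), rng)
print(train_img, verify_img)
print(verification_passes("TOTAL 42.00", "TOTAL 42.00", feature_similarity=0.98))  # True
```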
  • Performing random amplification processing on the first image to obtain multiple second images may specifically include:
  • The random amplification processing methods include, but are not limited to, flipping, translating, scaling and rotating the image, and adjusting the weight of each RGB channel of the image.
  • For example, the first image can be flipped, and the flipped first image can then be enlarged to obtain a second image.
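Each of the listed operations can be written as a small pure function on an RGB image. The flip-then-enlarge example is then simply a composition of two of them. This is a minimal sketch under stated assumptions: images are lists of rows of (R, G, B) tuples, and "enlarge" means integer nearest-neighbour upscaling. The function names are hypothetical.

```python
# Images are lists of rows of (r, g, b) tuples (an assumption for this sketch).

def flip_horizontal(img):
    return [list(reversed(row)) for row in img]

def enlarge(img, k=2):
    """Integer nearest-neighbour enlargement: each pixel becomes a k x k block."""
    return [[p for p in row for _ in range(k)] for row in img for _ in range(k)]

def translate_right(img, dx, fill):
    """Shift every row right by dx pixels (0 <= dx <= width), padding with fill."""
    return [[fill] * dx + row[:len(row) - dx] for row in img]

def rotate_90_clockwise(img):
    return [list(row) for row in zip(*img[::-1])]

def _clamp(v):
    return max(0, min(255, round(v)))

def weight_channels(img, wr, wg, wb):
    """Scale the R, G and B channels by separate weights, clamped to 0-255."""
    return [[(_clamp(r * wr), _clamp(g * wg), _clamp(b * wb)) for r, g, b in row]
            for row in img]

first = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (100, 110, 120)]]
second = enlarge(flip_horizontal(first))  # flip, then enlarge, as in the example above
print(len(second), len(second[0]))  # 4 4
```

Because each operation returns a new image, any subset can be chained in any order to produce further second images.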
  • An embodiment of this application also provides a training apparatus for a text recognition model, including:
  • an acquisition module 11, configured to acquire a first image containing text information;
  • an amplification processing module 12, configured to perform random amplification processing on the first image to obtain multiple second images;
  • a marking module 13, configured to mark the first image and the multiple second images as reference images;
  • a calculation module 14, configured to obtain the text features of the text information in each reference image and calculate the similarity between the text features of every two reference images;
  • an input module 15, configured to take two reference images whose similarity is greater than the preset similarity threshold as a reference image pair, and input the reference image pair into the neural network model for training;
  • a judgment module 16, configured to obtain the training results after the neural network model is trained and judge whether the training results meet the requirements;
  • a determination module 17, configured to use the trained neural network model as the text recognition model when it is determined that the training results meet the requirements.
  • The object recognized by the text recognition model of this application is an image containing text information. The model recognizes the text information in the image to provide image text recognition.
  • The first image containing text information obtained in this embodiment may be an image uploaded by a user.
  • The user may obtain it by scanning a paper document, or a document on another medium, that contains text information. It may also be a screenshot of mobile phone screen content, etc.
  • After the first image containing text information is acquired, it may also be preprocessed, for example by adjusting image parameters such as its size, brightness and sharpness.
  • The first image is usually a color image containing multiple colors.
  • The characters of the text information are mostly in colors with relatively low brightness values.
  • The first image is therefore binarized using brightness as the criterion. This converts it into a black-and-white image that highlights the text information and avoids interference from the colors in the first image.
  • Specifically, the server obtains the color brightness values in the first image and compares them with a preset color brightness value. The comparison result states whether each brightness value in the first image is greater than, equal to, or less than the preset color brightness value.
  • According to the comparison result, the parts of the first image whose brightness value is greater than the preset color brightness value are converted to white, and the remaining parts are converted to black. This makes it easier to extract each character of the text information in the first image.
  • The preset color brightness value can be adjusted as needed.
  • The server in this embodiment can also determine the background color of the first image. A first image with a black background and white text is converted into one with a white background and black text, that is, white-on-black text images become black-on-white text images.
  • Random amplification processing is a data expansion (data augmentation) method. It increases the number of samples in the training set, which effectively alleviates model overfitting and also gives the model stronger generalization ability.
  • The purpose of random amplification processing is to make the training data as close as possible to the test data, thereby improving prediction accuracy.
  • Random amplification processing can also force the network to learn more robust features, giving the model stronger generalization capability.
  • This embodiment performs random amplification processing on the first image using methods such as enlarging, shrinking, cropping, brightness adjustment and saturation adjustment.
  • A single random amplification method may be used, or several methods may be combined, to finally obtain multiple second images.
  • The image amplification technique of this embodiment has a positive effect on object detection in deep learning. It increases the amount of data in each category, balances the categories, and avoids overfitting caused by sample imbalance. It can also reduce, to some extent, the amount of sample data that has to be collected up front.
  • This embodiment marks the first image and the multiple second images as reference images and generates a data set containing all the reference images. It obtains, from the data set, the text features of the text information in each reference image and calculates the similarity between the text features of every two reference images. Specifically, the text position information of the text information can be identified in each reference image. The reference image is then corrected according to that position information to obtain a corrected reference image. The encoding network of the recognition model performs feature extraction on the text information of the corrected reference image to obtain text features. A vector space model for calculating the similarity between the text features of every two reference images is then built from the word features those text features contain. The vector space model represents the word features of each of the two reference images as word vectors. Using the cosine distance algorithm, the cosine of the angle between the two word vectors is calculated and used as the similarity of the text features of the two reference images.
  • The text position information may be the position of a text box containing the text information in the reference image. For example, a text area containing the text information is identified in the reference image, and the position of that text area is used as the text position information.
  • Alternatively, a text area containing content is identified in the reference image, and the position of the corresponding virtual text box within the whole reference image is calculated. That position is used as the text position information of the text information.
  • the two reference images with a similarity greater than the preset similarity threshold are used as a reference image pair, the reference image pair is used as training data, and the reference image is
  • the input neural network model is trained so that the trained text recognition model can combine the correlation between the training data and improve the recognition accuracy of the text recognition model.
  • the preset similarity threshold can be customized, for example, set to 0.9.
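Pair selection with the example threshold of 0.9 could be sketched as follows. The code compares every two reference images and keeps the index pairs whose similarity strictly exceeds the threshold. The `similarity` callable is a placeholder for whichever text-feature similarity is used, such as the cosine value described earlier. The function name is hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_reference_pairs(
    references: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose similarity is strictly greater than the threshold."""
    pairs = []
    for i, j in combinations(range(len(references)), 2):  # every two reference images
        if similarity(references[i], references[j]) > threshold:
            pairs.append((i, j))
    return pairs
```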
  • This application can also introduce a blockchain structure to take full advantage of the relevant properties of a blockchain (for example, data stored on a blockchain cannot be tampered with).
  • For example, the training data can be uploaded to the blockchain for storage;
  • the associated data generated during training can likewise be uploaded to the blockchain for storage. When necessary, a triggered supervision server can then retrieve and trace the relevant data stored on the blockchain to reconstruct the training process, and it can use the reconstructed process to detect whether any risky behavior occurred during training. This protects the data security of the data provider and improves the security and credibility of the training process.
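The tamper-evidence property relied on here can be illustrated with a simplified hash-chained log rather than a real blockchain. Each training record stores the SHA-256 hash of its predecessor, so a supervisor can replay the records and detect any later modification. This is a hypothetical, single-process sketch. It omits consensus, distribution, and every other part of an actual blockchain platform, and the class name and record fields are assumptions.

```python
import copy
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64


def _entry_hash(record: Dict[str, Any], prev_hash: str) -> str:
    """SHA-256 over the record and the previous entry's hash (canonical JSON)."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TrainingLedger:
    """Append-only, hash-chained log of training records (simplified, not a real blockchain)."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, record: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        record = copy.deepcopy(record)  # later edits to the caller's dict do not leak in
        digest = _entry_hash(record, prev)
        self.entries.append({"record": record, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; a modified or reordered entry breaks the chain."""
        prev = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _entry_hash(entry["record"], prev):
                return False
            prev = entry["hash"]
        return True

    def replay(self) -> List[Dict[str, Any]]:
        """Records in order, e.g. for a supervision server reconstructing the training process."""
        return [copy.deepcopy(entry["record"]) for entry in self.entries]
```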
  • This embodiment can set iteration conditions for the neural network model.
  • The iteration conditions include the number of training iterations, the training duration, and the like.
  • When an iteration condition is met, the training is ended.
  • The training results of the neural network model after training are then obtained.
  • If the training results meet the requirements, the trained neural network model is used as the text recognition model to identify text information in images.
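A training loop that stops on either iteration condition (a maximum number of training rounds or a maximum training duration) could be sketched as follows. `train_one_epoch` represents one hypothetical pass over the reference image pairs and is assumed to return a loss. The injectable `clock` parameter exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, List, Optional


def train_until_condition(
    train_one_epoch: Callable[[int], float],
    max_epochs: int,
    max_seconds: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Run epochs until the epoch budget or the (optional) time budget is exhausted."""
    start = clock()
    losses: List[float] = []
    epoch = 0
    while epoch < max_epochs:  # iteration condition 1: number of training rounds
        if max_seconds is not None and clock() - start >= max_seconds:
            break  # iteration condition 2: training duration
        losses.append(train_one_epoch(epoch))
        epoch += 1
    return {"epochs_run": epoch, "losses": losses}
```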
  • The training results may include the recognized text information of each reference image in the reference image pair, which is marked as the target text information of that reference image.
  • This embodiment can calculate the similarity between the target text information of the two reference images in a reference image pair to obtain a predicted similarity, and then determine whether the predicted similarity is consistent with the similarity of the corresponding text features. If it is, the trained neural network model is used as the text recognition model to accurately identify text information in images.
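The consistency check could be sketched as follows. The source does not say how the similarity of two recognized strings is computed or what counts as consistent. This sketch therefore assumes `difflib.SequenceMatcher.ratio()` as the text similarity and an absolute tolerance of 0.05. Both choices, and the function names, are illustrative.

```python
from difflib import SequenceMatcher
from typing import Sequence, Tuple


def predicted_similarity(text_a: str, text_b: str) -> float:
    """Similarity of two recognized strings (assumed measure: SequenceMatcher ratio in [0, 1])."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def training_result_meets_requirement(
    recognized_pairs: Sequence[Tuple[str, str]],
    feature_similarities: Sequence[float],
    tolerance: float = 0.05,
) -> bool:
    """True if every pair's predicted similarity is within tolerance of its text-feature similarity."""
    if len(recognized_pairs) != len(feature_similarities):
        raise ValueError("need one feature similarity per reference image pair")
    return all(
        abs(predicted_similarity(a, b) - expected) <= tolerance
        for (a, b), expected in zip(recognized_pairs, feature_similarities)
    )
```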
  • Each component of the text recognition model training apparatus proposed in this application can implement the functions of any of the text recognition model training methods described above; its specific structure is not described again here.
  • An embodiment of the present application also provides a computer device, whose internal structure may be as shown in Figure 3.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the computer device is used to provide computing and control capabilities.
  • The memory of the computer device includes a storage medium and an internal memory.
  • The storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the storage medium.
  • The database of the computer device is used to store data related to the training method of the text recognition model.
  • The network interface of the computer device is used to communicate with external terminals through a network connection.
  • When executed by the processor, the computer program implements a method for training a text recognition model.
  • The processor executes the above text recognition model training method, which includes the following steps:
  • An embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium may be non-volatile or volatile.
  • A computer program is stored on it; when executed by a processor, the computer program implements a training method for a text recognition model, which includes the following steps:
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
  • In summary, this application provides a training method and apparatus for a text recognition model, a computer device, and a storage medium. The method acquires a first image containing text information and performs random amplification (augmentation) processing on the first image to obtain multiple second images.
  • The first image and the multiple second images are marked as reference images. The text features of the text information in each reference image are obtained, and the similarity between the text features of every two reference images is calculated. Two reference images whose similarity is greater than the preset similarity threshold are taken as a reference image pair, and the reference image pairs are input into the neural network model for training. The training results of the neural network model are then obtained, and it is determined whether they meet the requirements.
  • If so, the trained neural network model is used as the text recognition model.
  • Data amplification increases the volume of training data. Training the neural network model on pairs of highly similar reference images enables the resulting text recognition model to exploit the correlation between training data, thereby improving its recognition accuracy.
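The random amplification step that produces the second images is not specified in detail. The sketch below assumes a grayscale image stored as nested lists of 0-255 values. It applies a random horizontal shift (padding with white) and a random brightness offset to each copy. These transformations, the parameter names, and the defaults are assumptions chosen because they keep text legible; they are not the patent's method.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def _clip(value: int) -> int:
    return max(0, min(255, value))


def random_amplify(
    image: Image,
    num_copies: int,
    seed: Optional[int] = None,
    max_shift: int = 2,
    max_brightness: int = 30,
) -> List[Image]:
    """Create num_copies randomly shifted and brightness-adjusted variants of image."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    width = len(image[0]) if image else 0
    copies: List[Image] = []
    for _ in range(num_copies):
        shift = max(-width, min(width, rng.randint(-max_shift, max_shift)))
        delta = rng.randint(-max_brightness, max_brightness)
        variant: Image = []
        for row in image:
            if shift >= 0:  # move right, pad on the left with white background
                shifted = [255] * shift + row[: width - shift]
            else:  # move left, pad on the right
                shifted = row[-shift:] + [255] * (-shift)
            variant.append([_clip(p + delta) for p in shifted])
        copies.append(variant)
    return copies
```

For example, `references = [first] + random_amplify(first, 4, seed=7)` yields the first image plus four second images, which are then marked as reference images.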

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

This application relates to the technical field of natural language processing in artificial intelligence technology, and concerns a method and apparatus for training a text recognition model, a computer device, and a storage medium. The method comprises: performing random augmentation processing on a first image to obtain a plurality of second images; marking the first image and the plurality of second images as reference images; acquiring a text feature of the text information in each reference image, and calculating the similarity between the text features of each pair of reference images; taking two reference images whose similarity is greater than a preset similarity threshold as a reference image pair, and inputting the reference image pair into a neural network model for training; acquiring a training result after the neural network model is trained, and determining whether the training result meets a requirement; and, if so, taking the trained neural network model as the text recognition model. The volume of training data is thus increased by means of data augmentation processing, so that the recognition accuracy of the text recognition model is improved.
PCT/CN2022/090160 2022-03-15 2022-04-29 Method and apparatus for training text recognition model, and computer device and storage medium WO2023173546A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210253870.4 2022-03-15
CN202210253870.4A CN114724162A (zh) 2022-03-15 2022-03-15 Training method and apparatus for text recognition model, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2023173546A1 true WO2023173546A1 (fr) 2023-09-21

Family

ID=82238595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090160 WO2023173546A1 (fr) 2022-03-15 2022-04-29 Method and apparatus for training text recognition model, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN114724162A (fr)
WO (1) WO2023173546A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117457032A (zh) * 2023-12-25 2024-01-26 山东万里红信息技术有限公司 Storage medium destruction method based on volume recognition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376352A (zh) * 2018-08-28 2019-02-22 中山大学 Patent text modeling method based on word2vec and semantic similarity
CN111401375A (zh) * 2020-03-09 2020-07-10 苏宁云计算有限公司 Text recognition model training method, text recognition method, apparatus and device
US20210295162A1 (en) * 2019-01-04 2021-09-23 Ping An Technology(Shenzhen)Co.,Ltd. Neural network model training method and apparatus, computer device, and storage medium
CN114005012A (zh) * 2021-11-05 2022-02-01 北京市商汤科技开发有限公司 Training method, apparatus, device and storage medium for multimodal pre-trained model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
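CN102915437A (zh) * 2011-08-02 2013-02-06 腾讯科技(深圳)有限公司 Text information recognition method and system
CN109979439B (zh) * 2019-03-22 2021-01-29 泰康保险集团股份有限公司 Blockchain-based speech recognition method, apparatus, medium and electronic device
CN111104510B (zh) * 2019-11-15 2023-05-09 南京中新赛克科技有限责任公司 Word-embedding-based method for expanding text classification training samples
CN112818975B (zh) * 2021-01-27 2024-09-24 北京金山数字娱乐科技有限公司 Text detection model training method and apparatus, text detection method and apparatus
CN114036907B (zh) * 2021-11-18 2024-06-25 国网江苏省电力有限公司电力科学研究院 Text data augmentation method based on domain features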

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117457032A (zh) * 2023-12-25 2024-01-26 山东万里红信息技术有限公司 Storage medium destruction method based on volume recognition
CN117457032B (zh) * 2023-12-25 2024-03-22 山东万里红信息技术有限公司 Storage medium destruction method based on volume recognition

Also Published As

Publication number Publication date
CN114724162A (zh) 2022-07-08

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22931573

Country of ref document: EP

Kind code of ref document: A1