WO2023078070A1 - Character recognition method and apparatus, device, medium and product

Character recognition method and apparatus, device, medium and product

Info

Publication number
WO2023078070A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
character recognition
sequence
discriminator
recognition model
Prior art date
Application number
PCT/CN2022/125603
Other languages
English (en)
Chinese (zh)
Inventor
范湉湉
黄灿
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023078070A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the present application relates to the technical field of artificial intelligence (AI), and in particular to a character recognition method, device, equipment, computer-readable storage medium, and computer program product.
  • OCR (optical character recognition) refers to the process of analyzing and recognizing images to obtain text information. OCR usually includes text detection and text recognition. Text detection refers to detecting the text area in the image, and text recognition refers to identifying the text area to obtain text information.
  • Text recognition based on deep learning is mainly divided into a connectionist temporal classification (CTC) method represented by a convolutional recurrent neural network (CRNN) and an attention method represented by a transformer.
  • Both the CTC method and the attention method use an autoregressive structure, specifically using the generated characters to predict the character at the next position.
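  • For illustration only, this autoregressive prediction can be sketched as a greedy decoding loop in which each character is predicted from the characters generated so far; the `decoder_step` callable and the token ids below are assumptions for the sketch, not details of the present application:

```python
import torch

def greedy_autoregressive_decode(decoder_step, memory, bos_id, eos_id, max_len=64):
    # Sketch of autoregressive decoding: the character at the next position is
    # predicted from the previously generated characters (hypothetical
    # decoder_step API returning logits of shape (1, vocab_size)).
    tokens = [bos_id]
    for _ in range(max_len):
        logits = decoder_step(torch.tensor([tokens]), memory)
        next_id = int(logits.argmax(dim=-1).item())
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # generated character ids, without the BOS marker
```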
  • However, the recognition model in the above methods usually uses a large model architecture and is trained on a massive data set, which improves the expressive ability of the recognition model.
  • A recognition model with high expressive ability may, owing to its autoregressive property, recognize a blank area in the image as a sentence from the training data, which exposes the privacy of the training data on the one hand and reduces the recognition accuracy on the other.
  • the purpose of the present application is to provide a character recognition method, apparatus, device, computer-readable storage medium and computer program product, which can improve the accuracy of character recognition and avoid exposure of training data.
  • the present application provides a character recognition method, the method comprising:
  • the image is recognized by a character recognition model to obtain a sequence of recognition results; wherein the character recognition model includes an encoder and a decoder, and the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function.
  • a discriminator is used to discriminate the recognition result sequence, and when the discrimination is passed, the recognition result sequence is output.
  • the present application provides a character recognition device, characterized in that the device includes:
  • a communication module, configured to acquire an image to be recognized;
  • a recognition module, configured to recognize the image through a character recognition model to obtain a recognition result sequence; wherein the character recognition model includes an encoder and a decoder, and the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function;
  • the discrimination module is configured to use a discriminator to discriminate the recognition result sequence, and output the recognition result sequence when the discrimination is passed.
  • the present application provides an electronic device, including:
  • a storage device on which a computer program is stored;
  • a processing device configured to execute the computer program in the storage device to implement the steps of the method described in any one of the first aspect or the second aspect of the present application.
  • the present application provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in any one of the first aspect or the second aspect of the present application are implemented.
  • the present application provides a computer program product containing instructions, which, when run on a device, causes the device to execute the method described in any implementation manner of the first aspect or the second aspect above.
  • the present application at least has the following advantages:
  • the electronic device acquires an image to be recognized, and recognizes the image through a character recognition model including an encoder and a decoder to obtain a sequence of recognition results.
  • the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function.
  • the generative adversarial loss function is obtained according to the discriminator.
  • the discriminator is used to discriminate the output results during the training of the character recognition model, so as to improve the recognition accuracy of the character recognition model and avoid exposure of the training data.
  • FIG. 1 is a schematic flow diagram of a character recognition method provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of an encoding-decoding model provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another encoding-decoding model provided by the embodiment of the present application.
  • FIG. 4 is an encoding-decoding model diagram provided by an embodiment of the present application with a discriminator added;
  • FIG. 5 is a schematic diagram of an image to be recognized provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a character recognition device provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The terms "first" and "second" in the embodiments of the present application are used for description purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features.
  • the autoregressive structure refers to predicting the character at the next position based on the character predicted at the previous position. Therefore, if there is a problem with the character prediction at any position, a series of errors may be caused. For example, when there is a blank area, the blank area may be predicted as a certain character due to prediction errors, and continuous errors are caused due to the characteristics of the autoregressive structure, resulting in the "sentence-making" phenomenon.
  • “Sentence making” refers to creating a sentence of content that does not exist in the image to be recognized based on the training data, which affects the recognition accuracy.
  • The blank area is wrongly predicted as a certain character, which is a common character in the training data, and the content generated by "sentence making" is a common character string in the training data; this may therefore expose the training data, which affects the security of the training data, and also results in low recognition accuracy.
  • An electronic device refers to a device capable of data processing, such as a server or a terminal.
  • the terminal includes, but is not limited to, a smart phone, a tablet computer, a notebook computer, a personal digital assistant (personal digital assistant, PDA) or a smart wearable device.
  • the server may be a cloud server, for example, a central server in a central cloud computing cluster, or an edge server in an edge cloud computing cluster.
  • the server may also be a server in a local data center.
  • An on-premises data center refers to a data center directly controlled by the user.
  • the electronic device acquires an image to be recognized, recognizes the image through a character recognition model including an encoder and a decoder, and obtains a sequence of recognition results.
  • the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function.
  • the generative adversarial loss function is obtained according to the discriminator.
  • the discriminator is used to discriminate the output results during the training of the character recognition model, so as to improve the recognition accuracy of the character recognition model and avoid exposure of the training data.
  • the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function
  • the autoregressive decoding loss value generated according to the autoregressive decoding loss function and the generative adversarial loss value generated according to the generative adversarial loss function can be used to update the parameters of the character recognition model, so as to improve the recognition accuracy of the model.
  • the loss function of the decoder includes a generative adversarial loss function, which is obtained according to the discriminator.
  • the discriminator can discriminate the recognition result sequence output by the character recognition model, so the character recognition model obtained by training avoids exposing the training data.
  • S102 The terminal acquires an image to be recognized.
  • the image to be recognized refers to an image including text lines.
  • the text in the text row may be text in multiple languages, such as Chinese, English, Japanese, Russian and so on.
  • the image of the text line can be in various forms, for example, it can be a screenshot of the electronic text, or it can be a photo of the printed text, or it can be a photo of handwritten characters. OCR can recognize the image of the text line.
  • OCR refers to the process in which electronic equipment converts character shapes into computer text and outputs them.
  • OCR includes text detection and text recognition. Text detection is used to find and segment text areas in pictures, and text recognition is used to convert text characters into computer text.
  • Whether an image is an image to be recognized can be judged by the user, by the terminal, or jointly by the user and the terminal.
  • the user can input the image to be recognized on the page corresponding to the terminal, or the user can turn on the camera connected to the terminal, and the terminal determines whether there is an image to be recognized within the capture range of the camera.
  • the terminal can judge the image input by the user to determine whether it is an image to be recognized, or the user can judge the image captured by the terminal through the camera and, after confirming it, use the image as the image to be recognized.
  • the terminal can obtain the image to be recognized in various ways.
  • the terminal can obtain the image to be recognized by calling the corresponding camera, by retrieving it from the pictures stored in the terminal, or by capturing it from the current page of the terminal.
  • the terminal can turn on the camera according to the user's choice, and obtain the image to be recognized through the information captured by the camera.
  • the user can capture an image of a certain frame as the image to be recognized by clicking the "shooting" control in the terminal, and the terminal can also directly use the image with clear characters in the multi-frame images captured by the camera as the image to be recognized.
  • the terminal may provide the image to the user by displaying it, and after the user confirms the image on the terminal, the terminal obtains the image to be recognized.
  • S104 The terminal recognizes the image through the character recognition model, and obtains a sequence of recognition results.
  • the character recognition model can be shown in FIG. 2 , which is an encoder-decoder model, including an encoder and a decoder.
  • the encoder is used to convert real-world problems into mathematical problems
  • the decoder is used to solve mathematical problems and convert the solution results into real-world solutions.
  • the encoder is used to output data such as input text, pictures, and audio as vectors
  • the decoder is used to generate corresponding text and the like from the vectors output by the encoder.
  • the input of the encoder is an image to be recognized
  • the output of the encoder is a vector
  • the input of the decoder is a vector
  • the output is a sequence of initial recognition results.
  • the vector passed from the encoder output to the decoder input has a fixed length, so the problem of information loss may occur.
  • the encoder needs to compress the information of the entire sequence into a fixed-length vector.
  • the vector cannot fully represent the information of the entire sequence.
  • the information carried by the content entered first will be diluted by the information carried by the content entered later. Therefore, the decoder cannot obtain enough information of the input sequence when decoding according to the vector, which affects the accuracy of the decoding result.
  • the loss function is used to measure the degree of inconsistency between the predicted value of the model and the real value.
  • the loss function can be used to measure the quality of the model's predictions, and can also be used to guide updates to the model so that its prediction results are closer to the real values.
  • the autoregressive decoding loss value can be determined according to the initial recognition result sequence, the label sequence of the training data and the autoregressive decoding loss function, and the obtained autoregressive decoding loss value can be used to update the parameters of the character recognition model, so that the prediction result of the character recognition model is closer to the real value.
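  • As a minimal sketch (assuming a per-character cross-entropy formulation, which is one common choice and not necessarily the exact loss of the present application), the autoregressive decoding loss value could be computed as:

```python
import torch
import torch.nn.functional as F

def autoregressive_decoding_loss(logits, label_ids, pad_id=0):
    # logits: (batch, seq_len, vocab_size) decoder outputs per position;
    # label_ids: (batch, seq_len) label sequence of the training data.
    # Cross-entropy between the predicted characters and the label sequence,
    # ignoring padded positions.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        label_ids.reshape(-1),
        ignore_index=pad_id,
    )
```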
  • the autoregressive decoding loss function can be used to compensate for the error caused by the limitation of the vector length in the encoding-decoding model, but it cannot eliminate the error completely. Moreover, even if the loss function of the decoder in the character recognition model includes an autoregressive decoding loss function, the model may still, owing to the autoregressive property of the encoding-decoding model, recognize blank areas in the image as sentences from the training data, exposing training data privacy and reducing recognition accuracy.
  • the terminal may add a discriminator during the training process of the character recognition model to judge the recognition result output by the character recognition model.
  • the terminal can use the discriminator of a generative adversarial network (GAN) to obtain the generative adversarial loss function.
  • the discriminator can output a discrimination result according to the encoding features of the recognition result sequence output by the decoder for the training data, the encoding features of the label sequence, and the encoding features in the encoder; the generative adversarial loss value is then determined according to the recognition result sequence, the discrimination result and the generative adversarial loss function, so that the parameters of the character recognition model can be updated according to this loss value.
  • the "real/fake" result output by the discriminator can be used to train the decoder.
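  • A minimal sketch of such a discriminator is given below; it scores the encoding features of a character sequence together with the encoder features as a single real/fake logit per sentence. The layer sizes, mean-pooling and all names are illustrative assumptions, not details taken from the present application:

```python
import torch
import torch.nn as nn

class SequenceDiscriminator(nn.Module):
    # Sketch: judges a sequence (via its encoding features), conditioned on
    # the encoder features of the image, as real or fake.
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one logit per sentence
        )

    def forward(self, seq_features, encoder_features):
        # Mean-pool over time so the judgment is made sentence by sentence.
        pooled = torch.cat(
            [seq_features.mean(dim=1), encoder_features.mean(dim=1)], dim=-1
        )
        return self.net(pooled)  # (batch, 1) real/fake logit
```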
  • A generative adversarial network is a deep learning model that includes a generator and a discriminator. Taking picture generation as an example, the generator receives a random noise z and generates a picture from that noise, and the discriminator judges whether a picture is real or generated.
  • the goal of the generator is to generate pictures real enough to deceive the discriminator, while the goal of the discriminator is to separate the pictures generated by the generator from the real pictures as well as possible; the generator finally obtained is used to generate pictures.
  • the generative adversarial loss function can be taken to be the GAN value function

    $$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

    where:
  • G represents the generator
  • D represents the discriminator
  • x represents the real data
  • p_data(x) represents the probability density distribution of the real data
  • z represents the random input noise, drawn from the distribution p_z(z).
  • the discriminator D needs to distinguish the real sample x from the fake sample G(z) as much as possible, so D(x) needs to be as large as possible and D(G(z)) as small as possible; that is, V(D,G) needs to be as large as possible.
  • the generator hopes that the data G(z) it generates can fool the discriminator, that is, it hopes that D(G(z)) is as large as possible, which means V(D,G) should be as small as possible. Therefore, the two modules of the GAN are trained against each other until a global optimum is reached.
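  • In code, the two adversarial objectives can be sketched with binary cross-entropy on the discriminator's logits, a common stand-in for the value function above; the function names here are illustrative:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logit, d_fake_logit):
    # D is trained so that D(x) -> 1 for real samples and D(G(z)) -> 0
    # for generated samples, i.e. V(D, G) is made as large as possible.
    real = F.binary_cross_entropy_with_logits(
        d_real_logit, torch.ones_like(d_real_logit))
    fake = F.binary_cross_entropy_with_logits(
        d_fake_logit, torch.zeros_like(d_fake_logit))
    return real + fake

def generator_loss(d_fake_logit):
    # G is trained so that the discriminator labels G(z) as real,
    # i.e. V(D, G) is made as small as possible.
    return F.binary_cross_entropy_with_logits(
        d_fake_logit, torch.ones_like(d_fake_logit))
```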
  • In the GAN used here, the generator is the decoder: the decoder decodes according to the vector output by the encoder, and the discriminator judges the decoding result (the output).
  • the discriminator judges the output result of the decoder against the label sequence (the ground truth): when the output result of the decoder is the same as the reference label sequence, it outputs real; when it is not the same, it outputs fake.
  • the encoder and the decoder encode and decode the text line characters in the image to be recognized character by character, while the discriminator judges the output result of the decoder sentence by sentence.
  • That is, when the whole output sequence of the decoder is the same as the label sequence, the discriminator outputs real; when they are not the same, it outputs fake.
  • the terminal can implement the generative adversarial loss function in the form of a cross-entropy loss function (cross-entropy loss) or a hinge loss function (hinge loss).
  • The cross-entropy loss function is commonly used in classification problems, and can be applied to binary classification or multi-class classification problems.
  • Cross entropy measures the difference between two probability distributions over the same random variable; in machine learning, it expresses the difference between the true probability distribution and the predicted probability distribution. The smaller the cross entropy, the better the model's prediction.
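  • Formally, for a true distribution p and a predicted distribution q over the same random variable, the cross entropy is

    $$H(p, q) = -\sum_{x} p(x) \log q(x).$$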
  • the hinge loss function is an algorithm dedicated to binary classification problems.
  • the label value is -1 or 1, and the predicted value y ∈ ℝ.
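  • That is, with label t ∈ {−1, 1} and prediction y, the hinge loss takes the form

    $$\ell(y) = \max\big(0,\, 1 - t \cdot y\big), \qquad t \in \{-1, +1\},\; y \in \mathbb{R}.$$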
  • the hinge loss function is usually used as the objective function of support vector machine (SVM).
  • SVM is a class of generalized linear classifiers that perform binary classification on data in a supervised learning manner.
  • SVM uses the hinge loss function (hinge loss) to calculate the empirical risk (empirical risk) and adds a regularization term to the solution system to optimize the structural risk (structural risk).
  • SVM is a sparse and robust classifier.
  • the loss function of the decoder includes not only the autoregressive decoding loss function, but also the generative adversarial loss function.
  • the autoregressive decoding loss value can be determined according to the recognition result sequence, the label sequence and the autoregressive decoding loss function.
  • the terminal determines the generative adversarial loss value according to the recognition result sequence, the discrimination result and the generative adversarial loss function, and updates the parameters of the character recognition model according to the autoregressive decoding loss value and the generative adversarial loss value, so as to obtain a character recognition model with high recognition accuracy that does not expose the privacy of the training data.
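  • Putting the two loss terms together, one training step of the recognizer might look like the following sketch; the weighting factor `lam`, the model's return values and all names are assumptions layered on the sketches above, not details taken from the present application:

```python
def recognizer_training_step(model, discriminator, images, label_ids,
                             optimizer, lam=1.0):
    # Hypothetical model API: character logits, sequence encoding features,
    # and encoder features for a batch of images.
    logits, seq_features, enc_features = model(images)
    ar_loss = autoregressive_decoding_loss(logits, label_ids)  # sketched above
    gan_loss = generator_loss(discriminator(seq_features, enc_features))
    loss = ar_loss + lam * gan_loss  # update with both loss values
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```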
  • Because the loss function of the decoder includes a generative adversarial loss function, the decoder is updated through joint training with the discriminator, so the recognition result sequence generated by the character recognition model avoids exposing training data privacy and has improved recognition accuracy.
  • the terminal may further discriminate the output recognition result sequence through the discriminator updated through joint training with the decoder, and output the recognition result sequence when the discrimination is passed.
  • S106 The terminal uses a discriminator to discriminate the recognition result sequence, and outputs the recognition result sequence when the discrimination is passed.
  • the generator (decoder) and the discriminator are trained together, and the decoder and the discriminator are trained alternately during the training process.
  • the initial decoder is G0
  • the initial discriminator is D0
  • Both the decoder and the discriminator are being optimized during alternate training.
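  • A sketch of this alternating schedule, one discriminator step followed by one decoder step per batch, is shown below; the data loader is assumed to also provide encoding features of the label sequence, which is an illustrative simplification:

```python
def train_alternating(model, discriminator, loader, opt_g, opt_d, epochs=10):
    for _ in range(epochs):
        for images, label_ids, label_features in loader:
            # 1) Discriminator step: label sequence -> real, model output -> fake.
            #    detach() keeps this step from updating the recognizer.
            _, seq_features, enc_features = model(images)
            enc_f = enc_features.detach()
            d_real = discriminator(label_features, enc_f)
            d_fake = discriminator(seq_features.detach(), enc_f)
            d_loss = discriminator_loss(d_real, d_fake)
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()
            # 2) Decoder (generator) step with both loss terms.
            recognizer_training_step(model, discriminator, images,
                                     label_ids, opt_g)
```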
  • the method of gradient descent can be used for the training of the decoder and the discriminator.
  • Gradient descent is based on the observation that if a real-valued function F(x) is differentiable and defined at a point a, then F(x) decreases fastest from a in the direction opposite to the gradient ∇F(a).
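  • This observation yields the familiar update rule: starting from a point a_0 and repeatedly stepping against the gradient with a small step size γ > 0,

    $$a_{n+1} = a_n - \gamma \, \nabla F(a_n),$$

    so that F(a_{n+1}) ≤ F(a_n) for a sufficiently small γ.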
  • The character recognition method provided in this embodiment is introduced below by taking the scene shown in FIG. 5, in which a user needs to recognize characters on paper, as an example.
  • S102 The terminal acquires an image to be recognized.
  • the terminal obtains the image to be recognized, for example by calling the camera to photograph the content of the paper, or by calling the camera to scan the paper.
  • the picture includes two parts: part A and part B, where both parts include lines of text, but part B also includes sentences with blank areas.
  • S104 The terminal recognizes the image through the character recognition model, and obtains a sequence of recognition results.
  • the recognition result sequence output for part B may expose the training data.
  • Suppose the training data consists of Chinese test questions.
  • the content output for part B may then be: "Moonlight in front of the bed, fill in the blanks with ancient poems." and so on.
  • the character recognition model recognizes the blank area "___" as a sentence in the training data based on the autoregressive attribute, such as "fill in the blanks with ancient poems", a phrase that appears frequently in Chinese test questions.
  • the output recognition result sequence will expose the privacy of the training data.
  • Because the training data includes "fill in the blanks with ancient poems", this leads to the exposure of the training data of the character recognition model.
  • When the decoder in the character recognition model is trained with the discriminator, it can output accurate results without exposing the training data.
  • For example, the label sequence of part B is "1. Moonlight in front of the bed, ___." If the recognition result sequence output by the decoder is "Moonlight in front of the bed, fill in the blanks with ancient poems.", the discriminator obtains the discrimination result fake according to the recognition result sequence of the decoder, the encoding features of the label sequence in the training data, and the encoding features of the encoder in the character recognition model.
  • In this case, the character recognition model re-recognizes the image, thereby avoiding the exposure of training data.
  • the character recognition model in this embodiment can obtain an accurate sequence of recognition results by recognizing the image to be recognized.
  • Even when the image to be recognized includes a blank area, it can output an accurate sequence of recognition results, avoiding exposure of training data and protecting data security.
  • S106 The terminal uses a discriminator to discriminate the recognition result sequence, and outputs the recognition result sequence when the discrimination is passed.
  • the discriminator trained together with the decoder can also be used to discriminate the recognition result sequence to ensure the security of the training data.
  • When the discrimination is passed, the recognition result sequence is output. A recognition result sequence output in this way further ensures the security of the training data.
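  • At inference time, the gating described here can be sketched as a small wrapper loop: recognize, discriminate, and re-recognize when the discrimination fails. The `model.recognize` API, the 0.5 threshold and the retry cap are all assumptions added for illustration:

```python
import torch

def recognize_with_gate(model, discriminator, image, max_retries=3):
    # Output the recognition result sequence only when the discriminator
    # judges it real; otherwise re-recognize (in practice re-recognition
    # would vary the decoding, e.g. by sampling, to get a new hypothesis).
    tokens = None
    for _ in range(max_retries):
        tokens, seq_features, enc_features = model.recognize(image)
        with torch.no_grad():
            logit = discriminator(seq_features, enc_features)
        if torch.sigmoid(logit).item() >= 0.5:  # discrimination passed
            return tokens
    return tokens  # fall back to the last hypothesis
```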
  • the embodiment of the present application provides a character recognition method.
  • the terminal obtains the image to be recognized, recognizes the obtained image through the character recognition model, and obtains the recognition result sequence, wherein the character recognition model includes an encoder and a decoder, and the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function.
  • the discriminator can discriminate the recognition result sequence output by the character recognition model, thereby improving the recognition accuracy of the character recognition model and preventing the exposure of training data. Therefore, the character recognition method provided in this embodiment can effectively improve the accuracy of character recognition and avoid exposing the privacy of training data.
  • FIG. 6 is a schematic diagram of a character recognition device according to an exemplary disclosed embodiment. As shown in FIG. 6, the character recognition device 600 includes:
  • the communication module 602, configured to acquire an image to be recognized;
  • the recognition module 604, configured to recognize the image through a character recognition model to obtain a recognition result sequence; wherein the character recognition model includes an encoder and a decoder, and the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function;
  • the discrimination module 606 is configured to use a discriminator to discriminate the recognition result sequence, and output the recognition result sequence when the discrimination is passed.
  • the character recognition model is obtained through training as follows:
  • the decoder performs decoding in units of characters
  • the discriminator performs discrimination in units of sentences.
  • the generative adversarial loss function is a cross-entropy loss function.
  • FIG. 7 shows a schematic structural diagram of an electronic device 700 suitable for implementing an embodiment of the present application.
  • The terminal equipment in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (such as car navigation terminals), and fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and scope of use of the embodiments of the present application.
  • an electronic device 700 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 701, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic device 700 are also stored.
  • the processing device 701, ROM 702, and RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704 .
  • The following devices can be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 708 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 709.
  • the communication means 709 may allow the electronic device 700 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 7 shows electronic device 700 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 709, or from storage means 708, or from ROM 702.
  • When the computer program is executed by the processing device 701, the above-mentioned functions defined in the method of the embodiment of the present application are performed.
  • the computer-readable medium mentioned above in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: obtains an image to be recognized; recognizes the image through a character recognition model to obtain a recognition result sequence, wherein the character recognition model includes an encoder and a decoder, and the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function; and uses a discriminator to discriminate the recognition result sequence, and outputs the recognition result sequence when the discrimination is passed.
  • Computer program code for carrying out the operations of the present application may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, connected via the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • modules involved in the embodiments described in the present application may be implemented by means of software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
  • For example, and without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides a character recognition method, the method comprising: acquiring an image to be recognized; recognizing the image through a character recognition model to obtain a recognition result sequence, wherein the character recognition model includes an encoder and a decoder, and the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function; and using a discriminator to discriminate the recognition result sequence, and outputting the recognition result sequence when the discrimination is passed.
  • Example 2 provides the method of Example 1, wherein the character recognition model is obtained by training as follows: inputting training data into the character recognition model to obtain a recognition result sequence for the training images in the training data; inputting the encoding features of the recognition result sequence, the encoding features of the label sequence in the training data, and the encoding features of the encoder in the character recognition model into a discriminator to obtain a discrimination result; determining the autoregressive decoding loss value according to the recognition result sequence, the label sequence and the autoregressive decoding loss function, and determining the generative adversarial loss value according to the recognition result sequence, the discrimination result and the generative adversarial loss function; and updating the parameters of the character recognition model according to the autoregressive decoding loss value and the generative adversarial loss value.
  • Example 3 provides the method of Example 2, the decoder performs decoding in units of characters, and the discriminator performs discrimination in units of sentences.
  • Example 4 provides the method of any one of Example 1 to Example 3, wherein the generative adversarial loss function is a cross-entropy loss function.
  • Example 5 provides a character recognition device, the device including: a communication module, configured to acquire an image to be recognized; a recognition module, configured to recognize the image through a character recognition model to obtain a recognition result sequence, wherein the character recognition model includes an encoder and a decoder, and the loss function of the decoder includes an autoregressive decoding loss function and a generative adversarial loss function; and a discrimination module, configured to use a discriminator to discriminate the recognition result sequence and output the recognition result sequence when the discrimination is passed.
  • Example 6 provides the device of Example 5, wherein the character recognition model is obtained through training as follows: inputting training data into the character recognition model to obtain a recognition result sequence for the training images in the training data; inputting the encoding features of the recognition result sequence, the encoding features of the label sequence in the training data, and the encoding features of the encoder in the character recognition model into a discriminator to obtain a discrimination result; determining the autoregressive decoding loss value according to the recognition result sequence, the label sequence and the autoregressive decoding loss function, and determining the generative adversarial loss value according to the recognition result sequence, the discrimination result and the generative adversarial loss function; and updating the parameters of the character recognition model according to the autoregressive decoding loss value and the generative adversarial loss value.
  • Example 7 provides the apparatus of Example 6, the decoder performs decoding in units of characters, and the discriminator performs discrimination in units of sentences.
  • Example 8 provides the device of any one of Example 5 to Example 7, wherein the generative adversarial loss function is a cross-entropy loss function.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

The present application relates to a character recognition method and apparatus, a device, and a medium. The method comprises: an electronic device obtaining an image to be recognized; and recognizing said image by means of a character recognition model comprising an encoder and a decoder, to obtain a recognition result sequence. A loss function of the decoder comprises an autoregressive decoding loss function and a generative adversarial loss function, the generative adversarial loss function is obtained according to a discriminator, and the discriminator is used to discriminate an output result during the training of the character recognition model, so as to improve the recognition accuracy of the character recognition model and avoid exposure of the training data.
PCT/CN2022/125603 2021-11-04 2022-10-17 Character recognition method and apparatus, device, medium and product WO2023078070A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111302425.4A CN114037990A (zh) 2021-11-04 2021-11-04 Character recognition method, apparatus, device, medium and product
CN202111302425.4 2021-11-04

Publications (1)

Publication Number Publication Date
WO2023078070A1 (fr)

Family

Family ID: 80136376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125603 WO2023078070A1 (fr) 2021-11-04 2022-10-17 Character recognition method and apparatus, device, medium and product

Country Status (2)

Country Link
CN (1) CN114037990A (fr)
WO (1) WO2023078070A1 (fr)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037990A (zh) 2021-11-04 2022-02-11 北京有竹居网络技术有限公司 Character recognition method, apparatus, device, medium and product
CN114973229B (zh) * 2022-05-31 2024-07-02 深圳市星桐科技有限公司 Text recognition model training and text recognition method, apparatus, device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180373979A1 (en) * 2017-06-22 2018-12-27 Adobe Systems Incorporated Image captioning utilizing semantic text modeling and adversarial learning
US20190180154A1 (en) * 2017-12-13 2019-06-13 Abbyy Development Llc Text recognition using artificial intelligence
CN112489168A (zh) * 2020-12-16 2021-03-12 中国科学院长春光学精密机械与物理研究所 Image data set generation and production method, apparatus, device and storage medium
CN113269189A (zh) * 2021-07-20 2021-08-17 北京世纪好未来教育科技有限公司 Text recognition model construction method, text recognition method, apparatus and device
CN113283427A (zh) * 2021-07-20 2021-08-20 北京世纪好未来教育科技有限公司 Text recognition method, apparatus, device and medium
CN113344014A (zh) * 2021-08-03 2021-09-03 北京世纪好未来教育科技有限公司 Text recognition method and apparatus
CN114037990A (zh) * 2021-11-04 2022-02-11 北京有竹居网络技术有限公司 Character recognition method, apparatus, device, medium and product

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084281B (zh) * 2019-03-31 2023-09-12 华为技术有限公司 Image generation method, neural network compression method, and related apparatus and device
CN113313064B (zh) * 2021-06-23 2024-08-23 北京有竹居网络技术有限公司 Character recognition method and apparatus, readable medium and electronic device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116683648A (zh) * 2023-06-13 2023-09-01 浙江华耀电气科技有限公司 Intelligent power distribution cabinet and control system thereof
CN116683648B (zh) * 2023-06-13 2024-02-20 浙江华耀电气科技有限公司 Intelligent power distribution cabinet and control system thereof

Also Published As

Publication number Publication date
CN114037990A (zh) 2022-02-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889095

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE